Methods of mathematical statistics


1. Introduction

Mathematical statistics is a science that develops methods for obtaining, describing and processing experimental data in order to study the patterns of random mass phenomena.

In mathematical statistics, two areas can be distinguished: descriptive statistics and inductive statistics (statistical inference). Descriptive statistics is concerned with the accumulation, systematization and presentation of experimental data in a convenient form. Inductive statistics uses these data to draw conclusions about the objects on which the data were collected, or to estimate their parameters.

Typical areas of mathematical statistics are:

1) sampling theory;

2) theory of estimates;

3) testing statistical hypotheses;

4) regression analysis;

5) analysis of variance.

Mathematical statistics is based on a number of basic concepts without which it is impossible to study modern methods of processing experimental data. Among the first of them are the concepts of the general population and the sample.

In mass industrial production, it is often necessary to establish whether product quality meets the standards without checking every manufactured item. Since the number of manufactured items is very large, or since testing an item may render it unusable, only a small number of items are checked. On the basis of this check, a conclusion must be drawn about the entire production batch. Of course, one cannot say that all transistors in a batch of 1 million pieces are good or bad by checking one of them. On the other hand, since sampling and testing can be time-consuming and costly, the number of items checked should be as small as possible while still giving a reliable picture of the entire batch. For this purpose, we introduce a number of concepts.

The whole set of studied objects or experimental data is called the general population. We will denote by N the number of objects or the amount of data that make up the general population. The value N is called the volume (size) of the general population. If $N \gg 1$, that is, N is very large, it is usually assumed that $N = \infty$.

A random sample, or simply a sample, is a part of the general population randomly selected from it. The phrase "randomly selected" means that every object in the general population has the same probability of being chosen. This is an important assumption, but it is often difficult to verify in practice.

The sample size is the number of objects or the amount of data that make up the sample; it is denoted by n. In what follows, we will assume that the elements of the sample can be assigned numerical values $x_1, x_2, \ldots, x_n$. For example, in quality control of manufactured bipolar transistors, these can be measurements of their DC gain.
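
To make the idea of equal selection probabilities concrete, here is a minimal Python sketch. The population of transistor gain values is simulated and entirely hypothetical (its mean, spread and size are assumptions made for illustration); `random.sample` draws n distinct objects so that each object is equally likely to be selected.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical general population: DC gain of N = 10_000 transistors.
N = 10_000
population = [random.gauss(100.0, 15.0) for _ in range(N)]

# Simple random sample of size n: every object has the same chance of selection.
n = 50
sample = random.sample(population, n)

print(len(sample))        # sample size n
print(sum(sample) / n)    # mean of the drawn values
```

In practice the difficulty mentioned above remains: the code guarantees equal probabilities only because the whole population is available as a list, which is rarely true of a real production batch.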


2. Numerical characteristics of the sample

2.1 Sample mean

For a specific sample of size n, its sample mean $\bar{x}$ is determined by the ratio

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i,$$

where $x_i$ are the values of the sample elements. Usually, one wants to describe the statistical properties of random samples in general, not of one particular sample. This means that a mathematical model is considered which assumes a sufficiently large number of samples of size n. In this case, the sample elements are treated as random variables $X_i$ taking values $x_i$ with probability density f(x), which is the probability density of the general population. Then the sample mean is also a random variable $\bar{X}$, equal to

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i.$$

As before, we denote random variables by capital letters and their values by lowercase letters.

The average value of the general population from which the sample is drawn will be called the general mean and denoted by $m_x$. It can be expected that if the sample size is large, the sample mean will not differ significantly from the general mean. Since the sample mean is a random variable, its mathematical expectation can be found:

$$M[\bar{X}] = M\left[\frac{1}{n} \sum_{i=1}^{n} X_i\right] = \frac{1}{n} \sum_{i=1}^{n} M[X_i] = m_x.$$

Thus, the mathematical expectation of the sample mean is equal to the general mean. In this case, the sample mean is said to be an unbiased estimate of the general mean. We will come back to this term later. Since the sample mean is a random variable that fluctuates around the general mean, it is desirable to characterize this fluctuation by the variance of the sample mean. Consider a sample whose size n is much smaller than the size of the general population N ($n \ll N$). Assume that the characteristics of the general population do not change as the sample is drawn, which is equivalent to assuming $N = \infty$. Then

$$D[\bar{X}] = M\left[(\bar{X} - m_x)^2\right] = M\left[\left(\frac{1}{n} \sum_{i=1}^{n} (X_i - m_x)\right)^2\right] = \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} M\left[(X_i - m_x)(X_j - m_x)\right].$$

The random variables $X_i$ and $X_j$ ($i \ne j$) can be considered independent; therefore,

$$M\left[(X_i - m_x)(X_j - m_x)\right] = \begin{cases} \sigma^2, & i = j, \\ 0, & i \ne j. \end{cases}$$

Substituting this result into the variance formula gives

$$D[\bar{X}] = \frac{1}{n^2} \cdot n \sigma^2 = \frac{\sigma^2}{n},$$

where $\sigma^2$ is the variance of the general population.

It follows from this formula that as the sample size increases, the fluctuations of the sample mean around the general mean decrease as $\sigma^2 / n$. Let us illustrate this with an example. Let there be a random signal with mathematical expectation and variance equal to $m_x = 10$ and $\sigma^2 = 9$, respectively.

The signal samples are taken at equidistant times $t_1, t_2, \ldots, t_n$.

[Figure: random signal X(t) versus time t, sampled at the instants $t_1, t_2, \ldots, t_n$]

Since the samples are random variables, we will denote them by $X(t_1), X(t_2), \ldots, X(t_n)$.

Let us determine the number of samples needed so that the standard deviation of the estimate of the signal's mathematical expectation does not exceed 1% of that expectation. Since $m_x = 10$, it is necessary that

$$\sigma_{\bar{X}} \le 0.01\, m_x = 0.1.$$

On the other hand, $\sigma_{\bar{X}} = \sqrt{\sigma^2 / n} = 3 / \sqrt{n}$; therefore $3 / \sqrt{n} \le 0.1$, or $\sqrt{n} \ge 30$. From this we obtain $n \ge 900$ samples.
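
The arithmetic of this example can be checked numerically. The sketch below recomputes the minimum n from $m_x = 10$ and $\sigma^2 = 9$, then runs a small Monte Carlo experiment. The normal distribution of the signal samples and the number of repetitions are assumptions made only for this illustration; the text does not specify the signal's distribution.

```python
import math
import random
import statistics

m_x = 10.0          # mathematical expectation of the signal (from the example)
sigma = 3.0         # standard deviation, since sigma^2 = 9
rel_percent = 1.0   # allowed standard deviation of the estimate, in % of m_x

# sigma / sqrt(n) <= (rel_percent / 100) * m_x  =>  n >= (100 * sigma / (rel_percent * m_x))^2
n_min = math.ceil((100.0 * sigma / (rel_percent * m_x)) ** 2)
print(n_min)

# Monte Carlo check: spread of the sample mean over many samples of size n_min.
# Normally distributed, independent samples are assumed for this sketch only.
random.seed(0)
means = []
for _ in range(500):
    xs = [random.gauss(m_x, sigma) for _ in range(n_min)]
    means.append(statistics.fmean(xs))

print(statistics.stdev(means))  # should come out close to 0.1
```

The simulated spread only approximates 0.1 because it is itself estimated from a finite number of repetitions.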

2.2 Sample variance

For sample data, it is important to know not only the sample mean, but also the spread of the sample values around the sample mean. If the sample mean is an estimate of the general mean, then the sample variance should be an estimate of the general variance. The sample variance for a sample consisting of the random variables $X_1, X_2, \ldots, X_n$ is determined as follows

Using this representation of the sample variance, we find its mathematical expectation
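
As a hedged numerical illustration of what the mathematical expectation of a sample variance looks like, the sketch below compares two common definitions: division by n (`statistics.pvariance`) and division by n - 1 (`statistics.variance`). Which of the two the text intends is not stated here, so both are shown; the normal population with $\sigma^2 = 9$, the sample size and the number of trials are assumptions.

```python
import random
import statistics

random.seed(0)
sigma2 = 9.0        # variance of the simulated general population (assumed)
n = 5               # a small sample size makes the difference visible
trials = 20_000

by_n, by_n_minus_1 = [], []
for _ in range(trials):
    xs = [random.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
    by_n.append(statistics.pvariance(xs))         # sum of squares divided by n
    by_n_minus_1.append(statistics.variance(xs))  # sum of squares divided by n - 1

print(statistics.fmean(by_n))          # near (n - 1) / n * sigma2 = 7.2
print(statistics.fmean(by_n_minus_1))  # near sigma2 = 9.0
```

The average of the first estimator falls below the population variance by the factor (n - 1)/n, while the second is centered on it, which is the usual sense of "unbiased" introduced for the sample mean.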

(E.P. Vrublevsky, O.E. Likhachev, L.G. Vrublevskaya)

Applying particular methods in a study, the experimenter ends up with a larger or smaller set of numerical indicators intended to characterize the phenomenon under study. But without systematization and proper processing of the results, and without a deep and comprehensive analysis of the facts, it is impossible to extract the information they contain, discover patterns, or draw well-grounded conclusions. The elementary and quite accessible methods of mathematical processing given in this text are illustrative in nature: the examples show how a particular statistical method is applied and do not give its detailed interpretation.

Average values and indicators of variation. Before moving on to more substantial matters, it is necessary to understand such statistical concepts as the general population and the sample. A group of numbers united by some common attribute is called a population. Observations of objects can cover all members of the studied population without exception, or be limited to examining only a certain part of it. In the first case the observation is called continuous, or complete; in the second, partial, or selective. A complete survey is carried out very rarely, since for a number of reasons it is either impossible or impractical. It is impossible, for example, to examine all the masters of sports in athletics. Therefore, in the overwhelming majority of cases, instead of complete observation, some part of the population is studied, and the state of the whole is judged from it.

The population from which some members are selected for study is called the general population, and the part selected in one way or another is called the sample population or simply the sample. The concept of the general population is relative: in one case it is all athletes, in another, cities or universities. For example, the general population can be all university students, and the sample can be students of the football specialization. The number of objects in any population is called its volume (the size of the general population is denoted by N, and the sample size by n).

A sample is assumed to represent the general population reliably only if its elements are selected from the general population without bias. There are several ways to do this: selecting the sample using a table of random numbers, dividing the general population into a number of non-overlapping groups and selecting a certain number of objects from each, and so on.


As for the sample size, according to the basic principles of mathematical statistics, the larger the sample, the more representative it is. The researcher, striving for economy in his work, is interested in the minimum sample size, so the number of objects selected for the sample is the result of a compromise. To know how reliably the sample represents the general population, it is necessary to determine a number of indicators (parameters).

Calculating the arithmetic mean. The arithmetic mean of the sample characterizes the average level of the values of the studied random variable in the observed cases and is calculated by dividing the sum of the individual values of the studied attribute by the total number of observations:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}, \qquad (1)$$

where $x_i$ is a variant (value) of the series;

n is the volume of the population.

The summation sign Σ denotes the summation of the data to its right. The lower and upper indices of Σ indicate the number at which the addition begins and the index at which it ends. Thus $\sum_{i=1}^{n} x_i$ means that all x with ordinal numbers from 1 to n must be added. The sign $\sum x_i$ written without limits denotes the summation of all x from the first to the last.

Thus, calculations by formula (1) assume the following procedure:

1. Sum all the obtained $x_i$, that is, compute $\sum_{i=1}^{n} x_i$.

2. Divide the resulting sum by the population size n.
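
The two-step procedure can be sketched in a few lines of Python. The measurement values below are hypothetical and are not taken from Table 1.

```python
# Hypothetical measurement results x_1 ... x_n (illustrative values only).
x = [12.4, 11.8, 12.1, 12.9, 11.6]

total = sum(x)      # step 1: sum all x_i
n = len(x)          # volume of the population
mean = total / n    # step 2: divide the sum by n

print(round(mean, 2))
```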

For convenience and clarity, the indicators should be arranged in a table, since the values $x_i$ are added in order from the first to the last.

For example, the arithmetic mean is determined by the formula:

The measurement results are shown in Table 1.

Table 1

Athlete testing results

The data obtained in an experiment are characterized by variability, which can be caused by random error: the error of the measuring device, the heterogeneity of the samples, and so on. After collecting a large amount of homogeneous data, the experimenter needs to process them in order to extract the most accurate information about the quantity under consideration. For processing the large arrays of measurements, observations and similar data obtained during an experiment, it is convenient to use the methods of mathematical statistics.

Mathematical statistics is inextricably linked with probability theory, but there is a significant difference between these sciences. Probability theory works with already known distributions of random variables, from which the probabilities of events, the mathematical expectation and so on are calculated. The task of mathematical statistics is to obtain the most reliable information about the distribution of a random variable from experimental data.

Typical areas of mathematical statistics:

  • sampling theory;
  • theory of estimates;
  • testing statistical hypotheses;
  • regression analysis;
  • analysis of variance.

Methods of mathematical statistics

Methods of estimation and hypothesis testing are based on probabilistic and hyper-random models of the origin of the data.

Mathematical statistics estimates parameters and functions of them that represent important characteristics of distributions (median, mathematical expectation, standard deviation, quantiles, etc.), as well as densities, distribution functions and so on. Both point and interval estimates are used.

Modern mathematical statistics contains a large section, statistical sequential analysis, in which the array of observations may be formed one observation at a time.

Mathematical statistics also contains a general theory of hypothesis testing and a large number of methods for testing specific hypotheses (for example, about the symmetry of a distribution, about the values of parameters and characteristics, about the agreement of the empirical distribution function with a given distribution function, about homogeneity, that is, the coincidence of characteristics or distribution functions in two samples, etc.).

Of great importance is the section of mathematical statistics concerned with conducting sample surveys, with the properties of different sampling schemes, and with the construction of adequate methods of estimation and hypothesis testing. The methods of mathematical statistics rely directly on the following basic concepts.

Sample

Definition 1

A sample is the data obtained in the course of an experiment.

For example, the measured ranges of bullets fired from the same gun or from a group of similar guns.

Empirical distribution function

Remark 1

The distribution function makes it possible to express all the most important characteristics of a random variable.

In mathematical statistics, a distinction is made between the theoretical distribution function (not known in advance) and the empirical distribution function.

The empirical function is determined from experimental (empirical) data, i.e. from the sample.
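
A minimal sketch of an empirical distribution function follows. It assumes the common convention $F_n(x) = (\text{number of sample values} \le x) / n$; some texts use a strict inequality instead. The function name `empirical_cdf` and the sample values are hypothetical.

```python
from bisect import bisect_right

def empirical_cdf(sample):
    """Return F_n, where F_n(x) = (number of sample values <= x) / n."""
    ordered = sorted(sample)
    n = len(ordered)

    def F(x):
        # bisect_right gives the count of ordered values that are <= x
        return bisect_right(ordered, x) / n

    return F

# Hypothetical sample used only for illustration.
F = empirical_cdf([3.1, 2.7, 3.4, 2.9, 3.1])
print(F(2.6), F(3.1), F(4.0))
```

The result is a step function that rises by 1/n at each observed value (by more where values repeat), from 0 below the smallest observation to 1 at the largest.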

Histogram

Histograms are used for a visual, but rather approximate, representation of an unknown distribution.

A histogram is a graphical representation of the distribution of data.

To obtain a high-quality histogram, adhere to the following rules:

  • The number of intervals should be significantly less than the sample size.
  • Each interval must contain a sufficient number of sample items.

If the sample is very large, the range of the sample elements is often divided into equal parts.

Sample mean and sample variance

Using these concepts, it is possible to obtain an estimate of the necessary numerical characteristics of an unknown distribution without resorting to the construction of a distribution function, histogram, etc.

RANDOM VARIABLES AND THEIR DISTRIBUTION LAWS.

A random variable is a quantity that takes on values depending on a combination of random circumstances. A distinction is made between discrete and continuous random variables.

A quantity is called discrete if it takes a countable set of values. (Example: the number of patients at a doctor's appointment, the number of letters on a page, the number of molecules in a given volume.)

A continuous quantity is one that can take any value within a certain interval. (Example: air temperature, body weight, human height, etc.)

The distribution law of a random variable is the set of possible values of this quantity together with the probabilities (or frequencies) corresponding to these values.

Example:

x   x_1  x_2  x_3  x_4  ...  x_n
p   p_1  p_2  p_3  p_4  ...  p_n

x   x_1  x_2  x_3  x_4  ...  x_n
m   m_1  m_2  m_3  m_4  ...  m_n

NUMERICAL CHARACTERISTICS OF RANDOM VARIABLES.

In many cases, along with the distribution of a random variable or instead of it, information about the variable can be given by numerical parameters called the numerical characteristics of the random variable. The most common ones:

1. Expected value (mean value) of a random variable is the sum of the products of all its possible values and the probabilities of these values:

$$M(X) = \sum_{i=1}^{n} x_i p_i$$

2. Variance (dispersion) of a random variable:

$$D(X) = M\left[(X - M(X))^2\right] = \sum_{i=1}^{n} (x_i - M(X))^2 p_i$$

3. Standard (mean square) deviation:

$$\sigma = \sqrt{D(X)}$$
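
For a discrete random variable, these three characteristics can be computed directly from the distribution law. The sketch below uses the standard definitions (sum of $x_i p_i$, probability-weighted squared deviation, and its square root); the values and probabilities are hypothetical.

```python
import math

# Hypothetical distribution law: values x_i with probabilities p_i.
x = [1, 2, 3, 4]
p = [0.1, 0.2, 0.3, 0.4]
assert abs(sum(p) - 1.0) < 1e-12  # the probabilities must sum to 1

mean = sum(xi * pi for xi, pi in zip(x, p))                    # expected value M(X)
variance = sum((xi - mean) ** 2 * pi for xi, pi in zip(x, p))  # variance D(X)
std = math.sqrt(variance)                                      # standard deviation

print(mean, variance, std)
```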

The "THREE SIGMA" rule: if a random variable is distributed according to the normal law, then the absolute deviation of this variable from its mean value practically never exceeds three standard deviations (the probability of staying within this range is about 0.9973).
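
The rule can be checked with the error function from Python's standard library, using the known identity $P(|X - M(X)| \le k\sigma) = \operatorname{erf}(k / \sqrt{2})$ for a normally distributed X. The helper name `prob_within` is hypothetical.

```python
import math

def prob_within(k):
    """P(|X - M(X)| <= k * sigma) for a normally distributed X."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
```

For k = 3 the probability is about 0.9973, so a deviation larger than three standard deviations is expected in roughly 3 cases out of 1000.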

GAUSS LAW - THE NORMAL LAW OF DISTRIBUTION

Quantities distributed according to the normal law (Gauss's law) are encountered often. Its main feature: it is the limiting law that other distribution laws approach.

A random variable is distributed according to the normal law if its probability density has the form:

$$f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left(-\frac{(x - M(X))^2}{2\sigma^2}\right)$$

where M(X) is the mathematical expectation of the random variable;

σ is the standard deviation.

The probability density (distribution density function) shows how the probability that the random variable falls into an interval dx changes depending on the value of the variable itself:

$$f(x) = \frac{dP}{dx}$$
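
A short sketch of the normal density and of its reading as "probability per unit interval": summing f(x)·dx over many small steps approximates the probability of falling between two bounds. The function names and the midpoint-rule integration are choices made for this illustration.

```python
import math

def normal_pdf(x, mean, sigma):
    """Density of the normal law with expectation `mean` and standard deviation `sigma`."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def prob_between(a, b, mean, sigma, steps=10_000):
    """Approximate P(a < X < b) as the sum of f(x) * dx (midpoint rule)."""
    dx = (b - a) / steps
    return sum(normal_pdf(a + (i + 0.5) * dx, mean, sigma) for i in range(steps)) * dx

# One standard deviation either side of the mean; roughly 0.68 is expected.
print(prob_between(-1.0, 1.0, 0.0, 1.0))
```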


BASIC CONCEPTS OF MATHEMATICAL STATISTICS

Mathematical statistics is a branch of applied mathematics directly related to probability theory. The main difference between mathematical statistics and probability theory is that mathematical statistics considers not operations on the distribution laws and numerical characteristics of random variables, but approximate methods for finding these laws and numerical characteristics from the results of experiments.

The basic concepts of mathematical statistics are:

1. general population;

2. sample;

3. variational series;

4. mode;

5. median;

6. percentile;

7. frequency polygon;

8. histogram.

General population - a large statistical population from which some of the objects are selected for research.

(Example: the entire population of a region, the university students of a given city, etc.)

Sample (sample population) - a set of objects selected from the general population.

Variational series - a statistical distribution consisting of the variants (values of a random variable) and the corresponding frequencies.

Example:

X, kg
m

x is the value of the random variable (weight of girls aged 10 years);

m is the frequency of occurrence.

Mode - the value of a random variable that corresponds to the highest frequency of occurrence. (In the example above, the mode is 24 kg; it occurs more often than the other values: m = 20.)

Median - the value of a random variable that divides the distribution in half: half of the values lie to the right of the median and half (no more) to the left.

Example:

1, 1, 1, 1, 1, 1, 2, 2, 2, [3], 3, 4, 4, 5, 5, 5, 5, 6, 6, [7], 7, 7, 7, 7, 7, 8, 8, 8, 8, [8], 8, 9, 9, 9, 10, 10, 10, 10, 10, 10

In the example, we observe 40 values of a random variable. All values are listed in ascending order, each repeated according to its frequency of occurrence; the highlighted values (in square brackets) are the 10th, 20th and 30th in the list. To the right of the highlighted value 7 there are 20 of the 40 values (half). Therefore, 7 is the median.

To characterize the scatter, we find the values below which 25% and 75% of the measurement results fall. These values are called the 25th and 75th percentiles. If the median halves the distribution, then the 25th and 75th percentiles cut off a quarter from each end. (The median itself can be considered the 50th percentile.) As the example shows, the 25th and 75th percentiles are equal to 3 and 8, respectively.
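
The same figures can be reproduced with Python's `statistics` module. The list is the 40 values of the example written as value-by-count; `statistics.quantiles` with `n=4` returns the three quartile cut points using the module's default interpolation method, which is one of several percentile conventions.

```python
import statistics

# The 40 ordered values from the example, written as value * count.
data = ([1] * 6 + [2] * 3 + [3] * 2 + [4] * 2 + [5] * 4 +
        [6] * 2 + [7] * 6 + [8] * 6 + [9] * 3 + [10] * 6)

median = statistics.median(data)
q1, q2, q3 = statistics.quantiles(data, n=4)  # 25th, 50th and 75th percentiles

print(len(data), median, q1, q3)
```

Other percentile conventions can give slightly different values on other data sets; here the neighbouring values around each cut point are equal, so the result does not depend on the interpolation.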

Both discrete (point) and continuous (interval) statistical distributions are used.

For clarity, statistical distributions are shown graphically in the form of a frequency polygon or a histogram.

Frequency polygon - a polyline whose segments connect the points with coordinates $(x_1, m_1), (x_2, m_2), \ldots$, or, for the polygon of relative frequencies, with coordinates $(x_1, p^*_1), (x_2, p^*_2), \ldots$ (Fig. 1).

[Fig. 1: frequency polygon, m (or $m_i / n$) versus x. Fig. 2: frequency histogram, f(x) versus x]

Frequency histogram - a set of adjacent rectangles built on one straight line (Fig. 2); the bases of the rectangles are the same and equal to dx, and the heights are equal to the ratio of the frequency to dx, or of $p^*$ to dx (probability density).

Example:

x, kg 2.7 2.8 2.9 3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 4.0 4.1 4.2 4.3 4.4
m

Frequency polygon

The ratio of the relative frequency to the width of the interval is called the probability density: $f(x) = m_i / (n \cdot dx) = p^*_i / dx$

Example of plotting a histogram .

Let's use the data from the previous example.

1. Calculation of the number of class intervals

where n is the number of observations. In our case n = 100. Hence:

2. Calculation of the interval width dx :

,

3. Drawing up an interval series:

dx     2.7-2.9  2.9-3.1  3.1-3.3  3.3-3.5  3.5-3.7  3.7-3.9  3.9-4.1  4.1-4.3  4.3-4.5
m      6        15       25       17       11       12       8        5        1
f (x)  0.3      0.75     1.25     0.85     0.55     0.6      0.4      0.25     0.05
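
The relation between frequencies and histogram heights, $f(x) = m_i / (n \cdot dx)$, can be checked against the f(x) row of the interval series. The sketch below takes n = 100 and dx = 0.2 from the example and inverts the formula to obtain the frequency of each interval; the variable names are illustrative.

```python
n = 100      # number of observations (from the example)
dx = 0.2     # interval width, read off the interval series

# f(x) values listed for the nine intervals 2.7-2.9, ..., 4.3-4.5
density = [0.3, 0.75, 1.25, 0.85, 0.55, 0.6, 0.4, 0.25, 0.05]

# Invert f = m / (n * dx) to get the frequency m of each interval.
freq = [round(f * n * dx) for f in density]
print(freq, sum(freq))

# Forward direction: heights of the histogram rectangles from the frequencies.
heights = [m / (n * dx) for m in freq]

# The total area of the rectangles is the sum of relative frequencies.
area = sum(h * dx for h in heights)
print(round(area, 6))
```

That the recovered frequencies add up to n, and the rectangle areas to 1, is a useful consistency check on any hand-built interval series.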

Histogram

Methods of mathematical statistics are used, as a rule, at all stages of the analysis of research materials: for choosing a strategy for solving problems on the basis of specific sample data and for evaluating the results. In this study they were used to process the material. Mathematical processing makes it possible to identify and evaluate the quantitative parameters of objective information clearly and to analyze and present them in various ratios and dependencies. These methods make it possible to determine the measure of variation in collected materials that contain quantitative information about a set of cases, some of which confirm the supposed relationships while others do not; to calculate the reliability of quantitative differences between the selected sets of cases; and to obtain other mathematical characteristics needed for the correct interpretation of the facts. The reliability of the differences obtained in the study was determined by Student's t-test.

The following values were calculated.

1. The arithmetic mean of the sample.

It characterizes the average value of the considered population. Let $x_1, x_2, \ldots, x_n$ denote the results of the measurements. Then:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

where Σ is the sum of all values as the running index i changes from 1 to n.

2. The standard deviation, characterizing the dispersion of the considered population relative to the arithmetic mean.

$$\sigma = \frac{x_{max} - x_{min}}{k}$$

where σ is the standard deviation;

$x_{max}$ is the maximum value in the table;

$x_{min}$ is the minimum value in the table;

k is a coefficient.

3. Standard error of the arithmetic mean, or error of representativeness (m). The standard error of the arithmetic mean characterizes the degree of deviation of the sample arithmetic mean from the arithmetic mean of the general population.

The standard error of the arithmetic mean is calculated by the formula:

$$m = \frac{\sigma}{\sqrt{n}}$$

where σ is the standard deviation of the measurement results,

n is the sample size. The smaller m, the higher the stability of the results.

4. Student's criterion.

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{m_1^2 + m_2^2}}$$

(in the numerator - the difference between the means of the two groups, in the denominator - the square root of the sum of the squares of the standard errors of these means).
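
The criterion described in words above can be sketched as follows. Two assumptions differ from the text and are made only for illustration: the standard deviation is estimated with `statistics.stdev` rather than from the range and a tabulated coefficient, and the absolute value of the difference is used so that t is non-negative. The scores are hypothetical and are not the study's data.

```python
import math
import statistics

def standard_error(xs):
    """Standard error of the mean, sigma / sqrt(n), with sigma estimated from xs."""
    return statistics.stdev(xs) / math.sqrt(len(xs))

def student_t(a, b):
    """|mean_a - mean_b| / sqrt(m_a^2 + m_b^2), m being the standard errors."""
    diff = abs(statistics.fmean(a) - statistics.fmean(b))
    return diff / math.sqrt(standard_error(a) ** 2 + standard_error(b) ** 2)

# Hypothetical scores for two groups of 10 students.
control = [3, 4, 4, 3, 4, 4, 3, 4, 4, 4]
experimental = [5, 5, 4, 5, 5, 5, 4, 5, 5, 5]

t = student_t(control, experimental)
print(round(t, 2))
```

The computed t is then compared with the critical value for the chosen significance level and the number of degrees of freedom; a value above the critical one is read as a statistically significant difference between the group means.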

The results of the study were processed on a computer using the Excel package.

Organization of research

We carried out the study according to generally accepted rules, in 3 stages.

At the first stage, material on the research problem was collected and analyzed, and the subject of the research was formulated. The analysis of the literature at this stage made it possible to specify the purpose and objectives of the study. The initial testing of the 30-meter running technique was carried out.

At the third stage, the material obtained as a result of the research was systematized, and all available information on the research problem was generalized.

The experimental study was carried out at the State Educational Institution "Lyakhovichi Secondary School"; in total, the sample consisted of 20 sixth-grade students (11-12 years old).

Chapter 3. Analysis of research results

As a result of the pedagogical experiment, we identified the initial level of the 30 m running technique among students in the control and experimental groups (Appendices 1-2). Statistical processing of the results obtained made it possible to obtain the following data (table 6).

Table 6. Initial level of running quality

As can be seen from Table 6, the average scores of the athletes in the control and experimental groups do not differ statistically: in the experimental group the average score was 3.6 points, and in the control group it was 3.7 points. For the t-test between the groups, t_emp = 0.3 (P > 0.05) against t_crit = 2.1. The results of the initial testing showed that the differences in the indicators do not depend on training and are random in nature. According to the initial testing, the running quality indicators in the control group were slightly higher than those in the experimental group, but there were no statistically significant differences between the groups, which indicates that the students in the control and experimental groups were equivalent in the technique of running 30 m.

During the experiment in both groups, the indicators that characterize the effectiveness of the running technique improved. However, this improvement was different in different groups of participants in the experiment. As a result of training, a regular small increase in indicators was revealed in the control group (3.8 points). As can be seen from Appendix 2, a large increase in indicators was revealed in the experimental group. The students studied according to the program we proposed, which significantly improved the indicators.

Table 7. Changes in running quality among the subjects of the experimental group

During the experiment, we found that the increased loads in the experimental group gave greater improvements in the development of quickness than in the control group.

In adolescence, it is advisable to develop speed through the predominant use of physical education tools aimed at increasing the frequency of movements. At the age of 12-15, speed abilities increase, as a result of the use of mainly speed-strength and strength exercises, which we used in the process of conducting physical culture lessons and extracurricular activities in the sports section of basketball and athletics.

During the lessons in the experimental group, a strict sequence of increasing complexity and accumulation of motor experience was followed. Errors were corrected in a timely manner. As the analysis of the actual data showed, the experimental teaching method produced a significant change in the quality of the running technique (t_emp = 2.4). The analysis of the results obtained in the experimental group and their comparison with the data obtained in the control group, which used the generally accepted teaching methodology, give grounds to assert that the proposed methodology will increase the effectiveness of teaching.

Thus, at the stage of improving the 30 m running technique at school, we identified the dynamics of the test indicators in the experimental and control groups. After the experiment, the quality of the technique in the experimental group increased to 4.9 points (t = 3.3; P < 0.05). By the end of the experiment, the quality of running technique in the experimental group was higher than in the control group.

