• 7. Basic hardware configuration of a personal computer. System unit: concepts, types. The internal structure of the system unit.
  • 8. Motherboard of a computer: concept, purpose, characteristics, logic circuits.
  • 9. The structure and main characteristics of the processor as the main microcircuit of the computer. Communication of the processor with other devices. Components of the main line of the computer.
  • 10. Internal computer memory: RAM and cache memory, ROM chip and bios system, non-volatile memory cmos. External storage media and devices.
  • 11. Design, principle of operation, basic parameters of the hard disk.
  • 1. Data transfer protocol.
  • 12. Classification of input and output devices, ports of the computer for connecting peripheral devices.
  • 13. Types and basic user characteristics of modern monitors.
  • 14. Printers: concept, purpose, types, principles of work.
  • 15. Keyboard: groups of keys, assignment of keys.
  • 16. Types, principle of operation, adjustable parameters of the mouse. Additional computer devices: modem, TV tuner, sound card.
  • 17. Concept and structure of personal computer software.
  • 18. Purpose, types, leading functions of the PC operating system. The main components of the operating system: kernel, interface, device drivers.
  • 19. Concept and types of files. The file structure of the computer. Maintenance of the file structure of a personal computer.
  • 20. Applied software: concept, meaning, structure, types, programs.
  • 21. Purpose and types of programming languages. Components of the programming system.
  • 22. Purpose and classification of service software.
  • 23. Computer virus. Signs of a viral infection.
  • 24. Classification of viruses.
  • 25. Types of antivirus programs. Measures to protect computers from viruses.
  • 26. The concept of archiving. Information compression methods and formats. Basic ideas of the RLE, Lempel-Ziv and Huffman algorithms.
  • 27. Database. Classification. Database models. Advantages and disadvantages.
  • 28. DBMS: concept, types. Basic principles of creation.
  • 29. Automated workstation of a medical specialist. Purpose, basic requirements and development principles.
  • 30. The set of tasks solved with the help of automated workstations (AWS) and the main directions of their use by medical personnel.
  • 31. Structural components and functional modules of automated workstations of medical workers. Classification of automated workplaces for employees of medical organizations.
  • 32. Knowledge as the basis for the functioning of expert systems. Concept, properties and types of knowledge.
  • 33. Expert system: concept, purpose and structural components. The main stages of the development of an expert system
  • 34. Basic functions of expert systems and requirements for the operation of medical expert systems.
  • 35. Modes of functioning and types of modern expert systems. Expert system and specialist: comparative advantages and disadvantages
  • 36. The concept of a computer network. Basic requirements for modern computer networks
  • 37. The main components of a computer network
  • 38. Classification of computer networks. Topology of computer networks: types, advantages and disadvantages.
  • 39. Global Internet. History of creation. General characteristics of the Internet. Packet switching principle
  • 40. Internet protocols. Network capabilities. The World Wide Web. The HTML language.
  • 41. Telemedicine, tasks of telemedicine. The history of development. The main directions of telemedicine
  • 42. Subject, goals and objectives of medical informatics. Types of medical information
  • 43. Classification of medical information systems (MIS). Tasks of MIS
  • 44. Information technology. Information Systems
  • 45. Types of technological medical information systems. MIS development levels
  • 46. The history of the development of computers. Generations of computers. The current stage of development of computing technology and its prospects
  • 47. Mathematical statistics, its methods. The main stages of statistical work.
  • 48. General population and sample. Sampling methods
  • 49. Variational series and its visual representation. Building a histogram (algorithm)
  • 50. Characteristics of the statistical distribution: characteristics of the position; shape characteristics; scattering characteristics.
  • 51. Estimation of the parameters of the general population. Point and interval estimation. Confidence interval. Significance level
  • 52. Analysis of variance. Factor grading and analysis. The simplest scheme of variation with differences in one factor
  • 53. Analysis of variance. Working formula for calculating mean squares
  • 54. Calculation of the F-criterion to determine the influence of the factor under study. Quantification of the influence of individual factors.
  • 55. The concept of correlation. Functional and correlation dependence. Scatter plots.
  • 56. Coefficient of correlation and its properties.
  • 57. Regression analysis. Linear regression
  • 58. Series of dynamics. The concept of a time series. Types of series. Defining a trend
  • 59. Time series alignment: moving average method
  • 60. Time series alignment: method of least squares
  • 61. Time series alignment: period lengthening method
  • 62. Analysis of time series. Chronological average. Absolute growth. Growth rate
  • 63. Analysis of time series. Chronological average. Growth rate. Rate of increase
  • 47. Mathematical statistics, its methods. The main stages of statistical work.

    Mathematical statistics is a scientific discipline, the subject of which is the development of methods for registration, description and analysis of statistical experimental data obtained as a result of observations of mass random phenomena.

The main tasks of mathematical statistics are:

      determination of the distribution law of a random variable or a system of random variables;

      testing the plausibility of hypotheses;

      determination of unknown distribution parameters.

All methods of mathematical statistics rest on probability theory. However, owing to the specific nature of the problems it solves, mathematical statistics has separated from probability theory into an independent field. In probability theory, the model of a phenomenon is considered given, and the possible real course of this phenomenon is calculated (Fig. 1); in mathematical statistics, a suitable probabilistic model is selected on the basis of statistical data (Fig. 2).

    Fig. 1. General Problem of Probability Theory

    Fig. 2. General problem of mathematical statistics

    As a scientific discipline, mathematical statistics developed along with the theory of probability. The mathematical apparatus of this science was built in the second half of the 19th century.

    The main stages of statistical work.

    Any statistical study consists of 3 main stages:

collection: mass, scientifically organized observation through which primary information is obtained about the individual facts (units) of the phenomenon under study. This statistical recording of a large number, or of all, of the units comprising the studied phenomenon serves as the information base for statistical generalizations and for formulating conclusions about the studied phenomenon or process;

grouping and summary: the distribution of the set of facts (units) into homogeneous groups and subgroups, the final counts for each group and subgroup, and the presentation of the results in the form of a statistical table;

processing and analysis. Statistical analysis concludes a statistical study. It includes the processing of the statistical data obtained during the summary and the interpretation of the results, in order to reach objective conclusions about the state of the studied phenomenon and the patterns of its development.

    48. General population and sample. Sampling methods

General population: the totality of all objects (units) about which the researcher intends to draw conclusions when studying a specific problem.

The general population consists of all objects that are subject to study. The structure of the general population depends on the objectives of the study. Sometimes the general population is the entire population of a certain region (for example, when the attitude of potential voters to a candidate is studied); more often, several criteria are set that define the object of research, for example: men aged 30-50 who use a certain brand of razor at least once a week and have an income of at least $100 per family member.

Sample, or sample population: a set of cases (subjects, objects, events, specimens) selected from the general population by a certain procedure to participate in the study.

    Sample characteristics:

Qualitative characteristics of the sample: whom exactly we select and what methods of sample construction we use for this;

      Quantitative characteristics of the sample - how many cases we select, in other words, the sample size.

    Need for sampling

The research object is very extensive. For example, the consumers of a global company's products constitute a huge number of geographically dispersed markets.

      There is a need to collect primary information.

    Sample size

Sample size: the number of cases included in the sample. For statistical reasons, it is recommended that the number of cases be at least 30-35.

    Basic sampling methods

Sampling rests primarily on knowledge of the sampling frame: the list of all units of the population from which the sampling units are selected. For example, if we take all car service workshops in Moscow as the population, we must have a list of such workshops, regarded as the frame within which the sample is formed.

The sampling frame inevitably contains an error, called the sampling frame error, which characterizes the degree of its deviation from the true size of the population. Obviously, no complete official list of all car service workshops in Moscow exists. The researcher should inform the client about the size of the sampling frame error.

When forming the sample, probability (random) and non-probability (non-random) methods are used.

If all sampling units have a known chance (probability) of being included in the sample, the sample is called a probability sample. If this probability is unknown, the sample is called a non-probability sample. Unfortunately, in most marketing research, because the size of the population cannot be determined precisely, it is not possible to calculate the probabilities precisely. Therefore, the term "known probability" rests on the use of specific sampling techniques rather than on knowledge of the exact size of the population.

Probability methods include:

      simple random selection;

      systematic selection;

      cluster selection;

      stratified selection.

Non-probability methods:

      selection based on the principle of convenience;

      selection based on judgments;

      sampling during the survey;

      sampling based on quotas.

The meaning of selection based on the principle of convenience is that the sampling is carried out in the way most convenient for the researcher, for example from the point of view of minimum expenditure of time and effort or of the availability of respondents. The choice of the place of research and the composition of the sample is made subjectively; for example, customers are surveyed in the store closest to the researcher's place of residence. Obviously, many members of the population take no part in such a survey.

    The formation of a sample on the basis of judgment is based on the use of the opinion of qualified specialists, experts regarding the composition of the sample. This approach is often used to form a focus group.

Sampling in the course of the survey is based on expanding the number of respondents through referrals from respondents who have already taken part in the survey. Initially the researcher forms a sample much smaller than the study requires; it then expands as the survey proceeds.

The formation of a sample based on quotas (quota selection) presupposes a preliminary determination, based on the objectives of the study, of the number of groups of respondents meeting certain requirements (attributes). For example, for the purposes of a study it was decided that fifty men and fifty women should be interviewed in a department store. The interviewer conducts the survey until the set quota is filled.
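The probability methods listed above translate directly into code. Here is a minimal sketch in Python (the frame of 200 numbered units and the two strata are hypothetical, used only for illustration):

```python
import random

# Hypothetical sampling frame: a list of 200 numbered population units
frame = list(range(1, 201))

# Simple random selection: every unit has the same chance of inclusion
simple = random.sample(frame, 20)

# Systematic selection: every k-th unit after a random start
k = len(frame) // 20
start = random.randrange(k)
systematic = frame[start::k][:20]

# Stratified selection: draw the same share from each stratum
strata = {"men": list(range(1, 101)), "women": list(range(101, 201))}
stratified = [u for units in strata.values() for u in random.sample(units, 10)]

print(simple, systematic, stratified, sep="\n")
```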

    Methods of mathematical statistics


    1. Introduction

    Mathematical statistics is a science that develops methods for obtaining, describing and processing experimental data in order to study the patterns of random mass phenomena.

In mathematical statistics, two areas can be distinguished: descriptive statistics and inductive statistics (statistical inference). Descriptive statistics is concerned with the accumulation, systematization and presentation of experimental data in a convenient form. Inductive statistics uses these data to draw conclusions about the objects from which the data were collected, or to estimate their parameters.

    Typical areas of mathematical statistics are:

    1) sampling theory;

    2) theory of estimates;

    3) testing statistical hypotheses;

    4) regression analysis;

    5) analysis of variance.

Mathematical statistics is based on a number of basic concepts without which it is impossible to master modern methods of processing experimental data. Among the first of them are the concepts of the general population and the sample.

In mass industrial production it is often necessary to establish, without checking every manufactured product, whether the product quality meets the standards. Since the number of manufactured products is very large, or since testing renders a product unusable, only a small number of products are checked, and on the basis of this check a conclusion must be made about the entire batch. Of course, one cannot say that all transistors from a batch of 1 million pieces are good or bad by checking one of them. On the other hand, since the sampling process and the testing itself can be time-consuming and costly, the scope of product verification should be such that it provides a reliable representation of the entire batch while being of minimum size. For this purpose, we introduce a number of concepts.

The whole set of studied objects or experimental data is called the general population. We denote by N the number of objects, or the amount of data, that make up the general population; N is called the volume of the general population. If N >> 1, that is, N is very large, then N = ∞ is usually assumed.

    A random sample or simply a sample is a part of the general population, randomly selected from it. The word "at random" means that the probabilities of choosing any object from the general population are the same. This is an important assumption, however, it is often difficult to test it in practice.

The sample size is the number of objects, or the amount of data, that make up the sample; it is denoted by n. In what follows we assume that the elements of the sample can be assigned numerical values x1, x2, …, xn. For example, in the quality control of manufactured bipolar transistors this may be the measurement of their DC current gain.


    2. Numerical characteristics of the sample

    2.1 Sample mean

For a specific sample of size n, its sample mean is determined by the relation

x̄ = (1/n)·(x1 + x2 + … + xn),

where xi are the values of the sample elements. Usually one wants to describe the statistical properties of random samples in general, and not of one particular sample. This means that a mathematical model is considered which assumes a sufficiently large number of samples of size n. In this case the sample elements are treated as random variables Xi taking the values xi with probability density f(x), which is the probability density of the general population. Then the sample mean is also a random variable,

X̄ = (1/n)·(X1 + X2 + … + Xn).

As before, we will denote random variables by capital letters and the values of random variables by lowercase letters.

The average value of the general population from which the sample is drawn will be called the general mean and denoted by m_x. It can be expected that if the sample size is significant, the sample mean will not differ significantly from the general mean. Since the sample mean is a random variable, its mathematical expectation can be found:

E[X̄] = (1/n)·(E[X1] + E[X2] + … + E[Xn]) = (1/n)·n·m_x = m_x.

Thus, the mathematical expectation of the sample mean is equal to the general mean. In this case the sample mean is said to be an unbiased estimate of the general mean; we will return to this term later. Since the sample mean is a random variable fluctuating around the general mean, it is desirable to estimate this fluctuation using the variance of the sample mean. Consider a sample whose size n is significantly less than the size of the general population N (n << N). Suppose that forming the sample does not change the characteristics of the general population, which is equivalent to assuming N = ∞. Then

D[X̄] = D[(1/n)·(X1 + X2 + … + Xn)] = (1/n²)·Σ D[Xi] + (1/n²)·Σ(i≠j) cov(Xi, Xj).

The random variables Xi and Xj (i ≠ j) can be considered independent; therefore cov(Xi, Xj) = 0. Substituting this result into the variance formula, we obtain

D[X̄] = (1/n²)·n·σ² = σ²/n,

where σ² is the variance of the general population.

It follows from this formula that with an increase in the sample size, fluctuations of the sample mean around the general mean decrease as σ²/n. Let us illustrate this with an example. Let there be a random signal with mathematical expectation m_x = 10 and variance σ² = 9.

The signal samples are taken at equidistant times t1, t2, …, tn.

[Figure: a realization of the signal X(t) with samples marked at the times t1, t2, …, tn.]

Since the samples are random variables, we will denote them by X(t1), X(t2), …, X(tn).

Let us determine the number of samples so that the standard deviation of the estimate of the mathematical expectation of the signal does not exceed 1% of its mathematical expectation. Since m_x = 10, it is necessary that

σ/√n ≤ 0.1.

On the other hand, σ = √9 = 3; therefore 3/√n ≤ 0.1, or √n ≥ 30. From this we obtain that n ≥ 900 samples.
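This result is easy to verify numerically. A sketch (assuming, for illustration only, that the signal samples are normally distributed; the text specifies only m_x and σ²):

```python
import random
import statistics

m_x, sigma, n = 10, 3, 900

# Average n = 900 samples, repeat the experiment many times,
# and measure the spread of the resulting sample means
means = [
    statistics.fmean(random.gauss(m_x, sigma) for _ in range(n))
    for _ in range(2000)
]
print(statistics.stdev(means))  # close to sigma / sqrt(n) = 0.1
```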

    2.2 Sample variance

For sample data, it is important to know not only the sample mean, but also the spread of the sample values around the sample mean. If the sample mean is an estimate of the general mean, then the sample variance should be an estimate of the general variance. The sample variance for a sample consisting of random variables is defined as follows:

S² = (1/n)·Σ (Xi − X̄)².

Using this representation of the sample variance, we find its mathematical expectation:

E[S²] = ((n − 1)/n)·σ²,

so the sample variance with divisor n is a biased estimate of the general variance.
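This bias is why statistics libraries distinguish the divisor n from the divisor n − 1. A small sketch with illustrative data:

```python
import statistics

data = [10.2, 9.5, 11.1, 10.8, 9.9, 10.4]  # illustrative sample
n = len(data)
mean = statistics.fmean(data)

biased = sum((x - mean) ** 2 for x in data) / n  # S^2, divisor n
unbiased = statistics.variance(data)             # divisor n - 1
print(biased, unbiased, biased * n / (n - 1))    # the last equals `unbiased`
```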



    Methods of mathematical statistics

      Introduction.

      Basic concepts of mathematical statistics.

      Statistical processing of the results of psychological and pedagogical research.

      References.


        Introduction.

    The application of mathematics to other sciences makes sense only in conjunction with a deep theory of a specific phenomenon. It is important to remember this in order not to get lost in a simple game of formulas, behind which there is no real content.

Academician Yu.A. Mitropolsky

    Theoretical research methods in psychology and pedagogy make it possible to reveal the qualitative characteristics of the studied phenomena. These characteristics will be fuller and deeper if the accumulated empirical material is subjected to quantitative processing. However, the problem of quantitative measurements in the framework of psychological and pedagogical research is very complex. This complexity lies primarily in the subjective-causal diversity of pedagogical activity and its results, in the very object of measurement, which is in a state of continuous movement and change. At the same time, the introduction of quantitative indicators into the study today is a necessary and obligatory component of obtaining objective data on the results of pedagogical work. As a rule, these data can be obtained both by direct or indirect measurement of various components of the pedagogical process, and by quantitative assessment of the corresponding parameters of its adequately constructed mathematical model. For this purpose, in the study of the problems of psychology and pedagogy, methods of mathematical statistics are used. With their help, various tasks are solved: processing factual material, obtaining new, additional data, substantiating the scientific organization of the research, and others.

    2. Basic concepts of mathematical statistics

An extremely important role in the analysis of many psychological and pedagogical phenomena is played by average values, which are a generalized characteristic of a qualitatively homogeneous population according to a certain quantitative criterion. It is impossible, for example, to calculate the "average specialty" or the "average nationality" of university students, since these are qualitatively heterogeneous phenomena. But it is possible and necessary to determine, on average, a numerical characteristic of their academic performance (average grade), of the effectiveness of methodological systems and techniques, and so on.

In psychological and pedagogical research, various types of averages are used: the arithmetic mean, the geometric mean, the median, the mode and others. The most common are the arithmetic mean, the median and the mode.

    The arithmetic mean is used in cases where there is a directly proportional relationship between the defining property and the given attribute (for example, with an improvement in the performance of a study group, the performance of each of its members improves).

The arithmetic mean is the quotient of the sum of the quantities divided by their number and is calculated by the formula:

X̄ = (X1 + X2 + X3 + … + Xn)/n = ΣXi/n,    (1)

where X̄ is the arithmetic mean; X1, X2, X3, …, Xn are the results of individual observations (techniques, actions); n is the number of observations (techniques, actions); ΣXi is the sum of the results of all observations (techniques, actions).

Median (Me) is a measure of central position characterizing the value of a feature on an ordered (ascending or descending) scale that corresponds to the middle of the studied population. The median can be determined for ordinal and quantitative features. The location of this value is determined by the formula: Location of the median = (n + 1)/2.

For example, the study found that:

- 5 of the participants in the experiment study with "excellent" marks;

- 18 people study with "good" marks;

- 22 people study with "satisfactory" marks;

- 6 people study with "unsatisfactory" marks.

Since N = 54 people took part in the experiment, the middle of the sample is (54 + 1)/2 = 27.5. Hence it is concluded that more than half of the students study below the grade "good"; that is, the median is more than "satisfactory" but less than "good" (see figure).

Mode (Mo) is the most frequently occurring typical value of a feature among the other values. It corresponds to the class with the maximum frequency; this value is called the modal value.

    For example.

Suppose that, to the questionnaire item "indicate your degree of proficiency in a foreign language", the answers were distributed as follows:

    1 - speak fluently - 25

    2 - I speak enough to communicate - 54

    3 - I know how, but I have difficulty communicating - 253

    4 - I hardly understand - 173

    5 - don't speak - 28

Obviously, the most typical answer here is "I know how, but I have difficulty communicating", and this is the modal value; its frequency is 253.
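All three averages can be computed with Python's standard library. A sketch using the questionnaire example above (frequencies expanded into individual answers):

```python
import statistics

# Answers 1..5 with the frequencies from the questionnaire example
freq = {1: 25, 2: 54, 3: 253, 4: 173, 5: 28}
answers = [value for value, count in freq.items() for _ in range(count)]

print(statistics.fmean(answers))   # arithmetic mean
print(statistics.median(answers))  # median of the ordered answers
print(statistics.mode(answers))    # mode: 3, the most frequent answer
```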

    When using mathematical methods in psychological and pedagogical research, great importance is attached to the calculation of variance and root-mean-square (standard) deviations.

The variance is equal to the mean square of the deviations of the values of the variants from the mean. It serves as a characteristic of the scatter of the individual values of the studied variable (for example, students' grades) around the mean. The variance is calculated by determining: the deviation from the mean; the square of that deviation; the sum of the squared deviations; and the mean of the squared deviations (see Table 6.1).

    The variance value is used in various statistical calculations, but is not directly observable. The quantity directly related to the content of the observed variable is the standard deviation.

Table 6.1

Variance calculation example (columns: indicator value; deviation from the mean, e.g. 2 − 3 = −1; squared deviation).

The mean square deviation confirms the typicality and representativeness of the arithmetic mean; it reflects the measure of fluctuation of the numerical values of the features from which the average is derived. It is equal to the square root of the variance and is determined by the formula:

σ = √( Σ(Xi − X̄)² / N ),    (2)

where σ is the root mean square (standard) deviation. With a small number of observations (actions), fewer than 100, "N − 1" rather than "N" should be put in the formula.

The arithmetic mean and the root mean square deviation are the main characteristics of the results obtained in the course of a study. They make it possible to generalize the data, compare them, and establish the advantages of one psychological and pedagogical system (program) over another.

    The root mean square (standard) deviation is widely used as a measure of dispersion for various characteristics.

    When evaluating the research results, it is important to determine the dispersion of a random variable around the mean. This scattering is described using Gauss's law (the law of the normal distribution of the probability of a random variable). The essence of the law is that when measuring a certain feature in a given set of elements, there are always deviations in both directions from the norm due to a variety of uncontrollable reasons, and the larger the deviations, the less often they occur.

Further processing of the data may reveal: the coefficient of variation (stability) of the phenomenon under study, which is the standard deviation expressed as a percentage of the arithmetic mean; the measure of skewness, showing in which direction the predominant number of deviations is directed; and the measure of kurtosis, which shows the degree to which the values of the random variable cluster around the mean, etc. All these statistics help to reveal more fully the characteristics of the phenomena under study.

Measures of association between variables. Relationships (dependencies) between two or more variables are called correlation in statistics. Correlation is estimated by means of the correlation coefficient, which is a measure of the degree and magnitude of the relationship.

    There are many correlation coefficients. Let's consider only a part of them that take into account the presence of a linear relationship between variables. Their choice depends on the scales of measurement of the variables, the relationship between which needs to be assessed. The most often used in psychology and pedagogy are the Pearson and Spearman coefficients.

    Let's consider the calculation of the values \u200b\u200bof the correlation coefficients using specific examples.

Example 1. Let two comparable variables X (marital status) and Y (expulsion from the university) be measured on a dichotomous scale (a special case of the nominal scale). To determine the relationship, we use the Pearson coefficient.

In cases where there is no need to calculate the frequency of occurrence of the different values of the variables X and Y, it is convenient to calculate the correlation coefficient from a contingency table (see Tables 6.2, 6.3, 6.4) showing the number of joint occurrences of pairs of values of the two variables (features): A is the number of cases when X equals zero while Y equals one; B is the number of cases when X and Y both equal one; C is the number of cases when X and Y both equal zero; D is the number of cases when X equals one while Y equals zero.

    Table 6.2

General contingency table (cells A, B, C, D as defined above).

In general, the formula for the Pearson correlation coefficient for dichotomous data has the form

r = (B·C − A·D) / √((A + B)·(C + D)·(A + C)·(B + D)).

    Table 6.3

    Sample data on a dichotomous scale

    Let's substitute the data from the contingency table (see Table 6.4) corresponding to the considered example into the formula:

Thus, the Pearson correlation coefficient for the selected example is 0.32; that is, the relationship between students' marital status and the facts of expulsion from the university is insignificant.
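A sketch of this computation in Python (the cell counts are hypothetical, since Table 6.4 is not reproduced here; they do not recreate the 0.32 of the example):

```python
from math import sqrt

# Hypothetical contingency-table cells:
# A: X=0, Y=1;  B: X=1, Y=1;  C: X=0, Y=0;  D: X=1, Y=0
A, B, C, D = 10, 20, 26, 14

r = (B * C - A * D) / sqrt((A + B) * (C + D) * (A + C) * (B + D))
print(round(r, 2))
```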

Example 2. If both variables are measured on ordinal scales, then Spearman's rank correlation coefficient (Rs) is used as the measure of association. It is calculated by the formula

Rs = 1 − 6·ΣDi² / (N·(N² − 1)),

    where Rs is Spearman's rank correlation coefficient; Di is the difference in the ranks of the compared objects; N is the number of compared objects.

The value of Spearman's coefficient varies from −1 to +1. In the first case, there is an unambiguous but oppositely directed relationship between the analyzed variables (as the values of one increase, the values of the other decrease). In the second, as the values of one variable grow, the values of the second variable increase proportionally. If Rs is equal to zero or close to it, there is no significant relationship between the variables.

    As an example of calculating the Spearman coefficient, we use the data from table 6.5.

Table 6.5

Data and intermediate results of calculating the rank correlation coefficient Rs (columns: qualities; ranks assigned by the experts; rank difference Di; squared rank difference Di²).

The sum of the squares of the rank differences is ΣDi² = 22.

Let us substitute the example data into the formula for the Spearman coefficient:

    The calculation results allow us to assert the presence of a sufficiently pronounced relationship between the variables under consideration.
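The same calculation in Python (the two rank rows are hypothetical, chosen so that the sum of squared rank differences equals the 22 of Table 6.5; the number of objects in the original table is not preserved, so the resulting Rs is illustrative):

```python
def spearman_rs(ranks_x, ranks_y):
    """Spearman rank correlation: Rs = 1 - 6*sum(Di^2) / (N*(N^2 - 1))."""
    n = len(ranks_x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical ranks assigned by two experts to six qualities
expert_1 = [1, 2, 3, 4, 5, 6]
expert_2 = [5, 2, 1, 3, 4, 6]  # sum of squared differences = 22
print(spearman_rs(expert_1, expert_2))  # 1 - 6*22/210 ≈ 0.37
```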

Statistical testing of a scientific hypothesis. The proof of the statistical reliability of an experimental effect differs significantly from proof in mathematics and formal logic, where conclusions are universal: statistical proofs are not so strict and final, and there is always a risk of error in the conclusions. Therefore statistical methods do not conclusively prove the legitimacy of a particular conclusion; rather, they show the measure of the likelihood of accepting a particular hypothesis.

A pedagogical hypothesis (a scientific assumption about the advantage of a particular method, etc.) is, in the course of statistical analysis, translated into the language of statistics and formulated anew as at least two statistical hypotheses. The first (main) one is called the null hypothesis (H0), in which the researcher states his starting position: he declares (a priori) that the new method (proposed by him, his colleagues or opponents) has no advantages, and therefore from the very beginning the researcher is psychologically ready to take an honest scientific position: the differences between the new and the old methods are declared equal to zero. In the alternative hypothesis (H1) an assumption is made about the advantage of the new method. Sometimes several alternative hypotheses are put forward, with appropriate designations.

For example, a hypothesis about the advantage of the old method (H2). Alternative hypotheses are accepted if and only if the null hypothesis is refuted. This happens when the differences, say, between the arithmetic means of the experimental and control groups are so significant (statistically significant) that the risk of error in rejecting the null hypothesis and accepting the alternative does not exceed one of the three accepted significance levels of statistical inference:

- the first level: 5% (in scientific texts one writes p = 5% or α ≤ 0.05 when presented in fractions), where the risk of error in the conclusion is allowed in five cases out of a hundred theoretically possible similar experiments with strictly random selection of subjects for each experiment;

- the second level: 1%, i.e. the risk of making a mistake is allowed only in one case out of a hundred (α ≤ 0.01, with the same requirements);

- the third level: 0.1%, i.e. the risk of making a mistake is allowed only in one case out of a thousand (α ≤ 0.001). The last level of significance makes very high demands on substantiating the reliability of experimental results and is therefore rarely used.

    When comparing the arithmetic mean of the experimental and control groups, it is important not only to determine which mean is greater, but also how much greater. The smaller the difference between them, the more acceptable the null hypothesis of the absence of statistically significant (reliable) differences will be. Unlike thinking at the level of everyday consciousness, which is inclined to perceive the difference in means obtained as a result of experience as a fact and a basis for inference, a teacher-researcher familiar with the logic of statistical inference will not rush in such cases. He will most likely make an assumption about the randomness of the differences, put forward a null hypothesis about the absence of significant differences in the results of the experimental and control groups, and only after refuting the null hypothesis will he accept the alternative.

Thus, the issue of differences within scientific thinking is transferred to another plane. The point is not only in the differences (they almost always exist), but in their magnitude, and hence in determining the boundary after which one can say: yes, the differences are not accidental, they are statistically significant, which means that after the experiment the subjects of these two groups belong no longer to one (as before), but to two different general populations, and that the level of preparedness of students potentially belonging to these populations will differ significantly. In order to show the boundaries of these differences, so-called estimates of general parameters are used.

Let us look at a specific example (see Table 6.6) of how mathematical statistics can be used to refute or confirm the null hypothesis.

For example, it is necessary to determine whether the effectiveness of students' group activities depends on the level of development of interpersonal relations in the study group. As the null hypothesis it is suggested that no such dependence exists; as the alternative, that the dependence exists. For these purposes, the performance results in two groups are compared, one of which acts as the experimental group and the other as the control group. To determine whether the difference between the average values of the performance indicators in the first and second groups is significant, it is necessary to calculate the statistical significance of this difference. For this, Student's t-test can be used. It is calculated by the formula:

t = (X̄1 − X̄2) / √(M1² + M2²),

where X̄1 and X̄2 are the arithmetic means of the variables in groups 1 and 2; M1 and M2 are the mean errors, which are calculated by the formula:

M = σ / √N,

where σ is the mean square (standard) deviation, calculated by formula (2).

    Let us determine the errors for the first row (experimental group) and the second row (control group):

We find the value of the t-criterion by the formula above.

Having calculated the value of the t-criterion, one must determine the level of statistical significance of the differences between the average performance indicators in the experimental and control groups using a special table. The higher the value of the t-criterion, the higher the significance of the differences.

For this, the calculated t is compared with the tabular t. The tabular value is chosen taking into account the selected confidence level (p = 0.05 or p = 0.01), and also depending on the number of degrees of freedom, which is found by the formula:

U = N1 + N2 − 2,

where U is the number of degrees of freedom; N1 and N2 are the numbers of measurements in the first and second rows. In our example, U = 7 + 7 − 2 = 12.

Table 6.6

Data and intermediate results of calculating the significance of statistical differences in mean values (columns: performance values in the experimental group and in the control group).

From the table for the t-criterion we find that t_table = 3.055 for the one-percent level (p ≤ 0.01).
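A sketch of the whole procedure in Python (the seven performance scores per group are hypothetical, matching only the group sizes N1 = N2 = 7 of the example):

```python
from math import sqrt
from statistics import fmean, stdev

# Hypothetical performance scores: 7 measurements per group, so U = 7 + 7 - 2 = 12
experimental = [4.2, 4.5, 3.9, 4.8, 4.1, 4.6, 4.4]
control = [3.8, 3.5, 4.0, 3.6, 3.9, 3.4, 3.7]

def mean_error(row):
    # M = sigma / sqrt(N); stdev() already uses the N - 1 divisor
    return stdev(row) / sqrt(len(row))

m1, m2 = mean_error(experimental), mean_error(control)
t = (fmean(experimental) - fmean(control)) / sqrt(m1 ** 2 + m2 ** 2)
print(t)  # compare with the tabular value 3.055 for p <= 0.01, U = 12
```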

    However, the teacher-researcher should remember that the existence of the statistical significance of the difference in mean values \u200b\u200bis an important, but not the only, argument in favor of the presence or absence of a relationship (dependence) between phenomena or variables. Therefore, it is necessary to involve other arguments for a quantitative or substantive substantiation of a possible connection.

    Multivariate data analysis methods. The analysis of the relationship between a large number of variables is carried out using multivariate methods of statistical processing. The purpose of using such methods is to make the hidden patterns visible, to highlight the most significant relationships between variables. Examples of such multivariate statistical methods are:

      - factor analysis;

      - cluster analysis;

      - analysis of variance;

      - regression analysis;

      - latent structural analysis;

      - multidimensional scaling and others.

Factor analysis consists in identifying and interpreting factors. A factor is a generalized variable that makes it possible to collapse part of the information, that is, to present it in a convenient form. For example, the factorial theory of personality identifies a number of generalized characteristics of behavior, which in this case are called personality traits.

Cluster analysis makes it possible to identify the leading feature and the hierarchy of relationships between features.

Analysis of variance is a statistical method used to study the influence of one or more simultaneously acting independent variables on the variability of an observed trait. Its peculiarity is that the observed trait can only be quantitative, while the explanatory features can be both quantitative and qualitative.

Regression analysis makes it possible to identify the quantitative (numerical) dependence of the average value of changes in a resultant attribute (the explained variable) on changes in one or more attributes (explanatory variables). As a rule, this type of analysis is used when it is required to find out how much the average value of one attribute changes when another attribute changes by one unit.

    Latent structural analysis represents a set of analytical and statistical procedures for identifying hidden variables (features), as well as the internal structure of relationships between them. It makes it possible to investigate the manifestations of complex interrelationships of directly unobservable characteristics of socio-psychological and pedagogical phenomena. Latent analysis can be the basis for modeling these relationships.

    Multidimensional scaling provides a visual assessment of the similarity or difference between some objects described by a wide variety of variables. These differences are presented as the distance between the evaluated objects in multidimensional space.

3. Statistical processing of the results of psychological and pedagogical research

In any study it is always important to ensure the mass character and representativeness of the objects of study. To resolve this issue, one usually resorts to mathematical methods for calculating the minimum number of objects (groups of respondents) to be studied, so that objective conclusions can be drawn.

    According to the degree of completeness of coverage of primary units, statistics divides studies into continuous ones, when all units of the phenomenon under study are studied, and selective, if only a part of the population of interest is studied, taken according to some criterion. The researcher does not always have the opportunity to study the entire set of phenomena, although this should always be strived for (there is not enough time, funds, necessary conditions, etc.); on the other hand, often a continuous study is simply not required, since the conclusions will be quite accurate after studying a certain part of the primary units.

The theoretical basis of the selective method of research is the theory of probability and the law of large numbers. To decide how many facts and observations a study needs, a table of sufficiently large numbers is used. In this case, the researcher is required to set the desired probability and the permissible error. Suppose, for example, that the permissible error of the conclusions drawn from the observations, in comparison with theoretical assumptions, should not exceed 0.05 in either the positive or the negative direction (in other words, we may be mistaken in no more than 5 cases out of 100). Then, according to the table of sufficiently large numbers (see Table 6.7), a correct conclusion can be made in 9 cases out of 10 when the number of observations is at least 270, and in 99 cases out of 100 with at least 663 observations, etc. This means that as the accuracy and probability with which we intend to draw conclusions increase, so does the required number of observations. However, in psychological and pedagogical research it should not be excessively large: 300-500 observations are often quite enough for solid conclusions.

    This method of determining the sample size is the simplest. Mathematical statistics also has more complex methods for calculating the required sample sets, which are described in detail in the special literature.

    However, compliance with the requirements of mass character does not yet ensure the reliability of conclusions. They will be reliable when the units selected for observation (conversations, experiment, etc.) are sufficiently representative for the studied class of phenomena.

Table 6.7

A short table of sufficiently large numbers (columns: magnitude of probability; permissible error; required number of observations).

The representativeness of the observation units is ensured primarily by their random selection using tables of random numbers. Suppose it is required to select 20 training groups out of the available 200 for a mass experiment. To do this, a numbered list of all the groups is drawn up. Then 20 numbers are written out from the table of random numbers, starting from any entry, at a certain interval. These 20 random numbers identify, by their list numbers, the groups the researcher needs. Random selection of objects from the general population gives grounds to assert that the results obtained from studying the sample set of units will not differ sharply from those that would have been obtained by studying the entire set of units.

    In the practice of psychological and pedagogical research, not only simple random selections are used, but also more complex selection methods: stratified random selection, multi-stage selection, etc.

Mathematical and statistical research methods are also means of obtaining new factual material. For this purpose, techniques are used that increase the informative capacity of the questionnaire, as well as scaling, which makes it possible to assess more accurately the actions of both the researcher and the subjects.

    The scales arose due to the need to objectively and accurately diagnose and measure the intensity of certain psychological and pedagogical phenomena. Scaling makes it possible to order the phenomena, to quantify each of them, to determine the lower and higher stages of the phenomenon under study.

    So, when studying the cognitive interests of listeners, you can set their boundaries: very high interest - very weak interest. Introduce a number of steps between these boundaries that create a scale of cognitive interests: very great interest (1); great interest (2); medium (3); weak (4); very weak (5).

Scales of different types are used in psychological and pedagogical research, for example:

a) a three-step scale:

Very active ………………… 10
Active ………………………… 5
Passive ……………………… 0

b) a multi-step scale:

Very active ………………… 8
Intermediate ………………… 6
Not too active ……………… 4
Passive ……………………… 2
Completely passive ………… 0

c) a two-sided scale:

Very interested ……………… 10
Interested enough …………… 5
Indifferent …………………… 0
Not interested ………………… 5
Not interested at all ………… 10

Numerical rating scales give each item a specific numerical designation. Thus, when analyzing students' attitude to learning, their perseverance in work, their readiness to cooperate, etc., one can draw up a numerical scale based on the following indicators: 1 - unsatisfactory; 2 - weak; 3 - medium; 4 - above average; 5 - much above average. In this case, the scale takes the following form (see Table 6.8):

    Table 6.8

    If the numeric scale is bipolar, the bipolar ordering is used with a zero value in the center:

    Discipline Indiscipline

    Pronounced 5 4 3 2 1 0 1 2 3 4 5 Not pronounced

    Grading scales can be plotted graphically. In this case, they express categories in a visual form. Moreover, each division (step) of the scale is characterized verbally.

The considered methods play an important role in the analysis and generalization of the data obtained. They make it possible to establish various connections and correlations between facts and to identify trends in the development of psychological and pedagogical phenomena. Thus, the grouping theory of mathematical statistics helps to determine which facts from the collected empirical material are comparable, on what basis to group them correctly, and what degree of reliability they will have. All this makes it possible to avoid arbitrary manipulation of facts and to define a program for their processing. Depending on the goals and objectives, three types of groupings are usually used: typological, variational and analytical.

Typological grouping is used when it is necessary to break down the obtained factual material into qualitatively homogeneous units (the distribution of the number of discipline violations among different categories of students, the breakdown of their physical exercise performance indicators by year of study, etc.).

If it is necessary to group the material according to the value of some changing (varying) attribute (a breakdown of student groups by level of academic performance, by percentage of completed assignments, by violations of the same type, etc.), variation grouping is applied, which makes it possible to judge the structure of the phenomenon under study consistently.

Analytical grouping helps to establish the relationship between the studied phenomena (the dependence of students' degree of preparation on different teaching methods, of the quality of completed tasks on temperament, abilities, etc.), their interdependence and mutual conditioning, with exact calculation.

    The importance of the researcher's work in grouping the collected data is evidenced by the fact that errors in this work devalue the most comprehensive and meaningful information.

    Currently, the mathematical foundations of grouping, typology, classification have received the most profound development in sociology. Modern approaches and methods of typology and classification in sociological research can be successfully applied in psychology and pedagogy.

    In the course of the study, techniques are used for the final generalization of data. One of them is the technique of drawing up and studying tables.

    When compiling a summary of data on one statistical quantity, a distribution series (variation series) of the value of this quantity is formed. An example of such a series (see Table 6.9) is a summary of data on the chest circumference of 500 persons.

    Table 6.9

Summarizing data on two or more statistical quantities simultaneously involves compiling a distribution table that reveals the distribution of the values of one statistical quantity in accordance with the values taken by the other quantities.

    As an illustration, table 6.10 is given, compiled on the basis of statistics on chest circumference and weight of these people.

Table 6.10

Distribution of chest circumference (in cm) by weight; the cells contain frequencies.

The distribution table gives an idea of the relationship existing between the two quantities, namely: with low weight, the frequencies are located in the upper left quarter of the table, which indicates the predominance of persons with a small chest circumference. As weight increases toward the mean value, the frequency distribution moves to the center of the table: people of near-average weight tend to have a chest circumference that is also close to the average. With a further increase in weight, the frequencies begin to occupy the lower right quarter of the table: a person weighing more than average tends to have a chest circumference that is also above average.

    It follows from the table that the established relationship is not strict (functional), but probabilistic, when, with changes in the values \u200b\u200bof one quantity, the other changes as a trend, without a rigid unambiguous relationship. Similar connections and dependencies are often found in psychology and pedagogy. Currently, they are usually expressed using correlation and regression analysis.

Variation series and tables give an idea of the statics of a phenomenon, while its dynamics can be shown by development series, in which the first line contains successive stages or time intervals, and the second the values of the studied statistical quantity obtained at these stages. This is how the increase, decrease or periodic changes of the studied phenomenon are revealed, and its tendencies and patterns are brought out.

Tables can be filled with absolute values or with summary figures (averages, relative values). The results of statistical work are, in addition to tables, often depicted graphically in the form of diagrams, figures, etc. The main methods of graphing statistical quantities are the method of points, the method of lines and the method of rectangles. They are simple and accessible to every researcher. The technique of their use consists in drawing coordinate axes, establishing a scale, and plotting the designated segments (points) on the horizontal and vertical axes.

    Diagrams depicting the series of distribution of values \u200b\u200bof one statistical quantity allow plotting distribution curves.

    The graphical representation of two (or more) statistical quantities makes it possible to form a certain curved surface, called the distribution surface. A series of development in graphic design form development curves.

The graphic representation of statistical material allows one to penetrate deeper into the meaning of the numerical values, to grasp their interdependencies and those features of the phenomenon under study that are difficult to notice in a table. The researcher is freed from the work he would otherwise have to do to deal with the abundance of numbers.

Tables and graphs are important, but they are only the first steps in the study of statistical quantities. The main method is analytical, operating with mathematical formulas, with the help of which so-called "generalizing indicators" are derived, that is, absolute values reduced to a comparable form (relative and average values, balances and indices). Thus, with the help of relative values (percentages), the qualitative features of the analyzed aggregates are determined (for example, the ratio of excellent students to the total number of students; the ratio of the number of errors in operating complex equipment caused by students' mental instability to the total number of errors, etc.). That is, relationships are revealed: of a part to the whole (specific weight), of terms to their sum (structure of the aggregate), of one part of the aggregate to another part; relationships characterizing the dynamics of changes over time, and so on.

    As you can see, even the most general understanding of the methods of statistical calculus suggests that these methods have great capabilities in the analysis and processing of empirical material. Of course, the mathematical apparatus can dispassionately process everything that a researcher puts into it, both reliable data and subjective conjectures. That is why perfect knowledge of the mathematical apparatus for processing the accumulated empirical material in unity with a thorough knowledge of the qualitative characteristics of the phenomenon under study is necessary for every researcher. Only in this case is it possible to select high-quality, objective factual material, its qualified processing and obtain reliable final data.

This is a brief description of the most frequently used methods of studying problems in psychology and pedagogy. It should be emphasized that none of the methods considered, taken by itself, can claim universality or fully guarantee the objectivity of the data obtained. Thus, elements of subjectivity in answers obtained by interviewing respondents are obvious. Observation results, as a rule, are not free from the researcher's own subjective assessments. Data taken from various documents require verification of the reliability of those documents (especially personal documents, second-hand documents, etc.).

    Therefore, each researcher should strive, on the one hand, to improve the technique of applying any specific method, and on the other, to a comprehensive, mutually controlling use of different methods to study the same problem. Possession of the entire system of methods makes it possible to develop a rational research methodology, clearly organize and conduct it, and obtain significant theoretical and practical results.

      References.

1. Shevandrin N.I. Social psychology in education: Textbook. Part 1. Conceptual and applied foundations of social psychology. Moscow: VLADOS, 1995.

2. Davydov V.P. Fundamentals of methodology, methodology and technology of pedagogical research: Scientific and methodological manual. Moscow: Academy of the FSB, 1997.

Mathematical statistics is a branch of mathematics that studies approximate methods of collecting and analyzing data based on the results of an experiment in order to identify existing patterns, i.e. to find the laws of distribution of random variables and their numerical characteristics.

    In mathematical statistics, it is customary to distinguish two main areas of research:

    1. Estimation of the parameters of the general population.

    2. Testing statistical hypotheses (some a priori assumptions).

    The basic concepts of mathematical statistics are: general population, sample, theoretical distribution function.

    The general population is the collection of all conceivable observations of a random variable.

    X_G = {x_1, x_2, x_3, ..., x_N} = {x_i; i = 1, ..., N}

    The observed random variable X is called a feature, or sampling factor. The general population is the statistical analogue of a random variable; its volume N is usually large, so a part of the data is selected from it, called the sample population or simply the sample.

    X_B = {x_1, x_2, x_3, ..., x_n} = {x_i; i = 1, ..., n}

    X_B ⊂ X_G, n ≤ N

    A sample is a set of observations (objects) randomly selected from the general population for direct study. The number of objects in the sample is called the sample size and is denoted by n. Typically the sample is 5-10% of the general population.

    Using a sample to establish the patterns governing the observed random variable makes it possible to avoid continuous (mass) observation of the population, which is often resource-intensive, if not simply impossible.

    For example, a population may be a large set of individuals. Studying the entire population is laborious and expensive, so data are collected from a sample of individuals considered representative of the population, which allows conclusions to be drawn about the population as a whole.

    However, the sample must satisfy the condition of representativeness, i.e. give a well-founded picture of the general population. How is a representative sample formed? Ideally, one aims for a random (randomized) sample: a list of all individuals in the population is compiled and individuals are selected from it at random. But sometimes the cost of compiling such a list is unacceptable, and then an accessible (convenience) sample is taken, for example one clinic or hospital, and all patients with the given disease in that clinic are examined.

    Each distinct value in the sample is called a variant. The number of times a variant is repeated in the sample is called its frequency of occurrence n_i. The ratio of the absolute frequency of a variant to the sample size, ω_i = n_i / n, is called its relative frequency. A sequence of variants written in ascending order is called a variation series.


    Consider three forms of a variation series: ranked, discrete, and interval.

    A ranked series is a list of the individual units of the population in ascending order of the trait under study.

    A discrete variation series is a table consisting of two columns (or rows): the specific values of the feature x_i and the absolute frequencies n_i (or relative frequencies ω_i) of each i-th feature value.

    An example of a variation series is given in the table.

    Write down the distribution of relative frequencies.

    Solution: we find the relative frequencies by dividing each frequency by the sample size.

    The distribution of relative frequencies is as follows:

    0.15, 0.5, 0.35

    Control: 0.15 + 0.5 + 0.35 = 1.
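    For illustration, a minimal Python sketch of this calculation. The absolute frequencies 3, 10 and 7 (sample size n = 20) are an assumption chosen to reproduce the relative frequencies 0.15, 0.5 and 0.35 above, since the original table is not shown:

frequencies = [3, 10, 7]                 # assumed absolute frequencies n_i
n = sum(frequencies)                     # sample size, here 20
relative = [f / n for f in frequencies]
print(relative)                          # [0.15, 0.5, 0.35]
# Control: the relative frequencies must sum to 1
assert abs(sum(relative) - 1.0) < 1e-9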

    A discrete series can be displayed graphically. In a rectangular Cartesian coordinate system, the points with coordinates (x_i, n_i) or (x_i, ω_i) are marked and connected by straight line segments. The resulting broken line is called a frequency polygon.

    Construct a discrete variation series (DVS) and draw a polygon for the distribution of 45 applicants according to the number of points they received on entrance exams:

    39 41 40 42 41 40 42 44 40 43 42 41 43 39 42 41 42 39 41 37 43 41 38 43 42 41 40 41 38 44 40 39 41 40 42 40 41 42 40 43 38 39 41 41 42.

    Solution: to construct the variation series, we arrange the distinct values of the attribute x (the variants) in ascending order and write the frequency of each under it:

    x_i: 37 38 39 40 41 42 43 44
    n_i:  1  3  5  8 12  9  5  2

    Let's construct a polygon of this distribution:

    Figure 13.1. Frequency polygon
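    The same construction can be checked programmatically; a short Python sketch (matplotlib is assumed to be available for the drawing):

from collections import Counter
import matplotlib.pyplot as plt

scores = [39, 41, 40, 42, 41, 40, 42, 44, 40, 43, 42, 41, 43, 39, 42,
          41, 42, 39, 41, 37, 43, 41, 38, 43, 42, 41, 40, 41, 38, 44,
          40, 39, 41, 40, 42, 40, 41, 42, 40, 43, 38, 39, 41, 41, 42]

# Discrete variation series: distinct variants in ascending order with frequencies
series = sorted(Counter(scores).items())
print(series)   # [(37, 1), (38, 3), (39, 5), (40, 8), (41, 12), (42, 9), (43, 5), (44, 2)]

# Frequency polygon: the points (x_i, n_i) connected by straight lines
xs, ns = zip(*series)
plt.plot(xs, ns, marker="o")
plt.xlabel("points x_i")
plt.ylabel("frequency n_i")
plt.show()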

    An interval variation series is used when the number of observations is large. To build such a series, one must choose the number of feature intervals and set the interval width; the more groups there are, the smaller the interval becomes. The number of groups k in a variation series can be found using the Sturges formula k = 1 + 3.322·lg n (k is the number of groups, n is the sample size), and the interval width is

    h = R / k,

    where R = x_max − x_min is the difference between the maximum and the minimum value of the variants, called the range of variation.

    A sample of 100 people from the totality of all students of a medical university is investigated.

    Solution: we calculate the number of groups by the Sturges formula: k = 1 + 3.322·lg 100 ≈ 7.6. Thus, to compile an interval series it is better to divide this sample into 7 or 8 groups. The set of groups into which the observation results are divided, together with the frequency of results falling into each group, is called the statistical distribution of the sample.
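    A quick Python check of the group count, a sketch using the Sturges formula given above:

import math

n = 100                           # sample size from the example
k = 1 + 3.322 * math.log10(n)     # Sturges formula: k = 1 + 3.322 lg n
print(round(k, 2))                # 7.64, so 7 or 8 groups
# Interval width: h = R / k, where R = x_max - x_min is the range of variation
# (the individual trait values are not given in this example, so h is not computed)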

    To visualize the statistical distribution, use a histogram.

    A frequency histogram is a stepped figure consisting of adjacent rectangles built on one straight line; the bases of the rectangles are equal to the interval width, and the height of each is equal either to the frequency of observations falling into the interval or to the relative frequency ω_i.

    Observations of the number of particles registered by a Geiger counter per minute gave the following results:

    21 30 39 31 42 34 36 30 28 30 33 24 31 40 31 33 31 27 31 45 31 34 27 30 48 30 28 30 33 46 43 30 33 28 31 27 31 36 51 34 31 36 34 37 28 30 39 31 42 37.

    Construct from these data an interval variation series with equal intervals (I interval 20-24; II interval 24-28, etc.) and draw a histogram.

    Solution: n = 50.

    The histogram of this distribution looks like:

    Figure 13.2. Distribution histogram
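    A Python sketch of this grouping (numpy and matplotlib are assumed to be available; note that which interval a boundary value such as 24 falls into is a convention: numpy.histogram uses half-open intervals [a, b), closing only the last one):

import numpy as np
import matplotlib.pyplot as plt

particles = [21, 30, 39, 31, 42, 34, 36, 30, 28, 30, 33, 24, 31, 40, 31,
             33, 31, 27, 31, 45, 31, 34, 27, 30, 48, 30, 28, 30, 33, 46,
             43, 30, 33, 28, 31, 27, 31, 36, 51, 34, 31, 36, 34, 37, 28,
             30, 39, 31, 42, 37]

edges = np.arange(20, 53, 4)               # intervals 20-24, 24-28, ..., 48-52
freq, _ = np.histogram(particles, bins=edges)
print(list(freq))                          # interval frequencies, sum = 50

plt.bar(edges[:-1], freq, width=4, align="edge", edgecolor="black")
plt.xlabel("particles per minute")
plt.ylabel("frequency n_i")
plt.show()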

    Task variants

    № 13.1. The mains voltage was measured every hour; the following values were obtained (V):

    227 219 215 230 232 223 220 222 218 219 222 221 227 226 226 209 211 215 218 220 216 220 220 221 225 224 212 217 219 220.

    Build a statistical distribution and draw a polygon.

    № 13.2. Observations of blood sugar in 50 people gave the following results:

    3.94 3.84 3.86 4.06 3.67 3.97 3.76 3.61 3.96 4.04

    3.82 3.94 3.98 3.57 3.87 4.07 3.99 3.69 3.76 3.71

    3.81 3.71 4.16 3.76 4.00 3.46 4.08 3.88 4.01 3.93

    3.92 3.89 4.02 4.17 3.72 4.09 3.78 4.02 3.73 3.52

    3.91 3.62 4.18 4.26 4.03 4.14 3.72 4.33 3.82 4.03

    Construct from these data an interval variation series with equal intervals (I: 3.45-3.55; II: 3.55-3.65, etc.), depict it graphically and draw a histogram.

    № 13.3. Construct a frequency polygon of the distribution of the erythrocyte sedimentation rate (ESR) in 100 people.

    Let us consider some concepts and the basic approaches to the classification of errors. According to the method of calculation, errors can be divided into absolute and relative.

    The absolute error is equal to the difference between the mean measured value x̄ and the true value μ of this quantity:

    Δx = x̄ − μ.

    In some cases, if necessary, the errors of single determinations are calculated:

    Δx_i = x_i − μ.

    Note that the measured value in chemical analysis can be either the content of a component or an analytical signal. Depending on whether the analysis result overestimates or underestimates the true value, errors can be positive or negative.

    The relative error can be expressed in fractions or in percent and usually has no sign:

    δ = Δx / μ,   or   δ = (Δx / μ)·100 %.
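    A tiny Python sketch of these two definitions (the numbers are hypothetical, chosen only for illustration):

x_true = 78.00                     # hypothetical true value (mu)
x_mean = 78.05                     # hypothetical mean analysis result
abs_error = x_mean - x_true        # absolute error, keeps its sign: +0.05
rel_error = abs(abs_error) / x_true * 100   # relative error in percent, no sign
print(abs_error, rel_error)        # 0.05 and about 0.064 %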

    Errors can also be classified according to their source. Since there are many sources of errors, such a classification cannot be unambiguous.

    Most often, errors are classified according to the nature of the causes producing them. In this case, errors are divided into systematic and random; misses (or gross errors) are also distinguished.

    Systematic errors are those caused by a constant source; they are constant in all measurements or vary according to a fixed law, and they can be identified and eliminated.

    Random errors, whose causes are unknown, can be estimated by the methods of mathematical statistics.

    A miss is an error that sharply distorts the analysis result; it is usually easy to detect and is typically caused by the analyst's negligence or incompetence. Fig. 4.2.1 is a diagram explaining the concepts of systematic and random errors and misses. Line 1 corresponds to the ideal case, when all N determinations are free of systematic and random errors. Lines 2 and 3 are also idealized examples of chemical analysis: in one case (line 2) random errors are completely absent, but all N determinations carry a constant negative systematic error Δx; in the other (line 3) there is no systematic error at all. The real situation is reflected by line 4: both random and systematic errors are present.

    Figure 4.2.1. Systematic and random errors in chemical analysis

    The division of errors into systematic and random is to a certain extent arbitrary.

    Systematic errors in one sample of results can become random when a larger body of data is considered. For example, a systematic error caused by incorrect readings of one instrument becomes random when the analytical signal is measured on different instruments in different laboratories.

    Reproducibility characterizes the degree of closeness of single determinations to one another, the scatter of single results about the mean (Fig. 4.2.2).

    Figure 4.2.2. Reproducibility and correctness of chemical analysis

    In some cases, along with the term "reproducibility", the term "convergence" is used. Convergence is then understood as the scatter of the results of parallel determinations, and reproducibility as the scatter of results obtained by different methods, in different laboratories, at different times, etc.

    Correctness is the quality of a chemical analysis that reflects the closeness of the systematic error to zero. Correctness characterizes the deviation of the obtained analysis result from the true value of the measured quantity (see Fig. 4.2.2).

    The general population is a hypothetical set of all conceivable results from −∞ to +∞.

    Analysis of experimental data shows that large errors are observed less often than small ones. It is also noted that, as the number of observations increases, equal errors of opposite signs occur equally often. These and other properties of random errors are described by the normal distribution, or Gauss equation, which gives the probability density

    f(x) = 1 / (σ·√(2π)) · exp(−(x − μ)² / (2σ²)),

    where x is the value of the random variable;

    μ is the general average (the expected value, a constant parameter);

    The expected value of a continuous random variable is the limit to which the sample mean tends as the sample size grows without bound. Thus, the mathematical expectation is the average value for the entire population as a whole; it is sometimes called the general average.

    σ² is the variance (a constant parameter), which characterizes the scatter of the random variable about its mathematical expectation;

    σ is the standard deviation.

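    For illustration, the Gauss density defined above can be evaluated directly; a minimal Python sketch:

import math

def normal_pdf(x, mu, sigma):
    # Probability density of the normal (Gauss) distribution
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

print(normal_pdf(0.0, 0.0, 1.0))   # maximum at x = mu: 0.3989...
print(normal_pdf(1.0, 0.0, 1.0) == normal_pdf(-1.0, 0.0, 1.0))  # symmetric: True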

    A sample population (sample) is the real number n of results at the researcher's disposal; in chemical analysis usually n = 3-10.

    The normal distribution law is unsuitable for processing a small number of measurements in a sample (usually 3-10), even if the population as a whole is distributed normally. For small samples, Student's distribution (the t-distribution) is used instead of the normal distribution; it connects the three main characteristics of a sample:

    The width of the confidence interval;

    The corresponding probability;

    Sample size.

    Before processing data by the methods of mathematical statistics, it is necessary to identify misses (gross errors) and exclude them from the results under consideration. One of the simplest methods of detecting misses is the Q-test, applicable when the number of measurements is n < 10:

    Q = |x_1 − x_2| / R,

    where R = x_max − x_min is the range of variation; x_1 is the suspiciously deviating value; x_2 is the single result closest in value to x_1.

    The obtained value is compared with the critical value Q_crit at a confidence level of P = 0.95. If Q > Q_crit, the suspect result is a miss and is discarded.
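    A minimal Python sketch of this test; the data are the first series from the worked example below, and the critical value 0.55 (n = 8, P = 0.95) is the tabulated one quoted there:

def q_test(values):
    # Q-test statistics for the two extreme results of a series (n < 10).
    # A result is a miss and is discarded if its Q exceeds the tabulated Q_crit.
    xs = sorted(values)
    r = xs[-1] - xs[0]              # range of variation R = x_max - x_min
    q_low = (xs[1] - xs[0]) / r     # suspect minimum vs its nearest neighbour
    q_high = (xs[-1] - xs[-2]) / r  # suspect maximum vs its nearest neighbour
    return q_low, q_high

series1 = [77.90, 77.92, 77.95, 77.99, 78.05, 78.07, 78.08, 78.10]
print(q_test(series1))              # (0.10, 0.10) -> both below Q_crit = 0.55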

    The main characteristics of the sample. For a sample of n results, the mean is calculated:

    x̄ = (x_1 + x_2 + ... + x_n) / n,

    as well as the variance, which characterizes the scatter of the results about the mean:

    S² = Σ(x_i − x̄)² / (n − 1).

    The variance in explicit form cannot be used to characterize the scatter of results quantitatively, since its dimension does not coincide with the dimension of the analysis result. To characterize the scatter, the standard deviation S = √S² is used.

    This value is also called the root-mean-square (or standard) deviation or the root-mean-square error of an individual result.

    The relative standard deviation, or coefficient of variation V, is calculated as the ratio

    V = S / x̄ (often expressed in percent).

    The variance of the arithmetic mean is calculated as

    S²_x̄ = S² / n,

    and the standard deviation of the mean as

    S_x̄ = S / √n.

    It should be noted that all quantities - variance, standard deviation and relative standard deviation, as well as the variance of the arithmetic mean and standard deviation of the arithmetic mean - characterize the reproducibility of the results of chemical analysis.
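    As a sketch, all of these characteristics can be computed in one pass in Python (here applied to the first series from the worked example below):

import math

def sample_stats(xs):
    n = len(xs)
    mean = sum(xs) / n                                   # x-bar
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)     # variance S^2
    s = math.sqrt(var)                                   # standard deviation S
    return {"mean": mean,
            "S2": var,
            "S": s,
            "V": s / mean,                # relative standard deviation
            "S2_mean": var / n,           # variance of the arithmetic mean
            "S_mean": s / math.sqrt(n)}   # standard deviation of the mean

print(sample_stats([77.90, 77.92, 77.95, 77.99, 78.05, 78.07, 78.08, 78.10]))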

    When processing small (n < 20) samples from a normally distributed general population, the t-distribution (i.e., the distribution of the normalized random variable) is used; it is characterized by the relation

    t_{p,f} = (x̄ − μ)·√n / S,

    where t_{p,f} is Student's coefficient for the number of degrees of freedom f = n − 1 and confidence level P = 0.95 (or significance level p = 0.05).

    The values of the t-distribution are given in tables; from them, for a sample of n results, the confidence interval of the measured value for a given confidence probability is calculated by the formula

    x̄ ± t_{p,f}·S / √n.

    The confidence interval characterizes both the reproducibility of the results of a chemical analysis and, if the true value of x is known, their correctness.
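    A sketch of the confidence-interval calculation in Python (scipy is assumed to be available for the tabulated t value; statistics.stdev uses the n − 1 denominator, as above):

import math
from statistics import mean, stdev
from scipy.stats import t

def confidence_interval(xs, p=0.95):
    n = len(xs)
    t_pf = t.ppf(1 - (1 - p) / 2, n - 1)     # two-sided Student coefficient, f = n - 1
    half = t_pf * stdev(xs) / math.sqrt(n)   # t * S / sqrt(n)
    return mean(xs) - half, mean(xs) + half

print(confidence_interval([77.90, 77.92, 77.95, 77.99, 78.05, 78.07, 78.08, 78.10]))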

    An example of completing control task No. 2

    Task

    In the analysis of air for nitrogen content by the chromatographic method, the following results were obtained in two series of experiments:

    Solution:

    We check the series for gross errors using the Q-test. To do this, we arrange each series in ascending order (from minimum to maximum):

    First series:

    77.90 < 77.92 < 77.95 < 77.99 < 78.05 < 78.07 < 78.08 < 78.10

    We check the extreme results of the series for gross errors. For the leftmost value:

    Q_1 = (77.92 − 77.90) / (78.10 − 77.90) = 0.10.

    The obtained value is compared with the tabulated value (Table 2 of the Appendix): for n = 8, p = 0.95, Q_tab = 0.55.

    Since Q_tab > Q_1, the leftmost value is not a miss.

    Checking the rightmost value:

    Q_2 = (78.10 − 78.08) / (78.10 − 77.90) = 0.10 < 0.55,

    so the rightmost value is not a miss either.

    We arrange the results of the second series in ascending order:

    78.02 < 78.08 < 78.13 < 78.14 < 78.16 < 78.20 < 78.23 < 78.26.

    We check the extreme results of this series for misses. The tabulated value is Q (n = 8, p = 0.95) = 0.55.

    For the leftmost value: Q = (78.08 − 78.02) / (78.26 − 78.02) = 0.25 < 0.55, so it is not a miss.

    For the rightmost value: Q = (78.26 − 78.23) / (78.26 − 78.02) = 0.125 < 0.55, so it is not a miss either.

    We subject the results of the experiments to statistical processing.

    We calculate the arithmetic mean of each series:

    x̄_1 = 78.01 for the first series;

    x̄_2 = 78.15 for the second series.

    The variance about the mean:

    S_1² = 0.0061 for the first series;

    S_2² = 0.0062 for the second series.

    The standard deviation:

    S_1 = 0.078 for the first series;

    S_2 = 0.079 for the second series.

    The standard deviation of the arithmetic mean: S_x̄ = S / √8 ≈ 0.028 for both series.

    For small (n < 20) samples from a normally distributed general population, the t-distribution should be used, i.e. Student's distribution with the number of degrees of freedom f = n − 1 and confidence probability p = 0.95.

    Using tables of the t-distribution, the confidence interval of the measured value is determined for a sample of n results at the given confidence probability. This interval is calculated as x̄ ± t_{p,f}·S / √n.

    Comparison of the variances and mean results of two samples.

    Comparison of two variances is carried out using the F-distribution (Fisher distribution). If we have two sample sets with variances S_1² and S_2² and numbers of degrees of freedom f_1 = n_1 − 1 and f_2 = n_2 − 1, respectively, then we calculate the value of F:

    F = S_1² / S_2².

    The numerator always contains the larger of the two compared sample variances. The result is compared with the tabulated value. If F > F_crit (at p = 0.95; f_1, f_2), the discrepancy between the variances is significant and the sample sets under consideration differ in reproducibility.

    If the discrepancy between the variances is insignificant, the means x̄_1 and x̄_2 of the two samples can be compared, i.e. one can find out whether there is a statistically significant difference between the analysis results. To solve this problem the t-distribution is used. First, the weighted average of the two variances is calculated:

    S̄² = (f_1·S_1² + f_2·S_2²) / (f_1 + f_2),

    then the weighted average standard deviation S̄ = √S̄²,

    and then the value of t:

    t_exp = (|x̄_1 − x̄_2| / S̄)·√(n_1·n_2 / (n_1 + n_2)).

    The value t_exp is compared with t_crit for the number of degrees of freedom f = f_1 + f_2 = n_1 + n_2 − 2 and confidence level p = 0.95. If t_exp > t_crit, the discrepancy between the means is significant and the samples do not belong to the same general population. If t_exp < t_crit, the discrepancy between the means is insignificant, i.e. the samples belong to the same general population, and hence the data of both series can be combined and treated as a single sample of n_1 + n_2 results.
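    A Python sketch of this comparison, applied to the two nitrogen series from the example above (only the F and t statistics are computed; F_crit and t_crit must still be taken from tables):

import math

series1 = [77.90, 77.92, 77.95, 77.99, 78.05, 78.07, 78.08, 78.10]
series2 = [78.02, 78.08, 78.13, 78.14, 78.16, 78.20, 78.23, 78.26]

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

s1, s2 = variance(series1), variance(series2)
F = max(s1, s2) / min(s1, s2)          # larger variance in the numerator
print("F =", round(F, 3))              # ~1.02, compare with F_crit(0.95; f1, f2)

n1, n2 = len(series1), len(series2)
f1, f2 = n1 - 1, n2 - 1
s_pooled = math.sqrt((f1 * s1 + f2 * s2) / (f1 + f2))   # weighted standard deviation
t_exp = abs(mean(series1) - mean(series2)) / s_pooled * math.sqrt(n1 * n2 / (n1 + n2))
print("t_exp =", round(t_exp, 2))      # ~3.71, compare with t_crit at f = n1 + n2 - 2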

    Control task number 2

    Analysis of air for the content of component X by the chromatographic method in two series gave the following results (Table 4.2.1).

    3. Determine whether the results of both samples belong to the same general population. Check using Student's t-test (p = 0.95; n = 8).

    Table 4.2.1. Initial data for control task No. 2

    Option No.

    Component

