The term "sample" has a double meaning. It denotes both the procedure for selecting elements of the object under study and the set of elements selected for direct examination.

The totality of all elements of the object of sociological research is called the general population. The part of the population selected for direct study is the sample population, often simply called the sample. A sample is representative if it reflects the structure, essential properties and characteristics of the general population, i.e. if it is a reduced model of that population.

Depending on the method of selecting units, a sample may be random or non-random. The varieties of random selection are simple random (mechanical) sampling, nested (cluster) sampling and stratified sampling.

The basis of a simple random (mechanical) sample is a list of all potential respondents who make up the general population. Each is assigned a serial number, which is written on a separate card. The required number of cards is then drawn at random from the full set, as in a lottery, and the respondents drawn make up the sample.
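The lottery procedure can be sketched in a few lines of Python. This is only an illustration: the function name, the population of 2,000 and the sample of 200 are assumptions chosen for the example, and the standard library's pseudo-random generator stands in for the drum of cards.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Draw serial numbers 1..population_size without replacement, like cards in a lottery."""
    if not 0 < sample_size <= population_size:
        raise ValueError("sample_size must be between 1 and population_size")
    rng = random.Random(seed)  # a fixed seed only makes the draw repeatable
    # random.sample never returns the same serial number twice.
    return sorted(rng.sample(range(1, population_size + 1), sample_size))

# Hypothetical figures: 2,000 numbered respondents, 200 cards drawn.
chosen = simple_random_sample(population_size=2000, sample_size=200, seed=42)
print(len(chosen), chosen[:5])
```

Every serial number has the same chance of being drawn, which is the defining property of this type of sample.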

Systematic selection is also used in this type of sample. Here respondents are selected at a fixed step, obtained by dividing the size of the general population by the size of the sample population. For example, if the general population is 2,000 people and the sample is 200, the step is 10, so every tenth member of the general population is included in the sample. If the general population is even larger, a table of random numbers is used to determine the sample population.
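The step calculation from this example can be sketched as follows. The random starting point inside the first interval is an assumption added for the illustration (the text itself only says that every tenth person is taken), and the function name is hypothetical.

```python
import random

def systematic_sample(population_size, sample_size, seed=None):
    """Select every k-th serial number, where k = population_size // sample_size."""
    step = population_size // sample_size
    if step < 1:
        raise ValueError("sample_size cannot exceed population_size")
    rng = random.Random(seed)
    start = rng.randint(1, step)  # assumed: random start within the first interval
    return [start + i * step for i in range(sample_size)]

# Figures from the text: 2,000 people in the population, 200 in the sample.
sample = systematic_sample(population_size=2000, sample_size=200, seed=1)
step = sample[1] - sample[0]
print(step, len(sample))
```

With these figures the step is 10 and the selected serial numbers never run past 2,000.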

In the practice of sociological research, nested (cluster) selection is quite common: the units selected are not individual respondents but groups of people (work collectives, brigades), each of which is then surveyed in full. The representativeness of a nested sample is ensured by making the groups as similar in composition as possible.

With a stratified sample, the general population is divided into strata (layers) that are as homogeneous as possible.

Within each stratum, simple random (mechanical) sampling is performed.
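A minimal sketch of stratified selection is given below. It assumes proportional allocation, i.e. each stratum contributes to the sample in proportion to its size; the text does not prescribe an allocation rule, and the strata names and sizes are hypothetical.

```python
import random

def stratified_sample(strata, sample_size, seed=None):
    """strata maps a stratum name to the list of its units.

    Assumption: proportional allocation. Rounding means the parts may not
    always add up exactly to sample_size.
    """
    rng = random.Random(seed)
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        k = round(sample_size * len(units) / total)
        sample[name] = rng.sample(units, k)  # simple random draw inside the stratum
    return sample

# Hypothetical population of 1,000 split into three strata.
strata = {
    "workers": [f"W{i}" for i in range(600)],
    "engineers": [f"E{i}" for i in range(300)],
    "managers": [f"M{i}" for i in range(100)],
}
picked = stratified_sample(strata, sample_size=100, seed=7)
sizes = {name: len(units) for name, units in picked.items()}
print(sizes)  # {'workers': 60, 'engineers': 30, 'managers': 10}
```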

Non-random sampling is based on a conscious and purposeful selection of sample units. It is represented by spontaneous and quota selections, as well as the "main array method".

Spontaneous selection is used mainly in pilot studies and involves the selection of the "first comer". An illustration of this method can be mail surveys of readers of periodicals or surveys of buyers who purchase a particular type of product. Since in this case it is difficult to assess the representativeness of the sample, the conclusions of the study apply only to the population surveyed.

The "snowball" method also belongs to spontaneous selection: some respondents are found on the recommendation of others. For example, 200 people must be interviewed on some issue, but the addresses of only ten are known; these ten point the researcher to further respondents, and the search continues until the required sample size is reached.

Quota selection requires information on a number of characteristics of the general population. For each characteristic, quotas (shares) are drawn up so that the sample reproduces the proportions found in the general population. Such selection takes into account, for example, the percentage of men and women, their age, education, occupation, marital status, ethnicity, territorial affiliation, etc.

The quota sample is purposefully formed by the interviewers in compliance with the quota parameters. When creating quotas, the main task for the interviewer is to ensure that the conditions for random selection are met, under which each element of the general population would have an equal chance of being included in the sample.

The main array method is convenient in pilot studies to clarify a control question. With this method, the sample comprises 60-70% of the general population.

In the formation of a sample population, an important role is played by the determination of its volume or number. The sample size is determined by the degree of homogeneity or heterogeneity of the general population, the number of features characterizing it. The more homogeneous the composition of the population, the smaller the sample size required.

The type of sample determines how its size is calculated: each type has its own formulas. As a rule, depending on the depth of the study and its goals and objectives, the sample size is 5-10% of the general population.

Learning goals

  1. Clearly distinguish between the concepts of census and sampling.
  2. Know the essence and sequence of the six stages implemented by researchers to obtain a sample population.
  3. Define the concept of "sampling frame".
  4. Explain the difference between probabilistic and deterministic sampling.
  5. Distinguish between fixed-size sampling and sequential sampling.
  6. Explain what deliberate sampling is and describe both its strengths and weaknesses.
  7. Define the concept of quota sampling.
  8. Explain what a parameter is in a selection procedure.
  9. Explain what a derived set is.
  10. Explain why the concept of sampling distribution is the most important concept of statistics.

So, the researcher has precisely defined the problem and settled on an appropriate research design and data collection tools for solving it. The next step in the research process is to select the elements to be examined. It is possible to examine every element of a given population; such a complete survey of the population is called a census. There is another possibility. A certain part of the population, a sample of elements of a large group, is subjected to statistical examination, and conclusions about the entire group are drawn from the data obtained on this subset. Whether the results obtained from sample data can be extended to the large group depends on the method by which the sample was taken. Much of this chapter will be devoted to how the sample should be drawn and why.

Census
A complete survey of the population.
Sample
A selected subset of the elements of a larger group of objects.

The concept of "population" or "collection" can refer not only to people, but also to firms operating in the manufacturing industry, to retailers or wholesalers, or even to completely inanimate objects, such as parts produced by the enterprise; this concept is defined as the whole set of elements that satisfy certain given conditions. These conditions uniquely define both the elements that belong to the target group and the elements that should be excluded from consideration.

A study that aims to determine the demographic profile of frozen pizza consumers should begin by identifying who should and should not be classified as such. Do people who have tried such pizza at least once belong to this category? Individuals who buy at least one pizza per month? Per week? Individuals who eat more than a certain minimum amount of pizza in a month? The researcher must be very precise in defining the target group. Care must also be taken to ensure that the sample is drawn from the target population and not from "some" population, which is what happens when the sampling frame is inadequate or incomplete. The sampling frame is the list of elements from which the actual sample will be drawn.

A researcher may prefer a sampling approach to a survey of the entire population for several reasons. First, a complete examination of a population, even a relatively small one, requires a great deal of money and time. Often, by the time the census is completed and the data are processed, the information is already out of date. In some cases a census is simply impossible. Suppose researchers set out to check whether the actual service life of incandescent lamps matches the rated one, which requires keeping the lamps on until they fail. If the entire stock of lamps is tested in this way, reliable data will be obtained, but there will be nothing left to sell.

Finally, to the great astonishment of beginners, the researcher may prefer sampling to a census for the sake of more accurate results. Censuses require a large staff, which increases the likelihood of non-sampling (bias) errors. This is one of the reasons why the US Census Bureau uses sample surveys to check the accuracy of various types of censuses. You read that right: sample surveys can be conducted to check the accuracy of census data.

Sample design steps

Figure 15.1 shows a six-step sequence that a researcher can follow when designing a sample. First of all, it is necessary to determine the target population, that is, the set of elements about which the researcher wants to know something.

For example, when studying children's preferences, researchers need to decide whether the target population will consist of only children, only parents, or both.

Aggregate (population)
A set of elements that satisfy certain given conditions.
Sampling frame (base)
The list of elements from which the selection will be made; may consist of territorial units, organizations, persons and other elements.

A certain company tested its electric "races" only on children. The children were completely enthralled. Parents reacted differently to the novelty. The moms didn't like the fact that the attraction didn't teach kids to be car friendly, and the dads didn't like the fact that the product was made like a toy.
The reverse situation is also possible. A firm launched a new food product with a nationwide advertising campaign that centered on a precocious child. The firm tested the effectiveness of the commercials only on mothers, who were thrilled. The children, on the other hand, found this precocious child, and with it the advertised product itself, disgusting. The product was discontinued [1].

The researcher must decide who or what the relevant population will consist of: individuals, families, firms, other organizations, credit card transactions, etc. In making such decisions, it is necessary to specify the elements that should be excluded from the population. The elements should also be given both a temporal and a geographic reference, which in some cases may be subject to additional conditions or restrictions. For example, if we are talking about individuals, the desired population may consist only of persons over 18 years of age, or only of women, or only of persons with at least a secondary education.

Determining the geographical boundaries of the target population can be a particular problem in international marketing research, since crossing borders increases the heterogeneity of the system under consideration. For example, the ratio of urban to rural areas can vary significantly from country to country. Territory also has a serious impact on the composition of the population within a single country. For example, the north of Chile has a concentrated, predominantly Indian population, while the southern regions of the country are inhabited mainly by descendants of Europeans.

Coverage (incidence)
The percentage of members of a population or group that meet the conditions for inclusion in the sample.

Generally speaking, the more simply the target population is defined, the higher its coverage (incidence) and the easier and cheaper the sampling procedure. Coverage (incidence) is the proportion of elements of a population or group, expressed as a percentage, that satisfy the conditions for inclusion in the sample. Coverage directly affects the time and money required to conduct a survey. If coverage is high (i.e., most of the population elements meet the one or more simple criteria used to identify potential respondents), the time and cost required to collect data are minimized. Conversely, as the number of criteria that potential respondents must meet increases, so do the costs in both money and time.
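The effect of coverage on survey effort can be made concrete with a rough calculation. The sketch assumes that every qualifying person who is contacted agrees to take part, so the expected number of contacts is simply the required number of interviews divided by the incidence; the target of 500 interviews is hypothetical, and the two incidence rates are the ones the text cites for motorcycling and recreational walking.

```python
import math

def contacts_needed(completed_interviews, incidence):
    """Expected number of people to screen, assuming every qualifying contact responds."""
    if not 0 < incidence <= 1:
        raise ValueError("incidence must be a proportion in (0, 1]")
    return math.ceil(completed_interviews / incidence)

# Incidence rates quoted in the text for Fig. 15.2; 500 interviews is an assumed target.
for group, incidence in [("motorcycling", 0.036), ("recreational walking", 0.274)]:
    print(group, contacts_needed(500, incidence))
# motorcycling 13889
# recreational walking 1825
```

Under these assumptions the low-incidence group requires more than seven times as many screening contacts for the same number of completed interviews.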

Figure 15.2 shows the proportion of the adult population involved in certain sports. The data in the figure indicate that it is much more difficult and expensive to survey people who go in for motorcycling (only 3.6% of all adults) than people who take regular recreational walks (27.4% of all adults). The main thing is that the researcher be precise in determining which elements should be included in the study population and which should be excluded from it. A clear statement of the purpose of the study makes this problem much easier to solve.

The second step in the sample selection process is to determine the sampling frame, which, as you already know, is the list of elements from which the sample will be drawn. Let the target population of a certain study be all families living in the Dallas area. At first glance, the Dallas telephone directory might seem a good and easily accessible sampling frame. On closer examination, however, it becomes obvious that the list of families in the directory is not entirely correct: some families' numbers are omitted (and, of course, families without telephones are not included at all), while some families have several telephone numbers. People who have recently moved and, accordingly, changed their telephone number are also missing from the directory.

Experienced researchers find that an exact match between the sampling frame and the target population of interest is very rare. One of the most creative steps in designing a sample is determining an appropriate sampling frame when listing the population members is difficult. For example, when random dialing is used because of shortcomings in telephone directories, this may require sampling from working blocks and prefixes. However, the significant increase in the number of working blocks over the past 10 years has made this task more difficult. Similar situations can arise when territorial zones or organizations are sampled first and subsamples are then taken from them, as when the target population consists of individuals but no exact, up-to-date list of them exists.

Source: based on data contained in "SSI-LITe™: Low Incidence Targeted Sampling" (Fairfield, Conn.: Survey Sampling, Inc., 1994).

The third step in the sampling procedure is closely related to the determination of the sampling frame. The choice of sampling method or procedure depends largely on the sampling frame adopted by the researcher. Different types of samples require different types of sampling frames. This and the next chapter will give an overview of the main types of samples used in marketing research. When describing them, the connection between the sampling frame and the method of its formation should become obvious.

The fourth step in the sampling procedure is to determine the sample size. This problem is discussed in Chap. 17. At the fifth step, the researcher needs to actually select the elements that will be surveyed. The method used for this is determined by the sample type chosen; when discussing sampling methods, we will also talk about the selection of sample elements. Finally, at the sixth step, the researcher needs to actually survey the selected respondents. At this stage there is a high probability of committing a number of errors. These problems and some methods for resolving them are discussed in Chap. 18.

Types of sampling plans

All sampling methods can be divided into two categories: probability sampling and deterministic sampling. In a probability sample, each member of the population can be included with a specified non-zero probability. The probability may differ from one member to another, but for each element it is known. This probability is determined by the mechanical procedure used to select the sample members.

For deterministic samples, the probability of including any given element in the sample cannot be estimated, so the representativeness of such a sample cannot be guaranteed. For example, Allstate Corporation was developing a system to process the claims data of its 14 million client households. The company planned to use these data to find patterns in demand for its services, such as the likelihood that a household that owns a Mercedes-Benz will also own a vacation home (which will require insurance). Although the database is very large, the company has no means of estimating the likelihood that any particular customer will make a claim. It therefore cannot be sure that data on customers who make claims are representative of all its customers, and still less of potential customers.

All deterministic samples are based on the personal position, judgment, or preference of the researcher, rather than on a mechanical selection procedure for sample members. Such preferences can sometimes give good estimates of the characteristics of the population, but there is no way to objectively determine the suitability of the sample for the task. An assessment of the accuracy of the results of the sample can only be made if the probabilities of selecting certain elements were known. For this reason, working with probability sampling is generally considered to be a better method for estimating the magnitude of sampling error. Samples can also be subdivided into fixed-size samples and sequential samples. When working with fixed-size samples, the sample size is determined before the start of the survey, and the analysis of the results is preceded by the collection of all necessary data. We will be mainly interested in fixed-size samples, since this type is usually used in marketing research.

Probability sampling
A sample in which each element of the population can be included with some known non-zero probability.
Deterministic sampling
Sampling based on some particular preferences or judgments that determine the selection of certain elements; at the same time, it becomes impossible to estimate the probability of including an arbitrary element of the population in the sample.

However, it should not be forgotten that there are also sequential samples that can be used with each of the basic sampling designs discussed below.

In a sequential sample, the number of selected elements is not known in advance; it is determined by a series of sequential decisions. If a survey of a small sample does not lead to a reliable result, the range of elements to be examined is expanded. If the result remains inconclusive after that, the sample size is increased again. At each stage, a decision is made whether to consider the result obtained sufficiently convincing or to continue collecting data. Working with sequential samples makes it possible to assess the trend in the data as they are collected, which reduces the cost of additional observations once they stop adding useful information.
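The decision loop described here can be sketched as follows. The stopping rule is an assumption made for the illustration, not something the text prescribes: data are collected in batches until the approximate 95% margin of error of an estimated proportion falls below a target, or until a maximum size is reached. All names and numbers are hypothetical.

```python
import math
import random

def sequential_sample(draw_response, batch_size=50, max_size=1000, target_margin=0.05):
    """Collect yes/no responses in batches until the estimate is precise enough.

    Simplification: uses the normal-approximation margin 1.96 * sqrt(p(1-p)/n),
    which is unreliable when p is very close to 0 or 1, and it ignores the fact
    that checking the data repeatedly affects the error rate.
    """
    if batch_size < 1 or max_size < 1:
        raise ValueError("batch_size and max_size must be positive")
    responses = []
    while len(responses) < max_size:
        responses.extend(draw_response() for _ in range(batch_size))
        n = len(responses)
        p = sum(responses) / n
        margin = 1.96 * math.sqrt(p * (1 - p) / n)
        if margin <= target_margin:  # result judged sufficiently convincing
            break
    return n, p, margin

# Simulated respondents: each answers "yes" with an assumed probability of 0.3.
rng = random.Random(3)
n, p, margin = sequential_sample(lambda: rng.random() < 0.3)
print(n, round(p, 3), round(margin, 3))
```

The final sample size is not fixed in advance; it depends on how quickly the accumulated responses settle down.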

Both probabilistic and deterministic sampling plans fall into a number of types. For example, deterministic samples can be non-representative (convenience), intentional or quota samples; probabilistic samples are divided into simple random, stratified or group (cluster) samples, which in turn can be divided into subtypes. Figure 15.3 shows the types of samples that will be discussed in this and the next chapter.

Fixed-size sample
A sample whose size is determined a priori; the required information is obtained from the selected elements.
Sequential sampling
A sample formed on the basis of a series of sequential decisions. If, after considering a small sample, the result is inconclusive, a larger sample is considered; if this step does not lead to a result, the sample size increases again, etc. Thus, at each stage, a decision is made as to whether the result obtained can be considered sufficiently convincing.

It should be remembered that the basic types of samples can be combined to form more complex sampling plans. If you learn their basic initial types, it will be easier for you to deal with more complex combinations.

Deterministic samples

As already mentioned, when selecting elements of a deterministic sample, private estimates or decisions play a decisive role. Sometimes these assessments come from the researcher, while in other cases the selection of population elements is given to field staff. Since the elements are not selected mechanically, it becomes impossible to determine the probability of including an arbitrary element in the sample and, accordingly, the sampling error. Ignorance of the error due to the chosen sampling procedure prevents researchers from assessing the accuracy of their estimates.

Non-representative (convenience) samples

Non-representative (convenience) samples are sometimes referred to as accidental, since the sample elements are selected by "accident": the elements chosen are those that are, or appear to be, most accessible during the selection period.

Our daily life is replete with examples of such selections. We talk with friends and, based on their reactions and positions, we draw conclusions about the political predilections prevailing in society; a local radio station encourages people to express their opinion on some controversial issue, their opinion is interpreted as prevailing; we call for the cooperation of volunteers and work with those who volunteer to help us. The problem with convenience samples is obvious—we cannot be sure that samples of this kind actually represent the target population. We can still doubt that the opinions of our friends correctly reflect the political views prevailing in society, but we are often very eager to believe that larger samples, selected in this way, are representative. Let us show the fallacy of such an assumption with an example.
A few years ago, one of the local television stations in the city where the author of this book lives conducted a daily public opinion poll on topics of interest to the local community. The polls, called "The Madison Pulse", were conducted as follows. Every evening during the six o'clock news, the station asked viewers a question regarding a specific controversial issue, to which it was necessary to give a positive or negative answer.

For a positive answer viewers had to call one telephone number, for a negative answer another. The votes "for" and "against" were counted automatically. The ten o'clock newscast reported the results of the telephone survey. Every evening between 500 and 1,000 people called the studio to express their position on the issue of the day; the television commentator interpreted the results of the poll as the prevailing opinion in society.

Non-representative (convenience) sample
Sometimes called accidental, because the sample elements are selected by "accident": the elements chosen are those that are, or appear to be, most accessible during the selection period.

In one of the six o'clock broadcasts, viewers were asked the following question: "Don't you think the drinking age in Madison should be lowered to 18?" The legal drinking age at the time was 21. The audience responded with extraordinary enthusiasm: almost 4,000 people called the studio that evening, of whom 78% favored lowering the age limit. It might seem clear that a sample of 4,000 "should be representative" of a community of 180,000. Nothing of the kind. As you may have guessed, certain age groups had a greater stake in the outcome than others. Accordingly, it was no surprise when a discussion of the issue a few weeks later revealed that students had acted in concert during the time allotted for the survey, calling the station in turn, each several times. Thus neither the sample size nor the percentage in favor of liberalizing the law was surprising. The sample was not representative.

Simply increasing the sample size does not make it representative. The representativeness of a sample is ensured not by its size but by a proper procedure for selecting elements. When survey participants volunteer or sample items are selected because they are available, the sampling plan does not guarantee that the sample is representative. Empirical evidence suggests that samples chosen for convenience are rarely representative, regardless of their size. Call-in telephone polls using 800 or 900 numbers are the most common form of large but unrepresentative samples.

Intentional sampling
Deterministic (targeted) sampling in which the elements are selected by hand; those elements are selected that, in the opinion of the researcher, meet the objectives of the survey.
Snowball sampling
Intentional sampling that depends on the ability of the researcher to identify an initial set of respondents with the desired characteristics; these respondents are then used as informants who determine the further selection of individuals.

Unfortunately, many people take the results of such surveys on trust. One of the most typical examples of the use of non-representative samples in international marketing research is surveying a country on the basis of a sample of its nationals currently living in the country that initiated the survey (for example, Scandinavians living in the USA). Although such samples may shed some light on certain aspects of the population under consideration, it must be remembered that these individuals usually represent an "Americanized" elite whose connection with their own country may be rather tenuous. The use of non-representative samples is not recommended for descriptive or causal surveys. They are acceptable only in exploratory research aimed at generating ideas or insights, but even in this case it is preferable to use intentional samples.

Intentional samples

Intentional samples are sometimes referred to as purposive; their elements, which in the opinion of the researcher meet the objectives of the study, are selected by hand. Procter & Gamble used this method when it showed ads to people aged 13 to 17 living near its Cincinnati headquarters. The company's food and beverage division hired this group of teenagers to serve as a sort of consumer sample. Working 10 hours a week in exchange for $1,000 and a trip to a concert, they watched television commercials, visited supermarkets with company managers to view product displays, tested new products, and discussed buying behavior. By selecting the sample through a "hiring" process rather than at random, the company could focus on traits it considered useful, such as a teenager's ability to express himself or herself clearly, at the risk that the teenagers' views might not be representative of their age group.

As already mentioned, the distinguishing feature of deliberate sampling is the directional selection of its elements. In some cases, sample items are selected not because they are representative, but because they can provide researchers with information of interest to them. When the court is guided by the testimony of an expert, it, in a certain sense, resorts to the use of a deliberate selection. A similar position may prevail in the development of research projects. During the initial study of the issue, the researcher is primarily interested in determining the prospects for the study, which determines the selection of sample elements.

Snowball sampling is a type of deliberate sampling used when dealing with specific types of populations. This sample depends on the researcher's ability to specify an initial set of respondents with the desired characteristics. These respondents are then used as informants to determine further selection of individuals.

Imagine, for example, that a company wants to evaluate the need for a product that would allow deaf people to communicate on the phone. Researchers can start developing this problem by identifying key figures in the deaf community; the latter could name other members of the group who would agree to take part in the survey. With this tactic, the sample grows like a snowball.
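The way such a sample grows can be sketched as a walk over a referral network. The sketch assumes a known table of who names whom, which a real researcher would only discover interview by interview; the names and the function are hypothetical.

```python
from collections import deque

def snowball_sample(referrals, seeds, target_size):
    """Interview the seeds first, then the people they name, then those people's referrals."""
    sample = []
    seen = set(seeds)
    queue = deque(dict.fromkeys(seeds))  # keep seed order, drop duplicate seeds
    while queue and len(sample) < target_size:
        person = queue.popleft()
        sample.append(person)
        for named in referrals.get(person, []):
            if named not in seen:  # nobody is interviewed twice
                seen.add(named)
                queue.append(named)
    return sample

# Hypothetical referral table: each respondent names other members of the group.
referrals = {
    "Ann": ["Bo", "Cy"],
    "Bo": ["Cy", "Di"],
    "Cy": ["Ed"],
    "Di": [],
    "Ed": ["Ann"],
}
print(snowball_sample(referrals, seeds=["Ann"], target_size=4))
# ['Ann', 'Bo', 'Cy', 'Di']
```

Because everyone reached is connected to the initial informants, the resulting sample depends heavily on who those informants are, which is one reason the method is classed as deterministic.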

As long as the researcher is in the initial stages of problem solving, when the prospects and possible limitations of the planned survey are being determined, intentional sampling can be very effective. But the weaknesses of this type of sampling must never be forgotten, because researchers may also be tempted to use it in descriptive or causal studies, where it quickly degrades the quality of the results. A classic example of this forgetfulness is the consumer price index (CPI). As Sudman points out: "The CPI is determined for only 56 cities and metropolitan areas, the selection of which is also influenced by political factors. In fact, these cities can represent only themselves, while the index is called the consumer price index for city dwellers who earn hourly wages* and employees, and appears to most people to be an index reflecting the price level in any area of the United States. The choice of retail outlets is also made non-randomly, as a result of which estimation of possible sampling error becomes impossible" (our italics) [2].

* That is, workers. (Translator's note.)

Quota samples

The third type of deterministic sample is the quota sample; it aims at representativeness by including the same proportion of elements with certain characteristics as in the surveyed population (see "Research window 15.1"). As an example, consider trying to create a representative sample of students living on campus. If a sample of 500 individuals contains not a single senior, we have every right to doubt its representativeness and the validity of applying its results to the population being examined. With quota sampling, the researcher can ensure that the proportion of seniors in the sample corresponds to their proportion in the total number of students.

Suppose that a researcher conducts a sample survey of university students and wants the sample to reflect not only their distribution by sex but also their distribution by year of study. Let the total number of students be 10,000: 3,200 freshmen, 2,600 sophomores, 2,200 third-year students, and 2,000 fourth-year students; of these, 7,000 are boys and 3,000 are girls. For a sample size of 1,000, the quota sampling plan requires 320 freshmen, 260 sophomores, 220 third-year students and 200 fourth-year students, 700 boys and 300 girls. The researcher can implement this plan by giving each interviewer a quota that specifies which students he or she should contact.
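The arithmetic behind this plan is a simple proportion, sketched below with the figures from the example. The function name is hypothetical, and the rounding step is an assumption that matters only when the proportions do not divide evenly.

```python
def quota_allocation(population_counts, sample_size):
    """Give each group the same share of the sample that it has in the population."""
    total = sum(population_counts.values())
    return {
        group: round(sample_size * count / total)
        for group, count in population_counts.items()
    }

# Student population from the example: 10,000 students in total.
by_year = {"freshmen": 3200, "sophomores": 2600, "third-year": 2200, "fourth-year": 2000}
by_sex = {"boys": 7000, "girls": 3000}

year_quota = quota_allocation(by_year, sample_size=1000)
sex_quota = quota_allocation(by_sex, sample_size=1000)
print(year_quota)  # {'freshmen': 320, 'sophomores': 260, 'third-year': 220, 'fourth-year': 200}
print(sex_quota)   # {'boys': 700, 'girls': 300}
```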

Quota sampling
A deterministic sample selected in such a way that the proportion of sample elements with certain characteristics approximately corresponds to the proportion of the same elements in the population under study; each field worker is assigned a quota that specifies the characteristics of the people he or she must contact.

An interviewer who is to conduct 20 interviews may be instructed to interview:

            • six first-year students - five boys and one girl;
            • six sophomores - four boys and two girls;
            • four third-year students - three boys and one girl;
            • four fourth-year students - two boys and two girls.

Note that the selection of specific sample elements is determined not by the research plan but by the choice of the interviewer, who is required only to comply with the conditions set by the quota: interview five first-year boys, one first-year girl, etc.

Note also that this quota accurately reflects the gender distribution of the student population but somewhat distorts the distribution of students by year: 70% (14 out of 20) of the interviews are with boys, but only 30% (6 out of 20) are with first-year students, who make up 32% of the total number of students. The quota allocated to an individual interviewer may not, and usually does not, reflect the distribution of control characteristics in the population; only the final sample should be proportional.
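The arithmetic of this example can be checked with a short sketch. It is only an illustration: the function name `proportional_quota` and the dictionary layout are assumptions introduced here, while the counts come from the example above.

```python
# Population counts from the example in the text.
population_by_year = {"first": 3200, "second": 2600, "third": 2200, "fourth": 2000}
population_by_sex = {"boys": 7000, "girls": 3000}


def proportional_quota(counts, sample_size):
    """Return the number of sample elements per category, proportional to counts.

    Hypothetical helper: integer division is used because the example
    divides evenly; other figures would need a rounding rule.
    """
    total = sum(counts.values())
    return {category: count * sample_size // total for category, count in counts.items()}


quota_by_year = proportional_quota(population_by_year, 1000)
quota_by_sex = proportional_quota(population_by_sex, 1000)

# One interviewer's assignment of 20 interviews: (year, sex) -> number of interviews.
interviewer_quota = {
    ("first", "boys"): 5, ("first", "girls"): 1,
    ("second", "boys"): 4, ("second", "girls"): 2,
    ("third", "boys"): 3, ("third", "girls"): 1,
    ("fourth", "boys"): 2, ("fourth", "girls"): 2,
}
total_interviews = sum(interviewer_quota.values())
boys_share = sum(k for (_, sex), k in interviewer_quota.items() if sex == "boys") / total_interviews
first_year_share = sum(k for (year, _), k in interviewer_quota.items() if year == "first") / total_interviews

print(quota_by_year)
print(quota_by_sex)
print(boys_share, first_year_share)
```

The individual quota matches the population on gender (0.7) but not on year of study (0.3 against 0.32), which is acceptable as long as the quotas of all interviewers add up to the overall plan.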

It should be remembered that quota sampling depends more on personal, subjective attitudes or judgments than on an objective sampling procedure. Moreover, in contrast to judgment sampling, personal judgment here belongs not to the project developer but to the interviewer. The question arises whether quota samples can be considered representative, even if they reproduce the population's proportions for the chosen control characteristics. In this regard, three remarks need to be made.

First, the sample may differ markedly from the population in some other important characteristic, which can have a serious impact on the result. For example, if the study is devoted to racial prejudice among students, it may matter where the respondents come from: the city or the countryside. Since no quota has been set for the urban/rural characteristic, an accurate representation of it becomes unlikely. Of course, there is an alternative: to define quotas for all potentially significant characteristics. However, increasing the number of control characteristics complicates the specification. This, in turn, complicates (and sometimes even makes impossible) the selection of sample elements and, in any case, makes it more expensive. If, for example, urban or rural origin and socioeconomic status are also relevant to the study, then the interviewer may have to look for a male first-year student who comes from a city and belongs to the upper or middle class. Admittedly, finding just a male freshman is much easier.

Second, it is very difficult to make sure that a given sample is really representative. Of course, one can check whether the sample's distribution of characteristics not used as controls matches their distribution in the population. However, such a test can only lead to negative conclusions: it can reveal only a divergence of distributions. Even if the distributions of the sample and the population coincide for each of these characteristics, the sample may still differ from the population in some other feature that was not explicitly checked.

And finally, third, interviewers left to their own devices are prone to certain habits. They too often resort to questioning their friends. Since friends often turn out to be like the interviewers themselves, there is a danger of bias. Evidence from England suggests that quota samples tend to:

  1. overrepresent the most accessible elements;
  2. underrepresent small families;
  3. overrepresent families with children;
  4. underrepresent industrial workers;
  5. underrepresent those with the highest and lowest incomes;
  6. underrepresent poorly educated citizens;
  7. underrepresent persons occupying a low social position.

Interviewers who fill predetermined quotas by stopping passers-by are likely to focus on areas with a large number of potential respondents, such as shopping malls, railway stations and airports, entrances to large supermarkets, and the like. This practice leads to an overrepresentation of the groups of people who visit such places most often. When home visits are required, interviewers are often driven by convenience. For example, they may conduct surveys only during the day, which leads to an underrepresentation of working people. They also tend to avoid dilapidated buildings and, as a rule, do not go up to the upper floors of buildings that have no elevators.

Depending on the specifics of the problem under study, these tendencies can lead to various kinds of errors, and correcting them at the stage of data analysis is very difficult. By contrast, when sample elements are selected objectively, researchers have tools at their disposal that simplify assessing the representativeness of a given sample. When analyzing the representativeness of such samples, the researcher considers not so much the composition of the sample as the procedure for selecting its elements.

Research Window: Brilliant! But who will read it?

Every year, advertisers spend millions of dollars on ads that appear on the pages of countless publications, from Advertising Age to Yankee. A preliminary assessment of the text and image can be made in-house, at the advertising agency, before publication; but an ad is not really tested and judged until after it is published, surrounded by dozens of equally carefully crafted ads vying for the reader's attention.

Roper Starch Worldwide evaluates the readership of advertisements placed in consumer, business, trade and professional magazines and newspapers. The results of the research are made available to advertisers and agencies, for an appropriate fee. Because advertisers go to great lengths every day to get their ads across to the consumer, Starch decided to create a sample that would give subscribers timely and accurate information about the effectiveness of advertising. Every year Starch interviewed more than 50,000 people about roughly 20,000 advertisements. About 500 individual publications were studied annually.

Starch used quota sampling, with a minimum of 100 readers of each gender. Starch found that with this sample size, the major fluctuations in readership levels stabilized. Readers over the age of 18 were interviewed in person for all publications except those intended for special populations (say, girls of the appropriate age were interviewed to evaluate issues of Seventeen magazine).

When conducting surveys, the distribution area of a particular publication was taken into account. For example, the study of Los Angeles magazine looked at readers living in southern California, while Time was studied nationwide. Each survey was devoted to an individual issue of the magazine and was conducted in 20-30 cities at the same time.

Each interviewer was given a small quota of interviews, which served to minimize the variance of survey results. Interviews were distributed among people of different occupations, ages and incomes, so that each study represented a fairly broad cross-section of readers. For a number of professional, business and trade publications, the specifics of their subscription and distribution were also taken into account. Subscription lists of publications with a fairly narrow circulation were used to select eligible respondents.

In each survey, interviewers asked respondents to browse through the publication and asked whether they had noticed each ad. If the answer was yes, the interviewer asked a series of questions to assess how thoroughly the advertisement had been read.

Each respondent could be assigned to one of three levels:

  • Noted: those who had noticed the advertisement in the issue at all.
  • Associated: those who remembered any part of the advertisement that identified the advertised trademark or advertiser.
  • Read most: those who read at least half of the advertisement.

After examining all ads, interviewers recorded key classification information: gender, age, occupation, marital status, nationality, income, family size, and family composition, which allowed for cross-tabulation of the degree of reader interest.

When used properly, Starch data allow advertisers and agencies to identify which types of advertising layouts succeed and which fail in attracting and holding the attention of the reader. Information of this kind is extremely valuable for advertisers, who are primarily interested in the effectiveness of their advertising campaigns.

Source: Roper Starch Worldwide, Mamaroneck, NY 10543.

Probability samples

In a probability sample, the researcher can determine the probability of including any element of the population, since the elements are selected by an objective process that does not depend on the whims and predilections of the researcher or field worker. Since the selection procedure is objective, the researcher can assess the reliability of the results obtained, which is impossible with deterministic samples, no matter how carefully their elements are selected.

It should not be thought that probabilistic samples are always more representative than deterministic ones. In fact, a deterministic sample may also be more representative. The advantage of probability samples is that they allow an estimate of the potential sampling error. If the researcher works with a deterministic sample, he does not have an objective method for assessing its adequacy to the objectives of the study.

Simple random sampling

Most people come across simple random samples in one way or another, either as part of a statistics course at college or by reading about the results of relevant studies in newspapers or magazines. In a simple random sample, each element of the population has the same known probability of being included in the sample, and every combination of elements of the required size is equally likely to become the sample. For example, if we want to draw a simple random sample of all students enrolled in a particular college, we just need to make a list of all students, assign a number to each name in it, and use a computer to randomly select the required number of elements.
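The computer-based selection described above can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the roster of 500 numbered students and the function name `simple_random_sample` are hypothetical.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements so that every subset of size n is equally likely."""
    rng = random.Random(seed)  # a seed makes the draw repeatable for auditing
    return rng.sample(population, n)


# Hypothetical numbered list of all enrolled students.
students = [f"student_{number}" for number in range(1, 501)]
sample = simple_random_sample(students, 10, seed=42)
print(sample)
```

`random.sample` selects without replacement, so no student can appear in the sample twice.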

Population

Population
A set of elements that satisfy certain given conditions; also called the study (target) population.
Parameter
A certain characteristic or indicator of the general or studied population.

The general, or studied, population is the collection from which the sample is drawn. It can be described by a number of specific parameters: characteristics of the population, each of which is a quantitative measure that distinguishes one population from another.

Imagine that the population being studied is the entire adult population of Cincinnati. A number of parameters can be used to describe this population: median age, proportion of the population with a tertiary education, income level, etc. Note that all of these indicators have a certain fixed value. Of course, we could calculate them by conducting a complete census of the population under study. Usually, however, we rely not on a census but on a sample, and use the values obtained from the sample to estimate the desired population parameters.

We illustrate this with the example in Table 15.1 of a hypothetical population of 20 people. Working with a small hypothetical population like this has a number of advantages. First, the small population size makes it easy to calculate the population parameters that describe it. Second, it makes it easy to see what can happen when a particular sampling plan is adopted. Both features make it easy to compare sample results with the "true", and in this case known, population value, whereas in the typical situation the actual population value is unknown. Comparing an estimate with the "true" value thus becomes especially clear.

Suppose we want to estimate, from two randomly selected elements, the average income of individuals in the original population. The average income is a parameter of this population. To calculate this mean, which we denote by μ, we divide the sum of all values by their number:

Population mean μ = Sum of population elements / Number of elements.

In our case, the calculations give μ = 9,400.

Derived population

The derived population consists of all possible samples that can be selected from the general population according to a given sampling plan. A statistic is a characteristic, or measure, of the sample. The value of a sample statistic is used to estimate a particular population parameter. Different samples yield different statistics, that is, different estimates of the same population parameter.

Derived population
The set of all possible distinguishable samples that can be selected from the general population according to a given sampling plan.
Statistic
A characteristic or measure of a sample.

Consider the derived population of all possible samples that can be selected from our hypothetical population of 20 individuals under a sampling plan in which samples of size n = 2 are drawn at random without replacement.

Suppose for a moment that the data for each unit of the population (in our case, the name and income of an individual) are written on disks, which are then dropped into a jug and mixed. The researcher removes one disk from the jug, copies the information from it and puts it aside. He does the same with a second disk taken from the jug. Then the researcher returns both disks to the jug, mixes its contents and repeats the same sequence of actions. Table 15.2 shows the possible outcomes of this procedure. For 20 disks, 190 such pair combinations are possible.

For each combination, the average income can be calculated. For sample AB (k = 1), for example:

k-th sample mean = Sum of sample elements / Number of sample elements = (5,600 + 6,000) / 2 = 5,800.

Fig. 15.4 shows the estimate of the mean income for the entire population and the amount of error of each estimate for the samples k = 25, 62, 108, 147 and 189.

Before proceeding to the relationship between the sample mean income (a statistic) and the population mean income (the parameter to be estimated), let us say a few words about the derived population. First, in practice we do not compile derived populations; it would require too much time and effort. The practitioner draws only one sample of the required size. The researcher uses the concept of the derived population and the associated concept of the sampling distribution when formulating final conclusions, as will be shown below.

Second, it should be remembered that a derived population is defined as the totality of all possible different samples that can be selected from the general population according to a given sampling plan. When any part of the sampling plan is changed, the derived population also changes. So, if the researcher returns the first disk to the jug before removing the second one, the derived population will include samples AA, BB, etc. If the size of samples drawn without replacement is 3 instead of 2, there will be samples of type ABC, and there will be 1,140 of them rather than 190. When simple random selection is replaced by any other method of selecting sample elements, the derived population also changes.

It should also be remembered that the selection of a sample of a given size from the general population is equivalent to the selection of one element (1 out of 190) from the derived population. This fact allows us to draw many statistical conclusions.
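The sizes of these derived populations can be verified by enumeration. The sketch below labels the 20 elements A to T, as in the example. Treating samples drawn with replacement as unordered pairs is an assumption made here for counting purposes.

```python
from itertools import combinations, combinations_with_replacement
from math import comb
from string import ascii_uppercase

elements = list(ascii_uppercase[:20])  # labels A..T for the 20 population elements

# Sampling plan 1: n = 2, without replacement -> AB, AC, ..., ST
pairs = ["".join(s) for s in combinations(elements, 2)]

# Sampling plan 2: n = 3, without replacement -> ABC, ABD, ...
triples = ["".join(s) for s in combinations(elements, 3)]

# Sampling plan 3: n = 2, with replacement (unordered) -> adds AA, BB, ...
pairs_with_replacement = ["".join(s) for s in combinations_with_replacement(elements, 2)]

print(len(pairs), comb(20, 2))
print(len(triples), comb(20, 3))
print(len(pairs_with_replacement), "AA" in pairs_with_replacement)
```

Each change of plan produces a different derived population, even though the underlying population of 20 elements is the same.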

Sample mean and general mean

Can we equate the sample mean with the true population mean? We certainly assume that they are related. However, we also expect some error. For example, it can be assumed that information received from Internet users will differ significantly from the results of a survey of the "ordinary" population. In other cases we can assume a fairly close match; otherwise we could not use the sample value to estimate the population value. But how large can the error be?

Let's add up all the sample means contained in Table 15.2 and divide the resulting sum by the number of samples, i.e., let's average the averages. We get the following result:

Mean of sample means = Sum of sample means / Number of samples = 1,786,000 / 190 = 9,400.

It coincides with the mean of the general population. In this case we are said to be dealing with an unbiased statistic.

A statistic is called unbiased if its average over all possible samples is equal to the population parameter being estimated. Note that this says nothing about any particular value. A particular estimate can be very far from the true value; take, for example, samples AB or ST. In some cases no possible sample yields the true population value, even if the statistic is unbiased. That does not happen in our example: a number of possible samples (AT, for example) give a sample mean equal to the true population mean.
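Unbiasedness can be checked by brute force on a small population. Table 15.1 is not reproduced here, so the incomes below are an assumption: 20 values from 5,600 to 13,200 in steps of 400, chosen because they agree with the figures quoted in the text (a population mean of 9,400 and a mean of 5,800 for sample AB).

```python
from itertools import combinations

# ASSUMED incomes for elements A..T (Table 15.1 is not available here).
incomes = {chr(ord("A") + i): 5600 + 400 * i for i in range(20)}

population_mean = sum(incomes.values()) / len(incomes)

# Mean of every possible sample of size 2 drawn without replacement.
sample_means = {a + b: (incomes[a] + incomes[b]) / 2 for a, b in combinations(incomes, 2)}
mean_of_sample_means = sum(sample_means.values()) / len(sample_means)

print(population_mean, mean_of_sample_means)
print(sample_means["AB"], sample_means["ST"], sample_means["AT"])
```

Under this assumption the average of the 190 sample means equals the population mean, while individual samples such as AB and ST miss it by a wide margin and AT hits it exactly.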

It makes sense to consider the spread of these sample estimates, and in particular the relationship between this spread and the variation of income in the population. The population variance is used as a measure of variation. To determine it, we calculate the deviation of each value from the mean, add the squares of all the deviations and divide the resulting sum by the number of terms. Denote the population variance by $\sigma^2$. Then:

Population variance $\sigma^2$ = Sum of squared differences between each population element and the population mean / Number of population elements.

The variance of the mean income level can be defined in the same way. That is, we can find it by determining the deviation of each sample mean from their overall mean, summing the squares of the deviations, and dividing the resulting sum by the number of terms.

We can also determine the variance of the mean income level in another way, using the variance of income levels in the general population, since there is a direct relationship between the two. To be precise, when the sample represents only a small part of the population, the variance of the sample mean is equal to the variance of the population divided by the sample size:

$$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}$$

where $\sigma_{\bar{x}}^2$ is the variance of the sample mean income level, $\sigma^2$ is the variance of the income level in the general population, and n is the sample size.
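The relationship can be examined numerically. The sketch assumes the same hypothetical incomes as before (5,600 to 13,200 in steps of 400, since Table 15.1 is not reproduced here). It also applies the standard finite population correction (N - n)/(N - 1), which is not part of the text but accounts for the sample not being a negligible part of the population.

```python
from itertools import combinations
from statistics import pvariance

# ASSUMED incomes for the 20 population elements (Table 15.1 is not available here).
incomes = [5600 + 400 * i for i in range(20)]
N, n = len(incomes), 2

population_variance = pvariance(incomes)  # divides by N

# Variance of the means of all possible samples of size n without replacement.
sample_means = [sum(sample) / n for sample in combinations(incomes, n)]
variance_of_means = pvariance(sample_means)

approximation = population_variance / n              # formula from the text
with_correction = approximation * (N - n) / (N - 1)  # finite population correction

print(population_variance, variance_of_means, approximation, with_correction)
```

With n = 2 and N = 20 the sample is 10% of the population, so the simple ratio is slightly too large and the corrected value matches the enumeration; for a large population the two are practically identical.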

Now let's compare the distribution of the estimates with the distribution of the variable in the general population. Figure 15.5 shows that the population distribution, shown in panel A, is multimodal (each of the 20 values appears only once) and is symmetrical about the true population mean of 9400.

Sampling distribution
The distribution of the values of a particular statistic calculated for all possible distinguishable samples that can be drawn from the population under a given sampling plan.

The distribution of estimates shown in panel B is based on the data in Table 15.3, which, in turn, was compiled by assigning the values from Table 15.2 to groups according to their size and then counting the number of values in each group. Panel B is a traditional histogram, of the kind considered at the very beginning of a statistics course, and it represents the sampling distribution of the statistic. We note in passing that the sampling distribution is the most important concept in statistics: it is the cornerstone of statistical inference. From the known sampling distribution of a statistic, we can draw conclusions about the corresponding population parameter. If it is known only that the sample estimate changes from sample to sample, but the nature of this change is unknown, it becomes impossible to determine the sampling error associated with the estimate. Since the sampling distribution of an estimate describes how it changes from sample to sample, it provides a basis for judging the reliability of a sample estimate. It is for this reason that a probability sampling design is so important for statistical inference.

Given the known probabilities of including each member of the population in the sample, researchers can derive the sampling distribution of various statistics. It is these distributions (of the sample mean, sample proportion, sample variance, or some other statistic) that researchers rely on when extending the result of a sample observation to the general population. Note also that for samples of size 2, the distribution of the sample means is unimodal and symmetrical about the true mean.

So we have shown that:

  1. The mean of all possible sample means is equal to the general mean.
  2. The variance of the sample means is related to the general variance: when the sample is a small part of the population, it equals the general variance divided by the sample size.
  3. The distribution of sample means is unimodal, while the distribution of the values of the variable in the general population is multimodal.

Central limit theorem

A theorem stating that for simple random samples of size n drawn from a population with mean μ and variance $\sigma^2$, at large n the distribution of the sample mean $\bar{x}$ approaches the normal distribution with a center equal to μ and a variance of $\sigma^2 / n$. The accuracy of this approximation increases with increasing n.

Central limit theorem. The unimodal distribution of estimates can be considered a manifestation of the central limit theorem, which states that for simple random samples of size n drawn from a population with true mean μ and variance $\sigma^2$, for large n the distribution of sample means approaches normal with a center equal to the true mean and a variance equal to the ratio of the population variance to the sample size, i.e.:

$$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}$$

This approximation becomes more and more accurate as n increases. Remember this: regardless of the shape of the population distribution, the distribution of sample means will be approximately normal for samples of a sufficiently large size. What is meant by a sufficiently large size? If the distribution of the variable in the population is normal, then the distribution of sample means will be normal even for samples of size n = 1. If the distribution of the variable in the population is symmetrical but not normal, even samples of a very small size will give an approximately normal distribution of sample means. If the distribution in the population is markedly skewed, larger samples are needed. In every case, the distribution of the sample mean can be taken as normal only if the sample is of sufficient size.
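A small simulation can make the theorem tangible. Everything in this sketch is an assumption chosen for illustration: the skewed (exponential) population of 10,000 values, the sample sizes 2 and 30, the number of draws and the fixed seed. Skewness is used as a rough indicator of departure from the symmetric normal shape.

```python
import random
from math import sqrt
from statistics import fmean, pvariance

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical, strongly right-skewed population (not from the text).
population = [rng.expovariate(1.0) for _ in range(10_000)]
mu = fmean(population)
sigma2 = pvariance(population)


def summarize_sample_means(n, draws=5_000):
    """Draw many simple random samples of size n; return mean, variance, skewness of their means."""
    means = [fmean(rng.sample(population, n)) for _ in range(draws)]
    center = fmean(means)
    variance = pvariance(means, center)
    skewness = fmean([((m - center) / sqrt(variance)) ** 3 for m in means])
    return center, variance, skewness


results = {n: summarize_sample_means(n) for n in (2, 30)}
for n, (center, variance, skewness) in results.items():
    print(n, round(center, 3), round(variance, 4), round(sigma2 / n, 4), round(skewness, 2))
```

In such a run the mean of the sample means stays close to μ, their variance stays close to $\sigma^2 / n$, and the skewness of the sample means shrinks markedly as n grows from 2 to 30.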

In order to draw conclusions using a normal curve, it is not at all necessary to assume that the variable is normally distributed in the population. Rather, we rely on the central limit theorem and, depending on the population distribution, determine a sample size that allows us to work with a normal curve. Fortunately, the statistic is approximately normally distributed even for samples of a relatively small size, as Fig. 15.6 clearly demonstrates.

Confidence interval estimates. Can the above help us in drawing conclusions about the general mean? After all, in practice we select only one sample of a given size rather than all possible samples, and on the basis of the data obtained we draw conclusions regarding the target group.

How does it happen? As you know, with a normal distribution, a certain percentage of all observations fall within a certain number of standard deviations of the mean; say, 95% of the observations fit within ±1.96 standard deviations of the mean. The normal distribution of sample means, to which the central limit theorem can be applied, is no exception in this sense. The mean of such a sampling distribution is equal to the general mean μ, and its standard deviation is called the standard error of the mean:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

It turns out that:

  • 68.26% of the sample means deviate from the general mean by no more than $\pm 1\sigma_{\bar{x}}$;
  • 95.45% of the sample means deviate from the general mean by no more than $\pm 2\sigma_{\bar{x}}$;
  • 99.73% of the sample means deviate from the general mean by no more than $\pm 3\sigma_{\bar{x}}$,

i.e. a certain proportion of sample means, depending on the chosen value of z, will lie within z standard errors of the general mean. This statement can be written as an inequality:

Population mean - z(Standard error of the mean) < Sample mean < Population mean + z(Standard error of the mean)   (15.1)

Thus, the sample mean lies with a certain probability in an interval whose boundaries are the mean of the sampling distribution minus and plus a certain number of standard errors. This inequality can be converted to the form:

Sample mean - z(Standard error of the mean) < Population mean < Sample mean + z(Standard error of the mean)   (15.2)

If relation 15.1 holds, for example, in 95% of cases (z = 1.96), then relation 15.2 also holds in 95% of cases. In cases where the conclusion is based on a single sample mean, we use expression 15.2.
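The link between z and the share of sample means it covers can be sketched with the standard normal distribution from Python's `statistics` module. The helper names `share_within`, `z_for_confidence` and `confidence_interval` are introduced here for illustration only.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def share_within(z):
    """Share of a normal distribution lying within z standard deviations of its mean."""
    return standard_normal.cdf(z) - standard_normal.cdf(-z)


def z_for_confidence(level):
    """z value such that the central interval covers the given share, e.g. 0.95."""
    return standard_normal.inv_cdf(0.5 + level / 2)


def confidence_interval(sample_mean, standard_error, level=0.95):
    """Interval of the form: sample mean -/+ z * standard error of the mean."""
    z = z_for_confidence(level)
    return sample_mean - z * standard_error, sample_mean + z * standard_error


for z in (1, 2, 3):
    print(z, round(share_within(z), 4))
print(round(z_for_confidence(0.95), 2), round(z_for_confidence(0.99), 2))
```

Going from a 95% to a 99% level raises z from about 1.96 to about 2.58, so the interval around the same sample mean becomes wider.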

It is important to remember that expression 15.2 does not mean that the interval corresponding to a given sample must necessarily include the general mean. The interval has more to do with the selection procedure. The interval built around this mean may or may not include the true population mean. Our confidence in the correctness of the conclusions made is based on the fact that 95% of all intervals constructed according to the selected sampling plan will contain the true mean. We believe that our sample belongs to this 95%.

To illustrate this important point, imagine for a moment that the distribution of sample means for samples of size n = 2 in our hypothetical example is normal. Table 15.4 illustrates the outcome for the first 10 of the 190 possible samples that can be selected under the given design. Note that only 7 of the 10 intervals include the general, or true, mean. Confidence in the correctness of the conclusion stems not from any particular estimate but from the estimation procedure. The procedure is such that out of 100 samples for which the sample mean and confidence interval are calculated, on average 95 intervals will include the true population value. The accuracy of a given sample estimate is determined by the procedure by which the sample was formed. A representative sampling design does not guarantee the representativeness of every sample. Statistical inference procedures are based on the representativeness of the sampling plan, which is why this procedure is so critical for probability samples.
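The idea that confidence belongs to the procedure can be illustrated by building an interval for every one of the 190 possible samples and counting how many contain the true mean. The incomes are again an assumption (5,600 to 13,200 in steps of 400, because Table 15.1 is not reproduced here), and the standard error is taken as σ/√n with σ known, as in the text.

```python
from itertools import combinations
from math import sqrt
from statistics import pvariance

# ASSUMED incomes for the 20 population elements (Table 15.1 is not available here).
incomes = [5600 + 400 * i for i in range(20)]
n, z = 2, 1.96

mu = sum(incomes) / len(incomes)
standard_error = sqrt(pvariance(incomes) / n)  # sigma / sqrt(n)

covered = 0
total = 0
for sample in combinations(incomes, n):
    x_bar = sum(sample) / n
    lower, upper = x_bar - z * standard_error, x_bar + z * standard_error
    covered += lower < mu < upper  # True counts as 1
    total += 1

print(covered, total, round(covered / total, 3))
```

Under these assumptions 182 of the 190 intervals contain the true mean. Any single interval either contains it or does not; the stated percentage describes the whole set of intervals the procedure can produce.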

Probability sampling allows us to evaluate the precision of the results, that is, the closeness of the estimates to the true value. The larger the standard error of a statistic, the greater the scatter of estimates and the lower the precision of the procedure.

Some may be confused by the fact that the confidence level refers to the procedure and not to a particular sample value, but remember that the researcher can adjust the confidence level of the estimate. If you do not want to take risks and are afraid of coming across one of the five intervals in a hundred that do not include the population mean, you can choose a 99% confidence interval, for which only one interval in a hundred does not include the population mean. Further, if you can increase the sample size, you can increase the degree of confidence in the result while keeping the desired precision of the estimate of the population value. We will talk about this in more detail in Chap. 17.

The procedure we are describing has one more component that can cause some confusion. Three quantities are used in estimating the confidence interval: $\bar{x}$, z and $\sigma_{\bar{x}}$. The sample mean $\bar{x}$ is calculated from the sample data, and z is chosen based on the desired confidence level. But what about the standard error of the mean $\sigma_{\bar{x}}$? It is equal to:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

and therefore, to determine it, we need to know the standard deviation of the variable in the general population, i.e. σ. What should be done when the standard deviation σ is unknown? In practice this is not a serious problem, for two reasons. First, for most of the variables used in marketing research, the variation changes much more slowly than the level of the variables of interest to the marketer. Accordingly, if the study is repeated, we can use the previously obtained value of σ in the calculations. Second, once the sample is selected and the data are obtained, we can estimate the population variance by the sample variance. The unbiased sample variance is defined as:

Sample variance $\hat{s}^2$ = Sum of squared deviations from the sample mean / (Number of sampled items - 1).

To determine the sample variance, we first need to find the sample mean. Then the differences between each of the sample values and the sample mean are found; these differences are squared, summed, and divided by the number of sample observations minus one. The sample variance not only provides an estimate of the general variance but can also be used to estimate the standard error of the mean. When the general variance $\sigma^2$ is known, the standard error $\sigma_{\bar{x}}$ is also known, because:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

When the general variance is unknown, the standard error of the mean can only be estimated. This estimate is $\hat{s}_{\bar{x}}$, which is equal to the standard deviation of the sample divided by the square root of the sample size, i.e. $\hat{s}_{\bar{x}} = \hat{s} / \sqrt{n}$. The estimate is determined in the same way as the true value, except that the sample standard deviation is substituted for the general standard deviation in the formula. So, for sample AB with a sample mean of 5800:

$$\hat{s}^2 = \frac{(5600 - 5800)^2 + (6000 - 5800)^2}{2 - 1} = 80{,}000$$

Accordingly, $\hat{s}$ = 283, and

$$\hat{s}_{\bar{x}} = \frac{\hat{s}}{\sqrt{n}} = \frac{283}{\sqrt{2}} \approx 200$$

and the 95% interval is now

$$5800 - 1.96(200) < \mu < 5800 + 1.96(200), \quad \text{i.e.} \quad 5408 < \mu < 6192$$

which is narrower than the previous interval.
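The calculation for sample AB can be reproduced step by step. The two incomes used below, 5,600 and 6,000, are inferred from the figures reported in the text (a sample mean of 5,800 and a sample standard deviation of about 283) and should be read as an assumption; z = 1.96 corresponds to the 95% level used earlier.

```python
from math import sqrt

sample = [5600, 6000]  # ASSUMED values of elements A and B, inferred from the text
n = len(sample)
z = 1.96               # 95% confidence level

x_bar = sum(sample) / n
# Unbiased sample variance: divide by n - 1, not n.
sample_variance = sum((x - x_bar) ** 2 for x in sample) / (n - 1)
s_hat = sqrt(sample_variance)
estimated_standard_error = s_hat / sqrt(n)

lower = x_bar - z * estimated_standard_error
upper = x_bar + z * estimated_standard_error

print(x_bar, sample_variance, round(s_hat), round(estimated_standard_error))
print(round(lower), round(upper))
```

With only two observations the sample standard deviation is a very unstable estimate, which is one reason practical samples are much larger than n = 2.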

Table 15.5 summarizes the calculation formulas for the various means and variances discussed in this chapter.

Formation of a simple random sample. In our example, the sample elements were selected using a jug that contained all the elements of the original population. This allowed us to visualize the concepts of derived population and sampling distribution. We do not recommend using such a method in practice, because it increases the likelihood of error. Disks can differ in both size and texture, which in certain cases may lead to one being preferred over another. The lottery used to select draftees during the Vietnam War can serve as an example of a mistake of this kind.

The selection was carried out by pulling disks with dates of birth from a large drum. The procedure was broadcast on television throughout the country. Unfortunately, the disks were loaded into the drum in a systematic way, with January dates coming first and December dates last. Although the drum was spun vigorously, December dates were drawn much more often than January dates. Subsequently, the procedure was revised so that the probability of such systematic errors was significantly reduced. The preferred method for generating a simple random sample is based on the use of a table of random numbers.

Using such a table involves the following sequence of steps. First, the elements of the population must be assigned consecutive numbers from 1 to N; in our hypothetical population, element A is assigned number 1, element B number 2, and so on. Second, the numbers read from the table must have as many digits as N: for N = 20, two-digit numbers are used; for N between 100 and 999, three-digit numbers; and so on. Third, the starting position must be determined randomly. We can open the table of random numbers, close our eyes, and point a finger at it. Because the numbers in the table are in random order, the exact starting position does not matter.

Finally, we can move in any arbitrarily chosen direction (up, down, or across), selecting those elements whose numbers match the random numbers from the table. To illustrate, consider the abbreviated table of random numbers (Table 15.6). Since N = 20, we need only two-digit numbers, so Table 15.6 suits us perfectly. Suppose we have decided in advance to move down the column, starting at the intersection of the eleventh row and the fourth column, where the number 77 is located. This number is too large and is therefore discarded. The next two numbers are also discarded, while the fourth value, 02, is used, since 2 is the number of element B.

The next five numbers are also discarded as too large, while the number 05 indicates element E. Elements B and E thus form our two-element sample, from which we will judge the income level of this population. An alternative strategy is to use a computer program that generates random numbers as the basis for selection. Recent publications indicate that the numbers generated by such programs are not completely random, which can matter when building complex mathematical models, but they are adequate for most applied marketing research. Note again that a simple random sample requires a sequentially numbered list of the elements of the general population.
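
The walk through the random number table can be sketched as follows. Only the values 77, 02 and 05 come from the text; the discarded numbers in between and the element labels beyond those named in the text are made up for illustration, and the function name is hypothetical. The last lines show the computer-based alternative, assuming Python's standard pseudo-random generator.

```python
import random

def select_by_table(table_numbers, population_size, sample_size):
    """Keep the first `sample_size` distinct table values in 1..population_size."""
    chosen = []
    for number in table_numbers:
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
        if len(chosen) == sample_size:
            break
    return chosen

# Column read downwards from the starting position (discarded values are invented).
column = [77, 94, 30, 2, 83, 64, 72, 59, 42, 5]

labels = "ABCDEFGHIJKLMNOPQRST"  # 20 elements numbered 1..20 (assumed labels)
numbers = select_by_table(column, population_size=20, sample_size=2)
sample = [labels[k - 1] for k in numbers]
print(numbers, sample)  # [2, 5] ['B', 'E']

# Computer-based alternative: a pseudo-random generator instead of a table.
rng = random.Random(0)  # fixed seed only so that the run is repeatable
computer_sample = rng.sample(range(1, 21), k=2)
```

Numbers larger than N are skipped rather than reduced to fit the range, which keeps every element equally likely to be chosen.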

In other words, each member of the original population must be identified. For some populations this is not difficult, for example in a study of the 500 largest American corporations, which are listed in Fortune magazine. Because this list already exists, forming a simple random sample is straightforward. For other populations (for example, all families living in a particular city), compiling a complete list is extremely difficult, which forces researchers to resort to other sampling schemes.

Summary

Learning objective 1
Clearly distinguish between the concepts of census and sampling

A complete enumeration of the population is called a census. A sample is a set formed from selected elements of the population.

Learning objective 2
Know the essence and sequence of the six stages implemented by researchers to obtain a sample population

The sampling process is divided into six steps:

  1. definition of the population;
  2. determination of the sampling frame;
  3. choice of selection procedure;
  4. determination of the sample size;
  5. selection of sample elements;
  6. examination of the selected elements.

Learning objective 3
Define the concept of "sampling frame"

The sampling frame is the list of items from which the sample will be taken.

Learning objective 4
Explain the difference between probabilistic and deterministic (non-probability) sampling

In a probabilistic sample, each member of the population has a known, non-zero probability of being included. These probabilities may differ from one member to another, but each of them is known. For deterministic samples, the probability of including any given element cannot be estimated, so the representativeness of such a sample cannot be guaranteed. All deterministic selections rest on personal position, judgment, or preference. Such judgments sometimes give good estimates of population characteristics, but there is no way to determine objectively whether the sample is suitable for the task.

Learning objective 5
Distinguish between fixed-size sampling and sequential (multi-stage) sampling

When working with fixed-size samples, the sample size is determined before the survey begins, and all the required data are collected before the results are analyzed. In a sequential sample, the number of selected elements is not known in advance; it is determined by a series of successive decisions.

Learning objective 6
Explain what intentional (judgment) sampling is and describe both its strengths and weaknesses

The elements of an intentional sample are hand-picked because the researcher considers them appropriate for the purposes of the survey. It is assumed that the selected elements can give a complete picture of the population under study. While the researcher is in the early stages of problem solving, when the prospects and possible limitations of the planned survey are being determined, intentional sampling can be very effective. Its weaknesses must not be forgotten, however: when it is used in descriptive or causal studies, the quality of their results suffers.

Learning objective 7
Define the concept of quota sampling

A quota (proportional) sample is selected so that the proportion of sample elements with certain characteristics approximately matches the proportion of the same elements in the population under study; to achieve this, each interviewer is assigned a quota that specifies the characteristics of the people he or she must contact.
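
As a rough illustration of how quotas follow from population proportions, the sketch below splits a sample among groups in proportion to their shares. The age groups, their shares, the sample size, and the function name are all hypothetical.

```python
def allocate_quotas(population_shares, sample_size):
    """Split a sample into quotas proportional to each group's population share."""
    return {group: round(share * sample_size)
            for group, share in population_shares.items()}

# Hypothetical shares of three age groups in the population under study.
shares = {"18-34": 0.30, "35-54": 0.45, "55+": 0.25}
quotas = allocate_quotas(shares, sample_size=200)
print(quotas)  # {'18-34': 60, '35-54': 90, '55+': 50}
```

With less convenient shares, rounding can make the quotas add up to slightly more or less than the intended sample size, so the totals should be checked.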

Learning objective 8
Explain what a parameter is in a selection procedure

Parameter - a certain characteristic or indicator of the general or studied population; a certain quantitative indicator that distinguishes one set from another.

Learning objective 9
Explain what a derived population is

A derived population consists of all possible samples that can be selected from the general population according to a given sampling plan.

Learning objective 10
Explain why the concept of sampling distribution is the most important concept of statistics.

The concept of sampling distribution is the cornerstone of statistical inference. From the known sampling distribution of a statistic, we can draw conclusions about the corresponding parameter of the general population. If all we know is that the sample estimate varies from sample to sample, but not how it varies, the sampling error associated with the estimate cannot be determined. Because the sampling distribution of an estimate describes how it varies from sample to sample, it provides the basis for judging the validity of a sample estimate.
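
The ideas of derived population and sampling distribution can be made concrete with a small enumeration. The five-element population and its income values below are hypothetical; the sketch lists every possible sample of size 2 drawn without replacement and computes the mean of each.

```python
from itertools import combinations
from statistics import mean

# Hypothetical population of five incomes (values are illustrative only).
population = {"A": 5600, "B": 6000, "C": 6400, "D": 6800, "E": 7200}

# Derived population: every possible sample of size 2 without replacement.
derived = list(combinations(population, 2))

# Sampling distribution of the mean: one sample mean per possible sample.
sample_means = {pair: mean(population[k] for k in pair) for pair in derived}

population_mean = mean(population.values())
mean_of_means = mean(sample_means.values())
print(len(derived), population_mean, mean_of_means)
```

In this sketch the average of all the sample means coincides with the population mean, while individual sample means scatter around it; that scatter is what the sampling distribution describes.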

A sample is:

1) the totality of those elements of the object of study, which will be directly studied;

2) methods and procedures for selecting elements of the object of study.

Population - a complete set of objects related to the problem under study. In sociological studies, the general population most often consists of individuals: the population of a city or country, a social group (youth, the unemployed, businessmen, etc.), the audience of the mass media, and so on. In many cases, however, the general population may consist of larger elements (objects): families (households), academic groups, enterprises, religious communities, individual settlements or states, etc.

Sample population - part of the objects from the general population selected for study in order to draw a conclusion about the entire population.

In order for the conclusion obtained by studying the sample to be extended to the entire population, the sample must have the property of being representative.

Representativeness is the ability of the sample to represent the population under study. The more accurately the composition of the sample represents the population on the issues under study, the higher its representativeness.

EXAMPLE: Suppose the population is all the students of a school (600 people in 20 classes of 30). The subject of study is attitudes toward smoking. A sample of 60 high school students represents the population much worse than a sample of the same size that includes 3 students from each class. The main reason is the unequal age distribution across the classes. Therefore, in the first case the representativeness of the sample is low, and in the second case it is high (ceteris paribus).
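
The two designs in the school example can be compared with a short sketch. It assumes that the first design takes all 60 students from the two most senior classes (numbered 19 and 20 here purely for illustration); the student identifiers are invented.

```python
import random
from collections import Counter

rng = random.Random(42)  # fixed seed only so that the sketch is repeatable

# 600 students: 20 classes of 30, identified as (class_number, student_number).
students = [(c, s) for c in range(1, 21) for s in range(1, 31)]

# Second design from the example: 3 students drawn at random from each class.
per_class_sample = []
for c in range(1, 21):
    class_members = [st for st in students if st[0] == c]
    per_class_sample.extend(rng.sample(class_members, 3))

# First design, read here as 60 students from the senior classes only (assumed).
seniors_only = [st for st in students if st[0] >= 19]

print(Counter(c for c, _ in per_class_sample))  # every class appears 3 times
print(Counter(c for c, _ in seniors_only))      # only two classes appear
```

Both samples contain 60 students, but only the second covers every class and therefore every age group in the school.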

Sample types

1. Random sampling.

1.1. Simple random selection.

1.2. The method of systematic (or mechanical) sampling.

1.3. Serial (nested or cluster) sampling.

1.4 Stratified sampling.

2. Non-random sampling (non-probability).

2.2. random selection.

2.3. Multi-stage and single-stage sampling.

1. Random sampling.

A feature of random sampling is that all units of the general population have an equal probability of being included in the sample. Random sampling rests on the principle of chance. The sampling frame can be a list of the employees of an enterprise, a telephone directory, a registration list of car owners, voter lists at polling stations, house registers, or a list compiled by the sociologist for the purposes of the study (for example, a list of streets on which respondents are then selected).

Random sampling is usually used in public opinion polls before elections, referendums and other public events.

The advantage of this method is complete observance of the principle of randomness and, as a result, the avoidance of systematic errors.

Disadvantages of this method:

– The need for a list of elements of the population.

– Difficulty in conducting the survey.

– Relatively large sample size.

A sample, or sample population, is a set of cases (subjects, objects, events, specimens) selected from the general population by a certain procedure for participation in the study.

Sample characteristics:

  • Qualitative characteristics of the sample - who exactly we choose and what methods of sample construction we use for this.
  • The quantitative characteristic of the sample is how many cases we select, in other words, the sample size.

Need for sampling

  • The object of study is very broad. For example, the consumers of a global company's products are spread across a huge number of geographically scattered markets.
  • There is a need to collect primary information.

Dependent and independent samples

When comparing two (or more) samples, their dependence is an important parameter. If a homomorphic pair can be established for each case in the two samples (that is, one case from sample X corresponds to one and only one case from sample Y, and vice versa), and this relationship matters for the trait measured in the samples, the samples are called dependent. Examples of dependent samples:

  • pairs of twins,
  • two measurements of a trait before and after an experimental exposure,
  • husbands and wives.

If there is no such relationship between the samples, then these samples are considered independent, for example:

  • men and women,
  • psychologists and mathematicians.

Accordingly, dependent samples always have the same size, while the size of independent samples may differ.
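
The one-to-one correspondence in dependent samples can be illustrated with before-and-after measurements. The scores and the function name below are hypothetical; the sketch only shows that each case in one sample is matched with exactly one case in the other, which is why the sizes must agree.

```python
def paired_differences(before, after):
    """Differences for dependent samples: case i in `before` is matched
    with case i in `after`, so both samples must have the same size."""
    if len(before) != len(after):
        raise ValueError("dependent samples must have the same size")
    return [a - b for b, a in zip(before, after)]

# Hypothetical scores for four subjects before and after an experimental exposure.
before = [12, 15, 9, 14]
after = [14, 15, 11, 13]
diffs = paired_differences(before, after)
print(diffs)  # [2, 0, 2, -1]
```

For independent samples, such as men and women, no such pairing exists, so a difference per case cannot be formed and the sample sizes are free to differ.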

The concept of "sample" in statistics, sociology, and marketing is used in two senses. First, it is the set of elements of the general population to be studied, i.e. the sample population. Second, sampling is the process of forming a sample population under the necessary condition of ensuring representativeness. Different types of sampling (selection) and types of samples are distinguished.

As for the types of samples, in principle there are three of them. We are talking about the very principles of the approach to the selection of sampling units from the general population. They may be as follows:

spontaneous selection, i.e. selection based on voluntary participation and on the accessibility of units of the general population. It is used quite often, in particular in mail and press surveys. Its main disadvantage is that it cannot represent the general population adequately;

probabilistic (random) selection, one of the main types used in sociological research. Its main principle is to ensure that every unit of the general population has a chance of getting into the sample. Tables of random numbers, lottery selection, and mechanical selection are used for this purpose;

stratified selection, which is based on constructing a qualitative model of the general population and then selecting observation units for the sample according to that model.

[Sources: Wikipedia; Poltorak V.A. Marketing Research: Methods and Technologies]


Task number 3

Question: Expand the content of the concept of social change.

The concept of social change. The concept of "social change" refers to the various changes that occur over time in social communities, groups, institutions, organizations, and societies, in their relationships with one another, and in their relationships with individuals. Such changes can take place at the level of interpersonal relationships (for example, changes in the structure and functions of the family); at the level of organizations and institutions (education and science are constantly changing in both content and organization); at the level of small and large social groups (in Russia, in particular, the composition of the working class and the peasantry is now changing, and new social groups such as entrepreneurs are emerging); and at the societal and global levels (migration processes, economic and technological development in some countries alongside stagnation and crisis in others, environmental and military threats to the existence of mankind, etc.).

Section II. MATHEMATICAL STATISTICS

Topic 6. Sampling method. Variation series and its characteristics

Mathematical statistics is concerned with the study of the patterns that govern mass phenomena, based on the results of observations.

Purpose of mathematical statistics: to create methods for collecting and processing statistical data in order to obtain scientific and practical conclusions.

Methods of mathematical statistics are needed to solve two tasks:

1) specifying methods for collecting and grouping statistical information obtained from experiments or observations;

2) developing methods of statistical data analysis (estimating distribution functions and parameters; testing statistical hypotheses; estimating dependencies between random variables).

The concept of sample observation and its theoretical properties.

In the practice of statistical observation, two types of observation are distinguished:

Complete, when all objects of the population are studied (a population census);

Sample, when only a part of the objects, selected at random, is studied (sociological studies covering a part of the population).

The theory of selective observation is based on statistical regularities that are formed and found in mass phenomena and processes.

Patterns that are tied to chance and show themselves as laws only across a large number of phenomena are called statistical. This property of patterns is connected with the law of large numbers. The mathematical basis of the law of large numbers, and of statistical science in general, is probability theory, which studies random phenomena (events) that have a stable relative frequency and, consequently, a probability; this makes it possible to identify patterns in the mass repetition of phenomena.

General population and sample. Sample types.

The general population is the set of all objects to be studied, from which a sample is drawn.

The sample population, or sample, is the set of objects randomly selected from the general population and subjected to direct study.

Population size is the number of its objects. The general population can have both finite and infinite size (N), while the sample can only have a finite size (n).

Example. If 100 products out of 2000 are selected for examination, the size of the general population is N = 2000 and the sample size is n = 100.

The sampling method is a research method in which the properties of the general population are examined using a sample. The conclusions obtained from studying this part are then extended to the entire set of objects.

Sample types

Simple random sampling is formed by randomly selecting elements without dividing the general population into parts.

Mechanical sampling selects elements from the general population at a fixed interval. For example, if the sample size should be 10% of the general population, every 10th element is selected.

Typical sampling selects elements at random from typical groups into which the general population is divided according to some criterion. For example, parts are selected from the output of each machine rather than from the total output.

Serial sampling selects at random not individual elements but entire groups of the population (series).

A repeated sample is one in which the selected object is returned to the general population after it has been studied and can be selected again.

A non-repeating sample is one in which the selected object is not returned to the general population.
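
The sample types listed above can be contrasted in a short sketch that uses a population of 2000 numbered items, as in the earlier example. The group and series sizes (4 machines of 500 items, 40 series of 50 items) are assumptions made only for illustration.

```python
import random

rng = random.Random(1)  # fixed seed only so that the sketch is repeatable
population = list(range(1, 2001))  # 2000 numbered items

# Simple random, non-repeating sample: no item can be drawn twice.
simple = rng.sample(population, 100)

# Repeated sample: each drawn item is returned and may be drawn again.
repeated = rng.choices(population, k=100)

# Mechanical sample: every 10th item (a 10% sample) from a random start.
step = 10
start = rng.randrange(step)
mechanical = population[start::step]

# Typical sample: random draws inside each group (4 hypothetical machines).
groups = [population[i:i + 500] for i in range(0, 2000, 500)]
typical = [item for g in groups for item in rng.sample(g, 25)]

# Serial sample: whole series drawn at random and examined completely
# (40 hypothetical series of 50 items, 2 of them selected).
series = [population[i:i + 50] for i in range(0, 2000, 50)]
serial = [item for s in rng.sample(series, 2) for item in s]
```

The typical sample guarantees 25 items from every machine, whereas the serial sample concentrates all 100 items in two series, which is cheaper to collect but depends on the series being similar to one another.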

A representative sample is one by which we can judge the trait of interest in the entire general population. Conditions for sample representativeness:

1) parts of the sample should be proportional to parts of the general population;

2) the sample should clearly demonstrate all the features of the trait under study;

3) the sample must be large enough;

4) random sampling.

