09 June 2022

Stephenson's "Small-Sample Doctrine"


William Stephenson
The Play Theory of Mass Communication
(1987 edition)
(orig. 1967)

[10]

A METHODOLOGICAL ADVANCE


... For over a century social scientists have been concerned with the fundamental problem of what should be the basis of measurement in their science. In physics there are units of weight and length, time and mass, and these suffice for all measurements. In the social sciences, when these units cannot be used, recourse is made to other devices either of an ad hoc nature (different therefore in every study) or else systematically constructed, as scales of intelligence, attitudes... These scales are based on the large-sample theory and the Theory of Error... Psychologists understand this very well, for a branch of their work, called differential psychology, is fashioned upon this methodology. The principle is important: it supposes that if we wish to measure a person's intelligence (for example), a test is made and applied to a large sample of individuals from a parent population. According to the theory of errors, the scores gained by such a large sample, for a suitably constructed test, will tend to be normally distributed. The scores—whatever their units may be—can be transformed to standard scores, which are pure numbers whose mean for the test is 0 and whose standard deviation is 1.0. ...then, any test one cares to make for the parent population of individuals...can be systematically reduced to standard scores, the same pure numbers for everything so measured. ...
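The standard-score transformation Stephenson describes can be shown in a minimal Python sketch. The raw scores here are illustrative, not from any real test; the point is only that any set of raw units reduces to pure numbers with mean 0 and standard deviation 1.

```python
# Transform raw test scores into standard (z) scores: pure numbers
# with mean 0 and standard deviation 1, whatever the original units.
# The raw scores below are made up for illustration.

def standard_scores(raw):
    """Return the raw scores rescaled to mean 0, s.d. 1 (population s.d.)."""
    n = len(raw)
    mean = sum(raw) / n
    sd = (sum((x - mean) ** 2 for x in raw) / n) ** 0.5
    return [(x - mean) / sd for x in raw]

raw = [95, 100, 105, 110, 90]   # hypothetical raw scores on some test
z = standard_scores(raw)        # comparable with any other standardized test
```

Any two tests standardized this way land on the same scale, which is the "same pure numbers for everything so measured" in the passage above.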

The elegance of this has long been overlooked even in the study of the psychology of individual differences; it is unlikely that the founders of
[11]
this branch of psychology, Sir Francis Galton and Charles Spearman, would have been so remiss. The methodology suffers, however, from the limitation that all measurement in it is relative to the samples. The units for a sample of ten-year-old children would not necessarily be the same as those for a sample of adults, and it is difficult, if not impossible, to find any beginning point or absolute zero from which to begin to make measurements.

What I have done in Q-methodology is to discard these differential and parent person-population methods (called R-methodology) in favor of a comparative one based on the single case. This minuscule procedure calls upon the individual to model his subjectivity...in the form of distributions of scores which are subjective to him but which again are subject to the law of error. A basis is provided in this way for measurement of anything subjective to the person. It looks at first sight as though it must be the antithesis of objective science and indeed for this reason the methodology has been slow to gain wide acceptance; yet it provides a basis for measurement of feelings, attitudes...and all else of a subjective nature; and it does so in relation to the law of error. All scores are pure numbers; all are standard scores... They are relative to a parent population of statements (and not individuals), whose nature will be described later. But, most important, the scores given to the statements by different individuals are comparable—the zero on all scales is the same absolute value for everyone.

Q-method is important in another way—every measurement made in the method is subjective and central-to-self of the person who performs a Q-sort. Every measurement involves the self explicitly, as a self-concept or the like. The importance of this becomes clear when it is realized that in all measurement along sampling (R) lines this self-reference is everywhere overlooked. The concern in Q-method is with a person's ideas, attitudes, opinions, beliefs, as these are modeled by the individual as such. A profound and basic error is made in R-method to achieve its objectivity: it measures ideas, attitudes, beliefs, opinions, and so on categorically—that is, as abstractions—oblivious of the self-reference which attaches to all such matters.


...

[17]

SMALL-SAMPLE DOCTRINE


...

Social scientists use random sampling method (whether probability, area, or quota) to select examples of people for their surveys. The practice depends on the large-sample doctrine... The law of error, upon which the doctrine depends, fascinated the early social scientists. Sir Francis Galton wrote:
I know of scarcely anything so apt to impress the imagination as the wonderful form of cosmic order, expressed by the "Law of Error." The law would have been personified by the Greeks and deified, if they had known of it [...] whenever a large sample of chaotic elements are taken in hand and marshalled in the order of their magnitude, an unsuspected and most beautiful form of regularity proves to have been latent all the time.
A parent population of persons is postulated, in the large-sample doctrine, as involving "chaotic elements"; people are so complex that it is considered better to start with the assumption that for anything one can consider about human beings and their behavior the law of error will apply. Thus,...the endeavor is to reduce error to a minimum by using large numbers of cases—the larger the n, the smaller the error, and the clearer, therefore (or so it
[18]
seems), will real facts appear. ...

It is only in recent years, with Fisher, that a different logic has become acceptable. Fisher does not concern himself with reducing error to a minimum, but with measuring it in any given set of circumstances; one can then determine how far real facts occur in comparison with the measured error. This is achieved in relation to balanced designs whose replication provides the necessary measurements of error in terms of which to measure the main "effects" of the design.
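Fisher's logic, measuring error from replication rather than trying to minimize it, can be illustrated with a one-way variance analysis in miniature. The data are invented for the example: two treatments, five replicates each; the within-group mean square is the measured error against which the treatment effect is compared.

```python
# Fisher's approach in miniature: replication within a balanced design
# measures the error, and the effect is judged against that measurement.
# The numbers below are illustrative only.

def f_ratio(groups):
    """One-way F ratio: between-group mean square over the within-group
    (replication) mean square, which serves as the measured error."""
    k = len(groups)                          # number of treatments
    n = sum(len(g) for g in groups)          # total observations
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)          # the measured error
    return ms_between / ms_within

# Two treatments, five replicates each:
F = f_ratio([[4, 5, 6, 5, 5], [8, 9, 7, 8, 8]])
```

A large F means the treatment effect stands well clear of the error that replication has measured; nothing about the sample needs to be large.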

...

[19] Q-samples are usually composed for balanced designs, but the Q-sort data are rarely subjected to variance analysis because usually we are not
[20]
primarily interested in the specified effect. ...instead of testing, we wish to be inductive, that is, to make discoveries rather than to test specified hypotheses. ...

When individuals perform a Q-sort they are, of course, unaware of the structure; they merely offer a description of a condition. But in the process they are apt to project, to displace affect, to rationalize, and to do much else in a dynamic manner in relation to effects which are quite different from those of the Q-sample's structure. We want to be free, in analysis of Q-sorts, to make use of what the individual actually does in these respects. ...I have argued elsewhere for the wider use of abductive inference, that is, for conditions which allow us to arrive at conclusions on other than deductive grounds. I therefore resort to factor analysis, an abductive and not a deductive method like variance analysis, to analyze Q-sort data. The structure of a Q-sample is nevertheless important; it helps to make the Q-sample comprehensive and provides the formula for repeating any Q-sample.
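The first step of the factor analysis Stephenson refers to can be sketched: in Q, persons (not tests) are the variables, so each person's Q-sort is correlated with every other person's, and it is this person-by-person correlation matrix that is then factored. The Q-sorts below are invented, over a toy sample of six statements.

```python
# In Q-method, each person's Q-sort (scores over the statement sample)
# is correlated with every other person's; the resulting person-by-person
# correlation matrix is what gets factored. Made-up Q-sorts for illustration.

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy)

qsorts = {                       # statement scores from a forced distribution
    "person_1": [2, 1, 0, 0, -1, -2],
    "person_2": [2, 0, 1, -1, 0, -2],
    "person_3": [-2, -1, 0, 0, 1, 2],
}
names = list(qsorts)
corr = [[pearson(qsorts[a], qsorts[b]) for b in names] for a in names]
# Persons 1 and 2 sort alike (they would load on one factor);
# person 3's sort is the mirror image of person 1's.
```

Persons who correlate highly define a factor; the factor's statement ordering is then the object of interpretation, as in the husband-and-wife example later in the chapter.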

...

In practice I work with Q-samples of size n=60 or so. Except for illustrative purposes, Q-samples of size n=20 are scarcely large enough.


P-SAMPLES

... Instead of randomly selecting individuals from a defined population by probability, area, or stratification principles, I seek to represent known interests in the selection, and choose subjects to fit balanced designs.

Thus in the study by Stouffer...
[21]
...it was deduced from theory that women and rural dwellers would prefer to get their news by radio whereas men and urban dwellers would prefer to get the news from newspapers. Stouffer used a random sample of 5,528 men and women...to test his hypothesis. I would proceed, instead, with the factorial design in Table 3 to represent the hypotheses. There are 2 X 2

TABLE 3
P-SAMPLE FOR COMMUNICATION RESEARCH

Effects       Levels                        No.
A, Region     (a) Rural      (b) Urban      2
B, Sex        (b) Men        (d) Women      2


combinations for this design, and a P-sample of size n=20 would be taken by replicating five times for each combination:

aa     bb
cd     cd

One would end with ten men and ten women, five of each group urban and five rural.
[SK: sic! My amateur coding sucks, but I have triple-checked the tiny little letters here. I suspect one or more typos in the book. But we get the point as well as we need to and are able to as non-specialists, no?]
Whatever one measures the individuals for...the effects can be analyzed by variance analysis; one tests in this way whether (ad) prefer radio and (bc) newspapers, whether men prefer newspapers irrespective of region, and whether urban people prefer newspapers irrespective of sex. All such hypotheses are asserted in the design, and in balance, a matter that Stouffer could not achieve in random sampling.
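The P-sample construction just described can be sketched directly: the 2 x 2 design of region by sex, replicated five times per cell, yields the n = 20 P-sample. The labels are illustrative.

```python
# Building the n = 20 P-sample: each cell of the 2 x 2 (region x sex)
# factorial design is replicated five times. Labels are illustrative.
from itertools import product

regions = ["rural", "urban"]    # effect A, two levels
sexes = ["men", "women"]        # effect B, two levels
replications = 5                # five subjects per cell -> n = 20

# One tuple per subject, each of the four cells repeated five times.
p_sample = [(region, sex)
            for region, sex in product(regions, sexes)
            for _ in range(replications)]
```

The result is ten men and ten women, five of each group rural and five urban, exactly the balance the passage says Stouffer could not achieve by random sampling.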

For my purposes, again, I am not primarily interested in such hypotheses. Many men may have feminine attitudes, and some rural dwellers may be highly sophisticated cosmopolitans. The allotment of individuals to the cells of the design, however, serves as a rough control of such effects, and the design helps us put together comparable P-samples.

...

[22]

STOUFFER'S THEORY


... What was novel in the case of Stouffer's study...was the assertion of a theory from which it was deduced that men would prefer to have their news from newspapers, women from radio, and so on. The theory consisted of a set of "preliminary speculations," consisting of two lists—one of advantages of radio and the other of advantages of newspapers—as follows:

ADVANTAGES OF RADIO

a) Radio delivers news first.
[etc., etc., ...]

ADVANTAGES OF THE NEWSPAPERS

f) It delivers fuller news.
[etc., etc., ...]


[23]

... Stouffer deduced that in general rural women have little time to read newspapers, can work while they listen to the news on the radio, and so on; therefore they would prefer to have news by radio. A comparable argument in terms of the postulated statements led him to deduce that urban men would prefer to have their news from newspapers. It may not be regarded as a profound theory, but it followed the pattern of the hypothetico-deductive method: postulates were asserted; a coordinating construct was introduced (the assumption that people prefer one medium to another); deductions were drawn; they were stated as testable propositions; and the testing on a 5,528 sample proved these to be true. Men actually did prefer newspapers, and women, radio, and so on.

Why then do I prefer to use factor analysis rather than such well-established methods? The reason is very simple: I do not accept the theories so employed, and I do not agree that they explain what they purport to explain. Thus one can provide a totally different set of "preliminary speculations" and derive the same hypotheses, proving them by the same 5,528 sample. Rural women, for example, are apt to think that newspapers are sinful, promoting sex, waste, violence, and the like; radio, instead, was thought of as a scientific marvel. From such "speculations" one can deduce well enough that rural women will prefer the news by radio rather than by newspaper. In general the hypothetico-deductive method has the failing that it ignores alternative explanations for tested hypotheses. I prefer to keep the door open for different theories, that is, for a different selection of postulates (from among innumerable available) in terms of which to explain data.

... whereas they ["preliminary speculations"] were all regarded as facts by Stouffer (requiring no proof in themselves), in our case all are synthetic propositions (all have excess meaning and cannot be proved easily except in general [averaging] terms).

It will become apparent that I make more use of information about subjects than is current in social science by large-sample methods. Individuals for us are never merely cyphers or numbers chosen by probability
[24]
methods; instead, they represent known interests—being young or old, male or female, blue-shirt or white-collar, expert or non-expert, for example—information which is of aid to us, usually, in explaining factors.

...


[29]

EXPLANATION IN Q-METHOD

...

Toulmin suggests that there are three logically distinct types of explanation—a stated reason, a reported reason, and a causal explanation in a material sense. When a person says he smokes to soothe his nerves, he is giving a stated reason. If his friends say that he is always on edge unless he has a cigarette in his hands, it is a reported reason. If a scientist says that the taste of cigarettes is due to the tar and nicotine content, the concern is with material causes. Philosophers, including Toulmin, are fond of denying any scientific status to stated reasons: if a man says he feels sick, no one need believe him, and what he says can never be proved either true or false, that is, by way of any stated reasoning. But this, unfortunately, is apt to be taken too literally—there are some who think of Q-sorts as in some way only stated, not reported or
[30]
causal in any scientific sense. Clearly, no one is suggesting that we have to believe a person's Q-sort self-description, but it is a simple matter to show whether or not what he says is reliable, and easy to show whether other regularities are to be observed for it as well, as when what he says about himself correlates with what others say about him. With such facts as a beginning, it is possible to go further and to give reported and causal reasons for the facts. Factors are evidence, of course, of systematic conditions.

It is important to distinguish between ad hoc and genuine hypotheses. In current jargon a fact is explained if it is the conclusion of a valid deductive inference—only genuine hypotheses explain anything. Thus, when salt is put in water, it dissolves and the salt is said to be soluble in water. Solubility is thus attributed to the salt. But this is an ad hoc explanation, not a genuine one; it tells us nothing new and nothing more than is contained in the statement that salt, put in water, dissolves. The explanation becomes a genuine one only if it can be said that in solution the molecules of salt are held in suspension in Brownian movement (or the like), for reasons that can be given deductively, involving other primitive tests.

Factor analysis is associated with R-methodology more than with Q. In R all explanation, however, is of an ad hoc nature. Thus, the factorist finds that mental tests involving numbers (arithmetic) are clustered in factor space, that is, they can be operationally classified as alike—and one calls the factor n, or number factor. Similarly v is the factor for verbal tests of intelligence. To so designate factors is clearly only of taxonomic interest—no genuine hypotheses are at issue. It would have been very different if tests of number and tests of color blindness happened to cluster as one factor; in that event the factor could scarcely have been called either number or color, and one's curiosity would have been whetted to find a genuine explanation for such an interesting fact. There is scarcely a single convincing fact of the kind in all R-method and, therefore, scarcely a genuine hypothesis anywhere at issue. Q-method, on the contrary, never concerns itself with ad hoc explanations; it is always involved in genuine explanations. Thus, if a woman gives a Q-sort self-description, and then (with the same Q-sample) a Q-sort description of what she thinks her husband is like and these two descriptions are alike, constituting a factor, one seeks a "cause" for
[31]
the identity; either the woman has misunderstood the instructions, or she is so identified with her husband that one suspects either idealization or projection. The nature of the factor itself, that is, the order of the statements of the Q-sample for the factor at issue, will allow us to say which it is. The concern in every instance of a factor in Q-method is with such genuine hypotheses, genuine explanations of this kind. Each leads directly to additional primitive tests; each is a conclusion to valid deductive possibilities. ...

SUMMARY

It may be said that I am putting the cart before the horse in defining a methodology without saying what the theory is that it purports to measure. A methodology is not merely a technique, however, but a profound way of approaching nature. The methodology involving the above definitions...is set in a basic theory placing subjectivity—a person's own reflections on matters—at the hub of all else. This premise extends over all psychological or social science, and this is not particular to our concern with mass communication theory.

Meanwhile it is important to have the above definitions and principles clearly stated. In summary form they are as follows:
    1. Communication via mass communication is grasped by persons, not audiences.
    2. It is in a tripartite context of the person (X), the media or social mechanisms (Y) and an event or message (Z).
    3. What communication means, what its effects are, what may or may
[32]
not result from it, is never directly a matter of ideas, notions, beliefs, attitudes, opinions, wishes, or the like, but always directly ideas, notions, beliefs, etc., of a person.
    4. The primary data, therefore, are the person's self-referent statements. These are primary elements of communication theory.
    5. A collection of these statements, for a particular XYZ, constitutes a Q-population, samples from which provide Q-samples with which X can perform Q-sort self-descriptions which are homologous with X's ideas, notions, beliefs, attitudes, etc. These are the basic operations of mass communication research.
    6. Correlated and factored, these provide an objective basis for classification and comparative study for any XYZ situation, whether for one X or for many.
    7. The factors are available for genuine explanation, that is, as matters of scientific theory.


---


re: abductive inference

Igor Douven
"Abduction"
The Stanford Encyclopedia of Philosophy
(Summer 2021 Edition)

The distinction between deduction, on the one hand, and induction and abduction, on the other hand, corresponds to the distinction between necessary and non-necessary inferences. In deductive inferences, what is inferred is necessarily true if the premises from which it is inferred are true; that is, the truth of the premises guarantees the truth of the conclusion.

...

It is standard practice to group non-necessary inferences into inductive and abductive ones. Inductive inferences form a somewhat heterogeneous class, but for present purposes they may be characterized as those inferences that are based purely on statistical data, such as observed frequencies of occurrences of a particular feature in a given population.

...

The mere fact that an inference is based on statistical data is not enough to classify it as an inductive one. You may have observed many gray elephants and no non-gray ones, and infer from this that all elephants are gray, because that would provide the best explanation for why you have observed so many gray elephants and no non-gray ones. This would be an instance of an abductive inference. It suggests that the best way to distinguish between induction and abduction is this: both are ampliative, meaning that the conclusion goes beyond what is (logically) contained in the premises (which is why they are non-necessary inferences), but in abduction there is an implicit or explicit appeal to explanatory considerations, whereas in induction there is not; in induction, there is only an appeal to observed frequencies or statistics.



Kyle Stanford
"Underdetermination of Scientific Theory"
The Stanford Encyclopedia of Philosophy
(Winter 2021 Edition)
Perhaps the most important division is between what we might call holist and contrastive forms of underdetermination. Holist underdetermination (Section 2 below) arises whenever our inability to test hypotheses in isolation leaves us underdetermined in our response to a failed prediction or some other piece of disconfirming evidence. That is, because hypotheses have empirical implications or consequences only when conjoined with other hypotheses and/or background beliefs about the world, a failed prediction or falsified empirical consequence typically leaves open to us the possibility of blaming and abandoning one of these background beliefs and/or ‘auxiliary’ hypotheses rather than the hypothesis we set out to test in the first place. But contrastive underdetermination (Section 3 below) involves the quite different possibility that for any body of evidence confirming a theory, there might well be other theories that are also well confirmed by that very same body of evidence.

...

Holist underdetermination ensures, Duhem argues, that there cannot be any such thing as a “crucial experiment”: a single experiment whose outcome is predicted differently by two competing theories and which therefore serves to definitively confirm one and refute the other.

...

Duhem thought that the sort of underdetermination he had described presented a challenge only for theoretical physics, but subsequent thinking in the philosophy of science has tended to the opinion that the predicament Duhem described applies to theoretical testing in all fields of scientific inquiry. We cannot, for example, test an hypothesis about the phenotypic effects of a particular gene without presupposing a host of further beliefs about what genes are, how they work, how we can identify them, what other genes are doing, and so on.

...

[quoting Quine]
“total science is like a field of force whose boundary conditions are experience. A conflict with experience at the periphery occasions readjustments in the interior of the field. But the total field is so underdetermined by its boundary conditions, experience, that there is much latitude of choice as to what statements to reevaluate in the light of any single contrary experience.”

