More thoughts on differential regression to the mean studies

In a number of previous posts I cited the results of differential regression to the mean studies as evidence against a within group variable factor causal hypothesis of the Black-White intelligence gap. I’ve noted in the comments section of one of my posts that my argument is tentative and rests on three premises:

(a) Differential sibling regression to the mean indexes depressive effect. (This is obviously true.)

(b) The slopes of the regression lines index the dispersion of the depressive effect — i.e., if the effect was not fairly uniformly spread out, the lines would converge at the far right. (It seems that this would be true if the sample sizes were large enough at the extremes for the simple reason that less depressed Blacks would have higher IQs and so higher IQ Blacks would tend to be less depressed in IQ — and so as one moves to the right of the bell curve the depression effect should be less and therefore by (a) so should the differential regression.)

and importantly

(c) That what was found prior holds today.

The key here is (b). If (b) holds, it would be worth my while to analyze some more recent data (e.g., the NLSY 97 or perhaps the recent ECLS) to see if (c) holds, and maybe to extend this to Hispanics, etc. This would be a time- and effort-consuming project, so I don’t plan to do it (or, at this point, pay someone else to) unless I can get confirmation about (b).

Let me give some background on this topic. Early on in the race IQ debate it was argued that the presence of differential regression would evidence genetic differences. To quote Sandra Scarr:

Regression effects can be predicted to differ for both blacks and whites if the two races indeed have genetically different population means. If the population mean for blacks is 15 points lower than that of whites, the offspring of high-IQ black parents should show greater regression towards a lower population mean than the offspring of Whites of equally high IQ. Similarly, the offspring of low-IQ black parents should show less regression than those of white parents of equally low IQ. (Scarr-Salapatek, 1971)

Racial Hereditarians such as the late Arthur Jensen investigated this issue by matching Blacks and Whites on IQ and comparing the IQs of their siblings. They confirmed the existence of differential regression and then offered this as evidence of a genetic hypothesis (Jensen, 1974; Jensen, 1998; Rushton and Jensen, 2005).

Jensen (1973, pp. 107–119) tested the regression predictions with data from siblings (900 White sibling pairs and 500 Black sibling pairs). These provide an even better test than parent–offspring comparisons because siblings share very similar environments. Black and White children matched for IQ had siblings who had regressed approximately halfway to their respective population means rather than to the mean of the combined population. For example, when Black children and White children were matched with IQs of 120, the siblings of Black children averaged close to 100, whereas the siblings of White children averaged close to 110. A reverse effect was found with children matched at the lower end of the IQ scale. When Black children and White children are matched for IQs of 70, the siblings of the Black children averaged about 78, whereas the siblings of the White children averaged about 85. The regression line showed no significant departure from linearity throughout the range of IQ from 50 to 150, as predicted by genetic theory but not by culture-only theory. (Rushton and Jensen, 2005, Thirty Years of Research on Race Differences in Cognitive Ability)
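The figures in this passage follow from a simple linear rule: a sibling's expected IQ sits a fixed fraction of the way between the matched child's IQ and the mean of the child's own group. The sketch below is only an illustration of that rule, not Jensen's procedure. It assumes population means of 100 and 85 and a regression slope of 0.5 (the "approximately halfway" in the quote), and the function name is hypothetical.

```python
def expected_sibling_iq(child_iq, group_mean, slope=0.5):
    """Expected sibling IQ when scores regress toward the group's own mean."""
    return group_mean + slope * (child_iq - group_mean)


WHITE_MEAN = 100.0  # assumed population mean
BLACK_MEAN = 85.0   # assumed population mean (15 points lower)

for iq in (120, 70):
    white_sib = expected_sibling_iq(iq, WHITE_MEAN)
    black_sib = expected_sibling_iq(iq, BLACK_MEAN)
    print(f"matched IQ {iq}: White sibling {white_sib:.1f}, Black sibling {black_sib:.1f}")
```

Under these assumptions the rule gives 110 and 102.5 for children matched at 120, and 85 and 77.5 for children matched at 70, which is close to the values quoted above.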

In counter, it has been argued that differential regression is a statistical tautology. Such arguments are still circulated. Misciting Flynn (2010), the Wikipedia article on Race and Intelligence notes:

Jensen and Rushton argue that regression toward the mean effects observed in studies comparing blacks and whites with high and low IQs to the IQs of their close relatives provide evidence of a polygenetic basis for the black/white IQ gap. However, other researchers have found Jensen’s arguments to be unpersuasive, noting that regression to the mean is merely a statistical artifact and cannot be used to isolate potential causal factors.

As Murray (1999) has pointed out, while it’s a mathematical given that two groups drawn from a common population will show regression, unless there is some differentiating factor with respect to the dimension measured, the groups will regress to a common mean. If two groups regress towards separate population means, a causal explanation is needed. Murray (1999) went on to argue that differential sibling regression ruled out a shared environmental explanation for the differential regression as matching siblings, by definition, matches for shared environment.

As Brody (2002) pointed out, however, this is an erroneous conclusion. Contra Murray (and Rushton and Jensen), the impact of shared environment on the sibling regression lines will be identical to the impact of shared genes. Brody (2002) went on to argue that differential regression is consistent with any environmental hypothesis. But as I pointed out, this is obviously not true. What is true is the exact reverse of what Murray (1999) argued: differential sibling regression is inconsistent with non-shared environmental factors. It’s caused by some combination of additive genetic and shared environmental differences between populations.

Now, in addition to indexing shared environmental effect, I have suggested that differential sibling regression, or more specifically the relative slopes of the sibling regression lines, can tell us about the dispersion of an effect causing a mean difference. As to that:

In a previous post, I contrasted three different environmental scenarios:

There are three possible exclusively environmental causal scenarios for the between group difference:

Factor-X, in which the factors causing the between group difference are unique to one group and do not vary in that group. Under this scenario, MI will not hold.

Within group variable factors, in which the factors causing the between group difference are those environmental ones, or a subset of them, that cause the differences within groups. And the aggregated effect of these factors is variably distributed between populations. Under this scenario, MI will hold and within group heritabilities will be noticeably different.

Within group uniform factors, in which the factors causing the between group difference are those environmental ones, or a subset of them, that cause the differences within groups. And the aggregated effect of these factors is uniformly distributed between populations. Under this scenario MI will hold, within group heritabilities will be the same, but an implausible homogenizing “Factor Y” is needed to explain the uniformity of effect between populations.

I rightly noted that only the one which I called “within group variable factors” is theoretically plausible. I also argued that differential sibling regression argues against some variable factor models. For example, a model in which, say, 10% of the African American population in the US is unaffected by the factors depressing the Black mean is untenable, given the simple math:

This differential regression is an index of the depressing effect (genetic or environmental) causing the group deviation. But this effect is no less at the far right of the bell curve than at the far left. To see the implications of this, imagine if 15% of the Black population wasn’t depressed in IQ and if the other 85% was and was so equally. If so, 15% would be depressed in IQ 0 SD relative to the White mean and the remaining 85% would be depressed 1.18 SD, thus producing an average Black-White difference of 1 SD. Were this the case, at an IQ of 130, given a normal distribution, we would have roughly 2.1% x 15% of unaffected Blacks (= 0.315%) and 0.069% x 85% of affected Blacks (= 0.059%). The ratio of unaffected to affected Blacks at this IQ would be over 5 to 1. Were this the case, the sib regression difference (at 130) would be less than one fifth of what it was at an IQ of 85, since this differential regression indexes our depressive effect. And yet it is not.
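The arithmetic in that passage can be checked with a short script. This is a rough sketch under stated assumptions, not the original calculation: it assumes both subgroups are normally distributed with an SD of 15, with the unaffected subgroup centered on 100, and it uses the share of each subgroup at or above the threshold rather than the share in a band around it, so the individual percentages differ slightly from the rounded ones quoted above. The function names are hypothetical.

```python
import math


def upper_tail(z):
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def mixture_at_threshold(iq, unaffected_share=0.15, gap_sd=1.0,
                         mean=100.0, sd=15.0):
    """Shares of the whole group at or above `iq`, split by subgroup.

    Assumes the affected subgroup is shifted down by a constant amount chosen
    so that the overall group mean sits `gap_sd` below `mean`.
    """
    affected_share = 1.0 - unaffected_share
    shift_sd = gap_sd / affected_share          # 1 / 0.85, about 1.18 SD
    z = (iq - mean) / sd
    unaffected = unaffected_share * upper_tail(z)
    affected = affected_share * upper_tail(z + shift_sd)
    return shift_sd, unaffected, affected, unaffected / affected


shift_sd, unaffected, affected, ratio = mixture_at_threshold(130)
print(f"shift of affected subgroup: {shift_sd:.2f} SD")
print(f"unaffected at or above 130: {unaffected:.4%}")
print(f"affected at or above 130:   {affected:.4%}")
print(f"ratio unaffected/affected:  {ratio:.1f}")
```

With these assumptions the unaffected subgroup outnumbers the affected one by a little more than 5 to 1 above an IQ of 130, while at an IQ of 85 the affected subgroup is the majority, which is the contrast the argument relies on.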

I didn’t, however, explore alternative, perhaps more realistic, within group variable factor models. What if, for example, the effect depressing the Black mean was normally distributed?

To explore this, I created a simple sibling regression model for Whites and for Blacks, assuming that Blacks were variably depressed in IQ by 15 IQ points on average, with a standard deviation of depression of 7 points. By this model, less than 2.5% of Blacks are depressed ~0 standard deviations and less than 2.5% of Blacks are depressed ~2 standard deviations. I matched Blacks and Whites on IQ and compared the IQs of their siblings. The two sibling regression lines, theoretical Black (in black and in the middle) and theoretical White (in blue and on top), are shown below. As one would predict, the deviation between the regression lines narrows slightly as IQ increases. The slope of the White sibling regression line was 0.6 and the slope of the Black sibling regression line was 0.66. The lines converge at an IQ of approximately 170.

To compare these theoretical regression lines to the actual regression line exhibited by Blacks, I plotted the sibling regression line of Black siblings (in purple) from the National Longitudinal Survey of Youth 1997 (N = 252 sibling pairs). I did not have White sibling data on hand, so these are in comparison to my theoretical White line. As can be seen, the slope of the NLSY 97 Black regression line (in red on the bottom) is 0.52. The IQs of the siblings of IQ matched Blacks and Whites diverge slightly as scores increase. This is consistent with the results found by Murray (1999), in which the scores of Black and White siblings in the NLSY 79 and CNLSY were examined.

[Figure: sibling regression to the mean, NLSY 97 Black line versus theoretical Black and White lines]
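The theoretical lines can be approximated in closed form. The sketch below is one possible reconstruction, not necessarily the model used for the figure: it assumes each IQ score is the sum of a family component shared by siblings and an individual component, that the White sibling correlation is 0.6 with an SD of 15, and that the depressive effect is shared by siblings, independent of the other components, and normally distributed with mean 15 and SD 7. All names are hypothetical.

```python
def sibling_line(group_mean, shared_var, unique_var):
    """Slope and intercept of the regression of one sibling's IQ on the other's.

    With a shared family component and an independent individual component,
    the slope is the sibling correlation: shared variance over total variance.
    """
    slope = shared_var / (shared_var + unique_var)
    intercept = group_mean * (1.0 - slope)
    return slope, intercept


# Assumed parameters (labeled above as assumptions)
WHITE_MEAN, WHITE_SD, WHITE_SIB_CORR = 100.0, 15.0, 0.6
DEPRESSION_MEAN, DEPRESSION_SD = 15.0, 7.0

shared = WHITE_SIB_CORR * WHITE_SD ** 2          # family-level variance
unique = (1.0 - WHITE_SIB_CORR) * WHITE_SD ** 2  # individual-level variance

white_slope, white_intercept = sibling_line(WHITE_MEAN, shared, unique)

# A variable, sibling-shared depression adds its variance to the shared part
# and lowers the group mean by its average size.
black_slope, black_intercept = sibling_line(
    WHITE_MEAN - DEPRESSION_MEAN, shared + DEPRESSION_SD ** 2, unique
)

# IQ at which the two regression lines meet
crossing_iq = (white_intercept - black_intercept) / (black_slope - white_slope)

print(f"White slope: {white_slope:.2f}")
print(f"Black slope: {black_slope:.2f}")
print(f"Lines converge near IQ {crossing_iq:.0f}")
```

Under these assumptions the Black slope comes out slightly steeper than the White slope and the lines meet at an IQ in the high 160s, in line with the slight narrowing and the convergence point described above. The design choice that matters is treating the depressive effect as shared within families: an effect that varied independently between siblings would add to the individual component instead and flatten the Black line rather than steepen it.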

The comparison with the NLSY 97 data, then, is consistent with what I was saying. To refresh, my overall argument has been:

(a) There are three possible purely environmental causal scenarios for the B/W gap. Of these, only one is tenable: within group variable factors.

(b) By a within group variable factor scenario, higher IQ Blacks will be less depressed on average, because less depressed Blacks on average will have higher IQs

(c) Differential sibling regression indexes depressive effect

(d) Given (c), differential sibling regression should narrow as IQ increases if the between group difference is in fact caused by within group variable environmental factors

(e) There is no such narrowing

(f) As such, no purely environmental causal scenario is tenable

The major problem with my overall argument is (e), specifically demonstrating this premise. The differences in the slopes of the regression lines are so small (e.g., 0.52 (actual) versus 0.66 (predicted by a VE hypothesis, M=15, SD=7)) as to require a massive sample size to establish a significant difference between them. In short, it’s difficult to establish that Blacks are exhibiting a differential regression of a nature inconsistent with that which would be produced by purely variable shared environmental factors. (Data sets exist which allow for this; they are just not public use.) As such, I can only rule out certain variable factor models, such as ones which maintain that an appreciable percent of the Black population is unaffected by the factors depressing the mean.
