Analyzing racial disparities in socioeconomic outcomes in three NCES datasets

Last Updated on July 18, 2022

In a previous post, I cited several studies showing that racial disparities in many important social outcomes are largely driven by racial disparities in cognitive ability. This post expands on those findings by demonstrating similar patterns in three nationally representative datasets that I have not yet considered elsewhere. The datasets include data on socioeconomic outcomes from the early 1990s to the early 2010s. I examine how racial disparities in educational attainment, occupational prestige, and income (the three primary measures of socioeconomic status, as explained here) are related to factors such as parental income, high school academic achievement, and family structure. My main focus is on disparities between blacks and whites. I find that the vast majority (over 90%) of the adulthood income gap is explained by some combination of these factors, and that virtually all of the disparity in educational attainment and occupational prestige is explained by high school achievement.

The data


Before presenting the results, it is worth briefly describing each dataset. Each comes from a nationally representative, longitudinal study conducted by the National Center for Education Statistics (NCES). Each study follows respondents beginning in middle or high school, with regular follow-up interviews extending 8-10 years after high school. These datasets make it possible to examine how various factors relate to racial disparities in measures of socioeconomic status (e.g., income). The following graph illustrates the timelines and follow-ups of each of the studies.

All of my analyses are performed using the NCES DataLab tool, a free web tool available to anyone who registers an account. It can run many simple (yet useful) statistical analyses on different datasets: for example, computing averages, medians, and percentage distributions of selected variables while filtering on other variables. It can also run linear and logistic regressions. The regression features are somewhat limited (e.g., no interaction terms, no variable transformations), but the available functionality is sufficient to provide highly useful insight. The UI is also fairly intuitive and easy to learn. I suggest that readers interested in this topic try out the tool. At the bottom of this post, I include table specifications that I exported from the site; readers can import these specifications to reproduce my findings.

For my purposes, I rely on the regression analysis feature of the DataLab. For each dataset and each socioeconomic outcome (e.g., income), I analyze the influence of various predictors on racial disparities in that outcome as follows. First, I run a regression with only race and sex as independent variables. This baseline model estimates the baseline effect of race on the outcome. Then I repeat the regression after adding independent variables for the predictors of interest (e.g., parental income, high school GPA). The change in the race effect between the baseline model and the expanded models shows how much of the race effect can be (statistically) explained by the added variables.
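To make that comparison concrete, here is a minimal Python sketch. It assumes the share "explained" is the relative change in the race coefficient on the model's linear scale (dollars for a linear regression, log-odds for a logistic regression); the post does not state an exact formula, so this is one reasonable reading, and the function names are hypothetical. The example values are the HSB black-white odds ratios for bachelor's degree attainment reported later in this post.

```python
import math


def share_explained(base_coef: float, full_coef: float) -> float:
    """Fraction of the baseline race coefficient removed by added controls.

    Assumption: coefficients are on the model's linear scale (dollars for a
    linear regression, log-odds for a logistic regression). A value above 1
    means the gap not only vanished but reversed sign.
    """
    if base_coef == 0:
        raise ValueError("baseline coefficient must be nonzero")
    return (base_coef - full_coef) / base_coef


def share_explained_from_odds_ratios(base_or: float, full_or: float) -> float:
    """Same comparison for logistic models reported as odds ratios."""
    # Odds ratios are exponentiated log-odds coefficients, so take logs first.
    return share_explained(math.log(base_or), math.log(full_or))


# HSB, black vs. white: baseline 0.34, parental income 0.45, HS achievement 1.14.
print(round(share_explained_from_odds_ratios(0.34, 0.45), 2))  # 0.26
print(round(share_explained_from_odds_ratios(0.34, 1.14), 2))  # 1.12
```

Read this way, parental income accounts for roughly a quarter of the baseline log-odds gap, while high school achievement more than accounts for it.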

To follow the results in this post, it may help to know how to read regression tables. There are some useful pages here and here on interpreting results from a linear regression, and a useful set of slides here on interpreting results from a logistic regression.

The next few sections will describe the variables of each dataset in a bit more detail.

High School and Beyond

The High School and Beyond study (HSB) is a nationally representative, longitudinal study of 10th and 12th graders in 1980. The study began with two cohorts of students, one of seniors and one of sophomores. Follow-up surveys were conducted in 1982, 1984, and 1986 for both cohorts, and a 4th follow-up survey of the sophomore cohort was conducted in 1992, 10 years after high school. High school and postsecondary transcripts were collected in 1983 and 1993, respectively. A 5th follow-up survey was conducted in 2014, but the data is not yet available in the DataLab.

I used the following variables to measure socioeconomic outcomes in 1991-1993, about 10 years after high school. These were used as dependent variables in the regression models:

  • Total household income before taxes in 1991 (Y4601C): A continuous variable indicating the household income of the subject. The mean household income was $35k with a standard deviation of $31k. No data for 1.9% of subjects.
  • Consolidated highest degree 1993 (DEGREE93): A categorical variable indicating whether the subject’s highest degree is no degree (67%), a certificate (5.1%), an associate’s degree (5.6%), a bachelor’s degree (16%), or a graduate/professional degree (6.8%).
  • Occupation code in 1992 (Y4303FA): A categorical variable indicating the occupation of the subject (e.g., clerical-financial, laborer, manager-retail, professional-engineer, etc.). No data for about 16% of subjects.

The predictors used were as follows. These were used as independent variables in the regression models:

  • Race/ethnicity in 1992 (RACE4): A categorical variable indicating whether the subject identified as Native American/Alaska Native (1.3%), Asian/Pacific Islander (1.2%), Black (13%), White (75%), or Hispanic (8%). No data for 1.5% of subjects.
  • Gender (SEX): A categorical variable for male (49.4%) or female (50.6%).
  • Collapsed family income in 1980 (INCOME): A categorical variable indicating the family income of the subject during the base year. About 17% of subjects had family incomes below $15k, whereas 14% had incomes over $40k. No data for 26% of subjects.
  • Senior test percentile-adjusted (SRTSPCT2): A continuous variable indicating the percentile of the composite score from a mini-SAT test given to the subjects in grade 12. SAT and ACT scores were used for some subjects who didn’t take the mini-SAT test. No data for 8.35% of subjects.
  • Grades in high school (HSGRADES): A categorical variable with 7 values: Mostly A, 90-100 (3.8%); Half A/B, 85-89 (12%); Mostly B, 80-84 (20%); Half B/C, 75-79 (27%); Mostly C, 70-74 (24%); Mostly C/D, 65-69 (10%); Mostly D, 60-64 (1.8%). No data for 1.13% of subjects.
  • Marital and parental status in 1992 (FMFRM92): A categorical variable with 8 values: Married no children (15.06%), Married with children (35.64%), Divorced/separated/widowed no children (2.77%), Divorced/separated/widowed with children (6.62%), Never married no children (29.84%), Never married with children (7.02%), Living together no children (0.37%), Living together with children (0.43%). No data for 2.3% of subjects.

The National Education Longitudinal Study of 1988

The National Education Longitudinal Study of 1988 (NELS:88) is a nationally representative, longitudinal study of 8th graders in 1988. Follow-up surveys were conducted in 1990, 1992, 1994, and 2000, including data up to 8 years after high school. High school and postsecondary transcripts were collected in 1992 and 2000, respectively.

I used the following variables to measure socioeconomic outcomes in 2000, about 8 years after high school. These were used as dependent variables in the regression models:

  • Income of respondent in 1999 (asked in 2000) (F4HI99): A continuous variable indicating the income of respondents who worked in 1999 (only these respondents have data). The mean was $26k with a standard deviation of $19k. No data for 6.5% of subjects.
  • Current/previous occupation code (2000) (F4BXOCCD): A categorical variable describing the current/most recent job for the subject (e.g., secretaries and receptionists, laborer, educators, etc.). No data for about 3% of subjects.
  • Highest postsecondary education degree attained as of 2000 (F4HHDG): A categorical variable indicating whether the subject has attained a certificate/license (9.4%), an associate’s degree (6.6%), a bachelor’s degree (25%), a master’s degree (2.6%), or a Ph.D./professional degree (0.5%), or has some postsecondary education (PSE) but no degree (30%). The variable only covers subjects with some PSE. No data for 26% of subjects.

The predictors used were as follows. These were used as independent variables in the regression models:

  • Old definition of race of respondent (2000) (F4RACE): A categorical variable indicating whether the subject is Asian or Pacific Islander (3.8%), Hispanic (11%), black (13%), white (70%), or Native American or Alaska Native (1.7%). No data for 0.27% of subjects.
  • Gender (2000) (F4SEX): A categorical variable for male (50%) or female (50%).
  • Family income (1991) (F2P74): A categorical variable indicating the income of the subject’s family during high school. About 20% of subjects had family incomes exceeding $50k and about 19% had family incomes below $20k. No data for 22% of subjects.
  • Math proficiency centiles (1992) (F22XMC): A continuous variable indicating the percentile of the subject’s performance on a math proficiency test during high school. No data for 31% of subjects.
  • Marital status (2000) (F4GMRS): A categorical variable indicating whether the subject is single/unmarried (52%), married (40%), divorced (4.8%), separated (2.0%), widowed (0.1%), or in a marriage-like relationship (1.0%). No data for about 0.4% of subjects.
  • Number of biological children (2000) (F4GNCH): A categorical variable indicating the number of children of the subject. About 58% of subjects had no children, 20% had 1 child, and about 20% had more than 1 child.

The Education Longitudinal Study of 2002

The Education Longitudinal Study of 2002 (ELS:2002) is a nationally representative, longitudinal study of 10th graders in 2002 and 12th graders in 2004. Follow-up surveys were conducted in 2006 and 2012, 8 years after high school. High school and postsecondary transcripts were collected in 2004 and 2012, respectively.

I used the following variables to measure socioeconomic outcomes as of the third follow-up (2011-2012), about 8 years after high school. These were used as dependent variables in the regression models:

  • 2011 employment income: R only (F3ERN2011): A continuous variable indicating the respondent’s earnings from employment during the 2011 calendar year. The mean income was about $26k with a standard deviation of $22k.
  • Highest level of education earned as of F3 (F3ATTAINMENT): A categorical variable indicating the respondent’s level of education. About 16% of subjects had no postsecondary attendance, about 3% had no high school credentials, and about 34% had a bachelor’s degree or higher.
  • 2-digit ONET code for current/most recent job (F3ONET2CURR): A categorical variable indicating the ONET code for the subject’s occupation, including management (8%), sales and related (4.4%), office and administrative support (8.4%) occupations, etc.

The predictors used were as follows. These were used as independent variables in the regression models:

  • Student’s race/ethnicity (F1RACE): A categorical variable for American Indian/Alaska Native (1.0%), Asian or Hawaii/Pacific Islander (4.2%), black (14%), Hispanic with no race specified (7.2%), Hispanic with race specified (9.0%), multi-racial (4.1%), or white (60%).
  • Sex (F1SEX): A categorical variable for male (50.4%) or female (49.6%).
  • Total family income (BYINCOME): A categorical variable indicating the subject’s family income during high school (2001). About 20% of respondents had family incomes above $75k, and about 20% had family incomes below $25k.
  • Standardized test composite score: math, reading (BYTXCSTD): A continuous variable indicating the average of the subject’s math and reading scores during the base interview in 2002, standardized to a national mean of 50 and standard deviation of 10.
  • GPA for all courses taken in grades 9-12 (F1RGPP2): A categorical variable indicating the subject’s GPA during the 1st follow-up in 2004. About 40% of subjects had GPAs over a 3.0, 40% between 2.0 and 3.0, and about 20% below 2.0.
  • Marital status as of F3 (F3MARRSTATUS): A categorical variable indicating the marital status of the subject during the 3rd follow-up. About 28% of subjects were married and 66% were never married. About 5% were divorced, widowed, or separated. No data for 1% of subjects.
  • Whether R has any biological children (F3D06): A categorical variable indicating whether the subject does (34%) or does not (65%) have biological children during the 3rd follow-up. No data for 1% of subjects.

Educational attainment


Models

For each dataset, I run various logistic regressions to determine how different predictors influence the relationship between race and educational attainment. For each model, the dependent variable is a binary variable indicating whether the participant has attained a bachelor’s degree or higher. I consider 4 models for each dataset. The independent variables in each model are as follows:

  • Baseline: race and sex.
  • Parental Income: race, sex, and parental income.
  • High School Achievement: race, sex, and high school achievement.
  • Parental Income + High School Achievement: race, sex, parental income, and high school achievement.

Each regression model estimates the relationship between each independent variable and the odds of attaining a bachelor’s degree or higher, while controlling for all other independent variables. I will focus solely on the relationship between race and the odds of attaining a bachelor’s degree. To estimate the influence of the predictors on racial disparities in educational attainment, I first note the effect of race while only including race and sex in the regression model. Then I compare that to the effect of race after introducing more variables in the regression model.
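For HSB, the binary outcome could be derived from DEGREE93 roughly as in the following Python sketch. The category labels paraphrase the variable description earlier in the post and are assumptions; the codes in an actual DataLab export may differ, and the function name is hypothetical.

```python
# Hypothetical labels paraphrasing the DEGREE93 categories described above.
DEGREE93_LEVELS = [
    "no degree",
    "certificate",
    "associate's degree",
    "bachelor's degree",
    "graduate/professional degree",
]

# Categories counted as "bachelor's degree or higher".
BACHELORS_OR_HIGHER = {"bachelor's degree", "graduate/professional degree"}


def bachelors_or_higher(degree: str) -> int:
    """Return 1 if the highest degree is a bachelor's or above, else 0."""
    if degree not in DEGREE93_LEVELS:
        raise ValueError(f"unknown degree category: {degree!r}")
    return int(degree in BACHELORS_OR_HIGHER)


print([bachelors_or_higher(d) for d in DEGREE93_LEVELS])
```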

Variables

To measure educational attainment, parental income, and high school achievement, I use the following variables for each dataset:

HSB
  • Educational attainment: Consolidated highest degree 1993 (DEGREE93)
  • Parental income: Collapsed family income in 1980 (INCOME)
  • High school achievement: Senior test percentile-adjusted (SRTSPCT2) and Grades in high school (HSGRADES)
NELS:88
  • Educational attainment: Highest postsecondary education degree attained as of 2000 (F4HHDG)
  • Parental income: Family income (1991) (F2P74)
  • High school achievement: Math proficiency centiles (1992) (F22XMC)
ELS:2002
  • Educational attainment: Highest level of education earned as of F3 (F3ATTAINMENT)
  • Parental income: Total family income (BYINCOME)
  • High school achievement: Standardized test composite score: math, reading (BYTXCSTD) and GPA for all courses taken in grades 9-12 (F1RGPP2)

The variables were described at the beginning of the post.

For example, for the HSB survey, to control for high school achievement, I included both the senior test scores and grades in high school as independent variables in the model. To control for parental income, I included collapsed family income in 1980 in the model.
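The four nested models for each dataset, plus the restriction to subjects with valid data on every variable (described in the per-dataset results), can be sketched in Python as follows. This is a hypothetical illustration that only assembles predictor lists and filters records; it fits nothing. The dictionary layout and function names are assumptions, while the variable names are the NCES names listed in this post.

```python
from typing import Dict, List

# NCES variable names from this post, grouped by role (layout is hypothetical).
DATASETS: Dict[str, Dict[str, List[str]]] = {
    "HSB": {
        "outcome": ["DEGREE93"],
        "race_sex": ["RACE4", "SEX"],
        "parental_income": ["INCOME"],
        "hs_achievement": ["SRTSPCT2", "HSGRADES"],
    },
    "NELS:88": {
        "outcome": ["F4HHDG"],
        "race_sex": ["F4RACE", "F4SEX"],
        "parental_income": ["F2P74"],
        "hs_achievement": ["F22XMC"],
    },
    "ELS:2002": {
        "outcome": ["F3ATTAINMENT"],
        "race_sex": ["F1RACE", "F1SEX"],
        "parental_income": ["BYINCOME"],
        "hs_achievement": ["BYTXCSTD", "F1RGPP2"],
    },
}


def nested_models(dataset: str) -> Dict[str, List[str]]:
    """Predictor lists for the four nested models of one dataset."""
    v = DATASETS[dataset]
    base = list(v["race_sex"])
    return {
        "Baseline": base,
        "Parental Income": base + v["parental_income"],
        "High School Achievement": base + v["hs_achievement"],
        "Parental Income + High School Achievement":
            base + v["parental_income"] + v["hs_achievement"],
    }


def complete_cases(rows: List[Dict[str, object]], dataset: str) -> List[Dict[str, object]]:
    """Keep only records with non-missing values for every variable used,
    so that all four models are estimated on the same subjects."""
    needed = [name for names in DATASETS[dataset].values() for name in names]
    return [r for r in rows if all(r.get(name) is not None for name in needed)]


for model, predictors in nested_models("HSB").items():
    print(f"{model}: {', '.join(predictors)}")
```

Fitting every model on the same complete-case sample matters here: otherwise a change in the race coefficient could partly reflect a change in who is in the sample rather than the added controls.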

Side point: For each survey, I wanted to measure high school achievement using both high school grades and composite scores on standardized tests. I was able to do this for the HSB and ELS surveys. However, there were two problems with the NELS:88 survey. First, the variables that measured high school grades (GPA and HSGPAV) only contained data for about 60% of subjects. More importantly, the subsample of subjects with valid GPA data seemed highly unrepresentative of the overall sample, so I excluded GPA to avoid distorting the results. Second, there was no adequate composite score for high school standardized testing that I could use. There was ACT/SAT data, but it only covered about 40% of subjects and was limited to subjects with postsecondary transcripts, so it likely wasn’t representative of the overall sample. There were also variables named “test percentile (1992)” and “Senior test quintile”, but there was no description of their content. So instead I used a math proficiency test as the measure of high school achievement.

Baseline disparities

Before showing how racial disparities in educational attainment change after controlling for other variables, it’s worth examining the raw differences without any controls. Here are the bachelor’s degree attainment rates by race in each of the three datasets:

Bachelor’s degree attainment by race

Group | HSB | NELS:88 | ELS:2002
Asian | 51.30% | 51.70% | 50.10%
White | 39.00% | 44.10% | 39.80%
Black | 17.10% | 26.70% | 19.80%
Hispanic | 17.60% | 20.40% | 18.70%
Native American | 15.70% | 17.00% | 17.10%
Multi-racial | N/A | N/A | 27.90%

As expected, Asians have much higher rates of bachelor’s degree attainment than whites, who in turn have an even larger advantage over blacks, Hispanics, and Native Americans.
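The baseline odds ratios in the regression tables below are closely tied to these raw rates. As a rough check, this Python sketch (function names are hypothetical) converts the HSB attainment rates for blacks and whites into an unadjusted odds ratio. It should not be expected to match the baseline model exactly, because that model also controls for sex and is estimated on a restricted sample.

```python
def odds(p: float) -> float:
    """Convert a probability (rate) to odds."""
    return p / (1.0 - p)


def unadjusted_odds_ratio(p_group: float, p_reference: float) -> float:
    """Odds ratio of one group relative to a reference group, with no controls."""
    return odds(p_group) / odds(p_reference)


# HSB bachelor's attainment rates from the table above: black 17.1%, white 39.0%.
ratio = unadjusted_odds_ratio(0.171, 0.390)
print(round(ratio, 2))  # 0.32
```

The result (about 0.32) is close to the HSB baseline estimate of 0.34. Note that an odds ratio is not a ratio of rates: the ratio of the two attainment rates themselves is about 0.44.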

High School and Beyond

I ran 4 regressions for educational attainment in the HSB study. For each regression, I only included subjects with valid data for degree attainment, parental income, grades, and mini-SAT test scores in high school. This resulted in 9,500 total subjects. The results of each regression are as follows:

Odds ratios from logistic regressions for bachelor’s degree attainment in 1993

Variable | Baseline | Parental Income | HS grades + HS test scores | Parental Income + HS grades + HS test scores
Intercept | 0.77 | 0.38 | 0.03 | 0.02
Black | 0.34* | 0.45* | 1.14 | 1.35*
Asian/Pacific Islander | 1.60* | 1.75* | 1.68* | 1.75*
Hispanic | 0.37* | 0.45* | 0.76 | 0.84
Native American | 0.29* | 0.35* | 0.71 | 0.80

The columns here correspond to different models and the rows correspond to different racial/ethnic groups. Each cell indicates the odds ratio of attaining a bachelor’s degree for that racial/ethnic group relative to white people (whites are used as the reference group). Thus, for example, a 1 in a cell would indicate that a member of the racial/ethnic group of that cell has the same odds as a white person of attaining a bachelor’s degree. A number greater than 1 indicates that members of that racial/ethnic group are more likely to attain a bachelor’s degree, whereas a number less than 1 indicates that members are less likely to do so. The * indicates that there is a statistically significant difference (at the 5% significance level) between whites and members of the racial/ethnic group in their odds of attaining a bachelor’s degree. The regression tables presented for educational attainment in the other datasets should be interpreted in the same way.

For example, the 0.34 in the table above indicates that, in the baseline model (which controls only for sex), black people have 0.34 times the odds of attaining a bachelor’s degree or higher relative to white people. In other words, blacks have 66% lower odds of attaining a bachelor’s degree or higher compared to whites. However, in the high school achievement model (which controls for high school grades and mini-SAT test scores), black people have 1.14 times the odds of doing so, i.e., 14% higher odds than whites, although the difference is not statistically significant. As you can see, controlling for parental income is not enough to eliminate the black-white disparity in bachelor’s degree attainment, but controlling for high school achievement is. In fact, controlling for both parental income and high school achievement gives blacks 35% higher odds than whites of attaining a bachelor’s degree, a statistically significant difference.
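The percentage interpretations above come from a one-line conversion, (odds ratio - 1) x 100. A small Python sketch (the function name is hypothetical) applied to the HSB black-white odds ratios:

```python
def pct_change_in_odds(odds_ratio: float) -> float:
    """Percent difference in odds implied by an odds ratio (negative = lower odds)."""
    return (odds_ratio - 1.0) * 100.0


# HSB black-white odds ratios for bachelor's degree attainment.
for model, ratio in [
    ("Baseline", 0.34),
    ("HS grades + HS test scores", 1.14),
    ("Parental Income + HS grades + HS test scores", 1.35),
]:
    print(f"{model}: {pct_change_in_odds(ratio):+.0f}% odds vs. whites")
```

This prints -66%, +14%, and +35%. These are differences in odds, not probabilities: a given percentage difference in odds always corresponds to a smaller percentage difference in probability, and the discrepancy grows as the outcome becomes more common.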

Asian respondents had significantly higher (60-75%) odds than whites of attaining a bachelor’s degree in all models. Native Americans and Hispanics had lower odds than whites regardless of controls, although the gap was statistically insignificant after controlling for high school achievement.

The National Education Longitudinal Study of 1988

I ran 4 regressions for educational attainment in the NELS study. For each regression, I only included subjects with valid data for degree attainment, parental income, and math proficiency in high school. This resulted in 6,400 total subjects. The results of each regression are as follows:

Odds ratios from logistic regressions for bachelor’s degree attainment in 2000

Variable | Baseline | Parental Income | HS math proficiency | Parental Income + HS math proficiency
Intercept | 0.82 | 0.29 | 0.05 | 0.03
Black | 0.44* | 0.63* | 0.95 | 1.17
Asian/Pacific Islander | 1.16 | 1.23 | 1.13 | 1.17
Hispanic | 0.37* | 0.56* | 0.60* | 0.78
Native American | 0.32* | 0.39* | 0.73 | 0.85

The results here are similar to those from the HSB survey. Controlling only for sex, blacks had 56% lower odds of attaining a bachelor’s degree than whites. Controlling for parental income reduced this disparity somewhat. Controlling for HS math proficiency essentially eliminated the gap. Also like the previous survey, Asians had higher odds than whites of attaining a bachelor’s degree in all models, although none of the differences were statistically significant.

The Education Longitudinal Study of 2002

I ran 4 regressions for educational attainment in the ELS study. For each regression, I only included subjects with valid data for degree attainment, parental income, high school grades, and standardized test scores in high school. This resulted in 12,000 total subjects. The results of each regression are as follows:

Odds ratios from logistic regressions for bachelor’s degree attainment in 2011

Variable | Baseline | Parental Income | HS grades + HS test scores | Parental Income + HS grades + HS test scores
Intercept | 0.58 | 0.19 | 0.00 | 0.00
Black | 0.38* | 0.54* | 1.55* | 1.80*
Asian/Pacific Islander | 1.53* | 1.92* | 1.65* | 1.87*
Hispanic | 0.34* | 0.47* | 0.87 | 1.03
Native American | 0.30* | 0.39* | 0.74 | 0.83
Multi-racial | 0.59* | 0.69* | 1.00 | 1.09

The results here are in line with the previous two studies. Controlling only for sex, blacks had significantly (62%) lower odds of attaining a bachelor’s degree or higher compared to whites. Controlling for parental income only slightly reduced the racial gap. Controlling for high school achievement reversed the gap, so that blacks had 55% higher odds of attaining a bachelor’s degree or higher. Asians had significantly higher odds than whites of attaining a bachelor’s degree in all models.

Summary

The main findings are as follows:

  • When controlling only for sex, black respondents had about 55-65% lower odds of attaining a bachelor’s degree or higher compared to white subjects. Introducing controls for parental income reduced this disparity only modestly: blacks still had about 35-55% lower odds than whites of attaining a bachelor’s degree or higher. Controlling for high school achievement either erased or reversed the black-white gap in odds of bachelor’s degree attainment in all datasets.
  • Asians had higher odds than whites of bachelor’s degree attainment in each model of each dataset. The Asian advantage was statistically significant in each of the models of the HSB and ELS surveys, but was statistically insignificant in each of the models of the NELS survey.
  • The data for Hispanic, Native American, and multi-racial subjects all showed patterns similar to those of the black subjects. Controlling only for sex, these groups had significantly lower odds than whites of attaining a bachelor’s degree or higher; in fact, their odds were often lower than those of blacks. This gap persisted after controlling for parental income. Controlling for high school achievement usually rendered the gaps statistically insignificant. However, unlike for black subjects, controlling for high school achievement never produced a statistically significant advantage for these minority groups over whites.

Occupational prestige


Models

For each dataset, I run various logistic regressions to determine how different predictors influence the relationship between race and occupational prestige. For each model, the dependent variable is a binary variable indicating whether the participant works in a prestigious occupation. I consider 4 models for each dataset. These are the same models used in the previous section to predict educational attainment, so I will not describe them again.

Variables

To determine whether a subject worked in a prestigious occupation, I mainly relied on my own judgment. The occupations that I classified as prestigious in each dataset were as follows:

  • High School and Beyond: Using the occupation code in 1992 (Y4303FA) variable, prestigious occupations were those coded as professional-legal, professional-engineer, technical-computer related, professional-other, professional-arts, physician, and professional-medical.
  • The National Education Longitudinal Study of 1988: Using the current/previous occupation code (2000) (F4BXOCCD) variable, prestigious occupations were those coded as executive managers, computer programmers, financial service professionals, engineers/architects/software engineers, computer systems/related professionals, legal professionals, medical licensed professionals, and scientists and statistician professionals.
  • The Education Longitudinal Study of 2002: Using the 2-digit ONET code for current/most recent job (F3ONET2CURR) variable, prestigious occupations were those coded as management, business and financial operations, computer and mathematical, architecture and engineering, or life, physical, and social science occupations. These are the occupations with ONET codes 11-19, which are the top two groups in the intermediate aggregation described by the U.S. Bureau of Labor Statistics (see Table 3).
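The classification rules above can be expressed as simple predicates. In this hypothetical Python sketch, the HSB labels are the category names as written in this post (an actual export may use numeric codes instead), and the ELS rule is the O*NET 2-digit range 11-19; the function names are illustrative only.

```python
# HSB Y4303FA occupation labels treated as prestigious, as listed above.
HSB_PRESTIGIOUS = {
    "professional-legal",
    "professional-engineer",
    "technical-computer related",
    "professional-other",
    "professional-arts",
    "physician",
    "professional-medical",
}


def is_prestigious_hsb(occupation_label: str) -> bool:
    """True if an HSB occupation label is in the prestigious set."""
    return occupation_label.strip().lower() in HSB_PRESTIGIOUS


def is_prestigious_els(onet2: int) -> bool:
    """True for 2-digit O*NET codes 11-19 (management through
    life, physical, and social science occupations)."""
    return 11 <= onet2 <= 19


print(is_prestigious_hsb("Physician"), is_prestigious_els(15), is_prestigious_els(41))
```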

To measure parental income and high school achievement, I used the same variables described in the previous section.

Baseline disparities

Before showing how racial disparities in occupational prestige change after controlling for other variables, it’s worth examining the raw differences without any controls. Here are the rates of attainment of prestigious occupations by race within each of the three datasets:

Attainment of prestigious occupation by race

Group | HSB | NELS:88 | ELS:2002
Asian | 24.40% | 22.70% | 26.20%
White | 16.20% | 15.20% | 22.30%
Black | 12.20% | 9.40% | 13.50%
Hispanic | 9.20% | 7.70% | 13.60%
Native American | 10.30% | 9.40% | 14.60%
Multi-racial | N/A | N/A | 17.90%

As expected, Asians are more likely to acquire a prestigious occupation than whites, who are more likely to achieve such a feat than blacks, Hispanics, and Native Americans.

High School and Beyond

I ran 4 regressions for occupational prestige in the HSB study. For each regression, I only included subjects with valid data for occupation code, parental income, grades, and mini-SAT test scores in high school. This resulted in 8,000 total subjects. The results of each regression are as follows:

Odds ratios from logistic regressions for attainment of prestigious occupation in 1992

Variable | Baseline | Parental Income | HS grades + HS test scores | Parental Income + HS grades + HS test scores
Intercept | 0.20 | 0.17 | 0.12 | 0.08
Black | 0.74* | 0.88 | 1.45 | 1.61*
Asian/Pacific Islander | 1.58* | 1.43* | 1.35 | 1.37
Hispanic | 0.58* | 0.75 | 1.09 | 1.15
Native American | 0.27* | 0.29 | 0.44 | 0.46

This table should be read in the same way as the tables for educational attainment. The only difference is that the numbers indicate odds ratios for attaining a prestigious occupation instead of a bachelor’s degree.

Controlling only for sex (baseline model), black people had 26% lower odds of attaining a prestigious occupation than whites. Controlling for parental income reduced this disparity to just 12%, and it was no longer statistically significant. Controlling for high school achievement, blacks had 45% higher odds of attaining a prestigious occupation, though the difference was not statistically significant. Controlling for both parental income and high school achievement, blacks had 61% higher odds of doing so, a statistically significant difference.

Regardless of the controls used, Asian respondents had higher (35-58%) odds than whites of attaining a prestigious occupation, although the gap was not statistically significant after controlling for high school achievement. Native Americans had lower odds than whites of achieving this feat, regardless of controls, although the gap was not statistically significant after introducing controls for either parental income or high school achievement. Hispanics had higher odds of attaining a prestigious occupation after controlling for high school achievement, although this difference was not statistically significant.

The National Education Longitudinal Study of 1988

I ran 4 regressions for occupational prestige in the NELS study. For each regression, I only included subjects with valid data for occupation code, parental income, and math proficiency in high school. This resulted in 7,700 total subjects. The results of each regression are as follows:

Odds ratios from logistic regressions for attainment of prestigious occupation in 2000

Variable | Baseline | Parental Income | HS math proficiency | Parental Income + HS math proficiency
Intercept | 0.22 | 0.11 | 0.03 | 0.03
Black | 0.63* | 0.81 | 1.17 | 1.26
Asian/Pacific Islander | 1.54* | 1.51 | 1.40 | 1.40
Hispanic | 0.51* | 0.67* | 0.76 | 0.84
Native American | 0.86 | 1.05 | 1.69 | 1.80

The results here are in line with the HSB results. Controlling only for sex (baseline model), black people had 37% lower odds of attaining a prestigious occupation than whites. The gap was not statistically significant in any model other than the baseline. Controlling for high school achievement reversed the gap so that black people had higher odds of attaining a prestigious occupation, although the black advantage was not statistically significant. Asian respondents had higher (40-54%) odds than whites of attaining a prestigious occupation in all models, although the gap was not statistically significant outside of the baseline model.

The Education Longitudinal Study of 2002

I ran 4 regressions for occupational prestige in the ELS study. For each regression, I only included subjects with valid data for occupation code, parental income, GPA, and standardized test scores in high school. This resulted in 11,600 total subjects. The results of each regression are as follows:

Odds ratios from logistic regressions for attainment of prestigious occupation in 2011

Variable | Baseline | Parental Income | HS grades + HS test scores | Parental Income + HS grades + HS test scores
Intercept | 0.36 | 0.20 | 0.02 | 0.02
Black | 0.49* | 0.68* | 0.98 | 1.05
Asian/Pacific Islander | 1.25* | 1.40* | 1.24* | 1.32*
Hispanic | 0.55* | 0.70* | 0.90 | 0.97
Native American | 0.57 | 0.59 | 0.78 | 0.84
Multi-racial | 0.76 | 0.85 | 0.97 | 1.02

Controlling only for sex (baseline model), black people had 51% lower odds of attaining a prestigious occupation than whites. Controlling for high school achievement essentially eliminated the gap (an odds ratio of 0.98), and additionally controlling for parental income produced a slight black advantage (1.05) that was not statistically significant. Asian respondents had higher (24-40%) odds than whites of attaining a prestigious occupation, and the difference was statistically significant in all models.

Summary

  • When controlling only for sex, blacks had about 25-50% lower odds of attaining a prestigious occupation compared to white subjects. Controlling for parental income reduced the gap so that it was no longer statistically significant for two of the datasets. Controlling for high school achievement either erased or reversed the black-white gap in odds of attaining a prestigious occupation.
  • Asians had higher odds than whites of attaining a prestigious occupation in each model of each dataset. The Asian advantage was usually no longer statistically significant after controlling for high school achievement.
  • The Hispanic, Native American, and multi-racial subjects showed broadly similar patterns. Controlling only for sex, these groups had lower odds than whites of attaining a prestigious occupation, although Hispanics were the only one of these groups whose gap was statistically significant in every dataset. The gaps largely persisted after controlling for parental income. Controlling for high school achievement rendered all of these gaps statistically insignificant, and some groups ended up with (statistically insignificant) higher odds than whites of attaining a prestigious occupation.

Income


Models

For each dataset, I run various linear regressions to determine how different predictors influence the relationship between race and adulthood income. For each model, the dependent variable is a continuous variable for income. I consider several models for each dataset. The independent variables in each model are as follows:

  • Baseline: race and sex.
  • Parental Income: race, sex, and parental income.
  • High School Achievement: race, sex, and high school achievement.
  • Educational Attainment: race, sex, and highest level of education.
  • Parental Income + High School Achievement: race, sex, parental income, and high school achievement.
  • High School Achievement + Family Structure: race, sex, high school achievement, and family structure.
  • Parental Income + High School Achievement + Family Structure: race, sex, parental income, high school achievement, and family structure.

Not all models are used in each dataset due to lack of relevant data.

Each regression model estimates the relationship between each independent variable and the income earned in adulthood, while controlling for all other independent variables. To estimate the influence of the predictors on racial disparities in income, I first note the effect of race while only including race and sex in the regression model. Then I compare that to the effect of race after introducing more variables in the regression model.
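Stated as a formula, the comparison described above amounts to the following. This is a sketch of the convention that the percentages quoted in this section appear to follow, not a formula taken from DataLab. Let $\hat\beta_{\text{base}}$ be a group's race coefficient in the baseline (race + sex) model and $\hat\beta_{\text{ctrl}}$ the same coefficient after adding controls; the share of the gap "explained" by the controls is then

```latex
S = 1 - \frac{\hat\beta_{\text{ctrl}}}{\hat\beta_{\text{base}}},
\qquad
\text{e.g. (HSB, black, parental income + HS achievement):}\quad
S = 1 - \frac{-4{,}258}{-10{,}735} \approx 0.60
```

This is a descriptive decomposition: S exceeds 1 when the sign of the coefficient reverses, and it says nothing by itself about whether the controls are causal.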

Variables

The variables that I use to measure parental income and high school achievement were described in previous sections, so I won’t describe them again. To measure adulthood income, educational attainment, and family structure, I use the following variables from each dataset:

| Dataset | Adulthood Income | Educational Attainment | Family Structure |
|---|---|---|---|
| HSB | Total household income before taxes in 1991 (Y4601C) | Consolidated highest degree 1993 (DEGREE93) | Marital and parental status in 1992 (FMFRM92) |
| NELS:88 | Income of respondent in 1999 (asked in 2000) (F4HI99) | N/A | Marital status (2000) (F4GMRS); Number of biological children (2000) (F4GNCH) |
| ELS:2002 | 2011 employment income: R only (F3ERN2011) | Highest level of education earned as of F3 (F3ATTAINMENT) | Marital status as of F3 (F3MARRSTATUS); Whether R has any biological children (F3D06) |

The variables were described at the beginning of the post.

Baseline disparities

Before showing how racial disparities in income change after controlling for other variables, it’s worth examining the raw differences in income without any controls. Here are the average incomes by race within each of the three datasets:

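Average income by race (US dollars)

| | HSB | NELS:88 | ELS:2002 |
|---|---|---|---|
| Asian | 42,263 | 24,382 | 26,926 |
| White | 38,712 | 25,760 | 27,628 |
| Black | 28,053 | 20,983 | 19,929 |
| Hispanic | 31,856 | 22,350 | 21,551 |
| Native American | 26,361 | 16,727 | 16,872 |
| Multi-racial | N/A | N/A | 23,598 |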

Note: the incomes in the HSB dataset are household income, whereas the other two datasets are individual incomes among workers.

Unlike the racial disparities in educational attainment and occupational prestige, Asians and whites in these datasets actually have similar incomes. But both groups have much higher incomes than blacks, Hispanics, and Native Americans.
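The raw gaps can be read directly off the table above. The short Python sketch below (figures copied from the table; the function name `raw_gaps` is only illustrative) expresses each group's average income as a dollar difference from, and a percentage of, the white average within the same dataset. These are unadjusted comparisons, and because HSB reports household income while the other two datasets report individual income, they are best compared within a dataset rather than across datasets.

```python
# Average incomes by race, copied from the table above. HSB figures are
# household income; NELS:88 and ELS:2002 are individual incomes among workers.
avg_income = {
    "HSB": {"Asian": 42_263, "White": 38_712, "Black": 28_053,
            "Hispanic": 31_856, "Native American": 26_361},
    "NELS:88": {"Asian": 24_382, "White": 25_760, "Black": 20_983,
                "Hispanic": 22_350, "Native American": 16_727},
    "ELS:2002": {"Asian": 26_926, "White": 27_628, "Black": 19_929,
                 "Hispanic": 21_551, "Native American": 16_872,
                 "Multi-racial": 23_598},
}


def raw_gaps(table, reference="White"):
    """Return {dataset: {group: (dollar gap, ratio to reference mean)}}."""
    out = {}
    for dataset, means in table.items():
        ref = means[reference]
        out[dataset] = {
            group: (mean - ref, mean / ref)
            for group, mean in means.items()
            if group != reference
        }
    return out


for dataset, gaps in raw_gaps(avg_income).items():
    for group, (gap, ratio) in gaps.items():
        print(f"{dataset:9} {group:16} {gap:+8,} ({ratio:.0%} of white mean)")
```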

High School and Beyond

I ran 6 regressions for household income in the HSB study. For each regression, I only included subjects with valid data for total household income in 1991, parental income, high school grades, mini-SAT test scores, educational attainment, and marital/parental status in 1992. This resulted in 7,700 total subjects. The results of each regression are as follows:

Coefficients from linear regressions for total household income before taxes in 1991

| | Baseline | Parental Income | HS grades + HS test scores | Educational Attainment | Parental Income + HS grades + HS test scores | Parental Income + HS grades + HS test scores + Family Structure |
|---|---|---|---|---|---|---|
| Intercept | 37,095 | 31,076 | 29,301 | 34,851 | 24,256 | 17,927 |
| Black | -10,735* | -6,823* | -6,472* | -8,363* | -4,258* | -644 |
| Asian/Pacific Islander | 5,705 | 6,353 | 5,644 | 5,526 | 5,723 | 7,223 |
| Hispanic | -5,877* | -2,726* | -2,513* | -3,387* | -880 | 45 |

Each cell indicates the change in income associated with belonging to the corresponding racial/ethnic group relative to white respondents, holding the other variables in the model constant. Thus, for example, a 0 in a cell would indicate that the corresponding racial/ethnic group has the same average household income as white people. A number greater than 0 indicates that the racial/ethnic group earns more than white people on average, whereas a number less than 0 indicates that the racial/ethnic group earns less on average. The * indicates that there is a statistically significant difference (at the 95% confidence level) in earnings between the racial/ethnic group and white people. The other linear regression tables for income should be interpreted in the same way as this table.
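To see why a dummy coefficient reads as a difference from whites, consider the simplest case: an OLS regression of income on an intercept plus race dummies, with white as the omitted reference group. The Python sketch below uses made-up incomes for seven hypothetical respondents and a small hand-written normal-equations solver (an illustration, not DataLab's implementation). In this setup each dummy's coefficient equals that group's mean income minus the white mean, and the intercept equals the white mean. Once sex or other controls are added, as in the tables here, the coefficient becomes an adjusted difference rather than a raw one.

```python
from statistics import mean


def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    a = [xtx[i] + [xty[i]] for i in range(k)]  # augmented matrix [X'X | X'y]
    for col in range(k):
        # Partial pivoting: move the row with the largest entry into place.
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        # Eliminate this column from every other row.
        for r in range(k):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][k] for i in range(k)]


# Hypothetical respondents: (race, income). These numbers are made up.
people = [("White", 40_000), ("White", 36_000), ("White", 41_000),
          ("Black", 29_000), ("Black", 27_000),
          ("Hispanic", 33_000), ("Hispanic", 30_000)]
groups = ["Black", "Hispanic"]  # White is the omitted reference category

# Design matrix: intercept column plus one 0/1 dummy per non-reference group.
X = [[1.0] + [1.0 if race == g else 0.0 for g in groups] for race, _ in people]
y = [float(income) for _, income in people]
intercept, b_black, b_hispanic = ols(X, y)

white_mean = mean(inc for race, inc in people if race == "White")
black_mean = mean(inc for race, inc in people if race == "Black")
print(f"intercept={intercept:,.0f}  white mean={white_mean:,.0f}")
print(f"black coef={b_black:,.0f}  black-white mean gap={black_mean - white_mean:,.0f}")
```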

For example, the -10,735 in the table above indicates that the average black household income is about $10.7k less than the average white household income, controlling only for sex. After controlling for high school achievement (mini-SAT test scores and grades), black household income is about $6.5k less than white household income. Perhaps surprisingly, controlling for educational attainment reduced the gap by only about 22%, to about $8.4k. Controlling for parental income and high school achievement eliminates more than half (about 60%) of the gap in household income, reducing it to about $4.3k. Adding controls for family structure (marital status and whether the respondent has children) virtually eliminates the gap in total household income. In other words, after controlling for parental income, mini-SAT test scores, high school GPA, marital status, and the presence of children, the average black household income is only about $644 less than the average white household income, which is only 6% of the original gap of $10,735.

Controlling for marital status and presence of children is important because married individuals tend to earn more than unmarried individuals, and individuals without children tend to earn more than individuals with children. This is particularly relevant for the black-white disparity because black people are far more likely to be single parents than whites. Also, focusing on family structure is particularly important when analyzing household income (as opposed to individual income), because married households tend to have dual income earners.

Similar patterns were found for the Hispanic households in the study, although the Hispanic-white gaps were not as large as the black-white gaps. Asian households had higher household incomes than whites in all models, although these differences were never statistically significant.

The National Education Longitudinal Study of 1988

I ran 6 regressions for income in the NELS study. For each regression, I only included subjects with valid data for income in 1999, parental income, math proficiency test scores, marital status in 2000, and number of biological children in 2000. This resulted in 6,800 total subjects. The results of each regression are as follows:

Coefficients from linear regressions for income in 1999

BaselineParental IncomeHS grades + HS test scoresParental Income + HS grades + HS test scoresHS grades + HS test scores + Family StructureParental Income + HS grades + HS test scores + Family Structure
Intercept30,87927,75925,57824,69625,71124,557
Black-3,661*-2,158*-1,813-1,090-880-196
Asian/Pacific Islander-5880-235-81103284
Hispanic-2,010*-274-640325-339603

The results indicate that black workers in 1999 earned about $3.7k less than whites after controlling only for sex. Controlling for parental income reduced the gap by about 40% to $2.2k. Controlling for high school achievement (math proficiency) reduced the gap by about 50% to $1.8k, rendering it statistically insignificant. Controlling for both parental income and high school achievement reduced the gap by about 70% to $1.1k. Controlling for parental income, math proficiency, and family structure virtually eliminated the gap, reducing it to just $196, about 5% of the original gap.

Unlike the other studies, there was not a large gap in income between Asian and white workers. The Hispanic-white gap was statistically insignificant in all of the models except for the baseline model, with a small (statistically insignificant) Hispanic advantage in some of the models.

The Education Longitudinal Study of 2002

I ran 7 regressions for employment income in the ELS study. For each regression, I only included subjects with valid data for employment income in 2011, parental income, high school grades, standardized test composite scores, educational attainment, marital status, and whether the respondent has biological children. This resulted in 10,400 total subjects. The results of each regression are as follows:

Coefficients from linear regressions for employment income in 2011

BaselineParental IncomeHS grades + HS test scoresEducational AttainmentParental Income + HS grades + HS test scoresHS grades + HS test scores + Family StructureParental Income + HS grades + HS test scores + Family Structure
Intercept33,82026,75717,44825,36413,9771395711,694
Black-7,270*-5,658*-2,671*-5,268*-2,203*-967-583
Asian/Pacific Islander2,1352,963*1,8701,3662,3042527*3,305*
Hispanic-4,851*-3,242*-1,220-2,623*-666-777-264
Multi-racial-2,627-1,722-600-1,429-226-81219

The results indicate that black workers in 2011 earned about $7.3k less than whites after controlling only for sex. Controlling for parental income reduced the gap by about 22% to about $5.7k. Controlling for high school achievement reduced the gap by about 65% to about $2.7k. Perhaps surprisingly, controlling for educational attainment reduced the gap by only about 28%. Controlling for high school achievement and family structure eliminated about 85% of the gap, reducing it to just $967, a statistically insignificant difference. Controlling for parental income, high school achievement, and family structure reduced the gap by over 90%, to just $583.

Asian workers earned more than white workers in all models, although the difference was statistically significant in only some of the models. The incomes of Hispanic and multi-racial workers (relative to whites) followed patterns similar to those of black workers, although their gaps with whites were much smaller.

Summary

  • All datasets showed that blacks earned significantly less than whites in the baseline models. Controlling for parental income or educational attainment (in separate models) reduced the black-white income gaps by roughly 20-40%. Controlling for high school achievement reduced about 40% of the household income gap and 50-65% of the individual income gaps. Controlling for both parental income and high school achievement reduced 60-70% of the income gaps. Family structure was also important in explaining the income gap. Controlling for parental income, high school achievement, and family structure eliminated over 90% of the income gaps in all samples.
  • There were similar patterns observed for Hispanic and multi-racial workers. There were initially large gaps (compared to white workers) in the baseline models, which were mostly reduced in models with more controls.
  • Asian workers tended to have higher incomes than white workers, although the differences were often statistically insignificant.
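The percentages in this summary can be recomputed from the black-row coefficients of the three income tables with the share-of-gap calculation 1 - b_controlled / b_baseline. The Python sketch below does that. The model labels are shorthand ("HS Achievement" stands for whichever HS grade and/or test score controls each table lists), and the dictionary layout and the function name `share_explained` are only illustrative.

```python
# Black-row coefficients (black minus white income, in dollars) copied from the
# HSB, NELS:88, and ELS:2002 income regression tables above.
black_coef = {
    "HSB": {
        "Baseline": -10_735,
        "Parental Income": -6_823,
        "HS Achievement": -6_472,
        "Educational Attainment": -8_363,
        "Parental Income + HS Achievement": -4_258,
        "Parental Income + HS Achievement + Family Structure": -644,
    },
    "NELS:88": {
        "Baseline": -3_661,
        "Parental Income": -2_158,
        "HS Achievement": -1_813,
        "Parental Income + HS Achievement": -1_090,
        "HS Achievement + Family Structure": -880,
        "Parental Income + HS Achievement + Family Structure": -196,
    },
    "ELS:2002": {
        "Baseline": -7_270,
        "Parental Income": -5_658,
        "HS Achievement": -2_671,
        "Educational Attainment": -5_268,
        "Parental Income + HS Achievement": -2_203,
        "HS Achievement + Family Structure": -967,
        "Parental Income + HS Achievement + Family Structure": -583,
    },
}


def share_explained(models):
    """Share of the baseline gap removed by each model's controls."""
    base = models["Baseline"]
    return {name: 1 - coef / base
            for name, coef in models.items() if name != "Baseline"}


for dataset, models in black_coef.items():
    for name, share in share_explained(models).items():
        print(f"{dataset:9} {name:52} {share:6.0%}")
```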

Conclusion


These results show that racial disparities in socioeconomic outcomes can be largely (statistically) explained by a few key variables measuring parental income, high school achievement, and family structure. Concerning black-white gaps in particular, the data shows that black-white gaps in educational attainment and occupational prestige are entirely explained by high school academic achievement. Roughly half of the black-white gaps in income were explained by high school academic achievement. The majority (60-70%) of income gaps could be explained by parental income and high school academic achievement, which are factors in place before individuals reach the market. After further controlling for family structure, over 90% of black-white income gaps are explained. Similar patterns were observed for other racial groups.

These findings provide further evidence that racial inequality is largely driven by factors in place before individuals reach the market. Of course, one could question whether the variables mentioned here are causal. For some of the variables, there is good reason to believe that they have a causal effect, whereas other variables are likely to be driven by confounding.

  • High school academic achievement: in a previous post (near the bottom), I provided reasons to believe that the cognitive ability gap likely causes the gaps in socioeconomic outcomes. Since high school academic achievement is caused in large part by cognitive ability (as I argued here), the relationship between high school academic achievement and socioeconomic outcomes observed here is likely causal.
  • Parental income: while parental income likely has some effect on the racial disparities in socioeconomic outcomes, it is also likely that much of the apparent effect of parental income is due to confounding. The primary source of confounding that I have in mind is genetic confounding. That is, given that (1) smarter parents tend to earn higher incomes (as shown here), largely for genetic reasons (given the heritability of cognitive ability and socioeconomic outcomes), and (2) smarter parents tend to transmit genes to their children that dispose them toward traits (e.g., high cognitive ability, conscientiousness, etc.) that improve their socioeconomic outcomes, much of the correlation between parental incomes and offspring socioeconomic outcomes will be driven by shared genes. Quantifying how much of the association between parental incomes and offspring socioeconomic outcomes is driven by genetic confounding will have to wait for a later post.
  • Family structure: in one sense, family structure certainly has a strong effect on income. For example, family structure will obviously have a strong effect on household income, since cohabiting/married households are much more likely to be dual-income households than single households. Regarding individual income, it is unclear whether marital status or number of children has a causal impact on an individual’s earnings. For example, it’s possible that both family structure and individual earnings are caused by the same factors, and it’s likely that family structure is partially influenced by an individual’s earnings. Regardless, the fact that many racial disparities are eliminated after controlling for family structure is interesting even if family structure per se is not causal. For example, it may be true that the factors causing racial differences in family structure are also causing racial differences in income. If so, it will be worth investigating those factors, since they contribute to racial inequalities.

As I have written elsewhere, I believe these findings suggest that we should be interested in exploring the causes of racial disparities in cognitive ability / academic achievement and racial differences in family structure, assuming we are interested in exploring the causes of racial inequality. Any attempts to address racial inequalities that do not address racial disparities in high school achievement or family structure are almost certainly doomed to failure. I began exploring various possible causes of racial disparities in cognitive ability here. I hope to explore causes of racial disparities in family structure in a separate post.

Table Specifications

To reproduce some of the regressions that I presented above, you can import the following table specifications into the DataLab.

High School and Beyond

  • File: bachelor’s degree attainment regressed on race, sex, parental income, HS grades, and HS test scores.
  • File: attainment of a prestigious occupation regressed on race, sex, parental income, HS grades, and HS test scores.
  • File: total household income regressed on race, sex, parental income, HS grades, HS test scores, marital and parental status, and highest degree.

The National Education Longitudinal Study of 1988

  • File: bachelor’s degree attainment regressed on race, sex, parental income, and HS test scores.
  • File: attainment of a prestigious occupation regressed on race, sex, parental income and HS test scores.
  • File: employment income regressed on race, sex, parental income, HS test scores, marital status, and number of biological children.

The Education Longitudinal Study of 2002

  • File: bachelor’s degree attainment regressed on race, sex, parental income, HS grades, and HS test scores.
  • File: attainment of a prestigious occupation regressed on race, sex, parental income, HS grades, and HS test scores.
  • File: employment income regressed on race, sex, parental income, HS grades, HS test scores, marital status, and whether the respondent has biological children.
