
Analyzing admissions metrics as predictors of graduate GPA and whether graduate GPA mediates PhD completion

Mike Verostek, Department of Physics and Astronomy, University of Rochester; Department of Physics and Astronomy, Rochester Institute of Technology (mveroste@ur.rochester.edu)
Casey Miller, Department of Physics and Astronomy, Rochester Institute of Technology
Benjamin Zwickl, Department of Physics and Astronomy, Rochester Institute of Technology
(August 13, 2025)
Abstract

An analysis of 1,955 physics graduate students from 19 PhD programs shows that undergraduate grade point average predicts graduate grades and PhD completion more effectively than GRE scores. Students’ undergraduate GPA (UGPA) and GRE Physics (GRE-P) scores are small but statistically significant predictors of graduate course grades, while GRE quantitative and GRE verbal scores are not. We also find that males and females score equally well in their graduate coursework despite a statistically significant 18 percentile point gap in median GRE-P scores between genders. A counterfactual mediation analysis demonstrates that among admission metrics tested only UGPA is a significant predictor of overall PhD completion, and that UGPA predicts PhD completion indirectly through graduate grades. Thus UGPA measures traits linked to graduate course grades, which in turn predict PhD completion. Although GRE-P scores are not significantly associated with PhD completion, our results suggest that any predictive effect they may have is likewise linked indirectly through graduate GPA. Overall, our results indicate that among commonly used quantitative admissions metrics, UGPA offers the most insight into two important measures of graduate school success, while posing fewer concerns for equitable admissions practices.

Keywords: physics graduate admissions, equity, inclusion, GRE

I Introduction

As physics graduate admission committees across the country consider eliminating GRE scores from consideration when evaluating applicants [1, 2], it is important to continue examining the GRE’s ability to predict success in graduate school in order for programs to make informed policy choices. Although GRE scores are among the numeric metrics that best predict admission into U.S. graduate programs [3, 4], there are significant disparities in typical GRE performance between students of different demographic backgrounds [5]. Combined with the fact that Physics remains one of the least diverse of all the STEM fields [6], the prospect that GRE tests limit the ability of certain students to enter graduate school has led researchers to begin questioning the utility of GRE exam scores in the graduate admissions process in comparison to other quantitative metrics such as undergraduate GPA (UGPA) [1, 7, 8]. Among some of the findings in this body of work are indications that earning high marks on the GRE Physics (GRE-P) test fails to help students “stand out” to admissions committees who would have overlooked them due to an otherwise weak application [8], and that typical physics PhD admissions criteria such as the GRE-P exam fail to predict PhD completion despite limiting access to graduate school for underrepresented groups [1].

Yet overall PhD completion is only one measure of “success” in graduate school. Graduate faculty often cite high grades, graduation in a reasonable amount of time, and finding a job after graduation as indications of successful graduate students [9]. It is therefore crucial for admissions committees to understand how these other measures of success are related to common quantitative admissions metrics as well. In particular, studying the role of graduate grade point average (GGPA) is important for both historical and practical reasons. Among physics graduate students, positive relationships between GRE-P scores, first-year graduate grades, and cumulative graduate grades have traditionally been touted as evidence for the exam’s utility in evaluating applicants [10, 11]. Several other studies [12, 13, 14, 15] suggest that a number of common admissions metrics are correlated with GGPA as well. At a practical level, gaining a better understanding of which factors best predict graduate grades is valuable because performance in graduate classes can influence whether students ultimately complete a PhD. For instance, programs may institute GPA requirements that prevent students from continuing study if their course grades do not meet certain criteria.

Predictive validity analyses of GRE scores across all STEM disciplines consistently find that scores on the GRE Quantitative (GRE-Q) and Verbal (GRE-V) tests are more effective predictors of graduate grades than of PhD completion [12, 13]. For instance, recent studies on PhD admissions in the biomedical field found that students’ GRE-Q and GRE-V scores are poor predictors of PhD completion, but are more associated with first-semester and cumulative graduate school grades [14, 15]. In contrast, studies cited by the Educational Testing Service (ETS), such as the meta-analysis of GRE predictive validity by Kuncel et al. [11], show a positive correlation between GRE subject scores, graduate grades, and PhD completion. Kuncel et al. find that GRE subject tests show larger correlations with GGPA than GRE-Q, GRE-V, or UGPA, which they attribute to the subject-specific knowledge that the GRE subject tests are purported to measure. Still, GRE-Q and GRE-V scores, which the authors presume to be broad measures of cognitive ability, only moderately correlate with GGPA and do not significantly correlate with PhD completion. Kuncel et al. also find that UGPA correlates with GGPA but not with completion.

Despite voluminous research on the efficacy of quantitative admissions metrics in predicting graduate success, there remains a dearth of studies specifically examining these metrics in the context of physics graduate education. No current study elucidates the relationships between undergraduate grades, GRE scores, and physics graduate grades. Moreover, studies such as [1] do not incorporate graduate grades into models of PhD completion despite their theoretical and structural importance on the road to graduate success. This paper aims to fill these gaps in the current literature.

The primary goal of this paper is to extend the analysis of Miller et al. [1], using the same data set to examine the correlations of common quantitative admissions statistics with graduate physics GPA, as well as the role that graduate GPA plays in predicting whether a student completes their PhD program. Whereas [1] did not utilize information on student graduate course performance, this paper incorporates graduate GPA into several models in order to determine whether commonly used admissions metrics predict PhD completion of US students directly, or indirectly via graduate GPA; a discussion of the theoretical motivation for why graduate grades may mediate the relationship between admissions metrics and PhD completion is offered in Section II. Hence, while the analysis presented in Miller et al. [1] was primarily focused on simply identifying the measures that best correlated with PhD completion, this analysis explores questions regarding both how and why those correlations occurred.

Exploring whether graduate GPA mediates the relationship between common admissions metrics and PhD completion affords us the opportunity to employ statistical methods from the literature on causal inference [16, 17, 18, 19, 20, 21, 22, 23]. In doing so we lay out methods of calculating the direct and indirect effects of common admissions metrics on PhD completion, as well as the assumptions needed for those effects to have a causal interpretation. This approach allows us to gain useful information from the present analysis, while careful examination of the assumptions required for causal interpretation will help guide future studies.

Use of statistical methods developed in the causal inference literature allows us to build on the findings in [24] by incorporating the ranking of a student’s PhD program along a mediating pathway to completion rather than as a covariate in regression analysis. We also present models with various combinations of GRE-P and GRE-Q scores to show that variance inflation due to collinearity is minimal, and is therefore not a concern. These analyses are included in the Supplemental Material.

We seek to answer two primary research questions in this paper:

  1. How do commonly used admissions metrics and demographic factors relate to physics graduate GPA?

  2. What role does graduate GPA play in predicting PhD completion, and do quantitative admissions metrics predict PhD completion indirectly through graduate GPA?

To answer these questions, we begin by exploring the relationships between variables using bivariate correlations. We then examine the unique predictive effects of different admission metrics on graduate GPA using a multiple linear regression model. These results lay the groundwork for a mediation analysis, which is used to examine the role that graduate GPA plays in PhD completion by breaking down effects into direct and indirect components. All of the primary analyses are performed using data on US physics graduate students, with a review of equivalent analyses for international students included in the Supplemental Materials.

II Background and Motivation

Before outlining the quantitative methods employed in this analysis, we briefly describe the student performance metrics used in this study and the broad individual student characteristics they help to measure. We discuss the underlying constructs that the GRE Quantitative, Verbal, and subject tests are hypothesized to measure, as well as undergraduate and graduate grades, and several external factors that influence these scores. The GRE Analytical Writing test is not included since it is not used widely enough in physics graduate admissions to warrant investigation. This section serves as a theoretical motivation for the models of PhD completion analyzed in this study.

The GRE is a series of standardized tests designed to help admissions committees predict future academic success of students coming from different backgrounds [25, 26]. While the GRE-Q assesses basic concepts of arithmetic, algebra, geometry, and data analysis, the GRE-V assesses reading comprehension skills and verbal and analytical reasoning skills. These tests are specifically constructed to measure “basic developed abilities relevant to performance in graduate studies” [27]. In their meta-analysis of GRE predictive validity, Kuncel et al. frame the GRE-Q and GRE-V as most related to declarative and procedural knowledge and suggest that they are best described as measures of general cognitive ability [11]. In contrast, the GRE subject tests “assess acquired knowledge specific to a field of study” [25], indicating that the GRE subject tests are ostensibly a direct measure of a student’s knowledge of a particular area of study. Indeed, admissions committees often interpret high GRE subject scores as strong evidence of a student’s discipline-specific knowledge [9]. Other research suggests that higher scores on standardized subject tests could also reflect greater student interest in that subject area [28].

The individual characteristics measured by a student’s undergraduate grades include both academic knowledge and a collection of noncognitive factors [29]. Much research exists on the meaning and value of grades, particularly at the K-12 level, and a review [30] of the past century of grading research finds that grades assess a multidimensional construct comprising academic knowledge, engagement (including motivation and interest), and persistence. Consistently over the past 100 years only about 25% of variance in grades is attributable to academic knowledge as measured by standardized tests [31], with recent research suggesting that much of the unexplained variance is represented by a student’s ability to negotiate the “social processes” of school [32]. We therefore regard UGPA as broadly measuring student academic achievement across a wide range of subjects in addition to several aspects of noncognitive traits such as motivation, interest, and work habits. However, we also recognize the limitations inherent in compressing students’ college academic performance into a single number, including the loss of information pertaining to student growth over time and time to degree completion.

We conceptualize graduate GPA similarly, treating it as a measure of subject-specific academic knowledge as well as other non-academic characteristics. In addition to the broad research on grades described above, research specifically addressing the factors leading to graduate success supports this interpretation of GGPA. Interviews conducted with over 100 graduate school faculty reveal that graduate success, which they define as a student’s ability to earn high graduate grades and eventually complete their degree in a timely manner, is largely dependent on noncognitive characteristics [9]. Interviewees deemed motivation, work ethic, maturity, and organizational skills as crucial to student success in graduate school. In a separate review of noncognitive predictors pertaining to graduate success, graduate GPA is specifically linked to a variety of personality (e.g. extroversion and conscientiousness) and attitudinal factors (e.g. motivation, self-efficacy, and interests) [33]. Indeed, the authors of the review characterize graduate grades as a complex composite of many of the cognitive and noncognitive factors related to graduate school success.

These conceptions of grades and GRE scores compel us to hypothesize that graduate GPA mediates the relationship between common quantitative admissions metrics and PhD completion. Because GGPA is a construct measuring subject-specific knowledge and several noncognitive characteristics, we expect UGPA and GRE-P to link most strongly to it. Despite the drawbacks of cumulative UGPA (such as grade inflation and masking of individual growth), we expect UGPA to be associated with graduate course performance since it captures aspects of both academic and non-academic characteristics. We also expect GRE-P scores to be related to graduate grades because the exam requires specific physics knowledge. Finally, while we expect GRE-Q may have a small predictive effect on GGPA as a general cognitive measure, we do not necessarily expect a similar relationship for GRE-V, as its content is generally disparate from physics curricula.

On both a theoretical and structural level we expect graduate grades to predict PhD completion. Graduate GPA may offer insight into a student’s mastery of advanced physics concepts as well as their personality and attitudes. All of these contribute to successful physics PhD completion, but likely vary in importance depending on choice of research area [33]. Structurally, satisfactory performance in graduate courses is an implicit requirement on the path to completing a PhD. For example, GGPA requirements can act as thresholds for being allowed to continue studying in a PhD program. Poor course performance may negatively influence personal factors (e.g., self-efficacy, identity), limit access to research opportunities (e.g., repeating classes, ease of finding a research lab), or may indicate a lack of preparation for research, all of which could hinder PhD completion. Graduate GPA is also more temporally proximal to PhD completion than the other metrics included in the study.

Lastly, we note that although this discussion has focused on students’ individual traits that may predict success in graduate school, there are undoubtedly a number of external factors that can influence student attrition. Socioeconomic factors, mental health, family responsibilities, work duties, external job prospects, and departmental culture are all variables that would play a role in a comprehensive model of graduate school persistence [9, 34, 35, 36, 37, 38]. These uncollected pieces of information may act as “confounding” variables that can bias results, and we discuss their influence on the present study in Section V.2.

III Method

III.1 Data

Figure 1: Distributions of the quantitative metrics included in the data. “Raincloud plots” show density plots, boxplots, and scatterplots of the data. We see that despite significant score gaps between US male and female GRE-P test takers, no such gap exists in subsequent GGPA performance. UGPA distributions for male and female applicants are also similar. Code for generating raincloud plots courtesy of [39]. All figures generated with the R package ggplot2 [40]. Figure themes adapted from [41].

Student-level data for both this study and [1] was requested from physics departments that awarded more than 10 PhDs per year for students who matriculated between 2000 and 2010, including information on undergraduate GPA (UGPA), GRE-Q, GRE-V, GRE-P, and graduate GPA (GGPA). Data collected also included the final disposition of students (PhD earned or not), start and finish years, and demographic information. GPA data is analyzed on a 4.0 scale while GRE scores are on the percentile scale.

We received data from 27 programs (approximately a 42% response rate), which spanned a broad range of National Research Council (NRC) rankings. The sample used in [1] consisted of all students in 21 programs for which start year was available. Given that the median time to degree across physics PhD programs is 6 years, some students who started before 2010 were still active at the time of data collection in 2016. The probability of not completing the physics PhD has an exponential time dependence with a time constant of 1.8 years, so students who have been in their programs for three time constants (about 5.4 years) have only a 5% chance of not completing. These still-active students were therefore categorized as completers in this study.

These data covered 3962 students. However, two programs did not report GGPA data for their students. The sample for this study excludes those students, reducing the sample size to 3406 students across 19 programs. This corresponds to approximately 11% of matriculants to all U.S. physics PhD programs during the years studied.

          US     Non-US   Total
Male      1638   1164     2802
Female    317    287      604
Total     1955   1451     3406
Table 1: Demographic breakdown of the data used in this analysis. To focus on issues of diversity and inclusion most strongly associated with US applicants to physics graduate programs, we analyze only the data from US graduate students.

Among the sample of US students, 16% are women (N = 317). Although the authors generally advocate for a nuanced treatment of gender in physics education research and recognize the deficits associated with treating gender as a fixed binary variable [42], the present data set spans the years 2000 to 2010, during which the data collected by programs only allowed for the binary option of male/female. Hence, we must treat gender as a dichotomous variable in this analysis.

The racial composition of the dataset is 61.6% White, 1.3% Black, 2.1% Hispanic, 0.2% Native American, 3.5% Asian, 1.0% multiple or other races, and 30.2% undisclosed. Excluding the cases for which race was unavailable, the sample is thus roughly representative of annual PhD production in U.S. physics for gender, race, and citizenship [43]. We include race as a covariate in each analysis presented; however, small N, particularly for Black, Hispanic, Native American, and Asian students, often precludes useful interpretation of the results pertaining to these subsets.

In order to focus on issues of diversity and inclusion associated most strongly with US applicants, we use only the subset of data from domestic graduate students. This decision is further motivated by research suggesting that it is difficult for admission committees to directly compare scores earned by US and international students, indicating that separate analyses are appropriate [9]. Using the subset of students who are from the US reduces the total sample size for the study to N = 1955. A cursory visualization of the variables in the data set (Figure 1) shows that the distributions of scores for US and Non-US students are markedly different, which further justifies separate analyses of these two student populations.

Examining the distributions of scores in Figure 1, the presence of non-normality is evident in nearly all of the variables. Each of the continuous variables in the dataset fails the Shapiro-Wilk test of normality at the α level of .05. However, these tests are often of limited usefulness; in general, distributions with skewness |γ̂₁| > 3 or kurtosis |γ̂₂| > 10 likely violate any assumption of normality [44]. For this dataset, the GGPA distribution has skewness γ̂₁ = -3.31 and kurtosis γ̂₂ = 20.43, indicating severe non-normality. The GRE-Q distribution also approaches this problematic range (γ̂₁ = -2.11, γ̂₂ = 9.33). Ceiling effects are also present, since many students earned 4.0 grade point averages or earned the maximum score on the GRE examinations.

The data collection process was limited to gathering only cumulative graduate GPA rather than first-year graduate grades, which were not recorded by some programs. Thus, depending on whether a student persisted in a program, their graduate GPA may be based on many courses while others are based on only a few courses. The data set is also necessarily subject to range restriction, since data on student performance in graduate school is automatically limited to include only students who were accepted to undertake graduate study. We cannot know how students who were not accepted into graduate school would have performed had they been accepted. Range restriction may act to attenuate the strength of observed effects in subsequent analyses [45].

Although not used in a majority of this study, we briefly explore the role of the doctoral programs’ NRC ranking in PhD completion [46]. Since the NRC only gives confidence intervals for program rank, we created a ranking for this study by averaging the 5% and 95% confidence bounds for the NRC regression-based ranking (NRC-R) and rounding this up to the nearest five to protect the confidentiality of participating programs. This led to a ranking range of 5 to 105. We divided the programs into terciles containing approximately equal numbers of records, categorized as Tier 1 (highest ranked, NRC-R ≤ 20), Tier 2 (25 ≤ NRC-R ≤ 55), and Tier 3 (NRC-R > 55).
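For illustration, this tier assignment can be expressed as a short R sketch. The data frame programs and its columns nrc_lo and nrc_hi are hypothetical names standing in for the NRC-R confidence bounds, not objects from our actual analysis scripts.

  # Midpoint of the 5% and 95% NRC-R bounds, rounded up to the nearest five
  programs$nrc_r <- ceiling(((programs$nrc_lo + programs$nrc_hi) / 2) / 5) * 5

  # Assign tiers: Tier 1 (NRC-R <= 20), Tier 2 (25-55), Tier 3 (> 55)
  programs$tier <- cut(programs$nrc_r,
                       breaks = c(0, 20, 55, Inf),
                       labels = c("Tier 1", "Tier 2", "Tier 3"))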

Multiple imputation (predictive mean matching) is used to impute missing UGPA and GRE-P scores; predictive mean matching is chosen because of the non-normality of the data. 160 students are missing both UGPA and GRE-P, while 400 are missing only UGPA and 263 are missing only GRE-P. All multiple imputation is conducted using the mice package in R [47], and 20 imputed data sets are used for each analysis. For consistency, incomplete variables are imputed using the same imputation model as in [1], which uses all other variables in the data set aside from graduate GPA and PhD completion: GRE-Q, GRE-V, program tier, gender, and race, as well as complete cases of UGPA and GRE-P. Although the imputation approach presented here is theoretically sound, we also present a comparison of several different models of data imputation in the Supplemental Materials.
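This imputation step can be sketched with the mice package as follows; the data frame students and its column names (ugpa, gre_p, ggpa, completed, and so on) are hypothetical placeholders, and the predictor matrix simply encodes the restriction that graduate GPA and final disposition are excluded from the imputation model.

  library(mice)

  # Exclude the outcomes (GGPA and final disposition) from the imputation model
  pred <- make.predictorMatrix(students)
  pred[, c("ggpa", "completed")] <- 0

  # Impute missing UGPA and GRE-P by predictive mean matching; m = 20 imputed data sets
  imp <- mice(students, m = 20, method = "pmm", predictorMatrix = pred, seed = 1)

Downstream models are then fit to each of the 20 completed data sets and their estimates pooled, for example with with(imp, ...) and pool().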

III.2 Methods to explore the role of graduate grades

The goal of this section is twofold. First, we seek to take a cursory look at how graduate GPA is related to common admissions metrics. Second, we wish to determine whether it is reasonable that admissions metrics could indirectly predict completion through graduate GPA. This section presents a series of analyses meant to elucidate the relationships between standard admissions metrics, students’ GGPA, and students’ final disposition. To make our analysis maximally accessible to readers of different statistical backgrounds, we describe the methods used in this section in detail.

Bivariate correlation coefficients provide information about the level of association between two variables, and are therefore a useful starting point for analysis. We construct a correlation matrix (see Table 2) for all variables in the sample using Pearson correlation coefficients, which are equivalent to the standardized slope coefficients for a linear model predicting y from x. These are given by

r_{xy} = \frac{\sum(x_{i}-\bar{x})(y_{i}-\bar{y})}{\sqrt{\sum(x_{i}-\bar{x})^{2}}\sqrt{\sum(y_{i}-\bar{y})^{2}}}    (1)

for any two continuous variables x and y. Calculating r_xy gives us a first glance at the relationships between the continuous variables UGPA, GGPA, GRE-P, GRE-Q, and GRE-V.
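In R, these correlations can be computed directly with cor() and cor.test(); the sketch below assumes a data frame students with hypothetical column names for the five continuous measures.

  # Pairwise Pearson correlations among the continuous measures
  cont <- students[, c("ugpa", "gre_q", "gre_v", "gre_p", "ggpa")]
  round(cor(cont, use = "pairwise.complete.obs"), 2)

  # Significance test and 95% CI for a single pair, e.g., GGPA and GRE-P
  cor.test(students$ggpa, students$gre_p)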

When the x variable is treated as dichotomous (e.g., gender in this data set), Eq. (1) reduces to the point-biserial correlation coefficient r_pb,

r_{pb} = \frac{\bar{y}_{1}-\bar{y}_{0}}{\sigma_{y}}\sqrt{pq},    (2)

where ȳ₁ and ȳ₀ are the means of the continuous y variable for the two x groups 1 and 0, q and p are the proportions of data belonging to these two groups, and σ_y is the standard deviation of the y variable. Like the Pearson coefficient, the quantity r_pb ranges from -1 to 1 and indicates the strength of association between two variables. Conveniently, a significance test for the point-biserial correlation is identical to performing an independent t-test on the data [48]. Thus, the point-biserial correlation coefficient yields information about whether two group means are statistically different. For instance, the point-biserial correlation tests whether the GGPAs of male students are statistically different from those of female students (we find that GGPAs are not significantly different by gender, see Table 2).
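Because the point-biserial coefficient is simply a Pearson correlation with a 0/1-coded grouping variable, and its significance test is equivalent to a pooled-variance independent t-test, both quantities can be obtained as in the sketch below (hypothetical column names, with gender coded 0 = male, 1 = female).

  # Point-biserial correlation between gender (0/1) and GGPA
  r_pb <- cor(students$gender, students$ggpa, use = "complete.obs")

  # Equivalent significance test: independent t-test of GGPA by gender
  t.test(ggpa ~ gender, data = students, var.equal = TRUE)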

When x and y are both dichotomous, the Pearson coefficient reduces to the phi coefficient,

\phi = \sqrt{\frac{\chi^{2}}{n}},    (3)

where χ² is the chi-squared statistic for a 2x2 contingency table and n is the total number of observations in the data. The phi coefficient also ranges from -1 to 1 and indicates the strength of association between two binary variables. This quantity allows us to examine whether final disposition is significantly associated with gender. We find that this association just meets the threshold for statistical significance (φ = -0.05 ± 0.04, p = 0.04), likely due to the large sample size of our data, but the very small phi coefficient indicates that the practical strength of this relationship is negligible [49].
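The phi coefficient follows directly from the chi-squared statistic of the 2x2 contingency table, as in this sketch (hypothetical column names; Yates’ continuity correction is turned off so that Eq. (3) holds exactly, and the sign of φ is determined by the direction of the association).

  # Phi coefficient between gender and final disposition (both binary)
  tab  <- table(students$gender, students$completed)
  chi2 <- chisq.test(tab, correct = FALSE)$statistic
  phi  <- sqrt(chi2 / sum(tab))   # Eq. (3) gives the magnitude of phi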

To characterize how GGPA and other numerical predictors vary across program tier, we conduct several one-way analysis of variance (ANOVA) tests using program tier as the independent variable. ANOVA tests allow us to determine whether there are significant differences between different groups, such as students in different program tiers. These tests produce an F-statistic, which is interpreted as the ratio of between-group variability to within-group variability. Thus, higher values of F indicate that between-group variability is large compared to within-group variability, which is unlikely if the group means all have a similar value.
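A one-way ANOVA of GGPA across program tier, together with the Tukey post hoc comparisons reported in Section IV.1, could be run as in this sketch with hypothetical column names.

  # One-way ANOVA: does mean GGPA differ across program tiers?
  fit_aov <- aov(ggpa ~ tier, data = students)
  summary(fit_aov)     # F-statistic for the between-tier effect

  # Tukey post hoc test for pairwise differences between tiers
  TukeyHSD(fit_aov)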

Lastly we present the results of a multiple regression analysis in which we regress GGPA on common admissions metrics and demographic factors. Regression allows us to examine the unique predictive effects of these predictors.

The classical linear regression model is written mathematically for an outcome variable Y as

Y_{i} = \alpha_{1}X_{i1} + \alpha_{2}X_{i2} + \cdots + \alpha_{k}X_{ik} + \epsilon_{i},    (4)

where i = 1, ..., n indexes the observations in the data and k is the number of predictors in the model. Error terms ε_i are assumed to be independent and normally distributed with mean 0 and standard deviation σ. The vector of regression coefficients α̂ is the one that minimizes the sum of squared errors

\sum_{i=1}^{n}(Y_{i}-\hat{\alpha}X_{i})^{2}    (5)

for the given data. The regression coefficients can be interpreted as the difference in the outcome variable Y, on average, when comparing two groups of units that differ by 1 in one predictor X while keeping all the other predictors the same.

We report both unstandardized and standardized versions of the regression coefficients. Unstandardized coefficients are the result of regression analysis using the original, unscaled variables. Thus, the unstandardized regression coefficients represent the predicted average change in the outcome variable Y when the corresponding predictor X is changed by one unit. This allows for a straightforward interpretation since the variables are not scaled, but does not yield insight into the relative predictive strengths of the independent variables since they are scaled differently. Standardized regression coefficients result from regression analyses using continuous variables that have been mean-centered and divided by their standard deviation, resulting in variables with variances equal to 1. Thus standardized regression coefficients represent the average number of standard deviations changed in the outcome variable when a predictor variable is increased by one standard deviation. By calculating the standardized coefficients, we exchange a simple interpretation of score change for an interpretation of which variables have the greatest effect on the dependent variable.
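The sketch below illustrates how the unstandardized and standardized regression models of GGPA could be fit in R; the variable names are hypothetical, and scale() mean-centers each continuous variable and divides it by its standard deviation.

  # Unstandardized model: coefficients give the GGPA change per unit change in a predictor
  fit_raw <- lm(ggpa ~ ugpa + gre_p + gre_q + gre_v + gender + race, data = students)

  # Standardized model: continuous variables rescaled to mean 0 and SD 1
  fit_std <- lm(scale(ggpa) ~ scale(ugpa) + scale(gre_p) + scale(gre_q) + scale(gre_v)
                + gender + race, data = students)

  summary(fit_raw)
  summary(fit_std)

With the multiply imputed data described in Section III.1, each model would instead be fit to every imputed data set with with(imp, lm(...)) and the estimates combined with pool().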

III.3 Mediation analysis methods

Using mediation analysis we seek to answer the question of whether graduate GPA mediates the predictive ability of common admissions metrics on PhD completion. Whereas analyses such as logistic regression [1] yield information about whether independent variables such as UGPA and GRE-P affect the final disposition of a graduate student, they do not offer insight into why and how UGPA, GRE-P, and other admissions metrics affect completion. Mediation analysis is one technique that allows us to probe the underlying process by which some variables influence others [19, 20, 21, 22, 23].

Figure 2: A qualitative graphical depiction of a mediated relationship between two variables. X, Y, and M represent the model’s independent, dependent, and mediating variables, while C represents a covariate. A researcher who observes a positive relationship between using active learning activities in the classroom and the exam grades of students might posit that a third variable, student engagement, is actually responsible for causing the observed relationship.

Figure 2 graphically depicts a prototypical mediation model, where X, Y, and M represent the model’s independent, dependent, and mediating variables and C represents a covariate. As a hypothetical example, let’s say that previous research has shown a positive relationship between the use of active learning activities in physics class and student exam grades. Researchers might hypothesize that this relationship is actually due to a third mediating variable, student engagement. Using active learning activities in class may cause students to become more engaged with the material, making their subsequent exam grades increase. Engagement is a mediating variable in this case. Meanwhile, since students who are already interested in physics could be predisposed to being more engaged and performing better on exams, the researcher might take students’ prior physics interest into account by including it as a covariate in their analyses.

In this section, we wish to discern whether graduate GPA mediates the relationship between common admissions metrics and students’ likelihood of completing graduate school. In practice, mediation analysis is done by simultaneously estimating a set of regression equations [50]. The goal is to partition the total effect of the independent variable X on the dependent variable Y into two parts: the direct effect of X on Y and the indirect effect of X on Y through the mediating variable M. For the simple example given above, the set of regression equations to be solved are:

Y_{i} = \beta_{0} + \beta_{1}M_{i} + \beta_{2}X_{i} + \beta_{3}C_{i} + \epsilon_{yi}    (6)
M_{i} = \gamma_{0} + \gamma_{1}X_{i} + \gamma_{2}C_{i} + \epsilon_{mi}.    (7)

Traditional mediation literature [51, 52] defines the direct effect of X on Y as the coefficient β₂ and the indirect effect of X on Y as the product of coefficients γ₁β₁, corresponding to the product of the path coefficients along the mediated path shown in Figure 2. Statistically significant values of γ₁β₁ indicate that the relationship between X and Y is mediated by M. This method demonstrates the general intuitive ideas underlying mediation analysis, but is subject to several important limitations. Foremost among them is that its applicability to models with categorical variables (e.g., binary outcomes) and nonlinearities is not well defined, as these situations preclude the use of simple sums and products of coefficients [53, 19, 54]. These difficulties are therefore problematic for a model predicting final disposition, a binary outcome. Furthermore, traditional mediation models leave the causal interpretation of their results ambiguous [55].
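For a continuous mediator and outcome, the traditional product-of-coefficients approach amounts to fitting Eqs. (6) and (7) and multiplying two coefficients, as in this illustrative sketch with hypothetical variables x, m, y, and c.

  # Traditional product-of-coefficients mediation for continuous M and Y
  fit_y <- lm(y ~ m + x + c, data = dat)   # Eq. (6): beta_1 is the coefficient on m, beta_2 on x
  fit_m <- lm(m ~ x + c, data = dat)       # Eq. (7): gamma_1 is the coefficient on x

  direct   <- coef(fit_y)["x"]                       # beta_2
  indirect <- coef(fit_m)["x"] * coef(fit_y)["m"]    # gamma_1 * beta_1
  total    <- direct + indirect

As noted above, this simple decomposition does not carry over to a binary outcome such as final disposition, which motivates the potential outcomes approach described next.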

Recent work in the field of causal inference [16, 17, 18] has formalized and generalized mediation analysis to resolve these limitations, allowing for categorical outcomes while also clarifying that under certain conditions the results may be interpreted causally. In this framework, often called the “potential outcomes” framework, the traditional product-of-coefficients mediation analysis is a special case for which the mediator and outcome variables are both continuous, while the functional forms of the direct and indirect effects for other situations become more complicated [50].

For the primary analysis of this paper, we calculate the direct and indirect effects defined by the potential outcomes framework for the case of a continuous independent variable (UGPA, GRE-P, and GRE-Q), a continuous mediator (GGPA), and a dichotomous outcome (final disposition). The simultaneous regression equations to be calculated are still Eqs. (6) and (7), except that the binary Y is replaced with Y*, a continuous unobserved latent variable that represents the observed binary variable. Once estimated, the direct and indirect effects reduce to simple differences in the probability of completing a PhD between students across different values of the independent variables. Mathematically, the effects for a change in the independent variable from a value x₀ to x₁ at a particular value of the covariate c are given by

\text{IE} = \Phi[\text{probit}(x_{0},x_{1})] - \Phi[\text{probit}(x_{0},x_{0})],    (8)
\text{DE} = \Phi[\text{probit}(x_{1},x_{1})] - \Phi[\text{probit}(x_{0},x_{1})],    (9)

where Φ represents the normal cumulative distribution function and probit(x_a, x_b) is given by

\text{probit}(x_{a},x_{b}) = [\beta_{0} + \beta_{2}x_{a} + \beta_{3}c + \beta_{1}(\gamma_{0} + \gamma_{1}x_{b} + \gamma_{2}c)]/\sqrt{v(x_{a})},    (10)

and v(x_a) is

v(x_{a}) = \beta_{1}^{2}\sigma_{m}^{2} + 1.    (11)

Note that these expressions are all still simply combinations of the coefficients from the regression equations (6) and (7). Thus, the potential outcomes framework allows us to calculate the total predicted change in probability of completing a PhD due to an independent variable and decompose it into that variable’s indirect effect on final disposition through GGPA as well as its direct effect (see Figure 4).

Using this mediation framework can help to give powerful insights into nuanced relationships between the variables in an observational study [56]. However, giving a truly causal interpretation to the results of this mediation analysis requires a set of strong assumptions to be met, and in practice it can be difficult for any observational study to fully meet these conditions [57]. Hence, we try to avoid making explicitly causal claims in our discussion of the results. We discuss the assumptions needed for a causal interpretation as well as the robustness of the current study to violations of those assumptions in Section V.2.

Mediation analyses were conducted using the mediation package in R [58]. Checks for the consistency of results across different computational approaches were done by performing duplicate mediation analyses in the R package medflex [59] as well as the statistical software Mplus [60].
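A minimal sketch of one such analysis with the mediation package is given below, using UGPA as the independent variable. The column names are hypothetical placeholders, the control and treatment values correspond to the 25th and 75th percentile UGPAs reported in Section IV.2, and covariates are omitted for brevity.

  library(mediation)

  # Mediator model (Eq. 7) and probit outcome model (Eq. 6) for binary final disposition
  med_fit <- lm(ggpa ~ ugpa, data = students)
  out_fit <- glm(completed ~ ggpa + ugpa, family = binomial(link = "probit"), data = students)

  # Direct (ADE), indirect (ACME), and total effects for a shift in UGPA from 3.46 to 3.90
  med <- mediate(med_fit, out_fit, treat = "ugpa", mediator = "ggpa",
                 control.value = 3.46, treat.value = 3.90, sims = 1000)
  summary(med)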

Figure 3: (a) Graduate GPA by program tier and final disposition. Students who do not complete a PhD earn lower graduate grades than students who complete their programs (r_pb = 0.43, p < .001). Tier 3 students tend to earn slightly lower graduate course grades than Tier 1 or 2 students. (b) Graduate GPA by program tier and gender. Male and female students earn similar graduate grades (r_pb = -0.01, p = 0.57, equivalent to a non-significant independent t-test), and this trend holds across program tier. (c) GRE Physics by program tier and gender. Across all program tiers there is a significant gap in scores between male and female GRE-P test takers (r_pb = -0.30, p < .001, equivalent to a statistically significant independent t-test). Score distributions trend upward for higher tier programs.
Measure (M ± SD)     | UGPA  | GRE-Q | GRE-V | GRE-P | GGPA  | Final Disp.  | Gender
UGPA (3.6 ± 0.3)     |       | (0.25, 0.35) | (0.12, 0.22) | (0.26, 0.37) | (0.24, 0.33) | (0.10, 0.20) | (-0.11, 0.01)
GRE-Q (83.3 ± 10.4)  | 0.30  |       | (0.30, 0.37) | (0.47, 0.54) | (0.13, 0.22) | (0.10, 0.18) | (-0.16, -0.07)
GRE-V (76.3 ± 18.7)  | 0.17  | 0.33  |       | (0.23, 0.32) | (0.06, 0.15) | (0.02, 0.10) | (-0.02, 0.07)
GRE-P (52.9 ± 23.2)  | 0.31  | 0.51  | 0.28  |       | (0.18, 0.27) | (0.10, 0.19) | (-0.33, -0.24)
GGPA (3.5 ± 0.5)     | 0.29  | 0.18  | 0.10  | 0.22  |       | (0.39, 0.46) | (-0.06, 0.03)
Final Disp.          | 0.15  | 0.15  | 0.06  | 0.14  | 0.43  |       | (-0.09, -0.01)
Gender               | -0.05 | -0.11 | 0.03  | -0.30 | -0.01 | -0.05 |
Table 2: A matrix showing bivariate correlations between continuous and dichotomous variables used in subsequent analyses. Correlations are shown in the lower diagonal while 95% confidence intervals for those correlations are shown in the upper diagonal. For example, the correlation between GGPA and GRE-P is 0.22, with a 95% CI of (0.18, 0.27), indicating a weak correlation. Means and standard deviations are presented in the first column. GPAs are on a 4.0 scale while GRE scores are in terms of percentiles. Correlations are calculated for US students only.

IV Results

IV.1 Results of exploring the role of graduate grades

Correlations. An initial question related to predicting a student’s final disposition is whether UGPA, GRE scores, and GGPA are reliably correlated with a student’s final outcome. We are also interested in the strength of association between GGPA, UGPA, and GRE scores, as this information yields insight into whether GGPA could serve as a mediating variable in predicting final disposition. Table 2 contains the bivariate correlations (Pearson, point-biserial, and phi) between each pair of measures for the sample in the lower diagonal. The 95% confidence intervals are reported in the upper diagonal. Confidence intervals that do not include a value of zero indicate that the correlation is statistically significant. The means and standard deviations of the continuous variables are presented in the table’s first column (GPA data is analyzed on a 4.0 scale while GRE scores are on the percentile scale).

Inspection of Table 2 reveals that GGPA is the predictor most strongly correlated with final disposition (r_pb = 0.43). This value is statistically significant (p < .001) and positive, meaning that students with higher GGPA are more likely to finish their PhD program successfully. This trend is visually apparent in Figure 3(a), which shows boxplots of GGPA grouped by PhD completers and non-completers. We also observe that UGPA (r_xy = 0.29, p < .001), GRE-Q (r_xy = 0.18, p < .001), and GRE-P (r_xy = 0.22, p < .001) are all positively correlated with GGPA, albeit weakly, meaning that students with higher scores in these metrics tend to earn higher GGPAs. Taken together, the observation that higher UGPA and GRE scores positively correlate with GGPA, which in turn correlates with a student’s likelihood of completion, implies that GGPA might play an important role in mediating the influence of these admissions metrics on PhD completion.

The lack of a statistically significant correlation between gender and GGPA indicates that the disparity in scores on the GRE-P between males and females does not manifest itself in subsequent GGPA performance. Indeed, there is no statistical difference between average GGPA for males and females (r_pb = -0.01, p = 0.57, equivalent to a non-significant independent t-test), as demonstrated in Figure 3(b). Yet there exists a statistically significant difference between males and females in GRE-P performance in our data (r_pb = -0.30, p < .001, equivalent to a statistically significant independent t-test). Thus GGPA does not differ between genders despite the known performance gap between males and females on the GRE-P exam. Furthermore, the phi coefficient measuring the association between gender and PhD completion is negligible despite barely meeting the threshold of statistical significance (φ = -0.05, p = 0.04).

Still, the bivariate correlations shown in Table 2 do not control for possible relationships between the variables of interest. For instance, the moderate correlation between GRE-Q and GRE-P (r_xy = 0.51, p < .001) indicates that there may be a spurious relationship between one of these variables and GGPA. In addition, we observe low but statistically significant correlations between UGPA and GRE-P (r_xy = 0.31, p < .001), GRE-Q (r_xy = 0.30, p < .001), and GRE-V (r_xy = 0.17, p < .001). This is expected, as UGPA likely contains some information regarding the specific aspects of students’ aptitudes tested by GRE exams. These results motivate the use of multiple regression analysis later in this section in order to disentangle the unique effects of each independent variable on GGPA. That analysis reveals that when we isolate the unique predictive effects of each variable in the regression model, UGPA and GRE-P remain significant but weak predictors of GGPA, while GRE-Q does not retain statistical significance.

Similarly, although UGPA (r_pb = 0.15, p < .001), GRE-P (r_pb = 0.14, p < .001), and GRE-Q (r_pb = 0.15, p < .001) are positively correlated with PhD completion, the magnitudes of these correlations are very weak and do not account for other parameters that may be associated with completion. Multivariate approaches allow us to isolate how individual metrics relate to PhD completion, which we explore in the mediation analysis presented in Section IV.2. Consistent with previous studies of PhD completion using multivariate approaches [1], we find that when accounting for other parameters, only UGPA remains a statistically significant predictor of completion.

Results of one-way independent ANOVA tests show that the main effect of program tier on GGPA is significant, F(2, 1949) = 26.31, p < .001, which reflects the upward trend in GGPA from Tier 3 to Tier 1 and 2 programs. A Tukey post hoc test reveals that GGPA was significantly higher for students at Tier 2 (M = 3.59, SD = 0.40, p < .001) and Tier 1 (M = 3.60, SD = 0.42, p < .001) institutions than those at Tier 3 institutions (M = 3.41, SD = 0.59). There was no statistically significant difference between the Tier 1 and Tier 2 groups (p = 0.97).

Multiple Regression. To disentangle the unique effects of each predictor on GGPA we conduct a multiple linear regression analysis. Multiple linear regression allows us to simultaneously fit many independent variables to measure each of their relative effects on a single dependent variable, GGPA. Analyzing the raw coefficients fitted by the regression analysis yields insight into the predicted change in GGPA due to changes in one variable while holding all others constant. Standardized coefficients allow for a comparison of the relative effect sizes of the independent variables.

Our model includes all available GRE scores (GRE-P, GRE-Q, and GRE-V) as well as UGPA in order to examine the unique predictive effects of each measure. We considered the possibility that including both GRE-P and GRE-Q in the same model would raise collinearity concerns, but find these concerns unfounded. The bivariate correlation between the two (r_xy = 0.51) is not high enough to warrant genuine concern [61, 62]; furthermore, the variance inflation factor (VIF) for every imputed dataset’s regression model was below 1.75, well below the commonly cited threshold of 10. Hence, we deem this model the most appropriate for answering the research questions posed in this study (further discussion of collinearity concerns in the data is available in the Supplemental Material).
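The VIF check described here can be reproduced with the vif() function from the car package applied to the fitted regression model for each imputed data set; the sketch below shows the idea for a single completed data set with hypothetical column names.

  library(car)

  # Variance inflation factors for the GGPA regression model (one imputed data set)
  fit <- lm(ggpa ~ ugpa + gre_p + gre_q + gre_v + gender + race, data = students)
  vif(fit)   # values near 1 indicate little collinearity; 10 is the usual concern threshold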

Multiple Regression Results (*p < .05, **p < .01)
Dependent Variable: GGPA

Independent Variable | Coefficient (Standard Error) | Standardized Coefficient (β)
Intercept            | 1.92** (0.15)                | -0.06
UGPA                 | 0.35** (0.04)                | 0.24**
GRE-P                | 31×10⁻⁴** (6×10⁻⁴)           | 0.15**
GRE-Q                | 16×10⁻⁴ (12×10⁻⁴)            | 0.03
GRE-V                | 3×10⁻⁴ (6×10⁻⁴)              | 0.01
Black                | -0.11 (0.09)                 | -0.23
Hispanic             | -0.01 (0.07)                 | -0.02
Native Am.           | 0.01 (0.23)                  | 0.03
Asian                | -0.02 (0.06)                 | -0.05
Other                | -0.24* (0.11)                | -0.51*
Undisclosed          | 0.07** (0.02)                | 0.15**
Gender               | 0.06* (0.03)                 | 0.13*
N                    | 1955                         |
Adjusted R-squared   | 0.11                         |
Table 3: Coefficients of a multiple regression analysis modeling graduate GPA as a function of common quantitative admissions metrics. Reference categories are White for race and Male for gender.

The results obtained in our analysis are summarized in Table 3. Significant predictive effects at the 95% threshold were found for the numerical metrics UGPA (β = 0.24, t = 8.88, p < .01) and GRE-P (β = 0.15, t = 5.02, p < .01). Students with higher UGPA and GRE-P scores therefore tend to receive higher GGPAs. Among statistically significant predictors, the highest standardized coefficient is that of UGPA, which is larger than the GRE-P coefficient by approximately 50%.

The regression model predicts that for a 0.10 score increase in UGPA, GGPA is expected to increase on average by 0.035 points, holding all other predictors fixed. Meanwhile, a 10 percentile increase in GRE-P score is associated with a 0.031 point increase in GGPA on average, again holding other predictors fixed.

A significant predictive effect was also found for gender (β = 0.13, t = 2.11, p = .04). The positive β coefficient indicates that if a female student and a male student have the same admissions scores, the female student would be expected to earn higher grades in her graduate classes. We include race in the analysis as a categorical variable, though all groups except White had small N (the most students in the dataset identified as White, N = 1205, while the fewest identified as Native American, N = 4). None of the race categories were statistically significant parameters except “Other.”

Notably, the GRE-Q coefficient is not statistically significant, suggesting that it has little predictive effect on graduate course performance. Yet in [1], GRE-Q was seen to link to overall completion. This observation, along with those made above regarding the predictive effects of UGPA and GRE-P on GGPA, motivates the mediation analysis in the following section. Some predictors, namely GRE-Q, may link more directly to PhD completion, while others such as UGPA and GRE-P may indirectly predict completion through their effect on GGPA, which itself links to completion.

IV.2 Mediation analysis results

Mediation analysis results for score changes from 25th to 75th percentiles

Independent Variable | Direct Effect                    | Indirect Effect                  | Total Effect
UGPA                 | -0.001 (-0.033, 0.034), p = 0.97 | 0.060 (0.037, 0.084), p < 0.01   | 0.060 (0.030, 0.090), p < 0.01
GRE-P                | -0.006 (-0.047, 0.035), p = 0.78 | 0.042 (0.023, 0.063), p < 0.01   | 0.037 (-0.002, 0.075), p = 0.08
GRE-Q                | 0.024 (-0.009, 0.060), p = 0.13  | 0.009 (-0.007, 0.023), p = 0.26  | 0.034 (-0.004, 0.068), p = 0.07
Table 4: Results of mediation analyses showing the predicted change in probability of PhD completion due to changes in admissions metrics from their 25th percentile values to their 75th percentile values among students in the study. The predicted total effect of UGPA is to increase a student’s overall probability of PhD completion by 6.0% (p < 0.01). That effect is entirely attributable to the indirect effect of UGPA on completion through graduate GPA. The total effects associated with GRE-P and GRE-Q are not statistically significant.

Having demonstrated that graduate GPA (GGPA) is correlated with several quantitative admissions metrics of interest as well as final PhD completion, we now turn to the results of a mediation analysis to determine the extent to which GGPA mediates the relationship between these variables. As described previously, these results yield insight into whether better performance in undergraduate coursework and on GRE examinations increases a student’s likelihood of completing a PhD program directly or indirectly via GGPA. Over the course of performing these analyses we fit numerous mediation models using different combinations of covariates (e.g., gender and race) to explore their effect on the results. For instance, we probed the effects of including these variables as moderators in the analysis to account for varying predictive effects among different demographics. However, results were consistent regardless of how these covariates were included in the model.

We perform a separate mediation analysis for each predictor variable (UGPA, GRE-P, and GRE-Q). For each analysis, we begin by calculating the direct and indirect effects of each metric on PhD completion using Eqs. (8) and (9). We also calculate the total effect, which is the sum of the direct and indirect effects. The total effect is comparable to the result one would obtain by using logistic regression to predict changes in probability of PhD completion as was done in Miller et al. [1].

This procedure requires us to choose both a control and treatment value for the admissions metrics, since the output of the mediation analysis is the predicted difference in probability of PhD completion if a student who earned the control score had earned the treatment score instead.

Table 4 shows the results of a mediation analysis in which we calculate the predicted change in probability of PhD completion if a student had earned a 3.90 undergraduate GPA rather than a 3.46 undergraduate GPA. This change corresponds to a shift from the 25th to 75th percentile of undergraduate GPAs in our data. We observe that the predicted total effect of this change is to increase the student’s overall probability of PhD completion by 6.0% (p < 0.01). That effect is entirely attributable to the indirect effect of UGPA on PhD completion through graduate GPA, since the direct effect is estimated to be -0.1% and is not statistically significant (p = 0.97), while the indirect effect is estimated to be 6.0% and is statistically significant (p < 0.01). Meanwhile, mediation analysis using GRE-P as the predictor variable estimates that due to a change from the 25th to 75th percentile of GRE-P scores among students in this study, the probability of PhD completion increases by 3.7% (this change corresponds to a shift in GRE-P percentile ranking from 35 to 71 among the overall physics test-taking population). This result just misses the threshold of statistical significance (p = 0.08). However, any effect that may be associated with a higher GRE-P score is attributable to the indirect effect of GRE-P through GGPA, which is estimated to be 4.2% and is statistically significant (p < 0.01). For a change in GRE-Q score from the 25th to 75th percentile of scores among students in this study (a change in GRE-Q percentile ranking from 79 to 91 among the overall physics test-taking population), the direct (2.4%, p = 0.13) and indirect (0.9%, p = 0.26) predictive effects on PhD completion are not statistically significant. Their sum, the total effect, estimates an increase in PhD completion probability of 3.4%, and like the total effect of GRE-P also just misses the threshold for statistical significance (p = 0.07).

To examine how the predicted probability of PhD completion changes over a broad range of UGPA and GRE scores, we repeat this single mediation analysis for a range of treatment values, as suggested in [22]. We choose our data’s median values as the baseline control value of each metric, then compute the direct and indirect effects for a variety of treatment values with respect to this baseline. The result is a plot of predicted PhD completion probability change as each score is varied. Figure 4 graphically summarizes the results of the mediation analysis. Figures 4a), c), and e) display the results of the three separate analyses predicting PhD completion probability changes relative to the median value of each independent variable. Points on each plot represent individual calculations of probability change relative to each admission metric’s median value. Hence, the median value of the x-axis variable is clearly visible on each plot as the point where the direct (red) and indirect effect (blue) lines intersect, corresponding to a total probability change of 0%.
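One way to generate curves of this kind is to repeat the single mediation analysis over a grid of treatment values while holding the control value at the sample median, as in this sketch, which reuses the hypothetical med_fit and out_fit models from the earlier mediation sketch.

  # Sweep treatment values of UGPA relative to its median, re-estimating effects each time
  ctrl <- median(students$ugpa, na.rm = TRUE)
  grid <- seq(3.0, 4.0, by = 0.1)

  effects <- lapply(grid, function(tr) {
    med <- mediate(med_fit, out_fit, treat = "ugpa", mediator = "ggpa",
                   control.value = ctrl, treat.value = tr, sims = 1000)
    c(treat = tr, direct = med$z.avg, indirect = med$d.avg, total = med$tau.coef)
  })
  effects <- do.call(rbind, effects)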

Figure 4: Mediation analysis results predicting a student’s change in probability of completion, split into direct (red) and indirect via graduate GPA (blue) effects. Points on each plot represent individual calculations of probability change relative to each admission metric’s median value. 95% confidence intervals are indicated by the shaded region around each line; hence, if the shaded region contains the line y = 0, the effect is not statistically significant at the α = .05 level. The median value of the x-axis variable is clearly shown on the plot as the point where the direct and indirect effect lines intersect, corresponding to a total probability change of 0%. The total effect of UGPA as well as the indirect effects associated with UGPA and GRE-P are statistically significant across all magnitudes of score change.
Independent Variable | Current Study: Total Effect Value (Probability Scale) | Current Study: Predicted Change in Completion Probability (25th-75th Percentile) | Ref. [1]: Coefficient (Log Odds Scale) | Ref. [1]: Predicted Change in Completion Probability (25th-75th Percentile)
UGPA                 | 0.060 (p < 0.01) | 6.0% | 0.60 (p < 0.01)     | 3.6%
GRE-P                | 0.037 (p = 0.08) | 3.7% | 5×10⁻³ (p = 0.09)   | 3.1%
GRE-Q                | 0.034 (p = 0.07) | 3.4% | 10×10⁻³ (p = 0.04)  | 2.1%
Table 5: Comparison of results from the current study to results presented by Miller et al. [1] in which the authors predict PhD completion using a logistic regression model. Despite using different statistical methods and omitting program tier as a predictor in our model, we observe that the results obtained in this study are qualitatively consistent with those reported previously. In both cases, UGPA is the strongest predictor of PhD completion among admissions metrics tested.

Percent probability changes due to direct effects of each admissions metric on PhD completion are shown in red, while indirect effects on PhD completion transmitted through GGPA are shown in blue. Plots of the total effect of each variable on PhD completion probability, which is the sum of the direct and indirect effects, are shown in Figures 4b), d), and f). The shaded ribbons around the lines representing the best estimates of probability change show the 95% confidence interval. Hence, if this shaded region contains the line y = 0, the effect is not statistically significant at the α = .05 level.

In agreement with the results of the example analysis shown in Table 4, the total predictive effect of UGPA on PhD completion is statistically significant, while the total effects of GRE-P and GRE-Q do not reach the threshold for statistical significance, as indicated by their error bands encompassing the y=0 line (Figures 4d and f). Still, as suggested by the Total Effect column of Table 4 for GRE-P (p=0.08) and GRE-Q (p=0.07), these effects are close to reaching statistical significance; for reference, at each point in Figure 4d) the GRE-P confidence ribbon crosses the y=0 line by less than 0.5%. Thus these results provide some evidence that scoring more highly on the GRE-P and GRE-Q is positively associated with higher rates of completion, although the effects are not statistically significant.

Also consistent with the results of the example analysis shown in Table 4, the three mediation analyses indicate that the predictive effects of UGPA and GRE-P on a student's PhD completion are entirely mediated by GGPA: the indirect effects of these admissions metrics are statistically significant across all magnitudes of score change, while their direct effects are not. These indirect effects are shown in Figure 4 by the blue lines in the UGPA and GRE-P plots, whose error ribbons do not contain the line y=0. The interpretation of this result is that UGPA effectively predicts a student's GGPA, which in turn predicts PhD completion. Similarly, any increase in PhD completion probability associated with increases in GRE-P scores is a result of the indirect effect through graduate GPA, although the total effect of GRE-P is not statistically significant.

With regard to GRE-Q, given the weak relationship between GRE-Q and GGPA revealed by the multiple regression analysis in Section IV.1, it is unsurprising that the indirect effect shown in blue in Figure 4 is nearly zero. Indeed, as indicated by the earlier results in Table 4, any predictive effect of GRE-Q on completion appears to stem from its direct effect on PhD completion, although that direct effect does not achieve statistical significance at the α=0.05 level.

Lastly, despite using different statistical methods and omitting program tier as a predictor in our model, we observe that the results obtained in this study are qualitatively consistent with those reported by Miller et al. [1]. Table 5 compares the total effects predicted by the mediation model presented here with the results of the logistic regression model presented in [1], again using as an example the predicted changes in probability of PhD completion due to shifts from the 25th to the 75th percentile in score for the different admissions metrics. In both analyses, the predictive effects associated with changes in UGPA are statistically significant and are the largest in magnitude among the tested metrics. Effects associated with changes in GRE-P scores are not statistically significant at the α=0.05 level in either model but are close (p=0.08 in this study and p=0.09 in [1]). The only minor inconsistency between the two analyses appears in the GRE-Q models: whereas in [1] the predictive effect of GRE-Q on completion was marginally significant (p=0.04), it is not statistically significant here (p=0.07). However, as Table 5 shows, the effects associated with GRE-Q are not strong in either model and both are very close to the α=0.05 threshold, so the two results are still approximately consistent.

V Discussion

V.1 Interpretation of Results

The results presented in Section IV.2 give new insight into how admissions committees may contextualize the use of quantitative admissions metrics. Clearly, no existing metric provides unassailable evidence that a student will complete their graduate program. However, among the imperfect quantitative metrics commonly used by admissions committees, the consistent message from this work and others is that undergraduate GPA offers the most promising insight into whether physics graduate students will earn a PhD. Moreover, there is no significant difference between male and female applicants' UGPAs as there is for the GRE-P, meaning that its use in ranking applicants is less likely to skew the diversity of admitted students.

As demonstrated by the multiple regression analyses in Section IV.1, UGPA is the variable most strongly associated with graduate course performance among those tested. In some ways this is an expected result: UGPA, the metric that directly measures a student's in-class performance, is most effective at predicting future in-class performance in graduate school. Still, UGPA would seem to vary greatly depending on the student's particular undergraduate institution, while a standardized exam like the GRE-P is consistent across all students. It is possible that UGPA is also signaling socio-emotional skills such as achievement orientation and conscientiousness, which are known to predict high levels of performance both in and out of the classroom [63, 64, 65, 66]. The relevant research literature also indicates that these socio-emotional skills lack the race, gender, and culture-of-origin gaps found on many standardized tests [67, 68, 69].

While previous work showed that UGPA was an effective predictor of PhD completion, mediation analysis demonstrates that this relationship is entirely transmitted through UGPA's ability to predict GGPA. Thus, although UGPA is the best predictor of a student's final disposition, our analysis indicates that it is not a direct measure of PhD completion; rather, the observed relationship between undergraduate grades and completion is explained by the intervening variable GGPA. Regarding the magnitude of the observed effects, a change in score from the 25th to the 75th percentile in UGPA (3.46 to 3.90) is associated with a 6% increase in completion probability. For a change across a broader range of UGPAs, from the 10th to the 90th percentile (3.2 to 3.98), the mediation model predicts an 11% increase in PhD completion probability.

Multiple regression and mediation analyses also yield improved insight into the information provided by the GRE-P. Consistent with prior published work by ETS [11, 10], regression analysis reveals that GRE-P is an effective predictor of graduate grades. However, the effect associated with UGPA is approximately 50% larger than the effect associated with GRE-P. Regarding PhD completion, any association between GRE-P performance and completion is, like that of UGPA, entirely mediated by graduate course performance. This indirect effect indicates that a student who scores more highly on the GRE-P is more likely to perform better in graduate courses, which may slightly improve their probability of graduation (although the total effect of GRE-P is not statistically significant). Even so, this indirect effect is smaller than the indirect effect associated with UGPA, as illustrated in Figure 4.

Notably, despite the existence of a large gender gap in GRE-P scores (the median GRE-P percentile is 35 for females and 57 for males), male and female graduate students earn nearly indistinguishable graduate grades (Figure 3). Moreover, there is no practical relationship between gender and PhD completion (Table 2). We have also performed a preliminary analysis of this same data examining the relationship between gender and time to PhD completion, and found no statistically significant difference in the time it takes male and female physics graduate students to complete doctoral degrees. The disparity in GRE-P scores between male and female test takers is therefore anomalous: it does not appear to reflect differences in ability or level of preparation, and it is not mirrored in subsequent graduate performance.

The results of the multiple regression and mediation analyses showing that GRE-Q is not strongly associated with graduate course performance are unsurprising, given that the GRE-Q's task relevance is lower than that of subject tests and undergraduate grades. Indeed, in its Guide to the Use of Scores [25], ETS describes the GRE-Q as testing “high school mathematics and statistics at a level that is generally no higher than a second course in algebra; it does not include trigonometry, calculus or other higher-level mathematics.” As shown in Table 4, the association between GRE-Q scores and PhD completion just misses the α=0.05 threshold of statistical significance (p=0.07). Combined with previous studies in which this weak association was statistically significant [1], the evidence suggests a weak relationship between GRE-Q scores and completion.

Neither the direct nor the indirect effect of GRE-Q on completion was statistically significant, and therefore we cannot discern with certainty which small effect is more important. Considering the case of a possible direct effect between GRE-Q and completion, the exam's low task relevance makes it unlikely that GRE-Q is a measure of research competence or perseverance. One hypothesis is that socioeconomic status (SES) could be confounding the relationship between GRE-Q, an exam consisting of high-school-level mathematics questions, and PhD completion. Students with lower SES may have had fewer academic opportunities and may perform worse on a standardized exam like the GRE-Q; indeed, it is estimated that roughly 20% of the variance in standardized test scores can be explained by SES [70]. More generally, previous research [71] indicates that SES affects whether students possess the resources to support them should financial, health, or other external circumstances make it difficult to complete their degree. Moreover, doctoral students from lower social classes are more likely to experience a lower sense of belonging in graduate school, often due to residual financial burdens that are not mitigated by graduate stipends [72]. A reduced sense of belonging in graduate school drives lower interest in pursuing advanced careers in the field, and ultimately a lower likelihood of completing a PhD.

V.2 Limitations and future research

V.2.1 Assumptions in Causal Mediation Analysis

The causal effect framework laid out in Section III.3 requires several assumptions to be satisfied in order for effect estimates to be properly identified. All of the assumptions underlying causal mediation analysis refer to “confounding variables,” which are variables that influence two other variables simultaneously, thereby inducing a spurious association between them. In this section we discuss whether it is plausible that our study has satisfied these assumptions, and we offer suggestions for future researchers seeking to perform similar analyses.

Although the assumptions prescribed by causal mediation analysis may be stated in multiple ways [73, 74], we describe them in the manner presented by [19], who condenses them into four requirements. These assumptions, displayed graphically in Figure 5, are that there exists (1) no unmeasured treatment-outcome confounders, (2) no unmeasured mediator-outcome confounders, (3) no unmeasured treatment-mediator confounders, and (4) no mediator-outcome confounder affected by the treatment.

The final assumption is equivalent to assuming that there is no alternative mediating variable for which we have not accounted [73]. Concerns about this assumption are handled using methods for multiple mediators, discussed in detail in the Supplemental Material.

Figure 5: Mediation diagram showing the relationship of “confounding variables” C_1 through C_4 to the variables of interest X, Y, and M. To satisfy the underlying assumptions of causal mediation analysis, researchers must account for these confounding variables in their regression models.

Assumptions 1-3 require us to be sure we have included all variables that could be confounding the relationships between X (UGPA, GRE-P, or GRE-Q), Y (PhD completion), and M (GGPA) in our regression models. These confounders are shown graphically as variables C_1, C_2, and C_3 in Figure 5. Some of the C variables may influence more than one of X, Y, and M, or may influence each other, so categorization into C_1, C_2, and C_3 is not exclusive. Accounting for the variables C_1 and C_3 corresponds to assumptions normally made in observational studies to calculate total effects [75], while accounting for the variables labeled C_2 is important specifically for the estimation of direct and indirect effects.
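For readers who prefer a formal statement, these four requirements are commonly expressed in potential-outcomes notation (see, e.g., [19]). Writing Y(x, m) and M(x) for the counterfactual outcome and mediator values and C for the set of measured covariates, they amount to the conditional independencies (1) Y(x, m) ⊥ X | C, (2) Y(x, m) ⊥ M | {X, C}, (3) M(x) ⊥ X | C, and (4) Y(x, m) ⊥ M(x*) | C, holding for all values of x, x*, and m.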

V.2.2 Sensitivity Analysis for Violation of Assumptions

Unfortunately, as is the case in any regression analysis, whether these assumptions are met is not testable; however, there exist several ways to probe the robustness of our findings under certain violations. In particular, sensitivity analyses have been developed [22, 19] to determine how strong a confounding effect between the mediator and the outcome would have to be in order to make statistically significant effects become no longer significant.

We perform such sensitivity analyses on each of the imputed data sets used in the single mediator models presented in Section IV.2 to assess their robustness. We then report the results of the sensitivity analyses averaged across the imputed datasets.

Let R^2_M and R^2_Y represent the proportions of original variance explained by the unobserved confounder for the mediator GGPA and the outcome PhD completion, respectively. The result of a sensitivity analysis is a single value of the product R^2_M × R^2_Y that identifies the amount of original variance in the mediator and outcome that the confounder would have to explain in order to make the observed effect vanish. Hence, the sensitivity analysis results in a family of solutions for which the equation R^2_M × R^2_Y = constant is satisfied.
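As an illustration, the sketch below shows how such an analysis could be run with the medsens() function of the mediation R package [58], continuing from the hypothetical med.out object in the earlier sketch; in the actual study, the resulting values were averaged across the imputed data sets.

library(mediation)

# Sensitivity of the indirect effect (ACME) to an unobserved mediator-outcome confounder
sens.out <- medsens(med.out, rho.by = 0.05, effect.type = "indirect", sims = 1000)
summary(sens.out)  # reports the R^2_M * R^2_Y product at which the ACME would vanish

# R^2-based sensitivity contours (cf. the plots in the Supplemental Material)
plot(sens.out, sens.par = "R2", r.type = "total", sign.prod = "positive")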

Our sensitivity analysis reveals that for a confounder to explain enough variance to make the indirect effect of UGPA on PhD completion (transmitted through GGPA) vanish, the product R^2_M × R^2_Y would have to be, on average, 0.072 (standard deviation = 0.006). For context, the variables included in the mediation models presented here, as a group, are able to explain 11% of the variance in the GGPA outcome (R^2 = 0.11); for PhD completion, R^2 = 0.32.

Comparing these values to those required to satisfy R^2_M × R^2_Y = 0.072, we see that a mediator-outcome confounder would have to be a better predictor of both GGPA and PhD completion than any quantity included in the model so far. For instance, suppose a mediator-outcome confounder satisfied R^2_M × R^2_Y = 0.15 × 0.45 = 0.072, sufficient to nullify the indirect effect of UGPA on completion. The relationship between the confounder and GGPA (R^2_M = 0.15) would be stronger than that of all other variables in the model combined, and the same would be true of the link between the confounder and PhD completion (R^2_Y = 0.45).

Since the values of R^2_M and R^2_Y required to nullify the indirect effect of UGPA on PhD completion are so high relative to the explanatory power of the variables already in our model, this result appears to be fairly robust.

Results of sensitivity analyses on the indirect effect of GRE-P on PhD completion were similarly robust: for a confounder to explain enough variance to make the indirect effect of GRE-P on PhD completion (transmitted through GGPA) vanish, the product R^2_M × R^2_Y would on average have to be 0.069 (standard deviation = 0.001). This is essentially the same value as in the analysis of the UGPA result, so its interpretation is the same as above.

More details regarding the sensitivity analyses performed, including contour plots of the R^2_M × R^2_Y products for which the indirect effects of UGPA and GRE-P vanish, are available in the Supplemental Material.

V.2.3 Conceptualizing possible unmeasured confounders

Thoroughly considering variables that could conceivably act as confounders in Figure 5 would benefit future researchers studying graduate admissions. Non-quantitative aspects of a student's admission credentials such as letters of recommendation and prior research experience may be associated with PhD completion, and could represent several C variables in the diagram. For example, prior research experience could act as a mediator-outcome confounder, labeled C_2 in Figure 5. A student with more research experiences prior to entering graduate school may be more likely to complete a PhD than a student with fewer research experiences, since they have already become familiar with the expectations associated with scientific research and have already demonstrated the motivation to pursue it independently. More research experience may also translate to better graduate course performance, particularly if graduate courses are well-aligned with the goal of preparing students for future research.

As discussed earlier, information on student socioeconomic status could be useful as well; we see SES as potentially representing several confounding relationships. For instance, it could act as a treatment-outcome confounder, influencing both GRE scores and PhD completion. It is estimated that roughly 20% of the variance in standardized test scores can be explained by SES [70], so SES could be influencing performance on tests such as the GRE-Q and GRE-P. Students with lower SES may also have fewer resources to support them should financial or other external circumstances arise that make it difficult for them to complete their PhD [71, 35]. Including data on these and other possible confounders would bolster causal claims in future analyses. Furthermore, our current models explain only a small amount of the overall variance in the outcome variable, and many other factors are surely at play.

Lastly, although graduate course performance and PhD completion represent some aspects of graduate school success (as evidenced by their inclusion in GRE validation studies), they are certainly crude metrics. Future work should explore other outcomes as well, including success measures such as research productivity, job attainment, or graduate student satisfaction.

VI Conclusions

Using data visualization, regression analyses, and mediation analyses, we investigated the role that graduate GPA plays in a physics graduate student's path to PhD completion. We aimed to answer two primary research questions: 1) How do commonly used admissions metrics and demographic factors relate to physics graduate GPA? and 2) What role does graduate GPA play in predicting PhD completion, and does it mediate the influence of these other predictor variables on PhD completion? Broadly, we find that across the dynamic range of scores in the data, undergraduate GPA is a better predictor of both graduate GPA and final disposition than GRE scores.

Regarding the first research question of how various admissions metrics and demographic factors relate to physics graduate GPA, significant but weak predictive effects at the 95% confidence level were found for the numerical metrics undergraduate GPA (β=0.24, t=8.88, p<.01) and GRE Physics (β=0.15, t=5.02, p<.01); GRE Quantitative and Verbal scores are not significantly associated with graduate GPA. The regression model predicts that for a 0.10 point increase in undergraduate GPA, a student's graduate GPA is expected to increase on average by 0.035 points, holding all other predictors fixed. Meanwhile, a 10 percentile point increase in GRE Physics score is associated with a 0.031 point increase in graduate GPA on average, again holding other predictors fixed. For comparison, a change in UGPA from the 25th to the 75th percentile of scores in our data predicts a 0.15 point increase in graduate GPA, whereas a change in GRE-P from the 25th to the 75th percentile of scores in our data predicts a 0.11 point increase.
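As a consistency check on these figures, the quoted 25th-to-75th percentile changes follow from the per-unit effects (values rounded): for UGPA, (3.90 - 3.46) × (0.035 per 0.10 points) = 0.44 × 0.35 ≈ 0.15 points of graduate GPA, and for GRE-P, (71 - 35) × (0.031 per 10 percentile points) = 36 × 0.0031 ≈ 0.11 points.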

We also observe that graduate GPAs do not differ statistically between males and females. Hence, the statistically significant gender gap in performance on the GRE Physics exam (within our data, the median GRE Physics percentile is 35 for females and 57 for males) does not carry over to subsequent graduate course performance. This large difference in test performance remains unexplained, yet it is potentially problematic for promoting diversity in physics graduate school. Multiple regression analysis did not reveal race to be a statistically significant predictor of graduate GPA, but unfortunately it is difficult to properly interpret relationships between race and graduate grades due to a small N: the small sample size precludes useful interpretation of the results pertaining to Black, Hispanic, Native American, and Asian students.

As to the second research question of whether graduate GPA mediates the influence of these other predictor variables on PhD completion, we find that UGPA predicts PhD completion indirectly through graduate grades. Only UGPA is a statistically significant predictor of overall PhD completion (a change in UGPA from the 25th to the 75th percentile of scores in our data predicts a 6% increase in PhD completion probability, p<0.01), and that effect is entirely attributable to the indirect effect of UGPA on PhD completion through graduate GPA. The indirect effect associated with UGPA was statistically significant across all magnitudes of score change, while the direct effect was not (see Figure 4). Thus UGPA effectively predicts graduate course performance, which is then associated with degree completion. The association between GRE-P scores and PhD completion is not statistically significant (a change in GRE-P from the 25th to the 75th percentile of scores in our data predicts a 3.7% increase in PhD completion probability, p=0.08). However, like that of UGPA, the indirect effect associated with increases in GRE-P score was statistically significant across all magnitudes of score change, meaning that any predictive effect GRE-P score may have is also linked indirectly through graduate GPA.

Although these models explain some of the variance in student outcomes (the variables included in the mediation models, as a group, explain 11% of the variance in graduate GPA (R^2 = 0.11); for PhD completion, R^2 = 0.32), much of the variation lies in factors outside the models, in both unmeasured student characteristics prior to admission and unmeasured aspects of the graduate student experience.

No standardized test measures the research and project management skills it takes to successfully complete a multi-year research project, yet those are the skills most highly valued in PhD graduates. The GRE-P uses two-minute theoretical physics problems to probe aspects of students' physics knowledge, but it neglects the broad range of computational and experimental skills used in contemporary physics research. Because undergraduate GPA reflects a mix of courses that include theory, experiment, computation, and in some cases research projects, it could be a more useful measure of research-relevant skills. However, our result that UGPA only indirectly predicts PhD completion suggests that those research-relevant skills are not a major component of the overall UGPA. Identifying a broader set of applicant characteristics that predict graduate student outcomes is essential.

By better understanding and improving graduate education, we have the opportunity to meet societal goals of a highly skilled advanced STEM workforce that reflects the diversity of our society. While adjusting admissions practices may offer some improvements by changing which students are allowed to undertake graduate study, such efforts do nothing to improve the graduate student experience or to train graduate students more effectively for STEM careers. The potential for innovation and improvement within graduate education is large, and it is an area deserving substantially increased attention from education research and programmatic implementation. While our study gives admissions committees greater insight into how and why various quantitative scores link to completion, our discussion of the limitations also points to areas where future researchers can build. We encourage the continued study of not only the physics graduate admissions process, but also the ongoing experience of students in PhD programs: how they are taught, mentored, and supported through their growth as individuals within a larger scientific community.

Acknowledgements.
The authors are pleased to acknowledge valuable discussions with Nicholas Young, Julie Posselt, and Rachel Silvestrini. This work was supported by NSF grants 1633275 and 1834516.

References

  • Miller et al. [2019] C. W. Miller, B. M. Zwickl, J. R. Posselt, R. T. Silvestrini, and T. Hodapp, Typical physics Ph.D. admissions criteria limit access to underrepresented groups but fail to predict doctoral completion, Science Advances 5, eaat7550 (2019).
  • gre [2021] GRE requirements & admissions fees for US/Canadian Astronomy & Physics Programs, Available at https://docs.google.com/spreadsheets/d/19UhYToXOPZkZ3CM469ru3Uwk4584CmzZyAVVwQJJcyc/edit#gid=0 (2021).
  • Attiyeh and Attiyeh [1997] G. Attiyeh and R. Attiyeh, Testing for Bias in Graduate School Admissions, The Journal of Human Resources 32, 524 (1997).
  • Posselt et al. [2019] J. R. Posselt, T. E. Hernandez, G. L. Cochran, and C. W. Miller, Metrics first, diversity later? making the short list and getting admitted to physics phd programs, Journal of Women and Minorities in Science and Engineering 25 (2019).
  • Miller and Stassun [2014] C. Miller and K. Stassun, A test that fails, Nature 510, 303 (2014).
  • noa [2021a] NSF - Science and Engineering Doctorates: (SED Interactive), Available at https://www.nsf.gov/statistics/2016/nsf16300/digest/ (2021a).
  • Levesque et al. [2015] E. M. Levesque, R. Bezanson, and G. R. Tremblay, Physics GRE Scores of Prize Postdoctoral Fellows in Astronomy (2015), arXiv:1512.03709 [physics.ed-ph] .
  • Young and Caballero [2021] N. T. Young and M. D. Caballero, Physics Graduate Record Exam does not help applicants “stand out”, Physical Review Physics Education Research 17, 010144 (2021).
  • Walpole et al. [2002] M. Walpole, N. W. Burton, K. Kanyi, and A. Jackenthal, Selecting successful graduate students: in-depth interviews with GRE users, ETS Research Report Series 2002, 10.1002/j.2333-8504.2002.tb01875.x (2002).
  • Schneider and Briel [1990] L. M. Schneider and J. B. Briel, Validity of the GRE: 1988-1989 Summary Report, Tech. Rep. (1990).
  • Kuncel et al. [2001] N. R. Kuncel, S. A. Hezlett, and D. S. Ones, A comprehensive meta-analysis of the predictive validity of the graduate record examinations: implications for graduate student selection and performance., Psychological bulletin 127, 162 (2001).
  • Petersen et al. [2018] S. L. Petersen, E. S. Erenrich, D. L. Levine, J. Vigoreaux, and K. Gile, Multi-institutional study of GRE scores as predictors of STEM PhD degree completion: GRE gets a low mark, PloS one 13, e0206570 (2018).
  • Kuncel et al. [2010] N. R. Kuncel, S. Wee, L. Serafin, and S. A. Hezlett, The validity of the Graduate Record Examination for master’s and doctoral programs: A meta-analytic investigation, Educational and Psychological Measurement 70, 340 (2010).
  • Hall et al. [2017] J. D. Hall, A. B. O’Connell, and J. G. Cook, Predictors of student productivity in biomedical graduate school applications, PLoS One 12, e0169121 (2017).
  • Moneta-Koehler et al. [2017] L. Moneta-Koehler, A. M. Brown, K. A. Petrie, B. J. Evans, and R. Chalkley, The limitations of the GRE in predicting success in biomedical graduate school, PloS one 12, e0166742 (2017).
  • Robins and Greenland [1992] J. M. Robins and S. Greenland, Identifiability and Exchangeability for Direct and Indirect Effects, Epidemiology 3, 143 (1992), publisher: Lippincott Williams & Wilkins.
  • Pearl [2000] J. Pearl, Causality: Models, Reasoning, and Inference (Cambridge University Press, 2000).
  • Pearl [2001] J. Pearl, Direct and indirect effects, in Proceedings of the Seventeenth conference on Uncertainty in artificial intelligence, UAI ‘01 (Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2001) pp. 411–420.
  • VanderWeele [2015] T. VanderWeele, Explanation in Causal Inference: Methods for Mediation and Interaction (Oxford University Press, 2015).
  • Hayes [2013] A. F. Hayes, Introduction to Mediation, Moderation, and Conditional Process Analysis: A Regression-Based Approach (Guilford Press, 2013).
  • Pearl [2012] J. Pearl, The causal mediation formula - a guide to the assessment of pathways and mechanisms, Prevention Science: The Official Journal of the Society for Prevention Research 13, 426 (2012).
  • Imai et al. [2010] K. Imai, L. Keele, and D. Tingley, A general approach to causal mediation analysis., Psychological Methods 15, 309 (2010).
  • Valeri and Vanderweele [2013] L. Valeri and T. J. Vanderweele, Mediation analysis allowing for exposure-mediator interactions and causal interpretation: theoretical assumptions and implementation with SAS and SPSS macros, Psychological Methods 18, 137 (2013).
  • Miller et al. [2020] C. W. Miller, B. M. Zwickl, J. R. Posselt, R. T. Silvestrini, and T. Hodapp, Response to comment on “Typical physics Ph.D. admissions criteria limit access to underrepresented groups but fail to predict doctoral completion”, Science Advances 6, 10.1126/sciadv.aba4647 (2020).
  • ets [2021] GRE Guide to the Use of Scores, Available at https://www.ets.ord/gre/guide (2021).
  • Wendler and Bridgeman [2014] C. Wendler and B. Bridgeman, The Research Foundation for the GRE Revised General Test: A Compendium of Studies (2014).
  • Briel et al. [1993] J. Briel, K. O’Neill, and J. Scheuneman, GRE Technical Manual (Educational Testing Service, Princeton, NJ, 1993).
  • Willingham et al. [2002] W. W. Willingham, J. M. Pollack, and C. Lewis, Grades and Test Scores: Accounting for Observed Differences, Journal of Educational Measurement 39, 1 (2002).
  • Bowers [2016] A. J. Bowers, What do Teacher Assigned Grades Measure? A one page research summary (2016).
  • Brookhart et al. [2016] S. M. Brookhart, T. R. Guskey, A. J. Bowers, J. H. McMillan, J. K. Smith, L. F. Smith, M. T. Stevens, and M. E. Welsh, A Century of Grading Research: Meaning and Value in the Most Common Educational Measure, Review of Educational Research 86, 803 (2016).
  • Bowers [2011] A. J. Bowers, What’s in a grade? The multidimensional nature of what teacher-assigned grades assess in high school, Educational Research and Evaluation 17, 141 (2011).
  • Bowers [2009] A. J. Bowers, Reconsidering grades as data for decision making: more than just academic knowledge, Journal of Educational Administration 47, 609 (2009).
  • Kyllonen et al. [2011] P. C. Kyllonen, A. M. Walters, and J. C. Kaufman, The role of noncognitive constructs and other background variables in graduate education, ETS Research Report Series 2011, i (2011).
  • Owens et al. [2020a] L. M. Owens, K. Shar, B. M. Zwickl, and C. W. Miller, Student Deficits vs. Sense of Belonging: Exploring Faculty and Student Perspectives on Retention in Physics Graduate Programs (2020a).
  • Owens et al. [2018] L. M. Owens, B. Zwickl, S. Franklin, and C. Miller, Misaligned Visions for Improving Graduate Diversity: Student Characteristics vs. Systemic/Cultural Factors, in Physics Education Research Conference 2018, PER Conference (Washington, DC, 2018).
  • Gardner [2008] S. K. Gardner, Fitting the mold of graduate school: A qualitative study of socialization in doctoral education, Innovative Higher Education 33, 125 (2008).
  • Lovitts [2002] B. E. Lovitts, Leaving the ivory tower: The causes and consequences of departure from doctoral study (Rowman & Littlefield Publishers, 2002).
  • National Academies of Sciences, Engineering, and Medicine and others [2018] National Academies of Sciences, Engineering, and Medicine and others, Graduate STEM education for the 21st century (National Academies Press, 2018).
  • Allen et al. [2019] M. Allen, D. Poggiali, K. Whitaker, T. R. Marshall, and R. A. Kievit, Raincloud plots: a multi-platform tool for robust data visualization, Wellcome Open Research 4, 63 (2019).
  • Wickham [2016] H. Wickham, ggplot2: Elegant Graphics for Data Analysis (Springer-Verlag New York, 2016).
  • [41] K. Desiraju, RPubs - ggplot theme for publication ready figures, Available at https://rpubs.com/Koundy/71792.
  • Traxler et al. [2016] A. L. Traxler, X. C. Cid, J. Blue, and R. Barthelemy, Enriching gender in physics education research: A binary past and a complex future, Physical Review Physics Education Research 12, 020114 (2016).
  • noa [2021b] U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics, Integrated Postsecondary System (IPEDS), Available at https://nces.ed.gov/ipeds/ (2021b).
  • Kline [2015] R. B. Kline, Principles and Practice of Structural Equation Modeling, 4th ed. (Guilford Publications, United States, 2015).
  • Small [2017] A. R. Small, Range restriction, admissions criteria, and correlation studies of standardized tests (2017), arXiv:1709.02895 [physics.ed-ph] .
  • National Research Council [2011] National Research Council, A data-based assessment of research doctorate programs in the United States (National Research Council, 2011).
  • van Buuren and Groothuis-Oudshoorn [2011] S. van Buuren and K. Groothuis-Oudshoorn, mice: Multivariate Imputation by Chained Equations in R, Journal of Statistical Software 45, 1 (2011).
  • Field et al. [2012] A. Field, J. Miles, and Z. Field, Discovering Statistics Using R, 1st ed. (SAGE Publications Ltd, London; Thousand Oaks, Calif, 2012).
  • Scott Jones [2021] J. Scott Jones, Learn to Use the Phi Coefficient Measure and Test in R With Data From the Welsh Health Survey (Teaching Dataset) (London, 2021).
  • Muthén et al. [2016] B. O. Muthén, L. K. Muthén, and T. Asparouhov, Regression and Mediation Analysis Using Mplus (Muthén & Muthén, 2016).
  • Baron and Kenny [1986] R. M. Baron and D. A. Kenny, The moderator–mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations, Journal of Personality and Social Psychology 51, 1173 (1986).
  • MacKinnon [2008] D. P. MacKinnon, Introduction to Statistical Mediation Analysis (Routledge, 2008).
  • MacKinnon et al. [2007] D. P. MacKinnon, C. M. Lockwood, C. H. Brown, W. Wang, and J. M. Hoffman, The intermediate endpoint effect in logistic and probit regression, Clinical Trials (London, England) 4, 499 (2007).
  • Hoyle [2012] R. H. Hoyle, Handbook of Structural Equation Modeling (Guilford Press, 2012).
  • Sobel [2008] M. E. Sobel, Identification of Causal Parameters in Randomized Studies With Mediating Variables, Journal of Educational and Behavioral Statistics 33, 230 (2008).
  • Imai et al. [2011] K. Imai, L. Keele, D. Tingley, and T. Yamamoto, Unpacking the Black Box of Causality: Learning about Causal Mechanisms from Experimental and Observational Studies, American Political Science Review 105, 765 (2011).
  • Muthén [2011] B. Muthén, Applications of causally defined direct and indirect effects in mediation analysis using SEM in Mplus (2011).
  • Tingley et al. [2014] D. Tingley, T. Yamamoto, K. Hirose, L. Keele, and K. Imai, mediation: R package for Causal Mediation Analysis, Journal of Statistical Software 59, 1 (2014).
  • Steen et al. [2017] J. Steen, T. Loeys, B. Moerkerke, and S. Vansteelandt, medflex: An R package for Flexible Mediation Analysis using Natural Effect Models, Journal of Statistical Software 76, 1 (2017).
  • Muthén and Muthén [2009] L. Muthén and B. Muthén, Mplus, Statistical analysis with latent variables. User’s guide 8.4 (2009).
  • Dormann et al. [2013] C. F. Dormann, J. Elith, S. Bacher, C. Buchmann, G. Carl, G. Carré, J. R. G. Marquéz, B. Gruber, B. Lafourcade, P. J. Leitão, T. Münkemüller, C. McClean, P. E. Osborne, B. Reineking, B. Schröder, A. K. Skidmore, D. Zurell, and S. Lautenbach, Collinearity: a review of methods to deal with it and a simulation study evaluating their performance, Ecography 36, 27 (2013).
  • Vatcheva et al. [2016] K. P. Vatcheva, M. Lee, J. B. McCormick, and M. H. Rahbar, Multicollinearity in Regression Analyses Conducted in Epidemiologic Studies, Epidemiology (Sunnyvale, Calif.) 6, 10.4172/2161-1165.1000227 (2016).
  • Schmidt and Hunter [1998] F. L. Schmidt and J. E. Hunter, The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings., Psychological bulletin 124, 262 (1998).
  • Shultz and Zedeck [2012] M. M. Shultz and S. Zedeck, Admission to law school: New measures, Educational Psychologist 47, 51 (2012).
  • Lievens and Sackett [2012] F. Lievens and P. R. Sackett, The validity of interpersonal skills assessment via situational judgment tests for predicting academic success and job performance., Journal of Applied Psychology 97, 460 (2012).
  • Victoroff and Boyatzis [2013] K. Z. Victoroff and R. E. Boyatzis, What is the relationship between emotional intelligence and dental student clinical performance?, Journal of dental education 77, 416 (2013).
  • Oswald and Hough [2011] F. L. Oswald and L. M. Hough, Personality and its assessment in organizations: Theoretical and empirical developments. (American Psychological Association, 2011).
  • Feingold [1994] A. Feingold, Gender differences in personality: a meta-analysis., Psychological bulletin 116, 429 (1994).
  • Emmerling et al. [2012] R. Emmerling, R. E. Boyatzis, and R. J. Emmerling, Emotional and social intelligence competencies: cross cultural implications, Cross Cultural Management: An International Journal  (2012).
  • Sackett et al. [2012] P. R. Sackett, N. R. Kuncel, A. S. Beatty, J. L. Rigdon, W. Shen, and T. B. Kiger, The Role of Socioeconomic Status in SAT-Grade Relationships and in College Admissions Decisions, Psychological Science, 10.1177/0956797612438732 (2012), publisher: SAGE Publications, Los Angeles, CA.
  • Owens et al. [2020b] L. Owens, B. Zwickl, S. Franklin, and C. Miller, Physics GRE Requirements Create Uneven Playing Field for Graduate Applicants (2020) pp. 382–387.
  • Ostrove et al. [2011] J. M. Ostrove, A. J. Stewart, and N. L. Curtin, Social class and belonging: Implications for graduate students’ career aspirations, The Journal of Higher Education 82, 748 (2011).
  • Imai and Yamamoto [2013] K. Imai and T. Yamamoto, Identification and Sensitivity Analysis for Multiple Causal Mechanisms: Revisiting Evidence from Framing Experiments, Political Analysis 21, 141 (2013).
  • Pearl [2014] J. Pearl, Interpretation and identification of causal mediation, Psychological methods 19, 459 (2014).
  • VanderWeele [2016] T. J. VanderWeele, Mediation Analysis: A Practitioner’s Guide, Annual Review of Public Health 37, 17 (2016).