Faculty Research Interests

Students are strongly encouraged to consult faculty when considering whether to participate in honors and in preparing their proposals. Information about faculty research interests may be found below to assist in this process. Due to sabbaticals and other leave patterns, not all faculty may be available to supervise a thesis in a given year.
 

Professor Brittney Bailey

As a biostatistician, I develop and evaluate statistical methods for public health research. I am particularly interested in challenges that arise in clinical trials with nested or clustered study designs, where study participants are members of intact groups (e.g., communities, workplaces, schools), followed longitudinally, or placed into groups for the purpose of the study (e.g., therapy groups, training sessions, yoga classes). Traditional statistical methods that assume independence break down in these settings, so we have to modify our approach to account for the correlation between measurements within each group. I consider modeling approaches and methods for handling missing data for these types of clinical trials.

Students interested in my research should consider taking a course in missing data (STAT 404) or generalized linear and mixed models (STAT 456).

Students interested in working with me for their honors thesis in 2024-2025 will explore one of the following topics:

  1. Parallelization options for more efficient simulation of multiple imputation methods
    • Multiple imputation (MI) of missing data involves repeatedly "filling in" missing values to obtain complete datasets for statistical analysis. MI is an "embarrassingly parallel" process that is fast for any individual analysis of modestly sized datasets, but is computationally intensive for big datasets and for simulation studies evaluating MI methods. This project will explore how to reduce computation time by leveraging parallelization packages in R in combination with high-performance computing (HPC) options available to Amherst College.
    • As part of this work, the student will be expected to
      • take STAT 404 and/or learn methods for multiple imputation of missing data
      • train to use the HPC system
  2. Defining predictors based on incomplete ranking data (possibly co-advised with Prof. Palmquist, Psychology)
    • Motivated by a study designed by Professor Carrie Palmquist (see "Do children understand what information is most helpful for identifying good sources?"), this project will explore options for analyzing a (repeated) binary outcome in association with a predictor that could be interpreted as either categorical data with low cell counts or incomplete ranking data. Focusing on the latter part of this project, we will explore how various approaches to incorporating the incomplete rankings affect the statistical conclusions drawn from this type of data.
    • As part of this work, the student will be expected to
      • take STAT 456 or STAT 436 and/or learn methods for analyzing (repeated) categorical data (e.g., Fisher's exact test, chi-squared tests, generalized linear models, generalized linear mixed models)
  3. Performance of bootstrapped confidence intervals of intracluster correlation coefficients
    • The intracluster correlation coefficient (ICC) is a measure of the degree of similarity between measurements from people within the same cluster (e.g., patients of the same care provider) in a cluster-randomized trial, and estimates of the ICC are important for determining sample sizes when designing new studies. A 2014 study by Ionan et al. compared Bayesian and frequentist methods of estimating confidence intervals for ICCs, but did not include bootstrapped confidence intervals in the comparison. This project will evaluate the performance of bootstrapping and possibly other resampling methods for obtaining confidence intervals for an ICC, especially in the small-sample setting.
    • As part of this work, the student will be expected to
      • take STAT 456, STAT 436, or STAT 363 and learn methods for analyzing clustered, longitudinal, or repeated measures data.
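
For readers unfamiliar with the idea in topic 3, the following is a minimal sketch of one possible bootstrapped confidence interval for an ICC. It is not the method the project will necessarily use: the one-way ANOVA estimator, the percentile interval, the choice to resample whole clusters, the simulated data, and all function names (anova_icc, cluster_bootstrap_ci, simulate_clusters) are assumptions made purely for illustration.

```python
import numpy as np


def anova_icc(groups):
    """One-way ANOVA estimator of the ICC.

    `groups` is a list of 1-D sequences, one per cluster.
    """
    k = len(groups)
    sizes = np.array([len(g) for g in groups], dtype=float)
    n_total = sizes.sum()
    means = np.array([np.mean(g) for g in groups])
    grand_mean = sum(np.sum(g) for g in groups) / n_total

    # Between-cluster and within-cluster mean squares.
    msb = np.sum(sizes * (means - grand_mean) ** 2) / (k - 1)
    ssw = sum(np.sum((np.asarray(g) - m) ** 2) for g, m in zip(groups, means))
    msw = ssw / (n_total - k)

    # Adjusted average cluster size (equals the common size when balanced).
    n0 = (n_total - np.sum(sizes ** 2) / n_total) / (k - 1)
    return (msb - msw) / (msb + (n0 - 1) * msw)


def cluster_bootstrap_ci(groups, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval, resampling whole clusters with replacement."""
    rng = np.random.default_rng(seed)
    k = len(groups)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, k, size=k)
        estimates[b] = anova_icc([groups[i] for i in idx])
    alpha = 1.0 - level
    lower, upper = np.quantile(estimates, [alpha / 2, 1 - alpha / 2])
    return lower, upper


def simulate_clusters(k=30, m=10, icc=0.2, seed=1):
    """Simulated (hypothetical) data: k clusters of size m, total variance 1."""
    rng = np.random.default_rng(seed)
    effects = rng.normal(0.0, np.sqrt(icc), size=k)
    return [e + rng.normal(0.0, np.sqrt(1.0 - icc), size=m) for e in effects]


if __name__ == "__main__":
    data = simulate_clusters()
    print("ICC estimate:", anova_icc(data))
    print("95% bootstrap CI:", cluster_bootstrap_ci(data))
```

Resampling whole clusters rather than individual observations preserves the within-cluster correlation that the ICC measures. Whether such an interval achieves its nominal coverage with few clusters is the kind of question the project would investigate.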

 

Professor Katharine Correia

As a biostatistician, I focus my research on public health and medical applications, particularly in reproductive medicine. In two of my recent collaborations, I have investigated the association between racial disparities and state insurance mandates for assisted reproductive technology treatment and, separately, developed a prediction tool for patients and doctors to use at the start of assisted reproductive technology treatment. Some of the statistical methodology questions that arose through these collaborations involved methods for clustered data (such as generalized estimating equations) and methods for complex missing data.
 

Professor Nicholas Horton

My statistical methodological research focuses on the development of approaches to account for multivariate response models, longitudinal studies, and missing data.

In addition, I am actively engaged in pedagogical research on how to improve data science workflows and to develop and assess data acumen in students. 
 
Potential honors thesis topics for 2024-2025:
  1. Assessment of latent class models and measures of entropy
  2. False discovery rate methods for high-dimensional auditory data analyses
  3. Missing data methods for intensive longitudinal data
  4. Fostering reproducibility and workflow in data science

Professor Shu-Min Liao

My main research interests include fully nonparametric theory and methodology, model-free multivariate dependence measures for categorical data, and STEM education research focused on DEIA (Diversity, Equity, Inclusion, and Accessibility) topics. I also have experience and interest in categorical data analysis, time series analysis, survival analysis, and classification methodologies. Most of my recent publications focus on copula modeling for discrete/ordinal data and inclusive pedagogy research.
 
As a strong believer in student-faculty partnership, I enjoy learning and doing research with my students. My recent thesis students have been actively involved in my research on copula modeling, with applications in goodness-of-fit testing and visualization of multi-dimensional categorical data. Moreover, my student interns have worked with me to co-create a new intro stats course addressing self-care and mental health issues (STAT 136) and various resources integrating inclusive pedagogy into teaching and learning R and Python.
 

Professor Amy Wagaman

My research interests include applications of statistics and networks to understanding protein folding, multivariate data analysis (particularly clustering and classification, i.e., machine learning), nonparametric statistics, dimension reduction, and statistics education. I have current research collaborations with Dr. Jaswal in Chemistry on protein folding and stability and with an external collaborator on a nonparametric test procedure.

I have taught all courses in the core statistics curriculum and two electives, Multivariate Data Analysis and Nonparametric Statistics. I have supervised theses on a variety of topics, but many have included either a network or machine learning (classification) connection to my research interests.