Project description:The move away from transmission-based lecturing toward a more student-centred active learning approach is well evidenced in STEM higher education. However, the examination of active learning has generally remained confined to formal timetabled contexts, with assumptions made that students independently manage the transition between timetabled and non-timetabled learning. This paper introduces research findings from a mixed methods study that used an ecological approach when investigating student transitions between a formal lecture theatre and adjacent informal breakout space in a UK STEM university. Using quantitative occupancy monitoring data to analyse usage patterns of both spaces, in combination with qualitative ethnographic observations and field interviews, permitted a purposeful exploration of student engagement with transitions within and between the two learning spaces. The ecological approach aided the discovery of spatial, pedagogic and agentic transitions and tensions, which subsequently informed strategic modification of space across the institution to facilitate the adoption of active learning pedagogy.
Project description:Adolescent mental health problems are rising rapidly around the world. To combat this rise, clinicians and policymakers need to know which risk factors matter most in predicting poor adolescent mental health. Theory-driven research has identified numerous risk factors that predict adolescent mental health problems but has difficulty distilling and replicating these findings. Data-driven machine learning methods can distill risk factors and replicate findings but have difficulty interpreting findings because these methods are atheoretical. This study demonstrates how data- and theory-driven methods can be integrated to identify the most important preadolescent risk factors in predicting adolescent mental health. Machine learning models examined which of 79 variables assessed at age 10 were the most important predictors of adolescent mental health at ages 13 and 17. These models were examined in a sample of 1176 families with adolescents from nine nations. Machine learning models accurately classified 78% of adolescents who were above-median in age 13 internalizing behavior, 77.3% who were above-median in age 13 externalizing behavior, 73.2% who were above-median in age 17 externalizing behavior, and 60.6% who were above-median in age 17 internalizing behavior. Age 10 measures of youth externalizing and internalizing behavior were the most important predictors of age 13 and 17 externalizing/internalizing behavior, followed by family context variables, parenting behaviors, individual child characteristics, and finally neighborhood and cultural variables. The combination of theoretical and machine-learning models strengthens both approaches and accurately predicts which adolescents demonstrate above average mental health difficulties in approximately 7 of 10 adolescents 3-7 years after the data used in machine learning models were collected.
Project description:Objective: The successful implementation and interpretation of machine learning (ML) models in epidemiological studies can be challenging without an extensive programming background. We provide a didactic example of machine learning for risk prediction in this study by determining whether early life factors could be useful for predicting adolescent psychopathology. Methods: In total, 9643 adolescents ages 9-10 from the Adolescent Brain and Cognitive Development (ABCD) Study were included in ML analysis to predict high Child Behavior Checklist (CBCL) scores (i.e., t-scores ≥ 60). ML models were constructed using a series of predictor combinations (prenatal, family history, sociodemographic) across 5 different algorithms. We assessed ML performance through sensitivity, specificity, F1-score, and area under the curve (AUC) metrics. Results: A total of 1267 adolescents (13.1%) were found to have high CBCL scores. The best performing algorithms were elastic net and gradient boosted trees. The best performing elastic net models included prenatal and family history factors (Sensitivity 0.654, Specificity 0.713; AUC 0.742, F1-score 0.401) and prenatal, family history, and sociodemographic factors (Sensitivity 0.668, Specificity 0.704; AUC 0.745, F1-score 0.402). Across all 5 ML algorithms, family history factors (e.g., either parent had nervous breakdowns, trouble holding jobs/fights/police encounters, and counseling for mental issues) and sociodemographic covariates (e.g., maternal age, child's sex, caregiver income and caregiver education) tended to be better predictors of adolescent psychopathology. The most important prenatal predictors were unplanned pregnancy, birth complications, and pregnancy complications. Conclusion: Our results suggest that inclusion of prenatal, family history, and sociodemographic factors in ML models can generate moderately accurate predictions of adolescent psychopathology. Issues associated with model overfitting, hyperparameter tuning, and system seed setting should be considered throughout model training, testing, and validation. Future early risk prediction models may improve with the inclusion of additional relevant covariates.
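As a rough illustration of the performance metrics named in this abstract (sensitivity, specificity, F1-score, AUC), the sketch below computes them from confusion-matrix counts and from classifier scores. The counts and scores are made-up placeholders, not data from the ABCD Study, and the function names are hypothetical; the study's own evaluation code is not described in the abstract.

```python
def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, precision and F1 from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of true cases that were flagged
    specificity = tn / (tn + fp)   # share of non-cases that were cleared
    precision = tp / (tp + fp)     # share of flagged cases that were true cases
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


def auc_from_scores(pos_scores, neg_scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation of AUC).
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical counts and scores, for illustration only.
metrics = classification_metrics(tp=65, fp=30, tn=70, fn=35)
auc = auc_from_scores([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
```

When the outcome is uncommon, as with the 13.1% prevalence reported here, precision and therefore F1 can stay low even at moderate sensitivity and specificity, which is one reason to read F1 alongside AUC rather than on its own.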
Project description:In psychiatry, more than in other medical fields, there is a pressing need to identify biological markers that would complement the current clinical interview, enable more objective and faster clinical diagnosis, and allow accurate monitoring of treatment response and remission. Current technological development enables high-throughput analysis of various biological markers at reasonable cost, and 'omic' studies are therefore entering psychiatric research. However, big data demands a whole new set of data-processing skills before clinically useful information can be extracted. So far, classical approaches to data analysis have contributed little to the identification of biomarkers in psychiatry, but more could be gained from these extensive amounts of data if artificial intelligence, in the form of machine learning algorithms, were applied. Few studies on machine learning in psychiatry have been published, but this handful of studies already shows the potential to build a screening portfolio of biomarkers for different psychopathologies, including suicide.
Project description:Single molecule localisation (SML) microscopy is a fundamental tool for biological discoveries; it provides sub-diffraction spatial resolution images by detecting and localizing "all" the fluorescent molecules labeling the structure of interest. For this reason, the effective resolution of SML microscopy strictly depends on the algorithm used to detect and localize the single molecules from the series of microscopy frames. To adapt to the different imaging conditions that can occur in an SML experiment, all current localisation algorithms require the user to choose several parameters. This choice is not always easy, and a poor selection can lead to poor performance. Here we overcome this weakness with the use of machine learning. We propose a parameter-free SML pipeline based on a support vector machine (SVM). This strategy requires a short supervised training step in which the user selects a few fluorescent molecules (∼10-20) from the frames under analysis. The algorithm has been extensively tested on both synthetic and real acquisitions. Results are qualitatively and quantitatively consistent with the state of the art in SML microscopy and demonstrate that the introduction of machine learning can lead to a new class of algorithms that are competitive and conceived from the user's point of view.
Project description:The ecological and environmental science communities have embraced machine learning (ML) for empirical modelling and prediction. However, going beyond prediction to draw insights into underlying functional relationships between response variables and environmental 'drivers' is less straightforward. Deriving ecological insights from fitted ML models requires techniques to extract the 'learning' hidden in the ML models. We revisit the theoretical background and effectiveness of four approaches for ranking independent variable importance (Gini importance, GI; permutation importance, PI; split importance, SI; and conditional permutation importance, CPI) and two approaches for inference of bivariate functional relationships (partial dependence plots, PDP; and accumulated local effect plots, ALE). We also explore the use of a surrogate model for visualization and interpretation of complex multi-variate relationships between response variables and environmental drivers. We examine the challenges and opportunities for extracting ecological insights with these interpretation approaches. Specifically, we aim to improve interpretation of ML models by investigating how effectiveness relates to (a) interpretation algorithm, (b) sample size and (c) the presence of spurious explanatory variables. We base the analysis on simulations with known underlying functional relationships between response and predictor variables, with added white noise and the presence of correlated but non-influential variables. The results indicate that deriving ecological insight is strongly affected by interpretation algorithm and spurious variables, and moderately impacted by sample size. Removing spurious variables improves interpretation of ML models. Meanwhile, increasing sample size has limited value in the presence of spurious variables, but it does improve performance once spurious variables are omitted. Among the four ranking methods, SI is slightly more effective than the other methods in the presence of spurious variables, while GI and SI yield higher accuracy when spurious variables are removed. PDP is more effective in retrieving underlying functional relationships than ALE, but its reliability declines sharply in the presence of spurious variables. Visualization and interpretation of the interactive effects of predictors and the response variable can be enhanced using surrogate models, including three-dimensional visualizations and use of loess planes to represent independent variable effects and interactions. Machine learning analysts should be aware that including correlated independent variables in ML models with no clear causal relationship to response variables can interfere with ecological inference. When ecological inference is important, ML models should be constructed with independent variables that have clear causal effects on response variables. While interpreting ML models for ecological inference remains challenging, we show that careful choice of interpretation methods, exclusion of spurious variables and adequate sample size can provide more and better opportunities to 'learn from machine learning'.
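To make permutation importance (PI) concrete, here is a minimal sketch on simulated data with one influential predictor and one correlated but non-influential predictor, loosely mirroring the simulation setup described in this abstract. Everything in it is an assumption for illustration: the data-generating process, the hand-written stand-in `model`, and the helper names are hypothetical and are not the authors' code or models.

```python
import random


def mse(model, rows, targets):
    """Mean squared error of a prediction function over a dataset."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, column, rng):
    """Increase in MSE after shuffling one predictor column."""
    baseline = mse(model, rows, targets)
    shuffled = [r[column] for r in rows]
    rng.shuffle(shuffled)
    permuted = [r[:column] + [v] + r[column + 1:] for r, v in zip(rows, shuffled)]
    return mse(model, permuted, targets) - baseline


rng = random.Random(0)

# Simulated data (assumed): x0 drives the response, x1 is merely correlated with x0.
rows = []
for _ in range(200):
    x0 = rng.uniform(-1, 1)
    x1 = x0 + rng.gauss(0, 0.1)
    rows.append([x0, x1])
targets = [2 * r[0] + rng.gauss(0, 0.1) for r in rows]


def model(row):
    """Stand-in for a fitted model; it uses only x0 by construction."""
    return 2.0 * row[0]


importances = [permutation_importance(model, rows, targets, c, rng) for c in (0, 1)]
```

Because this stand-in model ignores `x1` entirely, shuffling `x1` leaves the error unchanged and its PI is zero. A flexible model fitted to the same data could instead lean on `x1` as a proxy for `x0`, which would give the spurious variable a non-zero importance; that is the interference with inference that the abstract warns about.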
Project description:Background: The rapid spread of COVID-19 means that government and health services providers have little time to plan and design effective response policies. It is therefore important to quickly provide accurate predictions of how vulnerable geographic regions such as counties are to the spread of this virus. Objective: The aim of this study is to develop county-level prediction around near future disease movement for COVID-19 occurrences using publicly available data. Methods: We estimated county-level COVID-19 occurrences for the period March 14 to 31, 2020, based on data fused from multiple publicly available sources inclusive of health statistics, demographics, and geographical features. We developed a three-stage model using XGBoost, a machine learning algorithm, to quantify the probability of COVID-19 occurrence and estimate the number of potential occurrences for unaffected counties. Finally, these results were combined to predict the county-level risk. This risk was then used as an estimated after-five-day-vulnerability of the county. Results: The model predictions showed a sensitivity over 71% and specificity over 94% for models built using data from March 14 to 31, 2020. We found that population, population density, percentage of people aged >70 years, and prevalence of comorbidities play an important role in predicting COVID-19 occurrences. We observed a positive association at the county level between urbanicity and vulnerability to COVID-19. Conclusions: The developed model can be used for identification of vulnerable counties and potential data discrepancies. Limited testing facilities and delayed results introduce significant variation in reported cases, which produces a bias in the model.
Project description:Early childhood asthma diagnosis is common; however, many children diagnosed before age 5 experience symptom resolution and it remains difficult to identify individuals whose symptoms will persist. Our objective was to develop machine learning models to identify which individuals diagnosed with asthma before age 5 continue to experience asthma-related visits. We curated a retrospective dataset for 9,934 children derived from electronic health record (EHR) data. We trained five machine learning models to differentiate individuals without subsequent asthma-related visits (transient diagnosis) from those with asthma-related visits between ages 5 and 10 (persistent diagnosis) given clinical information up to age 5 years. Based on average NPV-Specificity area (ANSA), all models performed significantly better than random chance, with XGBoost obtaining the best performance (0.43 mean ANSA). Feature importance analysis indicated age of last asthma diagnosis under 5 years, total number of asthma related visits, self-identified black race, allergic rhinitis, and eczema as important features. Although our models appear to perform well, a lack of prior models utilizing a large number of features to predict individual persistence makes direct comparison infeasible. However, feature importance analysis indicates our models are consistent with prior research indicating diagnosis age and prior health service utilization as important predictors of persistent asthma. We therefore find that machine learning models can predict which individuals will experience persistent asthma with good performance and may be useful to guide clinician and parental decisions regarding asthma counselling in early childhood.
Project description:Sampling impediments and paucity of suitable material for molecular analyses have precluded the study of speciation and radiation of deep-sea species in Antarctica. We analyzed barcodes together with genome-wide single nucleotide polymorphisms obtained from double digestion restriction site-associated DNA sequencing (ddRADseq) for species in the family Antarctophilinidae. We also reevaluated the fossil record associated with this taxon to provide further insights into the origin of the group. Novel approaches to identify distinctive genetic lineages, including unsupervised machine learning variational autoencoder plots, were used to establish species hypothesis frameworks. In this sense, three undescribed species and a complex of cryptic species were identified, suggesting allopatric speciation connected to geographic or bathymetric isolation. We further observed that the shallow waters around the Scotia Arc and on the continental shelf in the Weddell Sea present high endemism and diversity. In contrast, likely due to the glacial pressure during the Cenozoic, a deep-sea group with fewer species emerged expanding over great areas in the South-Atlantic Antarctic Ridge. Our study agrees on how diachronic paleoclimatic and current environmental factors shaped Antarctic communities both at the shallow and deep-sea levels, promoting Antarctica as the center of origin for numerous taxa such as gastropod mollusks.
Project description:Type 2 diabetes mellitus (T2DM) is a chronic metabolic disorder characterized by elevated blood glucose levels. Despite the availability of pharmacological treatments, dietary plans, and exercise regimens, T2DM remains a significant global cause of mortality. As a result, there is an increasing interest in exploring lifestyle interventions, such as intermittent fasting (IF). This study aims to identify underlying patterns and principles for effectively improving T2DM risk parameters through IF. By analyzing data from multiple randomized clinical trials investigating various IF interventions in humans, a machine learning algorithm was employed to develop a personalized recommendation system. This system offers guidance tailored to pre-diabetic and diabetic individuals, suggesting the most suitable IF interventions to improve T2DM risk parameters. With a success rate of 95%, this recommendation system provides highly individualized advice, optimizing the benefits of IF for diverse population subgroups. The outcomes of this study lead us to conclude that weight is a crucial feature for females, while age plays a determining role for males in reducing glucose levels in blood. By revealing patterns in diabetes risk parameters among individuals, this study not only offers practical guidance but also sheds light on the underlying mechanisms of T2DM, contributing to a deeper understanding of this complex metabolic disorder.