Project description:We use hand-collected data on penalty kicks in the top-level football competitions across France, Germany, Italy, Spain, and the United Kingdom over the 2019/2020 season to analyse how social environment affects the performance of individuals. We exploit the Covid-19 outbreak as a plausible source of variation in supporters’ attendance. We find that for home teams the probability of missing a penalty increases when matches are forced to be played behind closed doors, while visiting teams are less likely to choke on a penalty kick, with these effects being more pronounced when the level of attendance (measured before the pandemic) was high. Taken together, these findings indicate that not only a supportive audience, but also the size of the support plays a key role in the success of skill tasks.
Project description:Introduction: When taking a soccer penalty kick, there are two distinct kicking techniques that can be adopted: a 'power' penalty or a 'placement' penalty. The current study investigated how the type of penalty kick being taken affected the kicker's visual search strategy and where the ball hit the goal (end ball location). Method: Wearing a portable eye tracker, 12 university footballers executed 2 power and placement penalty kicks, indoors, both with and without the presence of a goalkeeper. Video cameras were used to determine initial ball velocity and end ball location. Results: When taking the power penalty, the football was kicked significantly harder and more centrally in the goal compared to the placement penalty. During the power penalty, players fixated on the football for longer and more often at the goalkeeper (and by implication the middle of the goal), whereas in the placement penalty, they fixated longer at the goal, specifically the edges. Findings remained consistent irrespective of goalkeeper presence. Discussion/conclusion: Findings indicate differences in visual search strategy and end ball location as a function of type of penalty kick. When taking the placement penalty, players fixated and kicked the football to the edges of the goal in an attempt to direct the ball to an area that the goalkeeper would have difficulty reaching and saving. Fixating significantly longer on the football when taking the power compared to placement penalty indicates a greater importance of obtaining visual information from the football. This can be attributed to ensuring accurate foot-to-ball contact and subsequent generation of ball velocity. Aligning gaze and kicking the football centrally in the goal when executing the power compared to placement penalty may have been a strategy to reduce the risk of kicking wide of the goal altogether.
Project description:The analysis of penalty kicks has played an important role in performance analysis. This study aims to obtain formal feedback on the relevance of variables for penalty kick analysis, to design and validate an observational system, and to assess experts' opinions on the optimum video footage for penalty kick analysis. A structured development process was adopted for content validity, reliability and agreement on video usage. All observational variables included in OSPAF showed Aiken's V values above the cut-off (for the 5-point scale, V > 0.64; for the 2-point scale, V > 0.75; p < 0.05). Cohen's Kappa resulted in mean intra- and inter-rater reliability values of 0.90 and 0.86, respectively. It is recommended to combine at least three different viewing angles (V = 0.90; p = 0.006) with standardization of video quality (V = 0.95; p = 0.006). Changing the viewing angles may influence the observer's perception (V = 0.86; p = 0.006). The aerial and pitch-level viewing angles behind the penalty taker and the pitch-level viewing angle behind the goalkeeper were indicated as most appropriate for observational analysis (V = 0.97; p = 0.01). The OSPAF met all requirements of instrument validation. It may be recommended as a basis for future observational systems on penalty kicks.
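As a rough illustration of the two agreement statistics named in the abstract above, the following sketch computes Aiken's V for one item and Cohen's Kappa for two raters using their standard textbook definitions. The function names and the example ratings are invented for illustration and are not taken from the study.

```python
from collections import Counter


def aikens_v(ratings, lowest, highest):
    """Aiken's V for one item: sum(r - lowest) / (n * (highest - lowest)).

    `ratings` are the expert scores; `lowest` and `highest` bound the scale.
    """
    if not ratings:
        raise ValueError("ratings must not be empty")
    if highest <= lowest:
        raise ValueError("highest must exceed lowest")
    n = len(ratings)
    return sum(r - lowest for r in ratings) / (n * (highest - lowest))


def cohens_kappa(rater_a, rater_b):
    """Cohen's Kappa: (observed - expected agreement) / (1 - expected agreement)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters need the same non-zero number of codes")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal category frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical data: five experts rating one variable on a 1-5 scale,
# and two observers coding four kicks with a binary category.
v = aikens_v([5, 5, 4, 5, 4], lowest=1, highest=5)
kappa = cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0])
```

A V close to 1 means the experts rated the item near the top of the scale; whether it clears a cut-off such as those quoted above also depends on the number of raters.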
Project description:Machine learning (ML) methods have great potential to transform chemical discovery by accelerating the exploration of chemical space and drawing scientific insights from data. However, modern chemical reaction ML models, such as those based on graph neural networks (GNNs), must be trained on a large amount of labelled data in order to avoid overfitting and the low accuracy and transferability that result from it. In this work, we propose a strategy to leverage unlabelled data to learn accurate ML models for small labelled chemical reaction datasets. We focus on an old and prominent problem, classifying reactions into distinct families, and build a GNN model for this task. We first pretrain the model on unlabelled reaction data using unsupervised contrastive learning and then fine-tune it on a small number of labelled reactions. The contrastive pretraining learns by making the representations of two augmented versions of a reaction similar to each other but distinct from other reactions. We propose chemically consistent reaction augmentation methods that protect the reaction center and find they are the key for the model to extract relevant information from unlabelled data to aid the reaction classification task. The transfer-learned model outperforms a supervised model trained from scratch by a large margin. Further, it consistently performs better than models based on traditional rule-driven reaction fingerprints, which have long been the default choice for small datasets, as well as those based on reaction fingerprints derived from masked language modelling. In addition to reaction classification, the effectiveness of the strategy is tested on regression datasets; the learned GNN-based reaction fingerprints can also be used to navigate the chemical reaction space, which we demonstrate by querying for similar reactions. The strategy can be readily applied to other predictive reaction problems to uncover the power of unlabelled data for learning better models with a limited supply of labels.
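The contrastive objective described above (two augmented views of the same reaction pulled together, other reactions pushed apart) can be sketched with an NT-Xent style loss. This is an assumption made for illustration: the abstract does not state which contrastive loss the authors use, and the function name, temperature value and toy embeddings below are hypothetical.

```python
import math


def _unit(vector):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def nt_xent_loss(view_a, view_b, temperature=0.1):
    """NT-Xent loss over paired embeddings.

    view_a[i] and view_b[i] are embeddings of two augmentations of reaction i.
    Each embedding must pick out its partner among all other embeddings.
    """
    n = len(view_a)
    if n == 0 or n != len(view_b):
        raise ValueError("views must be non-empty and have equal length")
    z = [_unit(v) for v in view_a] + [_unit(v) for v in view_b]
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of the same reaction
        logits = [_dot(z[i], z[j]) / temperature for j in range(2 * n) if j != i]
        peak = max(logits)
        # Numerically stable log-sum-exp over every candidate except i itself.
        log_sum = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum - _dot(z[i], z[positive]) / temperature
    return total / (2 * n)


# Toy 2-D embeddings: matched views score a far lower loss than mismatched ones.
anchors = [[1.0, 0.0], [0.0, 1.0]]
matched = nt_xent_loss(anchors, [[1.0, 0.1], [0.1, 1.0]])
mismatched = nt_xent_loss(anchors, [[0.1, 1.0], [1.0, 0.1]])
```

In a real pipeline the embeddings would come from the GNN encoder applied to augmented reactions; here they are plain lists so the loss itself is easy to inspect.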
Project description:Copy number variation (CNV) is a primary source of structural variation in the human genome, leading to several disorders. Therefore, analyzing neonatal CNVs is crucial for managing CNV-related chromosomal disabilities. However, genomic waves can hinder accurate CNV analysis. To mitigate the influence of these waves, we adopted a machine learning approach and developed a new method that uses a modified log R ratio instead of the commonly used log R ratio. Validation results using samples with known CNVs demonstrated the superior performance of our method. We analyzed a total of 16,046 Korean newborn samples using the new method and identified CNVs related to 39 genetic disorders in 342 cases. The most frequently detected CNV-related disorder was Joubert syndrome 4. The accuracy of our method was further confirmed by analyzing a subset of the detected results using NGS and comparing them with our results. The utilization of a genome-wide single nucleotide polymorphism array with wave offset was shown to be a powerful method for identifying CNVs in neonatal cases. The accurate screening and the ability to identify various disease susceptibilities offered by our new method could facilitate the identification of CNV-associated chromosomal disease etiologies.
Project description:Objective: To mitigate the burden associated with heart failure (HF), primary prevention is of the utmost importance. To improve early risk stratification, advanced computational methods such as machine learning (ML) that capture complex individual patterns in large data might be necessary. Therefore, we compared the predictive performance of incident HF risk models in terms of (a) flexible ML models and linear models and (b) models trained on a single cohort (single-center) and on multiple heterogeneous cohorts (multi-center). Design and methods: In our analysis, we used the meta-data consisting of 30,354 individuals from 6 cohorts. During a median follow-up of 5.40 years, 1,068 individuals experienced a non-fatal HF event. We evaluated the predictive performance of survival gradient boosting (SGB), CoxNet, the PCP-HF risk score, and a stacking method. Predictions were obtained iteratively, with one cohort serving in each iteration as an external test set and either one or all remaining cohorts as a training set (single- or multi-center, respectively). Results: Overall, multi-center models systematically outperformed single-center models. Further, the c-index in the pooled population was higher in SGB (0.735) than in CoxNet (0.694). In the precision-recall (PR) analysis for predicting 10-year HF risk, the stacking method, combining the SGB, CoxNet, Gaussian mixture and PCP-HF models, outperformed other models with PR/AUC 0.804, while PCP-HF achieved only 0.551. Conclusion: With a greater number and variety of training cohorts, the model learns a wider range of specific individual health characteristics. Flexible ML algorithms can be used to capture these diverse distributions and produce more precise prediction models.
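The iterative evaluation scheme in the abstract above (one cohort held out as an external test set, trained on either one or all of the remaining cohorts) and the c-index it reports can be sketched as follows. This is a minimal sketch, not the authors' pipeline: the function names and cohort labels are hypothetical, and the c-index follows Harrell's standard definition with pairs tied in time skipped.

```python
def leave_one_cohort_out(cohort_ids, multi_center=True):
    """Yield (training_cohorts, test_cohort) pairs.

    multi_center=True trains on all remaining cohorts at once;
    multi_center=False trains on each remaining cohort separately.
    """
    for test in cohort_ids:
        others = [c for c in cohort_ids if c != test]
        if multi_center:
            yield others, test
        else:
            for single in others:
                yield [single], test


def harrell_c_index(times, events, risks):
    """Fraction of comparable pairs in which the earlier event has the higher risk.

    A pair (i, j) is comparable when subject i had an event before time j.
    Ties in predicted risk count as half-concordant.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical cohort labels; six cohorts as in the abstract.
cohorts = ["A", "B", "C", "D", "E", "F"]
multi_splits = list(leave_one_cohort_out(cohorts, multi_center=True))
single_splits = list(leave_one_cohort_out(cohorts, multi_center=False))
```

With six cohorts the multi-center scheme yields 6 train/test splits and the single-center scheme 30, which is one reason the two settings are compared on the same held-out cohorts.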
Project description:Machine learning algorithms are being increasingly used in healthcare settings but their generalizability between different regions is still unknown. This study aims to identify the strategy that maximizes the predictive performance of identifying the risk of death by COVID-19 in different regions of a large and unequal country. This is a multicenter cohort study with data collected from patients with a positive RT-PCR test for COVID-19 from March to August 2020 (n = 8477) in 18 hospitals, covering all five Brazilian regions. Of all patients with a positive RT-PCR test during the period, 2356 (28%) died. Eight different strategies were used for training and evaluating the performance of three popular machine learning algorithms (extreme gradient boosting, lightGBM, and catboost). The strategies ranged from only using training data from a single hospital, up to aggregating patients by their geographic regions. The predictive performance of the algorithms was evaluated by the area under the ROC curve (AUROC) on the test set of each hospital. We found that the best overall predictive performances were obtained when using training data from the same hospital, which was the winning strategy for 11 (61%) of the 18 participating hospitals. In this study, the use of more patient data from other regions slightly decreased predictive performance. However, models trained in other hospitals still had acceptable performances and could be a solution while data for a specific hospital is being collected.
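The AUROC used above to score each hospital's test set can be computed directly from its probabilistic definition: the chance that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The sketch below uses that pairwise form; the function name and the toy labels and scores are illustrative and not drawn from the study.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for the positive class (e.g. death), 0 otherwise.
    scores: predicted risk, higher meaning more likely positive.
    Tied scores count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: three of the four positive/negative pairs are ranked correctly.
example = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise form is quadratic in the number of cases, which is acceptable for a sketch; rank-based implementations give the same value more efficiently on large test sets.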
Project description:The penalty kick is of great importance in the sport of soccer. Therefore, the aim of this study was to test predictions of the OPTIMAL theory and identify key attentional and motivational factors that impact the accuracy of the penalty kick. The following six groups of moderately skilled participants performed penalty kicks following instructions that directed their focus of attention or impacted their autonomy support: external focus with autonomy support (EF/AS), external focus alone (EF), internal focus with autonomy support (IF/AS), internal focus alone (IF), autonomy support alone (AS) and control (C) groups. The analysis showed that the EF/AS group demonstrated better kicking accuracy relative to the IF/AS, IF and C groups, but there were no significant differences between the EF/AS and EF or AS groups. Interestingly, the EF/AS group showed higher self-efficacy compared to the EF, IF/AS, IF and C groups. The findings suggest that a combination of attentional and motivational factors may produce benefits in motor performance.
Project description:To succeed at a sport, athletes must manage the biomechanical trade-offs that constrain their performance. Here, we investigate a previously unknown trade-off in soccer: how the speed of a kick makes the outcome more predictable to an opponent. For this analysis, we focused on penalty kicks to build on previous models of factors that influence scoring. More than 700 participants completed an online survey, watching videos of penalty shots from the perspective of a goalkeeper. Participants (ranging in soccer playing experience from never played to professional) watched 60 penalty kicks, each of which was occluded at a particular moment (-0.4 s to 0.0 s) before the kicker contacted the ball. For each kick, participants had to predict shot direction toward the goal (left or right). As expected, predictions became more accurate as time of occlusion approached ball contact. However, the effect of occlusion was more pronounced when players kicked with the side of the foot than when they kicked with the top of the foot (instep). For side-foot kicks, the direction of shots was predicted more accurately for faster kicks, especially when a large portion of the kicker's approach was presented. Given the trade-off between kicking speed and directional predictability, a penalty kicker might benefit from kicking below their maximal speed.
Project description:Background: In Emergency Departments (EDs), triage is crucial for determining patient severity and prioritizing care, typically using the Manchester Triage Scale (MTS). Traditional triage systems, reliant on human judgment, are prone to under-triage and over-triage, resulting in variability, bias, and incorrect patient classification. Studies suggest that Machine Learning (ML) and Natural Language Processing (NLP) could enhance triage accuracy and consistency. This review analyzes studies on ML and/or NLP algorithms for ED patient triage. Methods: Following Preferred Reporting Items for Systematic Review and Meta-Analysis (PRISMA) guidelines, we conducted a systematic review across five databases: Web of Science, PubMed, Scopus, IEEE Xplore, and ACM Digital Library, from each database's inception to October 2023. The risk of bias was assessed using the Prediction model Risk of Bias Assessment Tool (PROBAST). Only articles employing at least one ML and/or NLP method for patient triage classification were included. Results: Sixty studies covering 57 ML algorithms were included. Logistic Regression (LR) was the most used model, while eXtreme Gradient Boosting (XGBoost), decision tree-based algorithms with Gradient Boosting (GB), and Deep Neural Networks (DNNs) showed superior performance. Frequent predictive variables included demographics and vital signs, with oxygen saturation, chief complaints, systolic blood pressure, age, and mode of arrival being the most retained. The ML algorithms showed significant bias risk due to critical bias assessment in classification models. Conclusion: NLP methods improved ML algorithms' classification capability using triage nursing and medical notes and structured clinical data compared to algorithms using only structured data. Feature engineering (FE) and class imbalance correction methods enhanced ML workflows' performance, but FE and eXplainable Artificial Intelligence (XAI) were underexplored in this field.
Registration and funding. This systematic review has been registered (registration number: CRD42024604529) in the International Prospective Register of Systematic Reviews (PROSPERO) and can be accessed online at the following URL: https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=604529 . Funding for this work was provided by the National Council for Scientific and Technological Development (CNPq), Brazil.