Project description:Data presentation and statistical analysis in scientific writing are widely agreed to need improvement, despite the profusion of advice and instruction available. Recent evidence supports the need for better planning and analysis of animal experiments. This series of short articles aims to provide advice, in small and easily digested pieces, on a variety of topics, both basic and more specialized, that are relevant to readers of the journal. The present article encourages authors to present data clearly, preferably as a dot plot, so that the distribution of the values can be recognized. It also contrasts the different measures of the distribution of a population with the different measures of the precision of an estimate.
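The contrast drawn above between a measure of distribution (e.g., the standard deviation, which describes the spread of the observed values) and a measure of precision (e.g., the standard error of the mean, which describes the uncertainty of the estimated mean) can be made concrete with a short sketch. The sample values below are invented for illustration and do not come from the article.

```python
import math
import statistics

# Hypothetical sample of 8 measurements (illustrative values only)
values = [4.1, 5.0, 3.8, 4.6, 5.2, 4.4, 4.9, 4.0]

n = len(values)
mean = statistics.mean(values)
sd = statistics.stdev(values)   # measure of distribution: spread of the values
sem = sd / math.sqrt(n)         # measure of precision: uncertainty of the mean

print(f"mean = {mean:.2f}, SD = {sd:.2f}, SEM = {sem:.2f}")
```

Note that the SEM shrinks as the sample grows, while the SD estimates a fixed property of the population; quoting one in place of the other is exactly the confusion the article warns against.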
Project description:Objectives: Using predictive modeling techniques, we developed and compared appointment no-show prediction models to better understand appointment adherence in underserved populations. Methods and materials: We collected electronic health record (EHR) data and appointment data, including patient, provider, and clinical visit characteristics, over a 3-year period. All patient data came from an urban system of community health centers (CHCs) with 10 facilities. We sought to identify critical variables through logistic regression, artificial neural network, and naïve Bayes classifier models to predict missed appointments. We used 10-fold cross-validation to assess the models' ability to identify patients missing their appointments. Results: Following data preprocessing and cleaning, the final dataset included 73,811 unique appointments, 12,392 of which were missed. Predictors of missed versus attended appointments included lead time (the time between scheduling and the appointment), prior missed appointments, cell phone ownership, tobacco use, and the number of days since the last appointment. All three models had a relatively high area under the curve (e.g., 0.86 for the naïve Bayes classifier). Discussion: Patient appointment adherence varies across clinics within a healthcare system. The data analytics results demonstrate the value of existing clinical and operational data for addressing important operational and management issues. Conclusion: EHR data, including patient and scheduling information, predicted the missed appointments of underserved populations in urban CHCs. Our application of predictive modeling techniques helped prioritize the design and implementation of interventions that may improve efficiency in community health centers for more timely access to care. CHCs would benefit from investing in the technical resources needed to make these data readily available as a means to inform important operational and policy questions.
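As a rough illustration of the modeling approach described above, the sketch below fits a Gaussian naïve Bayes classifier to synthetic appointment data and evaluates it with 10-fold cross-validation. The two features (lead time and prior missed appointments) echo the study's reported predictors, but all distributions and values are invented, and plain accuracy stands in for the AUC the study reported.

```python
import math
import random
import statistics

random.seed(0)

# Synthetic appointments: (lead_time_days, prior_missed, label); label 1 = no-show.
# Feature distributions are invented for illustration, not from the study.
def make_appt(missed):
    if missed:
        return (random.gauss(30, 10), random.gauss(3, 1), 1)
    return (random.gauss(10, 5), random.gauss(0.5, 1), 0)

data = [make_appt(1) for _ in range(100)] + [make_appt(0) for _ in range(300)]
random.shuffle(data)

def fit_gnb(rows):
    """Fit per-class prior and per-feature Gaussian (mean, sd): naive Bayes."""
    model = {}
    for label in (0, 1):
        feats = [r[:2] for r in rows if r[2] == label]
        prior = len(feats) / len(rows)
        stats = [(statistics.mean(col), statistics.stdev(col)) for col in zip(*feats)]
        model[label] = (prior, stats)
    return model

def predict(model, x):
    """Return the class with the highest log-posterior (up to a constant)."""
    best, best_lp = None, None
    for label, (prior, stats) in model.items():
        lp = math.log(prior)
        for xi, (mu, sd) in zip(x, stats):
            lp += -((xi - mu) ** 2) / (2 * sd ** 2) - math.log(sd)
        if best_lp is None or lp > best_lp:
            best, best_lp = label, lp
    return best

# 10-fold cross-validation: hold out each fold in turn, train on the rest.
k = 10
folds = [data[i::k] for i in range(k)]
accs = []
for i in range(k):
    test = folds[i]
    train = [r for j, f in enumerate(folds) if j != i for r in f]
    model = fit_gnb(train)
    accs.append(sum(predict(model, r[:2]) == r[2] for r in test) / len(test))

mean_acc = sum(accs) / len(accs)
print(f"10-fold mean accuracy: {mean_acc:.2f}")
```

In practice one would use a library implementation (e.g., scikit-learn's `GaussianNB` with `cross_val_score`) and report AUC as the study did; the stdlib version above only makes the mechanics visible.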
Project description:To protect biodiversity, conservation laws should be evaluated and improved using data. We provide a comprehensive assessment of how a key provision of the U.S. Endangered Species Act (ESA) is implemented: consultation to ensure that federal actions do not jeopardize the existence of listed species. Data from all 24,893 consultations recorded by the National Marine Fisheries Service (NMFS) from 2000 to 2017 show that federal agencies and NMFS frequently agreed (79%) on how federal actions would affect listed species. In cases of disagreement, agencies most often (71%) underestimated effects relative to the conclusions of species experts at NMFS. Such instances can have deleterious consequences for imperiled species. In 22 consultations covering 14 species, agencies concluded that an action would not harm species, while NMFS determined the action would jeopardize species' existence. These results affirm the importance of NMFS's role in preventing federal actions from jeopardizing listed species. Excluding expert agencies from consultation compromises biodiversity conservation, but we identify approaches that improve consultation efficiency without sacrificing species protections.
Project description:Background: Overfitting the data is a salient issue in classifier design for small-sample settings. For this reason, it is typically preferable to select a classifier from a constrained family of classifiers, one that does not possess the potential to partition the feature space too finely. But overfitting is not merely a consequence of the classifier family; it depends strongly on the classification rule used to design a classifier from the sample data. Thus, it is possible to consider families that are rather complex but for which there are classification rules that perform well for small samples. Such classification rules can be advantageous because they facilitate satisfactory classification when the class-conditional distributions are not easily separated and the sample is not large. Here we consider neural networks, from the perspectives of classical design based solely on the sample data and of noise-injection-based design. Results: This paper provides an extensive simulation-based comparative study of noise-injected neural-network design. It considers a number of different feature-label models across various small sample sizes, using varying amounts of noise injection. Besides comparing noise-injected neural-network design to classical neural-network design, the paper compares it to a number of other classification rules. Our particular interest is in the use of microarray data for expression-based classification for diagnosis and prognosis. To that end, we consider noise-injected neural-network design as it relates to a study of the survivability of breast cancer patients. Conclusion: In many instances noise-injected neural-network design is superior to the other tested methods, and in almost all cases it does not perform substantially worse than the best of them. Since the amount of noise injected is consequential, the effect of differing amounts of injected noise must be considered.
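A minimal sketch of the noise-injection idea: the small training sample is augmented with Gaussian-perturbed copies of each point before a classifier is fit, smoothing the effective training distribution. Everything here is an illustrative assumption: the data are synthetic 1-D points, the noise is spherical Gaussian with a fixed sigma, and a nearest-centroid rule stands in for the neural network studied in the paper.

```python
import random

random.seed(1)

def noise_inject(samples, n_copies=5, sigma=0.3):
    """Augment a small training set with Gaussian-perturbed copies of each
    labeled point (a simple spherical noise-injection scheme)."""
    augmented = list(samples)
    for x, y in samples:
        for _ in range(n_copies):
            augmented.append((x + random.gauss(0, sigma), y))
    return augmented

# Tiny 1-D two-class sample (synthetic; not microarray data)
train = [(random.gauss(0, 1), 0) for _ in range(10)] + \
        [(random.gauss(2, 1), 1) for _ in range(10)]

aug = noise_inject(train)  # 20 originals + 20 * 5 noisy copies = 120 points

# Nearest-centroid decision rule as a stand-in for the neural network
def centroid(rows, label):
    vals = [x for x, y in rows if y == label]
    return sum(vals) / len(vals)

c0, c1 = centroid(aug, 0), centroid(aug, 1)
classify = lambda x: 0 if abs(x - c0) < abs(x - c1) else 1
```

The amount of injected noise (`sigma`, `n_copies`) is the consequential tuning knob the conclusion refers to: too little changes nothing, too much washes out the class structure.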
Project description:A year after the declaration of the global coronavirus disease 2019 (COVID-19) pandemic, there had been over 110 million cases and 2.5 million deaths. Drawing on methods used to track community spread of other viruses such as poliovirus, environmental virologists and others in the wastewater-based epidemiology (WBE) field quickly adapted their existing methods to detect SARS-CoV-2 RNA in wastewater. Unlike COVID-19 case and mortality data, there was no global dashboard to track wastewater monitoring of SARS-CoV-2 RNA worldwide. This study provides a 1-year review of the "COVIDPoops19" global dashboard of universities, sites, and countries monitoring SARS-CoV-2 RNA in wastewater. The dashboard was assembled through a combination of standard literature review, Google Form submissions, and daily social media keyword searches. Over 200 universities, 1,400 sites, and 55 countries with 59 dashboards monitored wastewater for SARS-CoV-2 RNA. However, monitoring was concentrated in high-income countries (65%), with less access to this valuable tool in low- and middle-income countries (35%). Data were not widely shared publicly or made accessible to researchers to further inform public health actions, perform meta-analyses, improve coordination, and determine equitable distribution of monitoring sites. For WBE to be used to its full potential during COVID-19 and beyond, show us the data.
Project description:Several disorders are related to amyloid aggregation of proteins, for example Alzheimer's and Parkinson's diseases. Amyloid proteins form fibrils of aggregated beta structures; this is preceded by the formation of oligomers, the most cytotoxic species. Determining amyloidogenicity is tedious and costly. The most reliable identification of amyloids is obtained with high-resolution microscopies, such as electron microscopy or atomic force microscopy (AFM). More frequently, less expensive and faster methods are used, especially infrared (IR) spectroscopy or Thioflavin T staining. Different experimental methods are not always concordant, especially when amyloid peptides do not readily form fibrils but only oligomers. This may lead to peptide misclassification and mislabeling. Several bioinformatics methods have been proposed for in silico identification of amyloids, many of them based on machine learning. The effectiveness of these methods depends heavily on accurate annotation of the reference training data obtained from in vitro experiments. We study how robust bioinformatics methods are to weak supervision, that is, to imperfect training data. AmyloGram and three other amyloid predictors were applied. The results show that a certain degree of misannotation in the reference data can be overcome by the bioinformatics tools, even when the misannotated peptides belonged to their training set. The computational results are supported by new experiments with IR and AFM methods.
Project description:Background: Race is an important predictor of TKA outcomes in the United States; however, analyses of race can be confounded by socioeconomic factors, making it difficult to determine the root cause of disparate outcomes after TKA. Questions/purposes: We asked: (1) Are race and socioeconomic factors at the individual level associated with patient-reported pain and function 2 years after TKA? (2) What is the interaction between race and community poverty in patient-reported pain and function 2 years after TKA? Methods: We identified all patients undergoing TKA enrolled in a hospital-based registry between 2007 and 2011 who provided 2-year outcomes and lived in New York, Connecticut, or New Jersey. Of patients approached to participate in the registry, more than 82% consented and provided baseline data, and of these, 72% provided 2-year data. The proportion of patients with complete follow-up at 2 years was lower among blacks (57%) than whites (74%), among patients with Medicaid insurance (51%) than patients without Medicaid insurance (72%), and among patients without a college education (67%) than those with a college education (71%). Our final study cohort consisted of 4035 patients, 3841 (95%) of whom were white and 194 (5%) of whom were black. Using geocoding, we linked individual-level registry data to US census tract data through patient addresses. We constructed a multivariable linear mixed-effects model in a multilevel framework to assess the interaction between race and census tract poverty on WOMAC pain and function scores 2 years after TKA. We defined a clinically important effect as 10 points on the WOMAC (which is scaled from 1 to 100 points, with higher scores being better). Results: Race, education, patient expectations, and baseline WOMAC scores were all associated with 2-year WOMAC pain and function; however, the effect sizes were small and below the threshold of clinical importance.
Whites and blacks from census tracts with less than 10% poverty have similar levels of pain and function 2 years after TKA (WOMAC pain, 1.01 ± 1.59 points lower for blacks than for whites, p = 0.53; WOMAC function, 2.32 ± 1.56 points lower for blacks than for whites, p = 0.14). WOMAC pain and function scores 2 years after TKA worsen with increasing levels of community poverty, but do so to a greater extent among blacks than whites. Disparities in pain and function between blacks and whites are evident only in the poorest communities, increasing in a linear fashion as poverty increases. In census tracts with greater than 40% poverty, blacks score 6 ± 3 points lower (worse) than whites for WOMAC pain (p = 0.03) and 7 ± 3 points lower than whites for WOMAC function (p = 0.01). Conclusions: Blacks and whites living in communities with little poverty have similar patient-reported TKA outcomes, whereas in communities with high levels of poverty there are important racial disparities. Efforts to improve TKA outcomes among blacks will need to address individual- and community-level socioeconomic factors. Level of evidence: Level III, therapeutic study.
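The reported race-by-poverty interaction can be illustrated with a simple linear predictor containing an interaction term: the racial gap in the predicted score grows with community poverty. The coefficients below are hypothetical, chosen only to mirror the reported pattern (a small gap below 10% poverty, a roughly 6- to 7-point gap above 40% poverty); they are not the fitted values of the registry's mixed-effects model.

```python
# Illustrative fixed-effects part of a model with a race x poverty interaction.
# All coefficients are invented for illustration (higher WOMAC = better).
def predicted_womac(black, poverty_pct):
    b0 = 85.0        # hypothetical intercept
    b_black = -0.5   # hypothetical main effect of race (black = 1)
    b_poverty = -0.10    # hypothetical main effect of census tract poverty (%)
    b_inter = -0.15      # hypothetical race x poverty interaction
    return b0 + b_black * black + b_poverty * poverty_pct \
        + b_inter * black * poverty_pct

# White-black gap at low vs high community poverty
gap_low = predicted_womac(0, 5) - predicted_womac(1, 5)
gap_high = predicted_womac(0, 40) - predicted_womac(1, 40)
```

With these made-up coefficients the gap is 1.25 points at 5% poverty and 6.5 points at 40% poverty, echoing the pattern in the abstract; the actual study fit such interactions within a multilevel model (e.g., via statsmodels' `MixedLM` or R's `lme4`) with census tract as a grouping level.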
Project description:Background: Although increasingly sophisticated environmental measures are being applied to species distribution models, the focus remains on using climatic data to provide estimates of habitat suitability. Climatic tolerance estimates based on expert knowledge are available for a wide range of plants via the USDA PLANTS database. We aim to test how climatic tolerance inferred from plant distribution records relates to tolerance estimated by experts. Further, we use this information to identify circumstances in which species distributions are more likely to approximate climatic tolerance. Methods: We compiled expert-knowledge estimates of minimum and maximum precipitation and minimum temperature tolerance for over 1800 conservation plant species from the 'plant characteristics' information in the USDA PLANTS database. We derived climatic tolerance from distribution data downloaded from the Global Biodiversity Information Facility (GBIF) and corresponding climate data from WorldClim. We compared expert-derived climatic tolerance to empirical estimates to find the difference between their inferred climate niches (ΔCN), and tested whether ΔCN was influenced by growth form or range size. Results: Climate niches calculated from distribution data were significantly broader than expert-based tolerance estimates (Mann-Whitney p values << 0.001). The average plant could tolerate 24 mm lower minimum precipitation, 14 mm higher maximum precipitation, and 7 °C lower minimum temperatures based on distribution data relative to expert-based tolerance estimates. Species with larger ranges had greater ΔCN for minimum precipitation and minimum temperature.
For maximum precipitation and minimum temperature, forbs and grasses tended to have larger ΔCN, while grasses and trees had larger ΔCN for minimum precipitation. Conclusion: Our results show that climatic tolerances derived from distribution data are consistently broader than the USDA PLANTS experts' estimates and likely provide more robust estimates of climatic tolerance, especially for widespread forbs and grasses. These findings suggest that widely available expert-based climatic tolerance estimates underrepresent species' fundamental niche and likely fail to capture the realized niche.
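One way to read ΔCN is sketched below: the empirical niche limit is taken from the climate values at a species' occurrence points, and ΔCN is the gap between that limit and the expert estimate. The numbers are invented (real inputs would be GBIF occurrence records matched to WorldClim layers), though the example is tuned to reproduce the 7 °C average difference reported for minimum temperature.

```python
# Toy ΔCN for minimum temperature tolerance of one hypothetical species.
# Occurrence-point temperatures (deg C) would come from GBIF records
# intersected with a WorldClim minimum-temperature layer; these are made up.
occurrence_min_temp = [-12.0, -9.5, -15.0, -8.0, -11.2]

# Expert tolerance from the USDA PLANTS 'plant characteristics' field
# (hypothetical value for this illustration).
expert_min_temp = -8.0

empirical_min = min(occurrence_min_temp)

# Positive ΔCN: the distribution-based niche extends beyond the expert limit.
delta_cn = expert_min_temp - empirical_min

print(f"empirical minimum: {empirical_min} C, ΔCN: {delta_cn} C")
```

Repeating this per species for each climate variable, then comparing the ΔCN distributions across growth forms and range sizes (e.g., with a Mann-Whitney test), mirrors the analysis described in the abstract.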