ABSTRACT: Whole genome sequencing data for D19-0702 (AUS1), presented in Martin et al. (2020). WGS (Illumina HiSeq) was performed at the Kinghorn Centre for Clinical Genetics, Garvan Institute of Medical Research. Data were analyzed using the Seave bioinformatic analysis pipeline (https://www.seave.bio).
Project description: The CRISPR-Cas system coupled with Combinatorial Genetics En Masse (CombiGEM) enables systematic analysis of high-order genetic perturbations, which are important for understanding biological processes and for discovering therapeutic target combinations. Here, we present detailed steps and technical considerations for building multiplexed guide RNA libraries and carrying out a combinatorial CRISPR screen in mammalian cells. We also present an analytical pipeline, CombiPIPE, for mapping two- and three-way genetic interactions. For complete details on the use and execution of this protocol, please refer to Zhou et al. (2020).
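To make "mapping two- and three-way genetic interactions" concrete, the sketch below scores each gene combination as its observed phenotype minus the phenotype expected under a simple additive null model (the sum of the single-gene effects). This is an illustrative assumption, not CombiPIPE's actual scoring; the function name and the use of log2 fold changes as phenotypes are hypothetical.

```python
def interaction_scores(single, combo):
    """Additive-model interaction score for each gene combination (hypothetical helper).

    single: dict gene -> single-perturbation phenotype (e.g. sgRNA log2 fold change)
    combo:  dict frozenset of genes -> observed phenotype of that combination
    score = observed - expected, where expected = sum of single-gene phenotypes.
    Negative scores suggest synergistic (aggravating) interactions under this model.
    """
    scores = {}
    for genes, observed in combo.items():
        expected = sum(single[g] for g in genes)
        scores[genes] = observed - expected
    return scores
```

A pair scoring well below zero is a candidate synergistic combination. For three-way sets, many pipelines also subtract the pairwise interaction terms; that refinement is omitted here for brevity.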
Project description: Open or accessible regions of the genome are the primary locations of binding sites for transcription factors and chromatin regulators. The assay for transposase-accessible chromatin using sequencing (ATAC-seq) can probe chromatin accessibility in the intact nucleus. Here, we describe a protocol to generate ATAC-seq libraries from fresh Arabidopsis thaliana tissues and establish an easy-to-use bioinformatic analysis pipeline. Our method could be applied to other plants and other tissues, and it allows reliable detection of changes in chromatin accessibility throughout plant growth and development. For complete details on the use and execution of this protocol, please refer to Wang et al. (2020).
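A step common to many ATAC-seq analysis pipelines is shifting read ends to the Tn5 insertion site (+4 bp for plus-strand reads, -5 bp for minus-strand reads, as in the original ATAC-seq description) and then checking what fraction of insertions fall in called peaks. The sketch below assumes BED-style 0-based, half-open coordinates on a single chromosome; whether the Arabidopsis pipeline above uses exactly these conventions is an assumption, and the function names are hypothetical.

```python
def tn5_cut_site(start, end, strand):
    """Return the 0-based Tn5 insertion position for one aligned read.

    start, end: 0-based, half-open read coordinates (BED style).
    Uses the commonly cited +4 / -5 bp offsets; exact conventions differ between tools.
    """
    if strand == "+":
        return start + 4
    if strand == "-":
        return (end - 1) - 5  # 5' end of a minus-strand read is its last base
    raise ValueError(f"unknown strand: {strand!r}")


def fraction_in_peaks(cut_sites, peaks):
    """FRiP-like QC metric: share of cut sites inside any [start, end) peak."""
    if not cut_sites:
        return 0.0
    inside = sum(1 for c in cut_sites if any(s <= c < e for s, e in peaks))
    return inside / len(cut_sites)
```

A low fraction in peaks usually indicates a high background of random insertions, which makes differential accessibility calls less reliable.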
Project description: X-ray crystallography and DFT calculations were used to characterize the molecular nature and excited state properties of isomeric photostable azo dyes for textile fibers that undergo extensive sunlight exposure. Structural data in CIF files arising from the X-ray analysis are reported, and the complete files are deposited with the Cambridge Crystallographic Data Centre as CCDC 1548989 (https://www.ccdc.cam.ac.uk/structures/Search?Ccdcid=1548989) and CCDC 1548990 (https://www.ccdc.cam.ac.uk/structures/Search?Ccdcid=1548990). Data from calculations of the vertical electronic excitations of 20 excited states for each dye, of the excited state oxidation potential (ESOP), and of frontier HOMO/LUMO isosurfaces are also presented. These data are related to the article "Molecular and excited state properties of isomeric scarlet disperse dyes" (Lim et al., 2018).
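The ESOP is often approximated from the ground state oxidation potential (GSOP) and the lowest excitation energy as ESOP = GSOP - E0-0, with the excitation energy obtained from a wavelength via E = hc/λ. The sketch below shows that arithmetic only; it is a common approximation and an assumption here, since the dye article may derive ESOP differently (for example, directly from DFT energies). Function names are hypothetical.

```python
HC_EV_NM = 1239.841984  # Planck constant x speed of light, in eV nm


def photon_energy_ev(wavelength_nm):
    """Excitation energy in eV for a transition at the given wavelength (nm)."""
    return HC_EV_NM / wavelength_nm


def excited_state_oxidation_potential(gsop_v, e00_ev):
    """ESOP ~= GSOP - E0-0.

    For a one-electron process an energy in eV equals a potential in V numerically,
    so the result is on the same reference scale as gsop_v.
    """
    return gsop_v - e00_ev
```

For instance, a transition near 500 nm corresponds to about 2.48 eV, so the excited state is roughly 2.5 V more reducing than the ground state under this approximation.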
Project description: Despite a large number of proteomic studies of biological fluids from ovarian cancer patients, sensitive screening methods are still lacking in clinical practice (Kim et al., 2016; DOI: https://doi.org/10.1111/cas.12987). Low molecular weight endogenous peptides diffuse across endothelial barriers more easily than proteins and can be more relevant biomarker candidates (Meo et al., 2016, DOI: https://doi.org/10.18632/oncotarget.8931; Bery et al., 2014, DOI: https://doi.org/10.1186/1559-0275-11-13; Huang et al., 2018, DOI: https://doi.org/10.1097/IGC.0000000000001166). A detailed peptidomic analysis of 26 ovarian cancer and 15 non-cancer samples of biological fluids (ascites and sera) was performed using a TripleTOF 5600+ mass spectrometer. Prior to LC-MS/MS analysis, peptides were extracted from the biological fluids using an anion exchange sorbent, with subsequent desorption of peptides from the surface of highly abundant proteins. In total, we identified 4874 peptides, of which 3123 were specific to the ovarian cancer samples. The mass spectrometry peptidomics data presented in this data article have been deposited to the ProteomeXchange Consortium (Deutsch et al., 2017; DOI: https://doi.org/10.1093/nar/gkw936) via the PRIDE partner repository with the dataset identifier PXD009382 (https://doi.org/10.6019/PXD009382, http://www.ebi.ac.uk/pride/archive/projects/PXD009382).
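Counts such as "identified in total" versus "specific to the cancer samples" amount to set operations over per-sample peptide identifications. The sketch below assumes "specific" means detected in at least one cancer sample and in no non-cancer sample; the article's exact criterion, and the sample and peptide names used here, are assumptions.

```python
def group_specific_peptides(sample_peptides, labels, group):
    """Peptides seen in samples labelled `group` and in no other sample (hypothetical helper).

    sample_peptides: dict sample id -> set of identified peptide sequences
    labels:          dict sample id -> group label (e.g. "cancer", "control")
    """
    in_group, out_group = set(), set()
    for sample, peptides in sample_peptides.items():
        target = in_group if labels[sample] == group else out_group
        target.update(peptides)
    return in_group - out_group
```

The total number of identified peptides is then the size of the union over all samples, and the group-specific count is the size of the returned set.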
Project description: In "A numerical model for the calculation of electromagnetic interference from power lines on nonparallel underground pipelines" (Popoli et al., 2020), the electromagnetic interference caused by overhead power lines on nonparallel underground pipelines is assessed by dividing the pipeline path into a number of segments parallel to the power line. The analysis requires a multi-port electrical component to be extracted for each pipeline segment by means of 2D finite element analysis; circuit analysis can then be applied to the network formed by cascading these multi-port components to calculate the induced voltages and currents on the pipeline. The data in this paper consist of matrices representing the multi-port electrical components that correspond to the segments into which the pipeline has been subdivided. These matrices can then be used as constitutive relations in network analysis, as detailed in Popoli et al. (2019), to find the induced pipeline voltages and currents for different intersection angles between the pipeline and overhead power line routes.
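The idea of cascading per-segment components can be illustrated with scalar two-port ABCD (chain) matrices, where the cascade of segments is the ordered matrix product and [V_in, I_in] = T · [V_out, I_out]. This is a deliberately simplified assumption: the published components are multi-port (multi-conductor) matrices that also carry the induced sources, which this sketch does not model. All names are hypothetical.

```python
def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists (complex entries allowed)."""
    return [
        [a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]],
        [a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]],
    ]


def cascade(matrices):
    """Chain matrix of segments connected in order (first segment first)."""
    total = [[1, 0], [0, 1]]
    for m in matrices:
        total = matmul2(total, m)
    return total


def series_impedance(z):
    """ABCD matrix of a series impedance z (e.g. longitudinal pipeline impedance)."""
    return [[1, z], [0, 1]]


def shunt_admittance(y):
    """ABCD matrix of a shunt admittance y (e.g. coating leakage to earth)."""
    return [[1, 0], [y, 1]]


def input_quantities(t, v_out, i_out):
    """Voltage and current at the input port given those at the output port."""
    return (t[0][0] * v_out + t[0][1] * i_out, t[1][0] * v_out + t[1][1] * i_out)
```

Two series impedances in cascade combine into a single series impedance equal to their sum, which is a quick sanity check for the chain convention.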
Project description: A major task in analyzing alternative splicing (AS) by RNA-Seq is to explore and quantify the differential representation of various splice classes and the protein-coding potential of the mapped reads. To this end, we have generated a streamlined bioinformatic pipeline, available to the scientific community, that maps RNA-Seq reads to a complex combinatorial database of exon-exon junctions for unique junction discovery, premature termination codon (PTC) detection and identification of splice isoforms. We have also incorporated a splice isoform inference and PTC detection algorithm, which enables highly accurate mapping and prediction of splice isoform junctions and of nonsense-mediated mRNA decay (NMD) susceptibility. We used this pipeline to investigate the complexity of the transcriptome and the global role of AS-NMD in vivo in mammalian cells, taking advantage of our Upf2 conditional knock-out mouse. Using tissue-specific Cre deleter strains, we have previously demonstrated in vivo that NMD degrades many transcripts that result from AS. Moreover, ablating UPF2 (a core NMD factor) in the mouse leads to rapid mortality and collapse of most organs tested (Thoren et al., 2010; Weischenfeldt et al., 2008), suggesting an essential function for NMD in AS. To explore the effect of NMD on global splicing, and to generate and validate a bioinformatic pipeline for the scientific community to study AS and NMD, we chose to analyze two mammalian organ systems with distinct phenotypes upon UPF2 ablation, in order to test the robustness and dynamic range of the analysis. At one end of the spectrum, we analyzed the liver, where removal of UPF2 results in failure of liver metabolism and a high mortality rate (Thoren et al., 2010). To profile a less affected tissue, we analyzed bone marrow-derived macrophages (BMMs).
These macrophages differentiate in vitro, and Upf2-deleted BMMs are completely devoid of NMD activity yet show no morphological or functional phenotype compared with wild-type controls (Weischenfeldt et al., 2008). Thus, we were able to make a direct comparison between these two tissues to demonstrate the potency of our isoform and PTC detection pipeline, which will allow the scientific community to analyze transcriptome data for global AS and NMD susceptibility. Examination of UPF2 WT vs. KO in two different murine tissues (liver, bone marrow-derived macrophages).
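PTC-based NMD prediction is commonly based on the "50-nt rule": a stop codon lying more than about 50 to 55 nt upstream of the last exon-exon junction tends to trigger NMD. The sketch below applies that rule to one transcript described by its exon lengths; it is an assumption that the pipeline above uses this exact rule or threshold, and the function names are hypothetical.

```python
def last_junction(exon_lengths):
    """0-based transcript coordinate of the last exon-exon junction, or None."""
    if len(exon_lengths) < 2:
        return None
    return sum(exon_lengths[:-1])


def is_nmd_candidate(exon_lengths, stop_end, threshold_nt=50):
    """Flag a transcript whose stop codon ends more than threshold_nt upstream
    of the last exon-exon junction (the '50-nt rule').

    exon_lengths: lengths of the exons in transcript order
    stop_end:     0-based, exclusive end of the stop codon in transcript coordinates
    """
    junction = last_junction(exon_lengths)
    if junction is None:
        return False  # single-exon transcript: no downstream junction to mark
    return junction - stop_end > threshold_nt
```

Applied to each inferred isoform, this yields a per-isoform NMD-susceptibility flag that can be compared between WT and Upf2 KO samples.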
Project description: The data presented in this article relate to the research article "A quasi-3D approach for the assessment of induced AC interference on buried metallic pipelines" (Popoli et al., 2018). The current induced on a pipeline is presented as a function of the distance between the overhead power line and the pipeline, for various overhead power line configurations. The calculations are based on a method that combines 2D FEM simulations with circuit analysis.
Project description: This article accompanies the study presented in Triulzi et al. (2020). It briefly describes and makes available the data on functional performance for 30 technology domains, their patent sets, the measurement of patent centrality and the method to estimate the yearly technology performance improvement rate (TIR) that underlie that study. Some of these data (performance time series and the lists of patents for 28 domains) were collected by other authors for previous studies but were not previously available to the public. Measurements of patent centrality and other patent-based indicators for the 30 domains, and for 5,259,906 utility patents granted by the United States Patent and Trademark Office between 1976 and 2015, are novel data contributed by Triulzi et al. (2020). Here we organize, describe and make available the complete collection of data. This allows anyone interested to replicate the study or to use the method to estimate the improvement rate of any technology for which patents can be identified. For a detailed description of the data and methods, see Triulzi et al. (2020).
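Where a performance time series is available, a yearly improvement rate is commonly estimated by assuming exponential improvement and fitting ln(performance) against year by ordinary least squares. The sketch below shows that performance-based estimate only; the patent-centrality-based estimation described in Triulzi et al. (2020) is a different method and is not reproduced here. The function name is hypothetical.

```python
import math


def improvement_rate(years, performance):
    """Annual improvement rate r from an OLS fit of ln(performance) on year.

    Assumes performance ~ A * (1 + r) ** year; performance values must be > 0.
    Returns r, e.g. 0.25 means 25% improvement per year.
    """
    if len(years) != len(performance) or len(years) < 2:
        raise ValueError("need at least two (year, performance) pairs")
    logs = [math.log(p) for p in performance]
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    slope = sxy / sxx  # continuous rate k in ln(performance) = a + k * year
    return math.exp(slope) - 1
```

Some studies report the continuous rate k (the fitted slope) rather than e^k - 1; the two are close for small rates, so the convention should be stated alongside any reported TIR.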
Project description: Background and objectives: The pandemic of novel coronavirus disease 2019 (COVID-19) has severely impacted human society, with a massive death toll worldwide. There is an urgent need for early and reliable screening of COVID-19 patients to provide better and timely patient care and to combat the spread of the disease. In this context, recent studies have reported some key advantages of using routine blood tests for the initial screening of COVID-19 patients. In this article, we first review emerging techniques for COVID-19 diagnosis using routine laboratory and/or clinical data. We then propose ERLX, an ensemble learning model for COVID-19 diagnosis from routine blood tests. Method: At the first level, the proposed model uses three well-known and diverse classifiers with different architectures and learning characteristics (extra trees, random forest and logistic regression), and it then combines their predictions using a second-level extreme gradient boosting (XGBoost) classifier to achieve better performance. For data preparation, the proposed methodology employs a KNNImputer algorithm to handle null values in the dataset, isolation forest (iForest) to remove outliers, and the synthetic minority oversampling technique (SMOTE) to balance the data distribution. For model interpretability, feature importances are reported using the SHapley Additive exPlanations (SHAP) technique. Results: The proposed model was trained and evaluated using a publicly available data set from Albert Einstein Hospital in Brazil, which consisted of 5644 data samples with 559 confirmed COVID-19 cases. The ensemble model achieved outstanding performance, with an overall accuracy of 99.88% [95% CI: 99.6-100], AUC of 99.38% [95% CI: 97.5-100], sensitivity of 98.72% [95% CI: 94.6-100] and specificity of 99.99% [95% CI: 99.99-100].
Discussion: The proposed model showed better performance than existing state-of-the-art studies (Banerjee et al., 2020; de Freitas Barbosa et al., 2020; de Moraes Batista et al., 2020; Soares et al., 2020) [3,22,56,71] for the same sets of features employed by them. Compared with the average accuracy of 95.159% of the best-performing Bayes Net model (de Freitas Barbosa et al., 2020), ERLX achieved an average accuracy of 99.94%. Compared with the AUC of 85% reported for the SVM model (de Moraes Batista et al., 2020), ERLX obtained an AUC of 99.77%, in addition to improvements in sensitivity and specificity. Compared with the average sensitivity of 70.25% and specificity of 85.98% of the ER-COV model (Soares et al., 2020), the ERLX model achieved a sensitivity of 99.47% and a specificity of 99.99%. The ERLX model also obtained considerably higher scores than the ANN model (Banerjee et al., 2020) on all performance metrics. Therefore, the presented model is robust and can be deployed for reliable, early and rapid screening of COVID-19 patients.
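The SMOTE step in the data preparation creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The sketch below is a minimal stdlib illustration of that idea, not the library implementation the authors may have used (e.g. imbalanced-learn); the function name, default k and seed are assumptions.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic SMOTE-style samples interpolated within the minority class.

    minority: list of feature vectors (lists of floats) from the minority class only
    k:        number of nearest minority neighbours considered per base sample
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)  # seeded for reproducibility
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # indices of the k nearest minority neighbours (Euclidean distance)
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1)
        # new point on the segment between the base sample and the chosen neighbour
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic
```

Oversampling should be applied only to the training portion of each split; generating synthetic samples before splitting leaks information into the evaluation set and inflates metrics such as sensitivity.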
Project description: Human activity has led to increased atmospheric concentrations of many gases, including halocarbons, and may lead to emissions of many more gases. Many of these gases are, on a per molecule basis, powerful greenhouse gases, although at present-day concentrations their climate effect is in the so-called weak limit (i.e., their effect scales linearly with concentration). We published a comprehensive review of the radiative efficiencies (RE) and global warming potentials (GWP) for around 200 such compounds in 2013 (Hodnebrog et al., 2013, https://doi.org/10.1002/rog.20013). Here we present updated RE and GWP values for compounds where experimental infrared absorption spectra are available. Updated numbers are based on a revised “Pinnock curve”, which gives RE as a function of wave number, and now also accounts for stratospheric temperature adjustment (Shine & Myhre, 2020, https://doi.org/10.1029/2019MS001951). Further updates include the implementation of around 500 absorption spectra additional to those in the 2013 review and new atmospheric lifetimes from the literature (mainly from WMO (2019)). In total, values for 60 of the compounds previously assessed are based on additional absorption spectra, and 42 compounds have REs that differ by >10% from our previous assessment. New RE calculations are presented for more than 400 compounds in addition to the previously assessed compounds, and GWP calculations are presented for a total of around 250 compounds. Present-day radiative forcing due to halocarbons and other weak absorbers is 0.38 [0.33–0.43] W m⁻², compared to 0.36 [0.32–0.40] W m⁻² in IPCC AR5 (Myhre et al., 2013, https://doi.org/10.1017/CBO9781107415324.018), which is about 18% of the current CO2 forcing.
Key Points:
- Radiative efficiencies are reassessed for more than 600 compounds, and global warming potentials are calculated for around 250 of these.
- Forty-two compounds have >10% different radiative efficiency compared to a comprehensive review in 2013.
- Present-day radiative forcing due to halocarbons and other weak absorbers is 0.38 [0.33–0.43] W m⁻², which is ~18% of the CO2 forcing.
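In the weak limit, forcing is linear in concentration, so total forcing is the sum of RE × concentration change over gases. A GWP is the ratio of a gas's absolute GWP (time-integrated forcing of a 1 kg pulse) to that of CO2; for a gas removed with a single e-folding lifetime τ, AGWP(H) = RE · τ · (1 − exp(−H/τ)). The sketch below encodes these textbook relations under stated assumptions: RE per kg must already be converted from per-ppb units by the caller, the CO2 AGWP is passed in rather than hard-coded, and extra terms some assessments include (such as climate-carbon feedbacks) are omitted. Function names are hypothetical.

```python
import math


def weak_limit_forcing(re_values, concentration_changes):
    """Radiative forcing (W m-2) as sum of RE_i * dC_i.

    re_values:             radiative efficiencies in W m-2 ppb-1
    concentration_changes: concentration changes in ppb, same order as re_values
    """
    return sum(re * dc for re, dc in zip(re_values, concentration_changes))


def agwp_single_exponential(re_per_kg, lifetime_yr, horizon_yr):
    """Absolute GWP (W m-2 yr kg-1) of a 1 kg pulse decaying with one lifetime."""
    return re_per_kg * lifetime_yr * (1 - math.exp(-horizon_yr / lifetime_yr))


def gwp(re_per_kg, lifetime_yr, horizon_yr, agwp_co2):
    """GWP over horizon_yr relative to a caller-supplied CO2 AGWP for the same horizon."""
    return agwp_single_exponential(re_per_kg, lifetime_yr, horizon_yr) / agwp_co2
```

For gases with lifetimes much longer than the horizon, the AGWP approaches RE × H, which is why long-lived halocarbons have GWPs that change little between short and long horizons relative to their RE.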