Project description:Reproductive failure is still a challenge for beef producers and a significant cause of economic loss. The increased availability of transcriptomic data has shed light on the mechanisms modulating pregnancy success. Furthermore, new analytical tools, such as machine learning (ML), provide opportunities for data mining and uncovering new biological events that explain or predict reproductive outcomes. Herein, we identified potential biomarkers underlying pregnancy status and fertility-related networks by integrating gene expression profiles through ML and gene network modeling. We used public transcriptomic data from uterine luminal epithelial cells of cows retrospectively classified as pregnant (P, n = 25) and non-pregnant (NP, n = 18). First, we used a feature selection function from BioDiscML and identified SERPINE3, PDCD1, FNDC1, MRTFA, ARHGEF7, MEF2B, NAA16, ENSBTAG00000019474, and ENSBTAG00000054585 as candidate biomarker predictors of pregnancy status. Then, based on co-expression networks, we identified seven genes significantly rewired (gaining or losing connections) between the P and NP networks. These biomarkers were co-expressed with genes critical for uterine receptivity, including endometrial tissue remodeling, focal adhesion, and embryo development. We provided insights into the regulatory networks of fertility-related processes and demonstrated the potential of combining different analytical tools to prioritize candidate genes.
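The "rewired" genes mentioned above (genes gaining or losing co-expression connections between the P and NP networks) can be illustrated with a minimal sketch. This is not the authors' pipeline: the Pearson correlation, the 0.9 threshold, the function names, and the toy expression values are all assumptions chosen for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def degrees(expr, threshold=0.9):
    """Count, per gene, the partners with |r| >= threshold (hypothetical cut-off).

    expr maps gene name -> expression values across the samples of one group.
    """
    deg = {gene: 0 for gene in expr}
    for g1, g2 in combinations(expr, 2):
        if abs(pearson(expr[g1], expr[g2])) >= threshold:
            deg[g1] += 1
            deg[g2] += 1
    return deg


def differential_connectivity(expr_a, expr_b, threshold=0.9):
    """Degree in network A minus degree in network B, for genes present in both."""
    deg_a = degrees(expr_a, threshold)
    deg_b = degrees(expr_b, threshold)
    return {g: deg_a[g] - deg_b[g] for g in deg_a if g in deg_b}


# Toy data (invented): three genes, four samples per group.
pregnant = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [4, 1, 3, 2]}
non_pregnant = {"g1": [1, 2, 3, 4], "g2": [3, 1, 4, 2], "g3": [2, 4, 6, 8]}

rewiring = differential_connectivity(pregnant, non_pregnant)
```

A positive value marks a gene with more connections in the first group and a negative value a gene with more connections in the second. A real analysis would also need a significance test for the difference, which this sketch omits.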
Project description:The northern Australia beef cattle industry operates in harsh environmental conditions which consistently suppress female fertility. To better understand the environmental effect on cattle raised extensively in northern Australia, new environmental descriptors were defined for 54 commercial herds located across the region. Three fertility traits were recorded: presence of a corpus luteum at 600 d of age, indicating puberty (CL Presence, n = 25,176); heifer pregnancy (n = 20,989); and first lactation pregnancy (n = 10,072). Temperature, humidity, and rainfall were obtained from publicly available data based on herd location. Being pubertal at 600 d (i.e., CL Presence) increased the likelihood of success at heifer pregnancy and first lactation pregnancy (P < 0.05), underscoring the importance of early puberty in reproductive success. A temperature humidity index (THI) of 65-70 had a significant (P < 0.05) negative effect on first lactation pregnancy rate, heifer pregnancy and puberty at 600 d of age. Area under the curve of daily THI was significant (P < 0.05) and reduced the likelihood of pregnancy at first lactation and puberty at 600 d. Deviation from long-term average rainfall was not significant (P > 0.05) for any trait. Average daily weight gain had a significant and positive relationship (P < 0.05) with heifer and first lactation pregnancy. The results indicate that chronic or cumulative heat load is more detrimental to reproductive performance than acute heat stress. The reason for the lack of a clear relationship between acute heat stress and reproductive performance is unclear but may be partially explained by peak THI and peak nutrition coinciding. Sufficient evidence was found to justify the use of average daily weight gain and chronic heat load as descriptors to define an environmental gradient.
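The two heat-load descriptors in the abstract above (daily THI and the area under the daily THI curve) can be sketched as follows. The abstract does not state which THI equation was used, so the commonly cited NRC (1971) form is an assumption here, as are the function names, the optional baseline, and the toy weather values.

```python
def thi(temp_c, rh_percent):
    """Temperature humidity index from air temperature (deg C) and relative humidity (%).

    Assumed NRC (1971) form: THI = Tf - (0.55 - 0.0055 * RH) * (Tf - 58),
    with Tf the temperature in degrees Fahrenheit.
    """
    temp_f = 1.8 * temp_c + 32.0
    return temp_f - (0.55 - 0.0055 * rh_percent) * (temp_f - 58.0)


def thi_auc(daily_thi, baseline=0.0):
    """Trapezoidal area under a daily THI series with 1-day spacing.

    Values are first reduced by `baseline` and clipped at zero, so a non-zero
    baseline (hypothetical parameter) approximates the load above a threshold.
    """
    excess = [max(value - baseline, 0.0) for value in daily_thi]
    return sum((a + b) / 2.0 for a, b in zip(excess, excess[1:]))


# Toy example (invented values): three days of mean temperature and humidity.
weather = [(28.0, 60.0), (30.0, 65.0), (33.0, 70.0)]
daily = [thi(t, rh) for t, rh in weather]
cumulative_load = thi_auc(daily, baseline=70.0)
```

The peak of `daily` is an acute descriptor, whereas `cumulative_load` grows with both the intensity and the duration of heat exposure, which is the distinction the abstract draws between acute and chronic heat load.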
Project description:Machine learning (ML) methods have shown promising results in identifying genes when applied to large transcriptome datasets. However, no attempt has been made to compare the performance of combined ML methods in the prediction of high feed efficiency (HFE) and low feed efficiency (LFE) animals. In this study, using RNA sequencing data of five tissues (adrenal gland, hypothalamus, liver, skeletal muscle, and pituitary) from nine HFE and nine LFE Nellore bulls, we evaluated the prediction accuracies of five analytical methods in classifying FE animals. These included two conventional methods for differential gene expression (DGE) analysis (t-test and edgeR) as benchmarks, and three ML methods: Random Forests (RF), Extreme Gradient Boosting (XGBoost), and a combination of RF and XGBoost (RX). The utility of a subset of candidate genes selected by each method for classifying FE animals was assessed with a support vector machine (SVM). Among all methods, the smallest subset of genes (117), identified by RX, outperformed those chosen by t-test, edgeR, RF, or XGBoost in classification accuracy. Gene co-expression network analysis confirmed the interactivity among these genes and showed that their relevance within the network was related to their ML-based prediction ranking. The results demonstrate the great potential of applying a combination of ML methods to large transcriptome datasets to identify biologically important genes for accurately classifying FE animals.
Project description:Fertility plays a key role in the success of calf production, but there is evidence that reproductive efficiency in beef cattle has decreased during the past half-century worldwide. Therefore, identifying animals with superior fertility could significantly impact cow-calf production efficiency. The objective of this research was to identify candidate regions affecting bull fertility in beef cattle and positional candidate genes annotated within these regions. A GWAS using a weighted single-step genomic BLUP approach was performed on 265 crossbred beef bulls to identify markers associated with scrotal circumference (SC) and sperm motility (SM). Eight windows containing 32 positional candidate genes and five windows containing 28 positional candidate genes explained more than 1% of the genetic variance for SC and SM, respectively. These windows were selected to perform gene annotation, QTL enrichment, and functional analyses. Functional candidate gene prioritization analysis revealed 14 prioritized candidate genes for SC of which MAP3K1 and VIP were previously found to play roles in male fertility. A different set of 14 prioritized genes were identified for SM and five were previously identified as regulators of male fertility (SOD2, TCP1, PACRG, SPEF2, PRLR). Significant enrichment results were identified for fertility and body conformation QTLs within the candidate windows. Gene ontology enrichment analysis including biological processes, molecular functions, and cellular components revealed significant GO terms associated with male fertility. The identification of these regions contributes to a better understanding of fertility associated traits and facilitates the discovery of positional candidate genes for future investigation of causal mutations and their implications.
Project description:Reproductive failure remains a significant challenge to the beef industry. Omics technologies have provided opportunities to improve reproductive efficiency. We used a multistaged analysis of blood profiles to integrate the metabolome (plasma) and transcriptome (peripheral white blood cells) in beef heifers. We used paired untargeted metabolomics and RNA-Seq data from six AI-pregnant (AI-P) and six nonpregnant (NP) Angus-Simmental crossbred heifers at artificial insemination (AI). Based on network co-expression analysis, we identified 17 and 37 hub genes in the AI-P and NP groups, respectively. Further, we identified TGM2, TMEM51, TAC3, NDRG4, and PDGFB as more connected in the NP heifers' network. The NP gene network showed a connectivity gain due to the rewiring of major regulators. The metabolomic analysis identified 18 and 15 hub metabolites in the AI-P and NP networks, respectively. Tryptophan and allantoic acid exhibited a connectivity gain in the NP and AI-P networks, respectively. The gene-metabolite integration identified tocopherol-a as positively correlated with ENSBTAG00000009943 in the AI-P group. Conversely, tocopherol-a was negatively correlated in the NP group with EXOSC2, TRNAUIAP, and SNX12. In the NP group, α-ketoglutarate-SMG8 and putrescine-HSD17B13 were positively correlated, whereas α-ketoglutarate-ALAS2 and tryptophan-MTMR1 were negatively correlated. These multiple interactions identified novel targets and pathways underlying fertility in bovines.
Project description:The identification of biological processes related to the regulation of complex traits is a difficult task. Commonly, complex traits are regulated by a multitude of genes, each contributing a small part of the total genetic variance. Additionally, some loci can simultaneously regulate several complex traits, a phenomenon defined as pleiotropy. The lack of understanding of the biological processes responsible for the regulation of these traits results in decreased selection efficiency and the selection of undesirable hitchhiking effects. The identification of pleiotropic key-regulator genes can assist in developing important tools for investigating biological processes underlying complex traits. A multi-breed and multi-OMICs approach was applied to study the pleiotropic effects of key-regulator genes using three independent beef cattle populations evaluated for fertility traits. A pleiotropic map for 32 traits related to growth, feed efficiency, carcass and meat quality, and reproduction was used to identify genes shared among the different populations and breeds in pleiotropic regions. Furthermore, data-mining analyses were performed using the Cattle QTL database (CattleQTLdb) to identify the QTL category annotated in the regions around the genes shared among breeds. This approach allowed the identification of a main gene network (composed of 38 genes) shared among breeds. This gene network was significantly associated with thyroid activity, among other biological processes, and displayed a high regulatory potential. In addition, it was possible to identify genes with pleiotropic effects related to crucial biological processes that regulate economically relevant traits associated with fertility, production and health, such as the MYC, PPARG, GSK3B, TG and IYD genes. These genes will be further investigated to better understand the biological processes involved in the expression of complex traits and assist in the identification of functional variants associated with undesirable phenotypes, such as decreased fertility, poor feed efficiency and negative energy balance.
Project description:Infertility and subfertility negatively impact the economics and reproductive performance of cattle. Of note, significant pregnancy loss occurs in cattle during the first month of pregnancy, yet little is known about the genetic loci influencing pregnancy success and loss in cattle. To identify quantitative trait loci (QTL) with large effects associated with early pregnancy loss, Angus crossbred heifers were classified based on day 28 pregnancy outcomes to serial embryo transfer. A genome-wide association analysis (GWAA) was conducted comparing 30 high fertility heifers with 100% success in establishing pregnancy to 55 subfertile heifers with 25% or less success. A gene set enrichment analysis SNP (GSEA-SNP) was performed to identify gene sets and leading edge genes influencing pregnancy loss. The GWAA identified 22 QTL (p < 1 × 10⁻⁵), and GSEA-SNP identified 9 gene sets (normalized enrichment score > 3.0) with 253 leading edge genes. Network analysis identified TNF (tumor necrosis factor), estrogen, and TP53 (tumor protein 53) as the top of 671 upstream regulators (p < 0.001), whereas the SOX2 (SRY [sex determining region Y]-box 2) and OCT4 (octamer-binding transcription factor 4) complex was the top master regulator out of 773 master regulators associated with fertility (p < 0.001). Identification of QTL and genes in pathways that improve early pregnancy success provides critical information for genomic selection to increase fertility in cattle. The identified genes and regulators also provide insight into the complex biological mechanisms underlying pregnancy establishment in cattle.
Project description:Rheumatoid arthritis (RA) is an incurable disease that afflicts 0.5-1.0% of the global population, though it is less threatening at its early stage. Therefore, improved diagnostic efficiency and prognostic outcome are critical for confronting RA. Although machine learning is considered a promising technique in clinical research, its potential for verifying the biological significance of genes has not been fully exploited. The performance of a machine learning model depends greatly on the features used for model training; therefore, the effectiveness of prediction might reflect the quality of the input features. In the present study, we used weighted gene co-expression network analysis (WGCNA) in conjunction with differentially expressed gene (DEG) analysis to select the key genes that were highly associated with RA phenotypes based on multiple microarray datasets of RA blood samples, after which they were used as features in machine learning model validation. A total of six machine learning models were used to validate the biological significance of the key genes based on gene expression, among which five models achieved good performance [area under the curve (AUC) > 0.85], suggesting that the identified key genes are biologically significant and highly representative of genes involved in RA. Combined with other biological interpretations, including Gene Ontology (GO) analysis, protein-protein interaction (PPI) network analysis, and inference of immune cell composition, the current study might shed light on the in-depth study of RA diagnosis and prognosis.