Project description:Fusarium head blight (FHB), incited by Fusarium graminearum Schwabe, is a devastating disease of barley and other cereal crops worldwide. FHB is associated with trichothecene mycotoxins such as deoxynivalenol (DON), and contaminated grain is unfit for the malting or animal feed industries. While genetically resistant cultivars offer the most economical and environmentally responsible means of mitigating the disease, parent lines with adequate resistance are limited in barley. Resistance breeding based upon quantitative genetic gains has been slow to date, owing to the intensive labour requirements of disease nurseries. The development of high-throughput genome-wide molecular markers allows their application in genomic prediction models. A diverse genomic panel consisting of 400 two-row spring barley lines was assembled with a focus on Canadian barley breeding programs. The panel was evaluated for FHB and DON content in three environments over two years. Moreover, it was genotyped using an Illumina Infinium HTS iSelect custom BeadChip array of single nucleotide polymorphism (SNP) markers (50K SNP), of which over 23K markers were polymorphic. Genomic prediction has been successfully demonstrated for reducing FHB and DON content in cereals using various statistical models with different underlying assumptions. Herein, we studied an alternative method based on machine learning and compared it with a statistical approach. Two encoding techniques were utilized (categorical or Hardy-Weinberg frequencies), followed by selection of essential genomic markers for phenotype prediction. Subsequently, we applied a transformer-based deep learning algorithm to predict FHB and DON. Apart from the transformer method, we also implemented a Residual Fully Connected Neural Network (RFCNN). Pearson correlation coefficients were calculated to compare true vs. predicted outputs. Under most model scenarios, using all markers rather than selected markers only marginally improved prediction performance, except for the RFCNN method for FHB (27.6%). Hardy-Weinberg encoding generally improved correlation for FHB (6.9%) and DON (9.6%) for the transformer. This study suggests the potential of the transformer-based method for genomic prediction of complex traits such as FHB or DON, having performed better than or on par with existing machine learning and statistical methods. Genomic prediction in barley for Fusarium head blight and deoxynivalenol content was performed using a custom Illumina Infinium array (BarleySNP50-JHI) (www.illumina.com). Sample types included leaves from 400 barley genotypes, mostly of Canadian origin. This series includes 400 genotypes assayed on an Illumina Infinium HTS platform 50K BeadChip.
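The description above mentions Hardy-Weinberg frequency encoding of SNP markers and Pearson correlation between true and predicted phenotypes. Below is a minimal sketch of one plausible reading of that encoding (genotype calls 0/1/2 replaced by their expected Hardy-Weinberg class frequencies per marker) and of the correlation check; the marker matrix, allele coding, and phenotype values are toy assumptions, not the study's data or exact pipeline.

```python
# Sketch: Hardy-Weinberg encoding of biallelic SNP genotypes (0/1/2 = counts
# of the alternate allele) and Pearson correlation of true vs. predicted traits.
import numpy as np
from scipy.stats import pearsonr

def hardy_weinberg_encode(genotypes):
    """Replace each 0/1/2 genotype call with the expected Hardy-Weinberg
    frequency of that genotype class, computed per marker."""
    geno = np.asarray(genotypes, dtype=int)             # shape: (lines, markers)
    p = geno.mean(axis=0) / 2.0                          # alternate-allele frequency per marker
    q = 1.0 - p
    class_freq = np.stack([q ** 2, 2 * p * q, p ** 2])   # expected frequency of genotypes 0, 1, 2
    encoded = np.empty(geno.shape, dtype=float)
    for g in (0, 1, 2):
        encoded[geno == g] = np.broadcast_to(class_freq[g], geno.shape)[geno == g]
    return encoded

rng = np.random.default_rng(0)
markers = rng.integers(0, 3, size=(400, 1000))           # 400 lines x 1,000 toy SNP markers
encoded = hardy_weinberg_encode(markers)

y_true = rng.normal(size=400)                            # toy FHB severity values
y_pred = y_true + rng.normal(scale=0.5, size=400)        # stand-in for model predictions
r, _ = pearsonr(y_true, y_pred)
print(f"Pearson r (true vs. predicted): {r:.3f}")
```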
Project description:DNA methylation is a crucial topic in bioinformatics research. Traditional wet-lab experiments are usually time-consuming and expensive. In contrast, machine learning offers an efficient and novel approach. In this study, we propose DeepMethylation, a novel deep learning-based methylation predictor. The DNA sequence is first encoded with word embeddings and GloVe. After that, dilated convolutions and a Transformer encoder are utilized to extract features. Finally, a fully connected layer and a softmax operator are applied to predict methylation sites. The proposed model achieves an accuracy of 97.8% on the 5mC dataset, outperforming state-of-the-art methods. Furthermore, our predictor exhibits good generalization ability, achieving an accuracy of 95.8% on the m1A dataset. To ease access for other researchers, our code is publicly available at https://github.com/sb111169/tf-5mc.
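A minimal PyTorch sketch of the pipeline described above (token embedding, dilated convolution, Transformer encoder, fully connected layer with softmax) follows. The vocabulary size, window length, and layer sizes are illustrative assumptions rather than the authors' configuration; their actual code is in the linked repository.

```python
# Sketch of a DeepMethylation-style predictor: embedding -> dilated conv ->
# Transformer encoder -> fully connected + softmax. Sizes are assumptions.
import torch
import torch.nn as nn

class MethylationPredictor(nn.Module):
    def __init__(self, vocab_size=64, embed_dim=128, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)       # word-embedding step (GloVe vectors could be loaded here)
        self.dilated_conv = nn.Conv1d(embed_dim, embed_dim, kernel_size=3,
                                      padding=2, dilation=2)   # dilated convolution for local context
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.classifier = nn.Linear(embed_dim, num_classes)    # fully connected head; softmax applied below

    def forward(self, tokens):                                 # tokens: (batch, seq_len) integer k-mer ids
        x = self.embed(tokens)                                 # (batch, seq_len, embed_dim)
        x = self.dilated_conv(x.transpose(1, 2)).transpose(1, 2)
        x = self.encoder(x)
        logits = self.classifier(x.mean(dim=1))                # pool over sequence positions
        return torch.softmax(logits, dim=-1)                   # probability of a methylation site

model = MethylationPredictor()
probs = model(torch.randint(0, 64, (8, 41)))                   # 8 toy sequences of 41 tokens
print(probs.shape)                                             # torch.Size([8, 2])
```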
Project description:Automated tooth segmentation and identification on dental radiographs are crucial steps in establishing digital dental workflows. While deep learning networks have been developed for these tasks, their performance has been inferior in partially edentulous individuals. This study proposes a novel semi-supervised Transformer-based framework (SemiTNet), specifically designed to improve tooth segmentation and identification performance on panoramic radiographs, particularly in partially edentulous cases, and establish an open-source dataset to serve as a unified benchmark. A total of 16,317 panoramic radiographs (1589 labeled and 14,728 unlabeled images) were collected from various datasets to create a large-scale dataset (TSI15k). The labeled images were divided into training and test sets at a 7:1 ratio, while the unlabeled images were used for semi-supervised learning. The SemiTNet was developed using a semi-supervised learning method with a label-guided teacher-student knowledge distillation strategy, incorporating a Transformer-based architecture. The performance of SemiTNet was evaluated on the test set using the intersection over union (IoU), Dice coefficient, precision, recall, and F1 score, and compared with five state-of-the-art networks. Paired t-tests were performed to compare the evaluation metrics between SemiTNet and the other networks. SemiTNet outperformed other networks, achieving the highest accuracy for tooth segmentation and identification, while requiring minimal model size. SemiTNet's performance was near-perfect for fully dentate individuals (all metrics over 99.69%) and excellent for partially edentulous individuals (all metrics over 93%). In edentulous cases, SemiTNet obtained statistically significantly higher tooth identification performance than all other networks. The proposed SemiTNet outperformed previous high-complexity, state-of-the-art networks, particularly in partially edentulous cases. The established open-source TSI15k dataset could serve as a unified benchmark for future studies.
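As an illustration of the evaluation metrics reported above, the sketch below computes IoU, Dice, precision, recall, and F1 from binary segmentation masks. The toy masks and the epsilon smoothing are assumptions for demonstration, not the paper's implementation.

```python
# Sketch: segmentation metrics (IoU, Dice, precision, recall, F1) for binary masks.
import numpy as np

def segmentation_metrics(pred, target, eps=1e-7):
    pred = pred.astype(bool)
    target = target.astype(bool)
    tp = np.logical_and(pred, target).sum()      # pixels correctly labeled as tooth
    fp = np.logical_and(pred, ~target).sum()     # false tooth pixels
    fn = np.logical_and(~pred, target).sum()     # missed tooth pixels
    iou = tp / (tp + fp + fn + eps)
    dice = 2 * tp / (2 * tp + fp + fn + eps)
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    return {"IoU": iou, "Dice": dice, "precision": precision, "recall": recall, "F1": f1}

pred = np.zeros((256, 256), dtype=int); pred[50:150, 50:150] = 1   # toy predicted tooth mask
gt = np.zeros((256, 256), dtype=int);   gt[60:160, 60:160] = 1     # toy ground-truth mask
print(segmentation_metrics(pred, gt))
```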
Project description:Research on metamaterials shows great potential in the field of solar energy harvesting. In the past decade, the design of broadband solar metamaterial absorbers (SMAs) has attracted a surge of interest. Conventional design typically requires brute-force optimization over a huge sampling space of structure parameters. Very recently, deep learning (DL) has provided a promising route to metamaterial design, but its application to SMA development is barely reported owing to the complicated features of broadband spectra. Here, this work develops a DL model based on a metamaterial spectrum transformer (MST) for the powerful design of high-performance SMAs. The MST divides the optical spectrum of a metamaterial into N patches, which overcomes the severe overfitting problem of traditional DL and boosts learning capability significantly. A flexible design tool based on freely user-defined targets is developed to facilitate real-time, on-demand design of metamaterials with various optical functions. The scheme is applied to the design and fabrication of SMAs with graded-refractive-index nanostructures. They demonstrate a high average absorptance of 94% over a broad solar spectrum and exhibit exceptional advantages over many state-of-the-art counterparts. Outdoor testing implies high-efficiency energy collection of about 1061 kWh m-2 from solar radiation annually. This work paves the way for the rapid, smart design of SMAs and will also provide a real-time development tool for many other metamaterials and metadevices.
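The key MST idea reported above is splitting a broadband spectrum into N patches before feeding it to a Transformer. Below is a hedged sketch of that patching step; the patch length, model sizes, and the four-parameter output head are illustrative assumptions, not the authors' architecture.

```python
# Sketch: a sampled absorptance spectrum is split into N patches, each patch is
# linearly projected to a token, and tokens are fed to a Transformer encoder
# that regresses (assumed) structure parameters of the absorber.
import torch
import torch.nn as nn

class SpectrumPatchEncoder(nn.Module):
    def __init__(self, spectrum_len=400, patch_len=20, d_model=64, n_params=4):
        super().__init__()
        assert spectrum_len % patch_len == 0
        self.patch_len = patch_len
        self.num_patches = spectrum_len // patch_len              # N patches
        self.project = nn.Linear(patch_len, d_model)              # patch -> token
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_params)                  # hypothetical structure parameters

    def forward(self, spectrum):                                  # spectrum: (batch, spectrum_len)
        patches = spectrum.view(spectrum.size(0), self.num_patches, self.patch_len)
        tokens = self.project(patches)
        features = self.encoder(tokens).mean(dim=1)
        return self.head(features)

model = SpectrumPatchEncoder()
params = model(torch.rand(2, 400))                                # two toy broadband spectra
print(params.shape)                                               # torch.Size([2, 4])
```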
Project description:While convolutional operations effectively extract local features, their limited receptive fields make it challenging to capture global dependencies. Transformers, on the other hand, excel at modeling global dependencies. However, the self-attention mechanism used in Transformers lacks a local mechanism for information exchange within specific regions. This article leverages the strengths of both Transformers and convolutional neural networks (CNNs) to enhance the Swin Transformer V2 model. By incorporating both convolutional operations and the self-attention mechanism, the enhanced model combines the local information-capturing capability of CNNs with the long-range dependency-capturing ability of Transformers. The improved model enhances the extraction of local information through the introduction of the Swin Transformer Stem, an inverted residual feed-forward network, and a Dual-Branch Downsampling structure. Subsequently, it models global dependencies using the improved self-attention mechanism. Additionally, downsampling is applied to the attention mechanism's Q and K to reduce computational and memory overhead. Under identical training conditions, the proposed method significantly improves classification accuracy on multiple image classification datasets, showcasing more robust generalization capabilities.
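The description above notes that the attention's Q and K are downsampled to cut computation and memory. As a simplified, hedged illustration of that cost-reduction idea (not the paper's exact design), the sketch below pools the key/value sequence before attention so the matrix shapes stay consistent while the attention map shrinks.

```python
# Sketch: self-attention over a pooled (shorter) key/value sequence, reducing
# the attention cost from O(N^2) to O(N * N/stride). Sizes are assumptions.
import torch
import torch.nn as nn

class DownsampledSelfAttention(nn.Module):
    def __init__(self, dim=96, num_heads=4, stride=2):
        super().__init__()
        self.pool = nn.AvgPool1d(kernel_size=stride, stride=stride)   # shortens the K/V sequence
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):                                  # x: (batch, tokens, dim)
        kv = self.pool(x.transpose(1, 2)).transpose(1, 2)  # (batch, tokens/stride, dim)
        out, _ = self.attn(query=x, key=kv, value=kv)      # attend over the shorter sequence
        return out

block = DownsampledSelfAttention()
y = block(torch.rand(2, 196, 96))                          # e.g. 14x14 image tokens
print(y.shape)                                             # torch.Size([2, 196, 96])
```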
Project description:Accurate geographical traffic forecasting plays a critical role in urban transportation planning, traffic management, and geospatial artificial intelligence (GeoAI). Although deep learning models have made significant progress in geographical traffic forecasting, they still face challenges in effectively capturing long-term temporal dependencies and modeling heterogeneous dynamic spatial dependencies. To address these issues, we propose a novel deep transformer-based heterogeneous spatiotemporal graph learning model for geographical traffic forecasting. Our model incorporates a temporal transformer that captures long-term temporal patterns in traffic data rather than relying on simple data fusion. Furthermore, we introduce adaptive normalized graph structures within different graph layers, enabling the model to capture dynamic spatial dependencies and adapt to diverse traffic scenarios, especially heterogeneous relationships. We conduct comprehensive experiments and visualizations on four public benchmark datasets and demonstrate that our model achieves state-of-the-art results in comparison with existing methods.
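One common way to realize the "adaptive normalized graph structures" mentioned above is to learn node embeddings and derive a row-normalized adjacency matrix from them, a device used in several traffic-forecasting models. The sketch below is a hedged illustration of that idea rather than the authors' exact layer; the node count and feature dimensions are arbitrary.

```python
# Sketch: an adaptive, row-normalized adjacency matrix learned from node
# embeddings, used for one step of graph message passing over traffic sensors.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveGraphLayer(nn.Module):
    def __init__(self, num_nodes=207, embed_dim=16, feat_dim=32):
        super().__init__()
        self.node_embed = nn.Parameter(torch.randn(num_nodes, embed_dim))  # learned per-sensor embedding
        self.linear = nn.Linear(feat_dim, feat_dim)

    def forward(self, x):                                   # x: (batch, num_nodes, feat_dim)
        scores = self.node_embed @ self.node_embed.t()      # pairwise node affinities
        adj = F.softmax(F.relu(scores), dim=-1)             # adaptive, row-normalized adjacency
        return self.linear(adj @ x)                         # propagate features over the learned graph

layer = AdaptiveGraphLayer()
out = layer(torch.rand(8, 207, 32))                         # 8 samples, 207 traffic sensors
print(out.shape)                                            # torch.Size([8, 207, 32])
```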
Project description:Cancer of unknown primary site (CUPS) is a type of metastatic tumor for which the site of tumor origin cannot be determined. Precise diagnosis of the tissue origin of metastatic CUPS is crucial for developing treatment schemes that improve patient prognosis. Recently, there have been many studies using various cancer biomarkers to predict the tissue-of-origin (TOO) of CUPS. However, only very few of them use copy number alteration (CNA) to trace TOO. In this paper, a two-step computational framework called CNA_origin is introduced to predict the tissue-of-origin of a tumor from its gene CNA levels. CNA_origin is built on a deep-learning architecture mainly composed of an autoencoder and a convolutional neural network (CNN). On real datasets released from public databases, CNA_origin had an overall accuracy of 83.81% under 10-fold cross-validation and 79% on independent datasets for predicting tumor origin, improving accuracy by 7.75% and 9.72% compared with a previously published method. Our results suggest that the autoencoder model can extract key characteristics of CNA and that the CNN classifier developed in this study can predict the origin of tumors robustly and effectively. CNA_origin was written in Python and can be downloaded from https://github.com/YingLianghnu/CNA_origin.
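To make the two-step design concrete, here is a hedged PyTorch sketch (not the released CNA_origin code, which is linked above): an autoencoder compresses gene-level CNA profiles, and a small 1D CNN classifies the latent code into a tissue of origin. Gene count, latent size, and class count are illustrative assumptions.

```python
# Sketch: autoencoder feature extraction from CNA profiles + CNN classifier.
import torch
import torch.nn as nn

class CNAAutoencoder(nn.Module):
    def __init__(self, n_genes=20000, latent=512):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_genes, 2048), nn.ReLU(),
                                     nn.Linear(2048, latent), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(latent, 2048), nn.ReLU(),
                                     nn.Linear(2048, n_genes))

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z               # reconstruction for pretraining, latent code for the CNN

class OriginCNN(nn.Module):
    def __init__(self, latent=512, n_classes=20):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv1d(1, 16, kernel_size=5, padding=2), nn.ReLU(),
                                  nn.MaxPool1d(2),
                                  nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
                                  nn.MaxPool1d(2))
        self.fc = nn.Linear(32 * (latent // 4), n_classes)

    def forward(self, z):                        # z: (batch, latent) from the autoencoder
        h = self.conv(z.unsqueeze(1))            # treat the latent code as a 1-channel signal
        return self.fc(h.flatten(1))             # logits over candidate tissues of origin

ae, clf = CNAAutoencoder(), OriginCNN()
x = torch.rand(4, 20000)                         # 4 toy tumor CNA profiles
_, z = ae(x)
print(clf(z).shape)                              # torch.Size([4, 20])
```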
Project description:Wildfires are a worldwide natural disaster causing significant economic damage and loss of life. Experts predict that wildfires will increase in the coming years, mainly due to climate change. Early detection and prediction of fire spread can help reduce affected areas and improve firefighting. Numerous systems have been developed to detect fire. Recently, unmanned aerial vehicles (UAVs) have been employed to tackle this problem due to their high flexibility, low cost, and ability to cover wide areas during the day or night. However, they are still limited by challenging problems such as small fire size, background complexity, and image degradation. To deal with these limitations, we adapted and optimized deep learning methods to detect wildfire at an early stage. A novel deep ensemble learning method, which combines EfficientNet-B5 and DenseNet-201 models, is proposed to identify and classify wildfire using aerial images. In addition, two vision transformers (TransUNet and TransFire) and a deep convolutional model (EfficientSeg) were employed to segment wildfire regions and determine the precise fire areas. The obtained results are promising and show the efficiency of using deep learning and vision transformers for wildfire classification and segmentation. The proposed model for wildfire classification obtained an accuracy of 85.12% and outperformed many state-of-the-art works, proving its ability to classify wildfire even in small fire areas. The best semantic segmentation models achieved an F1-score of 99.9% for the TransUNet architecture and 99.82% for the TransFire architecture, superior to recently published models. More specifically, we demonstrated the ability of these models to extract the finer details of wildfire in aerial images. They can further overcome current model limitations, such as background complexity and small wildfire areas.
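A hedged sketch of the classification ensemble described above: an EfficientNet-B5 and a DenseNet-201 are each given a binary fire/no-fire head and their softmax outputs are averaged. The head replacement and equal weighting are assumptions for illustration, not necessarily the authors' exact fusion scheme.

```python
# Sketch: averaging the softmax outputs of EfficientNet-B5 and DenseNet-201
# for binary wildfire classification of aerial images.
import torch
import torch.nn as nn
from torchvision import models

def build_ensemble(num_classes=2):
    effnet = models.efficientnet_b5(weights=None)
    effnet.classifier[1] = nn.Linear(effnet.classifier[1].in_features, num_classes)
    densenet = models.densenet201(weights=None)
    densenet.classifier = nn.Linear(densenet.classifier.in_features, num_classes)
    return effnet, densenet

def ensemble_predict(models_pair, images):
    effnet, densenet = models_pair
    effnet.eval(); densenet.eval()
    with torch.no_grad():
        p1 = torch.softmax(effnet(images), dim=1)
        p2 = torch.softmax(densenet(images), dim=1)
    return (p1 + p2) / 2                                     # averaged class probabilities

pair = build_ensemble()
probs = ensemble_predict(pair, torch.rand(2, 3, 456, 456))   # two toy aerial images
print(probs.argmax(dim=1))                                   # predicted class per image
```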
Project description:Recent developments in deep learning, coupled with an increasing number of sequenced proteins, have led to a breakthrough in life science applications, in particular in protein property prediction. There is hope that deep learning can close the gap between the number of sequenced proteins and the number of proteins with properties known from lab experiments. Language models from the field of natural language processing have gained popularity for protein property prediction and have led to a new computational revolution in biology, where old prediction results are being improved regularly. Such models can learn useful multipurpose representations of proteins from large open repositories of protein sequences and can be used, for instance, to predict protein properties. The field of natural language processing is growing quickly because of developments in a class of models built on one particular architecture: the Transformer. We review recent developments and the use of large-scale Transformer models in applications for predicting protein characteristics, and how such models can be used to predict, for example, post-translational modifications. We review shortcomings of other deep learning models and explain how Transformer models have quickly proven to be a very promising way to unravel the information hidden in sequences of amino acids.
Project description:The transportation sector is a major contributor to greenhouse gas (GHG) emissions and is a driver of adverse health effects globally. Increasingly, government policies have promoted the adoption of electric vehicles (EVs) as a solution to mitigate GHG emissions. However, government analysts have failed to fully utilize consumer data in decisions related to charging infrastructure. This is because a large share of EV data is unstructured text, which presents challenges for data discovery. In this article, we deploy advances in transformer-based deep learning to discover topics of attention in a nationally representative sample of user reviews. We report classification accuracies greater than 91% (F1 scores of 0.83), outperforming previously leading algorithms in this domain. We describe applications of these deep learning models for public policy analysis and large-scale implementation. This capability can boost intelligence for the EV charging market, which is expected to grow to US$27.6 billion by 2027.
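As a hedged illustration of transformer-based review classification of the kind described above (not the authors' trained model), the sketch below loads a generic pretrained encoder from the Hugging Face transformers library. The topic labels are hypothetical placeholders, and the classifier head would need fine-tuning on labeled reviews before its predictions are meaningful.

```python
# Sketch: topic classification of an EV charging-station review with a generic
# pretrained transformer encoder (untrained head, placeholder labels).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

labels = ["availability", "cost", "functionality"]          # hypothetical review topics
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=len(labels))      # would be fine-tuned on labeled reviews

review = "The charger at this station has been broken for two weeks."
inputs = tokenizer(review, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
print(labels[int(logits.argmax())])                          # predicted topic (arbitrary until fine-tuned)
```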