Unknown

Dataset Information

0

Large-scale DNA-based phenotypic recording and deep learning enable highly accurate sequence-function mapping.


ABSTRACT: Predicting effects of gene regulatory elements (GREs) is a longstanding challenge in biology. Machine learning may address this, but requires large datasets linking GREs to their quantitative function. However, experimental methods to generate such datasets are either application-specific or technically complex and error-prone. Here, we introduce DNA-based phenotypic recording as a widely applicable, practicable approach to generate large-scale sequence-function datasets. We use a site-specific recombinase to directly record a GRE's effect in DNA, enabling readout of both sequence and quantitative function for extremely large GRE-sets via next-generation sequencing. We record translation kinetics of over 300,000 bacterial ribosome binding sites (RBSs) in >2.7 million sequence-function pairs in a single experiment. Further, we introduce a deep learning approach employing ensembling and uncertainty modelling that predicts RBS function with high accuracy, outperforming state-of-the-art methods. DNA-based phenotypic recording combined with deep learning represents a major advance in our ability to predict function from genetic sequence.

SUBMITTER: Hollerer S 

PROVIDER: S-EPMC7363850 | biostudies-literature | 2020 Jul

REPOSITORIES: biostudies-literature

altmetric image

Publications

Large-scale DNA-based phenotypic recording and deep learning enable highly accurate sequence-function mapping.

Höllerer Simon S   Papaxanthos Laetitia L   Gumpinger Anja Cathrin AC   Fischer Katrin K   Beisel Christian C   Borgwardt Karsten K   Benenson Yaakov Y   Jeschek Markus M  

Nature communications 20200715 1


Predicting effects of gene regulatory elements (GREs) is a longstanding challenge in biology. Machine learning may address this, but requires large datasets linking GREs to their quantitative function. However, experimental methods to generate such datasets are either application-specific or technically complex and error-prone. Here, we introduce DNA-based phenotypic recording as a widely applicable, practicable approach to generate large-scale sequence-function datasets. We use a site-specific  ...[more]

Similar Datasets

| S-EPMC9249189 | biostudies-literature
| S-EPMC3367211 | biostudies-literature
| S-EPMC6460955 | biostudies-literature
| S-EPMC4363621 | biostudies-literature
| S-EPMC8149826 | biostudies-literature
| S-EPMC6415887 | biostudies-literature
2023-07-11 | GSE228009 | GEO