
Dataset Information


Towards practical and robust DNA-based data archiving using the yin-yang codec system.


ABSTRACT: DNA is a promising data storage medium due to its remarkable durability and space-efficient storage. Early bit-to-base transcoding schemes have primarily pursued information density, at the expense of introducing biocompatibility challenges or decoding failure. Here we propose a robust transcoding algorithm named the yin-yang codec, using two rules to encode two binary bits into one nucleotide, to generate DNA sequences that are highly compatible with synthesis and sequencing technologies. We encoded two representative file formats and stored them in vitro as 200 nt oligo pools and in vivo as a ~54 kbps DNA fragment in yeast cells. Sequencing results show that the yin-yang codec exhibits high robustness and reliability for a wide variety of data types, with an average recovery rate of 99.9% above 10⁴ molecule copies and an achieved recovery rate of 87.53% at ≤10² copies. Additionally, the in vivo storage demonstration achieved an experimentally measured physical density close to the theoretical maximum.
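The core idea in the abstract, two rules that jointly map two binary bits onto one nucleotide, can be illustrated with a minimal Python sketch. It assumes one illustrative rule pair: a "yang" rule in which a bit from segment a selects the pair {A, T} (0) or {C, G} (1), and a "yin" rule in which a bit from segment b, together with the previous nucleotide, selects a pair holding exactly one of {A, T} and one of {C, G}. The two pairs therefore always intersect in a single base. The tables YANG and YIN, the START base, and the functions encode and decode are hypothetical names for illustration only; they are not the rule combination or interface of the published codec.

```python
# Minimal sketch of a two-rule ("yin-yang"-style) transcoder.
# ASSUMPTION: these rule tables are hypothetical illustrations,
# not the rule combination used in the published yin-yang codec.

# Yang rule: the bit from segment a picks the A/T pair (0) or the C/G pair (1).
YANG = {0: {"A", "T"}, 1: {"C", "G"}}

# Yin rule: depends on the previous nucleotide. The bit from segment b picks a
# pair containing exactly one of {A, T} and one of {C, G}, so its intersection
# with the yang pair is always a single base.
YIN = {
    "A": {0: {"A", "C"}, 1: {"T", "G"}},
    "T": {0: {"T", "G"}, 1: {"A", "C"}},
    "C": {0: {"A", "G"}, 1: {"T", "C"}},
    "G": {0: {"T", "C"}, 1: {"A", "G"}},
}
START = "A"  # hypothetical virtual base assumed to precede position 0


def encode(seg_a: list[int], seg_b: list[int]) -> str:
    """Encode two equal-length bit segments into one DNA string (2 bits/nt)."""
    if len(seg_a) != len(seg_b):
        raise ValueError("segments must have equal length")
    prev, out = START, []
    for bit_a, bit_b in zip(seg_a, seg_b):
        (nt,) = YANG[bit_a] & YIN[prev][bit_b]  # exactly one base survives
        out.append(nt)
        prev = nt
    return "".join(out)


def decode(dna: str) -> tuple[list[int], list[int]]:
    """Recover both bit segments by applying the two rules in reverse."""
    prev, seg_a, seg_b = START, [], []
    for nt in dna:
        seg_a.append(0 if nt in YANG[0] else 1)     # yang: A/T -> 0, C/G -> 1
        seg_b.append(0 if nt in YIN[prev][0] else 1)  # yin: needs previous base
        prev = nt
    return seg_a, seg_b


if __name__ == "__main__":
    dna = encode([0, 1, 1, 0], [1, 0, 1, 1])
    print(dna)          # TGGA
    print(decode(dna))  # ([0, 1, 1, 0], [1, 0, 1, 1])
```

Note that this toy output already contains a "GG" run: the sketch shows only the bit-to-base mapping and omits the step, described for the real system, of choosing segment pairings so that generated sequences stay compatible with synthesis and sequencing (for example, balanced GC content and short homopolymers).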

SUBMITTER: Ping Z 

PROVIDER: S-EPMC10766522 | biostudies-literature | 2022 Apr

REPOSITORIES: biostudies-literature


Publications

Towards practical and robust DNA-based data archiving using the yin-yang codec system.

Ping Zhi, Chen Shihong, Zhou Guangyu, Huang Xiaoluo, Zhu Sha Joe, Zhang Haoling, Lee Henry H, Lan Zhaojun, Cui Jie, Chen Tai, Zhang Wenwei, Yang Huanming, Xu Xun, Church George M, Shen Yue

Nature Computational Science, 2022-04-25, issue 4



Similar Datasets

| S-EPMC11525716 | biostudies-literature
| S-EPMC9989090 | biostudies-literature
| S-EPMC7473878 | biostudies-literature
| S-EPMC3386643 | biostudies-literature
| S-EPMC10529878 | biostudies-literature
| S-EPMC10294226 | biostudies-literature
| S-EPMC5411326 | biostudies-literature
| S-EPMC5007621 | biostudies-literature
| S-EPMC3913823 | biostudies-other
| S-EPMC4779672 | biostudies-literature