Genomics

Dataset Information


Deep Mutational Scanning of HIV tat and rev in a non-overlapped context


ABSTRACT: This dataset provides allele counts and raw FASTQ files for deep mutational scanning of the HIV-1 genes tat and rev when they do not overlap one another (each placed in the nef locus), as described in Fernandes et al., Functional segregation of overlapping genes in HIV, Cell 2016 (in revision). Preselection (input) and post-selection (replicate 1/2) files are given for every possible point mutant of these two HIV proteins in the NL4-3 background. Tab-delimited files of codon counts across the amplicons are also included and are probably the most useful files for most researchers. The data here was used to generate Figures 3, 4, and 7 and might be of general use for people interested in deep mutational scanning, looking for signatures of epistasis in rev or tat, or reanalyzing and mining the data.

FAQ:

Why do the ends of each amplicon have such variation? To increase diversity across the flowcell, I pooled standard primers with N, NN, and NNN extensions to throw amplicons out of phase. When aligning, you should trim the ends or ignore them. This also means that the overlap between paired-end reads can vary by 3 nt.

Why are the filenames not easy to deal with? The filenames are tied to separate MiSeq runs. I hope to clean up the nomenclature and update this entry in the future while preserving the run information. You can get a sense of the run structure because different residues vary in Q-score, which mostly reflects the run they were pooled on rather than any interesting biology. While this makes the data a little harder to follow, I think it is good to see that doing this kind of analysis in high-throughput fashion involves a reasonable amount of failure (e.g. failed RNA isolation or RT), which led to repetition until we had good data for every position.

Can you help me deal with this dataset? Yes. Please email me at jferna10@ucsc.edu, or contact me on Twitter @jdf_ev. For reagents, please contact Alan Frankel at frankel@cgl.ucsf.edu.

Do you have the analysis scripts you used to process the data? Yes, they are on GitHub: https://github.com/nbstrauli/allele_frequency_trajectory_sim
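The end-trimming advice in the FAQ can be sketched in a few lines of Python. This is only an illustration, not the pipeline used for the paper: the function names `read_fastq` and `trim_ends` are hypothetical, and it assumes that clipping a fixed 3 nt (the longest extension, NNN) from both ends of every read is an acceptable, conservative way to ignore the out-of-phase bases.

```python
from typing import Iterator, TextIO, Tuple

# Assumption: 3 nt is the longest phasing extension (NNN) mentioned in the FAQ.
MAX_PHASE_OFFSET = 3


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from a plain four-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def trim_ends(seq: str, qual: str, n: int = MAX_PHASE_OFFSET) -> Tuple[str, str]:
    """Clip n bases (and their quality scores) from both ends of a read."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    if len(seq) <= 2 * n:
        return "", ""  # read too short to keep anything after trimming
    return seq[n:len(seq) - n], qual[n:len(qual) - n]
```

Usage would look something like `for header, seq, qual in read_fastq(open("reads.fastq")): seq, qual = trim_ends(seq, qual)`, where `reads.fastq` is a placeholder filename. A fixed clip discards a few real bases from reads that carried shorter extensions; locating the primer sequence in each read would be more precise, but the primer sequences are not given in this entry.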

INSTRUMENT(S): Illumina MiSeq

ORGANISM(S): HIV-1 vector pNL4-3  

SUBMITTER: Jason D Fernandes  

PROVIDER: E-MTAB-5154 | ArrayExpress | 2016-10-30

SECONDARY ACCESSION(S): ERP018159

REPOSITORIES: ArrayExpress, ENA

Similar Datasets

| PRJEB16306 | ENA
| GSE33706 | GEO
2011-11-06 | E-GEOD-30734 | ArrayExpress
2015-12-31 | E-GEOD-33705 | ArrayExpress
2011-11-06 | E-GEOD-30739 | ArrayExpress
2011-11-06 | E-GEOD-30736 | ArrayExpress
2011-11-06 | E-GEOD-30738 | ArrayExpress
2015-12-31 | E-GEOD-33706 | ArrayExpress
2013-08-02 | E-GEOD-44460 | ArrayExpress
2015-12-31 | E-GEOD-32626 | ArrayExpress