<HashMap><database>biostudies-arrayexpress</database><scores/><additional><submitter>Laura Langohr</submitter><organism>Homo sapiens</organism><software>Scanpy v.1.9.120, scvi-tools v.0.17.4</software><full_dataset_link>https://www.ebi.ac.uk/biostudies/studies/E-MTAB-16984</full_dataset_link><description>This study includes single-cell RNA sequencing of human samples across ERCC6L2 disease (ED), Shwachman-Diamond syndrome (SDS), acute myeloid leukemia (AML), and healthy controls. ED, SDS, and AML samples were processed for single-cell capture, followed by library preparation and sequencing. Sequencing reads were processed through standard pipelines, including quality control, alignment, and quantification, to generate expression matrices. The dataset contains processed single-cell gene expression data, and curated metadata describing sample identity, cell populations, and disease classification. Related raw sequencing files will be available at FEGA.</description><repository>biostudies-arrayexpress</repository><sample_protocol>Nucleic Acid Extraction - We applied single-cell RNA sequencing (scRNA-seq, 3’ 10X Genomics) and targeted genotyping by single-cell amplicon sequencing (scAmp-seq) following manufacturer’s instructions on BM samples from ED and SDS patients. The manufacturer’s instructions were followed for generating gel beads in emulsion (GEM).</sample_protocol><sample_protocol>Library Construction - For the ED and SDS samples, the Chromium Single Cell 3' Gene Expression run and library preparations were done using the 10X Genomics Chromium Next GEM Single Cell 3' Gene Expression version 3.1 Dual Index chemistry.</sample_protocol><sample_protocol>Sample Collection - Human bone marrow samples were collected from patients with ERCC6L2 disease (ED), and Shwachman-Diamond syndrome (SDS) following informed consent and institutional ethical approval. Samples were obtained as part of routine clinical diagnostics or biobanking procedures. 
Bone marrow samples were collected in EDTA tubes, followed by ACK-based lysis and freezing.</sample_protocol><sample_protocol>Sequencing - Libraries were sequenced on an Illumina NovaSeq 6000 platform using the following read lengths: 28bp (Read 1), 10bp (i7 Index), 10bp (i5 Index) and 90bp (Read 2). The resulting files were processed using CellRanger 6.0.2 pipelines with default parameters to generate FASTQ files and count matrices. Illumina bcl2fastq v2.2.0 was used to run the mkfastq pipeline, and alignment was done against GRCh38.</sample_protocol><figure_sub>Organization</figure_sub><figure_sub>MINSEQE Score</figure_sub><figure_sub>Assays and Data</figure_sub><figure_sub>Processed Data</figure_sub><figure_sub>MAGE-TAB Files</figure_sub><data_protocol>Data Transformation - QC on scRNA-seq data was performed with Scanpy. We removed cells with more than 20% mitochondrial transcripts (mtRNAs), as this has been shown to be a sufficient threshold for human BM, and removed cells with fewer than 200 expressed genes to exclude barcodes originating from droplets containing only ambient/cell-free RNA. Following recommendations for single-cell RNA-seq analyses, we did not perform further filtering unless it became necessary in downstream data analysis. 
Data integration and cell type identification were performed using Single-cell variational inference (scVI) and Single-cell ANnotation using Variational Inference (scANVI) in combination with a hematological marker database.</data_protocol><omics_type>Unknown</omics_type><omics_type>Transcriptomics</omics_type><omics_type>Genomics</omics_type><omics_type>Proteomics</omics_type><instrument_platform>NA</instrument_platform><instrument_platform>10x Genomics Chromium Controller</instrument_platform><instrument_platform>Illumina NovaSeq 6000</instrument_platform><study_type>RNA-seq of coding RNA from single cells</study_type><species>Homo sapiens</species><pubmed_title>Distinct Stem Cell Identities Converge into Shared Erythroid Stress in ERCC6L2 Disease and Shwachman-Diamond Syndrome</pubmed_title><pubmed_authors>Ilse Kaaja</pubmed_authors><pubmed_authors>Ulla Wartiovaara-Kautto</pubmed_authors><pubmed_authors>Laura Langohr</pubmed_authors><pubmed_authors>Laura Langohr 1,2,3,4* Ilse Kaaja 1,3* Suvi Douglas 1,3 Hanna Nebelung 1,2,4 Jessica Koski 1,3 Ina Ikonen 1,3 Lotta Katainen 1,3 Katri Maljanen 1,2,4 Marja Hakkarainen 1,3,5 Tuulia Räisänen 1,3 Riitta Niinimäki 6 Sakari Kakko 7, 8 Timo Siitonen 7, 8 Sadiksha Adhikari 2,4 Markus Vähä-Koskela 2,4 Caroline A. 
Heckman 2,4 Jenni Lahtela 2 Ulla Wartiovaara-Kautto 1,5**  Esa Pitkänen 1,2,4** Outi Kilpivaara 1,3,4,9,10**  1 Applied Tumor Genomics Research Program, Faculty of Medicine, University of Helsinki, Helsinki, Finland 2 Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland 3 Department of Medical and Clinical Genetics/Medicum, University of Helsinki, Helsinki, Finland 4 iCAN Digital Precision Cancer Medicine Flagship, 00014 Helsinki, Finland 5 Department of Hematology, Helsinki University Hospital Comprehensive Cancer Center, University of Helsinki, Helsinki, Finland 6 Department of Pediatrics, Oulu University Hospital and PEDEGO Research Unit, University of Oulu, Oulu, Finland 7 Cancer Center, Oulu University Hospital, Oulu, Finland 8 Research Unit of Biomedicine and Internal Medicine, University of Oulu, Oulu, Finland 9 HUS diagnostic center (Helsinki University Hospital), HUSLAB Laboratory of Genetics, Helsinki, Finland 10 K. Albin Johansson Cancer Research Fellow, Foundation for the Finnish Cancer Institute * L.L. and I.K. contributed equally to this study ** U.W.-K., E.P. and O.K. co-supervised the study and are the corresponding authors.</pubmed_authors><pubmed_authors>Outi Kilpivaara</pubmed_authors></additional><is_claimable>false</is_claimable><name>scRNA-seq data for Distinct Stem Cell Identities Converge into Shared Erythroid Stress in ERCC6L2 Disease and Shwachman-Diamond Syndrome</name><description>This study includes single-cell RNA sequencing of human samples across ERCC6L2 disease (ED), Shwachman-Diamond syndrome (SDS), acute myeloid leukemia (AML), and healthy controls. ED, SDS, and AML samples were processed for single-cell capture, followed by library preparation and sequencing. Sequencing reads were processed through standard pipelines, including quality control, alignment, and quantification, to generate expression matrices. 
The dataset contains processed single-cell gene expression data, and curated metadata describing sample identity, cell populations, and disease classification. Related raw sequencing files will be available at FEGA.</description><dates><release>2026-06-03T00:00:00Z</release><modification>2026-06-03T11:01:15.831Z</modification><creation>2026-05-01T11:01:17.118Z</creation></dates><accession>E-MTAB-16984</accession><cross_references><EFO>EFO_0002944</EFO><EFO>EFO_0004170</EFO><EFO>EFO_0005684</EFO><EFO>EFO_0005518</EFO><EFO>EFO_0003816</EFO><EFO>EFO_0004184</EFO><doi>10.1002/hem3.70374</doi></cross_references></HashMap>