Dataset Information

More practical differentially private publication of key statistics in GWAS.


ABSTRACT:

Motivation

Analyses of datasets that contain personal genomic information are important for revealing associations between diseases and genomes. Genome-wide association studies (GWAS), which are large-scale genetic statistical analyses, often involve tests with contingency tables. However, if the statistics obtained by these tests are published unmodified, sensitive information about individuals could be leaked. Existing studies have proposed privacy-preserving methods for statistics in the χ² test with a 3 × 2 contingency table, but they do not cover all the tests used in association studies. In addition, existing methods for releasing differentially private P-values are not practical.

Results

In this work, we propose methods for releasing statistics in the χ² test, Fisher's exact test and the Cochran-Armitage trend test while preserving both personal privacy and utility. Our methods for releasing P-values are the first to achieve practicality under differential privacy, which they do by working with the base-10 logarithms of the P-values. We provide theoretical guarantees by deriving the sensitivity of the above statistics. In our experiments, we evaluate the utility of the proposed methods and identify thresholds at which the private statistics can be used in actual tests with high accuracy.

Availability and implementation

A Python implementation of our experiments is available at https://github.com/ay0408/DP-statistics-GWAS.

Supplementary information

Supplementary data are available at Bioinformatics Advances online.
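
The abstract's central idea, releasing the base-10 logarithm of a P-value with noise calibrated to its sensitivity, can be illustrated with the standard Laplace mechanism. The following is a minimal sketch under stated assumptions, not the authors' implementation (see the repository above for that): the function names `laplace_noise` and `private_log10_p` are hypothetical, and the `sensitivity` argument is a placeholder that the caller must supply from the bounds derived in the paper.

```python
import math
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential variables with
    mean `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_log10_p(p_value, sensitivity, epsilon, rng=None):
    """Return log10(p_value) perturbed by the Laplace mechanism.

    `sensitivity` is ASSUMED to be a valid bound on how much
    log10(P) can change when one individual's record changes; it is
    not computed here. `epsilon` is the privacy budget.
    """
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must lie in (0, 1]")
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    rng = rng or random.Random()
    # Standard Laplace mechanism: noise scale = sensitivity / epsilon.
    return math.log10(p_value) + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    # Illustrative values only; the sensitivity of 1.0 is a placeholder.
    rng = random.Random(0)
    print(private_log10_p(1e-8, sensitivity=1.0, epsilon=2.0, rng=rng))
```

Working on the logarithmic scale keeps the noise meaningful for the very small P-values typical of GWAS: additive noise applied to the raw P-value would swamp values such as 1e-8, whereas on the log scale it shifts the exponent by a bounded amount.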

SUBMITTER: Yamamoto A 

PROVIDER: S-EPMC9710635 | biostudies-literature | 2021

REPOSITORIES: biostudies-literature

Publications

More practical differentially private publication of key statistics in GWAS.

Yamamoto Akito, Shibuya Tetsuo

Bioinformatics Advances, 2021-05-18; 1


Similar Datasets

| S-EPMC9884206 | biostudies-literature
| S-EPMC3860164 | biostudies-other
| S-EPMC6519716 | biostudies-literature
| S-EPMC4269884 | biostudies-literature
| S-EPMC6764830 | biostudies-literature
| S-EPMC4101668 | biostudies-literature
| S-EPMC6417431 | biostudies-literature
| S-EPMC11196113 | biostudies-literature
| S-EPMC4916991 | biostudies-literature
| S-EPMC10290720 | biostudies-literature