Project description: Inverse probability weights are increasingly used in epidemiological analyses, and the estimation and application of weights to address a single bias are well discussed in the literature. Weights to address multiple biases simultaneously (i.e., a combination of weights) have been discussed almost exclusively in the context of marginal structural models in longitudinal settings, where treatment weights (estimated first) are combined with censoring weights (estimated second). In this work, we examine two examples of combined weights for confounding and missingness in a time-fixed setting in which outcome or confounder data are missing and the estimand is the marginal expectation of the outcome under a time-fixed treatment. We discuss the identification conditions, the construction of combined weights, and how assumptions about the missing data mechanisms affect this construction. We use a simulation to illustrate the estimation and application of the weights in the two examples. Notably, when only outcome data are missing, construction of combined weights is straightforward; however, when confounder data are missing, we show that in general we must follow a specific estimation procedure: first estimate the missingness weights, then estimate the treatment probabilities from the data with the missingness weights applied. However, if treatment and missingness are conditionally independent, the treatment probabilities can instead be estimated among the complete cases.
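A minimal sketch in R of the ordered procedure described above, for the missing-confounder case. The variable names (treatment A, confounder L, outcome Y, observation indicator R) and the data-generating values are illustrative assumptions, not the authors' simulation design.

```r
## Hypothetical illustration: combined weights when confounder data are missing.
set.seed(42)
n <- 5000
L <- rnorm(n)                              # confounder
A <- rbinom(n, 1, plogis(0.5 * L))         # treatment depends on confounder
Y <- rnorm(n, mean = A + L)                # outcome
R <- rbinom(n, 1, plogis(1 + 0.5 * A))     # R = 1: confounder observed
dat <- data.frame(A, L = ifelse(R == 1, L, NA), Y, R)

## Step 1: missingness weights from a model for P(R = 1 | A, Y)
m_fit <- glm(R ~ A + Y, family = binomial, data = dat)
dat$w_R <- 1 / predict(m_fit, type = "response")

## Step 2: treatment probabilities fit on the complete cases, weighted by
## the missingness weights (quasibinomial avoids the non-integer warning)
cc <- subset(dat, R == 1)
t_fit <- glm(A ~ L, family = quasibinomial, data = cc, weights = w_R)
cc$pA  <- predict(t_fit, type = "response")
cc$w_A <- ifelse(cc$A == 1, 1 / cc$pA, 1 / (1 - cc$pA))

## Step 3: combined weights -> weighted outcome means under each treatment
cc$w <- cc$w_R * cc$w_A
with(subset(cc, A == 1), weighted.mean(Y, w))   # estimate of E[Y^(a=1)]
with(subset(cc, A == 0), weighted.mean(Y, w))   # estimate of E[Y^(a=0)]
```

If treatment and missingness were conditionally independent, step 2 could drop the `weights = w_R` argument and fit the treatment model directly on the complete cases.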
Project description: A primary goal of longitudinal studies is to examine trends over time. Reported results from these studies often depend on strong, unverifiable assumptions about the missing data. Although the risk of substantial bias from missing data is widely known, analyses exploring missing-data influences are commonly done either ad hoc or not at all. This article outlines one of the three primary recognized approaches for examining missing-data effects that could be more widely used, the shared-parameter model (SPM), and explains its purpose, use, limitations, and extensions. We additionally provide synthetic data and reproducible research code for running SPMs in the SAS, Stata, and R programming languages to facilitate their use in practice and for teaching purposes in epidemiology, biostatistics, data science, and related fields. Our goals are to increase understanding and use of these methods by providing an introduction to the concepts and access to helpful tools.
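This is not the authors' provided code; as one hedged sketch of the idea in R, a shared-parameter model can be fit with the JM package, which links a mixed-effects model for the longitudinal outcome and a survival model for dropout through shared random effects. The dataset names (`dat`, `surv_dat`) and variables (`y`, `time`, `id`, `w1`, `drop_time`, `dropped`) are assumptions for illustration.

```r
## Sketch of a shared-parameter model, assuming the JM package and
## hypothetical long-format data `dat` plus one-row-per-subject `surv_dat`.
library(nlme)
library(survival)
library(JM)

# Longitudinal submodel: linear time trend, random intercept and slope
lme_fit <- lme(y ~ time, random = ~ time | id, data = dat)

# Dropout submodel: time to dropout with a baseline covariate w1
# (x = TRUE is required by jointModel)
cox_fit <- coxph(Surv(drop_time, dropped) ~ w1, data = surv_dat, x = TRUE)

# Joint fit: shared random effects tie the dropout hazard to each
# subject's latent trajectory, the defining feature of an SPM
spm_fit <- jointModel(lme_fit, cox_fit, timeVar = "time")
summary(spm_fit)
```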
Project description: Motivation: Metabolomics data generated from liquid chromatography-mass spectrometry platforms often contain missing values. Existing imputation methods do not consider underlying feature relations or metabolic network information, so the imputation results may not be optimal. Results: We propose an imputation algorithm that incorporates the existing metabolic network, adduct-ion relations (even for unknown compounds), and linear and nonlinear associations between feature intensities to build a feature-level network. The algorithm uses support vector regression to impute missing values based on a feature's neighborhood on the network. We compared the proposed method with widely used methods. As judged by the normalized root mean squared error in real-data-based simulations, the proposed method achieves better accuracy. Availability and implementation: The R package is available at http://web1.sph.emory.edu/users/tyu8/MINMA. Contact: jiankang@umich.edu or tianwei.yu@emory.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
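The following is not the MINMA implementation itself, only a hedged sketch of the core step: imputing a feature's missing intensities with support vector regression on its network neighbours, using e1071. The matrix `X` (samples by features) and the `neighbours` index vector are hypothetical.

```r
## Sketch: SVR imputation of feature j from its network neighbourhood.
## Assumes neighbour features are fully observed, for simplicity.
library(e1071)

impute_feature <- function(X, j, neighbours) {
  miss <- is.na(X[, j])
  if (!any(miss)) return(X)
  train <- data.frame(y = X[!miss, j], X[!miss, neighbours, drop = FALSE])
  test  <- data.frame(X[miss, neighbours, drop = FALSE])
  names(test) <- names(train)[-1]
  fit <- svm(y ~ ., data = train)            # epsilon-SVR by default
  X[miss, j] <- predict(fit, newdata = test)
  X
}
```

In the published algorithm the neighbourhood comes from the metabolic-network and adduct-relation graph; here any index set stands in for it.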
Project description: BACKGROUND: Contact tracing data from the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic are used to estimate basic epidemiological parameters. Contact tracing data could also be used to assess the heterogeneity of transmission at the individual patient level. Characterizing individuals by their level of infectiousness could better inform contact tracing interventions in the field. METHODS: Standard social network analysis methods for exploring infectious disease transmission dynamics were employed to analyze contact tracing data of 1959 diagnosed SARS-CoV-2 patients from a large state of India. A relational network dataset was created with diagnosed patients as "nodes" and their epidemiological contacts as "edges". A directed network perspective was used, in which the direction of infection runs from a "source patient" towards a "target patient". The network measures "degree centrality" and "betweenness centrality" were calculated to identify influential patients in the transmission of infection. Component analysis was conducted to identify patients connected as sub-groups. Descriptive statistics were used to summarise the network measures, and percentile ranks were used to categorize influencers. RESULTS: Out-degree centrality measures identified that, of the 1959 patients, 11.27% (221) acted as a source of infection for 40.19% (787) of the other patients. Among these source patients, 0.65% (12) had a high out-degree centrality (≥10) and collectively infected 37.61% (296 of 787) of the secondary patients. Betweenness centrality measures highlighted that 7.50% (93) of patients had a non-zero betweenness (range 0.5 to 135) and thus bridged transmission between other patients. Network component analysis identified nineteen connected components comprising influential patients, which together accounted for 26.95% of the 1959 patients and 68.74% of the epidemiological contacts in the network. CONCLUSIONS: Social network analysis of SARS-CoV-2 contact tracing data is useful for measuring individual patient-level variation in disease transmission. The network metrics identified individual patients and patient components that disproportionately contributed to transmission. The network measures and graphical tools could complement existing contact tracing indicators and could help improve contact tracing activities.
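A short sketch of the network measures described above, using the igraph package. The edge list is a made-up toy example standing in for the real contact-tracing data; the percentile cut-off for flagging influencers is also an assumption.

```r
## Directed contact network: edges run from source patient to target patient.
library(igraph)

edges <- data.frame(source = c("P1", "P1", "P2", "P3"),
                    target = c("P2", "P3", "P4", "P5"))
g <- graph_from_data_frame(edges, directed = TRUE)

out_deg <- degree(g, mode = "out")          # how many patients each case infected
btw     <- betweenness(g, directed = TRUE)  # bridging role in transmission chains
comp    <- components(g, mode = "weak")     # connected sub-groups of patients

# Flag influencers, e.g. patients in the top percentile of out-degree
influencers <- names(out_deg)[out_deg >= quantile(out_deg, 0.99)]
```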
Project description: Mathematical models are often used to explore network-driven cellular processes from a systems perspective. However, a dearth of quantitative data suitable for model calibration leads to models with parameter unidentifiability and questionable predictive power. Here we introduce a combined Bayesian and Machine Learning Measurement Model approach to explore how quantitative and non-quantitative data constrain models of apoptosis execution within a missing data context. We find that model prediction accuracy and certainty strongly depend on rigorous, data-driven formulations of the measurement and on the size and make-up of the datasets. For instance, two orders of magnitude more ordinal (e.g., immunoblot) data are necessary to achieve accuracy comparable to quantitative (e.g., fluorescence) data for calibration of an apoptosis execution model. Notably, ordinal and nominal (e.g., cell fate observations) non-quantitative data synergize to reduce model uncertainty and improve accuracy. Finally, we demonstrate the potential of a data-driven Measurement Model approach to identify model features that could lead to informative experimental measurements and improve model predictive power.
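As one hedged illustration of what a measurement model for ordinal data can look like (not the authors' formulation), the sketch below maps a model-simulated continuous quantity to immunoblot-style ordinal categories through latent cutpoints and scores the observations with an ordered-probit likelihood. All values, including the cutpoints and noise scale, are illustrative assumptions.

```r
## Ordinal measurement model sketch: P(category k | simulated value) is the
## probability mass between consecutive cutpoints under Gaussian noise.
ordinal_loglik <- function(sim, obs_cat, cutpoints, sigma = 0.2) {
  cuts <- c(-Inf, cutpoints, Inf)
  p <- pnorm((cuts[obs_cat + 1] - sim) / sigma) -
       pnorm((cuts[obs_cat]     - sim) / sigma)
  sum(log(pmax(p, 1e-12)))   # guard against log(0)
}

sim_values <- c(0.1, 0.4, 0.8)   # model-simulated values at measured times
obs_cats   <- c(1, 2, 3)         # ordinal immunoblot readings (1 = low)
ordinal_loglik(sim_values, obs_cats, cutpoints = c(0.25, 0.6))
```

A term like this can enter a Bayesian calibration alongside quantitative likelihood terms, which is the general sense in which non-quantitative data constrain the model.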
Project description: Background: The potential value of large-scale datasets is constrained by the ubiquitous problem of missing data, arising in either a structured or unstructured fashion. When imputation methods are proposed for large-scale data, one limitation is the simplicity of existing evaluation methods. Specifically, most evaluations create synthetic data with only a simple, unstructured missing-data mechanism, which does not resemble the missing-data patterns found in real data. For example, in the UK Biobank, missing data tend to appear in blocks, because non-participation in one of the sub-studies leads to missingness for all sub-study variables. Methods: We propose a tool for generating mixed-type missing data that mimics key properties of a given real large-scale epidemiological dataset, with both structured and unstructured missingness, while accounting for informative missingness. The process involves identifying sub-studies using hierarchical clustering of missingness patterns and modelling the dependence between inter-variable correlation and co-missingness patterns. Results: On the UK Biobank brain imaging cohort, we identify several large blocks of missing data. We demonstrate the use of our tool for evaluating several imputation methods, showing modest accuracy of imputation overall, with iterative imputation having the best performance. We compare our evaluations based on synthetic data to an exemplar study that includes variable selection on a single real imputed dataset, finding only small differences between the imputation methods, though with iterative imputation leading to the most informative selection of variables. Conclusions: We have created a framework for simulating large-scale data that captures the complexities of inter-variable dependence as well as structured and unstructured informative missingness. Evaluations using this framework highlight the immense challenge of data imputation in this setting and the need for improved missing-data methods.
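A minimal sketch of the block-missingness idea, not the full tool: cluster variables by their co-missingness patterns, then knock out whole blocks per simulated subject to mimic sub-study non-participation. The data frame `dat`, the number of blocks `k = 3`, and the skip probability are assumptions.

```r
## Identify "sub-study" blocks from co-missingness, then simulate
## structured missingness by dropping whole blocks per subject.
M  <- is.na(dat)                              # missingness indicator matrix
d  <- dist(t(M), method = "binary")           # co-missingness distance, per variable
hc <- hclust(d, method = "average")
blocks <- cutree(hc, k = 3)                   # assumed number of blocks

simulate_block_missing <- function(complete_dat, blocks, p_skip = 0.2) {
  out <- complete_dat
  for (b in unique(blocks)) {
    skip <- runif(nrow(out)) < p_skip         # subjects skipping this sub-study
    out[skip, blocks == b] <- NA
  }
  out
}
```

The published framework additionally models informative missingness and inter-variable correlation; here the skip indicator is independent of the data purely to keep the sketch short.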
Project description: In dementia research it is often of interest to study relationships among regional brain measures; however, it is often necessary to adjust for covariates. Partial correlations are frequently used to correlate two variables while adjusting for other variables. Complete case analysis is typically the analysis of choice for partial correlations with missing data. However, complete case analysis will lead to biased and inefficient results when the data are missing at random. We extended the partial correlation coefficient to the presence of missing data using the expectation-maximization (EM) algorithm and compared it with a multiple imputation method and complete case analysis in simulation studies. The EM approach performed the best of all methods, with multiple imputation performing almost as well. These methods are illustrated with regional imaging data from an Alzheimer's disease study.
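One way to sketch the EM-based estimate in R (assuming, rather than reproducing, the authors' implementation) is to obtain the EM estimate of the covariance matrix with the classical norm package and convert its inverse to partial correlations. `X` stands for a hypothetical numeric matrix of regional brain measures plus covariates, with NAs.

```r
## EM covariance under multivariate normality, then partial correlations
## from the precision matrix: pcor(i,j) = -P[i,j] / sqrt(P[i,i] * P[j,j]).
library(norm)

s        <- prelim.norm(X)        # preprocess the missing-data pattern
thetahat <- em.norm(s)            # EM estimates of the mean and covariance
params   <- getparam.norm(s, thetahat)

P <- solve(params$sigma)                       # precision matrix
partial_cor <- -P / sqrt(diag(P) %o% diag(P))  # all pairwise partial correlations
diag(partial_cor) <- 1
partial_cor[1, 2]  # e.g. regions 1 and 2, adjusted for the remaining columns
```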
Project description: Appropriate handling of aggregate missing outcome data is necessary to minimise bias in the conclusions of systematic reviews. The two-stage pattern-mixture model has already been proposed to address aggregate missing continuous outcome data. While this approach is more appropriate than excluding missing continuous outcome data or using simple imputation methods, it does not offer flexible modelling of missing continuous outcome data with which to thoroughly investigate their implications for the conclusions. We therefore propose a one-stage pattern-mixture model approach under the Bayesian framework to address missing continuous outcome data in a network of interventions and to gain knowledge about the missingness process in different trials and interventions. We extend the hierarchical network meta-analysis model for one aggregate continuous outcome to incorporate a missingness parameter that measures departure from the missing-at-random assumption. We consider various effect size estimates for continuous data and two informative missingness parameters: the informative missingness difference of means and the informative missingness ratio of means. We incorporate our prior beliefs about the missingness parameters while allowing for several prior structures to account for the fact that the missingness process may differ across the network. The method is exemplified in two networks from published reviews comprising different amounts of missing continuous outcome data.
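An illustrative sketch (not the authors' full one-stage Bayesian model) of the pattern-mixture adjustment behind the informative missingness difference of means (IMDoM): the missing-participant mean is taken to equal the observed mean plus a parameter delta, so the arm mean becomes a mixture of the two. All numbers below are made up.

```r
## Pattern-mixture adjusted arm mean:
##   mu_adj = p_obs * mean_obs + (1 - p_obs) * (mean_obs + delta)
pm_adjusted_mean <- function(mean_obs, n_obs, n_miss, delta) {
  p_obs <- n_obs / (n_obs + n_miss)
  p_obs * mean_obs + (1 - p_obs) * (mean_obs + delta)
}

# Encode a prior belief delta ~ N(0, 1) and propagate it by Monte Carlo,
# mimicking how the Bayesian model carries missingness uncertainty forward
delta_draws <- rnorm(10000, mean = 0, sd = 1)
adj_draws   <- pm_adjusted_mean(mean_obs = 5.2, n_obs = 80, n_miss = 20,
                                delta = delta_draws)
quantile(adj_draws, c(0.025, 0.5, 0.975))
```

Setting delta to 0 recovers the missing-at-random analysis; widening its prior widens the interval around the adjusted mean, which is exactly the sensitivity the one-stage model formalises across trials and interventions.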
Project description: Secondary respondent data are underutilized because researchers avoid using these data in the presence of substantial missingness. We reviewed, critically evaluated, and tested potential solutions to this problem. Five strategies for dealing with missing partner data are reviewed: complete case analysis, inverse probability weighting, correction with a Heckman selection model, maximum likelihood estimation, and multiple imputation. Two approaches were used to evaluate the performance of these methods. First, we used data from the National Survey of Fertility Barriers (N = 1,666) to estimate a model predicting marital quality from characteristics of women and their husbands. Second, we conducted a simulation based on these data testing the five methods and compared the results with estimates for which the true value was known. We found that the maximum likelihood and multiple imputation methods were advantageous because they allow researchers to utilize all of the available information and produce less biased and more efficient estimates.
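As a sketch of two of the five strategies, the snippet below contrasts complete case analysis with multiple imputation via the mice package. The data frame `nsfb` and its variables (`marital_quality`, `wife_edu`, `husb_edu`) are hypothetical stand-ins for the survey data.

```r
## Complete case analysis versus multiple imputation for missing partner data.
library(mice)

# Complete cases: rows with any missing husband data are simply dropped
cc_fit <- lm(marital_quality ~ wife_edu + husb_edu, data = na.omit(nsfb))

# Multiple imputation: m = 20 completed datasets, model fit in each, then
# pooled with Rubin's rules so the extra imputation uncertainty is retained
imp    <- mice(nsfb, m = 20, printFlag = FALSE)
fits   <- with(imp, lm(marital_quality ~ wife_edu + husb_edu))
summary(pool(fits))
```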
Project description: Background: The availability of linked biomedical and social science data has risen dramatically in past decades, facilitating holistic and systems-based analyses. Among these, Bayesian networks have great potential for tackling complex interdisciplinary problems, because they can easily model inter-relations between variables. They work by encoding conditional independence relationships discovered via advanced inference algorithms. One challenge is dealing with missing data, ubiquitous in survey and biomedical datasets. Missing data are rarely addressed in an advanced way in Bayesian networks; the most common approach is to discard all samples containing missing measurements, which can lead to biased estimates. Here, we examine how Bayesian network structure learning can incorporate missing data. Methods: We use a simulation approach to compare a method commonly used in frequentist statistics, multiple imputation by chained equations (MICE), with one specific to Bayesian network learning, structural expectation-maximization (SEM). We simulate multiple incomplete categorical (discrete) datasets with different missingness mechanisms, numbers of variables, amounts of data, and missingness proportions. We evaluate the performance of MICE and SEM in capturing network structure. We then apply SEM combined with community analysis to a real-world dataset of linked biomedical and social data to investigate associations between socio-demographic factors and multiple chronic conditions in the US elderly population. Results: We find that applying either method (MICE or SEM) provides better structure recovery than doing nothing, and SEM in general outperforms MICE. This finding is robust across missingness mechanisms, numbers of variables, amounts of data, and missingness proportions. We also find that data imputed by SEM are more accurate than data imputed by MICE. Our real-world application recovers known inter-relationships among socio-demographic factors and common multimorbidities. The network analysis also highlights potential areas of investigation, such as links between cancer and cognitive impairment and a disconnect between self-assessed memory decline and standard measures of cognitive impairment. Conclusion: Our simulation results suggest that taking advantage of the additional information provided by the network structure during SEM improves the performance of Bayesian networks; this might be especially useful for social science and other interdisciplinary analyses. Our case study shows that comorbidities of different diseases interact with each other and are closely associated with socio-demographic factors.
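A hedged sketch of the two strategies being compared, for a hypothetical discrete data frame `dat` (all columns factors) containing NAs: impute-then-learn with mice followed by hill climbing, versus bnlearn's structural EM, which interleaves imputation and structure learning. This is a simplified rendering of the comparison, not the study's simulation protocol.

```r
## MICE + hill climbing versus structural EM for Bayesian network learning.
library(mice)
library(bnlearn)

# Strategy 1: impute with chained equations, then learn a structure from one
# completed dataset (in practice several could be learned and compared)
imp      <- mice(dat, m = 1, printFlag = FALSE)
dag_mice <- hc(complete(imp))

# Strategy 2: structural EM learns the structure and the expected sufficient
# statistics jointly, using the partially observed data directly
dag_sem <- structural.em(dat, maximize = "hc")

# Compare the recovered structures, e.g. by their arc sets
all.equal(arcs(dag_mice), arcs(dag_sem))
```

In a simulation like the one described above, each learned arc set would be scored against the known true network (e.g., structural Hamming distance) across missingness mechanisms and proportions.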