Project description:Recently, computational approaches integrating copy number aberrations (CNAs) and gene expression (GE) have been extensively studied to identify cancer-related genes and pathways. In this work, we integrate these two data sets with protein-protein interaction (PPI) information to find cancer-related functional modules. To integrate CNA and GE data, we first built a gene-gene relationship network from a set of seed genes by enumerating all types of pairwise correlations, e.g. GE-GE, CNA-GE, and CNA-CNA, over multiple patients. Next, we propose a voting-based cancer module identification algorithm that combines topological and data-driven properties (the VToD algorithm), using the gene-gene relationship network as the source of data-driven information and the PPI data as the source of topological information. We applied the VToD algorithm to 266 glioblastoma multiforme (GBM) and 96 ovarian carcinoma (OVC) samples that have both expression and copy number measurements, and identified 22 GBM modules and 23 OVC modules. Among the 22 GBM modules, 15, 12, and 20 modules were significantly enriched with cancer-related KEGG pathways, BioCarta pathways, and GO terms, respectively. Among the 23 OVC modules, 19, 18, and 23 modules were significantly enriched with cancer-related KEGG pathways, BioCarta pathways, and GO terms, respectively. Similarly, we observed that 9 and 2 GBM modules and 15 and 18 OVC modules were enriched with cancer gene census (CGC) genes and specific cancer driver genes, respectively. Our proposed module-detection algorithm significantly outperformed other existing methods in terms of both functional and cancer gene set enrichments. Most of the cancer-related pathways found by our algorithm in both cancer data sets contained more than two types of gene-gene relationships, showing strong positive correlations between the number of different types of relationship and CGC enrichment p-values (0.64 for GBM and 0.49 for OVC).
This study suggests that identified modules containing both expression changes and CNAs can explain cancer-related activities with greater insights.
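The network-building step described above (enumerating GE-GE, CNA-GE, and CNA-CNA correlations over patients) can be illustrated with a minimal sketch. The function names, the Pearson correlation, the 0.8 threshold, and the toy data are all assumptions made for illustration; the abstract does not specify the correlation measure or cutoff used by VToD.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length value lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def relationship_edges(ge, cna, threshold=0.8):
    """Return (gene1, gene2, relationship_type, r) for every strongly correlated pair.

    ge, cna: dict mapping gene -> list of per-patient values (same patient order).
    threshold: hypothetical cutoff on |r|, not taken from the source.
    """
    edges = []
    genes = sorted(set(ge) & set(cna))
    for g1, g2 in combinations(genes, 2):
        candidates = {
            "GE-GE": pearson(ge[g1], ge[g2]),
            "CNA-CNA": pearson(cna[g1], cna[g2]),
            # CNA-GE is checked in both directions; the stronger one is kept.
            "CNA-GE": max(pearson(cna[g1], ge[g2]), pearson(cna[g2], ge[g1]), key=abs),
        }
        for kind, r in candidates.items():
            if abs(r) >= threshold:
                edges.append((g1, g2, kind, r))
    return edges

# Toy data: three genes measured in four patients (invented values).
ge = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 1, 3, 2]}
cna = {"A": [0, 0, 1, 1], "B": [1, 0, 1, 0], "C": [0, 1, 0, 1]}
edges = relationship_edges(ge, cna)
```

A gene pair can carry several relationship types at once, which is the property the abstract relates to CGC enrichment.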
Project description:The recent promise of large-scale access to unstructured clinical data from electronic health records has revitalized interest in automated de-identification of clinical notes, which includes the identification of mentions of Protected Health Information (PHI). We describe the methods developed and evaluated as part of the i2b2/UTHealth 2014 challenge to identify PHI defined by 25 entity types in longitudinal clinical narratives. Our approach combines knowledge-driven (dictionaries and rules) and data-driven (machine learning) methods with a large range of features to address de-identification of specific named entities. In addition, we have devised a two-pass recognition approach that creates a patient-specific run-time dictionary from the PHI entities identified with high confidence in the first pass, which is then used in the second pass to identify mentions that lack specific clues. The proposed method achieved overall micro F1-measures of 91% on strict and 95% on token-level evaluation on the test dataset (514 narratives). Whilst most PHI entities can be reliably identified, mentions of Organizations and Professions were particularly challenging. Still, the overall results suggest that automated text mining methods can be used to reliably process clinical notes to identify personal information and thus provide a crucial step in large-scale de-identification of unstructured data for further clinical and epidemiological studies.
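The two-pass idea above can be sketched in a few lines. This is only an illustration under stated assumptions: the title-based regular expression stands in for the high-confidence first pass (the actual system combines dictionaries, rules, and machine learning), and the function names are hypothetical.

```python
import re

def first_pass(text):
    """High-confidence rule (assumed): a capitalized token right after a title."""
    return set(re.findall(r"\b(?:Dr|Mr|Mrs|Ms)\.\s+([A-Z][a-z]+)", text))

def second_pass(text, run_time_dictionary):
    """Find every mention of a dictionary entry, even without contextual clues."""
    spans = []
    for name in run_time_dictionary:
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            spans.append((m.start(), m.end(), name))
    return sorted(spans)

def deidentify_patient(notes):
    """Build one run-time dictionary per patient, then re-scan all of their notes."""
    dictionary = set()
    for note in notes:
        dictionary |= first_pass(note)
    return [second_pass(note, dictionary) for note in notes]

# Two notes for the same (invented) patient; the second lacks the "Dr." clue.
notes = ["Seen by Dr. Smith today.", "Smith recommends follow-up."]
result = deidentify_patient(notes)
```

The second note contains no title, so only the patient-specific dictionary built in the first pass lets the bare surname be found.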
Project description:Protein-protein interaction (PPI) networks are viable tools to understand cell functions, disease machinery, and drug design/repositioning. Interpreting a PPI network, however, is a particularly challenging task because of network complexity. Several algorithms have been proposed for automatic PPI interpretation, at first by solely considering the network topology, and later by integrating Gene Ontology (GO) terms as node similarity attributes. Here we present MTGO - Module detection via Topological information and GO knowledge, a novel functional module identification approach. MTGO reveals the biomolecular machinery underpinning PPI networks by leveraging both biological knowledge and topological properties. In particular, it directly exploits GO terms during the module assembling process, and labels each module with its best-fit GO term, easing its functional interpretation. MTGO shows markedly better results than other state-of-the-art algorithms (including recent GO-based ones) when searching for small or sparse functional modules, while providing comparable or better results in all other cases. MTGO correctly identifies molecular complexes and literature-consistent processes in an experimentally derived PPI network of myocardial infarction. A software version of MTGO is freely available for non-commercial purposes at https://gitlab.com/d1vella/MTGO .
Project description:The discovery of topological quantum states marks a new chapter in both condensed matter physics and materials sciences. By analogy with spin electronic systems, topological concepts have been extended to phonons, giving rise to topological phononics (TPs). Here, we present a high-throughput screening and data-driven approach to compute and evaluate TPs among over 10,000 real materials. We have discovered 5014 TP materials and grouped them into two main classes of Weyl and nodal-line (ring) TPs. We have clarified the physical mechanism for the occurrence of single Weyl, highly degenerate Weyl, individual nodal-line (ring), nodal-link, nodal-chain, and nodal-net TPs in various materials and their mutual correlations. Among the phononic systems, we have predicted the hourglass nodal-net TPs in TeO3, as well as the clean and single type-I Weyl TPs between the acoustic and optical branches in half-Heusler LiCaAs. In addition, we found that different types of TPs can coexist in many materials (such as ScZn). Their potential applications and experimental detection are also discussed. This work substantially increases the number of known TP materials, which enables an in-depth investigation of their structure-property relations and opens new avenues for future device design related to TPs.
Project description:Functional near-infrared spectroscopy (fNIRS) is an optical non-invasive neuroimaging technique that allows participants to move relatively freely. However, head movements frequently cause optode movements relative to the head, leading to motion artifacts (MA) in the measured signal. Here, we propose an improved algorithmic approach for MA correction that combines wavelet and correlation-based signal improvement (WCBSI). We compare its MA correction accuracy to multiple established correction approaches (spline interpolation, spline-Savitzky-Golay filter, principal component analysis, targeted principal component analysis, robust locally weighted regression smoothing filter, wavelet filter, and correlation-based signal improvement) on real data. To this end, we measured brain activity in 20 participants performing a hand-tapping task while simultaneously moving their head to produce MAs at different levels of severity. In order to obtain a "ground truth" brain activation, we added a condition in which only the tapping task was performed. We compared the MA correction performance among the algorithms on four predefined metrics (R, RMSE, MAPE, and ΔAUC) and ranked the performances. The suggested WCBSI algorithm was the only one exceeding average performance (p < 0.001), and it had the highest probability of being the best-ranked algorithm (78.8% probability). Together, our results indicate that among all algorithms tested, our suggested WCBSI approach performed consistently favorably across all measures.
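The four comparison metrics named above can be sketched as follows. The abstract does not define them, so the formulas here are assumed textbook definitions: Pearson's R, root-mean-square error, mean absolute percentage error, and the absolute difference in trapezoidal area under the curve. The function names and toy signals are illustrative only.

```python
from math import sqrt

def pearson_r(truth, est):
    """Pearson correlation between ground-truth and corrected signal."""
    n = len(truth)
    mt, me = sum(truth) / n, sum(est) / n
    cov = sum((t - mt) * (e - me) for t, e in zip(truth, est))
    var_t = sum((t - mt) ** 2 for t in truth)
    var_e = sum((e - me) ** 2 for e in est)
    return cov / sqrt(var_t * var_e)

def rmse(truth, est):
    """Root-mean-square error."""
    return sqrt(sum((t - e) ** 2 for t, e in zip(truth, est)) / len(truth))

def mape(truth, est):
    """Mean absolute percentage error, skipping samples where truth is zero."""
    terms = [abs((t - e) / t) for t, e in zip(truth, est) if t != 0]
    return 100.0 * sum(terms) / len(terms)

def auc(signal, dt=1.0):
    """Area under the curve by the trapezoidal rule with sample spacing dt."""
    return sum((a + b) / 2.0 * dt for a, b in zip(signal, signal[1:]))

def delta_auc(truth, est, dt=1.0):
    """Absolute difference in area under the curve (assumed definition of ΔAUC)."""
    return abs(auc(est, dt) - auc(truth, dt))

# Invented example: the corrected signal deviates only in the last sample.
truth = [1.0, 2.0, 3.0, 4.0]
corrected = [1.0, 2.0, 3.0, 5.0]
scores = {
    "R": pearson_r(truth, corrected),
    "RMSE": rmse(truth, corrected),
    "MAPE": mape(truth, corrected),
    "dAUC": delta_auc(truth, corrected),
}
```

Higher R and lower RMSE, MAPE, and ΔAUC indicate a corrected signal closer to the tapping-only reference condition.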
Project description:Respondent-driven sampling (RDS) is widely used for collecting data on hard-to-reach populations, including information about the structure of the networks connecting the individuals. Characterizing network features can be important for designing and evaluating health programs, particularly those that involve infectious disease transmission. While the validity of population proportions estimated from RDS-based datasets has been well studied, little is known about potential biases in inference about network structure from RDS. We developed a mathematical and statistical platform to simulate network structures with exponential random graph models, and to mimic the data generation mechanisms produced by RDS. We used this framework to characterize biases in three important network statistics - density/mean degree, homophily, and transitivity. Generalized linear models were used to predict the network statistics of the original network from the network statistics of the sample network and observable sample design features. We found that RDS may introduce significant biases in the estimation of density/mean degree and transitivity, and may exaggerate homophily when preferential recruitment occurs. Adjustments to network-generating statistics derived from the prediction models could substantially improve validity of simulated networks in terms of density, and could reduce bias in replicating mean degree, homophily, and transitivity from the original network.
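The network statistics studied above can be computed directly on a small undirected graph. The sketch below is an assumption-laden illustration: it uses the standard definitions of density, mean degree, and transitivity (closed triplets over all connected triplets), and a simple homophily measure (the fraction of edges joining nodes with the same attribute value), which may differ from the ERGM-based terms used in the study. The function name and example graph are hypothetical.

```python
from itertools import combinations

def network_stats(nodes, edges, attribute):
    """Density, mean degree, transitivity, and edge-level homophily of an undirected graph."""
    edge_set = {frozenset((u, v)) for u, v in edges if u != v}
    adj = {v: set() for v in nodes}
    for e in edge_set:
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
    n, m = len(nodes), len(edge_set)
    density = 2.0 * m / (n * (n - 1)) if n > 1 else 0.0
    mean_degree = 2.0 * m / n if n else 0.0
    # Count connected triplets centered on each node and how many are closed.
    triplets = closed = 0
    for v in nodes:
        for a, b in combinations(adj[v], 2):
            triplets += 1
            if b in adj[a]:
                closed += 1
    transitivity = closed / triplets if triplets else 0.0
    same = 0
    for e in edge_set:
        u, v = tuple(e)
        if attribute[u] == attribute[v]:
            same += 1
    homophily = same / m if m else 0.0
    return {"density": density, "mean_degree": mean_degree,
            "transitivity": transitivity, "homophily": homophily}

# Invented example: a triangle with one pendant node and a binary attribute.
nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
attribute = {0: "a", 1: "a", 2: "b", 3: "b"}
stats = network_stats(nodes, edges, attribute)
```

Computing the same statistics on both the full simulated network and its RDS sample gives the paired values that a prediction model like the one described could relate.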
Project description:The oculomotor nerve (OCN) is the main motor nerve innervating the eye muscles and can be involved in multiple inflammatory or compressive pathologies. Diffusion magnetic resonance imaging (dMRI) tractography is now widely used to describe the trajectory of the OCN. However, the complex cranial structure leads to difficulties in fiber orientation distribution (FOD) modeling, fiber tracking, and region of interest (ROI) selection. Currently, the identification of the OCN relies on expert manual operation, which carries high clinical, time, and labor costs. Thus, we propose a method that can automatically identify the OCN from dMRI tractography. First, we choose the multi-shell multi-tissue constrained spherical deconvolution (MSMT-CSD) FOD estimation model and deterministic tractography to describe the 3D trajectory of the OCN. Then, we rely on a well-established computational pipeline and anatomical expertise to create a data-driven OCN tractography atlas from 40 HCP subjects. We identify six clusters belonging to the OCN from the atlas, including structures with three kinds of positional relationships (pass between, pass through, and go around) with the red nuclei and two kinds of positional relationships with the medial longitudinal fasciculus. Finally, we apply the proposed OCN atlas to identify the OCN automatically in 40 new HCP subjects and two patients with brainstem cavernous malformation. In terms of spatial overlap and visualization, experimental results show that the automatically and manually identified OCN fibers are consistent. Our proposed OCN atlas provides an effective tool for identifying the OCN while avoiding the traditional ROI selection strategy.
Project description:The main purpose of the error correction model in heavy hexagonal quantum codes is to improve their reliability for quantum computing applications. Existing challenges include finding the optimal decoder for quantum error correction in heavy hexagonal codes. This research focuses on tailoring topological quantum error-correcting codes to the unique challenges posed by superconducting qubits in quantum computers. In response, this research uses deep learning, presenting a Humming Sparrow Optimization based self-adaptive deep CNN (HSO-based SADCNN) model designed for heavy hexagonal codes. This decoder incorporates a Self-adaptive Deep CNN (SADCNN) Noise Correction Module to refine error correction. The proposed decoder's efficacy is evaluated across varying code distances (three, five, and seven) using the Humming Sparrow Optimization (HSO) algorithm. HSO, designed to fine-tune the SADCNN decoder, significantly enhances its error correction capabilities for heavy hexagonal quantum codes. The algorithm integrates the advantageous herding and tracing characteristics of Humming Bird optimization and Sparrow search optimization, advancing the reliability of quantum computing applications that use heavy hexagonal quantum codes. At Training Percentage (TP) 90, the model achieves an accuracy of 97.35%, coupled with a reduced logical error probability and a diminished bit error rate of 5.51 and 3.72, respectively.
Project description:In the present study, a novel data-driven topological filtering technique is introduced to derive the backbone of functional brain networks relying on orthogonal minimal spanning trees (OMSTs). The method aims to identify the essential functional connections to ensure optimal information flow via the objective criterion of global efficiency minus the cost of surviving connections. The OMST technique was applied to multichannel, resting-state neuromagnetic recordings from four groups of participants: healthy adults (n = 50), adults who have suffered mild traumatic brain injury (n = 30), typically developing children (n = 27), and reading-disabled children (n = 25). Weighted interactions between network nodes (sensors) were computed using an integrated approach of dominant intrinsic coupling modes based on two alternative metrics (symbolic mutual information and phase lag index), resulting in excellent discrimination of individual cases according to their group membership. Classification results using OMST-derived functional networks were clearly superior to results using either relative power spectrum features or functional networks derived through the conventional minimal spanning tree algorithm.
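The filtering idea above can be sketched as follows: repeatedly extract a spanning tree from the edges not yet used (so successive trees are edge-disjoint, or "orthogonal"), accumulate them, and keep the accumulation that maximizes global efficiency minus cost. This is a simplified illustration, not the authors' implementation. It assumes positive weights where larger means stronger coupling, uses edge length 1/weight for shortest paths, and defines cost as the kept weight divided by the total weight; all function names are hypothetical.

```python
from itertools import combinations

def kruskal_max(n, edges):
    """Maximum-weight spanning forest over edges = [(weight, u, v), ...] (Kruskal)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges, reverse=True):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((w, u, v))
    return tree

def global_efficiency(n, edges):
    """Mean of 1/d(i, j) over node pairs, with edge length 1/weight (Floyd-Warshall)."""
    inf = float("inf")
    d = [[0.0 if i == j else inf for j in range(n)] for i in range(n)]
    for w, u, v in edges:
        d[u][v] = d[v][u] = min(d[u][v], 1.0 / w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    pairs = list(combinations(range(n), 2))
    return sum(1.0 / d[i][j] for i, j in pairs if d[i][j] < inf) / len(pairs)

def omst_filter(n, edges):
    """Accumulate edge-disjoint spanning trees; return the set maximizing GE - cost."""
    total = sum(w for w, _, _ in edges)
    remaining = list(edges)
    kept, best, best_score = [], [], float("-inf")
    while remaining:
        tree = kruskal_max(n, remaining)
        if len(tree) < n - 1:  # leftover edges no longer span all nodes
            break
        used = set(tree)
        remaining = [e for e in remaining if e not in used]
        kept = kept + tree
        cost = sum(w for w, _, _ in kept) / total
        score = global_efficiency(n, kept) - cost
        if score > best_score:
            best, best_score = list(kept), score
    return best, best_score

# Invented 4-node fully connected network with positive coupling weights.
edges = [(0.9, 0, 1), (0.8, 1, 2), (0.7, 2, 3), (0.2, 0, 2), (0.3, 1, 3), (0.1, 0, 3)]
backbone, score = omst_filter(4, edges)
```

The returned backbone always contains at least one full spanning tree when the input network is connected, so every sensor stays reachable after filtering.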