Project description:Biomedical literature is an essential source of biomedical evidence. To translate the evidence for biomedicine study, researchers often need to carefully read multiple articles about specific biomedical issues. These articles thus need to be highly related to each other. They should share similar core contents, including research goals, methods, and findings. However, given an article r, it is challenging for search engines to retrieve highly related articles for r. In this paper, we present a technique PBC (Passage-based Bibliographic Coupling) that estimates inter-article similarity by seamlessly integrating bibliographic coupling with the information collected from context passages around important out-link citations (references) in each article. Empirical evaluation shows that PBC can significantly improve the retrieval of those articles that biomedical experts believe to be highly related to specific articles about gene-disease associations. PBC can thus be used to improve search engines in retrieving the highly related articles for any given article r, even when r is cited by very few (or even no) articles. The contribution is essential for those researchers and text mining systems that aim at cross-validating the evidence about specific gene-disease associations.
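As a rough illustration of the bibliographic coupling baseline that PBC builds on, the sketch below scores two articles by the references they share. It is not the PBC method itself: the passage-based weighting is not specified in the abstract, and the function names, the Jaccard normalisation, and the reference identifiers are all assumptions made for this example.

```python
def coupling_strength(refs_a, refs_b):
    """Count the references that two articles cite in common."""
    return len(set(refs_a) & set(refs_b))


def coupling_similarity(refs_a, refs_b):
    """Shared references divided by all distinct references (Jaccard index).

    The normalisation is an assumption of this sketch, not taken from the paper.
    """
    a, b = set(refs_a), set(refs_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical out-link reference lists for an article r and a candidate s.
article_r = ["ref1", "ref2", "ref3", "ref4"]
article_s = ["ref2", "ref3", "ref5"]

print(coupling_strength(article_r, article_s))    # 2 shared references
print(coupling_similarity(article_r, article_s))  # 2 shared out of 5 distinct
```

Because the score uses only the references an article cites, it can be computed even when the article itself has not yet been cited by anyone.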
Project description:Classification schemes for scientific activity and publications underpin a large swath of research evaluation practices at the organizational, governmental, and national levels. Several research classifications are currently in use, and they require continuous work as new classification techniques become available and as new research topics emerge. Convolutional neural networks, a subset of "deep learning" approaches, have recently offered novel and highly performant methods for classifying voluminous corpora of text. This article benchmarks a deep learning classification technique on more than 40 million scientific articles and on tens of thousands of scholarly journals. The comparison is performed against bibliographic coupling-, direct citation-, and manual-based classifications, the established and most widely used approaches in the field of bibliometrics and, by extension, in many science and innovation policy activities such as grant competition management. The results reveal that the performance of this first iteration of a deep learning approach is equivalent to the graph-based bibliometric approaches. All methods presented are also on par with manual classification. Somewhat surprisingly, no machine learning approaches were found to clearly outperform the simple label propagation approach that is direct citation. In conclusion, deep learning is promising because it performed just as well as the other approaches but has more flexibility to be further improved. For example, a deep neural network incorporating information from the citation network is likely to hold the key to an even better classification algorithm.
Project description:Bilingual dictionaries for technical terms such as biomedical terms are an important resource for machine translation systems as well as for humans who would like to understand a concept described in a foreign language. Often a biomedical term is first proposed in English and later it is manually translated to other languages. Despite the fact that there are large monolingual lexicons of biomedical terms, only a fraction of those term lexicons are translated to other languages. Manually compiling large-scale bilingual dictionaries for technical domains is a challenging task because it is difficult to find a sufficiently large number of bilingual experts. We propose a cross-lingual similarity measure for detecting most similar translation candidates for a biomedical term specified in one language (source) from another language (target). Specifically, a biomedical term in a language is represented using two types of features: (a) intrinsic features that consist of character n-grams extracted from the term under consideration, and (b) extrinsic features that consist of unigrams and bigrams extracted from the contextual windows surrounding the term under consideration. We propose a cross-lingual similarity measure using each of those feature types. First, to reduce the dimensionality of the feature space in each language, we propose prototype vector projection (PVP), a non-negative lower-dimensional vector projection method. Second, we propose a method to learn a mapping between the feature spaces in the source and target language using partial least squares regression (PLSR). The proposed method requires only a small number of training instances to learn a cross-lingual similarity measure. The proposed PVP method outperforms popular dimensionality reduction methods such as the singular value decomposition (SVD) and non-negative matrix factorization (NMF) in a nearest neighbor prediction task. Moreover, our experimental results covering several language pairs such as English-French, English-Spanish, English-Greek, and English-Japanese show that the proposed method outperforms several other feature projection methods in biomedical term translation prediction tasks.
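To make the "intrinsic features" concrete, the following minimal sketch extracts character n-grams from a term and compares two terms by cosine similarity. It does not implement PVP or PLSR. The choice of n = 3, the `^` and `$` boundary markers, the function names, and the example terms are assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt


def char_ngrams(term, n=3):
    """Bag of character n-grams for a term, with assumed boundary markers."""
    padded = f"^{term.lower()}$"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


# Hypothetical terms: an English term, a French cognate, and an unrelated term.
english = char_ngrams("hepatitis")
french = char_ngrams("hépatite")
unrelated = char_ngrams("diabetes")

print(cosine(english, french))     # positive: the trigrams "pat", "ati", "tit" are shared
print(cosine(english, unrelated))  # 0.0: no trigram in common
```

Direct overlap of this kind is only informative for cognates written in the same script. For a pair such as English-Japanese, the two feature spaces share almost no n-grams, so some mapping between the spaces, such as the learned mapping the abstract describes, is needed.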
Project description:Study design: Bibliographic analysis. Objective: The aim of this study is to identify the most cited studies on lumbar spondylolisthesis and report their impact in the spine field. Methods: Thomson Reuters Web of Science-Science Citation Index Expanded was searched using the title-specific search "spondylolisthesis." All studies published in the English language between 1900 and 2019 were included with no restrictions. The top 100 cited articles were identified using "Times cited," arranging articles from high to low according to citation count. Further analysis was made to obtain the following items: article title, author's name and specialty, country of origin, institution, journal of publication, year of publication, number of citations, and study design. Results: The citation count of the top 100 articles ranged from 68 to 589. All were published between 1932 and 2016. Among 20 journals, Spine had the highest number of articles (49), with 6155 citations out of 13 618. Second ranked was Journal of Bone and Joint Surgery with 15 articles and 3023 total citations. With regard to the primary author's specialty, orthopedic surgeons contributed the majority of the top 100 list with 82 articles, and neurosurgery was the second specialty with 11 articles. The United States produced more than half of the list with 59 articles. England was the second country with 7 articles. Surgical management of degenerative lumbar spondylolisthesis was the most commonly discussed topic. Conclusion: This article identifies the top 100 influential articles on lumbar spondylolisthesis and recognizes an important aspect of knowledge evolution served by leading researchers as they guided today's clinical decision making in spondylolisthesis.
Project description:Mitochondrial inner membrane potentials in cardiomyocytes may oscillate in cycles of depolarization/repolarization when the mitochondrial network is exposed to metabolic or oxidative stress. The frequencies of such oscillations are dynamically changing while clusters of weakly coupled mitochondrial oscillators adjust to a common phase and frequency. Across the cardiac myocyte, the averaged signal of the mitochondrial population follows self-similar or fractal dynamics; however, fractal properties of individual mitochondrial oscillators have not yet been examined. We show that the largest synchronously oscillating cluster exhibits a fractal dimension, D, that is indicative of self-similar behavior with D=1.27±0.11, in contrast to the remaining network mitochondria whose fractal dimension is close to that of Brownian noise, D=1.58±0.10. We further demonstrate that fractal behavior is correlated with local coupling mechanisms, whereas it is only weakly linked to measures of functional connections between mitochondria. Our findings suggest that individual mitochondrial fractal dimensions may serve as a simple measure of local mitochondrial coupling.
Project description:Objective: Global neurosurgery is the practice of neurosurgery with the primary purpose of delivering timely, safe, and affordable neurosurgical care to all who need it. The aim of this study is to identify the most frequently cited articles in global neurosurgery through a bibliographic review to characterize articles and trends around this growing topic. Methods: The most-cited articles in global neurosurgery were determined by searching the Web of Science database using a priori search terms. Articles with at least 5 citations were selected, and there were no time period or language restrictions. The data were extracted from each included article and all characteristics were summarized. Results: A total of 932 articles were identified using the search terms; 69 articles fulfilled inclusion criteria and 17 articles were selected that had more than 5 citations. The articles' number of citations ranged from 6 to 98 for the most-cited article. Authors from, or affiliated with, 14 countries contributed to the 17 articles, and the country that had the greatest representation was the United States. The main topic discussed was surgical capacity, the second topic was the treatment of different neurosurgical conditions, and volunteerism was the third topic. Conclusions: There is currently a deficit in both the amount of literature surrounding the topic of global neurosurgery and how much that literature is cited. Developing innovative ways to increase academic productivity within, or in collaboration with, low-middle income countries is essential to contribute to global neurosurgery.
Project description:Referencing scholarly documents as information sources on Wikipedia is important because it supports or improves the quality of Wikipedia content. Several studies have been conducted regarding scholarly references on Wikipedia; however, little is known about the editors who add scholarly references to Wikipedia or about the edits through which they do so. In this study, we develop a methodology to detect the oldest scholarly reference added to Wikipedia articles by which a certain paper is uniquely identifiable as the "first appearance of the scholarly reference." We identified the first appearances of 923,894 scholarly references (611,119 unique DOIs) in 180,795 unique pages on English Wikipedia as of March 1, 2017 and stored them in the dataset. Moreover, we assessed the precision of the dataset, which was high regardless of the research field. Finally, we demonstrate the potential of our dataset. This dataset is unique and should interest those who want to know how the scholarly references on Wikipedia grew and which editors added them.
Project description:Background: Similarity-based retrieval of Electronic Health Records (EHRs) from large clinical information systems provides physicians with supporting evidence for making diagnoses or referring suspected cases for examination. Clinical terms in EHRs represent high-level conceptual information, and the similarity measure established based on these terms reflects the chance of inter-patient disease co-occurrence. The assumption that clinical terms are equally relevant to a disease is unrealistic and reduces the prediction accuracy. Here we propose a term weighting approach supported by the PubMed search engine to address this issue. Methods: We collected and studied 112 abdominal computed tomography imaging examination reports from four hospitals in Hong Kong. Clinical terms, which are the image findings related to hepatocellular carcinoma (HCC), were extracted from the reports. Through two systematic PubMed search methods, the generic and specific term weightings were established by estimating the conditional probabilities of clinical terms given HCC. Each report was characterized by an ontological feature vector, and there were 6216 vector pairs in total. We optimized the modified direction cosine (mDC) with respect to a regularization constant embedded into the feature vector. Equal, generic and specific term weighting approaches were applied to measure the similarity of each pair, and their performances for predicting inter-patient co-occurrence of HCC diagnoses were compared by using Receiver Operating Characteristics (ROC) analysis. Results: The areas under the ROC curves (AUROCs) of similarity scores based on equal, generic and specific term weighting approaches were 0.735, 0.728 and 0.743 respectively (p < 0.01). In comparison with equal term weighting, the performance was significantly improved by specific term weighting (p < 0.01) but not by generic term weighting. The clinical terms "Dysplastic nodule", "nodule of liver" and "equal density (isodense) lesion" were found to be the top three image findings associated with HCC in PubMed. Conclusions: Our findings suggest that the optimized similarity measure with specific term weighting applied to EHRs can significantly improve the accuracy of predicting the inter-patient co-occurrence of diagnosis when compared with equal and generic term weighting approaches.
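The figure of 6216 vector pairs follows from pairing each of the 112 reports with every other report once, which the sketch below checks. It also shows a plain term-weighted cosine as a stand-in for the similarity step. This is not the modified direction cosine (mDC) of the study, whose definition and regularization constant are not given in the abstract, and the feature vectors and weights are hypothetical.

```python
from itertools import combinations
from math import comb, sqrt

# Unordered pairs among 112 reports: 112 * 111 / 2.
n_reports = 112
print(comb(n_reports, 2))  # 6216


def weighted_cosine(x, y, w):
    """Cosine similarity in which term i contributes with weight w[i].

    A generic weighted cosine, used here only as an assumed stand-in for mDC.
    """
    dot = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    norm_x = sqrt(sum(wi * xi * xi for wi, xi in zip(w, x)))
    norm_y = sqrt(sum(wi * yi * yi for wi, yi in zip(w, y)))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0


# Hypothetical binary term vectors for two reports (1 = finding present).
report_a = [1, 1, 0]
report_b = [1, 0, 1]

equal_weights = [1.0, 1.0, 1.0]
specific_weights = [3.0, 1.0, 1.0]  # hypothetical: the first term is more HCC-related

print(weighted_cosine(report_a, report_b, equal_weights))     # 0.5
print(weighted_cosine(report_a, report_b, specific_weights))  # 0.75
```

With these made-up numbers, giving more weight to the shared term raises the similarity of the pair, which is the kind of effect term weighting is meant to have.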
Project description:Molecular biology and literature databases represent essential infrastructure for life science research. Effective integration of these data resources requires that there are structured cross-references at the level of individual articles and biological records. Here, we describe the current patterns of how database entries are cited in research articles, based on analysis of the full text Open Access articles available from Europe PMC. Focusing on citation of entries in the European Nucleotide Archive (ENA), UniProt and Protein Data Bank, Europe (PDBe), we demonstrate that text mining doubles the number of structured annotations of database record citations supplied in journal articles by publishers. Many thousands of new literature-database relationships are found by text mining, since these relationships are also not present in the set of articles cited by database records. We recommend that structured annotation of database records in articles is extended to other databases, such as ArrayExpress and Pfam, entries from which are also cited widely in the literature. The very high precision and high-throughput of this text-mining pipeline makes this activity possible both accurately and at low cost, which will allow the development of new integrated data services.