Hybrid RGSA and Support Vector Machine Framework for Three-Dimensional Magnetic Resonance Brain Tumor Classification.
ABSTRACT: This paper presents a novel hybrid approach for identifying, from magnetic resonance images, the brain regions accountable for brain tumor. Classification of medical images is important in both clinical and research areas. The magnetic resonance imaging (MRI) modality excels at diagnosing brain abnormalities such as brain tumor, multiple sclerosis, and hemorrhage, among others. The primary objective of this work is to propose a novel three-dimensional (3D) brain tumor classification model that uses MRI images with both micro- and macroscale textures to differentiate brain MRI into two classes of lesion, benign and malignant. The images were first preprocessed using a 3D Gaussian filter. Based on the VOI (volume of interest) of the image, features were extracted using the 3D volumetric Square Centroid Lines Gray Level Distribution Method (SCLGM) along with 3D run length and cooccurrence matrix. The optimal features are selected using the proposed refined gravitational search algorithm (RGSA). Support vector machines, compared against a backpropagation network and k-nearest neighbor, are used to evaluate the goodness of the classifier approach. The preliminary evaluation of the system is performed using 320 real-time brain MRI images. The system is trained and tested using a leave-one-case-out method. The performance of the classifier is assessed using the receiver operating characteristic curve, with a reported value of 0.986 (±002). The experimental results demonstrate that the systematic and efficient feature extraction and feature selection algorithms contribute to the performance of state-of-the-art feature classification methods.
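The leave-one-out evaluation mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it uses a k-nearest neighbor classifier (one of the comparison classifiers named above) on hypothetical two-element feature vectors, and it assumes each "case" is a single sample. If a case contributes several images, all of them would need to be held out together.

```python
import math

def knn_predict(train, query, k=1):
    """Return the majority label among the k training samples closest to query."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

def leave_one_out_accuracy(samples, k=1):
    """Hold out each sample once, train on the rest, and score the prediction."""
    hits = 0
    for i, (features, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]  # every sample except the held-out one
        hits += knn_predict(train, features, k) == label
    return hits / len(samples)

# Hypothetical feature vectors (illustrative values only, not data from the paper).
samples = [
    ((0.10, 0.20), "benign"),
    ((0.20, 0.10), "benign"),
    ((0.15, 0.25), "benign"),
    ((0.90, 0.80), "malignant"),
    ((0.80, 0.90), "malignant"),
    ((0.85, 0.75), "malignant"),
]
loo_accuracy = leave_one_out_accuracy(samples, k=1)
print(f"leave-one-out accuracy: {loo_accuracy:.2f}")
```

The same loop works with any classifier that can be retrained per fold; only `knn_predict` would change.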
Project description:Automatic feature extraction and classification are two main tasks in abnormal ECG beat recognition. Feature extraction is an important prerequisite to classification since it provides the classifier with input features, and the performance of the classifier depends significantly on the quality of these features. This study develops an effective method to extract low-dimensional ECG beat feature vectors. It employs wavelet multi-resolution analysis to extract time-frequency domain features and then applies principal component analysis to reduce the dimension of the feature vector. In classification, 12-element feature vectors characterizing six types of beats are used as inputs for a one-versus-one support vector machine, which is evaluated with 10-fold cross validation under beat-based and record-based training schemes. Tested upon a total of 107049 beats from the MIT-BIH arrhythmia database, our method has achieved average sensitivity, specificity and accuracy of 99.09%, 99.82% and 99.70%, respectively, using the beat-based training scheme, and 44.40%, 88.88% and 81.47%, respectively, using the record-based training scheme.
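For reference, sensitivity, specificity and accuracy for one beat class can be computed from one-versus-rest confusion counts, as in the sketch below. The class labels ("N", "V", "A") and the predictions are hypothetical, and treating the reported "average" figures as per-class averages is an assumption rather than something the abstract states.

```python
def one_vs_rest_counts(y_true, y_pred, positive):
    """Count TP, FN, TN, FP, treating `positive` as the positive class."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, tn, fp

def binary_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion counts."""
    sensitivity = tp / (tp + fn)   # fraction of positive beats detected
    specificity = tn / (tn + fp)   # fraction of other beats correctly rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Hypothetical beat labels and predictions (illustrative only).
y_true = ["N", "N", "N", "V", "V", "A"]
y_pred = ["N", "N", "V", "V", "V", "N"]

sens_v, spec_v, acc_v = binary_metrics(*one_vs_rest_counts(y_true, y_pred, "V"))
print(f"class V: sensitivity={sens_v:.2f} specificity={spec_v:.2f} accuracy={acc_v:.2f}")
```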
Project description:Structural brain alterations have been repeatedly reported in schizophrenia; however, the pathophysiology of its alterations remains unclear. Multivariate pattern recognition analysis such as support vector machines can classify patients and healthy controls by detecting subtle and spatially distributed patterns of structural alterations. We aimed to use a support vector machine to distinguish patients with schizophrenia from control participants on the basis of structural magnetic resonance imaging data and delineate the patterns of structural alterations that significantly contributed to the classification performance. We used independent datasets from different sites with different magnetic resonance imaging scanners, protocols and clinical characteristics of the patient group to achieve a more accurate estimate of the classification performance of support vector machines. We developed a support vector machine classifier using the dataset from one site (101 participants) and evaluated the performance of the trained support vector machine using a dataset from the other site (97 participants) and vice versa. We assessed the performance of the trained support vector machines in each support vector machine classifier. Both support vector machine classifiers attained a classification accuracy of >70% with two independent datasets indicating a consistently high performance of support vector machines even when used to classify data from different sites, scanners and different acquisition protocols. The regions contributing to the classification accuracy included the bilateral medial frontal cortex, superior temporal cortex, insula, occipital cortex, cerebellum, and thalamus, which have been reported to be related to the pathogenesis of schizophrenia. 
These results indicated that the support vector machine could detect subtle structural brain alterations and might aid our understanding of the pathophysiology of these changes in schizophrenia, which could be one of the diagnostic findings of schizophrenia.
Project description:Background: Synthesizing and characterizing aptamers with high affinity and specificity have been extensively carried out for analytical and biomedical applications. Few publications can be found that describe structure-activity relationships (SARs) of candidate aptamer sequences. Methodology: This paper reports pattern recognition with support vector machine (SVM) classification techniques for the identification of streptavidin-binding aptamers as "low" or "high" affinity aptamers. The SVM parameters C and γ were optimized using genetic algorithms. Four descriptors, the topological descriptor PW4 (path/walk 4--Randic shape index), the connectivity index X3A (average connectivity index chi-3), the topological charge index JGI2 (mean topological charge index of order 2), and the free energy E of the secondary structure, were used to describe the structures of candidate aptamer sequences from SELEX selection (Schütze et al. (2011) PLoS ONE (12):e29604). Conclusions: The predicted fractions of winning streptavidin-binding aptamers for ten rounds of SELEX conform to the aptamer evolutionary principles of SELEX-based screening. The feasibility of applying pattern recognition based on SVM and genetic algorithms for streptavidin-binding aptamers has been demonstrated.
Project description:It remains a great challenge to achieve sufficient cancer classification accuracy with the entire set of genes, due to the high dimensionality, small sample size, and high noise of gene expression data. We thus propose a hybrid gene selection method, Information Gain-Support Vector Machine (IG-SVM), in this study. IG was initially employed to filter irrelevant and redundant genes. Then, further removal of redundant genes was performed using SVM to eliminate the noise in the datasets more effectively. Finally, the informative genes selected by IG-SVM served as the input for the LIBSVM classifier. Compared to other related algorithms, IG-SVM showed the highest classification accuracy and superior performance as evaluated using five cancer gene expression datasets based on a few selected genes. As an example, IG-SVM achieved a classification accuracy of 90.32% for colon cancer, which is difficult to classify accurately, based on only three genes: CSRP1, MYL9, and GUCA2B.
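The information gain filter in the first stage can be illustrated with a short sketch. It assumes expression values have already been discretized into levels such as "high" and "low" (the abstract does not say how continuous values are handled), and the gene names and values are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting samples by feature value."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

# Hypothetical discretized expression levels for two genes across four samples.
labels = ["tumor", "tumor", "normal", "normal"]
genes = {
    "gene_a": ["high", "high", "low", "low"],   # perfectly separates the classes
    "gene_b": ["high", "low", "high", "low"],   # carries no class information
}

# Rank genes from most to least informative.
ranking = sorted(genes, key=lambda g: information_gain(genes[g], labels), reverse=True)
print(ranking)
```

Genes with low information gain would be dropped before the SVM-based stage sees the data.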
Project description:Repetitive TMS (rTMS) allows for non-invasive and transient disruption of local neuronal functioning. We used machine learning approaches to assess whether brain tumor patients can be accurately classified into aphasic and non-aphasic groups using their rTMS language mapping results as input features. Given that each tumor affects the subject-specific language networks differently, resulting in heterogeneous rTMS functional mappings, we propose the use of machine learning strategies to classify potential patterns of rTMS language mapping results. We retrospectively included 90 patients with left perisylvian World Health Organization (WHO) grade II-IV gliomas who underwent presurgical navigated rTMS language mapping. Within our cohort, 29 of 90 (32.2%) patients suffered from at least mild aphasia as shown in the Aachen Aphasia Test based Berlin Aphasia Score (BAS). After spatial normalization to MNI 152 of all rTMS spots, we calculated the error rate (ER) in each stimulated cortical area (28 regions of interest, ROI) by automated anatomical labeling parcellation (AAL3) and IIT. We used a support vector machine (SVM) to classify significant areas in relation to aphasia. After feeding the ROIs into the SVM model, it revealed that in addition to age (w = 2.98), the ERs of the left supramarginal gyrus (w = 3.64), left inferior parietal gyrus (w = 2.28) and right pars triangularis (w = 1.34) contributed more than other features to the model. The model's sensitivity was 86.2%, the specificity was 82.0%, the overall accuracy was 85.5% and the AUC was 89.3%. Our results demonstrate an increased vulnerability of the right inferior pars triangularis to rTMS in aphasic patients due to left perisylvian gliomas. This finding points towards a functionally relevant involvement of the right pars triangularis in response to aphasia. The tumor location feature, specified by calculating overlaps with white and grey matter atlases, did not affect the SVM model.
The left supramarginal gyrus as a feature improved our SVM model the most. Additionally, our results could point towards a decreasing potential for neuroplasticity with age.
Project description:We develop methods to accurately predict whether pre-symptomatic individuals are at risk of a disease based on their various marker profiles, which offers an opportunity for early intervention well before definitive clinical diagnosis. For many diseases, existing clinical literature may suggest the risk of disease varies with some markers of biological and etiological importance, for example age. To identify effective prediction rules using nonparametric decision functions, standard statistical learning approaches treat markers with clear biological importance (e.g., age) and other markers without prior knowledge on disease etiology interchangeably as input variables. Therefore, these approaches may be inadequate in singling out and preserving the effects from the biologically important variables, especially in the presence of potential noise markers. Using age as an example of a salient marker to receive special care in the analysis, we propose a local smoothing large margin classifier implemented with support vector machine (SVM) to construct effective age-dependent classification rules. The method adaptively adjusts age effect and separately tunes age and other markers to achieve optimal performance. We derive the asymptotic risk bound of the local smoothing SVM, and perform extensive simulation studies to compare with standard approaches. We apply the proposed method to two studies of premanifest Huntington's disease (HD) subjects and controls to construct age-sensitive predictive scores for the risk of HD and risk of receiving HD diagnosis during the study period.
Project description:This work is motivated by the needs of predictive analytics on healthcare data as represented by Electronic Medical Records. Such data is invariably problematic: noisy, with missing entries and with imbalance in the classes of interest, leading to serious bias in predictive modeling. Since standard data mining methods often produce poor performance measures, we argue for the development of specialized techniques of data preprocessing and classification. In this paper, we propose a new method to simultaneously classify large datasets and reduce the effects of missing values. It is based on a multilevel framework of the cost-sensitive SVM and the expectation-maximization imputation method for missing values, which relies on iterated regression analyses. We compare classification results of multilevel SVM-based algorithms on public benchmark datasets with imbalanced classes and missing values as well as real data in health applications, and show that our multilevel SVM-based method produces fast, more accurate, and more robust classification results.
Project description:This paper formulates a support vector machine with quantile hyper-spheres (QHSVM) for pattern classification. The idea of QHSVM is to build two quantile hyper-spheres with the same center for positive or negative training samples. Every quantile hyper-sphere is constructed by using pinball loss instead of hinge loss, which makes the new classification model insensitive to noise, especially feature noise around the decision boundary. Moreover, the robustness and generalization of QHSVM are strengthened by maximizing the margin between the two quantile hyper-spheres, maximizing the inner-class clustering of samples, and optimizing an independent quadratic program for each target class. In addition, this paper proposes a novel local center-based density estimation method. Based on it, ?-QHSVM with surrounding and clustering samples is given. Under the premise of high accuracy, the execution speed of ?-QHSVM can be adjusted. The experimental results on artificial, benchmark and strip steel surface defect datasets show that the QHSVM model has distinct advantages in accuracy and that the ?-QHSVM model is fit for large-scale datasets.
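The difference between hinge loss and pinball loss can be shown with a small sketch. It uses the pinball loss as commonly written for margin-based classifiers, with `margin` standing for y·f(x) and `tau` for the quantile parameter; both names are illustrative, and the exact hyper-sphere formulation used by QHSVM may differ.

```python
def hinge_loss(margin):
    """Hinge loss: zero once the sample is beyond the margin (margin >= 1)."""
    return max(0.0, 1.0 - margin)

def pinball_loss(margin, tau=0.5):
    """Pinball loss: also penalizes samples far beyond the margin, scaled by tau."""
    u = 1.0 - margin
    return u if u >= 0 else -tau * u

# Compare the two losses at a few margins (illustrative values).
for margin in (-1.0, 0.0, 1.0, 2.0):
    print(margin, hinge_loss(margin), pinball_loss(margin, tau=0.5))
```

Because the pinball loss stays nonzero for samples well inside their class, the resulting boundary depends on a quantile of the sample distances rather than only on the few samples nearest the boundary, which is the usual argument for its robustness to feature noise.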
Project description:The vast amount and diversity of the content shared on social media can pose a challenge for any business wanting to use it to identify potential customers. In this paper, our aim is to investigate the use of both unsupervised and supervised learning methods for target audience classification on Twitter with minimal annotation efforts. Topic domains were automatically discovered from contents shared by followers of an account owner using Twitter Latent Dirichlet Allocation (LDA). A Support Vector Machine (SVM) ensemble was then trained using contents from different account owners of the various topic domains identified by Twitter LDA. Experimental results show that the methods presented are able to successfully identify a target audience with high accuracy. In addition, we show that using a statistical inference approach such as bootstrapping in over-sampling, instead of using random sampling, to construct training datasets can achieve a better classifier in an SVM ensemble. We conclude that such an ensemble system can take advantage of data diversity, which enables real-world applications for differentiating prospective customers from the general audience, leading to business advantage in the crowded social media space.
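One way to read "bootstrapping in over-sampling" is drawing minority-class examples with replacement until each ensemble member has a balanced training set. The sketch below follows that reading, which is an assumption; the function name, the seed handling, and the toy documents are hypothetical and not taken from the paper.

```python
import random

def bootstrap_oversample(minority, target_size, seed=0):
    """Draw `target_size` samples with replacement from the minority class."""
    rng = random.Random(seed)
    return [rng.choice(minority) for _ in range(target_size)]

# Hypothetical documents: few positives (target audience), many negatives.
positives = ["doc_p1", "doc_p2", "doc_p3"]
negatives = [f"doc_n{i}" for i in range(10)]

# Each ensemble member gets its own bootstrap sample, so members see different data.
training_sets = [
    bootstrap_oversample(positives, len(negatives), seed=member) + negatives
    for member in range(3)
]
print([len(ts) for ts in training_sets])
```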
Project description:We present the novel prokaryotic gene finder GISMO, which combines searches for protein family domains with composition-based classification based on a support vector machine. GISMO is highly accurate, exhibiting high sensitivity and specificity in gene identification. We found that it performs well for complete prokaryotic chromosomes, irrespective of their GC content, and also for plasmids as short as 10 kb, for short genes, and for genes with atypical sequence composition. Using GISMO, we found several thousand new predictions for the published genomes that are supported by extrinsic evidence, which strongly suggests that these are very likely biologically active genes. The source code for GISMO is freely available under the GPL license.