Improving radiotherapy quality assurance in clinical trials: assessment of target volume delineation of the pre-accrual benchmark case.
ABSTRACT: As the complexity of radiotherapy (RT) trials increases, issues surrounding target volume delineation will become more important. Some form of outlining assessment prior to trial entry is increasingly being mandated in UK RT trials. This document, produced by the Outlining and Imaging Subgroup (OISG) of the National Cancer Research Institute, addresses methods to reduce interobserver variation in clinical trials and describes how to conduct an assessment of outlining through a pre-accrual benchmark case. We review currently available methods of describing the variation and identify areas where further work is needed. The OISG encourages ongoing discussion with chief investigators in order to provide advice on individual aspects of benchmark case assessment for current and future trials.
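The abstract refers to methods of describing interobserver variation without naming them. As a minimal sketch, assuming each outline is represented as a set of voxel indices on a common CT grid, two commonly reported overlap measures are the Dice similarity coefficient for a pair of outlines and a generalised conformity index over all observer pairs (sum of pairwise intersections divided by sum of pairwise unions). The function names and the example voxels are hypothetical.

```python
from itertools import combinations


def dice(a, b):
    """Dice similarity coefficient between two outlines given as voxel sets."""
    if not a and not b:
        return 1.0  # convention: two empty outlines count as identical
    return 2 * len(a & b) / (len(a) + len(b))


def generalised_conformity_index(outlines):
    """Sum of pairwise intersections divided by sum of pairwise unions."""
    if len(outlines) < 2:
        raise ValueError("need at least two outlines")
    pairs = list(combinations(outlines, 2))
    intersections = sum(len(a & b) for a, b in pairs)
    unions = sum(len(a | b) for a, b in pairs)
    if unions == 0:
        raise ValueError("all outlines are empty")
    return intersections / unions


# Hypothetical outlines from three observers, as (slice, row, column) voxel indices.
obs_a = {(10, 5, 5), (10, 5, 6), (10, 6, 5), (10, 6, 6)}
obs_b = {(10, 6, 5), (10, 6, 6), (10, 7, 5), (10, 7, 6)}
obs_c = set(obs_a)
print(dice(obs_a, obs_b))  # 0.5
print(generalised_conformity_index([obs_a, obs_b, obs_c]))  # 0.5
```

Dice compares one pair at a time, whereas the pooled conformity index summarises agreement across every submitted outline at once, which suits a multi-centre benchmark case.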
Project description: SCOPE 1 was the first UK-based multi-centre trial involving radiotherapy of the oesophagus. A comprehensive radiotherapy trials quality assurance (QA) programme was launched with two main aims: (1) to assist centres, where needed, to adapt their radiotherapy techniques in order to achieve protocol compliance and thereby enable their participation in the trial; and (2) to support the trial's clinical outcomes by ensuring consistent planning and delivery of radiotherapy across all participating centres. A detailed information package was provided, and before recruiting patients into the trial, centres were required to complete a benchmark case in which the delineated target volumes and organs at risk (OARs), the dose distribution and the completed plan assessment form were assessed. Once recruitment began, the QA programme continued to monitor the outlining and planning of radiotherapy treatments. Centres were asked to complete a questionnaire to gather information about their equipment and techniques relating to trial participation and to assess the impact of the trial nationally on standard practice for radiotherapy of the oesophagus. During the trial, advice was available for individual planning issues and was circulated amongst the SCOPE 1 community through bulletins in response to common areas of concern. In total, 36 centres were supported through QA processes to enable their participation in SCOPE 1. We discuss the issues which arose throughout this process and present details of the benchmark case solutions, centre questionnaires and on-trial protocol compliance. The submitted benchmark case gross tumour volumes (GTVs) ranged from 29.8 to 67.8 cm³ and the planning target volumes (PTVs) from 221.9 to 513.3 cm³. For the dose distributions associated with these volumes, the percentage volume of the lungs receiving at least 20 Gy (V20Gy) ranged from 20.4% to 33.5%. Similarly, heart V40Gy ranged from 16.1% to 33.0%.
The incidence of incorrect outlining of OAR volumes increased from 50% of centres at the benchmark case to 64% on trial. Sixty-five percent of the centres that returned the trial questionnaire stated that their standard practice had changed as a result of their participation in the SCOPE 1 trial. The SCOPE 1 QA programme outcomes lend support to the trial's clinical conclusions. The range of planning outcomes for the benchmark case indicated, at the outset of the trial, the significant degree of variation present in UK oesophageal radiotherapy planning, despite the presence of a protocol. This supports the case for increasingly detailed definition of practice by means of consensus protocols, training and peer review. The incidence of minor inconsistencies of technique highlights the potential for improved QA systems and the need for sufficient resources to address this within future trials. As indicated in questionnaire responses, the QA exercise as a whole has contributed to greater consistency of oesophageal radiotherapy in the UK through the adoption of elements of the protocol into standard practice. The SCOPE 1 trial is registered as International Standard Randomized Controlled Trial ISRCTN47718479.
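Dose-volume metrics such as lung V20Gy and heart V40Gy are the percentage of a structure's volume receiving at least the stated dose. A minimal sketch, assuming the dose has already been sampled in each voxel of the structure (for example both lungs combined); the function name and the example doses are hypothetical, and clinical planning systems compute this from their own dose grids and structure sets.

```python
def percent_volume_at_least(doses_gy, threshold_gy, voxel_volumes_cm3=None):
    """V_x: percentage of a structure's volume receiving at least threshold_gy.

    doses_gy: dose in each voxel of the structure.
    voxel_volumes_cm3: optional per-voxel volumes; equal volumes assumed if omitted.
    """
    if voxel_volumes_cm3 is None:
        voxel_volumes_cm3 = [1.0] * len(doses_gy)
    if len(voxel_volumes_cm3) != len(doses_gy):
        raise ValueError("need one volume per voxel")
    total = sum(voxel_volumes_cm3)
    covered = sum(v for d, v in zip(doses_gy, voxel_volumes_cm3) if d >= threshold_gy)
    return 100.0 * covered / total


# Hypothetical voxel doses (Gy) for a tiny structure with equal voxel volumes.
doses = [5.0, 15.0, 20.0, 25.0, 40.0]
print(percent_volume_at_least(doses, 20.0))  # 60.0  (V20Gy)
print(percent_volume_at_least(doses, 40.0))  # 20.0  (V40Gy)
```

Passing explicit voxel volumes matters when a structure is sampled on a non-uniform grid or when partial-volume voxels are weighted.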
Project description: We updated our protein-protein docking benchmark to include complexes that have become available since our previous release. As before, we considered only high-resolution complex structures that are nonredundant at the family-family pair level and for which the X-ray or NMR unbound structures of the constituent proteins are also available. Benchmark 4.0 adds 52 new complexes to the 124 cases of Benchmark 3.0, an increase of 42%. Benchmark 4.0 therefore provides 176 unbound-unbound cases that can be used for protein-protein docking method development and assessment. Seventeen of the newly added cases are enzyme-inhibitor complexes; we found no new antigen-antibody complexes. Classifying the new cases by expected difficulty for protein-protein docking algorithms gives 33 rigid-body cases, 11 cases of medium difficulty and 8 difficult cases. Benchmark 4.0 listings and processed structure files are publicly accessible at http://zlab.umassmed.edu/benchmark/.
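The case counts in this abstract are internally consistent. A quick check of the arithmetic, using only the numbers the abstract states (the variable names are illustrative):

```python
# Counts stated in the abstract.
benchmark3_cases = 124
new_cases = 52
new_by_difficulty = {"rigid-body": 33, "medium": 11, "difficult": 8}

total_cases = benchmark3_cases + new_cases           # unbound-unbound cases in 4.0
increase_pct = 100 * new_cases / benchmark3_cases    # relative growth over 3.0

assert sum(new_by_difficulty.values()) == new_cases  # 33 + 11 + 8 = 52
print(total_cases, round(increase_pct, 1))  # 176 41.9
```

The 41.9% growth rounds to the 42% quoted, and the three difficulty classes account for every new case.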
Project description: BACKGROUND: Helical membrane proteins are vital for the interaction of cells with their environment. Predicting the location of membrane helices in protein amino acid sequences provides substantial understanding of their structure and function and identifies membrane proteins in sequenced genomes. Currently there is no comprehensive benchmark tool for evaluating prediction methods, and no publication compares all available prediction tools. The existing benchmark literature is outdated, as it does not include recently determined membrane protein structures. It is also limited to global assessments, as specialised benchmarks for predicting specific classes of membrane proteins had not previously been carried out. DESCRIPTION: We present a benchmark server at http://sydney.edu.au/pharmacy/sbio/software/TMH_benchmark.shtml that uses recent high-resolution protein structural data to provide a comprehensive assessment of the accuracy of existing membrane helix prediction methods. The server also allows users to upload predictions generated by novel methods and compare them against all existing methods evaluated by the server. Benchmark metrics include the sensitivity and specificity of predictions of membrane helix location and orientation, among many others. The server allows customised evaluations, such as assessing the performance of prediction methods for specific helical membrane protein subtypes. We report results for custom benchmarks that illustrate how the server may be used for specialised benchmarks. Which prediction method performs best depends on which measure is being benchmarked.
The OCTOPUS membrane helix prediction method is consistently one of the highest-performing methods across all measures in the benchmarks that we performed. CONCLUSIONS: The benchmark server allows general and specialised assessment of existing and novel membrane helix prediction methods. Users can employ this benchmark server to determine the most suitable method for the type of prediction they need to perform, be it general whole-genome annotation or the prediction of specific types of helical membrane protein. Creators of novel prediction methods can use this benchmark server to evaluate the performance of their new methods. The benchmark server will be a valuable tool for researchers seeking to extract more sophisticated information from the large and growing protein sequence databases.
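To illustrate the most basic of the metrics listed, here is a per-residue sensitivity and specificity calculation for membrane helix location. It assumes observed and predicted topologies are given as equal-length strings in which "M" marks a membrane-helix residue (other letters, such as "i" and "o" for inside and outside, count as non-membrane). The encoding and function name are hypothetical, and the server's own definitions, such as segment-level scores or orientation measures, may differ.

```python
def per_residue_scores(observed, predicted, helix="M"):
    """Per-residue sensitivity and specificity for membrane-helix residues."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must be the same length")
    tp = fp = tn = fn = 0
    for o, p in zip(observed, predicted):
        if o == helix and p == helix:
            tp += 1  # helix residue correctly predicted
        elif p == helix:
            fp += 1  # predicted helix where none is observed
        elif o == helix:
            fn += 1  # observed helix residue missed
        else:
            tn += 1  # non-helix residue correctly predicted
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Hypothetical 8-residue example: the prediction is shifted by one residue.
print(per_residue_scores("iiMMMMoo", "iMMMMooo"))  # (0.75, 0.75)
```

Because a one-residue shift already costs a quarter of both scores here, segment-level measures (which count a helix as found if it overlaps enough) are often reported alongside per-residue ones.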
Project description: While a general goal of early-phase clinical studies is to identify an acceptable dose for further investigation, modern dose-finding studies and designs are highly specific to individual clinical settings. In addition, as outcome-adaptive dose-finding methods often involve complex algorithms, it is crucial to have diagnostic tools to evaluate the plausibility of a method's simulated performance and the adequacy of the algorithm. In this article, we propose a simple technique that provides an upper limit, or benchmark, for the accuracy of dose-finding methods for a given design objective. The proposed benchmark is nonparametric optimal in the sense of O'Quigley et al. (2002, Biostatistics 3, 51-56), and examples demonstrate that it is a practical upper bound on accuracy for model-based dose-finding methods. We illustrate the implementation of the technique in the context of phase I trials that consider multiple toxicities and phase I/II trials in which dosing decisions are based on both toxicity and efficacy, and we apply the benchmark to several clinical examples from the literature. By comparing the operating characteristics of a dose-finding method with those of the benchmark, we can make quick initial assessments of whether the method is adequately calibrated and evaluate its sensitivity to the dose-outcome relationships.
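The abstract does not spell out how the benchmark is computed. The sketch below assumes the single-toxicity version of the O'Quigley et al. nonparametric optimal benchmark as it is commonly described: each simulated patient gets a latent tolerance u drawn uniformly on (0, 1) and would experience a toxicity at dose k exactly when u <= p_k, so every patient's outcome is known at every dose; the benchmark then selects the dose whose estimated toxicity rate is closest to the target. The article's extensions to multiple toxicities and to toxicity plus efficacy are not reproduced, and the function name, scenario probabilities and sample size are hypothetical.

```python
import random


def nonparametric_benchmark(true_tox, target, n_patients, n_trials=2000, seed=1):
    """Fraction of simulated trials in which each dose is selected by the
    single-toxicity nonparametric optimal benchmark (complete information)."""
    rng = random.Random(seed)
    counts = [0] * len(true_tox)
    for _ in range(n_trials):
        # One latent tolerance per patient; a patient given dose k would have a
        # toxicity exactly when u <= true_tox[k], so outcomes at all doses are known.
        tolerances = [rng.random() for _ in range(n_patients)]
        estimates = [sum(u <= p for u in tolerances) / n_patients for p in true_tox]
        # Select the dose whose estimated toxicity rate is closest to the target
        # (ties go to the lower dose because min returns the first minimum).
        best = min(range(len(true_tox)), key=lambda k: abs(estimates[k] - target))
        counts[best] += 1
    return [c / n_trials for c in counts]


# Hypothetical five-dose scenario with the third dose at the 25% target.
selection = nonparametric_benchmark([0.05, 0.12, 0.25, 0.40, 0.55], target=0.25, n_patients=30)
print([round(s, 2) for s in selection])
```

A model-based design run on the same scenario and sample size would be expected to select the target dose no more often than this benchmark does; a design that appears to beat it by a wide margin suggests a problem with the simulation.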
Project description: BACKGROUND: The SCOPE trials (SCOPE 1, NeoSCOPE and SCOPE 2) have been the backbone of oesophageal RT trials in the UK. Many changes in oesophageal RT techniques have taken place during this time. In addition to adopting these new techniques, the SCOPE trials have been influential in helping centres implement them. We discuss the progress made through the SCOPE trials and include details of a questionnaire sent to participating centres to establish the role that trial participation played in RT changes in their centre. METHODS: Questionnaires were sent to 47 centres, and 27 were returned. RESULTS: All responding centres stated that their departmental protocol for target volume delineation (TVD) was based on the relevant SCOPE trial protocol. Use of four-dimensional CT (4DCT) has increased from 42% to 71%. Type B planning algorithms, mandated in the NeoSCOPE trial, were used in 79.9% of centres before NeoSCOPE and are now used in 83.3%. The proportion of centres using a stomach filling protocol has risen from 12.5% before NeoSCOPE to 50%. Cone-beam CT (CBCT) was mandated for image-guided radiotherapy (IGRT) in the NeoSCOPE trial; 66.7% of centres used it routinely before NeoSCOPE/SCOPE 2, rising to 87.5% in the survey. CONCLUSION: The questionnaire results show how participation in national oesophageal RT trials has led to the adoption of newer RT techniques in UK centres, leading to better patient care.
Project description: BACKGROUND: Benchmark datasets are essential for both method development and performance assessment. These datasets have numerous requirements, one of which is representativeness. For variant tolerance/pathogenicity prediction, representativeness means that the dataset covers the space of variations and their effects. RESULTS: We performed the first analysis of the representativeness of variation benchmark datasets. We used statistical approaches to investigate how representative the proteins in the benchmark datasets were of the entire human protein universe. We investigated the distributions of variants in chromosomes, protein structures, CATH domains and classes, Pfam protein families, Enzyme Commission (EC) classifications and Gene Ontology annotations in 24 datasets that have been used for training and testing variant tolerance prediction methods. All the datasets were available in the VariBench or VariSNP databases. We also tested whether the pathogenic variant datasets contained neutral variants, defined as those with a high minor allele frequency in the ExAC database. The distributions of variants over the chromosomes and proteins varied greatly between the datasets. CONCLUSIONS: None of the datasets was found to be well representative, although many of them had quite good coverage of the different protein characteristics. Dataset size correlates with representativeness but only weakly with the performance of methods trained on the datasets. The results imply that dataset representativeness is an important factor and should be taken into account in predictor development and testing.
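The abstract does not say which statistical tests were used. One simple way to ask whether a dataset's variants fall into categories (for example CATH classes or EC classes) in the same proportions as a reference population is a Pearson chi-square goodness-of-fit statistic. This is an assumed, illustrative choice, not necessarily the authors' method, and the category counts and reference proportions shown are hypothetical.

```python
def chi_square_statistic(observed_counts, reference_proportions):
    """Pearson chi-square statistic comparing a dataset's category counts with
    the proportions expected from a reference population (e.g. the human proteome).

    reference_proportions should be positive and sum to 1.
    """
    if len(observed_counts) != len(reference_proportions):
        raise ValueError("one reference proportion per category is required")
    n = sum(observed_counts)
    stat = 0.0
    for obs, p in zip(observed_counts, reference_proportions):
        expected = n * p
        if expected <= 0:
            raise ValueError("reference proportions must be positive")
        stat += (obs - expected) ** 2 / expected
    return stat


# Hypothetical: 100 variants over four categories that are equally common in the reference.
print(chi_square_statistic([40, 20, 25, 15], [0.25, 0.25, 0.25, 0.25]))  # 14.0
```

Larger values indicate a poorer match to the reference distribution. A p-value can be taken from the chi-square distribution with (number of categories - 1) degrees of freedom, for example with `scipy.stats.chi2.sf`, although the approximation is unreliable when expected counts are small.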
Project description: This article contains data related to the research article Auzias et al. (2015). These data can be used as a benchmark for the quantitative evaluation of sulcal pit extraction algorithms. In particular, they allow a quantitative comparison with our method and an assessment of the consistency of sulcal pit extraction across two well-matched populations.
Project description: Benchmark analysis is a widely used tool in public health risk analysis. In this framework, the estimation of minimum exposure levels, called Benchmark Doses (BMDs), that induce a prespecified Benchmark Response (BMR) is well understood for an adverse response to a single stimulus. For studies in which two agents are examined in tandem, however, the benchmark approach is far less developed. This article demonstrates how the benchmark modeling paradigm can be extended from the single-dose setting to joint-action, two-agent studies. The focus is on response outcomes expressed as proportions. Extending the single-exposure setting, representations of risk are based on a joint-action dose-response model involving both agents. Based on such a model, the concept of a benchmark profile (BMP), a two-dimensional analog of the single-dose BMD consisting of the combinations of doses of the two agents at which the specified BMR is achieved, is defined for use in quantitative risk characterization and assessment. The resulting joint low-dose guidelines can improve public health planning and risk regulation when dealing with low-level exposures to combinations of hazardous agents.
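The abstract does not give the joint-action model. As a minimal sketch, assume a logistic model with an interaction term, logit R(d1, d2) = b0 + b1*d1 + b2*d2 + b12*d1*d2, and the extra-risk definition commonly used for quantal data, ER = [R(d1, d2) - R(0, 0)] / [1 - R(0, 0)]. Setting ER equal to the BMR and solving for d2 at each d1 traces out a benchmark profile. The model form, coefficients and function names are illustrative assumptions, not the article's fitted model.

```python
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def response(d1, d2, coefs):
    """Hypothetical joint-action logistic model for the response proportion."""
    b0, b1, b2, b12 = coefs
    return expit(b0 + b1 * d1 + b2 * d2 + b12 * d1 * d2)


def extra_risk(d1, d2, coefs):
    """Extra risk over background: (R(d1, d2) - R(0, 0)) / (1 - R(0, 0))."""
    r0 = response(0.0, 0.0, coefs)
    return (response(d1, d2, coefs) - r0) / (1.0 - r0)


def benchmark_profile(d1_values, bmr, coefs):
    """For each dose of agent 1, the dose of agent 2 at which extra risk equals bmr.

    Returns (d1, d2) pairs; d2 is None where no non-negative dose of agent 2
    gives exactly the BMR under this model.
    """
    b0, b1, b2, b12 = coefs
    r0 = expit(b0)
    eta_target = logit(r0 + bmr * (1.0 - r0))  # linear predictor that yields the BMR
    profile = []
    for d1 in d1_values:
        slope = b2 + b12 * d1  # effect of agent 2 on the linear predictor at this d1
        d2 = (eta_target - b0 - b1 * d1) / slope if slope > 0 else None
        profile.append((d1, d2 if d2 is not None and d2 >= 0 else None))
    return profile


coefs = (-2.0, 0.5, 0.25, 0.1)  # hypothetical b0, b1, b2, b12
for d1, d2 in benchmark_profile([0.0, 0.5, 1.0, 2.0], 0.10, coefs):
    print(d1, None if d2 is None else round(d2, 3))
```

The profile falls as d1 rises, and it ends once agent 1 alone reaches the BMR; a closed-form solution is available here only because the assumed model is linear in d2 on the logit scale, so other joint-action models would need a numerical root finder.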
Project description: With the development of many computational methods that predict the structural models of protein-protein complexes, there is a pressing need to benchmark their performance. As was the case for protein monomers, assessing the quality of models of protein complexes is not straightforward. An effective scoring scheme should be able to detect substructure similarity and estimate its statistical significance. Here, we focus on characterizing the similarity of the interfaces of the complex and introduce two scoring functions. The first, the interfacial Template Modeling score (iTM-score), measures the geometric distance between the interfaces, while the second, the Interface Similarity score (IS-score), evaluates their residue-residue contact similarity in addition to their geometric similarity. We first demonstrate that the IS-score is more suitable for assessing docking models than the iTM-score. The IS-score is then validated in a large-scale benchmark test on 1562 dimeric complexes. Finally, the scoring function is applied to evaluate docking models submitted to the Critical Assessment of Prediction of Interactions (CAPRI) experiments. While the results according to the new scoring scheme are generally consistent with the original CAPRI assessment, the IS-score identifies models whose significance was previously underestimated.
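The abstract does not give the scoring formulas. The following is a simplified, hypothetical score in the spirit of the IS-score described above: it rewards aligned interface residue pairs that are both close after superposition (a TM-score-style weight 1 / (1 + (d/d0)^2)) and share interfacial contacts (an overlap fraction). The contact-overlap definition, the fixed d0 of 2 Å and the normalising length are assumptions, alignment and superposition are assumed to have been done already, and this is not the published IS-score definition; it also does not estimate statistical significance.

```python
def interface_similarity(distances, model_contacts, ref_contacts, norm_length, d0=2.0):
    """Simplified interface similarity score (hypothetical, not the published IS-score).

    distances[i]: distance in Å between the i-th aligned interface residue pair
        after superposing the model interface onto the reference interface.
    model_contacts[i], ref_contacts[i]: sets of partner-chain residues in contact
        with that residue, expressed in a shared (reference) numbering.
    norm_length: number of interface residues used for normalisation.
    """
    if not (len(distances) == len(model_contacts) == len(ref_contacts)):
        raise ValueError("inputs must describe the same aligned residue pairs")
    total = 0.0
    for d, cm, cr in zip(distances, model_contacts, ref_contacts):
        larger = max(len(cm), len(cr))
        overlap = len(cm & cr) / larger if larger else 0.0  # shared-contact fraction
        total += overlap / (1.0 + (d / d0) ** 2)  # TM-score-style distance weight
    return total / norm_length


# Hypothetical two-residue interface: geometry off by d0, contacts fully conserved.
print(interface_similarity([2.0, 2.0], [{7}, {8}], [{7}, {8}], norm_length=2))  # 0.5
```

With every contact overlap set to 1 the score reduces to a purely geometric, TM-score-like term, which mirrors the abstract's distinction between the geometry-only iTM-score and the contact-aware IS-score.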