Project description: Clustering is a central task in many data analysis applications, yet there is no universally accepted criterion for deciding whether clusters are present; ultimately, we must resort to a consensus among experts. The problem is amplified in high-dimensional datasets, where classical distances become uninformative and humans struggle to fully apprehend the distribution of the data. In this paper, we design a mobile human-computing game as a tool to query human perception for the multidimensional data clustering problem. We propose two clustering algorithms that partially or entirely rely on aggregated human answers, and we report the results of two experiments conducted on synthetic and real-world datasets. We show that our methods perform on par with, or better than, the most popular automated clustering algorithms. Our results suggest that hybrid systems leveraging annotations of partial datasets collected through crowdsourcing platforms can be an efficient strategy for capturing collective wisdom to solve abstract computational problems.
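The two algorithms themselves are not detailed in this summary. As a minimal sketch of the general idea, assuming the game yields pairwise "same cluster" judgments, aggregated votes can be turned into a consensus similarity matrix and handed to a standard agglomerative step (all names here are illustrative, not the paper's code):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def consensus_clusters(votes, n_points, n_clusters):
    """Cluster from aggregated human pairwise judgments.

    votes: iterable of (i, j, same) tuples, where `same` is 1 if a
    player judged points i and j to belong together, else 0.
    """
    agree = np.zeros((n_points, n_points))
    total = np.zeros((n_points, n_points))
    for i, j, same in votes:
        agree[i, j] += same
        agree[j, i] += same
        total[i, j] += 1
        total[j, i] += 1
    # Fraction of annotators who grouped each pair; unseen pairs get a 0.5 prior
    sim = np.divide(agree, total, out=np.full_like(agree, 0.5), where=total > 0)
    np.fill_diagonal(sim, 1.0)
    dist = 1.0 - sim  # turn consensus similarity into a distance
    Z = linkage(squareform(dist, checks=False), method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust")
```

An entirely human-driven variant would use only these votes; a hybrid variant could blend `sim` with a feature-based similarity before the linkage step.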
Project description: Online crowdsourcing platforms such as MTurk and Prolific have revolutionized how researchers recruit human participants. However, because these platforms primarily recruit computer-based respondents, they risk missing respondents who only have access to, or spend more of their time on, the more widely available mobile devices. There have also been concerns that respondents who use such platforms heavily to earn an income provide lower-quality responses. We therefore conducted two studies, collecting data from the popular MTurk and Prolific platforms, from Pollfish, a self-described mobile-first crowdsourcing platform, and from the Qualtrics audience panel. By distributing the same study across these platforms, we examine data quality and the factors that may affect it. In contrast to MTurk and Prolific, most Pollfish and Qualtrics respondents were mobile-based. Using an attentiveness composite score we constructed, we find mobile-based responses comparable with computer-based responses, demonstrating that mobile devices are suitable for crowdsourcing behavioral research. However, platforms differ significantly in attentiveness, which is also affected by factors such as respondents' incentive for completing the survey, their activity before engaging, environmental distractions, and having recently completed a similar study. Further, we find that stronger System 1 thinking is associated with lower attentiveness and mediates the relationship between some of the explored factors, including the device used, and attentiveness. Finally, we raise the concern that most MTurk users can pass frequently used attention checks yet fail less commonly used measures, such as the infrequency scale.
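The composite score's exact components are not listed in this summary. A plausible sketch, assuming it blends standard indicators (attention checks passed, the infrequency scale, and completion speed) by z-scoring and averaging; the column names are invented for illustration:

```python
import pandas as pd

def attentiveness_composite(df: pd.DataFrame) -> pd.Series:
    """Hypothetical composite: z-score each quality indicator and average.

    Assumed columns (illustrative, not the paper's actual measures):
      attention_checks_passed - count of passed attention checks
      infrequency_score       - infrequency-scale score (higher = worse)
      completion_seconds      - survey duration
    """
    z = lambda s: (s - s.mean()) / s.std(ddof=0)
    parts = pd.DataFrame({
        "checks": z(df["attention_checks_passed"]),
        # Invert indicators where higher raw values mean *lower* attentiveness
        "infreq": -z(df["infrequency_score"]),
        "speed": -z(1.0 / df["completion_seconds"]),  # penalize speeding
    })
    return parts.mean(axis=1)  # one attentiveness score per respondent
```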
Project description: Background: Rapid data sharing can maximize the utility of data. In epidemics and pandemics like Zika, Ebola, and COVID-19, the case for such practices seems especially urgent and warranted. Yet rapidly sharing data widely has previously generated significant concerns related to equity. The continued lack of understanding and guidance on equitable data sharing raises the following questions: Should data sharing in epidemics and pandemics primarily advance utility, or should it advance equity as well? If so, what norms comprise equitable data sharing in epidemics and pandemics? Do these norms address the equity-related concerns raised by researchers, data providers, and other stakeholders? What tensions must be balanced between equity and other values? Methods: To explore these questions, we undertook a systematic scoping review of the literature on data sharing in epidemics and pandemics and thematically analyzed the identified literature for its discussion of ethical values, norms, concerns, and tensions, with a particular (but not exclusive) emphasis on equity. We wanted both to understand how equity in data sharing is being conceptualized and to draw out other important values and norms for data sharing in epidemics and pandemics. Results: We found that the values of utility, equity, solidarity, and reciprocity were described, and we report their associated norms, including researcher recognition; rapid, real-time sharing; capacity development; and fair benefits to data generators, data providers, and source countries. The value of utility and its associated norms were discussed substantially more than the others. Tensions between utility norms (e.g., rapid, real-time sharing) and equity norms (e.g., researcher recognition, equitable access) were raised. Conclusions: This study found support for equity being advanced by data sharing in epidemics and pandemics. However, norms for equitable data sharing in epidemics and pandemics require further development, particularly in relation to power sharing and participatory approaches that prioritize inclusion. Addressing structural inequities in the wider global health landscape is also needed to achieve equitable data sharing in epidemics and pandemics.
Project description: Proponents of big data claim it will fuel a social research revolution, but skeptics challenge its reliability and decontextualization. The largest subset of big data is not designed for social research. Data augmentation, the systematic assessment of measurement against known quantities and the expansion of extant data with new information, is an important tool for maximizing such data's validity and research value. Trained research assistants and specialized algorithms are common approaches to augmentation but may not scale to big data or appease skeptics. We consider a third alternative: data augmentation with online crowdsourcing. Three empirical cases illustrate the strengths and limitations of crowdsourcing, using Amazon Mechanical Turk to verify automated coding, link online databases, and gather data on online resources. From these, we develop best-practice guidelines and a reporting template to enhance reproducibility. Carefully designed, correctly applied, and rigorously documented crowdsourcing helps address concerns about big data's usefulness for social research.
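For the "verify automated coding" case, one standard pattern is to take a majority vote over crowd workers' labels and report chance-corrected agreement with the automated coder. A minimal sketch under that assumption (the function and inputs are illustrative, not the article's code):

```python
from collections import Counter
from sklearn.metrics import cohen_kappa_score

def verify_automated_coding(crowd_labels, auto_labels):
    """Compare automated codes against the crowd's majority vote.

    crowd_labels: list of lists, one inner list of worker labels per item
    auto_labels:  list of automated codes, in the same item order
    """
    majority = [Counter(workers).most_common(1)[0][0] for workers in crowd_labels]
    # Raw agreement rate between majority vote and the automated coder
    agreement = sum(m == a for m, a in zip(majority, auto_labels)) / len(auto_labels)
    # Cohen's kappa corrects the agreement rate for chance
    kappa = cohen_kappa_score(majority, auto_labels)
    return agreement, kappa
```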
Project description: Spindle event detection is a key component of human sleep analysis. However, detection of these oscillatory patterns by experts is time-consuming and costly. Automated detection algorithms are cost-efficient and reproducible but require robust datasets for training and validation. Using the MODA (Massive Online Data Annotation) platform, we crowdsourced a large, open-source dataset of high-quality, human-scored sleep spindles (5342 spindles from 180 subjects). We evaluated the performance of three scorer subtypes (experts, researchers, and non-experts) as well as seven previously published spindle detection algorithms. Our findings show that only two algorithms achieved performance scores similar to human experts. Furthermore, the human scorers agreed on the average spindle characteristics (density, duration, and amplitude), but there were significant age and sex differences (also observed in the set of detected spindles). This study demonstrates how the MODA platform can be used to generate a highly valid, open-source, standardized dataset for researchers to train, validate, and compare automated detectors of biological signals such as the EEG.
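The study's exact scoring criterion is not given in this summary; event-level detector evaluation is commonly done by interval overlap, counting a detection as a true positive when its intersection-over-union with a human-scored spindle passes a threshold. A minimal sketch under that assumption:

```python
def event_f1(gold, detected, min_overlap=0.2):
    """Event-level precision/recall/F1 for spindle detection.

    gold, detected: lists of (start, end) times in seconds. A detection
    counts as a hit if its intersection-over-union with an unmatched gold
    spindle reaches `min_overlap` (a common, but not the only, criterion).
    """
    def iou(a, b):
        inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
        union = (a[1] - a[0]) + (b[1] - b[0]) - inter
        return inter / union if union > 0 else 0.0

    matched, tp = set(), 0
    for d in detected:
        hit = next((i for i, g in enumerate(gold)
                    if i not in matched and iou(d, g) >= min_overlap), None)
        if hit is not None:
            matched.add(hit)
            tp += 1
    prec = tp / len(detected) if detected else 0.0
    rec = tp / len(gold) if gold else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1
```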
Project description: Given globalization and other social phenomena, controlling the spread of infectious diseases has become an imperative public health priority. A plethora of interventions that could in theory mitigate the spread of pathogens have been proposed and applied. Evaluating the effectiveness of such interventions is costly and, in many circumstances, unrealistic. Most importantly, the community effect (i.e., the ability of the intervention to minimize the spread of the pathogen from people who received the intervention to other community members) can rarely be evaluated. Here we propose a study design that can build and evaluate evidence in support of the community effect of an intervention. The approach exploits the molecular evolutionary dynamics of pathogens to track new infections as having arisen from either a control or an intervention group. It enables us to evaluate whether an intervention reduces the number and length of new transmission chains relative to a control condition, and thus to estimate the relative decrease in new infections in the community due to the intervention. As an example, we provide one working scenario of how the approach can be applied, together with a simulation study and associated power calculations.
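The paper's simulation is not reproduced here. A toy branching-process sketch of the core comparison, with purely illustrative parameters, simulates chain sizes under control and intervention transmission probabilities and compares them; repeating the comparison over many random seeds would give a crude power estimate:

```python
import random

def simulate_chains(n_index, p_transmit, max_gens=10, contacts=5):
    """Branching process: each index case seeds a chain; every active case
    exposes `contacts` people per generation, each infected with p_transmit."""
    sizes = []
    for _ in range(n_index):
        size, current = 1, 1
        for _ in range(max_gens):
            nxt = sum(1 for _ in range(current * contacts)
                      if random.random() < p_transmit)
            size += nxt
            current = nxt
            if current == 0:  # chain has died out
                break
        sizes.append(size)
    return sizes

random.seed(1)
control = simulate_chains(200, p_transmit=0.15)
intervention = simulate_chains(200, p_transmit=0.08)  # assumed effect size
effect = 1 - sum(intervention) / sum(control)
print(f"relative reduction in infections: {effect:.0%}")
```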
Project description: Understanding how animals move within their environment is a burgeoning field of research. Despite this, relatively basic data, such as the speeds at which animals choose to walk in the wild, are sparse. If animals choose to walk with dynamic similarity, they will move at equal dimensionless speeds, represented by the Froude number (Fr), which can be estimated from simple limb kinematics obtained from video data. Here, using Internet videos, limb kinematics were measured in 112 bird and mammal species weighing between 0.61 and 5400 kg. This novel method of data collection enabled the determination of kinematics for animals walking at their self-selected speeds, without the need for exhaustive fieldwork. At larger sizes, both birds and mammals prefer to walk at slower relative speeds and relative stride frequencies: preferred Fr decreased in larger species, indicating that Fr may not be a good predictor of preferred locomotor speeds. This may result from the observation that the minimum cost of transport is approached at lower Fr in larger species. At self-selected speeds, birds walk with higher duty factors, lower stride frequencies, and longer stance times than mammals. The trend towards lower preferred Fr is also apparent in extinct vertebrate species.
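For reference, one common definition is Fr = v^2 / (g h), with v the walking speed and h the hip height; speed itself can be recovered from video as stride length times stride frequency. A small sketch with made-up values (not the study's data):

```python
G = 9.81  # gravitational acceleration, m/s^2

def froude(speed, hip_height):
    """Froude number Fr = v^2 / (g * h); equal Fr implies dynamic similarity."""
    return speed ** 2 / (G * hip_height)

def speed_from_kinematics(stride_length, stride_frequency):
    """Mean forward speed = stride length x stride frequency,
    both measurable from video frames of known timing."""
    return stride_length * stride_frequency

# Illustrative values: a 1.0 m hip-height animal taking 1.8 m strides at 1.2 Hz
v = speed_from_kinematics(1.8, 1.2)
print(f"Fr = {froude(v, 1.0):.2f}")
```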
Project description: This article presents 14 quick tips for building a team to crowdsource data for public health advocacy. The tips cover team building and logistics, infrastructure setup, media and industry outreach, and project wrap-up and archiving for posterity.