Project description: To reveal the neurophysiological underpinnings of natural movement, neural recordings must be paired with accurate tracking of limbs and postures. Here, we evaluated the accuracy of DeepLabCut (DLC), a deep learning markerless motion capture approach, by comparing it with a 3D X-ray video radiography system that tracks markers placed under the skin (XROMM). We recorded behavioral data simultaneously with XROMM and RGB video as marmosets foraged, and reconstructed 3D kinematics in a common coordinate system. We used the toolkit Anipose to filter and triangulate DLC trajectories of 11 markers on the forelimb and torso and found a low median error (0.228 cm) between the two modalities, corresponding to 2.0% of the range of motion. For studies that can tolerate this relatively small error, DLC and similar markerless pose estimation tools enable the study of increasingly naturalistic behaviors in many fields, including non-human primate motor control.
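As a rough illustration of the validation metric described above, the sketch below computes the median Euclidean error between two temporally aligned sets of 3D marker trajectories and expresses it as a percentage of the range of motion. The array shapes, units, and the exact definition of range of motion are assumptions for illustration, not the study's analysis code.

import numpy as np

def median_error_pct_rom(xyz_dlc, xyz_xromm):
    """xyz_dlc, xyz_xromm: (n_frames, n_markers, 3) arrays in cm, already
    triangulated into a common coordinate system and temporally aligned."""
    # Per-frame, per-marker Euclidean distance between the two modalities
    err = np.linalg.norm(xyz_dlc - xyz_xromm, axis=-1)
    median_err = np.nanmedian(err)  # cm
    # Range of motion taken here as the spread of the reference (XROMM) trajectories
    rom = np.nanmax(xyz_xromm, axis=0) - np.nanmin(xyz_xromm, axis=0)
    pct_rom = 100.0 * median_err / np.nanmean(rom)
    return median_err, pct_rom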
Project description: Quantifying movement is critical for understanding animal behavior. Advances in computer vision now enable markerless tracking from 2D video, but most animals move in 3D. Here, we introduce Anipose, an open-source toolkit for robust markerless 3D pose estimation. Anipose is built on the 2D tracking method DeepLabCut, so users can expand their existing experimental setups to obtain accurate 3D tracking. It consists of four components: (1) a 3D calibration module, (2) filters to resolve 2D tracking errors, (3) a triangulation module that integrates temporal and spatial regularization, and (4) a pipeline to structure processing of large numbers of videos. We evaluate Anipose on a calibration board as well as mice, flies, and humans. By analyzing 3D leg kinematics tracked with Anipose, we identify a key role for joint rotation in motor control of fly walking. To help users get started with 3D tracking, we provide tutorials and documentation at http://anipose.org/.
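For readers new to multi-camera triangulation, the sketch below shows the geometric core of the triangulation step: a plain linear (DLT) triangulation of one keypoint from several calibrated views. Anipose itself adds 2D filtering and spatiotemporal regularization on top of this; the function and inputs here are illustrative, not Anipose's API.

import numpy as np

def triangulate_dlt(points_2d, proj_mats):
    """points_2d: list of (x, y) pixel detections, one per camera.
    proj_mats: list of 3x4 camera projection matrices for the same cameras."""
    rows = []
    for (x, y), P in zip(points_2d, proj_mats):
        # Each view contributes two linear constraints on the homogeneous 3D point
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)
    # Homogeneous least-squares solution: right singular vector with smallest singular value
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # 3D point in world coordinates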
Project description: Deep learning-based approaches to markerless 3D pose estimation are being adopted by researchers in psychology and neuroscience at an unprecedented rate. Yet many of these tools remain unvalidated. Here, we report on the validation of one increasingly popular tool (DeepLabCut) against simultaneous measurements obtained from a reference measurement system (Fastrak) with well-known performance characteristics. Our results confirm close (mm range) agreement between the two, indicating that under specific circumstances deep learning-based approaches can match more traditional motion tracking methods. Although more work needs to be done to determine their specific performance characteristics and limitations, this study should help build confidence within the research community using these new tools.
Project description: Accurate and flexible 3D pose estimation for virtual entities is a challenging task in computer vision applications. Conventional methods struggle to capture realistic movements; thus, solutions that can handle the complexities of genuine avatar interactions in dynamic virtual environments are imperative. To tackle the problem of precise 3D pose estimation, this work introduces TRI-POSE-Net, a model intended for scenarios with limited supervision. The proposed technique, which is based on ResNet-50 and includes integrated Selective Kernel Network (SKNet) blocks, proved effective for feature extraction tailored specifically to pose estimation scenarios. Furthermore, trifocal tensors and their three-view geometry allow us to generate 3D ground-truth poses from 2D poses, resulting in more refined triangulations. Through the proposed approach, 3D poses can be estimated from a single 2D RGB image. The proposed approach was evaluated on the HumanEva-I dataset, yielding a Mean Per Joint Position Error (MPJPE) of 47.6 under self-supervision and an MPJPE of 29.9 under full supervision. Compared with other works, the proposed approach performs well in the self-supervision paradigm.
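The MPJPE figures quoted above are averages of per-joint Euclidean errors. A minimal sketch of the metric, assuming (n_frames, n_joints, 3) arrays in consistent units and the common root-relative alignment convention (the paper's exact evaluation protocol may differ):

import numpy as np

def mpjpe(pred, gt, root=0):
    """pred, gt: (n_frames, n_joints, 3) predicted and ground-truth joint positions."""
    # Root-relative alignment (a common convention; illustrative only)
    pred = pred - pred[:, root:root + 1]
    gt = gt - gt[:, root:root + 1]
    return np.mean(np.linalg.norm(pred - gt, axis=-1))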
Project description: Purpose: Tracking of tools and surgical activity is becoming increasingly important in the context of computer-assisted surgery. In this work, we present a data generation framework, dataset, and baseline methods to facilitate further research in the direction of markerless hand and instrument pose estimation in realistic surgical scenarios. Methods: We developed a rendering pipeline to create inexpensive and realistic synthetic data for model pretraining. Subsequently, we propose a pipeline to capture and label real data with hand and object pose ground truth in an experimental setup to gather high-quality real data. We furthermore present three state-of-the-art RGB-based pose estimation baselines. Results: We evaluate three baseline models on the proposed datasets. The best performing baseline achieves an average tool 3D vertex error of 16.7 mm on synthetic data and 13.8 mm on real data, which is comparable to the state of the art in RGB-based hand/object pose estimation. Conclusion: To the best of our knowledge, we propose the first synthetic and real data generation pipelines to generate hand and object pose labels for open surgery. We present three baseline models for object and object/hand pose estimation from RGB frames. Our realistic synthetic data generation pipeline may help overcome the data bottleneck in the surgical domain and can easily be transferred to other medical applications.
Project description: Pose estimation algorithms are shedding new light on animal behavior and intelligence. Most existing models are only trained with labeled frames (supervised learning). Although effective in many cases, the fully supervised approach requires extensive image labeling, struggles to generalize to new videos, and produces noisy outputs that hinder downstream analyses. We address each of these limitations with a semi-supervised approach that leverages the spatiotemporal statistics of unlabeled videos in two different ways. First, we introduce unsupervised training objectives that penalize the network whenever its predictions violate the smoothness of physical motion, violate multiple-view geometry, or depart from a low-dimensional subspace of plausible body configurations. Second, we design a new network architecture that predicts pose for a given frame using temporal context from surrounding unlabeled frames. These context frames help resolve brief occlusions or ambiguities between nearby and similar-looking body parts. The resulting pose estimation networks achieve better performance with fewer labels, generalize better to unseen videos, and provide smoother and more reliable pose trajectories for downstream analysis; for example, these improved pose trajectories exhibit stronger correlations with neural activity. We also propose a Bayesian post-processing approach based on deep ensembling and Kalman smoothing that further improves tracking accuracy and robustness. We release a deep learning package that adheres to industry best practices, supporting easy model development and accelerated training and prediction. Our package is accompanied by a cloud application that allows users to annotate data, train networks, and run predictions on new videos at scale, directly from the browser.
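To make the first idea concrete, the sketch below shows one example of such an unsupervised objective: a temporal-smoothness penalty on the frame-to-frame acceleration of predicted keypoints. The tensor shapes and weighting are illustrative assumptions, not the released package's actual loss implementation.

import torch

def temporal_smoothness_loss(keypoints, weight=1.0):
    """keypoints: (n_frames, n_keypoints, 2) tensor of predictions for consecutive frames."""
    velocity = keypoints[1:] - keypoints[:-1]      # first differences
    acceleration = velocity[1:] - velocity[:-1]    # second differences
    # Penalize large accelerations, i.e. non-smooth predicted trajectories
    return weight * acceleration.pow(2).mean()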
Project description: The rhesus macaque is an important model species in several branches of science, including neuroscience, psychology, ethology, and medicine. The utility of the macaque model would be greatly enhanced by the ability to precisely measure behavior in freely moving conditions, but existing approaches do not provide sufficiently robust tracking for this. Here, we describe OpenMonkeyStudio, a deep learning-based markerless motion capture system for estimating 3D pose in freely moving macaques in large unconstrained environments. Our system makes use of 62 machine vision cameras that encircle an open 2.45 m × 2.45 m × 2.75 m enclosure. The resulting multiview image streams allow for data augmentation via 3D reconstruction of annotated images to train a robust view-invariant deep neural network. This view invariance represents an important advance over previous markerless 2D tracking approaches and allows fully automatic pose inference on unconstrained natural motion. We show that OpenMonkeyStudio can be used to accurately recognize actions and track social interactions.
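The augmentation idea described above, reduced to its geometric core, amounts to reprojecting a 3D-reconstructed keypoint into every other calibrated camera to obtain new 2D labels. The sketch below assumes hypothetical 3x4 projection matrices and is not OpenMonkeyStudio's code.

import numpy as np

def reproject(point_3d, proj_mats):
    """point_3d: length-3 array in world coordinates.
    proj_mats: iterable of 3x4 camera projection matrices."""
    X = np.append(point_3d, 1.0)  # homogeneous coordinates
    labels = []
    for P in proj_mats:
        x = P @ X
        labels.append(x[:2] / x[2])  # pixel coordinates in that view
    return np.stack(labels)  # (n_cameras, 2) synthetic 2D labels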
Project description: Purpose: We present a markerless vision-based method for on-the-fly three-dimensional (3D) pose estimation of a fiberscope instrument to target pathologic areas in the endoscopic view during exploration. Approach: A 2.5-mm-diameter fiberscope is inserted through the endoscope's operating channel and connected to an additional camera to perform complementary observation of a targeted area, acting as a multimodal magnifier. The 3D pose of the fiberscope is estimated frame by frame by maximizing the similarity between its silhouette (automatically detected in the endoscopic view using a deep learning neural network) and a cylindrical shape bound to a kinematic model reduced to three degrees of freedom. An alignment of the cylinder axis, based on Plücker coordinates derived from the straight edges detected in the image, makes convergence faster and more reliable. Results: The performance has been validated on simulations with a virtual trajectory mimicking endoscopic exploration and on real images of a chessboard pattern acquired with different endoscopic configurations. The experiments demonstrated good accuracy and robustness of the proposed algorithm, with errors of 0.33±0.68 mm in distance position and 0.32±0.11 deg in axis orientation for the 3D pose estimation, which shows its advantage over previous approaches. This allows multimodal image registration with sufficient accuracy of <3 pixels. Conclusion: Our pose estimation pipeline was executed on simulations and patterns; the results demonstrate the robustness of our method and the potential of fiber-optic instrument image-based tracking for pose estimation and multimodal registration. It can be fully implemented in software and therefore easily integrated into a routine clinical environment.
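For reference, the Plücker representation mentioned above encodes a 3D line by a unit direction vector and its moment about the origin. A minimal sketch, assuming the line is given by two hypothetical 3D points (the paper derives the line from detected image edges and camera geometry instead):

import numpy as np

def plucker_line(p1, p2):
    """Return (direction d, moment m) Plücker coordinates of the line through p1 and p2."""
    d = p2 - p1
    d = d / np.linalg.norm(d)
    m = np.cross(p1, d)  # moment about the origin; perpendicular to d
    return d, m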
Project description: Navigated transcranial magnetic stimulation (nTMS) is a valuable tool for non-invasive brain stimulation. Currently, nTMS requires fixing markers on the patient's head; head marker displacement leads to errors in coil placement and inaccurate brain stimulation. A markerless neuronavigation method is needed to increase the reliability of nTMS and simplify the nTMS protocol. In this study, we introduce and release MarLe, a Python-based markerless head-tracking neuronavigation software for TMS. This novel software uses computer-vision techniques combined with low-cost cameras to estimate the head pose for neuronavigation. A coregistration algorithm, based on a closed-form solution, was designed to track the patient's head and the TMS coil referenced to the individual's brain image. We show that MarLe can estimate head pose based on real-time video processing. An intuitive pipeline was developed to connect MarLe with the nTMS neuronavigation software. MarLe achieved acceptable accuracy and stability in a mockup nTMS experiment. MarLe allows real-time tracking of the patient's head without any markers. The combination of face detection and a coregistration algorithm can overcome concerns about nTMS head marker displacement. MarLe can improve reliability, simplify the protocol, and reduce the time required for brain intervention techniques such as nTMS.
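The closed-form coregistration step is not specified in detail above; one standard closed-form solution for rigid registration of corresponding point sets is the SVD-based Kabsch/Umeyama method sketched below. This is a generic illustration under that assumption, not MarLe's actual implementation, and the landmark sets named in the docstring are hypothetical.

import numpy as np

def rigid_register(src, dst):
    """src, dst: (n_points, 3) corresponding points (e.g., facial landmarks in the
    camera frame and the same landmarks in the brain-image frame).
    Returns R, t such that dst ≈ src @ R.T + t."""
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_mean).T @ (dst - dst_mean)  # cross-covariance of centered points
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # guard against reflection
    R = Vt.T @ D @ U.T
    t = dst_mean - R @ src_mean
    return R, t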
Project description: Automatic feature extraction from images of speech articulators is currently achieved by detecting edges. Here, we investigate the use of pose estimation deep neural nets with transfer learning to perform markerless estimation of speech articulator keypoints using only a few hundred hand-labelled images as training input. Midsagittal ultrasound images of the tongue, jaw, and hyoid and camera images of the lips were hand-labelled with keypoints, trained using DeepLabCut, and evaluated on unseen speakers and systems. Tongue surface contours interpolated from estimated and hand-labelled keypoints produced a mean sum of distances (MSD) of 0.93, s.d. 0.46 mm, compared with 0.96, s.d. 0.39 mm, for two human labellers, and 2.3, s.d. 1.5 mm, for the best performing edge detection algorithm. A pilot set of simultaneous electromagnetic articulography (EMA) and ultrasound recordings demonstrated partial correlation between three physical sensor positions and the corresponding estimated keypoints; this requires further investigation. The accuracy of estimating lip aperture from camera video was high, with a mean MSD of 0.70, s.d. 0.56 mm, compared with 0.57, s.d. 0.48 mm, for two human labellers. DeepLabCut was found to be a fast, accurate, and fully automatic method of providing unique kinematic data for the tongue, hyoid, jaw, and lips.
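The mean sum of distances (MSD) contour metric quoted above can be implemented as a symmetric mean of nearest-point distances between two contours, as sketched below; the exact contour interpolation and resampling used in the original evaluation may differ.

import numpy as np

def mean_sum_of_distances(contour_a, contour_b):
    """contour_a: (n, 2) and contour_b: (m, 2) arrays of contour points (e.g., in mm)."""
    # Pairwise distances between every point of A and every point of B
    d = np.linalg.norm(contour_a[:, None, :] - contour_b[None, :, :], axis=-1)
    a_to_b = d.min(axis=1)  # nearest point on B for each point of A
    b_to_a = d.min(axis=0)  # nearest point on A for each point of B
    return 0.5 * (a_to_b.mean() + b_to_a.mean())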