Project description:Biodiversity information in the form of species occurrence records is key for monitoring and predicting current and future biodiversity patterns, as well as for guiding conservation and management strategies. However, the reliability and accuracy of this information are frequently undermined by taxonomic and spatial errors. Additionally, biodiversity information facilities often share data in diverse, incompatible formats, precluding seamless integration and interoperability. We provide a comprehensive quality-controlled dataset of occurrence records of the Class Demospongiae, which comprises 81% of the entire Porifera phylum. Demosponges are ecologically significant as they structure rich habitats and play a key role in nutrient cycling within marine benthic communities. The dataset aggregates occurrence records from multiple sources, applies dereplication and taxonomic curation techniques, and flags potentially incorrect records based on expert knowledge of each species' bathymetric and geographic distributions. It yields 417,626 records of 1,816 accepted demosponge species (of which 321,660 records of 1,495 species are flagged as potentially correct), which are provided under the FAIR principles of Findability, Accessibility, Interoperability and Reusability in the Darwin Core Standard. This dataset constitutes the most up-to-date baseline for studying demosponge diversity at the global scale, enabling researchers to examine biodiversity patterns (e.g., species richness and endemicity) and to forecast potential distributional shifts under future scenarios of climate change.
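The dereplication and depth-based flagging described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Occurrence` fields, the placeholder species names, and the depth limits are all hypothetical, and duplicates are assumed to be records that match exactly on species, coordinates, and depth.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    # Hypothetical, simplified record; real Darwin Core records carry many more terms.
    scientific_name: str
    latitude: float
    longitude: float
    depth_m: float


def dereplicate(records):
    """Drop exact duplicates while keeping first-seen order."""
    return list(dict.fromkeys(records))


def flag_by_depth(records, depth_ranges):
    """Pair each record with a flag.

    True  -> depth lies inside the expected range for the species
    False -> depth lies outside it (potentially incorrect record)
    None  -> no expected range available, so the record is not assessed
    """
    flagged = []
    for rec in records:
        limits = depth_ranges.get(rec.scientific_name)
        if limits is None:
            flagged.append((rec, None))
        else:
            shallowest, deepest = limits
            flagged.append((rec, shallowest <= rec.depth_m <= deepest))
    return flagged


# Illustrative values only (not taken from the dataset).
records = [
    Occurrence("Species A", 10.0, 20.0, 50.0),
    Occurrence("Species A", 10.0, 20.0, 50.0),   # exact duplicate
    Occurrence("Species A", 11.0, 21.0, 900.0),  # deeper than the assumed range
    Occurrence("Species B", 0.0, 0.0, 5.0),      # no range known
]
expected_depths = {"Species A": (0.0, 200.0)}    # hypothetical expert limits

for rec, ok in flag_by_depth(dereplicate(records), expected_depths):
    print(rec.scientific_name, rec.depth_m, ok)
```

Keeping the flag alongside each record, rather than deleting flagged records, mirrors the description above, where records are retained and marked so that users can decide how to filter them.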
Project description:We compiled modern and fossil relative abundances of the coccolithophore species Florisphaera profunda from published and unpublished datasets, along with ocean environmental variables from satellite remote sensing and physical measurements. The database includes relative abundances of F. profunda in sediment trap (n = 26), core-top (n = 1258), and sediment core (n = 104) samples. Downcore data cover the Last Glacial Maximum (n = 94, 24-19 ka) or the Mid-to-Late Holocene (n = 77, <6 ka). This database allows the modern and past biogeography of F. profunda to be studied as a response to changing ocean and climate conditions; see "Quantitative reconstruction of primary productivity in low latitudes during the last glacial maximum and the mid-to-late Holocene from a global Florisphaera profunda calibration dataset" (Hernández-Almeida et al., 2018).
Project description:Shallow groundwater (GW), defined as the water table of unconfined or perched aquifers that is near enough to the land surface to influence the vadose zone and the surface soil moisture, impacts land surface water, energy, and carbon cycles by providing additional moisture to the root zone via capillary fluxes. Although the interactions of shallow GW and the terrestrial land surface are widely recognized, incorporating shallow GW into land surface, climate, and agroecosystem models is not yet possible due to the lack of groundwater data. Groundwater systems are affected by various factors, including climate, land use/land cover, ecosystems, GW extractions, and lithology. Although GW wells are the most direct and accurate way of monitoring water table depths at point scales, upscaling GW levels from point scale to areal or regional scale poses significant challenges. Here, we provide high spatiotemporal resolution global maps of the terrestrial land surface areas influenced by shallow GW from mid-2015 to 2021 (a separate NetCDF file for each year) at 9 km spatial and daily temporal resolution. We derived these data from NASA's Soil Moisture Active Passive (SMAP) mission spaceborne soil moisture observations, which have a temporal resolution of 3 days and an approximately 9 km grid resolution. This spatial scale corresponds to SMAP's "Equal Area Scalable Earth" (EASE) grids. The central assumption is that the monthly moving average of soil moisture observations and their coefficient of variation are sensitive to shallow GW regardless of the prevailing climate. We process the Level-2 enhanced passive soil moisture SMAP (SPL2SMP_E) product to detect shallow GW signals. The presence of shallow GW is estimated by an ensemble machine learning model, which is trained using simulations from a variably saturated soil moisture flow model (Hydrus-1D). The simulations span various climates, soil textures, and lower boundary conditions.
The spatiotemporal distribution of shallow GW based on SMAP soil moisture observations is provided for the first time with this dataset. The data are of value in a wide variety of applications. The most direct use is in climate and land surface models, either as lower boundary conditions or as a diagnostic tool to verify model results. Other applications may include flood risk analyses and regulation, identifying geotechnical issues such as shallow GW-triggered liquefaction, global food security, ecosystem services, watershed management, crop yield, vegetation health, water storage trends, and tracking mosquito-borne diseases by identifying wetlands.
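The two predictors named in the central assumption above, a moving average of soil moisture and its coefficient of variation, can be sketched as follows. This is only an illustration under stated assumptions: the trailing window length, the use of the population standard deviation, and the sample values are hypothetical choices, and the sketch does not reproduce the ensemble machine learning model or the SPL2SMP_E processing.

```python
from statistics import mean, pstdev


def rolling_mean_and_cv(values, window):
    """Trailing-window mean and coefficient of variation (std / mean).

    `window` is the number of consecutive observations per window. With a
    revisit of about 3 days, roughly 10 observations would span a month
    (an assumption made for this sketch, not a value from the dataset).
    """
    results = []
    for end in range(window, len(values) + 1):
        chunk = values[end - window:end]
        avg = mean(chunk)
        cv = pstdev(chunk) / avg if avg else float("nan")
        results.append((avg, cv))
    return results


# Illustrative volumetric soil moisture series (m3/m3), not real SMAP data.
soil_moisture = [0.31, 0.30, 0.32, 0.31, 0.30, 0.12, 0.25, 0.18, 0.09, 0.22]

for avg, cv in rolling_mean_and_cv(soil_moisture, window=5):
    print(f"mean={avg:.3f}  cv={cv:.3f}")
```

In this toy series the early windows are wet and steady (high mean, low variation) while the later windows are drier and more variable, which is the kind of contrast the assumption relies on.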
Project description:Species distribution records are a prerequisite to follow climate-induced range shifts across space and time. However, synthesizing information from various sources such as peer-reviewed literature, herbaria, digital repositories and citizen science initiatives is not only costly and time-consuming, but also challenging, as data may contain thematic and taxonomic errors and generally lack standardized formats. We address this gap for important marine ecosystem-structuring species of large brown algae and seagrasses. We gathered distribution records from various sources and provide a fine-tuned dataset with ~2.8 million dereplicated records, taxonomically standardized for 682 species, that takes into account important physiological and biogeographical traits. Specifically, a flagging system was implemented to signal potentially incorrect records reported on land, in regions with limiting light conditions for photosynthesis, and outside the known distribution of species, as inferred from the most recent published literature. We document the procedure and provide a dataset in tabular format based on the Darwin Core Standard (DwC), alongside a set of functions in the R language for data management and visualization.
Project description:A tracer breakthrough curve (BTC) for each sampling station is the ultimate goal of every quantitative hydrologic tracing study, and dataset size can critically affect the BTC. Groundwater-tracing data obtained using in situ automatic sampling or detection devices may result in very high-density datasets. Data-dense tracer BTCs obtained using in situ devices and stored in dataloggers can result in visually cluttered overlapping data points. The relatively large amounts of data detected by high-frequency settings available on in situ devices and stored in dataloggers ensure that important tracer BTC features, such as data peaks, are not missed. However, such dense datasets can also be difficult to interpret. Even more difficult is the application of such dense datasets in solute-transport models, which may not be able to adequately reproduce tracer BTC shapes due to the overwhelming mass of data. One solution to the difficulties associated with analyzing, interpreting, and modeling dense datasets is the selective removal of blocks of data from the total dataset. Although it is possible to skip blocks of tracer BTC data periodically (data decimation) so as to lessen the size and density of the dataset, skipping or deleting blocks of data may also result in missing the important features that the high-frequency detection settings were intended to capture. Rather than removing, reducing, or reformulating data overlap, signal filtering and smoothing may be used, but smoothing errors (e.g., averaging errors, outliers, and potential time shifts) need to be considered. Probability distributions appropriate to tracer BTCs may be used to describe typical tracer BTC shapes, which usually include long tails. Recognizing which probability distributions apply to tracer BTCs can help in understanding some aspects of tracer migration.
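The trade-off described above between data decimation and smoothing can be seen in a small sketch. The concentration series is synthetic, and the decimation step and moving-average window are arbitrary choices made for illustration, not settings from the study.

```python
def decimate(samples, step):
    """Periodic decimation: keep every `step`-th sample."""
    return samples[::step]


def moving_average(samples, window):
    """Centered moving average (odd `window`); edges use shorter windows."""
    half = window // 2
    smoothed = []
    for i in range(len(samples)):
        chunk = samples[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Synthetic tracer concentrations with a sharp peak and a long tail.
btc = [0.0, 0.0, 1.0, 5.0, 20.0, 6.0, 2.0, 1.0, 0.5, 0.2]

print("raw peak:       ", max(btc))
print("decimated peak: ", max(decimate(btc, 3)))         # the peak sample is skipped
print("smoothed peak:  ", max(moving_average(btc, 3)))   # the peak is kept but damped
```

With this series, keeping every third sample drops the peak value entirely, whereas the moving average retains a peak at the same position but lowers its height, which is the averaging error mentioned above.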
Project description:Tidal marshes store large amounts of organic carbon in their soils. Field data quantifying soil organic carbon (SOC) stocks provide an important resource for researchers, natural resource managers, and policy-makers working towards the protection, restoration, and valuation of these ecosystems. We collated a global dataset of tidal marsh soil organic carbon (MarSOC) from 99 studies that includes location, soil depth, site name, dry bulk density, SOC, and/or soil organic matter (SOM). The MarSOC dataset includes 17,454 data points from 2,329 unique locations in 29 countries. We generated a general transfer function for the conversion of SOM to SOC. Using these data, we estimated a median (± median absolute deviation) value of 79.2 ± 38.1 Mg SOC ha⁻¹ in the top 30 cm and 231 ± 134 Mg SOC ha⁻¹ in the top 1 m of tidal marsh soils globally. These data can serve as a basis for future work and may contribute to the incorporation of tidal marsh ecosystems into climate change mitigation and adaptation strategies and policies.
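The summary statistic reported above, a median with its median absolute deviation (MAD), can be computed as in the sketch below. It assumes the plain, unscaled MAD (no normal-consistency factor), which the description does not specify, and the stock values are invented for illustration.

```python
from statistics import median


def median_and_mad(values):
    """Return (median, unscaled median absolute deviation)."""
    values = list(values)
    center = median(values)
    mad = median(abs(v - center) for v in values)
    return center, mad


# Hypothetical SOC stocks (Mg SOC per hectare), not values from MarSOC.
stocks = [50.0, 70.0, 80.0, 120.0, 300.0]
center, mad = median_and_mad(stocks)
print(f"{center} ± {mad} Mg SOC/ha")
```

Both statistics are insensitive to the single very large value in the example, which is one reason a median and MAD are often preferred over a mean and standard deviation for skewed stock data.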
Project description:Soil microbial biomass carbon (SMBC) is important in regulating soil organic carbon (SOC) dynamics along soil profiles by mediating the decomposition and formation of SOC. The dataset (VDMBC) describes the vertical distributions of SOC, SMBC, and soil microbial quotient (SMQ = SMBC/SOC) and their relations to environmental factors across five continents. Data were collected from the literature, with a total of 289 soil profiles and 1040 observations in different soil layers compiled. The associated environmental data collected include climate, ecosystem types, and edaphic factors. We developed this dataset by searching the Web of Science and the China National Knowledge Infrastructure for publications from 1970 to 2019. All the data in this dataset met two criteria: 1) there were at least three mineral soil layers along a soil profile, and 2) SMBC was measured using the fumigation-extraction method. The data in tables and texts were obtained from the literature directly, and the data in figures were extracted using the GetData Graph Digitizer software version 2.25. When climate and soil properties were not available from publications, we obtained the data from the World Weather Information Service (https://worldweather.wmo.int/en/home.html) and SoilGrids at a spatial resolution of 250 meters (version 0.5.3, https://soilgrids.org). The units of all the variables were converted to standard international units or commonly used ones, and the values were transformed correspondingly. For example, the value of soil organic matter (SOM) was converted to SOC by using the equation (SOC = SOM × 0.58). This dataset can be used to predict global SOC changes along soil profiles with multi-layer soil carbon models. It can also be used to analyse how soil microbial biomass changes with plant roots as well as the composition, structure, and functions of soil microbial communities along soil profiles at large spatial scales.
This dataset offers opportunities to improve our prediction of SOC dynamics under global changes and to advance our understanding of the environmental controls.
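The two formulas quoted in this description, SOC = SOM × 0.58 and SMQ = SMBC/SOC, can be applied as in the short sketch below. The function names and the example values are hypothetical, and the sketch assumes that SMBC and SOC are expressed in the same units before the quotient is taken.

```python
SOM_TO_SOC_FACTOR = 0.58  # conversion factor stated in the dataset description


def som_to_soc(som):
    """Convert soil organic matter to soil organic carbon (same units)."""
    return som * SOM_TO_SOC_FACTOR


def microbial_quotient(smbc, soc):
    """Soil microbial quotient, SMQ = SMBC / SOC (both in the same units)."""
    if soc <= 0:
        raise ValueError("SOC must be positive to compute SMQ")
    return smbc / soc


# Hypothetical soil layer: SOM of 50 g/kg and SMBC of 0.58 g/kg.
soc = som_to_soc(50.0)
smq = microbial_quotient(0.58, soc)
print(f"SOC = {soc:.1f} g/kg, SMQ = {smq:.3f}")
```

Converting SOM to SOC first and only then computing SMQ keeps every profile on a common carbon basis, so quotients from studies that reported SOM remain comparable with those that reported SOC.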
Project description:There is increasing evidence that smallholder farms contribute substantially to food production globally, yet spatially explicit data on agricultural field sizes are currently lacking. Automated field size delineation using remote sensing and the estimation of average farm size at the subnational level using census data are two approaches that have been used. However, both have limitations: automatic field size delineation using remote sensing has not yet been implemented at a global scale, while the spatial resolution is very coarse when using census data. This paper demonstrates a unique approach to quantifying and mapping agricultural field size globally using crowdsourcing. A campaign was run in June 2017, where participants were asked to visually interpret very high resolution satellite imagery from Google Maps and Bing using the Geo-Wiki application. During the campaign, participants collected field size data for 130 K unique locations around the globe. Using this sample, we have produced the most accurate global field size map to date and estimated the percentage of different field sizes, ranging from very small to very large, in agricultural areas at global, continental, and national levels. The results show that smallholder farms occupy up to 40% of agricultural areas globally, which means that, potentially, there are many more smallholder farms in comparison with the two different current global estimates of 12% and 24%. The global field size map and the crowdsourced dataset are openly available and can be used for integrated assessment modeling, comparative studies of agricultural dynamics across different contexts, training and validation of remote sensing field size delineation, and potential contributions to the Sustainable Development Goal to end hunger, achieve food security and improved nutrition, and promote sustainable agriculture.
Project description:Volcanic eruptions differ enormously in their size and impacts, ranging from quiet lava flow effusions along the volcano flanks to colossal events with the potential to affect our entire civilization. Knowledge of the time and size distribution of volcanic eruptions is of obvious relevance for understanding the dynamics and behavior of the Earth system, as well as for defining global volcanic risk. From the analysis of recent global databases of volcanic eruptions extending back more than 2 million years, I show here that the return times of eruptions with similar magnitude follow an exponential distribution. The associated relative frequency of eruptions with different magnitude displays a power-law, scale-invariant distribution over at least six orders of magnitude. These results suggest that similar mechanisms underlie explosive eruptions from small to colossal, raising concerns about the theoretical possibility of predicting the magnitude and impact of impending volcanic eruptions.
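One practical consequence of exponentially distributed return times can be shown with a short sketch. If the intervals between eruptions of a given magnitude are exponential with rate λ, then λ can be estimated as the reciprocal of the mean interval, and the probability of at least one such eruption within a time horizon t is 1 − exp(−λt). The intervals below are invented for illustration and are not drawn from the eruption databases analysed in this work.

```python
import math


def rate_from_intervals(intervals):
    """Maximum-likelihood rate of an exponential distribution: 1 / mean interval."""
    return len(intervals) / sum(intervals)


def prob_at_least_one(rate, horizon):
    """Probability of one or more events within `horizon`, given exponential return times."""
    return 1.0 - math.exp(-rate * horizon)


# Hypothetical return times (years) between eruptions of similar magnitude.
intervals_years = [100.0, 300.0, 200.0]

rate = rate_from_intervals(intervals_years)
print(f"rate = {rate:.4f} per year")
print(f"P(at least one eruption in 200 years) = {prob_at_least_one(rate, 200.0):.3f}")
```

Because the exponential distribution is memoryless, this probability does not depend on how long ago the last eruption occurred, which is consistent with the concern raised above about predicting impending eruptions.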
Project description:Vegetation impacts on ecosystem functioning are mediated by mycorrhizas, plant-fungal associations formed by most plant species. Ecosystems dominated by distinct mycorrhizal types differ strongly in their biogeochemistry. Quantitative analyses of mycorrhizal impacts on ecosystem functioning are hindered by the scarcity of information on mycorrhizal distributions. Here we present global, high-resolution maps of vegetation biomass distribution by dominant mycorrhizal associations. Arbuscular, ectomycorrhizal, and ericoid mycorrhizal vegetation store, respectively, 241 ± 15, 100 ± 17, and 7 ± 1.8 GT carbon in aboveground biomass, whereas non-mycorrhizal vegetation stores 29 ± 5.5 GT carbon. Soil carbon stocks in both topsoil and subsoil are positively related to the community-level biomass fraction of ectomycorrhizal plants, though the strength of this relationship varies across biomes. We show that human-induced transformations of Earth's ecosystems have reduced ectomycorrhizal vegetation, with potential ramifications for terrestrial carbon stocks. Our work provides a benchmark for spatially explicit and globally quantitative assessments of mycorrhizal impacts on ecosystem functioning and biogeochemical cycling.