Among environmental variables, climate data are perhaps the most readily available, relevant for the distribution of organisms on a global scale, and provide essential information for determining impacts of climate change on distribution [111,112]. Another direct impact of biotechnology could be episodic genetic erosion, which could threaten the genetic diversity on which this technology depends. Applications of biodiversity theories in conservation, Department of Ecosystem Modelling, Georg-August Universität Göttingen, This is an open access article distributed under the terms of the, https://doi.org/10.7287/peerj.preprints.27054v1. Researchers who have compiled data from multiple sources for a particular analysis can better ensure that these data are accessible, and get credit for the work involved in integrating datasets, by formally publishing data with descriptive metadata and obtaining a persistent DOI [75]. The prevalence of inaccessible databases and incomplete database citations indicates that many biodiversity researchers lack the resources to manage and preserve data for the long term and/or are unaware of best practices. The best way for taxonomic experts to help ensure that nomenclature for their group is current is to engage with the community-supported and specialist-edited taxonomic database projects in their respective fields. The most traditional use of collections data is for taxonomy, so it is not surprising that over 50% of taxonomy papers also involve collections and literature data. The biggest obstacle for biodiversity data users is obtaining records of sufficient quantity and quality for the region and taxonomic group of interest [24,25].
The second and third most common environmental data types used were geographic and habitat, which usually included GIS layers for elevation and land use and/or vegetation (see S1 Table). Such efforts unlock previously inaccessible data and expand their availability to researchers around the world. However, it is unclear how often studies actually address issues of error and bias when using opportunistic records. In an age when population is exponentially increasing and biodiversity is being depleted due to man-made environmental degradation, biotechnology should come to the rescue of mankind by providing greater and efficient means of utilizing the available biodiversity. We characterize a variety of ways in which researchers are using species occurrence records by assessing the prevalence of individual tags corresponding to topics of interest. Elevation, land use, and vegetation data are also among the most readily available environmental data types, and are often relevant for evaluating species distribution at smaller spatial scales [113]. Many papers include more than one taxon, and we use an "all taxa" categorization for studies that use all available data within the species occurrence database(s), such as GBIF. However, most records in GBIF, for example, still do not have uncertainty radii; in a recent assessment of GBIF records for Odonata, Ephemeroptera, Plecoptera, and Trichoptera from the U.S.A., we found that the percentage of records with uncertainty radii associated with them was only 7–36% for these aquatic insect groups (as of April 2017). We obtained citation numbers for each paper from the GS search results at the time of downloading records (April 2017) [58]. http://www.conabio.gob.mx/remib_ingles/doctos/remibnodosdb.html? To do this, we characterized 501 papers that use openly accessible biodiversity databases.
Geographic errors (or missing information) may be more readily corrected and associated with appropriate uncertainty estimates using standardized methods [31,37] and online tools (i.e. Other forms of bias were rarely addressed in only 12% of papers and include temporal bias (usually seasonal bias for certain times of year, or bias for certain years where specialists are active), taxonomic bias (e.g. The use of species occurrence data for conservation followed predicted trends. Other invertebrate phyla, such as Mollusca, are highly diverse as well (estimated 70,000–76,000 living species) [95]. However, this may be an effect of small sample sizes. We then determined the average number of citations for papers involving each data use. Data papers and those describing a new database will increase over time as new venues have grown supporting such publications. data papers, n = 117), taxonomy (n = 95), conservation (n = 68), data quality (n = 68), invasive species (n = 61), and that described a new database (n = 60, Fig 1); see S1 Table for full descriptions of each category of research use. This would raise the value of the material, resulting in increased collection pressure on that plant, which in turn would lead to overexploitation and species loss. Data quality papers tend to focus evenly on the two most easily corrected issues (spatial and taxonomic, each 40% of data quality papers), followed by accounting for spatial bias (29% of data quality papers), effort (25%), and correcting specimen identification (18%).
We argue that the scale of data that needs processing, along with issues of often sparse data, data obsolescence [109], and data of uncertain quality, make large-scale analyses challenging for anyone but a small group of data sciences-savvy end users. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Our standardized tagging protocol was based on key topics of interest, including: database(s) used, taxa addressed, general uses of data, other data types linked to species occurrence data, and data quality issues addressed. Copyright: 2019 Ball-Damerow et al. However, it is possible that many studies simply use available data and may not appropriately evaluate data quality. The only data types that have increased over time were specimen collection, genetic, and phylogenetic data (Fig 7). While studies overall were less common for vertebrates than for plants, vertebrates may generally be more suitable for distribution studies because the group is less diverse, many collections are completely digitized, there are prolific citizen science communities reporting bird observations in particular, and data for individual species are more likely to contain sufficient numbers of records. Many taxa and regions are still highly under-sampled or completely unrepresented (e.g. Globally, we find that biodiversity databases are still in the initial stages of data compilation. Biotechnological methods lead to the identification of a plant material for an important pharmaceutical use. Such efforts require both taxonomic and geospatial skills, although some automation may be possible [128].
The overall prevalence of plants in this work corroborated a recent bibliometric study, which found that 56% of biodiversity-related papers addressed plants, compared to 29% for vertebrates and 23% for invertebrates [90]. The authors declare that they have no competing interests. See S1 Table for detailed descriptions of each research type. Traditional methods for dealing with these issues may include subsampling, data aggregation, and additional surveys [7]. We found that drift and stochasticity appear much less frequently in conservation studies than selection processes typical of niche theory. The prevalence of most uses did not change from 2010–2016, with the exception of data papers and taxonomy-related studies, which both increased (Fig 2); taxonomy studies usually involved developing regional species checklists. When data quality is addressed, it is usually done manually, and workflows are difficult to document, extend, and share. This is particularly true with reference to: (i) increased availability of food, feed and other renewable raw materials; (ii) improved human health and hygiene; (iii) greater protection of the environment, and (iv) enhancement of biosafety and environment-friendly technologies. Adverse biological effects on non-target populations and ecological and evolutionary disruption may be either the direct result of the introduced transgene(s) or alternatively the indirect result of socioeconomic conditions related to the application of recombinant DNA technologies. The most commonly studied taxa were plants (n = 232 papers, 46%), followed by invertebrates (n = 125, 25%), vertebrates (n = 124, 25%), all taxa (n = 40, 8%), fungi (n = 16, 3%), and paleontological specimens (n = 14, 3%; Table 3). http://www.environment.gov.au/science/abrs/publications/other/numbers-living-species/contents#copyright. The total number of invertebrate studies was equivalent to the total number of vertebrate studies (Fig 3).
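The taxon percentages quoted from Table 3 can be checked with a few lines of arithmetic. The sketch below is only an illustration, not part of the study's workflow: it assumes each percentage is the paper count divided by the 501 relevant papers and rounded to the nearest whole number, and the variable and function names are hypothetical.

```python
# Paper counts per taxon group as reported in the text (Table 3).
# Assumption: percentages are taken over the 501 relevant papers.
TOTAL_PAPERS = 501

taxon_counts = {
    "plants": 232,
    "invertebrates": 125,
    "vertebrates": 124,
    "all taxa": 40,
    "fungi": 16,
    "paleontological specimens": 14,
}


def percent_of_total(count, total=TOTAL_PAPERS):
    """Return count as a whole-number percentage of total."""
    return round(100 * count / total)


# Hypothetical helper structure: taxon name -> rounded percentage.
percentages = {taxon: percent_of_total(n) for taxon, n in taxon_counts.items()}

for taxon, pct in percentages.items():
    print(f"{taxon}: n = {taxon_counts[taxon]}, {pct}%")

# The counts sum to more than 501 because a paper can address several taxa.
print("sum of counts:", sum(taxon_counts.values()))
```

The rounded values reproduce the reported 46%, 25%, 25%, 8%, 3%, and 3%. The counts add up to 551 rather than 501, which is consistent with the statement that many papers include more than one taxon.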
Non-experts can check for spatial outliers or incorrect georeferences using standardized methods and online georeferencing tools [37,117]. Once a country attains the capacity to manage its genetic resources, it will automatically enable it to produce novel products from its own biodiversity. This is best illustrated by an example. Some GM crops have been shown to affect soil ecosystems by decreasing the rate of decomposition of organic wastes, affecting carbon and nitrogen levels and decreasing the diversity of soil microbial populations. At the same time, new approaches to publishing and integrating previously disconnected data resources promise to help provide the evidence needed for more efficient and effective conservation and management. Unfortunately, the decline in resources devoted to the field of taxonomy does not bode well for achieving a unified taxonomic backbone usable for resolving all taxonomic issues [124,125]. taxonomy, ecology, biodiversity informatics) and geographical regions [110]. Uses involving other online data types (i.e. Data Availability: The data files underlying this article are published in Zenodo, and can be found at: https://zenodo.org/record/2589439#.XKfWOutKjBI (DOI: 10.5281/zenodo.2589439). One major problem is that many papers using biodiversity data have obtained data from an aggregator, such as GBIF, which has potentially drawn from thousands of original data sources. The indirect impacts of biotechnology on biodiversity are predominantly socioeconomic ones, operated through human economic and social systems.
Automated data quality annotations are growing within the major online data aggregators (e.g. Even with correct identification, names in species occurrence repositories may still be incorrect and need validation [36]. Birds in particular have relatively good data available, in part because of online citizen science efforts and associated open data platforms, such as eBird [3]. Some expected trends include the following: We identify 347 primary biodiversity databases used in papers from our dataset (S2 Table), the URL for each database, and the scale (institution, regional, global, taxa) and regional or taxonomic focus (e.g. What are the most common uses, general taxa addressed, and data linkages, and how have they changed over time? elevation; n = 106, Fig 6). Sources of potential biases in opportunistic occurrence data have also been well-documented in previous work and generally include variation in collection effort and taxonomic, spatial, and temporal biases [4,40–45]. As predicted, climate is often a critical data type linked to occurrence records, especially for species distribution where it is the most commonly linked data type, and for diversity/population studies where it is a close second. butterflies, Danaus plexippus). Even for those who attempt to cite sources, many journals do not allow large numbers of citations in the reference section, and the only solution is to cite sources in a supplement or appendix which does not provide citation credit [77]. ); and 4.) Data types fall within one of four categories, including 1.) attributes of occurrence information, 2.) Twenty-six percent (n = 501; see S1 File for citation information) of the papers in the final evaluated dataset (n = 1,934) were relevant according to these criteria. Australia, fish) of each database.
Furthermore, neutral theory makes less intuitive assumptions than niche theory and does not consider trophic interactions. We determine how studies link primary biodiversity data to other data types by characterizing the variety of data compiled and used in each study (see S1 Table for full descriptions of 28 data linkage tags). Continued growth of data publications will enhance the efficiency and relevance of the field in addressing biodiversity conservation and environmental management. Is the biodiversity research community citing databases appropriately, and are the cited databases currently accessible online? This sector is expected to contribute up to 50% of the world economy in the near future. However, identifying decline requires large numbers of records along with systematically collected surveys over time, which often do not exist for rare and potentially threatened species [108]. GEOLocate, www.geo-locate.org). But if specimens exist, this information can be verified or corrected by taxonomic experts. The average number of data linkages per paper was four (ranging from one to 11). The combined data of massive authority file efforts spanning multiple taxon groups, such as those covered by WoRMS, allow for novel approaches to data analysis [127]. data papers and new database development). In addition, we determine prevalence of these tags over time to assess positive or negative trends. The number of species addressed will increase over time as more data become available online and projects leverage broader-scale data. Continued data digitization, publication, enhancement, and quality control efforts are necessary to make biodiversity science more efficient and relevant in our fast-changing environment. What data quality issues tend to be addressed for the top uses? e0215794. However, only a subset of these have uncertainty radii associated. Most papers had multiple use tags assigned (mean = 2.5, max = 7).
It is concerning that a relatively large proportion of studies does not explicitly address data quality: only 69% of studies in our dataset reported addressing one or more aspects of data quality. Google Scholar (GS) provides full-text indexing, which was important for identifying data sources that often appear buried in the methods section of a paper. We ended with 31 potential research use tags, as listed and described in S1 Table. We developed a list of potential tags and descriptions for each topic; a full list with descriptions of tags is provided in S1 Table. Novel and integrative applications are restricted to certain taxonomic groups and regions with higher numbers of quality records. Biases that result from variation in collection effort across space, time, taxonomic groups, and environments are also well-known problems in opportunistic biodiversity records [32,41,42,92]. Data quality improvements on a large scale will require additional investment in data enhancements (e.g. What uses have the highest impact, as measured through the mean number of citations per year? The most common data uses associated with the major taxonomic groups reflect the general maturity of data products associated with the respective group. The relatively high percentage of data papers that involve collections data (44%) reflects recent digitization efforts for natural history collections [1,9,13,116]. The most commonly checked data quality issues for papers involving species distribution were spatial errors (28% of distribution studies), taxonomic nomenclature (27%), spatial bias (24%), specimen identification (21%), and excluding inappropriate records (19%; Table 6). Given the speed of taxonomic concept changes [126], lack of updated resources is a significant impediment to proper data integration. data from literature, field surveys, species catalogues, private data); 2.)
Many records are also prone to missing important information or information loss over time, particularly the absence of geographic coordinates and associated uncertainty estimates [31]. As a result of insufficient data citation practices and lack of data preservation, data are either completely lost or it is impossible to reproduce the dataset used and results. Continued advocacy for data publication will be important to maximize the potential usability of all biodiversity data. The high prevalence of studies compiling occurrence records from other sources indicates a continued demand for more and continued specimen sampling, and the need for more progress in getting these data into online databases (i.e. Sometimes the compiled data eventually make it into online data aggregators, such as GBIF, and sometimes they do not. Marco Sciaini analyzed the data, authored or reviewed drafts of the paper, approved the final draft, conducted systematic literature search. The increasing number of data papers over time reflects progress in digitization and online platforms for reporting observations through citizen science, as well as increases in journals that support data publication. While distribution studies were still the most common application across groups, significantly smaller percentages of plant (33%) and invertebrate (41%) studies dealt with species distribution. Large-scale improvements in data availability and fitness will require interdisciplinary effort and collaboration. While vertebrates have more data, they are by no means complete [102]; less-studied vertebrates (i.e. Other groups may lack online sources or have sources that are significantly out of date [123].
The most cited databases include: the Global Biodiversity Information Facility (GBIF [10]), Barcode of Life Data Systems that includes species occurrence and genetic data (BOLDSystems [59, 60]), SpeciesLink [61], Ocean Biogeographic Information System (OBIS [62]), Australasian Virtual Herbarium (AVH [63]), Tropicos [64], FishBase [65,66], Fishes of Texas [67], and CONABIO REMIB (Table 1, [68]); note that we did not find significant changes over the study time period (2010–2017) in usage of individual databases, likely due to insufficient data points per year. Our study corroborated a recent bibliometric analysis of the larger field of biodiversity research, finding that more studies address plants (46% of studies using biodiversity databases) than vertebrates (25%) and invertebrates (25%). However, we still have not reached the major goal of having online taxonomic data sources that are consistently updated by taxonomic experts for all species, although community-supported resources such as FishBase [65], WoRMS [120], and the latter's affiliated databases such as MilliBase [121] and MolluscaBase [122] are approaching that goal for many taxonomic groups. In the aforementioned survey assessment of user needs for primary biodiversity data [23,24], these same categories of use were among the top ways in which people listed that they use primary biodiversity data. Indirect impacts of biotechnology are immense and of very great relevance to people in developing countries who rely directly on biodiversity for their sustenance. Study reproducibility, strongly linked to data persistence [78], is a key principle in the scientific process and a growing concern across scientific disciplines (e.g. J. Damerow subsequently checked each tagged paper from the first 1,000 papers to maintain consistency and became the sole tagger for an additional 934 papers.
For example, micropropagation and the consequent production of identical clones discourage perpetuation of genetic diversity through evolutionary adaptations. Neutral theory assumes that the establishment and success of an individual in a community does not depend on its species identity, but is instead predominantly driven by a stochastic process. Museums and funding agencies have invested considerable resources to digitize information from natural history specimens, make their data openly accessible [11,12], and sustain platforms to provide access to those data. Our overarching goal in this study is to determine how such usage has developed since 2010, during a time of unprecedented growth of online data resources. Previous work has outlined best practices for publication of biodiversity data [69–74] and scientific data more generally (e.g. Data papers and papers describing a new database have increased over time (Fig 2), which is likely to be the result of the introduction and expansion of many data journals [69,85], online platforms for reporting species occurrence observations such as iNaturalist [86] and eBird [3,87], and efforts over the past decade to digitize specimen records [1,13]. Spatial errors and taxonomic nomenclature are generally the easiest data quality errors to correct. Estimates for rates of collection misidentification range from 5–60%, depending on the taxonomic group [11,34,35]. We then determine the average number of data link tags associated with the six top uses, and the most common data type associated with each of these top uses. https://doi.org/10.1371/journal.pone.0215794.g003, https://doi.org/10.1371/journal.pone.0215794.t003.
Another possible direct impact of GMOs raised for conferring viral resistance is the likely emergence of new viruses with new biological characteristics through recombination. This is a preprint submission to PeerJ Preprints. Many users outside of the community of trained collection scientists may not understand or be interested in taxonomic concepts [1]. Data quality studies often included a variety of data linkages, with little sorting of top linkages, likely representing the high dimensionality of data quality issues. When data are available, researchers must check for common errors and biases known to occur in opportunistic datasets that are often assembled over long time periods (e.g. https://doi.org/10.1371/journal.pone.0215794.t007. Several direct non-target effects on beneficial and native organisms by GMOs have been reported. Online taxonomic catalogues and tools to check records against updated catalogues are available for correcting taxonomic nomenclature [118,119]. PLoS ONE 14(9): We characterize papers that address major data quality issues known to be associated with species occurrence data, including both common errors and biases. The top research uses for online species occurrence databases, from our dataset of 501 relevant papers, were studies on species distribution (n = 175), diversity/population studies that usually assess species richness (n = 122), dataset description (i.e. Most papers focused on numbers of species in the single or double digits (Table 4). Neutral theory provides the benefits of a community theory whereas niche theory focuses on single species. Our search was therefore restricted to GS and to the time period of 2010 through the date of the search (April 2017; note when looking at trends over time we remove 2017, as the year was not complete in our dataset).
Funding: This research was supported in part through a Bass Postdoctoral fellowship to J. Ball-Damerow at the Field Museum of Natural History (Chicago, USA), under the mentorship of P. Sierwald and R. Bieler, and by the Negaunee Foundation. The Relationship between Biotechnology and Biodiversity is Multidirectional (UNEP 1995): (i) Biotechnology or Molecular Biology provides very powerful tools for critical assessment of biodiversity, especially genetic diversity, and consequently the identification of potential bio-resources. Online databases with detailed information on organism occurrences collectively contain well over one billion records, and the numbers continue to grow. Our world is in the midst of unprecedented change: climate shifts and sustained, widespread habitat degradation have led to dramatic declines in biodiversity rivaling historical extinction events. How often are major data quality issues addressed? Therefore, despite misidentification being a well-known problem, this issue is less often directly addressed in papers. Neutral approaches have been used in conservation to generate realistic species-abundance distributions and species-area relationships, provide a standard against which to compare species loss, prioritize species protection, model biological invasions, and support protected area design. The biodiversity community is still in an active stage of compiling existing biodiversity data and dealing with issues of data quality. The most common data quality issues addressed will be checks for correct taxonomic nomenclature and georeferences, which can often be assessed with readily-available online resources.
Adverse impacts on biodiversity through the introduction of GMOs may also result from disturbance of the dynamic population equilibrium of ecosystems. Automating digitization of such specimens, especially pinned insects and fluid-preserved invertebrates, faces significant obstacles [12,18,97–100]. Many researchers do not sufficiently cite databases used [76,77], and links to many databases become invalid over time [78–80]. environmental data (e.g. However, papers are relevant if they use these other types of occurrence data in addition to online databases of primary occurrence records (see section on data linkages, below), or if they compile these types of occurrence records and deposit them into an existing online biodiversity data aggregator (e.g. The prevalence of plants in studies that use online biodiversity databases may be due to a strong history of plant diversity work in Europe in particular, and the relative ease with which herbarium records can be digitized by scanning herbarium sheets. Environmental data used in conjunction with online biodiversity records are often applied in studies of species distribution. A higher percentage of data papers, taxonomy, and barcoding papers involved invertebrates (Fig 4), reflecting in part the high taxonomic diversity for this group and need for more data. We also determine uses with the highest number of citations, how online occurrence data are linked to other data types, and if/how data quality is addressed. Papers with the highest maximum number of citations per year focused on disease ecology, species diversity, and publishing data (each with a maximum of 97 citations/year; Table 2); we did not account for self-citation here.
The terms included: species occurrence database (8,800 results), natural history collection database (634 results), herbarium database (16,500 results), biodiversity database (3,350 results), primary biodiversity data database (483 results), museum collection database (4,480 results), digital accessible information database (10 results), and digital accessible knowledge database (52 results); note that quotations are used as part of the search terms where specific phrases are needed in whole. Data quality issues are often dictated by the specific use. Data on insect distributions are less complete (or non-existent) for most species and hence may not be suitable for distribution and conservation studies [92,93]. Twenty-six percent of databases (n = 90) cited in one or more papers from our dataset were totally inaccessible at the time of this assessment. An example is the transgenic Bt cotton plant, which affects a wide array of non-target insects such as butterflies, moths and beetles.