Identification of Data Flags

Code here written by Erica Krimmel.

General Overview

In this use case for the iDigBio API, we look at how to search for specimen records that carry a specific data quality flag. More information about iDigBio’s data quality flags is available in the iDigBio documentation.

In this demo we will cover how to:

  1. Write a query to search for specimens using idig_search_records
  2. Explore data quality flags

Load Packages

# Load core libraries; install these packages if you have not already
library(ridigbio)
library(tidyverse)

# Load library for making nice HTML output
library(kableExtra)

Write a query to search for specimen records

First, let’s find all the specimen records for the data quality flag we are interested in. Do this using the idig_search_records function from the ridigbio package. You can learn more about this function from the iDigBio API documentation and ridigbio documentation.

In this example, we search for specimens from the institution with code “fmnh” that are flagged with “rev_geocode_flip”. This flag means that iDigBio has swapped the values of the latitude and longitude fields in order to place the coordinate point in the country stated by the record. For example, suppose iDigBio ingests a record that gives its coordinates (latitude, longitude) as “-87.646166, 41.89542” and says it was collected in the United States. Read as given, those coordinates plot to Antarctica. If the latitude and longitude are flipped, the coordinates plot to the United States, so iDigBio assumes that this is what the data provider meant.
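The flip logic described above can be sketched in a few lines of base R. This is only an illustration and not iDigBio's actual implementation: the bounding box for the contiguous United States is a rough, hand-picked assumption, and `in_bbox()` is a hypothetical helper introduced here.

```r
# Provider-supplied coordinates from the example above, read as (lat, lon)
provider <- list(lat = -87.646166, lon = 41.89542)

# Rough bounding box for the contiguous United States (an approximation
# chosen for illustration only, not a value used by iDigBio)
us_bbox <- list(lat_min = 24, lat_max = 50, lon_min = -125, lon_max = -66)

# Hypothetical helper: does a point fall inside a bounding box?
in_bbox <- function(pt, bbox) {
  pt$lat >= bbox$lat_min && pt$lat <= bbox$lat_max &&
    pt$lon >= bbox$lon_min && pt$lon <= bbox$lon_max
}

# Swap latitude and longitude
flipped <- list(lat = provider$lon, lon = provider$lat)

# Test both orderings against the stated country
as_given_ok <- in_bbox(provider, us_bbox)
flipped_ok <- in_bbox(flipped, us_bbox)
```

Only the flipped ordering falls inside the box, which is the kind of evidence that leads to the flag being set. A real check would use country polygons rather than a rectangle.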

# Edit the fields (e.g. `flags` or `institutioncode`) and values (e.g. 
# "rev_geocode_flip" or "fmnh") in `list()` to adjust your query and the fields
# (e.g. `uuid`) in `fields` to adjust the columns returned in your results
records <- idig_search_records(rq = list(flags = "rev_geocode_flip",
                                         institutioncode = "fmnh"),
                               fields = c("uuid",
                                          "institutioncode",
                                          "collectioncode",
                                          "country",
                                          "data.dwc:country",
                                          "stateprovince",
                                          "county",
                                          "locality",
                                          "geopoint",
                                          "data.dwc:decimalLongitude",
                                          "data.dwc:decimalLatitude"),
                               limit = 100000) %>%
  # Rename fields to more easily reflect their provenance (either from the
  # data provider directly or modified by the data aggregator)
  rename(provider_lon = `data.dwc:decimalLongitude`,
         provider_lat = `data.dwc:decimalLatitude`,
         provider_country = `data.dwc:country`,
         aggregator_lon = `geopoint.lon`,
         aggregator_lat = `geopoint.lat`,
         aggregator_country = country,
         aggregator_stateprovince = stateprovince,
         aggregator_county = county,
         aggregator_locality = locality) %>% 
  # Reorder columns for easier viewing
  select(uuid, institutioncode, collectioncode, provider_lat, aggregator_lat,
         provider_lon, aggregator_lon, provider_country, aggregator_country,
         aggregator_stateprovince, aggregator_county, aggregator_locality)

Here is what our query result data looks like:

uuid | institutioncode | collectioncode | provider_lat | aggregator_lat | provider_lon | aggregator_lon | provider_country | aggregator_country | aggregator_stateprovince | aggregator_county | aggregator_locality
032387ec-d2c0-4e31-9217-06142b99ab45 | fmnh | mammals | -87.646166 | 41.89542 | 41.89542 | -87.64617 | USA | united states | illinois | cook co | NA
04dba613-bb9a-4281-8dba-eb4bf59cd777 | fmnh | mammals | -88.107013 | 41.86614 | 41.86614 | -88.10701 | USA | united states | illinois | dupage co. | wheaton
05679624-d82c-4488-bd4b-ab13f40abb0b | fmnh | mammals | 75 | 38.00000 | 38 | 75.00000 | China | china | xinjiang uygur | kashi pref | taxkorgan tajik aut co, near little kara kul, ‘kara su’ river
0bdf0231-dae7-4de5-a43b-c756e96cb74e | fmnh | mammals | -87.818397 | 42.03420 | 42.034196 | -87.81840 | USA | united states | illinois | cook co. | oak park, pheasent and harlem
0de28396-f117-4a0f-bca7-0d08cc58dc5a | fmnh | mammals | -88.140531 | 41.79461 | 41.79461 | -88.14053 | USA | united states | illinois | dupage co. | naperville, 1520 maple knoll ct.
0f85a79f-2f5b-4d0d-8770-6e3f262c2834 | fmnh | mammals | -88.090019 | 41.71987 | 41.719872 | -88.09002 | USA | united states | illinois | will co. | naperville, 25 w. 540 royce rd.
109555a6-3fcf-43ec-ae83-450ea6e85e5e | fmnh | fishes | -80.85 | -6.45000 | -6.45 | -80.85000 | Peru | peru | NA | NA | lobos de tierra bay
1252e5dc-1fe6-4d78-a775-ba4c0ae5af67 | fmnh | mammals | -88.067012 | 41.87753 | 41.877529 | -88.06701 | USA | united states | illinois | dupage co. | glen ellyn, roosevelt & park
1561e1ce-23b9-43ab-a59c-2b299037f5b2 | fmnh | mammals | -87.973949 | 41.75198 | 41.751975 | -87.97395 | USA | united states | illinois | dupage co. | darien
1ac87a63-1df9-48c4-984b-85f52d8d1f95 | fmnh | mammals | -88.050341 | 41.74697 | 41.746975 | -88.05034 | USA | united states | illinois | dupage co. | woodridge
1f734bc6-130c-48d2-b47f-26cacfa5c722 | fmnh | mammals | 31.3999996 | 24.86667 | 24.8666706 | 31.40000 | Egypt | egypt | matruh | NA | salum, sidi omar
22717ba0-9ec5-4e1c-88fc-26452b9cdb22 | fmnh | mammals | 75 | 38.00000 | 38 | 75.00000 | China | china | xinjiang uygur | kashi pref | taxkorgan tajik aut co, near little kara kul, ‘kara su’ river
27236f31-f92b-4f42-8c76-be4f38599fc7 | fmnh | mammals | -88.050341 | 41.74697 | 41.746975 | -88.05034 | USA | united states | illinois | dupage co. | woodridge
2c475317-1113-4dca-a32c-9f7673026a98 | fmnh | mammals | 31.3999996 | 24.86667 | 24.8666706 | 31.40000 | Egypt | egypt | matruh | NA | salum, sidi omar
30fe1434-1e75-45e2-97d4-84520e0d1f90 | fmnh | mammals | -88.107013 | 41.86614 | 41.86614 | -88.10701 | USA | united states | illinois | dupage co. | wheaton
334800d8-7f2f-472a-92ac-d242e6b4f2bc | fmnh | mammals | -88.090019 | 41.71987 | 41.719872 | -88.09002 | USA | united states | illinois | will co. | naperville, 25 w. 540 royce rd.
33551039-8928-43fe-be46-26a2fb0f0150 | fmnh | invertebrate zoology | -73 | -41.67000 | -41.67 | -73.00000 | Chile | chile | NA | NA | chaica, senode reloncavi, llongothue
37c644b4-b1d8-4ab4-8a82-306502700307 | fmnh | mammals | -89.97818 | 42.08053 | 42.080535 | -89.97818 | USA | united states | illinois | carroll co. | mt. carroll, 1 mile south of mount carroll
38e73860-a1ee-4585-8687-945cdec490ca | fmnh | mammals | -87.646166 | 41.89542 | 41.89542 | -87.64617 | USA | united states | illinois | cook co | NA
39c28ede-a1a2-4654-b530-62b926b522c6 | fmnh | mammals | -88.087747 | 42.63685 | 42.636849 | -88.08775 | USA | united states | wisconsin | kenosha co | kansasville, 23000 burlington rd., 53139
3dc63226-5b38-47f0-b02d-ce6c1e99baea | fmnh | mammals | -88.087747 | 42.63685 | 42.636849 | -88.08775 | USA | united states | wisconsin | kenosha co | kansasville, 23000 burlington rd., 53139
40f45ef7-1fc3-430e-9b84-d198ef87124a | fmnh | invertebrate zoology | -70.012086 | 43.74296 | 43.742961 | -70.01209 | United States of America | united states | maine | cumberland | south harpswell
419092db-710b-4823-9ada-cef1dc27d413 | fmnh | mammals | -87.67913 | 41.96874 | 41.968745 | -87.67913 | USA | united states | illinois | cook co. | chicago, damen and lawrence
422e0874-3c59-4e97-8838-ab0faed00b16 | fmnh | mammals | -87.968099 | 42.27394 | 42.273935 | -87.96810 | USA | united states | illinois | lake co. | libertyville, 911 creastfield ave.
46e7dca6-bd0f-4710-a2ae-066e47a96e59 | fmnh | invertebrate zoology | -73 | -41.66670 | -41.6667 | -73.00000 | Chile | chile | NA | NA | llangothie, senode, relocnavi, chaica
4b6340c2-8d61-4f06-8539-0c174cd03f3b | fmnh | mammals | 75 | 38.00000 | 38 | 75.00000 | China | china | xinjiang uygur | kashi pref | taxkorgan tajik aut co, near little kara kul, ‘subashi’ pass
4c5a8228-8b47-4c9b-b7b7-8a4748061691 | fmnh | mammals | -88.058783 | 41.79092 | 41.790922 | -88.05878 | USA | united states | illinois | dupage co. | lisle, 5321 westview, 60532
4f4ecf74-48cd-4d44-bbec-117ce36cc805 | fmnh | mammals | -89.869212 | 42.25056 | 42.250559 | -89.86921 | USA | united states | illinois | stephenson co. | near pearl city-loran/nw
4f56899f-6bfd-482a-89e8-d47f31ca6b73 | fmnh | mammals | -88.107013 | 41.86614 | 41.86614 | -88.10701 | USA | united states | illinois | dupage co. | wheaton
5c836443-dfbc-4298-a0b9-499f587117b9 | fmnh | mammals | -88.056212 | 41.88147 | 41.881469 | -88.05621 | USA | united states | illinois | dupage co. | glen ellyn, 735 cresent blvd.
6385b5e2-4219-4154-a4b1-aee2e297f0ee | fmnh | mammals | -88.050341 | 41.74697 | 41.746975 | -88.05034 | USA | united states | illinois | dupage co. | woodridge
65b73f92-d0a6-4d95-b2ac-e9f9dcf703fc | fmnh | mammals | -87.646166 | 41.89542 | 41.89542 | -87.64617 | USA | united states | illinois | cook co | NA
6b14ca0a-5a3c-4078-9629-385a0fbb0768 | fmnh | mammals | -88.060564 | 41.84333 | 41.843331 | -88.06056 | USA | united states | illinois | dupage co. | glen ellyn, willowbrook nature trail
6ee726f3-18a8-402f-bdd1-5b8da939dfba | fmnh | mammals | -88.107013 | 41.86614 | 41.86614 | -88.10701 | USA | united states | illinois | dupage co. | wheaton
7204a2b2-512b-4431-a95b-c0ed166a0633 | fmnh | mammals | 29.75 | 24.83333 | 24.833334 | 29.75000 | Egypt | egypt | matruh | NA | siwa oasis, el malfa swamp
7437a46e-f784-4b2b-ba13-b98254b5255b | fmnh | mammals | 29.75 | 24.83333 | 24.833334 | 29.75000 | Egypt | egypt | matruh | NA | el malfa, siwa, 110 km w
763c9b19-74a7-43a3-9a5b-4684fef8a585 | fmnh | mammals | -88.174751 | 41.76673 | 41.766727 | -88.17475 | USA | united states | illinois | dupage co. | naperville, river and aurora
79ff24fc-5a16-4adc-8270-a4576176666c | fmnh | mammals | -88.058356 | 41.87121 | 41.871205 | -88.05836 | USA | united states | illinois | dupage co. | glen ellyn, montclaire and turner
7e51a7c1-de80-4e19-82e8-231cbd440fb7 | fmnh | mammals | -88.107013 | 41.86614 | 41.86614 | -88.10701 | USA | united states | illinois | dupage co. | wheaton
7e8abc27-d38a-47a6-937b-978822afc72f | fmnh | mammals | -87.73599 | 41.79169 | 41.79169 | -87.73599 | USA | united states | illinois | cook co. | chicago, 5555 s. kolmar ave
7f5970d1-b69b-4255-a780-aed284ba1ac8 | fmnh | mammals | -89.4903273 | 45.59772 | 45.5977178 | -89.49033 | USA | united states | wisconsin | NA | oneida, sec 29, town 36 n, range 8e
7fdb9011-93c3-4d11-a6c2-96e3a4764d19 | fmnh | mammals | 31.3999996 | 24.86667 | 24.8666706 | 31.40000 | Egypt | egypt | matruh | NA | salum, sidi omar
823e6998-3bc1-43b1-ab51-3f83b945219d | fmnh | mammals | -87.670626 | 42.02282 | 42.022825 | -87.67063 | USA | united states | illinois | cook co. | chicago, 1550 w. juneway terrace, 60626
836d1a77-3eed-4785-8f3c-7f2bfb33d8ed | fmnh | mammals | -88.087113 | 41.86226 | 41.862257 | -88.08711 | USA | united states | illinois | dupage co. | wheaton, blanchard and illinois
83d136cc-a41e-4128-a682-64aa6dba9e51 | fmnh | mammals | -88.090019 | 41.71987 | 41.719872 | -88.09002 | USA | united states | illinois | will co. | naperville, 25 w. 540 royce rd.
920b9297-a114-4474-aa66-ddaaf6e5ca36 | fmnh | mammals | 29.75 | 24.83333 | 24.833334 | 29.75000 | Egypt | egypt | matruh | NA | siwa oasis, el malfa swamp
92535d43-dcaf-42b9-8e0d-6236a746847d | fmnh | mammals | 75 | 38.00000 | 38 | 75.00000 | China | china | xinjiang uygur | kashi pref | taxkorgan tajik aut co, near little kara kul, ‘kara su’ river
9757887b-ebef-485e-9acc-cd3ed0aa88e4 | fmnh | mammals | -88.060564 | 41.84333 | 41.843331 | -88.06056 | USA | united states | illinois | dupage co. | glen ellyn, willowbrook nature trail
97d7edc5-17e3-44b1-8fa8-b2ccb11a9ab2 | fmnh | mammals | -87.92895 | 41.83281 | 41.832808 | -87.92895 | USA | united states | illinois | dupage co. | oak brook, kimberly and charlatan
9e1b6b23-7b91-4f95-9a95-8c7ef0a232c8 | fmnh | mammals | -87.963927 | 44.52909 | 44.529095 | -87.96393 | USA | united states | wisconsin | brown co. | green bay, 1660 e. shore dr. 54302
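One way to confirm that the flag means what we expect is to check that each provider latitude matches the aggregator longitude, and vice versa. The sketch below uses three rows copied by hand from the table above instead of the live `records` object, so it runs without a network connection. The provider values are stored as character strings, which is an assumption based on how they are displayed, and the tolerance `tol` is an arbitrary choice that absorbs rounding in the aggregator values.

```r
# Three rows copied from the table above; provider values are kept as
# character strings (an assumption about how they are returned)
sample_records <- data.frame(
  provider_lat   = c("-87.646166", "75", "-80.85"),
  provider_lon   = c("41.89542", "38", "-6.45"),
  aggregator_lat = c(41.89542, 38, -6.45),
  aggregator_lon = c(-87.64617, 75, -80.85),
  stringsAsFactors = FALSE
)

# Tolerance that absorbs rounding in the aggregator values (arbitrary choice)
tol <- 1e-4

# A record is flipped when provider latitude equals aggregator longitude
# and provider longitude equals aggregator latitude
is_flipped <- with(
  sample_records,
  abs(as.numeric(provider_lat) - aggregator_lon) < tol &
    abs(as.numeric(provider_lon) - aggregator_lat) < tol
)
```

Applied to the full `records` data frame, the same comparison would show whether any flagged record does not follow the simple swap pattern.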

If a data provider wants to fix these records in a local collection management system, it might be useful to have them as a CSV file rather than only as an object in R. Here is how we can save our results as a CSV:

# Save `records` as a CSV for reintegration into a local collection management
# system
write_csv(records, "records.csv")

It is important for you as a data provider or data user to review the results of the data quality flags and confirm that iDigBio’s interpretation matches your expectations. For example, coordinates representing marine localities and localities in or near Antarctica are prone to misinterpretation.
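A simple way to prioritize that review is to separate records whose provider coordinates are impossible as given from records where both orderings are geometrically valid. The sketch below is a suggestion and not part of iDigBio's workflow. It uses three coordinate pairs from the table above plus one hypothetical record, added for contrast, whose stated latitude is out of range.

```r
# Provider coordinates from three rows of the table above, plus one
# hypothetical record (the last) whose stated latitude is out of range
provider_lat <- c(-87.646166, -80.85, 31.3999996, -122.4)
provider_lon <- c(41.89542, -6.45, 24.8666706, 37.8)

# A latitude must lie in [-90, 90] and a longitude in [-180, 180]
valid_as_given <- abs(provider_lat) <= 90 & abs(provider_lon) <= 180
valid_if_flipped <- abs(provider_lon) <= 90 & abs(provider_lat) <= 180

# When both orderings are valid, the flip rests only on the country
# match, so those records deserve a manual look
needs_review <- valid_as_given & valid_if_flipped
```

Here the first three records are marked for review, while the hypothetical fourth is not, because a latitude of -122.4 cannot be correct as given.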