Review metadata quality

Perspective

A good way to think about the quality review process is to reverse roles with the people who will be searching for data like yours.

  • Would your search find the catalog entry based on the keywords, geographic area, time frame, abstract, and other metadata elements?

  • Based on the title of the catalog entry, would you realize the record is relevant to your needs?

  • Is any information (or lack of information) in the record confusing?

  • Is any important or relevant information missing?

  • If you found the catalog entry you are reviewing, would you be satisfied with the level of detail? Or would you be frustrated by gaps and ambiguity?

In short, keep in mind that a wide variety of people across the world may use your metadata, and we want to make sure the data is discoverable and useful to them.

The following sections describe metadata best practices to guide your review of your new metadata record before submitting it for publication in GeoData. We recommend submitters seek a second opinion on the record from a colleague or peer who can provide content feedback.

Overall metadata considerations

The following criteria should be applied to all metadata fields:

  • All acronyms and abbreviations are defined.

  • The reader is not expected to understand the internal organizational structure of USDA or other represented agencies and organizations.

  • Use plain language. Make sentences easy to read and understandable by someone whose primary language is not English.

  • All URLs are correct and resolve properly. Resources are adequately described.

  • Text is formatted for display.
    • No “hard returns” resulting from copy/paste of fixed line length text.

    • No extraneous hyphens if copy/pasted from automatically hyphenated justified text.
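
The two formatting checks above can be partly automated before text is pasted into a metadata field. The following is a minimal sketch, not a GeoData feature: the function name clean_pasted_text is hypothetical, and it assumes paragraphs are separated by blank lines.

```python
import re


def clean_pasted_text(text: str) -> str:
    """Remove hard returns and end-of-line hyphenation from pasted text.

    Hypothetical helper; assumes blank lines separate paragraphs.
    """
    # Normalize Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Rejoin words split by a hyphen at a line end: "con-\ntains" -> "contains".
    text = re.sub(r"(?<=[a-z])-\n(?=[a-z])", "", text)
    # Keep paragraph breaks, but collapse every other run of whitespace
    # (including single hard returns) into one space.
    paragraphs = re.split(r"\n\s*\n", text)
    return "\n\n".join(" ".join(p.split()) for p in paragraphs if p.strip())
```

Review the result by hand. A genuine compound that happened to break at a line end (for example "well-" followed by "known") is also joined, so the sketch can remove hyphens that should stay.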

The following fields and content are generally mandatory for validation as well as acceptance into NAL GeoData:

Title

  • Briefly summarizes the dataset

  • Makes it easy to distinguish this dataset from others
    • Includes data description, location name, time frame, and/or other distinguishing characteristics

    • Data description phrase is not too broad (“agricultural data” is not particularly useful)

Abstract

  • Summarizes the Who, What, When, Where, Why, and How of the data

  • Starts with the most important information
    • Move details about the purpose of collecting the data to the Purpose element

  • Describes the GIS layer or dataset, NOT the publication in which the data are described
    • If there are publications, there should be online resources that link to them

Data time frame (temporal extent)

  • Beginning and ending dates/times are correct

  • For on-going data collection, the special value (“nil reason”) of “now” is used
    • Do not use the date the metadata record was written for continuing data

  • For on-going data that only periodically updates the corresponding online resource, the end date is the date of the last observation in the resource
    • The update frequency is specified

    • The metadata record will be updated when new data is published in the dataset
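
The date rules above can be expressed as a small check. This is an illustrative sketch only: the function name, the ISO 8601 date strings, and the optional metadata_written argument are assumptions, not part of GeoData.

```python
from datetime import date
from typing import List, Optional


def check_temporal_extent(
    begin: str, end: str, metadata_written: Optional[str] = None
) -> List[str]:
    """Return a list of problems found in a temporal extent.

    begin and metadata_written are ISO dates (YYYY-MM-DD); end is an
    ISO date or the special value "now" for on-going collection.
    """
    problems: List[str] = []
    begin_date = date.fromisoformat(begin)
    if end == "now":
        # On-going data collection: nothing further to compare.
        return problems
    end_date = date.fromisoformat(end)
    if end_date < begin_date:
        problems.append("end date precedes begin date")
    if metadata_written is not None and end_date == date.fromisoformat(metadata_written):
        # An end date equal to the authoring date often means it was used
        # as a stand-in for continuing data.
        problems.append("end date equals the metadata date; use 'now' if collection is on-going")
    return problems
```

An empty list means none of these checks failed. It does not confirm that the dates themselves are correct.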

Location (geographic extent)

  • The bounding box closely corresponds to the location in which the data were collected
    • A bounding box that extends well beyond the data will result in bad geospatial search results (false positives) for users

  • Exception: The bounding box may be intentionally broad as allowed by legal mandates and approved data management policies
    • Examples include protecting cultural or rare resources, maintaining the anonymity of private property locations, etc.
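
One way to judge whether a bounding box "closely corresponds" to the data is to compare it with the extent of the observation points. The sketch below assumes points are (longitude, latitude) pairs in decimal degrees and that the box does not cross the antimeridian. All function names are hypothetical, and areas are compared in square degrees, which is only a rough measure.

```python
from typing import Iterable, Tuple

# (west, south, east, north) in decimal degrees
BBox = Tuple[float, float, float, float]


def data_extent(points: Iterable[Tuple[float, float]]) -> BBox:
    """Smallest box containing every (longitude, latitude) point."""
    pts = list(points)
    lons = [lon for lon, _ in pts]
    lats = [lat for _, lat in pts]
    return (min(lons), min(lats), max(lons), max(lats))


def covers(declared: BBox, actual: BBox) -> bool:
    """True if the declared box contains the whole data extent."""
    return (
        declared[0] <= actual[0]
        and declared[1] <= actual[1]
        and declared[2] >= actual[2]
        and declared[3] >= actual[3]
    )


def excess_ratio(declared: BBox, actual: BBox) -> float:
    """Declared area divided by data-extent area (1.0 is a perfect fit)."""

    def area(box: BBox) -> float:
        return max(box[2] - box[0], 0.0) * max(box[3] - box[1], 0.0)

    actual_area = area(actual)
    if actual_area == 0.0:
        # A single point or a line has no area to compare against.
        return float("inf")
    return area(declared) / actual_area
```

A large ratio suggests the declared box may produce false positives in geospatial searches. What counts as "large" is a judgment call, and an intentionally broad box under the exception above is still valid.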

Keywords

  • Applied keywords represent significant aspects of the dataset

  • At least one keyword has been added for each of the thesauri listed below

Keyword review:

The following keyword groups were essential for the pilot test with ARS unless otherwise noted:

  • ISO topics represent the main domain(s) for the data
    • Most agricultural data is included in the “farming” topic

    • “biota” is used for natural environments

    • “boundaries” is used for legal and administrative boundaries

    • “environment” is used for resources, protection, and conservation

    • “geoscientificInformation” covers earth sciences, including soil, erosion, and hydrogeology

  • Location is specified
    • Use both GCMD location keywords and FIPS codes

  • Particularly relevant location-based theme keywords must be included
  • ADC License Type
    • We encourage Creative Commons licenses, most notably U.S. Public Domain and Creative Commons CCZero for federally funded data

    • Other available licenses include Creative Commons Attribution and Creative Commons Attribution Share-Alike

    • See https://creativecommons.org/licenses/ for descriptions of the licenses

  • Crossref Funding
  • Mandatory keywords for ARS data inclusion in Data.gov:
    • OMB Bureau Codes

    • Program Code

    • Include National Program Number in free text keywords using the NPxxx protocol (your program number will replace the “xxx”)

  • NALT terms
  • Data Source Affiliation
    • Applies during the pilot phase to all ARS and LTAR groups

  • Ag Data Commons keywords
    • Improves categorization once ingested into this catalog

  • GCMD Earth Science keywords and GCMD Instrument keywords are strongly encouraged
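
The keyword checks above lend themselves to a simple pre-submission script. The sketch below is an assumption-laden illustration: it represents keywords as a dictionary keyed by thesaurus name, uses a hypothetical "free text" group, and assumes that the "xxx" in NPxxx stands for exactly three digits. Adjust the pattern if your program number has a different form.

```python
import re
from typing import Dict, Iterable, List

# Assumption: the National Program number is exactly three digits.
NP_PATTERN = re.compile(r"^NP\d{3}$")


def review_keywords(
    keywords: Dict[str, List[str]], required_groups: Iterable[str]
) -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems: List[str] = []
    for group in required_groups:
        # A missing group and an empty group are both failures.
        if not keywords.get(group):
            problems.append(f"no keyword for {group}")
    free_text = keywords.get("free text", [])
    if not any(NP_PATTERN.match(keyword) for keyword in free_text):
        problems.append("no NPxxx National Program keyword in free text")
    return problems
```

For example, with sample values chosen only for illustration:

```python
record = {
    "ISO topics": ["farming"],
    "NALT terms": [],
    "free text": ["NP211"],
}
print(review_keywords(record, ["ISO topics", "NALT terms"]))
```

This prints a single problem for the empty "NALT terms" group.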

Contacts

You must include at least one contact in each of the following sections:

  • Data: Point of Contact / Creator / Primary Investigator for the data

  • Metadata: Author of the metadata record in GeoData or primary contact for the metadata if created elsewhere

Effective use of snippets (reusable metadata elements)

  • Reusable components are used for all contact information wherever possible

  • The appropriate person (permanent responsibility) or position (current responsibility) has been designated
    • Both data and metadata contact sections use snippets (if possible)