DataCite Schema

From The SBN PDS4 Wiki

Analysis and summary of the DataCite DOI database schema version 4.3 (released August 2019) and the accompanying documentation, prepared with PDS data set applications in mind.

Note: Green notes indicate recommended values based on (so far limited) node discussions at SBN. Additions and suggestions are very welcome - please contact Anne Raugh.

<resource>

ROOT

This is the root of the submission file document, and thus required.

Note that the content of the <resource> element is defined under an xs:all group, so that the immediate child nodes can appear in any order. The order here reflects the order in which they are listed in the schema.
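
As a rough orientation, a minimal submission file containing only the required children might look something like the sketch below. The namespace and schema location are the ones published for DataCite kernel 4.3; the DOI, creator, title, and year are placeholder values, not a real record.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<resource xmlns="http://datacite.org/schema/kernel-4"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4.3/metadata.xsd">
  <!-- All values below are placeholders for illustration only -->
  <identifier identifierType="DOI">10.12345/abcde</identifier>
  <creators>
    <creator>
      <creatorName nameType="Personal">Doe, Jane</creatorName>
    </creator>
  </creators>
  <titles>
    <title>Example Title of a Data Collection</title>
  </titles>
  <publisher>NASA Planetary Data System</publisher>
  <publicationYear>2019</publicationYear>
  <resourceType resourceTypeGeneral="Dataset">PDS4 Refereed Data Collection</resourceType>
</resource>
```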

<identifier>

REQUIRED

This is the assigned DOI. It must begin with "10.", the directory indicator common to all DOI names, followed by the prefix assigned to the registrant.

Required attribute: identifierType. This must have a value of DOI. For example:
     <identifier identifierType="DOI">10.12345/abcde</identifier>

<creators>

REQUIRED

This is the author list or equivalent. Creators should appear in priority order. Both individuals and institutions/organizations can be credited as creators.

If the only list we have is a list of editors, then it goes here, but also indicate in the <contributors> section that each person in the list is a contributor with the role of Editor. This is not an optimal solution, but it will do for now.

<creator>

REQUIRED, repeatable

This class contains the identifying information for one creator. It is repeated for each creator. Note that the metadata schema does not distinguish between "authors" and "editors". If some of the creators listed are editors rather than authors, repeat their creator information in the <contributors> section and indicate a role of "Editor".

Fill out as much of this class as possible with information that was apropos at the time of the publication. So, if you know given name, include it; if you know the institutional affiliation, include it; if you can find and confirm the correct ORCID, include it; etc. This all helps in linking the data set into the literature and the author's network.

<creatorName>

REQUIRED

The string representing the name as it should appear in citations. For personal names, the format is "Family, Given" (all one string). Full first names are preferred for the metadata, in order to help identify authors unambiguously. (It is also relatively easy to turn a complete name into an initial.) For organizational names, use the full, formal name of the organization.

This tag has one optional XML attribute, nameType, which takes one of these values:

Organizational
Personal

Best practice is to always include this attribute for creatorName.

<givenName>

OPTIONAL

For personal names, this should contain the string corresponding to the given name in the <creatorName> element. This should not be present for organizational names (but that is not enforced by the DataCite schema).

<familyName>

OPTIONAL

For personal names, this should contain the string corresponding to the family name (surname, patronymic, etc.). This should not be present for organizational names (but that is not enforced by the DataCite schema).

Note: For PDS purposes, at least, we should consider where to associate suffixes (Jr., Sr., III, etc.). If DataCite, ADS, or the AAS journals have a convention, we should follow that.

<nameIdentifier>

OPTIONAL, repeatable

This element provides a formal identifier for an individual or organization - for example, an author's ORCID, or an organizational DOI. Only public identifiers should appear here, of course. If there is more than one applicable identifier, this element may be repeated.

Required attribute: nameIdentifierScheme. The value should be the common name or acronym for the identifier given (like "ORCID").
Optional attribute: schemeURI. The value should be the URI of the defining organization or schema (e.g., "http://orcid.org").

Where it can be obtained directly from the creator, or where we can be certain about a creator's ORCID, we should add it to the metadata.

<affiliation>

OPTIONAL, repeatable

This element provides an organizational affiliation (as free-format text) for the creator. It may be repeated. Organizational "creators" may also have affiliations, if that makes sense.

Optional attribute: affiliationIdentifier. The value should be a formal, permanent, unique identifier for the affiliated organization, beyond the name that is the value of the <affiliation> element. For example, a Research Organization Registry (ROR) ID would be ideal here.
Optional attribute: affiliationIdentifierScheme. The (human-recognizable) name of the identifier system (e.g., "ROR" would be sufficient in the above case).
Optional attribute: schemeURI. The URI of the identifier scheme, typically a URL.
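
To show how the pieces of the <creator> class fit together, here is a sketch of a <creators> block with one personal and one organizational creator. The names, the ORCID, and the ROR ID are all placeholders; substitute confirmed values.

```xml
<creators>
  <!-- Personal creator; the name, ORCID, and ROR ID are placeholders -->
  <creator>
    <creatorName nameType="Personal">Doe, Jane</creatorName>
    <givenName>Jane</givenName>
    <familyName>Doe</familyName>
    <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org">0000-0000-0000-0000</nameIdentifier>
    <affiliation affiliationIdentifier="https://ror.org/000000000"
                 affiliationIdentifierScheme="ROR"
                 schemeURI="https://ror.org">Example University</affiliation>
  </creator>
  <!-- Organizational creator: no givenName or familyName -->
  <creator>
    <creatorName nameType="Organizational">Example Instrument Team</creatorName>
  </creator>
</creators>
```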

<titles>

REQUIRED

This class lists names or titles for the resource being identified. At least one title must be provided. This is the title that will be used to format citations, and it is also the title that users will see returned by various search interfaces. It needs to make sense in a general, non-PDS search context, so avoid acronyms and assuming that users will know that, for example, "Deep Impact" is also the name of a spacecraft and NASA mission.

Remember the primary title has to make sense in the context of an ADS-type search. Users may not know that data sets are included in their result set. The title we put here will be used to formulate citations of the data set as well as helping users to identify data of potential interest. So avoid acronyms (no matter how obvious they seem now) unless they include the full name. For example, "DIF" is not good on its own, but "Deep Impact Flyby (DIF) Spacecraft" is very good - explicit and contains the acronym that knowledgeable users might search on as well.

<title>

REQUIRED, repeatable

This element contains a single title. It may be repeated for alternate titles where appropriate.

Optional attribute: titleType. This must have one of the following values:
  • AlternativeTitle
  • Subtitle
  • TranslatedTitle
  • Other

For PDS purposes, the formal title should always be listed first without a titleType attribute, and additional titles should probably always be either alternatives or translations and identified accordingly through the titleType attribute.

Optional attribute: xml:lang. This should contain one of the standard ISO 2- or 3-letter codes (but this is not validated). Note that this indicates only the language of the associated title string, not the language of the resource.

PDS documents are required to be in English, so we should have no use for this. Our IPDA partners, however, might. In that sort of context, it would be prudent to include the xml:lang attribute for all <title> elements, not just the one identified as the TranslatedTitle.
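
A <titles> block following the recommendations above might look like this sketch. Both title strings are invented for illustration; the point is the ordering (formal title first, without titleType) and the use of xml:lang on every <title>.

```xml
<titles>
  <!-- Formal title first, with no titleType attribute -->
  <title xml:lang="en">Deep Impact Flyby (DIF) Spacecraft Example Image Collection</title>
  <!-- Hypothetical alternate title -->
  <title xml:lang="en" titleType="AlternativeTitle">DIF Example Images</title>
</titles>
```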

<publisher>

REQUIRED

This element identifies the publisher/distributor/curator of the resource. It is used in creating citations.

For PDS4 data sets that have LIDs that begin with urn:nasa:pds, this should always be NASA Planetary Data System. If we are assigning DOIs for other publishers (like ESA), we should first determine what their publisher title should be, and we should probably be using a DOI prefix that can be unique to that publishing archive.

If we are assigning a DOI and also serving the data for a non-PDS publisher, then we should list our node and facility in the <contributors> section, with a role of DataCurator. (The first time this comes up we should think about this again, just in case.)

Optional attribute: xml:lang. This attribute is used to indicate the language (using the standard ISO codes) of the publisher name. Unless otherwise specified, this is assumed to be in English.

<publicationYear>

REQUIRED

The year the resource was made available to the public. This is used in creating citations.

For PDS data sets, this must be the four-digit year in which the data were publicly posted in the format and version associated with this DOI. This may or may not be the same year as listed in the CITATION_DESC field for legacy data sets. When in doubt, assume the DATA_SET_RELEASE_DATE is correct unless you have documentation that proves it is not - and then use the date in that documentation and note the discrepancy and resolution in an additional <description> field (following) with a descriptionType of Other.

This date must agree with the Available date (see below) provided for ADS processing.

There are other date fields in which significant dates can be indicated if needed or desired.
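
For a PDS4 data set with a LID beginning with urn:nasa:pds, the two elements would look something like this fragment. The year is a placeholder.

```xml
<publisher xml:lang="en">NASA Planetary Data System</publisher>
<!-- Placeholder year; it must match the year of the Available date -->
<publicationYear>2019</publicationYear>
```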

<resourceType>

REQUIRED

This element takes a free-format text description of the type of resource associated with the DOI.

Required attribute: resourceTypeGeneral. This must have one of the following values:
  • Audiovisual
  • Collection
  • DataPaper
  • Dataset
  • Event
  • Image
  • InteractiveResource
  • Model
  • PhysicalObject
  • Service
  • Software
  • Sound
  • Text
  • Workflow
  • Other

Best/recommended practice is to consider the resourceTypeGeneral as the broader term which is then modified by the value string, so that a classification can be formed by concatenating the two with '/'. So, for example:

    <resourceType resourceTypeGeneral="Dataset">PDS4 Refereed Data Collection</resourceType>

would read as "Dataset/PDS4 Refereed Data Collection".

Best practice for "Text", specifically, is for the value to be taken from the CASRAI dictionary "Output Types" Sub-Element list at http://dictionary.casrai.org/Output_Types.

Here are the values to use for non-text SBN cases:

DOI target | resourceTypeGeneral | resourceType
PDS3 archived data set | Dataset | PDS3 Refereed Data Set
PDS4 archived data product | Dataset | PDS4 Refereed Data Product
PDS4 archived collection product | Dataset | PDS4 Refereed Data Collection
PDS4 archived bundle product, collections do not have DOIs | Dataset | PDS4 Refereed Data Bundle
PDS4 archived bundle product, collections have their own DOIs | Collection | PDS4 Refereed Data Collection
PDS3 safed data set | Dataset | PDS3 Save-the-bits Data Set

In general, we should aim to be consistent with Dublin Core usage for these terms, but decisions here will have consequences elsewhere in the database. Consistency across PDS would be highly desirable here.

<subjects>

OPTIONAL

This element lists keyword-type classifications as are commonly associated with journal articles.


<subject>

OPTIONAL

This element provides a string that corresponds to a keyword or similar classifier for the resource. It may be repeated as desired. Each occurrence should contain only a single taxonomic-type entry, and the taxonomy should be indicated via the optional attributes as far as possible.

Optional attribute: subjectScheme. This should be the name of the taxonomy or authority. There is no controlled value list.
Optional attribute: schemeURI. This should be a reference to the taxonomy definition or reference site.
Optional attribute: valueURI. If there is a URL, for example, for the definition of the specific term being used, include it here.
Optional attribute: xml:lang. Use this attribute to provide the standard ISO abbreviation for the language of the term.

PDS really should find a reference taxonomy to use specifically for this field and the corresponding label fields. Mike Kelley is currently working on enhancing the current Unified Astronomy Thesaurus (UAT) taxonomy for that. Use this as it becomes available (wiki page pending - email us if you have an immediate need).

Note that for hierarchical taxonomies, a single instance of <subject> should express the entire hierarchy as a single string in the appropriate notation - there is no implied relationship between <subject> elements.
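
As an illustration, a <subjects> block might be sketched as follows. The terms are examples only and should be confirmed against the taxonomy before use; "Example Taxonomy" and its "/" notation are hypothetical, included only to show a full hierarchy carried in a single string.

```xml
<subjects>
  <!-- Term and URI are illustrative; confirm against the taxonomy before use -->
  <subject subjectScheme="Unified Astronomy Thesaurus (UAT)"
           schemeURI="http://astrothesaurus.org"
           xml:lang="en">Comets</subject>
  <!-- Hypothetical local scheme showing a full hierarchy in one string -->
  <subject subjectScheme="Example Taxonomy">Small bodies/Comets/Short-period comets</subject>
</subjects>
```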

<contributors>

OPTIONAL

This element provides a means for identifying people and organizations, other than the previously identified <creator>, who contributed to the creation, management, curation, distribution, etc., of the resource being described.

<contributor>

OPTIONAL

This element identifies a person or organization who made or makes some contribution to the resource. There is a required attribute to define the type of contribution. The element may be repeated as needed.

Required attribute: contributorType. This must have one of the following values:
  • ContactPerson
  • DataCollector
  • DataCurator
  • DataManager
  • Distributor
  • Editor
  • HostingInstitution
  • Producer
  • ProjectLeader
  • ProjectManager
  • ProjectMember
  • RegistrationAgency
  • RegistrationAuthority
  • RelatedPerson
  • Researcher
  • ResearchGroup
  • RightsHolder
  • Sponsor
  • Supervisor
  • WorkPackageLeader
  • Other
These are all defined in the appendix to the DataCite schema description document, but interpreting them into a PDS context should be done with care.
Here are some suggestions:
Contributor Role | Use for
DataCollector | People involved in collecting or compiling the data object who are not otherwise involved in generating labels or archive support information and are not otherwise credited as authors or editors.
DataCurator | The PDS Node/Subnode where the primary copy of the data physically resides.
Editor | Someone who formatted or otherwise altered the content of data product files or archive labels created by someone else.
Producer | Someone who is involved in designing and creating the PDS4 archive labels and support files, who is not otherwise credited.
Other contributor roles may be used if appropriate, but generic roles should be avoided and no role should be used gratuitously. In particular, do not use the ContactPerson or RelatedPerson roles, as these do not age well in an archive and will not be maintained - at least not by me.

<contributorName>

REQUIRED

The name of a single person or organization contributing. As in the case of <creator>, this should be in the format "Family, Given" for personal names, and the formal name for organizations.

Optional attribute: nameType. It must have one of these two values:
  • Personal
  • Organizational

Best practice is to use the optional attribute.

<givenName>

OPTIONAL

The given name of a personal name, analogous to the same field for <creatorName>.

<familyName>

OPTIONAL

The surname or patronymic of a personal name, analogous to the same field for <creatorName>.

<nameIdentifier>

OPTIONAL

A formal identifier for a person or organization, such as a personal ORCID or an organizational DOI. It may be repeated if there is more than one applicable identifier.

Required attribute: nameIdentifierScheme. This is the type of the identifier ("ORCID" or "DOI", e.g.).
Optional attribute: schemeURI. This is a URI reference to the identifier definition or defining organization.

<affiliation>

OPTIONAL

This element contains the name of an organization or institution with which the named contributor is affiliated. It is a free-format text field. It should be repeated for each unique affiliation when there is more than one.

In the archiving case, this must be interpreted as "affiliation at the time of publication of the product(s)" - in other words, affiliation on the date given for <publicationYear>.

Optional attribute: affiliationIdentifier. The value should be a formal, permanent, unique identifier for the affiliated organization, beyond the name that is the value of the <affiliation> element. For example, a Research Organization Registry (ROR) ID would be ideal here.
Optional attribute: affiliationIdentifierScheme. The (human-recognizable) name of the identifier system (e.g., "ROR" would be sufficient in the above case).
Optional attribute: schemeURI. The URI of the identifier scheme, typically a URL.
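
Putting the <contributor> pieces together, a sketch of a <contributors> block with an editor and a curating node could look like this. All names and the affiliation are placeholders.

```xml
<contributors>
  <!-- Names and affiliation are placeholders -->
  <contributor contributorType="Editor">
    <contributorName nameType="Personal">Doe, Jane</contributorName>
    <givenName>Jane</givenName>
    <familyName>Doe</familyName>
    <affiliation>Example University</affiliation>
  </contributor>
  <contributor contributorType="DataCurator">
    <contributorName nameType="Organizational">Example PDS Node</contributorName>
  </contributor>
</contributors>
```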

<dates>

OPTIONAL

This element provides a way to include various significant dates in the DOI database record.

<date>

OPTIONAL

One significant date for the resource. Dates should be in ISO 8601 format and can be to any precision (but this is not schematically enforced). This element may be repeated as needed for each date.

Required attribute: dateType. This indicates the significance of the date and must be one of the following values:
  • Accepted
  • Available
  • Collected
  • Copyrighted
  • Created
  • Issued
  • Other
  • Submitted
  • Updated
  • Valid
  • Withdrawn
These are defined in the DOI Schema description document.
Optional attribute: dateInformation. This should be a very brief clarification of the dateType, where necessary.

Wherever it is possible to do so, SBN must include a <date> element with a dateType of Available giving the year and month of publication, so the year must agree with the previously listed <publicationYear> value. This is the date ADS will use in generating statistics over various periods, so it is important that at least the month be present whenever it is known. As with <publicationYear>, the PDS3 DATA_SET_RELEASE_DATE should usually be considered authoritative.
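
A <dates> block carrying the Available date might be sketched like this. Both dates are placeholders, and the Collected entry is only an example of an additional date given as an ISO 8601 interval.

```xml
<dates>
  <!-- Year and month of public release; the year matches <publicationYear> -->
  <date dateType="Available">2019-08</date>
  <!-- Hypothetical observation period as an ISO 8601 interval -->
  <date dateType="Collected" dateInformation="Observation period">2005-07-01/2005-07-04</date>
</dates>
```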

<language>

OPTIONAL

The natural language of the resource. This is defined as being of type xs:language, which provides syntax validation but does not actually fully enforce that values come from the "IETF BCP 47, ISO 639-1 language code," as specified in the description.

For most purposes, "English" is assumed and there is no need to attempt to distinguish between American, Canadian, British, or any other sub-categories of modern English. If you're a completionist, though, feel free to include <language>en</language> in your DOI metadata.

<alternateIdentifiers>

OPTIONAL

This element lists alternate identifiers for the same instance of the resource (as opposed to physically distinct, duplicate copies with their own identifiers). The identifiers should be unique and controlled within some context which should be specified.

In the DOI metadata context, PDS3 Data Set IDs and PDS4 LIDVIDs are "alternate identifiers". Typically we would want the DSID or LIDVID to be included in a citation, so we need to pay particular attention to content here to make sure it is consistent and thus programmatically retrievable. PDS4 LIDs without the version ID are not identifiers in the context of DOIs, because the thing they point to is not permanently fixed (a LID alone refers to the "latest available version", which may change).

For PDS Product | Use
PDS3 Product | DSID:PRODUCT_ID
PDS3 Data Set | DATA_SET_ID
PDS4 Product | Product LIDVID
PDS4 Collection | Collection LIDVID
PDS4 Bundle | Bundle LIDVID*
*Note that a Bundle that does not change its own version ID every time a collection it contains changes its version ID should not have its own DOI. The DOI needs to be associated with an unchanging entity. If the bundle's version ID doesn't change when any of its collections changes, then a given bundle LIDVID is not an "unchanging entity".

<alternateIdentifier>

OPTIONAL

This element provides one instance of an alternate identifier for the resource. It may be repeated as desired.

Required attribute: alternateIdentifierType. This string must describe the source or context of the identifier.
For PDS Identifier | Use alternateIdentifierType
PDS3 DSID:PRODUCT_ID | PDS3 Product ID
PDS3 DSID | PDS3 Dataset ID
PDS4 Product LIDVID | PDS4 Product ID
PDS4 Collection LIDVID | PDS4 Collection ID
PDS4 Bundle LIDVID | PDS4 Bundle ID
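
For a PDS4 collection, the alternate identifier entry might look like the sketch below. The LIDVID shown is hypothetical; use the actual LIDVID from the product label.

```xml
<alternateIdentifiers>
  <!-- Hypothetical PDS4 collection LIDVID -->
  <alternateIdentifier alternateIdentifierType="PDS4 Collection ID">urn:nasa:pds:example_bundle:example_collection::1.0</alternateIdentifier>
</alternateIdentifiers>
```

A PDS3 data set would carry its DATA_SET_ID instead, with an alternateIdentifierType of "PDS3 Dataset ID".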

<relatedIdentifiers>

OPTIONAL

This element lists identifiers for other resources related to this resource in some specific way. This is an important element to include in our DOI metadata. This is where we tie into the published literature, so some effort should be put into getting this right. It's also where we can relate different versions of the same PDS product (each of which will have its own DOI).

Citation Considerations

This is where we indicate which papers and other data sets are cited by the data set getting the DOI. This is not a "reading list". When in doubt, consider the data set/product as if it were being prepared for publication in a refereed journal. If the citation would be appropriate in that context, then it is likely appropriate in this one.


Citing Conventions

Here are some draft guidelines, developed during a discussion among the node personnel at SBN/UMD, for where to look for citations to include in DOI metadata as relatedIdentifiers of the type Cites or IsCitedBy:

Raw and Calibrated Data
  • Cites any inline citation in the text of the overview and/or intro document
  • Cites any published papers describing the mission and/or relevant instrumentation referenced by the archive
  • Calibrated data Cites raw data and calibration data; raw and calibration data IsCitedBy calibrated data
High-level Data
  • Cites immediate precursor products (e.g., mosaic Cites calibrated images; resampled data Cites full-resolution data); precursor data IsCitedBy high-level data
  • Cites published paper describing the derivation/processing, if any.
Calibration Data
  • Cites mission and hardware papers, as appropriate.

Other Relationships

The next most important relationship to note here is that between older and newer versions. Where both things have DOIs, the metadata should be updated in both cases to provide bi-directional pointing. This can also be used to indicate predecessors even when there are multiple predecessors that were combined, or one that was split.

Other relationships should be documented if that makes sense both in the context of the DOI metadata as well as the PDS archive. Some of the likely ones to occur are listed following.

<relatedIdentifier>

OPTIONAL

This element is a single related identifier. It can be repeated as needed for additional identifiers. Note that it has half a dozen attributes, only two of which are required, to help in defining the relationship. Standard values are defined in the DataCite schema documentation.

Required attribute: relatedIdentifierType. The value must come from the following list:
  • ARK
  • arXiv
  • bibcode
  • DOI
  • EAN13
  • EISSN
  • Handle
  • IGSN
  • ISBN
  • ISSN
  • ISTC
  • LISSN
  • LSID
  • PMID
  • PURL
  • UPC
  • URL
  • URN
  • w3id
Do not include more than one <relatedIdentifier> for the same publication/work to be cited or related.
For general use, DOI is the preferred identifier for all relationship types. For citation, bibcode is acceptable, but should only be used if there is no DOI available. When both exist, ADS (who issue bibcodes) have specifically requested that we only use the DOI in this list to cite the other work. ISBN and arXiv might also be available and could be used in cases where there is no DOI.
Required attribute: relationType. The value must come from the following list:
  • IsCitedBy
  • Cites
  • IsSupplementTo
  • IsSupplementedBy
  • IsContinuedBy
  • Continues
  • IsNewVersionOf
  • IsPreviousVersionOf
  • IsPartOf
  • HasPart
  • IsReferencedBy
  • References
  • IsDocumentedBy
  • Documents
  • IsCompiledBy
  • Compiles
  • IsVariantFormOf
  • IsOriginalFormOf
  • IsIdenticalTo
  • HasMetadata
  • IsMetadataFor
  • Reviews
  • IsReviewedBy
  • IsDerivedFrom
  • IsSourceOf
  • Describes
  • IsDescribedBy
  • HasVersion
  • IsVersionOf
  • Requires
  • IsRequiredBy
  • Obsoletes
  • IsObsoletedBy
Following is a summary of the relationships most likely to be useful in our context.
relationType | Use for
Cites/IsCitedBy | Standard citation of previous/subsequent work
IsVariantFormOf | PDS4 versions of PDS3 data sets, where the PDS3 data set has a DOI and no substantive changes (i.e., an external peer review was not required for the PDS4 version) have been made in migrating the data from PDS3 to PDS4. This should be included in the DOI metadata for both the PDS3 and PDS4 versions.
IsNewVersionOf/IsPreviousVersionOf | Datasets that are new/previous versions (that is, same ID but different version number) of datasets that also have DOIs; in this case both DOIs should have their metadata updated. Also, if a PDS3-to-PDS4 migration did require substantive changes to the data file, then this is the appropriate relationship between the two.
Obsoletes/IsObsoletedBy | When a dataset supersedes a dataset with a different ID (in the same ID system), use this relationship.
IsSupplementTo | Data that is the basis of or direct result of a journal article, but that was not published as part of the article - especially where the data set is mentioned in that publication
IsReferencedBy/References | Bibliographic references (as opposed to citations)
IsDerivedFrom/IsSourceOf | Raw-Calibrated relationships; high-order products and the contributing source products
IsPartOf/HasPart | Bundles and their constituent collections when both have DOIs. IsPartOf can also be used when a data product comprises one or more tables that were published (in their entirety) as part of a journal article or book.
Other relationships can and should be used where they seem useful.
Optional attribute: resourceTypeGeneral. This attribute is identical to the one of the same name in <resourceType>, above.
These optional attributes should only be used when the value of the relationType attribute is either IsMetadataFor or HasMetadata (this is not validated):
Optional attribute: relatedMetadataScheme. This indicates the ID or name of a metadata definition standard.
Optional attribute: schemeURI. This should be the URI of the named metadata standard.
Optional attribute: schemeType. The DataCite definition is not clear, but this looks like a specific file format type for the referenced metadata standard (such as "XSD").
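
To make the common cases concrete, here is a sketch of a <relatedIdentifiers> block for a calibrated data set that cites a paper, supersedes an earlier version, and was derived from raw data. Every DOI shown is a placeholder.

```xml
<relatedIdentifiers>
  <!-- All DOIs below are placeholders -->
  <!-- A paper cited by this data set -->
  <relatedIdentifier relatedIdentifierType="DOI" relationType="Cites" resourceTypeGeneral="Text">10.12345/example-paper</relatedIdentifier>
  <!-- The previous version of this data set, which has its own DOI -->
  <relatedIdentifier relatedIdentifierType="DOI" relationType="IsNewVersionOf" resourceTypeGeneral="Dataset">10.12345/example-v1</relatedIdentifier>
  <!-- Raw data from which this calibrated product was derived -->
  <relatedIdentifier relatedIdentifierType="DOI" relationType="IsDerivedFrom" resourceTypeGeneral="Dataset">10.12345/example-raw</relatedIdentifier>
</relatedIdentifiers>
```

For bi-directional pointing, the record for the earlier version would carry the matching IsPreviousVersionOf entry, and the raw data record the matching IsSourceOf entry.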

<sizes>

OPTIONAL

This element provides unstructured size information. In other words, it is not required to be numeric and there are no syntax constraints on the content.

<size>

OPTIONAL

A single size specification string, like "18GB" or "Three volumes". This element may be repeated as needed or desired.

Some Examples
<size>55GB</size> | Total volume
<size>3250 tables</size> | Number of data files
<size>235 FITS images</size> | Number of data files with more info
<size>24 documents in multiple formats</size> | Number of unique/logical products

<formats>

OPTIONAL

This class indicates the physical/digital format(s) of the resource.

<format>

OPTIONAL

This element contains a text description of the format. It is not constrained and may be repeated as appropriate.

Best practice is to use a file extension or MIME type string as the value.

PDS should be more formal about the content here. We also need to think about possible mixed-format products, like single documents that comprise multiple files, some of which are text and some images/graphics, and the best way to describe format for collections (if at all).

Some Examples
<format>text/plain</format> | Flat ASCII text (including Table_Character files)
<format>text/plain;charset=utf-8</format> | Flat UTF-8 text (including Table_Character files)
<format>text/csv</format> | CSV files (must use commas as field separators)
<format>text/tab-separated-values</format> | DSV files where tabs are used as field separators
<format>text/vnd.ascii-art</format> | ASCII art
<format>application/pdf</format> | PDF files
<format>application/msword</format> | MS Word files without macros
<format>text/xml</format> | XML document (e.g., PDS4 labels)
<format>image/fits</format> | Simple FITS file with an image in the primary data unit and no extensions
<format>application/fits</format> | Every other kind of FITS file

Other media types (MIME types) may be found by referencing the list provided by the Internet Assigned Numbers Authority (IANA) at https://www.iana.org/assignments/media-types/media-types.xhtml. There is no media type corresponding to PDS3 .img images, or plain raster images (as far as I can tell).

<version>

OPTIONAL

A version number associated with the resource.

Best practice is to obtain a new DOI for a major version change of something that already has a DOI.

Because of the traceability and reproducibility concerns involved in research data, PDS should never use this element for archival data. New versions of PDS products with DOIs should get their own DOIs. The <relatedIdentifier> element can be used to link the two versions in the DOI database.

<rightsList>

OPTIONAL

This element typically contains only a single <rights> member to indicate the rights management for the resource, although it may contain multiple <rights> elements in complex cases.

<rights>

OPTIONAL

A single rights license with management information (e.g., "Creative Commons", or "GNU General Public License"). This should be as explicit as possible, with a complete management statement where appropriate. Embargo information should also be recorded here.

Optional attribute: rightsURI. This should be the URI to the full text of the license.
Optional attribute: rightsIdentifier. A short, standardized license name. Best practice is to use one of the identifiers on the SPDX list: https://spdx.org/licenses/
Optional attribute: rightsIdentifierScheme. The name of the scheme for the identifier, immediately preceding. If the identifier came from the SPDX list, for example, then this attribute should have the value SPDX.
Optional attribute: xml:lang. This indicates the language of the license.

PDS data is public domain (which may or may not be worth stating explicitly), but we may need to consider the case of embargoed data if we are planning to reserve DOIs in advance of publication.
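
If a rights statement is included, it might be sketched as follows. The choice of CC0-1.0 here is purely illustrative: it is an identifier from the SPDX list, not a stated PDS policy, and the appropriate license for any given data set should be confirmed before use.

```xml
<rightsList>
  <!-- Illustrative only: CC0-1.0 is an SPDX identifier, not a stated PDS policy -->
  <rights rightsURI="https://creativecommons.org/publicdomain/zero/1.0/"
          rightsIdentifier="CC0-1.0"
          rightsIdentifierScheme="SPDX"
          xml:lang="en">Creative Commons Zero v1.0 Universal</rights>
</rightsList>
```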

<descriptions>

OPTIONAL

This element provides a place for additional information that does not fit into other categories.

<description>

OPTIONAL

A single, free-format description of the specified (via attribute) type. This field may be repeated as needed.

It looks like formatting is not preserved for this text, but you may use the <br/> tag to insert a paragraph break.

Best practice is to provide at least one description of some type. It is probably not a good idea to provide multiple <description> elements with the same descriptionType value, but this is not validated.

Required attribute: descriptionType. This indicates the category of information being provided. It must have one of these values:
  • Abstract
  • Methods
  • SeriesInformation
  • TableOfContents
  • TechnicalInfo
  • Other

All PDS DOIs must have at least one <description> field with a descriptionType of Abstract. Note that if you create a DOI through the Fabrica interface, the description you enter there is not tagged as an abstract. If other description types seem appropriate, add them.

Optional attribute: xml:lang. This indicates the language of the description being provided, not of the resource.
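
A <descriptions> block with the required Abstract and one additional note might look like this sketch. The text of both descriptions is placeholder content; the second shows the kind of Other note suggested for recording a resolved date discrepancy.

```xml
<descriptions>
  <!-- Placeholder abstract text; br inserts a paragraph break -->
  <description descriptionType="Abstract" xml:lang="en">This collection contains example calibrated images.<br/>A second paragraph of the abstract would follow the break.</description>
  <description descriptionType="Other" xml:lang="en">Note on a resolved discrepancy in the release date.</description>
</descriptions>
```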

<geoLocations>

OPTIONAL

This element is used to define relationships (either on the creation or application side) between the resource and a defined patch on the surface of the Earth. I don't see a way to apply it as it currently exists to celestial coordinates, and that might seriously confuse applications that process this sort of metadata harvested from DOI databases. It's included here for completeness.

<geoLocation>

OPTIONAL

This element defines one specific patch of Earth where the data were taken or on which the resource is focused. It may be repeated as desired.

<geoLocationPlace>

OPTIONAL

A name for the location being defined.

Note that the text provided in the schema describes the points comprising the following elements as having values which are "a single latitude-longitude pair, separated by whitespace". This is not true. In all cases there are tags specifically defining latitude and longitude as separate and distinct elements.

It is also possible to define any single <geoLocation> as being simultaneously a single point, a single box, and any number of polygons. This seems irrational, yet it appears to be deliberate.

<geoLocationPoint>

OPTIONAL

This element specifies a single point on the globe.

<pointLongitude>

REQUIRED

Longitude in degrees in the range +/- 180.

<pointLatitude>

REQUIRED

Latitude in degrees in the range +/- 90.

<geoLocationBox>

OPTIONAL

A box is defined by its four sides - east and west longitude, and north and south latitude.

<westBoundLongitude>

REQUIRED

Westward bounding longitude in degrees in the range +/- 180.


<eastBoundLongitude>

REQUIRED

Eastward bounding longitude in degrees in the range +/- 180.

<southBoundLatitude>

REQUIRED

Southward bounding latitude in degrees in the range +/- 90.

<northBoundLatitude>

REQUIRED

Northward bounding latitude in degrees in the range +/- 90.

<geoLocationPolygon>

OPTIONAL

This element defines an arbitrary polygon as a sequence of points around the perimeter in which the last point must have the same definition as the first point (though this is not validated, nor is the nature of the path). There must be at least 4 points provided.

Oddly, the schema allows this element to be repeated.

<polygonPoint>

REQUIRED

Longitude and latitude of one point on the perimeter of the polygon.

<pointLongitude>

REQUIRED

Longitude in degrees in the range +/- 180.

<pointLatitude>

REQUIRED

Latitude in degrees in the range +/- 90.

<inPolygonPoint>

OPTIONAL

If you are intending to define an area that is larger than half the total surface of the Earth, then you must use this element to define a point somewhere inside the area of interest. Otherwise the smaller enclosed area is assumed to be the area of interest. The actual point can be random, as long as it is inside the intended area.

<pointLongitude>

REQUIRED

Longitude in degrees in the range +/- 180.

<pointLatitude>

REQUIRED

Latitude in degrees in the range +/- 90.
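
For completeness, here is a sketch of how a point and a box would be written, each in its own <geoLocation>. The place names and coordinates are placeholders.

```xml
<geoLocations>
  <!-- Place names and coordinates are placeholders -->
  <geoLocation>
    <geoLocationPlace>Example Observatory</geoLocationPlace>
    <geoLocationPoint>
      <pointLongitude>-155.47</pointLongitude>
      <pointLatitude>19.82</pointLatitude>
    </geoLocationPoint>
  </geoLocation>
  <geoLocation>
    <geoLocationPlace>Example Survey Region</geoLocationPlace>
    <geoLocationBox>
      <westBoundLongitude>-156.0</westBoundLongitude>
      <eastBoundLongitude>-155.0</eastBoundLongitude>
      <southBoundLatitude>19.0</southBoundLatitude>
      <northBoundLatitude>20.0</northBoundLatitude>
    </geoLocationBox>
  </geoLocation>
</geoLocations>
```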

<fundingReferences>

OPTIONAL

This element identifies sources of funding related to creating or maintaining the resource.

This is not information that PDS has traditionally collected, but as it is used to trace results back to funding in the literature databases, we probably should. In the case of ROSES data preparers, I'd make that "definitely should".

<fundingReference>

OPTIONAL

This element identifies a single source of funding. It may be repeated as needed.

<funderName>

REQUIRED

Name of the funding source. This should be the formal name.

<funderIdentifier>

OPTIONAL

This is a string that uniquely identifies a funding source under some public scheme, like "Crossref Funder" or ISNI.

Required attribute: funderIdentifierType. This string is the source of the corresponding identifier. It must have one of these values:
  • ISNI
  • GRID
  • Crossref Funder ID
  • ROR
  • Other
NASA's Crossref Funder ID is 10.13039/100000104.

<awardNumber>

OPTIONAL

Grant number or similar code assigned by the funding organization.

Optional attribute: awardURI. This attribute can be used to provide a link to a page at the funding organization website that describes the award/grant program.

<awardTitle>

OPTIONAL

The title on the grant/award - that is, the title of the proposal that was submitted and funded.
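
A complete <fundingReferences> block for a NASA-funded data set might be sketched as follows. The funder identifier is the one given above; the award number, award URI, and award title are placeholders.

```xml
<fundingReferences>
  <fundingReference>
    <funderName>National Aeronautics and Space Administration</funderName>
    <funderIdentifier funderIdentifierType="Crossref Funder ID">10.13039/100000104</funderIdentifier>
    <!-- Award number, URI, and title are placeholders -->
    <awardNumber awardURI="https://example.org/awards/EXAMPLE-0001">EXAMPLE-0001</awardNumber>
    <awardTitle>Example Proposal Title</awardTitle>
  </fundingReference>
</fundingReferences>
```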