International Database of Digital Humanities Projects

 


Charlottesville Committee Meeting

Proposed fields for the International Humanities Projects Database.

Updated to reflect discussions at the Charlottesville ACH/ALLC conference: see Meeting Report.

Find below a table of IHPDB fields as they were proposed after discussions at the ACH conference. The Resp column indicates responsibility for providing the field values:

R indicates that the field value is required at submission.
O indicates that the field is optional to both submitter and cataloger.
CR indicates the field is optional at submission, but required by the cataloger.

References to Dublin Core specifications reflect version 1.1, just recently released. For descriptions of the element set, see http://purl.org/dc/documents/proposed_recommendations/pr-dces-19990702.htm

 

Field

(label : field name)

Data Model

Responsibility

Description

     
Identifier: identifier PCDATA

R

This is a unique identifier for the resource. For networked resources, this will predominantly be a URL. However, the value could also be a URI or an ISBN (for CD-ROMs), in which case we should create a type attribute for the field (e.g., type=URL or type=ISBN). The Dublin Core specification also suggests the Digital Object Identifier (DOI) as an option.
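To illustrate how a value for the proposed type attribute might be guessed at submission time, here is a minimal Python sketch. The function name guess_identifier_type, the label strings, and the patterns are my own assumptions for illustration and are not part of the proposal; the patterns are deliberately loose and do not verify ISBN check digits.

```python
import re
from urllib.parse import urlparse


def guess_identifier_type(value: str) -> str:
    """Return a hypothetical type label: 'DOI', 'ISBN', 'URL', 'URI', or 'unknown'."""
    text = value.strip()

    # DOIs begin with a "10." prefix, followed by a slash and a suffix.
    if re.fullmatch(r"(?i)(doi:)?10\.\d+(\.\d+)*/\S+", text):
        return "DOI"

    # ISBNs: 10 or 13 characters once hyphens and spaces are removed.
    compact = re.sub(r"[\s\-]", "", text)
    if re.fullmatch(r"(?i)(isbn:?)?(\d{9}[\dX]|\d{13})", compact):
        return "ISBN"

    # Anything with a network scheme and a host is treated as a URL;
    # any other scheme (e.g. urn:) is treated as a generic URI.
    parsed = urlparse(text)
    if parsed.scheme in ("http", "https", "ftp") and parsed.netloc:
        return "URL"
    if parsed.scheme:
        return "URI"
    return "unknown"
```

A cataloger could still override the guessed type by hand; the sketch only suggests a default.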
Title: title PCDATA

R

Implied      
Other title: other title PCDATA

O

Implied      
Author: author indiv | org / PCDATA

R

Mostly implied, though the author can be distinguished as an individual or an organization. No guidelines have been established for the form of this entry (with specific regard to names of individuals). The only real reason to enforce a rigid form, so far as I can tell, is to list resources alphabetically by author last name. Searching within names is not affected by form.

If clear distinction between first and last name is desired, I would suggest two directions.

1. Break author down into first and last subfields if it is an individual. If not, sort without preceding articles and the like.

2. Include a MARC-indicator-like attribute number that indicates which word, if any, to index (e.g., <author><indiv index=3> John C. Watson</indiv></author> tells the system to use Watson as the index name for listings).

I would urge option 2, if either, since it allows folks to enter names in natural form. But I would recommend against both if such listings are not necessary.
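The second direction can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name sort_key, the 1-based reading of the index attribute, the "last, rest" output form, and the small article list are hypothetical and were not agreed on at the meeting.

```python
from typing import Optional

# Leading articles to ignore when no index word is given (assumed list).
ARTICLES = {"a", "an", "the"}


def sort_key(name: str, index: Optional[int] = None) -> str:
    """Build a lowercase listing key from a name entered in natural form.

    index is the 1-based position of the word to file under, mirroring
    the proposed index attribute; None means no index word was given.
    """
    words = name.split()
    if not words:
        return ""

    if index is not None and 1 <= index <= len(words):
        # Move the indexed word to the front: "Watson, John C."
        lead = words[index - 1]
        rest = words[:index - 1] + words[index:]
        key = lead + ", " + " ".join(rest) if rest else lead
        return key.lower()

    # No usable index: file under the name as given, minus a leading article.
    if len(words) > 1 and words[0].lower() in ARTICLES:
        words = words[1:]
    return " ".join(words).lower()
```

With the example above, sort_key(" John C. Watson", 3) files the entry under "watson, john c.", while an organization name entered without an index is filed as written, minus any leading article.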

Other Agent: otherAgent indiv | org / PCDATA

O

Roughly equivalent to DC's "contributor" field, this contains information about any contributor to the intellectual content who is not the primary author. Like author, it can be distinguished as an individual or an organization.
Project Contact Information: contact (name,phone, email, url*, fax*)

R

Only name, phone, and email are required at submission; url and fax are optional.
Subject (LCSH): subject/cv subject / cv [type]=LC

CR

Accepts one or more LCSH headings. Practice in the registry was to restrict the vocabulary to the first two levels, I believe.
Subject Classification (LCSH): class/kw class / kw

CR

Accepts one or more LC classification numbers for browsing purposes. Required of catalogers.
Subject Keywords: subject/kw subject / kw

R

Required at submission? I believe this was the agreement.
Description: description PCDATA

R

Narrative description of the resource.
Publisher : publisher PCDATA

O

According to DC, the "entity responsible for making the resource available."
Date: date PCDATA

O

Typically, the creation or availability date associated with a resource. DC recommends the ISO 8601 full date format (i.e., YYYY-MM-DD). We do not need to validate the form, so we can accommodate YYYY or YYYY-MM, depending on how specific the information is.
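Although validation is not required, it may help to see how the three accepted forms could be told apart, for example when sorting or displaying dates. The following Python sketch is an assumption of mine, not part of the proposal; the function name date_precision and its return labels are hypothetical.

```python
import datetime
import re
from typing import Optional

# YYYY, optionally followed by -MM, optionally followed by -DD.
_DATE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def date_precision(value: str) -> Optional[str]:
    """Return 'year', 'month', or 'day' for an ISO 8601 style date, else None."""
    match = _DATE.fullmatch(value.strip())
    if match is None:
        return None
    year, month, day = match.groups()

    if month is None:
        return "year"
    if not 1 <= int(month) <= 12:
        return None
    if day is None:
        return "month"

    # Let the standard library reject impossible days such as 02-30.
    try:
        datetime.date(int(year), int(month), int(day))
    except ValueError:
        return None
    return "day"
```

Values that return None could simply be stored as entered, in keeping with the decision not to enforce a form.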
Status: status ??

R

The recommendation at ACH was that status be defined in two tiers: accessible (Y | N) and in progress (Y | N). The first indicates whether or not the project is accessible to the public; the second indicates whether it is still in progress or has been completed. On reflection, I would suggest that the second tier, progress, could be omitted or indicated in the description. Otherwise, to indicate both, we would need a format for the field, such as:

YY (accessible and in progress)

YN (accessible and complete, i.e., not in progress)

NY (not accessible and in progress)

NN (not accessible but complete)
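If the two-letter format were adopted, reading and writing it would be straightforward. The Python sketch below shows one way to do it; the names Status, encode_status, and decode_status are hypothetical, and the sketch assumes the first letter is accessibility and the second is progress, as in the list above.

```python
from typing import NamedTuple


class Status(NamedTuple):
    accessible: bool   # is the project accessible to the public?
    in_progress: bool  # is the project still in progress (not complete)?


def encode_status(status: Status) -> str:
    """Encode a Status as a two-letter code such as 'YN'."""
    return "".join("Y" if flag else "N" for flag in status)


def decode_status(code: str) -> Status:
    """Decode a two-letter code; raise ValueError for anything else."""
    if len(code) != 2 or any(letter not in "YN" for letter in code):
        raise ValueError("status must be two letters, each Y or N: %r" % code)
    return Status(accessible=code[0] == "Y", in_progress=code[1] == "Y")
```

Dropping the progress tier, as suggested above, would reduce this to a single Y | N value and make the encoding step unnecessary.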

Object type: subject/cv subject / cv [type]=NINCHTYPE

R

From the list of NINCH-defined object types. Changes to this list have been suggested and still need to be approved.
Computational info: compInfo PCDATA

O

Contains information about file characteristics, software used, and hardware used. Suggested at ACH. Implementation would obviate the current software and hardware fields, as well as format. With regard to guidelines for values, we might look to DC's recommendations for the DC-Format field.

See: http://www.isi.edu/in-notes/iana/assignments/media-types/media-types

Cataloged: catalogRef PCDATA [rectype]

O

Suggested at ACH. Intended to indicate whether a formal catalog record exists as a reference. The rectype attribute would hold one of an enumerated list of values to indicate the source of the record (e.g., OCLC). The content of the field could contain the actual record number, assuming the catalog sources all assign unique numbers that are available. Also suggested at ACH was that the actual record be copied into this record, though I am concerned that copyright restrictions could make this a mess, along with some formatting and special character issues. Including the actual record would require a container field, such as catalogRec.
Source: source PCDATA

O

Brief description of the source this resource is derived from, if any.
Language: language PCDATA

O

Implied
Coverage: coverage PCDATA

O

Coverage or scope of the content. Here most likely to indicate coverage dates or period. See DC version 1.1 for details.
Funding: funding/funder* (PCDATA | p | lb | funder)*

R

Accommodates general funding information, specific names of formal funders, or both. Required at submission.
Rights: rights PCDATA

O

Statement of rights.
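The R / O / CR responsibility codes used in the table above lend themselves to a simple completeness check at each stage. The Python sketch below covers only a subset of the fields, and it assumes my reading of CR (optional for the submitter, but the cataloger must supply a value). The names RESPONSIBILITY and missing_fields, and the stage labels, are hypothetical.

```python
# Responsibility codes for a subset of the proposed fields (from the table).
RESPONSIBILITY = {
    "identifier": "R",
    "title": "R",
    "author": "R",
    "description": "R",
    "subject/cv": "CR",
    "class/kw": "CR",
    "publisher": "O",
    "date": "O",
}


def missing_fields(record: dict, stage: str) -> list:
    """List fields that still need a value at the given stage.

    stage is 'submission' (only R fields are checked) or
    'cataloging' (R and CR fields are checked).
    """
    if stage == "submission":
        required = {"R"}
    elif stage == "cataloging":
        required = {"R", "CR"}
    else:
        raise ValueError("unknown stage: %r" % stage)

    return [
        name
        for name, code in RESPONSIBILITY.items()
        if code in required and not record.get(name)
    ]
```

A record with only the four R fields filled in would pass at submission, while the same record at the cataloging stage would be reported as missing subject/cv and class/kw.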

 

Other fields that require additional investigation

Relation: See DC. Intended to indicate a related resource, perhaps by unique ID.
Collection

?

Evaluation

?

Level for Resources

?