International Database of Digital Humanities Projects

PROPOSED DATABASE STRUCTURE

CONTENT CREATION (Mike Neuman)

Let me begin by pretending this is easy. Here are the Dublin Core metadata elements that we would draw upon for categorizing Content Creation (archives, editions, serials) in the NINCH Database:

Title
Subject
Author
Publisher
OtherAgent
Date
ObjectType
Form
Identifier
Relation
Source
Language
Coverage

In other words, we would need all the fields for nearly all the content-related projects.

TEACHING PROJECTS (Lorna Hughes)

I've included a list of the Dublin Core elements that should be included in the "pedagogy" section of the database. The key elements identified by Mike and John, which should be common to all projects, are straightforward, but I've included John's asterisks to indicate the fields that should be included in all events.

*Title
*Subject
*Author
Publisher
OtherAgent
*Date
*ObjectType
*Form
*Identifier
Relation
Source
*Language
Coverage

Other elements (my asterisks):

*Description
Rights
Startdate/Enddate
*Project contact information
*Project Funding
*Software used
Standards used
Evaluation

Also important for pedagogy: the level at which the resources created are aimed (K-12? University? Graduate?). I also think that evaluation standards need to be built in.

TOOLS (John Unsworth)

Here's a sample list of fields for the tools component of the database (an asterisk indicates a field that, it seems to me, should be required in all cases).

Dublin Core Elements:

*Title:
*Subject:
*Author:
Publisher:
OtherAgent:
*Date:
*ObjectType:
*Form:
*Identifier:
Relation:
Source:
*Language:
Coverage:

Other Elements:


Startdate/Enddate:
*Project Contact Info:
Project Funding:
*Software Used:
Standards Used:
Software Created:
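
As a minimal sketch of how the asterisked fields above might be enforced at data entry, the Python below treats a Tools record as a plain dictionary and reports any required fields that are missing or blank. The field names follow the list above; the function name `missing_required`, the dictionary representation, and the sample values are assumptions made for illustration, not part of any agreed design.

```python
# Required fields for a Tools record, taken from the asterisked items above.
TOOLS_REQUIRED = [
    "Title", "Subject", "Author", "Date", "ObjectType", "Form",
    "Identifier", "Language", "Project Contact Info", "Software Used",
]


def missing_required(record, required=TOOLS_REQUIRED):
    """Return the required field names that are absent or blank in record."""
    missing = []
    for name in required:
        value = record.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# Hypothetical sample record; every value here is invented for illustration.
sample = {
    "Title": "Example Concordance Tool",
    "Subject": "Text analysis",
    "Author": "A. Scholar",
    "Date": "1998",
    "ObjectType": "Software",
    "Form": "HTML",
    "Identifier": "http://example.org/tool",
    "Language": "English",
}

# Lists the two required fields the sample leaves out.
print(missing_required(sample))
```

A check of this kind could run when a contributor submits a record, so that incomplete entries are sent back before they reach the database.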

FINDING AIDS/ARCHIVES/INDEXES (Pamela Ellis)

SUBJECT: We should come up with a controlled vocabulary if we want to facilitate searching--if we are entering all different sorts of terms (US History, American Studies, United States History, History--US) for one concept, searching will be very unsuccessful. Perhaps a drop-down list of broad subject headings (derived from LC Subject Headings) might be a way to go. I'm using this approach for the new (as yet not public) version of the AALN database.

AUTHOR: Individual or group who created the access aid. Also includes contributors.

PUBLISHER: Institution or company name, includes name of division or department (e.g. The New York Public Library, Center for the Humanities, Manuscripts and Archives Division).

DATE: Date of last update to finding aid or index.

OBJECT TYPE: Again, we need a controlled list for this field--I have something I'm using for AALN that we might want to draw on, but I've left the draft on my home machine. I'll try to send that Monday--if I don't, continue to remind me!

FORM: text, HTML, EAD, whatever

SOURCE: Item from which the digital document was derived.

IDENTIFIER: URI (that is, URL or URN, if available)

RELATION: Digitized images of collection, full-text of items, or anything available that is referred to in the access aid.

LANGUAGE: From controlled list (seems like everyone would include the same name for a given language, but not necessarily).

COVERAGE: Scope of the collection described.

Other elements:


COLLECTION: Collection name (e.g. Mariam and Ira D Wallace Collection)

NOTE: Other relevant information.
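
The controlled-vocabulary idea raised under SUBJECT can be sketched roughly as follows: free-text variants are mapped onto a single broad heading, so that a search on the heading finds every record however the contributor phrased the term. The variants are the ones mentioned under SUBJECT; the heading string, the table layout, and the function names are assumptions for illustration and are not taken from the AALN database or checked against LC Subject Headings.

```python
# Hypothetical controlled list: each broad heading maps to the free-text
# variants a contributor might type. The heading is illustrative only.
SUBJECT_VARIANTS = {
    "United States--History": [
        "US History",
        "American Studies",
        "United States History",
        "History--US",
    ],
}


def _key(term):
    """Normalize case and whitespace so lookups are forgiving."""
    return " ".join(term.lower().split())


# Invert the table once: each variant (and the heading itself) -> heading.
LOOKUP = {}
for heading, variants in SUBJECT_VARIANTS.items():
    LOOKUP[_key(heading)] = heading
    for variant in variants:
        LOOKUP[_key(variant)] = heading


def normalize_subject(term):
    """Return the controlled heading for term, or None if unrecognized."""
    return LOOKUP.get(_key(term))


print(normalize_subject("us  history"))
print(normalize_subject("Medieval Art"))
```

Terms that come back as None would need a human decision, either a new heading or a new variant of an existing one, which is one argument for the drop-down list: it avoids free-text entry altogether.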

Pamela's notes on her contribution:

We will need to provide detailed instructions (a chart of elements and brief explanations) if we want resource providers to contribute data.

The tricky thing about Dublin Core is that it was designed to describe documents "born digital" rather than converted material. Hence the need to clarify what exactly each of the fields is describing.

Also, we should think about making certain elements required, so that there is some consistency. Although there may be variance among the elements for each data type, we should have several elements that are present in all records. All elements in Dublin Core are repeatable--I think that's ok for this project.
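
One way to picture these two points, sketched here under stated assumptions: if each data type keeps its own set of required elements, the elements "present in all records" are simply the intersection of those sets, and repeatable elements can be stored as lists of values. The sets below are transcribed from the asterisked fields in the Teaching and Tools lists, with capitalization unified; the names `REQUIRED`, `COMMON_CORE`, and `add_value` are hypothetical.

```python
# Required (asterisked) fields from the Teaching and Tools lists, with
# spelling and capitalization unified here so the sets can be compared.
REQUIRED = {
    "teaching": {
        "Title", "Subject", "Author", "Date", "ObjectType", "Form",
        "Identifier", "Language", "Description",
        "Project Contact Info", "Project Funding", "Software Used",
    },
    "tools": {
        "Title", "Subject", "Author", "Date", "ObjectType", "Form",
        "Identifier", "Language", "Project Contact Info", "Software Used",
    },
}

# Elements required in every data type: the intersection of the sets.
COMMON_CORE = set.intersection(*REQUIRED.values())


def add_value(record, element, value):
    """Append a value; every element is repeatable, so values are lists."""
    record.setdefault(element, []).append(value)
    return record


# Hypothetical record with a repeated Author element.
record = {}
add_value(record, "Author", "First Contributor")
add_value(record, "Author", "Second Contributor")

print(sorted(COMMON_CORE))
print(record)
```

Adding the required sets for the other data types to `REQUIRED` would shrink or confirm the common core without any change to the rest of the sketch.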

Pamela

Next, comment by Mike:

"Now for the complications: I've spent a little time exploring the OCLC site that Pamela referred us to (purl.oclc.org/metadata/dublin_core/), and I've discovered how much effort is currently being invested in the attempt to encompass all the genre types (now under the metadata element Resource Type).

It would be well to explore some of the links from the OCLC page that I've just visited, esp. Dublin Core Resource Types (sunsite.berkeley.edu/Metadata/structuralist.html) and the comprehensive list of Dublin Core Types and proposals for nesting of hierarchies (9 printed pages) at andrew.triumf.ca/TYPE.html, which is referred to as Andrew Daviel's Summary on the preceding URL.

While these Resource Types seem primarily applicable for our category of Content Creation, they are also relevant, I believe, for the other three categories. See, for example, Daviel's analysis of Virtual Reality on page 4 of 9.

In summary, there's an immense apparatus under development by the Dublin Core working groups, and we will need to keep abreast of their progress. At the moment, few of the schemes have been completed and ratified to the point at which we could adopt them with confidence."

Mike

Agreement from Lorna on Mike's concerns:

I share Mike's concerns about the evolving schemes being developed by the Dublin Core working groups, and I agree that we need to keep up with what they are doing.

Lorna

And then the opening by John Unsworth of the subject of being able to use software at Michigan:

"It's worth pointing out that there are significant overlaps between what we're trying to do and what JPW and others have done at UMichigan, in the Digital Library Registry Database. That project seeks first of all to catalogue resources held or produced at UMich, but beyond that it is interested in cataloguing resources held elsewhere; it uses Dublin Core (in SGML form); and it proposes to expand eventually to include full-text searching of the resources catalogued, where possible. Might we ask John Price-Wilkin about this project, and about whether we could adopt some of the technology he's developed for this? It's PAT-based, so we'd need to run it on a PAT-licensed server, but if John isn't interested in offering the server space, we also have PAT at the Etext center here, and could probably interest David Seaman in working with us on this. Alternatively, we can use SGML tools that IATH already owns, like Dynaweb, or a database back-end like DB2, but that's assuming that for some reason we can't, or choose not to, use JPW's framework."

John

For continuity's sake, I'll now also repeat John's further elaboration of this after a very fruitful conversation with John Price-Wilkin about the system at Michigan; see <http://www.lib.umich.edu/registry/>

>"The University of Michigan Digital Library is a collection of
>computer-accessible scholarly resources in support of campus research and
>instructional programs. Resources included in the Digital Library
>collection are variously purchased, leased, locally created, or
>recommended as worthwhile sites hosted elsewhere but readily accessible to
>U-M users. As with library collections in other formats, selection of an
>individual resource is based upon a judgment of the scholarly quality,
>relevance, and organization of the content as well as the contribution its
>inclusion makes to the scope and balance of the collection as whole."

>"Each resource is abstracted, cataloged, and classified before being added
>to the Registry."

So far, the project at Michigan is mostly focused on resources licensed by or produced at Michigan, but JPW seemed open to discussing the possibility of using the software he's got set up for the UM Registry as the basis of a separate dataset of the sort we've been discussing. His setup uses Dublin Core, and the search/browse interface is already set up and works quite nicely.

Issues remaining to be discussed, it seems to me, are:

--whether the registry could, or should, serve some of the matchmaking functions we discussed in Georgetown, with respect to funding agencies. I doubt that JPW wants to spend lots of resources adding features to the registry package, so it seems to me we ought to find other ways to handle these functions, outside that software.

--how to fund the extra work involved in aggressively cataloging projects not in the queue for the registry as it is now staffed and planned. On this subject, two points: first, there is some proprietary software involved behind the scenes at the registry, but Rice belongs to the consortium of universities with rights to use this software; second, because UMich is an AFS/Kerberos network environment, it'd be difficult to get outside staff login accounts to work on data housed at Michigan. However, I think it would be possible for records to be created elsewhere and then mirrored to Michigan. JPW has offered to contribute .25 FTE to this project at the Michigan end, "within the Cataloging department, working with a cataloger, using the Registry/Gateway mechanisms" (JPW), and if that could be matched with another .25 FTE somewhere else, that'd probably be enough to get things started.

The registry could, it seems to me, save us all a lot of time by providing an existing set of relevant records (even if we didn't intermingle the NINCH data and the existing UMich registry--which we probably don't want to do, since the two databases have somewhat different purposes and audiences), and a software base on which to build further data. Other functions perhaps not supported by or not appropriate to the registry (the production of reviews of publicly accessible projects, by Chorus for example; the matchmaking with funders; the nagging of project leaders for periodic updates of their records) might be carried out through channels which might appear as a kind of wrapper for the data at Michigan, without actually being housed on the same server or administered by the registry (UMich or Rice? library) staff.

John