Friday 22

 

Final Discussion: Future of Project Outlines

History:
The lead partners in the Visualization projects would be Ed Ayers at Virginia and Greg Crane at Tufts. A focus would be on how to use new digital tools to further scholarship. A theme might be the East Coast over a certain period of time. There is a clear need to draw data from places not normally thought of as humanities data repositories (e.g. the National Oceanic and Atmospheric Administration, NOAA). The goal would be to think through the process so that such a project could be replicated. With data mining and data harvesting possibilities, one focus might be to work with substantial archives whose data can be analyzed and used to develop visualizations and interpretations.

Visual and Media Studies
The lead proposal from this group, “Federated Digital Image Repositories and Interpretive Information,” would probably focus on American Studies materials. The group stressed the importance of strong information architecture to enable the networking of repositories. Rather than copying a repository, one should be able to point at or refer to it.

Performing Arts
Of the many performing arts proposals, Performance Modeling, recreating a specific performance event that draws on other visual media, appeared to link to the History “Visualization” project and to the Federated Repositories idea. It was interested both in how to draw from digital repositories and in how to show alternative possibilities when reconstructing from a common set of materials.

Humanities Modeling
One of the pedagogical challenges in performance is how to represent the experience of a live event in a way that allows for variations in the performance and in the audience, and how to play with different parameters of a performance, such as lighting. This is another element of a humanities modeling process. Both history and performing arts want to play with variables. One difference may be that performance is generative, whereas history may be more reconstructive.

OCR projects:
There were many OCR projects outlined. Interests included increased accuracy and the ability to work with multiple languages, handwritten manuscripts and manuscript music. Several such projects have been started but then dropped because there has been no commercial value in them. Likewise, it appears there is no money to be made in scanning ancient texts: commercial programs may not be developed to a level of flexibility that could handle Cyrillic. So could an OCR program be developed with a linguistic background that could try to interpret on the fly? Structured writing from particular historical periods implies the need for OCR of specific kinds of handwriting. Overall, there are perhaps two important aspects of OCR: recognizing symbols and then interpreting them (Unicode could perhaps be expanded to recognize semantically meaningful symbols). Another element is taking OCR beyond the flat written page, to 3-D objects, textiles and inscriptions, and making this context accessible. There have been some recent Xerox PARC breakthroughs in parsing out field information to identify signature, date, etc.

There is a great need for more training capability in OCR, such as we have in audio digitizing. Mike Mahoney sees two possible OCR projects: one for non-Roman alphabets (which would include extension of Unicode for historic languages) and one for written texts. The idea is to identify a set of resources that is highly structured in terms of script and layout (e.g. ledgers) that would benefit from the digitizing of a new kind of material and provide a set of test material for new kinds of OCR. There was widespread interest in OCR in the History group (Marilyn Levine, Greg Crane), in the Interdisciplinary group and in Literature and Language, but no proposal outline.

Humanities Centers:
There was broad and related interest in developing a new kind of digital humanities center where scholars could work on projects, share their work with others, be trained and use expensive equipment. We really need a sub-committee to explore relations with foundations.

There is clearly too much work being done in isolation and too much duplication, with each university working largely on its own resources. Within Performance Studies there was the issue of how to integrate information from the different areas (theatre, dance, music, etc.). The Performing Arts group proposed an institute run by a national governing body that would seek $750K per year for 3 years to select existing centers, designate each of them as a “center” for scholarship in performing arts, and then select visiting scholars, who would perhaps apply in teams (maybe a CS person and an arts person). They would get some salary and funds for equipment to take back home.

Visual & Media Studies had the idea of pooling technology resources and taking better advantage of centers that exist, perhaps providing them with new resources so they could serve wider constituencies. There might be the understanding that there should be wider national consultation when projects are starting (to avoid duplication) and that, if the services of a larger center were used, the results would be widely shared. All groups were interested in these related ideas. Others emphasized the importance of national meetings and of the emphasis on scholarship. The Interdisciplinary Studies group wanted this to have an international scope. Others thought a virtual network in addition to a physical center would be a vital component. This center would benefit from consistent participation from computer science. Other related issues included the maintenance of projects and participation in national/regional digital repositories.

While harvesting shows some promise, all fields felt there needed to be more intensive development of search engines and systems for identifying the good sites. Tracing the way people search might assist the effort.
The Language and Literature group had the idea of identifying a particular digital archive and sending in a team of scholars to test whether scholars can really get the information that is useful to them. It was pointed out that the Scout project, School Zone, Webivore, and ISI are organizations that do this already. But the idea would be to take one of these further to meet more scholarly needs.

There is a real need to move away from manually building portal sites and to use CS technology to mine the wealth of new material that is steadily coming online. Could a discussion list be set up on this topic? Let’s find ways of developing much better searching of the big national projects such as American Memory.

There was some discussion of the importance of getting more scholars to recognize the significance of intellectual property legislation and of getting the scholar’s voice back into this process to argue for fair use. David Green noted this was part of NINCH’s wider agenda and asked for further input from this community.