Introduction to the Colloquium
Alan Bowman introduced the workshop by summarising the work that is happening in e-research. He explained the main issues and objectives of the discipline.
Mike Brady remarked on the difficulty of making sense of ancient documents given their condition. He talked about advances in medical imaging technology over the past 20 years, and developments in distributed computing in recent years. He commended the work on developing support tools for people who work with images, whether clinicians or historians. He also stressed the importance of developing computational models for the analysis of those images. He ended by saying that there is very little difference in the way intellectual scholarship is developed in the sciences and the humanities.
New Image Capture Techniques for Analysing Text
Ségolène Tartre talked about her work on support systems for papyrologists. She has designed a model of the intellectual processes that occur during interpretation. This model tries to capture the transition from signal to meaning and the individual steps in between.
She also remarked on the similarities between the difficulties of feature detection for the technology and for the palaeographers. She explained how image analysis in Fourier space can model something similar to what the human eye does. The human eye is very good at identifying features, and these features are much more apparent to the computer in Fourier space. She concluded by highlighting the oscillations between mutability, perception and interpretation, characters and features, engineers and papyrologists.
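A minimal sketch of this idea, assuming the image is already a grayscale NumPy array (the striped toy image here is a crude stand-in for regular pen strokes, not real papyrus data):

```python
import numpy as np

def fourier_magnitude(image):
    """Return the centred log-magnitude spectrum of a grayscale image.

    Periodic structures such as repeated strokes show up as distinct
    peaks in this representation, which is why features that are hard
    to see in the spatial domain can stand out in Fourier space.
    """
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    return np.log1p(np.abs(spectrum))

# Toy example: vertical stripes standing in for regular strokes.
image = np.tile(np.sin(np.linspace(0, 8 * np.pi, 64)), (64, 1))
mag = fourier_magnitude(image)
print(mag.shape)
```

In the magnitude array, the energy of the stripes concentrates into a few bright peaks, which is the kind of regularity a computer can pick out more easily than raw pixel data.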
Simon Tanner is concerned with getting from an object, to an image of the object, to an interpretation. He focused on the value of having a good image in the first place: a good image can be used with even very basic tools to provide a sufficient resource for humanities scholars. He talked about a variety of technologies to generate good images, improve them, or simplify them.
He introduced his work with the Dead Sea Scrolls. The interesting issue from the humanities perspective is finding differences between the texts, or between them and established copies. He showed images of the thousands of pieces that are part of the collection. These images come from the original imaging project carried out during the 1960s, when many images were taken with the technology available at the time, including black-and-white and infrared.
He remarked on the technological shortcomings and the conditions in which these images were acquired. He also explained the difficulties in imaging the scrolls: some of them are quite small, there are very many pieces, the pieces are not necessarily in any meaningful order, and many have text on both sides. Even so, much valuable work has been done from those original images.
The new initiative by the IAA is concerned with conserving the scrolls better, providing a long-term resource for scholars, and offering better, unified documentation. Dr. Tanner was involved in a pilot project for making new images under this initiative. They will include colour, multi-spectral and infrared images.
The project is using spectroscopy mainly for conservation purposes. The multi-spectral images provide information about the water content of the scrolls. There is also the possibility of assessing damage to the scrolls, following research done at UC Irvine, which found that while the reflectance of the parchment and background changed across the spectrum, the ink usually stayed the same.
Graeme Earl talked about RTI, a technique for capturing images by varying the position of the lighting in order to map the surface of an object. They are working on developing and refining this technique. He is also working on imaging the Herculaneum scrolls and wall paintings, the Vindolanda Tablets, and ceramic surfaces. They are able to produce very high-resolution images from this process. Further, by applying various image-processing algorithms to the captured data, they can simulate different lighting conditions in a virtual environment and produce two- or three-dimensional reproductions.
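The relighting side of this can be hinted at with a simplified photometric-stereo style calculation (a sketch only: real RTI fits a per-pixel reflectance model to many light positions, rather than solving for a single surface normal as done here):

```python
import numpy as np

def estimate_normals(intensities, lights):
    """Estimate per-pixel surface normals from images taken under
    known light directions, by solving I = L @ n with least squares.

    intensities: array of shape (k, npix), one row per light position.
    lights: array of shape (k, 3), the known light-direction vectors.
    Returns unit normals of shape (3, npix).
    """
    n, *_ = np.linalg.lstsq(lights, intensities, rcond=None)
    norms = np.linalg.norm(n, axis=0)
    return n / np.where(norms == 0, 1, norms)

# One synthetic pixel with a known upward-facing normal.
lights = np.array([[0.0, 0.0, 1.0],
                   [0.5, 0.0, 0.866],
                   [0.0, 0.5, 0.866]])
true_normal = np.array([0.0, 0.0, 1.0])
intensities = (lights @ true_normal)[:, None]
print(estimate_normals(intensities, lights)[:, 0])  # close to [0, 0, 1]
```

Once normals (or a fitted reflectance function) are known, the surface can be re-lit from any virtual light direction, which is what makes the simulated lighting conditions possible.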
Dirk Obbink is developing crowd-sourcing applications to help with archaeological efforts to classify papyri and to collect and correct text from the objects. The pilot study already has about 10,000 users, ranging from school students to professional archaeologists.
A Virtual Research Environment for the Humanities
John Pybus talked about their work in building a virtual research environment for the humanities. He stressed the unique requirements that scholars in the humanities have of collaboration tools, and the need to re-use existing software. John has already worked on developing a virtual collaborative environment for ancient documents, allowing researchers to collaborate. He introduced the new follow-up project, which focuses on annotation and context.
Charles Crowther showed some of the traditional ways in which researchers use annotations. They want to replicate this use of contextual information. Their system focuses on a principal viewing space, and an annotating space.
John Pybus gave a short summary of the technical features of the system. The annotation environment is web-based, so it works across different computer systems. It also supports standard formats for data distribution, like RDF, allowing the meta-data to be used by other projects. They are also working on supporting some image-processing functions within the virtual environment; the processing would be done by a national grid computing system, which would send the results back and link them into the virtual environment.
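As an illustration of the kind of RDF-based metadata exchange mentioned, here is a minimal sketch that serialises one hypothetical image annotation as N-Triples (the URIs, the note text, and the use of the Open Annotation vocabulary are illustrative assumptions, not the project's actual schema):

```python
# Hypothetical placeholder identifiers for one annotation.
ANNOTATION = "http://example.org/anno/1"
IMAGE = "http://example.org/images/tablet-343.jpg"
NOTE = "Possible reading of line 2"

def annotation_to_ntriples(anno_uri, target_uri, body_text):
    """Serialise one annotation as N-Triples, so any RDF-aware tool
    (or another project's metadata pipeline) can consume it."""
    oa = "http://www.w3.org/ns/oa#"
    rdf_type = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
    lines = [
        '<%s> <%s> <%sAnnotation> .' % (anno_uri, rdf_type, oa),
        '<%s> <%shasTarget> <%s> .' % (anno_uri, oa, target_uri),
        '<%s> <%sbodyValue> "%s" .' % (anno_uri, oa, body_text),
    ]
    return "\n".join(lines)

print(annotation_to_ntriples(ANNOTATION, IMAGE, NOTE))
```

Because the output is plain standards-based RDF, the same annotation can be loaded by any triple store or linked-data tool, which is the interoperability benefit described above.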
Plans for the future include integrating their environment with OpenSocial/Google Gadgets. They are also looking at online education environments, which take similar approaches, like the Oxford Weblearn Service and Sakai.
Dot Porter talked about TILE and the motivation for the project. TILE is a tool for digitally editing and publishing images. It uses standard data formats from the TEI. She showed a few examples of TILE displaying digital reproductions linked with transcribed text.
Their objective was to create a tool that could be used across a variety of disciplines. They have partners who are helping with testing, and they have also been approached by a number of research groups interested in the project, whose needs they have attempted to incorporate into TILE. These include annotation of graphics and their relationship to the text, and automatic marking of text boxes. They use a variety of materials as test cases, including comic books, maps, and ancient manuscripts.
Work in the future includes semi-automated creation of links, annotation of an area with a controlled vocabulary, application of editorial annotations to any area, support for linking non-horizontal areas, and linking between non-contiguous areas.
Additionally, TILE provides a range of options for including external functionality. It can be extended, it allows for a variety of output formats, and it is simple to set up and can be deployed externally.
TILE aims to be a framework for building modules and for collaboration. It does not want to be another system without users.
The digital world has brought us together to think about the process that we go through in our scholarship. TILE is an example of a framework that tries to leave traces of that process for future use. This sort of framework does not exist in the paper-and-pencil world: often we go back to our work and do not recall how we arrived at the conclusions that we did. The digital world seems to have made us aware of this.
Even in the digital world the documentation tends to be of the results, rather than the processes.
New researchers work digitally more naturally. They are also quicker to adopt new technology.
Specific Support Tools for e-Research in the Humanities
Melissa Terras is developing a system for assisting the reading of ancient documents. Her work is based on previous research on the Vindolanda Tablets and stylus texts. She stressed that the objective must be to assist experts in reading and interpreting these documents, not to replace them. The focus of current work has been on presenting documents that have already been read and interpreted: such systems display reproductions, transcriptions, and translations, but they do not aid the process of reading, transcription, and interpretation. To do that, you need to understand the process of reading these texts.
They have developed a model of this process that a computer can understand. The problem is that experts are notoriously bad at explaining how they approach reading a text. The process involves a lot of uncertainty, so they needed to find a way of expressing this that a computer might understand. The process is also non-linear: experts move back and forth between states until settling on a conclusion. They have discovered, though, that the struggle is in recognising the features, not in the grammar or interpretation. This is where a computer might be helpful in the process:
Providing suggestions for the recognition of features
Recording the hypotheses and their changes along the process
Henriette Roued-Cunliffe talked about the prototype of a decision support system that they have built: APPELLO. She explained that the computer must do the things that humans find difficult: remembering, searching, accessing other experts' knowledge, and enabling collaboration. APPELLO reads any EpiDoc-encoded collection and provides a search interface for a variety of features. It provides a framework for presenting collections like the Vindolanda tablets interactively. It is designed as a RESTful web service, a standard approach that allows it to be coupled with other web services.
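A RESTful service of this kind is typically queried through a plain URL with parameters, which is what makes it easy to couple with other web services. The following sketch builds such a query; the endpoint and parameter names are hypothetical, not APPELLO's actual interface:

```python
from urllib.parse import urlencode

# Hypothetical endpoint for a word-search service over an
# EpiDoc-encoded collection.
BASE = "http://example.org/appello/search"

def build_query(word_start=None, tablet=None, fmt="xml"):
    """Build a RESTful query URL. Only parameters that were actually
    supplied are included, so clients can combine filters freely."""
    params = {k: v for k, v in
              {"start": word_start, "tablet": tablet, "format": fmt}.items()
              if v is not None}
    return BASE + "?" + urlencode(params)

url = build_query(word_start="mil")
print(url)
```

Because the whole request is just a URL, another web service (or a Google Gadget, as mentioned below) can call it with a single HTTP GET and parse the returned XML.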
Future work includes integrating it into Google Gadgets or offering it as an HTML add-on. They also want to incorporate it into a decision support system.
Tom Elliot works at the ISAW at NYU. He looks at the web as an environment that already includes collaboration and publication; it encompasses a virtual environment for research. Technologies on the web are still at an experimental stage, where the concepts are not yet standard or well understood. He is interested in how this environment is disrupting and changing scholarship; he has witnessed a shift towards more collaboration.
Their focus is on using services that are already available and making them work for them. For example, they hold a large collection of images and meta-data in Flickr. They intend to develop their own interface, but they were interested in distributing their material sooner rather than later.
They are also developing some tools of their own, using primary and secondary material. The primary concern is to make their material citable. He argues that citability is a principal academic function.
Scholars have a responsibility to plan for failure. They also need to be open and make their material available. Cool URIs is an initiative to provide URIs that are sensible for the material: they should mask mechanisms, avoid file types and, above all, be stable. Pleiades is an example. They are also striving to work with open data licences and to provide ways to allow revisions.
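The "Cool URIs" idea can be illustrated with a small helper that composes stable, mechanism-free identifiers (the base URL and numeric id are made-up placeholders, loosely in the spirit of Pleiades-style place URIs):

```python
def cool_uri(base, collection, identifier):
    """Compose a 'Cool URI': no file extensions, no technology hints
    (no .php, no /cgi-bin/), just stable path segments that can keep
    working even if the software behind them is replaced."""
    return "/".join([base.rstrip("/"), collection, str(identifier)])

print(cool_uri("http://example.org/", "places", 12345))
# http://example.org/places/12345
```

The point is that the URI commits to nothing about the implementation, so the resource stays citable across redesigns, which is exactly the citability concern raised above.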
He demoed papyri.info, a search engine developed to provide resources for papyrological research. It is currently undergoing a major revision: they are focusing on making it more document-centric rather than collection-centric, breaking up the monolithic nature of the application, and providing an API. They are releasing their own interface as open-source software.
Q: Wikipedia has a lot of crap; do you have a system to editorialise content?
TE: Yes, but we are providing the interface, it is the people who finally use it the ones who are going to need to check for quality.
It is a pleasant surprise that Tom Elliot’s work incorporates the ideas of the semantic web, open-source, and open-licensing.
VRE is quite a good name, but it gives the idea of a closed box. The purpose of these projects is more to train people to use these resources.
Q: There are very few people who can build interfaces; why is this important?
The ability to do so is important in itself. It also allows our data to be used in different ways, without locking its use into a very idiosyncratic form.
Summary and Panel Discussion
David de Roure talks about his involvement with VREs, and specifically a project designed for social scientists to share their methods. He remarks on the commonality of processes across disciplines. He likes that the research presented today focused on the issues of the humanities, and that it is quite sophisticated. He finds that the most significant development within computer science in relation to e-Research is the increase in participation. It is very important to share not only data but also the processes and methods: “By sharing methods we are sharing know-how, expertise, etc… this is very significant”
Annamaria Carusi thinks that if methodology is about understanding processes, humanities expertise could prove very valuable. Traditions of critique in the humanities can help us understand the agency behind processes. They can also aid in analysing what kinds of disciplines are brought together in these processes. The humanities interpret and bring to the surface the nuances of the interconnections in the way we are doing things. She argues that we must also be aware of the non-neutrality of our abstractions: they are not neutral, only useful.
John Fox summarizes the talks given today, and shares some of his impressions about each one.
He also talked about MOLGEN, a system developed in the 70s and 80s that was similar to what was discussed today. It was a decision support system in the area of genetics, concerned with providing a centralised resource for research. It mirrored today's interest in preserving processes and decisions.
John concludes by noting that discussions about any topic are becoming very common over the internet, and the question becomes: how do we manage uncertainty?
End of Day 1
Broader Implications of Digital Technologies for Research in the Humanities
Data Mining for Historians – Robert Shoemaker, Sheffield
Robert Shoemaker showed us his data-mining project, London Lives. He is interested in ways of navigating data that can be understood, in contrast to popular commercial search engines, which make it easy to find popular results through a proprietary ranking algorithm.
He also shows us Connected Histories, a new search engine that additionally provides some natural language processing capabilities. It allows users to search through a variety of distributed documents about British history.
The last project he talked about is Data Mining with Criminal Intent, a project developed by a large international collaborative team. It includes OCR and plugs in to available tools like Zotero and TAPoR, including Voyeur. It searches through the data of the Old Bailey Proceedings.
He concludes by saying that data-mining allows researchers to explore collections in new ways, in order to discover new features.
New Directions in e-Science for the Arts and Humanities – Stuart Dunn, KCL & Tobias Blanke, KCL
Stuart Dunn works at the AHESSC. He is concerned with the data deluge: large sets of distributed, fuzzy data that are constantly being produced. He argues that while this is still the case in e-Science, the world is moving towards cloud computing.
He reiterated yesterday's interest in process. He highlighted reconstructions as an example of this, and argued that these are of an educational nature.
He showed the Motion in Place Platform, which was developed to document the motion traces in dance performances. He was inspired by this project and by how it can be transferred to archaeology and to how excavations are conducted.
He is also interested in the behaviour of visitors in galleries and museums. They have developed a piece of software to track the movements of visitors. The original interest was in the concept of exhibitions as happenings.
Finally, he talked about the difficulty of dealing with fuzzy data in GIS. Geoparsers can deal with references that point to a specific point on the map, like place names or coordinates. Descriptions of the environment that indicate a location are not handled very well. They are trying to find ways to feed additional information from external databases to the geoparser in order to make it a more useful tool.
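The distinction he draws can be sketched as follows: a naive gazetteer-based geoparser resolves explicit place names but lets vaguer environmental descriptions fall through unresolved (the gazetteer entries and coordinates here are illustrative approximations, not a real dataset):

```python
# Tiny illustrative gazetteer: name -> (latitude, longitude).
GAZETTEER = {
    "Londinium": (51.51, -0.09),
    "Vindolanda": (54.99, -2.36),
}

def geoparse(text):
    """Naive geoparsing sketch: resolve explicit place names via a
    gazetteer lookup. Environmental descriptions such as 'north of
    the wall' fall through unresolved, which is the hard case that
    external databases are meant to help with."""
    found = []
    for name, coords in GAZETTEER.items():
        if name in text:
            found.append((name, coords))
    return found

matches = geoparse("The fort at Vindolanda lies north of the wall.")
print(matches)
```

Feeding the geoparser additional contextual information (nearby features, relative directions, period-specific names) is one way to start resolving the descriptions that plain lookup misses.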
He also described LaQuAT, a tool developed to bring together diverse resources that are already available and to provide researchers with a way to cross-query them.
He concluded by arguing that e-Science and Digital Humanities are very diverse areas, and that bringing together projects and resources is important. It will help researchers document the process, link resources, and develop new research questions.
Data Resources, Sustainability & Infrastructure – Alastair Dunning, JISC; David Robey, Oxford & Lorna Hughes, KCL
Alastair Dunning introduced JISC research resources in mass digitisation, digital resources, digital collections, and a variety of other areas. He stressed the importance of making all these resources available to the public, which is changing the way researchers work. These resources have been released despite technical and copyright problems. However, there is still a lot of work to do: several disciplines are not covered by current digitisation efforts, plenty of resources are under-utilised or unsustainable, and many are too scattered. Further, there is now less digitisation funding available.
Future funding will concentrate on international research and data-mining. They also want to improve usage, meta-data availability, and usability. Opening up data via APIs or other Web 2.0 initiatives will also be encouraged. Content creation is still the centre of attention: UK special collections, advancing teaching needs, stimulating the economy, and building communities.
David Robey talked about AHRC funding, its past progress and its objectives. The focus of the AHRC is very narrow, yet it has still funded a significant number of digital projects.
He stressed that we need e-infrastructure that is visible, sustainable, standards-based, and usable. Building sustainable resources is difficult, and securing long-term funding is the main issue. Making the resources standards-compliant, however, helps accessibility and visibility, which may in turn help funding.
He highlighted the AHRC-EPSRC-JISC Arts & Humanities e-Science Initiative, which has funded a variety of projects and workshops across a wide range of subjects, methods and technologies. He discouraged concentrating on immediate research findings, arguing that it takes some time for new technology to show its influence.
He concluded by emphasising that if the AHRC wants to get its investment's worth in research funding, it needs to create, and encourage the creation of, a minimum standard of e-Infrastructure.
Lorna Hughes talked about a comprehensive knowledge-base for methods and resources in digital humanities. Their aim is to bring together a variety of resources and knowledge about digital humanities, in combination with the related communities, computing science departments, museums, etc., and to showcase that interdisciplinarity. Limited funding has forced them to re-visit the original objectives of the project. The main purpose is to bring visibility to the discipline, to show how it has changed research in the arts and humanities, and to provide evidence for digital humanities.
She also stressed the importance of the visibility of individual projects. Usage and cross-disciplinarity are seen as larger benefits. Funding ends in 2010; she hopes that the resource can survive after that.
Alan Bowman concluded the colloquium by thanking the British Academy and the attendees.
End of Day 2