Blog & News

Creating multilingual vocabularies

9 June 2017

One of the distinct work packages during the design and implementation of the database that will sit within the main ArchAIDE application has been the creation of multilingual vocabularies to aid cross-searching of catalogues that differ not only linguistically but also conceptually. To take the simplified example above, we may not only use different words within our own language to categorise something, but, based on our respective archaeological traditions, we may also use different levels of granularity which have become firmly established in our working practices. For example, I may simply call something a "plate", but someone else may wish to call it a "dinner plate". Ultimately, we're talking about a shared concept, and in searching a database I may not want to be presented with too many classifications, or to miss the detail in other people's data.

At the time of writing, vocabularies have been created for a series of fields within the database that would normally use a controlled list:

  • Sherd type (for example "rim" or "handle")
  • Form (for example "plate" or "urn")
  • Decoration form (for example "burnished" or "incised")
  • Decoration color (for example "yellow")
  • Fabric (in progress!)

Each of the partners (Cologne, York, Pisa and Barcelona) contributing data to the reference database has gone through its inventories and catalogues, and identified the descriptive terms used within them that refer to those main categories. Learning from the methodology and using the tools developed for the ARIADNE project by the Hypermedia Research Group at the University of South Wales, we've then looked to create a neutral spine to which partners could map these terms. The ARIADNE project used the Getty Research Institute's Art and Architecture Thesaurus (AAT) to successfully map all the resource discovery subjects, but it was not known whether the AAT would be suitable for the more specialised pottery concepts. After a thorough evaluation, it was decided that it was immensely useful in this regard, and would give the added value of interoperability with ARIADNE and any other data mapped to the AAT. To give a very basic example, after consultation it was decided that the AAT concept sgraffito was adequate to describe the form of decoration made by scratching through a surface to reveal a lower layer of a contrasting colour.

In Italian catalogues the following terms are used to describe this technique:

  • graffita
  • graffita a punta
  • graffita a stecca

In Spanish and Catalan catalogues:

  • Esgrafiada

Thus all these terms were mapped to sgraffito. Within the reference database, any data being imported that contains, for example, 'graffita a stecca' could extend the section on decoration form to also contain the AAT term.
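To make the idea concrete, here is a minimal sketch of how such a mapping might be held and applied at import time. This is not the ArchAIDE implementation: the table layout, the field names (`decoration_form`, `decoration_form_aat`) and the use of plain labels rather than AAT concept identifiers are all assumptions made for illustration.

```python
# Hypothetical mapping table: partner term (lower-cased) -> neutral AAT concept label.
# A real mapping would most likely point at AAT concept identifiers, not plain labels.
TERM_TO_AAT = {
    "graffita": "sgraffito",
    "graffita a punta": "sgraffito",
    "graffita a stecca": "sgraffito",
    "esgrafiada": "sgraffito",
}


def enrich_record(record):
    """Return a copy of an imported record with the mapped AAT term added."""
    enriched = dict(record)
    local_term = record.get("decoration_form", "").strip().lower()
    # Unmapped terms are stored as None so they can be reviewed later.
    enriched["decoration_form_aat"] = TERM_TO_AAT.get(local_term)
    return enriched


def local_terms_for(concept):
    """List every partner term that has been mapped to a given AAT concept."""
    return sorted(term for term, aat in TERM_TO_AAT.items() if aat == concept)


imported = {"id": 1, "decoration_form": "Graffita a stecca"}
print(enrich_record(imported))
print(local_terms_for("sgraffito"))
```

The original catalogue term is kept alongside the neutral one, so nothing from the partner's data is lost by the mapping.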

Besides having a centralised term that can be used to provide a simple linguistic translation (so a Spanish user looking for types of 'Esgrafiada' will find all the 'graffita...'), there are further benefits to this approach. The first is simple knowledge organisation within our own database. So, for example, in the photograph (right) from the Roman Amphorae database, the image depicts only part of the vessel. We can use the selected AAT terms to describe the parts depicted, e.g.:

Thus we now know that the picture contains a series of solid, defined concepts.


Now, in a hypothetical scenario, someone in the field may find a sherd of pottery to run through the ArchAIDE application (below, right). In this case they will be presented with a list of neutral terms with which to categorise it:


If the user can make a firm identification, they would probably deduce that this sherd could be classed as belonging to handles!

Alternatively, the user could opt to select or input terms in their own language. For example, to show just a few terms from the German vocabulary:

  • gebogener Henkel
  • Ohrförmiger Henkel
  • langer Vertikalhenkel

Again, these are all types of handles! So, right from the beginning we can start filtering our database, and looking for models that contain handles for comparison. This would of course include the examples from the (UK) Roman Amphorae database.

To take things a stage further, we can also use the vocabularies and mappings to account for differences not only in language, but also in the granularity of the definitions and classifications used within catalogues. In the example below, you can see two descriptions of "form" from the Italian and Spanish catalogues. In their mappings, colleagues from Pisa and Barcelona have classed these to the AAT term jars.

This can be utilised in the ArchAIDE application (below). So, in one instance you may be dealing with a fieldworker who finds part of a vessel. There's enough information for them to begin to establish its form according to the list of AAT terms we're using. The reference database can also provide a preview of what we're calling "jars" to help the user pick a form. In this case, the user is happy that it's probably part of a "jar", in which case the comparison can be weighted towards all those types which have been mapped to this concept (including "Cántaro" or "Orcio"). Thus, the application may be able to tell me (British, with limited understanding of Spanish ceramic typologies!) that I've found something which in Spanish is called a "Cántaro". Conversely, the user may be more expert in the chronologies and typologies of ceramics in their locale, and already be able to recognise a "Cántaro". In this case the application will allow them to use their native term which, because we know it is mapped to "jars", can be used to expand a search. This may be useful as (in this hypothetical case) there may be sub-types of "Cántaro". Finally, the mapping between "Cántaro" and "jars" means that both terms are stored in the final results.
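As a rough sketch of the two routes described above (starting from the neutral term, or starting from a native term), the following shows how either input could be expanded to the same set of mapped types. The mapping records, language tags and result layout are assumptions for illustration only, not the application's actual data model.

```python
# Hypothetical form mappings; language tags and record layout are assumptions.
FORM_MAPPINGS = [
    {"term": "Cántaro", "lang": "es", "aat": "jars"},
    {"term": "Orcio", "lang": "it", "aat": "jars"},
]


def expand_search(term):
    """Resolve a neutral or native term to (AAT concept, all mapped native terms)."""
    key = term.casefold()
    concept = None
    for mapping in FORM_MAPPINGS:
        # Accept either the native catalogue term or the neutral AAT label.
        if key in (mapping["term"].casefold(), mapping["aat"].casefold()):
            concept = mapping["aat"]
            break
    if concept is None:
        return None, []
    terms = [m["term"] for m in FORM_MAPPINGS if m["aat"] == concept]
    return concept, terms


def result_entry(native_term):
    """Store both the native term and its neutral concept in a result."""
    concept, _ = expand_search(native_term)
    return {"form": native_term, "form_aat": concept}


print(expand_search("jars"))     # user picks the neutral term
print(expand_search("cántaro"))  # user enters a native term
print(result_entry("Cántaro"))
```

Both calls to `expand_search` resolve to the same concept and the same list of native terms, which is what lets the comparison be weighted towards every type mapped to "jars" regardless of how the user started.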


These are, admittedly, small steps and only begin to use the capabilities of a structured vocabulary such as the AAT (inference, for example). However, the work from all contributing partners in mapping the terminologies within their catalogues is not to be underestimated. It is to be hoped that this hard work pays dividends in allowing the ArchAIDE application to enrich the potential of sherd-based searches, returning potential matches that would otherwise be hidden by our historic traditions of classification.

ArchAIDE at CAA Atlanta

21 February 2017

Please come say hello to members of the ArchAIDE team, Michael Remmy (University of Cologne) and Holly Wright (University of York) to learn more about the ArchAIDE project at the CAA Atlanta poster session (Tuesday, March 14, 2017, 5:45pm - 7:30pm).

Poster Abstract: The newly launched ArchAIDE project will support the classification and interpretation work of archaeologists with innovative, computer-based tools, and provide users with features for the semi-automatic description and matching of potsherds digitised from existing ceramic catalogues. Pottery classification is of fundamental importance for understanding and dating archaeological sites, and for understanding production, trade flows and social interactions, but requires complex skills and is a time consuming activity. ArchAIDE seeks to revolutionise the habits, behaviours and expectations of archaeologists, and meet real user needs by reducing time and costs associated with pottery classification. ArchAIDE will develop an automatic-as-possible procedure to transform paper catalogues into digital descriptions, and create a digital comparative collection for search and retrieval. A tool will then be developed for mobile devices, to support archaeologists in recognising and classifying potsherds during excavation and post-excavation analysis. This will include an easy-to-use interface and efficient algorithms for characterisation, search and retrieval of visual/geometrical correspondences. This automated procedure will allow the creation of a potsherd’s identity card by transforming the data collected into a formatted electronic document, printed or screen-based, and a web-based real-time data visualisation. These tools will then be tested and assessed in the field, paving the way for future exploitation. ArchAIDE is coordinated by the University of Pisa, Italy, and funded by the European Commission.


Call for papers: EAA session sponsored by ArchAIDE

21 February 2017

The annual EAA Conference will be held this year in Maastricht, the Netherlands from 30 August to 3 September. The ArchAIDE project would like to invite papers related to the topic of automation in artefact recognition. Papers are encouraged which not only highlight technical possibilities, but also challenges facing artefact recognition by archaeologists working across Europe. Session details are available below:

Session 166: Automation in artefact recognition: perspectives and challenges in archaeological practice

Theme: Interpreting the archaeological record

Session format: Papers, maximum 15 minutes each

Deadline: 15 March 2017

You can submit to the session via the EAA website.

Given that artefacts are of fundamental importance for the dating and interpretation of archaeological contexts, the automatic recognition of artefact types has been one of the 'golden chestnuts' of archaeological computing, dominating computer application papers of the 1970s and 1980s, but development of a practical working system has not been successful. Nonetheless, software and image recognition technology have moved on, and projects like ArchAIDE, DADAISM and GRAVITATE are working towards the (semi-)automatic recognition of artefacts (pottery, metalwork, stone tools, plastic arts, etc.) and the (partial) automation of archaeological workflows.

Artefact recognition is a time-consuming activity, and spending time (and money) on repetitive work is not optimal, but automation can help by supporting interpretation with innovative computer-based tools. Artefact recognition calls for complex, specialist skills which are not always available. Automation can facilitate specialist interpretation for generalists, increasing the number of researchers able to devote more time to data analysis, and consequently leading to greater comprehension and new knowledge in areas such as trade and exchange, supply and production, religious or social affiliation, and so on.

Based on this assumption, we call for papers to foster both theoretical discussion and practical solutions, focused on how automatic artefact recognition could:

• meet real user needs, and generate economic benefits;
• produce new interpretations;
• revolutionise archaeologists’ habits, behaviours and expectations;
• create societal benefits from cultural heritage, improving access, re-use and exploitation of digital cultural heritage in a sustainable way.

Keywords: Artefacts, automation, interpretation, recognition, practice

Deadline: 15 March, 2017

Nearching Factory

20 February 2017

30th January – 1st February 2017

Santiago de Compostela (Spain), San Martín Pinario

What was Nearching Factory? A real factory. The organizers (the Spanish team of the NEARCH project) worked hard to create an informal and comfortable space, using unusual ways to share and exchange ideas and experiences. Many people (not only archaeologists) from all over Europe came together for three days to discuss the future of archaeology, touching on economic development, social challenges and changes, and sustainable choices. ArchAIDE was invited to take part in Working Group 1, "Digital Capabilities for sustainability" (led by Holly Wright), with particular attention to "Archaeology and the Open Data movement" (Francesca Anichini and Gabriele Gattiglia). You can find here the summaries and results of all the discussions held during the three days and across the 10 working groups.



FB: @NearchingFactory


Thinking spatially

15 December 2016

Members of the team have recently been undertaking work examining existing digital catalogues, principally Roman Amphorae and CeramAlex, and using the extant data to build a reference database to facilitate filtering of results alongside the image-based recognition. Some of this has been very simple, for example having room in the database to record which part of the sherd is being depicted, e.g. 'handle' or 'rim'. Other tasks that at first seemed simple have turned out to be more difficult than I envisioned. For example, terms used to describe the appearance or form of a vessel or sherd (or even decoration) may be consistent within a single schema, but not portable or applicable to other catalogues. In addition, how is the description of a term that is often subjective, such as 'beaded', going to help an automated system?

Recently, we've also been considering the use of geographical terms and/or geometries to assist in filtering results. The broad concept is that an extremely localised type 'xxx' will not usually appear (or has not been documented as found) in area 'xyz'. For example, looking at the record for the Teliţa type of amphora, we can see that it has a distribution limited to 'Scythia' (also where it was manufactured), which is mapped to the concept of 'Black Sea' in that particular system. Looking more closely, it seems there are 38 countries/areas used to classify distribution. These range from smaller entities such as Cyprus or Belgium to much broader terms such as 'The Levant' or 'Mediterranean region'. When dealing with more common types such as Mid Roman Amphora 5, these broad regions become more understandable (see below).

Record for Teliţa amphora type

Record for Mid Roman Amphora 5 type

So, thinking ahead, how could we build on this to try and help the application filter by where the sherd was found? And not only for Roman Amphorae, but also for collections from across archaeological periods and continental Europe, the Middle East and North Africa? To my mind there are three issues: text versus geometry, scale and consistency.

The initial proposal was to record countries or regions in a similar way to the Amphorae collection. This is somewhat simplistic, and open to inconsistencies as the catalogue grows with the digitization of paper or museum collections. For example, from a British perspective, do I record "Great Britain", "British Isles", "United Kingdom", "England" or "Yorkshire"? And although I may pick a term that suits me, someone else digitizing a catalogue may well choose a different term based purely on subjective reasoning. A simple database 'like' or 'equals' statement then potentially misses a positive match, or perhaps even returns a false positive.

To get around this at the ADS we use the Getty Thesaurus of Geographic Names (TGN), a structured vocabulary including names, descriptions, and other metadata for extant and historical cities, empires, archaeological sites, and physical features important to research of art and architecture. Mapping terms to Getty subjects (for example see the entry for 'England' or even 'Northern England') not only allows greater consistency, but also flexibility in searching and subsequent results. For the recent ARIADNE project, Holly successfully mapped spatial terms, including those within Roman Amphorae, to Geonames. Although my personal preference is for TGN, purely because it records historical regions such as Scythia or Britannia, I think mapping to modern terms in Geonames would be more useful, especially as that system supports bounding boxes and polygons for higher tier administrative regions.
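To illustrate why mapping free-text place names to a structured gazetteer helps, here is a small sketch comparing a plain text match with a hierarchy-aware match. The identifiers (`p1`, `p2`, ...) and the tiny hierarchy are invented for the example; they are not real TGN or Geonames identifiers, and neither service's API is being called.

```python
# Hypothetical gazetteer: identifiers are invented, not real TGN or Geonames IDs.
PLACES = {
    "p1": {"label": "United Kingdom", "parent": None},
    "p2": {"label": "England", "parent": "p1"},
    "p3": {"label": "Yorkshire", "parent": "p2"},
    "p4": {"label": "Belgium", "parent": None},
}


def ancestors(place_id):
    """Return the place itself followed by each broader place above it."""
    chain = []
    while place_id is not None:
        chain.append(place_id)
        place_id = PLACES[place_id]["parent"]
    return chain


def related(a, b):
    """True when one place is the same as, or contained within, the other."""
    return a in ancestors(b) or b in ancestors(a)


# A plain text comparison misses the match between two cataloguers' choices...
print("Yorkshire" == "England")
# ...whereas the mapped concepts show that one sits inside the other.
print(related("p3", "p2"))
print(related("p3", "p4"))
```

With terms mapped in this way it no longer matters whether one cataloguer wrote "Yorkshire" and another "England": the hierarchy can be walked in either direction at search time.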

Polygon(s) for the extent of Repubblica Italiana in Geonames

Although this helps the accuracy of a 'spatial' filter, we're still restricted to modern administrative regions, which may bear little resemblance to archaeological distribution. If we did want to move towards utilizing the capacity of a spatial database, then we'd have to think about the following issues of using our own geometries.

  1. We could create overarching zones like those used in Roman Amphorae (e.g. Balearic Islands or southern Britain) but with a spatial extent stored as a polygon. Types/classes would then be mapped to these if appropriate. The positives are that these are relatively simple to create and administer; the negatives are that we would have to decide on, and then create, these 'zones', which for most of Europe will be a hassle. Plus, how detailed do we go?
  2. Each class/type has its own extent polygon(s). This is potentially more accurate than Option 1, but also time consuming and potentially inconsistent if the extents are done by different people (as with the use of text terms, this could vary between detailed and very broad!).
  3. The recording of X/Y values for findspots of a particular class/type. This has the potential for a higher level of accuracy, and the capacity to build more intuitive map searches. However, this is again potentially time consuming for non-digital collections as well as catalogues (such as Roman Amphorae) that do not reference sites at all. There's also the danger that the bias in the coverage of some catalogues may unintentionally produce a skewed distribution that is not truly representative of the pottery type.

After talking this through with the project team, we're going to investigate (in addition to the mapping of any text terms to Geonames) option 2. Although this will require a certain amount of creation and curation, it will really help the application move beyond the restrictions of modern borders. In addition, the database will also look to support individual sites as points where they already exist in the catalogues we're using. There's also the potential for results from the user application to feed back into this, enhancing the coverage of the reference dataset.
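As a very rough sketch of what option 2 could look like in practice, the example below keeps one or more extent polygons per type and filters candidate types by findspot using a standard ray-casting point-in-polygon test. The type names and coordinates are invented, and the coordinates are treated as simple planar X/Y values; a real spatial database would handle projections and indexing, which are ignored here.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: count edge crossings of a ray running east from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical extents: each type has a list of polygons (lists of X/Y vertices).
TYPE_EXTENTS = {
    "type A": [
        [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)],
        [(12.0, 0.0), (14.0, 0.0), (14.0, 2.0), (12.0, 2.0)],
    ],
    "type B": [
        [(20.0, 20.0), (30.0, 20.0), (30.0, 30.0), (20.0, 30.0)],
    ],
}


def candidate_types(x, y):
    """Return the types whose recorded extent contains the findspot."""
    return sorted(
        name
        for name, polygons in TYPE_EXTENTS.items()
        if any(point_in_polygon(x, y, polygon) for polygon in polygons)
    )


print(candidate_types(5.0, 5.0))
print(candidate_types(25.0, 25.0))
print(candidate_types(15.0, 15.0))
```

In the application such a filter would more sensibly weight results than exclude them outright, since a sherd found outside a type's documented extent may simply be a new findspot.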