Current Approach for Searching the Portal
Currently, the Museumsportal Berlin provides its visitors with a simple keyword-based search that finds web pages presenting museums, exhibitions, or events in which the specified term appears either in the textual description or among the tags associated with a page (see Figure 1). The advantage of this approach lies in its simplicity and in the fact that users are familiar with this kind of search. The main drawback of keyword search in general, however, is that results are obtained merely on the basis of a syntactic match (i.e. the exact occurrence of a given term). This problem is especially evident in cases of:
- misspelling
- alternative spelling (e.g. Sandro Botticelli vs. Il Botticello)
- aliases (Alessandro di Mariano di Vanni Filipepi known as Botticelli)
- synonyms (words having the same meaning, e.g. fix and repair)
- homonyms (the same word having different meanings, e.g. bank meaning either a river bank or a financial institution)
These cases may lead to a situation where museums of interest to the user are not found, even though they exist in the portal's database, simply because the search keyword does not exactly match the words used in the museums' descriptions.
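The effect of purely syntactic matching can be illustrated with a minimal Python sketch. The two page texts and the function name keyword_search are hypothetical stand-ins for the portal's data and search, not its actual implementation.

```python
# Hypothetical page texts standing in for the portal's descriptions.
PAGES = {
    "Museum A": "Paintings by Sandro Botticelli and other Renaissance masters.",
    "Museum B": "Our staff repair historical instruments.",
}

def keyword_search(term, pages=PAGES):
    """Return titles of pages whose text contains the term (case-insensitive)."""
    needle = term.lower()
    return sorted(title for title, text in pages.items() if needle in text.lower())

print(keyword_search("Botticelli"))     # exact occurrence: Museum A is found
print(keyword_search("Il Botticello"))  # alternative spelling: nothing found
print(keyword_search("Boticelli"))      # misspelling: nothing found
print(keyword_search("fix"))            # synonym of "repair": nothing found
```

Only the first query succeeds, although all four refer to content that exists in the collection.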
Figure 1: Webpage describing a museum.
Another problem arises from the fact that the search engine
does not "understand" the meaning (semantics) of the search keyword and
therefore cannot relate it to other terms which might also yield a valid
query result. To illustrate this, consider the following
example: if users of the Museumsportal Berlin who are looking for
museums related to "impressionism" merely perform a search for this
particular keyword, they will find only two entries. There are, however,
many more museums presenting paintings by impressionist artists, for
example pieces by Claude Monet, Max Liebermann, or Karl Hagemeister (see Figure 2).
Unless the user performs multiple iterative searches for all those
related terms, which of course is a tedious task requiring some
knowledge in the arts domain, many museums of interest to the user will
not be found.
Figure 2: Example of a search expansion. The search for "impressionism" is expanded into multiple searches for artists belonging to this art movement, like Claude Monet, Max Liebermann and Karl Hagemeister. The blue boxes represent additional search resul
Moreover, keyword search proves a rather
inefficient method if users introduce additional constraints into their
queries. For example, if someone is looking for museums or events
related to "impressionism" that are open on Tuesday, charge an entrance fee of less than
10€, and offer audio guidance in English, the museums of interest can hardly
be found by a simple enumeration of keywords. Instead, a mixed approach
of searching and navigation is required (as depicted in Figure 3).
First, the user has to perform a query for the key concept (i.e.
impressionism, see Fig. 3.a), then he or she has to examine each
museum or event found by following links to subpages containing information on
opening hours, prices, and services (see Fig. 3.b-d). In this concrete
example, the user of the Museumsportal Berlin would have to follow a
navigation path of 45 clicks while evaluating and
aggregating all the information "manually" and memorizing the museums
that satisfy his or her preferences.
Figure 3: A navigation path for a complex query.
The problems associated with keyword search described above
are mainly caused by the fact that most of the information available on
portals such as the Museumsportal Berlin is represented in
the form of textual descriptions designed to be read by humans. Although
machines can parse web pages for layout processing, they do not
understand the semantics of the data. We therefore propose enhancements to
the portal that rely on a formal representation of the information from
the arts domain using Semantic Web technologies.
Enhancing the Portal with Semantic Web Technologies
In the following, we present some ideas on how search and navigation on the portal may be improved by applying Semantic Web technologies.
Museum Ontology
As pointed out above, the main problem with accessing information on the Museumsportal arises from the fact that the information is represented as text whose semantics can hardly be understood by computers. Our proposed solution to this problem is to formalize the portal data in the form of a museum ontology consisting of two sub-ontologies:
- Museum Description Ontology defining the semantic structure and key concepts used for describing cultural institutions as well as events and exhibitions offered by them.
- Arts Domain Ontology capturing the general knowledge from the arts domain including information on artists, art movements, etc.
The former sub-ontology is populated with instances of the museums present on the portal. We convert all the available data about each museum into the schema of our ontology. Since most of the information is provided by the institutions themselves through a simple input form, the data is rather weakly structured. Therefore, we additionally apply Named Entity Recognition techniques to extract artist names, etc., and identify catchwords belonging to the arts domain. The names and catchwords found are, in turn, mapped onto concepts from the latter sub-ontology, thereby connecting the information about museums with a broader knowledge base of semantic relations from the arts domain (compare the integration component in Figure 4).
Since ontology development and maintenance is a rather complex and costly task, especially for a domain as broad as the arts, we try to reuse existing knowledge provided by other communities such as Wikipedia. At this point, it is important to note that we use this particular information source only as a practical example to illustrate the potential benefits of applying semantic technologies. In fact, there exist several classifications and thesauri, for example the Art & Architecture Thesaurus (AAT) or the Union List of Artist Names (ULAN), which could be used as a foundation for our domain ontology as well.
Figure 4: A schematic view of the system architecture.
Information Integration
There is, however, one important issue with integrating information from Wikipedia into the Museumsportal Berlin: Wikipedia itself is a collection of documents in textual form, targeted at human readers, and thus can only be queried by keywords. As argued before, we need a well-structured and semantically rich representation of data in order to overcome the limits of keyword search. This is even more important if we want to automatically integrate the relevant information from Wikipedia into the Museumsportal Berlin. Fortunately, owing to the DBpedia project, a community effort that extracts structured data from Wikipedia and represents it with Semantic Web technologies, we can easily perform this integration task.
For each catchword or named entity (e.g. an artist name) found either in the museum description or among its tags, we perform a lookup in DBpedia to check whether the given concept belongs to the arts domain. This is determined from the category of the DBpedia resource corresponding to the concept in question. For example, the catchword "bauhaus" has a corresponding DBpedia resource dbpedia:Bauhaus which belongs to the category yago:ArtMovements (indicated by the property rdf:type), as shown in Figure 5. If the catchword is positively validated, additional information describing this resource (in this example, painters associated with this movement, etc.) is integrated into our ontology and stored in a local triple store for improved performance (see Figure 4).
Figure 5: The DBpedia resource for Bauhaus.
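The validation of a catchword against the arts domain can be sketched as follows. This is a minimal Python illustration using a tiny in-memory set of triples instead of DBpedia or a triple store. The entry for Berlin, the category yago:Capitals, the set ARTS_CATEGORIES, and the function names are assumptions made for the example only.

```python
# Tiny in-memory stand-in for DBpedia; a real system would query a SPARQL
# endpoint or the local triple store instead. The data is illustrative.
TRIPLES = {
    ("dbpedia:Bauhaus", "rdfs:label", "bauhaus"),
    ("dbpedia:Bauhaus", "rdf:type", "yago:ArtMovements"),
    ("dbpedia:Berlin", "rdfs:label", "berlin"),
    ("dbpedia:Berlin", "rdf:type", "yago:Capitals"),  # hypothetical category
}

# Assumed list of categories that count as belonging to the arts domain.
ARTS_CATEGORIES = {"yago:ArtMovements", "yago:GermanPainters"}

def find_resource(catchword):
    """Return the resource whose label equals the catchword, or None."""
    word = catchword.lower()
    for subject, predicate, obj in TRIPLES:
        if predicate == "rdfs:label" and obj == word:
            return subject
    return None

def is_arts_concept(catchword):
    """Check whether the catchword maps to a resource typed with an arts category."""
    resource = find_resource(catchword)
    if resource is None:
        return False
    types = {o for s, p, o in TRIPLES if s == resource and p == "rdf:type"}
    return bool(types & ARTS_CATEGORIES)

print(is_arts_concept("Bauhaus"))  # True
print(is_arts_concept("Berlin"))   # False
```

Only catchwords that pass this check would be enriched with further data and stored locally.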
By linking domain concepts on the Museumsportal with DBpedia resources (which also point to human-readable Wikipedia articles), we are able to enrich the content presented on the portal by embedding additional information on catchwords and entities found in museum descriptions. Consequently, visitors to the portal are provided with comprehensive information on the subject of museum exhibitions without needing to leave the Museumsportal to consult other sources for more details on the keywords they encounter (see Figure 6).
Figure 6: A popup for the entity surrealism providing additional information.
Improved Search and Navigation
Apart from enriching the information presented in the front-end of the
Museumsportal Berlin, we also use semantic relations between concepts
from the arts domain in order to overcome the limits of keyword search
discussed earlier.
Since the domain ontology extracted from
DBpedia contains information on synonyms and alternative spellings for
arts concepts, e.g. impressionism and impressionist art, as well as on
aliases of artist names (both indicated by the property
dbprop:redirect), e.g. Sandro Botticelli and Il Botticello, we utilize
this data by applying the mechanism of query expansion. Each search for a
keyword specified by the user is complemented with queries for all its
synonyms and spelling variations from our ontology.
Moreover,
this simple mechanism is also applied to provide cross-lingual search.
Although most of the museum and exhibition descriptions, delivered by
those institutions themselves, are available in German as well as in
English, there are still some exceptions where only a German version is
available, especially in the case of tags. However, since the concepts
in our ontology are associated with their names in different languages
(indicated by the rdfs:label property, see Fig. 5) we are able to map
the search keyword specified by the user to the same ontology concept,
regardless of the language used, and expand the query into other
languages. For example, the search for impressionism (English) is realized
by mapping this keyword to the concept dbpedia:Impressionism and
performing the search for both the English term and the expanded German term (i.e.
Impressionismus).
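The lexical expansion described above (synonyms, spelling variants, and translations) can be sketched in Python. This is a minimal illustration, not the system's implementation: the dictionaries REDIRECTS and LABELS stand in for the dbprop:redirect and rdfs:label data in the ontology, and the function names are hypothetical.

```python
# Illustrative data; values are examples from the text plus assumed entries.
REDIRECTS = {  # variant (lowercase) -> resource, as given by dbprop:redirect
    "impressionist art": "dbpedia:Impressionism",
    "il botticello": "dbpedia:Sandro_Botticelli",
}
LABELS = {  # resource -> {language: rdfs:label}
    "dbpedia:Impressionism": {"en": "Impressionism", "de": "Impressionismus"},
    "dbpedia:Sandro_Botticelli": {"en": "Sandro Botticelli", "de": "Sandro Botticelli"},
}

def resolve(keyword):
    """Map a keyword in any known language or spelling to its concept."""
    key = keyword.lower()
    if key in REDIRECTS:
        return REDIRECTS[key]
    for resource, labels in LABELS.items():
        if key in (label.lower() for label in labels.values()):
            return resource
    return None

def expand_lexically(keyword):
    """Return the keyword plus all labels and variants of its concept."""
    terms = [keyword]
    resource = resolve(keyword)
    if resource is not None:
        terms += LABELS[resource].values()
        terms += [variant for variant, r in REDIRECTS.items() if r == resource]
    # Remove case-insensitive duplicates while preserving order.
    seen, result = set(), []
    for term in terms:
        if term.lower() not in seen:
            seen.add(term.lower())
            result.append(term)
    return result

print(expand_lexically("impressionism"))
print(expand_lexically("Il Botticello"))
```

Each returned term would then be passed to the ordinary keyword search, so a query in English also reaches descriptions and tags that exist only in German.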
The examples so far deal with improving
the keyword search for a particular concept from the arts domain by
considering its different lexical representations (synonyms, alternative
spelling, translations in different languages, etc.). The mechanism of
query expansion, however, may go one step further by additionally taking
into consideration semantic relations between different concepts. For
example, based on our ontology, we are able to expand the search for an
art movement into queries for artists belonging to (indicated by the
property dbprop:movement) this particular style, or in the case of
artists additionally search for their style and other artists they are
related to in various ways (indicated by properties like
dbprop:influencedBy or dbprop:training). Those kinds of semantic
relations are the most interesting ones from the users' point of view.
Figure 7: An expansion rule for the query expansion.
To give a better understanding of our approach to query expansion, we
illustrate the workflow of a search in our system. The search is
initiated by a user entering a search term, e.g. "Paul Klee". The system
first performs a normal keyword-based search, like any
classical information retrieval system. It then searches the
Museum Description Ontology for a resource with the label "Paul Klee".
If it finds one (or more), as in this example dbpedia:Paul_Klee of
rdf:type yago:GermanPainters, the search term is mapped to
this resource.
We then search a set of predefined semantic rules for the query
expansion. These rules are defined in an RDF file, which provides
greater flexibility: existing rules can easily be modified if the schema
changes, and new rules can be added. Every rule has a set of resource
types it applies to (because
of a restructuring of the DBpedia ontology in version 3.6, we
match rules not only by means of the rdf:type property but by arbitrary
properties, especially dcterms:subject). In this example the system finds (among others) the rule
rule:ExtendArtistWithMovement, which applies to resources of the
supertype yago:Creator109614315, as shown in Figure 7. To determine the supertype of a resource we use inference. A rule also
contains an expansion pattern, which is a SPARQL query that returns
the labels of resources that the rule considers related to the searched
resource. In this case the expansion pattern returns labels of resources
that are art movements, e.g. "expressionism", "surrealism" and
"bauhaus". The system then performs a keyword-based search for every
additional label the expansion query delivered. In this example the
classical search in our system, like the original Museumsportal,
finds three normal results. Our system additionally finds nine more
museums that have exhibits from the art movements of Paul Klee, e.g.
expressionism.
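The rule matching step of this workflow can be sketched in Python. The sketch makes several assumptions: the type hierarchy in SUBCLASS_OF (including the intermediate type yago:Painter) is invented for illustration, the rule is a plain dictionary rather than RDF, and a Python function stands in for the SPARQL expansion pattern.

```python
# Illustrative in-memory data; identifiers follow the text where available.
SUBCLASS_OF = {  # assumed excerpt of the type hierarchy
    "yago:GermanPainters": "yago:Painter",  # hypothetical intermediate type
    "yago:Painter": "yago:Creator109614315",
}
TYPES = {"dbpedia:Paul_Klee": {"yago:GermanPainters"}}
MOVEMENTS = {  # labels reachable via dbprop:movement
    "dbpedia:Paul_Klee": ["expressionism", "surrealism", "bauhaus"],
}

def supertypes(type_name):
    """Yield the type and all its ancestors (a simple form of inference)."""
    while type_name is not None:
        yield type_name
        type_name = SUBCLASS_OF.get(type_name)

RULES = [
    {
        "name": "rule:ExtendArtistWithMovement",
        "applies_to": "yago:Creator109614315",
        # Stand-in for the SPARQL expansion pattern: returns related labels.
        "expand": lambda resource: MOVEMENTS.get(resource, []),
    },
]

def expand_semantically(resource):
    """Collect labels from every rule whose type matches the resource."""
    all_types = {t for direct in TYPES.get(resource, ()) for t in supertypes(direct)}
    labels = []
    for rule in RULES:
        if rule["applies_to"] in all_types:
            labels.extend(rule["expand"](resource))
    return labels

print(expand_semantically("dbpedia:Paul_Klee"))
```

Each returned label would then be used for an additional keyword-based search.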
The advantage of our hybrid approach to search (first
performing a normal keyword-based search and then expanding the search
based on semantic relations between domain concepts) is that, on the one hand,
the system still finds all the results the original Museumsportal Berlin
finds. On the other hand, the system is able to find more relevant
results, thus offering an improvement over the normal search.
Because
the expansion of a query into semantically related concepts increases
the number of results, it is important to present the search result in
such a way that it is manageable and comprehensible to users. At this
point, once again, the semantic relations used in the process of query
expansion can be used for generating explanations of the result set.
One possible way of doing this is to first list the exact matches of
the searched keyword, followed by the results obtained through query
expansion, each with a dynamically generated explanation, as shown in
Figure 8.
Figure 8: An explanation for an extended result telling the user why this result is shown.
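One possible way to generate such a result list is sketched below in Python. The function name present_results, the museum names, and the wording of the explanations are hypothetical. The sketch only illustrates the ordering (exact matches first) and the reuse of the relation that triggered the expansion.

```python
def present_results(term, exact_hits, expanded_hits):
    """Return display lines: exact matches first, then explained expansions.

    expanded_hits maps a museum name to (related_label, relation_text),
    i.e. the label that matched and the relation that led to it.
    """
    lines = [f"{name} (matches '{term}')" for name in exact_hits]
    for name, (label, relation) in expanded_hits.items():
        if name in exact_hits:
            continue  # already listed as an exact match
        lines.append(f"{name} (shown because '{term}' {relation} '{label}')")
    return lines

for line in present_results(
    "Paul Klee",
    ["Museum A"],
    {"Museum B": ("expressionism", "is associated with the movement")},
):
    print(line)
```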
Another advantage of a well-structured ontology-based representation of
museum data is that it makes complex queries,
such as those discussed earlier, possible. Users can specify their search constraints
(e.g. desired services, etc.) through facets corresponding to the
possible values of the ontology properties that describe museums and
exhibitions. The preferences provided by the visitors of the portal are
then translated into a formal query (i.e. SPARQL, see Figure 4). As a
consequence, the number of clicks currently required to find the
desired information, as shown in the example in Figure 3, can
be reduced significantly.
Figure 9: A complex search without a long navigation path.
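The translation of facet selections into a formal query might look roughly like the following Python sketch. The mp: prefix, its example.org namespace, and the property names (mp:topic, mp:openOn, mp:audioGuide, mp:entranceFee) are placeholders invented for this example and are not taken from the actual Museum Description Ontology.

```python
def build_facet_query(facets, max_fee=None):
    """Translate facet selections into a SPARQL SELECT query string.

    facets maps a (hypothetical) property such as 'mp:openOn' to the value
    chosen by the user; max_fee is an optional upper bound on the fee.
    Values are inserted verbatim, so real code would need to escape them.
    """
    lines = ["PREFIX mp: <http://example.org/museum#>", "SELECT ?museum WHERE {"]
    for prop, value in sorted(facets.items()):
        lines.append(f'  ?museum {prop} "{value}" .')
    if max_fee is not None:
        lines.append("  ?museum mp:entranceFee ?fee .")
        lines.append(f"  FILTER (?fee < {max_fee})")
    lines.append("}")
    return "\n".join(lines)

print(build_facet_query(
    {"mp:topic": "impressionism", "mp:openOn": "Tuesday", "mp:audioGuide": "en"},
    max_fee=10,
))
```

A single query of this kind combines all constraints from the earlier example (topic, opening day, price, and audio guide language), which is what allows the long navigation path to be avoided.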
Demonstrator
We also have a demonstrator online. Please note that we provide only an example dataset (more precisely, a snapshot of the Museumsportal's data). Because of this, you will not be able to find current exhibitions.