ApacheCon Europe 2012

Rhein-Neckar-Arena, Sinsheim, Germany

5–8 November 2012

Solr-based search & tagging services at ZEIT Online GmbH - where metadata come from

Christoph Goller

Audience level:
Intermediate
Track:
Linked Data

Thursday 10:15 a.m.–10:45 a.m. in Level 2 Left

Description

This talk will showcase IntraFind's recent efforts to improve searchability and the relevance of search results for ZEIT Online, the major German news portal of the weekly newspaper "Die ZEIT", by means of automatic metadata generation and semantic linking.

Abstract

This talk will showcase IntraFind's recent efforts to improve searchability and the relevance of search results for ZEIT Online, the major German news portal of the weekly newspaper "Die ZEIT". To achieve these two goals, we used our morphological analyzers as well as automatic augmentation (metadata generation) of the archive's content, using our components for information extraction, text classification, and statistical tagging, which are available as RESTful web services. The resulting metadata can be used for search, faceting, and clustering, helping visitors of the web portal search and browse the large number of articles in the ZEIT Online archive.

In the second part of the talk, we will present our experience integrating our metadata generation components into Apache Stanbol, a framework for semantic content management, which we will briefly introduce. The main advantages we currently see in Apache Stanbol are its linking engines and the ability to use its persistence and reasoning mechanisms.