Text preprocessing for Trend Mining
To retrieve RDF triples from a text stream, we first have to preprocess the texts. Our preprocessing component uses the statistical parser from Stanford University and creates a timeline of parsed texts.
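As a rough illustration of what "a timeline of parsed texts" could look like, the following minimal sketch groups documents by date and stores a parse result for each. It is not the project's actual component: the function names `build_timeline` and `placeholder_parse` are hypothetical, and the whitespace tokenizer merely stands in for a call to the Stanford parser.

```python
from collections import defaultdict
from datetime import date
from typing import Callable, Dict, Iterable, List, Tuple

Parse = List[str]  # assumption: a parse is modeled here as a plain token list


def placeholder_parse(text: str) -> Parse:
    """Stand-in for a real statistical parser (hypothetical, whitespace split only)."""
    return text.split()


def build_timeline(
    documents: Iterable[Tuple[date, str]],
    parse: Callable[[str], Parse],
) -> Dict[date, List[Parse]]:
    """Parse every (date, text) pair and group the results chronologically."""
    timeline: Dict[date, List[Parse]] = defaultdict(list)
    for day, text in documents:
        timeline[day].append(parse(text))
    # Sort by date so that iterating the result follows the timeline.
    return dict(sorted(timeline.items()))


if __name__ == "__main__":
    docs = [
        (date(2010, 5, 2), "Markets react to new data"),
        (date(2010, 5, 1), "Users discuss a product"),
    ]
    for day, parses in build_timeline(docs, placeholder_parse).items():
        print(day.isoformat(), parses)
```

In a real pipeline the `parse` argument would wrap the actual parser, and each timeline entry would hold a parse tree rather than a token list.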
Trend Mining corpus (in cooperation with neofonie GmbH)
Texts from our Trend Mining corpus are stored in two databases: finance and mafo. The finance database has 21 tables, each named after its document source (chats, information boards, etc.), and contains 276,587 documents. The mafo database has 27 tables, named in the same way, and contains 74,145 documents.
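With one table per document source, the size of a corpus is the sum of the row counts over all tables. The sketch below shows this on a tiny in-memory SQLite database. The database engine, the table names, the column layout, and the helper names `count_documents` and `make_demo_db` are all assumptions made for illustration; the toy rows are not taken from the corpus.

```python
import sqlite3
from typing import Dict


def count_documents(conn: sqlite3.Connection) -> Dict[str, int]:
    """Return the number of rows in every table, keyed by table name."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    counts: Dict[str, int] = {}
    for table in tables:
        # Table names cannot be bound as parameters, so quote them explicitly.
        quoted = '"' + table.replace('"', '""') + '"'
        counts[table] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts


def make_demo_db() -> sqlite3.Connection:
    """Build a toy database with two hypothetical source tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE chats (id INTEGER PRIMARY KEY, text TEXT)")
    conn.execute("CREATE TABLE information_boards (id INTEGER PRIMARY KEY, text TEXT)")
    conn.executemany("INSERT INTO chats (text) VALUES (?)", [("a",), ("b",)])
    conn.execute("INSERT INTO information_boards (text) VALUES (?)", ("c",))
    return conn


if __name__ == "__main__":
    per_table = count_documents(make_demo_db())
    print(per_table, "total:", sum(per_table.values()))
```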
Chemisches Zentralblatt corpus (in cooperation with FIZ Chemie GmbH)
We develop approaches for the semantic preprocessing of chemical texts from the 19th century.
Further description available soon.
Team