New method to facilitate extraction
Information extraction can pull out, for example, details of events in a company from news texts: who is leaving which post, why, and to which company and position the person is moving. In his thesis, Fredrik Olsson presents a new method that facilitates the marking up of occurrences of names in textual documents.
Information extraction entails analysing texts with the aim of identifying and picking out information about predefined types of entities, events in which the entities are engaged and relationships between entities and events. In other words it is about gaining access to structured information from an apparently unstructured source of information.
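As a rough illustration of what "structured information from an unstructured source" can mean, the sketch below turns one news-style sentence into a record. It is a toy under stated assumptions: the `JobChange` record, the `extract_job_change` function, the sample sentence and the hand-written pattern are all hypothetical, and a real extraction system would not rely on a single fixed sentence shape.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobChange:
    """Structured record for one 'person changes job' event (hypothetical schema)."""
    person: str
    old_position: str
    old_company: str
    new_position: str
    new_company: str


# Hand-written pattern for exactly one sentence shape (an assumption for this sketch).
PATTERN = re.compile(
    r"(?P<person>[A-Z][a-z]+ [A-Z][a-z]+) is leaving (?:his|her|their) post as "
    r"(?P<old_position>[\w ]+?) at (?P<old_company>[A-Z]\w+) "
    r"to become (?P<new_position>[\w ]+?) at (?P<new_company>[A-Z]\w+)"
)


def extract_job_change(sentence: str) -> Optional[JobChange]:
    """Return a JobChange record if the sentence fits the pattern, else None."""
    match = PATTERN.search(sentence)
    if match is None:
        return None
    return JobChange(**match.groupdict())


record = extract_job_change(
    "Anna Berg is leaving her post as finance director at Acme "
    "to become chief executive at Globex."
)
print(record)
```

A sentence that does not fit the pattern simply yields `None`, which hints at how brittle hand-built rules are outside the text type they were written for.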
One of the reasons that information extraction is not available to everyone is that it takes a lot of work and time to adapt a system to new data in a new text domain. A system that could handle the scenario used as an example above would probably not function at all if the task were changed to identifying interactions between proteins described in biomedical text.
An established way of approaching the problem of domain adaptation of systems for information extraction is to realise its components using machine learning, i.e. computer programs that can learn. In many respects machine learning is based on there being examples from which to learn. A component in an extraction system needs to see examples of the phenomenon it is going to learn to identify, e.g. entities and the relationships between them. The basis of this type of machine learning is thus access to large quantities of examples. However, there are major challenges in producing good examples: it is laborious, takes time and requires a person who knows the domain well to mark up examples in texts.
Recognising the names of, for example, individuals, companies and locations is fundamental to information extraction. Once names have been recognised, we can also start to look for relationships, expressed in the text, between the bearers of those names.
In his thesis, Fredrik Olsson describes the work of developing and evaluating a method, called BootMark, for marking up occurrences of names in textual documents. BootMark reduces the number of documents that a human annotator needs to mark up in order to train a name recognizer that performs as well as, or better than, a name recognizer trained on a random selection of documents from the same corpus.
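The thesis title refers to active machine learning. This text does not describe how BootMark works internally, so the following is only a generic sketch of pool-based active learning with uncertainty sampling, not BootMark itself. The toy count-based model, the function names and the sample documents are all assumptions made for illustration. The idea shown is that the annotator is asked to mark up the document the current model is least sure about, rather than a randomly chosen one.

```python
from collections import defaultdict


def train(labeled_docs):
    """Count, per token, how often annotators marked it as (part of) a name."""
    counts = defaultdict(lambda: [0, 0])  # token -> [times a name, times seen]
    for tokens, labels in labeled_docs:
        for token, is_name in zip(tokens, labels):
            counts[token][0] += int(is_name)
            counts[token][1] += 1
    return counts


def name_probability(counts, token):
    """Smoothed estimate that a token is a name; unseen tokens get 0.5."""
    name_count, seen = counts.get(token, (0, 0))
    return (name_count + 1) / (seen + 2)


def uncertainty(counts, tokens):
    """Mean token uncertainty: 1.0 at p = 0.5, falling to 0.0 at p = 0 or 1."""
    scores = [1 - abs(2 * name_probability(counts, t) - 1) for t in tokens]
    return sum(scores) / len(scores)


def select_next(counts, pool):
    """Index of the unlabeled document the model is least sure about."""
    return max(range(len(pool)), key=lambda i: uncertainty(counts, pool[i]))


# One annotated document (tokens plus name / not-name labels) and an unlabeled pool.
labeled = [(["Berg", "joined", "Acme"], [True, False, True])]
pool = [
    ["Berg", "joined", "Acme"],        # tokens the model has already seen
    ["Lind", "visited", "Globex"],     # tokens the model has never seen
]

model = train(labeled)
chosen = select_next(model, pool)
print("Ask the annotator to mark up document", chosen)
```

In a full loop, the chosen document would be annotated, moved from the pool to the labeled set, and the model retrained before the next selection.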
Title of the thesis: Bootstrapping Named Entity Annotation by Means of Active Machine Learning. A method for creating corpora.
The thesis will be publicly defended on Friday 19 December at 1.15 pm.
Location: Lilla hörsalen, Humanisten, Renströmsgatan 6
For further information contact Fredrik Olsson, mobile: +46 (0)704-15 54 10, e-mail: fredriko@sics.se
Contact person: Barbro Ryder Liljegren, Faculty of Arts, University of Gothenburg, tel. +46 (0)31-786 48 65, e-mail: barbro.ryder@hum.gu.se
More Information:
http://www.vr.se