Rule-based Information Extraction from Disease Outbreak Reports
Wafa N. Alshowaib
Pages - 37 - 58     |    Revised - 01-06-2014     |    Published - 01-07-2014
Volume - 5   Issue - 3    |    Publication Date - July 2014
Keywords: Information Extraction, Disease Outbreak, Rule-based, NLP.
Information extraction (IE) systems serve as the front end and core stage in different natural language processing tasks. As IE has proved its efficiency in domain-specific tasks, this project focused on one domain: disease outbreak reports. Several reports from the World Health Organization were carefully examined to formulate the extraction tasks: named entities, such as disease name, date and location; the location of the reporting authority; and the outbreak incident. Extraction rules were then designed based on a study of the textual expressions and elements that appeared before and after the target text.
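
As an illustration of what context-based extraction rules of this kind might look like, the following is a minimal sketch in Python using regular expressions. It is not the system described in this paper: the disease list, the cue phrases ("outbreak of", "As of", "Ministry of Health of"), the field names and the sample sentence are all hypothetical and chosen only to show how text before and after a target can anchor a rule.

```python
import re

# Hypothetical gazetteer and cue phrases; none of these come from the paper.
DISEASES = ["cholera", "yellow fever", "avian influenza"]
MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
# One or more capitalised words, e.g. "Conakry" or "Sierra Leone".
NAME = r"[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*"

# Each rule captures the target (group 1) using the text around it as context.
RULES = {
    # Left context: "outbreak of" / "case(s) of".
    "disease": re.compile(
        r"\b(?:outbreak|cases?)\s+of\s+(" + "|".join(DISEASES) + r")\b",
        re.IGNORECASE),
    # Left context: "As of" / "On".
    "date": re.compile(
        r"\b(?:As of|On)\s+(\d{1,2}\s+(?:" + MONTHS + r")\s+\d{4})\b"),
    # Left context: a disease name followed by "in".
    "outbreak_location": re.compile(
        r"\b(?:" + "|".join(DISEASES) + r")\s+in\s+(" + NAME + r")"),
    # Left context: "Ministry of Health of"; right context: a reporting verb.
    "authority_location": re.compile(
        r"\bMinistry of Health of\s+(" + NAME + r")"
        r"\s+(?:has\s+)?(?:reported|confirmed)\b"),
}


def extract(report):
    """Apply every rule to the report; unmatched fields are set to None."""
    results = {}
    for field, rule in RULES.items():
        match = rule.search(report)
        results[field] = match.group(1) if match else None
    return results


# Invented sample sentence, written to resemble an outbreak report.
sample = ("As of 12 March 2014, the Ministry of Health of Guinea has "
          "reported an outbreak of cholera in Conakry.")
print(extract(sample))
```

Keeping each rule as a separate named pattern means a single field can be refined without touching the others, which suits the manual, iterative study of phrasing that this approach depends on.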

The experiment yielded very high performance scores across all tasks. The training and testing corpora were evaluated separately. The system extracted entities and events with higher accuracy than relationships.

It can be concluded that the rule-based approach is capable of delivering reliable IE, with extremely high accuracy and coverage. However, this approach requires an extensive, time-consuming manual study of word classes and phrases.
Miss Wafa N. Alshowaib
KACST - Saudi Arabia

