About the EP4IR grammar and lexicon for Information Retrieval

Read about EP4IR and AGFL in the bibliography.

You are welcome to try version 2.8 of the EP4IR parser.

You may take a look at EP4IR Version 2.8:

Ready-to-use parsers for Unix and Windows can currently be downloaded only for AGFL 2.4; parsers for version 2.8 will be available shortly.

The goal of the "English Phrases for IR" (EP4IR) project at Radboud University Nijmegen (The Netherlands) is to develop a grammar and lexicon of English that is suitable for applications in Information Retrieval and available in the public domain.

The "English Phrases for Information Retrieval" (EP4IR) grammar started life in 1962 as the first affix grammar for a natural language (English). It was developed, implemented as a generative device and presented to the EURATOM colloquium at the University of Amsterdam by two students, Lambert Meertens and Kees Koster.

It was revived in the early 1990s, cast into a modern notation (AGFL), extended and completed with a large lexicon. The rationale behind the current version of the grammar is described in this working paper.

The parser generated by the AGFL system from the EP4IR grammar and lexicon is robust against badly formed input and unknown words. It analyses the input from left to right as a sequence of phrases, skipping words that are unknown or useless. For each phrase, the most probable analysis is sought. Using the transduction facility of AGFL, the parser can produce not only parse trees but also dependency trees, which may be unnested to dependency triplets, as described in

  • C.H.A. Koster, M. Seutter and O. Seibert (2007),
    Parsing the Medline Corpus. Proceedings RANLP 2007, pp. 325-329.

In the transduction process, the dependency trees are syntactically normalized and some elements of the input are elided, so that the result denotes only the aboutness of the text (for the notion of "aboutness", see P. Bruza and T.W.C. Huibers (1996), A Study of Aboutness in Information Retrieval. Artificial Intelligence Review, 10, pp. 1-27).
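
Unnesting a dependency tree into triplets can be illustrated with a small sketch. This is not the AGFL implementation or its output format: the `Node` class, the `unnest` function, the relation labels (`SUBJ`, `OBJ`, `ATTR`) and the example sentence are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical dependency tree node: a head word plus labelled dependents."""
    head: str
    dependents: List[Tuple[str, "Node"]] = field(default_factory=list)


Triple = Tuple[str, str, str]


def unnest(node: Node) -> List[Triple]:
    """Flatten a nested tree into (head, relation, dependent head) triples."""
    triples: List[Triple] = []
    for relation, child in node.dependents:
        # One triple for the edge from this head to the dependent's head...
        triples.append((node.head, relation, child.head))
        # ...followed by the triples nested inside the dependent itself.
        triples.extend(unnest(child))
    return triples


# Assumed normalized tree for "aspirin causes gastric bleeding"
# (illustrative labels, not actual EP4IR output).
tree = Node("cause", [
    ("SUBJ", Node("aspirin")),
    ("OBJ", Node("bleeding", [("ATTR", Node("gastric"))])),
])

for triple in unnest(tree):
    print(triple)
```

In this sketch every edge of the tree yields exactly one triplet, so a nested phrase such as "gastric bleeding" contributes its own triplet in addition to the one that links its head to the verb.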

In 2001 the AGFL system became the first parser generator for linguistic applications to be made available under the GNU General Public License (GPL). The EP4IR grammar and lexicon, as well as the parsers generated from them, fall under the LGPL, so that they can be used freely for all applications, including commercial ones.

Comments can be mailed to www-agfl@cs.ru.nl.