The increasing availability and maturity of both scalable computing architectures and deep syntactic parsers is opening up new possibilities for Relation Extraction (RE) on ever-growing corpora of natural language text. Freepal is a resource designed to assist with the creation of relation extractors for more than 5,000 relations defined in the Freebase knowledge base. The resource consists of over 10 million distinct lexico-syntactic patterns defined over dependency trees, each of which is assigned to one or more Freebase relations with different confidence strengths.

The resource was generated with a large-scale distant supervision approach: over 260 million sentences from the ClueWeb09 corpus were extracted, parsed, and labeled with Freebase entities and relations.
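The labeling step can be illustrated with a small sketch. It assumes a toy knowledge base that maps entity pairs to relation names, plus a list of (entity pair, pattern) observations. Every pattern seen between a pair is counted for every relation that pair holds. The relative-frequency confidence score is a hypothetical stand-in chosen for illustration, not necessarily the scoring used to build Freepal. The entity identifiers and the relation name `HypotheticalRelation` are made up.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

EntityPair = Tuple[str, str]


def label_patterns(
    kb: Dict[EntityPair, Set[str]],
    observations: Iterable[Tuple[EntityPair, str]],
) -> Dict[str, Dict[str, float]]:
    """Distant supervision sketch: a pattern seen between an entity pair is
    counted once for every relation the pair holds in the knowledge base.
    Confidence is a hypothetical relative frequency per pattern."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for pair, pattern in observations:
        for relation in kb.get(pair, set()):
            counts[pattern][relation] += 1
    scores: Dict[str, Dict[str, float]] = {}
    for pattern, relation_counts in counts.items():
        total = sum(relation_counts.values())
        scores[pattern] = {rel: n / total for rel, n in relation_counts.items()}
    return scores


# Toy data; entity IDs and "HypotheticalRelation" are made up for illustration.
kb = {
    ("e1", "e2"): {"PersonSiblingOfPerson"},
    ("e3", "e4"): {"PersonSiblingOfPerson", "HypotheticalRelation"},
}
observations = [
    (("e1", "e2"), "[X] brother of [Y]"),
    (("e3", "e4"), "[X] brother of [Y]"),
    (("e5", "e6"), "[X] met [Y]"),  # pair not in the KB, so no label
]
scores = label_patterns(kb, observations)
print(scores["[X] brother of [Y]"])
```

Here the pattern `[X] brother of [Y]` ends up with a score of 2/3 for PersonSiblingOfPerson and 1/3 for HypotheticalRelation, while `[X] met [Y]` receives no label because its entity pair is unknown to the knowledge base.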

The dataset is released to the research community so that our method can be evaluated further and so that its findings can serve as a basis for building powerful relation extraction systems.

Patterns are extracted from the shortest undirected path between two annotated entities in the dependency tree. Each pattern becomes a candidate for every Freebase relation that the two entities participate in (for example, PersonSiblingOfPerson).
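Finding that path amounts to a breadth-first search over the dependency tree with edge directions ignored. The sketch below assumes the parse is given as (head, label, dependent) triples over token indices. The example parse is hand-made for a simplified sentence ("Marshall argued case Brown_v._Board before Supreme_Court", with entity mentions collapsed into single tokens) and is not the output of the parser used to build Freepal.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Edge = Tuple[int, str, int]  # (head index, dependency label, dependent index)


def shortest_undirected_path(
    edges: List[Edge], source: int, target: int
) -> Optional[List[int]]:
    """Breadth-first search with edge directions ignored.
    Returns token indices from source to target, or None if unconnected."""
    adjacency: Dict[int, List[int]] = {}
    for head, _label, dependent in edges:
        adjacency.setdefault(head, []).append(dependent)
        adjacency.setdefault(dependent, []).append(head)
    previous: Dict[int, Optional[int]] = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the predecessor links back to the source.
            path: List[int] = []
            current: Optional[int] = node
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbour in adjacency.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


# Hand-made, simplified parse (an assumption for illustration).
tokens = ["Marshall", "argued", "case", "Brown_v._Board", "before", "Supreme_Court"]
edges = [(1, "nsubj", 0), (1, "dobj", 2), (2, "appos", 3), (1, "prep", 4), (4, "pobj", 5)]
path = shortest_undirected_path(edges, 3, 5)
print([tokens[i] for i in path])
# ['Brown_v._Board', 'case', 'argued', 'before', 'Supreme_Court']
```

The content words on this path (argued, case, before) are the same ones that appear in the sample pattern `argue case [X] before [Y]` given on this page.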

The dataset can be downloaded in the following variants:

Large dataset
10 million distinct patterns with their observed relations. 180 MB BZip2-compressed JSON (4.4 GB uncompressed).
Non-lemmatized
7.5 million non-lemmatized patterns with their observed relations. 200 MB BZip2-compressed JSON (4.7 GB uncompressed).
Website dataset
The dataset that drives the web demonstrator. 15 MB JSON.
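Because the compressed files expand to several gigabytes, it is usually preferable to decompress them on the fly rather than unpacking them to disk first. A minimal sketch using Python's standard `bz2` and `json` modules, assuming one JSON document per line; the file name `freepal-large.json.bz2` in the usage comment is hypothetical.

```python
import bz2
import json
from typing import Any, Dict, Iterator


def iter_records(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON document per non-empty line of a BZip2 file,
    decompressing lazily so the full payload never sits in memory."""
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


# Usage (hypothetical file name; substitute the downloaded archive):
# for record in iter_records("freepal-large.json.bz2"):
#     print(record["feature"])
```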

If you need the dataset in a different format, please contact us.

The dataset consists of individual JSON documents, one per line. A sample from the small (website) dataset looks like the following:
 "feature":"argue case [X] before [Y] [0-dobj-1,0-prep-3,1-appos-2,3-pobj-4]",
 "sentence":"Justice Marshall successfully argued the 1954 landmark case Brown v. Board of Education before the U.S. Supreme Court.",
The large dataset does not contain any sample sentences or Freebase annotations. A sample record looks like the following:
 "feature":"argue case [X] before [Y] [0-dobj-1,0-prep-3,1-appos-2,3-pobj-4]",
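Judging from the sample above, each `feature` string appears to list the pattern's words followed by a bracketed edge list, in which an entry such as `0-dobj-1` links word 0 (`argue`) to word 1 (`case`) with a `dobj` dependency. The parser below assumes exactly this reading (head index, label, dependent index); it is inferred from the single sample, not from a format specification.

```python
from typing import List, Tuple

Edge = Tuple[int, str, int]  # (head index, dependency label, dependent index)


def parse_feature(feature: str) -> Tuple[List[str], List[Edge]]:
    """Split a feature string into its words and its dependency edges.
    Assumes the edge list is the last bracketed group in the string."""
    words_part, edges_part = feature.rsplit(" [", 1)
    words = words_part.split()
    edges: List[Edge] = []
    for item in edges_part.rstrip("]").split(","):
        if not item:
            continue
        parts = item.split("-")
        # Indices sit at both ends; the label is whatever lies between them.
        head, dependent = int(parts[0]), int(parts[-1])
        edges.append((head, "-".join(parts[1:-1]), dependent))
    return words, edges


feature = "argue case [X] before [Y] [0-dobj-1,0-prep-3,1-appos-2,3-pobj-4]"
words, edges = parse_feature(feature)
for head, label, dependent in edges:
    print(f"{words[head]} -{label}-> {words[dependent]}")
# argue -dobj-> case
# argue -prep-> before
# case -appos-> [X]
# before -pobj-> [Y]
```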


If you use this data in a publication, please cite it using one of the following references:

Johannes Kirschnick, Alan Akbik, Holmer Hemsen, "Freepal: A Large Collection of Deep Lexico-Syntactic Patterns for Relation Extraction", in the 9th edition of the Language Resources and Evaluation Conference (LREC), 2014.

Johannes Kirschnick, Alan Akbik, Holmer Hemsen, "Freepal Dataset: A Large Collection of Deep Lexico-Syntactic Patterns for Relation Extraction", Version 1 (released 24.10.2013), October 2013.



