WordNet
Traditional dictionaries (in book format) are designed to be read and searched by a human being, and the driving organizational principle behind such a dictionary is alphabetical order. A dictionary (or lexicon) designed to be machine-readable, however, does not have to live under such constraints. WordNet is a lexical database of the English language in which words (separated into the parts of speech: nouns, verbs, adjectives, and adverbs) are linked via semantic relationships. It’s a social network, so to speak, for words.
“WordNet® is a large lexical database of English, developed under the direction of George A. Miller. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations.”
This page will serve as a brief tutorial for using WordNet in Java. We will use both RiTa Wordnet by Daniel Howe and JWNL, a Java API for accessing the WordNet dictionary.
How is WordNet organized?
There are several online resources to get started learning about how the data in WordNet is organized. The WordNet Wikipedia page is a good place to start, not to mention the WordNet site itself.
Following is a brief summary of the key points:
The fundamental building block of WordNet is the synset. A synset is not a word but a collection of words (and collocations) that are synonymous (i.e., a set of synonyms). In other words, the primary record in the WordNet database is a concept or meaning, to which several words might be linked.
A word can belong to multiple synsets. For example, consider the word “eat.” Here are its synsets:
All synsets are categorized into four parts of speech: nouns, verbs, adjectives, and adverbs. While a traditional dictionary or thesaurus focuses on the meanings of words, WordNet focuses on the relationships between synsets. In WordNet, only nouns can be related to nouns, verbs to verbs, etc.
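To make the synset idea concrete, here is a minimal sketch of how one word can map to several synsets. It is a toy, in-memory model written for illustration only, not WordNet's file format or any library's API; the class `ToyLexicon`, its `Synset` record, and all glosses below are hypothetical (records require Java 16 or later).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model: the primary record is a synset (a concept), and a word is just
// a key pointing at every synset it belongs to. Made-up data, not WordNet's.
class ToyLexicon {
    // One concept: part-of-speech code, a short gloss, and its member words
    record Synset(char pos, String gloss, List<String> words) {}

    private final Map<String, List<Synset>> index = new HashMap<>();

    // Register a synset and index it under each of its member words
    void add(Synset s) {
        for (String w : s.words()) {
            index.computeIfAbsent(w, k -> new ArrayList<>()).add(s);
        }
    }

    // All synsets (senses) a word belongs to, across every part of speech
    List<Synset> senses(String word) {
        return index.getOrDefault(word, List.of());
    }

    public static void main(String[] args) {
        ToyLexicon lex = new ToyLexicon();
        lex.add(new Synset('n', "a score in baseball", List.of("run", "tally")));
        lex.add(new Synset('v', "move fast on foot", List.of("run")));
        lex.add(new Synset('v', "be a candidate for office", List.of("run", "campaign")));
        for (Synset s : lex.senses("run")) {
            System.out.println(s.pos() + ": " + s.gloss() + " " + s.words());
        }
    }
}
```

Because each synset object is indexed under all of its words, "run" and "tally" retrieve the very same synset, which is the sense in which several words are linked to one concept.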
Following are the semantic relationships available in WordNet:
All parts of speech
Nouns only
Verbs only
Using RiTa
We’re going to access WordNet via RiTa Wordnet by Daniel Howe. The RiTa Wordnet download comes with the WordNet dictionary itself, so all you need to do is add two JAR files to your Eclipse build path: ritaWN.jar and supportWN.jar. Both of these JARs are included in the download.
To use RiTa wordnet, you must first declare and initialize the RiWordnet object.
RiWordnet wordnet = new RiWordnet(null);
Just as with regular RiTa from last week, RiTa Wordnet expects a Processing PApplet. Since our examples are Java console apps, we can just pass null to the constructor (if you use RiTa with Processing, you should pass in a reference to the PApplet instead).
The wordnet object can then query the Wordnet dictionary for you. There are many useful functions, such as: exists(), getAllAntonyms(), getAllHypernyms(), getAllSynonyms(), getPos(), getSenseIds(), getSoundsLike(), etc. Let’s walk through a few examples.
Let’s say we are starting with a word: “run”. Word tokens are not the fundamental building blocks of WordNet; everything in WordNet is organized into synsets (senses), and a word can be a member of several synsets. “Run” can mean a lot of different things. As a noun, it can be a run scored in a baseball game or a jog around Central Park. “Run” can also be a verb, of course: to run around the park, to run for office, etc. WordNet can tell us all of this. We first start with a String.
String word = "run";
Then we ask RiTa Wordnet for all the parts of speech available for that String.
String[] pos = wordnet.getPos(word);
RiTa will give you the following Strings indicating parts of speech:
n -> noun
v -> verb
a -> adjective
r -> adverb
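If you want to print these codes as readable names, a small lookup table is enough. `PosNames` is a hypothetical helper, not part of RiTa, and the array in `main` is a hard-coded stand-in for whatever `wordnet.getPos(word)` returns.

```java
import java.util.Map;

// Hypothetical helper (not part of RiTa): maps the single-letter
// part-of-speech codes listed above to readable names.
class PosNames {
    private static final Map<String, String> NAMES = Map.of(
        "n", "noun",
        "v", "verb",
        "a", "adjective",
        "r", "adverb");

    // Returns the readable name, or the code itself if it is not recognized
    static String name(String code) {
        return NAMES.getOrDefault(code, code);
    }

    public static void main(String[] args) {
        // Hard-coded stand-in for the array returned by wordnet.getPos(word)
        String[] pos = {"n", "v"};
        for (String p : pos) {
            System.out.println(p + " -> " + name(p));
        }
    }
}
```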
Armed with a part of speech, you can then ask RiTa for all of the “senses” associated with that word.
int[] ids = wordnet.getSenseIds(word, pos[0]);
In WordNet, each “sense” is unique and therefore has a unique ID number. Looping through the IDs, we can get more information about each sense, such as its description and the list of words in its synset.
for (int i = 0; i < ids.length; i++) {
  // Sense ID #
  System.out.println("Sense: " + ids[i]);
  String description = wordnet.getDescription(ids[i]);
  // Sense Description (definition)
  System.out.println("Description: " + description);
  // All words that belong to this synset
  String[] words = wordnet.getSynset(ids[i]);
  if (words != null) {
    System.out.print("Synset: ");
    for (int j = 0; j < words.length; j++) System.out.print(words[j] + " ");
  }
  System.out.println("\n-------------------------");
}
WordNet is a massive network of pointers. Every synset points to other synsets, and each pointer is of a certain type: antonym, synonym, hypernym, etc. RiTa Wordnet simplifies these relationships by providing a series of functions that return (as an array of Strings) a list of related words for any given word (and part of speech). For example, if you want the list of all hyponyms for a given word, you would write:
// Hyponyms for all senses
String word = "cat";
String pos = wordnet.getBestPos(word);
String[] result = wordnet.getAllHyponyms(word, pos);
for (int i = 0; i < result.length; i++) {
  System.out.println(result[i]);
}
Note that this returns all hyponyms for all of the synsets that include the word “cat.” If you want hyponyms for a specific sense, you have to use a sense ID in combination with the function getHyponyms(). You can get lists of other types of related words in exactly the same manner with the other functions RiTa Wordnet offers: getAllAntonyms(), getAllDerivedTerms(), getAllHolonyms(), getAllHypernyms(), getAllHyponyms(), getAllMeronyms(), getAllNominalizations(), getAllSynonyms(), etc.
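The difference between “all senses” and “one sense” boils down to a set union. The sketch below uses made-up sense IDs and hyponym lists (not real WordNet data) and hypothetical method names; it only illustrates the idea and makes no claim about how RiTa implements getAllHyponyms().

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy illustration of "all senses" versus "one sense". All ids and words
// below are hypothetical data, not taken from WordNet.
class AllSensesDemo {
    // word -> ordered list of its sense ids
    static final Map<String, List<Integer>> SENSES = Map.of(
        "cat", List.of(1, 2));

    // sense id -> hyponyms of that single synset
    static final Map<Integer, List<String>> HYPONYMS = Map.of(
        1, List.of("tabby", "Siamese cat"),
        2, List.of("lion", "tiger", "tabby"));

    // Per-sense lookup: hyponyms of exactly one synset
    static List<String> hyponymsOf(int senseId) {
        return HYPONYMS.getOrDefault(senseId, List.of());
    }

    // "All senses" lookup: union over every sense of the word,
    // keeping first-seen order and dropping duplicates
    static Set<String> allHyponyms(String word) {
        Set<String> result = new LinkedHashSet<>();
        for (int id : SENSES.getOrDefault(word, List.of())) {
            result.addAll(hyponymsOf(id));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(hyponymsOf(2));      // prints [lion, tiger, tabby]
        System.out.println(allHyponyms("cat")); // prints [tabby, Siamese cat, lion, tiger]
    }
}
```

Using a LinkedHashSet keeps the first-seen order while dropping words listed under more than one sense, which is why “tabby” appears only once in the merged result.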
Here are some other examples:
Listing Hyponyms: Hyponyms.java
Listing Synonyms: Synonyms.java
Listing Synsets: SensesLookup.java
Rewriting a text using antonyms, synonyms, and hyponyms: Rewrite.java, Replacer.java
Using JWNL
JWNL (Java WordNet Library) is a Java API for accessing the WordNet dictionary. I recommend that you use the much friendlier RiTa Wordnet (as described above); however, if you want to delve deeper into the code and walk through the "pointer" relationships in WordNet more manually, JWNL is the place to be. Here's how you can get started.
Step 1. Download WordNet.
Select and download the WordNet files for your OS. The folder we care about is "dict," which stores the dictionary files.
Step 2. Download JWNL
JWNL is available for download here. You can read a bit more about it here as well as peruse the JavaDocs.
Step 3. Grab jars and configure the properties file
The download includes two JAR files: jwnl.jar and commons-logging.jar. You must add both of these to your build path in Eclipse to run the examples provided on this page. In addition, make sure to copy the sample "file_properties.xml" file and edit the line that sets the path to the dictionary. For example:
<param name="dictionary_path" value="/Users/daniel/Desktop/WordNet-3.0/dict"/>
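A wrong dictionary_path is an easy mistake to make, so it can help to check it before initializing JWNL. The following is a hypothetical pre-flight check (not part of JWNL) that reads that param with the JDK's built-in DOM parser and confirms the folder exists; it assumes the param appears as a `<param name="dictionary_path" value="..."/>` element, as in the line above.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical pre-flight check (not part of JWNL): read dictionary_path
// from a JWNL-style properties file and confirm the folder exists.
class DictPathCheck {
    // Value of the first <param name="dictionary_path" .../>, or null if absent
    static String dictionaryPath(File propsFile) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(propsFile);
        NodeList params = doc.getElementsByTagName("param");
        for (int i = 0; i < params.getLength(); i++) {
            Element p = (Element) params.item(i);
            if ("dictionary_path".equals(p.getAttribute("name"))) {
                return p.getAttribute("value");
            }
        }
        return null;
    }

    // True if the configured path exists and is a directory
    static boolean dictionaryFolderExists(File propsFile) throws Exception {
        String path = dictionaryPath(propsFile);
        return path != null && new File(path).isDirectory();
    }

    public static void main(String[] args) throws Exception {
        File props = new File(args.length > 0 ? args[0] : "file_properties.xml");
        System.out.println("dictionary_path = " + dictionaryPath(props));
        System.out.println("folder exists: " + dictionaryFolderExists(props));
    }
}
```

This only checks that the folder is there; it does not verify that the files inside match the WordNet version JWNL expects.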
Once you have JWNL installed, you can access the WordNet database from your Java application and search for these semantic relationships.
First, you must initialize JWNL with the properties file.
JWNL.initialize(new FileInputStream(propsFile));
Once the database is initialized, you can create a Dictionary object (that can be queried).
Dictionary wordnet = Dictionary.getInstance();
Once you have the Dictionary object, you can begin to look up words and search for relationships. To do this, we need to get familiar with the following classes:
IndexWord. An IndexWord is a single word and part of speech. An IndexWord can be used to look up Synset objects.
IndexWord word = wordnet.getIndexWord(POS.VERB,"run");
Once you have an IndexWord, you can look up all the Synset objects associated with that word. A Synset represents a concept and contains the set of words whose meanings are synonymous.
Synset[] senses = word.getSenses();
for (int i = 0; i < senses.length; i++) {
  // Print the word's lemma followed by the gloss (definition) of each sense
  System.out.println(word.getLemma() + ": " + senses[i].getGloss());
}
For simplicity, you might also just ask for the first one (sense indices start at 1, not 0).
Synset sense = word.getSense(1);
For any given Synset, you can search for all related Synsets (for any given type of relationship). Related Synsets in WordNet are connected by pointers, and the PointerUtils class provides a selection of methods that return the list of Synsets a given Synset points to. For example:
PointerTargetNodeList relatedList = PointerUtils.getInstance().getSynonyms(sense);
Once you have that list, you can iterate through it:
Iterator i = relatedList.iterator();
while (i.hasNext()) {
  PointerTargetNode related = (PointerTargetNode) i.next();
  Synset s = related.getSynset();
  System.out.println(s);
}
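The pointer structure behind PointerUtils can be pictured as each synset holding outgoing pointers grouped by type. This toy model uses hypothetical names (`PointerDemo`, `Node`, `Kind`) and made-up data; it is not how JWNL stores synsets, only an illustration of typed pointers.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Toy model of "a network of pointers": every synset keeps its outgoing
// pointers grouped by type. Illustrative only; these are not JWNL classes.
class PointerDemo {
    enum Kind { HYPERNYM, HYPONYM, ANTONYM }

    static final class Node {
        final String label;
        final Map<Kind, List<Node>> pointers = new EnumMap<>(Kind.class);

        Node(String label) { this.label = label; }

        // Add a typed pointer from this synset to another
        void point(Kind kind, Node target) {
            pointers.computeIfAbsent(kind, k -> new ArrayList<>()).add(target);
        }

        // Every synset this one points to with the given type
        List<Node> targets(Kind kind) {
            return pointers.getOrDefault(kind, List.of());
        }
    }

    public static void main(String[] args) {
        Node cat = new Node("cat"), feline = new Node("feline");
        cat.point(Kind.HYPERNYM, feline);
        feline.point(Kind.HYPONYM, cat);
        for (Node n : cat.targets(Kind.HYPERNYM)) System.out.println(n.label);
        // An empty list just means "no pointers of this type"
        System.out.println(cat.targets(Kind.ANTONYM).isEmpty());
    }
}
```

In this model, asking for a pointer type that a synset does not have simply yields an empty list rather than an error.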
With two Synsets, you can search for a relationship between them using the RelationshipFinder.
The RelationshipFinder requires that you specify a PointerType. There is a PointerType for each of the semantic relationships described above, for example:
PointerType.SYNONYM
PointerType.HYPERNYM
etc.
RelationshipList list = RelationshipFinder.getInstance().findRelationships(synset1, synset2, PointerType.SYNONYM);
if (!list.isEmpty()) {
  Relationship rel = (Relationship) list.get(0);
  System.out.println(rel);
}
In the above example, a list of Relationship objects is returned. However, for simplicity, only the first relationship is displayed.
Once you have that Relationship object, you can call various methods to learn about that relationship. For example, you can get the depth of that relationship (how many degrees of separation) via getDepth() as well as traverse the links between the Synsets by calling getNodeList().
System.out.println("The depth of this relationship is: " + rel.getDepth());
PointerTargetNodeList nodelist = rel.getNodeList();
Iterator i = nodelist.iterator();
while (i.hasNext()) {
  PointerTargetNode related = (PointerTargetNode) i.next();
  System.out.println(related.getSynset());
}
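Conceptually, finding a relationship between two synsets means finding a chain of pointers that connects them. Here is a minimal sketch using breadth-first search over a small, made-up graph of labels; it is not JWNL's RelationshipFinder algorithm, and here “depth” is simply the number of links in the chain, which may differ from how JWNL's getDepth() defines it. `PathDemo` and `shortestPath` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Breadth-first search for the shortest chain of links between two nodes.
// Illustration on toy data only; not JWNL's implementation.
class PathDemo {
    // Returns the chain from start to goal (inclusive), or an empty list
    static List<String> shortestPath(Map<String, List<String>> links,
                                     String start, String goal) {
        Map<String, String> parent = new HashMap<>(); // node -> node we came from
        Deque<String> queue = new ArrayDeque<>();
        parent.put(start, null);
        queue.add(start);
        while (!queue.isEmpty()) {
            String cur = queue.poll();
            if (cur.equals(goal)) {
                // Walk parents back to start, then reverse into forward order
                List<String> path = new ArrayList<>();
                for (String n = goal; n != null; n = parent.get(n)) path.add(n);
                Collections.reverse(path);
                return path;
            }
            for (String next : links.getOrDefault(cur, List.of())) {
                if (!parent.containsKey(next)) { // visit each node once
                    parent.put(next, cur);
                    queue.add(next);
                }
            }
        }
        return List.of();
    }

    public static void main(String[] args) {
        // Toy hierarchy with links in both directions (up and down)
        Map<String, List<String>> links = Map.of(
            "cat", List.of("feline"),
            "dog", List.of("canine"),
            "feline", List.of("cat", "carnivore"),
            "canine", List.of("dog", "carnivore"),
            "carnivore", List.of("feline", "canine"));
        List<String> path = shortestPath(links, "cat", "dog");
        System.out.println("Depth: " + (path.size() - 1));
        System.out.println(path);
    }
}
```

With the toy links in `main`, the chain from “cat” to “dog” is [cat, feline, carnivore, canine, dog], so the printed depth is 4.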
14 Responses to “WordNet”
Dear All,
I’m a student, and as part of one of my projects I need to do some manipulations with jwnl.jar and the WordNet dictionary. I started my work with the samples available at http://www.shiffman.net/teaching/a2z/wordnet/ , but they all stop with an exception saying:
java.io.FileNotFoundException: c:\program files\wordnet\2.1\dict\adv.idx (The system cannot find the file specified)
I have installed WordNet 2.1 on my Windows PC, but couldn’t find the specified file (adv.idx). I would be grateful if somebody could tell me where I can find this file.
Thanks in advance
I think the code you are executing uses JWNL, but JWNL doesn’t support WordNet 2.1, so try installing WordNet 2.0 and your problem should be solved.
There will be a cntlist.rev rar file in the same dict folder. Try executing it; it will get the missing files in the directory.
Can I use JWNL to measure semantic similarity between two sentences?
Thank you for your response…
I am a student and am currently working on a project that needs to use WordNet to determine the part of speech a word belongs to.
But the project is in C#. How can I use WordNet through C#?
Please Help.
Thanks.
Dear All,
I’m a student and I am working on my projects with jwnl.jar and the WordNet dictionary. I need a function which returns all hypernyms of a synset, but not just of sense 1.
Thanks in advance
Hi.
I tried the code above, and it works fine. But I don’t get any elements in the PointerTargetNodeList.
Calling relatedList.size() returns 0 in all occurrences… Hmm, can anybody help?
Regards
Hi.
I’m trying to extract synonyms using WordNet 2.0 with JWNL 1.3. However, it doesn’t work as I expected. For example, I cannot get synonyms with respect to “take, VERB”. On the other hand, the MIT JWI library gives me synonyms of “take, VERB”.
How can I use JWNL well for finding synonyms?
Thanks in advance.
Hi
I am a student. I have used the code on your website before. But now I have a problem.
relatedList = PointerUtils.getInstance().getSynonyms(sense);
doesn’t seem to work.
It returns empty.
Please help.
Heung-Seon: Try using RiTa, it makes getting synonyms much easier: http://www.rednoise.org/rita/wordnet/documentation/index.htm
Wow…nice step by step how-to….really appreciate this!
hi,
Is it possible to add information (synonyms…) to WordNet using Java?
thanks