This week’s topics:

  • WordNet
  • Examples:

  • All the code as two Eclipse projects: wordnet_week9.zip
  • Updates on CVS: /home/dts204/a2z/examples/ — Projects: WordNetJWNL, WordNetRiTa
  • RiTa code examples (adapted from Daniel Howe): Hyponyms.java, Synonyms.java, Substitutionalized.java
  • Additional RiTa examples: WordNetDemo.java, SensesLookup.java, Rewrite.java, Replacer.java
  • JWNL examples: SynReplace.java, WordNetDemo.java, WordNetHelper.java
  • Related:

  • WordNet Similarity
  • Words and Rules by Steven Pinker. You might also be interested in watching the lecture Words and Rules or reading the entire book.
  • WordNet: a lexical database for English, George A. Miller (note: you must visit this site on the NYU network or via the NYU proxy).
  • Exercises:

  • Write a function that lists all antonyms for any given word.
  • Write a function that traverses the tree of hypernyms (or hyponyms) for any given word.
  • A semantic concordance is a concordance where every word sense (as opposed to word string) is counted. Revise our concordance example to produce a semantic concordance via WordNet.
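    For the semantic concordance exercise, the key you count by changes from the word string to the word sense. Below is a minimal plain-Java sketch. It assumes each word has already been tagged with a sense ID. Choosing the right sense for each occurrence (word sense disambiguation) is the hard part and is not shown. The class, the Token type, and sense IDs like "run#v#jog" are hypothetical placeholders, not RiTa or WordNet identifiers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SemanticConcordance {
    // One token of text plus the sense ID it was tagged with (hypothetical IDs).
    static final class Token {
        final String word;
        final String senseId;
        Token(String word, String senseId) {
            this.word = word;
            this.senseId = senseId;
        }
    }

    // Ordinary concordance: count each word string.
    static Map<String, Integer> countByString(List<Token> tokens) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Token t : tokens) {
            counts.merge(t.word, 1, Integer::sum);
        }
        return counts;
    }

    // Semantic concordance: count each word sense instead.
    static Map<String, Integer> countBySense(List<Token> tokens) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Token t : tokens) {
            counts.merge(t.senseId, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Token> tokens = new ArrayList<>();
        tokens.add(new Token("run", "run#v#jog"));
        tokens.add(new Token("run", "run#v#campaign"));
        tokens.add(new Token("run", "run#v#jog"));
        System.out.println(countByString(tokens)); // {run=3}
        System.out.println(countBySense(tokens));  // {run#v#campaign=1, run#v#jog=2}
    }
}
```

The same three occurrences of "run" collapse into one entry when counted as strings. When counted as senses, they split into two entries.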
    WordNet

    Traditional dictionaries (in book format) are designed to be readable and searchable by a human being, and the driving organizational principle behind such a dictionary is alphabetical order. A dictionary (or lexicon) designed to be machine-readable, however, does not have to live under such constraints. WordNet is a lexical database of the English language where words (separated into the parts of speech: nouns, verbs, adjectives, and adverbs) are linked via semantic relationships. It’s a social network, so to speak, for words.

    “WordNet® is a large lexical database of English, developed under the direction of George A. Miller. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations.”

    This page will serve as a brief tutorial for using WordNet in Java. We will use both RiTa WordNet by Daniel Howe and JWNL, a Java API for accessing the WordNet dictionary.

    How is WordNet organized?

    There are several online resources to get started learning about how the data in WordNet is organized. The WordNet Wikipedia page is a good place to start, not to mention the WordNet site itself.

    Following is a brief summary of the key points:

    The fundamental building block of WordNet is a synset. A synset is not a word but a collection of words (and collocations) that are synonymous (i.e. a set of synonyms). In other words, the primary record in the WordNet database is a concept or meaning, to which several words may be linked.

    A word can belong to multiple synsets. For example, consider the word “eat.” Here are its synsets:

  • [IndexWord: [Lemma: eat] [POS: verb]]: take in solid food; “She was eating a banana”; “What did you eat for dinner last night?”
  • [IndexWord: [Lemma: eat] [POS: verb]]: eat a meal; take a meal; “We did not eat until 10 P.M. because there were so many phone calls”; “I didn’t eat yet, so I gladly accept your invitation”
  • [IndexWord: [Lemma: eat] [POS: verb]]: take in food; used of animals only; “This dog doesn’t eat certain kinds of meat”; “What do whales eat?”
  • [IndexWord: [Lemma: eat] [POS: verb]]: worry or cause anxiety in a persistent way; “What’s eating you?”
  • [IndexWord: [Lemma: eat] [POS: verb]]: use up (resources or materials); “this car consumes a lot of gas”; “We exhausted our savings”; “They run through 20 bottles of wine a week”
  • [IndexWord: [Lemma: eat] [POS: verb]]: cause to deteriorate due to the action of water, air, or an acid; “The acid corroded the metal”; “The steady dripping of water rusted the metal stopper in the sink”
    All synsets are categorized into four parts of speech: nouns, verbs, adjectives, and adverbs. While a traditional dictionary or thesaurus focuses on the meanings of words, WordNet focuses on the relationships between synsets. The semantic relationships described below link synsets within the same part of speech: nouns to nouns, verbs to verbs, and so on.
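    To make the synset idea concrete, here is a toy in-memory model in plain Java. It does not use WordNet's file format or any library's API. Each synset has an ID, a part of speech, a gloss, and its member words. A reverse index maps each word to every synset it belongs to. The IDs are hypothetical, and the member lists for two of the “eat” senses above are abbreviated and hand-picked.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ToySynsets {
    // A synset: one concept, expressed by one or more synonymous words.
    static final class Synset {
        final String id;
        final char pos; // 'n', 'v', 'a' or 'r'
        final String gloss;
        final List<String> words;
        Synset(String id, char pos, String gloss, List<String> words) {
            this.id = id;
            this.pos = pos;
            this.gloss = gloss;
            this.words = words;
        }
    }

    // Reverse index: word -> every synset that contains it.
    private final Map<String, List<Synset>> index = new LinkedHashMap<>();

    void add(Synset s) {
        for (String w : s.words) {
            index.computeIfAbsent(w, k -> new ArrayList<>()).add(s);
        }
    }

    // All senses of a word that have the given part of speech.
    List<Synset> senses(String word, char pos) {
        List<Synset> result = new ArrayList<>();
        for (Synset s : index.getOrDefault(word, List.of())) {
            if (s.pos == pos) result.add(s);
        }
        return result;
    }

    public static void main(String[] args) {
        ToySynsets wn = new ToySynsets();
        // Toy data: hypothetical IDs, abbreviated member lists.
        wn.add(new Synset("v1", 'v', "take in solid food", List.of("eat")));
        wn.add(new Synset("v2", 'v', "use up (resources or materials)",
                List.of("eat", "consume", "use up", "exhaust", "run through")));
        for (Synset s : wn.senses("eat", 'v')) {
            System.out.println(s.id + ": " + s.gloss + " " + s.words);
        }
    }
}
```

Looking up “eat” returns both synsets. Looking up “consume” returns only the second one, because a word belongs only to the synsets that list it.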

    Following are the semantic relationships available in WordNet:

    All parts of speech

  • Synonymy. This one is easy and links words that have similar meanings, e.g. happy and glad.
  • Antonymy. The opposite of synonymy, e.g. happy and sad.
    Nouns only

  • Hypernymy. Hypernymy refers to a hierarchical relationship between words. For example, furniture is a hypernym of chair since every chair is a piece of furniture (but not vice versa).
  • Hyponymy. Hyponymy is the opposite of hypernymy. Dog is a hyponym of canine since every dog is a canine.
  • Meronymy. Meronymy refers to a part/whole relationship. For example, paper is a meronym of book, since paper is a part of a book.
    Verbs only

  • Troponymy. Troponymy is the semantic relationship of doing something in the manner of something else. For example, “walk” is a troponym of “move” and “limp” is a troponym of “walk.”
  • Entailment. Entailment refers to the relationship between verbs where doing one thing requires doing another. If you are snoring, you must be sleeping, so snoring entails sleeping.
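    Because hyponymy is just hypernymy read in the other direction, a program only needs to store one of the two. The sketch below uses a hand-made hypernym map to walk up the hypernym tree, which is the traversal the second exercise asks for. It then derives hyponyms by inverting the map. The chain is simplified and is not WordNet's actual hierarchy. The sketch also assumes each word has at most one hypernym, which real WordNet does not guarantee. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ToyHierarchy {
    // Toy data, child -> parent ("is a kind of"). Simplified: one hypernym per word.
    static final Map<String, String> HYPERNYM = Map.of(
            "dog", "canine",
            "wolf", "canine",
            "canine", "carnivore",
            "carnivore", "mammal",
            "chair", "furniture");

    // Walk up the tree until a word has no recorded hypernym.
    static List<String> hypernymChain(String word) {
        List<String> chain = new ArrayList<>();
        String current = HYPERNYM.get(word);
        while (current != null) {
            chain.add(current);
            current = HYPERNYM.get(current);
        }
        return chain;
    }

    // Hyponymy is the inverse relation: collect every child of a word.
    // A TreeMap copy gives a deterministic (alphabetical) order.
    static List<String> hyponyms(String word) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> e : new TreeMap<>(HYPERNYM).entrySet()) {
            if (e.getValue().equals(word)) result.add(e.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(hypernymChain("dog")); // [canine, carnivore, mammal]
        System.out.println(hyponyms("canine"));   // [dog, wolf]
    }
}
```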
    Using RiTa

    We’re going to access WordNet via RiTa WordNet by Daniel Howe. The RiTa WordNet download comes with the WordNet dictionary itself, so all you need to do is add two JAR files to your Eclipse build path: ritaWN.jar and supportWN.jar. Both of these JARs are included in the download.

    To use RiTa wordnet, you must first declare and initialize the RiWordnet object.

    RiWordnet wordnet = new RiWordnet(null);
    

    Just as with regular RiTa from last week, RiTa WordNet expects a Processing PApplet. Since our examples are Java console apps, we can just pass null to the constructor (but if you use RiTa with Processing, you should pass in a reference to the PApplet).

    The wordnet object can then query the WordNet dictionary for you. There are many useful functions, such as exists(), getAllAntonyms(), getAllHypernyms(), getAllSynonyms(), getPos(), getSenseIds(), getSoundsLike(), etc. Let’s walk through a few examples.

    Let’s say we are starting with the word “run”. Word tokens are not the fundamental building blocks of WordNet. Everything in WordNet is organized into synsets (senses), and a word can be a member of several synsets. “Run” can mean a lot of different things. As a noun, it can be a run scored in a baseball game or a jog around Central Park. “Run” can also be a verb, of course: to run around the park, to run for office, etc. WordNet can tell us all of this. We first start with a String.

    String word = "run";
    

    And then ask RiTa wordnet for all the parts of speech available for that String.

    String[] pos = wordnet.getPos(word);
    

    RiTa will give you the following Strings indicating parts of speech:

    n: noun
    v: verb
    a: adjective
    r: adverb
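    To print those codes in readable form, a small lookup is enough. The helper below is hypothetical and not part of RiTa. The array in main only stands in for a getPos() result; it does not call the library.

```java
class PosCodes {
    // Map the one-letter part-of-speech codes to readable names.
    static String posName(String code) {
        switch (code) {
            case "n": return "noun";
            case "v": return "verb";
            case "a": return "adjective";
            case "r": return "adverb";
            default: throw new IllegalArgumentException("Unknown POS code: " + code);
        }
    }

    public static void main(String[] args) {
        // Stand-in for what wordnet.getPos(word) might return.
        String[] pos = {"n", "v"};
        for (String p : pos) {
            System.out.println(p + " -> " + posName(p));
        }
    }
}
```

Throwing on an unknown code, rather than returning null, makes an unexpected value show up immediately.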

    Armed with a part of speech, you can then ask RiTa for all of the “senses” associated with that word.

    int[] ids = wordnet.getSenseIds(word, pos[0]);
    

    In WordNet each “sense” is unique and therefore has a unique ID number. Looping through the IDs, we can get more information about each sense, such as its description and the list of words in its synset.

    for (int i = 0; i < ids.length; i++) {
      // Sense ID #
      System.out.println("Sense: " + ids[i]);
      String description = wordnet.getDescription(ids[i]);
      // Sense Description (definition)
      System.out.println("Description: " + description);
      // All words that belong to this synset
      String[] words = wordnet.getSynset(ids[i]);
      if (words != null) {
        System.out.print("Synset: ");
        for (int j = 0; j < words.length; j++) System.out.print(words[j] + " ");
      }
      System.out.println("\n-------------------------");
    }
    

    WordNet is a massive network of pointers. Every synset points to other synsets, and each pointer is of a certain type: antonym, synonym, hypernym, etc. RiTa WordNet simplifies these relationships by providing a series of functions that return (as an array of Strings) a list of related words for any given word and part of speech. For example, if you want the list of all hyponyms for a given word, you would write:

    // Hyponyms for all senses
    String word = "cat";
    String pos = wordnet.getBestPos(word);
    String[] result = wordnet.getAllHyponyms(word, pos);
    for (int i = 0; i < result.length; i++) {
      System.out.println(result[i]);
    }
    

    Note that this returns all hyponyms for all of the synsets that include the word cat. If you wanted hyponyms for a specific sense, you would have to use a sense ID, in combination with the function: getHyponyms(). You can get lists of other types of related words in exactly the same manner as above with the functions offered here: getAllAntonyms(), getAllDerivedTerms(), getAllHolonyms(), getAllHypernyms(), getAllHyponyms(), getAllMeronyms(), getAllNominalizations(), getAllSynonyms(), etc.
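    The difference between an “all senses” function such as getAllHyponyms() and a per-sense lookup amounts to a union over senses. The plain-Java sketch below models it with a map from sense ID to hyponyms; it does not use RiTa. The sense IDs and word lists are hypothetical, not real WordNet data. A LinkedHashSet drops any word that turns up under more than one sense and keeps first-seen order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class AllSensesSketch {
    // One sense: hyponyms recorded for a single sense ID.
    static List<String> hyponymsOfSense(Map<Integer, List<String>> bySense, int senseId) {
        return bySense.getOrDefault(senseId, List.of());
    }

    // All senses: union over every sense ID, without duplicates, in first-seen order.
    static List<String> allHyponyms(Map<Integer, List<String>> bySense) {
        Set<String> union = new LinkedHashSet<>();
        for (List<String> words : bySense.values()) {
            union.addAll(words);
        }
        return new ArrayList<>(union);
    }

    public static void main(String[] args) {
        // Hypothetical sense IDs and hyponym lists for "cat"; not real WordNet data.
        Map<Integer, List<String>> cat = new LinkedHashMap<>();
        cat.put(1001, List.of("house cat", "wildcat"));
        cat.put(1002, List.of("lion", "tiger"));
        System.out.println(hyponymsOfSense(cat, 1002)); // [lion, tiger]
        System.out.println(allHyponyms(cat));           // [house cat, wildcat, lion, tiger]
    }
}
```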

    Here are some other examples:

    Listing Hyponyms: Hyponyms.java
    Listing Synonyms: Synonyms.java
    Listing Synsets: SensesLookup.java
    Rewriting a text using antonyms, synonyms, and hyponyms: Rewrite.java, Replacer.java

    Using JWNL

    JWNL (Java WordNet Library) is a Java API for accessing the WordNet dictionary. I recommend that you use the much friendlier RiTa WordNet (as described above). However, if you want to delve deeper into the code and walk through the “pointer” relationships in WordNet more manually, JWNL is the place to be. Here’s how you can get started.

    Step 1. Download WordNet.

    Select and download the WordNet files for your OS. The folder we care about is “dict,” which stores the dictionary files.

    Step 2. Download JWNL

    JWNL is available for download here. You can read a bit more about it here as well as peruse the JavaDocs.

    Step 3. Grab jars and configure the properties file

    The download includes two JAR files: jwnl.jar and commons-logging.jar. You must add both of these to your build path in Eclipse to run the examples provided on this page. In addition, make sure to copy the sample “file_properties.xml” file and edit the line that holds the path to the dictionary:

    <param name="dictionary_path" value="/Users/daniel/Desktop/WordNet-3.0/dict"/>

    Once you have JWNL installed, you can access the WordNet database from your Java application and search for these semantic relationships.

    First, you must initialize JWNL with the properties file.

    JWNL.initialize(new FileInputStream(propsFile));
    

    Once the database is initialized, you can create a Dictionary object (that can be queried).

    Dictionary wordnet = Dictionary.getInstance();
    

    Once you have the Dictionary object, you can begin to look up words and search for relationships. To do this, we need to get familiar with the following classes:

    IndexWord. An IndexWord is a single word and part of speech. An IndexWord can be used to look up a Synset object.

    IndexWord word = wordnet.getIndexWord(POS.VERB,"run");
    

    Once you have an IndexWord, you can look up all the Synset objects associated with that word. A Synset represents a concept and contains the set of words whose meanings are synonymous.

    Synset[] senses = word.getSenses();
    for (int i = 0; i < senses.length; i++) {
       System.out.println(word + ": " + senses[i].getGloss());
    }
    

    For simplicity, you might also just ask for the first one.

    Synset sense = word.getSense(1);
    

    For any given Synset, you can search for all related Synsets (for a given type of relationship). Related Synsets in WordNet are connected to each other by pointers. The PointerUtils class provides a selection of methods that return a list of the Synsets that a given Synset points to. For example:

    PointerTargetNodeList relatedList = PointerUtils.getInstance().getSynonyms(sense);
    

    Once you have that list, you can iterate through it:

    Iterator i = relatedList.iterator();
    while (i.hasNext()) {
      PointerTargetNode related = (PointerTargetNode) i.next();
      Synset s = related.getSynset();
      System.out.println(s);
    }
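    Several commenters below report that getSynonyms() comes back empty. A synset already is a set of synonyms, so the synonyms of a word in one sense are simply the other words in that synset. In JWNL you would read those words from the Synset itself. Synset.getWords(), with Word.getLemma() for the text, should provide them, but check the JavaDocs for your version. The plain-Java sketch below shows only the filtering step. The hard-coded word list is a stand-in for one synset's lemmas, and the helper name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class SynsetSynonyms {
    // Synonyms of a word in one sense = the other members of that synset.
    static List<String> synonymsIn(List<String> synsetWords, String word) {
        List<String> result = new ArrayList<>();
        for (String w : synsetWords) {
            if (!w.equalsIgnoreCase(word)) result.add(w);
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for the lemmas of one synset that contains "eat".
        List<String> members = List.of("eat", "consume", "use up", "exhaust", "run through");
        System.out.println(synonymsIn(members, "eat")); // [consume, use up, exhaust, run through]
    }
}
```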
    

    With two Synsets, you can search for a relationship between them using the RelationshipFinder. The RelationshipFinder requires that you specify a PointerType. There is a PointerType for each of the semantic relationships described above, for example:

    PointerType.SYNONYM
    PointerType.HYPERNYM
    etc.

    RelationshipList list = RelationshipFinder.getInstance().findRelationships(synset1, synset2, PointerType.SYNONYM);
    if (!list.isEmpty())  {
      Relationship rel = (Relationship) list.get(0);
      System.out.println(rel);
    }
    

    In the above example, a list of Relationship objects is returned. However, for simplicity, only the first relationship is displayed.

    Once you have that Relationship object, you can call various methods to learn about that relationship. For example, you can get the depth of that relationship (how many degrees of separation) via getDepth() as well as traverse the links between the Synsets by calling getNodeList().

    System.out.println("The depth of this relationship is: " + rel.getDepth());
    
    PointerTargetNodeList nodelist = rel.getNodeList();
    Iterator i = nodelist.iterator();
    while (i.hasNext()) {
      PointerTargetNode related = (PointerTargetNode) i.next();
      System.out.println(related.getSynset());
    }
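    The depth of a relationship is a count of links between two synsets. The sketch below illustrates that “degrees of separation” idea; it is not how JWNL’s RelationshipFinder works internally. It uses breadth-first search to count the fewest links between two nodes in a small undirected graph. The graph is made up, and its node names are toy labels, not real synsets.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

class LinkDistance {
    // Fewest links between two nodes (breadth-first search), or -1 if not connected.
    static int distance(Map<String, List<String>> graph, String from, String to) {
        Map<String, Integer> dist = new HashMap<>();
        Queue<String> queue = new ArrayDeque<>();
        dist.put(from, 0);
        queue.add(from);
        while (!queue.isEmpty()) {
            String node = queue.remove();
            if (node.equals(to)) return dist.get(node);
            for (String next : graph.getOrDefault(node, List.of())) {
                if (!dist.containsKey(next)) {
                    dist.put(next, dist.get(node) + 1);
                    queue.add(next);
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Toy undirected graph: each hypernym link is listed in both directions.
        Map<String, List<String>> g = new HashMap<>();
        g.put("dog", List.of("canine"));
        g.put("wolf", List.of("canine"));
        g.put("canine", List.of("dog", "wolf", "carnivore"));
        g.put("carnivore", List.of("canine"));
        System.out.println(distance(g, "dog", "wolf"));      // 2
        System.out.println(distance(g, "dog", "carnivore")); // 2
    }
}
```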
    

    13 Responses to “WordNet”  

    1.

      Dear All,

      I’m a student, and as part of one of my projects I need to do some manipulation with jwnl.jar and the WordNet dictionary. I started my work with the samples available at http://www.shiffman.net/teaching/a2z/wordnet/, but they all stop with an exception saying:
      java.io.FileNotFoundException: c:\program files\wordnet\2.1\dict\adv.idx (The system cannot find the file specified)
      I have installed WordNet 2.1 on my Windows PC, but couldn’t find the specified file (adv.idx). I would be grateful if somebody could tell me where I can find this file.

      Thanks in advance

    2. Rohit

      I think the code you are executing uses JWNL, but JWNL doesn’t support WordNet 2.1, so try installing WordNet 2.0 and your problem should be solved.

    3. gargi

      There will be a cntlist.rev RAR file in the same dict folder. Try executing it; it will put the missing files in the directory.

    4. Erwin

      Can I use JWNL to measure semantic similarity between two sentences?
      Thank you for your response…

    5. Bina Khan

      I am a student and am currently working on a project that needs to use WordNet to determine the part of speech a word belongs to.
      But the project is in C#. How can I use WordNet from C#?
      Please Help.
      Thanks.

    6. ali benabdellah

      Dear All,

      I’m a student, and I am working on my projects with jwnl.jar and the WordNet dictionary. I need a function which returns all hypernyms of a synset, not just those of sense 1.

      Thanks in advance

    7. Christian

      Hi.
      I tried the code above, and it works fine. But I don’t get any elements in the PointerTargetNodeList.
      Calling relatedList.size() returns 0 every time. Hmm, can anybody help?

      Regards

    8. Heung-Seon

      Hi.

      I’m trying to extract synonyms using WordNet 2.0 with JWNL 1.3. However, it doesn’t work as I expected. For example, I cannot get synonyms with respect to “take, VERB”. On the other hand, the MIT JWI library gives me synonyms of “take, VERB”.

      How can I use JWNL well for finding synonyms?

      Thanks in advance.

    9. Lilia Coelho

      Hi
      I am a student. I have used the code on your website before. But now I have a problem.

      relatedList = PointerUtils.getInstance().getSynonyms(sense);

      Doesn’t seem to be working.

      It returns empty.

      Please help.

    10. Daniel

      Heung-Seon: Try using RiTa, it makes getting synonyms much easier: http://www.rednoise.org/rita/wordnet/documentation/index.htm

    11. GatechGirl

      Wow…nice step by step how-to….really appreciate this!

    1. WordNet at daniel shiffman
    2. rascunho » Blog Archive » links for 2007-09-22

