Tuesday, July 14, 2009

Today's commits

Here are the changes I committed to the SVN trunk today:

filters.py and similarity.py
  • Fixed the keywords.
fgdc_extractor.py
Minor bug fixes:
  • Fixed a bug in dictReader() where the file was not opened if the path was passed in as a parameter.
  • Fixed a bug in extract() where it asked for the path to the hierarchy file even when a hierarchy dictionary was provided as a parameter.
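Both fixes come down to the same pattern: use whatever the caller supplied, and only fall back to asking when nothing was passed in. Here is a minimal sketch of that pattern. The names dict_reader and resolve_hierarchy, their signatures, and the tab-separated file format are all hypothetical stand-ins, so the real dictReader() and extract() in fgdc_extractor.py may look quite different.

```python
def dict_reader(path=None, prompt=input):
    """Hypothetical stand-in for dictReader(): read 'key<TAB>value' lines.

    Only asks for a path when none was passed in as a parameter.
    """
    if path is None:
        path = prompt("Path to dictionary file: ")
    result = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            key, _, value = line.partition("\t")
            result[key] = value
    return result


def resolve_hierarchy(hierarchy=None, hierarchy_path=None, prompt=input):
    """Hypothetical helper for extract(): prefer a dictionary over a file."""
    if hierarchy is not None:
        # A dictionary was provided, so no path is needed at all.
        return hierarchy
    # No dictionary: read the file, prompting only if no path was given either.
    return dict_reader(hierarchy_path, prompt=prompt)
```

Passing the prompt function in as a parameter is just a convenience for testing; it makes it easy to check that nothing is asked when a path or dictionary is already available.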
filters.py
Added several WordNet-based functions that retrieve the following types of related words for every word in a term description:
  • hyponyms
  • instance hyponyms
  • member meronyms
  • part meronyms
  • substance meronyms
  • entailments
  • attributes
  • causes
  • also-sees
  • verb groups
  • similar-tos
  • pertainyms
  • derivationally related forms
In each case, the results are appended to the description and the full list is returned. There is also an option to include the hyponyms, meronyms, etc. for each word and for its synonyms. I've included this because looking up only the word itself often returns no results. I'm not sure why that is: either WordNet is less rich than expected, or there's a very subtle error in the code.
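The append-and-return behaviour, with and without the synonym option, can be sketched roughly as follows. This is not the code in filters.py: expand_description, its parameters, and the toy lookup tables are hypothetical, and skipping duplicates is an assumption of the sketch. The lookup is passed in as a function so the example runs without WordNet installed; with NLTK it could wrap wordnet.synsets(word) and a relation method such as Synset.hyponyms().

```python
def expand_description(words, related, synonyms=None, include_synonyms=False):
    """Append related words to a term description and return the full list.

    words    -- list of words in the term description
    related  -- callable: word -> list of related words (hypothetical hook
                standing in for a WordNet lookup)
    synonyms -- callable: word -> list of synonyms, used only when
                include_synonyms is True
    """
    expanded = list(words)
    seen = set(expanded)
    for word in words:
        sources = [word]
        if include_synonyms and synonyms is not None:
            # Also look up the relation for each synonym of the word.
            sources.extend(synonyms(word))
        for source in sources:
            for candidate in related(source):
                # Assumption: each related word is appended only once.
                if candidate not in seen:
                    seen.add(candidate)
                    expanded.append(candidate)
    return expanded


# Toy relation tables standing in for WordNet lookups.
HYPONYMS = {"stream": ["brook", "creek"]}
SYNONYMS = {"watercourse": ["stream"]}

description = ["watercourse", "map"]
word_only = expand_description(description, lambda w: HYPONYMS.get(w, []))
with_synonyms = expand_description(
    description,
    lambda w: HYPONYMS.get(w, []),
    synonyms=lambda w: SYNONYMS.get(w, []),
    include_synonyms=True,
)
print(word_only)      # ['watercourse', 'map']
print(with_synonyms)  # ['watercourse', 'map', 'brook', 'creek']
```

In the toy data the word itself has no hyponyms but one of its synonyms does, which is the situation the synonym option is meant to handle. One thing that may be worth checking, as a guess rather than a diagnosis: some of these relations exist only for certain parts of speech in WordNet (entailments and causes are verb relations, for example), so empty results for other words would be expected.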

TO DO
  • Have a look at the WordNet hierarchy.
  • Figure out why the XML output of fgdc_extractor.extract() is producing errors. Namrata has suggested that this may be due to the presence of '<' in the text.
  • Write filter functions to get hypernyms and holonyms.
  • Write a function to filter term descriptions and rank them relative to a given description.
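On the XML item: if the '<' theory is right, escaping the text before it is written into the XML should make the errors go away. A minimal sketch using the standard library, where to_xml_element and the "abstract" tag are hypothetical and not how fgdc_extractor.extract() actually builds its output:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def to_xml_element(tag, text):
    """Build '<tag>text</tag>' with &, < and > in the text escaped."""
    return "<{0}>{1}</{0}>".format(tag, escape(text))


raw = "elevation < 500 m & slope > 10%"
snippet = to_xml_element("abstract", raw)
print(snippet)
# <abstract>elevation &lt; 500 m &amp; slope &gt; 10%</abstract>

# The escaped snippet parses cleanly and round-trips to the original text.
assert ET.fromstring(snippet).text == raw
```

Writing the same text into the element without escaping gives a string that an XML parser rejects, which would match the kind of error described above.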
