Wednesday, July 15, 2009

XML Update

Got the XML issue cleared up. When I was writing the function, I forgot to replace things like '<' and '>' with escape characters. I added a function to fgdc_extractor.py that takes a string and returns the same string, with all necessary characters escaped for XML. This is called by the extract() function before it writes anything to file.

I sent the output to Namrata and she told me already that she can open it (The new XML file is in the SVN trunk as well, by the way).

More Filter Functions

I added six functions to filters.py. They retrieve the following related words from lemmas and/or synsets of the words in a term description:
  • hypernym distances
  • hypernyms
  • instance hypernyms
  • member holonyms
  • part holonyms
  • substance holonyms
As with yesterday's functions, these can be applied to either lemmas (e.g. words) or synsets. The lemma option performs very poorly most of the time, which is odd because synsets contain lemmas (I think). But rather than returning a subset of the synset option's output, the lemma option typically adds nothing at all to the description.

It's interesting that the WordNet online browser returns direct hypernyms and hyponyms, full hypernyms and hyponyms, inherited hypernyms and sister terms, etc. but the NLTK functions don't make those distinctions. Is there another Python library that has a better representation of WordNet than NLTK?

No comments:

Post a Comment