Thursday, July 23, 2009

a couple of new and important things

The other day, I finally got around to trying out my termReader() function on Namrata's file of EML terms. That prompted a re-write of the entire function, which is now much improved.

New termReader() capabilities:
  1. Reads XML files, regardless of whether the tags open on a new line. The old version of this function assumed that the formatting was much more strict.
  2. Gives the user the option of what information to use as keys in the returned dictionary. Previously, only the term names were used, but EML appears not to have unique term names. So the new version allows the use of hierarchy paths as keys- this option is set as the default.
In addition, because the original aim of this project was to take a combinatoric approach to developing a good ranking algorithm, I've added a callFilters(), a new function called by termRanker(). It takes a list of filter functions passed into termRanker() as a parameter, and applies them to the descriptions in the term dictionary. I've also created a help function to go along with it, which lists the available filter functions and how to pass them into termRanker().

No comments:

Post a Comment