I committed a couple of files to the SVN repository:
Contains a function that calculates the cosine similarity between two token lists. These lists can
either be tokenized descriptions or processed versions of them.
Contains a number of functions that process and modify tokenized descriptions, including two
stemming functions and a function that removes stop words.
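The cosine-similarity description above does not say how the token lists are turned into vectors. Here is a minimal sketch that assumes each list is a bag of words weighted by raw term counts, with no TF-IDF. The name `cosine_similarity` is hypothetical and not necessarily what the committed file uses.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity of two token lists treated as term-count vectors.

    Hypothetical sketch: weights are raw term frequencies (no TF-IDF).
    Returns 0.0 if either list is empty.
    """
    counts_a = Counter(tokens_a)
    counts_b = Counter(tokens_b)
    # Dot product only needs the terms that occur in both lists.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    # Euclidean norm of each count vector.
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

For example, `cosine_similarity(["red", "car"], ["red", "bike"])` evaluates to roughly 0.5, because the lists share one of two terms each. Lists with no terms in common score 0.0.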
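The processing functions described above are not specified beyond "two stemming functions" and a stop-word filter. The following sketch illustrates the idea under stated assumptions: the stop-word list is a tiny hypothetical sample, and the stemmer is a crude suffix stripper used as a stand-in for a real algorithm such as Porter's. The names `STOP_WORDS`, `SUFFIXES`, `remove_stop_words`, `naive_stem`, and `stem_tokens` are all hypothetical.

```python
# Hypothetical stop-word list; a real one (e.g. NLTK's) is much longer.
STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "is", "with"})

# Suffixes tried in order by the naive stemmer below (assumption, not the committed code).
SUFFIXES = ("ing", "ed", "s")


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens whose lowercase form is in the stop-word set."""
    return [t for t in tokens if t.lower() not in stop_words]


def naive_stem(token, min_stem=3):
    """Strip the first matching suffix if at least min_stem characters remain.

    A crude stand-in for a real stemmer (e.g. Porter); it will over- and
    under-stem many words.
    """
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= min_stem:
            return token[: -len(suffix)]
    return token


def stem_tokens(tokens):
    """Apply naive_stem to every token in the list."""
    return [naive_stem(t) for t in tokens]
```

Chaining the two steps turns `["The", "cars", "parked", "in", "the", "garage"]` into `["car", "park", "garage"]`, which is the kind of processed token list the similarity function can take as input.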