VDC 09 Development Blog: Progress...

Uploaded two new Python functions:

1. termReader(term_file, terms = {}) reads a text file with term/description pairs (format described below) and writes them into a dictionary. The terms parameter allows for the addition of terms to an existing dictionary.

2. tokens(terms) takes a dictionary and tokenizes the values using the NLTK's WordPunctTokenizer(). It currently writes the tokenized lists back into the same dictionary; I'm planning to change that. I'll also try to think of a more apt name for this function.
(Question: Is context information lost when the strings are tokenized?)

All in all, this is moving along a little bit slowly so far, because I haven't used Python in a long time. There's a lot of going and looking things up. As I get a better feel of it, chances are I'll come back to this early code and take advantage of more of the functionality, instead of looping through everything.

VDC 09 Development Blog

Monday, June 15, 2009

Progress...

No comments:

Post a Comment

Blog Archive

Related Links

About Me

Followers