Uploaded two new Python functions:
1. termReader(term_file, terms = {}) reads a text file with term/description pairs (format described below) and writes them into a dictionary. The terms parameter allows for the addition of terms to an existing dictionary.
2. tokens(terms) takes a dictionary and tokenizes its values using NLTK's WordPunctTokenizer(). It currently writes the token lists back into the same dictionary; I'm planning to change that. I'll also try to think of a more apt name for this function.
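For anyone following along, here is a rough sketch of how the two functions could look. It is not the uploaded code. The function bodies are my guess, and the file format is an assumption: one term and its description per line, separated by a tab, which may differ from the real format. The sketch also makes two changes on purpose: terms defaults to None rather than {} (a {} default is created once and shared between calls), and tokens returns a new dictionary instead of overwriting the original. If NLTK isn't installed, it falls back to the regular expression that WordPunctTokenizer is documented to use.

```python
import re

try:
    from nltk.tokenize import WordPunctTokenizer
    _tokenize = WordPunctTokenizer().tokenize
except ImportError:
    # Fallback: WordPunctTokenizer is a regexp tokenizer with this pattern.
    _tokenize = re.compile(r"\w+|[^\w\s]+").findall


def termReader(term_file, terms=None):
    """Read term/description pairs from term_file into a dictionary.

    ASSUMED format (hypothetical): one "term<TAB>description" pair per line.
    """
    # Use None as the default so each call gets its own fresh dictionary.
    if terms is None:
        terms = {}
    with open(term_file, encoding="utf-8") as handle:
        for line in handle:
            term, sep, description = line.rstrip("\n").partition("\t")
            if not sep:
                continue  # skip blank lines and lines without a tab
            terms[term.strip()] = description.strip()
    return terms


def tokens(terms):
    """Return a new dictionary mapping each term to its tokenized description."""
    return {term: _tokenize(description) for term, description in terms.items()}
```

Calling tokens(termReader("terms.txt")) (the file name is just a placeholder) would then give a dictionary of token lists while leaving the dictionary of raw descriptions untouched.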
(Question: Is context information lost when the strings are tokenized?)
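One way to poke at that question, depending on what "context" means here: the tokenizer keeps the tokens in their original order but throws away the whitespace between them, so the exact original string can't always be rebuilt from the token list alone. Keeping each token's character offsets alongside it preserves the link back to the source text. The sketch below uses plain re with the pattern WordPunctTokenizer is documented to use, so it runs without NLTK; tokens_with_spans is a made-up helper name, not part of NLTK or of the uploaded code.

```python
import re

# Same pattern WordPunctTokenizer is documented to use:
# runs of word characters, or runs of non-space punctuation.
WORD_PUNCT = re.compile(r"\w+|[^\w\s]+")


def tokens_with_spans(text):
    """Return (token, start, end) triples so each token can be traced back to text."""
    return [(m.group(), m.start(), m.end()) for m in WORD_PUNCT.finditer(text)]


text = "A small, furry animal."
spans = tokens_with_spans(text)

# Every token can be recovered from the original string via its offsets.
for token, start, end in spans:
    assert text[start:end] == token
```

With the offsets stored, anything that depends on position (neighbouring words, the sentence a token came from) can still be looked up in the original string later.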
All in all, this is moving along a little slowly so far, because I haven't used Python in a long time and there's a lot of going and looking things up. As I get a better feel for it, chances are I'll come back to this early code and take advantage of more of the functionality, instead of looping through everything.