Friday, June 12, 2009

This week, I've been putting together some Python code to extract the fields and descriptions from the FGDC bio standard, which can be found here:

http://www.nbii.gov/images/uploaded/151871_1162508205217_BDP-workbook.doc

It takes terms in this format:

1.2.3 Term -- description blah blah blah
Type: type
Short Name: shortname

and writes a text file in this format:

term = "Term"
description = " description blah blah blah
Type: type
Short Name: shortname
"

This is the same output format as Namrata uses, where the term name and description are enclosed in quotes.

The bio elements in FGDC begin with BDP1.2.3 instead of 1.2.3, so I'll take that into account in the code before committing it to SVN.


--------- HOWEVER --------
So far, I haven't written in a way to preserve the hierarchy. The numbering 1.2.3 alludes to it, but the actual names of the levels are given in diagrams of the FGDC standard. Not sure if there's a(n easy) way to automatically pull out that information. Ideas are welcome.

Otherwise, I'll probably store that information in the code itself and access it as needed, for the sake of having it working soon with the current standard at least.


UPDATE: Committed to SVN.

No comments:

Post a Comment