Friday, August 14, 2009

A RESULT!

Ran a comparison of two Darwin Core terms, MaximumElevationInMeters and MinimumElevationInMeters, against the FGDC standard. I chose these terms because they appear to be the only ones with obvious exact matches in FGDC: Altitude Maximum and Altitude Minimum.

The filter algorithm I used was:
  1. Augment each description with the WordNet similar-tos of the synsets of every word in that description.
  2. Add every word's synonyms to the description.
  3. Throw away all words with length < 6.
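
For readers who want to see the shape of the three steps above, here is a minimal sketch. It is not the code I actually ran: the two lookup tables are hypothetical stand-ins for real WordNet queries, and filter_description() is an illustrative name. It also assumes that step 2 only looks up synonyms for the original words (not for the words added in step 1) and that duplicates are kept.

```python
# Hypothetical stand-ins for WordNet lookups. A real run would query
# WordNet for similar-tos and synonyms instead of using these tables.
SIMILAR_TOS = {"maximum": ["supreme", "utmost"]}
SYNONYMS = {
    "maximum": ["upper_limit"],
    "elevation": ["altitude", "height"],
}

MIN_LENGTH = 6  # step 3 throws away words shorter than this


def filter_description(words, similar_tos=SIMILAR_TOS, synonyms=SYNONYMS,
                       min_length=MIN_LENGTH):
    """Apply the three filter steps to a list of lowercase words."""
    augmented = list(words)

    # Step 1: add the similar-tos of every original word.
    for word in words:
        augmented.extend(similar_tos.get(word, []))

    # Step 2: add the synonyms of every original word
    # (assumption: words added in step 1 are not expanded again).
    for word in words:
        augmented.extend(synonyms.get(word, []))

    # Step 3: drop every word with length < min_length.
    return [word for word in augmented if len(word) >= min_length]


filtered = filter_description(["maximum", "elevation", "in", "meters"])
print(filtered)
```

With these toy tables, "in" is the only word dropped; everything else is at least six characters long.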

This is the output that comes from a call to correspRank() (with the two rankings as input):
Top Rank: 1
Bottom Rank: 1
Range: 0
Median Rank: 1.0

Top Score: 0.369239797651
Bottom Score: 0.360379846875
Range: 0.00885995077593
Median Score: 0.364809822263

Note that mean and variance are not reported; I haven't written them in yet because my sample sizes so far are too small.
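
As a sanity check, the summary figures can be recomputed by hand. The sketch below is not correspRank() itself; summarize() is a hypothetical helper, and it assumes each term contributes one rank (the position of its known exact match) and one score (the score of that match).

```python
from statistics import median


def summarize(values, higher_is_better):
    """Return top, bottom, range and median for a list of ranks or scores."""
    top = max(values) if higher_is_better else min(values)
    bottom = min(values) if higher_is_better else max(values)
    return {
        "top": top,
        "bottom": bottom,
        "range": abs(top - bottom),
        "median": median(values),
    }


# One entry per Darwin Core term (assumed reading of the output above).
ranks = [1, 1]
scores = [0.369239797651, 0.360379846875]

rank_summary = summarize(ranks, higher_is_better=False)   # rank 1 is best
score_summary = summarize(scores, higher_is_better=True)  # larger score is best

print(rank_summary)
print(score_summary)
```

With only two samples the median is just the midpoint of the two values, which is why mean and variance would add little at this stage.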

Let's look at the top 5 hits in both rankings:
MaximumElevationInMeters
  1. Altitude Maximum
  2. Altitude System Definition
  3. Altitude System Definition
  4. False Easting
  5. False Easting

MinimumElevationInMeters
  1. Altitude Minimum
  2. Altitude System Definition
  3. Altitude System Definition
  4. Altitude Maximum
  5. False Easting

Interestingly, Altitude Maximum makes it into the top five for MinimumElevationInMeters, but Altitude Minimum does not make the top five for MaximumElevationInMeters.

One thing that's not so good about this result is that the filter/score/rank/assess process took about 141 seconds of processor time. Ouch!
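
To find out which of the four stages eats most of that time, each one could be wrapped in a small timer. This is only a sketch: timed() is a hypothetical helper, and sorted() below is a placeholder for the real filter, score, rank and assess functions.

```python
import time


def timed(stage, *args, **kwargs):
    """Run one stage and return (result, processor seconds used)."""
    start = time.process_time()
    result = stage(*args, **kwargs)
    elapsed = time.process_time() - start
    return result, elapsed


# Placeholder stage; a real run would pass the filter, score, rank
# and assess functions here, one at a time.
result, seconds = timed(sorted, [3, 1, 2])
print("stage took %.6f s of processor time" % seconds)
```

time.process_time() counts CPU time for the current process rather than wall-clock time, so it should be comparable to the processor-time figure quoted above.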
