This module contains low-level functions and a high-level class for parsing the prolog file “wn_s.pl” from the WordNet prolog download into an object suitable for looking up synonyms and performing query expansion.
http://wordnetcode.princeton.edu/3.0/WNprolog-3.0.tar.gz
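The parsing step can be illustrated with a minimal sketch. It is not this module's actual parser. It assumes each synonym entry in wn_s.pl is a Prolog fact of the form `s(synset_id,w_num,'word',ss_type,sense_number,tag_count).`, where a literal apostrophe inside the word is written as two quotes. As a simplifying choice it keeps only single alphabetic words. The name `parse_wn_s` and the sample lines, including their synset IDs, are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Matches the start of a synset fact: s(synset_id,w_num,'word',...
# Inside the quoted word, a literal apostrophe is written as two quotes ('').
FACT = re.compile(r"^s\((\d+),\d+,'((?:[^']|'')*)',")


def parse_wn_s(lines: Iterable[str]) -> Tuple[Dict[str, List[int]], Dict[int, List[str]]]:
    """Map each word to its synset IDs, and each synset ID to its words."""
    word2nums: Dict[str, List[int]] = defaultdict(list)
    num2words: Dict[int, List[str]] = defaultdict(list)
    for line in lines:
        m = FACT.match(line)
        if m is None:
            continue  # not a synset fact
        num = int(m.group(1))
        word = m.group(2).replace("''", "'").lower()
        if not word.isalpha():
            continue  # skip multi-word entries (underscores) and words with apostrophes
        word2nums[word].append(num)
        num2words[num].append(word)
    return dict(word2nums), dict(num2words)


# Made-up lines in the s(...) layout; the synset IDs are invented.
SAMPLE = [
    "s(100000001,1,'hail',v,1,0).",
    "s(100000001,2,'acclaim',v,1,0).",
    "s(100000001,3,'herald',v,1,0).",
    "s(100000002,1,'hail',v,2,0).",
    "s(100000002,2,'come',v,1,0).",
    "s(100000003,1,'o''clock',r,1,0).",
    "s(100000004,1,'light_bulb',n,1,0).",
]

word2nums, num2words = parse_wn_s(SAMPLE)
print(word2nums["hail"])     # → [100000001, 100000002]
print(num2words[100000002])  # → ['hail', 'come']
```

Keeping both directions means a synonym lookup is two dictionary hops: from a word to its synset IDs, then from those IDs to the words they contain.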
Represents the WordNet synonym database, either loaded into memory from the wn_s.pl Prolog file, or stored on disk in a Whoosh index.
This class allows you to parse the prolog file “wn_s.pl” from the WordNet prolog download into an object suitable for looking up synonyms and performing query expansion.
http://wordnetcode.princeton.edu/3.0/WNprolog-3.0.tar.gz
To load a Thesaurus object from the wn_s.pl file...
>>> t = Thesaurus.from_filename("wn_s.pl")
To save the in-memory Thesaurus to a Whoosh index...
>>> from whoosh.filedb.filestore import FileStorage
>>> fs = FileStorage("index")
>>> t.to_storage(fs)
To load a Thesaurus object from a Whoosh index...
>>> t = Thesaurus.from_storage(fs)
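The Thesaurus object is thus usable in two ways:
* Parse the wn_s.pl file into memory (Thesaurus.from_file or Thesaurus.from_filename) and look up synonyms in memory. Parsing has a startup cost and the parsed data uses a fair amount of memory, but look-ups are very fast.
* Parse the wn_s.pl file once, save it to an index with to_storage(), and from then on open the thesaurus from that index with Thesaurus.from_storage(). Saving the index is expensive, but opening it afterwards is faster than re-parsing the file, while look-ups are slightly slower.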
Here are timings for various tasks on my (fast) Windows machine, which might give an idea of relative costs for in-memory vs. on-disk.
| Task | Approx. time (s) |
|---|---:|
| Parsing the wn_s.pl file | 1.045 |
| Saving to an on-disk index | 13.084 |
| Loading from an on-disk index | 0.082 |
| Look up synonyms for “light” (in memory) | 0.0011 |
| Look up synonyms for “light” (loaded from disk) | 0.0028 |
In short, if you can afford the memory needed to parse the thesaurus and keep it cached, in-memory look-ups are faster. Otherwise, use an on-disk index.
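Query expansion, mentioned at the top of this module, usually means widening each query term with its synonyms. The following is a minimal sketch. `expand_query`, its `max_syns` cutoff, and the OR-group output format are assumptions for illustration, not part of this module. It accepts any callable that maps a word to a list of synonyms. With a loaded Thesaurus you might pass `t.synonyms`, but check how it handles words that are not in WordNet before relying on that.

```python
from typing import Callable, List


def expand_query(text: str, synonyms: Callable[[str], List[str]], max_syns: int = 3) -> str:
    """Rewrite each query word as an OR group of the word plus up to max_syns synonyms."""
    groups = []
    for word in text.lower().split():
        alternatives = [word] + synonyms(word)[:max_syns]
        if len(alternatives) == 1:
            groups.append(word)  # no synonyms: keep the bare term
        else:
            groups.append("(" + " OR ".join(alternatives) + ")")
    return " ".join(groups)


# Hypothetical stand-in for a synonym lookup, using the "hail" example.
TABLE = {"hail": ["acclaim", "come", "herald"]}
print(expand_query("hail storm", lambda w: TABLE.get(w, [])))
# → (hail OR acclaim OR come OR herald) storm
```

Whether adjacent groups end up combined with AND or OR depends on the default of the query parser that receives the string.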
Creates a Thesaurus object from the given file-like object, which should contain the WordNet wn_s.pl file.
>>> with open("wn_s.pl") as f:
...     t = Thesaurus.from_file(f)
>>> t.synonyms("hail")
['acclaim', 'come', 'herald']
Creates a Thesaurus object from the given filename, which should contain the WordNet wn_s.pl file.
>>> t = Thesaurus.from_filename("wn_s.pl")
>>> t.synonyms("hail")
['acclaim', 'come', 'herald']
Creates a Thesaurus object from the given storage object, which should contain an index created by Thesaurus.to_storage().
>>> from whoosh.filedb.filestore import FileStorage
>>> fs = FileStorage("index")
>>> t = Thesaurus.from_storage(fs)
>>> t.synonyms("hail")
['acclaim', 'come', 'herald']
Returns a list of synonyms for the given word.
>>> thesaurus.synonyms("hail")
['acclaim', 'come', 'herald']
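Given the two dictionaries that parsing produces (word to synset IDs, and synset ID to words), an in-memory lookup can be sketched as below. This is an assumption about how such a lookup can work, not a copy of this module's code. The function name `synonyms` and the small dictionaries with invented IDs are hypothetical.

```python
from typing import Dict, List, Set


def synonyms(word2nums: Dict[str, List[int]], num2words: Dict[int, List[str]], word: str) -> List[str]:
    """Union the words of every synset containing `word`, minus the word itself."""
    word = word.lower()
    syns: Set[str] = set()
    for num in word2nums.get(word, []):
        syns.update(num2words[num])
    syns.discard(word)  # a word is not its own synonym
    return sorted(syns)  # sorted for stable output


# Invented synset IDs; only the word groupings matter here.
word2nums = {"hail": [1, 2], "acclaim": [1], "herald": [1], "come": [2]}
num2words = {1: ["hail", "acclaim", "herald"], 2: ["hail", "come"]}
print(synonyms(word2nums, num2words, "hail"))
# → ['acclaim', 'come', 'herald']
```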
Creates an index in the given storage object from the synonyms loaded from a WordNet file.
>>> from whoosh.filedb.filestore import FileStorage
>>> fs = FileStorage("index")
>>> t = Thesaurus.from_filename("wn_s.pl")
>>> t.to_storage(fs)
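The timings above show that saving is slow while loading a saved thesaurus is fast. One way to picture that trade-off is to precompute every word's synonym list once and store it keyed by word. The thesaurus itself is saved as a Whoosh index; this sketch swaps in the standard library's `shelve` module purely as a stand-in to illustrate the idea, and is not Whoosh's storage format. `save_synonym_table` and the sample dictionaries are hypothetical.

```python
import os
import shelve
import tempfile
from typing import Dict, List


def save_synonym_table(word2nums: Dict[str, List[int]],
                       num2words: Dict[int, List[str]], path: str) -> None:
    """Precompute every word's synonym list and write it to an on-disk shelf."""
    with shelve.open(path, flag="n") as db:  # "n": always create a new, empty shelf
        for word, nums in word2nums.items():
            syns = {s for num in nums for s in num2words[num]}
            syns.discard(word)
            db[word] = sorted(syns)  # one stored entry per word


# Invented synset IDs; only the word groupings matter here.
word2nums = {"hail": [1, 2], "acclaim": [1], "herald": [1], "come": [2]}
num2words = {1: ["hail", "acclaim", "herald"], 2: ["hail", "come"]}

path = os.path.join(tempfile.mkdtemp(), "thesaurus")
save_synonym_table(word2nums, num2words, path)  # slow part: visits every word

# Later: opening is cheap, and each lookup reads a single entry.
with shelve.open(path, flag="r") as db:
    print(db["hail"])  # → ['acclaim', 'come', 'herald']
```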