support.bitvector module
An implementation of an object that acts like a collection of on/off bits.
Base classes
class whoosh.idsets.DocIdSet
Base class for a set of non-negative integers, implementing a subset of the
built-in set type’s interface with extra docid-related methods.
This is a superclass for alternatives to the built-in set that are more
memory-efficient and specialized for storing sorted lists of non-negative
integers. Because they are pure Python, they will inevitably be slower
than set for most operations.
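As a rough, standard-library-only illustration of the memory trade-off described above (it does not use Whoosh), compare a built-in set with a packed array holding the same sorted integers. `sys.getsizeof` counts only each container's own storage, which if anything flatters the set, because the int objects the set references are not included in its figure.

```python
import sys
from array import array

docnums = range(10000)            # ten thousand sorted document numbers

as_set = set(docnums)             # hash table of references to int objects
as_array = array("I", docnums)    # packed unsigned ints stored inline

set_bytes = sys.getsizeof(as_set)
array_bytes = sys.getsizeof(as_array)
print(set_bytes, array_bytes)     # exact figures vary by platform and Python version
```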
after(i)
- Returns the next integer in the set after i, or None if there is none.
before(i)
- Returns the previous integer in the set before i, or None if there is none.
first()
- Returns the first (lowest) integer in the set.
invert_update(size)
- Updates the set in place so that it contains exactly the integers in the
range [0, size) that were not previously in the set.
last()
- Returns the last (highest) integer in the set.
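The navigation methods are easiest to pin down on a concrete model. The following sketch assumes a hypothetical `SortedListIdSet` class backed by a sorted Python list and the standard `bisect` module; it illustrates the documented semantics of `first()`, `last()`, `after(i)`, `before(i)` and `invert_update(size)`, and is not Whoosh's implementation.

```python
from bisect import bisect_left, bisect_right


class SortedListIdSet:
    """Hypothetical model of the DocIdSet navigation methods (not Whoosh's code)."""

    def __init__(self, source=()):
        self._ids = sorted(set(source))

    def __iter__(self):
        return iter(self._ids)

    def first(self):
        return self._ids[0]      # lowest member; assumes a non-empty set

    def last(self):
        return self._ids[-1]     # highest member; assumes a non-empty set

    def after(self, i):
        # Position of the first member strictly greater than i.
        pos = bisect_right(self._ids, i)
        return self._ids[pos] if pos < len(self._ids) else None

    def before(self, i):
        # Position of the first member >= i; the member just before it is the answer.
        pos = bisect_left(self._ids, i)
        return self._ids[pos - 1] if pos > 0 else None

    def invert_update(self, size):
        # Keep exactly the numbers in [0, size) that were not members.
        members = set(self._ids)
        self._ids = [n for n in range(size) if n not in members]


s = SortedListIdSet([7, 2, 10, 15, 1])
print(s.first(), s.last())       # 1 15
print(s.after(7), s.after(15))   # 10 None
print(s.before(7), s.before(1))  # 2 None
s.invert_update(8)
print(list(s))                   # [0, 3, 4, 5, 6]
```

`bisect_right` finds the first member strictly greater than i and `bisect_left` the first member not less than i, so both lookups take O(log n) time on the sorted list.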
class whoosh.idsets.BaseBitSet
Implementation classes
class whoosh.idsets.BitSet(source=None, size=0)
A DocIdSet backed by an array of bits. This can also be useful as a plain bit
array (e.g. for a Bloom filter). It is much more memory-efficient than a
large built-in set of integers, but because it spends one bit on every
possible value up to the largest member, it wastes memory for sparse sets.
Parameters:
- source – an iterable of non-negative integers to add to this set.
- size – the maximum size of the bit array, in bits.
- bits – an array of unsigned bytes (“B”) to use as the underlying
bit array. This is used by some of the object’s methods.
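The bit-array idea behind BitSet can be sketched with the standard `array` module, using the same unsigned-byte (“B”) storage mentioned under bits. `TinyBitSet` is a hypothetical name, and its sizing and growth policy are assumptions for illustration, not Whoosh's code: number n is stored as bit `n & 7` of byte `n >> 3`.

```python
from array import array


class TinyBitSet:
    """Hypothetical bit-array set: number n is bit (n & 7) of byte (n >> 3)."""

    def __init__(self, source=(), size=0):
        source = list(source)
        if not size and source:
            size = max(source) + 1          # enough bits to hold the largest member
        # One unsigned byte ("B") per eight bits, rounded up, all zero.
        self.bits = array("B", bytes((size + 7) // 8))
        for n in source:
            self.add(n)

    def add(self, n):
        byte = n >> 3
        if byte >= len(self.bits):
            # Grow with zero bytes so that bit n fits.
            self.bits.extend(bytes(byte + 1 - len(self.bits)))
        self.bits[byte] |= 1 << (n & 7)

    def discard(self, n):
        byte = n >> 3
        if byte < len(self.bits):
            # Clearing a bit of a non-negative byte value stays within 0..255.
            self.bits[byte] &= ~(1 << (n & 7))

    def __contains__(self, n):
        byte = n >> 3
        return 0 <= byte < len(self.bits) and bool(self.bits[byte] & (1 << (n & 7)))

    def __iter__(self):
        # Walk every byte and yield the numbers of its set bits in order.
        for byte_index, value in enumerate(self.bits):
            for bit in range(8):
                if value & (1 << bit):
                    yield (byte_index << 3) + bit


bs = TinyBitSet([1, 10, 15, 7, 2])
print(list(bs))           # [1, 2, 7, 10, 15]
print(len(bs.bits))       # 2 bytes cover bits 0..15
print(9 in bs, 10 in bs)  # False True

sparse = TinyBitSet([1_000_000])
print(len(sparse.bits))   # 125001 bytes for a single member
```

The last two lines show the sparse-set cost mentioned above: in this sketch, a set whose only member is 1,000,000 still needs 125,001 bytes.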
class whoosh.idsets.OnDiskBitSet(dbfile, basepos, bytecount)
A DocIdSet backed by an array of bits on disk.
>>> from whoosh.filedb.filestore import RamStorage
>>> from whoosh.idsets import BitSet, OnDiskBitSet
>>> st = RamStorage()
>>> f = st.create_file("test.bin")
>>> bs = BitSet([1, 10, 15, 7, 2])
>>> bytecount = bs.to_disk(f)
>>> f.close()
>>> # ...
>>> f = st.open_file("test.bin")
>>> odbs = OnDiskBitSet(f, 0, bytecount)
>>> list(odbs)
[1, 2, 7, 10, 15]
Parameters:
- dbfile – a StructFile object to read from.
- basepos – the position in dbfile at which the bit array’s bytes begin.
- bytecount – the number of bytes to use for the bit array.
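To show what basepos and bytecount refer to, here is a standard-library sketch that places a bit array partway into a file-like object and tests members by seeking to individual bytes instead of loading the whole array. It uses `io.BytesIO` in place of a Whoosh StructFile, and the helpers `write_bits`, `on_disk_contains` and `on_disk_iter` are hypothetical; it is one plausible way to read such an array, not a description of OnDiskBitSet's internals.

```python
import io


def write_bits(f, numbers):
    """Hypothetical helper: write a bit array for numbers, return its byte count."""
    bits = bytearray((max(numbers) >> 3) + 1)
    for n in numbers:
        bits[n >> 3] |= 1 << (n & 7)
    f.write(bits)
    return len(bits)


def on_disk_contains(f, basepos, bytecount, n):
    """Test bit n by reading the single byte at basepos + (n >> 3)."""
    byte = n >> 3
    if n < 0 or byte >= bytecount:
        return False
    f.seek(basepos + byte)
    return bool(f.read(1)[0] & (1 << (n & 7)))


def on_disk_iter(f, basepos, bytecount):
    """Read the bytecount bytes starting at basepos and yield the set bits."""
    f.seek(basepos)
    data = f.read(bytecount)
    for byte_index, value in enumerate(data):
        for bit in range(8):
            if value & (1 << bit):
                yield (byte_index << 3) + bit


f = io.BytesIO()
f.write(b"HEADER")                # unrelated data before the bit array
basepos = f.tell()                # 6: where the bit array starts
bytecount = write_bits(f, [1, 10, 15, 7, 2])
print(bytecount)                                    # 2
print(list(on_disk_iter(f, basepos, bytecount)))    # [1, 2, 7, 10, 15]
print(on_disk_contains(f, basepos, bytecount, 10))  # True
print(on_disk_contains(f, basepos, bytecount, 11))  # False
```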
class whoosh.idsets.SortedIntSet(source=None)
- A DocIdSet backed by a sorted array of integers.
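A minimal sketch of a set stored as a sorted array of integers, assuming the standard `array` and `bisect` modules; `SortedArraySet` is a hypothetical name and this is not Whoosh's SortedIntSet. Membership tests are binary searches over the packed array.

```python
from array import array
from bisect import bisect_left


class SortedArraySet:
    """Hypothetical set kept as a sorted array of unsigned ints (not Whoosh's code)."""

    def __init__(self, source=()):
        self.data = array("I", sorted(set(source)))

    def __contains__(self, n):
        # Binary search: O(log n) per lookup.
        i = bisect_left(self.data, n)
        return i < len(self.data) and self.data[i] == n

    def add(self, n):
        i = bisect_left(self.data, n)
        if i == len(self.data) or self.data[i] != n:
            self.data.insert(i, n)   # shifts later items: O(n) per insert

    def discard(self, n):
        i = bisect_left(self.data, n)
        if i < len(self.data) and self.data[i] == n:
            del self.data[i]

    def __iter__(self):
        return iter(self.data)


s = SortedArraySet([15, 2, 7, 2])
s.add(10)
s.add(7)        # already present, no change
s.discard(2)
print(list(s))            # [7, 10, 15]
print(10 in s, 11 in s)   # True False
```

Each insert or delete shifts the later items, so in this sketch updates cost O(n) while lookups cost O(log n). The "I" typecode also restricts values to non-negative integers.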
class whoosh.idsets.MultiIdSet(idsets, offsets)
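Wraps multiple serial sub-DocIdSet objects (each covering its own consecutive,
non-overlapping range of numbers) and presents them as a single aggregated,
read-only set.
Parameters:
- idsets – a list of DocIdSet objects.
- offsets – a list of offsets, one per DocIdSet in idsets; each offset is
added to the numbers in the corresponding set to give their position in the
aggregated set.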
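The role of offsets can be illustrated with a small read-only view over serial sub-sets. `SerialIdSets` is a hypothetical name and the lookup strategy (bisecting the offsets) is an assumption, not necessarily what MultiIdSet does: `offsets[k]` is taken to be the first aggregated number covered by `idsets[k]`, so a local number m in that sub-set maps to `offsets[k] + m`.

```python
from bisect import bisect_right


class SerialIdSets:
    """Hypothetical read-only view over serial sub-sets (not Whoosh's code)."""

    def __init__(self, idsets, offsets):
        assert len(idsets) == len(offsets)
        self.idsets = idsets
        self.offsets = offsets

    def _locate(self, n):
        # Index of the last sub-set whose offset is <= n, plus n's local number.
        k = bisect_right(self.offsets, n) - 1
        return k, n - self.offsets[k]

    def __contains__(self, n):
        if n < self.offsets[0]:
            return False
        k, local = self._locate(n)
        return local in self.idsets[k]

    def __iter__(self):
        # Sub-sets are serial, so concatenating them in order stays sorted.
        for idset, offset in zip(self.idsets, self.offsets):
            for local in sorted(idset):
                yield offset + local


multi = SerialIdSets([{0, 3}, {1, 2}], [0, 5])
print(list(multi))          # [0, 3, 6, 7]
print(6 in multi, 5 in multi)  # True False
```

The lookup is only correct if the sub-sets really are serial: if `idsets[0]` held a local number at or beyond `offsets[1]`, it would be attributed to the wrong sub-set.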