Lookups
This class allows convenient access to large lookup tables and dictionaries, e.g. lemmatization data or tokenizer exception lists. Its tables use Bloom filters to speed up lookups. Lookups are available via the Vocab as vocab.lookups, so they can be accessed before the pipeline components are applied (e.g. in the tokenizer and lemmatizer), as well as within the pipeline components via doc.vocab.lookups.
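The two access paths described above can be sketched as follows. This is a minimal illustration that assumes spaCy v3 and a blank English pipeline; the table name demo_table and its contents are made up for the example.

```python
import spacy

# A blank pipeline needs no trained model; "en" is just an example language.
nlp = spacy.blank("en")

# Access before any component runs: directly on the shared vocab.
# "demo_table" and its single entry are hypothetical.
nlp.vocab.lookups.add_table("demo_table", {"going": "go"})

# Access from a processed Doc: the lookups are reachable via doc.vocab.
doc = nlp("I am going")
table = doc.vocab.lookups.get_table("demo_table")
lemma = table["going"]
table_visible = "demo_table" in doc.vocab.lookups
```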
Lookups.__init__ method
Create a Lookups object.
Lookups.__len__ method
Get the current number of tables in the lookups.
Name | Description |
---|---|
RETURNS | The number of tables in the lookups. int |
Lookups.__contains__ method
Check if the lookups contain a table of a given name. Delegates to Lookups.has_table.
Name | Description |
---|---|
name | Name of the table. str |
RETURNS | Whether a table of that name is in the lookups. bool |
Lookups.tables property
Get the names of all tables in the lookups.
Name | Description |
---|---|
RETURNS | Names of the tables in the lookups. List[str] |
Lookups.add_table method
Add a new table with optional data to the lookups. Raises an error if a table of that name already exists.
Name | Description |
---|---|
name | Unique name of the table. str |
data | Optional data to add to the table. dict |
RETURNS | The newly added table. Table |
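A short sketch of adding tables and inspecting the result with len, in and the tables property. The table names and data are hypothetical, and catching ValueError for a duplicate name is an assumption based on spaCy v3 behavior.

```python
from spacy.lookups import Lookups

lookups = Lookups()

# Table names and contents here are hypothetical.
table = lookups.add_table("lemma_exc", {"mice": "mouse"})
lookups.add_table("empty_table")  # the data argument is optional

n_tables = len(lookups)              # Lookups.__len__
names = sorted(lookups.tables)       # Lookups.tables
has_table = "lemma_exc" in lookups   # Lookups.__contains__

# Adding a second table under an existing name is rejected.
# ValueError is assumed here (spaCy v3).
try:
    lookups.add_table("lemma_exc")
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```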
Lookups.get_table method
Get a table from the lookups. Raises an error if the table doesn’t exist.
Name | Description |
---|---|
name | Name of the table. str |
RETURNS | The table. Table |
Lookups.remove_table method
Remove a table from the lookups. Raises an error if the table doesn’t exist.
Name | Description |
---|---|
name | Name of the table to remove. str |
RETURNS | The removed table. Table |
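Retrieving and removing a table might look like the sketch below. The table name is hypothetical, and the KeyError for a missing table is an assumption based on spaCy v3 behavior.

```python
from spacy.lookups import Lookups

lookups = Lookups()
lookups.add_table("demo_table", {"foo": "bar"})  # hypothetical table

# Fetch the table and read a value from it.
table = lookups.get_table("demo_table")
value = table["foo"]

# Removing a table returns it, so the data is still usable afterwards.
removed = lookups.remove_table("demo_table")
removed_name = removed.name
still_there = lookups.has_table("demo_table")

# Asking for a table that no longer exists raises an error
# (assumed to be KeyError, as in spaCy v3).
try:
    lookups.get_table("demo_table")
    missing_raises = False
except KeyError:
    missing_raises = True
```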
Lookups.has_table method
Check if the lookups contain a table of a given name. Equivalent to Lookups.__contains__.
Name | Description |
---|---|
name | Name of the table. str |
RETURNS | Whether a table of that name is in the lookups. bool |
Lookups.to_bytes method
Serialize the lookups to a bytestring.
Name | Description |
---|---|
RETURNS | The serialized lookups. bytes |
Lookups.from_bytes method
Load the lookups from a bytestring.
Name | Description |
---|---|
bytes_data | The data to load from. bytes |
RETURNS | The loaded lookups. Lookups |
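A round trip through to_bytes and from_bytes can be sketched like this; the table name and data are placeholders.

```python
from spacy.lookups import Lookups

lookups = Lookups()
lookups.add_table("demo_table", {"foo": "bar"})  # hypothetical table

# Serialize to a bytestring ...
lookups_bytes = lookups.to_bytes()

# ... and load it into a fresh, empty Lookups object.
restored = Lookups()
restored.from_bytes(lookups_bytes)
restored_value = restored.get_table("demo_table")["foo"]
```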
Lookups.to_disk method
Save the lookups to a directory as lookups.bin. Expects a path to a directory, which will be created if it doesn’t exist.
Name | Description |
---|---|
path | A path to a directory, which will be created if it doesn’t exist. Paths may be either strings or Path-like objects. Union[str,Path] |
Lookups.from_disk method
Load lookups from a directory containing a lookups.bin file. Will skip loading if the file doesn’t exist.
Name | Description |
---|---|
path | A path to a directory. Paths may be either strings or Path-like objects. Union[str,Path] |
RETURNS | The loaded lookups. Lookups |
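The following sketch saves lookups to a temporary directory and loads them back. The directory name lookups_dir and the table contents are hypothetical; a temporary directory is used only to keep the example self-contained.

```python
import tempfile
from pathlib import Path

from spacy.lookups import Lookups

lookups = Lookups()
lookups.add_table("demo_table", {"foo": "bar"})  # hypothetical table

with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp) / "lookups_dir"  # does not exist yet; to_disk creates it
    lookups.to_disk(out_dir)
    file_written = (out_dir / "lookups.bin").exists()

    # Load into a fresh object from the same directory.
    restored = Lookups().from_disk(out_dir)
    restored_value = restored.get_table("demo_table")["foo"]

    # A directory without a lookups.bin file is skipped, leaving the lookups empty.
    n_empty = len(Lookups().from_disk(tmp))
```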
Table class
A table in the lookups. Subclass of OrderedDict that implements a slightly more consistent and unified API and includes a Bloom filter to speed up missed lookups. Supports all other methods and attributes of OrderedDict / dict, and the customized methods listed here. Methods that get or set keys accept both integers and strings (which will be hashed before being added to the table).
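To illustrate why a Bloom filter helps with missed lookups, here is a toy, standard-library-only sketch. It is not the preshed implementation that Table uses; TinyBloom and TinyTable are hypothetical names, and the sizes and hash scheme are arbitrary choices made for the example. The idea is that a Bloom filter can say "definitely absent" cheaply, so most misses never reach the dictionary.

```python
import hashlib
from collections import OrderedDict


class TinyBloom:
    """Toy Bloom filter: may give false positives, never false negatives."""

    def __init__(self, n_bits=1024, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive n_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def __contains__(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class TinyTable(OrderedDict):
    """Hypothetical stand-in for a Bloom-filtered table."""

    def __init__(self):
        super().__init__()
        self.bloom = TinyBloom()

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.bloom.add(key)

    def __contains__(self, key):
        # If the filter has never seen the key, it is a definite miss.
        if key not in self.bloom:
            return False
        # Otherwise confirm with the real dictionary (guards against false positives).
        return super().__contains__(key)


table = TinyTable()
table["going"] = "go"
hit = "going" in table
miss = "absent_key" in table
```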
Table.__init__ method
Initialize a new table.
Name | Description |
---|---|
name | Optional table name for reference. str |
Table.from_dict classmethod
Initialize a new table from a dict.
Name | Description |
---|---|
data | The dictionary. dict |
name | Optional table name for reference. str |
RETURNS | The newly constructed object. Table |
Table.set method
Set a new key / value pair. String keys will be hashed. Same as table[key] = value.
Name | Description |
---|---|
key | The key. Union[str, int] |
value | The value. |
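The sketch below builds a table from a dict, adds entries with set and item assignment, and shows that string keys end up stored as integer hashes. The table name and all keys and values are made up for illustration.

```python
from spacy.lookups import Table

# Build a table from an existing dict; the name is optional.
table = Table.from_dict({"foo": "bar", "baz": 100}, name="demo_table")

# These two forms are equivalent ways of setting a key / value pair.
table.set("qux", "quux")
table["corge"] = "grault"

# Integer keys are accepted as well.
table[42] = "answer"

# Lookups work with the original string, although the stored keys are hashes.
foo_value = table["foo"]
has_baz = "baz" in table
keys_are_ints = all(isinstance(key, int) for key in table.keys())
```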
Table.to_bytes method
Serialize the table to a bytestring.
Name | Description |
---|---|
RETURNS | The serialized table. bytes |
Table.from_bytes method
Load a table from a bytestring.
Name | Description |
---|---|
bytes_data | The data to load. bytes |
RETURNS | The loaded table. Table |
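A single table can be serialized and restored in the same way as the full lookups. This is a minimal sketch with placeholder data.

```python
from spacy.lookups import Table

table = Table.from_dict({"foo": "bar"}, name="demo_table")  # hypothetical table

# Serialize the table, including its name and contents.
table_bytes = table.to_bytes()

# Load the bytes into a new, unnamed table.
restored = Table()
restored.from_bytes(table_bytes)
restored_name = restored.name
restored_value = restored["foo"]
```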
Attributes
Name | Description |
---|---|
name | Table name. str |
default_size | Default size of the Bloom filter if no data is provided. int |
bloom | The Bloom filter. preshed.BloomFilter |