medcat.pipeline.pipeline
Classes:
- DelegatingTokenizer – A delegating tokenizer.
- IncorrectAddonLoaded
- IncorrectArgumentsForComponent
- IncorrectArgumentsForTokenizer
- IncorrectCoreComponent
- IncorrectFolderUponLoad
- Pipeline – The pipeline for the NLP process.
- UnkownAddonConfig
Attributes:
- logger
DelegatingTokenizer
DelegatingTokenizer(tokenizer: BaseTokenizer, components: list[CoreComponent])
Bases: BaseTokenizer
A delegating tokenizer.
This can be used to create a tokenizer with some preprocessing (i.e. components) included.
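The delegation idea can be illustrated with a minimal sketch. Every name below (`WhitespaceTokenizer`, `ToyDelegatingTokenizer`, `lowercase`, `drop_punctuation`) is a hypothetical stand-in rather than a MedCAT class, and running the components after the wrapped tokenizer is an assumption about the order, not something this page confirms.

```python
from typing import Callable

# Hypothetical stand-ins for illustration only; these are NOT MedCAT classes.
Component = Callable[[list[str]], list[str]]


class WhitespaceTokenizer:
    """Toy base tokenizer: splits the text on whitespace."""

    def __call__(self, text: str) -> list[str]:
        return text.split()


def lowercase(tokens: list[str]) -> list[str]:
    """Toy component: lowercase every token."""
    return [t.lower() for t in tokens]


def drop_punctuation(tokens: list[str]) -> list[str]:
    """Toy component: strip edge punctuation and drop empty tokens."""
    stripped = (t.strip(".,;:!?") for t in tokens)
    return [t for t in stripped if t]


class ToyDelegatingTokenizer:
    """Wraps a tokenizer and applies the components to its output."""

    def __init__(self, tokenizer: WhitespaceTokenizer,
                 components: list[Component]) -> None:
        self.tokenizer = tokenizer
        self.components = components

    def __call__(self, text: str) -> list[str]:
        # Delegate the actual tokenization to the wrapped tokenizer.
        doc = self.tokenizer(text)
        # Assumed order: components run after tokenization, in list order.
        for component in self.components:
            doc = component(doc)
        return doc


tok = ToyDelegatingTokenizer(WhitespaceTokenizer(), [lowercase, drop_punctuation])
result = tok("Chest pain, acute.")
```

Callers only see a tokenizer, so the preprocessing stays hidden behind the same interface as the wrapped object.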
Methods:
- create_entity
- create_new_tokenizer
- entity_from_tokens
- entity_from_tokens_in_doc
- get_doc_class
- get_entity_class
Attributes:
- components
- tokenizer
Source code in medcat-v2/medcat/pipeline/pipeline.py
components
instance-attribute
components = components
tokenizer
instance-attribute
tokenizer = tokenizer
create_entity
create_entity(doc: MutableDocument, token_start_index: int, token_end_index: int, label: str) -> MutableEntity
Source code in medcat-v2/medcat/pipeline/pipeline.py
create_new_tokenizer
classmethod
create_new_tokenizer(config: Config) -> DelegatingTokenizer
Source code in medcat-v2/medcat/pipeline/pipeline.py
entity_from_tokens
entity_from_tokens(tokens: list[MutableToken]) -> MutableEntity
Source code in medcat-v2/medcat/pipeline/pipeline.py
entity_from_tokens_in_doc
entity_from_tokens_in_doc(tokens: list[MutableToken], doc: MutableDocument) -> MutableEntity
Source code in medcat-v2/medcat/pipeline/pipeline.py
get_doc_class
get_doc_class() -> type[MutableDocument]
Source code in medcat-v2/medcat/pipeline/pipeline.py
get_entity_class
get_entity_class() -> type[MutableEntity]
Source code in medcat-v2/medcat/pipeline/pipeline.py
IncorrectAddonLoaded
IncorrectAddonLoaded(*args)
Bases: ValueError
Source code in medcat-v2/medcat/pipeline/pipeline.py
IncorrectArgumentsForComponent
IncorrectArgumentsForComponent(comp_type: CoreComponentType, comp_name: str)
Bases: TypeError
Source code in medcat-v2/medcat/pipeline/pipeline.py
IncorrectArgumentsForTokenizer
IncorrectArgumentsForTokenizer(provider: str)
Bases: TypeError
Source code in medcat-v2/medcat/pipeline/pipeline.py
IncorrectCoreComponent
IncorrectCoreComponent(*args)
Bases: ValueError
Source code in medcat-v2/medcat/pipeline/pipeline.py
IncorrectFolderUponLoad
IncorrectFolderUponLoad(*args)
Bases: ValueError
Source code in medcat-v2/medcat/pipeline/pipeline.py
Pipeline
Pipeline(cdb: CDB, vocab: Optional[Vocab], model_load_path: Optional[str], old_pipe: Optional[Pipeline] = None, addon_config_dict: Optional[dict[str, dict]] = None)
The pipeline for the NLP process.
This class is responsible for the initial creation of the NLP document, as well as for running it through all the components and addons.
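The flow described above can be sketched with toy classes. `ToyDocument`, `ToyComponent`, `ToyPipeline` and the component names are hypothetical and do not use the MedCAT API; the assumption that core components run before addons, each in insertion order, is also illustrative only.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class ToyDocument:
    """Hypothetical stand-in for a mutable NLP document."""
    text: str
    tokens: list[str] = field(default_factory=list)
    history: list[str] = field(default_factory=list)


class ToyComponent:
    """Hypothetical component that records that it has seen the document."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __call__(self, doc: ToyDocument) -> ToyDocument:
        doc.history.append(self.name)
        return doc


class ToyPipeline:
    """Creates the document, then runs core components and addons on it."""

    def __init__(self, components: list[ToyComponent],
                 addons: Optional[list[ToyComponent]] = None) -> None:
        self._components = list(components)
        self._addons = list(addons or [])

    def add_addon(self, addon: ToyComponent) -> None:
        self._addons.append(addon)

    def iter_all_components(self) -> Iterator[ToyComponent]:
        # Assumed order: core components first, then addons.
        yield from self._components
        yield from self._addons

    def get_doc(self, text: str) -> ToyDocument:
        # Initial creation of the document (toy whitespace tokenization).
        doc = ToyDocument(text=text, tokens=text.split())
        for component in self.iter_all_components():
            doc = component(doc)
        return doc


pipe = ToyPipeline([ToyComponent("core_a"), ToyComponent("core_b")])
pipe.add_addon(ToyComponent("addon_x"))
doc = pipe.get_doc("severe chest pain")
```

Here `doc.history` records the order in which the toy components saw the document.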
Methods:
- add_addon
- entity_from_tokens – Get the entity from the list of tokens.
- entity_from_tokens_in_doc – Get the entity from the list of tokens in a document.
- get_component – Get the core component by the component type.
- get_doc – Get the document for this text.
- iter_addons
- iter_all_components
- save_components
Attributes:
- cdb
- config
- tokenizer (BaseTokenizer) – The raw tokenizer (with no components).
- tokenizer_with_tag (BaseTokenizer) – The tokenizer with the tagging component.
- vocab (Vocab)
Source code in medcat-v2/medcat/pipeline/pipeline.py
cdb
instance-attribute
cdb = cdb
config
instance-attribute
config = config
tokenizer_with_tag
property
tokenizer_with_tag: BaseTokenizer
The tokenizer with the tagging component.
add_addon
add_addon(addon: AddonComponent) -> None
Source code in medcat-v2/medcat/pipeline/pipeline.py
entity_from_tokens
entity_from_tokens(tokens: list[MutableToken]) -> MutableEntity
Get the entity from the list of tokens.
This effectively turns a list of (consecutive) tokens into an entity.
Parameters:
- tokens (list[MutableToken]) – The tokens to use.
Returns:
- MutableEntity – The resulting entity.
Source code in medcat-v2/medcat/pipeline/pipeline.py
entity_from_tokens_in_doc
entity_from_tokens_in_doc(tokens: list[MutableToken], doc: MutableDocument) -> MutableEntity
Get the entity from the list of tokens in a document.
This effectively turns a list of (consecutive) tokens into an entity. It is also designed to reuse existing instances on the document instead of creating new ones.
Parameters:
- tokens (list[MutableToken]) – The tokens to use.
- doc (MutableDocument) – The document for these tokens.
Returns:
- MutableEntity – The resulting entity.
Source code in medcat-v2/medcat/pipeline/pipeline.py
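One plausible reading of "reuse existing instances" is sketched below. `ToyEntity`, `ToyDoc` and `entity_from_span` are hypothetical and not part of MedCAT; matching an entity by its token span and storing new entities on the document are both assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyEntity:
    """Hypothetical entity covering tokens[start:end] of a document."""
    start: int
    end: int


@dataclass
class ToyDoc:
    """Hypothetical document that keeps track of its entities."""
    tokens: list[str]
    entities: list[ToyEntity] = field(default_factory=list)


def entity_from_span(doc: ToyDoc, start: int, end: int) -> ToyEntity:
    """Return the entity for this span, reusing an existing one if present."""
    for entity in doc.entities:
        if entity.start == start and entity.end == end:
            # Reuse the instance already attached to the document.
            return entity
    # Nothing matched, so create a new entity and remember it (assumption).
    entity = ToyEntity(start, end)
    doc.entities.append(entity)
    return entity


toy_doc = ToyDoc(tokens=["acute", "kidney", "injury", "noted"])
first = entity_from_span(toy_doc, 0, 3)
second = entity_from_span(toy_doc, 0, 3)
```

Both calls return the same object, so any state attached to the entity by an earlier step is still visible to later ones.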
get_component
get_component(ctype: CoreComponentType) -> CoreComponent
Get the core component by the component type.
Parameters:
- ctype (CoreComponentType) – The core component type.
Raises:
- ValueError – If no component by that type is found.
Returns:
- CoreComponent – The corresponding core component.
Source code in medcat-v2/medcat/pipeline/pipeline.py
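The lookup-or-raise contract documented for `get_component` can be sketched as follows. The enum members and classes here are hypothetical stand-ins, not the real `CoreComponentType` or `CoreComponent`, and the linear search is an assumed implementation detail.

```python
from enum import Enum, auto


class ToyComponentType(Enum):
    """Hypothetical component types; the member names are illustrative."""
    TAGGING = auto()
    NER = auto()
    LINKING = auto()


class ToyCoreComponent:
    """Hypothetical core component that knows its own type."""

    def __init__(self, ctype: ToyComponentType) -> None:
        self.ctype = ctype


def get_component(components: list[ToyCoreComponent],
                  ctype: ToyComponentType) -> ToyCoreComponent:
    """Return the first component of the given type, else raise ValueError."""
    for component in components:
        if component.ctype is ctype:
            return component
    raise ValueError(f"No component found of type {ctype}")


toy_components = [ToyCoreComponent(ToyComponentType.TAGGING),
                  ToyCoreComponent(ToyComponentType.NER)]
found = get_component(toy_components, ToyComponentType.NER)

try:
    get_component(toy_components, ToyComponentType.LINKING)
    missing_raised = False
except ValueError:
    # The requested type is not registered in this toy list.
    missing_raised = True
```

Callers that are unsure whether a component type is present would wrap the call in `try`/`except ValueError`, as in the last lines of the sketch.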
get_doc
get_doc(text: str) -> MutableDocument
Get the document for this text.
This essentially runs the tokenizer over the text.
Parameters:
- text (str) – The input text.
Returns:
- MutableDocument – The resulting document.
Source code in medcat-v2/medcat/pipeline/pipeline.py
iter_addons
iter_addons() -> Iterable[AddonComponent]
Source code in medcat-v2/medcat/pipeline/pipeline.py
iter_all_components
iter_all_components() -> Iterable[BaseComponent]
Source code in medcat-v2/medcat/pipeline/pipeline.py
save_components
save_components(serialiser_type: Union[AvailableSerialisers, str], components_folder: str) -> None
Source code in medcat-v2/medcat/pipeline/pipeline.py
UnkownAddonConfig
UnkownAddonConfig(cnf: ComponentConfig, *existing_types: type[ComponentConfig])
Bases: ValueError
Source code in medcat-v2/medcat/pipeline/pipeline.py