medcat.tokenizing.regex_impl.tokenizer
Classes:

- Document
- Entity
- RegexTokenizer
- Token
Document
Document(text: str, tokens: Optional[list[MutableToken]] = None)
Methods:

- get_addon_data
- get_available_addon_paths
- get_tokens
- has_addon_data
- isupper
- register_addon_path
- set_addon_data
Attributes:

- base (BaseDocument)
- linked_ents (list[MutableEntity])
- ner_ents (list[MutableEntity])
- text
Source code in medcat-v2/medcat/tokenizing/regex_impl/tokenizer.py, lines 223-229
text (instance-attribute): text = text
get_addon_data
Source code in medcat-v2/medcat/tokenizing/regex_impl/tokenizer.py, lines 281-284
get_available_addon_paths
Source code in medcat-v2/medcat/tokenizing/regex_impl/tokenizer.py, lines 286-288
get_tokens
get_tokens(start_index: int, end_index: int) -> list[MutableToken]
Source code in medcat-v2/medcat/tokenizing/regex_impl/tokenizer.py, lines 258-265
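Since medcat itself cannot be imported here, the slicing behavior of get_tokens can be sketched with plain strings. The pattern below mirrors RegexTokenizer.REGEX from this module; the half-open [start_index, end_index) slice is an assumption about the method's semantics, not taken from the source:

```python
import re

# Pattern mirroring RegexTokenizer.REGEX from this module.
REGEX = re.compile(r'(([^a-zA-Z0-9\s]+|\b\w+\b|\S+)\s?)')

def get_tokens(text: str, start_index: int, end_index: int) -> list[str]:
    """Illustrative stand-in: return the token texts between two
    token indices (assumed half-open), where the real method returns
    MutableToken objects held by the Document."""
    tokens = [m.group(2) for m in REGEX.finditer(text)]
    return tokens[start_index:end_index]

print(get_tokens("Patient denies chest pain.", 2, 4))
# -> ['chest', 'pain']
```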
has_addon_data
Source code in medcat-v2/medcat/tokenizing/regex_impl/tokenizer.py, lines 278-279
isupper
isupper() -> bool
Source code in medcat-v2/medcat/tokenizing/regex_impl/tokenizer.py, lines 270-271
register_addon_path (classmethod)
Source code in medcat-v2/medcat/tokenizing/regex_impl/tokenizer.py, lines 290-294
set_addon_data
Source code in medcat-v2/medcat/tokenizing/regex_impl/tokenizer.py, lines 273-276
Entity
Entity(document: Document, text: str, start_index: int, end_index: int, start_char_index: int, end_char_index: int)
Methods:

- get_addon_data
- get_available_addon_paths
- has_addon_data
- register_addon_path
- set_addon_data
Attributes:

- ENTITY_INFO_PREFIX
- base (BaseEntity)
- confidence (float)
- context_similarity (float)
- cui
- detected_name
- end_char_index (int)
- end_index (int)
- id
- label (int)
- link_candidates (list[str])
- start_char_index (int)
- start_index (int)
- text (str)
Source code in medcat-v2/medcat/tokenizing/regex_impl/tokenizer.py, lines 126-141
ENTITY_INFO_PREFIX (class-attribute, instance-attribute): ENTITY_INFO_PREFIX = 'Entity:'
cui (instance-attribute): cui = ''
detected_name (instance-attribute): detected_name = ''
id (instance-attribute): id = -1
get_addon_data
Source code in medcat-v2/medcat/tokenizing/regex_impl/tokenizer.py, lines 180-183
get_available_addon_paths
Source code in medcat-v2/medcat/tokenizing/regex_impl/tokenizer.py, lines 185-187
has_addon_data
Source code in medcat-v2/medcat/tokenizing/regex_impl/tokenizer.py, lines 177-178
register_addon_path (classmethod)
Source code in medcat-v2/medcat/tokenizing/regex_impl/tokenizer.py, lines 189-199
set_addon_data
Source code in medcat-v2/medcat/tokenizing/regex_impl/tokenizer.py, lines 172-175
RegexTokenizer
Bases: BaseTokenizer
Methods:

- create_entity
- create_new_tokenizer
- entity_from_tokens
- entity_from_tokens_in_doc
- get_doc_class
- get_entity_class
Attributes:

- REGEX
REGEX (class-attribute, instance-attribute): REGEX = compile('(([^a-zA-Z0-9\\s]+|\\b\\w+\\b|\\S+)\\s?)')
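This pattern drives all tokenization in the module: it matches either a run of non-alphanumeric, non-space characters, a whole word, or any other run of non-space characters, each optionally followed by a single whitespace character. Running the same pattern through Python's re module illustrates the resulting split; this standalone snippet is a demonstration of the pattern, not the tokenizer itself:

```python
import re

# The same pattern as RegexTokenizer.REGEX.
REGEX = re.compile(r'(([^a-zA-Z0-9\s]+|\b\w+\b|\S+)\s?)')

# Group 2 is the token text without trailing whitespace;
# group 1 includes the optional trailing whitespace character.
tokens = [m.group(2) for m in REGEX.finditer("Hello, world!")]
print(tokens)
# -> ['Hello', ',', 'world', '!']
```

Note that punctuation is split off into its own tokens, while the optional trailing whitespace lets the full text be reconstructed from the group-1 matches.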
create_entity
create_entity(doc: MutableDocument, token_start_index: int, token_end_index: int, label: str) -> MutableEntity
Source code in medcat-v2/medcat/tokenizing/regex_impl/tokenizer.py, lines 333-339
create_new_tokenizer (classmethod)
create_new_tokenizer(config: Config) -> RegexTokenizer
Source code in medcat-v2/medcat/tokenizing/regex_impl/tokenizer.py, lines 386-388
entity_from_tokens
entity_from_tokens(tokens: list[MutableToken]) -> MutableEntity
Source code in medcat-v2/medcat/tokenizing/regex_impl/tokenizer.py, lines 344-350
entity_from_tokens_in_doc
entity_from_tokens_in_doc(tokens: list[MutableToken], doc: MutableDocument) -> MutableEntity
Source code in medcat-v2/medcat/tokenizing/regex_impl/tokenizer.py, lines 362-367
get_doc_class
get_doc_class() -> Type[MutableDocument]
Source code in medcat-v2/medcat/tokenizing/regex_impl/tokenizer.py, lines 390-391
get_entity_class
get_entity_class() -> Type[MutableEntity]
Source code in medcat-v2/medcat/tokenizing/regex_impl/tokenizer.py, lines 393-394
Token
Token(document: Document, text: str, _text_with_ws: str, start_index: int, token_index: int, is_punct: bool, to_skip: bool)
Attributes:

- base (BaseToken)
- char_index (int)
- index (int)
- is_digit (bool)
- is_punctuation (bool)
- is_stop (bool)
- is_upper (bool)
- lemma (str)
- lower (str)
- norm (str)
- tag (Optional[str])
- text (str)
- text_versions (list[str])
- text_with_ws (str)
- to_skip (bool)
Source code in medcat-v2/medcat/tokenizing/regex_impl/tokenizer.py, lines 16-30