The current implementation supports Hugging Face (HF) Llama models and BERT models. This section covers only BERT, because Llama usage is the same apart from the imports.

In [ ]:

# Install medcat
! pip install "medcat[spacy,rel-cat,meta-cat]~=2.4.0" # NOTE: VERSION-STRING

In [ ]:

import logging
from medcat.cdb import CDB
from medcat.config.config_rel_cat import ConfigRelCAT
from medcat.components.addons.relation_extraction.rel_cat import RelCAT
from medcat.components.addons.relation_extraction.base_component import RelExtrBaseComponent
from medcat.components.addons.relation_extraction.bert.model import RelExtrBertModel
from medcat.components.addons.relation_extraction.bert.config import RelExtrBertConfig
from medcat.components.addons.relation_extraction.tokenizer import BaseTokenizerWrapper
from medcat.config import Config
from medcat.tokenizing.tokenizers import create_tokenizer

Training RelCAT models with custom datasets from scratch.

1. Create the RelCAT config and set the parameters

In [3]:

config = ConfigRelCAT()
config.general.log_level = logging.INFO
config.general.model_name = "bert-base-uncased" # base model that you want to use, we're going to use the HuggingFace bert-base-uncased model

1.1 Depending on the base model you use, you may need to adjust config.model.hidden_size, config.model.model_size and config.model.hidden_layers

In [4]:

config.model.hidden_size = 256
config.model.model_size = 2304 # 4096 for llama

1.2 Other notable configurations

In [5]:

config.general.cntx_left = 15 # how many tokens to the left of the start entity we select
config.general.cntx_right = 15 # how many tokens to the right of the end entity we select
config.general.window_size = 300 # maximum distance (in characters) between two entities to be considered a relation
config.train.nclasses = 2 # number of classes in your medcat export / dataset
config.train.nepochs = 10 # number of epochs to train for
config.model.freeze_layers = False # whether to freeze the layers of the base model
config.general.limit_samples_per_class = 300 # limit the number of training samples per class to this number, to avoid overfitting in unbalanced datasets
config.train.batch_size = 32 # batch size
config.train.lr = 3e-5
config.train.adam_epsilon = 1e-8
config.train.adam_weight_decay = 0.0005

2. Create a CDB. It can be a CDB from another model of your choice or an empty one. The CDB is used only when filtering by concept unique identifiers (CUIs) or concept type IDs (TUIs).

In [7]:

gen_cnf = Config()
gen_cnf.general.nlp.provider = 'spacy'
cdb = CDB(gen_cnf)
base_tokenizer = create_tokenizer(gen_cnf.general.nlp.provider, gen_cnf)

Collecting en-core-web-md==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_md-3.8.0/en_core_web_md-3.8.0-py3-none-any.whl (33.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 33.5/33.5 MB 86.4 MB/s eta 0:00:0000:0100:01
Installing collected packages: en-core-web-md
Successfully installed en-core-web-md-3.8.0
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_md')

3. Create a tokenizer

In [8]:

tokenizer = BaseTokenizerWrapper.load(tokenizer_path=config.general.model_name,
                                      relcat_config=config)

4. Add token tags to the tokenizer. This step is optional because the [s1], [e1], [s2], [e2] tags are already defined in the default ConfigRelCAT. If you are using a Llama-based model, you will also need to add the [PAD] token to the tokenizer, as shown below.

In [9]:

special_ent_tokens = ["[s1]", "[e1]", "[s2]", "[e2]"]
tokenizer.hf_tokenizers.add_tokens(special_ent_tokens, special_tokens=True)
tokenizer.hf_tokenizers.add_special_tokens({'pad_token': '[PAD]'}) # used in llama tokenizer
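
As an illustration of what the four tags are for, the sketch below wraps two entity spans in a sentence with [s1]/[e1] and [s2]/[e2] markers. This is a simplified, assumed view of marker-based relation input. The function `mark_entities` is hypothetical and is not MedCAT's own preprocessing code.

```python
def mark_entities(text, ent1, ent2, tags=("[s1]", "[e1]", "[s2]", "[e2]")):
    """Wrap two non-overlapping (start, end) character spans in marker tags."""
    (s1, e1), (s2, e2) = ent1, ent2
    # This sketch expects the first entity to end before the second starts.
    if not (0 <= s1 < e1 <= s2 < e2 <= len(text)):
        raise ValueError("expected ent1 to end before ent2 starts")
    t_s1, t_e1, t_s2, t_e2 = tags
    return (
        text[:s1] + f"{t_s1} " + text[s1:e1] + f" {t_e1}"
        + text[e1:s2]
        + f"{t_s2} " + text[s2:e2] + f" {t_e2}" + text[e2:]
    )


sentence = "Aspirin caused severe nausea."
marked = mark_entities(sentence, (0, 7), (22, 28))
print(marked)  # [s1] Aspirin [e1] caused severe [s2] nausea [e2].
```

Registering the tags as special tokens keeps each marker as a single token instead of letting the tokenizer split it into pieces.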

5. Add the tokens to the ConfigRelCAT

In [10]:

config.general.tokenizer_relation_annotation_special_tokens_tags = special_ent_tokens
config.general.annotation_schema_tag_ids = tokenizer.hf_tokenizers.convert_tokens_to_ids(special_ent_tokens)
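
The tag IDs stored in the config are simply the vocabulary positions of the newly added tokens. The toy sketch below mimics this with a plain dictionary. `add_tokens` and `toy_vocab` are hypothetical stand-ins, not the HuggingFace tokenizer, and only show why the IDs land at the end of the vocabulary. In this tutorial's run, the later log output reports the embedding shape changing from 30522 to 30526 rows, which is consistent with four added tokens.

```python
def add_tokens(vocab, new_tokens):
    """Append unseen tokens to a token -> id mapping; return how many were added."""
    added = 0
    for tok in new_tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)  # next free id is the current vocab size
            added += 1
    return added


toy_vocab = {"[PAD]": 0, "[UNK]": 1, "drug": 2, "dose": 3}
special_ent_tokens = ["[s1]", "[e1]", "[s2]", "[e2]"]

n_added = add_tokens(toy_vocab, special_ent_tokens)
tag_ids = [toy_vocab[t] for t in special_ent_tokens]

# A token that already exists (such as [PAD] here) does not grow the vocabulary.
n_pad_added = add_tokens(toy_vocab, ["[PAD]"])
```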

6. Create the RelCAT object and initialize its components

In [12]:

# To skip the steps in section 6.1, pass the init_model=True argument to initialize the components with the default ConfigRelCAT settings.
relCAT = RelCAT(base_tokenizer, cdb, config=config)

INFO:medcat.components.addons.relation_extraction.base_component:RelExtrBaseComponent initialized

6.1 Use the RelExtrBaseComponent object, which holds the tokenizer, model and model config. Each component has to be initialized beforehand.

Resize the model's token embeddings after adding tokens to the tokenizer, since the vocabulary has grown. This is not required when loading a previously saved model, as the resized value is retained.

In [14]:

model_config = RelExtrBertConfig.load(pretrained_model_name_or_path=config.general.model_name,
                                      relcat_config=config)

# update the model config with the proper vocab size, since we added special tokens to the tokenizer
model_config.hf_model_config.vocab_size = tokenizer.get_size()

# set the padding idx in the model config and relcat config; this is necessary because it depends on the tokenizer you use
config.model.padding_idx = model_config.pad_token_id = tokenizer.get_pad_id()

model = RelExtrBertModel.load(pretrained_model_name_or_path=config.general.model_name,
                              model_config=model_config,
                              relcat_config=config)

# update the model to reflect the new token embeddings, since we added special tokens to the tokenizer
model.hf_model.resize_token_embeddings(len(tokenizer.hf_tokenizers)) # type: ignore

# assemble the component that holds the tokenizer, model and configs
component = RelExtrBaseComponent(tokenizer=tokenizer, config=config)
component.model = model
component.model_config = model_config
component.relcat_config = config
component.tokenizer = tokenizer

relCAT.component = component
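
Conceptually, resizing the token embeddings means growing the embedding matrix by one row per added token while keeping the rows that were already trained. The sketch below shows that idea with plain Python lists. `resize_embeddings` is a hypothetical helper, and the random initialisation of the new rows is only a stand-in, since the actual initialisation depends on the library and its version.

```python
import random


def resize_embeddings(matrix, new_vocab_size, seed=0):
    """Return a copy of `matrix` grown to `new_vocab_size` rows.

    Existing rows are preserved; new rows get small random values.
    """
    if new_vocab_size < len(matrix):
        raise ValueError("this sketch only grows the matrix")
    dim = len(matrix[0])
    rng = random.Random(seed)
    resized = [row[:] for row in matrix]  # copy the pretrained rows
    for _ in range(new_vocab_size - len(matrix)):
        resized.append([rng.gauss(0.0, 0.02) for _ in range(dim)])
    return resized


old = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # 3 tokens, embedding size 2
new = resize_embeddings(old, 5)             # 2 tokens were added
```

Because the new rows start untrained, the added tags only become meaningful once the model has been trained on the downstream task.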

INFO:medcat.components.addons.relation_extraction.bert.config:Loaded config from pretrained: bert-base-uncased
INFO:medcat.components.addons.relation_extraction.models:RelCAT model config: BertConfig {
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.53.0",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30526
}

INFO:medcat.components.addons.relation_extraction.models:RelCAT model config: BertConfig {
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.53.0",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30526
}

Some weights of BertModel were not initialized from the model checkpoint at bert-base-uncased and are newly initialized because the shapes did not match:
- embeddings.word_embeddings.weight: found shape torch.Size([30522, 768]) in the checkpoint and torch.Size([30526, 768]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
INFO:medcat.components.addons.relation_extraction.bert.model:Loaded model from pretrained: bert-base-uncased
INFO:medcat.components.addons.relation_extraction.models:Loaded RelExtrBertModel from pretrained_model_name_or_path: bert-base-uncased
INFO:medcat.components.addons.relation_extraction.base_component:RelExtrBaseComponent initialized

7. Train the model on the ADE dataset.

In [15]:

! rm -rf "./ade_relcat_model"
! mkdir -p "./ade_relcat_model"

In [17]:

relCAT.train(train_csv_path="./data/rel_cat_ADE_V2.tsv", checkpoint_path="./ade_relcat_model")

# for MedCAT Trainer exports, use the export_data_path argument: relCAT.train(export_data_path="./data/MedCAT_Export_relation_extraction.json")
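
The ADE data is heavily unbalanced: the log for this run reports 6814 DRUG-AE and 279 DRUG-DOSE relations, while the training split contains 300 and 224 samples respectively, reflecting config.general.limit_samples_per_class = 300. A minimal sketch of per-class capping is shown below. `cap_samples_per_class` is a hypothetical helper that uses random sampling, and it does not reproduce how MedCAT selects samples or splits train and test data.

```python
import random
from collections import defaultdict


def cap_samples_per_class(samples, limit, seed=0):
    """Keep at most `limit` (item, label) pairs for each label."""
    by_label = defaultdict(list)
    for item, label in samples:
        by_label[label].append((item, label))
    rng = random.Random(seed)
    capped = []
    for label in sorted(by_label):
        group = by_label[label]
        if len(group) > limit:
            group = rng.sample(group, limit)  # downsample the larger class
        capped.extend(group)
    return capped


data = [(i, "DRUG-AE") for i in range(50)] + [(i, "DRUG-DOSE") for i in range(5)]
capped = cap_samples_per_class(data, limit=10)
```

Classes smaller than the limit are left untouched, so the cap narrows the imbalance without discarding minority-class examples.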

INFO:medcat.components.addons.relation_extraction.rel_dataset:CSV dataset | No. of relations detected: 7093 | from : ./data/rel_cat_ADE_V2.tsv | nclasses: 2 | idx2label: {0: 'DRUG-AE', 1: 'DRUG-DOSE'}
INFO:medcat.components.addons.relation_extraction.rel_dataset:Samples per class: 
INFO:medcat.components.addons.relation_extraction.rel_dataset: label: DRUG-AE | samples: 6814
INFO:medcat.components.addons.relation_extraction.rel_dataset: label: DRUG-DOSE | samples: 279
INFO:root:Relations after train, test split :  train - 524 | test - 115
INFO:root: label: DRUG-AE samples | train 300 | test 60
INFO:root: label: DRUG-DOSE samples | train 224 | test 55
INFO:root:Attempting to load RelCAT model on device: cpu
INFO:medcat.components.addons.relation_extraction.rel_cat:Starting training process...
INFO:medcat.components.addons.relation_extraction.rel_cat:Total epochs on this model: 10 | currently training epoch 0
  0%|          | 0/524 [00:00<?, ?it/s]/Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/.venv_v2_tut/lib/python3.11/site-packages/torch/utils/data/dataloader.py:683: UserWarning: 'pin_memory' argument is set as true but not supported on MPS now, then device pinned memory won't be used.
  warnings.warn(warn_msg)
100%|██████████| 524/524 [00:29<00:00, 17.52it/s]
INFO:medcat.components.addons.relation_extraction.rel_cat:Losses at Epoch 0: 0.02372
INFO:medcat.components.addons.relation_extraction.rel_cat:Train accuracy at Epoch 0: 0.53125
INFO:medcat.components.addons.relation_extraction.rel_cat:======================== TRAIN SET TEST RESULTS ========================
INFO:medcat.components.addons.relation_extraction.rel_cat:Evaluating test samples...
INFO:medcat.components.addons.relation_extraction.rel_cat:==================== Evaluation Results ====================
INFO:medcat.components.addons.relation_extraction.rel_cat: no. of batches:17
INFO:medcat.components.addons.relation_extraction.rel_cat: accuracy = 0.573
INFO:medcat.components.addons.relation_extraction.rel_cat: f1 = 0.573
INFO:medcat.components.addons.relation_extraction.rel_cat: loss = 0.681
INFO:medcat.components.addons.relation_extraction.rel_cat: precision = 0.573
INFO:medcat.components.addons.relation_extraction.rel_cat: recall = 0.573
INFO:medcat.components.addons.relation_extraction.rel_cat:----------------------- class stats -----------------------
INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-AE | f1: 0.724 | prec : 1.000 | acc: 0.573 | recall: 0.573 
INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-DOSE | f1: 0.000 | prec : 0.000 | acc: 0.573 | recall: 0.000 
INFO:medcat.components.addons.relation_extraction.rel_cat:-----------------------------------------------------------
INFO:medcat.components.addons.relation_extraction.rel_cat:===========================================================
INFO:medcat.components.addons.relation_extraction.rel_cat:======================== TEST SET TEST RESULTS ========================
INFO:medcat.components.addons.relation_extraction.rel_cat:Evaluating test samples...
INFO:medcat.components.addons.relation_extraction.rel_cat:==================== Evaluation Results ====================
INFO:medcat.components.addons.relation_extraction.rel_cat: no. of batches:4
INFO:medcat.components.addons.relation_extraction.rel_cat: accuracy = 0.522
INFO:medcat.components.addons.relation_extraction.rel_cat: f1 = 0.522
INFO:medcat.components.addons.relation_extraction.rel_cat: loss = 0.692
INFO:medcat.components.addons.relation_extraction.rel_cat: precision = 0.522
INFO:medcat.components.addons.relation_extraction.rel_cat: recall = 0.522
INFO:medcat.components.addons.relation_extraction.rel_cat:----------------------- class stats -----------------------
INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-AE | f1: 0.682 | prec : 1.000 | acc: 0.522 | recall: 0.522 
INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-DOSE | f1: 0.000 | prec : 0.000 | acc: 0.522 | recall: 0.000 
INFO:medcat.components.addons.relation_extraction.rel_cat:-----------------------------------------------------------
INFO:medcat.components.addons.relation_extraction.rel_cat:===========================================================
INFO:medcat.components.addons.relation_extraction.rel_cat:Epoch finished, took 0:00:29.914109 seconds
INFO:medcat.components.addons.relation_extraction.rel_cat:Total epochs on this model: 10 | currently training epoch 1
100%|██████████| 524/524 [00:27<00:00, 19.24it/s]
INFO:medcat.components.addons.relation_extraction.rel_cat:Losses at Epoch 1: 0.02317
INFO:medcat.components.addons.relation_extraction.rel_cat:Train accuracy at Epoch 1: 0.57782
INFO:medcat.components.addons.relation_extraction.rel_cat:======================== TRAIN SET TEST RESULTS ========================
INFO:medcat.components.addons.relation_extraction.rel_cat:Evaluating test samples...
INFO:medcat.components.addons.relation_extraction.rel_cat:==================== Evaluation Results ====================
INFO:medcat.components.addons.relation_extraction.rel_cat: no. of batches:17
INFO:medcat.components.addons.relation_extraction.rel_cat: accuracy = 0.576
INFO:medcat.components.addons.relation_extraction.rel_cat: f1 = 0.576
INFO:medcat.components.addons.relation_extraction.rel_cat: loss = 0.677
INFO:medcat.components.addons.relation_extraction.rel_cat: precision = 0.576
INFO:medcat.components.addons.relation_extraction.rel_cat: recall = 0.576
INFO:medcat.components.addons.relation_extraction.rel_cat:----------------------- class stats -----------------------
INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-AE | f1: 0.725 | prec : 1.000 | acc: 0.576 | recall: 0.576 
INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-DOSE | f1: 0.000 | prec : 0.000 | acc: 0.576 | recall: 0.000 
INFO:medcat.components.addons.relation_extraction.rel_cat:-----------------------------------------------------------
INFO:medcat.components.addons.relation_extraction.rel_cat:===========================================================
INFO:medcat.components.addons.relation_extraction.rel_cat:======================== TEST SET TEST RESULTS ========================
INFO:medcat.components.addons.relation_extraction.rel_cat:Evaluating test samples...
INFO:medcat.components.addons.relation_extraction.rel_cat:==================== Evaluation Results ====================
INFO:medcat.components.addons.relation_extraction.rel_cat: no. of batches:4
INFO:medcat.components.addons.relation_extraction.rel_cat: accuracy = 0.501
INFO:medcat.components.addons.relation_extraction.rel_cat: f1 = 0.501
INFO:medcat.components.addons.relation_extraction.rel_cat: loss = 0.702
INFO:medcat.components.addons.relation_extraction.rel_cat: precision = 0.501
INFO:medcat.components.addons.relation_extraction.rel_cat: recall = 0.501
INFO:medcat.components.addons.relation_extraction.rel_cat:----------------------- class stats -----------------------
INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-AE | f1: 0.658 | prec : 1.000 | acc: 0.501 | recall: 0.501 
INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-DOSE | f1: 0.000 | prec : 0.000 | acc: 0.501 | recall: 0.000 
INFO:medcat.components.addons.relation_extraction.rel_cat:-----------------------------------------------------------
INFO:medcat.components.addons.relation_extraction.rel_cat:===========================================================
INFO:medcat.components.addons.relation_extraction.rel_cat:Epoch finished, took 0:00:27.238975 seconds
INFO:medcat.components.addons.relation_extraction.rel_cat:Total epochs on this model: 10 | currently training epoch 2
100%|██████████| 524/524 [00:25<00:00, 20.54it/s]
INFO:medcat.components.addons.relation_extraction.rel_cat:Losses at Epoch 2: 0.02323
INFO:medcat.components.addons.relation_extraction.rel_cat:Train accuracy at Epoch 2: 0.57598
INFO:medcat.components.addons.relation_extraction.rel_cat:======================== TRAIN SET TEST RESULTS ========================
INFO:medcat.components.addons.relation_extraction.rel_cat:Evaluating test samples...
INFO:medcat.components.addons.relation_extraction.rel_cat:==================== Evaluation Results ====================
INFO:medcat.components.addons.relation_extraction.rel_cat: no. of batches:17
INFO:medcat.components.addons.relation_extraction.rel_cat: accuracy = 0.570
INFO:medcat.components.addons.relation_extraction.rel_cat: f1 = 0.570
INFO:medcat.components.addons.relation_extraction.rel_cat: loss = 0.675
INFO:medcat.components.addons.relation_extraction.rel_cat: precision = 0.570
INFO:medcat.components.addons.relation_extraction.rel_cat: recall = 0.570
INFO:medcat.components.addons.relation_extraction.rel_cat:----------------------- class stats -----------------------
INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-AE | f1: 0.722 | prec : 1.000 | acc: 0.570 | recall: 0.570 
INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-DOSE | f1: 0.000 | prec : 0.000 | acc: 0.570 | recall: 0.000 
INFO:medcat.components.addons.relation_extraction.rel_cat:-----------------------------------------------------------
INFO:medcat.components.addons.relation_extraction.rel_cat:===========================================================
INFO:medcat.components.addons.relation_extraction.rel_cat:======================== TEST SET TEST RESULTS ========================
INFO:medcat.components.addons.relation_extraction.rel_cat:Evaluating test samples...
INFO:medcat.components.addons.relation_extraction.rel_cat:==================== Evaluation Results ====================
INFO:medcat.components.addons.relation_extraction.rel_cat: no. of batches:4
INFO:medcat.components.addons.relation_extraction.rel_cat: accuracy = 0.517
INFO:medcat.components.addons.relation_extraction.rel_cat: f1 = 0.517
INFO:medcat.components.addons.relation_extraction.rel_cat: loss = 0.691
INFO:medcat.components.addons.relation_extraction.rel_cat: precision = 0.517
INFO:medcat.components.addons.relation_extraction.rel_cat: recall = 0.517
INFO:medcat.components.addons.relation_extraction.rel_cat:----------------------- class stats -----------------------
INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-AE | f1: 0.679 | prec : 1.000 | acc: 0.517 | recall: 0.517 
INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-DOSE | f1: 0.000 | prec : 0.000 | acc: 0.517 | recall: 0.000 
INFO:medcat.components.addons.relation_extraction.rel_cat:-----------------------------------------------------------
INFO:medcat.components.addons.relation_extraction.rel_cat:===========================================================
INFO:medcat.components.addons.relation_extraction.rel_cat:Epoch finished, took 0:00:25.511498 seconds
INFO:medcat.components.addons.relation_extraction.rel_cat:Total epochs on this model: 10 | currently training epoch 3
100%|██████████| 524/524 [00:26<00:00, 19.72it/s]
INFO:medcat.components.addons.relation_extraction.rel_cat:Losses at Epoch 3: 0.02311
INFO:medcat.components.addons.relation_extraction.rel_cat:Train accuracy at Epoch 3: 0.57598
INFO:medcat.components.addons.relation_extraction.rel_cat:======================== TRAIN SET TEST RESULTS ========================
INFO:medcat.components.addons.relation_extraction.rel_cat:Evaluating test samples...
INFO:medcat.components.addons.relation_extraction.rel_cat:==================== Evaluation Results ====================
INFO:medcat.components.addons.relation_extraction.rel_cat: no. of batches:17
INFO:medcat.components.addons.relation_extraction.rel_cat: accuracy = 0.573
INFO:medcat.components.addons.relation_extraction.rel_cat: f1 = 0.573
INFO:medcat.components.addons.relation_extraction.rel_cat: loss = 0.665
INFO:medcat.components.addons.relation_extraction.rel_cat: precision = 0.573
INFO:medcat.components.addons.relation_extraction.rel_cat: recall = 0.573
INFO:medcat.components.addons.relation_extraction.rel_cat:----------------------- class stats -----------------------
INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-AE | f1: 0.722 | prec : 1.000 | acc: 0.573 | recall: 0.573 
INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-DOSE | f1: 0.000 | prec : 0.000 | acc: 0.573 | recall: 0.000 
INFO:medcat.components.addons.relation_extraction.rel_cat:-----------------------------------------------------------
INFO:medcat.components.addons.relation_extraction.rel_cat:===========================================================
INFO:medcat.components.addons.relation_extraction.rel_cat:======================== TEST SET TEST RESULTS ========================
INFO:medcat.components.addons.relation_extraction.rel_cat:Evaluating test samples...
INFO:medcat.components.addons.relation_extraction.rel_cat:==================== Evaluation Results ====================
INFO:medcat.components.addons.relation_extraction.rel_cat: no. of batches:4
INFO:medcat.components.addons.relation_extraction.rel_cat: accuracy = 0.522
INFO:medcat.components.addons.relation_extraction.rel_cat: f1 = 0.522
INFO:medcat.components.addons.relation_extraction.rel_cat: loss = 0.684
INFO:medcat.components.addons.relation_extraction.rel_cat: precision = 0.522
INFO:medcat.components.addons.relation_extraction.rel_cat: recall = 0.522
INFO:medcat.components.addons.relation_extraction.rel_cat:----------------------- class stats -----------------------
INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-AE | f1: 0.686 | prec : 1.000 | acc: 0.522 | recall: 0.522 
INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-DOSE | f1: 0.000 | prec : 0.000 | acc: 0.522 | recall: 0.000 
INFO:medcat.components.addons.relation_extraction.rel_cat:-----------------------------------------------------------
INFO:medcat.components.addons.relation_extraction.rel_cat:===========================================================
INFO:medcat.components.addons.relation_extraction.rel_cat:Epoch finished, took 0:00:26.571325 seconds
INFO:medcat.components.addons.relation_extraction.rel_cat:Total epochs on this model: 10 | currently training epoch 4
100%|██████████| 524/524 [00:25<00:00, 20.37it/s]
INFO:medcat.components.addons.relation_extraction.rel_cat:Losses at Epoch 4: 0.02265
INFO:medcat.components.addons.relation_extraction.rel_cat:Train accuracy at Epoch 4: 0.57721
INFO:medcat.components.addons.relation_extraction.rel_cat:======================== TRAIN SET TEST RESULTS ========================
INFO:medcat.components.addons.relation_extraction.rel_cat:Evaluating test samples...
INFO:medcat.components.addons.relation_extraction.rel_cat:==================== Evaluation Results ====================
INFO:medcat.components.addons.relation_extraction.rel_cat: no. of batches:17
INFO:medcat.components.addons.relation_extraction.rel_cat: accuracy = 0.657
INFO:medcat.components.addons.relation_extraction.rel_cat: f1 = 0.657
INFO:medcat.components.addons.relation_extraction.rel_cat: loss = 0.638
INFO:medcat.components.addons.relation_extraction.rel_cat: precision = 0.657
INFO:medcat.components.addons.relation_extraction.rel_cat: recall = 0.657
INFO:medcat.components.addons.relation_extraction.rel_cat:----------------------- class stats -----------------------
INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-AE | f1: 0.762 | prec : 0.979 | acc: 0.657 | recall: 0.631 
INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-DOSE | f1: 0.322 | prec : 0.205 | acc: 0.657 | recall: 0.869 
INFO:medcat.components.addons.relation_extraction.rel_cat:-----------------------------------------------------------
INFO:medcat.components.addons.relation_extraction.rel_cat:===========================================================
INFO:medcat.components.addons.relation_extraction.rel_cat:======================== TEST SET TEST RESULTS ========================
INFO:medcat.components.addons.relation_extraction.rel_cat:Evaluating test samples...
INFO:medcat.components.addons.relation_extraction.rel_cat:==================== Evaluation Results ====================
INFO:medcat.components.addons.relation_extraction.rel_cat: no. of batches:4
INFO:medcat.components.addons.relation_extraction.rel_cat: accuracy = 0.535
INFO:medcat.components.addons.relation_extraction.rel_cat: f1 = 0.535
INFO:medcat.components.addons.relation_extraction.rel_cat: loss = 0.673
INFO:medcat.components.addons.relation_extraction.rel_cat: precision = 0.535
INFO:medcat.components.addons.relation_extraction.rel_cat: recall = 0.535
INFO:medcat.components.addons.relation_extraction.rel_cat:----------------------- class stats -----------------------
INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-AE | f1: 0.681 | prec : 0.972 | acc: 0.535 | recall: 0.527 
INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-DOSE | f1: 0.124 | prec : 0.068 | acc: 0.535 | recall: 0.750 
INFO:medcat.components.addons.relation_extraction.rel_cat:-----------------------------------------------------------
INFO:medcat.components.addons.relation_extraction.rel_cat:===========================================================
INFO:medcat.components.addons.relation_extraction.rel_cat:Epoch finished, took 0:00:25.722024 seconds
INFO:medcat.components.addons.relation_extraction.rel_cat:Total epochs on this model: 10 | currently training epoch 5
100%|██████████| 524/524 [00:28<00:00, 18.52it/s]
INFO:medcat.components.addons.relation_extraction.rel_cat:Losses at Epoch 5: 0.02180
INFO:medcat.components.addons.relation_extraction.rel_cat:Train accuracy at Epoch 5: 0.63909
INFO:medcat.components.addons.relation_extraction.rel_cat:======================== TRAIN SET TEST RESULTS ========================
INFO:medcat.components.addons.relation_extraction.rel_cat:Evaluating test samples...
INFO:medcat.components.addons.relation_extraction.rel_cat:==================== Evaluation Results ====================
INFO:medcat.components.addons.relation_extraction.rel_cat: no. of batches:17
INFO:medcat.components.addons.relation_extraction.rel_cat: accuracy = 0.794
INFO:medcat.components.addons.relation_extraction.rel_cat: f1 = 0.794
INFO:medcat.components.addons.relation_extraction.rel_cat: loss = 0.595
INFO:medcat.components.addons.relation_extraction.rel_cat: precision = 0.794
INFO:medcat.components.addons.relation_extraction.rel_cat: recall = 0.794
INFO:medcat.components.addons.relation_extraction.rel_cat:----------------------- class stats -----------------------
INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-AE | f1: 0.808 | prec : 0.769 | acc: 0.794 | recall: 0.859 
INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-DOSE | f1: 0.764 | prec : 0.828 | acc: 0.794 | recall: 0.718 
INFO:medcat.components.addons.relation_extraction.rel_cat:-----------------------------------------------------------
INFO:medcat.components.addons.relation_extraction.rel_cat:===========================================================
INFO:medcat.components.addons.relation_extraction.rel_cat:======================== TEST SET TEST RESULTS ========================
INFO:medcat.components.addons.relation_extraction.rel_cat:Evaluating test samples...
INFO:medcat.components.addons.relation_extraction.rel_cat:==================== Evaluation Results ====================
INFO:medcat.components.addons.relation_extraction.rel_cat: no. of batches:4
INFO:medcat.components.addons.relation_extraction.rel_cat: accuracy = 0.708
INFO:medcat.components.addons.relation_extraction.rel_cat: f1 = 0.708
INFO:medcat.components.addons.relation_extraction.rel_cat: loss = 0.642
INFO:medcat.components.addons.relation_extraction.rel_cat: precision = 0.708
INFO:medcat.components.addons.relation_extraction.rel_cat: recall = 0.708
INFO:medcat.components.addons.relation_extraction.rel_cat:----------------------- class stats -----------------------
INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-AE | f1: 0.687 | prec : 0.615 | acc: 0.708 | recall: 0.782 
INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-DOSE | f1: 0.724 | prec : 0.815 | acc: 0.708 | recall: 0.655 
INFO:medcat.components.addons.relation_extraction.rel_cat:-----------------------------------------------------------
INFO:medcat.components.addons.relation_extraction.rel_cat:===========================================================
INFO:medcat.components.addons.relation_extraction.rel_cat:Epoch finished, took 0:00:28.295869 seconds
INFO:medcat.components.addons.relation_extraction.rel_cat:Total epochs on this model: 10 | currently training epoch 6
100%|██████████| 524/524 [00:26<00:00, 19.75it/s]
INFO:medcat.components.addons.relation_extraction.rel_cat:Losses at Epoch 6: 0.02017
INFO:medcat.components.addons.relation_extraction.rel_cat:Train accuracy at Epoch 6: 0.72610
INFO:medcat.components.addons.relation_extraction.rel_cat:======================== TRAIN SET TEST RESULTS ========================
INFO:medcat.components.addons.relation_extraction.rel_cat:Evaluating test samples...
INFO:medcat.components.addons.relation_extraction.rel_cat:==================== Evaluation Results ====================
INFO:medcat.components.addons.relation_extraction.rel_cat: no. of batches:17
INFO:medcat.components.addons.relation_extraction.rel_cat: accuracy = 0.844
INFO:medcat.components.addons.relation_extraction.rel_cat: f1 = 0.844
INFO:medcat.components.addons.relation_extraction.rel_cat: loss = 0.513
INFO:medcat.components.addons.relation_extraction.rel_cat: precision = 0.844
INFO:medcat.components.addons.relation_extraction.rel_cat: recall = 0.844
INFO:medcat.components.addons.relation_extraction.rel_cat:----------------------- class stats -----------------------
INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-AE | f1: 0.869 | prec : 0.925 | acc: 0.844 | recall: 0.824 
INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-DOSE | f1: 0.794 | prec : 0.730 | acc: 0.844 | recall: 0.886 
INFO:medcat.components.addons.relation_extraction.rel_cat:-----------------------------------------------------------
INFO:medcat.components.addons.relation_extraction.rel_cat:===========================================================
INFO:medcat.components.addons.relation_extraction.rel_cat:======================== TEST SET TEST RESULTS ========================
INFO:medcat.components.addons.relation_extraction.rel_cat:Evaluating test samples...
INFO:medcat.components.addons.relation_extraction.rel_cat:==================== Evaluation Results ====================
INFO:medcat.components.addons.relation_extraction.rel_cat: no. of batches:4
INFO:medcat.components.addons.relation_extraction.rel_cat: accuracy = 0.690
INFO:medcat.components.addons.relation_extraction.rel_cat: f1 = 0.690
INFO:medcat.components.addons.relation_extraction.rel_cat: loss = 0.608
INFO:medcat.components.addons.relation_extraction.rel_cat: precision = 0.690
INFO:medcat.components.addons.relation_extraction.rel_cat: recall = 0.690
INFO:medcat.components.addons.relation_extraction.rel_cat:----------------------- class stats -----------------------
INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-AE | f1: 0.738 | prec : 0.851 | acc: 0.690 | recall: 0.658 
INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-DOSE | f1: 0.609 | prec : 0.515 | acc: 0.690 | recall: 0.759 
INFO:medcat.components.addons.relation_extraction.rel_cat:-----------------------------------------------------------
INFO:medcat.components.addons.relation_extraction.rel_cat:===========================================================
INFO:medcat.components.addons.relation_extraction.rel_cat:Epoch finished, took 0:00:26.537600 seconds
INFO:medcat.components.addons.relation_extraction.rel_cat:Total epochs on this model: 10 | currently training epoch 7
100%|██████████| 524/524 [00:26<00:00, 19.61it/s]
INFO:medcat.components.addons.relation_extraction.rel_cat:Losses at Epoch 7: 0.01812
INFO:medcat.components.addons.relation_extraction.rel_cat:Train accuracy at Epoch 7: 0.78370
INFO:medcat.components.addons.relation_extraction.rel_cat:======================== TRAIN SET TEST RESULTS ========================
INFO:medcat.components.addons.relation_extraction.rel_cat:Evaluating test samples...
INFO:medcat.components.addons.relation_extraction.rel_cat:==================== Evaluation Results ====================
INFO:medcat.components.addons.relation_extraction.rel_cat: no. of batches:17
INFO:medcat.components.addons.relation_extraction.rel_cat: accuracy = 0.822
INFO:medcat.components.addons.relation_extraction.rel_cat: f1 = 0.822
INFO:medcat.components.addons.relation_extraction.rel_cat: loss = 0.483
INFO:medcat.components.addons.relation_extraction.rel_cat: precision = 0.822
INFO:medcat.components.addons.relation_extraction.rel_cat: recall = 0.822
INFO:medcat.components.addons.relation_extraction.rel_cat:----------------------- class stats -----------------------
INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-AE | f1: 0.822 | prec : 0.764 | acc: 0.822 | recall: 0.903 
INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-DOSE | f1: 0.813 | prec : 0.904 | acc: 0.822 | recall: 0.750 
INFO:medcat.components.addons.relation_extraction.rel_cat:-----------------------------------------------------------
INFO:medcat.components.addons.relation_extraction.rel_cat:===========================================================
INFO:medcat.components.addons.relation_extraction.rel_cat:======================== TEST SET TEST RESULTS ========================
INFO:medcat.components.addons.relation_extraction.rel_cat:Evaluating test samples...
INFO:medcat.components.addons.relation_extraction.rel_cat:==================== Evaluation Results ====================
INFO:medcat.components.addons.relation_extraction.rel_cat: no. of batches:4
INFO:medcat.components.addons.relation_extraction.rel_cat: accuracy = 0.736
INFO:medcat.components.addons.relation_extraction.rel_cat: f1 = 0.736
INFO:medcat.components.addons.relation_extraction.rel_cat: loss = 0.577
INFO:medcat.components.addons.relation_extraction.rel_cat: precision = 0.736
INFO:medcat.components.addons.relation_extraction.rel_cat: recall = 0.736
INFO:medcat.components.addons.relation_extraction.rel_cat:----------------------- class stats -----------------------
INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-AE | f1: 0.700 | prec : 0.610 | acc: 0.736 | recall: 0.854 
INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-DOSE | f1: 0.755 | prec : 0.864 | acc: 0.736 | recall: 0.679 
INFO:medcat.components.addons.relation_extraction.rel_cat:-----------------------------------------------------------
INFO:medcat.components.addons.relation_extraction.rel_cat:===========================================================
INFO:medcat.components.addons.relation_extraction.rel_cat:Epoch finished, took 0:00:26.722161 seconds
INFO:medcat.components.addons.relation_extraction.rel_cat:Total epochs on this model: 10 | currently training epoch 8
100%|██████████| 524/524 [00:24<00:00, 20.96it/s]
INFO:medcat.components.addons.relation_extraction.rel_cat:Losses at Epoch 8: 0.01818
INFO:medcat.components.addons.relation_extraction.rel_cat:Train accuracy at Epoch 8: 0.78064
INFO:medcat.components.addons.relation_extraction.rel_cat:======================== TRAIN SET TEST RESULTS ========================
INFO:medcat.components.addons.relation_extraction.rel_cat:Evaluating test samples...
INFO:medcat.components.addons.relation_extraction.rel_cat:==================== Evaluation Results ====================
INFO:medcat.components.addons.relation_extraction.rel_cat: no. of batches:17
INFO:medcat.components.addons.relation_extraction.rel_cat: accuracy = 0.849
INFO:medcat.components.addons.relation_extraction.rel_cat: f1 = 0.849
INFO:medcat.components.addons.relation_extraction.rel_cat: loss = 0.442
INFO:medcat.components.addons.relation_extraction.rel_cat: precision = 0.849
INFO:medcat.components.addons.relation_extraction.rel_cat: recall = 0.849
INFO:medcat.components.addons.relation_extraction.rel_cat:----------------------- class stats -----------------------
INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-AE | f1: 0.856 | prec : 0.812 | acc: 0.849 | recall: 0.914 
INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-DOSE | f1: 0.832 | prec : 0.903 | acc: 0.849 | recall: 0.780 
INFO:medcat.components.addons.relation_extraction.rel_cat:-----------------------------------------------------------
INFO:medcat.components.addons.relation_extraction.rel_cat:===========================================================
INFO:medcat.components.addons.relation_extraction.rel_cat:======================== TEST SET TEST RESULTS ========================
INFO:medcat.components.addons.relation_extraction.rel_cat:Evaluating test samples...
INFO:medcat.components.addons.relation_extraction.rel_cat:==================== Evaluation Results ====================
INFO:medcat.components.addons.relation_extraction.rel_cat: no. of batches:4
INFO:medcat.components.addons.relation_extraction.rel_cat: accuracy = 0.760
INFO:medcat.components.addons.relation_extraction.rel_cat: f1 = 0.760
INFO:medcat.components.addons.relation_extraction.rel_cat: loss = 0.561
INFO:medcat.components.addons.relation_extraction.rel_cat: precision = 0.760
INFO:medcat.components.addons.relation_extraction.rel_cat: recall = 0.760
INFO:medcat.components.addons.relation_extraction.rel_cat:----------------------- class stats -----------------------
INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-AE | f1: 0.745 | prec : 0.691 | acc: 0.760 | recall: 0.829 
INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-DOSE | f1: 0.765 | prec : 0.828 | acc: 0.760 | recall: 0.718 
INFO:medcat.components.addons.relation_extraction.rel_cat:-----------------------------------------------------------
INFO:medcat.components.addons.relation_extraction.rel_cat:===========================================================
INFO:medcat.components.addons.relation_extraction.rel_cat:Epoch finished, took 0:00:24.996328 seconds
INFO:medcat.components.addons.relation_extraction.rel_cat:Total epochs on this model: 10 | currently training epoch 9
100%|██████████| 524/524 [00:25<00:00, 20.30it/s]
INFO:medcat.components.addons.relation_extraction.rel_cat:Losses at Epoch 9: 0.01695
INFO:medcat.components.addons.relation_extraction.rel_cat:Train accuracy at Epoch 9: 0.83456
INFO:medcat.components.addons.relation_extraction.rel_cat:======================== TRAIN SET TEST RESULTS ========================
INFO:medcat.components.addons.relation_extraction.rel_cat:Evaluating test samples...
INFO:medcat.components.addons.relation_extraction.rel_cat:==================== Evaluation Results ====================
INFO:medcat.components.addons.relation_extraction.rel_cat: no. of batches:17
INFO:medcat.components.addons.relation_extraction.rel_cat: accuracy = 0.890
INFO:medcat.components.addons.relation_extraction.rel_cat: f1 = 0.890
INFO:medcat.components.addons.relation_extraction.rel_cat: loss = 0.387
INFO:medcat.components.addons.relation_extraction.rel_cat: precision = 0.890
INFO:medcat.components.addons.relation_extraction.rel_cat: recall = 0.890
INFO:medcat.components.addons.relation_extraction.rel_cat:----------------------- class stats -----------------------
INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-AE | f1: 0.894 | prec : 0.873 | acc: 0.890 | recall: 0.923 
INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-DOSE | f1: 0.878 | prec : 0.908 | acc: 0.890 | recall: 0.858 
INFO:medcat.components.addons.relation_extraction.rel_cat:-----------------------------------------------------------
INFO:medcat.components.addons.relation_extraction.rel_cat:===========================================================
INFO:medcat.components.addons.relation_extraction.rel_cat:======================== TEST SET TEST RESULTS ========================
INFO:medcat.components.addons.relation_extraction.rel_cat:Evaluating test samples...
INFO:medcat.components.addons.relation_extraction.rel_cat:==================== Evaluation Results ====================
INFO:medcat.components.addons.relation_extraction.rel_cat: no. of batches:4
INFO:medcat.components.addons.relation_extraction.rel_cat: accuracy = 0.747
INFO:medcat.components.addons.relation_extraction.rel_cat: f1 = 0.747
INFO:medcat.components.addons.relation_extraction.rel_cat: loss = 0.542
INFO:medcat.components.addons.relation_extraction.rel_cat: precision = 0.747
INFO:medcat.components.addons.relation_extraction.rel_cat: recall = 0.747
INFO:medcat.components.addons.relation_extraction.rel_cat:----------------------- class stats -----------------------
INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-AE | f1: 0.732 | prec : 0.732 | acc: 0.747 | recall: 0.733 
INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-DOSE | f1: 0.747 | prec : 0.754 | acc: 0.747 | recall: 0.742 
INFO:medcat.components.addons.relation_extraction.rel_cat:-----------------------------------------------------------
INFO:medcat.components.addons.relation_extraction.rel_cat:===========================================================
INFO:medcat.components.addons.relation_extraction.rel_cat:Epoch finished, took 0:00:25.820102 seconds
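
The per-epoch output above is long, so it can help to pull out just the overall accuracy reported under each `TEST SET TEST RESULTS` header and compare epochs side by side. The following is a minimal sketch, assuming the log has been captured as text. The names `PREFIX`, `log_text` and `held_out_accuracy_by_epoch` are hypothetical helpers for illustration and not part of MedCAT. The sample lines are copied from epochs 8 and 9 of the output above.

```python
import re

PREFIX = "INFO:medcat.components.addons.relation_extraction.rel_cat:"

# Lines copied from the output above (epochs 8 and 9 only, to keep the sketch short).
log_text = "\n".join(PREFIX + s for s in [
    "Losses at Epoch 8: 0.01818",
    "======================== TRAIN SET TEST RESULTS ========================",
    " accuracy = 0.849",
    "======================== TEST SET TEST RESULTS ========================",
    " accuracy = 0.760",
    "Losses at Epoch 9: 0.01695",
    "======================== TRAIN SET TEST RESULTS ========================",
    " accuracy = 0.890",
    "======================== TEST SET TEST RESULTS ========================",
    " accuracy = 0.747",
])


def held_out_accuracy_by_epoch(text: str) -> dict[int, float]:
    """Map epoch number -> overall accuracy logged under the TEST SET header."""
    results: dict[int, float] = {}
    epoch = None  # epoch announced by the most recent "Losses at Epoch N" line
    split = None  # "TRAIN" or "TEST", set by the most recent results header
    for line in text.splitlines():
        if m := re.search(r"Losses at Epoch (\d+):", line):
            epoch, split = int(m.group(1)), None
        elif m := re.search(r"(TRAIN|TEST) SET TEST RESULTS", line):
            split = m.group(1)
        elif (m := re.search(r"accuracy = ([\d.]+)", line)) and split == "TEST" and epoch is not None:
            results[epoch] = float(m.group(1))
    return results


scores = held_out_accuracy_by_epoch(log_text)
best_epoch = max(scores, key=scores.get)
print(scores, "best epoch:", best_epoch)
```

On the lines shown, this selects epoch 8 (test accuracy 0.760) over epoch 9 (0.747), even though accuracy on the train set is higher at epoch 9 (0.890 versus 0.849).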

In [18]:

# save the model
relCAT.save(save_path="./ade_relcat_model")

In [ ]: