MedCAT Documentation
1. Supervised Training Relation Extraction

    The current implementation supports Hugging Face (HF) Llama models and BERT models. This section covers only BERT; Llama usage is the same apart from the imports.
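
To make the "same usage, different imports" remark concrete, here is a minimal sketch that lists the module paths for each backend and checks whether they can be located in the current environment. The package path and the `config` / `model` / `tokenizer` submodule names are taken from the API reference tree of this documentation; `backend_modules` and `is_importable` are hypothetical helper names introduced only for illustration, and no class names are assumed.

```python
import importlib.util

# Package path as listed in the API reference tree (assumed to match your install).
REL_EXTR_PKG = "medcat.components.addons.relation_extraction"


def backend_modules(backend: str) -> dict[str, str]:
    """Return the dotted module paths for one relation-extraction backend."""
    return {
        part: f"{REL_EXTR_PKG}.{backend}.{part}"
        for part in ("config", "model", "tokenizer")
    }


def is_importable(module_path: str) -> bool:
    """Return True if the module can be located in the current environment."""
    try:
        return importlib.util.find_spec(module_path) is not None
    except ImportError:
        # A parent package (for example medcat itself, or one of its
        # optional dependencies) is missing.
        return False


# Only the backend segment of the path changes between BERT and Llama.
for backend in ("bert", "llama"):
    for part, path in backend_modules(backend).items():
        print(f"{backend:>5} {part:<9} {path} importable={is_importable(path)}")
```

If the Llama modules are reported as importable, switching backends should amount to replacing the `bert` segment in the import statements used later in this tutorial.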

    In [ ]:
    # Install medcat
    ! pip install "medcat[spacy,rel-cat,meta-cat]~=2.4.0" # NOTE: VERSION-STRING
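
After installing, it can be useful to confirm that the installed version satisfies the `~=2.4.0` pin used above. The sketch below is an approximation of the PEP 440 compatible-release rule using only the standard library; `parse_release` and `is_compatible_release` are hypothetical helpers written for this example, and they ignore pre-release and post-release suffixes. For anything beyond a quick check, the `packaging` library is the more robust choice.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(v: str) -> tuple[int, ...]:
    """Turn '2.4.1' into (2, 4, 1), stopping at the first non-numeric piece."""
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_compatible_release(installed: str, spec: str) -> bool:
    """Approximate `~=spec`: same leading components and installed >= spec."""
    want = parse_release(spec)
    have = parse_release(installed)
    # Pad the installed version with zeros so '2.4' compares like '2.4.0'.
    have = have + (0,) * (len(want) - len(have))
    prefix = want[:-1]
    return have[: len(prefix)] == prefix and have >= want


try:
    installed = version("medcat")
except PackageNotFoundError:
    print("medcat is not installed in this environment")
else:
    ok = is_compatible_release(installed, "2.4.0")
    print(f"medcat {installed}: compatible with ~=2.4.0 -> {ok}")
```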
    
    
    In [ ]:
    import logging
    from medcat.cdb import CDB
    from medcat.config.config_rel_cat import ConfigRelCAT
    from medcat.components.addons.relation_extraction.rel_cat import RelCAT
    from medcat.components.addons.relation_extraction.base_component import RelExtrBaseComponent
    from medcat.components.addons.relation_extraction.bert.model import RelExtrBertModel
    from medcat.components.addons.relation_extraction.bert.config import RelExtrBertConfig
    from medcat.components.addons.relation_extraction.tokenizer import BaseTokenizerWrapper
    from medcat.config import Config
    from medcat.tokenizing.tokenizers import create_tokenizer
    
    /Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/.venv_v2_tut/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
      from .autonotebook import tqdm as notebook_tqdm
    

    Training RelCAT models with custom datasets from scratch.

    1. Create the RelCAT config and set the parameters

    In [3]:
    config = ConfigRelCAT()
    config.general.log_level = logging.INFO
    # base model to use; here, the HuggingFace bert-base-uncased model
    config.general.model_name = "bert-base-uncased"
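    The log_level setting above takes a standard library logging level. As a minimal sketch of how such a level gates messages, the following uses only Python's logging module; the logger name "demo.relcat" and the ListHandler class are hypothetical and are not part of MedCAT.

```python
import logging


class ListHandler(logging.Handler):
    """Collects formatted records in a list so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(
            f"{record.levelname}:{record.name}:{record.getMessage()}"
        )


# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("demo.relcat")
logger.propagate = False  # keep the demo output out of the root logger
handler = ListHandler()
logger.addHandler(handler)

logger.setLevel(logging.INFO)
logger.debug("hidden at INFO level")     # below INFO, dropped
logger.info("shown at INFO level")       # at INFO, kept

logger.setLevel(logging.WARNING)
logger.info("hidden at WARNING level")   # below WARNING, dropped

print(handler.messages)
# ['INFO:demo.relcat:shown at INFO level']
```

    The LEVEL:name:message layout mirrors the format of the INFO lines that appear in the cell outputs of this tutorial.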

    1.1 Depending on the base model you use, you may need to adjust config.model.hidden_size, config.model.model_size and config.model.hidden_layers

    In [4]:
    config.model.hidden_size = 256
    config.model.model_size = 2304 # 4096 for Llama

    1.2 Other notable configurations

    In [5]:
    config.general.cntx_left = 15 # how many tokens to the left of the start entity we select
    config.general.cntx_right = 15 # how many tokens to the right of the end entity we select
    config.general.window_size = 300 # maximum distance (in characters) between two entities for them to be considered a relation
    config.train.nclasses = 2 # number of classes in your MedCAT export / dataset
    config.train.nepochs = 10 # number of epochs to train for
    config.model.freeze_layers = False # whether to freeze the layers of the base model
    config.general.limit_samples_per_class = 300 # cap on the number of training samples per class, to avoid overfitting on unbalanced datasets
    config.train.batch_size = 32 # batch size
    config.train.lr = 3e-5 # learning rate
    config.train.adam_epsilon = 1e-8
    config.train.adam_weight_decay = 0.0005
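    To make the cntx_left, cntx_right and window_size settings more concrete, here is a minimal sketch in plain Python. It is not MedCAT's implementation: the helper names within_window and context_slice are hypothetical, and the assumptions are that distance is measured between entity start offsets and that the context is a simple token slice around the entity pair.

```python
def within_window(ent1_start: int, ent2_start: int, window_size: int = 300) -> bool:
    """True if the two entity start offsets are at most window_size characters apart."""
    return abs(ent2_start - ent1_start) <= window_size


def context_slice(tokens, ent1_idx, ent2_idx, cntx_left=15, cntx_right=15):
    """Keep cntx_left tokens before the first entity and cntx_right after the second."""
    first, last = sorted((ent1_idx, ent2_idx))
    start = max(0, first - cntx_left)
    end = min(len(tokens), last + cntx_right + 1)
    return tokens[start:end]


tokens = "the patient developed a severe rash after taking amoxicillin for five days".split()

# "rash" is token 5 and "amoxicillin" is token 8 in this toy sentence.
context = context_slice(tokens, 5, 8, cntx_left=2, cntx_right=1)
print(context)
# ['a', 'severe', 'rash', 'after', 'taking', 'amoxicillin', 'for']

print(within_window(24, 49))   # True: 25 characters apart
print(within_window(0, 301))   # False: further apart than window_size
```

    Smaller context values shorten the input sequence, while a larger window_size admits entity pairs that sit further apart in the text.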

    2. Create a CDB. It can be a CDB from another model of your choice or an empty one. The CDB is used only when filtering by concept unique identifiers (CUIs) or concept type IDs (TUIs).

    In [7]:
    gen_cnf = Config()
    gen_cnf.general.nlp.provider = 'spacy'
    cdb = CDB(gen_cnf)
    base_tokenizer = create_tokenizer(gen_cnf.general.nlp.provider, gen_cnf)
    
    Collecting en-core-web-md==3.8.0
      Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_md-3.8.0/en_core_web_md-3.8.0-py3-none-any.whl (33.5 MB)
         ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 33.5/33.5 MB 86.4 MB/s eta 0:00:0000:0100:01
    Installing collected packages: en-core-web-md
    Successfully installed en-core-web-md-3.8.0
    ✔ Download and installation successful
    You can now load the package via spacy.load('en_core_web_md')
    

    3. Create a tokenizer

    In [8]:
    tokenizer = BaseTokenizerWrapper.load(tokenizer_path=config.general.model_name,
                                          relcat_config=config)

    4. Add the token tags to the tokenizer. This step is optional because the [s1], [e1], [s2], [e2] tags are already defined in the default ConfigRelCAT. If you are using a Llama-based model, you will need to add the [PAD] token to the tokenizer, as shown below.

    In [9]:
    special_ent_tokens = ["[s1]", "[e1]", "[s2]", "[e2]"]
    tokenizer.hf_tokenizers.add_tokens(special_ent_tokens, special_tokens=True)
    # needed for Llama tokenizers; returns the number of tokens added (0 here, since BERT already defines [PAD])
    tokenizer.hf_tokenizers.add_special_tokens({'pad_token': '[PAD]'})
    Out[9]:
    0
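    As a rough illustration of what the [s1], [e1], [s2], [e2] tags are for, the sketch below wraps two entity spans in a sentence with these markers. The mark_entities helper is hypothetical and the exact spacing and placement RelCAT uses internally is an assumption; the point is only that the tags delimit the start and end of each entity in the pair.

```python
def mark_entities(text: str, ent1: tuple, ent2: tuple) -> str:
    """Wrap two (start, end) character spans in [s1]/[e1] and [s2]/[e2] tags.

    Assumes ent1 comes before ent2 and that the spans do not overlap.
    """
    (a1, b1), (a2, b2) = ent1, ent2
    if not (0 <= a1 < b1 <= a2 < b2 <= len(text)):
        raise ValueError("spans must be ordered, non-empty and non-overlapping")
    return (
        text[:a1]
        + "[s1] " + text[a1:b1] + " [e1]"
        + text[b1:a2]
        + "[s2] " + text[a2:b2] + " [e2]"
        + text[b2:]
    )


text = "Amoxicillin caused a rash."
drug_start = text.index("Amoxicillin")
ae_start = text.index("rash")

marked = mark_entities(
    text,
    (drug_start, drug_start + len("Amoxicillin")),
    (ae_start, ae_start + len("rash")),
)
print(marked)
# [s1] Amoxicillin [e1] caused a [s2] rash [e2].
```

    Because the tags are registered as special tokens, the tokenizer keeps each one as a single token instead of splitting it into word pieces.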

    5. Add the tokens to the ConfigRelCAT

    In [10]:
    config.general.tokenizer_relation_annotation_special_tokens_tags = special_ent_tokens
    config.general.annotation_schema_tag_ids = tokenizer.hf_tokenizers.convert_tokens_to_ids(special_ent_tokens)
    
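    The IDs stored in annotation_schema_tag_ids are simply the positions the new tags received in the tokenizer vocabulary. The toy sketch below mimics that bookkeeping with a plain dictionary; it is not the HuggingFace implementation, and the add_tokens function and the tok1, tok2, ... placeholder entries are made up for illustration. The starting size of 30522 matches the bert-base-uncased checkpoint shape reported in the model loading log later in this tutorial.

```python
def add_tokens(vocab: dict, new_tokens: list) -> int:
    """Append unseen tokens to the end of the vocab; return how many were added."""
    added = 0
    for token in new_tokens:
        if token not in vocab:
            vocab[token] = len(vocab)  # next free id
            added += 1
    return added


# Stand-in vocabulary with 30522 entries and [PAD] at id 0.
vocab = {"[PAD]": 0}
vocab.update({f"tok{i}": i for i in range(1, 30522)})

special_ent_tokens = ["[s1]", "[e1]", "[s2]", "[e2]"]
n_added = add_tokens(vocab, special_ent_tokens)
tag_ids = [vocab[t] for t in special_ent_tokens]

# [PAD] is already present, so nothing is added the second time.
n_pad_added = add_tokens(vocab, ["[PAD]"])

print(n_added, tag_ids, len(vocab), n_pad_added)
# 4 [30522, 30523, 30524, 30525] 30526 0
```

    The final size of 30526 is the value that later appears as vocab_size in the logged model config.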

    6. Create the RelCAT object and initialize its components

    In [12]:
    # To skip the steps in section 6.1, pass the init_model=True argument to initialize the components with the default ConfigRelCAT settings.
    relCAT = RelCAT(base_tokenizer, cdb, config=config)
    INFO:medcat.components.addons.relation_extraction.base_component:RelExtrBaseComponent initialized
    

    6.1 Use the RelExtrBaseComponent object, which holds the tokenizer, model and model config. Each of these has to be initialized beforehand.

    Resize the token embeddings after adding tokens to the tokenizer, since the vocabulary has grown. This is not required after saving and reloading a model, as the resized value is retained.

    In [14]:
    model_config = RelExtrBertConfig.load(pretrained_model_name_or_path=config.general.model_name,
                                          relcat_config=config)

    # update the model config with the proper vocab size, since we added special tokens to the tokenizer
    model_config.hf_model_config.vocab_size = tokenizer.get_size()

    # set the padding idx in the model config and relcat config; this is necessary as it depends on which tokenizer you use
    config.model.padding_idx = model_config.pad_token_id = tokenizer.get_pad_id()

    model = RelExtrBertModel.load(pretrained_model_name_or_path=config.general.model_name,
                                  model_config=model_config,
                                  relcat_config=config)

    # update the model to reflect the new token embeddings, since we added special tokens to the tokenizer
    model.hf_model.resize_token_embeddings(len(tokenizer.hf_tokenizers)) # type: ignore

    # assemble the component from the tokenizer, model and configs created above
    component = RelExtrBaseComponent(tokenizer=tokenizer, config=config)
    component.model = model
    component.model_config = model_config
    component.relcat_config = config
    component.tokenizer = tokenizer

    relCAT.component = component
    INFO:medcat.components.addons.relation_extraction.bert.config:Loaded config from pretrained: bert-base-uncased
    INFO:medcat.components.addons.relation_extraction.models:RelCAT model config: BertConfig {
      "architectures": [
        "BertForMaskedLM"
      ],
      "attention_probs_dropout_prob": 0.1,
      "classifier_dropout": null,
      "gradient_checkpointing": false,
      "hidden_act": "gelu",
      "hidden_dropout_prob": 0.1,
      "hidden_size": 768,
      "initializer_range": 0.02,
      "intermediate_size": 3072,
      "layer_norm_eps": 1e-12,
      "max_position_embeddings": 512,
      "model_type": "bert",
      "num_attention_heads": 12,
      "num_hidden_layers": 12,
      "pad_token_id": 0,
      "position_embedding_type": "absolute",
      "transformers_version": "4.53.0",
      "type_vocab_size": 2,
      "use_cache": true,
      "vocab_size": 30526
    }
    
    INFO:medcat.components.addons.relation_extraction.models:RelCAT model config: BertConfig {
      "architectures": [
        "BertForMaskedLM"
      ],
      "attention_probs_dropout_prob": 0.1,
      "classifier_dropout": null,
      "gradient_checkpointing": false,
      "hidden_act": "gelu",
      "hidden_dropout_prob": 0.1,
      "hidden_size": 768,
      "initializer_range": 0.02,
      "intermediate_size": 3072,
      "layer_norm_eps": 1e-12,
      "max_position_embeddings": 512,
      "model_type": "bert",
      "num_attention_heads": 12,
      "num_hidden_layers": 12,
      "pad_token_id": 0,
      "position_embedding_type": "absolute",
      "transformers_version": "4.53.0",
      "type_vocab_size": 2,
      "use_cache": true,
      "vocab_size": 30526
    }
    
    Some weights of BertModel were not initialized from the model checkpoint at bert-base-uncased and are newly initialized because the shapes did not match:
    - embeddings.word_embeddings.weight: found shape torch.Size([30522, 768]) in the checkpoint and torch.Size([30526, 768]) in the model instantiated
    You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
    INFO:medcat.components.addons.relation_extraction.bert.model:Loaded model from pretrained: bert-base-uncased
    INFO:medcat.components.addons.relation_extraction.models:Loaded RelExtrBertModel from pretrained_model_name_or_path: bert-base-uncased
    INFO:medcat.components.addons.relation_extraction.base_component:RelExtrBaseComponent initialized
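    The log above reports the word embedding matrix going from 30522 to 30526 rows, one row per vocabulary entry. As a minimal sketch of what resizing an embedding table means in general, the following plain-Python example keeps the existing rows and appends freshly initialised ones. It is an illustration only: resize_embeddings is a hypothetical helper, the Gaussian initialisation with standard deviation 0.02 is an assumption, and the HuggingFace resize_token_embeddings method may differ in its details.

```python
import random


def resize_embeddings(matrix, new_size, std=0.02, seed=0):
    """Return a copy of matrix with exactly new_size rows.

    Existing rows are copied; any extra rows are drawn from N(0, std).
    """
    rng = random.Random(seed)
    dim = len(matrix[0])
    resized = [row[:] for row in matrix[:new_size]]
    while len(resized) < new_size:
        resized.append([rng.gauss(0.0, std) for _ in range(dim)])
    return resized


# Toy table: 5 "tokens", 3 dimensions each.
old = [[float(i)] * 3 for i in range(5)]
new = resize_embeddings(old, 9)  # room for 4 extra tokens

print(len(old), len(new))   # 5 9
print(new[:5] == old)       # True: original rows are unchanged
print(len(new[-1]))         # 3: new rows have the same width
```

    The new rows carry no learned meaning until the model is trained, which is why the added tag tokens only become useful after the training step.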
    

    7. Train the model on the ADE dataset.

    In [15]:
    ! rm -rf "./ade_relcat_model"
    ! mkdir -p "./ade_relcat_model"
    
    In [17]:
    relCAT.train(train_csv_path="./data/rel_cat_ADE_V2.tsv", checkpoint_path="./ade_relcat_model")

    # for MedCAT Trainer exports, use the export_data_path argument instead:
    # relCAT.train(export_data_path="./data/MedCAT_Export_relation_extraction.json")
    INFO:medcat.components.addons.relation_extraction.rel_dataset:CSV dataset | No. of relations detected: 7093 | from : ./data/rel_cat_ADE_V2.tsv | nclasses: 2 | idx2label: {0: 'DRUG-AE', 1: 'DRUG-DOSE'}
    INFO:medcat.components.addons.relation_extraction.rel_dataset:Samples per class: 
    INFO:medcat.components.addons.relation_extraction.rel_dataset: label: DRUG-AE | samples: 6814
    INFO:medcat.components.addons.relation_extraction.rel_dataset: label: DRUG-DOSE | samples: 279
    INFO:root:Relations after train, test split :  train - 524 | test - 115
    INFO:root: label: DRUG-AE samples | train 300 | test 60
    INFO:root: label: DRUG-DOSE samples | train 224 | test 55
    INFO:root:Attempting to load RelCAT model on device: cpu
    INFO:medcat.components.addons.relation_extraction.rel_cat:Starting training process...
    INFO:medcat.components.addons.relation_extraction.rel_cat:Total epochs on this model: 10 | currently training epoch 0
      0%|          | 0/524 [00:00<?, ?it/s]/Users/martratas/Documents/CogStack/.MedCAT.nosync/monorepo-nlp/.venv_v2_tut/lib/python3.11/site-packages/torch/utils/data/dataloader.py:683: UserWarning: 'pin_memory' argument is set as true but not supported on MPS now, then device pinned memory won't be used.
      warnings.warn(warn_msg)
    100%|██████████| 524/524 [00:29<00:00, 17.52it/s]
    INFO:medcat.components.addons.relation_extraction.rel_cat:Losses at Epoch 0: 0.02372
    INFO:medcat.components.addons.relation_extraction.rel_cat:Train accuracy at Epoch 0: 0.53125
    INFO:medcat.components.addons.relation_extraction.rel_cat:======================== TRAIN SET TEST RESULTS ========================
    INFO:medcat.components.addons.relation_extraction.rel_cat:Evaluating test samples...
    INFO:medcat.components.addons.relation_extraction.rel_cat:==================== Evaluation Results ====================
    INFO:medcat.components.addons.relation_extraction.rel_cat: no. of batches:17
    INFO:medcat.components.addons.relation_extraction.rel_cat: accuracy = 0.573
    INFO:medcat.components.addons.relation_extraction.rel_cat: f1 = 0.573
    INFO:medcat.components.addons.relation_extraction.rel_cat: loss = 0.681
    INFO:medcat.components.addons.relation_extraction.rel_cat: precision = 0.573
    INFO:medcat.components.addons.relation_extraction.rel_cat: recall = 0.573
    INFO:medcat.components.addons.relation_extraction.rel_cat:----------------------- class stats -----------------------
    INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-AE | f1: 0.724 | prec : 1.000 | acc: 0.573 | recall: 0.573 
    INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-DOSE | f1: 0.000 | prec : 0.000 | acc: 0.573 | recall: 0.000 
    INFO:medcat.components.addons.relation_extraction.rel_cat:-----------------------------------------------------------
    INFO:medcat.components.addons.relation_extraction.rel_cat:===========================================================
    INFO:medcat.components.addons.relation_extraction.rel_cat:======================== TEST SET TEST RESULTS ========================
    INFO:medcat.components.addons.relation_extraction.rel_cat:Evaluating test samples...
    INFO:medcat.components.addons.relation_extraction.rel_cat:==================== Evaluation Results ====================
    INFO:medcat.components.addons.relation_extraction.rel_cat: no. of batches:4
    INFO:medcat.components.addons.relation_extraction.rel_cat: accuracy = 0.522
    INFO:medcat.components.addons.relation_extraction.rel_cat: f1 = 0.522
    INFO:medcat.components.addons.relation_extraction.rel_cat: loss = 0.692
    INFO:medcat.components.addons.relation_extraction.rel_cat: precision = 0.522
    INFO:medcat.components.addons.relation_extraction.rel_cat: recall = 0.522
    INFO:medcat.components.addons.relation_extraction.rel_cat:----------------------- class stats -----------------------
    INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-AE | f1: 0.682 | prec : 1.000 | acc: 0.522 | recall: 0.522 
    INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-DOSE | f1: 0.000 | prec : 0.000 | acc: 0.522 | recall: 0.000 
    INFO:medcat.components.addons.relation_extraction.rel_cat:-----------------------------------------------------------
    INFO:medcat.components.addons.relation_extraction.rel_cat:===========================================================
    INFO:medcat.components.addons.relation_extraction.rel_cat:Epoch finished, took 0:00:29.914109 seconds
    INFO:medcat.components.addons.relation_extraction.rel_cat:Total epochs on this model: 10 | currently training epoch 1
    100%|██████████| 524/524 [00:27<00:00, 19.24it/s]
    INFO:medcat.components.addons.relation_extraction.rel_cat:Losses at Epoch 1: 0.02317
    INFO:medcat.components.addons.relation_extraction.rel_cat:Train accuracy at Epoch 1: 0.57782
    INFO:medcat.components.addons.relation_extraction.rel_cat:======================== TRAIN SET TEST RESULTS ========================
    INFO:medcat.components.addons.relation_extraction.rel_cat:Evaluating test samples...
    INFO:medcat.components.addons.relation_extraction.rel_cat:==================== Evaluation Results ====================
    INFO:medcat.components.addons.relation_extraction.rel_cat: no. of batches:17
    INFO:medcat.components.addons.relation_extraction.rel_cat: accuracy = 0.576
    INFO:medcat.components.addons.relation_extraction.rel_cat: f1 = 0.576
    INFO:medcat.components.addons.relation_extraction.rel_cat: loss = 0.677
    INFO:medcat.components.addons.relation_extraction.rel_cat: precision = 0.576
    INFO:medcat.components.addons.relation_extraction.rel_cat: recall = 0.576
    INFO:medcat.components.addons.relation_extraction.rel_cat:----------------------- class stats -----------------------
    INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-AE | f1: 0.725 | prec : 1.000 | acc: 0.576 | recall: 0.576 
    INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-DOSE | f1: 0.000 | prec : 0.000 | acc: 0.576 | recall: 0.000 
    INFO:medcat.components.addons.relation_extraction.rel_cat:-----------------------------------------------------------
    INFO:medcat.components.addons.relation_extraction.rel_cat:===========================================================
    INFO:medcat.components.addons.relation_extraction.rel_cat:======================== TEST SET TEST RESULTS ========================
    INFO:medcat.components.addons.relation_extraction.rel_cat:Evaluating test samples...
    INFO:medcat.components.addons.relation_extraction.rel_cat:==================== Evaluation Results ====================
    INFO:medcat.components.addons.relation_extraction.rel_cat: no. of batches:4
    INFO:medcat.components.addons.relation_extraction.rel_cat: accuracy = 0.501
    INFO:medcat.components.addons.relation_extraction.rel_cat: f1 = 0.501
    INFO:medcat.components.addons.relation_extraction.rel_cat: loss = 0.702
    INFO:medcat.components.addons.relation_extraction.rel_cat: precision = 0.501
    INFO:medcat.components.addons.relation_extraction.rel_cat: recall = 0.501
    INFO:medcat.components.addons.relation_extraction.rel_cat:----------------------- class stats -----------------------
    INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-AE | f1: 0.658 | prec : 1.000 | acc: 0.501 | recall: 0.501 
    INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-DOSE | f1: 0.000 | prec : 0.000 | acc: 0.501 | recall: 0.000 
    INFO:medcat.components.addons.relation_extraction.rel_cat:-----------------------------------------------------------
    INFO:medcat.components.addons.relation_extraction.rel_cat:===========================================================
    INFO:medcat.components.addons.relation_extraction.rel_cat:Epoch finished, took 0:00:27.238975 seconds
    INFO:medcat.components.addons.relation_extraction.rel_cat:Total epochs on this model: 10 | currently training epoch 2
    100%|██████████| 524/524 [00:25<00:00, 20.54it/s]
    INFO:medcat.components.addons.relation_extraction.rel_cat:Losses at Epoch 2: 0.02323
    INFO:medcat.components.addons.relation_extraction.rel_cat:Train accuracy at Epoch 2: 0.57598
    INFO:medcat.components.addons.relation_extraction.rel_cat:======================== TRAIN SET TEST RESULTS ========================
    INFO:medcat.components.addons.relation_extraction.rel_cat:Evaluating test samples...
    INFO:medcat.components.addons.relation_extraction.rel_cat:==================== Evaluation Results ====================
    INFO:medcat.components.addons.relation_extraction.rel_cat: no. of batches:17
    INFO:medcat.components.addons.relation_extraction.rel_cat: accuracy = 0.570
    INFO:medcat.components.addons.relation_extraction.rel_cat: f1 = 0.570
    INFO:medcat.components.addons.relation_extraction.rel_cat: loss = 0.675
    INFO:medcat.components.addons.relation_extraction.rel_cat: precision = 0.570
    INFO:medcat.components.addons.relation_extraction.rel_cat: recall = 0.570
    INFO:medcat.components.addons.relation_extraction.rel_cat:----------------------- class stats -----------------------
    INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-AE | f1: 0.722 | prec : 1.000 | acc: 0.570 | recall: 0.570 
    INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-DOSE | f1: 0.000 | prec : 0.000 | acc: 0.570 | recall: 0.000 
    INFO:medcat.components.addons.relation_extraction.rel_cat:-----------------------------------------------------------
    INFO:medcat.components.addons.relation_extraction.rel_cat:===========================================================
    INFO:medcat.components.addons.relation_extraction.rel_cat:======================== TEST SET TEST RESULTS ========================
    INFO:medcat.components.addons.relation_extraction.rel_cat:Evaluating test samples...
    INFO:medcat.components.addons.relation_extraction.rel_cat:==================== Evaluation Results ====================
    INFO:medcat.components.addons.relation_extraction.rel_cat: no. of batches:4
    INFO:medcat.components.addons.relation_extraction.rel_cat: accuracy = 0.517
    INFO:medcat.components.addons.relation_extraction.rel_cat: f1 = 0.517
    INFO:medcat.components.addons.relation_extraction.rel_cat: loss = 0.691
    INFO:medcat.components.addons.relation_extraction.rel_cat: precision = 0.517
    INFO:medcat.components.addons.relation_extraction.rel_cat: recall = 0.517
    INFO:medcat.components.addons.relation_extraction.rel_cat:----------------------- class stats -----------------------
    INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-AE | f1: 0.679 | prec : 1.000 | acc: 0.517 | recall: 0.517 
    INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-DOSE | f1: 0.000 | prec : 0.000 | acc: 0.517 | recall: 0.000 
    INFO:medcat.components.addons.relation_extraction.rel_cat:-----------------------------------------------------------
    INFO:medcat.components.addons.relation_extraction.rel_cat:===========================================================
    INFO:medcat.components.addons.relation_extraction.rel_cat:Epoch finished, took 0:00:25.511498 seconds
    INFO:medcat.components.addons.relation_extraction.rel_cat:Total epochs on this model: 10 | currently training epoch 3
    100%|██████████| 524/524 [00:26<00:00, 19.72it/s]
    INFO:medcat.components.addons.relation_extraction.rel_cat:Losses at Epoch 3: 0.02311
    INFO:medcat.components.addons.relation_extraction.rel_cat:Train accuracy at Epoch 3: 0.57598
    INFO:medcat.components.addons.relation_extraction.rel_cat:======================== TRAIN SET TEST RESULTS ========================
    INFO:medcat.components.addons.relation_extraction.rel_cat:Evaluating test samples...
    INFO:medcat.components.addons.relation_extraction.rel_cat:==================== Evaluation Results ====================
    INFO:medcat.components.addons.relation_extraction.rel_cat: no. of batches:17
    INFO:medcat.components.addons.relation_extraction.rel_cat: accuracy = 0.573
    INFO:medcat.components.addons.relation_extraction.rel_cat: f1 = 0.573
    INFO:medcat.components.addons.relation_extraction.rel_cat: loss = 0.665
    INFO:medcat.components.addons.relation_extraction.rel_cat: precision = 0.573
    INFO:medcat.components.addons.relation_extraction.rel_cat: recall = 0.573
    INFO:medcat.components.addons.relation_extraction.rel_cat:----------------------- class stats -----------------------
    INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-AE | f1: 0.722 | prec : 1.000 | acc: 0.573 | recall: 0.573 
    INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-DOSE | f1: 0.000 | prec : 0.000 | acc: 0.573 | recall: 0.000 
    INFO:medcat.components.addons.relation_extraction.rel_cat:-----------------------------------------------------------
    INFO:medcat.components.addons.relation_extraction.rel_cat:===========================================================
    INFO:medcat.components.addons.relation_extraction.rel_cat:======================== TEST SET TEST RESULTS ========================
    INFO:medcat.components.addons.relation_extraction.rel_cat:Evaluating test samples...
    INFO:medcat.components.addons.relation_extraction.rel_cat:==================== Evaluation Results ====================
    INFO:medcat.components.addons.relation_extraction.rel_cat: no. of batches:4
    INFO:medcat.components.addons.relation_extraction.rel_cat: accuracy = 0.522
    INFO:medcat.components.addons.relation_extraction.rel_cat: f1 = 0.522
    INFO:medcat.components.addons.relation_extraction.rel_cat: loss = 0.684
    INFO:medcat.components.addons.relation_extraction.rel_cat: precision = 0.522
    INFO:medcat.components.addons.relation_extraction.rel_cat: recall = 0.522
    INFO:medcat.components.addons.relation_extraction.rel_cat:----------------------- class stats -----------------------
    INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-AE | f1: 0.686 | prec : 1.000 | acc: 0.522 | recall: 0.522 
    INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-DOSE | f1: 0.000 | prec : 0.000 | acc: 0.522 | recall: 0.000 
    INFO:medcat.components.addons.relation_extraction.rel_cat:-----------------------------------------------------------
    INFO:medcat.components.addons.relation_extraction.rel_cat:===========================================================
    INFO:medcat.components.addons.relation_extraction.rel_cat:Epoch finished, took 0:00:26.571325 seconds
    INFO:medcat.components.addons.relation_extraction.rel_cat:Total epochs on this model: 10 | currently training epoch 4
    100%|██████████| 524/524 [00:25<00:00, 20.37it/s]
    INFO:medcat.components.addons.relation_extraction.rel_cat:Losses at Epoch 4: 0.02265
    INFO:medcat.components.addons.relation_extraction.rel_cat:Train accuracy at Epoch 4: 0.57721
    INFO:medcat.components.addons.relation_extraction.rel_cat:======================== TRAIN SET TEST RESULTS ========================
    INFO:medcat.components.addons.relation_extraction.rel_cat:Evaluating test samples...
    INFO:medcat.components.addons.relation_extraction.rel_cat:==================== Evaluation Results ====================
    INFO:medcat.components.addons.relation_extraction.rel_cat: no. of batches:17
    INFO:medcat.components.addons.relation_extraction.rel_cat: accuracy = 0.657
    INFO:medcat.components.addons.relation_extraction.rel_cat: f1 = 0.657
    INFO:medcat.components.addons.relation_extraction.rel_cat: loss = 0.638
    INFO:medcat.components.addons.relation_extraction.rel_cat: precision = 0.657
    INFO:medcat.components.addons.relation_extraction.rel_cat: recall = 0.657
    INFO:medcat.components.addons.relation_extraction.rel_cat:----------------------- class stats -----------------------
    INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-AE | f1: 0.762 | prec : 0.979 | acc: 0.657 | recall: 0.631 
    INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-DOSE | f1: 0.322 | prec : 0.205 | acc: 0.657 | recall: 0.869 
    INFO:medcat.components.addons.relation_extraction.rel_cat:-----------------------------------------------------------
    INFO:medcat.components.addons.relation_extraction.rel_cat:===========================================================
    INFO:medcat.components.addons.relation_extraction.rel_cat:======================== TEST SET TEST RESULTS ========================
    INFO:medcat.components.addons.relation_extraction.rel_cat:Evaluating test samples...
    INFO:medcat.components.addons.relation_extraction.rel_cat:==================== Evaluation Results ====================
    INFO:medcat.components.addons.relation_extraction.rel_cat: no. of batches:4
    INFO:medcat.components.addons.relation_extraction.rel_cat: accuracy = 0.535
    INFO:medcat.components.addons.relation_extraction.rel_cat: f1 = 0.535
    INFO:medcat.components.addons.relation_extraction.rel_cat: loss = 0.673
    INFO:medcat.components.addons.relation_extraction.rel_cat: precision = 0.535
    INFO:medcat.components.addons.relation_extraction.rel_cat: recall = 0.535
    INFO:medcat.components.addons.relation_extraction.rel_cat:----------------------- class stats -----------------------
    INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-AE | f1: 0.681 | prec : 0.972 | acc: 0.535 | recall: 0.527 
    INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-DOSE | f1: 0.124 | prec : 0.068 | acc: 0.535 | recall: 0.750 
    INFO:medcat.components.addons.relation_extraction.rel_cat:-----------------------------------------------------------
    INFO:medcat.components.addons.relation_extraction.rel_cat:===========================================================
    INFO:medcat.components.addons.relation_extraction.rel_cat:Epoch finished, took 0:00:25.722024 seconds
    INFO:medcat.components.addons.relation_extraction.rel_cat:Total epochs on this model: 10 | currently training epoch 5
    100%|██████████| 524/524 [00:28<00:00, 18.52it/s]
    INFO:medcat.components.addons.relation_extraction.rel_cat:Losses at Epoch 5: 0.02180
    INFO:medcat.components.addons.relation_extraction.rel_cat:Train accuracy at Epoch 5: 0.63909
    INFO:medcat.components.addons.relation_extraction.rel_cat:======================== TRAIN SET TEST RESULTS ========================
    INFO:medcat.components.addons.relation_extraction.rel_cat:Evaluating test samples...
    INFO:medcat.components.addons.relation_extraction.rel_cat:==================== Evaluation Results ====================
    INFO:medcat.components.addons.relation_extraction.rel_cat: no. of batches:17
    INFO:medcat.components.addons.relation_extraction.rel_cat: accuracy = 0.794
    INFO:medcat.components.addons.relation_extraction.rel_cat: f1 = 0.794
    INFO:medcat.components.addons.relation_extraction.rel_cat: loss = 0.595
    INFO:medcat.components.addons.relation_extraction.rel_cat: precision = 0.794
    INFO:medcat.components.addons.relation_extraction.rel_cat: recall = 0.794
    INFO:medcat.components.addons.relation_extraction.rel_cat:----------------------- class stats -----------------------
    INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-AE | f1: 0.808 | prec : 0.769 | acc: 0.794 | recall: 0.859 
    INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-DOSE | f1: 0.764 | prec : 0.828 | acc: 0.794 | recall: 0.718 
    INFO:medcat.components.addons.relation_extraction.rel_cat:-----------------------------------------------------------
    INFO:medcat.components.addons.relation_extraction.rel_cat:===========================================================
    INFO:medcat.components.addons.relation_extraction.rel_cat:======================== TEST SET TEST RESULTS ========================
    INFO:medcat.components.addons.relation_extraction.rel_cat:Evaluating test samples...
    INFO:medcat.components.addons.relation_extraction.rel_cat:==================== Evaluation Results ====================
    INFO:medcat.components.addons.relation_extraction.rel_cat: no. of batches:4
    INFO:medcat.components.addons.relation_extraction.rel_cat: accuracy = 0.708
    INFO:medcat.components.addons.relation_extraction.rel_cat: f1 = 0.708
    INFO:medcat.components.addons.relation_extraction.rel_cat: loss = 0.642
    INFO:medcat.components.addons.relation_extraction.rel_cat: precision = 0.708
    INFO:medcat.components.addons.relation_extraction.rel_cat: recall = 0.708
    INFO:medcat.components.addons.relation_extraction.rel_cat:----------------------- class stats -----------------------
    INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-AE | f1: 0.687 | prec : 0.615 | acc: 0.708 | recall: 0.782 
    INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-DOSE | f1: 0.724 | prec : 0.815 | acc: 0.708 | recall: 0.655 
    INFO:medcat.components.addons.relation_extraction.rel_cat:-----------------------------------------------------------
    INFO:medcat.components.addons.relation_extraction.rel_cat:===========================================================
    INFO:medcat.components.addons.relation_extraction.rel_cat:Epoch finished, took 0:00:28.295869 seconds
    INFO:medcat.components.addons.relation_extraction.rel_cat:Total epochs on this model: 10 | currently training epoch 6
    100%|██████████| 524/524 [00:26<00:00, 19.75it/s]
    INFO:medcat.components.addons.relation_extraction.rel_cat:Losses at Epoch 6: 0.02017
    INFO:medcat.components.addons.relation_extraction.rel_cat:Train accuracy at Epoch 6: 0.72610
    INFO:medcat.components.addons.relation_extraction.rel_cat:======================== TRAIN SET TEST RESULTS ========================
    INFO:medcat.components.addons.relation_extraction.rel_cat:Evaluating test samples...
    INFO:medcat.components.addons.relation_extraction.rel_cat:==================== Evaluation Results ====================
    INFO:medcat.components.addons.relation_extraction.rel_cat: no. of batches:17
    INFO:medcat.components.addons.relation_extraction.rel_cat: accuracy = 0.844
    INFO:medcat.components.addons.relation_extraction.rel_cat: f1 = 0.844
    INFO:medcat.components.addons.relation_extraction.rel_cat: loss = 0.513
    INFO:medcat.components.addons.relation_extraction.rel_cat: precision = 0.844
    INFO:medcat.components.addons.relation_extraction.rel_cat: recall = 0.844
    INFO:medcat.components.addons.relation_extraction.rel_cat:----------------------- class stats -----------------------
    INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-AE | f1: 0.869 | prec : 0.925 | acc: 0.844 | recall: 0.824 
    INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-DOSE | f1: 0.794 | prec : 0.730 | acc: 0.844 | recall: 0.886 
    INFO:medcat.components.addons.relation_extraction.rel_cat:-----------------------------------------------------------
    INFO:medcat.components.addons.relation_extraction.rel_cat:===========================================================
    INFO:medcat.components.addons.relation_extraction.rel_cat:======================== TEST SET TEST RESULTS ========================
    INFO:medcat.components.addons.relation_extraction.rel_cat:Evaluating test samples...
    INFO:medcat.components.addons.relation_extraction.rel_cat:==================== Evaluation Results ====================
    INFO:medcat.components.addons.relation_extraction.rel_cat: no. of batches:4
    INFO:medcat.components.addons.relation_extraction.rel_cat: accuracy = 0.690
    INFO:medcat.components.addons.relation_extraction.rel_cat: f1 = 0.690
    INFO:medcat.components.addons.relation_extraction.rel_cat: loss = 0.608
    INFO:medcat.components.addons.relation_extraction.rel_cat: precision = 0.690
    INFO:medcat.components.addons.relation_extraction.rel_cat: recall = 0.690
    INFO:medcat.components.addons.relation_extraction.rel_cat:----------------------- class stats -----------------------
    INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-AE | f1: 0.738 | prec : 0.851 | acc: 0.690 | recall: 0.658 
    INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-DOSE | f1: 0.609 | prec : 0.515 | acc: 0.690 | recall: 0.759 
    INFO:medcat.components.addons.relation_extraction.rel_cat:-----------------------------------------------------------
    INFO:medcat.components.addons.relation_extraction.rel_cat:===========================================================
    INFO:medcat.components.addons.relation_extraction.rel_cat:Epoch finished, took 0:00:26.537600 seconds
    INFO:medcat.components.addons.relation_extraction.rel_cat:Total epochs on this model: 10 | currently training epoch 7
    100%|██████████| 524/524 [00:26<00:00, 19.61it/s]
    INFO:medcat.components.addons.relation_extraction.rel_cat:Losses at Epoch 7: 0.01812
    INFO:medcat.components.addons.relation_extraction.rel_cat:Train accuracy at Epoch 7: 0.78370
    INFO:medcat.components.addons.relation_extraction.rel_cat:======================== TRAIN SET TEST RESULTS ========================
    INFO:medcat.components.addons.relation_extraction.rel_cat:Evaluating test samples...
    INFO:medcat.components.addons.relation_extraction.rel_cat:==================== Evaluation Results ====================
    INFO:medcat.components.addons.relation_extraction.rel_cat: no. of batches:17
    INFO:medcat.components.addons.relation_extraction.rel_cat: accuracy = 0.822
    INFO:medcat.components.addons.relation_extraction.rel_cat: f1 = 0.822
    INFO:medcat.components.addons.relation_extraction.rel_cat: loss = 0.483
    INFO:medcat.components.addons.relation_extraction.rel_cat: precision = 0.822
    INFO:medcat.components.addons.relation_extraction.rel_cat: recall = 0.822
    INFO:medcat.components.addons.relation_extraction.rel_cat:----------------------- class stats -----------------------
    INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-AE | f1: 0.822 | prec : 0.764 | acc: 0.822 | recall: 0.903 
    INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-DOSE | f1: 0.813 | prec : 0.904 | acc: 0.822 | recall: 0.750 
    INFO:medcat.components.addons.relation_extraction.rel_cat:-----------------------------------------------------------
    INFO:medcat.components.addons.relation_extraction.rel_cat:===========================================================
    INFO:medcat.components.addons.relation_extraction.rel_cat:======================== TEST SET TEST RESULTS ========================
    INFO:medcat.components.addons.relation_extraction.rel_cat:Evaluating test samples...
    INFO:medcat.components.addons.relation_extraction.rel_cat:==================== Evaluation Results ====================
    INFO:medcat.components.addons.relation_extraction.rel_cat: no. of batches:4
    INFO:medcat.components.addons.relation_extraction.rel_cat: accuracy = 0.736
    INFO:medcat.components.addons.relation_extraction.rel_cat: f1 = 0.736
    INFO:medcat.components.addons.relation_extraction.rel_cat: loss = 0.577
    INFO:medcat.components.addons.relation_extraction.rel_cat: precision = 0.736
    INFO:medcat.components.addons.relation_extraction.rel_cat: recall = 0.736
    INFO:medcat.components.addons.relation_extraction.rel_cat:----------------------- class stats -----------------------
    INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-AE | f1: 0.700 | prec : 0.610 | acc: 0.736 | recall: 0.854 
    INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-DOSE | f1: 0.755 | prec : 0.864 | acc: 0.736 | recall: 0.679 
    INFO:medcat.components.addons.relation_extraction.rel_cat:-----------------------------------------------------------
    INFO:medcat.components.addons.relation_extraction.rel_cat:===========================================================
    INFO:medcat.components.addons.relation_extraction.rel_cat:Epoch finished, took 0:00:26.722161 seconds
    INFO:medcat.components.addons.relation_extraction.rel_cat:Total epochs on this model: 10 | currently training epoch 8
    100%|██████████| 524/524 [00:24<00:00, 20.96it/s]
    INFO:medcat.components.addons.relation_extraction.rel_cat:Losses at Epoch 8: 0.01818
    INFO:medcat.components.addons.relation_extraction.rel_cat:Train accuracy at Epoch 8: 0.78064
    INFO:medcat.components.addons.relation_extraction.rel_cat:======================== TRAIN SET TEST RESULTS ========================
    INFO:medcat.components.addons.relation_extraction.rel_cat:Evaluating test samples...
    INFO:medcat.components.addons.relation_extraction.rel_cat:==================== Evaluation Results ====================
    INFO:medcat.components.addons.relation_extraction.rel_cat: no. of batches:17
    INFO:medcat.components.addons.relation_extraction.rel_cat: accuracy = 0.849
    INFO:medcat.components.addons.relation_extraction.rel_cat: f1 = 0.849
    INFO:medcat.components.addons.relation_extraction.rel_cat: loss = 0.442
    INFO:medcat.components.addons.relation_extraction.rel_cat: precision = 0.849
    INFO:medcat.components.addons.relation_extraction.rel_cat: recall = 0.849
    INFO:medcat.components.addons.relation_extraction.rel_cat:----------------------- class stats -----------------------
    INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-AE | f1: 0.856 | prec : 0.812 | acc: 0.849 | recall: 0.914 
    INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-DOSE | f1: 0.832 | prec : 0.903 | acc: 0.849 | recall: 0.780 
    INFO:medcat.components.addons.relation_extraction.rel_cat:-----------------------------------------------------------
    INFO:medcat.components.addons.relation_extraction.rel_cat:===========================================================
    INFO:medcat.components.addons.relation_extraction.rel_cat:======================== TEST SET TEST RESULTS ========================
    INFO:medcat.components.addons.relation_extraction.rel_cat:Evaluating test samples...
    INFO:medcat.components.addons.relation_extraction.rel_cat:==================== Evaluation Results ====================
    INFO:medcat.components.addons.relation_extraction.rel_cat: no. of batches:4
    INFO:medcat.components.addons.relation_extraction.rel_cat: accuracy = 0.760
    INFO:medcat.components.addons.relation_extraction.rel_cat: f1 = 0.760
    INFO:medcat.components.addons.relation_extraction.rel_cat: loss = 0.561
    INFO:medcat.components.addons.relation_extraction.rel_cat: precision = 0.760
    INFO:medcat.components.addons.relation_extraction.rel_cat: recall = 0.760
    INFO:medcat.components.addons.relation_extraction.rel_cat:----------------------- class stats -----------------------
    INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-AE | f1: 0.745 | prec : 0.691 | acc: 0.760 | recall: 0.829 
    INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-DOSE | f1: 0.765 | prec : 0.828 | acc: 0.760 | recall: 0.718 
    INFO:medcat.components.addons.relation_extraction.rel_cat:-----------------------------------------------------------
    INFO:medcat.components.addons.relation_extraction.rel_cat:===========================================================
    INFO:medcat.components.addons.relation_extraction.rel_cat:Epoch finished, took 0:00:24.996328 seconds
    INFO:medcat.components.addons.relation_extraction.rel_cat:Total epochs on this model: 10 | currently training epoch 9
    100%|██████████| 524/524 [00:25<00:00, 20.30it/s]
    INFO:medcat.components.addons.relation_extraction.rel_cat:Losses at Epoch 9: 0.01695
    INFO:medcat.components.addons.relation_extraction.rel_cat:Train accuracy at Epoch 9: 0.83456
    INFO:medcat.components.addons.relation_extraction.rel_cat:======================== TRAIN SET TEST RESULTS ========================
    INFO:medcat.components.addons.relation_extraction.rel_cat:Evaluating test samples...
    INFO:medcat.components.addons.relation_extraction.rel_cat:==================== Evaluation Results ====================
    INFO:medcat.components.addons.relation_extraction.rel_cat: no. of batches:17
    INFO:medcat.components.addons.relation_extraction.rel_cat: accuracy = 0.890
    INFO:medcat.components.addons.relation_extraction.rel_cat: f1 = 0.890
    INFO:medcat.components.addons.relation_extraction.rel_cat: loss = 0.387
    INFO:medcat.components.addons.relation_extraction.rel_cat: precision = 0.890
    INFO:medcat.components.addons.relation_extraction.rel_cat: recall = 0.890
    INFO:medcat.components.addons.relation_extraction.rel_cat:----------------------- class stats -----------------------
    INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-AE | f1: 0.894 | prec : 0.873 | acc: 0.890 | recall: 0.923 
    INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-DOSE | f1: 0.878 | prec : 0.908 | acc: 0.890 | recall: 0.858 
    INFO:medcat.components.addons.relation_extraction.rel_cat:-----------------------------------------------------------
    INFO:medcat.components.addons.relation_extraction.rel_cat:===========================================================
    INFO:medcat.components.addons.relation_extraction.rel_cat:======================== TEST SET TEST RESULTS ========================
    INFO:medcat.components.addons.relation_extraction.rel_cat:Evaluating test samples...
    INFO:medcat.components.addons.relation_extraction.rel_cat:==================== Evaluation Results ====================
    INFO:medcat.components.addons.relation_extraction.rel_cat: no. of batches:4
    INFO:medcat.components.addons.relation_extraction.rel_cat: accuracy = 0.747
    INFO:medcat.components.addons.relation_extraction.rel_cat: f1 = 0.747
    INFO:medcat.components.addons.relation_extraction.rel_cat: loss = 0.542
    INFO:medcat.components.addons.relation_extraction.rel_cat: precision = 0.747
    INFO:medcat.components.addons.relation_extraction.rel_cat: recall = 0.747
    INFO:medcat.components.addons.relation_extraction.rel_cat:----------------------- class stats -----------------------
    INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-AE | f1: 0.732 | prec : 0.732 | acc: 0.747 | recall: 0.733 
    INFO:medcat.components.addons.relation_extraction.rel_cat:label: DRUG-DOSE | f1: 0.747 | prec : 0.754 | acc: 0.747 | recall: 0.742 
    INFO:medcat.components.addons.relation_extraction.rel_cat:-----------------------------------------------------------
    INFO:medcat.components.addons.relation_extraction.rel_cat:===========================================================
    INFO:medcat.components.addons.relation_extraction.rel_cat:Epoch finished, took 0:00:25.820102 seconds
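
The per-epoch blocks above are easier to compare once the overall accuracy figures are pulled into one structure. The following is a minimal sketch, not part of MedCAT: `parse_accuracy` and `epoch_lines` are hypothetical helpers that assume the log line format shown above ("Losses at Epoch N", "TRAIN/TEST SET TEST RESULTS", "accuracy = X"). The sample values are copied from epochs 5 to 9 of the output above.

```python
import re

# Logger prefix as it appears in the output above.
PREFIX = "INFO:medcat.components.addons.relation_extraction.rel_cat:"


def parse_accuracy(log_text):
    """Return {epoch: {"train": acc, "test": acc}} from RelCAT-style training logs."""
    results = {}
    epoch = None   # epoch announced by the most recent "Losses at Epoch N" line
    split = None   # "train" or "test", set by the most recent results header
    for line in log_text.splitlines():
        m = re.search(r"Losses at Epoch (\d+):", line)
        if m:
            epoch = int(m.group(1))
            results[epoch] = {}
            continue
        m = re.search(r"=+ (TRAIN|TEST) SET TEST RESULTS =+", line)
        if m:
            split = m.group(1).lower()
            continue
        # Matches the overall "accuracy = 0.794" line only; the per-class
        # lines use "acc:" and the running line uses "Train accuracy at Epoch".
        m = re.search(r"accuracy = ([0-9.]+)", line)
        if m and epoch is not None and split is not None:
            results[epoch][split] = float(m.group(1))
    return results


def epoch_lines(epoch, train_acc, test_acc):
    """Rebuild the relevant log lines for one epoch (illustration only)."""
    return [
        f"{PREFIX}Losses at Epoch {epoch}: ...",
        f"{PREFIX}======================== TRAIN SET TEST RESULTS ========================",
        f"{PREFIX} accuracy = {train_acc:.3f}",
        f"{PREFIX}======================== TEST SET TEST RESULTS ========================",
        f"{PREFIX} accuracy = {test_acc:.3f}",
    ]


# (epoch, train accuracy, test accuracy) as reported in the output above.
reported = [(5, 0.794, 0.708), (6, 0.844, 0.690), (7, 0.822, 0.736),
            (8, 0.849, 0.760), (9, 0.890, 0.747)]
sample_log = "\n".join(line for row in reported for line in epoch_lines(*row))

results = parse_accuracy(sample_log)
best_epoch = max(results, key=lambda e: results[e]["test"])
for epoch, acc in sorted(results.items()):
    print(epoch, acc["train"], acc["test"])
print("best test accuracy at epoch", best_epoch)
```

In the run above, test accuracy peaks at epoch 8 (0.760) and falls to 0.747 at epoch 9, while train accuracy keeps rising to 0.890. A widening gap like this is a common sign that further epochs mainly fit the training set. To apply the parser to a real run, pass it the captured log text instead of `sample_log`.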
    
    In [18]:
    # save the model
    relCAT.save(save_path="./ade_relcat_model")
    