diff --git a/docs/src/concepts/fine-tuning.rst b/docs/src/concepts/fine-tuning.rst
new file mode 100644
index 0000000000..3a26a92d8c
--- /dev/null
+++ b/docs/src/concepts/fine-tuning.rst
@@ -0,0 +1,226 @@
+.. _label_fine_tuning_concept:
+
+Fine-tune a pre-trained model
+=============================
+
+.. warning::
+
+    Fine-tuning may not be supported by every architecture, and where it is supported,
+    the syntax for starting a fine-tuning run may differ from the one explained here.
+
+This section describes the process of fine-tuning a pre-trained model to adapt it to
+new tasks or datasets. Fine-tuning is a common technique in machine learning, where a
+model trained on a large dataset is trained further on a smaller dataset to improve
+its performance on specific tasks. So far, the fine-tuning capabilities are only
+available for the PET model.
+
+There is a complete example in the tutorial section
+:ref:`sphx_glr_generated_examples_0-beginner_02-fine-tuning.py`.
+
+.. note::
+
+    Please note that the fine-tuning recommendations in this section are not universal
+    and require testing on your specific dataset to achieve the best results. You might
+    need to experiment with different fine-tuning strategies depending on your needs.
+
+
+Basic Fine-tuning
+-----------------
+
+The basic way to fine-tune a model is to use the ``mtt train`` command with the
+available pre-trained model defined in an ``options.yaml`` file. In this case, all the
+weights of the model will be adapted to the new dataset. In contrast to training
+continuation, the optimizer and scheduler state will be reset. You can still adjust
+the training hyperparameters in the ``options.yaml`` file, but the model architecture
+will be taken from the checkpoint.
+To set the path to the pre-trained model checkpoint, you need to specify the
+``read_from`` parameter in the ``options.yaml`` file:
+
+.. code-block:: yaml
+
+    architecture:
+      training:
+        finetune:
+          method: "full"  # This stands for the full fine-tuning
+          read_from: path/to/checkpoint.ckpt
+
+We recommend using a lower learning rate than the one used for the original training,
+as this will help stabilize the training process. For example, if the default learning
+rate is ``1e-4``, you can set it to ``1e-5`` or even lower, using the following in the
+``options.yaml`` file:
+
+.. code-block:: yaml
+
+    architecture:
+      training:
+        learning_rate: 1e-5
+
+Please note that in most use cases you should create a new energy head by specifying a
+new energy variant. A variant is a version of a target quantity, such as ``energy``. A
+model can have multiple variants that can be selected during training and inference.
+More on variants can be found in the `metatomic`_ documentation.
+
+.. _metatomic: https://docs.metatensor.org/metatomic/latest/engines/index.html
+
+Variant names follow the simple pattern ``energy/{variantname}``, where we used
+``energy`` as the target quantity. A reasonable name could be the energy functional or
+level of theory your fine-tuning dataset was computed with, e.g. ``energy/pbe``,
+``energy/SCAN`` or even ``energy/dataset1``. Further, we recommend adding a short
+description for the new variant, which you can specify in the ``description`` field of
+your ``options.yaml`` file.
+
+.. code-block:: yaml
+
+    training_set:
+      systems:
+        read_from: path/to/dataset.xyz
+        length_unit: angstrom
+      targets:
+        energy/:  # complete with your variant name, e.g. energy/pbe
+          quantity: energy
+          key:  # key of the target in your dataset file
+          unit:  # unit of the target, e.g. eV
+          description: "description of your variant"
+
+The new energy variant can be selected for evaluation with ``mtt eval`` by specifying
+it in the ``options.yaml`` file used for evaluation:
+
+.. code-block:: yaml
+
+    systems: path/to/dataset.xyz
+    targets:
+      energy/:  # your variant name
+        key:
+        unit:
+        forces:
+          key: forces
+
+When using the fine-tuned model in simulation engines such as ASE and LAMMPS, the
+default target name expected by the ``metatomic`` package is ``energy``. When loading
+the model in ``metatomic``, you therefore have to specify which variant should be used
+for energy and force prediction, as sketched below. You can find a full example in the
+tutorial (see :ref:`sphx_glr_generated_examples_0-beginner_02-fine-tuning.py`) and more
+in the `metatomic documentation`_.
+
+.. _metatomic documentation: https://docs.metatensor.org/metatomic/latest/engines/index.html
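+For instance, with the ASE interface the variant is selected when the calculator is
+created. The import and the ``variants`` argument below follow the usage shown in the
+fine-tuning tutorial of this repository; the model file and the variant name ``pbe``
+are only illustrations and have to match your own fine-tuned model:
+
+.. code-block:: python
+
+    from metatomic.torch.ase_calculator import MetatomicCalculator
+
+    # Load the exported fine-tuned model and select which energy variant
+    # is used for energy and force predictions.
+    calc = MetatomicCalculator(
+        "model-ft.pt",
+        variants={"energy": "pbe"},  # illustrative variant name
+    )
+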
+Up to this point, our fine-tuning run trains all weights of the model and creates a
+new energy head as well as a new composition model.
+
+The basic fine-tuning strategy is a good choice for most use cases. Below, we present
+a few more advanced topics.
+
+Inheriting weights from existing heads
+--------------------------------------
+
+In some cases, the new targets might be similar to the existing targets
+in the pre-trained model. For example, if the pre-trained model is trained
+on energies and forces computed with the PBE functional, and the new targets
+are energies and forces coming from PBE0 calculations, it might be beneficial
+to initialize the new PBE0 heads and last layers with the weights of the PBE
+heads and last layers. This can be done by specifying the ``inherit_heads``
+parameter in the ``options.yaml`` file:
+
+.. code-block:: yaml
+
+    architecture:
+      training:
+        finetune:
+          method: full
+          read_from: path/to/checkpoint.ckpt
+          inherit_heads:
+            energy/: energy  # inherit weights from the "energy" head
+
+The ``inherit_heads`` parameter is a dictionary mapping the new trainable
+targets specified in the ``training_set/targets`` section to the existing
+targets in the pre-trained model. The weights of the corresponding heads and
+last layers will be copied from the source heads to the destination heads
+instead of being randomly initialized. These weights are still trainable and
+will be adapted to the new dataset during the training process.
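+For the PBE-to-PBE0 scenario described above, the mapping could look as follows; the
+variant name ``energy/pbe0`` is only illustrative and has to match a target declared
+in your ``training_set`` section:
+
+.. code-block:: yaml
+
+    architecture:
+      training:
+        finetune:
+          method: full
+          read_from: path/to/checkpoint.ckpt
+          inherit_heads:
+            # initialize the new PBE0 head from the pre-trained "energy" head
+            energy/pbe0: energy
+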
+
+Multi-fidelity training
+-----------------------
+
+So far, the old head is left untouched, but it is rendered useless because the deeper
+weights of the model change. If you want to fine-tune and retain multiple functional
+heads, the recommended way is to do full fine-tuning on a new target, but keep
+training the old energy head as well. This will leave you with a model capable of
+using different variants for energy and force prediction. Again, you are able to
+select the preferred head in ``LAMMPS`` or when creating a ``metatomic`` calculator
+object. Thus, you should specify both variants in the ``targets`` section of your
+``options.yaml``. In the code snippet below, we additionally assume that the energy
+labels come from different datasets. Please note that if you have both references in
+one file, they can be selected via the corresponding keys of the same system.
+
+.. code-block:: yaml
+
+    training_set:
+      - systems:
+          read_from: dataset_1.xyz
+          length_unit: angstrom
+        targets:
+          energy/variant1:  # illustrative variant name
+            quantity: energy
+            key: my_energy_label1
+            unit: eV
+            description: 'my variant1 description'
+      - systems:
+          read_from: dataset_2.xyz
+          length_unit: angstrom
+        targets:
+          energy/variant2:  # illustrative variant name
+            quantity: energy
+            key: my_energy_label2
+            unit: eV
+            description: 'my variant2 description'
+
+You can find more about setting up training with multiple files in the
+:ref:`Training YAML reference `.
+
+Training only the head weights can be an alternative if one wants to keep the old
+energy head, but the reference data it was trained on is not available. In that case,
+the internal model weights are frozen, and only the weights of the new target are
+trained.
+
+
+Fine-tuning model Heads only
+----------------------------
+
+Adapting all the model weights to a new dataset is not always the best approach. If
+the new dataset consists of the same or similar data computed with a slightly
+different level of theory compared to the pre-trained model's dataset, you might want
+to keep the learned representations of the crystal structures and only adapt the
+readout layers (i.e. the model heads) to the new dataset.
+In this case, the ``mtt train`` command needs to be accompanied by the specific
+training options in the ``options.yaml`` file. The following options need to be set:
+
+.. code-block:: yaml
+
+    architecture:
+      training:
+        finetune:
+          method: "heads"
+          read_from: path/to/checkpoint.ckpt
+          config:
+            head_modules: ['node_heads', 'edge_heads']
+            last_layer_modules: ['node_last_layers', 'edge_last_layers']
+
+The ``method`` parameter specifies the fine-tuning method to be used and the
+``read_from`` parameter specifies the path to the pre-trained model checkpoint. The
+``head_modules`` and ``last_layer_modules`` parameters specify the modules to be
+fine-tuned. Here, the ``node_*`` and ``edge_*`` modules represent different parts of
+the model readout layers related to the atom-based and bond-based features. The
+``*_last_layers`` modules are the last layers of the corresponding heads, implemented
+as multi-layer perceptrons (MLPs). You can select different combinations of the node
+and edge heads and last layers to be fine-tuned.
+
+We recommend first starting the fine-tuning with all the modules listed above and
+experimenting with different combinations if needed. You might also consider using a
+lower learning rate, e.g. ``1e-5`` or even lower, to stabilize the training process.
diff --git a/docs/src/concepts/index.rst b/docs/src/concepts/index.rst
index db689d7518..06372e535d 100644
--- a/docs/src/concepts/index.rst
+++ b/docs/src/concepts/index.rst
@@ -9,5 +9,6 @@ such as output naming, auxiliary outputs, and wrapper models.
    :maxdepth: 1
 
    output-naming
+   fine-tuning
    loss-functions
    auxiliary-outputs
diff --git a/docs/src/getting-started/finetuning-example.rst b/docs/src/getting-started/finetuning-example.rst
deleted file mode 100644
index f13ec6e556..0000000000
--- a/docs/src/getting-started/finetuning-example.rst
+++ /dev/null
@@ -1,110 +0,0 @@
-.. _fine-tuning-example:
-
-Finetuning example
-==================
-
-.. warning::
-
-    Finetuning is currently only available for the PET architecture.
-
-
-This is a simple example for fine-tuning PET-MAD (or a general PET model), that
-can be used as a template for general fine-tuning with metatrain.
-Fine-tuning a pretrained model allows you to obtain a model better suited for
-your specific system.
-You need to provide a dataset of structures that have
-been evaluated at a higher reference level of theory, usually DFT. Fine-tuning
-a universal model such as PET-MAD allows for reasonable model performance even if
-little training data is available.
-It requires using a pre-trained model checkpoint with the ``mtt train`` command and
-setting the new targets corresponding to the new level of theory in the
-``options.yaml`` file.
-
-
-In order to obtain a pretrained model, you can use a PET-MAD checkpoint from huggingface
-
-.. code-block:: bash
-
-    wget https://huggingface.co/lab-cosmo/pet-mad/resolve/v1.1.0/models/pet-mad-v1.1.0.ckpt
-
-Next, we set up the ``options.yaml`` file. We can specify the fine-tuning method
-in the ``finetune`` block in the ``training`` options of the ``architecture``.
-Here, the basic ``full`` option is chosen, which finetunes all weights of the model.
-All available fine-tuning methods are found in the advanced concepts
-:ref:`Fine-tuning `. This section discusses implementation details,
-options and recommended use cases. Other fine-tuning options can be simply
-substituted in this script, by changing the ``finetune`` block.
-
-Furthermore, you need to specify the checkpoint, that you want to fine-tune in
-the ``read_from`` option.
-
-A simple ``options.yaml`` file for this task could look like this:
-
-Training on a new level of theory is a common use case for transfer learning. Let's
-
-.. code-block:: yaml
-
-    architecture:
-      name: pet
-      training:
-        num_epochs: 1000
-        learning_rate: 1e-5
-        finetune:
-          method: full
-          read_from: path/to/checkpoint.ckpt
-
-    training_set:
-      systems:
-        read_from: dataset.xyz
-        reader: ase
-        length_unit: angstrom
-      targets:
-        energy:
-          quantity: energy
-          read_from: dataset.xyz
-          reader: ase
-          key: energy
-          unit: eV
-          forces:
-            read_from: dataset.xyz
-            reader: ase
-            key: forces
-          stress:
-            read_from: dataset.xyz
-            reader: ase
-            key: stress
-
-    test_set: 0.1
-    validation_set: 0.1
-
-In this example, we specified generic but reasonable ``num_epochs`` and
-``learning_rate`` parameters. The ``learning_rate`` is chosen to be relatively low to
-stabilise training.
-
-.. warning::
-
-    Note that in ``targets`` we use the PET-MAD ``energy`` head. This means, that
-    there won't be a new head for the new reference energies provided in your dataset.
-    This can lead to bad performance, if the reference energies differ from the ones
-    used in pretraining (different levels of theory, or different electronic structure
-    software used). In future it is recommended to create a new ``energy`` target for
-    the new level of theory. Find more about this in :ref:`Transfer-Learning `
-
-
-We assumed that the pre-trained model is trained on the dataset ``dataset.xyz`` in which
-energies are written in the ``energy`` key of the ``info`` dictionary of the
-energies. Additionally, forces and stresses should be provided with corresponding keys
-which you can specify in the ``options.yaml`` file under ``targets``.
-Further information on specifying targets can be found in the :ref:`data section of
-the Training YAML Reference `.
-
-.. note::
-
-    It is important that the ``length_unit`` is set to ``angstrom`` and the ``energy``
-    ``unit`` is ``eV`` in order to match the units PET-MAD was trained on. If your
-    dataset has different energy units, it is necessary to convert it to ``eV`` before
-    fine-tuning.
-
-
-After setting up your ``options.yaml`` file, finetuning can then simply be run
-via ``mtt train options.yaml``.
-
-
-Further fine-tuning examples can be found in the
-`AtomisticCookbook <https://atomistic-cookbook.org/software/metatrain.html>`_
diff --git a/docs/src/getting-started/index.rst b/docs/src/getting-started/index.rst
index 1baee9deeb..af6d2f4224 100644
--- a/docs/src/getting-started/index.rst
+++ b/docs/src/getting-started/index.rst
@@ -10,5 +10,4 @@ This sections describes how to install the package, and its most basic
    commands.
 
    train_yaml_config
    override
    checkpoints
-   finetuning-example
    units
diff --git a/docs/src/getting-started/quickstart.rst b/docs/src/getting-started/quickstart.rst
index 4251124e85..77384422d7 100644
--- a/docs/src/getting-started/quickstart.rst
+++ b/docs/src/getting-started/quickstart.rst
@@ -8,6 +8,13 @@ Quickstart
    :start-after:
    :end-before:
 
-For a more detailed description please checkout
-our :ref:`label_basic_usage` and the rest of the
-documentation.
+.. hint::
+
+   If you want to fine-tune an existing model,
+   check out :ref:`label_fine_tuning_concept`.
+
+.. note::
+
+   For a more detailed description of the training process, please check out
+   our :ref:`label_basic_usage` and the rest of the
+   documentation.
diff --git a/examples/0-beginner/02-fine-tuning.py b/examples/0-beginner/02-fine-tuning.py
index e3dd2f5b28..e549538706 100644
--- a/examples/0-beginner/02-fine-tuning.py
+++ b/examples/0-beginner/02-fine-tuning.py
@@ -1,198 +1,296 @@
 r"""
-.. _fine-tuning:
-
-Fine-tune a pre-trained model
-=============================
+Fine-tuning a pre-trained model
+===============================
 
 .. warning::
 
-    This section of the documentation is only relevant for PET model so far.
+    Finetuning is currently only available for the PET architecture.
 
-This section describes the process of fine-tuning a pre-trained model to
-adapt it to new tasks or datasets. Fine-tuning is a common technique used
-in machine learning, where a model is trained on a large dataset and then
-fine-tuned on a smaller dataset to improve its performance on specific tasks.
-So far the fine-tuning capabilities are only available for PET model.
-There is a complete example in :ref:`Fine-tune example `.
+This is a simple example for fine-tuning PET-MAD (or a general PET model) that can be
+used as a template for general fine-tuning with metatrain.
+Fine-tuning a pretrained model allows you to obtain a model better suited for
+your specific system. You need to provide a dataset of structures that have
+been evaluated at a higher reference level of theory, usually DFT. Fine-tuning
+a universal model such as PET-MAD allows for reasonable model performance even if
+little training data is available.
+It requires using a pre-trained model checkpoint with the ``mtt train`` command and
+setting the new targets corresponding to the new level of theory in the
+``options.yaml`` file.
 
-.. note::
-
-    Please note that the fine-tuning recommendations in this section are not universal
-    and require testing on your specific dataset to achieve the best results. You might
-    need to experiment with different fine-tuning strategies depending on your needs.
-
-
-Basic Fine-tuning
------------------
-
-The basic way to fine-tune a model is to use the ``mtt train`` command with the
-available pre-trained model defined in an ``options.yaml`` file. In this case, all the
-weights of the model will be adapted to the new dataset. In contrast to to the
-training continuation, the optimizer and scheduler state will be reset. You can still
-adjust the training hyperparameters in the ``options.yaml`` file, but the model
-architecture will be taken from the checkpoint.
-
-To set the path to the pre-trained model checkpoint, you need to specify the
-``read_from`` parameter in the ``options.yaml`` file:
-
-.. code-block:: yaml
-
-    architecture:
-      training:
-        finetune:
-          method: "full"  # This stands for the full fine-tuning
-          read_from: path/to/checkpoint.ckpt
-
-We recommend to use a lower learning rate than the one used for the original training,
-as this will help stabilizing the training process. I.e. if the default learning rate is
-``1e-4``, you can set it to ``1e-5`` or even lower, using the following in the
-``options.yaml`` file:
-
-.. code-block:: yaml
-
-    architecture:
-      training:
-        learning_rate: 1e-5
-
-Please note, that in the case of the basic fine-tuning, the composition model weights
-will be taken from the checkpoint and not adapted to the new dataset.
-
-The basic fine-tuning strategy is a good choice in the case when the level of theory
-which is used for the original training is the same, or at least similar to the one used
-for the new dataset. However, since this is not always the case, we also provide more
-advanced fine-tuning strategies described below.
-
-Here is the specification for the inputs to pass to the
-``architecture.training.finetune`` parameter in case of the basic fine-tuning:
-
-.. autoclass:: metatrain.pet.modules.finetuning.FullFinetuneHypers
-    :members:
-    :undoc-members:
-
-
-Fine-tuning model Heads
------------------------
-
-Adapting all the model weights to a new dataset is not always the best approach. If the
-new dataset consist of the same or similar data computed with a slightly different level
-of theory compared to the pre-trained models' dataset, you might want to keep the
-learned representations of the crystal structures and only adapt the readout layers
-(i.e. the model heads) to the new dataset.
-
-In this case, the ``mtt train`` command needs to be accompanied by the specific training
-options in the ``options.yaml`` file. The following options need to be set:
+In order to obtain a pretrained model, you can use a PET-MAD checkpoint from
+Hugging Face:
+
+.. code-block:: bash
+
+    wget https://huggingface.co/lab-cosmo/pet-mad/resolve/v1.1.0/models/pet-mad-v1.1.0.ckpt
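+Alternatively, the checkpoint can be downloaded from Python. The following is a
+sketch using the ``huggingface_hub`` package (assuming it is installed; the
+repository, file name and revision are taken from the URL above):
+
+.. code-block:: python
+
+    from huggingface_hub import hf_hub_download
+
+    # Download the PET-MAD checkpoint at the v1.1.0 revision
+    ckpt_path = hf_hub_download(
+        repo_id="lab-cosmo/pet-mad",
+        filename="models/pet-mad-v1.1.0.ckpt",
+        revision="v1.1.0",
+    )
+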
+Next, we set up the ``options-ft.yaml`` file. Here we fine-tune on a small dataset
+containing structures of ethanol, labelled with energies and forces.
+We can specify the fine-tuning method in the ``finetune`` block in the ``training``
+options of the ``architecture``. Here, the basic ``full`` option is chosen, which
+finetunes all weights of the model. All available fine-tuning methods are found in the
+concepts page :ref:`Fine-tuning `. That section discusses
+implementation details, options and recommended use cases. Other fine-tuning options
+can simply be substituted in this script by changing the ``finetune`` block.
+
+.. note::
+
+    Since our dataset has energies and forces obtained from reference calculations
+    different from the reference of the pretrained model, it is recommended to create
+    a new energy head. Such a so-called energy variant is invoked simply by requesting
+    a new target in the options file, following the nomenclature ``energy/{yourname}``.
+
+Furthermore, you need to specify the checkpoint that you want to fine-tune in
+the ``read_from`` option.
+
+A simple ``options-ft.yaml`` file for this task could look like this:
+
+.. code-block:: yaml
+
+    architecture:
+      name: pet
+      training:
+        batch_size: 8
+        num_epochs: 10
+        learning_rate: 1e-3
+        warmup_fraction: 0.01
+        finetune:
+          method: full
+          read_from: pet-mad-v1.1.0.ckpt
+          inherit_heads:
+            energy/finetune: energy  # inherit weights from the "energy" head
+
+    training_set:
+      systems:
+        read_from: ethanol_reduced_100.xyz
+        reader: ase
+        length_unit: angstrom
+      targets:
+        energy/finetune:
+          quantity: energy
+          read_from: ethanol_reduced_100.xyz
+          reader: ase
+          key: energy
+          unit: eV
+          description: "pbe energy ethanol"
+          forces:
+            read_from: ethanol_reduced_100.xyz
+            reader: ase
+            key: forces
+
+    validation_set: 0.1
+    test_set: 0.1
+
+In this example, we specified a low number of :attr:`num_epochs` and a relatively high
+:attr:`learning_rate` to keep the runtime of this example short. Usually, the
+``learning_rate`` is chosen to be relatively low, typically lower than the one the
+model has been pre-trained with, to stabilise training.
+
+.. warning::
+
+    Note that in ``targets`` we use the ``energy/finetune`` head, differing from the
+    default ``energy`` head. This means that the model creates a new head with a new
+    composition model for the new reference energies provided in your dataset. While
+    the old energy head is still available, it is rendered useless because we train
+    all weights of the model. If you want to obtain a model with multiple usable
+    energy heads, you can simply train on multiple energy references simultaneously.
+    This and other more advanced fine-tuning strategies are discussed in
+    :ref:`Fine-tuning concepts `.
-
-.. code-block:: yaml
-
-    architecture:
-      training:
-        finetune:
-          method: "heads"
-          read_from: path/to/checkpoint.ckpt
-          config:
-            head_modules: ['node_heads', 'edge_heads']
-            last_layer_modules: ['node_last_layers', 'edge_last_layers']
-
-
-The ``method`` parameter specifies the fine-tuning method to be used and the
-``read_from`` parameter specifies the path to the pre-trained model checkpoint. The
-``head_modules`` and ``last_layer_modules`` parameters specify the modules to be
-fine-tuned. Here, the ``node_*`` and ``edge_*`` modules represent different parts of the
-model readout layers related to the atom-based and bond-based features. The
-``*_last_layer`` modules are the last layers of the corresponding heads, implemented as
-multi-layer perceptron (MLPs). You can select different combinations of the node and
-edge heads and last layers to be fine-tuned.
-
-We recommend to first start the fine-tuning including all the modules listed above and
-experiment with their different combinations if needed. You might also consider using a
-lower learning rate, e.g. ``1e-5`` or even lower, to stabilize the training process.
-
-Here is the specification for the inputs to pass to the
-``architecture.training.finetune`` parameter in case of ``"heads"`` fine-tuning:
-
-.. autoclass:: metatrain.pet.modules.finetuning.HeadsFinetuneHypers
-    :members:
-    :undoc-members:
-
-.. autoclass:: metatrain.pet.modules.finetuning.HeadsFinetuneConfig
-    :members:
-    :undoc-members:
-
-
-LoRA Fine-tuning
-----------------
-
-If the conceptually new type of structures is introduced in the new dataset, tuning only
-the model heads might not be sufficient. In this case, you might need to adapt the
-internal representations of the crystal structures. This can be done using the LoRA
-technique. However, in this case the model heads will be not adapted to the new dataset,
-so conceptually the level of theory should be consistent with the one used for the
-pre-trained model.
-
-What is LoRA?
-^^^^^^^^^^^^^ -LoRA (Low-Rank Adaptation) stands for a Parameter-Efficient Fine-Tuning (PEFT) -technique used to adapt pre-trained models to new tasks by introducing low-rank -matrices into the model's architecture. +We assumed that the pre-trained model is trained on the dataset +``ethanol_reduced_100.xyz`` in which energies are written in the ``energy`` key of +the ``info`` dictionary of the dataset. +Additionally, forces should be provided with corresponding keys +which you can specify in the ``options-ft.yaml`` file under ``targets``. +Further information on specifying targets can be found in the :ref:`data section of +the Training YAML Reference `. -Given a pre-trained model with the weights matrix :math:`W_0`, LoRA introduces -low-rank matrices :math:`A` and :math:`B` of a rank :math:`r` such that the -new weights matrix :math:`W` is computed as: +.. note:: -.. math:: + It is important that the ``length_unit`` is set to ``angstrom`` and the ``energy`` + ``unit`` is ``eV`` in order to match the units of your reference data. - W = W_0 + \frac{\alpha}{r} A B -where :math:`\alpha` is a regularization factor that controls the influence -of the low-rank matrices on the model's weights. By adjusting the rank :math:`r` -and the regularization factor :math:`\alpha`, you can fine-tune the model -to achieve better performance on specific tasks. +After setting up your ``options-ft.yaml`` file, you can then simply run: -To use LoRA for fine-tuning, you need to provide the pre-trained model checkpoint with -the ``mtt train`` command and specify the LoRA parameters in the ``options.yaml`` file: +.. code-block:: bash -.. code-block:: yaml + mtt train options-ft.yaml -o model-ft.pt - architecture: - training: - finetune: - method: "lora" - read_from: path/to/pre-trained-model.ckpt - config: - alpha: 0.1 - rank: 4 - -These parameters control the rank of the low-rank matrices introduced by LoRA -(``rank``), and the regularization factor for the low-rank matrices (``alpha``). -By selecting the LoRA rank and the regularization factor, you can control the -amount of adaptation to the new dataset. Using lower values of the rank and -the regularization factor will lead to a more conservative adaptation, which can help -balancing the performance of the model on the original and new datasets. - -We recommend to start with the LoRA parameters listed above and experiment with -different values if needed. You might also consider using a lower learning rate, -e.g. ``1e-5`` or even lower, to stabilize the training process. - -Here is the specification for the inputs to pass to the -``architecture.training.finetune`` parameter in case of ``"lora"`` fine-tuning: - -.. autoclass:: metatrain.pet.modules.finetuning.LoRaFinetuneHypers - :members: - :undoc-members: - -.. autoclass:: metatrain.pet.modules.finetuning.LoRaFinetuneConfig - :members: - :undoc-members: - -Fine-tuning on a new level of theory ------------------------------------- - -If the new dataset is computed with a totally different level of theory compared to the -pre-trained model, which includes, for instance, the different composition energies, or -you want to fine-tune the model on a completely new target, you might need to consider -the transfer learning approach and introduce a new target in the ``options.yaml`` file. -More details about this approach can be found in the :ref:`Transfer Learning -` section of the documentation. +You can check finetuning training curves by parsing the ``train.csv`` that is written +by ``mtt train``. 
+We remove the ``outputs`` folder left over from other examples; this is not
+necessary for normal usage.
+"""
+
+# %%
+#
+import glob
+import subprocess
+
+import ase.io
+import matplotlib.pyplot as plt
+import numpy as np
+from metatomic.torch.ase_calculator import MetatomicCalculator
+
+
+# %%
+#
+
+# Here, we download the PET-MAD checkpoint, remove the stale ``outputs`` folder from
+# other examples, and run ``mtt train`` as a subprocess.
+subprocess.run(
+    [
+        "wget",
+        "https://huggingface.co/lab-cosmo/pet-mad/resolve/v1.1.0/models/pet-mad-v1.1.0.ckpt",
+    ]
+)
+subprocess.run(["rm", "-rf", "outputs"])
+subprocess.run(["mtt", "train", "options-ft.yaml", "-o", "model-ft.pt"], check=True)
+
+# %%
+#
+# ``mtt train`` writes a timestamped directory per run; sorting the matches lets us
+# pick the most recent ``train.csv``.
+csv_path = sorted(glob.glob("outputs/*/*/train.csv"))[-1]
+with open(csv_path, "r") as f:
+    header = f.readline().strip().split(",")
+    f.readline()  # skip units row
+
+# Build a structured dtype with one float field per CSV column
+dtype = [(h, float) for h in header]
+
+# Load the data as a plain float array, skipping the header and units rows
+data = np.loadtxt(csv_path, delimiter=",", skiprows=2)
+
+# Convert to a structured array so that columns can be accessed by name
+structured = np.zeros(data.shape[0], dtype=dtype)
+for i, h in enumerate(header):
+    structured[h] = data[:, i]
+
+# %%
+#
+# Now, let's plot the learning curves.
+
+# %%
+#
+training_energy_RMSE = structured["training energy/finetune RMSE (per atom)"]
+training_forces_MAE = structured["training forces[energy/finetune] MAE"]
+validation_energy_RMSE = structured["validation energy/finetune RMSE (per atom)"]
+validation_forces_MAE = structured["validation forces[energy/finetune] MAE"]
+
+fig, axs = plt.subplots(1, 2, figsize=(12, 5))
+
+axs[0].plot(training_energy_RMSE, label="training energy/finetune RMSE (per atom)")
+axs[0].plot(validation_energy_RMSE, label="validation energy/finetune RMSE (per atom)")
+axs[0].set_xlabel("Epochs")
+axs[0].set_ylabel("energy / meV")
+axs[0].set_xscale("log")
+axs[0].set_yscale("log")
+axs[0].legend()
+axs[1].plot(training_forces_MAE, label="training forces[energy/finetune] MAE")
+axs[1].plot(validation_forces_MAE, label="validation forces[energy/finetune] MAE")
+axs[1].set_ylabel("force / meV/A")
+axs[1].set_xlabel("Epochs")
+axs[1].set_xscale("log")
+axs[1].set_yscale("log")
+axs[1].legend()
+plt.tight_layout()
+plt.show()
+
+# %%
+#
+# You can see that the validation loss is still decreasing; however, for the sake of
+# brevity we only fine-tuned for a few epochs. As a further check of how well your
+# fine-tuned model performs on a dataset of choice, we can look at the parity plots
+# for energies and forces
+# (see :ref:`sphx_glr_generated_examples_0-beginner_04-parity_plot.py`).
+# For evaluation, we can compare the performance of our fine-tuned model with that of
+# the base PET-MAD model. Using ``mtt eval`` we can simply evaluate our new energy
+# head by specifying it in ``options-ft-eval.yaml``:
+#
+# .. code-block:: yaml
+#
+#     systems: ethanol_reduced_100.xyz
+#     targets:
+#       energy/finetune:
+#         key: energy
+#         unit: eV
+#         forces:
+#           key: forces
+#
+# and then run
+#
+# .. code-block:: bash
+#
+#     mtt eval model-ft.pt options-ft-eval.yaml -o output-ft.xyz
+#
+# You can then simply read the predicted energies from the headers of the xyz file.
+# Another possibility is to load your fine-tuned model ``model-ft.pt`` as a
+# ``metatomic`` model and evaluate energies and forces with ASE in Python.
+# + +# %% +# +targets = ase.io.read( + "ethanol_reduced_100.xyz", + format="extxyz", + index=":", +) +calc_ft = MetatomicCalculator( + "model-ft.pt", variants={"energy": "finetune"}, extensions_directory=None +) # specify variant suffix here + +e_targets = np.array( + [frame.get_total_energy() / len(frame) for frame in targets] +) # target energies +f_targets = np.array( + [frame.get_forces().flatten() for frame in targets] +).flatten() # target forces + +for frame in targets: + frame.set_calculator(calc_ft) + +e_predictions = np.array( + [frame.get_total_energy() / len(frame) for frame in targets] +) # predicted energies +f_predictions = np.array( + [frame.get_forces().flatten() for frame in targets] +).flatten() # predicted forces + +# %% +# +fig, axs = plt.subplots(1, 2, figsize=(12, 5)) + +# Parity plot for energies +axs[0].scatter(e_targets, e_predictions, label="FT") +axs[0].axline((np.min(e_targets), np.min(e_targets)), slope=1, ls="--", color="red") +axs[0].set_xlabel("Target energy / meV") +axs[0].set_ylabel("Predicted energy / meV") +min_e = np.min(np.array([e_targets, e_predictions])) - 2 +max_e = np.max(np.array([e_targets, e_predictions])) + 2 +axs[0].set_title("Energy Parity Plot") +axs[0].set_xlim(min_e, max_e) +axs[0].set_ylim(min_e, max_e) + +# Parity plot for forces +axs[1].scatter(f_targets, f_predictions, alpha=0.5, label="FT") +axs[1].axline((np.min(f_targets), np.min(f_targets)), slope=1, ls="--", color="red") +axs[1].set_xlabel("Target force / meV/Å") +axs[1].set_ylabel("Predicted force / meV/Å") +min_f = np.min(np.array([f_targets, f_predictions])) - 2 +max_f = np.max(np.array([f_targets, f_predictions])) + 2 +axs[1].set_title("Force Parity Plot") +axs[1].set_xlim(min_f, max_f) +axs[1].set_ylim(min_f, max_f) +fig.tight_layout() +plt.show() + +# %% +# +# Further fine-tuning examples can be found in the +# `AtomisticCookbook `_ diff --git a/examples/0-beginner/04-parity_plot.py b/examples/0-beginner/04-parity_plot.py index ccb7948105..6c6ed88033 100644 --- a/examples/0-beginner/04-parity_plot.py +++ b/examples/0-beginner/04-parity_plot.py @@ -1,4 +1,5 @@ """ + Model validation with parity plots ================================== diff --git a/examples/0-beginner/options-ft-eval.yaml b/examples/0-beginner/options-ft-eval.yaml new file mode 100644 index 0000000000..c81a193341 --- /dev/null +++ b/examples/0-beginner/options-ft-eval.yaml @@ -0,0 +1,8 @@ +systems: ethanol_reduced_100.xyz + +targets: + energy/finetune: + key: energy + unit: eV + forces: + key: forces diff --git a/examples/0-beginner/options-ft.yaml b/examples/0-beginner/options-ft.yaml new file mode 100644 index 0000000000..6035788aa9 --- /dev/null +++ b/examples/0-beginner/options-ft.yaml @@ -0,0 +1,31 @@ +architecture: + name: pet + training: + batch_size: 8 + num_epochs: 10 + learning_rate: 1e-3 + warmup_fraction: 0.01 + finetune: + method: full + read_from: pet-mad-v1.1.0.ckpt + +training_set: + systems: + read_from: ethanol_reduced_100.xyz + reader: ase + length_unit: angstrom + targets: + energy/finetune: + quantity: energy + read_from: ethanol_reduced_100.xyz + reader: ase + key: energy + unit: eV + description: pbe energy ethanol + forces: + read_from: ethanol_reduced_100.xyz + reader: ase + key: forces + +validation_set: 0.1 +test_set: 0.1 diff --git a/src/metatrain/pet/documentation.py b/src/metatrain/pet/documentation.py index db32c610b9..fe60466a44 100644 --- a/src/metatrain/pet/documentation.py +++ b/src/metatrain/pet/documentation.py @@ -202,5 +202,5 @@ class TrainerHypers(TypedDict): } 
"""Parameters for fine-tuning trained PET models. - See :ref:`fine-tuning` for more details. + See :ref:`label_fine_tuning_concept` for more details. """