Fine-tuning a large language model (LLM) opens up exciting possibilities in NLP, especially when tackling real-world challenges. Recently, I set out to enhance an LLM for German address correction using open source data, exploring the difficulties of adapting a model to this specific task. In this post, I’ll share my experience, key takeaways, and practical tips for anyone looking to tackle a similar issue.

Understanding the Significance of Address Correction
Address correction plays an important role in many sectors, including delivery services and location-based applications: keeping address data accurate, up-to-date, and standardized across systems ensures smooth operations, reduces costs, and builds customer trust. In Germany, where address formatting involves specific rules, umlauts, and unique street names, the task becomes even more challenging. Understanding these complexities is the first step towards effective fine-tuning.
Setting Up the Environment
Before diving into the fine-tuning process, I carefully prepared my environment. Here’s what I did:
Choose the Right LLM: I selected a pre-trained model renowned for its language understanding capabilities. This choice formed a strong foundation for my specific objectives.
Install Necessary Libraries: I ensured I had the latest versions of key libraries, such as Hugging Face Transformers, PyTorch and Unsloth. These tools are essential for working with LLMs.
Prepare the Dataset: Crafting the dataset was a critical phase. I collected over 10.4K German addresses. For the training set, I focused on the city-state of Berlin, using data from OpenStreetMap (see the sketch below for one way such data can be pulled).
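For readers who want to reproduce the data collection, here is a minimal sketch of how Berlin addresses could be pulled from OpenStreetMap via the public Overpass API. The endpoint, query, and the lowercase/ß-to-ss normalization are assumptions made for illustration, not the exact export pipeline I used:

```python
import requests

# Public Overpass API endpoint (assumption -- any Overpass mirror works)
OVERPASS_URL = "https://overpass-api.de/api/interpreter"

# All nodes/ways inside the Berlin admin area that carry street + postcode tags
QUERY = """
[out:json][timeout:180];
area["name"="Berlin"]["admin_level"="4"]->.berlin;
(
  node["addr:street"]["addr:postcode"](area.berlin);
  way["addr:street"]["addr:postcode"](area.berlin);
);
out tags;
"""

def fetch_berlin_addresses():
    response = requests.post(OVERPASS_URL, data={"data": QUERY})
    response.raise_for_status()
    addresses = set()
    for element in response.json()["elements"]:
        tags = element.get("tags", {})
        street = tags.get("addr:street", "")
        postcode = tags.get("addr:postcode", "")
        city = tags.get("addr:city", "Berlin")
        if street and postcode:
            # Normalize like the training examples: lowercase, ß -> ss
            street = street.lower().replace("ß", "ss")
            # "City, Postal Code, Street", matching the sample pairs shown later
            addresses.add(f"{city}, {postcode}, {street}")
    return sorted(addresses)
```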
With the environment set, I was ready to tackle the task ahead.
Data Preparation: Shaping Raw Data into Strategic Value
A major pain point in any address correction (or, really, any AI) project is data preparation. My dataset included:
Valid German addresses
Common formatting errors
Misspellings or incomplete addresses
Various types of additional noise to simulate problematic addresses
I followed a structured approach to clean and preprocess the data, including removing non-German addresses. Importantly, I created pairs of incorrect and correct addresses; this is essential for teaching the model how to identify and correct errors. Below I include the prompt used for the LLM's fine-tuning, as well as a couple of samples used for training.
Prompt:
"""You are an address corrector. Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
Correct (if false) the given address #ADDRESS below which follows this structure: Street, Postal Code, City\n#ADDRESS: {}
### Response:
{}
"""
Below I provide some pairs of Instruction and Response that fill in the above template, so the model learns both the format of an address and the street names used in Berlin. To keep the task simple, I removed the house numbering and focused on the remaining parts: postal code, street name, and overall address structure.
Example pairs:
Correct (if false) the given address #ADDRESS below which follows this structure: Street, Postal Code, City\n#ADDRESS: Berlin, 12347, hamnemannstrass
Berlin, 12347, hannemannstrasse
Correct (if false) the given address #ADDRESS below which follows this structure: Street, Postal Code, City\n#ADDRESS: Berlin, 11123, wilhelm-caspar-wielely-pgatz
Berlin, 10623, wilhelm-caspar-wegely-platz
Correct (if false) the given address #ADDRESS below which follows this structure: Street, Postal Code, City\n#ADDRESS: 10785, Berlin, else-laskear-schueler-stradsse
Berlin, 10783, else-lasker-schueler-strasse
Correct (if false) the given address #ADDRESS below which follows this structure: Street, Postal Code, City\n#ADDRESS: Berlin, 10713, offmaan-vol-fallersneben-platz
Berlin, 10713, hoffmann-von-fallersleben-platz
This noisy data makes up my training dataset, which includes around 100K samples with noise levels varying from 0 to 50%; e.g., 50% means that half of the characters of an address are affected by a mix of shuffling, addition, or deletion operations.
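To make the noise generation concrete, here is a minimal sketch of how such character-level corruption could be applied. The exact mix of operations and their probabilities are assumptions based on the description above (shuffling approximated by character swaps, plus insertions and deletions):

```python
import random
import string

def add_noise(address: str, noise_level: float, seed=None) -> str:
    """Corrupt roughly `noise_level` (0.0-0.5) of the characters with a
    random mix of swap, insert, and delete operations."""
    rng = random.Random(seed)
    chars = list(address)
    n_ops = int(len(chars) * noise_level)
    for _ in range(n_ops):
        op = rng.choice(["swap", "insert", "delete"])
        pos = rng.randrange(len(chars))
        if op == "swap" and len(chars) > 1:
            other = rng.randrange(len(chars))
            chars[pos], chars[other] = chars[other], chars[pos]
        elif op == "insert":
            chars.insert(pos, rng.choice(string.ascii_lowercase))
        elif op == "delete" and len(chars) > 3:
            chars.pop(pos)
    return "".join(chars)

# Build (noisy, correct) training pairs with a noise level drawn from 0-50%
clean = "Berlin, 12347, hannemannstrasse"
pairs = [(add_noise(clean, random.uniform(0.0, 0.5)), clean) for _ in range(5)]
```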
Training the Model: Fine-Tuning Process
With my dataset ready, I began the fine-tuning process. I chose a good-enough and easy-to-train open-source LLM, namely "Llama-3.2-3B-Instruct-bnb-4bit". For the fine-tuning process, I used LoRA, which means the LLM's weights are frozen and I train LoRA adapter weights on top of them to teach the model the new task.
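Roughly, loading the 4-bit base model and attaching the LoRA adapters with Unsloth looks like the following. The rank, alpha, and target modules are illustrative values, not necessarily the exact configuration I ended up with:

```python
from unsloth import FastLanguageModel

# Load the quantized base model; its weights stay frozen during LoRA training
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct-bnb-4bit",
    max_seq_length=512,   # prompts containing a single address are short
    load_in_4bit=True,
)

# Add the trainable LoRA adapter weights on top of the frozen base model
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                 # illustrative rank -- raise it until VRAM is fully used
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
)
```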
Choosing Hyperparameters
Getting hyperparameters right is critical. I started with:
Learning Rate: I set this at 1e-4. This value balances speed and model performance during fine-tuning
Batch Size: I used a batch size of 8, and gradient accumulation steps of 16 which helped optimize training speed while managing memory
Optimizer: for a good balance of performance and memory, I used adamw_8bit with a weight_decay of 0.01 and a linear lr_scheduler_type
For the LoRA weights, I chose the configuration that pushes GPU VRAM to full capacity without spilling over into system RAM (i.e., no paging).
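Putting those choices together, the training arguments could look roughly like this (epoch count, warmup, and logging cadence are assumptions; the rest mirrors the values listed above):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=16,   # effective batch size of 128
    optim="adamw_8bit",
    weight_decay=0.01,
    lr_scheduler_type="linear",
    num_train_epochs=3,               # assumption; the loss flattened after ~1 epoch
    warmup_steps=10,                  # assumption
    logging_steps=10,                 # assumption
    output_dir="outputs",
    report_to="tensorboard",          # surfaces the loss curves in TensorBoard
)
```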
Implementing the Training Loop
My training loop was straightforward, focusing on:
Input Formatting: Each incorrect/correct address pair was formatted into the prompt template above and fed to the model as an input-output pair (see the sketch after this list).
Loss Monitoring: I monitored the loss closely during training, aiming for a steady decrease over the training epochs. For monitoring, I used TensorBoard, which is supported by Unsloth and makes the whole task so much easier!
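As a sketch of the two points above, the pairs can be rendered into the prompt template and handed to TRL's SFTTrainer, the approach the Unsloth notebooks use. Depending on your TRL version, dataset_text_field and max_seq_length may need to move into an SFTConfig instead; the helper names here are mine:

```python
from datasets import Dataset
from trl import SFTTrainer

# The same template shown earlier in this post
PROMPT = """You are an address corrector. Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
Correct (if false) the given address #ADDRESS below which follows this structure: Street, Postal Code, City
#ADDRESS: {}
### Response:
{}"""

def build_dataset(pairs, eos_token):
    """pairs: list of (noisy_address, correct_address) tuples."""
    # Append EOS so the model learns where the corrected address ends
    texts = [PROMPT.format(noisy, correct) + eos_token for noisy, correct in pairs]
    return Dataset.from_dict({"text": texts})

dataset = build_dataset(pairs, tokenizer.eos_token)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=512,
    args=training_args,   # the TrainingArguments from the previous sketch
)
trainer.train()
```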
As the figure below shows, the model picks up the task quite easily after the first epoch (around 100 steps) :D

Evaluating and Iterating
To evaluate the fine-tuned LLM, I generated a new set of noisy data and excluded any samples that also appeared in the training dataset, so that evaluation is performed on unseen problematic addresses. The initial fine-tuning results were promising, around 59% accuracy, but there was room for improvement. Here’s how I optimized performance further:
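For reference, the evaluation loop can be as simple as greedy generation plus an exact string match. This is a sketch under the assumption that the same PROMPT template from the training sketch is reused with an empty response slot:

```python
from unsloth import FastLanguageModel

FastLanguageModel.for_inference(model)  # switch Unsloth to its faster inference mode

def correct_address(noisy_address: str) -> str:
    prompt = PROMPT.format(noisy_address, "")           # leave the response slot empty
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
    decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return decoded.split("### Response:")[-1].strip()   # keep only the model's answer

def exact_match_accuracy(eval_pairs) -> float:
    """eval_pairs: (noisy, correct) tuples that never appeared in training."""
    hits = sum(correct_address(noisy) == correct for noisy, correct in eval_pairs)
    return hits / len(eval_pairs)
```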
Error Analysis
I reviewed the most common mistakes the model made. This analysis helped me identify specific patterns in address components that needed more focus.
Data Augmentation
I enriched my dataset by synthesizing new examples. By altering existing addresses to introduce common errors, I trained the model on a broader spectrum of scenarios. This increased the dataset by 40%.
Regularization Techniques
To prevent overfitting, I utilized various regularization strategies, including dropout layers and weight decay. These methods enhanced the model’s ability to generalize better to unseen data.
Results
After a couple of rounds of iteration, the LLM achieved 74.8% accuracy, which in many cases is good enough for production. Let's look at some corrected addresses and some failed ones. For accuracy, I accept only a perfect match; in other words, if even one character differs, the prediction counts as a failure.
Corrected addresses
Input (wrong address) | Output (corrected address)
Berlin, 13088, eumagserntasse | Berlin, 13088, neumagenerstrasse |
Berlin, 12623, neuePnChasNeerstrgspae | Berlin, 12623, neuenhagenerstrasse |
Glienicke/Nordbahn, 16548, eeWethoenstrassb | Glienicke/Nordbahn, 16548, beethovenstrasse |
15307, Berlin, haessilertrseO | Berlin, 12307, haeselerstrasse |
Berlin, 11357, travmVanudersressMe | Berlin, 13357, travemuenderstrasse |
Non-corrected addresses
Input (wrong address) | Output (faulty prediction) | Correct address
Berlin, 13465, enNayksrchetirXrasse | Berlin, 13465, eichenstrasse | Berlin, 13465, enkircherstrasse |
Berlin, 12355, sanglehanc | Berlin, 12355, sangelhanz | Berlin, 12355, langschanweg |
Berlin, 12459, aGudl-tDos-traspe | Berlin, 12459, paul-tauber-strasse | Berlin, 12459, paul-tropp-strasse |
Berlin, 10247, richra-smchstsse | Berlin, 10247, richard-schmidt-strasse | Berlin, 10247, richard-ermisch-strasse |
Practical Tips for Fellow NLP Enthusiasts
From my experience, here are some actionable tips for those looking to fine-tune an LLM for address correction tasks:
Diverse Dataset: A training dataset must cover a wide array of variations and common errors for effective results.
Error Analysis: Conduct thorough analysis after training to understand the model’s weaknesses.
Regularization: Use regularization techniques to minimize overfitting, particularly in smaller datasets.
Continuous Learning: Keep your model adaptive by integrating new data over time to enhance its performance.
User Feedback: If possible, create mechanisms for user feedback to refine the model continually.
Final Thoughts
Fine-tuning a large language model for German address correction has been a rewarding experience. It demanded careful preparation, dedicated training, and a spirit of iteration.
As AI enthusiasts, we have the chance to improve how machines understand and handle language, making navigation easier for everyone. With the right strategy and commitment, anyone can successfully explore this exciting journey in NLP.
I hope my insights and tips help pave the way for your success in fine-tuning LLMs for real-world applications. Happy coding!
** If you are interested in other ML use cases, please contact me using the form (and include a publicly available dataset for your case; I'm always curious to explore new problems).