
Fine-tuning with LlamaIndex

Fine-tuning a model means updating its weights with new data to improve its performance. The payoffs include better results, fewer hallucinations, the ability to incorporate more data holistically, and lower latency and cost. Fine-tuning not only improves output quality; it can also improve how the system retrieves information.

This post was inspired by the LlamaIndex fine-tuning overview, and the examples I’ve prepared are based on the ones shared there. They cover everything discussed in this post and offer valuable insights for anyone interested in gaining a deeper understanding of fine-tuning.

Fine-tune LLMs

Open-source LLMs are significant achievements in their own right. They perform well on many public benchmarks, often matching or even surpassing proprietary models, which makes them a reliable choice for complex applications such as RAG systems and agents. Fine-tuning can further improve these LLMs for specific use cases, such as the following (a minimal code sketch follows the list):

  • Output Styling: Enhance the model’s ability to generate text by improving its understanding of syntax, style, and rules.
  • Domain-Specific Language Improvement: Open-source models sometimes struggle with specific technical languages like SQL or Python. By training with a relevant text-to-code dataset, these models can become more proficient at generating accurate responses in these languages.
  • LLM Distillation: This process focuses on replicating a larger model’s performance for a specific task in a smaller and more cost-effective model. It starts by using the larger model to label data, which then trains a smaller “student” model to perform similarly on that task.
  • Function Calling: Improve the model’s function-calling abilities so that it can reliably extract structured data from text.
  • Re-ranking: Develop a re-ranker for specific domains or datasets to optimize the ordering of search results, enhancing the relevance of the top-ranked nodes.
  • LLM-based Evaluators: Fine-tune evaluators, known as judges, by distilling them from larger models. These evaluators can achieve high agreement with human judgments, allowing specialized evaluators to be built from the larger models they are distilled from.
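As a concrete starting point, here is a minimal sketch of fine-tuning an OpenAI model through LlamaIndex’s OpenAIFinetuneEngine. This is an illustration, not the only path: the import follows the llama-index-finetuning package layout, and the training file name is a placeholder for a JSONL file of OpenAI-format chat messages (which LlamaIndex can capture for you, for example with its OpenAIFineTuningHandler callback).

```python
# Sketch: launch an OpenAI fine-tuning job from LlamaIndex.
# Assumes the llama-index-finetuning package and OPENAI_API_KEY in the environment.
from llama_index.finetuning import OpenAIFinetuneEngine

finetune_engine = OpenAIFinetuneEngine(
    "gpt-3.5-turbo",            # base model to fine-tune
    "finetuning_events.jsonl",  # placeholder: JSONL of chat-format training examples
)

finetune_engine.finetune()                # submits the fine-tuning job to OpenAI
print(finetune_engine.get_current_job())  # poll until the job reports success

# Wrap the resulting model as a LlamaIndex LLM, usable in query engines or agents.
ft_llm = finetune_engine.get_finetuned_model(temperature=0.1)
```

The same engine underlies the distillation workflow mentioned above: generate labeled data with a large model, then fine-tune a smaller one on it.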

Fine-tune Embeddings

The key question is how to improve retrieval. Off-the-shelf embeddings are often not ideal for your specific data, which can keep them from serving your retrieval objective well. Fine-tuning the embeddings is one solution to this problem. To do so, you need training data consisting of text pairs that should be either close together (positive) or far apart (negative). The MultipleNegativesRankingLoss from the sentence-transformers library, which LlamaIndex’s embedding fine-tuning engine uses under the hood, helps here: it only needs positive pairs, which can be generated automatically from unstructured text (for example, by prompting an LLM to write a question for each chunk), while negatives are drawn from the other text chunks in each training batch.
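Below is a minimal sketch of that workflow, assuming the llama-index-finetuning package and its SentenceTransformersFinetuneEngine; the ./data directory, the question-generation LLM, and the model names are placeholders.

```python
# Sketch: fine-tune an open-source embedding model on synthetic QA pairs.
from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter
from llama_index.finetuning import (
    SentenceTransformersFinetuneEngine,
    generate_qa_embedding_pairs,
)
from llama_index.llms.openai import OpenAI

# Chunk the raw corpus into nodes.
docs = SimpleDirectoryReader("./data").load_data()
nodes = SentenceSplitter(chunk_size=512).get_nodes_from_documents(docs)

# Prompt an LLM to write questions for each chunk; every (question, chunk)
# pair becomes a positive training example.
train_dataset = generate_qa_embedding_pairs(
    nodes, llm=OpenAI(model="gpt-3.5-turbo")
)

# The engine trains with MultipleNegativesRankingLoss by default, so the
# other pairs in each batch act as negatives automatically.
finetune_engine = SentenceTransformersFinetuneEngine(
    train_dataset,
    model_id="BAAI/bge-small-en-v1.5",   # base embedding model (assumption)
    model_output_path="ft_embed_model",  # where the fine-tuned model is saved
)
finetune_engine.finetune()
embed_model = finetune_engine.get_finetuned_model()
```

Here are some ways to improve system retrieval through fine-tuning: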

  • Embedding: Develop more meaningful embedding representations to boost retrieval performance by fine-tuning the embedding model over an unstructured text corpus that closely resembles the real data for a specific scenario.
  • Embedding Adapter: A linear adapter applies a transformation to the query embeddings while keeping the document embeddings unchanged. This optimizes the query embeddings for specific data and queries, and it works with any model without the need to re-embed documents (see the sketch after this list).
  • Routers: These are modules that enhance decision-making by selecting the most appropriate data source from multiple options based on a user query.
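For the adapter case specifically, here is a sketch using LlamaIndex’s EmbeddingAdapterFinetuneEngine, which trains a linear transform applied to query embeddings over a frozen base model. It reuses the train_dataset pair format from the previous sketch, and the model names and paths are again placeholders.

```python
# Sketch: train a linear adapter on top of a frozen embedding model.
from llama_index.core.embeddings import resolve_embed_model
from llama_index.finetuning import EmbeddingAdapterFinetuneEngine

# Frozen base model; its document embeddings never change, so the
# existing index does not need to be re-embedded.
base_embed_model = resolve_embed_model("local:BAAI/bge-small-en-v1.5")

finetune_engine = EmbeddingAdapterFinetuneEngine(
    train_dataset,                      # same (question, chunk) pairs as above
    base_embed_model,
    model_output_path="adapter_model",  # where the adapter weights are saved
    epochs=4,
)
finetune_engine.finetune()

# Returns the base model wrapped with the trained adapter applied to queries.
adapted_embed_model = finetune_engine.get_finetuned_model()
```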