Svetlana Borovkova: Finetuning LLMs for financial applications

By Dr. Svetlana Borovkova, Head of Quant Modelling at Probability & Partners

In my previous column, I discussed how Large Language Models (LLMs) are, and could be, applied in financial services. I also outlined several challenges associated with such applications, such as the lack of domain-specific financial knowledge, hallucinations, timeliness and explainability. Fortunately, techniques for dealing with these challenges are rapidly emerging. In this column, I would like to address some of them.

Issues such as hallucinations (producing plausible-sounding but incorrect answers) and timeliness (LLMs are trained on data up to a specific point in time and therefore do not ‘know’ anything that happened afterwards) can be addressed using a technique called retrieval augmented generation (RAG). This technique is rapidly gaining popularity, both in commercial LLM products (such as ChatGPT) and in the development of in-house LLM applications.

Retrieval Augmented Generation

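In RAG, an LLM’s answer is grounded in an external, credible source: before the model responds, relevant passages are retrieved from a database or from the internet and supplied to the model together with the question, so that its answer can be based on, and checked against, those passages. This gives the model access to current and reliable information that lies outside its training dataset.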

To illustrate, think of RAG as the difference between an open-book and a closed-book exam: in RAG, the model searches an ‘external book’ (an external database) for answers, whereas in the traditional approach it tries to generate answers from memory, with all the unfortunate consequences this can entail (such as not knowing the answer and making it up). Access to an external database also makes it possible to verify the accuracy of LLM-generated answers, which helps build trust.
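The retrieve-then-generate loop can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions: the three documents are hypothetical internal memos, retrieval uses plain word overlap (cosine similarity of word counts) where production systems typically use embedding models and a vector database, and the call to the LLM itself is left out. The sketch only builds the augmented prompt that would be sent to the model.

```python
import math
import re
from collections import Counter

# Hypothetical in-house documents acting as the 'external book'
DOCUMENTS = [
    "Counterparty limits are reviewed quarterly by the risk committee.",
    "Bond portfolio duration must stay below 7 years.",
    "Equity trades above 1 million euros require a second approval.",
]


def tokenize(text: str) -> Counter:
    """Bag-of-words representation: lower-cased word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = tokenize(query)
    return sorted(documents, key=lambda d: cosine(q, tokenize(d)), reverse=True)[:k]


def build_prompt(query: str, documents: list[str], k: int = 1) -> str:
    """Augment the question with retrieved sources before it is sent to an LLM."""
    sources = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return (
        "Answer using only the sources below. "
        "If they do not contain the answer, say you do not know.\n"
        f"Sources:\n{sources}\n"
        f"Question: {query}"
    )


question = "Which equity trades require a second approval?"
print(build_prompt(question, DOCUMENTS))
```

The instruction to admit when the sources do not contain the answer is one common way to discourage the model from making things up. Keeping the sources in the prompt also makes it straightforward to show users where an answer came from.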

An added advantage of RAG is that the LLM does not need to be constantly retrained on newly available data, which avoids the associated high costs and computational time. The paid version of ChatGPT (GPT-4) has RAG as an integral part: its answers can draw on information retrieved from the internet, and links to the external sources are provided. For specific in-house LLM applications, either proprietary databases from well-known data providers such as Bloomberg or Refinitiv, or internal databases, are typically used.

Building an LLM from scratch

However, RAG cannot solve the central challenge in using LLMs for financial applications: their generality and lack of domain-specific knowledge. Trained on an extensive, general-purpose text corpus, generative LLMs can cope with a broad spectrum of topics, yet they lack the expertise needed for specific tasks.

One obvious ‘brute-force’ solution to this problem is to develop a bespoke Large Language Model from scratch and train it on an extensive but domain-specific text corpus. The recently unveiled BloombergGPT was developed using this strategy and has generated high expectations in the finance community.

While such an LLM is expected to outperform ChatGPT on financial tasks, it still needs a massive training dataset, which necessarily covers a broad range of financial text. Hence, it may still suffer from being too general and will likely lack the domain expertise needed for highly specific tasks, such as analysing particular classes of documents or generating regulatory-compliant reports. Moreover, creating and training a new LLM is a hugely complex and costly undertaking, with costs running into the tens, if not hundreds, of millions of dollars.

Parameter Efficient Finetuning

An alternative approach is to take an open-source LLM such as LLaMA and retrain, or finetune, it on a domain-specific data corpus. This injects the much-needed domain expertise into the model. However, foundational LLMs have billions of parameters, so a retraining process that modifies all of these parameters remains too compute- and memory-intensive, and hence expensive.
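A back-of-the-envelope calculation shows why. The sketch below assumes a 7-billion-parameter model (a typical size for open-source models such as LLaMA). It also uses a common rule of thumb of roughly 16 bytes per parameter for full finetuning with the Adam optimizer in mixed precision. Memory for activations is ignored, so the real requirement is higher still.

```python
def memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate memory in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


N_PARAMS = 7e9  # assumed model size: 7 billion parameters

# Just storing the weights in 16-bit precision: 2 bytes per parameter.
weights_only = memory_gb(N_PARAMS, 2)

# Full finetuning with Adam in mixed precision (rule of thumb, activations excluded):
# 16-bit weights (2) + 16-bit gradients (2)
# + 32-bit master weights (4) + two 32-bit Adam moment estimates (4 + 4).
full_finetune = memory_gb(N_PARAMS, 2 + 2 + 4 + 4 + 4)

print(f"weights only: {weights_only:.0f} GB")        # weights only: 14 GB
print(f"full finetuning: {full_finetune:.0f} GB")    # full finetuning: 112 GB
```

Most of that memory goes to gradients and optimizer state for every single parameter. This is exactly the cost that parameter-efficient methods avoid by training only a small number of additional parameters.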

A revolutionary new family of techniques called Parameter Efficient Finetuning (PEF) has recently been developed. The most notable methods are Low-Rank Adaptation (LoRA) and Quantized LoRA (QLoRA). These groundbreaking techniques allow a foundational LLM to be retrained or finetuned efficiently, making it feasible to inject the much-needed domain knowledge at a relatively low cost.

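The main idea behind LoRA comes from linear algebra: low-rank matrix approximation, the same principle that underlies Singular Value Decomposition. Instead of adjusting the colossal matrices of weights directly, LoRA keeps them frozen and learns, for each adapted matrix, a small update expressed as the product of two thin, low-rank matrices. This reduces the number of parameters that require adjustment by orders of magnitude; the original LoRA paper reports a reduction of about 10,000 times for GPT-3. I anticipate that the proliferation of LoRA and QLoRA will lead to the creation of a myriad of domain-specific LLMs, in both the commercial and open-source worlds.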
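Here is a minimal sketch of the idea, following the formulation in the original LoRA paper. The frozen weight matrix W (d x k) is augmented with an update B A, where B is d x r, A is r x k, and the rank r is much smaller than d and k. The layer then computes W x + (alpha / r) B A x. The matrix sizes and rank below are illustrative assumptions. The code is pure Python for readability, and the training loop that would update A and B is omitted.

```python
import random


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    """Matrix-vector product with matrices stored as lists of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_param_counts(d: int, k: int, r: int) -> tuple[int, int]:
    """Trainable parameters: full update of a d x k matrix vs. LoRA factors B (d x r) and A (r x k)."""
    return d * k, r * (d + k)


class LoRALinear:
    """Frozen weight matrix W plus a trainable low-rank update B @ A, scaled by alpha / r."""

    def __init__(self, weight: list[list[float]], r: int, alpha: float = 1.0, seed: int = 0):
        rng = random.Random(seed)
        d, k = len(weight), len(weight[0])
        self.weight = weight  # pretrained weights: kept frozen during finetuning
        self.scale = alpha / r
        # A (r x k) starts with small random values; B (d x r) starts at zero,
        # so B @ A is zero and the layer initially matches the pretrained one.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(k)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d)]

    def forward(self, x: list[float]) -> list[float]:
        """Compute W x + (alpha / r) * B (A x) without ever forming the d x k update."""
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]


full, lora = lora_param_counts(4096, 4096, 8)
print(full, lora, full // lora)  # 16777216 65536 256

layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], r=1)
print(layer.forward([1.0, 1.0]))  # [3.0, 7.0]: B is zero, so the output equals W x
```

Because B starts at zero, the adapted layer initially behaves exactly like the pretrained one. Finetuning then changes only A and B. For a single 4096 x 4096 matrix with rank 8, this means 65,536 instead of 16,777,216 trainable parameters, a 256-fold reduction. Larger overall reductions reported in the literature come from adapting only some of a model's weight matrices.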

Data quality and explainability

As with RAG (but perhaps even more so), the importance of large quantities of relevant, high-quality data for LLM finetuning cannot be overstated. That is why data providers are uniquely positioned to leverage their market advantage once RAG and PEF find their way into the development of specialised in-house LLMs. But internal databases can be equally important for this task.

One problem for which no feasible solutions have yet been developed is the explainability of an LLM’s output. I hope that an army of researchers is working on this problem as we speak, especially in light of the recent EU AI Act, which requires AI models to be both fair and explainable.

The techniques outlined above, especially PEF, are still quite far removed from the way the general public uses LLMs. So, currently, the easiest and most ‘democratic’ way to make an LLM do what you want is clever prompt engineering. This approach does not entail any modification of the model but relies on giving the LLM precise instructions. Prompts should be engineered to steer the model toward the intended outcome, whether that is a particular format, length or style. Various techniques can be used to craft effective prompts, but prompt engineering remains more of an art than a science.
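As an illustration, the sketch below assembles a prompt that spells out the role, task, output format, length and style explicitly. The function name, the wording of the instructions and the sample text are hypothetical. They show one way of structuring a prompt, not a recipe that is guaranteed to work for every model.

```python
def make_prompt(task: str, text: str, output_format: str, max_words: int, style: str) -> str:
    """Combine explicit instructions and the input text into a single prompt string."""
    lines = [
        "You are an assistant supporting a financial risk team.",  # role
        f"Task: {task}",
        f"Output format: {output_format}",
        f"Length: at most {max_words} words.",
        f"Style: {style}",
        # Asking the model to flag missing information discourages made-up answers
        "If the text does not contain the required information, say so instead of guessing.",
        "Text:",
        text,
    ]
    return "\n".join(lines)


prompt = make_prompt(
    task="Summarise the main risks mentioned in the text.",
    text="(hypothetical) Q3 note: credit spreads widened and loan-loss provisions increased.",
    output_format="a bulleted list",
    max_words=50,
    style="neutral and factual",
)
print(prompt)
```

Keeping each requirement on its own line makes the prompt easy to adjust and compare across iterations. That helps with the trial-and-error nature of prompt engineering.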

Probability & Partners is a Risk Advisory Firm offering integrated risk management and quantitative modelling solutions to the financial sector and data-driven enterprises.