# Svetlana Borovkova: Transfer and reinforcement learning

By Svetlana Borovkova, Head of Quant Modelling at Probability & Partners

In my previous column on Machine Learning applications in finance I discussed credit and market risk. Today, I will address another exciting ML application: pricing and hedging financial derivatives.

You may recall that a derivatives contract (such as an option) derives its value from the value of an underlying asset, such as a stock, an index, a currency or a commodity. The value of an option is determined by its maturity and exercise price as well as the current price of the underlying, its volatility and the interest rate.

If all these parameters are known, then the option’s price (even for an exotic and complex option) is just a deterministic function of these parameters – albeit a very complicated and nonlinear one. In some rare situations, this function is known. For European puts and calls, it is the famous Black-Scholes formula. But for most options, it is unknown, and we try to approximate it by, for instance, finite difference methods or Monte Carlo simulations.
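To make the contrast concrete, here is a minimal sketch in Python of both routes for a European call on a non-dividend-paying stock: the closed-form Black-Scholes price, and a plain Monte Carlo estimate that simulates terminal prices under geometric Brownian motion and averages the discounted payoff. The function names and the example parameters are illustrative assumptions, not taken from the column.

```python
import math
import random


def norm_cdf(x: float) -> float:
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(s: float, k: float, t: float, r: float, sigma: float) -> float:
    """Closed-form Black-Scholes price of a European call (no dividends)."""
    d1 = (math.log(s / k) + (r + 0.5 * sigma**2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)


def black_scholes_put(s: float, k: float, t: float, r: float, sigma: float) -> float:
    """European put via put-call parity: P = C - S + K * exp(-r * T)."""
    return black_scholes_call(s, k, t, r, sigma) - s + k * math.exp(-r * t)


def monte_carlo_call(s: float, k: float, t: float, r: float, sigma: float,
                     n_paths: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same call: simulate S_T, average the discounted payoff."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma**2) * t
    vol = sigma * math.sqrt(t)
    total = 0.0
    for _ in range(n_paths):
        s_t = s * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        total += max(s_t - k, 0.0)
    return math.exp(-r * t) * total / n_paths
```

For an at-the-money one-year call (S = K = 100, r = 5%, σ = 20%), the closed form gives about 10.45 and the Monte Carlo estimate lands within a few cents of it. For options without a closed form, only the simulation route remains.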

**Machine Learning in pricing functions**

The deterministic relation between the option’s price and its parameters makes this a perfect task for Machine Learning: you may recall that Neural Networks are universal approximators of nonlinear functions. So if we feed enough training samples into a neural net (a large number of option prices for various strikes, maturities, volatilities et cetera), then we should be able to approximate the pricing function very closely.
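As an illustration of the idea (not the network used in our own work), the sketch below fits a one-hidden-layer neural net to option prices. For simplicity it assumes the "true" pricing function is Black-Scholes with zero interest rate, so labels can be computed exactly. The inputs are moneyness K/S, maturity and volatility, and the target is the call price divided by spot. It uses NumPy and plain full-batch gradient descent. The parameter ranges, layer size and learning rate are arbitrary choices for the example.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)  # element-wise error function without SciPy


def normal_cdf(x):
    return 0.5 * (1.0 + _erf(x / math.sqrt(2.0)))


def bs_call_over_spot(m, t, sigma):
    """Black-Scholes call price divided by spot, for moneyness m = K/S and r = 0."""
    d1 = (-np.log(m) + 0.5 * sigma**2 * t) / (sigma * np.sqrt(t))
    d2 = d1 - sigma * np.sqrt(t)
    return normal_cdf(d1) - m * normal_cdf(d2)


def make_training_set(n, seed=0):
    """Random (moneyness, maturity, volatility) inputs labelled with model prices."""
    rng = np.random.default_rng(seed)
    m = rng.uniform(0.8, 1.2, n)
    t = rng.uniform(0.1, 2.0, n)
    sigma = rng.uniform(0.1, 0.5, n)
    return np.column_stack([m, t, sigma]), bs_call_over_spot(m, t, sigma)[:, None]


def train_mlp(x, y, hidden=32, lr=0.02, epochs=3000, seed=0):
    """One-hidden-layer tanh network fitted by full-batch gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    mu, sd = x.mean(axis=0), x.std(axis=0)
    xs = (x - mu) / sd  # standardised inputs
    w1 = rng.normal(0.0, 1.0 / math.sqrt(x.shape[1]), (x.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 1.0 / math.sqrt(hidden), (hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        h = np.tanh(xs @ w1 + b1)
        err = h @ w2 + b2 - y
        losses.append(float(np.mean(err**2)))
        g_out = 2.0 * err / len(y)           # d(MSE)/d(prediction)
        g_h = (g_out @ w2.T) * (1.0 - h**2)  # back-propagate through tanh
        w2 -= lr * (h.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        w1 -= lr * (xs.T @ g_h)
        b1 -= lr * g_h.sum(axis=0)

    def predict(x_new):
        return np.tanh(((x_new - mu) / sd) @ w1 + b1) @ w2 + b2

    return predict, losses
```

In practice one would use a deep-learning library, a better optimiser and far more samples. The point here is only that the network learns the price surface purely from (inputs, price) examples.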

The problem is that we do not have extensive datasets of option prices for a wide range of parameters. A typical neural net needs hundreds of thousands of training samples to train properly, and such large historical datasets of option prices are rarely available.

We can, however, generate as many hypothetical option prices as we need from some reasonably realistic model, such as SABR or Heston. These are called *synthetic data*. Now the million dollar question is: if we train our neural net on such synthetic data, will it be able to cope with pricing actual options in real trading situations? In other words: can our machine learn to build with toy bricks and then go and build a real house?
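A minimal sketch of how such synthetic data might be produced from the Heston model is shown below. It uses a simple Euler Monte Carlo scheme with "full truncation", in which negative variance is replaced by zero. The parameter ranges in `synthetic_dataset` are hypothetical. In practice they would come from a calibration to market data, which this sketch does not perform. All strikes for one parameter set are priced on the same simulated paths.

```python
import math
import random


def heston_call_prices(s0, strikes, t, r, v0, kappa, theta, xi, rho,
                       n_paths=10_000, n_steps=50, seed=0):
    """Monte Carlo prices of European calls under the Heston model.

    d ln S = (r - v/2) dt + sqrt(v) dW1,  dv = kappa (theta - v) dt + xi sqrt(v) dW2,
    corr(dW1, dW2) = rho. Euler scheme with full truncation of the variance.
    """
    rng = random.Random(seed)
    dt = t / n_steps
    sqrt_dt = math.sqrt(dt)
    rho_perp = math.sqrt(1.0 - rho * rho)
    payoff_sums = [0.0] * len(strikes)
    for _ in range(n_paths):
        log_s, v = math.log(s0), v0
        for _ in range(n_steps):
            z1 = rng.gauss(0.0, 1.0)
            z2 = rho * z1 + rho_perp * rng.gauss(0.0, 1.0)  # correlated shock
            v_pos = max(v, 0.0)  # full truncation
            log_s += (r - 0.5 * v_pos) * dt + math.sqrt(v_pos) * sqrt_dt * z1
            v += kappa * (theta - v_pos) * dt + xi * math.sqrt(v_pos) * sqrt_dt * z2
        s_t = math.exp(log_s)
        for i, k in enumerate(strikes):
            payoff_sums[i] += max(s_t - k, 0.0)
    discount = math.exp(-r * t)
    return [discount * p / n_paths for p in payoff_sums]


def synthetic_dataset(n_samples, seed=0):
    """Hypothetical generator: random Heston parameters -> rows of (features, price)."""
    rng = random.Random(seed)
    strikes = [80.0, 90.0, 100.0, 110.0, 120.0]
    rows = []
    for i in range(n_samples):
        params = dict(v0=rng.uniform(0.01, 0.09), kappa=rng.uniform(0.5, 3.0),
                      theta=rng.uniform(0.01, 0.09), xi=rng.uniform(0.1, 0.8),
                      rho=rng.uniform(-0.9, 0.0))
        t = rng.uniform(0.25, 2.0)
        prices = heston_call_prices(100.0, strikes, t, 0.02, n_paths=2_000,
                                    n_steps=25, seed=seed + i, **params)
        for k, p in zip(strikes, prices):
            rows.append({**params, "t": t, "strike": k, "price": p})
    return rows
```

A useful sanity check: with `xi = 0` and `v0 = theta`, the variance never moves, Heston collapses to Black-Scholes with volatility sqrt(theta), and the simulated prices should match the closed-form ones.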

This is the holy grail for this application of Machine Learning, and it is called transfer learning. There is mounting evidence – including our own work – that transfer learning can work, if synthetic data are generated from a realistic model, calibrated to market data. However, further work is needed, as there seem to be certain regions of parameter values (like out-of-the-money or long maturity options) where such transfer learning is still less successful.

**Reinforcement learning for hedging**

An exciting new class of Machine Learning algorithms – so-called reinforcement learning – goes further than learning a function connecting inputs to the output and considers the consequences of acting upon its predictions. These algorithms sequentially tune their parameters according to the rewards associated with actions.
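The simplest setting that shows "tuning parameters according to rewards associated with actions" is the multi-armed bandit. The sketch below is a generic textbook epsilon-greedy learner, not the algorithm from our white paper. It keeps one value estimate per action, picks the best-looking action most of the time, explores at random otherwise, and nudges the estimate of the chosen action towards each observed reward. The reward means and noise level are made-up inputs for illustration.

```python
import random


def epsilon_greedy_bandit(true_means, n_steps=2000, epsilon=0.1, noise=0.1, seed=0):
    """Learn action values from rewards alone (epsilon-greedy multi-armed bandit)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    q = [0.0] * n_arms   # current estimate of each action's expected reward
    counts = [0] * n_arms
    total_reward = 0.0
    for _ in range(n_steps):
        if rng.random() < epsilon:
            a = rng.randrange(n_arms)                   # explore
        else:
            a = max(range(n_arms), key=lambda i: q[i])  # exploit
        reward = rng.gauss(true_means[a], noise)        # environment's response
        counts[a] += 1
        q[a] += (reward - q[a]) / counts[a]             # incremental sample mean
        total_reward += reward
    return q, counts, total_reward
```

With `true_means = [0.1, 0.5, 0.9]` the learner ends up choosing the third action most of the time, without ever being told which action is best. Full reinforcement learning adds a changing state (such as the spot price and time to expiry in hedging), but the reward-driven update is the same in spirit.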

For example, if a neural net is used to make stock market predictions, the reward could be the amount of profit associated with each trade based on these predictions. The problem of hedging an option is another example: here hedging decisions must be made continuously throughout the lifetime of an option, and these decisions are accompanied by a clear notion of ‘reward’ (like hedge costs). So reinforcement learning is an ideal tool for hedging (and has already been applied there), but the problem here is essentially the same: the lack of data.
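To show how hedging fits this framework, here is a minimal sketch of a hedging "environment" for a short call. At each step the policy sees the spot price, the time to expiry and its current hedge, and chooses a new number of shares. The reward is the mark-to-model P&L of the hedged position minus proportional trading costs. This is one common way to define the reward, assumed here rather than taken from our paper. Prices follow geometric Brownian motion with zero interest rate, and the Black-Scholes delta hedge serves as the traditional baseline. An RL agent would replace `make_delta_policy` with a learned policy. All names and parameter values are illustrative.

```python
import math
import random
import statistics


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(s, k, tau, sigma):
    """Black-Scholes call value and delta with zero rates; payoff at expiry."""
    if tau <= 0.0:
        return max(s - k, 0.0), (1.0 if s > k else 0.0)
    d1 = (math.log(s / k) + 0.5 * sigma**2 * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return s * norm_cdf(d1) - k * norm_cdf(d2), norm_cdf(d1)


def hedge_episode(policy, rng, s0=100.0, k=100.0, t=0.25, sigma=0.2,
                  n_steps=63, cost_rate=0.001):
    """One episode of hedging a short call; returns the per-step rewards.

    State given to the policy: (spot, time to expiry, current shares held).
    Action: number of shares to hold until the next step.
    Reward: P&L of (shares - option), marked to Black-Scholes, minus trading cost.
    """
    dt = t / n_steps
    s, position = s0, 0.0
    option_value, _ = bs_call(s, k, t, sigma)
    rewards = []
    for step in range(n_steps):
        tau = (n_steps - step) * dt
        new_position = policy(s, tau, position)
        cost = cost_rate * abs(new_position - position) * s
        position = new_position
        z = rng.gauss(0.0, 1.0)
        s_next = s * math.exp(-0.5 * sigma**2 * dt + sigma * math.sqrt(dt) * z)
        next_value, _ = bs_call(s_next, k, tau - dt, sigma)  # payoff at the last step
        rewards.append(position * (s_next - s) - (next_value - option_value) - cost)
        s, option_value = s_next, next_value
    return rewards


def make_delta_policy(k=100.0, sigma=0.2):
    """Traditional baseline: hold the Black-Scholes delta."""
    def policy(s, tau, position):
        return bs_call(s, k, tau, sigma)[1]
    return policy


def no_hedge(s, tau, position):
    return 0.0
```

Summing an episode's rewards gives its total hedged P&L. The spread of that total across many simulated episodes (plus the average trading cost) is the kind of quantity a learned policy would be compared on against the delta hedge.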

So, we asked ourselves two questions: will a reinforcement learning algorithm, trained on just one type of option (one for which we have many quotes), be able to cope with hedging a wide variety of options? And, more importantly: if we train this algorithm on synthetic option prices, will it transfer its acquired knowledge to the real hedging environment?

It turns out that the answer to both questions is ‘yes’. First, it is possible to ‘generalize’ the acquired hedging knowledge to a wider range of options than those the algorithm was trained on. And second, in the real hedging environment, the hedging costs obtained by the reinforcement learning algorithm (trained on synthetic data) are 30% lower than those of traditional Black-Scholes hedging strategies. So knowledge transfer works – but it can be improved further, and that is something we are working on.

**Further applications for reinforcement learning**

Reinforcement learning can be applied more widely, to any sequential decision-making problem where a reward is clearly associated with an action. One promising application is active asset management and stock selection. An ‘action’ here is portfolio rebalancing and the ‘reward’ is the portfolio P/L in the next holding period. We and others are exploring this and the results look promising.
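In code, the action and reward described above might look like the following minimal sketch. It assumes a fully invested portfolio, proportional trading costs on turnover, and a policy that maps current weights to new target weights. All names here are hypothetical. The point is only that each rebalancing decision is scored by the next period's P/L net of costs, which is exactly the signal a reinforcement learning agent would learn from.

```python
def rebalance_reward(old_weights, new_weights, asset_returns, value, cost_rate=0.001):
    """Reward for one rebalancing action: next-period P/L net of trading costs.

    Assumes a fully invested portfolio and costs proportional to turnover.
    """
    turnover = sum(abs(w_new - w_old) for w_new, w_old in zip(new_weights, old_weights))
    cost = cost_rate * turnover * value
    pnl = value * sum(w * r for w, r in zip(new_weights, asset_returns))
    return pnl - cost


def run_episode(policy, returns_by_period, initial_weights, value=1.0, cost_rate=0.001):
    """Apply a policy (current weights -> target weights) period by period."""
    weights, rewards = list(initial_weights), []
    for period_returns in returns_by_period:
        new_weights = policy(weights)
        reward = rebalance_reward(weights, new_weights, period_returns, value, cost_rate)
        rewards.append(reward)
        value += reward
        # Holdings drift with returns before the next rebalancing decision.
        grown = [w * (1.0 + r) for w, r in zip(new_weights, period_returns)]
        total = sum(grown)
        weights = [g / total for g in grown]
    return rewards, value
```

For example, moving a 1,000,000 portfolio from 50/50 to 60/40 ahead of returns of +2% and -1% earns 8,000 gross, less 200 in costs on 20% turnover, for a reward of 7,800.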

Get in touch with us if you are thinking of applying Machine Learning to your asset or risk management process – we will be happy to explore this together! And if you want to know more about reinforcement learning for hedging, have a look at our Probability & Partners white paper by Alexandru Giurca and myself.

**Probability & Partners is a Risk Advisory Firm offering integrated risk management and quantitative modelling solutions to the financial sector and data-driven enterprises.**