In last week’s post, we saw that, in a simple framework where the market maker faces purely uninformed flow, a statistical arbitrage exists: the market maker makes money at a constant, positive rate per unit of risk. We also saw that one way to recover a notion of market equilibrium is to introduce market impact, which cancels the market maker’s ability to make money on average.
In this post, we are interested in what happens when the market maker’s counterparty (the liquidity taker) has a forecast (also called an alpha) and trades on it. We first examine how the market maker’s profitability is affected as a function of the quality of the taker’s forecast. A standard way to measure this quality is the so-called “information coefficient”, or IC, which is just the correlation between the forecast and the asset return it is supposed to predict:

$$\mathrm{IC} \;=\; \operatorname{corr}\!\left(e_t,\; \frac{s_{t+h}-s_t}{\sigma\sqrt{h}}\right)$$

where e_t is the normalized forecast, s_t is the stock price (assumed driftless), σ its (normal, i.e. additive) volatility, and h the forecast’s horizon. From this, we need to specify how the informed traders monetize their forecast. It is reasonable to assume that their aggregate position is increasing in the strength of the forecast e_t. Recall that in an Avellaneda-Stoikov framework, buying at the best ask (resp. selling at the best bid) is modeled by a Poisson process with intensity

$$\lambda^a_t = A\,e^{-k\,\delta^a_t}, \qquad \lambda^b_t = A\,e^{-k\,\delta^b_t},$$
where A > 0, k > 0, and δ^a_t, δ^b_t are the ask- and bid-side spreads quoted by the market maker around the mid price s_t. To accommodate informed flow, a simple solution consists in modulating the baseline buying/selling intensity, turning it into a function of the forecast:

$$\lambda^a_t = A\,e^{-k\,\delta^a_t}\,f(\kappa e_t), \qquad \lambda^b_t = A\,e^{-k\,\delta^b_t}\,\bigl(1-f(\kappa e_t)\bigr), \qquad f(\theta)=\frac{1}{1+e^{-\theta}},$$

so that the baseline intensities are scaled by a factor between 0 and 1 according to the forecast’s strength, in the appropriate direction: a strongly positive forecast shifts the taker’s flow toward buying at the ask, adversely selecting the market maker. (One could justify this phenomenological model from an agent-based perspective, but that isn’t our focus here.)
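Before looking at the simulation results, here is a minimal stand-alone sketch of this forecast construction and of the intensity modulation; the variable names mirror the full script at the end of the post, and the print is just a sanity check that the realized correlation is close to the prescribed IC.

import numpy as np

rng = np.random.default_rng(0)
n, h = 23400, 900                 # one 6.5-hour day in seconds, 15-minute horizon
sigma, IC, kappa = 0.01, 0.3, 2.0

# driftless mid price with additive (normal) volatility
s = 20. + np.cumsum(sigma * rng.standard_normal(n))

# normalized h-step-ahead return (the quantity the forecast predicts)
z = (s[h:] - s[:-h]) / (sigma * np.sqrt(h))

# forecast = signal + orthogonal noise, so that corr(e, z) ~ IC
e = IC * z + np.sqrt(1. - IC**2) * rng.standard_normal(n - h)
print(np.corrcoef(e, z)[0, 1])    # close to 0.3, up to overlapping-window noise

# logistic modulation of the taker's buy/sell intensities
f = 1. / (1. + np.exp(-kappa * e))
buy_factor, sell_factor = f, 1. - f   # multiply the ask/bid baseline intensities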
We can now run Monte-Carlo simulations for different values of the forecast’s IC and compute the market maker’s pnl (mean, standard deviation, and Sharpe ratio). Here’s the result:
(non-annualized Sharpe). As expected, the Sharpe ratio drops as the forecast IC increases, illustrating how important it is for a market maker to know how to deal with toxic flow (that is, counterparties with high ICs).
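For clarity, the Sharpe reported here is computed directly from the simulated daily pnls, exactly as in the Stats dictionary of the script below; the √252 annualization is only mentioned for reference and isn’t applied anywhere in this post.

import numpy as np

rng = np.random.default_rng(0)
daily_pnls = rng.normal(50., 500., 100)          # stand-in for DailyPnls[param]
sharpe = daily_pnls.mean() / daily_pnls.std()    # non-annualized (daily) Sharpe
print(sharpe, sharpe * np.sqrt(252))             # daily vs annualized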
Let’s examine now the role of the forecast horizon h, for a given IC. Does the impact of toxic flow depend on the forecast horizon h? The answer is, hell yes:
I plotted the market maker’s Sharpe as a function of the flow frequency 1/h, for a fixed IC. The reason for this curve’s shape is that there is a resonant frequency at which the market maker is most vulnerable; it depends on the half-life of the market maker’s inventory turnover, which in turn ultimately depends on his risk-aversion parameter. This can be justified theoretically and may be the subject of another post. (If you want to derive it on your own, I suggest you look at the diffusive limit to make your life a little easier.)
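To make the resonance story concrete, one can estimate the half-life of the simulated inventory path (the array q below, saved as Pos[param]) and compare it to the forecast horizon h. The AR(1) assumption in this sketch is mine, a crude but convenient simplification, not a result from the post.

import numpy as np

def inventory_half_life(q, deltat=1.):
    # assumes the inventory mean-reverts roughly like an AR(1) process,
    # whose autocorrelation decays as rho**lag
    rho = np.corrcoef(q[:-1], q[1:])[0, 1]   # lag-1 autocorrelation
    if not (0. < rho < 1.):
        return np.inf                        # no measurable mean reversion
    return -np.log(2.) / np.log(rho) * deltat

# usage after a run of the script below, where Pos[param] holds the
# last simulated day's inventory at 1-second steps:
# print(inventory_half_life(Pos[0.3]))       # in seconds, to compare with h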
In practice, market makers have rules of thumb to manage this phenomenon. Typically, flow is either i) segregated by frequency (high-frequency, medium, slow, etc.), with risk parameters (and therefore the resonant frequency) adjusted within each bucket; or ii) blended into a mix in which the influence of flow at the bad frequency is diluted. Market makers obviously have alphas of their own as well, although these tend to lie at the very high end of the frequency spectrum, for the most part.
(Python code below for paying subscribers.)
Quantitatively Yours,
import datetime as dt
import numpy as np
# parameters ------
np.random.seed(10)
ndays = 100                      # number of simulated trading days
deltat = 1                       # time step, in seconds
StartDate = dt.datetime(2024, 3, 19, 9, 30, 0)
date_list = [StartDate + dt.timedelta(seconds=x * deltat) for x in range(0, int(6.5 * 3600))]
date_text = [x.strftime('%Y-%m-%d %H:%M:%S') for x in date_list]
n = len(date_list)               # steps per day (6.5 trading hours)
DeltaT = n                       # horizon term, kept constant in the quoting formula
A = 0.05                         # baseline order-arrival intensity (per second)
s0 = 20.                         # initial mid price
sigma = s0 * 0.02 / np.sqrt(n)   # per-step vol, i.e. 2% daily volatility
k = np.log(2.) / 0.01            # fill intensity halves per extra cent of quote depth
q0 = 100.                        # lot size per fill
gamma = 1e-2 / q0                # risk aversion
kappa = 2                        # steepness of the logistic intensity modulation
h = 15 * 60                      # forecast horizon, in seconds
IC = 1                           # overwritten by the sweep below
ParamName = 'IC'
AllParams = [0, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5]
ParamMultiplier = 1
#ParamName = 'horizon (min)'
#AllParams = [10,20,30,40,50,60,70,80,90,100,110,120,3*60,4*60,5*60,6*60,7*60,8*60,9*60,
#             10*60,11*60,12*60,13*60,14*60,15*60,20*60,25*60,30*60,60*60,90*60,120*60,150*60,180*60,210*60,240*60]
#ParamMultiplier = 1/60
# ---------------------------
def PrintStats(param, Stats):
    print("%1.2f, %1.0f, %1.0f, %1.1f"
          % (ParamMultiplier * param, Stats[param]['mu'], Stats[param]['sigma'], Stats[param]['Sharpe']))

DailyPnls = {}
IntradayPnls = {}
MidPrice = {}
Pos = {}
Stats = {}
print("%s,mean pnl,std pnl,Sharpe" % (ParamName))
for param in AllParams:
    IC = param
    #h = param
    DailyPnls[param] = np.zeros(ndays)
    IntradayPnls[param] = np.zeros(n)
    Pos[param] = np.zeros(n)
    MidPrice[param] = np.zeros(n)
    q_sum_sq = 0.
    q_sum = 0.
    for ind in range(ndays):
        u_b = np.random.uniform(0., 1., n)
        u_a = np.random.uniform(0., 1., n)
        forecast_noise = np.random.normal(0, 1, n)
        dN_b = np.zeros(n)       # market maker's buys (fills at the bid)
        dN_a = np.zeros(n)       # market maker's sells (fills at the ask)
        N_b = np.zeros(n)
        N_a = np.zeros(n)
        q = np.zeros(n)          # inventory, in shares
        delta_b = np.zeros(n)    # bid-side quote depth posted at each step
        delta_a = np.zeros(n)    # ask-side quote depth posted at each step
        dW = np.random.normal(0., 1., n)
        ds = sigma * dW
        s = s0 + np.cumsum(ds)   # driftless mid price
        for t in range(1, n):
            # Avellaneda-Stoikov quotes: symmetric spread plus inventory skew
            delta_b[t] = max(0.5 * gamma * sigma * sigma * DeltaT + (1. / gamma) * np.log(
                1. + gamma / k) + gamma * sigma * sigma * DeltaT * q[t - 1], 0.)
            delta_a[t] = max(0.5 * gamma * sigma * sigma * DeltaT + (1. / gamma) * np.log(
                1. + gamma / k) - gamma * sigma * sigma * DeltaT * q[t - 1], 0.)
            # adverse selection model: normalized forecast correlated (IC) with
            # the h-step-ahead return, mapped to a logistic modulation
            theta = kappa * (IC * (s[min(t + h, n - 1)] - s[t]) / (sigma * np.sqrt(h))
                             + np.sqrt(1 - IC**2) * forecast_noise[t])
            adv_factor = 1. / (1. + np.exp(-theta))
            lambda_b = A * np.exp(-k * delta_b[t]) * (1. - adv_factor)
            lambda_a = A * np.exp(-k * delta_a[t]) * adv_factor
            # thinning: at most one fill per side per 1-second step
            dN_b[t] = 1. if u_b[t] < lambda_b * deltat else 0.
            dN_a[t] = 1. if u_a[t] < lambda_a * deltat else 0.
            N_b[t] = N_b[t - 1] + dN_b[t]
            N_a[t] = N_a[t - 1] + dN_a[t]
            q[t] = q0 * (N_b[t] - N_a[t])
        # each fill is priced at the quote actually posted at that step
        p_b = s - delta_b
        p_a = s + delta_a
        dx = q0 * (np.multiply(p_a, dN_a) - np.multiply(p_b, dN_b))   # cash flows
        x = np.cumsum(dx)
        pnl = x + np.multiply(q, s)        # cash + inventory marked to mid
        DailyPnls[param][ind] = pnl[-1]    # cumulated daily pnl
        IntradayPnls[param] = pnl          # keeps the last day's path
        Pos[param] = q
        MidPrice[param] = s
        q_sum_sq += np.sum(np.multiply(q, q))
        q_sum += np.sum(q)
    # inventory variance across all days and time steps
    q_var = (1. / (ndays * n)) * q_sum_sq - ((1. / (ndays * n)) * q_sum)**2
    Stats[param] = {}
    Stats[param]['mu'] = np.mean(DailyPnls[param])
    Stats[param]['sigma'] = np.std(DailyPnls[param])
    Stats[param]['Sharpe'] = Stats[param]['mu'] / Stats[param]['sigma']
    Stats[param]['q_std'] = np.sqrt(q_var)
    PrintStats(param, Stats)
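And if you want to redraw the figures above after a run, a minimal plotting snippet (matplotlib isn’t imported by the script itself):

import matplotlib.pyplot as plt

xs = [ParamMultiplier * p for p in AllParams]
plt.plot(xs, [Stats[p]['Sharpe'] for p in AllParams], marker='o')
plt.xlabel(ParamName)
plt.ylabel('daily Sharpe')
plt.show()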