In [1]:
from IPython.display import display
from IPython.display import HTML
import IPython.core.display as di # Example: di.display_html('<h3>%s:</h3>' % str, raw=True)

# This line will hide code by default when the notebook is exported as HTML
di.display_html('<script>jQuery(function() {if (jQuery("body.notebook_app").length == 0) { jQuery(".input_area").toggle(); jQuery(".prompt").toggle();}});</script>', raw=True)

# This line will add a button to toggle visibility of code blocks, for use with the HTML export version
di.display_html('''<button onclick="jQuery('.input_area').toggle(); jQuery('.prompt').toggle();">Toggle code</button>''', raw=True)
In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.gofplots import qqplot
import warnings
from scipy.stats import pareto, lognorm, kurtosis
from IPython.display import HTML
warnings.filterwarnings('ignore')
plt.style.use('bmh')
plt.rcParams['font.family'] = 'serif'
plt.rcParams['font.size'] = 10
from math import sqrt
In [3]:
from jupyterthemes import jtplot
from jupyterthemes.stylefx import set_nb_theme
set_nb_theme('grade3')
Out[3]:

Statistical inference in the non-light-tailed world

Why be suspicious of the sample standard deviation and not rely on the sample kurtosis

This notebook is a continuation of my previous article Fat tails, Bitcoin and the Stock market.

Let's quote the final part of that article, which refers to the estimates of the moments of the SPX and BTC losses distributions:

As we can notice in the above charts, the first moment (the mean) is finite both for SPX and BTC.
As for the $2^{nd}$ moment (the variance), while it seems to be finite for the SPX losses distribution, it is unclear whether it is finite or infinite for the BTC losses distribution, because the corresponding sample diagnostic doesn't clearly converge to zero.
Moreover, for BTC it might not make sense to compute the sample skewness and the sample kurtosis, because the $3^{rd}$ and $4^{th}$ moments are likely not finite (even though, trivially, the sample statistics are always finite).
It is less clear whether the SPX losses distribution has a finite $3^{rd}$ moment or not.
For the SPX the $4^{th}$ moment of the losses distribution likely does not exist (a conclusion also reached by N.N. Taleb in his Statistical Consequences of Fat Tails): in this case the estimate of the true volatility could be unreliable even if the $2^{nd}$ moment actually existed.
The conclusion is: we are sure that the $1^{st}$ moment exists, but from the $2^{nd}$ moment on (for BTC) and from the $3^{rd}$ moment on (for SPX), we have too few data to make any strong claim about the existence of these moments.
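For readers who did not see the previous article: a standard way to probe whether the $p^{th}$ moment exists (and presumably close in spirit to the convergence checks quoted above) is the maximum-to-sum plot: if $E[X^p]$ is finite, the running ratio $\max_i X_i^p / \sum_i X_i^p$ converges to zero as the sample grows. A minimal sketch, where losses is a hypothetical array of absolute daily losses:

def max_to_sum_ratio(x, p):
    # running ratio max(x_i^p) / sum(x_i^p): it tends to 0 if and only if E[X^p] is finite
    xp = np.abs(np.asarray(x, dtype=float))**p
    return np.maximum.accumulate(xp) / np.cumsum(xp)

# usage sketch (losses is hypothetical): plt.plot(max_to_sum_ratio(losses, 4))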

Now suppose we have our empirical losses distribution and we decide to measure risk with the standard deviation and the kurtosis (there is no unique, objective way to measure risk, but for our purposes we map the word risk onto these two mathematical concepts).
We can choose to infer the standard deviation and the kurtosis of the whole data distribution (i.e. the daily log-returns) or only of the absolute values of its left tail (i.e. the losses distribution).
Since we are going to use the lognormal distribution to illustrate our conclusions, we focus only on the losses distribution and infer its standard deviation and excess kurtosis.

In this notebook we will show that even if all the moments actually existed (hence the losses distribution does not follow a Pareto distribution, i.e. its tail decays faster than a power law), we might nevertheless have serious problems obtaining a reliable estimate of the standard deviation and the kurtosis.
In other words, we have trouble quantifying how large the risk is.
That is, we have trouble knowing how far the next loss could depart from the sample mean loss.

As mentioned, I will make use of the lognormal distribution.
This distribution has all moments finite, but when the parameter $\sigma$ is not small its tail behaviour can easily be mistaken for a power-law decay: a lognormal with a large $\sigma$ resembles a fat-tailed distribution, even though the lognormal belongs to the subexponential class and not to the more extreme fat-tailed class to which the Pareto distribution belongs.
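To visualize why a lognormal with a large $\sigma$ can pass for a power law, here is a small illustrative sketch (not in the original article; the Pareto tail index $1.5$ is an arbitrary choice): on log-log axes the survival function of the lognormal stays close to a straight line over several decades, much like the Pareto one.

x = np.logspace(-1, 3, 400)
fig, ax = plt.subplots(figsize=(8, 6))
ax.loglog(x, lognorm(s=1.5).sf(x), label=r'lognormal, $\sigma = 1.5$')   # subexponential
ax.loglog(x, pareto(b=1.5).sf(x), label='Pareto, tail index 1.5')        # true power law
ax.legend();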

Using Extreme Value Theory tools, in the previous notebook we showed that the empirical losses distributions of the SPX and BTC have traits similar to those of a lognormal distribution with a large $\sigma$ (the BTC losses distribution actually shows similarities even to a Pareto distribution).

Let's first plot 3 lognormal probability density functions (PDFs) with the same parameter $\mu$ ($\mu = 0$) but with 3 different values of $\sigma$, in order to visualize our reference distribution:

In [4]:
fig, ax = plt.subplots(figsize=(10, 10))
x = np.linspace(0, 6, 1000)
for s, c in zip([0.3, 0.8, 1.5], ['r', 'g', 'b']):
    # same mu (= 0) for all three curves, only sigma changes
    ax.plot(x, lognorm.pdf(x, s), f'{c}--', lw=2, alpha=0.6,
            label=fr'lognorm pdf $\sigma$ = {s}')
ax.legend(fontsize=14);

They all have the same median = $e^{\mu}$, i.e. median = $1$, but they have different means.
Indeed mean $= e^{\mu + \frac{\sigma^{2}}{2}}$

Indeed the median is a function of $\mu$ alone, while the mean is a function of both $\mu$ and $\sigma$.

As you can notice, the heavy-tailedness increases as $\sigma$ increases.
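A quick numerical check of this (just a sketch using the frozen scipy distributions): for $\mu = 0$ the median is $1$ for every $\sigma$, while the mean grows as $e^{\sigma^{2}/2}$.

for s_ in [0.3, 0.8, 1.5]:
    frozen = lognorm(s=s_)
    print(s_, frozen.median(), frozen.mean(), np.exp(s_**2 / 2))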

Let's hypothesize that our losses distribution can be approximated by a lognormal distribution with $\sigma = 1.5$ (the blue PDF).
Hence all the moments exist, because the lognormal has all finite moments, regardless of the value of $\sigma$.
For example, a lognormal with $\mu = 0$ and $\sigma = 1.5$ has a standard deviation of about $9$ and an excess kurtosis of about $10000$.
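For reference, these are the standard lognormal formulas (the same ones used in the code below; the excess kurtosis does not depend on $\mu$):

$$\text{sd} = \sqrt{\left(e^{\sigma^{2}}-1\right)e^{2\mu+\sigma^{2}}}\,,\qquad \text{ex. kurtosis} = e^{4\sigma^{2}} + 2e^{3\sigma^{2}} + 3e^{2\sigma^{2}} - 6$$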

In [5]:
s = 1.5 # s = sigma
mean, var, skew, ex_kurt = lognorm.stats(s, moments='mvsk')
In [6]:
std = var**0.5 # std = sqrt((np.exp(s**2) - 1)*np.exp(s**2))
print('Standard deviation:', std) 
Standard deviation: 8.973817218116451
In [7]:
print('Ex. Kurtosis:', ex_kurt) # ex_kurt = np.exp(4*s**2) + 2*np.exp(3*s**2) + 3*np.exp(2*s**2) - 6
Ex. Kurtosis: 10075.252846529254

Let's say that we generate an increasing number $N$ of observations from a lognormal with $\sigma = 1.5$.
$N$ will go from $100$ to $25000$.
At each step we compute the sample standard deviation and the sample excess kurtosis.
How do these sample moments differ from the true values?
Do they converge to the population moments as $N$ approaches $25000$?

Before diving into the simulations it's better to define clearly what we are talking about.
The sample standard deviation and the sample ex.kurtosis are estimators of the population standard deviation and the population ex.kurtosis, which are well defined whenever the $2^{nd}$ and $4^{th}$ moments are finite (as is the case for the lognormal distribution).
I won't dive into the details of these two estimators: for our purposes we only need to know that they are both consistent.
In other words an estimator $\hat{\theta}$ for the parameter $\theta$ is consistent when:

$$\lim_{N\to\infty} P\left(\lvert\hat{\theta} - \theta\rvert < \epsilon\right) = 1\,\,,\, \forall\epsilon > 0$$



The same property can be described by stating that $\hat{\theta}$ converges in probability to ${\theta}$:

$$ \DeclareMathOperator*{\plim}{plim} \plim_{N\to\infty} \hat{\theta} = \theta\,\,$$



The consistency property of an estimator tells us that with $N$ sufficiently large our estimate will turn out to be very precise: its standard error keeps shrinking as $N$ grows.
The problem is: what does sufficiently large mean?
If I had $N$ observations and I computed the sample standard deviation and the sample ex.kurtosis, for which $N$ would I have a reliable guess of what the true standard deviation and true ex.kurtosis are?
If we are measuring risk with these two mathematical concepts (standard deviation and kurtosis), can our two estimates (i.e. the stochastic values taken by our estimators) give us a precise idea of what the current risk is?
Well, in the non-light-tailed world we almost always have a very inaccurate idea of what the true values of the standard deviation and kurtosis are: most of the time $N$ is simply not large enough to make the estimates reliable.

Let's proceed with the first simulation.
We generate about $500$ samples from a lognormal with $\sigma = 1.5$, with the number of observations $N$ increasing from sample to sample.
$N$ goes from $100$ (the first sample) to $25000$ (the last sample).
For each sample we compute the sample standard deviation and the sample excess kurtosis.

In [8]:
# the standard deviation and ex.kurtosis formulas without using the scipy module
# (the std formula assumes mu = 0, i.e. scale = 1, as in all the simulations below;
#  the ex.kurtosis does not depend on mu)
def std_kurtosis(s):
    std = sqrt((np.exp(s**2) - 1)*np.exp(s**2))
    ex_kurt = np.exp(4*s**2) + 2*np.exp(3*s**2) + 3*np.exp(2*s**2) - 6
    return std, ex_kurt
In [9]:
np.random.seed(12)
real_std, real_ex_kurt = std_kurtosis(s)  
sample_std = []
sample_kurt = []
for N in range(100, 25050, 50):
    lognorm_sample = lognorm(s=s).rvs(size=N)
    sample_std.append(lognorm_sample.std())
    sample_kurt.append(kurtosis(lognorm_sample, bias=False))
sample_std = np.array(sample_std)

fig, ax = plt.subplots(1,2 , figsize=(17,7))
fig.suptitle(f'Lognormal $\sigma = {s}$, Sample size up to $25000$', fontsize=17)
ax[0].set_title('Sample Standard deviation')
ax[0].plot(range(100, 25050, 50), sample_std, '--')
ax[0].axhline(y=real_std, color='r', linestyle='--', label='True Standard Deviation')

ax[1].set_title('Sample Ex. Kurtosis')
ax[1].plot(range(100, 25050, 50), sample_kurt, '--')
ax[1].axhline(y=real_ex_kurt, color='g', linestyle='--', label='True Ex.Kurtosis')
for i in range(2):
    ax[i].legend(loc='best', fontsize=11);

We have two interesting results:

  • 1) The sample standard deviation can take values very different from the true value.
    In more formal terms, we would say that it is characterized by a high standard error.

  • 2) The sample ex.kurtosis always underestimates the true ex.kurtosis, and the underestimation doesn't shrink as $N$ approaches $25000$.

Let's first focus on point 2).
How can the sample ex.kurtosis fail to converge to the true value as the sample size increases, if we know it is a consistent estimator of the population ex.kurtosis?
Simply, the sample size is still too small.
This is a first important lesson:
in the non-light-tailed world we need AN ENORMOUS AMOUNT of observations before the bias of the sample kurtosis estimator vanishes: a consistent estimator is (under mild conditions) also asymptotically unbiased, but before we actually see this unbiasedness we may need a huge amount of data.
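To make the distinction between bias and mere noise concrete, here is a minimal Monte Carlo sketch (not part of the original notebook; the replication count and the fixed $N$ are arbitrary choices): we hold $N$ fixed and average the sample ex.kurtosis over many independent samples, so the remaining gap from the true value reflects bias rather than standard error.

np.random.seed(0)
n_reps, N_fixed = 200, 5_000   # arbitrary choices for this sketch
reps = [kurtosis(lognorm(s=1.5).rvs(size=N_fixed), bias=False) for _ in range(n_reps)]
print('Average sample ex.kurtosis over', n_reps, 'samples of size', N_fixed, ':', np.mean(reps))
print('True ex.kurtosis:', std_kurtosis(1.5)[1])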

Let's highlight that if we were studying the daily losses distribution of an asset, $25000$ loss observations would amount to about $200$ years of trading (assuming roughly as many winning days as losing days, $25000$ losses correspond to about $50000$ trading days, i.e. roughly $200$ years at about $250$ trading days per year).
Nonetheless the estimate of the standard deviation would be unreliable (i.e. it would have a large standard error) and the estimate of the kurtosis would still be biased.


Let's now repeat the previous simulation, but this time with $N$ going up to $250000$.
How would the results change?

In [10]:
np.random.seed(12)
s = 1.5
real_std, real_ex_kurt = std_kurtosis(s)
sample_std = []
sample_kurt = []
for N in range(100, 250200, 200):
    lognorm_sample = lognorm(s=s).rvs(size=N)
    sample_std.append(lognorm_sample.std())
    sample_kurt.append(kurtosis(lognorm_sample, bias=False))
sample_std = np.array(sample_std)

fig, ax = plt.subplots(1,2 , figsize=(17,7))
fig.suptitle(f'Lognormal $\sigma = {s}$, Sample size up to $250000$', fontsize=17)
ax[0].set_title('Sample Standard deviation')
ax[0].plot(range(100, 250200, 200), sample_std, '--')
ax[0].axhline(y=real_std, color='r', linestyle='--', label='True Standard Deviation')

ax[1].set_title('Sample Ex. Kurtosis')
ax[1].plot(range(100, 250200, 200), sample_kurt, '--')
ax[1].axhline(y=real_ex_kurt, color='g', linestyle='--', label='True Kurtosis')
for i in range(2):
    ax[i].legend(loc=0, fontsize=11);

This is what happens:

  • 1) The consistency of the sample standard deviation estimator is still not evident after $250000$ observations: in other words, we still do not clearly see convergence toward the population standard deviation.

  • 2) The unbiasedness of the sample ex.kurtosis now shows up: the estimator no longer seems to systematically under- or over-estimate the population ex.kurtosis.
    Nevertheless its standard error is very high: the estimate remains unreliable even after $250000$ observations (almost $2000$ years of trading if those observations represented daily losses).

What if we increased our sample size $N$ to $1.5$ million?

In [11]:
np.random.seed(12)
s = 1.5
real_std, real_ex_kurt = std_kurtosis(s)
sample_std = []
sample_kurt = []
for N in range(100, 1_501_500, 1500):
    lognorm_sample = lognorm(s=s).rvs(size=N)
    sample_std.append(lognorm_sample.std())
    sample_kurt.append(kurtosis(lognorm_sample, bias=False))
sample_std = np.array(sample_std)

fig, ax = plt.subplots(1,2 , figsize=(17,7))
fig.suptitle(f'Lognormal $\sigma = {s}$, Sample size up to $1.5M$', fontsize=17)
ax[0].set_title('Sample Standard deviation')
ax[0].plot(range(100, 1_501_500, 1500), sample_std, '--')
ax[0].axhline(y=real_std, color='r', linestyle='--', label='True Standard Deviation')

ax[1].set_title('Sample Ex. Kurtosis')
ax[1].plot(range(100, 1_501_500, 1500), sample_kurt, '--')
ax[1].axhline(y=real_ex_kurt, color='g', linestyle='--', label='True Kurtosis')
for i in range(2):
    ax[i].legend(loc=0, fontsize=11);

We notice that:

  • 1) The consistency property of the sample standard deviation estimator begins to appear.
  • 2) The consistency property of the sample ex.kurtosis estimator doesn't yet appear.

It seems that for empirical data that likely come from a non-light-tailed distribution, it is better to simply ignore the sample kurtosis, given how slowly it converges: its estimate is unreliable even with a huge sample size ($1.5$M observations is a huge sample size in a financial context, though maybe not so huge in other contexts).


At this point it is helpful to repeat the same simulation with better-behaved lognormal data, i.e. a sample generated from a lognormal distribution with $\sigma = 0.5$.

In [12]:
np.random.seed(12)
s = .5
real_std, real_ex_kurt = std_kurtosis(s)
sample_std = []
sample_kurt = []
for N in range(100, 25000, 50):
    lognorm_sample = lognorm(s=s).rvs(size=N)
    sample_std.append(lognorm_sample.std())
    sample_kurt.append(kurtosis(lognorm_sample, bias=False))

fig, ax = plt.subplots(1,2 , figsize=(17,7))
fig.suptitle(f'Lognormal $\sigma = {s}$, Sample size up to $25000$', fontsize=17)
ax[0].set_title('Sample Standard deviation')
ax[0].plot(range(100, 25000, 50), sample_std, '--')
ax[0].axhline(y=real_std, color='r', linestyle='--', label='True Standard Deviation')

ax[1].set_title('Sample Ex. Kurtosis')
ax[1].plot(range(100, 25000, 50), sample_kurt, '--')
ax[1].axhline(y=real_ex_kurt, color='g', linestyle='--', label='True Kurtosis')
for i in range(2):
    ax[i].legend(loc=0, fontsize=11);

As you can see, the behaviour of the sample moments is very different from the previous cases. Now $25000$ (and not $250000$) observations are sufficient to clearly see:

  • 1) The consistency of the sample standard deviation estimator
  • 2) Both the unbiasedness and the consistency of the sample ex.kurtosis estimator
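One way to see why this case is so much easier (a quick check using the helper defined earlier): for $\sigma = 0.5$ the true moments themselves are modest, so there is far less tail to pin down.

print(std_kurtosis(0.5))   # roughly (0.60, 5.9), versus (8.97, 10075.25) for sigma = 1.5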

If we repeated the same simulations with an even better-behaved distribution, like the normal, we would see the consistency and unbiasedness of these two sample moments even more quickly and clearly. I leave this task to you; a minimal starting sketch follows.
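A minimal sketch of that exercise (standard normal data, same increasing-$N$ scheme as above; the true standard deviation is $1$ and the true ex.kurtosis is $0$):

from scipy.stats import norm

np.random.seed(12)
sizes = range(100, 25000, 50)
normal_std = [norm.rvs(size=N).std() for N in sizes]
normal_kurt = [kurtosis(norm.rvs(size=N), bias=False) for N in sizes]

fig, ax = plt.subplots(1, 2, figsize=(17, 7))
ax[0].plot(sizes, normal_std, '--'); ax[0].axhline(y=1, color='r', linestyle='--')
ax[1].plot(sizes, normal_kurt, '--'); ax[1].axhline(y=0, color='g', linestyle='--');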

It is now evident how problematic statistical inference becomes when our data likely come from a non-well-behaved distribution: the gap between the asymptotic theory's results ($\lim_{N\to\infty}$) and what we actually obtain from sample statistics can be misleading.


The bootstrap procedure

Some readers might be thinking:

Ok, let's suppose that our data come from a non-well-behaved distribution: we now know that the estimates of both the standard deviation and the kurtosis are imprecise unless we have a huge amount of data, but can't we overcome this problem with a bootstrap procedure?

The short answer is: if the hidden generating distribution has a tail behaviour similar to that of a fat-tailed distribution (even if it does not strictly belong to that class), the bootstrap procedure can be useful for estimating the mean and, with some underestimation, the standard deviation, but not the kurtosis.

Let's do a simulation:
We generate a sample of $2000$ observations (about $16$ years of trading if those data represent simulated daily losses) from a lognormal with $\mu = 0$, $\sigma = 1.5$.
Then we apply a bootstrap procedure to that sample: we generate $10000$ bootstrap samples, each composed of $500$ observations drawn with replacement from the original sample.
For each bootstrap sample we compute the sample mean, the sample standard deviation and the sample ex.kurtosis.
To estimate the mean, the standard deviation and the ex.kurtosis we then take the average of these bootstrap statistics.
We know that:
population mean = $3.08$
population standard deviation = $8.97$
population ex.kurtosis = $10075.25$

In [13]:
np.random.seed(88)
s = 1.5
real_std, real_ex_kurt = std_kurtosis(s)
sample = lognorm(s=s).rvs(size=2000)
sample_means = []
sample_stds = []
sample_kurts = []
for i in range(10_000):
    bootstrap_sample = np.random.choice(sample, size=500)  # drawn with replacement (np.random.choice default)
    sample_means.append(bootstrap_sample.mean())
    sample_stds.append(bootstrap_sample.std())
    sample_kurts.append(kurtosis(bootstrap_sample, bias=False))
  • 1) Let's start with the sample means histogram:
In [14]:
plt.hist(sample_means, bins=50)
plt.title('Sample means histogram');

If we computed the sample mean of the sample means distribution we would obtain:

In [15]:
print('Sample mean of sample means:', np.array(sample_means).mean())
print('True mean:', np.exp(s**2/2))
Sample mean of sample means: 3.058175227793094
True mean: 3.080216848918031

We are very close to the real mean.

  • 2) Let's see the sample standard deviations histogram:
In [16]:
plt.hist(sample_stds, bins=50)
plt.title('Sample standard deviations histogram');

If we computed the sample mean of the sample standard deviations distribution we would obtain:

In [17]:
print('Sample mean of sample standard deviations:', np.array(sample_stds).mean())
print('True Standard deviation:', real_std)
Sample mean of sample standard deviations: 5.985158294054182
True Standard deviation: 8.973817218116451

Now the two values differ noticeably: the bootstrap average underestimates the true standard deviation by about a third.

  • 3) Let's see the sample ex.kurtosis histogram:
In [18]:
plt.hist(sample_kurts, bins=50)
plt.title('Sample ex.kurtosis histogram');

We immediately notice that even the average of the bootstrap sample ex.kurtosis values is nowhere near the true one:

In [19]:
print('Sample mean of sample ex.kurtosis distribution', np.array(sample_kurts).mean())
print('True Ex.kurtosis', real_ex_kurt)
Sample mean of sample ex.kurtosis distribution 40.619199213418234
True Ex.kurtosis 10075.252846529253

The difference is sobering: the bootstrap procedure can help with the estimate of the mean (very precise) and also with the estimate of the standard deviation (though it underestimates it), but if the hidden distribution is non-well-behaved we cannot use the bootstrap procedure to estimate the ex.kurtosis.
Indeed, in the initial charts we saw that the sample ex.kurtosis is heavily biased unless $N$ is extremely large.
That is the problem with this estimator for distributions with this kind of tail behaviour: it is not only very erratic (i.e. it has a high standard error) but also heavily biased, so averaging many biased estimates (through the bootstrap procedure) will still give us a biased estimate.
And in this specific case the bias is huge.
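In symbols: if each bootstrap estimate $\hat{\theta}_b$ has expectation approximately $\theta + b$ with a large bias $b$, then the bootstrap average keeps that bias,

$$E\left[\frac{1}{B}\sum_{b=1}^{B}\hat{\theta}_b\right] \approx \theta + b\,,$$

because averaging across bootstrap samples reduces variance, not bias.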