In [1]:
from IPython.display import display
from IPython.display import HTML
import IPython.core.display as di # Example: di.display_html('<h3>%s:</h3>' % str, raw=True)

# This line will hide code by default when the notebook is exported as HTML
di.display_html('<script>jQuery(function() {if (jQuery("body.notebook_app").length == 0) { jQuery(".input_area").toggle(); jQuery(".prompt").toggle();}});</script>', raw=True)

# This line will add a button to toggle visibility of code blocks, for use with the HTML export version
di.display_html('''<button onclick="jQuery('.input_area').toggle(); jQuery('.prompt').toggle();">Toggle code</button>''', raw=True)
In [2]:
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
import warnings
from statsmodels.graphics.gofplots import qqplot
from scipy.stats import gumbel_r, expon, pareto, lognorm
from IPython.display import HTML
warnings.filterwarnings('ignore')
plt.style.use('bmh')
plt.rcParams['font.family'] = 'serif'
plt.rcParams['font.size'] = 10
In [3]:
from jupyterthemes import jtplot
from jupyterthemes.stylefx import set_nb_theme
set_nb_theme('grade3')
Out[3]:

Analysis of SPX and BTC daily losses

with the Extreme Value Theory approach


Introduction

We all know that in the financial world there is no such thing as normal returns.
Why the Gaussian framework is still paramount in financial textbooks is not exactly clear, but that is a topic for another debate.
The topic of this notebook is instead a gentle and informal introduction to Extreme Value Theory, with applications to the daily losses (absolute values of negative returns) of the S&P 500 (SPX) and Bitcoin (BTC).

The Non-Light Tailed World

We often talk about fat tails, but most of the time we don't have a precise idea of what a fat-tailed distribution exactly means.
In fact there exist various degrees of non-light tailedness, while the two most famous members of the light-tailed set are the normal and exponential distributions.
If we sorted the non-light tailedness feature in ascending order, we would have the following 4 classes:

  1. Heavy Tailed distributions
  2. Long Tailed distributions
  3. Sub-exponential distributions
  4. Fat tailed distributions

with the following inclusions:
$4 \subset 3 \subset 2 \subset 1$

That means that the heavy tailed class is the most general set of the non-light tailed family, while the fat tailed class is the most extreme subset.
Each class has a corresponding mathematical definition.

Before diving into the topic, let's first look at this image created by Prof. Pasquale Cirillo:

In [4]:
from IPython.display import Image
Image(filename='tails.jpg')
Out[4]:

A mathematical taxonomy for the non-light tailedness world

  • Heavy-Tailed Class (1)

    The most general class of the non-light tailed world.
    Mathematically a distribution $F$ is Heavy-Tailed if:

    $$\limsup_{x\to\infty} \frac{\bar{F}(x)}{e^{-\lambda x}} = \infty, \,\,\, \forall\lambda > 0$$

    where the numerator is the survival function of our r.v. $X$ (i.e. $\bar{F}(x) \equiv P(X > x)$) and the denominator is the survival function of an exponential r.v.
    Indeed, in Extreme Value Theory the Exponential distribution is the benchmark distribution:
    a r.v. $X$ is heavy-tailed if its survival function decays more slowly than the survival function of an exponential r.v.
  • Long-Tailed Class (2)

    It is included in the Heavy-Tailed family and it is characterized by the Explosion Principle.
    Mathematically a distribution $F$ is Long-Tailed if:

    $$\lim_{x\to\infty} P(X > x + k | X > x) = 1, \,\forall k > 0$$

    The above property is the Explosion Principle and it can be interpreted as follows:
    if a large loss happens, an even larger loss will happen in the future.
    This is a powerful property and it completely undermines the epistemic validity of statistical inferences about extreme events made with sample statistics.
    Regarding extreme events, history is not able to tell the whole story.

The problem with these first two classes is that, from a practical point of view, it is not easy to detect a Heavy-Tailed or a Long-Tailed distribution from data.
So we have to restrict our attention further and move on to the last two types of distributions in the non-light tailed world.
Indeed, for these last two classes we have a series of graphical tools that can offer educated hints about whether or not our empirical data come from a sub-exponential or a fat-tailed distribution.

  • Sub-Exponential Class (3)

    It is included in the Long-Tailed class.
    It is characterized by the Catastrophe Principle (also known as the Winner-Takes-All Principle); see the simulation sketch after this list.
    Let's say we have $n$ i.i.d. positive observations of a r.v. $X$ that in our case represent the daily losses in a given period of time, and we take the partial maximum $M_{n}$ of these observations, where
    $M_{n} = \max(X_1, X_2, X_3,....,X_n)$
    and the partial sum $S_{n}$
    $S_{n} = \sum_{i=1}^{n}X_i$

    A r.v. $X$ belongs to the Sub-Exponential class if and only if:

    $$P(M_{n} > x) \approx P(S_{n} > x), \, \, as \,x \rightarrow \infty$$

    It means that, far enough in the tail, the behaviour of the partial sum is dominated by the partial maximum.
    Ex: if I had a portfolio and its losses followed a Sub-Exponential distribution, the total loss of the portfolio would be dominated by one single big loss, while all the other losses would play a much smaller role (a sort of noise around the "big player").
    The Log-Normal distribution is a Sub-Exponential distribution.
  • Fat-Tailed Class (4)

    It is included in the Sub-Exponential class.
    It belongs to the Maximum Domain of Attraction (MDA) of the Fréchet distribution (more on this later).
    The mathematical property of a Fat-Tailed distribution is:

    $$\bar{F}(x) = x^{-\alpha} L(x)\,,\,\alpha > 0$$

    where $L(x)$ is a slowly varying function (i.e. $\lim_{x\to\infty}\frac{L(kx)}{L(x)} = 1 ,\,\, \forall k > 0$).
    In other words a r.v. $X$ is fat-tailed if its survival function decays as a power law, and so, much more slowly than an exponential decay.
    So, to recap, a Fat-Tailed r.v. $X$ is also a Heavy-Tailed, Long-Tailed and a Sub-exponential one.
    The Pareto distribution is a Fat-Tailed distribution.
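
To see the Catastrophe Principle at work, here is a minimal simulation sketch (my own illustration, not part of the original notebook): for many blocks of $n$ draws we estimate $P(M_{n} > x)$ and $P(S_{n} > x)$ at a large $x$. For a Pareto sample the two probabilities are roughly equal, while for exponential draws they stay far apart.

In [ ]:
import numpy as np
from scipy.stats import pareto, expon

rng = np.random.default_rng(0)
n, n_blocks = 10, 100_000  # block size and number of simulated blocks

def max_vs_sum_tail(blocks, x):
    """Estimate P(M_n > x) and P(S_n > x) from simulated blocks."""
    return (blocks.max(axis=1) > x).mean(), (blocks.sum(axis=1) > x).mean()

# Sub-exponential (indeed fat-tailed) case: Pareto with alpha = 1.5
par_blocks = pareto(b=1.5).rvs(size=(n_blocks, n), random_state=rng)
p_max, p_sum = max_vs_sum_tail(par_blocks, x=500)
print(f'Pareto:      P(M_n > x) = {p_max:.5f},  P(S_n > x) = {p_sum:.5f}')  # roughly equal

# Light-tailed case: Exponential, where the sum dominates the maximum
exp_blocks = expon.rvs(size=(n_blocks, n), random_state=rng)
p_max, p_sum = max_vs_sum_tail(exp_blocks, x=20)
print(f'Exponential: P(M_n > x) = {p_max:.5f},  P(S_n > x) = {p_sum:.5f}')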

Let's take the daily log returns of SPX from June 1990 to June 2020 and the daily log returns of BTC from September 2014 to June 2020.

In [5]:
sp500 = pd.read_csv(r'C:\Users\Admin\JupyterFiles\Value Investing\^SP500TR.csv', index_col=0)
In [6]:
spx_ret = np.log(sp500['Adj Close'] / sp500['Adj Close'].shift())[1:]
In [7]:
btc = pd.read_csv(r'BTC-USD.csv', index_col=0)
In [8]:
btc_ret = np.log(btc['Adj Close'] / btc['Adj Close'].shift())[1:]

Let's graphically compare the two distributions with their "corresponding" normal generated samples (with the sample mean and sample standard deviation of each security used as location and scale of the corresponding generated normal sample).
It is immediately apparent how different the daily return distributions of SPX and BTC are from their "corresponding" Gaussians.

In [9]:
fig, ax = plt.subplots(2, 2, figsize=(13,13))
ax[0, 0].set_title(r'$SPX$ daily returns histogram')
spx_ret.hist(ax=ax[0, 0], bins=120);
ax[0, 1].set_title(r'$Gaussian$ generated sample histogram (same loc. and scale)')
ax[0, 1].hist(np.random.normal(loc=spx_ret.mean(), scale=spx_ret.std(), size=len(spx_ret)), bins=120)

ax[1, 0].set_title(r'$BTC/USD$ daily returns histogram')
btc_ret.hist(ax=ax[1, 0], bins=100);
ax[1, 1].set_title(r'$Gaussian$ generated sample histogram (same loc. and scale)')
ax[1, 1].hist(np.random.normal(loc=btc_ret.mean(), scale=btc_ret.std(), size=len(btc_ret)), bins=100);

One important difference between log returns ($r_{t}$) and simple returns ($R_{t}$) is that their distributions have two different supports:

  • The distribution followed by $r_{t}$ has an unbounded support, i.e. $(-\infty, +\infty)$
  • The distribution followed by $R_{t}$ has a finite lower bound, i.e. its support is $[-1, +\infty)$

This difference is very important when we study the tail behaviour of the returns.
Indeed, let's focus our attention only on the negative returns:
if we take the absolute value of the negative daily returns we obtain the daily losses in relative terms (losses for a long position).
Our study will focus on these losses.
The theoretical distributions through which most of Extreme Value Theory is developed have an infinite upper bound (indeed we want to study the "extremeness" of a r.v., and the unbounded right end of the support lets us do that). Only by using log returns can we obtain a loss distribution $G_{x}$ whose support has an infinite upper bound, i.e. $[0, \infty)$.

If instead we used the simple returns, the support of the losses would have a finite upper bound (i.e. $[0, 1]$) and we would have to proceed differently (for example by using a log-transformation and working with the dual distribution rather than the real one; see Appendix A for more on the dual distribution).
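
As a quick numerical check (my own sketch of the standard relation $R_{t} = e^{r_{t}} - 1$, with made-up return values), even an extreme negative log return maps to a simple return above $-1$:

In [ ]:
import numpy as np

log_returns = np.array([-5.0, -1.0, -0.05, 0.0, 0.05])  # hypothetical daily log returns
simple_returns = np.exp(log_returns) - 1  # R_t = exp(r_t) - 1

for r, R in zip(log_returns, simple_returns):
    print(f'log return {r:+.2f}  ->  simple return {R:+.4f}')
# As r_t -> -inf, R_t only approaches -1: the simple-return loss |R_t|
# is bounded by 1, while the log-return loss is unbounded.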

So let's take the absolute values of the negative $SPX$ and $BTC$ daily log returns, i.e. the daily losses.

In [10]:
spx_losses = np.abs(spx_ret[spx_ret < 0])
btc_losses = np.abs(btc_ret[btc_ret < 0])
In [11]:
fig, ax = plt.subplots(1, 2, figsize=(14,7))
fig.suptitle('Daily Losses Histogram', fontsize=15)
spx_losses.hist(ax=ax[0], bins=80, label=r'$SPX$ Losses Distribution ({} obs)'.format(len(spx_losses)))
btc_losses.hist(ax=ax[1], bins=80, label=r'$BTC$ Losses Distribution ({} obs)'.format(len(btc_losses)))
for i in range(2):
    ax[i].legend(fontsize=13);

Let's focus on the tail (top 25%) of this daily losses distribution: we take the survival function $\bar{F}(x) = P(X > x)$ with $x$ equal to the 75th percentile of the daily losses distribution.

In [12]:
lower_bound_spx = np.percentile(spx_losses, 75)
lower_bound_btc = np.percentile(btc_losses, 75)
print('75th percentile of SPX losses: ', round(lower_bound_spx, 2))
print('75th percentile of BTC losses: ', round(lower_bound_btc, 2))
75th percentile of SPX losses:  0.01
75th percentile of BTC losses:  0.03
In [13]:
spx_losses_tail = spx_losses[spx_losses > lower_bound_spx]
btc_losses_tail = btc_losses[btc_losses > lower_bound_btc]
In [14]:
fig, ax = plt.subplots(1, 2, figsize=(14,7))
fig.suptitle('Tail (top 25%) Losses Histogram', fontsize=15)
spx_losses_tail.hist(ax=ax[0], bins=60, label=r'$SPX$ Losses Distribution ({} obs)'.format(len(spx_losses_tail)))
btc_losses_tail.hist(ax=ax[1], bins=60, label=r'$BTC$ Losses Distribution ({} obs)'.format(len(btc_losses_tail)))
for i in range(2):
    ax[i].legend(fontsize=13);

QQ Plot (vs Exponential quantiles)

Let's start by plotting the QQ Plot of SPX and BTC tail losses.

In [15]:
fig, ax = plt.subplots(2, 2, figsize=(13,12))
qqplot(spx_losses, ax=ax[0, 0], dist=expon, line='r')
ax[0, 0].set_title(r'$SPX$ Losses QQ Plot')
qqplot(spx_losses_tail, ax=ax[0, 1], dist=expon, line='r')
ax[0, 1].set_title(r'$SPX$ Losses Tail (top 25%) QQ Plot')
qqplot(btc_losses, ax=ax[1, 0], dist=expon, line='r')
ax[1, 0].set_title(r'$BTC$ Losses QQ Plot')
qqplot(btc_losses_tail, ax=ax[1, 1], dist=expon, line='r')
ax[1, 1].set_title(r'$BTC$ Losses Tail (top 25%) QQ Plot')
for i in range(2):
    for k in range(2):
        ax[i, k].set_xlabel('Exponential Theoretical Quantiles')
plt.subplots_adjust();

In the above charts the x-axis represents the quantiles of the Exponential distribution (not the Normal's).
Indeed, in Extreme Value Theory the Exponential distribution is our benchmark.
In the above QQ Plots we see that the SPX and BTC daily losses decay more slowly than an exponential: the blue dots end up above the red line, which marks where the quantiles would lie if the tail losses decayed at exponential speed. They don't: they decay more slowly.

Zipf Plot

In order to say that an empirical distribution can be included in the fat-tailed class, a necessary (but not sufficient) condition is that the Zipf plot shows a straight line with negative slope.
Why?
Because if the distribution is fat-tailed the survival function is:

$\bar{F}(x) = (\frac{x}{x_{0}})^{-\alpha} \;\;, 0 < x_{0} \leq x \Rightarrow \log{\bar{F}(x)} = \alpha\log{x_{0}} - \alpha\log{x}$

In [16]:
def survival(data):
    """Empirical survival function: sorted data and 1 - ECDF.
    Note: the largest observation gets survival 0, which maps to -inf
    on the log scale in the Zipf plot."""
    x = np.sort(data)
    n = x.size
    y = np.arange(1, n + 1) / n
    return x, 1 - y
In [17]:
def zipf(data, f):
    """Return (log x, log survival) pairs for a Zipf (log-log survival) plot."""
    x, f_x = f(data)
    return np.log(x), np.log(f_x)
In [18]:
np.random.seed(122)
pareto_sample = pareto(b=1.5).rvs(size=3000)
top_25_pareto = pareto_sample[pareto_sample > np.percentile(pareto_sample, 25)]

expon_sample = expon.rvs(size=3000)
top_25_expon = expon_sample[expon_sample > np.percentile(expon_sample, 25)]

lognor_sample = lognorm(s=1).rvs(size=3000)
top_25_lognor = lognor_sample[lognor_sample > np.percentile(lognor_sample, 25)]

norm_rv = np.random.randn(3000)
norm_losses = np.abs(norm_rv[norm_rv < 0])
norm_top_25_losses = norm_losses[norm_losses > np.percentile(norm_losses, 25)]

Let's first recall the Zipf Plot of 4 fundamental distributions:

  • Pareto
  • Exponential
  • Normal
  • Lognormal ($\sigma$ = 1)
In [19]:
fig, ax = plt.subplots(figsize=(10,7))
fig.suptitle('ZIPF Plot of Tail Losses', fontsize=18)

ax.scatter(*zipf(norm_top_25_losses, survival), label='Normal', s=30, alpha=0.6)
ax.scatter(*zipf(top_25_pareto, survival), label=r'Pareto ($\alpha=1.5$)', s=30, alpha=0.6)
ax.scatter(*zipf(top_25_expon, survival), label='Exponential', s=30, alpha=0.6)
ax.scatter(*zipf(top_25_lognor, survival), label=r'Lognormal ($\sigma = 1$)', s=30, alpha=0.6)
ax.legend(fontsize=14);

Now let's show the Zipf plots of SPX and BTC tail losses.

In [20]:
fig, ax = plt.subplots(1, 2, figsize=(18,8))
fig.suptitle('$ZIPF$ Plots of tail losses (top 25%)', fontsize=18)
ax[0].scatter(*zipf(spx_losses_tail, survival), label='SPX top 25% losses', c='r', s=30, alpha=0.6)
ax[0].set_title('$SPX$')
ax[1].scatter(*zipf(btc_losses_tail, survival), label='BTC top 25% losses', c='teal', s=30, alpha=0.6)
ax[1].set_title('$BTC$')
for i in range(2):
    ax[i].set_xlabel('$\log(x)$', fontsize=16)
    ax[i].set_ylabel('$\log(1 - F(x))$', fontsize=16)
    ax[i].legend(fontsize=13)
plt.subplots_adjust();

As we can notice in the above plots, both the SPX and BTC loss tails (top 25% of losses) seem to decay at a sort of "midway speed" between a Lognormal with a not-small $\sigma$ and a Pareto distribution.

Peaks Over Thresholds (POT) and MEPLOT

Now let's talk about a very interesting property of the fat-tailed distributions related to the Mean Excess Function.
Let's define the Mean Excess Function as:

$M(\nu) \equiv E[X-\nu | X > \nu]$

It can be viewed as a sort of expected shortfall, but shifted (i.e. we subtract $\nu$ from $X$).

I like this function because its interpretation is very intuitive and useful.
Indeed, let's take two different light-tailed distributions:

  1. Normal
  2. Exponential

Let's start with the Normal:
if I knew that a normal random variable had taken a positive value (i.e. $\nu = 0$), or I knew that the same random variable had taken a value higher than 3 (i.e. $\nu = 3$), would the two Mean Excess Functions be the same?
In other words, if $X$ follows a standard normal distribution:
would $E[X - 0 \mid X > 0]$ be the same as $E[X - 3 \mid X > 3]$?
(where 0 and 3 represent two virtual losses).
Clearly not.
The first mean excess function is greater than the second.
In other words, for a normal distribution, the more we increase the threshold $\nu$ the more the Mean Excess Function decreases.

In [21]:
def mef(data, v):
    """Sample Mean Excess Function: average of (x - v) over the observations above v."""
    exceedances = data[data > v]
    return (exceedances - v).mean()

We can see this concept with the Sample Mean Excess Plot (MEPLOT).
Let's generate a sample from a standard normal and take only the losses of this sample (absolute values of the negative observations).
In the following chart we can see that there is an inverse relation between the threshold and the Mean Excess Function (we can ignore the last few dots to the right in these charts).

In [22]:
np.random.seed(1)
norm_rv = np.random.randn(10000)
norm_losses = np.abs(norm_rv[norm_rv < 0])
mef_normal = [mef(norm_losses, v) for v in np.sort(norm_losses)]
fig, ax = plt.subplots()
ax.set_title(r'Sample Mean Excess Plot of $Normal$ Losses', fontsize=14)
ax.set_xlabel(r'Threshold $\nu$')
ax.set_ylabel(r'Average of $(x-\nu) | x > \nu$', fontsize=14)
ax.scatter(np.sort(norm_losses), mef_normal, s=20, alpha=0.6);

What about the Exponential case? We should recall that the Exponential distribution has the memoryless property:
once we know that the variable exceeded a given threshold $\nu$, that threshold gives no information about the size of the excess.
In other words, the conditional distribution of $X - \nu$ given $X > \nu$ does not depend on $\nu$.

In [23]:
np.random.seed(1)
exp_rv = expon.rvs(size=10000)
mef_exp = [mef(exp_rv, v) for v in np.sort(exp_rv)]
fig, ax = plt.subplots()
ax.set_title(r'Sample Mean Excess Plot of $Exponential$ Sample', fontsize=14)
ax.set_xlabel(r'Threshold $\nu$')
ax.set_ylabel(r'Average of $(x-\nu) | x > \nu$', fontsize=14)
ax.scatter(np.sort(exp_rv), mef_exp, s=20, alpha=0.6);
In [24]:
fig, ax = plt.subplots(1, 2, figsize=(16,7))

np.random.seed(32)  # the seed must be set before generating the samples
lognorm_rv_03 = lognorm(s=0.3).rvs(size=10000)
lognorm_rv_15 = lognorm(s=1.5).rvs(size=10000)
mef_lognorm_rv_03 = [mef(lognorm_rv_03, v) for v in np.sort(lognorm_rv_03)]
mef_lognorm_rv_15 = [mef(lognorm_rv_15, v) for v in np.sort(lognorm_rv_15)]
ax[0].set_title(r'Sample Mean Excess Plot of $Lognormal$ Sample ($\sigma = 0.3$)', fontsize=14)
ax[0].set_xlabel(r'Threshold $\nu$', fontsize=14)
ax[0].set_ylabel(r'Average of $(x-\nu) | x > \nu$', fontsize=14)
ax[0].scatter(np.sort(lognorm_rv_03), mef_lognorm_rv_03, s=20, alpha=0.6);
ax[1].set_title(r'Sample Mean Excess Plot of $Lognormal$ Sample ($\sigma = 1.5$)', fontsize=14)
ax[1].set_xlabel(r'Threshold $\nu$', fontsize=14)
ax[1].set_ylabel(r'Average of $(x-\nu) | x > \nu$', fontsize=14)
ax[1].scatter(np.sort(lognorm_rv_15), mef_lognorm_rv_15, s=20, alpha=0.6);

As we notice, the MEPLOT of the Lognormal with a small $\sigma$ is very different from that with a not-small $\sigma$.
In fact the tail behaviour of a Lognormal distribution depends heavily on $\sigma$.
Data generated by a Lognormal with a high $\sigma$ can easily be mistaken for data from a fat-tailed distribution: we have to keep this fact in mind when we try to establish whether empirical tail losses are more likely to follow a *fat-tailed* distribution or a *lognormal* distribution with a not-small $\sigma$.

Now what about a random variable that follows a Pareto distribution (fat-tailed class) with finite mean ($\alpha=1.5$)?

In [25]:
np.random.seed(12)
par_rv = pareto(b=1.5).rvs(size=10000)
mef_pareto = [mef(par_rv, v) for v in np.sort(par_rv)]
fig, ax = plt.subplots(figsize=(9,6))
ax.set_title('Sample Mean Excess Plot of $Pareto$ Sample', fontsize=14)
ax.set_xlabel(r'Threshold $\nu$', fontsize=14)
ax.set_ylabel(r'Average of $(x-\nu) | x > \nu$', fontsize=14)
ax.scatter(np.sort(par_rv), mef_pareto, s=20);

Yes, for a fat-tailed variable, the more we increase the threshold the more the Mean Excess Function increases.
If we knew that two r.v. $J$ and $L$ following the same $Pareto (\alpha = 1.5)$, so with finite mean and infinite variance, had exceeded respectively the value of $3$ and the value of $50$, it would follow that $E[J - 3 \mid J > 3] < E[L - 50 \mid L > 50]$.
In other words, if a r.v. follows a Paretian behavior its Mean Excess Function grows approximately linearly in the threshold $\nu$ (the mean of the r.v. has to be finite for the Mean Excess Function to make sense).

In Extremistan, if we know that the r.v. exceeded a threshold, then the higher the exceeded threshold, the farther the r.v. will be from it.
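
For a Pareto r.v. with tail index $\alpha > 1$ the Mean Excess Function has the closed form $M(\nu) = \frac{\nu}{\alpha - 1}$, so for $\alpha = 1.5$ it grows as $2\nu$. Here is a minimal sketch of mine to check this against simulation (it redefines `mef` so it is self-contained; the estimates are noisy because the excesses themselves are fat-tailed):

In [ ]:
import numpy as np
from scipy.stats import pareto

def mef(data, v):
    """Sample Mean Excess Function: average of (x - v) over the observations above v."""
    exceedances = data[data > v]
    return (exceedances - v).mean()

np.random.seed(7)
alpha = 1.5
sample = pareto(b=alpha).rvs(size=500_000)  # large sample to tame the noise

for v in [3, 10, 50]:
    print(f'v = {v:>3}:  sample MEF = {mef(sample, v):8.2f},'
          f'  theory v/(alpha-1) = {v / (alpha - 1):8.2f}')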

Now let's come back to the tail (top 25%) of the SPX and BTC empirical losses distributions.
How does their empirical Mean Excess Function behave?

In [26]:
mef_spx_losses_tail = [mef(spx_losses_tail, v) for v in np.sort(spx_losses_tail)]
mef_btc_losses_tail = [mef(btc_losses_tail, v) for v in np.sort(btc_losses_tail)]
fig, ax = plt.subplots(1, 2, figsize=(15,7))
ax[0].set_title(r'Sample Mean Excess Plot of $SPX$ tail losses', fontsize=14)
ax[0].scatter(np.sort(spx_losses_tail), mef_spx_losses_tail, color='r', s=20, alpha=0.6);
ax[1].set_title(r'Sample Mean Excess Plot of $BTC$ tail losses', fontsize=14)
ax[1].scatter(np.sort(btc_losses_tail), mef_btc_losses_tail, color='teal', s=20, alpha=0.6);
for i in range(2):
    ax[i].set_xlabel(r'Threshold $\nu$', fontsize=14)
    ax[i].set_ylabel(r'Average of $(x-\nu) | x > \nu$', fontsize=14)

This seems to be an interesting result:

Regarding the SPX, the losses distribution seems to be characterized by an increasing Mean Excess Function, at least up to a certain threshold, beyond which the plot seems to become concave.
If we knew that the SPX closed with a daily loss $x$ above $\nu$, then $E[x - \nu]$ would increase with $\nu$, at least up to a certain loss threshold.

In this sense it seems that the S&P 500 losses actually follow a sort of distribution in between the Fréchet type (to which the Pareto belongs) and the Gumbel type (to which the exponential, normal and log-normal distributions belong). More on that later.

Regarding BTC, the losses distribution seems to be characterized by an increasing Mean Excess Function that does not become concave.
For BTC the Mean Excess Function of the losses grows linearly with the threshold.
Let's remember that this is a feature of all fat-tailed distributions.
So we have a sign that *BTC* daily losses could be characterized by a Paretian behavior in the tail.

The problem is that in order to distinguish with accuracy a true fat-tailed distribution from an almost fat-tailed one we need a lot of data, at least 10,000 observations according to Prof. P. Cirillo (and this fact will be confirmed by our simulations at the end of this notebook).

The Mean Excess Function is the quantity at the heart of the Peaks Over Thresholds (POT) approach.
The distribution of the exceedances is called the conditional excess distribution function of the r.v. $X$ (i.e. $F_{\nu}(x) \equiv P(X - \nu \leq x | X > \nu)$), and it is closely related to the notion of the Generalized Pareto Distribution (GPD) and the associated theorem of Balkema and de Haan (1974) and Pickands (1975).

GPD and GEV: a brief recall

The GPD depends on a shape parameter $\xi$ and describes the behavior of the conditional excess distribution function $F_{\nu}(x)$ (recall that $F_{\nu}(x) \equiv P(X - \nu \leq x | X > \nu)$) when we push the threshold $\nu$ towards the right end of the support of the underlying distribution.
This is valid only for those random variables whose normalized maxima converge in distribution to a Generalized Extreme Value distribution. There is indeed a close relation between the GPD framework and the GEV framework, and we will see it at the end of this section.


Let's say we have $n$ i.i.d. observations of a r.v. $X$, and we take the partial maximum $M_{n}$ of these observations:
$M_{n} = \max(X_1, X_2, X_3,....,X_n)$
Do constants $a_{n}$ and $b_{n}$ exist such that the normalized maximum $\frac{M_{n}-a_{n}}{b_{n}}$, with $b_{n} > 0$, converges in distribution to some non-degenerate distribution $G(x)$?

The Fisher–Tippett (1928) theorem answers that question with the notion of the Generalized Extreme Value (GEV) distribution.
It represents a sort of Central Limit Theorem for maxima and it classifies the limiting distribution of a normalized maximum into three different types of distributions, depending on the value of a specific shape parameter $\xi$ that governs the tail behaviour of the distribution.

  1. The Fréchet distribution, $\xi > 0$ (finite lower limit. Any underlying distribution with fat tails)
  2. The Gumbel distribution, $\xi = 0$ (unlimited. e.g. these underlying distributions: normal, lognormal, exponential, gamma, F)
  3. The Weibull distribution, $\xi < 0$ (limited e.g. uniform underlying distribution)
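
To make the classification concrete, here is a minimal block-maxima sketch of mine: we fit scipy's genextreme to block maxima of a Pareto sample and of a normal sample. Note that scipy parameterizes the GEV with shape $c = -\xi$, so a Fréchet-type limit shows up as $c < 0$; for a Pareto with tail index $\alpha$ we expect $\xi \approx 1/\alpha$, while the normal case converges to its Gumbel limit slowly, so its estimate only hovers near zero.

In [ ]:
import numpy as np
from scipy.stats import genextreme, pareto, norm

np.random.seed(3)
n_blocks, block_size = 500, 250  # e.g. roughly 250 trading days per "year"

# Block maxima of a fat-tailed (Pareto, alpha = 1.5) and a light-tailed (normal) sample
pareto_maxima = pareto(b=1.5).rvs(size=(n_blocks, block_size)).max(axis=1)
normal_maxima = norm.rvs(size=(n_blocks, block_size)).max(axis=1)

for name, maxima, xi_true in [('Pareto', pareto_maxima, 1 / 1.5),
                              ('Normal', normal_maxima, 0.0)]:
    c, loc, scale = genextreme.fit(maxima)
    print(f'{name}: estimated xi = {-c:.3f} (theoretical xi = {xi_true:.3f})')  # xi = -c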

One caveat about the log-normal distribution.
From a GEV standpoint the log-normal is comparable to other well-behaved distributions like the normal or the exponential.
Nonetheless its tail behavior is trickier, as we've already seen: as I wrote at the beginning of the notebook, while the normal and exponential are light-tailed, the log-normal is a Sub-Exponential distribution, and so the Catastrophe Principle applies to it.
In this sense it is not a well-behaved distribution at all (the more we increase $\sigma$, the more a lognormal behaves like a fat-tailed r.v.).

Now let's state the two pivotal theorems of Extreme Value Theory.

  1. The Fisher–Tippett (1928) theorem states the following:
    if there exist constants $a_{n} \in R$ and $b_{n} > 0$ and some non-degenerate distribution $G$ such that $\frac{M_{n}-a_{n}}{b_{n}}$ $\xrightarrow{d}$ $G(x)$,

    then $G$ is a $GEV$ distribution (we don't report the PDF and CDF of the GEV here; you can easily find them on the web), and so it belongs to one of the three types listed above.

In this case we will say that $X$ belongs to the Maximum Domain of Attraction of $G$, i.e. $X \in MDA(G)$.

Once we have stated the Fisher–Tippett theorem we can state the other fundamental theorem of Extreme Value Theory, the so-called Pickands–Balkema–de Haan theorem.

  2. The Pickands–Balkema–de Haan theorem states the following:
    Let $(X_{1},X_{2},...X_{n})$ be a sequence of i.i.d. r.v., and let $F_{\nu}$ be their conditional excess distribution function:

If $X \in MDA(G_{\xi})$ $\Rightarrow$ $F_{\nu}(x)$ $\rightarrow$ $G(x;\sigma(\nu),\xi), \, as \, \nu \rightarrow x_{N}$

where $x_{N}$ is the right end of the support of $X$ and $G(x;\sigma(\nu),\xi)$ is a Generalized Pareto Distribution (GPD) (we don't report the PDF and CDF of the GPD here).

In other words, the Pickands–Balkema–de Haan theorem states that if $X$ belongs to the Maximum Domain of Attraction of a $GEV$ distribution, then the distribution of the excess values (i.e. $X - \nu$) above a high threshold is very well approximated by a GPD.

An important result is that if the distribution of the normalized maxima of an underlying distribution $F$ converges in distribution to one of the GEV types with a given parameter $\xi$, then the distribution of the exceedances over a threshold converges in distribution to the GPD with the same parameter $\xi$.
GPD and GEV are two mathematical frameworks that look at the same issue from different perspectives.
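
As an illustration of the POT side of this result (a sketch of my own, not the notebook's estimation procedure), we can fit scipy's genpareto to the exceedances of a Pareto sample over a high threshold and recover roughly the same $\xi = 1/\alpha$ as in the block-maxima sketch above:

In [ ]:
import numpy as np
from scipy.stats import genpareto, pareto

np.random.seed(5)
alpha = 1.5
sample = pareto(b=alpha).rvs(size=100_000)

# POT: keep the excesses (x - v) over a high threshold v (here the 95th percentile)
v = np.percentile(sample, 95)
excesses = sample[sample > v] - v

# Fit a GPD to the excesses; floc=0 because the excesses start at zero by construction
xi, loc, sigma = genpareto.fit(excesses, floc=0)
print(f'threshold v = {v:.2f}, estimated xi = {xi:.3f} (theoretical 1/alpha = {1/alpha:.3f})')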

We are not going to dive further in the mathematical details of the issue (in the Resources section you can find more formal material related to that).

Max to Sum Ratio Plot (MS)

Before concluding, let's introduce the notion of the Maximum to Sum ratio of order $p$.
Let's say we have $n$ i.i.d. observations of a r.v. $X$, and we take the maximum $M_{n}(p)$ of these observations, where

$M_{n}(p) = \max(|X_1|^{p}, |X_2|^{p}, |X_3|^{p},....,|X_n|^{p})$

and the sum $S_{n}(p)$

$S_{n}(p) = \sum_{i=1}^{n}(|X_i|^{p})$

we define:
$R_{n}(p) = \frac{M_{n}(p)}{S_{n}(p)}$

The Law of Large Numbers (LLN) tells us that $R_{n}(p) \longrightarrow 0 \Longleftrightarrow E[|X|^{p}] < \infty$.
In other words, the maximum to sum ratio of order $p$ converges almost surely to zero if and only if the moment of order $p$ is finite.

In [27]:
def max_to_sum_ratio(array, order):
    """Running Max-to-Sum ratio R_n(p) = max(|x_i|^p) / sum(|x_i|^p) over growing samples."""
    powers = np.abs(array) ** order
    # The cumulative max over the cumulative sum gives the ratio for every partial sample size
    return np.maximum.accumulate(powers) / np.cumsum(powers)

Let's show the MS Plot of four samples of 2,000 observations each, generated from:

  • Pareto ($\alpha = 1.5$)
  • Exponential ($\lambda = 1$)
  • Lognormal ($\sigma$ = 0.5)
  • Lognormal ($\sigma$ = 1.5)
In [28]:
np.random.seed(12)
par_rv = pareto(b=1.5).rvs(size=2000)
exp_rv = expon.rvs(size=2000)
log_norm_rv_05 = lognorm(s=0.5).rvs(size=2000)
log_norm_rv_15 = lognorm(s=1.5).rvs(size=2000)
fig, ax = plt.subplots(2, 2, figsize=(16,13))
ax[0, 0].set_title(r'$Pareto$ Max to Sum Ratio (only 1st Moment finite)')
ax[0, 1].set_title(r'$Exponential$ Max to Sum Ratio')
ax[1, 0].set_title(r'$Lognorm$ Max to Sum Ratio ($\sigma = 0.5$)')
ax[1, 1].set_title(r'$Lognorm$ Max to Sum Ratio ($\sigma = 1.5$)')
for i in range(1, 5):
    ax[0, 0].plot(max_to_sum_ratio(par_rv, i), label=f'Order {i}')
for i in range(1, 5):
    ax[0, 1].plot(max_to_sum_ratio(exp_rv, i), label=f'Order {i}')
for i in range(1, 5):
    ax[1, 0].plot(max_to_sum_ratio(log_norm_rv_05, i), label=f'Order {i}')
for i in range(1, 5):
    ax[1, 1].plot(max_to_sum_ratio(log_norm_rv_15, i), label=f'Order {i}')
for j in range(2):
    for i in range(2):
        ax[j, i].legend(loc='best')
        ax[j, i].set_xlabel(r'$N$ (partial count)', fontsize=12)
        ax[j, i].set_ylabel(r'$R_{N}(p)$', fontsize=14)

Let's briefly comment on the above charts:

  • Pareto sample: only the $1^{st}$ moment exists, and we see what we expect: the MS ratio of order $1$ converges to zero, while the ratios of order greater than $1$ don't converge to $0$.
  • Exponential sample: all theoretical moments exist, and we see what we expect: all the ratios converge to zero.
  • Lognormal ($\sigma = 0.5$) sample: all theoretical moments exist, and we see what we expect: all the ratios converge to zero.
  • Lognormal ($\sigma = 1.5$) sample: all theoretical moments exist, but we do not see what we expect: only the ratio of order $1$ seems to converge to $0$, while all the other ratios don't seem to converge to zero.
    Why?

Because we have too few data: 2,000 observations are insufficient to see the convergence of $R_{n}(p)$ to $0$ when the *Lognormal* has a **not** small $\sigma$.
In fact, as we already said, we need a lot of observations in order to distinguish a lognormal from a fat-tailed distribution.
How many?

To gauge the answer we repeat the same MS plot for the Lognormal ($\sigma = 1.5$) case, but this time with 15,000 observations.
We can notice that all the ratios $R(p)$ have decreased; nevertheless, we are still not able to see the convergence to $0$ for $p > 1$.

In [29]:
np.random.seed(12)
log_norm_rv_15 = lognorm(s=1.5).rvs(size=15000)
fig, ax = plt.subplots(figsize=(10, 7))
ax.set_title(r'$Lognorm$ Max to Sum Ratio ($\sigma = 1.5$)')
for i in range(1, 5):
    ax.plot(max_to_sum_ratio(log_norm_rv_15, i), label=f'Order {i}')
ax.set_xlabel(r'$N$ (partial count)', fontsize=12)
ax.set_ylabel(r'$R_{N}(p)$', fontsize=14)
ax.legend(loc='best');

The increased number of observations (15,000), instead, doesn't change the results for the Pareto case at all.
$R(p)$ doesn't converge to 0 for $p > 1$ because it can't: none of the $p^{th}$ moments with $p > 1$ exist.

In [30]:
np.random.seed(12)
par_rv = pareto(b=1.5).rvs(size=15000)
fig, ax = plt.subplots(figsize=(10, 7))
ax.set_title(r'$Pareto$ Max to Sum Ratio ($\alpha = 1.5$)')
for i in range(1, 5):
    ax.plot(max_to_sum_ratio(par_rv, i), label=f'Order {i}')
ax.set_xlabel(r'$N$ (partial count)', fontsize=12)
ax.set_ylabel(r'$R_{N}(p)$', fontsize=14)
ax.legend(loc=4);

What about the MS Plot for $SPX$ and $BTC$ losses?

In [31]:
fig, ax = plt.subplots(1, 2, figsize=(18,8))
ax[0].set_title(r'$SPX$ Max to Sum Ratio')
ax[0].set_xlabel(r'$SPX$ N° losses', fontsize=12)
ax[1].set_title(r'$BTC$ Max to Sum Ratio')
ax[1].set_xlabel(r'$BTC$ N° losses', fontsize=12)
for i in range(1, 5):
    ax[0].plot(max_to_sum_ratio(spx_losses, i), label=f'Order {i}')
for i in range(1, 5):
    ax[1].plot(max_to_sum_ratio(btc_losses, i), label=f'Order {i}')
    
for i in range(2):
    ax[i].set_ylabel(r'$R_{N}(p)$', fontsize=14)
    ax[i].legend()

As we can notice in the above charts, the first moment (the mean) is finite both for SPX and BTC.
Regarding the $2^{nd}$ moment (the variance), instead, while it seems to be finite for the SPX losses distribution, it is unclear whether it is finite or infinite for the BTC losses distribution, because the ratio doesn't clearly converge to zero.
Moreover, for BTC it might not make sense to compute the sample skewness and the sample kurtosis, because it is likely that the $3^{rd}$ and $4^{th}$ moments are not finite (while, trivially, the sample statistics are always finite).
It is likewise not so clear whether the SPX losses distribution has a finite $3^{rd}$ moment or not.
For the SPX, the $4^{th}$ moment of the losses distribution is likely not to exist (this is also confirmed by N. N. Taleb's analysis in his Statistical Consequences of Fat Tails): in this case the estimate of the true volatility could be unreliable even if the $2^{nd}$ moment actually existed.
The conclusion is: we are sure that the $1^{st}$ moment exists, but from the $2^{nd}$ moment on (for BTC) and from the $3^{rd}$ moment on (for SPX), we have too few data to make any strong claims about the existence of these moments.

This means that we should use caution when we talk about the volatility, skewness and kurtosis of stock and Bitcoin returns.

Resources:

Appendix A

The dual distribution

Every time we have a positive r.v. $X$ with a finite upper bound and we study it with an EVT approach (the lower bound is automatically finite when we study the tail behavior), we need to take this aspect into account, otherwise we would make epistemological and statistical mistakes by treating it as a r.v. with possibly infinite moments (for example when we estimate the tail index $\xi$).
Indeed, if $X$ has a finite upper bound, all its moments exist, no matter how large the bound is.

One way to solve this impasse is by using a log-transformation, in order to work with the so called dual distribution.

Let's define the function:

$\phi(X) = L - H \log{\frac{H - X}{H - L}}$

where $L$ and $H$ are the lower and upper bound of the tail under scrutiny.

We define the random variable $Z = \phi(X)$.
The support of $Z$ is $[L, +\infty)$.

Notice that the transformation induced by $\phi(\cdot)$ does not depend on any of the parameters of the distribution of $X$, and that $\phi(\cdot)$ is monotone.

We call the distributions of $X$ and $Z$ the real and dual distributions, respectively.
For values of $x$ much smaller than $H$, the two distributions are indistinguishable, but the dual distribution (of $Z$) has an infinite upper bound, while the real (original) distribution (of $X$) has a finite one.

So we can derive the tail behavior of our empirical r.v. $X$ by studying the tail behaviour of its transformation $Z$.
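
A minimal implementation sketch of the dual transformation (my own addition; the bounds $L = 0$ and $H = 1$ below correspond to the simple-return losses support discussed in the main text):

In [ ]:
import numpy as np

def phi(x, L, H):
    """Dual transformation: maps [L, H) to [L, +inf); nearly the identity for x much smaller than H."""
    return L - H * np.log((H - x) / (H - L))

def phi_inverse(z, L, H):
    """Inverse of the dual transformation: maps [L, +inf) back to [L, H)."""
    return H - (H - L) * np.exp((L - z) / H)

L, H = 0.0, 1.0
x = np.array([0.01, 0.05, 0.20, 0.90, 0.99])
z = phi(x, L, H)
print(np.round(z, 4))                      # close to x for small losses, explodes near H
print(np.allclose(phi_inverse(z, L, H), x))  # True: the transformation is invertible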