Don't play what's there, play what's not there
Miles Davis
The Quantitative space
Data are means of transportation. Models and tools are the infrastructure. The analyst/modeler is the driver.
If the quality of data is good we have reliable means of transportation.
If models and tools are powerful and robust we have a stable infrastructure.
If the analyst is skilled and careful we have a wise driver.
But an overlooked question still remains: what is the destination?
The Good, the Bad and the Ugly
What does Sergio Leone's masterpiece have in common with our topic?
Almost nothing, but given the fact that our story has three virtual characters, why not?
Anyway, let's get serious now.
Inference is the arrival station of the truth town. Forecast is the arrival station of the functional town. Action is the arrival station of the fitness town.
These are three different destinations.
Inference means that through data we come to understand the nature of a hidden process, the behaviour of a variable or the properties of some dynamics.
Most of the time we don't actually come to understand the hidden object, but we believe we have it all figured out.
It doesn't make a big difference: the main point is that we believe in an unveiled correspondence between the subject (the inner world of the perceiver) and the object (its outer world).
And, curiously, everyone thinks he/she has the true correspondence once the inference is made.
As a matter of fact it doesn't make sense to talk about a false correspondence:
the notions of correspondence and truth are intimately related.
Inference happens when idealism becomes friends with materialism.
Forecast instead doesn't care about that at all.
Forecast simply tries to anticipate a future outcome, without inquiring into the path that leads to that outcome.
It is not indolent; it simply admits that it can't grasp the dynamic cause-effect series that leads to the outcome.
Forecast is a bizarre character in our plot: it wants to anticipate a future chapter or sentence of the book (i.e. the model's output) by reading only
the current unfinished book and creating the remaining part.
In other words, forecast knows that it is not able to explain why reality behaves the way it does, but nevertheless, for whatever reason,
it makes a bet on the dress that reality will wear.
Action is a forecast character with a survival payoff constraint.
It is much less light-hearted than the forecast character and far less cocky than the inference one.
Its aim is not as noble as inference's: in fact it is not searching for the why and the how.
Its aim is not as ego-driven as forecast's: it is not about being right.
Action wants to maximize a survival payoff, and in order to do that it has to guess not only what the future will be like but, mainly,
what will happen if this occurs and what will happen if that occurs.
Action has neither the time nor the energy to think about the truth, and it does not have the privilege of merely being right.
It has to reach the next day in the best possible condition.
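The difference between betting on the most likely outcome and choosing the move with the best payoff can be made concrete with a toy sketch. Everything in it is an assumption made up for illustration: the two scenarios, their probabilities, the two moves and their payoffs. Expected payoff is used as a simple stand-in for the "survival payoff"; other criteria (for example, avoiding the worst case) would be equally defensible.

```python
# Hypothetical scenario probabilities, invented for illustration only.
scenarios = {"calm": 0.9, "crash": 0.1}

# Hypothetical payoffs: payoff[move][scenario] is what a move yields in a scenario.
payoff = {
    "bet_big": {"calm": 10.0, "crash": -200.0},
    "bet_small": {"calm": 2.0, "crash": -5.0},
}


def forecast(probabilities):
    """Forecast: name the single most likely scenario."""
    return max(probabilities, key=probabilities.get)


def expected_payoff(move, probabilities, payoffs):
    """Average payoff of a move over all the 'what if this, what if that' cases."""
    return sum(p * payoffs[move][s] for s, p in probabilities.items())


def act(probabilities, payoffs):
    """Action: pick the move with the highest expected payoff."""
    return max(payoffs, key=lambda m: expected_payoff(m, probabilities, payoffs))


likely = forecast(scenarios)
# The move that looks best if the forecast turns out to be right.
best_if_forecast_is_right = max(payoff, key=lambda m: payoff[m][likely])
chosen = act(scenarios, payoff)

print("forecast:", likely)
print("best move if the forecast is right:", best_if_forecast_is_right)
print("move chosen by action:", chosen)
```

With these made-up numbers the forecast says "calm" and the move that pays most under "calm" is "bet_big", yet the action criterion picks "bet_small", because the rare crash is too costly to ignore.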
Philosophy, science and power
Statistical Inference applied to the real world represents a continuation of the quest that began with Greek philosophy.
The main difference is that statistical inference places itself within a very different cultural
atmosphere from the one in which Thales of Miletus, Socrates, Aristotle and all their "sons" lived.
Statistical Inference doesn't aim to reach the episteme: in fact it doesn't even believe that the concept of episteme makes sense.
Nonetheless, every time we make a claim about the functioning of reality (or of a part of it) we are subconsciously
assuming that some sort of truth (maybe spatio-temporally conditioned, i.e. not incontrovertible regardless of space and time) exists.
Inference, Truth and Episteme are closely related concepts, and only by looking them directly in the eye
do we understand how problematic every inference-type claim is.
I would say that the immediate reply someone could make is "hey man, I only want to find the coefficients of this regression in order to explain the relationship
between X and Y".
Or "hey man, too much intellectualism: we inferential analysts only want to know how a certain part of reality behaves in the absence of
deterministic patterns: we aim to make some induction about that through data, models, tools".
Got it, but nothing actually changes in the deep chambers of the discourse.
Inference can't exist without the existence of the concept of truth.
When we are making inferences, we are doing epistemology, whether we are aware of it or not.
But without being aware of it we would likely make bad inferences.
Do you agree with me? No?
Fair enough.
In fact, Statistical Inference (S.I.) is not exactly about Truth. Truth is the goal in philosophical and religious traditions.
It represents an unconditioned concept, not subordinated to concepts like
empirical verification or falsifiability: in fact it is the unconditioned concept.
In this sense Science, and hence S.I. as well, are much more humble.
The truth involved in Science and in S.I. is a truth without a capital T.
Its real name is Explanatory Power. While philosophy and religion want to tell us what the world is, science wants to tell
us how the world behaves.
Statistical Inference is about giving the most powerful explanation in a stochastic territory, full of fog.
But does the most powerful explanation actually exist in a given situation? And who is able to detect it?
These questions seem to presuppose the existence of something deeper.
Can this something deeper be something different from the concept of Truth? It seems not.
Or can we say that Explanatory Power has no dependence, no connection with the concept of Truth in a sort of
chaotic simulation where it is what it is and that's all?
The biological world
When we work with data in a field which is related, at least to a certain degree, to the biological realm, the Inference
character can easily resemble an exhausted war veteran.
The complexity of the object of study almost always overwhelms the human ability to carry data to that destination (i.e. the Inference arrival station).
The Forecast character does a little bit better, provided that it maintains a certain dose of humility.
The real star of this scene is, as you can imagine, the Action character.
This is something that many data scientists, economists and analysts may gloss over, but it is a crucial point.
Every time a quantitative framework is aimed at a biology-related field (psychology, economics, finance, social disciplines in general) we have
to keep in mind that the destination is neither Inference nor Forecast: it is Action.
The studying/modelling process and the related considerations have to take this fact into account.
And, finally, if Action is the destination of every quantitative study related to a field with biological footprints, isn't it true that
data are not sufficient to do a good job?
If we evolved until now without statistics and software, isn't it true that, in the name of rigorous analysis, we often miss the
aim of the analysis itself, namely how to make a good move in a timely way?
Things are complex
We know that Inference is noble, with general purposes and a story-telling that would like to be immutable.
Instead Action is arduous, with specific goals and a temporary story-telling which is only functional to the best survival.
In the Action realm a heavy burden joins the picture.
Unlike in the Inference and Forecast "domains", here a fitness payoff is involved, but the problem
is: fitness payoff for whom?
The heavy burden refers to the constant discord between the subject and its complement set: the other.
When the destination of data analysis has to be Action, a good analyst
might know the best move to make in order to maximize his fitness payoff (or the fitness payoff of a group
to which he belongs), but when the object of analysis is not a single-player game (e.g. poker, trading, individual health), an Action
destination most likely implies political decisions.
As a matter of fact, no data-driven political decision taken to
make a good "ensemble" move for citizens can be demonstrated to be the best move in an increasingly connected world.
When Action doesn't apply to a single-player game context, a mess becomes an inevitable consequence.
Action and Truth are destined to play in two territories with no intersection.
Arrival stations are often merged
Physics uses both inference and forecast in order to deal with nature.
Good poker players use both inference (mainly in the form of pattern recognition) and action, in order to prosper at the table.
Many social scientists generally use a mixture of bad inference, forecast (especially the economists) and their "true" Action in order to... express their ideas!
Some others, instead, seem to recognize the intrinsic limits of their discipline (labile epistemological traits) and
try to go the extra mile in order to accomplish a seemingly impossible task: merging the biological
feature of social disciplines with the opacity feature of complex systems and with the mathematical feature of hard sciences.
Last stop
The path of a quantitative player in stochastic and opaque fields begins with reflection about uncertainty,
but this reflection, reflexively, draws attention back to what is certain, absolutely known, true, on which the
above-mentioned reflection seems to depend, despite everything.