Statistics Is Useless Part I: The Problem
Classical statistics relies on the assumption that observations about the past tell us something about the future. This lets us infer the characteristics of the variable in question and make decisions today based on what is likely to happen in the future.
We use these past observations to tell us something about the generating function of the variable. We try to answer the question “What mechanism is driving the outputs of this variable?”.
This ability to use past observations (a sample) relies on three assumptions:
- The future resembles the past (the generating function does not change over time).
- The observations we have are sufficient to determine the characteristics of the generating function.
- We know how large our sample of observations has to be to infer properties of the generating function with a known degree of accuracy.
This is often not the case, as we shall see later.
What’s Your Type?
There are four types of generators:

Type 1
Type 1 typically relates to games. The outcomes of such games are bounded (that is, you know the minimum and maximum of the possible observations).
For example, imagine you are rolling a six-sided die whose faces show integer values from the range [1, 12]. Let’s say the faces are [1, 5, 2, 8, 12, 10], making the expected value of each roll 6.333…
It isn’t going to take you many rolls to figure out roughly what the expected value of each roll is:
| Trials | E[X_n] | Absolute Error |
| --- | --- | --- |
| 1 | 12 | 5.666… |
| 10 | 6.7 | 0.3666… |
| 100 | 6.65 | 0.31666… |
| 1,000 | 6.487 | 0.153666… |
| 10,000 | 6.3228 | 0.0105333… |
As you can see, as the number of trials increases, the mean of our trial rolls pretty rapidly approaches the expected value of our die.
What’s more, even if you do get unlucky, your guess is never going to be that far off, because you know the bounds on the possible observations and therefore the range within which E[X] (the expected value) must lie.
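You can reproduce this experiment with a few lines of Python. This is a minimal sketch (assuming NumPy is available; the random seed is arbitrary, so the exact numbers will differ from the table above):

```python
import numpy as np

# Simulate the die described above: faces [1, 5, 2, 8, 12, 10],
# so the true expected value is 38 / 6 = 6.333...
rng = np.random.default_rng(0)
faces = np.array([1, 5, 2, 8, 12, 10])
true_mean = faces.mean()

for n in [1, 10, 100, 1_000, 10_000]:
    rolls = rng.choice(faces, size=n)      # n independent rolls of the die
    estimate = rolls.mean()                # sample mean after n trials
    print(f"{n:>6} trials: mean = {estimate:.4f}, "
          f"absolute error = {abs(estimate - true_mean):.4f}")
```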
Type 2
Type 2 relates to “well-behaved” generator functions that have a very low probability of generating an extreme result. The moments of the samples produced by these types of generator functions converge to the actual moments of the distribution very quickly and at a known rate, making inference relatively simple. Note that the observations don’t even have to be bounded: as long as the probability of an extreme event is very small, past observations can still tell us an awful lot about the characteristics of the generating function.
Consider the Gaussian distribution. Most of classical statistics is based on the fact that this distribution is well behaved and that its sample moments converge to the true values at a known rate. Look up the Central Limit Theorem and the Law of Large Numbers if you don’t believe me.
Consider a normal distribution with mean 1,000 and standard deviation 100. Let’s take some samples and see what we get:
[Figure: samples drawn from N(1000, 100) at increasing sample sizes.]
As you can see, as the number of trials increases, observations produced from this generator pretty rapidly approximate a normal distribution.
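A minimal sketch of the same experiment in Python/NumPy (arbitrary seed; plot the samples with any charting library to reproduce the figure):

```python
import numpy as np

# Draw increasingly large samples from N(1000, 100) and watch the sample
# moments settle near the true mean (1,000) and standard deviation (100).
rng = np.random.default_rng(0)
mu, sigma = 1_000, 100

for n in [10, 100, 1_000, 10_000, 100_000]:
    sample = rng.normal(mu, sigma, size=n)
    print(f"{n:>7} trials: mean = {sample.mean():8.2f}, "
          f"std = {sample.std(ddof=1):7.2f}")
```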
Type 3
Type 3 is where things start to get trickier. Generators of this type are tractable to some extent, but convergence occurs at an unknown rate and their characteristics are only revealed after a large number of observations. The sample moments from these types of generating functions converge very slowly to their true values. It may take 10^6 observations or more to reveal the actual properties of the generating function.
Taking a mix of a Poisson and a Gaussian, for example:
X = 100 × (S × N(0.01, 1) + (1 - S) × P(0.01))
Here we know that E[X] is 2, but is this obvious from large samples?
| Trials | R1 | R2 | R3 | R4 | R5 |
| --- | --- | --- | --- | --- | --- |
| 10 | 11 | 11 | 13 | -48 | -15 |
| 100 | 23.62 | 13.46 | -1.21 | 7.07 | 9.05 |
| 1,000 | 2.18 | -1.05 | -0.01 | 2.47 | 6.27 |
| 10,000 | 1.17 | 1.54 | 1.75 | 3.29 | 0.65 |
| 100,000 | 1.54 | 1.47 | 2.17 | 1.89 | 2.68 |
Looking at these five runs, we don’t see anything resembling the true mean until we reach 100,000 trials. Even then, the estimate can easily be 25% above or below the true value.
The general problem with combining generating functions into one is that S is unknown. We might know how to deal with the Type 1 or Type 2 part, but because S is unknown we don’t know how many observations are required before we can be sure about the generator as a whole.
So, even if we know our generator is of Type 3, we don’t know how many observations are required to reveal its true properties.
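Here is a sketch of the mixed generator in Python/NumPy. The distribution of the switch S isn’t stated above, so a fair Bernoulli switch is assumed purely for illustration; the exact numbers will differ from the table, but the sample means wander just as slowly:

```python
import numpy as np

# Sketch of X = 100 * (S * N(0.01, 1) + (1 - S) * P(0.01)).
# The text does not specify the distribution of the switch S, so
# S ~ Bernoulli(0.5) is assumed here purely for illustration.
rng = np.random.default_rng(0)

def draw(n):
    s = rng.integers(0, 2, size=n)               # assumed Bernoulli(0.5) switch
    normal_part = rng.normal(0.01, 1, size=n)    # N(0.01, 1)
    poisson_part = rng.poisson(0.01, size=n)     # Poisson with lambda = 0.01
    return 100 * (s * normal_part + (1 - s) * poisson_part)

for n in [10, 100, 1_000, 10_000, 100_000]:
    print(f"{n:>7} trials: sample mean = {draw(n).mean():7.3f}")
```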
Type 4
Go to your local library and enter the cobwebbed statistics section. Pick up any introductory statistics textbook and turn to any major theorem or proof; chances are you’ll find the caveat “conditional on finite mean” and/or “conditional on finite variance”. In the realm of Type 4, statistics breaks down. No moments exist.
Without these conditions, statistics simply isn’t possible.
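To see what “no moments exist” means in practice, consider the standard Cauchy distribution (not mentioned in the text above; it is used here only as a textbook example of such a generator). No matter how many observations you take, the sample mean never settles down:

```python
import numpy as np

# Illustration: the standard Cauchy distribution has no finite mean or
# variance, so the sample mean does not converge to anything, no matter
# how large the sample gets.
rng = np.random.default_rng(0)

for n in [100, 10_000, 1_000_000]:
    sample = rng.standard_cauchy(size=n)
    print(f"{n:>9} trials: sample mean = {sample.mean():10.2f}")
```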
Even Then
One generator, please
The above analysis is concerned with what we can learn from observations of a generator of a known type. However, in the real world, God sadly does not tell us what type of generating function we are dealing with. We don’t observe the generator directly, only the observations it produces. There is no independent way to determine which generator we are dealing with; we must infer it from past observations.
This leads to circular reasoning. How do we know if we have enough data to determine the distribution? Well, that depends on the distribution. How do we determine the distribution? From the data.
This infinite regress means that generators of Type 3 and 4 can often be mistaken for Type 1 or 2. Distributions can appear to be simple to deal with…until one observation comes along and changes everything.
The generators are a changin’
Something I’m certain you’re sick of hearing at this point on this site: life is not a controlled experiment. Like God, according to Einstein, we are not rolling dice. Most real-life situations do not involve such known characteristics. This means that real-life situations are highly susceptible to the mistaken-generator problem identified above.
But there is also another problem: the generator may be hard to pin down in the first place. It may have slippery, unstable, or highly volatile properties. Worse, it may have identifiable characteristics, we may have determined these characteristics, we may be relying on these characteristics…only for them to change. This can happen either at random or as a result of us acting on our newfound knowledge (most systems in the real world are not closed systems).
It Ain’t What You Don’t Know That Gets You Into Trouble. It’s What You Know for Sure That Just Ain’t So.
Mark Twain
Note: this article is based on an unpublished paper by Taleb and Pilpel – On the Unfortunate Problem of the Nonobservability of the Probability Distribution.