## Quantitative Skepticism and Mixtures

• 5k
In statistical modelling, one type of model is the (finite) mixture distribution. A random variable $X_M$ with distribution function $P_M$ follows a finite mixture distribution if and only if there is some finite sequence of probability density functions $P_1, P_2, \ldots, P_n$ and some set of weights $p_1, p_2, \ldots, p_n \geq 0$ with $\sum_{i} p_i = 1$ such that:

$P_M = \sum_{i=1}^{n} p_i P_i$

For example, if $P_1 \sim N(-1,1)$ and $P_2 \sim N(5,1)$ and weights $p_1 = p_2 = 0.5$, this looks like...

(Srap uploaded the file since I couldn't get it working.)

You can see the peaks at the means of $P_1$ ($-1$) and $P_2$ ($5$); this kind of clustering behaviour is typical of mixtures. You can calculate the mean (here denoted $E$) and variance (here denoted $V$) of a mixture distribution as the weighted average over the components. So in this case, we have:

$E(X_M) = 0.5 \int x P_1(x)\,dx + 0.5 \int x P_2(x)\,dx = 0.5E(X_1) + 0.5E(X_2)$

which is $0.5 \times (-1) + 0.5 \times 5 = 2$, where $X_1$ has distribution $N(-1,1)$ and $X_2$ has distribution $N(5,1)$. Generally, the mean of a mixture is the weighted sum of the component means. The second moment works the same way: it is the weighted sum of the component second moments, so if any component has infinite variance, the mixture does too.
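A quick numerical check of the mixture mean, as a sketch in Python, using the weights and components from the example above (the sampler and sample size are my own illustrative choices):

```python
import random

random.seed(0)

def sample_mixture(n, weights=(0.5, 0.5), means=(-1.0, 5.0), sds=(1.0, 1.0)):
    """Draw n samples from the finite mixture: pick a component by its
    weight, then draw from that component's normal distribution."""
    samples = []
    for _ in range(n):
        # choose component 0 with prob weights[0], else component 1
        i = 0 if random.random() < weights[0] else 1
        samples.append(random.gauss(means[i], sds[i]))
    return samples

xs = sample_mixture(100_000)
mean = sum(xs) / len(xs)
print(mean)  # close to 0.5*(-1) + 0.5*5 = 2
```

A histogram of `xs` would show the two peaks near $-1$ and $5$ that the (missing) image was meant to display.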

Now consider we have some data estimating Newton's second law $F=ma$: we measure the impact force $F$ and acceleration $a$ to estimate the mass $m$. The measurement is very precise, and the measurement errors are harmlessly assumed to be Gaussian/normal. So, this would give:

$F=ma \rightarrow F \sim m \cdot N(a,\sigma^2)$ where $\sigma^2$, the measurement error, is very close to 0 since the measurements are precise (this is a high school physics experiment and analysis rendered in terms of the statistics involved). You can now imagine an alternative construction, a mixture distribution model of $F$ such as:

$F \sim 0.999 \cdot m \cdot N(a,\sigma^2) + 0.001 \cdot m \cdot t_1$

This is saying that most of the time we observe faithful measurements of Newton's second law, but 1 in 1000 times we observe something completely nuts and unexpected - the $t$ distribution on 1 degree of freedom (also known as the standard Cauchy distribution). By shrinking the probability multiplying the $t$ distribution term, this model can be made consistent with any set of observations - it will look just like Newton's second law but with a hidden pathological component.
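To see what the 1-in-1000 pathological component looks like in practice, here is a sketch in Python with made-up experimental numbers (the mass, acceleration, and measurement error are my own illustrative choices, not from the text):

```python
import math
import random

random.seed(0)

# Hypothetical numbers for the high-school experiment:
# mass m = 2 kg, acceleration a = 9.81 m/s^2, measurement sd sigma = 0.01.
m, a, sigma, eps = 2.0, 9.81, 0.01, 0.001

def sample_force():
    """One draw of F from the contaminated model:
    with prob 1 - eps, a faithful N(a, sigma^2) measurement scaled by m;
    with prob eps, m times a t-distribution on 1 df (a standard Cauchy)."""
    if random.random() < 1 - eps:
        return m * random.gauss(a, sigma)
    # inverse-CDF sampling of a standard Cauchy (= t on 1 df)
    return m * math.tan(math.pi * (random.random() - 0.5))

forces = [sample_force() for _ in range(100_000)]
# Almost all draws look like faithful measurements of F = ma = 19.62 ...
near = sum(abs(f - m * a) < 0.1 for f in forces)
# ... but the rare Cauchy draws produce wild outliers.
worst = max(abs(f - m * a) for f in forces)
print(near, worst)
```

Nearly all 100,000 draws sit within a hair of $ma$; the handful of Cauchy draws are arbitrarily far away, and it is those draws that wreck the mean and variance.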

What does this pathological component do? Well, it induces two properties in the observations.

(1) The measurements of force have infinite variance.
(2) The measurements of force have an undefined (loosely, infinite) mean.

This means the estimate of the measurement error is entirely pointless, and that it is impossible to obtain an estimate of the acceleration through the mean of repeated concordant measurements. As a recent thread suggested: if we have no evidence by which to distinguish a proposition $p$ from not-$p$, we can't learn anything about it. For a sufficiently small mixture weight, we can render this model consistent with every set of observations of Newton's second law. But since we cannot distinguish the behaviour of Newton's law from the pathological law, we cannot learn anything about Newton's second law from any data.
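The "consistent with every set of observations" point can be sketched numerically: generate data from the pure Gaussian law (in standardized units, my simplification) and compare log-likelihoods under the pure model and the contaminated mixture. The difference is negligible, so the data barely favour one over the other:

```python
import math
import random

random.seed(0)

def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def cauchy_pdf(x):
    """Standard Cauchy density (t on 1 df)."""
    return 1.0 / (math.pi * (1 + x * x))

eps = 0.001
# 1000 standardized "clean" measurements, i.e. data actually generated
# by the pure Gaussian law.
data = [random.gauss(0, 1) for _ in range(1000)]

log_lik_pure = sum(math.log(norm_pdf(x)) for x in data)
log_lik_mixed = sum(math.log((1 - eps) * norm_pdf(x) + eps * cauchy_pdf(x))
                    for x in data)
# The log-likelihood difference is tiny: the observations cannot
# meaningfully distinguish the faithful law from the pathological one.
print(log_lik_pure - log_lik_mixed)
```

Shrinking `eps` further shrinks the difference towards zero, which is the sense in which the pathological model can be made consistent with any data set.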

Thus there is no evidence for Newton's second law.
• 3.5k

What you're talking about here is the underdetermination of theory by data, yes?

Every dataset is a duck-rabbit.

Goodman shows the same effect can be found in induction-- that there are always pathological predicates that can fit your observations just as well as the usual "entrenched" predicates (or kinds).

I think, in a general way, @T Clark is on the right track by asking which theory you can act on. If you want to build a rocket, would you use $\small F=ma$ or one of the pathological alternatives?

That's sort of a "hot take" -- am I talking about what you're talking about?
• 5k

Yeah I think so. I think it's a way of highlighting that principles of parsimony play a fundamental role in determining scientific hypotheses - it's also a constructive argument; the procedure I gave above could be applied to any experiment.*

There's a bit of a wrinkle though: there are systems which arguably should be modelled like this - a 'business as usual' mode and an 'oh god wtf is happening' mode, where the latter changes the global properties of the model. This has been brought up by Mandelbrot and Taleb with regard to financial time series. A bastardised version of the argument would be that financial models typically place too low a value on the relevance of the pathological part, and thus they are surprised by pathology. For example, there were probability estimates of a financial crisis beginning in 2008 (apparently, trusting Taleb) on the order of $10^{-12}$. If you applied a 'raw' principle of parsimony to this kind of equation (a GARCH model in financial time series analysis), you would end up with the 'best' model being one which contains no possibility of radical surprise.

I think the consequence of this is that, as well as principles of parsimony, there should be an analysis of the consequences of assuming too simple a model.

*edit: with real valued quantities.
• 7.7k

Is this supposed to be an image?
• 5k

It is, I can't figure out how to make it display. Can you help?
• 7.7k
> It is, I can't figure out how to make it display. Can you help?

Sorry, no. I was having trouble following your explanation and was hoping an image would help.
• 5k

Nutshell:
If you have two variables - the mean of their sum is the sum of their means.
If you have two variables with no correlation - the variability of their sum is the sum of their variabilities.
If you introduce into the sum a variable which has infinite mean and infinite variance, then the mean is infinite and the variability is infinite.
It doesn't matter what probability you assign to this infinity inducing mess, it'll always make an infinity inducing mess.
With sufficiently low probability of occurring, an infinity inducing mess can be made to resemble a precise experiment.
Thus, the data cannot distinguish an infinity inducing mess from a precisely measured law.
Thus, there's no evidence for the law.
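The nutshell can be simulated. A sketch (standardized units and a made-up contamination rate of 0.001, both my choices) comparing the sample variance of clean Gaussian draws with draws contaminated by a standard Cauchy (the $t$ on 1 degree of freedom):

```python
import math
import random

random.seed(0)

def sample_variances(contaminated, sizes):
    """Sample variance of n draws, for several n. Clean draws are N(0, 1);
    contaminated draws mix in a standard Cauchy with probability 0.001."""
    out = []
    for n in sizes:
        xs = []
        for _ in range(n):
            if contaminated and random.random() < 0.001:
                xs.append(math.tan(math.pi * (random.random() - 0.5)))
            else:
                xs.append(random.gauss(0, 1))
        mean = sum(xs) / n
        out.append(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return out

sizes = [1_000, 10_000, 100_000]
clean = sample_variances(False, sizes)
dirty = sample_variances(True, sizes)
print(clean)  # settles near the true variance, 1
print(dirty)  # need not settle; single Cauchy draws can dominate
```

The clean estimates converge on the true variance; the contaminated ones have no value to converge to, since the mixture's variance is infinite.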
• 7.7k
> introduce a variable which has infinite mean and infinite variance to the sum,

I don't understand your example. Unless you really want to lecture to the kindergarten, we can leave it there.
• 5k

Say you're measuring something's length. You make 10 measurements with a ruler whose smallest division is 1 mm. The measurements might be slightly different, but you expect them to be very similar. This similarity is quantified as the variance, $\small \sigma^2$, of the measurements. For the ruler, you expect about 1 mm of variability, since you can determine the length to within 1 mm from the ruler.

In contrast, consider a ruler whose smallest division is the size of the sun. This would be really bad at measuring stuff. Someone using it would just eyeball the length. But you can expect the variability to be much larger.

Now imagine someone gives a person measurements of something's length, where either the fuck-off big ruler or the usual ruler has been used. The person receiving the measurements doesn't know which one was used. You give them 1000 measurements. They appear to be within about 1 mm of each other.

You ask the person to estimate the variability of the measurements; they say 'they're within 1 mm of each other', and in saying that they rule out the idea that the ruler the size of the sun was used. You can construct the above mathematical argument in terms of the proportion of measurements that are made with the fuck-off big ruler. If the proportion is very small, there's little evidence to inform the person looking at the data that they could've been made with the big ruler.

But by construction there's always a chance, a very small chance, of the big one.
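The two-ruler story can be put in code. A sketch with made-up numbers (true length 250 mm, ordinary-ruler error ~1 mm, sun-sized-ruler error ~1.4 billion mm, roughly the sun's diameter, and a tiny probability of the big ruler being used, all my choices):

```python
import random

random.seed(0)

TRUE_LENGTH_MM = 250.0  # hypothetical true length
p_big = 1e-9            # chance any given measurement used the big ruler

def measure():
    """One measurement: usually the ordinary ruler, very rarely the big one."""
    if random.random() < p_big:
        return random.gauss(TRUE_LENGTH_MM, 1.4e9)  # eyeballed, sun-sized ruler
    return random.gauss(TRUE_LENGTH_MM, 1.0)        # ordinary ruler, ~1 mm error

data = [measure() for _ in range(1000)]
spread = max(data) - min(data)
print(spread)  # almost surely a few mm: the big ruler is invisible in the data
```

With 1000 measurements and such a small `p_big`, the data look exactly like ordinary-ruler data, yet the model that generated them still contains the big ruler - which is the point.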

Help?
• 3.5k
• 5k

Thank you very much.