• alcontali
    1.3k
    Technical analysis uses a variety of charts and calculations to spot trends in the market and individual stocks and to try to predict what will happen next. Technical analysts don't bother looking at any of the qualitative data about a company (for example, its management team or the industry that it is in); instead, they believe that they can accurately predict the future price of a stock by looking at its historical prices and other trading variables.

    Imagine a price prediction function F in which:

    P_t = F(P_{t-1}, P_{t-2}, ..., P_{t-k})

    Fundamentally, the price level is therefore explained by ... the price level itself. This view is circular, impredicative, and therefore, fundamentally nonsensical. Even if the prediction function F turns out to be highly accurate and successfully predicts future prices, it would still be nonsensical.
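
    To make the circularity concrete, here is a minimal sketch of such a prediction function (all names and numbers below are hypothetical): an ordinary least-squares autoregression whose only inputs are the k previous values of the series itself.

    import numpy as np

    def fit_lag_predictor(prices, k):
        """Fit F so that P_t ~ F(P_{t-1}, ..., P_{t-k}) by least squares.

        A toy stand-in for the analyst's prediction function: the only
        inputs are past values of the series itself.
        """
        X = np.column_stack([prices[i:len(prices) - k + i] for i in range(k)])
        A = np.column_stack([X, np.ones(len(X))])      # add an intercept term
        coeffs, *_ = np.linalg.lstsq(A, prices[k:], rcond=None)
        return coeffs

    def predict_next(prices, coeffs, k):
        """Predict P_t from the k most recent prices."""
        return np.dot(prices[-k:], coeffs[:k]) + coeffs[k]

    # Hypothetical usage on a simulated random-walk price series:
    rng = np.random.default_rng(0)
    prices = 100 + np.cumsum(rng.normal(size=500))
    coeffs = fit_lag_predictor(prices, k=5)
    print(predict_next(prices, coeffs, k=5))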

    Critics of technical analysis, and there are many, say that the whole endeavor is a waste of time and effort.

    Not all predictive modelling is inaccurate. Still, the mere ability to predict future values should not be included in the definition of scientific method.

    In Science as Falsification, Karl Popper writes:

    We all—the small circle of students to which I belong—were thrilled with the result of Eddington's eclipse observations which in 1919 brought the first important confirmation of Einstein's theory of gravitation.

    This sentence seems to endorse predictive modelling as a valid subdiscipline in science. Popper also writes:

    Now the impressive thing about this case is the risk involved in a prediction of this kind. If observation shows that the predicted effect is definitely absent, then the theory is simply refuted. The theory is incompatible with certain possible results of observation—in fact with results which everybody before Einstein would have expected.

    Unfortunately, the technical analyst also takes a risk when predicting future stock prices. His theories are also incompatible with certain possible results of observation. Karl Popper's permissive take on the scientific method will end up granting scientific status to the prediction engines of the worst charlatans. Therefore, I must object to his views.

    The scientific method should only cover experimental testing. You must be freely able to choose the input I to feed into the theory F in order to receive output O, i.e. O = F(I). In other words, we must demand that the experimental tester demonstrates causality. He must be able to change I in order to produce changes in O.
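
    As a toy illustration of that demand (the system and numbers below are invented), contrast an experimenter, who freely sets the input I and watches the output O respond, with the stock analyst, whose inputs are fixed history:

    # A toy contrast between experimental testing and passive prediction.
    # 'system' stands in for the phenomenon under study; the experimenter
    # can set its input freely, the technical analyst cannot.

    def system(i):
        """Hypothetical phenomenon under study: O = F(I)."""
        return 3.0 * i + 2.0

    # Experimental testing: freely vary I and check whether O changes as
    # the theory predicts.
    for i in [0.0, 1.0, 10.0, -5.0]:
        print(f"set I={i}, observed O={system(i)}")

    # The technical analyst has no such freedom: the past prices feeding
    # his function F are fixed history, so he can never intervene on them.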

    This stricter take on the scientific method excludes the stock analyst's predictive function from the epistemic domain of science:

    P_t = F(P_{t-1}, P_{t-2}, ..., P_{t-k})

    The stock analyst cannot freely choose P_{t-1}, P_{t-2}, ..., P_{t-k}. Hence, he cannot vary the input in order to obtain different outputs. Therefore, what he does is not science.

    Especially in the current climate of rampant scientism, it is a necessity to deny scientific status to as much predictivity-seeking activity as possible. Denying scientific status to mere predictive modelling neatly expels activities such as stock-price prediction out of the epistemic domain of science.

    I do not deny that some predictive modelling -- which cannot be experimentally tested -- is actually accurate and even quite serious, but it should still not be labelled "science". The cost of doing so anyway would, unfortunately, be too high.
  • god must be atheist
    5.1k
    Yikes... the big difference between science and technical analysis in the investment field is that scientific theories, after gaining support from evidence, are reliable, while the technical analyst WILL get it right only SOME of the time. Ahead of the news, you don't know whether any given call will be right or wrong; there is no assurance, only randomly distributed "correct hits" among randomly distributed "incorrect hits". If the ratio of "correct hits" to "incorrect hits" exceeds 1, then technical analysis is a worthwhile (although not the best) investment tool. However, to date, this has not been the case (as far as I know).
  • Wayfarer
    20.6k
    Karl Popper's permissive take on the scientific method will end up granting scientific status to the prediction engines of the worst charlatans. Therefore, I must object to his views.alcontali

    I think the comparison with stock market pricing is completely unjustified. The predictive capacities of physics are an essential part of the science. I mean, if the laws of motion turned out not to accurately predict shell trajectories, then it would have been back to the drawing board. In the case of relativity theory, Eddington’s observations were only the first of many greeted with the headline ‘Einstein Proved Right’ (the many subsequent instances add ‘Again’.) And these provide evidence of the veracity of the theory - what other kind could there be?

    Paul Dirac predicted the discovery of anti-matter on the basis of mathematical symmetries. I can’t see any justification for declaring his work non-scientific. And how else are you to validate the accuracy of physical theory but by testing it against observation? ‘Oh, that looks like it ought to be right.’

    Where Popper’s philosophy is being seriously challenged is in the debate over the scientific status of string theory. There is a movement, launched by physicists George Ellis and Joe Silk, to preserve the principle of falsifiability in physics by arguing that many of the speculative consequences of string theory will be forever beyond falsification even in principle. Those opposing dismiss these criticisms by describing their proponents as ‘the Popperazi’, saying that their conservatism is stifling progress.

    In any case, I agree that predictive reckoning in economics doesn’t constitute a science - but I don’t know too many people who would say it was.

    So I think your definition of scientific method is far too restrictive.
  • fresco
    577
    I agree with Wayfarer's point that this is a false analogy. The significant issue in 'scientific prediction' is to draw attention to, or generate, 'data' which, prior to the theory, had remained unobserved or ignored.
    Nor is it the case that 'a failed prediction' would completely 'refute' the theory... rather, it would delimit its applicability or lead to modification.
  • alcontali
    1.3k
    I think the comparison with stock market pricing is completely unjustified. The predictive capacities of physics are an essential part of the science.Wayfarer

    Agreed, but charlatans, such as the ones in the stock market, must not be able to repurpose that capacity to claim scientific status. Most physics is obviously beyond reproach.

    At the very small scale, there are epistemic problems in physics, but that was inevitable. The very concepts of light, observation, the quantum nature of energy differences, and so on, simply start interfering with practices that we otherwise take for granted at larger scales.

    Paul Dirac predicted the discovery of anti-matter on the basis of mathematical symmetries. I can’t see any justification for declaring his work non-scientific. And how else are you to validate the accuracy of physical theory but by testing it against observation? ‘Oh, that looks like it ought to be right.’Wayfarer

    Since his original hypothesis was successfully tested experimentally, there are clearly no objections to his theory. It is not physics that generally requires scrutiny.

    Those opposing dismiss these criticisms by describing their proponents as ‘the Popperazi’, saying that their conservatism is stifling progress.Wayfarer

    There is a whole world outside physics that will happily make use of any relaxation of epistemic constraints in science to claim scientific status for their snake oil. Outside physics, there is an entire reservoir of claims of a very deceptive nature waiting exactly for that.

    So I think your definition of scientific method is far too restrictive.Wayfarer

    Have you had a look at what is out there, claiming scientific status? Take as an example the monstrosity of "data science":

    Data science is a multi-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data. Data science is the same concept as data mining and big data.

    If you unleash something like the ordinary least squares method on approximately any collection of (x,y) tuples, you will almost always find some correlation that gives the resulting prediction function some degree of predictive power. These people can then trivially defend their concoction by occasionally predicting a future (x,y) tuple correctly. Even a broken clock gives the time correctly twice a day. So, moderate predictive power does not mean anything. Still, these people increasingly boast about their "data science" craft.
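
    That claim is easy to check. Below is a minimal sketch (all data is pure noise by construction) that computes ordinary-least-squares fits against many noise columns and cherry-picks the strongest one; the selected "predictor" always shows some correlation even though no real relationship exists.

    import numpy as np

    rng = np.random.default_rng(42)
    n, n_candidates = 50, 1000

    y = rng.normal(size=n)                   # pure-noise "target"
    X = rng.normal(size=(n, n_candidates))   # pure-noise "predictors"

    # Correlation of each candidate column with y, then cherry-pick.
    corrs = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(n_candidates)])
    best = np.argmax(np.abs(corrs))

    # OLS slope for the cherry-picked column.
    slope = np.polyfit(X[:, best], y, 1)[0]
    print(f"best of {n_candidates}: r = {corrs[best]:.2f}, slope = {slope:.2f}")
    # With 1000 noise columns and only 50 rows, a best |r| around 0.5 is
    # typical, even though every relationship here is spurious.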

    Their alchemy is widespread now. Just take a look at the articles "4 Reasons Not To Get That Masters In Data Science" or "Best 23 Schools with Data Science Master’s Programs".

    Isn't it obvious that the barbarians calling themselves "data scientists" really are at the gates?

    They obviously want to repurpose the populace's false pagan belief in scientism -- which gives the term "science" excessive credibility anyway -- for their own nefarious and self-serving ambitions. With an overly permissive definition of the epistemic domain of science, we are giving these people the ammunition to manipulate and deceive wholesale.

    Therefore, there is an urgent need to further restrict the epistemic domain of science by expelling any approach or theory that merely has predictive power and cannot be tested experimentally by freely varying the input variables of the experiment.
  • T Clark
    13k
    Not all predictive modelling is inaccurate. Still, the mere ability to predict future values should not be included in the definition of scientific method.alcontali

    I would say that, if I can generate predictions of the behavior of complex systems on a consistent basis, i.e. significantly better than chance, I have applied a method that models the actual real-world conditions that lead to that behavior. Keep in mind I am talking about an expansive view of what constitutes a model. I don't just mean a complex numerical computer model that allows predictions of the behavior of complex phenomena. "F = ma" is a model.

    Let's examine the cartoon version of the scientific method:

    • Observe the phenomena I am interested in. Collect data.
    • Based on that data, generate hypotheses that might explain my observations.
    • Test my hypothesis empirically. Make more observations. Perform experiments.
    • Use the data collected to verify or falsify my hypotheses.
    • Repeat as needed.

    What you call "predictive modeling" I call "generating hypotheses." The discussion of Einstein and the 1919 eclipse observations is a good example of this.

    The scientific method should only cover experimental testing. You must be freely able to choose the input I to feed into the theory F in order to receive output O, i.e. O = F(I). In other words, we must demand that the experimental tester demonstrates causality. He must be able to change I in order to produce changes in O.alcontali

    No. The theory that you feed your input into is a model just as much as the technical analysis of stock markets you decry.

    Especially in the current climate of rampant scientism, it is a necessity to deny scientific status to as much predictivity-seeking activity as possible. Denying scientific status to mere predictive modelling neatly expels activities such as stock-price prediction out of the epistemic domain of science.alcontali

    Although no one has mentioned it, I keep waiting for the other shoe to drop - using your argument to undermine the credibility of climate science.

    Agreed, but charlatans, such as the ones in the stock market, must not be able to repurpose that capacity to claim scientific status. Most physics is obviously beyond reproach.alcontali

    If it works, they're not charlatans. I doubt they care whether you are willing to designate what they do as science. If it doesn't work, they are good scientists with more work to do, bad scientists, pseudo-scientists, or maybe, as you indicate, charlatans.
  • alcontali
    1.3k
    I would say that, if I can generate predictions of the behavior of complex systems on a consistent basis, i.e. significantly better than chance, I have applied a method that models the actual real-world conditions that lead to that behavior.T Clark

    In "Beware the Big Errors of 'Big Data'", Nassim Taleb creates a beautiful visual representation of the problem:

    We’re more fooled by noise than ever before, and it’s because of a nasty phenomenon called “big data.” With big data, researchers have brought cherry-picking to an industrial level. Modernity provides too many variables, but too little data per variable. So the spurious relationships grow much, much faster than real information. In other words: Big data may mean more information, but it also means more false information.

    big data means anyone can find fake statistical relationships, since the spurious rises to the surface. This is because in large data sets, large deviations are vastly more attributable to variance (or noise) than to information (or signal). It’s a property of sampling: In real life there is no cherry-picking, but on the researcher’s computer, there is. Large deviations are likely to be bogus.

    We used to have protections in place for this kind of thing, but big data makes spurious claims even more tempting. And fewer and fewer papers today have results that replicate.
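
    Taleb's point that "the spurious rises to the surface" can be reproduced in a few lines. In this sketch (pure noise, invented numbers), the best spurious correlation climbs steadily as the pool of candidate variables grows:

    import numpy as np

    rng = np.random.default_rng(3)
    n = 50
    y = rng.normal(size=n)   # the pure-noise "signal" we pretend to explain

    # The best spurious correlation grows with the size of the candidate pool.
    for n_vars in [10, 100, 1000, 10000]:
        X = rng.normal(size=(n, n_vars))
        best = max(abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_vars))
        print(f"{n_vars:>6} noise variables -> best |r| = {best:.2f}")
    # Typical output climbs from roughly 0.35 to above 0.6: more variables
    # mean more "information", but the extra information is false.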


    "F = ma" is a model.T Clark

    Yes, but in "F = ma" there is at least an attempt at establishing causality. Newton was not merely "phishing".

    No. The theory that you feed your input into is a model just as much as the technical analysis of stock markets you decry.T Clark

    Well, Nassim Taleb also concludes that only data originating from experimental testing should be taken seriously:

    In observational studies, statistical relationships are examined on the researcher’s computer. In double-blind cohort experiments, however, information is extracted in a way that mimics real life. The former produces all manner of results that tend to be spurious (as last computed by John Ioannidis) more than eight times out of 10.

    Although no one has mentioned it, I keep waiting for the other shoe to drop - using your argument to undermine the credibility of climate science.T Clark

    No, no. Climate science is way too political for me. Seriously, I don't want to talk about that debate and I refuse to take a position.

    If it works, they're not charlatans.T Clark

    The amount of cherry-picking of spurious correlations has now turned into a larger-than-life problem. I don't think that we can keep denying it. Seriously, it is absolutely running out of control. We cannot keep going on like this. The correlations are there. So, it seemingly works, but it is also totally fake.

    I doubt they care whether you are willing to designate what they do as science.T Clark

    But the charlatans certainly care. There is a reason why they have renamed "big data" to "data science". They want the credibility associated with the "science" part. They are hell-bent on further fooling a demographic already badly afflicted with rampant scientism.

    Seriously, predictive power derived from spurious correlations cannot be considered enough to give their concoctions any kind of serious status. What they are doing is just alchemy!
  • T Clark
    13k
    big data means anyone can find fake statistical relationships, since the spurious rises to the surface. This is because in large data sets, large deviations are vastly more attributable to variance (or noise) than to information (or signal). It’s a property of sampling: In real life there is no cherry-picking, but on the researcher’s computer, there is. Large deviations are likely to be bogus.alcontali

    I'm fine with this. I have no trouble believing that many (most?) predictive models generated using data mining are spurious, but that ought to be self-correcting. If it doesn't work, it doesn't work. I use simple statistics at my work on a regular basis. I know enough about the methods to understand how easy it is to misunderstand and misuse them, and how unsophisticated my understanding is. @fdrake is our resident statistician. When he explains some of the intricacies of a statistical analysis I can see three things - 1) he knows what he's talking about, 2) I don't, and, more significantly, 3) most scientists and other professional data users are more like me than him.

    Spurious statistics have been around for a long time.
  • alcontali
    1.3k
    ... but that ought to be self-correcting.T Clark

    These people have already invaded academia.

    Examples from the link:

    University of California, Berkeley
    University of Denver
    Syracuse University
    Southern Methodist University
    University of Dayton
    American University
    Pepperdine University
    ...


    They are happily busy invading the corporations and the government bureaucracies, and in the meantime they have a vested interest in making sure that their expensive diplomas in concocting snake oil are not criticized.

    So, they will need to invade the mainstream media too, who are spectacularly proficient at badmouthing every opinion that does not suit them. They will also seek to grab co-control over social media. That too seems to have become much easier than it used to be.

    There are already way too many "forbidden" opinions.

    Sooner or later, criticism of "data science" and similar practices is simply going to become one more thing that you are no longer allowed to say.

    That gang of pseudo-scientists, one of so many, is gradually going to become one more guild of liars, manipulators, and deceivers who are above all the rules. They badly need protection against criticism, and, read my lips, they will make sure to get it.
  • leo
    882
    There are a lot of misconceptions in what you say, I attempted to explain them to you in the other thread but you don't seem to listen.

    Predictive modelling is what science is about, making accurate predictions from past observations. If you exclude predictive modelling from science you remove pretty much everything from science.

    P_t = F(P_{t-1}, P_{t-2}, ..., P_{t-k}) is not nonsensical in itself; the only reason you seem to find it nonsensical is that it doesn't work when applied to the price of a stock. I explained in the other thread why it doesn't work: the current price does not depend only on the past prices, it depends on what people do and decide to do, so if you only focus on past prices you have a poor model. I also explained why, even if it works for a while, it stops working eventually: because if everyone uses it, deep pockets have an incentive to move the market so that it stops working.

    Regarding Karl Popper, you misinterpret what he said. The risk he was referring to is what he saw as a necessary aspect of scientific theories: in his view, if a theory doesn't carry that risk, if it cannot be falsified, then it isn't scientific. He said that theories cannot be verified.

    You seem to imply that there are theories that are verified and cannot be falsified, and that they are the ones that should be called scientific. But a theory cannot be verified: even if you conduct N experimental tests and in each of them the observations match the theory's predictions, you cannot know that the theory will keep matching during the (N+1)th test. What you call "demonstrating causality" is predictive modelling just as well: it assumes that the apparent causality will keep being valid in the future.

    There are accurate models and there are inaccurate models; that's the important distinction. Saying that some models are not "science" just because you don't like them is what scientists do already, calling 'scientific' the models they like and 'unscientific' the ones they don't.

    Technical analysis is a tool used by the very wealthy to take money from everyone else: it doesn't tell them where the price is going to go, it tells them when many people are going to buy or sell, and they can use that information against them. To a great extent they don't have to predict where the price is going to go, they decide where the price is going to go because they have the money to move the market one way or the other, they just have to decide what way is the most effective to suck money out of everyone else.
  • alcontali
    1.3k
    Predictive modelling is what science is about, making accurate predictions from past observations. If you exclude predictive modelling from science you remove pretty much everything from science.leo

    In my opinion, mere predictive modelling without establishing causality should not be counted as science. Only observations originating from controlled experimental testing may be used in science:

    A scientific control is an experiment or observation designed to minimize the effects of variables other than the independent variable. This increases the reliability of the results, often through a comparison between control measurements and the other measurements. Scientific controls are a part of the scientific method.

    if all controls work as expected, it is possible to conclude that the experiment works as intended, and that results are due to the effect of the tested variable.


    He said that theories cannot be verified.leo

    Karl Popper said that theories cannot be verified under Carnap's thesis. That means 'strong' verification, i.e. full conclusive proof, and not 'weak' verification, i.e. probabilistic experimental testing, which is obviously possible. You are digging up an old debate in which Karl Popper participated and which is no longer active. Verificationism and its vocabulary are dead now, completely replaced by falsificationism.

    What you call "demonstrating causality" is predictive modelling just as well: it assumes that the apparent causality will keep being valid in the future.leo

    It is about the requirement for scientific controls:

    Scientific controls are a part of the scientific method. Ideally, all variables in an experiment are controlled (accounted for by the control measurements) and none are uncontrolled. In such an experiment, if all controls work as expected, it is possible to conclude that the experiment works as intended, and that results are due to the effect of the tested variable.

    Without these controls, you will get snake oil vendors phishing for naturally occurring but meaningless correlations in "big data". So, it is about putting a stop to the practice of calling this type of activity "data science", because in absence of there being some causality between variables, it is in my opinion not science.

    They should rename "data science" to "correlation phishing".

    Saying that some models are not "science" just because you don't like them is what scientists do already, calling 'scientific' the models they like and 'unscientific' the ones they don't.leo

    It is not about any personal preference ("because you don't like them"), but about what Nassim Taleb also complained about:

    big data means anyone can find fake statistical relationships, since the spurious rises to the surface.
  • fdrake
    5.8k
    The kind of algorithms, like neural networks, that are used to analyse big data aren't made of math objects that are interpretable in terms of the data. Studying the properties of these algorithms and creating them is a science to the same degree that studying the properties of any (set of) mathematical models is a science.

    The biggest difference between these algorithms in general and more run of the mill statistical methods in general is that neural networks won't tell you how important each part of the data they depend on is to the conclusions the neural network draws, whereas run of the mill methods like linear regression will. What this difference turns on is that neural networks learn to weight (combinations of) variables in terms of their contribution to predictive accuracy without telling their user the consequences of this weighting scheme and how it learned this pattern from the data. Gains in predictive flexibility are bought with losses in interpretive precision.

    A neural network model still takes the form of:

    y = f(x) + e

    where y is a response, x is a spreadsheet of data and e is an error.

    A very tame model for the same x might be:

    y = b_1*x_1 + b_2*x_2 + ... + b_k*x_k + e

    where b describes the slope of the line each x makes when regressed upon y. What these b terms tell you immediately is how x relates to y in terms of the model. In terms of the neural network, you don't get a similar interpretation for f; the meaning of whatever f is is occluded because it is so complicated and flexible.
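
    To make the interpretability gap concrete, here is a small sketch (invented data; scikit-learn used for convenience) showing what each fitted model exposes: the linear model hands you its b terms directly, while the network only offers stacks of raw weight matrices.

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 3))    # three made-up predictor columns
    y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

    # The tame model: each fitted coefficient b says directly how that
    # column of X relates to y.
    linear = LinearRegression().fit(X, y)
    print("b terms:", linear.coef_)          # close to [2, -1, 0]

    # The neural network: it may predict just as well, but its fitted f
    # is buried in layers of weights with no per-variable reading.
    net = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000).fit(X, y)
    print("weight matrices:", [w.shape for w in net.coefs_])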

    Neural networks can be a force for good when applied to data with lots of test cases that actually represent the phenomenon being studied, like the language recognition and translation in Google Translate's photo/video-of-words -> translation app. They are good when the data is good.

    They are, however, bad when the data is bad. No matter how complicated and flexible your model is, your data can still be terrible in all sorts of ways, and the resultant model (even when it is performing exceptionally well!) will replicate all the crap that's gone into the signal of the data, which in an ideal world shouldn't be there.


    Companies have tried using big data algorithms to automate hiring practices based on accepted/rejected CVs; the algorithms ended up being arbitrarily racist and sexist, and no matter what the engineers did to improve them, the data carried this prejudice so strongly and in so many interlinking ways that it was always preserved. After much consternation, the funding for these projects dried up. A similar thing will be happening soon, terrifyingly, with court cases.

    Denying that the application of neural networks to data is science is rather arbitrary, it's just often bad science, where people use the algorithms instead of thinking.
  • T Clark
    13k
    Denying that the application of neural networks to data is science is rather arbitrary, it's just often bad science, where people use the algorithms instead of thinking.fdrake

    Which is, as I noted earlier, also very true of other kinds of statistical modeling.
  • sime
    1k
    The main teething problem of data science is the current absence of integrated causal modelling (let alone stakeholder-legible causal modelling), a situation that is changing rapidly due to the ongoing invention of problem-domain-specific software simulations, alongside the development of probabilistic programming libraries for quantifying simulation accuracy.

    In this regard I expect that, outside of computer vision and speech recognition, where deep learning has a strong inductive bias, black-box neural network approaches in other problem domains will begin to take a back seat as transparent models of network logic come to the forefront. For you cannot easily inject human-readable logic clauses into a distributed neural network with positive and negative activations.
  • fdrake
    5.8k
    In this regard I expect that, outside of computer vision and speech recognition, where deep learning has a strong inductive bias, black-box neural network approaches in other problem domains will begin to take a back seat as transparent models of network logic come to the forefront. For you cannot easily inject human-readable logic clauses into a distributed neural network with positive and negative activations.sime

    There's a general-purpose procedure for doing causal inference in Bayesian models, though. Once you've fitted the model, you sample from the posterior conditional of the desired response (say "do they have cancer?") given the desired 'intervention' (say smoking or not smoking), and then you get probabilities of each. The reason this works is that the dependence on all the other variables is already included by integrating them out.
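
    As a rough numeric sketch of that procedure (everything below is invented: the logistic model, the "posterior" draws, the variable names), the smoking/cancer comparison might look like this in plain numpy:

    import numpy as np

    rng = np.random.default_rng(7)

    # Pretend these draws came out of a fitted Bayesian logistic regression
    # of cancer on smoking plus one confounder: one value per posterior draw.
    intercept  = rng.normal(-2.0, 0.1, size=4000)
    b_smoking  = rng.normal(1.5, 0.2, size=4000)
    b_confound = rng.normal(0.5, 0.1, size=4000)
    confounder = rng.normal(size=4000)   # draws for the marginalized covariate

    def p_cancer(do_smoking):
        """P(cancer | do(smoking)), averaging over posterior draws and the
        remaining variables -- the 'integrating them out' step."""
        logit = intercept + b_smoking * do_smoking + b_confound * confounder
        return np.mean(1.0 / (1.0 + np.exp(-logit)))

    print("P(cancer | do(smoke))     =", round(p_cancer(1), 3))
    print("P(cancer | do(not smoke)) =", round(p_cancer(0), 3))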