## The function of repeatability in scientific experiments

• 1.8k
This may be a nonsense topic, you'll soon tell me. I just had a quick thought and I don't have time to think it through before I forget it:

The function of repeatability in experiments is NOT to confirm a hypothesis.

The function of repeatability is to check the reliability of the experimental result.

Is that correct? I might have just got myself in a muddle.

EDIT: A single unrepeated experiment, if reliable, is enough to refute a hypothesis. You don't have to do it again.
• 8.7k
I just had a quick thought and I don't have time to think it through before I forget it:

This is one for tee-shirts!

Michael Penn is an instructor of mathematics who has made a number of YouTube videos (which I recommend). He ends them by saying, "And that's a good place to stop."

The only improvement on these would be just being silent.

And that's a good place to stop.
• 1.8k
The only improvement on these would be just being silent.

Do you mean I should not have posted? Mods can delete if they want.
• 4.7k

I also think that repeatability in science is to check the results we got previously in our analysis. But, even further than this, repeatability could also help us improve the hypothesis itself. If you want to make a solid statement, I guess you should repeat it a lot, until you believe it is sufficiently proven.
• 1.8k
I also think that repeatability in science is to check the results we got previously in our analysis. But, even further than this, repeatability could also help us improve the hypothesis itself. If you want to make a solid statement, I guess you should repeat it a lot, until you believe it is sufficiently proven.

I see, thanks.
• 1.8k
I can give an example if that would help.
• 8.7k
Not at all. Had you been silent we would not have had from you such a gem. As to experiments, it's not a simple question; e.g., does the test actually test what the tester thinks he or she is testing? The answer to your question, then, is, I think, yes and no. It depends on the test and what is tested. Usually it's both: to reconfirm, retest, the hypothesis and the reliability of the tests. And I think the two are not so easily separated.

I learned a long time ago something about intelligence tests. Having seen the results of a series of tests taken by a group of people, the results of which of course for each person varied, I observed that the scores would have to be averaged. "Oh no!" exclaimed the people doing the testing. "The tests work," they said, "The highest score is the correct score (because you have to have the intelligence to get the score). Lower scores are just aberrations from the highest score for any of a lot of reasons."

And I thought that interesting both then and now.
• 1.8k
Not at all. Had you been silent we would not have had from you such a gem.

Oh! Thanks. I didn't understand. I'll process what you said when I can.
• 13k
The function of repeatability in experiments is NOT to confirm a hypothesis.

The function of repeatability is to check the reliability of the experimental result.

What's the difference? At least under most circumstances.

A single unrepeated experiment, if reliable, is enough to refute a hypothesis. You don't have to do it again.

So, how do you demonstrate that an experiment is reliable? I can think of two ways: 1) review the premises, procedures, data, calculations, and conclusions for errors and unsound interpretations, and 2) rerun the experiment. How far you have to go to show an experiment is reliable depends on the consequences of being wrong. If people's lives or millions of dollars are on the line, maybe you do need to rerun the experiment after all.
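A toy simulation (plain Python, hypothetical numbers of my choosing) of why a rerun adds information: a single noisy experiment can clear an evidence threshold by chance, but clearing it twice is far less likely under a null effect.

```python
import random
import statistics

random.seed(2)

def experiment(true_effect=0.0, noise_sd=1.0, n=20):
    """One experiment: the mean of n noisy observations of the effect."""
    return statistics.mean(random.gauss(true_effect, noise_sd) for _ in range(n))

# Roughly two standard errors for n=20, sd=1 (a common "pass" threshold)
THRESHOLD = 0.45

trials = 20000
one_pass = sum(experiment() > THRESHOLD for _ in range(trials))
two_pass = sum(experiment() > THRESHOLD and experiment() > THRESHOLD
               for _ in range(trials))
print(f"null effect clears the threshold once:  {one_pass / trials:.3f}")
print(f"null effect clears it twice (a rerun):  {two_pass / trials:.3f}")
```

With these made-up numbers the false-positive rate roughly squares when a rerun is required, which is one sense in which rerunning checks the reliability of a result rather than re-testing the hypothesis as such.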
• 5.8k
The function of repeatability in experiments is NOT to confirm a hypothesis.

:up:

If you repeat a measurement under the same conditions in an experiment, the goal of that is usually to take an average: establishing concordance and forming a variance-reduced estimate of the true value you're measuring.
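A quick numerical sketch of that variance reduction (plain Python; the true value and noise level are hypothetical): averaging n repeated measurements shrinks the spread of the estimate roughly by a factor of sqrt(n).

```python
import random
import statistics

random.seed(0)
TRUE_VALUE = 10.0
NOISE_SD = 2.0  # measurement noise under "the same conditions"

def run_experiment(n_repeats):
    """Average n_repeats noisy measurements of the same quantity."""
    return statistics.mean(random.gauss(TRUE_VALUE, NOISE_SD)
                           for _ in range(n_repeats))

# Spread of the averaged estimate across many simulated experiments:
for n in (1, 4, 16):
    estimates = [run_experiment(n) for _ in range(2000)]
    print(f"n={n:2d}: sd of estimate = {statistics.stdev(estimates):.2f}")
```

The printed spread falls roughly as NOISE_SD / sqrt(n), which is the "variance-reduced estimate" in miniature.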

If you repeat a measurement under different conditions in an experiment, in part that's trying to find out how the measured response varies with the stimulus/treatment, in part that's trying to find out how that response varies with contextual factors, and in part (nowadays) that's trying to assess whether and how the stimulus/treatment's response itself varies with contextual factors. On this level, "repeating a measurement" is pretty much the core of a controlled experiment.
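That treatment-by-context picture can be sketched numerically (plain Python; the effect sizes and the two contexts "A" and "B" are invented for illustration): repeating the measurement across the treatment-by-context grid is what reveals the interaction.

```python
import random
import statistics

random.seed(1)

def response(treated, context, noise_sd=1.0):
    """Toy response: a treatment effect that itself varies with context
    (hypothetical sizes: +2 baseline effect, +3 extra in context B)."""
    effect = (2.0 + (3.0 if context == "B" else 0.0)) if treated else 0.0
    return effect + random.gauss(0.0, noise_sd)

def mean_response(treated, context, n=500):
    return statistics.mean(response(treated, context) for _ in range(n))

# "Repeating a measurement" across the treatment x context grid:
for ctx in ("A", "B"):
    effect = mean_response(True, ctx) - mean_response(False, ctx)
    print(f"context {ctx}: estimated treatment effect = {effect:.2f}")
```

The estimated effect differs by context, so the repeats under different conditions are doing exactly the third job described above: assessing how the treatment's response varies with contextual factors.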

If you're repeating an entire experiment, there's some wiggle room in practice regarding what counts as a repeat. There's the hypothetical "exact replication", where you do literally everything the same, and the "conceptual replication", where you try to ape the experimental conditions but can't do it exactly. I doubt those are an exhaustive typology of replication attempts, but the purpose of both isn't easily reducible to confirming or testing a previously held hypothesis in most cases, and that follows just because the overall setup of the initial experiment isn't identical, or necessarily even equivalent in all relevant respects, to the replication attempt.

That "lack of identity" (arguably) shows up in the difference in replication rates between papers where the initial researcher group is represented in the reproduction team and where they are not.

The function of repeatability is to check the reliability of the experimental result.

:up: Largely, I think so, at least up to what's intended by "reliability".

I would make the claim that the function of reproduction/replication attempts in science isn't to check the reliability of any individual result; most results are false or over-simplified, and everyone knows this. The overall function is to keep the process of scientific discovery, in the aggregate, from spending too long on "clear" falsehoods and inaccuracies; it's a quality-control thing. What counts as a "clear falsehood" only makes sense in light of reproducibility.

Another angle on repeatability: if you repeat the experiment, manage it exactly, and the effect doesn't show up the same as before, that doesn't necessarily mean the conclusions of the initial experiment were false. It might be that the response is contextually variable, or that there's a contextual interaction; both experiments could be samples of a distribution associated with the "true effect", indexed by contexts and their variables. The latter approach, to my understanding, is the one favoured by Gelman and his group.
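A minimal sketch of that "distribution of true effects" picture (my own toy simulation with invented numbers, in the spirit of a multilevel/varying-effects view, not anyone's actual model): each replication draws its own context-specific true effect before measuring it.

```python
import random
import statistics

random.seed(3)

MEAN_EFFECT = 1.0      # average "true effect" across contexts
CONTEXT_SD = 0.8       # how much the true effect varies by context
MEASUREMENT_SD = 0.3   # within-experiment measurement noise

def replicate():
    """Each experiment draws its own context-specific true effect,
    then measures that effect with noise."""
    true_effect_here = random.gauss(MEAN_EFFECT, CONTEXT_SD)
    return random.gauss(true_effect_here, MEASUREMENT_SD)

results = [replicate() for _ in range(10)]
print([round(r, 2) for r in results])
```

The replicates disagree well beyond the measurement noise, yet every one is a faithful sample from the same context-indexed effect distribution, so a "failed" repeat need not falsify the original conclusion.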

A single unrepeated experiment, if reliable, is enough to refute a hypothesis. You don't have to do it again.

I think that depends too. If you see the non-repeat in the context of a contextually variable interaction, it's not a refutation but evidence that the effect, if it exists, is contextual (and that starts a process of compensation: shrinking it relative to context-induced imprecision, "exaggeration factors", "the garden of forking paths", and analysing the true power of the study/broader scientific endeavour). If you see it in the context where everything's really set up exactly the same, the effect's probably not there as it was theorised. But what if the "exact replication" must reproduce the contextual ambiguities of the initial one? Then a null second result still doesn't mean the effect isn't there/is 0*; it could be that the ambiguities realised differently in the two experiments.

In that kind of case, if the ambiguities are enough to swamp the signal, it's reasonable to say the treatment as intended or the effect as theorised has little to no evidence that it exists... Probably.

Edit*: its expectation could be 0, but it could still have high contextual variability...
• 5.8k
So:

The function of repeatability is to check the reliability of the experimental result.

:up: (IMO)

Up to precisely what's meant by "reliability", anyway.
• 1.8k
Many thanks for your excellent answer. It's taken me a while to get back to it.

If you repeat a measurement under the same conditions in an experiment, the goal of that is usually to take an average: establishing concordance and forming a variance-reduced estimate of the true value you're measuring.

:nod:

If you repeat a measurement under different conditions in an experiment, in part that's trying to find out how the measured response varies with the stimulus/treatment, in part that's trying to find out how that response varies with contextual factors, and in part (nowadays) that's trying to assess whether and how the stimulus/treatment's response itself varies with contextual factors. On this level, "repeating a measurement" is pretty much the core of a controlled experiment.

Indeed.

If you're repeating an entire experiment, there's some wiggle room in practice regarding what counts as a repeat. There's the hypothetical "exact replication", where you do literally everything the same, and the "conceptual replication", where you try to ape the experimental conditions but can't do it exactly. I doubt those are an exhaustive typology of replication attempts, but the purpose of both isn't easily reducible to confirming or testing a previously held hypothesis in most cases, and that follows just because the overall setup of the initial experiment isn't identical, or necessarily even equivalent in all relevant respects, to the replication attempt.

I hadn't thought of that specifically, thanks.

That "lack of identity" (arguably) shows up in the difference in replication rates between papers where the initial researcher group is represented in the reproduction team and where they are not.

That's interesting.
I would make the claim that the function of reproduction/replication attempts in science isn't to check the reliability of any individual result; most results are false or over-simplified, and everyone knows this. The overall function is to keep the process of scientific discovery, in the aggregate, from spending too long on "clear" falsehoods and inaccuracies; it's a quality-control thing. What counts as a "clear falsehood" only makes sense in light of reproducibility.

Sure. I think by 'results' you mean conclusions/interpretations rather than data?

Another angle on repeatability: if you repeat the experiment, manage it exactly, and the effect doesn't show up the same as before, that doesn't necessarily mean the conclusions of the initial experiment were false. It might be that the response is contextually variable, or that there's a contextual interaction; both experiments could be samples of a distribution associated with the "true effect", indexed by contexts and their variables. The latter approach, to my understanding, is the one favoured by Gelman and his group.

Another very good point I hadn't thought of.

I think that depends too. If you see the non-repeat in the context of a contextually variable interaction, it's not a refutation but evidence that the effect, if it exists, is contextual (and that starts a process of compensation: shrinking it relative to context-induced imprecision, "exaggeration factors", "the garden of forking paths", and analysing the true power of the study/broader scientific endeavour). If you see it in the context where everything's really set up exactly the same, the effect's probably not there as it was theorised. But what if the "exact replication" must reproduce the contextual ambiguities of the initial one? Then a null second result still doesn't mean the effect isn't there/is 0*; it could be that the ambiguities realised differently in the two experiments.

In that kind of case, if the ambiguities are enough to swamp the signal, it's reasonable to say the treatment as intended or the effect as theorised has little to no evidence that it exists... Probably.

Yes, I completely glossed over all that nuance. My initial post popped into my head from vague memories of philosophy of science, on falsificationism and confirmationism. I found all that really interesting at the time but have forgotten most of it.

Thank you, that was really interesting and helpful.