The Pipedream of Preregistration

(Disclaimer: As this blog is still new, I should reiterate that the opinions presented here are those of the Devil’s Neuroscientist, which do not necessarily overlap with those of my alter ego, Sam Schwarzkopf)

In recent years we have often heard that science is sick. My own field in particular, cognitive neuroscience and psychology, is apparently plagued by questionable research practices and publication bias. Our community abounds with claims that most scientific results are “false” due to a lack of statistical power. We are told that researchers commonly use “p-hacking” strategies to explore the vast parameter space of their experiments and analyses in order to squeeze the last drop of statistical significance out of their data. And hushed (and sometimes quite loud) whispers in the hallways of our institutions, in journal club sessions, and in informal chats at conferences tell of many a high-impact study that has repeatedly failed to replicate, with the failed replications vanishing into the bottom of the proverbial file drawer.

Many brave souls have taken up the banner in the fight against this horrible state of affairs. There has been a whole spate of attempts to replicate high-impact research findings, the open access movement aims to make it easier to publish failed replications, and many proposals have been put forth to change the way we make statistical inferences from our data. These are all large topics in themselves and I will probably tackle them in later posts on this blog.

For my first post, though, I want to focus on the preregistration of experimental protocols. This is the proposal that all basic science projects should be preregistered publicly with an outline of the scientific question and the experimental procedures, including the analysis steps. The rationale behind this idea is that it will make it easier to control questionable research practices, or even just fairly innocent flexibility in procedures (“researcher degrees of freedom”) that could skew results and inflate false positives. The preregistration idea has been making the rounds during the past few years and is beginning to be implemented, both in the form of open repositories and at some journals. In addition to improving the validity of published research, preregistration is also meant to ensure that failed replications are published, because acceptance – and publication – of a study no longer hinges on how strong and clean the results are but only on whether the protocol was sound and whether it was followed.

These all sound like very noble goals and there is precedent for such preregistration systems in clinical trials. So what is wrong with this notion? Why does this proposal make the Devil’s Neuroscientist anxious?

Well, I believe it is horribly misguided, that it cannot possibly work, and that – in the best-case scenario – it will make no difference to the ills of science. I think that, well-intentioned as the preregistration idea may be, it actually stems from our ever-shortening 21st-century attention spans: people can’t accept that science is a gradual and iterative process that takes decades, sometimes centuries, to converge on a solution.

Basic science isn’t clinical research

There is a world of difference between the aims of basic scientific exploration and those of clinical trials. I can get behind the idea that clinical tests, say of new drugs or treatments, ought to be conservative and minimize false positives. Flexibility in the way data are collected and analyzed, how outliers are treated, how side effects are assessed, and so on, can seriously hamper the underlying goal: finding a treatment that actually works well.

I can even go so far as to accept that a similarly strict standard ought to be applied to preclinical research, say animal drug tests. The Devil’s Neuroscientist may work for the Evil One but she is not without ethics. Any research that is meant to test the validity of an approach that can serve the greater good should probably be held to a strict standard.

However, this does not apply to basic research. Science is the quest to explain how the universe works. Exploration and tinkering are at the heart of this endeavor. In fact, I want to see more of this, not less. In my experience (or rather my alter ego’s experience – the Devil’s Neuroscientist is a mischievous demon possessing Sam’s mind and at the time of writing she is only a day old – but they share the same memories), analyzing your data to death is one of the major learning experiences most graduate students and postdocs go through.

By tweaking all the little parameters, turning every dial, and looking at a problem from numerous angles, we can get a handle on how robust and generalizable our findings truly are. Sooner or later in every student’s life there comes a point where this behavior leads them to the conclusion that their “results are spurious and don’t really show anything of interest whatsoever.” I know, because I have been to that place (or at least my alter ego has).

This is not a bad thing. On the contrary, I believe it is actually essential for good science. Truly solid results will survive even the worst data massaging, to borrow a phrase Sam’s PhD supervisor used to say. It is crucial that researchers really know their data inside out. And it is important to understand the many ways an effect can be made to disappear, and conversely the ways data massaging can lead to “significant” effects that aren’t really there.

This last point underlines that data massaging can indeed be used to create false or inflated research findings, and this is what people mean when they talk about researcher degrees of freedom or questionable research practices. You can keep collecting data, checking your significance level at every step, and stop as soon as you have a significant finding (this is known as “optional stopping” or “data peeking”). This approach quite drastically inflates the false positives in a body of evidence, and yet such practices may be common. And there may be (much) worse things out there, like the horror story someone (and I have reason to believe them) told me about a lab whose standard operating procedure was to run a permutation analysis that iteratively excluded data points to find the most significant result. I know neither who these people were nor where this lab is. I also don’t know whether this practice went on with or without the knowledge of the principal investigator. It is certainly not merely a “questionable” research practice; it has crossed the line into outright fraud. As the person who told me about this pointed out, anyone clever enough to do this most likely also knows that it is wrong. The only difference between doing this and making up your data from thin air (and eating all the experimental stimuli, candy, to cover up the evidence) is that it uses real data – but it might as well not, for all the validity we can expect from it.
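To make the inflation from optional stopping concrete, here is a small Monte Carlo sketch (an illustration added for this post, not taken from any study mentioned here). It assumes normally distributed data with a standard deviation known to be 1 and a true mean of zero, so every “significant” result is a false positive. It compares a single z-test at a fixed sample size of 100 with a peeker who tests after every 10 participants and stops as soon as p < 0.05. The batch size, maximum sample size, seed and function names are arbitrary choices for the sketch.

```python
import math
import random


def z_test_p(sample):
    """Two-sided p-value for H0: mean = 0, assuming the SD is known to be 1."""
    z = sum(sample) / math.sqrt(len(sample))  # mean / (1 / sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))


def run_study(rng, peek, start=10, step=10, max_n=100, alpha=0.05):
    """Simulate one study with no true effect; return True if it ends 'significant'."""
    data = [rng.gauss(0, 1) for _ in range(start)]
    while True:
        if peek and z_test_p(data) < alpha:
            return True  # optional stopping: quit as soon as p dips below alpha
        if len(data) >= max_n:
            return z_test_p(data) < alpha  # final test at the maximum sample size
        data.extend(rng.gauss(0, 1) for _ in range(step))


def false_positive_rate(peek, n_sims=2000, seed=1):
    """Fraction of simulated null studies that end up 'significant'."""
    rng = random.Random(seed)
    return sum(run_study(rng, peek) for _ in range(n_sims)) / n_sims


if __name__ == "__main__":
    print("fixed n = 100:       ", false_positive_rate(peek=False))
    print("peek every 10 people:", false_positive_rate(peek=True))
```

With a single planned test the false-positive rate should hover around the nominal 0.05; with ten looks it typically comes out several times higher, even though no individual test was done “wrongly”.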

But the key thing to remember here is that this is deep inside the realm of the unethical, certainly on some level of scientific hell (and no, even though I work for the Devil this doesn’t mean I wish to see anybody in scientific hell). Preregistration isn’t going to stop fraud. Cheaters gonna cheat. Yes, the better preregistration systems actually require the inclusion of a “lab log” and possibly all the acquired data as part of the completed study. But does anyone believe that this is really going to work to stop a fraudster? Someone who regularly makes use of a computer algorithm to produce the most significant result isn’t going to bat an eyelid at dropping a few data points from their lab log. What is a lab log anyway? Of course we keep records of our experiments, but unless we introduce some (probably infeasible) Orwellian scheme in which every single piece of data is recorded in a transparent, public way (and there have been such proposals), there is very little to stop a fraudster from forging the documentation for their finished study. And you know what, even in that Big Brother world of science a fraudster would find a way to commit fraud.

Most proponents of preregistration know this and are wont to point out that preregistration isn’t meant to stop outright fraud but questionable research practices – the data massaging that isn’t really fraudulent but still inflates false positives. These practices may stem from the pressure to publish high-impact studies or may even simply be due to ignorance. I certainly believe that data peeking falls into the latter category, or at least did before it was widely discussed. I think it is also common because it is intuitive: we want to know whether a result is meaningful, so we want to collect sufficient data to be sure of it.

The best remedy against such things is to educate people about what is and isn’t acceptable. It also underlines the importance of developing better analysis methods that are not susceptible to data peeking. There have been many calls to abandon null hypothesis significance testing altogether. This discussion may be the topic of a future post by the Devil’s Neuroscientist, as there are a lot of myths about this point. For now, I certainly agree that we can do better and that there are ways – which may or may not be Bayesian – to use a stopping criterion that improves the validity of scientific findings.
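One deliberately simple way to make early stopping legitimate is to fix the stopping rule before collecting data and split the error rate across the planned looks. The sketch below uses a Bonferroni correction (alpha divided by the number of looks), assuming normal data with an SD known to be 1 and no true effect. It only illustrates the principle: Bonferroni is conservative, and in practice one would more likely use proper group-sequential boundaries (e.g. Pocock or O’Brien-Fleming) or a Bayesian stopping rule. All names and parameters here are hypothetical.

```python
import math
import random


def z_test_p(sample):
    """Two-sided p-value for H0: mean = 0, assuming the SD is known to be 1."""
    z = sum(sample) / math.sqrt(len(sample))
    return math.erfc(abs(z) / math.sqrt(2))


def sequential_study(rng, per_look_alpha, looks=10, batch=10):
    """One simulated study with no true effect and pre-planned interim looks.

    Returns True if any look crosses the pre-specified per-look threshold.
    """
    data = []
    for _ in range(looks):
        data.extend(rng.gauss(0, 1) for _ in range(batch))
        if z_test_p(data) < per_look_alpha:
            return True  # early stop, using a threshold fixed before data collection
    return False


def overall_false_positive_rate(per_look_alpha, n_sims=4000, seed=2):
    """Fraction of null studies that stop with a 'significant' result."""
    rng = random.Random(seed)
    return sum(sequential_study(rng, per_look_alpha) for _ in range(n_sims)) / n_sims


if __name__ == "__main__":
    looks, alpha = 10, 0.05
    print("uncorrected, p < 0.05 at every look:", overall_false_positive_rate(alpha))
    print("Bonferroni, p < 0.05/10 at each look:", overall_false_positive_rate(alpha / looks))
```

Because the ten per-look thresholds sum to 0.05, the union bound guarantees that the overall false-positive rate cannot exceed 0.05, however strongly the looks are correlated; in simulation it lands somewhat below that, which is the price of such a blunt correction.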

Rather than mainly sticking to a preregistered script, I think we should encourage researchers to explore the robustness of their data by publishing additional analyses. This is what supplementary materials can be good for. If you remove outliers in your main analysis, show what the result looks like without this step, or at least include the data. If you have several different approaches, show them all. Readers can make up their own minds about whether a result is meaningful, and their judgment should be all the more accurate the more information they have.
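A robustness report of this kind can be as simple as computing the same estimate under every reasonable analysis choice and printing all of them. The sketch below assumes a single sample of difference scores and varies two hypothetical choices (an SD-based outlier cutoff or none, and mean versus median), returning one row per combination. The data are simulated purely for illustration and the cutoffs are arbitrary.

```python
import random
import statistics


def drop_outliers(xs, k):
    """Drop points more than k SDs from the mean; k=None keeps everything."""
    if k is None:
        return list(xs)
    m, s = statistics.mean(xs), statistics.stdev(xs)
    return [x for x in xs if abs(x - m) <= k * s]


def robustness_table(xs, cutoffs=(None, 3.0, 2.5, 2.0)):
    """Estimate the same effect under every combination of analysis choices."""
    estimators = (("mean", statistics.mean), ("median", statistics.median))
    rows = []
    for k in cutoffs:
        kept = drop_outliers(xs, k)
        rule = "no exclusion" if k is None else f"exclude beyond {k} SD"
        for name, estimate in estimators:
            rows.append((rule, name, len(kept), estimate(kept)))
    return rows


if __name__ == "__main__":
    rng = random.Random(7)
    # Simulated, purely illustrative difference scores: a modest effect plus two extreme values.
    scores = [rng.gauss(0.3, 1.0) for _ in range(40)] + [4.5, -5.0]
    for rule, name, n, value in robustness_table(scores):
        print(f"{rule:22s} {name:6s} n={n:2d} estimate={value:+.3f}")
```

If the estimate barely moves across rows, that is reassuring; if it only looks impressive under one particular combination, readers deserve to see that too.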

Most importantly, people should replicate findings and publish those replications. This is a larger problem, and it used to be difficult to publish replications. This situation has changed a lot, however, and replication attempts are now fairly common even in high-profile journals. No finding should ever be regarded as solid until it has stood the test of time through repeated and independent replication. Preregistration isn’t going to help with that. Making it more rewarding and interesting to publish replications will. There are probably still issues to be resolved regarding replication, although I don’t think the situation is as dire as it is often made out to be (and this will probably be the topic of another post in the future).

Like cats, scientists are naturally curious and always keen to explore the world

Replications don’t require preregistration

The question of replication brings me to the next point. My alter ego has argued in the past that preregistration may be particularly suited for replication attempts. On the surface this seems logical: for a replication we surely want to stick as closely as possible to the original protocol, so it is good to have that defined a priori.

While this is true, as I clawed my way out of the darkest reaches of my alter ego’s mind, it dawned on me that for replications the experimental protocol is already published in the original study. All one needs to do is follow that protocol as closely as possible. Sure, in many publications the level of detail is insufficient for a replication, partly because of the ridiculously low word limits of many journals.

However, the solution to this problem is not preregistration, because preregistration doesn’t guarantee that the replication protocol closely matches the original. Rather, we must improve the level of detail in our methods sections. They are, after all, meant to permit replication. Fortunately, many journals that were particularly guilty of this problem have taken steps to change this. I don’t even mind if methods are largely published in online supplementary materials. A proper evaluation of a study requires close inspection of the methods, but as long as they are easy to access I prefer detailed online methods to sparse methods sections inside published papers.

Proponents of preregistration also point out that having an experiment preregistered with a journal helps the publication of replication attempts because it ensures that the study will be published regardless of the outcome. This is perhaps true, but at present not many journals actually offer preregistration. It also forces the authors’ hands as to where to publish, which will be a turn-off for many people.

Imagine that your study fails to replicate a widely publicized, sensational result. Surely this will be far more interesting to the larger scientific community than if your findings confirm the previous study. Both outcomes are of equal importance, and in truth having two studies investigating the same effect doesn’t tell us much more about the actual effect than one study would. However, the authors may want to choose a high-profile journal for one outcome but not for the other (and the same applies to non-replication experiments). Similarly, the editors of a high-impact journal will be more interested in one kind of result than another. My guess is that PNAS was far keener to publish the failed replication of this study than it would have been if the replication had confirmed the previous results.

As long as we have the journal-based publication system, or until we find another way to decide where preregistered studies are published, preregistering studies with a journal forces the authors into a relationship with that journal. I predict that this is not going to appeal to many people.

We replicated the experiment two hours later, and found no evidence of this “sun in the sky” (BF01=10^100^100). We conclude that the original finding was spurious.

Ensuring the quality of preregistered protocols

Of course, we don’t need to preregister protocols with a journal but we could have a central repository where such protocols are uploaded. In fact, there is already such a place in the Open Science Framework, and, always keen to foster mainstream acceptance, a trial registry has also been set up for parapsychology. This approach would free the authors with regard to where the final study is published but this comes at a cost: at least at present these repositories do not formally review the scientific quality of the proposals. At least at a journal the editors will invite expert reviewers to assess the merits of the proposed research and suggest possible changes to be implemented before the data collection even begins.

Theoretically this is also possible at a centralized repository, but it is currently not done. It would also place a major burden on the peer review system. There is already an enormous number of research manuscripts out there waiting to be reviewed by someone (just ask the editors at the Frontiers journals). Preregistration would probably inflate that workload massively, because it is substantially easier to draft a simple design document for an experiment than to write up a fully-fledged study with results, analysis, and interpretation. Incidentally, this is yet another way in which clinical trials differ from basic research: in clinical trials you presumably already have a treatment or drug whose efficacy you want to assess. This limits the number of trials somewhat at least. In basic research all bets are off – you can have as many ideas for experiments as your imagination permits.

So what we will be left with is lots of preregistered experimental protocols that are either reviewed shoddily or not at all. Sam recently reviewed an EEG study claiming to have found neural correlates of telepathy. All the reviews are public so everyone can read them. The authors of this study actually preregistered their experimental protocol at the Open Science Framework. The protocol was an almost verbatim copy of an earlier pilot study the authors had done, plus some minor changes added in the hope of improving the paradigm. However, the methods were described so sparsely and obscurely that assessing the research, let alone attempting an actual replication, was nigh impossible. There were also some fundamental flaws in the analysis approach, which indicated that all the results, including those from the pilot experiment, were purely artifactual. In other words, the protocol might as well not have been preregistered.

More recently, another study used preregistration (again without formal review) to conduct a replication attempt of a series of structural brain-behavior experiments. You can find a balanced and informative summary of the findings at this blog, and at the bottom you will find an extensive discussion, including comments by my alter ego. What these authors did was to preregister the experimental protocol by uploading it to the webpage of one of the authors. They also sent the protocol to the authors of the original studies they wanted to replicate to seek their feedback. A minimal response (or none at all) was then taken as tacit agreement that the protocol was appropriate.

The scientific issues of this discussion are outside the scope of this post. Briefly, it turns out that, at least for some of the experiments in this study, the methods are only a modest match to those of the original studies. This should be fairly clear from reading the original methods sections. Whether or not the original authors “agreed” with the replication protocols (and it remains opaque just what that means exactly), there is a clear departure from the script that was actually predefined, the published original methods, at the very outset.

It is of course true that a finding should be generalizable to be of importance and robustness to certain minor variations in the approach should be part of that. For example, it was recently argued by Daryl Bem that failure to replicate his precognition results was due to the fact that the replicators did not use his own stimulus software for the replication. This is not a defensible argument because to my knowledge the replication followed the methods outlined in the original study fairly closely. Again, this is what methods sections are for. Therefore, if the effect can really only be revealed by using the original stimulus software this at the very least suggests that it doesn’t generalize. Before drawing any further conclusions it is therefore imperative to understand where the difference lies. It could certainly be that the original software is somehow better at revealing the effect, but it could also mean that it has a hidden flaw resulting in an artifact.

The same isn’t necessarily true in the case of these brain-behavior correlations. It could be true, but at present we have no way of knowing it. The methods of the original study as published in the literature weren’t adhered to, so it is incorrect to even call this a direct replication. Some of the discrepancies could very well be the reason why the effect disappears in the replication or conversely could also introduce a spurious effect in the original studies.

This is where we come back to preregistration. One of the original authors was also a reviewer of this replication study of brain-behavior correlations. His comments are also included on that blog, and he further elaborates on his reviews in the discussion on that page. He states that he proposed additional analyses to the replicators that are a closer match to the original methods, but that they refused to conduct these analyses because they were exploratory and thus not part of the preregistered protocol. However, this is odd, because several additional exploratory analyses are included (and clearly labeled as such) in the replication study. Moreover, the original author reports that he conducted the analysis he suggested on the replication data himself and that it in fact confirms the original findings. Indeed, another independent successful replication of one of these findings was published but not taken into account by this replication. As such, it seems odd that only some of the exploratory methods are included in this replication, and someone more cynical than the Devil’s Neuroscientist (who is already pretty damn cynical) might call that cherry picking.

What this example illustrates is that it is actually not very straightforward to evaluate, and ideally improve, a preregistered protocol. First of all, sending out the protocol to (some of) the original authors is not the same as obtaining solid agreement that the methods are appropriate. Moreover, refusing to take suggestions on board at a later stage, when data collection or analysis has already commenced, hinders good science. Take again my earlier example of the EEG study Sam reviewed. In his evaluation the preregistered protocol was fundamentally flawed, resulting in completely spurious findings. The authors revised their manuscript and performed different analyses Sam suggested, which essentially confirmed that there was no evidence of telepathy (although the authors never quite got to the point of conceding this).

Now, unlike the Devil’s Neuroscientist, Sam is merely a fallible human, and any expert can make mistakes. So you should probably take his opinion with a grain of salt. However, I believe in this case his view of that experiment is entirely correct (but then again I’m biased). What this means is that under a strictly enforced preregistration system the authors would have to perform the preregistered procedures even though they are completely inadequate. Any improvements, no matter how essential, would have to be presented as “additional exploratory procedures”.

This is perhaps a fairly extreme case but not an unrealistic one. There may be many situations where a collaborator or a reviewer (assuming the preregistered protocol is public) suggests an improvement to the procedures after data collection has started. In fact, if a reviewer makes such a suggestion, I would regard altering the design post hoc as unproblematic. After all, preregistration is supposed to stop people from data massaging, not from making independently advised improvements to the methods. Certainly, I would rather see well-designed, meticulously executed studies that allow a degree of flexibility than a preregistered protocol that is deeply flawed.

The thin red line

Before I conclude this (rather long) first post on my blog, I want to discuss another aspect of preregistration that I predict will be its largest problem. As many proponents of preregistration never tire of stressing, preregistration does not preclude exploratory analyses or even whole exploratory studies. However, as I discussed in the previous sections, this comes with complications. My prediction is that almost all studies will contain a large amount of exploration. In fact, I am confident that the best studies will contain the most exploration because – as I wrote earlier – thorough exploration of the data is natural, it is useful, and it should be encouraged.

For some studies it may be possible to anticipate many of the different angles from which to analyze the data. It may also be acceptable to correct small mistakes in the preregistered design post hoc and clearly label these changes. By and large, however, what I believe will happen is that we will have a large number of preregistered protocols, but the final publications will contain a lot of additional exploration, and that is where much of the scientifically most interesting information will be.

How often have you carried out an experiment only to find that your beautiful hypotheses aren’t confirmed, that the clear predictions you made do not pan out? These are usually the most interesting results because they entice you to dig deeper, to explore your data, and to generate new, better, more exciting hypotheses. Of course, none of this is prevented by preregistration, but in the end the preregistered science is likely to be the least interesting part of the literature.

But there is also another alternative. Perhaps we will enforce preregistration more strictly. Perhaps only a very modest amount of exploration will be permitted after all. Maybe preregistered protocols will have to contain every detailed step of your methods, including the metal screening procedure for MRI experiments, exact measurements of the ambient light level, temperature, and humidity in the behavioral testing room, and the exact words each experimenter will say to each participant before, during, and after the actual experiment, without any room for improvisation (or, dare I say, natural human interaction). It may be that only preregistered studies that closely follow their protocol will be regarded as good science.

This alternative strikes me as a nightmare scenario. Not only will this stifle creativity and slow the already gradual progress of science down to a glacial pace, it will also rob science of the sense of wonder that attracted many of us to this underpaid job in the first place.

The road to hell is paved with good intentions – Wait. Does this mean I should be in favor of it?


16 thoughts on “The Pipedream of Preregistration”

  1. Just to clarify: preregistration is *not* about preventing exploratory analyses. Preregistration is also *not* about putting confirmatory research on a pedestal. Preregistration is also *not* appropriate or possible in _all_ academic endeavors. Preregistration is simply a helpful way to remain honest, and prevent you from fooling yourself — we are human and prone to see structure where we expect it to be. Also, many researchers do not realize that their statistical method are only valid for strictly confirmatory research. After I tried preregistration myself (several times now) I realized how different this is from the usual way of doing research, and I no longer understand why people would want to pass up on the opportunity. The proof of the pudding is in the eating. I am pretty sure you’ll end up liking the approach, or at least come away with some new insights.


    • I know this is how proponents of preregistration often frame it. I actually briefly discussed this in the post as well. I believe it is unrealistic, hence a “pipedream”. If preregistration really catches on, confirmatory and exploratory research will inevitably be held to different standards. I think this is not only incorrect but potentially quite dangerous. I would rather take a well-done normal study than a bad preregistered one regardless of how strictly it sticks to the script. Yes, this entails a degree of trust that the authors weren’t just massaging the hell out of their data (or even outright Stapeling it all the way).

      My main point, towards the conclusion of my post, is that in a world dominated by prereg the adherence to protocols will either be loose – which means that nothing really changes all that much from the status quo – or it will become so strict that most innovation and creativity is stifled.

      “Preregistration is simply a helpful way to remain honest, and prevent you from fooling yourself — we are human and prone to see structure where we expect it to be.”

      Yes, most scientists are fallible humans and susceptible to cognitive biases. Again, I think the most important thing here is to educate and also to encourage replication (as I said, this will be a topic for another post – perhaps the next one). Regardless of whether they are preregistered or not, a single study doesn’t tell us anything major, but a series of replications will.

      And as I have also pointed out in the post, I would rather see the data analyzed from many different angles, with many different tweaks and data massaging, and then see that it is robust. This is much more telling than following a single preregistered protocol.

      “Also, many researchers do not realize that their statistical method are only valid for strictly confirmatory research.”

      I believe this is true and I support efforts to change that.

      “I am pretty sure you’ll end up liking the approach…”

      The Devil’s Neuroscientist is quite a stubborn creature. It will take a lot to convince me of anything I don’t believe in. But I know that my alter ego, Sam, has been thinking about conducting preregistered studies. He’s feeble like this, always eager to try new things, no matter how insane… Since I share his experiences I can report back on those when they’re in.


      • “confirmatory and exploratory research will inevitably be held to different standards” – but they already are and there is nothing wrong with it. I believe you yourself share this view when you say “I would rather see the data analyzed from many different angles […] and then see that it is robust.” All exploratory work is useful, important and even necessary, but seeing whether a result is robust is the crucial point that can separate truths from illusory correlations and inform further exploration. And that is the confirmatory part.
        From your post, I got a feeling that you imagine a rather bleak future where pre-registrators stick mindlessly to their scripts, never doing anything exploratory or where they cheat and do not follow the registrated plans at all. Then pre-registration would either threaten exploratory work or be useless. However, I believe that pre-registration just makes the difference between exploratory and confirmatory work more explicit, which is a good thing.
        An example from our lab: when doing a more complex study, we run pre-tests and pilots first, tweak the design, then pre-register it with our main hypothesis and analysis, but also mention other variables that we want to explore and for which we don’t have clear predictions. Finally, we write up the results of the main confirmatory analysis, but then continue with all those exploratory analyses (labeled as exploratory) – even those that didn’t occur to us during pre-registration. So I really don’t feel that by pre-registering we forbid ourselves from doing exploratory work. We just have a clear idea of what is confirmatory (and what I can trust more) and what is exploratory (and what I have to interpret more carefully), so we don’t fool ourselves (and others) too easily. (Plus, pre-registration forces us to think about data and analysis in advance in much more detail, which I believe is great especially for young researchers, who often focus too much on procedures and just hope to “analyze it somehow” at the end.)

        Of course I agree with your points about replications wholeheartedly. When assessing dependability of a result, it will go exploratory < confirmatory < replication.

        And last thing – nobody is suggesting that every pre-registered study is a good study, so it makes perfect sense to prefer a good "normal" study when compared with a bad pre-registered one.


        • “From your post, I got a feeling that you imagine a rather bleak future where pre-registrators stick mindlessly to their scripts, never doing anything exploratory or where they cheat and do not follow the registrated plans at all.”

          Actually, I think the most realistic scenario is that nothing much changes at all. People preregister their protocols but then publish lots of tinkering in the same studies. The exploratory bits will be “clearly labeled” as such, but the writing will constantly flip between preregistered and exploratory parts because there are just so many exploratory parts. There are two reasons for this: on the one hand, exploratory (i.e. non-registered) analyses will be required to better interrogate the data. Reviewers will ask for them (as they did in the Boekel study). On the other hand, in many cases the authors themselves will want to include these additional data because they are more exciting/surprising/interesting than the modest results of the preregistered design (no doubt in part because, as intended, preregistered designs actually do control false positives and are less prone to inflated findings). So things will be more or less the same as now.

          It also means that there really isn’t much point in preregistering anything. We could just carry on the way things are: take new surprising results with all the grains of salt they deserve and start trusting them when they replicate, repeatedly, under a range of parameters. Moreover, we should also carry out multiple analyses to test whether findings are robust, and report them. Whenever you review a manuscript in which the authors stick to only one analysis stream, you should ask them to test the robustness of their results.

          I would be more convinced by a study showing that the results hold up under a range of situations, thus demonstrating that the authors didn’t just cherry-pick the significant findings, than by a single preregistered analysis that reports only one pipeline. Or if they show both, I don’t really see the need for having anything preregistered at all.


    • Hi EJ,

      What I’m reacting to in my post above are the draconian reactions I’ve seen to the clarion call for pre-registration. There was at least one commenter in Neuroskeptic’s recent thread who was calling for an end to all exploratory research, which of course I think is a mistake. Pre-registration should be one tool in our toolbelt that can help to ensure that highly expensive experiments have the biggest impact possible.


  2. Also, of course I’m gonna explore the heck out of my data. Then again, I was trained by you know who, and his “if you can see it and be convinced, it’s going to be significant with some appropriate method” still seems to capture all the data I’ve worked with. I know this is sacrilege and I don’t think it’s a safe approach with just anyone, but if you’re naturally nitpicky and suspicious and perfectionist, it has not so far caused me any problems (i.e., my work is not shoddy/dodgy/shonky/dwanky; that last one I just learned is the South African version :)). Now I shall exit before psychologists start throwing stats textbooks at me.

    More broadly, I believe there are grave dangers in replacing thinking with statistics. They should supplement our claims, not make our claims for us. That’s the scientist’s job.

    Apologies if anything doesn’t make sense, dictation into iOS works pretty well but not very well.


    • Thanks for both your posts and welcoming me to the blogosphere (my alter ego has actually been in it for a long time but I am after all only a day old now).

      Sam has repeatedly argued that thinking should never be replaced by statistics. I may post about this on this blog as well in the future. Of course, Sam has also been working on statistical methods (this year he’s been wasting on that much of the time he should have spent on actual research), so I am not entirely sure where he stands. I sometimes fear for his sanity. In the end, no statistical method can save science and neither can preregistration. Only good science can save science.

      Also, you-know-who is quite frankly still a better scientist than most of the people complaining about the state of science combined.


  3. When I meet the devil, I must tell him about my recent replication plus pre-registration experience for a special topic at Frontiers. The attempt brought out many of the issues. For example, we followed the methods of the original paper, yet one reviewer criticized us for our horribly confounded experimental design and told us the study was basically not a contribution to the field as a result (?! It’s a replication, we can’t change the design). The same reviewer was also angered by some of our narrative where we referred to the original paper not specifying certain parameters (e.g., since the orientation of the Gabor stimuli was not defined, we had matched it to the figures), and told us that to really replicate the study we had to contact the authors and get additional details. That in itself should show a problem with the original paper. The original author may not respond, may be dead, etc. That is why you report the details properly in the methods. Plus, we had already talked to the PI. So this reviewer was not behaving as if he were assessing a replication attempt in the first place, and did not seem to recognize that not reporting the methods was a problem with the original paper, not with ours. This reviewer was not the only problem, and we are not done with the paper yet, but the rest shall be discussed over a glass of absinthe or possibly good scotch if the devil’s alter ego would rather have the chat.

    Good luck with the blogosphere! (is that word still used, or is it archaic by now, like information super highway?)


  4. ‘Interesting’ doesn’t equal valid, true, or even important. I think the closest synonym would probably be ‘surprising’. Preregistered hypotheses are by definition not very surprising because you hypothesized that you might observe these results. Of course, some predictions can still surprise if they are confirmed (like telepathy), but I would guess those are far less common than the chance findings we stumble across.

    My alter ego would probably agree with you that most science should be hypothesis-driven (he has had to endure too much “exploratory research” in one of his previous lives). However, I defend exploratory research because it is important for progress in science. I don’t think most exploratory findings will be false positives, although of course a lot of them will be. However, the same applies to preregistered studies.

    The way to determine whether a result is meaningful is through repeated and independent replication, both direct and conceptual. That and a lot of time for evidence to accumulate.


    • Thank you for the reply!

      I don’t understand the use of “surprising” in science. I understand science to be about facts and truths and things like that. That’s why I keep coming back to confirmatory findings being “better” than exploratory findings: they can probably be trusted more and are probably closer to “the truth”. Sure, let things “surprise” you if you like that term, but please test such “surprises” via confirmatory analyses.

      You state: “The way to determine whether a result is meaningful is through repeated and independent replication, both direct and conceptual. That and a lot of time for evidence to accumulate.” I agree, and I view confirmatory findings as probably having a higher chance of being successfully replicated. Also, confirmatory analyses can be a direct replication of earlier findings, i.e. serving as a direct replication of one’s exploratory findings to test their robustness. Also, “confirmatory findings” are probably “exploratory” enough in the sense that their robustness and boundary conditions can still be explored. I vote that “confirmatory findings” are the new “exploratory findings”. I bet you there will be more than enough to be “surprised” about and to “explore” with only “confirmatory findings”.


      • Yes, science is about “facts and truths and things like that”. My alter ego keeps telling people that science isn’t about truth but rather about approximating the truth while knowing you’ll always be kinda wrong. However, whichever way you look at it, science is about seeking the truth.

        But I would say it is not just that. If scientists were only interested in knowing the truth, they wouldn’t get very far. Science is as much about discovery and exploration as it is about understanding and truth-seeking. Moreover, some of the most groundbreaking science actually came from chance discoveries. Where would we be today without Marie Curie, Ernest Rutherford or Galileo? Where would we be without Magellan, Cook, or Columbus? (Wait, don’t answer that one…) Or, to stay within neuroscience, where would we be without Cajal? He didn’t do what we would call hypothesis-driven research but largely observed and cataloged what he saw.

        I’m not saying that you shouldn’t follow up your chance discoveries and data massaging with confirmatory research. Of course you should. More importantly, other people should too. Replications of your own findings are great but nothing is better than independent replication.

        But we really don’t need preregistration for that, only the willingness to accept 1) that science is a self-correcting process and 2) that it doesn’t move at the pace of a music video.


    • I agree with Sam that cognitive psychology proceeds most efficiently with a healthy mixture of exploratory and hypothesis-driven research. We really know so little about how the mind works that it would be impossible for us to find everything by generating hypotheses. It’s a comfortable fiction to imagine that all of our discoveries are generated by using theories to generate testable hypotheses, but it’s just not true. Much of what we do is stumbling across things by accident, and that’s ok. The important thing is to generate good theories from these stumbled-upon findings, and then test those theories with hypothesis-driven research. The two methods should go together, arm in arm.

      I also think that pre-registration of everything is not the ideal solution for our field. Rather, we should just replicate our own work. If you can replicate your own findings using the same methods and the same equipment, this eliminates most of the perils of fishing. If something is too expensive to replicate, then perhaps pre-registration is a good solution.

      Let me add a related comment, which is that I am afraid of the dangers of Big Data and what it will do to our field. Without an extremely rigorous understanding of statistics, Big Data provides a lot of rope for one to hang oneself with. So with the current Big Data push (i.e. new grants, faculty positions) I am afraid that things are going to get worse before they get better. So I am very glad that these conversations are being had right now, as we start to develop new methods for dealing with large amounts of data. Unfortunately, replication is not easily applied to Big Data problems, because the data sets are often idiosyncratic. Regardless, an awareness of the problems inherent in having lots of data can only help.


      • I completely agree (and I believe Sam might too) that focusing on replication is far more crucial than all this talk about preregistration. One study is little better than no study at telling us whether an effect is real or not. It really doesn’t matter if it is preregistered or not. Yes, prereg may minimize the false positive rate, but it also reduces the hit rate of actually finding interesting effects in the first place. In the long run, preregistered studies will still contain lots of false positives and the only way to tell is to keep replicating. Of course, replication will be the topic of another post by me in the future so I will stop here.

        The same goes for Big Data. The Devil’s Neuroscientist has *a lot* to say about that… 😛


      • “(…) and the only way to tell is to keep replicating”

        This is another thing that keeps bugging me when I read about it: everybody keeps mentioning the importance of replication, yet hardly anybody does it!? How are we to know which effects are replicable when nobody performs replications? To me it seems to be a promise science keeps making to itself while doing almost nothing to actually keep it.

        There are hundreds, maybe thousands, of articles printed every month with findings that will probably never be replicated. These articles/findings are, however, being used to generate hypotheses and being cited as references. Isn’t that problematic? I can imagine that there are whole lines of research made up entirely of false positives…

        “Of course, replication will be the topic of another post by me in the future so I will stop here.”

        Sweet! Can’t wait to read what you have to say about that.


  5. “but that the final publications will in fact contain a lot of additional exploration, and that some of the scientifically most interesting information will usually be in there.”

    I don’t understand why the most interesting information is in exploratory analyses. Isn’t science about finding facts, truths? If yes, then isn’t the most “interesting information” in the results/findings that could be seen as the most robust, i.e. the findings that are probably relatively close to the truth, to being a fact? If yes, could you then not state that the most “interesting information” can only be found in the confirmatory findings rather than the exploratory findings, because the former can be trusted more than the latter?

    I don’t understand all the fuss about “exploratory findings”. Why would I want to know about findings that are probably false positives, or random noise in the data? By all means use them for hypothesising if you think they are useful, but test them using a confirmatory design to be at least somewhat sure that they are not false positives/noise.

