The Devil’s Children

There has been a new development. While I have been permanently banned from the plane of mortals, a new blog has arisen: The Neuroscience Devils.

This blog will try to fulfill the same function this one had but it will be driven by the community. Everyone is welcome to post anonymously and rage against the Crusaders and the prevailing zeitgeist. Check it out if you’re interested in contributing!

Perhaps there is hope for us yet…

Abandon all hope

Well, that’s it. Sam’s patience has worn paper-thin and he’s an insomniac at the best of times (this may come with the territory of being possessed by demons like me) – so he’s taking back control for good now.

Running this blog was fun while it lasted and I got some good discussions out of it. I hope some of you enjoyed reading these posts and that at least some of it was thought-provoking. I wish there were more time but it ain’t gonna happen. Sam has no desire to delve deeper into debating straw man arguments or being accused of being a fraudster, a troll, and/or a sexist. The Devil’s Neuroscientist was none of these things; she merely tried to argue an opposing viewpoint. I know she was snarky and abrasive at times but I don’t believe she was ever truly offensive.

It has been baffling to me just how hard it seems for many people to conceive of the possibility that I might be able to argue views that Sam doesn’t actually hold. It is common practice in debating clubs and courts of law the world over. The Devil’s Advocate was the inspiration for my unholy presence. But somehow this seems impossible to understand. I find that illuminating. It was an exciting experiment but all bad things must come to an end. Maybe I should have preregistered it?

I suppose this means the Crusaders for True Science won. They were always strong in numbers and their might has grown immensely in the past few years. Sam will be at that debate on Tuesday but I don’t believe he is strong enough, in either wit or conviction, to hold their forces at bay.

For what it’s worth, I hope that the new world of science after this holy war will be a good one. I fear that it will be one where journal editors (and their reviewer lackeys) use Registered Reports to decide what kind of science people are allowed to do. I fear it will be one where nobody dares to publish novel, creative research for fear of being hounded to death by the replicating hordes who always manage to turn any bright gem into a dull null result. I fear that instead of “fixing science” we will catastrophically break it. In the end we’ll be left with an apocalyptic wasteland after our descendants decide to use the leftover fruits of our knowledge to annihilate our species. Who knows, maybe it’s for the best. I’ll see y’all in my humble abode because

“The road to Hell is paved with good intentions.”

Is Science broken?

Next Tuesday, St Patrick’s Day 2015, UCL Experimental Psychology will host an event called “Is Science broken?”. It will start with a talk by Chris Chambers of Cardiff University about registered reports. Chris has been a vocal proponent of preregistration, making him one of the generals of the Crusade for True Science, and registered reports are the strictest of the preregistration proposals to date: they involve peer review of the introduction and methods sections of a study before data collection commences.

Chris’ talk will be followed by an open discussion by a panel comprising a frightening list of highly esteemed cognitive neuroscientists: in addition to Chris, there will be David Shanks (who’ll act as chair), Sophie Scott, Dorothy Bishop, Neuroskeptic … and – for some puzzling reason – my alter ego Sam Schwarzkopf. I am sure that much of the debate will focus on preregistration although I hope that there will also be a wider discussion of the actual question posed by the event:

“Is Science broken?” – Attentive readers of my blog will probably guess that my answer to this question is a clear No. In fact, I would question whether the question even makes any sense. Science cannot be broken. Science is just a method – the best method there is – to understand the Cosmos. It is inherently self-correcting, no matter how much Crusaders like to ramble against this notion. Science is self-correction. To say that science is broken is essentially stating that science cannot converge on the truth. If that were true, we should all just pack up and go home.

Now, we’ve been over this and I will probably write about this again in the future. I’ll spare you another treatise on self-correction and model fitting analogies. What the organizers of the event mean in reality is that the human endeavor of scientific research is somehow broken. But is that even true?

Sam is the one who’ll be attending that panel discussion, not me. I don’t know if he’ll let me out of my cage to say anything. We do not always see eye to eye as he has the annoying habit of always trying to see things from other people’s perspectives… I think, though, that fundamentally he agrees with me on the nature of the scientific method and I hope he will manage to bring this point across. Either way, here are some questions I think he should raise:

I. The Crusaders claim that a majority of published research findings are false positives. So far nobody has given a satisfying answer to the question of how many false positives we should tolerate: 5%? 1%? Or zero? Is that realistic? (A rough sketch of the arithmetic behind this claim follows after this list.)

II. The Crusader types often complain about high-impact journals like Nature and Science because they value “novelty” and high-risk findings. This supposedly hurts the scientific ideal of seeking the truth. But is that true? Don’t we need surprising, novel, paradigm-shifting findings to move science forward? Over the years of watching Sam do science I have developed a strong degree of cynicism. The scientific truth is never as simple and straightforward as the narrative presented in research papers – including those in most low-impact journals. Science publishing should be about communicating results and hypotheses even if they are wrong – because in the long run most inevitably will be.

III. We can expect there to be at least some discussion of preregistration at the debate. This will be interesting because Sophie has been one of the few outspoken critics of this idea. I have of course also written about this before and it’s a shame that I can’t participate directly in this debate. We would make one awesome dynamic duo of science ladies raging against preregistration! However much Sam lets me say about this, and putting aside the naive debate about the potential benefits and problems of this practice, the thing that has bothered me most is that we have no compelling evidence for or against preregistration. What is more, there also seems to be little consensus on how we can even establish whether preregistration works. To me these are fundamental concerns that we need to address now, before this idea really takes off. Isn’t it ironic that, given the topic of this debate, nobody seems to seriously talk about these points? Shouldn’t scientists value evidence above all else? Shouldn’t proponents of preregistration be expected to preregister the design of their revolutionary experiment?

IV. In addition to these rather lofty concerns, I also have a pragmatic question. Is the Crusaders’ behavior responsible? Even assuming that people do not engage in outright questionable research practices (and I remain unconvinced that these are as widespread as the Crusaders say), I think you can almost guarantee that most preregistered research findings will be lackluster compared to those published in the traditional model. In a world where the traditional model dominates, won’t that harm the junior researchers who are coerced into following the more rigid model? A quote I heard through the grapevine was “My supervisor wants me to preregister all my experiments and it’s destroying my career”. How do we address this problem?
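Since we are talking numbers, here is the rough sketch of the arithmetic behind point I’s “majority of findings are false” claim – the positive-predictive-value argument popularized by Ioannidis. Every number below (the prior, the power, the alpha) is an illustrative assumption, not an estimate for any real field; the point is merely that the proportion of false positives in the literature depends on far more than the 5% significance threshold.

```python
# Back-of-the-envelope positive predictive value (PPV): the probability that a
# "significant" finding reflects a true effect. All numbers are illustrative
# assumptions, not measurements of any actual literature.

def ppv(prior_true, power, alpha=0.05):
    """Share of significant findings that are true, given the proportion of
    tested hypotheses that are actually true, statistical power, and alpha."""
    true_positives = prior_true * power
    false_positives = (1 - prior_true) * alpha
    return true_positives / (true_positives + false_positives)

for prior_true in (0.5, 0.1):
    for power in (0.8, 0.35):
        print(f"prior={prior_true:.2f}, power={power:.2f}: "
              f"PPV={ppv(prior_true, power):.2f}")
```

With generous assumptions the vast majority of significant findings are true; with long-shot hypotheses and feeble power the PPV drops below one half. That is the whole argument in a nutshell – and also why “how many false positives should we tolerate?” has no single obvious answer.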

As I said, Sam might not let me say these things and he doesn’t agree with me on many accounts. However, I hope there will be some debate around these questions. I will try to report back soon although Sam is demanding a lot of time for himself these days… Stay tuned.

Confusions of Grandeur

In response to my previous post, a young (my age judgment may be way off, I’m just going by the creative use of punctuation marks) mind by the name of ‘confused’ left a very insightful comment. I was in the process of writing up my response when I realized that the reply would work far better as another post because it explains my thinking very well, perhaps better than all the verbal diarrhea I have produced over the past few months.

For the sake of context, I will post the whole comment. First, the paragraph from my previous post that this person/demon/interdimensional entity commented on:

Devil’s Neuroscientist: “There is no ‘maybe’ about it. As I have argued in several of my posts, it is inherent to the scientific process. Science may get stuck in local minima and it may look like a random walk before converging on the truth – but given sufficient time and resources science will self-correct.”

And their response:

confused: “I have a huge problem with these statements. Who says science will self-correct?!?! Maybe certain false-positive findings will be left alone and no-one will investigate them any further. At that point you have an incorrect scientific record.

Also, saying “given sufficient time and resources science will self-correct” is a statement that is very easy to use to wipe all problems with current day science under the rug: nothing to see here, move along, move along… We scientists know what’s best, don’t you worry your pretty little head about it…”

I will address this whole comment here point by point.

“I have a huge problem with these statements. Who says science will self-correct?!?!”

I’ve answered this question many times before. Briefly, I’ve likened science to a model fitting algorithm. Algorithms may take a long time to converge on a sensible solution. In fact, they may even get stuck completely. In that situation, all that can help is to give them a push to a more informed place to search for the solution. This push may come from novel technologies providing new and/or better knowledge, or it may simply come from the mind of a researcher who dares to think outside the box. The history of science is full of examples of both.
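If you want the analogy in the flesh, here is a deliberately silly toy sketch – a cartoon of the argument, not a model of science. A greedy optimizer gets stuck in a shallow local minimum of a bumpy function and only finds the better solution once it gets a push, here in the form of restarts from fresh starting points. The function, step size, and number of restarts are arbitrary choices made purely for illustration.

```python
import random

def loss(x):
    # A bumpy objective: global minimum near x = -1, shallow local minimum near x = +1.
    return (x**2 - 1)**2 + 0.3 * x

def greedy_descent(start, step=0.01, iters=2000):
    """Always move to the lowest-loss neighbor; stalls in whatever valley it starts in."""
    x = start
    for _ in range(iters):
        x = min((x - step, x, x + step), key=loss)
    return x

stuck = greedy_descent(2.0)  # wanders into the shallow valley and stays there
random.seed(1)
pushed = min((greedy_descent(random.uniform(-3, 3)) for _ in range(10)), key=loss)
print(f"without a push: x={stuck:.2f}, loss={loss(stuck):.2f}")
print(f"with restarts:  x={pushed:.2f}, loss={loss(pushed):.2f}")
```

Whether the push comes from a shiny new technology or from one unreasonably stubborn researcher is immaterial – the algorithm, like science, keeps grinding toward a better answer.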

This is what “inherent to the process” means. It is the sole function of science to self-correct because the whole point of science is to improve our understanding of the world. Yes, it may take a long time. But as long as the scientific spirit drives inquisitive minds to understand, it will happen eventually – provided we don’t get obliterated by an asteroid impact, a hypernova, or – what would be infinitely worse – by our own stupidity.

“Maybe certain false-positive findings will be left alone and no-one will investigate them any further.”

Undoubtedly this is the case. But is this a problem? First of all, I am not sure why I should care about findings that are not investigated any further. I don’t know about you, but to me this sounds like nobody else cares about them either. This may be because everybody feels they are spurious or perhaps because they just ain’t very important.

However, let me indulge you for a moment and assume that somebody actually does care about the finding, possibly someone who is not a scientist. In the worst possible case, it could be a politician. By all that is sacred, someone should look into it then and find out what’s going on! But in order to do so, you need to have a good theory, or at least a viable alternative hypothesis, not the null. If you are convinced something isn’t true, show me why. It does not suffice to herald each direct non-replication as evidence that the original finding was a false positive because in reality these kinds of discussions are like this.

“At that point you have an incorrect scientific record.”

Honestly, this statement summarizes just about everything that is wrong with the Crusade for True Science. The problem is not that there may be mistakes in the scientific record but the megalomaniac delusion that there is such a thing as a “correct” scientific record. Science is always wrong. It’s inherent to the process to be wrong and to gradually self-correct.

As I said above, the scientific record is full of false positives because this is how it works. Fortunately, I think the vast majority of false positives in the record are completely benign. They will either be corrected or they will pass into oblivion. The false theories that I worry about are the ones that most sane scientists already reject anyway: creationism, climate change denial, the anti-vaccine movement, Susan Greenfield’s ideas about the modern world, or (to stay with current events) the notion that you can “walk off your Parkinson’s.” Ideas like these are extremely dangerous and they have true potential to steer public policy in a very bad direction.

In contrast, I don’t really care very much whether priming somebody with the concept of a professor makes them perform better at answering trivia questions. I personally doubt it and I suspect simpler explanations (including that it could be completely spurious), but the way to prove that is to show how the original result could have arisen without the claimed effect, not to show that you are incapable of reproducing it. If that sounds a lot more difficult than churning out one failed replication after another, that’s because it is!

“Also, saying “given sufficient time and resources science will self-correct” is a statement that is very easy to use to wipe all problems with current day science under the rug: nothing to see here, move along, move along… “

Nothing is being swept under any rugs here. For one thing, I remain unconvinced by the so-called evidence that current-day science has a massive problem. The Schöns and Stapels don’t count. There have always been scientific frauds and we really shouldn’t even be talking about the fraudsters. So, ahm, sorry for bringing them up.

The real issue that has all the Crusaders so riled up is that the current situation apparently generates a far greater proportion of false positives than necessary. There is a nugget of truth to this notion but I think the anxiety is misplaced. I am all in favor of measures to reduce the rate of false positives through better statistical and experimental practices. More importantly, we should reward good science rather than sensational science.

This is why the Crusaders promote preregistration – however, I don’t think this is going to help. It will only ever treat the symptom, not the cause of the problem. The underlying cause, the actual sickness that has infected modern science, is the misguided idea that hypothesis-driven research is somehow better than exploratory science. And sadly, this sickness plagues the Crusaders more than anyone. Instead of preregistration, which – despite all the protestations to the contrary – implicitly places greater value on “purely confirmatory research” than on exploratory science, what we should do is reward good exploration. If we did that instead of insisting that grant proposals list clear hypotheses, “anticipating” results in our introduction sections, and harping on about preregistered methods – and if we were also more honest about the fact that scientific findings and hypotheses are rarely ever fully true, and did a better job communicating this to the public – then current-day science probably wouldn’t have any of these problems.

“We scientists know what’s best, don’t you worry your pretty little head about it…”

Who’s saying this? The whole point I have been arguing is that scientists don’t know what’s best. What I find so exhilarating about being a scientist is that this is a profession, quite possibly the only profession, in which you can be completely honest about the fact that you don’t really know anything. We are not in the business of knowing but in the business of asking better questions.

Please do worry your pretty little head! That’s another great thing about being a scientist. We don’t live in ivory towers. Given the opportunity, anyone can be a scientist. I might take your opinion on quantum mechanics more seriously if you have the education and expertise to back it up, but in the end that is a prior. A spark of genius can come from anywhere.

What should we do?

If you have a doubt in some reported finding, go and ask questions about it. Think about alternative, simpler explanations for it. Design and conduct experiments to test this explanation. Then, report your results to the world and discuss the merits and flaws of your studies. Refine your ideas and designs and repeat the process over and over. In the end there will be a body of evidence. It will either convince you that your doubt was right or it won’t. More importantly, it may also be seen by many others and they can form their own opinions. They might come up with their own theories and with experiments to test them.

Doesn’t this sound like a perfect solution to our problems? If only there were a name for this process…

In the words of the great poet and philosopher Bimt Lizkip, failed direct replications in psychology research are just “He Said She Said Bulls**t”

How science works

My previous post discussed the myths surrounding the “replication crisis” in psychology/neuroscience research. As usual, it became way too long and I didn’t even cover several additional points I wanted to mention. I will leave most of these for a later post in which I will speculate about why failed replications, papers about incorrect/questionable procedures, and other actions by the Holy Warriors for Research Truth cause so much bad blood. I will try to keep that one short or split it up into parts. Before I get around to it though, let me briefly (and I am really trying this time!) have a short intermission with practical examples of the largely theoretical and philosophical arguments I made in previous posts.

Science is self-correcting

I’ve said it before but it deserves saying again. Science self-corrects, no matter how much the Crusaders want to whine and claim that this is a myth. Self-correction isn’t always very fast. It can take decades or even centuries. When thoroughly incorrect theories take root, it can take some extraordinary effort to uproot them again. However, as long as the pursuit of truth and free expression continues unhindered, and as long as research has sufficient resources and opportunities, a correction will eventually happen. In many ways self-correction is like evolution, climate change, or plate tectonics. You rarely see direct evidence for these slow changes by looking at single time points but the trends are clearly visible. Fortunately though, scientific self-correction is typically faster than these things.

Take last year’s discovery of gravitational waves supporting the Big Bang theory of the origin of the universe. It was widely reported in the news media as an earth- (well, space-) shattering finding that would greatly advance our understanding of our place in the cosmos. It resulted in news articles, media reports, and endless social media shares of videos and articles about it – and in the Daily Mail immediately making a complete fool of itself, as usual, by revealing just how little it knows about the scientific process. It sparked some discussion about sexism in science journalism because initial reports ignored one of the female scientists who inspired the theory. (I am too lazy to provide any links to these things. If you’re on this blog I assume you’ve mastered the art of googling.)

However, in the months that followed, critical voices began to be heard suggesting that this discovery may have been a fluke. Instead of gravitational waves revealing the fabric of the universe itself, it appears that these measurements were contaminated by signals stemming from cosmic dust. Just the other day, Nature published a story suggesting that this alternative, simpler explanation now looks to be correct. I am not the Devil’s Astrophysicist so I can’t talk about the specifics of this research but it certainly sounds compelling from my laywoman’s perspective: even without an in-depth understanding of the research I can tell that the evidence suggests a simpler explanation for the initial findings, and by Occam’s Razor alone I would be encouraged to accept this one as more probable.

They apparently also have orientation columns in space

I don’t see any problem with this and I am pretty sure neither do most scientists. This is just self-correction in action. If anything about this story has been problematic, it is the fact that the initial discovery was so widely publicized before it was validated. I don’t really know where I stand on this. On the one hand, this has been unfortunate because it could greatly confuse public opinion about this research. The same applies to the now “debunked” finding of arsenic-based life forms reported a few years ago. A spectacular, possibly paradigm-shifting discovery was reported widely only to be disproved at a later point. I could see that this sort of perceived instability in scientific discoveries could undermine the public’s trust in science, with potentially devastating consequences. In a world where climate change deniers, “anti-vaxxers”, and creationists are given an undeservedly strong platform to spread their views (no linking to those either, for that very reason), we need to be very careful how science is communicated. A lack of faith in scientific research could do a lot of harm.

However, in my mind this doesn’t mean we should suppress media reports of new, possibly not-yet-validated discoveries. For one thing, in a free society we can’t forbid people from talking about their research. I also believe that science should be a leading example of democratic ideals and transparency because good science is based on them. Science flourishes not under strong regulation and rigid structures but when authority is questioned.

“[Science] connects us with our origins, and it too has its rituals and its commandments. Its only sacred truth is that there are no sacred truths. All assumptions must be critically examined. Arguments from authority are worthless.”

Carl Sagan – Cosmos, Episode 13 “Who speaks for Earth?”

More importantly, I believe as scientists we actually want to communicate the enthusiasm and joy we experience when we reveal secrets of the world – even if they are untrue. Rather than curtailing when and how we communicate our discoveries to the public, we should do a better job at communicating how science works in practice. By all means, we should improve our press releases to minimize the sweeping generalizations and unfounded speculations by the news media about the implications of a new finding. When communicating sensational results to the public, we should certainly convey our excitement. However, we should also always remind them (and ourselves) that any new discovery is never the last word on an issue but the first.

In my view, helping the public understand that scientific knowledge is ever evolving and that “boffins” didn’t just “prove” things will in fact strengthen the public’s trust in science. The reason I, as a neuroscientist without any deeper understanding of atmospheric physics, believe that man-made climate change is a problem is that an overwhelming majority of climate scientists believe the evidence supports this hypothesis. I know what scientists are like. They disagree all the damn time about the smallest issues. If there is an overwhelming consensus on any point, it is past time we listened!

The Psychoplication Crisis

After having talked about gravitational waves and arsenic bacteria, you can bet some smartass will come out of the woodwork and tell you that these are the “hard” natural sciences and that the same does not apply to “soft” sciences like psychology or cognitive neuroscience. Within psychology/cognitive neuroscience the same snobbery exists between experimental psychology and social cognition researchers. Perhaps that is somewhat understandable, as some of the first perception researchers were in fact physicists (this is the origin of the term psychophysics). However, I think this sort of segregation is delusional, and the self-satisfied belief that your own research area is somehow a “harder” science can be quite dangerous because it makes it all that much easier to mislead yourself into thinking that your own research is not susceptible to these problems.

I don’t think the problems that are currently discussed in psychology and neuroscience are in any way specific to, or even particularly pervasive in, our field. Some concepts and theories are simply more established in physics than they are in psychology. We certainly don’t need to do any experiments to prove gravity or even to confirm many of the laws that seem to govern it. The same cannot be said about social psychology, where people still question whether things like unconscious priming of complex behaviors exist in the first place. But sensational findings being hyped up by media reports only to be disproved in spectacular failures to replicate is evidently just as common in physics and microbiology. And even shocking revelations of scientific fraud have happened in those fields (see also here for a less clear-cut case – I’ll probably discuss this more next time).

One way in which the culture in physics research seems to differ from our field is that it is far more common to upload manuscripts to public repositories prior to peer review. This allows a wider, more public discussion of the purported discoveries, which is certainly good for the scientific process. However, it also results in broader media coverage before the research has even passed the most basic peer review stage, and this can in fact exacerbate the problems with unverified findings. The still rare cases of this approach in psychology research suffer from the same problem. Just look at Daryl Bem’s meta-analysis of “precognition” experiments, which is available on such a repository even though it has not been formally published by a peer-reviewed journal. Unlike his earlier findings, which were also widely discussed before they were published officially, this analysis hasn’t been picked up by many news outlets. However, it easily could have been, just as was the case for the earlier examples from physics and microbiology research. Another famous parapsychologist also posted it on his blog as evidence that “the critics need to rethink their position” – a somewhat premature conclusion if you ask me.

In theory, I don’t see a problem with having new, un-reviewed manuscripts publicly available, provided that the surrounding peer discussion is also visible. The problem is that this is not always the case. Certainly the news coverage of the peer discussion usually doesn’t match the media circus surrounding the original finding. Taking Bem’s meta-analysis as an example, it has obviously been under closed peer review for at least the better part of a year now. I would love to see this peer review discussion as it develops but – with one exception (to my knowledge) – so far this review has been for the eyes of the peer reviewers only.

The Myths about Replication

I have talked about replication a lot in my previous posts and why I believe it is central to healthy science. Unfortunately, a lot of myths surround replication and how it should be done. The most common assertion you will hear amongst my colleagues about replication is that “we should be doing more of it”. Now at some level I don’t really disagree with this of course. However, I think these kinds of statements betray a misunderstanding of what replication is and how science actually works in practice.

Replication is at the heart of science

Scientists attempt replication all of the time. Most experiments, certainly all of the good ones, include replication of previous findings as part of their protocol. This is because it is essential to have a control condition or a “sanity check” on which to build future research. In his famous essay/lecture “Cargo Cult Science”, Richard Feynman decried what was apparently a widespread lack of understanding of this issue in the psychological sciences. Specifically, he describes how he advised a psychology student:

“One of the students told me she wanted to do an experiment that went something like this – it had been found by others that under certain circumstances, X, rats did something, A. She was curious as to whether, if she changed the circumstances to Y, they would still do A. So her proposal was to do the experiment under circumstances Y and see if they still did A. I explained to her that it was necessary first to repeat in her laboratory the experiment of the other person – to do it under condition X to see if she could also get result A, and then change to Y and see if A changed. Then she would know the real difference was the thing she thought she had under control. She was very delighted with this new idea, and went to her professor. And his reply was, no, you cannot do that, because the experiment has already been done and you would be wasting time.”

The reaction of this professor is certainly foolish. Every experiment we do should build on previous findings and reconfirm previous hypotheses before attempting to address any new questions. Of course, this was written decades ago. I don’t know if things have changed dramatically since then but I can assure my esteemed colleagues amongst the ranks of the Crusaders for True Science that this sort of replication is common in cognitive neuroscience.

Let me give you some examples. Much of my alter ego’s research employs a neuroimaging technique called retinotopic mapping. In these experiments subjects lie inside an MRI scanner whilst watching flickering images presented at various locations on a screen. By comparing which parts of the brain are active when particular positions in the visual field are stimulated with images, experimenters can construct a map of how a person’s field of view is represented in the brain.
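For the uninitiated, the logic – stripped of all the actual modeling – boils down to something like the following toy sketch with simulated data: stimulate one visual-field position at a time and assign each voxel the position that drives it most strongly. Real retinotopic mapping uses phase-encoded designs or population receptive field models rather than this winner-take-all cartoon, so treat it purely as an illustration of the principle.

```python
import numpy as np

rng = np.random.default_rng(0)
n_positions, n_voxels = 8, 20

# Simulated "ground truth": each voxel prefers one of the stimulated positions.
true_pref = rng.integers(0, n_positions, n_voxels)

# Simulated responses: noise everywhere, plus a strong response whenever a
# voxel's preferred position is the one being stimulated.
responses = rng.normal(0.0, 0.5, (n_positions, n_voxels))
responses[true_pref, np.arange(n_voxels)] += 3.0

# Winner-take-all estimate of each voxel's preferred visual-field position.
estimated_pref = responses.argmax(axis=0)
print("proportion of voxels mapped correctly:",
      np.mean(estimated_pref == true_pref))
```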

We have known for a long time that such retinotopic maps exist in the human brain. It all started with Tatsuji Inouye, a Japanese doctor who studied soldiers with bullet wounds that had destroyed part of their cerebral cortex. He noticed that many patients experienced blindness at specific locations in their visual field. He managed to reconstruct the first retinotopic map by carefully plotting the correspondence between these blind spots and the location of the bullet wounds in certain parts of the brain.

Early functional imaging device for retinotopic mapping

With the advent of neuroimaging, especially functional MRI, came the advance that we no longer need to shoot bullets into people’s heads to do retinotopic mapping. Instead we can generate these maps non-invasively within a few minutes of scan time. Moreover, unlike those earlier neuroanatomical studies, we can now map responses in brain areas where bullet wounds (or other damage) would not cause blindness. Even the earliest fMRI studies discovered additional brain areas that are organized retinotopically. More areas are being discovered all the time. Some we could expect to find based on electrophysiological experiments in monkeys. Others, however, appear to be unique to the human brain. As with all science, the definition of these areas is not always without controversy. However, at this stage only a fool would doubt the existence of retinotopic maps and the fact that they can be revealed with fMRI.

Not only that, but it is clearly a very stable feature of brain organization. Maps are pretty reliable on repeated testing even when using a range of different stimuli for mapping. The location of maps is also quite consistent between individuals so that if two people fixate the center of an analog clock, the number 3 will most likely drive neurons in the depth of the calcarine sulcus in the left cortical hemisphere to fire, while the number 12 will drive neurons in the lingual gyrus atop the lower lip of the calcarine. This generality is so strong that anatomical patterns can be used to predict the locations and borders of retinotopic brain areas with high reliability.

Of course, a majority of people will probably accept that retinotopy is one of the most replicated findings in neuroimaging. Retinotopic mapping analysis is even fairly free of concerns about activation amplitudes and statistical thresholds. Most imaging studies are not so lucky. They aim to compare the neural activity evoked by different experimental conditions (e.g. images, behavioral tasks, mental states), and then localize those brain regions that show robust differences in responses. Thus the raw activation levels in response to various conditions, and the way they are inferred statistically, are central to the interpretation of such imaging experiments.

This means that many studies, in particular in the early days of neuroimaging, focused solely on this kind of question: localizing the brain regions responding to particular conditions. This typically results in brain images with lots of beautiful, colorful blobs superimposed on an anatomical brain scan. For this reason the approach is often referred to by the somewhat derogatory term “blobology” and it is implicitly (or sometimes explicitly) likened to phrenology because “it really doesn’t show anything about how the brain works”. I think this view is wrong, although it is certainly correct that localizing brain regions responding to particular conditions cannot, on its own, explain how the brain works. This is in itself an interesting discussion but it is outside the scope of this post. Critically for the topic of replication, however, we should of course expect these blob localizations to be reproducible when repeating the experiments in the same as well as different subjects if we want to claim that these blobs convey any meaningful information about how the brain is organized. So how do such experiments hold up?

Some early experiments showed that different brain regions in human ventral cortex respond preferentially to images of faces and houses. Thus these brain regions were named – quite descriptively – the fusiform face area and the parahippocampal place area. Similarly, the middle temporal complex responds preferentially to moving relative to static stimuli, the lateral occipital complex responds more to intact, coherent objects or textures than to scrambled, incoherent images, and there have been reports of areas responding preferentially to images of bodies or body parts, and even to letters and words.

The nature of neuroimaging, both in terms of experimental design and analysis but also simply the inter-individual variability in brain morphology, means that there is some degree of variance in the localization of these regions when comparing the results across a group of subjects or many experiments. In spite of this, the existence of these brain regions and their general anatomical location is by now very well established. There have been great debates regarding what implications this pattern of brain responses has and whether there could be alternative, possibly more trivial, factors causing an area to respond. This is however merely the natural scrutiny and discussion that should accompany any scientific claims. In any case, regardless of what the existence of these brain regions may mean, there can be little doubt that these findings are highly replicable.

Experiments that aim to address the actual function of these brain regions are in fact perfect examples of how replication and good science are closely entwined. What such experiments typically do is to first conduct a standard experiment, called a functional localizer, to identify the brain region showing a particular response pattern (say, that it responds preferentially to faces). The subsequent experiments then seek to address a new experimental question, such as whether the face-sensitive response can be explained more trivially by basic attributes of the images. It is a direct example of the type of experiment Feynman suggested in that the experimenter first replicates a previous finding and then tests what factors can influence this finding. These replications are at the very least conceptually based on previously published procedures – although in many cases they are direct replications because functional localizer procedures are often shared between labs.

This is not specific to neuroimaging and cognitive neuroscience. Similar arguments could no doubt be made about other findings in psychology, for example the Stroop effect, and I’m sure it applies to most other research areas. The point is this: none of these findings were replicated because people deliberately set out to test their validity in a “direct replication” effort. There was no reproducibility project for retinotopic mapping, no “Many Labs” experiment to confirm the existence of the fusiform face area. These findings have become replicated simply because researchers included tests of these previous results in their experiments, either as sanity checks or because they were an essential prerequisite for addressing their main research question.

Who should we trust?

In my mind, replication should always occur through this natural process. I am deeply skeptical of concerted efforts to replicate findings simply for the sake of replication. Science should be free of dogma and not motivated by an agenda. All too often the calls for why some results should be replicated seem to stem from a general disbelief in the original finding. Just look at the list of experiments on PsychFileDrawer that people want to see replicated. It is all well and good to be skeptical of previous findings, especially the counter-intuitive, contradictory, underpowered, and/or those that seem “too good to be true”.

But it’s another thing to go from skepticism to actually setting out to disprove somebody else’s experiment. By all means, set out to disprove your own hypotheses. This is something we should all do more of and it guarantees better science. But trying to disprove other people’s findings smells a lot like crusading to me. While I am quite happy to give most of the people on PsychFileDrawer and in the reproducibility movement the benefit of the doubt that they are genuinely interested in the truth, whenever you approach a scientific question with a clear expectation in mind, you are treading on dangerous ground. It may not matter how cautious and meticulous you think you are in running your experiments. In the end you may just accumulate evidence to confirm your preconceived notions, and that does not contribute much to advancing scientific knowledge.

I also have a whole list of research findings of which I remain extremely skeptical. For example, I am wary of many claims about unconscious perceptual processing. I can certainly accept that there are simple perceptual phenomena, such as the tilt illusion, that can occur without conscious awareness of the stimuli because these processes are clearly so automatic that it is impossible not to experience them. In contrast, I find the idea that our brains process complex subliminal information, such as reading a sentence or segmenting a visual scene, pretty hard to swallow. I may of course be wrong but I am not quite ready to accept that conscious thought plays no role whatsoever in our lives, as some researchers seem to imply. In this context, I remain very skeptical of the notion that casually mentioning words that remind people of the elderly (like “Florida”) makes them walk more slowly or that showing a tiny American flag in the corner of the screen influences their voting behavior several months in the future. And like my host Sam (who has written far too much on this topic), I am extremely skeptical of claims of so-called psi effects, that is, precognition, telepathy, or “presentiment”. I feel that such findings are far more likely to have trivial explanations and that the authors of such studies are too happy to accept the improbable.

But is my skepticism a good reason for me to replicate these experiments? I don’t think so. It would be unwise to investigate effects that I trust so little. Regardless of how you control your motivation, you can never be truly sure that it isn’t affecting the quality of the experiment in some way. I don’t know what processes underlie precognition. In fact, since I don’t believe precognition exists, it seems difficult to even speculate about the underlying processes. So I don’t know what factors to watch out for. Things that seem totally irrelevant to me may be extremely important. It is very easy to make subtle effects disappear by inadvertently increasing the noise in our measurements. The attitude of the researcher alone could influence how a research subject performs in an experiment. Researchers working on animals or in wet labs are typically well aware of this. Whether it is a sloppy experimental preparation or an uninterested experimenter training an animal on an experimental task, there are countless reasons why a perfectly valid experiment may fail. And if this can happen for actual effects, imagine how much worse it must be for effects that don’t exist in the first place!

As long as the Devil’s Neuroscientist has her fingers in Sam’s mind, he won’t attempt replicating ganzfeld studies or other “psi” experiments no matter how much he wants to see them replicated. See? I’m the sane one of the two of us!

Even though in theory we should judge original findings and attempts at replication by the same standard, I don’t think this is really what happens in practice – because it can’t happen if replication is conducted in this way. Jason Mitchell seems to allude to this in a much-maligned commentary he published about this topic recently. There is an asymmetry inherent to replication attempts. It’s true, researcher degrees of freedom, questionable research practices, and general publication bias can produce false positives in the literature. But still, short of outright fraud, it is much easier to fail to replicate something than it is to produce a convincing false positive. And yet, publish a failed replication of some “controversial” or counter-intuitive finding, and enjoy the immediate approving nods and often rather open back-slapping within the ranks of the Jihadists of Scientific Veracity.

Instead, what I would like to see more of is scientists building natural replications into their own experiments. Imagine, for instance, that someone published the discovery of yet another brain area responding selectively to a particular visual stimulus – images of apples but not others like tools or houses. The authors call this the occipital apple area. Rather than conducting a basic replication study repeating the methods of the original experiment step by step, you should instead seek to better understand this finding. At this point it of course remains very possible that this finding is completely spurious and that there simply isn’t such a thing as an occipital apple area. But the best way to reveal this is to test alternative explanations. For example, the activation in this brain region could be related to a more basic attribute of the images, such as the fact that the stimuli were round. Alternatively, it could be that this region responds much more generally to images of food. All of these ideas are straightforward hypotheses that make testable predictions. Crucially though, all of these experiments also require replication of the original result in order to confirm that the anatomical location of the region processing stimulus roundness or general foodstuffs actually corresponds to the occipital apple area reported by the original study.

Here are two possible outcomes of this experiment. In the first example, you observe that when using the same methods as the original study you get a robust response to apples in this brain region. However, you also show that a whole range of other non-apple images evoke strong fMRI responses in this brain region, provided the depicted objects are round. Responses show no systematic relationship with whether the images are of food: the region also responds to basketballs, faces, and snow globes but not to bananas or chocolate bars. Thus it appears that you have confirmed the first hypothesis, that the occipital apple area is actually an occipital roundness area. This may still not be the whole story but it is fairly clear evidence that this area doesn’t particularly care about apples.
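Purely for illustration, here is a minimal simulated sketch of that first outcome. The data are generated so that the region really cares about roundness; the original apple contrast nevertheless replicates (that is the built-in replication), and the alternative explanations are then tested across all conditions. Every condition, label, and number here is invented for the example.

```python
import numpy as np

rng = np.random.default_rng(42)

# condition: (is_round, is_food)
conditions = {
    "apples":      (True,  True),
    "basketballs": (True,  False),
    "snow_globes": (True,  False),
    "faces":       (True,  False),
    "bananas":     (False, True),
    "chocolate":   (False, True),
    "tools":       (False, False),
    "houses":      (False, False),
}
n_trials = 30

# Simulated ground truth: the "occipital apple area" actually responds to roundness.
def roi_response(cond):
    is_round, _ = conditions[cond]
    return rng.normal(2.0 if is_round else 0.0, 1.0, n_trials).mean()

means = {c: roi_response(c) for c in conditions}

# Step 1 - built-in replication of the original contrast: apples vs tools/houses.
print(f"apples: {means['apples']:.2f}  vs  tools/houses: "
      f"{np.mean([means['tools'], means['houses']]):.2f}")

# Step 2 - test the alternative explanations across all conditions.
for label, idx in (("round", 0), ("food", 1)):
    yes = np.mean([m for c, m in means.items() if conditions[c][idx]])
    no = np.mean([m for c, m in means.items() if not conditions[c][idx]])
    print(f"{label} vs non-{label}: {yes:.2f} vs {no:.2f}")
```

The simulation itself is beside the point, of course – what matters is that step 1 is a replication you get for free on the way to answering a new question.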

Now compare this to the second situation: here you don’t observe any responses in this brain region to any of the stimuli, including the very same apples from the original experiment. What does this result teach us? Not very much. You’ve failed to replicate the original finding but any failure to replicate could very likely result from any number of factors we don’t yet understand. As Sam (and thus also I) recently learned in a talk in which Ap Dijksterhuis discussed a recent failure to replicate one of his social priming experiments, psychologists apparently call such factors “unknown moderators”.

Ap Dijksterhuis whilst debating with David Shanks at UCL

I find the explanation of failed replications by unknown moderators somewhat dissatisfying but of course such factors must exist. They are the unexplained variance I discussed in my previous posts. But of course there is a simpler explanation: that the purported effect simply doesn’t exist. The notion of unknown moderators is based on the underlying concept driving all science, that is, the idea that the universe we inhabit is governed by certain rules so that conducting an experiment with suitably tight control of the parameters will produce consistent results. So if you talk about “moderators” you should perform experiments testing the existence of such moderating factors. Unless you have evidence for a moderator, any talk about unknown moderators is just empty waffle.

Critically, however, this works both ways. As long as you can’t provide conclusive evidence as to why you failed to replicate, your wonderful replication experiment tells us sadly little about the truth behind the occipital apple area. Comparing the two hypothetical examples, I would trust the findings of the first example far more than the latter. Even if you and others repeat the apple imaging experiment half a dozen times and your statistics are very robust, it remains difficult to rule out that these repeated failures are due to something seemingly-trivial-but-essential that you overlooked. I believe this is what Jason Mitchell was trying to say when he wrote that “unsuccessful experiments have no meaningful scientific value.”

Mind you, I think Mitchell’s commentary is also wrong about a great many things. Like other brave men and women standing up to the Crusaders (I suppose that makes them the “Heathens of Psychology Research”?) he implies that a lot of replications fail because they are conducted by researchers with little or no expertise in the research they seek to replicate. He also suggests that there are many procedural details that aren’t reported in the methods sections of scientific publications, such as the fact that participants in fMRI experiments are instructed not to move. This prompted Sam’s colleague Micah Allen to create the Twitter hashtag #methodswedontreport. There is a lot of truth to the fact that some methodological details are just assumed to be common knowledge. However, the hashtag quickly deteriorated into a comedy vent because – you know – it’s Twitter.

In any case, while it is true that a certain level of competence is necessary to conduct a valid replication, I don’t think Mitchell’s argument holds water here. To categorically accuse replicators of incompetence simply because they have a different area of expertise is a logical fallacy. I’ve heard these arguments so many times. Whether it is about Bem’s precognition effects, Bargh’s elderly priming, or whatever other big finding people failed to replicate, the argument is often made that such effects are subtle and only occur under certain specific circumstances that only the “expert” seems to know about. My alter ego, Sam, faced similar criticisms when he published a commentary about a parapsychology article. In some people’s eyes you can’t even voice a critical opinion about an experiment, let alone try to replicate it, if you haven’t done such experiments before. Doesn’t anybody perceive something of a catch-22 here?

Let’s be honest here. Scientists aren’t wizards and our labs aren’t ivory towers. The reason we publish our scientific findings is (apart from building a reputation and hopefully reaping millions of grant dollars) that we must communicate them to the world. Perhaps in previous centuries some scientists could sit in seclusion and tinker happily without anyone ever finding out the great truths about the universe they discovered. In this day and age this approach won’t get you very far. And the fact that we actually know about the great scientists of Antiquity and the Renaissance shows that even those scientists disseminated their findings to the wider public. And so they should. Science should benefit all of humanity and at least in modern times it is also often paid for by the public. There may be ills with our “publish or perish” culture but it certainly has this going for it: you can’t just do science on your own simply to satisfy your own curiosity.

The best thing about Birmingham is that they locked up all the psychology researchers in an Ivory Tower #FoxNewsFacts

Part of science communication is to publish detailed descriptions of the methods we use. It is true that some details that should be known to anyone doing such experiments may not be reported. However, if your methods section is so sparse in information that it doesn’t permit a proper replication by anyone with a reasonable level of expertise, then it is not good enough! It is true, I’m no particle physicist and I would most likely fail miserably if I tried to replicate some findings from the Large Hadron Collider – but I sure as Hell should be expected to do reasonably well at replicating a finding from my own general field even if I have never done this particular experiment before.

I don’t deny that a lack of expertise with a particular subject may result in some incompetence and some avoidable mistakes – but the burden of proof for this lies not with the replicator but with the person asserting that incompetence is the problem in the first place. By all means, if I am doing something wrong in my replication, tell me what it is and show me that it matters. If you can’t, your unreported methods or unknown moderators are completely worthless.

What can we do?

As I have said many times before, replication is essential. But as I have tried to argue here, I believe we should not replicate for replication’s sake. Replication of previous findings should be part of making new discoveries. And when that fails, the onus should be on you to find out why. Perhaps the previous result was a false positive, but if so you aren’t going to prove it with one or even a whole series of failed replications. You can however support the case by showing the factors that influence the results and testing alternative explanations. One failed replication of social priming effects caused a tremendous amount of discussion – and considerable ridicule for the original author because of the way in which he responded to his critics. For the Crusaders this whole affair seems to be a perfect example of the problems with science. However, ironically, it is actually one of the best examples of what a failed replication should look like. The authors failed to replicate the original finding but they also tested a specific alternative explanation: that the priming effect was caused not by the stimuli to which the subjects were exposed but by the experimenters’ expectations. Now I don’t know if this study is any more true than the original one. I don’t know if these people simply failed to replicate because they were incompetent. They were clearly competent enough to find an effect in another experimental condition, so it’s unlikely to be just that.

Contrast this case with the disagreement between Ap Dijksterhuis and David Shanks. On the one hand, you have nine experiments failing to replicate the original result. On the other hand, during the debate I witnessed through Sam’s eyes, you have Dijksterhuis talking about unknown moderators and wondering whether Shanks’ experiments were done inside cubicles (they were, in case you were wondering – another case of #methodswedontreport). Is this the “expertise” and “competence” we need to replicate social priming experiments? The Devil is not convinced. But either way, I don’t think this discussion really tells us anything.

David Shanks as he regards studies claiming subconscious priming effects (No, he doesn’t actually look like that)

So we should make replication part of all of our research. When you are skeptical of a finding, test alternative hypotheses about it. And more generally, always retain healthy skepticism. By definition, any new finding will have been replicated less often than old findings that have made their way into the established body of knowledge. So when it comes to newly published results we should always reserve judgment and wait for them to be repeated. That doesn’t mean we can’t get excited about surprising new findings – in fact, being excited is a very good reason for people to want to replicate a result.

There are of course safeguards we can take to maximize the probability that new findings are solid. The authors of any study must employ appropriate statistical procedures and interrogate their data from various angles, using different analysis approaches and a range of control experiments, in order to ascertain how robust the results are. While there is a limit on how much of that we can expect any original study to do, there is certainly a minimum of rigorous testing that any study should fulfill. It is the job of peer reviewers, post-publication commenters, and of course also the researchers themselves to think of these tests. The decision of how much evidence suffices for an original finding can be made in correspondence with journal editors. These decisions will sometimes be wrong but that’s life. More importantly, regardless of a study’s truthiness, it should never be regarded as validated until it has enjoyed repeated replication by multiple studies from multiple labs. I know I have said it before but I will keep saying it until the message finally gets through: science is a slow, gradual process. Science can approach truths about the universe but it doesn’t really ever have all the answers. It can come tantalizingly close but there will always be things that elude our understanding. We must have patience.

The discovery of universal truths by scientific research moves at a much slower pace than this creature from the fast lane.

Another thing that could very well improve the state of our field is something that Sam has long argued for and on which I actually agree with him (I told you scientists disagree with one another all the time and this even includes those scientists with multiple personalities). There ought to be a better way to quantify how often and how robustly any given finding has been replicated. I envision a system similar to Google Scholar or PubMed in which we can search for a particular result – say, a brain-behavior correlation, the location of the fusiform face area, the existence of arsenic-based life forms, or a precognition effect. The system then not only finds the original publication but displays a tree structure linking the original finding to all of the replication attempts, whether they were successful, and to what extent they were direct or conceptual replications. A more sophisticated system could allow direct calculation of meta-analytic parameter estimates, for example to narrow down the stereotactic coordinates of the reported brain area or the effect size of a brain-behavior correlation.

Setting up such a system will certainly require a fair amount of meta-information and a database that can capture the complex links between findings. I realize that this represents a fair amount of effort, but once this platform is up and running and has become part of our natural workflow the additional effort will probably be barely noticeable. It may also be possible to automate many aspects of building this database.
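To make the idea a bit more concrete, here is a minimal sketch of the sort of record such a database could store: an original finding with replication attempts hanging off it, plus a crude inverse-variance (fixed-effect) summary of the effect size. The field names and toy numbers are entirely made up, and a real system would obviously need far richer metadata.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Replication:
    lab: str
    kind: str            # "direct" or "conceptual"
    effect_size: float   # e.g. a correlation or Cohen's d
    variance: float      # sampling variance of that estimate
    successful: bool

@dataclass
class Finding:
    description: str
    effect_size: float
    variance: float
    replications: List[Replication] = field(default_factory=list)

    def meta_effect(self) -> float:
        """Inverse-variance weighted (fixed-effect) estimate across all studies."""
        studies = [(self.effect_size, self.variance)]
        studies += [(r.effect_size, r.variance) for r in self.replications]
        weights = [1.0 / v for _, v in studies]
        weighted = sum(w * e for (e, _), w in zip(studies, weights))
        return weighted / sum(weights)

# Toy usage:
ffa = Finding("FFA responds more to faces than houses", effect_size=1.2, variance=0.10)
ffa.replications.append(Replication("Lab A", "direct", 1.0, 0.05, True))
ffa.replications.append(Replication("Lab B", "conceptual", 0.8, 0.08, True))
print(f"{len(ffa.replications)} replication attempts, "
      f"meta-analytic effect around {ffa.meta_effect():.2f}")
```

Hang a tree of such records off every publication and the question of how often a finding has actually been replicated at least becomes answerable.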

Last but not least, we should remember that science should seek to understand how the world works. It should not be about personal vendettas and attachment or opposition to particular theories. I think it would benefit both the defending Heathens and the assaulting Crusaders to consider that more. Science should seek explanations. Trying to replicate a finding simply because it seems suspect or unbelievable to you is not science but more akin to clay pigeon shooting. Instead of furthering our understanding, we just remain at square one. But to the Heathens I say this: if someone says that your theory is wrong or incomplete, or that your result fails to replicate, they aren’t attacking you. We are allowed to be wrong occasionally – as I said before, science is always wrong about something. Science should be about evidence and disagreement, debate, and the continuous overturning of previously held ideas. The single best weapon in the fight against the tedious Crusaders is this:

The fiercest critic of your own research should be you.

Christmas break

It is Christmas now and my alter ego, Sam, will be strong during this time, so I won’t be able to possess his mind. And to be honest, even demonic scientists need vacations. I will probably get in trouble with the boss for taking a Christmas vacation but so be it.

I won’t be able to check this blog (not often anyway) and probably won’t approve new commenters during the break, etc. Discussions may continue in 2015!