A Layperson’s Guide to the Scientific Literature - Part 1
May 19th, 2008
I’m back! I know that some of you will be disappointed by this, but I’ll try not to let your disappointment ruin my day. For the rest of you - it’s good to be back! I’ve been up to my eyeballs in writing for work and had no time to write for fun. Today, I’m starting a two-part series on how people who didn’t grow up to be “science geeks” can get in on the fun and play at home!
Note: A number of comments went into the electronic void as the result of me moderating comments when I should have been in bed. If your comment didn’t get posted, please send it through again. I promise that I won’t moderate when sleep-deprived again.
At least once a week, someone will ask me how a person lacking advanced education or training in the sciences can critically evaluate scientific literature. This comes up especially often from parents of autistic children who are trying to sort out the “wheat” from the masses of Internet chaff.
Well, the short answer is “Abandon hope, all ye who enter here.” Frankly, critically evaluating the merits of a scientific publication requires not only an in-depth understanding of the specific field; it requires at least a passing familiarity with the journals, their editorial philosophy and the authors who publish in them.
For example, I can fairly easily evaluate the merits of papers published in the biomedical field, having spent all of my adult life either studying or working in the field. Even so, it can be hard for me to judge a study done in a part of biology or medicine that I am unfamiliar with – such as botany. I haven’t had much education in botany – apart from my undergraduate days – and I haven’t spent much time reading the botany literature, so I don’t know which journals routinely publish “iffy” papers and which authors routinely write them.
However – and there had to be a “however”, or this post would be utterly pointless – there are some things that the “layperson” can do that will help them tell if a scientific study – and its corresponding paper – have merit.
First step: Know Thyself
The first step to solving any problem, I am told, is to recognize that there is a problem. And the first step for any layperson who wants to evaluate the scientific literature is to recognize that they are a layperson and what that entails.
Everybody has their own unique mix of competencies and weaknesses. If you do not have an advanced degree (Google PhD’s don’t count) or extensive experience in a scientific field, science is probably not one of your special competencies. That’s not a bad thing, as long as you recognize it and act accordingly.
Problems with weaknesses usually happen when people fail to recognize them. The inherent weakness of a footbridge is not a problem unless you fail to recognize it and try to drive a truck across it. Likewise, a weakness in scientific knowledge is only a problem if you aren’t aware of it. I’ve seen people with tremendous competency in other, non-scientific, areas humiliate themselves (often without even being aware of that) by imagining that their skills in business (for example) somehow conferred on them the ability to understand science and medicine.
There is a component of relativity in this sort of hubris. These folks – MBA’s, lawyers, business people, actors, “Google PhD’s”, etc. – are probably more familiar with the science they think they know than is the “average” person. At the very least, they are more familiar with the words, the literature and a few of the “players” than the people they usually talk to.
This “greater knowledge” - relative to the people they associate with – may have led them to believe that they have a comprehensive grasp of the topic. This, in turn, has led some of them to think that anything they can’t understand is either baloney or unimportant. And this, in turn, has led them to thinking that they know more than the people who actually do know the subject.
I have to admit that I find it rather difficult to put myself in their position – that of imagining that I am an expert in a field where I have little or no education or training. And I suspect that it is a phenomenon limited to the sciences, as I don’t hear of too many people believing they have “special competency” in finance, business, plumbing or car repair. Perhaps that is because people who try to “play the expert” in those fields run up against the hard edges of reality more often – in the form of losses in the stock market, bankruptcy, leaky pipes and cars that won’t run.
On the other hand, only the most egregious examples of “playing the expert” in science will run afoul of reality. Apart from trying to ignore the law of Gravity – and certain critical and inescapable facts of human physiology – most science “wannabes” remain blissfully unaware of their folly.
Another part of the problem is the general public’s conception of what “science” is. Too many people think that “science” consists of memorizing a long list of facts – the names of body parts, chemical elements, equations, etc. – and then applying that knowledge. This, unfortunately, is how “science” is taught in most high schools and – sad to say – in many undergraduate level college courses.
Real science is more like an “Indiana Jones” movie – it is the search for new knowledge, answering questions and solving problems. The “knowledge base” is important because it gives you the basic information you’ll need to formulate and answer the questions. It’s a tool to help in the search, not the goal. Real science is the process of finding pieces of new information that help us to understand the Universe a little better.
Bottom line: if you’re a novice at science – meaning that you don’t get paid to do it – acknowledge that fact and don’t pretend that you’re not.
Second step: Read the Article
You’d be amazed at how many people think that they know what a scientific paper is about without having bothered to actually read it. These folks have read (or heard) a summary of the paper (often by someone who knows as little about science as themselves) in the newspaper, a magazine or on the Internet. As anyone who has witnessed a newsworthy event and then seen it reported in the news can attest, it often loses something in the translation.
News articles and Internet summaries are a good starting point - they can help you wade through the masses of scientific articles that are published each day to find the ones that you’re interested in. But – and I can’t emphasize this enough – they are not a substitute for actually reading the original article.
Yes, sometimes it can be a real pain to get the article – especially if it’s been published in an obscure journal – but you cannot evaluate a study when you haven’t read the article. Check with your local library about inter-library loan. If you live near a university, especially one with a medical school, their library may have the journal. Even if you’re not a student or faculty member, almost all university libraries will let you look through their paper journals and – for a small fee – photocopy the article (or articles) you are interested in.
Anatomy of a Scientific Paper
Science writing has its own conventions and its own style. Although the individual journals may dictate certain deviations from the “standard” style, the “parts” described below will appear in all scientific articles. If one or more is missing, you may not be reading a scientific paper.
Abstract:
This is a very brief summary of the article written by the author(s). As such, it is very much like a first date – only the good things about the study are mentioned. My advice to you is to not read the abstract. Its function – in an ideal world – is to give people who are searching for articles a way of quickly determining if a specific article meets their needs. It does not necessarily give a complete or even accurate description of the study or its results.
Introduction:
The purpose of the introduction is to “set the stage” for the study. It should explain why the study was done and why the study is important. Most introductions give a brief (very brief!) capsule summary of what was known about the subject of the study and how the study was intended to increase this knowledge.
One good sign of a bad study is a very long introduction (relative to the total length). You can, in some ways, think of the introduction as an excuse. In most papers, it is a very brief “excuse” for spending time and money on the study. Like most excuses, the longer it gets, the more it is probably trying to “explain away”. Some of the worst papers I have read were all introduction – they were simply a means of trying to explain why bad data (or no data at all) should cause people to think a certain way.
On occasion you will find a long introduction in an article that covers two or more scientific disciplines. These interdisciplinary papers require more explanation and background so that people who aren’t familiar with both (or all) fields can understand why the study was done and what it was intended to do. With the rise of interdisciplinary journals, these sorts of “introduction-heavy” papers are disappearing.
When you read the introduction, make a note (write it down!) of what the authors thought they were trying to accomplish. You will later compare this to what they actually accomplished – often a very different thing.
Methods:
In most journals, the “methods” section comes after the introduction, but some journals are different and put it at the end. Either place, the “methods” section explains – often in very terse and almost telegraphic language - how the study was done. This is where a lot of laypeople come to grief when reading scientific papers. If you don’t understand what the methods are – especially the statistics – you may completely miss major flaws (even “fatal” flaws) in a study.
Every study has flaws and limitations. Not every variable can be controlled and not every study can afford to be as thorough or complete as you might like. In some fields – especially those involving human subjects – the ethical and financial restraints make it extremely difficult to do good work.
Here are some things to look for in the “methods” section:
[1] Is there a control group?
In biology and especially in medicine, any treatment, intervention, exposure or risk factor has to be compared against a group – the control group – that is as similar to the test group as possible in order to counter random variations due to chance.
The control group should – ideally – be exactly like the treatment group except in the variable(s) being studied. Thus, a study looking at whether turnips cause terminal moraine needs to have a control group that is just like the group eating turnips, except that they don’t eat turnips.
More broadly, controls are those parts of the method that act to check for errors and “contamination” (either physical or mental). For example, when I do PCR (polymerase chain reaction), I include a “positive control” (a sample that I already know should give me a “positive” result) and a “negative control” (a sample that I already know should give a “negative” result).
If the “positive control” comes out negative, I know that there was a problem in the procedure and that any negative results should be ignored. Likewise, if the “negative control” comes out positive, I know that there is contamination and that any positive results should be ignored.
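For those who think better in code, the two rules above can be sketched in a few lines of Python. This is only an illustration of the logic; the function name `interpret_controls` and its arguments are made up for this example and don't come from any lab software.

```python
def interpret_controls(positive_control_came_out_positive,
                       negative_control_came_out_positive):
    """Decide which sample results can be trusted, given how the controls behaved.

    Hypothetical helper: both arguments are True/False readings of the
    two control samples in a single run.
    """
    return {
        # A failed positive control means the procedure may have missed
        # real positives, so negative results can't be trusted.
        "trust_negative_results": positive_control_came_out_positive,
        # A positive "negative control" points to contamination,
        # so positive results can't be trusted.
        "trust_positive_results": not negative_control_came_out_positive,
    }

clean_run = interpret_controls(True, False)
contaminated_run = interpret_controls(True, True)
failed_run = interpret_controls(False, False)
print(clean_run, contaminated_run, failed_run)
```

Only the run where both controls behave as expected leaves all of the results standing.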
[2] Is there a placebo control?
For treatment or exposure studies, where something is done to the test group, the control group should receive a “placebo” in order to counter the tendency of the subjects and/or the observers to attribute changes to the treatment or exposure.
A placebo is a treatment or exposure that is indistinguishable (to the subjects and the observers) from the “real” treatment or exposure. If neither the subjects nor the observers know who is getting the “real thing”, there is less chance that a change will be erroneously attributed to the treatment or exposure.
Some studies can’t do a placebo control for either ethical reasons (it’s not ethical to do “sham” surgery on humans) or practical reasons. The reasons for not using a placebo control should be explained in the article and the limitations this caused should be discussed. If they are not, ask yourself, “Why not?”
[3] Group sizes should be large enough and roughly equal.
Biology is a field plagued by complexity and variation. For that reason, seeing something once – or even a few times – is not considered adequate. “Einmal ist keinmal!”, my PhD advisor was fond of saying – “Once is never!” Even a “statistically significant” finding can be a “statistical fluke”, since even the standard level for statistical significance allows a 5% chance that the results could be due to random chance.
One way to even out the “noise” in biological systems is to do the same test many different times (preferably on different subjects). The more subjects (human, animal, plant, etc.), the more the “noise” gets “averaged out” and the greater the ability to spot real differences between groups. Be especially wary of studies that use small numbers of subjects – it is very easy to get large apparent differences that are due entirely to chance.
One quick check is to calculate how results “translate” into numbers of subjects. For example, if a study had 13 rats treated with unobtainium and found that 15% of them developed cancer and 8% of the 13 control rats developed cancer, that might seem impressive. However, when you “run the numbers”, you find that they translate into 1 rat in the control group and 2 rats in the treatment group. If one subject can have such a large impact on the results, the vagaries of chance can easily create the false appearance of a difference.
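If you'd rather not do the arithmetic in your head, here is a minimal Python sketch of the "run the numbers" check, using the unobtainium example. The helper name `subjects_from_percent` is invented for illustration; all it does is turn a reported percentage back into a head count.

```python
def subjects_from_percent(percent, group_size):
    """Convert a reported percentage back into a whole number of subjects."""
    return round(percent / 100 * group_size)

group_size = 13  # rats per group, from the example above

treated_with_cancer = subjects_from_percent(15, group_size)
control_with_cancer = subjects_from_percent(8, group_size)
print("treated:", treated_with_cancer, "control:", control_with_cancer)

# How many percentage points does a single rat account for?
points_per_rat = 100 / group_size
print("one rat is worth about", round(points_per_rat, 1), "percentage points")
```

With 13 rats per group, a single animal is worth nearly eight percentage points, which is the entire reported difference between the groups.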
[4] Are the statistics well-described and appropriate?
Statistics, as Disraeli observed, are very slippery things to the average person. Even mathematicians who specialize in statistics can disagree on which test or analysis is best or what their results mean. If you’re not familiar with statistics beyond “average” – if you can’t tell at sight the difference between mean, median and mode and don’t know what parametric and non-parametric statistics are – then you are at the mercy of the authors.
That said, some authors make the most amazing errors in their statistics – errors that anyone can find, if they know what to look for. Here are two of the most common:
[a] Are discrete variables treated as continuous (or vice-versa)?
OK, many of you might have no idea what that means – but it’s a relatively easy concept. Discrete variables are things that can only be measured in whole units – boxes, people, fish, etc. There is no such thing as 0.3 of a person, even though one study found that the “average” family had 3.3 people.
Continuous variables, on the other hand, can take any value along a scale. Things like height, weight, volume, voltage, sodium concentration, etc. One way to determine if a variable is continuous is to ask if it would make sense to have a value between any two values. For example – if one person is 58 inches tall and another is 59 inches tall, would it make sense if someone else were 58.5 (or 58.348573) inches tall? Of course!
On the other hand, if you’ve counted 58 bunnies and someone else counted 59 bunnies, it’s unlikely that anyone would ever count 58.5 bunnies (not live ones, anyway).
Warning: sometimes, things that seem like continuous variables can actually be discrete. A good example of this is my digital bathroom scale. Besides telling the most scurrilous lies, it can only register weight in multiples of one half pound. It shifts instantly from 134.0 to 134.5 pounds, with no indication that it even noticed all of the intervening weights.
Anyway, “average” and mean values are meaningless with discrete variables (for the “0.3 person” reason mentioned above) and serve only to point out that the authors don’t know what they’re doing with statistics. Median and mode values are acceptable for both discrete and continuous variables, although the mode is often rather pointless with continuous variables.
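Here is a small Python sketch of the three measures, using the standard `statistics` module. The family sizes are invented for illustration (they are not from any study); I picked them so that the mean comes out to the "3.3 people" mentioned above.

```python
import statistics

# Hypothetical family sizes (people per household), made up for this example.
family_sizes = [2, 3, 3, 4, 3, 5, 3, 2, 4, 4]

mean_size = statistics.mean(family_sizes)      # sum divided by count
median_size = statistics.median(family_sizes)  # middle value when sorted
mode_size = statistics.mode(family_sizes)      # most common value

print("mean:", mean_size)
print("median:", median_size)
print("mode:", mode_size)
```

The mean lands on a family size that no actual family can have, while the median and mode both point to a size that real families in the list do have.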
[b] Are they correcting for multiple comparisons?
In these data-mining days, it is very tempting for authors to amass vast mounds of data – or buy it from someone – and then go “digging” until they find a correlation that seems interesting. This has led to some very dramatic – although later very laughable – “correlations”.
The problem lies with the tests used to determine “statistical significance”. Without getting into the mathematical theory too deeply, they are simply probabilities. They give the probability that the difference between two (or more) groups could be due simply to chance – the probability that, if you selected different subjects from the same groups (for example autistic vs non-autistic, treated vs untreated), the difference would disappear.
The usual “cut-off” value for accepting that the difference is “probably real” is 5%. This means that we’re willing to accept a 5% chance that the difference is due to random events and not “real”.
What this means is that for every comparison made between two groups, there is a 5% chance that a “difference” will be found, even if the two groups are identical. If you make multiple comparisons between the same two groups, these 5% probabilities add up (actually, they multiply, but let’s not get all fussed about semantics). Here’s how it works:
2 comparisons – 9.8% chance of finding a “difference” where none exists
3 comparisons – 14.3% chance…
4 comparisons – 18.5% chance…
5 comparisons – 22.6% chance…
10 comparisons – 40.1% chance…
14 comparisons – 51.2% chance…
20 comparisons – 64.2% chance…
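The figures in this list can be checked with a few lines of Python. This is a minimal sketch that assumes the comparisons are independent of one another and that each uses the 5% cut-off; the function name `chance_of_false_difference` is made up for this example.

```python
def chance_of_false_difference(comparisons, alpha=0.05):
    """Chance of at least one spurious "difference" across several comparisons.

    Assumes every comparison is independent and uses the same cut-off (alpha).
    Each comparison has a (1 - alpha) chance of NOT producing a false
    difference; those chances multiply, and the rest is the risk.
    """
    return 1 - (1 - alpha) ** comparisons

for n in (2, 3, 4, 5, 10, 14, 20):
    print(f"{n:2d} comparisons: {chance_of_false_difference(n):.1%}")
```

Note that somewhere around 14 comparisons, finding at least one false "difference" becomes more likely than not.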
Fortunately, there is a way to correct for this problem. Either the Sidak or Bonferroni correction will adjust the probability cut-off limits for the individual tests so that the risk of finding a false association is below a certain limit, usually picked as 5%.
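As a rough sketch of what these two corrections do, the Python below computes the stricter per-test cut-off each one would demand in order to keep the overall risk at 5%. The function names are my own, and the Sidak version assumes the comparisons are independent.

```python
def bonferroni_cutoff(comparisons, overall_alpha=0.05):
    """Bonferroni: divide the overall cut-off evenly among the comparisons."""
    return overall_alpha / comparisons

def sidak_cutoff(comparisons, overall_alpha=0.05):
    """Sidak: the per-test cut-off that gives exactly the overall risk,
    assuming the comparisons are independent."""
    return 1 - (1 - overall_alpha) ** (1 / comparisons)

for n in (2, 5, 10, 20):
    print(f"{n:2d} comparisons: "
          f"Bonferroni {bonferroni_cutoff(n):.5f}, Sidak {sidak_cutoff(n):.5f}")
```

With 10 comparisons, for instance, each individual test would need to clear a cut-off of roughly 0.5% instead of 5%. The two methods give very similar numbers, with Bonferroni being slightly stricter.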
If multiple comparisons are made and there is no mention of a correction, assume that none was made and mentally adjust the “significance” of the findings downward.
Another problem is when multiple comparisons are made but only a few (or only one) are reported. Unless it is mentioned in the “methods” section, there is no way to pick up this “error” (it often seems to be done deliberately). Be suspicious when a large database is searched and one or a few “results” are reported.
Well, this looks like a good place to stop for now.
Next time: Results, Conclusions, References, Biases and the Profit Motive
Prometheus
