Q: First, let's begin with a basic question: what do we mean when we say DNA fingerprinting?
A: DNA fingerprinting, or DNA profiling as I prefer to call it, characterizes a small portion of our DNA. It is a way of identifying the DNA content of an individual. We think of fingerprints as being unique to the individual. We know that even genetically identical twins have different fingerprints. In contrast, DNA typing, because it uses a very small fraction of the DNA, is certainly not unique.
Q: Let's imagine a hypothetical crime scene. When the technical people arrive at the crime scene, what do they do, and what are they looking for?
A: The people at the crime scene look for biological tissues, such as blood stains on the ground and semen stains on clothing. They may take swabs from the victim's body if rape is involved. It is also possible to extract DNA from exotic things like the saliva on the back of a postage stamp. If it is a very old crime, they may obtain DNA samples from a skeleton.
Q: Wouldn't a drop of blood or other specimen at a crime scene be commingled with all kinds of other DNA from bacteria, flora, etc.? How is this extraneous DNA excluded?
A: The methods used in forensic DNA are not so much concerned with excluding extraneous DNA as with the identification of human DNA. The probes used are very specific for human DNA.
Q: Tell us a bit about the techniques used in forensic DNA testing. Are the laboratories using PCR to look at allele combinations or RFLP (restriction fragment length polymorphisms) or both? What are the advantages of the different techniques?
A: The laboratories use both techniques. The RFLP tests were developed first and are better in the sense that they are much more variable. There are many more variants in the population. A typical RFLP will have 30 distinguishable types. The disadvantage of the RFLP method is that it requires a relatively large amount of DNA along with radioactive labeling visualized on a gel. To expose an X-ray film with a radioactive-labeled probe takes about a week. You have to do that for each system, so the entire process can take many weeks. PCR techniques offer the advantage of requiring only trace amounts of DNA, and they can be done overnight. Unfortunately, PCR systems currently in use are not very variable, allowing typically only three or four variants.
Q: When laboratories use RFLP they are only looking for five or six polymorphisms. Why not look for ten or 100 or 1,000?
A: The state of North Carolina uses eight polymorphisms. I believe by the time you have eight systems even with as few as 20 alleles, you are not going to gain anything by looking for more.
Q: A number of statistical techniques are now used to confirm the reliability of these methods. Can you explain how the most fundamental technique, the allele multiplication rule, is used to rate the odds of a match between specimens?
A: If we want to come up with a figure for the frequency of the pattern, we rely on a statistical model. A DNA profile contains information from, let's say, five RFLPs or five PCRs. This provides at least ten pieces of information. The entire profile has almost never been seen. Out of the 30,000 people who have been typed and their profiles put into a database, no two people have had the same profile with five loci. While the entire profile has not been seen, each component has been seen quite often. Let's say each variant at each locus occurs about ten percent of the time. So say we have a ten-band profile based on five loci, with each band occurring ten percent of the time. We then multiply the ten numbers together to gain a probability for that combination.
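The multiplication described in the answer above can be sketched in a few lines of Python. This is only an illustration of the simplified arithmetic in the example (ten bands, each seen about ten percent of the time); the function name and the list of frequencies are assumptions made for the sketch, not part of any laboratory's software.

```python
from math import prod


def profile_frequency(band_frequencies):
    """Multiply the per-band frequencies together (the simple product rule)."""
    return prod(band_frequencies)


# Ten bands (five loci, two bands each), each assumed to occur 10% of the time.
bands = [0.10] * 10
freq = profile_frequency(bands)

print(f"Estimated profile frequency: {freq:.1e}")
print(f"Roughly 1 in {1 / freq:,.0f}")
```

With these numbers the estimate is 0.1 multiplied by itself ten times, or about one in ten billion. Standard population-genetic calculations also include a factor of two for each locus that shows two different bands; that refinement is left out here to match the simplified description.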
Q: It has been suggested that some factors could affect the accuracy of that kind of probability calculation. For example, some alleles are probably more likely to be seen in a given racial or ethnic group. This led to a concept called the ceiling principle. Can you explain this concept?
A: The ceiling principle was a method designed to be conservative in estimating probabilities, providing a frequency which would not overstate the strength of the evidence. The DNA databases used are designed to be representative of the entire population. If a crime was committed in a particular area, and the suspect in a case belonged to a specific population group, the frequency for a certain allele might be higher than for the population as a whole. So it would be prejudicial to the defendant to quote the population-wide frequencies instead of the specific ones.
The ideal would be to have databases tailored to every crime. That is not practical, mainly because the population groups are not well defined. So we are pretty much obliged to use the wider population samples. There are ways to characterize the variation in allele frequency across population subgroups using a statistical method developed by population geneticists in the early 1950s. We really would like to have samples from subgroups of the population to be able to estimate how much they differ one from another. Because these subgroups are ill-defined we can't sample them. So we fall back on what is available, geographic sampling. The FBI has samples from different states and has also compiled a worldwide survey derived from databases compiled by forensic scientists around the world. So we can compare the frequency of alleles in different recognizable geographic groups.
We have found that any particular allele can have a frequency that differs significantly from one population to another. It is the frequency that differs, not whether that allele occurs in a given population: the variants occur within all populations. This leads us to believe that those variations occurred before the divergence of the various human population groups. So each individual allele frequency varies depending on the population, but when we take a collection of say ten alleles, the ups and downs tend to cancel out. We find that there is not really a great deal of difference in the profiles we have seen from one group to another. When we modify the product rule appropriately by measures of population substructure, then the ceiling principle is inappropriate. I think the ceiling principle is poor science and I don't think it will be used in the future.
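The comparison in the answer above can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the three population groups and their band frequencies are invented numbers, not real data, and the ceiling calculation follows one common formulation (for each band, take the largest frequency seen in any sampled group, or a minimum floor if that is larger). The 5% floor is a hypothetical parameter.

```python
from math import prod

# Hypothetical band frequencies for one ten-band profile in three
# hypothetical population groups (illustrative numbers, not real data).
group_freqs = {
    "group_a": [0.12, 0.08, 0.10, 0.15, 0.06, 0.09, 0.11, 0.10, 0.07, 0.13],
    "group_b": [0.08, 0.12, 0.11, 0.07, 0.10, 0.13, 0.09, 0.06, 0.12, 0.10],
    "group_c": [0.10, 0.10, 0.07, 0.11, 0.12, 0.08, 0.13, 0.12, 0.09, 0.06],
}


def product_rule(freqs):
    """Multiply the band frequencies within a single group."""
    return prod(freqs)


def ceiling_estimate(groups, floor=0.05):
    """For each band, use the largest frequency in any group (or the
    assumed floor, if larger), then multiply across bands."""
    per_band = zip(*groups.values())
    return prod(max(max(band), floor) for band in per_band)


per_group = {name: product_rule(f) for name, f in group_freqs.items()}
ceiling = ceiling_estimate(group_freqs)

for name, value in per_group.items():
    print(f"{name}: {value:.2e}")
print(f"ceiling: {ceiling:.2e}")
```

In this invented example individual bands differ by up to a factor of two between groups, yet the three per-group profile frequencies land within the same order of magnitude, while the ceiling-style figure is roughly an order of magnitude larger than any of them. The numbers were chosen to show the effect and do not demonstrate it for real populations.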
Q: Doesn't it seem that human error involved in the laboratory work would be the weak link in the chain of DNA forensics?
A: There has been a lot of discussion about the potential for human error. I would think the weak link would be right at the beginning: for example, does the tube labeled 'crime scene blood stain' reflect the true source of that material? The forensic laboratories have a lot of safeguards built in, such as dual observation of each step, and signing for custody of the evidence. Forensic laboratories have a lot of experience in taking care of evidence. But I take your point: if there is going to be an error, it would be of the gross human kind, rather than in technique.
Q: Modern science in general relies on publication of data in peer-reviewed journals, sharing raw data with other researchers to confirm conclusions, etc. One criticism of forensic DNA profiling, as opposed to genetic susceptibility testing, is that the methodologies have not passed through the usual channels of peer review and comment.
A: In the early days that was true. Some people were very jealous about these data. However, the data are now routinely made available to researchers. There is now an extensive bibliography of studies available. The FBI, Lifecodes, Cellmark, and other laboratories have all published peer-reviewed scientific papers explaining their protocols and methods of analysis. When they do publish, they are then obligated to make the raw data available.
Q: Several laboratories have established proprietary techniques in DNA profiling. Is one RFLP the same as the next? What are the differences between the companies' methods?
A: The RFLPs do differ from one laboratory to the next, but the PCRs do not. Because the RFLPs are so variable, it is not trivial to distinguish one variant from another. As a result, ad hoc methods are used to accomplish this. We are talking about variations in length of the regions. The regions examined may vary in length from 500 base pairs up to 20,000 base pairs. The variants differ by multiples of a repeat unit of about ten base pairs. So if we are going to go from 500 to 20,000 in steps of ten, we are talking about thousands of types, far too many to distinguish on current gels. So binning strategies have been developed to amalgamate alleles that are close together. Different laboratories use different binning strategies.
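The counting argument and the idea of binning in the answer above can be sketched as follows. The sketch assumes the repeat unit of about ten is measured in base pairs, and it uses one simple fixed-bin scheme. The bin boundaries are hypothetical and do not correspond to any laboratory's actual bins.

```python
from bisect import bisect_right

MIN_BP, MAX_BP, REPEAT_BP = 500, 20_000, 10

# Number of distinct fragment lengths if alleles step by one repeat unit.
n_possible = (MAX_BP - MIN_BP) // REPEAT_BP + 1

# Hypothetical fixed bin boundaries in base pairs (illustrative only).
BIN_EDGES = [500, 1000, 1500, 2000, 3000, 4000, 6000, 9000, 13000, 20001]


def fixed_bin(length_bp):
    """Return the index of the bin that contains a measured fragment length."""
    if not BIN_EDGES[0] <= length_bp < BIN_EDGES[-1]:
        raise ValueError("length outside the binned range")
    return bisect_right(BIN_EDGES, length_bp) - 1


print(n_possible)  # possible lengths between 500 and 20,000 in steps of ten
print(fixed_bin(1490), fixed_bin(1510), fixed_bin(1540))
```

Under these assumptions there are 1,951 possible lengths, which the nine hypothetical bins collapse into nine reportable types. Fragments of 1,510 and 1,540 base pairs fall into the same bin, while 1,490 and 1,510 are separated by a boundary, which is one reason the choice of binning strategy matters and why strategies differ between laboratories.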
Q: So it sounds like the ultimate in sensitivity and specificity in DNA profiling would be a combination of RFLP plus a PCR?
A: Actually the ideal would be to use sequence data, which is being used in some contexts. The Armed Forces Institute of Pathology, for example, uses PCR mitochondrial sequence data to identify war remains. This method can make very specific identification using only a few hundred base pairs.
Q: It seems straightforward enough from the scientific perspective what these tests are measuring and that they offer high levels of reliability and accuracy. So why does there appear to be so much debate on the validity of DNA forensics and the statistical accuracy of DNA profiling?
A: There is a perception among the public that there is some debate within the scientific community. I believe that there is no such debate. When we look at the scientific literature, which is where science is discussed, the published, peer-reviewed papers are overwhelmingly in favor of this technology and the protocols and analytic methods used. Much of the current debate has been outside of the scientific literature. It typically comes from court cases, where someone is on trial for a crime. The defendant and prosecutors each have expert witnesses, so it looks like half of the scientists are on one side and half on the other. This is quite misleading and it is not the way science operates. There are very few people who have thought about and examined the issues carefully who remain critical of DNA profiling.