The Importance of Data Access for Science and Governance


Mr. Horn and Members of the Committee, thank you for this opportunity to address you. I am here as a scientist and a citizen to testify that regulation and taxes that are promulgated as being based on science should not be shrouded in mystery because the underlying data are not available to the regulated and taxed.

Karl Popper, an English philosopher, inquired as deeply as anyone into questions about what science is and how science works. He concluded that the scientific process, for all its accouterments of math, instrumentation, and specialized knowledge, can be divided into two parts. The first part is the formulation of an idea or a hypothesis or theory (the words are used somewhat interchangeably) about how some part of the physical universe works. The second part is the design and execution of an experiment or a test to examine whether or not the idea or hypothesis or theory is correct. And, of course, if it is correct, the idea or hypothesis becomes incorporated into scientists' knowledge of the universe, and it can be used in the construction of other ideas and hypotheses.

Ideas, hypotheses, and theories are the stuff of all human inquiry, but the requirement of having to devise a test for an idea or hypothesis and to demonstrate that the idea or hypothesis survived the test is the hallmark of science. An essential part of the testing process is review of ideas and hypotheses, tests and experiments and studies by other scientists. It's necessary because all people can make mistakes, and scientists who investigate the unknown are in areas without guideposts or mile markers. There's nothing shameful about a mistake, but it's inefficient and costly when mistakes are incorporated into accepted science. Additional ideas and hypotheses that are based on the mistake are almost certain to be wrong, and the time and effort expended on developing them and testing them are lost. Far better to review, analyze, and attempt to replicate a new finding before accepting it.

Scientists have developed myriad methods for review. Scientists are expected to present talks to their peers in seminars and meetings of all kinds. Most scientists welcome the opportunity to talk about their results and insights; after all, scientists who don't talk can pass into obscurity and their work go unnoticed. Scientists tend to be pretty good listeners. They like to learn about what's new even if it sometimes includes protracted periods of boredom. It's not all sunny and serene, however. I think every practicing scientist can recall when a question from the audience opened a huge hole in the speaker's logic or experimentation.

Beyond oral presentations, scientists, to obtain attention for their results and to be successful, have to publish their findings. Scientific journals have varying standards for review of papers submitted for publication, and scientists know that the journals with the most rigorous review are also the most prestigious.

One of the problems faced by scientists and journals is that the data that go into describing an experiment or a study can be such a bulky package that it won't fit into a paper of any reasonable length. Some journals in economics and political science have responded to that problem by requiring that authors inform the readers about where the complete set of data is available and how to obtain it.

More informally, scientists make personal contact by phone or email to obtain additional data, or they visit each other's laboratories. There are no rules for such requests or visits, but it's generally understood that it's okay to ask for data that are necessary for complete understanding of a published paper and not okay to ask for data that are still being examined before publication.

Good science requires that observations and analyses be repeatable and repeated. Given information about technique and procedure by the scientist who made the observation or analysis, other competent scientists should be able to replicate the observation or analysis. Reproducibility distinguishes science from another human activity called magic. For centuries, magicians claimed "special powers" that couldn't be taught to others who lacked the power. Now, we know that magic is tricks, and that the tricks are necessarily kept secret so that non-magicians can't learn them. Science, on the contrary, works best when it's open to skepticism, review, and attempts at replication.

I am going to focus on scientific data that are used for the development of laws, rules, and regulations, risk assessments and other government guidance documents, and I am going to divide those data into two types. The first type comes from laboratory experiments, and replication of laboratory data can be attempted in other laboratories. Most everyone can remember about a decade ago, when cold fusion burst into the news. The hypotheses underlying cold fusion and the explanations for how it could produce wondrous worlds of energy in an open glass beaker on a laboratory workbench at room temperature were contradicted by much of physical theory, but cold fusion didn't fade away because of theory. It faded away because other scientists tried and failed, and failed repeatedly, to replicate the results.

There is a similar story of laboratory mistake (or worse) that has contributed to what are likely to be billions of dollars spent on largely or completely wasted toxicity tests. In 1996, scientists from Tulane University published a paper in Science magazine, one of the most respected scientific journals in the world with a reputation for rigorous review of papers before publication. The Tulane scientists reported that tiny amounts of pesticides, present at concentrations that are now permitted under stringent Environmental Protection Agency regulations, could interact and unleash a plethora of adverse biological events. Their report, which was leaked to EPA before it was published in Science, was instrumental in the passage of the Food Quality Protection Act of 1996 and especially important in Congress' directing EPA to require new tests of commercial chemicals. The Tulane results attracted major press and TV and political attention, they have had lasting impact, and they are wrong.

Competent scientists in laboratories in universities, the federal government, and industry tried and failed to replicate the Tulane results. Initially, the Tulane scientists stuck to their guns and suggested that special conditions in their laboratory that weren't exactly replicated in the other laboratories explained the discrepancy. These "special conditions" sound a lot like the "special powers" involved in magic that I mentioned earlier, and few scientists accepted them as the explanation. About a year after the publication of their results, the Tulane scientists threw in the towel and published a letter in Science that acknowledged that no one, not even they, had been able to replicate their original findings.

Science worked. Even though the faulty (or fraudulent) science was not caught by the reviewers for Science, the requirement that scientists describe their experiments in enough detail so that others can try to replicate them led to the debunking of the mistake. Even so, American industry remains burdened with expensive and unnecessary testing requirements that will drive up consumer costs and almost certainly reduce consumer choice.

That ends what I have to say about data from laboratories that other scientists can attempt to replicate. I am now going to turn to epidemiologic studies that examine the health of populations of people with particular exposure histories or the histories of people with specific diseases. Such studies cannot be replicated. The data are collected on a unique set of people under unique conditions over a unique time period.

In large part, we are here today because of such a study. A study done by C.A. Pope and others[1] is a primary basis for EPA's stringent air pollution regulations announced in November 1996. At the heart of the Pope study is information about a million volunteers who participated in an American Cancer Society study and supplied information about their habits, workplace and environmental exposures, and health. That data set is unique, and it cannot be replicated.

EPA's air pollution regulations are very expensive (tens of billions of dollars a year), and some scientists question whether they will produce the health benefits claimed by EPA. Congress requested that the health data from the Pope study be made available to independent scientists, which would include industry scientists, for review and analysis. The scientists involved in the Pope study refused to release the data, and initially EPA backed them up. When EPA changed its mind and said the data should be made available for review, it was announced that the data really belonged to the American Cancer Society, and that EPA couldn't release them. Pope and his colleagues eventually agreed to release all their data to a committee of the Health Effects Institute, which is jointly funded by industry and EPA and is supposed to report its analysis of the data in 2000, years after the air regulations went into effect.

The Shelby Amendment that directed the Office of Management and Budget to establish procedures for access to federally generated data was one upshot of the attempt to get those data. In February, OMB published a proposal for the implementation of that amendment. In May, Steve Milloy and I wrote the EPA and requested the data that went into the Pope study because the same study is the basis for the calculation of most of the benefits EPA expects from its proposed Tier 2/Gasoline Sulfur regulation. EPA replied in a letter and supplied us data about air pollution, but stated, "We are not providing the health survey data you seek, because these data are not in the Agency's possession.... Since the records were not produced under an EPA award, the Public Law cited as authority for your request is also not applicable."[2]

As a citizen, I am very disturbed by other information in the EPA letter. "The health study data you seek are contained in a data base that is proprietary with the American Cancer Society (ACS). The EPA has never had access to this database...."[3] Evidently, it's not only critics of EPA's regulations that have not seen the data. Not even EPA has seen them. I question whether billions of dollars in regulatory costs should be heaped on American industry, cities, and consumers on the basis of data that have not been examined by the regulatory agency.

Pope and his colleagues objected to releasing the health data because they said it would compromise the privacy of individuals in the study and make it impossible for Pope and his colleagues to do additional epidemiologic studies. That is an overblown concern.

For five years, I chaired the Department of Health and Human Services committee that advised the United States Air Force's study of the health of the 1200 Air Force personnel who sprayed 90 percent of the Agent Orange used in Vietnam. There are few more newsworthy or politically sensitive epidemiologic studies.

It's an immense study, involving extensive physical and psychological examinations of the 1200 men who sprayed Agent Orange and a comparison group of 1200 men who flew and serviced similar airplanes during the Vietnam War but who did not spray Agent Orange. The study began in 1982 and will end with the examination in 2002. The Air Force has contracted with famous and competent medical institutions such as the Lovelace Clinic in New Mexico and the Scripps Clinic in California for the conduct of the examinations, and the examination records and statistical analyses fill many data tapes and books.

In 1990 or '91, the Air Force scientists told the advisory committee that they had received some requests for data. I remember that there were a few minutes of conversation about whether access to the data should be restricted in any way, but that was replaced with agreement that the data should be made available to anyone who requested them. I also recall comments that taxpayers had paid for the data and were entitled to them, and that independent analyses of the data would either strengthen the conclusions that the Air Force had drawn and that the committee accepted, or show where mistakes had been made.

The Air Force and the advisory committee were very concerned to protect the privacy of the study participants. An office at the National Center for Health Statistics is skilled in "scrubbing" data so that personal identifiers are removed, and such identifiers were removed. Releasing data was and is not a trivial affair, but I think that the Air Force experience demonstrates that confidentiality can be preserved.

My final example of the importance of access to data concerns the herbicide 2,4-D (2,4-dichlorophenoxyacetic acid), the most widely used herbicide in the country. It has been thoroughly tested for toxicity, and EPA has declared that there is no evidence to support even the possibility that it causes cancer.

But 2,4-D has been the target of epidemiologic investigations by the National Cancer Institute (NCI), and those investigations have been marred by mistakes that would never have come to light without persistent requests for data collected by NCI. In 1986, NCI published a study of Kansas farm workers that included a table that indicated that exposure to 2,4-D increased the risk for cancer, and NCI scientists concluded that 2,4-D was a likely cause of cancer. This widely reported conclusion frightened farmers and other users of 2,4-D and raised concerns among consumers who worried about eating food that was contaminated with the herbicide.

Manufacturers of 2,4-D were finally able to obtain a copy of the questionnaire used by NCI in its study. The NCI scientists had never asked a question about 2,4-D use; instead they'd asked questions about uses of all herbicides. The origin of the mistake that transformed "herbicides" into "2,4-D" is not known, but NCI published a correction. In a subsequent study of farm workers in Iowa and Minnesota, NCI completed its study without asking about 2,4-D use. Then it went back and resurveyed study participants and their relatives about 2,4-D use. The resurvey delayed the publication of the study by two years, and when the study appeared, there was no mention of 2,4-D.

Again, industry officials requested and obtained information from NCI, and the resurvey data showed no association between 2,4-D use and increased cancer risk. NCI scientists never released those data. Those data, of course, undermined any connection that could be drawn between 2,4-D and cancer, a connection that the NCI scientists persisted in suggesting.

Each of the NCI studies was released with great fanfare that produced a lot of press coverage about the risks from 2,4-D. The corrections that showed no evidence of risk attracted far less attention.

In 1991, NCI published a study that showed an association between cancer in dogs and the dog owners' use of 2,4-D.[4] Like the NCI studies of farmers, the dog study attracted a lot of attention, and editorials drew attention to the similarities of the cancers reported in the farmers and in the dogs.

Industry officials had some doubts about the methods of analysis used by the authors of the dog study, and they requested the underlying data from NCI. NCI stonewalled release of the data for more than 18 months. Although the dog owners' names had already been removed from the data, NCI said that it was concerned that "industry" would use information about the breeds of the dogs and ZIP locations to track down and harass the dog owners.

Eventually, NCI released the data, and scientists at Michigan State University reanalyzed them. Their reanalysis revealed several flaws in the NCI dog study, and when those flaws were corrected, the association between 2,4-D and cancer in dogs disappeared.[5] The 2,4-D saga shows the importance of citizens having access to data to check on the work of government scientists.

Science depends on skepticism, review, criticism, and replication. Good science and good scientists thrive under those conditions.

The science used to support regulations and taxes must be based on publicly available data for review and analysis. Otherwise, government, simply by calling any collection of data, conclusion, and conjecture "science" and refusing to let others see the data, has a free hand to impose taxes and regulations.


[1] C.A. Pope, M.J. Thun, M.M. Namboodiri, D.W. Dockery, J.S. Evans, F.E. Speizer, and C.W. Heath. 1995. Particulate matter as a predictor of mortality in a prospective study of United States adults. American Journal of Respiratory and Critical Care Medicine 151: 669-674.

[2] L.N. Wegman, Director, Air Quality Strategies and Standards Division, U.S. Environmental Protection Agency. Letter to Steven J. Milloy, June 9, 1999.

[3] Ditto.

[4] H.M. Hayes, R.E. Tarone, K.P. Cantor, et al. 1991. Case-control study of canine malignant lymphoma: Positive association with dog owner's use of 2,4-dichlorophenoxyacetic acid herbicides. Journal of the National Cancer Institute 83: 1226-1231.

[5] J.B. Kaneene and R.A. Miller. 1999. Re-analysis of 2,4-D use and the occurrence of canine malignant lymphoma. Veterinary and Human Toxicology 41: 164-170.

Subcommittee on Government Management, Information, and Technology
Committee on Science
United States House of Representatives