Scientists often write custom, domain-specific computer programs as part of their research. But the source code to those programs – when it is released at all – is commonly released under restrictive licenses that prevent redistribution or modification. Recent editorials in the two most prominent scientific journals arguing for greater sharing of scientific source code have drawn some long-overdue attention to this issue (Reviews of the editorials are freely available online, but both original articles – even while arguing for more openness in science – sit behind massive paywalls, an irony that has been picked up by other observers as well. Open access is an important issue, but it is a topic for another post.) The common practice of publishing scientific conclusions based on code that is not itself made public has long troubled me. One incident in particular brought the issue into sharp focus in my mind.
A defining moment
In the summer of 2003 I spent two months in Puerto Rico to be with my wife, who was doing her dissertation research on shrimp that live in the streams there. I was a graduate student too at the time, trying to finish my own degree in chemistry. I knew that during the trip I would have a lot of “down time” holed up in our apartment, which I planned to spend analysing data I had accumulated over several years. Before heading for the tropics, I made sure I had all of my data with me, and the tools I needed to work with it.
One of those tools was a program for translating the raw Nuclear Magnetic Resonance (NMR) data generated by instruments into a more useful representation. This program, which had been written by scientists working at the National Institutes of Health, was available gratis to researchers and widely used in my field. I relied on this program, so before leaving for the tropics I made sure that my copy was good. So, it came as a great surprise several days into my visit when the program abruptly stopped working: it contained a logic bomb. Past a certain date, it would refuse to start normally, instead attempting to send you an email telling you to download the next release.
The program’s authors would claim that the logic bomb was innocuous, that it’s only purpose was to reduce the burden of supporting multiple versions by keeping all users in sync. They promised to always provide the next release for free, and they assumed that most users would be in a university setting with a high-speed internet connection, so that the forced upgrade would be a simple matter completed in a couple of minutes. But in my case their assumptions turned out to be wrong. The nearest internet connection was a half day’s drive away, slow, and unreliable. I could not reasonably ask my wife to interrupt her field work schedule so that I could attempt an upgrade, which might not even succeed. Besides, I had planned in advance to have everything I needed with me, and was annoyed at having my plans derailed by this logic bomb. And despite having programming skills and a decent development environment available to me, I could not solve my problem simply by editing the program’s source, because I did not have it – the authors declined to distribute that. That wasn’t some kind of oversight on their part: in a startup message displayed by the program, the authors made clear their intention to prevent any redistribution or modification by users, even to solve their own problems:
“This program and its related software is based on a design originating in the NIH Laboratory of Chemical Physics, NIDDK. It is not to be distributed or modified.”
This was a radicalizing, formative moment for me, my equivalent of the encounter between the young Richard Stallman and a malfunctioning laser printer. I was already a user of Linux and GNU and I had started to learn about the Free Software movement, but it took this incident to show me clearly how attempting to control the behavior of users – even in benign, relatively mild ways like forced upgrades – can have unintended consequences. And I began to question whether the motivation behind the restrictions on this code really were so benign. After all, the program’s authors were also my potential competitors. They said they would always provide future releases freely, but nothing required that they keep this promise. What if they one day decided to stop providing their upgrades? What if they only provided it to certain people of their choosing? The user community that had built up around this program was dependent on its original authors in a way that is unusual in science. By banning me or anyone else from building on their work, and publishing the results, they were ensuring that any advances would come from them. It’s normal in science to care deeply about getting proper credit for one’s work and to reserve the right to choose the time and place of publication. But it is not normal, once you’ve published a method, to try to prevent others from reading about it, modifying it, and publishing their modifications. Not normal, that is, unless the method happens to be in the form of a computer program, which is just a method of converting input to output.
Broader questions aside, I still was faced with the immediate problem of not being able to do my work. Not ready to accept the ethical or legal validity of the restrictions that had been placed on me, I set out to solve my own problem however I could. And as it so often turns out, these programmers’ desire to control their users’ behavior through technical means exceeded their technical ability to do so.
Disabling the logic bomb
I quickly figured out that I could get the program to run by setting the system clock back a month. But this was inconvenient, led to incorrect time stamps on things, and led to all sorts of other problems. So I set out to circumvent the logic bomb in the program itself. I was a chemistry student, not a computer science student, but even at that stage in my career I at least knew enough about programming to surmise that the program might call “time()” somewhere and compare the result to a fixed value. I fired up the program in a debugger, and saw this:
#0 0xb7ed10f5 in time () from /lib/libc.so.6 #1 0x0809a6c3 in pastDate2 () #2 0x0809a5b9 in pastDate () #3 0x0804c285 in main ()
OK, not a whole lot of subterfuge there. What if I just made “pastDate()” return right away? I did this:
0809a5a4 : 809a5a4: 55 push %ebp 809a5a5: 89 e5 mov %esp,%ebp - 809a5a7: 53 push %ebx - 809a5a8: 83 ec 08 sub $0x8,%esp - 809a5ab: 68 d6 07 00 00 push $0x7d6 + 809a5a7: c9 leave + 809a5a8: c3 ret + 809a5a9: ec in (%dx),%al + 809a5aa: 08 68 d6 or %ch,-0x2a(%eax) + 809a5ad: 07 pop %es + 809a5ae: 00 00 add %al,(%eax) 809a5b0: 6a 01 push $0x1 809a5b2: 6a 08 push $0x8 809a5b4: e8 d3 00 00 00 call 809a68c
And it worked – it took a change of two bytes to disable the logic bomb. Incidentally, I continued using the program for several more years and never did upgrade. I never saw the need to.
Why is scientific software so closed?
Every field of science today is dependent on computation, but research being what it is, there are often computational needs that can’t be met by any widely-available, well-known tool. So scientists who know some programming tend to write these niche, domain-specific applications themselves. But scientists who know some programming often aren’t very familiar with best practices in software development, and that includes licensing – many scientists, even those who program, have never heard of the GPL.
As a result, such niche scientific programs, when they are released at all, are often released under custom licenses that were written without much thought. And these custom licenses, more often than not, are quite restrictive. Why? There are several reasons. Some scientists, conscious of their limitations as programmers, don’t want their source code seen out of fear of embarrassment (reputation is everything in science.) Some are concerned about the potential extra work of supporting a user base if their programs are distributed. Some scientists – or their institution’s intellectual property office
– may be holding on to the hope of some day profiting from the code, though this hope is forlorn for most niche, domain-specific code that is only of interest to a handful of colleagues. Some may actually want to make it harder for others to build on their work, and thus increase their own standing. That kind of territorial attitude is out of place in science and can actually backfire.
I don’t know if the GPL is the answer to everything in scientific licensing, though I have used it for my own most significant scientific software. I do know that it should be used more often than it is, and that scientific licensing in general right now tends to be too restrictive. And I know that the scientific community isn’t thinking about licensing enough. Even the editorials in Science and Nature are another illustration of this. They call for distribution of source code that underlies published results. But access to source code is only part of what software licensing is about. What about derivative works? What about warranties? What about redistribution?
Science depends on the exchange of data, information, and ideas. In a sense that’s what science is. More and more, computers are what we use to do this. So we need to be thinking about standards of behavior for how we use them. “The code is mine, all mine” won’t cut it.