The Necessity of Theory in Science Or Big Data is Anti-Science (2)

Author: John Day (Follow on Twitter: @JeanJour)

Part 2 of 2. Read part 1 here.

I had been asked to write a review of a book for Imago Mundi, the premier history-of-cartography journal. Over the 2014 holiday break, I decided to knock it out. The book was on Jesuit mapmaking in early 18th-century China. (I have published a bit on this period.)
The book is primarily about the first major scientific mapping effort anywhere, instigated by the Emperor Kangxi, and the resulting atlas. But it also discusses one of two well-known incidents in the late 17th century in which the Jesuits were pitted against the Court astronomers to see which could most accurately predict three astronomical events: a lunar eclipse, the length of the shadow cast by a gnomon at a given time of day, and the relative and absolute positions of the stars and planets on a given day.
The Jesuits produced more accurate results than the Chinese Court Astronomers, resulting in their being put in charge of the Court observatory in Beijing.

Why were the Jesuits’ calculations more precise? It certainly wasn’t because the Chinese couldn’t do the math to the proper precision. After all, the Chinese had been using the decimal system for centuries. (When discussing surds, Needham notes that the Chinese had adopted the decimal system so early it wasn’t clear they noticed that there were irrational numbers.)

Then why?

Because the Jesuits were using techniques developed with and backed by theory. They didn't develop the techniques or the theory; others in Europe had done that. But the "theory" behind them had forced the Europeans to be more precise to back up what they knew, to look more critically at their work, to think more deeply about it, and to improve their arguments, hence creating more precise techniques.

The Chinese, on the other hand, had a procedure to follow. They didn't understand why it was correct, other than that it had always worked "well enough," so why look further? (Hmmm, where have I heard that before!) They had been trained that this was the way to do it. They just knew it worked. And the procedure itself didn't suggest directions that would lead to improving it. (Needless to say, respect for authority and ancestor worship didn't help in this regard.)

We are seeing the same thing in the systems side of computer science today, and especially in networking, where it has been a badge of pride for 30 years not to do theory. In 2001, the US National Research Council led a study of the stagnation in networking research; one quote from the report sums up the problem:

“A reviewer of an early draft of this report observed that this proposed framework – measure, develop theory, prototype new ideas – looks a lot like Research 101. . . . From the perspective of the outsiders, the insiders had not shown that they had managed to exercise the usual elements of a successful research program, so a back-to-basics message was fitting.” [1]

It must have been pretty sobering for researchers to be told they don't know how to do research. Similarly, the recent attempt to find a new Internet architecture has come up dry after 15 years of work. The effort started with grand promises of bold new ideas, new concepts, fresh thinking, clean slates, etc., and has deteriorated through 'we should look outside networking for ideas' (a sure sign they don't have any ideas when, in fact, the answers were inside, as they always are); to 'the Internet is best when it evolves' (they have given up on new ideas); to 'we should build on our success' (it is hard to get out of that box).
When I asked my advanced networking class to read recent papers from the six NSF-funded Future Internet efforts, their first question was, "These were written by students, right?" Embarrassingly, I had to reply that they had been written by the most senior and well-respected professors in the field.

This is a classic case of confusing economic success with scientific success. They were focused on what to build, not asking the much harder and more dangerous question: what didn't they understand? They didn't question their basic assumptions, even though fundamental flaws had been introduced as early as 1980, made irreversible by 1986, and compounded in the early 90s.

On the other hand, our efforts, which have questioned fundamentals and forced us (me) to change long-held views, have yielded one new and often surprising result after another: that a global address space is unnecessary; that router table size can be reduced by 70% or more; that a layer is a securable container, which greatly simplifies and improves security; that decoupling port allocation from synchronization yields a protocol that is not only more robust but more secure; etc.

Of course, they have also shown that connectionless is maximal shared state, not minimal; that of the four protocols we could have chosen in the 1970s, TCP/IP was the worst; that of the two things IP does (addressing and fragmentation), both are wrong; that the 7-layer model was really only 3 (well, by 1983 we knew it was only 5); and that much of what has been built over the past 30 years is questionable. At 9 major decision points in the Internet, the wrong choice was consistently made, even though the right one was well known at the time.

There are many examples from networking where not doing theory has meant missing key insights. A few should suffice:

  • It is generally believed, and taught in all textbooks, that establishing a connection requires a 3-way handshake of messages. However, this is not the case. In 1978, Richard Watson proved that the necessary and sufficient condition for the synchronization needed for reliable data transfer is to bound three timers: maximum packet lifetime, maximum time to send an ack, and maximum time to exhaust retries. The three messages are irrelevant; they have nothing to do with why synchronization is achieved. Yes, three messages are exchanged, but there are always three messages. They aren't the cause. Watson then demonstrated the theorem in the elegant delta-t protocol (see the first sketch after this list). By not doing theory, they missed the deeper reason that it worked, and missed that the resulting protocol is more robust and more secure.
  • Many people will tell you that network addresses name the host, and that naming the host or device is important. (Several of the projects noted above are among them.) As it turns out, naming the host may be useful for network management, but not for communications; in fact, it is irrelevant for communications. If you construct an abstract model and look carefully at what has to happen, you see that what the address names is the "process," the locus of processing, that strips off the header of the packet carrying the address. The host is merely a container. Well, you might say, there are places where there is only one "process" stripping off that header, so it and the host are synonymous. Yes, that case exists, and in large numbers. But it is not required to exist in all cases, and it doesn't in some very significant ones. By not doing the theory, they missed this insight, which made dealing with VMs very messy.
  • In 1972, we first realized that in peer networks the "terminals," now computers, could have multiple connections to the network. In all previous networks the "terminals" were very simple, and having only one connection was all that was possible. The advantage of a computer having more than one link to the network is obvious: if one fails, it still has connectivity. However, addresses in the ARPANET, like those in all previous networks, named the wire to the "terminal," i.e., the interface. If one interface went down, the network had no way to know that the other address went to the same place; to the network, it appeared to be two different hosts. One could send on both interfaces, but not receive on both (not without re-establishing a connection to use the other address). Addressing the interface made addresses route-dependent; addresses had to be location-dependent but route-independent. The solution is apparent if there is a theory, which we had in operating systems: application names designate which program and are location-independent; logical addresses provide location-dependence but route-independence (independent of where in physical memory); and physical addresses are route-dependent (dependent on how the memory is accessed). Naming the node, not the interface, solved this problem (see the second sketch after this list). Not only did it not cost anything, it is significantly less expensive, because it requires between 60% and 90% fewer addresses, and router table size is commensurately smaller. All the other network architectures developed in the 1970s and 80s got this right; only the Internet, which doesn't do theory, got it wrong.
  • But we still thought that addresses could be constructed by concatenating an (N-1)-address with an (N)-identifier. This seemed natural enough; after all, files were named by concatenating directory names down through the tree, with the file name as the leaf. That was until 1982, when we started to look at a detailed theoretical model of what would happen if we did that. It quickly became apparent that such an address defined a path up through the stack: concatenating the addresses made them route-dependent, precisely what we were trying to avoid. The address spaces at each layer have to be independent. Of course, it was obvious once you remembered what Multics called a filename: a pathname! But it was doing the theory that led to recognizing the problem. So why does IPv6 embed MAC addresses in the IPv6 address? Because they don't do theory.
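To make Watson's condition concrete, here is a minimal sketch (in Python, with timer values, names, and a simplified state-holding bound that are my own illustrative assumptions, not the actual delta-t protocol) of the idea: if maximum packet lifetime, maximum time to ack, and maximum time to exhaust retries are all bounded, connection state only needs to be held for an interval derived from those bounds, and that bound, not any handshake, is what makes synchronization safe.

```python
# Illustrative sketch only: shows the *shape* of Watson's result, not delta-t itself.
import time

# The three timers Watson requires to be bounded (values are arbitrary examples).
MPL = 30.0   # maximum packet lifetime in the network (seconds)
A   = 5.0    # maximum time a receiver waits before sending an ack
R   = 60.0   # maximum time a sender spends exhausting retransmissions

# If all three are bounded, state need only be kept for a bounded interval after
# the last activity.  (The exact bound is derived in delta-t; this sum is a
# simplification for illustration.)
STATE_HOLD_TIME = MPL + A + R

class Association:
    """Receiver-side state, created implicitly by the first data packet --
    no 3-way handshake is needed to establish it."""
    def __init__(self):
        self.last_activity = time.monotonic()

    def touch(self):
        """Record activity (data or ack) on the association."""
        self.last_activity = time.monotonic()

    def safe_to_discard(self):
        """After the bound elapses, no old duplicate can still be in flight,
        so the state (and its sequence numbers) can be reused safely."""
        return time.monotonic() - self.last_activity > STATE_HOLD_TIME
```

The point of the sketch is that correctness comes from the timer bounds, not from any particular exchange of messages.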

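And a second toy sketch (again my own construction, not any real routing code) for the node-versus-interface point: if the address names the node, the network knows that two links reach the same place, so losing one link does not lose the destination, and one address per node suffices rather than one per interface.

```python
# Toy illustration of naming the node rather than the interface.

# Interface addressing: each wire is a separate destination; nothing in the
# addresses tells the network that "if-A1" and "if-A2" reach the same host.
interface_addresses = ["if-A1", "if-A2"]

# Node addressing: one address per node, with its set of attachment points beneath it.
node_table = {
    "node-A": ["if-A1", "if-A2"],
}

def reachable(node, failed_links):
    """With node addressing, a destination stays reachable while any of its links is up."""
    return any(link not in failed_links for link in node_table[node])

print(reachable("node-A", {"if-A1"}))   # True: traffic can still be delivered via if-A2
```

With interface addressing, the failure of "if-A1" looks like the disappearance of a destination; with node addressing, it is merely the loss of one path to a destination the network still knows how to name.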
There are many more examples, all cases where not doing theory led to missing major insights and improvements. But notice that today we are doing the same thing the Court Astronomers were doing. Our textbooks recount how things work today, which students take as the best way to do things. We teach the tradition, not the science. We don't even teach how to do the science. We don't teach what needs to be named and why. Watson's seminal result is not mentioned in any textbook. (One young professor asked me why he should teach delta-t if no one is using it (!). I almost asked him to turn in his PhD! We aren't teaching the fundamental theory; we are teaching the tradition.)

For a talk on this problem a few years ago, I paraphrased a famous quote by Arthur C. Clarke to read: "Any sufficiently advanced craft is indistinguishable from science." (Clarke said, "Any sufficiently advanced technology is indistinguishable from magic.") We are so dazzled by what we can do that we don't realize we are doing craft, not science.

Big Data is the same thing, only worse. Big Data is accelerating the move to craft and is sufficiently sophisticated to appear to be science to the naive. Correlation is not causality. We create algorithms to yield results, but do we have proofs? Big Data supposedly tells us what to do without telling us why, and without contributing to a framework of theory that could lead to deeper, more accurate results and likely even deeper insights.

Even Wired Magazine called Big Data the End of Science, although, as usual, they didn't realize that what they were advocating was stagnation. Of course, every field goes through a period of collecting a lot of data before it becomes clear what is important and what the theory is. That has happened before. But what hasn't happened before is advocating that we don't need theory. It is putting us in the same position as the Court Astronomers in 17th-century China, and the rate of change and adoption is far faster now than then.

There are those who claim it is a *new* science, when it is actually the greatest threat to science since the Catholic Church found Galileo guilty of proving a heathen (Aristotle) wrong. (I have never understood that one!) The Scopes trial was more circus than threat, though that may have changed in the backward US.

We are taking on the same characteristics seen in Chinese science in the 17th century. It isn't pretty, and it isn't just networking. Read the last 5 chapters of Lee Smolin's The Trouble with Physics. He sees it there! And others have told him they are seeing it in their fields as well.

Big Data has us on the path to stagnation, if we are not careful. Actually, we are a long way down that path…

  1. Looking over the Fence at Networking, Committee on Research Horizons in Networking, National Research Council, 2001.
  2. Needham, Joseph. Science and Civilization in China, Cambridge University Press, (Vol 1- Vol. VII, Book 1) 1965-1998.
  3. Smolin, Lee. The Trouble with Physics: The Rise of String Theory, the Fall of Science, and What Comes Next, Houghton-Mifflin, 2006.