The necessity of theory in science, or Big Data is anti-science

Author: John Day (Follow on Twitter: @JeanJour)

Part 1 of 2.

There has been considerable hype surrounding Big Data of late as if it were something really new. The histrionics have gotten quite deafening. I have characterized the current fad as the 6th Generation of Big Data, starting with the first generation in the 1830s. After a carriage accident made it impossible to go back to sea, Matthew Maury was made head of the US Naval Observatory in Washington D.C. There, using the logbooks returned by Naval Captains after each voyage, he collated data and was able to discover previously unknown currents in the Atlantic and patterns in the wind that shortened sailing times by days or even weeks. The second generation was Sears Roebuck, who in 1910 built a facility on the outskirts of Chicago to fill orders. At the time, Sears had no brick and mortar stores. The catalog was their Web. They were filling 100,000 orders a day and moving a million pieces of merchandise a day in 1910! (You could order everything from nails, clothes, and carriages to an entire house! The houses (for which there was a whole separate catalog of styles) were pre-cut (not pre-fab) and shipped in several installments to give you time to build each phase.)

The 3rd Generation would be 1940s Bletchley Park and the advent of the computer. Von Neumann’s interest in computers was to get more data to see the patterns in differential equations he was working on. The 4th Generation would be the 1960s with Illiac IV and the advent of supercomputers, and the 5th Generation would be the 1980s and the establishment (in the US) of supercomputer centers. And now we turn the Moore’s Law crank yet again and we are at the current fad, with racks of machines filling huge buildings and with millions of sensors spread around us.

But this last generation is the most dangerous, the greatest threat. Some have even called it a “new science” (if it is, then so was the microscope) or the end of science (all we have to do is crunch all this data and we will get the answers). It is closer to the latter than the former, but not for the reasons they think. Big Data is accelerating us toward stagnation. Let me explain:

A different approach

As a grad student, I discovered Joseph Needham’s magnum opus, Science and Civilisation in China. It isn’t just a book. It is a multi-volume (with some volumes having multiple books) encyclopedia of science and technology in China up to about 1750, when it becomes too difficult to determine what was purely Chinese and what was influenced by Western contact.

Why was I reading such things? First of all, it was interesting! What other excuse does one need!?
Second, any system designer or architect must collect models to avoid the “If all you have is a hammer, . . .”[1] syndrome. And the models and the accomplishments I found in Needham were fascinating: a very different approach to many problems than found in the West.

A couple of examples will illustrate what I mean:
In ship design, Needham points out that both East and West used nature as a guide. The West used fish; China used waterfowl, which are much more appropriate for something at the interface of air and water. Fish are a good model for submarines, but ducks are a better model for boats.

In China, the axis of a windmill is vertical and the vanes hang down. Not only is the gearing simpler, but it is always in the wind. It doesn’t have to turn into the wind.

China had Pascal’s Triangle centuries before Pascal.

Seventy years before Vasco da Gama clawed his way down the African coast in 1497 and rounded the Cape of Good Hope to put into Mombasa on the East African coast, the Chinese Admiral Zheng He paid several visits to Mombasa with a large fleet of huge ships with watertight compartments and other advancements. He was just out on a goodwill tour to say that the Emperor thinks all of you are wonderful, and if you would like to send back tribute to the Great Ming Emperor, that would be fine.

Interesting, isn’t it?

Scientific theory

It is appropriate that we are meeting in Portugal, which had such a major role in the Age of Discovery. Henry the Navigator’s great accomplishments earlier in the 15th century had left a legacy for da Gama to build on that the Chinese didn’t have. As Needham points out, there was one thing missing in Chinese technology: there was no scientific theory.

It is all technique, technology; it is an artisan tradition, craft. What do I mean by scientific theory? Robert MacArthur, one of the founders of biogeography, distinguished Natural History from Science in that Natural History describes but Science predicts. The Chinese had certainly achieved a critical mass of knowledge that should have led to theory. But for some reason (still debated by scholars), there was no theory. Some say it was because they were so practically minded. However, because there was no theory, there was a tendency to lose knowledge: when Matteo Ricci, the first Jesuit into China, arrived in 1600, he initially thought the Jesuits had brought the knowledge that the earth was round.[2] Quite to the contrary, the Chinese had known it centuries before, but the knowledge had been lost. But the lack of theory had another, far worse consequence. By the late Ming dynasty (16th century), stagnation had clearly set in. Artisan traditions are predicated on doing what has been done before; improvements come by trying things, not by using theory to point the way.

Needham attributes both the lack of theory and the stagnation to the fact that merchants had very little status in China, and virtually no power.  All power was with the Emperor. In other words, Needham saw commerce as the driver of technology and the reliance on government funding as leading to stagnation.  But as we have seen more recently, the short ROI of commerce can also lead to stagnation. Everyone is looking for a technology enhancement that will yield a quick result, rather than delving deeper for more fundamental results that could yield far more but may take longer and also threaten to undermine existing investment.

Euclid’s accomplishments

I would tell historians that another reason there was no theory in China was that there was no Euclid. They would give me blank stares as if to say, “Huh!?” But historians don’t see Euclid as we do. I mean Euclid not for geometry (the Chinese understood geometry quite well) but as an example of an axiomatic system.

As we all know, Euclid’s accomplishment is the Holy Grail of science.[3]  The ultimate goal in any field is to be able to reduce it to a small number of assumptions from which all else can be derived. Newton did it for mechanics; Maxwell, for electricity and magnetism; CERN and the physicists are trying to do it for everything. Not that any scientist sets out to do that, but every scientist worth his salt is always open to that flash of insight that points toward a unification.

That, of course, raised the question: Why did the West have Euclid!? Why did Euclid do what he did? What pushed him to create such an elegant edifice!?

Of course, we are lucky just to have Euclid’s Elements, let alone to know anything about who Euclid really was, how the Elements came about, what made him want to organize things that way, or why he was looking for such an elegant solution. All of that is lost in the sands of time.

As it turned out, I didn’t need to know anything about Euclid himself to understand why he did it. The insight into Euclid came while reading a geometry book by Heilbron, where he notes that while several civilizations developed mathematics, only the West developed the concept of proof. The others have recipes, examples that are used as patterns (dare I say algorithms?), but not proof. It is clear that the Babylonians had the Pythagorean Theorem, but they didn’t have a proof for it.

This answers the question! How?

Challenging proof

What do you do when you challenge a proof? You question the assumptions. Continually challenging the assumptions leads to the minimum set of assumptions that will suffice. Hence, an axiomatic system. Then it is a short step to ask what results can be derived with just the assumptions, then what those results enable in turn, and so on, and you have the Elements! (BTW, my favorite exposition of the development of meta-mathematics is Chapter 1 of Bourbaki’s Theory of Sets. It is delightful!)

Why is theory important to science? Bear with me. In my next blog post, on June 14, I will elaborate on this…

Footnotes:

[1] As Mike O’Dell says, “When all you have is a hammer, everything looks like your thumb!”

[2] This is not the huge discovery we generally think it is. Since ancient times, it was well known among the educated classes. One merely has to watch a ship sail over the horizon and notice that the hull disappears before the sails do (the origin of the phrase “hull down”) to know the earth is round. No one would fund Columbus, not because they thought he would sail off the edge of the world, but because they knew it was round, knew its circumference, and knew they didn’t have ships with sufficient range to make the voyage. Columbus fudged the numbers to make the voyage look feasible, found someone with money who believed his math, and then got very lucky when there was a continent in the way!

[3] A carryover from when the distinction between math and science was not as clear as it is now.
