2020-10-19

The Wild, Wild Web

This is a little history of the early World Wide Web (WWW) for the benefit of younger generations who may not have experienced the Internet in its infancy and therefore might not be aware of the horrors that it involved, and of why certain things have come to be the way they are today.

As you read this, thinking to yourself that it could not possibly have been as bad as I am describing, remember: the general public was experiencing all of it over 2400 baud modems.

(Useful pre-reading: About these papers)

The World Wide Web was created in 1989 at CERN to facilitate the sharing of information between universities and scientific institutes around the world. It was still not much more than an academic prototype when, two years later, in 1991, it was made generally available outside of educational and scientific institutions. As soon as that happened, both the general public and the commercial sector embraced it very eagerly, and this was the starting gun for a big race among technology companies to grab market share.

The World Wide Web was quite unlike anything that humanity had ever seen before: it was not an incremental improvement upon some pre-existing technology that the public was already familiar with, but a totally new thing; it was not another novelty that some might like and some might just not develop a taste for, it was going to be everywhere, used by everyone, and affect everything. It was clear from the beginning that it was going to be big.

Unfortunately, the people who originally came up with the HTTP protocol, which is what makes the WWW possible, had a very limited idea of how it was going to be used, so their original prototype was woefully inadequate. Thus, in the years following the introduction of the WWW there was a massive effort to improve and extend the protocol. However, there were no standards in place, and no agreement on what should be done and how it should be done. The market demanded functionality much faster than the technologists could create it, and not a single company was in favor of things being done the way a competitor company was suggesting.

Decisions were made, based not on scientific or technological merit, but on burning market demand and market dominance aspirations. This led to many haphazard solutions being put into place to quickly cover immediate needs without any long-term vision. There was a lot of "I don't care if it is good, I want it yesterday" going on. Features were being implemented in hacky ways because nobody could wait for the protocol to be amended to accommodate them.

For example:

  • The web was originally intended only for displaying text, so the prototype did not include any provision for displaying images inside web pages. Support for images was added as an afterthought, in an entirely ad hoc way, without first making the necessary amendments to the protocol to accommodate them, so the result suffered greatly in terms of performance: a new connection had to be established between the browser and the server for every single image on a page. I kid you not, there was a period of time in the early nineties when you would visit a page and, instead of images, you would see placeholders; then, as you patiently waited, one by one the placeholders would be replaced by actual images. The protocol was later amended to address this problem, with the persistent connections of HTTP/1.1 (see the first sketch after this list).
  • The web was originally intended to be used only by anonymous users, so the prototype had no provision for visitors being remembered by the server when they came back another day, or even as they navigated from page to page during a single visit. There were some simple forms that could be filled in, but since the server had no idea who was submitting these forms, they could only be used for perfectly anonymous surveys. (And not even perfectly: the server could always take note of the user's IP address.) Support for identifying visitors was hacked into the protocol as a half-baked afterthought as late as 1994, and only with the narrow-minded goal of enabling a rudimentary shopping cart, after much pressure from the commercial sector, which of course needed to use the web to sell stuff. That half-baked afterthought was the now infamous cookies (see the second sketch after this list).
  • The web was originally intended only for navigating from static page to static page, so the prototype had no provision for dynamically changing page content. This was remedied as late as 1995, when Netscape introduced JavaScript in their browser. Despite its name, JavaScript had absolutely nothing to do with the Java programming language; the name was a marketing ploy. JavaScript was a terribly bad language, and even its creator, Brendan Eich, later admitted as much. Since Netscape was the predominant browser at the time, the use of JavaScript caught on, and then other browsers started supporting it so as not to appear incompatible with web sites that already worked with Netscape. Thus, JavaScript became the de facto standard. For a short while during the late nineties Microsoft tried to push their own scripting languages, VBScript and JScript, running in their own browser, but luckily nobody cared.
  • Once we had scripts running in the browser, these scripts needed to be able to exchange data with the server, but there was no provision for such a thing in the HTTP protocol. To make matters worse, in the mass security hysteria that followed the realization that in a connected world everyone was hackable by anyone, all TCP/IP ports across the world had been hastily blocked, except for SMTP and POP3 for e-mail and HTTP for the web. So, HTTP was practically the only protocol available, despite being unsuitable. For this reason, the monstrosity known as REST was concocted to allow page scripts to communicate over HTTP, and the specification of REST was made to look as if it underlies HTTP, even though it was entirely an afterthought built on top of HTTP (see the third sketch after this list). Since then, REST has had a big impact on the way web developers think, by introducing the notion of a network that spans the globe and consists of resources identifiable by universal locators. Nobody seems to find it the slightest bit suspicious that this world-view was not intentionally designed; it came about purely by historical accident.
  • Web sites needed to be able to show new information on a web page not only as a result of user actions, but also as a result of events happening on the server. For example, a chat system would need to show new chat messages to the user as they arrive, without requiring the user to refresh the page. Unfortunately, the HTTP protocol had not been designed for this purpose at all. It was left up to developers to overcome this limitation by bending and twisting the protocol in ways quite different from how it was designed to be used, namely with the formidable hack known as ajax. It was only in 2009 that the WebSocket standard was introduced, and it was of course again an afterthought, as evidenced by the "HTTP Upgrade" header hack (see the last sketch after this list). It is just that the hack is now built into the protocol, so it is official.
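
To make the image problem concrete, here is a minimal sketch in Python, using only the standard library, of what the later amendment (the persistent connections of HTTP/1.1) bought us: one TCP connection, reused for the page and all of its images, instead of a fresh connection per image. The host and paths are placeholders, not real resources.

    import http.client

    # One TCP connection, reused for several resources. Under the original
    # behavior, each of these requests would have required its own
    # connection, complete with a fresh TCP handshake.
    conn = http.client.HTTPConnection("www.example.com")
    for path in ("/index.html", "/images/logo.png", "/images/banner.png"):
        conn.request("GET", path)
        response = conn.getresponse()
        body = response.read()  # drain the body so the connection can be reused
        print(path, response.status, len(body), "bytes")
    conn.close()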
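
The cookie mechanism is just as easy to sketch. The server labels the visitor via a Set-Cookie response header, and from then on the client identifies itself by echoing that label back in a Cookie request header; that is the entire extent of the "memory" that was bolted onto the protocol. Host, paths, and the session cookie below are all placeholders.

    import http.client

    conn = http.client.HTTPConnection("www.example.com")

    # First request: the server labels the visitor, e.g. with
    # "Set-Cookie: session=abc123; Path=/".
    conn.request("GET", "/login")
    response = conn.getresponse()
    response.read()
    set_cookie = response.getheader("Set-Cookie", "")
    cookie = set_cookie.split(";")[0]  # keep only the "name=value" part

    # Second request: the client is no longer anonymous, because it echoes
    # the label back. Without this header, the server has no idea that this
    # is the same visitor as before.
    conn.request("GET", "/cart", headers={"Cookie": cookie} if cookie else {})
    response = conn.getresponse()
    print(response.status, response.read()[:80])
    conn.close()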
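
As for REST, whatever one may think of it, the practice it imposed is simple to illustrate: every piece of data is treated as a resource identified by a universal locator, and the few HTTP verbs are repurposed as the operations on it. The endpoint below is entirely made up for the sake of the example.

    import http.client
    import json

    # api.example.com and the /customers/42 resource are made up for this
    # example; the point is the shape of the interaction, not the endpoint.
    conn = http.client.HTTPConnection("api.example.com")

    # Read a resource: the URL itself is the identity of the data.
    conn.request("GET", "/customers/42")
    response = conn.getresponse()
    customer = json.loads(response.read() or "{}")

    # Update the same resource: same universal locator, different verb.
    conn.request(
        "PUT",
        "/customers/42",
        body=json.dumps({"name": "Jane Doe"}),
        headers={"Content-Type": "application/json"},
    )
    print(conn.getresponse().status)
    conn.close()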
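
Finally, the "HTTP Upgrade" hack behind WebSocket can be observed with nothing but a raw socket: the opening handshake is an ordinary HTTP request whose sole purpose is to ask the server to stop speaking HTTP. The host below is a placeholder, and the key is the sample value from the WebSocket specification (RFC 6455); a real client must generate a fresh random key for every connection.

    import socket

    handshake = (
        "GET /chat HTTP/1.1\r\n"
        "Host: www.example.com\r\n"
        "Upgrade: websocket\r\n"   # please abandon HTTP...
        "Connection: Upgrade\r\n"  # ...on this very connection
        "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==\r\n"
        "Sec-WebSocket-Version: 13\r\n"
        "\r\n"
    )

    with socket.create_connection(("www.example.com", 80)) as sock:
        sock.sendall(handshake.encode("ascii"))
        # A willing server answers "HTTP/1.1 101 Switching Protocols",
        # after which both ends stop speaking HTTP entirely and exchange
        # WebSocket frames over the same connection.
        print(sock.recv(4096).decode("ascii", errors="replace"))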

Essentially, the protocol was being amended to include ad hoc features that had gained traction because the company introducing them had a big market share. The IETF even stated it as their official philosophy "to keep basing standards upon successful prototypes", which is another way of saying "we will wait for someone to hack something together, and if it catches on, we will call it part of the standard."

The original technologies that comprised the WWW were so simple that one could barely say they constituted inventions, whereas many of the hacks that had to be introduced later in order to accomplish something useful with the web, such as cookies and ajax, were so ingenious that they could arguably be classified as inventions.

There were also many opportunities for improvement that were lost due to lack of consensus, because consensus is very hard to reach among companies in cut-throat competition with each other. So, many things were done incredibly backwards, and certainly not in the best interest of the public.

We should not be too hard on the original creators of the WWW; after all, experimental prototypes are supposed to be just that: good enough to demonstrate an idea, but woefully inadequate in all other respects. The mistake was releasing the WWW to the public when it was clearly too early, and the initial lack of an arbitrating body, which left all innovation up to relentlessly competing market forces. Business people are to blame for everything, as always.

Unfortunately, the rate of adoption of any newly introduced technology appears to depend not so much on its technical merit as on how successful it is in covering the narrow-minded immediate needs at hand. Sometimes the technology that wins is the one that was introduced first, despite being vastly inferior to a technology introduced shortly afterwards. Once a technology becomes widely adopted, it entrenches itself in the technological landscape, and it stays with us for a very long time, no matter how ill-conceived it was. Any attempt to improve an existing technology must always be backwards compatible, so as to ensure continuity in the transition from the old to the new, otherwise it has no chance of becoming a success; this means that you can never actually change the old technology, you can only add to it. The old problems will always be there, and the new solutions will always suffer, to a greater or lesser extent, because of those old problems. Thus, the echoes of the unhealthy Wild, Wild Web era of the Internet still linger in the technological landscape even today, some 30 years later.


References:

w3.org - Raggett on HTML 4 - chapter 2 - "A History of HTML", by Dave Raggett

