August 26, 2008

Google, DNS and finding stuff on the Internet

What if you were encountering the Internet for the first time? The World Wide Web, for that matter. Someone opens a browser for you and says:

- This is the Internet, it has everything. Just type in the address of a site you want to visit.

Er, excuse me? The address of a site I want to visit? WTF is that supposed to mean? Does anyone remember the address of the Pyramids? I wouldn't mind visiting that particular site.

But really, what is a site address? It is merely a reflection of a technical detail of how the physical network is organized. It just so happens that, for the sake of unambiguous data delivery, each computer on the Internet needs its own unique address. The techies who invented it in the 1970s simply chose that address to be an integer. Had it been left to them, or had the number of connected computers not exploded, numbers could have served just as well:

- Connect me to server 12345!
- You got it.
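
The "server 12345" idea is closer to reality than it sounds: a classic IPv4 address is a 32-bit integer that merely gets printed in dotted form. Here is a minimal sketch using Python's standard ipaddress module. The helper names are mine, and 192.0.2.1 is an address reserved for documentation, not a real server.

```python
import ipaddress


def address_to_number(dotted: str) -> int:
    """Return the plain integer behind a dotted IPv4 address."""
    return int(ipaddress.IPv4Address(dotted))


def number_to_address(number: int) -> str:
    """Return the dotted notation for a 32-bit integer address."""
    return str(ipaddress.IPv4Address(number))


# 192.0.2.1 comes from a range reserved for documentation and examples.
print(address_to_number("192.0.2.1"))  # 3221225985
# "Server 12345" from the dialogue above, written the way a browser shows it.
print(number_to_address(12345))        # 0.0.48.57
```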

But people are notoriously bad at remembering numbers, and so a service similar to the yellow pages emerged, where each address could be given a name and conveniently looked up later. Then it went like this:

- The new server is at great.new.site.com

and the user never had to translate "great.new.site.com" into 12345. The Domain Name System (DNS) responsible for this, the ubiquitous service for looking up pieces of information by name, is quite fascinating. It is perhaps the biggest distributed database in the world, and its capabilities have been largely underutilized over the years. Maybe this is why it is still up and running.

The presence of DNS became as important as physical network connectivity. If there is no DNS, the Internet might as well be down. If you care to notice, DNS is exactly where mainstream operating systems have pretty much their only built-in redundancy. You are actually encouraged to configure multiple DNS servers at once, just in case one dies.
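
The "just in case one dies" logic fits in a few lines. This is only a toy sketch of the idea, not how any particular operating system implements it: the two "servers" are hypothetical stand-in functions, and the returned address is made up from a documentation range.

```python
from typing import Callable, Sequence


class ResolverDown(Exception):
    """Raised by a toy resolver that cannot be reached."""


def dead_server(name: str) -> str:
    # Hypothetical primary DNS server that has died.
    raise ResolverDown("primary is not answering")


def live_server(name: str) -> str:
    # Hypothetical secondary DNS server; the address is invented.
    return "192.0.2.1"


def lookup_with_fallback(name: str,
                         servers: Sequence[Callable[[str], str]]) -> str:
    """Ask each configured server in order until one of them answers."""
    for server in servers:
        try:
            return server(name)
        except ResolverDown:
            continue  # this one is dead, try the next one
    raise ResolverDown(f"no configured server could resolve {name}")


print(lookup_with_fallback("great.new.site.com", [dead_server, live_server]))
```

With a single dead server in the list the lookup fails outright, which is the "Internet might as well be down" case.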

Well, nice as DNS is, it still has its own idiosyncrasies. There is really no reason for site names to be organized in a dot-separated hierarchical fashion. In other words, in

www.yahoo.com

there is no need for either "www" or "com". Yahoo is the name, and the rest is irrelevant. The whole "dot-separated" thing and "com" are just technical nuisances that made the development of DNS feasible, so that the database could be distributed more effectively. And "www" is nothing but a habit, a meme introduced into the culture. The sound of "double u, double u, double u", and perhaps the visual rhythm of the letters www, immediately tells anyone familiar with the Internet that a site address is being transmitted. Synchronization bits, if you like.
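
To see why the dots made distribution easier, picture the name being read right to left, with each label handing you over to whoever is responsible for the next one. Below is a toy model of that delegation as nested dictionaries. It is an illustration only, not the real DNS protocol, and all the addresses in it are invented from a documentation range.

```python
from typing import Optional

# Toy "zones": the root knows about "com", "com" knows about "yahoo" and
# "site", and so on. Every address here is made up for illustration.
ROOT = {
    "com": {
        "yahoo": {"www": "192.0.2.2"},
        "site": {"new": {"great": "192.0.2.1"}},
    },
}


def resolve(name: str, root: dict = ROOT) -> Optional[str]:
    """Walk the labels right to left, one delegation per dot."""
    node = root
    for label in reversed(name.lower().rstrip(".").split(".")):
        if not isinstance(node, dict) or label not in node:
            return None  # nobody down this branch knows the name
        node = node[label]
    # Only a leaf holds an address; an intermediate zone is not a host here.
    return node if isinstance(node, str) else None


print(resolve("www.yahoo.com"))
print(resolve("great.new.site.com"))
```

Each nested dictionary could live on a different machine run by a different party, which is the whole trick: nobody has to hold the entire database.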

So, what matters is the "yahoo" part, right? The name. But the name of what, and what's in a name?

First, the "name of what" part. The World Wide Web is a de facto hypertext, a billion files intertwined with mutual links. Accordingly, what you type in is but an entry point to the web. Once inside, you neither type nor care to remember any more names or addresses, you just keep following the links. Have you ever stared at a blank browser page trying to invent another name to type in, just to see what comes up? That's the idea. Any name could be tried as an entry gateway, but picking them at random is extremely ineffective. Whoever has multiple entry points to the web has to write them down, which is the starting point for a personal bookmark catalogue, hardly a popular sport any more. Instead, everyone ends up with about ten favorite entry points to the web, the ones that are fashionable, familiar, have catchy names or relate to the person's location or interests. OK, so each user has his own favorite entry points to the web, and they are the only ones that need names.

What's in a name, then? Oh, at that point it is totally irrelevant what exactly the name is. www.google.com, www.wikipedia.org, www.reddit.com, www.e1.ru, www.kazna.ru, whatever: either meaningless but catchy, or meaningful and easy to remember in connection with some relevant topic.

Google is a catchy name, and it presents the richest and the poorest entry page at the same time. See, it might look like it helps when you type www.google.com and the simplest possible page pops up and says: hi there, just type in what you need. But it is the same question we started from: just type in what you need! The only difference is that before we had to type the name of a single site, presumably known beforehand. Now we have to try keywords until we find something.

One point here is that the DNS names of sites are largely irrelevant. The name of a site used to be the single keyword available for finding it, but no more. Now you are far more likely to find a site through the right query to Google.

Another point is that Google and the like perform the same function DNS was supposed to: relieving the user from remembering addresses, and looking up relevant sites. The truly distributed DNS, mapping site names to addresses, became part of the physical network (on the right OSI layer, if you care), and got replaced by centralized mammoth server farms that map keywords to pages.
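
Stripped of the mammoth server farms, "mapping keywords to pages" is an inverted index: for every word, remember the pages that contain it. The sketch below is a toy under loud assumptions. The three pages and their URLs are invented, and no real search engine is anywhere near this simple.

```python
from collections import defaultdict

# Hypothetical pages: invented URLs mapped to their (very short) text.
PAGES = {
    "pyramids.example/giza": "the pyramids of giza in egypt",
    "travel.example/egypt": "cheap travel to egypt",
    "dns.example/howto": "how dns maps names to addresses",
}


def build_index(pages: dict) -> dict:
    """Map every word to the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index: dict, query: str) -> set:
    """Return the pages that contain every keyword of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


INDEX = build_index(PAGES)
print(search(INDEX, "pyramids egypt"))
```

Compare this with a DNS lookup: there one exact name gives one address, here any handful of words gives whatever pages happen to mention them all.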

Finally, this switch gave enormous power to a proficient user, but for the average user it is still a blank stare at

- This is the Internet, it has everything. Just type in what you want to find.

Er, excuse me?

1 comment:

dtbow said...

Very cool post, especially considering the date. In fact, you foresaw the appearance of Google Chrome :)