September 14, 2007

The most valuable pages of the World Wide Web

The World Wide Web contains a huge number of files. Of those, HTML pages can point to each other (and to other files too) with hyperlinks. The resulting hypertext structure can be represented as a directed graph.

Technically, the World Wide Web graph can contain cycles, but this is only possible if a page has been modified after it was first referenced. Specifically, a page that has never been modified since it was created cannot participate in a cycle. Therefore, there can exist leaf pages, which are only linked to and contain no links themselves.
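
As a rough illustration of the graph view, here is a minimal Python sketch that stores the links as a dictionary and picks out the leaf pages. The page names and the leaf_pages helper are made up for this example and do not come from any real crawl or library.

```python
# Hypothetical link graph: each key is a page, each value is the set of
# pages it links to. The page names are invented for illustration.
links = {
    "a.html": {"b.html", "c.html"},
    "b.html": {"c.html"},
    "c.html": set(),
}

def leaf_pages(links):
    """Return pages that some page links to but that contain no links themselves."""
    linked_to = set()
    for targets in links.values():
        linked_to |= targets
    # A page missing from the dictionary is treated as having no outgoing links.
    return {page for page in linked_to if not links.get(page)}

leaves = leaf_pages(links)
```

In this toy graph only c.html qualifies: it is linked to by both other pages and links to nothing.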

It also seems reasonable that if page A links to page B, then the owner of page A somehow values page B, even if in some negative kind of way. Otherwise, they wouldn't even bother to put the link there.

Now, the question is: aren't the most valuable pages of the World Wide Web the ones that are only referenced but do not reference other pages themselves? Taking it one step further, is the value of a page a function of N/M (where N is the number of links to this page and M is the number of links from this page)? If so, a page with no outgoing links would indeed have infinite value.
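
To make the N/M idea concrete, here is a small Python sketch that takes the simplest possible reading, value = N/M, on a made-up three-page graph. The page_values function and the page names are hypothetical. Treating a page with no links in either direction (0/0) as having value 0 is an assumption of this sketch, since the ratio is undefined there.

```python
import math

def page_values(links):
    """Compute N/M per page, where N counts inbound links and M outbound links."""
    inbound = {page: 0 for page in links}
    for targets in links.values():
        for target in targets:
            inbound[target] = inbound.get(target, 0) + 1

    values = {}
    for page, n in inbound.items():
        m = len(links.get(page, ()))
        if m == 0:
            # No outgoing links: infinite value if anyone links here.
            # Assumption: an isolated page (N = 0, M = 0) gets value 0.
            values[page] = math.inf if n > 0 else 0.0
        else:
            values[page] = n / m
    return values

# Hypothetical graph: a links to b and c, b links to c, c links to nothing.
values = page_values({"a": {"b", "c"}, "b": {"c"}, "c": set()})
```

Here page a gets 0/2 = 0, page b gets 1/1 = 1, and page c, which has two inbound links and no outbound ones, gets infinite value.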

1 comment:

Andrew Dalke said...

I can make broken links to pages that don't yet exist. In the future that page might exist, which can reference the original page. At that point there's a cycle, with neither page modified after page creation.