
Why we need an underground Google

Many parts of the Internet are hard to index, or are blocked from being indexed by their owners.

There has never been a search engine that accurately reflects the Internet.

In the 1990s and 2000s, the limitation was technical. The so-called "deep web" and "dark Internet" -- which sound shady and mysterious, but simply refer to web sites inaccessible by conventional means -- have always existed.

Companies like Google have worked hard to surface and bring light to the "deep, dark" recesses of the global web on a technical level.

But in the past few years, a disturbing trend has emerged in which governments -- whether through law, technical means or control of the companies that provide access -- have forced inaccuracies, omissions and misleading results on the world's major search engines.

The censorship

Until recently, search engine censorship was not on the list of first-world problems. But in the last few years, governments in the United States, Europe and elsewhere in the industrialised world have discovered that, although free-speech laws prevent them from actually blocking or banning content where it lives, censoring search engine results is a "loophole" they can exploit. In an increasingly digitised, search-engine-discoverable world of content, censoring search results is a way to censor without technically violating free-speech protections.

Starting in 2011, companies like Google began reporting a disturbing rise in government requests for search engine results to lie -- essentially, to tell users that pages and content that exist on the Internet do not. Requests for such removals by the US government, for example, rose 718 per cent from the first half of 2011 to the second. And they've continued to rise since.

And such requests weren't coming just from the US, but from "Western democracies not typically associated with censorship", according to the Google policy analyst who reported the trend on the company's behalf in discussing its Transparency Report.

The reasons for these requests vary, and often sound reasonable -- national security, law and order, national pride, religious sensitivity, social order, suppression of hate speech, privacy, protection of children -- you name it. But when you add them up and allow them to grow in number over time, the cumulative effect is that increasingly, search results don't reflect the real Internet.

Many of these cases start out with the best intentions but result in serious problems. Let's start with a disturbing recent case in Canada.

A Supreme Court of British Columbia ruling on an intellectual property dispute between two small industrial equipment companies ordered Google not only to delete all search results referring to one of the companies, but to block all future such results as well -- not only in Canada, but worldwide. (Yet another unsavory dimension to the case was that the ruling applied only to Google; Bing and other search engines were not required to comply.)

The particulars of the case are irrelevant and the data involved unimportant. What matters is the precedent: allowing a government in one country to censor information in other countries has troubling implications if it is left to stand. Imagine if China were allowed to censor information about the Dalai Lama within the US, or if Pakistan were allowed to censor images offensive to Muslims in Denmark.

Even more recently, the European Court of Justice handed down Europe's "right to be forgotten" ruling. In a nutshell, Europe wanted to protect its citizens from the fact that the Internet never forgets.

The particular case heard by the court involved a Spanish man who had been in the press for serious debt problems but later climbed out of debt. Rather than ruling that the actual information about his money problems be removed or censored, the court invoked the search-engine loophole for censorship and ordered Google, Bing and other search engines to stop returning the outdated information about his finances when his name is used as a search query.

Worse, the ruling required search engines to offer a process by which any European could request similar treatment, and ordered Google, Microsoft and other search engine companies to judge whether those requests were valid and to take action on the valid ones.

At last count, Google had received some 70,000 requests for changes to search results under the ruling in the past month. Microsoft only this week launched its process for censoring results.

Obviously, there's an argument to be had over the right to be forgotten vs the right to remember (i.e. freedom of speech). There's the issue of a slippery slope to more draconian forms of censorship. And yet another problem is fairness -- powerful media organizations can pressure search engines to uncensor results (as has already happened), while less powerful organizations can't. We'll leave those issues for other articles.

This column is focused on the cumulative effect of all these changes on the European versions of search engines -- that search results there will become radically inaccurate. Google users will have to assume that any search for a name might return censored results; they simply can't know. (Google now puts a warning to that effect on every search result involving a name, regardless of whether any removal has taken place.)

These examples show the potential for free-thinking democracies to force inaccuracy on search engine results.

It goes without saying that authoritarian regimes are way ahead in search engine censorship. Searching for any number of sensitive search terms on any search engine legally allowed to operate in China turns up a highly skewed set of results, which are presented as if they represent what's actually on the Internet.

Regimes ranging from democracies, such as Turkey's, to authoritarian governments, such as Iran's, are increasingly following China's lead in sophisticated methods for censoring search engine results.

The bottom line is that with each passing year, search engine results are becoming increasingly inaccurate and unreliable, and search engines are therefore increasingly failing to perform their most basic function -- helping you find what you're looking for on the Internet.

It's clear that the world -- from the victims of the most repressive governments to the citizens of the freest democracies -- needs search engines that can't be made inaccurate by governments.

So what's the solution?

An immodest proposal

The reason governments can force search engines to be inaccurate is that search engines are caught in a catch-22.

In order to be comprehensive and fully index the Internet, search engines need a lot of money for massive server farms and highly trained employees. In order to make money, they need to cooperate with governments and obey national laws and rules in whatever country they operate in, so they can sell ads to pay for it all. However, that cooperation ends up requiring them to skew search results, which means they no longer present a full and accurate index of the Internet.

And that's why there is no accurate search engine. The search sites with the money can't have the freedom to be accurate; the sites with the freedom can't make the money.

One solution might be a distributed search engine: instead of being housed in a location that can be shut down, it could be spread across many locations and moved and shifted on the fly.

It's been tried before. Projects like InfraSearch, Opencola, YaCy and FAROO have attempted distributed search engines.

The problem is that the Internet is too large and changes too quickly for these small-time projects to come anywhere near the big search engines in comprehensiveness, even after government censorship is taken into account.

So instead of trying to duplicate the indices of the major search engines, we need a distributed search engine that focuses exclusively on the censored content, where the major search engines have been forced to provide inaccurate results. Perhaps Google, Microsoft and others might even help this effort by freely providing data about what has been censored and why.

This distributed search engine should display the results from the search engine chosen by the user (Google, Bing, etc.), alongside results known to be censored somewhere -- anywhere.

Together, these two sets of results would not only give an accurate view of what's really on the Internet, but also make clear exactly what has been censored.
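To make that concrete, here is a minimal sketch of how such a combined results page might be assembled. It is purely illustrative and built on assumptions: the Result type, the merged_results function and both data sources are hypothetical stand-ins, since no censored-content index or comparison service like this actually exists yet.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Result:
    url: str
    title: str
    censored_in: List[str] = field(default_factory=list)  # regions where this entry is known to be suppressed


def merged_results(query: str,
                   mainstream_engine: Callable[[str], List[Result]],
                   censored_index: Callable[[str], List[Result]]) -> dict:
    """Return the user's chosen engine's results alongside results known to be censored somewhere."""
    ordinary = mainstream_engine(query)    # whatever Google, Bing, etc. returns today
    suppressed = censored_index(query)     # entries the hypothetical distributed index has flagged as removed
    return {
        "query": query,
        "engine_results": ordinary,
        "censored_results": suppressed,    # shown and labelled separately, so users see what is missing
    }


# Stub demo -- both data sources are placeholders, not real services.
if __name__ == "__main__":
    def engine(q: str) -> List[Result]:
        return [Result("https://example.com/a", "An ordinary result")]

    def index(q: str) -> List[Result]:
        return [Result("https://example.org/b", "A removed result", ["EU"])]

    print(merged_results("some name", engine, index))
```

The point of keeping the two result sets separate, rather than blending them into one ranked list, is that the contrast itself is the information: users could see at a glance exactly what their local version of Google or Bing is not allowed to show them.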

I believe that anyone paying attention to the corrosive power of government censorship of search engine results sees how necessary this is. And if you don't see it, just wait a year or two.

Over the next two years, assuming current trends continue, search engines will increasingly lose their accuracy to the point where nobody can rely on them to show what really exists on the Internet. And each country or region will have its own unique set of search engines that lie about what exists online in a different way from the other countries and regions.

It's time to get started on that distributed search engine of censored content.

