The case focused on Google’s news aggregation service, which
automatically scans the websites of newspapers, extracting
headlines and snippets of text from each story. These are displayed
at Google News and the headlines link users to the full stories on
the source sites. Newspaper group Copiepresse, which represents
leading Belgian, French and German publications, said this amounted
to copyright infringement and a breach of database rules because
its members had not been asked for permission.
Copiepresse could have stopped Google without going to court but
chose not to. Instead, it wants Google to continue directing
traffic to its sites – and it wants Google to pay for the
privilege.
The court also ruled that Google’s cache, which is not part of
Google News, infringed copyright.
When a person performs a search at Google, results are displayed
with a link to the page on the third party site and also a link to
a ‘cached’ copy of the same page stored at Google’s own site. The
newspapers say this copy undermines their sale of archive stories.
Why buy an archived story if you can find it in Google’s cache?
Again, newspapers could have stopped their pages being cached.
Margaret Boribon, Secretary General of Copiepresse, told OUT-LAW
that Google’s behaviour is “totally illegal” because it does not
seek permission before extracting content for Google News or
copying pages to its cache. Google disagrees.
Understanding Google’s position within the law means
understanding how the search engine works.
Google uses an automated program, known as Googlebot, to crawl
the internet. It locates billions of pages and copies
each one to its index. In doing so it breaks the page into tiny
pieces, analysing and cross-referencing every element. That index
is what Google interrogates to return search results for users.
When the Googlebot visits a page, it also takes a snapshot that is
stored in Google’s cache, a separate archive that lets users see
how a page looked the last time the Googlebot visited.
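
The indexing and caching steps described above can be sketched in a few lines of Python. This is only a toy illustration, not Google's actual system: the page URLs and text, and the names `crawl`, `search` and `tokenize`, are assumptions made up for the example. It breaks each page into words, cross-references every word to the pages that contain it, and keeps a separate snapshot of each page as it was last seen.

```python
import re
from collections import defaultdict

# Hypothetical pages standing in for what a crawler might fetch.
PAGES = {
    "https://example.com/a": "Court rules against search engine",
    "https://example.com/b": "Search engine appeals court ruling",
}

def tokenize(text):
    """Break a page into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def crawl(pages):
    """Build an inverted index (word -> URLs) and a cache of snapshots."""
    index = defaultdict(set)
    cache = {}
    for url, text in pages.items():
        cache[url] = text             # snapshot as last seen by the crawler
        for word in tokenize(text):
            index[word].add(url)      # cross-reference each word to its page
    return index, cache

def search(index, query):
    """Return the URLs that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

index, cache = crawl(PAGES)
print(sorted(search(index, "court")))   # pages found via the index
print(cache["https://example.com/a"])   # stored snapshot of one page
```

Note that the search is answered from the index alone, while the snapshot sits in a separate store. That separation mirrors the distinction the article draws between Google's index and its cache.
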
It is easy for a website to keep Googlebot or other search
engine robots away from all or particular pages. A convention
called the robots exclusion standard has existed since 1994.
Add ‘/robots.txt’ to the end of any site’s web address and
you’ll find that site’s instructions for search engines. Google
also offers a simple way to prevent a page being cached: just include
the word ‘NOARCHIVE’ in a robots meta tag in the code of a page.
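
As a rough sketch of how a well-behaved robot might honour both opt-outs, the Python below uses the standard library's `urllib.robotparser` and `html.parser` modules. The robots.txt rules, the example.com URLs, the sample page and the helper names `may_crawl` and `may_cache` are all hypothetical, chosen for illustration. The meta tag shown is the commonly documented form of the NOARCHIVE instruction, not something taken from the publishers' sites.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Googlebot out of /archive/, allow all else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /archive/

User-agent: *
Disallow:
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())

def may_crawl(agent, url):
    """True if the robots.txt rules allow this robot to fetch the URL."""
    return rules.can_fetch(agent, url)

class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))

def may_cache(html):
    """True unless the page carries a NOARCHIVE robots directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" not in parser.directives

# Hypothetical page that opts out of caching but not of indexing.
PAGE = ('<html><head><meta name="ROBOTS" content="NOARCHIVE"></head>'
        '<body>Story</body></html>')

print(may_crawl("Googlebot", "https://example.com/archive/story.html"))
print(may_crawl("Googlebot", "https://example.com/news/today.html"))
print(may_cache(PAGE))
```

The two checks are independent: a site can stay in the index while keeping its pages out of the cache, or shut a robot out of part of the site altogether.
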
When asked why her members’ news sites didn’t follow these steps
to exclude Google, Boribon replied, “then you admit that their
reasoning is correct.” She said all search engines should obtain
permission before indexing pages that carry copyright notices.
But the real reason for not opting out with a robots.txt file or
forbidding caching is that Belgium’s newspapers want to be
indexed by Google. “Yes, we have a problem with Google, but we
don’t want to be out of Google,” Boribon said. “We want Google to
respect the rules. If Google wanted to index us, they need to
ask.”
Copiepresse also wants Google to pay for indexing sites. Boribon
declined to discuss how or how much. “That has to be negotiated,”
she said.
The argument is not unique. The World Association of Newspapers
(WAN), which represents 18,000 newspapers in 102 countries, said in
January it would “explore ways to challenge the exploitation of
content by search engines without fair compensation to copyright
owners.”
At that time, WAN did not have a strategy for such a challenge.
Copiepresse did. It took direct action and convinced the Brussels
Court of First Instance to order Google to withdraw from its
sites all the articles and photographs of Copiepresse member sites.
Google was given 10 days to comply with the threat of a €1 million
fine for each day of delay.
Since the ruling, Google has pulled the plug on the news sites
in the lawsuit. They are not just missing from Google News Belgium,
they have disappeared from Google’s main index and cache too.
“They have done it to punish us,” said Boribon, who didn’t want
Google to go that far. “They have a bad attitude.” Yet Boribon went
on to complain that some of her members’ content can still be
accessed via Google News France. “They don’t apply the judgment
fully so we will ask for the fine,” she said.
Boribon does not seem to think she is cutting off her nose to
spite her face. “What I’m achieving now is getting all the
information to my European colleagues so we will have other
publishers taking part in the court case. Then maybe Google will
change its mind. If they see this is not a Belgian case but a
concern for all publishers all over the world, they will have to
review their business model.”
Her hope is that if enough publishers withdraw their content,
Google will have significantly less content to index – and that
will force it to the negotiating table.
Copiepresse is using the law as leverage in a commercial
argument: its content contributes to Google’s $10 billion a year in
revenue and newspapers want a cut. That argument should not focus
on Google News because Google News does not display ads. It is only
when newspapers’ pages appear in the results of the main search
engine that Google serves the ads that fuel the $125 billion
company.
Copiepresse told the court that Google damages the publishers’
ad revenue by bypassing their homepages. “We want search engines to
send people to our homepage,” Boribon said, explaining that only the
homepage always carries ads.
Google says its practices are lawful. It acts as an intermediary
that connects users to sites. Europe’s Copyright Directive and
E-commerce Directive recognise the role of intermediaries and
afford them special legal protection, including a special right for
intermediaries to cache material. Confusingly, however, Google’s
cache may not be what the lawmakers had in mind.
Internet service providers use caches to save bandwidth on
delivering frequently-accessed web pages. Rather than deliver a
live page, it is more efficient to deliver a cached copy to
customers. The customer will never know the difference because the
cached copy is updated when the live page changes. The E-commerce
Directive doesn’t distinguish internet service providers from
search engine service providers. Instead it says “a service
provider is not liable for the automatic, intermediate and
temporary storage of that information, performed for the sole
purpose of making more efficient the information’s onward
transmission to other recipients of the service”. There are other
conditions, including that “the provider does not modify the
information” and that “the provider complies with conditions on
access to the information”.
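
To make the contrast concrete, here is a minimal Python sketch of the ISP-style cache described above. It is an assumption-laden toy: the `TransparentCache` class, the version numbers and the sample URL are invented for illustration. Real HTTP caches rely on expiry headers and validators rather than a simple version counter.

```python
class TransparentCache:
    """Toy forward cache: serve a stored copy unless the origin has changed."""

    def __init__(self, origin):
        self.origin = origin       # hypothetical origin: url -> (version, body)
        self.store = {}            # cached copies: url -> (version, body)
        self.origin_fetches = 0    # full page bodies pulled from the origin

    def get(self, url):
        live_version = self.origin[url][0]     # cheap freshness check
        cached = self.store.get(url)
        if cached is None or cached[0] != live_version:
            self.origin_fetches += 1           # refetch only when stale
            self.store[url] = self.origin[url]
        return self.store[url][1]              # body is passed on unmodified

# Hypothetical origin site with a single page.
origin = {"https://example.com/story": (1, "first edition")}
isp_cache = TransparentCache(origin)

isp_cache.get("https://example.com/story")     # first request fills the cache
isp_cache.get("https://example.com/story")     # second is served from the cache
origin["https://example.com/story"] = (2, "second edition")
print(isp_cache.get("https://example.com/story"))
print(isp_cache.origin_fetches)
```

A cache of this kind never shows the customer an out-of-date page, whereas the snapshot in Google's cache deliberately preserves the page as it looked on the robot's last visit.
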
Google has explained the purpose of its cache before, when the
function was challenged in a US court in January. Google listed
three purposes for the Nevada District Court: it allows users to
view pages that the user cannot access directly, perhaps because
the destination site has gone down; it allows users to make
comparisons between a live and cached web page; and it allows users
to identify search query terms (which are highlighted wherever they
appear in the cached page). Copiepresse might argue that these
purposes go too far beyond the Directive’s “sole purpose of making
more efficient the information’s onward transmission to other
recipients of the service”.
Even the legality of the primary search function of a search
engine is open to question. The Directive’s condition that a
provider “does not modify the information” is arguably breached as
soon as a search engine breaks a page into tiny elements for
analysis and cross-referencing in its gigantic index. That argument
was not raised in court but would cut to the heart of almost any
search engine’s operation.
Google won the Nevada case. Its opponent, a lawyer called Blake
Field, had “decided to manufacture a claim for copyright
infringement against Google in the hopes of making money from
Google’s standard practice,” according to Judge Robert Jones. Field
knew how the system worked and he placed copyrighted articles on
his site, waiting for Google to find and cache his work. When it
did, he sued.
The court endorsed Google’s opt-out approach: because Field knew
about the robots protocol and the NOARCHIVE command, Field’s
conduct was interpreted by Judge Jones “as the grant of a licence
to Google for that use.”
Google could use the implied licence argument when the
Copiepresse case returns to court. The robots exclusion standard has
been around for 12 years; Google could argue acquiescence.
Field also argued that Google’s cache was not “intermediate and
temporary storage”, as required by a US law. Judge Jones said that
Google’s caching for approximately 14–20 days at a time is
temporary. That may or may not influence a European court if it has
to decide the same issue: the wording is common to laws on both
sides of the Atlantic.
If the legality of the cache is uncertain, the legality of
Google News is no clearer. The Belgian court heard that it is an
information portal, not a search engine. It uses 4,500
English-language news sources and a few hundred Belgian sources, in
many cases without prior permission. Google says that’s okay.
“Copyright law allows for snippets to be published from
results,” Google spokesman D-J Collins told OUT-LAW. “That’s why we
have argued that the court order was flawed. Google News does not
break copyright law.”
Copiepresse disagrees with Google’s view that snippets of text
are unprotected. Copyright only protects against substantial
copying; but publishers would argue that a snippet can be
substantial in a qualitative sense, just as courts will protect
short samples from songs. Google takes each story’s headline – the
craft of a subeditor – and sometimes the entire first sentence or
more from the intro, the most labour-intensive part of a
journalist’s writing. The legality has never been fully
resolved.
The publishers might also argue that thousands of snippets in
aggregate amount to substantial copying in a quantitative sense.
Google might counter that it is taking only one
snippet of each copyright work – i.e. its thousands of snippets
are from thousands of works, not one work.
The Belgian court found that Google had also infringed database
laws. The EU’s Database Directive says that the repeated and
systematic extraction of insubstantial parts of a database can
amount to infringement of a database right.
Some courts have characterised websites as databases and ruled
against sites that aggregate content. But that was before
controversial rulings by the European Court of Justice in 2004 over
the use of horseracing and football fixtures data.
The upshot: many databases are only protected if the owners do
not ‘create’ their own data but obtain the data from others.
Google told OUT-LAW that it does not believe that Google News
breaks this database law. It did not elaborate, but might argue
that a newspaper’s site is not a protected database because the
database right does not cover the investment in creating the news;
it would only cover the obtaining of news from others. It might say
that there is no systematic extraction of a single database; it is
systematic extraction from lots of databases. But publishers could
argue that news stories are not the same as raw facts such as when
two football teams will play each other; and that their websites
are not a mere byproduct of investment, unlike the databases in the
fixtures cases.
WAN and other publisher groups will watch the rematch between
Copiepresse and Google with interest. A week after the September
ruling they identified the strategy that they had been seeking
since January: the Automated Content Access Protocol, or ACAP.
A briefing paper was sent to OUT-LAW. It describes a system very
similar to the robots exclusion standard: “a standardised way of
describing the permissions which apply to a website or webpage so
that it can be decoded by a dumb machine without the help of an
expensive lawyer.”
Angela Mills, executive director of the European Publishers’
Council, told OUT-LAW: “This isn’t about blocking content, it’s
about enabling it but with more sophisticated rules than are
currently possible. Right now we can say ‘don’t index’ – but that’s
not sophisticated enough. It’s very boring to have the choice of
yes or no.”
ACAP might say that text can be taken but not images; or that
images can be taken on condition that the photographer’s name
appears. Demanding payment for indexing might also be part of the
protocol, said Mills.
The plan is for ACAP to be a voluntary system. “If people wanted
to ignore the rights expression they could,” Mills said, “but that
obviously puts them in a much weaker position if challenged in
court.”
When asked what it thought of ACAP, Google’s Collins told
OUT-LAW, “We welcome any initiative that enables search engines and
publishers to work together more closely. We look forward to
discussing this proposal with the WAN and in particular how it can
build on robots.txt”. But asked if Google would pay publishers to
index their content, Collins replied, “That’s not something we
do.”
This feature, by OUT-LAW Editor Struan
Robertson, originally appeared in Issue 15 of
OUT-LAW Magazine. If you don't already receive the 16-page
Magazine, you can get a free
subscription. Contact: struan.robertson@out-law.com.