Preventing Image Hotlinking (translation) » 荒野无灯weblog


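Author: Ken Coar

Translator: shanji

My skills are limited, so there are bound to be mistranslations, omissions, and clumsy phrasings. This is offered in good faith; please forgive the flaws and point them out.

I'm posting this in the Basics section to borrow its traffic and as a treat for newcomers, so please go easy on me. After it has been up here for a few days I'll repost it to the Apache section.

Please credit the source when reposting.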

===============================================

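Webmasters are forever looking for ways to make their sites look cool and attractive. One way is to dress them up with images, logos, and other graphics, sometimes called "eye candy." Of course, if you are at the forefront of this in any way, you run the risk of other people lifting your art to dress up their own sites, and they probably won't ask your permission or pay you a royalty.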

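This article shows how to use Apache configuration directives to limit access to your images, making them harder to use elsewhere.

The Problem

Simply put, there are two kinds of "infringement" involved:

1. Someone uses an image link (an IMG tag) on their site to reference an image on yours.

2. Someone downloads an image from your site and uses a copy on their own.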

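The first kind not only uses your images to prettify someone else's site, it also hurts you more directly, because every visitor to their site makes your server deliver the images. Your log files fill up with access entries and your bandwidth gets used up, and you get no benefit from any of it. This kind of theft is almost completely preventable.

The second kind of theft is more insidious. The "borrower" doesn't make your server work any harder when the images are viewed, since they have been copied to the borrower's site, but you probably got no credit for the artwork, and you may not even know the theft happened. Because of the way the Web works, this kind of theft can't really be prevented, but you can at least make it a little more difficult.

You can't completely prevent either kind, of course, but you can make both harder to carry out.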

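Identifying the Files to Protect

You probably don't want to protect every document on your site. Even if you do, for the purposes of this article I'll assume you only want to protect your artwork. So how do you make the rules apply only to those files? Use directives such as the following in your server configuration files: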

<FilesMatch "\.(gif|jpg)$">
[limiting directives go here]
</FilesMatch>

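You can put a container like this inside a <Directory> container, inside a <VirtualHost> container, outside any container at all (in which case it applies to all such files on your server), or even in .htaccess files. Wherever you put it, make sure it covers what you want protected.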
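To make the placement options concrete, here is a minimal sketch that nests the container inside a <Directory> section within a <VirtualHost>. The port 80 and the directory path /var/www/images are hypothetical placeholders, not taken from the article, and the limiting directives are still left as a placeholder comment.

```apache
# Hypothetical layout: the port and the directory path are placeholders.
<VirtualHost *:80>
    ServerName my.apache.org
    <Directory "/var/www/images">
        # Only files whose names end in .gif or .jpg are affected.
        <FilesMatch "\.(gif|jpg)$">
            # [limiting directives go here]
        </FilesMatch>
    </Directory>
</VirtualHost>
```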

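The Key: the Referer Header Field

Down on the wire, where browsers, spiders, and servers live, every request for a Web page includes a component called the HTTP request header. It carries information about the request, such as the user's preferred languages and the document types the client can handle, and, not least, the name of the item being requested. This information is conveyed as a series of name/value pairs called header fields.

One of these header fields is especially important for our purposes. It's called the Referer field (yes, I know it's misspelled, but it's misspelled that way in the definition, too), and it gives the URL of the client's previous page if and only if the client is following a link. That is, if you are viewing page A and click a link to page B, the request for page B will include a Referer field that says, "I'm following a link on page A." If no link is being followed, for example if the user simply typed B's URL into the browser's address bar, the request header will contain no Referer field. (An IMG tag counts as such a link: when a page embeds an image, the browser's request for that image normally carries the page's URL as its Referer.)

How does this help? It gives us a way to tell whether an image is being requested because one of our own pages links to it, or because someone else's page does.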
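If you want to see which Referer values your server is actually receiving, you can log them. This sketch uses the widely used "combined" log format (the Common Log Format plus the Referer and User-Agent request headers); %{Referer}i is mod_log_config's syntax for logging an incoming request header. The log file name logs/access_log is an assumption about your setup.

```apache
# Log each request together with its Referer and User-Agent headers.
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
CustomLog logs/access_log combined
```

A request that arrived without a Referer header is logged with "-" in that field, which corresponds to the typed-in-URL case described above.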

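Using SetEnvIf to "Tag" Images

For a simple case, suppose our site's main page is <http://my.apache.org/>. In this case we want to restrict any artwork requests that don't originate on our site (that is, allow them only if the image was linked to by one of our pages). We can do this by using an environment variable (also called an envariable) as a flag and setting it when the conditions are right. Something like the following should do it: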

SetEnvIfNoCase Referer "^http://my\.apache\.org/" local_ref=1

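When Apache processes a request, it examines the Referer field in the header and sets the environment variable local_ref to 1 if the value starts with our site's address, that is, if the referring page is one of ours.

The string inside the quotation marks is a regular expression that the Referer value must match for the environment variable to be set. Describing how to use regular expressions (REs) is far beyond the scope of this article; for now, just be aware that the SetEnvIf* directives use them.

The "NoCase" part of the directive name means: do this whether the Referer is 'http://my.apache.org/', 'http://My.Apache.Org/', or 'http://MY.APACHE.ORG/'. In other words, ignore the upper/lower case of the value.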
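The pattern above matches only plain http:// references to exactly my.apache.org. If your pages are also served over HTTPS or under a www. host name (an assumption about your site, not something the article covers), a slightly broader regular expression along these lines would flag those references too:

```apache
# Matches http:// or https://, with or without a leading "www.", in any letter case.
SetEnvIfNoCase Referer "^https?://(www\.)?my\.apache\.org/" local_ref=1
```

Keep the trailing slash in the pattern: without it, a referring host such as my.apache.org.example.net would also match.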

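Using Envariables in Access Control

The Order, Allow, and Deny directives let us control access to documents based on whether an envariable is set. The first thing to do is specify the order in which Apache processes the Allow and Deny directives, using the Order directive as follows: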

Order Allow,Deny

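This means that Apache goes through any Allow directives that apply to the current request, then repeats the process with any Deny directives. With this ordering the default condition is "denied": no one can access anything unless there's an applicable Allow directive.

All right, let's add the directive that lets local references work: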

Order Allow,Deny
Allow from env=local_ref

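This lets a request proceed if the local_ref envariable is set (to any value whatsoever). All other requests are denied, because they don't meet the Allow condition and the default is to deny access.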
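For comparison, here is a sketch of the same policy written with the opposite ordering. With Order Deny,Allow the default is to allow, and a matching Allow overrides a matching Deny, so you deny everyone explicitly and then let the locally referred requests back in:

```apache
# Default is "allow", so everyone must be denied explicitly first.
Order Deny,Allow
Deny from all
# In this ordering a matching Allow wins over the Deny.
Allow from env=local_ref
```

Both versions admit exactly the requests that carry local_ref; Order Allow,Deny is simply shorter here because it denies by default.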

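Note: Please don't fall into the trap of sprinkling your .htaccess and server configuration files with <Limit> containers. You almost certainly don't need them, and they will only confuse the issue. Don't use them unless you really want, for instance, GET requests to be treated differently from POST requests.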
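Purely as an illustration of the exception the note mentions, this is the kind of case where a <Limit> container does belong: a restriction that should apply to one HTTP method only. Restricting POST with the local_ref flag is a hypothetical choice made for the example.

```apache
# Only POST requests are subject to these rules; GET and all other
# methods are not restricted by this container.
<Limit POST>
    Order Allow,Deny
    Allow from env=local_ref
</Limit>
```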

Putting It All Together

Putting all these pieces together, we end up with a stanza of directives that looks something like this:

SetEnvIfNoCase Referer "^http://my\.apache\.org/" local_ref=1
<FilesMatch "\.(gif|jpg)$">
Order Allow,Deny
Allow from env=local_ref
</FilesMatch>

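These can all go in your server-wide configuration files (e.g., httpd.conf), or you can put the <FilesMatch> container in one or more .htaccess files. The effect is the same: within the scope of these directives, images can be fetched only if they were linked to from one of your pages.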
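In Apache 2.4, Order and Allow survive only in the compatibility module mod_access_compat; the native equivalent is the Require directive from mod_authz_core. The sketch below rewrites the stanza for 2.4 and adds one common refinement, under the assumption that you are willing to admit requests that carry no Referer at all (a typed-in image URL, or software that suppresses the header), which the stanza above would otherwise refuse:

```apache
# Requests referred by our own pages get the flag...
SetEnvIfNoCase Referer "^http://my\.apache\.org/" local_ref=1
# ...and so do requests that send no Referer (or an empty one).
SetEnvIf Referer "^$" local_ref=1
<FilesMatch "\.(gif|jpg)$">
    # Apache 2.4+: grant access only when local_ref is set.
    Require env local_ref
</FilesMatch>
```

The trade-off is that a hotlinked image will also load for visitors whose browsers send no Referer, so this loosens the protection slightly in exchange for not breaking your own pages for visitors whose browsers omit the header.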

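Note: In Apache 1.3.12 and earlier, the SetEnvIf* directives are allowed only in the server-wide configuration files. In later versions they can also be used inside containers and in .htaccess files.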
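If you take the .htaccess route, the server must allow those directives to be overridden there: the SetEnvIf* directives require AllowOverride FileInfo, and Order/Allow/Deny require AllowOverride Limit. A sketch of both halves, assuming a hypothetical image directory /var/www/images and a version newer than 1.3.12:

```apache
# In httpd.conf: let .htaccess files under this (hypothetical) directory
# use SetEnvIf* (FileInfo) and Order/Allow/Deny (Limit).
<Directory "/var/www/images">
    AllowOverride FileInfo Limit
</Directory>

# In /var/www/images/.htaccess:
SetEnvIfNoCase Referer "^http://my\.apache\.org/" local_ref=1
<FilesMatch "\.(gif|jpg)$">
    Order Allow,Deny
    Allow from env=local_ref
</FilesMatch>
```

If you use the Apache 2.4 Require directive in .htaccess instead of Order/Allow, the override class it needs is AuthConfig rather than Limit.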

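Going Further

I mentioned earlier that you can't completely prevent image theft. That's because of two things, which correspond roughly to the two kinds of poaching:

* Someone who really wants your artwork can always request it with a faked Referer value that happens to meet your criteria; in other words, by rigging the request so that it looks like a reference from your site.

* If someone legitimately views your artwork by going through your pages, the image files are almost certainly somewhere in his client's cache, so he can pull them out of the cache rather than making another request just to get the images.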

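Although it's essentially impossible to foil someone who is truly desperate to snitch your artwork, the steps described in this article should make it too difficult for the casual poacher.

Another thing you can do, depending on how protective you are of your art, is to watermark your images. Watermarking a digital image means encoding a special "signature" into the graphic so that it can be detected later. Digital watermarking doesn't degrade the quality of the image, and it can be done in such a way that even a cropped portion of the image still contains the mark, and the mark remains detectable even if the image has been otherwise edited since it was inserted. It's even possible to detect a watermark in an image that was printed and then scanned back in, having left the digital realm altogether. If you watermark your images, there's an excellent chance you'll be able to prove theft if you ever find a suspicious image on another site.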

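Logging Snitch-Attempt Requests

If you're not sure whether anyone is really after your artwork, you can use the same detection mechanism and envariables to log suspicious requests. For example, if you add the following directives to your httpd.conf file, an entry will be written to the file /usr/local/web/apache/logs/poachers_log whenever someone accesses one of your images without a valid Referer: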

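SetEnvIfNoCase Referer      "^http://my\.apache\.org/" local_ref=1
SetEnvIfNoCase Request_URI  "\.(gif|jpg)$"              is_image=1
RewriteEngine  On
RewriteCond    %{ENV:local_ref} !=1
RewriteCond    %{ENV:is_image}  =1
RewriteRule    .*               -     [Last,Env=poach_attempt:1]
CustomLog logs/poachers_log     common  env=poach_attempt

This should log every attempt to hotlink your images in the way described in this article. The first two lines set flags for the two conditions (that the request was referred by a local page, and that it is for an image). The RewriteCond lines check that the first flag is not set and the second one is; the RewriteRule line then sets a third flag combining the two, and the last line logs the request to a dedicated file whenever that third flag is set. The entries are written in the "common" format, i.e. the Common Log Format (CLF), which the stock httpd.conf defines with a LogFormat directive, but you can just as easily define your own format.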
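Along those lines, if you also want to know where the suspicious requests came from, you could define a format that records the Referer header. The nickname poacher is hypothetical, and the CustomLog line shown would take the place of the CustomLog line in the stanza above:

```apache
# Client address, time, request line, status, and the offending Referer.
LogFormat "%h %t \"%r\" %>s \"%{Referer}i\"" poacher
CustomLog logs/poachers_log poacher env=poach_attempt
```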

Other Resources

The techniques described in this article are geared toward a single purpose, but they illustrate some of the capabilities of the Apache server. Here are some pointers to resources for further investigation:

* The HTTP/1.1 definition document: <URL:ftp://ftp.isi.edu/in-notes/rfc2616.txt>
* The main Apache Web site, of course: <URL:http://www.apache.org/>
* The documentation for Apache and its modules: <URL:http://www.apache.org/docs/>
* The canonical email response page: <URL:http://www.apache.org/foundation/email-response.html>

(This page is normally used to respond to email requests for support, but it lists lots of good resources.)

Then there is the detailed documentation for the Apache directives and commands discussed in this article:

* The <FilesMatch> documentation: <URL:http://www.apache.org/docs/mod/core.html#filesmatch>
* The mod_setenvif documentation: <URL:http://www.apache.org/docs/mod/mod_setenvif.html>
* The mod_access documentation: <URL:http://www.apache.org/docs/mod/mod_access.html>
* The mod_rewrite documentation: <URL:http://www.apache.org/docs/mod/mod_rewrite.html>
* The documentation on the CustomLog directive: <URL:http://www.apache.org/docs/mod/mod_log_config.html>

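Conclusion

Custom artwork is the product of someone's effort, and taking something another person created without permission is generally considered theft. This article has described a basic way to put your works of art behind a velvet rope, if you're so inclined. It won't stop determined thieves, but it should hinder or dissuade the more casual ones.

Got a Topic You Want Covered?

If you have a particular Apache-related topic you'd like covered in a future article in this column, please let me know; drop me an email at <[email protected]>. I do read and answer my email, usually within a few hours (although a few days may pass if I'm traveling or my mail volume is way up). If I don't respond within what seems a reasonable amount of time, feel free to ping me again.

About the Author

Ken Coar is a member of the Apache Group and a director and vice president of the Apache Software Foundation. He is also a core member of the Jikes open-source Java compiler project, a contributor to the PHP project, the author of Apache Server for Dummies, and a contributing author to Apache Server Unleashed. He can be reached via email at <[email protected]>.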

===============================================

Appendix: the original English text

Keeping Your Images from Adorning Other Sites
Webmasters are ever searching for ways to make their sites look cool and attractive. One way is to dress it up with images, logos, and other graphics, sometimes referred to as ‘eye candy.’ Of course, if you happen to be in the forefront of this in any way, you run the risk of having others cadge your art in order to dress up their sites. And they probably won’t even ask permission or pay you a royalty, either.
This article shows how you can use Apache configuration directives to limit access to your art so that it’s more difficult to use elsewhere.
The Problem
Simply put, there are two types of ‘infringement’ involved here:

* Someone uses an IMG tag on its site to refer to a graphic on yours
* Someone downloads an image from your site and makes a copy on its

The first type not only causes your images to prettify someone else’s site, but hurts you more directly because visitors to their site are hammering yours to get the images. Your log files get filled with access request entries and your bandwidth gets used, and you’re getting no benefit from it. This type of theft is almost completely preventable.
The second type of theft is more insidious. The ‘borrower’ doesn’t cause your site to get pounded on for access to the images, since they’ve been copied to the borrower’s site, but you probably weren’t given any credit for the artwork, and you probably don’t even know the theft happened. Because of the way the Web works, this type of theft can’t really be prevented, but you can at least make it a little more difficult.
You can’t completely prevent either of these, of course, but you can make them more difficult to do.
Identifying the Files to Protect
You’re probably not going to want to protect every document on your site. Even if you do, for the sake of this article I’m assuming you only want to protect your artwork. So how do you indicate that the rules only apply to them? With directives such as the following in your server config files:
<FilesMatch "\.(gif|jpg)$">
[limiting directives will go here]
</FilesMatch>
You can put a container such as this inside a <Directory> container, or inside a <VirtualHost> container, or outside any containers at all (in which case it applies to all such files on your server), or even inside .htaccess files. Put it wherever it makes sense to protect what you want protected.
The Key: the Referer Header Field
Down on the wire, where the browsers, spiders, and servers live, every request for a Web page includes a component called the HTTP request header. This contains information about the request, such as the user’s preferred languages and the types of documents the client is able to handle, and, not least, the name of the item being requested. This information is conveyed in a series of name/value pairs called header fields.
One of these header fields is of particular importance to what we want to do. It’s called the Referer field (yes, I know, it’s misspelt, but that’s how it’s misspelt in the definition, too), and it indicates the URL of the client’s last page if and only if the client is following a link. That is, if you’re viewing page A, and click on a link to page B, the request for page B will include a Referer field that says “I’m following a link on page A.” If no link is being followed, such as if the user just typed B’s URL into the Location field of his browser, there will be no Referer field in the request header.
How does this help? Well, it gives us a way to tell whether an image is being requested because it was linked to by one of our pages, or by someone else’s.
Using SetEnvIf to ‘Tag’ Images
For a simple case, suppose our Web site’s main page is <http://my.apache.org/>. In this case, we want to restrict any artwork requests that don’t originate on our site (i.e., only allow them if the image was linked to by one of our pages). We can do this by using an environment variable (also called an envariable) as a flag, and setting it if the conditions are right. Something like the following ought to do it:
SetEnvIfNoCase Referer "^http://my\.apache\.org/" local_ref=1
When Apache processes a request, it will examine the Referer field in the header, and set the environment variable local_ref to “1” if the value starts with our site address, i.e., is one of our pages.
The string inside the quotation marks is a regular expression pattern that the value must match in order for the environment variable to be set. Describing how to use regular expressions (REs) is far beyond the scope of this article; for now, just be aware that the SetEnvIf* directives use them.
The “NoCase” portion of the directive name means “do this whether the Referer is ‘http://my.apache.org/’, or ‘http://My.Apache.Org/’, or ‘http://MY.APACHE.ORG/’”; in other words, ignore the upper/lower caseness of the value.
Using Envariables in Access Control
The Order, Allow, and Deny directives allow us to control access to documents based upon the setting (or unset-ness) of an envariable. The first thing to do is to indicate the order in which Apache will process Allow and Deny directives; you do this with the Order directive as follows:
Order Allow,Deny
This means that Apache will go through any list of Allow directives it has that apply to the current request, and then repeat the process with any Deny directives. With this ordering, the default condition is ‘denied’; that is, no-one will be able to access anything unless there’s an applicable Allow directive.
All right, so let’s add the directive that will let local references work:
Order Allow,Deny
Allow from env=local_ref
This will let a request proceed if the local_ref envariable is set (with any value whatsoever). Any and all other requests will be denied because they don’t meet the Allow conditions and the default is to deny access.

Note: Please don’t fall into the trap of sprinkling your .htaccess and server config files with <Limit> containers. You almost certainly don’t need them, and they’ll just confuse the issue. Don’t use them unless you really want to have GET requests treated differently from POST requests, for instance.
Putting It All Together
Putting all these pieces together, we end up with a stanza of directives that looks something like this:
SetEnvIfNoCase Referer "^http://my\.apache\.org/" local_ref=1
<FilesMatch "\.(gif|jpg)$">
Order Allow,Deny
Allow from env=local_ref
</FilesMatch>
These may all appear in your server-wide configuration files (e.g., httpd.conf), or you can put the <FilesMatch> container in one or more .htaccess files. The effect is the same: within the scope of these directives, images can only be fetched if they were linked to from one of your pages.

Note: In Apache 1.3.12 and earlier, the SetEnvIf* directives are only allowed in the server-wide configuration files. In later versions, they can be used inside containers and in .htaccess files.
Going Further
I mentioned earlier that you can’t fully prevent image theft. That’s because of two things, which apply pretty much to the two different types of poaching respectively:

* Someone who really wants your artwork can always request it using a faked-up Referer value that happens to meet your criteria. In other words, by jiggering up the request so it looks like it’s a reference from your site.
* If someone legitimately views your artwork by going through your pages, the image files are almost certainly in his client’s cache somewhere. So he can pull it out of a cached valid request rather than making another one just to pick up the image.

Though it’s essentially impossible to foil someone who’s really desperate to snitch your artwork, the steps described in this article should make it too difficult for the casual poacher.
Another thing you can do, depending upon how protective you are of your art, is to watermark the images. Watermarking a digital image consists of encoding a special ‘signature’ into the graphic so that it can be detected later. Digital watermarking doesn’t degrade the quality of the image, and can be done in such a way that even a cropped subsection of the image contains the mark, and it’s detectable even if the image has been otherwise edited since the mark was inserted. It’s even possible to detect a watermark in an image that was printed and then scanned in, having left the digital realm altogether! If you watermark your images, there’s an excellent chance you’ll be able to prove snitching if you ever find a suspicious image on another site somewhere.
Logging Snitch-Attempt Requests
If you’re not sure whether anyone is really after your artwork, you can use the same detection mechanism and envariable to log suspicious requests. For instance, if you add the following directives to your httpd.conf file, an entry will be made in the /usr/local/web/apache/logs/poachers_log file any time someone accesses one of your images without a valid Referer:
SetEnvIfNoCase Referer      "^http://my\.apache\.org/" local_ref=1
SetEnvIfNoCase Request_URI  "\.(gif|jpg)$"              is_image=1
RewriteEngine  On
RewriteCond    %{ENV:local_ref} !=1
RewriteCond    %{ENV:is_image}  =1
RewriteRule    .*               -     [Last,Env=poach_attempt:1]
CustomLog logs/poachers_log     common  env=poach_attempt
This should have the effect of logging all attempts to access your images using one of the potential ‘snitching’ techniques described in this article. The first two lines set flags for the conditions (that the request was referred by a local document, and that it’s an image), the RewriteCond lines check that the first flag is not set and the second one is, the RewriteRule line sets a third flag combining the two, and the last line causes the logging of the request in a special file if that last flag is set. The log entry is written in the ‘common’ format (the Common Log Format, or CLF) that the stock httpd.conf defines with a LogFormat directive, but you could put together your own format just as easily.
Other Resources
The techniques described in this article are geared toward a single purpose, but illustrate some of the capabilities of the Apache server. Here are some pointers to resources for further investigation:

* The HTTP/1.1 definition document: <URL:ftp://ftp.isi.edu/in-notes/rfc2616.txt>
* The main Apache Web site, of course: <URL:http://www.apache.org/>
* The documentation for Apache and its modules: <URL:http://www.apache.org/docs/>
* The canonical email response page: <URL:http://www.apache.org/foundation/email-response.html>
(This page is normally used to respond to email requests for support, but there are lots of good resources listed on it.)

Then there are the specific pieces of the Apache documentation that are directly related to the directives and commands described in this article:

* The <FilesMatch> documentation: <URL:http://www.apache.org/docs/mod/core.html#filesmatch>
* The mod_setenvif documentation: <URL:http://www.apache.org/docs/mod/mod_setenvif.html>
* The mod_access documentation: <URL:http://www.apache.org/docs/mod/mod_access.html>
* The mod_rewrite documentation: <URL:http://www.apache.org/docs/mod/mod_rewrite.html>
* The documentation on the CustomLog directive: <URL:http://www.apache.org/docs/mod/mod_log_config.html>

Conclusion
Custom artwork can result from someone’s effort, and taking without permission something that another has created is generally accepted as theft. This article has described a basic way to put your works of art behind a velvet rope, if you’re so inclined. It won’t stop determined thieves, but it should hopefully stymie or dissuade the more casual ones.
Got a Topic You Want Covered?
If you have a particular Apache-related topic that you’d like covered in a future article in this column, please let me know; drop me an email at <[email protected]>. I do read and answer my email, usually within a few hours (although a few days may pass if I’m travelling or my mail volume is ’way up). If I don’t respond within what seems to be a reasonable amount of time, feel free to ping me again.
About the Author
Ken Coar is a member of the Apache Group and a director and vice president of the Apache Software Foundation. He is also a core member of the Jikes open-source Java compiler project, a contributor to the PHP project, the author of Apache Server for Dummies, and a contributing author to Apache Server Unleashed. He can be reached via email at <[email protected]>.
