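(1) Validating whether an object has been updated:
HTTP offers two ways to check whether an object has changed: If-None-Match and If-Modified-Since. The client includes these headers in its request to ask the server. When a response carries an ETag header, the browser should revalidate with If-None-Match; when a response carries a Last-Modified header, the browser should revalidate with If-Modified-Since. The HTTP/1.1 specification recommends ETag (falling back to Last-Modified when ETag cannot be used), but in practice many modern servers still rely on Last-Modified. When the server sends both ETag and Last-Modified, the browser should send both If-None-Match and If-Modified-Since, and the server should check both headers; under RFC 2616 it may return a 304 response only if both conditions indicate that the object is unchanged.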
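The check described above can be sketched roughly as follows. This is a minimal illustration that assumes the RFC 2616 rule (every validator the client sent must match); the function name is_not_modified and the plain-dict header representation are made up for the example, and weak validators (W/ prefixes) are ignored for brevity.

```python
from email.utils import parsedate_to_datetime


def is_not_modified(request_headers, current_etag, current_last_modified):
    """Return True when a 304 response is consistent with every validator
    the client sent (hypothetical helper, not a real server API)."""
    if_none_match = request_headers.get("If-None-Match")
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_none_match is None and if_modified_since is None:
        return False  # unconditional request: always send the full object

    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if "*" not in candidates and current_etag not in candidates:
            return False  # the ETag changed, or this server has no ETag

    if if_modified_since is not None:
        if current_last_modified is None:
            return False
        since = parsedate_to_datetime(if_modified_since)
        modified = parsedate_to_datetime(current_last_modified)
        if modified > since:
            return False  # modified after the client's copy was fetched

    return True
```

Note that a server which has no ETag for the object (current_etag is None) can never satisfy an If-None-Match condition in this sketch, which is exactly the situation analyzed in section (10).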
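(2) Cache control:
1. Cache-control headers used in requests
Pragma: no-cache, kept for compatibility with early HTTP versions such as HTTP/1.0+.
Cache-Control: no-cache, meaning the client does not want a cached copy. This is only a wish; a cache device may ignore it.
Cache-Control: no-store, meaning devices between the client and the server must not cache the response and should delete any copy they already hold.
Cache-Control: only-if-cached, meaning the client accepts only content that is already cached.
2. Headers used in responses to control caching
Cache-Control: max-age=3600, how long (in seconds) the response may be cached, counted from the time it was received.
Cache-Control: s-maxage=3600, similar to the above, except that s-maxage is normally used by cache servers and applies only to shared (public) caches.
Expires: Fri, 05 Jul 2002 05:00:00 GMT is an absolute time expressed in GMT; this header is easily thrown off by an incorrect local clock.
Cache-Control: must-revalidate, meaning the content may be cached, but once it becomes stale the cache must revalidate it with the server before reusing it.
The Cache-Control directives and their meanings: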
Cache-Control header directives
| Directive | Message type | Description |
|---|---|---|
| no-cache | Request | Do not return a cached copy of the document without first revalidating it with the server. |
| no-store | Request | Do not return a cached copy of the document. Do not store the response from the server. |
| max-age | Request | The document in the cache must not be older than the specified age. |
| max-stale | Request | The document may be stale based on the server-specified expiration information, but it must not have been expired for longer than the value in this directive. |
| min-fresh | Request | The document's freshness lifetime must be at least its current age plus the specified amount. In other words, the response must remain fresh for at least the specified amount of time. |
| no-transform | Request | The document must not be transformed before being sent. |
| only-if-cached | Request | Send the document only if it is in the cache, without contacting the origin server. |
| public | Response | Response may be cached by any cache. |
| private | Response | Response may be cached such that it can be accessed only by a single client. |
| no-cache | Response | If the directive is accompanied by a list of header fields, the content may be cached and served to clients, but the listed header fields must first be removed. If no header fields are specified, the cached copy must not be served without revalidation with the server. |
| no-store | Response | Response must not be cached. |
| no-transform | Response | Response must not be modified in any way before being served. |
| must-revalidate | Response | Once stale, the response must be revalidated with the server before being served. |
| proxy-revalidate | Response | Shared caches must revalidate the response with the origin server before serving. This directive can be ignored by private caches. |
| max-age | Response | Specifies the maximum length of time the document can be cached and still considered fresh. |
| s-maxage | Response | Specifies the maximum age of the document as it applies to shared caches (overriding the max-age directive, if one is present). This directive can be ignored by private caches. |
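To make the response-side directives concrete, here is a small sketch of how a cache might turn these headers into a freshness lifetime. The helper names parse_cache_control and freshness_lifetime are invented for this example, headers are passed as a plain dict, and the naive comma split does not handle quoted field lists such as no-cache="Set-Cookie, X-Foo".

```python
from email.utils import parsedate_to_datetime


def parse_cache_control(value):
    """Split a Cache-Control value into {directive: argument}.
    Directives without an argument map to None."""
    directives = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.strip().lower()] = arg.strip().strip('"') if arg else None
    return directives


def freshness_lifetime(headers, shared_cache=False):
    """Return the freshness lifetime in seconds, or None when the response
    carries no explicit expiry information."""
    cc = parse_cache_control(headers.get("Cache-Control", ""))
    if shared_cache and cc.get("s-maxage") is not None:
        return int(cc["s-maxage"])      # s-maxage overrides max-age for shared caches
    if cc.get("max-age") is not None:
        return int(cc["max-age"])       # max-age takes precedence over Expires
    if "Expires" in headers and "Date" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        date = parsedate_to_datetime(headers["Date"])
        return max(0, int((expires - date).total_seconds()))
    return None
```

Measuring Expires against the response's own Date header, rather than the local clock, is one way to sidestep the clock-skew problem mentioned for Expires above.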
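(3) Two special HTTP methods: OPTIONS and TRACE
1. TRACE can be used to find out how many proxy servers sit between the client and the server, provided that the proxies support setting the Via header. Usage:
Send:
TRACE /tttt.gif HTTP/1.1
Host: www.sohu.com
The server returns the following headers: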
HTTP/1.0 200 OK
Date: Mon, 16 Mar 2009 11:47:52 GMT
Server: Apache/1.3.37 (Unix) mod_gzip/1.3.26.1a
Content-Type: message/http
X-Cache: MISS from 19709705.29867846.28603073.sohu.com
Via: 1.0 19709705.29867846.28603073.sohu.com:80 (squid)
Connection: close
The server returns the following body (it reflects the headers that the intermediate proxy sent to the origin web server, OWS):
TRACE / HTTP/1.0
Cache-Control: max-age=36288000
Connection: keep-alive
Host: www.sohu.com
Via: 1.1 19709705.29867846.28603073.sohu.com:80 (squid)
X-Forwarded-For: 58.31.225.229
This shows that the request passed through the proxy server 19709705.29867846.28603073.sohu.com, and that this server supports only HTTP/1.0.
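Counting the proxies then comes down to reading the Via header. A rough sketch follows; parse_via is a hypothetical helper, and it assumes that the parenthesized comments in the header contain no commas.

```python
def parse_via(value):
    """Parse a Via header value into a list of (protocol_version, host) hops,
    one entry per proxy that handled the message."""
    hops = []
    for entry in value.split(","):
        fields = entry.split()
        if len(fields) >= 2:
            # fields[0] is the protocol version, fields[1] the proxy host;
            # anything after that is an optional comment such as "(squid)".
            hops.append((fields[0], fields[1]))
    return hops


via = "1.0 19709705.29867846.28603073.sohu.com:80 (squid)"
hops = parse_via(via)
proxy_count = len(hops)
```

For the response shown above this yields a single hop that reports protocol version 1.0.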
2. OPTIONS can be used to find out which HTTP methods the server supports for a given object:
OPTIONS /ssss.gif HTTP/1.1
Host: www.sohu.com
The server responds:
HTTP/1.0 200 OK
Date: Mon, 16 Mar 2009 11:59:17 GMT
Server: Apache/1.3.37 (Unix) mod_gzip/1.3.26.1a
Cache-Control: max-age=5184000
Expires: Fri, 15 May 2009 11:59:17 GMT
Content-Length: 0
Allow: GET, HEAD, OPTIONS, TRACE
X-Cache: MISS from 32583031.43658676.41464477.sohu.com
Via: 1.1 32583031.43658676.41464477.sohu.com:80 (squid)
Connection: close
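(4) HTTP connection control:
HTTP connections can be handled in three ways: 1. serial connections, 2. parallel connections, 3. persistent connections.
Serial connections: a separate TCP connection is opened for each object, one after another, which adds a large amount of TCP setup and teardown time to the transfer.
Parallel connections: several TCP connections are opened at once and objects are transferred in parallel. The connection setup times overlap, so overall latency drops, but parallel connections place higher demands on client and server performance. The HTTP specification says a client should not open more than 2 parallel TCP connections, yet modern browsers in fact allow anywhere from 6 to 10.
Persistent connections:
Keeping the TCP connection open and transferring objects over it one after another effectively reduces the cost of TCP setup and the impact of TCP slow start.
The keep-alive concept was introduced in HTTP/1.0+ and became the persistent connection in HTTP/1.1. The difference is that in HTTP/1.0 keep-alive must be declared explicitly in a header, whereas in HTTP/1.1 persistence is the default behavior unless Connection: close explicitly asks for the connection to be closed.
Points to note when using keep-alive or persistent connections:
In HTTP/1.0 keep-alive must be declared explicitly, and every subsequent request on the same connection must also include it; otherwise the server assumes that the client wants to close the connection. The server can include a Connection header in its response to indicate whether it agrees to keep-alive or wants to close the connection.
A persistent connection requires each response to carry the correct entity length or to use chunked encoding; otherwise the following HTTP requests have no way of knowing whether the previous object has been fully transferred.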
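The difference between the HTTP/1.0 and HTTP/1.1 defaults can be captured in a few lines. This is only a sketch; connection_persists is an invented name and the headers are a plain dict.

```python
def connection_persists(http_version, headers):
    """Decide whether the connection stays open after this message."""
    tokens = {
        token.strip().lower()
        for token in headers.get("Connection", "").split(",")
        if token.strip()
    }
    if "close" in tokens:
        return False                   # either side may ask to close
    if http_version == "HTTP/1.1":
        return True                    # persistent by default
    return "keep-alive" in tokens      # HTTP/1.0 must opt in on every message
```

For example, connection_persists("HTTP/1.0", {}) is False, while the same call with "HTTP/1.1" is True.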
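(5) According to the HTTP specification, if a request contains no Accept-Encoding header, the server may assume that the client accepts any encoding (for example, gzip compression). Actual testing shows that this does not always hold.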
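Given that observation, a server may prefer to be conservative and compress only when the client asks for it. A minimal sketch under that assumption; both function names are made up, and q-value parsing is simplified to a single q= parameter.

```python
def accepted_codings(accept_encoding):
    """Return {coding: qvalue}, or None when the header is absent."""
    if accept_encoding is None:
        return None
    codings = {}
    for item in accept_encoding.split(","):
        name, _, params = item.strip().partition(";")
        if not name:
            continue
        q = 1.0
        params = params.strip()
        if params.lower().startswith("q="):
            q = float(params[2:])
        codings[name.strip().lower()] = q
    return codings


def may_send_gzip(accept_encoding, assume_any_when_absent=False):
    """Decide whether a gzip-encoded response is acceptable to the client."""
    codings = accepted_codings(accept_encoding)
    if codings is None:
        # The specification allows assuming "anything goes" here, but the
        # default of this sketch is the cautious choice.
        return assume_any_when_absent
    q = codings.get("gzip", codings.get("*", 0.0))
    return q > 0
```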
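(6) Chunked
This is a transfer encoding. Normally HTTP needs to know the size of an object before sending it, so that the receiver knows when the transfer ends. If the server cannot report the size of the object (for example, when the content is generated dynamically) and the connection is persistent, it must use chunked transfer. Once chunked is enabled (by setting Transfer-Encoding: chunked in the response headers), the object is split into several pieces for transmission; each piece states its own length, and a final piece of length 0 marks the end of the transfer.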
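The wire format is easy to show in code. The sketch below encodes and decodes a chunked body; encode_chunked and decode_chunked are illustrative names, and chunk extensions and trailers are deliberately not handled.

```python
def encode_chunked(pieces):
    """Encode an iterable of byte strings as a chunked message body."""
    out = bytearray()
    for piece in pieces:
        if not piece:
            continue  # a zero-length chunk would end the body early
        # Each chunk: size in hexadecimal, CRLF, the data, CRLF.
        out += f"{len(piece):X}".encode("ascii") + b"\r\n" + piece + b"\r\n"
    out += b"0\r\n\r\n"  # last chunk (size 0) followed by an empty trailer
    return bytes(out)


def decode_chunked(body):
    """Decode a chunked body back into the original bytes."""
    data = bytearray()
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        size = int(body[pos:line_end], 16)
        if size == 0:
            return bytes(data)
        start = line_end + 2
        data += body[start:start + size]
        pos = start + size + 2  # skip the CRLF that ends the chunk data


wire = encode_chunked([b"Hello, ", b"world!"])
# wire == b"7\r\nHello, \r\n6\r\nworld!\r\n0\r\n\r\n"
```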
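(7) Range requests
HTTP allows a client to request a specified range of a document. If an HTTP download fails partway for some reason, the client can send a Range header in its next request and resume the transfer where it stopped. Range is also widely used in P2P-style downloads, where the same content is fetched from several servers at once to speed up the download.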
GET /bigfile.html HTTP/1.1
Host: www.joes-hardware.com
Range: bytes=4000-
User-Agent: Mozilla/4.61 [en] (WinNT; I)
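Including Range: bytes=4000- in the request headers means that 4000 bytes have already been downloaded, so this request only needs to start from byte offset 4000.
In a response, the server can set Accept-Ranges: bytes to indicate that it accepts range requests and that the unit of measurement is bytes.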
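On the server side, a range request boils down to resolving the offsets and slicing the body. The following is a sketch only: the function names are invented, just a single byte range is supported, and anything it cannot parse is answered with 416, whereas a real server would typically ignore a Range header it does not understand and send the full 200 response.

```python
def parse_byte_range(range_header, total_length):
    """Resolve 'bytes=first-last' to inclusive (first, last) offsets,
    or return None if the range cannot be satisfied."""
    unit, _, spec = range_header.partition("=")
    if unit.strip().lower() != "bytes" or "," in spec:
        return None
    first_text, sep, last_text = spec.strip().partition("-")
    if not sep:
        return None
    if first_text == "":              # suffix form: bytes=-500 (the last 500 bytes)
        if not last_text.isdigit():
            return None
        length = min(int(last_text), total_length)
        first, last = total_length - length, total_length - 1
    else:
        if not first_text.isdigit():
            return None
        first = int(first_text)
        if last_text == "":           # open-ended form: bytes=4000-
            last = total_length - 1
        elif last_text.isdigit():
            last = min(int(last_text), total_length - 1)
        else:
            return None
    if first > last:
        return None
    return first, last


def partial_response(body, range_header):
    """Return (status, headers, payload) for a range request on body."""
    resolved = parse_byte_range(range_header, len(body))
    if resolved is None:
        return 416, {"Content-Range": f"bytes */{len(body)}"}, b""
    first, last = resolved
    headers = {
        "Accept-Ranges": "bytes",
        "Content-Range": f"bytes {first}-{last}/{len(body)}",
    }
    return 206, headers, body[first:last + 1]
```

With a 5120-byte body, the request header from the example above (Range: bytes=4000-) produces a 206 response whose Content-Range is bytes 4000-5119/5120.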
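(8) Delta Encoding
A way to reduce the amount of data HTTP transfers. Normally, once a document has been updated on the server, the server sends the whole new document the next time the client requests it. If only a small part of the document changed, retransmitting the complete document wastes resources. With delta encoding, HTTP transfers only the changed part. It works as follows:
1. In its first response the server includes an ETag header, a unique version identifier for the document.
2. In its next request the client includes an If-None-Match header to ask the server whether the document has been updated; it also sets the A-IM (Accept-Instance-Manipulation) header to show that it accepts delta encoding.
3. On receiving the request, the server finds that it holds a newer version of the document (because the document's ETag has changed). It therefore includes the IM, ETag, and Delta-Base headers in its response to tell the client how the document was updated: the value of the IM header names the delta algorithm used, the ETag header carries the new ETag, and Delta-Base identifies the version the delta was computed against (normally equal to the If-None-Match value in the request).
4. On receiving the response, the client runs the delta algorithm to update its local document and sets the local document's ETag to the new value.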
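The header negotiation in steps 2 and 3 can be sketched as below. This covers only the decision logic: delta_response and the make_delta callback are hypothetical, the delta bytes themselves are supplied by the caller, and q-values in A-IM are not parsed.

```python
def delta_response(request_headers, current_etag, make_delta, supported=("vcdiff",)):
    """Choose between 304, 226 IM Used, and a full 200 response.

    make_delta(base_etag, algorithm) is a hypothetical callback returning
    the delta bytes, or None if the server no longer has the base version.
    """
    base_etag = request_headers.get("If-None-Match")
    if base_etag == current_etag:
        return 304, {"ETag": current_etag}, b""   # nothing changed

    accepted = [
        name.strip().lower()
        for name in request_headers.get("A-IM", "").split(",")
        if name.strip()
    ]
    for algorithm in accepted:
        if base_etag and algorithm in supported:
            delta = make_delta(base_etag, algorithm)
            if delta is not None:
                headers = {
                    "ETag": current_etag,      # the new version
                    "IM": algorithm,           # how the delta was produced
                    "Delta-Base": base_etag,   # the version it applies to
                }
                return 226, headers, delta
    # No usable delta: the caller falls back to sending the whole document.
    return 200, {"ETag": current_etag}, None
```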
The headers used in delta encoding are:
Delta-encoding headers
| Header | Description |
|---|---|
| ETag | Unique identifier for each instance of a document. Sent by the server in the response; used by clients in subsequent requests in If-Match and If-None-Match headers. |
| If-None-Match | Request header sent by the client, asking the server for a document if and only if the client's version of the document is different from the server's. |
| A-IM | Client request header indicating types of instance manipulations accepted. |
| IM | Server response header specifying the type of instance manipulation applied to the response. This header is sent when the response code is 226 IM Used. |
| Delta-Base | Server response header that specifies the ETag of the base document used for generating the delta (should be the same as the ETag in the client request's If-None-Match header). |
The values that may appear in the A-IM and IM headers (that is, the algorithms available for delta encoding) are:
IANA registered types of instance manipulations
| Type | Description |
|---|---|
| vcdiff | Delta using the vcdiff algorithm |
| diffe | Delta using the Unix diff -e command |
| gdiff | Delta using the gdiff algorithm |
| gzip | Compression using the gzip algorithm |
| deflate | Compression using the deflate algorithm |
| range | Used in a server response to indicate that the response is partial content as the result of a range selection |
| identity | Used in a client request's A-IM header to indicate that the client is willing to accept an identity instance manipulation |
(9) HTTP status codes at a glance:
Status codes
| Status code | Reason phrase | Meaning |
|---|---|---|
| 100 | Continue | An initial part of the request was received, and the client should continue. |
| 101 | Switching Protocols | The server is changing protocols, as specified by the client, to one listed in the Upgrade header. |
| 200 | OK | The request is okay. |
| 201 | Created | The resource was created (for requests that create server objects). |
| 202 | Accepted | The request was accepted, but the server has not yet performed any action with it. |
| 203 | Non-Authoritative Information | The transaction was okay, except the information contained in the entity headers was not from the origin server, but from a copy of the resource. |
| 204 | No Content | The response message contains headers and a status line, but no entity body. |
| 205 | Reset Content | Another code primarily for browsers; basically means that the browser should clear any HTML form elements on the current page. |
| 206 | Partial Content | A partial request was successful. |
| 300 | Multiple Choices | A client has requested a URL that actually refers to multiple resources. This code is returned along with a list of options; the user can then select which one he wants. |
| 301 | Moved Permanently | The requested URL has been moved. The response should contain a Location URL indicating where the resource now resides. |
| 302 | Found | Like the 301 status code, but the move is temporary. The client should use the URL given in the Location header to locate the resource temporarily. |
| 303 | See Other | Tells the client that the resource should be fetched using a different URL. This new URL is in the Location header of the response message. |
| 304 | Not Modified | Clients can make their requests conditional by the request headers they include. This code indicates that the resource has not changed. |
| 305 | Use Proxy | The resource must be accessed through a proxy, the location of the proxy is given in the Location header. |
| 306 | (Unused) | This status code currently is not used. |
| 307 | Temporary Redirect | Like the 301 status code; however, the client should use the URL given in the Location header to locate the resource temporarily. |
| 400 | Bad Request | Tells the client that it sent a malformed request. |
| 401 | Unauthorized | Returned along with appropriate headers that ask the client to authenticate itself before it can gain access to the resource. |
| 402 | Payment Required | Currently this status code is not used, but it has been set aside for future use. |
| 403 | Forbidden | The request was refused by the server. |
| 404 | Not Found | The server cannot find the requested URL. |
| 405 | Method Not Allowed | A request was made with a method that is not supported for the requested URL. The Allow header should be included in the response to tell the client what methods are allowed on the requested resource. |
| 406 | Not Acceptable | Clients can specify parameters about what types of entities they are willing to accept. This code is used when the server has no resource matching the URL that is acceptable for the client. |
| 407 | Proxy Authentication Required | Like the 401 status code, but used for proxy servers that require authentication for a resource. |
| 408 | Request Timeout | If a client takes too long to complete its request, a server can send back this status code and close down the connection. |
| 409 | Conflict | The request is causing some conflict on a resource. |
| 410 | Gone | Like the 404 status code, except that the server once held the resource. |
| 411 | Length Required | Servers use this code when they require a Content-Length header in the request message. The server will not accept requests for the resource without the Content-Length header. |
| 412 | Precondition Failed | If a client makes a conditional request and one of the conditions fails, this response code is returned. |
| 413 | Request Entity Too Large | The client sent an entity body that is larger than the server can or wants to process. |
| 414 | Request URI Too Long | The client sent a request with a request URL that is larger than what the server can or wants to process. |
| 415 | Unsupported Media Type | The client sent an entity of a content type that the server does not understand or support. |
| 416 | Requested Range Not Satisfiable | The request message requested a range of a given resource, and that range either was invalid or could not be met. |
| 417 | Expectation Failed | The request contained an expectation in the Expect request header that could not be satisfied by the server. |
| 500 | Internal Server Error | The server encountered an error that prevented it from servicing the request. |
| 501 | Not Implemented | The client made a request that is beyond the server's capabilities. |
| 502 | Bad Gateway | A server acting as a proxy or gateway encountered a bogus response from the next link in the request response chain. |
| 503 | Service Unavailable | The server cannot currently service the request but will be able to in the future. |
| 504 | Gateway Timeout | Similar to the 408 status code, except that the response is coming from a gateway or proxy that has timed out waiting for a response to its request from another server. |
| 505 | HTTP Version Not Supported | The server received a request in a version of the protocol that it can't or won't support. |
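When working with these codes in scripts, the first digit already tells you the class of the response. A small sketch using Python's standard http.HTTPStatus enum; the describe_status helper and the class labels are my own naming, not part of the table above.

```python
from http import HTTPStatus

# The first digit of a status code identifies its class.
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe_status(code):
    """Return a one-line description such as '404 Not Found [Client Error]'."""
    status_class = STATUS_CLASSES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "(unregistered)"  # code not known to the standard library
    return f"{code} {phrase} [{status_class}]"
```

Reason phrases come from the Python standard library and may differ slightly from the older wording in the table for some codes.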
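(10) [Original] A case study: how a conflict between load balancing and the ETag header degraded caching:
Behind the load balancer sit the web servers, but they are heterogeneous: some run Linux and some run Windows.
The Linux servers send a Last-Modified header in their HTTP responses but no ETag header:
[Screenshot: linux.jpg, 2009-3-13 19:02]
The Windows servers send both Last-Modified and ETag headers in their responses:
[Screenshot: windows.jpg, 2009-3-13 19:02]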
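On the first visit to the site, image a was downloaded from a Windows server and image b from a Linux server.
On the second visit, the cache lifetime had already been exceeded. (Because the site's responses carry no explicit expiry information, only a Last-Modified header, the browser uses a heuristic to compute how long an object may be cached. Heuristic expiry relies on a coefficient; the 50% factor in the assembly policy on the WA controls exactly this.)
This time the browser's request for image a was routed to a Linux server, and the request for image b to a Windows server.
Because image a came with both an ETag and a Last-Modified value on its first download, the browser sent a conditional GET with two conditions on the second request, If-None-Match and If-Modified-Since. According to the HTTP specification, a 304 may be returned only if both conditions show that the object is unchanged. Unfortunately, the second request was routed to a Linux server, which does not set an ETag, so an image that could have been served from the local cache was downloaded again:
[Screenshot: 截图00.jpg, 2009-3-13 19:02]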
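Further analysis:
Using ETags turns out to be a bad thing in this setup:
Different servers compute different ETag values for the same object. If the ETag is used as the validation condition, the cache is easily invalidated once requests are load-balanced across different servers.
[Screenshot: 截图01.jpg, 2009-3-13 19:11]
In the screenshot above, the same image has different ETags on different servers, which forces a re-download.
Looking at the server selection, this time the image happened to be routed to another Windows server, so both the ETag and Last-Modified headers were present, and the modification time had not changed. Unfortunately, the ETag mismatch still caused the image to be downloaded again.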
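The failure can be reproduced with a toy model. The sketch below assumes that each backend derives its ETag from per-server file metadata such as an inode number; that is an assumption made for illustration, since the post does not say how its servers compute ETags. All function names and the sample values are hypothetical. A content-derived ETag is shown for contrast.

```python
import hashlib


def inode_style_etag(inode, size, mtime):
    """Hypothetical ETag built from file metadata that includes the inode,
    so it differs between servers even for byte-identical files."""
    return f'"{inode:x}-{size:x}-{mtime:x}"'


def content_etag(content):
    """Hypothetical ETag derived only from the content, identical everywhere."""
    return '"' + hashlib.sha256(content).hexdigest() + '"'


def revalidate(if_none_match, server_etag):
    """Status a backend returns for a conditional GET on an unchanged file."""
    if server_etag is not None and if_none_match == server_etag:
        return 304
    return 200


image = b"same bytes on every backend"
mtime = 1236942120  # identical modification time on both backends

# Backend 1 served the first download; backend 2 handles the revalidation.
etag_1 = inode_style_etag(inode=0x1A2B, size=len(image), mtime=mtime)
etag_2 = inode_style_etag(inode=0x9F00, size=len(image), mtime=mtime)

status_inode_etag = revalidate(etag_1, etag_2)   # 200: ETags differ per server
status_no_etag = revalidate(etag_1, None)        # 200: backend sends no ETag
status_content_etag = revalidate(content_etag(image), content_etag(image))  # 304
```

In this model the unchanged image is re-downloaded in the first two cases and served from cache only when every backend produces the same ETag, which suggests either making ETags consistent across the pool or dropping them and relying on Last-Modified.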
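(11) Using Yahoo's YSlow tool to analyze a site's HTTP optimizations:
Yahoo's web application development team both advocates and practices HTTP optimization. Drawing on years of experience, the team distilled a set of website optimization rules and turned them into a program. The program is popular with testers, and because it integrates with the powerful Firebug tool, it has become a valuable aid for developers and testers.
Installation:
1. Download and install the Firefox browser
2. Download and install the Firefox extension Firebug
3. Download and install YSlow
It is simple to use and works much like HttpWatch: when you open a website, the program analyzes and grades it automatically:
As the results show, the site's overall score is low, grade F (A is best). The report also lists the specific items that can be optimized and gives a grade for each one, for example whether the number of HTTP requests can be reduced further, which items could be served through a CDN, and which could use an Expires header or gzip compression. Judging from the results, the first 5 items all leave a lot of room for optimization. The details can be viewed by expanding the triangle arrow next to each item, for example the CDN section: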
Object expiry optimization:
Optimization of where certain objects are placed:
Gzip optimization