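The goal of caching in HTTP/1.1 is to eliminate the need to send requests in many cases, and to eliminate the need to send full responses in many other cases. In other words, the expiration mechanism lets the browser satisfy a request on its own, which cuts down on round trips. The validation mechanism avoids sending the full response, which cuts down on bandwidth.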

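The protocol makes some trade-offs around semantic transparency. In our scenario we can ignore them at first, but every response must explicitly specify either no-cache or an explicit caching policy. Still, what exactly do "transparent" and "non-transparent" operations mean here?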

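HTTP caching involves three roles: the client, the server, and the cache. Some Chinese translations render the protocol's "cache" as "cache server". I think the cache meant here is not a separate tier of cache servers in a web architecture (squid, etc.). It is the caching component inside an implementation, such as a browser's cache, although a proxy may contain one too (see the definition of cache below).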

 

If an origin server wishes to force a semantically transparent cache to validate every request, it MAY assign an explicit expiration time in the past. This means that the response is always stale, and so the cache SHOULD validate it before using it for subsequent requests. See section 14.9.4 for a more restrictive way to force revalidation.

If an origin server wishes to force any HTTP/1.1 cache, no matter how it is configured, to validate every request, it SHOULD use the "must-revalidate" cache-control directive (see section 14.9).

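These two paragraphs explain how, in the expiration model, to force the client to validate every request. An explicit expiration time in the past only "recommends" validation (the cache SHOULD validate). The must-revalidate Cache-Control directive "forces" it. In our scenario I think we should use the latter, so that content that must not be served from cache unchecked is guaranteed to be validated every time. Note that must-revalidate only takes effect once the response is stale, so it is normally paired with max-age=0 or an Expires date in the past.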
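Here is a minimal sketch of the forcing approach. It assumes a Python backend that sets response headers as a plain dict, and the function name is hypothetical. max-age=0 makes the response stale immediately, and must-revalidate forbids reusing it without checking with the origin server. The past Expires date is there for older HTTP/1.0 caches that may not understand Cache-Control.

```python
from email.utils import formatdate


def force_revalidation_headers() -> dict[str, str]:
    """Hypothetical helper: headers asking every cache to revalidate before reuse."""
    return {
        # Stale immediately, and must not be served stale without revalidation.
        "Cache-Control": "max-age=0, must-revalidate",
        # An explicit expiration time in the past (the Unix epoch as an HTTP-date),
        # for caches that only look at Expires.
        "Expires": formatdate(0, usegmt=True),
    }


if __name__ == "__main__":
    for name, value in force_revalidation_headers().items():
        print(f"{name}: {value}")
```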

Servers specify explicit expiration times using either the Expires header, or the max-age directive of the Cache-Control header. The counterpart to explicit expiration is heuristic expiration: the cache estimates a plausible expiration time from values such as the Last-Modified time.
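The order in which a cache picks the freshness lifetime (RFC 2616 section 13.2.4) can be sketched in Python as below. The helper names are hypothetical, and the sketch assumes a Date header is present. For heuristic expiration, 10% of the time since Last-Modified is only the "typical setting" the RFC suggests, not a fixed rule, so it is left as a parameter.

```python
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_cache_control(value: str) -> dict[str, Optional[str]]:
    """Split e.g. 'max-age=60, must-revalidate' into {'max-age': '60', 'must-revalidate': None}."""
    directives: dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def freshness_lifetime(headers: dict[str, str], shared_cache: bool = False,
                       heuristic_fraction: float = 0.1) -> float:
    """Freshness lifetime in seconds, following the precedence of RFC 2616 13.2.4."""
    cc = parse_cache_control(headers.get("Cache-Control", ""))
    # s-maxage only applies to shared (proxy) caches and overrides max-age there.
    if shared_cache and cc.get("s-maxage") is not None:
        return float(cc["s-maxage"])
    if cc.get("max-age") is not None:
        return float(cc["max-age"])
    # Assumption: a Date header is present. Expires and Last-Modified are
    # measured against it, so only the server's clock is involved.
    try:
        date = parsedate_to_datetime(headers["Date"])
        if "Expires" in headers:
            expires = parsedate_to_datetime(headers["Expires"])
            return max(0.0, (expires - date).total_seconds())
        if "Last-Modified" in headers:
            # Heuristic expiration: a fraction of the time since the last modification.
            last_modified = parsedate_to_datetime(headers["Last-Modified"])
            return max(0.0, heuristic_fraction * (date - last_modified).total_seconds())
    except (TypeError, ValueError):
        # An unparseable date (e.g. "Expires: 0") is treated as already expired.
        return 0.0
    return 0.0
```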

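The expiration model involves two quantities: the cache age (current age) and the freshness lifetime. Computing the cache age uses the Date response header and the Age header, which intermediate proxies may set. It also uses the local times at which the request was sent and the response was received. I find this calculation somewhat confusing and am leaving it as an open question for now. The freshness lifetime is computed from Expires, Cache-Control: max-age, or Cache-Control: s-maxage. It is based on the server's clock and does not depend on the client's local time. To decide whether a cached response has expired, you only need to compare its cache age with its freshness lifetime.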
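For reference, RFC 2616 section 13.2.3 spells out the age arithmetic step by step. Below is a direct Python transcription. It assumes every time has already been converted to seconds since the epoch: date_value comes from the server's Date header, and the other timestamps come from the client's own clock. The function names are illustrative.

```python
def current_age(date_value: float, age_value: float, request_time: float,
                response_time: float, now: float) -> float:
    """Current age of a cached response in seconds (RFC 2616 section 13.2.3).

    date_value comes from the origin server's Date header; request_time,
    response_time and now are read from the cache's own local clock.
    """
    # How old the response looked on arrival, judged by the server's Date.
    apparent_age = max(0.0, response_time - date_value)
    # Trust the Age header from an intermediate proxy if it claims more.
    corrected_received_age = max(apparent_age, age_value)
    # Conservatively add the whole request/response round trip.
    response_delay = response_time - request_time
    corrected_initial_age = corrected_received_age + response_delay
    # Time the entry has spent sitting in this cache.
    resident_time = now - response_time
    return corrected_initial_age + resident_time


def is_fresh(freshness_lifetime: float, age: float) -> bool:
    # The whole expiration check: fresh while the lifetime exceeds the age.
    return freshness_lifetime > age


if __name__ == "__main__":
    age = current_age(date_value=1001, age_value=0, request_time=1000,
                      response_time=1002, now=1062)
    print(age, is_fresh(60, age))
```

Consider a request sent at t=1000. The response arrives at t=1002 with Date 1001 and no Age header, and the entry is read back at t=1062. The age is then 1 + 2 + 60 = 63 seconds, so a max-age=60 entry is already stale. Only date_value comes from the server. Clock skew between client and server is absorbed by taking the larger of the apparent age and the Age header.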

In the expiration model, the server only has to supply an Expires time or max-age. The client decides whether the cached copy has expired.

 

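The other is the validation model, which distinguishes strong validators from weak ones. Our application should use strong validation, meaning any change to the content invalidates the cache. Baidu's home page does send an ETag, yet it is obviously served by a distributed set of servers, so is the ETag generated by some deterministic rule? In this model the server must generate an ETag and a Last-Modified time. When a request carries If-Modified-Since, If-None-Match, or similar headers, the server checks whether the cached copy is still valid and responds with either 304 or the complete body.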
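One answer to the distributed-server question is to derive the ETag purely from the response bytes. That way every server holding identical content produces the same strong ETag. This is an assumption here, not necessarily what Baidu does. Below is a minimal Python sketch of the server side with hypothetical function names. It follows the RFC 7232 rule that If-Modified-Since is only consulted when If-None-Match is absent.

```python
import hashlib
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def make_etag(body: bytes) -> str:
    # Hypothetical scheme: a strong ETag computed only from the content,
    # so any server with the same bytes returns the same value.
    return '"' + hashlib.sha256(body).hexdigest()[:32] + '"'


def conditional_response(request_headers: dict[str, str], body: bytes,
                         last_modified: datetime) -> tuple[int, dict[str, str], bytes]:
    """Return (status, headers, body); last_modified must be an aware UTC datetime."""
    etag = make_etag(body)
    headers = {
        "ETag": etag,
        "Last-Modified": format_datetime(last_modified, usegmt=True),
    }
    inm = request_headers.get("If-None-Match")
    ims = request_headers.get("If-Modified-Since")
    if inm is not None:
        # Exact (strong) comparison against each listed tag; "*" matches anything.
        tags = [t.strip() for t in inm.split(",")]
        if "*" in tags or etag in tags:
            return 304, headers, b""
    elif ims is not None:
        try:
            since = parsedate_to_datetime(ims)
        except (TypeError, ValueError):
            since = None  # an unparseable date is ignored
        # HTTP-dates have one-second resolution, so drop microseconds first.
        if since is not None and last_modified.replace(microsecond=0) <= since:
            return 304, headers, b""
    return 200, headers, body


if __name__ == "__main__":
    lm = datetime(2024, 1, 1, tzinfo=timezone.utc)
    status, hdrs, _ = conditional_response({}, b"hello", lm)
    print(status, hdrs)
    print(conditional_response({"If-None-Match": hdrs["ETag"]}, b"hello", lm)[0])
```

The 304 response still carries the ETag and Last-Modified headers, so the client can update its stored validators.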

 

A few terms need to be pinned down:

user agent: The client which initiates a request. These are often browsers, editors, spiders (web-traversing robots), or other end user tools. (e.g., browsers, SDKs)

server: An application program that accepts connections in order to service requests by sending back responses. Any given program may be capable of being both a client and a server; our use of these terms refers only to the role being performed by the program for a particular connection, rather than to the program’s capabilities in general. Likewise, any server may act as an origin server, proxy, gateway, or tunnel, switching behavior based on the nature of each request.

origin server: The server on which a given resource resides or is to be created. (e.g., a web server)

proxy: An intermediary program which acts as both a server and a client for the purpose of making requests on behalf of other clients. Requests are serviced internally or by passing them on, with possible translation, to other servers. A proxy MUST implement both the client and server requirements of this specification. A “transparent proxy” is a proxy that does not modify the request or response beyond what is required for proxy authentication and identification. A “non-transparent proxy” is a proxy that modifies the request or response in order to provide some added service to the user agent, such as group annotation services, media type transformation, protocol reduction, or anonymity filtering. Except where either transparent or non-transparent behavior is explicitly stated, the HTTP proxy requirements apply to both types of proxies.

cache: A program's local store of response messages and the subsystem that controls its message storage, retrieval, and deletion. A cache stores cacheable responses in order to reduce the response time and network bandwidth consumption on future, equivalent requests. Any client or server may include a cache, though a cache cannot be used by a server that is acting as a tunnel. (e.g., browser caches, proxy caches)

Three things to do now:

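1. For content that must not be cached, explicitly send Cache-Control: no-store to forbid the client from caching it.

2. Support the expiration model (Expires / max-age).

3. Support the ETag + Last-Modified validation model.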
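The three tasks could share one header-building helper. The sketch below is hypothetical. The CachePolicy names are assumptions, and so is the choice of Cache-Control: no-cache for the validation-only case, which lets the response be stored but requires revalidation before every reuse. The helper only produces headers; the conditional 304 handling still has to happen per request.

```python
import time
from email.utils import formatdate
from enum import Enum
from typing import Optional


class CachePolicy(Enum):
    NO_STORE = "no-store"   # task 1: content that must never be cached
    EXPIRE = "expire"       # task 2: expiration model (max-age + Expires)
    VALIDATE = "validate"   # task 3: always revalidate via ETag / Last-Modified


def cache_headers(policy: CachePolicy, max_age: int = 0,
                  etag: Optional[str] = None,
                  last_modified: Optional[str] = None,
                  now: Optional[float] = None) -> dict[str, str]:
    """Hypothetical helper building caching response headers for one policy."""
    if policy is CachePolicy.NO_STORE:
        # Nothing may be stored, so validators would be pointless.
        return {"Cache-Control": "no-store"}
    if policy is CachePolicy.EXPIRE:
        now = time.time() if now is None else now
        headers = {
            "Cache-Control": f"max-age={max_age}",
            # Expires mirrors max-age for caches that do not read Cache-Control.
            "Expires": formatdate(now + max_age, usegmt=True),
        }
    else:
        # no-cache: may be stored, but must be revalidated before each reuse.
        headers = {"Cache-Control": "no-cache"}
    # Validators allow a stale or no-cache entry to be revalidated with a 304.
    if etag is not None:
        headers["ETag"] = etag
    if last_modified is not None:
        headers["Last-Modified"] = last_modified
    return headers
```

Attaching validators to expire-mode responses as well means that once max-age runs out, the client can revalidate cheaply instead of downloading the full body again.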
