
Caching and undercounting

15 January, 2016 - 09:48

The previously developed model assumes that all hits are counted. However, some hits are never detected by a Web server because pages can be read from a cache rather than from the server. A cache is temporary storage designed to speed up access to a data source. In the case of the Web, pages previously retrieved may be stored on the disk (the cache in this case) of the personal computer running the browser. Thus, when a person flips back and forth between previously retrieved pages, the browser reads them from the local disk rather than from the remote server. The use of a cache speeds up retrieval, reduces network traffic, and decreases the load on the server. As a consequence, however, data collected by a Web server undercount hits. The extent of undercounting depends on the form of caching.

Netscape, one of the most popular browsers, offers three levels of caching: once per session, always, and never. These settings control how often the browser checks the server for a newer version of a cached page. In terms of undercounting, the worst setting is never, which implies that if the page is in the cache, the browser will not retrieve a new version from the server. This also means the customer could be viewing a page that is months out of date. Always means the browser checks every time to ensure that the latest version is about to be displayed. A hit will not be recorded if the page in the cache is the current version. The default for Netscape, once per session, results in undercounting but does mean the customer is reading current information, unless the page changes during the session.
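The effect of the three settings can be illustrated with a small simulation. This is only a sketch under stated assumptions, not a model of Netscape's actual behavior: the function `simulate`, the `VIEWS` data, and the integer page versions are hypothetical. Following the text, a hit is counted only when a new copy of the page is retrieved from the server.

```python
from enum import Enum


class CachePolicy(Enum):
    ONCE_PER_SESSION = "once per session"
    ALWAYS = "always"
    NEVER = "never"


def simulate(views, policy):
    """Return (hits, stale_views) for one browser under a cache policy.

    views: list of (session_id, page, server_version) tuples in time order,
    where server_version is the version on the server at viewing time.
    Assumption: the disk cache persists across sessions.
    """
    cache = {}       # page -> version held in the browser cache
    checked = set()  # (session_id, page) pairs already verified
    hits = 0
    stale_views = 0
    for session, page, server_version in views:
        if page not in cache:
            must_check = True  # nothing cached, so the server must be asked
        elif policy is CachePolicy.ALWAYS:
            must_check = True
        elif policy is CachePolicy.ONCE_PER_SESSION:
            must_check = (session, page) not in checked
        else:  # CachePolicy.NEVER
            must_check = False
        if must_check:
            checked.add((session, page))
            if cache.get(page) != server_version:
                cache[page] = server_version  # retrieve the new copy
                hits += 1                     # the server records a hit
        if cache[page] != server_version:
            stale_views += 1  # the reader sees an out-of-date page
    return hits, stale_views


# Hypothetical browsing history: eight page views over two sessions.
VIEWS = [
    (1, "home", 1), (1, "products", 1), (1, "home", 1), (1, "home", 2),
    (2, "home", 2), (2, "products", 1), (2, "home", 2), (2, "home", 3),
]

for policy in CachePolicy:
    hits, stale = simulate(VIEWS, policy)
    print(f"{policy.value}: {hits} hits, {stale} stale views of {len(VIEWS)}")
```

Under these assumed views, every setting records fewer hits than the eight pages actually viewed. The never setting counts the fewest hits and shows the most stale pages, while always counts the most hits and shows none.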

The existence of a proxy server can further exacerbate undercounting. A proxy server is essentially a cache memory for a group of users (e.g., department, organization, or even country). Requests from a browser to a Web server are first routed to a proxy server, which keeps a copy of pages it has retrieved and distributed to the browsers attached to it. When any browser served by the proxy issues a request for a page, the proxy server will return the page if it is already in its memory rather than retrieve the page from the original server. For instance, a company could operate a proxy server to improve response time for company personnel. Although dozens of people within the organization may reference a particular Web page, the originating server may score one hit per day for the company because of the intervening proxy server. To further complicate matters, there can be layers of proxy servers, and one page retrieved from the original Web server may end up being seen by thousands of people within a nation. Clearly, the proliferation of proxy servers, which is likely to happen as the Web extends, will result in severe undercounting.
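The company example can be sketched in a few lines. This is a simplified illustration rather than a description of any real proxy: the function `origin_hits`, the page path, and the figure of 36 employees are hypothetical, and the proxy is assumed to keep each page for the whole period without checking for updates.

```python
def origin_hits(requests, use_proxy):
    """Count the hits the originating Web server records.

    requests: list of (user, page) tuples in time order.
    Assumptions: one shared proxy, no expiry of cached pages,
    and no browser-level caching.
    """
    proxy_cache = set()  # pages the proxy has already retrieved
    hits = 0
    for _user, page in requests:
        if use_proxy and page in proxy_cache:
            continue  # served by the proxy; the origin never sees it
        hits += 1     # the request reaches the originating server
        if use_proxy:
            proxy_cache.add(page)
    return hits


# Hypothetical day: 36 employees each read the same page once.
REQUESTS = [(f"employee{i}", "/pricing") for i in range(36)]

print("hits without proxy:", origin_hits(REQUESTS, use_proxy=False))
print("hits with proxy:", origin_hits(REQUESTS, use_proxy=True))
```

With the proxy in place, the originating server records a single hit for the page, however many employees read it. Stacking a second proxy layer above the first would hide the readers of every attached proxy behind that one hit.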

The use of cache memory or proxy servers will result in undercounting of hits (Q2) and active visitors (Q3). Consequently, the locatability/attractability index (η1) will be underestimated, since Q2 is the numerator in the index's equation, and the conversion efficiency index (η3) will be overestimated, as Q3 is in the denominator. The effect on the contact efficiency index (η2) is harder to conjecture, because undercounting affects both Q2 and Q3 in its equation. One possibility is that the index is underestimated because active visitors browse the site more frequently than those who just hit, and as a result are more likely to read the page from cache memory.
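A worked example makes the direction of each bias concrete. The sketch below assumes the indices are ratios of successive stage counts, η1 = Q2/Q1, η2 = Q3/Q2, and η3 = Q4/Q3, which is consistent with the numerator and denominator remarks above. Q1 and Q4 are not defined in this section, so they are treated here as correctly measured counts for the stages before hits and after active visits. All counts and capture rates are invented for illustration.

```python
def efficiency_indices(q1, q2, q3, q4):
    """Return eta1..eta3 under the assumed ratio forms Q(k+1)/Q(k)."""
    return {
        "eta1": q2 / q1,  # locatability/attractability
        "eta2": q3 / q2,  # contact efficiency
        "eta3": q4 / q3,  # conversion efficiency
    }


# Hypothetical true counts.
Q1, Q2, Q3, Q4 = 10_000, 2_000, 500, 50

# Hypothetical share of each count the server actually logs. Active
# visitors are assumed to be served from cache more often than mere hits.
HIT_CAPTURE_RATE = 0.5
VISITOR_CAPTURE_RATE = 0.4

actual = efficiency_indices(Q1, Q2, Q3, Q4)
observed = efficiency_indices(
    Q1,
    Q2 * HIT_CAPTURE_RATE,      # undercounted hits
    Q3 * VISITOR_CAPTURE_RATE,  # undercounted active visitors
    Q4,
)

for name in actual:
    print(f"{name}: actual {actual[name]:.2f}, observed {observed[name]:.2f}")
```

In this example η1 is understated and η3 is overstated regardless of the capture rates chosen, as long as both are below one. The direction of the error in η2 depends on which count is missed more heavily: it is understated here only because active visitors were assumed to be undercounted more than hits.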

Clearly, empirical research is required to estimate correction factors for η1, η2, and η3. Unfortunately, these correction factors are likely to differ by page and change over time as the distribution of proxy servers changes. Therefore, the initial perception that the Web enables the ready calculation of efficiency measures needs to be tempered by the recognition that cache memory can distort the situation.

The counting problem caused by caching is not unlike other counting problems encountered by advertisers. Viewership, listenership, and readership of conventional media are cases in point. The issue of readership, for example, has perplexed advertisers, researchers, and publishers for many years: How does one measure readership? Is it merely circulation? Circulation probably undercounts in one way, because a copy may have more than one reader (e.g., two people read one subscription to Wired), and overcounts in another (e.g., no one reads the subscription). We thus believe that caching is a new variation of the same old counting problem, and creative managers will need to discover innovative ways to solve it.