Premise
One of the recurring themes over on the AdSense Help forums runs something like this:
"Adsense says 50 page views and my domain stats say 260 from 135 unique visitors. Clearly AdSense is putting the screws to their publishers! I want my money NOW!"
I wish I was kidding, but you see that kind of tone a lot over there. I've tried and tried to explain to people, repeatedly, that expecting AdSense, Analytics, or any other type of embedded metrics tool like StatCounter or Quantcast to show the same numbers you'll see in a server log is just ridiculous. It's not going to happen, and it shouldn't happen. Different measurement methods, quality control to protect advertisers, even the differences in time zones between your server and the measuring tool, will all have a major impact on the numbers you see. As we'll find out below, a good rule of thumb when estimating your potential advertising market - which is what any savvy advertiser actually cares about - is to take the number of page views/visitors shown in your server log and divide by five. This will present a much more realistic picture of what your traffic looks like to someone who wants to pay for eyeballs on ads rather than bots and click-bombing.
The Numbers
In response to a message much like this, I decided to run some comparisons against numbers for a given day and see what happened. These are the numbers I had for one domain I operate on 06-May-2009:
Server log: 2576 hits, 468 Page Views, 303 Visitors.
Statcounter: 98 page views, 66 visitors
AdSense: 80 page views, 1 click
Analytics: 88 page views, 45 visits, 43 visitors.
Then we start looking a little more closely. Oh, hey...246 of those page views in my server log were from robots. Guess we can throw those out.
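To put that gap in numbers, here is a short Python sketch that divides the day's server-log page views by each tool's count, both before and after throwing out the 246 robot page views. It only does arithmetic on the figures listed above. The function and constant names are made up for illustration.

```python
# Page-view figures for 06-May-2009, copied from the list above.
SERVER_LOG_PAGE_VIEWS = 468
ROBOT_PAGE_VIEWS = 246  # log page views attributed to robots

TOOL_PAGE_VIEWS = {
    "StatCounter": 98,
    "Analytics": 88,
    "AdSense": 80,
}


def log_to_tool_ratios(log_views, robot_views, tool_views):
    """Return {tool: (raw log / tool, log minus robots / tool)}."""
    non_robot_views = log_views - robot_views
    return {
        tool: (log_views / views, non_robot_views / views)
        for tool, views in tool_views.items()
    }


ratios = log_to_tool_ratios(SERVER_LOG_PAGE_VIEWS, ROBOT_PAGE_VIEWS, TOOL_PAGE_VIEWS)
for tool, (raw, filtered) in ratios.items():
    print(f"{tool:12} raw: {raw:.2f}  minus robots: {filtered:.2f}")
```

Against the raw log, every tool's count comes in at a ratio between about 4.8 and 5.85, in line with the divide-by-five rule of thumb. Throwing out the robots alone still leaves the log roughly 2.3 to 2.8 times higher than any of the embedded tools.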
Hm, here are the top ten visiting IPs...
| Rank | Host | Country | Hits | Visitors |
|---|---|---|---|---|
| 1 | llf531035.crawl.yahoo.net | United States | 46 | 18 |
| 2 | MYSERVER | Germany | 34 | 16 |
| 3 | yx-out-f136.google.com | United States | 73 | 16 |
| 4 | crawl-66-249-67-106.googlebot.com | United States | 126 | 12 |
| 5 | 194.8.74.220 | Germany | 20 | 10 |
| 6 | rate-limited-proxy-209-85-238-27.google.com | United States | 8 | 8 |
| 7 | 220.181.61.227 | China | 12 | 6 |
| 8 | 220.181.5.144 | China | 6 | 6 |
| 9 | 61.135.163.29 | China | 6 | 5 |
| 10 | llf531112.crawl.yahoo.net | United States | 10 | 5 |
Analysis
That last column is the 'visitor' count...and clearly no fewer than 75 of my visitors were web bots, 70 of them Google's. So we can throw those out...and that's without tracking down the 17 'visitors' from China, though I can tell you with a pretty high degree of confidence that nobody in China is coming to this site to actually read it. They could be Baidu scrapers, other search engines, or even e-mail harvesters. Let's be generous and say two-thirds of those hits are legit, round it off, and you've got twelve "visitors", all from the same two IP addresses. Then there are twenty-six more 'visitors' from Germany (I checked the raw logs: "MYSERVER" is the same box as the 194.8 entry), again all from the same IP. Of course, I went to the trouble of actually checking that IP, and the registrar is a company in the British Virgin Islands called Dragonara. But Dragonara says it's a company in Switzerland...and it's a hosting company, which means whatever those twenty-six 'visitors' were, they weren't human beings. (Note: StatCounter registers some GoogleBot/AdBot hits, but not all; it doesn't seem to pick up most other 'bots at all.)
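The sorting done in that paragraph can be sketched in Python against the ten rows of the table above. The crawler domain suffixes are an assumption based on the hostnames shown (Google and Yahoo crawlers and proxies); the "same box" grouping comes straight from the raw-log check described above. This is only a tally of the table, not a general bot detector.

```python
# (host, country, hits, visitors) for the top ten rows of the table above.
TOP_TEN = [
    ("llf531035.crawl.yahoo.net", "United States", 46, 18),
    ("MYSERVER", "Germany", 34, 16),
    ("yx-out-f136.google.com", "United States", 73, 16),
    ("crawl-66-249-67-106.googlebot.com", "United States", 126, 12),
    ("194.8.74.220", "Germany", 20, 10),
    ("rate-limited-proxy-209-85-238-27.google.com", "United States", 8, 8),
    ("220.181.61.227", "China", 12, 6),
    ("220.181.5.144", "China", 6, 6),
    ("61.135.163.29", "China", 6, 5),
    ("llf531112.crawl.yahoo.net", "United States", 10, 5),
]

# Assumption: hostnames under these domains are Google/Yahoo crawlers or
# proxies, not people.
CRAWLER_SUFFIXES = (".crawl.yahoo.net", ".googlebot.com", ".google.com")

# Per the raw-log check: MYSERVER and 194.8.74.220 are the same hosting box.
HOSTING_BOX = {"MYSERVER", "194.8.74.220"}


def tally(rows):
    """Sum the 'Visitors' column into crawler / hosting / China / other buckets."""
    totals = {"crawler": 0, "hosting": 0, "china": 0, "other": 0}
    for host, country, _hits, visitors in rows:
        if host.endswith(CRAWLER_SUFFIXES):  # str.endswith accepts a tuple
            totals["crawler"] += visitors
        elif host in HOSTING_BOX:
            totals["hosting"] += visitors
        elif country == "China":
            totals["china"] += visitors
        else:
            totals["other"] += visitors
    return totals


print(tally(TOP_TEN))
```

On these ten rows the sketch counts 26 visitors for the German hosting box and 17 for the Chinese addresses, matching the figures above, and leaves nothing in the "other" bucket.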
So now, out of a total of roughly 100 visitors, we can say with absolute certainty that *95* of them aren't human. And that's just the top ten, out of an available list of the top fifty. The first entry I can find that I know for certain is human is at number 12, with 55 hits and 3 visitors. Problem is, that entry happens to be a good friend of mine, and I'm 100% certain that she is only one visitor.
All of a sudden, without even finishing this little exercise, it starts to become really, really clear how my server logs might be giving me a drastically overinflated sense of self-importance, don't it? Eliminate the bots. Eliminate multiple "visitors" who are really the same person - something that Google can likely determine better than the log analysis software I'm using (actually, they DEFINITELY can: my friend counts as one visitor in Analytics). Eliminate my own hits. Then start eliminating hits from IP addresses that Google, through years of research and analysis, discounts as originating from click-for-pay programs. THEN eliminate clicks where the user didn't stay on the target page long enough for it to even load properly, let alone to read it.
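The first few of those eliminations (self-identified bots, my own hits, and one person counted as several "visitors") can be sketched in Python. Everything here is hypothetical: the `Hit` record layout, the bot marker list, and the choice to treat one IP plus user-agent pair as one visitor are illustrative assumptions, not how any particular log analyzer or Google works. The later steps, pay-to-click detection and time-on-page checks, need data a plain server log doesn't carry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical minimal log record."""
    ip: str
    user_agent: str
    path: str


# Hypothetical, deliberately short list; real bot lists are much longer,
# and many bots don't identify themselves at all.
BOT_MARKERS = ("bot", "crawl", "spider", "slurp")


def likely_human_visitors(hits, own_ips):
    """Count distinct (IP, user agent) pairs after dropping bots and own hits."""
    visitors = set()
    for hit in hits:
        ua = hit.user_agent.lower()
        if any(marker in ua for marker in BOT_MARKERS):
            continue  # robot that says it's a robot
        if hit.ip in own_ips:
            continue  # my own development traffic
        visitors.add((hit.ip, hit.user_agent))  # repeat hits collapse to one
    return len(visitors)


SAMPLE_HITS = [
    Hit("66.249.67.106", "Mozilla/5.0 (compatible; Googlebot/2.1)", "/"),
    Hit("203.0.113.5", "Mozilla/5.0 (Windows NT 5.1) Firefox/3.0", "/"),
    Hit("203.0.113.5", "Mozilla/5.0 (Windows NT 5.1) Firefox/3.0", "/about"),
    Hit("198.51.100.7", "Mozilla/5.0 (Macintosh) Safari", "/"),
]
OWN_IPS = {"198.51.100.7"}

print(likely_human_visitors(SAMPLE_HITS, OWN_IPS))
```

Four logged hits shrink to one plausible human visitor, and that is before any of the checks only Google can run.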
Pretty soon it gets real, real obvious that trying to gauge your actual human traffic that represents potential customers by reviewing your web logs is...well, pretty flippin' stupid and pointless. Pretty soon it becomes very clear that this vast conspiracy by Google to 'put the screws' to their poor, innocent publishers is actually a situation of Google having a very high-quality analysis system that weeds out all of the 'visitors' who aren't actually visitors at all - 'visitors' that mean absolutely nothing but wasted money to advertisers, visitors that I, as an advertiser, would be right pissed if I was paying for.
Pretty soon, it gets real obvious that the only screws being put to anyone are coming from the paranoid, overactive imaginations of people who would rather blame Google for 'screwing' them than take a hard look at how they're getting traffic, how much they're getting, the quality of that traffic, and, most important of all, the value of that traffic to potential advertisers.
It has been my experience over many, many years of careful analysis of logs and web-based statistics that you can generally count on roughly 17-23% of your log activity to represent live human beings who might actually see an ad, be interested in it, click on it, and at least have a meaningful interest in the page they end up on. I generally see a CTR of around 0.2-2% of my traffic (substantially higher for search result traffic, but the volume of that traffic is substantially lower than the rest of a given site, of course). 2% is the high end, and that's 2% of the traffic that AdSense actually counts...so now it's 2% of 20%, which is 0.4% of my logged page loads, and that is an outrageously optimistic projection based on an average performance level ten times higher than reality.
Sidebar: For grins, I checked my historical stats for the best-placed ad on that site; 0.16% CTR over roughly 23,000 impressions. I can almost certainly improve this, but frankly I'm more worried about creating content people want to see than creating ad impressions, so I don't spend nearly as much time as I probably should fiddling with ad formats and optimization.
So now you're talking about a reasonable expectation of 0.04% CTR (that's 0.2% of 20%) measured against web log traffic.
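The arithmetic in the last few paragraphs is just one multiplication: the CTR AdSense reports times the share of logged page views that are human. A minimal Python sketch, assuming the 20% midpoint of the 17-23% range above (the function name is made up):

```python
def effective_ctr(adsense_ctr, human_share):
    """CTR measured against raw log page views instead of AdSense-counted views.

    adsense_ctr: clicks / page views that AdSense itself counts.
    human_share: fraction of logged page views that are live humans.
    """
    return adsense_ctr * human_share


HUMAN_SHARE = 0.20  # midpoint of the 17-23% range above

optimistic = effective_ctr(0.02, HUMAN_SHARE)   # 2% of 20%
realistic = effective_ctr(0.002, HUMAN_SHARE)   # 0.2% of 20%
print(f"optimistic: {optimistic:.2%}, realistic: {realistic:.2%}")
```

This prints 0.40% for the optimistic case and 0.04% for the realistic one.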
The Reality Check
Working with the above information, we know now that for every ten thousand page views in my web log, I can expect:
Roughly seven thousand of them will be 'bots.
Roughly 200-800 of them will be me, depending on my development activity at any given time.
An unknown number, of which 500 (5%) is a low estimate, will be from scrapers, pay-to-click schemes, and other invalid traffic.
This leaves 1,700-2,300 page loads from real, live human beings, from which I will average about:
4 ad clicks.
(Interestingly, that falls perfectly in line with my estimate above. I swear I didn't push numbers to make that happen.)
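The same budget can be written as a small Python function. The default shares (70% bots, 5% invalid traffic) and the 0.2% CTR are taken from the figures above; the function name and parameters are hypothetical, and the result is only as good as those rough shares.

```python
def reality_check(log_page_views, bot_share=0.70, own_views=200,
                  invalid_share=0.05, ctr=0.002):
    """Estimate human page views and ad clicks hidden in a raw log count."""
    bots = log_page_views * bot_share
    invalid = log_page_views * invalid_share  # a low estimate, per the list above
    humans = log_page_views - bots - own_views - invalid
    return humans, humans * ctr


for own in (200, 800):  # light vs. heavy development activity
    humans, clicks = reality_check(10_000, own_views=own)
    print(f"own views {own}: {humans:.0f} human page views, {clicks:.1f} clicks")
```

The two ends of the range work out to about 3.4 and 4.6 clicks per ten thousand logged page views, which averages to the 4 clicks above.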
And that's to say nothing, of course, of other benign factors that could shuffle the numbers. For instance, Google's "day" might be a different 24 hours than your "day". The difference between the US east and west coasts would be fairly negligible, even if you assume (which you can't, really) that Google's "day" runs midnight-to-midnight, but the difference between Google's server "day" in Mountain View, CA and the "day" in, say, Bangalore is going to be pretty substantial, because Google's "day" splits right in the middle of peak business hours there.
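To see where a Mountain View midnight lands elsewhere, here is a Python sketch using the standard `datetime` module. It assumes, as the paragraph cautions you can't really, that the reporting day starts at midnight Pacific time, and it uses fixed UTC offsets that are only valid for the sample date (in May 2009 the US coasts were on daylight time; India has no daylight time).

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets valid for 06-May-2009 only (US daylight time in effect).
PACIFIC_DAYLIGHT = timezone(timedelta(hours=-7), "PDT")
EASTERN_DAYLIGHT = timezone(timedelta(hours=-4), "EDT")
INDIA = timezone(timedelta(hours=5, minutes=30), "IST")

# Assumption: the reporting "day" begins at midnight Pacific time.
day_start = datetime(2009, 5, 6, 0, 0, tzinfo=PACIFIC_DAYLIGHT)

for zone in (EASTERN_DAYLIGHT, INDIA):
    local = day_start.astimezone(zone)
    print(local.strftime("%Y-%m-%d %H:%M %Z"))
```

Midnight in Mountain View is 3:00 AM on the east coast but 12:30 PM in Bangalore, so one local business day there gets split across two reporting days.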
And people wonder why I keep saying that this is not a 'get rich quick' scheme? When you cast aside the wishful thinking and dollar signs in your eyes and look at the numbers objectively, it's pretty clear that making money with AdSense is not to be approached as a half-measures, part-time thing. The average stay-at-home mom writing a very popular blog (we're talking in excess of 500 daily readers) might see enough real traffic to generate $100-$150 a month, if she's lucky.
As a point of reference, Alexa ranks the site these numbers are pulled from in the high 3 million range of most popular sites on the 'net. Quantcast puts it at 1,541,783 as of this moment...and their stats tend to be more accurate than Alexa's by a couple orders of magnitude for small sites like the one we're discussing. That's out of roughly 106,000,000 sites on the 'net, as counted by DomainTools.
Conclusion
If you're planning to make a little extra money, you'll probably do all right as long as you're providing information that people care about. If you think you're going to retire rich on AdSense...forget it. If someone tries to sell you some 'program' or 'method' to get rich on AdSense, they're just trying to get rich on you. The only way to truly make money with this is to build websites filled with quality, original content and do your best to publicize them without spamming or resorting to paid traffic schemes that will get you banned from AdSense anyway.
Before you start seeing conspiracies and fraud behind every bush, take your server logs and divide that traffic by five. Then you're getting close to a realistic picture of how much traffic you really have that an advertiser would want to pay for.