Alrighty folks! Today is all about stats!

I want to know which (if any) of my pages are getting hits, and ideally if they're getting hits from not-robots.

I can easily add a middleware that grabs some stats and posts to a queue to be added to a db (so it doesn't slow down the main thread too badly). Question is, what should I keep?

I very much don't want to collect PII, so I can't grab the raw IP. I'd like to know the difference between bots and not-bots, so I'll need the User-Agent (or at least, the result of userAgent.Contains("bot")). Obviously I need the URL. I'm not expecting very much traffic, so I can bucket into hours (which also helps with the PII issue).
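As a rough sketch, the middleware end could look something like this (assuming ASP.NET Core and System.Threading.Channels standing in for the queue; the Hit record and all the names here are made up):

using System;
using System.Threading.Channels;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;

public record Hit(string Path, long Date, bool IsBot);

public class HitCountingMiddleware
{
    private readonly RequestDelegate _next;
    private readonly ChannelWriter<Hit> _queue;

    public HitCountingMiddleware(RequestDelegate next, ChannelWriter<Hit> queue)
    {
        _next = next;
        _queue = queue;
    }

    public async Task InvokeAsync(HttpContext context)
    {
        var userAgent = context.Request.Headers["User-Agent"].ToString();

        // Crude check; as it turns out below, matching "bot" alone isn't enough.
        var isBot = userAgent.Contains("bot", StringComparison.OrdinalIgnoreCase);

        // Bucket into the containing hour: seconds since 1970, rounded down.
        var now = DateTimeOffset.UtcNow.ToUnixTimeSeconds();
        var hour = now - (now % 3600);

        // TryWrite never blocks; if the queue is somehow full, drop the hit
        // rather than slow the request down.
        _queue.TryWrite(new Hit(context.Request.Path, hour, isBot));

        await _next(context);
    }
}

Register the channel as a singleton, wire the middleware in with app.UseMiddleware<HitCountingMiddleware>(), and the request path never touches the database; the only inline work is a string check and a TryWrite.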

In the database, that looks something like:

CREATE TABLE HitsByHour (
  Path TEXT,
  Count INTEGER,
  Date INTEGER, -- seconds since 1970 to the start of the containing hour
  IsBot INTEGER -- 1 yes, 0 no
);

I think I also want a view like:

CREATE VIEW IF NOT EXISTS
  HitsTotal (Path, IsBot, Count) AS
    SELECT Path, IsBot, SUM(Count)
      FROM HitsByHour
      GROUP BY Path, IsBot;
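
On the other end of the queue, a small background worker can drain the channel and fold each hit into HitsByHour. A sketch, assuming Microsoft.Data.Sqlite, the Hit record from the middleware sketch above, and a unique index on (Path, Date, IsBot) for the upsert to conflict on:

using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;
using Microsoft.Data.Sqlite;
using Microsoft.Extensions.Hosting;

public class HitWriter : BackgroundService
{
    private readonly ChannelReader<Hit> _queue;
    private readonly string _connectionString;

    public HitWriter(ChannelReader<Hit> queue, string connectionString)
    {
        _queue = queue;
        _connectionString = connectionString;
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        await foreach (var hit in _queue.ReadAllAsync(stoppingToken))
        {
            using var connection = new SqliteConnection(_connectionString);
            connection.Open();

            var command = connection.CreateCommand();
            // Assumes a unique index on (Path, Date, IsBot) so ON CONFLICT
            // has something to latch onto.
            command.CommandText =
                @"INSERT INTO HitsByHour (Path, Count, Date, IsBot)
                  VALUES ($path, 1, $date, $isBot)
                  ON CONFLICT (Path, Date, IsBot)
                  DO UPDATE SET Count = Count + 1;";
            command.Parameters.AddWithValue("$path", hit.Path);
            command.Parameters.AddWithValue("$date", hit.Date);
            command.Parameters.AddWithValue("$isBot", hit.IsBot ? 1 : 0);
            await command.ExecuteNonQueryAsync(stoppingToken);
        }
    }
}

Opening a connection per hit is lazy, but at the kind of traffic I'm expecting it's not worth batching yet.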

Did some research ("cat access.log"), and just checking for "bot" isn't going to cut the mustard.


Got distracted from the stats plan predictably quickly. On the other hand, I've tidied up my DNS records (mostly upped the TTLs from 5 minutes to one day, since I'm fairly confident they're correct), updated the nginx logs to include cache hit/miss and the hostname (the first to see whether the cache is working (spoiler: not as much as I'd like), and the second because all the hosts go into the same file and it's not clear which request is for which host), and verified a couple of sites on Google Search Console.
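
The nginx side of that is roughly a custom log_format with $host and $upstream_cache_status bolted onto the usual combined fields, something like this (the format name and path are placeholders, not my real config):

log_format vhost_cache '$host $remote_addr - $remote_user [$time_local] '
                       '"$request" $status $body_bytes_sent '
                       '"$http_referer" "$http_user_agent" $upstream_cache_status';

access_log /var/log/nginx/access.log vhost_cache;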

First news from the logs: My Windows browser uses IPv4 for one site and IPv6 for another, even though both share addresses. Weird. I'll wait a couple of days and see if it's still happening.

