Alrighty folks! Today is all about stats!
I want to know which (if any) of my pages are getting hits, and ideally if they're getting hits from not-robots.
I can easily add a middleware that grabs some stats and posts to a queue to be added to a db (so it doesn't slow down the main thread too badly). Question is, what should I keep?
I very much don't want to collect PII, so I can't grab raw IPs. I'd like to know the difference between bots and not-bots, so I'll need the User Agent (or at least, the result of userAgent.Contains("bot")). Obviously I need the URL. I'm not expecting very much traffic, so I can bucket into hours (which also helps with the PII issue).
That looks something like:
CREATE TABLE HitsByHour (
    Path TEXT,
    Count INTEGER,
    Date INTEGER, -- seconds since 1970 to the start of the containing hour
    IsBot INTEGER -- 1 yes, 0 no
);
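For the Date column, the bucket is just the Unix timestamp rounded down to the start of its hour. A quick C# sketch of the arithmetic (assuming UTC, which is what I'd use):
// Round the current Unix time down to the start of the containing hour.
// Integer division by 3600 drops the minutes/seconds; multiplying back restores the scale.
long now = DateTimeOffset.UtcNow.ToUnixTimeSeconds();
long hourBucket = now / 3600 * 3600;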
I think I also want a view like:
CREATE VIEW IF NOT EXISTS HitsTotal (Path, IsBot, Count) AS
SELECT Path, IsBot, SUM(Count)
FROM HitsByHour
GROUP BY Path, IsBot;
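For completeness, here's a rough sketch of what that middleware and its queue consumer might look like. This is just how I picture it, assuming ASP.NET Core, a System.Threading.Channels queue, and Microsoft.Data.Sqlite for the writes; HitRecord, StatsMiddleware, and StatsWriter are made-up names, not anything out of a library:
using System;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.Data.Sqlite;

// One queued record per request: the same three facts the table stores.
public record HitRecord(string Path, long HourBucket, bool IsBot);

public class StatsMiddleware
{
    private readonly RequestDelegate _next;
    private readonly ChannelWriter<HitRecord> _queue;

    public StatsMiddleware(RequestDelegate next, ChannelWriter<HitRecord> queue)
    {
        _next = next;
        _queue = queue;
    }

    public async Task InvokeAsync(HttpContext context)
    {
        var userAgent = context.Request.Headers["User-Agent"].ToString();
        var hit = new HitRecord(
            context.Request.Path,
            DateTimeOffset.UtcNow.ToUnixTimeSeconds() / 3600 * 3600,
            userAgent.Contains("bot", StringComparison.OrdinalIgnoreCase));

        // Fire-and-forget: if the queue is full, drop the hit rather than slow the request.
        _queue.TryWrite(hit);

        await _next(context);
    }
}

// Background consumer: drains the queue and folds each hit into its hour bucket.
public class StatsWriter
{
    private readonly ChannelReader<HitRecord> _queue;
    private readonly string _connectionString;

    public StatsWriter(ChannelReader<HitRecord> queue, string connectionString)
    {
        _queue = queue;
        _connectionString = connectionString;
    }

    public async Task RunAsync(CancellationToken token)
    {
        await foreach (var hit in _queue.ReadAllAsync(token))
        {
            using var conn = new SqliteConnection(_connectionString);
            conn.Open();

            // Bump the existing bucket if there is one; otherwise start it at 1.
            using var update = conn.CreateCommand();
            update.CommandText =
                "UPDATE HitsByHour SET Count = Count + 1 " +
                "WHERE Path = $path AND Date = $date AND IsBot = $isBot";
            update.Parameters.AddWithValue("$path", hit.Path);
            update.Parameters.AddWithValue("$date", hit.HourBucket);
            update.Parameters.AddWithValue("$isBot", hit.IsBot ? 1 : 0);

            if (update.ExecuteNonQuery() == 0)
            {
                using var insert = conn.CreateCommand();
                insert.CommandText =
                    "INSERT INTO HitsByHour (Path, Count, Date, IsBot) " +
                    "VALUES ($path, 1, $date, $isBot)";
                insert.Parameters.AddWithValue("$path", hit.Path);
                insert.Parameters.AddWithValue("$date", hit.HourBucket);
                insert.Parameters.AddWithValue("$isBot", hit.IsBot ? 1 : 0);
                insert.ExecuteNonQuery();
            }
        }
    }
}
A bounded channel with BoundedChannelFullMode.DropWrite seems like the right shape here, so a burst of traffic loses a few counts instead of eating memory.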
Did some research ("cat access.log"), and just checking for "bot" isn't going to cut the mustard.