In tonight's episode of "Fun with pipelines", we've got this monstrosity:
awk '{ print $3 }' logs/* |
sort -n |
uniq |
tail +6 |
awk '{print "-x " $1 }' |
xargs dig +noall +answer |
rev |
awk -F. '{print $2 "." $3}' |
rev |
sort |
uniq -c |
sort -nr |
head -n 10
which produces output something like this:
349 amazonbot.amazon
186 ahrefs.net
67 yandex.com
47 msn.com
39 semrush.com
16 petalsearch.com
14 babbar.eu
10 googleusercontent.com
10 google.com
telling us that Amazon used nearly 350 different source IPs (ones whose reverse DNS names end in amazonbot.amazon) to make requests to this server.
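The counts come from the last four stages of the pipeline. Here is a rough sketch of them on a short, made-up list of names standing in for the output of the rev | awk | rev step. sort brings identical names together, uniq -c collapses each run into one line prefixed with its count, sort -nr puts the biggest counts first, and head -n 10 keeps the top ten.

```shell
#!/bin/sh
# Made-up input: one domain per line, one line per reverse-DNS answer.
names='amazonbot.amazon
ahrefs.net
amazonbot.amazon
yandex.com
ahrefs.net
amazonbot.amazon'

# Group identical names, count each group, sort by count (highest first),
# and keep at most the top 10.
top=$(printf '%s\n' "$names" | sort | uniq -c | sort -nr | head -n 10)
printf '%s\n' "$top"
```

uniq -c pads the counts with leading spaces, and the exact width differs between implementations; sort -n skips leading blanks, so the padding doesn't affect the ordering.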
Explanation
It's bash, so the | at the ends of the lines means "pass the output of this command to the next command".
awk '{ print $3 }' logs/*
Awk is a small text processing tool. By default, it splits its input into lines (using \n) and then into fields (using spaces and tabs). In this case, print $3 means "print the 3rd (counting from 1) field". We've told awk to read all the files in the logs folder, which have an IP address in the 3rd field.
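To see the field splitting in action, here is a sketch on two made-up log lines. The "date time IP method path status" layout is an assumption for illustration only; all that matters is that the IP address is the 3rd field. The second line has two spaces and a tab before the address, which awk's default splitting treats as a single separator.

```shell
#!/bin/sh
# Two made-up log lines in a hypothetical "date time ip method path status" layout.
# The second line has two spaces and a tab before the IP address.
ips=$(printf '2024-05-01 12:00:00 192.0.2.10 GET /index.html 200\n2024-05-01 12:00:05  \t198.51.100.7 GET /robots.txt 200\n' |
    awk '{ print $3 }')  # print the 3rd field of each line

printf '%s\n' "$ips"
```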
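sort -n | uniq
sort takes its input, splits it into lines, sorts them, and prints the result; -n makes it compare numerically. uniq then drops repeated adjacent lines, leaving one copy of each IP address. (The de-duplication is what we really want; the sorting is only there so that identical lines end up next to each other. Plain sort -u would do both in one step.)
tail +6
tail +6 (tail -n +6 in standard POSIX syntax) prints its input starting from the 6th line, so the first five addresses in the sorted list are skipped.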
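A sketch of the sort -n | uniq and tail +6 steps from the pipeline, run on made-up addresses. LC_ALL=C is added here only so that sort's number parsing doesn't depend on the locale.

```shell
#!/bin/sh
# Made-up list of addresses, with duplicates, in arbitrary order.
ips='198.51.100.7
192.0.2.10
198.51.100.7
192.0.2.10
203.0.113.5'

# sort -n puts identical lines next to each other; uniq keeps one copy of each run.
unique_ips=$(printf '%s\n' "$ips" | LC_ALL=C sort -n | uniq)
printf '%s\n' "$unique_ips"

# tail -n +6 starts printing at line 6, i.e. it skips the first five lines.
after_five=$(printf '%s\n' 1 2 3 4 5 6 7 | tail -n +6)
printf '%s\n' "$after_five"
```

A word of caution if you reach for sort -u instead: combining it with -n (sort -nu) judges duplicates by the numeric value alone, and since sort -n only reads the leading number of an address, 192.0.2.10 and 192.0.2.99 compare as equal and one of them would be silently dropped. Plain sort -u compares whole lines and doesn't have this problem.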
awk '{print "-x " $1 }'
Awk again, this time printing each input line with "-x " in front, which helps us build the next command.
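A small sketch of this step on two made-up addresses, showing the lines it hands on to xargs:

```shell
#!/bin/sh
# Made-up, already de-duplicated addresses (one per line).
# Prefix each line with "-x " so it becomes a reverse-lookup option for dig.
dig_args=$(printf '%s\n' 192.0.2.10 198.51.100.7 | awk '{print "-x " $1 }')
printf '%s\n' "$dig_args"
```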
xargs dig +noall +answer
xargs is a neat toy. It reads whitespace-separated words from its input and runs the command you give it with those words appended as extra arguments.
It's very useful for, well, this kind of situation, where we've got a list of things that we want to run a command with.
The command we've asked it to run is dig, a "DNS lookup utility". Given a hostname, it will find the IP; or (as in our case), given an IP address, it will find the hostname. We have to give it the -x option for each address to tell it we're looking up the name. The +noall +answer options limit its output to the part we care about (the answer!).
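Running dig needs network access, so this sketch puts echo in front of the command to print the command line xargs would build instead of actually running it. The addresses are made-up documentation addresses.

```shell
#!/bin/sh
# Lines as produced by the previous awk step.
# "echo" in front of dig prints the command instead of running it.
cmd=$(printf '%s\n' '-x 192.0.2.10' '-x 198.51.100.7' | xargs echo dig +noall +answer)
printf '%s\n' "$cmd"

# Variant: -n 2 gives each dig run exactly two arguments ("-x" and one address).
per_address=$(printf '%s\n' '-x 192.0.2.10' '-x 198.51.100.7' | xargs -n 2 echo dig +noall +answer)
printf '%s\n' "$per_address"
```

In the first form all the addresses end up in a single dig invocation. With a very long list, xargs may split the input across several runs to stay under the system's command-line length limit, and it doesn't know that "-x" and its address belong together, so a split could separate them. The -n 2 variant avoids that, at the cost of starting one dig per address.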
rev
rev outputs each line of input reversed.
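rev on its own isn't very interesting; the trick is the rev | awk -F. '{print $2 "." $3}' | rev sandwich in the pipeline. Here is a sketch on one made-up line in the style of dig +noall +answer output for a reverse lookup (the address is a documentation address and host-a is a hypothetical name). Because the answer ends with a ".", the reversed line starts with one, so with . as the field separator $1 is empty, and $2 and $3 are the last two labels of the name, still backwards until the second rev flips them around.

```shell
#!/bin/sh
# A made-up line in the style of dig +noall +answer output for a PTR lookup.
answer=$(printf '10.2.0.192.in-addr.arpa.\t3600\tIN\tPTR\thost-a.amazonbot.amazon.\n')

# rev on its own: the whole line, character by character, backwards.
reversed=$(printf '%s\n' "$answer" | rev)
printf '%s\n' "$reversed"

# The rev | awk | rev step: keep the last two labels of the host name.
domain=$(printf '%s\n' "$answer" | rev | awk -F. '{print $2 "." $3}' | rev)
printf '%s\n' "$domain"

# An awk-only alternative (not what the pipeline uses): count fields from
# the end instead of reversing; $NF is the empty field after the final ".".
domain_awk=$(printf '%s\n' "$answer" | awk -F. '{print $(NF-2) "." $(NF-1)}')
printf '%s\n' "$domain_awk"
```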
(Sorry, I ran out of energy at this point. Hopefully I'll come back and finish at some point)