osric.uk

In tonight's episode of "Fun with pipelines", we've got this monstrosity:

awk '{ print $3 }' logs/* |  
	sort -n |
	uniq |
	tail -n +6 |
	awk '{print "-x " $1 }' |
	xargs dig +noall +answer |
	rev |
	awk -F. '{print $2 "." $3}' |
	rev |
	sort |
	uniq -c |
	sort -nr |
	head -n 10

which produces an output something like:

    349 amazonbot.amazon
    186 ahrefs.net
     67 yandex.com
     47 msn.com
     39 semrush.com
     16 petalsearch.com
     14 babbar.eu
     10 googleusercontent.com
     10 google.com

telling us that Amazon used nearly 350 different source IPs (349 addresses whose hostnames end in amazonbot.amazon) to make requests to this server.

Explanation

It's bash, so the | at the ends of the lines means "pass the output of this command to the next command".

awk '{ print $3 }' logs/*

Awk is a small text processing tool. By default, it splits its input into lines (using \n) and each line into fields (using runs of spaces and tabs). In this case, print $3 means "print the 3rd (counting from 1) field". We've told awk to read all the files in the logs folder, where the 3rd field of each line is the IP address the request came from.
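Here's a quick way to see that stage on its own: a minimal sketch fed a single made-up log line. The line's layout (date, time, IP, request, status) is an assumption for illustration; all that matters is that the IP is the 3rd field, as in the logs above.

```shell
#!/usr/bin/env bash
# A made-up log line (assumed layout: date, time, client IP, request, status).
# 203.0.113.7 comes from a range reserved for documentation.
line='2024-05-01 12:34:56 203.0.113.7 "GET / HTTP/1.1" 200'

# awk splits on runs of spaces/tabs, so $3 is the IP address.
printf '%s\n' "$line" | awk '{ print $3 }'
# prints: 203.0.113.7
```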

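sort -n | uniq

sort takes its input, splits it into lines, sorts them and prints the result (-n compares the lines as numbers rather than as text). uniq then drops repeated lines, but only copies that sit next to each other, which is why the sort comes first. Together they output one copy of each IP address, which is what we really want; the ordering is a side effect. sort -u would do both jobs in one command.

tail -n +6

tail normally prints the end of its input, but -n +6 makes it start at the 6th line and print everything from there on, so the first five addresses in the sorted list are thrown away. (The traditional shorthand for this is tail +6.)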
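A small sketch with made-up addresses (from the 192.0.2.0/24 documentation range) comparing sort -n | uniq with sort -u, plus tail -n +6 on a numbered list:

```shell
#!/usr/bin/env bash
# Made-up addresses, one of them repeated.
ips='192.0.2.9
192.0.2.1
192.0.2.9'

# uniq only removes adjacent repeats, so sort first.
printf '%s\n' "$ips" | sort -n | uniq
# sort -u does both jobs in one command; both print 192.0.2.1 then 192.0.2.9.
printf '%s\n' "$ips" | sort -u

# tail -n +6 starts output at line 6, dropping lines 1-5.
seq 1 8 | tail -n +6
# prints 6, 7 and 8, one per line
```

One gotcha if you merge the two into sort -nu: with GNU sort, -u then treats lines whose numbers compare equal as duplicates, and because -n stops reading at the second dot, 192.0.2.1 and 192.0.2.9 both count as 192.0, so only one of them survives.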

awk '{print "-x " $1 }'

Awk again, this time printing the first field of each line (the IP address) with "-x " in front, which sets up the arguments dig needs in the next step.
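Running just this step on two made-up addresses shows the shape of what gets handed to xargs:

```shell
#!/usr/bin/env bash
# Two made-up addresses standing in for the de-duplicated IP list.
printf '%s\n' 192.0.2.1 198.51.100.2 | awk '{print "-x " $1 }'
# prints:
# -x 192.0.2.1
# -x 198.51.100.2
```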

xargs dig +noall +answer

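xargs is a neat toy. It reads whitespace-separated words from its input and runs the command you give it with those words added on the end as extra arguments. (If there are too many to fit on one command line, it splits them across several runs of the command.)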

It's very useful for, well, this kind of situation, where we've got a list of things that we want to run a command with.
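A handy trick for checking what xargs will do is to put echo in front of the command, so it prints the command line instead of running it (and no DNS queries go out). A sketch using two made-up -x lines; the -n 2 variant, which isn't in the original pipeline, caps each run at two input words, i.e. one dig per address:

```shell
#!/usr/bin/env bash
# Dry run: echo prints the dig command xargs would have built.
printf '%s\n' '-x 192.0.2.1' '-x 198.51.100.2' | xargs echo dig +noall +answer
# prints: dig +noall +answer -x 192.0.2.1 -x 198.51.100.2

# -n 2: at most two input words per run, so one dig command per address.
printf '%s\n' '-x 192.0.2.1' '-x 198.51.100.2' | xargs -n 2 echo dig +noall +answer
# prints:
# dig +noall +answer -x 192.0.2.1
# dig +noall +answer -x 198.51.100.2
```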

The command we've asked it to run is dig, a "DNS lookup utility". Given a hostname, it will find the IP, or (as in our case) given an IP address, it will find the hostname. We have to give it the -x option for each address to tell it we want a reverse lookup (the name for an address, stored as a PTR record). The +noall +answer options limit its output to the part we care about: +noall turns off every section of dig's normal output, and +answer turns the answer section back on.
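Under the hood, -x is shorthand: for an IPv4 address, dig reverses the four numbers and asks for the PTR record of that name under in-addr.arpa. A minimal sketch that builds the name; ptr_name is a hypothetical helper, and it only handles IPv4:

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn an IPv4 address into the in-addr.arpa name
# that dig -x queries for. IPv6 uses a different scheme (ip6.arpa).
ptr_name() {
  printf '%s\n' "$1" | awk -F. '{ print $4 "." $3 "." $2 "." $1 ".in-addr.arpa" }'
}

ptr_name 192.0.2.1
# prints: 1.2.0.192.in-addr.arpa

# So these two commands ask the same question (both need network access):
#   dig +noall +answer -x 192.0.2.1
#   dig +noall +answer 1.2.0.192.in-addr.arpa PTR
```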

rev

rev outputs each line of input reversed.
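The reason for rev shows up in the next two steps. dig's answer line ends with the hostname, including its trailing dot, and hostnames have varying numbers of labels, so it's awkward to count fields from the left. Reversing the line puts the end first: the trailing dot becomes the first character (so field 1 is empty when splitting on "."), and fields 2 and 3 are the last two labels, backwards. The second rev turns them the right way round. A sketch on a made-up answer line in dig's name / TTL / class / type / data layout:

```shell
#!/usr/bin/env bash
# A made-up PTR answer line; the hostname is hypothetical.
answer=$(printf '1.2.0.192.in-addr.arpa.\t3600\tIN\tPTR\tcrawl-192-0-2-1.crawler.example.com.')

# Reverse, keep fields 2 and 3 (split on "."), reverse back.
printf '%s\n' "$answer" | rev | awk -F. '{print $2 "." $3}' | rev
# prints: example.com
```

Taking the last two labels is a heuristic: for a name under something like co.uk it would give co.uk rather than the organisation's domain.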

(Sorry, I ran out of energy at this point. Hopefully I'll come back and finish at some point)
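The four stages after the second rev are the standard shell tally: sort groups identical lines together, uniq -c collapses each group into one line prefixed with its count, sort -nr orders by that count, biggest first, and head -n 10 keeps the top ten. A sketch on a few made-up domains:

```shell
#!/usr/bin/env bash
# Made-up domains standing in for the output of the rev | awk | rev stages.
domains='example.com
example.org
example.com'

# Group, count, order by count (descending), keep the top 10.
printf '%s\n' "$domains" | sort | uniq -c | sort -nr | head -n 10
# prints something like:
#       2 example.com
#       1 example.org
```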