Security visualization and graph databases

In a recent brainstorming session, I was asked a question: How can we make things easier for the SOC?

Now, this isn’t a new problem. You could probably come up with a bunch of different ways to address the question - #automation, process #standardization or #machinelearning driven #intelligence orchestration.

But this question was more specific. Continuous monitoring of multiple security event and log sources is practically where a SOC starts, and the challenge is: how do we make sense of it all? Say you have 15 different data sources to deal with. How would you do complex correlations in a way that is easy to consume and easy to act on?

That led me down a path I didn’t know existed (or at least didn’t know had an official name): #security #visualization. I started exploring #graph #databases and their ability to visualize large, disparate sets of data. To my surprise, I learned that this isn’t confined to a lab; it has implications for the way you could design a SOC.

SOC personnel can query complex relationships in a (sort of) natural language, i.e. phrase a query the way they would think about the problem in real life. Here’s an example: with #neo4j (an #open #source graph database) and just 4 queries, I searched a sizeable set of DNS logs for signs of malice. Here’s what came out of it:

[Update: Adding a legend for the graph]

  • Blue bubbles: The filtered list of all IPs querying the DNS server. These are the clients sending the malicious queries.
  • Pink bubbles: Timestamps when the queries came in. Nearly all of them arrive within a 1-hour window.
  • Purple bubble (center): The reverse IP that the blue bubbles are trying to look up or enumerate.

I picked 3 fields from the DNS server logs (client IP address, domain name, timestamp) and modeled a graph to answer the following question about the relationship between the 3:

  • At what time did a client make a particular DNS query?
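To make that model a bit more concrete, here is a minimal sketch in plain Python rather than Neo4j. It is not the model or the queries I actually ran. The log rows, node labels and function names are all made up for illustration, and the layout (client to timestamp to domain) is just one plausible way to arrange the 3 fields.

```python
from collections import defaultdict

# Hypothetical log rows: (client IP, queried name, timestamp).
# The addresses and names are documentation examples, not real log data.
SAMPLE_LOGS = [
    ("192.0.2.10", "intranet.example.com", "2024-01-01T10:05:00"),
    ("192.0.2.11", "intranet.example.com", "2024-01-01T10:17:00"),
    ("192.0.2.10", "vpn.example.com", "2024-01-01T10:20:00"),
]


def build_graph(rows):
    """Build an adjacency map: Client -> Timestamp -> Domain.

    Nodes are (label, value) tuples. Assumption: a timestamp value
    identifies a single query; with busier logs you would key the
    timestamp node by (client, timestamp) instead.
    """
    graph = defaultdict(set)
    for client_ip, domain, timestamp in rows:
        client = ("Client", client_ip)
        when = ("Timestamp", timestamp)
        name = ("Domain", domain)
        graph[client].add(when)   # the client made a query at this time
        graph[when].add(name)     # the query at this time asked for this name
    return graph


def query_times(graph, client_ip, domain):
    """Answer: at what time(s) did this client query this domain?"""
    times = []
    for when in graph.get(("Client", client_ip), ()):
        if ("Domain", domain) in graph.get(when, ()):
            times.append(when[1])
    return sorted(times)


graph = build_graph(SAMPLE_LOGS)
print(query_times(graph, "192.0.2.10", "vpn.example.com"))
```

In a graph database the same question becomes a pattern match over the three node types instead of a hand-written loop, which is where the "think as you would in real life" part comes from.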

These logs came from an authoritative DNS server that logs all incoming queries, and the question I wanted answered was: can these logs tell us anything about the attacker’s tactics?

In this case, more specifically: what methods did the attacker use to enumerate non-public-facing domains (i.e. domains not openly advertised on the internet but publicly accessible via the internet)? What we can tell from the graph is that the attacker attempted to enumerate domains and subdomains to build an organizational map or a resource inventory.
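One way to look for this kind of pattern outside a graph tool is a simple heuristic: flag any queried name that many distinct clients ask for within a short window. The sketch below is an assumption on my part, not the queries used for the graph. The threshold, the 1-hour window default, the function name and the sample rows are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical rows: (client IP, queried name, ISO timestamp).
ROWS = [
    ("198.51.100.1", "10.2.0.192.in-addr.arpa", "2024-01-01T10:00:00"),
    ("198.51.100.2", "10.2.0.192.in-addr.arpa", "2024-01-01T10:20:00"),
    ("198.51.100.3", "10.2.0.192.in-addr.arpa", "2024-01-01T10:45:00"),
    ("198.51.100.4", "10.2.0.192.in-addr.arpa", "2024-01-01T13:00:00"),
    ("198.51.100.1", "www.example.com", "2024-01-01T10:05:00"),
]


def flag_enumeration(rows, min_clients=3, window=timedelta(hours=1)):
    """Return {name: peak distinct clients in any window} for suspicious names.

    A name is flagged when at least `min_clients` distinct clients
    queried it within a single `window`. Both values are illustrative
    defaults, not tuned thresholds.
    """
    by_name = defaultdict(list)
    for client_ip, name, ts in rows:
        by_name[name].append((datetime.fromisoformat(ts), client_ip))

    flagged = {}
    for name, events in by_name.items():
        events.sort()  # order by time so a sliding window works
        best = 0
        start = 0
        for end in range(len(events)):
            # Shrink the window from the left until it spans <= `window`.
            while events[end][0] - events[start][0] > window:
                start += 1
            clients = {ip for _, ip in events[start:end + 1]}
            best = max(best, len(clients))
        if best >= min_clients:
            flagged[name] = best
    return flagged


print(flag_enumeration(ROWS))
```

A name flagged this way is only a lead. It would still need to be correlated with other evidence before calling it enumeration.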


  1. This graph was a follow-up to a bunch of alerts that were triggered on a few other systems. Scouring the DNS logs confirmed our theory about the attacker’s most likely approach. The idea was to see if we could correlate this kind of domain enumeration behavior with other IOCs to get an understanding of the TTPs.
  2. I could not capture all of the IPs in this graph (I pruned it to 50). The original list had more than 300.
  3. Yes, the data modeling part does have a bit of a learning curve. But once you get past that, it gives you a perspective that traditional tooling misses.