Welcome back! In this series of blog posts we are wrapping the awesome OnionScan tool and then analyzing the data that falls out of it. If you haven’t read parts one and two in this series then you should go do that first. In this post we are going to analyze our data in a new light by visualizing how hidden services are linked together as well as how hidden services are linked to clearnet sites.
One of the awesome things that OnionScan does is look for links between hidden services and clearnet sites and make these links available to us in the JSON output. Additionally, it looks for IP address leaks or references to IP addresses that could be used for deanonymization.
We are going to extract these connections and create visualizations that will help us spot interesting connections and popular hidden services with a high number of links. Along the way we will learn some Python and how to use Gephi, a visualization tool. Let’s get started!
NetworkX and Gephi

If you read one of my earlier posts on solving the game Her Story using Python, you might already have the NetworkX library installed as well as Gephi. If not, you can install NetworkX like so:
Mac OSX / linux: sudo pip install networkx
windows: pip install networkx
If you have never used pip before or don’t know what it is, take my Python course and find out.
Gephi can be downloaded from here.
NetworkX is the Python library that we are going to use to create entities on a graph (nodes) and then connect them together (edges). Once we have constructed this graph we will save it to the GEXF file format, which Gephi can open. We then use Gephi to lay out the graph and begin exploring the data.
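Before we touch the OnionScan data, here is a minimal sketch of the node, edge and GEXF workflow just described. The node names and the output filename are made up for illustration. Attributes are passed as keyword arguments, which is the form current NetworkX releases accept.

```python
import networkx

# Directed graph: an edge points from the site doing the linking
# to the site being linked to.
graph = networkx.DiGraph()

# Hypothetical node names, used purely for illustration.
graph.add_node("examplehidden1.onion", node_type="Hidden Service")
graph.add_node("example.com", node_type="Clearnet")

# Connect them: source node first, destination node second.
graph.add_edge("examplehidden1.onion", "example.com")

print(graph.number_of_nodes(), graph.number_of_edges())

# Save the graph as GEXF so Gephi can open it (the filename is arbitrary).
networkx.write_gexf(graph, "example.gexf")
```

Opening example.gexf in Gephi should show two nodes joined by a single directed edge, with node_type available as a node attribute.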
Now that you have the prerequisites installed, let’s start writing some code to analyze the data.
Coding It Up

The Python part is actually pretty quick and easy. We are just going to walk through each of the JSON files, examine the data, and then check a handful of fields that can include linked data. From there we simply add that data (nodes) to the NetworkX graph and connect them together (edges).
At this point, if you read the second post, you are probably thinking that you could do the same with SSH keys, server headers, or other information that might indicate shared infrastructure. As homework, feel free to take our graphing technique and go back and apply it to SSH keys. The results are pretty neat!
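If you want a head start on that homework, here is a rough sketch of one way to approach it. The scan results are inlined as made-up dictionaries instead of being loaded from disk. The "sshKey" field name is an assumption about the OnionScan JSON, so check it against your own output. Using an undirected graph is also my own choice for the sketch, since sharing a key has no direction.

```python
import networkx

# Hypothetical scan results; in practice these would be loaded from the
# OnionScan JSON files. The "sshKey" field name is an assumption.
scan_results = [
    {"hiddenService": "aaaa.onion", "sshKey": "11:22:33"},
    {"hiddenService": "bbbb.onion", "sshKey": "11:22:33"},
    {"hiddenService": "cccc.onion", "sshKey": None},
]

# Undirected graph: a shared key is a mutual relationship.
graph = networkx.Graph()

for scan_result in scan_results:
    ssh_key = scan_result.get("sshKey")

    # Skip hidden services where no SSH key was recorded.
    if ssh_key:
        graph.add_node(scan_result["hiddenService"], node_type="Hidden Service")
        graph.add_node(ssh_key, node_type="SSH Key")
        graph.add_edge(scan_result["hiddenService"], ssh_key)

# A key node connected to more than one hidden service hints at
# shared infrastructure.
shared = [node for node, data in graph.nodes(data=True)
          if data["node_type"] == "SSH Key" and graph.degree(node) > 1]

print(shared)
```

Each key becomes its own node, so hidden services that share a key end up clustered around it once the graph is laid out in Gephi.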
Crack open a new Python file, name it hidden_services_graph.py and start pounding out the following code (you can download the source here):
    import glob
    import json
    import networkx

    file_list = glob.glob("/tmp/onionscan_results/*.json")

    graph = networkx.DiGraph()

Lines 1-5: we import all of our modules and then get the list of files (5) using the glob module as previously discussed in part two of this series.

Line 7: here we initialize our graph object so that we can begin adding nodes and edges to it as we discover links between hidden services, clearnet sites and IP addresses.

Now let’s iterate over each of our JSON files and start extracting the relationships that were discovered by OnionScan:
    for json_file in file_list:

        with open(json_file,"rb") as fd:

            scan_result = json.load(fd)

            edges = []

            if scan_result['linkedSites'] is not None:

                edges.extend(scan_result['linkedSites'])

            if scan_result['relatedOnionDomains'] is not None:

                edges.extend(scan_result['relatedOnionDomains'])

            if scan_result['relatedOnionServices'] is not None:

                edges.extend(scan_result['relatedOnionServices'])

Line 15: we are creating an empty list to hold the edges (connections) that we find in the JSON results.

Lines 17-19: we test to see if the hidden service has any linkedSites (17) and if it does we grab all of them and push them into our edges list using the extend function.

Lines 21-27: we repeat the same process as our previous chunk, this time testing for the relatedOnionDomains and relatedOnionServices members of the JSON.

Now we are going to loop over the various linked hidden services and clearnet sites and get them added to our graph. Let’s implement this code now:
            if edges:

                graph.add_node(scan_result['hiddenService'], node_type="Hidden Service")

                for edge in edges:

                    if edge.endswith(".onion"):

                        graph.add_node(edge, node_type="Hidden Service")

                    else:

                        graph.add_node(edge, node_type="Clearnet")

                    graph.add_edge(scan_result['hiddenService'],edge)

Lines 29-31: we test to see if there are any edges (connections) to the current hidden service (29) and if so we add the current hidden service to the graph object using the add_node function. The first parameter of the function is the name (label) of the node, and the keyword arguments that follow are stored as node attributes. (NetworkX 1.x also accepted a dictionary of attributes as the second positional parameter, but NetworkX 2.0 and later reject that form, so keyword arguments are used here.) In this case we create an attribute called “node_type” and we set it to “Hidden Service”. You can create as many node attributes as you like and name them whatever you want (instead of “node_type”). What this allows us to do later is color the graph in Gephi so that all hidden services are one color, clearnet sites another color and IP addresses a third color.

Lines 33-37: we start walking over each edge (33) and first test if the current edge ends with “.onion” (35), which indicates a hidden service. If it is a hidden service, we add it to the graph (37), again setting the node_type attribute to “Hidden Service”.

Lines 39-41: if the edge does not end with “.onion” (39) then we assume it is a clearnet site and so we add a new node to the graph object (41) and set its node_type attribute to “Clearnet”.

Line 43: we now complete the connection between our current hidden service and the edge we were just processing by using the add_edge function. This function takes two parameters: the source node and then the destination node of the connection. The source will always be the current hidden service we are processing.

Beautiful, we are almost done! Next we are going to handle any IP addresses that were detected by OnionScan when scanning the current hidden service we are processing from the list. We will add some specific code to handle them and then we will output the graph to a file so we can open it in Gephi.
            if scan_result['ipAddresses'] is not None:

                for ip in scan_result['ipAddresses']:

                    graph.add_node(ip, node_type="IP")

                    graph.add_edge(scan_result['hiddenService'],ip)

    networkx.write_gexf(graph, "onionscan-with-ips.gexf")

Lines 45-47: we test to see if there are any values in the ipAddresses field (45) from our scan result, and if so we