Saturday, 11 August 2012

Packet capture analysis in SQL Server with C5 SIGMA and Regex

I have been looking to get pcap data into a database to do some aggregation and analysis for some time now. A few days ago I came across C5 SIGMA from Command Five

C5 SIGMA takes network packet capture (pcap) data as input and produces a structured relational database that can be used for analysis and reporting using SQL queries.
This software automates TShark (a component of the free network protocol analysis tool Wireshark) to produce structured XML metadata about the packets within a collection of pcap files. The metadata is then stored in a relational database using a database schema automatically derived from the XML

I ran the pcap file from Puzzle #10 through C5 SIGMA using a SQL Server 2012 Database. Running SIGMA from source code in the Visual Studio debugger with no filters and a local SQL Server instance all running from a laptop it took about 5 minutes to process.

SIGMA displayed 20 warnings while processing the data, 4 about SQL column type incompatibility, the code gracefully modifies the database column data type to accept the value if required. The other warnings were to do with data truncation due to the nvarchar(4000) limit, I will have a look at adding an option to allow nvarchar(max). The SQL Server (max) datataypes have become a less restrictive over the versions and especially so with SQL Server 2012.

For now I am just going to look at how you could use C5 SIGMA to help with Puzzle #10 

Below is just the HTTP subset of tables.
I wanted to be able to do all my analysis from the SQL Server 2012 Management Studio however SQL Server 2012 does not have a Regex function, but it does have the ability to create CLR Functions in C# that can be called from SQL.

As the built in "CLR Database Project" for VS 2010 only supports SQL Server 2005 and 2008 I had to install Microsoft SQL Server Data Tools first. Then I could create a new "SQL Server Database Project" from "Other Languages" in Visual Studio. 

I created some Inline functions with Regex to parse a single parameter value from a full URI and one to remove anything past a ?, =, &, or ; from a full URI.

Now we can run queries to filter on the parameter 'q' used by most search boxes on web pages, not just Google and Bing
You will even see Google instant results for whatever the user typed. 

Depending on what you are looking for, you could run something like this to get list of URLs visited. 

It is possible to do way more then just http analysis e.g. simple arp enumeration

Take a look at C5 SIGMA yourself and have a play.


  1. Steve,

    Hi not sure if you still play with this project, I am not having much luck getting this to work with a recent pcap file I produced using tcpdump on a device.

    Hoping you can help out.

  2. Steve,
    Is this still a tool you are actively involved in? Were you able to make that make that change to support nvarchar(max)?

    Any feedback would be great and thanks, Matt

  3. Hi Matt,
    I haven’t been actively involved with using C5 Sigma for some time now, it also looks like Command Five haven’t published any updates to C5 Sigma as well. I did make a change to the test setup I was running at the time, however I don’t have a copy of my source code for that project anymore.
    Looking at the code just now the changes would be quite trivial.
    logic in the method UpdateColumnDefinition in AsyncDataWriter.cs would have to be changed along with the AlterColumn method in SqlServerDataWriter.cs