About Psyche

Psyche is currently in its prototype stage, so it will change quite a bit over time. Currently Psyche consists of four main parts:


User Interface

The UI is web-based and should be compatible with most browsers. We use the Flotr JavaScript graphing library for the graphs. Flotr is not perfect, but it's the best we can get our hands on right now. It's not designed for scientific visualization, so some graphs look a bit funky at first. We hope to fix that in future revisions. There is also some AJAX-y goodness in here, and we'll be adding more of it in the future to streamline the user experience.

Collection Engine

We wrote our own NetFlow v5 collection engine from scratch. It's written in C using the Netscape Portable Runtime (NSPR) to (hopefully) keep it pretty portable. In our testing it's quite quick and keeps up with ingesting at least 1 Mb/s of NetFlow datagrams.
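For readers unfamiliar with the wire format, here is a minimal sketch of decoding one NetFlow v5 datagram. It is written in Python for brevity and is not Psyche's C/NSPR code; the function name parse_netflow_v5 and the dictionary keys are made up for illustration. The field layout (24-byte header, 48-byte records, network byte order) follows the published NetFlow v5 export format.

```python
import socket
import struct

# NetFlow v5 wire layout, network byte order:
# a 24-byte header followed by `count` 48-byte flow records.
HEADER = struct.Struct("!HHIIIIBBH")
RECORD = struct.Struct("!4s4s4sHHIIIIHHBBBBHHBBH")


def parse_netflow_v5(datagram):
    """Return (header, flows) for one datagram. Names are illustrative only."""
    if len(datagram) < HEADER.size:
        raise ValueError("datagram shorter than a NetFlow v5 header")
    version, count, uptime_ms, unix_secs, _nsecs, sequence = HEADER.unpack_from(datagram, 0)[:6]
    if version != 5:
        raise ValueError("not a NetFlow v5 datagram (version %d)" % version)
    if len(datagram) < HEADER.size + count * RECORD.size:
        raise ValueError("datagram truncated: header promises %d records" % count)

    header = {"count": count, "uptime_ms": uptime_ms,
              "unix_secs": unix_secs, "sequence": sequence}
    flows = []
    for i in range(count):
        f = RECORD.unpack_from(datagram, HEADER.size + i * RECORD.size)
        flows.append({
            "src": socket.inet_ntoa(f[0]),   # srcaddr
            "dst": socket.inet_ntoa(f[1]),   # dstaddr
            "packets": f[5],                 # dPkts
            "octets": f[6],                  # dOctets
            "src_port": f[9],
            "dst_port": f[10],
            "tcp_flags": f[12],
            "protocol": f[13],               # 6 = TCP, 17 = UDP
        })
    return header, flows
```

A collector would call something like this on each UDP payload it receives from the exporter and hand the resulting records to the database.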

Running meaningful queries against large relational datasets can be very computationally and I/O intensive. Even medium-sized networks can generate millions of flow records a day. Just running a bunch of SELECTs against one huge table to implement the query logic could take minutes.

To avoid as many performance issues as we can, we have the Collection Engine (CE) preprocess the data before the UI has a chance to query it. Through a series of stored procedures that the CE launches, we match up TCP and UDP sessions, compute aggregate information about the traffic, and determine which direction the data is heading. Because the data is stirred first, the UI's queries are much faster. The CE manages this stirring process in an effort to ensure Psyche is as fast as possible. So far in development, the CE and the sorting algorithms have seen the most code churn and the most "ah-ha!" moments. We expect the CE to solidify over the next few months. However, if you have any ideas on how to tweak the CE or the stored procedures it manages, please let us know.
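To give a feel for what "matching up sessions" means, here is a rough sketch in Python. It is not the logic in Psyche's stored procedures, which this page does not spell out. The Flow tuple, the function names, and the "first flow seen is the initiator" rule for direction are all assumptions made for illustration.

```python
from collections import namedtuple

# Hypothetical, simplified flow record; real NetFlow records carry more fields.
Flow = namedtuple("Flow", "src src_port dst dst_port protocol octets")


def session_key(flow):
    """Direction-independent key: both halves of a conversation map to it."""
    a = (flow.src, flow.src_port)
    b = (flow.dst, flow.dst_port)
    return (flow.protocol,) + (a + b if a <= b else b + a)


def match_sessions(flows):
    """Pair unidirectional flows into sessions and total the bytes each way."""
    sessions = {}
    for flow in flows:
        key = session_key(flow)
        session = sessions.get(key)
        if session is None:
            # Assumption: whichever endpoint we see sending first is the initiator.
            session = sessions[key] = {
                "initiator": (flow.src, flow.src_port),
                "responder": (flow.dst, flow.dst_port),
                "octets_out": 0,  # initiator -> responder
                "octets_in": 0,   # responder -> initiator
            }
        if (flow.src, flow.src_port) == session["initiator"]:
            session["octets_out"] += flow.octets
        else:
            session["octets_in"] += flow.octets
    return sessions
```

Doing this kind of pairing once, ahead of time, is what lets later queries read one session row instead of hunting for both halves of every conversation.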

Backend Database

Like the analysis engine, the DB itself has been in a state of flux throughout the development process. So far, Postgres appears to be a good DB for us and supports everything we want to do. Probably the biggest limitation is that we can't commit a small subset of transactions in the middle of a large stored procedure. We've been able to work around this by writing external code that works on small chunks of the DB at a time.
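The chunking workaround can be sketched roughly as follows. This is an illustration only, not Psyche's actual code: it uses Python's built-in sqlite3 module as a stand-in for Postgres so the example runs anywhere, and the flows table, its processed column, and the function name are hypothetical.

```python
import sqlite3


def process_in_chunks(conn, chunk_size=1000):
    """Process pending rows a chunk at a time, committing after each chunk.

    Returns the number of chunks committed. The table and column names
    are hypothetical; the per-row work here is just flipping a flag.
    """
    chunks = 0
    while True:
        rows = conn.execute(
            "SELECT id FROM flows WHERE processed = 0 ORDER BY id LIMIT ?",
            (chunk_size,),
        ).fetchall()
        if not rows:
            return chunks
        # `rows` is a list of 1-tuples, which executemany accepts directly.
        conn.executemany("UPDATE flows SET processed = 1 WHERE id = ?", rows)
        conn.commit()  # each chunk is durable on its own
        chunks += 1
```

Because the loop lives outside the database, every chunk gets its own commit, so a failure partway through loses at most one chunk of work rather than the whole run.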

We've made great strides on the performance of the DB throughout development and hope to continue that trend. We'll probably stick with Postgres, but don't be surprised if the schema changes. Don't worry... if it does, we'll provide a migration script.

If you're interested in our database design and how the CE interacts with the DB, check out our awesome Database design doc.