Application profiling made easy
In this video, Brandon demonstrates how to use CodeLogic to discover vulnerable areas within your applications and map results down to the code level, so you can quickly locate potential problems and mitigate risk.
Brandon Tylke: My name is Brandon Tylke. I’ve been in the industry for 23 years. I lead product development and delivery for CodeLogic. My background is in architecture and relational databases. I’ve done some transpiler development, and I was formerly the dev team lead at Cofense. While there, I built distributed systems that detected emerging threats in real time. Those systems collected data from a variety of sources, and they had to be dynamically scalable: as email volumes changed throughout the day, the heuristics we used would need to spin up additional servers to keep detecting those threats.
So that’s just a little bit about me.
So what are we at CodeLogic? We’re a directory of dependencies. At the end of the day, we look at your applications and data storage, how those things interact, and what the dependencies are. We’re the single source of truth for those dependencies and a living map of them.
Our CEO Greg Wunderle always asks, “Where do you start?” If you have a problem: what broke? If you have a migration and an upgrade: how are you going to tackle that? Or perhaps you just had a system compromised because an intern allegedly used a password of SolarWinds123. It’s always nice to point out that your intern was the guy who compromised your entire system.
So why did we build this? Today, we have things like Microsoft Exchange being compromised. There are four different CVEs that allowed people to gain access to your LDAP and your Exchange system. It is the literal root of your organization’s authentication. Once you have that, you have root for the whole system, so it’s kind of a big deal. That, or maybe you have an intern who has root access to your build and deploy systems and you don’t know it. It’s good to know where your dependencies are and what they have access to.
Alright, so this is what everybody asks for: an application architecture diagram. Show me what my application is. What does it look like?
Everybody gives you this. Some form of this. And I mean, you can search for it. Literally, everybody and their brother gives you that diagram. Pages and pages and pages: that is everyone’s diagram for what your application looks like. I mean, it’s not. If we’re willing to accept that that’s an accurate diagram of your application, then this is definitely a completely accurate diagram of Hudson Yards. Right down to the choo-choo, you can’t go wrong. You can take that, build those buildings, and it’s going to work out; that’s going to be just fine.
Now suppose you take an application, actually decompose it, and find out what it looks like under the covers. With ServiceNow, you start out with a base application of 200,000 nodes and edges. It’s a big, complicated system; there are a lot of moving parts there. We have client applications that we’ve worked with that contain 1.2 million nodes. That’s just the base application, none of the database information, just the application. These things are really quite large. Let’s jump into what we have and what we’ve done on the web version of our application, and then we’ll show where we go from there.
So in our web application, what we’ve done is build it so that you can actually handle this within a browser. You are limited there. You don’t have threading. You don’t have concurrency. You’re limited in what you can show, because you certainly can’t display all 1.2 million nodes and wait for them to do conflict resolution and display themselves; it kind of breaks things. We’ve tried.
So, if we dig into this CCDB and poke around, we have a lot of good information about what’s going on here. You can view the node’s details: you have the ID, and you have audit history, so you know what has changed about this particular node over time.
In this case, it happens to be a table with columns. You can set governance rules on it if you want to. Say this is a Social Security number, and an application is now referencing that table or that column: if you want to know about it and get notifications about it, you can do that. Once you’re in here, you can bounce around, take a look at the columns, and see all of that good information. Even in here, we’re able to take a look at this, calculate the impact, and find out what within the application accesses this particular column, and that’s good information. You could put this into a Jira ticket, and now somebody knows all of the pieces they would need to be aware of if they’re going to make a change to this column.
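To make the impact calculation described above concrete, here is a minimal sketch, assuming the dependency map can be represented as a list of (source, relationship, target) edges. It is not CodeLogic's implementation or API; every class, method, table, and column name in it is hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque

# Hypothetical dependency edges: (source, relationship, target).
# None of these names come from the demo; they are illustrative only.
EDGES = [
    ("BillingService.charge", "calls", "CustomerDao.load"),
    ("CustomerDao.load", "reads", "customers.ssn"),
    ("ReportJob.run", "reads", "customers.ssn"),
    ("ReportJob.run", "reads", "orders.total"),
    ("AdminController.export", "calls", "ReportJob.run"),
]


def impact_of(target, edges):
    """Return every node that directly or transitively depends on target."""
    # Reverse index: for each node, which nodes point at it.
    dependents = defaultdict(set)
    for source, _relationship, dest in edges:
        dependents[dest].add(source)

    # Breadth-first walk backwards from the target.
    seen, queue = set(), deque([target])
    while queue:
        node = queue.popleft()
        for parent in dependents[node]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


impacted = impact_of("customers.ssn", EDGES)
for name in sorted(impacted):
    print(name)
```

The sorted list is the kind of thing that could be pasted into a ticket: everything that would need to be reviewed before the column changes.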
On to the demo type things, let’s go here...
This is Cytoscape. As we’ve tried to scale what we’re doing, we’ve looked to other fields to see what they do when they have large amounts of data, big data solutions that they need to visualize.
So, one of the most obvious was the medical field, because they look at the architecture of humans. It seemed like kind of a good fit. And what we found was Cytoscape. It’s a pretty great application, and it has a nice architecture. It’s very pluggable, and it has a good community around it for developing this and allowing us to build and add on. When you first load this up, you find that they have examples of viral DNA and RNA, and we were pretty impressed by what they had. Then we started loading our applications and found out that human viruses are actually smaller than modern applications, because there’s a lot to them.
Brandon: And again, in this format, we have access to things like threading, concurrency, and state management, where in the browser you were very limited in what you could do. There’s a lot to modern applications when you start tying them out to databases and other things. It’s pretty unwieldy. You can get into a situation now where, if you scan significant amounts of data and try to load it all, you can overrun even this application. You have to be strategic about what it is that you’re looking for.
To use an example, an older iteration of an application that we had suffered from some performance issues. So, you start with: I know I have an application, it has performance issues, and I want to determine why. Maybe you use Datadog or New Relic, you do some profiling of your application, and you figure out, ok, I have this one area and it’s giving me issues, and I can see that when the code hits this, suddenly my requests are very slow.
If we look for assets, because we’ve identified that as our problem area, we can see here that, ok, we have assets, it’s in our ORM, and this is the object that we know is having an issue at the moment. If we were to, say, center on this node, it’s going to get very large, so we can zoom out a little bit here. Then I want to know all of the things that reference this, or its parents and children, anything that’s connected to this object in particular. So, from here, we can see, ok, now I know what is in this application. I’ll try and zoom in a little, because that’s gotta be a bit of an eye chart.
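The "everything connected to this object" step can be sketched as a one-hop neighborhood query over the same kind of dependency edges. This is only an illustration under assumed data: the edge list and names below are hypothetical, and it does not use the Cytoscape or CodeLogic APIs.

```python
# Hypothetical (source, target) dependency edges; not taken from the demo.
EDGES = [
    ("AssetService.ingest", "Asset"),
    ("AssetCache.get", "Asset"),
    ("Asset", "assets"),
    ("Asset", "Asset.id"),
    ("assets", "assets.id"),
]


def direct_references(node, edges):
    """Return (parents, children): what points at node, and what node points at."""
    parents = sorted({source for source, target in edges if target == node})
    children = sorted({target for source, target in edges if source == node})
    return parents, children


parents, children = direct_references("Asset", EDGES)
print("referenced by:", parents)
print("references:", children)
```

Restricting the view to one hop around a known problem node is one way to stay strategic about what gets loaded instead of rendering the whole graph.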
Ned: Hey, quick question for you Brandon. So, you did a switch there to a different visualization, but the idea is that it’s still showing the same data, right?
Brandon: Yes, this is the same data. That’s another nice thing about this application: it allows us to have different visualizations for the same data, if you want a better idea of what something looks like. In fact, we’ll touch on that in a minute, because one of the things you can do is take the thirty-thousand-foot view of your entire application, and you can actually see the problem in that same application just by visually inspecting it.
If we do a layout here for a hierarchy, these are all of the fields or methods on that class. What we found was that, in this particular instance, it references this assets table. In fact, let me pull up the code here, because this is the same assets. We’re able to look at this and see what references it, and there’s a lot going on here that’s connected to this and consuming it, but it really doesn’t give us an idea as to why it may be an issue in our application. We have caching, it handles rewrite, all of that’s fine. But when we started poking around, the question for this node became: ok, what’s it tied to? We find that this assets table has a lot of connections. We have our columns and all of that. What we ended up finding was that the “id” column is connected to everything. Each one of these salmon-colored lines is a foreign key reference to another table within the application.
Ned: And there’s a legend for that, right?
Brandon: There is a legend for that. We have color-coding and everything. Yeah.
You can hover over, and it’ll tell you what it is, but then there’s always the legend to fall back on as well if you needed fast reference for multiple items.
What we found was, all of these foreign key references are coming back to a single column in this one table and, in this, it became the bottleneck of our application. It was the reason, even though the object was supposed to be cached, we’re still hitting this issue that all of these references are being locked and waiting. It was the single bottleneck for ingestion within the application.
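The same hotspot can be found without a visualization by counting how many foreign keys point at each column. The sketch below assumes the foreign key relationships are available as (referencing column, referenced column) pairs; the table and column names are hypothetical stand-ins, not the schema from the demo.

```python
from collections import Counter

# Hypothetical foreign keys: (referencing column, referenced column).
FOREIGN_KEYS = [
    ("asset_tags.asset_id", "assets.id"),
    ("asset_audit.asset_id", "assets.id"),
    ("asset_usage.asset_id", "assets.id"),
    ("asset_hierarchy.parent_id", "assets.id"),
    ("asset_tags.tag_id", "tags.id"),
]


def most_referenced(foreign_keys, top=3):
    """Rank referenced columns by the number of foreign keys pointing at them."""
    counts = Counter(target for _source, target in foreign_keys)
    return counts.most_common(top)


for column, count in most_referenced(FOREIGN_KEYS):
    print(f"{column}: {count} incoming foreign keys")
```

A column whose count dwarfs the rest is a candidate for the kind of single bottleneck described here, though a high count alone does not prove contention; it only tells you where to look first.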
So, let’s take another look at this same data. I’ll do a hierarchy layout here, and this is kind of the thirty-thousand-foot view; we’ll move in from there. There are a lot of nice colors, and it’s kind of a neat layout, but you’ll notice again that this pink area here stands out. Once you know what colors you’re looking for, you can see this is a lot of foreign key references in the application. When you zoom in, you find that there are a lot of those, and here’s the one guy that really stands out, because this is the one that has a whole lot of foreign key references. If we take this and do what we did before, viewing just the direct references and doing the hierarchy layout again, you can start drilling into the application and find out what’s referencing it for each one of those.
It allows you to drill back in then to the tables, from the table we can find out then which application pieces, which code pieces are actually referencing this, and it allows us to move back through the code and we find this asset hierarchy usage audit. From there, again you can just work your way back to all of the different pieces that are referencing this within your application. It kind of gives you a way to work backwards and find out where your problem areas are and then determine what those pieces are that access them.
Again, once you’ve identified something, even with a tool like Mentis, and you know that you have important, sensitive data, you can use this to view the entire application architecture and then drill into those pieces. You’ll know what things are accessing that sensitive data and what things may actually be exposing it to an external API, or to your intern that has bad password management, or to your Exchange system.
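One way to picture "what may be exposing sensitive data to an external API" is as a path search from externally reachable entry points to the sensitive column. The sketch below is a hedged illustration under assumed data: the graph, the entry points, and the column name are all hypothetical, and this is not how CodeLogic or Mentis is implemented.

```python
# Hypothetical call/access graph: node -> nodes it calls or reads.
GRAPH = {
    "PublicApi.getCustomer": ["CustomerService.find"],
    "CustomerService.find": ["CustomerDao.load"],
    "CustomerDao.load": ["customers.ssn"],
    "HealthCheck.ping": ["Metrics.count"],
}


def exposure_paths(entry_points, sensitive, graph):
    """List every acyclic path from an entry point to the sensitive node."""
    results = []

    def walk(node, path):
        path = path + [node]
        if node == sensitive:
            results.append(path)
            return
        for nxt in graph.get(node, []):
            if nxt not in path:  # do not revisit nodes already on this path
                walk(nxt, path)

    for entry in entry_points:
        walk(entry, [])
    return results


paths = exposure_paths(
    ["PublicApi.getCustomer", "HealthCheck.ping"], "customers.ssn", GRAPH
)
for path in paths:
    print(" -> ".join(path))
```

Each printed path is a chain worth reviewing; entry points with no path to the sensitive node simply produce nothing.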
So, this is coming out in beta form in our next release. We have a lot of exciting things happening. We’re working with the community and contributing back to Cytoscape, and there’s a lot in this area that we’re working on. I’m pretty excited about what we’re doing.