Why?
Because, if you're monitoring security scenarios in your staging environments, then you are passively making your company safer, just by playing with new technologies. This post shows the data to back this claim up, and provides a not-so-subtle nod to our OpsSight Connector product, a completely open source platform for securing your cloud native data center by scanning each and every container in a non-invasive way.
Please note the key thing here: a non-invasive threat detection model for security scanning in staging environments is critical. It encourages experimentation over time, in a way that gives you real-time oversight of, and insight into, the dynamics of your production threat model.
In Blackduck's OpsSight Connector, we plot real-time scan throughput for all images in a cluster via Prometheus. The overall picture often looks something like this:
- container churn: assuming this is directly proportional to the number of apps, i.e. if you have 2000 apps, containers might churn at a rate of 1000 a month.
- total app count (in a large cluster, 2000 to 10000 is reasonable).
- container vulnerability probability, low severity (roughly 1 in 9 containers will have these).
- container vulnerability probability, medium severity (roughly 1 in 9).
- container vulnerability probability, high severity (roughly 1 in 9).
- rate of scanning (finding vulnerabilities), reasonably around 100-1000 a day, depending on how you scan.
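The parameters above can be collected into a single struct, which is roughly what a simulator needs as input. This is a minimal sketch; the field names and default values here are illustrative assumptions, not the actual types used in the vuln-sim repository:

```go
package main

import "fmt"

// ClusterParams captures the statistical inputs listed above.
// All names and defaults are illustrative assumptions.
type ClusterParams struct {
	Apps             int     // total apps in the cluster
	ContainersPerApp int     // average containers per app
	MonthlyChurn     int     // containers replaced per month
	PLow             float64 // P(low-severity vuln) per container
	PMed             float64 // P(medium-severity vuln) per container
	PHigh            float64 // P(high-severity vuln) per container
	ScansPerDay      int     // scan throughput
}

func main() {
	p := ClusterParams{
		Apps: 2000, ContainersPerApp: 9, MonthlyChurn: 1000,
		PLow: 1.0 / 9, PMed: 1.0 / 9, PHigh: 1.0 / 9,
		ScansPerDay: 500,
	}
	fmt.Printf("%+v\n", p)
}
```

With 2000 apps at 9 containers each, that's on the order of 18000 containers to keep scanned, which is why scan throughput matters so much.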
To just get down to the code and run it:
1) First, start a local Prometheus:
rm -rf ./data/* ; ./prometheus
2) git clone https://github.com/jayunit100/vuln-sim.git
cd clustersim && go build ./ && ./clustersim
Then open up localhost:9090 and check out the graphs. Details below... You should ultimately see a chart like this (explained later on).

Premise
Before moving forward into the stochastic nature of finding vulnerabilities in your clusters, we have to make one thing clear ~ this article assumes (1) You cannot at any given time keep up with the potential amount of new images in your cluster and (2) you never will be able to do (1).
If either of these assumptions is false, then the need to think deeply about the probability of a vulnerability being introduced to your cluster is obviated by the fact that, at any given time, you have the 'god view' of all vulnerabilities.
However, given that an application might have gigabytes of code, libraries, and binaries in it - and that apps continually change the libraries they depend on, and even their base images - you'll likely never be in a place where you have such a view. So you need a threat model that tells you what you may or may not be at risk for.
Thus, we need to talk a little bit about probability before we dive in.
Distributions
Note: this is just an initial treatment of probabilistic selection of events. For something more sophisticated, check out RJ's blog post at http://rnowling.github.io/math/2015/07/06/bps-product-markov-model.html. Markov models, which allow you to select random elements in a more hierarchical fashion, can be even more realistic, but are still ultimately based on probability distributions.
Before we go through the findings from a simulator I've built for cluster vulnerability, let's talk about the backbone of that simulator: the normal distribution.
In a 'normal' environment, where people tend to make similar decisions (either because of social pressure, or because there are inherent similarities in the way most people perform a given task), picking a random element from a collection to simulate human behavior should be done against a probability distribution, like this one:
Blackduck's OpsSight Connector is a really powerful tool for securing your OpenShift or Kubernetes environments. It uses Blackduck APIs to scan *every* container, new or old, in your cluster over time.
However, as we all know, you're never 100% secure, and you need to use your intuition to decide how aggressively you should tune any security product, including OpsSight itself, for low latency.
So how should you integrate a product like OpsSight into your data center?
The simplest thing, in my opinion, is to take some reasonable statistical realities of your cluster and run a simulation of how your threat model changes over time.
From wikipedia:
Threat modeling is a process by which potential threats, such as structural vulnerabilities, can be identified, enumerated, and prioritized – all from a hypothetical attacker's point of view.
Threat modeling with simulators.
So, what does your vulnerability profile look like in a kubernetes cluster, where you're scanning every image that comes in reactively?
Initially, assuming a normal distribution of images (i.e. there is a large segment of reuse), you see something like this:
Ignore the X axis; we are plotting time series data where milliseconds map to days. In any case, assuming you scan 100-200 images a day, even as far out as day 70, with a 10% modulation in 100 apps a day (average containers per app = 9), we can see periodic vulnerabilities that pop up and stick around for a short period of time. These spikes below happen long after the above drop off...
The integral under the curve below is your actual vulnerability over time. In other words, integrating this metric over a given time window gives you the total vulnerability exposure for that period - i.e.
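As a sketch of what that integral means in practice: if you sample the vulnerable-container count once per day, the total exposure over a window is just the area under that series. Here is a minimal trapezoidal-rule version (the function name is mine, not from the simulator):

```go
package main

import "fmt"

// vulnExposure approximates the integral of vulnerable-container
// counts over time using the trapezoidal rule. samples[i] is the
// count on day i; the result is total "vulnerability exposure"
// (container-days of vulnerability) for the window.
func vulnExposure(samples []float64) float64 {
	total := 0.0
	for i := 1; i < len(samples); i++ {
		total += (samples[i-1] + samples[i]) / 2.0
	}
	return total
}

func main() {
	// a vulnerability spike that appears and decays over five days
	days := []float64{0, 4, 10, 4, 0}
	fmt.Println(vulnExposure(days)) // prints 18
}
```

Two clusters can have identical peak vulnerability counts but very different exposure, depending on how quickly those spikes decay.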
[Image: This blog post is now officially legit: It has a math thingy in it.]
Are there other ways to model which containers are selected? Yes: you could assume that apps have a completely randomly distributed set of containers. In that case, you get an entirely different vulnerability profile!
In my simulator, by replacing the "image simulator" based on a normal distribution with a uniformly random one, I got a much higher initial vulnerability scenario... see the chart on the right ->
However: given the same churn, you actually get much less vulnerability in the long run. That is, if your developers are more experimental early on, your cluster will be safer in the long run if you're monitoring the whole time.
I feel like this has broader implications than just security: if your developers experiment, innovate, and take risks on a daily basis, it de-risks your products over time - not only from a security standpoint, but also from a stability, performance, and innovation standpoint.





