The secret language of statistics, so appealing in a fact-minded culture, is employed to sensationalize, inflate, confuse, and oversimplify. With COVID-19 at the forefront of everyone’s mind, statistical use cases abound. Trend lines show infections and deaths. Media organizations like the New York Times and Washington Post compile data into informative charts. Results from vaccine trials electrify the airwaves and newspapers.
At the same time, statistics and math are used by some to challenge the threat. Cybersecurity decision-makers also rely on metrics, data, and statistics to make decisions. This short post covers a few areas to pay attention to when interpreting information related to cybersecurity. Good data and metrics are ones you can act on.
Sources of Data
In analyzing data, the sample that the data comes from is the basis for the results. Real-world examples abound. If a public opinion survey is conducted by calling landline phones at noon, the results are biased. Only people who have landlines are included in the study, and anyone working away from home at that hour is not counted. Using the results of this survey as a proxy for a larger group, like a whole country, is thus misleading.
The sample might also be too small to be statistically significant. Asking 10 people down the street their opinions on a matter will yield a result, but treating that result as representative of a larger population is not valid. Statistical significance and the margin of error determine whether a sample is meaningful.
Cybersecurity Impact: In the context of cybersecurity, the point remains. It is important to understand how sample bias and sample size affect cybersecurity decision-making. For example, if a security analyst looks at a generic metric that shows 80% of attacks are related to email phishing, they might conclude that more attention should be paid to this attack vector. However, if only 5 phishing attacks were observed over a period of weeks or months, that level of certainty may not be warranted. A larger sample must be observed to justify conclusions and actions.
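To make the sample-size point concrete, here is a minimal sketch that computes an approximate 95% confidence interval (the Wilson score interval) for an observed proportion. The counts are assumptions chosen for illustration: 4 phishing attacks out of 5 observed, versus 400 out of 500. Both give the same 80% headline figure. The function name wilson_interval is hypothetical, not part of any library.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (z=1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Assumed counts for illustration: same 80% share, very different sample sizes.
small = wilson_interval(4, 5)
large = wilson_interval(400, 500)

print(f"4 of 5 attacks:     {small[0]:.0%} to {small[1]:.0%}")
print(f"400 of 500 attacks: {large[0]:.0%} to {large[1]:.0%}")
```

With these assumed counts, the interval for the 5-attack sample spans roughly 38% to 96%, while the 500-attack sample narrows to roughly 76% to 83%. The headline number is identical, but only the larger sample supports a confident decision.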
When it comes to cybersecurity, it is important to gather security-relevant data in real time from as many sources as possible. This gives security analysts and incident responders the most comprehensive and up-to-date data for making decisions and reacting to current security threats.
Not all charts are equal. The impression a chart gives can influence opinions and decisions. The data underlying a chart is fixed, but the way it is displayed can change.
For example, a data set might show an increase from 87 to 95 over a period of time. A chart with a vertical axis running from 0 to 100 makes the data look much different than a chart with a narrower axis.
Changing the minimum value from 0 to 75 makes the change look more dramatic. The widget-monger might look at the upward trend and conclude things are going amazingly well.
To really convince the viewer of an upward trend, the axis might run from 87 to 95!
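The effect of the axis range can be quantified with a short sketch. It computes how much of the chart's vertical height the same 87-to-95 change occupies under each axis range mentioned above. The helper name visual_share is hypothetical, and the sketch assumes the 75-based axis still tops out at 100.

```python
def visual_share(start, end, axis_min, axis_max):
    """Fraction of the vertical axis height covered by the change from start to end."""
    if axis_max <= axis_min:
        raise ValueError("axis_max must be greater than axis_min")
    return (end - start) / (axis_max - axis_min)


# Same data (87 -> 95), three different vertical axis ranges.
for axis_min, axis_max in [(0, 100), (75, 100), (87, 95)]:
    share = visual_share(87, 95, axis_min, axis_max)
    print(f"axis {axis_min}-{axis_max}: change fills {share:.0%} of the chart height")
```

The underlying change is 8 points in every case. Only the share of the chart it fills differs, from 8% on the 0-100 axis to the full height on the 87-95 axis.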
The axis comes into play again when it comes to logarithmic and linear scales.
Logarithmic charts space the axis according to the percent change in the values, so equal distances represent equal ratios.
Linear charts space the axis at fixed intervals, so equal distances represent equal absolute differences.
Log charts help tease out the relative differences between values when one value is an outlier. However, misinterpreting the chart or overlooking the scale can result in missed information. The absolute size of the changes is not as clear.
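A small sketch shows the difference between the two scales. The values are hypothetical counts, each ten times the previous one. On a linear axis the gaps between points grow enormously, while on a base-10 log axis every step is the same distance.

```python
import math

# Hypothetical counts, each 10x the previous one.
values = [10, 100, 1000, 10000]

# Distance between neighbouring points on a linear axis (absolute difference).
linear_steps = [b - a for a, b in zip(values, values[1:])]

# Distance between the same points on a base-10 logarithmic axis (ratio).
log_steps = [math.log10(b) - math.log10(a) for a, b in zip(values, values[1:])]

print("linear steps:", linear_steps)
print("log steps:   ", [round(step, 3) for step in log_steps])
```

The identical log steps make the steady 10x growth easy to see, but they hide the fact that the last step adds 9,000 while the first adds only 90. That is the volume information a reader can miss if the scale is overlooked.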
Cybersecurity Impact: When looking at charts related to cybersecurity data sets, pay close attention to scales. If you see a sharp increase in slope, do not assume the increase is worth acting on until you confirm what the underlying data is. Use the right chart for the characteristics of the information you want to analyze.
Averages should not be taken as a reflection of reality. For example, a camper planning a trip to the desert might look at average temperatures to decide what to pack. If the temperature is 95° during the day and 40° at night, the average is about 68°. The ill-informed camper might pack just a light jacket and pants, and be unprepared for both extremes.
Cybersecurity Impact: In the context of cybersecurity decision-making, averages come into play in many areas. For example, if the CISO sees that the SOC handles 1,000 alerts per day on average, he might plan staffing accordingly. However, an average sometimes hides the truth. On one day there could be 250 alerts, and on the following day 1,750. Staffing according to the average will leave staff overwhelmed when the peaks occur. The more rational approach would be to invest in security orchestration, automation, and response (SOAR) technology to manage the spikes and deal with anomalies.
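The staffing problem can be sketched in a few lines. The daily counts below are hypothetical, apart from the 250 and 1,750 figures from the example above, and the sketch assumes the SOC is staffed to handle exactly the average daily volume.

```python
import statistics

# Hypothetical daily alert counts; 250 and 1,750 come from the example above.
daily_alerts = [250, 1750, 400, 1600, 1000]

average = statistics.mean(daily_alerts)
peak = max(daily_alerts)

# Assumption: the SOC is staffed to handle exactly the average volume per day.
capacity = average
overflow = [max(0, alerts - capacity) for alerts in daily_alerts]

print(f"average per day: {average}")
print(f"peak day:        {peak}")
print(f"unhandled alerts per day at average staffing: {overflow}")
```

The average comes out to 1,000 alerts per day, yet on the peak day 750 alerts exceed what average-based staffing can absorb. Looking at the peak and the spread alongside the average gives a more honest picture of the workload.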
The book I referenced at the start says, “If you can’t prove what you want to prove, demonstrate something else and pretend that they are the same thing. In the daze that follows the collision of statistics with the human mind, hardly anybody will notice the difference.” This statement relates to information that is irrelevant to the decision being made or analyzed. The marketing world calls this type of information vanity metrics.