Anomaly detection
What is it?
Anomaly detection, also called outlier detection, is the process of identifying items within a series of values that differ significantly from the norm.
It assumes you have data that falls within a certain understood range (for example, based on historical data), and that values falling outside that range occur only very rarely.
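To make this concrete, here is a minimal Python sketch; the historical readings and the three-standard-deviation cut-off are illustrative assumptions, not part of any standard method.

```python
# A minimal sketch: flag values that fall outside a range learned from
# historical data. The data and the 3-sigma threshold are assumptions.
import statistics

historical = [20.1, 19.8, 20.5, 20.0, 19.9, 20.3, 20.2]  # known-normal values
mean = statistics.mean(historical)
stdev = statistics.stdev(historical)

def is_anomaly(value, k=3):
    """Return True if value lies more than k standard deviations from the mean."""
    return abs(value - mean) > k * stdev

print(is_anomaly(20.4))  # False: within the understood range
print(is_anomaly(35.0))  # True: a rare value outside that range
```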
Why use it?
It's important for network administrators to be able to identify and react to changing operational conditions. Even a subtle change in the operating conditions of data centres or cloud applications can carry significant risk. This is where anomaly detection comes in.
An anomaly is a deviating data value that can be linked to an event of interest.
Take the example of a sudden increase in temperature in a piece of equipment. During normal operation, historical data may show that operating temperatures fall within a certain range and that a temperature beyond that range usually leads to failure. In this case, it would be crucial to detect a high-temperature reading above a certain threshold, as it is a sign of impending failure and should prompt a maintenance inspection and the pre-positioning of a replacement part.
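A short sketch of this scenario; the threshold and readings are invented for illustration, and a real system would derive the threshold from historical operating data.

```python
# Hypothetical failure threshold, assumed to come from historical data.
FAILURE_THRESHOLD_C = 90.0

def check_reading(temperature_c):
    """Flag a reading above the threshold so maintenance can be scheduled."""
    if temperature_c > FAILURE_THRESHOLD_C:
        return "ALERT: possible impending failure, schedule an inspection"
    return "OK"

for reading in [72.5, 85.1, 97.3]:
    print(reading, check_reading(reading))
```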
Example
Case Study: Doxel
Doxel is a Silicon Valley start-up that provides AI-enhanced software focusing on improving construction productivity.
For example, rugged robots and drones are used to monitor and scan work sites. A Doxel robot scans construction sites every day to track progress, checking that the right equipment is installed in the right place at the right time.
When a construction site shuts down for the night, the robot gets to work, scanning the site and uploading data to the cloud. Algorithms then flag anything that deviates from the building plans so that a manager can fix it the next day. The robot can follow prescheduled paths that even include stairs, and a single robot can scan about 30,000 square metres over the course of a week.
Categories
Supervised machine learning builds a predictive model using a labelled training set with two types of samples, 'normal' and 'abnormal'. When a new data point arrives, the model classifies it as either normal or abnormal (see the sketch after the list below).
The advantage of supervised models is that they offer a higher rate of detection. This is because they can:
return a confidence score with model output
incorporate both data and prior knowledge
encode interdependencies between variables.
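Here is a minimal supervised sketch; scikit-learn and the tiny labelled data set are assumptions made for the example, not a prescribed tool or data.

```python
# A supervised sketch: a classifier trained on labelled normal/abnormal
# samples, returning a label and a confidence score for new points.
from sklearn.ensemble import RandomForestClassifier

X_train = [[20.1], [19.8], [20.5], [20.0], [95.2], [98.7]]  # e.g. temperatures
y_train = [0, 0, 0, 0, 1, 1]  # labelled samples: 0 = normal, 1 = abnormal

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

X_new = [[20.3], [96.0]]
labels = model.predict(X_new)        # classify as normal or abnormal
scores = model.predict_proba(X_new)  # confidence score with the output
for x, label, score in zip(X_new, labels, scores):
    print(x, "abnormal" if label else "normal", "confidence:", max(score))
```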
Unsupervised detection is the more common version of anomaly detection. The method does not require manual labelling of training data as the machine learning model is trained to fit normal behaviour using an unlabelled data set.
With this approach, it's assumed that:
the majority of the data points are normal, and
only a small, statistically different percentage of network traffic is abnormal.
In other words, it operates by flagging infrequent data groups that are significantly different to normal as anomalous.
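A minimal unsupervised sketch, using scikit-learn's IsolationForest as one common (assumed) choice of algorithm; the model is fitted without any labels and flags points that sit apart from the bulk of the data.

```python
# An unsupervised sketch: no labels, the model fits the (mostly normal)
# data and isolates the small fraction that looks statistically different.
from sklearn.ensemble import IsolationForest

X = [[20.1], [19.8], [20.5], [20.0], [20.2], [19.9], [97.3]]  # mostly normal

model = IsolationForest(contamination=0.15, random_state=0).fit(X)
print(model.predict([[20.3], [96.0]]))  # 1 = normal, -1 = anomalous
```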
Semisupervised anomaly detection can mean one of two things:
training a model of normal behaviour on an unlabelled data set that contains both normal and abnormal points, in a train-as-you-go approach, or
building a model from a data set that is only partially flagged, in which case only the flagged portion of the data is used (see the sketch below).
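A minimal sketch of the second variant, assuming the flagged portion consists of normal points only; OneClassSVM is an illustrative choice of one-class model, not the only option.

```python
# A semisupervised sketch: fit a one-class model on the flagged (normal)
# portion of the data, then score unseen points against that model.
from sklearn.svm import OneClassSVM

X_normal = [[20.1], [19.8], [20.5], [20.0], [20.2]]  # flagged-normal portion

model = OneClassSVM(nu=0.1).fit(X_normal)
print(model.predict([[20.3], [96.0]]))  # 1 = fits the normal model, -1 = anomaly
```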