Anomaly detection

What is it?

Anomaly detection, also called outlier detection, is the process of identifying data points, items or events that differ significantly from the norm within a series of values.

It assumes you have data that falls within a certain understood range (for example, based on historical data), and that values falling outside that range occur only very rarely.
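As a minimal sketch of this assumption, the Python below derives a "normal" range from historical values as the mean plus or minus k standard deviations, then flags new values that fall outside it. The three-sigma choice (k = 3), the function names and the sample readings are illustrative assumptions, not part of the original text. A rule like this works best when the historical data is roughly normally distributed.

```python
import statistics

def normal_range(history, k=3.0):
    """Derive a 'normal' range from historical values as mean +/- k standard deviations.
    k=3 is an illustrative default, not a universal rule."""
    mean = statistics.mean(history)
    std = statistics.stdev(history)  # sample standard deviation
    return mean - k * std, mean + k * std

def flag_outliers(values, low, high):
    """Return the values that fall outside the [low, high] range."""
    return [v for v in values if v < low or v > high]

# Hypothetical historical readings used to learn the normal range
history = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 10.1, 9.7, 10.0, 10.4]
low, high = normal_range(history)

# Hypothetical new observations: only 14.5 lies outside the learned range
print(flag_outliers([10.2, 9.9, 14.5, 10.0], low, high))  # [14.5]
```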

Why use it?

It's important for network administrators to be able to identify and react to changing operational conditions. Even subtle changes in the operating conditions of data centres or cloud applications can carry significant risks. This is where anomaly detection comes in.

An anomaly is a data value that deviates from the norm and can be linked to an event of interest.

Take the example of a sudden increase in temperature in a piece of equipment. During normal operation, historical data may show that operating temperatures fall within a certain range and that temperatures beyond that range usually lead to failure. In this case, it is crucial to detect any reading above that threshold, because it signals an impending failure. The alert can then prompt a maintenance inspection and the pre-positioning of a replacement part.
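The temperature example can be sketched as a simple threshold check. In this hypothetical Python version, `FAILURE_THRESHOLD_C` stands in for the limit that historical data would suggest (the value 85.0 is made up for illustration), and the optional `consecutive` parameter, also an assumption, requires several readings in a row above the limit before an alert is raised. That reduces false alarms caused by a single noisy reading.

```python
from typing import Iterable, Optional

# Hypothetical limit: assume historical data show that temperatures above
# this value usually precede failure. The number is illustrative only.
FAILURE_THRESHOLD_C = 85.0

def first_breach(readings: Iterable[float],
                 threshold: float = FAILURE_THRESHOLD_C,
                 consecutive: int = 1) -> Optional[int]:
    """Return the index at which `consecutive` readings in a row have exceeded
    `threshold`, or None if that never happens."""
    run = 0  # length of the current streak of above-threshold readings
    for i, temp in enumerate(readings):
        run = run + 1 if temp > threshold else 0
        if run >= consecutive:
            return i  # this is where a maintenance alert would be raised
    return None

# Hypothetical stream of temperature readings in degrees Celsius
readings = [71.2, 72.0, 70.8, 86.3, 73.1, 88.0, 89.5]
print(first_breach(readings))                 # 3
print(first_breach(readings, consecutive=2))  # 6
```

Raising `consecutive` trades detection speed for fewer false alarms: with `consecutive=2`, the isolated spike at index 3 is ignored and the alert fires at index 6 instead.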

Example

Case Study: Doxel

Doxel is a Silicon Valley start-up that provides AI-enhanced software aimed at improving construction productivity.

Doxel uses rugged robots and drones to monitor and scan work sites. A Doxel robot scans a construction site every day to track progress, recording what has been installed and checking that the correct type of equipment was installed in the correct place at the correct time.

When a construction site shuts down for the night, the robot gets to work, scanning the site and uploading the data to the cloud. Algorithms then flag anything that deviates from the building plans so that a manager can fix it the next day. The robot can follow prescheduled paths that even include stairs, and a single robot can scan about 30,000 square metres over the course of a week.

For more, watch this video:

https://youtu.be/L6uCtSzr59k

Categories

Supervised anomaly detection builds a predictive model from a labelled training set containing two classes of samples, 'normal' and 'abnormal'. Each new data point is then classified as either normal or abnormal, depending on which class it most closely matches.
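To illustrate the supervised idea, the sketch below uses a nearest-centroid classifier. This is one simple supervised technique among many, chosen here for brevity; the original text does not name a specific method. It learns the average feature vector of the labelled 'normal' and 'abnormal' samples and assigns each new point to whichever average is closer. The two features (temperature and vibration) and all sample values are hypothetical.

```python
import math

def train_centroids(samples):
    """samples: list of (features, label) pairs, where label is 'normal' or 'abnormal'.
    Returns the mean feature vector (centroid) for each label."""
    sums, counts = {}, {}
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def classify(centroids, features):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))

# Hypothetical labelled training set: (temperature, vibration) -> label
training = [
    ((70, 0.2), "normal"), ((72, 0.3), "normal"), ((71, 0.25), "normal"),
    ((90, 1.5), "abnormal"), ((95, 1.8), "abnormal"),
]
centroids = train_centroids(training)
print(classify(centroids, (73, 0.3)))  # normal
print(classify(centroids, (88, 1.2)))  # abnormal
```

In practice, features on very different scales are usually standardised first, so that one feature (here, temperature) does not dominate the distance calculation.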

The advantage of supervised models is that they offer a higher rate of detection. This is because they can:

Unsupervised detection is the more common form of anomaly detection. It does not require manually labelled training data, because the machine learning model is trained to fit normal behaviour using an unlabelled data set.

With this approach, it's assumed that anomalies occur only rarely in the data and that they differ significantly from normal instances.

In other words, it flags as anomalous any infrequent groups of data points that differ significantly from the normal data.
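One common unsupervised technique that fits this description is k-nearest-neighbour distance scoring: a point whose k-th nearest neighbour is far away lies in a sparse, infrequent region of the data, so it gets a high anomaly score. The sketch below assumes 2-D points, k = 2 and a hypothetical cutoff of three times the median score. These are illustrative choices, not part of the original text, and no labels are used at any point.

```python
import math
import statistics

def knn_scores(points, k=2):
    """Anomaly score for each point = distance to its k-th nearest neighbour.
    Points in dense (frequent) regions get low scores; isolated points get high ones.
    Requires k < len(points). Compares all pairs, so it suits small data sets only."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

def flag_anomalies(points, k=2, factor=3.0):
    """Flag points whose score exceeds `factor` times the median score.
    The median-based cutoff is a hypothetical rule chosen for illustration."""
    scores = knn_scores(points, k)
    cutoff = factor * statistics.median(scores)
    return [p for p, s in zip(points, scores) if s > cutoff]

# Hypothetical unlabelled data: a tight cluster plus one isolated point
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (8, 8)]
print(flag_anomalies(points))  # [(8, 8)]
```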

Semi-supervised anomaly detection can mean one of two things: