Cyber Security and Confusion Matrix

Monil Goyal
4 min read · Jun 5, 2021

How a confusion matrix helps make our machine learning models more accurate, and why it matters in the cyber security world.

What is machine learning?

Machine Learning is the study of making machines more human-like in their behaviour and decisions by giving them the ability to learn and develop their own programs. This is done with minimum human intervention, i.e., no explicit programming.

How does machine learning work?

Machine learning is a form of artificial intelligence (AI) that teaches computers to think in a similar way to how humans do: by learning from and improving upon past experience. It works by exploring data and identifying patterns, and it involves minimal human intervention.

What is a cyber attack?

A cyber attack is an attempt to gain unauthorized access to a computer, computing system or computer network with the intent to cause damage. Cyber attacks aim to disable, disrupt, destroy or control computer systems or to alter, block, delete, manipulate or steal the data held within these systems.

The need for AI in preventing cyber attacks

AI is ideally suited to solve some of our most difficult problems, and cybersecurity certainly falls into that category. With today’s ever-evolving cyber-attacks and proliferation of devices, machine learning and AI can be used to “keep up with the bad guys,” automating threat detection and responding more efficiently than traditional software-driven approaches.

Is AI capable of preventing cyber attacks with 100% accuracy?

You might answer yes, but no machine learning model can achieve 100% accuracy, so AI cannot prevent cyber attacks with 100% accuracy either. However, we can get close to 100% by taking measures such as training the model until it has the smallest possible chance of making a wrong decision.

One tool that helps here is the confusion matrix, which tells us how accurate our machine learning model is and what kinds of mistakes it makes.

What is a confusion matrix?

In machine learning classification problems, a confusion matrix is a table with a specific layout that compares the model’s predictions with the actual outcomes. It summarises the performance of our trained model using four factors, named with the following letters:

T → True: the prediction was correct

F → False: the prediction was wrong

P → Positive

N → Negative

Let’s suppose positive means the prediction is in our favour and negative means against us.

TP → the prediction is in our favour and it is true, i.e. the actual output is also in our favour.

TN → the prediction is against us and it is true, i.e. the actual output is also against us.

FP → the prediction is in our favour and it is false, i.e. the actual output is against us.

FN → the prediction is against us and it is false, i.e. the actual output is in our favour.
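As a minimal sketch of the four definitions above, the following Python snippet counts TP, TN, FP and FN by comparing predictions with actual outcomes. The function name `confusion_counts` and the sample labels are made up for illustration; 1 stands for “in our favour” and 0 for “against us”.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, TN, FP and FN for a two-class (binary) classifier."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted in our favour, and it really is
        elif p != positive and a != positive:
            tn += 1  # predicted against us, and it really is
        elif p == positive and a != positive:
            fp += 1  # predicted in our favour, but actually against us
        else:
            fn += 1  # predicted against us, but actually in our favour
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


# Hypothetical labels: 1 = "in our favour", 0 = "against us".
actual = [1, 1, 0, 0, 1, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

counts = confusion_counts(actual, predicted)
print(counts)  # {'TP': 3, 'TN': 3, 'FP': 1, 'FN': 1}
```

Arranging these four counts in a two-by-two table, with actual outcomes on one axis and predictions on the other, gives the confusion matrix itself.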

So FP is the most important factor that the confusion matrix reveals in this setting: it counts the predictions that the model reports as being in our favour when the actual outcome is against us. Because the model shows these cases as favourable, we will not be aware of the problem, and this may cause huge losses.

Let’s understand Type 1 and Type 2 errors with a basic example.
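To see why the overall accuracy figure alone can hide this risk, here is a small sketch that derives a few common rates from the four counts. The helper name `metrics` and the counts are hypothetical, chosen only to illustrate the point; they do not come from a real system.

```python
def metrics(tp, tn, fp, fn):
    """Derive common rates from the four confusion matrix counts."""
    total = tp + tn + fp + fn
    return {
        # Share of all predictions that were correct.
        "accuracy": (tp + tn) / total,
        # Share of "in our favour" predictions that were really in our favour.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Share of "against us" cases that the model wrongly showed as favourable.
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        # Share of "in our favour" cases that the model wrongly showed as against us.
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
    }


# Hypothetical counts for 1000 predictions, only 50 of which are truly "against us".
m = metrics(tp=940, tn=40, fp=10, fn=10)

print(f"accuracy            = {m['accuracy']:.2%}")             # 98.00%
print(f"false positive rate = {m['false_positive_rate']:.2%}")  # 20.00%
```

With these made-up numbers the model is 98% accurate, yet one in five of the cases that are actually against us is still reported as favourable. This is why the FP count deserves a separate look.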

Face Detection In Mobile Phones:

As we know, the face detection system in mobile phones uses AI behind the scenes. Since it relies on AI, there may be some vulnerabilities.

For example, if the system sometimes fails to recognise the right person, it leads to a bad user experience, but there is no security issue. This type of error is known as a Type 2 Error, i.e. a False Negative (FN).

But if the system accepts the wrong person and gives them access, this leads to a security issue. This error is known as a Type 1 Error, i.e. a False Positive (FP), and it is against us.
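The two kinds of error in the face unlock example can be sketched in a few lines of Python. This is a toy illustration, not how any real phone works: it assumes the system produces a similarity score between 0 and 1 and unlocks when the score reaches a threshold. The scores, the thresholds and the name `unlock_errors` are all invented for the example.

```python
# Hypothetical unlock attempts: (similarity score, is the real owner?)
attempts = [
    (0.95, True), (0.88, True), (0.72, True), (0.65, True),
    (0.80, False), (0.55, False), (0.40, False), (0.30, False),
]


def unlock_errors(attempts, threshold):
    """Return (false positives, false negatives) for a given unlock threshold."""
    # Type 1 Error (FP): a stranger scores high enough and gets access.
    fp = sum(1 for score, is_owner in attempts
             if score >= threshold and not is_owner)
    # Type 2 Error (FN): the owner scores too low and is rejected.
    fn = sum(1 for score, is_owner in attempts
             if score < threshold and is_owner)
    return fp, fn


lenient = unlock_errors(attempts, threshold=0.60)
strict = unlock_errors(attempts, threshold=0.85)

print("lenient:", lenient)  # (1, 0): one stranger gets in
print("strict: ", strict)   # (0, 2): the owner is rejected twice
```

In this sketch, raising the threshold removes the Type 1 Error at the cost of two Type 2 Errors, which matches the trade-off described above: a slightly worse user experience in exchange for better security.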

So industries are more concerned about Type 1 Errors.

Likewise, industries use AI to prevent cyber attacks on their servers and try to minimise Type 1 Errors, i.e. attacks that the model wrongly treats as safe.

CONCLUSION

AI is a basic need of today’s industries, and it shapes the future of their business. We can solve many problems with AI. The accuracy of a model’s predictions depends on how the model is trained, the dataset used to train it, the type of data, and which factors were taken into consideration while creating the model.
