SkopeAI: AI-powered data protection that mimics the human brain

Johannesburg, 27 Jul 2023

To solve modern DLP challenges, Netskope has pioneered ML-enabled image classification.

In the modern, cloud-first era, traditional approaches to data protection struggle to keep up. Data is growing rapidly in volume, variety and velocity. It is becoming increasingly unstructured and therefore harder to detect and, consequently, to protect. Most data loss prevention (DLP) solutions today rely solely on textual analysis to detect sensitive data, applying regular character patterns and content matching techniques to “conventional” data types (such as Word documents and spreadsheets). These techniques were once revolutionary; today, they have fallen behind.

Don’t get me wrong: it is fundamental for DLP to be equipped with as many text analysis tools as possible. After all, when it can be identified, it’s the content itself that is sensitive. DLP must be able to recognise thousands of known sensitive data types and unambiguous regular expressions, and understand data formats specific to different countries and languages. For reliability, DLP must also be equipped with highly scalable data fingerprinting engines that can memorise and match specific information found in sensitive databases and documents. Textual content must be clear and legible for such engines to work. To minimise false positives, it is now also fundamental to leverage rich context, deep learning, natural language processing (NLP) and other newer ML- and AI-based automated techniques.
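To make the fingerprinting idea concrete, here is a minimal sketch, assuming a simple approach based on hashing overlapping word sequences. It is a generic illustration, not a description of how any vendor's engine works; the names `shingle_hashes` and `containment` and the window size `k` are assumptions made for this example.

```python
import hashlib
import re


def shingle_hashes(text: str, k: int = 4) -> set[str]:
    """Hash every run of k consecutive words (a 'shingle') in the text."""
    # Lower-case and keep only word characters so formatting changes do not matter.
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return set()
    return {
        hashlib.sha256(" ".join(words[i:i + k]).encode("utf-8")).hexdigest()
        for i in range(len(words) - k + 1)
    }


def containment(candidate: str, protected: set[str], k: int = 4) -> float:
    """Fraction of the candidate's shingles that also appear in the protected set."""
    cand = shingle_hashes(candidate, k)
    if not cand:
        return 0.0
    return len(cand & protected) / len(cand)


# Hypothetical protected sentence, fingerprinted once and stored as hashes only.
protected = shingle_hashes(
    "The quarterly revenue forecast for the Alpha project is strictly confidential"
)
leaked = "FYI: the quarterly revenue forecast for the alpha project is strictly confidential"
harmless = "lunch menu for friday is pasta and salad"

print(containment(leaked, protected))    # high score: most shingles match
print(containment(harmless, protected))  # no overlap
```

Note that this only works when the text is legible: a single misread word breaks every shingle that contains it.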

When it comes to unstructured data sources like images, optical character recognition (OCR) is traditionally used to extract the text, which is then scanned using regular expressions (regex) or exact matching analysis.
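The text-scanning half of that pipeline can be sketched as follows. This is a minimal illustration, assuming the OCR step has already produced a plain string; the pattern `CARD_PATTERN` and the helper names are hypothetical, and the Luhn checksum is used here as one common way to cut false positives on card numbers.

```python
import re

# Hypothetical pattern: 13 to 16 digits, optionally separated by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(ocr_text: str) -> list[str]:
    """Scan OCR-extracted text for card-like numbers that pass the Luhn check."""
    hits = []
    for match in CARD_PATTERN.finditer(ocr_text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


clean = find_card_numbers("Card: 4111 1111 1111 1111 exp 12/27")
garbled = find_card_numbers("Card: 4l11 1111 1111 1111 exp 12/27")  # OCR read '1' as 'l'
```

In the second call a single misread character is enough for the pattern to miss the number entirely, which is the weakness of relying on text extraction alone.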

Because of the fast pace of modern business communication, users have developed new habits that make traditional data identification unreliable. To share information quickly and more often, users frequently send unstructured data such as images, taking screenshots or snapping photos with a smartphone to rapidly convey ideas, show visual evidence, provide diagrams and slides on the go, or show a colleague contact information from a data repository like Salesforce. Those are just a few examples.

In these cases, even OCR cannot perform well on low-quality images where text is not clearly readable. When large numbers of images must be processed, OCR and data matching also consume excessive resources, introducing latency in incident response.

Evolving modern DLP

For modern businesses, DLP has to evolve. Modern DLP needs to function more like a human brain. Our brain doesn’t necessarily have to read the text in a document such as a picture ID to tell that the document is indeed a picture ID containing personally identifiable information (PII). Now, modern DLP can do the same.

To solve modern DLP challenges, Netskope has pioneered ML-enabled image classification. This technique leverages deep learning and convolutional neural networks (CNNs) to swiftly and accurately identify sensitive images without the need for text extraction. It mimics the human visual cortex, recognising visual characteristics such as shapes and details to comprehend the image as a whole (much like how we can recognise that a passport is a passport without necessarily reading the details in it). ML enables feature recognition even in poor-quality images, akin to the capabilities of the human eye. This is crucial, as images can be blurry, damaged or discoloured, yet still contain sensitive information.

The importance of personalised data classifiers

Netskope’s industry-leading ML classifiers empower automated identification of sensitive data, revolutionising the categorisation of images and documents with exceptional precision. This breakthrough technology detects and safeguards various sensitive data types, including source code, tax forms, patents, identification documents like passports and driver’s licences, credit and debit cards, as well as full-screen screenshots and application screenshots. The ML classifiers work in conjunction with text-based DLP analysis (such as data identifiers, exact matching, document fingerprinting, ML-based NLP and deep learning), complementing the DLP analysis of a file when text is indecipherable or harder to extract. They greatly enhance detection accuracy and help enable DLP controls in real time.

But what if I told you that a set of predefined ML classification templates may still not be enough?

Nowadays, organisations also possess proprietary document types and templates, personalised forms and industry-specific files that fall outside the realm of standard ML classifiers. Netskope’s Train Your Own Classifiers (TYOC) technology revolutionises data protection by combining the strength of AI, the adaptability of ML and the convenience of automation. TYOC automatically identifies and categorises new data based on a “train and forget” approach. Consider this analogy: your brain can recognise a known document like a passport or a W-2 form, but it won’t identify a new document type you’ve never encountered before. Yet, once your eyes see it and your brain learns its features, you can easily recognise it in the future. This is precisely how TYOC operates.

With TYOC, Netskope has democratised AI and ML data protection, granting customers the power of AI, automation and adaptive learning as part of the Netskope Intelligent SSE capabilities available today. Organisations can embrace these cutting-edge advancements to safeguard their sensitive data and stay ahead of ever-evolving data protection requirements. This innovation empowers organisations to confidently address today’s most formidable data protection challenges while relieving policy administrators of most manual burdens, allowing them to focus human resources on more critical tasks.

TYOC is part of SkopeAI, the new Netskope suite of artificial intelligence and machine learning (AI/ML) innovations now available across the complete Netskope SASE portfolio. SkopeAI offerings use AI/ML to deliver modern data protection and cyber threat defence, overcoming the limitations of legacy security technologies and delivering AI-speed protection techniques not found in products from other SASE vendors.

If you’d like to learn more, please visit our dedicated SkopeAI page or watch this video featuring a conversation about AI with Netskope CTO Krishna Narayanaswamy:

SkopeAI: AI-powered Data Protection that Mimics the Human Brain – Netskope