Navigating AI risks: Top data leak threats for enterprises in 2024

Yasir Ali, CEO, Polymer
Armando Pauker, Managing Director, Tensility Venture Partners

As we start 2024, the enterprise realm is witnessing a transformative wave with the escalating adoption of artificial intelligence (AI). This surge in AI integration, stretching across many industries, brings with it a critical focus area: heightened data security risks and the looming threats of data breaches. As businesses delve deeper into AI's potential, understanding the trifecta of enterprise areas most vulnerable to security risks becomes paramount in safeguarding the sanctity of data in this new digital age.

In this blog, we lay the groundwork for future publications. Our hope is that it at least makes you more sensitive to the exciting AI product announcements from your vendor of choice. Read those announcements a bit more carefully: they can have an immediate effect on how your data might be misused.

AI usage accelerates in enterprises

Recent research published by IBM shows a significant move towards AI integration within large organizations. According to the IBM Global AI Adoption Index 2023, 42% of IT professionals at large organizations have actively deployed AI, and another 40% are exploring its use. The financial services industry, in particular, is notably active in AI deployment, with a significant focus on generative AI.

Deloitte's report Generating Value from Generative AI indicates that 2024 is a pivotal transition year, especially for the generative AI tools anticipated to launch in late 2023 or early 2024.

According to IBM, data security is a major concern that accompanies AI adoption. Data privacy, trust, and transparency are the most significant inhibitors for organizations exploring or implementing generative AI.

How is AI adoption happening in the enterprise?

Patterns of adoption can be largely classified into three main categories:

  1. In-house LLM model development and deployment via company-specific AI chatbots

  2. Third-party generative AI tools such as ChatGPT and Bard

  3. AI assistance via existing workflow SaaS products

What are the data leakage risks of AI adoption?

Let's investigate how each of these adoption categories is playing out on the ground and what that means for data loss and privacy. We will also provide a high-level perspective on how to secure each of them, with the goal of following up with a more in-depth blog on each.

1. In-house LLM models: data governance challenges

Developing AI models in-house offers more control but poses significant data governance risks, including the potential exposure of sensitive corporate information. The models used are typically open source or provided as libraries in Python and other tools, and the industry relies on open-source transparency and communal usage as a proxy for dedicated QA testing. In-house teams have to be continually thoughtful about their model training, considering fairness and ethics as well as guarding against private customer data slipping into any stage of the AI/ML pipeline. Tools for addressing this include model audits and validation.

Use cases driving AI adoption are diverse and include:

●  Automation of IT and business processes

●  Security and threat detection

●  AI monitoring or governance

●  Business analytics or intelligence

There are many challenges to the mainstream adoption of these use cases, mainly due to experience from the cloud journey, where master data management programs yielded mixed results. It's one thing to train models on help docs for employee-facing chatbots, and quite another to create valuable customer-facing interfaces using large language models (LLMs). If companies let AI training run wild in their environment, they risk sensitive data leaks and prompt poisoning.

In 2024, it’s anticipated that companies deeply rooted in technology, particularly those in advanced financial services and dynamic healthcare sectors, will possess the necessary expertise, resources, and scope to implement generative AI-powered chatbots that offer substantial value.

How to secure

Ensuring sensitive data is not included in any training dataset is the best way to minimize the risk of data loss. However, as witnessed in the cloud migration, this is easier said than done at scale. Understanding which data is sensitive, and the context in which it is sensitive, is a complicated NLP problem.

Some developer tools can monitor large datasets for pre-defined personally identifiable information (PII) entities, but this doesn't ensure 100% data hygiene. Plus, there are instances where removing sensitive data makes the outcome of model training much less useful.
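To make the problem concrete, here is a minimal sketch (our own illustration, not any particular vendor's tooling) that scans candidate training records for a handful of PII patterns and redacts them. Simple patterns catch the obvious cases, but they cannot capture context, which is exactly why sensitive-data detection at scale remains a hard NLP problem.

```python
import re

# Minimal illustration: regex patterns for a few common PII entity types.
# Real detection needs context (is "123-45-6789" an SSN or a part number?),
# which simple patterns like these cannot provide.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def redact_pii(text: str) -> tuple[str, list[str]]:
    """Replace matched PII with placeholders and report which entity types were found."""
    found = []
    for entity, pattern in PII_PATTERNS.items():
        if pattern.search(text):
            found.append(entity)
            text = pattern.sub(f"[{entity.upper()} REDACTED]", text)
    return text, found

# Example: scrub a record before it enters a training corpus.
record = "Contact jane.doe@example.com, SSN 123-45-6789, about the renewal."
clean, entities = redact_pii(record)
print(clean)     # Contact [EMAIL REDACTED], SSN [SSN REDACTED], about the renewal.
print(entities)  # ['email', 'ssn']
```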

2. Generative AI in knowledge work: Balancing productivity with corporate privacy

One broad threat vector across the enterprise is the use of generative AI tools and services accessed through cloud APIs, such as the popular ChatGPT, Bard, and Copilot. Different corporate functions may rely on some tools more than others.

GitHub Copilot, adopted by over 40,000 organizations, offers productivity gains for software developers but also presents potential code privacy issues and embedded security flaws in AI-generated code. Enterprises need to balance the productivity offered by AI and the sharing of private code development efforts with third parties like GitHub.

Marketing, support, and sales departments may use conversational AI chatbots that rely on internal product and support databases to interact with customers. These same tools can be used to create marketing and sales collateral or communications from corporate CRM data. Connecting third-party tools to sensitive, private corporate data is a stumbling block for many organizations that see the benefit of using generative AI but are worried about data privacy.

How to secure

Companies can include a data loss prevention layer at the chatbot level to monitor and stop sensitive data from being input into third-party tools.
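As a rough sketch of such a layer, assuming a simple block-on-match policy, the example below inspects an outbound prompt before it is handed to any third-party API. The pattern list, function names, and stand-in API call are illustrative placeholders, not a specific product's interface.

```python
import re

# Hypothetical outbound-prompt gate: inspect prompts before they leave the
# network for a third-party generative AI API. The patterns and block policy
# are illustrative placeholders, not a particular vendor's implementation.
BLOCK_PATTERNS = {
    "api_key": re.compile(r"\b(?:sk|AKIA)[A-Za-z0-9]{16,}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

class PromptBlockedError(Exception):
    pass

def dlp_gate(prompt: str) -> str:
    """Raise if the prompt contains blocked entities; otherwise pass it through."""
    violations = [name for name, pattern in BLOCK_PATTERNS.items() if pattern.search(prompt)]
    if violations:
        # A production gate would also log the event for security review.
        raise PromptBlockedError(f"Prompt blocked; detected: {', '.join(violations)}")
    return prompt

def send_to_llm_api(prompt: str) -> str:
    """Stand-in for the real third-party API call (e.g., an HTTP POST)."""
    return f"(model response to: {prompt})"

def ask_external_llm(prompt: str) -> str:
    safe_prompt = dlp_gate(prompt)  # enforce DLP before anything leaves the network
    return send_to_llm_api(safe_prompt)

# Usage: a prompt carrying a credential never reaches the external service.
try:
    ask_external_llm("Summarize this config: AWS key AKIA1234567890ABCDEF")
except PromptBlockedError as exc:
    print(exc)  # Prompt blocked; detected: api_key
```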

Another area of focus should be ensuring malware does not make its way into source code. When writing code with AI assistance, check for supply chain risk and stop nefarious code from being committed to the codebase, as sketched below.
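One way to operationalize that, sketched below under the assumption of a hypothetical internal allowlist of approved packages, is a pre-commit-style check that flags any imported dependency that is neither standard library nor approved. AI assistants occasionally suggest unvetted or typosquatted packages, and a check like this catches them before they land in the repository.

```python
# Illustrative pre-commit check: flag imports in staged Python files that are
# neither standard library nor on an internally approved dependency list.
# The allowlist is a placeholder; a real pipeline would also pin versions,
# verify package hashes, and exclude the repository's own modules.
import ast
import pathlib
import sys

APPROVED_PACKAGES = {"requests", "numpy", "pandas"}  # hypothetical internal allowlist
STDLIB = set(sys.stdlib_module_names)                # available in Python 3.10+

def top_level_imports(path: pathlib.Path) -> set[str]:
    """Collect the top-level package names imported by a Python source file."""
    tree = ast.parse(path.read_text())
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module.split(".")[0])
    return names

def check_file(path: pathlib.Path) -> list[str]:
    """Return imports that are neither standard library nor approved."""
    return sorted(top_level_imports(path) - STDLIB - APPROVED_PACKAGES)

if __name__ == "__main__":
    exit_code = 0
    for arg in sys.argv[1:]:  # a pre-commit hook passes staged filenames here
        unapproved = check_file(pathlib.Path(arg))
        if unapproved:
            print(f"{arg}: unapproved dependencies {unapproved}")
            exit_code = 1
    sys.exit(exit_code)
```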

3. AI assistance in SaaS products: Hidden threats

Whether companies like it or not, AI will go mainstream in 2024 through SaaS tools already approved for corporate use. The integration of AI to increase the functionality and productivity of third-party SaaS products is a growing trend.

Microsoft has taken an early lead in integrating OpenAI models into its Office suite of products, including Word and PowerPoint. For example, the release of Copilot with Microsoft Office 365 raises critical questions about data privacy and security, because there have been numerous incidents of training data being leaked from ChatGPT via clever prompts.

Here’s how AI is imbued in many popular product offerings:

●  GitHub Copilot - Coding or corporate spying?

●  Microsoft O365’s Copilot - Is Microsoft reading your mail?

●  GitLab AI-Assisted Code - Who else is writing your software?

●  Zoom's AI Companion - Eavesdropping on your meetings?

●  Google Workspace’s Duet AI - Google’s eyes in your docs?

●  Salesforce’s Einstein CoPilot - Selling more than a CRM?

●  Atlassian's Assistant - Jira & Confluence or Big Brother?

●  Zendesk AI - Customer support or data siphon?

●  Dropbox Dash - Storage or stealthy data leak?

The products above represent a growing trend of AI acting as Shadow IT, operating subtly and often bypassing traditional security measures. Investment dollars back this trend.

How to secure

SaaS data loss prevention (DLP) is a must-have to protect against threats within existing tools. However, not all DLP solutions are created equal, and buyers have been burned by false promises. Gaining visibility and workflows to handle data exfiltration is table stakes in the age of AI.

A call for evolving data governance

AI adoption is an inevitable and beneficial progression towards a more efficient and insightful business world. However, enterprises must be vigilant about the associated data leakage and data security risks. In line with increasing government-mandated regulations, corporate data governance is evolving to respond to the data risks inherent in AI adoption. These risks are present in three areas: in-house AI model development with open source, corporate usage of API-driven AI services, and the AI increasingly embedded in popular SaaS tools. All three areas require specific and thoughtful risk mitigation to continue enjoying the benefits and opportunities of AI.
