Generative artificial intelligence (AI) holds tremendous promise across many industries and disciplines. However, as with any powerful new technology, it also brings new security risks. Let’s take a few moments to dive into the emerging generative AI threat landscape, focusing specifically on data and system security. This blog post will also highlight how organizations can adopt these tools securely despite those risks.
How is generative AI different?
To grasp how generative AI changes the threat landscape, we must first consider how these new systems differ from the traditional systems that have served as the backbone of supply chains for the past 50 years. The top five differences are:
- Security tools and practices for generative AI are still maturing, compared to the well-established technologies available for databases. Database vulnerabilities like SQL injection are well understood after decades of focus: developers are extensively trained on these threats, and robust auditing tools are integrated into CI/CD pipelines (see the first sketch after this list). The generative AI journey, by contrast, is just beginning, with threat modeling and tooling still emerging.
- Generative AI delivers novel insights, rather than merely retrieving records. While databases return data that they’ve previously stored, possibly with transformations or calculations, generative AI synthesizes novel data based on its training. This is analogous to an analyst generating insights versus a clerk fetching records.
- Formal programming languages are predictable and unambiguous, unlike the nuanced, ambiguous natural language that generative AI uses. Databases are accessed through formal languages such as SQL, with a well-defined syntax: a given SQL statement, evaluated against the same stored data, will always produce the same result. Generative AI, by contrast, uses natural “everyday” language, with all its nuance and ambiguity, for all inputs and outputs. Like two people negotiating a contract, humans and AI applications can misunderstand each other. In addition, generative AI’s outputs are non-deterministic: identical inputs can yield results that differ in phrasing, wording, or meaning (see the second sketch after this list).
- Generative AI may lack the traceability and auditing capabilities of databases. With databases, authorized users can readily audit stored data and trace its origin. In contrast, generative AI models encode knowledge as weights in a neural network, a form that is not directly human-readable. Moreover, there are currently no robust techniques for auditing a model’s acquired “knowledge,” or the potential biases inherited from its training data.
- Generative AI currently has fewer built-in data access controls than databases. Databases enforce robust authorization controls, such as grants and row-level permissions, that govern exactly who can access which data. Generative AI currently lacks comparable built-in controls: any authenticated user can potentially elicit any knowledge the model has absorbed (see the third sketch after this list).
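To make the tooling-maturity gap concrete, here is a minimal Python sketch of a defense database developers have relied on for decades: parameterized queries, which keep untrusted input formally separated from the statement’s syntax. The table and values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT)")
conn.execute("INSERT INTO orders VALUES (1, 'alice')")

user_input = "alice' OR '1'='1"  # a classic injection attempt

# Unsafe: string concatenation would let the input rewrite the query:
#   "SELECT * FROM orders WHERE customer = '" + user_input + "'"

# Safe: the ? placeholder binds the input strictly as data, never as SQL.
rows = conn.execute(
    "SELECT * FROM orders WHERE customer = ?", (user_input,)
).fetchall()
print(rows)  # [] -- the injection attempt matches nothing
```

Generative AI has no comparable placeholder mechanism yet: instructions and untrusted input travel through the same natural-language channel, which is one reason prompt injection defenses are still maturing.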
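On the determinism point, the following self-contained Python sketch contrasts the two behaviors. No real model is involved; the token scores and shipment phrases are invented, and the toy sampler only mimics how LLMs draw tokens from a temperature-scaled probability distribution.

```python
import math
import random
import sqlite3

# A database is deterministic: the same query over the same stored data
# produces the same result, every time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])
assert (conn.execute("SELECT SUM(x) FROM t").fetchone()
        == conn.execute("SELECT SUM(x) FROM t").fetchone())

# A toy stand-in for an LLM's decoding step: the "model" samples from a
# probability distribution over candidate tokens, scaled by temperature.
def sample_continuation(scores: dict[str, float], temperature: float) -> str:
    weights = [math.exp(s / temperature) for s in scores.values()]
    return random.choices(list(scores), weights=weights)[0]

scores = {"on time.": 1.2, "delayed.": 1.0, "in transit.": 0.9}
print("The shipment is", sample_continuation(scores, temperature=0.8))
print("The shipment is", sample_continuation(scores, temperature=0.8))
# The two printed lines may differ even though the inputs were identical.
```

Lowering the temperature makes sampling more predictable, but production models may still vary their output even then, so applications should never assume that identical prompts yield identical answers.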
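Finally, on access controls: because the model itself has no notion of per-user permissions, a common mitigation is to enforce authorization in the application layer before any data reaches the prompt. The sketch below illustrates that pattern; the Document, User, and authorized_context names are illustrative assumptions, not any particular library’s API.

```python
from dataclasses import dataclass

@dataclass
class Document:
    text: str
    allowed_roles: set[str]  # roles permitted to see this document

@dataclass
class User:
    name: str
    role: str

def authorized_context(user: User, documents: list[Document]) -> str:
    # Filter BEFORE the data reaches the model: once text is in the prompt
    # (or the training set), the model cannot enforce who may see it.
    visible = [d.text for d in documents if user.role in d.allowed_roles]
    return "\n".join(visible)

docs = [
    Document("Q3 shipping volumes by region ...", {"analyst", "admin"}),
    Document("Supplier contract terms ...", {"admin"}),
]

context = authorized_context(User("dana", role="analyst"), docs)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: ..."
# Only the shipping document appears in the prompt; the contract never
# reaches the model, so it cannot be leaked to this user.
```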
Examining the differences between traditional systems and generative AI reveals new security vulnerabilities and necessary mitigations, which can be categorized into three key domains: protecting sensitive data, securing systems and data from malicious use, and properly governing AI agents and plug-ins.