This session, presented by Adrian Gheorghe and Marius Bleif from Eraneos Switzerland, explored the landscape of Artificial Intelligence in security. It focused on current risks, particularly those associated with Large Language Models (LLMs), alongside mitigation strategies, best practices, AI governance structures, and the significant implications of the EU AI Act.
AI in a Nutshell: Definitions
The presentation began by establishing core terminology within Artificial Intelligence, drawing from John McCarthy’s definition: “AI is the science and engineering of making intelligent machines.”
- Artificial Intelligence (AI): The broad field encompassing any method allowing machines to mimic human behavior.
- Machine Learning (ML): A subset of AI using statistical methods to enable machines to recursively improve outputs based on data.
- Deep Learning (DL): A subset of ML relying on complex neural networks. These networks recursively improve not only the output but also the method of generating it. They function by processing inputs through layers of interconnected nodes, adjusting connection weights based on training data to recognize patterns, like identifying handwritten digits.
- Within Deep Learning: Key areas include Natural Language Processing (NLP) for understanding and formalizing human language, Large Language Models (LLMs) which can also generate human language, and specific architectures like Generative Pretrained Transformers (GPT) that predict subsequent words based on vast pre-training. Other categories are Predictive AI for forecasting based on patterns, and Generative AI (GenAI) for creating novel content (text, images, etc.).
Current AI Risks: OWASP Top 10 for LLMs
A significant focus was placed on the specific risks inherent in LLM applications, as cataloged by the OWASP Foundation. These risks often intersect within the data flow of typical LLM applications.
- LLM01: Prompt Injection: Crafting malicious inputs (“prompts”) to manipulate an LLM into performing unintended actions. This can be direct (overwriting system instructions) or indirect (manipulating inputs from external sources like webpages).
- LLM02: Insecure Output Handling: Failing to validate or sanitize LLM outputs before use, potentially exposing backend systems to vulnerabilities like XSS, CSRF, SSRF, privilege escalation, or even remote code execution.
- LLM03: Training Data Poisoning: Maliciously tampering with an LLM’s training data to introduce vulnerabilities, biases, or backdoors. This compromises the model’s security, effectiveness, or ethical alignment. Sources like Common Crawl are potential vectors.
- LLM04: Model Denial of Service (DoS): Overloading an LLM with resource-intensive requests, causing service degradation or excessive operational costs. The computational demands of LLMs magnify this risk.
- LLM05: Supply Chain Vulnerabilities: Exploiting weaknesses in the LLM application lifecycle, such as vulnerable datasets, compromised pre-trained models, or insecure third-party plugins.
- LLM06: Sensitive Information Disclosure: LLMs inadvertently revealing confidential data in their responses due to inadequate data sanitization, improper training on sensitive material, or flawed access controls.
- LLM07: Insecure Plugin Design: LLM plugins developed with insecure input handling or insufficient access controls, making them susceptible to exploitation, potentially leading to unauthorized actions or data exposure.
- LLM08: Excessive Agency: Granting LLM-based systems too much autonomy, functionality, or permissions, leading to unintended and potentially harmful consequences if the LLM is compromised or misinterprets instructions.
- LLM09: Overreliance: Humans or systems depending excessively on LLM outputs without proper oversight or critical evaluation. This can lead to the propagation of misinformation, biased decisions, legal issues, or security vulnerabilities stemming from incorrect or inappropriate content generated by the LLM.
- LLM10: Model Theft: Unauthorized access, copying, or exfiltration of proprietary LLM models. This results in direct economic loss, erosion of competitive advantage, and potential exposure of sensitive information embedded within the model itself.
Mitigating AI Risks & Best Practices
Jailbreaking
Jailbreaking refers to attempts to bypass an LLM’s built-in limitations and safety mechanisms. Attackers use techniques like manipulated questions or specific inputs (Deception) to elicit responses that are normally blocked (e.g., unethical, unauthorized, or harmful content). Sometimes, ArtPrompt attacks using ASCII art can circumvent text-based filters, hiding forbidden instructions within the visual representation.
AI Governance & Controls
Effective AI governance and controls should not exist in isolation but must be integrated into existing organizational frameworks, such as Data Protection, Information Security, and Business Process Management. This involves incorporating specific AI requirements, developing necessary skills within teams, utilizing established review processes, and ensuring comprehensive data collection for monitoring and auditing.
Governance is crucial for managing the secure flow of data through AI systems (Data → Model → Use). At the Data stage, focus is on quality, DLP, access management, protection, and availability. At the Model stage, technical complexity requires guardrails, input/output controls, grounding techniques, penetration testing, threat modeling, considerations for sovereign AI, deployment choices (cloud/on-prem), knowledge base integration, readiness assessments, and audits. For the Use stage, the goals are reliability, security, automation, usability, and scalability. Underlying all this is the Organizational Complexity & AI Governance, encompassing compliance and legal aspects.
Responsible AI & Digital Ethics
Responsible AI aims to create AI structures and roles aligned with humanitarian and social values. It’s closely tied to Digital Ethics, the discipline examining technology development and use from a moral-philosophical perspective, and Responsible Tech, which seeks to align technology and organizational behavior with the best interests of people and the planet.
Achieving this requires interdisciplinary teams considering ethical implications from the outset (socio-technical analysis). Key values steering AI development include robustness, safety, explainability, accountability, and fairness/non-discrimination. Ethics by Design is a proactive process to embed these values:
- Sensitize: Raise awareness of digital ethics, formulate guidelines, establish points of contact.
- Address Concerns: Use workshops and tools (e.g., “Ethics in a Box”) to document reflections on training data and model behavior, defining relevant test cases.
- Evaluate & Bind: Conduct impact analysis, select appropriate toolkits/metrics, and agree on criteria for threat modeling and incident response.
- Coordinate & Implement: Put measures into practice, test rigorously (e.g., via AI red teams), and use toolkits to manage defined behaviors/errors.
- Monitor & Communicate: Continuously incorporate lessons learned in an iterative improvement cycle.
Company-Wide Strategy & Lines of Defense
Securing AI usage demands a holistic, interdisciplinary strategy across the organization. This includes defining clear AI Use Cases and establishing DP-Conformity through registries, consistent data management, and training. A multi-layered defense model is recommended:
- 1st Line (Users/Developers): Requires awareness/training, an escalation framework, clear guidelines on responsibility/benefits, and adapted project management methodologies.
- 2nd Line (Risk/Compliance): Needs to integrate AI into existing risk processes, leverage review synergies, include ethical AI in legal assessments, and provide necessary technical AI training.
- 3rd Line (Internal Audit): Must develop expertise through workshops/training, define AI-specific assessment elements, adapt reporting, and manage external communication (e.g., with regulators like FINMA).
Comprehensive Security Measures & Transparency
Successful, scalable, and auditable AI solutions require a layered security approach. Guardrails & Groundings are essential to protect input/output, minimize attack success (like jailbreaking), reduce “hallucinations” (factually incorrect outputs), and ensure results are based on a defined knowledge base. Examples include input/output moderation, blocking injections, ensuring factual grounding, content moderation, style adherence, and accuracy enforcement.
Logging & Monitoring provide continuous transparency and auditability of prompts, outputs, and metadata (load, latency, costs). Secure development practices like CI/CD & Secure Coding ensure scalability and standardization. A Zero Trust Architecture & Governance framework adds further security layers (segmentation, IAM, DLP, data cleanup). Proactive Pentesting & MITRE Atlas threat mapping, connection to the SOC, embedding security via DevSecOps, and applying STRIDE Threat Modeling across the AI system’s components and interactions are all crucial elements of a robust security posture.
EU AI Act
The presentation concluded with an overview of the recently adopted EU AI Act, a landmark regulation aiming to harness AI’s potential while safeguarding individual rights.
Core Principles
The Act is built upon principles such as protecting fundamental rights, ensuring transparency, prohibiting specific high-risk AI uses, mandating risk assessment and classification, requiring ongoing monitoring, establishing clear responsibility, demanding documentation, and enforcing conformity assessments.
Coverage (Almost Global Applicability)
The Act applies broadly: to EU-based providers/distributors in EU markets, EU-based users, and importantly, providers/users outside the EU whose AI system outputs are consumed within the EU.
Risk Classification
The Act categorizes AI systems based on risk:
- Unacceptable Risk (Prohibited): Includes systems for manipulative purposes, social scoring, real-time remote biometric identification in public (with narrow exceptions), biometric categorization based on sensitive attributes, emotion recognition in workplace/education, and untargeted scraping for facial recognition databases.
- High Risk (Strict Conditions): Applies to systems with significant impact on individuals (e.g., in medicine, insurance, critical infrastructure, law enforcement, employment, education). These require rigorous compliance, data governance, documentation, transparency, human oversight, accuracy, robustness, and cybersecurity measures.
- Limited Risk (Transparency Obligations): Systems like general-purpose AI, chatbots (must disclose AI nature), and deepfakes (must be labeled).
- Minimal or No Risk (Allowed): The majority of AI systems, such as spam filters or AI in video games.
Implementation Timeline (Staggered Approach)
The Act’s provisions come into effect gradually:
- Prohibitions on unacceptable risk AI: Nov 2024
- Codes of practice for General Purpose AI (GPAI): Feb 2025
- Member State competent authorities appointed: May 2025
- Template for high-risk AI post-market monitoring: Nov 2025
- (Full application is generally expected around mid-to-late 2026)
Impact on Operating Model
Compliance necessitates changes across multiple organizational dimensions: Data, Governance, People, Process, Technology, and Risk & Controls. It requires updates to policies, decision-making bodies, training programs, risk frameworks, and more.
Compliance Approach (Eraneos 360°)
Achieving compliance requires an integrated approach covering regulatory adherence (risk mitigation, controls, reporting), the design and implementation of an appropriate operating model (processes, automation), and the identification and piloting of (Gen)AI use cases to leverage the technology’s potential safely.
Establishing compliance involves steps like screening AI uses, assessing requirements (Documentation, Classification, Limitation, Risk Management, etc.), clustering and planning implementation based on risk, mobilizing resources, anchoring changes within the organization, and finally handing over to business-as-usual operations. An organization’s maturity can often be gauged by its existing GDPR and data protection practices.