Towards Safe and Trustworthy Agentic System: Red-Teaming and Automated Guardrail Agents

Principal Investigator

University of Illinois, Urbana-Champaign

Oracle Fellowship Recipient

Chulin Xie

Oracle Principal Investigator

Krishnaram Kenthapadi
Swetasudha Panda, Principal Research Scientist

Summary

This project focuses on the design and development of novel, trustworthy agentic systems, particularly those operating in sensitive or critical domains such as healthcare and software engineering. We will propose effective red-teaming algorithms that simulate adversarial attacks and privacy breaches against these agents, and design an autonomous defense system that leverages safety-enhancing and privacy-aware tools to mitigate the resulting risks. The project aims to help Oracle build more resilient, trustworthy AI products so that enterprise systems can operate securely, reliably, and safely across a variety of tasks.

What’s New