Towards Safe and Trustworthy Agentic System: Red-Teaming and Automated Guardrail Agents
Project
Principal Investigator
University of Illinois, Urbana-Champaign
Oracle Fellowship Recipient
Chulin Xie
Oracle Principal Investigator
Krishnaram Kenthapadi
Swetasudha Panda, Principal Research Scientist
Summary
This project focuses on the design and development of novel trustworthy agentic systems, particularly those operating in sensitive or critical domains such as healthcare and software engineering. We will propose effective red-teaming algorithms that simulate adversarial attacks and privacy breaches against these agents, and design an autonomous defense system that uses safety-enhancing and privacy-aware tools to mitigate the resulting risks. The project aims to help Oracle build more resilient and trustworthy AI products, so that enterprise systems can operate securely, reliably, and safely across a variety of tasks.