Audit an AI Agent Like It Might Be Lying
For engineers and product owners running AI agents in production, or almost there. Describe or paste your agent's architecture, task, tools, and what you're currently measuring. Get a systematic evaluation across five dimensions: task quality, failure modes, tool use integrity, token efficiency, and observability gaps. Trust but verify, especially verify.