
I Built an Agentic QA Framework.
Here's Exactly How It Works.
If you're looking to implement an Autonomous AI QA Agent, one that discovers, plans, explores, executes, and self-heals, this is the post for you.
What I Built
Autonomous Agents
Discovery · Planner · Explorer · Executor · Feedback: 5 specialized AI agents, each with a single job.
WebMCP Polyfill
Injects semantic tools into the browser. No more fragile CSS selectors. 95% execution cost reduction.
Closed-Loop Learning
Failures train the system. Every broken run updates the RAG Knowledge Bank. It gets smarter every time.
You write a goal. The AI writes the tests.
No page objects. No locators. No scripts.
Your Goal (Natural Language):
"Log in, sort products by price low-to-high, add the cheapest item to the cart, and check out."
The Command:
python orchestrator.py \
  --project projects/my_app \
  --base_url "https://www.saucedemo.com/" \
  --goal "Log in, sort by price low-to-high, add cheapest item, checkout" \
  --headed
What happens next (automatically):
5 Phases. 5 Agents. Zero Manual Work.
Discovery phase (DiscoveryAgent):
- 🕷️ Crawls the DOM and maps every page structure
- 🗺️ Builds a semantic sitemap (login forms, grids, navbars)
- 🧠 Runs only in --deep mode for regression suites
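The DiscoveryAgent section says each page is classified by DOM patterns (login, product_list, checkout, etc.). Here is a minimal sketch of a rule-based version of that idea. The `Element` record, the signals it checks, and the checkout field ids (`first-name`, `last-name`, `postal-code`) are hypothetical illustrations. The framework's actual classification rules aren't shown in this post.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical, simplified record for one interactive DOM element."""
    tag: str
    attrs: dict = field(default_factory=dict)
    text: str = ""


def classify_page(elements: list) -> str:
    """Guess a page type from simple DOM signals (illustrative heuristics only)."""
    has_password = any(
        e.tag == "input" and e.attrs.get("type") == "password" for e in elements
    )
    add_to_cart = sum(1 for e in elements if "add to cart" in e.text.lower())
    checkout_fields = {"first-name", "last-name", "postal-code"}  # assumed ids
    field_ids = {e.attrs.get("id", "") for e in elements if e.tag == "input"}

    if has_password:
        return "login"
    if checkout_fields <= field_ids:  # all checkout fields are present
        return "checkout"
    if add_to_cart >= 2:  # a product grid usually repeats the same button
        return "product_list"
    return "unknown"
```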

WebMCP: No Selectors. Ever.
This is the biggest win in the framework. Instead of brittle CSS selectors, the agent uses semantic tool calls injected directly into the browser context.
# Selector-less execution with WebMCP
python execute_with_webmcp.py \
  --project projects/my_ecommerce \
  --headed
❌ Old Way
page.click('[data-test="sort-container"]')
Breaks when the data-test attribute changes. Constant maintenance.
✅ WebMCP Way
call_tool('sort_products_by_price', {'direction': 'low_to_high'})
Semantic. Resilient. Self-describing. Survives markup changes.
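The "self-describing" part of the WebMCP approach can be modeled in plain Python. Each tool carries a name, a description, and the argument values it accepts, and `call_tool` rejects anything else. This is a hypothetical Python-side sketch of the idea only. The framework registers its tools in the browser through a `navigator.modelContext` polyfill, and the `Tool`, `register`, and `list_tools` names here are illustrative.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Tool:
    name: str
    description: str
    params: dict  # param name -> tuple of allowed values
    handler: Callable[..., Any]


REGISTRY: dict = {}


def register(tool: Tool) -> None:
    REGISTRY[tool.name] = tool


def list_tools() -> list:
    """What an agent would read to discover the available actions."""
    return [
        {"name": t.name, "description": t.description,
         "params": {k: list(v) for k, v in t.params.items()}}
        for t in REGISTRY.values()
    ]


def call_tool(name: str, args: dict) -> Any:
    """Validate arguments against the tool's declared values, then run it."""
    tool = REGISTRY.get(name)
    if tool is None:
        raise KeyError(f"unknown tool: {name}")
    for key, allowed in tool.params.items():
        if args.get(key) not in allowed:
            raise ValueError(f"{name}: {key} must be one of {allowed}")
    return tool.handler(**args)
```

For example, registering a `sort_products_by_price` tool whose handler sorts a product list lets `call_tool('sort_products_by_price', {'direction': 'low_to_high'})` run without touching any selector. An unsupported direction raises `ValueError` instead of silently clicking the wrong thing.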
Banking? Healthcare? SaaS?
Use --deep mode.
Full semantic discovery and regression suite generation with --deep. Add --security for a security audit.
python orchestrator.py \
  --project projects/parabank \
  --base_url "https://parabank.parasoft.com/parabank/" \
  --goal "Register new user and transfer funds" \
  --deep --security
orchestrator.py: The Hub
The orchestrator doesn't do any testing itself. It coordinates every agent, tracks phase checkpoints, retries failures, and triggers the Feedback loop.
--project: Path to your project folder. All outputs (workflow.json, trace, report) land here. Required.
--goal: Natural language objective. Passed directly to PlannerAgent for scenario decomposition.
--base_url: Root URL of the app under test. Used by DiscoveryAgent and PlannerAgent for navigation grounding.
--deep: Enables full BFS semantic crawl (depth 3, 50 pages). Activates DiscoveryAgent before planning.
--force: Clears .checkpoint.json so every phase re-runs from scratch. Use when the app has changed significantly.
--headed: Launches Chromium in visible mode. Great for debugging or recording demos of the agent working.
--security: Runs SecurityAuditor after execution. Scans for XSS, SQL injection patterns, and session exposure.
--phase: Run a single phase only: planning | exploration | execution | security. Perfect for iterating fast.
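One way to wire up the flags above is with Python's standard `argparse`. This sketch assumes the orchestrator uses `argparse`. The flag names, the `--project` requirement, and the `--phase` choices come from the table above, and the help strings are paraphrased.

```python
import argparse

PHASES = ("planning", "exploration", "execution", "security")


def build_parser() -> argparse.ArgumentParser:
    """A sketch of a CLI matching the documented flags (not the real source)."""
    p = argparse.ArgumentParser(prog="orchestrator.py")
    p.add_argument("--project", required=True, help="Project folder; all outputs land here")
    p.add_argument("--goal", help="Natural language objective for PlannerAgent")
    p.add_argument("--base_url", help="Root URL of the app under test")
    p.add_argument("--deep", action="store_true", help="Full BFS semantic crawl before planning")
    p.add_argument("--force", action="store_true", help="Clear .checkpoint.json and re-run every phase")
    p.add_argument("--headed", action="store_true", help="Launch Chromium in visible mode")
    p.add_argument("--security", action="store_true", help="Run SecurityAuditor after execution")
    p.add_argument("--phase", choices=PHASES, help="Run a single phase only")
    return p
```

For example, `build_parser().parse_args(["--project", "projects/my_app", "--headed"])` yields `args.headed == True` and `args.deep == False`. Any `--phase` value outside the four listed choices is rejected by `argparse` itself.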
Checkpoint System (never re-run what already passed):
Each phase writes to .checkpoint.json. Re-running skips completed phases automatically.
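Here is a minimal sketch of the skip-completed-phases behavior. It assumes `.checkpoint.json` holds a JSON object with a `completed` list, since the post doesn't describe the real file format. The `run_phases` helper and its one-callable-per-phase interface are hypothetical.

```python
import json
from pathlib import Path
from typing import Callable

CHECKPOINT = ".checkpoint.json"


def run_phases(project: Path, phases: dict, force: bool = False) -> list:
    """Run phases in order, skipping any already recorded as complete.

    `phases` maps phase name -> zero-argument callable (assumed interface).
    Returns the names of the phases that actually ran.
    """
    ckpt = project / CHECKPOINT
    if force and ckpt.exists():
        ckpt.unlink()  # mirrors --force: start from scratch
    state = json.loads(ckpt.read_text()) if ckpt.exists() else {"completed": []}

    ran = []
    for name, phase in phases.items():
        if name in state["completed"]:
            continue  # already passed on a previous run
        phase()  # an exception stops the run; later phases stay pending
        state["completed"].append(name)
        ckpt.write_text(json.dumps(state, indent=2))  # persist after each phase
        ran.append(name)
    return ran
```

Calling `run_phases` twice on the same folder runs every phase the first time and nothing the second. Passing `force=True` mirrors `--force`.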
6 Specialized AI Agents
Each agent has one job. One responsibility. Zero overlap.
DiscoveryAgent
core/agents/discovery.py → sitemap.json
Parallel BFS crawler (3 concurrent tab workers). Visits up to 50 pages, classifies each by DOM patterns (login, product_list, checkout, etc.), extracts all interactive elements, forms, and business rules.
Key Innovation
LLM semantically filters out ghost/hidden elements so the planner never plans against phantom DOM nodes.
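The crawl described above (3 concurrent workers, up to 50 pages, depth 3) can be sketched with `asyncio`. A shared FIFO queue feeds a fixed pool of worker tasks, and a `seen` map records the depth at which each URL was first found. The `fetch_links` callable is a hypothetical stand-in for opening a page in a browser tab and collecting its links. Because workers run concurrently, the visiting order is only approximately breadth-first.

```python
import asyncio
from typing import Awaitable, Callable

FetchLinks = Callable[[str], Awaitable[list]]


async def bfs_crawl(start: str, fetch_links: FetchLinks, max_pages: int = 50,
                    max_depth: int = 3, workers: int = 3) -> dict:
    """BFS-style crawl with a fixed pool of workers; returns {url: depth}."""
    queue: asyncio.Queue = asyncio.Queue()
    seen = {start: 0}  # url -> depth at which it was first discovered
    visited: dict = {}
    await queue.put(start)

    async def worker() -> None:
        while True:
            url = await queue.get()
            try:
                if len(visited) >= max_pages:
                    continue  # page budget spent: drain the queue without visiting
                depth = seen[url]
                visited[url] = depth
                if depth >= max_depth:
                    continue  # don't expand links beyond the depth limit
                for link in await fetch_links(url):
                    if link not in seen:
                        seen[link] = depth + 1
                        await queue.put(link)
            finally:
                queue.task_done()

    tasks = [asyncio.create_task(worker()) for _ in range(workers)]
    await queue.join()  # wait until every queued URL has been processed
    for t in tasks:
        t.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return visited
```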
PlannerAgent
core/agents/planner.py → workflow.json
Takes your natural language goal + sitemap, detects the application domain (ecommerce, finance, SaaS…), then prompts Gemini with strict grounding rules to decompose the goal into a structured keyword-driven workflow. Falls back gracefully if the LLM response can't be parsed.
Key Innovation
Anti-hallucination rules prevent the LLM from inventing elements not in the discovered sitemap.
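The anti-hallucination rule can be enforced after the LLM responds. The code checks every step's target against the discovered sitemap and falls back when the response doesn't parse. The `{"steps": [...]}` response shape, the `target` key, and the one-step fallback workflow are assumptions for illustration.

```python
import json

# Hypothetical minimal workflow used when the LLM output is unusable.
FALLBACK = [{"keyword": "navigate", "target": "base_url"}]


def parse_workflow(llm_text: str, sitemap_elements: set) -> tuple:
    """Parse an LLM workflow and drop steps that target unknown elements.

    Returns (steps, rejected_targets). Invalid JSON yields the fallback.
    """
    try:
        steps = json.loads(llm_text)["steps"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return [dict(s) for s in FALLBACK], []

    grounded, rejected = [], []
    for step in steps:
        if not isinstance(step, dict):
            continue  # malformed step: skip it
        target = step.get("target")
        # Steps without a target (e.g. "wait") are allowed; targets must exist.
        if target is None or target in sitemap_elements:
            grounded.append(step)
        else:
            rejected.append(target)
    return grounded, rejected
```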
ExplorerAgent
core/agents/explorer.py → trace.json + screenshots
Executes each workflow step live in Playwright. At each step, it reads the DOM, calls Gemini to decide the next action, executes it, and handles multi-tab switching, lazy loading, scrolling (keyboard + mouse + JS), and autonomous registration flows.
Key Innovation
SmartLocator tries 5 fallback strategies before failing: data-test, aria-label, text content, CSS, and XPath.
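Here is a minimal sketch of the five-strategy fallback order. It builds one Playwright-style selector per strategy (`text=` and `xpath=` are Playwright selector prefixes) and returns the first one that matches exactly one element. The `hint` keys and the injected `count` function are hypothetical. In a live run, `count` could wrap `page.locator(selector).count()`.

```python
from typing import Callable, Optional


def candidate_selectors(hint: dict) -> list:
    """Selectors in the documented order: data-test, aria-label, text, CSS, XPath."""
    out = []
    if hint.get("data_test"):
        out.append(f'[data-test="{hint["data_test"]}"]')
    if hint.get("aria_label"):
        out.append(f'[aria-label="{hint["aria_label"]}"]')
    if hint.get("text"):
        out.append(f'text="{hint["text"]}"')
    if hint.get("css"):
        out.append(hint["css"])
    if hint.get("xpath"):
        out.append(f'xpath={hint["xpath"]}')
    return out


def smart_locate(hint: dict, count: Callable[[str], int]) -> Optional[str]:
    """Return the first selector matching exactly one element, else None."""
    for selector in candidate_selectors(hint):
        if count(selector) == 1:  # ambiguous (many) or missing (zero) -> next strategy
            return selector
    return None
```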
DeepExplorerAgent
core/agents/deep_explorer.py → extended trace.json + regression scenarios
Extended version of ExplorerAgent that generates regression scenarios on the fly while navigating. Explores branches autonomously, identifies untested paths, and appends new scenarios to the workflow for comprehensive coverage.
Key Innovation
Uses graph memory to avoid revisiting pages and prioritize unexplored branches.
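The post doesn't describe the graph memory's data structure, so this is only a sketch of one plausible design. It records each visited page's outgoing links, never returns to a visited page, and picks the unvisited page referenced most often as the next branch to explore. The `GraphMemory` class and its prioritization rule are assumptions.

```python
from collections import defaultdict
from typing import Optional


class GraphMemory:
    """Remembers visited pages and known links so exploration favors new branches."""

    def __init__(self) -> None:
        self.edges = defaultdict(set)  # page -> set of linked pages
        self.visited = set()

    def record(self, page: str, links: list) -> None:
        self.visited.add(page)
        self.edges[page].update(links)

    def next_page(self) -> Optional[str]:
        """Most-referenced unvisited page first; ties broken alphabetically."""
        refs = defaultdict(int)
        for targets in self.edges.values():
            for t in targets:
                if t not in self.visited:
                    refs[t] += 1
        if not refs:
            return None  # nothing left to explore
        return min(refs, key=lambda p: (-refs[p], p))
```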
ExecutorAgent
core/agents/executor.py → execution.json + HTML report
Replays the trace as a deterministic Playwright test. Supports WebMCP tool interception: if a step matches a registered semantic tool (e.g. sort_products_by_price), it executes the tool directly instead of hunting for DOM selectors. Auto-retries on failure.
Key Innovation
WebMCP coverage stat: tracks what % of steps used semantic tools vs. traditional selectors.
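The interception and coverage stat can be sketched as a loop. It prefers a registered semantic tool, falls back to a selector-based runner, retries each step a fixed number of times, and reports the share of steps handled by tools. The step format (`tool`/`args` vs. `selector`), the retry count, and the `execute_steps` name are hypothetical.

```python
from typing import Callable


def execute_steps(steps: list, tools: dict, run_selector: Callable[[dict], None],
                  retries: int = 2) -> float:
    """Run each step, preferring a semantic tool over a DOM selector.

    Returns WebMCP coverage: the percentage of steps handled by semantic tools.
    """
    semantic = 0
    for step in steps:
        tool = tools.get(step.get("tool", ""))
        for attempt in range(retries + 1):
            try:
                if tool is not None:
                    tool(**step.get("args", {}))  # semantic path: no selector lookup
                else:
                    run_selector(step)  # traditional path
                break
            except Exception:
                if attempt == retries:
                    raise  # out of retries: surface the failure to the orchestrator
        if tool is not None:
            semantic += 1
    return 100.0 * semantic / len(steps) if steps else 0.0
```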
FeedbackAgent
core/agents/feedback_agent.py → updated knowledge/sites/{domain}/locators.json + rules.md
Post-mortem analyst. When execution fails, it parses qa_session_logs.json, sends the failure trace to Gemini for root cause analysis, extracts bad locators + learned rules, penalizes unstable selectors in the Knowledge Bank, and saves positive/negative rules to rules.md.
Key Innovation
Filters generic programming errors β only domain-specific UI lessons are persisted.
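Here is a sketch of the two Knowledge Bank updates described above. It lowers a stability score for each bad locator and discards lessons that are really generic programming errors. The 0-to-1 score scale, the penalty value, and the regex of generic error names are assumptions. The post doesn't say how the FeedbackAgent scores or filters.

```python
import re

# Assumed markers of generic programming errors (not UI-specific lessons).
GENERIC_ERRORS = re.compile(
    r"\b(TypeError|NameError|KeyError|AttributeError|SyntaxError|ImportError)\b"
)


def apply_feedback(locators: dict, bad_locators: list, lessons: list,
                   penalty: float = 0.2) -> list:
    """Penalize unstable selectors and keep only UI-specific lessons.

    `locators` maps selector -> stability score in [0, 1] (mutated in place).
    Returns the lessons worth persisting to rules.md.
    """
    for sel in bad_locators:
        # Unknown selectors start at full stability before the penalty.
        locators[sel] = max(0.0, locators.get(sel, 1.0) - penalty)
    return [lesson for lesson in lessons if not GENERIC_ERRORS.search(lesson)]
```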
Architecture Diagram
How the orchestrator, agents, browser, and LLM all connect.

Tools & Libraries
Every tool chosen for a specific reason. No bloat.
Playwright (Python)
Browser Automation: Drives Chromium headlessly (or headed). Handles clicks, fills, multi-tab context switching, screenshot capture after every step, and network idle detection for slow production sites.
Google Gemini
LLM Engine: Powers all AI reasoning, including page summarization, semantic element classification, goal decomposition, and failure post-mortem analysis. Uses the google-genai SDK with structured JSON response mode enabled.
WebMCP Polyfill
Protocol Bridge: Injects a navigator.modelContext polyfill into the browser page context. Registers semantic tool functions that the ExecutorAgent calls directly, eliminating the need for CSS/XPath selectors entirely.
python-dotenv
Config: Loads GOOGLE_API_KEY and other secrets from .env at startup. Keeps credentials out of source code. All agents call load_dotenv() on init.
Pillow + OpenCV
Vision: Used by the visual locator for screenshot analysis. Pillow handles image loading and resizing; OpenCV applies template matching and edge detection for element identification when DOM selectors fail.
Faker
Test Data: Generates realistic random test data at runtime (emails, names, phone numbers, addresses). Replaces placeholders like {random_email} in workflow steps. No hardcoded test users needed.
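You can substitute placeholders like `{random_email}` with a regex callback. Only `{random_email}` comes from the post, and the other placeholder names are hypothetical. Faker is imported lazily so the resolver can also run with stub providers. `email`, `name`, `phone_number`, and `address` are standard Faker providers.

```python
import re
from typing import Callable, Optional

PLACEHOLDER = re.compile(r"\{(random_[a-z_]+)\}")


def faker_providers() -> dict:
    """Map placeholder names to Faker generators (requires `pip install Faker`)."""
    from faker import Faker  # lazy import: the resolver itself has no hard dependency

    fake = Faker()
    return {
        "random_email": fake.email,
        "random_name": fake.name,        # hypothetical placeholder name
        "random_phone": fake.phone_number,  # hypothetical placeholder name
        "random_address": fake.address,  # hypothetical placeholder name
    }


def fill_placeholders(text: str, providers: Optional[dict] = None) -> str:
    """Replace {random_*} placeholders; unknown placeholders are left untouched."""
    providers = providers if providers is not None else faker_providers()
    return PLACEHOLDER.sub(
        lambda m: providers[m.group(1)]() if m.group(1) in providers else m.group(0),
        text,
    )
```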
PyYAML
Knowledge Bank: Reads and writes the domain knowledge files (banking.yaml, ecommerce.yaml). Stores stable locator patterns, business rules, and compliance checks that the PlannerAgent injects into prompts.
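Here is a sketch of turning a domain file into a prompt section for the planner. The YAML keys (`locators`, `business_rules`, `compliance_checks`) are assumptions, since the post doesn't show the schema of banking.yaml or ecommerce.yaml. `yaml.safe_load` is PyYAML's standard safe loader.

```python
from pathlib import Path


def load_knowledge(path: Path) -> dict:
    """Read a domain file such as banking.yaml (requires PyYAML)."""
    import yaml  # lazy import: only needed when reading from disk

    return yaml.safe_load(path.read_text()) or {}


def knowledge_prompt(knowledge: dict) -> str:
    """Render knowledge-bank entries as a prompt section for the planner."""
    sections = []
    for key, title in (("locators", "Stable locators"),
                       ("business_rules", "Business rules"),
                       ("compliance_checks", "Compliance checks")):
        items = knowledge.get(key) or []
        if items:
            sections.append(title + ":\n" + "\n".join(f"- {item}" for item in items))
    return "\n\n".join(sections)
```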
termcolor
DX: Color-codes terminal output by phase (cyan for discovery, green for success, red for failures, yellow for retries). Makes it instantly clear which agent is running and whether it passed or failed.
Start Automating in 1 Command.
Clone the framework, set your goal, and watch the agent explore, plan, test, and self-heal β completely autonomously.
About the Author
Vishvas Dhengula, Lead SDET
Vishvas is a highly accomplished Software Development Engineer in Test (SDET) with 15+ years of experience architecting enterprise test automation frameworks for Fortune 500 companies across the United States and India. His expertise spans a wide range of industry-leading automation tools, including UFT, Selenium, Cypress, Protractor, and Playwright.