
The agile testing quadrants organise software tests into 4 categories, each serving a different purpose and primary audience to build a quality strategy.
Most software teams treat testing as a single activity. Unit tests, exploratory sessions, load runs, and acceptance checks all get lumped into one category called QA. The result is a testing programme that looks busy and leaves specific kinds of failure completely undetected.
The agile testing quadrants solve this by giving teams a shared map of the different kinds of testing that need to happen, what each kind is for, who owns it, and when in the cycle it belongs. The framework does not add more testing. It makes the testing a team already does more deliberate and more complete.
The agile testing quadrants are a framework that organises software tests into four categories based on two questions.
The two questions produce four quadrants. Each quadrant serves a different purpose, has a different primary audience, and is most active at a different point in the development cycle.


Two things to note before reading further. The quadrant numbers are not a sequence. Q1 does not come before Q2 in time. All four kinds of testing happen throughout the release cycle, often in parallel. Reading the numbers as a workflow is the single most common misuse of the framework.
The quadrants are also categories of purpose, not categories of technique. The same test approach might serve different quadrants depending on what question it is asking. An automated end-to-end test that verifies a customer journey is Q2. The same journey run under concurrent load is Q4.
Q1 contains the tests that help developers write code with confidence. Unit tests verify that individual functions behave correctly. Component tests verify that individual modules work before they are integrated. API contract tests verify that service boundaries hold. Integration checks verify that data access layers behave correctly.
These tests are fast, automated, and run continuously. A developer gets feedback within seconds of writing code. A failing Q1 test points at a specific piece of code that does not behave as expected. The fix is local and the cycle is short.
Q1 is the foundation of test-driven development, where tests are written before the code they verify. Continuous integration pipelines depend on Q1 because the tests are precise and fast enough to run on every commit without slowing the build.
Throughout development, on every commit, in every developer's local workflow.
The risk of over-investing here: A team with a large Q1 suite and nothing else knows its code works in isolation. It does not know whether the code does what the business needed or whether the customer can use the result. Unit tests passing does not mean the product works.
Q2 contains the tests that verify the system does what the business actually asked for. Functional tests check that each feature performs as the requirement described. Story tests verify that user stories meet their acceptance criteria. BDD scenarios written in plain English or Gherkin syntax bridge the business and technical audiences by expressing requirements in a format that is both readable and executable.
The audience for Q2 is the whole team. The purpose is to align the implementation with the business intent. A failing Q2 test means the system behaves correctly at the code level but does not produce the outcome the business needed. The fix may be code, may be a clarification of the requirement, or may be both.
Q2 is where journey-level testing lives in modern enterprise environments. Customer-critical workflows across complex applications are tested end to end, with the business outcome as the standard against which the system is judged.
Alongside development, during sprint review, and as the primary regression layer in continuous delivery.
Q2 sits between several owners, which is why it is the most commonly underfunded quadrant. Developers write Q1 tests because the feedback is immediate. Testers write Q3 tests because exploration is their skill. Q2 requires collaboration between both groups and strong product involvement.
Teams that skip this collaboration end up with well-tested code that does not deliver what was asked.
Q3 contains the tests that probe how the product actually behaves once it exists. Exploratory testing follows the tester's judgement to investigate unexpected behaviour without a predefined script. Usability testing evaluates whether real users can navigate the product without difficulty. User acceptance testing confirms that the product meets business needs in realistic conditions.
These tests are led by humans with strong product judgement. They cannot be replaced by automation because what they are looking for is not a specific assertion but an answer to a question that was not known in advance: what did the team not think to test?
A Q3 finding is rarely "this code is broken." A Q3 finding is more often "this works correctly but is confusing to the user," or "this passes acceptance but feels wrong in context," or "this is correct but the error message when it fails is unhelpful." These are the findings that change the product. They do not surface in a unit test.
After sufficient stability exists to evaluate the product as a whole, typically toward the end of a sprint or after a major feature is assembled.
Q3 is the quadrant most likely to be cut when delivery pressure rises, because it is the least automatable. This is also the quadrant whose absence is hardest to detect. The product ships. The tests passed. Customers find the product hard to use. The gap between what was specified and what was needed goes undiscovered until users experience it.
Q4 contains the tests that probe how the system behaves under conditions that normal use does not produce. Performance testing measures how the system responds under load. Security testing identifies vulnerabilities that could be exploited. Reliability testing verifies that the system recovers correctly from failures. Load testing determines whether the system can handle peak demand without degrading.
The audience for Q4 is the engineering team, the operations team, and the compliance function. The purpose is to surface failures that do not appear in normal use but do appear in production at peak traffic, under attack, or after extended runtime. A Q4 finding rarely blocks a feature. It often blocks a release.
Q4 tests require specialist judgement. Running a load test is straightforward. Interpreting the results, identifying which response time degradations indicate a systemic problem versus a measurement artefact, and deciding which findings require immediate remediation requires expertise the test tool itself cannot provide.
Before major releases, during pre-production validation, and increasingly continuously in organisations with mature observability practices.
Q4 often lives in a different team on a different cadence. The risk is that it becomes a release gate rather than a continuous signal. Performance and security findings should inform the requirements that drive Q2 tests and the architectural decisions that shape Q1 tests. A Q4 finding that does not change anything outside Q4 has lost its leverage.

Understanding what the framework was designed to do prevents the most common misreadings.

A team that can describe each test by quadrant has a clearer conversation about coverage than a team that calls everything QA. The vocabulary surfaces which questions the testing is answering and which questions nobody is asking.
A team with 90% of its effort in Q1 has a different problem from a team with 90% in Q3. Each imbalance is a signal worth investigating. The quadrant map makes the imbalance visible in a way that a test count alone does not.
The quadrants give developers, testers, product managers, and business stakeholders a shared map of what testing exists, who owns each kind, and where the gaps are. This conversation is difficult without a shared vocabulary. The quadrants provide one.
The numbering is conventional, not chronological. A team that arranges its release process to complete Q1 before Q2 and Q2 before Q3 has recreated the waterfall the agile movement explicitly rejected. All four kinds of testing happen throughout the cycle.
Placing one test in each quadrant does not mean a system is well tested. The quadrants describe kinds of testing, not amounts. A team with one weak test per quadrant is balanced on paper and uncovered in practice.
The framework was designed for small, co-located agile teams releasing software every two to four weeks. A team operating in a microservices estate with continuous deployment several times a day applies the same vocabulary but at a different cadence.
The framework maps; the operational reading needs to be updated for the current context.
Most teams have more testing than they think and less useful testing than they realise. The inventory reveals the actual testing posture rather than the intended one. Tests that run without an audience are not contributing to quality; they are contributing to the illusion of coverage.
The plotting exercise reveals the team's real testing posture. A team that believes itself to be balanced often discovers, on plotting, that the majority of its effort sits in one quadrant. The discovery is the value. The diagram is the carrier.
Each imbalance pattern tells a specific story. A heavy Q1 with weak Q2 means the team verifies that the code works without verifying that it does the right thing. A heavy Q2 and Q3 with weak Q4 means good functional and usability coverage but exposure to performance or security failures at release.
Most imbalances trace back to organisational structure: Q1 is well-resourced when developers own quality, Q3 is well-resourced when product teams own quality, and Q2 is the gap because it sits between multiple owners without clear accountability.
Automation belongs in different proportions in different quadrants. Q1 should be heavily automated because the tests are precise, fast, and stable. Q2 is increasingly automatable through behaviour-led platforms. Q4 is tool-supported but requires specialist interpretation. Q3 resists full automation because exploratory and usability work require human judgement that automation has not yet replaced.
Assigning ownership prevents the common failure where each quadrant is assumed to be someone else's responsibility.
What to Do
The quadrants are most useful when they connect. A team that uses the quadrants only to categorise its testing produces a tidier list. A team that uses the quadrants to connect its testing produces a quality system that improves over time. The feedback loops are where the framework earns its keep in practice rather than in theory.
A retail bank releases new functionality to its consumer mobile banking app on a fortnightly cadence. The QA team has fifteen engineers and the team believes its coverage is comprehensive.
A quadrant inventory exercise reveals a different picture.
The fix does not require more headcount. It requires reweighting.
The same diagnostic shows a markedly different distribution. The release cadence has not changed. The defect escape rate has fallen by roughly a third. The team's conversations about quality have become more concrete: people talk about specific journeys, specific failures, and specific signals from each quadrant rather than about overall test pass rates.
The framework did not test the bank's app. It changed how the bank's people thought about testing it.

The framework was designed for small teams releasing software every two to four weeks. Agile and continuous delivery have compressed that cadence significantly. Three practical adjustments work well.
During sprint planning, identify which quadrant each user story's testing effort primarily belongs to. Developers write Q1 tests as they build. Q2 tests are drafted alongside the story and automated before sprint review. Q3 sessions happen toward the end of the sprint on assembled features. Q4 checks run on a rolling basis rather than only before major releases.
At the end of each release cycle, run the quadrant inventory exercise as a team. The fifteen-minute review surfaces imbalances before they become entrenched patterns. A team that finds its last three releases were heavily Q1 with light Q2 has a signal worth acting on before the fourth release ships.
In teams releasing multiple times per day, the sprint planning horizon shrinks to the time between merges. Q1 runs on every commit. Q2 runs on every merge. Q3 happens on a rolling schedule. Q4 increasingly runs continuously rather than only before release. The vocabulary stays the same. The cadence accelerates.
The numbering is conventional, not chronological. Teams that arrange their process to complete Q1 before beginning Q2, and Q2 before beginning Q3, have recreated waterfall inside an agile framework.
All four kinds of testing happen throughout the release cycle. The number indicates the quadrant's position on the grid, not its position in time.
Some tests genuinely span quadrants. A behaviour-driven scenario can verify business intent (Q2) and serve as a regression check (overlapping with Q1). An exploratory session may surface a usability issue (Q3) and a performance concern (Q4) in the same hour.
The quadrants are a vocabulary, not a classification system that demands every test fits in exactly one box. When a test spans quadrants, note both and focus on the primary purpose.
Developer-written tests cluster in Q1. Q2 requires shared authorship between technical and business stakeholders because Q2 tests are judged by business outcomes rather than code correctness.
Teams that delegate Q2 entirely to engineering end up with a functional suite that verifies field validation and misses end-to-end journey coverage. Q2 needs explicit product ownership alongside technical ownership.
Q3 is not manual testing. Q3 is critique testing: exploratory, usability, acceptance, accessibility. The defining characteristic is purpose, not method. Some Q3 activities are manual. Some are increasingly augmented by AI tooling.
Treating Q3 as "the things automation cannot do" undersells the quadrant and starves it of the investment it requires to do its job.
Q4 often sits in a specialist team on a separate cadence. The risk is that Q4 findings become release gate observations rather than continuous signals.
A performance finding that does not update the Q2 acceptance criteria for the affected journey, or a security finding that does not inform the Q1 tests for the affected service, is a finding that lost its leverage at the boundary of Q4. The quadrants are most valuable when findings flow between them.
Q2 is the quadrant that connects the technical work happening in Q1 to the business outcomes being evaluated in Q3 and Q4.
A strong Q2 layer, built from journey-level tests that run continuously and are owned jointly by the technical and business sides of the team, is the single highest-leverage investment most teams can make in their testing programme.
The distribution of tests across quadrants changes as the product grows and the team's practices evolve. A team that last ran the inventory six months ago may have drifted significantly from the balance it intended.
Fifteen minutes at the end of a release cycle is enough to surface the drift before it becomes entrenched.
Every Q3 finding should produce at least one Q1 or Q2 test.
Every significant Q4 finding should update Q2 acceptance criteria for the affected journey. These connections do not happen automatically. They require a process and a person responsible for making sure the signal from each quadrant reaches the others.
Q1 should be heavily automated. Q2 is increasingly automatable through behaviour-led platforms. Q4 is tool-supported but interpretation-dependent. Q3 should be lightly automated because the value of Q3 comes from human judgement.
Applying the same automation target to all four quadrants ignores the fundamental differences in what each quadrant is trying to do.
A team releasing fortnightly applies the quadrants differently from a team releasing multiple times per day. The vocabulary stays the same.
The frequency at which each quadrant fires, and the planning horizon over which each is considered, needs to match the release cadence rather than the cadence of the team that originally designed the framework.
Most teams discover their quadrant imbalance at the worst possible moment: a production incident that journey-level testing would have caught.
Virtuoso QA closes the Q2 gap before that happens.
The quadrant framework shows where your testing is concentrated and where it is not. Virtuoso QA gives you the infrastructure to close the gap.
Try Virtuoso QA in Action
See how Virtuoso QA transforms plain English into fully executable tests within seconds.