On Software Testing and the case for the Common Test Scenario Database

22 Aug 2011 soapbox

TL;DR


As a participant in (and sometimes contributor of ideas to) Test Automation efforts (strategy, tool selection and implementation), I have long held the opinion that the true gains from automation can be achieved only when Test Scenarios (enshrined as exemplar test data) are a heavily reused resource across the organization. This page is an attempt to "arrive" at that opinion by progressively optimizing the software testing process.

I take the example of testing the hotel search feature and progressively optimize it through 4 stages to illustrate the value of each such optimization; and the value of shared test scenarios in particular.

What is Software testing?


Software Testing is the art and science of validating that a particular piece of software behaves as expected. Since most words in the previous sentence are loaded with meaning, I'll dissect it:

A Concrete Example - Testing Hotel Search functionality

Any travel website has a hotel search functionality. You enter a destination, start and end dates and number of guests; and the website returns a list of hotels. Each hotel card has some details (name, location, address, picture, reviews, rating, description etc); and the list itself has some details (count, sort order, #pages, etc).

Notice also that there are some implicit inputs (eg: choice of POS, choice of currency) and some implicit outputs (eg: sort order).

To truly test this feature, you need to validate that:

Focusing on functional testing alone for a moment, we see that:

The universe of all things that need to be tested is therefore of the order:
    Number of input combinations  X  Size of output for each input combination
Now, for the hotel search scenario,
    Number of input combinations = Product of (Size of the set that an input belongs to) over all inputs
and
    Size of each output = Size of result set details + Size of details for all hotel cards
Note: inputs include both implicit and explicit ones.


Using the details mentioned above and ignoring the etc's for now,

Size of inputs for hotel search
Number of inputs = 2 implicit + 4 explicit = 6
Product of sizes
  = Size of the set of destinations
  X Size of the set of possible start dates (and invalid ones)
  X Size of the set of possible end dates (and invalid ones)
  X Size of the set of number of guests (subset of integers)
  X Size of the set of POSs allowed
  X Size of the set of currencies allowed

  = 50000 (assumed)
  X 365 (assuming a year of allowed booking)
  X 365 (assumed similarly)
  X 4 (assumed to be max allowed guests per booking)
  X 1 (assuming US POS)
  X 1 (assuming USD)

Size of each output
  = 3 result set details + N x (7 hotel card details)
  where N = size of the result set itself

If N = 100, the Test Universe = 50000 x 365 x 365 x 4 x 1 x 1 x (3 + 100 x 7) ≈ 1.9E13 tests, i.e. of the order of 1E13.
On top of this set of roughly 1E13 tests, we'll have to add the Regression, UI Validation and Performance tests as well.
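
The arithmetic above can be reproduced with a few lines of Python. This is only a sketch using the set sizes assumed in the text; the names `input_set_sizes` and `universe_size` are mine and purely illustrative.

```python
from math import prod

# Sizes of each input set, as assumed in the text above.
input_set_sizes = {
    "destinations": 50_000,
    "start_dates": 365,
    "end_dates": 365,
    "guests": 4,
    "points_of_sale": 1,
    "currencies": 1,
}

RESULT_SET_DETAILS = 3  # count, sort order, number of pages
HOTEL_CARD_DETAILS = 7  # name, location, address, picture, reviews, rating, description


def universe_size(set_sizes, hotels_per_result):
    """Input combinations multiplied by the number of output values to check."""
    input_combinations = prod(set_sizes.values())
    output_size = RESULT_SET_DETAILS + hotels_per_result * HOTEL_CARD_DETAILS
    return input_combinations * output_size


total = universe_size(input_set_sizes, hotels_per_result=100)
print(f"{total:,} tests (about {total:.1e})")
```

Changing any single assumption (say, adding a second POS or currency) multiplies the total, which is the combinatorial explosion discussed next.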


Sidebar for the mathematically inclined
 All testing can be considered validation of functions. Given an abstract function y = f(x), testing is validation that for every x in X (the input set), f(x) produces the expected y in Y (the output set).
The size of the domain of the function represents the size of the input and the size of the range of the function represents the size of the output. Pairing every input with every output value to be checked gives the set of tests to be conducted to validate the function; its size is the product of the two.
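
For a function with a tiny domain, this view of testing can be sketched literally. The example below is a toy, not taken from any real system: `is_valid_stay` is a deliberately buggy, hypothetical function under test, `expected_validity` plays the role of the specification, and the check walks the whole Cartesian product of the input sets.

```python
from itertools import product


def exhaustive_failures(func, expected, *input_sets):
    """Return every input combination where func disagrees with the expected result."""
    return [args for args in product(*input_sets) if func(*args) != expected(*args)]


def is_valid_stay(start_day, end_day):
    # Deliberately buggy implementation under test: accepts zero-night stays.
    return start_day <= end_day


def expected_validity(start_day, end_day):
    # Specification: a stay must end strictly after it starts.
    return start_day < end_day


days = range(7)  # a one-week booking window keeps the domain tiny
failures = exhaustive_failures(is_valid_stay, expected_validity, days, days)
print(len(days) * len(days), "tests,", len(failures), "failures")
```

With 7 x 7 inputs this is trivial; with the hotel search input sets it is not, which is why the rest of this page is about reducing the set.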


Obviously, this is a lot of testing to do; and if we're actually able to do all of it, that would be Exhaustive Testing. Also obviously, anything larger than the simplest feature would quickly be intractable due to the combinatorial explosion of tests required; so we apply the "art and science" part and try to pare the problem down a bit. I'll focus on the functional testing here, but most of the concepts apply to UI validation and Performance as well.

Before we start, however, I'll define some factors to evaluate the efficacy of our optimization with:

Optimizing testing - Round 1: Scenario Based Testing

This optimization concept is very simple - reduce each set mentioned above to a representative subset. Each element of this subset would "stand for" or represent an entire class of values within the original set. A success or failure of the chosen value is deemed a success or failure of the entire class.

To continue the Hotel Search example, the inputs could be reduced to the following scenarios (for example):

Destinations
  • Top 10 destinations
  • 1 non-US destination
  • 3 destinations with "known issues" or special edge cases
Booking Dates
  • Last minute bookings (1 date in the next 3 days)
  • Planned bookings (4 dates in the next 2 quarters)
  • Peak date bookings (2 National holidays)
  • Weekend bookings (2 weekends in the next 2 months)
Guests
  • 4 is small enough a number to test all combos, but we could still pick a subset, say 1 and 4 only
With this, the input size drops to 13 x 9 x 2 = 234
And the test universe becomes 234 x (3 + 100 x 7 ) = 164,502

That's still a huge number, but 8 orders of magnitude less already! We could optimize this further by reducing the validation on the output values if we want to. Realistically, we can probably get away with checking 4 of the 7 details for the first 25 hotels; and checking the result set details just once throughout. So the test universe reduces to:
234 x ( 25 x 4) + 3 = 23,403
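
The same reduction as a small Python sketch. The counts are the ones used in the arithmetic above (13 x 9 x 2); the variable names are illustrative only.

```python
# Scenario counts as used in the text's arithmetic.
destinations, booking_dates, guest_counts = 13, 9, 2
input_scenarios = destinations * booking_dates * guest_counts

# Full output validation: 3 result set details + 7 details for each of 100 hotels.
full_validation = input_scenarios * (3 + 100 * 7)

# Reduced output validation: 4 details for the first 25 hotels per scenario,
# plus the 3 result set details checked just once overall.
reduced_validation = input_scenarios * (25 * 4) + 3

print(input_scenarios, full_validation, reduced_validation)
```

Keeping the counts as named variables makes it easy to see how much each further cut (fewer dates, fewer hotels checked) buys.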

How has this impacted our evaluation factors?

Note that there are more algorithmic ways of arriving at a subset of tests; Orthogonal Array testing to name one. I'll not elaborate on this further as the optimization is the same in principle - that of reducing the total number of tests required in validating a feature.

Similarly, on the regression testing side of the house, scenario-based reduction of tests can be done by carefully analyzing the areas that changed code is likely to impact; aka Impact Analysis.

Optimizing testing - Round 2: Automation

When you have many features similar to the one depicted above, scale effects come to bear:

The optimization to counter this is conceptually simple - relegate repeatable tests to a machine so that human cycles can be spent on the unique ones. This is easier said than done in practice; and the quickest way to get started is - surprisingly similar to BDD precepts - Outside In. That is, start at the outermost layer and automate tests at that layer. Work gradually inwards; or even not at all. Automating regression alone can have significant benefits.

One of the biggest issues with automation, however, is that you miss out on the human ingenuity bit. Scripts WILL break if data changes over time, so environments have to be kept stable; a human tester would easily sidestep such a change by validating "outside the box" that the changed data is indeed still valid.

To continue with the Hotel Search example, assuming both the human and the machine take the same time for a test, the gains in human cycles due to various levels of automation are:

Feature | Manual tests | Human cycles saved with 10% automation | with 25% | with 50% | with 75%
Hotel Search | 23,403 | 23,403 x 0.10 ≈ 2,340 | 23,403 x 0.25 ≈ 5,851 | 23,403 x 0.50 ≈ 11,702 | 23,403 x 0.75 ≈ 17,552
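
A quick sketch of the same calculation, assuming (as the text does) that a human and a machine take the same time per test. `MANUAL_TESTS` is the reduced test count from Round 1; the function name is illustrative.

```python
MANUAL_TESTS = 23_403  # reduced test universe from Round 1


def human_cycles_saved(manual_tests, automated_percent):
    """Tests no longer run by a human when the given percentage is automated."""
    return manual_tests * automated_percent / 100


savings = {pct: human_cycles_saved(MANUAL_TESTS, pct) for pct in (10, 25, 50, 75)}
for pct, saved in savings.items():
    print(f"{pct:>3}% automated: about {saved:,.0f} human test cycles saved")
```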

Reviewing our factors again, we see that with this additional optimization,

Optimizing testing - Round 3: Component based testing

The cost of maintaining test environments mentioned above is typically the tip of the iceberg. All testing espoused to this point has been strictly end-to-end, i.e., the environment has been a live one from the UI all the way to the database (or back-end). There is a non-trivial cost associated with maintaining these environments; and a collateral cost of maintaining scripts (or known data for use by the scripts) as those environments change. Additionally, some kinds of testing may not be possible in live environments. Booking scenarios are typically such tests - contractual obligations or the cost of test bookings may deter such tests from being done on a large scale or at all.

In addition, end-to-end testing forces the entire organization into a train wreck of dependencies. Since all testing is done from the UI, all code must be built and integrated before ANY testing can start. This not only delays testing, it also puts pressure on changes to the inner layers of the application - that code has to be completed WAY IN ADVANCE of the UI code, but its owners cannot validate its output until the UI is done.

Component testing attempts to fix these issues by testing each component at ITS interface, not at the final user interface. That way, the owners of that component know for sure that they produce valid output for given input; a large live environment need not be maintained; and the validation of a test scenario is spread across multiple component tests which together comprise the larger validation.

Component testing almost always presupposes mocking of dependent components, because the cost gains are not realized otherwise. That is, if A -> B -> C is a string of components involved in a particular test scenario, C must be mocked out to test B and B must be mocked out to test A; otherwise we've taken on the additional job of maintaining separate instances of A, B and C solely for component testing purposes, thereby increasing the cost of maintaining environments beyond what it was without component testing.
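
A minimal sketch of the A -> B -> C idea using Python's standard `unittest.mock`. The classes `ComponentA` and `ComponentB`, their methods and the flat markup are all hypothetical stand-ins, not any real system's API; the point is only that each component is exercised at its own interface with its immediate dependency mocked.

```python
from unittest.mock import Mock


class ComponentB:
    """Hypothetical middle component: asks C for raw rates and adds a flat markup."""

    def __init__(self, component_c, markup=10):
        self._c = component_c
        self._markup = markup

    def search(self, destination):
        raw = self._c.fetch_rates(destination)
        return [{"hotel": r["hotel"], "price": r["rate"] + self._markup} for r in raw]


class ComponentA:
    """Hypothetical outer component: asks B for hotels and returns the first page."""

    def __init__(self, component_b, page_size=25):
        self._b = component_b
        self._page_size = page_size

    def first_page(self, destination):
        return self._b.search(destination)[: self._page_size]


# Test B with C mocked out.
mock_c = Mock()
mock_c.fetch_rates.return_value = [{"hotel": "Hotel 1", "rate": 100}]
result_b = ComponentB(mock_c).search("LAS")
mock_c.fetch_rates.assert_called_once_with("LAS")

# Test A with B mocked out; no instance of C is needed at all.
mock_b = Mock()
mock_b.search.return_value = [{"hotel": f"Hotel {i}", "price": 100} for i in range(30)]
result_a = ComponentA(mock_b).first_page("LAS")

print(result_b, len(result_a))
```

Neither test needs a running instance of the downstream component, which is where the environment cost saving comes from.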

Component testing also typically requires some means of creating mock data - manual means will not suffice; especially if the request-response payloads are huge object graphs.
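
As a rough illustration of generated (rather than hand-written) mock data, here is a sketch of a tiny payload builder. The function name, field names and pricing rule are assumptions made for the example; a real response object graph would be far larger.

```python
def make_hotel_search_response(count=25, first_price=100, price_step=25):
    """Build a list of mock hotel cards with predictable names and prices."""
    return [
        {
            "rank": rank,
            "name": f"Mock Hotel {rank}",
            "price_usd": first_price + (rank - 1) * price_step,
        }
        for rank in range(1, count + 1)
    ]


response = make_hotel_search_response()
print(len(response), response[0]["price_usd"], response[1]["price_usd"])
```

Because the values are derived from the rank, a test can assert on any hotel in the list without a lookup table of expected data.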

The choice, adoption and usage of an organization-wide mocking framework is therefore a non-trivial task and I will not go into the details of how to achieve this. I will, however, analyze the impact of adopting such component testing on the evaluation factors mentioned above.

To continue the Hotel Search example, a hotel search typically involves a string of internal components:
UI
 -> Business Dispatcher
 -> Business Facade
 -> Business Logic Executor
 -> Back End Abstraction Layer
      |--> GDS
      |--> CRS1
      |--> CRS2
(Some details may be missing; sorry. I'm trying to make a larger point here).

Let's take the following Test Scenario:

Input: Top Destination (LAS), Weekend (start: next Friday, end: next Sunday), 1 guest
Expected Output:
  • 1st page has 25 hotels
  • 1st hotel is Caesars Palace @ $100
  • 2nd hotel is MGM @ $125
  • …and so on
…and break it into component tests:


Component: UI Layer
  Given a test script that provides this input: LAS, next Fri, next Sun, 1 guest
  ..should provide this output:
    • 1st page has 25 hotels
    • 1st hotel is Caesars Palace @ $100
    • 2nd hotel is MGM @ $125
    • …and so on
  ..using this mock data:
    • Business Dispatcher response containing the 25 hotels

Component: Business Dispatcher / Facade
  Given a test script that provides this input: LAS, mm/dd/yyyy, mm/dd/yyyy, 1 + other Dispatcher required params
  ..should provide this output: ArrayList of Business Layer objects
  ..using this mock data:
    • Executor response containing the 25 hotels

Component: Business Logic Executor
  Given a test script that provides this input: LAS, mm/dd/yyyy, mm/dd/yyyy, 1 + other Executor required params
  ..should provide this output: ArrayList of Executor-specific objects
  ..using this mock data:
    • LAS-to-internal-location-object response
    • Back end Abstraction Layer responses

Component: Back end Abstraction Layer
  Given a test script that provides this input: LAS, mm/dd/yyyy, mm/dd/yyyy, 1 + other Back end required params
  ..should provide this output: ArrayList of Back end Abstraction Layer objects
  ..using this mock data:
    • Back end-specific responses (1 per link)

We'd have to do a similar exercise for each such scenario identified before as required, but if we did, the impact on the factors would be:

This last is not easy to do for various reasons:

Optimizing testing - Round 4: Common Test Scenario Database

These issues with implementation of component testing may even lead to a regression back to manual testing. The crux of the problem is that the cohesive force of the end-to-end test is lost in component testing very easily.

The central idea with the common test scenario database is to retain the benefits of component testing while bringing back that cohesion via data: we need to ensure that test data that is distributed across the various component test scripts still has the same tie-in to the original scenario. That way, every component owner in a particular scenario refers to the same scenario using the same language. While we're at it, it would also be beneficial to change the mock data in two ways:

The larger change would be to introduce full-fledged exemplar data sets for application domain concepts that cannot be confused with live data, but clearly denote the exact scenario in which they can be used; and use policy to drive adoption of these exemplar data sets as the mock data backbone.

To continue on the hotel search example, the first step would be to define the following exemplar data:

Concept | Exemplar Data | Comment
Top Destination | LAS |
Regular Destination | USDEST1 |
Special Destination | Acme Cozumel | Added "Acme" to Destination to call out that this is a test value
Next Week Friday | (Computed Value) | Mock data framework should be able to generate such values and respond appropriately
Hotel Chain | Acme Hotels LLC |
Hotel | Grand Acme Ritz Chicago |
Top Hotel @ Top Destination | Acme Hotel and Casino |
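
One way such a table could be held in code is sketched below. The fixed values come from the table; the `resolve` helper and the reading of "Next Week Friday" as the Friday of the following Monday-to-Sunday week are my assumptions, not a prescribed design.

```python
from datetime import date, timedelta

# Fixed exemplar values, copied from the table above.
EXEMPLAR_DATA = {
    "Top Destination": "LAS",
    "Regular Destination": "USDEST1",
    "Special Destination": "Acme Cozumel",
    "Hotel Chain": "Acme Hotels LLC",
    "Hotel": "Grand Acme Ritz Chicago",
    "Top Hotel @ Top Destination": "Acme Hotel and Casino",
}


def next_week_friday(today):
    """Friday of the week after `today` (assumes weeks run Monday to Sunday)."""
    next_monday = today + timedelta(days=7 - today.weekday())
    return next_monday + timedelta(days=4)


# Concepts whose value must be computed relative to the day the test runs.
COMPUTED_DATA = {"Next Week Friday": next_week_friday}


def resolve(concept, today):
    """Look up a concept, computing its value when it is date-relative."""
    if concept in COMPUTED_DATA:
        return COMPUTED_DATA[concept](today)
    return EXEMPLAR_DATA[concept]


print(resolve("Top Destination", date(2011, 8, 22)))
print(resolve("Next Week Friday", date(2011, 8, 22)))
```

Passing `today` in explicitly, rather than calling `date.today()` inside, keeps the computed values reproducible in the tests themselves.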


The component test from before can then be rewritten like so:

Component: Web-wl
  Given a test script that provides this input: LAS, Next fri, Next Sun, 1 guest
  ..should provide this output:
    • 1st page has 25 hotels
    • 1st hotel is Acme Hotel & Casino @ $100
    • 2nd hotel is Acme MGM @ $125
    • …and so on
  ..using this mock data:
    • TBS response containing the 25 hotels
      (Note: other hosts required to bring wl up ignored for now)

Component: TBS/Plugin
  Given a test script that provides this input: LAS, mm/dd/yyyy, mm/dd/yyyy, 1 + other TBS required params
  ..should provide this output: ArrayList of BookProduct objects
  ..using this mock data:
    • HSE response containing the 25 hotels

Component: HSE
  Given a test script that provides this input: LAS, mm/dd/yyyy, mm/dd/yyyy, 1 + other HSE required params
  ..should provide this output: ArrayList of objects
  ..using this mock data:
    • Market/Markup response
    • Supplier Link responses

Component: SL Host(s)
  Given a test script that provides this input: LAS, mm/dd/yyyy, mm/dd/yyyy, 1 + other Supplier Link required params
  ..should provide this output: ArrayList of objects
  ..using this mock data:
    • SL-specific responses (1 per link)

More importantly, when a second scenario has to be mapped to component tests, the exemplar data table above should be checked to see if the concepts in that scenario already exist, and if so they should be reused.
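
The "check first, reuse if present" rule can be sketched as follows. `ScenarioDatabase` here is a hypothetical in-memory stand-in for whatever shared store an organization would actually use, and the "Peak Weekend" value is an illustrative placeholder.

```python
class ScenarioDatabase:
    """In-memory stand-in for a shared store of exemplar test data."""

    def __init__(self, exemplars=None):
        self._exemplars = dict(exemplars or {})

    def resolve(self, proposed):
        """Map each concept to its exemplar, reusing entries that already exist.

        `proposed` maps concept -> value to register if the concept is new.
        Returns (values, reused_concepts, added_concepts).
        """
        values, reused, added = {}, [], []
        for concept, value in proposed.items():
            if concept in self._exemplars:
                reused.append(concept)  # existing exemplar wins over the proposal
            else:
                self._exemplars[concept] = value
                added.append(concept)
            values[concept] = self._exemplars[concept]
        return values, reused, added


db = ScenarioDatabase({"Top Destination": "LAS"})
values, reused, added = db.resolve({
    "Top Destination": "SOME-OTHER-CODE",  # ignored: the concept already exists
    "Peak Weekend": "Labor Day Weekend",   # hypothetical new exemplar
})
print(values, reused, added)
```

The design choice worth noting is that an existing exemplar always wins over a newly proposed value, so two teams cannot silently define the same concept differently.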

So, to convert the following scenario:

Input: Top Destination, Peak Weekend, 4 guests
Expected Output:
  • 1st page has 25 hotels
  • 1st hotel is Acme Hotel & Casino @ $100
  • 2nd hotel is Acme MGM @ $125
  • …and so on

...into component tests, the following data items will have to be reused:
…and some new data items will have to be added:

…which will further be reused when automating the scenario:

Input: Top Packaging Destination, Peak Weekend, n guests
  • Acme Mexico Dest1
  • Labor Day Weekend dates
  • 2 guests
Expected Output:
  • 1st page has 25 hotels
  • 1st hotel is Acme Cozumel Resort1 @ $100
  • 2nd hotel is Acme Cozumel Resort2 @ $125
  • …and so on

…which will further require new data items to be created, and so on.

When a new feature is added, say separate pricing for children or prepaid hotel rooms, that's the time for a completely new set of hotel chains and hotels to be created.
Over time, this practice of reusing test scenarios results in the creation of the Test Scenario Database which becomes the lingua franca across the organization when talking about Quality issues.

Let's see how our measuring factors score with this optimization:

Notes on implementation




Todo

  • Create a Test Scenario database tie-in for the Robot Framework

© 2024 Vinod KD