Software Testing Techniques Notes

Chapter 1: Why Worry?

Testing is not simple:

Fact: All software has bugs

Fact: The modern world runs on software

Implication: Fate of the world is on testers!


Software engineering has given developers tools that make their lives easier:

  • High level languages
  • Code Generators

All these sorts of tools make the developer more productive (more code in less time with less effort). Unfortunately, this just means we make bugs faster.

What tools make a tester's life easier?

  • Test Execution Tools
    • Capture Replay
    • Load/Stress
    • Those still don't help define what to test
  • Test Creation Tools
    • Generation from formal specification language (Requires extra dev and design time)
    • State Transition Diagram Testing (fails because there are too many states)
  • Code Coverage Tools
    • Determines which source code lines were executed during a given test
    • Unit testers ensure all lines are hit
    • Makes regression testing easier
    • But is it enough to know that a line was executed?
    • Turns out... NO.
    • The question isn't always "if", but "when"
    • Need context to test (limits, interface values, volumes)
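To see why an executed line is not the same as a tested line, consider this minimal sketch. The function `average` and both tests are hypothetical illustrations, not taken from the book. One test executes every line, yet an input at the limit (an empty list) still fails.

```python
def average(values):
    """Return the arithmetic mean of a sequence of numbers (hypothetical example)."""
    total = 0
    for v in values:
        total += v
    return total / len(values)  # executed by the first test, yet wrong for []


def test_typical_input():
    # This single test executes every line of average(): 100% line coverage.
    assert average([2, 4, 6]) == 4


def test_empty_input_limit():
    # Same lines, different context: the empty list is a limit value that
    # the coverage report says nothing about.
    try:
        average([])
    except ZeroDivisionError:
        return "bug found: empty input is not handled"
    return "no failure"
```

A coverage tool would report the same fully covered lines for both tests; only the second one supplies the context that exposes the defect.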

Tester's paradox:

  • Testing tools do not keep pace with development tools.
  • Coupled with real-world business requirements, this leads to a distressed tester.

Question: Is removing all defects a good mindset?

Answer: No, in fact it is a dangerous mindset. Any attempt to remove all bugs from large-scale, general-purpose software will only distract you from the real mission: finding the defects that matter.

Pseudocode Example: Imagine a program whose only job is to return to the caller. The following C-style pseudocode implements this behavior.

/* BR14 code */
main() {
    return; /* Return to the caller */
}

Is there a bug in this program?

  1. Code coverage can trivially guarantee that each line was executed.
  2. No dynamic memory allocations, so it can't possibly leak memory.
  3. No locks or threads created, so there cannot be any serialization issues.
  4. Cannot have data integrity issues since it doesn't read or write anything (not even console output).

Question: Is this program truly bug free?

Answer: No. It isn't what the code does, it is what the code does not do! The code does not set any return value! So while this code might work for some tests that don't check for a return code, other callers who do check for a certain return value might fail miserably. The unpredictable return code, and no indication of success could cause havoc with automation.

The one-line pseudocode fix is left as an exercise for the reader.
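The effect on callers can be sketched in Python. This is only an analogue under stated assumptions: `br14`, `run_step_unchecked`, and `run_step_checked` are hypothetical names, and in Python the missing return value is always `None` rather than an unpredictable leftover value. The point is that a caller that ignores the return code passes, while one that checks it does not.

```python
def br14():
    """Python analogue of the BR14 pseudocode: do nothing and return."""
    return  # no return value is set, so the caller receives None


def run_step_unchecked(program):
    # A caller that ignores the return code: it always "passes".
    program()
    return "step passed"


def run_step_checked(program):
    # A caller (for example, automation) that treats 0 as success.
    rc = program()
    if rc == 0:
        return "step passed"
    return f"step failed: unexpected return code {rc!r}"
```

A test suite built only from callers like `run_step_unchecked` would never notice the defect, which is why the set of callers used in testing matters as much as the code itself.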


Good tester traits:

  • Curious - "Why" and "What if" (mostly what if)
  • Skeptical - Missouri motto - Show me
  • Restless - If a bug escapes, do not become defensive, re-examine and re-evaluate your test cases to do better.
  • Upbeat - Everyone else in the company thinks a bug is a bad thing. When you do a good job of finding defects, the result can be a negative environment to work in.
  • Diplomatic - Dealing with development is not always easy. Developers have strong egos, and you have to tell them their baby is ugly.
  • Insatiable - You need to truly understand complex environments if you want to test for them.
  • Generous - You must share information and educate other testers and developers.
  • Empathetic - Think like the customer and share their concerns
  • Resilient - You find 2000 defects in a test cycle. The software ships. The customer finds one bug. They will assuredly ask, "Didn't you test this?"

Chapter 8: Testing for Recoverability


  • Attacking a program's recovery capabilities during FVT
  • Focusing on the entire product view in FVT
  • Expanding the scope of IT
  • An example of clustered server recoverability testing

Testing a program's recovery actions makes a lot of sense early on, since early software is likely to be unstable and riddled with bugs. Having a working recoverability mechanism means less time between tests when critical failures occur.




FVT


  • Special Tools and Techniques - Some external methods may be used to trigger failure, such as filling logs or killing a process. Other times FVT needs to do things like simulate a bad parameter being passed to a function, or force an interrupt to occur just when a module reaches a critical processing point.
    • Stub routines - Just like unit testers do. Write stubs that replay what they get, altering a single parameter to be invalid at each iteration. You can also alter a module to pass back bad data to the next. This only works when the stubbed module is called infrequently, and ideally only by the module under test.
    • Zapping tools - some software systems allow you to find the memory address of a particular piece of code running on a system. Clearly one can construct tools to write values at that address on the fly. Some systems even have this capability architected into the software intentionally. Dynamic alteration of memory at runtime is called a "zap". Writing a tool like this is not hard! This approach is less artificial than it may seem.
    • Error injection programs - Crafting specially purposed software that will seek and destroy something of interest by injecting an error at precisely the right place. This is really just a completely automated and targeted zapping tool.
    • Emulators and Hypervisors - Some hypervisors allow you to set breakpoints for the code executing on them. You can set a breakpoint at a particularly interesting location, modify memory or registers and resume execution. If you have done your job right, the recovery routine will begin shortly.
  • Restartability - The most basic form of recoverability. Can the software restart cleanly after a crash?
  • Component-level recovery from anticipated errors - most commercial software has component level recovery. Sometimes this is implemented with simple try-catch language mechanisms. When creating a test plan for FVT, you should identify all places where component level recovery can/should occur, and force each of those scenarios. Key places to target testing include memory allocation, and shared resources (mutex and locks).
    • Sufficient Diagnostic Data - Your test plan should attempt to verify that any error information generated is sufficient for its intended purpose. That is, do the messages presented to the user explain enough that the user can make intelligent decisions about what to do next? If diagnostic information is presented, is it sufficient to determine root cause? Go beyond testing to the specification and ensure the product is "fit for purpose".
  • Component-level recovery from unanticipated errors - code should always contain catchall processing for errors! Find a way to force the code to react "reasonably" to errors that it hasn't been designed to address. Reasonable in the previous sentence should be shaped by your testing knowledge of the product and its purpose.
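The stub-routine idea above (replay a known-good call, corrupting one parameter per iteration) can be sketched as follows. Everything here is an assumption for illustration: `reserve_seats` stands in for a module under test, and `replay_with_bad_parameters` is a hypothetical helper, not a tool named in the book.

```python
def reserve_seats(count, section, customer_id):
    """Hypothetical module under test: validates input and recovers with an error code."""
    if not isinstance(count, int) or count <= 0:
        return {"ok": False, "error": "bad count"}
    if section not in ("A", "B", "C"):
        return {"ok": False, "error": "bad section"}
    if not isinstance(customer_id, str) or not customer_id:
        return {"ok": False, "error": "bad customer_id"}
    return {"ok": True, "error": None}


def replay_with_bad_parameters(target, good_call, bad_values):
    """Replay one known-good call, corrupting a single parameter per iteration."""
    results = {}
    for name, bad in bad_values.items():
        args = dict(good_call)   # start from the good call each time
        args[name] = bad         # alter exactly one parameter
        try:
            results[name] = target(**args)
        except Exception as exc:
            # An unhandled exception means component-level recovery failed.
            results[name] = {"ok": False, "error": f"unhandled {type(exc).__name__}"}
    return results
```

For example, calling `replay_with_bad_parameters(reserve_seats, good, {"count": -1, "section": "Z", "customer_id": None})` with a valid `good` dictionary yields one recovery result per corrupted parameter, which the tester can then compare against the expected diagnostics.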

SVT


  • Restartability - program crash or system crash?
    • Program crash - Since you are operating at a system level, setting breakpoints isn't allowed or applicable. You must find an external way to cause the program to fail. Bad input, memory shortage, or system commands that can force a process to die quickly and with a vengeance are fair game. There might also be testability features that do some of these tasks, depending on the input test gave in the software design meetings.
    • System crash - Get the program to do some work, then kill the entire system! Powering it off is the simplest approach. After the system is restarted, check that the application operates normally, looking for any anomalies or recovery actions (look at messages and logs for starters).
  • Clustered System Failures - From the workload sprayer, crash a node in the cluster, crash all nodes serially, crash multiple nodes in parallel. Crash the sprayer as well (looking for a hot standby perhaps to take over).
  • Environmental Failures - Failure of underlying hardware may be fair game for some types of system test. For instance, operating systems under test must cope with the loss of hardware functionality. Fill file systems, drop paths to storage devices on a SAN, remove networking cables, drop a CPU in a multiple-CPU system. Even if applications don't support these failures directly, it is well worth your time to test these scenarios. Forcing a branch to an OS error handler can allow these to happen even if you do not wish to destroy your real hardware.
  • Natural Failures - During normal testing, if the application should fail, ensure its recovery mechanisms get used when the software is brought back online!
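The clustered failure scenarios can be rehearsed on a toy model before being driven against real nodes. The sketch below is only an in-process simulation under assumptions: `Node`, `Sprayer`, and `run_workload` are hypothetical, and a real test would crash actual servers rather than flip a flag.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.alive = True
        self.handled = []

    def handle(self, request):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.handled.append(request)


class Sprayer:
    """Toy workload sprayer: round-robin with failover to the next live node."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.next_index = 0

    def send(self, request):
        for _ in range(len(self.nodes)):
            node = self.nodes[self.next_index]
            self.next_index = (self.next_index + 1) % len(self.nodes)
            try:
                node.handle(request)
                return node.name
            except ConnectionError:
                continue  # recovery action: try the next node
        raise RuntimeError("no live nodes left in the cluster")


def run_workload(sprayer, requests, crash_plan):
    """Send requests, crashing the nodes in crash_plan[i] just before request i."""
    lost = []
    for i, request in enumerate(requests):
        for node in crash_plan.get(i, []):
            node.alive = False  # simulated crash
        try:
            sprayer.send(request)
        except RuntimeError:
            lost.append(request)
    return lost
```

A crash plan such as `{3: [n1], 6: [n2]}` models serial crashes, while `{2: [n1, n2, n3]}` models a parallel crash of every node; the returned list shows which requests were lost in each case.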

Integration Test


  • Dependent product failure - Crashing dependent software products to determine recovery action in a tested product. What happens to "in flight" data or transactions during these crashes and what happens on application restart? Crash, hang, or disrupt each element in a chain. Really fun form of testing.

Summary


Recovery testing is fun, important and often overlooked.

Applicability across all test phases.

Particularly effective when done when systems are under load or stress.

Chapter 2: Industrial Strength Software

  • Testing software in college -> Run once, with successful output, hand it in, party
  • Testing software in industry ->
    • Run over and over without harming data
    • Run concurrently without harming data
    • Run for extremely long periods of time without harming data or eating resources

Golden Gate Bridge analogy

Test Sensitive Characteristics:

  • Heterogeneous environments - If you are testing in a homogeneous environment, you are more than likely doomed to fail. In fact your successes may depend on a bug in your precise environment.
  • Size of data sets -
  • Service level agreements - Can you minimize planned and unplanned downtime?
  • Continuous Availability - Downtime at a million bucks a minute is too costly
  • Backup and recovery
  • Compatibility
  • Virtualization

All of these must be kept in mind, or taken advantage of, when designing and implementing test cases.


Chapter 3: The development process


Cheap, fast, good: you choose!

Usually we make the decision to skimp on speed or cost.

But what does a bug cost? It turns out bugs are much less costly if found early in the development cycle.

When a developer finds a bug in their own code, no later testers, build support, project managers, etc. need to get involved.

Should definitely read and incorporate [Johanna Rothman02] here

Black Box Testing - Assume little if any internal knowledge when testing. Relies on inputs and surrounding components to formulate a test.

Grey Box and Glass Box testing - hybrid approaches that use strict externally driven black box tests as well as the details and internal process examined in white box testing.

White box Testing (aka Clear Box) - Rigorously understand internals to plan test approach.


Ontology of Test


Unit Testing: The First Phase

  • Scope: The initial testing done. Testing at its lowest level. Test all new and altered code for coverage, correctness, branches, loops, inputs and outputs. Also ensures program recovery, diagnostics, and traces can be run. Here the tester/developer asks: "Did I implement code functions that work as documented/Designed?" These testers often have exceptionally deep knowledge of a single function or functional components at the end of a test cycle iteration. They may lack breadth of knowledge however due to limited exposure.
  • Targeted Defects: Obvious errors, loop termination problems, invalid parameter passing, and bad assignments.
  • Environment: Usually a single native system (sometimes virtualized or emulated hardware is used)
  • Limitations: This is a very micro view of software. Testing individual functions written by different developers without the greater context in a very isolated fashion can mask problems until later. But this is a trade off. Isolation is necessary to focus on the new and changed code paths. Scaffolding mechanisms are used when not all functions in a module are ready for test.
  • Cost and Efficiencies: Usually performed by members of the development team. After all, you need to know which lines changed, and understand the code for quick turnaround. This is the most cost-effective part of the test plan. Most bugs will be found at this stage.
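A minimal unit test sketch for the targeted defects listed above (branches, loop termination, invalid parameters) might look like this. The function `count_until_sentinel` is a hypothetical unit under test invented for illustration.

```python
import unittest


def count_until_sentinel(values, sentinel=0):
    """Count items before the first sentinel (hypothetical unit under test)."""
    if values is None:
        raise ValueError("values must not be None")
    count = 0
    for v in values:
        if v == sentinel:
            break
        count += 1
    return count


class CountUntilSentinelTest(unittest.TestCase):
    def test_stops_at_sentinel(self):       # normal branch
        self.assertEqual(count_until_sentinel([5, 7, 0, 9]), 2)

    def test_no_sentinel_terminates(self):  # loop ends without a sentinel
        self.assertEqual(count_until_sentinel([5, 7, 9]), 3)

    def test_empty_input(self):             # zero-iteration loop
        self.assertEqual(count_until_sentinel([]), 0)

    def test_invalid_parameter(self):       # invalid parameter passing
        with self.assertRaises(ValueError):
            count_until_sentinel(None)
```

Saved in a file, these tests can be run with `python -m unittest`. Each test targets one code path in isolation, which is the micro view the limitations bullet describes.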

Function Verification Test: The Second Phase (FVT)

  • Scope: After unit test of each individual module, the modules are packaged together and tested, usually grouped by related function. Note that some corporations call this integration test, but as we shall see, there is a more specific kind of test that uses that name. FVT is typically done as a single user and not at extensive scale. Here the tester asks: "Can I do feature A as a single user one time?" These testers often have very deep knowledge of a single component or set of components at the end of a test cycle iteration. They may lack breadth of knowledge, however, due to limited exposure.
  • Targeted Defects: Feature based validation. Does the software function as advertised?
  • Environment: Native or virtualized environments similar to those used in the unit test phase. Virtualization has many benefits (can provide breakpoints, can allow a tester to alter disk and memory values without the danger of destroying their real system).
  • Limitations: Only covers functional units (feature line items), without looking at the product/package as a whole.
  • Cost and Efficiencies:

System Verification Test: The Third Phase (SVT)

  • Scope: All of the code that makes up a single product or program is brought together to be tested in concert, as one collective work, for the first time. Here the tester asks: "Can I do feature A over and over again without failure? Can I do feature A as several users concurrently?" These testers often have very broad and fairly deep knowledge of a single product at the end of a test cycle iteration.
  • Targeted Defects: Timing and Serialization defects, software capability under stresses. Data integrity testing. Security testing. Recovery defects.
  • Environment: Environments are often virtualized but on robust virtualization. Systems with larger capacity are needed as this is the first time tests of scale may happen.
  • Limitations: It is product based, and does not truly go after defects that occur on the boundary between two products, such as a web server connecting to a database.
  • Cost and Efficiencies: More expensive hardware configurations are often needed to support the scale-out and multiuser testing. This phase introduces the first real glimpse of test goal versus budget thinking.
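The SVT questions "over and over" and "as several users concurrently" can be sketched as a small serialization and data integrity check. The `Accounts` class and the `hammer` driver are hypothetical; the invariant (the sum of both balances never changes) is the kind of data integrity condition such a test would assert.

```python
import threading


class Accounts:
    """Hypothetical shared data: two balances whose sum must never change."""

    def __init__(self, a, b):
        self.balances = {"a": a, "b": b}
        self.lock = threading.Lock()

    def transfer(self, src, dst, amount):
        with self.lock:  # serialization; without it the invariant is at risk
            if self.balances[src] >= amount:
                self.balances[src] -= amount
                self.balances[dst] += amount


def hammer(accounts, users=8, repeats=1000):
    """Run the same feature repeatedly from several concurrent 'users'."""
    def user(i):
        src, dst = ("a", "b") if i % 2 == 0 else ("b", "a")
        for _ in range(repeats):
            accounts.transfer(src, dst, 1)

    threads = [threading.Thread(target=user, args=(i,)) for i in range(users)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(accounts.balances.values())
```

After `hammer(Accounts(500, 500))` the returned total should still be 1000 and no balance should be negative. Removing the lock does not guarantee a visible failure on every run, which is one reason timing defects need sustained load to surface.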

Performance System Verification Test:

  • Scope: This test is all about identification of performance-related strengths and weaknesses.
  • Targeted Defects: Identification of bottlenecks that degrade response time or throughput characteristics. Often strict experimentation is performed with data collection and analysis.
  • Environment: Native dedicated hardware is used.
  • Limitations: Software and hardware must be very stable before these types of experiments can be performed.
  • Cost and Efficiencies: Even more hardware is required than in the SVT situation. Typically, for any mature software, performance defects can be costly to resolve, as they require major algorithmic changes and complete retesting of those changes all the way through SVT acceptance before the software can be properly performance tested again.
  • Discussion of Load/Stress - Load/Stress tests can be used to find defects and can be used for performance analysis
    • The performance analysis side is based on locating bottlenecks, and on obtaining a performance number that encapsulates the best possible score for that software.
    • Testing for defects, on the other hand, uses incredible loads to cause failure.
    • SVT uses load/stress to create chaos and find defects. PVT uses it to obtain a measurement, and only when the stress and load still allow a clean, smooth, repeatable measurement. The same tool might be used for completely different reasons depending on whether you are using it as a performance tool or as an SVT defect-finding utility.
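The measurement side of load testing can be sketched as repeated, identical runs whose spread indicates whether the number is repeatable. This is a rough illustration only: `measure_throughput` and `summarize` are hypothetical helpers, and what counts as an acceptable spread is a judgment the test team would have to make.

```python
import statistics
import time


def measure_throughput(operation, calls_per_run=10_000, runs=5):
    """Return operations per second for each of several identical runs."""
    rates = []
    for _ in range(runs):
        start = time.perf_counter()
        for _ in range(calls_per_run):
            operation()
        elapsed = time.perf_counter() - start
        rates.append(calls_per_run / elapsed)
    return rates


def summarize(rates):
    """Median rate plus relative spread, a rough check of repeatability."""
    median = statistics.median(rates)
    spread = (max(rates) - min(rates)) / median
    return {"median_ops_per_sec": median, "relative_spread": spread}
```

A large relative spread suggests the environment is not yet stable enough for a performance measurement, even though the same load might be perfectly useful for finding defects.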

Integration Test: (AKA Acceptance Test)

  • Scope: Targets complete solution stacks or multiple interacting products at once.
  • Targeted Defects: Defects that occur only in multiproduct domains. Flushing out the bugs in component interactions. Additionally, systems management defects and operational gaps are identified. Integration testers act like a virtual company with a complete IT staff and hardware to model complex interactions between software and hardware. These staff must have especially broad knowledge across products, while often having fairly deep knowledge of some particular aspect of the environment they work in (so that, together, the team has someone with depth in each product being integrated).
  • Environment: Typically whichever environment is similar to the expected actual customer solution that is being modeled. This includes a complex mix of software and hardware from potentially many vendors.
  • Limitations: Relies on client data for accurate modeling of real-world environments when targeting time-consuming and complex integration testing. Strategic choices must be made so that the integration test is representative, since not every option can be included.
  • Cost and Efficiencies: Cost becomes very high when customer solutions are emulated at reasonable scale. But it can be a very efficient way to test or create end-user documentation through whitepapers and experience reports.

Beta Test: (AKA Early Support Program ESP, Early Adopter Program)

  • Scope: The scope is broader than integration test and SVT, as the software is being tested by a customer in a real environment under real constraints.
  • Targeted Defects: Anything affecting that business user's needs, including installation, migration, management, regression, and integration.
  • Environment: The customer's own real environment, under real constraints.
  • Limitations: Since beta testing is done late in a test cycle (to keep customers enthusiastic about a product), there is often only a short window for feedback. Additionally, closed beta programs cannot hope to cover every possible customer-like environment, so the act of selecting which customers to involve can be arduous.
  • Cost and Efficiencies: One of the major efficiencies can occur when another division or group within the same company can begin using a relatively stable piece of software early. This might provide competitive advantage. Customers and partners get an early peek into your software so that they can align their own product development road maps for compatibility. The costs to the customer are often overlooked. The customer has its own goals to meet, and testing your software might not always make a good strategic fit. In addition, poor beta quality may scare customers away from the final product.

Ends chapter 3 up to page 43


Traditional Software Development Models

New models arise all the time. Let's take a look at a few of the widespread models out there today:

Waterfall Model


A linear and sequential approach that relies on the previous phase being entirely completed before the next begins.

Strength relies on thoroughness, rigorous work, and strict processes.

Success depends on clear concise documentation from one phase to the next.

Entry and Exit Criteria

Checkpoint reviews conducted throughout the process.

Often criticized for not having a proper feedback loop in place. The primary source of feedback is from large test phases after development has completed.

[Figure 3.2 from page 44]

Waterwheel Model


Time-to-market pressures drove a modification of the waterfall. In the waterfall model, all code development is completed before any test phase begins.

This is "overkill", as the book says. The waterwheel is a single development cycle with an iterative, cumulative code delivery loop between the development and test teams.


The waterwheel attempts to maintain the strengths of the waterfall while being a bit more dynamic.


Structured continuous code integration.


The high-level structure of the waterfall stays intact, but development and FVT overlap and operate in an iterative fashion.


Code drops stream into test like a water drop on a waterwheel. FVT test cases expose bugs that influence development again.


The whole model stresses internal delivery of code to professional testers who test from a white box and black box perspective.


[Figure 3.3 from page 46]


Staged deliveries continue in the same fashion: FVT hands early stable builds to SVT, just as development hands early drops to FVT.


[Figure 3.4 from page 47]


The major issue here is that SVT is supposed to test everything in a chunk, so staged delivery can undermine it.


This method is well known in industry, gives test teams more flexibility, and can shorten time to market when deployed properly.


Common elements

Both the waterfall and the waterwheel require careful management and a strong build process. Both rely on the expectation that new incoming requirements will be few and far between. Both of these procedures deliver code to a customer only at the end of the process.


  1. Checkpoints - Moving from one development point to the next requires a checkpoint review. Reviews assess criteria set forth in test plans (Entry and Exit Criteria)
  2. Criteria - Entry and Exit criteria attempt to control the total amount of overlap in test phases
  3. Overlap - Development and test overlap. Requirements definition and design can even overlap!
  4. The test cycle length is assumed to be rather long.

Build and Installation - key component to any software development project. Consolidating code drops into a working build is not trivial!


Iterative software development models

Agile processes

Agile software development processes focus on a drive for working code and rapid production [Turk02]. The biggest issue they attempt to solve is time to market.

Deliver early and often.


Some "agile test driven development" techniques consider development as the customer of testing. Unit tests are created before the actual code is implemented.
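The test-first ordering can be sketched in a few lines. The example function `is_leap_year` is a hypothetical choice for illustration; the point is only the order of work: the test exists before the code it exercises.

```python
# Step 1: the test is written first and describes the behavior wanted.
def test_is_leap_year():
    assert is_leap_year(2024) is True    # divisible by 4
    assert is_leap_year(2023) is False   # not divisible by 4
    assert is_leap_year(1900) is False   # century not divisible by 400
    assert is_leap_year(2000) is True    # century divisible by 400


# Step 2: only now is the code written, just enough to make the test pass.
def is_leap_year(year):
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
```

Until step 2 is written, calling `test_is_leap_year()` fails with a `NameError`, which is the expected starting state in this style of development.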


Agile explains that incoming requirements will change and a system needs to be in place that can handle it. According to Turk and associates, agile works well for small teams in close proximity. Williams and Cockburn show that agile is best when used with collocated teams of 50 or fewer. Additionally physical proximity is strongly ingrained in the agile principles. This may not be reasonable on a global software development team. Change control is critical yet not necessarily pointed out specifically in agile documents and guidelines. Speed of development can get in the way of reuse. The waterfall and waterwheel method allow more development time for reuse.


Extreme programming

Developed by Kent Beck, designed to react quickly to changing requirements (as is agile). The principle is that tests are developed from discussions with customers first, then code is written to make the tests succeed. Traditional models use feedback to correct problems, XP uses feedback to create designs. All tests from all programmers are accumulated and used as regression on each build drop.

Another strong point is cooperative code ownership: many eyes looking at the code to improve quality. Each developer should be familiar enough with the code to point out defects. The authors of the textbook question whether this shared review scenario offers the same level of objectivity as a traditional independent FVT test team. XP focuses on shipping software rather than on documentation. This has a downside: programmers returning to the code base after moving on to other projects might have a hard time getting back up to speed without documentation. The authors claim agile programming is best suited for straightforward software development projects. They question its effectiveness in large, complicated projects where documentation, quality control, and objectivity are critical.


Spiral model

Proposed by software guru Barry Boehm. Not technically an agile model, but it has interesting ramifications. Distinguished by risk assessment and analysis during the development process. Specifically, Boehm's research finds that end-user applications do not fit well into the waterfall model, as correct up-front requirements may be difficult to obtain. Similar in many ways to the waterfall model but at a different time scale.


The duration of each of the phases is a function of the size of the software project itself.


Four process quadrants: Figure 3.5 on page 52. Start on the negative x axis close to 0. Spiral outwards clockwise through each of the phases below.

  • Objectives - continuously revisiting the objectives in the context of plans or developed code will help make sure the project is moving in the right direction
  • Risks - Risk driven rather than document or code driven. This stage determines if the project should pause, move forward, or be terminated. This happens between project phases. (Waterwheel accomplishes similar tasks through the use of iterative code drops and integrated fixes.) Funding, requirements updates, and hardware issues can all affect a project's level of risk. It is not clear how risk assessment and analysis could be applied within a specific portion of the revolution (such as the integration and test phases, where it might make the most sense).
  • Product Development - intended to validate requirements, create deliverables, and verify the next level/iteration of product development. This phase of the cycle has the greatest intersection with test activities.
  • Planning the next phase - The final quadrant is all about looking ahead to the next product iteration to determine what should happen with the next revolution through the quadrants.

Evolutionary Model

Developed by May and Zimmer, who refer to it as EVO. It is essentially a combination approach, where phases flow from one to the next (somewhat like the waterfall model), while feedback loops ensure that required improvements to software, procedures, and processes are made as the project proceeds. They focus on breaking the implementation phase of development down into a set of smaller cycles (to help with risk analysis and mitigation) [May 96]. These small cycles tend to last 2 to 4 weeks and include design, implementation, and initial testing. There is a similarity to the waterwheel method, as short cycles and quick turnaround for implementation and fixes are both present. A key differentiator is that with EVO the code might well be given to an external customer as beta code for external feedback at the end of one of these mini cycles, whereas the waterfall method releases to customers only at the end. The emphasis is on the customer as beta tester. Cycles are truly end to end. At the end of a cycle, improvements to the development and test processes may also be considered, in addition to defects and product improvements.


Downside is that such short cycles may not work well in extremely large scale software projects.


Test Driven Development: Algorithm Verification Test (AVT)

Uses a white box testing methodology to ensure an application's key algorithms are working correctly. Some consider this not a testing phase but more of a development tool. The activity is focused on finalizing algorithm behavior as opposed to verifying functions. This phase is most useful when overlapped with FVT testing or at the very beginning of SVT.

Start by defining a new set of experimentation points (places where the algorithm makes "significant" decisions that drive software behavior). Note that the algorithms being tested must be stable!

Perform experiments and gather data for analysis.

Iteratively analyze results, adjust algorithms as needed, reproduce experiments until optimization is complete.

Writing a workload for this type of test would be difficult, if not impossible, without white box testing, since what is extracted from white box analysis is essentially what defines the workload.

Data analysis may require special instrumentation or tooling. This is because the information generated by the algorithms under test needs to be analyzed, and not all algorithms are capable of producing that data out of the box.
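The experiment loop described above (define experimentation points, gather data, adjust, repeat) can be sketched with a toy algorithm. Everything in this block is an assumption for illustration: `LruCache` stands in for the algorithm under test, its `stats` dictionary is the added instrumentation, and `tune_capacity` is a hypothetical stand-in for the adjust-and-rerun step.

```python
from collections import OrderedDict


class LruCache:
    """Toy algorithm under test; `stats` is the added instrumentation."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()
        self.stats = {"hits": 0, "misses": 0, "evictions": 0}

    def get(self, key):
        if key in self.items:
            self.items.move_to_end(key)
            self.stats["hits"] += 1
            return self.items[key]
        self.stats["misses"] += 1            # experimentation point: miss path
        value = f"value-{key}"
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # experimentation point: eviction
            self.stats["evictions"] += 1
        return value


def run_experiment(capacity, workload):
    """Run one experiment and return the observed hit ratio."""
    cache = LruCache(capacity)
    for key in workload:
        cache.get(key)
    total = cache.stats["hits"] + cache.stats["misses"]
    return cache.stats["hits"] / total


def tune_capacity(workload, target_hit_ratio, max_capacity):
    """Repeat the experiment, adjusting capacity until the target is met."""
    for capacity in range(1, max_capacity + 1):
        ratio = run_experiment(capacity, workload)
        if ratio >= target_hit_ratio:
            return capacity, ratio
    return None
```

The workload itself comes from white box knowledge: a cyclic access pattern such as `[1, 2, 3] * 10` is chosen precisely because it stresses the eviction decision.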


Reactionary Approach


In reality, as the authors suggest, many consider this approach only in reaction to catastrophe. Most of the processes we have discussed up to now have been very planned, regardless of the actual procedure followed. It is well defined how each phase flows into the next. But in the real world this can often break down. Reactionary iterative approaches can be used in these cases.

A one team approach - Get the developers and testers together on a weekly, if not daily, schedule. Daily meetings set priorities and goals to get development back on track. In essence this begins to look like the Scrum process followed by some development groups all the time. Begin by getting everyone on the same page and developing fixes that allow test to proceed in a most basic fashion. If that fails, in the worst case it might be necessary for the developer to sit next to the tester while the tester is working, to remedy any defects in real time.

Iterative Model Summary


The iterative models shown here are a subset of those in industry. Each has targeted strengths and weaknesses. The iterative models focus on time-to-market needs. Traditional waterfall models deliver robust software but lack speed and flexibility. Combining aspects of iterative and waterfall-based approaches may prove the best for some projects.


Fallacy of Skipping SVT


Sometimes it can be tempting to want to skip a test phase to deliver software more quickly. Sometimes development might consider skipping portions or even entire phases of test completely.

In some situations where organizations have exceptionally strong integration test environments set up, it becomes tempting to skip SVT entirely. But this approach often fails. Why?

Though the goals look similar on the surface, what happens in each phase is largely different. SVT runs many different tests in parallel, perhaps with round-the-clock testing (different shifts with different workloads). A single failure in SVT might block only some testing, while other parts can proceed and progress can be made. Contrast this with integration testing, where virtually all testing is intertwined. A failure in one area might block all further testing. Once it is fixed, another burst of productivity occurs until another critical blocker bug is uncovered. In this fashion, testing bursts forward and then stops dead, over and over. Additionally, consider that integration test environments often test many new pieces of software, where a defect in any one product might stop the testing of another, entirely different product. Integration test is largely a serial process, whereas SVT is largely a parallel process.


Chapter Summary


  • Different models of test approaches exist, but they all share the same goal of uncovering defects to ship software.
  • The different test models are often still composed of the same underlying test phases.
  • Each model has its own strengths and weaknesses.
  • Different test phases are designed to extract different kinds of defects.

Chapter 4: The Test and Development Divide


  • Testing your own code
  • Establishing your credibility

In software development it is conflict, confrontation, and technical and organizational problem solving that allow us to move forward.

Testers need to hold an adversarial position yet work within a structure to manage this conflict constructively.

Should developers Test their own Software?

Experience shows it is not wise for a developer to be the only tester of a piece of software (especially mission critical software).

Developers do unit testing (UT). Unit testing benefits directly from the developer's familiarity with the code.

A software developer's responsibility is to implement the functionality, obtain a clean compile, and test the base functionality.

Some organizations consider a clean compile to be a complete UT. This is not a stance you should take if you want high quality software. Explicit UT actions place more responsibility on the developer to ensure the simplest mistakes are caught early, in the least expensive way possible.
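A small sketch of why a clean compile is not a unit test. The function `percent_of` and its bug are hypothetical, made up for this illustration; Python's built-in `compile` stands in for the compile step.

```python
# Hypothetical source with a deliberate bug that no compiler will flag.
source = '''
def percent_of(part, whole):
    # Intended: what percentage of `whole` is `part`?
    return part / whole * 10   # bug: should multiply by 100
'''

# "Clean compile": the code is syntactically valid and loads without error.
code = compile(source, "<percent_of>", "exec")
namespace = {}
exec(code, namespace)
percent_of = namespace["percent_of"]


def unit_test_percent_of():
    """Explicit unit test of the base functionality: 1 is 25% of 4."""
    return percent_of(1, 4) == 25


print("compiled cleanly:", code is not None)
print("unit test passed:", unit_test_percent_of())
```

The compile step succeeds, yet the explicit unit test fails, which is exactly the kind of simple mistake UT is meant to catch cheaply.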

Happy Path Testing

Some might wonder why, if developers can perform their own UT, the same developers should not perform FVT and SVT as well.

Experience shows this is a bad idea.

Despite good intentions, a developer's objectivity is compromised when reviewing their own code.

Since developers often work from use cases, they tend to test only those strict use cases (i.e., the paths the developers know their code handles). If the developers had thought of other use cases, they would likely have implemented those as well!

Happy Path Testing - Testing only what you know should work as implemented. Seasoned testers, for good reason, will often try things not anticipated or caught by the happy path testing approach.
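To make the distinction concrete, here is a minimal sketch. The function `parse_quantity` and its 1 to 99 range are assumptions invented for this example; they do not come from the text.

```python
def parse_quantity(text):
    """Hypothetical function under test: parse an order quantity (1-99)."""
    value = int(text)
    if not 1 <= value <= 99:
        raise ValueError("quantity out of range")
    return value


def raises_value_error(text):
    """Return True if parse_quantity rejects the input with ValueError."""
    try:
        parse_quantity(text)
    except ValueError:
        return True
    return False


# Happy path: the inputs the developer had in mind.
assert parse_quantity("1") == 1
assert parse_quantity("42") == 42

# Off the happy path: inputs a seasoned tester might try.
assert raises_value_error("0")       # just below the limit
assert raises_value_error("100")     # just above the limit
assert raises_value_error("")        # empty input
assert raises_value_error("ten")     # not a number
assert parse_quantity(" 7 ") == 7    # int() tolerates surrounding whitespace
```

The last check passes, but it also surfaces a question the happy path never asks: is accepting padded input actually the intended behavior?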

Makers versus Breakers

Fundamentally the developers are the makers, and testers are the breakers. Each group is explicitly at odds.

Once a developer has created a working piece of software, they have little interest in uncovering its flaws.

Testers will often go to great lengths to break software. This can include devising their own use cases (some align with developer intentions, others may not).

Testers are trained to fundamentally think the software they receive will not work, and that it is their job to identify the flaws. Testers are often thrilled at discovering a bug, so they take personal pleasure in uncovering defects, while simultaneously doing the act that defines their successful function as an employee.

The World of Open Source Software

Open source offers an exceptionally interesting way of looking at software development. Typically, open source developers begin work on new software by creating a patch that provides added functionality. This patch is then put forward for peer review by countless other developers. Many of the most successful open source projects have developers who are as passionate about code review as corporate professional software testers are. In many ways, the patch reviewers are reviewing patches strictly to identify weaknesses across the spectrum of software defects (style guidelines, correctness, efficiency of algorithms, etc.).

After this review phase, if the patch is accepted, the software will likely be released in the next development snapshot, which makes it readily available to any beta tester (potentially thousands) who wishes to try the new function. Most large and successful open source software projects have a dedicated following of beta testers who either wish to help the project by contributing, or merely wish to run bleeding edge software for its features.

Diplomacy: The Tester's Relationship with Development

When we looked at the traditional development processes (waterfall and waterwheel), it seemed as if a hard line separated the development and FVT teams. While these are likely to be completely separate teams, they do need to cooperate.

FVT is often challenged with creating their test plan from use cases similar to those being used by the developers to implement the software. Developer specs, design docs, and the actual code provided to the FVT team by development all help shape the FVT test plan.

In the same vein, FVT and SVT cooperate with one another to pass along FVT experiences and knowledge to the SVT team. The SVT team may then choose to look at the software in a slightly different light or alter their test plans based on FVT results (for instance, exercising paths deemed troublesome by FVT more heavily). The SVT team may also choose to test based on how other customers (who have not requested the particular feature or line item) might try to use the software.

The First Bug

When a new tester (or a tester new to a particular code base) thinks they have found a new defect, it is important to inform the developer while maintaining a thick skin attitude. Challenging developers can be daunting and memorable.

It is likely that many defects will be claimed to be working as designed or unimportant, yet the tester must soldier on, recording their opinion of each defect found. Over a long development period, testers and developers will often build a rapport that makes everyone's job easier. Confrontation usually gives way to cooperation. That is, of course, not always the case. Even seasoned testers and developers will get "push back" from one another. This professional tension is what creates good software.

Building Credibility

Perhaps the simplest way to build credibility is to have a deep technical understanding, which earns technical credibility. This provides equal footing with acclaimed and seasoned developers.

How do you grow technical credibility?

  1. Obtain a broad view of how software works externally and internally. This can be daunting. Breadth of product knowledge may lead you to conclusions a developer with a narrow view might have missed.
  2. It is recommended for large software to take a depth first approach in a particular area, widening your breadth as time goes on. Each time you engage in a deep examination of software you are performing what is colloquially known as a "deep dive".
  • Immerse Yourself in Information:
    • Review Customer Documentation - the most logical place to begin
    • Study Internals - Look at component notebooks or functional documentation. Study internal interfaces, structures, and flow of control.
  • Tap Into Your Colleagues' Knowledge
    • Experienced testers will have knowledge that can help connect the dots between components.
    • Any successful test organization should be primed with seasoned testers to support those with less experience. This technical exchange from mentor (the seasoned tester) to mentee (the new tester), and from the new tester to his or her coworkers, is a great way to build the technical credibility mentioned earlier.

The Importance of Debugging and Product/Function Skills

There are two views on how debugging can occur in a test environment:

  1. The tester finds a defect and passes the test case and results back to the developer, who recreates the defect and solves the problem, shipping the fix in the next code drop. This is inefficient and can cause a backlog, because the developer ceases new development while recreating, locating, and solving the bug.
  2. The tester tries to determine the actual cause of the defect as it arises. This is often more difficult up front, but it gives the tester better component knowledge of the software. It can involve elaborate breakpoint scripts, virtualized execution, reading system memory dumps, and numerous tracing techniques. The most advanced debugging testers are often better at diagnostic root cause analysis than some developers. These debuggers are highly technically credible and well respected by developers and their test peers.

The latter approach allows developers to spend time fixing defects, rather than identifying them.

Building these skills allows a tester to determine whether a potential defect is a user error or a legitimate defect, which protects hard-won credibility. Injecting testability attributes into a design is easier than shoehorning them into a product later in the development cycle.

Benefits of a Strong Relationship

Testability - How easy it is to test a piece of software.

Testers should be involved with development early on to ensure that any piece of software is made relatively easy to test.

Organizational Challenges

Development and test often have invisible (or in some cases highly visible) barricades separating them.

Large organizations need to provide balance to ensure that neither group is too dominant.

Determining where to place FVT is often a critical organizational challenge. It is truly the crux of beginning a solid test foundation. It is highly recommended for any moderately sized software project to have FVT outside the same immediate management of development. It is generally taken for granted that SVT and later test phases will also be under separate middle management.

First Line Department - a group of individuals and their manager who work to perform a specific function for an organization.

Second Line Department - a logical grouping of first line departments, such as those working to develop a key piece of software, led by a manager with broader responsibility and accountability for products.

Third Line Organization - a logical grouping of second line departments, the management of which reports to the company's upper level management.

[Diagram explained in figure 4.1 Page 70]

  • Model 1: FVT testers work in multiple first line departments within a second line FVT organization focused entirely on test.
  • Model 2: FVT testers reside in first line departments with a second line development department. FVT first line departments are charged with performing test primarily against the components and functions developed by its own second line organization.
  • Model 3: FVT testers and developers work in the same first line department, but with a clear separation between the two disciplines.
  • Model 4: Same as Model 3 but without any clear distinction between development and test. While developers do not test their own code, they may test the code of a peer in their department. In some cases a single person or group of people are the designated testers for a particular test cycle.

Each model has its own strengths and weaknesses, and resolves technical disputes at different levels of management. Each model also has a varying ability to foster communication, as well as varying career paths for testers.

Let's look at each model in more detail:

Models in detail

  • Model 1: Separate Second-Line Organizations for FVT and Development
    • Advantages:
      • Improved objectivity with clearest delineation between development and FVT. Ensures objectivity by its nature. FVT acts as conscience for development.
      • Higher point of Escalation - second line managers act as point of escalation during issues. Second line manager can also act as sounding board for technical and managerial issues solely about testing.
      • Fosters Testing Depth - encourages testers to coalesce a community that shares common interest, tools, and practices.
      • Supports Test as a Career Path - Management can offer growth potential and develop career paths specifically for testers. Test second line can act as an FVT advocate encouraging career growth for testers.
    • Disadvantages:
      • Impedes Communication Flow - a high wall between FVT and development can lead to communication breakdown. An "Us vs Them" mentality can set in.
      • Impacts Early Test Focus - resource allocation issues can arise within the second line FVT department. Influences at the third line might result in diminished early involvement by FVT testers. Focusing on the current release creates blindness about the next. Second line manager priority decisions are extremely important in this model.
      • Increases Cycle Time - does not address time-to-market requirements and challenges as aggressively as the other models. Time spent on cross organizational issues is time not spent testing.
  • Model 2: Development and FVT in the same Second-Line Department
    • Advantages:
      • Encourages Issue Resolution - A flatter organization provides a lower point of escalation for issues. Expedites problem resolution.
      • Improves Communication - Since the second line manager is in charge of code and test, they have a vested interest in ensuring teamwork between Development and FVT.
      • Supports Early Test Focus - By prioritizing assignments, management can enable testers to participate early in development planning. Developer/Tester relationships tend to be less strained in this model.
    • Disadvantages:
      • Weakens Test Community - Multiple second line organizations doing test may splinter the FVT test community. These FVT testers may have to work extra hard to share tools, best practices, and resources. Cross FVT team communication is key to overcoming this disadvantage.
      • Impedes Test Career Path - Within a second line department test achievements are likely to be overshadowed by development departments. It is the developers who are seen as the creators and those who deliver the product to customers. It is imperative the FVT tester first line manager acts as an advocate to prevent this.
  • Model 3: Integrated FVT and Development Departments
    • Advantages:
      • Increases Communication - FVT and development in a single department leads to the easiest communication of all the models. Provides a high likelihood of balanced resource allocation and workload scheduling. The first line manager is a promoter of teamwork and can work more effectively when dealing only with their own employees.
      • Supports Skill Building and Education - Scheduling within a first line department often facilitates cross discipline interaction. Knowledge exchanges and learning sessions are relatively easy to coordinate within a department.
      • Emphasizes Early Test Involvement - resource allocation and scheduling is much easier. Testers can be directly involved early in the process (enhancing testability) when working with their own department peers.
    • Disadvantages:
      • Lowers Escalation Point - A completely flat escalation point (the first line manager). It is important to set up proper leadership within the department to reduce the need to constantly have the first line manager arbitrating his or her own department's issues.
      • Weakens Test Focus - FVT can sometimes slip into an afterthought position. Development and testing metrics need to be in place to prevent this from happening, otherwise the pressure on first line management to meet or beat development dates may compromise their objectivity.
      • Impacts Test Career Path - Makers might overshadow the breakers since there is no clear FVT career advocate in this structure. Tends to work only when the testers are very skilled and highly valued, often so much so that they are already on the same career path as development.
  • Model 4: Multitasking Developers and Testers
    • Advantages:
      • Enhances Communication - Since testers and developers trade job tasks, communication is exceptionally easy even on a supremely technical level.
      • Offers Flexibility - The first line manager is empowered to be extremely flexible with resource allocation. It creates a very dynamic work environment.
    • Disadvantages:
      • Uses Strengths Poorly - Sometimes people who prefer to be developers are forced to be testers for a period of time, which may not be what they enjoy. Often the job of tester is relegated to the newest employees as a way to bring them up to speed. Sometimes testers who are not skilled in development methodologies, or even in the language being used, are asked to deliver large amounts of function in a short period of time. Strengths and weaknesses may not be used to their fullest in this model. Code quality may suffer as a result.
      • Decreases Test Career Opportunity - This model offers the worst career opportunity for testers. It is virtually impossible to maintain a tester-only career identity here, so testers might miss out on many of the expert test skills that arise from a career of insight and experience.

So which model should you choose?

It really depends on what you need! Choose Model 1 if you want the greatest objectivity in test and the best career growth for testers. Models 2 and 3 provide the easiest path for test to gain influence early in development via enhanced communication. Model 4 gives the organization the most flexibility but tends to lead to subpar results (not recommended).


Is organization enough?

Organization being what it is can have an influence, but individuals with strong personalities will make the biggest difference. Planting highly effective testers will yield the most success. These test leaders act as a gateway to the development organizations by using their hard earned credibility. They can inform testers what sorts of questions are relevant to ask, the best approach to inform development of issues, and follow up strategies that work best.

Implementing one of the models above (preferably not Model 4) with a group of strong leaders will yield respectable results in most situations. It is important to note that changing your test organization's layout can have a large impact on delivered quality. Keep in mind that you should identify current problems and select a model that addresses those problems, rather than changing for change's sake.

Chapter Summary

Best Practices:

  • Developers should not be the only testers of their own code
  • Testers need to have or build technical credibility with developers
  • Maximize your organizational model choice by supplementing it with experienced key test leadership personnel.

Chapter 5: Where to Start? Snooping for Information


  • The importance of knowing what you test
  • Test preparation approaches
  • Where and how to find information
  • Obtaining knowledge about customers
  • Finding your first defects

It is OK to not know something; it is not OK to test something you do not know. Preparation for a test is as important as the test itself. Some defects may be uncovered simply by preparing to do a test.

Proper test research when planning will contribute depth to your product breadth (boosting technical credibility).

The Importance of Knowing What to Test

Testing knowledge must be on par with your product knowledge.

Why you Must "Learn the Game"

Understanding domain specifics is not necessary to locate crasher defects, but it is highly likely to be needed to find subtle bugs. These bugs do not manifest as an obvious problem; instead, the software quietly does something it shouldn't.

Testers need to understand the requirements from a customer standpoint. They should even go as far as to know the targeted users and act as their advocate.

One catch with industrial strength software is that even if it is entirely new technology, a tester needs to thoroughly understand it before beginning test!

Resources to Tap

The following items are crucial to understanding a test.

  • Requirements Document - the requirements gathered from potential or actual customers, by savvy development teams, that identify expected product behavior and function. This document concerns itself with what the software should do.
  • Specifications - Requirements documents cannot be implemented as is, they are first converted into a set of design blueprints and specifications (aka "spec"). This document is concerned with "how" the software is to be constructed.
    • Flow and Function - the overall flow and function of the software can be gleaned from the specifications. Start by getting a high-level overview of what is being developed before going back in more detail.
    • What's under the hood? - When planning your actual test cases, reference the specification for anything pertinent to what you are about to test. Look for "pressure points", which are the areas of the software that have the highest demands placed upon them. Additionally, make note of documented recovery functionality and design test cases to stress that recovery model. If the software contains new function, determine which component has changed most significantly and emphasize testing on that.
    • Dependencies - learn to determine the hard and soft dependencies of the software you are testing. Hard dependencies are absolute requirements necessary for software execution. Soft dependencies are optional components that make the software work a certain way.
    • Exploiters - Exploiters consume data produced by the software, or use the software you are testing to perform their own function. Sometimes there may not be any real exploiters yet. If there are, use the dependent software to help you test! Testing APIs or services should not be done in a vacuum; you should strive to understand real exploiters.
    • Just the first pass - when examining the spec for the first time, read at a high level. As you gain understanding of the product, repeated passes for more specific reading and understanding may be needed.
  • Book Drafts - Examine any early drafts of product manuals, release notes, help panels or other external documentation. External documentation might paint a different picture of software than the internal documentation.
  • Know your information developers - Even if there is no early external documentation for you, get to know the persons responsible for the external documentation early in the development process. Testers should certainly influence the documentation as they likely have the most hands on experience with the software. This influence can range from offering advice on content to providing specific examples for the documentation.
  • Learn from the past:
    • Postmortems - a postmortem is a meeting held to capture lessons learned from a previous iteration of software test.
    • Strengths and Weaknesses - which approaches were most successful? Which scenarios uncovered the most defects?
    • Teaming - what types of interactions helped the test team meet its objectives? Which interactions were superfluous or a hindrance? Learn from the organization and makeup of the team. Are there models to be avoided? Were there specific organizational items identified for future improvement?
    • Tools and Approaches - if an approach or tool was working well, re-use it, if it needs enhancement or replacement, do so before your test cycle begins.
    • Who has tested similar code? - find out who else has tested this software in previous iterations or releases. In addition you may locate people who test similar code. Attending their postmortems may help gain insight for your testing.
  • Meet with the experts - when preparing for a test, go to the people who know the software, or the testing process the best.
    • Designers and Architects - in some organizations there are architects designing software who are not themselves developers of the code. These architects often have intimate knowledge of the requirements, and of customer needs and expectations. Sometimes architects will be able to show how a software product fits into a larger strategy, which may influence your test.
    • Developers - developers often like to talk about their code. You should make an effort to tell them what you understand about the functionality (as a sign of credibility). Many developers will gladly point out weaknesses in their code to help with test preparation. If there is no formal documentation or specification ready when you meet with the developers, you may opt to schedule a "chalk talk", a sit down exchange. Questions to ask include:
      • What areas of code are you most worried about?
      • What areas were the most difficult to code?
      • What areas are you most interested in having the test cover?
      • What is the best thing about the function?
    • Service Professionals - the people who service defects or customer calls about a product make excellent input into a test strategy. Combinations of events or product environments that caused headaches may make good test design points.
    • Other Testers - even in preparation, you should work with other testers. Meet the other team members, and do an information exchange about what will be tested. If you can meet with testers performing other phases of test, go into detail about their test plans and coverage expectations. Learn from the phases before yours regarding where to expect problems (and possibly emphasize them in your testing). Pay special attention to avoid duplicated effort.
  • Participate in Code Reviews - Even though the purpose of code review is to eliminate defects rather than educate, it can be useful to sit in on code reviews. You may overhear discussion that can influence your testing, and if you can keep up with the code being covered in the review, you will gain an intimate technical understanding. Do not bog down developers with questions about how the code works, or you will likely not be invited again.
  • Compare Notes - take the notes from each of the phases above (obtained in an individual setting) and compare them to an actual face to face meeting of the group of test leads, development, and architects.
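The hard versus soft dependency distinction noted under Specifications above can be made concrete with a small probe. This is a sketch under assumptions: the function name is invented, `json` merely stands in for a hard dependency, and `hypothetical_fast_json_accelerator` is a made-up module name standing in for an optional one.

```python
import importlib.util


def check_dependencies(hard, soft):
    """Classify module names as present or missing.

    hard -- modules the software cannot run without
    soft -- optional modules that only change how it behaves
    """
    def present(name):
        # find_spec returns None when a top-level module cannot be found.
        return importlib.util.find_spec(name) is not None

    missing_hard = [m for m in hard if not present(m)]
    missing_soft = [m for m in soft if not present(m)]
    return {
        "can_run": not missing_hard,
        "missing_hard": missing_hard,
        "missing_soft": missing_soft,
    }


report = check_dependencies(
    hard=["json"],                                  # standard library, always there
    soft=["hypothetical_fast_json_accelerator"],    # made-up optional module
)
print(report)
```

A missing hard dependency means the test cannot start at all, while each missing soft dependency suggests a distinct configuration worth testing in its own right.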

Viewing All Software as a Solution to a Problem


Technology in industry is not implemented for the sake of it.

Researching targeted problems in detail is just one task of preparing a test. When the problem is well understood, you can think of new ways to experience the problem and demonstrate it.

Customer Reconnaissance & Where to Snoop

Being a good customer advocate is critical to performing good testing. But how does a tester become a good customer advocate?

  • Seek out customer interaction activities - let customer service know you are looking for opportunities to engage the customer base.
  • Read existing problem reports in your issue tracking system. Make notes of trends or common issues on a per customer basis. Compare these notes across customers to identify cross customer trends.
  • Get involved with beta tests - volunteer to educate beta test customers, or help support them in their beta testing. Extend an offer to have a customer present to the test team (speaking about the production environment, daily struggles, etc.).
  • Go to trade shows, group events, or user conferences. These situations often have time allotted to informal socialization. In addition, there will often be presentations rich with material for better understanding customer use cases.
  • Learn about the customer's business by surfing the web and examining what their core business is. You should certainly know what major industry they are active in, but that might not be all you need to know. Your customer could be an internal division of your own company, an external corporation, or anything in between.
  • Apply knowledge of customers - make sure that less experienced persons on the team learn from mentors who know about the core business of your customers. Act to ensure environments and setup model real customers as closely as reasonable.
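The problem report review described in the list above (per customer notes, then a cross customer comparison) could be sketched as follows. The customer names, issue categories, and the two-customer threshold are all invented for illustration; a real issue tracking system would supply the data.

```python
from collections import Counter, defaultdict

# Hypothetical problem reports: (customer, issue category).
reports = [
    ("Acme", "install failure"),
    ("Acme", "slow recovery"),
    ("Acme", "install failure"),
    ("Globex", "slow recovery"),
    ("Globex", "confusing message"),
    ("Initech", "slow recovery"),
]


def per_customer_trends(reports):
    """Count how often each issue category appears for each customer."""
    trends = defaultdict(Counter)
    for customer, category in reports:
        trends[customer][category] += 1
    return trends


def cross_customer_trends(reports, min_customers=2):
    """Return categories reported by at least `min_customers` distinct customers."""
    customers_by_category = defaultdict(set)
    for customer, category in reports:
        customers_by_category[category].add(customer)
    return sorted(
        category
        for category, customers in customers_by_category.items()
        if len(customers) >= min_customers
    )


print(dict(per_customer_trends(reports)["Acme"]))
print(cross_customer_trends(reports))
```

In this made-up data, "install failure" is a repeat problem for one customer only, while "slow recovery" shows up across three customers and is the stronger candidate for a test design point.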

A Simple Test Preparation Tool

After you have collected your data, you must organize it and begin the deep thinking phase. No test tool will do the deep thinking for you. A simple tool like a checklist can however help you track progress through your investigative process.

The checklist can be used as a training aid for new testers, or can be performed individually and later reviewed in cooperation.
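A checklist of this kind needs very little machinery. The sketch below is an assumption about what such a tool might look like: the class name and the activities are illustrative, loosely drawn from the resources listed earlier in the chapter, and are not the contents of the book's figures.

```python
from dataclasses import dataclass, field


@dataclass
class PrepChecklist:
    """Tracks which test-preparation activities have been completed."""
    items: dict = field(default_factory=dict)  # activity -> done?

    def add(self, activity):
        # New activities start out as not done.
        self.items.setdefault(activity, False)

    def complete(self, activity):
        if activity not in self.items:
            raise KeyError(f"unknown activity: {activity}")
        self.items[activity] = True

    def remaining(self):
        return [a for a, done in self.items.items() if not done]

    def progress(self):
        # Fraction of activities completed; 0.0 for an empty checklist.
        if not self.items:
            return 0.0
        return sum(self.items.values()) / len(self.items)


checklist = PrepChecklist()
for activity in (
    "Read requirements document",
    "Read specification (first pass)",
    "Review last postmortem",
    "Chalk talk with developers",
):
    checklist.add(activity)

checklist.complete("Read requirements document")
checklist.complete("Review last postmortem")

print(checklist.progress())
print(checklist.remaining())
```

The tool only tracks progress. Deciding what each completed activity means for the test plan remains the tester's job.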

[Figure 5.1 and Figure 5.2 on page 91]

Don't just take, give a little

After performing the investigation for your test plan preparation you should provide feedback about the software.

  • Inject insight into a review - Make another pass through the specification and ensure that the design includes everything you now know is required.
  • Protect your customers - ensure what they want is what they will get.
  • Make your own job easier - When suggesting changes in software for the customer's benefit, it is also OK to suggest things that benefit the test team. Often after a first test iteration, little things that can greatly ease future testing cycles become obvious.

Chapter Summary

Preparing for a test is as important as executing it. A thorough understanding of the software and its intended use cases is vital. Use all resources available to you as input to your test planning. Learn from previous test iterations and from other test teams performing earlier phases, and meet with experts and documentation specialists. Consider keeping a checklist or some other document to track your investigative progress. Zooming in on a detailed function may be sufficient if the test scope covers only that function, but often a larger view is needed for complex software. In fact, the complexity of industrial strength software is often too much for one person to grasp.

Chapter 6: Coping with Complexity Through Teaming


  • Complexity of large system software
  • Reducing complexity through teamwork
  • Leveraging everyone's expertise to solve the complexity issue

Software development is a team sport. Complex software requires coordination of that team. Here we will show you some strategies for doing that.

Software complexity definitions over time:

  • Complexity IEEE 1983 - "The degree of complication of a system or system component, determined by such factors as the number and intricacy of interfaces, the number and intricacy of conditional branches, the degree of nesting, the types of data structures, and other system characteristics."
  • Complexity IEEE 1990 - "The degree to which a system or component has a design or implementation that is difficult to understand and verify."

The redefinition of complexity within such a short interval shows that defining software complexity is itself complex.
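Two of the factors named in the 1983 definition, the number of conditional branches and the degree of nesting, are easy to count mechanically. The sketch below uses Python's standard `ast` module on a made-up sample function. It is a crude proxy chosen for illustration, not a measurement method prescribed by either definition, and the choice of which node types count as branches is an assumption.

```python
import ast

# Node types treated as branching constructs (an illustrative choice).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.IfExp)


def branch_count(source):
    """Count branching constructs in a piece of Python source."""
    tree = ast.parse(source)
    return sum(isinstance(node, BRANCH_NODES) for node in ast.walk(tree))


def max_nesting(source):
    """Return the deepest nesting level of branching constructs."""
    def depth(node, current):
        if isinstance(node, BRANCH_NODES):
            current += 1
        deepest = current
        for child in ast.iter_child_nodes(node):
            deepest = max(deepest, depth(child, current))
        return deepest

    return depth(ast.parse(source), 0)


# Hypothetical function to measure.
sample = '''
def first_negative(rows):
    if not rows:
        return None
    for row in rows:
        for value in row:
            if value < 0:
                return value
    return None
'''

print(branch_count(sample))   # two ifs and two loops
print(max_nesting(sample))    # for > for > if
```

Counts like these are simple to produce, but they say little about how hard the code is to understand and verify, which is the property the 1990 definition emphasizes.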

Consider further environmental complexity when software may run on a variety of operating systems on a variety of hardware platforms etc.
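A quick sketch of how fast environmental combinations multiply. The platform names are placeholders, and the third dimension (database backends) is an assumption added only to show the effect of one more variable.

```python
from itertools import product

# Hypothetical support matrix; every name here is a placeholder.
operating_systems = ["OS A", "OS B", "OS C"]
hardware_platforms = ["platform X", "platform Y"]
database_backends = ["backend 1", "backend 2", "backend 3", "backend 4"]

# Every combination is a distinct environment the software might run in.
configurations = list(
    product(operating_systems, hardware_platforms, database_backends)
)

print(len(configurations))   # 3 * 2 * 4 = 24
print(configurations[0])
```

Three short lists already yield 24 environments, and each additional dimension multiplies the total again.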

Case Study: Complexity of an Operating System

The text demonstrates the complexity of the z/OS OS stack, but the same ideas apply to virtually any large scale software platform.

Reducing Complexity Through Component Spies

An effective approach to building a comprehensive test plan in complex environments is for individual testers to take ownership of different components of the software, becoming a "component spy".

A component spy must not concern themselves with just the changed code in a module, but must instead get to know the code base as a whole.

FVT spies might learn detailed module control flow, data structures, recovery processing, module to module communication, and external interfaces.

SVT spies will learn, at minimum, the operational and external aspects of the component. They should also be familiar with messages, external interfaces, commands, and configuration options (though they might also opt to learn some of the things the FVT spy is interested in).

Component Assessment

Each spy will be responsible for generating a component assessment. This might be accomplished via a component template (a questionnaire designed by experts). These assessments cover the component in depth: its history, test coverage, strengths, and weaknesses. These reports will be presented to a conference of peers.

  • Component Basics
    • Technical Description - An overview of the component and what it tries to accomplish
    • Functions - Description of all of the functions (internal and external) in a component.
    • Where does it fit - a discussion of where the component fits into the larger software system and what role it plays. This is about getting the big picture in relation to the specific software being described in this document.
    • Dependencies - Interdependencies with other components and products. Describe the dependencies required by the component, as well as the other components that are dependent on the component. Looking at both directions of dependency gives better perspective.
    • Outage-causing capability - Clear description of the component's potential to cause an outage. An outage is a slowdown or interruption of service availability.
    • Component size and language - The volume of code, which gives a pertinent view of the breadth of functionality. The source language is provided as it can hint at potential complexities (performance sensitivity issues etc).
    • Component Age - When software is altered, it is good to know the age of the component in question. This can help gauge the depth of regression testing.
    • Recent changes - A review of modifications made to the component over the last few releases (including specific release numbers and changes). Provides insight into the component's role over time and into areas which may not have been stressed for a long duration in the field.
    • Planned enhancements - Discussion of new product function leads to developer tester communication. In addition it may help in the design of test plans which will remain relevant in the future.
    • Competition - Commercial, or otherwise, competition. A deep investigation into a competitors product and its functions may demonstrate potential solutions to technical problems.
  • Test Coverage - This section of information all relates to how tests will be performed or are already being performed.
    • Functional Breakdown - Answering the following questions will help identify the testing techniques and methodologies that should be used appropriately.
      • Does the component get adequately tested by normal system operation and natural exploitations?
      • Are there record/playback script workloads to drive various functions of the component?
      • Are there batch workloads, containing many test programs that can be used to exercise the component?
      • Are there manual scenario driven tests that should be executed as part of the component coverage?
      • Are there any ported test cases that can be obtained from a prior test phase?
      • Frequency of Workload Execution - how often are existing scripts and workloads planned to be executed against a given component? Are these scripts background noise for other testing, or actual tests in their own right?
      • Functional Coverage - which structural aspects of software are being covered by the planned tests, scripts and workloads.
      • Strengths and Weaknesses of Test Coverage - a discussion of when the test coverage was last enhanced such that a test can leverage those enhancements or identify gaps.
  • Defect Analysis - An extremely important input to test planning is knowledge of previous defect escapes, including:
    • Defect Rate - How many defects are being uncovered in the field with respect to the size of the component. Has this rate risen or declined over time?
    • The Top Defects - Describe the top defects external customers have found, and elaborate on the following 3 properties:
      • Did the problem cause critical situations or outages?
      • Explanation of why previous testing did not find the problem.
      • What could have been done by test to help uncover the problem originally?
  • Areas of Improvement - By identifying the critical escapes and exploiting the test coverage overview, the tester can create their own test plan. "If you had an open checkbook, what would you do differently?"
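
The Defect Rate item above can be made concrete with a little arithmetic. The sketch below assumes the rate is expressed as field defects per thousand lines of code (KLOC); the notes only say "with respect to the size of the component", so that unit, the function names, and the sample history are illustrative assumptions rather than anything prescribed by the source.

```python
def defect_rate_per_kloc(field_defects: int, lines_of_code: int) -> float:
    """Field defects per thousand lines of code (an assumed unit)."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return field_defects / (lines_of_code / 1000)


def rate_trend(rates: list) -> str:
    """Compare the first and last rate to say whether the rate rose or declined."""
    if len(rates) < 2 or rates[-1] == rates[0]:
        return "steady"
    return "rising" if rates[-1] > rates[0] else "declining"


# Hypothetical history: (release, field defects, component size in lines).
history = [("n-2", 12, 40_000), ("n-1", 15, 42_000), ("n", 21, 45_000)]
rates = [defect_rate_per_kloc(defects, loc) for _, defects, loc in history]

for (release, _, _), rate in zip(history, rates):
    print(f"release {release}: {rate:.2f} defects/KLOC")
print("trend:", rate_trend(rates))
```

Normalizing by size keeps a growing component from looking worse simply because it has more code; a real assessment would likely also look at every release in between, not just the endpoints.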

Team Benefits

The value of establishing component spies and doing all this research is not just to create a good test plan. It is also a means of building educational and training materials valuable to the current and future test teams.

The Power of Spying

  1. Spying enables a team to develop the most comprehensive test plan possible.
  2. With the component expertise, test members can call on one another during execution for diagnosing complex system-wide issues.

Sharing Expertise Across the Team

In addition to expertise in particular components, system testers are usually knowledgeable about system operations across different platforms.

Some testers may be experts at scripting languages and automation techniques.

How does an organization bring those skills together effectively?

Theme-based testing

One way to perform complex testing that leverages existing expertise is theme-based testing. To do this, test organizations often create a strategy team consisting of senior testers and component spies. These strategy teams have the objective of building a complex test environment that can emulate the complexities of real customer environments.

Each expert must educate others and the strategy team. Additional experts may need to be brought in.

When the technical review phase completes, the strategy team decides how to put together discrete new solutions to solve business problems. The purpose is to define release themes that will form the basis of SVT.

Example themes:

  • Security
  • Systems Management
  • Constraint Relief

With themes well defined, component owners and experts fill in the details and testing begins.

Cooperative Problem Diagnosis

Since problem diagnosis is exceptionally challenging in complex environments, it becomes a team effort.

  1. Identify the symptoms and then interpret those symptoms until they point to an initial component to investigate.
  2. With a component identified, an SVT tester can call on experience in the team to hypothesize what is going wrong. Shared knowledge is exceptionally important here.
  3. Testers examine symptoms from the perspective of the component in question to see if anything is awry with that component (or an adjacent component if interface data seems inconsistent).

Chapter Summary

Enormous complexity means testers need to work together and use everyone's skills wisely. It is imperative to build effective and comprehensive test plans. Since the average tester couldn't do this alone, he or she should use component spies and tap senior test leaders.

Chapter 7: Test Plan Focus Areas

  • Who test plans are really for, and how they are structured
  • Unit test focus areas
  • FVT focus areas
  • SVT focus areas
  • IT focus areas
  • Special considerations for multi system testing
  • Test cases versus Scenarios
  • Why every tester should love test plan reviews

So you have done the research and learned a lot about the product, even consulting experts. Now what?

It is time to construct a detailed test plan!

The test plan document

  • Developers use it to satisfy themselves that their code will be tested right
  • Other test teams and testers within a team use it to eliminate overlap
  • It clearly defines entry and exit criteria and handoffs
  • New testers use it as an educational tool
  • Product managers and release managers use it as a tracking tool
  • Auditors use it for competency checks
  • Internal teams may use it to determine the best route of adoption, deployment, and use of the software

Who is it really for?


The Testers!

Purpose is to crystallize plan of attack and scope out work

Tool for measuring progress

But be wary: a test plan is just a plan, and sometimes deviations are necessary!

It makes sense to adapt the plan as you learn more about the product's strengths and weaknesses.

One technique to deal with flexibility and adaptation is the use of "artistic test scenarios".

Artistic Test Scenarios are scenarios guided by the intuition and investigative instincts of the tester, based on experience with the software acquired during initial, more structured testing.

Structure

What does a test plan consist of? Remember it is just a formal packaging of the real work done previously to plan the test.

There are formal specifications such as the "IEEE Standard 829-1998 for Software Test Documentation" which serve as templates. Many companies choose to use their own template, which may be considerably less complex or sometimes more so.

[ Figure 7.1 on page 109 ]

  • Front Matter - everything that is not a scenario or test matrix. Often this consists of boilerplate information that is reused with minor modification from test to test. This information should not be seen as trivial however as it may give unfamiliar reviewers context for your plan.
  • Test Matrix and Scenarios - the actual list of test scenarios each tester will execute. Though the tests to be performed are often stored in an external tracking tool, having them briefly summarized (one line each) in the test plan document can expedite reviews. If you want people to review your work, make it as easy as possible. A matrix format will often show which tests are executed across different hardware or software environments.
  • How much detail - there are 2 schools of thought.
    • The cookbook approach documents each scenario in explicit detail. The advantage is that it captures virtually all relevant knowledge. The disadvantage is that it demands huge amounts of time to create (sometimes a test takes longer to document than to execute). Often this approach is used with highly skilled testers as the plan writers and novice testers as the test executors.
    • The framework approach encourages the tester to jot down just enough to recall what they planned on testing. Capture the essence and let the tester work through the specific steps required. Best used when a broad level of skill permeates the team. The authors recommend this choice in general.
  • Test considerations and variations - considerations are one-liners that identify key areas of concern that should be the test's focus. They are not specific about execution; the intent is to create an outline of work to be done. Variations represent the actual test plan of attack on the function. These variations form the core of the test plan.
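
The matrix format mentioned in the list above can be sketched in a few lines. This is only an illustration of pairing one-line scenario summaries with environments; the scenario names, environment names, and the `build_matrix` and `render` helpers are all hypothetical and do not come from the notes or from any particular tracking tool.

```python
from itertools import product

# Hypothetical one-line scenario summaries and test environments.
scenarios = ["Install on clean system",
             "Upgrade from prior release",
             "Recover after forced restart"]
environments = ["OS A / 2 CPUs", "OS B / 8 CPUs"]


def build_matrix(scenarios, environments, skip=()):
    """Return {scenario: {environment: planned?}} for every pairing not in skip."""
    matrix = {s: {} for s in scenarios}
    for s, e in product(scenarios, environments):
        matrix[s][e] = (s, e) not in skip
    return matrix


def render(matrix, environments):
    """Format the matrix as plain text: one row per scenario, X marks a planned run."""
    width = max(len(s) for s in matrix)
    lines = [" " * width + " | " + " | ".join(environments)]
    for s, row in matrix.items():
        cells = [("X" if row[e] else "-").center(len(e)) for e in environments]
        lines.append(s.ljust(width) + " | " + " | ".join(cells))
    return "\n".join(lines)


matrix = build_matrix(
    scenarios, environments,
    skip={("Upgrade from prior release", "OS B / 8 CPUs")},
)
print(render(matrix, environments))
```

Keeping each scenario to a single line in the matrix follows the notes' advice to make the plan easy to review; the full detail can stay in whatever tracking tool the team uses.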

Unit Test focus areas

Unit test is responsible for exercising every new or changed line of code, taking all branches, driving all loops to conclusion, exercising all object behaviors, and so forth. Since this is often done by setting breakpoints and altering conditions to ensure paths are taken, a devoted test plan doesn't necessarily make sense. Frankly it adds little value.

FVT Test focus areas

Scope is of a complete yet containable functional area. Items to emphasize in a test plan include:

  • Mainline function - does the software do the big things it is supposed to do?
  • Security Support - since security is so important and sometimes misunderstood it is important to distinguish it from mainline function
    • Authentication - confirming a user is who he or she claims to be through the use of a token, biometric, or id/password combination
    • Authorization - limiting an authenticated user's activities to permitted areas only, through an access control mechanism
    • Confidentiality - hiding private information from public view, usually through encryption
    • Integrity - Ensuring that information was not altered during transmission (using MD5 sums for instance)
    • Nonrepudiation - Preventing a sender of a message from later claiming he did not send it, through digital signatures or similar techniques
  • Software Interfaces:
    • Private module to module interfaces - by studying the module interactions, conditions to exercise various internal interfaces can be created. This might cover passing parameters through a stack or shared memory or other complex structures. Though stressing internal interfaces, external stimuli need to be crafted to arrange those situations. If this cannot be accomplished, the FVT tester may resort to artificially rigging conditions to force the desired internal interface exploitation.
    • Private component to component interfaces - programmatic interfaces between components, or more complicated interfaces such as TCP/IP sockets or RPC. If all external interfaces are not ready at the same time as your FVT test, you may need to erect scaffolding to continue your side of the test. Note that scaffolding is only ok for preliminary testing, and tests should not be marked complete until real implementations make up both sides of the tested interface.
    • Application Programming Interfaces (APIs) - Note that in some cases complete API coverage is unlikely to be practical to test. Also note that when testing APIs one should remember to include scenarios that invoke the API while other interesting environmental conditions are met (memory shortages, I/O bottlenecks, or an error recovery situation).
  • Human Interfaces:
    • Graphical User Interfaces (GUIs) - You must obtain or create a map of all possible paths through the GUI. Make sure to use valid and invalid inputs in your testing. Often this kind of testing can be time consuming. Automation can help, but there are some gotchas, such as needing to re-record everything when a portion of the GUI changes. Effective automated GUI testing often requires extensive up-front development time (possibly more than will be recouped by the successive automation runs).
    • Line mode commands - conceptually easy to test command line execution but can be time consuming to implement. There are often explosive numbers of options and combinations that need to be tested.
    • Messages - messages can indicate both success and failure. Defining scenarios to force out every possible message can be challenging, especially for some obscure failure messages. Once more, if external stimuli cannot be crafted to do this, or cannot be crafted practically, conditions may need to be rigged. This should be the exception, not the rule.
    • Error Codes and Memory Dumps - in addition to error messages, some sophisticated software will provide error codes and memory dumps for expert debugging in response to nasty situations. In addition there is often an associated recovery routine that cleans up the mess and allows the component to continue.
    • Logs and Traces - often overlooked even by experienced testers. Forcing all possible logs at all possible log levels and ensuring correctness can be daunting.
  • Limits - also known as boundary condition testing, this means testing a piece of code to its defined limits, then exceeding those limits. "This one goes to 11". Testing every possible value often makes no sense, but testing at and near the boundaries is usually a vast and acceptable simplification.
  • Recovery - Can the software restart after failure? Does it self-heal? Can it recover successfully from error conditions it anticipates? What about those it doesn't anticipate? Does it provide adequate diagnostic data? Special emphasis should be placed on this often overlooked testing aspect, since it is so critical that recovery works properly!
  • Internationalization - sometimes thought of as simple translation testing, this testing actually needs to do much more. Keyboard inputs, sort orders, dates, menus, and dialogs may all be different under a different internationalization setting. In addition it is imperative that the accuracy of the actual translations be tested by fluent speakers of the foreign language.
  • Accessibility - Determining how to test software for users with various types of disabilities can be very challenging for those without an extensive background in accessibility. There are often government guidelines for what should be done in various circumstances. The same checklists used for implementation can be used to guide test plan scenarios.
    • Color
    • Contrast
    • Size
    • Mouse alternatives for all navigation
    • Keyboard alternatives for all navigation
    • Labels on every element for screen readers
    • Alternatives for any distinct sounds or audio feedback
    • Accessible documentation to match the software
    • and others!
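
The Limits item in the list above (test at, near, and just beyond the boundaries rather than every possible value) can be sketched as follows. The `boundary_values` helper and the `accepts_volume` function under test are hypothetical; the 0 through 10 range is simply a nod to the "goes to 11" remark.

```python
def boundary_values(low: int, high: int) -> list:
    """Values just below, at, and just above each end of an inclusive integer range."""
    if low > high:
        raise ValueError("low must not exceed high")
    candidates = [low - 1, low, low + 1, high - 1, high, high + 1]
    return sorted(set(candidates))  # set() drops duplicates for very narrow ranges


def accepts_volume(level: int) -> bool:
    """Hypothetical function under test: a volume setting documented as 0 through 10."""
    return 0 <= level <= 10


# Exercise the function only at and around its documented limits.
results = {value: accepts_volume(value) for value in boundary_values(0, 10)}
for value, accepted in results.items():
    print(f"volume {value:>2}: {'accepted' if accepted else 'rejected'}")
```

Six probes stand in for the full input space here. For a function with several bounded inputs the same helper could be applied per input, though the number of combinations then grows quickly.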

SVT Test focus areas

Since SVT is where the whole package comes together for the first time, testing needs to move beyond the lower-level, granular FVT types of testing and into a more global view of the system.

  • Installation - Usually done across a variety of hardware configurations and/or components. Uninstall scenarios should be covered as well. Upgrade testing also resides in this category.
  • Regression - Do things still work as they did before? Some new software additions may stimulate latent defects.
    • Evolving Roles - Automated collections of tests run during regression are sometimes called a test bucket or regression suite. Regression should often be done daily or at least once for every new code drop.
      • Use 1: Though these previous tests cannot explicitly test new function, they may implicitly test updates to existing function. These tests can be used to determine if a product has met baseline stability/functionality.
      • Use 2: Using a regression suite as background noise for other testing execution. As other new tests pass, add them to the regression suite.
    • What if it is the first time? If there is no existing regression suite to use or build upon, then the team should immediately begin the construction of one.
  • Migration/Coexistence - The intent of migration testing is to ensure that a customer can transition smoothly from a prior release (ideally all supported previous releases) of the software to a new one. Current release is often annotated as release "n" and the prior release as "n-1". At minimum testing of n-1 to n must be done.
  • Load/Stress - 2 Dimensions, Deep and Wide
    • Deep - Throughput related targets. Emphasis on timing and serialization issues (race conditions). Stress can lengthen timing windows to expose latent bugs not seen when operation occurs under normal load. Sample metrics of interest follow. Note these metrics should probably be based on percentages of hardware utilization, as they remain constant over time even as hardware gets faster.
      • CPU Utilization
      • I/O path Utilization
      • I/O interrupts/sec
      • Transactions/Second
      • Number of simultaneously active processes
      • Paging or Swapping rate
    • What kind of workload? Something realistic, as tight loops alone don't paint the whole picture. Workloads must simulate real world extreme conditions. Separate workloads may need to be run serially, concurrently, or both.
    • Coping with Stability Problems - stability often isn't good enough to perform complex testing such as this until the end of SVT. In order of preference, you can opt to simply omit test cases causing stability issues, or incrementally reduce the load the workloads are placing on the system until you find the right level of stability.
    • Wide - Maximizing volumes and resource allocations. Think of it as system wide limits testing. Examples include:
      • Application: Number of files that can be opened concurrently, or the size of the individual file
      • Database: count the number of distinct db tables that can be joined together, the size of each table, or the number of simultaneous users that can issue queries or updates
      • Online retail store: ratio of browsers to buyers, the number of shopping carts that can be opened at once, the aggregate contents of individual carts
      • File System: the sum of individual disks that can back a single instance.
      • OS: amount of real memory it can manage, simultaneous users it can handle, CPUs it can exploit
  • Mainline Function - Targets new and changed functionality. Scope is end to end. Exhaustively exercise the entire software package's supported tasks from an end-user perspective.
  • Hardware Interaction - Special type of mainline function. Worthwhile to document separately since hardware can often be an expensive part of system test.
    • If software has explicit support for a new piece of hardware then it must be tested
    • Does the software have any implicit dependencies or assumptions about the hardware that it will run on or interact with? Implicit examples include:
      • Timing loops based on processor clock speed
      • Network Bandwidth or latency dependencies
      • Memory availability
      • Multitasking on a uniprocessor vs multiprocessor machine
      • I/O latency
      • Reliance on quirks in the implementation of an architecture or protocol
  • Recovery - Recovery testing is similar to that which is done in FVT but with a broader scope. Does the software restart cleanly after a system wide failure instead of just a software crash? Clustered systems and environmental factors come into play.
  • Serviceability - Features such as logs, traces, and memory dumps to help debug errors when they arise. Thoroughness counts!
    • First Failure Data Capture (FFDC) - At the time of initial failure, serviceability features are able to capture enough diagnostic data to allow a problem to be debugged.
    • Often this testing is accomplished via "test by use": serviceability features are used during the course of testing to debug problems seen in SVT.
    • Pay special attention to the fact that lightly loaded FVT or SVT may not produce nearly the same level of tracing or diagnostic data as a loaded production customer environment.
  • Security - Things like denial of service (DoS) attacks come into play here. FVT should already have ensured unauthorized access cannot be gained, but the SVT tester needs to ensure that the software has been verified to withstand onslaughts without crashing. DoS testing is also considered specialized load testing.
  • Data Integrity - Ensuring data integrity is maintained at all times, regardless of external events. Most software assumes the data it uses is safe. This means corruption gets noticed long after damage is done.
  • Usability - often usability is not designed into an application by professional interaction designers. So the test teams are left to determine how usable software is.
    • Explicit Testing - Allow naive users to interact with the software and note when they get stuck. Note false paths. Can they understand and react to messages as intended?
    • Implicit Testing - when testing in SVT the testers will become familiar with the designed workflows and interfaces. Sources of frustration or confusion for testers are likely to be the same as those seen in customer environments.
  • Reliability - longevity or long-haul testing. Focus is to see if the software can continue running under significant load/stress for extended periods of time. Usually looking for resource leaks or fragmentation, and performance degradations over time. Testing is on the scale of days or weeks. This testing is usually done last in the test cycle during maximum achieved stability.
  • Performance - Though a dedicated performance test should look for explicit performance defects, the SVT team should watch for obvious performance issues. No scientific measurements should be needed for this type of observational testing. Often "wall clock performance" is used as a metric.
  • Artistic Testing - freeform activity based on the tester's intuitions and insights into what might break. Scenarios are actually filled in during the test cycle as they become more thoroughly defined. They are documented with placeholders up front to allow you to plan time and resources for their execution.
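
The "Coping with Stability Problems" item in the Load/Stress part of the list above mentions incrementally reducing load until the system is stable enough to test. A minimal sketch of that second option follows. The `find_stable_load` helper, its percentage-based parameters, and the `fake_workload` stand-in are assumptions made for illustration; a real workload driver would replace the stand-in.

```python
from typing import Callable, Optional


def find_stable_load(run_workload: Callable[[int], bool],
                     start_pct: int = 100,
                     step_pct: int = 10,
                     floor_pct: int = 10) -> Optional[int]:
    """Lower the load step by step until the workload runs without a stability failure.

    Returns the highest load percentage tried that ran cleanly, or None if
    even the floor level was unstable.
    """
    load = start_pct
    while load >= floor_pct:
        if run_workload(load):
            return load
        load -= step_pct
    return None


def fake_workload(load_pct: int) -> bool:
    """Stand-in for a real workload driver: pretend the build under test
    falls over whenever it is driven above 70% of the target load."""
    return load_pct <= 70


stable = find_stable_load(fake_workload)
print(f"highest stable load: {stable}% of target")
```

Expressing load as a percentage of the target follows the notes' suggestion to base stress metrics on percentages so they stay meaningful as hardware changes.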

Integration Test focus areas

In SVT we looked at a given software product as a single cohesive entity for the first time. Integration test expands this scope such that multiple software products are tested together and treated as a complex system. The main theme here is that IT focuses on multiple pieces of software operating together on complex hardware configurations, pursuing different goals in parallel.

Names are the same, but the focus is a bit different:

  • Regression testing in IT ensures the updated product can still interact correctly with older products. Also known as compatibility testing.
  • Migration testing may attempt to migrate the full environment to multiple new or updated products in a recommended sequence.
  • New function is exercised against a richer, more complex environment with more intertwined interactions through which to navigate.
  • Not just one, but multiple products must compete for system resources under heavy load and stress.
  • Recovery testing is no longer about the failure of one software product, but rather about the whole stack this piece of software depends on. Since software in this phase of test should be relatively stable, extremely complex test cases involving failures may be attempted.

Single System VS Multisystem testing

Some software is designed to run stand alone and still meet downtime targets and service level agreements (SLAs). Other software is designed to be deployed into multisystem environments: clusters, federated databases, and so on. Multisystem testing is a challenge for a number of reasons:

  • Sympathy Sickness - When one node in a cluster is not operating correctly, does it cause the rest of the cluster to also get sick? This testing is all about the interactions when a cluster is not operating properly. Testing is generally done by inflicting damage on one node to observe the overall cluster's behavior.
  • Resource Contention - Even if no system is sick, problems can arise when all nodes are vigorously vying for the same resources. Locks and synchronous messages are often behind the defects associated with resource contention tests. Does a dynamic update cause a spike in resource consumption or increased contention?
  • Storm Drains - Clustered systems often have a workload distributor or sprayer in front of them to distribute/route work. Round robin or, more likely, a capacity based distribution system is in place. If a system in the cluster enters an error condition where it continues to receive work but flushes it rather than completing it, extra work may be distributed to the failing system, since flushing is quicker than processing! Eventually most new work will go to the ailing system and be flushed away.
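
The storm drain effect described in the last item above can be reproduced with a toy simulation. The model here is an assumption made for illustration, not something the notes specify: the distributor sends each request to the node with the fewest requests in flight, healthy nodes hold a request for a fixed number of ticks, and the sick node discards requests at once and therefore always looks idle. The function name and all parameters are hypothetical.

```python
from typing import List, Optional


def simulate_storm_drain(sick_node: Optional[int] = 3, nodes: int = 4,
                         ticks: int = 200, arrivals_per_tick: int = 3,
                         service_ticks: int = 5) -> List[int]:
    """Count how many requests each node receives under a least-busy distributor.

    Pass sick_node=None to model a fully healthy cluster for comparison.
    """
    in_flight = [[] for _ in range(nodes)]  # remaining ticks per outstanding request
    received = [0] * nodes
    for _ in range(ticks):
        for _ in range(arrivals_per_tick):
            # Capacity-based routing: pick the node with the least work in flight
            # (ties go to the lowest-numbered node).
            target = min(range(nodes), key=lambda n: len(in_flight[n]))
            received[target] += 1
            if target != sick_node:
                in_flight[target].append(service_ticks)
            # The sick node flushes the request immediately, so nothing stays in flight.
        # Advance time: drop requests that finish this tick, age the rest.
        for n in range(nodes):
            in_flight[n] = [t - 1 for t in in_flight[n] if t > 1]
    return received


print("one node flushing:", simulate_storm_drain(sick_node=3))
print("all nodes healthy:", simulate_storm_drain(sick_node=None))
```

With these assumed parameters the flushing node ends up receiving 480 of the 600 requests, which is the "most new work goes to the ailing system" behavior the notes describe. A test for this defect would look for exactly that skew in the distributor's routing counts.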

Test Cases Versus Scenarios

Sometimes the terms test case and user scenario are used interchangeably, but they mean different things.

IEEE Test case: "A set of test inputs, execution conditions, and expected results developed for a particular objective, such as to exercise a particular program path or to verify compliance with a specific requirement."

IEEE Test Procedure: "Documentation specifying a sequence of actions for the execution of a test."

This definition of test case maps to our earlier definition of test variation. In practice the term "test case" is often used to describe a test that is embodied within a test program. When the test requires a sequence of actions to be performed, it is usually called a scenario.

Test Case: A software program, that when executed, will exercise one or more facets of the software under test, and then self-verify its actual results against what is expected. Think of this as software driven stuff.

Scenario: A series of discrete events, performed in a particular order, designed to generate a specific result. Think of it as a customer like activity.
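
The two working definitions above can be contrasted in code. This is a rough sketch only: the `Cart` class is a toy stand-in for software under test, and `sort_test_case` and `run_scenario` are hypothetical names chosen for illustration.

```python
def sort_test_case() -> bool:
    """A test case in the sense above: exercise one facet, then self-verify
    the actual result against the expected one."""
    actual = sorted([3, 1, 2])  # stand-in for a call into the software under test
    expected = [1, 2, 3]
    return actual == expected


class Cart:
    """Toy stand-in for a product under test (hypothetical)."""

    def __init__(self):
        self.items = []
        self.checked_out = False

    def add(self, item):
        if self.checked_out:
            raise RuntimeError("cart already checked out")
        self.items.append(item)

    def checkout(self):
        if not self.items:
            raise RuntimeError("cannot check out an empty cart")
        self.checked_out = True
        return len(self.items)


def run_scenario(cart, steps):
    """A scenario: discrete events performed in a particular order.
    Returns the result of the final step."""
    result = None
    for action, *args in steps:
        result = getattr(cart, action)(*args)
    return result


# A customer-like activity: the order of the steps is part of the scenario.
scenario = [("add", "book"), ("add", "pen"), ("checkout",)]
final = run_scenario(Cart(), scenario)
print("test case passed:", sort_test_case())
print("items purchased in scenario:", final)
```

The test case verifies itself and could run unattended in a regression suite, while the scenario's meaning depends on step order: running the same steps in reverse would fail at the first checkout.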

Test Plan Reviews

Test plans need to be reviewed by multiple interested parties. Take criticism well and incorporate feedback into your plans. You should love test plan reviews and those who offer critical suggestions for improvement.

Reviews can be conducted by sending the document out for review with an RFC by a particular date, or by calling a planned meeting with the stakeholders.

If you send a test plan out for review and do not hear anything back, DO NOT ASSUME IT WAS PERFECT, it is far more likely that NO ONE READ IT!

When possible do reviews during scheduled time in a meeting. It almost always ensures a better review since it forces everyone to review the document, and verbal interchange sparks remarks among the reviewers.

  • Internal Reviews - The review is confined to the tester's own test team. Experienced testers lend wisdom and insight, while novice testers give a fresh perspective that may lead to interesting scenario creation.
  • External Reviews - After internal review is complete, other external groups are invited to review and comment on the test plan. Designers, Developers, and other test teams should be involved. Ideally a customer who will be consuming this software should also be involved. Information developers will also provide interesting insight and should be invited to participate.

Chapter Summary

Test plans serve many important functions. Once created, a test plan is a roadmap and measurement tool for the following test cycle. It is a vehicle for test methodology review. It should be seen as a tool to help testers, not a competition over who can produce the most documentation.



Chapter 8: Testing for Recoverability


  • Attacking a program's recovery capabilities during FVT
  • Focusing on the entire product view in SVT
  • Expanding the scope of IT
  • An example of clustered server recoverability testing

Testing a program's recovery actions makes a lot of sense early on, since early software is likely to be unstable and riddled with bugs. Having a working recoverability mechanism means less time between tests when critical failures occur.

FVT

  • Special Tools and Techniques - Some external methods may be used to trigger failure, such as filling logs or killing a process. Other times FVT needs to do things like simulate a bad parameter being passed to a function, or force an interrupt to occur just when a module reaches a critical processing point.
    • Stub routines - Just like unit testers do. Write stubs that replay what they get, altering a single parameter to be invalid at each iteration. You can also alter a module to pass back bad data to the next. Only works when the module being tested is called infrequently, and ideally only when called by the module under test.
    • Zapping tools - some software systems allow you to find the memory address of a particular piece of code running on a system. Clearly one can construct tools to write values at that address on the fly. Some systems even have this capability architected into the software intentionally. Dy