The software Hara-kiri

The Japanese subsidiary of a global system integrator is required, for political reasons, to outsource part of its projects to the subsidiary in a ‘friendly’ country. This ‘global’ delivery model has its drawbacks, however: the pieces of code don’t necessarily integrate as expected.

After burning their hands a couple of times and re-coding, a pattern became noticeable. Since stopping the outsourcing was not an option, they mandated delivery of unit test cases along with the results, hoping to improve the quality of the delivery. A whole lot of test cases arrived, and the code passed them all. This is where we entered the scene.

When this large system integrator, holding the third largest market share in Japan, faced quality issues in the code delivered by their outsourcing partner, they asked for STAG’s involvement. We decided the best way forward was to assess the vast set of unit test cases and reports that came along with the delivery. The assessment was done by comparing the available artifacts (test cases, Data Definition Language (DDL) scripts, screen transitions and bean specifications) against those defined per HyBIST. The assets were assessed to understand:

  1. Quality of the test cases
  2. Test Completeness
  3. Test Coverage
  4. Comparison with Ideal Unit Testing

For good unit testing, the unit should be validated from both an external view (using black-box testing techniques) and an internal/structural view (using white-box testing techniques). In this case, all the test cases provided were designed using black-box techniques against the specification, with no reference to the code structure.
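To make the distinction concrete, here is a minimal sketch (our own illustration, not code from the engagement): a hypothetical DiscountCalculator is first tested from the specification alone (black-box), and further cases are then added after reading the code, to exercise the guard branch and the exact boundary (white-box).

```java
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.*;

// Hypothetical unit under test, used only to illustrate the two views.
class DiscountCalculator {
    // Spec: 10% discount for orders of 10,000 or more, otherwise no discount.
    double discountFor(double orderValue) {
        if (orderValue < 0) {
            throw new IllegalArgumentException("order value cannot be negative");
        }
        return orderValue >= 10_000 ? orderValue * 0.10 : 0.0;
    }
}

class DiscountCalculatorTest {
    private final DiscountCalculator calc = new DiscountCalculator();

    // Black-box: derived purely from the specification (typical values).
    @Test
    void noDiscountBelowThreshold() {
        assertEquals(0.0, calc.discountFor(5_000), 1e-9);
    }

    @Test
    void tenPercentAtThreshold() {
        assertEquals(1_000.0, calc.discountFor(10_000), 1e-9);
    }

    // White-box: added after reading the code, to cover the guard branch
    // and the exact boundary of the >= comparison.
    @Test
    void negativeValueIsRejected() {
        assertThrows(IllegalArgumentException.class, () -> calc.discountFor(-1));
    }

    @Test
    void justBelowBoundaryGetsZero() {
        assertEquals(0.0, calc.discountFor(9_999.99), 1e-9);
    }
}
```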

The results astounded the client. Apart from issues like poor test data, incomplete steps and insufficient negative tests, the tests were found to have been designed using only black-box techniques, i.e. the structural aspects were not evaluated at all. The findings were used to confront the partner and renegotiate all future engagement contracts and deliverables.

Back to the future >> preparing for an avalanche

When a bank implements major solutions, you need to watch like a hawk. The smallest glitch can set off an avalanche. When we were asked to validate the performance of an integrated financing solution for a leading commercial bank, we assumed it was like any other project. This wasn’t the case. The challenge thrown at us was to ensure the system would be future-proof for three years! From our experience, we knew scripting and simulating a large user load was the easier part. Banks run on data and documentation, and this product caters to the agri-commodity business of the bank. We foresaw an avalanche of data thundering down!

The product enables financing for farmers against the commodities they have produced: the bank offers loans against commodities stored in warehouses. With a focus on commodity finance, the solution encompasses the various modules of commercial operations, right from sourcing of the account, through operations, monitoring and control, recovery management and audit, to closure through repayment. Every process initiated has to go through an approval workflow, and most processes have separate initiation and approval stages before completion!

Based on this understanding, and after initial discussions with the bank, a detailed operational profile was derived. Over 40 scenarios were identified for the test, with a concurrency of 600 users.
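To make the notion of an operational profile concrete, the sketch below (scenario names and weights are invented for illustration, not the bank’s actual profile) shows how a 600-user concurrency target is apportioned across scenarios in proportion to their expected share of real usage.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch: distribute a 600-user concurrency target across scenarios
// according to operational-profile weights. Scenario names and weights are
// illustrative only, not the bank's actual profile.
public class OperationalProfile {
    public static void main(String[] args) {
        int totalConcurrency = 600;

        Map<String, Double> weights = new LinkedHashMap<>();
        weights.put("Loan booking",         0.35);
        weights.put("Loan liquidation",     0.25);
        weights.put("Warehouse receipts",   0.15);
        weights.put("Monitoring & control", 0.15);
        weights.put("Audit & reports",      0.10);

        weights.forEach((scenario, weight) -> {
            long users = Math.round(totalConcurrency * weight);
            System.out.printf("%-22s -> %d concurrent users%n", scenario, users);
        });
    }
}
```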

The plan was to conduct the load test with three different combinations, so that the peak concurrency defined for each module was reached in at least one of them.

Since the key requirement was to simulate three years of usage of the system, the critical success factor was test data creation: the huge data set had to be created before the actual test. The system was loaded heavily with data – 2,000 users, 5,000 borrowers, 300 warehouses (100 government, 200 private/godown warehouses), 44,000 loans, 50,000 liquidations, 10 image uploads per borrower and per warehouse creation, and so on…

Scripts were developed to populate the test data needed to replicate three years of usage. The first step was to create 2,000 users in the system. User creation meant creating additional data for every user – the role and the branch under which the user had to be created. After the users were created, we created the warehouses and borrowers required for the test. The next major activity was loan bookings and liquidations: 40,000 loan booking and 50,000 liquidation records were created by running JMeter scripts.
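The actual seeding was done through JMeter test plans; the Java sketch below merely illustrates the shape of such a script, with a hypothetical endpoint and payload: loop over the target record count and post one loan-booking record per iteration, reusing a single HTTP client.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Illustrative data-seeding loop (the real runs used JMeter test plans).
// The endpoint, payload fields and loan amount below are placeholders.
public class LoanDataSeeder {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        int loanBookings = 40_000;              // target volume for ~3 years of usage

        for (int i = 1; i <= loanBookings; i++) {
            String payload = String.format(
                    "{\"borrowerId\":%d,\"warehouseId\":%d,\"amount\":%d}",
                    (i % 5_000) + 1,            // cycle through 5,000 borrowers
                    (i % 300) + 1,              // cycle through 300 warehouses
                    100_000);                   // illustrative loan amount

            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://test-env.example/api/loans")) // hypothetical URL
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(payload))
                    .build();

            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() != 201) {
                System.err.printf("Booking %d failed: HTTP %d%n", i, response.statusCode());
            }
        }
    }
}
```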

The interesting part was the set of functional issues that surfaced during data creation – the product was supposed to have been tested thoroughly for functionality, yet the customer couldn’t have been happier to find them now. Once these were fixed, we were ready for the next set of performance-related issues. Steadily, one step at a time, we ensured the avalanche would not occur for the next three years.

HyBIST implementation benefits more than just testing

We have been talking on various platforms about how the scientific approach of HyBIST and its method, STEM, delivers key business value. This time we thought it would be prudent to share our experience of implementing it in projects, and to convey the results as well as the interesting benefits it brings while delivering clean software to our customers.

HyBIST was applied to various projects executed on diverse technologies, in a variety of domains, across different phases of the product life cycle. The people involved ranged from those with no experience to those with five years of experience.

We observed that HyBIST can be plugged into any stage or situation of a project for a specific need, and one can quickly see results and obtain the benefits required at that stage.

Our experience and analysis showed varied benefits: rapid reduction in ramp-up time, creation of assets for learning, a consistent way of delivering quality results even with less experienced team members, increased test coverage, scientific estimation of test effort, optimization of regression test effort/time/cost, and selection of the right, minimal set of test cases for automation with a corresponding reduction in automation development effort/time/cost.

Following are the key metrics and the results/benefits achieved in some of the projects where HyBIST was implemented –

Project 1:

Domain: SAAS / Bidding Software

Technology – Java, Apache (Web server), Jboss (App server), Oracle 9i with cluster server on Linux

Lines of Code – 308786

Project Duration – 4 months

D1, D2 and D4 were done almost in parallel due to time constraints, as this complete application was developed from scratch.

  • D1 – Business Value Understanding (Total effort of 180 person hours)
    • 3 persons with 3+yrs experience were involved (had no prior experience in this particular domain)
    • 4 main modules with 25 features listed.
    • Landscaping, Viewpoints, Use cases, Interaction matrix (IM) were done.
    • D1 evolved by asking a lot of questions of the design/dev team.
  • D2 – Defect Hypothesis (Total effort of 48 person hours)
    • 3 persons with 3+yrs experience were involved.
    • 255 potential defects were listed.
  • D4 – Test Design (Total effort of 1080 person hours)
    • 3 persons with 3+yrs experience were involved.
    • Applied Decision tables (DT) for designing test scenarios.
    • In total, 10,750 test cases were designed and documented.
    • Of these, 7,468 (69%) were positive test cases and 3,282 (31%) were negative test cases.
    • Requirement Traceability Matrix (RTM) and Fault Traceability Matrix (FTM) were prepared.
  • D8 – Test Execution (Total effort of 3240 person hours)
    • 9 persons were involved in test execution and bug reporting/bug fixes verification (3 persons with 3+ yrs experience and 6 persons with 2+ yrs experience).
    • 12 builds were tested in 3 iterations and 4 cycles.
    • In total, 2,500 bugs were logged, of which 500 were of high severity.

Key benefits:

  • No bugs were found in UAT.
  • All change requests raised by the QA team were accepted by the customer and the dev team.
  • The interaction matrix was very useful for selecting test cases for regression testing, and for selecting the right, minimal set of test cases for automating sanity testing.
  • Regression testing windows were short, typically 2 to 3 days, and the interaction matrix was quite useful in making regression testing optimal and effective.
  • The method, structure and templates (IM, DT, RTM, FTM, test case, reporting) developed in this project are being used as a reference model for other projects at this customer.

Project 2:

A web service with 5 features that had frequent enhancements and bug fixes (Maintenance)

Technology – Java, Apache Web Server

Project Duration – 4 weeks

  • D1 – Business Value Understanding (Effort of 6 hours)

Mind mapping of the product, and of the impact of other services and their usage on this service

  • D2 – Defect Hypothesis (Effort of 5 hours)

Listed 118 potential defects

Key Benefits:

  • Preparation of the D1 document brought the ramp-up time for new members (developers/testers) to understand the product down from the earlier 16 hours to 4 hours.
  • Any member added to the team was productive from day one and could start testing in any regression cycle for enhancements and bug fixes.
  • Listing the potential defects enabled us to add test cases that were missing from the existing test case set.

Project 3:

Domain – E-learning

Technology – ASP.Net, IIS, SQL Server, Windows

Validation of a new feature added to the product

Duration – 2 weeks

  • D1 – Business Value Understanding (Effort of 5 hours)

Understood the feature by asking questions and interacting with development team over emails/conf calls

  • D2 – Defect Hypothesis (Effort of 2 hours)

Listed 130 Potential defects by thinking from various perspectives

  • D4 – Test Design (Effort of 16 hours)

Designed and documented 129 test cases

  • D8 – Test Execution (Effort for test execution – 626 person hours; effort for bug reporting/bug-fix verification – 144 person hours)

Executed test cases by performing 2 cycles of testing and 2 regression cycles

8 new test cases were added while executing the test cases

31 bugs were found during test execution, of which 23 were of high severity. 29 of the bugs could be linked to potential defects visualized and listed earlier; 2 of the bugs found were not linked to any documented test case.

Key Benefits:

  • Arrived at a consistent way of understanding the feature and designing test cases for new features irrespective of the experience of the team member involved

Project 4:

Domain – Video Streaming

Technology – C++, PHP, Apache, MySQL, Linux

An evolving new product in very initial cycles of development/testing

Duration – 4 weeks

People Involved – 2 Fresh test engineers (No previous work experience but trained in HyBIST/STEM)

  • D1 – Business Value Understanding (Effort of 32 hours)

Built an understanding of the product, by questioning, in the form of a features/sub-features list, landscaping, critical quality attributes and usage environment/use cases

  • D2 – Defect Hypothesis (Effort of 40 hours)

Listed over 150 potential defects

Key Benefits:

  • The 2 fresh engineers could understand the product features, the business flow and its usage in a scientific manner, and document it. They could also visualize possible defects, enabling them to come up with the test cases needed to identify and eliminate those defects.
  • The process of the fresh engineers doing D1 and D2 generated a lot of useful questions that gave the senior engineers better insight and different perspectives on the product behavior. This helped them design more interesting test cases to capture defects during test execution.
  • The assets created in D1 and D2 are helping other members of the team ramp up quickly on the product features and get a detailed understanding in 50% less time.

Project 5:

Domain – Telecom protocol

3GPP TS 25.322 V9.1.0 Standards

Estimate effort for complete test design of RLC protocol by going through existing very high level test specifications and designing test cases for 2 sample functions

Duration – 3 weeks

None of the persons involved had any previous experience in testing protocol stack.

  • D1 – Business Value Understanding (Effort of 40 person hours)

Went through the generic RLC standard and, in particular, understood the 2 functions: sequence number check and single-side re-establishment in AM mode

Prepared flow charts with data/message flows between different layers

Prepared box model illustrating various inputs, actions, outputs and external parameters

  • D2 – Defect Hypothesis (Effort of 16 person hours)

Listed 28 generic potential defects and the defect types

  • D4 – Test Design (Effort of 48 person hours)

Prepared 2 input tables and 2 decision tables

Designed 26 test scenarios (6 positive, 20 negative) and 44 test cases (6 positive, 38 negative)

  • Performed gap analysis of missing test cases in the customer’s test specification document for the 2 functions (Effort of 12 person hours)
  • Estimated time and effort for the complete RLC test case design, based on the above data (Effort of 4 person hours)

Key benefits:

  • Performed gap analysis in the existing high level test specs
  • 20 times more test cases designed for 2 functions covered
  • 86% of the test cases added were negative type
  • Test cases developed were detailed and covered various combinations of inputs, parameters and intended/unintended behaviors
  • Test cases developed were detailed enough to be easily converted to test scripts using any tool
  • Performed estimation for RLC test design covering 22 functions
  • Estimated that 256 test scenarios and 1,056 test cases would need to be designed for the complete RLC, with an effort of 446 person hours

Project 6:

Domain – Retail

Validate railway booking software on point of sale device

Technology – Java

Duration – 4 weeks

  • D1 – Business value understanding (Effort of 16 person hours)

Documented software overview, features list, use cases list, features interaction matrix, value prioritization and cleanliness criteria

  • D2 – Defect Hypothesis (Effort of 18 person hours)

Listed 20 potential defects by applying negative thinking and 54 potential defects by applying the Error-Fault-Failure model. The potential defects were categorized into 46 defect types and mapped to the features listed.

  • D3 – Test Strategy (Effort of 6 person hours)

Based on the listed potential defect types, arrived at the test types, levels of quality and test design techniques needed as part of test strategy/planning: quality level 1 (input validation and GUI validation), quality level 2 (feature correctness), quality level 3 (stated quality attributes) and quality level 4 (use case correctness).

  • D4 – Test Design (Effort of 24 person hours)

Designed and documented 30 test scenarios (15 positive, 15 negative) and 268 test cases (197 positive, 71 negative) for quality level 1, 70 test scenarios (21 positive, 49 negative) and 123 test cases (55 positive, 68 negative) for quality level 2 and 8 test scenarios for quality level 4. Created box models and decision tables to arrive at test scenarios

Prepared requirement traceability and fault traceability matrices

  • D8 – Test Execution (Effort of 32 person hours)

Out of 293 test cases designed, 271 were executed and 22 could not be executed.

52 defects (27 high, 12 medium, 13 low) and 8 suggestions were logged.

Quality level 1 – 23 defects (2 high, 11 medium, 10 low) and 2 suggestions

Quality level 2 – 27 defects (25 high, 2 low) and 6 suggestions

Quality level 4 – 2 defects (2 medium)

Key Benefits:

  • Complete validation of the product was performed successfully by one senior engineer guiding 2 fresh test engineers who had no previous work experience; none of them had experience in this particular domain
  • All the suggestions logged were accepted and valued

Project 7:

Domain – Mobile gaming

Technology – Java, Symbian

Duration – 3 weeks

  • D1 – Business Value Understanding (Effort of 16 person hours)

Achieved product understanding by documenting software overview, technology, environment of usage, features list, use cases list, mapping of use cases to features, features interaction matrix, value prioritization and cleanliness criteria.

  • D2 – Defect Hypothesis (Effort of 16 person hours)

Listed 96 potential defects by categorizing issues related to installation, download, invoking the application, connectivity, input validation, search, subscription, authorization, configuration, control, dependency, pause/resumption, performance and memory.

Mapped the features to the potential defects

  • D3 – Test Strategy (Effort of 4 hours)

Based on the listed potential defect types, arrived at 4 levels of quality (game initialization and invoking correctness, game subscription correctness, game download correctness, dependency correctness). The different test types were mapped to the 4 quality levels.

  • D4 – Test Design (Effort of 20 person hours)

Box models and decision tables were created

Designed and documented 37 test scenarios and 66 test cases

Key Benefits:

  • Complete validation of the product was performed successfully by one senior engineer guiding 2 fresh test engineers who had no previous work experience; none of them had experience in this particular domain
  • The assets created here became a useful reference for understanding and validating other mobile gaming software projects

Guy Fawkes – Beautiful fireworks, not a blast!

We had an interesting challenge posed to us by a large UK-based government health organization: to assess whether their large health-related eLearning portal would indeed support 20,000 concurrent users (they have 800K registered users) and deliver good performance. There was a cost constraint, and hence we decided to use the open source tool JMeter.

The open source toolset has its own idiosyncrasies – a maximum heap size of 1 GB, support for only a few thousand virtual users per machine, and a nasty habit of generating large logs! To simulate a load of initially 20,000 and later 37,000 concurrent users, we had to use close to 40 load generators and synchronize them.
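The arithmetic behind “close to 40 load generators” is worth making explicit. The sketch below assumes roughly 1,000 virtual users per JMeter instance (our working figure, consistent with the per-machine limits mentioned above) and computes the number of generators needed for each target load.

```java
// Back-of-the-envelope sizing for distributed JMeter load generation.
// usersPerGenerator is an assumption (~1,000 virtual users per JMeter JVM).
public class GeneratorSizing {
    public static void main(String[] args) {
        int usersPerGenerator = 1_000;
        for (int targetUsers : new int[] {20_000, 37_000}) {
            int generators = (targetUsers + usersPerGenerator - 1) / usersPerGenerator; // ceiling
            System.out.printf("%,d concurrent users -> %d load generators%n",
                              targetUsers, generators);
        }
    }
}
```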

We identified usage patterns and then created the load profile scientifically using the STEM core concept “Operational Profiling”. We generated the scripts, identified the data requirements, populated the data and set up synchronized load generators. During this process we also discovered interesting client-side scripting, which we flattened into our scripts. Now we were ready to rock and roll.

When we turned on the load generators, sparks flew and the system spewed out enormous logs – 3 to 6 million lines, approximately 400-600 MB! We wrote a special utility to rapidly search for the needle in the haystack, and found database deadlocks, fat content and heavy client-side logic. The system monitors were off the chart and the bandwidth choked!
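The “special utility” was essentially a streaming scanner over the multi-hundred-megabyte logs. A minimal sketch of the idea (the symptom patterns are illustrative, not the exact strings we searched for) reads the file line by line instead of loading it into memory and reports every matching line.

```java
import java.io.BufferedReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Pattern;

// Minimal streaming log scanner: never loads the 400-600 MB file into memory.
// The symptom patterns are illustrative, not the exact strings from the engagement.
public class LogScanner {
    private static final List<Pattern> SYMPTOMS = List.of(
            Pattern.compile("deadlock", Pattern.CASE_INSENSITIVE),
            Pattern.compile("timed? ?out", Pattern.CASE_INSENSITIVE),
            Pattern.compile("OutOfMemoryError"));

    public static void main(String[] args) throws Exception {
        Path log = Path.of(args[0]);                // e.g. the server log from one test run
        long lineNo = 0;
        try (BufferedReader reader = Files.newBufferedReader(log)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lineNo++;
                for (Pattern symptom : SYMPTOMS) {
                    if (symptom.matcher(line).find()) {
                        System.out.printf("%s:%d: %s%n", log.getFileName(), lineNo, line);
                        break;
                    }
                }
            }
        }
    }
}
```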

Working closely with the development team, we helped them identify the bottlenecks. This resulted in query, content and client-side logic optimization. Now the system monitors were under control and the deployed bandwidth was good enough to support the 20,000 concurrent user load with good performance. To support higher loads in the future, the system was checked with nearly twice this load and the additional resources needed to support it were identified.

The FIVE weeks we spent on this were great! (Hmmm – tough times over at last!)

Healthy baby at birth!

A large India Development Center (IDC) of a major consumer electronics and peripherals company delivers 3-4 releases of their product every year. They had “birthing” problems – early-stage defects were bogging them down. The root cause was identified as ineffective development testing. The team was mature, had good practices and was focused on “unit testing”. The big question that nobody wanted to ask was “what in the name of God is a unit?”. As a result, everyone in both the early and late stages was doing similar tests, with poor results.

Applying STEM, we clearly identified what was expected of the code coming from development and listed the types of defects that should not seep out of development. Having set up clear cleanliness criteria, we had gotten around the “notion of a unit” and set up a goal-focused development test practice. The test cases increased many-fold (without increasing effort/time), and fault traceability made them purposeful.

Code coverage jumped from 65% to 90%, with the remaining 10% identified as exception-handling code that was assessed by hand. Now all early-stage code was ‘completely assessed’. The RESULT – defect escapes to the QA team dropped by 30-40%, the specialist QA team could focus on their job, and releases were made on time.

From premature babies needing incubators, we had transformed the organization to deliver bonny babies!

Quality injection – Scientific validation of requirements

Validating early-stage, pre-code artifacts like a requirements document is challenging. This is typically done by rigorous inspection and requires deep domain knowledge. One of our Japanese customers threw us a challenge – “How can you use HyBIST/STEM to scientifically validate requirements without knowing the domain deeply?”

The core aspect of HyBIST is to hypothesize potential defect types and then prove that they do not exist. These are identified by keeping in mind the end users and the technology used to construct the system. So how do you apply this to validate a pre-code artifact?

We commenced by identifying the various stakeholders of the requirements document and then identified key cleanliness attributes – attributes which, if met, would imply that the requirements were indeed clean. We were excited by this. We then moved on to identify the potential defect types that would impede these cleanliness attributes/criteria.

Lo and behold, the problem was cracked: we identified the various defect types and the corresponding evaluation scenarios for validating the requirements/architecture document. We came up with THIRTY+ defect types that required 10+ types of tests, conducted over TEN quality levels, with a total of SIXTY FIVE major requirement evaluation scenarios to validate a requirement.

What we came up with is not yet another inspection process dependent on domain knowledge, but a simple and scientific approach consisting of a set of requirement evaluation scenarios that can be applied with little domain skill, so that the requirements/architecture can indeed be validated rapidly and effectively. These ensure that the requirements document is useful to the various stakeholders over the software life cycle and does indeed satisfy the intended application/product attributes.

It was more than just validation. It was about ensuring the nation’s pride.

A large petroleum major was rolling out a specialized fleet-tracking solution to ensure zero pilferage during transport. The solution consisted of a plethora of technologies (GPS, GSM, web, mapping), and our role was to ensure that the final solution was indeed risk-free for deployment.

With the launch date just a few weeks away, we got cracking, applying HyBIST to extract the cleanliness criteria from the business and technical specifications outlined in the tender document. The cleanliness criteria consisted of multiple aspects – deployment environment correctness, cleanliness of the software, clean working of the hardware/software interfaces and, finally, the ability to support a large load and volume with real-time performance.

We identified the potential types of defects spanning the entire spectrum of hardware and software. The first step was to understand the system development process: our senior consultants visited the vendor’s facility to assess the people and processes used to develop the system. This provided a clear picture of what to expect and of the work that lay ahead of us.

Once we understood the development setup, we developed a scientific strategy and the evaluation scenarios. A variety of tests were identified – individual feature validation, simulation of various business use cases, understanding of load limitations and performance evaluation of the system.

Now we were ready to validate the final system in the data center. The first cut of the solution was used to develop a set of automated scripts for large-scale load/stress/performance testing. The system was populated with a large data set representing a real-life deployment, and vehicles were fitted with the vehicle-mounted units. We were ready to roll.

The vehicles were set in motion across various terrains and at various speeds, and the mapping of the fleet on the India map was validated. We simulated a large number of vehicles, with data arriving from the simulators at a high rate, to ensure that performance was indeed real-time.

In addition, the deployment environment was validated, configurations were checked and the legality of the software was verified. We also verified that the solution integrated with the customer’s SAP database.

Bugs popped up and were fixed. We recommended changes to the system capacity and pushed the vendor to close all critical, high and medium priority defects before providing qualitative feedback on the solution and its potential risks. Once satisfied that our customer’s investment was safe, we gave the go-ahead to roll out the solution.

Low cost automation challenge

A New Zealand-based customer in the health care domain embarked on a journey of migrating their Delphi-based products to Microsoft technologies. The products use specialized GUI controls that are not recognized by the popular tools. The company was keen to start automation right from the early stages of the migration, and the budget to develop the automation was tight.

We conducted a proof of concept (POC) to identify a tool that would support automation for both Delphi and VB.Net. We discovered that most popular tools were indeed not compatible with the product. The POC concluded that Test Complete did support both Delphi and VB.Net, with a few constraints: it was very cost effective, though not particularly user friendly. We convinced the management of our choice. The project started off with us identifying the test cases that could be automated; seven modules were automated and demonstrated.

We developed a reusable keyword-driven framework for the client. Both individual test case execution and batch runs were possible just by choosing the test cases. STAG provided a detailed demo of the framework to the in-house QA team.
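The delivered framework was built on Test Complete for the Delphi/VB.Net applications; the keyword-driven idea itself is generic enough to sketch in a few lines of Java (keywords and steps below are made up). Each row of a test case names a keyword and its arguments, and a dispatcher maps keywords to actions, so single test cases and batch runs reduce to feeding rows to the same runner.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Generic keyword-driven runner sketch (illustrative keywords, not the
// TestComplete-based framework delivered to the customer).
public class KeywordRunner {
    // Keyword -> action taking its arguments; real actions would drive the UI.
    private static final Map<String, Consumer<List<String>>> ACTIONS = Map.of(
            "OPEN_SCREEN",  args -> System.out.println("open screen " + args.get(0)),
            "ENTER_TEXT",   args -> System.out.println("type '" + args.get(1)
                                                       + "' into " + args.get(0)),
            "CLICK",        args -> System.out.println("click " + args.get(0)),
            "VERIFY_LABEL", args -> System.out.println("expect " + args.get(0)
                                                       + " to show '" + args.get(1) + "'"));

    public static void run(List<List<String>> steps) {
        for (List<String> step : steps) {
            String keyword = step.get(0);
            List<String> args = step.subList(1, step.size());
            ACTIONS.getOrDefault(keyword,
                    a -> { throw new IllegalArgumentException("Unknown keyword: " + keyword); })
                   .accept(args);
        }
    }

    public static void main(String[] args) {
        // One test case = an ordered list of keyword rows (normally read from a sheet).
        run(List.of(
                List.of("OPEN_SCREEN", "PatientSearch"),
                List.of("ENTER_TEXT", "SurnameField", "Smith"),
                List.of("CLICK", "SearchButton"),
                List.of("VERIFY_LABEL", "ResultCount", "3")));
    }
}
```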

However, some of the test cases chosen for automation were incomplete. We validated the test cases, made the necessary changes and then initiated the scripting. The automation work was divided between the STAG and customer teams; as we automated the test cases, we guided and trained the customer’s team to automate on their own.

The result – by automating 326 test scenarios, testing time was cut from 80 hours to 12 hours! We saved the customer significant money on the tool, and even more by enabling them to release the product to market ahead of schedule!

Delivering peace of mind – assessing release worthiness

The product helps detect different types of telecom fraud, be it in wireless or wireline networks. It also helps detect fraud in roaming, pre-paid and post-paid environments and is tailor-made for GSM/CDMA/fixed/GPRS networks. The product team comprised a strong development team, ably supported by an in-house QA team. The product was developed using J2EE technologies and had gone through multiple versions – it was currently in version 6.0 – with a wide installation base in the Asian/US markets. The company had an ambitious plan to expand the product’s reach into a new market – Europe. The product went through multiple feature upgrades and modifications to meet the needs of the new market. Though the product was tested diligently by the in-house QA team, the management was skeptical about its release worthiness. They preferred to have an independent third-party product assessment to enhance their delivery confidence before the formal product launch.

STAG focused singularly on ensuring that defect escapes were minimized. Hence a three-pronged approach was adopted to determine the breadth and depth of testing required –

  • What poses high business risk? What has been de-risked already? What risk remains to be assessed?
  • How well has the “net” been cast to uncover defects in the lifecycle? Are the methods to uncover defects expansive/complete?
  • Are the test cases (i.e. those inputs that have the ‘power’ to detect anomalies) good? Do the existing test cases, and therefore the tests conducted, have the power to uncover high-risk business issues?

Fixing the high-impact defects improved the stability of the product – defects which could otherwise have led to USD 250K in support costs in the initial months. The release-worthiness certificate lowered the business risk for the customer, and the newly gained delivery confidence of the customer’s management powered a successful, on-time product launch.

We demystified the automation puzzle. Relentless validation tamed!

A large global provider of BI solutions has a product suite that runs on five platforms and supports thirteen languages, with each platform suite requiring multiple machines to deliver the BI solution. The entire multi-platform suite is released on a single CD multiple times a year.

The problem that stumped them was “how to automate the final-install validation of a multi-platform, distributed product”. They had automated the testing of the individual components using SilkTest, but were challenged by “how to unify this and run it off a central console on various platforms at the same time”.

Since each platform combination took about a day, this added up to approximately two months of final-installation build validation, and by the time they were done with one release, the next release was waiting! It was a relentless exercise that consumed significant QA bandwidth and time, and did not allow the team to do more interesting or important things.

Senior management wanted single-push-button automation – identify which platform combination to schedule next, allocate machines automatically from the server farm, install and configure automatically, fire the appropriate Silk scripts and monitor progress – to significantly reduce time and cost by lowering the QA bandwidth involved. After deep analysis, the in-house QA team decided this was a fairly complex automation puzzle that required a specialist! This is where we were brought in.

After an intense deep dive lasting about four weeks, we came up with a custom master-slave test infrastructure architecture that allowed a central console to schedule jobs onto the slaves using a custom-developed control and monitoring protocol. The solution was built using Java/Swing, Perl, Expect and adapters to handle the Silk scripts; some parts of the solution ran on Windows while others ran on UNIX. This custom infrastructure allowed for scheduling parallel test runs, automatically allocating machines from the server farm, installing the appropriate components on the appropriate machines, configuring them and, finally, monitoring the progress of validation through a web console.
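The real infrastructure was a mix of Java/Swing, Perl and Expect; at its heart, though, the control protocol is just a master handing the next platform-combination job to whichever slave asks for one. A stripped-down sketch of that dispatch loop (port number and job strings are invented) looks like this:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

// Stripped-down master: each slave connects, says "READY", and is handed the
// next platform-combination job. Jobs and port number are illustrative; real
// slaves would then install/configure the product and fire their Silk scripts.
public class TestMaster {
    public static void main(String[] args) throws Exception {
        Queue<String> jobs = new ArrayDeque<>(List.of(
                "install-and-run windows-x86 en",
                "install-and-run solaris-sparc ja",
                "install-and-run linux-x86 de"));

        try (ServerSocket server = new ServerSocket(9099)) {
            while (!jobs.isEmpty()) {
                try (Socket slave = server.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(slave.getInputStream()));
                     PrintWriter out = new PrintWriter(slave.getOutputStream(), true)) {
                    if ("READY".equals(in.readLine())) {
                        out.println(jobs.poll());   // one job per READY request
                    }
                }
            }
            System.out.println("All platform combinations dispatched.");
        }
    }
}
```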

This test infrastructure enabled a significant reduction in the multi-platform configuration validation: the effort came down from eight weeks to three weeks. We enjoyed this work simply because it was boutique work fraught with quite a few challenges. We believe this was possible because we analyzed the problem wearing a development hat rather than a functional test automation hat.