On-demand testing with a cloud of performers is a proven way to save a team the cost of finding and onboarding new employees. The service lets you scale quickly without overpaying. But clients often wonder whether they can trust the quality of this kind of testing.
In this article, we'll tell you how we monitor the testing quality in Testory and show you our internal processes using graphs with numbers.
Three metrics for testing quality control in the cloud

We use two tools that have helped us triple the reliability of testing results, and we check how well they work using three metrics. These metrics also help us identify potential problems in crowdtesting, eliminate them, and improve testing quality.
Learn more about this topic → How we maintain high-quality testing with just two tools
Metric 1. Confidence in completing a pool of testing tasks
This metric shows how reliable testing results obtained during live checks and launches are.
Our main task is to maintain at least 80% confidence in the stream.
To do this, we:
- Implement different types of penalties for testers who cheat.
- Give testers constructive feedback, so they see their mistakes and learn from them (not everyone who misses a bug is a cheater).
- Improve the tester selection and hiring process.
In this metric, we look at streaming honeypots and custom honeypots. The difference is that streaming honeypots are created automatically and get mixed into the entire stream of tasks, regardless of the client. Custom honeypots are created by clients, and only people performing tasks for a specific product see them.
Confidence is lower for streaming honeypots than for custom honeypots.
Streaming honeypots are created automatically, so they quickly lose relevance. That's why they have such short lifetimes.
Custom honeypots are static, so testers are well trained and get better at them over time.
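The pool confidence described above could be sketched roughly like this. The `HoneypotResult` structure, the function names, and the equal weighting of checks are all assumptions for illustration, not Testory's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class HoneypotResult:
    tester_id: str
    passed: bool
    is_streaming: bool  # True for streaming honeypots, False for custom ones

def pool_confidence(results: list[HoneypotResult]) -> float:
    """Share of honeypot checks passed across the pool (0.0 to 1.0)."""
    if not results:
        return 0.0
    return sum(r.passed for r in results) / len(results)

def confidence_by_type(results: list[HoneypotResult]) -> tuple[float, float]:
    """Confidence split by honeypot type: (streaming, custom)."""
    streaming = [r for r in results if r.is_streaming]
    custom = [r for r in results if not r.is_streaming]
    return pool_confidence(streaming), pool_confidence(custom)
```

Computing the two types separately makes the gap mentioned above visible: short-lived streaming honeypots tend to score lower than static custom ones that testers learn over time.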
Metric 2. Degree of coverage of active testers with test honeypots
This metric shows how many active testers are covered by test honeypots and how testers generally deal with them. In other words, the more testers that are regularly checked, the more we know about their quality and the better we can predict their work by calculating the level of confidence in the result.
Just like the previous metric, our goal is to maintain 80% coverage of testers. We want at least 80% of active testers (those who have performed testing honeypots for a certain period) to complete at least 5 test honeypots during the same period. Based on the data obtained, we build a level of confidence in a particular performer and their work.
To calculate this metric, we do the following:
- We increase coverage with streaming checks (so that the performer encounters automatic checks regardless of their work intensity).
- We introduce permanent test tasks (to increase the number of honeypots and train testers).
- We use ML when mixing in test tasks so that they're as similar as possible to real tasks and the tester doesn't suspect that they're being tested.
- Based on the data received, we calculate the confidence in testers metric.
- We increase the quality of streaming honeypots to make them more like regular tasks.
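The coverage target above (at least 80% of active testers completing at least 5 honeypots per period) could be sketched like this; the per-tester counts and the 5-check threshold as a parameter are illustrative assumptions:

```python
def honeypot_coverage(completed: dict[str, int], min_checks: int = 5) -> float:
    """Share of active testers who completed at least `min_checks`
    test honeypots during the period (the target in the article is 0.8)."""
    if not completed:
        return 0.0
    covered = sum(1 for n in completed.values() if n >= min_checks)
    return covered / len(completed)
```

For example, if three of four active testers completed 5 or more honeypots, coverage is 0.75 and still below the 80% target.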
Metric 3. Banning testers
Errors (accidental or intentional) lower confidence in the results, so testers who make too many mistakes in test tasks are banned for a day, a week, or forever. The severity of the ban depends on the severity of the “crime”.
We don't have any goals for this metric yet. This is more like an extra measure to help us get rid of any questionable testers. The fewer cheaters we have, the higher the testing quality. That being said, the number of bans won't drop until the first two metrics reach 80%.
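A severity-based ban could be modeled as a simple mapping from failed checks to ban length. The thresholds below are purely hypothetical; the article only states that bans last a day, a week, or forever:

```python
from datetime import timedelta
from typing import Optional

def ban_duration(failed_checks: int) -> Optional[timedelta]:
    """Map failed test tasks in a window to a ban length.

    Returns None for a permanent ban; the numeric thresholds
    are made-up examples, not Testory's real policy.
    """
    if failed_checks >= 10:
        return None                    # permanent ban
    if failed_checks >= 5:
        return timedelta(weeks=1)      # week-long ban
    if failed_checks >= 3:
        return timedelta(days=1)       # day-long ban
    return timedelta(0)                # no ban
```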