These developers learned what lean can do that agile cannot
FEATURE – In this article, we learn how a simple andon system and lean problem solving helped Theodo move past some of the most common obstacles that Agile alone can’t overcome.
Words: Jean-Rémi Beaudoin, Deputy CTO, Theodo UK
Over the past four years, I have been involved in a software development project for an asset manager. The purpose of this project was to help them move to agile development so that they could ship products faster.
Indeed, the agile framework allowed us to get the development team to deliver relevant features faster than our client was used to. Everything was going well: the team scaled and began working on more and more projects.
But in certain situations, Scrum (the agile framework we were using) can lead you to simply pass the ball to other stakeholders when they are hindering you (or “blocking” you, to keep to Kanban board jargon). The team then focuses on another feature until that stakeholder unblocks the task. We found that this created waste in our projects, which is why we introduced the team to lean problem solving.
First, let’s illustrate the problem.
On a recent Monday morning, Thomas – a developer on the team – deployed a new feature to the staging server. The feature worked fine and it was now time for Thomas to deploy it to the production server, along with three other features developed by his colleagues on that same day. As Thomas ran the deployment script, however, an error occurred. A change had to be made to the configuration of the production database. That change had already been made by Jimmy, from the System Administration team, the week before on the staging platform, but it hadn’t been implemented in production yet.
Following the Scrum process, Thomas moved his ticket and the other three to the blocked column of his Kanban board. He then sent an email to Jimmy, asking him to fix the production configuration. Nothing could be deployed to production until then.
Thomas then moved on to a different ticket that was not blocked, trying to bring as much value as possible. He felt like he had done a good job on his end, having stuck to the process. But when Jimmy’s answer came a few minutes later, Thomas was disappointed.
Sorry to say that due to the yearly freeze of production environments, no changes to server configuration are possible from the 15th of December to the 15th of January. We call it the “winter freeze”. Unfortunately, that change in configuration will only be possible after that.
Sorry for the inconvenience,
In such a scenario, strictly following the agile framework means that the development team’s productivity drops whenever it needs to interact with other parts of the company (in this case, the System Administration team). Any task that depends on this interaction will likely end up blocked, and the team will move on to the next task on which it can be fully autonomous. In terms of speed of delivery, the team might catch up, but in terms of customer value, a blocked task can have a very negative impact.
Now, back to our story. Dan, the lead developer in the team, receives a notification on his phone every time a task is put in the blocked column of the Kanban board. It is a simple, yet efficient andon system. As soon as the notification arrives, Dan runs to the team.
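To make the idea concrete, here is a minimal sketch of such an andon in Python. It is not the tool Dan's team used: the `Move` record, the column name `"Blocked"` and the `notify` callback are all assumptions made for illustration. It only shows the principle, which is that every ticket entering the blocked column triggers an immediate alert.

```python
from dataclasses import dataclass

BLOCKED = "Blocked"  # assumed name of the blocked column on the board


@dataclass(frozen=True)
class Move:
    """A ticket moving from one board column to another (hypothetical record)."""
    ticket: str
    from_column: str
    to_column: str


def andon_alerts(moves, notify):
    """Call notify once for every ticket that enters the blocked column.

    Returns the list of tickets that triggered an alert.
    """
    alerted = []
    for move in moves:
        # Only alert on a real transition into the blocked column.
        if move.to_column == BLOCKED and move.from_column != BLOCKED:
            notify(f"Andon: ticket {move.ticket} was moved to {BLOCKED}")
            alerted.append(move.ticket)
    return alerted


if __name__ == "__main__":
    moves = [
        Move("FEAT-1", "Staging", "Blocked"),
        Move("FEAT-2", "Doing", "Staging"),
    ]
    # print stands in for a phone notification here.
    andon_alerts(moves, print)
```

In a real setup, `notify` would be whatever pushes a message to the lead developer's phone, and the moves would come from the board itself.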
Lean problem solving shifted Dan’s mindset from trying to maximize the team’s speed to trying to maximize the value it brings to the customer.
Dan and the team started digging into the problem using the PISCAR framework. Here’s how it worked:
- Problem: I cannot deploy my feature.
- Impact: My client’s boss won’t be able to try our application on the brand new servers that cost him a lot of money.
- Standard process.
In the present situation, our issue occurred at the second-to-last step in the process.
- Cause: found using the 5 Whys.
- Why can I not deploy my feature? Because the deployment requires a MySQL configuration change.
- Why? Which part of the configuration? The application needs to make MySQL transactions that are bigger than the default size limit.
- Why? What feature of the application does it serve? When an admin changes a parameter, we compute a cache object that is quite big and is used later on when users request computations.
- Why is this cache saved in the database? The truth is that it wasn’t. Until that very week, we used to save this cache in the filesystem, but had to synchronize the files between both production servers when the cache was generated. So we moved it to the database, which is shared between servers: no additional synchronization is needed, which simplifies the system.
- Short-term countermeasure: I will revert the code change that moved this cache from the filesystem to the database and I will manually synchronize the files between both production servers until the end of the “winter freeze”.
- Long-term improvement: This problem revealed that, for some reason, the System Administration team does not use an automated configuration tool that would remove the risk of manually introducing errors. Therefore, Dan decided to encourage the client to adopt such a solution. He talked to the client about this the next day, hoping they would adopt the solution by mid-February.
- Result (short term): The client can use the other three new features on the production server by the end of the day.
- Result (long term): Configuration of servers is automated by mid-February and no more configuration discrepancies are observed between staging and production.
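The discrepancy between staging and production at the heart of this story is the kind of problem a simple automated comparison can surface early. The sketch below is only an illustration of that idea, not the tool the client adopted: the function name `config_drift` and the setting names and values are made up, and a real check would read the settings from the servers themselves.

```python
def config_drift(staging, production):
    """Return {setting: (staging_value, production_value)} for every difference.

    A setting missing on one side is reported with None for that side.
    """
    keys = sorted(set(staging) | set(production))
    return {
        key: (staging.get(key), production.get(key))
        for key in keys
        if staging.get(key) != production.get(key)
    }


if __name__ == "__main__":
    # Hypothetical settings, for illustration only.
    staging = {"max_transaction_size": "64M", "charset": "utf8"}
    production = {"max_transaction_size": "4M", "charset": "utf8"}
    for setting, (on_staging, on_production) in config_drift(staging, production).items():
        print(f"{setting}: staging={on_staging} production={on_production}")
```

Run before each deployment, a check like this would have flagged the missing change on the day it was applied to staging, rather than when the deployment script failed.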
As this short story illustrates, this agile team successfully adopted two lean tools: an andon system and lean problem solving in the form of a PISCAR. [You can see an andon in the main picture of this article; and in case you wondered, the rubber duck is a common tool that challenges developers to explain a problem as clearly as they can, while the hourglass is useful in pair-programming, switching developer every five minutes.]
As a good lean tool, the PISCAR helped the team focus on the value brought to the client and come up with improvements across silos. This triggered interactions very much in the spirit of the DevOps movement.
Instead of leaving a task stuck with a “not my fault” mentality, the team was able to reduce the impact of the problem on the customer in less than an hour. In doing that, they also came up with actions that could slightly improve productivity across the organization.