Planet Lean: The Official online magazine of the Lean Global Network
Unearthing problems by reducing work-in-progress

Unearthing problems by reducing work-in-progress

Pierre-Henri Cumenge
Pierre-Henri Cumenge
July 11, 2018

BUILDING BRIDGES - This month, we hear how addressing work-in-progress led a startup specialized in AI to productivity gains on their machine learning projects.


Words: Pierre-Henri Cumenge, Co-Founder, Sicara - Paris, France


At Sicara we have been focusing more and more on computer vision projects [computer vision is the science enabling computers to see, recognize and process images like the human eye would].

The best-performing algorithms in this field are called neural networks, since they loosely imitate the workings of the human brain (whereby each input goes through different, interconnected layers of “neurons”). To function, neural networks need to be “trained” - ingest thousands of images with a known content to adjust their internal parameters and be able to recognize the content of a new image. For instance, if I want to build an algorithm that detects cars in images, I will first train it to use images that I already know have cars in it. The algorithm will then be able to determine whether a new image contains a car.

In computer vision projects, testing algorithm improvements is a key task for our data scientists, one that seems inherently unpredictable: we don’t know in advance the results we will obtain nor the time required to train a new algorithm.

When we first began to work on computer vision and to discuss projects with team members, there seemed to be many problems. However, these typically didn’t relate to delays or unmet quality standards. Looking into our team’s approach, we spent time analyzing the process for each new experiment. This entails:

  • Coding the algorithm and launching its training;
  • While the training takes place, keeping the functionality card in “doing” and start working on a new task. This waiting phase can last several days;
  • Going back to the experiment once the training is complete to extract the results.

It became clear that having several in-progress tasks for each team member was making it hard to estimate the time required for each task and hence to measure productivity issues. For example, we discovered afterwards some hidden rework: the team had been working on trainings that had failed and needed to be relaunched.

We looked in more detail at what a data scientist does when working on a new experiment, and found the process could be broken down in five steps. To illustrate this, let’s take a concrete example from one of our projects. The team had been training a neural network called MaskRCNN on photographs of flat surfaces taken from an angle, in order to automatically recognize objects in these images for one of our client (Michel). The next experiment was to first straighten these images to make them look like photographs taken from above, and then train the network with straightened images.

Let’s look at the different subtasks our data scientist had to perform:

  1. Code the pre-processing of the images;
  2. Launch an algorithm training using the pre-processed images;
  3. Wait for the algorithm to train;
  4. Retrieve the results of the training;
  5. Compute the performances of the algorithm trained on straightened images.

We decided to split these experiments in two phases: starting a new training (steps 1-2) and getting the results (4-5). In our example, the experiment was “Train MaskRCNN on straightened images”. Split in two tasks, it became:

We therefore had two subtasks for each experiment, which now didn’t include any waiting. This allowed the data scientists to estimate each subtask separately, know precisely the time spent on each and react in case one task took longer than expected.

During the following six weeks, the team identified 15 different occurrences of tasks that exceeded the planned schedule, over 24 different experiments. For more than half the experiments, one of the subtasks exceeded the estimation of the team! Using a red bin approach, the team analyzed each subtask to address the root causes of the problem.

Let me share an example of how we used the 5 whys (well, 4 whys in this instance) on an experiment to add new images to our training dataset. The problem we faced was that a data scientist took 1.5 hours more than expected to generate the new performance metrics.

  • Why? The performance analysis script caused an error.
  • Why? The computer ran out of memory during the execution of the script.

At this point, the first action of the data scientist was simply to expand the computer’s memory. Further investigation on the red-binned task led to new discoveries:

  • Why did the computer run out of memory? The neural network was reloaded for each image on which it was tested.
  • Why? A single piece of code was both loading the neural network and making the calculations, violating a standard coding rule: “Each function performs only one task.”

Here is the corresponding code (slightly edited for simplicity):

The last line shows the neural network model being reloaded in memory inside a function that is supposed to only retrieve the results, which is unexpected, and as a side effect increases memory usage.

This investigation led to improvements in the organization of that part of the code:

Now the model loading happens in the initialization function (__init__), and the predict function only handles getting the results.

Looking at the red-binned tasks also led to a couple of unexpected findings. Contrary to what I thought, most of the problems occured in the last stage of the experiment (determining whether the performance metrics of the algorithms had improved). Only three out of 15 exceeded tasks concerned writing and launching the training algorithms.

Some common causes have already started to appear:

  • Three out of the 15 red-binned tasks were linked to a training or metrics calculation that had to be restarted after the servers ran out of memory. This led to an experiment in our newcomers’ training: we added a step for being able to estimate the necessary RAM required for training a given algorithm.
  • Two tasks had ended up in the red bin because of wrong conversions from centimeters to pixels when calculating the distance between objects. The team addressed this recurring issue by creating a Distance class in the code that held both values.

Splitting each experiment into smaller tasks in order to remove work-in-progress helped us to highlight the actual problems the team was having. As the observations above show, analyzing small problems through red bins can show a great - and humbling - way to match my preconceptions with the reality of our work.


THE AUTHOR

Pierre-Henri Cumenge is co-Founder of Sicara in Paris, France.

Read more

Michael Ballé reflects on how lean really works
August 17, 2017
Michael Ballé reflects on how lean really works

FEATURE – Continuously studying the challenges we meet and the responses we devise, lean gives us endless opportunities to learn. The author goes back to basics, reflecting on some core lean premises.

Continue reading
What's the impact of lean thinking on a startup's growth?
March 3, 2015
What's the impact of lean thinking on a startup's growth?

FEATURE - An initial look into the impact of lean management principles on the growth of young organizations hopes to encourage further analysis into why and how lean startups succeed.

Continue reading
Jim Womack provides us with a few tips on how to sustain 5S
November 29, 2017
Jim Womack provides us with a few tips on how to sustain 5S

WOMACK’S YOKOTEN – 5S seems to mean different things to different people. What’s common, however, is the difficulty to sustain it. The author offers a few tips.

Continue reading
Reflecting on our lean healthcare journey
June 27, 2019
Reflecting on our lean healthcare journey

CASE STUDY – The CEO of a hospital in Johannesburg looks back to the last five years to reflect on a lean healthcare transformation that is creating positive outcomes for patients.

Continue reading

Read more

No items found.