Friday, March 23, 2007

Estimation vs. Guessing

When I was a kid, some book I had contained an exercise like this. Try to guess how big around your waist is, it said. It probably had a space to write down your guess. Then it recommended a technique for getting an estimate. Put your hands at your sides and use them to estimate how wide you are at the waist. Now assume your body is pretty circular. If you multiply your estimated width by 3, which is approximately pi, you can get an estimate of the circumference (diameter * pi = circumference of a circle). If you write down that estimate and then actually measure around your waist, it's probably much closer than your original guess.
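Here is the book's trick as a minimal sketch. The function name and the 11-inch width are made up for illustration; the only thing taken from the book is "multiply the width by about 3."

```python
import math

def waist_estimate(width, pi_approx=3.0):
    """Estimate a circumference from a width, treating the waist as a circle."""
    return width * pi_approx

width_inches = 11  # hypothetical width measured with your hands
rough = waist_estimate(width_inches)           # the book's "multiply by 3"
finer = waist_estimate(width_inches, math.pi)  # same idea with a better pi
print(f"rough: {rough:.1f} in, finer: {finer:.1f} in")
```

The two answers differ by less than 5%, which is why 3 is good enough for this kind of estimate.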

I don't know if this book was trying to make a point about estimation or one about geometry, but lately the idea of estimation has come up a lot in my life. I think current school curricula put more emphasis on it than they did when I was a kid, too.

In any case, recently in our Software Engineering Principles class, Dr. Paul asked us to discuss amongst ourselves how we would set up a software testing program, how we would estimate the time that the testing would take, and how we would decide when testing was finished. So we talked about this for a few minutes, sharing what we'd learned from various readings, until the conversation ground to a halt.

"Are you done?" he asked us. We nodded, more or less.

"What did you decide?" he asked.

Our general answer was along the lines of, "Nobody knows, you can't really estimate it, and anything you come up with is just going to be kind of a way to justify your gut, so you just kind of release the software when you think it's ready."

He became theatrically angry with us. His general point was that any method of estimation would be better than a guess or "gut feeling" and that if he were to accept a guess or gut feeling from anyone, it would be from people with years of experience running successful projects, not a bunch of college seniors. Oops.

Let me give you a sense of how hard it is to estimate "how long testing will take." You could do it like this:

  • Our software has 100,000 lines of code.
  • We know from the literature that very good programmers usually introduce about 4 bugs per 10 lines of code if the code is not too complex.
  • That means our software has about 40,000 bugs.
  • Except we had some really good reviews while we wrote this, so we bet we found half of 'em already, so there might be 20,000 bugs left.
  • Our software isn't that important (it's not for a cancer radiation machine or the Mars rover or whatever), so we can release it when 85% of the bugs are gone.
  • This means we need to find and fix 17,000 bugs in order to release our software.
  • We have a team of 20 people and we think each person can find 15 bugs per week, so the team finds about 300 bugs per week. We think we can fix them at about that rate too.
  • So we estimate it will take about 57 weeks to test and debug this software.
Now let me point out that this is not a very good way of estimating this; an experienced person might well scoff at it. Common sense suggests that all of the numbers used could be way off.
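The chain of reasoning in that list is really just a little formula, so here is a rough sketch of it. The function and parameter names are mine. The defect rate of 0.4 bugs per line (4 per 10 lines) is an assumption: it is the rate that makes the arithmetic land on the 57 weeks quoted above, given a team of 20 finding 15 bugs each per week.

```python
def weeks_to_test(lines_of_code, bugs_per_line, fraction_already_found,
                  fraction_to_remove, team_size, bugs_per_person_per_week):
    """Back-of-the-envelope testing time, following the list step by step."""
    total_bugs = lines_of_code * bugs_per_line
    remaining = total_bugs * (1 - fraction_already_found)
    to_fix = remaining * fraction_to_remove
    weekly_throughput = team_size * bugs_per_person_per_week
    return to_fix / weekly_throughput

weeks = weeks_to_test(
    lines_of_code=100_000,
    bugs_per_line=0.4,            # assumption: 4 bugs per 10 lines
    fraction_already_found=0.5,   # reviews caught about half
    fraction_to_remove=0.85,      # release when 85% of the rest are gone
    team_size=20,
    bugs_per_person_per_week=15,
)
print(round(weeks))  # 57
```

Writing it this way also makes it easy to see how touchy the answer is. Drop the find rate from 15 to 10 bugs per person per week and the same model says 85 weeks.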

Nevertheless, if you instead just "guess" how long testing will take, you'll most likely be off by as much as an order of magnitude (so you might guess 2 months when it will really take 20 months). Any intelligent estimation technique will get you much closer than that.

The other advantage of estimation over guessing is that you can learn from mistakes. If I made the estimate above for an actual project, and the testing instead took 80 weeks, I could look for the places where the model was wrong. Were there way more bugs than we thought? Did people find a lot fewer per week than we guessed they would? Were we still finding really critical bugs even when we were no longer finding very many bugs in general? This would allow me to make a better estimate for the next project. It is much harder to adjust your gut; you can only use very crude adjustments like "I'm always too optimistic" or "things take twice as long as I tend to think."
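As a rough illustration of that kind of post-mortem, you can ask what each input would have had to be to explain the 80 weeks, holding the others fixed. The helper names here are hypothetical; the 57 weeks, 80 weeks, 20 people, and 15 bugs per person per week come from the example above.

```python
def implied_find_rate(estimated_weeks, actual_weeks, assumed_rate):
    """Per-person weekly rate that explains actual_weeks if the bug count was right."""
    return assumed_rate * estimated_weeks / actual_weeks

def implied_bug_count(actual_weeks, team_size, assumed_rate):
    """Number of bugs that explains actual_weeks if the find rate was right."""
    return actual_weeks * team_size * assumed_rate

print(implied_find_rate(57, 80, 15))  # 10.6875 bugs per person per week
print(implied_bug_count(80, 20, 15))  # 24000 bugs
```

Neither number tells you which assumption was actually wrong, but each one is something you can check against the project's records, which is more than you can say for a gut feeling.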

Sometimes time estimation can brighten your day too. If you have a stack of 200 forms to process, it might feel like you'll never finish, but if you time yourself while you do 5 and find that 3 minutes have elapsed, you know you can finish in two hours, not counting breaks. So really, if you're starting in the morning, you'll easily be done by lunch.
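The forms arithmetic is simple enough to do in your head, but as a sketch (with a made-up function name) it looks like this:

```python
def minutes_for_stack(total_items, sample_items, sample_minutes):
    """Scale a timed sample up to the whole stack."""
    return total_items * sample_minutes / sample_items

minutes = minutes_for_stack(total_items=200, sample_items=5, sample_minutes=3)
print(minutes / 60)  # 2.0 hours
```

This assumes the first 5 forms are typical of the rest, which is the same kind of assumption the bug estimate makes.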

Budgeting is another example of estimation's superiority over guessing, but that's a whole topic of its own.

4 comments:

Sally said...

I responded to this on Empirical Question today as well, but I wanted to mention that one thing that seems to me always very poorly estimated is the number of people at a large rally/event. This appears to be a place where extrapolation doesn't work very well.

Anonymous said...

When I was thinking of how to estimate debugging time, I thought of recording how long it takes to find the next bug and then when you can go a certain amount of time without finding a bug, you can declare yourself done.

Tam said...

In terms of determining when you are finished, there's a method somewhat similar to what you describe.

It's a given that you never find every bug. It can't be done in a non-trivial project. So one thing that is done is to keep a graph of the number of bugs found per week and keep an eye on the graph. You might decide ahead of time that you'll be done when you have two weeks in a row in which the number of bugs found is some specified fraction (say 1/20th) of the number found in the peak week. Or, more casually, you can keep an eye on this graph and watch for some kind of a petering out.

Anonymous said...

And....uh.... what about this Blog, so far, is boring?

Cool post. Thanks.

-- Ed