Thursday, November 20, 2008

Benford's Law

I'd never heard of Benford's Law before, but this is pretty cool:



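For those who can't or don't watch the video, Benford's Law states that, for most natural data (e.g., population sizes, but not something like social security numbers), the most common first digit is 1, then 2, on down to 9, in a specific pattern: a first digit d turns up with probability log10(1 + 1/d). That works out to about 30.1% of values starting with 1, about 17.6% with 2, and only about 4.6% with 9.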
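To make the pattern concrete, here's a minimal Python sketch that prints the share Benford's Law predicts for each first digit. The function name `benford_expected` is just my own label for illustration, not something from the video.

```python
import math


def benford_expected(digit):
    """Share of values Benford's Law predicts will start with `digit` (1-9)."""
    if digit not in range(1, 10):
        raise ValueError("first digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)


# Print the predicted share for every possible first digit.
for d in range(1, 10):
    print(d, f"{benford_expected(d):.1%}")
```

The nine shares add up to 1, since every non-zero number has to start with one of the digits 1 through 9.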

I was curious whether this law would apply to the data I work with, and was pretty sure that it would. I opened up the well production database for one of my clients, picked at random, and put the monthly oil production values into a spreadsheet. There were over 16,000 non-zero values. I extracted the first digit of each value and made a scatterplot of how often each digit occurred, shown below. The red curve is the values predicted by Benford's Law, and the blue curve is the values I actually obtained:

That's a pretty close fit, so I was pleased.
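I did this in a spreadsheet, but the same check could be sketched in Python along these lines. This is only a rough sketch: the helper names `first_digit` and `digit_shares` are hypothetical, and `sample_values` is a handful of made-up numbers standing in for the real production data, which isn't reproduced here.

```python
import math
from collections import Counter


def first_digit(value):
    """Return the first non-zero digit of a number, or None if the value is zero."""
    for ch in str(abs(value)):
        if ch in "123456789":
            return int(ch)
    return None


def digit_shares(values):
    """Share of non-zero values starting with each digit 1-9."""
    counts = Counter(d for d in map(first_digit, values) if d is not None)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one non-zero value")
    return {d: counts[d] / total for d in range(1, 10)}


# Made-up monthly values for illustration only; zeros are skipped,
# just as the non-zero filter in the spreadsheet did.
sample_values = [1520.0, 87.3, 0.0, 1043.9, 212.5, 19.8, 0.0, 3410.2, 164.0, 9.6]

observed = digit_shares(sample_values)
for d in range(1, 10):
    expected = math.log10(1 + 1 / d)  # Benford's predicted share
    print(d, f"observed {observed[d]:.1%}", f"expected {expected:.1%}")
```

With only a few values the observed shares will be all over the place. It takes a large sample, like the 16,000 or so values I had, before a comparison like this means much.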

Benford's Law works because most real data that spans several orders of magnitude is roughly evenly distributed on a log scale. On a log scale, the numbers are farther apart the lower they are (i.e., 1 is farther from 2 than 2 is from 3), so within any given order of magnitude the stretch of numbers starting with 1 is the widest and the stretch starting with 9 is the narrowest. More values therefore start with 1, and comparatively few start with 9. That's my understanding from Wikipedia, at least.
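One way to test that explanation is a quick simulation: generate numbers that are evenly spread on a log scale and see what their first digits look like. Here's a minimal sketch, assuming values between 1 and 100,000 drawn with Python's `random.uniform` on the exponent. The range, the sample size, the seed, and the name `simulate_first_digit_shares` are all arbitrary choices of mine.

```python
import math
import random
from collections import Counter


def simulate_first_digit_shares(n, max_exponent=5, seed=0):
    """First-digit shares for n values spread evenly on a log scale.

    Each value is 10**u with u drawn uniformly from [0, max_exponent].
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same numbers
    counts = Counter()
    for _ in range(n):
        value = 10 ** rng.uniform(0, max_exponent)
        # Values here are at least 1, so the first character of the
        # string form is the first digit.
        counts[int(str(value)[0])] += 1
    return {d: counts[d] / n for d in range(1, 10)}


shares = simulate_first_digit_shares(100_000)
for d in range(1, 10):
    expected = math.log10(1 + 1 / d)  # Benford's predicted share
    print(d, f"simulated {shares[d]:.1%}", f"expected {expected:.1%}")
```

If the explanation holds, the simulated shares should land close to the predicted ones, because the share of a log-scale decade that starts with digit d is exactly log10(d + 1) - log10(d), which is the same thing as log10(1 + 1/d).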

1 comment:

Sally said...

Nice. I enjoyed watching the video.