Monday, October 27, 2008

Advantages of a Log Scale

One option that Excel has, and that I imagine most graphing software has, is to switch one of the axes from the regular linear scale to a logarithmic one. On a logarithmic scale, equally spaced intervals represent successive powers of 10: each major gridline is ten times the value of the one below it.
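To make the idea concrete, here is a minimal Python sketch of what a base-10 log axis does. It assumes a made-up axis running from 10 to 10,000, and `log_axis_position` is a hypothetical helper, not anything from Excel. Each value is placed according to its log10, so every power of 10 sits the same distance from the next one.

```python
import math

def log_axis_position(value, axis_min, axis_max):
    """Return where `value` falls on a base-10 log axis, as a fraction
    (0 to 1) of the distance from axis_min to axis_max."""
    if value <= 0 or axis_min <= 0:
        # log10 is undefined for zero and negative numbers,
        # which is why a log axis cannot show them at all.
        raise ValueError("a log scale can only show positive values")
    span = math.log10(axis_max) - math.log10(axis_min)
    return (math.log10(value) - math.log10(axis_min)) / span

# Hypothetical axis from 10 to 10,000: the powers of 10 land evenly spaced.
for v in (10, 100, 1000, 10000):
    print(v, round(log_axis_position(v, 10, 10000), 3))
```

This prints positions of 0.0, 0.333, 0.667 and 1.0, so each factor of 10 takes up exactly a third of this axis.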

In the oil industry we use log-scale charts quite a lot, and they often make the data much easier to interpret. Today I made the following scatter plot, which shows the oil-to-gas ratio for some wells in Oklahoma. The x axis represents how long each well has been running, so the idea is to see how these ratios change over time and how they vary from well to well. It came out like this:

This is obviously not acceptable as a graph. The few very high values near the top edge force all of the more typical values to clump together at the very bottom. And no matter how much I limit the range of the y axis, I get basically the same effect. Here I've capped it at 2000, which cuts off some data points:

It's better, but it's still not great. Then I tried a log scale:

And voilà: suddenly the data makes sense.
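To see why a couple of outliers squash everything else, here is a small Python comparison. The ratios are invented for illustration (they are not the Oklahoma data): mostly values in the tens to hundreds, plus two large outliers. The sketch measures how much of the axis height the typical points spread across on a linear axis versus a log axis, with both axes running from the smallest to the largest value.

```python
import math

# Hypothetical ratios for illustration only: most are modest, two are huge.
ratios = [40, 65, 90, 120, 150, 210, 300, 450, 20000, 35000]
typical = [r for r in ratios if r < 1000]

lo, hi = min(ratios), max(ratios)

def linear_pos(v):
    # Fraction of axis height on a linear axis from lo to hi.
    return (v - lo) / (hi - lo)

def log_pos(v):
    # Fraction of axis height on a base-10 log axis from lo to hi.
    return (math.log10(v) - math.log10(lo)) / (math.log10(hi) - math.log10(lo))

def share_of_axis(pos):
    # How much of the axis the typical (non-outlier) points spread across.
    return max(pos(v) for v in typical) - min(pos(v) for v in typical)

print(f"linear: {share_of_axis(linear_pos):.1%} of the axis")
print(f"log:    {share_of_axis(log_pos):.1%} of the axis")
```

With these made-up numbers the typical points are crammed into about 1% of the linear axis but spread over more than a third of the log axis, which is the same clumping-versus-spreading effect the charts show.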

Log scales are a bit tricky for people to read, so it does help to add the minor gridlines.

The spacing of the minor gridlines helps remind you how the scale works, and you can also use them as a guide. The line immediately above 1000 is 2000, the next one is 3000, and so on, and they get closer and closer together as they approach 10,000.
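The uneven spacing follows directly from the logarithm. Here is a short Python sketch (the `minor_gridlines` helper is hypothetical, just for illustration) that computes where each minor gridline sits within one decade. Since log10(k × 1000) − log10(1000) = log10(k), the line for k × 1000 sits log10(k) of the way from 1000 to 10,000.

```python
import math

def minor_gridlines(decade_start):
    """Return (value, fraction) pairs for the gridlines in one decade.
    `fraction` is how far above decade_start the line sits, as a share
    of the distance to the next power of 10 (10 * decade_start)."""
    return [(k * decade_start, math.log10(k)) for k in range(1, 10)]

for value, fraction in minor_gridlines(1000):
    print(f"{value:>5}: {fraction:.2f} of the way to 10000")
```

So 2000 already sits about 30% of the way up the decade and 5000 about 70%, while 8000 and 9000 are squeezed into the last tenth or so. The visual halfway point of the decade is about 3162 (1000 × √10), not 5500.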

I love these kinds of data-representation problems, and when I make a display that is more intelligible than it might have been, it makes me really happy and proud.

4 comments:

rvman said...

Does each X represent a single well, or are all identical marks the same well at different times? Was there something fundamentally different about the 1979 well(s)? I'd guess it went dry around 30 years in. Everything else seems linear, with a small upward slope, but that set looks like a dozen or so completely unrelated data points tacked on the end of an otherwise well-behaved series.

(That's all the 'x's at the right of the plot, which look like a tail 'turning downward', but actually are all members of one discrete set.)

Tam said...

The points with the same symbols are points from the same wells. I think the engineering explanation of what happened to the 1979 well is that there was no longer enough pressure in the well to get the oil out. But I'm not really sure.

We used a different method to calculate CGR the next day and the results came out slightly different.

Looking at another chart I have, if you look at all of the wells from the same decade (that is, wells that began producing in the same decade), there is no meaningful correlation between months on and the CGR for any of the decades of wells. I suspect each well has a roughly constant CGR, or at least one that varies only slightly or randomly.
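One way to run the per-decade check described above, sketched in Python. Everything here is an assumption for illustration: the record layout and all numbers are invented (not the real well data), and Pearson's r is my choice of correlation measure, not necessarily what was used. The sketch groups readings by the decade each well began producing, then correlates months on with CGR inside each group.

```python
import math
from collections import defaultdict

# Hypothetical records: (first_production_year, months_on, cgr). Not real data.
records = [
    (1975, 12, 55.0), (1975, 120, 61.0), (1975, 240, 58.0), (1975, 300, 52.0),
    (1988, 24, 40.0), (1988, 96, 38.0), (1988, 200, 43.0),
    (2003, 6, 90.0), (2003, 30, 84.0), (2003, 48, 95.0),
]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / math.sqrt(var_x * var_y)

# Group (months_on, cgr) pairs by the decade the well started producing.
by_decade = defaultdict(list)
for year, months_on, cgr in records:
    by_decade[year // 10 * 10].append((months_on, cgr))

for decade in sorted(by_decade):
    months, cgrs = zip(*by_decade[decade])
    print(f"{decade}s: r = {pearson_r(months, cgrs):+.2f} (n = {len(months)})")
```

With only a handful of points per decade, r swings a lot from small changes in the data, so a value near zero is better read as "no clear trend" than as proof of no relationship.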

The engineer I'm working with wanted to confirm (or not) his hypothesis that the wells of the 2000s, which are horizontal, would have higher CGRs. I don't think the data bears that out very well.

Sally said...

I have no comment on the data, but could not resist pointing out that your chart locates the wells in Ohio. :)

Tam said...

Crap. So much for quality control.

But, you know, everyone knows OK is not a swing state, so it would be pointless to do all that work there.