Tuesday, January 15, 2013

Benford on China

I'm picking up on some research scouted out by some folks at ANZ, which was picked up by Bloomberg News and then picked up again by Kate Mackenzie over at Alphaville! Am I down the pecking order or what?

As someone who deals with far too much data on a daily basis, let me unequivocally state that China-related data IS NOT very friendly. So hats off to Liu and Lam for actually testing this out.

It's quirky, so don't read too much into it, and please bear with me while I give you my own short primer on Benford.

Named after Frank Benford, an American physicist and engineer, the law basically relates to the frequency distribution of digits in data. So what you have is the probability with which a digit x will appear. Since we're dealing with digits, x belongs to {1, 2, ..., 9} or {0, 1, 2, ..., 9} - more on this later.

What does it tell us specifically? Essentially (this is the first-digit law), that in a set of data, larger leading digits occur less frequently. 1 occurs the most frequently and 9 the least. The difference between the first-digit and second-digit laws is that the second digit can also be 0. Mathematically, a data set satisfies Benford's law if the leading digit x, where x belongs to {1, 2, ..., 9}, occurs with probability given by

P(x) = log (1 + 1/x) 

Logs here are to the base 10, but the law works in any base b, with x running from 1 to b-1 and the log taken to base b. Think of a binary system (b=2), where the result is trivial because all numbers except 0 start with 1!

Fairly simple, isn't it?

So for x = 1, 
P(1) = log 2 = 0.301, 
x = 2,
P(2) = log(1.5) = 0.176
x = 9, 
P(9) = log(10/9) = 0.0458
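
If you'd rather not reach for a calculator, here is a minimal Python sketch that tabulates the whole first-digit distribution. The function name benford_first_digit and the base parameter are my own choices for illustration, nothing official.

```python
import math

def benford_first_digit(d, base=10):
    """Probability that the leading digit equals d under Benford's law.

    Illustrative helper: valid for d in 1..base-1.
    """
    return math.log(1 + 1 / d, base)

# Full first-digit table for base 10.
first_digit_probs = {d: benford_first_digit(d) for d in range(1, 10)}

for d, p in first_digit_probs.items():
    print(d, round(p, 4))

# The nine probabilities should add up to 1 (up to floating-point error).
print("total:", round(sum(first_digit_probs.values()), 6))
```

Calling benford_first_digit(1, base=2) gives the trivial binary case mentioned above: the only possible leading digit is 1, so its probability is 1.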

If you'll notice, because P(x) is also log(x+1) - log(x), the probabilities are proportional to the distance between x and x+1 on a logarithmic scale. Most importantly, this is the distribution you would expect if it is not the numbers themselves but the fractional parts (the mantissas) of their logs that are uniformly distributed.
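
One quick way to see the log-uniformity idea at work is with powers of 2. This is my own example, not something from the ANZ note: log10(2^n) is just n times log10(2), and since log10(2) is irrational, the fractional parts of those logs spread out evenly over [0, 1). So the leading digits of 2^n should track Benford closely. A rough sketch:

```python
import math
from collections import Counter

N = 1000  # number of powers of 2 to look at (arbitrary choice)

# Leading digit of 2**0, 2**1, ..., 2**(N-1), read off the decimal string.
leading = Counter(int(str(2 ** n)[0]) for n in range(N))

observed = {d: leading[d] / N for d in range(1, 10)}
expected = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Print digit, observed share, Benford share side by side.
for d in range(1, 10):
    print(d, f"{observed[d]:.3f}", f"{expected[d]:.3f}")
```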

Before you dismiss this as trivial: naturally generated data, especially data spanning several orders of magnitude, will roughly follow this pattern - test a set out for yourself. Greece apparently failed the Benford test, as did Madoff. So...

Anyways, this primer's been going on too long. In a nutshell, one can use the second-digit law (which includes 0 in the set and hence has a much smoother frequency distribution) to examine data sets. Logically, the first digit is far too significant to be tampered with, so looking at the second digit makes obvious sense.
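
For the curious, the second-digit probabilities come from summing the two-digit version of the formula over every possible first digit k: P(second digit = d) is the sum over k = 1..9 of log(1 + 1/(10k + d)). A small sketch (the function name is mine) shows just how flat the result is compared with the first-digit table:

```python
import math

def benford_second_digit(d):
    """Probability that the second significant digit equals d (d in 0..9).

    Sums the joint probability of leading digits 'k d' over all first digits k.
    """
    return sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))

second_digit_probs = {d: benford_second_digit(d) for d in range(10)}

for d, p in second_digit_probs.items():
    print(d, round(p, 4))
```

Zero comes out most likely at roughly 12% and 9 least likely at roughly 8.5%, so the spread is a few percentage points rather than the 30%-to-4.6% range of the first digit.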

Two economists at ANZ did this with Chinese data and found that the nominal GDP data largely conformed, as did the industrial production (IP) data (these are large numbers). However, the moment they got into rates/percentages, things went awry. Think growth rates, inflation etc. The guilty second digit showed up: zero occurred far too frequently for Benford's liking, as did 1, 2, 3 and 4, while 7, 8 and 9 appeared less often than expected.

Hold your horses though! How big was the sample? Quarterly growth rates from '91 to '12, so that's just under 90 observations (22 years x 4 quarters = 88 at most), which is decent enough.
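
If you want to try this on a series of your own, here is a rough sketch of one way to do it: a chi-square goodness-of-fit test on second digits. To be clear, I don't know which test Liu and Lam actually ran, so treat this as my assumption of a reasonable approach. The helper names are made up, and you supply your own values.

```python
import math
from collections import Counter

def second_digit(x):
    """Second significant digit of a nonzero number, e.g. 7.8 -> 8, 0.052 -> 2."""
    # Scientific notation looks like 'd.ddd...e+XX'; index 2 is the second digit.
    return int(f"{abs(x):.15e}"[2])

def expected_second_digit(d):
    """Benford probability that the second significant digit equals d."""
    return sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))

def chi_square_second_digit(values):
    """Chi-square statistic of observed second digits against Benford's law."""
    digits = [second_digit(v) for v in values if v != 0]  # zeros have no digits
    n = len(digits)
    counts = Counter(digits)
    stat = 0.0
    for d in range(10):
        exp = n * expected_second_digit(d)
        stat += (counts.get(d, 0) - exp) ** 2 / exp
    return stat
```

With ten digit categories there are 9 degrees of freedom, so a statistic above roughly 16.92 would reject conformity at the 5% level. At around 88 observations the expected count per digit is between about 7 and 11, which is on the thin side but workable for this kind of test.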

As everyone will readily admit, however, statistical evidence of a deviation (significant or not!) doesn't necessarily mean the data and numbers have been tinkered with. But it is a bit odd, because the raw and nominal production/growth numbers tell a different, more standard and expected story.
