Benford’s Law and charity data

If, like me, you’ve read recent columns by the fantastic Ben Goldacre and Tim Harford, you’ll have heard of Benford’s Law. This is a strange mathematical rule that applies to many large sets of numbers. You might expect that in a given set of numbers, each of the nine non-zero digits is equally likely to be the first digit of a number. But Benford’s Law (aka the “first-digit law”) says that they aren’t: “1” is the most likely first digit, “2” is the second most likely, and so on down to “9”. The law predicts roughly how often we’d expect to see each digit in first place: it says that 1 is the first digit about 30% of the time, while 9 leads less than 5% of the time. Wikipedia gives a much better explanation than mine!
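For anyone who wants to see the numbers behind the law, here is a minimal sketch in Python. It uses the standard statement of Benford’s Law, that a first digit d turns up with probability log10(1 + 1/d). The function name `benford_expected` is my own choice for illustration and doesn’t come from any library.

```python
import math


def benford_expected(digit: int) -> float:
    """Expected share of values whose first digit is `digit` under Benford's Law."""
    if not 1 <= digit <= 9:
        raise ValueError("first digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)


# Print the expected share for each possible first digit.
for d in range(1, 10):
    print(d, f"{benford_expected(d):.1%}")
```

The nine shares add up to 1, which is a handy sanity check if you adapt the function.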

As I spend my days immersed in data about charities, I thought I’d have a look at whether Benford’s Law works for charity financial data. The data is the right kind of data: it spans several orders of magnitude (charities can have incomes ranging from zero to hundreds of millions of pounds), and we’ve got a lot of it, with nearly 1.3 million data points in the set I used.

I looked at the detailed financial information that larger charities give to the Charity Commission: the “part B” of the annual return. This data contains nearly 40 variables covering aspects of an organisation’s finances (income, expenditure, assets, etc). I took each of these variables in turn and worked out the first digit of the value on each of the 35,000 returned part B forms.
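The first-digit step can be sketched roughly as follows. This is not the code I ran on the part B data, just one way of doing it in Python. The helper names `first_digit` and `first_digit_counts` and the sample figures are made up for illustration. Zeros are skipped, since they have no leading digit.

```python
from collections import Counter


def first_digit(value):
    """Return the leading non-zero digit of a number, or None for zero."""
    # Scientific notation always puts the leading digit first, e.g. 5.12e+05.
    text = f"{abs(value):.15e}"
    digit = int(text[0])
    return digit or None


def first_digit_counts(values):
    """Tally first digits across a list of numbers, ignoring zeros."""
    digits = (first_digit(v) for v in values)
    return Counter(d for d in digits if d is not None)


# Hypothetical figures in pounds, invented purely for this example.
sample_values = [1250, 0, 98000, 512340, 17, 1900000, -430]
counts = first_digit_counts(sample_values)
print(sorted(counts.items()))
```

Taking the absolute value first means negative figures, which can crop up in financial data, are counted by their leading digit too.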

[Chart: benford]

The results are shown in the chart above, and fit Benford’s Law almost perfectly. Each column shows the proportion of values that start with each digit, and the black lines show the expected values using Benford’s Law. I’ve excluded values of zero.
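If you’d rather check the fit with numbers than by eye, one simple option is to subtract the expected share from the observed share for each digit. This is only a sketch of that idea, not the method behind the chart. The function name `benford_gaps` and the tallies below are hypothetical, chosen to sit close to the law.

```python
import math


def benford_gaps(counts):
    """Return observed minus expected share for each first digit 1-9."""
    total = sum(counts.get(d, 0) for d in range(1, 10))
    if total == 0:
        raise ValueError("no non-zero values to compare")
    gaps = {}
    for d in range(1, 10):
        observed = counts.get(d, 0) / total
        expected = math.log10(1 + 1 / d)  # Benford's Law share for digit d
        gaps[d] = observed - expected
    return gaps


# Hypothetical tallies of first digits, invented for illustration only.
example_counts = {1: 301, 2: 176, 3: 125, 4: 97, 5: 79, 6: 67, 7: 58, 8: 51, 9: 46}
gaps = benford_gaps(example_counts)
for d, gap in gaps.items():
    print(d, f"{gap:+.4f}")
```

A positive gap means the digit appears more often than the law predicts, and a negative gap means less often.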

For a couple of variables, notably the “totals” of income and expenditure, the results seem off, with “4” and “5” overrepresented compared to what the law would suggest. This is a result of the thresholds for submitting data: charities with incomes under £500,000 aren’t required to submit the part B, so there is a natural clustering at numbers starting with a 5.

One application of Benford’s Law is using it to help detect fraud – when humans invent numbers they tend to do so in a way that looks random to them, but doesn’t comply with Benford’s Law. This exercise is just a bit of fun, but I’m glad that the results here show that the charity sector isn’t lying in its annual returns on a massive scale – phew!

Seeing Benford’s Law work in practice is quite impressive – it seems incredible when you first hear of it. But I’m glad that after applying it I can have a little more confidence in the data we use to construct the Almanac.


David Kane was formerly NCVO’s Senior Research Officer. He discusses open data and emerging trends in the voluntary and community sector and wider civil society.
