Benford’s Law compares the frequency distribution of leading digits to its (empirically proven) natural counterpart. This can then be used to detect fraud and errors.
The green columns show how often each number should be the first digit in numbers that should follow the Benford-distribution. In black you’ll see the actual distribution of first digits within my table. Lastly, the red line shows the percental absolute deviations between actual and Benford values.
In this example, there is a relatively high occurrence of numbers starting with 4 and 5. So this could be a sign for fraudulent manipulations.
First you need a table with the Benford-distribution. Just load it as a disconnected table to your data model and name it “BenfordTable”. The “Value”-field from this table will be taken as x-axis for the visualisations. As the Benford-distribution is logarithmic, it can quickly be created with the following DAX-code:
BenfordTable = ADDCOLUMNS ( GENERATESERIES ( 1, 9 ), "BenfordDistribution", LOG10 ( 1 + ( 1 / [Value] ) ) ) |
The measure calculates how often a number starts with one of the BenfordNumbers (1..9) compared to the total number of rows in the FactTable.
BenfordMeasure = VAR BenfordNumber = MAX ( BenfordTable[Value] ) VAR CountBenfordNumber = CALCULATE ( COUNTROWS ( FactTable ), LEFT ( FactTable[Value], 1 ) * 1 = BenfordNumber ) VAR CountTotal = COUNTROWS ( FactTable ) RETURN DIVIDE ( CountBenfordNumber, CountTotal ) |
If there are blank in the Value-column of the table to be analyzed, the measure has to be adjusted by filtering them out in the VAR CountTotal: ( CALCULATE(COUNTROWS(FactTable), FactTable[Value] <> BLANK()) )
Please note that you can create as many measures as you need in one model. So if you have multiple columns to investigate, just write a measure for each.
To calculate the difference between target and actual, I use a MAXX-aggregation. This returns the maximum difference there is for a number. I also use this in a card visual if I want to add a data driven alert. So I don’t have to check and eyeball the chart regularly, but can just use this in a card visual in a dashboard. Then I’ll set a threshold value for the alert and will not miss any alarming developments.
Deviation % = MAXX ( ADDCOLUMNS ( BenfordTable, "Diff", ABS ( [BenfordMeasure] - BenfordTable[BenfordDistribution] ) / BenfordTable[BenfordDistribution] ), [Diff] ) |
I’m using measures here instead of a calculated column (in the Benford-Table) because this allows me to filter and slice my table. This allows for making advanced and flexible analysis like comparing different values against each other or over time against the Benford distribution: