Don't search databases that are too big!

When you search against a database, how big is that database? Do you always search really big ones, or small local ones? If it depends, what does it depend on?

In this project, I worked with a mathematics undergraduate student at Indiana University to ask the question of whether bigger is always better. In principle, bigger seems better, because it increases the chances that your suspect will be in the database. However, as databases have been getting bigger, examiners are noticing that the number of close non-mated impressions is increasing.

We modeled this effect, looking at a tradeoff that can occur between two forces:

1. Larger databases increase the chances of containing the suspect, and this goes up linearly with the size of the database (i.e., a database that is twice as big is twice as likely to contain the suspect, all other things being equal).
2. Larger databases increase the chance of a close non-match, which in the extreme could lead to an erroneous identification, and which certainly decreases productivity as you work to eliminate the close non-match from contention. This effect also grows with the size of the database, but in complex ways that we ended up modeling using extreme value statistics, which describe how the number of close non-matching prints increases as the database grows.
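
To make the two forces concrete, here is a minimal sketch in Python. It is not the model from the paper: it assumes, purely for illustration, that non-mated comparison scores are independent draws from a standard normal distribution and that the suspect is equally likely to be anyone in a hypothetical pool of `population` people. The function names and all numbers are made up for the example.

```python
from statistics import NormalDist

# Assumption: non-mated comparison scores follow a standard normal distribution.
STD_NORMAL = NormalDist()


def p_contains_suspect(db_size: int, population: int) -> float:
    """Chance the suspect is in the database; grows linearly with its size."""
    return min(1.0, db_size / population)


def median_top_nonmate_score(db_size: int) -> float:
    """Median of the highest score among db_size independent non-mated prints.

    The maximum of n i.i.d. scores has CDF F(x) ** n, so its median is the
    point where F(x) equals 0.5 ** (1 / n).
    """
    return STD_NORMAL.inv_cdf(0.5 ** (1.0 / db_size))


if __name__ == "__main__":
    population = 10_000_000  # hypothetical pool of possible suspects
    for db_size in (1_000, 10_000, 100_000, 1_000_000):
        print(
            db_size,
            p_contains_suspect(db_size, population),
            round(median_top_nonmate_score(db_size), 2),
        )
```

Under these assumptions the first column of results doubles whenever the database doubles, while the best non-mated score keeps creeping upward with database size: the closest non-match in a big database looks more and more like a real match.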

There are a bunch of assumptions in the model, and a bunch of different conclusions. However, here is one surprising result: bigger is not always better. The graph above shows sensitivity, which captures the tradeoff between finding the correct suspect and being fooled by a close non-match. Sensitivity grows at first as the database gets larger, but eventually it levels off and then starts to drop.
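
A rise-then-fall pattern like this can be reproduced with a toy calculation. The sketch below is not the sensitivity measure used in the paper. It uses a hypothetical stand-in index, the probability of a correct hit minus the probability that at least one non-mated print scores above a decision threshold, and it assumes standard normal non-mate scores and made-up values for `population`, `hit_rate`, and `threshold`.

```python
from statistics import NormalDist


def toy_index(db_size, population=10_000, hit_rate=0.9, threshold=4.5):
    """Hypothetical index: P(correct hit) minus P(some non-mate beats threshold)."""
    # Chance the suspect is in the database at all (linear, capped at 1).
    p_in_db = min(1.0, db_size / population)
    p_hit = p_in_db * hit_rate
    # Chance that none of the db_size non-mated prints exceeds the threshold,
    # assuming independent standard normal scores.
    p_all_below = NormalDist().cdf(threshold) ** db_size
    p_false_alarm = 1.0 - p_all_below
    return p_hit - p_false_alarm


if __name__ == "__main__":
    for db_size in (1_000, 10_000, 100_000, 1_000_000):
        print(db_size, round(toy_index(db_size), 3))
```

With these made-up numbers the index climbs until the database covers the pool of plausible suspects, and after that every additional record only adds chances for a close non-match, so the index falls.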

The whole endeavor is complicated by the fact that some crime is geographic in nature (e.g., property and person crime), while other crime is not (e.g., financial and internet crime). So a complete analysis requires some thought about whether the crime you are investigating is likely to be geographic in nature (see, not all external information is biasing, and some of it is actually important).

The bottom line, I think, is this:

Only search the database that you can reasonably expect to contain your suspect.

If you go larger, you really run the risk of being fooled by a close non-match.

You can read more by downloading the [pdf](../../publications/PDFS/Law, Probability and Risk-2014-Busey-151-68.pdf).

