Botched attempt to scrub data reveals driver details for 173 million taxi trips.
City officials released the data in response to a public records request and specifically obscured the drivers' hack license numbers and medallion numbers. Rather than including those numbers in plaintext, the 20-gigabyte file contained one-way cryptographic hashes using the MD5 algorithm. Instead of a record showing medallion number 9Y99 or hack number 5296319, for example, the file showed only the MD5 hash of each value.
It turns out there's a significant flaw in the approach. Because both the medallion and hack numbers are structured in predictable patterns, it was trivial to run all possible iterations through the same MD5 algorithm and then compare the output to the data contained in the 20GB file. Software developer Vijay Pandurangan did just that, and in less than two hours he had completely de-anonymized all 173 million entries.
Taxi license numbers are always six-digit numbers or seven-digit numbers that begin with a five. That makes for a maximum of two million possible numbers, a sum that takes a matter of seconds to exhaust using programming rules built into cracking apps such as Hashcat. Medallion numbers similarly conform to specific patterns that make for a total of only 22 million possible combinations.
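To make the attack described above concrete, here is a minimal Python sketch of the hack license case. It is not the code Pandurangan used. The function names are made up for illustration, and it assumes that six-digit numbers may have leading zeros and that each number was hashed as a plain ASCII string with no padding or salt. The sketch walks all two million candidates once and keeps any whose MD5 hash appears in the set of hashes being looked for.

```python
import hashlib
from typing import Dict, Iterable, Iterator


def hack_license_candidates() -> Iterator[str]:
    """Yield every candidate hack license number (hypothetical helper)."""
    # Assumption: six-digit numbers may carry leading zeros (000000-999999).
    for n in range(1_000_000):
        yield f"{n:06d}"
    # Seven-digit numbers that begin with a five (5000000-5999999).
    for n in range(5_000_000, 6_000_000):
        yield str(n)


def md5_hex(value: str) -> str:
    """Return the MD5 hex digest of an ASCII string."""
    # Assumption: the released file hashed the bare number as text.
    return hashlib.md5(value.encode("ascii")).hexdigest()


def recover(hashes: Iterable[str]) -> Dict[str, str]:
    """Map each hash found in the keyspace back to its original number."""
    wanted = set(hashes)
    found: Dict[str, str] = {}
    for candidate in hack_license_candidates():
        digest = md5_hex(candidate)
        if digest in wanted:
            found[digest] = candidate
    return found


if __name__ == "__main__":
    # Hash a known number ourselves, then recover it by exhaustive search.
    sample = md5_hex("5296319")
    print(recover([sample])[sample])  # prints 5296319
```

Because the keyspace is so small, one pass is enough to reverse every hack license hash in the file at once: collect the distinct hashes into the set, then run the loop. The same idea would apply to the 22 million medallion patterns, with a different candidate generator.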