Design for max hash size given N-digit numerical input and collision related target -
assume hacker obtains data set of stored hashes, salts, pepper, , algorithm , has access unlimited computing resources. wish determine max hash size certainty of determining original input string nominally equal target certainty percentage.
constraints:
the input string limited 8 numeric characters uniformly distributed. there no inter-digit relation such checksum digit.
the target nominal certainty percentage 1%.
assume hashing function uniform.
what maximum hash size in bytes there nominally 100 (i.e. 1% certainty) 8-digit values compute same hash? should possible generalize n numerical digits , x% accepted answer.
please include whether there issues using first n bytes of standard 20 byte sha1 acceptable implementation.
it recognized this approach increase susceptibility brute force attack increasing possible "correct" answers there design trade off , additional measures may required (time delays, multiple validation stages, etc).
it appears want ensure collisions, idea if hacker obtained everything, such it's assumed can brute force hashed values, not end original values, set of possible original values each hashed value.
you achieve executing precursor step before normal cryptographic hashing. precursor step folds set of possible values smaller set of possible values. can accomplished variety of means. basically, applying initial hash function on input values. using modulo arithmetic described below simple variety of hash function. other types of hash functions used.
if have 8 digit original strings, there 100,000,000 possible values: 00000000 - 99999999. ensure 100 original values hash same thing, need map them space of 1,000,000 values. simplest way convert strings integers, perform modulo 1,000,000 operation , convert string. having done following values hash same bucket: 00000000, 01000000, 02000000, ....
the problem hacker not know 100 values hashed value be, know surety 6 of 8 digits are. if real life variability of digits in actual values being hashed not uniform on positions, hacker use around you're trying do.
because of that, better choose modulo value such full range of digits represented evenly every character position within set of values map same hashed value.
if different regions of original string have more variability other regions, want adjust that, since static regions easier guess anyway. part hacker want highly variable part can't guess. breaking 8 digits regions, can perform pre-hash separately on each region, modulo values chosen vary degree of collisions per region.
as example break 8 digits 000-000-00. prehash convert each region separate value, perform modulo, on each, concatenate them 8 digit string, , normal hashing on that. in example, given input of "12345678", 123 % 139, 456 % 149, , 78 % 47 produces 123 009 31. there 139*149*47 = 973,417 possible results pre-hash. so, there 103 original values map each output value. give idea of how ends working, following 3 digit original values in first region map same value of 000: 000, 139, 278, 417, 556, 695, 834, 973. made on fly example, i'm not recommending these choices of regions , modulo values.
if hacker got everything, including source code, , brute forced all, end values produced pre-hash. particular hashed value, know that 1 of around 100 possible values. know possible values, wouldn't know of original value produced hashed value.
you should think hard before going route. i'm wary of departs standard, accepted cryptographic recommendations.
Comments
Post a Comment