FastMail.FM patchset - new patches

Rob Mueller robm at fastmail.fm
Thu Mar 15 07:52:38 EST 2007


>   Is it safe? - we calulated that with one billion messages you have a one
>   in 1 billion chance of a birthday collision (two random messages with
>   the same UUID). They then have to get in the same MAILBOXES collection
>   to sync_client to affect each other anyway.
>
> Isn't the case: UUIDs span all MAILBOXES and APPEND event until a restart. 
> If a UUID appears in one event and then is referenced by a second event 
> some minutes later then the first message seen will be reused.

Well, it's half true.

> They then have to get in the same MAILBOXES collection
>   to sync_client to affect each other anyway.

May not be true, but:

>   Is it safe? - we calulated that with one billion messages you have a one
>   in 1 billion chance of a birthday collision (two random messages with
>   the same UUID).

Is true.

Basically the chance of collision is the same as looking at the birthday 
problem.

http://en.wikipedia.org/wiki/Birthday_paradox

md5 distributes an input set over a uniform distribution of 2^128 numbers. 
By truncating to 11 bytes, we're distributing evenly over 2^88 different 
numbers.

So from the page above "given n random integers drawn from a discrete 
uniform distribution with range [1,d], what is the probability p(n;d) that 
at least two numbers are the same?", we have an approximation formula given.

Lets try some different message counts:

$ perl -e 'for ($n=10**5;$n<=10**9;$n*=10) { $d=2**(8*11); print $n, " - ", 
1-exp(-$n*($n-1)/(2*$d)),"\n"; }'
100000 - 0
1000000 - 1.66533453693773e-15
10000000 - 1.6153745008296e-13
100000000 - 1.61558544320428e-11
1000000000 - 1.61558710853882e-09

So even with 1 billion messages, the chance of a *collision* is still about 
1 in a billion.

Most of our stores have on the order of 1-10 million messages, I doubt 
anyone has a billion messages making the chance of collision much, much 
smaller again.

Rob



More information about the Info-cyrus mailing list