FastMail.FM patchset - new patches
Rob Mueller
robm at fastmail.fm
Thu Mar 15 07:52:38 EST 2007
> Is it safe? - we calulated that with one billion messages you have a one
> in 1 billion chance of a birthday collision (two random messages with
> the same UUID). They then have to get in the same MAILBOXES collection
> to sync_client to affect each other anyway.
>
> Isn't the case: UUIDs span all MAILBOXES and APPEND event until a restart.
> If a UUID appears in one event and then is referenced by a second event
> some minutes later then the first message seen will be reused.
Well, it's half true.
> They then have to get in the same MAILBOXES collection
> to sync_client to affect each other anyway.
May not be true, but:
> Is it safe? - we calulated that with one billion messages you have a one
> in 1 billion chance of a birthday collision (two random messages with
> the same UUID).
Is true.
Basically the chance of collision is the same as looking at the birthday
problem.
http://en.wikipedia.org/wiki/Birthday_paradox
md5 distributes an input set over a uniform distribution of 2^128 numbers.
By truncating to 11 bytes, we're distributing evenly over 2^88 different
numbers.
So from the page above "given n random integers drawn from a discrete
uniform distribution with range [1,d], what is the probability p(n;d) that
at least two numbers are the same?", we have an approximation formula given.
Lets try some different message counts:
$ perl -e 'for ($n=10**5;$n<=10**9;$n*=10) { $d=2**(8*11); print $n, " - ",
1-exp(-$n*($n-1)/(2*$d)),"\n"; }'
100000 - 0
1000000 - 1.66533453693773e-15
10000000 - 1.6153745008296e-13
100000000 - 1.61558544320428e-11
1000000000 - 1.61558710853882e-09
So even with 1 billion messages, the chance of a *collision* is still about
1 in a billion.
Most of our stores have on the order of 1-10 million messages, I doubt
anyone has a billion messages making the chance of collision much, much
smaller again.
Rob
More information about the Info-cyrus
mailing list