De-duping attachments

Simon Matter simon.matter at invoca.ch
Wed Sep 15 04:01:18 EDT 2010


> On Wed, Sep 15, 2010 at 09:15:13AM +0530, Shuvam Misra wrote:
>> Dear Bron,
>>
>> > http://www.newegg.com/Product/Product.aspx?Item=N82E16822148413
>> >
>> > 2TB - US $109.
>>
>> Don't want to nit-pick here, but the effective price we pay is about
>> ten times this.
>
> Yeah, so?  It's going down.  That's a large number of attachments
> we're talking about there.
>
>> To set up a mail server with a few TB of disk space,
>> we usually land up deploying a separate chassis with RAID controllers
>> and a RAID array, with FC connections from servers, etc, etc.  All this
>> adds up to about $1,000/TB of usable space if you're using something
>> like the "low-end" IBM DS3400 box or Dell/EMC equivalent. This is even
>> with inexpensive 7200RPM SATA-II drives, not 15KRPM SAS drives.
>
> Hmm... our storage units with metadata on SSD come in about $1200/TB.
> Yes, that sounds about right.  That's including hot spares, RAID1 on
> everything (including the SSDs), scads of processor and memory.
> Obviously multiply that by two for replication, and add in a bit of
> extra for backups and I'm happy to arrive at a figure of approximately
> $3000 per terabyte of actual email.
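
The arithmetic quoted above can be written out as a small sketch. The
function name and the 25% backup overhead are assumptions; the source only
says "a bit of extra for backups", and 25% is simply the value that turns
$1200/TB times two into the quoted figure of about $3000.

```python
def cost_per_tb_of_email(base_cost, copies=2, backup_overhead=0.25):
    """Cost per TB of actual email.

    base_cost       -- cost per TB of one storage unit (quoted: $1200)
    copies          -- number of replicated copies (quoted: two)
    backup_overhead -- ASSUMED fraction added for backups; the thread
                       only says "a bit of extra"
    """
    return base_cost * copies * (1 + backup_overhead)


print(cost_per_tb_of_email(1200))
```

With backup_overhead=0 and a base of $1000/TB, the same function gives the
$2000/TB figure for two identical chassis mentioned further down.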
>
>> And most of our customers actually double this cost because they keep
>> two physically identical chassis for redundancy. (We recommend this
>> too, because we can't trust a single RAID 5 array to withstand
>> controller or PSU failures.) In that case, it's $2000/TB.
>
> And because it's nice not to have downtime when you're doing
> maintenance.  I replaced an entire drive unit today, including
> about 4 hours downtime on one of our servers as the system was
> swamped with IO creating new filesystems and initialising the
> drives.   The users didn't see a thing, and replication is now
> fully operational again.
>
>> And you do reach 5-10 TB of email store quite rapidly --- our company
>> has many corporate clients (< 500 email users) whose IMAP store has
>> reached 4TB. No one wants to enforce disk quotas (corporate policy),
>> and most users don't want to delete emails on their own.
>
> So you save, what, 50%.  Does that sound about right?  Do you have
> statistics on how much space you'd save with this theoretical
> patch?
>
>> We keep hearing the logic that storage is cheap, and stories of cloud
>> storage through Amazon, unlimited mailboxes on Gmail, are reinforcing
>> the belief. But at the ground level in mid-market corporate IT budgets,
>> storage costs in data centres (as against inside desktops) are still
>> too high to be trivial, and their prices have only little to do with
>> the prices of raw SATA-II drives. A fully-loaded DS3400 costs a little
>> over $12,000 in India, with a full set of 1TB SATA-II drives from IBM,
>> but even with high cost of IBM drives, the drives themselves contribute
>> less than 30% of the total cost.
>
> You're buying a few months.  Usage grows to fill the available storage,
> whatever it is.  And you can only pull this piece of magic once.
>
>> If we really want to put our collective money where our mouth is, and
>> deliver the storage-is-cheap promise at the ground level, we need to
>> rearchitect every file server and IMAP server to work in map-reduce mode
>> and use disks inside desktops. Anyone game for this project? :)
>
> You could buy as much benefit much more quickly by gzipping the
> individual email files.  Either a filesystem that stores files
> compressed, or a cyrus patch to do that and unpack files on the
> fly if the body was read.  Along with most/all headers in the

I guess a filesystem or disk storage layer that both compresses and
de-dupes would be much more efficient in this case than one that only
compresses. Has anyone tried this with a Cyrus message store holding lots
of "corporate message data"?

Simon



More information about the Info-cyrus mailing list