FOSDEM Report - Saturday

Tue Feb 7 08:27:59 EST 2012

(I'm copying this to the cyrus-devel list because it's of interest
 to Cyrus people too...)

This is getting really long, and I want a break from typing - so here's
the Saturday part of my FOSDEM report.  I'll type up Sunday tomorrow.

FOSDEM 2012 REPORT
==================

My main goal was to meet other people working in email related areas
and discuss a replacement for IMAP.

tl;dr - it was hard to write, it should be hard to read.

Seriously though, skim for the ============= bits and check if the
headline sounds interesting.  I've tried to keep things grouped by
talk/topic.

Saturday is mainly email related stuff.

Sunday I alternated between MySQL talks, saw some good stuff about
monitoring, and went to a bit about fostering communities.

Monday is pretty short, and all Cyrus.

Saturday 10:30 - Welcome:
=========================

http://fosdem.org/2012/schedule/event/keynotes_welcome

Fosdem has been hosted at the same place (ULB) for 12 years now - there
were some rocky periods, but the university is now a strong supporter
of the conference and what it stands for.  There was a talk by the
university administration about how glad they are to have it there.

Paraphrasing slightly:

"Today, every University pays public money to private corporations for
 the fruits of research which was performed in universities in the
 first place - money which could be used for new research"

There are 420 talks, 273 hours of scheduled content.  You can't see it
all!  As much as possible will be videoed.  Call for volunteers to help
with the filming.

This year there was an extra building and an extra large room, making two huge
lecture theatres.  I heard many people saying how much less crowded it
felt than last year.  There was also a second coffee bar open, which
had espresso!  <==== Cosimo, see here.

And thanks to all the sponsors, w00t.

Saturday 11:00 -> 2:20 - "Mail Track"
=====================================

The "mail track" - in a coffee shop and then over lunch ;)
I met Jeroen from Kolab and Alexey from Isode.  We sat and talked for a
while about requirements for an IMAP replacement, and also where we're
planning to take the Cyrus development process.

Cyrus process:

- Gerrit?  Some sort of code review process to make it easier to keep
  track of the work from drive-by contributors.
- Jenkins and Cassandane.  Thanks Greg, awesome to have this happening.
- Bugzilla - use it for everything.  If it doesn't have a bug number where
  discussion took place, it doesn't get accepted.  This is a major workflow
  change, and is probably more of a challenge for me than anyone else.  Need
  to talk to Ken as well.
- Need to clean up bugzilla.  First anything with 3 years of no activity
  to be closed with a special tag set so we can find them quickly again.
  Then commit to keeping it tidy.
- Websites in git.
- Release process - simplify to save the repeated typing involved.  I wind
  up writing the changelog, the website release note and the email release
  note, plus manually changing a bunch of things in the website PHP every
  time I do a release.
- Jeroen to do all future releases on 2.4, and handle the backport/releng
  more.  It's supposed to be his role after all, I just stole it.

New Features - Conversations, XMOVE, etc:

- Alexey is willing to help us with the standardisation process if we want to
  push for conversations to become more standard.
- Possibly standardise some media-type for the calendaring stuff for Kolab
  (see below) - and for the DAV stuff from Ken too.  More NEED TO TALK TO
  KEN :)

Special-Use:

- Long discussion here.  Kolab currently store a custom "usage" annotation,
  but no other client knows to look for it.  There are two axes:
  - 1) "this is MY \Sent folder"
  - 2) "this is a \Calendar folder - it contains calendar entries as
       encoded attachments.  If I share it, other users should see that"
- so we probably have to extend special use for private and shared, and make
  sure there's a defined priority order if both exist.  Also, there may be
  more than one "special use" on the same folder - both \Calendar and a
  personal \DefaultCalendar (or something)

Attachment De-duplication:

- it's a stinky wormhole of trading off massive access complexity and
  performance risks for a bit of disk space saving.  Don't go there.

E-Discovery, deletion controls:

- Kolab are planning to use the "msg bus" stuff from Worldline to have a
  listener that collects data for e-discovery.  Kind of a "cyrus watcher".

ActiveSync:

- MetaWays have a very good open-source ActiveSync stack:
  http://www.h-online.com/open/news/item/Tine-2-0-supports-ActiveSync-740315.html

Spam Reporting via IMAP:

- Alexey mentioned that there's talk of adding a command to IMAP to report
  a message as spam/non-spam rather than setting flags.  This would be used
  to actually take action based on the report.
- Google and Yahoo are both involved in this effort.

Community:
- there is at least a community of IMAP Server implementors.  There isn't
  really one for IMAP Client implementors.  They just do their own thing,
  often just by looking at protocol traces, certainly not bothering to
  understand the entire RFC stack.

Saturday 2:20 - Dovecot (Timo Sirainen)
=======================

http://fosdem.org/2012/schedule/event/dovecot

- Timo wrote irssi (cheers from the packed theatre) in 1999, then started
  making Dovecot in 2002.
- In 2006, full time Dovecot
- In 2011, started a company to support dovecot.

Dovecot is an IMAP/POP3/LMTP server, not an SMTP server.  That's out of scope.

It's very flexible, you can create plugins to do all sorts of stuff.  Demo
later.

Dovecot can be used as an "IMAP Adaptor"/proxy - it supports all sorts of
different backends.  Some examples: XMLMAP - IMAP syntax wrapped in XML.

"dmh" - dovecot implementation of "mh" using the admin commands.  109 line
shell script which replicates the old "mh" interface.  Full UTF-8 support.

Gmail backend - uses your gmail account as a 'mailbox'.
- not offline - needs a connection.

Dovecot can also be used as a library for a client:

* indexed mailboxes
* cached headers/metadata
* full text search
* complex search queries
* parsing and decoding of messages

Filter scripts - demoed an "auto-gpg-decrypt" filter, which decrypted the body
when it was fetched.

Demoed the dmh client talking to gmail (somebody in the audience sent a hello
email during the talk!)

Questions:
- disconnected imap?  No, gmail connector needs a link
- migration?  (plug in as a proxy, full sync, disconnect) - not really yet.
- replication? (there's dsync, dovecot synchroniser)
  - one user at a time
  - not real-time automated yet
- full text - is there a standard?
  - Timo helped write the "Fuzzy Search" extension to IMAP.

===> by now the talk had gone out into the hallway, and became a repeat of
the "mail track" stuff from before.  We had all those people, plus Jan from Trojita
and of course Timo from Dovecot.

Lots more talk about replacing IMAP.  We decided we needed another client on
board, so took ourselves off to:

Saturday 3:30 - Thunderbird (Ludovic Hirlimann)
===========================

http://fosdem.org/2012/schedule/event/thunderbird

Started with "seems like every year I get up here and promise stuff, and then
the next year I get up here and say - didn't do X, Y, Z, but we had other cool
stuff" - so I'm not going to promise much.

Versions:
 3.0 => 3.1 => 5, 6, 7, 8, 9, and now 10.

3.0 had lots of new stuff, lots of issues.  3.1 was mostly fixing bugs or
reverting behaviour from 3.0.

5=>9 no real user visible changes.  Plugin authors saw stuff, but the interface
has stayed the same.  Email users, even more than web browser users, really
care about the interface being the same (Alexey complained to me that they
took away "forward as attachment" - and now you can't report issues with the
mime structure of a message from TB any more).

V 10:

- much better attachment handling - can do it all from the keyboard now.
- "search for selection" - select text, right click, open a new tab with
  a google search from within Thunderbird.

Lots of stuff has been experimented with in the last year, but not much
actually delivered in builds.

Things underway:
- big files - basically upload large attachments to a 3rd party provider and
  send a link rather than attaching to the email.  Personally I think this is
  a stupid idea, because you lose the attachment right when the third party
  goes offline.  *sigh*.  Anyway.
- IM integration - chat, tweet, etc from within Thunderbird.  Basically
  integrate code from the Instantbird project.
- Address book

Address book is a big one:
- unchanged since 1998.
- redesign is being discussed in the open
- unlikely to be finished this year
- need to integrate with local address book on each platform (win, mac, android)
- people have a lot more email addresses, chat systems, multiple phone numbers,
  etc.  Need to support that in the data model.

Enterprise:

- lots of enterprise users.  First thing they do is change it so it doesn't
  report usage numbers, but the estimate is that there are as many Enterprise locked
  down users in France as there are home users.
- Need to support their IT departments as well.
- they have different needs
- ESR - enterprise long term support releases of Thunderbird along with Firefox
- need to move them off 2.0/3.1 at some point
- separate mailing list for enterprise issues, and there is a "deployment
  guide" now to help Enterprises

Team:

- Thunderbird is ~15 people, 4-5 of them part-time.
- want to encourage more community development
- setting up an "up for grabs" list of small issues for people to work on
- make it easier to join and code

Questions:

- 2 years ago you seemed quite depressed standing up here - small team, no
  real direction.  Is it better now?  Answer: Yes.  Much better.  Now we're
  part of the same organisation grouping as Firefox rather than off to the
  side.  Better support.

I grabbed Ludovic for a few minutes afterwards and outlined our plans for
a new mail protocol.  He will raise it at their team meeting next week, and
start a discussion about it on their mailing lists (this is already done, I
have joined the tb-planning mailing list)

=========================================================================

At this point, it was time to put mail away for a bit.  I promised to write
an email to all interested parties that night, and went to see some other
talks!

=========================================================================

Saturday 16:00 - Infrastructure as Open Source: (Ryan Lane)
===============================================

I arrived half way through this talk.  It's about how Wikimedia run their
servers.  I think Greg went to much the same talk in Australia not long
ago, and wrote up pretty good notes.  I'll just do some highlights.

- automated code review to production (gerrit)
- puppet configs build nagios tests - when they add a new machine, there's
  already a warning that it's not up before the build finishes!

The operations bottleneck shifts a bit - to educating people on how to do
things for themselves, and to code review.  Also, giving root is dangerous,
there's a lot of monitoring to make sure bad actors aren't sneaking stuff
on the systems.

RISK: easy to social engineer.  Need to be on the lookout:
- audit infrastructure, make sure nothing unexpected is running
- look for patterns in:
  * bandwidth use
  * disk space use
  * protocols being used
- mostly trust up front, but watch - and inspect if something looks "strange"

The checking part can be outsourced to trusted people, even if they aren't
experienced programmers/sysadmins.

Questions:
- Why GlusterFS, not Sheepdog or Ceph?  It was the most stable/feature
  complete, and it worked.
- Why Gerrit?  It supported an automatic "gated trunk"; Phabricator (for
  example) did not.

Saturday 17:30 - Btrfs and Snapper: (Arvin Schnell)
===================================

http://fosdem.org/2012/schedule/event/btrfs_snapper

Arvin works for SuSE - Snapper is a snapshot manager over the top of btrfs
(or ext4, but it needs kernel patches)

Background on features of btrfs:

- COW (data & meta)
- checksums, scrubbing
- multi device support
- online resize and defrag
- subvolumes/snapshots
- SSD/TRIM support
- In-place conversion from ext3/4.

Multi-device is not whole-disk mirroring: the devices can be different sizes.
It just makes sure each block exists on n drives.
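A toy model of why mixed-size devices work: each chunk only needs its n copies on n different devices. This Python sketch assumes a simple greedy rule that puts each copy on the devices with the most free space left; it's a simplified illustration, not btrfs's actual chunk allocator, and the device names and sizes are made up.

```python
def allocate(free: dict[str, int], chunk: int, copies: int) -> list[str]:
    """Place one chunk on `copies` distinct devices and update their free space."""
    # Only devices that can still hold a whole chunk, most free space first.
    candidates = sorted(
        (dev for dev, space in free.items() if space >= chunk),
        key=lambda dev: free[dev],
        reverse=True,
    )
    if len(candidates) < copies:
        raise RuntimeError("not enough devices with room for another chunk")
    chosen = candidates[:copies]
    for dev in chosen:
        free[dev] -= chunk
    return chosen


free = {"sda": 1000, "sdb": 500, "sdc": 500}  # made-up sizes
stored = 0
while True:
    try:
        allocate(free, chunk=100, copies=2)
    except RuntimeError:
        break
    stored += 100
print(stored, free)  # 1000 {'sda': 0, 'sdb': 0, 'sdc': 0}
```

With one big drive and two half-size ones, the two small drives take turns pairing with the big one, so all three fill up completely and half the raw space holds data.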

TRIM can be on-the-fly or batched cleanup

In-place conversion is done as a snapshot, so you can always revert if
you're not happy with btrfs!

btrfs roadmap:
- send-receive tool
- tiered storage
- data dedup
- just don't ask me about fsck.btrfs!  I don't know when it will be done.

Q: encryption?
- not planned

Subvolumes:
- separate internal FS root
- shares the same "storage pool"
- looks like a directory in the root of the FS when mounted regularly
- can mount any snapshot separately anywhere
- very cheap to create - one meta-data write only, regardless of the
  amount of data on the FS.
- can be read-write or read only

DEMO:
- created a FS, created a snapshot, showed mounting just the snapshot.
- snapshot is not recursive, showed that an empty directory existed where
  the old snapshot was mounted.

Two ways to "restore" from a snapshot:
- 1) reboot, mount old snapshot with an explicit fstab modification
- 2) copy files from old snapshot to "current"

Reboot is very fast (except the actual rebooting), but is full system only.

Copy can be a bit slow, but lets you choose which files to recover.  Can use
clone, which is faster.  Clone support is integrated in regular shell 'cp'
now (on OpenSuSE at least)

Snapshots:

- YAST takes snapshots before and after making any change.
- automatic snapshots every hour
- cleanup by cron - first snapshot of every day/week/month/year kept.
- filter: don't revert /var/log, /var/spool, /srv, /etc/mtab, etc.
- /boot is not on btrfs, need to be a bit careful on kernel upgrades.
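The timeline cleanup rule (first snapshot of every day/week/month/year kept) can be sketched like this in Python. The LIMITS numbers are made up; snapper has its own per-config limits on how many of each to keep, and this is not its actual code.

```python
from datetime import datetime, timedelta

# Made-up limits: how many of the most recent days/weeks/months/years to keep.
LIMITS = {"day": 7, "week": 4, "month": 12, "year": 5}

# How each snapshot time is bucketed.
BUCKETS = {
    "day": lambda t: t.date(),
    "week": lambda t: tuple(t.isocalendar())[:2],  # (ISO year, ISO week)
    "month": lambda t: (t.year, t.month),
    "year": lambda t: t.year,
}


def snapshots_to_keep(times: list[datetime], limits=LIMITS) -> set[datetime]:
    """Keep the first snapshot in each of the newest N buckets of each kind."""
    keep: set[datetime] = set()
    for kind, bucket in BUCKETS.items():
        firsts: dict = {}
        for t in sorted(times):  # oldest first, so the first snapshot in a bucket wins
            firsts.setdefault(bucket(t), t)
        newest = sorted(firsts, reverse=True)[: limits[kind]]
        keep.update(firsts[b] for b in newest)
    return keep


# Hourly snapshots for the first 60 days of 2012; everything not kept gets deleted.
hourly = [datetime(2012, 1, 1) + timedelta(hours=h) for h in range(24 * 60)]
keep = snapshots_to_keep(hourly)
print(len(keep))  # 12
```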

SNAPPER:

- cmdline + YAST2 tool
- separate config for each volume
- automatically configured if you choose btrfs during install (SuSE of course)
- C++ library to integrate into your own tools
- everything can be done with the command line

DEMO - lots of very cool stuff... added a user with YAST - reverted the add.
File system cleaned up and all.

FUTURE:

- integrate more with GUI file manager - see older copies of this directory,
  restore files.
- DBUS interface
- non-root user support?  In their home dir.  Tricky, would require a subvolume
  per user directory.
- do cleanup based on free space rather than time-based.

Questions:

- auto-versioning?  not supported now - maybe use 'incron', run commands on
  inode events.
- port to other distros?  Shouldn't be hard.
- where's the penalty?  Disk space, snapshots take space relative to how much
  has changed since it was taken.

My impression - this is very cool.  It's not "perfect", particularly
the YAST stuff doesn't only capture changes made by YAST - it's all
the changes to any part of the filesystem by anything... but it's nice
enough to save people from a few things!

================================================================

And that's the end of Saturday.  In theory.  Actually we went to the
"drinking beer at Timo's hotel" track.

================================================================

Mail Protocol - initial notes: (Timo & Bron)
==============================

- Issues - folders vs tags.
- if a tag can be added/removed, need to change the UID to be compatible
  with IMAP semantics.
- 1/1 relationship Tags/Folders - stay compatible?
- GUIDs for Folders as well, detect renames.  Dovecot and Cyrus both keep
  a GUID for each folder internally.
- MSGNO/UID/UIDVALIDITY/etc?  What do we need to keep?  Ordering properties
  are nice.  Definitely 64 bit everything.
- single modseq counter per user?  What about shared folders?  Need to get
  some statistics.
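A rough sketch of what a single per-user modseq buys: every change in any folder bumps one counter, so a client only has to remember one number to ask "what changed since I last looked?".  UserStore, change() and changes_since() are hypothetical names, deletions are ignored (a real server would need tombstones), and Python ints sidestep the 64-bit question.

```python
from dataclasses import dataclass, field


@dataclass
class UserStore:
    """Hypothetical store with one modseq counter for all of a user's folders."""
    highest_modseq: int = 0
    # guid -> (folder, flags, modseq of the last change to this message)
    messages: dict = field(default_factory=dict)

    def change(self, guid: str, folder: str, flags: set) -> int:
        self.highest_modseq += 1  # bumped for a change in *any* folder
        self.messages[guid] = (folder, frozenset(flags), self.highest_modseq)
        return self.highest_modseq

    def changes_since(self, modseq: int) -> dict:
        """Everything a client that last synced at `modseq` needs to look at again."""
        # A real server would also need tombstones to report deletions.
        return {g: m for g, m in self.messages.items() if m[2] > modseq}


store = UserStore()
store.change("msg-a", "INBOX", {"\\Seen"})
last_seen = store.highest_modseq              # client syncs here
store.change("msg-b", "Archive", set())
store.change("msg-a", "Archive", {"\\Seen"})  # moved by another client
print(sorted(store.changes_since(last_seen)))  # ['msg-a', 'msg-b']
```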

WISHLIST:
- simple enough that what client authors "expect" just works.  If they
  get confused, it's our fault.
- UTF8 everything.
- No Hierarchy Separators!
- GUID on every message
- Search + Action - a pipeline of events.  Group actions together.
- Transactional?  Maybe.
- Stateless
- Idempotent
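With a GUID on every message, a client can avoid re-downloading mail that another client merely copied (say, 100,000 messages from Inbox to Archive): the client-side logic is little more than a set lookup.  plan_sync() and the (uid, guid) listing format are assumptions, sketched in Python.

```python
def plan_sync(local_guids: set[str], listing: list[tuple[int, str]]):
    """Split a server listing of (uid, guid) into messages we already have and new ones."""
    reuse, download = [], []
    for uid, guid in listing:
        # Same GUID means same message content, wherever it lives on the server.
        (reuse if guid in local_guids else download).append((uid, guid))
    return reuse, download


# Another client copied 100,000 cached messages into Archive, plus one we've never seen.
inbox_cache = {f"guid-{n}" for n in range(100_000)}
archive_listing = [(n + 1, f"guid-{n}") for n in range(100_000)] + [(100_001, "guid-new")]
reuse, download = plan_sync(inbox_cache, archive_listing)
print(len(reuse), download)  # 100000 [(100001, 'guid-new')]
```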

MAPI/ActiveSync?
- SOGo+OpenChange - http://www.openchange.org/ - open source Exchange
  replacement.  Looking really good.

and then we drank a lot.

I wrote all this up as an email to the interested people from the day when
I got home at about 1am.

On Tue, Feb 7, 2012, at 01:56 PM, Bron Gondwana wrote:
> Mail Protocol - initial notes: (Timo & Bron)
> [...]
> I wrote all this up as an email to the interested people from the day when
> I got home at about 1am.

[... one extra bit I added later ...]

Here's the big thing I missed: "configuration".  Special-Use is kind
of configuration data - but there may be other things which are in
scope here.  Particularly "how spam gets reported".

We should probably put this all in a wiki somewhere where we can each
add the stuff _we_ need.  If we can get Dovecot and Cyrus doing the
server side, and Thunderbird/KDE/Evolution on the client side (plus
Opera's M2 of course), then I think we have critical mass to show
things are better.

Timo and I agreed last night that it's better to take the bulk of the
complexity load on the server side, and make sure we have good tests
for servers - the client authors should be able to look at a protocol
dump and wind up creating a client that does the right thing.  There
will be a lot more clients than servers.

[... here is the original ...]

This is something I've put together following our discussions today.
Regular warning that I've had a bit to drink, so the list probably
needs revising.  The important thing is that some bits may require
tradeoffs, and we need to make sure they're at least "reasonable"
if not perfect!

This is kind of an attempt at a summary of what I've discussed with
everyone today at Fosdem - a potential replacement protocol for
"IMAP4 + some random assortment of extensions"

The eventual goal here is "be compelling enough that the effort
required to change is worthwhile".  This includes server
implementors, client implementors, and end users.  If we're
going to make an open protocol to replace IMAP, it had better
give everyone a reason to switch.

Requirements:
* client and server should be able to present existing interfaces/
  talk to existing storage with minimal changes and without breaking
  existing semantics.  You switch to this protocol and you get
  benefits, yay.  Not downsides/caveats.
* A few implementations - more than one client / server so we don't
  get too specific to how one system does things.
* Decent test suite for server implementations, and test plan for
  clients - "X happens on server, you must present world-state Y
  on action Z" - so implementors can be sure of the quality of their
  implementations.  Not just a wordy spec.

Some areas of interest:

LIST/LSUB/Special-Use:

* Initial setup - detect the correct folders in clients
* Changes - create, delete, "currently selected folder is deleted".
  Inform the client or allow them to find out about changes cheaply.
* Wider question: folders vs "tags".  "Select" vs filters.
* Namespaces?
* LIST syntax - major cause of bugs in both Cyrus and Dovecot -
  complex code.  The problem isn't that complicated, we must be
  doing something wrong.
* "special folders" containing extra data like Calendar entries,
  etc - not just normal emails.

Message management:
* MOVE vs "COPY + expunge".
* Deletion - in-place set "\Deleted" flag vs copy to trash.
* Undo.
* Efficiently detect actions taken by another client.  If another client
  copies 100,000 messages from Inbox to Archive, don't have to re-download.
  GUID or similar.
* Batching / pipelines.  SEARCH + MARK FLAGGED + MOVE TO ANOTHER MAILBOX
  - basically, "lego blocks" vs "pre-defined"
  - leads on to:

Transactions:
* tell the server all the things you want: pipeline vs combined command.
* combined command: can posix_fadvise interest in all the necessary files,
  combine locks.
* Question: do we want full transactional semantics?  Great for clients,
  hard for servers.  What sort of changes can you make part of a
  transaction?  Folder rename plus expunge plus ???
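The simplest possible model of "full transactional semantics" is: apply the whole batch to a scratch copy and only swap it in if every step succeeded.  This Python sketch assumes the whole mailbox state fits in one dict, which is exactly what a real server can't do cheaply; rename() and expunge() are hypothetical helpers.  It's there to show the semantics clients would get, not how a server would implement them.

```python
import copy


def run_transaction(state: dict, ops) -> None:
    """All-or-nothing: apply every op to a scratch copy, then commit it."""
    scratch = copy.deepcopy(state)
    for op in ops:
        op(scratch)  # an exception here leaves `state` untouched
    state.clear()
    state.update(scratch)


def rename(old, new):
    def op(s):
        s["folders"][new] = s["folders"].pop(old)
    return op


def expunge(folder, uid):
    def op(s):
        s["folders"][folder].remove(uid)
    return op


state = {"folders": {"Old": [1, 2]}}
run_transaction(state, [rename("Old", "New"), expunge("New", 2)])
print(state)  # {'folders': {'New': [1]}}

try:
    # The second step fails, so the expunge of UID 1 must not stick either.
    run_transaction(state, [expunge("New", 1), rename("Missing", "X")])
except KeyError:
    pass
print(state)  # {'folders': {'New': [1]}}
```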

Implicit vs Explicit:
* FETCH BODY - may set seen flag - need to check for side effects.
* SELECT "UNSEEN X" response - have to calculate even if client doesn't
  care.  A lot of the worst parts of IMAP come from implicit magic that
  helped a use-case nobody wants any more.
* My analogy - "a star-fighter lego kit" rather than "lego blocks".  If
  you want a star fighter, great.  If not, you wind up building the
  ugliest looking set of misshapen blocks, because they're not as
  generic as they should be.

Stateless Operation:
* Phones / poorly connected devices
* Power usage considerations.
* Dropped connections/changing IPs
* High availability/failover.
* With HTTP (at FastMail) we mark a server down and wait 15 seconds, then
  watch for TCP disconnections.  Frontend has moved traffic elsewhere, ZERO
  user-visible outage.  With IMAP, connections are dropped and users see it.
  Silent reconnection and "stateless" simplify many things.
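One common way to make a retried command safe after a silent reconnect (an assumption here, not something we settled on) is for the client to tag each request with a unique ID and for the server to replay the original answer if it sees the same ID again.  APPEND is the interesting case, since blindly retrying it would store the message twice.  ToyServer is hypothetical and keeps every request ID forever; a real one would expire them.

```python
import uuid


class ToyServer:
    """Hypothetical server that remembers the result of each request ID."""

    def __init__(self) -> None:
        self.folders: dict[str, list[bytes]] = {}
        self.results: dict[str, int] = {}  # request ID -> UID returned the first time

    def append(self, request_id: str, folder: str, body: bytes) -> int:
        if request_id in self.results:
            # A retry of something we already did: replay the answer, don't redo it.
            return self.results[request_id]
        messages = self.folders.setdefault(folder, [])
        messages.append(body)
        uid = len(messages)
        self.results[request_id] = uid
        return uid


server = ToyServer()
request_id = str(uuid.uuid4())  # chosen by the client, reused on retry
first = server.append(request_id, "Sent", b"Hello")
# ... connection drops before the client sees the reply, so it retries ...
retry = server.append(request_id, "Sent", b"Hello")
print(first == retry, len(server.folders["Sent"]))  # True 1
```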

Notifications:
* Able to easily receive notifications about ALL changes of interest,
  emails / folders / whatever.
* Notifications still work if connection disconnected (see above)
* Compatible with out-of-band notification to do cheap resync (use OS
  remote notification system in case of phone, etc) - if present.  Even
  SMS.

Bandwidth-wise:
* Don't waste bandwidth stupidly.

A lot of this is the CISC vs RISC debate.  I believe it's better to
compose your messages from client to server and server to client out
of groups of small "lego bricks" each of which expresses one thing
succinctly rather than pre-formed "fighter wing" shapes.  The biggest
lack I see in the current email landscape is that IMAP clients
wind up doing convoluted things to support all the possible combinations
of multiple RFCs out there, or just giving up and supporting a very
simple profile, because that way they don't need multiple codepaths.
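To illustrate the "lego bricks" idea, here's a Python sketch where each small operation takes a list of message IDs and returns one, so SEARCH + MARK FLAGGED + MOVE is just three bricks chained together.  The brick names and the (function, argument) step format are made up for illustration; this isn't a proposed wire format.

```python
from dataclasses import dataclass, field


@dataclass
class Msg:
    folder: str
    sender: str
    flags: set = field(default_factory=set)


# Each "brick" takes the store, a list of message IDs and one argument,
# and returns the IDs it applies to, so bricks can be chained.
def search(store, ids, predicate):
    return [i for i in ids if predicate(store[i])]


def add_flag(store, ids, flag):
    for i in ids:
        store[i].flags.add(flag)
    return ids


def move(store, ids, folder):
    for i in ids:
        store[i].folder = folder
    return ids


def run_pipeline(store, steps):
    ids = list(store)  # start from every message
    for brick, arg in steps:
        ids = brick(store, ids, arg)
    return ids


store = {
    1: Msg("INBOX", "boss@example.com"),
    2: Msg("INBOX", "newsletter@example.net"),
    3: Msg("INBOX", "boss@example.com"),
}
done = run_pipeline(store, [
    (search, lambda m: m.sender == "boss@example.com"),
    (add_flag, "\\Flagged"),
    (move, "Boss"),
])
print(done, store[1].folder, sorted(store[3].flags))  # [1, 3] Boss ['\\Flagged']
```

The point of the design is that the server only has to get three simple things right, and a client wanting a different combination doesn't need a new command.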

If we can agree on a more expressive communication protocol between
the clients and servers, I don't think we need to change the model
at either end very far.

The goal, again, is to provide a compelling _experience_ for users
of this protocol - that accessing your email via this protocol is a
better, more reliable, more predictable experience.  That's how we
win hearts and minds.

Thanks,

Bron.
-- 
  Bron Gondwana
  brong at fastmail.fm