Release plan blog post

Sat Dec 31 10:47:18 EST 2016

Hi Bron, all.

The suggestion proposed by Vladislav for the *Backup* mechanism would
simplify the operation and reduce the lock time even further, so I also
like the idea. Though, I see 2 possible issues there. First, each admin
should ensure the command he/she passes as a param has absolutely no way
to block under any circumstances. Probably not a big deal under normal
conditions, but corner cases should be analyzed.
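
To illustrate the blocking concern, here is a minimal sketch in Python
of how a wrapper could bound the time a passed command is allowed to
run while the lock is held. Everything here is hypothetical (the
function name and the deadline parameter are made up for illustration;
nothing like this exists in Cyrus today):

```python
import subprocess

def run_with_deadline(cmd, deadline_s):
    """Run cmd (a list of args), killing it if it exceeds deadline_s seconds.

    Returns the command's exit code, or None if it had to be killed.
    """
    try:
        return subprocess.run(cmd, timeout=deadline_s).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed and reaped the child here
        return None
```

Note that this only kills the direct child; a command that spawns its
own long-running children is one of the corner cases that would still
need to be analyzed.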

Second, one may have highly granular privileges for every task, so the
process that has access to the Cyrus daemon may not have filesystem
modification access. IMO, there is no way (even in theory) to guarantee
bug-free development with the current tools and practices, so the only
feasible approach to security is compartmentalization and the
defense-in-depth principle
<https://en.wikipedia.org/wiki/Defense_in_depth_%28computing%29>.

One possible arrangement here would be for a backup script to request a 
lock from the Cyrus daemon and then to signal via some IPC mechanism 
(could be as simple as creating a file in a special folder) to some 
other script (that is running with enough privileges for FS 
modifications but has no interaction with other components) that it's OK 
to perform the requested operation (e.g. a snapshot creation).
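
A rough sketch of the "file in a special folder" variant, in Python, in
case it helps the discussion. The folder layout, the .req/.done file
names and both function names are assumptions made up for this example.
The unprivileged side (which holds the Cyrus lock) drops a request file
and waits; the privileged side only ever looks at that folder:

```python
import os
import time

def request_operation(spool_dir, name, timeout_s=5.0, poll_s=0.05):
    """Unprivileged side: drop a request file, wait until it is handled."""
    req = os.path.join(spool_dir, name + ".req")
    done = os.path.join(spool_dir, name + ".done")
    # O_EXCL makes creation atomic and fails if a request is already pending
    fd = os.open(req, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    os.close(fd)
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if os.path.exists(done):
            os.remove(done)
            return True
        time.sleep(poll_s)
    # Timed out: withdraw the request so it is not picked up later
    try:
        os.remove(req)
    except FileNotFoundError:
        pass
    return False

def serve_pending(spool_dir, action):
    """Privileged side: run action(name) for every pending request file."""
    handled = 0
    for entry in sorted(os.listdir(spool_dir)):
        if not entry.endswith(".req"):
            continue
        name = entry[:-len(".req")]
        action(name)  # e.g. create the snapshot
        # Publish completion, then clear the request
        with open(os.path.join(spool_dir, name + ".done"), "w"):
            pass
        try:
            os.remove(os.path.join(spool_dir, entry))
        except FileNotFoundError:
            pass  # the requester gave up and withdrew it meanwhile
        handled += 1
    return handled
```

The timeout on the requesting side should stay below the lock duration,
and there is still a small race if the requester gives up exactly while
the action is running, so a real implementation would need more care.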

Moreover, once chroot is implemented, the cyrus_sync2disk executable (as
all other processes) could run inside it and have no way of seeing
the entire filesystem. So if it has some security issues, they wouldn't
affect the entire system, just the Cyrus daemon. For all this, the
lock/unlock interface would be needed.
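
As a minimal sketch of how a backup script might drive that lock/unlock
interface, using the exit codes proposed earlier in this thread (0 from
--lock when synced and locked, 0 from --unlock if the lock was still
held, 1 if it had already expired). cyrus_sync2disk is only a proposal
and does not exist; the function name, the callbacks and the retry
count are likewise assumptions for illustration:

```python
import subprocess

def snapshot_under_lock(take_snapshot, discard_snapshot, run=subprocess.call,
                        lock_s=5, attempts=3):
    """Return True once a snapshot was taken while the lock was still held."""
    for _ in range(attempts):
        if run(["cyrus_sync2disk", "--lock=%d" % lock_s]) != 0:
            continue  # sync/lock failed, try again
        take_snapshot()
        if run(["cyrus_sync2disk", "--unlock"]) == 0:
            return True  # lock was still active: snapshot is consistent
        # Exit code 1: the lock expired first, so the snapshot is suspect
        discard_snapshot()
    return False
```

The run parameter is there only so the loop can be exercised without
the real binary; a real script would also want to log each failed
attempt.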

Anyway, these are just interface implementation details and could be 
easily adapted to the needs of the community once the most complex part, 
the global lock mechanism, is implemented. I'll update the #1763 issue 
with this comment and we could continue the discussion there.

Do you have an ETA for this? I can't speak for the entire community, but
at least for me this is the most awaited feature.

With respect to "*Small sysadmin tasks*", I'll add the specifics in the 
#1764 issue.

As to *Push*, I didn't know a perl pusher layer was published. Could you
please indicate where it is? What I believe would be enough here is to 
have everything up to the point where a web request to APS would be 
needed (and IMO we could start with just a single server and when 
everything is working, adapt it for the replication).

Could you please explain the different mechanisms and components 
involved in the Apple push mechanism (e.g. the daemon interaction with 
the clients (the XAPPLEPUSHSERVICE extension), daemon -> notifyd 
communication, notifyd -> perl pusher, perl pusher -> web request to 
APS) and give us the current status for each of them (e.g. implemented 
in the repository, to be implemented, under an NDA but DIY with our 
assistance and contribute back, etc.)?

I know that Apple grants access on a case-by-case basis to some
"privileged" implementation details (like those needed for the OpenVPN
iOS client) under NDAs, but in this case I can't see what functionality
outside Cyrus's own code could be under an NDA. Is it the web request
functionality? Or the XAPPLEPUSHSERVICE dialog with Mail.app (but
this part is already implemented in v3.0, right)? If I'm not wrong,
the push itself (i.e. the mechanism to inform the APS servers about an
event once there's a token from the client and the certs from Apple) is
quite well documented; it's even simpler than PushKit for VoIP
apps. Could you please shed some light on this topic (maybe we should
create an issue on GitHub to follow this discussion and to track the
progress)?

With respect to *Security*, sure, it's an enormous effort to considerably
improve it in any old project in a single move. A gradual improvement
would be a better approach. I would suggest starting some initiatives 
like the document-the-architecture and document-the-code sub-projects 
and contribute to them whenever time and other resources permit so they 
would be completed as a puzzle, piece by piece. And once some part is 
documented, any change to its code would be first implemented as a 
change in the corresponding documentation.

There are multiple benefits that greatly outweigh the expected overhead.
First, a developer would be able to /quickly/ become aware of the
implementation details of the part to be modified (with a side effect of
better understanding the implications of the intended change, i.e. less
chance of breaking something, and the change itself would be more
aligned with the overall architecture). Then, while writing the
documentation for the intended change, the developer could realize that 
there are better ways of achieving the same objective. And once the 
change is implemented, the developer would be able to complement the 
documentation for the undocumented parts with the insights he/she just 
gained. Some sort of a circular feedback.

Another benefit (depending on the internal organization of the dev team) 
is that more experienced developers could write the documentation for 
the changes (that would hold some relation with the formal 
specification), junior members would be those implementing the changes 
according to the documentation and, once ready, the senior developers 
would perform code reviews of the modifications, so even inexperienced
developers and newcomers in your internal team could actively
participate in the project. And of course the community would contribute
more: right now, without enough understanding of the internals and the
overall architecture, it's a significant effort for an occasional
contributor to implement any change at all.

Another initiative could be to formally define the security best 
practices and guidelines for the project and to ask everyone to try to 
follow them whenever possible. If you don't have anything similar yet, 
I'll see if I can contribute a draft.

And a security audit, IMO, should be a community-sponsored initiative, 
as probably no one has enough resources to sponsor it alone. But there 
should be someone starting the initiative ;)

As to the *chroot* implementation, my idea is to document in detail the 
process initialization part (that itself could serve as a base for the 
document-the-architecture/code sub-projects) so everyone who knows it 
well could inspect the documentation and make corrections. Once we all 
agree on the current implementation details, I'd describe the proposed 
changes and others (Greg) would be able to contribute their changes too. 
Again, once we all agree on them, everyone involved would provide 
corresponding patches. Then we'd repeat the above steps for the actual 
chroot changes.

Happy New Year!

Regards,
Anatoli

*From:* Bron Gondwana Via Cyrus-devel
*Sent:* Tuesday, December 27, 2016 21:04
*To:* Cyrus-devel
*Subject:* Re: Release plan blog post

Hi,

Sorry for the delay in responding to this - I left it over Christmas so 
I could sit down without distraction and reply when I was back in the 
office.

On Sat, 24 Dec 2016, at 17:09, Anatoli via Cyrus-devel wrote:
> Hi Bron, all.
>
> Thanks for the update and for the support of the project. That's great 
> we'll see the 3.0 release soon!
>
> Replying to your last paragraph in the blog post about the community 
> needs, I believe that what's good for FM is mostly good for the 
> community too. The FM team is probably the largest operator of the 
> project and has a better view of (and faces) issues and special needs
> more frequently than anyone else, so your vision should suit other
> project users well too.
>
> A few areas where I see the FM needs probably don't exactly match the 
> needs of the community are the following 3.
>
> *1. **Small (SMB) deployments* with a single server and somehow 
> limited physical resources (e.g. disk space).
>
> Here as an example comes the excellent backup mechanism Ellie 
> implemented that suits well the needs of medium to large deployments, 
> but IMO that's not the best approach for small deployments, as it 
> requires a separate server or, if run on the same server just for the
> safe data-to-disk synchronization, twice the disk space.
>
> A better approach for small deployments, as I see it (and I believe 
> it's highly demanded by the community), would be to have an executable 
> that would instruct Cyrus daemon to synchronize to disk all the 
> internal structures and lock (stop writing to disk) for a defined 
> period. The lock could be implemented by hanging on network write 
> requests or by writing them to temporary files, or by accumulating the 
> changes in memory (the latter approach has a potential for data loss).
>
> Once the flush is performed and the lock is applied, a (custom) backup 
> script could create a snapshot of the partition that would hold the 
> Cyrus data in a safe-to-backup state. Immediately after creating the 
> snapshot, the lock would be released and the daemon would continue its 
> normal operation. Then the backup script would be able to safely 
> backup the data, e.g. create an incremental backup and upload it to 
> some external storage, then destroy the snapshot.
>
> Usage example: cyrus_sync2disk --lock=5 -> returns 0 when the data is 
> synced and a lock for 5 seconds is obtained. cyrus_sync2disk --unlock 
> -> returns 0 if the lock has been released and 1 if there was no 
> active lock (e.g. a previous lock has expired), so the backup script
> knows if it performed the required operations with the lock still in 
> place or if it should perform the lock-snapshot-unlock operation 
> again. The short timeout is to protect the daemon from an infinite 
> lock if a backup script fails to unlock it.
>

I saw the response to this which suggested "run a command under
exclusive lock", which is definitely a better approach to this.  I
understand what you want here, and I mostly like the idea.

The one thing that gives me pause is that it requires a single lock 
against ALL cyrus processes.  Right now, there's no global lock that 
processes take while making changes, and we'd need to add one.  I would 
want to make it be something that needs to be turned on in config so 
that sites which DON'T need it don't have to pay the extra locking cost.

But the design is definitely viable. I want to do some other things with 
locking as well, like a single global lock for moves between users, 
renames, etc - so that we don't have lock ordering issues with those things.

https://github.com/cyrusimap/cyrus-imapd/issues/1763

>
> *2. Small sysadmin tasks* for typical configurations that now require 
> manual actions or writing one's own scripts. An example: new mailbox 
> creation with particular flags (\Sent, \Junk, \Trash) set for 
> special-use folders (that could be implemented as an extended 
> functionality of the autocreate_inbox_folders option).
>
> At FM you have everything automated for sure with your own custom
> scripts, but sysadmins with little experience with Cyrus or those that 
> don't write scripts with ease would find some tasks difficult to 
> accomplish, for others that's just an overhead/additional points of 
> failure that could be avoided with small built-in automations.
>

This is a definitely interesting area for enhancement.  The basic tool 
here is cyradm, and I think what we're really looking for is extending 
cyradm.

https://github.com/cyrusimap/cyrus-imapd/issues/1764

I'd love some more specific details here, including test plans ideally 
so that we can build and test these features.  Or pull requests that do 
that :)

>
> *3. New deployments* (vs ongoing upgrades/maintenance). How easy and 
> straightforward it is to set up a new deployment (possibly migrating
> from other email servers). Here I'm referring to both the initial 
> configuration, tools and documentation.
>

Yeah, we know about this one. I'm not going to create a specific bug for 
it, because it's kind of spread out over lots of different things.  
Nicola is working on improving our documentation, but again the best 
people to give advice are people who've recently done it.  I haven't 
really "installed Cyrus from scratch" for 12 years, certainly not 
without the FastMail configuration and build systems.  Except for the 
test environment, which has its own special magics.

>
> *Push* is an area that is well implemented at FM, but there's no 
> considerable advance in the Cyrus repository, and I believe the 
> community needs in this area are mostly the same as the FM's.
>
> The 3.0 release includes Apple push notifications support 
> (XAPPLEPUSHSERVICE) and that's a good start. I haven't tried it yet 
> and I understand that some effort would be required to make it work 
> (the part that talks to the APS is not included and should be 
> implemented independently). I do wonder why wouldn't FM share the 
> notifier code & some documentation about how to make everything work? 
> The only thing that'd be different in each deployment are the 
> certificates. And it would be really exciting to have working apple 
> push in Cyrus just after some typical setup steps.
>
> If there are some impediments for the FM team to share their 
> implementation details on mail and caldav/carddav push notifications, 
> I'll try to make this feature work in my deployments in the near 
> future and contribute to the project a detailed howto and the APS 
> notifier code (but your assistance would be great).

Unfortunately part of that is under NDA, so we can't offer much more 
support there.  When/if Apple open up their push infrastructure more, 
we'll definitely release the other parts of it.

I'm sure we've published at least part of our perl pusher layer before, 
though some of the session magic uses our sql infrastructure rather than 
storing sessions in Cyrus so that it survives failover between 
replicas.  If we wanted to store them in Cyrus we'd need to have a 
replication protocol for key-value stores or some sort of replicated DB 
store.

>
> And a general area that would benefit everyone, but that wasn't 
> specifically mentioned in the blog post, is *Security*.
>
> I don't mean Cyrus is insecure, and I do know that the FM team pays 
> special attention to security of their infrastructure as a whole. 
> Rather I would like to suggest that a special emphasis could be placed 
> on Cyrus security from a development POV, e.g. to document in detail 
> (and keep updated) the entire project's code base and its 
> architecture, to follow most of the security development 
> best-practices, to re-implement with security in mind some old/hacky 
> parts of the system (they would become apparent during the 
> documentation phase), to apply general hardening tactics (like chroot) 
> or even to re-engineer the overall architecture for security, to 
> perform internal security code reviews on a regular basis.

This is the kind of well-meaning plan that leads down a massive rabbit
hole.  "Document in detail (and keep updated)".  So few words for so
much work.  We do bits and pieces of this as we can, and I've recently 
set up coverity to assess the project, and am working my way through its 
reports.

Certainly some parts of the code (like sieve) are a fricking mess, and 
could very well be hiding security issues because they're just so 
horrible.  We fix them up as we have time and deal with them.

>
> FM already had a security audit in 2014 (according to your previous 
> blog posts), but you don't specify any details of how deep it was and 
> what aspects it covered. Maybe an independent in-depth security audit 
> with public results just for the Cyrus code base could be sponsored in 
> collaboration with the community?

Again, unfortunately NDAs :(

Feel free to sponsor a security audit.  I'd be happy to participate, but 
I can't justify funding it.  I have an idea of where likely bugs are 
(URLAUTH, FETCH BODY[part] until recently when we rewrote it, maybe even 
message structure parsing) and I rewrite them to be safer when I deal 
with those bits of code, as do the rest of the team.

> As for me as a member of the community, I have an intention to 
> implement the chroot functionality for the daemon (late chroot like in 
> OpenVPN). I've already discussed it briefly with Ellie and was hoping 
> to make it ready for the 3.0 release, but had no time for it yet. To 
> implement it correctly, first some important changes should be applied 
> to the initialization logic (the moment of dropping the privs, it 
> should be inside newly started processes, rather than in the master). 
> This change should be carefully analyzed and it's a significant 
> effort, I hope to be able to contribute it during the Q1/17. Once this 
> change is implemented (which in itself wouldn't change almost any 
> functionality, so it would be easy to test and deploy), the chroot 
> functionality would be some 15 lines of code.

Interesting.  I'm looking forward to seeing it.

One thing that I would add here, is that we need to extract the SNMP 
code from master and run it in a separate process as well if we have any 
hope of making master something that can be allowed to run with any 
higher privileges than it currently does in its mainloop.  Greg 
explained to me what he had planned for that, but never had time to do it.

https://github.com/cyrusimap/cyrus-imapd/issues/1765

>
> Merry Christmas and Happy New Year!
>

Thanks, the same to you!

Regards,

Bron.

>
>
> Anatoli
>
> *From:* Bron Gondwana Via Cyrus-devel
> *Sent:* Thursday, December 22, 2016 03:15
> *To:* Cyrus Devel, Info Cyrus
> *Subject:* Release plan blog post
>> I posted on the FastMail advent about our plans for releasing Cyrus 3.0 - it's a bit roundabout doing it this way rather than here first, but hey - we talked about it on Monday night's regular meeting.
>>
>> Here's the blog post:
>>
>> https://blog.fastmail.com/2016/12/22/cyrus-development-and-release-plans/
>>
>> tl;dr, Ellie recently released 3.0beta6.  We're going to do a release candidate on Jan 13th and then release for real soon afterwards, so get testing!
>>
>> There are no major changes expected before release.  I'll be doing a couple of small JMAP changes to align with the latest spec and possibly to add getMessageListUpdates if I can manage it in time.
>>
>> Other than that, I'm looking at a reverse UniqueId indexing similar to the RACL support - it's already in testing and might get added behind a default-off config switch.
>>
>> We'll be assessing all the defaults.  I'm really tempted to turn RACL on, but it needs group support if your site uses groups, and that's not done yet, so I'd need someone willing to test it!
>>
>> Bron.
>>
>>
>>
>>

--
   Bron Gondwana
   brong at fastmail.fm
