Blog

ActivityPub, Pleroma and Feather

First, some background.

In 2006, Evan Prodromou started a project that was intended to be a free software alternative to commercialized social media platforms that he originally called Laconica. After a while, this was renamed to StatusNet, and then later was placed under the purview of the GNU project. The GNU social software is built on a protocol stack known as OStatus, which is essentially a clever combination of other specifications.

Meanwhile, Evan started a new project called pump.io, which spoke a new protocol that was built on top of JSON. This protocol eventually fell under the stewardship of the W3C Social Web group, which initially called it ActivityPump, and then later ActivityPub. Development of the ActivityPub protocol went on for a few years until it was eventually ratified as an official W3C recommendation in early 2018.

While all of this was going on, as many people know, Eugen Rochko created the Mastodon project, in part as a reaction to problems on Twitter that first got major public attention during the 2016 US presidential election. Between November 2016 and April 2017, Mastodon got nearly 200k users, across a handful of instances. Mastodon built on top of the pre-existing OStatus infrastructure, so it could tap into the pre-existing userbase using GNU Social and Hubzilla.

In April 2017, Mastodon was brought to my attention, and I deployed it using a service called Scalingo. Mastodon is a Ruby on Rails application, and so I felt it was worth $20/month to make maintaining it effectively not my problem.

Mastodon Hardened

All went well, and eventually I switched from using Twitter to using Mastodon as my primary social media tool. But then I started noticing that Mastodon did not really do a very good job at guaranteeing user safety, and had design features that were entirely irresponsible, such as sending Block activities to the server of the person being blocked, in a bizarre attempt to emulate the Twitter block system. (As an aside, I’m not sure why anyone would want to emulate the Twitter block system, given that it is trivially evaded by opening an incognito browser window.)

As a result, I wound up starting a friendly fork of Mastodon which followed the upstream tree but removed or replaced functionality that was user-hostile (such as the Twitter block emulation) or actually dangerous. I also fixed the timeline building code so that it would implicitly mute anyone who had blocked you, so that you wouldn’t have to deal with any potential harassment from somebody who had blocked you (a long-standing bug in Mastodon that still isn’t completely fixed today).

This went on for a while, and everything was fine, but then Mastodon 2.2 introduced a ton of changes, so I wound up staying on Mastodon 2.1 for my instance, which meant that I never bothered to update Mastodon Hardened to 2.2. Which meant the fork died.

Around this time, Scalingo changed their pricing a bit, and the instance started to creep up in costs. Additionally, they started billing for CPU time used while compiling new instances of the app, which meant that any time I changed anything, I would be charged for that.

Moving from Mastodon to Pleroma

This led me to start thinking about different hosting for my instance, but Mastodon was starting to get really heavy. That got me thinking about writing a new implementation from scratch, which I worked on for a while under the name Eshu.

Around the time that I started to get frustrated with developing an entire social streams server from scratch in asyncio, with an ecosystem that wasn’t really up to the challenge of supporting it, Pleroma announced that they had gotten their ActivityPub implementation to the point that it could be used in production. Additionally, Pleroma was frontend-agnostic (more on this later) and had bundled the Mastodon frontend as one of their frontend choices. lain, the primary Pleroma developer, also ran her own instance on a Raspberry Pi, which was very interesting. So, I wound up buying a $3/month ARM server from Scaleway to see if it really would work out.

I already had some familiarity with Elixir, so I decided to give Pleroma a try. Within a few hours, I was pretty much convinced that Pleroma was the way to go for my needs, and flipped the switch. The mastodon.dereferenced.org instance was decommissioned, in favor of the Pleroma one.

Getting involved in Pleroma development

While I did switch from Mastodon to Pleroma, when I initially switched, the ActivityPub implementation was not fully compatible with Mastodon’s extensions. So, I went to work and started sending patches to fix the incompatibilities. After a while, people started asking me if I could implement changes for their needs in the Pleroma backend, so I started working on those issues too.

After a while, with some patching, we managed to get a fully compatible ActivityPub implementation that could federate with Mastodon and others like PeerTube without any problems. This is the reality today, and the 1.0 release will likely come within a month or two, with a full implementation report sent in to the W3C.

Feather

One evening, it hit me: if Pleroma is a generic social streams server that supports every client API used in the fediverse right now, then it would be a good starting point for building a new frontend, as it is effectively a platform for building social networking applications.

Specifically, I felt that the “let’s be like Twitter” microblogging space was oversaturated. There was Pleroma with the Pleroma FE and Mastodon FE, there was Mastodon itself, and there were the OStatus nodes that both Pleroma and Mastodon could interoperate with.

So one of the main design goals for Eshu was to do away with that concept entirely. As such, I started taking my mental models of how the interface would work and began building them as a Progressive Web App (PWA) that runs on the Pleroma platform, using vue.js. It should be noted that I am not really much of a web developer and have been making this up as I go along. Hopefully other people will send in patches to fix my mistakes.

This frontend is called Feather, and it is more similar to something like Facebook than Twitter.

Screenshots of Feather

As an idea of how it looks, here is a screenshot of the basic Home timeline:

Feather's home screen

Feather uses hierarchical threading, which allows for discovering new people to interact with:

A thread in feather, hierarchically represented

There is also a work in progress “media view” for tags, which works nicely with tags like “#art” or “#photography”:

Feather's media view

There’s still a lot to do, but Feather demonstrates that it is possible to build any kind of social networking application on the Pleroma platform.

I plan to put up a public instance running Pleroma + Feather soon, so that people can try it for themselves, too.

pkgconf 1.1.0 release and new kaniiniware bug bounty program!!!

To kick off 2017, I have released pkgconf 1.1.0.

In terms of critical impact, the pkgconf 1.1 release series is likely one of the most major releases we have done over the course of the project, and I’m quite serious about that: as of pkgconf 1.1, we have introduced many new features that were never properly implemented in the original pkg-config utility. But that’s just the beginning; let’s look at a summary of changes since the 1.0 series:

As usual, it couldn’t have been done without the help of so many people, and we still have a ways to go before the full potential of libpkgconf is realised. Between building new bindings to libpkgconf, upgrading distributions from either old versions of pkgconf or pkg-config, and improving the .pc file format, there’s lots of work for us to do in the new year!

This brings me to my next point…

new kaniiniware bug bounty program!

I am launching a bug bounty program for all of the software I presently maintain. (This does not include software that I have passed maintainership to other people, such as charybdis, atheme and the other IRC software.)

Here’s the deal with that: find a security bug, find a normal bug in the code then patch it and get your patch landed, or significantly improve documentation in software I maintain, and you’re entitled to receive a stuffed rabbit sent to your house via Amazon. The size and price of said stuffed rabbit is dependent on overall severity/impact of the bug/contribution. Simply add it to an Amazon wishlist with your shipping address attached and mention it in your bug report or pull request. If the pull request or bug report qualifies, the rabbit will be sent to you ASAP. If for some reason you don’t happen to like rabbits, an alternative stuffed animal is ok too!

No really, that isn’t a joke. Consider it a token of gratitude for your contribution to the project!

pkgconf 0.9.12 and future pkgconf versioning changes

pkgconf 0.9.12 was released earlier today which improves some minor edge case issues with the deduplication support.

You can download the official tarball at http://rabbit.dereferenced.org/~nenolod/distfiles/pkgconf-0.9.12.tar.bz2 as with the rest of the releases.

This is the last planned release on the 0.9 branch. We are going to move to a versioning scheme where each release has its own number. So, the next release will be pkgconf-1. I am planning on shipping that in September with the libpkgconf split done, and hopefully improving some things to make life easier for MSYS users again.

A pleasant side effect of this change is that hopefully, we should be able to drop some of the cruft which we provide for pkg-config compatibility, such as lying about the version number in --version. After all, 1 is greater than 0.28.

Which brings me to my next point: bug for bug compatibility with pkg-config. At this point, we’re going to be more conservative in terms of which pkg-config bugs we simulate, as one of the main goals of the pkgconf project was to be more pedantic. So generally, starting with today’s release, regressions are only accepted as bugs if the modules involved are actually correctly formed, or similar levels of justification are provided. This is because we do not have an interest in providing bug-for-bug compatibility, as we’re obviously trying to release a better tool.

So for pkgconf-1, mainly we’re just doing the split so people can use pkgconf inside their own apps (IDEs, for example), and fixing whatever bugs are reported and then shipping it in September. After that, we will work on pkgconf-2 and so on.

That’s basically it for now… I might work on developing a talk to explain in greater detail what differences exist between pkgconf and pkg-config as well as motivations. More on that later.

RobustIRC isn't robust

Recently a new IRC implementation called RobustIRC was released. Among other things, it claims:

No netsplits on server unavailability

Traditional IRC networks split whenever a server has brief network connectivity issues to the rest of the IRC network, or whenever a server needs to be upgraded. With RobustIRC, your users will not notice when you roll out a new version or reboot the machine on which a particular RobustIRC server is running.

How interesting. Sounds like they are claiming to provide full CAP tolerance (or at least the appearance thereof) to the user. How does it work? The YouTube video of the talk the author gives is somewhat interesting, and describes an architecture not too dissimilar to the proposals for next-generation IRC server linking protocols. Specifically, it has these properties of interest:

As a result, they claim that the effects of netsplits are transparent to the end user. Are they?

In typical architectures featuring Raft, the event log is atomic. To do this, events from secondary servers have to be acknowledged by the master before they are committed to the Raft log. The master does not acknowledge the event until it has been seen by at least N active nodes on the network. RobustIRC implements Raft-style consensus this way, much as traditional Paxos architectures are implemented (Raft itself is a simplified form of Paxos).
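The commit rule described above can be sketched as follows. This is an illustrative model, not RobustIRC’s actual code; the class and method names are invented for the example.

```python
# Sketch of Raft-style commit semantics: the leader appends an event to
# its log, but only marks it committed once a majority of nodes (a
# quorum) have acknowledged it. Illustrative model, not RobustIRC code.

def quorum(cluster_size: int) -> int:
    """Minimum number of acknowledgements needed to commit an entry."""
    return cluster_size // 2 + 1

class Leader:
    def __init__(self, cluster_size: int):
        self.cluster_size = cluster_size
        self.log = []           # appended events
        self.acks = {}          # log index -> set of acknowledging node ids
        self.commit_index = -1  # highest committed log index

    def append(self, event) -> int:
        self.log.append(event)
        index = len(self.log) - 1
        self.acks[index] = {"leader"}  # the leader implicitly acks its own entry
        return index

    def acknowledge(self, index: int, node_id: str) -> None:
        self.acks[index].add(node_id)
        # advance the commit index once a majority holds the next entry
        while (self.commit_index + 1 in self.acks and
               len(self.acks[self.commit_index + 1]) >= quorum(self.cluster_size)):
            self.commit_index += 1

leader = Leader(cluster_size=5)
i = leader.append("PRIVMSG #test :hello")
leader.acknowledge(i, "node2")   # 2 of 5 have it: not yet committed
leader.acknowledge(i, "node3")   # 3 of 5: quorum reached, committed
assert leader.commit_index == i
```

This is exactly why a minority partition stalls: with fewer than `quorum(n)` reachable nodes, no entry can ever commit.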

RobustIRC and simple network partitions

So how does RobustIRC handle a simple network partition? According to their talk, it depends on what side of the partition you are on. In their talk, they kill two RobustIRC processes running on localhost and demonstrate that the IRC client fails to work until a new RobustIRC server joins the cluster. However, rarely are network partitions actually that clean.

This leads me to believe that RobustIRC follows CP principles of the CAP theorem, instead of AP like traditional IRC. However, does it properly follow CP?

Using some of the code I used to test Brocade’s Virtual Chassis features back in December, we connect 2 IRC clients to the RobustIRC cluster and send 1000 messages to a channel. Here’s how that test looks when it runs normally:

Cluster has been formed, topology:
   master: 192.168.140.1
   |-- child1: 192.168.141.1
   |-- child2: 192.168.142.1
   `-- child3: 192.168.143.1
Connected test IRC client to node "master".
Connected test IRC client to node "child3".
Sender is sending 1000 messages to #test.
Receiver is waiting for messages on #test.
Test complete.
Cluster is consistent!

Now what happens if we snub child3 from master and then resolve the partition? That seems correct as far as I can see based on what they promise:

Cluster has been formed, topology:
   master: 192.168.140.1
   |-- child1: 192.168.141.1
   |-- child2: 192.168.142.1
   `-- child3: 192.168.143.1
Connected test IRC client to node "master".
Connected test IRC client to node "child3".
Sender is sending 1000 messages to #test.
Severing link between "child3" and "master"!
Receiver is waiting for messages on #test.
Healing link between "child3" and "master"!
Test complete.
Cluster is consistent!

Okay, so far so good it seems.

How does RobustIRC handle split-brain?

RobustIRC should handle an even partition by making all nodes useless. But does it? What happens if the partitions are made sufficiently large to provide quorum after a partition is created? Better yet, how can we prove which side of the split-brain won? Answer: We change the client to use IRC’s TOPIC command. The winning log entry updating the topic will be shown to new clients.
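The check the harness performs can be sketched as follows; the node names and topic values are stand-ins for the real test apparatus, not its actual code.

```python
# Hypothetical sketch of the split-brain check described above: each
# side of the partition sets a different TOPIC, the partition is healed,
# and we compare the topic that clients on each side observe.

def check_split_brain(topics_by_node: dict) -> bool:
    """Return True if nodes disagree on the channel topic after healing."""
    return len(set(topics_by_node.values())) > 1

after_heal = {
    "master": "03cfd743661f07975fa2f1220c5194cbaff48451",
    "child3": "7b18d017f89f61cf17d47f92749ea6930a3f1deb",
}
if check_split_brain(after_heal):
    print("The cluster is SPLIT BRAIN: TOPIC mismatch!")
```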

Cluster has been formed, topology:
   master: 192.168.140.1
   |-- child1: 192.168.141.1
   |-- child2: 192.168.142.1
   |-- child3: 192.168.143.1
   `-- child4: 192.168.144.1
Connected test IRC client to node "master".
Connected test IRC client to node "child3".
Severing link between "child3" and "master"!
Severing link between "child4" and "master"!
"child4" has been reconnected to "child3".
New RobustIRC container has been started, connecting to "child3".
New RobustIRC container has been started, connecting to "child3".
New RobustIRC container has been started, connecting to "child3".
New RobustIRC container has been started, connecting to "master".
New RobustIRC container has been started, connecting to "master".
Topology:
   master: 192.168.140.1
   |-- child1: 192.168.141.1
   |-- child2: 192.168.142.1
   |-- child8: 192.168.148.1
   `-- child9: 192.168.149.1
   child3: 192.168.143.1
   |-- child4: 192.168.144.1
   |-- child5: 192.168.145.1
   |-- child6: 192.168.146.1
   `-- child7: 192.168.147.1
Sending TOPIC to #test @ "master": "03cfd743661f07975fa2f1220c5194cbaff48451"
Sending TOPIC to #test @ "child3": "7b18d017f89f61cf17d47f92749ea6930a3f1deb"
Healing link between "child4" and "master"!
Healing link between "child3" and "master"!
Waiting for network to converge.
Test complete.
"child3" client reports TOPIC: "7b18d017f89f61cf17d47f92749ea6930a3f1deb"
"master" client reports TOPIC: "03cfd743661f07975fa2f1220c5194cbaff48451"
The cluster is SPLIT BRAIN: TOPIC mismatch! (╯°□°)╯︵ ┻━┻

It seems that when the network diverges sufficiently, the network won’t reconverge when the partition is healed. This means you have to restart the losing side of the network. Gross.

Traditional IRC handles this case correctly – the newest TOPIC wins provided both sides have the same channel timestamp.
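The traditional rule is simple enough to sketch; this is an illustrative model of TS-style topic resolution as used by server linking protocols like TS6, not any particular ircd’s code.

```python
# Sketch of TS-style topic resolution: when a netsplit heals, both sides
# run the same pure comparison on the same data, so they converge on the
# same topic without any coordinator. Illustrative, not real ircd code.

def resolve_topic(channel_ts_a: int, topic_a: str, set_time_a: int,
                  channel_ts_b: int, topic_b: str, set_time_b: int) -> str:
    if channel_ts_a != channel_ts_b:
        # the side with the older channel creation timestamp is authoritative
        return topic_a if channel_ts_a < channel_ts_b else topic_b
    # equal channel TS: the most recently set topic wins
    return topic_a if set_time_a >= set_time_b else topic_b

# deterministic convergence: both sides pick the newer topic
winner = resolve_topic(1000, "old topic", 1234, 1000, "new topic", 5678)
assert winner == "new topic"
```

Because the rule is a deterministic function of data both sides already have, healing a partition never requires restarting anything.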

Thoughts

RobustIRC trades IRC’s main property (availability) for consistency. While this may hide netsplits, it results in a degraded experience if you’re on the partitioned side of the IRC network. Thus, I do not think it is any more robust than the traditional network. I also believe it places a high maintenance burden on IRC client authors while providing little to no gain: it’s quite possible to be on an orphaned node during a network partition, and not all partitions are caused by hard disconnections – in reality, most partitions are caused by packet loss. While Raft provides a compelling quorum algorithm for many applications, I do not believe it maps well to IRC, nor does it solve any of IRC’s actual problems. RobustIRC is only more robust under the assumption that the partitioned side of the network not processing messages is acceptable. In a typical IRC network it is desirable for all nodes to be able to process messages until the partition is resolved, hence why IRC is an AP protocol and not CP.

Further, I think RobustIRC fails to deliver on the promises it makes even when you accept its concessions (like the partitioned nodes being dead until quorum is reached). When I introduced minor packet loss between the nodes, they failed to reach quorum, even though in traditional IRC they would have managed to remain linked (albeit with some lag).

So I do not think RobustIRC is robust at all, and I encourage IRC client authors to simply ignore it and refuse requests to merge RobustIRC support into their applications. I also think RobustIRC is thankfully dead on arrival, because it’s a fair bet that mIRC will never support it.

As usual, just bet on IRCv3 to solve these problems, which it will (session resumption) in client protocol 3.3.

Rethinking ircd

Lately I have been working on an ircd, initially to host the channels which will be moved from irc.atheme.org once it is terminated. The result of this work is three Python packages: ircmatch, ircreactor, and mammon. When combined, these provide a modular IRC implementation - ircmatch provides IRC hostmask matching and collapsing, ircreactor provides translation and manipulation of RFC1459 messages into an intermediate representation, and mammon brings it all together on top of Python 3.4’s excellent asyncio framework.
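To give a feel for the first of those pieces, here is a minimal sketch of IRC hostmask matching using Python’s stdlib glob matcher. This is not ircmatch’s actual API, and it ignores the rfc1459 casemapping quirk where {}|^ fold to []\~; it just illustrates the idea.

```python
# Minimal sketch of IRC hostmask matching (not ircmatch's API): masks
# like "*!*@*.example.com" use the glob wildcards '*' and '?', and
# matching is case-insensitive (ASCII folding only, for simplicity).

import fnmatch

def mask_match(mask: str, nuh: str) -> bool:
    """Match a nick!user@host string against an IRC-style mask."""
    return fnmatch.fnmatchcase(nuh.lower(), mask.lower())

assert mask_match("*!*@*.example.com", "alice!~a@host1.example.com")
assert not mask_match("*!*@*.example.com", "bob!~b@example.org")
```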

This post is long, and somewhat serves as a manifesto for the project, what we have in mind for both now and the future, and how all of this maps onto the IRCv3 standardization effort. While I can only recommend reading the entire post, I can provide a good overview in a few buzzwords: server-side authentication without services, channel management which makes sense and protocol correctness verification. The code is available if you want to play with it, and a server is running at mouse.dereferenced.org:6667, to prove that this is a real thing.

the ircds of yesterday

Every IRC network operates software called ircd. Most IRC networks also operate an authentication layer, which is provided by software called “services”. Atheme and Anope are presently the primary middleware platforms deployed by networks which provide the authentication layer.

Historically, the software acting as ircd has been derived from IRC 2.8, which has been showing its age for a long time. Many other replacements have been proposed over time, but only one of them really took off: InspIRCd, which is now the second-most widely used ircd implementation. InspIRCd could actually be used for prototyping new features; however, it is written in C++, which makes it intimidating to new developers.

In fact, InspIRCd has implemented prototypes of many of the features we plan to implement in mammon. However, it is tied to having to support legacy clients and legacy approaches to network and channel management. This in combination with the C++ codebase makes it a difficult target for prototyping large changes to the protocol and user experience.

throwing out legacy design

As a result of this, I started writing a new server that threw out basically everything. RFC1459 is only considered a suggestion, with preference given to the IRCv3 interpretation on issues. This server is designed to allow us to eventually completely jettison the RFC1459 framing format, even.

Actually, I wrote an earlier server which ultimately was not viable. mammon is the rewrite of the rewrite.

But what does it really mean to throw out legacy design? What it means in context of mammon is:

maintaining scalability

Right now, the design of mammon is not intended to be scalable. We will speed it up as the software matures. However, a common concern that has been mentioned is that mammon may fail to scale because it is written in Python. To this, I argue that we can maintain good scalability on CPython, and provide better scalability than ircd 2.8’s on a high performance VM such as pypy once it supports yield from.

Put differently: ircd is an I/O-bound application. The main area where we need to be careful is ensuring we can keep our TLS code parallelizable, but Python already provides excellent primitives for this. It will be interesting to see how mammon performs versus other IRCds such as charybdis and InspIRCd as the software matures.
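The I/O-bound claim is easy to see in a sketch. The following is not mammon’s actual code (and the post targets Python 3.4’s asyncio; modern async/await syntax is used here for brevity): each connection is a coroutine reading CRLF-framed lines, with no CPU-heavy work in the hot path.

```python
# A minimal asyncio sketch of the I/O-bound core of an ircd: each
# connection is a coroutine that reads CRLF-framed lines and answers
# PING with PONG. Throughput is bounded by the network, not by Python.

import asyncio

async def handle_client(reader: asyncio.StreamReader,
                        writer: asyncio.StreamWriter) -> None:
    while True:
        line = await reader.readline()   # yields to the event loop
        if not line:
            break                        # client disconnected
        message = line.rstrip(b"\r\n")
        if message.upper().startswith(b"PING"):
            # echo the PING parameters back as a PONG
            writer.write(b"PONG" + message[4:] + b"\r\n")
            await writer.drain()
    writer.close()

async def main(host: str = "127.0.0.1", port: int = 6667) -> None:
    server = await asyncio.start_server(handle_client, host, port)
    async with server:
        await server.serve_forever()

# asyncio.run(main())  # start listening
```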

why not inspircd?

Although the Atheme community and InspIRCd developers have not always agreed on some issues (mostly related to InspIRCd’s now-defunct m_invisible.so), I do want to stress that in general, as far as C++ codebases go, InspIRCd is pretty easy to follow. However, InspIRCd is a C++ codebase. One of my main goals with mammon is to make IRCd even more accessible for people looking to learn about programming. By using a language such as Python, this goal is easily accomplished.

From a technical debt perspective, both the C++ codebase and the obligation to support legacy network deployments make InspIRCd an undesirable choice for prototyping new features. The point of mammon is that it is a playground to try new things; as far as I know, this is not a goal of InspIRCd at this time.

putting it all together

Right now we have an ircd which implements a lot of the core fundamentals to make this system work. The rest of the fundamentals will be implemented as time permits, but we already have reasonably good RFC1459 coverage. We still need to implement many components of IRCv3.2 itself, but this should not be too difficult. Pull requests are definitely something we would take a look at…

The way mammon works, as previously mentioned, is to operate on an intermediate representation. This allows us to replace the RFC1459 transport with whatever transport we like. It also allows semantic information to be attached to the message in a flexible way, either as tags or as internal properties. This allows for simpler implementations of features which depend on state in a way which requires little to no boilerplate.
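A rough sketch of what that intermediate representation looks like follows. This is not ircreactor’s actual API, and tag-value unescaping is omitted; it just shows an RFC1459 line (with IRCv3 message tags) being lifted into a structure the rest of the server can manipulate independently of the wire format.

```python
# Sketch (not ircreactor's API) of parsing an RFC1459 line, including
# IRCv3 message tags, into an intermediate representation. Tag-value
# escape sequences are not handled, for brevity.

def parse_line(line: str) -> dict:
    tags, source = {}, None
    if line.startswith("@"):                    # IRCv3 message tags
        raw_tags, line = line[1:].split(" ", 1)
        for tag in raw_tags.split(";"):
            key, _, value = tag.partition("=")
            tags[key] = value
    if line.startswith(":"):                    # message source prefix
        source, line = line[1:].split(" ", 1)
    if " :" in line:                            # trailing parameter
        line, trailing = line.split(" :", 1)
        params = line.split() + [trailing]
    else:
        params = line.split()
    return {"tags": tags, "source": source,
            "verb": params[0].upper(), "params": params[1:]}

msg = parse_line("@time=2015-01-01T00:00:00Z :nick!u@h PRIVMSG #chan :hello world")
assert msg["verb"] == "PRIVMSG"
assert msg["params"] == ["#chan", "hello world"]
```

Once a message is in this shape, swapping the framing format means swapping only `parse_line` and its serializer, not the server logic.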

With any luck, mammon will be as influential as its predecessor was.

On the topic of Snoonet

So, there is this network called Snoonet. People have occasionally asked me what the problem was between the Atheme community and Snoonet, so I will attempt to explain it.

When Snoonet first adopted the Atheme platform, they had large ambitions. At first, the relationship was good and productive. There were a few things we found eyebrow-raising, such as the plans to raise funding to pay IRCops for their services (this in itself isn’t really that surprising, and other networks running on the Atheme platform do operate this way), but they weren’t really a major concern to us.

Then towards the beginning of October 2013, Snoonet got a new staffer named “idahodude” (new to us, anyway). This person came into our channels with the wrong attitude, basically attacking the platform because he couldn’t be bothered to read any documentation at all, and blaming the software for the maxclients limit on his IRCd instead of his own misconfiguration. We went out of our way to explain to him what rlimits on UNIX are, but he still felt it was a problem in charybdis, instead of taking a few moments to understand the documentation. At this point, we recommended he hire an admin company to configure his IRC network. Later we discovered this particular user had an agenda to move the network to InspIRCd anyway, which was fine by us if that was going to be the future.

We ultimately wound up banning their admins from our channels, with the exception of MilleniumFalc0n, who seemed reasonable enough. Later Snoonet decided to migrate to Anope when the Atheme maintenance baton was passed over to downstream forks, which is fine. We then decided to ban the last remaining Snoonet participant as frankly, we were tired of having to remember that this network existed to begin with and had staff who could not read documentation.

tl;dr: IRC network claimed to be a “corporation” (Snoonet, Inc.) and wasn’t, offered “paid staff positions” but not really, and generally wasted our time, resulting in their “network director” (some teenaged kid) picking a fight with us.

Anyway, that is all I am going to say on the matter. Apparently the “network owner” decided to ban me from the /r/irc subreddit thingy, while slagging me through the mud, so I figured I would clarify my point of view on the comments that he made (since you know, he banned me after attacking me, probably so I couldn’t reply this way).

Do not use or provide DH-AES or DH-BLOWFISH for SASL/IAL authentication

Atheme 7.2 dropped support for the DH-AES and DH-BLOWFISH mechanisms. This was for very good reason.

At the time that DH-BLOWFISH was created, IRC was a very different place… SSL was not ubiquitous, and it was thought that having some lightweight encryption on the authentication exchange might be useful, without opening services to a DoS vector. An initial audit on DH-BLOWFISH found some problems, so a second mechanism, DH-AES was created to get rid of some of them.

However, both of these mechanisms use a small keysize for Diffie-Hellman key exchange (256 bits), as previously mentioned by grawity. After the freenode incident, where a user discovered they could DoS atheme by spamming it with DH-BLOWFISH requests, we decided to audit both mechanisms, and determined that they should be removed from the distribution.

The reasons why were:

  1. Users had a strong misconception that the mechanisms provided better security than PLAIN over TLS (they don’t);
  2. Because the DH key exchange is unauthenticated, users may be MITM’d by the IRC daemon;
  3. The session key is half the length of the key exchange parameters, making the entire system weak: DH can only securely provide a session key half the size of its key exchange parameters. Put more plainly, if you use DH with 256-bit parameters, the session key is 128 bits, which is weaker than PLAIN over TLS;
  4. Correcting the key exchange to provide 256-bit session keys would require rewriting every single implementation anyway.

If you want secure authentication, just use PLAIN over TLS, or use atheme’s experimental family of ECDSA mechanisms, namely ECDSA-NIST256P-CHALLENGE. Yes, it’s based on secp256r1, which is a NIST curve, but it’s acceptable for authentication in most cases, and most cryptography libraries implement the secp256r1 curve. While not perfect, it is still much better than the DH family of mechanisms.
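For reference, PLAIN over TLS is also trivially simple. Per RFC 4616, the client sends the base64 encoding of authzid, authcid, and password joined by NUL bytes; all confidentiality comes from the TLS channel underneath, which is exactly why it beats the DH mechanisms above.

```python
# What SASL PLAIN actually transmits (RFC 4616): base64 of
# "authzid NUL authcid NUL password". The TLS channel provides all
# confidentiality; the mechanism itself does no cryptography.

import base64

def sasl_plain_payload(authcid: str, password: str, authzid: str = "") -> str:
    raw = "\0".join([authzid, authcid, password]).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")

payload = sasl_plain_payload("alice", "hunter2")
# sent to the server as: AUTHENTICATE <payload>
assert base64.b64decode(payload) == b"\0alice\0hunter2"
```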

Unfortunately, at least one atheme fork has resurrected this mechanism. Hopefully they remove it, as it should be treated as if it were backdoored, because the level of mistakes made in designing the mechanism would be the same type of mistakes one would introduce if they wanted to backdoor a mechanism.

Update: Unfortunately Anope also implemented these broken mechanisms. Luckily it appears that X3 has not.

How does Brocade's VCS stack up for resiliency?

This post is part of a series wherein we break various networks, using a transparent bridge lovingly called “the apparatus”. In this series, we’re going to learn how distributed systems intersect with the modern physical network, and why certain approaches and topologies are best avoided. In this particular post, we explore Brocade’s VCS platform, as implemented on its VDX switches, and test various failure domains using the apparatus.

Brocade’s VCS platform is built on top of TRILL and Data-Center Bridging as the underlying primitives, with Brocade’s proprietary ISL trunking extensions for peer discovery. This type of approach is fairly common for ‘fabric’ architectures. The interesting thing is that Brocade’s implementation uses FSPF as the link-state routing protocol, rather than the IS-IS required by TRILL. The most similar technology, Cisco’s FabricPath as used with their Nexus switching platform, uses IS-IS with custom extensions instead.

The Brocade VCS cluster itself

The fundamentals behind the VCS platform seem reasonable, but how does the clustering itself work? The configuration is merged using a protocol called BLDP, which is a subset of Brocade’s ISL trunking extensions. BLDP handles discovery of peers and joining the peer switch into the fabric, as long as the following requirements pass testing:

  1. The two peers must both speak the same version of the BLDP protocol.
  2. The two peers must have the same BLDP cluster ID configured.
  3. The primary switch must have an rbridge-id available for the new peer. Up to 239 rbridge-ids are available for a fabric to use.
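The join checks above can be sketched as a simple predicate. This is an illustrative model, not Brocade’s implementation; the field names are invented for the example.

```python
# Sketch of the BLDP join requirements listed above (illustrative, not
# Brocade's code): a peer joins the fabric only if the protocol
# versions and cluster IDs match and an rbridge-id can be allocated.

MAX_RBRIDGE_IDS = 239  # per-fabric limit stated in the documentation

def can_join(primary: dict, peer: dict, allocated_ids: set) -> bool:
    if primary["bldp_version"] != peer["bldp_version"]:
        return False  # requirement 1: same BLDP protocol version
    if primary["cluster_id"] != peer["cluster_id"]:
        return False  # requirement 2: same BLDP cluster ID
    return len(allocated_ids) < MAX_RBRIDGE_IDS  # requirement 3: id available

primary = {"bldp_version": 1, "cluster_id": 10}
assert can_join(primary, {"bldp_version": 1, "cluster_id": 10}, set(range(5)))
assert not can_join(primary, {"bldp_version": 2, "cluster_id": 10}, set())
```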

You might have noticed I said “primary switch” here and may be confused as Brocade’s sales literature for the VDX says:

Unlike other Ethernet fabric architectures, Brocade VCS fabrics are masterless and can be designed in full mesh, partial mesh, leaf-spine, and various other topologies, and they are easily modified or fine-tuned as application demands change over time.

This does not actually mean there is no primary or master switch, which is contrary to how most people might interpret this statement. In other words, Brocade are spinning reality. What they mean here is that there is no explicit master switch to configure, however their documentation very clearly uses terminology such as “coordinating switch” and “primary switch”, which can be interchanged with “master” here.

Master election

As the VCS is a distributed system which uses a broker to coordinate transactions, a master node must be elected as the broker. According to Brocade’s documentation, this procedure takes place to elect the master node:

  1. Every switch at startup designates itself as a potential master and advertises solicitations that it wants to be the master on all trunk ports.
  2. At election time, all solicitations are compared. The solicitation has two fields: the switch WWN (a 64-bit unique identifier), and a priority level. At cold-boot, the priority level is the same for all switches.
  3. The solicitation with the highest priority wins, with the lowest WWN (by integer comparison) breaking ties. Priority is preferred over WWN, so that the administrator may specifically nominate a switch as master.
  4. At the end of the election process, the entire fabric’s peer group has been encapsulated into an acyclic graph with the master switch at the root.
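The comparison rule in the election above can be sketched as a one-liner. This is an illustrative model of the documented rule, not Brocade’s code.

```python
# Sketch of the documented election rule: priority is compared first
# (higher wins), then the 64-bit WWN breaks ties (lower wins).
# Illustrative model, not Brocade's implementation.

def elect_master(solicitations):
    """solicitations: list of (priority, wwn) tuples; returns the winner."""
    # negate priority so that higher priority sorts first; among equal
    # priorities, the lowest WWN wins
    return min(solicitations, key=lambda s: (-s[0], s[1]))

# cold boot: equal priorities, so the lowest WWN becomes master
assert elect_master([(1, 0x50), (1, 0x20), (1, 0x90)]) == (1, 0x20)
# an administrator-nominated switch (higher priority) wins regardless of WWN
assert elect_master([(1, 0x20), (128, 0x90)]) == (128, 0x90)
```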

At the end of the election, some conflict resolution has already occurred: if a switch has an explicit rbridge-id assigned to it which conflicts with a pre-existing node then the switch is not allowed to join the fabric. In this situation a manual intervention is required: the explicit rbridge-id must either be changed or removed from the partitioned switch, and the links must be recycled - this is usually accomplished by rebooting the partitioned switch. This seems to fall short of Brocade’s promise of a zero-configuration fabric, but isn’t terrible in and of itself.

Configuration merging

In the event of a network partition, what does the VCS system do? Brocade’s documentation describes a strategy it calls a “trivial merge”, wherein the losing side of the partition loses its configuration, which is entirely replaced by the configuration on the winning side. The winning side is determined by the number of peers on each side of the partition, and the last update time on the configuration.

This means that consistency is given up by the clustering system when operating in logical chassis mode, leaving us with a distributed system that has AP qualities in both the configuration and forwarding planes. But does it really work? What happens when both sides of a partition are equal in size and the last update times are very close? Let’s find out.

For this test, we configure a network topology consisting of 2 groups of 4 switches linked together directly and a shared path passing through the apparatus. This allows us to create an even partition by simply taking the ports connecting the two sides offline, at which time we will update the configuration on both master nodes.

The result? A split-brained cluster:

Cluster has been formed, topology:
   master: 192.168.140.1
   n1: 127.1.0.1
     |-- n2: 127.1.0.2
     |-- n3: 127.1.0.3
     `-- n4: 127.1.0.4
           |-- n5: 127.1.0.5
           |-- n6: 127.1.0.6
           |-- n7: 127.1.0.7
           `-- n8: 127.1.0.8
Severing link between n4 and n5 ('ix0', 'ix1')!
side 1 master: 192.168.140.1 pings!
side 2 master: 192.168.140.1 pings!
Inserting 50 VLAN definitions on side 1 and side 2:
side 1: [1, 2, 3, 4, 5, 6 .. 45, 46, 47, 48, 49, 50]
side 2: [51, 52, 53, 54, 55, 56 .. 95, 96, 97, 98, 99, 100]
Synchronized change commits.
Healing partition between n4 and n5 ('ix0', 'ix1')!
Master 192.168.140.1 belongs to n1.
Checking for survivors.
50 survivor VLANs on the master.
Checking configuration consistency for n4 vs n5.
n4: [1, 2, 3, 4, 5, 6 .. 45, 46, 47, 48, 49, 50]
n5: [51, 52, 53, 54, 55, 56 .. 95, 96, 97, 98, 99, 100]
n4 and n5 are inconsistent!
The cluster is SPLIT BRAIN: 50 inconsistent configuration nodes. (╯°□°)╯︵ ┻━┻

Ouch. At least the election algorithm works as described. This result indicates that the configuration timestamps have per-second resolution, since clock skew prevents a perfect sync between the two commits. I guess we shouldn’t be that surprised, as the “trivial merge” description did outright advise us that consistency goes right out the window.
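The tie is easy to reproduce: with per-second resolution, two commits that land within the same wall-clock second compare as simultaneous, no matter how well the clocks agree. A quick illustration (the timestamps are made up):

```python
from datetime import datetime

# two commits 600 ms apart, well within typical clock skew
a = datetime(2014, 6, 1, 12, 0, 0, 300000)
b = datetime(2014, 6, 1, 12, 0, 0, 900000)

# truncated to whole seconds, they are indistinguishable
assert a.replace(microsecond=0) == b.replace(microsecond=0)
```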

However, does the split brain status result in packet loss? We repeat the same test as above, but with an added pair of loopback paths passing back through the apparatus. This allows us to run the pkt-gen tool included with netmap to determine whether the fabric is still available in such a configuration. The result? Packet loss if the return path crosses the partition boundary:

Cluster has been formed, topology:
   master: 192.168.140.1
   n1: 127.1.0.1
     |-- n2: 127.1.0.2
     |-- n3: 127.1.0.3
     `-- n4: 127.1.0.4
           |-- n5: 127.1.0.5
           |-- n6: 127.1.0.6
           |-- n7: 127.1.0.7
           `-- n8: 127.1.0.8
Configuring VLAN tags:
   TenGigabitEthernet 4/0/2: 45
   TenGigabitEthernet 5/0/2: 45
Severing link between n4 and n5 ('ix0', 'ix1')!
side 1 master: 192.168.140.1 pings!
side 2 master: 192.168.140.1 pings!
Inserting 50 VLAN definitions on side 1 and side 2:
side 1: [1, 2, 3, 4, 5, 6 .. 45, 46, 47, 48, 49, 50]
side 2: [51, 52, 53, 54, 55, 56 .. 95, 96, 97, 98, 99, 100]
Synchronized change commits.
Healing partition between n4 and n5 ('ix0', 'ix1')!
Master 192.168.140.1 belongs to n1.
Checking for survivors.
50 survivor VLANs on the master.
Checking configuration consistency for n4 vs n5.
n4: [1, 2, 3, 4, 5, 6 .. 45, 46, 47, 48, 49, 50]
n5: [51, 52, 53, 54, 55, 56 .. 95, 96, 97, 98, 99, 100]
n4 and n5 are inconsistent!
The cluster is SPLIT BRAIN: 50 inconsistent configuration nodes. (╯°□°)╯︵ ┻━┻
Sending traffic from 4/0/2 to 5/0/2.
pkt_gen helper ('4/0/2', 'ix2'): Sent 1000000 packets
pkt_gen helper ('5/0/2', 'ix3'): Timeout after 60 seconds.  Received 0 packets

The good news is that the forwarding plane is enabled on both sides of the partition, so if your VLANs are localized to a specific side of the fabric, then availability is maintained. If the packets cross the network partition boundary though, availability can be impacted. This makes sense as both sides of the split thought a configuration merge was unnecessary, resulting in a ‘split brain’ configuration.

All of this makes me wonder if the “trivial merge” strategy implemented by Brocade works at all. If we create a single-node partition, that should clearly work, right? We adjust the topology so that n7 and n8 are bridged through the apparatus and retry our original test:

Cluster has been formed, topology:
   master: 192.168.140.1
   n1: 127.1.0.1
     |-- n2: 127.1.0.2
     |-- n3: 127.1.0.3
     |-- n4: 127.1.0.4
     |-- n5: 127.1.0.5
     |-- n6: 127.1.0.6
     `-- n7: 127.1.0.7
           `-- n8: 127.1.0.8
Severing link between n7 and n8 ('ix0', 'ix1')!
side 1 master: 192.168.140.1 pings!
side 2 master: 192.168.140.1 pings!
Inserting 50 VLAN definitions on side 1 and side 2:
side 1: [1, 2, 3, 4, 5, 6 .. 45, 46, 47, 48, 49, 50]
side 2: [51, 52, 53, 54, 55, 56 .. 95, 96, 97, 98, 99, 100]
Synchronized change commits.
Healing partition between n7 and n8 ('ix0', 'ix1')!
Master 192.168.140.1 belongs to n1.
Checking for survivors.
50 survivor VLANs on the master.
Checking configuration consistency for n7 vs n8.
n7: [1, 2, 3, 4, 5, 6 .. 45, 46, 47, 48, 49, 50]
n8: [1, 2, 3, 4, 5, 6 .. 95, 96, 97, 98, 99, 100]
n7 and n8 are inconsistent!
50 lost deletes found, cluster is INCONSISTENT.  :(

Guess not.

What can we do to mitigate this situation? As a user, we should test that the fabric is whole before applying configuration changes (run show vcs and verify all nodes are present). That way we don’t have our data eaten by a “trivial merge” later.

I wish I could have expected better from the VDX, but these switches were a source of major pain for me when I had to deal with them previously. My suggestion, ultimately, is not to buy this product; I would wait for Brocade’s second fabric product instead.

How resilient are ethernet fabrics anyway

This post is part of a series wherein we break various networks, using a transparent bridge lovingly called “the apparatus”. In this series, we’re going to learn how distributed systems intersect with the modern physical network, and why certain approaches and topologies are best avoided. In this particular post, we will discuss the CAP theorem and how it applies to network fabrics of varying design, as well as the design and implementation of the apparatus.

Physical networks are a suspenseful place these days, full of questions like: did the peer switch get my frame? Is that equal-cost path really available? Indeed, the network has evolved from the acyclic graph architecture imposed by STP in the 1990s. With technologies such as data-center bridging and transparent interconnection of lots of links, it seems everyone has a solution for converting ethernet from an acyclic graph into a mesh topology. The problem, however, is that these solutions assume the physical network is synchronous. In reality, most networks are not: media conversion, for example, is always asynchronous, and a bad cable or optical module can introduce significant degradation along a path.

Modern network “fabrics” are composed of multiple components and protocols, ultimately communicating over an asynchronous physical network. Therefore, understanding the reliability of a fabric requires careful analysis of both the physical network and the components which implement the fabric in failure situations. As fabrics convert the network into a truly distributed system, we can evaluate the underlying components using the CAP theorem to determine what tradeoffs are made by the fabric. Like many hard problems in computing, distributed network fabrics come down to handling of shared state, so the CAP theorem fully applies here.

Creating an intentionally unreliable network: introducing the “apparatus”

The apparatus is a server appliance which operates a virtual switch with some interesting properties. The main interesting property is our use of netmap-enabled ipfw to selectively forward ethernet frames based on dummynet link emulation. For those interested, here are the specifications of the server:

With netmap, we can natively forward the full 40GbE capacity available to the machine with proper tunings on a simple ipfw ruleset, which is all we need for these experiments.

We use netmap with some Python scripting to manage the control plane of the vSwitch. This allows us to manipulate the running configuration programmatically, ensuring reproducible results. Because we use netmap-based ipfw, we are able to forward unmodified ethernet frames, allowing any form of layer-2 protocol through the vSwitch, thus providing a truly transparent bridge which can be completely unaware of the actual contents of the frames being forwarded. This allows us to support extensions such as Brocade’s ISL trunking without difficulty.
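For flavour, here is a hypothetical sketch of the partition-inducing part of that scripting. The rule numbers and the exact ipfw filter syntax are illustrative, not a dump of our actual tooling:

```python
# Build the ipfw commands that sever and heal a link by dropping all
# frames received on the two interfaces of a bridged pair.
# (Illustrative only; real rules also need to spare the control plane.)
def sever_cmds(iface_a, iface_b, base_rule=1000):
    return [
        ["ipfw", "add", str(base_rule), "deny", "MAC", "any", "any",
         "recv", iface_a],
        ["ipfw", "add", str(base_rule + 1), "deny", "MAC", "any", "any",
         "recv", iface_b],
    ]

def heal_cmds(base_rule=1000):
    return [["ipfw", "delete", str(base_rule)],
            ["ipfw", "delete", str(base_rule + 1)]]
```

Each command list would then be handed to subprocess.run() on the apparatus; making the link merely lossy instead of dead is a matter of steering the traffic through a dummynet pipe instead of a deny rule.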

Network Partitions

In distributed systems theory, a formal proof often assumes the network is asynchronous: in other words, that messages between peers are allowed to be arbitrarily dropped, reordered, duplicated and delayed. In practice, this is a reasonable assumption; while some physical networks such as Infiniband can provide stronger guarantees, IP and Ethernet-based networks will likely encounter all of these issues.

In practice, detecting absolute network failure is difficult. In a fabric, since our only knowledge of other peers passes through the network, delays caused by degraded links are indistinguishable from any other traffic. This is the fundamental problem with network partitions: they are rarely a true hard failure, but instead just a source of massive packet loss. Further, when partitions do arise, we have few options for diagnosing the cause, resulting in the need for a manual intervention. When the partition heals, the fabric controller software has to work out what happened and try to recover from it - but how well does it do that when faced with inconsistency?

In this series, I intend to set up real ethernet fabrics and break them in various ways, observing the results. We will ultimately run an application (the netmap traffic generator) to generate traffic across the fabric, observing how many packets are permanently lost or duplicated by the fabric, as well as attempt to induce split-brain configurations by manipulating the fabric’s configuration on all sides of the network partition.

Hilarious shit that audacious users say

This is just a collection of hilarious quotes we’ve seen over the years of working on Audacious, a reasonably popular audio player. If you want to know where a quote came from, just google it. I’ve left these suckers fully unedited for your enjoyment. The only emphasis added is bolding of the more ridiculous parts, and hilarious replies which are on the same thread are kept in context.

Advice I - The values of Crystalizer and Extra Stereo plugin are dependent on ur audio quality (mp3 bitrate for example), the type of music (Metal, Techno, Pop..etc), and ur speakers capabilities. But an average value of 1.7 for both of them is just fine.

Before playing music, you go to the “Output” menu, “effects”, and select “crystalizer”. Open the Ouput-effects menu again and now you’ll see an item “settings” under “crystalizer”. Select this and set the crystalizer to 1.2 and close the setting. This setting will undo the exaggerated compression of MP3 files. Then go to the Output-effects menu again and select “Extra Stereo”. Go to the Output-effects menu again and click on the item “settings” that has now appeared under Extra Stereo. Set it to 1.3 and close the setting. That will compensate for the stereophonic effect loss due to MP3 compression.

Audacious is fast, lightweight and it has the best sound quality. Personally I use it with a bit of crystalizer and extra stereo.

i was dumbstruck with the audio quality pulseaudio + x-fi x-treme music + audacious media player with crystallizer plugin gave, when i switched to linux

I would add more, but I don’t feel like going over the various Linux distribution forums with a fine-toothed comb at this time.

You might notice a common theme here; if not, it is basically this:

A common misconception is that these plugins actually ‘restore’ audio data lost at various stages (mastering, encoding, etc.). This couldn’t be further from the truth.

The crystalizer plugin’s name may seem arbitrary, but it’s actually intentional: the plugin is named after the same feature marketed by Creative. Creative claim:

Hear sound so vibrant and so dynamic, being surprised is an understatement. SBX Crystalizer enhances the dynamic range of your compressed audio source to give you a more realistic experience.

What they really mean here is that the dynamic range of the audio is expanded. The math behind this is simple and has been done since the 1980s, starting with the Aphex Aural Exciter; there is nothing new about it. Here is the basic algorithm in pseudo-code:

samples = [series of [left, right] pairs]
prev = [0, 0]
for x in samples:
    cur = [x[0], x[1]]   # keep the unmodified sample for the next delta
    x[0] = x[0] + (x[0] - prev[0])
    x[1] = x[1] + (x[1] - prev[1])
    prev = cur

This is a simple delta-limited expander; there’s nothing terribly new about it, as I previously said. The only thing here is a simple psychoacoustic trick: the expansion emphasizes perception of the peak amplitude of the pre-existing signal, nothing more.

Somehow people think that ‘compression’ means ‘artifacting caused by lossy encoding’; it doesn’t. The literature is discussing the increased difficulty of perceiving amplitude extremes in audio material mastered by idiots who think louder is better.

Is this plugin a cure for the loudness wars? No – not really. However, many people have reported that the additional emphasis has resulted in a more enjoyable listening experience to their 1990s/2000s audio collections. Does the plugin restore your crappy MusicMatch 128kbps MP3 collection you ripped when you were a kid? Definitely not.

As for the Extra Stereo plugin – it just adds additional stereo separation. The main benefit there is for headphone listeners as a lame substitute for proper crossfeed. Headphone listeners might find the BS2B plugin more useful.
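For the curious, the usual extra-stereo technique scales each channel's deviation from the mono average. This is my paraphrase of the classic XMMS-era approach, not the plugin's exact code:

```python
# Widen stereo separation by scaling each channel's deviation from the
# mono average. factor > 1.0 widens; 1.0 leaves the signal untouched.
def extra_stereo(samples, factor=1.3):
    out = []
    for left, right in samples:
        avg = (left + right) / 2.0
        out.append((avg + (left - avg) * factor,
                    avg + (right - avg) * factor))
    return out
```

A mono signal (left == right) passes through unchanged, which is why the effect does nothing for material without stereo separation to begin with.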

CoreAudio, or how to actually use the worst documented audio API in history

Recently I started working on a CoreAudio plugin for Audacious, to replace the old one which was removed in Audacious 3.2 when the mac port was abandoned (due to the fact that Gtk+ is horrible on mac). Instead of updating the old CoreAudio plugin, which was very limited and consisted of bad code ported from the XMMS days, I decided to start from scratch, using the simple SDL output plugin as a model.

The API itself is okay, but the documentation is misleading. For example, they encourage you to set up an Audio Unit Graph for simple audio playback with audio control. At least on OS X, this isn’t really necessary (thankfully).

Since this isn’t necessary, I will show you how to do it the easy way - everyone else seems to like over-complicating things.

These examples assume you have a ‘pump thread’ which is pumping audio to a circular buffer. Implementing that is not covered here, but isn’t really very hard to do.

Figure 1: coreaudio_example.h

/* Header file for example code.
   We include CoreAudio and AudioUnit framework headers directly. */

#include <cstdio>
#include <cstdlib>
#include <cmath> /* for powf () in set_volume () */
#include <pthread.h>

#include <CoreAudio/CoreAudio.h>
#include <AudioUnit/AudioUnit.h>

namespace coreaudio_example {

/* these format constants are based on the ones we use in audacious.
   S16_LE means signed 16-bit pcm, little-endian. */
enum format_type {
    FMT_S16_LE,
    FMT_S16_BE,
    FMT_S32_LE,
    FMT_S32_BE,
    FMT_FLOAT
};

bool init (void);
void cleanup (void);
void set_volume (int value);
bool open_audio (enum format_type format, int rate, int chan,
                 AURenderCallbackStruct * callback);
void close_audio (void);
void pause_audio (bool paused);

#define VOLUME_RANGE (40) /* decibels */

}

This is the API we will be implementing; it is pretty straightforward. Six functions are provided: init and cleanup, open_audio and close_audio, set_volume, and pause_audio.

The API will handle format conversion between your playback code’s preferred format type and CoreAudio’s native format – float32, non-interleaved linear pcm. We will do this using a lookup table based on the enum…

Figure 2: coreaudio_example.cc part 1

/* CoreAudio utility functions, public domain.
   http://kaniini.dereferenced.org/2014/08/31/CoreAudio-sucks.html */

#include "coreaudio_example.h"

namespace coreaudio_example {

struct CoreAudioFormatDescriptionMap {
    enum format_type type;
    int bits_per_sample;
    int bytes_per_sample;
    unsigned int flags;
};
static struct CoreAudioFormatDescriptionMap format_map[] = {
    {FMT_S16_LE, 16, sizeof (int16_t), kAudioFormatFlagIsSignedInteger},
    {FMT_S16_BE, 16, sizeof (int16_t), kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsBigEndian},
    {FMT_S32_LE, 32, sizeof (int32_t), kAudioFormatFlagIsSignedInteger},
    {FMT_S32_BE, 32, sizeof (int32_t), kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsBigEndian},
    {FMT_FLOAT,  32, sizeof (float),   kAudioFormatFlagIsFloat},
};

This is our lookup table which handles looking up the format specification attributes. Each entry maps one of our format constants to its bit width, its sample size in bytes, and the CoreAudio format flags needed to describe it.

We now continue with actually initializing our output unit so we can use it.

Figure 3: coreaudio_example.cc part 2

static AudioComponent output_comp;
static AudioComponentInstance output_instance;

bool init (void)
{
    /* open the default audio device */
    AudioComponentDescription desc;
    desc.componentType = kAudioUnitType_Output;
    desc.componentSubType = kAudioUnitSubType_DefaultOutput;
    desc.componentFlags = 0;
    desc.componentFlagsMask = 0;
    desc.componentManufacturer = kAudioUnitManufacturer_Apple;

    output_comp = AudioComponentFindNext (nullptr, & desc);
    if (! output_comp)
    {
        fprintf (stderr, "Failed to open default audio device.\n");
        return false;
    }

    if (AudioComponentInstanceNew (output_comp, & output_instance))
    {
        fprintf (stderr, "Failed to open default audio device.\n");
        return false;
    }

    return true;
}

void cleanup (void)
{
    AudioComponentInstanceDispose (output_instance);
}

The init () and cleanup () routines handle bringing up CoreAudio in the app. This gives you an output unit you can send data to using a callback. Now we should actually set up the unit for playback…

Figure 4: coreaudio_example.cc part 3

bool open_audio (enum format_type format, int rate, int chan, AURenderCallbackStruct * callback)
{
    struct CoreAudioFormatDescriptionMap * m = nullptr;

    for (struct CoreAudioFormatDescriptionMap & it : format_map) /* reference, so & it does not point at a loop-local copy */
    {
        if (it.type == format)
        {
            m = & it;
            break;
        }
    }

    if (! m)
    {
        fprintf (stderr, "The requested audio format %d is unsupported.\n", format);
        return false;
    }

    if (AudioUnitInitialize (output_instance))
    {
        fprintf (stderr, "Unable to initialize audio unit instance\n");
        return false;
    }

    AudioStreamBasicDescription streamFormat;
    streamFormat.mSampleRate = rate;
    streamFormat.mFormatID = kAudioFormatLinearPCM;
    streamFormat.mFormatFlags = m->flags;
    streamFormat.mFramesPerPacket = 1;
    streamFormat.mChannelsPerFrame = chan;
    streamFormat.mBitsPerChannel = m->bits_per_sample;
    streamFormat.mBytesPerPacket = chan * m->bytes_per_sample;
    streamFormat.mBytesPerFrame = chan * m->bytes_per_sample;

    printf ("Stream format:\n");
    printf (" Channels: %d\n", streamFormat.mChannelsPerFrame);
    printf (" Sample rate: %f\n", streamFormat.mSampleRate);
    printf (" Bits per channel: %d\n", streamFormat.mBitsPerChannel);
    printf (" Bytes per frame: %d\n", streamFormat.mBytesPerFrame);

    if (AudioUnitSetProperty (output_instance, kAudioUnitProperty_StreamFormat, kAudioUnitScope_Input, 0, &streamFormat, sizeof(streamFormat)))
    {
        fprintf (stderr, "Failed to set audio unit input property.\n");
        return false;
    }

    if (AudioUnitSetProperty (output_instance, kAudioUnitProperty_SetRenderCallback, kAudioUnitScope_Input, 0, callback, sizeof (AURenderCallbackStruct)))
    {
        fprintf (stderr, "Unable to attach an IOProc to the selected audio unit.\n");
        return false;
    }

    if (AudioOutputUnitStart (output_instance))
    {
        fprintf (stderr, "Unable to start audio unit.\n");
        return false;
    }

    return true;
}

void close_audio (void)
{
    AudioOutputUnitStop (output_instance);
}

At this point you should have full playback, with audio data requested via your render callback, and clean shutdown. Now to implement volume control…

Figure 5: coreaudio_example.cc part 4

/* value is 0..100; the actual applied volume is based on a natural decibel scale. */
void set_volume (int value)
{
    float factor = (value == 0) ? 0.0 : powf (10, (float) VOLUME_RANGE * (value - 100) / 100 / 20);

    /* lots of pain concerning controlling application volume can be avoided with this one neat trick... */
    AudioUnitSetParameter (output_instance, kHALOutputParam_Volume, kAudioUnitScope_Global, 0, factor, 0);
}

Two things here:

  1. Apple says you should set up an AUGraph for doing something as simple as controlling output volume. I say that is unnecessary. There is a lot of misinformation here as well, that kHALOutputParam_Volume sets the system volume; it doesn’t. It sets the individual output volume on the sound server, coreaudiod.

  2. The reason for the powf () and the scary math is to give a logarithmic scale for tapering the volume down, similar to what would be observed in an actual stereo system. If you don’t want this, just do float factor = value / 100.0.
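To see what the taper actually does, here is the same formula in Python, using the 40 dB range from the header:

```python
# Map a 0..100 volume slider onto a 40 dB logarithmic taper, mirroring
# the powf () expression in set_volume () above.
VOLUME_RANGE = 40  # decibels

def volume_factor(value):
    if value == 0:
        return 0.0
    return 10.0 ** (VOLUME_RANGE * (value - 100) / 100.0 / 20.0)
```

Full volume (100) gives a factor of 1.0, half slider (50) gives 0.1 (-20 dB), and 0 is hard muted; this matches how hardware volume pots are perceived.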

Now to handle pausing…

Figure 6: coreaudio_example.cc part 5

void pause_audio (bool paused)
{
    if (paused)
        AudioOutputUnitStop (output_instance);
    else
    {
        if (AudioOutputUnitStart (output_instance))
        {
            fprintf (stderr, "Unable to restart audio unit after pausing.\n");
            close_audio ();
        }
    }
}

} /* namespace coreaudio_example */

That is all there is to the actual CoreAudio lowlevel operations. The rest is up to the reader. In the callback function you just copy your buffer data to ioData->mBuffers[0].mData. The amount of data requested is at ioData->mBuffers[0].mDataByteSize.

Hopefully this documentation is useful to somebody else, I mostly wrote it for my own reference. Usually I get to just deal with FMOD which is much more pleasant (ha ha).

Leaving atheme

Today I am announcing that I have decided to resign from atheme.org. With this, all involvement from me in IRC-related projects should be considered terminated; I have removed myself from the various machines run by Atheme and the github organization.

Atheme has been a great disappointment to me for the last few years. When I started the project with jilles, our goal and focus was to expand the potential of IRC networks by building a modern platform that other vendors could use as a reference to deliver a more compelling user experience, to make sure IRC remained relevant as a collaboration tool. To an extent, we succeeded in our mission, and for a time, built a world-class free software organization to back it. We knew then that our work was more important than just building a next generation platform for freenode, which is why we built the project in this manner.

We were not alone in our efforts; many people, including those from Anope, used Atheme as a playground for prototyping features that would later arrive in other projects. But it was not enough: IRC continued to lose users, and as the stakes got lower and lower, people became more aggressive in their competition for the remaining market share.

But we did most of this work between 2005 and 2009. Since then we have just coasted along. Atheme ceased to be an incubator for new technology on IRC. Most of the actual engineering talent left long ago, to work on other projects.

Since then, various people turned it into some sort of ideology debate, which is silliness. IRC is meant to be an open protocol, developed by all stakeholders equally and in an innovative way. Unfortunately, that has not been the viewpoint of a lot of other Atheme people for some time. To the extent that I helped to fuel that fire as APL (for example by enforcing our copyright on copied code in Anope), I apologise; IRC deserved better, and still does.

Atheme needs to rediscover itself and find a new path in order to maintain its position of relevance and leadership in the IRC development process. While Atheme has done some good in recent years, it has in many ways become a metrics game, a competition over how many networks deploy the platform versus Anope, and this attitude will not bring out the best possible engineering that I know Atheme can still deliver. This is a discovery process that needs to be done by the people running the day-to-day operations, not by someone who is busy pursuing other interests.

To that extent, it is time for me to move on. But it doesn’t really feel like quitting, as the Atheme that I care about has been dead for a long time. Atheme could have done so much more, like finishing Chiron, our implementation of Rob Levin’s CORRIDORS proposal, but didn’t. We had a shot at truly changing the world in the same way that IRC originally did and we blew it, simply because we were too apathetic to take on projects that we needed to.

Upgrading a production machine to Alpine 3.x - the definitive guide

This documents my upgrade process to the Alpine 3.0 tree from Alpine 2.8 tree. In reality, they are the same tree, but with different build parameters… specifically Alpine 3.0 uses Musl and Alpine 2.8 does not. Alpine 2.8 is also the last planned release series featuring uClibc.

The first step is to make sure you are using a modern apk-tools. Upgrade your system to 2.8 development (edge) if you’ve not done so already. Then install the static-linked version of apk-tools:

$ apk add apk-tools-static

Now modify your /etc/apk/repositories file to use the edge-musl repository. I recommend using this URL, but I may be biased:

http://mirrors.centarra.com/alpine/edge-musl/main
@testing http://mirrors.centarra.com/alpine/edge-musl/testing

Now do the actual upgrade. We’re going to use some flags that are not typically used, but are necessary to pivot the system safely into the new libc environment.

$ apk.static update
$ apk.static upgrade --available --no-self-upgrade

You are now running Musl, but you probably noticed that your configs for mkinitfs are wrong. Let’s fix that and then reinstall the kernel package.

Change your /etc/mkinitfs/files.d/base to contain these lines:

/bin/busybox
/bin/sh
/lib/libcrypto.*
/lib/libz.*
/lib/ld*-musl*.so*
/lib/mdev
/sbin/apk
/etc/modprobe.d/*.conf
/etc/mdev.conf

Now we just reinstall the kernel package:

$ apk fix linux-grsec

Voila. We can now reboot and be purely on Musl.

ShapeShifter: The Latest in Snake Oil

Several people sent me a link recently to the ShapeShifter, a new web-application firewall product released by Shape Security. Among other things, it promises to protect your website by “applying polymorphism as a defense strategy.”

This is snake oil: first, because none of the HTML markup transformations it actually does can be considered at all polymorphic; and second, because these methods have already been tried in the real world and discarded as pointless.

Actually, the product video shows what it is really doing, which is scrambling HTML attributes and rewriting them to the correct input on the application side. This is nothing new, really.

What ShapeShifter really does

Malicious bots can already defeat ShapeShifter, simply by walking the DOM. I don’t hesitate to mention this, because the bad guys already do this. A login field is always going to be prefixed by a label saying whether or not it’s a username, e-mail address or password field. And really, you don’t have to look at it that way – a username field will almost certainly come before a password field in the DOM.

Then you have their bizarre press release, which contains strange statements like:

“The ShapeShifter focuses on deflection, not detection. Rather than guessing about traffic and trying to intercept specific attacks based on signatures or heuristics, we allow websites to simply disable the automation that makes these attacks possible.”

This is how you know it’s 100% bullshit. The scripts that the criminals are using will simply be adapted to walk the DOM - and most of them already walk the DOM anyway. There are frameworks that allow you to drive an entire browser programmatically, and these frameworks would not be defeated by ShapeShifter’s markup transformations.

Instead, let’s work on truly innovative ways of defeating bots, like implementing the Edia JSON signature scheme into web requests: make the browser complete a complex (and, most importantly, computationally expensive) proof of work in JavaScript. ShapeShifter is not the way forward; even CAPTCHAs are better protection than that.
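A proof-of-work challenge of the kind I mean is easy to sketch. This hash-based version is a generic illustration only; the details of the Edia scheme are not reproduced here:

```python
import hashlib
import itertools

def solve(challenge: bytes, difficulty: int = 12) -> int:
    """Find a nonce such that sha256(challenge || nonce) has
    `difficulty` leading zero bits. Cost doubles per difficulty bit."""
    for nonce in itertools.count():
        digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
        if int.from_bytes(digest, "big") >> (256 - difficulty) == 0:
            return nonce

def verify(challenge: bytes, nonce: int, difficulty: int = 12) -> bool:
    """The server side: checking a solution costs a single hash."""
    digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
    return int.from_bytes(digest, "big") >> (256 - difficulty) == 0
```

The asymmetry is the point: a human's browser pays the cost once per request, while a bot farm hammering the login form pays it millions of times.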

Hosting Companies and CEOs

A lot of people have asked me today what my opinion is on BurstNET getting a new CEO. This is not unusual for me; a lot of people ask my opinion every time one of these things happens, especially nowadays, because I have significant involvement in one of Dallas’s truly independent datacenters, which apparently makes me an expert on the hosting industry.

Until now, I haven’t really thought much about CEOs coming in and changing things up. But really, thinking about it, it seems like a trend that should be a little concerning.

Is a hosting company’s success really the result of a great CEO or instead a great engineer?

Every time I read about one of these established “industry standard” companies getting a new CEO, the announcement always mentions that the CEO came from a startup outside the hosting industry (usually one nobody has heard of) that was acquired by some tech industry juggernaut:

Prior to joining BurstNET, JW Ray was the co-founder and Chief Operating officer of Learn.com, a world-wide leader in cloud based Learning Management Systems. Learn.com was acquired by Taleo, an Oracle company, in October of 2010. Mr. Ray also currently serves as the Managing Member of Backlog Capital, a venture debt fund, that focuses on the cloud industry.

As a systems engineer who has worked in hosting full-time since 2009, this seems foreign to me. And it’s not because he came from a startup; before working on my own projects (in hosting), I worked full-time for several startups, so I totally get what startups are about.

From my perspective, the hosting industry is something where having an outsider CEO could potentially have bad effects on the organization. There’s a certain art to running a business in this sector, and to do it well, you have to learn it from the bottom up.

Call me crazy, but I seriously doubt that BurstNET’s new CEO has ever had to do any of these things:

These experiences are, in my opinion, what drives excellent leadership in a hosting business. Because then, you’re not just barking orders and discussing pointless metrics – instead, you actually understand realistically what your staff are capable of doing, and give them the right tools to get the job done.

But what about strategizing?

The key thing about strategy is determining what the market wants and then figuring out how to sell it. An outsider CEO isn’t necessarily going to know what the market is looking for.

The best strategy is one that is built from the bottom up. Involve your support and sales teams… find out what is hot on the market and bring your dev team in to make it a reality. If you’re using prebaked software from some vendor, you’ve already lost this game. The serious customers want to see that your organization has the engineering skill to build a platform. This is why OpenStack and Cloudware are big deals.

I’m not convinced that the hosting industry cannot produce good CEOs. If you look at Steadfast, I would say that Karl is doing a pretty good job with it.

In my opinion, putting a new CEO from some startup in at BurstNET isn’t going to shake things up, because he doesn’t know the hosting industry like an experienced person would. But then again, nobody has really thought of BurstNET as an innovative brand for quite some time…

advanced abuild hacking, part 1

This is kind of a quick and direct introduction to how to work with abuild and APKBUILDs in a low-level way.

abuild(1) takes a list of targets, much like make(1) does. If you do not specify any targets, it runs a default set. This behaviour is provided by the all() function in abuild.

What this means is that we can run specific targets and get specific effects, which is useful when you are debugging an APKBUILD.
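To make the dispatch model concrete, here is a minimal sketch of make(1)-style target dispatch in plain POSIX shell. This is hypothetical illustration code, not abuild’s actual implementation: each argument names a shell function, and an empty argument list falls back to a default set, much like all() does.

```shell
#!/bin/sh
# Hypothetical sketch of abuild-style target dispatch (not abuild's
# real code). Each target is just a shell function; with no arguments,
# a default target list is used instead.
default_targets="clean unpack prepare build"

clean()   { echo "clean"; }
unpack()  { echo "unpack"; }
prepare() { echo "prepare"; }
build()   { echo "build"; }

run_targets() {
	# No targets given? Fall back to the default set.
	[ $# -eq 0 ] && set -- $default_targets
	for target in "$@"; do
		"$target" || return 1
	done
}

run_targets "$@"
```

Running this script with an explicit target (e.g. `./dispatch.sh build`) runs only that function, while running it with no arguments walks the whole default pipeline — the same ergonomics you get from `abuild unpack` versus plain `abuild`.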

Installing the build dependencies for an APKBUILD

abuild(1) uses the builddeps() target to determine the dependencies and, depending on how it is invoked, synthesizes a transaction in the package manager that installs a set of packages pinned as dependencies of .makedepends-$pkgname. This behaviour is derived from similar behaviour in Debian’s pbuilder(1).

Thus, we can invoke the builddeps target to install our dependencies, like so:

$ abuild -r builddeps
(1/1) Installing .makedepends-foo (0)

As all of our dependencies are pinned to .makedepends-$pkgname, removing them when we are done is also fairly easy:

$ abuild-apk del .makedepends-foo
(1/1) Removing .makedepends-foo (0)

Splitting up build logic into substeps

Some packages, such as the Xen hypervisor, consist of many different components integrated into a single APKBUILD. In the case of Xen, we have the hypervisor itself, management tools, stub domains, and documentation. Splitting these components into individual build targets allows us to debug the build process of each component without having to build the others. This saves time when the build process is relatively time-consuming.

For a practical example of this, we will look at the relevant parts of the APKBUILD of Xen 4.3:

# These tasks are added as separate tasks to enable a packager
# to invoke specific tasks like building the hypervisor.  i.e.
#    $ abuild configure build_tools
configure() {
	cd "$_builddir"

	msg "Running configure..."
	./configure --prefix=/usr \
		--build=$CBUILD \
		--host=$CHOST \
		|| return 1
}

build_hypervisor() {
	msg "Building hypervisor..."
	make xen || return 1
}

build_tools() {
	msg "Building tools..."
	make tools || return 1
}

build_docs() {
	msg "Building documentation..."
	make docs || return 1
}

build_stubdom() {
	msg "Building stub domains..."
	make stubdom || return 1
}

These are the individual build steps for each component. abuild(1) itself calls the build target, so we need our APKBUILD to fan out to each build step. We provide our own build function to glue it all together, although I hope to improve abuild(1) so that this eventually becomes unnecessary.

This is our build function:

build() {
	cd "$_builddir"

	configure || return 1
	build_hypervisor || return 1
	build_tools || return 1
	build_docs || return 1
	build_stubdom || return 1
}

The || return 1 after each step is important. It ensures that abuild(1) gives up if any of the components fails to build properly.

What this nets us is the ability to do the following:

# Clean, unpack and prepare our build environment (including patching).
$ abuild clean unpack prepare

# Test building only the hypervisor.
$ abuild configure build_hypervisor

When we only need to look at building the hypervisor, or only the management tools, this has cut our build times significantly.

You can also add utility targets, such as invoking make menuconfig in the kernel. Here is an example of that, from the linux-vanilla APKBUILD:

# this is so we can do: 'abuild menuconfig' to reconfigure kernel
menuconfig() {
	cd "$srcdir"/build || return 1
	make menuconfig
	cp .config "$startdir"/$_config
}

As you can see, the ability to declare custom targets in APKBUILDs allows for versatile control over the build process.

Atheme in 2014

One could argue that Atheme has not had a very good year for 2013. In a lot of ways, this is because what little precious time and energy I have to spend on free software has had to go to more important and urgently required projects, such as pkgconf.

It hasn’t all been for nothing, though. We have actually made good progress on the IRCv3 front in many ways. There is now a real branch of UnrealIRCd 3.4 that you can download and play with, the result of collaboration between Atheme and the UnrealIRCd crew.

But for all the progress that has been made, there have been many things left undone, which really brings into question the status of Atheme as an organization. Some examples are:

So, for the 2013 - 2014 organizational year, we have selected jdhore as our new project leader. His platform is essentially to get us caught up and back to speed on the various sundry tasks that aren’t getting done.

This of course allows me to worry more about tasks outside of Atheme, which is arguably a good thing with two startups on my plate right now.