Websockets and You

Today’s topic is how the EventBus model works in general, and how it will work specifically, in situ, in ARA-V. There are important issues to iron out in ARA-V’s system because I want the file-system clients to be able to update the metadata of a file “by hand”, i.e. using either their OS’s tools or third-party tools. Most of these problems are confined to the ARAfs system; however, having a robust model of event serving is key for user experience.

As well as explaining what I am doing, we’ll go over what I’m not. As you’ll remember from my last article, I was trying to hunt down what the heck made the push notifications for this site work. It turned out to be a newer part of the HTML5 spec called Server-Sent Events (SSE), meant for uses like server notifications and event signaling. Now, since I’m building this section of the system from scratch in Python (the EventBus gateway, I mean; I’ll obviously be using a library for the websockets themselves), there’s some discussion to be had about why I chose websockets over HTML5’s SSE. A discussion I’ll be having mostly in this post.

[Figure: EventBus flowchart. This basically amounts to the highest-level overview of the system.]

Say we have two clients connected to our fancy API. Client one updates an object (changes its title or adds a file); how would the other client find out? Well, it could ask, but does it even know where to ask? Maybe it could ask a dedicated route (derp.com/listofchanges) that documents changes, but that also opens us up to client-spoofing attacks. Since authenticating a user on every connection is a heavy load, we use session IDs and tokens and what have you. On a compromised network it may be possible to steal a client’s token and session ID and present as that client, and now the attacker has a big long list of API updates they can pull every second.

Now, all of this is possible over a websocket connection as well; however, the websocket is read-only, untraversable, and periodic. A hijacked session would not receive a full object update; that’s only sent during authentication. The hijacked session could only receive updates, many of which will be partial, to individual API objects.

Polling is also bandwidth-expensive. Retrieving a JSON file with up to 100 changes in it every second or so is wasteful, and on top of that the metadata (headers and the like) increases the overhead of each request. Combine that with the fact that there could be long periods without any changes (basically transferring only headers), and the bandwidth losses on one client could be in the megabytes over the course of an hour, with no benefit.
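To put rough numbers on that claim, here’s a back-of-envelope sketch. The header size and poll rate are illustrative assumptions, not measurements from ARA-V:

```python
# Cost of polling a hypothetical /listofchanges route once per second,
# counting only HTTP header overhead. All figures are assumptions for
# illustration, not measured values.

HEADER_BYTES = 500        # assumed combined request + response header size
POLLS_PER_HOUR = 60 * 60  # one poll per second

overhead_per_hour = HEADER_BYTES * POLLS_PER_HOUR
print(f"{overhead_per_hour / 1_000_000:.1f} MB/hour of pure header traffic")
```

Even if not a single change happens all hour, each idle client burns on the order of a couple of megabytes just shuttling headers back and forth.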

Lastly, one minor detail: the longer the time between client updates, the higher the odds that conflicting edits will be made. Conflict resolution is possible with millisecond timestamps, but it takes more time to deal with. It’s better to prevent conflicts than to resolve them after the fact.
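For the curious, the timestamp approach boils down to last-write-wins. This is a minimal sketch of that idea, not ARA-V’s actual implementation; the field name `updated_ms` is an assumption:

```python
# Minimal last-write-wins merge using millisecond timestamps.
# A sketch only; field names are illustrative assumptions.

def resolve(local: dict, remote: dict) -> dict:
    """Keep whichever version carries the newer 'updated_ms' timestamp."""
    return remote if remote["updated_ms"] > local["updated_ms"] else local

a = {"title": "old title", "updated_ms": 1_700_000_000_000}
b = {"title": "new title", "updated_ms": 1_700_000_000_500}
print(resolve(a, b)["title"])  # prints "new title"
```

Simple enough, but note the cost: every object now has to carry a timestamp, the clocks have to agree to millisecond precision, and a losing edit is silently discarded. Hence the preference for keeping clients current enough that conflicts rarely happen.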

And so sockets are the answer, right?

Hold on there. The web gods have given us a choice: SSE or WS. SSE is unidirectional, served HTTP content that uses HTTP/1.1’s chunked transfer encoding and long polling. Basically, long polling lets the browser open an HTTP connection and wait, while chunked transfer allows shorter segments of data to be sent at a time so the connection doesn’t get blocked waiting on a huge file like an image, similar to how packets work in low-level networking. Chunked transfers also allow unrelated information to be sent to the browser over a single open HTTP socket: in this case, server-generated events.

All of this used to be done in client code (e.g. JavaScript; imagine the inefficiencies), but SSE and the HTML5 specification abstract the whole arrangement into an API.
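To make the abstraction concrete: underneath the browser’s EventSource API, the `text/event-stream` body is just named fields separated by blank lines. A helper like this (purely illustrative, the event name and payload are made up) is all a server needs to frame events onto that chunked HTTP response:

```python
# The text/event-stream wire format that EventSource abstracts away:
# "event:" and "data:" lines, terminated by a blank line per message.

import json

def sse_message(event: str, data: dict) -> str:
    """Serialize one server-sent event in text/event-stream framing."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

print(sse_message("asset-updated", {"id": 42, "title": "demo"}))
```

On the browser side, `new EventSource(url)` plus an `addEventListener("asset-updated", ...)` is the entire client; that’s the development-speed appeal.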

SSE seems like the golden choice, right? Well, no. SSE is fast, but it’s not real time. For one thing, you’re at the mercy of the HTTP socket itself. In our application that connection is going to be full of images and video and music, so it’s possible that during a page load the SSE won’t arrive until the end. The big issue, however, is that there will be other things connecting to the API besides browsers. The file system(s) will be using the EventBus just like the webUI users. They will be changing file locations and updating metadata in both directions, and the web clients need to know what they’re doing. Essentially, a unified interface will make things less complex, keeping in step with the KISS principle.

On to what I’m planning on doing. The problem is as above: client one changes some metadata, and client two won’t see it unless it sends another GET request. In come websockets and the EventBus. Both clients are already connected to their own websockets. Client one updates some metadata using a PUT, and this HTTP request is routed through the EventBus gateway:

PUT /assets/{id}
From the PUT and the URL, the gateway knows what the change is (update), what type of thing is being updated (asset), and which one ({id}; the id is probably not useful aside from error checking, but it has to be stored anyway). The gateway then directs the HTTP request to the API on the backend. The API does its thing, updates the records, and returns a 200 with a JSON body describing the change. This return is important because the gateway can’t tell whether any data is correctly formatted; that’s the API’s job. The gateway then responds to the client’s request accordingly, minus the body of the API response. Finally, the gateway broadcasts the body returned by the API server to all connected clients, and all clients update their local records accordingly.
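The first step of that flow can be sketched in a few lines. This assumes a flat `/{type}/{id}` route shape and a naive plural-to-singular rule; both are illustrative assumptions about the gateway, not ARA-V’s actual routing:

```python
# Sketch: the gateway derives an event from the HTTP method and URL alone,
# without understanding the payload. Route shape, the rstrip("s")
# singularizer, and field names are assumptions for illustration.

ACTIONS = {"POST": "create", "PUT": "update", "DELETE": "delete"}

def parse_event(method: str, path: str) -> dict:
    """Turn e.g. PUT /assets/{id} into a broadcastable event descriptor."""
    obj_type, obj_id = path.strip("/").split("/")
    return {
        "action": ACTIONS[method],     # what the change is
        "type": obj_type.rstrip("s"),  # what type of thing (naive singular)
        "id": obj_id,                  # which one (kept for error checking)
    }

def broadcast_payload(event: dict, api_body: dict) -> dict:
    """Message pushed to every connected websocket once the API returns 200."""
    return {**event, "data": api_body}

event = parse_event("PUT", "/assets/123")
print(broadcast_payload(event, {"title": "new title"}))
```

The point of the sketch is that the gateway never inspects `api_body`; it only wraps whatever the API validated and returned, which is what keeps it data-agnostic.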

(as an aside: there will be changes to limit the scope of which client receives what data when)

There’s a clear benefit to this method: it’s totally data-agnostic. The API server handles data formatting, validation, and storage, and the predefined events in this system are based on the static schema models defined on the API server.

Now, even though the API is separate from the gateway, distributed features are a late-stage concern (i.e. for now the gateway will be configured to use one specific API server), because understanding system performance will take some testing.

On the ARAfs side this can boil down to a straight-up socket connection, allowing the filesystem (or another gateway) to send events as well as receive them. The ARAfs system will either behave like a sister event gateway, connect like a webUI client, or some version of both.

In the end this is a delicate balancing act over what is easier. SSE would make coding the web client easier, but it’s basically worthless when you’re talking about the filesystem side, adding extra gateways, or (another thing I should’ve talked about, but it can wait) the transcoding servers issuing progress updates to clients. Add all of that together and any development speed SSE provides is outweighed by the fact that the backend needs real-time communication between all of its pieces anyway. Websockets are just as useful in this scenario as any other socket protocol and hopefully will have no problem handling all these messages.
