So, as you’ll see in the //TODO list at the bottom, I have listed a Virtual Filesystem. You may be wondering why what has so far been presented as a web app needs its own filesystem. Well, dear reader I’ve made up, you’re missing the importance of backwards compatibility.
This is an embed from Razuna, a Digital Asset Manager. I have it running for testing and comparison purposes.
I use it as a guide for what to improve.
Aside from the web UI, this is the only practical way to get files out of the server, because the on-disk file structure is honestly horrendous.
Z:\assets\2\EA880B42F8E247FC9BC0F3D97A1BD442\img\CC5F4E6FCE9E4A9F8BE4307F46E897B6
This is where the files are stored on my system. It is basically un-navigable: every folder is flattened into the root directory, the files are sorted by type for some reason, and the hash names are huge. In the web UI, however, these same files live in /uploads/worpresstest. Much clearer than the monstrosity above, right? So why can’t I access these files using that path instead of a 16-byte hex string? That is what I want from a file server: a clearer, easier-to-follow folder structure that stays in sync between the folder view in the web UI and the folder structure returned by an actual file server, plus a system to sync metadata from the API database directly with the files. Basically, a total unification of a web service and a local NAS. In service of this goal I’ve discovered several systems that should lead me to the promised land. I will briefly summarise them and why they’re important.
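To make that idea concrete, here is a minimal sketch of the lookup arafs would need: an index from the web UI’s folder paths to wherever the bytes actually live on disk. The class name `VirtualIndex` and the filename `logo.png` are hypothetical. In practice the mapping would be populated from the DAM’s database rather than by hand.

```python
from __future__ import annotations

from pathlib import PurePosixPath


class VirtualIndex:
    """Hypothetical index from web UI paths to opaque on-disk storage paths."""

    def __init__(self) -> None:
        self._files: dict[PurePosixPath, str] = {}

    def add(self, virtual_path: str, storage_path: str) -> None:
        self._files[PurePosixPath(virtual_path)] = storage_path

    def resolve(self, virtual_path: str) -> str:
        # Raises KeyError if the web UI has no file at this path.
        return self._files[PurePosixPath(virtual_path)]

    def listdir(self, virtual_dir: str) -> list[str]:
        # Direct children only: files here, plus folders implied by deeper paths.
        base = PurePosixPath(virtual_dir)
        children = set()
        for path in self._files:
            try:
                rel = path.relative_to(base)
            except ValueError:
                continue  # not under virtual_dir
            if rel.parts:
                children.add(rel.parts[0])
        return sorted(children)


index = VirtualIndex()
# "logo.png" is a made-up filename; the storage path is the one shown above.
index.add("/uploads/worpresstest/logo.png",
          r"Z:\assets\2\EA880B42F8E247FC9BC0F3D97A1BD442\img\CC5F4E6FCE9E4A9F8BE4307F46E897B6")
print(index.listdir("/uploads"))  # ['worpresstest']
print(index.resolve("/uploads/worpresstest/logo.png"))
```

The web UI and the file server would both read from the same index, so the two views can’t drift apart.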
First up is the real champion here: FUSE, Filesystem in Userspace. It exposes hooks into the kernel’s file handling without needing to write and load a kernel module (which is basically like adding another cylinder to your 4-door sedan). That means an ordinary userspace program can create a system of symlinks and modify files as they’re being accessed. Doing the same in a kernel module written in straight C would not only be a daunting task for even the best programmers, it would also be basically impossible to maintain across all the versions of the Linux kernel out there.
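To illustrate what a FUSE filesystem actually has to answer, here is a sketch of a read-only tree served from memory. The method names and signatures (`getattr(path, fh=None)`, `readdir(path, fh)`, `read(path, size, offset, fh)`) follow fusepy’s `Operations` class. To keep the sketch self-contained, though, it doesn’t import fusepy. `InMemoryView` is a hypothetical name and the file contents are made up.

```python
from __future__ import annotations

import errno
import stat
import time


class InMemoryView:
    """Hypothetical read-only tree; method signatures mirror fusepy's Operations."""

    def __init__(self, files: dict[str, bytes]) -> None:
        self.files = files  # absolute virtual path -> file contents
        self.now = time.time()

    def _is_dir(self, path: str) -> bool:
        prefix = path.rstrip("/") + "/"
        return any(p.startswith(prefix) for p in self.files)

    def getattr(self, path, fh=None):
        times = dict(st_atime=self.now, st_mtime=self.now, st_ctime=self.now)
        if path in self.files:
            return dict(st_mode=stat.S_IFREG | 0o444, st_nlink=1,
                        st_size=len(self.files[path]), **times)
        if path == "/" or self._is_dir(path):
            return dict(st_mode=stat.S_IFDIR | 0o555, st_nlink=2, **times)
        raise OSError(errno.ENOENT, "No such file or directory", path)

    def readdir(self, path, fh):
        # List only the first path component below this directory.
        prefix = path.rstrip("/") + "/"
        names = {p[len(prefix):].split("/", 1)[0]
                 for p in self.files if p.startswith(prefix)}
        return [".", ".."] + sorted(names)

    def read(self, path, size, offset, fh):
        if path not in self.files:
            raise OSError(errno.ENOENT, "No such file or directory", path)
        return self.files[path][offset:offset + size]


view = InMemoryView({"/uploads/worpresstest/hello.txt": b"hello world"})
print(view.readdir("/", None))  # ['.', '..', 'uploads']
print(view.read("/uploads/worpresstest/hello.txt", 5, 6, None))  # b'world'
```

To actually mount it, you would subclass `fuse.Operations`, raise `fuse.FuseOSError` instead of a plain `OSError`, and call `FUSE(view, mountpoint, foreground=True)`. That is the same pattern fusepy’s loopback example uses.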
Next up is GlusterFS, also a virtual file system, whose native client is built on FUSE. Their docs have a great explainer here: http://gluster.readthedocs.io/. It’s basically a distributed file system with a completely modular structure. Its modules are called translators, and they’re stacked so that each one can intercept and modify file operations as they pass through on their way to storage.
The idea is that the VFS portion of arafs will be part of Gluster and rewrite its file requests in real time (and thankfully there’s a Python library to help me).
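Conceptually, a translator stack is a chain of layers, and each layer may rewrite a request before handing it to the layer beneath. The sketch below models that idea in plain Python. It is not Gluster’s translator API: real translators are C shared libraries wired together in a volfile. `PathRewriter`, `AccessLog` and the filename `logo.png` are hypothetical.

```python
from __future__ import annotations


class Translator:
    """One layer in a stack; by default it passes the request straight down."""

    def __init__(self, child: Translator | None = None) -> None:
        self.child = child

    def read(self, path: str) -> bytes:
        if self.child is None:
            raise NotImplementedError("bottom layer must override read()")
        return self.child.read(path)


class Storage(Translator):
    """Bottom layer standing in for the brick that holds the real bytes."""

    def __init__(self, blobs: dict[str, bytes]) -> None:
        super().__init__()
        self.blobs = blobs

    def read(self, path: str) -> bytes:
        return self.blobs[path]


class PathRewriter(Translator):
    """Hypothetical arafs layer: swap web UI paths for hashed storage paths."""

    def __init__(self, child: Translator, mapping: dict[str, str]) -> None:
        super().__init__(child)
        self.mapping = mapping

    def read(self, path: str) -> bytes:
        # Unknown paths fall through unchanged.
        return super().read(self.mapping.get(path, path))


class AccessLog(Translator):
    """Hypothetical pass-through layer that only records what was requested."""

    def __init__(self, child: Translator) -> None:
        super().__init__(child)
        self.seen: list[str] = []

    def read(self, path: str) -> bytes:
        self.seen.append(path)
        return super().read(path)


HASHED = "/2/EA880B42F8E247FC9BC0F3D97A1BD442/img/CC5F4E6FCE9E4A9F8BE4307F46E897B6"
stack = AccessLog(PathRewriter(Storage({HASHED: b"jpeg bytes"}),
                               {"/uploads/worpresstest/logo.png": HASHED}))
print(stack.read("/uploads/worpresstest/logo.png"))  # b'jpeg bytes'
print(stack.seen)  # ['/uploads/worpresstest/logo.png']
```

The client only ever sees the friendly path, and only the rewriting layer knows about the hash.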
Beyond its virtual file system and translators, the most unique selling point of this free software is UFO, its Unified File and Object system (aka gluster-swift), now known as SwiftOnFile. Basically, it will allow arafs to serve both file server users and web users in native, fast ways. Building object storage directly into the program will hopefully future-proof it, though it’s hard to say whether file systems will go away anytime soon.
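As I understand gluster-swift’s layout (treat this as an assumption, not a spec), the Swift account maps to a Gluster volume, a container maps to a top-level directory, and an object maps to a file beneath it. That makes one set of bytes reachable both as an object URL and as a path on the mount. Here is a sketch of that mapping. It assumes the default `AUTH_` account prefix, and the mount root `/mnt/gluster-object` and volume name `arafs` are hypothetical.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def object_to_file_path(url_path: str,
                        mount_root: str = "/mnt/gluster-object") -> PurePosixPath:
    """Map "/v1/AUTH_<volume>/<container>/<object>" onto a file under a mount.

    Assumed layout: account == volume, container == top-level directory,
    object == file path (which may itself contain slashes).
    """
    parts = url_path.lstrip("/").split("/", 3)
    if len(parts) != 4 or parts[0] != "v1" or not parts[1].startswith("AUTH_"):
        raise ValueError(f"not an object path: {url_path!r}")
    _, account, container, obj = parts
    volume = account[len("AUTH_"):]
    unsafe = ("", ".", "..")
    # Reject anything that could escape the mount (absolute or ".." segments).
    if (volume in unsafe or container in unsafe or obj in unsafe
            or obj.startswith("/") or ".." in PurePosixPath(obj).parts):
        raise ValueError(f"refusing unsafe path: {url_path!r}")
    return PurePosixPath(mount_root, volume, container, obj)


print(object_to_file_path("/v1/AUTH_arafs/uploads/worpresstest/logo.png"))
# /mnt/gluster-object/arafs/uploads/worpresstest/logo.png
```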
Gluster builds and serves its files transparently to Gluster clients, which makes them totally agnostic about how files are served over the network: any traditional file server daemon can export a Gluster mount without special libraries. However, this is where the problem lies. Since most translators are static, I don’t yet know how a translator that has to retrieve metadata from an API would affect performance. Not to mention how a file client rewriting EXIF data on the files themselves would actually work. Right now I have roughly this model set up:
The entire file has to go through the exiftool pipe. I could probably get quicker write performance by doubling up on the pipes, with one stripping the EXIF data while the other exports it. Thankfully Linux has me covered with tons of automatic caching. Ideally, objects that are read once and not written to will be cached with the correct EXIF data, but that is a late-stage concern. If all else fails, writing metadata directly to the files is possible.
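A rough sketch of that read path rests on a few assumptions. Metadata from the API arrives as a plain dict, and exiftool is on PATH. Exiftool accepts `-` as the input file (stdin), `-o -` to write to stdout and `-all=` to strip existing tags, as its documentation describes. `RetagCache` is a hypothetical read-once cache keyed on (path, mtime): a write bumps the modification time, so the stale entry simply stops matching. All function and class names are made up for illustration, and this runs one exiftool process per file rather than the doubled-up pipes.

```python
from __future__ import annotations

import shutil
import subprocess
from collections import OrderedDict
from typing import Callable


def build_exiftool_cmd(metadata: dict[str, str], strip_first: bool = False) -> list[str]:
    """Build an exiftool command that reads a file on stdin and writes it to stdout."""
    cmd = ["exiftool"]
    if strip_first:
        cmd.append("-all=")  # delete existing metadata before adding ours
    for tag, value in metadata.items():
        cmd.append(f"-{tag}={value}")  # one argv item per tag, no shell quoting
    cmd += ["-o", "-", "-"]  # "-o -" writes to stdout, final "-" reads stdin
    return cmd


def retag(data: bytes, metadata: dict[str, str], strip_first: bool = False) -> bytes:
    """Pipe the whole file through exiftool and return the retagged bytes."""
    if shutil.which("exiftool") is None:
        raise RuntimeError("exiftool is not on PATH")
    result = subprocess.run(build_exiftool_cmd(metadata, strip_first),
                            input=data, capture_output=True, check=True)
    return result.stdout


class RetagCache:
    """Hypothetical LRU cache of retagged file contents, keyed on (path, mtime)."""

    def __init__(self, render: Callable[[str], bytes], max_entries: int = 128) -> None:
        self.render = render  # produces the retagged bytes for a path
        self.max_entries = max_entries
        self._entries: OrderedDict[tuple[str, float], bytes] = OrderedDict()

    def get(self, path: str, mtime: float) -> bytes:
        key = (path, mtime)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        # Miss: a new file, or one whose mtime changed because it was written to.
        data = self.render(path)
        self._entries[key] = data
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return data


def render_from_disk(path: str) -> bytes:
    with open(path, "rb") as f:
        # Made-up metadata standing in for a lookup against the API database.
        return retag(f.read(), {"Artist": "Jane Doe"})


cache = RetagCache(render_from_disk)
# cache.get(path, os.stat(path).st_mtime) would only re-run exiftool after a write.
print(build_exiftool_cmd({"Artist": "Jane Doe", "Title": "Logo"}, strip_first=True))
# ['exiftool', '-all=', '-Artist=Jane Doe', '-Title=Logo', '-o', '-', '-']
```

Keying on mtime keeps invalidation trivial. The cost is that stale entries hang around until LRU eviction pushes them out.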
Some kind soul made this passthrough (loopback) driver for the Python FUSE library, fusepy: https://github.com/terencehonles/fusepy/blob/master/examples/loopback.py
However, I will probably attempt to write a module (aka translator) for Gluster instead, to avoid any interference a separate VFS module would cause. The Python bindings for Gluster’s libgfapi are here: https://github.com/gluster/libgfapi-python
And I guess that’s it for this one. I would love any feedback. Hopefully there’s tangible content here and not just me rambling about drivers and then abruptly stopping. I don’t edit these posts much; they’re meant to be a more coherent stream of consciousness while I soldier on towards completion.