Your Files Aren't Where You Think They Are

In this article

The filesystem is a presentation layer
What actually happens when you open a file
fanotify: the interception point
Stub files and the illusion of presence
Why you'd want to move data without telling applications
Searching what you can't see: the Husk Catalog
Why user-space is the right place for this

The Filesystem Is a Presentation Layer

Let's start with something that feels obvious but has a profound implication: when you look at a folder on your computer, you are not looking at your data. You are looking at a map of your data — a structured set of names, paths, and pointers that your operating system maintains so you don't have to think about where bytes actually live on physical hardware.

This map is the filesystem. And like any map, it can point to different territories without the reader knowing or caring. A road atlas doesn't change when a highway gets rerouted — the map is updated and you drive the new road. Your applications work the same way.

What you believe

The file is at that path

When you see /archive/video/film.mov, it feels like the data physically lives at that location — like a physical file in a physical folder in a physical cabinet.

What is actually true

The path is a pointer

The path is a name. The name points to a metadata record. The metadata record points to physical blocks. Those blocks could be on an SSD, a hard drive, a remote server, or a tape cartridge. The name doesn't change when the blocks move.

This is not a quirk or an edge case. It is the fundamental design of every major operating system. UNIX-style systems have worked this way since the 1970s. Windows NTFS has worked this way since the 1990s. The abstraction is so good that most people who work with computers professionally have never had to think about it. HuskHoard is built on top of that abstraction — and exploits it deliberately.
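
To make the name-to-record-to-blocks chain concrete, here is a minimal toy model in Rust. It is a sketch, not how any real filesystem or HuskHoard stores its data: `Namespace`, `Record`, and `Location` are hypothetical names invented for illustration. The point it shows is that relocating the data touches only the record, never the name.

```rust
use std::collections::HashMap;

/// Where a file's bytes physically live (hypothetical tiers for illustration).
#[derive(Debug, Clone, PartialEq)]
enum Location {
    Ssd { first_block: u64 },
    Tape { volume: String },
}

/// A metadata record: the thing a path actually points to.
#[derive(Debug, Clone, PartialEq)]
struct Record {
    size_bytes: u64,
    location: Location,
}

/// A toy namespace: path -> record id -> record.
#[derive(Default)]
struct Namespace {
    names: HashMap<String, u64>,
    records: HashMap<u64, Record>,
}

impl Namespace {
    fn create(&mut self, path: &str, id: u64, record: Record) {
        self.names.insert(path.to_string(), id);
        self.records.insert(id, record);
    }

    /// Resolve a path the way an application would: by name only.
    fn resolve(&self, path: &str) -> Option<&Record> {
        let id = self.names.get(path)?;
        self.records.get(id)
    }

    /// Move the data to a new location. Only the record changes; the name is untouched.
    fn relocate(&mut self, path: &str, new_location: Location) -> bool {
        let id = match self.names.get(path) {
            Some(id) => *id,
            None => return false,
        };
        match self.records.get_mut(&id) {
            Some(record) => {
                record.location = new_location;
                true
            }
            None => false,
        }
    }
}
```

Calling `relocate("/archive/video/film.mov", Location::Tape { .. })` and then `resolve` with the same path returns the same record with a new location, which is the property the rest of this article relies on.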

What Actually Happens When You Open a File

When your video editor calls open("/archive/video/film.mov"), a surprising number of things happen before a single byte of video data is read. Understanding this chain is essential to understanding where HuskHoard sits.

1. Your Application (user space)
   Calls open(path). Has no idea what happens next.

2. C Library, glibc (user space)
   Translates the function call into a system call that the kernel understands.

3. fanotify / HuskHoard (the intercept point: HuskHoard lives here)
   The kernel pauses the request here and asks HuskHoard: "Is this file available?"

4. VFS, Virtual File System (kernel space)
   The kernel's unified filesystem interface. Translates paths into inode lookups regardless of the underlying storage type.

5. Physical Storage (hardware)
   SSD, HDD, NFS mount, tape drive: the VFS doesn't care. Neither does your application.

The critical insight is layer 4: the Virtual File System. The VFS is the part of the Linux kernel that lets you mount an NFS share, an SMB share, a FUSE filesystem, and a local ext4 disk, and address files on all of them with the same path syntax. The VFS is why /mnt/nas/file.mov and /local/file.mov look identical to your application. What's underneath the mount point is irrelevant to anything above it.
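
The idea of one path syntax over many backends can be sketched with a toy mount table. This is an illustration of the concept only, not the kernel's implementation: `Backend`, `LocalDisk`, `NetworkShare`, and `MountTable` are hypothetical names, and the longest-prefix rule is a simplification of real mount resolution.

```rust
/// A storage backend behind a mount point (names here are illustrative).
trait Backend {
    fn kind(&self) -> &'static str;
    fn read(&self, relative_path: &str) -> Vec<u8>;
}

struct LocalDisk;
struct NetworkShare;

impl Backend for LocalDisk {
    fn kind(&self) -> &'static str { "ext4" }
    fn read(&self, relative_path: &str) -> Vec<u8> {
        format!("local bytes of {}", relative_path).into_bytes()
    }
}

impl Backend for NetworkShare {
    fn kind(&self) -> &'static str { "nfs" }
    fn read(&self, relative_path: &str) -> Vec<u8> {
        format!("remote bytes of {}", relative_path).into_bytes()
    }
}

/// A toy mount table: the longest matching mount point wins.
struct MountTable {
    mounts: Vec<(String, Box<dyn Backend>)>,
}

impl MountTable {
    fn resolve(&self, path: &str) -> Option<(&dyn Backend, String)> {
        self.mounts
            .iter()
            .filter(|(mount_point, _)| is_under(path, mount_point))
            .max_by_key(|(mount_point, _)| mount_point.len())
            .map(|(mount_point, backend)| {
                let relative = path[mount_point.len()..].trim_start_matches('/').to_string();
                (&**backend, relative)
            })
    }

    /// The caller uses the same call whatever backend sits under the path.
    fn read(&self, path: &str) -> Option<Vec<u8>> {
        let (backend, relative) = self.resolve(path)?;
        Some(backend.read(&relative))
    }
}

/// True if `path` is the mount point itself or lies beneath it.
fn is_under(path: &str, mount_point: &str) -> bool {
    if mount_point == "/" {
        return path.starts_with('/');
    }
    path == mount_point
        || (path.starts_with(mount_point)
            && path.as_bytes().get(mount_point.len()) == Some(&b'/'))
}
```

With `/` mapped to `LocalDisk` and `/mnt/nas` mapped to `NetworkShare`, `table.read("/mnt/nas/file.mov")` and `table.read("/local/file.mov")` are the same call from the caller's side; only the table knows they land on different storage.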

HuskHoard operates just above this layer, using fanotify to intercept access events before the kernel fulfills them. At that point it has a choice: allow the access (because the data is local), or pause the access and go get the data from wherever it actually lives.

fanotify: The Interception Point

Linux has two main APIs for watching filesystem events. inotify is the older and more familiar one: it tells you that something happened (a file was opened, modified, deleted), but by the time you hear about it, the event is done. You're an observer after the fact.

fanotify is different. With fanotify in permission mode, your process doesn't just observe events — it participates in them. The kernel holds the calling process in a suspended state and waits for your daemon to issue a verdict before proceeding. This is the mechanism that makes transparent demand-loading possible.
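
At the byte level, the "verdict" is a small struct written back to the fanotify file descriptor. The sketch below decodes an event header and encodes a response using only the standard library. The constants and field layout follow `<linux/fanotify.h>`; the Rust names (`EventMetadata`, `parse_event`, `encode_response`) are invented for this illustration, and setting up the fanotify descriptor itself (which needs elevated privileges) is left out.

```rust
use std::convert::TryInto;

// Values from <linux/fanotify.h>.
const FAN_OPEN_PERM: u64 = 0x0001_0000;
const FAN_ALLOW: u32 = 0x01;
const FAN_DENY: u32 = 0x02;
const EVENT_METADATA_LEN: usize = 24;

/// Fields of `struct fanotify_event_metadata`, in kernel layout order.
#[derive(Debug, PartialEq)]
struct EventMetadata {
    event_len: u32,
    vers: u8,
    metadata_len: u16,
    mask: u64,
    fd: i32,
    pid: i32,
}

/// Decode one event header from a buffer filled by read(2) on the fanotify fd.
fn parse_event(buf: &[u8]) -> Option<EventMetadata> {
    if buf.len() < EVENT_METADATA_LEN {
        return None;
    }
    Some(EventMetadata {
        event_len: u32::from_ne_bytes(buf[0..4].try_into().ok()?),
        vers: buf[4],
        // buf[5] is a reserved byte.
        metadata_len: u16::from_ne_bytes(buf[6..8].try_into().ok()?),
        mask: u64::from_ne_bytes(buf[8..16].try_into().ok()?),
        fd: i32::from_ne_bytes(buf[16..20].try_into().ok()?),
        pid: i32::from_ne_bytes(buf[20..24].try_into().ok()?),
    })
}

/// True if the kernel is waiting for a verdict on an open().
fn is_open_permission_event(event: &EventMetadata) -> bool {
    (event.mask & FAN_OPEN_PERM) != 0
}

/// Encode `struct fanotify_response { fd, response }` for writing back to the fanotify fd.
fn encode_response(fd: i32, allow: bool) -> [u8; 8] {
    let verdict = if allow { FAN_ALLOW } else { FAN_DENY };
    let mut out = [0u8; 8];
    out[0..4].copy_from_slice(&fd.to_ne_bytes());
    out[4..8].copy_from_slice(&verdict.to_ne_bytes());
    out
}
```

Until the daemon writes those eight bytes, the process that called open() stays blocked. That is the whole trick.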

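// Simplified: what HuskHoard does when fanotify fires a permission event.
// The types and helpers here are illustrative stand-ins, not HuskHoard's real API.

enum FileState { Local, Cached, Offline }
enum Permission { Allow }

struct FanotifyEvent { path: String }

impl FanotifyEvent {
    fn file_path(&self) -> &str { &self.path }
}

trait Catalog {
    fn lookup(&self, path: &str) -> FileState;
    fn volume_for(&self, path: &str) -> String;
}

fn prompt_or_autoload(_volume: &str) {} // stub: insert tape / load from robot
fn stream_to_cache(_path: &str) {}      // stub: pull blocks from tape

fn handle_open_event(catalog: &dyn Catalog, event: &FanotifyEvent) -> Permission {
    let path = event.file_path();

    // Check the Husk Catalog: is this file local or offline?
    match catalog.lookup(path) {
        FileState::Local    => Permission::Allow,   // data is here, proceed
        FileState::Cached   => Permission::Allow,   // warm cache hit, proceed
        FileState::Offline  => {
            // Data is on tape. Pause the caller, go get it.
            let volume = catalog.volume_for(path);
            prompt_or_autoload(&volume);            // insert tape / load from robot
            stream_to_cache(path);                  // pull blocks from tape
            Permission::Allow                       // now let the caller through
        }
    }
}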

The calling application — the video editor, the backup job, the Python script — never receives an error. It issued an open() call, and eventually the call succeeded. Whether that took 2 milliseconds (local SSD) or 45 seconds (tape load + seek) is the only observable difference. The path didn't change. The filename didn't change. The application's code didn't change.

Why not a kernel module?

Custom kernel modules can also intercept filesystem operations, but a bug in kernel space can panic the whole system and take the entire machine down. A bug in a user-space daemon is just a crashed process. With HuskHoard written in Rust, even that is less likely: safe Rust rules out use-after-free bugs, buffer overflows, and data races, the memory-safety errors behind many daemon crashes. User-space is both safer and faster to develop against.

Stub Files and the Illusion of Presence

When HuskHoard moves a file's data to another storage volume, it replaces the file's contents with a stub. A stub is a near-zero-size placeholder that lives at the original path and carries the file's metadata in its extended attributes (xattrs): the real file size, its checksum, and the UUID of the storage volume that holds the actual data, whether that is a disk, an S3 bucket, or an LTO tape.

From the directory listing's perspective, the file is still there. ls -lh reports the correct file size. stat returns the correct timestamps. A backup job scanning the directory sees all the filenames. Nothing looks different — until something tries to actually read the bytes, at which point fanotify catches the open request and HuskHoard goes to work.

# A stubbed file looks completely normal in a directory listing
$ ls -lh /archive/video/

-rw-r--r-- 1 jm users  47G Nov 14 2024 project_alpha_final.mov   # on tape
-rw-r--r-- 1 jo users 112G Mar  2 2025 broll_march_2025.mov       # on tape
-rw-r--r-- 1 jm users   8G May 20 2026 current_edit_v3.mov        # local SSD

# The xattr reveals what's really going on
$ getfattr -n user.husk.state project_alpha_final.mov
# file: project_alpha_final.mov
user.husk.state="offline:volume=83ad72b7:checksum=a3f9..."

This design means your directory tree is always a complete and accurate representation of your archive. You never have to wonder what's "on tape" versus what's "on disk" — everything is present in the namespace. The catalog tells you the storage tier; the stub ensures the path is always valid.
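
If you wanted to read that attribute from your own tooling, the value is easy to take apart. The sketch below assumes the layout suggested by the single sample above (a state word followed by colon-separated `key=value` fields); that layout is an inference from the example, not a documented format, and `StubState` and `parse_stub_state` are hypothetical names.

```rust
use std::collections::HashMap;

/// Parsed form of a `user.husk.state` value (layout assumed from the sample above).
#[derive(Debug, PartialEq)]
struct StubState {
    state: String,
    fields: HashMap<String, String>,
}

/// Split "state:key=value:key=value" into its parts.
/// Returns None if the state word is empty or a field has no '='.
fn parse_stub_state(value: &str) -> Option<StubState> {
    let mut parts = value.split(':');
    let state = parts.next().filter(|s| !s.is_empty())?.to_string();
    let mut fields = HashMap::new();
    for part in parts {
        let mut key_value = part.splitn(2, '=');
        let key = key_value.next()?;
        let val = key_value.next()?;
        fields.insert(key.to_string(), val.to_string());
    }
    Some(StubState { state, fields })
}
```

A script could use `fields.get("volume")` to tell an operator which cartridge a file lives on without opening the file, and therefore without triggering a retrieval.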

Why You'd Want to Move Data Without Telling Applications

Once you accept that the filesystem path is just a pointer, and that HuskHoard can transparently intercept access to data that isn't local, the interesting question becomes: when would you want to move data behind the scenes? The answer turns out to be surprisingly broad.

01 — Performance

Hot / Cold Tiering

The project you're actively cutting needs fast SSD access. The 200 projects you finished last year don't. HuskHoard can automatically migrate files that haven't been accessed in 30, 60, or 90 days to tape — and transparently retrieve them if something reaches for them again. Your fast storage stays fast because it only holds what you're actually using.

02 — Security

Air-Gap Without Effort

Moving data to an ejected tape cartridge or a cold disk creates a physical air gap with zero workflow change. The files still appear in your directory. Applications still reference them by the same paths. But the actual bytes are on a cartridge that cannot be reached by ransomware, rogue processes, or network attackers — because it isn't connected to anything.

03 — Legal & Compliance

Immutable Retention

Many industries require records to be retained in a form that cannot be altered. WORM (Write Once, Read Many) volumes, written sequentially and kept offline, are a physical guarantee of immutability: the data cannot be overwritten, even by the drive itself. HuskHoard can migrate records marked for legal hold to WORM volumes automatically, while keeping them accessible at their original paths.

04 — Cost

Storage Tiers at Scale

At $0.005/GB, LTO-9 tape is roughly six times cheaper than current hard drives and two orders of magnitude cheaper than cloud object storage for data you actually keep long-term. Migrating your cold tier to disk or tape while keeping your warm tier on NVMe reduces your total storage spend without requiring any changes to how your applications reference data.

In all four cases, the mechanism is the same: HuskHoard relocates the physical data, leaves a stub at the original path, and uses fanotify to make retrieval seamless when something needs the data again. The applications above never see the machinery.

Tiering Policies in Practice

HuskHoard lets you define migration policies that run automatically. A simple policy might look like this:

# husk-policies.toml — example tiering configuration

[[policy]]
name        = "archive-cold-video"
watch       = "/archive/video"
action      = "migrate-to-tape"
after-days  = 60          # migrate files not accessed in 60 days
min-size    = "1G"        # only files over 1GB (ignore small assets)
volume-pool = "video-tape-pool"

[[policy]]
name        = "legal-hold"
watch       = "/archive/legal"
action      = "migrate-to-worm"
immediately = true        # move to WORM tape immediately on write
volume-pool = "worm-pool"
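
The decision behind the first policy above can be sketched in a few lines of Rust. This is an assumption-laden illustration, not HuskHoard's policy engine: `Policy`, `parse_size`, and `should_migrate` are hypothetical names, size suffixes are treated as binary multiples, and both thresholds are treated as inclusive.

```rust
/// Hypothetical in-memory form of one [[policy]] entry.
struct Policy {
    after_days: u64,
    min_size_bytes: u64,
}

/// Parse sizes such as "1G" or "500M" (binary multiples assumed).
fn parse_size(text: &str) -> Option<u64> {
    let text = text.trim();
    let (digits, multiplier) = match text.chars().last()? {
        'K' => (&text[..text.len() - 1], 1u64 << 10),
        'M' => (&text[..text.len() - 1], 1u64 << 20),
        'G' => (&text[..text.len() - 1], 1u64 << 30),
        'T' => (&text[..text.len() - 1], 1u64 << 40),
        _ => (text, 1),
    };
    digits.parse::<u64>().ok()?.checked_mul(multiplier)
}

/// A file qualifies when it is both cold enough and large enough.
fn should_migrate(policy: &Policy, days_since_access: u64, size_bytes: u64) -> bool {
    days_since_access >= policy.after_days && size_bytes >= policy.min_size_bytes
}
```

With `after_days: 60` and `min_size_bytes: parse_size("1G")`, a 47 GB file untouched for 90 days qualifies, while a 340 MB asset or a file opened last week does not.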

Once a policy runs, the files are on tape and the stubs are in place. Nothing else in your workflow changes. A user who opens the legal folder the next morning sees exactly the same files they always saw. If they open one, HuskHoard handles the retrieval. If they never touch it, the data costs $0.005/GB to keep rather than $0.030/GB.
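
To put those two prices side by side, here is the arithmetic as a small Rust sketch. It uses only the per-gigabyte figures quoted above and counts media cost alone (no drives, power, or labor); the function names and the 200 TB archive size are illustrative assumptions.

```rust
// Prices quoted in the text, in thousandths of a dollar per GB
// (integers avoid floating-point rounding).
const TAPE_MILLS_PER_GB: u64 = 5; // $0.005/GB
const DISK_MILLS_PER_GB: u64 = 30; // $0.030/GB

/// Media cost in whole dollars for `gigabytes` of data at `mills_per_gb`.
fn media_cost_dollars(gigabytes: u64, mills_per_gb: u64) -> u64 {
    gigabytes * mills_per_gb / 1000
}

/// Dollars saved by keeping `gigabytes` on tape instead of disk.
fn tape_savings_dollars(gigabytes: u64) -> u64 {
    media_cost_dollars(gigabytes, DISK_MILLS_PER_GB)
        - media_cost_dollars(gigabytes, TAPE_MILLS_PER_GB)
}
```

For a 200 TB archive (200,000 GB in decimal units), that works out to $1,000 of tape versus $6,000 of disk, a difference of $5,000 in media alone.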

Searching What You Can't See: The Husk Catalog

The stub-file model keeps your directory tree intact. But what happens when the file is on a cartridge that's in a box in another room? The stub is there, but if you try to open it HuskHoard needs to tell you which cartridge to insert. And what if you don't remember which folder the file was in — you just know it exists somewhere in your 200TB archive across 8 disks and 15 tapes?

This is what the Husk Catalog is for. The catalog is a persistent, queryable index that HuskHoard maintains on your host machine, entirely separate from the physical media. Every file across every volume — whether that volume is currently loaded, sitting on a shelf, or in an offsite location — is indexed and searchable.

$ husk find project_alpha

/archive/video/2024/project_alpha/project_alpha_final.mov
    47.2 GB · written 2024-11-14 · offline · vol 83ad72b7

/archive/video/2024/project_alpha/broll_uncut.mov
    112.4 GB · written 2024-10-03 · offline · vol a1f903cc

/archive/assets/project_alpha/grade_luts.zip
    340 MB · written 2024-11-20 · online · vol d4e712ab

The search above ran across three volumes — two of them offline, one loaded — and returned results in milliseconds. No cartridges were touched. The catalog already knew what was on each volume because it indexed the files when they were written.

This is fundamentally different from how LTFS works. LTFS stores its index on the tape itself. If the tape is ejected, the index is inaccessible. You cannot search an LTFS volume without mounting it. With the Husk Catalog, the index is always available regardless of where the physical media is. You can search your entire archive from a laptop while all your tapes are in a safe across town.
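
The reason an off-media index can answer a search without touching a cartridge is easy to see in miniature. The sketch below is a toy in-memory catalog, not HuskHoard's storage format: `CatalogEntry`, `Availability`, `find`, and `volumes_to_load` are hypothetical names, and a case-insensitive substring match stands in for whatever query syntax the real tool supports.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Availability {
    Online,
    Offline,
}

#[derive(Debug, Clone, PartialEq)]
struct CatalogEntry {
    path: String,
    size_bytes: u64,
    volume: String,
    availability: Availability,
}

/// Case-insensitive substring match over catalog paths. No media is touched.
fn find<'a>(catalog: &'a [CatalogEntry], query: &str) -> Vec<&'a CatalogEntry> {
    let needle = query.to_lowercase();
    catalog
        .iter()
        .filter(|entry| entry.path.to_lowercase().contains(needle.as_str()))
        .collect()
}

/// Offline volumes an operator would have to load to read every hit.
fn volumes_to_load(hits: &[&CatalogEntry]) -> Vec<String> {
    let mut volumes: Vec<String> = hits
        .iter()
        .filter(|entry| entry.availability == Availability::Offline)
        .map(|entry| entry.volume.clone())
        .collect();
    volumes.sort();
    volumes.dedup();
    volumes
}
```

Searching is a scan of local records, and only the follow-up question ("which cartridges do I need to fetch?") refers to physical media at all.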

What the Catalog Stores

For every file on every volume, the catalog tracks:

Full path at the time of write
File size and last-modified timestamp
SHA-256 checksum (used for integrity verification)
The UUID of the volume the file lives on
The block address (for fast seeks during retrieval)
Current file state: local, cached, offline, or WORM

The checksum is particularly important. At any time you can ask HuskHoard to verify a volume: it reads the data back, recomputes each file's checksum, and compares the result against the catalog's record. No filesystem mount is needed, only the volume itself in a drive. If a cartridge or disk has degraded, you know before you need the data.

# Verify integrity of a specific volume without mounting it
$ husk verify --volume 83ad72b7

Verifying volume 83ad72b7 (offline — please insert volume)...
[Husk] Please insert volume [83ad72b7] into /dev/nst0

Reading and verifying 14.3 TB across 847 files...

✓  845 files  — checksum OK
⚠    2 files  — checksum mismatch (see: husk-report-83ad72b7.log)

Recommendation: restore affected files from redundant copy on vol a1f903cc
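
The comparison step behind a report like this can be sketched as follows. It assumes the digests have already been recomputed from the volume (the standard library has no SHA-256, so hashing is out of scope here), and `Verdict`, `verify`, and `count` are hypothetical names rather than HuskHoard's API.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq)]
enum Verdict {
    Ok,
    Mismatch,
    Missing,
}

/// Compare digests recomputed from the volume against the catalog's records.
/// Both maps are path -> hex digest.
fn verify(
    catalog: &BTreeMap<String, String>,
    read_back: &BTreeMap<String, String>,
) -> Vec<(String, Verdict)> {
    catalog
        .iter()
        .map(|(path, expected)| {
            let verdict = match read_back.get(path) {
                Some(actual) if actual.eq_ignore_ascii_case(expected) => Verdict::Ok,
                Some(_) => Verdict::Mismatch,
                None => Verdict::Missing,
            };
            (path.clone(), verdict)
        })
        .collect()
}

/// Count the entries in a report that carry a given verdict.
fn count(report: &[(String, Verdict)], wanted: &Verdict) -> usize {
    report.iter().filter(|(_, verdict)| verdict == wanted).count()
}
```

Because the expected digests live in the catalog rather than on the media, a damaged cartridge cannot quietly "agree with itself".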

Why User-Space Is the Right Place for This

The question we get most often from technically-minded readers is: why fanotify and not a custom filesystem or kernel module? Wouldn't a custom FUSE filesystem give you more control?

FUSE is a legitimate option, and we considered it for HuskHoard. The problem is that a FUSE filesystem needs to be the mount point: you'd have to mount the Husk filesystem somewhere and then route your data through it. This means your existing directory structure either has to move (disruptive) or you have to use bind mounts everywhere (complex and fragile). It also means FUSE overhead on every single filesystem call, not just the ones that need intervention.

fanotify doesn't require any remounting. It watches a directory tree that already exists, on whatever filesystem you're already using. It only intervenes when intervention is needed. When a file is local, the fanotify handler responds in microseconds with FAN_ALLOW and gets out of the way, so routine access to local files pays only a small, fixed cost per open.

Application calls open()

A video editor, backup script, or any other process tries to open a file at a path in a Husk-watched directory.

Kernel fires fanotify permission event

The Linux kernel suspends the calling process and delivers the event to the HuskHoard daemon. The application waits.

HuskHoard checks the catalog

The Rust daemon looks up the file in the Husk Catalog. Is it local? Cached? On an offline volume or S3? On a WORM tape?

Data is retrieved if needed

If the file is offline, HuskHoard prompts for the disk or cartridge or triggers an autoloader. The data streams from the volume into a local cache buffer.

FAN_ALLOW — caller resumes

Once the data is ready, HuskHoard sends the allow response and the kernel lets the application proceed. From the application's perspective, a slow file open. Nothing more.
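
The blocking handshake in the steps above can be simulated with nothing but threads and channels. This is a model of the control flow only: a channel stands in for the kernel's permission event, `open_blocking` stands in for the application's open(), and all of the names are hypothetical.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// A permission request: the path plus a one-shot channel for the verdict.
struct PermissionRequest {
    path: String,
    reply: Sender<bool>,
}

/// Stand-in for the application side: blocks until the daemon answers.
fn open_blocking(daemon: &Sender<PermissionRequest>, path: &str) -> bool {
    let (reply_tx, reply_rx) = channel();
    let request = PermissionRequest { path: path.to_string(), reply: reply_tx };
    if daemon.send(request).is_err() {
        return false;
    }
    // The caller is parked here, like a process waiting inside open().
    reply_rx.recv().unwrap_or(false)
}

/// Stand-in for the daemon: answer each request, "fetching" offline data first.
/// The thread returns the list of paths it had to fetch.
fn spawn_daemon(
    offline: Vec<String>,
) -> (Sender<PermissionRequest>, thread::JoinHandle<Vec<String>>) {
    let (tx, rx) = channel::<PermissionRequest>();
    let handle = thread::spawn(move || {
        let mut fetched = Vec::new();
        for request in rx {
            if offline.contains(&request.path) {
                // Placeholder for "load the volume and stream into the cache".
                fetched.push(request.path.clone());
            }
            // The allow verdict: only now does the caller resume.
            let _ = request.reply.send(true);
        }
        fetched
    });
    (tx, handle)
}
```

From the caller's side, an offline file and a local file go through the identical `open_blocking` call and both return `true`. The only difference is how long the call takes.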

The result is an archive system that sits invisibly inside your existing directory structure, imposes no overhead on data that's already local, and handles offline retrieval transparently for anything that isn't. No custom kernel code, no separate mount points, no changes to any application that reads your files.

The filesystem is a presentation layer. HuskHoard manages what sits behind it.