December 5, 2014

Chunky Files

Filed under: Main — admin @ 12:01 am

I wasn’t a computer major in college. I could have been, but I just wanted to get the hell out of there. Had I changed majors, I hope that my contribution to the field of computer science would have been something I call chunky files.

My idea is this:

A file should know as much information about itself as possible. That information should be included as part of the file and be transportable with the file.

If you find that confusing, then let me explain a bit about how files are stored on a computer: A file is a mess of data. It’s just bits all crammed together — or in separate pieces (fragments) — on a storage device. That’s it.

A program can create a file and put all sorts of information into the file. Many do, but that information is specific to the program. Most programs simply dump raw data into the file and leave it at that. For example, when you create a tiny text file or a graphic image, it’s just data. The image may have information about color palette, resolution, and so on, but beyond that it’s just data.

The rest of the information about the file is stored in the directory. The directory is simply a database on the storage device; it's what you're looking at when you open a folder in a graphical operating system like Windows or OS X. The directory lists the file's name, creation date, modification date, size, and other trivia. Internally, the directory also references where the file physically exists on the storage media.

The directory model is used for just about all common computer storage media. To me, what that implies is that files themselves are dumb. They don’t even know their own name! Further, the file type, the program that created the file, and even the user who ran the program are all unknown elements. Only by examining the file name does the operating system know the file type.
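
Here's a quick C sketch of that dumbness, assuming a POSIX-style system; the filename sample.txt is just a stand-in. Everything the program can report comes from the filesystem's bookkeeping, not from anything inside the file:

    #include <stdio.h>
    #include <sys/stat.h>
    #include <time.h>

    int main(void)
    {
        struct stat info;

        /* "sample.txt" is a stand-in; use any file you have handy */
        if (stat("sample.txt", &info) == -1)
        {
            perror("stat");
            return 1;
        }

        /* All of this comes from the directory's bookkeeping */
        printf("Size: %lld bytes\n", (long long)info.st_size);
        printf("Modified: %s", ctime(&info.st_mtime));

        /* Note what's absent: the file's type, the program that
           created it, its history. The file itself says nothing. */
        return 0;
    }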

How such a horrid system has been allowed to continue is beyond me.

My solution is to use chunky files.

A chunky file knows a lot about itself. Like files under the old Mac OS, which had a data fork and a resource fork, a chunky file has two forks: a data fork and a meta-data fork.

The data fork is simply the file’s data. It could be text, graphics, bits and bytes, whatever.

The meta-data fork describes the file. Not only does it list the file’s type, it lists the creation date, modification dates, the user who created the file, which programs were used to create or modify the file, the file’s size, permissions, IP address, and a host of other trivial tidbits, all of which describe the file.
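
Purely as a sketch, here's one way a chunky file might be laid out on disk, written in C. The CHNK signature and every field name here are invented, not any real format:

    #include <stdint.h>

    /* A hypothetical chunky file layout: a small fixed header up
       front tells a reader where each fork lives in the file. */
    #define CHUNKY_MAGIC 0x43484E4Bu    /* "CHNK", an invented signature */

    struct chunky_header {
        uint32_t magic;          /* marks the file as chunky */
        uint32_t version;        /* format version, for future changes */
        uint64_t meta_offset;    /* where the meta-data fork begins */
        uint64_t meta_length;    /* size of the meta-data fork */
        uint64_t data_offset;    /* where the data fork begins */
        uint64_t data_length;    /* size of the data fork */
    };

A program that understands the header reads the meta-data fork first; a program that wants only the contents can skip straight to the data fork, which holds raw bytes just like a traditional file.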

If the world switched to chunky files, then you’d never again see one of those “Open With What?” dialog boxes. Instead, you’d see, “This file was created by the Blorfus Editor 2.0 by Dan Gookin on July 16, 2006. Unable to find a compatible app for opening this file.”

The file knows what it is.

A drawback to this approach is that it makes files larger. That's not an issue with today's mass storage; adding the chunky data wouldn't significantly increase a file's size. In fact, the meta-data could be stored in the XML format (plain text). And given the minimum allocation unit for files today (typically 8K), the impact on storage would be minimal. That wasn't the case years ago, when most files were small, but today it's manageable.
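
To make the XML idea concrete, here's a tiny C program that prints what such a plain-text meta-data fork might look like. The element names are invented, and the details echo the Blorfus example above:

    #include <stdio.h>

    int main(void)
    {
        /* A sample meta-data fork stored as plain-text XML; the
           element names are made up for illustration. */
        fputs(
            "<chunky-meta>\n"
            "  <type>Blorfus document</type>\n"
            "  <creator>Blorfus Editor 2.0</creator>\n"
            "  <author>Dan Gookin</author>\n"
            "  <created>2006-07-16</created>\n"
            "</chunky-meta>\n",
            stdout);
        return 0;
    }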

So that's my notion of chunky files. It might be too late at this point in the space-time continuum to implement the idea. And even if I had chosen to go all-nerd back when I was in school, who knows how far it would have gotten?

4 Comments

  1. Mmmm, that does sound sensible today, but as you mention, in the old days it would have caused havoc! There are ways around the 8K limit; for instance, I have created a 2K asm file (but thanks to the mapping of the drive, it still takes 8K for Windoze to find it!)

    Comment by glennp — December 5, 2014 @ 6:48 am

  2. When 256-byte sectors were the minimum allocation unit, and your typical hard drive (in a PC XT) was 10MB, all that chunky data — metadata — would have been a burden on storage.

    A drawback to an 8K allocation unit is that the hard drive gets really sluggish when it has to deal with a directory full of small files. My programming directories have dozens of tiny files in them. While the terminal window lists them all lickety-split, a folder window takes several seconds to populate.

    Comment by admin — December 5, 2014 @ 7:26 am

  3. The future is mobile computing, and mobile operating systems are trying to destroy the concept of direct file access through I/O, replacing files with content providers that can only be accessed by explicitly asking the operating system through intents (in Android). The funny thing is that people assume mobile operating systems are more efficient than desktop operating systems, but because of this they are actually more bloated. It does make the operating system more secure and stable, though, so it crashes less.

    Comment by BradC — December 5, 2014 @ 1:53 pm

  4. BradC, yet another reason why I’m leery of the merger between desktop and mobile operating systems.

    Comment by admin — December 5, 2014 @ 1:58 pm
