So far this is just a copy of the nullfs example from /usr/share/doc/python-fuse
with some stuff renamed.

To make it work:
- How do you get another arg in the options?
  - pydoc fuse shows some magic option parser stuff
  - need this for the "source" directory, or backing storage area
- Better to compress chunks? Or have a blob more like a zip?

-----
TODO:
+ Make inflate/deflate block based as needed, so we don't have to do a bunch of
  work up front and waste a bunch of space on disk
  - done
+ Make files just contain a backing storage key that references what they hold
  now (the data list and stat info), so that completely duplicate files won't
  take up a few extra megs each but can still have their own permissions and
  stuff
+ Copying read-only files doesn't work (permission denied on close, because
  that is the point where we open and write to the original file)
  - done
  - we open a file handle at __init__ now and use that
  - R/W is basically ignored at this point
- fsck:
  - test that each chunk before the last is a full block size (this would be a
    good assert too)
  + delete unused chunks (refcounting)
- pack multiple chunks into "super chunks" like cromfs/squashfs to get better
  compression (e.g. 4M of data will presumably compress better than the same
  file split into four 1M pieces compressed individually)
- Speed it up? Or is it "fast enough"?
- should die on errors accessing blocks; will also need some kind of fsck to
  find corrupt blocks and the files affected, so that if there is a problem and
  you have another copy of the file, the block can be recreated
- some kind of config/IOC to allow plugging in the hash, storage, etc. methods.
  Main components like FileSystem, Chunk, ChunkFile don't need to be swapped
  out since they are generic
- Maybe the compression method doesn't belong in Chunk? Maybe it should be part
  of storage (for super-chunks?), or should super-chunks be a "standard" part?
+ Add refcounting so we can expire chunks, or would a reverse list be better?
- If we separate metadata from chunks, then we can just rebuild the metadata
  for fsck

-----
Other thoughts:
- If there were an easy way to "open a file" or something and have it "touch"
  all its pieces, you could just run that over the mounted tree, then run
  "find storage/ -mtime +1" and delete what it lists to clean out cruft
- Alternatively, keep track of block usage counts and delete a block when its
  count goes to "zero"
- Change load/save to be refcounted? Or have separate "release" and "lock"
  methods to say "Yeah, I'm using this" or "This is garbage now"?
- Possibly better compression could be had with a squashfs-style block of
  blocks: you still find redundancy at the small-block level (32k or whatever),
  then pack those small blocks together into big blocks (say 2-4M) and compress
  each big block as a whole, which should compress better. The question is
  whether the constant inflating and deflating of blocks will be too much of a
  performance hit
- Maybe keep a "working set" of pre-expanded sub-blocks? And automatically
  freeze out blocks when all the files are closed?
- This might work well over a remote link (sshfs, ftpfs, or something) for
  random access to large files, since you don't have to download the whole
  original file to get chunks out: you download the index, then just the chunks
  you want
- Get rid of cPickle; it's way more than we need for saving essentially a few
  ints and a block list, even though it is very convenient
- Because of the way we refcount blocks, I don't think we can open/unlink a
  file like you would for temp files, but that's not the purpose anyway
- Do some profiling using a loopback filesystem: since most of it will be in
  memory, we can see where the "real" bottlenecks in the code are by taking out
  the disk-access unknowns
- ext3 uses a lot of space because of directory inodes, which reduces the
  savings quite a bit
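A rough sketch of the refcounting idea (the "Add refcounting" and "delete unused chunks" items). It assumes Python 3, fixed-size blocks keyed by a SHA-1 of their uncompressed data, and each block zlib-compressed on its own. ChunkStore, store_file, put and release are hypothetical names, not the existing Chunk/ChunkFile classes. The blobs live in a dict here rather than under storage/:

```python
import hashlib
import zlib


class ChunkStore:
    """Hypothetical in-memory, content-addressed chunk store with refcounts.

    A real store would write blobs under the backing storage directory
    instead of keeping them in a dict.
    """

    def __init__(self):
        self._blobs = {}  # key -> zlib-compressed chunk bytes
        self._refs = {}   # key -> number of references held on the chunk

    @staticmethod
    def key_for(data):
        # Hash of the uncompressed data is the storage key, so identical
        # chunks collapse into a single blob.
        return hashlib.sha1(data).hexdigest()

    def put(self, data):
        """Store a chunk if it is new, then take one reference to it."""
        key = self.key_for(data)
        if key not in self._blobs:
            self._blobs[key] = zlib.compress(data)
            self._refs[key] = 0
        self._refs[key] += 1
        return key

    def get(self, key):
        """Inflate and return the chunk stored under key."""
        return zlib.decompress(self._blobs[key])

    def release(self, key):
        """Drop one reference; delete the blob once nothing uses it."""
        self._refs[key] -= 1
        if self._refs[key] == 0:
            del self._refs[key]
            del self._blobs[key]

    def __len__(self):
        return len(self._blobs)


def store_file(store, data, block_size=4):
    """Hypothetical helper: split data into blocks, return the key list.

    block_size=4 only keeps the example small; real blocks would be far bigger.
    """
    return [store.put(data[i:i + block_size])
            for i in range(0, len(data), block_size)]
```

With this, two files that share "AAAA" blocks point at the same blob. The blob goes away only when the last key referencing it is released, so no "find storage/ -mtime" sweep is needed. Keeping the counts consistent across crashes is something this sketch does not handle.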
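For the block-based inflate item and the fsck block-size check, here is a sketch. It assumes fixed-size blocks that are zlib-compressed one at a time. BLOCK_SIZE, make_chunks, read_range and check_block_sizes are hypothetical names, and the tiny block size just keeps the example readable. A read of any byte range only inflates the blocks it overlaps, which is the property the remote-link idea relies on:

```python
import zlib

BLOCK_SIZE = 8  # hypothetical; real blocks would be something like 32k or 1M


def make_chunks(data, block_size=BLOCK_SIZE):
    """Compress data one fixed-size block at a time."""
    return [zlib.compress(data[i:i + block_size])
            for i in range(0, len(data), block_size)]


def read_range(chunks, offset, size, block_size=BLOCK_SIZE):
    """Return data[offset:offset + size], inflating only overlapping blocks."""
    if size <= 0:
        return b""
    first = offset // block_size
    last = (offset + size - 1) // block_size
    out = bytearray()
    # Blocks outside [first, last] are never decompressed.
    for index in range(first, min(last + 1, len(chunks))):
        out += zlib.decompress(chunks[index])
    start = offset - first * block_size
    return bytes(out[start:start + size])


def check_block_sizes(chunks, block_size=BLOCK_SIZE):
    """fsck-style check: every chunk except the last must be a full block.

    Returns a list of (chunk index, actual length) for the offenders.
    """
    problems = []
    for index, blob in enumerate(chunks[:-1]):
        length = len(zlib.decompress(blob))
        if length != block_size:
            problems.append((index, length))
    return problems
```

Something like read_range is what a FUSE read handler could call with the requested length and offset. check_block_sizes could equally be turned into an assert at load time, as the TODO suggests.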
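The super-chunk / block-of-blocks idea might look roughly like this. It assumes small chunks are concatenated, compressed as one zlib stream, and located through an (offset, length) index. pack_super_chunk and unpack_chunk are hypothetical names. Pulling one small chunk out means inflating the whole super-chunk, which is exactly the performance question raised in the notes:

```python
import zlib


def pack_super_chunk(chunks):
    """Concatenate small chunks and compress them together as one stream.

    Returns (blob, index), where index[i] = (start, length) of chunk i
    inside the uncompressed super-chunk.
    """
    index = []
    offset = 0
    for chunk in chunks:
        index.append((offset, len(chunk)))
        offset += len(chunk)
    return zlib.compress(b"".join(chunks)), index


def unpack_chunk(blob, index, i):
    """Inflate the whole super-chunk and slice out chunk i."""
    start, length = index[i]
    return zlib.decompress(blob)[start:start + length]
```

Chunks that share content compress better packed together, because deflate can back-reference earlier chunks in the same stream and there is only one header/trailer. A "working set" cache of already-inflated super-chunks (e.g. functools.lru_cache around the decompress step) would be one way to soften the inflate cost.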