So far this is just a copy of the nullfs example from
/usr/share/doc/python-fuse with some stuff renamed.
To make it work:
- How do you get another arg in the options?
- pydoc fuse shows some magic option parser stuff
- need this for the "source" directory, or backing storage area
- Better to compress chunks? Or have a blob more like a zip?
-----
TODO:
+ Make inflate/deflate block-based as needed, so we don't have to do a
  bunch of work up front and waste a bunch of space on disk
  - done
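The block-based approach above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the class and names are hypothetical, and it assumes each chunk is an independently zlib-deflated block of a fixed size, so a read only inflates the chunks it actually touches.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # hypothetical block size for this sketch

class ChunkFile:
    """Each chunk is an independently deflated BLOCK_SIZE block, so a
    read only inflates the chunks it actually touches."""

    def __init__(self, compressed_chunks):
        self.chunks = compressed_chunks  # list of zlib-deflated blocks

    def read(self, offset, length):
        out = []
        while length > 0:
            idx, skip = divmod(offset, BLOCK_SIZE)
            if idx >= len(self.chunks):
                break  # past EOF
            block = zlib.decompress(self.chunks[idx])  # inflate one chunk only
            piece = block[skip:skip + length]
            if not piece:
                break
            out.append(piece)
            offset += len(piece)
            length -= len(piece)
        return b"".join(out)
```

Writes would work the same way in reverse: inflate only the affected block, patch it, and re-deflate it.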
+ Make files just contain a backing storage key; this key will reference
  what we have in them now (the data list and stat info), so that complete
  duplicate files will not take up a few extra megs while still being able
  to have their own permissions and stuff
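One way to picture the backing-key idea: key the shared data by a content hash, and keep per-file stat info outside the store. This is only a sketch with made-up names (`BackingStore`, `FileNode`), assuming a SHA-1 content hash; the real hash would be pluggable.

```python
import hashlib

class BackingStore:
    """A file's metadata holds only a key into this store; two files with
    identical contents share one stored entry while keeping their own
    stat info (owner, mode, times) in their own metadata."""

    def __init__(self):
        self.entries = {}  # key -> data (in practice, the data list + stat info)

    def put(self, data):
        key = hashlib.sha1(data).hexdigest()  # hash choice would be pluggable
        self.entries.setdefault(key, data)    # duplicates collapse to one entry
        return key

    def get(self, key):
        return self.entries[key]

class FileNode:
    """Per-file metadata: private permissions, shared backing key."""
    def __init__(self, store, data, mode):
        self.mode = mode            # private to this file
        self.key = store.put(data)  # shared with any duplicate
```

Two `FileNode`s created from identical bytes end up with the same key and one stored entry, but can still carry different modes.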
+ Copying read-only files doesn't work (permission denied on close, because
  that is the point at which we open and write to the original file)
- done - we open a file handle at __init__ now and use that
- R/W is basically ignored at this point
- fsck:
- test that each chunk before the last is a full block size (this would be a good
assert too)
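That invariant check could be as small as this sketch (function name and signature are made up; it assumes fsck can get at the per-chunk lengths):

```python
def fsck_chunk_sizes(chunk_lengths, block_size):
    """fsck invariant sketch: every chunk except the last must be exactly
    one full block; only the final chunk may be short."""
    for i, n in enumerate(chunk_lengths[:-1]):
        assert n == block_size, \
            "chunk %d is %d bytes, expected %d" % (i, n, block_size)
    if chunk_lengths:
        assert 0 < chunk_lengths[-1] <= block_size, "bad final chunk size"
```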
+ delete unused chunks (refcounting)
- pack multiple chunks into "super chunks" like cromfs/squashfs to get better
  compression (e.g. 4M of data will presumably compress better than the same
  file split into four 1M pieces compressed individually)
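The intuition is easy to measure with a toy case: zlib can only exploit redundancy inside one stream (and within its 32 KB match window), so duplication *across* chunks is invisible when each chunk gets its own stream. This sketch uses four identical chunks as an extreme example:

```python
import os
import zlib

CHUNK = 16 * 1024           # small enough that repeats fall inside zlib's
data = os.urandom(CHUNK)    # 32 KB match window when chunks are packed
chunks = [data] * 4         # four identical chunks (e.g. duplicated blocks)

# One stream per chunk: the redundancy between chunks is invisible.
separately = sum(len(zlib.compress(c)) for c in chunks)

# Packed into one "super chunk" first: later copies become back-references.
packed = len(zlib.compress(b"".join(chunks)))
```

The packed version comes out a fraction of the size of the separate streams, at the cost of having to inflate a whole super chunk to reach one small block.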
- Speed it up? Or is it "fast enough"?
- should die on errors accessing blocks; will also need some kind of fsck
  to find corrupt blocks and the files affected, so that if there is a problem
  and you have another copy of the file then the block can be recreated
- some kind of config/IoC to allow plugging in the hash, storage, etc.
  methods. Main components like FileSystem, Chunk, ChunkFile don't need to
  be swapped out since they are generic
- Maybe the compression method doesn't belong in Chunk? It should be part of
  storage (for super-chunks?) or should super-chunks be a "standard" part?
+ Add refcounting so we can expire chunks, or would a reverse list be
better?
- If we separate metadata from chunks then we can just rebuild the
metadata for fsck
- Find a good way to test refcounting isn't purging things too soon or
keeping them too long
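The refcounting/expiry idea above can be sketched as follows (a toy `ChunkStore`, not the real one; dict-backed for illustration):

```python
class ChunkStore:
    """Refcounting sketch: a chunk is expired (deleted) the moment the
    last file referencing it goes away."""

    def __init__(self):
        self.chunks = {}
        self.refs = {}

    def add(self, key, data):
        if key in self.chunks:
            self.refs[key] += 1     # another file reuses this chunk
        else:
            self.chunks[key] = data
            self.refs[key] = 1

    def release(self, key):
        self.refs[key] -= 1
        if self.refs[key] == 0:     # no file references it any more
            del self.chunks[key]
            del self.refs[key]
```

A reverse list (chunk -> referencing files) would trade memory for the ability to rebuild counts during fsck; with bare counts, a crash between updating a file and updating its chunks is exactly the "purging too soon / keeping too long" case noted above.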
-----
Other thoughts:
- If there was an easy way to "open a file" or something and have it
  "touch" all its pieces, you could just run that in the mounted tree,
  then "find storage/ -mtime +1" and delete that stuff to clean out cruft
- Alternatively have it keep track of block usage counts and when it goes
to "zero" then delete it
- Change load/save to be ref counted? Or have another method for
  "release" and "lock" to say "Yeah, I'm using this" or "This is garbage
  now"?
- Possibly better compression to be had if you use a squashfs sort of block
  of blocks. So you get redundancy of small blocks (32k or whatever) and
  pack those together into big blocks (say 2-4M) then compress the big
  block. That way you get better compression in the big block. The
  question is whether this constant inflating and deflating of blocks will
  be too much of a performance hit
- Maybe have a "working set" of pre-expanded sub blocks? And
automatically freeze out blocks when all the files are closed?
- This might work well over a remote link for random-access to large files
using sshfs or ftpfs or something since you don't have to download the
whole original file to get chunks out, you download the index then just
the chunks you want
- Get rid of cPickle; it's way more than we need for saving essentially a
  few ints and a block list, even though it is very convenient
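Since the metadata really is just a few ints and a list of fixed-width chunk keys, the struct module would do. A rough sketch (format and field choice are assumptions, e.g. raw 20-byte SHA-1 keys):

```python
import struct

KEY_LEN = 20  # e.g. raw SHA-1 chunk keys

def dump_meta(size, mode, keys):
    """Pack file size, mode and a chunk-key list without pickle:
    a small fixed header followed by the raw fixed-width keys."""
    header = struct.pack("<QIH", size, mode, len(keys))
    return header + b"".join(keys)

def load_meta(blob):
    size, mode, n = struct.unpack_from("<QIH", blob)
    off = struct.calcsize("<QIH")
    keys = [blob[off + KEY_LEN * i:off + KEY_LEN * (i + 1)] for i in range(n)]
    return size, mode, keys
```

The format is versionless and fragile compared to pickle, but the on-disk size is exactly 14 bytes of header plus 20 bytes per chunk, with nothing executable in the load path.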
- Because of the way we refcount blocks I don't think we can open/unlink a
file like you would for temp files, but that's not the purpose anyway
- Do some profiling using a loopback filesystem, since most of it will be
in memory we can see where the "real" bottlenecks are in the code by
taking out the disk access unknowns
- ext3 uses a lot of space because of directory inodes, reducing the savings
  quite a bit