
A great idea! I think tags are definitely a better way to organize most personal data than trees.

Also I like that they describe what data they actually change on your computer right on the homepage: "TMSU does not alter your files in any way: they remain unchanged on disk, or on the network, wherever you put them. TMSU maintains its own database and you simply gain an additional view, which you can mount, based upon the tags you set up."

Unfortunately building on a foundation of sand (meaning not TMSU's code, but Unix filesystems) has downsides:

https://github.com/oniony/TMSU/wiki/FAQ#why-does-tmsu-not-de...

" Why does TMSU not detect file moves and renames?

To detect file moves/renames would require a daemon process watching the file system for changes and support from the file system for these events. As some file systems cannot provide these events (e.g. remote file systems) a universal solution cannot be offered. Such a function may be added later for those file systems that do provide file move/modification events but adding support for this to TMSU is not a priority at this time.

The current solution is to periodically use the repair command which will detect moved/renamed files and also update fingerprints for modified files. (The limitation of this is that files that are both moved/renamed and modified cannot be detected.) "

Ouch.



Ding ding ding! This is the monkey in the wrench, as it were.

Tagging is a really useful idea, but it is also a naming thing, and as such it either lives in the naming infrastructure (aka dirents) or it rots over time. A simple example I used to use in the 'object naming' [1] days was: imagine that instead of house numbers on the street you wrote down last names. That works fine until somebody moves, and now not only have you shown up at the wrong house, you don't even have a chance of knowing what the correct house is. [2]

Microsoft's Longhorn project was way out there but took a swing at the actual problem: just make the file system an actual relational database. Then your home directory is simply 'select * from files where (owner = chuck);' It really does solve the problem at a more fundamental level, using naming by attribute rather than mapping. I got to observe that effort from the outside (I was at NetApp at the time), but I believe it died due to really horrible performance issues.
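To make the "naming by attribute" idea concrete, here's a toy sketch (not Longhorn, just an illustration) using sqlite3, with a made-up `files` table: once every file is a row, a "directory" is just a query.

```python
import sqlite3

# Each file is a row; ownership and tags are columns, not paths.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT, owner TEXT, tags TEXT)"
)
con.executemany(
    "INSERT INTO files (name, owner, tags) VALUES (?, ?, ?)",
    [("notes.txt", "chuck", "text"),
     ("song.mp3", "chuck", "jazz mp3"),
     ("report.pdf", "alice", "work")],
)

# Chuck's "home directory" is simply a select by attribute:
home = [row[0] for row in
        con.execute("SELECT name FROM files WHERE owner = ? ORDER BY name",
                    ("chuck",))]
print(home)  # → ['notes.txt', 'song.mp3']
```

Renames and moves stop being a problem in this model, because nothing refers to a file by its path in the first place.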

I find it pretty awesome that people can lose files. Back when a "big" hard drive was 100MB it really wasn't all that hard to just look through all the files on it, but when it's a couple or three terabytes, all bets are off!

[1] Object File systems were all the rage in the early 2000's, files themselves were object ids and the naming was a database that connected object ids to user recognizable names. -- https://en.wikipedia.org/wiki/Object_storage

[2] The typical solution is to add "tombstones" or redirects at the previous address. That then is a layer of additional metadata to maintain, and sometimes the file doesn't move, it just changes value. (Trivial example: you have a file 'my-favorite-song.mp3' which is tagged 'jazz mp3', and then you discover techno and make something from Tiesto your favorite song; while the name and type are still valid, the tag 'jazz' is now invalid.)


Hmm, seems like they could have gone the other way: thrown everything into a DB, and then written a FUSE plugin to access it all through traditional file system mechanics. That would have allowed for gating direct access such that moves and renames could be dealt with accordingly. Of course, there are other problems with that approach, but probably not as many as you might think (the file system is a database, so you're really just choosing a back-end that is less likely to be directly accessed).


    they could have gone the other way, throw everything into
    a DB, and then wrote a fuse plugin to access it all
    through traditional file system
This is the Camlistore strategy!

    Of course, there are other problems with that approach
Could you elaborate more on these? I've never worked with FUSE.


The other problems I was alluding to weren't really with FUSE, but one that does pertain to FUSE is speed, since FUSE imposes overhead through a daemon running in user space, and the associated context switches. From just looking into it again, this may have been mitigated to some degree by the FUSE performance enhancements made in 2012.

Specifically, I was referring to the different off-the-shelf database systems which could be used. Each will have its own benefits and drawbacks for storing large chunks of data per record. Benefits might include (relatively) easy sharding or replication. Drawbacks might include not being space-efficient for removed files, not being as resilient to corruption from crashes (or corruption affecting more than the files in use), or overly aggressive use of memory to function efficiently.

If a custom database was developed, you could tailor to your exact needs, but then you have much more work to do, and a period of immaturity.

Off the top of my head, if I were designing a general purpose system for tagging files where people were expected to use it as a regular file system and some overhead from FUSE was acceptable, I think I would leverage the file system but in a different way. I would set up a specialized directory for the files themselves, store them hashed within it, have a BerkeleyDB database relate filename to hash and tags, and use FUSE to do direct file access. But that's my 5 minute assessment, so I reserve the right to change it completely given someone pointing out the obvious problems. :)
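A minimal sketch of that design, with a plain dict standing in for the BerkeleyDB index (names and functions here are made up for illustration): blobs live under their content hash, and the index maps logical name to (hash, tags), so a rename is just an index update.

```python
import hashlib
import os
import tempfile

# Content-addressed blob directory + name->(hash, tags) index.
def store(blobdir, index, name, data, tags):
    digest = hashlib.sha256(data).hexdigest()
    path = os.path.join(blobdir, digest)
    if not os.path.exists(path):      # identical content is stored once
        with open(path, "wb") as f:
            f.write(data)
    index[name] = (digest, tags)
    return digest

def read(blobdir, index, name):
    digest, _tags = index[name]
    with open(os.path.join(blobdir, digest), "rb") as f:
        return f.read()

blobdir = tempfile.mkdtemp()
index = {}
store(blobdir, index, "my-favorite-song.mp3", b"mp3 bytes", ["jazz", "mp3"])

# Renaming touches only the index; the blob on disk never moves:
index["best-song.mp3"] = index.pop("my-favorite-song.mp3")
print(read(blobdir, index, "best-song.mp3"))  # → b'mp3 bytes'
```

A FUSE layer would then expose `read`/`store` as ordinary file operations, which is where the gating of moves and renames would happen.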


Couldn't they just create a hardlink in a private, hidden directory that they control, and then symlink to that?

Then, it's OK if the original file gets renamed or moved, as long as it stays on the same FS. You still have your hardlink, and so your symlink still works.
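A quick demonstration of the trick (paths and names here are invented for the example): the symlink points at a hardlink we control, so renaming the original file doesn't break the chain.

```python
import os
import tempfile

root = tempfile.mkdtemp()
original = os.path.join(root, "report.txt")
with open(original, "w") as f:
    f.write("hello")

# Hardlink into a hidden directory we control (same inode, same FS).
hidden = os.path.join(root, ".tagstore")
os.mkdir(hidden)
anchor = os.path.join(hidden, "report.txt")
os.link(original, anchor)

# Symlink to the hardlink, not to the original path.
tagview = os.path.join(root, "tagged-report.txt")
os.symlink(anchor, tagview)

# The user renames the original; the anchor still holds the inode.
os.rename(original, os.path.join(root, "renamed.txt"))

with open(tagview) as f:
    content = f.read()
print(content)  # → hello
```

The catch, as noted below, is deletion: unlinking the original path no longer frees the data while the hidden hardlink exists.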


What if you really want to delete the file, though (passwords, customer data, incriminating evidence)? Then you have to remember to delete it from this system too!


I've been considering writing something similar myself, and my plan had been to hardlink the files by their hash into my blobstore. It won't fix the move/modification case, but it would solve the problem for simple moves. But I guess they're trying to deal with remote filesystems, too, and I was not targeting those.
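A sketch of that hardlink-by-hash plan, under my own assumptions about the layout: each file is hardlinked into a blobstore under its content hash, so a simple move leaves the hash-named link intact, and a repair pass can re-associate the moved file by recomputing its hash.

```python
import hashlib
import os
import tempfile

def blob_hash(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

root = tempfile.mkdtemp()
blobstore = os.path.join(root, ".blobs")
os.mkdir(blobstore)

original = os.path.join(root, "photo.jpg")
with open(original, "wb") as f:
    f.write(b"jpeg bytes")

# Hardlink into the blobstore under the content hash.
os.link(original, os.path.join(blobstore, blob_hash(original)))

# The user moves the file; the blobstore link is unaffected,
# and the moved file is identifiable by recomputing its hash.
moved = os.path.join(root, "vacation.jpg")
os.rename(original, moved)
found = blob_hash(moved) in os.listdir(blobstore)
print(found)  # → True
```

As noted, this breaks down when a file is moved *and* modified, since the new content hashes to a different name.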




