This looks nice. What I'd really like to see, along these lines, is a Python lib...

deanmalmgren · on Aug 4, 2014

I thought about the metadata thing but decided to leave it out of the earliest versions of textract to keep things simple. If you'd like to see it in there and have a good example of how you'd use the metadata, please feel free to open an issue on the issue tracker: https://github.com/deanmalmgren/textract/issues/

kalkin · on Aug 4, 2014

As far as I have been able to tell, the public state of the art in academic paper metadata parsing is Grobid: https://github.com/kermitt2/grobid

It's not quite as simple a command-line interface as you suggest, but it's not too hard to set up, and it's pretty impressive. Now if only Google Scholar would open-source whatever they use...

emillon · on Aug 4, 2014

For video files, guessit does something similar using only the file name:

http://guessit.readthedocs.org/