I really should write a post or something about it.
Basically you do a co-embedding of the graph and natural language, and then you can use either to find the other. Something very roughly like the ImageSpace co-embedding example from StarSpace[1], but with a graph representation on one side and a phrase/sentence representation on the other.
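A minimal sketch of what I mean, in PyTorch (the encoders, names, dimensions, and toy data below are all made up for illustration, not taken from any of the papers mentioned): a tiny graph encoder and a tiny text encoder are trained with an in-batch contrastive loss so that a node and the phrase describing it land near each other in one shared space, after which either side can be used to nearest-neighbour search the other.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GraphEncoder(nn.Module):
        # Embeds a node by mixing its own embedding with the mean of its neighbours'.
        # A stand-in for whatever graph representation you actually use.
        def __init__(self, num_nodes, dim):
            super().__init__()
            self.node_emb = nn.Embedding(num_nodes, dim)
            self.proj = nn.Linear(dim, dim)

        def forward(self, node_ids, neighbour_ids):
            # node_ids: (batch,)   neighbour_ids: (batch, max_neighbours)
            centre = self.node_emb(node_ids)
            neigh = self.node_emb(neighbour_ids).mean(dim=1)
            return F.normalize(self.proj(centre + neigh), dim=-1)

    class TextEncoder(nn.Module):
        # Embeds a phrase as the mean of its word embeddings.
        def __init__(self, vocab_size, dim):
            super().__init__()
            self.word_emb = nn.Embedding(vocab_size, dim)
            self.proj = nn.Linear(dim, dim)

        def forward(self, token_ids):
            # token_ids: (batch, max_len)
            return F.normalize(self.proj(self.word_emb(token_ids).mean(dim=1)), dim=-1)

    def contrastive_loss(graph_vecs, text_vecs, temperature=0.1):
        # In-batch contrastive loss: row i of the graph batch should match row i of the text batch.
        logits = graph_vecs @ text_vecs.t() / temperature
        targets = torch.arange(len(graph_vecs))
        return F.cross_entropy(logits, targets)

    # Toy run: 100 nodes, 1000-word vocab, a 64-dim shared space, batch of 8 aligned pairs.
    g_enc, t_enc = GraphEncoder(100, 64), TextEncoder(1000, 64)
    node_ids = torch.randint(0, 100, (8,))
    neighbour_ids = torch.randint(0, 100, (8, 5))
    token_ids = torch.randint(0, 1000, (8, 12))
    loss = contrastive_loss(g_enc(node_ids, neighbour_ids), t_enc(token_ids))
    loss.backward()
    # At query time, embed a phrase and nearest-neighbour search the node vectors
    # (or embed a node and search the phrase vectors), i.e. use either to find the other.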
It's weird - it's a reasonably well-known technique, but there's no single paper that captures it all. I usually point at Zero-Shot Learning Through Cross-Modal Transfer[1], which kind of captures the essence, but it's from 2013 and it's about image/text embedding rather than graphs.
The authors (Richard Socher, Milind Ganjoo, Chris Manning, Andrew Ng) are fairly well credentialed, though, so that's usually enough to convince people it's at least worth thinking about.
That didn't seem to be the point of the paper in question, though. That work primarily presents a usability study; even in 2014 there were better-performing models for graph-based QA tasks.
There are much better approaches these days.