How to recover lost Python source code if it's still resident in a running inter (2017) (gist.github.com)
77 points by fanf2 on May 17, 2024 | 42 comments


The other day I managed to nuke an hour or so's work from a Jupyter notebook because I didn't realise that VS Code would immediately sync the open contents with the disk after a git reset.

Dunno if this would have helped me then, but it's interesting either way.


This is what drives me nuts about apps these days.

In the "old days", an open file was in memory, and you needed to explicitly save it with Save, or save a new version with Save As.

But now everything's flown out the window. If I open a PDF in Preview on my Mac and delete a page from it, it saves the change instantly. If I made a mistake, I can't just close it and re-open the original version. I have to use "File > Revert to > Browse All Versions..."

But it acts differently if my file is on an external drive that doesn't support macOS file revisions: when I close the file, it gives me a long, scary dialog asking me to explicitly choose whether to keep my changes or revert, because this is my last chance.

Oh and there's no Save As anymore, there's a "File > Duplicate" instead which is never what I want, because the whole point is that sometimes I want to save my changes in a new file but keep the old one without the changes. So I have to do a whole song-and-dance to first duplicate and then save that and then go back to the original and then revert that.

Except it all depends on the application. Some applications still have the traditional Save / Save As, while some have moved onto Duplicate / Revert.

And as you mention, some apps will update the file you have open to reflect changes on disk, while others won't.

And of course cloud apps in the browser just autosave the whole time.

So you have to keep this insanely complex matrix in your head to figure out whether you need to constantly save your work with Cmd+S or not, or whether your work will accidentally overwrite your previous version which you might have been intending to save, or whether something like a sync will overwrite the work you have open, and it all depends not just on which application you're using, but on whether it's a cloud version or not, and whether you're working off of an external drive or not!

It's maddening. I hate it.


Then there's iWork (AKA Pages / Numbers / Keynote), which internally converts all Microsoft Office documents to its own format and tries really hard to save in that format. If you want to save to the original document, you need to re-export. If you work on a document for hours and your computer crashes, your document is gone, because you were technically working on an "unsaved" file. If you have a habit of pressing cmd+s all the time, you keep getting interrupted by the obnoxious save dialog. There's no way to fix this except by saving every document in Apple's format and then re-exporting to Microsoft's.


Wait, there are people who actually use Apple's Office version for work? I've only ever seen MS Office for macOS used in actual business settings, and I have used it myself a few times over the years without issues. Despite running a MacBook in some jobs, I haven't touched Apple's suite in years because of some very bad experiences whenever you need availability beyond your own system. The other way around (mostly) just works, but it's still more headache than simply installing MS Office for Mac.


I wouldn't strictly call it "work", but I do get the occasional form or two that I need to fill in or whatever. This doesn't happen nearly often enough for me to bother with MS Office, but it does happen sometimes.


Maddening indeed.

And if I decide to save a copy of confidential.ppt to start a new presentation, the first saved version in the history will have all the confidential stuff in it, even if my first action is to delete them. Everyone who has access can now see it.


> If I open a PDF in Preview on my Mac and delete a page from it, it saves the change instantly. If I made a mistake, I can't just close it and re-open the original version.

There's a system setting to (mostly) restore the old behavior: Desktop & Dock > Ask to keep changes when closing documents. Apple applications will still immediately update the file on disk, but it's a lot easier to discard all changes since the last save by closing without saving, and the on-disk file will be rolled back to the previous version. This setting has been around since 2012, I think one OS after they introduced the "auto-save" feature, and I've been using it ever since.


Mac OS, especially in the olden pre-OSX days, was at least consistent here. Files autosave invisibly and frequently, to give the impression that the file in memory and the file on disk are the same thing. So long as the rules are simple and predictable, they can be learned.

But now the different disciplines are mixed. Does it use save as, or duplicate? Does it sync with external changes on the disk, or keep your work open in memory? With apps being so cross-platform these days, they rarely respect the interface guidelines of second class citizen platforms.


>Oh and there's no Save As anymore

Unfold the File menu. Look at Save. Now press and hold Option and look next to Save.

Why they hid it like that, I will never know — some harebrained idea to shift people away from the well-used paradigms you mention, I suppose.


On the one hand, thank you! It never even occurred to me that they kept it but hid it.

On the other hand, I just tried it, and it doesn't actually help at all -- it actually makes things worse. Because it's still autosaving the whole time. "Save As..." saves my current file to a new file... but the old file still has all the changes until that point, despite never having manually saved.

So I still have to go through the whole step of reverting the changes in the first file. Only with "Save As..." it's actually harder, because the first file is no longer open. I have to go find it and open it and revert.

Ugh.


That's why I use the Local History extension :D https://marketplace.visualstudio.com/items?itemName=xyz.loca...


Doesn’t VS Code keep tons of snapshots? There’s a history panel in the lower left corner, which contains both git commits and VS Code’s own snapshots.


I was once asked to do this for an “idea guy” who burned his technical cofounder. The technical cofounder abandoned the project taking away access to the repos.

The funny thing is that the idea guy kept lying about what happened originally, but eventually confessed once everything I suggested to get access back to the repos wasn’t possible.


If I needed to do this, I would dump the RAM of the process, then grep for some comment or variable name from the file, and I'd almost certainly find a complete copy of the source file as some string in memory somewhere.
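
A minimal sketch of the "grep the dump" step, assuming you already have the process memory saved to a file and that the lost source was plain ASCII (UTF-8 source with non-ASCII characters would be split into several runs). The function name `find_text_runs` and the 16-byte minimum are made up for illustration:

```python
import re

# Runs of "text-like" bytes: printable ASCII plus tab, newline, carriage return.
TEXT_RUN = re.compile(rb"[\t\n\r\x20-\x7e]+")


def find_text_runs(dump: bytes, marker: bytes, min_len: int = 16) -> list[str]:
    """Return every printable run in `dump` that contains `marker`."""
    hits = []
    for match in TEXT_RUN.finditer(dump):
        run = match.group()
        if len(run) >= min_len and marker in run:
            hits.append(run.decode("ascii"))
    return hits
```

You would read the dump with `open(path, "rb").read()` (or `mmap` for a big one) and pass a comment or variable name you remember as `marker`. Expect several partial or stale copies rather than one clean file.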


Might be a good idea to send the process a SIGQUIT first, if you can, so you can dissect the RAM image without worrying that the process will finish before you do.


SIGSTOP might be better, in this case. Either way, the specific approach described in the article requires a running Python session.
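
To illustrate why a live session matters (this is not the article's procedure, just a sketch of the underlying fact): once a module has been imported, its compiled code objects stay in memory even after the .py file is deleted. The module name `lost_module` and the helper `load_then_delete` are made up for the example.

```python
import importlib.util
import os
import tempfile


def load_then_delete(source: str):
    """Import a throwaway module from a temp file, then delete the file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "lost_module.py")  # hypothetical file name
        with open(path, "w") as fh:
            fh.write(source)
        spec = importlib.util.spec_from_file_location("lost_module", path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
    # The temporary directory and the .py file are gone at this point.
    return module


module = load_then_delete("def add(a, b):\n    total = a + b\n    return total\n")
code = module.add.__code__

print(os.path.exists(code.co_filename))  # the source file no longer exists
print(module.add(2, 3))                  # the function still runs
print(code.co_varnames)                  # names survive in the code object
```

The bytecode, names and constants in `code` are the sort of material a decompiler would have to work from. Comments and original formatting are not kept there.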


SIGQUIT gives you a core dump which you can peruse / scrape at your leisure.

This is funny because normally I consider core dumps to be a barbaric relic of the card deck days. Core dumps are literally a shadow of what was going on so you have to guess.

In the mid 70s I was debugging crashed processes by attaching to them with all network connections, files etc all open so I could frob them and figure out what was going on. Then again, ITS used a debugger as its "shell" so was ultra hacker friendly. This should still be the default, but most POSIX systems can't even support it.


SIGQUIT can be ignored, e.g. with:

  # Ignore SIGQUIT: 3 is SIGQUIT's number on Linux and 1 is the value of SIG_IGN
  __import__('signal').signal(3, 1)

SIGSTOP just happens, and is mostly transparent (unless you handle SIGCONT, but who does?).
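
For anyone squinting at the numbers, a sketch of the same call written with the named constants from the standard `signal` module (Unix only, and it has to run in the main thread):

```python
import signal

# Equivalent to signal.signal(3, 1) on Linux: ignore SIGQUIT from now on.
previous_handler = signal.signal(signal.SIGQUIT, signal.SIG_IGN)

# Confirm what is installed; a SIGQUIT sent to this process is now discarded
# instead of producing a core dump.
print(signal.getsignal(signal.SIGQUIT) == signal.SIG_IGN)
```

SIGSTOP has no equivalent of this, since it can't be caught or ignored.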


or use gcore which will save the core without terminating the process


Not working anymore on Python3 (tried it after rm -rvf ./ *)


Tangential, but funny story, I once ran that exact command but the . key didn't register, and (out of habit) I hit enter before I was able to notice.

I learned a valuable lesson that day about testing recovery from backups before you need them. I also learned a lesson about only running commands in a root shell when you really need to be root.


I just don't use rm directly, unless it's something very trivial and non-recursive. Start with ls, write out your path, execute it and check the results. If it looks fine, Ctrl+P → Ctrl+A → Alt+D → rm → Enter. Takes maybe two seconds more than just using rm, and definitely saved me more time over the years.


I have a similar habit. I usually do "echo rm ..." or "echo destructive command ..." first, check the output (so wildcards are expanded, etc), and if I'm happy I bring the command back in the history and edit out the echo.
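
The same preview-then-act habit as a small Python sketch, for when the deletion happens in a script instead of at the prompt. `preview` and `delete` are names invented here, and `glob` only approximates shell wildcard expansion (like the shell's default, it skips dotfiles for `*`):

```python
import glob
import os


def preview(pattern: str) -> list[str]:
    """Expand a wildcard and return the matches without touching anything."""
    return sorted(glob.glob(pattern))


def delete(paths: list[str]) -> None:
    """Remove exactly the files that were previewed, nothing else."""
    for path in paths:
        os.remove(path)
```

The idea is to print `preview("build/*.log")`, read the list, and only then hand that same list to `delete`, so the pattern is never expanded a second time.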


In bash, you can alternatively replace your shortcuts

    Ctrl+P → Ctrl+A → Alt+D

with `!:1-$`, where 1 is the index of the first argument you want from your previous command.

So:

    ls [files ...]
    rm !:1-$

I personally find it faster to use bash's history expansion. But we're talking about subsecond differences here.


You can also do:

    $ ls [files ...]
    $ ^ls^rm


I once spent an entire night setting up a new MacBook. Early in the morning, bleary eyed, I cloned a directory to ~/project. I must've quoted it, because it created a directory called "~". Without a thought, I typed `rm -rf ~` and hit enter.

Oops.


To save the rest of us from the same fate, `rm -rf /` has been disabled by default since coreutils 6.2 in 2006: https://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commi...


My first thought was "this won't work without --no-preserve-root"


Very tangentially related, the last time I nuked a computer it was with FUSE shenanigans. I used a secure temporary directory as the mount point, and the language's destructor effectively did a hidden rm -rf on the contents when I was done with it after the unmount silently failed.


I imagine it's the "we'll laugh at it in a year" kind of funny


Code should still be somewhere in the local git cache, right? I've had to go digging around in there once in a very similar situation.


If the file was ever staged with `git add`, it'll be in the git object store (note, git doesn't have a cache, it maintains a whole repo) as a blob until the next GC run, otherwise you're out of luck.

Even if it's in the object store though, it may be hard to find; if it was never committed, it was only ever referenced by the index, and now it's completely unreferenced. Maybe you could enumerate all unreachable blobs, diff them with the current version of the file, and manually review all the blobs where the diff size is below some threshold.
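
A rough sketch of that idea, assuming `git` is on the PATH and that the lost file is text. It uses `git fsck --unreachable` to list candidates and `git cat-file blob` to read them, and swaps the "diff size below a threshold" test for a `difflib` similarity ratio, which is my substitution. All function names are made up:

```python
import difflib
import subprocess


def parse_unreachable_blobs(fsck_output: str) -> list[str]:
    """Pick blob IDs out of `git fsck --unreachable` output lines."""
    ids = []
    for line in fsck_output.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "unreachable" and parts[1] == "blob":
            ids.append(parts[2])
    return ids


def unreachable_blob_texts(repo: str) -> dict[str, str]:
    """Map each unreachable blob ID in `repo` to its decoded contents."""
    out = subprocess.run(
        ["git", "-C", repo, "fsck", "--unreachable"],
        capture_output=True, text=True, check=True,
    ).stdout
    texts = {}
    for blob_id in parse_unreachable_blobs(out):
        raw = subprocess.run(
            ["git", "-C", repo, "cat-file", "blob", blob_id],
            capture_output=True, check=True,
        ).stdout
        texts[blob_id] = raw.decode("utf-8", errors="replace")
    return texts


def rank_by_similarity(current: str, candidates: dict[str, str]) -> list[tuple[float, str]]:
    """Return (similarity, blob_id) pairs, most similar to `current` first."""
    scored = [
        (difflib.SequenceMatcher(None, current, text).ratio(), blob_id)
        for blob_id, text in candidates.items()
    ]
    return sorted(scored, reverse=True)
```

You would call `rank_by_similarity(open("file.py").read(), unreachable_blob_texts("."))` and review the top few by hand. On a repo with many loose objects this is slow, since it spawns one `git cat-file` per blob.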


> (note, git doesn't have a cache, it maintains a whole repo)

? not sure what you mean here, git definitely has a local cache. There is even a command like git rm --cached. Maybe we're getting into semantics nitpicking here, I'm talking about the file under .git/ that contains the staging area


Oh I see, I thought you were referring to the local repo as a cache of some central repo, which is a mistake I've often heard.

The file you're referring to is the index; it's a list of git object IDs which make up the current staged state. It only has metadata, the actual contents of staged files are stored in the object store as blobs. Honestly, I'm not sure why some commands use --cached to mean "with respect to the index". The index is not a cache in the common sense of the word. It doesn't mirror state stored elsewhere in the repo and it's the single source of truth for current staged state.


I think the name was changed at some point and the old options were just too engrained to bother with.


I dug through the Git mailing list, and lo and behold, the whole project used to be called "dir-cache", and was initially intended as a local cache over any arbitrary slow version control systems[1]. It looks like this is where the cache terminology lingers from, but the index already had its current name by the time Linus released the git source code. It seems like there was an effort to clean up the terminology pre-1.0[2], but obviously some things were missed.

[1]: https://lore.kernel.org/git/Pine.LNX.4.58.0508051104510.3258...

[2]: https://lore.kernel.org/git/200509020150.j821oXXM006699@lapt...


I really like the idea of having to recover code or variables for already-running scripts. I did something similar last year to recover bash variables of a running script: https://joshua.hu/dumping-retrieving-bash-variables-in-memor...


I've had several instances where I accidentally nuked my local git copy. Thankful for PyCharm/<Your JetBrains tool of choice> Local History: https://www.jetbrains.com/help/idea/local-history.html


I love that the research and recovery probably took longer than just rewriting the code; and I'm sure that I would have tried that route too.


Shouldn't this be possible by looking inside of /proc/ if the file's still open?
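
A sketch of what that would look like on Linux, where an unlinked but still-open file shows up under /proc/<pid>/fd with " (deleted)" appended to its link target. The helper names are made up. One caveat: CPython closes a source file once it has compiled it, so this is more likely to help with files some other process (an editor, say) is still holding open than with the interpreter itself.

```python
import os
import shutil


def deleted_open_files(pid: int) -> list[tuple[str, str]]:
    """List (fd_path, original_path) for files `pid` holds open but that were unlinked."""
    fd_dir = f"/proc/{pid}/fd"
    suffix = " (deleted)"
    found = []
    for name in os.listdir(fd_dir):
        fd_path = os.path.join(fd_dir, name)
        try:
            target = os.readlink(fd_path)
        except OSError:
            continue  # the descriptor was closed while we were looking
        if target.endswith(suffix):
            found.append((fd_path, target[: -len(suffix)]))
    return found


def recover(fd_path: str, destination: str) -> None:
    """Copy the still-open file's contents out through its /proc fd link."""
    with open(fd_path, "rb") as src, open(destination, "wb") as dst:
        shutil.copyfileobj(src, dst)
```

Reading another user's process this way needs the same permissions as attaching a debugger to it.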


* 2017

Cool!


I got trolled by that too, I didn't see it was posted by a bot



