lol, this post makes me feel like crying at my job at a BigTech co. He’s complaining about maybe a 30-minute feedback loop; mine is 2+ hours.

I write a piece of code, however small the change. Then I run the proc: first it takes ~20 mins to compile. Then, since it is ML, it can only run on a remote server, and it takes an easy 20 more minutes to start running there. Then after another 40 minutes I can confirm the job failed. The only way to debug is to read through a massive log file, for which we have an in-built log reader lol. The log file will have thousands of errors that literally don’t matter, and you’ll never have enough context to know which ones those are. All you can do is ping someone and ask: does this error matter, could this be the reason my entire proc failed? Oh no, this is just a useless log that fails all the time. I shouldn’t have been wasting a day digging into this, ok thanks bye.

But this isn’t it, not even close lol. We in fact have a custom DSL to define computational graphs, which of course has no linter, not even a compiler, and only a very broken visualizer, but our entire org runs on it. Syntax errors, logic errors, and actual race conditions are all caught the exact same way: the process dies after trying to run on a remote server for 2+ hours, with no useful error log. So our workflow is to just get a cup of coffee and stare at the graph, which can run to thousands and thousands of lines, to find what are at times completely trivial bugs that any half-decent language would have caught as a syntax error.

My experience at BigTechCo makes me completely understand GitHub Actions. My guess is many big companies have equally janky tools that harm dev productivity but still somehow handle absolutely massive workloads. GitHub just thought of sharing theirs with the public.