I've worked at 8 different companies and none of them could even hold a candle t...

brandon · on Feb 17, 2021

> Nobody at Google has a need to raise a ticket with some ops group in Bengaluru to partition a Kafka topic, renew a certificate, bridge two VPCs, or any of that type of thing.

Except when your team wanted to initially onboard with GOOPS and your request sat in Buganizer for 2 weeks waiting for someone to triage it. Uh oh: we're turning down this service next quarter, so you will need to go start this onboarding process again with its replacement.

Or when you needed quota in a cell where your product area didn't have Flex. Maybe you can set up a VC with your PARM? Does next week work for your launch plan? Hopefully they can do something for you!

Or when your logs access request sat in GUTS for a month because both of the approvers were on vacation and no, there's not an escalation path.

Or when you needed to change a firewall rule for a project your team inherited which for some reason runs on GCE. Make sure you bring your Ariane link when you open your request. Have ISE reviewed your code? No? ISE currently have a quarter-long backlog, so we're not sure we can grant your firewall exception.

None of these examples are contrived; the weight of the operational bureaucracy is staggering. It may well be that this stuff is felt more on the SRE/Security side around production launches than on the SWE side for experimentation or iterative development, but I struggle with the idea that Google is nimble.

throwaway-dos · on Feb 17, 2021

Registered an account to reply here, because your complaints feel one-sided to me.

Most of what you described I felt as well, _sometimes_, for security-related stuff, like dedicated machines in that one cluster or an ISE review on short notice. But security-related work is also somewhat out of the norm, and considering that, Google does a great job.

For "normal" services, what you described does not match my experience at all. Even for medium-sized infrastructure services, mostly everything just works (IME).

I never had a GUTS ticket that was not answered within a business day, but obviously that's just an n=1 sample. IMO the support staff is mostly amazing.

throwaway3699 · on Feb 17, 2021

Sure, things get hairy when you go off the beaten path, but day-to-day infrastructure is not the issue. As a user of Google products, I don't care as much about developer velocity as I do about them not shipping Swiss-cheese products security-wise. If I have to wait a few months more for some new feature, I'll take that trade-off.

jeffbee · on Feb 17, 2021

Right, the slothful approval process for log retention and access is a feature, not a bug. It's part of the reason why Google's technical privacy story is incomparable.

jeffbee · on Feb 17, 2021

That was some T7-9 whining right there. Do you think it's easier to get unplanned compute capacity at some other company?

strken · on Feb 18, 2021

Well, yes. I was provisioning a new service last week and it took me half an hour of clicking buttons in AWS. Without knowing anything about Google, I would have assumed they'd overprovision compute capacity to save developer time, at least for smallish requests, since they literally run their own data centres.

joshuamorton · on Feb 18, 2021

They do. When GP says stuff about not having flex in a cell, that essentially means "has not provisioned any quota whatsoever in that zone". Once you do the baseline work to provision some quota, generally speaking you have a somewhat over-provisioned pool to use for whatever.

The need to run in a particular cell is unusual.

randomswede · on Feb 18, 2021

More usual is "I need to run in at least three cells in region R". Thankfully, I never faced the "you need to turn up in cell EX tomorrow" without TPM support.

bruckie · on Feb 18, 2021

They do.

gresrun · on Feb 17, 2021

5+ yrs @ Google, Google is my 5th company.

Google has all the building blocks for great backend services and front-end development and, if you know where to look and have some experience with them, you can build a rock-solid product in <6mos, also assuming you have a team that can execute and the political will to ship it.

Politics/consensus building is where the real roadblocks lie at Google, and presumably at other large companies. Trying to make high-level product & technical decisions when you have 10 stakeholders with 3 VPs, all in different orgs, is a serious exercise in patience; months of emails & meetings await you.

cobookman · on Feb 17, 2021

Consensus Building really sums up the issue. There's no clear decision maker at Google. There is no Tim Cook or Jeff Bezos. Instead it is a collection of teams in a department.

It is like a democracy, but differs in that you need every single team leader on board to get anything done, vs., say, 51% of the "vote".

joshuamorton · on Feb 17, 2021

This is far less true than you make it seem, I think. I can think of executives who are very clear decision makers in particular contexts.

But for most engineers, most of the time, you're working well below the level of those executives, and especially if you're engaging with shared infrastructure, the executive who is the final decision maker is Sundar.

For example, I am involved in an issue where 3 ICs, whose levels are between 4 and 6 (really this is a simplification), are engaged in solving a problem. These three ICs are in 3 different PAs, reporting indirectly to 3 different SVPs, who join up at Sundar. It isn't worth it to have the CEO spend time refereeing this decision.

Ultimately this was resolved at the director level, by consensus building, because it would reflect badly on every one of those directors if they failed to resolve it and had to escalate to SVPs or CEOs about something that is, on the company scale, trivial (to be clear this is still a thing that is multiple engineer-years of work, but it's still Google-trivial).

I expect the same is true at Amazon or Apple. Cook and Bezos aren't making every decision. VPs and Directors deal with small potatoes, and most things are small potatoes. The difference may be organizational: those companies are more siloed, so leaves from different trees interact less often. But this friction is also often intentional and has value (SRE explicitly not reporting up through normal product eng ladders, for example).

cobookman · on Feb 17, 2021

The executives are still beholden to supporting teams. Want to launch a new feature that depends on GFE? Looks like the current GFE is end-of-life, but the new one still isn't ready yet. Let's check with the GFE team on whether they'll support the older GFE and accept our CL to launch...<GFE has power to delay your launch right here>

Next up is the documentation. That requires the doc team's approval. Oh, they require i18n; let's go to that team and see where we are in their queue <Doc team has power to delay your launch>.

This same flow occurs across all supporting teams. And it can get complex when Service A depends on Service B, which depends on Service C... and Service C can reject the quota increase, delaying your launch, etc.

> I expect the same is true at Amazon or Apple

At Amazon you would go to whoever everyone reports up to, or whoever has clear decision-making authority. You would then provide a written document going over the facts and suggested decisions, and ask them to make the call. After that it's "disagree and commit".

joshuamorton · on Feb 18, 2021

> The executives are still beholden to supporting teams. Want to launch a new feature that depends on GFE? Looks like the current GFE is end-of-life, but the new one still isn't ready yet. Let's check with the GFE team on whether they'll support the older GFE and accept our CL to launch...<GFE has power to delay your launch right here>

I don't see how this is distinct from what I said, except perhaps that for many teams, the GFE team reports up to a different SVP than you do, so the person you'd connect up through is the CEO, which, like I said, doesn't scale for every launch.

If you want to try and escalate your launch up to the CEO, nothing is stopping you, except perhaps your director or VP. But that is itself a signal that perhaps this isn't worth escalating about and that the status quo is acceptable.

tdeck · on Feb 18, 2021

There's plenty of red tape and broken process at Google if you know where to look. I've waited months for a small log schema change to be approved for a yet-unused log topic. CI presubmit runs for tools I worked on routinely took several hours and needed to be manually restarted due to flakes, whereas at Square people would complain if a CI run took more than 10 minutes. The tool for releasing Android Studio SDKs was a broken mess of Python that nobody understood, so rather than fixing it, they spent 2 years writing a replacement that never shipped. I could go on. These things definitely affect your happiness and productivity at work, and the pace sure didn't seem fast.

xmprt · on Feb 17, 2021

All the reasons you've given for the fast pace are tech reasons, and I don't think anyone is arguing with that. The author mostly argued that people reasons are why the pace slows down, and that's something far more prevalent at big corporations than at startups.

pid_0 · on Feb 17, 2021

Wow, and yet they can't really build a single profitable product!

Emphasis on _single_ product, as they tend to have 5 apps that do the same thing until they kill off the popular ones.