We are currently taking a DDoS attack and are working to mitigate (status.github.com)
81 points by orrsella on March 23, 2013 | 138 comments


This is one reason we corporate software pushers can't use GitHub and the like. We need stuff to be hosted on a network we control. Each hour we lose is potentially multiplied by the number of staff, as we push and pull hundreds of times a day. Someone's head will roll if we're down for more than a couple of minutes at a time.

It becomes just as much of a PITA importing projects from GitHub and Maven, for example, as well.

We actually ended up sticking with svn and trac and doing manual mirrors of every source dependency we pull in to protect our asses. SVN is kept only due to externals support (even though all our externals are internal!). Our head rev is over a quarter of a million revs, 37 GB of data, and the repository has been online since 2006, to give you an idea.

I'm aware of GitHub Enterprise, btw. It's actually easier to build our own using trac, as we can support it end to end, it's a crap load more configurable, and we can scale it up and run failover nodes easily using svnsync and pgsql replication. We have our own plugins and reports plugged in using reportlab as well. And it's not Atlassian's crap either.


You know you can run git internally as well?

Using github as an excuse to stay with svn seems rather ill-informed.

You wouldn't lose hours if GitHub is down. You might lose your ability to deploy (if you are deploying from GitHub), but you could set up any other remote you like and deploy from that.

I'm not sure what the 'size' of your repo has to do with anything...


To me github is the killer feature of git.

I think svn is easier and more appropriate for a corporate environment.


Really? Git seems like the killer feature of Github to me...

Why do you think that git is inappropriate for a corporate environment? I've been using it in one for a few years now and see nothing wrong with that.


I think the killer feature of git -- which applies to any other DVCS -- is that you're not dependent on github if you're well prepared. There's absolutely nothing stopping you adding redundant remotes so you always have somewhere to push/pull changes and deploy from.

    git remote add <remote_name> <url>
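As a rough sketch of the redundant-remotes idea, using throwaway local bare repositories to stand in for GitHub and a self-hosted mirror (all names and paths below are made up; real setups would use ssh/https URLs):

```shell
# Two independent remotes, so there is always somewhere to push to
# and deploy from even if one host is down.
gh=$(mktemp -d)/github.git
mirror=$(mktemp -d)/mirror.git
git init --bare --quiet "$gh"
git init --bare --quiet "$mirror"
work=$(mktemp -d)/work
git init --quiet "$work"
cd "$work"
git -c user.email=dev@example.com -c user.name=dev \
    commit --allow-empty --quiet -m "initial"
git remote add origin "$gh"
git remote add backup "$mirror"
git push --quiet origin HEAD:master
git push --quiet backup HEAD:master   # origin down? deploy from backup
```

The only ongoing cost is remembering to push to both, which is exactly the management problem the reply below raises.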


Yep, and that's like managing DNS with hosts files...


svn is fine for very small, highly co-ordinated teams which is rare in a corporate environment. It's decidedly not fine for complex merging, development outside of the private network, or maintaining multiple versions of the same codebase (customer installs) among other things.


Even then, doing without "add -p", "rebase -i", "commit --amend", inplace branches, and various other git features is going to cost you.


The Achilles' heel of svn is that branches cost 100% of the branched repository on disk. Even an svn repo with one user can be prohibitively large.

Even in a corporate LAN environment, just pulling down an svn repository can take far too long.



    500MB branch in git:  git checkout -b mybranch master  ==  500MB
    500MB branch in svn:  svn copy ...                     ==  1GB


Branches cost nothing in svn. I doubt you've used it. svn copy only appends a pointer back to the source rev, much as git does.


I used to use them extensively. Unless something has radically changed, they cost nothing in the svn repository, but when you pull them down you have multiple copies on disk, which takes double the space because of the way a branch appears as a separate folder in your working copy. Am I missing something?


Your working copy has a URL. You can switch it to another URL (a branch/tag) using "svn switch". It only transfers the differences between the two.

You can have two separate copies checked out, but you don't have to.


It is interesting; I have heard the comment about svn branches being expensive several times. But that is not true: they are checking out the whole branches directory!


Are you checking out all the branches? I mean the actual branches subdirectory? If you do that, it is going to be expensive. But you shouldn't do that. Just check out the trunk and switch to branches as needed.


You have to maintain a full copy of the file tree locally per branch, it's a pain that gets worse as your project size grows. With git, switching between branches is near instantaneous.


It is in svn as well (a 200 MB working-copy switch takes about 4 seconds).


Sorry, but that is a very long time. Branching is something I do dozens, if not hundreds, of times a day (creating, checking out, deleting, etc.).

svn doesn't fit the sane modern workflow, and everything it does can be done by git in a less painful manner. The only use case I can think of for svn is a very binary-heavy repo that needs global locking.


I worked in an environment with a significantly larger repository, and svn switches were a constant source of frustration.


We'd lose our deployment, integration tests, continuous integration, ability to pull other team changes from the master and code review platform. That's a big chunk of our dev infrastructure.

Size is mentioned because, to be honest, we moved a load of stuff to git in 2011 as a test case and it couldn't handle it. We have some test fixture data files (big ones) that it choked on. Not only that, you can't compose software easily without introducing external tooling.


> We'd lose our deployment, integration tests, continuous integration, ability to pull other team changes from the master and code review platform. That's a big chunk of our dev infrastructure.

You'd lose all that from using git? Why? Are you saying you'd switch to git without switching any of your svn infrastructure? We aren't talking about github downtime anymore, are we...

> Size is mentioned as to be honest we moved a load of stuff to git in 2011 as a test case and it couldn't handle it.

A repo I interact with daily is now pushing 54 GB; it contains large test fixtures with HD videos and images in it. I've never had any issues. (It takes a while to copy it around...)


So your desire to not use Github has nothing to do with Github then.


git is decentralized. Any sane sysadmin wouldn't use github as a 'central' point, but would set up their own remote to push to. Then, every once in a while, commits are pushed from that remote to github.

heck you wouldn't even really need a remote.


Alternatively you can set up an alias (i.e. "all") for multiple remotes and just push to them in one hit. Useful when I'm pushing code to Heroku and GitHub simultaneously.
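For the record, stock git supports this via multiple push URLs on one remote. A sketch, with throwaway local bare repos standing in for the GitHub and Heroku URLs (all paths here are invented):

```shell
# One remote ("all") carrying two push URLs: a single "git push all"
# updates both mirrors. Note that adding the first --push URL replaces
# the default push target, so re-add the primary URL explicitly.
hub=$(mktemp -d)/github.git
heroku=$(mktemp -d)/heroku.git
git init --bare --quiet "$hub"
git init --bare --quiet "$heroku"
app=$(mktemp -d)/app
git init --quiet "$app"
cd "$app"
git -c user.email=dev@example.com -c user.name=dev \
    commit --allow-empty --quiet -m "initial"
git remote add all "$hub"
git remote set-url --add --push all "$hub"
git remote set-url --add --push all "$heroku"
git push --quiet all HEAD:master      # one hit, two remotes
```

In practice you would swap the local paths for your real GitHub and Heroku URLs; fetches still come from the first URL only.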


Aliases on that scale are like hosts files: an utter shit to manage. It's why they invented DNS, which is a centralized, delegated service.

The "single source of truth" is hard to avoid.


Your deployment really shouldn't rely on your VCS...

Just added a 3 GB file to my github repo. It took <2m, most of which I imagine was spent gzip-ing it.


Well it doesn't. We package every rev that's made as part of our integration process. Then we choose one and deploy it (using our auto deployment software which farts out packages across 100-odd machines, per instance..).


> This is one reason us corporate software pushers can't use github and the likes. We need stuff to be hosted on a network we control.

*cringe* This is very corporate reasoning, in the most pejorative sense of the word.

> Each hour we lose is potentially multiplied by the number of staff as we would push and pull hundreds of times a day.

You clearly haven't used git. Even if github were to disappear, it would be absolutely trivial for a team to switch to an alternative central git repo for everyone to work through.


> Even if github were to disappear, it would be absolutely trivial for a team to switch to an alternative central git repo for everyone to work through.

Except for all the tooling built into github that would have to be replicated locally somehow.

I guess you could switch to: http://www.atlassian.com/software/stash/overview

That wouldn't be "trivial", and if that's the backup plan, you may as well start by using Stash and skip github entirely.


> Except for all the tooling built into github that would have to be replicated locally somehow.

What tooling? The issue tracker and code comments? I imagine the uptime requirements aren't >99% for those, but if you absolutely need them all the time, that's what github's enterprise service is for.


The issue tracker's uptime requirement is higher than the VCS's in a corporate environment. It's more an ALM tool for us, as it has use cases, defects and project planning running off it. That means the front line support, dev team, project management team, business analysts and management all use it 24/7 (in our case from 5 different countries).

GitHub Enterprise is a crappy solution for the above as well, and it's hard to support and expensive too.


Yes, the tooling that's required if you're actually collaborating instead of working in personal silos.

There are better options than Github's overpriced virtual machine-based mess of an 'enterprise' offering.


Excuse my ignorance, since I don't work in the corporate world. But isn't one of the central points of a distributed version control system like git that you don't have to host it at all? (Of course, you still need a network that you're in control of.)


Sort of. You have "remotes" which you copy your own repo to. So in theory you could just have everyone push code to everyone else's workstations.

It is probably prudent to have some central location to push code to though, especially if you want people to work on it outside of your company LAN.

The best solution is probably to have a server you control and keep a copy somewhere like github in sync with it. That way you don't have to worry about a DDoS of github, and you don't have to worry about what happens if a drive dies on your server, etc.


Well we need a master as an integration point for building deployment packages, which means that you need a central host anyway.

Github is usually just this thing for people.


You can have a master branch without necessarily having a central server.


Yes but how does our integration server know which one of our 180 dev workstations has the latest tip of the master branch?


Add the integration server as a remote.


The integration server has its own repo that you push the latest tip to.
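Concretely, the integration box just hosts a plain bare repository. A minimal sketch, with a local path standing in for the real server URL (names invented):

```shell
# Whatever was last pushed to the integration repo *is* the latest
# tip of master as far as the build farm is concerned.
integration=$(mktemp -d)/project.git
git init --bare --quiet "$integration"
dev=$(mktemp -d)/dev
git init --quiet "$dev"
cd "$dev"
git -c user.email=dev@example.com -c user.name=dev \
    commit --allow-empty --quiet -m "feature work"
git remote add integration "$integration"
git push --quiet integration HEAD:master   # CI builds from here
```

None of the 180 workstations needs to be authoritative; the integration remote is, by convention.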


That's a poor solution as there is no way to know how much of the product is actually on that repo...


Not quite sure what you mean, you will have whatever branches you assign to it.


This needs an answer.


You did know Git is a DVCS, right? i.e. D for distributed?


Yes. I do. Unfortunately it doesn't support your average corporate workflow and requirements of centralisation very well (it can do it, but it's not its primary use case). The desktop tooling is also crappy, and it's hard to get non-code-monkeys using it. We use svn for documentation as well, and our windows users quite happily edit Word documents collaboratively on top of it.

SVN is actually better for us.


Since when does using SVN allow you to edit Microsoft Word documents... collaboratively?

I sense a bunch of downvotes coming for me in 3..2..1..


TortoiseSVN and Word+Excel have great change integration. You can merge word documents quite happily.


I stand baffled. Always thought Word and Excel files were binary files, thus almost un-mergeable in any useful - collaborative - manner. Sorry for my misunderstanding.


AFAIK Word provides its own diff tools, which can be hooked up by version control systems.


I think your issue is that you don't understand DVCS very well.


I do. I'm a long term user of mercurial for my own projects. I was also an advocate for moving to git in our organisation, but it doesn't fit.

One hammer is not necessarily good for all nails.


I'm sorry for the poor assumption... But you started this thread with "Github downtime is the reason we can't use git internally" <--- This illustrates that either you don't understand git very well, OR you don't understand logic very well... I was giving you the benefit of the doubt on the latter... :\


How, exactly, do you propose the following things to work if GitHub is down?

- Pull requests and code reviews

- Automatic integration builds of all checkins

- Commit notifications via e-mail

- Browsing of sources via the web

- Sharing of code between collaborating users

DVCS may be "distributed", but organizations and user collaboration are not.


... None of that stuff requires github.

How about also having the code on bitbucket (if you want to use flashy web services)? The chances of them going down at the same time seem remote.


Actually, it's more likely that they're both unavailable than just one, e.g. if our pipe goes down. Another reason to do it in house.


None of that stuff is exclusive to github.


They would work exactly how they work with the Linux kernel, for which git was created.


Yes, via centralized mailing lists and collaboration infrastructure.


You know what part of github is not distributed? Issues.


> I'm aware of github enterprise

Which I must say is also insanely priced, particularly since you have to pay for everyone with a user account.

Which means that if you want to use GitHub issues to replace some bugtracker the non-developers need access to, you're paying full cost for everyone who might need to update a bug once or twice a year.

I understand that they're trying to get everyone to use their online offering, but a lot of organizations wouldn't consider it due to a combination of not wanting to host their code externally, and GitHub's ongoing uptime issues. They might go for GitHub Enterprise, but it's crazily priced compared to other similar software (e.g. Atlassian Stash+Jira).


If github is down, I can still commit locally. If github is down longer I can always:

    git remote add online git@ddosfreedomain.com:you/yourrepo.git

Why waste so much time on support infrastructure? Wouldn't it be better to spend that time doing productive work?


This depends entirely on whether you have better uptime than github. (I don't know github's uptime stats.)

That said, it's pretty easy to keep a repo up, so I would guess that if github has a few hours of downtime a year you are still doing better.


Yes we do. 5 nines (that's 5 minutes in a year).


my git repo has remained up 100% of the time - even when github goes down, my repo continues working.

I'm available for consulting if you want this kind of uptime ;-D


I don't believe you. I bet you rebooted your workstation :)


Not really a good justification for not using GitHub. If your SVN server goes down, you're in a lot more trouble than the few times GitHub is unavailable. At least each user has a full working repo and you can always prepare alternate remotes to share commits. You are apparently already aware of GitHub for Enterprise. I suppose your organization is in full control of your data centers, private cloud, etc, but rationalizing about svn and trac completely stumped me as to your point about "corporate software pushers..."


With SVN each user has a local working copy. You can't commit if the central server is down, but you're not exactly dead in the water either. You can keep working locally. If your SVN server is down for DAYS it might get to be a problem.


...you really should not be going days between commits...


Indeed. We just create patches in such an event :)


What does Atlassian have to do with GitHub? Are you thinking of Bitbucket?


Regarding Atlassian, they're expensive, hard to support, damn unreliable and the software (particularly crucible) is unusably slow.

I'm comparing it to what we have as the suggestion is inevitable.


Atlassian Stash is the internal version of Bitbucket.


Does GitHub's web presence going down actually take down their back-end/client access too? I don't know, but I would assume they'd use different servers/DNS addresses for that stuff?


Since this seems to be a network problem, it took down both their web servers and their git (ssh) endpoints.


[deleted]


See updated comment about github. It doesn't cut it.


Github Enterprise?


From what I've seen in corporate environments downtime of infrastructure run by internal teams is just as high if not higher than with Github.


Internal infrastructure may have more total downtime but it's usually managed and scheduled to minimize impact.


That's anecdotal. My anecdote is that we've run 5 nines for 6 years. We know our shit.


I think that's the exception rather than the norm. Since we're sharing anecdotes, every corporate shop I've worked in tends to stand up services, then once they are working they are not monitored or thought about until there's a problem.


Well, both of our experiences are anecdotal, aren't they? And having a server running doesn't mean that everyone has access to it. One of the issues I've faced is getting approval for VPN access (it took a month, IIRC), which prevented access to the SVN repo; that would not have been an issue with Github.

And that org's infrastructure team could probably claim 5 nines uptime as well.


The organizational delay on VPN provisioning has nothing to do with git, or github, at all.

You could have also hosted an external RCS repository over RSH and you wouldn't have had the problem of VPN provisioning being slow.

The problem was VPN provisioning, not the source control system.


Yes, but if the repo was hosted on Github rather than internally then the slow VPN provisioning wouldn't have been an issue.

(Good) web services like Github tend to reduce friction, corporate bureaucracies tend to increase friction.


(Good) corporate services wouldn't have had slow VPN provisioning.

This has nothing to do with Github, other than the fact that Github would let you end-run all rules about data protection, backups, centralized administration of user access, and everything else that you don't individually have to worry about, but the management and IT at the company you work for does.


"centralized administration of user access"

This is exactly the sort of thing I'm talking about. Let the team working on the code decide who has access to the code. Not some person in a different part of the country who has no idea what the code does.


Actually most of our developers can't be trusted with this. We have certain regulatory requirements.

We have active directory and its all managed there. If someone needs access, their role defines it. If someone wants access they have to ask.


That's a short-sighted selfish view that fails to take into account the responsibilities of the organization at large.


Actually it would be even worse, as we can't risk people pushing our code to their private repos, so Mr Firewall says "no way". Another problem: we don't know which repo we're pushing to at a network level.

Code theft is a major problem. Not because the code is crap and we're embarrassed (unlike the VMware and Microsoft code leaks), but because it's valuable.


What do you do to stop people emailing code out?


We have content filters and policy based email management. You can't send documents or content through the mail gateway at all.


And gmail?


They don't have access to that :)


Haha, I guess I should have seen that one coming:)

Do you whitelist or blacklist the internet?


Whitelist and restrict to certain staff.

We have to cover our asses as we deal with classified information and financial data.

We even search people leaving the building.


Somewhere else on the Internet: "We are currently giving a DDoS attack and are working to..."


"...keep it that way."


If you have trouble cloning/pushing to git@github.com right now, you can use this config[1] in ~/.ssh/config:

    Host github.com
      Hostname ssh.github.com
      Port 443
Cloning and pushing to github now works on my machine.

[1]: https://help.github.com/articles/using-ssh-over-the-https-po...


Wow! Huge thanks, this actually works (for those of you still having trouble)


I really wish they'd release geographic information on who's launching these attacks. On the bright side, a Saturday @ 7 eastern isn't nearly as annoying as something during the workweek.



While 'neat', it won't reveal who's actually controlling the botnet and launched the attack.


It's really quite a shame at how little can be done to find the perpetrator. I suppose contacting law enforcement wouldn't do much either, unless it's a significant persistent threat that would warrant a large scale investigation.


Based on the FBI taking 50 doors related to the Operation Payback DDoS and yet not charging anyone, it appears the US Attorney's office isn't sure it's even a crime.


Also, you are likely to just get a population density map:

http://xkcd.com/1138/


Why is anyone DDoSing GitHub?


A lot of people rely on and use Github on a very frequent basis. It's their way to get attention, I guess.


Possibilities mentioned so far: (1) Fame for attacker (2) Excuse for other problems at GitHub (3) Competitors

Not sure I find any of these more convincing than the next, or very convincing at all.

GitHub seems to be one of the least offensive businesses around but still gets attacked regularly. Its popularity might make it a target, but don't those running botnets have better targets? And wouldn't they publicize their exploits?

And I can't believe GitHub would use it as an excuse. A DDoS attack has clear evidence that their entire devops team and many of their suppliers would have first hand knowledge of. If word leaked out it would be pretty embarrassing.

Lastly, would a competitor risk destroying their entire business if found out? People are irrational and stupid, but that would be crazy.

Why is this happening?


Is anyone counting how many DDoS attacks github has seen this year? I don't doubt they're having issues, but at this point it seems like they're blaming every outage on a DDoS.


> at this point it seems like they are blaming every outage on a DDoS

Related: https://news.ycombinator.com/item?id=3576964


Was about to post asking why would somebody want to DDoS Github. Your link seems like an "understandable" motive.


Extortion is one of the better possible reasons for this DDoS. Eventually, when the money doesn't show, it's in their best interest to stop attacking. While new extortionists will keep showing up, they won't likely come back.

Ideological or government attackers, on the other hand, can't be reasoned with and can only be expected to escalate in the future. If that's who Github is dealing with, then they'd better figure out how to actually deal with DDoSes sometime soon, because they will only increase in size and frequency.


In light of this, it would be nice if github offered a feature to push repos to S3 or something every time they receive a push, so you could use that repo until they get back up.


You know what I want? Transparent failover somehow baked into git remotes.

    you <--> (X <--> Y)
If you push to and pull from X, and X should be unavailable, automatically start using Y. When X is pushed to, check with Y first to keep in sync. There are ways to push to multiple remotes by defining multiple remote URLs, but then it's a harder problem of dealing with downtime.


It's really easy to push to two remotes. I have a TFS service account set up using git, and also a github account. I like TFS's project and issue tracking better, but want the public repo available on github.

    git remote add <name> <url>
    git remote add <name> <url>

Then create a shell script that checks both remotes for the latest; if one is ahead of the other, merge the two; if they are the same, push to both. Then just always push with that script.

That's just how I've handled it so far. That said, an automagical way of handling it built into github/git itself would be cool.


As I said, it's doable with scripts and whatnot, but transparent and built into git would be awesome. I see it as something you just set up on a remote repository server. No need to make N developers aware of it, or aware of the script, or resolving the script breaking, or having a ton of implementations of said script across different teams and companies.

Doesn't help if you use GitHub for issues, of course, but hey!


There's no need. git by itself already does that. Shameless plug: http://www.saltwaterc.eu/git-is-distributed-stupid.html


You can do this with a post-receive hook: https://help.github.com/articles/post-receive-hooks
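The linked page is about GitHub's hosted webhooks, but on a repo you host yourself the same idea is a server-side post-receive hook that mirrors every accepted push to a backup. A sketch, with throwaway local paths standing in for the real hosts:

```shell
# The central bare repo gets a post-receive hook that mirrors all
# refs to a backup repository after every push.
backup=$(mktemp -d)/backup.git
central=$(mktemp -d)/central.git
git init --bare --quiet "$backup"
git init --bare --quiet "$central"
cat > "$central/hooks/post-receive" <<EOF
#!/bin/sh
exec git push --quiet --mirror "$backup"
EOF
chmod +x "$central/hooks/post-receive"
dev=$(mktemp -d)/dev
git init --quiet "$dev"
cd "$dev"
git -c user.email=dev@example.com -c user.name=dev \
    commit --allow-empty --quiet -m "change"
git push --quiet "$central" HEAD:master   # hook fires, backup updated
```

If central goes down, developers can be pointed at the backup with a one-line `git remote` change.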


It's back. I was able to watch the graph change and see it reflected in my own experience of trying to load a GitHub page. Great job on the Status site, GitHub!

I first noticed the top graph change, and a minute or two after it went back up towards 100%, they updated their status message to "Minor service outage". Now it's been moved to the history: https://status.github.com/messages


maybe... but I still can't push my stuff from Italy :-/


I'm still having problems pushing/pulling as well.


Oh, that explains it. Seems it started half an hour before they posted the update.

I was trying to pull just a minute ago and it was incredibly slow, but status.github.com was just showing "we're investigating the high rate of dropped packets".

Now, why would anyone want to DDoS GitHub? Gee...


We need to agree on a date format for these kinds of submissions: https://news.ycombinator.com/item?id=5430129. :P


What can WE do about it so GitHub stops getting DDoS-ed?


Stop letting computer-illiterate people believe it's okay to run Windows when they aren't going to be responsible about updates/safe computing?


Dance in the rain and sing magical incantations.


Convince Github to release lists of attacking IPs. Then we can all try to reverse takeover the botnet.


You can't do anything yourself other than look for alternatives or self host.


Fire people?


The only real thing is to encourage behaviors that stop people from having their machines compromised. Most DDoS attacks are masses of compromised machines.


Does anyone have insight into that repeated spike train pattern in the exception percentage graph? That seems so predictable it ought to be fixable?


No wonder.. I was wondering why the hell I couldn't push and then I thought, "well... I'm bored guess I'll read hacker news... "


This is the reason I love git: distributed. I can keep on keeping on despite these silly attacks (if they really are DDoS).


How about whitelisting known genuine users?


The attackers would just find such a user and spoof their IP. DDoS is a hard problem to solve, and it's a shame that so many ISPs and datacenters don't work harder to prevent spoofed traffic. On top of this, they'd still need routers and switches in front of their machines big enough to handle the traffic from the attack plus the load of trying to filter out the good traffic (this kind of hardware is quite expensive).


Or better yet, paying users.


Kudos to GitHub for showing their latency and error stats. That is grace under pressure.


Everyone on HN opening GitHub to read the message will surely help! :)


Wow, Bitbucket must be getting desperate... Haha. This is annoying, even though it's a weekend, I've got work to do and the fact I can't pull down some changes I made on a machine at work to my home machine is highly annoying.



