Three Reasons Why You Should Have GitHub Backups

GitHub is the largest source code host on the internet, reporting over 40 million users and more than 190 million repositories, according to Wikipedia.

Home to hobbyist projects, large-scale open source applications, and confidential private company code, GitHub has become the de facto SaaS provider for source code management and version control. Nearly every developer has used it at some point in their career. However, any hard reliance on a third party, even one as big as GitHub and backed by Microsoft, presents significant security and operational risk. GitHub is not a backup service, and companies that rely on the hosted product should have a backup solution in place.

GitHub, like most SaaS platforms, follows the Shared Responsibility Model, in which responsibilities are divided between the platform and the user.

Just as with on-prem solutions, a SaaS platform leaves the user responsible for the information and data stored on it. In fact, GitHub’s terms of service explicitly state that they are “not liable to you or any third party for any loss of profits, use, goodwill, or data.” GitHub is a hosted source control service. You need to bring your own security policy, access policy, and plan for how your team should use the tool.

GitHub provides more than just storage and versioning of the actual code. You can also use it to store and track essential metadata such as bug reports and pull requests (PRs).

Because GitHub is so ubiquitous, many organizations take for granted that it will be reliable and trustworthy. Many professionals underestimate the risk they are exposed to when they have no backup of GitHub and the critical operational data stored in it.

The most common cause of data loss is people, whether through honest mistakes or deliberate sabotage. Applications can fail for a variety of reasons, but the most common cause of unplanned downtime is human action, both malicious and unintentional. Employees who have recently been let go may act vindictively towards their former employers. In 2020, a former Cisco employee maliciously deleted systems after being fired, which led to a $2.4 million loss for the company. In 2021, a fired credit union employee wiped 21 GB of files in retaliation. Data loss, whether of code or of general business data, is a major concern for organizations and can lead to large real-world losses. Data loss can also occur when nobody is acting maliciously. A force push, for example, can overwrite a branch's history on the remote and discard commits that exist nowhere else.

Even if a company is not worried about insider threats, malware is still a concern. Malware grows more sophisticated every day, and some of it is now specifically designed to gain access to GitHub or other source control repositories. Attacks range from complex malware that injects code into build files to something as basic as a phishing campaign that steals Git credentials. In 2018, a piece of malware called Octopus Scanner began appearing in GitHub repositories. The malware infected a computer, looked for projects that used Git and GitHub, and then uploaded infected build and source files to GitHub. It was particularly difficult to manage because it infected real projects that could not simply be banned or blacklisted wholesale. In 2020, a widespread phishing attack targeting GitHub users, nicknamed Sawfish, was discovered.

This phishing attack mimicked the GitHub login page and relayed both passwords and TOTP two-factor authentication codes to the attackers. Attacks like this lead to compromised accounts, which can wreak havoc by deleting code and creating issues. For an example of the damage a compromised GitHub account can cause, look to the 2019 incident in which Canonical lost control of a GitHub account. Even ransomware can be repurposed or modified to target GitHub. In May 2019, it was reported that hackers were holding source code hostage and threatening to release it to the public if developers did not pay the ransom. These incidents show that malicious software can, and often does, target the data stored on GitHub.

Finally, service downtime is a real concern for organizations that rely heavily on GitHub. GitHub, like any cloud or SaaS offering, is not immune to downtime or unexpected outages.

For many organizations, GitHub has become a key part of the development workflow, and any interruption to the site results in delays that can cost thousands or even millions of dollars. In June 2020, GitHub went down for two hours, costing thousands of developers thousands of hours of work. Some developers were completely locked out of their workflow because of how closely GitHub was integrated into their daily tasks.

All of these problems and more can be mitigated with a GitHub backup solution. A backup strategy is essential for all types of data, so why not back up arguably the most important part of your application? While building a custom backup solution is often possible, it negates many of the benefits of using a SaaS product: it requires a large upfront cost in development hours and carries ongoing labor and maintenance expenses. With a third-party solution like BackHub by Rewind, organizations can run daily backups and restore their data in a few clicks, saving development time. In addition, it is compliant with SOC 2 and similar standards.
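To give a sense of what even a minimal custom backup involves, here is a rough sketch that takes a dated mirror clone of a repository with `git clone --mirror`. It assumes `git` is installed and already authenticated for the repository. The function names, the `/backups` root, and the `example-org/example-repo` URL are hypothetical placeholders, not part of any product mentioned in this article. Note that a mirror clone only captures Git data (branches, tags, and commits); issues, pull request discussions, and other metadata would need to be exported separately, which is part of the maintenance cost described above.

```python
import subprocess
from datetime import date
from pathlib import Path


def snapshot_path(repo_url: str, backup_root: Path, day: date) -> Path:
    """Return a dated target such as backup_root/owner/repo/2021-01-02.git."""
    trimmed = repo_url.rstrip("/")
    if trimmed.endswith(".git"):
        trimmed = trimmed[: -len(".git")]
    # Works for https://github.com/owner/repo and git@github.com:owner/repo.
    owner, name = trimmed.replace(":", "/").split("/")[-2:]
    return backup_root / owner / name / f"{day.isoformat()}.git"


def clone_command(repo_url: str, target: Path) -> list[str]:
    """Build the git command that copies every ref into a bare mirror."""
    return ["git", "clone", "--mirror", repo_url, str(target)]


def backup_repo(repo_url: str, backup_root: Path, day: date) -> Path:
    """Take one snapshot per day; skip the clone if today's already exists."""
    target = snapshot_path(repo_url, backup_root, day)
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(clone_command(repo_url, target), check=True)
    return target


if __name__ == "__main__":
    # Hypothetical repository URL and backup location, for illustration only.
    backup_repo(
        "https://github.com/example-org/example-repo.git",
        Path("/backups"),
        date.today(),
    )
```

Taking a fresh dated clone on each run, rather than updating a single mirror in place, is a deliberate choice in this sketch: a mirror that simply follows the remote would also follow a force push or a deleted branch, whereas dated snapshots keep earlier states available. The trade-off is storage, and a real setup would still need scheduling, retention, monitoring, and restore testing.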

Whether data is stored on-premises or in the cloud, it is vulnerable. Data can be lost to accidental deletions, malware attacks, or any number of other security threats. Even GitHub, a large cloud-based service, is not exempt from these threats. Some would even argue that its size and prominence make it a bigger target. Securing data in the cloud is a shared responsibility between the developer and the platform, and the division of roles is clear: the platform runs the service, and the user is responsible for protecting the data stored on it.

An effective repository backup and recovery solution is the first step in protecting code in the cloud. Check out Rewind with a 14-day free trial here.

 

 

Ashvin Nihalani

San Francisco, CA
Education: B. Eng, EECS, University of California

Originally from Texas. Graduated from Berkeley with a B.Eng in EECS. Interested in basically anything, well, anything interesting. More recently focused on Machine Learning, Blockchain, and Embedded Systems.

Software Daily
