China Scale: the New Sandbox to Battle-Test Innovative Technology

My 87-year-old grandmother lives in a senior home in a suburb of Shenyang, an industrial sprawl in the northeastern region of China. She’s tech savvy. She uses three apps to get her shopping done: JD.com for books, Pinduoduo for fruits, and Taobao for everything else (shirt, scarf, lotion, sudoku board).

These three apps happen to be three of the largest e-commerce marketplaces in China, with a scale that reaches users well beyond the typical digital audience of millennials and Gen-Z’ers.

It’s this scale, “China Scale,” that makes the Chinese internet economy one of the best sandboxes to produce high-quality software engineering, especially for infrastructure software.

 

Shopping Spree

One of the fastest growing verticals in China’s internet economy is e-commerce and, by extension, digital payment and fulfillment logistics. This sector is also where infrastructure technology gets tested the most. The shiniest example of this growth is Singles’ Day, an artificial online shopping holiday created by Alibaba that occurs on November 11 every year on its Taobao and Tmall marketplaces. The 2017 Singles’ Day generated $25.3 billion USD in total sales. The 2018 Singles’ Day generated $30.8 billion USD.

JD.com, the country’s second largest e-commerce platform, has its own mid-year shopping festival called “618”, which is an 18-day promotional period that ends on June 18, coinciding with JD’s founding anniversary. In 2017, 618 generated $17.6 billion USD in total sales, and its 2018 reincarnation generated $28.4 billion USD.

To put things in context, Amazon’s own manufactured mid-year shopping holiday, Prime Day, generated $4.19 billion USD and $2.41 billion USD in 2018 and 2017, respectively. The U.S. Thanksgiving shopping season generated $17.8 billion USD and $19.62 billion USD in 2018 and 2017, respectively.

As an engineer, what’s interesting isn’t the eye-popping topline sales numbers, but the infrastructure that must be built to handle the workload. In 2017, Alibaba decided to reveal its system’s throughput during peak moments of that year’s Singles’ Day: 256,000 transactions per second and 42 million queries per second (source in Chinese).

It’s not hard to imagine the sheer number of transactions and queries, the data consistency issues, the demand for real-time analysis, and the unforeseeable edge cases that inevitably arise during these shopping promotions.

And not just for these companies. All the other e-commerce companies that want to ride the wave of these promotions, all the banks that offer digital payment solutions so their users can buy online easily, and all the logistics hubs and warehouses that deliver the goods need good infrastructure technology to handle new workloads and new growth.

Because of this rate of growth, and the competitive pressure that’s consequently created, Chinese technology companies are quite risk-tolerant in adopting new technology.

It’s not unheard of for a company that has found product-market fit and is going through hypergrowth to serve production traffic on a new, relatively unproven but promising technology in less than two months. JD.com adopted Kubernetes in early 2016, less than a year after the project was officially open sourced out of Google, because it had to solve its scalability issues and OpenStack wasn’t doing the trick. (JD.com now sports the largest Kubernetes cluster in production, running on 20,000 bare metal machines.)

 

With Great Scale Comes Great Responsibility

Chinese technology companies are your classic early adopters — not out of luxury, but out of necessity. Because they operate in a country with the largest number of Internet users (800 million and counting), they have the scale — and all the unpredictable behaviors that come with scale — to give every piece of technology in their stack a fair shake. Technologies that survive these companies come out stronger, more resilient, and more trustworthy to be used elsewhere.

Many of these behaviors are impossible to anticipate or test against for an engineering team in building mode. How do you simulate the network traffic created by your Paxos or Raft implementation when your system gets hit with a 100x query spike and needs to reach consensus algorithmically? How do you account for data hotspots when an item, a song, or a video all of a sudden goes viral and all your users are trying to access it, and worse yet, valuable ad dollars depend on your system not crashing? How do you scale your storage capacity when your data grows by multiple terabytes per day?

All these scenarios and more occur in many Chinese tech companies. And they are looking for new solutions, quickly, to meet these challenges — a fertile ground to battle-test innovative technology.

 

Locally Grown Beneficiaries

“China Scale” has already given birth to a few promising infrastructure technologies that originated in China. The Cloud Native Computing Foundation (CNCF) accepted three of them last year: Harbor, TiKV, and Dragonfly. Their architectures and use cases are covered well in a previous Software Engineering Daily article. There are a few more outside of the CNCF ecosystem worth highlighting.

OceanBase: a distributed relational database developed in-house by Ant Financial (an affiliate of Alibaba), originally to support Alipay, a ubiquitous digital payment portal in China. OceanBase has since been gradually deployed as the core transactional database for all of Alibaba’s key e-commerce platforms, like Taobao and Tmall. It’s also a standalone product and counts Bank of Nanjing as a user.

Since 2014, it has been through the trials and tribulations of five Singles’ Days. Unfortunately, because it is closed-source software that hasn’t been adopted much beyond China, there’s not much information available in English about its architecture, design, or engineering.

TiDB: an open-source NewSQL distributed database with MySQL compatibility first created by PingCAP in 2015.

 

TiDB Architecture

It has a layered architecture that separates the SQL processing layer (the TiDB cluster on the left side) and the horizontally scalable storage layer (the TiKV cluster in the middle). (Note: TiKV was started by PingCAP as well but is now under the purview of the CNCF.) This design approach was inspired by Google’s Spanner project and how it works with its F1 project. The PD (Placement Driver) cluster stores the metadata, provides some load balancing support, and issues timestamps as part of the system’s transaction model. The TiSpark cluster is an optional component that allows users to run Spark jobs directly on data stored inside TiKV.
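To make the layering concrete, here is a minimal toy sketch in Python. It is not TiDB's code or API: the class names (`PlacementDriver`, `KVStore`, `SQLLayer`), the key format, and the way timestamps are used to version writes and choose a read snapshot are all assumptions made for illustration. It only shows the division of labor described above: a SQL-facing layer that holds no data, a key-value storage layer, and a separate component that hands out timestamps.

```python
import itertools


class PlacementDriver:
    """Toy stand-in for PD (hypothetical): issues strictly increasing timestamps."""

    def __init__(self):
        self._counter = itertools.count(1)

    def get_timestamp(self):
        return next(self._counter)


class KVStore:
    """Toy stand-in for the storage layer: every key keeps timestamped versions."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), ascending by ts

    def put(self, key, value, commit_ts):
        self._versions.setdefault(key, []).append((commit_ts, value))

    def get(self, key, read_ts):
        """Return the newest value written at or before read_ts, or None."""
        result = None
        for commit_ts, value in self._versions.get(key, []):
            if commit_ts <= read_ts:
                result = value
        return result


class SQLLayer:
    """Toy stand-in for the SQL layer: stateless, maps rows to keys."""

    def __init__(self, pd, store):
        self.pd = pd
        self.store = store

    @staticmethod
    def row_key(table, row_id):
        # Assumed key layout, chosen only for this example.
        return f"{table}/{row_id:08d}"

    def insert(self, table, row_id, row):
        ts = self.pd.get_timestamp()
        self.store.put(self.row_key(table, row_id), row, ts)
        return ts

    def select(self, table, row_id, read_ts=None):
        # Without an explicit snapshot, read at a fresh timestamp.
        ts = read_ts if read_ts is not None else self.pd.get_timestamp()
        return self.store.get(self.row_key(table, row_id), ts)
```

Because the SQL layer in this sketch keeps no state of its own, any number of `SQLLayer` instances could share one `KVStore` and one `PlacementDriver`, which is the property that lets the two layers scale independently.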

TiDB is currently deployed in production by a few hundred companies in China, like Mobike, Bank of Beijing, and iQiyi, as well as some large Internet companies outside of China, like Shopee and BookMyShow.

(Disclosure: my employer PingCAP provides enterprise and cloud offerings of TiDB and maintains the open-source community version.)

Apache Kylin: a fast OLAP (Online Analytical Processing) engine that was first developed inside eBay’s China team, then open sourced and contributed to the Apache Software Foundation in 2014, reaching top-level project status at the end of 2015.

Apache Kylin Architecture

Kylin works primarily in the Hadoop ecosystem and significantly speeds up analytical queries on 10+ billion rows of data. Users first define their preferred data model; Kylin then runs multiple MapReduce jobs in parallel to pre-build the corresponding multi-dimensional model (an approach known as “MOLAP”), taking advantage of the distributed nature of Hadoop. The engine then stores the pre-calculated model in HBase, where end users query it. It also relies on ZooKeeper to coordinate and manage the different parts of this process.
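The pre-building idea can be sketched in a few lines of Python. This is a toy, single-process illustration of MOLAP-style pre-aggregation, not Kylin's implementation: the fact table, the dimension names, and the `build_cube` and `query` helpers are hypothetical, and a plain dictionary stands in for both the parallel MapReduce build and the HBase storage.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact table: three dimensions and one measure ("sales").
ROWS = [
    {"city": "Shenyang", "category": "books", "year": 2018, "sales": 10},
    {"city": "Shenyang", "category": "fruit", "year": 2018, "sales": 5},
    {"city": "Beijing", "category": "books", "year": 2018, "sales": 7},
    {"city": "Beijing", "category": "books", "year": 2017, "sales": 3},
]
DIMENSIONS = ("city", "category", "year")


def build_cube(rows, dimensions, measure):
    """Pre-aggregate SUM(measure) for every subset of the dimensions."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            cuboid = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in dims)
                cuboid[key] += row[measure]
            cube[dims] = dict(cuboid)
    return cube


def query(cube, dimensions, **filters):
    """Answer SUM(measure) under equality filters with a single lookup."""
    dims = tuple(d for d in dimensions if d in filters)
    key = tuple(filters[d] for d in dims)
    return cube[dims].get(key, 0)


CUBE = build_cube(ROWS, DIMENSIONS, "sales")
beijing_total = query(CUBE, DIMENSIONS, city="Beijing")
books_2018 = query(CUBE, DIMENSIONS, category="books", year=2018)
```

The tradeoff is visible even at this size: three dimensions produce eight pre-built aggregates, and the count doubles with each added dimension, so the build is expensive while each query becomes a lookup instead of a scan.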

As a big data analytical engine, Kylin integrates with popular BI tools like Tableau, MicroStrategy, and Excel. It also has a RESTful API to provide connectivity with third-party apps. Besides eBay, it’s been battle-tested in companies like OPPO, Baidu, and China Pacific Insurance Company, and the project also cites Samsung and J.P. Morgan as users.

Apache Skywalking: a relatively new open-source application performance monitoring (APM) tool that’s designed to observe microservices in a container-based environment. It entered the Apache Software Foundation as an Incubator project at the end of 2017.

Apache Skywalking Architecture

Skywalking draws metrics from microservices via a service mesh and tracing information from popular tools like Jaeger, and makes them queryable, easy to analyze, and nice to visualize in a UI that the team has built. It also has a pluggable storage interface, so you can keep this information in popular database solutions like Elasticsearch, MySQL, and TiDB.

Despite being less than two years old, it has been deployed in sizable Chinese tech companies like Huawei, Xiaomi, and Beike (a large real-estate brokerage platform).

 

Beyond China

Beyond the locally grown technologies, some that were created outside of China are already getting their taste of “China Scale”. JD is known to be a big user of technologies like Prometheus, Vitess, Jenkins, and GitLab. Baidu is a big user of CockroachDB, another Spanner-inspired open-source database similar to TiDB. Alluxio, an open-source unifying layer over distributed file systems that can run at in-memory speed (it originated as a research project called Tachyon at AMPLab, UC Berkeley), is also getting usage from Baidu, China Unicom, and Didi Chuxing (China’s Uber).

Not only do technologies get deployed at scale; sometimes they just get bought outright. Apache Flink, an open-source data streaming platform similar to Apache Spark, was first created as part of the Stratosphere research project at the Technical University of Berlin in 2009. Alibaba became its biggest user and eventually bought data Artisans, the company founded by the creators of Flink to commercialize the software.

 

Worthy Tradeoffs?

As engineers, we know technology is always a set of tradeoffs, not absolutes. We trade off between throughput and latency, data consistency and response time, new features and system stability; the list goes on. Rarely can we have our cake and eat it too, nor do we trust technologies that market themselves as such.

The same goes for choice of market. There are legitimate concerns to consider when operating in China’s internet economy. Information censorship is worrisome. Respect for intellectual property is unreliable, as is legal recourse when it is violated. Regulation around personal data usage by companies is nascent.

But if you are a developer looking for a stable technology to deploy in your stack — something that has “seen it all” — chances are the technology that has been battle-tested in China’s competitive internet landscape will be a safe bet.

And if you are an engineering team building the next best thing, especially on the infrastructure level, getting that project into the hands of a few consumer tech companies in China will improve it by leaps and bounds and might just help you leapfrog over your closest competitor.

As a bonus, you might just make my grandmother happy along the way too!

 

Kevin Xu

Kevin Xu is the General Manager of Global Strategy and Operations at PingCAP. He studied Computer Science and Law at Stanford. He’s interested in distributed systems, cloud-native technologies, natural language processing, and open-source. 
