This is a purely speculative, imaginary product, conceived out of curiosity and necessity.
Cloud Compiler compiles and transforms a single monolithic program, consisting of classes, methods, and calls, into a distributed, microservices-based system:
Running the service locally is as easy as loading it up in Visual Studio Code and hitting F5. Debugging a replayed user session is an extension installation followed by F5 again. Cloud Compiler absorbs the complexity through abstraction, the same way Java or Rust compile down to machine code. Containers, lambdas, VPCs, and the rest of the cloud building blocks are the machine code that Cloud Compiler produces.
All developers see are functions, calls, inputs, and outputs; the rest is invisible.
Where do I try it out?
It doesn’t exist, yet. I predict that a variant of Cloud Compiler will exist within the next 3 years, and building microservices directly will be as uncommon as directly writing machine code or assembly.
But microservices are popular!
So was writing assembly, out of necessity. Microservices came about in an age when early at-scale online companies like Netflix and Amazon needed to serve their exponentially growing user bases. The microservices style of architecture composes complex application software from small, individual applications that communicate over APIs, maintaining language independence across services through a shared understanding of how to communicate. The pattern of smaller, self-contained “do one thing well” microservices simplified the testing, scaling, and deployment of each unit, and the decoupling pushed distributed-system developers to build for failure, creating a necessary culture of fault-tolerant, highly available systems.
It isn’t magic, however. In the example from the beginning of the post, it became easy to reason about the Movie Catalog, Recommendation Engine, or User Preferences services, but reasoning about the flow of the entire Netflix-example application became prohibitively hard. Take into account that Netflix’s real set of microservices is two orders of magnitude more complicated:
Complex as it seems, it gets even more so when you consider the explosive combinatorial mix of startups and tech stacks re-solving the development cycle tooling needs for a microservices ecosystem:
The solutions startups are coming up with include:
- Compute scheduling and orchestration
- Storage provisioning, replication, and scaling
- Code observability, causality, monitoring, logging, alerting, and distributed tracing
- Application definition, imaging, deployment, and redundancy
- Software-defined networking, security, compliance, configuration, secret management
- And so much more
Even with that level of complexity, the reason microservice architecture has dominated the service-development headspace for years is that it allows for building large, scalable, fault-tolerant systems that are future-facing and fast-moving.
Paradigms have spun up and down, each attempting to make the horses go faster: Azure Service Fabric, Google App Engine, and AWS Lambdas, to name a few. All those platforms require the developer to manage the application-level complexity.
Having spent the past few years in multiple organizations, with varying levels of expertise in distributed systems, it’s clear to me that this approach does not scale. Companies that leverage the cloud as an enabler won’t have an easy time scaling their cloud development needs with these paradigms; they are too low-level. There are too many moving parts, integration points, and operational complexities. It would take an architect years to understand the intricacies of just one of these ecosystems.
Compilers and Runtimes
The industry has already introduced multiple high-level abstractions enabling developers to wield new platforms with unique characteristics and high levels of complexity. Multi-core CPUs, GPGPU programming, and high-performance computing all required abstractions and runtimes that simplified developers’ ability to design previously difficult-to-implement solutions. We now trust those abstractions as a fact of life: Go channels for concurrency, .NET parallel libraries, NVIDIA’s CUDA for GPU programming, Google’s TensorFlow for machine learning; no one in their right mind would avoid using those capabilities today.
Similarly, the paradigm shift in how services are built and operated at scale will follow suit with a cloud compiler and runtime. For a cloud compiler to emerge and be trusted, it needs to satisfy a few conditions all compilers do:
- Correctness: Preserve the meaning or intent of the original abstraction, minimizing leaks
- Efficiency: Produce an efficient translation to the target platform
  - Efficient in generating the translation
  - Efficient while running
- Usability: Work well within an ecosystem of tools
Portability also tends to be cited; compilers can be competitive differentiators, similar to how Microsoft’s Visual C++ compiler dominated certain verticals for years until open-source C++ compilers became commodities. Vendors like AWS, Azure, and GCP will likely aim to optimize variants of their implementations to produce the best possible translation to their platforms while sacrificing others. In the future, however, open-source variants will emerge to cover multi-cloud solutions, or meta-compilers that bridge vendor-specific ones.
Cloud Compiler – Simple Example
Let’s imagine a straightforward example:
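A minimal sketch of what such a monolith might look like, in Python with only the standard library (the endpoint, handler, and stand-in workload are hypothetical):

```python
import base64
import hashlib
from http.server import BaseHTTPRequestHandler, HTTPServer

def process_image(image_bytes: bytes) -> str:
    """Stand-in for a CPU-intensive transform: repeated hashing."""
    digest = image_bytes
    for _ in range(10_000):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()

class ImageHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the base64-encoded image from the request body...
        length = int(self.headers.get("Content-Length", 0))
        image_bytes = base64.b64decode(self.rfile.read(length))
        # ...run it through the CPU-intensive function...
        result = process_image(image_bytes)
        # ...and return the result.
        self.send_response(200)
        self.end_headers()
        self.wfile.write(result.encode())

def main() -> None:
    # Browser-accessible at http://localhost:8080 when run locally.
    HTTPServer(("", 8080), ImageHandler).serve_forever()
```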
This code exposes a browser-accessible URL and processes a base64-encoded image through a CPU-intensive function.
Cloud Compiler’s first run will deploy the entire bundle as a single unit and inspect its runtime behavior. Based on the bundle’s runtime characteristics, it will morph the topology from a single execution environment to a distributed one:
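As a rough illustration of that morph, here is a toy sketch in Python where an in-process queue stands in for a managed queue service; the frontend/worker split and all names are hypothetical:

```python
import queue
import threading

# The monolith's in-process call is split in two: the HTTP handler becomes
# a thin frontend that enqueues work, and the CPU-intensive function
# becomes a worker behind a queue.
work_queue: "queue.Queue[bytes]" = queue.Queue()
results: "queue.Queue[str]" = queue.Queue()

def frontend(image_bytes: bytes) -> None:
    # What used to be a direct function call is now a message crossing
    # a service boundary.
    work_queue.put(image_bytes)

def worker() -> None:
    # Runs in its own "service"; the heavy transform is stubbed out
    # here with bytes.hex().
    image_bytes = work_queue.get()
    results.put(image_bytes.hex())

frontend(b"\x01\x02")
t = threading.Thread(target=worker)
t.start()
t.join()
```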
The runtime works hand-in-hand with the compiler, and as it observes changes in usage patterns, it applies transformations learned across the vast set of architectures deployed on AWS/Azure/GCP to adjust and keep performing nominally. For example, when it observes traffic spikes, the runtime signals the compiler, which in turn scales out the architecture:
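One way to picture the kind of scale-out rule such a runtime might apply; the thresholds and parameter names are entirely invented for illustration:

```python
import math

def desired_replicas(observed_qps: float,
                     qps_per_replica: float = 50.0,
                     min_replicas: int = 1,
                     max_replicas: int = 100) -> int:
    # Provision enough replicas to absorb the observed traffic,
    # clamped to bounds the operator (or the compiler) sets.
    wanted = math.ceil(observed_qps / qps_per_replica)
    return max(min_replicas, min(max_replicas, wanted))
```

A traffic spike from 10 QPS to 500 QPS would, under these made-up numbers, move the deployment from one replica to ten.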
Higher-Order Observability and Causality
When the details of the implementation are left to the compiler/runtime combination, it can inject just-in-time instrumentation at multiple layers of the stack; by wiring up the deepest language/platform runtime layers (Go, Node, JVM), Amazon, Microsoft, and Google can build highly integrated experiences for reasoning about the application you’re building. Depending on the workload and signals coming from the compiler/runtime combination, it may choose SQS to back the async RPC implementation, or transition to ElastiCache when that offers a more appropriate sharding/cost model.
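A toy sketch of the backend-selection decision described above; the signal names and thresholds are invented purely for illustration:

```python
def pick_async_backend(avg_payload_kb: float, read_write_ratio: float) -> str:
    # Read-dominated, small-payload traffic favors a cache tier
    # (an ElastiCache-style service); everything else favors a
    # queue-backed async RPC (an SQS-style service).
    if read_write_ratio > 10 and avg_payload_kb < 64:
        return "cache-tier"
    return "queue-backed-rpc"
```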
The significant bit to understand is that developers won’t have to care. F5-ing to run their code locally just works, and variations that allow remote debugging or playback of single calls bridge the cloud development workflow.
Trust and Efficiency
Based on anecdotal observations of engineering time spent, I believe we’ll see 100X improvements in development times for distributed systems. This hinges on the experience being seamless, and the compiler/runtime combination being flawless. Compilers that produce the wrong code are not used in production. That’s the 100X opportunity, however, and the first vendor to demonstrate a sound combination that achieves these efficiencies will start the new cloud war.
I can’t wait to see communities and companies move in the Cloud Compiler direction, and to see how much we can achieve with an order-of-magnitude improvement in the agility with which we build distributed systems.
What would prevent you from using this futuristic Cloud Compiler?