DCC: Distributed Cloud Computing: Applying Scale Out to the Data Center

Most of the focus in public cloud computing over the last 10 years has been on massive, centralized data centers with thousands to hundreds of thousands of servers. These data centers are typically replicated in a few semi-autonomous zones at continental scale. This model has proven quite successful in economically scaling cloud services, but it has some drawbacks. Failure of a zone can cause a service outage for tenants that don't replicate their services across zones. Some applications may need finer-grained control over network latency than a connection to a large centralized data center provides, or may benefit from being able to specify location as a parameter in their deployment. Nontechnical issues, such as the availability of real estate, power, and bandwidth for a large “mega data center”, also enter into consideration.

Another model that may be useful in many cases is to have many micro or even nano data centers, e.g., ISP POPs, interconnected by medium- to high-bandwidth links, and to manage the data centers and interconnecting links as if they were one larger data center. This distributed cloud model is perhaps a better match for private enterprise clouds, which tend to be smaller than the large public mega data centers, but it also has attractions for public clouds run by telecom carriers, which have facilities in various cities with power, cooling, and bandwidth already available. It is likewise attractive for mobile operators, since it provides a platform on which applications that could benefit from tighter coupling to the wireless access network can be deployed and easily managed. Finally, research testbeds like GENI and FIRE are evolving towards the distributed cloud model. The two models aren't mutually exclusive; a public cloud operator with many large data centers distributed internationally could manage its network of data centers like a distributed cloud.

Distributed clouds also encompass federated clouds, where data centers managed by different organizations federate to allow users to utilize any of the data centers. The distinction between federated clouds and more tightly coupled distributed clouds lies in authentication: in the tightly coupled model it is handled centrally, whereas in a federated cloud each data center handles authentication individually, with entry into the distributed cloud implemented using single sign-on. Additionally, the more tightly coupled model may hide the locality distinctions between physical data centers and present the distributed cloud as a single data center, without exposing the network interconnections. In that case, orchestration software manages the user’s view of the compute/storage/networking resources to hide locality. Such a model may be important where locality isn’t an important characteristic for application deployment. In other cases locality matters, and it is then exposed through the orchestration layer.

Many applications can benefit from the distributed cloud. Applications that benefit from locality include real-time applications where latency is important, such as smart grid control of the electrical network, and any application where the local regulatory environment requires user account data to be stored in the country where the user lives. Hybrid cloud applications, where a private data center delegates peak processing to a public data center, allow enterprises to provision their data centers for average rather than peak load, thereby saving CAPEX. Content delivery networks are another example, where positioning data close to the user is required. The opposite case – moving the processing closer to the data – may become important where data sets are large and networking costs prohibit actually moving the data. Last but not least, in many cooperative scenarios such as Internet-based commerce or advanced manufacturing, different stakeholders or users have existing data and/or system deployments in distinct data centers – public or private – which need to interact without the ability to migrate components (“distribution-by-need” as opposed to “distribution-by-design”).

The vision behind the distributed cloud is to use software to aggregate compute/storage/networking resources across distributed physical data centers, with scalability and reliability automated by scale-out, primarily software-based solutions, and with locality exposed as a deployment criterion for those applications that require it. The scale-out model of service deployment – deploying many small instances of a service to meet demand rather than a few large instances – has proven successful for IaaS and SaaS; the distributed cloud applies the same scale-out model to data centers.
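To make the idea of locality as a deployment criterion concrete, here is a minimal sketch of a placement function that filters candidate micro data centers by region and a maximum-latency bound. The `DataCenter` type, site names, regions, and latency figures are all invented for illustration; a real orchestration layer would expose something richer than this.

```python
from dataclasses import dataclass

@dataclass
class DataCenter:
    name: str
    region: str
    latency_ms: float  # measured latency from the requesting access network

def place(workload_region, max_latency_ms, data_centers):
    """Return the data centers satisfying the locality constraint,
    nearest first; an empty list if none qualify."""
    candidates = [dc for dc in data_centers
                  if dc.region == workload_region
                  and dc.latency_ms <= max_latency_ms]
    return sorted(candidates, key=lambda dc: dc.latency_ms)

# Hypothetical inventory of sites in a distributed cloud:
sites = [
    DataCenter("pop-berlin-1", "eu", 8.0),
    DataCenter("pop-paris-2", "eu", 14.0),
    DataCenter("mega-us-east", "us", 95.0),
]

# A latency-sensitive workload pinned to the EU:
print([dc.name for dc in place("eu", 10.0, sites)])  # -> ['pop-berlin-1']
```

A workload that omitted the latency bound (or set it very high) could instead land in the large centralized data center, which is the sense in which locality is optional: it is exposed only to applications that ask for it.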

Workshop History

  • DCC 2013 was co-located with IEEE/ACM UCC 2013

  • DCC 2014 was co-located with ACM SIGCOMM 2014

  • DCC 2015 was co-located with ACM SIGMETRICS 2015

  • DCC 2016 was co-located with ACM PODC 2016