I start with a confession. I do not consider myself old, but I am old from an IT and computer science perspective. I programmed using front panel switches, paper tape, dumb terminals, and punch cards. I worked on DEC’s PDP-1, early timeshare systems including MULTICS, ran batch jobs on mainframes, and played on one of the first home microcomputers. I designed and developed a complex program on a 64K microcomputer using BASIC, implementing my own Key File Access Method on its 360K floppy disks. I saw early IT departments move from timesharing to owning their own mainframes as they became affordable. I worked on the world’s first heterogeneous 802.3 Local Area Network and helped people who were on its standards committee. I understood why the term “deterministic” was more descriptive than “real-time” when describing operating systems.

I read the Seven Fallacies of Distributed Computing by Peter Deutsch shortly after their publication in 1994.1 Little did I realize that these fallacies (eight, after an addition in 1997) would still be true 22 years later, and that one could replace the word “distributed” with “public cloud,” giving us the Eight Fallacies of Public Cloud Computing. I will cover these eight fallacies and their applicability to cloud computing today, along with my predictions for public and hybrid clouds.
We will cover the eight fallacies in order and relate them to the cloud. The eight fallacies are:
- The network is reliable
- Latency is zero
- Bandwidth is infinite
- The network is secure
- Topology doesn't change
- There is one administrator
- Transport cost is zero
- The network is homogeneous (Added in 1997 by James Gosling2)
The network is reliable. Why is this a fallacy? Most of us pay for 5-9s network availability (~5 minutes of downtime a year). What is our real network reliability? Even when we traverse five networks, our availability across all five is still 99.995% (4-9s and a 5). We conclude that our carrier-grade wired networks are reliable. How do our users connect to these reliable networks? An office or home WiFi network? A cellular network? We just found the weak link in the network: the connection at the edge. At the other end of the network, data center availability is also below 5-9s, ranging from 99.671% (28.8 hours of downtime annually) for a tier 1 data center to 99.995% (26 minutes of downtime annually) for a tier 4 data center. We conclude that our network is reliable, but we need to design for the lower availability of the edge connection. With edge networks suffering drops (e.g., moving out of cellular coverage, a dead spot in WiFi coverage), our applications need to be robust and not require a constant connection.
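The arithmetic behind these numbers is the product rule for components in series: the path is only up when every hop is up. A minimal sketch (the 99% figure for a home WiFi link is an assumed number for illustration, not from any standard):

```python
# Composite availability of components chained in series (product rule).
# The five-nines carrier figure and the tier 1 data center figure come
# from the article; the 99% WiFi availability is an assumption.

def serial_availability(availabilities):
    """Availability of a path that needs every component up."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result

MINUTES_PER_YEAR = 365.25 * 24 * 60

# Five carrier networks, each at 99.999% ("five nines"):
path = serial_availability([0.99999] * 5)
print(f"5-network path: {path:.3%}")                          # ~99.995%
print(f"downtime/yr: {(1 - path) * MINUTES_PER_YEAR:.0f} min")  # ~26 min

# Add a tier 1 data center (99.671%) and an assumed 99% WiFi edge link:
end_to_end = serial_availability([0.99999] * 5 + [0.99671, 0.99])
print(f"end-to-end with edge: {end_to_end:.3%}")
```

The edge link and the data center, not the carrier backbone, dominate the result, which is the article's point.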
Latency is zero. The speed of light, ~186,000 miles/second (~300,000 km/sec) in a vacuum and about 30% slower in a fiber optic cable, sets a lower limit on latency. The round trip time from India to the United States (~8,000 miles) is approximately 125 milliseconds. If the application requires several round trips plus any compute time, latency can easily exceed 500 milliseconds (half a second). We wish the best of luck to companies hosting an application in the US that aims to provide the same experience to users in Asia as to users in the US. If one wants low latency, the application needs to be located near the users, or much of the application needs to reside in the user’s device. (Note: This fallacy is truer today than in 1994. Back then, transoceanic networks ran at a “robust” 56 kbps and companies hosted applications with remote users. International travelers expected poor performance when traveling, connecting over dial-up with limited bandwidth; see the next fallacy.)
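The latency floor follows directly from distance and the speed of light in fiber; everything else (routing, queuing, compute) only adds to it. A minimal sketch of the calculation:

```python
# Physics-only lower bound on round-trip latency over fiber.
# 186,000 mi/s in vacuum and the ~30% fiber slowdown are from the article.

C_VACUUM_MPS = 186_000              # miles per second, speed of light in vacuum
C_FIBER_MPS = C_VACUUM_MPS * 0.70   # ~30% slower in fiber optic cable

def min_round_trip_ms(one_way_miles, round_trips=1):
    """Lower bound only; real latency adds routing, queuing, and compute time."""
    return round_trips * (2 * one_way_miles / C_FIBER_MPS) * 1000

# India <-> US, ~8,000 miles one way:
print(f"1 round trip: {min_round_trip_ms(8000):.0f} ms")
print(f"4 round trips: {min_round_trip_ms(8000, round_trips=4):.0f} ms")
```

One chatty exchange of a few round trips already approaches the half-second mark before any server-side work is counted.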
Bandwidth is infinite. This was a fallacy in 1994 when office LANs were 10 Megabits per second (Mbps) and PC network interfaces ran below 1 Mbps. Today, we have WiFi networks in the Gigabits per second (Gbps) range, office LANs of 10 Gbps, and data center networks of 40 and 100 Gbps. So, can we conclude that bandwidth is now infinite? The answer is “no” for two reasons. Starting with the user, more and more users are connecting from mobile devices. Even on a 4G (LTE) network, the network handles download speeds between 5 and 12 Mbps and upload speeds between 2 and 5 Mbps, with peak download speeds approaching 50 Mbps.3 We have all experienced periods of much slower download speeds on LTE networks due to heavy traffic, a poor or missing connection, or a balky device. Data centers have fat pipes to the network, but also many users. As an example, Amazon reported a peak of 64 transactions per second on Black Friday 2014. If one assumes five minutes of shopping per item, transactions completing in five seconds, and 20% of shoppers purchasing, we can still estimate around 20,000 active users. Even with a 10 Gbps pipe, average bandwidth per user reduces to ~0.5 Mbps (500 kbps), very finite bandwidth. Application designs need to account for variable bandwidth based on the edge connection, the quality of that connection, and the number of simultaneous users.
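A rough sketch of that estimate follows. The inputs (64 transactions per second, five-minute shopping sessions, a 10 Gbps pipe) are taken from the text as given; treating concurrent users as arrival rate times time-in-system is a Little's-law-style assumption on my part:

```python
# Back-of-the-envelope estimate of per-user bandwidth at a busy data center.
# Input figures are from the article; the estimation approach is an assumption.

PEAK_TPS = 64              # completed transactions/sec (Black Friday 2014)
SESSION_SECONDS = 5 * 60   # assumed shopping time per purchase

# Little's-law-style estimate: concurrent users = arrival rate x time in system.
concurrent_buyers = PEAK_TPS * SESSION_SECONDS   # 19,200, roughly the ~20,000 cited
# Counting shoppers who browse without buying would push this figure higher still.

PIPE_BPS = 10e9            # 10 Gbps data center uplink
per_user_mbps = PIPE_BPS / concurrent_buyers / 1e6
print(f"~{concurrent_buyers:,} concurrent users -> ~{per_user_mbps:.2f} Mbps each")
```

Even a fat pipe divided among tens of thousands of simultaneous users leaves each one with dial-up-era headroom at peak.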
The network is secure. We all know this is a fallacy. Nicolas Sarkozy, a former President of France, helped put into circulation a view much in vogue at the time: "The Internet is a new frontier, a territory to conquer. But it cannot be a Wild West, a lawless place." Willie Sutton (the line is sometimes attributed to Jesse James) was supposedly asked why he robbed banks and answered, "Because that is where the money is!" With the rise of eCommerce since 1994, the money is in the network, and no network is secure. Microsoft’s Chief Information Security Officer, Bret Arsenault, has stated that it is no longer a matter of "if" your company will be compromised, but "when" and "how."
Topology doesn't change. The beautiful thing about the cloud is that it is virtual and can change to meet your demand. You do not need to worry about the underlying infrastructure. IT departments and service providers (including IT departments that act as service/cloud providers) allow users to choose their virtual environment with no need to know or understand the underlying physical infrastructure. However, this physical infrastructure changes with no notice when the service providers believe it will not affect their users’ virtual environments. Changes in the infrastructure and even other users running on the physical infrastructure (i.e., co-tenancy) can and will impact your virtual environment (e.g., compute speeds, CPU utilization by other VMs on the physical server, number of VMs on the physical server, network path to the data center and servers, storage usage with other users). The topology is frequently changing and you are unaware of these changes except when they affect you. You will report an incident, and the service provider may or may not understand that their infrastructure change caused the incident.
There is one administrator. This was true only until the time that a data center grew enough to need a second administrator. Now, a pool of administrators supports hundreds or thousands of servers and other infrastructure in a data center (e.g., hyperscale service providers have over 10,000 servers deployed per administrator). If your cloud includes Software as a Service (SaaS) and that SaaS is running in the public cloud, do you even know which public cloud? Your support is from the SaaS provider, not the cloud provider. It is no longer, “Houston, we have a problem,” but now, “We have a problem. Who do we call?”
Transport cost is zero. If this was ever true, it was only in the period around the dot-com bust of 2000, when carriers had built out "infinite" bandwidth and demand for that bandwidth dried up. You could argue that you do not pay for transport but for available bandwidth. Whichever view you take, your transport costs are real and significant. If you are a large bandwidth user, you can reduce these costs by becoming your own carrier, leasing or owning fiber and providing your own transport. AWS CloudFront, Azure Content Delivery Network, and similar services provide a way to reduce your transport costs to their clouds. The cloud service providers supply the internal transport as part of their service (with an associated cost). Another indicator that transport costs are real is that there are 12-18 tier-1 network providers worldwide (there is no precise definition of tier-1, which is why the count varies) with an aggregate market capitalization in the hundreds of billions of dollars. These providers, along with the tier 2 and tier 3 network providers, are earning revenue for their transport to justify these valuations. The cost of storing data in the cloud after transport is a related cost. While it may be less than, comparable to, or more than storing the data locally, it is still not zero.
The network is homogeneous. James Gosling, another Sun Fellow and the inventor of Java™4, added this fallacy around 1997. If you agree that any of the first three (the network is reliable, latency is zero, bandwidth is infinite) is indeed a fallacy, this one logically follows. Your web service's functionality may be immune to the network, but your service's performance likely changes drastically with network performance. Have you had timeouts when clicking on a link or logging into a website? Nothing changed in the application, but I can almost guarantee that something happened in the network path between you and the servers.
Okay. We understand the fallacies of the public cloud (and distributed computing); how do we adapt our cloud strategy to bust them? Let us look at the fallacies from the perspective of whether we can control the variation by using a private cloud. Does it also vary based on whether our users are internal or external? If we see variations, can we bust the fallacies by having more control? The table below compares our level of control in public and private clouds for internal and external users.
Table 1: Do We Have More Control?
Rather than using the Socratic Method to get to my conclusions, I highlight the differences in the table below:
Table 2: We Have More Control for Internal Users with Private Cloud
For internal users, you gain control over your environment in a private cloud compared to the public cloud. You:
- Control the network your users are on, because it is your network,
- Can require end-to-end security (e.g., VPN, DirectConnect),
- Control your data center topology and the co-tenants on virtual devices,
- Control how you assign administrators, and
- Control your transport costs (although they are still not zero).
For external users, you control the topology in your cloud, and you can decide how you assign administrators.
How does this affect you and your cloud strategy? For external users, you do not have much control over their experience whether you use a private or public cloud. The variations in the user networks, from the last mile all the way to the data center, are not controllable. You can develop robust applications to minimize the impact of an uncontrolled network, but you still do not have control. I conclude that I can host external-facing applications in a public cloud with no significant difference in user experience. For internal-facing applications, where I am concerned with performance and repeatability, I gain significant control by running the applications in my data center (traditional or private cloud). Summarizing: unless all my applications are external facing, a hybrid cloud of private cloud, public cloud, and traditional workloads gives me the most control over my applications.
My prediction is that the public cloud will go through a life cycle similar to mainframe computing. In the 60s and early 70s, mainframes were expensive and only the largest corporations purchased them. Timesharing services such as Electronic Data Systems (EDS) were the first IT startups; they bought mainframes to timeshare with companies that could not afford their own. As the price of computing performance halved every 1-2 years, the economics changed to the point where few companies could afford not to own their own computers. The timeshare model evolved into a remote-operate model for companies that did not want their own data centers. Move forward to 2012 and beyond. We are in the idea economy, and everyone is moving or has moved to the cloud. The price of computing performance continues to halve every 1-2 years. Workloads that require 1000 servers today will run on 1-10 servers in ten years. As costs drop, companies that moved to the public cloud will choose to pay these costs to regain control over their environment and move to private clouds. You will see this first for internal-facing applications and critical business applications, where companies are willing to pay more to regain control. External-facing applications, where less control is gained by moving to a private cloud, will remain in the public cloud as long as companies perceive a cost advantage there (analogous to companies today that outsource their IT operations). I predicted this future over a year ago and was one of very few voices in the wilderness. Now, experts are commenting that white box (generic) servers are turning the tide back toward private cloud, with private cloud getting cheaper for smaller companies and the cloud service providers' economies of scale going away.5
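The server-count claim above can be sanity-checked with simple arithmetic, assuming (as the article does) that the price of a unit of computing performance halves every one to two years, so per-server capacity doubles on the same schedule:

```python
# Projection of how a fixed workload's server count shrinks as per-server
# capacity doubles. The 1-2 year halving cadence is the article's premise.

def servers_needed(today, years, doubling_period_years):
    """Servers required after `years` if per-server capacity doubles each period."""
    doublings = years / doubling_period_years
    return today / (2 ** doublings)

# A 1000-server workload, ten years out, at each end of the article's range:
for period in (1, 2):
    n = servers_needed(1000, years=10, doubling_period_years=period)
    print(f"capacity doubles every {period} yr: ~{n:.1f} servers in 10 years")
```

At a one-year doubling the workload fits on a single server; at two years it fits on a few dozen, in the same ballpark as the article's 1-10 server claim.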
Please post your questions and comments. I will answer them. This has the makings of a lively (non-Socratic) conversation.
4 Java is a registered trademark of Oracle.
About the Author:
Mark Neuhausen is a technology entrepreneur with 30+ years of experience successfully launching new products and services for small-to-large companies. He now applies his knowledge and experience at Hewlett Packard Enterprise as a Chief Technologist and is active in the Seattle start-up community. He was educated in Physics at MIT and holds an MBA from the Wharton School, University of Pennsylvania.