Wednesday, December 21, 2016

The Public Cloud: A Defense

“Much of history revolves around this question: how does one convince millions of people to believe particular stories about gods, or nations, or limited liability companies? Yet when it succeeds, it gives Sapiens immense power, because it enables millions of strangers to cooperate and work towards common goals.”
Yuval Noah Harari. Sapiens: A Brief History of Humankind.

“A conclusion is the place where you got tired of thinking.”
Steven Wright.

“This is the central fallacy of the writer: he or she must absolutely believe something that is not true in order for it to become true.”
Peter Welch. The Writer’s Fallacy.

I seem to have kicked a hornet’s nest with my recent blogs about Enterprise IT and Infrastructure trends. At the heart of this discussion is the architectural inflection point at which we stand today, the public cloud, and its readiness for taking on business critical workloads for the most demanding enterprises – financials, healthcare, government, etc.
I've been getting tons of feedback - both positive and negative, both in person and in writing. I guess I'm glad we are finally starting to have the conversation. It's about time. I guess, sitting in my bubble, I had sort of assumed these decisions had already been made and the rationale for them made obvious. So I was somewhat surprised recently when I met with a group of Enterprise IT practitioners and professionals who pushed back on me quite stridently, convinced that the cloud is not ready for enterprise workloads.
Several of the comments stuck with me, although I felt we didn't have enough time to delve into the details. Hence this blog. I hope to answer some of the points they raised. Most of them are valid concerns and objections that I've heard in the past. But I've never seen them documented in one place. Overall, there was general agreement in the room that the cloud was the future. The only question was, when. They all seemed to think it was still many years (“decades”) away. I happen to think it’s right around the corner, if not already upon us.
One of the comments I heard was: “But they [the big cloud providers] won't indemnify us. What if they have a major outage? It's going to cost us millions of dollars in business.”
I wish I had had the quick wit to retort: “… as opposed to what the corporation is getting from its IT department today for management of on-prem infrastructure?” Last time I checked, the IT department was a cost center. If you have a major outage in your private data center, some small subset of the IT department may get fired or laid off. But, chances are, you will get an even larger budget next year to “fix” the problem. So, where and how exactly does the corporation get “indemnification” from its current IT organization for an outage? They pay the salaries of IT personnel and have no recourse if and when something goes wrong. As such, why would you expect it from a cloud provider? If they agree to that and have an outage, they have to pay thousands of companies using their service. The budgetary impact would be huge to any cloud provider. It doesn't scale. Forget about it. You're never going to get indemnification. You didn't have it until now so why did it suddenly become a requirement?
Another comment I heard was: “we are not like Netflix. If their infrastructure crashes because of an Amazon outage, the worst that happens is you lose your place in the video you were watching. [General laughter around the room] If our infrastructure crashes, it costs the company millions of dollars per hour of outage.”
The implication here is that cloud is good enough for consumer brands like Netflix because there is no critical business data at risk due to an outage. But that obviously wouldn't work for us: “We are the financials, we are the health care providers, we can't afford an outage even for an instant.”
This argument seems to ignore the fact that “consumer” companies like Netflix and Amazon and Facebook are, routinely, serving millions of customers with higher availability and performance than almost any Enterprise company I can think of. Remember that those “Enterprise” companies (let’s say a financial institution or a hospital) typically only have to deliver their “services” to thousands or at most tens of thousands of customers at a time. Not millions. And the architectural solutions currently deployed by those same Enterprise organizations are already bursting at the seams trying to handle that load. In other words, “Don’t knock it till you’ve tried it.” Delivering cloud services to millions of people is, in fact, the best way to make a technology bullet-proof for the enterprise. If you don’t believe me, just look at the history of Gmail.
For every single vertical you care to name, I bet I can name a “cloud native” company that is delivering better quality of service to its “customers” than any traditional Enterprise company. Hands down. And is innovating more quickly and with more scalable backend databases than any on-prem solution already stretched to its architectural extremes through twenty years of contortions - ahem, I mean integrations. Yes, financial and health verticals, too. Not just Netflix. How about Amazon Web Services? How about Amazon as the world’s biggest super store? How about Apple as a financial company? How about Google as an advertising company?
Another comment was: “What if they have a total data center outage?” Any properly architected enterprise application should have a redundancy strategy and a plan for failover and failback in the case of full data center outages. The requirement is on the modern app to architect itself properly. Any properly architected modern enterprise app can withstand the outage of a data center - regardless of whether that data center is managed locally on-prem or in the cloud by a cloud provider. The onus is on the app. If you are still running business critical apps that can't withstand an entire data center outage, you have bigger problems.
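To make the failover point a little more concrete, here is a minimal Python sketch of an application that tries a list of sites in priority order. Everything named here (the site names, the handler functions, the AllSitesDown exception) is hypothetical and invented for illustration. A real deployment would also need health checks, data replication, and failback, none of which are shown.

```python
from typing import Callable, Sequence, Tuple


class AllSitesDown(Exception):
    """Raised when every configured site fails to serve the request."""


def call_with_failover(
    sites: Sequence[Tuple[str, Callable[[], str]]]
) -> Tuple[str, str]:
    """Try each site in priority order and return (site_name, response)."""
    errors = []
    for name, handler in sites:
        try:
            return name, handler()
        except ConnectionError as exc:
            # Remember why this site failed, then fall through to the next one.
            errors.append(f"{name}: {exc}")
    raise AllSitesDown("; ".join(errors))


def primary_site() -> str:
    # Hypothetical handler that simulates a full data center outage.
    raise ConnectionError("data center unreachable")


def secondary_site() -> str:
    # Hypothetical handler for a healthy second site.
    return "ok"


served_by, response = call_with_failover(
    [("primary", primary_site), ("secondary", secondary_site)]
)
print(served_by, response)
```

The calling code does not care whether "primary" is a rack down the hall or a region run by a cloud provider, which is the sense in which the onus is on the app.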
Another comment I heard was: “Every few years, Silicon Valley gets enamored with another new technology or framework. Last year, you industry pundits were telling us about how wonderful OpenStack would be. Look at how far that got us. Why should we believe you now that you are preaching cloud?”
OpenStack is an open source community effort. You, Mr. Enterprise IT Guy or Gal, are signing up for being part of the community as it evolves. With OpenStack, you - again - get to play System Integrator. Because it’s an open system with Swiss Army knife connectors for everything from block storage to image management to networking to security to patching to whatever. Why on Earth did you think that path would lead to success or even converge quickly? The cloud is the opposite of that path. It says, Mr. Enterprise IT Guy, please stop getting in the middle of that level of infrastructure integration. Let us hide that complexity behind an API and an SLA. Go up the stack, young man!
Another, valid, comment was this: “We don't control the budget anyway. The BUs hold the purse strings and they get to make these kinds of big strategic decisions anyway. And they don't have any stomach for big upheavals. They just want the current stuff to keep working.”
Yup. And those are the same BUs who have developers writing cloud native apps right now. Because they've given up on Central IT’s ability to help them in any timely manner. I never said it wouldn't hurt to rip off the bandaid. It will require alignment from the top levels of the organization and you will get pushback from all the BUs: They just want to get their jobs done. They don’t want to deal with infrastructure. Sooner or later, some startup will offer the same service you are offering in your data center (be it block storage or compute or higher level services like database and firewalling and intrusion detection and load balancing). Sooner or later, you will acquire another company with a more progressive cloud based approach to infrastructure delivery, and you will find that some part of your critical infrastructure is already dependent on the public cloud anyway. You can either be a passive and resistant party to this journey or you can take the lead. It's up to you.
“It can be quite expensive. The prices for public cloud based services are still too high.” Yes, of course. They will charge what the market bears. It’s an open economy. I suspect they will continue to drop their prices over the next few years as their platforms mature further. The rate of architectural enhancements made to next generation cloud architectures is an order of magnitude faster than that of on-prem infrastructure hardware and software. You’re welcome to continue to invest in the old generation but, I promise, it’ll be for diminishing returns.
When you compare the costs, be honest and include all the hidden charges that go along with on-prem infrastructure. That’s not just a SAN box you ordered last year. This year, you already have to order the clustered upgrade to improve availability. Next year, you will also have to invest in the management console and the snapshot provider and the backup adaptor as well. Not to mention the Enterprise License Agreement for support, your own operations team that has to learn how to manage it (compared to the other three SAN solutions they’ve inherited over the past decade). And that’s just the SAN box. Of course, you also just finished M&A of a rival which came with its own NAS based strategy and associated hardware and software. I can keep going but you get the idea. Let’s be honest and compare apples and apples.
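One way to keep yourself honest is to write the line items down and add them up over the whole comparison period. The Python sketch below does just that. The item names come from the SAN example above, but every figure is an arbitrary placeholder in made-up units, chosen only so the arithmetic is easy to follow. Substitute your own numbers before drawing any conclusion.

```python
from dataclasses import dataclass


@dataclass
class CostItem:
    name: str
    upfront: float = 0.0   # one-time purchase cost
    per_year: float = 0.0  # recurring cost (support, staff time, and so on)


def total_cost(items, years):
    """Sum one-time and recurring costs over the comparison period."""
    return sum(item.upfront + item.per_year * years for item in items)


# Placeholder figures for illustration only; these are NOT real prices.
on_prem = [
    CostItem("SAN box", upfront=100.0),
    CostItem("clustered upgrade", upfront=40.0),
    CostItem("management console, snapshot provider, backup adaptor", upfront=30.0),
    CostItem("enterprise license agreement for support", per_year=20.0),
    CostItem("operations team time", per_year=50.0),
]

years = 3
sticker_price = total_cost(on_prem[:1], years)  # just the box you ordered
full_cost = total_cost(on_prem, years)          # the box plus everything around it
print(f"sticker price: {sticker_price:.1f}")
print(f"full cost over {years} years: {full_cost:.1f}")
```

The same total_cost function can be run over a list of cloud line items so that both sides of the comparison are built the same way.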

Yes, it’s expensive. But the cost will go down as more and more people and companies adopt it. Ironically, that will also improve its quality. It’s a virtuous cycle that can’t be duplicated in the complexity of on-prem plug-n-play architectures of yesteryear. At the end of the day, my argument is an architectural one. We have learned a lot about distributed systems architecture in the past ten or twenty years. It’s practically impossible to retrofit those learnings into old monolithic architectures like the ones currently running in every enterprise data center in the world.

At the end of the day, I walked away being even more convinced that we will see a massive sea change to the public cloud for enterprise companies over the next few short years - at least the ones that want to survive in the long run.

Thursday, December 15, 2016

The Death (and Re-Birth) of Infrastructure

“One of the insights from our research about commoditization is that whenever it is at work somewhere in a value chain, a reciprocal process of de-commoditization is at work somewhere else in the value chain. … The reciprocality of these processes means that the locus of the ability to differentiate shifts continuously in a value chain as new waves of disruption wash over an industry. As this happens companies that position themselves at a spot in the value chain where performance is not yet good enough will capture the profit.”
Clayton Christensen. The Innovator’s Solution.

“To really understand something is to be liberated from it.”
Dominic Frisby. The Four Horsemen.

“You know, I have one simple request. And that is to have sharks with frickin' laser beams attached to their heads! Now evidently, my cycloptic colleague informs me that that can't be done. Can you remind me what I pay you people for? Honestly, throw me a bone here. What do we have?”
Mike Myers. Austin Powers: International Man of Mystery.

I received a lot of feedback on my recent blog regarding “Public vs. Private Cloud”, in which I argued that private clouds, shrink-wrapped software, and - in general - on-prem infrastructure, are going the way of the Dodo. Most of the feedback was positive but I did also receive some pushback from IT practitioners so I figured I’d add an update and try to address some of the points that have been raised.

There is nothing new or earth shattering in what I wrote. Many experts have been saying the same thing for years. We all seem to agree that we’re moving towards the cloud. Yet, for some reason, enterprise companies continue to invest in, and perpetuate, the old model for infrastructure deployment. With all the hype around cloud adoption, it’s easy to forget that over 90% of all IT spend still goes to traditional on-prem deployments. Inertia continues to be a big factor in Enterprise IT organizations just as incrementalism reigns supreme in the R&D organizations of “old school” system and software providers.

I spent most of my career building operating systems and distributed system software delivered as shrink wrapped software and meant to be deployed on-prem. I'm proud of what we all accomplished as an industry. We've come a long way. But those battles are pretty much over and the industry has moved “up the stack” so it can continue to innovate in new and uncharted territories. Very few companies are starting new processor architectures or building operating systems from scratch. The world standardized on one of two processor types (x86 or ARM), one of two operating systems (Linux or Windows), one of two relational databases (Oracle or SQL Server), and so on. There is no longer any point in arguing that MIPS was a better/cleaner processor architecture. I personally spent a huge chunk of my career on that processor and am proud of the work we did but there are no longer any companies out there building systems based on the MIPS architecture. More importantly, there are no companies offering applications compiled for that instruction set. It’s time to move on.

The same logic should be applied to on-prem infrastructure hardware and software. We need to agree, as an industry, that the Enterprise-IT-owned-and-operated data center will also soon go the way of the Dodo. By the time you take all the variables in the equation into account, the total cost of ownership of any such solution far surpasses any cloud based solution. Here, I’m including all the hidden costs of installing, managing, patching, upgrading, securing, and testing infrastructure hardware and software in support of enterprise application delivery. Perhaps the only factors in favor of on-prem infrastructure are compatibility and familiarity. But at the rate this industry is moving, you will be rethinking that particular application or service in five years anyway, so why worry about compatibility with what you were running five years ago? Continuing to invest in on-prem infrastructure is the equivalent of throwing good money after bad down a bottomless well.

The typical smart Enterprise IT person usually spends a large percentage of his or her time and efforts getting close to purveyors of on-prem hardware and software. As the organization grows, he is constantly pressured into buying more servers, improving security, increasing storage, adding better email archival and compliance tools, adding load balancer appliances and firewalls as he is asked to offer better service availability for employees as well as customers. He is constrained by his budget and strategic decisions by upper management. The easiest answer is to ask for the same budget as last year and keep buying more "stuff". It's the path of least resistance. And because most enterprise applications run on-prem today, it's often easiest to just add to the existing infrastructure rather than completely overhaul the application.

These same smart guys often end up creating a symbiotic relationship not just with the sales teams at those hardware and software vendors but also with their respective R&D organizations. And they pressure these well-meaning R&D organizations into signing up for huge enterprise license agreements "if only" they can get a specific feature added to the product. The PM in charge shakes hands, promises to "look into it" for the next release of the OS or the appliance or the firewall, and creates the appropriate specs to get it into the next release. This is how incremental improvements that only solve the problems of a single organization end up as "requirements" that shape major releases of all platform software.

Meanwhile, back at the ranch, that IT organization has gone through three re-orgs and leadership changes, has changed direction four times, and has laid off or churned most of its staff. Finally, a year or two later, the fateful moment arrives and we deploy the new version of software on all our servers. And, of course, they all promptly crash. The engineers spend all weekend debugging the problem in the customer’s environment and come back with their verdict: “We ran into a specific bug that only manifests itself when you run version x.y of that firmware on the network controller as well as version a.b of the network driver from the vendor and you have to add more than 5000 firewall rules through this API that the customer requested. We had accounted for two out of three variables but had to cut the testing for that particular combination of variables in order to meet our schedules.”

So, basically, your data center is the first time all these pieces of code have come together - and I’ve only described the simplest of scenarios. By this point, the smart Enterprise IT Guy is polishing up his resume and quietly moving across the street to a competitor. The developers who worked on the software are long gone as well so some poor engineer in a maintenance team gets to “fix” the problem - which usually means introducing a hack because he or she doesn’t really understand the intent of the original author.

Multiply this by two dozen hardware and software vendors and you see why the private cloud/local data center story is doomed to failure. The costs associated with the "old" model of computing are often not included in the math when opting for on-prem solutions. Playing System Integrator to dozens of disparate pieces of hardware and software, with every level of the stack owned and operated by people who don’t have access to the code, no longer makes sense.
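The combinatorics are easy to underestimate, so here is a small Python sketch of the firmware/driver/firewall-rule scenario described above. The version labels, the rule counts, and the vendor's "test plan" are all hypothetical, made up purely to show how quickly the untested space grows.

```python
from itertools import product
from math import prod

# Hypothetical version lists; the labels are illustrative only.
firmware_versions = ["fw-1.0", "fw-1.1", "fw-2.0"]
driver_versions = ["drv-a", "drv-b", "drv-c", "drv-d"]
firewall_rule_counts = [100, 1000, 5000]

# Every combination a customer data center could present.
all_configs = set(product(firmware_versions, driver_versions, firewall_rule_counts))

# Assumed test plan: only two firmware/driver pairs, at every rule count.
tested = {
    (fw, drv, rules)
    for (fw, drv) in [("fw-1.0", "drv-a"), ("fw-1.1", "drv-b")]
    for rules in firewall_rule_counts
}

untested = all_configs - tested
print(len(all_configs), len(tested), len(untested))  # 36 6 30

# Two dozen components with just two versions each.
print(prod([2] * 24))  # 16777216
```

Even with only three variables, five out of six configurations in this toy example never see a test lab. A cloud provider running one configuration at scale collapses that matrix to the combinations it actually operates.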

The war is over. Just like we gave up and standardized on one processor architecture and moved up the stack, it's time to admit that there is much better hygiene in the public cloud world than there is in the spaghetti world of shrink wrapped on-prem software. Reducing the combinatorics increases reliability by reducing complexity. Go up the stack, young man! Stop fretting about infrastructure, outsource it all to cloud based services, and move up the stack if you really want to add value to the business. Think at the application level, at the service level, not at the infrastructure level.

The only investment I would make in on-prem software at this point would be to improve utilization of existing infrastructure and applications. If it helps squeeze more out of the existing hardware and software, go for it. Otherwise, stop. Stop buying hardware, stop buying software, stop upgrading (except for security fixes), just stop! Instead, go spend the time to understand the real requirements from the organization on the specific enterprise application. Take the top five requirements, find the cloud vendor that offers them most effectively, and start using it as-a-Service. Don't worry about every random and esoteric feature that your employees currently use. They'll figure out how to do their job some new way. Worry about really nailing the top five requirements. If the rest of your requirements are really important, the software-as-a-service provider will sooner or later offer them - after the rest of the community-at-large has thoroughly tested them. Not as some one-off feature that you get to be the guinea pig for. The sooner we all abandon the "old" model and move up the stack, the better off we all are.

And, if you're in the on-prem infrastructure hardware or software business: Stop listening to your enterprise customers when they ask for bespoke features. You're not helping anyone. Chances are, you will build a feature that doesn't actually do what the customer really had in mind, will divert crucial development and test efforts that are doomed because they are guaranteed not to mimic the customer's kaleidoscopic and bespoke environment, and will end up disappointing the customer in the end.

IT personnel will correctly point out that they are often powerless when it comes to making such major architectural changes. The purse strings are held by lines of business within the corporation; IT only gets to implement what the various Business Units want. Having spent millions of dollars on data centers and related infrastructure, those decision makers are reluctant to abandon the status quo for the promise of the cloud.

The good news here is that developers in those same BUs are already moving to the public cloud in droves. They don't want to be bothered with infrastructure details or delays in hardware procurement, storage allocation, and network reconfiguration required to shoehorn a new application into an existing data center. It's so much easier to pull out your credit card and buy some capacity on a public cloud. Why fill out forms and wait weeks when you can start coding in minutes? It's only when the application is ready for deployment that the IT team is consulted. Given this trend, it will only be a matter of a few years before all new applications are cloud native and the on-prem infrastructure is relegated to the dust bin of history or at best begrudgingly maintained for legacy application support.

My recommendation to IT personnel, in this case, is to avoid adding more load, more users, more applications to the existing on-prem infrastructure. Cap the investment and aggressively move new applications and users to the cloud. Buying more hardware will only increase your depreciation budget over the next few years, thereby further reducing your ability to cut overall Capex. If you need additional capacity during the transition, try renting bare metal servers from public clouds instead of building new data centers or upgrading existing ones with new hardware. This at least gets rid of part of the Capex problem and gives you a chance to validate the cloud provider’s reliability and availability while you wait for the next hardware refresh cycle. This, by the way, is the only way in which a “hybrid cloud” strategy makes sense: the outsourcing of hardware to public cloud providers for “bursting” workloads at peak times instead of amassing vast quantities of hardware on-prem just in case you need it.
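As a rough illustration of the “bursting” idea, the Python sketch below caps on-prem capacity and sends only the overflow to rented capacity. The demand figures and the capacity cap are hypothetical numbers in arbitrary units, and split_load is a name I made up for this example. Real schedulers also have to deal with data locality and startup latency, which are ignored here.

```python
def split_load(demand, on_prem_capacity):
    """Return (served_on_prem, burst_to_cloud) for one time period."""
    served_on_prem = min(demand, on_prem_capacity)
    return served_on_prem, demand - served_on_prem


# Hypothetical demand per period and a capped on-prem capacity.
demand_per_period = [40, 55, 120, 80]
on_prem_capacity = 60

plan = [split_load(d, on_prem_capacity) for d in demand_per_period]
peak_burst = max(burst for _, burst in plan)

print(plan)        # [(40, 0), (55, 0), (60, 60), (60, 20)]
print(peak_burst)  # 60
```

In this toy example, sizing the data center for the peak would mean owning twice the capped capacity, most of which sits idle in the other periods.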

I fully recognize that the enterprise application market has a very long tail. There are still many companies out there benefiting from the IBM mainframe based market. Many others will continue to flourish for the next decade or two (at least) on the on-prem infrastructure hardware and software market. But its days are numbered and we all need to step up and rethink our Enterprise applications in the process. We might as well start with a platform (the cloud) that is twenty years newer and fresher in terms of architecture. As opposed to continuing to spend 80-90% of our budgets on perpetuating the legacy enterprise stacks that were designed and implemented in the eighties and nineties. Here’s the rub, though: To do so will require really sitting down and understanding the top requirements. As opposed to assuming that 100% backwards compatibility trumps all others.

So much has changed over the past two decades. We have learned so much about availability, about reliability, about distributed systems architecture, about telemetry, about analytics and about security. Trying to shoehorn all of those learnings into a dated deployment architecture and a monolithic code base is like wearing a straitjacket and then picking a fight with Mike Tyson. You know it's not going to end well.

A new generation of startups is disrupting every industry on the planet: not just consumer brands but also enterprise brands like Workday and Salesforce and Atlassian. I can’t think of a single new startup that concentrates on on-prem software alone. They may offer an on-prem version of their product but all of their development and testing efforts are geared towards cloud based solutions. Carry that trend forward a couple of years and you will see the end of the traditional model.

The startup community and the VC community have spoken clearly. The former are now running world-wide operations and delivering services to millions of customers while the latter have bet heavily on their eventual success. Some small subset of these startups will become the IBMs, Microsofts, and Oracles of tomorrow - and they will get there with 100% born in the cloud software stacks. In fact, they are already delivering enterprise-class software to thousands of enterprise companies and millions of end users.

Who do you think will be more agile five years from now? The enterprise companies who amassed their own data centers and spent their time being System Integrators for the old guard or the ones who bet on the next generation computing platform - the cloud?

Monday, December 5, 2016

Psychological Disorders as Success Criteria in the Computing Industry

“How can we distinguish what is biologically determined from what people merely try to justify through biological myths? A good rule of thumb is ‘Biology enables, Culture forbids.’ Biology is willing to tolerate a very wide spectrum of possibilities. It’s culture that obliges people to realize some possibilities while forbidding others. Biology enables women to have children – some cultures oblige women to realize this possibility. Biology enables men to enjoy sex with one another – some cultures forbid them to realize this possibility. Culture tends to argue that it forbids only that which is unnatural. But from a biological perspective, nothing is unnatural. Whatever is possible is by definition also natural. A truly unnatural behavior, one that goes against the laws of nature, simply cannot exist.”
Yuval Noah Harari. Sapiens: A Brief History of Humankind.

“If it's worth doing, it's worth overdoing.”
Ayn Rand. 1905-1982.

“Only the paranoid survive.”
Andy Grove. 1936-2016.

“There was something very slightly odd about him, but it was hard to say what it was.”
Douglas Adams. The Hitchhiker’s Guide to the Galaxy.
I have blogged often about my struggles with a mild case of Obsessive Compulsive Disorder. In hindsight, I find I have spent my life obsessing mostly about socially acceptable pursuits: Work. Music. Sports. Books. I have tons of bad habits, too. But, thankfully, I don't obsess about washing my hands or bouncing quarters off the bed to make sure the covers are tucked in perfectly (I lived for a short while with a roommate who did. It was an eye opening experience!) I'd like to think I appear mostly normal to others.

Over the past thirty five years, I've worked with thousands of smart and successful people in the computing industry. Now that I look back on it, I am amazed by what a large percentage of them exhibited symptoms of one or another so-called psychological “disorder”, myself being the poster boy. When I talk about psychological disorders here, I'm referring to those listed in DSM-5: Diagnostic and Statistical Manual of Mental Disorders, the authoritative book on the topic according to the American Psychiatric Association. I’m intentionally using quotes for the word “disorder” because I plan to argue that these once abnormal tendencies are now commonplace, at least in certain professions, and should no longer be viewed as abnormal nor should they be stigmatized. The question is not whether many of us suffer from such traits but whether we are able to function as normal and successful members of society despite them.

One obvious example of such behavior is an addiction to and an obsession for extreme sports. I claim, based mostly on anecdotal data, that a statistically aberrant percentage of successful people in the computer industry obsess over sports of one kind or another. I don't mean running a 5K or going for a weekend hike with the dog. I mean ultra-marathoners who routinely run a hundred miles or more, IronMan triathletes, bikers who do century rides every weekend, mountain climbers who train to climb Mt. Rainier, you name it. Several articles have recently been written on this topic, highlighting surges in kiteboarding, skydiving, sports car racing, mountaineering, and other similar extreme endeavours among Silicon Valley entrepreneurs. These are all extremely busy people - most of them working sixty, eighty, a hundred hours a week. Yet, they also somehow find the time to spend 24 hours running non-stop up a mountain or to bike two hundred miles from Seattle to Portland “for fun” in a single day!

This is not normal. You cannot compete in an IronMan triathlon unless you obsess over your training. You cannot run a hundred miles in a single day unless you run the equivalent of a marathon (and more) every weekend. That takes time, it takes commitment, and it takes obsession. Yes, of course, I see the logic. These are competitive people and they find avenues, outside of work, in which to push themselves. Or, perhaps I'm confusing cause and effect. Perhaps, it is precisely because they have obsessive personality traits that they are successful in this business. You need to obsess about work in order to succeed in this competitive environment. Either way, I claim we are seeing an ever increasing incidence of obsessive tendencies in the computing industry.

My own obsession with running meant I spent over twenty years doing permanent damage to my hips, knees, ankles, spine, etc. Even a lower back surgery a dozen years ago did not stop me. Marathons were ultimately too harsh on my body so I settled down to a regular cadence of half marathons - at least one or two a month for a dozen years or more. I liked this distance not because it was less time consuming but because I could do it more frequently, causing even more damage! I ran through the pain, I ran against the advice of doctors, I ran until I couldn’t run any more. It was only after the doctors gave up on my left ankle that I finally switched to biking, a sport that is much easier on your body but also requires two to three times as much time in the saddle for an equivalent workout. No worries. In order to compensate, I will bike four hours a day and I'll go up the mountain for ten miles straight!

That is not normal. We have all grown accustomed to this kind of story and see it as normal but it requires obsessive tendencies to compete at this level. I’m not fast enough or strong enough to compete at a professional level. The only person I’m competing with is myself. Note that I’m not painting a negative picture here. Not every obsession is a bad thing. I’m happy that some of us have found relatively positive avenues for our obsessive tendencies.

Studies have also shown that autism is linked to mathematical talent and that college students opting for STEM (Science, Technology, Engineering, and Mathematics) degrees have a higher than normal incidence of autism in their families. And one of the symptoms of autism is an “intense interest in a limited number of things” - in other words, obsessive behavior. The prevalence of mild autism (Asperger’s Syndrome) has been documented widely in the industry with well known examples such as Bill Gates. He is one of the most intelligent men I've ever met, but, according to the book, he “suffers” from a psychological disorder.

OCD and Asperger’s are only two psychological “disorders” that I have witnessed in the industry. In many cases, people with these conditions end up with successful careers despite their “disabilities”. Clinical depression is another common, but often hidden, problem. Suicide is rare but not unheard of. More common are addictions to alcohol and drugs, trends that are sometimes tacitly allowed by high tech companies.

“According to the Substance Abuse and Mental Health Services Administration, computer and data processing workers have the highest incidence of heavy alcohol consumption. Nearly 16.2 out of every 100 workers admit to engaging in heavy alcohol use.”
Jefferson Hane Weaver. What are the Odds?: The Chances of Extraordinary Events in Everyday Life.

Addictions to computer games are less stigmatized and, arguably, even more prevalent in the industry. Of course, every organization also has its share of ADHD, megalomaniacs, paranoids, narcissists, and other “abnormal” personality traits. It is a testament to human ingenuity that we all manage to behave as normal productive team members and leaders even while grappling with these “psychological disorders”.

Which brings me to my point: I wonder if we are just a microcosm of society at large - in other words, that these so called “psychological disorders” are now more commonplace than in the recent past, in which case, perhaps we should stop calling them “disorders” and “diseases”. After all, do you know anyone who doesn't obsess over their Facebook feed, their Twitter feed, their Instagram feed, or their Slack channels, checking his or her smartphone every two minutes? If everyone is obsessed about something all the time, perhaps this is really the new normal. A few short years ago, relatively few of us obsessed over our electronic friends - the computers. Fast forward twenty years and we seem to have addicted the entire planet.

Or, perhaps, as data seems to indicate, there really are a disproportionate number of those personality “disorders” present in the computing industry. Are we perhaps just the vanguard of society at large in this respect? The canary in the coal mine, so to speak? I don't know the answer. What I do know is that our DNA hasn't evolved fast enough over the past few dozen years to accommodate such drastic changes in social demographics. Which means, these “disorders” may be influenced by genetics but are more often learned behaviors reinforced by the environment and activities we choose to pursue. If you don't believe me, just look at how quickly the general public (and especially the next generation) has taken to their digital addictions, something that didn't even exist twenty years ago. Either we have an epidemic on our hands or this is mostly learned behavior, rewarded and reinforced by our online actions.

“One of history’s few iron laws is that luxuries tend to become necessities and to spawn new obligations…. How many young college graduates have taken demanding jobs in high-powered firms, vowing that they will work hard to earn money that will enable them to retire and pursue their real interests when they are thirty-five? But by the time they reach that age, they have large mortgages, children to school, houses in the suburbs that necessitate at least two cars per family, and a sense that life is not worth living without really good wine and expensive holidays abroad. What are they supposed to do, go back to digging up roots? No, they double their efforts and keep slaving away.”
Yuval Noah Harari. Sapiens: A Brief History of Humankind.

The more competitive we become in the workplace, the more money we make, the more often we get promoted, the more addicted we become to work. It’s a virtuous cycle for a few and a vicious one for many. The same is true in our social activities and personal lives. The more pieces of gold we collect in a video game, the more likely we are to return and try again - even if those pieces of gold are not real. The more frequently we check our social media feeds, the more frequently we are rewarded by updates, the more likely we are to come back for more. One of the costs of such a life is a tendency to obsessive behavior. Either everyone has suddenly developed a genetic predisposition to OCD or we see a huge uptick because the environment rewards and reinforces obsessive behavior. As we build ever more immersive (read: addictive) experiences in the digital world, we should be careful about the impact those experiences have on our psyches.

I know this is an oversimplification but I like that answer. I like believing that such traits are about learned behaviors as much as, if not more than, about genetics. Learned behaviors can be unlearned. The environment can be changed.

Saturday, November 26, 2016

Public Cloud or Private, that is the Question!

"The reason that God was able to create the world in seven days is that he didn't have to worry about the installed base."
Enzo Torresi. 1945-2016.

“I consider the bicycle to be the most dangerous thing to life and property ever invented. The gentlest of horses are afraid of it.”
Samuel G. Hough, General Manager of the Monarch Line of steam ships. July 14, 1881.

“Where a calculator on the ENIAC is equipped with 18,000 vacuum tubes and weighs 30 tons, computers in the future may have only 1,000 vacuum tubes and perhaps weigh 1.5 tons.”
Popular Mechanics. March, 1949.

A few days ago I published a blog offering career advice based on some questions I’ve been getting from former colleagues. Interestingly, the part that garnered the most comments and feedback was about public vs. private clouds and shrink wrapped on-prem software vs. cloud based services:

Run, don't walk, away from any company recruiting you to work on shrink wrapped on-prem software. The days of the private cloud are numbered as are the days of legacy shrink wrap software that needs to be installed, maintained, and managed by an IT department. All of those companies are busy trying to figure out how to move their software to the public cloud.

If you are working for a hardware company, I feel for you. As the industry figures out how to offer anything-and-everything-as-a-service and as hardware becomes more and more commoditized, it’s no wonder that hardware companies are struggling - even the ones who are trying to “add value” with their “differentiated software stacks”. Proprietary software tied to a specific vendor’s hardware is a recipe for obsolescence. No, thank you.

The last company I worked for, Cloudflare, offers DDoS-protection-as-a-service, Firewall-as-a-service, DNS-as-a-service, Load-Balancing-as-a-service, Failover-as-a-service, Caching-as-a-service, you name it. Just take out your credit card, go to their website, and you can sign up for any of those services in just a few minutes. No need to buy any hardware or maintain a random collection of servers and routers and switches and disk arrays in your own data center, no need to hire an IT staff to manage the aforementioned infrastructure. Oh wait. Did I say take out your credit card? Never mind. They offer most of these services for free. Why would I want to run my own data center full of servers and stale software, again?

"Private cloud" is just a euphemism for “IT department unwilling to let go of the past”. Even companies in heavily regulated industries are busy trying to figure out how to get out of running their own data centers. The public cloud is the future, no question in my mind. And the only way to win there is to be “cloud native”.

I was asked to expand on my comments and justify my rationale, hence this update, seen primarily from the enterprise customer’s point of view.

I believe the companies that will thrive in the next generation of computing, in the cloud era, are not the ones that will sell you hardware, then sell you the operating system to run on that hardware, then sell you the app to run on that OS, then sell you the backend database needed to scale that app, then sell you the management solutions to manage that app, then sell you the backup solution that integrates “deeply” with that app, then sell you the identity solution that also integrates “deeply” with that app as well as the management solution, then sell you the load balancers and firewall to put in front of that app, then sell you … I can keep going but I think you get the idea.

Don't you see that the minute you buy a piece of hardware and put it in your data center, you start building a bespoke stack and that nothing you do will ever solve that problem nor make it cheaper or easier to maintain? Every step you take from that moment on only adds to the complexity of managing that piece of hardware sitting in your data center.

Similarly, when you buy a piece of shrink-wrapped software - any software, be it an operating system or a firewall appliance or an HR application or an Oracle database - you are not buying just that piece of software. You are buying into an ecosystem that will, sooner or later, become a ball and chain forcing you to continue to invest in it.

Worse yet, that software running in your data center is guaranteed to be stale the minute you deploy it. Much like a car that loses a huge chunk of its value as soon as you drive it off the dealer lot, shrink wrapped software, on-prem software, by definition starts rotting the minute you start using it. The problem is that it takes us (let’s be generous and say) three to five years to design and build the various components needed to get your bespoke solution working and it takes you at least a year to do the integration and qualification testing needed to get the end to end solution working in your environment.

And during those three to five years, the rest of the industry has moved forward by leaps and bounds with respect to best practices in reliability, availability, security, and maintainability - just to name a few “abilities”. To make it through the sausage factory, a feature, any “enterprise” feature, must be designed, implemented, and tested - not just by one engineering team or even by just one company but by an entire slew of companies. Worse yet, by the time you’ve run the gauntlet and gotten all the various pieces of hardware and software working together, it’s already time to install patches and service packs and agents and plugins and connectors and start the next round of upgrades. In essence, you sign up to be the system integrator and you have to keep the beast fed, all in order to pull together a solution that is unique to your company when viewed end to end.

You don't believe me about “bespoke stacks”? Try comparing your Exchange Server implementation with that of your competitor across the street. He has Exchange 2003 SP3 running on Dell servers with EMC Symmetrix and Windows Server 2008 SP2, integrated with Active Directory for identity and Digital Rights Management add-on for better security. He is running it on vSphere 6.0 and using OpenStack for management. His backup solution is Symantec NetBackup 6.3 and he is using a four node Cluster. He is also using F5 for load balancing and Cisco ASA for Firewall. Except for that one division that came through an acquisition. They have a different version of Exchange running on HP servers with NetApp filers and HP OpenView...

Your setup, let’s just say, is slightly different.

Is it any wonder none of those vendors can replicate your problem when you call them at 3:00 AM on a Saturday because you can't restore from a daily backup or because your perimeter has been breached and your emails have shown up on Wikileaks? These are just half a dozen variables in the hardware and software soup (dare I say cesspool?) that is running in your “private cloud enabled data center”. There are hundreds, if not thousands, of such variables in each data center running shrink wrapped on-prem “Enterprise class” software. Each of those components has gone through rigorous testing but I can pretty much guarantee they have never been put together in exactly the combination that you have chosen. Every time you change a single variable, you double the complexity and the testing matrix for the companies involved.
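To put rough numbers on that testing matrix, here is a minimal back-of-the-envelope sketch in Python. The layer names echo the Exchange example above, but the option counts are made up purely for illustration and the function names are my own. It only shows how quickly the number of distinct end to end stacks multiplies.

```python
from math import prod

# Hypothetical number of supported versions/vendors per layer.
# These counts are illustrative assumptions, not vendor data.
options_per_layer = {
    "server hardware": 4,
    "storage": 3,
    "operating system": 5,
    "hypervisor": 3,
    "mail server": 4,
    "backup": 3,
    "load balancer": 2,
    "firewall": 2,
}


def stack_combinations(layers):
    """Distinct end to end stacks: one choice per layer, multiplied together."""
    return prod(layers.values())


def on_off_combinations(n_variables):
    """Each extra on/off variable (patch applied or not, agent installed
    or not) doubles the number of configurations."""
    return 2 ** n_variables


if __name__ == "__main__":
    print("distinct stacks:", stack_combinations(options_per_layer))
    for n in (6, 10, 20):
        print(n, "on/off variables ->", on_off_combinations(n), "configurations")
```

Even with only eight layers and a handful of choices per layer, the count runs into the thousands, and every additional on/off variable doubles it again. No vendor test lab covers that matrix.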

Every single enterprise company out there is running a bespoke software stack - nay, a dozen bespoke software stacks - in their data centers, one for each enterprise application. The example above just covered email. Add to that HR and CRM and Finance and a dozen other stacks. And every one of those stacks, I claim, offers less reliability, availability, security, performance, manageability, and worse overall TCO than any of the current generation of public clouds and comparable SaaS solutions.

Here's a simple analogy: If I ask you to take me from point A to point B, would you take out your phone and call for an Uber or would you start ordering vehicle parts so you can assemble a car to fulfill the request? Even if you choose to take the latter route, I bet you wouldn’t order parts from a dozen different car companies. Then why are you doing that when you want to run the most critical business apps, the ones that your company depends on? On-prem shrink wrapped software is evil. Private clouds solve only a part of the problem but don’t address the fundamental issues I’ve described above. Hybrid clouds? Those don't even really exist.

What I've said here is probably obvious to most industry pundits. What amazes me is that so many multi-billion dollar companies continue to employ thousands of engineers and build on top of the same ancient delivery model. Once they’ve sold you all the bits and pieces, they also promise that they will “hide” the complexity by giving you “universal management tools” and “one pane of glass visibility”.

At the end of the day, every piece of software you install and maintain in that ecosystem requires plugins and patches and “agents” and “connectors” in order to integrate with the other pieces of software. The complexity increases exponentially every time you add one more variable, all the way from hardware to operating system to applications to storage system to firewall to identity system to backup system to management solution to … you name it. And you are the system integrator. I guarantee no one else in the world is running the same exact mix of hardware and software that you are.

If you can find a single enterprise company IT department that offers the same levels of availability, reliability, security, and performance as the public cloud, I would urge you to short their stock. Because they are obviously spending way too much on their IT budget instead of their core business. The reality is that most Enterprise IT organizations make compromises based on budget constraints, time constraints, political constraints, lack of information, and often even the whims of their personnel. “I absolutely hate Microsoft and refuse to buy any of their software.” These same IT organizations also change their requirements every once in a while - as disruptive industry trends catch up and their associated costs drop, as they go through acquisitions or mergers, as new CIOs come and go, as solution providers go out of business, and for a dozen other reasons. So you end up with spaghetti in the data center. You end up with a dozen miscellaneous unpatched operating systems on “appliances” because, of course as we all know, “appliances don’t count because I don't have to worry about the OS.” You end up with a dozen competing management solutions that promise to make your life easier but, in fact, often only add to expenses without delivering sufficient ROI.

At best, you end up building a system that works well during normal operations but falls apart as soon as any single component hits a problem. Any such deployment doesn't just have a Single Point of Failure. It has many Single Points of Failure. Compare that to the current generation best of breed public clouds that are designed from the ground up for redundancy, designed for availability, designed for maintainability. Designed for Failure. Remember that these are enterprise application deployments we are talking about - ones that your business depends on. Which environment would you rather depend on?
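The arithmetic behind “many Single Points of Failure” is simple enough to sketch. The snippet below assumes, purely for illustration, ten components that are each 99.9% available and that fail independently of one another. Real failures are often correlated, so treat this as a rough model rather than a measurement of any actual data center or cloud.

```python
from math import prod


def serial_availability(component_availabilities):
    """Availability of a chain in which every component must be up."""
    return prod(component_availabilities)


def redundant_availability(availability, copies):
    """Availability of a group that stays up if at least one copy is up
    (assumes the copies fail independently)."""
    return 1 - (1 - availability) ** copies


# Hypothetical stack: ten components, each 99.9% available.
chain = [0.999] * 10

# Every component is a single point of failure.
single_path = serial_availability(chain)

# Same chain, but every component is deployed as a redundant pair.
redundant_path = serial_availability(
    [redundant_availability(a, 2) for a in chain]
)

HOURS_PER_YEAR = 24 * 365

if __name__ == "__main__":
    for label, a in (("no redundancy", single_path),
                     ("redundant pairs", redundant_path)):
        downtime = (1 - a) * HOURS_PER_YEAR
        print(f"{label}: {a:.5%} available, ~{downtime:.2f} hours down per year")
```

Under these assumptions, chaining components without redundancy makes the whole stack less available than any single part, while pairing each component pushes the end to end number well above what any one component offers on its own.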

Hosted solutions are a step in the right direction as they remove several variables from the on-prem equation. The right long term solution is to re-architect all these applications for the cloud; to make them “cloud native”, not to try to use a forklift to move the legacy monolithic applications to the cloud because your IT department is “comfortable with the current tools”. It’s hard to let go of legacy but I argue it’s always better to understand the core requirements for that application, for that workload, and find the closest commercially available SaaS solution on the market. I don't even have to do the math to show you that such an answer is always the right one in the long run based not just on CapEx and OpEx savings but also in terms of overall service availability and reduced attack surface. But the IT department is usually the last one to tell you that. Their jobs are not best served by that answer. Nor are the hardware vendors. Nor are the database companies. Nor are the operating system companies. Nor are the application providers. Nor are the management solution providers.

The promise of the cloud is obvious. Utility computing. Simplicity. Fewer variables. We standardize on one piece of hardware, one operating system, one set of management tools. And, for all intents and purposes, we will always run the latest version of software. And we will offer you an SLA - which means we have to constantly monitor service levels, something your IT department is probably not doing. And we will do immediate postmortems and Root Cause Analysis in the case of service failure and share the findings with the public. In such a world, the fewer variables the better. Choice is the enemy of simplicity and reliability.

Seems obvious. Yet, a lot of people are still hanging on to the old delivery model. Every excuse is used to perpetuate the old world: compatibility, regulatory compliance, training costs, budgetary constraints, etc. That model (bespoke stacks running in your data center) made sense ten or twenty years ago when we didn't have public clouds offering utility computing, when high speed connectivity didn't exist, when we didn't have such demanding service availability requirements. It makes no sense in the new world.

One other thing that these companies seem to miss is that customers usually make decisions based on applications, not on infrastructure. Updating and re-architecting an enterprise application (email or HR or finance, for example) is an arduous multi-year journey. Enterprise companies make these decisions one application at a time and they do so infrequently - for good reasons. If I'm looking to upgrade my aging Exchange email infrastructure, I want to look into all the new architectures that have come along since Exchange was architected over twenty years ago. Wouldn't it make a lot more sense and be a lot cheaper in the long run to switch to Gmail, for example, than to some Frankenstein Exchange solution virtualized to run in a VM so we can continue to run Exchange 2003 SP9 and back it up with Backup Exec 3.2.5b?

The right answer is not to perpetuate the old model but rather to cap investment in existing on-prem solutions and to implement tools that help squeeze every ounce of value from your sunk cost in the existing on-prem infrastructure while re-architecting those applications for the public cloud if they are core to your business or outsourcing them to SaaS providers if they are not.

Seen through this lens, a whole slew of companies are doomed in the long term - unless they reinvent themselves. It is just as hard for these companies to do so as it is for their enterprise customers to move off existing infrastructure and solutions. Too much inertia, too many engineers and executives who are happy making incremental improvements to existing products rather than rethinking their value prop and business model. Very few companies have been able to successfully maneuver through such a transition in the past. Adobe seems to be a good recent example - a company that went from delivering shrink wrapped software running on your home PC to a cloud based service without losing a large percentage of its customer base in the transition to the new architecture. It was able to do so because it drew a clear line in the sand and stopped supporting its “legacy” install base pretty quickly. This is an extremely hard thing for a company to do - walking away from its install base, its cash cow, its revenue stream - and committing itself to a disruptive new business model. The classic Innovator's Dilemma.

As I was trying to say earlier: the companies that will survive in this new generation will be the ones that help you re-architect your application so it fits the cloud world better, not ones that keep selling you solutions for the old world. The journey to the cloud needs to happen application by application, not company by company. The trick, of course, is to avoid making the same mistakes again in the cloud, swapping one bespoke brittle proprietary stack for another hidden behind an API. That, I’m afraid, will have to be a topic for another day.