Friday, July 18, 2008

Citrix Management Land Grab - Project Kensho

In an effort to secure the management technology high ground as hypervisors proliferate and become more of a ubiquitous commodity than a premium point product, Citrix has announced Project Kensho. The strategy to enable portability and scalability across heterogeneous hypervisors is so obvious and correct that my only question after reading the release was “What is the provenance of the name 'Kensho'?”

Given Citrix' headquarters deep in Florida, my first speculation was that it was a phonetically correct implementation of an affirmative response with a southern accent:

“I ken sho' yew zackly how dis new technology iz gonna be betta dan anything yew eva saw in yo' life.”

It seemed reasonable at first, given my own tendency to display a southern flair. But given the likelihood of a strong influence by Simon Crosby, an Englishman who is the Citrix CTO for all things virtualization, perhaps the name is not based at all upon the southern heritage of Citrix. I sent Simon a note asking about the provenance of the name, and he replied:

“Kensho is a Zen Buddhist term (pun on Xen) for enlightenment experiences . . . . Now go deep into your Zen mind and figure those out!”

With the mystery of the name solved, let's make some commentary on the obvious. I have no doubt that the future of application release and lifecycle management is going to be based upon virtual machine images. By releasing and managing applications as virtual machines, it is possible to define the application independent from the infrastructure upon which it runs. In so doing, applications can be deployed and scaled on-demand - on any virtualized cloud of machines - without complex, costly, ad hoc setup procedures. More importantly, you can de-scale on one infrastructure and re-scale on different infrastructure as demand fluctuates because the setup and configuration information is not unique to the infrastructure. Unlike VMware's VADK technology which is unique to the VMware hypervisor, Project Kensho is aiming higher by embracing this obvious, management focused, hypervisor independent architecture for cloud computing.

Well, I say “Welcome to the party, Citrix!” The more voices we have proclaiming the benefits of this new architecture for cloud computing, the better. I have spoken with no fewer than 12 CTOs and CIOs over the last 3 weeks who have proclaimed to me the importance of multi-hypervisor support for any application release and lifecycle management system based upon virtual machines. The ability to scale, de-scale, and re-scale seamlessly and repeatably across multiple infrastructure targets is critical if cloud computing is to move from promising hype to bankable reality. Kudos to Citrix for moving the ball down the field on this critical goal with Project Kensho.

Thursday, July 10, 2008

Thank you, Diane Greene

“Hello, this is Diane Greene.” Such was my introduction to Diane back in 1998 when she joined a conference call with me and Matthew Szulik. I had just reviewed the VMware technology with one of VMware's business development managers, Reza Malekzadeh, as part of a partnership opportunity between Red Hat and VMware. Red Hat, although still a very small company with only 70 employees and about $12M in revenue, was a hot target for alliances, and VMware wanted us to distribute their product with our Red Hat Linux product as part of the “extras” CD. Our engineers thought the technology was “very cool,” so shipping it as part of the CD made sense because it would create more demand for our product. I also thought it was cool, but at the time I was very skeptical of the business model.

Reza had shown me a diagram of the different permutations of how someone might use VMware. He indicated that it would be used immediately as a host environment atop an existing OS (such as Windows or Linux) to enable developers to rapidly develop and test for many platforms atop their workstations. But, he indicated to me that the big vision was for VMware to be the bottom layer, right against the hardware, with multiple other OS implementations running as guests atop that layer. My response: “I don't understand why anyone would ever want to do that.” Now we understand why Diane got SDForum's visionary award a few weeks back and Billy Marshall was lucky to be on the guest list.

Not one to be left behind, it only took me 6 years to determine that this new approach indeed represents one of the biggest opportunities to improve the efficiency and capability of information technology. As the hypervisor replaces the general purpose OS as the layer that exposes the hardware, the applications that ride atop that layer become much more portable and the datacenter resources become much more efficient. As Diane leaves VMware to explore her next opportunity, I owe her a big debt of gratitude for shining so much bright light on this revolutionary approach to computing. Thank you, Diane Greene, for giving all of us that play in this market an opportunity to do something wonderful for our customers.

Monday, July 07, 2008

Shut Down the Datacenter

Or at least power down significant pieces of it during periods of low demand. This message always draws funny looks from IT types when I suggest a seemingly simple answer to the problem of extreme costs for datacenter resources. I push on:

Billy – If utilization is around 20 – 30%, aren't there periods of time when you could just shut down about 50% of the systems? Or at least 25%?

IT – We can't just shut the systems down. . .

Billy – Why not? You aren't using them.

IT – You don't understand.

Billy – What am I missing?

IT – Well, it just doesn't work that way.

Billy – How does it work?

IT – It takes a long time to lay the application down atop a production server.

Billy – Why?

IT – Set up is complicated. Laying down the application and bringing it online takes at least several days, and typically 2 to 4 weeks.

Billy – So part of the application definition is described by the physical system it runs on?

IT – Yes, that's right. If I shut down the physical system, I lose part of the definition and configuration of the application.

And therein lies the culprit. The “last mile” of application release engineering and deployment is a black art. Applications become tightly coupled to the physical hosts upon which they are deployed, and the physical hosts cannot be powered down without losing the definition of a stable application. Bringing the application back up is expensive due to the high costs of expert administration resources, and it is fraught with peril because the process is not repeatable. Enterprises are spending billions of dollars on datacenter operating costs because the risk of bringing applications back on-line is not worth the savings of taking them off-line.
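For a rough sense of the opening question, here is a back-of-the-envelope sketch in Python. The fleet size of 100 servers and the 50% target utilization are illustrative assumptions, not figures from any real datacenter; only the 20 to 30% utilization range comes from the conversation above.

```python
import math


def servers_needed(total_servers: int, avg_utilization: float,
                   target_utilization: float) -> int:
    """Servers required to carry the current load at a target utilization.

    Load is measured in "fully busy server" units, so 100 servers at 25%
    average utilization represent 25 units of work.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    load = total_servers * avg_utilization
    return math.ceil(load / target_utilization)


# Hypothetical fleet: 100 servers averaging 25% utilization,
# consolidated so the remaining servers run at 50%.
TOTAL = 100
needed = servers_needed(TOTAL, avg_utilization=0.25, target_utilization=0.50)
can_power_down = TOTAL - needed

print(f"Keep {needed} servers running, power down {can_power_down}")
```

The calculation only says how many machines are idle on paper. It says nothing about how long it takes to bring an application back, which is the real obstacle described above.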

Of course I blame most of this mess on the faulty architecture of the One Size Fits All General Purpose Operating System (OSFAGPOS). OSFAGPOS is typically deployed in unison with the physical hosts because OSFAGPOS provides the drivers that enable the applications to access the hardware resources. To get an application to run correctly on OSFAGPOS, the system administrators then need to “fiddle with it” to adjust it to the needs of any given application. This “fiddling” is where things run amok. It's hard to document “fiddling,” and it is therefore difficult to repeat “fiddling.” The “fiddle” period can last for up to 30 days, depending on the complexity of the “fiddling” required.

So how do we get away from all of this “fiddling” around, and deploy an architecture that allows the datacenter to scale up and down based on actual demand? Start with a bare metal hypervisor as the layer that provides access to the hardware. Then extend release engineering discipline to include the OS by releasing applications as virtual machines with Just Enough OS (JeOS or “juice”) in lieu of OSFAGPOS, complete with all of the “metadata” required to access the appropriate resources (memory, CPU, data, network, authentication services, etc.). By decoupling the definition of the application from the physical hosts, a world of flexibility becomes possible for datacenter resources. Starting up applications becomes fast, cheap, and reliable. As an added bonus, embracing cloud capacity such as that provided by Amazon's EC2 becomes a reality. Instead of standing up application capacity in-house, certain peak demand workloads can be deployed “on-demand” with a variable cost model (in the case of Amazon it starts at about $.10/CPU/hr).
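To make the variable cost model concrete, here is a small Python sketch. The $0.10 per CPU-hour rate is the approximate figure quoted above; the workload of 20 CPUs for 6 hours a day is purely hypothetical.

```python
RATE_PER_CPU_HOUR = 0.10  # dollars; approximate EC2 starting price quoted above


def on_demand_cost(cpus: int, hours: float,
                   rate: float = RATE_PER_CPU_HOUR) -> float:
    """Variable cost of renting `cpus` CPUs for `hours` hours each."""
    return cpus * hours * rate


# Hypothetical peak workload: 20 CPUs, 6 hours a day, for a 30-day month.
PEAK_CPUS = 20
peak_only = on_demand_cost(PEAK_CPUS, hours=6 * 30)
always_on = on_demand_cost(PEAK_CPUS, hours=24 * 30)

print(f"Peak hours only: ${peak_only:,.2f} per month")
print(f"Running 24x7:    ${always_on:,.2f} per month")
```

The second figure is simply the same rate applied around the clock, not an estimate of in-house cost. The point is that the bill tracks the hours actually used, which only helps if the application can be started and stopped cheaply.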

With oil trading at around $140 per barrel, the cost of allowing datacenter resources to “idle” during slow demand periods is becoming a real burden. “Fiddling around” with applications to get them deployed on OSFAGPOS is no longer just good clean fun for system administrators. It is serious money.


Monday, June 23, 2008

Red Hat oVirt(ly) targets VMware and Citrix

Red Hat announced last week that they are developing technology that will one day become a product to challenge the offerings from VMware and Citrix. The technology is showcased at a website sponsored and maintained by Red Hat as part of an “emerging technology” initiative. Aside from the fact that Red Hat is announcing technology projects and not products, the most noteworthy detail of this approach is the emergence of KVM (kernel-based virtual machine) as the hypervisor approach that Red Hat intends to back. Taking a page from the Microsoft Hyper-V playbook, Red Hat is claiming virtualization as a “feature” of the OS they already control in order to maintain their investment in the server distribution channel they have established with OEMs such as Dell, HP, IBM, Fujitsu, and others. Bare metal hypervisors and their associated infrastructure management frameworks are threatening the entrenched status quo of the operating system vendors, and Red Hat does not intend to “go gentle into that good night.”

Historically, a general purpose operating system performed two key functions: 1) provide drivers so that applications can access the hardware, and 2) provide system services (file system, libraries, etc.) to the applications. The trouble with this approach is that applications become artificially coupled to the hardware. Have you ever experienced pain upgrading hardware when the application that ran just fine on the old hardware no longer runs on the new hardware because of changes in the operating system? Do you find that you are continuously porting and testing applications just to upgrade hardware? Why? Because a one-size-fits-all general purpose operating system (OSFAGPOS) couples hardware support with application support due to the architecture of the product. Aside from the application portability issues, the OSFAGPOS approach also leads to the enormous management cost associated with bloating. Most applications run atop an operating system in the datacenter that is 10X the size actually required by the application, which leads to a patching nightmare.

To be fair to Red Hat, KVM could certainly be implemented and maintained by them as a skinny bare-metal hypervisor that only concerns itself with managing the hardware infrastructure. The Linux kernel is a fine provider of hardware support, and Red Hat has more Linux kernel expertise than any other vendor in the world. Their technical credibility in this space is terrific. The trick will be the marketing challenge of offering a product that has a very different value proposition than the OSFAGPOS. The goal of a bare-metal hypervisor is to support as many application OS variants as possible in order to enable application providers to choose the system software that works best for their application. This flexibility for the application flies in the face of the OSFAGPOS argument to “standardize” the OS to gain economies of scale in management and support. With OSFAGPOS, at least you know what systems need to be patched – all of them . . . all the time. Embracing a bare-metal product approach would mean that Red Hat also needs to face up to OSFAGPOS management challenges and embrace virtual appliance value for applications that run with Just enough OS (JeOS or "juice") optimized for their workload.

I believe that the separation of the duties of the historic OSFAGPOS architecture into two separate product domains - hypervisors for drivers and JeOS for application support - is inevitable. Separating hardware support from application support simply provides too many benefits. The popularity of both VMware and Amazon's emerging EC2 service is a great example. Watching both Red Hat and Microsoft navigate this change will be interesting. The marketing gurus at both companies will be strained to the breaking point as they “rage, rage, against the dying of the light.”


Wednesday, April 30, 2008

When "Agile" becomes "Fragile"

Last week one of my board members, Andrew Nash, shared a conversation with the CTO of one of the ten biggest software companies in the world. The subject of the conversation was agile development. The CTO complained to Andrew that agile development is worthless if you cannot extend the stream of innovation to customers in a timely manner. What does it matter if the application developers deliver code to release engineering and QA on a monthly basis if the subsequent delivery and consumption by the customer takes between 6 months and a year? The valuable concepts of rapid feedback and minimal work in process get blown away because navigating the matrix of pain and distributing innovation to customers is so inefficient. If application providers are to truly gain benefit from agile development, they are going to be forced to embrace appliances, virtual appliances, or SaaS as the delivery model. Without control over the system software as part of the application delivery and consumption cycle, agile development rapidly deteriorates to fragile development and becomes worthless.

Agile development is a close cousin to the lean manufacturing concepts pioneered by Toyota in the mid-80's and early 90's. The concept is simple – keep work in process to a minimum in order to discover and improve mistakes with a minimal amount of waste and rework. Agile development likewise seeks to keep work in process to a minimum in order to avoid large scale mistakes in architecture and feature design by facilitating rapid feedback throughout the value chain – from the developer, to QA and release, and ultimately to the customer. It has the side benefit that products that rapidly evolve to deliver ever greater value to the customer become “stickier” and less prone to competitive displacement. It often happens that a competitor replaces a product because the upgrade to new features in the incumbent product was more expensive than the switching costs to the competitive product. Just ask the folks that used to work at Baan about this problem. SAP made a killing switching out Baan customers due to extraordinary upgrade costs.

In order to prevent software “work in process” from piling up at the proverbial shipping dock (i.e. release engineering), the complexity of the release engineering matrix must be dramatically simplified. Fortunately, appliances, virtual appliances, and SaaS all achieve this simplification, and they are rapidly becoming the de facto delivery model for much of the new software being consumed in the market today. These improved distribution models enable application providers to do something good for their customers and their developers in one fell swoop. Customers get more innovation with fewer hassles, and developers get to work on new features instead of grinding away to solve the problems created by the context of release engineering. If you hope to embrace agile development and avoid fragile development, you had better sort out your distribution strategy as well. The legacy approach will not get the job done.

Thursday, April 17, 2008

Cloud Computing Casts Shadow on Walled Gardens

As a technology provider that helps application companies embrace cloud computing by virtualizing the applications to run on any cloud, I was a bit disappointed with Google's appengine announcement. It appears that Google is embracing the “walled garden” approach of salesforce.com and Microsoft instead of the cloud approach of Amazon. I believe that walled gardens will ultimately be overshadowed by clouds because you cannot achieve webscale computing if every application has to run on a server owned by Google.

Historically, Google has been very good about providing APIs that enable applications to access its web services independent of the computer on which they run. This is an important concept because it is often the case that an application needs to run on a particular network or network segment in order to preserve some critical aspect of performance or security. It is also important because it provides developers with the broadest choice of system and programming tools when developing or maintaining their applications. If you must program the application in the Python implementation specified by Google and run it on a Google server in order to take advantage of services like BigTable and Sawzall, a huge segment of the application market has just been eliminated from consideration (note that it is unclear to me at this time if BigTable and Sawzall can be accessed independent of appengine).

Why not simply expose a virtual machine API (such as Amazon Machine Image) along with the API for the web services (such as Amazon's S3, SQS, etc.)? Application instances that require minimal latency to Google services are provisioned as virtualized appliances on a Google server. For applications that need to run on a different network, you can provision the same system definition to that network while accessing the web services over the Internet. Write the program in any language you choose. With any set of system components that you choose.

The problem with walled gardens is that they ultimately restrict the growth of the market. While it is true that an attractive and well manicured walled garden will result in asymmetrically large economic rent for the owner of the garden (witness Microsoft), the size of the market is nonetheless constrained. It seems to me that Google would reap the greatest benefit from maximizing the market for cloud applications quickly – independent of their ability to collect an asymmetrically large portion of the rent from that market. Even their marketing of the current implementation of appengine indicates this hypothesis is correct – it is free. Success with cloud computing will no doubt lead to a decline in the value of the Microsoft system software franchise (the ultimate walled garden). Why not accelerate that decline with broad market capability instead of yet another walled garden (YAWG)?

Let me provide a concrete example. rPath was approached by a SaaS application provider to help them release their on-demand application as an on-premise application – without sacrificing management control of the system software. They want on-premise capability in order to meet the data security requirements of a certain segment of the market which they have been unable to penetrate with their SaaS offering. Their current application runs on Microsoft server technology, but it is written in Java so skipping out of the Microsoft walled garden was pretty trivial. We provided them with a virtualized implementation of their application, and we demonstrated how it could run on a local network atop a hypervisor, or as a variable cost implementation on Amazon's elastic compute cloud (EC2). Their reaction was so positive that they are now planning to gradually migrate their entire infrastructure from Microsoft to virtual infrastructure in order to seamlessly deliver the application via SaaS, variable cost cloud (Amazon), and local network (virtual appliance). Without changing their preference for programming language. Without sacrificing control of the system software layer.

To be fair to Google, appengine is a beta service. I have no doubt that they made compromises in architecture in order to get the service out the door more quickly. I hope they follow Amazon's lead and expose all of their great services as true web services while enabling any application to run close to those services via a simple virtualization spec such as Amazon's AMI. The faster we take the market to cloud computing, the sooner we can kill off the walled gardens through webscale shadows that deprive them of economic sunlight.


Friday, March 21, 2008

The Patching Dilemma

As virtual appliances pick up steam as an alternative approach to delivering SaaS value, I have seen a few analysts proclaim that the burden of patch delivery and management makes multi-tenancy via virtualization unattractive. They are correct . . . . if you attempt to build the customer virtual appliances using a legacy approach with a general purpose operating system without an integrated approach for lifecycle management. They are wrong if you consider a Just enough Operating System (JeOS or “juice”) approach with robust lifecycle management such as that offered by rPath. Let me provide an example.

rPath maintains a reference implementation of a full featured distribution of the Linux operating system as part of our rPath Appliance Platform offering. Compared with Red Hat and Novell, we probably offer about 80% of the software packages they provide as part of this reference. The remaining 20% represent desktop technology and other packages that do not matter for our target market, but are certainly important to their respective go-to-market strategies. As part of our commitment to maintain a full featured distribution, we have released about 200 security patches over the past 2 years. That is a lot of patches, but it is the reality of maintaining an OS. Keep this number in your head for a moment.

rPath also delivers our products to our customers as virtual appliances. Our ISV customers receive rBuilder and the rPath Appliance Platform as turn-key server applications completely managed by rPath on their network. Due to the unique packaging technology we pioneered, the operating system footprint to support rBuilder and the rPath Appliance Platform is about 50 MB. When you use a JeOS architecture you eliminate any package that is not required by the application. Why is this important? Remember the 200 security patches released by rPath over the last 2 years? Only 3 were required to support our product implementation at our customers. That's correct – only 3.
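The reason so few patches apply can be sketched in a few lines of Python. This is not rPath's packaging technology, just a minimal illustration of the idea: walk the dependency graph from the application, keep only what it reaches, and ignore patches for everything else. The package names, the dependency graph, and the list of patched packages are all hypothetical.

```python
def required_packages(app: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Return the application plus every package it depends on, transitively."""
    needed: set[str] = set()
    stack = [app]
    while stack:
        pkg = stack.pop()
        if pkg in needed:
            continue
        needed.add(pkg)
        stack.extend(depends_on.get(pkg, []))
    return needed


# Hypothetical full-featured distribution (package -> direct dependencies).
DISTRO = {
    "myapp": ["python", "openssl"],
    "python": ["glibc", "zlib"],
    "openssl": ["glibc"],
    "zlib": ["glibc"],
    "glibc": [],
    "xorg": ["glibc"],
    "cups": ["glibc"],
    "firefox": ["xorg", "glibc"],
}

# Hypothetical security patches released against the full distribution.
PATCHED = ["firefox", "cups", "openssl", "xorg"]

jeos = required_packages("myapp", DISTRO)
relevant = [pkg for pkg in PATCHED if pkg in jeos]

print(f"JeOS contains {len(jeos)} of {len(DISTRO)} packages")
print(f"{len(relevant)} of {len(PATCHED)} patches apply: {relevant}")
```

Packages such as the desktop stack never make it into the image, so patches against them never have to be delivered, tested, or supported for that appliance.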

Furthermore, because we deliver our patches to a specific implementation of the product (i.e. the customer did not assemble it themselves from multiple third party components, rendering clean application of patches virtually impossible), all of our customers received and applied the patches with no testing burden for them and no customer support burden for rPath.

Returning to the analysts who claim that patching makes multi-tenancy via virtualization untenable: using a legacy approach with a general purpose operating system inside of a snapshot virtual machine would indeed ruin the economics of multi-tenancy via virtualization. With the rPath approach, coupled with rPath's technology for distributing patches to large numbers of systems with minimal administrator labor, you can host 66 customer virtual appliances for the same administrator effort as one virtual machine with the legacy model (3 patches vs. 200). And you avoid the expense of re-architecting and re-writing your code to support multi-tenancy – a VERY expensive proposition. And you avoid changing your business and sales model because customers can run the virtual appliances on-premise – but without the headaches of technology integration and multiple party maintenance management.
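The figure of 66 is simply the ratio of the two patch counts, under the simplifying assumption that administrator effort scales with the number of patches applied per system:

```python
FULL_DISTRO_PATCHES = 200  # security patches for the full distribution over 2 years
JEOS_PATCHES = 3           # patches that actually applied to the JeOS appliance

# Assumption: effort is proportional to patches applied per system.
ratio = FULL_DISTRO_PATCHES / JEOS_PATCHES
appliances_per_legacy_vm = FULL_DISTRO_PATCHES // JEOS_PATCHES

print(f"Effort ratio: {ratio:.1f} to 1")
print(f"Whole appliances per legacy VM of effort: {appliances_per_legacy_vm}")
```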

Virtual appliances deliver all of the value of SaaS to your customer base without all of the vendor hassles associated with changing your technology and changing your business model. However, just snapshotting an implementation of legacy components and ignoring the lifecycle management issues will not scale. Taking that approach would be crazy and unprofitable. rPath gives you the best of both worlds.
