Sunday, August 17, 2008

Single Minute Exchange of Applications (SMEA) - The Cure for Server Hoarding

I recently had an interesting conversation with an IT executive who has built a self-service datacenter capability based upon virtualization. He described for me a system whereby business units can request “virtual server hosts” with a pre-set system environment (e.g., Linux and Java), and within an hour or two they receive an email notification informing them of the availability of the “virtual machines.” The goal of this system, as it was explained to me, is to “cure server hoarding” by the business units.

The theory is that if the business units are confident that they can get new capacity “on-demand,” then they will not request more systems than they really need. And since they are billed based upon the actual amount of capacity deployed, they have an incentive to “give back” any systems that are not necessary to meet production demands. I asked how it was working:

IT Exec – Great. We have over 1500 virtual machines actively deployed in production in support of business unit demand.

Billy – Wow! That's terrific. What do the statistics look like for server returns?

IT Exec – What do you mean?

Billy – I mean how many systems have the business units returned to the pool of available systems because their demand was transitory?

IT Exec – No one has ever given back a single machine. They have the economic incentive to do so, but so far not one has been returned to the pool.

And therein lies the problem. The reason no one gives systems back is that the setup costs associated with getting them productive are simply too high. Even in this case, when the setup of the operating environment is accomplished within an hour or two of the request, the process of “fiddling around with the system” to get the application installed, configured, and stable is so expensive that no one ever gives a productive system back when demand falls. This situation leads to tons of waste in the form of over-deployed capital and overconsumption of resources such as power. I am reminded of the early days of the lean production revolution in the world of manufacturing.

In the late eighties, Toyota was whipping Detroit's fanny because they had implemented a system that the folks in Detroit did not think was possible. The folks at Toyota got much higher utilization out of their capital investment with much lower levels of waste and work in process because they had implemented a system that assured the expensive production equipment was always engaged in producing parts and vehicles that closely reflected true demand. A big part of this system was a capability known as the Single Minute Exchange of Dies, or the SMED system, which was pioneered by Toyota and evangelized by the legendary manufacturing engineer, Shigeo Shingo.

With SMED, expensive body stamping machines (or any machine for that matter) are kept productively engaged building the exact parts that are required to meet true demand by reducing the setup time for a “changeover” to less than 10 minutes (the “single minute” in the name means a single-digit number of minutes). This is accomplished primarily by precisely defining the interface between the machine and the stamping dies such that the dies can be prepared for production “off-line.” While a machine is productively engaged building Part A, the dies for Part B are set up for production in a manner that does not require interfacing with the production machine. When it is time for a changeover from Part A to Part B, the machine stops, the Part A dies are quickly released and pulled from the machine, and the Part B dies are quickly engaged using a highly standardized interface. No fiddling around to get it right. The machine starts up again in less than 10 minutes and down the line rolls the perfect output for Part B.

Contrast this approach with the standard approach in Detroit in the late eighties. The economy of scale theory in Detroit was to set up the line for long runs of a single part type and build inventory because changing over the line was so costly. Fiddling around with the dies to get the parts to come off according to specification might take a day or even a week. So instead of building for true demand, Detroit over-deployed resources, both capital equipment and work in process, in an attempt to compensate for poor setup engineering. We all know how this story ends. The Toyota system is still the envy of the manufacturing world.

Now is the time for the technology world to take a lesson from Toyota. Virtualization will provide the standard interface for production, but it is almost worthless without “setup” technology that enables the applications to be defined independently of the production machine. The resources of the datacenter should reflect “true demand” for production output instead of idling away – suffering from a miserable case of server hoarding because setup is so expensive and error-prone. The time has come for SMEA – Single Minute Exchange of Applications.
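
To put rough numbers on the hoarding argument, here is a minimal Python sketch of the choice a business unit faces when demand drops: hand the machine back and pay to set it up again later, or keep paying for it while it idles. Everything in it is hypothetical. The function name `worth_returning`, the billing rate, the labor rate, and the setup times are illustrative assumptions, not figures from the IT executive's datacenter.

```python
def worth_returning(idle_hours, hourly_rate, setup_hours, labor_rate):
    """Return True when giving a machine back saves more than redeploying costs."""
    # Billing avoided while the machine sits in the shared pool.
    savings = idle_hours * hourly_rate
    # Staff time needed to reinstall, configure, and stabilize the application.
    redeploy_cost = setup_hours * labor_rate
    return savings > redeploy_cost


# Assumed figures: a VM billed at $0.50/hour that would sit idle for two weeks.
# With 40 hours of manual setup at $75/hour, hoarding is the rational choice.
print(worth_returning(336, 0.50, 40, 75.0))       # False: $168 saved vs $3,000 to redeploy

# With setup cut to 10 minutes, returning the machine pays off.
print(worth_returning(336, 0.50, 10 / 60, 75.0))  # True: $168 saved vs $12.50 to redeploy
```

Under these assumed numbers the billing incentive never stood a chance: the economics only flip once the changeover itself becomes cheap.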

At rPath, we are working towards SMEA every day. We have high hopes that the complementary trends of virtualization and cloud computing will highlight the possibility of an entirely new, and more efficient, approach to consuming server production capacity. An approach where applications are readied for production without consuming machine cycles “fiddling around” to get the application stable. An approach where expensive machines running application A are given back for production of application B when true demand indicates that B needs the resources instead of A. The Department of Energy and CERN are already on board with this approach, but it will be interesting to see who in the technology world emerges as “Toyota” and how long it takes the status quo of “Detroit” to wake up and smell the coffee.