Coolan was launched earlier this year by two former Facebook employees to build software that uses analytics and artificial intelligence to forecast server failures and avert data center outages. Amir Michael, a co-founder of Coolan, was formerly the manager of hardware design at Facebook. He is joined by Jonathan Heiliger, Facebook's former vice president of infrastructure and technical operations. The third co-founder is Yoni Michael, Amir's brother.
Companies increasingly need new software as they follow Facebook's and Google's lead in replacing data center hardware from traditional enterprise server vendors with bare-bones or white-label servers. Many of the newer suppliers offer low-cost, efficient servers but do not sell the server optimization software that Dell Inc. and other large vendors provide.
Amir Michael explained that the aim of Coolan is to "enable that transition to happen with a lot less risk". In a Coolan blog post, Michael wrote that many companies want better data on questions such as why their servers are failing, how to prevent the next failure, and whether they bought the right equipment for their needs.
Coolan answers these questions by asking customers to install a small piece of code on their servers. The code continuously collects metadata on how servers from a range of vendors are performing. The data is anonymized, and Coolan's algorithms analyze configuration, failure, and event data drawn from its entire customer base. Because Coolan uses machine learning, the algorithms adjust as new information arrives. The software can forecast when particular components in different servers are likely to fail, or tell IT staff whether changing a server's configuration might improve performance.
Although web-scale data centers run software designed to be fault tolerant, so that it keeps operating when a single server fails, outages can still occur if enough servers stop working at the same time. For example, in the summer of 2011, a rain cloud unexpectedly formed inside Facebook's Prineville, Oregon, data center because of a problem with the water-based cooling system and the building management system. Servers with front-facing power supplies failed outright. Amir Michael was part of the team that redesigned those power supplies. To absorb such events, companies tend to keep a buffer stock of spare servers. Amir Michael explained that Coolan "can predict failure that reduces the downtime of your server so you're buying less buffer stock of servers".
Coolan is one of a growing group of suppliers developing around the Open Compute Project (OCP), which Facebook launched in 2011 to share information about building servers and constructing data centers cost-effectively. As companies have joined the project, a number of hardware and software suppliers have sprung up around it. New members continue to join, most notably Apple in March 2015. Apple operates a huge data center infrastructure to support its online services. Companies that operate large data centers benefit from designing their own hardware. OCP has become a touchstone for the community of vendors and end users that support the redesign of data centers.