By extensively utilizing data and paying attention to detail, Tesla has changed the conversation about the type of personalized experience car owners (drivers and passengers) should expect from an automaker. In the process, it is building strong loyalty with the owners of its cars, who appear willing to support it through thick and thin. Tesla has taken a lesson from Apple, Google, Facebook and Amazon, four companies that obsess about connecting pieces of data and using them to better understand their consumers and tailor their services to provide the right experience. It is this personalized experience that has allowed Tesla to build a brand that delights its customers. The exploitation of the big data generated by vehicles, consumers and companies across the entire automotive value chain must become a key competence of all automakers. But as I discussed in previous posts of this series, incumbent automakers, with the possible exception of GM through its OnStar service, have only recently started to collect and utilize these types of big data. As a result, they don't capture data of sufficient scale and they are not yet best in class at exploiting big data. In this post I argue that automakers should accelerate their partnerships with companies that have strong data collection and exploitation DNA, as Tesla has already demonstrated is possible. As mobility services start to play an increasingly important role in transportation solutions, companies that offer such services become ideal partners for automakers. By partnering with them, automakers will be able to understand their customers in far greater detail than they do today, as well as the mobility services that threaten to disrupt them.
Ridesharing and carsharing companies represent the best initial candidates for such partnerships because these companies a) are collecting and utilizing consumer big data with the same attention and rigor as Apple, Google, Facebook, and Amazon, and b) have already assembled impressive data sets due to the scale they have achieved. Apple's just-announced investment in Didi Chuxing, beyond its broad implications for Apple's services in China, e.g., Apple Pay, is a further indication that data partnerships, even among some of the best-in-class companies, can be essential for developing next-generation transportation solutions, including autonomous vehicles.
Machine Intelligence/Big Data
In the first part of this two-part series, I discussed why the automotive industry, particularly the incumbent OEMs, is facing a big data challenge. This challenge is becoming extremely acute as a result of the increasing adoption of Electric, Autonomous, Connected (EAC) vehicles combined with Mobility Services (EAC+MS), and the torrent of data that will be generated as a result of this adoption.
In this post, I present how the incumbent OEMs can address this challenge. To do so, automakers must:
- Think strategically and own the big data strategy, then drive the execution of this strategy instead of relying on their suppliers for partial solutions.
- Revamp the vehicle’s computer system architecture to create a unified computing and big data architecture.
- Establish and enforce data ownership rights among the appropriate constituencies.
- Create a data-sharing culture.
In the not-too-distant future, automakers won't be evaluated just on the physical, safety and performance characteristics of their vehicles. Instead, incumbent and next-generation automakers will be evaluated on the completeness of their solution along five dimensions: Electric, Autonomous, Connected, Mobility Services (EAC+MS), and Information. We read about the progress automakers and their suppliers are making along the first four dimensions; there is much less conversation about the fifth. In this two-part series, we will discuss the big data challenge facing the automotive industry. The pieces are the result of my work in the industry helping corporations with their innovation and big data strategies. The first post provides the why and makes two points:
- Automakers must be in the information business. To be effective in the information business, automakers must change their perspective and start thinking about an overall process for big data in and around the car.
- Information in EAC+MS implies big data and incumbents in the automotive ecosystem must become serious about big data. Newcomers to the automotive industry such as Google, Tesla, Faraday Future, and likely Apple, but also Uber, and Lyft, realize this imperative.
The second piece will provide the how: what automakers can do to address this challenge.
I want to take a quick breather from writing about corporate innovation and return to another topic of this blog: big data and insight as a service. Host Analytics, one of my portfolio companies, recently completed a $25M financing round. Host Analytics offers a cloud-based Enterprise Performance Management (EPM) Suite that streamlines a corporation's planning, close, consolidation and reporting processes. But it is what they are enabling for the enterprise that is important to write about. Host Analytics has moved from being an EPM company to being an insight generation company.
Since 1999 we have been investing in companies that develop SaaS applications targeting the mid-upper and global enterprise. Through these investments, and the hundreds of other SaaS companies targeting these and other segments that we have considered during this period, we have started to notice a transition in the way companies utilize cloud computing infrastructures and platforms to develop, test and deploy their applications. In the last five years SaaS companies, particularly earlier-stage ones, have started to transition from exclusively building and deploying their applications on custom-developed infrastructures to utilizing third-party infrastructures and platforms for these tasks. Third-party infrastructures come in two flavors: Infrastructure as a Service (IaaS), e.g., Rackspace, VCE, or Amazon's AWS, and Platform as a Service (PaaS), e.g., Salesforce's Heroku, or Microsoft's Azure. During this period we have seen SaaS companies for which developing and deploying on a public infrastructure was absolutely the right decision, e.g., Dropbox developed and continues to deploy on AWS, and others which had to switch to a private infrastructure after having initially developed their application on a public one.
The decision to employ a custom/private infrastructure for a SaaS application, or, alternatively, the decision to switch from a public to a private infrastructure to develop and deploy such an application, is an expensive proposition for a SaaS company of any size. Using a private infrastructure means that the SaaS company has full control of its infrastructure, but also that a meaningful percentage of its capital is spent on the development, maintenance and upgrading of this private infrastructure. Switching from a public infrastructure to a private one, or even switching among public infrastructures, if done without proper planning, leads to delays in product release schedules, increased downtime and low customer satisfaction.
SaaS entrepreneurs and management teams are asking two questions regarding the platforms and infrastructures used for their applications, so that they can accomplish their development, testing and deployment goals while building profitable companies and maintaining their customers' trust and expectations:
- What factors should I consider as I try to determine whether to use a third party/public cloud computing infrastructure?
- When should I move from exclusively using a public cloud computing infrastructure, even in a single-tenant mode, to using a private/custom infrastructure or to using a hybrid approach?
We see entrepreneurs selecting a third-party platform to start developing their SaaS applications because they instinctively believe that the associated costs, for both development and initial deployment, will be low. They are often right about the startup phase of their business. However, the decision to use such infrastructures long term is not as simple as it first appears, because several interdependent factors need to be considered. They include:
- The economics associated with the company's business model. For example, a SaaS application that will be monetized using an advertising or a freemium model has very different economics than one that will be sold and monetized through a direct inside sales model. Under a freemium model, the paying users of the application's premium version must subsidize the usage of the very large number of users of the application's free version. Therefore, the company's operating model must take into account the costs of running the infrastructure used to develop and deploy such an application. One can then determine whether the company can create a profitable model using a third-party infrastructure or must roll out its own private infrastructure.
- The SLAs the SaaS application will need to meet in order to satisfy its customers. These SLAs can range from uptime to response time, from backup time to failover time, etc. SLAs are themselves a complex factor. They are dictated by the type of target user, e.g., consumer vs. corporation; the number of users, e.g., hundreds for a specialized corporate application to millions for a typical successful consumer application; the company's stage, e.g., the SLAs for an application entering its initial deployment phase are often different from those of a fully deployed application; and the geographies where the application will need to operate, e.g., data storage and retention regulations in one geography may differ from those in another. Each SLA has an associated cost. For example, if a SaaS application must run in multiple geographies from the time it is initially deployed, using a third-party public infrastructure will enable the company to meet this requirement at a lower cost than building its own data centers. Certain application types, e.g., entertainment applications such as Flixster, or general utilities such as Wunderlist or OpenTable, that target particular market segments, e.g., consumer, SOHO, or specific segments of the broader SMB market, e.g., Square, LevelUp, Milyoni, can be developed and deployed on third-party infrastructures and never need to migrate to private ones. This is because the SLAs associated with such applications are more flexible, and the third-party infrastructures can easily accommodate them. Moreover, the scalability and capabilities of these infrastructures are constantly improving, so keeping up with the applications' growth is possible.
SaaS applications such as Evernote or Carbonite, which have more stringent SLAs and, in addition to the consumer and SMB segments, target the enterprise, run on proprietary infrastructures because third-party infrastructures cannot meet their SLAs at acceptable economics.
- The regulations governing the industry targeted by the application. For example, the data privacy regulations governing applications targeting the health care and financial services industries often necessitate the use of private cloud infrastructures by companies developing applications for these industries.
- The available in-house expertise and the importance of having such expertise. The company must determine whether it has the in-house expertise to build and maintain a custom cloud infrastructure to support application development and deployment, particularly as the company grows, whether acquiring such expertise provides it with a competitive advantage, and whether it is willing to continue incurring the costs associated with the building, maintaining and upgrading the required infrastructure and the associated expertise.
- The company’s stage. Early stage companies have different priorities, e.g., time to market, than later stage ones, e.g., sustaining growth at a reasonable cost.
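Because each SLA has an associated cost, it helps to make the numbers concrete before committing to one. A minimal sketch that translates an uptime SLA percentage into the downtime budget it allows (the SLA tiers chosen are illustrative, not tied to any particular provider):

```python
# Illustrative only: convert an uptime SLA percentage into allowed downtime.

def downtime_budget(uptime_pct: float) -> dict:
    """Return the downtime, in minutes, that a given uptime SLA permits."""
    minutes_per_year = 365 * 24 * 60        # 525,600 minutes in a year
    allowed_fraction = 1 - uptime_pct / 100
    per_year = minutes_per_year * allowed_fraction
    return {
        "per_year_minutes": round(per_year, 1),
        "per_month_minutes": round(per_year / 12, 1),
    }

for sla in (99.0, 99.9, 99.99):
    print(sla, downtime_budget(sla))
# 99.9% uptime allows roughly 525.6 minutes of downtime per year (43.8/month);
# 99.99% shrinks that to about 52.6 minutes per year.
```

Each extra "nine" roughly divides the permissible downtime by ten, which is why tighter SLAs get disproportionately expensive to meet, whether on a public or a private infrastructure.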
Based on the factors above,
- Early stage SaaS companies use public cloud infrastructures to:
- Accelerate product development by focusing on the business logic and taking advantage of the ecosystem that is typically built around the third-party platform to provide a more feature-rich application faster.
- Improve time to market by quickly onboarding customers.
- Address lack of expertise in building and effectively managing cloud infrastructures.
- Growth stage companies use public cloud infrastructures to:
- Reduce product development costs while enabling collaboration among distributed development teams.
- Reduce the cost and time to customer on-boarding.
- Utilize the elastic supply of computation and storage provided by public infrastructures in order to easily grow their customer base while meeting SLAs and regulations (geography-specific and/or industry-specific).
- Achieve their growth goals while controlling capital and operating costs.
SaaS companies start on public cloud infrastructures and remain there if they target consumer and SMB market segments, have business models that allow them to make money using such infrastructures, and can satisfy the SLAs of their target segments. Companies start with public cloud infrastructures and migrate completely to custom/private ones when they want to target mid-upper and global enterprises. If they target both the SMB and the large enterprise segments, they can use a hybrid approach: remaining on public infrastructures to address the needs of the SMB segment and using their own private infrastructure to address the large enterprise segment, as Workday does, running its application both on its own infrastructure and on AWS. In all of these cases, when a migration from a public to a private cloud infrastructure is contemplated, I advise companies to build their application assuming a multi-cloud strategy. This means that the application can simultaneously utilize several public cloud infrastructures, or that it can easily migrate from one public infrastructure to another. A multi-cloud strategy allows the company to avoid cloud vendor lock-in, effectively deal with SLAs and regulations (as described above), and better address demand elasticity (again, as described above). The challenge with hybrid environments is that the company has to manage multiple security platforms and ensure that all aspects of its business can communicate with each other. Finally, if a company develops a SaaS application targeting a regulated industry such as health care or financial services, then it needs to build and deploy its application on its own private infrastructure.
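A multi-cloud strategy is much easier to execute when the application talks to infrastructure through a thin abstraction layer rather than directly to one vendor's SDK. A minimal sketch of the idea, with hypothetical in-memory classes standing in for two providers' object stores (the class and method names are invented for illustration):

```python
from abc import ABC, abstractmethod

class ObjectStore(ABC):
    """Vendor-neutral interface; the application codes against this,
    never against a specific provider's SDK."""
    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...
    @abstractmethod
    def get(self, key: str) -> bytes: ...

class CloudAStore(ObjectStore):
    """Stand-in for provider A's object storage (e.g., an S3-style API)."""
    def __init__(self):
        self._blobs = {}
    def put(self, key, data):
        self._blobs[key] = data
    def get(self, key):
        return self._blobs[key]

class CloudBStore(ObjectStore):
    """Stand-in for provider B's object storage."""
    def __init__(self):
        self._blobs = {}
    def put(self, key, data):
        self._blobs[key] = data
    def get(self, key):
        return self._blobs[key]

def migrate(src: ObjectStore, dst: ObjectStore, keys):
    """With the abstraction in place, switching providers becomes a
    data copy, not an application rewrite."""
    for k in keys:
        dst.put(k, src.get(k))

store: ObjectStore = CloudAStore()
store.put("report.csv", b"q1,q2\n1,2")
backup = CloudBStore()
migrate(store, backup, ["report.csv"])
print(backup.get("report.csv"))  # b'q1,q2\n1,2'
```

The same pattern applies to queues, databases and identity services: each vendor-specific capability sits behind an interface the application owns, which is what makes the "easily migrate from one public infrastructure to another" claim achievable in practice.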
Determining the infrastructure and platform on top of which to develop and deploy a SaaS application is not as easy as it may initially appear, particularly if the company is thinking long term. The factors I provided above, which have been derived from my years of experience investing in SaaS application companies, will hopefully help entrepreneurs and management teams put some structure around this decision.
The physical world (from goods to equipment) is becoming digitally connected through a multitude of sensors. Sensors can be found today in most industrial equipment, from metal presses to airplane engines, shipping containers (RFID), and automobiles (telematics devices). Consumer mobile devices are essentially sensor platforms. These connected devices can automatically provide status updates, performance updates, maintenance requirements, and machine-to-machine (M2M) interaction updates. They can also be described in terms of their characteristics, their location, etc. Until recently these sensors have been interconnected using proprietary protocols. More recently, however, sensors are starting to be connected via IP to form the Internet of Things, and by 2020 an estimated 50 billion devices will be connected in this way. The connected physical world is becoming a source of an immense amount of low-level, structured and semi-structured data, i.e., big data.
Collecting and utilizing sensor data is not new. For example, GE uses data from sensors to monitor the performance of industrial equipment, locomotives, jet engines and health care equipment. United Airlines uses sensors to monitor the performance of its planes on each flight. And government organizations, such as the TSA, collect data from the various scanners they use at airports. The key applications that have emerged through these earlier efforts are remote service and predictive maintenance.
While our ability to collect data from these interconnected devices is increasing, our ability to effectively, securely and economically store, manage, clean and, in general, prepare the data for exploration, analysis, simulation, and visualization is not keeping pace. Today we seem to be preoccupied with the goal of putting all of the data we collect into a single database, and even at this task we are not doing a particularly good job. Existing database management systems are proving inadequate for the task: they may be able to process the time series data collected by sensors, but they cannot correlate it. The effectiveness of newer (NoSQL) database management systems, e.g., Hadoop, MongoDB, Cassandra, is also proving inconsistent, depending largely on the type of application accessing the database and operating on the collected data.
The new generation of applications that will exploit the big data collected by sensors must take a ground up approach to the problem they are trying to address, not unlike that taken by Splunk. In Splunk’s case, the application developers considered the ways the sensor data being collected from data centers must be cleaned, the other data sets with which it must be integrated/fused, the approach to interact with the resulting data sets, etc. Splunk’s developers were able to accomplish this and deliver a very effective application because they understood the problem, the spectrum of data that must be used to address the problem, and the role the low-level data is playing in this spectrum. They also appear to have understood the importance of providing effective analyses of the low-level data as well of the higher-level data sets that resulted when several different data sources are fused.
The Internet of Things necessitates the creation of two types of systems with data implications. First, a new type of ERP system (the system of record) that will enable organizations to manage their infrastructure (IT infrastructure, human infrastructure, manufacturing infrastructure, field infrastructure, transportation infrastructure, etc.) in the same way that the current generation of ERP systems allow corporations to manage their critical business processes. Second, a new analytic system that will enable organizations to organize, clean, fuse, explore and experiment, simulate and mine the data that is being stored to create predictive patterns and insights. Today our ability to analyze the collected data is inadequate because:
- The sensor data we collect is too low-level; it needs to be integrated with data from other sensors, as well as higher-level data, e.g., weather data, supply chain logistics data, to create information-richer data sets. Data integration is important because a) high-velocity sensor data must be brought together and b) low-granularity sensor data needs to be integrated with other higher-granularity data. Today integration of sensor data is still done manually on a case-by-case basis. Standards-based ways to integrate such data, e.g., RESTful APIs, other types of web services, have not yet been adopted broadly in the Internet of Things world, and they need to be. We need to start thinking of sensor data APIs in the same way we have been thinking about APIs for higher-level data. And once we start defining these standards-based APIs we also need to start thinking about API management.
- We don’t yet know the range of complex analyses to perform on the collected sensor data because we don’t know yet what enterprise and government problems we can solve through this data.
- Even for the analyses we perform, we often lack the ability to translate any analysis results to specific actions.
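The first point above, fusing low-granularity sensor readings with a coarser, higher-level data set, can be sketched in a few lines. This is a toy illustration only: the field names, timestamps and values are hypothetical, and real pipelines would do this alignment in a stream processor or database rather than in memory:

```python
from datetime import datetime

# Per-minute, low-level readings from a single (hypothetical) machine.
sensor_readings = [
    {"ts": datetime(2014, 6, 1, 9, 5), "vibration_mm_s": 2.1},
    {"ts": datetime(2014, 6, 1, 10, 12), "vibration_mm_s": 4.7},
]

# A coarser, higher-level data set: hourly ambient weather.
weather_by_hour = {
    datetime(2014, 6, 1, 9, 0): {"temp_c": 18.0},
    datetime(2014, 6, 1, 10, 0): {"temp_c": 24.0},
}

def fuse(readings, weather):
    """Attach the matching hourly weather record to each sensor reading,
    producing an information-richer data set for analysis."""
    fused = []
    for r in readings:
        # Align the fine-grained timestamp to the coarser hourly key.
        hour = r["ts"].replace(minute=0, second=0, microsecond=0)
        fused.append({**r, **weather.get(hour, {})})
    return fused

for row in fuse(sensor_readings, weather_by_hour):
    print(row)
```

The analytical payoff comes from the joined rows: a vibration spike that co-occurs with high ambient temperature tells a different story than the raw vibration number alone, which is exactly why the low-level data needs the higher-level context.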
Finally, along with these two types of systems, we will need to effectively manage the IP addresses of all the devices being connected in these sensor networks. IPv6 gives us the ability to connect billions of sensors using IP, but we need better ways to manage these connected devices; most organizations today track them on spreadsheets.
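Python's standard ipaddress module makes both halves of this point concrete: why IPv6 comfortably addresses billions of sensors, and what even a minimal device registry, as opposed to a spreadsheet, looks like. The subnet (a reserved documentation prefix) and the device metadata are made up for illustration:

```python
import ipaddress

# A single IPv6 /64 subnet contains 2**64 addresses, vastly more than
# any sensor deployment will need. (2001:db8::/32 is the documentation range.)
subnet = ipaddress.ip_network("2001:db8::/64")
print(subnet.num_addresses)  # 18446744073709551616

# A minimal registry: map each validated device address to its metadata,
# instead of tracking assignments by hand in a spreadsheet.
registry = {}

def register(addr: str, metadata: dict) -> None:
    ip = ipaddress.ip_address(addr)  # raises ValueError if malformed
    if ip not in subnet:
        raise ValueError(f"{ip} is outside the managed subnet")
    registry[ip] = metadata

register("2001:db8::1a", {"type": "vibration-sensor", "site": "plant-3"})
print(len(registry))  # 1
```

Even this toy version enforces two things a spreadsheet cannot: addresses are syntactically valid, and they belong to the subnet the organization actually manages.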
The big data generated by the Internet of Things is opening up great opportunities for a new generation of operational and analytic applications. Creating these applications will require taking a ground-up approach: from the basic sensor technology and the data sensors can generate, to the ways sensors are managed and data is integrated, to the actions that can be taken as a result of the analyzed data.
A few days ago I presented a webinar on Insight as a Service. In the presentation I tried to provide further details on the concept, which I first introduced here and later elaborated here. I am including the webinar presentation and the notes because they elaborate further on Insight as a Service and provide some examples.
In a previous post I introduced the concept of Insight as a Service and described some of the issues that will need to be addressed for such services to be possible. Insight as a Service refers to action-oriented, analytic-driven solutions that operate on data generated by SaaS applications, proprietary corporate data, as well as syndicated and open source data, and are delivered over the cloud. This definition is meant to differentiate Insight as a Service, which I associate with action, from Analytics as a Service, which I associate with data science, and Data as a Service, which I associate with the cloud-based delivery of syndicated and open source data. For example, a cloud-based solution that analyzes data to create a model that predicts customer attrition and then uses it to score a company's customer base in order to establish their propensity to churn is an Analytics as a Service solution. On the other hand, a cloud-based solution which, in addition to establishing each customer's attrition score, automatically identifies the customers to focus on, recommends the attrition-prevention actions to apply to each target customer, and determines the portion of the marketing budget that must be allocated to each set of related actions, is an Insight as a Service solution.
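The distinction can be made concrete with a toy sketch: the first function stops at scoring, which is where Analytics as a Service ends; the second turns the scores into a targeted, budgeted action plan, which is what Insight as a Service adds. The scoring rule, the customer fields, the actions and the budget split are all invented for illustration:

```python
customers = [  # toy customer data
    {"id": "c1", "logins_per_month": 2, "support_tickets": 5},
    {"id": "c2", "logins_per_month": 30, "support_tickets": 0},
    {"id": "c3", "logins_per_month": 1, "support_tickets": 8},
]

def score_churn(customer):
    """Analytics as a Service stops here: a churn propensity score (0..1)."""
    score = 0.6 * (1 if customer["logins_per_month"] < 5 else 0) \
          + 0.4 * min(customer["support_tickets"] / 10, 1)
    return round(score, 2)

def insight_plan(customers, budget):
    """Insight as a Service goes further: which customers to target,
    which retention action to take, and how to split the budget."""
    scored = [(c, score_churn(c)) for c in customers]
    targets = [(c, s) for c, s in scored if s >= 0.5]
    total = sum(s for _, s in targets) or 1
    return [
        {"id": c["id"],
         "action": ("success-manager call" if c["support_tickets"] > 3
                    else "re-engagement email"),
         "budget": round(budget * s / total, 2)}  # budget in proportion to risk
        for c, s in targets
    ]

for step in insight_plan(customers, budget=10000):
    print(step)
```

Running this targets the two at-risk customers and allocates the $10,000 retention budget in proportion to their scores; the scoring step alone would have produced only three numbers, leaving the "now what?" to a human.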
The survey data presented in Pacific Crest's SaaS workshop pointed to the need for a variety of data analytic services. These services can be offered under the term Insight-as-a-Service. They can range from business benchmarking, e.g., compare one business to its peers that are also customers of the same SaaS vendor, to business process improvement recommendations based on a SaaS application's usage, e.g., reduce the amount spent on search keywords by using the SEM application's keyword optimization module, to improving business practices by integrating syndicated data with a client's own data, e.g., reduce the response time to customer service requests by crowdsourcing responses. Today I wanted to explore Insight-as-a-Service, as I think it can be the next layer in the cloud stack and can prove to be the real differentiator between the existing and next-generation SaaS applications (see also here, and Salesforce's acquisition of Jigsaw).
In my last blog I tried to define the concept of insight. In this post I discuss insight generation. Insights are generated by systematically and exhaustively examining a) the output of various analytic models (including predictive, benchmarking, outlier-detection models, etc.) generated from a body of data, and b) the content and structure of the models themselves. Insight generation is a process that takes place together with model generation, but is separate from the decisioning process during which the generated models, as well as the insights and their associated action plans are applied on new data.
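A toy sketch of the two sources of insight named above: (a) examining a model's outputs, here by flagging scores that are outliers relative to the scored population, and (b) examining the model's own content, here the weights of a hypothetical linear scoring model. All the data, weights and thresholds are invented for illustration:

```python
# (a) Examine model outputs: flag entities whose predicted score is an
# outlier relative to the rest of the scored population.
scores = {"c1": 0.12, "c2": 0.15, "c3": 0.14, "c4": 0.91}

def output_outliers(scores, threshold=1.5):
    """Return keys whose score is more than `threshold` standard
    deviations away from the population mean."""
    vals = list(scores.values())
    mean = sum(vals) / len(vals)
    std = (sum((v - mean) ** 2 for v in vals) / len(vals)) ** 0.5
    return [k for k, v in scores.items() if abs(v - mean) > threshold * std]

# (b) Examine the model itself: which features dominate a (hypothetical)
# linear model's weights? Large weights point at the drivers of its output,
# which is often the insight, independent of any single prediction.
weights = {"discount_pct": 0.05, "support_response_hours": 0.72,
           "tenure_years": -0.31}

def dominant_features(weights, top_n=1):
    return sorted(weights, key=lambda f: abs(weights[f]), reverse=True)[:top_n]

print(output_outliers(scores))     # ['c4']
print(dominant_features(weights))  # ['support_response_hours']
```

Both checks run alongside model generation, as described above, and their findings (an anomalous customer; support responsiveness as the dominant driver) are what get handed, with action plans, to the separate decisioning process.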
A little over two years ago I wrote a series of blogs introducing Insight-as-a-Service. My idea of how companies can provide insight as a service started by observing my SaaS portfolio companies. In addition to each customer's operational data used by their SaaS applications, these companies, like all SaaS companies, collect and store application usage data. As a result, they have the capacity to benchmark the performance of their customers and help them improve their corporate and application performance. I had then determined that insight delivered as a service can be applied not only to benchmarking but to other analytic- and data-driven systems. Over the intervening time I came across several companies that started developing products and services building upon the idea of insight generation and providing insight as a service. However, the more I thought about insight-as-a-service, the more I came to understand that we didn't really have a good enough understanding of what constitutes insight. In today's environment, where corporate marketing overhypes everything associated with big data and analytics, the word "insight" is being used very loosely, most of the time to indicate any type of data analysis or prediction. For this reason, I felt it was important to attempt defining the concept of insight. Once we define it, we can then determine whether we can deliver it as a service. During the past several months I have been interacting with colleagues such as Nikos Anerousis of IBM, Bill Mark of SRI, Ashok Srivastava of Verizon and Ben Lorica of O'Reilly in an effort to try to define "insight."