Some say data is the new oil, but I challenge you to ask simple questions of the organizations gathering this oil. You will quickly find that for most of them, it looks more like a spill than a well. We live in a world of data swamps precisely because data is the new oil: since it is so valuable, we must store it all to make sure none of it is wasted. And since there are now many options to store this new oil cheaply, few will object to doing just that. I agree with this initial premise. However, the real challenges come after. Now that we have all that oil, how do we refine it into gasoline? What type of gasoline should we refine? And, more importantly, who will use or buy it? In this post, I will focus on ways to refine oil, i.e. data, into something more usable and valuable, like gasoline.
Data must be stored
These days, you can compare storage options to the many ways you can drink coffee or tea: from ice cold to burning hot. In the world of storage, the colder the storage, the cheaper the price and the slower the barista. Hence, expect significantly slower retrieval times for data stored cold (inactive data). To make sure your data will be available when you need it, and for as long as you need it, it is highly recommended to have access to at least two types of storage, one cold and one hot (active data). It is also important to define clearly, ahead of time, where each type of data should go. For example, historical data used to train AI/ML models could be stored cold, but the recent data sent by your assets and customers should be stored hot.
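Defining ahead of time where each type of data should go can be as simple as a routing rule based on record age. Here is a minimal sketch; the 90-day threshold, the function name, and the tier labels are illustrative assumptions, not a prescribed policy:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention policy: records older than 90 days are routed
# to cold (cheap, slow) storage; newer records stay hot (fast retrieval).
HOT_RETENTION = timedelta(days=90)

def storage_tier(record_timestamp: datetime, now: datetime) -> str:
    """Route a record to 'hot' or 'cold' storage based on its age."""
    return "cold" if now - record_timestamp > HOT_RETENTION else "hot"

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(storage_tier(datetime(2023, 1, 15, tzinfo=timezone.utc), now))  # cold
print(storage_tier(datetime(2024, 5, 20, tzinfo=timezone.utc), now))  # hot
```

In practice, the same decision is usually expressed as a lifecycle rule on the storage service itself rather than in application code, but the principle is identical: the policy is written down once, before the data arrives.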
Data must be cleaned up
Dirty data, such as incomplete or outdated records, is mostly useless data. When it comes to cleaning up your data, there are as many solutions as there are problems. Data resampling, filling missing values, removing outliers, scaling values, deriving or selecting features, normalizing, and clustering are just a few examples. A useful data cleaning feature is one that lets you easily explore your data as a whole or in parts and includes a simple way to apply one or many cleaning solutions.
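Two of the cleaning solutions listed above, filling missing values and removing outliers, can be sketched in a few lines. The sensor readings and the two-standard-deviation cutoff below are illustrative assumptions:

```python
from statistics import mean, stdev

# Hypothetical sensor readings; None marks a missing value,
# and 95.0 is an obvious outlier for this sensor.
readings = [21.0, 21.5, None, 22.0, 95.0, 21.2, None, 20.8]

# Fill missing values with the mean of the observed readings.
observed = [r for r in readings if r is not None]
fill = mean(observed)
filled = [r if r is not None else fill for r in readings]

# Remove outliers more than 2 standard deviations from the mean.
mu, sigma = mean(filled), stdev(filled)
cleaned = [r for r in filled if abs(r - mu) <= 2 * sigma]

print(cleaned)  # the 95.0 reading is gone, the gaps are filled
```

A real pipeline would pick the fill strategy (mean, interpolation, last known value) per signal, but the shape of the work is the same: explore, pick a rule, apply it.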
Data must be structured
A data structure is simply a format that enables efficient access to, and modification of, the data. For cleaned data to be structured, you need a data mapping mechanism. Data mapping allows you to easily link the source of the data (the cleaned data) with the destination (the data format). A classic example of this in IoT is the collection of properties in the device twin ‘template’ mapping to the structure of the database where data is stored.
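The device-twin-to-database mapping described above can be sketched as a simple translation table. All names here (the twin property names, the column names, the helper function) are hypothetical, for illustration only:

```python
# Hypothetical mapping from device-twin property names (source)
# to column names in the destination database schema.
TWIN_TO_COLUMN = {
    "temp_c": "temperature_celsius",
    "hum_pct": "humidity_percent",
    "fw": "firmware_version",
}

def map_twin_to_row(twin_properties: dict) -> dict:
    """Translate a device-twin payload into a database row,
    dropping any properties the schema does not define."""
    return {
        TWIN_TO_COLUMN[key]: value
        for key, value in twin_properties.items()
        if key in TWIN_TO_COLUMN
    }

row = map_twin_to_row({"temp_c": 21.4, "fw": "1.2.0", "debug_flag": True})
print(row)  # {'temperature_celsius': 21.4, 'firmware_version': '1.2.0'}
```

The value of keeping the mapping in one declared structure is that the twin template and the database schema can evolve independently; only the table changes.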
Data must be enriched
Data enrichment is the process of merging third-party data from an authoritative source with an existing set of data. Data from IoT devices can be enriched with diverse sources such as weather, geographic locations, maintenance data, energy prices, crime rates, traffic data, demographic data, etc.
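At its core, enrichment is a join between your device data and the third-party source on a shared key. A minimal sketch, assuming hypothetical device readings joined to weather data on (city, date):

```python
# Hypothetical device readings and third-party weather data,
# keyed by (city, date) for the join.
device_readings = [
    {"device_id": "d1", "city": "Montreal", "date": "2024-06-01", "kwh": 12.3},
    {"device_id": "d2", "city": "Toronto", "date": "2024-06-01", "kwh": 9.8},
]
weather = {
    ("Montreal", "2024-06-01"): {"temp_c": 24.0, "humidity": 0.55},
    ("Toronto", "2024-06-01"): {"temp_c": 27.5, "humidity": 0.40},
}

def enrich(readings, weather_by_key):
    """Left-join weather attributes onto each device reading;
    readings with no matching weather record pass through unchanged."""
    enriched = []
    for r in readings:
        extra = weather_by_key.get((r["city"], r["date"]), {})
        enriched.append({**r, **extra})
    return enriched

rows = enrich(device_readings, weather)
```

Once enriched, questions like "does energy consumption track outdoor temperature?" become simple queries instead of cross-system investigations.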
All done turning oil into gasoline? Far from it; we are only halfway through refining it. At this point, your main problem is that you don’t know where the regular, the premium, or the diesel is, how much of each you have, or whether, in fact, you have any at all. We are only now in a position to establish what you have, and to do that, you must be able to explore your data.
Data must be explorable
To be efficient in the initial data exploration step, stay away from artificial intelligence, machine learning, and other highly sophisticated features. Focus on simple data exploration through more descriptive insights to answer fundamental questions like:
- How many of this do I have?
- How can I characterize or group that?
- Do I have any normal or abnormal collections of data?
- Are there any obvious trends in the data?
Answers to these basic questions lay the foundation for identifying further, more complex opportunities available within the data.
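The first two questions above can often be answered with nothing more than counts and group averages. A minimal sketch over hypothetical fault events per device model (the data and field names are invented for illustration):

```python
from collections import Counter
from statistics import mean

# Hypothetical daily fault counts reported per device model.
events = [
    {"model": "A", "faults": 1}, {"model": "A", "faults": 0},
    {"model": "B", "faults": 4}, {"model": "B", "faults": 5},
    {"model": "A", "faults": 2},
]

# "How many of this do I have?" -- event count per model.
counts = Counter(e["model"] for e in events)

# "How can I characterize or group that?" -- mean faults per model.
by_model = {}
for e in events:
    by_model.setdefault(e["model"], []).append(e["faults"])
avg_faults = {m: mean(v) for m, v in by_model.items()}

print(dict(counts))   # {'A': 3, 'B': 2}
print(avg_faults)     # model B averages far more faults than model A
```

Descriptive summaries like these are exactly the kind of finding that tells you where a future ML model might pay off, and where it clearly would not.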
While ‘Think Big’ is everywhere and the promise of an AI and ML revolution is strong, the reality is that you may not have enough data, or clean enough data, or relevant enough data to actually train ML models. It’s like assuming high-octane fuel will work best for you without ever trying regular unleaded. I can’t stress enough the value of starting with simple data exploration, not only to understand the general behavior of your IoT products, but also to identify the opportunities for future ML models.
At the end of the day, you may find that you can derive substantial business value with AI, as we frequently see with our Mnubo Data Core customers. While all of the refinement mentioned above is performed easily and efficiently with Data Core, the product also provides a strong baseline of data exploration to help you unlock value from the data almost immediately. Good examples include understanding product usage: whether new features are being used, whether firmware updates are being applied, and how frequently device faults occur. These are all native capabilities of Mnubo Data Core that require no additional coding or data science skills.
So, if data is your oil, Mnubo Data Core is your refinery turning it from crude to usable and valuable gasoline. Determining what type of gasoline you need is up to you and the needs of your business. However, starting simple and being thorough in your data exploration will help you determine if regular unleaded is all you need, and if the data will enable you to further refine into levels of higher octane.
Mnubo Data Core is the foundation for Mnubo’s industry-leading AIoT Platform providing native capabilities for connection and ingestion from data sources, data processing and management, resource management, and data exploration. Data Core is a pre-built and fully managed platform requiring no coding or DevOps support. The platform provides more rapid and secure time-to-value from IoT data with low cost of ownership.