
Data Science Tools: Build, Buy or Lease?

By Jennifer
August 16, 2019

You can help your team by giving them data science tools to quickly build, train, and deploy AI/machine learning models. But should you build those tools yourself, or find a vendor?

Most managers would agree that data scientists like to work. It’s a field for people who love the challenge of turning raw data into a fully-realized model for comprehending the world. Plus, there is always another layer of fidelity and detail within reach. What they don’t like is tedious grunt work. To a data scientist, data cleansing, or scrubbing, is the definition of grunt work, and it can often take up 80% of their time. That’s why data science tools are so important.

Besides frustrating your team, devoting so much of your most highly-skilled, highly-compensated staff’s time to cleansing is a terrible waste of resources. By optimizing the process with AI-based data science tools, you can win back much of that time. This frees up your data scientists to focus on work that has a bigger impact on the underlying business.

Handing data cleansing and other relatively menial tasks over to artificial intelligence requires investing in the infrastructure to develop, train, deploy and run the algorithms involved. In this post, we’re taking a look at your four main options: building an in-house solution; buying an off-the-shelf solution; leasing a solution; and partnering with a provider on an “as-needed” basis.

Option #1: Build an in-house data science tool from the ground up


MIT Sloan Management Review recommends a “data factory” model to optimize internal and external monetization potential. Like an assembly line pressing and repressing from the same mould, you should be automating your data collection, cleansing, enrichment and interface. Your data platform should meet the following needs:

  • Analysis: Is the interface intuitive enough to make analysis easier for your scientists, rather than adding another layer of complexity?
  • Synthesis: Does it abet the process of experimenting with your insights and testing out new strategies?
  • Modelling: How robust is its capability to generate sophisticated predictive models?
  • Interactivity: Is it easy to share insights with stakeholders and partners? Does it integrate well with internal and open-source libraries?
  • Scalability: Does your platform effectively scale with increased demand and scope?
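One way to put these five needs to work is a simple weighted scorecard. The sketch below is only an illustration: the weights, the platform names and the 1-to-5 ratings are hypothetical placeholders, not measurements of any real product, and your team would substitute its own.

```python
# Hypothetical weights for the five needs listed above; they must sum to 1.0.
CRITERIA_WEIGHTS = {
    "analysis": 0.25,
    "synthesis": 0.15,
    "modelling": 0.25,
    "interactivity": 0.15,
    "scalability": 0.20,
}


def weighted_score(ratings, weights=CRITERIA_WEIGHTS):
    """Combine 1-5 ratings for each criterion into a single weighted score."""
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    return sum(weights[name] * ratings[name] for name in weights)


# Placeholder ratings for two imaginary candidates (assumptions, not data).
candidates = {
    "platform_a": {"analysis": 4, "synthesis": 3, "modelling": 5,
                   "interactivity": 2, "scalability": 4},
    "platform_b": {"analysis": 3, "synthesis": 4, "modelling": 3,
                   "interactivity": 5, "scalability": 3},
}

# Rank candidates from highest to lowest weighted score.
ranked = sorted(candidates,
                key=lambda name: weighted_score(candidates[name]),
                reverse=True)

for name in ranked:
    print(name, round(weighted_score(candidates[name]), 2))
```

The weights matter more than the arithmetic: a team that lives on shared dashboards might weight interactivity far more heavily and reach a different ranking.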

Building your own solution comes with the most obvious upsides and drawbacks. Your data scientists and developers should have a better sense than most outsiders of the profile of the data you need to manage. They should also know the questions the data needs to answer, and what approaches have proven to be successful in the past.

After all, you’re essentially training an AI to apply the rationale of a seasoned human staffer to the cleansing process. Not only that, but if you do manage to develop a brilliant proprietary solution you will own a competitive advantage over your peers.

The primary downside is that upfront development costs can be steep compared to the other options on the table. Between tasking existing staff and bringing on additional help, you’re committing to a project that may end up costing more resources than you save by solving the original problem. And that’s without mentioning the time and cost of maintaining the solution after deploying it.

Option #2: Buy an off-the-shelf solution


Buying an off-the-shelf solution sidesteps some of the upfront development costs of building your own platform—but over time, costs may be comparable. This is because in most cases these pre-built packages require significant customization (and therefore in-house development) in order to meet your business’s data profile. There may also be certain “brick wall” situations where the technical limitations of the purchased solution make further development impossible or unfeasible.

On the plus side, this route offers access to powerful tools developed by industry leaders. For example, an exciting aspect of AWS’s AI training tool SageMaker is its Ground Truth function. Training AI involves introducing it to a human-generated baseline and teaching it to follow the established patterns. You can teach Ground Truth to mimic trained human data labellers with a high degree of accuracy.

Amazon currently estimates that up to 70% of labelling tasks can be automated, with the AI automatically directing the 30% of cases where it is unsure to human personnel.
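The general idea of letting a model label what it is sure about and passing the rest to people can be sketched in a few lines. This is not the SageMaker Ground Truth API; the function name, the 0.9 threshold and the sample predictions are all assumptions made up for illustration.

```python
def route_by_confidence(predictions, threshold=0.9):
    """Split model predictions into auto-accepted labels and a human review queue.

    `predictions` is a list of (item_id, label, confidence) tuples.
    The threshold is a hypothetical value you would tune on your own data.
    """
    auto_labelled = []
    needs_review = []
    for item_id, label, confidence in predictions:
        if confidence >= threshold:
            # The model is confident enough: accept its label as-is.
            auto_labelled.append((item_id, label))
        else:
            # The model is unsure: send the item to a human labeller.
            needs_review.append(item_id)
    return auto_labelled, needs_review


# Made-up example predictions (assumptions, not real output).
predictions = [
    ("img-1", "cat", 0.98),
    ("img-2", "dog", 0.55),
    ("img-3", "cat", 0.93),
]

auto_labelled, needs_review = route_by_confidence(predictions)
print("auto:", auto_labelled)
print("human review:", needs_review)
```

Raising the threshold sends more items to people and fewer to the machine, so the split you actually see depends on how cautious you need to be.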

Another industry player, Tableau, has distinguished itself with its Prep tool. Designed specifically to aid with data cleansing, Prep’s fuzzy clustering helps group broadly similar classification tasks, cutting down on repetition. It’s also a great example of a clean, real-time interface.
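To get a feel for what fuzzy grouping does during cleansing, here is a minimal sketch using Python's standard-library difflib. It is not Tableau Prep's algorithm, which is not described here; the greedy grouping strategy, the 0.8 threshold and the sample values are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-1 similarity ratio, ignoring case and surrounding whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def fuzzy_group(values, threshold=0.8):
    """Greedily group values whose similarity to a group's first member
    meets the threshold. A simple stand-in for fuzzy clustering."""
    groups = []
    for value in values:
        for group in groups:
            if similarity(value, group[0]) >= threshold:
                group.append(value)
                break
        else:
            # No existing group is close enough: start a new one.
            groups.append([value])
    return groups


# Made-up messy entries of the kind a cleansing pass has to reconcile.
raw_cities = ["Montreal", "Montréal", "montreal ", "Toronto", "Tornto"]

for group in fuzzy_group(raw_cities):
    print(group)
```

Once the variants sit in one group, a person only has to confirm the canonical spelling once per group rather than once per row.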

Option #3: Lease a solution with accompanying services


Buying a data science solution means it’s yours to take advantage of as long as it’s useful. Trouble is, technology changes quickly, and the lifecycle of a data platform can be turbulent. This is especially true if you’ve been tacking on your own ad hoc additions over time.

Leasing, by contrast, offers a limited-term commitment and enhanced vendor support. After all, your vendors are incentivized to make sure you get the most out of their products so that they keep you as a client.

Some use cases necessitate more customization than others. Mnubo’s Data Science Studio, for example, provides access to a full Python AIoT studio to help you develop custom IPs. It also makes it simple to version your code and distribute it on a worldwide basis.

The company’s IoT Data Science & AI/ML Services can be leveraged to supplement your in-house development team. This allows you to bring in the very people who designed the platform to help you get the most out of it.

However, it’s possible your requirements will make leasing unfeasible, and some clients prefer the continuity of owning their own architecture.

Option #4: Avoid the commitment and partner on an “as-needed” basis

If building it yourself is raising a family, buying a solution is getting married, and leasing is dating, then “as-needed” partnering is basically the “friends with benefits” of platform investment. This option emphasizes agility and customization, cherry-picking the best products and services as opportunities (or complications) arise. This approach has much to recommend it, provided you have strong market intelligence and a shrewd knack for vendor management.

Although products like Mnubo’s data science tools are designed to work with the widest possible array of libraries and third-party tools, others are fussier and/or more custom in nature. You may also find it harder to secure the favourable pricing that vendors often offer their steadier clientele.

Ultimately, you should make your decision in concert with the data scientists on your team, as well as senior management. Ask what questions you intend to answer with the data you collect. Then ask whether there is an external market for the insights you gain from it. Finally, ask what resources, human and material, are available to invest, and how AI can augment overall performance. A happy data science team is a creative one: it’s time to unlock that potential.

