
There’s only so much your data scientists can do without the right tools



You can help your team by giving them the data science tools to quickly build, train, and deploy AI/Machine Learning models. But should you build it yourself, or find a vendor?

Most managers would agree that data scientists like to work. It’s a field for people who love the challenge of turning raw data into a fully realized model for comprehending the world, and there is always another layer of fidelity and detail within reach. What they don’t like is tedious grunt work.

To a data scientist, data cleansing, or scrubbing, is the definition of grunt work, and it can often take up 80% of their time.

Besides frustrating your team, devoting so much of your most highly skilled, highly compensated staff’s time to cleansing is a terrible waste of resources. Companies can save a lot by optimizing the process with AI-based data science tools, freeing up their data scientists to focus on work that has a bigger impact on the underlying business.


Handing data cleansing and other relatively menial tasks over to artificial intelligence requires investing in the infrastructure to develop, train, deploy and run the algorithms that do the work.
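To make the grunt work concrete, here is a minimal sketch of the kind of rule-based cleansing a data scientist might otherwise do by hand. The field names (`device`, `ts`, `temp_c`), the valid temperature range and the median fill are all assumptions chosen for illustration, not a recommended recipe for any particular dataset.

```python
from statistics import median

def cleanse(readings, low=-40.0, high=125.0):
    """Normalize device IDs, drop duplicate rows, discard out-of-range
    values and fill missing values with the median of the valid ones."""
    seen = set()
    rows = []
    for row in readings:
        device = row["device"].strip().lower()    # normalize identifiers
        key = (device, row["ts"])
        if key in seen:                            # drop re-sent readings
            continue
        seen.add(key)
        value = row.get("temp_c")
        if value is not None and not (low <= value <= high):
            value = None                           # impossible value: treat as missing
        rows.append({"device": device, "ts": row["ts"], "temp_c": value})

    known = [r["temp_c"] for r in rows if r["temp_c"] is not None]
    fill = median(known) if known else None
    for r in rows:
        if r["temp_c"] is None:
            r["temp_c"] = fill                     # simple imputation
    return rows

# Hypothetical raw readings with the usual problems mixed in.
raw = [
    {"device": " Pump-01 ", "ts": 1, "temp_c": 21.5},
    {"device": "pump-01", "ts": 1, "temp_c": 21.5},    # duplicate
    {"device": "pump-01", "ts": 2, "temp_c": 999.0},   # sensor glitch
    {"device": "pump-02", "ts": 1, "temp_c": None},    # missing value
    {"device": "pump-02", "ts": 2, "temp_c": 22.5},
]
clean = cleanse(raw)
```

Every rule here is simple on its own. The tedium comes from writing, tuning and re-running dozens of them per dataset, which is the part the tools discussed in this post aim to take over.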

In this post, we’re taking a look at your four main options:

  1. Building an in-house solution;
  2. Buying an off-the-shelf solution;
  3. Leasing a solution;
  4. Partnering with a provider on an “as-needed” basis. 

[Image: The number of data science options can be overwhelming]

Option #1: Build an in-house data science tool from the ground up

MIT Sloan Management Review recommends a “data factory” model to optimize internal and external monetization potential: like an assembly line pressing and repressing from the same mould, you should be automating your data collection, cleansing, enrichment and interface.
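As a rough illustration of the assembly-line idea, the sketch below chains four stages named after the steps above and runs every batch through them in the same order. The stage bodies, record fields and the 70-degree threshold are hypothetical stand-ins; a real data factory would plug in its own sources and rules.

```python
from functools import reduce

def collect():
    # Stand-in for pulling records from devices or upstream systems.
    return [
        {"device": "pump-01", "temp_c": 21.5},
        {"device": "pump-01", "temp_c": None},
        {"device": "pump-02", "temp_c": 80.0},
    ]

def cleanse(records):
    # Drop records with missing measurements.
    return [r for r in records if r["temp_c"] is not None]

def enrich(records):
    # Add a derived field that a downstream consumer might want.
    return [{**r, "overheating": r["temp_c"] > 70.0} for r in records]

def interface(records):
    # Shape the output for whoever consumes it (dashboard, API, export).
    return {r["device"]: r["overheating"] for r in records}

def run_factory(source, stages):
    """Feed the collected batch through each stage, in order."""
    return reduce(lambda data, stage: stage(data), stages, source())

report = run_factory(collect, [cleanse, enrich, interface])
```

Because each stage takes records in and hands records on, any one of them can be swapped or upgraded without touching the others, which is the main appeal of the mould-and-press analogy.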

Your data platform should meet the following needs:

  • Analysis: Is the interface intuitive enough to make analysis easier for your data scientists, rather than adding another layer of complexity?
  • Synthesis: Does it facilitate the process of experimenting with your insights and testing out new strategies?
  • Modelling: How robust is its capability to generate sophisticated predictive models?
  • Interactivity: Is it easy to share insights with stakeholders and partners? Does it integrate well with internal and open-source libraries?
  • Scalability: Does your platform effectively scale with increased demand and scope?

Building your own solution comes with the most obvious upsides and drawbacks. Your data scientists and developers should have a better sense than most outsiders of the profile of the data you need to manage, the questions the data needs to answer, and the approaches that have proven successful in the past.


After all, you’re essentially training an AI to apply the rationale of a seasoned human staffer to the cleansing process. Not only that, but if you do manage to develop a brilliant proprietary solution, you will hold a competitive advantage over your peers.

The primary downside is that upfront development costs can be steep compared to the other options on the table.

Between tasking existing staff and bringing on additional help, you’re committing to a project that may end up costing more resources than you save by solving the original problem, not to mention the time and cost to maintain the solution after it’s been deployed.
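A quick back-of-the-envelope check can show whether a build is likely to pay for itself. The sketch below is one simple way to frame it, and every figure in it is invented purely for illustration; substitute your own estimates.

```python
def break_even_months(build_cost, monthly_maintenance, monthly_savings):
    """Months until cumulative net savings cover the upfront build cost.
    Returns None when maintenance consumes all of the savings."""
    net = monthly_savings - monthly_maintenance
    if net <= 0:
        return None
    return build_cost / net

# Hypothetical numbers: a 300,000 build, 5,000 a month to maintain,
# and 20,000 a month of data scientist time recovered.
months = break_even_months(
    build_cost=300_000,
    monthly_maintenance=5_000,
    monthly_savings=20_000,
)
```

With these made-up inputs the project takes 20 months to break even. If maintenance were to match or exceed the savings, it never would, which is the scenario described above.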

Option #2: Buy an off-the-shelf solution

Buying an off-the-shelf solution sidesteps some of the upfront development costs of building your own platform. However, the costs – over time – may be comparable. This is because in most cases these pre-built packages require significant customization (and therefore in-house development) in order to meet your business’s data profile.

There may also be certain “brick wall” situations where the technical limitations of the purchased solution make further development impossible or unfeasible. 


On the plus side, this route offers access to powerful tools developed by industry leaders. For example, an exciting aspect of AWS’s AI training tool SageMaker is its Ground Truth function. Training AI involves introducing it to a human-generated baseline and teaching it to follow the established patterns; Ground Truth can be taught to mimic trained human data labellers with a high degree of accuracy.

Amazon currently estimates that up to 70% of labelling tasks can be automated, with the AI automatically directing the 30% of cases where it is unsure to human personnel. (Good news for your grumbling data scientists.)
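The general pattern behind that split is confidence-based routing: accept the model’s label when it is confident, and queue the item for a person when it is not. The sketch below illustrates the idea in plain Python. It is not the SageMaker Ground Truth API, and the 0.9 threshold, item IDs and labels are assumptions made up for the example.

```python
def route_labels(predictions, threshold=0.9):
    """Split predictions into auto-accepted labels and item IDs that
    should be sent to a human labeller."""
    auto, review = [], []
    for item_id, label, confidence in predictions:
        if confidence >= threshold:
            auto.append((item_id, label))    # confident: keep the model's label
        else:
            review.append(item_id)           # unsure: hand off to a person
    return auto, review

# Hypothetical model output: (item ID, predicted label, confidence).
predictions = [
    ("img-1", "valve", 0.98),
    ("img-2", "pump", 0.95),
    ("img-3", "valve", 0.61),
    ("img-4", "pump", 0.93),
    ("img-5", "gauge", 0.72),
]
auto, review = route_labels(predictions)
```

Raising the threshold sends more items to people and lowers the risk of bad labels; lowering it does the reverse. Where to set it depends on how costly a wrong label is for your use case.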

Another industry player, Tableau, has distinguished itself with its Prep tool. Designed specifically to aid with data cleansing, Prep’s fuzzy clustering helps group broadly similar classification tasks, cutting down on repetition. It’s also a great example of a clean, real-time interface.
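To give a feel for what fuzzy grouping does, here is a small sketch that clusters near-identical text values using the string similarity ratio from Python’s standard `difflib` module. This is not Tableau Prep’s algorithm; the greedy grouping strategy, the 0.8 threshold and the sample company names are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    # Case-insensitive similarity between 0.0 and 1.0.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def fuzzy_groups(values, threshold=0.8):
    """Greedy grouping: each value joins the first group whose first
    member is similar enough, otherwise it starts a new group."""
    groups = []
    for value in values:
        for group in groups:
            if similarity(value, group[0]) >= threshold:
                group.append(value)
                break
        else:
            groups.append([value])
    return groups

# Hypothetical free-text entries that should refer to three companies.
names = ["Acme Corp", "ACME Corp.", "Acme Corpp",
         "Globex Inc", "Globex Inc.", "Initech"]
groups = fuzzy_groups(names)
```

Once variants are grouped, a data scientist can fix a whole group with one decision instead of correcting each entry separately, which is where the time savings come from.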

Option #3: Lease a solution with accompanying services

Buying a data science solution means it’s yours to take advantage of for as long as it’s useful. Trouble is, technology changes quickly, and the lifecycle of a data platform can be turbulent. This is especially true if you’ve been tacking on your own ad hoc additions over time.

Leasing, by contrast, offers a limited-term commitment and enhanced vendor support. After all, your vendors have an incentive to make sure you get the most out of their products in order to keep you as a client.

Some business cases necessitate more customization than others. Mnubo’s SmartObjects AIoT Studio, for example, provides access to a full Python notebook to help you develop custom IP. It also makes it simple to version your code and distribute it worldwide.

[Image: Mnubo’s AIoT Studio]

AIoT Studio also improves the way you collect and categorize data, which reduces the amount of work required to cleanse it. Outsourcing what remains to Mnubo reduces the burden on your own team to virtually nil.

However, it’s possible your requirements will make leasing unfeasible, and some clients prefer the continuity of owning their own architecture.

Option #4: Avoid the commitment and partner on an “as-needed” basis

If building it yourself is raising a family, buying a solution is getting married and leasing is dating, then “as-needed” partnering is basically the “friends with benefits” of platform investment.

This option emphasizes agility and customization, cherry-picking the best products and services as opportunities (or complications) arise. This approach is extremely appealing, provided you have strong market intelligence and a shrewd knack for vendor management.

Some products – like Mnubo’s – are designed to work with the widest possible array of libraries and third-party tools.

[Image: Mnubo’s Asset Health dashboard]

Others are fussier or more bespoke in nature. You may also find it challenging to secure the favourable pricing that vendors often offer their steadier clientele.

Ultimately, your data scientists and senior management should make a decision in concert.
Ask the following questions:

  • Why are you collecting data?
  • Is there an external market for the insights you extract from it?
  • What resources, human and material, are available to invest?
  • How can AI augment overall performance?

A happy data science team is a creative one: it’s time to unlock that potential.
