Data technologies are still early in their life cycle and not yet widely adopted, which leaves room for innovation in data science tools. We estimate that a great data science tool focused on the core activities saves at least 50% of the calendar time spent on developing models.
by André Karpištšenko
Most companies are awash in data, and an increasing number are creating products that combine data with analytical capabilities. Working out how to profit from that data is a job businesses tackle together with data scientists. The impact of the data economy on GDP is expected to be considerable: by 2020, for example, it will affect two to four percent of the EU economy.
The number of highly skilled data workers is forecast to grow through 2020 in most economic scenarios, and there will continue to be a global shortage of professional data scientists.
Better tools that automate parts of data scientists' work can ease this shortage by raising the effective supply of data science capacity. Data technologies are still early in their life cycle and not yet widely adopted, which leaves room for innovation in data science tools.
IDC, the global market intelligence firm, has forecast $46 billion in corporate revenues associated with cognitive and artificial intelligence systems in 2020, representing a compound annual growth rate of 54.4%. As data scientists become more productive, their number does not need to grow in proportion to technology investment.
Productive high-performance data scientists are a competitive edge for leading companies.
Preprocessing takes 80% of a data scientist's time; it is domain-specific and hard to automate. Tamr and Trifacta provide tools that ease the workload in this area. Furthermore, data syndication and publishing services, such as Xignite, Planet OS, Enigma and Qlik, provide clean and curated data in some industries. Building on data exchanges and preprocessing-related technologies is a good use of data science resources.
The remaining 20% is the core of the data scientist's activity: building, training, visualizing and deploying predictive models for services and products. By automating this part, we can significantly shorten the feedback loop and speed up the turnaround from ideas to results. Automated scaling of experiments has a similar impact, as it allows the search for the best-performing model to cover a larger space. Together with fewer errors from manual activities, we estimate that a great data science tool focused on the core activities saves at least 50% of the calendar time spent on developing models.
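To make the idea of automated experiment scaling concrete, here is a minimal sketch in Python. It is an illustration only, not a description of Proekspert's tooling. The synthetic data, the toy linear model and the names `make_data`, `train`, `mse` and `run_experiments` are all assumptions made up for this example. A grid of training configurations is run without manual intervention and ranked by validation error.

```python
import itertools
import random


def make_data(n=200, seed=0):
    """Synthetic data (hypothetical): y = 3x + 2 plus Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [3.0 * x + 2.0 + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys


def train(xs, ys, learning_rate, epochs):
    """Fit y = w*x + b with plain batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def mse(params, xs, ys):
    """Mean squared error of the fitted line on the given data."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def run_experiments(grid, train_set, valid_set):
    """Train one model per configuration; return results sorted by validation error."""
    results = []
    for lr, epochs in itertools.product(grid["learning_rate"], grid["epochs"]):
        params = train(train_set[0], train_set[1], lr, epochs)
        results.append({
            "learning_rate": lr,
            "epochs": epochs,
            "valid_mse": mse(params, valid_set[0], valid_set[1]),
        })
    return sorted(results, key=lambda r: r["valid_mse"])


xs, ys = make_data()
train_set = (xs[:150], ys[:150])   # first 150 points for training
valid_set = (xs[150:], ys[150:])   # remaining 50 points for validation

# Widening the search space only means adding values to this grid.
grid = {"learning_rate": [0.001, 0.01, 0.1, 0.5], "epochs": [10, 100, 500]}
results = run_experiments(grid, train_set, valid_set)
best = results[0]
print(best)
```

Because every configuration goes through the same scripted loop, extending the search is a one-line change to `grid` rather than another round of manual runs, which is where the shorter feedback loop and the fewer manual errors would come from.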
At Proekspert, we are investing in the development of tools to boost productivity and automate these core activities. The first-hand experience of our data scientists is very encouraging. Contact us to talk about the approach.