22.11.2018

Python Applied to Machine Learning and Data

An overview by Travis E. Oliphant.

Most tech companies today are struggling to figure out how they can best work with their data. Travis E. Oliphant, founder of the startups Quansight and Anaconda and the primary developer of the NumPy and SciPy packages for Python, gave a talk at the North Star AI conference, powered by Proekspert, on how Python can be used effectively for machine learning and for data, the heart of ML-driven technology.

Oliphant’s connection to Python dates as far back as 1997, when he was working with version 1.4. The very first problem he focused on was actually a data problem, and this led him to understand that in order to do anything with machine learning you have to get the data right.

While data seems to be everywhere these days, said Oliphant, the biggest problem is how you gain access to and utilize that information.

In Oliphant’s view, Python is one of the best languages to apply to machine learning problems. During his talk, he explained why in detail and presented how he imagines artificial intelligence might develop in the future.

Not Artificial but Augmented Intelligence

When Oliphant thinks of AI, he thinks of “augmented” rather than “artificial” intelligence. For at least the next fifty years, this technology will be more about “empowering people rather than replacing them,” he said. To be sure, AI might take over some tasks that you are doing today, but then you will shift to doing something more important while the machine takes care of more mundane assignments.

What can AI be used for? Oliphant noted a multitude of possibilities.

Any time you have a complex function of many variables and want an understandable mapping from input to output, you can apply AI. Self-driving cars, medicine, and geophysics are a few of the fields where AI technology can make a big difference.

In order to apply ML and be successful with your applications, Oliphant noted, you will need to work with people who have domain expertise, who know the business. The good news is that every major tech company is now involved in the AI field. Microsoft, Google, Apple, IBM, and Amazon are pioneering machine learning and artificial intelligence research.

These AI applications, said Oliphant, are on the verge of broad usability. Moreover, he noted, they are all written in Python.

Obstacles in the Industry

There is amazing promise in the AI and machine learning industry, but we have a very long way to go, said Oliphant. The current landscape contains challenges such as organizational infrastructures that make data-sharing difficult, and out-of-date regulatory structures that were created for a different era. Technology is changing faster than education can keep up, and software is lagging behind hardware advances; programmers are not yet tapping into the full potential of the hardware that is available to them.

As quickly as things progress, there remain many silos of technological advancement and a general lack of integration when it comes to methodology. “Can’t we figure out frameworks that everyone can use?” asked Oliphant.

AI exists in some basic forms now, but the dream is something bigger. There is still so much that needs to be done in order for the promise of AI to become actual capability, said Oliphant.

Anaconda, a Possible Solution

Launched by Oliphant himself, Anaconda is an open-source distribution that simplifies package management and deployment for the Python and R programming languages. It can be used to great effect for data science and machine learning applications. Moreover, its open-source package and environment manager, Conda, is language-agnostic and can distribute software for any language.

When it comes to AI, Python is not enough, explained Oliphant. You need ecosystem frameworks to solve your problems as well as machine learning tools—and Anaconda can bring all of these technologies together. “One of the key things we need is AI integrators bringing people these capabilities in everyday applications.”

Everybody loves modeling, predicting, classifying, and visualizing in AI, and these are fairly easy tasks to complete. The harder things are feature labeling, data cleaning, data extraction, deploying, reproducing, and scaling, and this is where Anaconda can help.

Oliphant discussed two other Anaconda tools, Numba and Dask, that he believes are crucial for anyone working in the realm of machine learning. Numba, he explained, is designed to help with scaling up. It is an open-source just-in-time compiler for Python that comes with a CUDA simulator. It can compile for both the CPU and the GPU, and it makes array processing easy. Oliphant also cited a speedup of roughly 2.7 times over NumPy, although figures like this depend on the workload.
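To make the idea of compiling Python with Numba concrete, here is a minimal sketch. It is not taken from Oliphant's talk: the function name `sum_of_squares` and the sample data are illustrative assumptions, and the fallback decorator exists only so the snippet still runs (without any speedup) where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit  # Numba's "nopython" JIT decorator
except ImportError:
    # Assumption for portability: without Numba, run the plain Python function.
    def njit(func):
        return func


@njit
def sum_of_squares(values):
    """Sum the squares of a 1-D float array with an explicit loop."""
    total = 0.0
    for i in range(values.shape[0]):
        total += values[i] * values[i]
    return total


# Illustrative data; under Numba the first call triggers compilation.
data = np.arange(1_000, dtype=np.float64)
result = sum_of_squares(data)
```

Explicit loops like this one are slow in plain Python but are the kind of code a JIT compiler can turn into machine code. Any actual speedup depends on the hardware and the workload.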

Dask is a parallel computation library for scaling NumPy arrays and pandas DataFrames. With Dask you can build collections of arrays or dataframes that are larger than memory and use them in distributed environments. Its task scheduler is optimized for computation and helps run your custom algorithms on distributed nodes. Dask also provides diagnostic dashboards that give users insight into performance.
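The chunked-array idea behind Dask can be sketched as follows. This is an illustrative example, not code from the talk: the helper `column_means`, the chunk size, and the small in-memory matrix are assumptions made for brevity, whereas a real larger-than-memory dataset would typically be loaded from disk. The snippet falls back to plain NumPy when Dask is not installed.

```python
import numpy as np

try:
    import dask.array as da
    HAVE_DASK = True
except ImportError:
    HAVE_DASK = False  # assumption: fall back to NumPy so the sketch still runs


def column_means(matrix, rows_per_chunk=250):
    """Column means of a 2-D array, computed chunk by chunk when Dask is present."""
    if HAVE_DASK:
        # Split the rows into chunks; each chunk becomes a task in Dask's graph.
        lazy = da.from_array(matrix, chunks=(rows_per_chunk, matrix.shape[1]))
        # Nothing is computed until .compute() hands the graph to the scheduler.
        return lazy.mean(axis=0).compute()
    return matrix.mean(axis=0)


# Small illustrative matrix: 1,000 rows by 4 columns.
matrix = np.arange(4_000, dtype=np.float64).reshape(1_000, 4)
means = column_means(matrix)
```

Because the computation is described lazily and executed per chunk, the same code shape can in principle be pointed at data that does not fit in memory or at a distributed scheduler.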

You can find more information about these tools on the Anaconda website.

Using AI in Your Organization

To end his presentation, Oliphant discussed how these tools should be applied within an organization or company. How do you actually go about integrating AI into your technology?

Using machine learning and artificial intelligence in any way requires a process, he said. First, you have to bring your data together. Here, he suggested using visualization tools, because the best way to understand your data is to look at it. Next, it is important to brainstorm about AI and consult with people who have real experience in the field. Once you have found the “right” features for your work, it is time to build and validate a model, repeat that for many models, and then publish and manage the results on some time scale.
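The "build and validate, then repeat for many models" step can be illustrated with a small sketch. Everything here is an assumption made for illustration: the synthetic data, the polynomial models, and the names `validation_error` and `best_degree` do not come from the talk. The visualization and publishing steps are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: bring the data together (synthetic here: a noisy quadratic).
x = np.linspace(-3.0, 3.0, 200)
y = 0.5 * x**2 - x + 2.0 + rng.normal(scale=0.3, size=x.size)

# Hold out a validation set so each model is scored on data it has not seen.
order = rng.permutation(x.size)
train, valid = order[:150], order[150:]


def validation_error(degree):
    """Fit a polynomial of the given degree and return its validation MSE."""
    coeffs = np.polyfit(x[train], y[train], degree)
    predictions = np.polyval(coeffs, x[valid])
    return float(np.mean((predictions - y[valid]) ** 2))


# Repeat the build-and-validate step for several candidate models.
errors = {degree: validation_error(degree) for degree in (1, 2, 3)}
best_degree = min(errors, key=errors.get)
```

Scoring every candidate on the same held-out data keeps the comparison fair. In practice the candidates would be different model families or feature sets rather than polynomial degrees.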

By following such a rigorous approach to machine learning and using tools such as Anaconda, we can start making the transition from the AI we have today to a more seamless AI of the future, concluded Oliphant.

Save the date – March 7th 2019!
North Star AI powered by Proekspert is coming again and tickets are now available.
More info: aiconf.tech

