Use of Graphics Processing Units on the Rise
An overview of this year’s GPU Technology Conference (GTC), covering the world of GPU-driven deep learning and real-world applications of AI.
– André Karpištšenko
GPUs are the present of accelerated computing for analytics and engineering. Proekspert has been at the cutting edge of smart machines and software for 24 years and is actively investing in data science software and infrastructure. In the spirit of genchi genbutsu (“go to the source and see it for yourself”), we visited this year’s GTC. Here is a recap of the zeitgeist at the event.
While the CPU outperforms the GPU in latency and energy efficiency, the GPU is the way forward for high-throughput, massively parallel computing (growing 1.5 times year over year), matching the pace of data growth and narrowing the compute gap left by CPUs. John Hennessy of Stanford University has proclaimed the start of a new era for computing in 2017. The underlying core concept is CUDA (Compute Unified Device Architecture), a decade-old parallel computing platform and programming model suited to accelerating common tensor operations (matrix multiplication and summation), for example in deep learning. With CUDA 9, synchronization across multiple GPUs enables computing at any scale, a step towards an operating system for accelerated computing. The GPU Open Analytics Initiative is working to push the entire data science stack onto GPUs, with the Anaconda data science distribution, the H2O data science platform and the MapD database providing the basis.
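To make the idea of massively parallel tensor operations more concrete, here is a minimal sketch in plain Python (not CUDA itself). It assumes the common decomposition in which every element of the output matrix is computed independently, which is roughly the unit of work a single GPU thread could take on. The function names `output_element` and `matmul` are illustrative only and do not come from any library.

```python
from itertools import product


def output_element(a, b, row, col):
    """Dot product of one row of `a` with one column of `b`.

    This is the independent unit of work that a single GPU thread
    could handle in a data-parallel matrix multiplication.
    """
    return sum(a[row][k] * b[k][col] for k in range(len(b)))


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    rows, inner, cols = len(a), len(b), len(b[0])
    if any(len(r) != inner for r in a):
        raise ValueError("inner dimensions must match")
    result = [[0] * cols for _ in range(rows)]
    # No (row, col) pair depends on any other, so all of them could run
    # at the same time on parallel hardware. Here we simply loop.
    for row, col in product(range(rows), range(cols)):
        result[row][col] = output_element(a, b, row, col)
    return result


print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

On a CPU the loop visits the output elements one after another. The appeal of the GPU is that thousands of such elements can be computed simultaneously, which is why throughput scales so well for this kind of workload.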
One of the fields benefiting most from this trend is narrow AI and its applications. At GTC, deep learning and AI made up well over half of the content.
The time when more software will be written by software than by humans is no longer so distant. At the forefront of this direction are the five tribes of machine learning, a subfield of AI: symbolists, Bayesians, analogizers, evolutionaries, and most prominently connectionists, whose approach is known in the mainstream as deep learning.
For high-end development of deep learning models, numerous frameworks support the most advanced data center GPUs. If you are an engineer making decisions about your technology stack, there is ample choice. Microsoft Cognitive Toolkit (CNTK), which focuses on scalability and performance, Facebook’s highly customizable PyTorch, the production-ready Caffe2, Google’s popular TensorFlow, the academic Theano, and the collaborative endeavor MXNet all provide a basis for adding intelligent features related to computer vision, text, speech, images, videos, time series and more. Symbolic loops over sequences with dynamic scheduling, turning graphs into parallel programs through mini-batching, and reduced communication overhead are but a few of the exemplary features available at production quality. For example, building a leading image classification ResNet that outperforms humans, at a 3.5 percent error rate, is estimated to be a 30-minute task with the new frameworks. Deep learning has become a popular choice thanks to its Lego-like building blocks, which can be rearranged into specialized network architectures. There are many use cases for the method.
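The Lego-like composition can be illustrated with a small sketch in plain Python. It is not taken from any of the frameworks named above. The helpers `dense`, `relu`, `residual` and `sequential` are hypothetical names chosen for illustration, and the weights are made up. The `residual` wrapper adds a block's input back to its output, which is the skip connection that gives ResNet its name.

```python
def dense(weights, bias):
    """Build a layer that computes weights * x + bias for a list-valued x."""
    def layer(x):
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(weights, bias)]
    return layer


def relu(x):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def residual(block):
    """Wrap a block so that its input is added back to its output."""
    def layer(x):
        return [xi + yi for xi, yi in zip(x, block(x))]
    return layer


def sequential(*layers):
    """Chain layers so that each one feeds the next."""
    def network(x):
        for layer in layers:
            x = layer(x)
        return x
    return network


# Illustrative weights only: one residual block around dense + relu.
block = sequential(dense([[1.0, 0.0], [0.0, -1.0]], [0.0, 0.0]), relu)
net = sequential(residual(block))
print(net([2.0, 3.0]))  # [4.0, 3.0]
```

Because every piece is just a function from a vector to a vector, blocks can be reordered, nested or repeated freely. Real frameworks offer the same kind of composition, with trained weights and GPU execution underneath.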
As a specific example of networks inspired by game theory, generative adversarial networks are starting to find new applications. Examples include simulating data, working with missing data, realistic generation tasks, image-to-image translation (from day to night, for example), simulation by prediction for particle physics, and learning useful embeddings of images. Neural networks are strong at perceiving and learning, but weak at abstracting and reasoning. This is being addressed by the new wave of AI for contextual adaptation, which combines the statistical learning approach with handcrafted knowledge. The need for samples is decreasing considerably, both for networks and for the new wave of models. For example, the new models can be trained with tens of labels in a handwriting dataset instead of the previous 60,000.
Beyond deep learning, narrow AI more broadly is a rising professional application of GPUs. A prominent field here is autonomous cars, where custom L3/L4 autonomy can be bought without having to build the physical infrastructure. Nvidia PX2 and the modular, scalable DriveWorks SDKs make advanced tasks like calibration, sensor fusion, free space detection, lane detection, object detection (cars, trucks, traffic signs, cycles, pedestrians, etc.) and localization fast and easy. Developers of autonomous vehicles can focus on their applications instead of the highly complex development of the base components.
Moving closer to the roots of GPUs, namely computer graphics, another maturing trend was clearly present at GTC. The devices for AR and VR have matured considerably in the decades since their inception. Novel directions like AI in VR are being explored for interactive speech interfaces, visual recognition, data analysis and collaborative sharing. Corporate R&D teams are working on early-stage concepts for metaverse-native generations. A step in this direction is Nvidia’s Holodeck, a photorealistic, collaborative virtual reality environment that conveys the feeling of real-world presence through sight, sound and haptics. The state of the art can handle products as complex as the new electric Koenigsegg car design. On the analytics side, by fitting the entire dataset into GPU memory, multi-caching technologies enable interactive slice-and-dice queries and visualizations of fairly large datasets (384GB as of May 2017) in milliseconds.
Many industries are affected by the rising trend of GPUs; for example, companies focused on healthcare, materials, agriculture, maritime, retail, the elderly, mapping, localization, self-driving, graphics, analytics, games and music are discovering and inventing new ways of interacting with the new era of abundant computing power. While I/O is still the bottleneck, we are entering a new era of craftsmanship-focused work at the intersection of art, science and engineering. This is evident from the 11-fold rise in the number of GPU developers over the past five years.
The frontier is about finding better ways to manage the explosion in model and experiment complexity. For example, in 2017 Google NMT required 105 exaFLOPS of computation with 8.7B parameters; in 2016 Baidu Deep Speech 2 needed 20 exaFLOPS and 300M parameters, and in 2015 Microsoft ResNet required 7 exaFLOPS with 60M parameters. One exaFLOPS is equivalent to running all the supercomputers in the world for one second, as of May 2017. Proekspert is evaluating how this trend is impacting data scientists generally, and what tools a data scientist needs to achieve and maintain high performance and productivity in the new era.
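To put the model figures quoted above side by side, the short Python sketch below computes how much compute and parameter counts grew between the 2015 and 2017 examples. It uses only the numbers given in the text. The names `MODELS` and `growth_between` are illustrative.

```python
# Figures as quoted in the text:
# (year, model, compute in exaFLOPS, number of parameters)
MODELS = [
    (2015, "Microsoft ResNet", 7, 60_000_000),
    (2016, "Baidu Deep Speech 2", 20, 300_000_000),
    (2017, "Google NMT", 105, 8_700_000_000),
]


def growth_between(first, last):
    """Return (compute growth factor, parameter growth factor)."""
    _, _, compute_a, params_a = first
    _, _, compute_b, params_b = last
    return compute_b / compute_a, params_b / params_a


compute_growth, param_growth = growth_between(MODELS[0], MODELS[-1])
print(f"Compute grew {compute_growth:.0f}x, parameters grew {param_growth:.0f}x")
# Compute grew 15x, parameters grew 145x
```

By these figures, parameter counts grew roughly ten times faster than compute over the two years, which is one way to read the complexity explosion described above.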