Deep Learning in Simulink for NVIDIA GPUs: Generate CUDA Code Using GPU Coder
Simulink® is a trusted tool for designing complex systems that include decision logic and controllers, sensor fusion, vehicle dynamics, and 3D visualization components.
As of Release 2020b, you can incorporate deep learning networks into your Simulink models to perform system-level simulation and deployment.
Learn how to run simulations of a lane and vehicle detector built on deep learning networks, including a YOLO v2 vehicle detector, in Simulink on NVIDIA® GPUs. The Simulink model includes preprocessing and postprocessing components that perform operations such as resizing incoming video frames, detecting lane coordinates, and drawing bounding boxes around detected vehicles. With the same Simulink model, you can generate optimized CUDA code using cuDNN or TensorRT to target GPUs such as NVIDIA Tesla® and NVIDIA Jetson® platforms.
Published: 22 Oct 2020
Simulink is a trusted tool for designing complex systems that include decision logic and controllers, sensor fusion, vehicle dynamics, and 3D visualization components. As of Release 2020b, you can incorporate deep learning networks into your Simulink models to perform system-level simulation and deployment. If we look inside the vehicle and lane detection subsystem, we'll see the use of two deep learning networks at the top and bottom. Our input video will come in, and we'll then do some preprocessing to resize the image, which we then feed into our lane detection network.
Here, you can see this is being loaded from a MAT-file in our MATLAB directory. We'll do some post-processing to detect the coordinates of the left and right lanes, and finally we'll do some annotation to highlight the vehicle and lanes. The bottom deep learning network is detecting vehicles, and it's based on YOLO v2. And again, you can see this is being loaded from the MAT-file in our directory. So coming back, we can run the simulation. You see our input video on the left and our output video on the right, where we're highlighting the left and right lanes with green markers and drawing yellow bounding boxes around the vehicles that we detect. So at this point, we're ready to go ahead and generate code, so we can launch either the Simulink Coder or Embedded Coder app.
Let's take a look at the code generation settings first. Here, you'll see we're using the correct system target file, and we've checked the checkbox to generate CUDA code. We can also take a look at the deep learning libraries. In this case, we can choose either cuDNN or TensorRT, so we'll keep cuDNN for now. For the toolchain settings, we're using NVIDIA's CUDA toolkit. And finally, for the non-deep-learning parts, we're using optimized libraries like cuBLAS, cuSOLVER, and cuFFT.
So we're all set; let's go ahead and generate code. Here's the code generation report, and you can see the files generated on the left. Let's first look for the step function. Down here, you can see the cudaMalloc calls, which allocate variables in GPU memory. Here, we have cudaMemcpy calls, which copy data from CPU memory to GPU memory and back at the right places. And here, a couple of GPU kernels are launched in order to speed things up on the GPU cores. We have our two deep learning networks as well. Here's our first one, LaneNet, and you can see all the public and private methods here. We have setup, predict, and cleanup, in addition to several others.
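The allocate/copy/launch/copy-back pattern visible in the step function can be sketched as follows. This is an illustrative hand-written sketch, not the actual GPU Coder output; the kernel, function names, and sizes here are made up for illustration.

```cuda
// Illustrative sketch of the pattern GPU Coder's step function shows:
// allocate device buffers, copy inputs to the GPU, launch a kernel,
// and copy results back to the CPU.
#include <cuda_runtime.h>

__global__ void scaleKernel(const float *in, float *out, float k, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = k * in[i];   // stand-in for a preprocessing step
}

void step(const float *hostIn, float *hostOut, int n) {
    float *dIn, *dOut;
    cudaMalloc(&dIn,  n * sizeof(float));   // allocate in GPU memory
    cudaMalloc(&dOut, n * sizeof(float));
    cudaMemcpy(dIn, hostIn, n * sizeof(float), cudaMemcpyHostToDevice);
    scaleKernel<<<(n + 255) / 256, 256>>>(dIn, dOut, 1.0f / 255.0f, n);
    cudaMemcpy(hostOut, dOut, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
}
```

The generated code hides this bookkeeping behind the model's step interface; the point is simply that the copies and kernel launches land "at the right places" around each computation.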
Here's our second deep learning network: the vehicle detector using YOLO v2. And again, we have the same set of methods. If we look inside the setup method, you can see the code that runs once at the beginning of the program to load the deep learning network into memory. Inside, we go through one layer at a time, loading the weights and biases as we go. So that's a quick look at generating CUDA code from Simulink models that use deep learning networks. For more information, take a look at the links below.
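The lifecycle described above (setup runs once at startup to load weights layer by layer, predict runs each step, cleanup releases memory) can be sketched as a plain C++ stand-in. The class name, method signatures, and layer sizes here are hypothetical simplifications; the real generated class also manages GPU buffers and the cuDNN or TensorRT handles.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical mirror of the generated network class interface.
class VehicleDetector {
    std::vector<std::vector<float>> layerWeights;  // one buffer per layer
    bool ready = false;
public:
    void setup(const std::vector<std::size_t>& layerSizes) {
        // Runs once at program start: walk the layers in order,
        // loading the weights and biases for each one.
        for (std::size_t n : layerSizes)
            layerWeights.emplace_back(n, 0.0f);
        ready = true;
    }
    std::size_t predict() const {
        // Placeholder for the per-step inference call; here it just
        // reports how many layers were loaded.
        return ready ? layerWeights.size() : 0;
    }
    void cleanup() {
        // Releases the memory acquired in setup.
        layerWeights.clear();
        ready = false;
    }
};
```
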
Featured Product
GPU Coder