LabVIEW developers often hit performance walls when their applications grow beyond what traditional CPU-based processing can handle. Expanding your application beyond the CPU may be appropriate if you are running into limitations with:

- Real-time signal processing that needs sub-microsecond response times
- Image analysis requiring massive parallel computation
- Data acquisition systems that must process streaming data faster than it arrives
If any of these sound familiar, accelerated computing with FPGAs or GPUs might be the solution - but choosing the right approach requires understanding your application's specific requirements, the strengths of each technology, and where you are in the development process.
Decision Framework: CPU vs GPU vs FPGA
When deciding how and where to offload work from the CPU to achieve better performance, first consider your application’s critical performance requirements.
Timing
Timing requirements can vary from nothing more than creating a smooth user experience while post-processing data, to safety-critical real-time applications.
Data Characteristics
At each stage of your data processing, consider the structure, size, and representation of the data.
CPU: Can handle mixed or unpredictable data patterns, and any data representation. Naturally, this is the best place to start developing, then offload processing steps as appropriate.
GPU: Excels when large, uniform blocks of data all receive the same operation, particularly floating-point arrays. Transfer overhead means small or irregular data sets may not see a benefit.
FPGA: Excels at streaming, point-by-point data, typically with fixed-point or integer representations, where deterministic throughput matters.
Algorithmic Complexity
At each stage of your data processing, look at the types of operations being performed.
FPGA Implementation
If sections of the data processing are candidates for moving to an FPGA, the next step is to port that logic to code that can be deployed to an FPGA, and then select the appropriate hardware. Of course, an experienced FPGA developer could write Verilog or VHDL that runs on an off-the-shelf FPGA System-on-Module and let LabVIEW communicate with it over a TCP connection. However, developing within the NI/LabVIEW ecosystem offers rapid development and fast time to delivery without needing to hire specialists.
Software
Moving your LabVIEW code onto an FPGA is as simple as installing the LabVIEW FPGA module and learning a few basics about how to write FPGA code.
Hardware
LabVIEW FPGA is only supported on NI hardware, but the rapid, easy development is often worth the price of admission, particularly for low-volume or R&D applications. NI offers hardware options across a range of price and performance points.
GPU Implementation
From LabVIEW, there are a variety of ways to offload CPU processing to a GPU - you can run a separate service in Python, or use a cloud service and send work via REST APIs. Here, we'll look at two approaches to accelerating your application with a GPU: a quick and convenient approach that stays within the LabVIEW IDE, and an advanced, comprehensive approach.
LabVIEW GPU Libraries
All LabVIEW users are familiar with the built-in LabVIEW primitive functions for array manipulation and basic mathematics, which execute on a CPU. JKI can help you identify libraries that let you code intuitively in LabVIEW while the work is deployed behind the scenes to a GPU. This approach works best if your algorithm can be broken down into a few common operations: array manipulation, linear algebra, and arithmetic.
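As an illustration of what such a library does behind the scenes, here is a minimal sketch of a single GPU linear-algebra primitive, SAXPY (y = alpha·x + y), executed with NVIDIA's cuBLAS library. This assumes a CUDA-capable GPU; the specific LabVIEW toolkit that would wrap a call like this is not named here, and `gpu_saxpy` is an illustrative wrapper, not an existing API.

```cuda
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Compute y = alpha * x + y on the GPU using the cuBLAS SAXPY primitive.
// A GPU toolkit hides exactly this kind of allocate/copy/compute/copy-back
// sequence behind a single high-level function or VI.
extern "C" int gpu_saxpy(int n, float alpha, const float* x, float* y) {
    float *dx = nullptr, *dy = nullptr;
    cudaMalloc(&dx, n * sizeof(float));
    cudaMalloc(&dy, n * sizeof(float));
    cudaMemcpy(dx, x, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, y, n * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    cublasSaxpy(handle, n, &alpha, dx, 1, dy, 1);  // the actual GPU work
    cublasDestroy(handle);

    cudaMemcpy(y, dy, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dx);
    cudaFree(dy);
    return (int)cudaGetLastError();  // 0 on success
}
```

From the caller's point of view the data goes in and comes back as ordinary host arrays - the GPU round trip is invisible, which is exactly the convenience these libraries offer.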
Custom CUDA Code
To have full, unbridled use of a GPU, a developer can write CUDA code, compile it into a DLL, and call it from LabVIEW through a Call Library Function Node. In this way, virtually anything you can dream up can be computed on the GPU and accessed from LabVIEW. This approach is recommended if your application requires operations that are not supported by existing LabVIEW toolkits, or if additional performance gains are required. This route takes more time and experience, but no license fees apply.
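A minimal sketch of this pattern might look like the following: a CUDA kernel compiled into a shared library, exposed through a plain C entry point that LabVIEW can bind to by name. The function name `ScaleArray` and the error-code convention are illustrative assumptions, not part of any existing toolkit; a CUDA-capable GPU is assumed.

```cuda
#include <cuda_runtime.h>

#ifdef _WIN32
#define LV_EXPORT extern "C" __declspec(dllexport)  // Windows DLL export
#else
#define LV_EXPORT extern "C"                        // Linux shared library
#endif

// GPU kernel: each thread scales one element of the array.
__global__ void scale_kernel(float* data, float gain, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= gain;
}

// C-linkage entry point for LabVIEW's Call Library Function Node.
// Takes a host array, runs the kernel on the GPU, copies the result back.
LV_EXPORT int ScaleArray(float* host_data, float gain, int n) {
    float* d = nullptr;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) return -1;
    cudaMemcpy(d, host_data, n * sizeof(float), cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;  // enough blocks to cover n
    scale_kernel<<<blocks, threads>>>(d, gain, n);

    cudaMemcpy(host_data, d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return (int)cudaGetLastError();  // 0 on success, returned to LabVIEW
}
```

On the LabVIEW side, the Call Library Function Node would be configured with the array passed as a pointer to data, so the same buffer carries the input in and the result out.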
You can think of CUDA code as C/C++ with some extra CUDA-specific syntax and functions. Your CUDA instructions are deployed simultaneously to thousands of mini processors called CUDA cores. Each core executes the same instructions on a different subset of the data, much like a "for loop" iterating over a range of data - except that in CUDA, many iterations happen simultaneously.
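The "parallel for loop" mental model can be sketched side by side. In the CPU version the iterations run one after another; in the CUDA version each "iteration" becomes a thread, and the thread's computed index plays the role of the loop counter (this is a conceptual sketch assuming a CUDA-capable GPU):

```cuda
// CPU version: iterations execute sequentially, one after another.
void add_cpu(const float* a, const float* b, float* out, int n) {
    for (int i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}

// GPU version: the loop body becomes a kernel. The launch creates one
// thread per element, and thousands of them execute at the same time.
__global__ void add_gpu(const float* a, const float* b, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's "loop index"
    if (i < n) {                                    // guard, since thread count
        out[i] = a[i] + b[i];                       // may exceed n
    }
}
```

Launching `add_gpu<<<blocks, threads>>>(a, b, out, n)` with enough blocks to cover `n` produces the same result as the sequential loop, but the "iterations" run concurrently across the CUDA cores.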
GPU Hardware
To get started, all you need is a CUDA-compatible GPU. Various form factors are available.
Always check the power and cooling requirements of your GPU card, and what the hardware power supply/chassis can provide.
Practical Example
Refactoring to incorporate FPGAs and GPUs
The example below presents a simple producer-consumer architecture: raw data is acquired in a producer loop and sent to a consumer loop, where it is filtered and an FFT is performed before the result is presented on the UI. How might the dataflow be refactored to push processing off the CPU?
Supposing the filtered data were required for time-critical decision making, it would be appropriate to move acquisition and filtering to an FPGA; digital filters require very little logic and memory, so they are a perfect fit for an FPGA. Meanwhile, if the FFT were performed on very large arrays and needed only for post-processing, the application could be accelerated by offloading it to a GPU. Now the CPU is only needed for user interaction and managing data transport.
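As a sketch of the GPU half of that refactoring, the consumer loop's FFT could be moved to the GPU with NVIDIA's cuFFT library, wrapped in a C entry point for LabVIEW to call. The function name `GpuForwardFFT` is illustrative, not an existing toolkit API; a CUDA-capable GPU and the cuFFT library are assumed.

```cuda
#include <cuda_runtime.h>
#include <cufft.h>

// Perform an in-place forward complex FFT on the GPU.
// host_data holds n interleaved complex samples on entry, and the
// n-point spectrum on return. Returns 0 on success.
extern "C" int GpuForwardFFT(cufftComplex* host_data, int n) {
    cufftComplex* d = nullptr;
    if (cudaMalloc(&d, n * sizeof(cufftComplex)) != cudaSuccess) return -1;
    cudaMemcpy(d, host_data, n * sizeof(cufftComplex),
               cudaMemcpyHostToDevice);

    cufftHandle plan;
    cufftPlan1d(&plan, n, CUFFT_C2C, 1);       // one n-point complex transform
    cufftExecC2C(plan, d, d, CUFFT_FORWARD);   // in-place forward FFT
    cufftDestroy(plan);

    cudaMemcpy(host_data, d, n * sizeof(cufftComplex),
               cudaMemcpyDeviceToHost);
    cudaFree(d);
    return (int)cudaGetLastError();
}
```

For very large arrays, the plan could be created once and reused across iterations of the consumer loop, amortizing its setup cost - that is where most of the practical performance gain comes from.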
Conclusion
Of course, any combination of GPUs and FPGAs can be incorporated into an application. Initial testing with some of the methods above may reveal which parts of your application would benefit from being offloaded from the CPU, allowing a more mature design to emerge.
Ready to Talk to JKI?
JKI brings years of specialized experience working with every NI platform, GPUs, and FPGAs. We can help you find the hardware you need, and help develop your application to the edge of what’s possible.