The latest C-language tools let developers who are not proficient in hardware design quickly complete algorithm-intensive applications.
Hardware designers have begun using FPGA technology in high-performance DSP designs because it can deliver 10 to 100 times the computing throughput of PC or microcontroller solutions.
Previously, software developers who were unfamiliar with hardware design found it very difficult to exploit the advantages of FPGAs. Today, C-based approaches let software developers exploit those advantages easily. These C-based development tools can save more design time than a hardware design based on an HDL, and they do not require much hardware knowledge. Because of these advantages, FPGAs are no longer used only as front-end I/O devices: they can also handle many high-bandwidth, computation-intensive applications in real time.
In addition, FPGAs can be tightly integrated with onboard memory, and multiple devices can be integrated on a single circuit board. Better still, FPGA boards can communicate over emerging serial communication standards such as RapidIO or PCIX. These technologies give FPGA-based systems a price/performance ratio an order of magnitude better than existing multi-CPU and DSP systems. As a result, high-bandwidth, algorithm-intensive problems that used to be handled by CPU and DSP solutions, such as medical imaging, industrial applications, and military sonar and radar, are now often handled with FPGAs.
By using the new C-based development tools to develop DSP designs (on a PCI board carrying one or more FPGA processors), designers can achieve the performance gains mentioned above and a shorter time-to-market. This article shows designers how to use C-language tools in FPGA-based signal processing systems, and describes step by step how a developer implements an algorithm-intensive signal processing program on a multi-FPGA system.
Programming an FPGA computing solution in C can cut a program's execution time from 12 minutes to just 2 seconds.

Interfacing with hardware through the C language
Suppose you are designing an algorithm-intensive signal processing program, such as one that analyzes thousands of kilometres of highway.
This application requires forward/inverse Hough transform algorithms, which can also be used to find rivers and roads in aerial images, as well as defects on semiconductor surfaces. Suppose you are designing with a PC based on a Pentium 4 and Windows XP, multi-FPGA PCI boards (such as the Tsunami board), a C-language development environment and Handel-C (the Celoxica development environment), and that you know very little about HDL hardware languages but are familiar with some FPGA design basics.
The design process starts with writing the C code, then porting the code to Handel-C and simulating it on a PC, and finally running tests on the multiple FPGA processors. To begin, decide which algorithms in the C code to accelerate.
A good analysis tool, such as the Intel VTune Performance Analyzer, can help you find the code that consumes excessive clock cycles. In this signal processing application, the algorithm took 12 minutes when executed entirely by the CPU, and analysis showed that almost all of that time was spent in various nested loops, which clearly shows which code should be handed to the FPGA accelerator. The accelerated code must exchange its input and output with the PC over the PCI bus. The I/O data rate depends on the PCI bus speed, which ranges from 70 to 200Mbps. The next challenge is to create the FPGA design that speeds up the code.
Because an FPGA can execute thousands of instructions and access hundreds of memory blocks at the same time, both pipelining and parallel processing can be used for acceleration. In pipelining, a pipeline is a sequence of instructions: while one part of an algorithm is executing on a piece of data, other parts of the algorithm are executing in later stages of the same pipeline, much like an automated production line. Operations that take many clock cycles can have their running time reduced significantly through parallel processing (Figure 2). Finally, you must analyze the various algorithms and break them down step by step into mathematical operations (addition, subtraction, multiplication, division, integration), delays, memories, look-up tables, and so on.
No matter how complex, an algorithm can be broken down into these basic operations, and operations that do not depend on each other can be processed in parallel. Our sample application was accelerated as follows: nine loops are fully pipelined, so that after an initial pipeline delay one result is output on every clock. These loops are nested inside the three-dimensional X, Y, Θ loop, so the total number of clock cycles is 9 + (9 * X * Y * Θ). In other words, each processing block contains only nine such loops: delay + (nine loops * 64 pixels * 64 pixels * 64-bit depth).
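As a worked check of the cycle count above, the formula can be evaluated in a few lines of C. This is only a sketch: the function and constant names are invented for illustration, and the delay of 9 clocks is taken directly from the formula in the text.

```c
#include <assert.h>
#include <stdint.h>

#define PIPELINE_DELAY  9   /* initial pipeline delay in clocks, from the formula */
#define LOOPS_PER_BLOCK 9   /* nine fully pipelined loops per processing block */

/* Total clocks for one processing block: delay + (9 * X * Y * Theta). */
uint64_t block_clock_cycles(uint64_t x, uint64_t y, uint64_t theta)
{
    assert(x > 0 && y > 0 && theta > 0);
    return PIPELINE_DELAY + LOOPS_PER_BLOCK * x * y * theta;
}
```

For the 64 x 64 x 64 block in the example, block_clock_cycles(64, 64, 64) returns 9 + 9 * 262,144 = 2,359,305 clocks, so the initial delay is negligible next to the steady-state throughput.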
Although you can implement floating-point units in an FPGA, they quickly consume FPGA resources, so use them sparingly where you can.
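The usual alternative to floating-point units is fixed-point arithmetic, checked against a floating-point reference. The following is a minimal sketch in plain C: the format with 7 fractional bits is an assumption chosen for illustration (the article does not spell out how its fixed-point bits are divided), and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_BITS 7                  /* assumed number of fractional bits */
#define FIXED_ONE (1 << FRAC_BITS)   /* the value 1.0 in this format (128) */

typedef int32_t fixed_t;

/* Convert a double to fixed point, rounding to the nearest 1/128 step. */
fixed_t to_fixed(double value)
{
    double scaled = value * FIXED_ONE;

    assert(scaled > -2147483648.0 && scaled < 2147483647.0);
    return (fixed_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}

/* Convert a fixed-point value back to a double for comparison. */
double to_double(fixed_t value)
{
    return (double)value / FIXED_ONE;
}

/* Multiply two fixed-point values using a 64-bit intermediate product. */
fixed_t fixed_mul(fixed_t a, fixed_t b)
{
    int64_t product = (int64_t)a * (int64_t)b;

    return (fixed_t)(product / FIXED_ONE);   /* truncates toward zero */
}

/* Absolute error of the fixed-point product against the floating-point one. */
double mul_error(double a, double b)
{
    double reference = a * b;
    double approx = to_double(fixed_mul(to_fixed(a), to_fixed(b)));
    double diff = approx - reference;

    return diff < 0.0 ? -diff : diff;
}
```

Running a helper such as mul_error over representative input data gives a quick estimate of how many fractional bits a given stage needs before the fixed-point output matches the floating-point reference closely enough.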
Algorithms that rely mainly on floating-point arithmetic are best converted to fixed-point arithmetic, so that you can take a "modular" approach and design the entire system by converting it from floating point to fixed point one part at a time. Then determine the conversion accuracy by comparing the actual output with that of the original all-floating-point software implementation. In the Hough algorithm example, 14b + 7b fixed-point resolution gives results identical to the full floating-point version.

Determining resources
During design, you need to count the clock cycles required by each processing stage.
Typically, two to three operations can be completed in each clock cycle; from this you can determine the FPGA resources the code requires. The code can be partitioned to run across multiple FPGAs for more computing power. Developing such solutions is easy: as long as you use multiple FPGAs (up to five), the system detects them automatically. In this example, the design is based on processing blocks.
These blocks are sent to each FPGA in turn and collected from each FPGA (the logic for this is part of the code). One FPGA can achieve a speedup of 37:1, and 10 FPGAs (five on each of two boards) can achieve 370:1. Coding the design is relatively simple, because it is done mainly in C, apart from a few Handel-C-specific directives.
These new features are: enhanced bit manipulation, parallel processing, macro procedures and expressions, variables of arbitrary width, FPGA memory interfaces, RAM and ROM types, signals (representing hardware wires), and channels (for communication between parallel branches of the code or between clock domains). The sidebar "Code conversion" shows complete samples of C and Handel-C.

Simulation environment
The next step is to set up a simulation environment for testing and optimizing the hardware code.
The simulation environment provides complete bit-true/cycle-true simulation and a reliable simulation of the FPGA implementation. Comparing the design's output with the output of the C software simulation tests its accuracy, and the same comparison can be made when the design runs at full speed on the FPGA processor. Typically, simulating one building block at a time helps find design problems, because the overall behaviour can then be determined once the blocks are reassembled. You can make further adjustments during simulation, such as checking that a pipeline takes one input and produces one output per clock cycle, or breaking the process into more parallel data streams until FPGA resource utilization reaches 100%. In addition, when you compile for hardware you can find and optimize the slowest algorithms, and even partitioning across FPGA boards can yield additional speed. Further adjustments made with the software give better performance.
However, the performance gain from fine tuning diminishes. Simply adding FPGAs is very cost-effective. There is no need to make the design perfect, because the design can be quickly simulated and optimized at any time based on these results. Once simulation is complete, you can compile the design to hardware and activate data stream management (DSM), so that the data stream is sent to the FPGA processor board instead of the simulator.