By Michel Bissonnette, Advanced Engineer – AEG, Future Electronics
Engineers who are not familiar with programmable logic tend to ignore the technology when architecting their designs. Worries about price, the complexity of the devices and tools, or a lack of experience with RTL coding usually send the designer running toward an MCU/CPU solution, even when that approach is not optimal.
One such example is video. The amount of data generated by a single image sensor will typically overwhelm most MCUs and some low-end CPUs. Add two more sensors and the bandwidth requirement exceeds all but very high-end cores. In this article, we look at a reference design put together by the Future System Design Center (SDC) that receives three camera streams, stitches them together and sends the resulting frame to an HDMI transmitter. Figure 1 shows the major components of the reference design. Three Lattice NanoVesta cameras are connected to the MachXO2-7000, each through a 12-bit parallel bus. Each camera also has its own I2C programming port to the PLD. HDMI transmission is handled by the Silicon Image SiI9022A, and a programming link also exists between the PLD and the HDMI transmitter.
Most noticeable by its absence from the design is external RAM. Since the design is meant to stitch the three images into a panoramic format, the output is recombined on a line-by-line basis, so the memory requirement is limited to two lines per camera. Why two lines? The image sensor on Lattice's NanoVesta boards is the MT9M024, and its data is presented in a Bayer pattern (Figure 2). This means that the first pixel of the first row is green, followed by a red pixel, while the blue pixels appear on the odd lines, so a complete set of red, green and blue samples spans two lines. In order to recombine red-green-blue data for the HDMI transmitter, we must therefore save two lines. Packing two 12-bit pixels into each 24-bit word, the memory required for the project is:
3 cameras × 2 lines × 24 bits × (1280 pixels per line / 2) = 92,160 bits (about 92K bits)
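The Python sketch below makes the two-line argument concrete. It assumes the GRBG layout described above (even rows G, R, G, R…; odd rows B, G, B, G…) and a deliberately naive reconstruction that takes R and B directly from each 2×2 tile, averages the two greens and truncates the 12-bit samples to 8 bits per channel. The article does not describe the design's actual color-reconstruction logic, so the function names and the reconstruction method are illustrative only. The last function reproduces the buffer-size arithmetic.

```python
# Illustrative model of the GRBG Bayer layout described in the text:
# even rows: G R G R ...   odd rows: B G B G ...
BAYER = (("G", "R"), ("B", "G"))


def bayer_color(row: int, col: int) -> str:
    """Return which color filter covers pixel (row, col)."""
    return BAYER[row % 2][col % 2]


def tile_to_rgb(even_line: list[int], odd_line: list[int], pair: int) -> tuple[int, int, int]:
    """Combine one 2x2 tile (two pixels from each of two lines) into 8-bit RGB.

    Assumptions: 12-bit raw samples, the two greens averaged, 4 LSBs dropped.
    A real demosaic would also interpolate across neighboring tiles.
    Both lines must be available at once, which is why two lines are buffered.
    """
    c = 2 * pair
    g0, r = even_line[c], even_line[c + 1]   # even line supplies G and R
    b, g1 = odd_line[c], odd_line[c + 1]     # odd line supplies B and G
    g = (g0 + g1) // 2
    return (r >> 4, g >> 4, b >> 4)


def buffer_bits(cameras: int = 3, lines: int = 2, word_bits: int = 24,
                pixels_per_line: int = 1280) -> int:
    """Bits needed to hold `lines` lines per camera, two 12-bit pixels per word."""
    return cameras * lines * word_bits * (pixels_per_line // 2)


if __name__ == "__main__":
    even = [0x800, 0xFFF] * 640   # G, R, G, R, ... (one 1280-pixel line)
    odd = [0x100, 0x7F0] * 640    # B, G, B, G, ...
    print(bayer_color(0, 0), bayer_color(0, 1), bayer_color(1, 0))  # G R B
    print(tile_to_rgb(even, odd, 0))                                  # (255, 127, 16)
    print(buffer_bits())                                              # 92160
```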
The XO2-7000 has 240K bits of embedded block RAM (EBR), so we should be good to go, right? Well, not exactly. The XO2 EBR architecture presents each RAM block in a 1K × 9-bit format. This means that to save the two lines of 640 (1280/2) pixel pairs in a 24-bit format, we need 1,024 cells, or one EBR, for each 9 bits of width, and a total of three EBRs to store all 24 bits of the Bayer data. Because three 9-bit-wide blocks provide 27 bits for each 24-bit word, some memory is wasted. Saving two lines for the three cameras therefore requires nine EBRs. You can confirm this by setting up a double-line-buffer dual-port RAM block with the Diamond IPExpress tool. All cameras are synchronized by the TRIGGER input, and the input buffers absorb any remaining timing differences between individual sensors.
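The width arithmetic can be checked with a short sketch. It assumes every buffer is built only from EBRs in the 1K × 9 configuration, placed side by side to reach the word width and stacked to reach the depth; IPExpress may choose a different mapping, so its own report remains authoritative. The helper names are illustrative.

```python
import math

EBR_DEPTH = 1024                  # words per EBR in the 1K x 9 configuration
EBR_WIDTH = 9                     # bits per word in that configuration
EBR_BITS = EBR_DEPTH * EBR_WIDTH  # 9,216 bits per block


def ebrs_needed(word_bits: int, depth_words: int) -> int:
    """EBRs for one buffer: blocks side by side for width, stacked for depth."""
    return math.ceil(word_bits / EBR_WIDTH) * math.ceil(depth_words / EBR_DEPTH)


def wasted_bits(word_bits: int, depth_words: int) -> int:
    """Allocated EBR bits that the buffer never uses."""
    return ebrs_needed(word_bits, depth_words) * EBR_BITS - word_bits * depth_words


if __name__ == "__main__":
    print(26 * EBR_BITS)               # 239616: 26 blocks, the "240K bits" above
    print(3 * ebrs_needed(24, 1024))   # 9 EBRs for three cameras
    print(wasted_bits(24, 1024))       # 3072 unused bits per camera buffer
```

A 24-bit word always costs ceil(24/9) = 3 blocks per 1,024 words of depth, which gives the nine EBRs quoted above as long as each camera's buffer fits within 1,024 words; a buffer configured deeper than that, for example 2 × 640 = 1,280 words, would need a second set of three blocks per camera.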
The second critical part of the design is the HDMI output. HDMI is as complex in the digital world as it is in the analog domain: digitally, it involves packetizing, DC balancing, checksums, audio and video support and much more. Fortunately, Silicon Image makes an HDMI transmitter that takes care of all of this. All we need to supply to the part is a pixel clock, hsync, vsync, data enable and 24 bits of RGB data. Since we want our output to run on both computer monitors and TV screens, we selected 720p at 60 Hz. Figure 3 shows the required HDMI timing. For monitors and TVs to display our video stream, we need to supply 1650 horizontal clocks on each of 750 lines and repeat this 60 times per second. The required pixel clock frequency is therefore:
1650 (horizontal clocks) × 750 (lines) × 60 (frames per second) = 74,250,000 Hz, or 74.25 MHz
The visible resolution for 720p is 1280 horizontal pixels by 720 vertical lines. The extra 370 horizontal clocks and 30 vertical lines form the horizontal and vertical blanking intervals, which carry the sync pulses. Since we do not store frames, the actual timing is somewhat different. The cameras are configured through their register set to output frames with the proper timing, but they send the active data first, followed by the sync signals. This is effectively like reading Figure 3 from the bottom right to the top left instead of from the top left to the bottom right. Since we are streaming video rather than grabbing individual frames, this inversion has no visible effect.
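The timing above can be modeled with two counters, one per pixel clock along the line and one per line. The sketch assumes the standard CEA-861 split of the 720p60 blanking (horizontal: 110 front porch, 40 sync, 220 back porch; vertical: 5, 5 and 20 lines); the article gives only the totals. Counting starts with active data and ends with the back porch, which matches the data-first ordering the cameras use. Sync polarity and the exact vsync-to-hsync edge alignment are ignored, with every signal modeled as an active-high boolean.

```python
from dataclasses import dataclass
from typing import Iterator, NamedTuple


@dataclass(frozen=True)
class Timing:
    active: int
    front_porch: int
    sync: int
    back_porch: int

    @property
    def total(self) -> int:
        return self.active + self.front_porch + self.sync + self.back_porch


# Assumed CEA-861 720p60 values; only the totals 1650 x 750 come from the article.
H = Timing(active=1280, front_porch=110, sync=40, back_porch=220)
V = Timing(active=720, front_porch=5, sync=5, back_porch=20)
FPS = 60


class Signals(NamedTuple):
    hsync: bool
    vsync: bool
    de: bool  # data enable: high only for visible pixels


def in_sync(t: Timing, pos: int) -> bool:
    """True while pos falls inside the sync pulse (active data comes first)."""
    start = t.active + t.front_porch
    return start <= pos < start + t.sync


def frame() -> Iterator[Signals]:
    """Yield the signals for every pixel clock of one full frame."""
    for y in range(V.total):
        for x in range(H.total):
            yield Signals(hsync=in_sync(H, x), vsync=in_sync(V, y),
                          de=x < H.active and y < V.active)


if __name__ == "__main__":
    print(H.total, V.total, H.total * V.total * FPS)  # 1650 750 74250000
    print(sum(s.de for s in frame()))                 # 921600 visible pixels
```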
The last part of the design is programming the individual cameras and the HDMI transmitter. All programming is done over I2C, so we need an I2C master IP block, which can be downloaded from the Lattice website. Register addresses and their desired values are stored in a ROM, also implemented in an EBR, and are read out one by one by a state machine. Each register address and its value are sent over the I2C bus, and the state machine then waits for an ACK or a NAK. The process repeats until all registers are programmed. The I2C master block then informs the image acquisition state machine that initialization has completed successfully. The .MEM memory configuration file can be edited in a text editor, as shown in Figure 4. The Lattice Diamond tool also has a memory editor that can modify this file, but it does not support comments or block edits.
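The ROM-driven sequencer can be sketched in software as an explicit state machine. Everything here is hypothetical: `i2c_write` stands in for the Lattice I2C master IP (its real interface is not modeled), the ROM entry layout and port numbering are invented for illustration, and stopping on the first NAK is an assumed error policy, since the article does not say how a NAK is handled.

```python
from enum import Enum, auto
from typing import Callable, Optional, Sequence, Tuple


class State(Enum):
    FETCH = auto()     # read the next entry from the ROM
    WRITE = auto()     # hand the entry to the I2C master
    WAIT_ACK = auto()  # wait for ACK or NAK from the slave
    DONE = auto()      # all registers written: signal the acquisition FSM
    ERROR = auto()     # NAK received (hypothetical policy: stop here)


# (port, register address, value); ports 0-2 = cameras, 3 = HDMI transmitter (hypothetical)
Entry = Tuple[int, int, int]


def program_registers(rom: Sequence[Entry],
                      i2c_write: Callable[[int, int, int], bool]) -> Tuple[State, int]:
    """Walk the ROM one entry at a time; i2c_write returns True for ACK, False for NAK.

    Returns the final state and the number of entries that were acknowledged.
    """
    state = State.FETCH
    index = 0
    entry: Optional[Entry] = None
    acked = False
    while state not in (State.DONE, State.ERROR):
        if state is State.FETCH:
            if index == len(rom):
                state = State.DONE
            else:
                entry = rom[index]
                state = State.WRITE
        elif state is State.WRITE:
            assert entry is not None
            acked = i2c_write(*entry)
            state = State.WAIT_ACK
        elif state is State.WAIT_ACK:
            if acked:
                index += 1
                state = State.FETCH
            else:
                state = State.ERROR
    return state, index


if __name__ == "__main__":
    # Placeholder ROM contents; these are NOT real MT9M024 or SiI9022A settings.
    rom = [(0, 0x0010, 0x0001), (1, 0x0010, 0x0001), (2, 0x0010, 0x0001), (3, 0x20, 0x00)]
    sent = []

    def fake_write(port: int, reg: int, value: int) -> bool:
        sent.append((port, reg, value))  # record the transfer
        return True                      # always acknowledge

    state, written = program_registers(rom, fake_write)
    print(state.name, written)  # DONE 4
```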
Figure 5 shows the data paths and block interactions inside the XO2-7000. The data from each camera crosses from that camera's clock domain into the output clock domain through the dual-port RAM buffers. The write enable on the left side of each line buffer is simply the line_valid signal provided by the camera. The window control block is a mux that selects the proper data for the HDMI transmitter. It is synchronized to the frame_valid input from the center camera, and all HDMI timing is synchronized to this signal.
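The window control mux can be sketched as a column-to-camera lookup gated by data enable. The column boundaries below are hypothetical, since the article does not say how the 1280-pixel output line is split between the three cameras, and each buffer is simply indexed by output column, a simplification of the real read addressing. Synchronization to the center camera's frame_valid is left out; the column and data enable would come from the output timing counters.

```python
from typing import Sequence

# Hypothetical left/center/right split of a 1280-pixel output line.
BOUNDARIES = (0, 427, 853, 1280)


def window_select(x: int, boundaries: Sequence[int] = BOUNDARIES) -> int:
    """Return the camera index (0 = left, 1 = center, 2 = right) for output column x."""
    for cam in range(len(boundaries) - 1):
        if boundaries[cam] <= x < boundaries[cam + 1]:
            return cam
    raise ValueError(f"column {x} is outside the active line")


def window_mux(x: int, de: bool, buffers: Sequence[Sequence[int]]) -> int:
    """Pixel sent to the HDMI transmitter: selected camera's data when de is high, else 0."""
    if not de:
        return 0  # blanking interval: no pixel data
    return buffers[window_select(x)][x]


if __name__ == "__main__":
    bufs = [[0x111111] * 1280, [0x222222] * 1280, [0x333333] * 1280]
    print(hex(window_mux(1000, True, bufs)))  # 0x333333
```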
So what should the reader take away from this article? Basically, any problem can be broken down into smaller, more digestible blocks. Each block can then be analyzed for complexity and possibly simplified. In our example, we could have integrated the HDMI transmitter into the FPGA, but the cost and complexity of doing so would have pushed the device price beyond what the client wanted. One of the requirements was panoramic stitching, which effectively removed the need for frame buffer RAM. Using the Lattice I2C IP core cut down development and debug time. Finally, rather than spending more resources on timing and synchronization logic, we used the cameras’ configurable output timing to simplify this task. Attacking a problem with brute force and huge resources will usually get the job done, but at higher cost and effort. Careful planning and analysis always save time and money.