The Design and Simulation of Split Radix FFT Processor using Multi-Bit Flip-Flop for Power Reduction

B. Balaji¹ P. Syamala Devi²
¹PG Student ²Assistant Professor
1,2Department of Electronics & Communication Engineering
1,2Annamacharya Institute of Technology & Sciences, India

Abstract— Fast Fourier transform (FFT) is one of the most important and fundamental algorithm in digital signal processing area. Split radix fast Fourier transform (SRFFT) algorithm requires the least number of multiplications and additions among all the known FFT algorithms, which contribute to overall system power consumption. In the design of such processors, modified radix-2 Shared memory architecture can be used instead of Pipelined FFT architecture. In contrast Shared memory based architecture requires least amount of hardware resources at the expense slower throughput. To implement this Shared memory architecture, Multi-Bit FlipFlop (MBFF) and Superfast low power SRAM (SFLP SRAM) cell are required. The SRFFT can be computed by using a modified radix-2 butterfly unit. The butterfly unit exploits the multiplier gating technique to save dynamic power. In addition, two address generation algorithm are developed for both real and imaginary parts of twiddle factors. In simulation results the power is measured by T- Spice using Tanner tool. Hence the proposed architecture will save more power when it comes to larger points of FFT.

Key words: Address Generation, Low Power, Radix-2, Split-Radix Fast Fourier Transform (SRFFT), MBFF, SFLP SRAM, Twiddle Factors

I. INTRODUCTION

The fast Fourier transform (FFT) is an efficient algorithm that computes the discrete Fourier transform (DFT) of a sequence, or its inverse. Which reduces the number of computations. Since the discovery of FFT, many variants of the FFT algorithm have been developed, such as radix-2 and radix-4 FFT. A new variant of FFT called Split-Radix FFT (SRFFT) was developed by Duhamel and Hollman in the year 1984. Their algorithm requires the least number of multiplications and additions among all the FFT algorithms. Since arithmetic operations significantly contribute to overall system power consumption. The idea behind their algorithm is the application of a radix-2 index mapping to the even-index terms and a radix-4 mapping to the odd-index terms, which results in an “L” shaped butterfly. The signal flow graph of split-radix FFT is the same as radix-2 FFT except for the location and value of twiddle factors. Therefore address generation scheme for conventional radix-2 FFT algorithm could also be applied to SRFFT.

SRFFT is a good candidate for the implementation of a low-power FFT processor. In general, all the FFT processors can be categorized into two main groups: pipelined processors or shared-memory processors. A pipelined architecture provides high throughputs, but it requires more hardware resources at the same time. One or multiple pipelines are often implemented, each consisting of butterfly units and control logic. In contrast, the shared-memory-based architecture requires the least amount of hardware resources at the expense of slower throughput. In the radix-2 shared-memory architecture, the FFT data are organized into two memory banks. At each clock cycle, two FFT data are provided by memory banks and one butterfly unit is used to process the data. At the next clock cycle, the calculation results are written back to the memory banks and replace the old data. The scope of this brief is limited to the shared-memory architecture.

The FFT calculations are arranged into two general classes, to be specific, the decimation in-time (DIT) and the decimation in frequency (DIF) uncommonness calculations. The key contrasts between the two are appeared in Fig. If there should arise an occurrence of DIF calculation (Fig. (a)), the info tests need bit-inversion reordering before being handled, while the yield FFT coefficients are created in characteristic request.

Fig. 1: (a) DIF FFT butterfly (b) DIT FFT butterfly

II. RELATED WORK

A. FFT Algorithm

The conventional signal and image processing applications requires high computational power based on Fast Fourier Transform (FFT) in addition to the ability to choose the algorithm and architecture. The Fast Fourier Transform (FFT) converts a time based signal into its corresponding frequency based signal by manipulating the sum of orthogonal components. The time based signal spectrum is indicated by the phase and amplitude of those components. Inverse Fast Fourier Transform (IFFT) does the reverse process, thus converting the spectrum back to time signal. Every point of data present in the spectrum is called a bin.

B. Radix 2 FFT

When ‘N’ is a power of 2, say N=2^K, where K>1 is an integer, then the above DIT decomposition can be performed K-1 times, until each DFT is length 2. A length 2 DFT requires no multiplies. The overall result is called a radix 2 FFT. A different radix 2 FFT is derived by performing decimation in frequency.

A split radix FFT is theoretically more efficient than a conventional radix-2 algorithm because it minimizes real arithmetic operations. The term “split radix” refers to a DIT decomposition that combines portions of one radix 2 and two radix 4 FFTs. On modern general-purpose processors,
however, computation time is often not minimized by minimizing the arithmetic operation count.

Assume that we have \( N = 2^S \) point FFT, radix-2 FFT require \( S \) passes to finish the computation, as shown in Fig1.

III. PROPOSED METHOD

For split-radix FFT, it conventionally involves an L-shaped butterfly data path whose irregular shape has uneven latencies and makes scheduling difficult. In this brief, we show that the SRFFT can be computed by using a modified radix-2 butterfly structure. This butterfly structure uses the multi-Bit FlipFlop (MBFF) architecture in place of D-Flipflop (DFF) for increasing the speed of the architecture. The signal flow graph of proposed Split-radix FFT is shown in Fig 2.

A. Shared-Memory Architecture

The architecture of shared-memory processor is shown in Fig3. The FFT data and the twiddle factors are stored in the RAM and ROM banks, respectively. We observed that the flow graph of splitradix algorithm is the same as radix-2 FFT except for the locations and values of the twiddle factors and therefore, the conventional radix-2 FFT data address generation schemes could also be applied to SRFFT (RAM address generator). However, the mixed-radix property of SRFFT algorithm leads to the irregular locations of twiddle factors and forbids any conventional address generation algorithm (ROM address generator).

B. Modified Radix-2 Butterfly Unit

In our previous work we proposed a modified butterfly unit which is shown in Fig. 4. The structure of this butterfly unit is determined by the fact that the SRFFT has multiplications of both the upper and lower legs. To prevent unnecessary switching activity, we put the clock gating registers in the multiplier path and a few registers are placed at the address port of memory banks to synchronize the whole design. The key to use this architecture is the knowing about which butterflies require no multiplications (the complex multipliers are then skipped), trivial multiplications (swapping), and nontrivial multiplications (using complex multipliers). In this proposed method, the modified radix-2 butterfly unit uses the Multi-Bit Flipflop (MBFF) in place of D-Flipflop (DFF) for reducing the dynamic power consumption and to speed up the processor.

C. SFLP SRAM

The complete schematic of the proposed Super-Fast Low-Power (SFLP) SRAM cell is shown in Fig5. The SFLP SRAM cell is divided into two circuits: write circuit and read circuit. The write circuit is similar to the 8T cell and it is used to perform the write operation. The read operation is performed by two transistors (N7 and N5) which are used to perform the read operation. Transistor N7 is used to decouple the cell node from the read bit line during write operation/sleep mode.

During write operation read word line (RWL) is set to low so that write circuit completely decouples from the read circuit. The write operation in the proposed cell is...
performed by setting the bitlines (BL and BLB) to the desired logic. During write operation, the proposed SRAM cell behaves as the 8T cell. In the SFLP cell, the switching behaviors of tail transistors are controlled by the respective logic on the storage nodes Q and nQ.

A read operation (RWL = 1, WL = 0) is performed by reading the data with help of the transistors N7 and N5. In read 1 operation, the transistor N5 is turned OFF, flips the node S to logic high, without allowing RBL to discharge which results lower power consumption. When node nQ stores data “1” (read 0 operation), transistor N5 turns ON. Now, the bitline (RBL) discharges through read access transistor N7 and pull down transistor N5. Since, storage nodes Q and nQ are completely isolated from bitlines during read operation; the voltage of the node which stores 0 is strictly maintained at the ground level.

**Fig. 5: SFLP SRAM Cell**

**D. Address Generation of Twiddle Factors**

The flow graph for the 32-point SRFFT is shown in Fig 2, there are two kinds of twiddle factors: $j$ and $W_n$. For those multiplications involving $j$ is called trivial multiplications, because these operations are essentially the swapping of the real and imaginary part of the multiplier, hence no multiplication is involved. For those multiplications involving $W_n$ are called nontrivial multiplications, because complex multipliers are used to complete these operations. In Fig. 2, each area surrounded by the dashed lines is called one L block which is formed by L butterflies in each pass and there are totally ten L blocks for a 32-point SRFFT.

![Fig. 5: SFLP SRAM Cell](image)

**Fig. 6: Pseudocode for tracking trivial and nontrivial twiddle factors**

**IV. RESULTS**

**A. MBFF**

**Fig. 7: Schematic Circuit for Multi-Bit FlipFlop**

**Fig. 8: Simulation Circuit for Multi-Bit FlipFlop**

Power consumption is an important issue in modern high frequency and low power designs. In this proposed method Multi-Bit FlipFlop (MBFF) is used to reduce the clock system power. The scaling with multiple supply voltage is an efficient way to minimize the dynamic power consumption.

**B. Block Diagram**

**Fig. 9: Simulation Block diagram of Shared memory based SRFFT processor**

**C. Power Results**

**Fig. 10: Power results**

**D. Delay Report**

**Fig. 11: Delay Report**
E. Simulation Results

Fig. 12: Simulation Results for Outputs corresponding inputs

In this paper, the proposed DIF SRFFT having normal inputs and produced corresponding Bit-Reversal Outputs.

V. CONCLUSION

In this paper, designed a low power Shared memory SRFFFT processor using Multi-Bit FlipFlop (MBFF) and SFLP SRAM. The proposed design reduces the dynamic power consumption at the expense of more hardware resources when compared to the both conventional radix-2 and radix-4 algorithms. Thus, it does not only achieve the minimum hardware requirement but also saves the power and increases the maximum clock rate at the same time. The Proposed design achieves over 30% lower power consumption in the computation of 32-Point DIF SRFFT. In Future, it was extend to computing a 1024-point complex-valued transform at low power, and it may save significant power for OFDM systems with long length FFT.

REFERENCES