# Data Encoding Techniques for Network on-Chip (NoC) Links

Nikhila P Kamath<sup>1</sup>, Dr. Baswaraj Gadgay<sup>2</sup>, Veeresh Pujari<sup>3</sup>

<sup>1,2,3</sup>VTU PG Center, Kalburagi, India

Abstract: As VLSI circuits become denser, every element of the system becomes more complex. As the number of cores embedded on a chip increases, the on-chip communication subsystem becomes more prominent. As VLSI technology shrinks to the nanometer scale, the links of the network on chip (NoC) dissipate more power, and this power starts to compete with the power dissipated by the other essential elements of the communication subsystem, namely the network interfaces and routers. Dynamic power dissipation in the links, caused by their switching activity, is the major contributor to the power consumption of the NoC; self-switching and coupling capacitance are its most important causes. A set of three encoding techniques is proposed to minimize the switching activity in the links. The proposed schemes reduce self-switching by examining the switching transitions, and then check the cross-coupling capacitance between the links to verify that the power consumed at the links of the network interface (NI) decreases. The proposed schemes can be employed without any modification to the link and router design. The schemes are tested for several inputs, and their effectiveness is verified by coding the design in Verilog HDL, synthesizing it in Xilinx ISE, and simulating it in the ISim simulator. The FPGA implementation is done on a Xilinx Spartan-3E kit. The proposed design reduces power dissipation and energy consumption without considerable performance degradation and with a small area overhead in the NI.

Key words: NoC, Data Encoding, Network Interface (NI)

## I. INTRODUCTION

The quality of communication in an on-chip system greatly influences the performance of the system. Rising complexity degrades the intercommunication system, which is a non-trivial issue to deal with. The complexity of each module rises quickly as VLSI circuits become denser. Complex systems called Multi-Processor Systems-on-Chip (MPSoCs) contain an increasing number of transistors (in the range of billions) operating at frequencies of several gigahertz (GHz). As semiconductor technology scales to the nanometer (nm) regime, the complexity of the wiring, that is, of the communication subsystem, also increases rapidly. The next generation of MPSoCs and chip multi-processors (CMPs) will have many-core architectures, and the communication subsystem becomes very important as the number of cores incorporated into a System on Chip (SoC) increases.

The basic interconnection scheme in a large SoC is bus-based. Its main disadvantages are bus arbitration and limited bandwidth. The scheme does not scale well to very large SoCs because of the many Intellectual Property (IP) blocks, the processing elements of the SoC, which are connected together to form a large system on chip and must contend for the shared bus in order to communicate. These drawbacks of the on-chip communication system constrain its operation and lead to higher power consumption in current and future generations of SoCs. Another major drawback of on-chip systems is the varied characteristics of the system components.

### A. Network On-Chip

MPSoCs and CMPs may contain hundreds or thousands of cores and require a high-performance interconnect to transfer data on-chip. Traditionally this has been accomplished with on-chip buses or multi-layer buses, but a bus-based interconnect limits the performance of the SoC design. In contrast, the Network on-Chip (NoC) has emerged as a suitable on-chip infrastructure. It provides a structured and scalable interconnection architecture within the chip and reduces the complexity of on-chip communication.

### B. Network Interfaces

The cores are connected to the on-chip network through a device called the Network Interface (NI). The NI provides a set of communication services, including address decoding and mapping, packetization of the core's messages, depacketization of received messages, packet reordering, and priority services. The end-to-end flow control implemented by the NI ensures that buffer space is available for messages entering the network.

## II. EXISTING ENCODING TECHNIQUE

Existing encoding techniques fall into two groups. The first group aims to minimize switching on the address buses; these methods exploit the repetitiveness of address values to reduce power dissipation. The second group aims to decrease the transitions on the data buses. Among these, bus-invert (BI) coding is a classical method that reduces power consumption by minimizing transitions. It was originally proposed to reduce the average and peak power due to switching on the bus in each bus cycle. The method can reduce the switching on the bus lines, but it does not eliminate the worst-case coupling, in which two adjacent lines switch in opposite directions. Although it is seldom used as a stand-alone method today, it is commonly used as a starting point for more complex and sophisticated schemes, or combined with more recent techniques, to save more power in bus-based communication systems.

The method is simple, and its properties are easy to observe. If the number of bits that change is more than half the bus width, the entire word is inverted before it is transmitted. An extra bit is transmitted along with the data to indicate whether the data on the bus is inverted, and the receiving end of the bus uses this bit to decode the data. The main shortcoming of this method is that only self transitions are reduced.
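The bus-invert rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation evaluated in this paper: the function names, the all-zero initial bus state, and the choice to compare only the data lines (ignoring the invert line itself) are assumptions made for the example.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words, width):
    """Encode a sequence of data words with bus-invert coding.

    Returns a list of (sent_word, invert_bit) pairs.
    Assumption: the bus starts in the all-zero state.
    """
    mask = (1 << width) - 1
    prev = 0  # value currently on the data lines (assumed initial state)
    encoded = []
    for word in words:
        word &= mask
        # Invert when more than half of the data lines would toggle.
        if 2 * hamming_distance(prev, word) > width:
            sent, invert = (~word) & mask, 1
        else:
            sent, invert = word, 0
        encoded.append((sent, invert))
        prev = sent  # the next comparison is against what was actually sent
    return encoded


def bus_invert_decode(encoded, width):
    """Recover the original words using the transmitted invert bit."""
    mask = (1 << width) - 1
    return [(~sent) & mask if invert else sent for sent, invert in encoded]
```

With an 8-bit bus, the sequence 0x00, 0xFF, 0xFE, 0x0F is sent as 0x00, 0x00, 0x01, 0x0F with invert bits 0, 1, 1, 0, and decoding returns the original sequence.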

## III. PROPOSED ENCODING TECHNIQUE

The main objective of the proposed methodology is to encode the flits before they are injected into the network, so that the self-switching activity and the coupling switching activity are minimized in the links that the flits traverse. Self-switching activity and coupling switching activity are in fact responsible for the link power dissipation. An end-to-end scheme is used in the proposed system. Wormhole switching is a convenient and suitable switching technique for on-chip communication, and the end-to-end encoding takes advantage of the pipelined nature of wormhole switching.
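The relation between the two kinds of switching activity and link power can be made concrete with a model that is commonly used in the NoC data-encoding literature. It is given here only as a hedged illustration; the symbols are not defined elsewhere in this paper and are assumptions of the sketch.

$$
P_{link} = \left[\, T_{0 \to 1}\,(C_s + C_l) + T_c\,C_c \,\right] V_{dd}^{2}\, F_{ck}
$$

Here $T_{0 \to 1}$ is the average number of self transitions from 0 to 1 per cycle, $T_c$ is the average (weighted) number of coupling transitions between adjacent wires per cycle, $C_s$ is the line-to-substrate capacitance, $C_l$ is the load capacitance, $C_c$ is the coupling capacitance between adjacent lines, $V_{dd}$ is the supply voltage, and $F_{ck}$ is the clock frequency. Under this model, an encoder lowers link power by reducing $T_{0 \to 1}$, $T_c$, or both.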



Fig. 1: A general representation of proposed approach

With wormhole switching, the same sequence of flits passes through all the links of the routing path. Hence, a power-saving encoding scheme applied at the NIs of the NoC yields the same power saving on every link, and the saving is uniform throughout the network. The encoder and decoder blocks are added to the NI as depicted in Fig. 1; the NI is augmented by these blocks in the proposed scheme.

The flits are encoded by the encoder at the source NI and decoded by the decoder block at the destination NI. This makes the scheme transparent to the underlying network and does not generate any overhead in the routers and links. The scheme can be employed in next-generation NoCs by upgrading the NI with the encoder and decoder blocks, without affecting the underlying network. The encoder encodes the outgoing flits of the packet, except for the header flit, in order to reduce the power dissipated in the router-to-router point-to-point links that make up the routing path of the packet through the network. The header flit is not encoded because it contains control information, such as the destination address, the packet size, and the routing logic, that is processed by the routers along the switching path, and the routers contain no encoder or decoder logic. At the destination NI, the decoder is designed so that the encoded input flits (all except the header flit) are decoded accurately and with little delay. The proposed scheme is designed for implementations without virtual channels (VCs).
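The end-to-end arrangement described above, in which the header flit travels unencoded and only the remaining flits pass through the encoder, can be sketched as follows. This is an illustrative sketch under stated assumptions: the 16-bit flit width, the header layout (destination in the upper byte), and the `toy_encode`/`toy_decode` pair (a plain bitwise complement) are hypothetical stand-ins and are not the three schemes proposed in this paper.

```python
FLIT_WIDTH = 16                  # assumed flit width for this example
FLIT_MASK = (1 << FLIT_WIDTH) - 1


def toy_encode(body):
    """Hypothetical placeholder encoder: complements every body flit."""
    return [(~flit) & FLIT_MASK for flit in body]


def toy_decode(body):
    """Inverse of toy_encode (complementing twice restores the flit)."""
    return [(~flit) & FLIT_MASK for flit in body]


def encode_packet(flits, encode_body=toy_encode):
    """Source NI: pass the header flit through, encode the rest."""
    header, body = flits[0], flits[1:]
    return [header] + encode_body(body)


def decode_packet(flits, decode_body=toy_decode):
    """Destination NI: pass the header flit through, decode the rest."""
    header, body = flits[0], flits[1:]
    return [header] + decode_body(body)


def header_destination(header):
    """Router-side view: read the destination from the unencoded header.

    Assumed layout: destination address in the upper 8 bits.
    """
    return (header >> 8) & 0xFF
```

Because the header is never transformed, a router on the path can call `header_destination` on the first flit of an encoded packet exactly as it would for an unencoded one, which is why no change to the routers is needed.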

## IV. RESULTS AND ANALYSIS

The three encoding schemes are written in Verilog HDL using Xilinx ISE 13.4. Each scheme is simulated individually for correctness in ISim, the simulator built into Xilinx ISE, and the three schemes are also simulated together after being integrated into a single top module. The simulation results for the individual schemes and for the integrated top module are presented in this section.



Fig. 2: Simulation result of scheme I encoder

When an input 'X' is applied to the scheme I encoder and the clock is forced, the encoder produces the encoded data at the terminal 'out'. The values of the Ty block and the majority voter are also shown in the output screen. The decoder decodes the encoded data, and its output can be seen at 'decoder out'. The encoder is verified for a set of 16-bit inputs, and the encoded output is also 16 bits wide; the signals shown are X[15:0], Out[15:0] and Decoder\_out[15:0]. Fewer transitions are seen at the output than at the input, so the link power can be reduced through the reduced switching factor.



Fig. 3: Simulation result of scheme II encoder

When a 32-bit input 'X' is applied to the scheme II encoder and the clock is forced, the encoder produces the encoded data at the output port 'out'. The values of the Ty block, T2 block, T4 block, ones block, and majority voter, together with the count values of these blocks, are also shown in the result window. The decoder decodes the encoded data, and its output can be seen at 'decoder out'. The encoder is verified for a set of inputs and produces an output with fewer transitions than the input. This scheme removes more transitions than scheme I. Hence, the link power can be reduced through the reduced number of transitions.



Fig. 4: Simulation result of scheme III encoder

When a 32-bit input 'X' is applied to the scheme III encoder and the clock is forced, the encoder produces the encoded data at the 'out' port. The values of the Ty block, T2 block, T4 block, Te block, ones block, and majority voter, together with the count values of these blocks, are also shown in the result window. The output screen also indicates whether the input is half inverted or fully inverted by setting the corresponding signal to one. The decoder decodes the encoded data, and its output can be seen at 'decoder out'. The encoder is tested for a set of inputs and produces an output with fewer transitions than the input. This scheme removes more transitions than scheme I and scheme II. Hence, the link power can be reduced through the reduced number of transitions.



Fig. 5: Simulation result of top encoder

All three schemes are integrated into a single top module. The inputs and outputs are as shown in the simulation results. The top module makes it easier to analyze the outputs of all three schemes simultaneously. The encoder is tested and verified for a set of inputs, and the results show a reduction in the number of transitions at the output compared with the input. From this we can infer that link power reduction can be achieved by employing any of these schemes, as the application requires.

Device Utilization Summary (estimated values)

| Logic Utilization          | Used | Available | Utilization |
|----------------------------|------|-----------|-------------|
| Number of Slices           | 314  | 4656      | 6%          |
| Number of Slice Flip Flops | 64   | 9312      | 0%          |
| Number of 4 input LUTs     | 611  | 9312      | 6%          |
| Number of bonded IOBs      | 97   | 232       | 41%         |
| Number of GCLKs            | 1    | 24        | 4%          |

Fig. 6: Summary of device utilization

The comparative study of all three schemes is done by testing the top module for various inputs. The efficiency of the schemes is calculated individually by noting the number of transitions before encoding and after encoding.
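The efficiency calculation described above, counting transitions before and after encoding, can be sketched as follows. This is an illustrative sketch rather than the measurement procedure used for the reported results: the function names are hypothetical, and the coupling count covers only the case of two adjacent wires toggling in opposite directions, which is an assumption of this example.

```python
def self_transitions(words):
    """Total number of bit toggles between consecutive words on the link."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))


def opposite_coupling_transitions(words, width):
    """Count adjacent wire pairs that toggle in opposite directions.

    Assumption: only the opposite-direction case (01 -> 10 or 10 -> 01
    on neighbouring wires) is counted as a coupling transition here.
    """
    count = 0
    for prev, cur in zip(words, words[1:]):
        for i in range(width - 1):
            p_lo, p_hi = (prev >> i) & 1, (prev >> (i + 1)) & 1
            c_lo, c_hi = (cur >> i) & 1, (cur >> (i + 1)) & 1
            # Both wires toggle and end up at different values,
            # so they moved in opposite directions.
            if p_lo != c_lo and p_hi != c_hi and c_lo != c_hi:
                count += 1
    return count


def reduction_percent(before, after):
    """Percentage of transitions removed by the encoding."""
    if before == 0:
        return 0.0
    return 100.0 * (before - after) / before
```

For example, if an input sequence shows 8 self transitions and the encoded sequence shows 2, `reduction_percent(8, 2)` gives 75.0. For a fair comparison, any extra control wires added by an encoder would also need to be included in the encoded words.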

## V. CONCLUSION

In this work, a set of new data encoding techniques aimed at reducing the power dissipated by the links of a NoC is presented. The links of an on-chip communication subsystem are responsible for a significant part of the total power consumption of the communication system, and their contribution is expected to increase in future deep sub-micrometer technology nodes. The underlying principle of the proposed schemes is to reduce not only the self-switching activity but also, and especially, the coupling switching activity, which is primarily responsible for link power dissipation in deep sub-micrometer technology nodes. The proposed encoding techniques are transparent with respect to the underlying NoC architecture, in the sense that they require no change in either the routers or the links. The proposed encoder and decoder logic is added to the NI with little area overhead. Power consumption is reduced because encoding lowers the number of transitions in the input flits; this in turn limits the switching activity and thus the dynamic power dissipation. The proposed encoder and decoder modules are described in Verilog HDL, synthesized at the RTL level, and simulated in Xilinx ISE 13.4 with ISim (the built-in Xilinx simulator). The FPGA implementation is done on a Xilinx Spartan-3E kit. The comparative study shows that scheme III performs better than the other two schemes, depending on the application.

## VI. FUTURE WORK

In addition to the data encoding and decoding schemes, which lower the self and coupling switching activity factors, data can be transmitted using low-swing signals in order to reduce energy consumption. An error control coding (ECC) technique can then be used to maintain communication reliability at the low voltage swing of the data signals.


