Figure 2. New three levels encoding technique
Figure 1. Block diagram of the proposed SerDes transceiver
A 12Gbps All Digital Low Power SerDes Transceiver
for On-Chip Networking
Sally Safwat, Ezz El-Din Hussein, Maged Ghoneima, and Yehea Ismail
Nano-electronics Integrated Systems Center (NISC)
Northwestern University/ Nile University
Abstract: In this paper, a new self-timed signaling technique for reliable low-power on-chip SerDes (Serialization and DeSerialization) links is presented. The transmitter serializes 8 parallel bits at 1.5GHz, and multiplexes the 12Gbps serial data stream with a 24GHz clock on a single line using three-level signaling. This new signaling technique enables the receiver to recover the clock from the data with simple phase detector circuitry. Moreover, this technique is insensitive to jitter accumulated during signal propagation or at the receiver input, because the clock signal is extracted from the multiplexed data stream. Hence, timing errors in the received signal appear in both the data and the extracted clock, and the data is still sampled correctly. The SerDes transceiver was implemented for a 3mm long lossy on-chip differential transmission line in 65nm TSMC CMOS technology. The primary advantages of building an all-digital SerDes transceiver are the ease of scaling with technology and the reduction in power and area. The total power consumed in the Tx/Rx pair with the transmission line is 15.5mW, which is well below that of similar published signaling architectures.
I. INTRODUCTION
In future applications, hundreds or possibly thousands of cores are expected to be integrated on a single chip. Network-on-Chip communication is based on passing data from one core to another. This requires an efficient communication network infrastructure that meets the specifications of low-power, high-speed and reliable data transmission. Transmitting data over a conventional parallel bus across long on-chip distances is no longer suitable in terms of power, area and reliability. Interconnects do not scale with the technology, which puts considerable pressure on wiring complexity and occupied area. Timing errors due to jitter and skew on the parallel bus make receiver synchronization very hard and limit the bandwidth. Crosstalk, noise and coupling from adjacent lines limit the bandwidth as well.
One promising solution is to replace the parallel bus with serial links that use SerDes transceivers [1], [2] and [3]. At low frequencies, the transmission line behaves as an RC interconnect. Repeaters must then be inserted along the interconnect to restore the signal, which increases area and power. Serializing the data allows high-frequency signals to be transmitted, so the link benefits from the transmission line characteristics [4], which are controlled by the lateral dimensions of the line.
This paper proposes a new signaling technique that further reduces the power consumption and routing resources of SerDes links. The data and the clock are multiplexed using a new three-level coding technique shown in Figure 2. This coding technique enables recovering the clock from the data at the receiving side with simple circuitry. The coding also maintains the signal DC level at V DD /2 independent of the data pattern, eliminating the need for equalization circuits [1]. Furthermore, this signaling technique eliminates the need to send the clock over an extra wire, or to use a complex conventional Clock Data Recovery (CDR) circuit or a Phase Locked Loop (PLL) at the receiver side. Accordingly, the proposed signaling technique results in lower area, power consumption and metal track usage compared to other conventional SerDes designs [1], [2] and [3]. Moreover, this technique is insensitive to jitter accumulated during signal transmission because the clock signal is extracted from the data stream. Hence, any timing errors in the received signal will also be reflected in the extracted clock, and the data will be sampled correctly.
This work was partially supported by the Middle East Energy Efficiency Center – Intel, and the Egyptian Science and Technology Development Fund.
The proposed SerDes architecture including the transmitter and receiver is presented in section II. Transmission line design and signaling is presented in section III. Results of a 65nm TSMC implementation of the technique, in addition to a comparison with other conventional SerDes designs are presented in section IV. Finally, section V presents the conclusions.
II. SERDES TRANSCEIVER
The system design is presented in this section. Figure 1 shows the block diagram of the proposed architecture. The system consists of a transmitter, a receiver and a lossy on-chip wire that will be used as a transmission line. The transmitter serializes 1.5GHz 8-bit parallel data, generates the differential three-level code shown in Figure 5, and drives the lossy transmission line. The receiver decodes the three-level coded data, and extracts the clock from the received signal using a simple phase detector. A deserializer is then used to recover the 8-bit parallel data from the 12Gbps serial data stream.
A. Transmitter
The transmitter consists of a 24GHz differential ring oscillator, a serializer and an encoder.
Figure 3 shows the block diagram of the serializer. Double-edge-triggered flip-flops are used to serialize the data [5]. The flip-flops were implemented using C2MOS (Clocked CMOS) registers [6], which are insensitive to clock overlap. A three-stage differential digitally controlled ring oscillator is used to generate a 24GHz clock; this clock is later divided to serialize the 1.5GHz 8-bit parallel data. Any jitter in the ring oscillator will be de-emphasized after the clock division. The three-level encoder is shown in Figure 4. Transmission gates were used to produce the three-level code and were sized to perform the source matching. Figure 5 shows the internal signals of the three-level encoder, and the transmission line input and output signals.
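The transmit path described above can be illustrated with a small behavioural sketch in Python. It is not the authors' circuit: the symbol values +1, 0 and -1 (standing for excursions above the mid level, the mid level itself, and excursions below it), the four-slot shape assumed for each bit, the MSB-first bit order and all function names are assumptions made for illustration, since the exact waveform of Figure 2 is not reproduced in the text.

```python
from typing import Iterable, List

# Assumed symbol alphabet: +1 / -1 are excursions above / below the mid
# level, and 0 is the mid level itself (V DD/2 in the paper's terms).
ONE_PATTERN = (+1, 0, -1, 0)   # assumed shape of a '1' bit
ZERO_PATTERN = (-1, 0, +1, 0)  # assumed shape of a '0' bit


def serialize(words: Iterable[int], width: int = 8) -> List[int]:
    """Flatten parallel words into a serial bit list (MSB first, assumed)."""
    bits = []
    for word in words:
        for i in range(width - 1, -1, -1):
            bits.append((word >> i) & 1)
    return bits


def encode_three_level(bits: Iterable[int]) -> List[int]:
    """Map every bit to a pair of opposite excursions around the mid level."""
    line = []
    for bit in bits:
        line.extend(ONE_PATTERN if bit else ZERO_PATTERN)
    return line


# 8 parallel bits at 1.5 GHz give the serial data rate quoted in the text.
PARALLEL_RATE_GHZ = 1.5
serial_rate_gbps = 8 * PARALLEL_RATE_GHZ

bits = serialize([0xA5, 0xFF, 0x00])
line = encode_three_level(bits)

# Average level relative to mid-rail; zero means the DC level stays at
# the mid level regardless of the data pattern.
dc_offset = sum(line) / len(line)
print(serial_rate_gbps, dc_offset)
```

Under these assumptions every bit contributes one excursion above and one below the mid level, so the average of the encoded stream stays at the mid level even for the all-ones and all-zeros words, which is the property the paper relies on to avoid equalization.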
B. Receiver
The receiver consists of skewed inverters, a phase detector and a deserializer. The two inverters at the receiver input are skewed with a low switching threshold to detect [...] V DD and generate two signals, A and B (active-low signals). This decoding technique represents each ‘1’ bit in the received data by a pulse on A followed by a pulse on B, and represents each ‘0’ by a pulse on B followed by a pulse on A.
The phase detector is used to reconstruct the data and the clock from these pulses, as shown in Figure 6. This phase detector works as a three-state machine, starting with both QA and QB set to ‘1’, which represents the reset state. When an active-low (‘0’) pulse is received on A, the phase detector resets QA and then waits for a pulse on B to set it back to ‘1’. QB is handled in the same way when the phase detector receives a pulse on B followed by a pulse on A. An SR latch sets the output using QA and resets it using QB. An OR gate is used to extract the 12GHz output clock. Figure 7 shows the output waveforms of the skewed inverter, the phase detector, and the extracted 12GHz clock. The deserializer is simply a shift register that converts the 12Gbps serial data stream to 8-bit parallel data at a clock frequency of 1.5GHz.
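The behaviour of the phase detector and deserializer described above can be sketched as a small state machine in Python. This is a behavioural illustration only, not the gate-level circuit of Figure 6: pulses are modelled as discrete 'A'/'B' events, the function names are hypothetical, the MSB-first word order is an assumption, and the handling of an unexpected pulse (ignoring it) is also an assumption because the text does not describe that case.

```python
from typing import Iterable, List


def bits_to_pulses(bits: Iterable[int]) -> List[str]:
    """'1' -> pulse on A then B; '0' -> pulse on B then A (as in the text)."""
    pulses = []
    for bit in bits:
        pulses.extend(["A", "B"] if bit else ["B", "A"])
    return pulses


def phase_detector(pulses: Iterable[str]) -> List[int]:
    """Recover bits from a stream of 'A'/'B' pulse events."""
    qa, qb = 1, 1   # reset state: both QA and QB are '1'
    data = 0        # SR latch output: set via QA, reset via QB
    recovered = []
    for pulse in pulses:
        if pulse not in ("A", "B"):
            raise ValueError(f"unknown pulse {pulse!r}")
        if qa == 1 and qb == 1:
            # Reset state: the first pulse of a bit decides its value.
            if pulse == "A":
                qa, data = 0, 1
            else:
                qb, data = 0, 0
        elif qa == 0 and pulse == "B":
            qa = 1                  # back to the reset state
            recovered.append(data)  # one recovered clock cycle per bit
        elif qb == 0 and pulse == "A":
            qb = 1
            recovered.append(data)
        # Any other pulse is ignored (assumption; not covered by the text).
    return recovered


def deserialize(bits: List[int], width: int = 8) -> List[int]:
    """Shift-register model: group serial bits into words (MSB first, assumed)."""
    words = []
    for start in range(0, len(bits) - width + 1, width):
        word = 0
        for bit in bits[start:start + width]:
            word = (word << 1) | bit
        words.append(word)
    return words


sent = [1, 0, 1, 1, 0, 0, 1, 0]
received = phase_detector(bits_to_pulses(sent))
words = deserialize(received)
print(received, words)
```

Because each bit is completed by the second pulse of its pair, the sampling instant moves together with the received pulses, which is the behavioural counterpart of the jitter insensitivity claimed for the scheme.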
III. TRANSMISSION LINE AND SIGNALING
At low frequencies, on-chip interconnects behave as RC interconnects that introduce a low-pass effect on the signal. At higher frequencies, the inductance effect becomes significant, enabling high-frequency signals to travel at the speed of light. These high-frequency signals still suffer from attenuation along the line due to the resistance of the interconnect (lossy transmission line) [4]. However, this attenuation is constant across the different frequency components, so the eye of the signal is preserved.
Figure 3. Serializer
Figure 4. Three levels encoder
Figure 5. The encoder internal signals and the transmission line input/output signals
Random data streams cover the entire frequency spectrum, and low-frequency components will introduce large distortion to the signal. Equalization is required to suppress this distortion, using either a Feed-Forward Equalizer (FFE) at the transmitter or a Decision Feedback Equalizer (DFE) at the receiver [1]. In this design, three-level coding is used to avoid these complex and power-hungry equalizers.
Another issue that should be considered, especially with high-frequency signals, is signal reflection, which requires matching the transmission line to avoid signal distortion. In this design, source matching is used instead of matching at the receiver side. This makes use of the transmission gate (driver) resistance and benefits from the signal reflection at the receiver side, which doubles the amplitude of the received signal.
Figure 8 shows the cross section of the transmission line and the parasitic parameters. From the parasitic values of the interconnect, the attenuation constant is α = 0.313 mm^-1 [4], the characteristic impedance is Z0 = 64Ω and the propagation speed is 37.6 mm/ns. Hence, the propagation time of the signal through the 3mm long transmission line is 80ps, which can be seen in Figure 5.
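The propagation time quoted above follows directly from the line length and the propagation speed. The short Python check below reproduces that arithmetic and, as an added point of comparison that is not stated in the paper, relates it to the bit period at 12Gbps; the variable names are illustrative.

```python
# Values taken from the text.
LINE_LENGTH_MM = 3.0
PROPAGATION_SPEED_MM_PER_NS = 37.6
DATA_RATE_GBPS = 12.0

# Time of flight along the line, converted from ns to ps.
delay_ps = LINE_LENGTH_MM / PROPAGATION_SPEED_MM_PER_NS * 1000.0

# Duration of one bit at the serial data rate, in ps.
bit_period_ps = 1000.0 / DATA_RATE_GBPS

print(f"propagation delay: {delay_ps:.1f} ps")
print(f"bit period:        {bit_period_ps:.1f} ps")
```

The computed delay rounds to the 80ps given in the text, which is slightly less than one bit period at this data rate.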
IV. SIMULATION RESULTS
The proposed SerDes architecture was implemented in TSMC 65nm technology and simulated at a supply of 1V. The line drivers are designed to match the characteristic impedance of the transmission line. When perfect matching is achieved at the source (line driver), the swing at the transmission line input is expected to be halved, i.e. a 500mV swing instead of a 1V swing. Figure 9 shows the eye diagram for the input signal of the transmission line, where the swing is reduced to between 500 and 600mV. A transmission line with a length of l = 3mm and an attenuation constant α = 0.313 mm^-1 results in a total attenuation of (1 - e^(-αl)) ≈ 60% of the signal amplitude. Because there is no matching at the receiver side (only a small capacitive load), the reflection doubles the signal amplitude, so the net attenuation is only (1 - 2e^(-αl)) ≈ 20% of the signal amplitude at the input of the transmission line. Figure 10 shows the eye diagram for the output signal of the transmission line.
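The two attenuation figures can be checked numerically from the stated line length and attenuation constant. The Python sketch below does so; the final estimate of the received swing assumes an ideal 500mV launched swing and an ideal doubling at the unmatched receiver, which are idealizations taken from the description rather than simulated values.

```python
import math

# Values taken from the text.
ALPHA_PER_MM = 0.313
LENGTH_MM = 3.0
LAUNCHED_SWING_MV = 500.0  # ideal source-matched swing (half of 1 V)

# Fraction of the amplitude that survives one pass along the line.
surviving = math.exp(-ALPHA_PER_MM * LENGTH_MM)

# Attenuation without and with the doubling caused by the reflection
# at the unmatched (capacitive) receiver end.
one_way_loss = 1.0 - surviving
loss_with_reflection = 1.0 - 2.0 * surviving

# Idealized received swing under the same assumptions.
received_swing_mv = LAUNCHED_SWING_MV * 2.0 * surviving

print(f"one-way attenuation:        {one_way_loss:.1%}")
print(f"attenuation with doubling:  {loss_with_reflection:.1%}")
print(f"idealized received swing:   {received_swing_mv:.0f} mV")
```

The results are close to 61% and 22%, which the text rounds to 60% and 20%.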
Although there is a significant amount of jitter in the received signal shown in Figure 10, the data was received and sampled correctly. Figure 11 shows the eye diagram of the extracted 12GHz clock, and Figure 12 shows the eye diagram of the extracted 12Gbps data stream.
Figure 6. Phase Detector
Figure 7. Output waveforms of the phase detector
Table 2. Design Comparison
               This work   [1]         [2]
Technology     65nm        130nm       130nm
Core voltage   1V          1.2V        1.2V
Data rate      12Gb/s      3.125Gb/s   9Gb/s
Power          15.5mW      100mW       600mW
Table 1. Power Consumption
Block   Component         Power
Tx      Ring oscillator   4.5mW
Tx      Serializer        4mW
Tx      Line driver       4mW
Rx      Phase detector    2.5mW
Rx      Deserializer      0.5mW
Figure 8. Cross section of the on-chip transmission line
Figure 9. Eye diagram for the input signal of the transmission line
Figure 10. Eye diagram for the output signal of the transmission line
Figure 11. Eye diagram of the extracted clock
Figure 12. Eye diagram for the extracted data
Table 1 shows the power consumption of each block of the design, and Table 2 compares this work with [1] and [2]. The total power consumption is much lower, because the proposed design eliminates power-hungry blocks such as PLLs, FFEs, CDRs and DFEs, as well as the repeaters along the transmission line.
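As a cross-check of the two tables, the Python sketch below sums the per-block figures of Table 1 and normalizes the power values of Table 2 by data rate. The energy-per-bit metric is not reported in the paper; it is derived here only to put designs with different data rates on a common scale, and it does not account for differences in technology node, supply voltage or link length.

```python
# Per-block power from Table 1, in mW.
TABLE1_MW = {
    "Ring oscillator": 4.5,
    "Serializer": 4.0,
    "Line driver": 4.0,
    "Phase detector": 2.5,
    "Deserializer": 0.5,
}

# (power in mW, data rate in Gb/s) from Table 2.
TABLE2 = {
    "This work": (15.5, 12.0),
    "[1]": (100.0, 3.125),
    "[2]": (600.0, 9.0),
}

total_mw = sum(TABLE1_MW.values())

# mW divided by Gb/s equals pJ per bit (1e-3 W / 1e9 bit/s = 1e-12 J/bit).
energy_pj_per_bit = {
    name: power_mw / rate_gbps
    for name, (power_mw, rate_gbps) in TABLE2.items()
}

print(f"Table 1 total: {total_mw} mW")
for name, energy in energy_pj_per_bit.items():
    print(f"{name}: {energy:.2f} pJ/bit")
```

The block powers add up to the 15.5mW total quoted in the abstract and in Table 2.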
V. CONCLUSIONS
A new self-timed signaling technique was proposed in this paper and shown in simulation to consume less power than the compared works [1], [2]. The new three-level encoding technique enables recovering the clock from the data, which eliminates the need to send the clock over an extra wire or to use complex, power-hungry blocks such as PLLs and CDRs. This leads to power and area savings and reduces the routing complexity. The proposed technique enables implementing the whole architecture using simple digital circuits.
An all-digital SerDes is very attractive when technology scales, as it shortens the design cycle. The only limitation of such high-voltage-swing CMOS digital circuits is the maximum achievable data rate.
REFERENCES
[1] T. Geurts, W. Rens, J. Crols, S. Kashiwakura, and Y. Segawa, "A 2.5 Gbps - 3.125 Gbps multi-core serial-link transceiver in 0.13 μm CMOS," Proceedings of the 30th European Solid-State Circuits Conference (ESSCIRC 2004), pp. 487-490, 21-23 Sept. 2004.
[2] J. Young, J. Kang, S. Park, and M. Flynn, "A 9Gbit/s serial transceiver for on-chip global signaling over lossy transmission lines," Custom Integrated Circuits Conference (CICC) 2008, pp. 347-350.
[3] M. Harwood, N. Warke, "A 12.5Gb/s SerDes in 65nm CMOS using a baud-rate ADC with digital receiver equalization and clock recovery," IEEE International Solid-State Circuits Conference (ISSCC) Dig. Tech. Papers, pp. 611-613, June 2007.
[4] E. E. Hussein and Y. I. Ismail, "A novel variation insensitive clock distribution methodology," Proceedings of 2010 IEEE International Symposium on Circuits and Systems (ISCAS), May 2010, pp. 1743-1746.
[5] R. A. Philpott, J. S. Humble, R. A. Kertis, K. E. Fritz, B. K. Gilbert and E. S. Daniel, "A 20Gb/s SerDes transmitter with adjustable source impedance and 4-tap feed-forward equalization in 65nm bulk CMOS," IEEE Custom Integrated Circuits Conference (CICC) 2008, pp. 623-626.
[6] Y. Suzuki, K. Odagawa, and T. Abe, "Clocked CMOS calculator circuitry," IEEE Journal of Solid-State Circuits, vol. SC-8, December 1973, pp. 462-469.