
Best implementations of quaternary adders

Daniel Etiemble Computer Science Laboratory (LRI)
Paris Saclay University
Orsay, France
de@lri.fr
Abstract

The implementation of a quaternary 1-digit adder composed of a 2-bit binary adder, quaternary-to-binary decoders and binary-to-quaternary encoders is compared with several recent implementations of quaternary adders. This simple implementation outperforms all other implementations using only one power supply. It is equivalent to the best other implementation using three power supplies. Since the best quaternary adder is built around a 2-bit binary adder, the interface circuits between quaternary and binary levels are pure overhead compared to the binary adder. This result shows that the quaternary approach for adders uses more transistors, more chip area and more power than the corresponding binary approach.

I Introduction

Many designs of quaternary adders have been proposed in recent years. Most of these papers are based on simulations using parameters of CNTFET technology. The most significant recent ones are [1], [2] and [3]:

  • [1] only uses one power supply.

  • The quaternary half adder presented in [2] uses 3 power supplies, although the technique used to obtain the intermediate power supplies is not specified.

  • [3] presents both single-supply and 3-supply versions.

In this paper, we propose a new design of quaternary adders using the same assumptions as in these three papers. This design leads to the most efficient implementation in terms of transistor count.

II Methodology

II-A Why CNTFET technology?

This technology uses field-effect transistors with a single carbon nanotube or an array of carbon nanotubes as the channel material, instead of bulk silicon as in traditional MOSFETs. MOSFET-like CNTFETs with p and n types look the most promising. The technology has advantages and drawbacks:

  • CNTFETs have variable threshold voltages (according to the inverse function of the diameter). This is a big advantage compared to CMOS for which different masks are needed to get different threshold voltages.

  • Among the advantages, high electron mobility, high current density and high transconductance can be cited.

  • Lifetime issues, reliability issues, difficulties in mass production and production costs are quoted as disadvantages.

  • CNTFET technology is far from mature. In 2019, a 16-bit RISC microprocessor was built with 14,000 CNTFETs [4]. While this is an advance for CNTFET technology, we may observe that the Intel 8086 CPU, also a 16-bit microprocessor, was launched in 1978 with 29,000 transistors, more than 40 years ago!

However, as CMOS circuits and CNTFET ones have basically the same circuit styles, CNTFETs can be used to propose a new implementation of quaternary adders and compare it with previous published proposals.

II-B Comparing different implementations of quaternary adders

The transistor count is used to compare different implementations of quaternary adders. As comparisons are done by using the same technology and the same operators, the transistor count is significant as it is very doubtful that more transistors could lead to:

  • less interconnects

  • reduced chip area

  • reduced power dissipation

  • reduced propagation delays

  • Etc.

III Quaternary circuits

III-A Four different levels

While binary circuits have 0 and 1 levels, quaternary circuits have four levels 0 < 1 < 2 < 3. The corresponding levels could be voltage, current or charge levels.

  • Charge levels. This approach is used in flash memories. 4-valued (MLC) flash memories store two bits per cell. 8-valued (TLC) memories store 3 bits per cell. In 2018, ADATA, Intel, Micron and Samsung launched SSD products using QLC NAND memory with 4 bits per cell. While binary flash memories have the advantage of faster write speeds, lower power consumption and higher cell endurance, M-valued flash memories provide higher data density and lower costs. But charge levels are not suitable for combinational circuits.

  • Current levels. Current levels have been used, but are no longer suitable because of the static power dissipation. Power dissipation is the main issue in today's integrated circuits.

  • Voltage levels. This is the only practical approach to design combinational circuits.

III-B Three or one power supplies

Refer to caption

Figure 1: 4 voltage levels with 3 power supplies [2]

Refer to caption

Figure 2: 4 voltage levels with 1 power supply [3]

Refer to caption

Figure 3: 4 voltage levels with 1 power supply [1]

The first approach to get four different voltage levels is to use three power supplies: $V_{dd}/3$, $2V_{dd}/3$ and $V_{dd}$. Fig. 1 presents a possible implementation using transmission gates. The true and complementary control inputs $S_{0}, S_{1}, S_{2}, S_{3}$ are used to transmit to the output one of the four voltage levels corresponding to 0, 1, 2 and 3. The 3 power supplies version of [3] uses the same scheme. The drawback of this approach is that it uses three voltage supplies instead of one in the binary case. The second approach uses only one power supply for levels 0 and 3 and generates levels 1 and 2 through resistor-like dividers. Fig. 2 shows a first implementation. There are four separate paths: only one should be active to get each output value. Transistors T1, T2, T5 and T6 are always on (resistor behavior). The inputs of the other transistors must be set to turn these transistors on or off.

  • Level 0 : T9 on ; T0, T3, T4, T7 and T8 off

  • Level 1 : T0 and T3 on ; T4, T7, T8 and T9 off

  • Level 2 : T4 and T7 on ; T0, T3, T8 and T9 off

  • Level 3 : T8 on ; T0, T3, T4, T7 and T9 off

Fig. 3 presents a variant of the previous one. Only one path with resistor-like transistors is used with two resistor-connected p and two resistor-connected n transistors. T6 is used to bypass T1 and T7 is used to bypass T4.

  • Level 0 : T9 on ; T0, T5, T6, T7 and T8 off

  • Level 1 : T0 and T7 on ; T5, T6, T8 and T9 off

  • Level 2 : T5 and T6 on ; T0, T7, T8 and T9 off

  • Level 3 : T8 on ; T0, T5, T6, T7 and T9 off

Both circuits are similar, with 10 transistors each. This approach has two drawbacks. Levels 1 and 2 generate static power dissipation. The resistors in paths 1 and 2 increase the RC loads and degrade switching times compared to paths 0 and 3.

III-C Encoder and decoder circuits

The encoder circuits can be derived from the circuits presented in Fig. 1, Fig. 2 and Fig. 3. The decoder circuits are easy to implement. They correspond to Table I, in which binary values are 0 and 3. NQI, IQI and PQI outputs are provided by 3 inverters having 3 different threshold levels. Fig. 4 shows the corresponding circuits presented in [1]. The situation is similar whether circuits use 3 or 1 power supplies. Appropriate threshold levels are obtained by choosing the chiral number of each transistor used in the inverter.

TABLE I: Truth table of decoder circuits
IN NQI IQI PQI
0 3 3 3
1 0 3 3
2 0 0 3
3 0 0 0
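
As a minimal behavioural sketch of Table I, the three decoder inverters can be modelled as threshold comparisons on the input level. The function name and the threshold values 0.5, 1.5 and 2.5 (in units of one logic level) are illustrative assumptions; in [1] the thresholds are set by the transistor chirality, not by explicit numbers.

```python
# Behavioural model of the NQI, IQI and PQI inverters of Table I.
# Assumption: each threshold sits midway between two adjacent logic levels.
HIGH, LOW = 3, 0

def decode_inverters(level):
    """Return (NQI, IQI, PQI) for a quaternary input level in 0..3."""
    nqi = HIGH if level < 0.5 else LOW   # switches between levels 0 and 1
    iqi = HIGH if level < 1.5 else LOW   # switches between levels 1 and 2
    pqi = HIGH if level < 2.5 else LOW   # switches between levels 2 and 3
    return nqi, iqi, pqi

table_i = {q: decode_inverters(q) for q in range(4)}
for q, outputs in table_i.items():
    print(q, *outputs)
```

Each printed row lists the input level followed by the three inverter outputs, in the same column order as Table I.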

Refer to caption

Figure 4: Decoder circuits presented in [1]

IV How to implement a quaternary adder

Table II shows the truth table of a 1-digit quaternary adder. There are different techniques to implement a quaternary 1-digit adder:

  • The simplest way is to use a 2-bit binary adder and to interface it with a 4-to-2 decoder and a 2-to-4 encoder. The corresponding adder is presented in section V.

  • The opposite approach is the direct implementation of Table II by using the general approach. A function f(inputs) is decomposed into f(inputs) = 3.f3 + 2.f2 + 1.f1 where f3, f2 and f1 are respectively the binary functions of the inputs for which the function has values 3, 2 and 1. f3, f2 and f1 include the NQI, IQI and PQI functions of input variables (Table I). This approach is used in the adder presented in section VI.

  • An intermediate approach uses multiplexers to implement subfunctions that can be derived from Table II. An example of subfunction is the successor function: When A = 1 and Ci = 0 then QS = (B+1) mod. 4. Two adders using this approach are presented in sections VII-A and VII-B.

From the 1-digit quaternary adder, N-digit quaternary carry propagate (CPA), carry lookahead (CLA) and carry skip (CSA) adders can be easily derived.

TABLE II: Truth table of a quaternary adder
A B Ci QS QC A B Ci QS QC
0 0 0 0 0 0 0 1 1 0
0 1 0 1 0 0 1 1 2 0
0 2 0 2 0 0 2 1 3 0
0 3 0 3 0 0 3 1 0 1
1 0 0 1 0 1 0 1 2 0
1 1 0 2 0 1 1 1 3 0
1 2 0 3 0 1 2 1 0 1
1 3 0 0 1 1 3 1 1 1
2 0 0 2 0 2 0 1 3 0
2 1 0 3 0 2 1 1 0 1
2 2 0 0 1 2 2 1 1 1
2 3 0 1 1 2 3 1 2 1
3 0 0 3 0 3 0 1 0 1
3 1 0 0 1 3 1 1 1 1
3 2 0 1 1 3 2 1 2 1
3 3 0 2 1 3 3 1 3 1
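
The truth table of the 1-digit quaternary adder follows directly from integer arithmetic: QS is the sum modulo 4 and QC is the carry into the next digit. The short sketch below regenerates the 32 rows; the function name is an illustrative choice.

```python
def quaternary_full_adder(a, b, ci):
    """Return (QS, QC) for quaternary digits a, b in 0..3 and a binary carry ci."""
    total = a + b + ci
    return total % 4, total // 4

# Print the 32 rows in the column order A, B, Ci, QS, QC.
for ci in (0, 1):
    for a in range(4):
        for b in range(4):
            qs, qc = quaternary_full_adder(a, b, ci)
            print(a, b, ci, qs, qc)
```

Since A + B + Ci is at most 7, the carry output is always binary (0 or 1).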

V Quaternary adders with quaternary to binary interfaces

The simplest way to implement a quaternary adder is to interface a 2-bit binary adder with quaternary-to-binary decoder and binary-to-quaternary encoder circuits. Table III presents the truth table of the quaternary to binary conversion. Binary values are 0 and 3.

V-A 4 to 2 decoder circuit

The decoder circuit is presented in Fig. 5. The circuitry is the same using 3 or 1 voltage supplies. It is based on the inverters 1, 2 and 3 with the different threshold levels (such as the inverters presented in Fig. 4) followed by usual binary gates. The number of transistors depends on the implementation of the XOR gate. It ranges from 16 T when using 4 Nand gates down to 3 T as proposed in [5] (Fig. 6). An acceptable value is 9 T, which corresponds to a conventional CMOS implementation used in [6]. This implementation doesn’t use pass transistors and has a full swing output. The overall transistor count for the decoder ranges from 28 T (most conservative implementation) down to 15 T, with 21 T as an acceptable value.

TABLE III: Truth table of decoder circuits
Q NQI IQI PQI X1 X0
0 3 3 3 0 0
1 0 3 3 0 3
2 0 0 3 3 0
3 0 0 0 3 3
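
The following sketch shows one possible gate-level reading of Table III. The choice X1 = NOT(IQI) and X0 = XNOR(NQI, IQI, PQI) is an assumption that matches the truth table; it is not necessarily the gate arrangement of Fig. 5.

```python
def q_to_b(q):
    """Map a quaternary level 0..3 to (X1, X0), with binary levels coded as 0 and 3."""
    # Outputs of the three threshold inverters, as Booleans (True = level 3).
    nqi, iqi, pqi = q < 1, q < 2, q < 3
    x1 = not iqi                  # high for Q = 2 and Q = 3
    x0 = not (nqi ^ iqi ^ pqi)    # high for Q = 1 and Q = 3 (assumed gate choice)
    return 3 * int(x1), 3 * int(x0)

for q in range(4):
    print(q, *q_to_b(q))
```

Other gate choices give the same table; the transistor count then depends on how the Xor-type gate is built.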

Refer to caption

Figure 5: Quaternary to Binary Decoders

Refer to caption

Figure 6: CNTFET 3T Xor

V-B 2 to 4 encoder circuits

The binary to quaternary encoder circuits depend on the technique that is used to generate the four output values.

V-B1 Encoder of Fig. 1

The encoder circuit corresponding to this approach is shown in Fig. 7. It uses 16 T.

Refer to caption

Figure 7: Binary to Quaternary Encoder

V-B2 Encoder of Fig. 2

The inputs of transistors T0, T3, T4, T7, T8 and T9 must be controlled. p transistors are on when the input is 0 and n transistors are on when the input is 1. The corresponding truth table is shown in Table IV. The corresponding equations are

  • $IT0=\overline{\overline{X1}.X0}=NAND(\overline{X1},X0)$

  • $IT3=NOT(IT0)$

  • $IT4=\overline{X1.\overline{X0}}=NAND(X1,\overline{X0})$

  • $IT7=NOT(IT4)$

  • $IT8=\overline{X1}+\overline{X0}=NAND(X1,X0)$

  • $IT9=\overline{X1}.\overline{X0}=NOR(X1,X0)$

4 NOT gates are needed ($\overline{X0}$, $\overline{X1}$, IT3 and IT7), together with 3 Nand and 1 Nor gates to control the inputs. The total transistor count is 8 (NOT) + 16 (Nand and Nor) + 10 (Fig. 2) = 34 T.

TABLE IV: Controlling transistors in Fig. 2
X1 X0 IT0 IT3 IT4 IT7 IT8 IT9
0 0 1 0 1 0 1 1
0 1 0 1 1 0 1 0
1 0 1 0 0 1 1 0
1 1 1 0 1 0 0 0
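
As a sanity check, the control equations for Fig. 2 can be evaluated for the four input codes and compared with the level list of section III-B. In this sketch, the p/n type of each controlled transistor (T0, T4, T8 as p-type; T3, T7, T9 as n-type) is inferred from that level list and is an assumption, as are the helper names.

```python
def nand(a, b): return 1 - (a & b)
def nor(a, b): return 1 - (a | b)
def inv(a): return 1 - a

def fig2_controls(x1, x0):
    """Gate inputs of the controlled transistors of Fig. 2 for binary X1, X0."""
    it0 = nand(inv(x1), x0)
    it4 = nand(x1, inv(x0))
    return {"IT0": it0, "IT3": inv(it0), "IT4": it4, "IT7": inv(it4),
            "IT8": nand(x1, x0), "IT9": nor(x1, x0)}

# Assumed transistor types: p-type conducts on 0, n-type conducts on 1.
P_TYPE = {"T0": "IT0", "T4": "IT4", "T8": "IT8"}
N_TYPE = {"T3": "IT3", "T7": "IT7", "T9": "IT9"}

def conducting(x1, x0):
    """Set of controlled transistors that are on for the input code (X1, X0)."""
    c = fig2_controls(x1, x0)
    on = {t for t, s in P_TYPE.items() if c[s] == 0}
    on |= {t for t, s in N_TYPE.items() if c[s] == 1}
    return on

for x1 in (0, 1):
    for x0 in (0, 1):
        print(x1, x0, sorted(conducting(x1, x0)))
```

Exactly one path conducts for each code: T9 for level 0, T0 and T3 for level 1, T4 and T7 for level 2, and T8 for level 3.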

V-B3 Encoder of Fig. 3

The inputs of transistors T0, T5, T6, T7, T8 and T9 must be controlled. p transistors are on when the input is 0 and n transistors are on when the input is 1. The corresponding truth table is shown in Table V. The corresponding equations are

  • $IT0=\overline{\overline{X1}.X0}=NAND(\overline{X1},X0)$

  • $IT5=X1.\overline{X0}=NOR(\overline{X1},X0)$

  • $IT6=NOT(IT5)$

  • $IT7=NOT(IT0)$

  • $IT8=\overline{X1}+\overline{X0}=NAND(X1,X0)$

  • $IT9=\overline{X1}.\overline{X0}=NOR(X1,X0)$

4 NOT gates are needed ($\overline{X0}$, $\overline{X1}$, IT6 and IT7), together with 2 Nand and 2 Nor gates to control the inputs. The total transistor count is 8 (NOT) + 16 (Nand and Nor) + 10 (Fig. 3) = 34 T.

TABLE V: Controlling transistors in Fig. 3
X1 X0 IT0 IT5 IT6 IT7 IT8 IT9
0 0 1 0 1 0 1 1
0 1 0 0 1 1 1 0
1 0 1 1 0 0 1 0
1 1 1 0 1 0 0 0

V-B4 Transistor count for encoder and decoder circuits for quaternary to binary interfaces

The transistor count is

  • 28/15 (decoder) + 16 (encoder) = 44/31 T for the first implementation (subsection V-B1), depending on the implementation of the Xor gate. The transistor count is the same for the 3 power supplies version of [3], which uses the same scheme as Fig. 1.

  • 28/15 (decoder) + 34 (encoder) = 62/49 T for the two single-supply implementations (subsections V-B2 and V-B3).

V-C 1-digit quaternary adder using a binary adder

There are many different ways to implement binary adders. They differ in whether or not they use transmission gates. It is out of the scope of this paper to present all the possible implementations. Fig. 8 presents two typical implementations of a full adder. The left part only uses Nand gates. The right part uses Xor and Nand gates. A CNTFET 8 T full adder (Fig. 9) has been presented in [5]. This adder doesn’t restore levels and using it could raise issues, both for noise margins and for switching times, due to the series of pass transistors. The transistor counts are respectively 36 T, 18 T and 8 T. The quaternary adder uses two binary full adders, one encoder and one decoder circuit. Using 2-bit carry propagate adders, the overall transistor count for the 3 power supplies version is thus:

  • 72 + 44 = 116 T without using pass transistors

  • 36 + 31 = 67 T when using pass transistors for Xor gates

  • 16 + 31 = 47 T when using pass transistors for Xor gates and the 8T binary adder (Fig. 9)

The single-supply version would use more transistors (+ 18 T).
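
The counts above can be reproduced with a few lines of arithmetic. The sketch follows the accounting used in this section (two binary full adders, one decoder, one encoder); the dictionary keys and the function name are illustrative.

```python
# Transistor counts taken from sections V-A, V-B and V-C.
FULL_ADDER = {"nand_only": 36, "xor_nand": 18, "8T": 8}
DECODER = {"nand_xor": 28, "pass_xor": 15}
ENCODER = {"3_supplies": 16, "1_supply": 34}

def qb_adder_count(full_adder_t, decoder_t, encoder_t):
    """1-digit quaternary adder = two binary full adders + decoder + encoder."""
    return 2 * full_adder_t + decoder_t + encoder_t

# (full adder style, decoder style) pairs used in the text.
options = [("nand_only", "nand_xor"), ("xor_nand", "pass_xor"), ("8T", "pass_xor")]

three_supplies = [qb_adder_count(FULL_ADDER[fa], DECODER[dec], ENCODER["3_supplies"])
                  for fa, dec in options]
one_supply = [qb_adder_count(FULL_ADDER[fa], DECODER[dec], ENCODER["1_supply"])
              for fa, dec in options]
print(three_supplies)
print(one_supply)
```

The 18 T difference between the two versions is simply the difference between the 34 T and 16 T encoders.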

Refer to caption

Figure 8: Binary full adders

Refer to caption

Figure 9: 8T binary full adder

V-D N-digit quaternary adders

Using quaternary interfaces and 2N-bit adders, N-digit quaternary adders can be implemented. CPA, CLA and CSA implementations are discussed in section VIII.

VI Quaternary adders presented in [1]

These adders are based on the following approach: $Qs(inputs)=3.f3(inputs)+2.f2(inputs)+1.f1(inputs)$, where fi(inputs) is the binary function for which Qs = i. Any input must be decomposed according to Table VI. The corresponding circuit is shown in Fig. 10. It uses 18 T. For the half adder, according to the left part of Table II, the equations are
$Sum=3.(A0.B3+A1.B2+A2.B1+A3.B0)+2.(A0.B2+A1.B1+A2.B0+A3.B3)+1.(A0.B1+A1.B0+A2.B3+A3.B2)$
$Carry=1.(A1.B3+A2.B2+A2.B3+A3.B1+A3.B2+A3.B3)$
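
The decomposition can be checked exhaustively. In the sketch below, Ak and Bk are read as one-hot indicators (Ak = 1 iff A = k), which is an assumption made for this check only, and the carry is taken as the OR of the six input pairs whose sum is at least 4.

```python
def half_adder(a, b):
    """Quaternary half adder written as Sum = 3.f3 + 2.f2 + 1.f1."""
    A = [int(a == k) for k in range(4)]   # one-hot indicators of A (assumed reading)
    B = [int(b == k) for k in range(4)]   # one-hot indicators of B
    f3 = A[0] & B[3] | A[1] & B[2] | A[2] & B[1] | A[3] & B[0]
    f2 = A[0] & B[2] | A[1] & B[1] | A[2] & B[0] | A[3] & B[3]
    f1 = A[0] & B[1] | A[1] & B[0] | A[2] & B[3] | A[3] & B[2]
    carry = (A[1] & B[3] | A[2] & B[2] | A[2] & B[3]
             | A[3] & B[1] | A[3] & B[2] | A[3] & B[3])
    return 3 * f3 + 2 * f2 + f1, carry

ok = all(half_adder(a, b) == ((a + b) % 4, (a + b) // 4)
         for a in range(4) for b in range(4))
print(ok)
```

At most one of f3, f2 and f1 is 1 for a given input pair, so the weighted sum never exceeds 3.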

The half adder circuit is presented in Fig. 11. With 2 input decoders, the sum circuit and the carry circuit, the transistor count is 87 T. The corresponding full adder presented in [1] has a quaternary carry input. While this could be useful for designing compressors used in multiplier reduction trees, it is not needed for usual N-digit adders, in which carry input and output have binary values. In Fig. 12, we present a modified version in which binary carries are used. The half adder implements the H function, defined as H = (A+B) mod 4. A modified half adder implements Sum = H + C. It uses the decoded values of the quaternary input H (provided by the Q-dec shown in Fig. 10) and the binary carry input. The corresponding scheme is shown in Fig. 13. With one Q-Dec (H), one NQI inverter + one binary inverter to generate C0 and C1, it has 8 T + 4 T + 28 T = 40 T while the sum part of the quaternary half-adder has 52 T. The carry generator circuit is based on the following observations:

  • $0\leq A+B+Cin\leq 7$

  • Cout = 0 iff A+B+Cin < 4 and Cout = 1 iff A+B+Cin > 3

  • Cout = 0 if (Cin = 0 and A+B < 4) or (Cin = 1 and A+B < 3)

  • The correspondence between H = (A+B) mod 4 and A+B is given in Table VII

  • From Table VII, $\overline{Cout}=H0.A0+H1.Ai+H2.A3+\overline{Cin}.\overline{H3}$

The corresponding carry generator circuit is shown in Fig. 14.
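
The carry equation can be verified over all 32 input combinations. The sketch assumes the decoded signals of Table VI: X0 is the NQI output (high iff X = 0), Xi the IQI output (high iff X ≤ 1), X3 the PQI output (high iff X ≤ 2), and H1, H2 are high iff H = 1 and H = 2 respectively. Function and variable names are illustrative.

```python
def cout_bar(a, b, cin):
    """Complement of the carry output, from the decoded values of H and A."""
    h = (a + b) % 4
    h0, h1, h2 = h == 0, h == 1, h == 2
    h3 = h <= 2        # PQI(H): low only when H = 3
    a0 = a == 0        # NQI(A)
    ai = a <= 1        # IQI(A)
    a3 = a <= 2        # PQI(A)
    return (h0 and a0) or (h1 and ai) or (h2 and a3) or ((not cin) and (not h3))

matches = all(bool(cout_bar(a, b, cin)) == (a + b + cin < 4)
              for a in range(4) for b in range(4) for cin in (0, 1))
print(matches)
```

The first three terms detect A+B < 3 from H and A alone; the last term covers A+B = 3, where the carry output equals the carry input.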

The complete modified quaternary adder has 52 T (sum part of QHA) + 40 T (sum part of modified QHA) + 19 T (carry circuit) = 111 T. This number is minimal, as the minimal number of Q-DEC is used, assuming that there are no fan-out or routing issues.

TABLE VI: Decoding of quaternary inputs [1]
I I0 I1 $\overline{I1}$ Ii $\overline{I2}$ I2 I3
0 3 0 3 3 3 0 3
1 0 3 0 3 3 0 3
2 0 0 3 0 0 3 3
3 0 0 3 0 3 0 0
TABLE VII: Carry out computation
Cin A+B H Cout Cin A+B H Cout
0 0 0 0 1 0 0 0
0 1 1 0 1 1 1 0
0 2 2 0 1 2 2 0
0 3 3 0 1 3 3 1
0 4 0 1 1 4 0 1
0 5 1 1 1 5 1 1
0 6 2 1 1 6 2 1

Refer to caption

Figure 10: Complete quaternary decoder presented in [1]

Refer to caption

Figure 11: Half adder presented in [1]

Refer to caption

Figure 12: Modified full adder from [1]

Refer to caption

Figure 13: Modified full adder- Sum circuit

Refer to caption

Figure 14: Modified full adder- Carry circuit

VII MUX based quaternary adders

The MUX based implementation is based on the observation of the quaternary half adder truth table (left part of Table II when Ci = 0). When A = 0, QS = B. When A = 1, QS = (B+1) mod 4 (successor of B). When A = 2, QS = (B+2) mod 4 (second level successor of B). When A = 3, QS = (B-1) mod 4 (predecessor of B).
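
A minimal behavioural sketch of this selection scheme: A acts as the select input of a 4:1 multiplexer whose data inputs are B, its successor, its second level successor and its predecessor. The function names are illustrative.

```python
def successor(b):
    return (b + 1) % 4

def second_successor(b):
    return (b + 2) % 4

def predecessor(b):
    return (b - 1) % 4

def mux_half_adder_sum(a, b):
    """Half adder sum: A selects one of four functions of B."""
    candidates = (b, successor(b), second_successor(b), predecessor(b))
    return candidates[a]

same = all(mux_half_adder_sum(a, b) == (a + b) % 4
           for a in range(4) for b in range(4))
print(same)
```

The predecessor is used for A = 3 because (B+3) mod 4 equals (B-1) mod 4.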

VII-A Quaternary adder derived from [2]

The quaternary half adder presented in [2] uses the decoder circuits and the muxes presented in Fig. 15. The QTG circuits are used to implement the successor and predecessor functions. Transistor counts for QDEC and QMUX are both 16 T. The half adder based on QDEC and QMUX is presented in Fig.16. The transistor counts are

  • For QS, there are 4 QTGs and 2 QDECs for a total of 16 T*6 = 96 T.

  • For QCarry, there are 6 inverters, 6 transistors and 1 QTG for a total of 12 + 6 + 16 = 32 T.

  • The half adder has 128 T.

The corresponding full adder is not presented in [2]. However, the full adder can be easily derived. The half adder (Fig. 16) corresponds to C=0. To compute QSUM1 corresponding to C=1, only two more QTGs are needed. The final sum is derived from QSUM0 and QSUM1 by using two transmission gates and one inverter. A similar technique is used to compute the carry output, as shown in Fig. 18. Only one more QTG and two transmission gates are needed. The overall transistor count for the full adder is 96 + 32 + 6 + 16 + 4 = 154 T.

Refer to caption

Figure 15: Decoders and Muxes from [2]

Refer to caption

Figure 16: Half adder presented in [2]

Refer to caption

Figure 17: Full adder sum output derived from [2]

Refer to caption

Figure 18: Full adder carry output derived from [2]

VII-B Quaternary adders presented in [3]

These adders also use MUXes, but implement the successor, second level successor and predecessor circuits as separate blocks. Basically, the half adder presented in Fig. 19 is similar to the half adder of Fig. 16. The corresponding full adder, presented in Fig. 20, also uses the same approach as the full adder of Fig. 17 and Fig. 18. Two versions are presented, with one and three power supplies. The different components are

  • QMUX 4:1 is shown in Fig. 21. It has 12 T.

  • QMUX (not shown) is simpler, with only 6 T.

  • The successor circuit with 3 power supplies is shown in Fig. 22. It has 6 T. The second level successor and predecessor circuits (not shown) also have 6 T each. The transistor count for the 3 circuits is 18 T.

  • The successor circuit with 1 power supply is shown in Fig. 23. It has 13 T. The second level successor and the predecessor circuits (not shown) have respectively 12 T and 17 T. The transistor count for the 3 circuits is 42 T.

  • Inverters are needed for $\overline{NQI(B)}$, $\overline{IQI(B)}$, $\overline{PQI(B)}$, $\overline{NQI(S\_QHA)}$, $\overline{IQI(S\_QHA)}$ and $\overline{PQI(S\_QHA)}$.

    • 3 power supplies: If the B inverters drive the different subblocks, the fan-outs are respectively 10, 6 and 8. Only 3 inverters (6 T) are needed, but there could be fan-out and routing issues. If different B inverters are used for each subblock, there are 12 inverters (24 T).

    • 1 power supply: If the B inverters drive the different subblocks, the fan-outs are respectively 11, 9 and 11. There are 3 inverters (6 T). With different inverters for each subblock, there are 12 inverters (24 T).

The overall transistor count is given in Table VIII. Obviously, the 3 power supplies version is more efficient than the version presented in [2]: customizing the implementation of the successor and predecessor functions reduces the transistor count versus using 4-valued MUXes. The 1-power supply version has far more transistors.

Refer to caption

Figure 19: Half adder presented in [3]

Refer to caption

Figure 20: Full adder presented in [3]

Refer to caption

Figure 21: QMUX 4:1 presented in [3]

Refer to caption

Figure 22: Three voltages successor circuit presented in [3]

Refer to caption

Figure 23: Single power supply successor circuit [3]
TABLE VIII: Transistor count for quaternary adder [3]
S-HA C-HA SFA CFA Inverters Total
3 supplies 30 14 12 20 6/24 82/100
1 supply 54 14 36 20 6/24 130/148

VIII Carry Look Ahead and Carry Skip Adders

We now compare the carry computation for 8-bit binary and 4-digit quaternary CLA and CSA adders. The binary computation is decomposed into two 4-bit blocks. The quaternary computation only uses one block.

VIII-A Carry-Look Ahead Adders

Fig. 24 presents a 4-bit carry look-ahead adder. The binary equations of the carry computation part are well-known:
$Gi=Ai.Bi$
$Pi=Ai\oplus Bi$ (or $Pi=Ai+Bi$)
$C1=G0+P0.C0$
$C2=G1+G0.P1+P0.P1.C0=G1+P1(G0+P0C0)$
$C3=G2+G1.P2+G0.P1.P2+P0.P1.P2.C0=G2+G1P2+P2P1(G0+P0C0)$
$C4=G3+G2.P3+G1.P2.P3+P1.P2.P3(G0+P0C0)$
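
The look-ahead equations can be checked against a bit-by-bit ripple carry computation. This is a minimal sketch; the function names are illustrative and bit lists are given least significant bit first.

```python
from itertools import product

def cla_carries(a_bits, b_bits, c0):
    """Carries C1..C4 from the expanded look-ahead equations."""
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # propagate
    c1 = g[0] | p[0] & c0
    c2 = g[1] | g[0] & p[1] | p[0] & p[1] & c0
    c3 = g[2] | g[1] & p[2] | g[0] & p[1] & p[2] | p[0] & p[1] & p[2] & c0
    c4 = (g[3] | g[2] & p[3] | g[1] & p[2] & p[3]
          | p[1] & p[2] & p[3] & (g[0] | p[0] & c0))
    return c1, c2, c3, c4

def ripple_carries(a_bits, b_bits, c0):
    """Reference carries computed one bit at a time."""
    carries, c = [], c0
    for a, b in zip(a_bits, b_bits):
        c = (a & b) | (c & (a ^ b))
        carries.append(c)
    return tuple(carries)

agree = all(cla_carries(a, b, c0) == ripple_carries(a, b, c0)
            for a in product((0, 1), repeat=4)
            for b in product((0, 1), repeat=4)
            for c0 in (0, 1))
print(agree)
```

The same check passes if the propagate term is defined with OR instead of XOR, since the two differ only when the generate term is already 1.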
Binary Gi and Pi functions are implemented respectively by Nand + Inverter and Nor + inverter. Both functions use 6 T. The optimal implementation of C1, C2, C3 and C4 uses a complex gate + one inverter. The transistor count for a 4-bit carry computation is given in Table IX. For quaternary adders, the binary G and P functions for any bit j are:
$G=((A=3)\wedge(B\geq 1))+((A\geq 2)\wedge(B\geq 2))+((B=3)\wedge(A\geq 1))$
P=A3.B1+A2.B2+A1.B3P=A3.B1+A2.B2+A1.B3

According to Table VI, the equations can be reformulated as
$G=\overline{(A3+B0).(Ai+Bi).(B3+A0)}$
$P=\overline{\overline{A3.B1}.\overline{A2.B2}.\overline{A1.B3}}$
where A0 and B0 are the outputs of NQI inverters, Ai and Bi are the outputs of IQI inverters, A3 and B3 are the outputs of PQI inverters and A1, A2, B1 and B2 are the outputs of the circuit shown in Fig. 10. Assuming that all these values are available, the transistor count is 12 T for G and 16 T for P. For 4 digits, the equations are similar with different implementations of Gi and Pi functions. The transistor count for a 4-digit carry computation is given in Table X.
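
As a hedged check of the generate function, the comparison form and the decoded form can be evaluated side by side. The sketch assumes the decoded signals of Table VI (X0 high iff X = 0, Xi high iff X ≤ 1, X3 high iff X ≤ 2); the function names are illustrative.

```python
def g_comparisons(a, b):
    """Generate function written with comparisons on the digit values."""
    return (a == 3 and b >= 1) or (a >= 2 and b >= 2) or (b == 3 and a >= 1)

def g_decoded(a, b):
    """Generate function written with NQI, IQI and PQI outputs of A and B."""
    a0, ai, a3 = a == 0, a <= 1, a <= 2
    b0, bi, b3 = b == 0, b <= 1, b <= 2
    return not ((a3 or b0) and (ai or bi) and (b3 or a0))

consistent = all(g_comparisons(a, b) == g_decoded(a, b) == (a + b >= 4)
                 for a in range(4) for b in range(4))
print(consistent)
```

Both forms are true exactly when A + B ≥ 4, i.e. when the digit generates a carry regardless of the carry input.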

TABLE IX: Transistor count for the carry computations of a 8-bit CLA
Function Gi Pi C1 C2 C3 C4 4-bit 8-bit
T. count 24 24 8 12 16 20 104 208
TABLE X: Transistor count for the carry computations of a 4 digit CLA Quaternary Adder
Function Gi Pi C1 C2 C3 C4 4 quaternary digits
T. count 48 64 8 12 16 20 168

The transistor count is better for the carry computation of quaternary adders than for binary ones. The increased cost of the Gi and Pi implementations is compensated by the reduced number of logical levels.

VIII-B Carry-Skip Adders

For an 8-bit CSA, the binary carry computation is composed of two 4-bit skip computations. For 4 bits, it means P1 to P4 functions, a 4-input And gate and a multiplexer. For a 4-digit CSA, the carry computation uses the same number of functions, the only difference being the implementation of Pi. The transistor counts are given in Table XI.

TABLE XI: Transistor count for the carry computations of 8-bit and 4-digit CSAs
Type Pi Nand+inverter Mux 4-bit CS 8-bit CS 4-digit CS
B 24 10 14 48 96 -
Q 64 10 14 - - 88

Refer to caption

Figure 24: A 4-bit binary carry look-ahead adder

Refer to caption

Figure 25: A 4-bit binary carry skip adder

IX Comparing the different quaternary adders with binary adders

IX-A 1-digit quaternary adder versus 2-bit binary adder

Table XII summarizes the transistor count for the different quaternary adders:

  • QB adder corresponds to the binary implementation with binary to quaternary interfaces (section V). The different values correspond to the different ways to implement a binary full adder. The middle value is probably the most significant.

  • QFA [1] is the adder that was detailed in section VI.

  • QFA [2] is the adder that was detailed in section VII-A.

  • QFA [3] is the adder that was detailed in section VII-B.

With one power supply, interfacing a 2-bit adder with quaternary to binary interfaces is the best implementation. With 3 power supplies, there is no significant difference with the best MUX quaternary implementation. In both cases, the different quaternary adders have 2 to 3 times the transistor count of a typical 2-bit binary adder.

TABLE XII: Transistor count for 1-digit quaternary adders
P. Supply QB adder QFA [1] QFA [2] QFA [3] 2-bit FA
1 134/85/65 111 - 148/130 72/36/16
3 116/67/47 - 154 100/82 -

IX-B 4-digit quaternary adders versus 8-bit binary adders

Table XIII and Table XIV summarize the transistor count for the different implementations of 4-digit quaternary adders to be compared with an 8-bit binary adder. Within these tables,

  • First column is the adder type.

  • Second column is the quaternary adders built from an 8-bit binary adder with 4-to-2 decoders and 2-to-4 encoders. The three values correspond to 1) implementation without pass transistor, 2) a conventional implementation with pass transistors and 3) a debatable option where the Xor implementation could raise noise and switching issues. The second value is the most trustworthy one.

  • Third column in Table XIII corresponds to the straightforward implementation according to the quaternary functions using 1 power supply.

  • Fourth column in Table XIV corresponds to quaternary adders (3 power supplies) using Muxes.

  • Fifth column corresponds to implementations with Muxes and customized successor and predecessor circuits.

  • The last column presents the transistor count for the binary implementation. While this implementation only uses one power supply, it is included in Table XIV for comparison.

TABLE XIII: T. count for 4-digit quaternary adders - 1 power supply
Adder QB adders [1] adder [2] adder [3] adder 8-bit adder
CPA 536/340/260 444 - 592/520 288/144/64
CLA 784/588/508 612 - 760/688 496/352/272
CSA 632/436/356 532 - 680/608 384/240/160
TABLE XIV: T. count for 4-digit quaternary adders - 3 power supplies
Adder QB adders [1] adder [2] adder [3] adder 8-bit adder
CPA 464/268/188 - 616 400/328 288/144/64
CLA 672/476/396 - 784 568/496 496/352/272
CSA 560/364/284 - 704 488/416 384/240/160
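
The CPA rows of both tables are simply four times the 1-digit counts of Table XII, and the 8-bit binary CPA is four times the 2-bit full adder count. A minimal sketch of that arithmetic, with illustrative dictionary keys:

```python
# 1-digit transistor counts from Table XII.
ONE_DIGIT = {
    "QB, 1 supply": (134, 85, 65),
    "QFA [1], 1 supply": (111,),
    "QFA [3], 1 supply": (148, 130),
    "QB, 3 supplies": (116, 67, 47),
    "QFA [2], 3 supplies": (154,),
    "QFA [3], 3 supplies": (100, 82),
    "2-bit binary FA": (72, 36, 16),
}

# A 4-digit carry propagate adder chains four 1-digit adders.
cpa = {name: tuple(4 * t for t in counts) for name, counts in ONE_DIGIT.items()}

for name, counts in cpa.items():
    print(name, "/".join(str(t) for t in counts))
```

The CLA and CSA rows add the carry computation counts of section VIII to these CPA values.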

Some significant results can be derived from Table XIII and Table XIV.

  • With only one power supply, the direct interfacing of a binary adder with 4-2 decoders and 2-4 encoders is the best implementation with the smallest transistor count.

  • With three power supplies, only the implementation proposed in [3] can compete with the interfacing of binary adders. We can notice that the transistor count for this implementation is optimistic, as it assumes that the minimal number of NQI, IQI and PQI inverters can be used without fan-out and connection issues. All the other implementations are outperformed by the direct interfacing of binary adders.

  • Obviously, the best quaternary adder is outperformed by the binary adder computing the same amount of information. This binary adder is included in the best quaternary adder, while the interfacing decoder and encoder circuits are a significant overhead.

Quaternary adders are specific combinational circuits. They have some drawbacks. Either they use three power supplies instead of one for binary circuits, or they exhibit static power dissipation and degraded switching times when using only one power supply. However, the main point is that the best implementation of quaternary adders consists in interfacing binary adders with 4 to 2 decoder and 2 to 4 encoder circuits. It means that there is no advantage to try to directly implement quaternary combinational functions. To summarize, the best quaternary adder with N digits is the corresponding 2N binary adder with a significant overhead: decoder and encoder circuits.

X Concluding remarks

Most presented implementations of ternary or quaternary circuits claim advantages of multiple valued circuits. The following quote summarizes the arguments that may be found in most MVL papers: “MVL circuits have potential advantages. Using MVL circuits reduces the complexity of interconnection via reducing the number of wires since each wire carries more than one digit of data. Power consumption and area of the MVL circuits are generally less than the corresponding binary circuits due to the reduction in number of active elements” [8].

How do our results fit with these claims? It is obvious that an N-digit quaternary adder has fewer input and output digits than a 2N-bit binary adder. But we have shown that the best N-digit quaternary adder includes the corresponding 2N-bit binary adder with the overhead of input decoder and output encoder circuits. According to Table XIII and Table XIV, the best 4-digit quaternary adder has more than 2.5x the transistor count of 8-bit binary adders. These transistors must be interconnected: it means that the quaternary adders have far more connections than the binary adders as soon as the internal connections are considered. As a matter of fact, is there an “interconnection wall” in digital circuits, like the well-known “power wall” and “memory wall”? The answer is no, even if there could be interconnection issues in circuits such as FPGAs. While the up-to-date CMOS technological nodes are more and more costly, they have more and more interconnection layers. Twenty years ago, the 180 nm node had 6 metal layers. Today, the number of metal layers in nano-CMOS technologies usually ranges from 8 to 15, with a trade-off between integration and cost.

It is difficult to believe that 2.5 times more transistors could lead to a reduction of chip area and power dissipation. More transistors mean more chip area and more power dissipation. It turns out that the assumptions of the quote are false, at least when using MVL techniques for combinational circuits such as adders, multipliers, etc.

MVL circuits are confined to a small niche [8]. To the best of my knowledge, there are today only two significant applications of MVL circuits:

  • Reducing the number of interconnects with multiple levels is used in amplitude modulation: for instance, PAM-4 coding [9], which uses 4 levels to code 2 bits, is adopted for high-speed data transmission (IEEE 802.3bs). PAM-8 and PAM-16 have also been defined.

  • 4-valued (MLC) flash memories store two bits per cell. 8-valued (TLC) memories store 3 bits per cell. However, these M-valued circuits ($M=2^{n}$) are used for higher density, not for higher speeds.

Trying to design MVL combinational circuits to compete with binary ones looks like a dead-end.

References

  • [1] S.A. Ebrahimi, M.R. Reshadinezhad, A. Bohlooli, M. Shahsavari, “Efficient CNTFET-based design of quaternary logic gates and arithmetic circuits", Microelectronics Journal, pp. 156-166, January 2016
  • [2] M.H. Moaiyeri, K. Navi, O. Hashemipour, “Design and Evaluation of CNFET-Based Quaternary Circuits", Circuits Syst Signal Process (2012) 31 pp.1631-1652, DOI 10.1007/s00034-012-9413-2
  • [3] E. Roosta and S. A. Hosseiny, “A Novel Multiplexer-Based Quaternary Full Adder in Nanoelectronics", Circuits, Systems and Signal Processing, https://doi.org/10.1007/s00034-019-01039-8
  • [4] G. Hills, C. Lau, A. Wright et al. “Modern microprocessor built from complementary carbon nanotube transistors", Nature 572, pp. 595-602 (2019) doi:10.1038/s41586-019-1493-8
  • [5] K.K. Nehru, T. Nagarjuna and G. Vijay, “Comparative Analysis of CNTFET and CMOS Logic Based Arithmetic Logic Unit", in Journal of Nano and Electronic Physics 9 (4), January 2017
  • [6] vlsitechnology, http://www.vlsitechnology.org/html/cells/vxlib013/xor2.html
  • [7] W. Haixia, Z. Shunan, S. Zhentao, Q. Xiaonan, and C. Yueyang, “Design of low-power quaternary flip-flop based on dynamic source-coupled logic,” in Proceedings of 2011 International Conference on Electronics, Communications and Control (ICECC), 2011, pp. 826-828.
  • [8] D. Etiemble, “Why M-Valued Circuits are restricted to a Small Niche”, in Journal of Multiple Valued Logic and Soft Computing, Vol. 9, No1, 2003.
  • [9] Intel, “PAM4 Signaling Fundamentals", https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/an/an835.pdf