## Arithmetic Circuits



## Review: 2's Complement



8-bit 2's complement example:

$$
11010110=-2^{7}+2^{6}+2^{4}+2^{2}+2^{1}=-128+64+16+4+2=-42
$$

If we use a two's-complement representation for signed integers, the same binary addition procedure will work for adding both signed and unsigned numbers.

By moving the implicit "binary" point, we can represent fractions too:

$$
1101.0110=-2^{3}+2^{2}+2^{0}+2^{-2}+2^{-3}=-8+4+1+0.25+0.125=-2.625
$$
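The two worked examples above can be checked with a short sketch. The helper name `twos_complement_value` is hypothetical; it assumes the input is a string of `0`/`1` characters with an optional binary point.

```python
def twos_complement_value(bits: str) -> float:
    """Value of a two's-complement bit string with an optional binary point."""
    whole, _, frac = bits.partition(".")
    digits = whole + frac
    # Read all digits as an unsigned integer first.
    raw = int(digits, 2)
    # The leading bit carries weight -2^(width-1), so subtract 2^width if it is set.
    if digits[0] == "1":
        raw -= 1 << len(digits)
    # Moving the binary point left by len(frac) places divides by 2^len(frac).
    return raw / (1 << len(frac))


print(twos_complement_value("11010110"))   # -42.0
print(twos_complement_value("1101.0110"))  # -2.625
```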

## Binary Addition

Here's an example of binary addition as one might do it by "hand":


Then we can cascade them to add two numbers of any size...


## Designing a Full Adder: From Last Time

1) Start with a truth table:

| $C_{i}$ | $A$ | $B$ | $C_{o}$ | $S$ |
| :--- | :--- | :--- | :--- | :--- |
| 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 0 | 1 |
| 0 | 1 | 0 | 0 | 1 |
| 0 | 1 | 1 | 1 | 0 |
| 1 | 0 | 0 | 0 | 1 |
| 1 | 0 | 1 | 1 | 0 |
| 1 | 1 | 0 | 1 | 0 |
| 1 | 1 | 1 | 1 | 1 |

3) Simplifying a bit:

$$
\begin{array}{ll}
C_{o}=C_{i}(A+B)+A B & C_{o}=C_{i}(A \oplus B)+A B \\
S=C_{i} \oplus A \oplus B & S=C_{i} \oplus(A \oplus B)
\end{array}
$$

## For Those Who Prefer Logic Diagrams ...

$$
\begin{aligned}
& C_{o}=C_{i}(A \oplus B)+A B \\
& S=C_{i} \oplus(A \oplus B)
\end{aligned}
$$

- A little tricky, but only 5 gates/bit
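As a minimal sketch, the 5-gate full adder can be modeled in Python and cascaded into a ripple-carry adder. The function names are hypothetical, and integers 0/1 stand in for wires.

```python
def full_adder(a: int, b: int, ci: int) -> tuple[int, int]:
    """One-bit full adder: 2 XORs, 2 ANDs and 1 OR (5 gates)."""
    p = a ^ b                   # XOR 1, shared by the sum and the carry
    s = ci ^ p                  # XOR 2: S = Ci xor (A xor B)
    co = (ci & p) | (a & b)     # 2 ANDs + 1 OR: Co = Ci(A xor B) + AB
    return s, co


def ripple_add(a: int, b: int, n: int, ci: int = 0) -> tuple[int, int]:
    """Add two n-bit numbers by cascading n full adders, LSB first."""
    total = 0
    carry = ci
    for i in range(n):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry         # n-bit sum and the final carry-out


print(ripple_add(0b0110, 0b0111, 4))  # (13, 0)
```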



## Subtraction: $A-B=A+(-B)$

Using 2's complement representation: $-B=\sim B+1$


So let's build an arithmetic unit that does both addition and subtraction. Operation selected by control input:
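One common arrangement, assumed here because the figure is not reproduced, XORs every bit of $B$ with the control input and also feeds the control input into the adder's carry-in. The name `add_sub` and the `sub` parameter are hypothetical.

```python
def add_sub(a: int, b: int, sub: int, n: int = 8) -> int:
    """Return A+B when sub=0 and A-B when sub=1, as an n-bit result."""
    mask = (1 << n) - 1
    # XOR each bit of B with the control input: B when sub=0, ~B when sub=1.
    b_in = (b ^ (mask if sub else 0)) & mask
    # The control input doubles as the carry-in, supplying the "+1" of ~B + 1.
    return (a + b_in + sub) & mask


print(add_sub(5, 3, 0))  # 8
print(add_sub(5, 3, 1))  # 2
print(add_sub(3, 5, 1))  # 254, which is -2 in 8-bit two's complement
```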


## Condition Codes

Besides the sum, one often wants four other bits of information from an arithmetic unit:

- $Z$ (zero): result is $=0$ (computed with a big NOR gate)
- $N$ (negative): result is $<0$ (this is just $S_{N-1}$)
- $C$ (carry): indicates that the add in the most significant position produced a carry, e.g., "$1+(-1)$" (taken from the last FA)
- $V$ (overflow): indicates that the answer has too many bits to be represented correctly by the result width, e.g., "$\left(2^{i-1}-1\right)+\left(2^{i-1}-1\right)$"

$$
\begin{gathered}
V=A_{N-1} B_{N-1} \bar{N}+\bar{A}_{N-1} \bar{B}_{N-1} N \\
\text {-or- } \\
V=C O_{N-1} \oplus C I_{N-1}
\end{gathered}
$$
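The following sketch computes the four condition codes for an $N$-bit addition and evaluates both overflow formulas so they can be compared. The function name and the dictionary keys (`V_alt` in particular) are hypothetical.

```python
def add_with_flags(a: int, b: int, n: int = 8) -> dict[str, int]:
    """Add two n-bit values; return the sum and the Z, N, C, V condition codes."""
    mask = (1 << n) - 1
    low = mask >> 1                      # every bit except the MSB
    a, b = a & mask, b & mask
    full = a + b
    s = full & mask

    a_msb, b_msb = a >> (n - 1), b >> (n - 1)
    z = int(s == 0)                      # big NOR over the sum bits
    neg = s >> (n - 1)                   # N is just S_{N-1}
    c = full >> n                        # carry out of the last full adder
    ci_msb = ((a & low) + (b & low)) >> (n - 1)   # carry into the MSB

    # V from operand and result signs, and V from the MSB carries.
    v_signs = (a_msb & b_msb & (neg ^ 1)) | ((a_msb ^ 1) & (b_msb ^ 1) & neg)
    v_carries = c ^ ci_msb
    return {"S": s, "Z": z, "N": neg, "C": c, "V": v_signs, "V_alt": v_carries}


print(add_with_flags(1, 0xFF))   # 1 + (-1): Z=1, C=1, V=0
print(add_with_flags(127, 127))  # overflow: N=1, V=1
```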

To compare $A$ and $B$, perform $A-B$ and use condition codes:

Signed comparison:

| Comparison | Condition |
| :--- | :--- |
| LT | $\mathrm{N} \oplus \mathrm{V}$ |
| LE | $\mathrm{Z}+(\mathrm{N} \oplus \mathrm{V})$ |
| EQ | Z |
| NE | $\sim \mathrm{Z}$ |
| GE | $\sim(\mathrm{N} \oplus \mathrm{V})$ |
| GT | $\sim(\mathrm{Z}+(\mathrm{N} \oplus \mathrm{V}))$ |

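Unsigned comparison:

| Comparison | Condition |
| :--- | :--- |
| LTU | C |
| LEU | $\mathrm{C}+\mathrm{Z}$ |
| GEU | $\sim \mathrm{C}$ |
| GTU | $\sim(\mathrm{C}+\mathrm{Z})$ |

Here C is the borrow out of $A-B$. If the subtraction is computed as $A+\sim B+1$, the adder's raw carry-out is the complement of that borrow.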
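The comparison tables can be exercised with a small sketch. It assumes the subtraction is performed as $A+\sim B+1$ and that C is taken as the borrow, i.e. the complement of the adder's carry-out; the function name `compare_flags` is hypothetical.

```python
def compare_flags(a: int, b: int, n: int = 8) -> dict[str, int]:
    """Compute A-B on n bits and derive every comparison from Z, N, C, V."""
    mask = (1 << n) - 1
    low = mask >> 1
    a &= mask
    nb = ~b & mask                       # bitwise complement of B
    full = a + nb + 1                    # A + ~B + 1 = A - B
    diff = full & mask
    carry_out = full >> n
    carry_in_msb = ((a & low) + (nb & low) + 1) >> (n - 1)

    z = int(diff == 0)
    neg = diff >> (n - 1)
    v = carry_out ^ carry_in_msb         # overflow of the subtraction
    c = carry_out ^ 1                    # assumption: C is the borrow
    lt = neg ^ v
    return {
        "LT": lt, "LE": z | lt, "EQ": z, "NE": z ^ 1,
        "GE": lt ^ 1, "GT": (z | lt) ^ 1,
        "LTU": c, "LEU": c | z, "GEU": c ^ 1, "GTU": (c | z) ^ 1,
    }


# 0xF0 is -16 signed but 240 unsigned, so the two kinds of comparison disagree.
flags = compare_flags(0xF0, 0x10)
print(flags["LT"], flags["LTU"])  # 1 0
```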

## $t_{PD}$ of Ripple-Carry Adder


Worst-case path: carry propagation from LSB to MSB, e.g., when adding 11...111 to 00...001.

$$
t_{PD}=\underbrace{\left(t_{PD,XOR}+t_{PD,AND}+t_{PD,OR}\right)}_{A_{0}, B_{0} \text { to } CO_{0}}+\underbrace{(N-2)\left(t_{PD,OR}+t_{PD,AND}\right)}_{CI \text { to } CO}+\underbrace{t_{PD,XOR}}_{CI_{N-1} \text { to } S_{N-1}} \approx \Theta(N)
$$

$\Theta(N)$ is read "order $N$ " and tells us that the latency of our adder grows in proportion to the number of bits in the operands.
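To see the linear growth concretely, the worst-case path can be evaluated for a few operand widths. The gate delays used in the usage lines are made-up illustrative numbers, not values from any particular technology, and the function name is hypothetical.

```python
def ripple_tpd(n: int, t_xor: float, t_and: float, t_or: float) -> float:
    """Worst-case propagation delay of an n-bit ripple-carry adder (n >= 2)."""
    first = t_xor + t_and + t_or          # A0, B0 to CO0
    middle = (n - 2) * (t_or + t_and)     # CI to CO through the middle bits
    last = t_xor                          # CI_{N-1} to S_{N-1}
    return first + middle + last


# Hypothetical delays in arbitrary time units: XOR = 2, AND = 1, OR = 1.
for width in (8, 16, 32, 64):
    print(width, ripple_tpd(width, t_xor=2, t_and=1, t_or=1))
```

Each additional bit adds one AND delay plus one OR delay, which is the $\Theta(N)$ behavior described above.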

## Faster Carry Logic

Let's see if we can improve the speed by first "rewriting" and then "reinterpreting" the equations for $C_{\text{OUT}}$:

$$
\begin{aligned}
C_{\text{OUT}} &= A B+A C_{\text{IN}}+B C_{\text{IN}} \\
&= A B+(A+B) C_{\text{IN}} \\
&= G+P C_{\text{IN}} \quad \text { where } G=A B \text { (generate) and } P=A+B \text { (propagate)}
\end{aligned}
$$

To generate the Carry of the $\mathrm{N}^{\text {th }}$ bit:

$$
\begin{aligned}
C_{N} & =G_{N-1}+P_{N-1} C_{N-1} \\
& =G_{N-1}+P_{N-1} G_{N-2}+P_{N-1} P_{N-2} C_{N-2} \\
& =\underbrace{G_{N-1}+P_{N-1} G_{N-2}+P_{N-1} P_{N-2} G_{N-3}+\ldots+P_{N-1} \ldots P_{0} C_{\text{IN}}}_{C_{N} \text { in only } 3 \text { levels of logic! }}
\end{aligned}
$$

The three levels are: 1 for P/G generation, 1 for the ANDs, and 1 for the final OR.

Actually, $P$ can be either $A+B$ or $A \oplus B$, because the $G=A B$ term of $C_{\text{OUT}}$ handles the only case where they differ.
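A small sketch of the flattened carry equation follows. It evaluates every carry as a single sum of products and lets $P$ be either OR or XOR; the function name and the `use_xor` flag are hypothetical.

```python
def lookahead_carries(a: int, b: int, n: int, c_in: int = 0,
                      use_xor: bool = False) -> list[int]:
    """Return [C_0, ..., C_n], each computed directly from G, P and C_in."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    if use_xor:
        p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    else:
        p = [((a >> i) | (b >> i)) & 1 for i in range(n)]

    carries = [c_in]
    for k in range(1, n + 1):
        # C_k = G_{k-1} + P_{k-1} G_{k-2} + ... + P_{k-1} ... P_0 C_in
        terms = []
        prod = 1                      # running product P_{k-1} ... P_{j+1}
        for j in range(k - 1, -1, -1):
            terms.append(prod & g[j])
            prod &= p[j]
        terms.append(prod & c_in)     # the all-propagate term
        carries.append(int(any(terms)))
    return carries


print(lookahead_carries(0b1011, 0b0110, 4))  # [0, 0, 1, 1, 1]
```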


## N-Bit Addition in Constant Time?

So if we had $(N+1)$-input gates and didn't mind a lot of loading on the $P$ signals, the propagation delay of an adder built using the P/G equations to compute $C_{\text{IN}}$ of each bit would be:

$$
4 \text { gate delays } \approx O(1) \quad(\text {independent of } N)
$$

Recall large fan-in gates (many inputs) are implemented using trees (see last lecture). So for large N we expect more like $O\left(\log _{2} \mathrm{~N}\right)$ gate delays. This concept does lead to some interesting adder designs:

- faster ripple-carry implementations
- hierarchical carry-lookahead adders


## Carry-Lookahead Adders (CLA)

We can build a hierarchical carry-lookahead chain by generalizing our definition of the Carry Generate/Propagate (GP) Logic. We start by dividing our addend into two parts, a higher part, H , and a lower part, L. The GP function can be expressed as follows:

$$
\begin{aligned}
G_{HL} & =G_{H}+P_{H} G_{L} \\
P_{HL} & =P_{H} P_{L}
\end{aligned}
$$

Generate a carry out if the high part generates one, or if the low part generates one and the high part propagates it. Propagate a carry if both the high and low parts propagate theirs.
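The GP function is small enough to check exhaustively. In this sketch a (G, P) pair is a Python tuple, and the helper names `bit_gp` and `gp_combine` are hypothetical.

```python
def bit_gp(a: int, b: int) -> tuple[int, int]:
    """Generate and propagate signals for a single bit position."""
    return a & b, a | b


def gp_combine(high: tuple[int, int], low: tuple[int, int]) -> tuple[int, int]:
    """Merge the (G, P) of a high part and a low part into one (G, P)."""
    g_h, p_h = high
    g_l, p_l = low
    g_hl = g_h | (p_h & g_l)   # high generates, or low generates and high propagates
    p_hl = p_h & p_l           # both parts must propagate
    return g_hl, p_hl


# Carry out of a 2-bit add of A = 0b11 and B = 0b01 with carry-in 0.
g, p = gp_combine(bit_gp(1, 0), bit_gp(1, 1))
print(g | (p & 0))  # 1, since 3 + 1 overflows two bits
```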



Hierarchical building block
$1^{\text {st }}$ level of lookahead


## 8-bit CLA (GP Generation)



We can build a tree of GP units to compute the generate and propagate logic for any sized adder. For a $2^{\mathrm{N}}$-bit adder, we need $2^{\mathrm{N}}-1$ GP units.

$$
C=\underbrace{G_{7}+P_{7} G_{6}+P_{7} P_{6} G_{5}+P_{7} P_{6} P_{5} G_{4}+\ldots}_{G_{7-0}}+\underbrace{P_{7} \ldots P_{0}}_{P_{7-0}} C_{\text{IN}}
$$

## 8-bit CLA (Carry Generation)

Now, given the value of the carry-in of the least-significant bit, we can compute the carries for the remaining bit positions:


$$
c_{j}=G_{j-i}+P_{j-i} c_{i}
$$
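One way to picture the two phases is a recursive sketch: an up-sweep that builds the tree of (G, P) values, then a down-sweep that applies $c_{j}=G+P\,c_{i}$ at each node to hand carries to the upper halves. The tuple-based tree and the function names are hypothetical, and the width is assumed to be a power of two.

```python
def cla_carries(a: int, b: int, n: int = 8, c_in: int = 0) -> list[int]:
    """Return [c_0, ..., c_n] using a GP tree; n must be a power of two."""
    carries = [0] * (n + 1)

    def build(lo: int, hi: int):
        """Up-sweep: node = (G, P, lo, low_child, high_child) for bits lo..hi-1."""
        if hi - lo == 1:
            ai, bi = (a >> lo) & 1, (b >> lo) & 1
            return (ai & bi, ai | bi, lo, None, None)
        mid = (lo + hi) // 2
        low, high = build(lo, mid), build(mid, hi)
        g = high[0] | (high[1] & low[0])      # G_HL = G_H + P_H G_L
        p = high[1] & low[1]                  # P_HL = P_H P_L
        return (g, p, lo, low, high)

    def distribute(node, c: int) -> None:
        """Down-sweep: c is the carry into the node's lowest bit."""
        _, _, lo, low, high = node
        carries[lo] = c
        if low is not None:
            distribute(low, c)
            distribute(high, low[0] | (low[1] & c))   # carry into the upper half

    root = build(0, n)
    distribute(root, c_in)
    carries[n] = root[0] | (root[1] & c_in)           # final carry-out
    return carries


def cla_add(a: int, b: int, n: int = 8, c_in: int = 0) -> int:
    """Form each sum bit as A_i xor B_i xor c_i using the lookahead carries."""
    c = cla_carries(a, b, n, c_in)
    return sum((((a >> i) ^ (b >> i) ^ c[i]) & 1) << i for i in range(n))


print(cla_add(100, 55))  # 155
```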




## 8-Bit CLA (Complete)



## Carry-Skip Adders

Idea: full P/G equations are complicated, but $P$ by itself is simple. So just use P to "skip" carry across a block of ripple-carry adders:

(A) Carries ripple simultaneously through each block; if a block generates a carry, it appears on the carry-out of the block (similar to G). If carry-in is $0$ at the start of the operation, no spurious carry-outs will be generated.
(B) If carry-in and $P_{\text{BLOCK}}$ are both true, the carry skips to the next block.
(C) Carry ripples through the final block.

$t_{PD}=2 \cdot[K+(N / K-2)+K]$

With variable-size blocks, $t_{PD} \rightarrow O(\sqrt{N})$.
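As a rough check on the $t_{PD}$ formula, one can treat $K$ (assumed here to be the number of bits per block, with all blocks the same size) as a continuous variable and minimize the delay. This is only a sketch under those assumptions; it ignores that $K$ and $N/K$ must be integers.

$$
\begin{aligned}
t_{PD}(K) & =2\left[2 K+\frac{N}{K}-2\right] \\
\frac{d\, t_{PD}}{d K} & =2\left[2-\frac{N}{K^{2}}\right]=0 \;\Rightarrow\; K=\sqrt{N / 2} \\
t_{PD}\left(\sqrt{N / 2}\right) & =2\left[2 \sqrt{2 N}-2\right]=4 \sqrt{2 N}-4
\end{aligned}
$$

For example, $N=32$ gives $K=4$ and $t_{PD}=2 \cdot[4+6+4]=28$ in the formula's units, so under this model the delay already grows like $\sqrt{N}$ with equal-size blocks.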

## Carry-Select Adders

Idea: do two additions, one assuming carry-in is $0$, the other assuming carry-in is $1$. Use a MUX to select the correct answer when the correct carry-in is known.


Blocks on the left can be bigger (more bits), allowing more ripple time while waiting for the select.

With one stage: $50 \%$ more gates, but twice as fast as ripple-carry.

With multiple (variable-size) blocks: $t_{PD} \rightarrow O(\sqrt{N})$.
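A functional sketch of a one-stage carry-select adder follows. Python's `+` stands in for the ripple-carry blocks, so this models the selection logic rather than the timing; the function name and the `split` parameter are hypothetical.

```python
def carry_select_add(a: int, b: int, n: int = 8, split: int = 4) -> tuple[int, int]:
    """Add two n-bit numbers with the upper block computed for both carry-ins."""
    lo_mask = (1 << split) - 1
    hi_bits = n - split
    hi_mask = (1 << hi_bits) - 1
    a_lo, b_lo = a & lo_mask, b & lo_mask
    a_hi, b_hi = (a >> split) & hi_mask, (b >> split) & hi_mask

    lo_sum = a_lo + b_lo             # lower block
    hi_sum0 = a_hi + b_hi            # upper block assuming carry-in = 0
    hi_sum1 = a_hi + b_hi + 1        # upper block assuming carry-in = 1
    # In hardware the three blocks above operate at the same time.

    carry = lo_sum >> split          # the real carry into the upper block
    hi_sum = hi_sum1 if carry else hi_sum0   # the MUX
    total = ((hi_sum & hi_mask) << split) | (lo_sum & lo_mask)
    return total, hi_sum >> hi_bits  # n-bit sum and carry-out


print(carry_select_add(0x3F, 0x01))  # (64, 0)
```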

## Adder Summary

Adding is not only common, it also tends to be one of the most time-critical operations. As a result, a wide range of adder architectures has been developed that allow a designer to trade off complexity (in terms of the number of gates) for performance.


At this point we'll define a high-level functional unit for an adder, and specify the details of the implementation as necessary.


## Shifting Logic

Shifting is a common operation that is applied to groups of bits. Shifting can be used for alignment, as well as for arithmetic operations.

- $X \ll 1$ is approximately the same as $2X$
- $X \gg 1$ can be the same as $X / 2$

For example:

$$
X=20_{10}=00010100_{2}
$$

Left Shift:

$$
(x \ll 1)=00101000_{2}=40_{10}
$$

Right Shift:

$$
(x \gg 1)=00001010_{2}=10_{10}
$$

Signed or "Arithmetic" Right Shift:

$$
(-x \gg 1)=\left(11101100_{2} \gg 1\right)=11110110_{2}=-10_{10}
$$
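The three shifts can be sketched on 8-bit values as follows. The width and the function names are assumptions made for illustration. The sketch also shows why a plain right shift only "can be" a division by two: for a negative operand the sign bit must be replicated.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1


def shl(x: int, k: int = 1) -> int:
    """Left shift; bits shifted past the MSB are lost."""
    return (x << k) & MASK


def shr_logical(x: int, k: int = 1) -> int:
    """Logical right shift; zeros enter from the left."""
    return (x & MASK) >> k


def shr_arith(x: int, k: int = 1) -> int:
    """Arithmetic right shift for 0 <= k <= WIDTH; copies of the sign bit enter."""
    x &= MASK
    sign = x >> (WIDTH - 1)
    fill = (MASK << (WIDTH - k)) & MASK if sign else 0
    return (x >> k) | fill


print(f"{shl(0b00010100):08b}")          # 00101000
print(f"{shr_logical(0b00010100):08b}")  # 00001010
print(f"{shr_arith(0b11101100):08b}")    # 11110110
print(f"{shr_logical(0b11101100):08b}")  # 01110110, no longer -10
```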



## More Shifting



## Barrel Shifting



## Barrel Shifting with a Twist

At this point it would be straightforward to construct a "right barrel shifter" unit. However, a simple trick enables a left shifter to do both.
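The figure is not reproduced here, so the following is only a sketch of one well-known way to get both directions from a left shifter: reverse the bit order before and after the shift. Whether this is the exact trick intended above is an assumption. The left shifter itself is modeled as one conditional stage per bit of the shift amount (shift by 1, 2, 4, ...); all names are hypothetical.

```python
def reverse_bits(x: int, n: int) -> int:
    """Mirror the n-bit value so that bit i moves to bit n-1-i."""
    r = 0
    for i in range(n):
        r |= ((x >> i) & 1) << (n - 1 - i)
    return r


def barrel_shift_left(x: int, amount: int, n: int = 8) -> int:
    """Left shift by 0..n-1 using one stage per bit of the shift amount."""
    mask = (1 << n) - 1
    x &= mask
    stage = 1
    while stage < n:
        if amount & stage:           # this stage's muxes select the shifted input
            x = (x << stage) & mask
        stage <<= 1
    return x


def barrel_shift(x: int, amount: int, n: int = 8, right: bool = False) -> int:
    """Shift left, or shift right by reversing, shifting left, and reversing back."""
    if right:
        return reverse_bits(barrel_shift_left(reverse_bits(x, n), amount, n), n)
    return barrel_shift_left(x, amount, n)


print(barrel_shift(0b00010100, 1))              # 40
print(barrel_shift(0b00010100, 1, right=True))  # 10
```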


## Boolean Operations

We also need to perform logical operations on groups of bits. Which ones?

- ANDing is useful for "masking" off groups of bits. ex. 10101110 & 00001111 = 00001110 (mask selects last 4 bits)
- ANDing is also useful for "clearing" groups of bits. ex. 10101110 & 00001111 = 00001110 (0's clear first 4 bits)
- ORing is useful for "setting" groups of bits. ex. 10101110 | 00001111 = 10101111 (1's set last 4 bits)
- XORing is useful for "complementing" groups of bits. ex. 10101110 ^ 00001111 = 10100001 (1's complement last 4 bits)
- NORing is useful... uhm, because John Hennessy says it is! ex. ~(10101110 | 00001111) = 01010000 (0's complement, 1's clear)
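The examples above can be reproduced directly with Python's bitwise operators. The variable names are hypothetical, and the NOR result is masked to 8 bits because Python integers are not fixed-width.

```python
x = 0b10101110
m = 0b00001111

masked = x & m             # keep only the last 4 bits
set_bits = x | m           # force the last 4 bits to 1
toggled = x ^ m            # complement the last 4 bits
nor = ~(x | m) & 0xFF      # NOR, truncated to 8 bits

for name, value in [("AND", masked), ("OR", set_bits),
                    ("XOR", toggled), ("NOR", nor)]:
    print(f"{name:>3}: {value:08b}")
```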

## Boolean Unit (The obvious way)

It is simple to build up a Boolean unit using primitive gates and a mux to select the function.

Since there is no interconnection between bits, this unit can be simply replicated at each position. The cost is about 7 gates per bit. One for each primitive function, and approx 3 for the 4-input mux.


This is a straightforward, but not very elegant, design.

## Cooler Bools

We can better leverage a mux's capabilities in our Boolean unit design by connecting the bits to the select lines.
Why is this better?

1) While it might take a little logic to decode the truth table inputs, you only have to do it once, independent of the number of bits.
2) It is trivial to extend this module to support any 2-input logical function. (How about NAND, John? Actually $A \& \sim B$ might be more useful.)
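A minimal sketch of the mux-based idea: the function code is a 4-bit truth table driven onto the mux data inputs, and each pair of operand bits drives the select lines. The encoding chosen here (bit $2A+B$ of the code is the output for inputs $A$, $B$) and the names are assumptions for illustration.

```python
def bool_unit(fn: int, a: int, b: int, width: int = 8) -> int:
    """Apply the 2-input function with 4-bit truth table fn to each bit pair."""
    out = 0
    for i in range(width):
        # The operand bits at position i form the mux select value.
        sel = (((a >> i) & 1) << 1) | ((b >> i) & 1)
        # The mux picks one truth-table entry as the output bit.
        out |= ((fn >> sel) & 1) << i
    return out


# Truth tables under the assumed encoding; bit 3 is the A=1, B=1 entry.
AND, OR, XOR, NOR = 0b1000, 0b1110, 0b0110, 0b0001
NAND, A_AND_NOT_B = 0b0111, 0b0100

x, m = 0b10101110, 0b00001111
print(f"{bool_unit(AND, x, m):08b}")          # 00001110
print(f"{bool_unit(A_AND_NOT_B, x, m):08b}")  # 10100000
```

Supporting a new function only requires a new 4-bit code; the per-bit hardware stays the same.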


## An ALU, at Last

We give the "Math Center" of a computer a special name: the Arithmetic Logic Unit. For us, it is just a big box!


