Title(s): | An integrated circuit realization for a piecewise linear function |

Author(s): | Di Federico, Martín; Jiménez-Fernández, Víctor M.; Julián, Pedro Marcelo; Agamenoni, Osvaldo; Hernández-Martínez, Luis; Sarmiento-Reyes, Arturo |

Institution: | Universidad Nacional del Sur. Bahía Blanca, AR; INTI-Centro de Micro y Nano Electrónica del Bicentenario. CMNB. Buenos Aires, AR; Instituto Nacional de Astrofísica, Óptica y Electrónica. INAOE. Puebla, MX |

Publisher: | not stated |

Keywords: | Integrated circuits; Functions; Variables |

Language: | eng |

Date: | 2007 |

XII Reunión de Trabajo en Procesamiento de la Información y Control, 16 to 18 October 2007
An integrated circuit realization for a piecewise linear function

Martín Di Federico, Víctor M. Jiménez-Fernández, Pedro Marcelo Julián and Osvaldo Agamenoni†
Luis Hernández-Martínez and Arturo Sarmiento-Reyes‡
†Universidad Nacional del Sur, Bahía Blanca, Argentina. mdife@uns.edu.ar
‡Instituto Nacional de Astrofísica, Óptica y Electrónica, México. luish@inaoep.mx

Abstract: In this paper we present an integrated circuit (IC) realization of a three-dimensional piecewise linear (PWL) function. The IC is designed and fabricated in a standard 0.5 µm CMOS technology. It has three inputs, which can be supplied either as analog values or as 8-bit digital codes. The output of the circuit is an 8-bit digital word that represents the value of the PWL function at the three-dimensional input. The chip architecture is programmable: the PWL function is stored in an external 4 kB RAM addressed by a 12-bit word.

Keywords: Piecewise linear function, integrated circuit realization, circuit architecture

I Introduction

In recent papers [1], [2], two different implementations (analog and mixed-signal, respectively) of the PWL approximation technique proposed by Julián et al. have been presented. In particular, the circuit architecture proposed in [2] provides a piecewise-linear input-output relationship based on a weighted sum of the so-called α-functions, which are defined over a domain partitioned into simplices. Each α-function is local, since it is nonzero only over a small number of simplices of the domain. As a consequence, the value of the PWL approximation at any n-dimensional input vector x can be obtained by combining a limited subset of the basis functions, weighted by their corresponding coefficients [3], [4]. Moreover, all basis functions perform essentially the same operation; two basis functions differ only in the region of the domain over which they operate.
Therefore, the evaluation can be done using a single function circuit block and an algorithm that shifts the inputs [5], [6], [7]. For every evaluation point, all nonzero basis functions must be evaluated, weighted and added. This principle underlies the architecture of the IC presented in this paper.

II Mathematical background

Consider a domain S subdivided by a simplicial partition H with grid step δ. The partition produces the set of vertices

$$V_S = \{ v \in \mathbb{R}^n : v_i = -1 + m_i\,\delta,\ i = 1, \dots, n \} \tag{1}$$

where $0 \le m_i \le m$ and $m\,\delta = 2$. The grid step δ is the size of the division along every coordinate axis. Also, consider the family of PWL functions defined over the simplicial partition. It constitutes a linear vector space $PWL_H[S]$ of dimension $q = (m+1)^n$. Any function $F \in PWL_H[S]$ can be expressed in vector form as

$$F(x) = c^T \Lambda(x) \tag{2}$$

where $c \in \mathbb{R}^q$ is the so-called parameter vector, $\Lambda = [\alpha_1, \dots, \alpha_q]^T$, and $\alpha_i$, for $i = 1, \dots, q$, is a PWL basis function. In the formulation under consideration, each basis function $\alpha_i \in PWL_H[S]$ is defined by

$$\alpha_i(v_j) = \begin{cases} 1, & \text{if } i = j \\ 0, & \text{if } i \neq j \end{cases} \tag{3}$$

where $v_j \in V_S$ and $i, j = 1, \dots, q$. It has been shown in [5] that any point $x \in S$ can be decomposed as

$$x = \sum_{l=1}^{r} \mu_{i_l} v_{i_l} \tag{4}$$

where the terms of the expansion satisfy $0 \le \mu_{i_l} \le 1$ and $v_{i_l} \in V_S$ for every $l = 1, \dots, r$, with $r = n + 1$ and $\sum_{l=1}^{r} \mu_{i_l} = 1$. Also in [5], it has been proved that every function of the basis given by (3) satisfies

$$\begin{cases} \alpha_{i_l}(x) = \mu_{i_l}, & l = 1, \dots, r \\ \alpha_p(x) = 0, & p \notin \{i_1, \dots, i_r\} \end{cases} \tag{5}$$

where $x \in S$ is a point of the form (4). Evaluating function (2) at a point (4) gives

$$F(x) = \sum_{i=1}^{q} c_i\, \alpha_i\!\left( \sum_{l=1}^{r} \mu_{i_l} v_{i_l} \right) \tag{6}$$

Since every $\alpha_i$ is affine inside the simplex that contains x (whose vertices are $v_{i_1}, \dots, v_{i_r}$) and the weights $\mu_{i_l}$ add up to one, $F(x) = \sum_{i=1}^{q} c_i \sum_{l=1}^{r} \mu_{i_l}\, \alpha_i(v_{i_l})$, and after exchanging the summations we obtain

$$F(x) = \sum_{l=1}^{r} \mu_{i_l} \sum_{i=1}^{q} c_i\, \alpha_i(v_{i_l}) \tag{7}$$
Finally, using the relation given in (3), (7) reduces to

$$F(x) = \sum_{l=1}^{r} c_{i_l}\, \mu_{i_l} \tag{8}$$

From this equation we observe that, in order to calculate the value of F(x) at the input x, we need to compute a weighted sum that involves the parameters $c_{i_l}$ and the weights $\mu_{i_l}$, for $l = 1, \dots, r$.

III Chip architecture

The architecture of the IC presented in this paper is a circuit implementation of eq. (8). Fig. 1 is a block diagram describing a general scheme for implementing eq. (8).

[Figure 1: Block diagram architecture for implementing eq. (8). Blocks: μi-generator (comparator and ramp), vertex selector, internal vertex location, ci-selector and ciμi-adder; input x, output F(x).]

References [5] and [6] reported a method for obtaining the μ parameters: a set of comparators that compare the input signals (the x vector) with a ramp. This idea is used in the μi-generator block of Fig. 1. Notice that each input is split into two parts: the first spans the whole function domain and selects a specific vertex, while the second indicates the location inside the simplex associated with the selected vertex. It is important to point out that the ciμi-adder accumulates ci during a number of clock cycles proportional to μi. Finally, the output block F(x) gives the value of the function F(·) at the input x.

A Description

Fig. 2 shows the circuit implementation of the scheme of Fig. 1. The output of the IC is an 8-bit digital word. In the present version of the IC, the memory was left off-chip. There are two alternatives for loading the input values into the chip. The first is to present three analog values at three input pins; three comparators compare the input signals with an analog ramp and latch the conversion result. The second is to load the digital values directly, serially. In both cases, the inputs are stored in 8-bit registers.
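The analog loading path is a single-slope (ramp) conversion shared by the three channels. The sketch below is a behavioral model under stated assumptions, not the chip's circuit. It assumes an ideal ramp that steps from `v_min` to `v_max` in 2^8 = 256 counter steps, matching the 256 clock cycles of the Converting state. It also assumes that the comparator keeps the register transparent while the ramp has not exceeded the input. The function name `ramp_adc` and the 0 to 1 input range are hypothetical.

```python
def ramp_adc(inputs, v_min=0.0, v_max=1.0, bits=8):
    """Behavioral model of a shared-ramp (single-slope) A/D conversion.

    Assumptions (illustrative, not taken from the chip): an ideal analog
    ramp advances one step per counter value, synchronized with the
    digital counter; while ramp <= input, the comparator output enables
    the register, so each register ends up holding the last such count.
    All channels are converted during the same ramp sweep.
    """
    steps = 2 ** bits
    lsb = (v_max - v_min) / steps
    codes = [0] * len(inputs)
    for count in range(steps):            # one counter value per clock cycle
        ramp = v_min + count * lsb        # ideal, synchronized ramp level
        for ch, v in enumerate(inputs):
            if ramp <= v:                 # comparator still enables the latch
                codes[ch] = count
    return codes


print(ramp_adc([0.25, 0.5, 0.75]))  # [64, 128, 192]
```

Converting all channels inside one loop mirrors the point that the three conversions happen simultaneously during a single ramp sweep; inputs above `v_max` saturate at code 255.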
The four most significant bits (MSB) of each input select the simplex the input belongs to, and the four least significant bits (LSB) indicate the position of the input inside the simplex. The weighting coefficients $c_k$ are kept in the external memory, which is addressed with a 12-bit word. The value of the PWL function F(x) at each input x is the weighted sum of n + 1 = 4 parameter values. The four memory addresses at which the coefficients $c_j$ ($j \in z$, the set of indices of the n + 1 vertices of the simplex containing x) are stored are obtained by comparing the value of a digital ramp with the four LSB of the 8-bit registers. This ramp is implemented with a 4-bit digital counter (the four LSB of the 8-bit counter). Each 12-bit address is obtained by concatenating n = 3 4-bit strings. The i-th string is equal to the four MSB of the i-th register if the counter value is greater than or equal to the four LSB of the register; otherwise, the i-th string is the four MSB of the register plus one. The comparison between the counter and each register is done with a digital comparator. Each address is calculated by a block called Address Generator, and the weighted sum is done with a 12-bit adder. The weighted sum $\sum_{j \in z} c_j \mu_j$ is obtained for free, since the memory position of $c_j$ is addressed by the Address Generator for a time proportional to $\mu_j$. Therefore, the 12-bit adder alone suffices to perform the whole weighted sum.

B Architecture

The IC has an analog block and a digital block; as shown in Fig. 2, they are powered from separate supplies so that they can operate and be tested independently. The analog block consists of three A/D converters, each based on an external ramp and an OTA comparator. The analog ramp must be synchronized with the internal counter. The comparator output is used to latch the counter value corresponding to the input, so that the A/D conversion is performed simultaneously in the three input channels. The comparator outputs are connected to output pads and the Latch signals are connected to input pads.
Therefore, an external signal can be used to latch the values in the registers. This alternative was used to obtain the experimental results of the IC.

Figure 2: Chip architecture.

As shown in Fig. 2, the digital block consists of a control block, an 8-bit counter, a 12-bit adder and three sets of 8-bit register, address generator and comparator. Appropriately sized buffers were designed to drive the clock and clear lines.

B 1 Control Block

In order to perform the A/D conversion and the function evaluation, the chip has three states, called Nothing, Converting and Processing, which are coded with two registers in the Control block. In the Nothing state, the I/O bus works as an output bus and shows the previously calculated value of the function. When the input SP is "1", the finite state machine (FSM) goes to the Converting state to perform the A/D conversion. The FSM stays in this state for 256 clock cycles and then goes into the Processing state. While the chip is performing the A/D conversion, the signal that latches the counter value into the register is generated by the OTA. In the next state (Processing), it must be ensured that the latch signals do not change; otherwise, the register would latch the new counter value. To avoid this, a multiplexer was placed at the latch input of the register; it passes the OTA output in the Converting state and sets the latch signal to "1" in the Processing state. In the Processing state, the I/O bus works as an input bus connected to the external RAM. In this state the chip performs the 16 additions, reading the PWL parameter values from the external memory. The two control signals EP (End of Processing) and PROC (Processing) provided by the FSM in each state are summarized in Table I.

Table I: Control signals provided by the FSM in each state.

State | EP | PROC
Nothing | 1 | 0
Converting | 0 | 0
Processing | 0 | 1

B 2 12-Bit Adder

To produce the weighted sum needed to obtain the value of F(x), the adder accumulates the sixteen values read from the memory and divides the result by sixteen. Adding 16 values of 8 bits requires a 12-bit adder (16 × 255 = 4080 < 2^12); the divide-by-16 operation is easily done by taking only the 8 most significant bits. The memory words are 8 bits wide, so the 4 most significant bits of the corresponding adder input are connected to "0".
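The word-length argument for the 12-bit adder can be checked with a short behavioral sketch. It assumes 8-bit unsigned memory words and models the accumulator width by masking to 12 bits. The helper name `accumulate_16` is hypothetical. Because 16 × 255 = 4080 fits in 12 bits, the mask never truncates, and dividing by 16 amounts to reading the 8 MSB of the accumulator as the integer part and the 4 LSB as the fraction.

```python
def accumulate_16(words):
    """Behavioral model of the 12-bit accumulator (hypothetical helper).

    words: sixteen 8-bit unsigned values read from the external RAM.
    Returns (acc, integer_part, fraction_part), where acc is the 12-bit sum,
    integer_part its 8 MSB and fraction_part its 4 LSB, so that
    acc / 16 == integer_part + fraction_part / 16.
    """
    if len(words) != 16 or not all(0 <= w <= 0xFF for w in words):
        raise ValueError("expected sixteen 8-bit unsigned words")
    acc = 0
    for w in words:
        # The 8-bit word is zero-extended to 12 bits (4 MSB tied to "0").
        acc = (acc + w) & 0xFFF
    return acc, acc >> 4, acc & 0xF


print(accumulate_16([1] * 8 + [2] * 4 + [1] * 4))  # (20, 1, 4), i.e. 1 + 4/16
print(accumulate_16([255] * 16))                   # (4080, 255, 0): worst case fits
```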
The adder circuit comprises two modules: one calculates the carry and the other calculates the sum.

B 3 8-Bit Counter

The 8-bit counter is used for two different functions: to perform the A/D conversion and to sequence the 16 additions of the memory parameter values ($c_i$). This block has a modular structure and works with a two-phase clock.

B 4 8-Bit Register

Each register is a master-slave register with a two-phase clock: the master reads the input data when phase one is at logic "1" and holds the data when it is at logic "0". The slave works in a similar fashion, but with the second phase.

B 5 Comparator

The comparator compares the least significant bits of each input register with the digital ramp. Since the ramp is implemented with the 8-bit counter, the 4 LSB of the counter and the 4 LSB of the input register are connected to the comparator.

[Figure 3: Comparator (inputs D0-D3 and R0-R3).]

B 6 Address Generator

The address generator determines the memory address where the $c_i$ parameter is stored. The inputs of this circuit are the 4 MSB of the input register and the comparator output. If the comparator output is 0, the corresponding address generator output is directly the four MSB of that input register. If the comparator output is 1, the corresponding address generator output is the next consecutive value (the four MSB plus one).

[Figure 4: Address generator (signals S0-S3, D0-D3, IN).]

C Numerical example

To clarify the mathematical background of Section II and the IC operation explained in the previous section, let us consider a hypothetical two-dimensional example. Suppose the continuous PWL function F(x1, x2) is defined over a simplicial partition with m1 = m2 = 2 and unit grid step (δ = 1), as depicted in Fig. 5.

Figure 5: A two-dimensional PWL function.

The values of the PWL function at the vertex points, $c_i = F(x_1, x_2)$, are collected and stored in the RAM as summarized in Table 1.

Table 1: $c_i = F(x_1, x_2)$ values.

i | Vertex | Memory address | c_i = F(x1, x2)
0 | (0, 0) | 00000000 | 0
1 | (0, 1) | 00000001 | 2
2 | (0, 2) | 00000010 | 1
3 | (1, 0) | 00010000 | 0
4 | (2, 0) | 00100000 | 0
5 | (1, 1) | 00010001 | 1
6 | (1, 2) | 00010010 | 2
7 | (2, 1) | 00100001 | 2
8 | (2, 2) | 00100010 | 1

The evaluation of F(x1, x2) at an arbitrary point, for instance x = (1.5, 1.75), is obtained as follows. As a first step, the point x is decomposed as

$$\begin{bmatrix} 1.5 \\ 1.75 \end{bmatrix} = 0.5 \begin{bmatrix} 2 \\ 2 \end{bmatrix} + 0.25 \begin{bmatrix} 1 \\ 2 \end{bmatrix} + 0.25 \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

Now, let us introduce the following notation:

$$X_b = [B_{MSB}.B_{LSB}] = [B_3 B_2 B_1 B_0 . B_{-1} B_{-2} B_{-3} B_{-4}]$$

where $X_b$ denotes an 8-bit number split into two sections separated by a point. $B_{MSB}$ and $B_{LSB}$ are the integer and fractional parts of the number, respectively, and $B_n$ is the bit of weight $2^n$, for $n \in \{3, 2, 1, 0, -1, -2, -3, -4\}$. The digital code for x is then given by

$$\begin{bmatrix} 0001.1000 \\ 0001.1100 \end{bmatrix} = .1000 \begin{bmatrix} 0010 \\ 0010 \end{bmatrix} + .0100 \begin{bmatrix} 0001 \\ 0010 \end{bmatrix} + .0100 \begin{bmatrix} 0001 \\ 0001 \end{bmatrix}$$

$B_{MSB}$ and $B_{LSB}$ are, in fact, the most and least significant bits of the 8-bit input register. Notice that the decomposition of the point x = (1.5, 1.75) can also be rewritten as

$$\begin{bmatrix} 1.5 \\ 1.75 \end{bmatrix} = 0.5 \left\{ \begin{bmatrix} 1 \\ 1 \end{bmatrix} + \begin{bmatrix} 1 \\ 1 \end{bmatrix} \right\} + 0.25 \left\{ \begin{bmatrix} 1 \\ 1 \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\} + 0.25 \left\{ \begin{bmatrix} 1 \\ 1 \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \end{bmatrix} \right\}$$

where $[1\ 1]^T$ is the simplex selector term, which corresponds to the four MSB of the input registers.
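The decomposition above can be reproduced by the time-multiplexed scheme of Section III. The behavioral sketch below assumes that, for each of the 16 counter values k, the address generator outputs MSB + 1 for an input whose LSB field is greater than k (comparator output 1), and MSB otherwise. The accumulator then adds the addressed coefficient. The dictionary `table1` transcribes Table 1, with vertex tuples standing in for the concatenated memory address. The function name `evaluate_pwl` is hypothetical, and the model is an illustration rather than the chip's netlist.

```python
# Coefficients c_i of Table 1, keyed by vertex (x1, x2).
table1 = {
    (0, 0): 0, (0, 1): 2, (0, 2): 1,
    (1, 0): 0, (2, 0): 0, (1, 1): 1,
    (1, 2): 2, (2, 1): 2, (2, 2): 1,
}


def evaluate_pwl(codes, coeffs, frac_bits=4):
    """Time-multiplexed evaluation of eq. (8) (behavioral sketch).

    codes: input codes with frac_bits fractional (LSB) bits below the
    integer (MSB) field. coeffs: mapping from vertex tuple to c_i.
    Returns (acc, cycles): acc is the raw accumulator value, equal to
    F(x) * 2**frac_bits, and cycles counts how many counter values
    addressed each vertex, i.e. mu * 2**frac_bits.
    """
    mask = (1 << frac_bits) - 1
    msb = [c >> frac_bits for c in codes]
    lsb = [c & mask for c in codes]
    acc = 0
    cycles = {}
    for k in range(1 << frac_bits):          # digital ramp: 0..15
        # Comparator output is 1 while the ramp is below the LSB field;
        # the address generator then selects the next vertex (MSB + 1).
        vertex = tuple(m + (1 if k < l else 0) for m, l in zip(msb, lsb))
        acc += coeffs[vertex]
        cycles[vertex] = cycles.get(vertex, 0) + 1
    return acc, cycles


acc, cycles = evaluate_pwl([0b00011000, 0b00011100], table1)  # x = (1.5, 1.75)
print(cycles)         # {(2, 2): 8, (1, 2): 4, (1, 1): 4}
print(acc, acc / 16)  # 20 1.25
```

The cycle counts 8, 4 and 4 correspond to the weights 0.5, 0.25 and 0.25 of the decomposition, and the accumulator value 20 (000000010100 in binary) divided by 16 gives F(x) = 1.25. The sketch does not handle inputs whose MSB field already points at the last grid vertex with a nonzero LSB field, since that vertex + 1 is not in the table.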
In digital format it is given by

$$\begin{bmatrix} 0001.1000 \\ 0001.1100 \end{bmatrix} = .1000 \left\{ \begin{bmatrix} 0001 \\ 0001 \end{bmatrix} + \begin{bmatrix} 0001 \\ 0001 \end{bmatrix} \right\} + .0100 \left\{ \begin{bmatrix} 0001 \\ 0001 \end{bmatrix} + \begin{bmatrix} 0000 \\ 0001 \end{bmatrix} \right\} + .0100 \left\{ \begin{bmatrix} 0001 \\ 0001 \end{bmatrix} + \begin{bmatrix} 0000 \\ 0000 \end{bmatrix} \right\}$$

In accordance with reference [6], from a purely mathematical point of view, the $\mu_{i_l}$ parameters are computed as

$$\mu_{i_3} = 0.5, \qquad \mu_{i_2} = 0.25, \qquad \mu_{i_1} = 1 - (\mu_{i_2} + \mu_{i_3}) = 1 - 0.75 = 0.25$$

From a circuit point of view, the $\mu_{i_l}$ values indicate the number of times (out of the 16 counter cycles) that a parameter must be added to itself. Fig. 6 shows the two comparator outputs for our example. Notice that $\mu_{i_3}$ takes 8 ramp cycles, while $\mu_{i_2}$ and $\mu_{i_1}$ take 4 cycles each.

[Figure 6: μ parameters. Digital ramp, comparator outputs for the x1 LSB and x2 LSB, and the resulting intervals μi3, μi2 and μi1.]

Finally, according to equation (8), F(x) is computed as a weighted sum of the product terms $\mu_{i_l} c_{i_l}$, where $c_{i_l}$ is the value of F at the $i_l$-th vertex. Substituting the values of F from Table 1 at the vertices $[2\ 2]^T$, $[1\ 2]^T$ and $[1\ 1]^T$ gives

$$F(x_1, x_2) = 0.5(1) + 0.25(2) + 0.25(1) = 1.25$$

In digital format, the evaluation of F(·) is obtained directly from the 12-bit adder output, which performs a $\mu_{i_l}$-times sum of the $c_i$ values selected by the address generator. The memory addresses of the $c_i$ values are 00100010, 00010010 and 00010001. For our example, the value of the PWL function is given by

$$F(x_1, x_2) = \underbrace{000000000001 + \cdots + 000000000001}_{8\ \text{terms}} + \underbrace{000000000010 + \cdots + 000000000010}_{4\ \text{terms}} + \underbrace{000000000001 + \cdots + 000000000001}_{4\ \text{terms}} = 000000010100$$

Since the adder result is scaled by 16, it must be divided by 16 to obtain the final result. This is done by taking the 8 most significant bits of the adder output as the integer part of the final result and the 4 least significant bits as the fractional part. The evaluation of F(·) at the input (x1, x2) in digital format is therefore

$$F(0001.1000,\ 0001.1100) = 00000001.0100$$

which equals 1 + 4/16 = 1.25, in agreement with the result above.

IV Layouts for the digital sections of the IC

In this section we present the layouts of the digital sections of the IC. The layouts were designed with the Tanner EDA Tools software. The IC was fabricated in an n-well, non-silicided 0.5 µm CMOS process with 3 metal layers and 2 poly layers. All transistors of the digital part are minimum size: the PMOS transistors are 3 µm × 0.6 µm and the NMOS transistors are 1.8 µm × 0.6 µm. Fig. 7 shows the comparator layout; this block measures 114 µm × 57 µm. The selector layout, of size 30 µm × 114 µm, is shown in Fig. 8. The 1-bit adder block of the 12-bit adder is shown in Fig. 9.

Figure 7: Comparator layout.
Figure 8: Selector layout.
Figure 9: Adder layout.
Figure 10: The IC, with its main blocks indicated.

V Conclusions

The implementation of a simplicial PWL function evaluator has been presented. The proposed IC evaluates a three-dimensional PWL function with 8-bit output precision. The block diagram and a detailed explanation of the chip operation have been given. The mathematical background was presented, and a simple two-dimensional numerical example illustrates the chip operation.

VI Acknowledgment

Dr. Víctor M. Jiménez-Fernández is grateful for the partial financial support he received from the National Institute for Astrophysics, Optics and Electronics (INAOE) for his postdoctoral visiting position at the Universidad Nacional del Sur, Bahía Blanca, Argentina. The authors thank Tomaso Poggi for his help in testing the chip. The authors are also grateful to the "Fundación Universidad Nacional del Sur" for the support given through PICT 2003 No. 13468.

REFERENCES

[1] M. Storace and M. Parodi, "Towards analog implementations of PWL two-dimensional non-linear functions," International Journal of Circuit Theory and Applications, vol. 33, no. 2, pp. 147-160, Mar.-Apr. 2005.
[2] M. Parodi, M. Storace, and P. Julián, "Synthesis of multiport resistors with piecewise-linear characteristics: a mixed-signal architecture," International Journal of Circuit Theory and Applications, vol. 33, no. 4, pp. 307-319, Jul.-Aug. 2005.
[3] P. Julián, A. Desages, and B. D'Amico, "Orthonormal high-level canonical PWL functions with applications to model reduction," IEEE Transactions on Circuits and Systems-I: Fundamental Theory and Applications, vol. 47, pp. 702-712, May 2000.
[4] P. Julián and O. Agamennoni, "High-level canonical piecewise linear representation using a simplicial partition," IEEE Transactions on Circuits and Systems-I: Fundamental Theory and Applications, vol. 46, pp. 463-480, Apr. 1999.
[5] P. Julián, R. Dogaru, and L. Chua, "A piecewise-linear simplicial coupling cell for CNN gray-level image processing," IEEE Transactions on Circuits and Systems-I: Fundamental Theory and Applications, vol. 49, pp. 904-913, Jul. 2002.
[6] P. Mandolesi, P. Julián, and A. Andreou, "A scalable and programmable simplicial CNN digital pixel processor architecture," IEEE Transactions on Circuits and Systems-I: Regular Papers, vol. 51, pp. 988-996, May 2004.
[7] M. Di Federico, P. Julián, T. Poggi, and M. Storace, "A simplicial PWL integrated circuit realization," accepted in IEEE International Symposium on Circuits and Systems (ISCAS 2007), New Orleans, USA, May 2007.
