# A Low-Power Integrated Circuit for Interaural Time Delay Estimation Without Delay Lines

A. Chacón-Rodríguez, Member, IEEE, F. Martin-Pirchio, S. Sañudo, and P. Julián, Senior Member, IEEE

Abstract—A low-power IC for the estimation of the delay between two infinitely clipped (digital) signals is designed and implemented in a 0.35- $\mu$ m standard CMOS technology. The proposed circuit is based on a sliding-mode control system and does not need past values of the inputs, which are usually stored using chains of digital registers or analog delay lines and significantly increase the power consumption. The IC is intended to work in ultralow-power miniature sensor network nodes performing localization in the audio range [20, 1000] Hz, as part of a forest environmental protection network. Power dissipation results show a core power consumption of 1.04  $\mu$ W at 3.3 V and only 282 nW at 1.8 V—in both cases with a clock frequency of 200 kHz. The circuit is fully operative and was successfully tested on field as part of a low-power bearing sensor unit.

*Index Terms*—Acoustic signal processing, bearing angle estimation, CMOS digital integrated circuits, correlation-derivative circuit, delay estimation correlation methods, direction of arrival estimation, low-power consumption, low-power sensor networks, microphone arrays.

## I. INTRODUCTION

Several techniques can be used for the localization of acoustic sources [1], [2]. A number of analog and digital VLSI ICs have been successfully demonstrated in [2]–[6], yet only a few of them have been designed to operate with very low power consumption; for example, the mixed-signal IC reported in [6], based on a spatial gradient approach, achieves 32 $\mu$W at a 3-V supply. Most of these ICs rely on the interaural time delay (ITD), also known as the time difference of arrival, between two sensors (microphones in the case of audio signals) for the localization of the source.

The direct measurement of the ITD is rather complex; in general, all localization methods use an indirect measurement of the ITD. Based on this indirect measurement, several algorithms can be proposed to obtain an estimation robust to noise, sensor variations, etc.

Manuscript received December 18, 2008; revised March 12, 2009. First published June 16, 2009; current version published July 17, 2009. This work was supported in part by the Agencia Nacional de Promoción Científica y Técnica under Grant PICT 2006 No. 1835, by the Universidad Nacional del Sur under Grant PGI 24/ZK12, and by the Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET) under Grant PIP # 5048 2005-2006. The work of A. Chacón-Rodríguez was supported by a scholarship funded by the Organization of American States, the Instituto Tecnológico de Costa Rica, and the Ministerio de Ciencia y Tecnología de Costa Rica. This paper was recommended by Associate Editor J. S. Chang.

A. Chacón-Rodríguez is with the Laboratorio de Componentes Electrónicos, Universidad Nacional de Mar del Plata, 7600 Mar del Plata, Argentina, on leave from Instituto Tecnológico de Costa Rica, 7050 Cartago, Costa Rica (e-mail: alchacon@itcr.ac.cr).

F. Martin-Pirchio, S. Sañudo, and P. Julián are with the Instituto de Investigaciones en Ingeniería Eléctrica (IIIE), CONICET and Departamento de Ingeniería Eléctrica y Computadoras (DIEC), Universidad Nacional del Sur, 8000 Bahía Blanca, Argentina.

Digital Object Identifier 10.1109/TCSII.2009.2023281

The precision of the signal acquisition for the generation of the indirect measurement of delay has a strong impact on the complexity (and, thus, the power requirements) of the resulting circuits. Data word size determines the complexity (size) and power dissipation of the analog-to-digital converter, as well as those of the internal processing blocks (delay lines, adders, multipliers, etc.). Circuit complexity also depends on the length of the word representing the estimation $(\log_2)$. In [8] and [9], the authors prove that the power spectrum of certain signals is related to the spectrum of their infinitely clipped versions by a nonlinear function. In particular, for the localization of band-limited audio signals, a 1-bit quantization of the signal amplitude may be performed with a minor sacrifice in precision, as shown in [7]. A direct consequence of this is a drastic reduction in circuit size. There are two IC realizations along these lines reported in the literature (see [10] and [11]), where register chains are used to store the input signal samples needed to determine the ITD via a cross-correlation-derivative (CCD) approach. The requirement on time resolution sets the clock speed and, therefore, the sampling rate, whereas the measurement range defines the length of the register chains. The activity of these register chains, which are clocked at 200 kHz to achieve a specified time resolution of 5 $\mu$s, imposes a lower bound on the power consumption of the IC. The data stored in the register chains (more specifically, the number of taps between changes of bit signs in both chains) are used as an indirect measurement of the ITD. Using this information as an input, different filtering, estimation, or control algorithms can be utilized to produce a robust estimation in a given time window. In particular, the IC presented in [7] performs an integration during a time window of 1 s.

In this brief, we present the delay estimation problem as a control system where the input to the system is the delay to be measured and the error is the difference between that delay and an internal state. The control system can be shown to be a sliding-mode control [12] that linearly converges to the ITD estimation and requires minimal logic. Unlike previous cases, no chain of registers (or analog delay line) is used to continuously delay the signal for the intended range of the estimator, as needed in standard correlation methods (see [13]) and also in the correlation and derivative-correlation approaches (as in [7]; see Fig. 1). In the present case, only three registers are required for synchronization and edge detection purposes. The control strategy updates an internal state by a fixed count number, depending on the delay estimation sign, which is obtained without the need of register chains. The algorithm performs tracking of the measured delay using an architecture consisting of a state machine and a data path composed of an 8-bit counter, an 8-bit register, and a few registers for synchronization and edge detection purposes. The experimental results of the IC, fabricated in a 0.35-$\mu$m standard CMOS technology, are presented and compared against circuits performing similar functions.

Fig. 1. CC bit slice. One per delay stage.

The elimination of the register chains produces a significant reduction of the IC power consumption by at least two orders of magnitude with respect to all other approaches previously reported in the literature. The measured IC power consumption is only 1.04  $\mu$ W at 3.3 V and 282 nW at 1.8 V (in both cases with a sampling clock frequency of 200 kHz). Considering that the IC has a range of 256 unit delays, the power consumption per delay stage is 4.06 and 1.1 nW, respectively (the minimum power consumption for equivalent specifications reported in the literature is 118 nW per stage [11]). The ICs are intended to work as part of a miniature autonomous wireless sensor network for forest environmental protection. Energy saving is critical, because the replacement of batteries or the use of alternative energy sources such as solar power is severely limited. Moreover, size limitations force us to use small-size (small-charge-capacity) batteries.
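As a quick check of the per-stage figures quoted above, the measured core power can be divided by the 256-unit delay range. The symbol $P_{\text{stage}}$ is introduced here only for illustration and does not appear elsewhere in this brief.

$$\begin{aligned} P_{\text{stage}}\big|_{3.3\,\text{V}} &= \frac{1.04\ \mu\text{W}}{256} \approx 4.06\ \text{nW} \\ P_{\text{stage}}\big|_{1.8\,\text{V}} &= \frac{282\ \text{nW}}{256} \approx 1.10\ \text{nW} \end{aligned}$$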

This brief is organized as follows. Section II gives an introduction to delay estimation using register chains. Section III explains the new approach without register chains. Section IV presents the silicon implementation of the new method. Section V discusses the verification results. Finally, Section VI offers some conclusions.

## II. REGISTER-CHAIN-BASED ITD ESTIMATORS

This section presents background material on low-power architectures for ITD measurement based on delay chains, as an introduction to the new architecture.

### A. Cross-Correlation (CC) Approach

The standard parallel architecture for ITD measurement consists of a bank of delay registers. One of the signals enters the bank and is delayed by one unit before entering each bit slice; the other enters each bit slice without delay, as shown in Fig. 1.

Fig. 3. New closed-loop approach for estimating the ITD.

The input signal $x_2(t)$ and the delayed signal $x_1(t-i\tau)$, at the $i$th stage, enter a block where they are multiplied and added into a counter. After a certain integration time $T$, the maximum output indicates the normalized (with respect to the sampling time) ITD estimate $(\text{ITD}_e)$ between $x_1$ and $x_2$. In other words, with $y(i)$ denoting the counter output of the $i$th stage,

$$\text{ITD}_e = \{i : y(i) > y(j),\ \forall j \neq i\}. \tag{1}$$
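The selection rule in (1) can be illustrated with a short software model. This is only a behavioral sketch, not the register-bank hardware: the function name `cc_itd_estimate`, the $\pm 1$ signal encoding, and the test signals (200-Hz square waves sampled at 200 kHz, with a 37-sample lag) are assumptions chosen for illustration.

```python
import numpy as np

def cc_itd_estimate(x1, x2, n_stages):
    """Return (argmax_i y(i), y) for +/-1 valued 1-bit signals x1 and x2."""
    x1 = np.asarray(x1, dtype=int)
    x2 = np.asarray(x2, dtype=int)
    n = len(x1)
    start = n_stages - 1          # first sample where every stage has valid data
    y = np.empty(n_stages, dtype=int)
    for i in range(n_stages):
        # Stage i multiplies x2(t) by x1(t - i) and accumulates the products.
        y[i] = np.sum(x2[start:] * x1[start - i : n - i])
    return int(np.argmax(y)), y

# Hypothetical test signals: square waves with 1000 samples per period,
# x2 lagging x1 by 37 samples.
t = np.arange(5000)
period, true_delay = 1000, 37
x1 = np.where(t % period < period // 2, 1, -1)
x2 = np.where((t - true_delay) % period < period // 2, 1, -1)
estimate, y = cc_itd_estimate(x1, x2, n_stages=64)
```

Every stage integrates over the same window so that the counts are directly comparable; the cost of this approach is one delay register and one counter per stage, plus the search for the maximum.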

### B. CCD Approach

The architecture proposed in [10] replaces the correlation bit slice by a correlation-derivative one. Whenever a change of $x_1$ is detected before a change of $x_2$, the adder count is incremented. In the opposite case, the adder count is decremented (Fig. 2).

As proved in [10], the output $z$ is the spatial derivative of the correlation (with respect to the position of the index). Accordingly, $\text{ITD}_e$ can be obtained from the sign change of $z$

$$\text{ITD}_e = \{i : \operatorname{sign}[z(i)] \neq \operatorname{sign}[z(i+1)]\}. \tag{2}$$

This approach reduces the complexity (and power consumption) of the previous architecture because there is no need for a block to calculate the maximum of all outputs and also because the counters are smaller and have a smaller activity factor (events are counted at the frequency of the input signal as opposed to the previous approach that integrates using the clock frequency). A realization of this algorithm was presented in [11], fabricated in a standard 0.5- $\mu$ m CMOS process. The IC discriminates between delay differences down to 5  $\mu$ s with a power consumption of 12  $\mu$ W at 2 V. This IC has 64 delay stages; therefore, the power consumption per stage is 187.5 nW.
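The sign-change rule in (2) can be sketched in software as follows. Note the assumption: here $z$ is approximated by a backward finite difference of correlation counts $y(i)$, whereas the circuit in [10] obtains $z$ directly by counting the order of signal transitions. The function name and the example counts are hypothetical.

```python
import numpy as np

def ccd_itd_from_counts(y):
    """Return the index i at which the backward difference of y changes sign."""
    z = np.diff(np.asarray(y, dtype=int))    # z[j] = y[j + 1] - y[j]
    s = np.sign(z)
    change = np.nonzero(s[:-1] != s[1:])[0]  # j such that sign(z[j]) != sign(z[j + 1])
    if change.size == 0:
        return None                          # no sign change inside the range
    return int(change[0]) + 1                # z[j] is the backward difference at stage j + 1

# Hypothetical correlation counts peaking at stage 3.
y_example = [1, 3, 5, 7, 5, 3]
```

Only a comparison of adjacent signs is needed, which is why this approach avoids the maximum-search block required by (1).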

## III. NEW APPROACH

The new approach presented in this brief eliminates the delay chains and produces an estimation $\text{ITD}_e$ using a sliding-mode control technique. Conceptually, the measurement system is a control system such as the one illustrated in Fig. 3, where $W(k)$ is the state of the estimation at time $k$, and

$$\begin{aligned} W(k+1) &= W(k) + \operatorname{sign}(e(k)) \\ e(k) &= \text{ITD}_e(k) - W(k). \end{aligned} \tag{3}$$

Fig. 4. When an edge is detected in $X_1$, a descending count is initiated starting at $W(k)$. (a) $\text{ITD}_e(k) < W(k)$; thus, $S(k) = \operatorname{sign}(\text{ITD}_e(k) - W(k)) = -1$, and $W(k)$ must be decremented. (b) $\text{ITD}_e(k) > W(k)$; thus, $S(k) = +1$, and $W(k)$ is incremented. When $S(k) = 0$, $W(k)$ is kept, and the system has converged.

Assuming that $\text{ITD}_e$ is a slow signal with respect to the system's clock, it can be seen that $W(k)$ linearly converges to $\text{ITD}_e$ in at most $|\text{ITD} - W_0|$ steps, where $W_0$ is the initial value of $W(k)$, which is either a previously estimated delay or zero.
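The update law in (3) and its step count can be checked with a few lines of code. This is a minimal sketch assuming a constant delay expressed in clock counts; the function names are illustrative only.

```python
def sign(v):
    """Return -1, 0, or +1 according to the sign of v."""
    return (v > 0) - (v < 0)

def track_delay(itd, w0=0, max_steps=1000):
    """Iterate W(k+1) = W(k) + sign(ITD - W(k)) and return the trajectory of W."""
    w = w0
    history = [w]
    for _ in range(max_steps):
        s = sign(itd - w)
        if s == 0:            # error is zero: the estimate has converged
            break
        w += s                # move one count toward the delay
        history.append(w)
    return history

# Number of update steps equals |ITD - W0| for a constant delay.
steps_up = len(track_delay(37, w0=0)) - 1
steps_down = len(track_delay(10, w0=25)) - 1
```

Because each update moves the state by exactly one count, the trajectory is a ramp toward the delay, which is the linear convergence referred to in the text.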

The key factor in the choice of this control algorithm is that $|\text{ITD}_e(k)|$ need not be obtained directly, since it is enough to know $\operatorname{sign}(\text{ITD}_e(k) - W(k))$, which can be estimated without any delay chains. To illustrate this, Fig. 4 shows two signals, with $X_1$ leading $X_2$. The instantaneous delay is given by $\text{ITD}_e$. At the rising edge of $X_1$, a descending count is initiated starting at $W(k)$. If $W(k) > \text{ITD}_e(k)$, then $X_2$ has already risen by the end of the count (count = 0), that is, $X_2(W(k)) = 1$; hence, $S = \operatorname{sign}(\text{ITD}_e(k) - W(k)) = -1$, and $W(k)$ is decremented by 1. If, on the contrary, $W(k) < \text{ITD}_e(k)$, then $X_2(W(k)) = 0$ at the end of the count; hence, $S = +1$, and $W(k)$ is incremented by 1, in agreement with (3). When the two edges coincide, $S = 0$ and $W(k)$ keeps its value, meaning that synchronism between the two signals has been reached.

The implementation of this algorithm is shown in Fig. 5. At time $k$, $W(k)$ is stored in a descending counter, which counts until a zero count is reached. At that moment, $W(k)$ is updated to $W(k + 1)$, and the system is deactivated until the next signal edge. It is clear that the choice of any other control action would require knowing the value of $\text{ITD}_e(k)$, which entails the need for delay chains to store it. The use of the sign instead of the whole value of $\text{ITD}_e(k)$ makes the use of delay chains unnecessary, at the expense of restricting the convergence speed to one step per signal edge.
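The way the sign is obtained without storing past samples can be sketched behaviorally: after a rising edge of $X_1$, wait $W$ clock cycles and look at $X_2$. The sketch below is not the authors' FSM; it assumes 0/1 sampled signals, $X_1$ leading $X_2$ by less than half a period, rising edges only, and a one-sample edge test to detect coincidence. All function names and the test signals are hypothetical.

```python
def rising_edge(x, t):
    """True if the 0/1 sequence x has a rising edge at clock index t."""
    return t > 0 and x[t - 1] == 0 and x[t] == 1

def update_w(x2, t_edge, w):
    """Update W after a rising edge of X1 at clock index t_edge."""
    t_zero = t_edge + w                 # instant at which the down-count reaches zero
    if x2[t_zero] == 0:
        return w + 1                    # X2 has not risen yet: W is too small
    if rising_edge(x2, t_zero):
        return w                        # edges coincide: keep W
    return max(w - 1, 0)                # X2 rose earlier: W is too large

def track(x1, x2, w0=0):
    """Run the estimator over a recorded pair of signals and return the final W."""
    w = w0
    for t in range(1, len(x1)):
        # Offline model: look ahead to the end of the count instead of waiting for it.
        if rising_edge(x1, t) and t + w < len(x2):
            w = update_w(x2, t, w)
    return w

# Hypothetical 0/1 square waves, 1000 clock cycles per period, X2 lagging by 37 cycles.
n, period, delay = 60000, 1000, 37
x1 = [1 if t % period < period // 2 else 0 for t in range(n)]
x2 = [1 if (t - delay) % period < period // 2 else 0 for t in range(n)]
```

Only the current values of the two inputs and the counter state are consulted, so no chain of past samples is required; the price is a single one-count correction per edge.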

The algorithm is slightly improved by considering both positive and negative edges of the input signal. Multiplexers at the input allow the estimation of negative delays by swapping the reference input. Convergence time  $T_c$  is given by a very simple formula

$$T_c = \frac{1}{2} \frac{f_{\rm clk}}{f_{\rm signal}} |\text{ITD} - W_0| \tag{4}$$

where $|\text{ITD} - W_0|$ is the absolute difference between the ITD to be estimated and the previous value stored in $W_0$.
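A worked instance of (4) may help fix the orders of magnitude. The sketch assumes that $|\text{ITD} - W_0|$ is expressed in unit delays (clock counts), in which case (4) yields $T_c$ in clock cycles; this reading follows from the dimensions of the formula and is not stated explicitly in the text. The example values (a 200-Hz tone and a 100-count delay change) are illustrative.

```python
def convergence_time_cycles(f_clk, f_signal, itd_counts, w0_counts):
    """Evaluate (4): T_c in clock cycles, with delays given in clock counts."""
    return 0.5 * (f_clk / f_signal) * abs(itd_counts - w0_counts)

# Hypothetical example: 200-kHz clock, 200-Hz tone, delay change of 100 counts.
f_clk = 200e3
t_c_cycles = convergence_time_cycles(f_clk, 200.0, 100, 0)
t_c_seconds = t_c_cycles / f_clk
```

The factor of one half reflects the use of both signal edges, that is, two updates per signal period.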

## IV. SILICON IMPLEMENTATION

Fig. 5. Closed-loop digital architecture. Notice the extra delay introduced to $Y_2$ to take into account the FSM latency.

Fig. 6. Microphotograph of the closed-loop architecture implemented with 0.35-$\mu$m CMOS standard cells.

The algorithm is implemented using a three-state finite-state machine (FSM). $X_1$ and $X_2$ are synchronized and routed through two multiplexers. A pipeline provides for the detection of transitions (Fig. 5). Due to the FSM latency, an extra register is used for $Y_2$. Signals data\_ready and out\_range implement the interface of the IC with external circuitry. The architecture was first described at the register-transfer level using Verilog and verified with an event-driven simulator. The Verilog code was ported to and tested on a field-programmable gate array (FPGA), which included the testing of the algorithm in a bearing sensor unit (BSU) such as the one described in [14]. Because the index counter only becomes active after a transition, and only until it reaches zero, latched clock gating at both the index counter and the storage register was introduced directly in the Verilog code [15]. This also produces an important reduction in power consumption because, in the application, the time delay is small compared with the period of the signal; therefore, the IC is idle for most of the signal period. Then, verification was performed at the technology synthesis level with an event-driven simulator. The final architecture was described and synthesized using the standard cells provided by the University ASIC Design Kit from Mentor Graphics. Critical registers were nonetheless replaced with minimum-size transistor cells. A photo of the IC is shown in Fig. 6.
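The benefit of gating the index counter can be estimated roughly. The sketch below assumes that the counter is clocked for about $W$ cycles after each signal edge and is gated otherwise, and it ignores FSM and synchronizer overhead; the function name and the example values are hypothetical and are not measured data.

```python
def counter_active_fraction(f_clk, f_signal, w_counts, edges_per_period=2):
    """Fraction of clock cycles during which the gated counter is running."""
    cycles_per_period = f_clk / f_signal
    return edges_per_period * w_counts / cycles_per_period

# Hypothetical example: 200-kHz clock, 200-Hz tone, estimated delay of 40 counts.
fraction = counter_active_fraction(200e3, 200.0, 40)
```

Under these assumptions the counter toggles during less than a tenth of the clock cycles, which is consistent with the statement that the IC is idle for most of the signal period.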



Fig. 7. IC's linearity test. The deviations at the borders are due to the circuit's 8-bit data range limit.



Fig. 8. Absolute estimation error with its standard deviation. As expected, the standard deviation is bounded to  $\pm 0.5$  lsb, that is,  $\pm 2.5 \ \mu$ s.

## V. EXPERIMENTAL RESULTS

The IC was first verified with an FPGA-based testbench that provided the test signals to apply to the IC. The testbench performs a pseudorandom sweep of delays within and outside the circuit's full range, with a 2.5-$\mu$s step. In Figs. 7 and 8, evaluations of the IC's linearity and its estimation accuracy are shown.

After the electrical characterization, the IC was incorporated into a BSU and taken to an open country field, where the unit was fully characterized for its use in the localization sensor network. The sensor unit has four microelectromechanical system microphones in a square array, providing a full 360° range. The signal of each microphone is amplified and filtered with a fourth-order bandpass filter, implemented with a bank of switched-capacitor filters for the matching of the channels. The bandpass filters cover the frequency range of interest, [100, 300] Hz. The signal is then quantized to 1 bit and fed to the delay estimator. The output of the estimator is sampled by a Mica2 Mote, which then transmits the data through the wireless sensor network. The network consisted in this case of the BSU and several acoustic pressure sensor units. A receiver node collects the data packets coming from the BSU and the pressure sensors and feeds them to a personal computer running MATLAB on a TinyOS environment.

Fig. 9. Open-field test setup. BSU on a wireless sensor network, with delay estimator IC incorporated.

Fig. 10. Open-field test results for an angle sweep between $-90^{\circ}$ and $90^{\circ}$ on the north–south axis. The standard deviation is below 4 bits for all angles, with $7^{\circ}$ having the worst case.

The test setup is shown in Fig. 9. A 200-Hz 80-dB<sub>SPL</sub> tone was used as a test signal. A full 360° angle sweep was done, at 10° intervals, with ten delay samples per angle, at 10 ms between samples. One chip was used for the two pairs, with the Mote providing the required switching of the microphones. Results for the BSU, using the north–south microphone pair, are shown in Fig. 10, for a $[-90^{\circ}, 90^{\circ}]$ sweep with regard to the north–south axis. The standard deviation shows that once the delay estimator converges to a steady-state value, it deviates by no more than 4 bits from that value, even considering sound fluctuations due to wind and the presence of interfering sounds in the vicinity. A picture of the chip on the BSU and the whole unit itself are shown in Fig. 11.
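For readers who want to relate delay counts to bearing angles, the standard far-field relation $\tau = d \sin\theta / c$ for a microphone pair can be used. Everything numerical in the sketch below is an assumption: the microphone spacing of the BSU is not given in this brief, the speed of sound is taken as 343 m/s, and the angle is measured from the broadside of the pair, which may differ from the convention used in Fig. 10.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C

def counts_to_bearing_deg(counts, f_clk, mic_spacing_m, c=SPEED_OF_SOUND):
    """Convert a delay in clock counts to a far-field bearing angle in degrees."""
    delay_s = counts / f_clk                  # one count equals one clock period
    ratio = delay_s * c / mic_spacing_m       # equals sin(theta) in the far-field model
    ratio = max(-1.0, min(1.0, ratio))        # clamp delays beyond the physical range
    return math.degrees(math.asin(ratio))

# Hypothetical spacing of 0.1 m between the two microphones of a pair.
angle = counts_to_bearing_deg(20, f_clk=200e3, mic_spacing_m=0.1)
```

Because the arcsine steepens near $\pm 90^{\circ}$, a fixed error in counts maps to a larger angular error toward the ends of the sweep than near broadside.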



Fig. 11. BSU on the field and a detail of the unit with the incorporated chip.

TABLE I. Power and Precision Comparison Between the Closed-Loop Architecture and Other Circuits Performing the Same Function

| Architecture | Technology | Power | Precision |
|---|---|---|---|
| Neuromorphic sound localizer @ 5 V [2] | 0.5 $\mu$m | 1850 $\mu$W | $\approx 10\ \mu$s<sup>1</sup> |
| Micropower gradient flow @ 3 V, 2 kHz [6] | 0.5 $\mu$m | 32 $\mu$W | 2 $\mu$s |
| Derivative cross-correlator @ 2 V, 200 kHz [11] | 0.5 $\mu$m | 12 $\mu$W | 5 $\mu$s |
| Closed-loop architecture @ 3.3 V, 200 kHz | 0.35 $\mu$m | 1.04 $\mu$W | 5 $\mu$s |
| Closed-loop architecture @ 1.8 V, 200 kHz | 0.35 $\mu$m | 282 nW | 5 $\mu$s |

<sup>1</sup> Precision for the neuromorphic sound localizer is based on the data reported for a 100-$\mu$s delay measurement [2].

Table I shows the measured power data from this version, compared to three alternative circuits reported in the literature. This version greatly surpasses the power performance of all other architectures while providing similar accuracy. The functional verification of the chip at 1.8 V and 200 kHz was also successful, giving a total power consumption of only 282 nW, excluding pads.

## VI. CONCLUSION

A closed-loop architecture for the estimation of the ITD between two digital signals has been proposed and fully verified in silicon. In terms of power dissipation, the architecture shows a significant cut with respect to other systems performing the same function, with equivalent accuracy. The circuit dissipates 1.04 $\mu$W at 3.3 V and only 282 nW at 1.8 V (18.9 $\mu$Ah and 9.4 $\mu$Ah, respectively), both at a 200-kHz sampling clock rate. Considering that the IC has a range of 256 unit delays, the power consumption per delay stage is 4.06 and 1.1 nW, respectively. This makes the proposed circuit particularly suited for its use as part of a miniature autonomous wireless sensor network for forest environmental protection.

## ACKNOWLEDGMENT

The authors would like to thank the MOSIS Educational Program for its support with the IC prototyping of this research project.

## REFERENCES

- [1] J. P. Lazzaro and C. Mead, "Silicon models of auditory localization," *Neural Comput.*, vol. 1, no. 1, pp. 41–70, 1989.
- [2] A. Van Schaik and S. Shamma, "A neuromorphic sound localizer for a smart MEMS system," *Analog Integr. Circuits Signal Process.*, vol. 39, no. 3, pp. 267–273, Jun. 2004.
- [3] I. Grech, J. Micallef, and T. Vladimirova, "Experimental results obtained from analog chips used for extracting sound localization cues," in *Proc. 9th Int. Conf. Electron., Circuits Syst.*, 2002, vol. 1, pp. 247–251.
- [4] G. H. Harris, C. J. Pu, and J. C. Principe, "A neuromorphic monaural sound localizer," in *Proc. Adv. Neural Inf. Process. Syst. II*, 1999, pp. 692–698.
- [5] T. Horiuchi, "An auditory localization and coordinate transform chip," in *Proc. Adv. Neural Inf. Process. Syst.*, 1995, vol. 7, pp. 787–794.
- [6] M. Stanacevic and G. Cauwenberghs, "Micropower gradient flow acoustic localizer," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 52, no. 10, pp. 2148–2156, Oct. 2005.
- [7] P. Julián, A. G. Andreou, G. Cauwenberghs, R. Riddle, and A. Shamma, "A comparative study of sound localization algorithms for energy aware sensor network nodes," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 51, no. 4, pp. 640–648, Apr. 2004.
- [8] J. Van Vleck and D. Middleton, "The spectrum of clipped noise," *Proc. IEEE*, vol. 54, no. 1, pp. 2–19, Jan. 1966.
- [9] H. Berndt, "Correlation function estimation by a polarity method using stochastic reference signals," *IEEE Trans. Inf. Theory*, vol. IT-14, no. 6, pp. 796–801, Nov. 1968.
- [10] P. Julián, A. G. Andreou, and D. H. Goldberg, "A low-power correlation-derivative CMOS VLSI circuit for bearing estimation," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 14, no. 2, pp. 207–212, Feb. 2006.
- [11] P. Julian, F. Martin-Pirchio, and A. G. Andreou, "Experimental results of a cascadable micropower time delay estimator," *Electron. Lett.*, vol. 42, no. 21, pp. 1218–1219, Oct. 2006.
- [12] V. I. Utkin, *Sliding Modes in Control and Optimization*. New York: Springer-Verlag, 1992.
- [13] G. Carter, "Coherence and time delay estimation," *Proc. IEEE*, vol. 75, no. 2, pp. 236–255, Feb. 1987.
- [14] F. Martin-Pirchio, S. Sañudo, H. Gutiérrez, and P. Julian, "An acoustic surveillance unit for energy aware sensor networks: Construction and experimental results," in *Proc. XII Workshop Iberchip*, San José, Costa Rica, 2006, pp. 191–194.
- [15] A. Amara and P. Royannez, "VHDL for low power," in *Low-Power Electronics Design*, C. Piguet, Ed. Boca Raton, FL: CRC Press, 2004, ch. 11, pp. 1–26.