# A Modular Pipelined Processor for High Resolution Gamma-Ray Spectroscopy

Alejandro Veiga and Christian Grunfeld

*Abstract*—The design of a digital signal processor for gamma-ray applications is presented in which a single ADC input can simultaneously provide temporal and energy characterization of gamma radiation for a wide range of applications. Applying pipelining techniques, the processor is able to manage and synchronize very large volumes of streamed real-time data. Its modular user interface provides a flexible environment for experimental design. The processor can fit in a medium-sized FPGA device operating at ADC sampling frequency, providing an efficient solution for multi-channel applications. Two experiments are presented in order to characterize its temporal and energy resolution.

*Index Terms*—Digital pulse processing, field programmable gate arrays, gamma-ray spectroscopy, nuclear instrumentation, radiation detection.

## I. INTRODUCTION

High-resolution gamma-ray signal processing is a challenging task for digital systems. Computationally intensive algorithms are required to extract temporal and amplitude information from interesting events, while real-time analysis of thousands or millions of these signals per second is demanded. This imposes strict performance requirements upon the digital processors.

Sequential microprocessor architectures fail to operate with such high data volumes and sampling rates. On the other hand, Field Programmable Gate Arrays (FPGA) can achieve the required performance taking advantage of task parallelism, at the expense of a low-level approach to the solution [1], [2]. Consequently, high-level programming is not available for regular users and code reusability is a target that remains far from reach. Writing real-time processing algorithms for modern physics experiments has become a time-consuming task that relies on a skill that is not usually available among the scientists who generate the requirements.

Microprocessor and FPGA technologies were considered in the past as two completely different solutions: microprocessor sequential behavior was suitable for high level programming, while FPGA parallelism was more adequate for low-level design. However, in the last few years, a convergence of both technologies is being envisioned. Microprocessors have now evolved to exploit explicit parallelism in multicore organizations and cores have been simplified and replicated, including streaming, digital signal processing (DSP) and single instruction multiple data (SIMD) capabilities. This approach is now closer to the FPGA design principle, where several tasks can be scheduled for simultaneous execution. This kind of parallelism has proven to be adequate for applications where several streams can be processed simultaneously in symmetric processors.

C. Grunfeld is with the Physics Department, Universidad Nacional de La Plata, B1900AWN La Plata, Buenos Aires, Argentina.

Digital Object Identifier 10.1109/TNS.2016.2515851

FPGA devices have also experienced important architectural advances in recent years. Low-cost devices have evolved to include embedded microprocessors, large memory blocks, fast communication interfaces and hundreds of DSP modules. A recently proposed OpenCL standard [3] is a hopeful attempt to unify the programming interface for both technologies, with heavy parallelism in mind. At the same time, FPGA vendors are designing low-cost, low-power devices with architectures that can accommodate this model. In the near future it will be possible to find hardware and software tools that provide a high-level interface to efficient parallel algorithms, which can eventually be implemented on both microprocessors and FPGA architectures.

Under these circumstances, it is reasonable to try to model the solution to a performance intensive problem in a way that can fit this paradigm. With that purpose in mind, we have designed and implemented a modular processing structure that exploits the streaming and parallel properties of the case we are interested in: gamma-ray signal processing.

In this work we present a pipelined organization based on the properties of high speed, pipelined analog-to-digital converters (ADC). With our scheme, signal processing can be designed by the user as a sequence of reusable modules, simplifying laboratory setup. It will be shown that a single-ADC processor, implemented with medium sized FPGA devices, can be designed to process and synchronize a large volume of data with the required precision, satisfying both energy and temporal resolution for a wide range of gamma-ray applications.

## II. PIPELINED DATA CONVERTERS AND PROPOSED PROCESSING ARCHITECTURE

Real-time digital pulse processing (DPP) presents a compromise between accurate energy and timestamp determination. In the first case, in order to determine the height or the area of a radiation detector output pulse, random noise must be eliminated by filtering or averaging techniques. Such reduction of the system bandwidth degrades the determination of the event timestamp due to rise time extension, making measurements highly dependent on filter parameters and amplitude variations. This compromise has resulted in two different approaches to instrumentation design: (1) the fast channel, noisy but with high bandwidth, allows accurate determination of event arrival; and (2) the slow channel, where signal is filtered to reduce noise, adequate for calculation of energy, amplitude or area.

Fig. 1. Single-channel hybrid fast/slow detection system for gamma spectroscopy, including visualization and output modules (shaded).

Manuscript received June 05, 2015; revised November 13, 2015; accepted January 04, 2016. Date of publication February 03, 2016; date of current version February 16, 2016.

A. Veiga is with the Department of Electrical Engineering, Universidad Nacional de La Plata, CONICET, B1900AWN La Plata, Buenos Aires, Argentina (e-mail: alejandro.veiga@ing.unlp.edu.ar).

0018-9499 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications\_standards/publications/rights/index.html for more information.

However, some applications require simultaneous time and energy determination. In these cases such compromise usually involves duplicated equipment. For example, in time-of-flight positron emission tomography (ToF-PET), sub-nanosecond timestamp determination is crucial in order to increase the spatial resolution of the PET technique [4], while precise energy determination helps to reject spurious events and thereby also improves image resolution.

Analog-to-digital converters face an equivalent constraint: a compromise arises between speed and resolution of the devices. In recent years, commercial pipelined ADCs have improved greatly in speed, resolution and dynamic performance, and are now being considered as a viable alternative. They can provide 16-bit resolution at up to 310 MSPS, which makes them a good option for hybrid layouts, where both energy and timestamp can be determined within a single data acquisition system.

The design complexity of these devices increases linearly with the number of bits, instead of exponentially, providing converters with high speed, high resolution, and low power all at the same time. Analog to digital conversion is performed by several consecutive flash ADCs and DACs that produce the result after a fixed number of clock cycles. This behavior suggests a digital processor design that maintains the pipelined organization. Pipelining is a form of parallelism that is well suited for data streams that require a sequence of operations. This technique, if properly implemented, can drastically improve efficiency of the digital circuit, producing a throughput that can be well above those achieved using sequential microprocessor architectures. Latency introduced by the ADC and the rest of the pipeline is an issue that can be handled in most nuclear applications, as long as it is constrained and precisely specified.

A development platform for an AD9467 analog-to-digital converter from Analog Devices [5] was selected in order to test the performance of both pipelined converters and the modular processing architecture. The kit includes a board with a single channel 16-bit 250 MSPS ADC that can be tested in different input configurations. The board is interfaced to a motherboard based on a Virtex4-FX20 FPGA from Xilinx. This set is intended to run calibration and test programs provided by the manufacturer, but after some research, it was found that regular Xilinx design tools (i.e., ISE Design Suite) could be used on this platform for the development of our processor. An analog front-end was incorporated, based on the ADL5562 differential wide-band amplifier. The front-end can operate with a DC-coupled, single-ended, 50 $\Omega$ input, in order to match the single-ended detector output with the differential input of the ADC.

Using this platform, our multi-channel digital processor was designed as a collection of parallel pipelines whose inputs are the data streams produced by the free-running pipelined ADC, as shown in Fig. 1. Each pipeline is composed of a cascade of elements that can operate simultaneously on different stages of the processing sequence. Every stage has a single input, connected to the single output of the previous stage. Data transfer from one stage to the next is performed in every clock cycle. In our case, all the stages of the pipeline were designed to receive as input, and produce as output, a single 16-bit word in every pipeline cycle, generating a data stream driven by the ADC clock. In each pipeline, successive modules produce output after a well determined number of clock cycles. Bifurcations in the pipeline can be introduced at any point, in order to use different parameters and parallel modules at different clock rates. Multiple processing pipelines can be implemented in order to process several detectors simultaneously and parallel streams can be synchronized using delay modules. Modules that require sequential operations are internally pipelined in order to keep maximum throughput.
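The fixed-latency behavior described above can be illustrated with a small software model. This is only a sketch in Python, not the authors' Verilog: the `Stage` class, the `run` helper and the chosen latencies are hypothetical names and values introduced here to show how a pure delay stage aligns two branches of different depth.

```python
from collections import deque


class Stage:
    """One pipeline stage: takes one word per clock and emits one word per clock.

    `latency` is the fixed number of clock cycles between input and output
    (at least 1, modelling the output register of the stage).
    """

    def __init__(self, func, latency=1, fill=0):
        self.func = func
        # The deque plays the role of the internal pipeline registers.
        self.regs = deque([fill] * latency, maxlen=latency)

    def clock(self, word):
        out = self.regs[0]                  # oldest value leaves the stage
        self.regs.append(self.func(word))   # newest enters; maxlen drops the oldest
        return out


def run(stages, samples):
    """Clock a cascade of stages once per input sample."""
    out = []
    for s in samples:
        for stage in stages:
            s = stage.clock(s)
        out.append(s)
    return out


def identity(x):
    return x


samples = list(range(1, 9))
# Slow branch: a single module with 3 cycles of latency (illustrative value).
slow_out = run([Stage(identity, latency=3)], samples)
# Fast branch: a 1-cycle module followed by a 2-cycle delay module for alignment.
fast_out = run([Stage(identity, latency=1), Stage(identity, latency=2)], samples)
```

With the delay module in place, both branches present the same sample on the same clock cycle, which is the condition a downstream module needs in order to combine them.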

## III. IMPLEMENTED MODULES

Several processing elements were implemented according to the proposed organization. All the stages were described using Verilog behavioral models, relying on compiler ability to produce optimized hardware. In this context, new functionality or improved modules can be incorporated and algorithms can be easily replaced as long as interfaces are preserved. The implementation of low-level code is not discarded at any point in the chain where timing constraints exist.



Fig. 2. Block diagram of the slow channel operations, including decimator, shaping filter and constant fraction event detection.

### A. Decimator

This is an essential and basic building block of modern multirate digital signal processing [6]. This module gets a 16-bit stream at clock frequency $f_c$ at its input, averages a programmable power-of-2 number of samples $2^n$ and feeds data to the pipelines at $f_c/2^n$. The decimation filter is implemented as a moving average rectangular window. Although it is not an optimal implementation regarding frequency response, it is very attractive in terms of simplicity and resource saving, since no floating point operations are required and division can be replaced by an $n$-bit shift operation. It is worth mentioning that this is possible because, in the case of a slow channel, the signal bandwidth is far below the Nyquist frequency $f_c/2$. This decimation procedure can be used to reduce high frequency electronic noise and increase the effective number of bits for energy determination, or can be reduced to $n = 0$ for time critical detection. Hence, at this point, a bifurcation of the stream is produced: a highly-decimated (slow) channel for precise energy determination and a raw (fast) channel for time characterization, whose low or null decimation factor is selected to match the detector rise time.
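As an illustration of the shift-based averaging, the following Python sketch models the decimator on non-overlapping blocks of $2^n$ samples. It is an assumption that the hardware averages disjoint blocks in exactly this way; the function name `decimate` is hypothetical.

```python
def decimate(samples, n):
    """Average consecutive blocks of 2**n samples using integer add and shift only."""
    block = 1 << n
    out = []
    acc = 0
    for i, s in enumerate(samples, start=1):
        acc += s
        if i % block == 0:
            out.append(acc >> n)   # divide by 2**n with an n-bit right shift
            acc = 0
    return out                     # an incomplete trailing block is discarded


slow = decimate([4, 8, 12, 16, 1, 1, 1, 1], n=2)   # one output per 4 inputs
fast = decimate([5, 6, 7], n=0)                    # n = 0 passes the stream through
```

The output rate is the input rate divided by $2^n$, so `n=2` turns eight input words into two output words, while `n=0` leaves the stream untouched for the fast channel.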

### B. The Slow Channel: Energy Related Modules

The decimated stream is fed to the equivalent of a spectroscopy amplifier module that was designed as a pipelined trapezoidal finite impulse response (FIR) filter. This module translates input voltage steps from the preamplifier to unipolar shaped pulses of proportional amplitude. The filter was implemented with a classical structure, as shown in Fig. 2, and complemented with a constant fraction discrimination (CFD) technique for event detection. At zero-crossing, CFD provides a trigger to the asynchronous event detector module, as previously shown in Fig. 1.

Parameters of this channel are the number of terms and their coefficients, which can be selected by the user to better match the detector and preamplifier time constants. The coefficients of the FIR filter and the CFD are base-2 integers in order to simplify calculations. FPGA devices that provide DSP modules can benefit from this organization. Single cycle hardware multiply-and-accumulate (MAC) operations can be concatenated in a pipelined organization, providing one sample per clock throughput.
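A software model can make the shaping step concrete. The sketch below is a plain direct-form FIR in Python, not the pipelined MAC structure of Fig. 2; the unit coefficients and the 8/8/8 profile are borrowed from the first experiment in Section V, and the ideal step used as input is a hypothetical test signal.

```python
def trapezoid_coeffs(rise, flat):
    """Coefficients +1 (rise taps), 0 (flat taps), -1 (rise taps)."""
    return [1] * rise + [0] * flat + [-1] * rise


def fir(samples, coeffs):
    """Direct-form FIR: y[k] = sum_j coeffs[j] * x[k - j], with x = 0 before the start."""
    out = []
    for k in range(len(samples)):
        acc = 0
        for j, c in enumerate(coeffs):
            if c and k - j >= 0:
                acc += c * samples[k - j]   # one multiply-and-accumulate per tap
        out.append(acc)
    return out


# Ideal preamplifier step of height 100 starting at sample 4 (hypothetical input).
step = [0] * 4 + [100] * 40
shaped = fir(step, trapezoid_coeffs(rise=8, flat=8))
```

The step becomes a unipolar trapezoid whose flat top equals the step height times the number of positive taps (here 100 × 8 = 800) and which returns to zero once the negative taps cover the step.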

An optional baseline restorer (BLR) was designed to compensate the effect of the preamplifier discharge on the FIR filter, depending on detector characteristics and the requirements of the subsequent modules.

### C. The Fast Channel: Time Related Modules

As the slow channel is decimated by $2^n$, its temporal resolution is degraded (i.e., the last $n$ bits of the timestamp are zero). In order to preserve timestamp information at ADC period resolution, additional hardware is required, running in parallel with the slow channel.

The current state of the art for time measurement is mainly based on two approaches. In the first, start and stop signals are fed directly to the FPGA, where delay lines are implemented in firmware [7]–[10]. The second, which our design follows, relies on purely digital techniques based on free running ADCs [11]–[14].

A timestamp module was designed to determine the arrival of an event to within one sample period, using a free running counter driven by the ADC clock as a timebase. This module detects candidates by calculating the derivative of the raw ADC signal and comparing it with a threshold, which also provides an approximate energy estimate. If the threshold is surpassed, an entry holding the timestamp of the first sample of the pulse is generated in a database that keeps the most recent candidates. The database is a table of derivative-timestamp pairs, implemented as a circular buffer. When the event detector later calculates energy, using the slow channel information, the best-matching pair in this table is selected.
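The candidate search can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the derivative is taken as the difference between consecutive samples, only the first sample above threshold in each pulse is stored, and the table depth of 4 is an arbitrary illustrative value. The name `find_candidates` is hypothetical.

```python
from collections import deque


def find_candidates(samples, threshold, depth=4):
    """Return the most recent (derivative, timestamp) pairs above threshold."""
    table = deque(maxlen=depth)   # circular buffer: the oldest entry is overwritten
    prev = samples[0]
    armed = True
    # The loop index t stands in for the free running counter.
    for t, s in enumerate(samples[1:], start=1):
        d = s - prev
        prev = s
        if d > threshold:
            if armed:             # store only the first sample of each pulse
                table.append((d, t))
                armed = False
        else:
            armed = True
    return list(table)


raw = [10, 10, 10, 60, 110, 110, 110, 110, 10, 10, 70, 130, 130]
candidates = find_candidates(raw, threshold=30)
```

Each stored pair carries a rough amplitude estimate (the derivative) alongside the counter value, which is what allows the event detector to pick the best match later.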

Applying this technique, the smallest measurable time interval is equal to the ADC period (4 ns in our case). This coarse temporal resolution may be enough for spectroscopy, but is not adequate for time critical experiments like coincidence or time-of-flight measurements. In these cases sub-nanosecond temporal resolution is required.

For that purpose, a fine time tagging module was implemented appending a linear interpolator to the coarse logic. It calculates inter-sampled zero crossing of the digital constant fraction signal. A 64-bit timestamp is created by merging a 56-bit coarse timestamp with an 8-bit result of linear interpolation. A block diagram of the operations is presented in Fig. 3. The fine time tagging module introduces a delay due to integer division, required for linear interpolation.
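The merge of coarse and fine information can be sketched as follows. The Python model assumes that the zero crossing goes from a non-positive to a positive constant fraction sample and that the coarse counter value belongs to the sample just before the crossing; both conventions, and the names `fine_timestamp` and `to_ns`, are assumptions made for illustration.

```python
def fine_timestamp(c_prev, c_next, coarse):
    """Merge a 56-bit coarse count with an 8-bit interpolated fraction.

    c_prev <= 0 < c_next are the constant fraction samples on either side of
    the zero crossing; coarse is the counter value at c_prev.
    """
    if not (c_prev <= 0 < c_next):
        raise ValueError("no upward zero crossing between the two samples")
    # Linear interpolation: fraction of the period elapsed at the crossing,
    # scaled to 8 bits. The integer division is the costly step in hardware.
    frac = ((-c_prev) << 8) // (c_next - c_prev)      # always in 0..255
    return ((coarse & ((1 << 56) - 1)) << 8) | frac


def to_ns(timestamp, period_ns=4.0):
    """Convert the fixed-point 56.8 timestamp to nanoseconds."""
    return timestamp * period_ns / 256


ts = fine_timestamp(c_prev=-100, c_next=100, coarse=5)   # crossing at mid-period
```

With an 8-bit fraction and a 4 ns period, one least significant bit corresponds to 4 ns / 256, or about 15.6 ps, so the example above evaluates to 5.5 periods, i.e. 22 ns.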

The division algorithm was implemented in a fully pipelined module using an IP core provided in Xilinx ISE Design Suite: LogiCORE IP Divider Generator core v3.0 [15]. It creates a circuit for integer division based on Radix-2 non-restoring algorithm. The algorithm can achieve a single cycle throughput with a latency of the order of the number of dividend bits. Linear interpolation can be improved to a more elaborate algorithm, if resources are available, taking advantage of modular design.
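For readers unfamiliar with the technique, the textbook radix-2 non-restoring division can be modelled in Python as below. This is a generic sketch of the algorithm for unsigned operands, not a description of the internals of the Xilinx core; it produces one quotient bit per iteration, which is consistent with a latency of the order of the number of dividend bits once each iteration becomes a pipeline stage.

```python
def nonrestoring_divide(dividend, divisor, bits):
    """Unsigned radix-2 non-restoring division; returns (quotient, remainder)."""
    if not 0 < divisor < (1 << bits) or not 0 <= dividend < (1 << bits):
        raise ValueError("operands must fit in `bits` bits and divisor must be > 0")
    mask = (1 << bits) - 1
    a = 0            # partial remainder, allowed to go negative
    q = dividend     # quotient register, initially holds the dividend
    for _ in range(bits):
        negative = a < 0
        msb = (q >> (bits - 1)) & 1
        a = 2 * a + msb                 # shift the A:Q pair left by one bit
        q = (q << 1) & mask
        # Add or subtract depending on the sign before the shift; never restore.
        a = a + divisor if negative else a - divisor
        if a >= 0:
            q |= 1                      # new quotient bit
    if a < 0:
        a += divisor                    # single final correction of the remainder
    return q, a


quotient, remainder = nonrestoring_divide(7, 2, bits=3)
```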

### D. Event Detector

This low-level trigger is designed as a finite state machine (FSM) that receives data from the slow and fast pipelines and generates an asynchronous output when a set of programmable conditions is met. This module represents the end of the ADC clock-driven pipeline, as further operation of the chain is performed on an event-driven basis.

Fig. 3. Block diagram of the fast channel operations. A digital constant fraction stage produces bipolar pulses. A zero crossing detector, consisting of a finite state machine, signals both a free running counter (56-bit coarse timestamp) and the output of the linear interpolation stage (8-bit fine timestamp). The linear interpolation stage is latched after a fixed delay to compensate integer divisor latency. Both timestamps are merged in a fixed point 64-bit absolute timestamp. Clocks and strobes are represented in dashed lines and data buses in solid.

Based on information from the slow channel, this module identifies event occurrence and calculates its amplitude, time over threshold, area and rise time. Finally, it incorporates timestamp information provided by the fast channel (selecting best matches from the table). Users can select upper and lower levels for all these values in order to reject unwanted data. Events that satisfy the selection rules are tagged as valid events, time-stamped and routed to the output modules.
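The window-based selection can be sketched in Python as follows. The parameter definitions used here (rise time measured from threshold crossing to the first maximum, area summed over the samples between the first and last threshold crossings) are assumptions for illustration, as are the names `pulse_parameters` and `is_valid`; the sketch also assumes a single pulse per record.

```python
def pulse_parameters(pulse, threshold):
    """Measure one shaped pulse; returns None if it never exceeds the threshold."""
    above = [i for i, v in enumerate(pulse) if v > threshold]
    if not above:
        return None
    first, last = above[0], above[-1]
    peak = max(range(first, last + 1), key=lambda i: pulse[i])   # earliest maximum
    return {
        "amplitude": pulse[peak],
        "time_over_threshold": last - first + 1,   # in samples
        "area": sum(pulse[first:last + 1]),
        "rise_time": peak - first,                 # in samples
    }


def is_valid(params, limits):
    """True when every limited parameter lies inside its (lower, upper) window."""
    return params is not None and all(
        lo <= params[name] <= hi for name, (lo, hi) in limits.items()
    )


params = pulse_parameters([0, 2, 4, 6, 6, 6, 4, 2, 0], threshold=1)
accepted = is_valid(params, {"amplitude": (5, 7), "rise_time": (1, 3)})
```

Only the parameters named in `limits` are checked, which mirrors the idea that the user enables just the windows needed for a given experiment.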

The FSM is triggered by the zero-crossing of the CFD signal. If the discrimination process is successful (when amplitude, time over threshold, area and rise time of both lobes of the bipolar signal are within user selected limits), the coarse timestamp of the slow channel is calculated and completed with $n$ zeros in the least significant bits. The coarse timestamp is used as a mask to find the fine timestamp in the timing table generated by the fast channel.
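One way to read the masking step is sketched below in Python. It assumes that the slow and fast branches have already been aligned with delay modules, that the table holds 64-bit timestamps with 8 fractional bits, and that the newest matching entry wins; the function name `match_fine_timestamp` is hypothetical.

```python
def match_fine_timestamp(slow_count, table, n):
    """Find the fast-channel timestamp that falls in the slow-channel sample.

    slow_count: slow-channel sample counter (one count per 2**n ADC periods).
    table: 64-bit fine timestamps (56-bit coarse part, 8-bit fraction), oldest first.
    """
    window = slow_count << n          # coarse timestamp completed with n zeros
    mask = ~((1 << n) - 1)            # clears the n least significant bits
    for ts in reversed(table):        # newest candidate first
        if ((ts >> 8) & mask) == window:
            return ts
    return None


table = [(50 << 8) | 10, (100 << 8) | 200, (130 << 8) | 1]
hit = match_fine_timestamp(slow_count=3, table=table, n=5)    # window 96..127
miss = match_fine_timestamp(slow_count=0, table=table, n=5)   # window 0..31
```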

Optional pileup rejection techniques can be included at this point, based on shape recognition techniques, if resources are available.

### E. Output Modules and Oscilloscope

Several visualization and interfacing modules that can operate simultaneously on asynchronous events were designed in the tradition of nuclear spectroscopy instruments. A multi-channel analyzer (MCA) module keeps a spectrum in the internal memory, based on pulse amplitude, area or time information. Data can be downloaded or reset by the user through a UART serial interface.

A single-channel analyzer (SCA) output module verifies whether event parameters are within a user defined range (selection window), in which case a logical output is produced for an external multi-channel scaler (MCS). Several SCA modules can be implemented simultaneously with different selection criteria. A previously designed self-tuning SCA algorithm [16] was modified to fit the pipelined design.

Based on available temporal information, implementation of a time-to-amplitude converter (TAC) or a time-to-digital converter (TDC) is straightforward. These modules calculate the time elapsed between two independent events (start and stop) and produce a digital output for later processing. This time is calculated using information available in event timestamps. TDC output can be fed to the MCA module in order to visualize time distributions. Time tagging also enables simple coincidence/anticoincidence logic module implementation.

An additional oscilloscope module was developed for real-time visualization of input or internal signals. It was designed to send time traces to a computer through a universal serial bus (USB) link. This module can be useful for parameter tuning, but is not required in regular operation. Since its implementation leads to high resource consumption, it can be avoided if a low profile FPGA is planned for the design, or can be temporarily implemented for tuning and later discarded.

## IV. USER INTERFACE

In order to simplify module interconnections and experimental reconfiguration, the user interface for the components was designed to mimic the behavior and modular design of the Nuclear Instrumentation Modules (NIM) standards, well known by scientists. In this environment, all modules can be replicated and combined as required in order to compose the processing chain. As every module introduces a precise delay, one clock cycle for each pipeline stage, user delays of a fixed number of clock cycles (implemented as FIFO data buffers) can be inserted in order to synchronize signals with a different number of processing stages. For example, Fig. 1 describes the organization of a single-channel hybrid fast/slow detection system that can be applied to several experiments. It includes a fast branch (top), where a fine timestamp of the event is calculated by linear interpolation using the raw samples from the ADC. A second parallel branch (bottom) uses the decimator to reduce noise and a shaping filter to determine event energy. Time and energy information are combined in the event detector, where different discrimination criteria can be applied prior to visualization in the output modules.

An example experiment is presented in Fig. 4 in order to illustrate the proposed user interface. In this example, three modules perform the simple task of verifying the exponential distribution of radiation inter-arrival times. Inputs, outputs and interconnections are explicitly named for clarity. First, an instance of module PGEN, named *source\_a*, is created. This module, designed for testing purposes, provides a pseudo-random signal generator that emulates the Poisson behavior of a radioactive source. The output of *source\_a* (named *out\_events*) is connected through a single wire named *events* to the input (*in\_events*) of the next module. Module TDC provides a single-input time-to-digital converter. The instance created is named *tdc\_a*. The output of *tdc\_a* is routed to an MCA module (UART\_MCA), a multi-channel analyzer, to build a histogram, an output familiar to physicists. Our instance of the MCA module is named *mca\_a*. This module can transmit the spectrum to a workstation using a serial link. All modules are synchronous and share a common 250 MHz clock generated by a PLL module (not shown in figure). Parameters are used during module instantiation to set the rate of the random generator (RATE), the resolution of the TDC (NBITS), and the number of channels of the MCA (NCHAN).

    // Nets are declared before the instances that use them.
    // Cable from source to TDC
    wire events;

    // 12-bit bus from TDC to MCA
    wire increment;
    wire [11:0] channel;

    // Poisson random event generator (source)
    PGEN #(.RATE(1000)) source_a (
        .clk(clk_250),
        .out_events(events)
    );

    // Single input 12-bit time-to-digital converter
    TDC #(.NBITS(12)) tdc_a (
        .clk(clk_250),
        .in_events(events),
        .out_ready(increment),
        .out_data(channel)
    );

    // 4096-channel analyzer with RS232 output to PC
    UART_MCA #(.NCHAN(4096)) mca_a (
        .clk(clk_250),
        .in_increment(increment),
        .in_channel(channel),
        .uart(to_PC)
    );

Fig. 4. An example of modules usage. This section of code implements functionality for the experiment described in the upper block diagram. In this example, inter-arrival times of an emulated radioactive source (PGEN module) are measured by a single input time-to-digital converter (TDC module); results of the conversions are registered in a multi-channel analyzer (UART\_MCA module) that can transmit the spectrum to a workstation using a serial link. All modules are synchronous and share a common 250 MHz clock generated by a PLL module (not shown).

This simple example shows how the proposed interface hides the internal module complexity from the scientist, while still allowing hardware upgrades in the form of new modules, left in the hands of FPGA specialists.
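The behavior expected from the chain of Fig. 4 can be previewed with a software model. The Python sketch below is an emulation under stated assumptions, not a translation of the authors' modules: the source is modelled as an independent trial on every clock cycle with probability `p` (how this maps to the RATE parameter is not specified in the text), and the TDC saturates at full scale. The functions `pgen`, `tdc` and `mca` are hypothetical stand-ins named after the modules.

```python
import random


def pgen(p, cycles, rng):
    """Yield 1 on clock cycles that carry an event, else 0."""
    for _ in range(cycles):
        yield 1 if rng.random() < p else 0


def tdc(events, nbits):
    """Yield the clock cycles elapsed between consecutive events, clipped to nbits."""
    full_scale = (1 << nbits) - 1
    count = None                      # no start event seen yet
    for e in events:
        if count is not None:
            count += 1
        if e:
            if count is not None:
                yield min(count, full_scale)
            count = 0                 # this event is the start of the next interval


def mca(channels, nchan):
    """Accumulate a histogram with one bin per channel."""
    spectrum = [0] * nchan
    for ch in channels:
        spectrum[ch] += 1
    return spectrum


rng = random.Random(1)                # fixed seed for a repeatable run
spectrum = mca(tdc(pgen(0.01, 200_000, rng), nbits=12), nchan=4096)
total = sum(spectrum)
mean_interval = sum(i * c for i, c in enumerate(spectrum)) / total
```

For a per-cycle probability of 0.01 the intervals follow a geometric distribution, the discrete counterpart of the exponential, with a mean close to 100 clock cycles; the histogram therefore decays roughly exponentially with channel number.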

## V. EXPERIMENTAL RESULTS

Using the described development platform and some additional laboratory equipment, two experiments were designed in order to prove that the proposed processing technique can be used for both purposes in gamma instrumentation: high-precision energy spectroscopy and sub-nanosecond temporal determination.

### A. First Experiment: Precise Energy Determination with Coarse Timestamp

As a first application case, the digitization of a low energy gamma detection system is presented. The purpose of this experiment is to verify that high quality gamma spectroscopy can be achieved with the available resolution (16 bits), by comparing the results with a calibration spectrum obtained with high-end analog instrumentation. Under laboratory conditions, our digital processor was configured to discriminate 14.4 keV resonant gamma photons from a 10 mCi <sup>57</sup>Co source in rhodium matrix. A proportional counter (Wissel LND-45431) was used as the detector. Direct digitization of the output signal of the charge preamplifier (Ortec 142PC) was chosen. The digital selection system was set to operate at a rate of 15,000 events/s.

The digital processor was designed according to the following premises. With the converter operating at ADC clock frequency $f_{ADC}$, the timestamp is determined in a fast pipeline, while energy is calculated in a different branch at $f_{ADC}/2^n$. In this case, the coarse timestamp is sufficient, so the interpolator is not required. For energy determination, the decimator module was configured to average 32 ADC samples, clocking the rest of the slow pipeline with a period of 128 ns. This value provides good noise rejection for the analog section, still allowing several samples on the rising edge of the preamplifier output (8 samples for 1 $\mu$s average rise time). Shaping filter parameters were tuned to match the preamplifier rise time value. A symmetric profile was selected (trapezoidal filter), with 8 positive, 8 null and 8 negative coefficients. A total pulse width of 4 $\mu$s results (32 samples), equivalent to 1 $\mu$s shaping time of a linear amplifier. The event discriminator was tuned to reject pulses with time over threshold or rise time 20% away from the expected values. For monitoring purposes, an MCA was implemented to store a pulse height histogram. Finally, an SCA module was programmed to select events of interest using a multi-parameter strategy.
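The timing figures quoted above follow from simple arithmetic on the 4 ns ADC period. The short derivation below is offered as a cross-check; the symbols $T_{ADC}$, $T_{slow}$, $N_{rise}$ and $N_{pulse}$ are introduced here for convenience, and reading the 32-sample pulse width as the filter length plus the input rise time is an assumption.

$$
\begin{aligned}
T_{slow} &= 2^{n}\,T_{ADC} = 32 \times 4\ \text{ns} = 128\ \text{ns},\\
N_{rise} &= \frac{1\ \mu\text{s}}{128\ \text{ns}} \approx 7.8 \approx 8\ \text{samples},\\
N_{pulse} &\approx (8 + 8 + 8) + N_{rise} = 32\ \text{samples}
\;\Rightarrow\; 32 \times 128\ \text{ns} \approx 4.1\ \mu\text{s}.
\end{aligned}
$$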

In order to characterize the attainable energy resolution, Fig. 5 presents the energy spectrum recorded with the internal MCA, superimposed with a calibration spectrum obtained with high-end Ortec instrumentation (Ortec 572A shaping amplifier and 551 single-channel analyzer) under equivalent conditions. In this figure, both spectra present the same energy resolution of about 2.4 keV. The benefits of the subsequent discrimination technique were evidenced in an improvement of the resultant signal-to-noise ratio (not shown here) of the spectrum, as compared to high-end analog instrumentation.

The Verilog model of the whole one-channel pipeline was compiled with Xilinx ISE Design Suite 14.7. An on-chip phase-locked loop (PLL) module was used to clock the circuit. All timing constraints were met for the PLL operating at 250 MHz (ADC operating frequency) in the Virtex4-FX20 FPGA device. The maximum attainable frequency of operation is slightly over that value, meaning that an additional design effort will be required if a faster ADC is used. The design (excluding the scope and MCA visualization modules) requires 150 slices for its implementation, an amount of hardware that fits in low-power programmable devices. For example, it represents approximately 15% of the resources available in a Spartan-3 XC3S100E, the first member of the Xilinx Optimized Platforms FPGA family.



Fig. 5. Energy spectrum of a 10 mCi <sup>57</sup>Co source with a Rh matrix. Proportional counter output was recorded and processed with high-end Ortec analog instrumentation (dashed) and with our digital implementation (solid). Central 14.4 keV gamma rays are well separated from the 6.4 keV Fe X-rays and 22 keV from Rh. Both present energy resolution of about 2.4 keV.

### B. Second Experiment: Precise Temporal Determination

A second experiment involving sub-nanosecond temporal requirements was designed in order to quantify the attainable temporal resolution of the system. The purpose of this experiment is to verify that the computational requirements can be satisfied, considering that only a 4 ns sample period is available and interpolation is mandatory.

As our prototype provides a single input channel, we were not able to implement a coincidence experiment, but instead we could obtain a similar measure of temporal resolution by digitizing two consecutive events, both on a single channel. For that purpose, a periodic sharp pulse generated with an Agilent 33250A function generator was added (using an Ortec AN308/NL dual mixer) to a delayed version of the same pulse (using an Ortec 425A passive delay). Fig. 6 shows the experimental setup. This configuration provides a stable and well determined pair of pulses, whose separation was measured using our digital processor. For that purpose, a fine-grain time interpolator was implemented, prior to the event detector, without a decimator. A TDC module was used to calculate the time between consecutive events and an MCA was implemented to produce output. Compiled in the same conditions as the previous experiment, the implementation of the complete processor required 1200 Xilinx slices. Operating at 250 MHz, all timing constraints were met.

Fig. 7 shows the measured distribution of time between pulses when a delay of 64 ns was used. For better visualization, 60 ns were digitally subtracted to fit the MCA scale with optimal resolution (the peak centered at about 4 ns reveals this). The second peak in the figure shows a superimposed time distribution of the same experiment with the delayed pulse path lengthened by 28 mm using a BNC tee connector. This result provides a characterization of attainable spatial resolution in time-of-flight experiments. A temporal resolution of 50 ps FWHM was obtained, as shown in the figure. That value is of the order of the transit time spread of gamma detectors, making it suitable to achieve good coincidence resolving time in, for example, positron annihilation spectroscopy. ToF-PET can also be improved by a time resolution that represents 12 mm spatial resolution.



Fig. 6. Experimental layout for the temporal experiment. The FPGA development kit and analog front-end were mounted in a blank NIM module (cover is removed for visualization). Two Ortec modules, AN308/NL dual mixer and 425A passive delay, complete the setup.



Fig. 7. Inter-arrival time distribution of pulses delayed 64 ns in which 60 ns were digitally subtracted. The peak, centered at about 4 ns, has a resolution of 50 ps FWHM. The second peak (dashed) shows the time distribution of the outstretched delayed path (28 mm).

This experiment and the previous one show that the performance of our digital processor is suitable for simultaneous energy and temporal determination in gamma applications. In the first experiment, 16-bit spectroscopy results are comparable to those obtained with high-end analog instrumentation. The second experiment shows that 50 ps temporal resolution can be achieved with an available 4 ns sampling period. At the same time, it was also shown that resultant data streams can be handled by FPGA devices operating at ADC frequency if pipelining techniques are properly applied.

The development platform can be upgraded to a two channel configuration by replacing the analog front-end with an AD9652-310EBZ evaluation board. It includes a pipelined two channel 16-bit ADC that can operate at 310 MSPS. This upgrade enables a compact, single-FPGA implementation for coincidence experiments.

## VI. CONCLUSION

Modular high-level programming can be achieved in real-time gamma-ray applications with the utilization of pipelined ADCs and medium-sized FPGA devices. 16-bit, 250 MSPS pipelined ADCs provide a good balance between temporal and energy resolution, comparable with best analog instruments. Pipelined organization of the digital processor can guarantee a throughput not attainable with other technologies.

Modular design not only leads to a better understanding of the instrumentation process, but also provides an organization that can efficiently handle the resultant data streams with regular FPGA devices.

The proposed organization constitutes a viable modernization option for analog, high resolution laboratory equipment, such as that used in Mössbauer, positron annihilation and TD-PAC spectroscopy. It also provides an efficient, high density, low power solution for applications that require a large number of channels, such as time-of-flight positron emission tomography, where simultaneous energy and timestamp determination is required.

#### ACKNOWLEDGMENT

The authors would like to thank the Magnetism and Magnetic Materials Group (G3M) of the Physics Institute of La Plata for providing a highly reliable testing environment for our design.

#### REFERENCES

- [1] R. Schiffer, M. Flaska, S. A. Pozzi, S. Carney, and D. D. Wentzloff, "A scalable FPGA-based digitizing platform for radiation data acquisition," *Nucl. Instrum. Methods Phys. Res. A*, vol. 652, pp. 491–493, 2011.
- [2] A. Farsoni, B. Alemayehu, A. Alhawsawi, and E. M. Becker, "Real-time pulse-shape discrimination and beta-gamma coincidence detection in field-programmable gate array," *Nucl. Instrum. Methods Phys. Res. A*, vol. 712, pp. 75–82, 2013.

- [3] OpenCL, "The open standard for parallel programming of heterogeneous systems," [Online]. Available: http://www.khronos.org/opencl/. Accessed: Jan. 5, 2016.
- [4] V. Spanoudaki and C. Levin, "Photo-detectors for time of flight positron emission tomography," *Sensors*, vol. 10, pp. 10484–10505, 2010.
- [5] Analog Devices, "AD9467 Evaluation Board," [Online]. Available: http://www.analog.com/en/evaluation/eval-ad9467/eb.html. Accessed: Jan. 5, 2016.
- [6] B. Porat, *A Course in Digital Signal Processing*. Hoboken, NJ, USA: Wiley, 1997, pp. 461–464.
- [7] S. Junnarkar, P. O'Connor, P. Vaska, and R. Fontaine, "FPGA-based self-calibrating time-to-digital converter for time-of-flight experiments," *IEEE Trans. Nucl. Sci.*, vol. 56, no. 4, pp. 2374–2379, 2009.
- [8] C. Hervé, J. Cerrai, and T. Le Caer, "High resolution time-to-digital converter (TDC) implemented in field programmable gate array (FPGA) with compensated process voltage and temperature (PVT) variations," *Nucl. Instrum. Methods Phys. Res. A*, vol. 682, pp. 16–25, 2012.
- [9] J. Torres, A. Aguilar, R. García-Olcina, J. Martos, J. Soret, J. M. Benlloch, A. González, and F. Sánchez, "High resolution Time of Flight determination based on reconfigurable logic devices for future PET/MR systems," *Nucl. Instrum. Methods Phys. Res. A*, vol. 702, pp. 73–76, 2013.
- [10] J. Torres, A. Aguilar, R. García-Olcina, P. A. Martínez, J. Martos, J. Soret, J. M. Benlloch, P. Conde, A. J. González, and F. Sánchez, "Time-to-digital converter based on FPGA with multiple channel capability," *IEEE Trans. Nucl. Sci.*, vol. 61, no. 1, pp. 107–114, 2014.
- [11] L. Bardelli, G. Poggi, M. Bini, G. Pasquali, and N. Taccetti, "Time measurements by means of digital sampling techniques: A study case of 100 ps FWHM time resolution with a 100 MSample/s 12 bit digitizer," *Nucl. Instrum. Methods Phys. Res. A*, vol. 521, pp. 480–492, 2004.
- [12] H. Peng, P. D. Olcott, A. M. K. Foudray, and C. S. Levin, "Evaluation of free-running ADCs for high resolution PET data acquisition," in *Proc. IEEE Nuclear Science Symp. Conf. Rec.*, 2007, pp. 3328–3331.
- [13] W. Hu et al., "Free-running ADC- and FPGA-based signal processing method for brain PET using GAPD arrays," *Nucl. Instrum. Methods Phys. Res. A*, vol. 664, pp. 370–375, 2012.
- [14] J. Y. Yeom, R. Vinke, V. Spanoudaki, K. J. Hong, and C. Levin, "Readout electronics and data acquisition of a positron emission tomography time-of-flight detector module with waveform digitizer," *IEEE Trans. Nucl. Sci.*, vol. 60, no. 5, pp. 3735–3741, 2013.
- [15] Xilinx (2011, March 1) LogiCORE IP Divider Generator v3.0 [Online]. Available: http://www.xilinx.com/support/documentation/ip\_documentation/div\_gen\_ds530.pdf. Accessed: Jan. 5, 2016.
- [16] A. Veiga, C. Grunfeld, N. Martínez, P. Mendoza Zélis, G. Pasquevich, and F. Sánchez, "Self-tuning digital Mössbauer detection system," *Hyperfine Interact.*, vol. 304, p. 3843, 2013.