Towards a High-Speed Photon-Counting CMOS Quanta Image Sensor (QIS)

Wei Deng (Student) and Eric R. Fossum
Thayer School of Engineering, Dartmouth College, Hanover, NH, USA
Contact: Wei.Deng.TH@dartmouth.edu

Abstract—We report progress towards a CMOS QIS capable of counting single photoelectrons at high speed without avalanche. Jot-parallel readout circuitry makes it possible to record the arrival time of photons. The sensor can operate at up to 33 Mcounts/s/jot with low dead time.

I. INTRODUCTION

The Quanta Image Sensor (QIS) is considered a possible paradigm shift in image capture [1]. Conceptually, the QIS contains a large number of sub-diffraction-limit, high-conversion-gain, low-full-well-capacity jots together with high-speed readout circuitry. State-of-the-art room-temperature read noise, such as 0.23 e− rms on average, has been achieved in QIS [2-5]. The single-bit QIS has a binary output that indicates whether a photon has struck the jot; its imaging performance was modeled in 2013 [6]. In 2015, a 2.5 pJ/b 1-Mpixel QIS pathfinder operating at 1000 fps was reported [7], demonstrating that high-speed, low-power operation is feasible. However, at high light intensity, a jot may receive multiple photons during one frame time. A multi-bit QIS [8] can determine the photon number, but its arrival-timing information is limited to a resolution equal to the frame period (1 ms at 1000 fps). Improved timing resolution is desired.

The single-photon avalanche diode (SPAD) is known for fine time resolution. A SPAD uses avalanche multiplication to achieve high gain for photon counting, and time resolution in the sub-nanosecond range is not uncommon in SPAD arrays used for time-of-flight applications [9].

Although SPADs achieve superb time resolution, their pixel size is difficult to shrink because of the required isolation between pixels and the large in-pixel circuitry. The high electric fields needed for avalanche also typically result in high dark count rates, although lower dark count rates have recently been reported [10]. CMOS QIS with pump-gate jots, by contrast, has achieved sub-diffraction-limit pitch and low dark current [2]. In this paper, we explore faster CMOS QIS readout circuits.

II. SENSOR ARCHITECTURE

The QIS is implemented in a modified TSMC 45 nm/65 nm stacked BSI CIS process. The sensor architecture is shown in Figure 1. Because of limited design time and available die size, the pixel wafer contains just 18 jots. A 1x2 shared readout architecture is used, and each shared jot measures 2.2 µm x 1.1 µm. The output of each jot on the pixel wafer is sent to the ASIC wafer for jot-parallel readout. Each shared jot has its own readout circuit, giving 9 readout channels in total. In the readout chain, the jot output is continuously amplified by a PMOS common source amplifier, and the amplified signal is then quantized by an on-chip 1-bit analog-to-digital converter (ADC). The jot-parallel ADC is capable of sampling at 66.7 MSa/s. Because of the limited number of pads, no analog jot output is available. However, the jot performance can be estimated from the characterization results of the previous-generation QIS chip, which has an identical jot design [2].

III. PHOTON-COUNTING JOT

The pump-gate jot with tapered-gate reset (TPG) [2] is used in this chip. The TPG jot was previously measured to have an average conversion gain (CG) of 345 µV/e− and a mean room-temperature read noise of 0.23 e− rms, so the single-photoelectron quantization effect is easily observed. A 1x2 shared readout circuit is used, as depicted in Figure 2(a), and the cross-section doping profile is shown in Figure 2(b). For photodetection, the jot is reset by pulsing both the reset gate (RG) and the transfer gate (TG) high. TG is kept high after reset to enable continuous sampling and to reduce clock feedthrough. However, this means the TG surface collects electrons until they spill over to the floating diffusion (FD), which introduces potential lag and additional noise [11]. Besides clocking TG in the pump-gate jot to address this drawback, a different structure (e.g., a PPD with a distal, low-capacitance FD and continuous complete charge transfer) may be preferable in the future. An incident photon strikes the jot and generates a photoelectron. Because of the high conversion gain, the photoelectron causes a step in the FD voltage that stands out above the background noise. The FD voltage is sampled and compared with the previous sample to determine whether at least one new photoelectron has been added to the FD. If so, the jot output is set. After about a hundred photoelectrons have been integrated, i.e., the full-well capacity of the FD and the readout signal chain, the FD must be reset by pulsing RG.
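The jot behavior described above can be pictured as a staircase: each photoelectron lowers the FD voltage by one CG step until an RG reset. The following Python sketch models that staircase. The CG (345 µV/e−), the read noise (0.23 e− rms) and the "about a hundred" electron capacity come from the text; the exact full-well value of 100 e−, the 0 µV reset reference, the Gaussian noise model and the function name `fd_trace` are assumptions made for illustration only.

```python
import random
from collections import Counter

CG_UV_PER_E = 345.0   # mean conversion gain from the text (uV per electron)
READ_NOISE_E = 0.23   # mean room-temperature read noise from the text (e- rms)
FULL_WELL_E = 100     # "about a hundred" electrons; the exact value is assumed
V_RESET_UV = 0.0      # hypothetical FD reset level, used as the 0 uV reference


def fd_trace(arrivals, n_samples, noise=True, seed=0):
    """Sampled FD voltage (uV) for photoelectrons arriving at given sample indices.

    Each arrival lowers the FD voltage by one CG step. Once the accumulated
    charge reaches FULL_WELL_E, the model applies an RG reset to V_RESET_UV.
    Optional Gaussian noise (an assumed model) uses the read noise in uV.
    """
    rng = random.Random(seed)
    per_sample = Counter(arrivals)          # electrons arriving at each index
    trace, electrons = [], 0
    for n in range(n_samples):
        electrons += per_sample.get(n, 0)
        if electrons >= FULL_WELL_E:
            electrons = 0                   # modelled RG pulse
        v = V_RESET_UV - electrons * CG_UV_PER_E
        if noise:
            v += rng.gauss(0.0, READ_NOISE_E * CG_UV_PER_E)
        trace.append(v)
    return trace


step_to_noise = CG_UV_PER_E / (READ_NOISE_E * CG_UV_PER_E)
full_well_swing_mv = FULL_WELL_E * CG_UV_PER_E / 1000.0
print(f"1 e- step / rms noise: {step_to_noise:.2f}")           # 4.35
print(f"FD swing at full well: {full_well_swing_mv:.1f} mV")   # 34.5 mV
```

With these numbers a single-photoelectron step is about 4.35 times the rms read noise, and roughly 34.5 mV of FD swing accumulates before a reset is needed under the assumed 100 e− capacity.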

IV. JOT-PARALLEL QUANTIZER

The output of the shared jot is processed in the ASIC layer. The signal readout chain is depicted in Figure 3. The jot output signal is amplified by a PMOS common source amplifier, followed by a 1-bit ADC. The ADC contains a switched-capacitor input circuit, a comparator for analog-to-digital conversion, and a dynamic flip-flop (DFF) to save the output of the comparator. The ADC output is fed back to control the comparator reset.

The comparator is composed of a cascade of two sense amplifiers (SA) and a dynamic latch, as shown in Figure 4. A 5T operational transconductance amplifier (OTA) is used for each SA. In future QIS designs, a charge transfer amplifier (CTA) may be implemented for low-power operation [7]. The two cascaded SAs form a 34 dB gain stage that amplifies the signal above the intrinsic input-referred offset voltage of the dynamic latch, estimated to be 50 mV. A dynamic latch is used for power efficiency, since it has no static power consumption. A ½ LSB-equivalent offset should be added to one side of the comparator input in future designs to reduce false counts.
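The gain figures above can be checked with simple arithmetic. The Python sketch below converts the 34 dB SA gain to a linear ratio, computes the total gain needed to lift a one-electron step (345 µV) above the 50 mV latch offset, and infers a lower bound on the gain the PMOS common-source stage must contribute. That last step is an assumption: it treats the common-source amplifier as the only other gain stage and ignores any attenuation in the source follower or the switched-capacitor input, so the result is a bound, not a design value. The ½ LSB offset is also expressed in volts, referred to the one-electron step.

```python
import math

CG_V_PER_E = 345e-6      # one-electron step (CG) from the text
SA_GAIN_DB = 34.0        # gain of the two cascaded sense amplifiers (from the text)
LATCH_OFFSET_V = 50e-3   # estimated latch input-referred offset (from the text)


def db_to_ratio(gain_db):
    """Convert a voltage gain in dB to a linear V/V ratio."""
    return 10.0 ** (gain_db / 20.0)


sa_gain = db_to_ratio(SA_GAIN_DB)              # about 50 V/V
required_gain = LATCH_OFFSET_V / CG_V_PER_E    # one-electron step -> latch offset
# Assumption: the PMOS common-source amplifier is the only other gain stage
# and nothing attenuates the signal, so this is a lower bound only.
min_cs_gain = required_gain / sa_gain
half_lsb_v = 0.5 * CG_V_PER_E                  # proposed 1/2-LSB comparator offset

print(f"SA gain: {sa_gain:.1f} V/V")
print(f"required gain: {required_gain:.0f} V/V ({20 * math.log10(required_gain):.1f} dB)")
print(f"minimum common-source gain: {min_cs_gain:.2f} V/V")
print(f"1/2 LSB referred to the one-electron step: {half_lsb_v * 1e6:.1f} uV")
```

Under these assumptions the SAs provide about 50 V/V, about 145 V/V (43 dB) is needed in total, so the front-end amplifier would have to supply at least about 2.9 V/V; the suggested ½ LSB offset corresponds to about 172.5 µV of one-electron-step-referred signal.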

During the readout, the jot is first reset by pulsing both RG and TG high. TG is kept high after reset to enable continuous sampling of the signal from the storage well. The 1-bit quantizer monitors any change of the FD voltage. The timing diagram is shown in Figure 5. The amplified jot output is processed by the quantizer in four phases.

**Phase I: SH1 = 0, SH2 = 1, SHC = 1.**

Store the new sample in storage capacitor C2. Connect storage capacitor C1 (where the previous sample Vi,0 is stored) to comparator input IN1 and C2 (where the new sample Vi is stored) to comparator input IN2.

**Phase II: SH1 = 0, SH2 = 0, SHC = 1.**

Compare the previous sample Vi,0 in IN1 and the new sample Vi in IN2.

**Phase III: SH1 = 1, SH2 = 0, SHC = 0.**

Store the new sample in C1. Connect C2 (which now holds the previous sample Vi,0) to IN1 and C1 (which holds the new sample Vi) to IN2.

**Phase IV: SH1 = 0, SH2 = 0, SHC = 0.**

Compare the previous sample Vi,0 in IN1 and the new sample Vi in IN2. If Vi,0 = Vi, i.e., no photoelectron has been added to the FD capacitor, the comparator state does not change and stays at “0”. If Vi,0 > Vi, i.e., a photoelectron has been added to the FD capacitor and the FD voltage has dropped, the comparator changes state and outputs a “1”. This output is then fed back to turn on the comparator reset switch, which injects a reference voltage VREF into IN1 to reset the comparator state to “0”. In this way, an incident single photon is converted to a digital “1”. If no photon strikes the jot, the output remains at “0”. Figure 6 shows the time-domain jot output voltage as photons arrive, together with the corresponding digital output.
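Phases I to IV can be summarized as a ping-pong scheme: C1 and C2 alternately hold the previous and the new sample, and each cycle compares the two. The Python sketch below is a behavioral model under stated assumptions, not the circuit: switches and the comparator are ideal, samples are expressed so that a photoelectron lowers the value (Vi,0 > Vi means a new photoelectron, as in the text), the feedback reset through VREF is modeled simply as each decision being independent, and the `threshold` argument stands in for the ½ LSB offset proposed for future designs. The function name `pingpong_quantize` is hypothetical.

```python
def pingpong_quantize(samples, threshold=0.0):
    """Behavioral model of the four-phase 1-bit quantizer (a sketch).

    samples   : jot output, one value per sampling cycle; a photoelectron
                lowers the value (assumed sign convention, as in the text)
    threshold : comparator decision offset; 0 models the current design,
                a 1/2-LSB value models the offset proposed for future designs
    Returns one output bit per comparison.
    """
    if not samples:
        return []
    c1, c2 = samples[0], None   # assumed initial state: C1 holds the first sample
    new_on_c2 = True            # Phases I/II write C2, Phases III/IV write C1
    bits = []
    for v in samples[1:]:
        if new_on_c2:
            c2 = v
            previous, new = c1, c2   # IN1 <- C1 (Vi,0), IN2 <- C2 (Vi)
        else:
            c1 = v
            previous, new = c2, c1   # IN1 <- C2 (Vi,0), IN2 <- C1 (Vi)
        # A drop larger than the threshold means a new photoelectron on the FD.
        bits.append(1 if previous - new > threshold else 0)
        # Feedback reset (VREF on IN1) returns the comparator to "0", so the
        # next decision starts fresh; the capacitors then swap roles.
        new_on_c2 = not new_on_c2
    return bits


# Two single-electron steps of 345 uV, threshold at 1/2 LSB:
print(pingpong_quantize([0, 0, -345, -345, -345, -690, -690], threshold=172.5))
# [0, 1, 0, 0, 1, 0]
```

One plausible reading of the design choice is that, because the capacitors swap roles every cycle, the newest sample never has to be copied from one capacitor to the other, so a single sample and a single comparison suffice per cycle. An upward jump such as an RG reset produces a negative difference and therefore no count in this model.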

V. CHARACTERIZATION RESULTS

The test system used to characterize the image sensor is shown in Figure 7. A printed circuit board (PCB) was designed, and peripheral components, such as a digital-to-analog converter (DAC), current sources, voltage regulators, low-voltage differential signaling (LVDS) receivers and drivers, and connectors to the FPGA, were soldered onto it. The DAC provides the bias voltages, while the FPGA generates the control timing signals. The data are captured by the data acquisition board and saved in PC memory for further processing. During testing, an LED array with uniform intensity illuminates the QIS.

An example of a measured data stream is shown in Figure 8. The sampling frequency is 33 MHz and the measurement time is 11 µs. Three digital “1”s are detected, indicating that three photoelectrons were collected by the FD during the measurement time. The randomness of the arrival times is due to the Poisson arrival statistics of photons and to the subsequent emission of carriers from the TG surface carrier layer. A “frame” concept similar to that of integration-based image sensors can be introduced here: the duration of the data stream can be regarded as the “integration” time, and the number of digital “1”s corresponds to the number of incident photons during that “integration” time.
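The frame concept above amounts to counting the “1”s in a fixed-length bitstream and, if desired, time-stamping each one to the nearest sampling period. The Python sketch below applies this to an 11 µs frame sampled at 33 MHz (both from the text), which gives 363 samples and a time resolution of about 30 ns. The positions of the three “1”s are hypothetical and are not the measured ones in Figure 8; the function name `frame_stats` is also illustrative.

```python
F_SAMPLE_HZ = 33e6   # sampling frequency used in the measurement (from the text)


def frame_stats(bits, f_sample_hz=F_SAMPLE_HZ):
    """Treat a 1-bit data stream as one "frame".

    Returns (count, arrival_times_s, frame_time_s): the number of "1"s, the
    time stamp of each "1" (quantized to one sampling period), and the
    frame ("integration") time.
    """
    period = 1.0 / f_sample_hz
    arrivals = [i * period for i, b in enumerate(bits) if b]
    return len(arrivals), arrivals, len(bits) * period


n_samples = round(11e-6 * F_SAMPLE_HZ)   # 11 us frame at 33 MHz -> 363 samples
bits = [0] * n_samples
for i in (40, 150, 300):                 # hypothetical positions of three "1"s
    bits[i] = 1

count, times, t_frame = frame_stats(bits)
print(count, [f"{t * 1e6:.2f} us" for t in times], f"frame = {t_frame * 1e6:.1f} us")
```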

A photon-counting histogram (PCH) can be obtained from the distribution of the total number of received photons over multiple frames. A total of 3,000 frames were captured to construct the PCH, and a representative example is shown in Figure 9. The measured histogram is not perfectly Poissonian, possibly because of noise-induced false counting or noise added during the emission of carriers from under TG to FD. Notably, in Fig. 9 the variance is smaller than the mean, whereas for an ideal Poisson distribution the two are equal.
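Building a PCH and comparing it with the ideal Poisson case takes only a few lines. The Python sketch below histograms per-frame counts, computes the mean, the variance and their ratio (1 for an ideal Poisson distribution, below 1 for the measured Fig. 9 histogram), and lists the expected Poisson counts. The input here is synthetic: 3,000 frames drawn from an ideal Poisson distribution at the quanta exposure H = 1.7 quoted for Figure 9, using Knuth's multiplication method. It is not the measured data, and all function names are illustrative.

```python
import math
import random
from collections import Counter


def photon_counting_histogram(counts):
    """Return (histogram, mean, variance) of per-frame photon counts."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / n   # population variance
    return Counter(counts), mean, variance


def poisson_pmf(k, h):
    """Ideal Poisson probability of k counts at quanta exposure h."""
    return math.exp(-h) * h ** k / math.factorial(k)


def poisson_sample(h, rng):
    """Draw one Poisson variate with Knuth's multiplication method (small h)."""
    limit, k, p = math.exp(-h), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


H = 1.7          # quanta exposure quoted for Figure 9
N_FRAMES = 3000  # number of frames used for the measured PCH
rng = random.Random(1)
frames = [poisson_sample(H, rng) for _ in range(N_FRAMES)]   # synthetic, ideal
hist, mean, variance = photon_counting_histogram(frames)
print(f"mean = {mean:.2f}, variance = {variance:.2f}, var/mean = {variance / mean:.2f}")
for k in range(7):
    print(k, hist.get(k, 0), round(N_FRAMES * poisson_pmf(k, H)))
```

Applied to measured frame counts instead of synthetic ones, the same var/mean ratio gives a single number for how far a PCH departs from Poisson statistics.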

The specifications of the prototype chip are summarized in Table 1. Since the current chip is mainly intended to demonstrate the high-speed readout concept, its resolution is very low (6x3); the resolution can be scaled up in future-generation chips.

The CG mean and variation were measured on the jot array of the previous QIS chip with an identical design [2]. The ADC sampling rate is currently limited by the FPGA and could be raised toward the 66.7 MSa/s of which the ADC is capable once the FPGA clocks are made faster. A micrograph of the QIS chip is shown in Figure 10.

VI. DISCUSSION

The prototype chip illuminates a path toward larger QIS arrays, e.g., a mega-jot high-speed QIS array for coarse time-of-flight (ToF) measurements. We note that 30 ns corresponds to 4.5 m resolution, whereas practical applications may require 0.3 ns (about 5 cm) resolution or better, so a ~100x increase in jot-parallel circuit speed is still needed. Once quantization has been performed, most circuits developed for ToF with SPAD arrays, including event-driven readout and time-stamping, will also be applicable to CMOS QIS ToF. However, the readout circuit area requirement may limit practical pixel shrink, and power dissipation may limit the resolution and/or readout speed.
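The depth figures above follow from the round-trip ToF relation, depth resolution = c·Δt/2. The short Python sketch below evaluates it for the 33 MHz sampling period used in this work (about 30.3 ns), for 30 ns, and for 0.3 ns, and computes the implied speed-up. Only the relation and the speed of light are standard; the function name is illustrative.

```python
C_M_PER_S = 299_792_458.0   # speed of light in vacuum (m/s)


def depth_resolution_m(timing_bin_s):
    """Round-trip ToF: a timing bin dt resolves a depth step of c*dt/2."""
    return C_M_PER_S * timing_bin_s / 2.0


for label, dt in [("33 MHz sampling (~30.3 ns)", 1 / 33e6),
                  ("30 ns", 30e-9),
                  ("0.3 ns", 0.3e-9)]:
    print(f"{label:>26}: {depth_resolution_m(dt):.3f} m")

speedup = 30e-9 / 0.3e-9    # ratio of timing bins: ~100x faster circuitry
print(f"required speed-up: {speedup:.0f}x")
```

The 30 ns bin gives about 4.50 m and the 0.3 ns bin about 4.5 cm, which the text rounds to 5 cm.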

The jot used in the current chip has the same design as in the previous-generation QIS. Jots with even lower source-follower noise will likely be used in future high-speed QIS. A layout with less parasitic capacitance is also desired in future designs to reduce settling time and signal crosstalk. Faster ADCs will be possible with improved design and a more advanced process node.

In the future, we foresee an erosion of the photon-arrival-timing advantage of SPADs over QIS at the same power dissipation. For SPADs, the pre-quantizer gain comes from avalanche multiplication, which provides high bandwidth but results in a larger pixel pitch, higher dark count rates, and perhaps higher manufacturing costs. QIS, on the other hand, promises smaller pixel pitch and lower intrinsic manufacturing costs, but its pre-quantizer gain, implemented in the analog circuit domain, requires as much energy as the SPAD gain or more, and the gain-circuit layout requires additional pixel area that might erode the pixel-pitch advantage of QIS over SPADs.

VII. SUMMARY

We have demonstrated single-photoelectron counting without avalanche multiplication, at much higher speed than prior CMOS QIS implementations. The arrival of each photoelectron can trigger a counting circuit. A prototype QIS chip with up to 33 MHz high-speed readout for photon counting was designed and tested. A novel switched-capacitor comparator input circuit enables fast quantization, and the PCH constructed from the chip output demonstrates the photon-counting capability.

![Figure 1](image1.png)

**Figure 1.** Architecture of the high-speed QIS prototype chip.

![Figure 2](image2.png)

**Figure 2.** (a) Jot schematic with shared readout. (b) Cross-section doping profile of the pump-gate jot.

---

**REFERENCES**

Figure 3. High-speed signal readout chain.

Figure 4. Schematic of the comparator.

Figure 5. Timing diagram of the 1-bit ADC.

Figure 6. Jot output voltage change as photons impinge and corresponding digital output.

Figure 7. Block diagram of the test setup.

Figure 8. A measured data stream with three photons detected.

Table 1. Specifications of the prototype QIS chip.

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Resolution</td>
<td>2x3x3</td>
</tr>
<tr>
<td>Jot pitch</td>
<td>2.2 µm x 1.1 µm</td>
</tr>
<tr>
<td>Readout area per jot</td>
<td>8 µm x 116 µm</td>
</tr>
<tr>
<td>CG mean</td>
<td>345 µV/e−</td>
</tr>
<tr>
<td>CG variation</td>
<td>2.6%</td>
</tr>
<tr>
<td>ADC sampling rate</td>
<td>33 MHz</td>
</tr>
<tr>
<td>ADC resolution</td>
<td>1 bit</td>
</tr>
<tr>
<td>Power per jot</td>
<td>0.26 mW</td>
</tr>
<tr>
<td>Package</td>
<td>PGA with 64 pins</td>
</tr>
</tbody>
</table>

Figure 9. A measured photon counting histogram from 3,000 frames with a quanta exposure of 1.7e−.

Figure 10. Micrograph of the QIS chip.