Quanta imaging sensors: achieving single-photon counting without avalanche gain

D. Starkey1, W. Deng1, J. Ma2, S. Masoodian2, and E.R. Fossum1
1Thayer School of Engineering, Dartmouth College, Hanover, NH USA 03755;
2Gigajot Technology, Inc. Pasadena, CA USA 91107

ABSTRACT

This paper reports on the state of the art of the Quanta Image Sensor (QIS) being developed by Dartmouth. The QIS is a photon-counting image sensor. Experimental 1Mpixel devices have been implemented in a modified backside-illuminated stacked CMOS image sensor process. Without the use of avalanche multiplicative gain, the sensors have achieved room temperature average read noise of 0.22e- rms (analog readout) permitting photon counting, and over 1000fps readout at under 20mW total power dissipation including pads (single-bit digital readout).*

Keywords: Photon counting detector, quanta image sensor, QIS, CMOS image sensor, CIS, charge-transfer amplifier

1. INTRODUCTION

The Quanta Image Sensor (QIS) was introduced in 2005 as a possible paradigm shift in image capture to take advantage of shrinking pixel sizes enabled by technological advancements [1]. Conceptually, the QIS contains a large number of sub-diffraction-limit, high-conversion-gain, low-full-well-capacity pixels, called “jots.” The key aspects of the QIS involve counting individual photoelectrons using the jots at high readout rates, representing this binary output as a bit cube (x,y,t), and finally, processing the bit cubes to form high dynamic range images [2]. The QIS concept is illustrated in Figure 1.

Figure 1. QIS concept. Bit-plane images are captured at high frame rate, where a logic 0 corresponds to no received photoelectron, and a logic 1 corresponds to at least one received photoelectron. The bit-planes are then logically stacked in time, and grey-scale pixels are formed from a “cubicle” of bits in a spatial and temporal neighborhood. The extent of the neighborhood can be set on the focal-plane, or by software coding after capture.
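The cubicle summation described in the caption can be sketched in a few lines. This is a minimal illustration, not the processing used in this work: the function name `cubicle_sum`, the array ordering (t, y, x), the cubicle size and the random test data are all assumptions made for the example, and only non-overlapping cubicles are considered.

```python
import numpy as np

def cubicle_sum(bit_cube, kx, ky, kt):
    """Sum the bits in non-overlapping (kt, ky, kx) cubicles of a (t, y, x) bit cube."""
    t, y, x = bit_cube.shape
    # Trim so that each axis divides evenly into cubicles.
    t, y, x = (t // kt) * kt, (y // ky) * ky, (x // kx) * kx
    trimmed = bit_cube[:t, :y, :x].astype(np.uint32)
    # Split each axis into (number of blocks, block size), then sum over block sizes.
    blocks = trimmed.reshape(t // kt, kt, y // ky, ky, x // kx, kx)
    return blocks.sum(axis=(1, 3, 5))

rng = np.random.default_rng(0)
# Hypothetical bit cube: 16 bit-planes of 8x8 jots, each bit set with probability 0.3.
bit_cube = rng.random((16, 8, 8)) < 0.3
grey = cubicle_sum(bit_cube, kx=4, ky=4, kt=16)
print(grey.shape)  # (1, 2, 2)
```

Each output value is the count of logic 1s in a 4x4x16 cubicle, so it ranges from 0 to 256. Changing `kx`, `ky` and `kt` after capture trades spatial and temporal resolution against grey-scale depth.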

Conceptually, a QIS may contain over a billion jots if they are small enough (e.g. <500nm pitch) with a readout speed primarily limited by allowable power dissipation in getting the data off the sensor. For example, a billion jots at 1000fps corresponds to 1Tb/s data rate for single-bit digital output.
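The data-rate figure follows directly from jots x fields per second x bits per jot. The short sketch below repeats that arithmetic; the function name is hypothetical, and the second call applies the same formula to a 1024 x 1024 array at 1000fps as an illustration.

```python
def single_bit_data_rate_bps(num_jots, fields_per_second, bits_per_jot=1):
    """Raw output data rate in bits per second."""
    return num_jots * fields_per_second * bits_per_jot

# One billion jots at 1000fps, single-bit output.
print(single_bit_data_rate_bps(1_000_000_000, 1000) / 1e12, "Tb/s")

# A 1024 x 1024 (1Mjot) array at 1000fps, for comparison.
print(single_bit_data_rate_bps(1024 * 1024, 1000) / 1e9, "Gb/s")
```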

* Much of this invited paper, 106391Q, is replicated for a simultaneous meeting as invited paper 1065902.

While initially conceived as a possible next-generation consumer image capture device, once it was demonstrated that we could reliably detect single photoelectrons with low bit-error rate (BER), it became apparent that the device has more immediate application to scientific imaging than consumer imaging. Progress on the device was reviewed in 2016 [2] and technical results most recently reported at the end of 2017 [3]. This paper summarizes some of these results as they pertain to scientific imaging, including some recent measurements at cooled temperatures as we try to understand current noise limitations. We also preliminarily report, for the first time, on QIS devices implemented with JFETs, although the JFET work remains underway and thus far has not yielded the same high performance as MOSFET readout.

2. SENSOR

2.1 Cluster-parallel sensor readout architecture

The simplified cluster-parallel readout architecture of the imager is shown in Figure 2. The stacked BSI QIS uses two vertically stacked substrates, electrically connected by a dense 2D array of hybrid bonding pads, with the photodetectors and the circuits on different substrates. The jots are implemented on the detector substrate using a simplified 45nm technology-node process. The readout and addressing circuits are located on the ASIC substrate using a 65nm technology node, with larger-than-minimum gate lengths to handle the large input common-mode range that results from the variety of jot designs. Each cluster of jots is associated with its own dedicated readout electronics stacked under the cluster. Twenty (20) different 1Mjot arrays are implemented in this chip, with different variations of jot and readout design. There are two classes of readout designs, one supporting low-speed analog readout for detailed characterization, and one supporting binary data readout at much higher field rates. In this paper we focus on the slower analog readout arrays. The digital output was described in more detail in [4].

The jots are two-way shared 2(H)x1(V), and clusters are 256 rows x 64 columns in size (16,384 jots/cluster), resulting in a 4x16 array of clusters (64 clusters) for readout. The 64 columns of a cluster are multiplexed to 32 CDS processors, and the 32 CDS outputs are multiplexed to a single programmable gain amplifier (PGA), with gain set at 10x for most results described in this paper. There are 64 of the CDS/mux/PGA units on the readout layer, connected to 16 analog pads which go to an off-chip 14-bit ADC for quantization.
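The cluster and multiplexing counts above can be checked with simple arithmetic. The sketch below assumes the 1024(H) x 1024(V) array size listed in Table I; the variable names are illustrative only.

```python
ARRAY_ROWS, ARRAY_COLS = 1024, 1024     # 1Mjot array (Table I)
CLUSTER_ROWS, CLUSTER_COLS = 256, 64    # cluster size from the text

clusters_v = ARRAY_ROWS // CLUSTER_ROWS         # clusters along the row direction
clusters_h = ARRAY_COLS // CLUSTER_COLS         # clusters along the column direction
num_clusters = clusters_v * clusters_h
jots_per_cluster = CLUSTER_ROWS * CLUSTER_COLS

columns_per_cds = CLUSTER_COLS // 32            # 64 columns share 32 CDS processors
units_per_pad = num_clusters // 16              # 64 CDS/mux/PGA units share 16 analog pads

print(clusters_v, clusters_h, num_clusters, jots_per_cluster)
print(columns_per_cds, units_per_pad)
```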

2.2 Jot devices

Several types of jots were implemented in these arrays, some with different geometries. The photodiode, storage well (SW), transfer gate (TG) and floating diffusion (FD) portions of the jot, common to all devices, are shown in Figure 3a. The jot layer is fabricated with a slightly modified 45nm BSI CMOS image sensor (CIS) process. In essence, a buried photodiode has been created below a transfer gate called the pump gate. The pump-gate photodiode and storage well doping profile is adapted from our previously demonstrated design and optimized for an improved effective fill-factor and better response in the shorter wavelength regime. The fabrication of the jots followed the baseline CIS process flow, while implantation modifications were made to realize the desired doping profiles for the various structures. These jots include MOSFET source-follower (SF) output devices and JFET SF output devices. The MOSFET SFs were implemented as both buried-channel and surface-channel devices, and the reset for the output floating diffusion (FD) of these devices was either a tapered-reset-gate MOSFET (TPG) or a punch-through-reset (PTR) device (shown in Figure 3b). The MOSFET devices and reset devices were described previously [3,4].

Figure 3. (a) Cross-section of doping profile from 3D TCAD simulation of pump-gate jot; (b) Punch-through reset (PTR) device doping profile and potential well schematic of reset of floating diffusion (FD); (c) Simplified layout and (d) simulated doping profile of the JFET jot.

The JFET SFs, modeled in [5], also have TPG or PTR reset, but only the TPG JFET SF jots have been characterized so far. The JFETs were implemented either as a surface-channel device (gate under the channel) or as a buried-channel device, meaning that the JFET gate was wrapped around the channel, above and below. A new special JFET symbol has been used in the schematics. For reasons likely related to implant conditions, the surface-channel devices are poorly behaved, so only the TPG buried-channel JFET jots are reported here. Cross-sections of the two JFET configurations are shown in Figure 3c and d.

Figure 4. Schematic of readout signal chain for analog output (JFET source-follower shown).

2.3 Sensor function

The sensor is read out using a rolling shutter, like conventional CMOS image sensors. After a row is selected, the jots are reset. The reset level is read out through the signal chain in Figure 4. However, in the reported measurements, only one branch of the CDS unit was used, with the SHR and MUX1 switches always closed. Figure 5a shows a timing diagram illustrating the testing methodology for standard operation. The PGA is typically set to a gain of 10x. The reset level is amplified and sampled by the off-chip ADC 42 times at 667 kHz. After the TX gates have been pulsed, the new signal level is amplified and sampled by the ADC another 42 times. The reset level is determined by taking the mean of 38 of the 42 samples taken by the ADC for the reset level, and the signal level is calculated similarly. CDS is performed digitally by subtracting these two values.
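The digital CDS step can be sketched as follows. The text does not say which 38 of the 42 samples are averaged, nor the sign convention of the subtraction, so this sketch assumes that the first four samples of each level are discarded (for example, to allow settling) and that the result is reset minus signal. The function names and sample values are hypothetical.

```python
import numpy as np

N_SAMPLES = 42   # ADC samples per level (from the text)
N_USED = 38      # samples averaged per level (from the text)

def level_mean(samples, n_used=N_USED):
    """Mean of the last n_used samples. ASSUMPTION: the earliest samples are the ones dropped."""
    samples = np.asarray(samples, dtype=float)
    return samples[-n_used:].mean()

def digital_cds(reset_samples, signal_samples):
    """Difference of the averaged levels. ASSUMPTION: sign convention is reset minus signal."""
    return level_mean(reset_samples) - level_mean(signal_samples)

# Hypothetical ADC codes: four settling samples followed by 38 settled samples per level.
reset = np.concatenate([np.full(4, 1010.0), np.full(38, 1000.0)])
signal = np.concatenate([np.full(4, 1000.0), np.full(38, 990.0)])
cds_value = digital_cds(reset, signal)
print(cds_value)  # 10.0
```

Averaging many samples of each level is a form of multiple sampling, which reduces the contribution of noise that is uncorrelated from sample to sample.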

Figure 5. Timing diagram used for PCH testing (left) and SF noise testing (right).

3. EXPERIMENTAL RESULTS

3.1 Characterization of TPG and PTR jots

The TPG and PTR devices were characterized using the photon counting histogram (PCH) method [6], with each PCH generated from 20k continuous reads of a single device with the SF biased at 1µA. An example of a PCH can be seen in Figure 8d. The conversion gain can be extracted from the peak separation, and the read noise can be calculated from the difference between the peaks and the valleys in the histogram, called the valley-peak modulation (VPM). Figure 6a and b show the histograms of the CG and read noise, respectively, for the PTR and TPG devices, created from 8,000 SF devices. The average conversion gain is 345μV/e- for the TPG devices and 368μV/e- for the PTR devices, with the increase in CG coming from the reduction in overlap capacitance between the FD and the reset transistor gate. Both device types show a CG variation of about 2.6%, which results from variability in the fabrication process, such as small mask misalignments. The mean read noise is 0.23e- rms for the TPG devices and 0.21e- rms for the PTR devices, with 15.7% and 15.3% relative full-width standard deviation, respectively. This large variation comes from the long tail in the read noise distribution, which results from some devices exhibiting random telegraph noise (RTN) that is substantially larger than the background 1/f noise. Like the variation in CG, the variation in read noise is influenced by fabrication-related processes, which can cause variations in the SF voltage noise. A scatter plot of the output voltage noise versus CG is shown in Figure 6c; for both the TPG and PTR devices the scatter is random and shows no correlation, which agrees with the previous hypothesis.
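To illustrate why the peak-to-valley contrast of a PCH carries the read noise, the sketch below models the histogram as a Poisson-weighted sum of Gaussians, one per photoelectron number, with the signal axis normalized to electrons. This is an illustrative model only, not the extraction procedure of [6]. The VPM definition used here (one minus the valley value divided by the mean of the two adjacent peak values), the sampling of peaks at integer electron numbers, the quanta exposure of 2 and the function names are all assumptions.

```python
import math

def pch_model(u, quanta_exposure, read_noise, k_max=30):
    """Model PCH density at signal u (in electrons): Poisson-weighted sum of Gaussians."""
    total = 0.0
    for k in range(k_max + 1):
        weight = math.exp(-quanta_exposure) * quanta_exposure**k / math.factorial(k)
        gauss = math.exp(-0.5 * ((u - k) / read_noise) ** 2) / (read_noise * math.sqrt(2.0 * math.pi))
        total += weight * gauss
    return total

def vpm(read_noise, quanta_exposure=2.0, k=1):
    """ASSUMED definition: 1 - valley / mean(adjacent peaks), peaks sampled at k and k+1."""
    peak = 0.5 * (pch_model(k, quanta_exposure, read_noise)
                  + pch_model(k + 1, quanta_exposure, read_noise))
    valley = pch_model(k + 0.5, quanta_exposure, read_noise)
    return 1.0 - valley / peak

for u_n in (0.20, 0.30, 0.45):
    print(f"read noise {u_n:.2f} e- rms -> model VPM {vpm(u_n):.3f}")
```

In this model the modulation is strong at about 0.2e- rms and nearly vanishes by about 0.45e- rms. In a measured histogram the horizontal axis is in output units, so the spacing between adjacent peaks gives the conversion gain.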

In order to understand the noise limitations, low-temperature testing was carried out. The nature of the noise may be discerned from the noise trend with temperature. In this measurement, an array of 256(H)x8(V) TPG jots was tested. Figure 6d presents a scatter plot of the voltage-referred read noise versus CG of TPG jots at 25°C and -70°C. The TPG jots showed an average read noise of 0.22e- rms with a best case of 0.17e- rms at -70°C. The reduction of the read noise at low temperature is mainly due to the reduction of 1/f noise [7]. The noise trend with temperature will help us understand the nature of our 1/f noise and guide future design for better noise performance. The demonstrated low-temperature operation also shows the possibility for QIS to find applications in scientific imaging, which benefits from low temperature.

3.2 Bandpass filtering for noise elimination

Further 1/f noise reduction can be achieved via bandpass filtering that is controlled by the CDS time difference [8]. Figure 5 (right) shows the timing diagram for applying bandpass filtering to SF noise testing. At the beginning of the measurement, the FD is reset and the SWs of the pixel are emptied. The SF output is sampled by two CDS capacitors by closing SHR and SHS at the same time. By adjusting the ∆t between the falling edges of these two sampling signals, the SF noise in different frequency regions can be characterized. The CDS samples were read out by the PGA using a CMS process to suppress the noise contributions from the PGA and the subsequent readout chain electronics. Since the chip was placed in a dark chamber during the testing, the photon shot noise is eliminated. Thus the total measured noise should only include the SF noise and the kTC noise from the CDS capacitor.
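The dependence of the measured noise band on ∆t can be illustrated with the standard transfer function of an ideal two-sample difference, |H(f)| = 2|sin(πf∆t)|. The sketch below is a textbook idealization, not the analysis used in this work: the ∆t values and frequencies are hypothetical, and the upper band edge set by the SF and sampling bandwidth is not modeled.

```python
import math

def cds_gain(freq_hz, delta_t_s):
    """Magnitude response of an ideal difference of two samples spaced delta_t_s apart."""
    return 2.0 * abs(math.sin(math.pi * freq_hz * delta_t_s))

for delta_t_s in (1e-6, 10e-6):  # hypothetical CDS time differences
    gains = [cds_gain(f, delta_t_s) for f in (1e2, 1e3, 1e4, 1e5)]
    print(delta_t_s, [round(g, 4) for g in gains])
```

A shorter ∆t rejects more of the low-frequency (1/f-dominated) part of the spectrum, while the first maximum of the response sits at f = 1/(2∆t).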
The read noise with different types of SFs is investigated at different CDS ∆t and temperatures. Different types of SFs are tested including buried-channel devices with gate widths of 0.14μm (BC014) and 0.18μm (BC018) and surface-channel devices with gate widths of 0.14μm (SC014) and 0.20μm (SC020). All devices have a gate length of 0.22μm. For each type of SF, an array of 256(H)x5(V) jots was tested. Since the amplitude of RTN is relatively big, the RTN jots (about 50 jots) are excluded for 1/f noise analysis. Only the quietest 1000 jots (shown in Figure 7) not showing RTN were analyzed for 1/f noise analysis.

3.3 Characterization of JFET jots

While the JFET jots were anticipated to have lower noise than the TPG jots, room temperature tests show that the JFET jots have fairly high noise, enough that the PCH method could not be used to characterize the devices (i.e. read noise >0.45e- rms). Instead, to characterize the CG of the JFET devices, the photon transfer curve (PTC) method was used, where 1000 frames were used per light level, the PGA gain was set to 2x, and the PTC was evaluated for each JFET jot. The histogram of the CG for the JFET jots biased at 1μA, 2μA and 4μA is shown in Figure 8a, demonstrating some devices with CG as high as 595μV/e-. The JFET devices completely remove the parasitic SF gate-oxide capacitance from the FD and replace it with a PN junction capacitance, which is significantly lower, resulting in a higher CG. The mean CG for 1μA, 2μA and 4μA is 330μV/e-, 326μV/e- and 354μV/e- respectively. While we expect CG to go up with bias current, the data show that the 2μA data include more lower-CG devices than the 1μA data, as some devices are out of the input common-mode range at 1μA bias current. However, we also note that there is an extremely large variation in the CG of more than 14.4%, which is problematic for the thresholding employed in a QIS for binary imaging. The implant process for the JFET devices needs to be optimized in the future to possibly improve this large spread and bring up the mean CG.
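The PTC extraction for a single jot can be sketched with simulated data. In the shot-noise-limited regime the variance of the output grows linearly with its mean, and the slope is the conversion gain in output units per electron. The sketch below is a generic mean-variance fit, not the exact procedure used for these measurements: the light levels, the 150μV read noise, the random seed and the function name are hypothetical, the true CG is set to the 330μV/e- mean quoted above, and referral through the PGA gain is omitted.

```python
import numpy as np

def ptc_conversion_gain(frames_by_level):
    """Slope of variance versus mean across light levels (output units per electron)."""
    means = np.array([np.mean(f) for f in frames_by_level])
    variances = np.array([np.var(f, ddof=1) for f in frames_by_level])
    slope, _intercept = np.polyfit(means, variances, 1)
    return slope

rng = np.random.default_rng(1)
true_cg_uv = 330.0       # uV/e-, mean JFET CG at 1uA from the text
read_noise_uv = 150.0    # uV rms, HYPOTHETICAL value for the simulation
mean_electrons = [5, 10, 20, 40, 80]   # HYPOTHETICAL light levels
n_frames = 1000          # frames per light level, as in the text

frames = [
    true_cg_uv * rng.poisson(m, size=n_frames) + rng.normal(0.0, read_noise_uv, size=n_frames)
    for m in mean_electrons
]
cg_est = ptc_conversion_gain(frames)
print(f"estimated CG: {cg_est:.0f} uV/e-")
```

The intercept of the fit corresponds to the signal-independent noise variance. With 1000 frames per level, the per-jot estimate scatters by a few percent around the true value in this simulation.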

To characterize the noise of these devices, the JFET jots were placed in the dark, 100k continuous readouts were collected and the standard deviation was calculated as the noise. The results for 1024 JFET devices are shown in Figure 8b, under the BC JFET RT label. For reference, the voltage noise data for the TPG devices is included on the plot under BC MOSFET RT. The higher noise exhibited by the JFET jots is currently not understood, as, in theory, the noise should be significantly lower due to reduced interactions with the Si/SiO2 interface. The current hypothesis is that weak channel confinement allows leakage from the channel to the substrate, which adds excess current noise. In testing the JFET devices, we also saw devices demonstrating RTN, which should not occur, but may be present if the channel is not sufficiently buried from the interface traps.

To further explore the noise performance of the new JFET devices, the devices were cooled to -70°C. At this temperature the noise of the JFET devices was sufficiently reduced to allow for characterization via the PCH, possibly due to a reduction in the previously hypothesized leakage. Figure 8b shows the noise results under BC JFET -70°C, which are calculated from the CG and read noise extracted via the PCH from 1024 JFET devices. We see a fairly large reduction in the mean noise, while the more limited tail is caused by the PCH only being valid for read noise <0.45e- rms. The collected TPG data at -70°C is included in Figure 8b for comparison under BC MOSFET -70°C. A scatter plot of the CG and the output voltage noise is shown in Figure 8c, where we also see no clear trend, but note a very large scatter, especially compared to Figure 6c. Figure 8d shows the PCH for the lowest read noise JFET jot at -70°C, which had a CG of 425µV/e- and a read noise of 0.187e- rms. Furthermore, in doing the PCH testing, we noted an “RTN-like” noise that occurred during the CMS readout, which was present in light and was not present in dark conditions. The exact cause of this has not been identified, but it seems this effect is increasing the read noise of the JFET jots.

A summary comparing the TPG, PTR and JFET jot devices is shown in Table I, including some other key pixel parameters shared by all the devices.

4. ACKNOWLEDGMENTS

The authors appreciate the sponsorship and collaboration of Rambus Inc. in the initial stages of this work, as well as the support and collaboration of the Taiwan Semiconductor Manufacturing Company (TSMC). The characterization work was sponsored by the DARPA DETECT program through Army Research Office (ARO) Cooperative Agreement Number W911NF-16-2-0162. The views and conclusions contained in this document are those of the authors and should not be

Table I. Specifications of the 1Mjot single-bit QIS.

<table>
<thead>
<tr>
<th>Jot type</th>
<th>TPG</th>
<th>PTR</th>
<th>JFET</th>
</tr>
</thead>
<tbody>
<tr>
<td>Process</td>
<td colspan="3">45nm (jot layer), 65nm (ASIC layer)</td>
</tr>
<tr>
<td>Jot pitch</td>
<td colspan="3">1.1µm</td>
</tr>
<tr>
<td>Array</td>
<td colspan="3">1024 (H) x 1024 (V)</td>
</tr>
<tr>
<td>Conversion Gain on column @ RT</td>
<td>345µV/e-</td>
<td>368µV/e-</td>
<td>330µV/e-</td>
</tr>
<tr>
<td>Conversion Gain Variation @ RT</td>
<td>2.6%</td>
<td>2.6%</td>
<td>14.4%</td>
</tr>
<tr>
<td>Input Referred Noise @ RT</td>
<td>0.23e- r.m.s.</td>
<td>0.21e- r.m.s.</td>
<td>-</td>
</tr>
<tr>
<td>Input Referred Noise @ -70°C</td>
<td>0.22e- r.m.s.</td>
<td>-</td>
<td>0.3e- r.m.s.</td>
</tr>
<tr>
<td>Quantum Efficiency @ RT</td>
<td colspan="3">71% @ 450nm, 79% @ 550nm, 69% @ 650nm</td>
</tr>
<tr>
<td>Dark current @ RT</td>
<td colspan="3">0.16e-/s/jot</td>
</tr>
<tr>
<td>Dark current @ 60°C</td>
<td colspan="3">1.06e-/s/jot</td>
</tr>
<tr>
<td>Field rate</td>
<td colspan="3">30fps (analog)</td>
</tr>
</tbody>
</table>

REFERENCES