Connected Component Labeling and Its Applications

Connected Component Labeling is commonly used to refer to the task of grouping the connected pixels in an image. (Note that there are different ways to define connectedness.) The basic approach is to scan the image and assign labels to each pixel until the labels for the pixels no longer change. This basic approach is slow because the labels propagates one layer in an iteration. There are two common strategies to speed up this process. The most common strategy leads to a two-pass algorithm. This algorithm uses a data structure to record label equivalence information. It scans the image once to assign provisional labels and discover the label equivalence information, and scans the image a second time to assign the final labels. Another very successful strategy is the one-pass algorithms, that find all connected pixel in one shot, for example through recursively visit all the connected neighbors. One of the most successful approach in this category is the Contour Tracing algorithm by Chang et al. (CVIU 93:206-220, 2004). They also distribute an implementation of their algorithm on-line. The most commonly used data structure for recording the label equivalence information in a two-pass algorithm is the union-find data structure. Researchers have recognized the possiblity of implementing this data structure implicitly using an array instead of pointers. We designed a set of algorithms that enabled us to operate on this array efficiently, and prove that a two-pass labeling algorithm with this implicit union-find data structure is theoretically as good as the best known labeling algorithms. We further show through experimental measurements that it in fact is faster than the best known one-pass algorithms.

Selected Publications

Kesheng Wu, Ekow Otoo and Kenji Suzuki. Two Strategies to Speed up Connected Component Labeling Algorithms. LBNL Tech Report LBNL-59102. 2005. [Abstract] [Full Text] [PAA v 12 p 117--135]: This paper reports our formal analysis of the underlying algorithms we used in a number of applications. An early version containing only empirical study of the algorithms was reported at SPIE Medical Imaging Conference 2005 [Preprint LBNL-56864]. The key element of our algorithms is a simple union-find data structure and a set of algorithm to work on it. This data structure allows the connected component labeling to be performed in linear time, which is considered theoretically optimal (cf. CVIU 93:206-220 ICIAP'99 and ICPR 2000).
In timing measurements against the fastest known labeling algorithm for 2D images (distributed by the original authors of the Contour Tracing algorithm), our optimized labeling algorithm can be up to 10 times faster on large random images. This is primarily because our new algorithm requires less memory than the Contour Tracing algorithm and accesses memory in more regular patterns as well.
In 2007, Yapa and Koichi published an independent evaluation of our algorithm and found it to be very hard to beat as well.
Kurt Stockinger, John Shalf, Kesheng Wu, Wes Bethel. Query-Driven Visualization of Large Data Sets, IEEE Visualization, Minneapolis, Minnesota, USA, October 2005, IEEE Computer Society Press. [Abstract] [preprint as LBNL-57511]: This paper describes our work on the dextrous data explorer (DEX). It marks the beginning of very fruitfuly line of work on query-driven visualization, where the query capability of FastBit is used to efficiently and effectively limit the amount of data deliever to the visualization pipeline. This combination not only improve the responsiveness of the viaulization system, but also reduces the visual clutter and focuses the human perception system in a more productive way.
Kesheng Wu, Wendy Koegler, Jacqueline Chen and Arie Shoshani, Using Bitmap Index for Interactive Exploration of Large Datasets. In Proceedings of SSDBM 2003. [Abstract] [Preprint (LBNL-52535)]: This paper describes the first application that motivated our work on connected component labeling. We were trying to meet the need of identifying and tracking ignition kernels in a combustion simulation. The procedure we use to track ignition kernel can be divided into three steps: (1) searching for points with user specified characteristics of burning, (2) connecting the points into regions in space, and (3) computing overlaps among regions from adjacent time steps to track their evolution. The FastBit search technology can accomplish step (1) very efficiently. The underlying data structure also allows us to compute the overlaps among regions efficiently too. Therefore, we have solutions to two of the three steps. What was missing was an efficient technology to accomplish step (2), which can be treated as a connected component labeling problem.

Here is a more extensive list of publications on connected component labeling and its applications. This work forms the basis of many fruitful collaborations with the Visualization group at LBNL, including DEX.

Related sites:: A description of connected component labeling for image analysis as it relates to HIPR2.; A free implementation of image analysis tool that uses connected component labeling idea.; Connected Components of a Graph
More research work by John Wu: Bitmap Index; Connected Component Labeling; Eigenvalue Computation
Information available elsewhere on the web: CiteSeer; DBLP; Google Scholar