Cut-out image mosaics
Copyright © 2008 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions Dept, ACM Inc., fax +1 (212) 869-0481 or e-mail permissions@acm.org. NPAR 2008, Annecy, France, June 09–11, 2008. © 2008 ACM 978-1-60558-150-7/08/0006 $5.00

Cut-Out Image Mosaics

Jeff Orchard, Craig S. Kaplan*
David R. Cheriton School of Computer Science
University of Waterloo

Figure 1: Cut-out image mosaic involving various target tile shapes, colour correction, and the ability to select sub-images from images in the source database. The target and source images were taken from the Library of Congress collection on flickr™.

Abstract

An image mosaic is a rendering of a large target image by arranging a collection of small source images, often in an array, each chosen specifically to fit a particular block of the target image. Most mosaicking methods are simplistic in the sense that they break the target image into regular tiles (e.g., squares or hexagons) and take extreme shortcuts when evaluating the similarity between target tiles and source images. In this paper, we propose an efficient method to obtain higher-quality mosaics by incorporating a number of process improvements. The Fast Fourier Transform (FFT) is used to compute a more fine-grained image similarity metric, allowing for optimal colour correction and arbitrarily shaped target tiles.
In addition, the framework can find the optimal sub-image within a source image, further improving the quality of the matching. The similarity scores generated by these high-order cost computations are fed into a matching algorithm to find the globally optimal assignment of source images to target tiles. Experiments show that each improvement, by itself, yields a more accurate mosaic. Combined, the innovations produce very high-quality image mosaics, even with only a few hundred source images.

Keywords: image mosaic, registration, non-photorealistic, least-squares, Fourier transform, assignment problem.

*e-mail: {jorchard,cskaplan}@uwaterloo.ca

1 Introduction

The first image mosaics were large murals formed by placing thousands of coloured tiles [Battiato et al. 2006]. Inspired by these works of art, today the term "image mosaic" refers to a stylized representation of a large image, the "target" image, formed by piecing together a collection of carefully chosen smaller images called "source" images. The target image is subdivided into small pieces, called "target tiles" (typically rectangles), and each tile is filled with a source image that approximates the tile's contents.

Image mosaics communicate at two disparate scales. These two scales act as a symbolic divide, so that the target image is conceptually set apart from the contents of the tiles that comprise it. This dichotomy provides a rich environment for combining images that either suggest the same message from two different perspectives, or supply contrasting viewpoints. For example, an image of a car might be made out of pictures of the employees who manufactured it, or out of pictures of bicycles. Image mosaics are a powerful medium for conveying such split-level messages.

Every image mosaic must strike a balance between the opposing goals of accuracy and discernibility. Accuracy seeks to reproduce the target image as closely as possible; discernibility seeks to ensure that each of the source tiles is legible. It is easy to trade off between these two goals by controlling tile sizes, but more interesting to attempt to achieve accuracy and discernibility simultaneously. In this paper we introduce a novel method for assigning source images to tiles with much higher accuracy than previous approaches. This extra margin of quality allows us to achieve overall accuracy comparable to previous techniques but with larger individual tiles, thereby increasing discernibility.

One way to improve the accuracy of a mosaic is to increase the number of images in the source database. As the database grows, so does the probability of finding a close match for a target tile. However, more source images require more processing time to create the mosaic. Also, not every mosaic should be produced using thousands or tens of thousands of images. Occasionally, limited resources or thematic constraints may restrict us to a few hundred source images.

Previous work has sought methods to speed up the matching process to support efficient search over large image databases. Typically, these approaches summarize a source image as a "signature" consisting of a few numbers. Matching can then be done at the signature level. The most common technique is to partition the source image into a regular grid and average the contents of each grid cell down to a single colour [Di Blasi et al. 2006; Silvers 2001; Tran 1999; Zhang 2002]. Di Blasi et al. [2006] partition each image into a 3 × 3 grid and compute the average red, green and blue values within each grid cell, yielding a signature of 27 numbers. They arrange their source image database into an antipole tree data structure, reducing the number of comparisons needed to find a close match. Zhang [2002] partitions source images into four cells.
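The grid-average signature described above can be sketched in a few lines. This is a minimal illustration, assuming images are NumPy arrays of shape H × W × 3 and that rows and columns are split as evenly as possible; the names `grid_signature` and `closest_source` are hypothetical, and the antipole tree of Di Blasi et al. is replaced here by a plain linear scan over signatures.

```python
import numpy as np

def grid_signature(img: np.ndarray, g: int = 3) -> np.ndarray:
    """Average colour of each cell of a g x g grid laid over an H x W x C image.

    Returns g*g*C numbers (27 for g=3 and RGB), cell by cell in row-major
    order, with the C channel means of each cell kept together.
    Rows and columns are split as evenly as possible (an assumption).
    """
    h, w, c = img.shape
    sig = []
    for rows in np.array_split(np.arange(h), g):
        for cols in np.array_split(np.arange(w), g):
            cell = img[np.ix_(rows, cols)]  # shape (len(rows), len(cols), C)
            sig.extend(cell.reshape(-1, c).mean(axis=0))
    return np.asarray(sig)

def closest_source(target_sig: np.ndarray, source_sigs: np.ndarray) -> int:
    """Index of the source signature nearest to target_sig (Euclidean, linear scan)."""
    dists = np.linalg.norm(source_sigs - target_sig, axis=1)
    return int(np.argmin(dists))

# Usage: signatures for a toy database of 5 random 30 x 30 RGB images.
rng = np.random.default_rng(0)
database = rng.random((5, 30, 30, 3))
sigs = np.stack([grid_signature(im) for im in database])
print(sigs.shape)                     # (5, 27)
print(closest_source(sigs[2], sigs))  # 2
```

Comparing 27 numbers per image is what makes these methods fast; the price is that everything inside a grid cell is collapsed to one colour.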
In Zhang's scheme, each cell is represented by a coarse histogram, and the histograms are reduced in turn to binary signatures. Finkelstein and Range [1998] use the wavelet signature approach of Jacobs et al. [1995] to accelerate their mosaic method. The wavelet signature records the largest coefficients of the wavelet transform of an image.

These aggregate signatures place a bound on the smallest image features that can be used to compare source images to target tiles. For example, the 3 × 3 signatures of Di Blasi et al. mean that their algorithm will be blind to any features smaller than a single grid cell, that is, one ninth of the area of a source image. For a mosaic consisting of an M × N grid of tiles, accuracy can effectively be no better than that of a 3M × 3N image.

These algorithms could clearly benefit from a finer-grained comparison metric, which could find correspondences between features at any size. Still, without a huge database, the chances of finding close matches are lower. The odds are greatly improved if we allow the possibility of matching a target tile to a portion (or cut-out) of a source image, as in Figure 1(d). Given a user-defined target tile, we would consider all possible shifts (translations) of the target tile's footprint within the source image. This change would logically multiply our source database by a large constant factor (the number of possible shifts for each image), yielding more opportunities for closer matches with a relatively modest database. Of course, we must deal with the seeming increase in computational complexity. In Section 2, we present an FFT-based approach that can feasibly compute fine-grained matching quality for all shifts.

A complementary strategy to improve mosaic accuracy is to allow for colour shifting by applying a global scaling factor to the colour components of an image. Doing so can alter the colour composition of an image dramatically, while still depicting geometric content [Di Blasi et al. 2005].
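For a fixed alignment between a source and a target region, the best global scale for each colour channel has a simple least-squares closed form: minimizing $\sum m\,(d_k s_k - t_k)^2$ over $d_k$ gives $d_k = \sum m\, s_k t_k / \sum m\, s_k^2$. The sketch below is our own illustration of that standard result, not the authors' code; it assumes aligned H × W × 3 NumPy arrays and an H × W weight mask, and `best_channel_scales` is a hypothetical name. It recovers a synthetic colour cast applied to a random image, showing that a single factor per channel changes colour composition while leaving the geometry untouched.

```python
import numpy as np

def best_channel_scales(src: np.ndarray, tgt: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Least-squares factor d_k per channel minimising sum mask * (d_k * src_k - tgt_k)^2.

    src, tgt: aligned H x W x C arrays; mask: H x W weights.
    Setting the derivative to zero gives d_k = sum(m s t) / sum(m s^2);
    channels with sum(m s^2) == 0 get d_k = 0.
    """
    m = mask[..., None]                          # broadcast the mask over channels
    num = (m * src * tgt).sum(axis=(0, 1))
    den = (m * src ** 2).sum(axis=(0, 1))
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)

rng = np.random.default_rng(3)
src = rng.random((8, 8, 3))
cast = np.array([1.2, 0.9, 0.5])   # a global colour cast, i.e. a diagonal D
tgt = src * cast                   # same geometry, different colour composition
mask = np.ones((8, 8))
d = best_channel_scales(src, tgt, mask)
print(np.round(d, 6))              # [1.2 0.9 0.5]
```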
Finkelstein and Range [1998] also applied a colour shift, though their shift was potentially different at each target pixel. This pixel-wise method can introduce phantom features in the source images, even rendering them unrecognizable; in other words, it increases accuracy at the expense of discernibility. We show in Section 2 how our technique can be extended to compute an optimal colour adjustment for any source image assignment.

Assuming that we can feasibly compute the matching cost of every shift of every source image against every target tile, we are still left with the question of selecting the particular assignment of source images to tiles that will produce a pleasing overall result with high quality. A greedy algorithm, which gives each tile in turn its best remaining source image, is the natural choice but not always the best one: an early tile may claim a source image that a later tile needs far more. In Section 3 we discuss our assignment algorithm, which makes a global choice based on all matching costs.

We describe our implementation in Section 4. In Section 5 we present results that isolate the effects of these various improvements, and show that our technique helps to improve accuracy.

2 Evaluating matching cost

Let us assume that we are given a target image $\vec{T}(i,j)$ from which we wish to construct a mosaic. (We use uppercase italic letters to denote images, and include an arrow over an image name when pixels in that image are vectors as opposed to scalars.) Fix a single tile shape in the mosaic, and represent it via a characteristic function $W(i,j)$ that is 1 inside the tile and 0 everywhere else. That is, the non-zero part of $W$ indicates the portal through which the source image and target image are compared. Fix a source image $\vec{S}(i,j)$. We now consider the problem of computing $C(a,b)$, the image dissimilarity over all tile pixels between the target image and the source image shifted by $(a,b)$.
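The shifted comparison just described can be evaluated for every shift at once. The sketch below is our own illustration and not necessarily the formulation the authors derive: it assumes NumPy, treats one colour channel at a time, and uses the weighted sum-of-squared-differences cost that the paper adopts in Equation 1, optionally with a per-channel scale $d$ as in Equations 2 and 3. Expanding $\sum W\,(d\,S_{\text{shifted}} - T)^2$ gives $d^2 A - 2dB + E$ with $A = \sum W S_{\text{shifted}}^2$, $B = \sum W\,T\,S_{\text{shifted}}$ and $E = \sum W T^2$. Both $A$ and $B$ are cross-correlations, which the FFT computes for all shifts in $O(N^2 \log N)$ time, and minimizing over $d$ gives $d = B/A$ with cost $E - B^2/A$. Placements are indexed by the offset $(u,v)$ of the tile's bounding box inside the source, corresponding to $(a,b) = (-u,-v)$, and only placements lying wholly inside the source are kept. All function names are hypothetical; `cost_bruteforce` exists only to check the FFT result.

```python
import numpy as np

def correlate_valid(X: np.ndarray, K: np.ndarray) -> np.ndarray:
    """out[u, v] = sum_{i,j} K[i, j] * X[i + u, j + v] for every placement of K inside X.

    Circular cross-correlation via the FFT; for placements that stay inside X
    no wrap-around occurs, so those entries equal the linear correlation.
    """
    shape = X.shape
    spec = np.conj(np.fft.rfft2(K, s=shape)) * np.fft.rfft2(X)  # K is zero-padded
    full = np.fft.irfft2(spec, s=shape)
    return full[: shape[0] - K.shape[0] + 1, : shape[1] - K.shape[1] + 1]

def cost_fft(S, T, W, scale=False):
    """Masked SSD between tile T (weights W) and every cut-out of source S, one channel."""
    A = correlate_valid(S ** 2, W)   # sum W * S_shifted^2
    B = correlate_valid(S, W * T)    # sum W * T * S_shifted
    E = float((W * T ** 2).sum())    # shift-independent term
    if not scale:
        return A - 2.0 * B + E       # d = 1
    with np.errstate(divide="ignore", invalid="ignore"):
        return E - np.where(A > 0, B ** 2 / A, 0.0)  # optimal d = B / A

def cost_bruteforce(S, T, W, scale=False):
    """Direct evaluation, O(h * w) work per placement, for checking cost_fft."""
    h, w = T.shape
    out = np.empty((S.shape[0] - h + 1, S.shape[1] - w + 1))
    for u in range(out.shape[0]):
        for v in range(out.shape[1]):
            patch = S[u:u + h, v:v + w]
            d = 1.0
            if scale:
                a = (W * patch ** 2).sum()
                d = (W * T * patch).sum() / a if a > 0 else 0.0
            out[u, v] = (W * (d * patch - T) ** 2).sum()
    return out

def cost_rgb(S, T, W, scale=False):
    """Sum of per-channel costs; each colour channel is handled independently."""
    return sum(cost_fft(S[..., k], T[..., k], W, scale) for k in range(S.shape[-1]))

# Usage: a tile cut from a random source at offset (4, 7) is found again.
rng = np.random.default_rng(1)
S = rng.random((20, 24, 3))
W = (rng.random((6, 5)) > 0.4).astype(float)  # hypothetical irregular tile mask
W[2, 2] = 1.0                                 # make sure the mask is non-empty
T = S[4:10, 7:12]
C = cost_rgb(S, T, W)
print(tuple(int(x) for x in np.unravel_index(np.argmin(C), C.shape)))  # (4, 7)
```

The FFT route replaces the per-placement sums with three transforms per correlation, which is where the savings over direct evaluation come from when there are many placements.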
A reliable metric is the weighted sum of squared differences (SSD) cost function

$$C(a,b) = \sum_{i,j} \left\| \vec{S}(i-a,\, j-b) - \vec{T}(i,j) \right\|^2 W(i,j), \tag{1}$$

where $\|\cdot\|$ is the Euclidean norm, so that each term is the squared Euclidean distance between two colours, and the summation is taken simultaneously over all $i$ and $j$.

We may also wish to allow for colour shifting by scaling the different colour components in $\vec{S}$ to best match the target $\vec{T}$. We let $D$ be a diagonal matrix with a scaling factor for each colour component of $\vec{S}$, and rewrite the cost function as

$$C(a,b,D) = \sum_{i,j} \left\| D\,\vec{S}(i-a,\, j-b) - \vec{T}(i,j) \right\|^2 W(i,j). \tag{2}$$

Given a source image $\vec{S}$, it is infeasible to find values of $(a,b)$ and $D$ that minimize $C(a,b,D)$ by direct computation of Equation 2. If $\vec{T}$ and $\vec{S}$ are roughly $N \times N$ pixels in size, computing the cost function for a single shift $(a,b)$ requires $O(N^2)$ time. However, there are also roughly $N^2$ possible shifts, meaning that it would take $O(N^4)$ time to find the optimal $(a,b)$, and even then we would need to compute $D$. Bear in mind also that we will eventually want to compute the matching costs for all possible source-image/target-tile pairs. If there are $P$ source images and $M$ tiles, we would expect a brute-force algorithm to take $O(N^4 PM)$ time in total.

However, we can greatly speed up this computation by considering the problem in the frequency domain. To see how, we first separate Equation 2 into expressions for each of the three colour components $k$, obtaining

$$C_k(a,b,D_k) = \sum_{i,j} \left( D_k\, S_k(i-a,\, j-b) - T_k(i,j) \right)^2 W(i,j), \tag{3}$$