The Penguin Hiding Problem: How to Hide One Photo Within Another

9 minute read

Published:

The Nature Briefing has a section they run where their resident rockhopper penguin, Leif Penguinson, hides within a photo of an beautiful and scientific location. You can check out all of Leif’s historical hiding places here. Nature’s “media maestro” Tom Houghton chooses where to hide Leif. Below are two solved examples (credit to the Briefing)

​ This penguin is genuinely one of my favourite parts of the Breifing — along with the science updates and discussion of academic culture of course. While searching for Leif, I wondered how he chose his hiding spot.

​ Thus, here we are, exploring the optimal way to hide a penguin in a photo. By hide, I don’t mean steganography, where you want to encode a message in a photo in a way that is imperceptible to a human eye. Rather, I mean finding the most penguiny looking patch of pixels in a target photo. Once you know this, replace that patch of pixels with your penguin and voila, Leif is hidden!

​ Let’s formalize the problem a little so we can get some equations. For now, we’ll also ignore the fact that images have red, green, blue, (and opacity) channels and just pretend all our images are black and white. We’ll have a base image $A \in \mathbb{R}^{n \times m}$ and a target image $t \in \mathbb{R}^{h \times \ell}$ where $h \leq n, \ell \leq m$. Our goal here is to find the $h \times \ell$ patch of $A$ that best matches $t$.

Well, it turns out that the process of finding and computing things about these patches is a very common routine. In fact, the very same process is what powers convolutional neural networks. In order to find which patch is the most penguiny, we need to find all the patches. This can be done with the im2col algorithm. im2col is an efficient way of collecting all the patches of a certain size that can be made from an image and storing all the patches in a new matrix. With a little bit of thought, we can see that there will be $p = (n-h+1) \times (m - \ell + 1)$ patches of size $h \times \ell$. Then, im2col$(A) = W$ where $W \in \mathbb{R}^{p \times (h \ell)}$ and each row of $W$ is a flattened patch.

​ There are some tricks you can play using your computers memory locations in order to make this process quite efficient. Alongside that, im2col allows you some extra parameters which affect the windows you get in the end. I won’t be going fully into this algorithm here and besides, Anirudh Shenoy already explained all the details better than I could. That being said, I did create a gif showcasing the idea behind the algorithm below. (The gif only plays 3 times — I don’t like it when gifs just keep repeating)

Our end result is a matrix $W$ that holds all our patches. I’ve been displaying our base image, target, and patch matrix with colours. But internally, each pixel is represented in terms of their red, green, and blue values with each value $0$ and $255$. This results in each pixel being associated with a $3d$ vector. Pure red is $[255,0,0]$. Pure green is $[0,255,0]$. And pure blue is $[0,0,255]$. Below we see our patches with each pixel represented by RGB values.

Then to compute which patch is most like our target, we just have to compute the distance between each row of $W$ and the flattened out target $t$.

We’ll use the Euclidean distance to calculate this. Remember that for two vectors $u,v$ in $\mathbb{R}^d$ the Euclidean distance between them is $\text{dist}(u,v) = \sqrt{ \sum_{i=1}^d (u_i - v_i)^2}$. Calculating the distance between each row of $W$ and the flat target, we get

$\text{dist}(W,t) = [509, 604, 557, 627, 632, 211, 372, 559, 553, 553, 657, 610]$ rounded to the nearest integer. And voila, now we know that the $6$th patch is closest to the target! In general, we are looking for $i^* = \arg\min \text{d}(W,t)$. However, it is actually easier to $0$ index the patches. So in the example above, $i^* = 5$.

Now that we know the index of the patch which looks the most like our target, we just have to convert this index to a location in the base image. I just figured this formula out with a little bit of guess and check, but this conversion is pretty common so it must exist out there somewhere. The optimal location in the base image is given by

\[i^*_A = \left [ \lfloor i^* / (m - \ell + 1) \rfloor, i^* \mod (m-\ell + 1) \right ]\]

where $\lfloor a \rfloor$ rounds down to the nearest integer—the floor function. In our example above, the optimal location is

\[\begin{split} i^*_A & = \left [ \lfloor 5 / (5 - 3 + 1) \rfloor, 5 \mod (5-3 + 1) \right ] \\ & = \left [ 1 , 2 \right ] \end{split}\]

This location is $0$ indexed from the top left of $A$. To hide our target in the base image we now replace the $h \times \ell$ patch of $A$ beginning at index $i_A^*$ with our target $t$. For the example above, this gives us

Before we are fully ready to find Leif’s optimal hiding spot, we should address a few more details. First, Leif is not perfectly rectangular, so we’ll need to address non-rectangular target images. Secondly, it also seems fair to rotate and flip our target image. Finally, how fast is this method on a real life example such as hiding Leif Penguinson. Let’s address these one-by-one.

Non-Rectangular Targets

In real scenarios, we’ll want to hide non-rectangular target images. However, this situation is not that hard to deal with in our existing framework. Every non-rectangular target fits within a rectangular bounding box and we can use the shape of the bounding box as our patch size. If we keep track of where our true image is within the bounding box then we can just go ahead and ignore the values not within the true image when calcularing the distance between our patches and the target image. For example, say we want to hide the below target in the same base $A$

We simply find the bounding box

and ignore the shaded pixels when calculating the distances between the patches

Recalculating, we get $\text{dist}(W,t) = [458, 537, 356, 409, 574, 157, 319, 374, 468, 506, 454, 439]$ rounded to the nearest integer. Again, $i^* = 5$. And this time when replacing our target in the base image, we just don’t replace the shaded out pixels.

Flips and Rotations

We can find the flips and rotations in the exact same manner as above. However, unlike flips, rotations change the size of the target. Changing the size of the target requires re-computing the $W$ matrix, which can get expensive. I think there is a trick you can play, but we’ll leave that for another day. Flips on the other hand are simple. $W$ remains exactly the same so we can just calculate all the distances between the patches and both the original and flipped target. Everything else remains the same and we just sub in the flipped target instead of the original if that turns out to be a better match. Leif seems not to have been rotated in most of the early hiding places, so I feel justified leaving the rotations for another day.

Complexity

There are $(n - h + 1) \times (m-\ell+1)$ convolutions in total and the matrix storing the windows has $(n - h + 1) \times (m-\ell+1) \times h\cdot\ell$ total entries. Thus, we see a tradeoff. Increasing the size of our patches reduces the total number of patches, yet increases the overall size of $W$. In practice, although smaller patch sizes require more rows in $W$, it is nonetheless faster to form the patch matrix using smaller patch sizes.

Side-by-Side Comparison

Finally! Let’s test out where the optimal Leif Penguinson hiding spot is compared to his hiding spot in the Nature Briefing.

Nature Briefing

Mine:

Nature Briefing

Mine:

Nature Briefing

Mine:

Some Originals!

Now, let’s see what the code can do on some other photos that I’ve gotten from the Wikimedia Commons. I’m not going to be giving the solved solutions here so have fun! I do promise that Leif is reasonably findable in all the below photos. You may want to download and enlarge them.

Crater Lake National Park in Oregon, USA

​ (By WolfmanSF, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=23466412)

Zion Park in Utah, USA

​ (By Michael Gäbler, CC BY 3.0, https://commons.wikimedia.org/w/index.php?curid=18669741)

The White Terraces in Pamukkale, Turkey

​ (By A.Savin - Own work, FAL, https://commons.wikimedia.org/w/index.php?curid=94207479)

Try it Yourself

If you are interested in exploring the code I wrote, you can find it at: https://github.com/nmarzz/Penguin-Placing