Here's some sample output; the original is on the left, the resized version is on the right:
This article is about optimising the code, so I won't go into how seam carving works. If you want to read about it, the Wikipedia page does a good enough job.
It does the job, but it's slow and memory hungry. My bot runs on a VPS with a gig of memory, and eating up half a gig to process one image causes a lot of thrashing. I attempted a hack to help the memory usage, but it didn't do much.
Here are some benchmarks to show how slow it is; I reduced the width of two images by half:
Almost three minutes for castle.jpg. I know my Intel i5 4670K is getting on a bit, but surely we can do better.
The first pass was a straightforward implementation of the algorithm. I used typed arrays everywhere and created them with the correct size before the main loop to avoid allocating memory. This worked quite well; here are the benchmarks:
Memory usage was also low enough to be not noticeable. This was pretty good, but I still thought 15 seconds was too long for castle.jpg.
Despite trying to avoid allocation, one instance slipped through the net. I fixed that by re-using an array instead of allocating a new one, which offered a small performance benefit:
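The pattern looks something like this. This is a minimal sketch, not the library's actual code: the function and buffer names (`computeEnergy`, `scratch`) are made up, and the energy calculation is a stand-in.

```javascript
// Hypothetical sketch of the reuse pattern: the caller passes in a scratch
// buffer that gets filled in place, instead of the function allocating a
// fresh typed array on every call.
function computeEnergy(pixels, out) {
  // Previously something like: const out = new Float32Array(pixels.length);
  // on every call. Now `out` is allocated once by the caller and reused.
  for (let i = 0; i < pixels.length; i++) {
    out[i] = pixels[i] * pixels[i]; // stand-in for the real energy function
  }
  return out;
}

const pixels = new Uint32Array([1, 2, 3, 4]);
const scratch = new Float32Array(pixels.length); // allocated once, up front

// Every iteration of the main loop reuses the same backing memory.
computeEnergy(pixels, scratch);
```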
I then split the main loop into a set of small functions. This was mainly to help with debugging and profiling, but it did give a tiny performance increase:
I think this might be TurboFan optimising the small functions sooner than it would optimise the big main function. Or perhaps it can be more selective with what it optimises, rather than doing the whole thing.
Once the algorithm has decided which seam to get rid of, those pixels need to be removed, one per row in the image. I do this in place in the array of pixels, and every time one is removed the rest of the array needs to shuffle along to close the gap. I originally did this using a loop that copies each pixel to its new location, skipping removed ones.
After running out of ideas and staring at the TypedArray docs on MDN, I found `Array.copyWithin`. It's like `memmove()` and lets you shift data around in an array. Using that, I avoid looping over every pixel and instead only loop over every row, copying the left half of the row, skipping the removed pixel, then copying the right half.
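In sketch form, the per-row copy looks something like this. It's an illustration rather than the library's actual code: it assumes one packed value per pixel in a `Uint32Array`, and a `seam` array giving the column to delete in each row.

```javascript
// Remove one pixel per row in place, compacting the image from `width`
// columns to `width - 1`. copyWithin behaves like memmove, so overlapping
// source and destination ranges within the same array are safe.
function removeSeam(pixels, width, height, seam) {
  const newWidth = width - 1;
  for (let row = 0; row < height; row++) {
    const src = row * width;    // where this row starts now
    const dst = row * newWidth; // where it starts after compaction
    const cut = seam[row];      // column of the pixel to drop
    // Left half: everything before the seam pixel shifts back.
    pixels.copyWithin(dst, src, src + cut);
    // Right half: everything after the seam pixel shifts back one further.
    pixels.copyWithin(dst + cut, src + cut + 1, src + width);
  }
  return newWidth;
}
```

Two `copyWithin` calls per row replace a per-pixel copy loop, which hands the bulk copying to the engine's native implementation.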
Finally under a second for something. An arbitrary goal, but being able to say "...in under a second" feels good. The version of dog.jpg from the commit messages was larger, which is why the time here is different.
The Chrome devtools line-level profiling showed that some lines using `Math.max` used up a lot of time. These lines are in the middle of the hottest loop in the library, and my assumption was that they cause lots of branches (and branch misses). I know that the min and max of two numbers can be compiled to one instruction with no branches, but I don't know if v8 can do that.
I decided to make the branchless versions myself instead.
This is how they work, given `x` and `y`:

1. Calculate `x - y`.
2. Arithmetic shift right by (size of the number - 1) bits, 31 in this case. This produces a mask of all zeroes if `x >= y`, or all ones if `x < y`.
3. AND the mask with `x - y`: either `(x - y) & 0`, i.e. `0`, or `(x - y) & (2^32 - 1)`, i.e. `x - y`.
4. For min, add that to `y`: either `y + 0`, i.e. `y`, or `y + (x - y)`, i.e. `x`.
5. For max, subtract it from `x`: either `x - 0`, i.e. `x`, or `x - (x - y)`, i.e. `y`.
These only work on 32-bit integers. I'm using `Uint32Array` everywhere, so I don't have to worry about other types sneaking in.
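As a sketch, the pair looks like this in JavaScript. The function names are mine, and it assumes the difference `x - y` fits in a signed 32-bit integer, since JS's bitwise operators coerce their operands to 32 bits.

```javascript
// Branchless min/max following the steps above. (x - y) >> 31 is a
// sign-propagating shift, smearing the sign bit of the difference into a
// mask: 0 when x >= y, -1 (all ones) when x < y.
function min32(x, y) {
  const mask = (x - y) >> 31;
  return y + ((x - y) & mask);
}

function max32(x, y) {
  const mask = (x - y) >> 31;
  return x - ((x - y) & mask);
}
```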
Here's the benchmark afterwards:
It worked quite well.
That's where it stands right now. I've made this look like a smooth process, but rest assured most of my ideas made it slower rather than faster, and had to be abandoned.
Here's a comparison between the original library that started this and mine:
| Image | Original | Mine | Speedup |
| --- | --- | --- | --- |
| dog.jpg | 38732 ms | 618 ms | 62.67x |
| castle.jpg | 171990 ms | 3816 ms | 45.07x |