Custom Photo Importer with Golang
2018-05-20
After installing Kubuntu on my Dell XPS 15, I intended to use Darktable for photo management and editing. I tried it out and couldn't get used to the photo management aspects, so I looked for some alternatives in the KDE world. I tried several other programs including DigiKam, but I didn't like anything I tried for one reason or another. Some don't play well with the 4K screen on the Dell. Some don't match my workflow. Some are missing features.
I did like Gwenview for basic photo management. Oddly enough, it doesn't have a photo import tool, so I decided to piece together a small Bash script to handle importing.
Quick Bash Script
Since my days using Lightroom, I've organized my imported photos by date in top-level directories according to the year and subdirectories according to the full date. My photo collection looks something like this:
~/photos/originals
    /2017
        /2017-01-02
            photo1.jpg
            photo2.jpg
            photo3.jpg
        /2017-10-31
            photo4.jpg
    /2018
        /2018-04-01
            photo5.jpg
            photo6.jpg
I wanted my script to keep the same structure. To get the date from the image files, I'm using ImageMagick's identify command and piping the output to grep and awk, and then again using awk to pull out just the year, month, and day for constructing the subdirectories.
#!/bin/bash
# Usage: pass the directory to import from as the first argument.
import_dir=$1
base_destination=~/photos/originals
counter=1
for file in "$import_dir"/*.*
do
    # The matching line holds a date and a time; awk keeps the date field (year:month:day).
    exif_date_time=$(identify -verbose "$file" | grep -i "exif:DateTime:" | awk '{print $2}')
    year=$(echo "$exif_date_time" | awk -F: '{print $1}')
    month=$(echo "$exif_date_time" | awk -F: '{print $2}')
    day=$(echo "$exif_date_time" | awk -F: '{print $3}')
    subdirectory="$year-$month-$day"
    # Quoting keeps paths with spaces in one piece.
    old_file_name=$(basename "$file")
    new_file_name="$year-$month-$day-$old_file_name"
    destination="$base_destination/$year/$subdirectory/$new_file_name"
    echo "Copying $file to $destination ($counter)"
    mkdir -p "$base_destination/$year/$subdirectory"
    # -i asks before overwriting an existing file.
    cp -i "$file" "$destination"
    counter=$((counter + 1))
done
I used the Bash script for a while, but it was painfully slow - around a second per image file.
Rewrite with Golang
Since I finished my book notes utility, I've been looking for another small project to build with Go, and this was a good opportunity. I decided I'd use this project to try out Go's unit testing and dependency management tools. Plus, it should be much faster than the Bash script.
I found a Go package for reading image EXIF data and decided to try it out. The package works great:
file, err := os.Open(sourceFilePath)
// some error handling
exifData, err := exif.Decode(file)
// some error handling
dateTime, err := exifData.DateTime()
// some error handling
Date formatting in Go is a bit odd coming from other languages as it doesn't use the more common pattern strings approach where specific letters in the string represent parts of the date (e.g. "yyyy-MM-dd"). From the Go time package docs:
func (t Time) Format(layout string) string
Format returns a textual representation of the time value formatted according
to layout, which defines the format by showing how the reference time, defined
to be
Mon Jan 2 15:04:05 -0700 MST 2006
would be displayed if it were the value; it serves as an example of the desired
output. The same display rules will then be applied to the time value.
It took me a while to understand what this meant, but I eventually caught on, and it's a clever and convenient technique. This comment in one of the examples helped to clarify a bit more:
We stress that one must show how the reference time is formatted, not a time of
the user's choosing. Thus each layout string is a representation of the time
stamp
To format a date, just give an example of how the reference date should look. If you want to output just the full month, then use the format string "January" since that's the full month from the reference date. For the full year, use the format string "2006". For the last two digits in the year, use "06". Now I wish every other language and date library would add this technique as an option.
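Here is a minimal sketch of the layout idea applied to this project's directory scheme. The destinationDir function and the exifLayout constant are names made up for illustration, not taken from the importer, and the sketch assumes the EXIF timestamp arrives as a "YYYY:MM:DD HH:MM:SS" string. The same reference-time layouts work for parsing as well as formatting:

```go
package main

import (
	"fmt"
	"path/filepath"
	"time"
)

// exifLayout describes an EXIF-style "YYYY:MM:DD HH:MM:SS" timestamp by
// writing the reference time in that shape (assumed input format).
const exifLayout = "2006:01:02 15:04:05"

// destinationDir returns "<base>/<year>/<year-month-day>" for a photo taken
// at the given timestamp. Name and signature are illustrative only.
func destinationDir(base, exifDateTime string) (string, error) {
	taken, err := time.Parse(exifLayout, exifDateTime)
	if err != nil {
		return "", err
	}
	return filepath.Join(base, taken.Format("2006"), taken.Format("2006-01-02")), nil
}

func main() {
	taken := time.Date(2017, time.October, 31, 18, 30, 0, 0, time.UTC)
	fmt.Println(taken.Format("January"))    // October
	fmt.Println(taken.Format("2006"))       // 2017
	fmt.Println(taken.Format("06"))         // 17
	fmt.Println(taken.Format("2006-01-02")) // 2017-10-31

	dir, err := destinationDir("photos/originals", "2017:10:31 18:30:00")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(dir) // photos/originals/2017/2017-10-31 on Linux
}
```

Each layout string is just the reference date written the way the output should look, so "2006-01-02" reads as year-month-day without a lookup table of pattern letters.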
The Bash script called cp with the -i option, which asks before overwriting an existing file, so file name collisions were handled interactively. The Go version needed its own collision handling. This function handles file name collisions by renaming the destination file automatically:
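// getPathWithoutCollision returns path if nothing exists there yet. Otherwise it
// appends _1, _2, ... to the original file name (before the extension) until it
// finds a free path. The collisions map remembers the last suffix tried for
// each original path.
func getPathWithoutCollision(path string, originalPath string, collisions map[string]int) string {
	if _, err := os.Stat(path); err == nil {
		// On the first call originalPath is empty, so the colliding path becomes the original.
		if originalPath == "" {
			originalPath = path
		}
		next := collisions[originalPath] + 1
		collisions[originalPath] = next
		ext := filepath.Ext(originalPath)
		withoutExt := strings.TrimSuffix(originalPath, ext)
		newPath := withoutExt + "_" + strconv.Itoa(next) + ext
		// The renamed path may also be taken, so check it the same way.
		safePath := getPathWithoutCollision(newPath, originalPath, collisions)
		return safePath
	}
	return path
}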
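To see the renaming scheme in action without the rest of the importer, here is a small self-contained sketch. It uses a simplified, loop-based variant with the hypothetical name nextFreePath rather than the recursive function above, and it drops the collisions map, so treat it as an illustration of the naming pattern only:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// nextFreePath is a hypothetical iterative variant of the function above.
// It appends _1, _2, ... before the extension until no file exists at the path.
func nextFreePath(path string) string {
	ext := filepath.Ext(path)
	withoutExt := strings.TrimSuffix(path, ext)
	candidate := path
	for n := 1; ; n++ {
		if _, err := os.Stat(candidate); err != nil {
			// Stat failed, so treat the path as free (the original does the same).
			return candidate
		}
		candidate = withoutExt + "_" + strconv.Itoa(n) + ext
	}
}

// demo creates photo.jpg and photo_1.jpg in a temporary directory and returns
// the base name that nextFreePath picks for another photo.jpg.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "importer")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	for _, name := range []string{"photo.jpg", "photo_1.jpg"} {
		if err := os.WriteFile(filepath.Join(dir, name), nil, 0o644); err != nil {
			return "", err
		}
	}
	return filepath.Base(nextFreePath(filepath.Join(dir, "photo.jpg"))), nil
}

func main() {
	name, err := demo()
	if err != nil {
		fmt.Println("demo failed:", err)
		return
	}
	fmt.Println(name) // photo_2.jpg
}
```

The trade-off is that this variant starts counting from _1 on every call, while the map in the original remembers the last suffix used for each path within a run.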
Using Go's testing package for unit tests was straightforward. One nice change from the usual unit testing pattern in other languages is that you don't need to make assertions. Instead, just call the Error function if there's a test failure. This simplifies the testing package quite a bit since there's no need for the various types of assertion functions: you use the language itself for comparisons and only call Error when needed.
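As a rough sketch of that pattern, the test below checks a stand-in function with a plain comparison and reports a failure through Errorf, the formatted sibling of Error. The prefixedName function is hypothetical and only mirrors the date-prefix naming from the Bash script. In a real project the test would live in a file ending in _test.go so that go test picks it up; it shares a listing with main here only to keep the example in one piece:

```go
package main

import (
	"fmt"
	"testing"
)

// prefixedName prepends the date to the original file name. It is a stand-in
// for illustration, not code from the importer.
func prefixedName(date, fileName string) string {
	return date + "-" + fileName
}

// TestPrefixedName uses no assertion helpers: an ordinary comparison decides
// whether to report a failure through t.Errorf.
func TestPrefixedName(t *testing.T) {
	cases := []struct{ date, fileName, want string }{
		{"2017-10-31", "photo4.jpg", "2017-10-31-photo4.jpg"},
		{"2018-04-01", "photo5.jpg", "2018-04-01-photo5.jpg"},
	}
	for _, c := range cases {
		if got := prefixedName(c.date, c.fileName); got != c.want {
			t.Errorf("prefixedName(%q, %q) = %q, want %q", c.date, c.fileName, got, c.want)
		}
	}
}

// main is only here so the listing also runs outside of go test.
func main() {
	fmt.Println(prefixedName("2017-10-31", "photo4.jpg")) // 2017-10-31-photo4.jpg
}
```

Because Errorf records the failure and lets the test continue, one run reports every failing case in the table rather than stopping at the first.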
This was my first time using Go's dep dependency management tool, and again it was fairly straightforward coming from other dependency management tools such as NPM and Maven. It'll be interesting to see how the recent vgo proposal affects dep going forward.
You can find the full source code for the Go photo importer on GitHub.
Bash vs Go Performance
I expected the Go version to be faster than the Bash version, but not this much faster. Using a test import consisting of 467 MB spread across 49 JPEG files, the Bash script took 53 seconds and used 163 MB of memory, and the Go version took half a second and used 7 MB of memory. I'm guessing that the ImageMagick identify command is loading the entire image into memory and the goexif package isn't, but I haven't verified that yet.