Bryan Grohman

Custom Photo Importer with Golang

2018-05-20

Gwenview photo management application

After installing Kubuntu on my Dell XPS 15, I intended to use Darktable for photo management and editing. I tried it out and couldn't get used to the photo management aspects, so I looked for some alternatives in the KDE world. I tried several other programs including DigiKam, but I didn't like anything I tried for one reason or another. Some don't play well with the 4K screen on the Dell. Some don't match my workflow. Some are missing features.

I did like Gwenview for basic photo management. Oddly enough, it doesn't have a photo import tool, so I decided to piece together a small Bash script to handle importing.

Quick Bash Script

Since my days using Lightroom, I've organized my imported photos by date in top-level directories according to the year and subdirectories according to the full date. My photo collection looks something like this:

~/photos/originals
    /2017
        /2017-01-02
            photo1.jpg
            photo2.jpg
            photo3.jpg
        /2017-10-31
            photo4.jpg
    /2018
        /2018-04-01
            photo5.jpg
            photo6.jpg

I wanted my script to keep the same structure. To get the date from each image file, the script runs ImageMagick's identify command, pipes the output through grep and awk to isolate the EXIF date, and then uses awk again to pull out the year, month, and day for constructing the subdirectories.

#!/bin/bash
# First argument: the directory to import from
import_dir=$1
base_destination=~/photos/originals
counter=1

for file in "$import_dir"/*.*
do
    # Extract the date part (YYYY:MM:DD) of the EXIF DateTime field
    exif_date_time=$(identify -verbose "$file" | grep -i "exif:DateTime:" | awk '{print $2}')
    year=$(echo "$exif_date_time" | awk -F: '{print $1}')
    month=$(echo "$exif_date_time" | awk -F: '{print $2}')
    day=$(echo "$exif_date_time" | awk -F: '{print $3}')
    subdirectory="$year-$month-$day"
    old_file_name=$(basename "$file")
    new_file_name="$year-$month-$day-$old_file_name"
    destination="$base_destination/$year/$subdirectory/$new_file_name"
    echo "Copying $file to $destination ($counter)"
    mkdir -p "$base_destination/$year/$subdirectory"
    # -i prompts before overwriting an existing file
    cp -ir "$file" "$destination"
    counter=$((counter + 1))
done

I used the Bash script for a while, but it was painfully slow - around a second per image file.

Rewrite with Golang

Since I finished my book notes utility, I've been looking for another small project to build with Go, and this was a good opportunity. I decided I'd use this project to try out Go's unit testing and dependency management tools. Plus, it should be much faster than the Bash script.

I found a Go package for reading image EXIF data and decided to try it out. The package works great:

file, err := os.Open(sourceFilePath)
// some error handling
exifData, err := exif.Decode(file)
// some error handling
dateTime, err := exifData.DateTime()
// some error handling

Date formatting in Go is a bit odd if you're coming from other languages: it doesn't use the more common pattern-string approach, where specific letters in the string represent parts of the date (e.g. "yyyy-MM-dd"). From the Go time package docs:

func (t Time) Format(layout string) string

Format returns a textual representation of the time value formatted according
to layout, which defines the format by showing how the reference time, defined
to be

Mon Jan 2 15:04:05 -0700 MST 2006

would be displayed if it were the value; it serves as an example of the desired
output. The same display rules will then be applied to the time value.

It took me a while to understand what this meant, but I eventually caught on, and it's a clever and convenient technique. This comment in one of the examples helped to clarify a bit more:

We stress that one must show how the reference time is formatted, not a time of
the user's choosing. Thus each layout string is a representation of the time
stamp

To format a date, just give an example of how the reference date should look. If you want to output just the full month, then use the format string "January" since that's the full month from the reference date. For the full year, use the format string "2006". For the last two digits in the year, use "06". Now I wish every other language and date library would add this technique as an option.

The Bash script ran cp with the -i flag, so it prompted before overwriting whenever file names collided. The Go version needed its own collision handling. This function resolves collisions by renaming the destination file automatically:

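// getPathWithoutCollision returns path if no file exists there. Otherwise it
// appends "_1", "_2", ... to the original file name until it finds a free path.
// An empty originalPath means path itself is the original; collisions remembers
// the last suffix tried for each original path.
func getPathWithoutCollision(path string, originalPath string, collisions map[string]int) string {
    if _, err := os.Stat(path); err == nil {
        // Something already exists at path, so try the next numbered name.
        if originalPath == "" {
            originalPath = path
        }
        next := collisions[originalPath] + 1
        collisions[originalPath] = next
        ext := filepath.Ext(originalPath)
        withoutExt := strings.TrimSuffix(originalPath, ext)
        newPath := withoutExt + "_" + strconv.Itoa(next) + ext
        return getPathWithoutCollision(newPath, originalPath, collisions)
    }

    return path
}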

Using Go's testing package for unit tests was straightforward. One nice change from the usual unit testing pattern in other languages is that there are no assertion functions. Instead, you compare values with the language itself and call the Error method on the test's *testing.T only when a check fails. This keeps the testing package small, since it doesn't need a family of assertion helpers.

This was my first time using Go's dep dependency management tool, and again it was fairly straightforward coming from other dependency management tools such as NPM and Maven. It'll be interesting to see how the recent vgo proposal affects dep going forward.

You can find the full source code for the Go photo importer on GitHub.

Bash vs Go Performance

I expected the Go version to be faster than the Bash version, but not this much faster. Using a test import consisting of 467M spread across 49 jpg files, the Bash script took 53 seconds and used 163M of memory, and the Go version took half a second and used 7M of memory. I'm guessing that the ImageMagick identify command is loading the entire image into memory and the goexif package isn't, but I haven't verified that yet.