I recently switched DSLR camera systems from Canon to Nikon for reasons of marital harmony. That meant choosing which Nikon lenses would replace the four Canon lenses I owned. To make a good decision I needed to know my historical usage, so I wrote some Python to analyze image metadata from 10 years of digital photography.

In [1]:

import pandas as pd
import numpy as np
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
matplotlib.style.use('ggplot')

from PIL import Image

Next I import the metadata parsing script I wrote, which is available at the URL in the comment below.

In [2]:

import exif_data # https://github.com/frankcleary/img-exif/blob/master/exif_data.py

In [3]:

df = exif_data.make_df('/Users/frank/Pictures/Photos Library.photoslibrary/Masters',
                       columns=['DateTimeOriginal', 'FocalLength', 'Make', 'Model'])
df.head()

Out[3]:

	DateTimeOriginal	FocalLength	Make	Model
0	2015:03:20 15:26:48	(28, 1)	Canon	Canon EOS DIGITAL REBEL XTi
1	2015:03:20 15:39:40	(28, 1)	Canon	Canon EOS DIGITAL REBEL XTi
2	2015:03:20 15:39:55	(28, 1)	Canon	Canon EOS DIGITAL REBEL XTi
3	2015:03:20 15:40:26	(28, 1)	Canon	Canon EOS DIGITAL REBEL XTi
4	2015:03:20 15:40:42	(28, 1)	Canon	Canon EOS DIGITAL REBEL XTi

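The operation to get the data is slow (lots of disk I/O), so I made a copy to avoid accidentally modifying the original during interactive analysis. The copy drops rows with missing values and converts each focal length, which the metadata stores as a (numerator, denominator) pair, into a single number.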

In [4]:

exif_df = df.dropna().copy()
exif_df['RealFocalLength'] = exif_df['FocalLength'].apply(
    lambda x: x[0] / float(x[1]) if x is not None else None
)
exif_df.head()

Out[4]:

	DateTimeOriginal	FocalLength	Make	Model	RealFocalLength
0	2015:03:20 15:26:48	(28, 1)	Canon	Canon EOS DIGITAL REBEL XTi	28
1	2015:03:20 15:39:40	(28, 1)	Canon	Canon EOS DIGITAL REBEL XTi	28
2	2015:03:20 15:39:55	(28, 1)	Canon	Canon EOS DIGITAL REBEL XTi	28
3	2015:03:20 15:40:26	(28, 1)	Canon	Canon EOS DIGITAL REBEL XTi	28
4	2015:03:20 15:40:42	(28, 1)	Canon	Canon EOS DIGITAL REBEL XTi	28
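
As a rough illustration of the focal length conversion above, here is a minimal standalone sketch. The helper name `rational_to_mm` and the sample values are my own assumptions for illustration and are not part of `exif_data`; the zero-denominator guard is an extra precaution that the notebook cell does not include.

```python
from fractions import Fraction

def rational_to_mm(value):
    """Convert a (numerator, denominator) pair to a float.

    Returns None when the value is missing or the denominator is zero.
    Hypothetical helper, not part of the exif_data script.
    """
    if value is None:
        return None
    numerator, denominator = value
    if denominator == 0:
        return None
    return float(Fraction(numerator, denominator))

# Made-up sample values: two valid pairs, a missing value, a bad denominator.
samples = [(28, 1), (175, 10), None, (50, 0)]
converted = [rational_to_mm(v) for v in samples]
print(converted)
```

The notebook cell divides by `float(x[1])` for a similar reason: under Python 2, dividing two integers would silently truncate a value such as 175/10 to 17.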

First I investigated statistics broken down by camera, plotting the number of photos and the cumulative distribution function by focal length. On my 8-year-old Canon XTi, over 80% of photos were taken at focal lengths shorter than 55 mm, which indicated that Nikon's extremely light 18-55mm lens would be a better choice than the heavier 18-140mm.
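
The number read off the cumulative plot is simply the share of photos at or below a given focal length. A minimal sketch of that calculation follows, using plain Python and made-up focal lengths (not my real data); the function name `fraction_at_or_below` is hypothetical.

```python
def fraction_at_or_below(focal_lengths, threshold_mm):
    """Return the share of photos taken at or below threshold_mm."""
    if not focal_lengths:
        return 0.0
    hits = sum(1 for f in focal_lengths if f <= threshold_mm)
    return hits / float(len(focal_lengths))

# Made-up focal lengths in mm, for illustration only.
made_up = [18, 18, 24, 28, 35, 50, 55, 70, 135, 200]
share = fraction_at_or_below(made_up, 55)
print(share)
```

A value near 1.0 at the long end of a candidate lens suggests that lens would have covered almost everything shot with that camera.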

In [5]:

def format_plot(x_range, xticks, yticks, ylabel, xlabel, title):
    """Set names, labels and axis limits on the current matplotlib plot.
    """
    axis_label_props_dict = {'fontsize': 16, 'fontweight': 'bold'}
    title_props_dict = {'fontsize': 20, 'fontweight': 'bold'}
    plt.xlim(x_range)
    plt.gca().set_xticks(xticks);
    if yticks is not None:
        plt.gca().set_yticks(yticks);
    plt.ylabel(ylabel, **axis_label_props_dict)
    plt.xlabel(xlabel, **axis_label_props_dict)
    plt.title(title, **title_props_dict)
    
def plot_for_camera(model):
    """Generate a histogram and CDF by focal length for the provided model
    of camera.
    """
    plt.figure()
    model_df = exif_df.query('Model == "{}"'.format(model))
    # range and density=True replace Python 2's xrange and the older normed flag
    model_df['RealFocalLength'].hist(bins=range(10, 202, 2), figsize=(16, 5))
    format_plot([10, 202], range(10, 202, 5), None,
                '# of photos', 'Focal length', model)
    plt.figure()
    model_df['RealFocalLength'].hist(bins=range(10, 202, 2),
                                figsize=(16, 5),
                                cumulative=True,
                                density=True)
    format_plot([10, 202], range(10, 202, 5), np.arange(0, 1.01, .05),
                'Fraction of photos', 'Focal length', model)
    plt.ylim([0, 1])
    
camera_models = ['Canon EOS DIGITAL REBEL XTi',
                 'NIKON D5300',
                 'NIKON D5500',
                 'Canon PowerShot S90']
for model in camera_models:
    plot_for_camera(model)

With all this data available, I was also curious how my camera usage has evolved over time. Important life events and vacations really stand out. The code below creates a table containing the count of images from each camera by capture timestamp.

In [6]:

images_by_date = pd.crosstab(index=exif_df['DateTimeOriginal'], 
                             columns=exif_df['Model'])
images_by_date.index = pd.to_datetime(images_by_date.index,
                                      format='%Y:%m:%d %H:%M:%S',
                                      errors='coerce')
images_by_date = images_by_date[pd.notnull(images_by_date.index)]
images_by_date.head()

Out[6]:

Model	C990Z,D490Z	...	KODAK DX4330 DIGITAL CAMERA
2002-12-31 21:54:57	0	...	1
2003-03-29 13:28:50	1	...	0
2003-03-29 13:29:01	1	...	0
2003-03-29 13:31:36	1	...	0
2003-03-29 13:31:39	1	...	0

5 rows × 28 columns
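
The timestamps in the metadata use colons in the date part, which is why the cell above passes an explicit format and then discards anything that fails to parse. The same idea can be sketched with only the standard library; the function name `parse_exif_timestamp` and the sample strings are assumptions for illustration.

```python
from datetime import datetime

EXIF_FORMAT = '%Y:%m:%d %H:%M:%S'

def parse_exif_timestamp(text):
    """Parse a 'YYYY:MM:DD HH:MM:SS' string, returning None on failure."""
    try:
        return datetime.strptime(text, EXIF_FORMAT)
    except (TypeError, ValueError):
        return None

# Made-up inputs: one valid timestamp and two that cannot be parsed.
raw = ['2015:03:20 15:26:48', '0000:00:00 00:00:00', 'not a date']
parsed = [parse_exif_timestamp(t) for t in raw]
valid = [p for p in parsed if p is not None]
print(valid)
```

Returning None and filtering afterwards mirrors the coerce-then-drop pattern in the pandas cell, where unparseable timestamps become null and are removed.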

I wrote this helper function to annotate the plot below.

In [7]:

def annotate(text, xy, xytext):
    """Annotate the current matplotlib axis"""
    plt.gca().annotate(text, xy=xy, xycoords='axes fraction',
                       xytext=xytext, textcoords='axes fraction',
                       size=22, ha='center', zorder=20,
                       bbox=dict(boxstyle='round', fc='white'),
                       arrowprops=dict(color='k', width=2));

To graph the images, I filtered out cameras with 150 or fewer total images and resampled the data at two-month frequency.

In [8]:

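# .loc and .resample(...).sum() replace the removed .ix indexer and how= argument
images_by_date = images_by_date.loc[:, images_by_date.sum() > 150]
# On recent pandas the month-end alias is 'ME', so this may need to be '2ME'
images_by_date = images_by_date.resample('2M').sum()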
images_by_date['2005':].plot(kind='bar', figsize=(16, 10), stacked=True)
plt.gca().set_xticklabels([tick.get_text()[:7] 
                           for tick in plt.gca().get_xticklabels()])
plt.ylabel('# of photos', fontsize=16)
plt.xlabel('Date of bin end (YYYY-MM)', fontsize=16)
plt.title('Number of photos by date and camera', fontsize=20)
annotate('Wedding', (.7, .8), (.6, .9))
annotate('Europe', (.243, .32), (.243, .45))
annotate('Drive to California', (.32, .43), (.32, .6))
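
To make the filter and resample steps above more concrete, here is a simplified sketch in plain Python. The records, the tiny threshold, and the helper `two_month_bucket` are all made up for illustration. Note one deliberate simplification: it labels each bucket by its first month on fixed calendar pairs (Jan-Feb, Mar-Apr, ...), whereas the pandas plot labels bins by their end date, so the bin edges will not match exactly.

```python
from collections import Counter
from datetime import datetime

def two_month_bucket(ts):
    """Label a timestamp with the first month of its two-month bucket."""
    first_month = ts.month - (ts.month - 1) % 2
    return '{:04d}-{:02d}'.format(ts.year, first_month)

# Made-up (camera model, capture time) records.
records = [
    ('Camera A', datetime(2015, 1, 5)),
    ('Camera A', datetime(2015, 2, 20)),
    ('Camera A', datetime(2015, 3, 1)),
    ('Camera B', datetime(2015, 3, 2)),
]
MIN_IMAGES = 2  # stand-in for the 150 used in the post

# Keep only cameras with more than MIN_IMAGES photos in total.
totals = Counter(model for model, _ in records)
kept = {model for model, n in totals.items() if n > MIN_IMAGES}

# Count photos per (bucket, camera) for the cameras that were kept.
counts = Counter((two_month_bucket(ts), model)
                 for model, ts in records if model in kept)
print(sorted(counts.items()))
```

Dropping rarely used cameras first keeps the stacked bar chart legend readable, since each remaining camera gets its own color.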

Data Science Bytes

Analyzing 10 years of digital photography with python and pandas
