I recently switched DSLR camera systems from Canon to Nikon for reasons of marital harmony. That meant choosing which Nikon lenses would replace the four Canon lenses I owned. To make an informed decision I needed to know my historical usage, so I wrote some Python to analyze image metadata from 10 years of digital photography.
import pandas as pd
import numpy as np
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
matplotlib.style.use('ggplot')
from PIL import Image
Import the metadata parsing script I wrote; its source is linked in the comment on the import line.
import exif_data # https://github.com/frankcleary/img-exif/blob/master/exif_data.py
df = exif_data.make_df('/Users/frank/Pictures/Photos Library.photoslibrary/Masters',
                       columns=['DateTimeOriginal', 'FocalLength', 'Make', 'Model'])
df.head()
The operation to get the data is slow (lots of disk I/O), so I made a copy to avoid accidentally modifying the original during interactive analysis.
exif_df = df.dropna().copy()
exif_df['RealFocalLength'] = exif_df['FocalLength'].apply(
    lambda x: x[0] / float(x[1]) if x is not None else None
)
exif_df.head()
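EXIF stores focal length as a rational number, which is why the lambda above divides a numerator by a denominator. Here is a minimal sketch of that conversion as a standalone function, assuming each value arrives as a `(numerator, denominator)` tuple as in the code above. The name `rational_to_float` and the zero-denominator handling are illustrative assumptions and not part of `exif_data`.

```python
def rational_to_float(value):
    """Convert an EXIF-style (numerator, denominator) pair to a float.

    Returns None for missing values or a zero denominator. Treating a zero
    denominator as missing is an assumption made for this sketch.
    """
    if value is None:
        return None
    numerator, denominator = value
    if denominator == 0:
        return None
    return numerator / float(denominator)


print(rational_to_float((55, 1)))    # 55.0
print(rational_to_float((185, 10)))  # 18.5
print(rational_to_float(None))       # None
```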
First I investigated statistics broken down by camera, plotting the number of photos and the cumulative distribution function (CDF) by focal length. On my 8-year-old Canon XTi, over 80% of photos were taken at focal lengths shorter than 55 mm, which indicated that Nikon's extremely light 18-55 mm lens would be a better choice than the heavier 18-140 mm.
def format_plot(x_range, xticks, yticks, ylabel, xlabel, title):
    """Set names, labels and axis limits on the current matplotlib plot.
    """
    axis_label_props_dict = {'fontsize': 16, 'fontweight': 'bold'}
    title_props_dict = {'fontsize': 20, 'fontweight': 'bold'}
    plt.xlim(x_range)
    plt.gca().set_xticks(xticks)
    if yticks is not None:
        plt.gca().set_yticks(yticks)
    plt.ylabel(ylabel, **axis_label_props_dict)
    plt.xlabel(xlabel, **axis_label_props_dict)
    plt.title(title, **title_props_dict)
def plot_for_camera(model):
    """Generate a histogram and CDF by focal length for the provided model
    of camera.
    """
    plt.figure()
    model_df = exif_df.query('Model == "{}"'.format(model))
    # Histogram of photo counts in 2 mm bins
    model_df['RealFocalLength'].hist(bins=range(10, 202, 2), figsize=(16, 5))
    format_plot([10, 202], range(10, 202, 5), None,
                '# of photos', 'Focal length', model)
    plt.figure()
    # Normalized cumulative histogram, i.e. the empirical CDF
    model_df['RealFocalLength'].hist(bins=range(10, 202, 2),
                                     figsize=(16, 5),
                                     cumulative=True,
                                     density=True)
    format_plot([10, 202], range(10, 202, 5), np.arange(0, 1.01, .05),
                'Fraction of photos', 'Focal length', model)
    plt.ylim([0, 1])
camera_models = ['Canon EOS DIGITAL REBEL XTi',
                 'NIKON D5300',
                 'NIKON D5500',
                 'Canon PowerShot S90']
for model in camera_models:
    plot_for_camera(model)
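The value read off the cumulative plot at a given focal length can also be computed directly. Here is a minimal sketch that uses made-up focal lengths rather than my real data; `fraction_at_or_below` is a hypothetical helper name, and it counts values up to and including the threshold, since a lens's longest focal length is itself usable.

```python
def fraction_at_or_below(focal_lengths, threshold_mm):
    """Return the fraction of values <= threshold_mm (the empirical CDF)."""
    values = list(focal_lengths)
    if not values:
        raise ValueError("need at least one focal length")
    return sum(1 for f in values if f <= threshold_mm) / float(len(values))


# Hypothetical sample, not taken from the photo library
sample = [18, 18, 24, 35, 50, 55, 55, 70, 135, 200]
print(fraction_at_or_below(sample, 55))  # 0.7
```

On a pandas Series of focal lengths `s`, the same number is `(s <= 55).mean()`.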
With all this data available, I was also curious how my camera usage has evolved over time. Important life events and vacations really stand out. The code below creates a table containing the count of images from each camera, indexed by capture time.
images_by_date = pd.crosstab(index=exif_df['DateTimeOriginal'],
                             columns=exif_df['Model'])
# Timestamps that fail to parse become NaT and are filtered out below
images_by_date.index = pd.to_datetime(images_by_date.index,
                                      format='%Y:%m:%d %H:%M:%S',
                                      errors='coerce')
images_by_date = images_by_date[pd.notnull(images_by_date.index)]
images_by_date.head()
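EXIF timestamps use colons in the date part (`YYYY:MM:DD HH:MM:SS`), which is why the format string is passed explicitly above. Here is a small sketch of parsing that format and counting photos per calendar day and camera using only the standard library. The records are invented for illustration, and `count_by_day` is a hypothetical name.

```python
from collections import Counter
from datetime import datetime

EXIF_FORMAT = '%Y:%m:%d %H:%M:%S'


def count_by_day(records):
    """Count (date, model) pairs, skipping timestamps that fail to parse."""
    counts = Counter()
    for timestamp, model in records:
        try:
            day = datetime.strptime(timestamp, EXIF_FORMAT).date()
        except (TypeError, ValueError):
            continue  # comparable to dropping rows whose date is invalid
        counts[(day, model)] += 1
    return counts


# Invented records, not from the photo library
records = [
    ('2015:06:01 10:15:00', 'NIKON D5500'),
    ('2015:06:01 18:40:12', 'NIKON D5500'),
    ('2015:06:02 09:00:00', 'Canon PowerShot S90'),
    ('0000:00:00 00:00:00', 'Canon PowerShot S90'),  # invalid, skipped
]
for (day, model), n in sorted(count_by_day(records).items()):
    print(day, model, n)
```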
I wrote this small helper function to annotate the plot below.
def annotate(text, xy, xytext):
    """Annotate the current matplotlib axis"""
    plt.gca().annotate(text, xy=xy, xycoords='axes fraction',
                       xytext=xytext, textcoords='axes fraction',
                       size=22, ha='center', zorder=20,
                       bbox=dict(boxstyle='round', fc='white'),
                       arrowprops=dict(color='k', width=2))
To graph the images I kept only cameras with more than 150 total images and resampled the data at two-month frequency.
images_by_date = images_by_date.loc[:, images_by_date.sum() > 150]
images_by_date = images_by_date.resample('2M').sum()
images_by_date['2005':].plot(kind='bar', figsize=(16, 10), stacked=True)
plt.gca().set_xticklabels([tick.get_text()[:7]
                           for tick in plt.gca().get_xticklabels()])
plt.ylabel('# of photos', fontsize=16)
plt.xlabel('Date of bin end (YYYY-MM)', fontsize=16)
plt.title('Number of photos by date and camera', fontsize=20)
annotate('Wedding', (.7, .8), (.6, .9))
annotate('Europe', (.243, .32), (.243, .45))
annotate('Drive to California', (.32, .43), (.32, .6))