Programming with Expectations

(DRAFT)

Programming with expectations is a new way of programming. There are several kinds of programming:

  • programming by accident (many people are here)
  • programming by theorem (people want to be here, almost no one is)
  • genetic/evolutionary programming (people do it, but no one is sure if it’s right)
  • programming by contract (somewhat widely used, to greatly varying degrees of success)
  • programming with expectations (my new entry)

Note that I’ve left out how people organize code (structural, object) or coordinate (procedural, event-driven, message-passing). Those are orthogonal to the central conceit, which is “writing programs that work correctly”.

Programming by accident refers to how the vast majority of software engineers work. They write some code, run it, and poke and prod at it until it seems to be working (and sometimes they just write code and pray). This is a fairly insulting term, but also fairly accurate. If you use a debugger while writing code, then you are following this pattern.

Programming by theorem assumes that you can write code to a theoretical proof. A fair amount of work was done on this approach starting in the 1960s, and while it had good spin-offs, as a programming discipline it has pretty much been useless. Almost no programs are written this way. Note that most (all?) unit tests really do not fall into this category. If you needed a unit test, then you didn’t have theoretically sound code in the first place. Unit tests belong in the first category.

Genetic/evolutionary programming is the idea that we will grow our code by setting up a set of constraints and having our code be discovered through endless generations of semi-random permutation, loosely following biological genetics principles. It too has had some interesting spin-offs, but has failed as a widely used discipline of programming. Almost no programs are written this way.

Programming by contract was a response to the failure of programming by theorem, and was founded on the idea that if each piece of code had a contract that it enforced with all callers, then life would be good and we would have solid, dependable programs.
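To make the idea concrete, here is a minimal sketch in Python of what a contract can look like. It is only an illustration, not any particular contract library: the contract decorator and the integer_sqrt function are names I made up for this example. The precondition is the caller's half of the deal and the postcondition is the callee's half.

```python
import functools


def contract(pre=None, post=None):
    """Wrap a function with optional precondition and postcondition checks."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Precondition: what the caller promises about the arguments.
            if pre is not None and not pre(*args, **kwargs):
                raise AssertionError(f"precondition failed for {func.__name__}")
            result = func(*args, **kwargs)
            # Postcondition: what the function promises about its result.
            if post is not None and not post(result, *args, **kwargs):
                raise AssertionError(f"postcondition failed for {func.__name__}")
            return result
        return wrapper
    return decorate


@contract(
    pre=lambda n: isinstance(n, int) and n >= 0,
    post=lambda r, n: r * r <= n < (r + 1) * (r + 1),
)
def integer_sqrt(n):
    """Return the largest integer r with r*r <= n."""
    r = 0
    while (r + 1) * (r + 1) <= n:
        r += 1
    return r
```

Calling integer_sqrt(-1) fails at the precondition, before the function body ever runs, which is the whole point: the violation is reported at the boundary where it happened.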

More stuff here.

Django REST framework

There is an updated toolkit for Django called “Django REST framework”.

http://django-rest-framework.org/

This is worth reading through and exploring even if you don’t develop with Django, because at first glance it’s well thought out and well documented. In case you don’t know, Django is a web application framework written in Python. Django REST framework is a library on top of Django to make it easier to build Web APIs. And of course, good Web APIs are REST APIs.

If after further reading of my own, I recant this opinion, I’ll come back and update this post.

 

Cellphone GPS accuracy poor?

So, I converted the GPS stamp in one of my photos into a map location. Now, assuming that you can trust Google Maps (I think that’s fair), and that with the photo in question I know where I was to within about 10 feet, the GPS stamp in the picture is roughly 3,300 feet (about 1 km) away.

According to Google Maps, my latitude and longitude at the time of the photo were 33.6588 -117.7668. But the GPS stamp in the photo says

GPS Latitude Ref : North
GPS Latitude : 33 deg 38' 59.40"
GPS Longitude Ref : West
GPS Longitude : 117 deg 45' 60.00"
GPS Time Stamp : 13:32:53.72
GPS Img Direction Ref : True North
GPS Img Direction : 171.6590909

which, if I do the math, is 33.649833 -117.766667.

Is Google Maps that far off? Or did I do the math wrong? I think I did the math wrong. Also, other people think that GPS locators in cellphones can be drastically off some of the time.
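One way to double-check the arithmetic is a few lines of Python. This is a sketch under a spherical-Earth assumption (mean radius 6,371 km); the function names are my own. It converts the degrees/minutes/seconds values from the stamp to decimal degrees and then measures the great-circle distance to the Google Maps position with the haversine formula.

```python
import math


def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert degrees/minutes/seconds plus an N/S/E/W reference to signed decimal degrees."""
    value = degrees + minutes / 60.0 + seconds / 3600.0
    # South and West are negative by convention.
    return -value if ref in ("S", "W") else value


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6371000.0):
    """Great-circle distance in meters on a sphere of the given radius."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


# Values from the EXIF GPS stamp in the photo.
photo_lat = dms_to_decimal(33, 38, 59.40, "N")
photo_lon = dms_to_decimal(117, 45, 60.00, "W")

# Position read off Google Maps.
map_lat, map_lon = 33.6588, -117.7668

distance_m = haversine_m(map_lat, map_lon, photo_lat, photo_lon)
distance_ft = distance_m / 0.3048
print(f"{photo_lat:.6f} {photo_lon:.6f}")
print(f"{distance_m:.0f} m ({distance_ft:.0f} ft)")
```

By this check the conversion itself holds up, and the two points come out roughly a kilometer apart, almost all of it in latitude. That is in the same ballpark as the distance quoted above, so the stamp really is that far off.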

http://mashable.com/2011/03/09/smartphone-gps-accuracy/

is an article about Shopkick’s experience in correlating GPS location from the cellphone with a known location.

This is strange, considering that GPS itself is pretty darn accurate.

http://www.gps.gov/support/faq/ and http://www.gps.gov/systems/gps/performance/accuracy/

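Given the time I took the photo, this would have been with my iPhone 1, purchased the first day iPhones were available. That model has no GPS chip at all (it estimates location from cell towers and Wi-Fi), which would explain a lot. Maybe location has gotten better in newer models? I’ll need to test.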

Ah, maybe it’s an indoors issue? Apparently, GPS signals are partially blocked by building walls, so the accuracy indoors is degraded. That sucks, considering that a lot of my photos are taken indoors. http://stackoverflow.com/questions/4424387/how-accurate-is-the-gps-on-the-iphone-4

Also see this: http://academics.skidmore.edu/blogs/onlocation/2012/03/smartphone_accuracy/. Outdoors, an iPhone is accurate to within several meters.

And apparently the iPhone 4 has a vastly improved GPS receiver. Time to upgrade.

 

Stackable file systems

FUSE isn’t really a stackable filesystem, but it can be used as one.

http://dazuko.dnsalias.org/wiki/index.php/Main_Page

http://www.redirfs.org/tiki-index.php

http://www.filesystems.org/

Also, maybe fanotify could be used for this; it is more limited but faster (?) than inotify. Both inotify and fanotify are built on the kernel’s fsnotify infrastructure.

Of course, these are all for Linux.

Watchdog, in Python, is a cross-platform library for monitoring filesystem events: https://github.com/gorakhargosh/watchdog (it uses inotify on Linux, FSEvents and kqueue on Mac, and ReadDirectoryChangesW on Windows).

Article on IBM DeveloperWorks about Linux VFS: http://www.ibm.com/developerworks/linux/library/l-virtual-filesystem-switch/

 

Playing with EXIF and JPEG files

Most JPEG files have EXIF metadata in them.

For Perl, there is the handy-dandy Image::ExifTool library that encapsulates ExifTool.

With this, one can whip up quick programs to do all kinds of things. For example, I have lots of photos scattered across hard disks and iPhoto libraries. Lots of these are duplicates, and some have broken metadata. Reorganizing by hand is painful, but reorganizing with a program is only painful for the 15 minutes it takes to write one.
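As an example of the kind of quick program I mean, here is a rough Python sketch that finds byte-identical duplicate JPEGs under a directory by hashing file contents. The function names and the choice of SHA-256 are my own assumptions for the illustration. Note the limitation: two copies of the same picture with different metadata will hash differently, so this only catches exact copies.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(base, extensions=(".jpg", ".jpeg")):
    """Group files under base by content digest; return only groups of 2 or more."""
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(base):
        for name in filenames:
            if name.startswith("._"):
                continue  # skip AppleDouble resource-fork files
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            by_digest[file_digest(path)].append(path)
    return {d: sorted(paths) for d, paths in by_digest.items() if len(paths) > 1}
```

Calling find_duplicates on a photo directory returns a dict mapping each digest to the list of paths that share it, which is enough to decide which copies to keep.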

Some cameras and most smartphones even have GPS nowadays, so if you want to find all the pictures you took in England, you could do a quick search across the metadata.

Also, since actually scanning hard disks is slow, another thing would be to extract all the metadata and put it in one place. For this, someday I hope we’ll have FUSE-level capabilities for Windows and Mac OS X, so I could write a FUSE filesystem to automatically maintain the metadata database. If you’re unfamiliar with FUSE, here’s a teaser article: http://pramode.net/articles/lfy/fuse/pramode.html.
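Until then, a plain database gets most of the way there. The sketch below, in Python with the standard sqlite3 module, shows one possible shape for such an index: one row per (file, tag) pair. The table layout and function names are assumptions of mine, and the extraction step is left out; the tags dict is assumed to come from whatever reads the EXIF data.

```python
import sqlite3


def open_index(db_path=":memory:"):
    """Open (or create) the metadata index; defaults to an in-memory database."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metadata ("
        " path TEXT NOT NULL, tag TEXT NOT NULL, value TEXT,"
        " PRIMARY KEY (path, tag))"
    )
    return conn


def store_metadata(conn, path, tags):
    """Replace all stored tags for one file with the given dict."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM metadata WHERE path = ?", (path,))
        conn.executemany(
            "INSERT INTO metadata (path, tag, value) VALUES (?, ?, ?)",
            [(path, tag, str(value)) for tag, value in tags.items()],
        )


def paths_where(conn, tag, value):
    """Return the sorted paths whose given tag has exactly the given value."""
    rows = conn.execute(
        "SELECT path FROM metadata WHERE tag = ? AND value = ? ORDER BY path",
        (tag, str(value)),
    )
    return [row[0] for row in rows]
```

With this in place, a question like "which photos came from an Apple camera" is a single indexed query instead of a scan of every disk.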

Here’s an example of what you have access to. This is the metadata that ExifTool pulled out of one of my photos.

---- ExifTool ----
ExifTool Version Number : 9.01
---- File ----
File Name : IMG_0890.JPG
Directory : Z:\Media\Images\bfitz\Pictures\iPhoto Library/Modified/2008/Jul 20, 2008
File Size : 742 kB
File Modification Date/Time : 2008:07:30 12:01:32-07:00
File Permissions : rw-rw-rw-
File Type : JPEG
MIME Type : image/jpeg
Exif Byte Order : Big-endian (Motorola, MM)
Comment : AppleMark\x0A
Image Width : 1200
Image Height : 1600
Encoding Process : Baseline DCT, Huffman coding
Bits Per Sample : 8
Color Components : 3
Y Cb Cr Sub Sampling : YCbCr4:2:2 (2 1)
---- EXIF ----
Make : Apple
Camera Model Name : iPhone
Orientation : Horizontal (normal)
Orientation : Horizontal (normal)
X Resolution : 72
Y Resolution : 72
Resolution Unit : inches
Software : QuickTime 7.5
Modify Date : 2008:07:30 12:01:32
Host Computer : Mac OS X 10.4.9
Y Cb Cr Positioning : Centered
F Number : 2.8
Exif Version : 0220
Date/Time Original : 2008:07:20 14:51:42
Create Date : 2008:07:20 14:51:42
Color Space : sRGB
Compression : JPEG (old-style)
X Resolution : 72
Y Resolution : 72
Resolution Unit : inches
Thumbnail Offset : 470
Thumbnail Length : 5639
Y Cb Cr Positioning : Centered
---- ICC_Profile ----
Profile CMM Type : appl
Profile Version : 2.2.0
Profile Class : Input Device Profile
Color Space Data : RGB 
Profile Connection Space : XYZ 
Profile Date Time : 2003:07:01 00:00:00
Profile File Signature : acsp
Primary Platform : Apple Computer Inc.
CMM Flags : Not Embedded, Independent
Device Manufacturer : appl
Device Model : 
Device Attributes : Reflective, Glossy, Positive, Color
Rendering Intent : Perceptual
Connection Space Illuminant : 0.9642 1 0.82491
Profile Creator : appl
Profile ID : 0
Red Matrix Column : 0.45427 0.24263 0.01482
Green Matrix Column : 0.35332 0.67441 0.09042
Blue Matrix Column : 0.15662 0.08336 0.71953
Media White Point : 0.95047 1 1.0891
Chromatic Adaptation : 1.04788 0.02292 -0.0502 0.02957 0.99049 -0.01706 -0.00923 0.01508 0.75165
Red Tone Reproduction Curve : curv\x00\x00\x00\x00\x00\x00\x00\x01\x023
Green Tone Reproduction Curve : curv\x00\x00\x00\x00\x00\x00\x00\x01\x023
Blue Tone Reproduction Curve : curv\x00\x00\x00\x00\x00\x00\x00\x01\x023
Profile Description : Camera RGB Profile
Profile Copyright : Copyright 2003 Apple Computer Inc., all rights reserved.
Profile Description ML : Camera RGB Profile
Profile Description ML (es-ES) : Perfil RGB para Cámara
Profile Description ML (da-DK) : RGB-beskrivelse til Kamera
Profile Description ML (de-DE) : RGB-Profil für Kameras
Profile Description ML (fi-FI) : Kameran RGB-profiili
Profile Description ML (fr-FU) : Profil RVB de l’appareil-photo
Profile Description ML (it-IT) : Profilo RGB Fotocamera
Profile Description ML (nl-NL) : RGB-profiel Camera
Profile Description ML (no-NO) : RGB-kameraprofil
Profile Description ML (pt-BR) : Perfil RGB de Câmera
Profile Description ML (sv-SE) : RGB-profil för Kamera
Profile Description ML (ja-JP) : カメラ RGB プロファイル
Profile Description ML (ko-KR) : 카메라 RGB 프로파일
Profile Description ML (zh-TW) : 數位相機 RGB 色彩描述
Profile Description ML (zh-CN) : 相机 RGB 描述文件
---- Composite ----
Aperture : 2.8
Image Size : 1200x1600
Thumbnail Image : <thumbnail image data>

It’s pretty daunting. But you can see I took this with an iPhone, the orientation is horizontal, and it’s a 1200×1600 image at 72 DPI. Also note that I modified this photo, and you can see when it was modified, and that it was likely modified on a Mac (the “Host Computer” key).

What are some things you would want to fix?

Sometimes a photo orientation is recorded incorrectly. Rather than mess with rotating it, you could just fix the metadata. Obviously, it’s an AI task for a program to look at a photo and say “hey, the orientation is wrong”, but once you’ve determined this, my personal preference would be to fix the original data. Another thing you can do is fix dates – sometimes my photos claim to have been taken in 2000, or 2015, or some other crazy date. So, I would prefer to fix the metadata in the photo.
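Finding the crazy dates is the easy half. EXIF timestamps look like "2008:07:20 14:51:42", so a sketch like the following (Python, with function names I made up) can flag anything that fails to parse or falls outside a window you consider plausible. It only reports suspects; actually rewriting the tag is a job for ExifTool.

```python
from datetime import datetime

EXIF_DATE_FORMAT = "%Y:%m:%d %H:%M:%S"


def parse_exif_date(text):
    """Parse an EXIF-style timestamp; return None if it is missing or malformed."""
    try:
        return datetime.strptime(str(text).strip(), EXIF_DATE_FORMAT)
    except ValueError:
        return None


def suspicious_dates(dates_by_path, earliest, latest):
    """Return the paths whose date is missing, malformed, or outside [earliest, latest]."""
    flagged = []
    for path, text in sorted(dates_by_path.items()):
        taken = parse_exif_date(text)
        if taken is None or not (earliest <= taken <= latest):
            flagged.append(path)
    return flagged
```

The window is the assumption that matters here. For my own photos, something like "after I bought the camera and no later than today" would be a reasonable choice.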

Also note that JPEGs can have embedded thumbnails. You might want to strip those out. Easy enough to do. Or you can insert new thumbnails. You can even add your own metadata. Or you could fix the GPS location if you have better information than what was stamped in the photo (my iPhone 3G has been spectacularly off for indoor pictures, sometimes a mile off).

Here is the start of a program I wrote to find and manipulate metadata in photos. I guess I should start putting this on GitHub rather than pasting it into blog postings.

#!/usr/bin/perl
use 5.014; # implies use strict
use warnings;
use feature ':5.14';

use Digest::MD5;
use Data::Dumper;
use File::Find;
use Getopt::Long qw();
use Image::ExifTool;
use Time::HiRes qw();

Fix->new()->run();

# =======================================================================================
# =======================================================================================

package Fix;
use parent -norequire, 'Object';

sub run
{
    my ($self) = @_;

    $self->{'option'} = Option->new()->read_options();
    $self->{'diag'} = Diag->new()->init($self->{'option'}->{'verbose'});

    $self->do_commands();
}

sub do_commands
{
    my ($self) = @_;

    Scan->new()->scan($self->{'option'}, $self->{'diag'});
}

# =======================================================================================
# =======================================================================================

package Scan;
use parent -norequire, 'Object';

sub scan
{
    my $self = shift;
    $self->{'option'} = shift;
    $self->{'diag'} = shift;

    $self->{'start_time'} = Time::HiRes::time();
    $self->{'file_count'} = 0;
    $self->{'bytes_processed'} = 0;

    $self->{'exif_tool'} = Image::ExifTool->new();
    $self->{'exif_tool'}->Options(Unknown => 1);

    my $process = sub { $self->process($_); };

    no warnings 'File::Find'; # don't want noise about dirs we can't iterate
    &File::Find::find({wanted => $process}, $self->{'option'}->{'base'});
}

sub process
{
    my ($self, $file) = @_;

    @{$self->{'stat'}} = stat($file);
    return unless -f _;
    return unless $file =~ /\.(JPG|JPEG|jpg|jpeg)$/;
    return if $file =~ /\.\_/; # skip ._ AppleDouble files

    $self->{'file'} = $file;
    $self->{'path'} = $File::Find::name;
    $self->{'file_size'} = -s _;

    return if $self->{'path'} =~ /\.AppleDouble/; # skip fake JPG inside AppleDouble folders

    $self->{'file_count'} += 1;
    $self->{'bytes_processed'} += $self->{'file_size'};

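    # File::Find has already chdir'ed into the file's directory, so open it by
    # basename; $File::Find::name only resolves if --base was an absolute path.
    my $info = $self->{'exif_tool'}->ImageInfo($file);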

    print sprintf("%10s %s\n", main::commify($self->{'file_size'}), $self->{'path'});
    my $group = "";
    foreach my $tag ($self->{'exif_tool'}->GetFoundTags('Group0'))
    {
        if ($group ne $self->{'exif_tool'}->GetGroup($tag))
        {
            $group = $self->{'exif_tool'}->GetGroup($tag);
            print "---- $group ----\n";
        }
        my $value = $info->{$tag};
        if (ref($value) eq 'SCALAR')
        {
            $value = $$value;
        }
        $value =~ s/([\x00-\x1F])/sprintf("\\x%02X", unpack("C", $1))/eg;
        print sprintf("%-32s : %s\n", $self->{'exif_tool'}->GetDescription($tag), $value);
    }
    print "\n";

    $self->{'diag'}->progress(
        sub {
            my $dur = Time::HiRes::time() - $self->{'start_time'};
            $dur = 1 if $dur < 1; # avoid dividing by a near-zero duration
            my $proc = main::commify($self->{'file_count'});
            my $procMB = main::bytes_str($self->{'bytes_processed'});
            my $mbsec = main::bytes_str($self->{'bytes_processed'} / $dur);

            sprintf("%s files, %s (%s/sec) processed",
                $proc, $procMB, $mbsec);
        },
        $self);
}

# =======================================================================================
# Diag - diagnostic output
#
# char, line: print if verbose >= 3 (-v -v)
# status: print if verbose >= 2 (-v)
# progress: print if verbose > 0 (no option)
# =======================================================================================

package Diag;
use parent -norequire, 'Object';

sub init
{
    my $self = shift;
    $self->{'diag'} = $_[0] // 0;
    $self->{'need_newline'} = 0;
    $self->{'last_progress'} = 0;
    return $self;
}

sub progress
{
    my ($self, $callback, $param) = @_;
    return unless $self->{'diag'} > 0;
    return if time() == $self->{'last_progress'};

    $self->{'last_progress'} = time();
    print STDERR "\n" and $self->{'need_newline'} = 0 if $self->{'need_newline'};
    print STDERR sprintf("%-78s\r", $callback->($param));
}

sub progress_clear
{
    my ($self) = @_;
    return unless $self->{'diag'} > 0;
    print STDERR sprintf("%-78s\r", "");
}

# =======================================================================================
# Option - parse command-line options
# =======================================================================================

package Option;
use parent -norequire, 'Object';

sub read_options
{
    my ($self) = @_;

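    my @opts = ('verbose|v+', 'quiet|q', 'base=s');
    $self->{'verbose'} = 1; # default level: progress only (see Diag); each -v adds one
    Getopt::Long::GetOptions($self, @opts)
        or die "usage: fix-jpeg --base BASEDIR [-v] [-q]\n";
    die "fix-jpeg: --base BASEDIR is required\n" unless defined $self->{'base'};
    $self->{'verbose'} = 0 if $self->{'quiet'};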

    # get from ARGV once I add separate commands

    return $self;
}

# =======================================================================================
# Object - base class
# =======================================================================================

package Object;

sub new
{
    my $self = shift;
    my ($class) = ref($self) || $self; # allow both virtual (member) and static (class)
    $self = {};
    bless $self, $class;

    return $self;
}

# =======================================================================================
# Not in a class yet
# =======================================================================================

package main;

sub bytes_str
{
    my $val = shift;

    if ($val < 10_000) { return sprintf("%.2f KB", $val / 1_000); }
    elsif ($val < 100_000) { return sprintf("%.1f KB", $val / 1_000); }
    elsif ($val < 1_000_000) { return sprintf("%d KB", int($val / 1_000)); }

    elsif ($val < 10_000_000) { return sprintf("%.2f MB", $val / 1_000_000); }
    elsif ($val < 100_000_000) { return sprintf("%.1f MB", $val / 1_000_000); }
    elsif ($val < 1_000_000_000) { return sprintf("%d MB", int($val / 1_000_000)); }

    elsif ($val < 10_000_000_000) { return sprintf("%.3f GB", $val / 1_000_000_000); }
    elsif ($val < 100_000_000_000) { return sprintf("%.2f GB", $val / 1_000_000_000); }
    elsif ($val < 1_000_000_000_000) { return sprintf("%.1f GB", $val / 1_000_000_000); }

    else { return sprintf("%.3f TB", $val / 1_000_000_000_000); }
}

sub commify
{
    local $_  = shift;
    1 while s/^([-+]?\d+)(\d{3})/$1,$2/;
    return $_;
}

# =======================================================================================

=head1 NAME

fix-jpeg: Clean up JPEG files and EXIF information

=head1 SYNOPSIS

fix-jpeg [options] [args]

fix-jpeg --base BASEDIR -v

=head1 DESCRIPTION

Fix file dates to match EXIF dates, or EXIF dates to match file dates. Rename files.
Find duplicates. Fix iPhoto libraries. Other magic stuff.

=cut

 

Build systems

Google “Build in the cloud” – http://google-engtools.blogspot.com/2011/08/build-in-cloud-how-build-system-works.html

Google has a FUSE filesystem that keeps track of digests of files as an attribute. This is something that needs to be default in all filesystems. http://google-engtools.blogspot.com/2011/06/build-in-cloud-accessing-source-code.html. Also, they build everything from head.

http://www.cs.virginia.edu/~dww4s/articles/build_systems.html