SHA-1 implementations

I’m going to attempt to collect open-source implementations of SHA-1 here. There are many more of them than there are of MD5, which is appropriate, considering that SHA-1 is the most widely used message digest function to date.

Obligatory Wikipedia page on SHA-1.

FIPS PUB 180-4: Secure Hash Standard, March 2012, covers SHA-1 and SHA-2. In this document, all but SHA-1 are variants of SHA-2 – e.g. SHA-256 and SHA-512 are 256-bit and 512-bit output variants of SHA-2, respectively.

NIST 800-107: Recommendations for Applications Using Approved Hash Algorithms is also worth reading.

NIST has a list of all validated SHA-1 implementations: SHS Validation List. As of 2/22/2013, there are 2024 of them, although to be fair, a lot of the entries are incremental updates by folks like IBM/Tivoli, and the majority of entries are from the past few years (I suppose this reflects increased scrutiny of security and trust).

1995, FIPS 180-1

FIPS 180-1 – Secure Hash Standard. This was the formal release of the SHA-1 algorithm. FIPS 180-1 was superseded by FIPS 180-2, FIPS 180-3 and now FIPS 180-4. The document had pseudo-code and several test vectors. While MD5 has a single author, Ronald Rivest, SHA-1 was invented by the NSA.
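The test vectors are what make it practical to compare the implementations collected below. As a minimal sketch, here are the three classic SHA-1 vectors ("abc", the 56-character message, and one million repetitions of "a") checked against Python's standard `hashlib`. The names `TEST_VECTORS`, `check_sha1` and `hashlib_sha1_hex` are my own for illustration, not anything from the standard.

```python
import hashlib

# Classic SHA-1 test vectors: message -> expected hex digest.
TEST_VECTORS = [
    (b"abc", "a9993e364706816aba3e25717850c26c9cd0d89d"),
    (b"abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq",
     "84983e441c3bd26ebaae4aa1f95129e5e54670f1"),
    (b"a" * 1_000_000, "34aa973cd4c4daa4f61eeb2bdbad27316534016f"),
]


def check_sha1(sha1_hex):
    """Return the messages for which sha1_hex(message) disagrees with the vectors."""
    return [msg for msg, expected in TEST_VECTORS if sha1_hex(msg) != expected]


def hashlib_sha1_hex(message):
    """Reference implementation under test: hashlib's SHA-1 (hypothetical wrapper name)."""
    return hashlib.sha1(message).hexdigest()


if __name__ == "__main__":
    failures = check_sha1(hashlib_sha1_hex)
    print("all vectors pass" if not failures else f"{len(failures)} vector(s) failed")
```

Any other implementation can be dropped in by passing a different function to `check_sha1`, as long as it maps bytes to a lowercase hex string.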

2001, RFC 3174 (Donald Eastlake and Paul Jones)

RFC 3174 has an implementation of SHA-1 in C by Donald Eastlake and Paul E. Jones. The date on the RFC is 2001, but it’s likely that drafts and code were written several years before this.

1998, Packetizer (Paul Jones)

There is a slight variant from Paul E. Jones, linked in Secure Hashing Algorithm (SHA-1), which appears to have been written around 1998, with an update in 2009. There are a handful of changes from the code in RFC 3174. I don’t know if this is the same Paul E. Jones from RFC 3174.

2006, RFC 4634 (Donald Eastlake and Tony Hansen)

RFC 4634 is an update to handle input of arbitrary bit length. The RFC (and presumably the code) was written by Donald Eastlake and Tony Hansen in 2006.

2011, RFC 6234 (Donald Eastlake and Tony Hansen)

RFC 6234 replaces RFC 4634, adding HKDF functionality alongside the SHA and HMAC code. The RFC (and presumably the code) was written by Donald Eastlake and Tony Hansen in 2011.
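Since RFC 6234 ships HMAC code built on top of the hash functions, here is a rough sketch of how HMAC wraps SHA-1, following the construction in RFC 2104. This is not the RFC's C code. The function name `hmac_sha1` is my own, and the check at the end uses test case 1 from RFC 2202 plus a comparison with Python's standard `hmac` module.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-1 consumes its input in 64-byte blocks


def hmac_sha1(key, message):
    """HMAC (RFC 2104 construction) instantiated with SHA-1; returns raw bytes."""
    # Keys longer than one block are hashed first, then zero-padded to a full block.
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha1(key).digest()
    key = key.ljust(BLOCK_SIZE, b"\x00")

    inner_pad = bytes(b ^ 0x36 for b in key)
    outer_pad = bytes(b ^ 0x5C for b in key)

    inner_digest = hashlib.sha1(inner_pad + message).digest()
    return hashlib.sha1(outer_pad + inner_digest).digest()


if __name__ == "__main__":
    key, data = b"\x0b" * 20, b"Hi There"  # RFC 2202, test case 1
    tag = hmac_sha1(key, data)
    assert tag.hex() == "b617318655057264e28bc0b6fb378c8ef146be00"
    assert hmac.compare_digest(tag, hmac.new(key, data, hashlib.sha1).digest())
    print(tag.hex())
```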

1998, Ghostscript (Steve Reid, James Brown)

SHA-1 in C from Ghostscript.

OpenSSL

There are validated implementations of SHA-1 in the OpenSSL FIPS Object Module, at http://www.openssl.org/source/.

IBM ICC

IBM Crypto for C (ICC) is listed as “a non-proprietary FIPS 140-2 cryptographic module”. I couldn’t find the source available anywhere, though, so I’ll probably remove this.

Mozilla NSS

Mozilla Network Security Services (NSS) contains SHA-1 implementations.

Crypto++ (Wei Dai)

Crypto++ has SHA-1 implementations.

2001, Botan (Jack Lloyd)

Botan has SHA-1 implementations.

2004, Aaron Gifford

Secure Hash Algorithm (SHA) has BSD licensed code for SHA-1.

2006, Cryptlib (Peter Gutmann)

Cryptlib has SHA-1.

2009, Git (Linus Torvalds)

Commit d7c208a9, Add new optimized C ‘block-sha1’ routines, introduced an optimized version of the Mozilla SHA1 function to git, reportedly about 30% faster. This version does not work on architectures that disallow unaligned 32-bit loads, because it reads the input directly as 32-bit words.
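To make the alignment issue concrete: SHA-1 treats each 64-byte block as sixteen big-endian 32-bit words. A fast implementation can read those words straight out of the input buffer, which is where the unaligned loads come from; the portable fallback assembles each word one byte at a time. Python has no alignment constraints, so the sketch below only mirrors the arithmetic of the byte-wise approach, and the names `load_be32` and `block_words` are mine, not git's.

```python
import struct


def load_be32(buf, offset):
    """Assemble one big-endian 32-bit word from four individual bytes."""
    return (
        (buf[offset] << 24)
        | (buf[offset + 1] << 16)
        | (buf[offset + 2] << 8)
        | buf[offset + 3]
    )


def block_words(block):
    """Split one 64-byte SHA-1 block into its sixteen 32-bit message words."""
    if len(block) != 64:
        raise ValueError("SHA-1 blocks are 64 bytes")
    return [load_be32(block, i) for i in range(0, 64, 4)]


if __name__ == "__main__":
    block = bytes(range(64))
    # struct does the same decoding in one call; the two must agree.
    assert block_words(block) == list(struct.unpack(">16I", block))
    # Byte-wise loads work at any offset, including odd ones.
    print(hex(load_be32(b"\xff\x01\x02\x03\x04", 1)))
```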

Also see comments on mailing lists, including Re: x86 SHA1: Faster than OpenSSL for development history.

This is in coreutils, which is GPLv3, but a message from Linus indicated he might be willing to stick with the MPL license, since he started from mozilla-sha1; see thread Linus’ sha1 is much faster! There’s also a separate implementation from George Spelvin in a thread spun off from that thread.

2011, smallsha1 (Micael Hildenborg)

smallsha1 is written in C++. It uses an old-style BSD license, the one with an advertising clause, so it’s probably not suitable for use by most companies.

2012, Nayuki Minase

Fast SHA-1 hash implementation in x86 assembly

The source code is available, but it is not open source. It gets about 327 MiB/sec on a Core 2 Q6600 2.4 GHz, compared with a (claimed) OpenSSL speed of 305 MiB/sec on equivalent hardware.

 

A build layout – updated

This seems to be the cleanest.

Build/
  Win32-Debug/
    Obj/
      sub1/
      sub2/
    Lib/
      sub1.lib
      sub2.lib
    program.exe
    program.ilk
    program.pdb
  x64-Release/
    ..

This keeps all the object files in one place, one sub-folder per project, all the libraries in another place (named after project, so presumably unique), and then the executables in the main target folder. Instead of nesting platform and target, there’s a flat hierarchy of targets, with each target name sufficiently disambiguated. For example, if there’s only one architecture, that might be left out, but once there are several architectures, all are named.
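As a sanity check on the scheme, here is a small sketch that computes where each kind of artifact lands for a given platform, configuration and project. The function name `layout` and the dictionary keys are hypothetical; only the folder names come from the tree above.

```python
from pathlib import PureWindowsPath


def layout(build_root, platform, configuration, project):
    """Return the output folders for one project in one flat target folder."""
    # Flat target name: platform and configuration joined, never nested.
    target = PureWindowsPath(build_root) / f"{platform}-{configuration}"
    return {
        "exe_dir": target,                    # program.exe, program.pdb, ...
        "lib_dir": target / "Lib",            # one .lib per project
        "obj_dir": target / "Obj" / project,  # object files, one folder per project
    }


if __name__ == "__main__":
    for name, path in layout("Build", "Win32", "Debug", "sub1").items():
        print(f"{name}: {path}")
```

Two projects only collide here if they share a project name, which is the same uniqueness assumption the libraries already rely on.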

I haven’t yet figured out how to do this in a single property sheet in Visual Studio, so there are two: one for projects that make executables, and one for projects that make libraries. This is because several properties have to agree with one another, and the right values differ between the two kinds of project.

Here’s a sample property sheet for libraries:

<?xml version="1.0" encoding="utf-8"?>
<Project DefaultTargets="Build" ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <PropertyGroup>
    <OutDir>$(SolutionDir)Build\$(Platform)-$(Configuration)\Lib\</OutDir>
    <IntDir>$(SolutionDir)Build\$(Platform)-$(Configuration)\Obj\$(ProjectName)\</IntDir>
  </PropertyGroup>
  <ItemDefinitionGroup>
    <Lib>
      <TargetPath>$(SolutionDir)Build\$(Platform)-$(Configuration)\Lib\</TargetPath>
      <OutputFile>$(SolutionDir)Build\$(Platform)-$(Configuration)\Lib\$(TargetName)$(TargetExt)</OutputFile>
    </Lib>
  </ItemDefinitionGroup>
</Project>

And here’s a sample property sheet for executables:

<?xml version="1.0" encoding="utf-8"?>
<Project DefaultTargets="Build" ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <PropertyGroup>
    <OutDir>$(SolutionDir)Build\$(Platform)-$(Configuration)\</OutDir>
    <IntDir>$(SolutionDir)Build\$(Platform)-$(Configuration)\Obj\$(ProjectName)\</IntDir>
  </PropertyGroup>
</Project>

It’s possible that I have some redundancy due to not completely understanding the interaction between $(OutDir), $(TargetPath) and $(OutputFile). I also should define them in terms of a base path for more readability, something like $(BuildBase).

Now, if only the other Visual Studio artifacts like *.suo, *.sdf, and *.user could be tucked away into a build folder like that. The ideal would be that “get from source and build” does not litter your working directory with artifact files all over the place, but that they are in one place.

Even more ideally, the build output folder would not be inside the source working folder. That’s probably too much to ask, given both existing practices and existing tools. But the very existence of an “ignore list” is a result of such scattering of temp files. And ignore lists are bad, because they hide things, and they are usually based on patterns matched against names, which can have unwanted side effects (e.g. an ignore on “Debug” as a temp folder name, which then needs an un-ignore if you have a folder named “Debug” in source control).
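The “Debug” side effect is easy to reproduce. This is a simplified model, not how any particular version control tool matches its ignore list: I’m assuming glob-style patterns applied to every path component, and the names `is_ignored` and `keep_prefixes` are made up for the example.

```python
from fnmatch import fnmatchcase


def is_ignored(path, ignore_patterns, keep_prefixes=()):
    """Return True if a '/'-separated path would be hidden by the ignore list."""
    # An explicit "un-ignore" wins over any ignore pattern.
    for prefix in keep_prefixes:
        if path == prefix or path.startswith(prefix + "/"):
            return False
    # Otherwise the path is hidden if any component matches any pattern.
    return any(
        fnmatchcase(part, pattern)
        for part in path.split("/")
        for pattern in ignore_patterns
    )


if __name__ == "__main__":
    patterns = ["Debug", "*.obj"]
    print(is_ignored("Build/Debug/main.obj", patterns))   # build output: hidden, as intended
    print(is_ignored("src/Debug/logger.c", patterns))     # real source: hidden by accident
    print(is_ignored("src/Debug/logger.c", patterns, keep_prefixes=["src/Debug"]))
```

The third call shows the extra rule you end up maintaining once a name pattern catches something it shouldn’t.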

A build layout

Here is a suggestion for Visual Studio builds, which generalizes to all builds, including cross-platform builds into the same folder.

Build/
  Obj/
    Win32-Debug/
      main.obj
    x64-Release/
      main.obj
  Win32-Debug/
    program.exe
    program.pdb
  x64-Release/
    program.exe
    program.pdb

This puts all build artifacts in a single folder named Build, with object files separated from final build files.

Alternatively, the build folder could have Obj directories per target, like this:

Build/
  Win32-Debug/
    Obj/
      main.obj
    program.exe
    program.pdb
  x64-Release/
    Obj/
      main.obj
    program.exe
    program.pdb

This is probably a little more logical, except that it will give someone the impression that you could clean a single target easily. “Real” projects often have some pieces compiled in debug and others compiled in release, or with optimizations.

The main idea is that object files and binaries get unique paths based on build settings, including platform and target, but also perhaps including smaller subdivisions like optimization settings.

The main downside is that a simple project requires several levels of folders to get to the binary, and also that any test data would either need to be duplicated, or located by some more complicated means than “next to the executable”. When launched from Visual Studio, the working directory defaults to the project folder ($(ProjectDir)), but when the executable is launched directly, for example by double-clicking it, the working directory is the executable’s location. There is no obvious easy answer here.
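One of the “more complicated means” is to walk upward from wherever the program lives until a known folder turns up, so the answer no longer depends on the working directory. This is only a sketch of that idea. The folder name `TestData` and the function `find_upwards` are assumptions of mine, and a real program would start from its own executable’s directory rather than a temp folder.

```python
import tempfile
from pathlib import Path


def find_upwards(start, name):
    """Return the first `name` found in `start` or any of its ancestors, else None."""
    start = Path(start).resolve()
    for directory in [start, *start.parents]:
        candidate = directory / name
        if candidate.exists():
            return candidate
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        root = Path(root).resolve()
        (root / "TestData").mkdir()                 # hypothetical shared test data
        exe_dir = root / "Build" / "Win32-Debug"    # where program.exe would live
        exe_dir.mkdir(parents=True)
        print(find_upwards(exe_dir, "TestData") == root / "TestData")
```

The cost is that a stray folder with the same name higher up the tree would be picked up silently, so the marker name needs to be reasonably distinctive.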

Making this work in Visual Studio involves setting OutDir and IntDir variables like so:

OutDir: $(SolutionDir)Build\$(Platform)-$(Configuration)\
IntDir: $(SolutionDir)Build\Obj\$(Platform)-$(Configuration)\

and this can be done in the IDE (where, in English, OutDir is named “Output Directory” and IntDir is named “Intermediate Directory”) or directly in the vcxproj files.

  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
    <LinkIncremental>true</LinkIncremental>
    <OutDir>$(SolutionDir)Build\$(Platform)-$(Configuration)\</OutDir>
    <IntDir>$(SolutionDir)Build\Obj\$(Platform)-$(Configuration)\</IntDir>
  </PropertyGroup>

gitdm

gitdm is a tool used by Jonathan Corbet, LWN editor, for producing reports on contributions to kernel code.

LWN article from 2008 announcing its availability: http://lwn.net/Articles/290957/

Gitorious repository: http://gitorious.org/mining-tools/gitdm

Github fork: https://github.com/markmc/openstack-gitdm