Updated 2012-09-10 14:38:50 by LkpPo

Salvatore Sanfilippo 11Oct2004:

A minimal introduction to ISBN and a bit of Tcl code for checksum calculation and field separation.

What's ISBN

The ISBN (International Standard Book Number) is a unique ten-digit number assigned to every printed book (even the same book in two different editions will have two different ISBNs).

The ten digits are separated into four parts with dashes or spaces, but it's not rare at all to find ISBNs written without separators. This creates a lot of trouble for ISBN search engines in general, and specifically for queries to z39.50 servers, because the separation has semantic value.

The ISBN syntax is as follows:

 <language id>-<publisher id>-<title id>-<checksum digit>

The first three parts are numbers. The length of each field can differ from one ISBN to another, but their total length is always the same (9 digits). The last part, the checksum, is the only one with a fixed length (one digit). It can be a digit or the character X: because the checksum is computed modulo 11, its value is in the range 0 - 10, and X is used to represent 10 (as in Roman numerals).

An example of a real ISBN:
  0-201-63361-2

That's Design Patterns: the language ID 0 is for English (1 is also English), 201 is Addison-Wesley, and 63361 identifies the book Design Patterns in its first edition (the only one for now).

Another example:
  88-386-3407-6

88 is the language code for Italian; the book is Linguaggio C++ by Herbert Schildt, 386 is McGraw-Hill, and 3407 identifies this particular book.

In both cases the last digit (2 for the first ISBN, 6 for the second) is the checksum.
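
As a small illustration of the field layout described above, here is a minimal sketch that splits an ISBN that already has separators into its four fields. The name isbnSplitFields is made up for this example and is not part of the code further down. It assumes Tcl 8.5 or later (for lassign) and deliberately gives up on ISBNs without separators, since the field boundaries cannot be recovered without knowing how the ranges are allocated.

```tcl
# Hypothetical helper (not part of the original code): split an ISBN-10
# that already contains separators into its four fields.
# Returns the list {language publisher title check}, or an empty list
# if the string does not have the shape described above.
proc isbnSplitFields {isbn} {
    # Dashes or spaces separate the fields; runs of separators collapse.
    set fields [regexp -all -inline {[^ -]+} [string trim $isbn]]
    if {[llength $fields] != 4} {return {}}
    lassign $fields language publisher title check
    # The first three fields are numeric and total 9 digits.
    if {![regexp {^[0-9]{9}$} $language$publisher$title]} {return {}}
    # The checksum is a single digit or X.
    if {![regexp {^[0-9Xx]$} $check]} {return {}}
    return $fields
}

puts [isbnSplitFields 0-201-63361-2]     ;# 0 201 63361 2
puts [isbnSplitFields "88 386 3407 6"]   ;# 88 386 3407 6
puts [isbnSplitFields 8838634076]        ;# empty: no separators to go by
```

The last call shows the problem in miniature: once the separators are gone, nothing in the ten digits alone says where one field ends and the next begins.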

In theory, splitting the number into four parts is smart, because it allows very big publishers to be assigned shorter publisher IDs, leaving more room for the title identifier. The same is true of the language identifier: short ones are used for languages with a lot of published books (like English), while longer ones can be used for languages with few books, where the other fields can be shorter because there are fewer publishers and titles.

In practice, ISBN does not take the human factor into account. People are lazy and tend to just write down the number without dashes or spaces; not everyone in this world is an engineer. So I think the system is flawed, and that an ISBN should be a (possibly longer) number without any fields at all.

It's worth mentioning that the new 13-digit ISBN standard is ready to take over in the future, but AFAIK it does not address this problem.

I'm slowly creating a web interface for an Italian z39.50 server, on-line at [1]. The on-line version does not yet include the latest code I'm developing, so there is no ISBN search there for now, but I have implemented it in my local copy. Since 233gradi is developed in Tcl, I had to write some code to check ISBN checksums and to automatically add dashes to badly formatted ISBNs. The checksum code is general and works with every ISBN, but the code that adds dashes works only for Italian ISBNs, because it relies on specific knowledge about Italian ISBNs to guess how to separate the number into the right parts.

Checksum Algorithm

Let's label each ISBN digit with a letter from A to J, so that a 10-digit ISBN is represented as:
 ABCDEFGHI-J

The Checksum digit J is calculated from the ABCDEFGHI part as:

(11 - (((A*10)+(B*9)+ .... +(I*2)) modulo 11)) modulo 11

(where modulo is the division's remainder in this case)

So for the ISBN 88-386-3407-? (where ? is the checksum digit to calculate) the math is:

11-(((8*10)+(8*9)+(3*8)+(8*7)+(6*6)+(3*5)+(4*4)+(0*3)+(7*2))%11) = 6

6%11 = 6

As you can see, the final modulo 11 step only matters when the result of the first expression is 11 (that is, when the weighted sum is already a multiple of 11): it converts 11 to 0.

Remember that the result of the checksum algorithm is a number between 0 and 10; if it's 10, you need to write an X as the checksum digit.
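
The formula translates almost literally into Tcl. The sketch below uses a made-up helper name, isbn10CheckDigit, and assumes its argument is exactly nine ASCII digits with no separators. It also shows an equivalent way to look at the checksum: if the check digit J is given weight 1, the weighted sum of all ten digits is a multiple of 11, which is the test that isbnValidChecksum in the code section relies on.

```tcl
# Hypothetical helper: check digit J for the nine digits ABCDEFGHI.
proc isbn10CheckDigit {nine} {
    if {![regexp {^[0-9]{9}$} $nine]} {
        error "expected exactly nine digits, got \"$nine\""
    }
    set sum 0
    set weight 10
    foreach d [split $nine {}] {
        incr sum [expr {$d * $weight}]   ;# A*10 + B*9 + ... + I*2
        incr weight -1
    }
    set check [expr {(11 - ($sum % 11)) % 11}]
    return [expr {$check == 10 ? "X" : $check}]
}

puts [isbn10CheckDigit 883863407]   ;# 6  (sum 313, 313 % 11 = 5)
puts [isbn10CheckDigit 020163361]   ;# 2  (sum 108, 108 % 11 = 9)
puts [isbn10CheckDigit 080442957]   ;# X  (sum 199, 199 % 11 = 1)

# Equivalent view: with the check digit weighted 1, the total is a
# multiple of 11. For 88-386-3407-6: 313 + 6*1 = 319 = 29 * 11.
puts [expr {(313 + 6 * 1) % 11}]    ;# 0
```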

Tcl code
 # This software is:
 #
 # Copyright (C) 2004 Salvatore Sanfilippo <antirez at invece dot org>
 #
 # The following terms apply to all files associated with the software
 # unless explicitly disclaimed in individual files.
 #
 # The authors hereby grant permission to use, copy, modify, distribute,
 # and license this software and its documentation for any purpose, provided
 # that existing copyright notices are retained in all copies and that this
 # notice is included verbatim in any distributions. No written agreement,
 # license, or royalty fee is required for any of the authorized uses.
 # Modifications to this software may be copyrighted by their authors
 # and need not follow the licensing terms described here, provided that
 # the new terms are clearly indicated on the first page of each file where
 # they apply.
 # 
 # IN NO EVENT SHALL THE AUTHORS OR DISTRIBUTORS BE LIABLE TO ANY PARTY
 # FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES
 # ARISING OUT OF THE USE OF THIS SOFTWARE, ITS DOCUMENTATION, OR ANY
 # DERIVATIVES THEREOF, EVEN IF THE AUTHORS HAVE BEEN ADVISED OF THE
 # POSSIBILITY OF SUCH DAMAGE.
 # 
 # THE AUTHORS AND DISTRIBUTORS SPECIFICALLY DISCLAIM ANY WARRANTIES,
 # INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY,
 # FITNESS FOR A PARTICULAR PURPOSE, AND NON-INFRINGEMENT.  THIS SOFTWARE
 # IS PROVIDED ON AN "AS IS" BASIS, AND THE AUTHORS AND DISTRIBUTORS HAVE
 # NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR
 # MODIFICATIONS.
 # 
 # GOVERNMENT USE: If you are acquiring this software on behalf of the
 # U.S. government, the Government shall have only "Restricted Rights"
 # in the software and related documentation as defined in the Federal 
 # Acquisition Regulations (FARs) in Clause 52.227.19 (c) (2).  If you
 # are acquiring the software on behalf of the Department of Defense, the
 # software shall be classified as "Commercial Computer Software" and the
 # Government shall have only "Restricted Rights" as defined in Clause
 # 252.227-7013 (c) (1) of DFARs.  Notwithstanding the foregoing, the
 # authors grant the U.S. Government and others acting in its behalf
 # permission to use and distribute the software in accordance with the
 # terms specified in this license. 
 
 
 # isbnValidChecksum - Validate ISBN numbers checksum
 #
 # Return 1 if the input ISBN number checksum is ok. Otherwise zero
 # is returned. The algorithm only considers digits for the first 9
 # sums, and a digit or a 'X' or 'x' character for the final ISBN digit.
 # Any other character present in the input string is ignored.
 #
 # Note that this algorithm works with any 10-digit ISBN, and is
 # not Italian-specific as [isbnAddDashes] is.
 proc isbnValidChecksum isbn {
     set digits 0
     set sum 0
     foreach d [split $isbn {}] {
         if {![string is digit $d]} { ;# Not a digit...
             if {$digits != 9 || ($d ne {x} && $d ne {X})} {
                 # ... Nor a 'x' as last character. Skip it.
                 continue
             }
         }
         incr digits
         if {$d eq {x} || $d eq {X}} {set d 10}
         set sum [expr {$sum+($d*(11-$digits))}]
     }
     if {$digits == 10 && ($sum % 11) == 0} {return 1}
     return 0
 }
 
 # isbnAddDashes - Add/Fix dashes in Italian ISBN numbers.
 #
 # If the ISBN is not Italian or it appears to be corrupted,
 # the original string is returned unmodified.
 proc isbnAddDashes isbn {
     set orig $isbn
     set isbn [string trim $isbn " \t\r\n"]
     if {[string range $isbn 0 1] ne {88}} {return $orig}
     set isbn [string map {{ } {} {-} {}} $isbn]
     if {[string length $isbn] != 10} {return $orig}
     switch -- [string index $isbn 2] {
         0 - 1 {set idlen 2}
         2 - 3 - 4 - 5 {set idlen 3}
         6 - 7 {set idlen 4}
         8 {
             switch -- [string index $isbn 3] {
                 0 - 1 - 2 - 3 - 4 {set idlen 4}
                 5 - 6 - 7 - 8 - 9 {set idlen 5}
                 default {return $orig}
             }
         }
         9 {set idlen 5}
         default {return $orig}
     }
     set new "88-"
     append new [string range $isbn 2 [expr {$idlen+1}]]-
     append new [string range $isbn [expr {$idlen+2}] end-1]-
     append new [string index $isbn end]
     return $new
 }
 
 # Simple test program: reads one ISBN per line until end of input.
 while {[gets stdin l] >= 0} {
     puts "With dashes fixed (if Italian): [isbnAddDashes $l]"
     puts "Valid checksum? [isbnValidChecksum $l]"
 }

PS: See also [2], and EAN-13 generation for how the ISBN is hidden in the barcode on a book too. Hint: 978-<isbn-without-checkdigit>-<EAN checkdigit>. The same goes for ISSN, but with 977 prepended. - RS: See also Check digits
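
The hint can be sketched in a few lines of Tcl. The name isbn10ToEan13 is made up for this example. The sketch assumes a 10-digit ISBN (dashes and spaces allowed) and the usual EAN-13 weights 1, 3, 1, 3, ... for the new check digit. It does not verify the old ISBN-10 check digit, which is simply dropped.

```tcl
# Hypothetical helper: build the 13-digit EAN for a 10-digit ISBN.
proc isbn10ToEan13 {isbn} {
    set isbn [string map {- "" " " ""} $isbn]
    if {![regexp {^[0-9]{9}[0-9Xx]$} $isbn]} {
        error "not a 10-digit ISBN: \"$isbn\""
    }
    # 978 + the first nine digits; the ISBN-10 check digit is discarded.
    set body 978[string range $isbn 0 8]
    set sum 0
    # Twelve digits, taken in pairs: weight 1 for the first, 3 for the second.
    foreach {a b} [split $body {}] {
        incr sum [expr {$a + 3 * $b}]
    }
    return $body[expr {(10 - ($sum % 10)) % 10}]
}

puts [isbn10ToEan13 0-201-63361-2]   ;# 9780201633610
puts [isbn10ToEan13 88-386-3407-6]   ;# 9788838634079
```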

WJP Here's a procedure that calculates the check digit for both 10 digit and 13 digit ISBNs.
 # Given an ISBN, returns the check digit.
 # Both 10 digit and 13 digit ISBNs are acceptable input.
 # In either case, the ISBN may already contain a check digit,
 # which is ignored if present, or it may be an incomplete
 # ISBN for which the check digit is to be calculated.
 # Internal hyphens and spaces are ignored.
 # If the ISBN is ill-formed returns -1.
 
 proc isbncd {k} {
    set k [string map {- "" " " ""} $k];
    set len [string length $k]
    if {($len < 9) || ($len == 11) || ($len > 13)} {
        return -1
    }
    set sum 0
    # 13-digit ISBN
    if {$len >= 12} {
        if {$len == 13} {
            set k [string range $k 0 11];# Remove check digit
        }
        if {[regexp {\D} $k]} {return -1};# Ensure that there are only digits
        set kl [split $k ""]
        # Weights are 1, 3, 1, 3, ...
        foreach {a b} $kl {
            incr sum [expr {$a + (3 * $b)}]
        }
        set rem [expr {$sum % 10}]
        if {$rem == 0} {
            return 0
        } else {
            return [expr {10 - $rem}]
        }
    }
    # 10-digit ISBN
    if {$len == 10} {
        set k [string range $k 0 8];# Remove check digit
    }
    if {[regexp {\D} $k]} {return -1};# Ensure that there are only digits
    set kl [split $k ""]
    # Weights are 10, 9, 8, ...
    set weight 10
    foreach d $kl {
        incr sum [expr {$d * $weight}]
        incr weight -1
    }
    set rem [expr {$sum % 11}]
    if {$rem == 0} {
        return 0
    } else {
        set ck [expr {11 - $rem}]
        if {$ck == 10} {
            set ck X
        }
        return $ck
    }
 }

Test:
 %isbncd 88-386-3407-6
 6
 %isbncd 88-386-3407
 6
 %isbncd 88-386-340&
 -1
 %isbncd 978-88-386-3407
 9