perl module awesomeness
There's no such thing as a complete ISBN database. Some places, like the Library of Congress, have really big databases, but they aren't complete. I learned a lot about ISBN numbers tonight (while trying to find some sort of database I could use); I ended up writing a perl script to search the LOC database, and I wrote a sweetly simple screen-scraping script.
The LOC database isn't international, and it's not complete either (some records lack ISBNs for some reason). Somehow places like amazon.com manage to compile nearly complete ISBN databases. You can't really access Amazon's database, however.
I wanted to programmatically access book information by ISBN. I thought I'd have to write a screen-scraper to simulate a search and parse the information out of the results page. But LOC supports a Z39.50 search interface. And there's a perl module for that (Net::Z3950). The search results are returned in MARC format. And there's a perl module for that, too (MARC::Record). It took a while to get it right, but I now have a very small perl script that takes an ISBN and prints the title, author, edition, and date! Awesome!
#!/usr/bin/perl -w
use strict;
use Net::Z3950;
use MARC::Record;
use MARC::File::USMARC;

# usage: search.pl ISBN
die "syntax: search.pl ISBN\n" unless @ARGV;
my $isbn = $ARGV[0];

# Library of Congress Z39.50 server and database
my $host = 'z3950.loc.gov';
my $port = 7090;
my $db   = 'Voyager';

my $conn = Net::Z3950::Connection->new($host, $port, databaseName => $db)
    or die "can't connect to $host:$port: $!\n";

# Prefix query: Bib-1 use attribute 7 searches the ISBN index
my $rs = $conn->search("\@attr 1=7 $isbn") or die $conn->errmsg();
my $n = $rs->size();

# Fetch full ("f") records in USMARC syntax
$rs->option(elementSetName => "f");
$rs->option(preferredRecordSyntax => "USMARC");

# Result-set records are numbered from 1
foreach my $i (1 .. $n) {
    my $rec = $rs->record($i) or die $rs->errmsg();
    # Decode the raw USMARC blob into a MARC::Record object
    my $m = MARC::Record->new_from_usmarc($rec->rawdata());
    print $m->title(), "\n";
    print $m->title_proper(), "\n";
    print $m->author(), "\n";
    print $m->edition(), "\n";
    print $m->publication_date(), "\n";
    print "\n";
}
$conn->close();
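The script passes $ARGV[0] straight into the query, so a mistyped number costs a round trip to the server. Here's a minimal sketch of a sanity check that could run first. It is not part of the original script: normalize_isbn is a hypothetical helper that strips hyphens and spaces and verifies the check digit. ISBN-10 uses weights 10 down to 1 and a sum divisible by 11, with a final X standing for 10. ISBN-13 uses alternating weights 1 and 3 and a sum divisible by 10.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of the original search.pl: strip hyphens and
# whitespace from an ISBN and verify its check digit. Returns the bare
# ISBN (with any trailing 'x' upper-cased) or undef if it doesn't validate.
sub normalize_isbn {
    my ($raw) = @_;
    (my $isbn = uc $raw) =~ s/[\s-]//g;

    if ($isbn =~ /\A\d{9}[\dX]\z/) {
        # ISBN-10: weights 10 down to 1; a final 'X' means 10.
        # The weighted sum must be a multiple of 11.
        my @chars = split //, $isbn;
        my $sum = 0;
        for my $i (0 .. 9) {
            my $value = $chars[$i] eq 'X' ? 10 : $chars[$i];
            $sum += (10 - $i) * $value;
        }
        return $sum % 11 == 0 ? $isbn : undef;
    }

    if ($isbn =~ /\A\d{13}\z/) {
        # ISBN-13: weights alternate 1, 3, 1, 3, ...
        # The weighted sum must be a multiple of 10.
        my @digits = split //, $isbn;
        my $sum = 0;
        $sum += $digits[$_] * ($_ % 2 ? 3 : 1) for 0 .. 12;
        return $sum % 10 == 0 ? $isbn : undef;
    }

    return undef;    # wrong length or stray characters
}

# Example use: reject a bad argument before contacting the server.
if (@ARGV) {
    my $isbn = normalize_isbn($ARGV[0]);
    die "not a valid ISBN: $ARGV[0]\n" unless defined $isbn;
    print "\@attr 1=7 $isbn\n";    # the query string search.pl would send
}
```

Stripping the hyphens also means the query sees the bare digits, which is the form ISBN indexes usually store.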
I did write a screen-scraper tonight, too, but for a different reason. Using Template::Extract, I wrote some terribly simple templates to extract links and data from webpages. Looping through the returned data has a funky syntax, but I managed (with some help from Data::Dump):
# $data is the hash ref returned by Template::Extract's extract()
foreach my $course (@{$data->{'courses'}}) {
    print $course->{'dept'}, "\t", $course->{'number'}, "\t", $course->{'section'}, "\n";
}
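The funky syntax comes from the shape of what extract() hands back. Per the Template::Extract documentation, a [% FOREACH courses %] ... [% END %] block in the template yields a hash ref whose courses key holds an array ref of hash refs, one per repetition, keyed by the placeholder names. The sketch below builds that structure by hand so it runs without the module or a live page. The course values and the course_lines helper are made-up placeholders; the field names just mirror the loop above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hand-built stand-in for what extract() would return from a template like
#   [% FOREACH courses %]...[% dept %]...[% number %]...[% section %]...[% END %]
# The values are hypothetical placeholders, not real scraped data.
my $data = {
    courses => [
        { dept => 'CS',   number => '101', section => '01' },
        { dept => 'MATH', number => '220', section => '03' },
    ],
};

# Hypothetical helper: turn the extracted records into tab-separated lines.
sub course_lines {
    my ($extracted) = @_;
    my @lines;
    # @{ ... } dereferences the array ref; each $course is itself a hash ref,
    # so its fields are reached with the arrow, e.g. $course->{dept}
    for my $course (@{ $extracted->{courses} }) {
        # hash-ref slice: pull the three fields out in a fixed order
        push @lines, join("\t", @{$course}{qw(dept number section)});
    }
    return @lines;
}

print "$_\n" for course_lines($data);
```

The hash slice is just a compact alternative to the three separate $course->{...} lookups in the loop above; both print the same tab-separated line.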
Posted by Dave at December 20, 2003 03:02 AM