A little background on the MARC file format
<Quote>MARC stands for MAchine Readable Cataloging records. It’s a format first developed in the 1960s for the U.S. Library of Congress in order to facilitate the exchange of bibliographic records among libraries. By the mid-1970s, it was an international standard, used around the world.
There are several variants of the MARC format. MARC21 was a merger in the 1990s between USMARC and CANMARC, the US and Canadian variants then in use, and other countries have their own formats. In much of Europe, UNIMARC is the variant most often seen. All of these records are formatted the same, with a structure of tags that are used to contain information, a directory which tells what tags are in the record, and where they are located.
Each tag, in each format, means something specific. For instance, in MARC21 bibliographic format, the 245 tag holds information about the title of the work. Additional information, including the publisher, author, size of the physical book, publication date, and subjects, are contained in other tags.
The format of the record, if you were to just print it out, is kind of hard to read. It was originally designed for serial interchange, via 9-track tape, and that medium was still in use in the early days of my career, in the 1990s. The first five bytes of the record are digits and tell you how long the record is, in bytes—including those five bytes. The clever modern nerd will instantly perceive the limitation of this structure: the record cannot be 100,000 bytes in length. Following that is the directory of tags, telling what tags to look for, and at which byte each tag starts. After that comes the tag data, and the next byte after that is the first byte of the next record. The leader/directory/tag structure is generically defined in ISO-2709; MARC21 or UNIMARC are the formats that define the meanings of the tags.
Yes, it’s a poorly designed format by modern standards. Yes, it needs updating, in the worst way, but that’s the subject of another article altogether. </Quote>
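The record layout described in the quote (a five-digit length, a directory, then the tag data) can be checked with a few lines of Python. This is only a minimal sketch of a structural check, not a full validator. The byte positions and terminator bytes come from my reading of ISO 2709 and MARC21 (24-byte leader, 12-byte directory entries, 0x1E and 0x1D terminators). The function names and the build_record helper are my own, made up for illustration.

```python
RECORD_TERMINATOR = 0x1D  # last byte of every record
FIELD_TERMINATOR = 0x1E   # ends the directory and every field
LEADER_LEN = 24
DIR_ENTRY_LEN = 12        # 3-byte tag + 4-byte length + 5-byte start offset


def split_records(data: bytes):
    """Yield raw records using the five-digit length at the start of each."""
    pos = 0
    while pos < len(data):
        head = data[pos:pos + 5]
        length = int(head) if len(head) == 5 and head.isdigit() else 0
        if length < LEADER_LEN:
            yield data[pos:]  # length cannot be trusted; hand back the rest
            return
        yield data[pos:pos + length]
        pos += length


def check_record(rec: bytes) -> list:
    """Return a list of structural problems found in one raw record."""
    if len(rec) < LEADER_LEN + 2:
        return ["record is shorter than a leader plus terminators"]
    if not rec[0:5].isdigit() or not rec[12:17].isdigit():
        return ["record length or base address is not numeric"]
    problems = []
    declared, base = int(rec[0:5]), int(rec[12:17])
    if declared != len(rec):
        problems.append(f"leader says {declared} bytes, record has {len(rec)}")
    if rec[-1] != RECORD_TERMINATOR:
        problems.append("record does not end with 0x1D")
    if not LEADER_LEN < base <= len(rec) or rec[base - 1] != FIELD_TERMINATOR:
        return problems + ["directory does not end with 0x1E at the base address"]
    directory = rec[LEADER_LEN:base - 1]
    if len(directory) % DIR_ENTRY_LEN:
        return problems + ["directory length is not a multiple of 12"]
    for i in range(0, len(directory), DIR_ENTRY_LEN):
        entry = directory[i:i + DIR_ENTRY_LEN]
        tag = entry[0:3].decode("ascii", "replace")
        if not entry[3:12].isdigit():
            problems.append(f"{tag}: directory entry is not numeric")
            continue
        length, start = int(entry[3:7]), int(entry[7:12])
        end = base + start + length
        if length == 0 or end > len(rec):
            problems.append(f"{tag}: field runs past the end of the record")
        elif rec[end - 1] != FIELD_TERMINATOR:
            problems.append(f"{tag}: field does not end with 0x1E")
    return problems


def check_file(path: str) -> list:
    """Check every record in a file; returns (record number, problem) pairs."""
    with open(path, "rb") as fh:
        data = fh.read()
    return [(n, p) for n, rec in enumerate(split_records(data), 1)
            for p in check_record(rec)]


def build_record(fields) -> bytes:
    """Hypothetical helper: assemble a minimal record to try the checks on."""
    directory, body = b"", b""
    for tag, data in fields:
        field = data + bytes([FIELD_TERMINATOR])
        directory += f"{tag}{len(field):04d}{len(body):05d}".encode("ascii")
        body += field
    directory += bytes([FIELD_TERMINATOR])
    base = LEADER_LEN + len(directory)
    total = base + len(body) + 1
    leader = f"{total:05d}nam a22{base:05d} a 4500".encode("ascii")
    return leader + directory + body + bytes([RECORD_TERMINATOR])
```

Calling check_file("test.mrc") returns an empty list when every record passes these checks. It says nothing about whether the tags themselves make sense in MARC21 or UNIMARC; it only tests the ISO 2709 envelope.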
In this article, I share my attempts to check whether a MARC file is valid from the Linux command line.
First I tried to install MarcEdit: https://marcedit.reeset.net
It seems to be the best tool to create, edit, and export MARC records to various formats. It is a .NET based application, and it would not open on my Linux machine.
This also failed to run, so I started to explore other libraries and tools for working with MARC files.
I found a Python module: https://pypi.org/project/pymarc/
I found no validation tool in it.
All of these libraries offer parsing of MARC files in different languages, but all I want is to check whether a MARC file is valid or not.
I tested the Ruby, Node.js, Java, and PHP versions too.
One of the tools I found gives a validator, but it seems to be Java based and I did not try it.
Finally, I found a few Perl modules for MARC validation.
CPAN is the library portal for all Perl modules.
On my Ubuntu machine, cpan is already available.
Update CPAN and install the two validator modules:
sudo cpan install CPAN
sudo cpan install MARC::Schema
sudo cpan install MARC::Lint
Run the validators. Here is the output of marclint for test.mrc:
Wide character in print at /usr/local/bin/marclint line 98.
245: Must end with . (period).
Recs Errs Filename
1 1 test.mrc
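To understand what the 245 message is complaining about, here is a minimal Python sketch of the same kind of check: find the 245 field through the directory and see whether its last subfield ends with a period. This is my own simplified reimplementation, not the MARC::Lint code, and the real rule may accept more cases than this one does. The two sample records are hand-built for illustration and are not taken from my test.mrc.

```python
SUBFIELD_DELIMITER = b"\x1f"  # precedes every subfield code

# Hand-built, hypothetical one-field records: same title, with and without
# the trailing period.
SAMPLE_OK = b"00051nam a2200037 a 4500245001300000\x1e10\x1faA title.\x1e\x1d"
SAMPLE_BAD = b"00050nam a2200037 a 4500245001200000\x1e10\x1faA title\x1e\x1d"


def get_fields(rec: bytes, wanted_tag: str):
    """Yield the raw data of every field with the given tag."""
    base = int(rec[12:17])            # where the tag data starts
    directory = rec[24:base - 1]      # between the leader and its terminator
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[0:3].decode("ascii")
        length, start = int(entry[3:7]), int(entry[7:12])
        if tag == wanted_tag:
            # length includes the trailing field terminator, so drop it
            yield rec[base + start:base + start + length - 1]


def check_245(rec: bytes) -> list:
    """Return a lint-style message if the title field lacks a final period."""
    errors = []
    for field in get_fields(rec, "245"):
        subfields = field.split(SUBFIELD_DELIMITER)[1:]  # [0] is the indicators
        if not subfields:
            continue
        # skip the one-byte subfield code, keep the text
        last = subfields[-1][1:].decode("utf-8", "replace").rstrip()
        if not last.endswith("."):
            errors.append("245: Must end with . (period).")
    return errors


print(check_245(SAMPLE_OK))   # → []
print(check_245(SAMPLE_BAD))  # → ['245: Must end with . (period).']
```

So this kind of message is about cataloging punctuation in the title field, not about the file being structurally broken.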
I am not sure which of the validators is correct for this mrc file.
I have to explore more to find a correct command-line MARC file validator.
If you know of one, please share the details.
Maybe I have to solve the issues with the MarcEdit application and use its command-line tool. I wonder why such a tool has not been written in any other language.