[XML4Lib] Database schema for MARCXML?

Kyle Banerjee kyle.banerjee at gmail.com
Wed Aug 2 13:31:38 EDT 2006


The best structure depends on what you want to do with the data and
how much you have.

For example, if you want to be able to search individual subfields of
a huge bibliographic set, you'll want to index them separately.
However, if you only need to be able to search and display a few
fields, it might be much easier to just index those individually and
store the entire MARCXML record in one field. If you're interested in
what the fields mean as opposed to what they are (for example, do you
want to search for "authors" or do you really want people to specify
all the 1xx and 7xx fields), you'll want to assign tag types to
simplify searching.

I'd be inclined to retain the whole record and design for the need at
hand. If needs change, you can always add more tables and indexes.

kyle

As Mike hints, the joins get expensive if you

On 8/2/06, Mike Rylander <mrylander at gmail.com> wrote:
> On 8/2/06, jkglenn at umich.edu <jkglenn at umich.edu> wrote:
> > Hello,
> >
> > Hopefully the list is willing to entertain a naive question from a MARC
> > and XML newbie. I'm trying to figure out the best way to store MARCXML
> > records in a relational database, and am wondering if there are any
> > standard database schemas out there for doing this. In particular I
> > would like to avoid having to write database queries to replicate the
> > functionality of CCL commands that search a particular set of MARC
> > fields. I've found some limited info on the Library of Congress
> > website, but nothing very helpful. If someone could point me in the
> > right direction I would be most grateful.
>
> Unfortunately, MARC doesn't translate to a relational model very well
> ... not in a normalized, scalable way, at least.
>
> There are basically 2 routes you can take:
>
> Flat, denormalized table of field-subfield-data, something like this
>
> CREATE TABLE marcdata (
>   recid       int,
>   tag          char(3),
>   ind1        char(1),
>   ind2        char(1),
>   subfield  char(1),
>   data        varchar(1024)
> );
>
> Or you can go to a normalized, multi-table setup, along these lines
>
> CREATE TABLE marcdoc (
>   id           int,
>   source  text
> );
>
> CREATE TABLE marctag (
>   id          int,
>   record  int,
>   tag       char(3)
> );
>
> CREATE TABLE controlfield (
>   tag       int,
>   data     text,
> ),
>
> CREATE TABLE datafield_indicators (
>   tag         int,
>   ind1       char(1),
>   ind2       char(2)
> );
>
> CREATE TABLE datafield_subfield
>   tag          int
>   subfield  char(1),
>   value      text
> );
>
>
> Hope that helps! :)
>
> >
> > Thanks,
> > Jacob Glenn
> > _______________________________________________
> > XML4Lib mailing list
> > XML4Lib at webjunction.org
> > http://lists.webjunction.org/mailman/listinfo/xml4lib
> >
>
>
> --
> Mike Rylander
> mrylander at gmail.com
> GPLS -- PINES Development
> Database Developer
> http://open-ils.org
> _______________________________________________
> XML4Lib mailing list
> XML4Lib at webjunction.org
> http://lists.webjunction.org/mailman/listinfo/xml4lib
>


More information about the XML4Lib mailing list