From kratzer at bsb-muenchen.de Thu Jul 3 05:29:39 2008
From: kratzer at bsb-muenchen.de (Mathias Kratzer)
Date: Thu Jul 3 05:29:56 2008
Subject: [XML4Lib] Question on trailing whitespace in MARCXML controlfield
elements
Message-ID: <486CB823.92C9.00ED.0@bsb-muenchen.de>
Dear all,
is it valid to omit trailing whitespace in MARCXML controlfield elements?
Example:
920219s1993 caua j 000 0 eng
instead of
920219s1993 caua j 000 0 eng
According to the MARCXML schema, leader and controlfield elements are subject to "whitespace preservation", and for interior whitespace it is clear why this is vital. However, any application could easily fill any missing positions with whitespace by default. IMHO it is still _not_ valid to omit trailing whitespace - so my real problem is that I'm not able to find any sort of document that states "yes, doing so will make your MARCXML invalid".
I am completely aware of the fact that the LoC (as maintaining agency of the MARCXML standard) is the appropriate address for my question but the "Contact Us" link on http://www.loc.gov/standards/marcxml/// only led me to the general "Ask a Librarian" page. So I thought this mailing list is probably the more direct way to ask the experts :-)
Best regards
Mathias
_____________________________________________________________
)_______
)_______ Bavarian State Library
)_______ Bavarian Library Network / Head Office
)_______ Dr. Mathias Kratzer
)_______ Ludwigstraße 16
)_______ D-80539 München
)_______ phone: +49 (0)89 28638-2797
)_______ fax: +49 (0)89 28638-2605
)_______ eMail: kratzer@bsb-muenchen.de
)____________________________________________________________
From houghtoa at oclc.org Thu Jul 3 09:26:13 2008
From: houghtoa at oclc.org (Houghton,Andrew)
Date: Thu Jul 3 09:26:15 2008
Subject: [XML4Lib] Question on trailing whitespace in MARCXML
controlfield elements
In-Reply-To: <486CB823.92C9.00ED.0@bsb-muenchen.de>
References: <486CB823.92C9.00ED.0@bsb-muenchen.de>
Message-ID: <6548F17059905B48B2A6F28CE3692BAAC89F48@OAEXCH4SERVER.oa.oclc.org>
> From: xml4lib-bounces@webjunction.org [mailto:xml4lib-
> bounces@webjunction.org] On Behalf Of Mathias Kratzer
> Sent: Thursday, July 03, 2008 5:30 AM
> To: xml4lib
> Subject: [XML4Lib] Question on trailing whitespace in MARCXML
> controlfield elements
>
> Dear all,
>
> is it valid to omit trailing whitespace in MARCXML controlfield
> elements?
>
> Example:
>
> 920219s1993 caua j 000 0 eng
>
> instead of
>
> 920219s1993 caua j 000 0 eng
>
>
>
> According to the MARCXML schema leader and controlfields are subject to
> "whitespace preservation", and for interior whitespace it is clear why
> this is vital. However, any application could easily fill any missing
> positions with whitespaces by default. IMHO it is still _not_ valid to
> omit trailing whitespace - so my real problem is that I'm not able to
> find any sort of document that states "yes, doing so will let your
> MARCXML become invalid".
>
> I am completely aware of the fact that the LoC (as maintaining agency
> of the MARCXML standard) is the appropriate address for my question but
> the "Contact Us" link on http://www.loc.gov/standards/marcxml/// only
> led me to the general "Ask a Librarian" page. So I thought this mailing
> list is probably the more direct way to ask the experts :-)
The MARC-XML schema uses whitespace preservation because whitespace is a content issue. You could omit the whitespace and your MARC-XML would still validate. However, the MARC 21 standards (authority, bibliographic, holdings, etc.) specify what the content should be. In those standards the 008 is exactly 40 characters long, which means that those trailing spaces are significant to the interpretation of the content, whether it is serialized as ISO 2709 or as a MARC-XML instance document.
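Since schema validation will not catch a short 008, a length check has to happen at the application level. The following is only a minimal sketch in Python, assuming the record uses the usual MARC21 slim namespace; the function name `short_008_fields` and the sample value are hypothetical and purely illustrative.

```python
import xml.etree.ElementTree as ET

# Assumption: the record is in the MARC21 "slim" namespace.
MARC_NS = "{http://www.loc.gov/MARC21/slim}"
FIELD_008_LENGTH = 40  # the length given in the message above


def short_008_fields(marcxml: str) -> list[tuple[str, str]]:
    """Return (original, padded) pairs for every 008 shorter than 40 characters."""
    root = ET.fromstring(marcxml)
    pairs = []
    for field in root.iter(MARC_NS + "controlfield"):
        value = field.text or ""
        if field.get("tag") == "008" and len(value) < FIELD_008_LENGTH:
            # Pad on the right only; this cannot repair interior whitespace.
            pairs.append((value, value.ljust(FIELD_008_LENGTH)))
    return pairs


if __name__ == "__main__":
    # Hypothetical 38-character value standing in for a trimmed 008.
    trimmed = "920219s1993".ljust(35) + "eng"
    record = (
        '<record xmlns="http://www.loc.gov/MARC21/slim">'
        f'<controlfield tag="008">{trimmed}</controlfield>'
        "</record>"
    )
    for original, padded in short_008_fields(record):
        print(len(original), "->", len(padded))
```

Padding like this is only safe when trailing blanks are the only thing missing; if interior whitespace was collapsed as well, the positions can no longer be recovered.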
Andy.
From aashton at skidmore.edu Mon Jul 14 09:14:19 2008
From: aashton at skidmore.edu (Andrew Ashton)
Date: Mon Jul 14 09:14:04 2008
Subject: [XML4Lib] Position Available: Systems Librarian - Skidmore College,
Saratoga Springs NY (Search Extended)
Message-ID: <819F1B837600884C8C497F2BEE9CA4260795E647@MAIL-2.skidmore.edu>
Please excuse cross-posting:
SYSTEMS LIBRARIAN (search extended)
Skidmore College seeks a creative, service-oriented Systems Librarian to
provide leadership for library technology projects and digital
initiatives in the Scribner Library. The Systems Librarian participates
in the Library's strategic planning activities, helps to guide the
overall direction of technology implementation in the library, develops
and maintains library systems, and participates in reference,
instruction, and departmental liaison activities. In addition, the
Systems Librarian will be a key player in the development of Digital
Assets Management initiatives at the college.
Responsibilities:
* Develop and administer a comprehensive technology plan that is
integrated with the Library's strategic plan; recommend policies; plan
upgrades; be responsible for the Library's ILS (Ex Libris Voyager),
catalog (AquaBrowser), and other production systems (e.g. ILLiad).
* Stay abreast of emerging technologies, and collaborate with the
library faculty and staff to develop new technology projects.
* Serve as liaison with Skidmore's IT department.
* Represent the library in professional organizations and campus
committees.
* Supervise the Library Systems Analyst in supporting and
developing a variety of applications.
Required: ALA-accredited MLS/MLIS; a background in information
technology, programming, or equivalent experience; advanced knowledge of
emerging technologies and their impacts on academic libraries;
experience working with a broad set of technologies, including
programming and database management experience; solid knowledge of HTML
and common web technologies; capacity for working flexibly and
creatively in a rapidly changing environment; ability to work
effectively in a team environment; a demonstrated interest in
professional activities, including participation in local, state, and
national organizations.
Desirable: 2 years of full-time experience working as a professional
librarian in an academic library; a commitment to exploring how emerging
technologies, including Semantic Web technologies and XML, can impact
scholarly work; experience working with Perl, ColdFusion, SQL, and XML
in both Windows and UNIX network environments.
Expanded and renovated in 1995, Skidmore College's Lucy Scribner Library
is a state-of-the-art facility with an Ex Libris Voyager integrated
library system. The library, with a book collection of approximately
400,000 volumes and the most utilized computer cluster on campus, is
dedicated to serving the information needs of the college's student and
faculty population.
The position is a non-tenured 12-month faculty appointment reporting to
the College Librarian. For more information or to apply, please go to:
jobs.skidmore.edu
Review of applications will begin immediately and will continue until
the position is filled.
--
Andrew Ashton
Systems Librarian
Scribner Library, Skidmore College
(518)580-5505
From jfitzgibbon at Galwaylibrary.ie Tue Jul 15 04:47:27 2008
From: jfitzgibbon at Galwaylibrary.ie (John Fitzgibbon)
Date: Tue Jul 15 04:48:48 2008
Subject: [XML4Lib] batch conversion of HTML files to XML
Message-ID: <2648AB3BE18F7544850030CBA097F27503865821D6@GCC-EXCHANGE07.galwaycoco.ie>
Hi,
Is it possible to convert a folder of HTML files to XML without having to edit each file with a text editor that supports regular expressions? In the past this is how I accomplished this task but I am hoping there is an easier way.
The process would have to change tags like [tag stripped] to [tag stripped]. Input tags in forms would also have to be closed.
It may have to close tags like [tag stripped] and [tag stripped].
Finally, attribute values are not necessarily bounded by quotes. For example, width=200 will have to become width="200".
Am I searching for a holy grail?
Any advice would be much appreciated.
Regards
Jon
w: www.galwaylibrary.ie
e: info@galwaylibrary.ie
p: 00 353 91 562471
f: 00 353 91 565039
#####################################################################################
This e-mail message has been scanned for Content and cleared
by MailMarshal Hosted at Galway County Council
#####################################################################################
From dkane at wit.ie Tue Jul 15 06:21:02 2008
From: dkane at wit.ie (David Kane)
Date: Tue Jul 15 06:21:07 2008
Subject: [XML4Lib] batch conversion of HTML files to XML
In-Reply-To: <2648AB3BE18F7544850030CBA097F27503865821D6@GCC-EXCHANGE07.galwaycoco.ie>
References: <2648AB3BE18F7544850030CBA097F27503865821D6@GCC-EXCHANGE07.galwaycoco.ie>
Message-ID: <913710300807150321j1edd024fkaf5c6fb9f8a4842d@mail.gmail.com>
Hi John,
I suggest htmltidy, which is a utility that does just what you want. It
converts HTML to XHTML.
Google htmltidy and batch and you should get what you need.
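For a whole folder, one possible approach is a small wrapper script around the Tidy command line. This is only a sketch in Python, assuming a `tidy` executable is on the PATH; the function names and the `.xhtml` output naming are hypothetical choices, while `-q`, `-asxhtml` and `-o` are standard Tidy options.

```python
import subprocess
from pathlib import Path


def tidy_command(source: Path, out_dir: Path) -> list[str]:
    """Build the Tidy command line that writes `source` as XHTML into `out_dir`."""
    target = out_dir / (source.stem + ".xhtml")
    # -q: quiet, -asxhtml: convert to well-formed XHTML, -o: output file
    return ["tidy", "-q", "-asxhtml", "-o", str(target), str(source)]


def convert_folder(in_dir: Path, out_dir: Path) -> list[Path]:
    """Run Tidy on every .html file in `in_dir`; return the files Tidy reported errors for."""
    out_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for source in sorted(in_dir.glob("*.html")):
        result = subprocess.run(tidy_command(source, out_dir))
        # Tidy exits with 0 (clean), 1 (warnings) or 2 (errors).
        if result.returncode > 1:
            failed.append(source)
    return failed


if __name__ == "__main__":
    # Hypothetical folder names; adjust to the real locations.
    for path in convert_folder(Path("html_in"), Path("xhtml_out")):
        print("needs manual attention:", path)
```

Writing the results to a separate folder keeps the original HTML untouched, so files that Tidy cannot repair can still be fixed by hand.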
Best,
David.
--
David Kane
Systems Librarian
Waterford Institute of Technology
http://library.wit.ie/
T: ++353.51302838
M: ++353.876693212
From rscheier at holycross.edu Tue Jul 15 08:41:59 2008
From: rscheier at holycross.edu (Robert H. Scheier)
Date: Tue Jul 15 08:42:08 2008
Subject: [XML4Lib] batch conversion of HTML files to XML
In-Reply-To: <2648AB3BE18F7544850030CBA097F27503865821D6@GCC-EXCHANGE07.galwaycoco.ie>
References: <2648AB3BE18F7544850030CBA097F27503865821D6@GCC-EXCHANGE07.galwaycoco.ie>
Message-ID: <487C9B17.9010901@holycross.edu>
I have also used XMLSpy to do some files, free to use for a limited
time. Not sure there is a batch feature.
Bob
=========================
Bob Scheier
Electronic Resources Librarian
Dinand Library
College of the Holy Cross
1 College Street
Worcester, Mass. 01610-2395
508-793-3495
rscheier@holycross.edu
=========================
From houghtoa at oclc.org Tue Jul 15 09:25:19 2008
From: houghtoa at oclc.org (Houghton,Andrew)
Date: Tue Jul 15 09:25:22 2008
Subject: [XML4Lib] batch conversion of HTML files to XML
In-Reply-To: <2648AB3BE18F7544850030CBA097F27503865821D6@GCC-EXCHANGE07.galwaycoco.ie>
References: <2648AB3BE18F7544850030CBA097F27503865821D6@GCC-EXCHANGE07.galwaycoco.ie>
Message-ID: <6548F17059905B48B2A6F28CE3692BAADECA85@OAEXCH4SERVER.oa.oclc.org>
No you are not searching for the holy grail. There are several tools that do what you are asking for. Tidy [1] and tagsoup [2] come to mind.
Andy.
[1] http://tidy.sourceforge.net/
[2] http://ccil.org/~cowan/XML/tagsoup/
From conal.tuohy at vuw.ac.nz Tue Jul 15 18:24:18 2008
From: conal.tuohy at vuw.ac.nz (Conal Tuohy)
Date: Tue Jul 15 18:28:16 2008
Subject: [XML4Lib] batch conversion of HTML files to XML
In-Reply-To: <2648AB3BE18F7544850030CBA097F27503865821D6@GCC-EXCHANGE07.galwaycoco.ie>
References: <2648AB3BE18F7544850030CBA097F27503865821D6@GCC-EXCHANGE07.galwaycoco.ie>
Message-ID: <1216160658.3670.13.camel@rb-501a-13-c>
Chiming in with one more option: JTidy (a Java version of Tidy)
http://jtidy.sourceforge.net/
--
Conal Tuohy
New Zealand Electronic Text Centre
www.nzetc.org
From tennantr at oclc.org Fri Jul 18 13:43:21 2008
From: tennantr at oclc.org (Roy Tennant)
Date: Fri Jul 18 13:43:25 2008
Subject: [XML4Lib] List of library-related APIs
Message-ID:
In following the UK effort to put together a "mashed up libraries" event, I
was struck with how useful having a list of library-related APIs would be
for this and other "unconference" kind of events. Unfortunately, I thought
that the list that Owen Stephens began at
http://tinyurl.com/59hop2
would be difficult to keep current and would be unlikely to be used beyond that
one event. So with Owen's permission I took his beginning list of APIs and
some from the comments and started a page on my TechEssence.info site that I
hope can serve as a maintained list that all kinds of library developer
conferences can use and contribute to over time. It is at:
http://techessence.info/apis/
Please email me any suggested additions or changes. Or, if you're one of
these people: http://techessence.info/about/ you know you can do it
yourself. ;-) Thanks,
Roy
From steven_morris at ncsu.edu Sun Jul 20 15:43:08 2008
From: steven_morris at ncsu.edu (Steve Morris)
Date: Sun Jul 20 15:43:13 2008
Subject: [XML4Lib] Position: Digital Collections Technology Librarian (NCSU)
Message-ID: <4883954C.7090708@ncsu.edu>
Apologies for cross postings
The NCSU Libraries invites applications and nominations for the position
of Digital Collections Technology Librarian.
The Digital Collections Technology Librarian explores, adapts, and
implements emerging digital technologies in support of the library's
digital collections, repository and publishing initiatives. The
incumbent will investigate and develop solutions to provide access to
and long-term management of heterogeneous collections including text,
images, video, and data. The Digital Collections Technology Librarian
will ensure established data standards are supported in the repository
for metadata management, data modeling and metadata workflow.
The incumbent will join the Digital Library Initiatives Department,
working in a highly collaborative environment with library colleagues
and external partners engaged in digital collections technical
architecture development, digital preservation, metadata architecture,
digital collections management, and digital services development.
Qualifications include an ALA-accredited MLS or equivalent advanced
degree, as well as relevant professional experience using emerging
digital library technologies. Knowledge of metadata standards, XML/XSLT,
and experience programming or scripting in a language such as PHP,
Python, or Java/JSP is expected. A familiarity with search and indexing
technologies such as SOLR, XTF, and Lucene is preferred, but not required.
See full vacancy announcements and further information at
www.lib.ncsu.edu/jobs/epa.html .
Apply online at https://jobs.ncsu.edu/. Search by position number C-60-0825.
The position will remain open until suitable candidates are found.
Affirmative Action/Equal Opportunity Employer
NC State welcomes all persons without regard to sexual orientation
Persons with disabilities requiring accommodations in the application
and interview process please call (919) 515-3148.
--
Steve Morris
Head of Digital Library Initiatives
North Carolina State University Libraries
Phone: (919) 515-1361 Fax: (919) 515-3031
Steven_Morris@ncsu.edu
From ann.apps at manchester.ac.uk Fri Jul 25 09:07:31 2008
From: ann.apps at manchester.ac.uk (Ann Apps)
Date: Fri Jul 25 09:07:37 2008
Subject: [XML4Lib] List of library-related APIs
In-Reply-To:
Message-ID: <20080725140731484.00000003644@annapps>
Hi Roy, and All,
You may be interested in the JISC Information Environment Service Registry (IESR) (http://iesr.ac.uk) in the UK, loosely within the library domain. It aims to record machine-to-machine services (APIs), both those that give access to collections and stand-alone services.
Via the IESR web search interface it is possible to search by service type (API protocol): http://iesr.ac.uk/service/iesrsrch?type=new . The resulting records give details of the resource collections and their APIs (bundled together). For each service there are details of its address, further interface details where appropriate (e.g. ZeeRex, WSDL), and a Help page.
IESR itself has several APIs: OAI-PMH, Z39.50, SRU/W. And of course these are recorded in IESR.
Unfortunately, in reality there are not very many resource collections with m2m APIs, but we hope to increase IESR content. However, a lot of resources have only a URL, which IESR pragmatically records as a webpage service.
Feel free to distribute this information more widely. I am not subscribed to the other lists to which the original message was sent so cannot mail to them.
Best wishes,
Ann
-------------------------------------------------
Ann Apps MBCS CITP. Research & Development, Mimas,
The University of Manchester, Oxford Road, Manchester, M13 9PL, UK
Tel: +44 (0) 161 275 6039 Fax: +44 (0) 161 275 6040
Email: ann.apps@manchester.ac.uk WWW: http://epub.mimas.ac.uk/ann.html
--------------------------------------------------