From kratzer at bsb-muenchen.de Thu Jul 3 05:29:39 2008
From: kratzer at bsb-muenchen.de (Mathias Kratzer)
Date: Thu Jul 3 05:29:56 2008
Subject: [XML4Lib] Question on trailing whitespace in MARCXML controlfield elements
Message-ID: <486CB823.92C9.00ED.0@bsb-muenchen.de>

Dear all,

is it valid to omit trailing whitespace in MARCXML controlfield elements?

Example:

920219s1993 caua j 000 0 eng

instead of

920219s1993 caua j 000 0 eng

According to the MARCXML schema, the leader and controlfields are subject to "whitespace preservation", and for interior whitespace it is clear why this is vital. However, any application could easily fill missing positions with whitespace by default. IMHO it is still _not_ valid to omit trailing whitespace, so my real problem is that I cannot find any document stating "yes, doing so will make your MARCXML invalid".

I am fully aware that the LoC (as the maintenance agency for the MARCXML standard) is the appropriate address for my question, but the "Contact Us" link on http://www.loc.gov/standards/marcxml/// only led me to the general "Ask a Librarian" page. So I thought this mailing list was probably the more direct way to ask the experts :-)

Best regards
Mathias

Bavarian State Library
Bavarian Library Network / Head Office
Dr. Mathias Kratzer
Ludwigstraße 16
D-80539 München
phone: +49 (0)89 28638-2797
fax: +49 (0)89 28638-2605
eMail: kratzer@bsb-muenchen.de

From houghtoa at oclc.org Thu Jul 3 09:26:13 2008
From: houghtoa at oclc.org (Houghton,Andrew)
Date: Thu Jul 3 09:26:15 2008
Subject: [XML4Lib] Question on trailing whitespace in MARCXML controlfield elements
In-Reply-To: <486CB823.92C9.00ED.0@bsb-muenchen.de>
References: <486CB823.92C9.00ED.0@bsb-muenchen.de>
Message-ID: <6548F17059905B48B2A6F28CE3692BAAC89F48@OAEXCH4SERVER.oa.oclc.org>

The MARC-XML schema uses whitespace preservation because whitespace is a content issue.
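The point that whitespace in the leader and control fields is content, not layout, can be made concrete with a short sketch using only Python's standard `xml.etree.ElementTree`. It assumes the MARCXML namespace `http://www.loc.gov/MARC21/slim` and a bibliographic record, whose 008 has 40 character positions (00-39); the function name `pad_control_fields` and the sample 008 value are hypothetical. Padding is only correct when blank is the intended value for the missing trailing positions, which a program cannot know, so a stricter tool might just report short fields instead of repairing them.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
NS = {"marc": MARC_NS}

# Assumption: fixed lengths for the MARC 21 bibliographic format only.
# Other formats differ (for example, the holdings 008 is shorter).
FIXED_LENGTHS = {"008": 40}


def pad_control_fields(record):
    """Restore trailing blanks in fixed-length control fields, in place.

    Returns a list of (tag, original_length) for every field that was short.
    Raises ValueError if a field is longer than its fixed length, because
    truncating would silently discard data.
    """
    padded = []
    for field in record.findall("marc:controlfield", NS):
        tag = field.get("tag")
        expected = FIXED_LENGTHS.get(tag)
        if expected is None:
            continue  # variable-length control field such as 001
        text = field.text or ""
        if len(text) > expected:
            raise ValueError(f"{tag} has {len(text)} characters, expected {expected}")
        if len(text) < expected:
            padded.append((tag, len(text)))
            # Right-pad with blanks (spaces) up to the fixed length.
            field.text = text.ljust(expected)
    return padded


# Hypothetical 008 for a book with the trailing blanks (positions 38-39) dropped.
short_008 = "920219s1993" + " " * 4 + "caua" + " " * 3 + "j" + " " * 6 + "000 0 eng"
record = ET.fromstring(
    f'<record xmlns="{MARC_NS}">'
    f'<controlfield tag="008">{short_008}</controlfield>'
    "</record>"
)
print(pad_control_fields(record))  # [('008', 38)]
print(repr(record.find("marc:controlfield", NS).text[-5:]))  # 'eng  '
```

Because ElementTree keeps element text verbatim, serializing the record again with `ET.tostring(record, encoding="unicode")` retains the restored blanks.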
You could omit the whitespace and your MARC-XML would still validate. However, the MARC 21 standards, e.g., authority, bibliographic, holdings, etc., specify what the content should be. In those standards the 008 is exactly 40 characters long, which means that those trailing spaces are significant to the interpretation of the content, whatever format it is serialized into, whether ISO 2709 or a MARC-XML instance document.

Andy.

From aashton at skidmore.edu Mon Jul 14 09:14:19 2008
From: aashton at skidmore.edu (Andrew Ashton)
Date: Mon Jul 14 09:14:04 2008
Subject: [XML4Lib] Position Available: Systems Librarian - Skidmore College, Saratoga Springs NY (Search Extended)
Message-ID: <819F1B837600884C8C497F2BEE9CA4260795E647@MAIL-2.skidmore.edu>

Please excuse cross-posting:

SYSTEMS LIBRARIAN (search extended)

Skidmore College seeks a creative, service-oriented Systems Librarian to provide leadership for library technology projects and digital initiatives in the Scribner Library. The Systems Librarian participates in the Library's strategic planning activities, helps to guide the overall direction of technology implementation in the library, develops and maintains library systems, and participates in reference, instruction, and departmental liaison activities. In addition, the Systems Librarian will be a key player in the development of Digital Assets Management initiatives at the college.

Responsibilities:

* Develop and administer a comprehensive technology plan that is integrated with the Library's strategic plan; recommend policies; plan upgrades; be responsible for the Library's ILS (Ex Libris Voyager), catalog (AquaBrowser), and other production systems (e.g. ILLiad).
* Stay abreast of emerging technologies, and collaborate with the library faculty and staff to develop new technology projects.
* Serve as liaison with Skidmore's IT department.
* Represent the library in professional organizations and campus committees.
* Supervise the Library Systems Analyst in supporting and developing a variety of applications.

Required: ALA-accredited MLS/MLIS; a background in information technology, programming, or equivalent experience; advanced knowledge of emerging technologies and their impact on academic libraries; experience working with a broad set of technologies, including programming and database management; solid knowledge of HTML and common web technologies; capacity for working flexibly and creatively in a rapidly changing environment; ability to work effectively in a team environment; a demonstrated interest in professional activities, including participation in local, state, and national organizations.

Desirable: two years of full-time experience as a professional librarian in an academic library; a commitment to exploring how emerging technologies, including Semantic Web technologies and XML, can impact scholarly work; experience working with Perl, ColdFusion, SQL, and XML in both Windows and UNIX network environments.

Expanded and renovated in 1995, Skidmore College's Lucy Scribner Library is a state-of-the-art facility with an Ex Libris Voyager integrated library system. The library, with a book collection of approximately 400,000 volumes and the most utilized computer cluster on campus, is dedicated to serving the information needs of the college's students and faculty.

The position is a non-tenured 12-month faculty appointment reporting to the College Librarian. For more information or to apply, please go to: jobs.skidmore.edu

Review of applications will begin immediately and will continue until the position is filled.

--
Andrew Ashton
Systems Librarian
Scribner Library, Skidmore College
(518)580-5505
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.webjunction.org/wjlists/xml4lib/attachments/20080714/a66e1cbe/attachment.htm

From jfitzgibbon at Galwaylibrary.ie Tue Jul 15 04:47:27 2008
From: jfitzgibbon at Galwaylibrary.ie (John Fitzgibbon)
Date: Tue Jul 15 04:48:48 2008
Subject: [XML4Lib] batch conversion of HTML files to XML
Message-ID: <2648AB3BE18F7544850030CBA097F27503865821D6@GCC-EXCHANGE07.galwaycoco.ie>

Hi,

Is it possible to convert a folder of HTML files to XML without having to edit each file with a text editor that supports regular expressions? In the past this is how I accomplished this task, but I am hoping there is an easier way.

The process would have to change tags like <br> to <br />. Input tags in forms would also have to be closed.

It may have to close tags like <p> and <li>.

Finally, attribute values are not necessarily bounded by quotes. For example, width=200 will have to become width="200".

Am I searching for a holy grail?

Any advice would be much appreciated.

Regards
Jon

w: www.galwaylibrary.ie
e: info@galwaylibrary.ie
p: 00 353 91 562471
f: 00 353 91 565039

#####################################################################################
This e-mail message has been scanned for Content and cleared by MailMarshal Hosted at Galway County Council
#####################################################################################
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.webjunction.org/wjlists/xml4lib/attachments/20080715/33acb3ca/attachment.htm

From dkane at wit.ie Tue Jul 15 06:21:02 2008
From: dkane at wit.ie (David Kane)
Date: Tue Jul 15 06:21:07 2008
Subject: [XML4Lib] batch conversion of HTML files to XML
In-Reply-To: <2648AB3BE18F7544850030CBA097F27503865821D6@GCC-EXCHANGE07.galwaycoco.ie>
References: <2648AB3BE18F7544850030CBA097F27503865821D6@GCC-EXCHANGE07.galwaycoco.ie>
Message-ID: <913710300807150321j1edd024fkaf5c6fb9f8a4842d@mail.gmail.com>

Hi John,

I suggest htmltidy, a utility that does just what you want: it converts HTML to XHTML. Search the web for "htmltidy" and "batch" and you should find what you need.

Best,
David.

--
David Kane
Systems Librarian
Waterford Institute of Technology
http://library.wit.ie/
T: ++353.51302838
M: ++353.876693212
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.webjunction.org/wjlists/xml4lib/attachments/20080715/3de7870c/attachment.htm

From rscheier at holycross.edu Tue Jul 15 08:41:59 2008
From: rscheier at holycross.edu (Robert H. Scheier)
Date: Tue Jul 15 08:42:08 2008
Subject: [XML4Lib] batch conversion of HTML files to XML
In-Reply-To: <2648AB3BE18F7544850030CBA097F27503865821D6@GCC-EXCHANGE07.galwaycoco.ie>
References: <2648AB3BE18F7544850030CBA097F27503865821D6@GCC-EXCHANGE07.galwaycoco.ie>
Message-ID: <487C9B17.9010901@holycross.edu>

I have also used XMLSpy, which is free to use for a limited time, to convert some files. I am not sure whether it has a batch feature.

Bob

=========================
Bob Scheier
Electronic Resources Librarian
Dinand Library
College of the Holy Cross
1 College Street
Worcester, Mass. 01610-2395
508-793-3495
rscheier@holycross.edu
=========================

From houghtoa at oclc.org Tue Jul 15 09:25:19 2008
From: houghtoa at oclc.org (Houghton,Andrew)
Date: Tue Jul 15 09:25:22 2008
Subject: [XML4Lib] batch conversion of HTML files to XML
In-Reply-To: <2648AB3BE18F7544850030CBA097F27503865821D6@GCC-EXCHANGE07.galwaycoco.ie>
References: <2648AB3BE18F7544850030CBA097F27503865821D6@GCC-EXCHANGE07.galwaycoco.ie>
Message-ID: <6548F17059905B48B2A6F28CE3692BAADECA85@OAEXCH4SERVER.oa.oclc.org>

No, you are not searching for the holy grail. There are several tools that do what you are asking for. Tidy [1] and TagSoup [2] come to mind.

Andy.

[1] http://tidy.sourceforge.net/
[2] http://ccil.org/~cowan/XML/tagsoup/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.webjunction.org/wjlists/xml4lib/attachments/20080715/21385c71/attachment-0001.htm

From conal.tuohy at vuw.ac.nz Tue Jul 15 18:24:18 2008
From: conal.tuohy at vuw.ac.nz (Conal Tuohy)
Date: Tue Jul 15 18:28:16 2008
Subject: [XML4Lib] batch conversion of HTML files to XML
In-Reply-To: <2648AB3BE18F7544850030CBA097F27503865821D6@GCC-EXCHANGE07.galwaycoco.ie>
References: <2648AB3BE18F7544850030CBA097F27503865821D6@GCC-EXCHANGE07.galwaycoco.ie>
Message-ID: <1216160658.3670.13.camel@rb-501a-13-c>

Chiming in with one more option: JTidy (a Java version of Tidy)
http://jtidy.sourceforge.net/

--
Conal Tuohy
New Zealand Electronic Text Centre
www.nzetc.org

From tennantr at oclc.org Fri Jul 18 13:43:21 2008
From: tennantr at oclc.org (Roy Tennant)
Date: Fri Jul 18 13:43:25 2008
Subject: [XML4Lib] List of library-related APIs
Message-ID:

In following the UK effort to put together a "mashed up libraries" event, I was struck by how useful a list of library-related APIs would be for this and other "unconference"-style events. Unfortunately, I thought that the list Owen Stephens began at

http://tinyurl.com/59hop2

would be difficult to keep current and would be unlikely to be used beyond that one event. So with Owen's permission I took his beginning list of APIs, plus some from the comments, and started a page on my TechEssence.info site that I hope can serve as a maintained list that all kinds of library developer conferences can use and contribute to over time. It is at:

http://techessence.info/apis/

Please email me any suggested additions or changes. Or, if you're one of these people: http://techessence.info/about/ you know you can do it yourself.
;-)

Thanks,
Roy

From steven_morris at ncsu.edu Sun Jul 20 15:43:08 2008
From: steven_morris at ncsu.edu (Steve Morris)
Date: Sun Jul 20 15:43:13 2008
Subject: [XML4Lib] Position: Digital Collections Technology Librarian (NCSU)
Message-ID: <4883954C.7090708@ncsu.edu>

Apologies for cross-posting.

The NCSU Libraries invites applications and nominations for the position of Digital Collections Technology Librarian. The Digital Collections Technology Librarian explores, adapts, and implements emerging digital technologies in support of the library's digital collections, repository, and publishing initiatives. The incumbent will investigate and develop solutions to provide access to and long-term management of heterogeneous collections including text, images, video, and data. The Digital Collections Technology Librarian will ensure that established data standards are supported in the repository for metadata management, data modeling, and metadata workflow. The incumbent will join the Digital Library Initiatives Department, working in a highly collaborative environment with library colleagues and external partners engaged in digital collections technical architecture development, digital preservation, metadata architecture, digital collections management, and digital services development.

Qualifications include an ALA-accredited MLS or equivalent advanced degree, as well as relevant professional experience using emerging digital library technologies. Knowledge of metadata standards and XML/XSLT, and experience programming or scripting in a language such as PHP, Python, or Java/JSP, are expected. Familiarity with search and indexing technologies such as SOLR, XTF, and Lucene is preferred but not required.

See the full vacancy announcement and further information at www.lib.ncsu.edu/jobs/epa.html. Apply online at https://jobs.ncsu.edu/ and search by position number C-60-0825. The position will remain open until suitable candidates are found.
Affirmative Action/Equal Opportunity Employer. NC State welcomes all persons without regard to sexual orientation. Persons with disabilities requiring accommodations in the application and interview process, please call (919) 515-3148.

--
Steve Morris
Head of Digital Library Initiatives
North Carolina State University Libraries
Phone: (919) 515-1361
Fax: (919) 515-3031
Steven_Morris@ncsu.edu

From ann.apps at manchester.ac.uk Fri Jul 25 09:07:31 2008
From: ann.apps at manchester.ac.uk (Ann Apps)
Date: Fri Jul 25 09:07:37 2008
Subject: [XML4Lib] List of library-related APIs
In-Reply-To:
Message-ID: <20080725140731484.00000003644@annapps>

Hi Roy, and All,

You may be interested in the JISC Information Environment Service Registry (IESR) (http://iesr.ac.uk) in the UK, which sits loosely within the library domain. It aims to record machine-to-machine services (APIs), both those that give access to collections and stand-alone services. Via the IESR web search interface it is possible to search by service type (API protocol): http://iesr.ac.uk/service/iesrsrch?type=new . The resulting records give details of the resource collections and their APIs (bundled together). For each service there are details of its address, further interface details where appropriate (e.g. ZeeRex, WSDL), and a Help page.

IESR itself has several APIs: OAI-PMH, Z39.50, and SRU/W. And of course these are recorded in IESR.

Unfortunately, in reality there are not very many resource collections with m2m APIs, but we hope to increase IESR content. However, many resources have only a URL, which IESR pragmatically records as a webpage service.

Feel free to distribute this information more widely. I am not subscribed to the other lists to which the original message was sent, so I cannot mail to them.

Best wishes,
Ann

-------------------------------------------------
Ann Apps MBCS CITP.
Research & Development, Mimas, The University of Manchester, Oxford Road, Manchester, M13 9PL, UK
Tel: +44 (0) 161 275 6039
Fax: +44 (0) 161 275 6040
Email: ann.apps@manchester.ac.uk
WWW: http://epub.mimas.ac.uk/ann.html
--------------------------------------------------
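For John Fitzgibbon's batch HTML-to-XML question earlier in this archive, the tools recommended in the replies (Tidy, JTidy, TagSoup) are the robust answer. To show what such a tool does for the cases John listed (empty tags like <br> and <input>, unclosed <p> and <li>, unquoted attribute values), here is a minimal sketch using only Python's standard `html.parser.HTMLParser`. It is a toy under stated assumptions: input files are UTF-8, only <p> and <li> are auto-closed (and only when the same tag repeats directly), comments and the doctype are dropped, and output goes to `.xhtml` files. The names `XhtmlWriter` and `convert_folder` are hypothetical, not part of any library.

```python
import html
from html.parser import HTMLParser
from pathlib import Path

# HTML elements that never have content; XML needs them written as <tag />.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "param", "source", "track", "wbr"}
# Subset of elements whose end tag HTML lets authors omit before a sibling.
AUTO_CLOSE_ON_REPEAT = {"p", "li"}


class XhtmlWriter(HTMLParser):
    """Re-serialize tag-soup HTML as well-formed XML (toy, not a Tidy clone)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # &nbsp; etc. become characters
        self.out = []
        self.stack = []  # currently open non-void elements

    def _attrs(self, attrs):
        # HTMLParser has already stripped any quotes; re-quote and escape values.
        # A bare attribute such as "checked" becomes checked="checked".
        return "".join(
            f' {name}="{html.escape(name if value is None else value)}"'
            for name, value in attrs
        )

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            self.out.append(f"<{tag}{self._attrs(attrs)} />")
            return
        if tag in AUTO_CLOSE_ON_REPEAT and self.stack and self.stack[-1] == tag:
            self.handle_endtag(tag)  # <li>One<li>Two -> <li>One</li><li>Two
        self.stack.append(tag)
        self.out.append(f"<{tag}{self._attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        # Input that is already self-closed, e.g. <br/>.
        if tag in VOID_ELEMENTS:
            self.out.append(f"<{tag}{self._attrs(attrs)} />")
        else:
            self.out.append(f"<{tag}{self._attrs(attrs)}></{tag}>")

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS or tag not in self.stack:
            return  # stray end tag: drop it
        # Close everything opened inside this element, e.g. an unclosed <p>.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(html.escape(data, quote=False))

    def result(self):
        self.close()  # flush any text the parser is still buffering
        while self.stack:
            self.out.append(f"</{self.stack.pop()}>")
        return "".join(self.out)


def convert_folder(src_dir, dst_dir):
    """Write an .xhtml file in dst_dir for every .html/.htm file in src_dir."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for path in sorted(src.iterdir()):
        if path.suffix.lower() not in (".html", ".htm"):
            continue
        writer = XhtmlWriter()
        writer.feed(path.read_text(encoding="utf-8", errors="replace"))
        (dst / (path.stem + ".xhtml")).write_text(writer.result(), encoding="utf-8")
```

A call such as `convert_folder("html", "xhtml")` (hypothetical folder names) converts a whole directory in one pass. Real pages contain many irregularities this ignores, such as misnested inline tags, block elements inside an open paragraph, or fragments with no single root element, which is why Tidy or TagSoup remain the better choice for production use.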