Programming XML - SAX - comp.text.xml correspondence

Message 1 in thread
From: Martin Baker
Subject: how can I read a XML file into my program without validating an external DTD?
Newsgroups: comp.text.xml
Date: 2003-03-02 08:03:17 PST


Could anyone suggest how I can read an XML file into my program without
validating an external DTD?

The files I am reading specify a DTD as follows:
<!DOCTYPE X3D PUBLIC "ISO//Web3D//DTD X3D 3.0//EN"
"http://www.web3d.org/specifications/x3d-3.0.dtd">

But I want to be able to parse these files offline; when I try, I get the
following error:
java.net.UnknownHostException: www.web3d.org

I am using the SAX parser which comes with Java JDK1.4, Crimson, but I would
like a solution which is portable across any Java distribution.

I have tried the following in an attempt to stop the parser from attempting
to validate:
parser.setFeature("http://xml.org/sax/features/validation",false);
but this does not make any difference (in fact this feature is already set
to false by default)

I have also tried the following:
parser.setFeature("http://xml.org/sax/features/external-general-entities", false);
parser.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
but these give the following error:
NotSupportedException: Feature:
http://xml.org/sax/features/external-general-entities

So I then tried:
parser.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
this gives the following error:
NotRecognizedException: Feature:
http://apache.org/xml/features/nonvalidating/load-external-dtd

I then tried setting an EntityResolver as follows:
parser.setEntityResolver(new mjbEntityResolver());

where mjbEntityResolver is defined as follows:

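public class mjbEntityResolver implements EntityResolver {
    public mjbEntityResolver() {
    }
    public InputSource resolveEntity(String publicID, String systemID)
            throws SAXException {
        // null means "no substitute": the parser falls back to opening
        // systemID itself, so it still tries to reach www.web3d.org
        return null;
    }
}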

This still generated a java.net.UnknownHostException: www.web3d.org
So I changed it as follows:

public class mjbEntityResolver implements EntityResolver {
    public mjbEntityResolver() {
    }
    public InputSource resolveEntity(String publicID, String systemID)
            throws SAXException {
        // no byte stream, character stream or system ID is set,
        // so the parser has nothing it can read
        return new InputSource();
    }
}

The parser then generates this error:
java.lang.NullPointerException
so I then changed it as follows:


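public class mjbEntityResolver implements EntityResolver {
    public mjbEntityResolver() {
    }
    public InputSource resolveEntity(String publicID, String systemID)
            throws SAXException {
        // the String constructor sets the *system ID*, so "" is treated
        // as a (relative) URI to fetch, not as empty DTD content
        return new InputSource("");
    }
}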

This gives the following error:
org.xml.sax.SAXParseException: External parameter entity "%[dtd];" has
characters after markup.

I can't think of anything else to try; I don't want to modify the XML files or
load a different parser.
Can anyone suggest how to do this?

Thanks,

Martin
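The two exceptions reported above are presumably `org.xml.sax.SAXNotSupportedException` and `org.xml.sax.SAXNotRecognizedException`, which is how SAX says a parser cannot change, or does not know, a feature. Because feature support differs between parsers, one portable pattern is to try a parser-specific feature and fall back to something else when it is refused. A minimal sketch; the names `OptionalFeatures` and `trySetFeature` are hypothetical:

```java
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

/** Hypothetical helper: set a SAX feature only if this parser supports it. */
final class OptionalFeatures {
    // Xerces-specific feature; other parsers (Crimson, per this thread) reject it.
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    private OptionalFeatures() {
    }

    /** Returns true if the feature was set, false if the parser refused it. */
    static boolean trySetFeature(XMLReader reader, String feature, boolean value) {
        try {
            reader.setFeature(feature, value);
            return true;
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            return false; // caller should fall back, e.g. to an EntityResolver
        }
    }
}
```

A caller would try `trySetFeature(reader, OptionalFeatures.LOAD_EXTERNAL_DTD, false)` first and, only if that returns false, install an `EntityResolver` instead.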


Message 2 in thread
From: Johannes Koch
Subject: Re: how can I read a XML file into my program without validating an external DTD?
Newsgroups: comp.text.xml
Date: 2003-03-02 09:51:15 PST

Martin Baker wrote:
> Could anyone suggest how I can read an XML file into my program without
> validating an external DTD?
>
> The files I am reading specify a DTD as follows:
> <!DOCTYPE X3D PUBLIC "ISO//Web3D//DTD X3D 3.0//EN"
> "http://www.web3d.org/specifications/x3d-3.0.dtd">
>
> But I want to be able to parse these files offline; when I try, I get the
> following error:
> java.net.UnknownHostException: www.web3d.org
>
> I am using the SAX parser which comes with Java JDK1.4, Crimson, but I would
> like a solution which is portable across any Java distribution.
>
> I have tried the following in an attempt to stop the parser from attempting
> to validate:
> parser.setFeature("http://xml.org/sax/features/validation",false);
> but this does not make any difference (in fact this feature is already set
> to false by default)

It's not a question of validation, but of DTD processing. You should
register an entity resolver (implementing org.xml.sax.EntityResolver), so
the 'external entity' (the system identifier in the DocTypeDecl) gets
resolved to some file on your computer.

You may have a look at the resolver in xml-commons on the xml.apache.org
site which lets you use OASIS catalogs (among others).
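A resolver along these lines might look like the sketch below. It assumes a hypothetical `Map` from public identifiers to local copies of the DTDs (a tiny, hand-written stand-in for a real catalog) and returns `null` for anything it does not know, which tells the parser to fall back to its normal behaviour of opening the system identifier itself. Keying on the public identifier means the XML files themselves do not need editing; the class name `LocalDtdResolver` is hypothetical:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/** Hypothetical resolver: serves local DTD copies for known public IDs. */
final class LocalDtdResolver implements EntityResolver {
    private final Map<String, Path> localCopies;

    LocalDtdResolver(Map<String, Path> localCopies) {
        this.localCopies = localCopies;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) throws IOException {
        Path local = (publicId == null) ? null : localCopies.get(publicId);
        if (local == null || !Files.isReadable(local)) {
            return null; // unknown entity: let the parser use systemId as usual
        }
        InputSource source = new InputSource(Files.newInputStream(local));
        source.setPublicId(publicId);
        // base URI for any relative references inside the local DTD
        source.setSystemId(local.toUri().toString());
        return source;
    }
}
```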

--
Johannes Koch
In te domine speravi; non confundar in aeternum.
(Te Deum, 4th century)


Message 3 in thread
From: Martin Baker
Subject: Re: how can I read a XML file into my program without validating an external DTD?
Newsgroups: comp.text.xml
Date: 2003-03-02 10:55:16 PST

> It's not a question of validation, but of DTD processing. You should
> register an entity resolver (implementing org.xml.sax.EntityResolver), so
> the 'external entity' (the system identifier in the DocTypeDecl) gets
> resolved to some file on your computer.
>
> You may have a look at the resolver in xml-commons on the xml.apache.org
> site which lets you use OASIS catalogs (among others).
>
> --
> Johannes Koch
> In te domine speravi; non confundar in aeternum.
> (Te Deum, 4th century)
>

I was hoping not to have to do this, because this means that when I
distribute my program I will have to distribute the DTD file or assume all
users have continuous online access.

I can't understand why SAX needs to read the DTD. My Java program understands
the structure; all I want SAX to do is give me the elements and attributes in
the order that they are read.

Is there no way that I can use SAX to read the XML without using the DTD?

Martin
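Reporting elements and attributes in document order is exactly what a SAX `ContentHandler` does: the parser calls `startElement` once per start tag, in the order the tags appear, with that element's attributes. A minimal sketch of such a handler, recording each start tag as a string (the class name `ElementRecorder` is hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Records each start tag, with its attributes, in the order the parser reports them. */
final class ElementRecorder extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        StringBuilder line = new StringBuilder(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            line.append(' ').append(atts.getQName(i)).append('=').append(atts.getValue(i));
        }
        events.add(line.toString());
    }
}
```

For `<X3D version="3.0"><Scene><Shape DEF="box"/></Scene></X3D>` the recorded events would be `X3D version=3.0`, `Scene` and `Shape DEF=box`. The handler only sees what the parser reports, though; whether the parser first goes looking for the DTD is a separate question.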


Message 4 in thread
From: Oliver Bonten
Subject: Re: how can I read a XML file into my program without validating an external DTD?
Newsgroups: comp.text.xml
Date: 2003-03-03 11:51:58 PST

On Sun, 02 Mar 2003 18:54:58 +0000, Martin Baker wrote:

> I was hoping not to have to do this, because this means that when I
> distribute my program I will have to distribute the DTD file or assume all
> users have continuous online access.
>
> I can't understand why SAX needs to read the DTD. My Java program understands
> the structure; all I want SAX to do is give me the elements and attributes in
> the order that they are read.
>
> Is there no way that I can use SAX to read the XML without using the DTD?

The way I understand this is: not resolving an external entity set and not
validating the document are two different things. In your case the only
purpose of resolving the external entity set is to validate the document,
so it has no effect when you don't validate; but the parser is not aware
of that, and does not think far enough ahead to realise that it would
only ignore the content of the resolved external entity anyway. I think
the XML spec at least permits this behaviour. So it dutifully tries to
read the file that your XML document refers to, and then proceeds to
ignore all of its content.

XML was derived from SGML to allow parsers to be stupid. So don't complain
if parsers act stupid.

Oliver
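One way to watch this happen is to give the parser a resolver that only records what it is asked for and then hands back empty content. A sketch, assuming a parser that loads the external DTD subset by default even when not validating (as the Xerces-derived parser in current JDKs does); the class name `RecordingResolver` is hypothetical. With validation switched off, the DTD's system identifier should still appear in `requested`:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/** Remembers every external entity the parser asks for, then returns empty content. */
final class RecordingResolver implements EntityResolver {
    final List<String> requested = new ArrayList<>();

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        requested.add(systemId);
        // zero-length content, so nothing is fetched from the network
        return new InputSource(new StringReader(""));
    }
}
```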


Message 6 in thread
From: Peter Flynn
Subject: Re: how can I read a XML file into my program without validating an external DTD?
Newsgroups: comp.text.xml
Date: 2003-03-03 14:44:04 PST

Martin Baker wrote:

> I cant understand why SAX needs to read the DTD?

It doesn't. It only tries to because your file says so.
Remove the DocType Declaration and it will parse it as well-formed only.

But if the DTD specifies stuff the processor needs, like attribute value
defaults, namespaces, notations, or ID/IDREFs, you need to perform a
validating parse; otherwise your processing will screw up.

///Peter
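Attribute defaults are the easiest of these effects to see. In the sketch below, the class `DefaultAttributeDemo` and the `profile` default are made up for illustration, and an internal DTD subset is used so nothing has to be fetched. The same root element gains an attribute only when the parser has read a DTD declaring its default, so skipping the external X3D DTD would silently drop any defaults it declares:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class DefaultAttributeDemo {
    private DefaultAttributeDemo() {
    }

    /** Parses xml (non-validating) and returns attr on the root element, or null. */
    static String rootAttribute(String xml, String attr) throws Exception {
        String[] value = new String[1];
        boolean[] atRoot = {true};
        SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes atts) {
                        if (atRoot[0]) { // only look at the first (root) element
                            value[0] = atts.getValue(attr);
                            atRoot[0] = false;
                        }
                    }
                });
        return value[0];
    }
}
```

With `<!DOCTYPE X3D [<!ATTLIST X3D profile CDATA "Immersive">]><X3D/>` the root reports `profile="Immersive"`; with a bare `<X3D/>` there is no such attribute at all.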


Message 7 in thread
From: Peter Flynn
Subject: Re: how can I read a XML file into my program without validating an external DTD?
Newsgroups: comp.text.xml
Date: 2003-03-02 10:13:27 PST

Martin Baker wrote:
> Could anyone suggest how I can read an XML file into my program without
> validating an external DTD?
>
> The files I am reading specify a DTD as follows:
> <!DOCTYPE X3D PUBLIC "ISO//Web3D//DTD X3D 3.0//EN"
> "http://www.web3d.org/specifications/x3d-3.0.dtd">
>
> But I want to be able to parse these files offline

Download a copy of the DTD to your local disk, and then
edit the System Identifier in the XML file to reflect its
location, e.g.

<!DOCTYPE X3D PUBLIC "ISO//Web3D//DTD X3D 3.0//EN"
"/usr/share/dtds/x3d-3.0.dtd">

///Peter


Message 8 in thread
From: Martin Baker
Subject: Re: how can I read a XML file into my program without validating an external DTD?
Newsgroups: comp.text.xml
Date: 2003-03-02 10:55:17 PST

> Download a copy of the DTD to your local disk, and then
> edit the System Identifier in the XML file to reflect its
> location, e.g.
>
> <!DOCTYPE X3D PUBLIC "ISO//Web3D//DTD X3D 3.0//EN"
> "/usr/share/dtds/x3d-3.0.dtd">
>
> ///Peter
>

I was hoping not to have to do this, because it means that when I
distribute my program all the users will have to modify the X3D files that
they are using. I would like to make my program a general-purpose X3D editor
that people can use without making assumptions about the location of a DTD
file or having online access.

I can't understand why SAX needs to read the DTD. My Java program understands
the structure; all I want SAX to do is give me the elements and attributes in
the order that they are read.

Is there no way that I can use SAX to read the XML without using the DTD?

Martin


Message 9 in thread
From: Martin Baker
Subject: Re: how can I read a XML file into my program without validating an external DTD?
Newsgroups: comp.text.xml
Date: 2003-03-03 00:51:05 PST

I have been doing some more research and I found the following, which
suggests that it is possible to do what I want:

http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&threadm=3153269.Sb9uPGUboI%40ltgt.net&rnum=32&prev=/groups%3Fq%3Dentityresolver%26hl%3Den%26lr%3D%26ie%3DUTF-8%26oe%3DUTF-8%26start%3D30%26sa%3DN

This suggests that I can do what I want if I 'give it an EntityResolver
returning an empty InputSource'. As I said in my original message I tried
this as follows:


public class mjbEntityResolver implements EntityResolver {
    public mjbEntityResolver() {
    }
    public InputSource resolveEntity(String publicID, String systemID)
            throws SAXException {
        return new InputSource();
    }
}

The parser then generates this error:
java.lang.NullPointerException
so I then changed it as follows:


public class mjbEntityResolver implements EntityResolver {
    public mjbEntityResolver() {
    }
    public InputSource resolveEntity(String publicID, String systemID)
            throws SAXException {
        return new InputSource("");
    }
}

This gives the following error:
org.xml.sax.SAXParseException: External parameter entity "%[dtd];" has
characters after markup.

Can anyone say what 'an empty InputSource' means?

Martin
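An "empty InputSource" most likely means one wrapping a zero-length stream, not one with nothing set at all. `new InputSource()` has no byte stream, character stream or system ID, so the parser has nothing to open (presumably the cause of the `NullPointerException`). `new InputSource("")` sets the *system ID* to an empty relative URI, which the parser may resolve back to the document itself and try to read as a DTD; that would explain the "characters after markup" error, though this is a guess. A sketch of the zero-length version, with the hypothetical class name `EmptyEntityResolver`:

```java
import java.io.StringReader;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/** Answers every external entity request with empty content, so nothing is fetched. */
final class EmptyEntityResolver implements EntityResolver {
    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        // A zero-length character stream: the parser reads the external
        // DTD subset as if it contained no declarations at all.
        InputSource empty = new InputSource(new StringReader(""));
        empty.setPublicId(publicId);
        empty.setSystemId(systemId); // only used for error messages and base URIs
        return empty;
    }
}
```

It would be registered with `parser.setEntityResolver(new EmptyEntityResolver())`. As Peter Flynn points out in this thread, any attribute defaults or entities declared in the DTD are then lost.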

 

 



Correspondence about this page comp.text.xml

This site may have errors. Don't use for critical systems.

Copyright (c) 1998-2023 Martin John Baker - All rights reserved - privacy policy.