[Nikto-discuss] Encoding in xml output

Mon Mar 29 12:36:11 UTC 2010

> Seeing some upper ascii characters in the xml output.  I think it is
> up to nikto to specify the encoding in its output.  Instead of this:
>
>  <?xml version="1.0" ?>
>
> Should it be something like this?
>
>  <?xml version="1.0" encoding="foo" ?>

Damnit I had to research to find this:
http://www.w3.org/TR/2008/REC-xml-20081126/#sec-well-formed

The format is:
[23] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'

So the encoding declaration is optional, which means the declaration we
emit is valid. Without one, though, a parser assumes UTF-8 (or UTF-16),
so point taken - we should really include an explicit encoding
specification.
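
To illustrate why the explicit declaration matters once non-ASCII bytes
show up, here is a minimal sketch in Python (not Nikto code; the sample
element and the try_parse helper are made up for illustration). It feeds
the same ISO-8859-1 bytes to a parser with and without an encoding
declaration:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: "café" encoded as ISO-8859-1, so é is the single byte 0xE9.
body = b"<item>caf\xe9</item>"

def try_parse(declaration: bytes):
    """Return the element text, or None if the parser rejects the document."""
    try:
        return ET.fromstring(declaration + body).text
    except ET.ParseError:
        return None

# No encoding declared: the parser assumes UTF-8 and 0xE9 is not valid there.
without_encoding = try_parse(b'<?xml version="1.0" ?>')

# Encoding declared: the same bytes decode cleanly.
with_encoding = try_parse(b'<?xml version="1.0" encoding="ISO-8859-1" ?>')

print(without_encoding, with_encoding)
```

The declaration has to match the bytes actually written; declaring an
encoding does not convert anything by itself.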

I'm interested in which malformed bits you found though - we should be
trapping anything that can have strange characters within CDATA
sections; anything else that gets through is a bug. Some redacted
samples would be useful (or a copy and paste of the bad bit).
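
For reference, a rough sketch of what "trapping within CDATA" means, in
Python rather than Nikto's Perl. The cdata_wrap name and the sample
string are assumptions for illustration, not what Nikto actually does.
The one subtlety is that the text itself may contain the "]]>"
terminator, which has to be split across two sections:

```python
import xml.etree.ElementTree as ET

def cdata_wrap(text: str) -> str:
    """Wrap text in a CDATA section, splitting any embedded ']]>' terminator."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

# Hypothetical server output containing markup characters and a terminator.
sample = 'a <b> & "c" ]]> d'
doc = "<item>" + cdata_wrap(sample) + "</item>"

# Round-trip through a parser to confirm the text survives unchanged.
parsed = ET.fromstring(doc).text
print(parsed == sample)
```

Note that CDATA only protects markup characters such as < and &. It does
nothing for bytes that do not match the document's encoding, which is
why the declaration still matters.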

> Assuming I'm correct and a patch will get in eventually, what should
> we assume the encoding is?  ISO-8859-1 or UTF-8?  In the output I've
> seen, it looks like ISO-8859-1.  Looks like we can edit
> templates/xml_start.tmpl and hard-code the encoding there until it
> gets patched.  Is that a decent workaround?

IIRC, Perl 5.6+ uses UTF-8 internally for Unicode strings. This is a
pretty moot point at the moment as the databases and messages only use
ASCII codes (below 128). I'd go with UTF-8 to be safe :-)

We can fix this by changing line 1 of templates/xml_start.tmpl to:
<?xml version="1.0" encoding="UTF-8" ?>
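
If you would rather script the workaround than edit by hand, something
like the following Python sketch would do it. The set_encoding helper
and the <report> placeholder content are hypothetical; I have not
checked this against the real template, so treat it as a starting point:

```python
import re

# Matches an XML declaration at the very start of the text. Anything after
# the version (an existing encoding or standalone attribute) is discarded.
XML_DECL = re.compile(r'^<\?xml\s+version="([^"]+)"[^?]*\?>')

def set_encoding(template: str, encoding: str = "UTF-8") -> str:
    """Rewrite the leading XML declaration so it names an explicit encoding."""
    return XML_DECL.sub(
        lambda m: f'<?xml version="{m.group(1)}" encoding="{encoding}" ?>',
        template,
        count=1,
    )

# Placeholder template contents, not the real xml_start.tmpl.
before = '<?xml version="1.0" ?>\n<report>\n'
after = set_encoding(before)
print(after)
```

Running it on an already patched template leaves the text unchanged.
Since the output is currently pure ASCII, and ASCII is a subset of
UTF-8, declaring UTF-8 is accurate for what we write today.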

I'll add it to my list of things to do.

dave