Technical Notes

Verastream, XML, and Character Encoding
Technical Note 10001
Last Reviewed 10-Jul-2008
Applies To
Verastream Integration Broker version 9.x or higher
Summary

When you use Verastream to integrate different platforms and databases, you may encounter differences in character encoding methods, which can cause problems. These differences are most often seen when using XML or when integrating multiple platforms and databases, particularly ones from different countries. This technical note discusses XML character encoding and makes suggestions for working in this environment.

XML Character Encoding

XML messages can use various character encodings, such as UTF-8, Unicode (UTF-16), or ISO-Latin1. Because each of these encodings assigns the same values to the first 128 characters (the ASCII characters), you may not notice any problems when working with more than one character encoding in your software until you start to use special characters (such as ë or Ä) that are outside of the ASCII set.
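The following Python sketch illustrates the point outside of Verastream. It is only an illustration: the function name encoded_forms is hypothetical, and UTF-16 is shown in little-endian form without a byte order mark. An ASCII character keeps the value 0x62 in all three encodings (UTF-16 stores it in two bytes), while ë and Ä produce different bytes in ISO-Latin1 and UTF-8.

```python
def encoded_forms(ch: str) -> dict[str, bytes]:
    """Return the bytes that represent `ch` in three common XML encodings."""
    return {
        "iso-8859-1": ch.encode("iso-8859-1"),
        "utf-8": ch.encode("utf-8"),
        "utf-16-le": ch.encode("utf-16-le"),  # little-endian, no BOM
    }


# 'b' is ASCII; U+00EB (e with diaeresis) and U+00C4 (A with diaeresis) are not.
for ch in ("b", "\u00eb", "\u00c4"):
    forms = encoded_forms(ch)
    hex_forms = {name: data.hex() for name, data in forms.items()}
    print(f"U+{ord(ch):04X}", hex_forms)
```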

How Verastream Uses Character Sets

Verastream assumes that all its strings are in the native character set encoding (the character set used by the operating system where Verastream Integration Broker is running), which is either ISO-Latin1 or EBCDIC. Therefore, if you use a script to manipulate an XML message, rather than simply passing the XML along, you need to verify that you are using the correct character encoding.

When working with character sets, consider the following information:

  • Verastream version 9.5 or higher provides support for both Binary and Text (string) forms of XML messages. When Text is selected, the XML message is assumed to be represented in the native character set; alternatively, when Binary is selected, the XML message is represented in the exact binary form.

For example:

    • The DOM_Implementation component has a method 'parseText' and a method 'parseBinary'
    • The DOMExt_Implementation component has a method 'readDocumentText' and a method 'readDocumentBinary'.

It is generally best to use the Binary form (and thus a field of type byte) to deal with XML messages in your application; however, if you need to manually manipulate the XML message using a string function (in a script), you may need to choose Text representation and use the corresponding Text methods to hand the XML message to an XML component.

  • For fields of type string or text:
    • If you read a file into the field with the "READ <field>" statement, the only translation that takes place is that line breaks are translated into the native character set's line-feed character.

For example, when running Microsoft Windows, this means that a text file with CR-LF characters results in a field with one LF character.

    • The characters of the field value are assumed to be in the native character set.

Therefore, if you read a normal ASCII text file on Microsoft Windows, a byte with a decimal value of 98 represents a 'b', whereas on OS/390 (EBCDIC) a byte with a decimal value of 130 represents a 'b'.
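The two behaviors described for string and text fields can be checked with a short Python sketch. This is not Verastream code: the helper names are hypothetical, and the cp037 codec is used here as an assumed stand-in for the EBCDIC code page on OS/390 (the code page on a given host may differ).

```python
def native_byte(ch: str, native_codec: str) -> int:
    """Return the single byte value that represents `ch` in `native_codec`."""
    data = ch.encode(native_codec)
    if len(data) != 1:
        raise ValueError(f"{ch!r} is not a single byte in {native_codec}")
    return data[0]


def normalize_line_breaks(raw: bytes) -> bytes:
    """Collapse each CR-LF pair to one LF, as described for text reads on Windows."""
    return raw.replace(b"\r\n", b"\n")


# Assumption: iso-8859-1 and cp037 model the two native character sets.
print(native_byte("b", "iso-8859-1"))  # 98
print(native_byte("b", "cp037"))       # 130
print(normalize_line_breaks(b"<a>\r\n</a>\r\n"))
```

The same character therefore has a different byte value on each platform, which is why a string field is only meaningful together with the native character set of the machine that holds it.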

  • For a field of type byte, no translation is done if the field is read from a file using "READ <field>".
  • Some tools, such as Notepad in Windows, precede a Unicode text file with a BOM (byte order mark). If you open or edit such a file in Notepad, the BOM is not visible; however, if you read the file into a Verastream string, text, or byte field, the BOM is included in the field.

If you then use the "<name>Text" method (a method with a name that ends "Text") to read this information into Verastream's XML parser, the BOM is converted incorrectly. To avoid this problem, use the "<name>Binary" method to hand over the field to the XML parser. Using this method, the BOM is not translated and is properly recognized by the XML parser.
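The effect can be reproduced in Python as a rough analogy. The sketch below assumes a UTF-8 file with a BOM and uses Python's xml.etree.ElementTree in place of Verastream's XML parser, so it shows the general mechanism rather than Verastream's exact behavior; the function names are hypothetical. Passing the raw bytes lets the parser recognize the BOM, whereas decoding them as ISO-Latin1 first turns the BOM into three ordinary characters that precede the document.

```python
import codecs
import xml.etree.ElementTree as ET

# Hypothetical file content: a UTF-8 BOM followed by a small XML document.
raw = codecs.BOM_UTF8 + b'<?xml version="1.0" encoding="UTF-8"?><msg>hi</msg>'


def parse_as_binary(data: bytes) -> str:
    """Hand the untouched bytes to the parser and return the root tag."""
    return ET.fromstring(data).tag


def parse_as_native_text(data: bytes) -> str:
    """Decode as ISO-Latin1 (a 'native' string) first, then parse the string."""
    text = data.decode("iso-8859-1")  # the BOM becomes three stray characters
    return ET.fromstring(text).tag


print(parse_as_binary(raw))
try:
    parse_as_native_text(raw)
except ET.ParseError as err:
    print("text path failed:", err)
```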

  • When using the "<name>Text" method, character encoding is assumed to be native, and is converted by the Verastream Universal Integration Engine to the character encoding that is used by the XML parser.

If you want to construct or manipulate an XML message using string manipulation, and hand over the XML message to an XML component using the "<name>Text" method, you should not specify an XML declaration in that field. That is, the first line should not read <?xml ...?>.

If you read the document from a file, or receive it in a non-binary form (so that the field of type text or string contains native characters), and you want to perform string manipulation, remove the <?xml ... ?> XML declaration line, because it could define a character encoding that differs from the native character set of the field.
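Removing the declaration is a simple string operation. The following Python sketch shows one way to do it, assuming the declaration (if present) is the first thing in the string; the function name strip_xml_declaration is hypothetical, and the equivalent in a Verastream script would use that environment's own string functions.

```python
import re

# Matches a leading XML declaration such as <?xml version="1.0" encoding="UTF-8"?>
# but not other processing instructions such as <?xml-stylesheet ...?>.
_XML_DECL = re.compile(r"\A\s*<\?xml\s.*?\?>\s*", re.DOTALL)


def strip_xml_declaration(text: str) -> str:
    """Return `text` without a leading XML declaration, if one is present."""
    return _XML_DECL.sub("", text, count=1)


message = '<?xml version="1.0" encoding="UTF-8"?>\n<msg>hi</msg>'
print(strip_xml_declaration(message))
```

A string without a declaration is returned unchanged, so the function can be applied unconditionally before handing the text to a "<name>Text" method.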

Related Technical Notes
10999 Verastream Integration Broker Technical Notes
