LpexNls (LPEX v3.0.0)


SWT LPEX
v3.0.0


com.ibm.lpex.core
Class LpexNls

java.lang.Object
  com.ibm.lpex.core.LpexNls

public class LpexNls
extends Object

This class provides national language support (NLS) functions. These are mostly DBCS/MBCS-related functions, and may not work correctly for other character encodings. Some bidirectional support will also be added gradually, mainly for the editor's internal use. There is one instance of this class for each document view.

These NLS functions aim to make it easier for users to handle files that originate on and/or are targeted for a remote system, in a manner similar to how such files are handled in that environment (e.g., iSeries members edited with an iSeries editor). This includes emulation of SO/SI control characters, and awareness of the text's actual positions, columns, length, and sequence numbers (all byte-determined).

The current implementation, while attempting to be generic, focuses on Windows as the workstation, and on zSeries (S/390) and iSeries (AS/400) as the remote system. These remote systems use EBCDIC character encodings, in which a DBCS character uses two bytes, a DBCS string is delimited by SO/SI controls, and certain character combinations (Arabic lam + alef) may translate into a single-byte character (the visual lamalef code point). Also, on these systems each byte in the character encoding takes up one display column on the screen (a 1-byte SBCS character = 1 display column, a 2-byte DBCS character = 2-column display width, and the SO and SI control characters = 1 display column each).
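
As a worked illustration of this column rule, the sketch below counts the display columns a string would occupy in an EBCDIC DBCS encoding (which, by the rule above, is also its byte count there). It is not LPEX code: the isDoubleByte predicate is a hypothetical stand-in for a lookup in the real source encoding, and the Arabic lamalef case is ignored.

```java
import java.util.function.IntPredicate;

/** Sketch: display columns of a Unicode string as it would appear in an EBCDIC DBCS encoding. */
final class EbcdicColumns {
    private EbcdicColumns() {}

    /**
     * Counts columns using the rule: SBCS character = 1 column, DBCS character = 2 columns,
     * and each SO and SI control = 1 column. Every run of DBCS characters is opened by an
     * SO and closed by an SI.
     *
     * @param s            Java Unicode string
     * @param isDoubleByte hypothetical predicate: is this character DBCS in the source encoding?
     */
    static int displayColumns(String s, IntPredicate isDoubleByte) {
        int columns = 0;
        boolean inDbcsRun = false;
        for (int i = 0; i < s.length(); i++) {
            boolean dbcs = isDoubleByte.test(s.charAt(i));
            if (dbcs && !inDbcsRun) {
                columns++;            // SO opens the DBCS run
            } else if (!dbcs && inDbcsRun) {
                columns++;            // SI closes the DBCS run
            }
            inDbcsRun = dbcs;
            columns += dbcs ? 2 : 1;
        }
        if (inDbcsRun) {
            columns++;                // SI closing a run that reaches the end of the string
        }
        return columns;
    }

    public static void main(String[] args) {
        // Demo assumption only: treat anything above U+00FF as DBCS.
        IntPredicate demo = c -> c > 0xFF;
        System.out.println(displayColumns("ab\u3042\u3044c", demo)); // 9
    }
}
```

For "ab" + two DBCS characters + "c" this gives 1 + 1 + 1 (SO) + 2 + 2 + 1 (SI) + 1 = 9 columns, while the Java string itself has only 5 characters.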

For EUC character encodings (UNIX, AIX, Linux), LpexNls may not provide adequate emulation of their source edit environments. The byte-length of characters differs from their display-column width, so you may have to use native code (JNI), e.g., the *mb* C library functions, for display-width calculations (as only byte-length information is currently obtainable in Java); these calculations affect the (column-based) tab expansion in LPEX.

Terminology used here:

Unicode	LPEX, like any Java program, uses Unicode for its internal representation of characters. More specifically, this is the UTF-16 encoding, which encodes the Basic Multilingual Plane of Unicode version 1 directly, and uses surrogate pairs as the escape mechanism to encode the next 16 planes of Unicode version 3.
encoding	a Java-supported character encoding, e.g., "Cp1252" (Windows Latin-1)
native encoding	this is the default character encoding of the platform (host operating system) that LPEX runs on, according to the default locale. This is usually an ASCII-based character encoding on a workstation (Windows, Linux, OS/2, etc.), and an EBCDIC character encoding on a mainframe/midrange system (S/390, AS/400, etc.). This encoding is normally determined from the "file.encoding" Java system property.
file encoding	this is the character encoding of an LPEX document's underlying file. The file encoding is normally the native encoding, as files are usually stored in the same encoding as the default encoding of the host operating system (for example, on Japanese Linux, files are typically stored in EUC-JP). In a heterogeneous platform environment, the encoding of the host operating system may differ from the encoding of a file we want to load into the editor. In such a case, one must explicitly specify the encoding of the file, or let the editor attempt to detect it (see LpexView.LpexView(String,String,boolean)); the editor will then perform the character code conversion when loading the file, and similarly when saving the document. Internet character encodings are not supported: the escape sequences that surround their shifted text are not currently handled in the code. Their processing would be quite similar to the handling of SO/SI controls in EBCDIC DBCS. An application using the LPEX text widget may also read and write files in any encoding on its own, and use the editor API, for example, to set the document text from a Reader and save it through a Writer. As an example, an Eclipse technology LpexAbstractTextEditor-based editor may extend the FileDocumentProvider to read and save local UTF-8 files, while the LPEX Editor plug-in loads and saves its contents, all in Unicode, from and to the Eclipse IDocument provided.
source encoding	the source file's character encoding: the file being edited may originate from and/or be targeted for a remote system (i.e., different from the platform that LPEX runs on). Setting the source encoding information in the editor allows LPEX to emulate features of the file's original editing environment (for example, display emulated SO/SI controls), correctly establish the sequence numbers in effect, calculate the length limit of text lines for save operations, etc.
DBCS	Asian character set/encoding that contains double-byte characters
MBCS	Asian character set/encoding that contains multi-byte characters
SO, SI	Shift-out and Shift-in control characters. Among the encodings handled here, only EBCDIC DBCS encodings use SO/SI escape characters. Balanced SO/SI characters enclose sequences of DBCS character bytes. LPEX can display emulated SO/SI characters in order to present the user with an image of the file similar to the one seen in its native environment (e.g., an iSeries member being edited with an iSeries editor).
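
For the file encoding entry above, where an application reads and writes files in a given encoding on its own, the java.io part can look like the sketch below: bytes are decoded into a Unicode String with an explicit charset, and encoded back with the same charset on save. Handing the String to the LPEX API is not shown, and the class and method names are illustrative only.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;

/** Sketch: explicit file-encoding conversion on load (bytes -> Unicode) and save (Unicode -> bytes). */
final class EncodedFileIo {
    private EncodedFileIo() {}

    /** Reads all text from a stream, converting from the given file encoding to Unicode. */
    static String readText(InputStream in, Charset fileEncoding) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new BufferedReader(new InputStreamReader(in, fileEncoding))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    /** Writes Unicode text to a stream, converting it to the given file encoding. */
    static void writeText(OutputStream out, String text, Charset fileEncoding) throws IOException {
        try (Writer writer = new BufferedWriter(new OutputStreamWriter(out, fileEncoding))) {
            writer.write(text);
        }
    }
}
```

Typical use would pass a FileInputStream or FileOutputStream together with, for example, Charset.forName("EUC-JP"), provided the running JVM supports that charset (check first with Charset.isSupported).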

Here is a scenario for editing a remote file with LPEX:

 
 -----------------------------                   ----------------------------
 | Windows XP workstation    |                   | zSeries mainframe        |
 | IBM Java 2 SDK 1.4.1      |                   |                          |
 |                           |                   |                          |
 | Native encoding: MS 932   |                   | Source encoding: CP 939  |
 |  (PC DBCS)                |                   |  (EBCDIC DBCS + SO/SIs)  |
 |                           |                   |                          |
 |                           |                   | 1.remote file XXX        |
 |                           |  2.file-transfer  |                          |
 |                           |    utility:       |                          |
 |                           |    zSeries ->     |                          |
 |                           |      workstation  |                          |
 |                           |<===================                          |
 |  3.local file xxx         |                   |                          |
 |                           |                   |                          |
 |  4.LPEX loads file        |                   |                          |
 |    from workstation:      |                   |                          |
 |    "MS932" -> Unicode     |                   |                          |
 |                           |                   |                          |
 |   [5.set sourceEncoding   |                   |                          |
 |      to "Cp939" for       |                   |                          |
 |      emulation purposes   |                   |                          |
 |    6.set sourceCcsid      |                   |                          |
 |    7.set sequenceNumbers] |                   |                          |
 |                           |                   |                          |
 |    8.LPEX handles         |                   |                          |
 |      the editing of       |                   |                          |
 |      document xxx         |                   |                          |
 |      (all in Unicode)     |                   |                          |
 |                           |                   |                          |
 |  9.LPEX saves file        |                   |                          |
 |    to workstation:        |                   |                          |
 |    Unicode -> "MS932"     |                   |                          |
 |                           |                   |                          |
 | 10.local file xxx'        |                   |                          |
 |                           | 11.file-transfer  |                          |
 |                           |    utility:       |                          |
 |                           |    workstation -> |                          |
 |                           |      zSeries      |                          |
 |                           ===================>|                          |
 |                           |                   | 12.remote file XXX'      |
 |                           |                   |                          |
 -----------------------------                   ----------------------------

Notes:

when the remote file has sequence numbers, the sequenceNumbers editor parameter should be set only after the sourceEncoding (and, when applicable, the sourceCcsid) parameter(s) have been set; sequence numbers are defined in terms of byte columns in the source encoding.
the Java character-encoding converters remove SO/SIs during the EBCDIC DBCS to Unicode conversion, and insert SO/SIs in the Unicode to EBCDIC DBCS conversion. If the EBCDIC DBCS text contains (malformed) adjacent SO/SI-delimited sequences (such as <SO>D1D2<SI><SO>D1D2<SI>), this information is lost in the conversion.
if the Java Unicode String includes SO/SIs (values '\u000e' and '\u000f'), these will be kept (and balanced) in the converted EBCDIC by the Java character-encoding converters.
as of Java 2 SDK v1.4, there is new support for character sets (charsets) in the java.nio package. The canonical names used here for character encodings are the ones used in the java.io and java.lang APIs, which may differ from the java.nio canonical names. See also Java Supported Encodings.
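
Because sequence numbers are defined in byte columns of the source encoding, the same line can place them at different character positions depending on the encoding, which is why the source encoding must be known first. The sketch below extracts a field by 1-based byte columns. The method name and column values are hypothetical, and it assumes a stateless encoding (no SO/SI shifts, no byte-order mark) in which the field itself is single-byte.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Sketch: reading a field (e.g. sequence numbers) located by byte columns in a source encoding. */
final class SequenceNumberField {
    private SequenceNumberField() {}

    /**
     * Returns the text in byte columns [startColumn, startColumn + width) of the line as encoded
     * in sourceEncoding (columns are 1-based), or "" if the line is shorter than startColumn.
     */
    static String byteColumnField(String line, int startColumn, int width, Charset sourceEncoding) {
        byte[] bytes = line.getBytes(sourceEncoding);
        int from = startColumn - 1;
        if (from >= bytes.length) {
            return "";
        }
        int to = Math.min(from + width, bytes.length);
        return new String(Arrays.copyOfRange(bytes, from, to), sourceEncoding);
    }

    public static void main(String[] args) {
        String line = "caf\u00e9" + "00010000";
        // In UTF-8 the accented e takes two bytes, so the 8-byte field starts at byte column 6
        System.out.println(byteColumnField(line, 6, 8, StandardCharsets.UTF_8));      // 00010000
        // In ISO-8859-1 it takes one byte, so the same field starts at byte column 5
        System.out.println(byteColumnField(line, 5, 8, StandardCharsets.ISO_8859_1)); // 00010000
    }
}
```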

Method Summary
String	addSourceSosi(String s) Add emulation SO/SIs to a Java Unicode string originating from, or targeted for, an EBCDIC DBCS source character encoding.
String	addSourceSosi(String s, char shiftOut, char shiftIn) Add emulation SO/SIs to a Java Unicode string originating from, or targeted for, an EBCDIC DBCS source character encoding.
boolean	displayingSosi() Query whether the view is displaying emulation SO/SI controls.
static int	encodingCharIndex(String s, int index, String encoding) Return the character index into the encoded string (i.e., as converted from the given Java Unicode string using the specified character encoding), which corresponds to the specified index into the string.
static int	encodingLength(char c, String encoding) Get the byte-length for a string consisting of one Java Unicode character converted to the specified character encoding.
static int	encodingLength(String s, String encoding) Get the byte-length of a Java Unicode string in the specified character encoding.
String	getFileEncoding() Retrieve the character encoding of the document's underlying file.
static String	getNativeEncoding() Retrieve the native (platform's default) character encoding.
String	getSourceEncoding() Retrieve the source character encoding of the document.
static int	indexFromEncodingIndex(String s, int index, String encoding) Return the index into a Java Unicode text string which corresponds to the specified index into its encoded string (i.e., as converted using the specified character encoding).
static boolean	isBidi() Query whether the editor is running in a bidirectional environment which it can handle.
static boolean	isBidiEncoding(String encoding) Determine whether a character encoding is bidirectional.
static boolean	isEucEncoding(String encoding) Determine whether a character encoding is EUC (AIX MBCS).
static boolean	isMbcsEncoding(String encoding) Determine whether a character encoding is DBCS/MBCS.
static boolean	isSosiEncoding(String encoding) Determine whether a character encoding uses SO/SI control characters (is an EBCDIC DBCS character encoding).
boolean	isSourceMbcs() Query whether the source character encoding of the document is DBCS/MBCS.
boolean	isSourceSosi() Query whether the source character encoding of the document is EBCDIC DBCS.
static boolean	isValidEncoding(String encoding) Validate a character encoding.
int	sourceLength(char c) Get the byte-length for a string consisting of one Java Unicode character converted to the source character encoding.
int	sourceTruncate(String s, int textLimit) Truncate a Java Unicode string so that it fits within the specified number of bytes when converted to its source encoding.
int	sourceWidth(char c) Get the display-column width for one Java Unicode character converted to the source character encoding.
boolean	usingSourceColumns() Convenience method to query whether the view's document is effectively using source columns.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Method Detail

isValidEncoding

public static boolean isValidEncoding(String encoding)

Validate a character encoding.

Parameters: encoding - character encoding to validate
Returns: true = the character encoding is valid and supported by the active Java runtime environment, or
false = the encoding is null or not supported
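
A minimal sketch of such a validity check with java.nio.charset.Charset, assuming that "supported" means Charset.isSupported returns true for the name; this is an illustration, not LPEX's actual implementation.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

/** Sketch: validating a character encoding name against the running JVM. */
final class EncodingValidation {
    private EncodingValidation() {}

    /** True if the encoding name is non-null, syntactically legal, and supported by this JVM. */
    static boolean isValidEncoding(String encoding) {
        if (encoding == null) {
            return false;                 // Charset.isSupported(null) would throw IllegalArgumentException
        }
        try {
            return Charset.isSupported(encoding);
        } catch (IllegalCharsetNameException e) {
            return false;                 // name contains illegal characters, e.g. spaces
        }
    }
}
```

Both java.nio canonical names ("UTF-8") and registered java.io aliases ("ISO8859_1") are accepted, since Charset lookup also resolves aliases.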

isMbcsEncoding

public static boolean isMbcsEncoding(String encoding)

Determine whether a character encoding is DBCS/MBCS. The given encoding is checked against internal tables of DBCS/MBCS character encodings. You must also use isValidEncoding(java.lang.String) to ensure that this character encoding is supported by the active Java runtime environment.

Parameters: encoding - the canonical name (java.io, java.lang) of a character encoding

isEucEncoding

public static boolean isEucEncoding(String encoding)

Determine whether a character encoding is EUC (AIX MBCS). The given encoding is checked against an internal table of EUC character encodings. You must also use isValidEncoding(java.lang.String) to ensure that this character encoding is supported by the active Java runtime environment.

Parameters: encoding - the canonical name (java.io, java.lang) of a character encoding

isSosiEncoding

public static boolean isSosiEncoding(String encoding)

Determine whether a character encoding uses SO/SI control characters (is an EBCDIC DBCS character encoding). The given encoding is checked against an internal table of EBCDIC DBCS character encodings. You must also use isValidEncoding(java.lang.String) to ensure that this character encoding is supported by the active Java runtime environment.

Parameters: encoding - the canonical name (java.io, java.lang) of a character encoding

isBidiEncoding

public static boolean isBidiEncoding(String encoding)

Determine whether a character encoding is bidirectional. The given encoding is checked against an internal table of bidi (Arabic and Hebrew) character encodings. You must also use isValidEncoding(java.lang.String) to ensure that this character encoding is supported by the active Java runtime environment.

Parameters: encoding - the canonical name (java.io, java.lang) of a character encoding
See Also: isBidi()
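
The four table checks above can be sketched as plain set lookups on java.io canonical names. The table contents below are a small illustrative sample chosen for this sketch, not the tables LPEX actually uses; as the text says, a name should also pass an isValidEncoding-style check before it is used.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Sketch: classifying encodings by exact lookup of java.io canonical names in sample tables. */
final class EncodingTables {
    private EncodingTables() {}

    // Illustrative samples only; real tables would be much longer.
    private static final Set<String> SOSI = names("Cp930", "Cp933", "Cp935", "Cp937", "Cp939");
    private static final Set<String> EUC = names("EUC_JP", "EUC_KR", "EUC_CN", "EUC_TW");
    private static final Set<String> OTHER_MBCS = names("MS932", "SJIS", "MS936", "MS949", "MS950", "Big5");
    private static final Set<String> BIDI =
        names("Cp420", "Cp424", "Cp862", "Cp864", "Cp1255", "Cp1256", "ISO8859_6", "ISO8859_8");

    private static Set<String> names(String... n) {
        // HashSet tolerates contains(null), so a null name simply is not found
        return Collections.unmodifiableSet(new HashSet<>(Arrays.asList(n)));
    }

    static boolean isSosiEncoding(String e) { return SOSI.contains(e); }

    static boolean isEucEncoding(String e) { return EUC.contains(e); }

    /** EBCDIC DBCS and EUC encodings are also DBCS/MBCS. */
    static boolean isMbcsEncoding(String e) {
        return isSosiEncoding(e) || isEucEncoding(e) || OTHER_MBCS.contains(e);
    }

    static boolean isBidiEncoding(String e) { return BIDI.contains(e); }
}
```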

getNativeEncoding

public static String getNativeEncoding()

Retrieve the native (platform's default) character encoding. The native encoding is normally determined from the "file.encoding" Java system property.

Returns: the canonical name (java.io, java.lang) of the native encoding
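
The sketch below shows where the two kinds of encoding name come from in the JDK: the "file.encoding" system property, the java.io (historical) name that InputStreamReader.getEncoding() reports for a charset, and the java.nio canonical name returned by Charset.name(). The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

/** Sketch: native encoding lookup, and java.io versus java.nio names of a charset. */
final class NativeEncoding {
    private NativeEncoding() {}

    /** Value of the "file.encoding" system property, or the default charset's name if it is unset. */
    static String fromProperty() {
        String enc = System.getProperty("file.encoding");
        return enc != null ? enc : Charset.defaultCharset().name();
    }

    /** The java.io (historical) name of a charset, as reported by InputStreamReader.getEncoding(). */
    static String javaIoName(Charset cs) {
        // Left open on purpose: getEncoding() returns null once the reader is closed,
        // and a reader over a ByteArrayInputStream holds no system resources.
        InputStreamReader reader = new InputStreamReader(new ByteArrayInputStream(new byte[0]), cs);
        return reader.getEncoding();
    }

    public static void main(String[] args) {
        Charset def = Charset.defaultCharset();
        System.out.println("file.encoding property: " + fromProperty());
        System.out.println("java.io name:           " + javaIoName(def));
        System.out.println("java.nio canonical:     " + def.name());
    }
}
```

For example, the Windows Latin-1 charset has the java.io name "Cp1252" but the java.nio canonical name "windows-1252". Note that since JDK 18 the default charset is UTF-8 regardless of locale, so on newer runtimes "file.encoding" may not reflect the platform's native encoding.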

getFileEncoding

public String getFileEncoding()

Retrieve the character encoding of the document's underlying file. This is usually the native (platform's default) character encoding.

Returns: the canonical name (java.io, java.lang) of the file encoding

getSourceEncoding

public String getSourceEncoding()

Retrieve the source character encoding of the document. This is the character encoding of the document's underlying file on its origin/target platform.

The source character encoding either defaults to the file character encoding (usually the platform's default encoding), or has been explicitly set with the set sourceEncoding command for files brought over from a remote system.

Returns: the canonical name (java.io, java.lang) of the source encoding

isSourceMbcs

public boolean isSourceMbcs()

Query whether the source character encoding of the document is DBCS/MBCS.

isSourceSosi

public boolean isSourceSosi()

Query whether the source character encoding of the document is EBCDIC DBCS.

usingSourceColumns

public boolean usingSourceColumns()

Convenience method to query whether the view's document is effectively using source columns. This method returns true when the document source encoding (as set in the current.sourceEncoding parameter) is DBCS, MBCS, or Arabic Cp420 in CCSID 420 (as set in the current.sourceCcsid parameter), and the current.useSourceColumns setting for the document is on.

In these cases, document column positions, text lengths, etc., as normally calculated by the editor's internal Unicode text processing, may differ from those given by the byte-oriented text processing of the underlying file in its source encoding.

displayingSosi

public boolean displayingSosi()

Query whether the view is displaying emulation SO/SI controls. The method returns true when the source encoding of the document is EBCDIC DBCS and the current showSosi parameter is on.

encodingLength

public static int encodingLength(String s,
                                 String encoding)

Get the byte-length of a Java Unicode string in the specified character encoding. For certain character encodings the length returned includes control bytes. For example, for EBCDIC DBCS encodings the length includes the SO/SI control characters; for UTF-16, it includes the byte-order mark.

Parameters: s - Java Unicode string; encoding - character encoding
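
A minimal sketch of this measurement with String.getBytes, assuming the byte count of the JDK converter's output is what is wanted; it shows the byte-order mark being counted for "UTF-16". This is an illustration, not the LPEX implementation.

```java
import java.io.UnsupportedEncodingException;

/** Sketch: byte-length of a string in a named encoding, control bytes included. */
final class EncodingLength {
    private EncodingLength() {}

    /**
     * Byte-length of s in the named encoding, including any control bytes the Java converter
     * emits (SO/SI for EBCDIC DBCS, the byte-order mark for "UTF-16").
     */
    static int encodingLength(String s, String encoding) throws UnsupportedEncodingException {
        return s.getBytes(encoding).length;
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        System.out.println(encodingLength("abc", "US-ASCII"));  // 3
        System.out.println(encodingLength("\u00e9", "UTF-8"));  // 2
        System.out.println(encodingLength("a", "UTF-16"));      // 4: 2-byte BOM + 2 bytes for 'a'
    }
}
```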

encodingLength

public static int encodingLength(char c,
                                 String encoding)

Get the byte-length for a string consisting of one Java Unicode character converted to the specified character encoding. For an EBCDIC DBCS character, this method returns 2 (i.e., the length of the two-byte character itself, without the SO/SI controls). For other character encodings, the length returned may include control bytes.

Parameters: c - Java Unicode character; encoding - character encoding
Returns: 1 if c converts to a single-byte character;
2 if c converts to a double-byte character;
n if c converts to a multi-byte character
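
The sketch below approximates this single-character variant by encoding the character on its own and, when the converter has wrapped it in SO (0x0E) and SI (0x0F), not counting those two bytes. Detecting the controls by byte value is an assumption of this sketch; LPEX may determine it differently. With an EBCDIC DBCS charset such as IBM939 (if the running JVM provides it), a Japanese DBCS character should then measure 2.

```java
import java.nio.charset.Charset;

/** Sketch: byte-length of one character, excluding SO/SI wrapped around it by the converter. */
final class CharEncodingLength {
    private CharEncodingLength() {}

    private static final byte SO = 0x0E;
    private static final byte SI = 0x0F;

    /**
     * Byte-length of c in the given encoding. If the encoded bytes are SO ... SI (EBCDIC DBCS),
     * the two control bytes are not counted; other control bytes (such as a UTF-16
     * byte-order mark) are still included.
     */
    static int encodingLength(char c, Charset encoding) {
        byte[] b = String.valueOf(c).getBytes(encoding);
        if (b.length >= 3 && b[0] == SO && b[b.length - 1] == SI) {
            return b.length - 2;
        }
        return b.length;
    }
}
```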

sourceLength

public int sourceLength(char c)

Get the byte-length for a string consisting of one Java Unicode character converted to the source character encoding. If the source encoding is not DBCS/MBCS, this method returns 1; for an EBCDIC DBCS character it returns 2 (i.e., for the two-byte character itself, without the SO/SI controls).

Parameters: c - Java Unicode character
Returns: 1 if the source is not a DBCS/MBCS encoding, or if c converts to a single-byte character;
2 if the source character is double-byte;
n if the source character is multi-byte

sourceWidth

public int sourceWidth(char c)

Get the display-column width for one Java Unicode character converted to the source character encoding.

Currently, this method effectively calls encodingLength(c, sourceEncoding). This may not be appropriate for certain character encodings (such as EUC).

Parameters: c - Java Unicode character
See Also: encodingLength(char,String)

encodingCharIndex

public static int encodingCharIndex(String s,
                                    int index,
                                    String encoding)

Return the character index into the encoded string (i.e., as converted from the given Java Unicode string using the specified character encoding), which corresponds to the specified index into the string.

If the encoding is EBCDIC DBCS, the index returned is positioned away from an SO/SI control character.

Parameters: s - Java Unicode text string; index - ZERO-based index into s; encoding - character encoding
Returns: ZERO-based index into the encoded string
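
A simplified sketch of this mapping for stateless encodings: the byte index at which character index begins is the encoded length of the prefix before it. The SO/SI repositioning described above for EBCDIC DBCS is deliberately left out, as is any byte-order mark, and the class name is hypothetical.

```java
import java.nio.charset.Charset;

/** Sketch: Unicode character index to byte index in the encoded string. */
final class EncodingIndex {
    private EncodingIndex() {}

    /**
     * ZERO-based byte index, within s encoded with cs, at which the character at ZERO-based
     * index 'index' of s begins. Assumes a stateless encoding without a byte-order mark.
     */
    static int encodingCharIndex(String s, int index, Charset cs) {
        return s.substring(0, index).getBytes(cs).length;
    }
}
```

For "a", an accented e and "b" in UTF-8, character index 2 maps to byte index 3, because the accented e occupies two bytes.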

indexFromEncodingIndex

public static int indexFromEncodingIndex(String s,
                                         int index,
                                         String encoding)

Return the index into a Java Unicode text string which corresponds to the specified index into its encoded string (i.e., as converted using the specified character encoding).

Parameters: s - Java Unicode String; index - ZERO-based index into the encoded string of s; encoding - character encoding
Returns: ZERO-based index into s
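
The inverse mapping can be sketched by walking the string one code point at a time and accumulating encoded lengths until the byte index is passed. It makes the same assumption of a stateless, BOM-free encoding, and the class name is hypothetical. How LPEX treats a byte index that falls inside a multi-byte character is not documented here; this sketch simply returns the index of that character.

```java
import java.nio.charset.Charset;

/** Sketch: byte index in the encoded string back to a Unicode character index. */
final class EncodedIndexMapping {
    private EncodedIndexMapping() {}

    /**
     * ZERO-based index into s of the character whose encoded bytes contain the ZERO-based
     * byte index; returns s.length() when byteIndex is at or past the end of the encoding.
     * Walks by code point, so a surrogate pair is measured as one character.
     */
    static int indexFromEncodingIndex(String s, int byteIndex, Charset cs) {
        int bytesSoFar = 0;
        int i = 0;
        while (i < s.length()) {
            int next = i + Character.charCount(s.codePointAt(i));
            bytesSoFar += s.substring(i, next).getBytes(cs).length;
            if (bytesSoFar > byteIndex) {
                return i;                 // byteIndex falls inside the character starting at i
            }
            i = next;
        }
        return s.length();
    }
}
```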

sourceTruncate

public int sourceTruncate(String s,
                          int textLimit)

Truncate a Java Unicode string so that it fits within the specified number of bytes when converted to its source encoding. Used, e.g., by text-limit operations when the useSourceColumns setting is on and the source encoding is MBCS, DBCS, or Arabic Cp420 with CCSID 420. It may not work correctly for other character encodings.

Parameters: s - Java Unicode String; textLimit - maximum number of bytes in the source string
Returns: length of the text String s, optionally truncated so that it fits textLimit bytes in its source encoding
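
One simple way to sketch this truncation is to grow the prefix one code point at a time and re-encode it, stopping before it exceeds textLimit bytes. Re-encoding the whole prefix means control bytes the converter adds (such as the closing SI in EBCDIC DBCS) are counted, but the Arabic Cp420 lamalef transformation is not modelled. This sketch takes the charset as a parameter instead of using a view's source encoding, and is not LPEX's algorithm.

```java
import java.nio.charset.Charset;

/** Sketch: longest prefix of a string that fits a byte limit in a given encoding. */
final class SourceTruncate {
    private SourceTruncate() {}

    /**
     * Length (in chars) of the longest prefix of s whose encoding in cs fits in textLimit bytes;
     * the prefix always ends on a code-point boundary. Quadratic in s.length(), which is
     * acceptable for a single text line.
     */
    static int sourceTruncate(String s, int textLimit, Charset cs) {
        int end = 0;
        while (end < s.length()) {
            int next = end + Character.charCount(s.codePointAt(end));
            if (s.substring(0, next).getBytes(cs).length > textLimit) {
                break;                    // adding this character would exceed the limit
            }
            end = next;
        }
        return end;
    }
}
```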

addSourceSosi

public String addSourceSosi(String s)

Add emulation SO/SIs to a Java Unicode string originating from, or targeted for, an EBCDIC DBCS source character encoding. If the source encoding is not EBCDIC DBCS, the original string is returned unchanged.

The SO/SI characters added are those defined by the current.shiftOutCharacter and current.shiftInCharacter editor parameters in this view.

Parameters: s - Java Unicode String
See Also: addSourceSosi(String,char,char)

addSourceSosi

public String addSourceSosi(String s,
                            char shiftOut,
                            char shiftIn)

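Add emulation SO/SIs to a Java Unicode string originating from, or targeted for, an EBCDIC DBCS source character encoding, using the specified SO and SI characters.

Parameters: s - Java Unicode String; shiftOut - SO control character to use; shiftIn - SI control character to use
See Also: addSourceSosi(String)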
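
A sketch of the SO/SI insertion: a shift-out character is emitted before each run of DBCS characters and a shift-in character after it. The isDoubleByte predicate is a hypothetical stand-in for checking each character against the EBCDIC DBCS source encoding, and the '<' and '>' used below are arbitrary visible placeholders for the shiftOut and shiftIn characters.

```java
import java.util.function.IntPredicate;

/** Sketch: inserting emulation SO/SI characters around runs of DBCS characters. */
final class SosiEmulation {
    private SosiEmulation() {}

    /**
     * Inserts shiftOut before each run of DBCS characters and shiftIn after it.
     *
     * @param isDoubleByte hypothetical predicate: is this character DBCS in the source encoding?
     */
    static String addSourceSosi(String s, char shiftOut, char shiftIn, IntPredicate isDoubleByte) {
        StringBuilder sb = new StringBuilder(s.length() + 8);
        boolean inDbcsRun = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean dbcs = isDoubleByte.test(c);
            if (dbcs != inDbcsRun) {
                sb.append(dbcs ? shiftOut : shiftIn);   // a run starts or ends here
                inDbcsRun = dbcs;
            }
            sb.append(c);
        }
        if (inDbcsRun) {
            sb.append(shiftIn);                         // close a run at the end of the string
        }
        return sb.toString();
    }
}
```

With the demo predicate c -> c > 0xFF, addSourceSosi("ab\u3042\u3044c", '<', '>', c -> c > 0xFF) returns "ab<\u3042\u3044>c".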

isBidi

public static boolean isBidi()

Query whether the editor is running in a bidirectional environment which it can handle. This method checks whether the native character encoding is bidirectional (Arabic, Hebrew), and whether LPEX supports it.

Currently, only SWT LPEX running on MS Windows platforms (Arabic and Hebrew) provides adequate bidirectional functionality, making use of the underlying Eclipse technology support.
