draft-ietf-idn-udns-01.txt

     
Internet Draft                                            Dan Oscarsson
draft-ietf-idn-udns-01.txt                                Telia ProSoft
Updates: RFC 2181, 1035, 1034, 2535                       27 August 2000
Expires: 27 February 2001

   Using the Universal Character Set in the Domain Name System (UDNS)

Status of this memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

     The list of current Internet-Drafts can be accessed at
     http://www.ietf.org/ietf/1id-abstracts.txt

     The list of Internet-Draft Shadow Directories can be accessed at
     http://www.ietf.org/shadow.html.


Abstract

   Since the Domain Name System (DNS) [RFC1035] was created there have
   been a desire to use other characters than ASCII in domain names.
   Lately this desire have grown very strong and several groups have
   started to experiment with non-ASCII names.

   This document defines how the Universal Character Set (UCS)
   [ISO10646] can be used in DNS without extending the current [RFC1035]
   protocol and how DNS is extended to overcome length limits in the
   future.


1. Introduction

   While the need for non-ASCII domain names have existed since the
   creation of the DNS, the need have increased very much during the
   last few years. Currently there are at least two implementations
   using UTF-8 in use, and others using other methods.




Dan Oscarsson          Expires: 27 Februray 2001                [Page 1]

Internet Draft               Universal DNS                27 August 2000


   To avoid several different implementations of non-ASCII names in DNS
   that do not work together, and to avoid breaking the current ASCII
   only DNS, there is an immediate need to standardise how DNS shall
   handle non-ASCII names.

   While the DNS protocol allow any octet in character data, so far the
   octets are only defined for the ASCII code points. Octets outside the
   ASCII range have no defined interpretation. This document defines how
   all octets are to be used in character data allowing a standardised
   way to use non-ASCII in DNS.

   To support the software where only ASCII host and domain names are
   allowed, this document defines how resource records are to be
   returned in a response to avoid breaking that software.

   The specification here conforms to the IDN requirements [IDNREQ].

1.1 Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

    IDN: Internationalised Domain Name, here used to mean a domain name
      containing non-ASCII characters.

    ACE: ASCII Compatible Encoding. Used to encode IDNs in a way
      compatible with the ASCII host name syntax.

1.2 Previous versions of this document

   The second version of this document was available as draft-ietf-idn-
   udns-00.txt. It included a lot of possibilities as well as a flag bit
   that is now removed.

   The first version of this document was available as draft-oscarsson-
   i18ndns-00.txt.


2. The DNS Protocol

   The DNS protocol is used when communicating between DNS servers and
   other DNS servers or DNS clients. User interface issues like the
   format of zone files or how to enter or display domain names are not
   part of the protocol.

   The update of the protocol defined here can be used immediately as it
   is fully compatible with the DNS of today.



Dan Oscarsson          Expires: 27 Februray 2001                [Page 2]

Internet Draft               Universal DNS                27 August 2000


2.1 Character data

   Character data need to be able to represent as much as possible of
   the characters in the world as well as being compatible with ASCII.
   It must also be well defined so that it can easily be handled and
   should be compact as only 63 octets is available without an extension
   of the protocol. Character data is used in labels and in text fields
   in the RDATA part of a RR.

   Character data used in the DNS protocol MUST:
    - Use ISO 10646 (UCS) [ISO10646] as coded character set.
    - Be normalised using form C as defined in Unicode technical report
      #15 [UTR15]. See also [CHNORM].
    - Encoded using the UTF-8 [RFC2279] character encoding scheme.

2.2 Name matching

   RFC 1035 states that the labels of a name are matched case-
   insensitively.  When using UCS this is no longer enough as there are
   other forms than case that need to match as equivalent.

   The original definition is now extended to be: labels must be
   compared using form-insensitivity.

   For the UCS character code range 0-255 (ASCII and ISO 8859-1) the
   case folding MUST be done by case-insensitive matching following the
   one to one mapping as defined in the Unicode 3.0 Character Database
   [UDATA].

   How to do form-insensitive matching for the rest of UCS will be
   defined in a separate document.

2.2.1 Rules for matching of domain names in DNS servers

   To be able to handle correct domain name matching in lookups, the
   following MUST be followed by DNS servers:
    - Do matching on authorative data using form-insensitive matching
      for the characters used in the data (for example a zone using only
      ASCII need only handle matching of ASCII characters).
    - On non-authorative data, either do binary matching or case-
      insensitive matching on ASCII letters and binary matching on all
      others.

   The effect of the above is:
    - only servers handling authorative data must implement form-
      insensitive matching of names. And they need only implement the
      subset needed for the subset of characters of UCS they support in
      their authorative zones.



Dan Oscarsson          Expires: 27 Februray 2001                [Page 3]

Internet Draft               Universal DNS                27 August 2000


    - it normally gives fast lookup because data is usually sent like:
      resolver <-> server <-> authorative server.
      While form-insensitive matching can be complex and CPU consuming,
      the server in the middle will do caching with only simple and fast
      binary matching. So the impact of complex matching rules should
      not slow down DNS very much.

2.3 Supporting older software and allowing for ASCII aliases.

   As there is a lot of software expecting host and domain names to only
   use a subset of ASCII, they may work incorrectly if receiving a
   response with non-ASCII characters. And when communicating between
   nations it is sometimes good to also have a version of a name that
   can be used by most people.

   To support this the following MUST be followed:
    - Queries for PTR records must return two records if the name
      pointed to includes non-ASCII. They may also return two records if
      an alternative name exist for the object pointed to.
      The two records MUST be ordered with the ASCII version of the name
      first and the non-ASCII or true name second. The second record
      defines the true name of the object, the first record an ASCII
      version of the name.

      Note: older software will normally stop analysing a response when
      finding the first PTR record so they will get the ASCII name.
      Newer software can select the name best suited for its needs.
    - Queries for other records with non-ASCII in the RDATA section MUST
      return an ASCII version also, unless the client is known to handle
      non-ASCII.

   At a future date IETF can decide that it is no longer necessary to
   support the software only handling ASCII names, and the servers can
   stop including ASCII versions in the responses.

   NOTE: a cache server shall return data in the same way as an
   authorative server. If some do not and change the order of the PTR
   records, some old software will not get the ASCII version of the
   name.

2.3.1 ASCII versions of a name

   When returning an ASCII version of a name, there are two
   possibilities:  returning a user defined ASCII alias or an ASCII
   compatible encoding (ACE) of the name.

   The ASCII Compatible Encoding (ACE) is used to support older software
   expecting only ASCII and to support downgrading from 8-bit to 7-bit



Dan Oscarsson          Expires: 27 Februray 2001                [Page 4]

Internet Draft               Universal DNS                27 August 2000


   ASCII in other protocols (like SMTP). It is a transition mechanism
   and will no longer be supported at some future time when it is so
   decided.

   All software following this specification MUST recognise ACE and
   decode them into their true name when doing matching and handling. A
   DNS server must recognise ACE in a query.

   The definition of the ACE to be used, is defined in a separate
   document.  Typical definitions that are suitable are [SACE] and
   [RACE].

   NOTE: To support the transition to UTF-8 in resolver code, it is
   recommended that a server recognise local encodings for the zones it
   is authorative for. This will allow clients using the local character
   set in many cases even before the resolver code is upgraded.


2.4 Handling long names

   The current DNS protocol limits a label to 63 octets. As UTF-8 take
   more than one octet for some characters, an UTF-8 name cannot have 63
   characters in a label like an ASCII name can. For example a name
   using Hangul would have a maximum of 21 characters.

   The limits imposed by RFC 1035 is 63 octets per label and 255 octets
   for the full name. The 255 limit is not a protocol limit but one to
   simplify implementations.

   To support longer names a long label type is defined using [RFC2671]
   as extended label 0b000011 (the label type will be assigned by IANA
   and may not be the number used here).

                                 1 1 1 1 1 1 1 1 1 1
             0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
            |0 1 0 0 0 0 1 1|  length       |  label data ...
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-

   length: length of label in octets
   label data: the label

   The long label MUST be handled by all software following this
   specification.  Also, they MUST support a UDP packet size of up to
   1280 bytes.

   The limits for labels are updated since RFC 1025 as follows:
   A label is limited to a maximum of 63 character code points in UCS



Dan Oscarsson          Expires: 27 Februray 2001                [Page 5]

Internet Draft               Universal DNS                27 August 2000


   normalised using Unicode form C.  The full name is limited to a
   maximum of 255 character code points normalised as for a label.

   As long labels are not understood by older software, a response MUST
   not include a long label unless the query did. At a later date, IETF
   may change this.

2.5 Handling to large responses and identifying non-ASCII clients

   If a client sends the QNAME in the query using the long label type,
   the client shows that it implements this specification and do not
   need ASCII compatibility.

   If the client is not identified to follow this specification, the UDP
   packet size is limited to 512 bytes. If then a response will not fit,
   the response MUST set the TC bit (truncated) to indicate that. A
   client may then resend the query using a long label in the query to
   show that it can handle larger responses.

2.6 DNSSEC

   As labels now can have non-ASCII in them, DNSSEC [RFC2535] need to be
   revised so that it also can handle that.

3. User interface issues

   Locally on a system or in a user interface a different character set
   than the one defined to be used in the DNS protocol may be used.
   Therefore software must map between the local character set and the
   character set of the protocol, so that human beings can understand
   it.

   This means that a zone file that is edited in a text editor by a
   person before being loaded into a DNS server must be allowed to be in
   the local character set. Software may not assume that the user can
   edit text encoded in UTF-8. A zone file transmitted between DNS
   software that is not handled by a human, can be transmitted using any
   format.

   When character data is presented to a human or entered by a human,
   software must, as good as possible, present it using local character
   set and allow it to be entered using the local character set.  It is
   the responsibility of the software to convert between the local
   character set and the one used in the protocol, not the human.

   The down coding defined above allows all names to be entered and
   displayed for all users, as long as at least the ASCII characters are
   supported.



Dan Oscarsson          Expires: 27 Februray 2001                [Page 6]

Internet Draft               Universal DNS                27 August 2000


4.1 Applications using DNS software

   If an application does a call to DNS, it must present the data to the
   users in the local character set used by the user, down coding if
   necessary. Software used to access DNS should give the application
   programmer both the possibility of doing queries and getting
   responses using local character set, and using UTF-8.

   APIs like getipnodebyname should be updated with a IDN flag that
   results in the name being returned using the current locale, instead
   of native UTF-8 or ASCII format.

5. Effect on other protocols

   As now a domain name may include non-ASCII many other protocols that
   include domain names need to be updated. For example SMTP, HTTP and
   URIs. The ACE format can be used when interfacing with ASCII only
   software or protocols.  Protocols like SMTP could be extended using
   ESMTP and a UTF8 option that defines that all headers are in UTF-8.

   It is recommended that protocols updated to handle i18n do this by
   encoding character data in the same standard format as defined for
   DNS in this document (UCS normalised form C). The use of encoding it
   in ASCII or by tagged character sets should be avoided.

   DNS do not only have domain names in them, for example e-mail
   addresses are also included. So an e-mail address would be expected
   to be changed to include non-ASCII both before and after the @-sign.

   Software need to be updated to follow the user interface
   recommendations given above, so that a human will see the characters
   in their local character set, if possible.

5.1 An example: SMTP

   When using SMTP it may be extended to allow UTF-8 in headers and
   addresses.  It will then have to, when transferring an e-mail to a
   SMTP system that have not been extended, encoded e-mail addresses and
   IDNs into an ACE.

   In this case an e-mail address could look like:
   ra--XXXXX.surname@ra--YYYYY.com
   where ra--XXXXX is the ACE of the given name and ra--YYYYY is the ACE
   of one part of the domain name.

6. Security Considerations

   As always with data, if software does not check for data that can be



Dan Oscarsson          Expires: 27 Februray 2001                [Page 7]

Internet Draft               Universal DNS                27 August 2000


   a problem, security may be affected. As more characters than ASCII is
   allowed, software only expecting ASCII and with no checks may now get
   security problems.

7. References

   [RFC1034]  P. Mockapetris, "Domain Names - Concepts and Facilities",
              STD 13, RFC 1034, November 1987.

   [RFC1035]  P. Mockapetris, "Domain Names - Implementation and
              Specification", STD 13, RFC 1035, November 1987.

   [RFC2119]  Scott Bradner, "Key words for use in RFCs to Indicate
              Requirement Levels", March 1997, RFC 2119.

   [RFC2181]  R. Elz and R. Bush, "Clarifications to the DNS
              Specification", RFC 2181, July 1997.

   [RFC2279]  F. Yergeau, "UTF-8, a transformation format of ISO 10646",
              RFC 2279, January 1998.

   [RFC2535]  D. Eastlake, "Domain Name System Security Extensions".
              RFC 2535, March 1999.

   [RFC2671]  P. Vixie, "Extension Mechanisms for DNS (EDNS0)", RFC
              2671, August 1999.

   [ISO10646] ISO/IEC 10646-1:2000. International Standard --
              Information technology -- Universal Multiple-Octet Coded
              Character Set (UCS)

   [Unicode]  The Unicode Consortium, "The Unicode Standard -- Version
              3.0", ISBN 0-201-61633-5. Described at
              http://www.unicode.org/unicode/standard/versions/
              Unicode3.0.html

   [UTR15]    M. Davis and M. Duerst, "Unicode Normalization Forms",
              Unicode Technical Report #15, Nov 1999,
              http://www.unicode.org/unicode/reports/tr15/.

   [UTR21]    M. Davis, "Case Mappings", Unicode Technical Report #21,
              Dec 1999, http://www.unicode.org/unicode/reports/tr21/.

   [UDATA]    The Unicode Character Database,
              ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt.
              The database is described in
              ftp://ftp.unicode.org/Public/UNIDATA/
              UnicodeCharacterDatabase.html.



Dan Oscarsson          Expires: 27 Februray 2001                [Page 8]

Internet Draft               Universal DNS                27 August 2000


   [IDNREQ]   James Seng, "Requirements of Internationalized Domain
   Names", draft-ietf-idn-requirement.

   [IANADNS]  Donald Eastlake, Eric Brunner, Bill Manning, "Domain Name
   System (DNS) IANA Considerations",draft-ietf-dnsext-iana-dns.

   [IDNE]     Marc Blanchet,Paul  Hoffman, "Internationalized domain
   names using EDNS (IDNE)", draft-ietf-idn-idne.

   [CHNORM]   M. Duerst, M. Davis, "Character Normalization in IETF
   Protocols", draft-duerst-i18n-norm.

   [IDNCOMP]  Paul Hoffman, "Comparison of Internationalized Domain Name
   Proposals", draft-ietf-idn-compare.

   [NAMEPREP] Paul Hoffman, "Comparison of Internationalized Domain Name
   Proposals", draft-ietf-idn-compare.

   [SACE]     Dan Oscarsson, "Simple ASCII Compatible Encoding", draft-
   ietf-idn-sace.

   [RACE]     Paul Hoffman, "RACE: Row-based ASCII Compatible Encoding
   for IDN", draft-ietf-idn-race.

8. Acknowledgements

   Paul Hoffman giving many comments in our e-mail discussions.

   Ideas from drafts by Paul Hoffman, Stuart Kwan, James Gilroy and Kent
   Karlsson.

   Magnus Gustavsson, Mark Davis, Kent Karlsson and Andrew Draper for
   comments on my draft.

   Discussions and comments by the members of the IDN working group.



Author's Address

   Dan Oscarsson
   Telia ProSoft AB
   Box 85
   201 20 Malmo
   Sweden

   E-mail: Dan.Oscarsson@trab.se




Dan Oscarsson          Expires: 27 Februray 2001                [Page 9]

Internet Draft               Universal DNS                27 August 2000





















































Dan Oscarsson          Expires: 27 Februray 2001               [Page 10]