draft-klensin-dns-role-00.txt

     
INTERNET-DRAFT                                John C. Klensin
November 10, 2000
Expires May 2001


                  Role of the Domain Name System
                 draft-klensin-dns-role-00.txt

Status of this Memo

This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups.  Note that
other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time.  It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

This document represents a summary of the personal opinions of the
author on the subject covered and is not intended to evolve into a
standard of any kind.

Copyright Notice

Copyright (C) The Internet Society (2000).  All Rights Reserved.



0. Abstract

The original function and purpose of the DNS is reviewed, and
contrasted with some of the functions into which it is being forced
today and some of the newer demands being placed upon it or suggested
for it.  A framework for an alternative to placing these additional
stresses on the DNS is then outlined.  This document and that
framework are not a proposed solution, only a strong suggestion that
the time has come to begin thinking more broadly about the problems
we are encountering and possible approaches to solving them.


1. History

Several of the comments that follow are somewhat revisionist.  Good
design and engineering often require a level of intuition by the
designers about things that will be necessary in the future; the
reasons for some of these design decisions are not made explicit at
the time because no one is able to articulate them.  The discussion
below reconstructs some of the decisions about the Internet's primary
namespace in the light of subsequent development and experience.  In
addition, the historical reasons for particular decisions about the
Internet were often severely underdocumented contemporaneously and,
not surprisingly, different participants have different recollections
about what happened and what was considered important.  Consequently,
the quasi-historical story below is just one story.  There may be
(indeed, almost certainly are) other stories about how we got to
where we are today, but they probably don't, of themselves,
invalidate the inferences and conclusions.

1.1  Context for DNS development

During the entire life of the ARPANET and nearly the first decade or
so of operation of the Internet, the list of host names and their
mapping to and from addresses was maintained in a frequently-updated
"host table" [RFC625, 811, 952].  This table was just a list in an
agreed-upon format; sites were expected to frequently obtain copies
of, and install, new versions.  The host tables themselves were
introduced to

  * Eliminate the requirement for people to remember host numbers
  (addresses).  Despite apparent experience to the contrary in the
  conventional telephone system, numeric identification systems, including
  the numeric host number strategy, did not (and do not) work well for
  more than a (large) handful of hosts.

  * Provide stability when addresses changed.  Since addresses --to
  some degree in the ARPANET and more importantly in the contemporary
  Internet-- are a function of network topology and routing, they
  often had to be changed when connectivity or topology changed.  The
  names could be kept stable even as addresses changed.

  * Accommodate hosts (so-called "multihomed" ones) that needed
  multiple addresses to reflect different types of connectivity and
  topology.  Again, the names were very useful for avoiding the
  requirement that would otherwise exist for users and other hosts to
  track these multiple host numbers and addresses.

Toward the end of that long (in network time) period, the community
concluded that the host table model did not scale adequately and that
it would not adequately support new service variations.  A working
group was created, and the DNS was the result of that effort.  The
role of the DNS was to preserve the capabilities of the host table
arrangements (especially unique, unambiguous, host names), provide
for the addition of new services (e.g., the special record types
for electronic mail routing which rather quickly followed
introduction of the DNS), and to do so on the base of a robust,
hierarchical, distributed, name lookup system.  That system also
permitted distribution of name administration, rather than requiring
that each host be entered into a single, central, table by a central
administration.
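
The host-table model described above can be sketched in a few lines.
This is only an illustration under stated assumptions: the table
contents, the function names, and the in-memory layout are
hypothetical and do not reproduce the actual host table file format.
It shows the three properties listed earlier: lookup by name rather
than number, names that stay stable when an address changes, and one
name covering several addresses on a multihomed host.

```python
# Hypothetical, simplified host table: one name may map to several
# addresses (a "multihomed" host).  Names and addresses are made up.
HOST_TABLE = {
    "alpha-host": ["10.0.0.5"],
    "beta-gateway": ["10.1.0.9", "192.0.2.17"],  # multihomed
}


def normalize(name):
    """Host names are compared without regard to case."""
    return name.strip().lower()


def addresses_for(name, table=HOST_TABLE):
    """Forward lookup: name -> list of addresses (empty if unknown)."""
    return list(table.get(normalize(name), []))


def names_for(address, table=HOST_TABLE):
    """Reverse lookup: address -> names, by scanning the whole table."""
    return sorted(n for n, addrs in table.items() if address in addrs)


def renumber(name, old, new, table=HOST_TABLE):
    """Change one address in place; the name itself stays stable."""
    addrs = table[normalize(name)]
    addrs[addrs.index(old)] = new
```

Every site holds a full copy of such a table and every change has to
reach every copy, which is the scaling problem that motivated a
distributed, hierarchical lookup system.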

1.2 Review of the DNS

The DNS was designed primarily to identify network resources.
Although there was speculation about including, e.g., personal names
and email addresses, it was not designed primarily to identify
people, brands, etc.  At the same time, the system was designed with
the flexibility to accommodate new data types and structures through
the addition of new record types to the initial "INternet" class.
Since the appropriate identifiers and content of those future
extensions could not be anticipated, the design provided that these
fields could contain any (binary) information, not just the
restricted text forms of the host table.

However, the DNS as-used is intimately tied to the applications and
application protocols that utilize it, often at a fairly low level.

In particular, despite the ability of the protocols and data
structures themselves to accommodate any binary representation, DNS
names as used are historically not [even] ASCII, but a very
restricted subset of it, a subset that derives primarily from the
original host table naming rules.  Selection of that subset was
driven in part by human factors considerations, including a desire to
eliminate possible ambiguities in an international context.  Hence
character codes that had international variations in interpretation
were excluded, the underscore character and case distinctions were
eliminated as being confusing (in the underscore's case, with the
hyphen character) when written or read by people, and so on.  These
considerations appear to be very similar to those that resulted in
similarly restricted character sets being used as protocol elements
in many ITU and ISO protocols (cf. X.9, X.29).
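
As a rough illustration of how restricted that subset is, the
following sketch checks a name against the letters-digits-hyphen
convention.  The function names are hypothetical, and the sketch uses
the later, relaxed form of the rule in which a label may begin with a
digit; the original host table rules required a leading letter.

```python
import re

# Letters, digits and hyphen only; no leading or trailing hyphen.
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")
MAX_LABEL_OCTETS = 63  # DNS limit on the length of a single label


def is_hostname_label(label):
    """True if one label fits the restricted host name subset."""
    return (len(label) <= MAX_LABEL_OCTETS
            and _LABEL.fullmatch(label) is not None)


def is_hostname(name):
    """True if every dot-separated label of the name is acceptable."""
    if name.endswith("."):  # allow a single trailing root dot
        name = name[:-1]
    return all(is_hostname_label(label) for label in name.split("."))


def same_name(a, b):
    """Case distinctions are eliminated when names are compared."""
    return a.lower() == b.lower()
```

Under these rules "my_host.example" is rejected because of the
underscore, and "Example.COM" and "example.com" are the same name.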

Another assumption was that there would be a high ratio of physical
hosts to second level domains and, more generally, that the system
would be deeply hierarchical, with most systems (and names) at the
third level or below and a large ratio of names representing physical
hosts to total names.  There are domains that follow this model: many
university and corporate domains use fairly deep hierarchies, as do a
few country code TLDs (".US" is an excellent example).  However, the
RIPE hostcount list is now showing a count of SOA records that is
approaching (and may have passed) the number of distinct hosts.
While recent experience has shown that the DNS is robust enough
--given contemporary machines as servers and current bandwidth
norms-- to be able to continue to operate reasonably well when those
historical assumptions are not met (e.g., with a huge, flat,
structure under ".COM"), it is still useful to remember that the
system could have been designed to work optimally with a flat
structure (and very large zones) rather than a deeply hierarchical
one, and was not.

Similarly, despite some early speculation about entering people's
names and email addresses into the DNS directly, with the sole
exception (at least in the "IN" class) of one field of the SOA
record, electronic mail addresses in the Internet have preserved the
original, pre-DNS, "user at location" conceptual format rather than a
flatter one.  Location, in that instance, is a reference to a host.

Both the DNS architecture itself and the two-level provisions for
email and similar functions (e.g., see the finger protocol), also
anticipated a relatively high ratio of users to actual hosts.  It was
never clear that the DNS was intended to, or could, scale to the
order of magnitude of number of users (or, more recently, products or
document objects), rather than that of physical hosts.

Like the host table before it, the DNS has provided critical
uniqueness for names and universal accessibility to them as part of
overall "single internet" and "end to end" models (cf. [RFC2826]).
However, there are many signs that, as new uses evolve and original
assumptions are abused, the system is being stretched to, or beyond,
its practical limits.
 
1.3 The web and user-visible domain names

From the standpoint of the integrity of the domain name system --and
scaling of the Internet, including optimal accessibility to content--
the web's design decision to use "A record" domain names, rather than some
system of indirection, has proven to be a serious mistake in several
respects.  Convenience of typing, and the desire to make domain names
out of easily-remembered product names, has led to a flattening of
the DNS, with many people now perceiving that second-level names
under COM (or in some countries, second- or third-level names under
the relevant ccTLD) are all that is meaningful (this perception has
been reinforced by some domain name registrars who have been anxious
to "sell" additional names).  And, of course, the perception that one
needs a top-level domain per product, rather than a (usually
organizational) collection of network resources has led to a rapid
acceleration in the number of names being registered, a phenomenon
that has clearly benefited registrars charging on a per-name basis,
"cybersquatters", and others in the business of "selling" names, but
has not obviously benefited the Internet as a whole.

The emphasis on second-level domain names has also created a problem
for the trademark community.  Since the Internet is international,
and names are being populated in a flat and unqualified space,
similarly-named entities are in conflict even if there would
ordinarily be no chance of confusing them in the marketplace.  The
problem appears to be unsolvable except by a choice between draconian
measures --possibly including significant changes to the underlying
legislation and conventions-- and a situation in which the "rights"
to a name are typically not settled using the subtle and traditional
product (or industry) type and geopolitical scope rules of the
trademark system but by depending largely on main force, e.g., the
organization with the greatest resources to invest in defending (or
attacking) names will ultimately win out.  The latter raises not only
important issues of equity, but the risk of backlash as the numerous
small players are forced to relinquish names they find attractive and
to adopt less-desirable naming conventions.

Independent of these sociopolitical problems, content distribution
issues have made it clear that it should be possible for an
organization to have copies of data it wishes to make available
distributed around the network, with a user who asks for the
information by name getting the topologically-closest copy.  This is
not possible with simple, as-designed, use of the DNS: DNS names
identify target resources or, in the case of email "MX" records, a
preferentially-ordered list of resources "closest" to a target (not
the source/user).  Several technologies (and, in some cases,
corresponding business models) have arisen to work around these
problems, including intercepting and altering DNS requests so as to
point to other locations.

While additional implications are still being discovered and
seriously evaluated, it appears, not surprisingly, that rewriting DNS
names in the middle of the network, or trying to give them different
values or interpretations depending on the topological location of
the user trying to resolve the name, interferes with end-to-end
applications in the general case.  These problems occur even if the
rewriting machinery is accompanied by additional workarounds for
particular applications: security associations and applications that
need to identify "the same host" as the applications for which these
tools have been designed often run into one problem or another.


1.4 A pessimistic history of the evolution of Internet applications
protocols.

At the applications level, few of the protocols in active, widespread
use on the Internet reflect either contemporary knowledge in
computer science or human factors or experience accumulated through
deployment and use.  Instead, protocols tend to be deployed at a
just-past-prototype level, typically including the kinds of expedient
compromises common in prototypes.  If they prove useful, the
nature of the network permits very rapid dissemination (i.e., they
fill a vacuum, even if one that no one previously knew existed).
But, once the vacuum is filled, the installed base provides its own
inertia: unless the design is so seriously faulty as to prevent
effective use (or there is a widely-perceived sense of impending
disaster unless the protocol is replaced), future developments must
maintain backward compatibility and workarounds for problematic
characteristics rather than benefiting from redesign in the light of
experience.  Applications that are "almost good enough" prevent
development and deployment of high-quality replacements.  

2. Signs of DNS overloading

Parts of the historical discussion above identify areas in which it
is increasingly clear that the DNS is becoming overloaded (semantically
if not in the mechanical ability to resolve names).  While we seem to
still be well within the "just about good enough" range -- current
mechanisms and proposals to deal with these problems are all focused
on patching or working around limitations within the DNS rather than
dramatic rethinking -- the number of these issues that are arising
at the same time may argue for rethinking mechanisms and
relationships, not just more patches and kludges.  For example:

o While technical approaches such as larger and higher-powered servers
and more bandwidth, and legal/political mechanisms such as dispute
resolution policies have arguably kept the problems from becoming
critical, the DNS has not proven adequately responsive to business
and individual needs to describe or identify things (such as product
names and names of individuals) other than strict network resources.

o While stacks have been modified to better handle multiple addresses
on a physical interface and some protocols have been extended to
include DNS names for determining context, the DNS doesn't deal
especially well with high-multiple names per host (needed for web
hosting facilities with multiple domains on a server).

o Efforts to add names deriving from languages or character sets
based on other than simple ASCII and English-like names (see below),
or even to utilize complex company or product names without the use
of hierarchy have created apparent requirements for names (labels) that
are over 63 octets long.  This requirement will undoubtedly increase
over time; while there are workarounds to accommodate longer names,
they impose their own restrictions and cause their own problems.

o Increasing commercialization of the Internet, and visibility of
domain names that are assumed to match names of companies or
products, has turned the DNS and DNS names into a trademark
battleground.  The traditional trademark system in (at least) most
countries makes careful distinctions about fields of applicability.
When the space is flattened, without differentiators by either
geography or industry sector, not only are there likely conflicts
between "Joe's Pizza" (of Boston) and "Joe's Pizza" (of San Francisco)
but between both and "Joe's Auto Repair" (of Los Angeles): all three
would like to control "Joes.com" and may claim trademark rights to do
so, even though conflict or confusion would not occur with
traditional trademarks.

o Many organizations wish to have different web sites under the same
URL and domain name.  Sometimes this is to create local variations
--the Widget Company might want to present different material to a UK
user relative to a US one-- and sometimes it is to provide higher
performance by supplying information from the server topologically
closest to the user.  Arguably, the name resolution mechanism should
provide information about multiple sites that can provide information
associated with the same name and sufficient attributes associated
with each of those sites to permit applications to make sensible
choices, or should accept client-site attributes and utilize them in
the search process.

o Many existing and proposed systems for "finding things on the
Internet" require a true search capability in which near matches can
be reported to the user and queries may be slightly ambiguous or
fuzzy.  The DNS can accommodate only one set of (quite rigid) matching
rules.  Current proposals to permit different rules in different
localities help to identify the problem, but, if applied directly to
the DNS, either don't provide the level of flexibility that would be
desirable or tend to isolate different parts of the Internet from
each other (or both).  Fuzzy or ambiguous searches are desirable for
(at least) resolution of business names that might have spelling
variations and for names that can be resolved into different sets of
glyphs depending on context.  This goes beyond "mere"
canonicalization differences (different ways of representing the same
character) and into such relationships as the use of different
alphabets for the same language, Kanji-Hiragana relationships, etc.

o The historical DNS and applications that make assumptions about how
it works impose significant risk (or force technical kludges and
consequent odd restrictions) when one considers adding mechanisms
for use with various multi-character-set and multilingual
"internationalization" systems.  Cf. RFC 2825.

o In order to provide proper functionality to the Internet, the DNS
must have a single unique root (see RFC 2826 for a discussion of this
issue).  There are many desires for local treatment of names or
character sets that cannot be accommodated without either multiple
roots (e.g., a separate root for multilingual names) or mechanisms
that would have similar effects in terms of Internet fragmentation
and isolation.
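
The gap between the single rigid matching rule of the DNS and the
"near match" searches mentioned in the list above can be sketched as
follows.  The registered names are hypothetical, and the similarity
measure (the one in Python's standard difflib module) is only one of
many possible choices, not one this document proposes.

```python
import difflib

# Hypothetical registered names, for illustration only.
REGISTERED = ["joespizza", "joesautorepair", "widgetcompany"]


def dns_style_match(query, names=REGISTERED):
    """Exact match ignoring ASCII case: the one rigid rule."""
    q = query.lower()
    return [n for n in names if n.lower() == q]


def near_matches(query, names=REGISTERED, cutoff=0.8):
    """Search-style lookup: report close candidates, best first."""
    return difflib.get_close_matches(query.lower(), names,
                                     n=5, cutoff=cutoff)
```

A query with a one-letter spelling variation ("joespiza") finds
nothing under the first rule, while the second can report the likely
intended name and leave the final choice to the user.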

In each of these cases, it is, or might be, possible to devise ways
to trick the DNS into supporting mechanisms that were not
designed into it.  Several ingenious solutions have been proposed in
many of these areas already, and some have been successfully deployed
into the marketplace.

Several of the above problems are addressed well by a good directory
system (supported by the LDAP protocol or otherwise) or searching
environment (such as common web search engines) although not by the
DNS.  Given the difficulty of deploying new applications discussed
above, an important question is whether the kludges are bad enough,
or will scale up to bad enough, that new solutions are needed and can
be deployed.


3.  The directory story.

3.1 Overview

The constraints of the DNS argue for introducing an intermediate
protocol mechanism, referred to here as a "directory layer".
Directory layer proposals would use a two-stage lookup, not unlike
several of the IDN proposals, but would do the first lookup in a
directory system, rather than in the DNS itself.  This would permit
us to relax several constraints and produce a more comprehensive
system.

Ultimately, many of the issues with domain names arise as the result
of people attempting to use the DNS as a directory.  While there
hasn't been enough pressure/demand to justify a change to date, it
has already been quite clear that, as a directory system, the DNS is
a good deal less than ideal.  This document suggests that there
actually is a requirement for a directory system, and that the right
solution to a directory requirement is a directory, not a series of
DNS patches, kludges, or workarounds.

In particular...

* A directory system would not require imposition of particular
length limits on names.

* A directory system could permit explicit association of attributes
of, e.g., language and country, with a name, without having to
utilize trick encodings to incorporate that information in DNS labels
(or creating artificial hierarchy for doing so).

* There is considerable experience in doing fuzzy and "sonex"
(similar-sounding) matching in directory systems.  Moreover, it is
plausible to think about different matching rules for different areas
and sets of names so that these can be adapted to local cultural
requirements.  Specifically, it might be possible to have a single
form of a name in a directory, but to have great flexibility about
what queries matched that name (and even have different variations in
different areas).  Of course, the more flexibility one provides, the
greater the possibility of real or imagined trademark conflicts, but
we would have the opportunity to design a directory structure that
dealt with those issues in an intelligent way, while DNS constraints
arguably make a general and equitable DNS-only solution impossible.

* If a directory system is used to translate to DNS names, and then
DNS names are looked up in the normal fashion, it may be possible to
relax several of the constraints that have been traditional (and
perhaps necessary) with the DNS.  For example, reverse-mapping of
addresses to directory names may not be a requirement, since the DNS
name(s) would (continue to) uniquely identify the host.

* Solutions to multilingual transcription problems that are common in
"normal life" (e.g., two-sided business cards to be sure that a
recipient trying to contact a person can access romanized spellings
and numbers when the original language may not be comprehensible to
that recipient) can be easily handled in a directory system by
inserting both sets of entries.

* One can easily imagine a directory system that would return, not a
single name, but a set of names paired with network-locational
information or other context-establishing attributes.  This type of
information might be of considerable use in resolving the "nearest
(or best) server for a particular named resource" problems that are a
significant concern for organizations hosting web and other sites
that are accessed from a wide range of locations and subnets.

* Names bound to countries and languages might help to manage
trademark realities, while use of the DNS in trademark-significant
areas tends to require worldwide "flattening" of the trademark
system.

3.2 Some details and comments.

As several proposals have noted, almost any i18n proposal for names
that are in, or map into, the DNS will require changing DNS resolver
API calls ("gethostbyname" or equivalent), or adding some
pre-resolution preparation mechanism, in almost all Internet
applications -- whether to cause the API to take a different
character set, to accept or return more arguments with qualifying or
identifying information, or otherwise.  Once applications must be
opened to make such changes, it is a relatively small matter to
switch from calling into the DNS to calling a directory service and
then the DNS (in many situations, both actions could be accomplished
in a single API call).
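
A minimal sketch of such a combined call follows, assuming a
directory that maps a user-visible name to a DNS name.  The
directory contents and the function names are hypothetical; only
socket.getaddrinfo is a real library call, and it is used here
merely as the "ordinary DNS lookup" second stage.

```python
import socket

# Hypothetical directory: user-visible name -> DNS name.  The keys
# are not bound by DNS label syntax or length limits.
DIRECTORY = {
    "widget company": "www.widget.example",
}


def dns_addresses(dns_name):
    """Second stage: ordinary resolution via the system resolver."""
    infos = socket.getaddrinfo(dns_name, None)
    return sorted({info[4][0] for info in infos})


def lookup(user_name, directory=DIRECTORY, resolve=dns_addresses):
    """Single call: consult the directory, then resolve its answer."""
    dns_name = directory.get(user_name.strip().lower())
    if dns_name is None:
        return None, []
    return dns_name, resolve(dns_name)
```

The application sees one call, while the DNS name returned by the
first stage remains the unique identifier of the host.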

A directory approach can be consistent both with "flat" stories and
multi-attribute ones.  The DNS requires strict hierarchies, limiting
its ability to handle differentiation among names by their
properties.  By contrast, modern directories can utilize
independently-searched attributes and other structured schema to
provide flexibilities not present in a strictly hierarchical system.
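
To make the contrast concrete, the sketch below searches a small set
of directory entries by independent attributes and returns a set of
candidates rather than a single answer.  The schema, attribute names,
and entries are all hypothetical; no particular directory protocol or
schema is implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    name: str       # user-visible name
    dns_name: str   # DNS name the entry translates to
    country: str    # hypothetical attribute
    language: str   # hypothetical attribute
    region: str     # hypothetical network-locational hint


ENTRIES = [
    Entry("Widget Company", "uk.widget.example", "GB", "en", "europe"),
    Entry("Widget Company", "us.widget.example", "US", "en",
          "north-america"),
    Entry("Joe's Pizza", "joes-boston.example", "US", "en",
          "north-america"),
]


def search(entries=ENTRIES, **attributes):
    """Return every entry whose attributes equal all requested values."""
    return [e for e in entries
            if all(getattr(e, k) == v for k, v in attributes.items())]
```

A query on the name alone returns both Widget Company entries, and
adding country="GB" narrows the result to one; the same name can
therefore coexist under different attribute values, which a strict
hierarchy cannot express without artificial labels.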

There is a strong argument for a single directory structure (implying
a need for mechanisms for registration, delegation, etc.).  But it is
not a strict requirement, especially if in-depth case analysis and
design work leads to the conclusion that reverse-mapping to directory
names is not a requirement (see section 4).

While the discussion above includes very general comments about
attributes, it appears that only a very small number of attributes
would be needed.  The list would almost certainly include country and
language for IDN purposes and might require "charset" if we cannot
agree on a character set and encoding.  Trademark issues might
motivate "commercial" and "non-commercial" (or other) attributes if
they would be helpful in bypassing trademark problems.  And
applications to resource location might argue for a few other
attributes (as outlined above).

4.  The Controversies

4.1. One directory or many

As suggested in some of the text above, it is an open question as to
whether the needs of the community would be best served by a single
directory with universal applicability, a single directory but
locally-tailored search (and, most important, matching) functions, or
multiple, locally-determined, directories.  Each has its attractions.
Any but the first would essentially prevent reverse-mapping
(determination of the user-visible name of the host or resource from
target information such as an address or DNS name).  But reverse
mapping has become less useful over the years --at least to users--
as we have assigned more and more names per host address.

Locally-tailored search and mappings would permit national variations
on interpretation of which strings matched which other ones, an
arrangement that is especially important when different localities
apply different rules to, e.g., matching of characters with and
without diacriticals.  But, of course, this implies that a URL may
evaluate properly or not depending on either settings on a client
machine or the network connectivity of the user, which is not, in
general, a desirable situation.
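
The diacritical case can be sketched as two hypothetical local rules
applied to the same stored name.  The function names and the flag are
assumptions made for illustration; the normalization calls are from
Python's standard unicodedata module.

```python
import unicodedata


def strip_diacritics(text):
    """Decompose characters, then drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed
                   if not unicodedata.combining(c))


def matches(query, stored, ignore_diacritics):
    """Compare two names under one of two hypothetical local rules."""
    q, s = query.casefold(), stored.casefold()
    if ignore_diacritics:
        q, s = strip_diacritics(q), strip_diacritics(s)
    # Normalize so that different encodings of the same character
    # compare equal (the "mere" canonicalization case).
    return (unicodedata.normalize("NFC", q)
            == unicodedata.normalize("NFC", s))
```

With ignore_diacritics set, the query "cafe" matches a stored
"caf\u00e9"; without it, the same query fails.  That is exactly the
situation in which one identifier evaluates differently depending on
where, or with what settings, it is resolved.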

And, of course, completely separate directories would permit
translation and transliteration functions to be embedded in the
directory, giving much of the Internet a different appearance
depending on which directory was chosen.  The attractions of this are
obvious, but, unless things were very carefully designed to preserve
uniqueness and precise identities at the right points (which may or
may not be possible), such a system would have many of the
difficulties associated with multiple roots.

4.2 Why not a proposal?

As this document has gone through various preliminary drafts and
reviews, the question has been raised as to whether it should contain
a specific proposal: a specific directory mechanism, schema, and so
on.  It deliberately does not take that step.  It has been difficult
to get directory systems deployed in significant ways in the Internet
infrastructure, partially because we have a surplus of options.
There are also some approaches that could be used to implement the
general concepts described here, such as the Common Name Resolution
Protocol [RFC2972], which some would not consider directory protocols
at all.  Consequently, it appeared better to present the general
concepts and arguments here and leave the specifics to other sources,
documents, and proposals.
 
5.  Security Considerations

The set of proposals implied by this document suggests an interesting
set of security issues (i.e., nothing important is ever easy).  A
directory system used for this purpose would presumably need to be as
carefully protected against unauthorized changes as the DNS itself.
There also might be new opportunities for problems in the two-layer
arrangement; but those problems are not more severe than a two-stage
lookup in the DNS.


6.  References

RFC 625 On-line hostnames service. M.D. Kudlick, E.J. Feinler.
Mar-07-1974.

RFC 811 Hostnames Server. K. Harrenstien, V. White, E.J. Feinler.
Mar-01-1982.

RFC 952 DoD Internet host table specification. K. Harrenstien, M.K.
Stahl, E.J. Feinler. Oct-01-1985.

RFC 882 Domain names: Concepts and facilities. P.V. Mockapetris.
Nov-01-1983.

RFC 883 Domain names: Implementation specification. P.V. Mockapetris.
Nov-01-1983.

RFC 1035 Domain names - implementation and specification. P.V.
Mockapetris. Nov-01-1987.

RFC 1591 Domain Name System Structure and Delegation. J. Postel.
March 1994.

RFC 2825 A Tangled Web: Issues of I18N, Domain Names, and the Other
Internet protocols. IAB, L. Daigle, ed. May 2000.

RFC 2826 IAB Technical Comment on the Unique DNS Root. IAB. May 2000.

RFC 2972 Context and Goals for Common Name Resolution. N. Popp, M.
Mealling, L. Masinter, K. Sollins. October 2000.

ITU Recommendation X.9

ITU Recommendation X.25


7. Culprit address

John Klensin
AT&T Labs
99 Bedford Street
Boston, MA 02111
klensin@research.att.com
