Network Working Group                                        T. Kindberg
Internet-Draft                               Hewlett-Packard Corporation
Expires: October 30, 2004                                       S. Hawke
                                               World Wide Web Consortium
                                                                May 2004

The 'tag' URI scheme

draft-kindberg-tag-uri-05

Status of this Memo

By submitting this Internet-Draft, I certify that any applicable patent or other IPR claims of which I am aware have been disclosed, and any of which I become aware will be disclosed, in accordance with RFC 3668.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on October 30, 2004.

Copyright Notice

Copyright (C) The Internet Society (2004). All Rights Reserved.

Abstract

This document describes the "tag" Uniform Resource Identifier (URI) scheme, for identifiers that are unique across space and time. Tag URIs (also known as "tags") are distinct from most other URIs in that there is no authoritative resolution mechanism. A tag may be used purely as an entity identifier. Unlike UUIDs or GUIDs such as "uuid" URIs and "urn:oid" URIs, tags are designed to be tractable to humans. Furthermore, using tags has some advantages over the common practice of using "http" URIs as identifiers for non-HTTP-accessible resources.

Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Disclaimer

The views and opinions of authors expressed herein do not necessarily state or reflect those of the World Wide Web Consortium, and may not be used for advertising or product endorsement purposes. This proposal has not undergone technical review within the Consortium and must not be construed as a Consortium recommendation.

Further Information and Discussion of this Document

Further information about the tag URI scheme -- motivation, genesis and discussion -- can be obtained from http://www.taguri.org.

Earlier drafts of this document have been discussed on uri@w3.org. The authors welcome further discussion and comments.



Table of Contents

1.  Introduction
2.  Tag Syntax and Rules
    2.1  Tag Syntax and Examples
    2.2  Rules for Minting Tags
    2.3  Resolution of Tags
    2.4  Equality of Tags
3.  Security Considerations
4.  References
    4.1  Normative References
    4.2  Informative References
Authors' Addresses
Intellectual Property and Copyright Statements





1. Introduction

A tag is a type of Uniform Resource Identifier (URI) [1] designed to meet the following requirements:

  1. Identifiers are likely to be unique across space and time, and come from a practically inexhaustible supply.
  2. Identifiers are relatively convenient for humans to mint (create), read, type, remember etc.
  3. No registration is necessary, at least for holders of domain names or email addresses; and there is negligible cost to mint each new identifier.
  4. The identifiers are independent of any particular resolution scheme.

For example, the above requirements may apply in the case of a user who wants to place identifiers on their documents:

  a. They want to be reasonably sure that the identifier is unique. Global uniqueness is valuable because it prevents identifiers from becoming unintentionally ambiguous.
  b. It is useful for the identifier to be tractable to humans: they should be able to mint new identifiers conveniently, and to type them into emails and forms.
  c. They do not want to have to communicate with anyone else in order to mint identifiers for their documents.
  d. The user wants to avoid identifiers that might be taken to imply the existence of an electronic resource accessible via a default resolution mechanism, when no such electronic resource exists.

Existing identification schemes satisfy some but not all of the general requirements above. For example:

UUIDs [4], [5] are hard for humans to read.

OIDs [6], [7] and Digital Object Identifiers [8] require naming authorities to register themselves, even if they already hold a domain name registration.

URLs (in particular, "http" URLs) are sometimes used as identifiers that satisfy most of our requirements. Many users and organisations have already registered a domain name, and the use of the domain name to mint identifiers comes at no additional cost. But there are drawbacks to URLs-as-identifiers:




2. Tag Syntax and Rules

This section first specifies the syntax of tag URIs and gives examples. It then describes a set of rules for minting tags designed to make them unique. Finally, it discusses the resolution and comparison of tags.

2.1 Tag Syntax and Examples

The general syntax of a tag URI, in ABNF, is:

tagURI = "tag:" taggingEntity ":" [specific]

Where:

taggingEntity = authorityName "," date

authorityName = DNSname / emailAddress

date = 4dig ["-" 2dig ["-" 2dig]] ; see ISO 8601 [2]

DNSname = DNScomp / DNSname "." DNScomp ; see RFC 1035 [3]

DNScomp = lowAlphaNum [*(lowAlphaNum /"-") lowAlphaNum]

emailAddress = 1*(lowAlphaNum /"-"/"."/"_") "@" DNSname

lowAlphaNum = dig / lowAlpha

specific = 1*(URIchars) ; URIchars defined in RFC 2396 [1]

lowAlpha = %x61-7A ; any char in the range "a" through "z"

dig = %x30-39 ; any char in the range "0" through "9"

The component "taggingEntity" is the name space part of the URI. To avoid ambiguity, this MUST be expressed in lower case; the domain name in "authorityName" (whether an email address or a simple domain name) MUST be fully qualified.

Authority names could, in principle, belong to any syntactically distinct namespaces whose names are assigned to a unique entity at a time. Those include, for example, certain IP addresses, certain MAC addresses, and telephone numbers. However, to simplify the tag scheme, we restrict authority names to be domain names and email addresses. Future standards efforts may allow use of other authority names following syntax that is disjoint from this syntax. To allow for such developments, software that processes tags MUST NOT reject them on the grounds that they are outside the syntax for authorityName defined above.

The component "specific" is the name-space-specific part of the URI: it is any string of valid URI characters [1] chosen by the minter of the URI. It is RECOMMENDED that specific identifiers be human-friendly.

Examples of tag URIs are:

tag:timothy@hpl.hp.com,2001:web/externalHome

tag:sandro@w3.org,2004-05:Sandro

tag:my-ids.com,2001-09-15:TimKindberg:presentations:UBath2004-05-19

tag:blogger.com,1999:blog-555

tag:yaml.org,2002:int
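
As an illustration only, the grammar and examples in this section can be transcribed into a regular expression. The following Python sketch is not part of the specification: the names TAG_RE and parse_tag are hypothetical, and it assumes that "URIchars" corresponds to the "uric" production of RFC 2396 [1]. Since software MUST NOT reject tags whose authority names fall outside the current syntax, a result of None here means only "not recognised by this grammar", not "invalid".

```python
import re

# Building blocks transcribed from the ABNF above (lower case only).
DNS_COMP = r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"
DNS_NAME = rf"{DNS_COMP}(?:\.{DNS_COMP})*"
EMAIL = rf"[a-z0-9._-]+@{DNS_NAME}"
DATE = r"[0-9]{4}(?:-[0-9]{2}(?:-[0-9]{2})?)?"
# Assumption: URIchars = RFC 2396 "uric" (reserved / unreserved / escaped).
SPECIFIC = r"(?:[A-Za-z0-9;/?:@&=+$,\-_.!~*'()]|%[0-9A-Fa-f]{2})+"

TAG_RE = re.compile(
    rf"tag:(?P<authority>{EMAIL}|{DNS_NAME}),"
    rf"(?P<date>{DATE}):(?P<specific>{SPECIFIC})?"
)


def parse_tag(uri):
    """Split a tag URI into (authorityName, date, specific).

    Returns None when the string does not match the grammar above.
    """
    match = TAG_RE.fullmatch(uri)
    if match is None:
        return None
    return (
        match.group("authority"),
        match.group("date"),
        match.group("specific") or "",
    )


if __name__ == "__main__":
    for example in (
        "tag:timothy@hpl.hp.com,2001:web/externalHome",
        "tag:my-ids.com,2001-09-15:TimKindberg:presentations:UBath2004-05-19",
        "tag:yaml.org,2002:int",
    ):
        print(parse_tag(example))
```

The sketch checks syntax only. It does not check that the date is a real calendar date or that the authority name was held on that date.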

2.2 Rules for Minting Tags

As Section 2.1 has specified, each tag consists of a "tagging entity" followed, optionally, by a specific identifier. The tagging entity is designated by an "authority name" -- a fully qualified domain name or an email address containing a fully qualified domain name -- followed by a date. The date is chosen to make the tagging entity globally unique, exploiting the fact that domain names and email addresses are assigned to at most one entity at a time. That entity then ensures that it mints unique identifiers.

The date specifies, according to the Gregorian calendar and UTC, any particular day on which the authority name was assigned to the tagging entity at 00:00 UTC (the start of the day). The date is specified using one of the "YYYY", "YYYY-MM" and "YYYY-MM-DD" formats allowed by the ISO 8601 standard [2]. The tag specification permits no other formats. The date MUST be reckoned from UTC -- which may differ from the date in the tagging entity's local timezone at 00:00 UTC.

In the interests of brevity, the month and day default to 01. A day value of 01 MAY be omitted; a month value of 01 MAY be omitted unless it is followed by a day value other than 01. For example, "2001-07" is the date 2001-07-01 and "2000" is the date 2000-01-01. All dates specify a moment (00:00) of a single day; they MUST NOT be taken as periods of a day or more, such as "the whole of July 2001" or "the whole of 2000".
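
The defaulting rule above can be sketched in Python as follows. The function names expand_tag_date and shortest_form are hypothetical and chosen only for illustration, and the input is assumed to be already in one of the three permitted formats.

```python
from datetime import date


def expand_tag_date(text):
    """Return the single day denoted by a tag date.

    A missing month or day defaults to 01, so "2001-07" denotes
    2001-07-01 and "2000" denotes 2000-01-01.
    """
    parts = [int(p) for p in text.split("-")]
    year = parts[0]
    month = parts[1] if len(parts) > 1 else 1
    day = parts[2] if len(parts) > 2 else 1
    return date(year, month, day)


def shortest_form(d):
    """Return the shortest permitted spelling of a day.

    The day may be dropped when it is 01; the month may be dropped
    only when both the month and the day are 01.
    """
    if d.day != 1:
        return d.isoformat()
    if d.month != 1:
        return f"{d.year:04d}-{d.month:02d}"
    return f"{d.year:04d}"


if __name__ == "__main__":
    print(expand_tag_date("2001-07"))
    print(shortest_form(date(2001, 1, 15)))
```

Expansion is useful for checking a date against assignment records. It is not a normalisation step for comparing tags, which are compared as plain character strings.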

It is RECOMMENDED that tagging entities use only one formulation for a given date, since alternative formulations of the same date will be counted as distinct and hence tags containing them will be unequal. For example, tags beginning "tag:hp.com,2000:" are never equal to those beginning "tag:hp.com,2000-01-01:", even though they refer to the same date (see Section 2.4).

An entity MUST NOT mint tags under an authority name that was assigned to a different entity at 00:00 UTC on the given date, and it MUST NOT mint tags under a future date.

An entity that acquires an authority name immediately after a period during which the name was unassigned MAY mint tags as if the entity was assigned the name during the unassigned period. This practice has considerable potential for error and MUST NOT be used unless the entity has substantial evidence that the name was unassigned during that period. The authors are currently unaware of any mechanism that would count as evidence, other than daily polling of the "whois" registry.

For example, Hewlett-Packard holds the domain registration for hp.com and may mint any tags rooted at that name with a current or past date when it held the registration. It must not mint tags such as "tag:champignon.net,2001:" under domain names not registered to it. It must not mint tags dated in the future, such as "tag:hp.com,2999:". If it obtains assignment of "extremelyunlikelytobeassigned.org" on 2001-05-01, then it must not mint tags under "extremelyunlikelytobeassigned.org,2001-04-01" unless it has evidence proving that that name was continuously unassigned between 2001-04-01 and 2001-05-01.
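
A minting entity might guard against the two prohibitions above with a check along the following lines. This is a minimal Python sketch under stated assumptions: the function may_mint and its parameters are hypothetical, the caller is assumed to know the first and (if any) last day on which it held the authority name at 00:00 UTC, and the exception for previously unassigned periods is deliberately not modelled.

```python
from datetime import date, datetime, timezone


def may_mint(tag_date, held_from, held_until=None, today=None):
    """Return True if tag_date is acceptable under the minting rules.

    tag_date   -- the day named in the tagging entity
    held_from  -- first day on which the name was assigned to this
                  entity at 00:00 UTC (assumed known to the caller)
    held_until -- last such day, or None if the name is still held
    today      -- current UTC date; defaults to the system clock
    """
    if today is None:
        today = datetime.now(timezone.utc).date()
    if tag_date > today:
        return False  # tags MUST NOT carry a future date
    if tag_date < held_from:
        return False  # name was not yet assigned to this entity
    if held_until is not None and tag_date > held_until:
        return False  # name was no longer assigned to this entity
    return True


if __name__ == "__main__":
    # Hypothetical assignment record, for illustration only.
    print(may_mint(date(2001, 4, 1), held_from=date(2001, 5, 2)))
```

The check is only as good as the assignment record supplied to it. It cannot establish that no other entity held the name on the chosen date.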

A tagging entity mints specific identifiers that are unique within its context, in accordance with any internal scheme that uses only URI characters. Some tagging entities (e.g. corporations, mailing lists) consist of many people, in which case group decision-making and record-keeping procedures SHOULD be used to achieve uniqueness.

2.3 Resolution of Tags

There is no authoritative resolution mechanism for tags. Unlike most other URIs, tags can only be used as identifiers, and are not designed to support resolution. If authoritative resolution is a desired feature, a different URI scheme should be used.

2.4 Equality of Tags

Tags are simply strings of characters and are considered equal if and only if they are completely indistinguishable in their machine representations. That is, one can compare tags for equality by comparing the numeric codes of their characters, in sequence, for numeric equality. This equality-criterion allows for simplification of tag-handling software, which does not have to transform tags in any way to compare them.
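
The equality rule can be illustrated in a few lines of Python, where comparing two strings with "==" compares their code points in sequence. The function name tags_equal is hypothetical. The sketch applies no case folding, no percent-decoding and no date expansion, in keeping with the rule above.

```python
def tags_equal(a, b):
    """Compare two tags character by character, with no normalisation."""
    return a == b


if __name__ == "__main__":
    # Same date, different spelling: the tags are distinct.
    print(tags_equal("tag:hp.com,2000:x", "tag:hp.com,2000-01-01:x"))
    # Identical character sequences: the tags are equal.
    print(tags_equal("tag:yaml.org,2002:int", "tag:yaml.org,2002:int"))
```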




3. Security Considerations

Minting a tag, by itself, is an operation internal to the tagging entity with no external consequences. The consequences of using an improperly minted tag (due to malice or error) in an application depend on the application, and must be considered in the design of any application that uses tags.

There is a significant possibility of minting errors by people who fail to apply the rules governing dates, or who use a shared (organizational) authority-name without prior organization-wide agreement. Tag-aware software MAY help catch and warn against these errors. As stated in Section 2, however, to allow for future expansion, software MUST NOT reject tags which do not conform to the syntax specified in Section 2.
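
One way for tag-aware software to warn without rejecting is to return a list of advisory messages and leave the decision to the caller. The Python sketch below is an assumption-laden illustration, not a prescribed design: the function lint_tag and its warning texts are hypothetical, and it splits the tag at the first ":" after the scheme, which is adequate only for authority names in the syntax of Section 2.

```python
import re
from datetime import date, datetime, timezone

DATE_RE = re.compile(r"([0-9]{4})(?:-([0-9]{2})(?:-([0-9]{2}))?)?")


def lint_tag(uri, today=None):
    """Return advisory warnings for a tag; never raises, never rejects."""
    if today is None:
        today = datetime.now(timezone.utc).date()
    if not uri.startswith("tag:"):
        return ["not a tag URI"]
    warnings = []
    entity, colon, _specific = uri[4:].partition(":")
    if not colon:
        warnings.append("missing ':' after the tagging entity")
    if entity != entity.lower():
        warnings.append("tagging entity is not lower case")
    _authority, _comma, date_text = entity.partition(",")
    match = DATE_RE.fullmatch(date_text)
    if match is None:
        warnings.append("date is not YYYY, YYYY-MM or YYYY-MM-DD")
        return warnings
    # Missing month or day defaults to 01.
    year, month, day = (int(g) if g else 1 for g in match.groups())
    try:
        if date(year, month, day) > today:
            warnings.append("date is in the future")
    except ValueError:
        warnings.append("date is not a valid calendar date")
    return warnings


if __name__ == "__main__":
    for tag in ("tag:hp.com,2000:x", "tag:hp.com,2999:x"):
        print(tag, lint_tag(tag))
```

An empty list means only that none of these particular checks fired. It says nothing about whether the authority name was actually assigned to the minter on the given date.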

A malicious party could make it appear that the same domain name or email address was assigned to each of two or more entities. Tagging entities SHOULD use reputable assigning authorities, and verify assignment wherever possible.

Entities SHOULD also avoid the potential for malicious exploitation of clock skew, by using authority names that were assigned continuously from well before to well after 00:00 UTC on the date chosen for the tagging entity -- preferably by intervals in the order of days.




4. References




4.1 Normative References

[1] Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform Resource Identifiers (URI): Generic Syntax", RFC 2396, August 1998.
[2] "Data elements and interchange formats -- Information interchange -- Representation of dates and times", ISO (International Organization for Standardization) ISO 8601:1988, 1988.
[3] Mockapetris, P., "Domain names - implementation and specification", STD 13, RFC 1035, November 1987.



4.2 Informative References

[4] Leach, P. and R. Salz, "UUIDs and GUIDs", draft-leach-uuids-01 (work in progress), 1997.
[5] "Information technology - Open Systems Interconnection - Remote Procedure Call (RPC)", ISO (International Organization for Standardization) ISO/IEC 11578:1996, 1996.
[6] "Specification of abstract syntax notation one (ASN.1)", ITU-T recommendation X.208, (see also RFC 1778), 1988.
[7] Mealling, M., "A URN Namespace of Object Identifiers", RFC 3061, February 2001.
[8] Paskin, N., "Information Identifiers", Learned Publishing Vol. 10, No. 2, pp. 135-156, (see also www.doi.org), April 1997.



Authors' Addresses

  Tim Kindberg
  Hewlett-Packard Corporation
  Hewlett-Packard Laboratories
  Filton Road
  Stoke Gifford
  Bristol, Reading BS34 8QZ
  UK
Phone:  +44 117 312 9920
EMail:  timothy@hpl.hp.com
  
  Sandro Hawke
  World Wide Web Consortium
  32 Vassar Street
  Building 32-G508
  Cambridge, MA 02139
  USA
Phone:  +1 617 253-7288
EMail:  sandro@w3.org



Intellectual Property Statement

Disclaimer of Validity

Copyright Statement

Acknowledgment