From d3c6de7a382b5f263dd1614357480e603fc008b8 Mon Sep 17 00:00:00 2001
From: Ben McIlwain
Date: Wed, 24 Sep 2025 17:04:37 -0400
Subject: [PATCH] Modify the base Latin LGR with our intended changes to improve security (#2829)

---
 .../java/google/registry/idn/Latin-IDN.xml | 359 ++---------------------
 1 file changed, 33 insertions(+), 326 deletions(-)

diff --git a/core/src/main/java/google/registry/idn/Latin-IDN.xml b/core/src/main/java/google/registry/idn/Latin-IDN.xml
index 6163ce7a4..ddc2d6f5e 100644
--- a/core/src/main/java/google/registry/idn/Latin-IDN.xml
+++ b/core/src/main/java/google/registry/idn/Latin-IDN.xml
@@ -1,10 +1,10 @@
- 1
- 2024-10-25
+ 1
+ 2025-10-01
  und-Latn
- 11.0.0
+ 2

INSTRUCTIONS

@@ -35,22 +35,21 @@

Note: version numbers start at 1. RFC 7940 recommends using simple integers. The version comment is optional, please replace or delete the default comment. Version comments may be used by some tools as part of the page header.

<version comment="[Please replace (or delete) the optional comment]">[Please fill in version number, starting at 1]</version>

- <date>[Please fill in with publication date, in YYYY-MM-DD format]</date>
- <validity-start>[Please fill in effective date, in YYYY-MM-DD format]</validity-start>
+ <date>2025-10-01</date>
+ <validity-start>2025-10-01</validity-start>

Note: the scope element may be repeated, so that the same document can serve for multiple domains.

<scope type="domain">[Please provide, in ".domain" format]</scope>

Registry Contact Information:

Please fill in the Registry Contact Details.

Change History

If you made technical modifications to the LGR, please summarize them in the Change History (and also note the details in the appropriate section of the description).

- PLEASE DELETE THESE INSTRUCTIONS BEFORE DEPOSITING THE DOCUMENT

+

Registry Contact Details

-   • Contact Name: [Please fill in Contact Name]
-   • Email address: [Please fill in Email address]
-   • Phone Number: [Please fill in optional Phone Number]
+   • Contact Name: Ben McIlwain
+   • Email address: nomulus-discuss@google.com
@@ -67,17 +66,11 @@

Repertoire

- The repertoire contains the 197 letters needed to write hundreds of languages in the Latin script.
- An additional 7 combining diacritical marks are available as part of 21 explicitly defined combining sequences.
+ The repertoire contains the 164 letters needed to write hundreds of languages in the Latin script.
  The repertoire is a subset of [Unicode 11.0.0]. For details, see Section 5, “Repertoire” in [Proposal-Latin]. (The proposal cited has been adopted for the Latin script portion of the Root Zone LGR.)
-
- Compared to that source, an additional language is supported by adding the code point for the Middle Dot used
- in the Catalan Ela Geminada: U+006C U+00B7 U+006C. Context rules limit U+00B7 MIDDLE DOT to being bracketed
- by the letter “l”. (See also [280])
-
- For the second level, the repertoire has been augmented with the ASCII digits, U+0030 to U+0039, plus U+002D HYPHEN-MINUS, for a total of 231 repertoire elements.
+ For the second level, the repertoire has been augmented with the ASCII digits, U+0030 to U+0039, plus U+002D HYPHEN-MINUS, for a total of 175 repertoire elements.

Any code points outside the Latin Script repertoire that are targets for out-of-repertoire variants would be included here only if the variant is listed

@@ -142,28 +135,6 @@

-   • U+00B7 MIDDLE DOT and U+002D HYPHEN-MINUS —
-     the use of the hyphen as fallback for the middle dot in the Catalan Ela Geminada follows registry practice, see [281].
-     The variant is limited to an Ela Geminada context.
-   • U+00DF LATIN SMALL LETTER SHARP S and the sequence of two letters “ss” (U+0073 U+0073) —
-     IDNA2003 Compatibility: in IDNA2003, U+00DF LATIN SMALL LETTER SHARP S is mapped into “ss” (U+0073 U+0073).
-     Note: the fallback is also used outside domain names, but also used in locale variants of German and the
-     standard spelling.
-   • U+0131 LATIN SMALL LETTER DOTLESS I and U+0069 LATIN SMALL LETTER I —
-     IDNA2003 Compatibility: in IDNA2003, U+0131 LATIN SMALL LETTER DOTLESS I is mapped into
-     U+0069 LATIN SMALL LETTER I.
-
-
- Some second level LGRs provide ASCII fallback variants for some or all accented Latin characters.
- Likewise the U+0153 Small OE Ligature and U+00E6 Small AE ligature have ASCII fallbacks consisting of the
- non-ligated “oe” and “ae” sequences. None of these fallbacks have been added to the current version of the LGR.
-
- Overlapped Variant Sequence: Both “ss” and “s” coexist in the repertoire and “s” has variant
- relationships on its own. These variants thus overlap: making the variant set well-behaved for
- index variant calculation requires that the sequence “ss” also be given variants to all permutations of
- variants for the letter s followed by itself, as well as all transitive variants due to other variants
- for U+00DF.
-

In-script Variant Mapping Types

In each of the fallback variant pairs defined above, the mapping type from the first element to the second is of type “fallback”, while the variant type for the other direction is “blocked”. In addition, the first element of each pair uses the

@@ -194,17 +165,6 @@

Latin-specific Rules

- The following context rule applies to U+00B7 MIDDLE DOT and its variants.
- It ensures that the middle dot is part of an Ela Geminada sequence and variants between it and HYPHEN-MINUS are only defined in that context.
-
-   • surrounded-by-L — code points are invalid and variants undefined when not surrounded by “l”
-
-
- The following WLE rule invalidates labels in which two Ela Geminada sequences overlap.
-
-   • dot-L-dot — labels sharing a single “l” with two different middle dots are invalid
-
-

Actions

Default Actions

@@ -213,27 +173,6 @@

invalidate labels with misplaced combining marks. They are marked with ⍟. For a description see [RFC 7940].

- Because this LGR defines allocatable fallback variants the following default actions are applicable.
-
-   • blocked — a variant label containing a blocked variant will receive a disposition of “blocked”.
-   • r-original — a label containing one or more of this reflexive variant type
-     and no others represents an original label
-     and receives a disposition of “valid”.
-   • fallback — a label containing one or more of these variant types and no others
-     represents a label that contains only fallback variants
-     and receives a disposition of “allocatable”.
-   • fallback plus other — any label remaining containing both this variant type and any others
-     receives a disposition of “blocked”.
-
- These actions resolve as “allocatable” any label where all variants are of type “fallback”, and as “valid” any label
- where all variants are of type “r-original”. Labels with a mix of variant types are resolved as “blocked”.
- To account for original code points in a permuted variant, reflexive variant
- mappings with an “r-” prefix are used. (See [RFC 7940]).
- In particular, the mapping type “r-original” is given to any code point that has a fallback mapping,
- but that appears in its non-fallback form in the original label, and thus “maps to itself”.
-

Default actions that are triggered by the LGR-specific variant types described above limit the “allocatable” variant labels to those containing only “ss”, dotted “i” or hyphen variants, while

@@ -258,7 +197,7 @@

Changes from Version Dated 25 October 2024

- Adopted from the Second Level Reference LGR for the Latin Script [Ref-LGR-und-Latn] without normative changes.
+ Adopted from the Second Level Reference LGR for the Latin Script [Ref-LGR-und-Latn] with security improvements implemented by removing confusable variants.

References

@@ -496,9 +435,7 @@
@@ -509,89 +446,35 @@
@@ -601,126 +484,44 @@
@@ -731,41 +532,22 @@
@@ -773,49 +555,17 @@
@@ -823,9 +573,6 @@
@@ -837,31 +584,14 @@
@@ -873,23 +603,12 @@
@@ -928,18 +647,6 @@