tech proposal: FALLBACK_ENCODINGS

This is a technical proposal for a standard way to convey a computer user's preferred precedence list of character sets to try when decoding textual data whose encoding is unknown.

Proposal: `FALLBACK_ENCODINGS` environment variable

The user sets the FALLBACK_ENCODINGS variable in their computing environment according to this EBNF:

FALLBACK_ENCODINGS  = word , { ":" , word } ;

word                = char , { char } ;

char                = "a" .. "z" | "A" .. "Z" | "0" .. "9" | "_" | "-" ;
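The grammar above maps directly onto a regular expression. The following is a minimal sketch of a validating parser; the name `parse_fallback_encodings` is hypothetical, and treating a non-conforming value as an empty list is an assumption, since the proposal does not say how malformed values should be handled.

```python
import re

# word = char , { char } ;  char = letter | digit | "_" | "-"
_WORD = r"[A-Za-z0-9_-]+"
# FALLBACK_ENCODINGS = word , { ":" , word } ;
_FALLBACK_RE = re.compile(rf"{_WORD}(?::{_WORD})*")

def parse_fallback_encodings(value):
    """Return the charset names in order, or [] if value does not match the grammar.

    Returning [] for a malformed value is an assumption of this sketch.
    """
    if not _FALLBACK_RE.fullmatch(value):
        return []
    return value.split(":")

names = parse_fallback_encodings("ISO-8859-1:Shift_JIS:cp1251")
```

Note that the grammar requires at least one word, so an empty string and a value with an empty item (such as "utf-8::latin1") are both rejected by this sketch.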

Process Flow

A program that encounters encoded text with no metadata indicating its charset reads the FALLBACK_ENCODINGS string from the OS environment and interprets it as a list of charset names delimited by the colon (":") character.

It may optionally append its own program-specific fallback charset name, if any, to the end of the list. "UTF-8" is recommended for this purpose.

It then tries to decode the data using the first charset in the list. If the decoding routine succeeds, the program returns the decoded text to the user, optionally with the charset used as metadata.

If decoding fails, the program continues with the next item. If no charset succeeds, the data decoding task fails.
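The order of the list matters because the same bytes can decode without error under more than one charset. The sketch below illustrates this with a hypothetical helper, `try_in_order`, which is not part of the proposal; the sample bytes and charset lists are chosen only for illustration.

```python
def try_in_order(data, charsets):
    """Return (text, charset) for the first charset that decodes data strictly."""
    for name in charsets:
        try:
            return data.decode(name, "strict"), name
        except (UnicodeDecodeError, LookupError):
            # Undecodable bytes or an unknown charset name: try the next one.
            continue
    raise ValueError("no candidate charset could decode the data")

# Cyrillic text stored as cp1251; these bytes are not valid UTF-8.
raw = "Привет".encode("cp1251")

# UTF-8 fails, so the next item in the list, cp1251, is used.
text, used = try_in_order(raw, ["UTF-8", "cp1251", "latin-1"])

# latin-1 accepts any byte sequence, so placing it first hides cp1251.
wrong_text, wrong_used = try_in_order(raw, ["latin-1", "cp1251"])
```

In the second call the decoding "succeeds" but yields the wrong text, which is why the user's stated precedence, rather than a fixed program default, should decide which charset is tried first.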

Reference Implementation

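In the Python programming language:

import os

def decode_textual_data(data, metadata):
  # Candidate charsets in precedence order: declared metadata first.
  encodings = []
  detected_charset = metadata.get('charset')
  if detected_charset:
    encodings.append(detected_charset)

  # User preference from the environment; skip empty items (e.g. unset variable).
  fallback = os.environ.get('FALLBACK_ENCODINGS', '')
  encodings.extend(name for name in fallback.split(':') if name)
  # Program-specific fallback, tried last.
  encodings.append('UTF-8')

  for encoding in encodings:
    try:
      text = data.decode(encoding, 'strict')
    except (UnicodeDecodeError, LookupError):
      # Undecodable bytes or unknown charset name: try the next candidate.
      continue
    return {'text': text, 'metadata': {'charset': encoding}}

  # No candidate succeeded: the decoding task fails.
  raise ValueError('no candidate charset could decode the data')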

Related Standards

LANG
LANGUAGE
LC_MESSAGES

Tags:

standard, proposal, charset, encoding
