Revisions for tech proposal: FALLBACK_ENCODINGS

--- 2025, August 29 - 22:25 by sysop
+++ 2025, August 29 - 22:34 by sysop
 -------- PROCESS FLOW --------------------------------------------------------
 Programms encountering data with encoded text, to which there is no metadata
 indicating which charset to decode it into, takes the FALLBACK_ENCODINGS
 string, had been read from the OS environment, as charset names delimited by
 the colon ":" char.

Current revision:

This is a technical proposal of a standard for conveying a computer user's preference of the precedence list of character sets to decode textual data into, which is in unknown encoding.

Proposal: `FALLBACK_ENCODINGS` environment variable

The user sets FALLBACK_ENCODINGS variable in his computing environment according to this EBNF:

FALLBACK_ENCODINGS  = word , { ":" , word } ;

word                = char , { char } ;

char                = "a" .. "z" | "A" .. "Z" | "0" .. "9" | "_" | "-" ;

Process Flow

Programms encountering data with encoded text, to which there is no metadata indicating which charset to decode it into, takes the FALLBACK_ENCODINGS string, had been read from the OS environment, as charset names delimited by the colon ":" char.

Optionally appends its own programm-specific fallback charset name, if any, to the end. "UTF-8" is recommended for this purpose.

Then tries decode the data into the given charset. If the decoding routine succeeds, returns the decoded text back to the user, optionally with the used charset as metadata.

If fails, continue with the next item. If no charset succeded, fails the data decoding task.

Reference Implementation

in Python programming language

import os

def decode_textual_data(data, metadata):
  encodings = []
  detected_charset = metadata.get('charset')
  if detected_charset:
    encodings.insert(0, detected_charset)
  
  FALLBACK_ENCODINGS = os.environ.get('FALLBACK_ENCODINGS', '').split(':')
  FALLBACK_ENCODINGS.append('UTF-8')
  encodings.extend(FALLBACK_ENCODINGS)
  
  for encoding in encodings:
    try:
      data.decode(encoding, 'strict')
      break
    except UnicodeDecodeError:
      encoding = None
  
  if encoding is None:
    raise
  
  return {'text': data.decode(encoding), 'metadata': {'charset': encoding,}}

Related Standards

LANG
LANGUAGE
LC_MESSAGES

Tags:

standard, proposal, charset, encoding

51 reads

Secondary menu

Main menu

Recent content

Tags

All content

You are here

Revisions for tech proposal: FALLBACK_ENCODINGS

Proposal: `FALLBACK_ENCODINGS` environment variable

Process Flow

Reference Implementation

Related Standards

Languages

Email a Login Link

Navigation

Secondary menu

Main menu

Recent content

Tags

All content

You are here

Revisions for tech proposal: FALLBACK_ENCODINGS

Proposal: FALLBACK_ENCODINGS environment variable

Process Flow

Reference Implementation

Related Standards

Languages

Search form

Email a Login Link

Navigation

Proposal: `FALLBACK_ENCODINGS` environment variable