Preamble


FTN networks are mainly MS-DOS based, while rfc ones are generally UNIX based. Things could have been simple, but as you know "simple" is an unknown word when talking about charsets. The charsets used on MS-DOS and on UNIX are different (and there are others too!), so we need a way to change the encoding when gating messages.

From time to time I wrote comments like "should I change this?" or "what is better?", etc. I would highly appreciate any comments from you about that. Thanks.

January 13th, 1996
Pablo Saratxaga <srtxg@chanae.alphanet.ch> 2:293/2219@fidonet

How it works


I think the most effective way is to search for header lines that tell us which charset is used.

For FTN networks there are:

"^aCHRS: <FTN-CHRS>" or "^aCHARSET: <FTN-CHRS>" (which is obsolescent)

There is also an encoding comparable to quoted-printable, called fsc-0051 and indicated with "^aI51". However, as I have never seen a message using it, I won't support it for now.
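To illustrate the FTN side, here is a minimal C sketch of recognizing such a kludge line and keeping only the charset name. The function name ftn_kludge_charset() is hypothetical (it is not ifmail's own code), and it assumes the line starts with the literal ^A byte (0x01). Any trailing level digit is simply ignored.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not ifmail's real code. If `line` is a
 * "^aCHRS: <FTN-CHRS> [level]" kludge, or the obsolescent
 * "^aCHARSET: ..." form, copy the charset name into `out` and return 1;
 * otherwise return 0. ^a is the byte 0x01; any level digit is ignored. */
int ftn_kludge_charset(const char *line, char *out, size_t outlen)
{
    static const char *const prefixes[] = { "\001CHRS:", "\001CHARSET:" };
    size_t i, n;

    for (i = 0; i < sizeof(prefixes) / sizeof(prefixes[0]); i++) {
        size_t plen = strlen(prefixes[i]);

        if (strncmp(line, prefixes[i], plen) != 0)
            continue;
        line += plen;
        while (*line == ' ' || *line == '\t')   /* blanks after the colon */
            line++;
        n = strcspn(line, " \t\r\n");           /* name ends at next blank */
        if (n == 0 || n >= outlen)
            return 0;                           /* empty or too long */
        memcpy(out, line, n);
        out[n] = '\0';
        return 1;
    }
    return 0;
}
```

For example, "\001CHRS: LATIN-1 2" yields "LATIN-1", while a ^aMSGID: line returns 0.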

For rfc networks, it is done with:

Content-Type: ....; charset=<rfc-charset> [ ..... ]
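A similar sketch for the rfc side pulls the charset parameter out of a Content-Type value. rfc_content_charset() is a hypothetical name, and the parsing is deliberately simplified (it does not handle every MIME parameter syntax). It only accepts text/plain, matching the restriction described here.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper: given a Content-Type value such as
 * text/plain; charset="iso-8859-1", copy the charset into `out` and
 * return 1. Anything other than text/plain, or a missing charset
 * parameter, returns 0. */
int rfc_content_charset(const char *ctype, char *out, size_t outlen)
{
    const char *p;
    size_t n;

    while (isspace((unsigned char)*ctype))
        ctype++;
    if (strncasecmp(ctype, "text/plain", 10) != 0 ||
        (ctype[10] != '\0' && ctype[10] != ';' &&
         !isspace((unsigned char)ctype[10])))
        return 0;                       /* only text/plain is handled */

    for (p = ctype; (p = strchr(p, ';')) != NULL; ) {
        p++;                            /* step over ';' */
        while (isspace((unsigned char)*p))
            p++;
        if (strncasecmp(p, "charset=", 8) != 0)
            continue;                   /* some other parameter */
        p += 8;
        if (*p == '"') {                /* quoted value */
            p++;
            n = strcspn(p, "\"");
        } else {
            n = strcspn(p, " \t;\r\n");
        }
        if (n == 0 || n >= outlen)
            return 0;
        memcpy(out, p, n);
        out[n] = '\0';
        return 1;
    }
    return 0;
}
```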

I've only added charset support for "text/plain" types. (charset support for "video/avi" is somewhat difficult :-)

If we can't find a known charset this way, ifmail will look for a line left by the other side that can tell us. In other words, it will then search for

X-FTN-CHRS: <FTN-CHRS> or ^aRFC-Content-Type: text/plain; charset=<rfc-charset>

If a charset still isn't found, and you compiled ifmail with -DJE, it will look in the Areas file (see "JE compatibility").
If we still don't have a charset, it will be guessed: ifmail looks at the Message-ID, and if it doesn't end with ".ftn>" (so it should be an rfc message), the code is set to CHRS_DEFAULT_RFC; otherwise it is set to CHRS_DEFAULT_FTN.

CHRS_DEFAULT_RFC and CHRS_DEFAULT_FTN are configurable in iflib/charconv.h.
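The lookup order described above can be sketched as a simple chain of fallbacks. guess_incode() and its arguments are hypothetical, and the two macro values below are placeholders: the real definitions in iflib/charconv.h may have a different type and value.

```c
#include <stddef.h>
#include <string.h>

/* Placeholders for the compile-time defaults in iflib/charconv.h; their
 * real type and values may differ, these strings are only illustrative. */
#define CHRS_DEFAULT_RFC "iso-8859-1"
#define CHRS_DEFAULT_FTN "IBMPC"

/* Hypothetical sketch of the lookup order for the message's charset.
 * Each argument is what that step found, or NULL if it found nothing. */
const char *guess_incode(const char *own_kludge,    /* ^aCHRS: or Content-Type: */
                         const char *other_kludge,  /* X-FTN-CHRS: or ^aRFC-Content-Type: */
                         const char *areas_charset, /* Areas file, -DJE builds only */
                         const char *msgid)         /* Message-ID header value */
{
    size_t len;

    if (own_kludge != NULL)
        return own_kludge;
    if (other_kludge != NULL)
        return other_kludge;
    if (areas_charset != NULL)
        return areas_charset;

    /* Last resort: guess the origin from the Message-ID. */
    len = (msgid != NULL) ? strlen(msgid) : 0;
    if (len >= 5 && strcmp(msgid + len - 5, ".ftn>") == 0)
        return CHRS_DEFAULT_FTN;       /* gated from FTN */
    return CHRS_DEFAULT_RFC;           /* should be an rfc message */
}
```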

We now have a charset for the message that will be gated. Next we need to know which charset, if any, to convert it to.
If you've compiled with -DJE, it will be found in the Areas file (see "JE compatibility").
If it is not found there, or if you didn't compile with -DJE, the functions getincode() or getoutcode() are called. These functions return the other side's code according to the code of the message. The decision table is hardcoded and you will probably want to customize it. To do so, edit the getincode() and getoutcode() functions in iflib/charset.c (in a future version it may be configurable through a file read at run time).

Now we have incode and outcode, so we can translate the text strings (headers and body) from one charset to the other. Two cases can be distinguished: 8-bit and 16-bit charsets.

a) 8-bit charsets


This is the easier case.
The only thing needed to support a new 8-bit conversion is to add a maptable in the directory pointed to by the maptabdir keyword, and of course to add recognition of these charsets to the sources if it isn't done yet.
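The core of an 8-bit conversion is a 256-entry lookup table indexed by the input byte. The sketch below assumes nothing about the maptable file format in maptabdir and fills the table by hand; maptab_t, maptab_identity() and maptab_apply() are hypothetical names.

```c
#include <stddef.h>

/* An 8-bit conversion is a 256-entry table indexed by the input byte.
 * Loading it from the maptabdir directory is not shown, since the file
 * format is not described here; the table is filled in by hand instead. */
typedef unsigned char maptab_t[256];

/* Start from "every byte maps to itself". */
void maptab_identity(maptab_t tab)
{
    int c;

    for (c = 0; c < 256; c++)
        tab[c] = (unsigned char)c;
}

/* Translate `len` bytes of `s` in place through `tab`. */
void maptab_apply(const maptab_t tab, char *s, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        s[i] = (char)tab[(unsigned char)s[i]];
}
```

For example, after maptab_identity(), setting entry 0x82 to 0xE9 maps the CP437 e-acute to the ISO-8859-1 e-acute and leaves ASCII untouched.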

Maybe in the future I will add a run-time configurable way of recognizing 8-bit charsets. Something like this:

charset charset filename

b) 16-bit charsets


This is theoretically possible in the same way as for 8-bit ones, but it isn't fun to deal with maptables of 65,000+ lines :)
So the 16-bit translations are hardcoded.
16-bit codes also have special sequences (like the iso-2022-* ESC sequences) that allow mixing various encodings (8 and 16 bits), different charsets, etc. So it is not possible to use a simple maptable.
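To show why a single table cannot work for iso-2022-*, here is a small sketch that walks iso-2022-jp text (the escape sequences of RFC 1468) and counts one-byte and two-byte characters according to the current mode. iso2022jp_count() is hypothetical and only counts; a real converter would translate each unit.

```c
#include <stddef.h>

/* The meaning of a byte depends on the last escape sequence seen.
 * This sketch walks iso-2022-jp text and only counts one-byte and
 * two-byte characters; a real converter would translate each unit. */
enum jis_mode { JIS_ASCII, JIS_ROMAN, JIS_KANJI };

void iso2022jp_count(const unsigned char *s, size_t len,
                     size_t *single, size_t *dbl)
{
    enum jis_mode mode = JIS_ASCII;
    size_t i = 0;

    *single = *dbl = 0;
    while (i < len) {
        if (s[i] == 0x1B && i + 2 < len) {      /* ESC + 2 bytes */
            if (s[i + 1] == '(' && s[i + 2] == 'B')
                mode = JIS_ASCII;               /* ESC ( B: ASCII */
            else if (s[i + 1] == '(' && s[i + 2] == 'J')
                mode = JIS_ROMAN;               /* ESC ( J: JIS X 0201 Roman */
            else if (s[i + 1] == '$' && (s[i + 2] == '@' || s[i + 2] == 'B'))
                mode = JIS_KANJI;               /* ESC $ @ / ESC $ B: two-byte JIS */
            i += 3;                             /* unknown escapes are skipped */
        } else if (mode == JIS_KANJI && i + 1 < len) {
            (*dbl)++;                           /* two bytes, one character */
            i += 2;
        } else {
            (*single)++;
            i++;
        }
    }
}
```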

MIME support

MIME (Multipurpose Internet Mail Extensions) is a way of putting data into a 7-bit character format, so that it fits in email messages that can pass through (old) mail gateways.
MIME is not limited to text and can also encode video, sound, etc. However, as far as ifmail is concerned, only text ("text/plain" more precisely) is handled.

There are three ways of sending mail/articles:

  • a) send them "as is" (more and more mail gateways accept 8-bit messages without stripping the 8th bit).
  • b) encode them with the "quoted-printable" scheme. This is useful when there are few 8-bit chars among a lot of ASCII ones (as in Latin-alphabet languages). It is mostly readable without decoding.
  • c) encode them with the "base64" scheme. This is useful for non-Latin languages, where 8-bit chars are the majority. It is absolutely unreadable without decoding.

    Ifmail can recognize those MIME messages and decode them to plain 8-bit when gating to FTN networks, so the texts will be readable by FTN mail readers.
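The decoding step for a quoted-printable body can be sketched as follows. qp_decode() is a hypothetical name; it works in place and handles only "=XX" escapes and soft line breaks, so base64 bodies would need a separate decoder.

```c
#include <stddef.h>

/* Value of one hex digit, or -1. */
static int qp_hex(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Minimal quoted-printable body decoder (sketch, in place).
 * "=XX" becomes byte 0xXX and "=" at end of line is a soft line break.
 * Returns the decoded length; malformed "=" sequences are kept as-is. */
size_t qp_decode(char *s, size_t len)
{
    size_t i = 0, o = 0;

    while (i < len) {
        if (s[i] != '=') {
            s[o++] = s[i++];
        } else if (i + 1 < len && s[i + 1] == '\n') {
            i += 2;                                  /* soft break "=\n" */
        } else if (i + 2 < len && s[i + 1] == '\r' && s[i + 2] == '\n') {
            i += 3;                                  /* soft break "=\r\n" */
        } else if (i + 2 < len && qp_hex((unsigned char)s[i + 1]) >= 0
                               && qp_hex((unsigned char)s[i + 2]) >= 0) {
            s[o++] = (char)(qp_hex((unsigned char)s[i + 1]) * 16
                          + qp_hex((unsigned char)s[i + 2]));
            i += 3;
        } else {
            s[o++] = s[i++];                         /* stray '=' */
        }
    }
    return o;
}
```

Since the output never grows, decoding in place is safe: the write position never overtakes the read position.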

    Messages are passed without encoding from FTN to usenet/email (should I change this?).

    MIME headers

    There is a special encoding for headers. As an example is better than a long explanation, here is how MIME-encoded headers look:

          From: =?US-ASCII?Q?Keith_Moore?= <moore@cs.utk.edu>
          To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld@dkuug.dk>
          CC: =?ISO-8859-1?Q?Andr=E9_?= Pirard <PIRARD@vm1.ulg.ac.be>
          Subject: =?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=
           =?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=
    

    There are two encodings: "B" (which is in fact base64) and "Q" (which is quoted-printable).
    Those headers are recognized and decoded to 8-bit when gating to FTN networks. Also, since each encoded header states its own charset, which can be different from the one used in the text body, the translation routines take the incode value from there.
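The header decoding can be sketched as one function handling a single encoded-word of either kind. decode_encoded_word() is hypothetical and simplified: it handles one word at a time, relies on the caller to size the buffers as stated, and leaves the charset translation itself to the caller, which would use the returned charset as incode.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

static int ew_hex(int c)
{
    if (isdigit(c)) return c - '0';
    c = toupper(c);
    return (c >= 'A' && c <= 'F') ? c - 'A' + 10 : -1;
}

static int ew_b64(int c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;                               /* '=' padding */
}

/* Decode one encoded-word "=?charset?Q|B?text?=" starting at `w`.
 * The charset goes to `cs`, the decoded bytes (NUL-terminated) to `out`;
 * both must hold at least strlen(w) + 1 bytes. Returns how many input
 * characters were consumed, or 0 if `w` is not a valid encoded-word. */
size_t decode_encoded_word(const char *w, char *cs, char *out)
{
    const char *q, *end, *p;
    int enc, bits = 0;
    unsigned long acc = 0;
    size_t o = 0;

    if (strncmp(w, "=?", 2) != 0)
        return 0;
    q = strchr(w + 2, '?');                  /* '?' after the charset */
    if (q == NULL || q[1] == '\0' || q[2] != '?')
        return 0;
    enc = toupper((unsigned char)q[1]);
    end = strstr(q + 3, "?=");               /* "?=" closing the word */
    if (end == NULL || (enc != 'Q' && enc != 'B'))
        return 0;

    memcpy(cs, w + 2, (size_t)(q - (w + 2)));
    cs[q - (w + 2)] = '\0';

    for (p = q + 3; p < end; p++) {
        if (enc == 'Q') {
            if (*p == '_') {
                out[o++] = ' ';              /* '_' stands for a space */
            } else if (*p == '=' && p + 2 < end
                       && ew_hex((unsigned char)p[1]) >= 0
                       && ew_hex((unsigned char)p[2]) >= 0) {
                out[o++] = (char)(ew_hex((unsigned char)p[1]) * 16
                                + ew_hex((unsigned char)p[2]));
                p += 2;
            } else {
                out[o++] = *p;
            }
        } else {
            int v = ew_b64((unsigned char)*p);

            if (v < 0)
                continue;                    /* skip padding */
            acc = ((acc << 6) | (unsigned long)v) & 0xFFFFUL;
            bits += 6;
            if (bits >= 8) {                 /* a full byte is available */
                bits -= 8;
                out[o++] = (char)((acc >> bits) & 0xFF);
            }
        }
    }
    out[o] = '\0';
    return (size_t)(end + 2 - w);
}
```

On the CC: example above, it returns charset ISO-8859-1 and the bytes "Andr", 0xE9, space; the caller continues with " Pirard" after the consumed characters.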

    JE compatibility

    All that charset stuff has been possible thanks to the help of TANAKA Tsuneo (tt@efnet.com), who had modified ifmail to handle Japanese charsets. The JE version retrieved the charsets by looking in the Areas file, where two supplementary fields were added: rfc-charset and FTN-CHRS.
    Although I found it better to look at the message itself to find the charset used, I think a per-echo/newsgroup default is a great idea, so I added support for that feature.
    This also allows people using the JE version to use this one (TX) without having to modify their configuration (much).
    You must compile with -DJE to enable this support.

    Recognized charsets

    This is an exhaustive list of recognized charsets.
    rfc-charset is the representation used for this charset on the usenet/email side (in MIME headers, the "Content-Type: " line, and the 4th field of JE's Areas file). FTN-CHRS is the representation used for that same charset on the FTN side (CHRS: and CHARSET: kludge lines, 5th field of JE's Areas file).
    Note that in CHRS: and CHARSET: kludge lines the "FTN-CHRS" is followed by a digit telling the "level" of that encoding, e.g. "^aCHRS: LATIN-1 2", "^aCHRS: IBMPC 2" (refer to fsc-0054 for more information). Ifmail doesn't look at that digit; it may even be absent, and the kludge line will still be handled correctly.

    Strings in parentheses are aliases: they are recognized in messages, headers and JE's Areas file, but are never used when writing the CHRS: or Content-Type: lines.

    <rfc-charset>               <FTN-CHRS>
    EUC-jp (x-EUC-jp)           UJIS (EUC-JP, EUC)
    EUC-kr                      EUC-KR
    iso-2022-cn                 ISO-2022-CN
    iso-2022-jp                 JIS (Kanji)
    iso-2022-kr                 ISO-2022-KR
    iso-2022-tw                 ISO-2022-TW
    iso-8859-1 (iso8859-1)      LATIN-1 (8859, ISO-8859)
    iso-8859-2                  Latin-2
    iso-8859-3                  Latin-3
    iso-8859-4                  Latin-4
    iso-8859-5                  Cyrillic
    iso-8859-6                  Arabic
    iso-8859-7                  Greek
    iso-8859-8                  Hebrew
    iso-8859-9                  Latin-5
    iso-8859-10                 Latin-6
    iso-8859-11 (x-tis620)      Thai
    koi8-r                      KOI8-R (KOI8)
    koi8-u                      KOI8-U
    unicode-1-1                 UNICODE
    us-ascii                    ASCII
    x-cp424                     CP424
    x-cp437                     IBMPC (PC-8, CP437)
    x-cp852                     CP852
    x-cp862                     CP862
    x-cp866                     CP866
    x-cp895                     CP895
    x-CN-Big5 (x-x-big5)        BIG5
    x-CN-GB (x-gb2312)          GB
    x-FIDOMAZOVIA (x-MAZOVIA)   FIDOMAZ (MAZOVIA, FIDOMAZOVIA)
    x-HZ                        HZ
    x-mac-roman (macintosh)     MAC
    x-mik-cyr (x-MIK)           MIK-CYR (MIK)
    x-NEC-JIS                   NEC
    x-sjis                      SJIS (CP932, CP942)
    x-zW                        ZW
    and a special one:
    AUTODETECT                  AUTODETECT

    This last one only appears in the Areas file.
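Alias handling amounts to mapping any accepted spelling to the canonical name that gets written out. The sketch below covers only a few FTN-CHRS rows of the table; ftn_canonical() is hypothetical, and case-insensitive matching is an assumption (suggested by the lowercase cp437 in the Configuration examples).

```c
#include <stddef.h>
#include <strings.h>

/* Hypothetical alias table for a few FTN-CHRS rows: each accepted name
 * maps to the canonical name that would be written in a ^aCHRS: line. */
struct chrs_alias {
    const char *name;
    const char *canonical;
};

static const struct chrs_alias ftn_aliases[] = {
    { "UJIS", "UJIS" }, { "EUC-JP", "UJIS" }, { "EUC", "UJIS" },
    { "LATIN-1", "LATIN-1" }, { "8859", "LATIN-1" }, { "ISO-8859", "LATIN-1" },
    { "IBMPC", "IBMPC" }, { "PC-8", "IBMPC" }, { "CP437", "IBMPC" },
    { "KOI8-R", "KOI8-R" }, { "KOI8", "KOI8-R" },
    { "FIDOMAZ", "FIDOMAZ" }, { "MAZOVIA", "FIDOMAZ" }, { "FIDOMAZOVIA", "FIDOMAZ" },
    { "SJIS", "SJIS" }, { "CP932", "SJIS" }, { "CP942", "SJIS" },
};

/* Canonical FTN-CHRS for `name`, or NULL if it is not recognized.
 * Matching ignores case (an assumption of this sketch). */
const char *ftn_canonical(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(ftn_aliases) / sizeof(ftn_aliases[0]); i++)
        if (strcasecmp(name, ftn_aliases[i].name) == 0)
            return ftn_aliases[i].canonical;
    return NULL;
}
```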

    Configuration

    In the config file /etc/ifmail/config, set the keywords defaultrfcchar and defaultftnchar to the appropriate values for your country.

    The recognized values can be found in the list above.

    Use:

  • Western languages:
    defaultftnchar			cp437
    defaultrfcchar			iso-8859-1
    
  • Poland:
    defaultftnchar 			FIDOMAZOVIA
    defaultrfcchar  		iso-8859-2
    
  • Czechia and Slovakia
    defaultftnchar                  cp895
    defaultrfcchar                  iso-8859-2
    
  • Other Latin-alphabet Eastern European countries
    defaultftnchar        		cp852
    defaultrfcchar   		iso-8859-2
    
  • Russia
    defaultftnchar                  cp866
    defaultrfcchar                  koi8-r
    
  • Bulgaria
    defaultftnchar                  MIK-CYR
    defaultrfcchar                  iso-8859-5
    
  • Ukraine
    defaultftnchar                  cp866
    defaultrfcchar                  koi8-u
    
  • Japan
    defaultftnchar                  SJIS
    defaultrfcchar                  iso-2022-jp
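Reading those two keywords could look like the following hypothetical sketch, which assumes only the "keyword, blanks, value" layout shown above. charset_cfg_line() and struct charset_cfg are illustrative names; ifmail's real config reader handles many more keywords.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for the two charset keywords. */
struct charset_cfg {
    char ftn[32];   /* defaultftnchar */
    char rfc[32];   /* defaultrfcchar */
};

/* Feed one config line; other keywords are ignored by this sketch. */
void charset_cfg_line(struct charset_cfg *cfg, const char *line)
{
    char key[32], val[32];

    /* %31s skips leading blanks and tabs and stops at the next one. */
    if (sscanf(line, "%31s %31s", key, val) != 2)
        return;
    if (strcmp(key, "defaultftnchar") == 0)
        strcpy(cfg->ftn, val);
    else if (strcmp(key, "defaultrfcchar") == 0)
        strcpy(cfg->rfc, val);
}
```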
    

    Conversion tables

    Not all charset conversions are possible: I only included the ones for which I have the data, and of course inconsistent conversions (like Cyrillic --> Korean) aren't dealt with at all.

               |B|G|E|E|H|i|i|i|i|i|i|i|i|i|i|i|i|i|i|k|k|m|M|U|u|c|c|c|c|c|c|M|S|
               |i|u|U|U|Z|s|s|s|s|s|s|s|s|s|s|s|s|s|s|o|o|a|I|N|s|p|p|p|p|p|p|A|h|
               |g|o|C|C| |o|o|o|o|o|o|o|o|o|o|o|o|o|o|i|i|c|K|I|-|4|4|8|8|8|8|Z|i|
               |5|B|-|-| |-|-|-|-|-|-|-|-|-|-|-|-|-|-|8|8|i|-|C|a|2|3|5|6|6|9|O|f|
               | |i|j|k| |2|2|2|2|8|8|8|8|8|8|8|8|8|8|-|-|n|C|O|s|4|7|2|2|6|5|V|t|
               | |a|p|r| |0|0|0|0|8|8|8|8|8|8|8|8|8|8|r|u|t|Y|D|c| | | | | | |I|_|
               | |o| | | |2|2|2|2|5|5|5|5|5|5|5|5|5|5| | |o|R|E|i| | | | | | |A|J|
               | | | | | |2|2|2|2|9|9|9|9|9|9|9|9|9|9| | |s| | |i| | | | | | | |I|
               | | | | | |-|-|-|-|-|-|-|-|-|-|-|-|-|-| | |h| | | | | | | | | | |S|
               | | | | | |c|j|k|t|1|2|3|4|5|6|7|8|9|1| | | | | | | | | | | | | | |
               | | | | | |n|p|r|w| | | | | | | | | |0| | | | | | | | | | | | | | |
    -----------+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    Big5       |X| | | | | | | | | | | | | | | | | | | | | | |?| | | | | | | | | |
    GuoBiao    | |X| | |X| | | | | | | | | | | | | | | | | | |?| | | | | | | | | |
    EUC-jp     | | |X| | | |X| | | | | | | | | | | | | | | | |?| | | | | | | | |X|
    EUC-kr     | | | |X| | | |p| | | | | | | | | | | | | | | |?| | | | | | | | | |
    HZ         | |X| | |X| | | | | | | | | | | | | | | | | | |?| | | | | | | | | |
    iso-2022-cn| | | | | |X| | | | | | | | | | | | | | | | | |?| | | | | | | | | |
    iso-2022-jp|X|X|X| | | |X| | | | | | | | | | | | | | | | |?| | | | | | | | |X|
    iso-2022-kr| | | |p| | | |X| | | | | | | | | | | | | | | |?| | | | | | | | | |
    iso-2022-tw| | | | | | | | |X| | | | | | | | | | | | | | |?| | | | | | | | | |
    iso-8859-1 | | | | | | | | | |X| | | | | | | | | | | |p| |?| | |X| | | | | | |
    iso-8859-2 | | | | | | | | | | |X| | | | | | | | | | | | |?| | | |X| | |X|X| |
    iso-8859-3 | | | | | | | | | | | |X| | | | | | | | | | | |?| | | | | | | | | |
    iso-8859-4 | | | | | | | | | | | | |X| | | | | | | | | | |?| | | | | | | | | |
    iso-8859-5 | | | | | | | | | | | | | |X| | | | | |X|X| |p|?| | | | | |X| | | |
    iso-8859-6 | | | | | | | | | | | | | | |X| | | | | | | | |?| | | | | | | | | |
    iso-8859-7 | | | | | | | | | | | | | | | |X| | | | | | | |?| | | | | | | | | |
    iso-8859-8 | | | | | | | | | | | | | | | | |X| | | | | | |?| |X| | |X| | | | |
    iso-8859-9 | | | | | | | | | | | | | | | | | |X| | | | | |?| | | | | | | | | |
    iso-8859-10| | | | | | | | | | | | | | | | | | |X| | | | |?| | | | | | | | | |
    koi8-r (1) | | | | | | | | | | | | | | | | | | | |X|X| |p|?| | | | | |X| | | |
    koi8-u (1) | | | | | | | | | | | | | | | | | | | |X|X| |p|?| | | | | |X| | | |
    macintosh  | | | | | | | | | |X| | | | | | | | | | | |X| |?| | |X| | | | | | |
    MIK-CYR    | | | | | | | | | | | | | |p| | | | | |p|p| |X|?| | | | | | | | | |
    UNICODE    | | | | | | | | | | | | | | | | | | | | | | | |X| | | | | | | | | |
    us-ascii(2)|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|
    cp424      | | | | | | | | | | | | | | | | |X| | | | | | |?| |X| | |X| | | | |
    cp437      | | | | | | | | | |X| | | | | | | | | | | |X| |?| | |X| | | | | | |
    cp852      | | | | | | | | | | |X| | | | | | | | | | | | |?| | | |X| | | |X| |
    cp862      | | | | | | | | | | | | | | | | |X| | | | | | |?| |X| | |X| | | | |
    cp866      | | | | | | | | | | | | | |X| | | | | |X|X| | |?| | | | | |X| | | |
    cp895      | | | | | | | | | | |X| | | | | | | | | | | | |?| | |X| | | |X| | |
    MAZOVIA    | | | | | | | | | | |X| | | | | | | | | | | | |?| | | |X| | | |X| |
    Shift-JIS  | | | |X| | | |X| | | | | | | | | | | | | | | |?| | | | | | | | |X|
    iso-11 (3) | | | | | | | | | |X| | | | | | | | | | | | | |?| | | | | | | | | |
    iso-4 (3)  | | | | | | | | | |X| | | | | | | | | | | | | |?| | | | | | | | | |
    iso-60 (3) | | | | | | | | | |X| | | | | | | | | | | | | |?| | | | | | | | | |
    zW         | |X| | |X| | | | | | | | | | | | | | | | | | |?| | | | | | | | | |
    
    X : already implemented.
    p : planned, help welcome.
    ? : What do you think of it, have you any info ?
    
    (1) koi8-u is a fully compatible superset of koi8-r, so ifmail-tx doesn't
        distinguish them during conversion.
    (2) us-ascii being 7 bit only all charsets are supersets of it, so us-ascii
        can be "converted" to anything.
    (3) those are almost never used; they were very old charsets from the time
        when only 7-bit existed and [ \ ] etc. were replaced by some accented
        characters.
    
    INFO WANTED: on charsets used in both usenet and fido in Greece,
    Turkey, Arabic countries, Korea, Taiwan, Thailand and India.