From time to time I have written comments like "should I change this ?" or "what is better ?", etc. I would highly appreciate any comments from you about them. Thanks.
January 13th, 1996
Pablo Saratxaga <srtxg@chanae.alphanet.ch> 2:293/2219@fidonet
"^aCHRS: <FTN-CHRS>" or "^aCHARSET: <FTN-CHRS>" (which is obsolescent)
There is also an encoding, comparable to quoted-printable, called fsc-0051 and indicated with "^aI51". However, as I have never seen a message using it, I won't support it for now.
Content-Type: ....; charset=<rfc-charset> [ ..... ]
I've only added charset support for "text/plain" types. (charset support for "video/avi" is somewhat difficult :-)
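To illustrate the lookup just described, here is a minimal Python sketch (not ifmail's own C code) that reads the charset parameter of a Content-Type header and ignores everything that is not text/plain. The function name rfc_charset is hypothetical; the parsing itself is done by Python's standard email package.

```python
from email import message_from_string


def rfc_charset(raw_headers):
    """Return the charset of a text/plain message, or None if there is none."""
    # The parser expects a blank line after the header block.
    msg = message_from_string(raw_headers + "\n\n")
    if msg.get_content_type() != "text/plain":
        return None  # charsets are only looked at for text/plain
    return msg.get_content_charset()  # lower-cased name, or None when absent


print(rfc_charset("Content-Type: text/plain; charset=ISO-8859-1"))
print(rfc_charset("Content-Type: video/avi"))
```

A message without any Content-Type header is treated as text/plain with no charset, so the function returns None and the later fallback steps apply.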
If no known charset can be found this way, ifmail then checks whether a line belonging to the other side can tell us. In other words, it will then search for
X-FTN-CHRS: <FTN-CHRS> or ^aRFC-Content-Type: text/plain; charset=<rfc-charset>
If a charset still isn't found, and if you compiled ifmail with -DJE, it will look
at the Areas file (see "JE compatibility" further on).
If we still don't have a charset, it will be guessed: ifmail looks at the
Message-ID, and if it doesn't end with ".ftn>" (so it should be an rfc message)
the code is set to CHRS_DEFAULT_RFC, otherwise to CHRS_DEFAULT_FTN.
CHRS_DEFAULT_RFC and CHRS_DEFAULT_FTN are configurable in iflib/charconv.h
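The last-resort guess can be sketched as follows. This is a Python illustration of the rule, not the real code; the two default values are placeholders chosen for the example, and the function name guess_charset is hypothetical.

```python
# Stand-ins for the compile-time defaults; the values are assumptions
# picked for this example only.
CHRS_DEFAULT_RFC = "iso-8859-1"
CHRS_DEFAULT_FTN = "IBMPC"


def guess_charset(message_id):
    """Guess a charset from the Message-ID when nothing else identified one."""
    if message_id.strip().endswith(".ftn>"):
        return CHRS_DEFAULT_FTN  # the message came from the FTN side
    return CHRS_DEFAULT_RFC      # otherwise assume an rfc (usenet/email) message


print(guess_charset("<12345@f2219.n293.z2.ftn>"))
print(guess_charset("<abc@example.org>"))
```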
We now have a charset for the message that will be gated. We need to know
which charset to convert it to.
If you've compiled it with -DJE, it will be looked up in the Areas file (see
"JE compatibility").
If it is not found there, or if you didn't compile with -DJE, the functions
getincode() or getoutcode() will be called. These functions return the
other code according to the one of the message. The decision table is hardcoded;
you will probably want to customize it. For that you have to edit the
getincode() and getoutcode() functions in iflib/charset.c (in a future version
it may be configurable through a file read at run time).
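A decision table of this kind can be pictured as a simple mapping from the charset of the message to the charset used on the other side. The Python sketch below is only an illustration: the pairs are borrowed from the per-country defaults given later in this document, and the names FTN_TO_RFC, RFC_TO_FTN and partner_charset are hypothetical, not the contents of getincode() or getoutcode().

```python
# Hypothetical pairings, modelled on the per-country defaults in this document.
FTN_TO_RFC = {
    "IBMPC": "iso-8859-1",
    "CP852": "iso-8859-2",
    "CP866": "koi8-r",
}
# The reverse direction is derived by swapping keys and values.
RFC_TO_FTN = {rfc: ftn for ftn, rfc in FTN_TO_RFC.items()}


def partner_charset(code, table, default):
    """Return the charset to convert to, or the default for unknown codes."""
    return table.get(code, default)


print(partner_charset("CP866", FTN_TO_RFC, "iso-8859-1"))
print(partner_charset("koi8-r", RFC_TO_FTN, "IBMPC"))
```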
Well, now we have incode and outcode. We can then translate the text strings (headers and body) from one charset to the other. Two cases can be distinguished: 8-bit and 16-bit charsets.
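The translation step itself can be sketched in a few lines of Python. Note the assumption: this uses Python's built-in codecs as a stand-in for ifmail's own conversion tables, and the function name convert_text is hypothetical. Characters that do not exist in the target charset are replaced by "?".

```python
def convert_text(data, incode, outcode):
    """Re-encode bytes from incode to outcode; unmappable characters become '?'."""
    text = data.decode(incode, errors="replace")
    return text.encode(outcode, errors="replace")


# 8-bit case: "cafe" with an acute e, written in cp437, converted to iso-8859-1.
print(convert_text(b"caf\x82", "cp437", "iso-8859-1"))  # b'caf\xe9'

# Multi-byte case: the same call works between Japanese charsets.
sjis = "\u65e5\u672c".encode("shift_jis")
print(convert_text(sjis, "shift_jis", "iso2022_jp"))
```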
Maybe in the future I will add a runtime-configurable way of recognising 8-bit characters. Something like this:
charset charset filename
MIME (Multipurpose Internet Mail Extensions) is a way of allowing data to be
put in 7-bit character format, to fit in email messages that can pass through
(old) mail gateways.
MIME is not limited to text, and can also encode video, sound, etc.
However, as far as ifmail is concerned, only text ("text/plain", more precisely)
will be handled.
There are three ways of sending mail/articles:
Ifmail can recognize those MIME messages, and decode them to plain 8bit when gating to FTN networks, so texts will be readable by FTN mail readers.
Messages are passed without coding from FTN to usenet/email (should I change this ?)
There is a special coding for headers. As an example is better than a long explanation, here is how MIME-coded headers look:
From: =?US-ASCII?Q?Keith_Moore?= <moore@cs.utk.edu>
To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld@dkuug.dk>
CC: =?ISO-8859-1?Q?Andr=E9_?= Pirard <PIRARD@vm1.ulg.ac.be>
Subject: =?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=
 =?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=
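To see what such headers contain once decoded, the following Python sketch uses the standard email.header module. It is only an illustration of the header coding, not the way ifmail does it; the helper name decode_mime_header is hypothetical.

```python
from email.header import decode_header


def decode_mime_header(value):
    """Decode =?charset?Q/B?...?= words in a header value to readable text."""
    parts = []
    for data, charset in decode_header(value):
        if isinstance(data, bytes):
            # Unencoded fragments carry no charset; treat them as ascii.
            parts.append(data.decode(charset or "ascii", errors="replace"))
        else:
            parts.append(data)  # header without any encoded word
    return "".join(parts)


subject = ("=?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?= "
           "=?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=")
print(decode_mime_header(subject))
# If you can read this you understand the example.
```

The space between two adjacent encoded words is not part of the text, which is why the two halves of the Subject join into "you".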
All that charset stuff has been possible thanks to the help of TANAKA Tsuneo
(tt@efnet.com), who had modified ifmail to add handling of Japanese
charsets. The way the JE version retrieved the charsets is by looking in
the Areas file, where two supplementary fields were added: rfc-charset and
FTN-CHRS.
While I found it better to look at the message itself to try to find the
charset used, I think it is a great idea to have a per ECHO/newsgroup
default. So I added support for that feature.
That also allows people using the JE version to use this one (TX) without
having to modify their configuration (much).
You must compile with -DJE to support this.
This is an exhaustive list of recognized charsets.
rfc-charset is the representation used for this charset in the usenet/email
side (in MIME headers, "Content-Type: " line, 4th field of JE's Areas file)
FTN-CHRS is the representation used for that same charset in the FTN side
(CHRS: and CHARSET: kludge lines, 5th field of JE's Areas file).
Note that in CHRS: and CHARSET: kludge lines the "FTN-CHRS" is followed by
a digit, telling the "level" of that coding. Ex: "^aCHRS: LATIN-1 2",
"^aCHRS: IBMPC 2" (refer to fsc-0054 for more information about that).
Ifmail doesn't look at that digit; it may even be absent, and the kludge line
will still be handled correctly.
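As a small illustration of this tolerance, the Python sketch below parses a CHRS: or CHARSET: kludge line whether or not the level digit is present. It assumes that "^a" stands for the control character 0x01 that starts an FTN kludge line; the names KLUDGE_RE and parse_chrs_kludge are hypothetical.

```python
import re

# "^a" is taken to be the control character 0x01 (an assumption of this sketch).
KLUDGE_RE = re.compile(r"^\x01(?:CHRS|CHARSET):\s*(\S+)(?:\s+(\d+))?\s*$")


def parse_chrs_kludge(line):
    """Return (FTN-CHRS name, level or None), or None for other lines."""
    match = KLUDGE_RE.match(line)
    if match is None:
        return None
    name, level = match.group(1), match.group(2)
    return name, (int(level) if level is not None else None)


print(parse_chrs_kludge("\x01CHRS: LATIN-1 2"))  # ('LATIN-1', 2)
print(parse_chrs_kludge("\x01CHRS: IBMPC"))      # ('IBMPC', None)
```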
Strings in parentheses are aliases; they are recognized in messages, headers and JE's Areas file, but are never used when writing the CHRS: or Content-Type: lines.
<rfc-charset> | <FTN-CHRS> |
EUC-jp (x-EUC-jp) | UJIS (EUC-JP, EUC) |
EUC-kr | EUC-KR |
iso-2022-cn | ISO-2022-CN |
iso-2022-jp | JIS (Kanji) |
iso-2022-kr | ISO-2022-KR |
iso-2022-tw | ISO-2022-TW |
iso-8859-1 (iso8859-1) | LATIN-1 (8859, ISO-8859) |
iso-8859-2 | Latin-2 |
iso-8859-3 | Latin-3 |
iso-8859-4 | Latin-4 |
iso-8859-5 | Cyrillic |
iso-8859-6 | Arabic |
iso-8859-7 | Greek |
iso-8859-8 | Hebrew |
iso-8859-9 | Latin-5 |
iso-8859-10 | Latin-6 |
iso-8859-11 (x-tis620) | Thai |
koi8-r | KOI8-R (KOI8) |
koi8-u | KOI8-U |
unicode-1-1 | UNICODE |
us-ascii | ASCII |
x-cp424 | CP424 |
x-cp437 | IBMPC (PC-8, CP437) |
x-cp852 | CP852 |
x-cp862 | CP862 |
x-cp866 | CP866 |
x-cp895 | CP895 |
x-CN-Big5 (x-x-big5) | BIG5 |
x-CN-GB (x-gb2312) | GB |
x-FIDOMAZOVIA (x-MAZOVIA) | FIDOMAZ (MAZOVIA, FIDOMAZOVIA) |
x-HZ | HZ |
x-mac-roman (macintosh) | MAC |
x-mik-cyr (x-MIK) | MIK-CYR (MIK) |
x-NEC-JIS | NEC |
x-sjis | SJIS (CP932, CP942) |
x-zW | ZW |
and a special one: | |
AUTODETECT | AUTODETECT |
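The alias handling can be pictured as a lookup that maps every accepted spelling to the one canonical name that gets written out. The Python sketch below covers only a few FTN-CHRS rows of the table; the names FTN_ALIASES and canonical_ftn_chrs are hypothetical, and the case-insensitive matching is an assumption of this example.

```python
# Excerpt of the table above: canonical FTN-CHRS name -> recognized aliases.
FTN_ALIASES = {
    "LATIN-1": ("8859", "ISO-8859"),
    "IBMPC": ("PC-8", "CP437"),
    "KOI8-R": ("KOI8",),
    "SJIS": ("CP932", "CP942"),
}

# Build one flat lookup so that aliases and canonical names resolve alike.
_LOOKUP = {}
for canonical, aliases in FTN_ALIASES.items():
    for name in (canonical, *aliases):
        _LOOKUP[name.upper()] = canonical


def canonical_ftn_chrs(name):
    """Return the canonical FTN-CHRS name, or None if the name is unknown."""
    return _LOOKUP.get(name.strip().upper())


print(canonical_ftn_chrs("CP437"))  # IBMPC
print(canonical_ftn_chrs("KOI8"))   # KOI8-R
```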
In the config file /etc/ifmail/config, set the keywords defaultrfcchar and defaultftnchar to the appropriate values for your country.
The recognized values can be found in the list above.
Use:
defaultftnchar cp437 defaultrfcchar iso-8859-1
defaultftnchar FIDOMAZOVIA defaultrfcchar iso-8859-2
defaultftnchar cp895 defaultrfcchar iso-8859-2
defaultftnchar cp852 defaultrfcchar iso-8859-2
defaultftnchar cp866 defaultrfcchar koi8-r
defaultftnchar MIK-CYR defaultrfcchar iso-8859-5
defaultftnchar cp866 defaultrfcchar koi8-u
defaultftnchar SJIS defaultrfcchar iso-2022-jp
Not all charset conversions are possible; I only included the ones for which I have the data, and of course inconsistent conversions (like cyrillic --> korean) aren't handled at all.
           |B|G|E|E|H|i|i|i|i|i|i|i|i|i|i|i|i|i|i|k|k|m|M|U|u|c|c|c|c|c|c|M|S|
           |i|u|U|U|Z|s|s|s|s|s|s|s|s|s|s|s|s|s|s|o|o|a|I|N|s|p|p|p|p|p|p|A|h|
           |g|o|C|C| |o|o|o|o|o|o|o|o|o|o|o|o|o|o|i|i|c|K|I|-|4|4|8|8|8|8|Z|i|
           |5|B|-|-| |-|-|-|-|-|-|-|-|-|-|-|-|-|-|8|8|i|-|C|a|2|3|5|6|6|9|O|f|
           | |i|j|k| |2|2|2|2|8|8|8|8|8|8|8|8|8|8|-|-|n|C|O|s|4|7|2|2|6|5|V|t|
           | |a|p|r| |0|0|0|0|8|8|8|8|8|8|8|8|8|8|r|u|t|Y|D|c| | | | | | |I|_|
           | |o| | | |2|2|2|2|5|5|5|5|5|5|5|5|5|5| | |o|R|E|i| | | | | | |A|J|
           | | | | | |2|2|2|2|9|9|9|9|9|9|9|9|9|9| | |s| | |i| | | | | | | |I|
           | | | | | |-|-|-|-|-|-|-|-|-|-|-|-|-|-| | |h| | | | | | | | | | |S|
           | | | | | |c|j|k|t|1|2|3|4|5|6|7|8|9|1| | | | | | | | | | | | | | |
           | | | | | |n|p|r|w| | | | | | | | | |0| | | | | | | | | | | | | | |
-----------+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Big5       |X| | | | | | | | | | | | | | | | | | | | | | |?| | | | | | | | | |
GuoBiao    | |X| | |X| | | | | | | | | | | | | | | | | | |?| | | | | | | | | |
EUC-jp     | | |X| | | |X| | | | | | | | | | | | | | | | |?| | | | | | | | |X|
EUC-kr     | | | |X| | | |p| | | | | | | | | | | | | | | |?| | | | | | | | | |
HZ         | |X| | |X| | | | | | | | | | | | | | | | | | |?| | | | | | | | | |
iso-2022-cn| | | | | |X| | | | | | | | | | | | | | | | | |?| | | | | | | | | |
iso-2022-jp|X|X|X| | | |X| | | | | | | | | | | | | | | | |?| | | | | | | | |X|
iso-2022-kr| | | |p| | | |X| | | | | | | | | | | | | | | |?| | | | | | | | | |
iso-2022-tw| | | | | | | | |X| | | | | | | | | | | | | | |?| | | | | | | | | |
iso-8859-1 | | | | | | | | | |X| | | | | | | | | | | |p| |?| | |X| | | | | | |
iso-8859-2 | | | | | | | | | | |X| | | | | | | | | | | | |?| | | |X| | |X|X| |
iso-8859-3 | | | | | | | | | | | |X| | | | | | | | | | | |?| | | | | | | | | |
iso-8859-4 | | | | | | | | | | | | |X| | | | | | | | | | |?| | | | | | | | | |
iso-8859-5 | | | | | | | | | | | | | |X| | | | | |X|X| |p|?| | | | | |X| | | |
iso-8859-6 | | | | | | | | | | | | | | |X| | | | | | | | |?| | | | | | | | | |
iso-8859-7 | | | | | | | | | | | | | | | |X| | | | | | | |?| | | | | | | | | |
iso-8859-8 | | | | | | | | | | | | | | | | |X| | | | | | |?| |X| | |X| | | | |
iso-8859-9 | | | | | | | | | | | | | | | | | |X| | | | | |?| | | | | | | | | |
iso-8859-10| | | | | | | | | | | | | | | | | | |X| | | | |?| | | | | | | | | |
koi8-r (1) | | | | | | | | | | | | | | | | | | | |X|X| |p|?| | | | | |X| | | |
koi8-u (1) | | | | | | | | | | | | | | | | | | | |X|X| |p|?| | | | | |X| | | |
macintosh  | | | | | | | | | |X| | | | | | | | | | | |X| |?| | |X| | | | | | |
MIK-CYR    | | | | | | | | | | | | | |p| | | | | |p|p| |X|?| | | | | | | | | |
UNICODE    | | | | | | | | | | | | | | | | | | | | | | | |X| | | | | | | | | |
us-ascii(2)|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|
cp424      | | | | | | | | | | | | | | | | |X| | | | | | |?| |X| | |X| | | | |
cp437      | | | | | | | | | |X| | | | | | | | | | | |X| |?| | |X| | | | | | |
cp852      | | | | | | | | | | |X| | | | | | | | | | | | |?| | | |X| | | |X| |
cp862      | | | | | | | | | | | | | | | | |X| | | | | | |?| |X| | |X| | | | |
cp866      | | | | | | | | | | | | | |X| | | | | |X|X| | |?| | | | | |X| | | |
cp895      | | | | | | | | | | |X| | | | | | | | | | | | |?| | |X| | | |X| | |
MAZOVIA    | | | | | | | | | | |X| | | | | | | | | | | | |?| | | |X| | | |X| |
Shift-JIS  | | | |X| | | |X| | | | | | | | | | | | | | | |?| | | | | | | | |X|
iso-11 (3) | | | | | | | | | |X| | | | | | | | | | | | | |?| | | | | | | | | |
iso-4 (3)  | | | | | | | | | |X| | | | | | | | | | | | | |?| | | | | | | | | |
iso-60 (3) | | | | | | | | | |X| | | | | | | | | | | | | |?| | | | | | | | | |
zW         | |X| | |X| | | | | | | | | | | | | | | | | | |?| | | | | | | | | |

X : already implemented.
p : planned, help welcome.
? : What do you think of it, have you any info ?

(1) koi8-u is a fully compatible superset of koi8-r, so ifmail-tx doesn't
    distinguish them during conversion.
(2) us-ascii being 7-bit only, all charsets are supersets of it, so us-ascii
    can be "converted" to anything.
(3) those are almost never used; they were very old charsets, from the time
    when only 7-bit existed and [ \ ] etc. were replaced by some accented
    characters.

INFO WANTED: on charsets used in both usenet and fido in Greece, Turkey,
Arabic countries, Korea, Taiwan, Thailand and India.