intfiction.org

The Interactive Fiction Community Forum
All times are UTC - 6 hours [ DST ]




Posted: Thu Jun 09, 2011 5:25 pm

Joined: Wed Jun 08, 2011 8:53 am
Posts: 4
Hi all!

I'm trying to do a Chinese translation of Inform 6. (Context: IFictionFR.) In order not to lose my head, I'll note my progress and ideas here. I'd greatly appreciate any ideas or comments.

I set out to add a new option (-Cu) to allow typing the program text in UTF-8, so that instead of '@{304A}@{3059}@{3082}' one can type 'おすも' directly. It quickly turned out that the current string compression mechanism wasn't designed with Chinese in mind: MAX_UNICODE_CHARS is 64, not 3000, and the compression would be quite inefficient with 3000. So I'm currently trying to change the semantics of strings: when -Cu is on, strings become polyvalent. If a string lies entirely within ISO 8859-1, it stays compressed (E1). If it contains any Unicode character above 0xFF, it is stored uncompressed (E2). I don't think this will be much of a nuisance: the main use of a string is to be printed, and printing is done with @streamstr, which accepts either string type.
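To make the two ideas above concrete, here is a minimal Python sketch (not the compiler's C code). It shows what the long-hand @{XXXX} spelling of a string looks like, and how the proposed rule would pick between E1 and E2. The function names to_inform_escapes and glulx_string_type are hypothetical, chosen only for this illustration.

```python
def to_inform_escapes(text):
    """Spell every non-ASCII character as an Inform 6 @{XXXX} escape.

    ASCII characters are passed through unchanged; this sketch does not
    try to escape characters that are special inside Inform strings.
    """
    return "".join(
        ch if ord(ch) < 0x80 else "@{%X}" % ord(ch)
        for ch in text
    )


def glulx_string_type(text):
    """Apply the rule proposed in the post (an illustration only).

    E1 (compressed) if every character fits in ISO 8859-1 (<= 0xFF),
    E2 (uncompressed Unicode) as soon as one character is above 0xFF.
    """
    return "E2" if any(ord(ch) > 0xFF for ch in text) else "E1"


if __name__ == "__main__":
    for sample in ("Hello", "café", "おすも"):
        print(sample, to_inform_escapes(sample), glulx_string_type(sample))
```

With -Cu, the author would type the first column directly instead of the second; the third column is the storage type the proposed rule would select.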

I'll come back to say if the change works.


Posted: Thu Jun 09, 2011 7:28 pm

Joined: Sat Jan 23, 2010 4:56 pm
Posts: 5507
I think it would be better if there were a separate string compression option to control whether strings are polyvalent. People might want to use -Cu with Roman alphabets, or even within English. (I've wanted -Cu for years now.)

For the past few years, we've mostly been adding compilation options as $SETTING entries rather than command-line switches. (Too many command-line switches!) So the string compression option could be called $COMPRESS_UNICODE_CHARS, with a default value of 1, but you'd set it to 0 for your system.

I also think the compression system will work better than you expect for Chinese text. But you can test that when you get there.

Thanks for working on this.


Posted: Fri Jun 10, 2011 3:21 am

Joined: Wed Jun 08, 2011 8:53 am
Posts: 4
zarf wrote:
I think it would be better if there were a separate string compression option to control whether strings are polyvalent. People might want to use -Cu with Roman alphabets, or even within English. (I've wanted -Cu for years now.)

For the past few years, we've mostly been adding compilation options as $SETTING entries rather than command-line switches. (Too many command-line switches!) So the string compression option could be called $COMPRESS_UNICODE_CHARS, with a default value of 1, but you'd set it to 0 for your system.

I also think the compression system will work better than you expect for Chinese text. But you can test that when you get there.

Thanks for working on this.


Great zarf replying in person!

Thanks for the info. I initially wanted to add a command-line option to switch the system to "Unicode mode", but neither -u nor -U is available. It's conceptually ugly to have -Cu as a "global Unicode switch", so indeed, it's better to leave the internal workings to $SETTING entries.


Posted: Thu Sep 22, 2011 11:07 am

Joined: Wed Jun 08, 2011 8:53 am
Posts: 4
Now you can have UTF-8 source input with Inform 6. It's a dirty hack, though I hope a bearable one.
http://minus273.eu/if/

Cu.patch (against the I6 compiler tree of I7)
unidicttest.inf (a test file with $MAX_UNICODE_CHARS=4000 and a Chinese text famous for containing no repeated characters. The compression algorithm in fact works quite well with Chinese, as zarf suggested.)

The idea is to extend the ISO type to represent UTF-8 bytes, and to do the UTF-8 -> Unicode expansion in the text_* routines.
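For readers unfamiliar with the expansion step, here is a rough Python sketch of turning UTF-8 bytes into Unicode code points, covering the 1-, 2-, 3- and 4-byte forms. It is not the code from Cu.patch; the function name decode_utf8 is hypothetical, and the sketch does not reject overlong encodings or surrogates.

```python
def decode_utf8(data):
    """Expand a bytes object holding UTF-8 into a list of code points."""
    code_points = []
    i = 0
    while i < len(data):
        lead = data[i]
        # The lead byte tells us the sequence length and carries the top bits.
        if lead < 0x80:                   # 0xxxxxxx: plain ASCII
            length, value = 1, lead
        elif (lead & 0xE0) == 0xC0:       # 110xxxxx: 2-byte sequence
            length, value = 2, lead & 0x1F
        elif (lead & 0xF0) == 0xE0:       # 1110xxxx: 3-byte sequence
            length, value = 3, lead & 0x0F
        elif (lead & 0xF8) == 0xF0:       # 11110xxx: 4-byte sequence
            length, value = 4, lead & 0x07
        else:
            raise ValueError("invalid UTF-8 lead byte at offset %d" % i)
        if i + length > len(data):
            raise ValueError("truncated UTF-8 sequence at offset %d" % i)
        # Each continuation byte is 10xxxxxx and contributes six more bits.
        for cont in data[i + 1:i + length]:
            if (cont & 0xC0) != 0x80:
                raise ValueError("invalid continuation byte near offset %d" % i)
            value = (value << 6) | (cont & 0x3F)
        code_points.append(value)
        i += length
    return code_points


if __name__ == "__main__":
    print([hex(cp) for cp in decode_utf8("おすも".encode("utf-8"))])
```

The 2-byte and 3-byte branches differ only in the lead-byte mask and the number of continuation bytes, which makes them an easy place for small mask or shift mistakes.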

Enjoy.


Posted: Mon Dec 19, 2011 6:39 pm

Joined: Sat Jan 23, 2010 4:56 pm
Posts: 5507
I've finally sucked this into my work tree and tested it. Looks good, except for a couple of bugs in the 2-byte and 3-byte UTF8 cases -- easily fixed.

See this post: viewtopic.php?f=7&t=3995

Thanks again!


Posted: Wed Dec 28, 2011 4:56 pm

Joined: Mon Jun 29, 2009 5:51 am
Posts: 583
I've pulled this into the I6-in-I7 tree at https://github.com/DavidKinder/Inform6, which will eventually become Inform 6.33.

By the way, do you have a name that you'd like to appear in the credits?


Posted: Sat Jun 23, 2012 6:07 pm

Joined: Wed Jun 08, 2011 8:53 am
Posts: 4
@DavidK: Thanks! You can put "Xun Gong".


Posted: Mon Jun 26, 2017 9:06 pm

Joined: Tue Feb 21, 2017 11:32 pm
Posts: 3
How did this project turn out? Is it finished?

