2010/2/4 John M McIntosh
<johnmci(a)smalltalkconsulting.com>:
First let me assume that WAKomEncoded is what I
should be starting, versus WAKom ?
We are talking about Seaside 2.8, right?
WAKom: takes the bytes (!) as sent from the client and creates a
ByteString from them without any decoding, which means a character
that is encoded in two bytes in UTF-8 will take up two Characters,
their values being the values of the bytes
WAKomEncoded: does UTF-8 de/encoding on the input, which will create
WideStrings for non-Latin-1 strings
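
To make the difference concrete, here is a small workspace sketch. It
assumes your image has String>>convertFromEncoding: (check before
relying on it), and the three bytes are just an example value: the
UTF-8 encoding of U+D55C, a single Hangul syllable.

```smalltalk
| raw decoded |
"ED 95 9C is the UTF-8 encoding of U+D55C (example value only)."
raw := (ByteArray with: 16rED with: 16r95 with: 16r9C) asString.

"Roughly what WAKom hands to the application: one Character per byte."
raw size.        "3"

"Roughly what WAKomEncoded hands over: the bytes decoded into one Character."
decoded := raw convertFromEncoding: 'utf-8'.
decoded size.    "1"
```

The point is only the size difference: with WAKom every String
operation sees three unrelated Characters where the user typed one.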
Us old Smalltalkers remember starting WAKom so in
WikiServer startup that is what happens.
I *guess* it really should be WAKomEncoded?
Judging by your problems below: yes (assuming you're cool with WideStrings)
So what's the fall out, I mean I can stuff
UTF8 chars into PRPages... Happy Happy.
Well not quite, I got a support email out of South Korea that the UTF8
character that was entered for the Page title was being mangled. In
fact if they use the *wrong* character the app would hang as it's
loading from binary storage to instantiate the PRPage.
In looking at this it turns out that because WAKom is used, the UTF8
data from the request is being passed as a String into PRStructure
(instance var name). Later lazy initialization is used to populate
title:

    title
        "Answer the title of the receiver, essentially the name but
        starting uppercase."
        ^ title ifNil: [ title := self name capitalized ]
Now here is the bad part: #capitalized runs Character>>asUppercase,
which actually is kinda Unicode aware, so it's attempting only to deal
with wide characters. But since the UTF8 character is multiple bytes
in a String, it mangles the first byte to uppercase, thus destroying
the meaning of the UTF8 sequence.
Yeah, that's expected. ;-)
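
A workspace sketch of why, under the same assumptions as any other
snippet here (String>>convertFromEncoding: present in the image, and
U+D55C encoded as ED 95 9C picked purely as an example). In Latin-1
the byte 16rED is a lowercase letter whose uppercase form is 16rCD, so
an #asUppercase that knows Latin-1 will rewrite the lead byte; whether
your image does exactly that is something to inspect rather than take
from me.

```smalltalk
| raw mangled |
raw := (ByteArray with: 16rED with: 16r95 with: 16r9C) asString.

"#capitalized only touches the first Character, which here is the
 UTF-8 lead byte and not a letter of the title."
mangled := raw capitalized.

"Inspect both. If the first byte is no longer 16rED, the three bytes
 no longer encode U+D55C."
raw asByteArray.
mangled asByteArray.

"Decoding first means #capitalized sees one real Character."
(raw convertFromEncoding: 'utf-8') capitalized size.    "1"
```

If the lead byte does become 16rCD, that byte announces a two-byte
sequence, which leaves the third byte as a stray continuation byte
that a UTF-8 decoder has no valid way to read.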
However now if I restart with WAKomEncoded, the squeak to utf8 process
then messes up the UTF8 data that was stored in the binary data file.
So thoughts on how to fix things when I load the PRPages from storage,
and what fields would need fixing are welcome.
Assuming you don't already have corrupted data in your image and want
to do a migration:
Option 1:
Do a utf-8 decoding on the Strings in your model and use WAKomEncoded
from that point on.
Option 2:
Hack #title method (and the other senders of #capitalized) to first do
a utf-8 decoding, then #capitalized and then utf-8 encoding. Continue
using WAKom.
On second thought:
Option 3:
Don't do the #capitalized at all and use CSS:
text-transform: capitalize
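
For Option 2, the hacked #title could look roughly like the sketch
below. It assumes String>>convertFromEncoding: and
String>>convertToEncoding: exist in your image and that the stored
name really is intact UTF-8; I have not run this against Pier.

```smalltalk
title
	"Sketch for Option 2: keep WAKom and treat the stored name as UTF-8 bytes."
	^ title ifNil: [
		title := ((self name convertFromEncoding: 'utf-8') capitalized)
			convertToEncoding: 'utf-8' ]
```

For Option 1 the same decoding call would be run once over each
affected String before switching to WAKomEncoded. The block name
below is made up for illustration, and which fields to feed through
it is for you to determine:

```smalltalk
| decodeStored |
"Hypothetical helper: decode one stored String, leaving nil untouched."
decodeStored := [ :aString |
	aString isNil
		ifTrue: [ nil ]
		ifFalse: [ aString convertFromEncoding: 'utf-8' ] ].
```

Either way, do it on a copy of the data first, since a String that was
already mangled by #capitalized will not decode back to what the user
typed.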
Cheers
Philippe