Hi Phil,

I’ve been eyeing my chainsaw with a really bizarre and gloomy look these days.
So I guess leadingChar will not survive the spring this year (evil laugh in the background), if you see what I mean.
We should stop tiptoeing around this mess and fix it.

Stef


On 27 Feb 2014, at 10:19, phil@highoctane.be wrote:

Couldn't this be dumped in a Pharo book chapter in draft form to avoid losing it?


Phil


On Thu, Feb 27, 2014 at 8:50 AM, Stéphane Ducasse <stephane.ducasse@inria.fr> wrote:
>
>
> By the way, I would love to have a short tutorial about unicode characters at esug.

Me too, but if nobody starts looking at leadingChar and friends and trying to understand them, it will not happen.
I sent a note about my analysis a while ago and nobody reacted.

allCharacters
        "This name is obsolete since only the characters that will fit in a byte can be queried"
        ^self allByteCharacters


=> all the senders should use allByteCharacters


During my journey into the leadingChar realm I took notes, and I am sharing them with you here.

leadingChar: leadChar code: code

        code >= 16r400000 ifTrue: [
                self error: 'code is out of range'.
        ].
        leadChar >= 256 ifTrue: [
                self error: 'lead is out of range'.
        ].
        code < 256 ifTrue: [ ^self value: code ].
        ^self value: (leadChar bitShift: 22) + code.

charCode
        ^ (value bitAnd: 16r3FFFFF).

leadingChar
        ^ (value bitAnd: (16r3FC00000)) bitShift: -22.

characterSet
        ^ EncodedCharSet charsetAt: self leadingChar

=> a character's value encodes its character set: the 8 bits above bit 22 hold leadingChar and the low 22 bits hold the code.
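
To make the bit layout concrete, here is a small workspace sketch that mirrors the three methods above with plain integers instead of Character. The lead value 5 and the codes 16r3042 and 65 are arbitrary values picked for illustration, and the pack block is a hypothetical helper, not something from the image.

```smalltalk
| pack value charCode decodedLead small |

"Hypothetical helper mirroring leadingChar:code: without the range checks.
 Codes below 256 ignore the lead entirely, as in the method above."
pack := [ :lead :code |
	code < 256
		ifTrue: [ code ]
		ifFalse: [ (lead bitShift: 22) + code ] ].

"Pack an assumed lead of 5 with an assumed code of 16r3042."
value := pack value: 5 value: 16r3042.

"Low 22 bits, as in charCode."
charCode := value bitAnd: 16r3FFFFF.

"The 8 bits starting at bit 22, as in leadingChar."
decodedLead := (value bitAnd: 16r3FC00000) bitShift: -22.

"A byte-sized code keeps no lead at all."
small := pack value: 5 value: 65.
```

After evaluating this, charCode is 16r3042 again, decodedLead is 5, and small is just 65: for byte characters the lead is silently dropped.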





============================
Why are we using 0 for both of these?
        Latin1>>leadingChar
                ^ 0.
        Unicode>>leadingChar
                ^ 0

and I do not get why all of these answer 0 as well:
        GreekEnvironment>>leadingChar
                 ^0
        Latin2Environment>>leadingChar
                 ^0
        Latin1Environment>>leadingChar
                 ^0
        Latin9Environment>>leadingChar
                 ^0
        RussianEnvironment>>leadingChar
                 ^0
        SimplifiedChineseEnvironment>>leadingChar
                 ^0

======================
I do not understand why Unicode is declared at index 1 (0+1) and not 0.

Unicode class>>initialize


        EncodedCharSet declareEncodedCharSet: self atIndex: 0+1.
        EncodedCharSet declareEncodedCharSet: self atIndex: 256.
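
One plausible reading, and it is only an assumption on my side: the n+1 pattern in the old initialize list quoted further down suggests the table is a 1-based Array indexed by leadingChar + 1, so that leadingChar 0 lands in slot 1. A minimal workspace sketch of that idea, where the table variable and the symbols in it are stand-ins for illustration and not the real EncodedCharSets:

```smalltalk
| table leadingChar found last |

"Stand-in for the class-side table; Smalltalk Arrays start at index 1."
table := Array new: 256.

"Same shape as the declarations above: slot = leadingChar + 1."
table at: 0 + 1 put: #Unicode.
table at: 5 + 1 put: #JapaneseEnvironment.
table at: 256 put: #Unicode.

"Looking up leadingChar 0 has to add 1, since there is no slot 0."
leadingChar := 0.
found := table at: leadingChar + 1.

"Under this reading, slot 256 would correspond to leadingChar 255."
last := table at: 255 + 1.
```

Here found and last both answer #Unicode. If the lookup really works this way, then 0+1 is simply leadingChar 0 shifted for 1-based indexing.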



================================
I do not understand why Latin1 does not use declareEncodedCharSet:atIndex:

Latin1 class>>initialize
        "self initialize"
        compoundTextSequence := String streamContents:
                [ :s |
                s nextPut: (Character value: 27).
                s nextPut: $(.
                s nextPut: $B ].
        rightHalfSequence := String streamContents:
                [ :s |
                s nextPut: (Character value: 27).
                s nextPut: $-.
                s nextPut: $A ]


I started to distribute the initialization into subclasses starting from this method:

declareEncodedCharSet: anEncodedCharSetOrLanguageEnvironmentClass atIndex: aNumber

"this method is used to modularize the old initialize method:
        EncodedCharSets at: 0+1 put: Unicode.
        EncodedCharSets at: 1+1 put: JISX0208.
        EncodedCharSets at: 2+1 put: GB2312.
        EncodedCharSets at: 3+1 put: KSX1001.
        EncodedCharSets at: 4+1 put: JISX0208.
        EncodedCharSets at: 5+1 put: JapaneseEnvironment.
        EncodedCharSets at: 6+1 put: SimplifiedChineseEnvironment.
        EncodedCharSets at: 7+1 put: KoreanEnvironment.
        EncodedCharSets at: 8+1 put: GB2312.
        EncodedCharSets at: 12+1 put: KSX1001.
        EncodedCharSets at: 13+1 put: GreekEnvironment.
        EncodedCharSets at: 14+1 put: Latin2Environment.
        EncodedCharSets at: 15+1 put: RussianEnvironment.
        EncodedCharSets at: 17+1 put: Latin9Environment.
        EncodedCharSets at: 256 put: Unicode."

and indeed Latin1Environment was not part of the list.

Now apparently we can remove Latin1, because

        the EncodedCharSets table of EncodedCharSet does not contain Latin1


==================================
No senders:
        emitSequenceToResetStateIfNeededOn: aStream forState: state
        rightDirection


Funny
        nextPutRightHalfValue: ascii toStream: aStream withShiftSequenceIfNeededForTextConverterState: state
        nextPutValue: ascii toStream: aStream withShiftSequenceIfNeededForTextConverterState: state

==========================




_______________________________________________
Moose-dev mailing list
Moose-dev@iam.unibe.ch
https://www.iam.unibe.ch/mailman/listinfo/moose-dev

