Hi Phil,
I have been giving my chainsaw a rather strange and gloomy look these days. So I guess that leading char will not survive the spring this year (evil laugh in the background), if you see what I mean. We should stop dancing around this mess and fix it.
Stef
Couldn't this be dumped in a Pharo book chapter in draft form to avoid losing it?
Phil
On Thu, Feb 27, 2014 at 8:50 AM, Stéphane Ducasse stephane.ducasse@inria.fr wrote:
By the way, I would love to have a short tutorial about Unicode characters at ESUG.
Me too, but if nobody starts to have a look and tries to understand leadingChar and friends, it will not happen. I sent a note about my analysis a while ago and nobody reacted.
allCharacters
    "This name is obsolete since only the characters that will fit in a byte can be queried"
    ^ self allByteCharacters
=> all the senders should use allByteCharacters
During my journey to the leadingChar realm I took notes, and I am sharing them with you here.
leadingChar: leadChar code: code
    code >= 16r400000 ifTrue: [ self error: 'code is out of range' ].
    leadChar >= 256 ifTrue: [ self error: 'lead is out of range' ].
    code < 256 ifTrue: [ ^ self value: code ].
    ^ self value: (leadChar bitShift: 22) + code

charCode
    ^ value bitAnd: 16r3FFFFF

leadingChar
    ^ (value bitAnd: 16r3FC00000) bitShift: -22

characterSet
    ^ EncodedCharSet charsetAt: self leadingChar
=> a character encodes its character set (through its leadingChar).
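To make the bit layout in these methods concrete, here is a minimal playground sketch. It uses plain integers instead of Character, so it does not depend on what the Character API looks like in a given image. The sample values (lead 5, code 16r4E2D) are arbitrary illustrations and are not taken from any real charset table.

```smalltalk
| leadChar code packed unpackedCode unpackedLead |
leadChar := 5.        "hypothetical leadingChar, must be < 256"
code := 16r4E2D.      "hypothetical code, must be < 16r400000"

"Pack: the leadingChar goes into bits 22..29, the code into bits 0..21."
packed := (leadChar bitShift: 22) + code.

"Unpack with the same masks that charCode and leadingChar use."
unpackedCode := packed bitAnd: 16r3FFFFF.
unpackedLead := (packed bitAnd: 16r3FC00000) bitShift: -22.

unpackedCode = code.        "true"
unpackedLead = leadChar.    "true"
```

With leadChar = 0 the packed value is simply the code itself, since (0 bitShift: 22) + code = code.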
============================
Why are we using a leadingChar of 0 in both of these?

Latin1>>leadingChar ^ 0
Unicode>>leadingChar ^ 0

And I do not get why all of these return 0 as well:

GreekEnvironment>>leadingChar ^ 0
Latin2Environment>>leadingChar ^ 0
Latin1Environment>>leadingChar ^ 0
Latin9Environment>>leadingChar ^ 0
RussianEnvironment>>leadingChar ^ 0
SimplifiedChineseEnvironment>>leadingChar ^ 0
======================
I do not understand why Unicode is declared as 1 and not 0.

Unicode class>>initialize
    EncodedCharSet declareEncodedCharSet: self atIndex: 0+1.
    EncodedCharSet declareEncodedCharSet: self atIndex: 256.
================================
I do not understand why Latin1 does not use declareEncodedCharSet

Latin1 class>>initialize
    " self initialize "
    compoundTextSequence := String streamContents: [ :s |
        s nextPut: (Character value: 27).
        s nextPut: $(.
        s nextPut: $B ].
    rightHalfSequence := String streamContents: [ :s |
        s nextPut: (Character value: 27).
        s nextPut: $-.
        s nextPut: $A ]
I started to distribute the initialization into subclasses starting from this method:
declareEncodedCharSet: anEncodedCharSetOrLanguageEnvironmentClass atIndex: aNumber
    "this method is used to modularize the old initialize method:
    EncodedCharSets at: 0+1 put: Unicode.
    EncodedCharSets at: 1+1 put: JISX0208.
    EncodedCharSets at: 2+1 put: GB2312.
    EncodedCharSets at: 3+1 put: KSX1001.
    EncodedCharSets at: 4+1 put: JISX0208.
    EncodedCharSets at: 5+1 put: JapaneseEnvironment.
    EncodedCharSets at: 6+1 put: SimplifiedChineseEnvironment.
    EncodedCharSets at: 7+1 put: KoreanEnvironment.
    EncodedCharSets at: 8+1 put: GB2312.
    EncodedCharSets at: 12+1 put: KSX1001.
    EncodedCharSets at: 13+1 put: GreekEnvironment.
    EncodedCharSets at: 14+1 put: Latin2Environment.
    EncodedCharSets at: 15+1 put: RussianEnvironment.
    EncodedCharSets at: 17+1 put: Latin9Environment.
    EncodedCharSets at: 256 put: Unicode.
    "
And indeed, Latin1Environment was not part of the list.
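One possible reading of the "0+1" notation in that list, sketched for the playground: the table is an ordinary 1-based Array, and a leadingChar n is stored in slot n + 1. That the lookup adds 1 to the leadingChar is an assumption inferred from the notation, not something checked against charsetAt:. The table and the block below are stand-ins using symbols, not the real EncodedCharSets or the real classes.

```smalltalk
| table charsetAt |
"Stand-in for EncodedCharSets: 256 slots, symbols instead of classes."
table := Array new: 256.
table at: 0 + 1 put: #Unicode.
table at: 13 + 1 put: #GreekEnvironment.
table at: 256 put: #Unicode.

"Assumed lookup rule: leadingChar n lives in slot n + 1."
charsetAt := [ :leadingChar | table at: leadingChar + 1 ].

charsetAt value: 0.      "#Unicode"
charsetAt value: 13.     "#GreekEnvironment"
charsetAt value: 255.    "#Unicode"
```

Under that reading, "atIndex: 0+1" is simply leadingChar 0 in a 1-based Array, and "atIndex: 256" is leadingChar 255.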
Now, apparently we can remove Latin1, because the EncodedCharSets of EncodedCharSet do not contain Latin1.
==================================
No senders:
    emitSequenceToResetStateIfNeededOn: aStream forState: state rightDirection

Funny:
    nextPutRightHalfValue: ascii toStream: aStream withShiftSequenceIfNeededForTextConverterState: state
    nextPutValue: ascii toStream: aStream withShiftSequenceIfNeededForTextConverterState: state
==========================
Moose-dev mailing list Moose-dev@iam.unibe.ch https://www.iam.unibe.ch/mailman/listinfo/moose-dev