I tried PetitParser for the first time a few days ago, to import some data required for testing my Master's project (which is due RealSoonNow(c)). I found it easy to adapt PetitCSV for part of my requirement, but am having trouble extending it further. I was hoping to get a lot of my project completed this vacation, but this is slowing me down.
Now since everyone has better things to do this festive break, I am offering two small bounties of AUD $50 each, in the hope of drawing the attention of someone more familiar with PetitParser, for whom this should be much quicker than my random experimenting.
The attached LEImportSKMPowerToolsData.txt shows a sample of what I need to parse. It is a single file containing multiple tables (12 of them), with the data of each table in CSV format. Each table starts with a line "// --- tablename ---", followed by the column names on the next line, followed by multiple lines of data.
Using the attached LEImportSKMPowerTools.st (also see the attached LEImportSKMPowerTools.txt, formatted for quick reference), I can successfully parse the first table of the following sample...
=========
// --- Bus ---
//<ComponentName>,<ComponentType>,<SystemNominalVoltage>,<AF_ArcType>,<AF_WorkingDistance>
00BFA10,10,415,In Box,609.6
00BFA20,10,415,In Box,609.6
// --- Cable ---
//<ComponentName>,<SystemNominalVoltage>,<Phase>,<CableSize>,<NeutralSize>,<Length>,<Size/Do Not Size>
11P08023,11500,ABC,240,,680.0,Do Not Size
11P08024,11500,ABC,240,,610.0,Do Not Size
=========
... but the next table ends up as a subpart of the first as shown in the following result for the above sample...
#(
#('Bus'
#('ComponentName' 'ComponentType' 'SystemNominalVoltage' 'AF_ArcType' 'AF_WorkingDistance')
#(
#('00BFA10' '10' '415' 'In Box' '609.6')
#('00BFA20' '10' '415' 'In Box' '609.6')
#('')
#('// --- Cable ---')
#('//<ComponentName>' '<SystemNominalVoltage>' '<Phase>' '<CableSize>' '<NeutralSize>' '<Length>' '<Size/Do Not Size>')
#('11P08023' '11500' 'ABC' '240' '' '680.0' 'Do Not Size')
#('11P08024' '11500' 'ABC' '240' '' '610.0' 'Do Not Size')
)
)
)
Essentially, I don't know how to close off the pattern matching at the blank line following the data rows, so that the next table can be started. Also, something I haven't looked at yet: LEImportSKMPowerToolsData.txt has comment lines starting with '//' that need to be ignored along with the blank lines.
So two things are required for the first $50.
1. The above result needs to instead have two close-round-brackets in place of the blank line #('') so that #('Cable' appears at the same level as #('Bus'
2. The comments and blank lines need to be ignored.
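To be explicit about requirement 1, this is the result I am after for the two-table sample above. It is written out by hand from that sample, not produced by any parser:

```smalltalk
#(
#('Bus'
#('ComponentName' 'ComponentType' 'SystemNominalVoltage' 'AF_ArcType' 'AF_WorkingDistance')
#(
#('00BFA10' '10' '415' 'In Box' '609.6')
#('00BFA20' '10' '415' 'In Box' '609.6')
)
)
#('Cable'
#('ComponentName' 'SystemNominalVoltage' 'Phase' 'CableSize' 'NeutralSize' 'Length' 'Size/Do Not Size')
#(
#('11P08023' '11500' 'ABC' '240' '' '680.0' 'Do Not Size')
#('11P08024' '11500' 'ABC' '240' '' '610.0' 'Do Not Size')
)
)
)
```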
In addition, all of the PetitParser documentation that I can find says "parse a string (or stream)", but I see no actual samples of using a stream. Changing from a string to a stream is pretty basic and I should probably be able to guess it, but while I have your attention I may as well learn what is best practice.
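For what it is worth, my untested guess for the stream case is something like the following. It assumes that #parse: really does accept a stream, as the documentation says, and that the data file sits in the image's working directory:

```smalltalk
"Untested guess at parsing from a file stream rather than a string.
 Assumption: #parse: accepts a stream as well as a string."
| stream result |
stream := FileStream readOnlyFileNamed: 'LEImportSKMPowerToolsData.txt'.
result := [ LEImportSKMPowerTools new parse: stream ]
	ensure: [ stream close ].	"close the file even if parsing fails"
```

If a stream turns out not to be accepted directly, I suppose passing "stream contents" (the whole file as a string) would be the fallback, but I don't know which is considered best practice.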
------------
I also offer a second AUD $50 for the following...
1. A method to which I pass a filename, that will return an object which can be accessed with the tablename to return a collection of rows,
2. A cell in one of those rows can be accessed by the column name.
This will just be a temporary step to move the data into my own internal format, so it doesn't need to be elegant. I had two rough separate ideas:
a. A simple dictionary of array of dictionaries.
b. A method that when passed a column name will return the array index into a row.
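To make those two ideas concrete, here is a rough workspace sketch using the Bus rows from the sample above. The variable names are only for illustration and nothing here depends on PetitParser:

```smalltalk
"Rough sketch of ideas (a) and (b), using hand-copied Bus data."
| columnNames rows tables typeIndex |
columnNames := #('ComponentName' 'ComponentType' 'SystemNominalVoltage' 'AF_ArcType' 'AF_WorkingDistance').
rows := #(
	#('00BFA10' '10' '415' 'In Box' '609.6')
	#('00BFA20' '10' '415' 'In Box' '609.6') ).

"(a) table name -> array of rows, each row a dictionary keyed by column name"
tables := Dictionary new.
tables at: 'Bus' put: (rows collect: [ :row |
	| cells |
	cells := Dictionary new.
	columnNames with: row do: [ :name :cell | cells at: name put: cell ].
	cells ]).
((tables at: 'Bus') first) at: 'AF_ArcType'.	"answers 'In Box'"

"(b) keep rows as plain arrays and look up a column's index by name"
typeIndex := columnNames indexOf: 'ComponentType'.
(rows at: 2) at: typeIndex.	"answers '10'"
```

Either would do for me. Idea (a) reads more naturally at the point of use, while (b) avoids building a dictionary for every row.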
------------
So I hope that it is worth someone's while to attend to this quickly. Technical discussion should continue here. I am not quite sure yet how I'll handle multiple solutions. The method of payment will need to be discussed privately with contributors once the requirements are met.
hope you are all enjoying your festive season...
cheers -ben
'From Pharo1.4 of 18 April 2012 [Latest update: #14457] on 26 December 2012 at 10:58:18 am'!
PPCompositeParser subclass: #LEImportSKMPowerTools
instanceVariableNames: 'endOfLine nonComma start tableName columnNames columnName columnNamesRow table dataCell dataRow dataRows'
classVariableNames: ''
poolDictionaries: ''
category: 'Lektrek-ImportExport'!
!LEImportSKMPowerTools methodsFor: 'as yet unclassified' stamp: 'BenComan 12/24/2012 15:23'!
dataCell
^ nonComma star flatten
==> [ :nodes | nodes value ]! !
!LEImportSKMPowerTools methodsFor: 'as yet unclassified' stamp: 'BenComan 12/24/2012 19:37'!
dataRows
^ (dataRow delimitedBy: endOfLine) ==> [ :nodes | nodes reject: [ :each | each class = PPToken ] ]! !
!LEImportSKMPowerTools methodsFor: 'as yet unclassified' stamp: 'BenComan 12/24/2012 12:18'!
endOfLine
^ #newline asParser token! !
!LEImportSKMPowerTools methodsFor: 'as yet unclassified' stamp: 'BenComan 12/24/2012 12:23'!
nonComma
^ PPPredicateObjectParser anyExceptAnyOf: {Character tab . Character cr . Character lf . $, }! !
!LEImportSKMPowerTools methodsFor: 'grammar' stamp: 'BenComan 12/24/2012 15:38'!
columnName
^ $< asParser , (#letter asParser / #digit asParser / #space asParser / $_ asParser / $/ asParser ) star flatten , $> asParser
==> [ :nodes | nodes second value ]! !
!LEImportSKMPowerTools methodsFor: 'grammar' stamp: 'BenComan 12/24/2012 14:54'!
columnNames
^ ((columnName delimitedBy: $, asParser token) ) ==> [ :nodes | nodes reject: [ :each | each class = PPToken ] ]! !
!LEImportSKMPowerTools methodsFor: 'grammar' stamp: 'BenComan 12/24/2012 14:56'!
columnNamesRow
^ '//' asParser, columnNames
==> [ :nodes | nodes second value ]! !
!LEImportSKMPowerTools methodsFor: 'grammar' stamp: 'BenComan 12/24/2012 15:23'!
dataRow
^ (dataCell delimitedBy: $, asParser token) ==> [ :nodes | nodes reject: [ :each | each class = PPToken ] ]! !
!LEImportSKMPowerTools methodsFor: 'grammar' stamp: 'BenComan 12/26/2012 10:30'!
start
^ table star end! !
!LEImportSKMPowerTools methodsFor: 'grammar' stamp: 'BenComan 12/26/2012 10:38'!
table
^ ( (tableName , endOfLine, columnNamesRow , endOfLine , dataRows)
==> [ :nodes | { nodes first . nodes third . nodes fifth } ] )! !
!LEImportSKMPowerTools methodsFor: 'grammar' stamp: 'BenComan 12/24/2012 14:24'!
tableName
^ '// --- ' asParser, #word asParser star flatten, ' ---' asParser
==> [ :nodes | nodes second value ]! !
PPCompositeParser subclass: #LEImportSKMPowerTools
instanceVariableNames: 'endOfLine nonComma start table tableName columnNamesRow columnNames columnName dataRows dataRow dataCell'
LEImportSKMPowerTools>>start
^ table star end
LEImportSKMPowerTools>>table
^ ( (tableName , endOfLine, columnNamesRow , endOfLine , dataRows)
==> [ :nodes | { nodes first . nodes third . nodes fifth } ] )
LEImportSKMPowerTools>>tableName
^ '// --- ' asParser, #word asParser star flatten, ' ---' asParser
==> [ :nodes | nodes second value ]
LEImportSKMPowerTools>>columnNamesRow
^ '//' asParser, columnNames
==> [ :nodes | nodes second value ]
LEImportSKMPowerTools>>columnNames
^ ((columnName delimitedBy: $, asParser token) )
==> [ :nodes | nodes reject: [ :each | each class = PPToken ] ]
LEImportSKMPowerTools>>columnName
^ $< asParser , (#letter asParser / #digit asParser / #space asParser / $_ asParser / $/ asParser ) star flatten , $> asParser
==> [ :nodes | nodes second value ]
LEImportSKMPowerTools>>dataRows
^ (dataRow delimitedBy: endOfLine)
==> [ :nodes | nodes reject: [ :each | each class = PPToken ] ]
LEImportSKMPowerTools>>dataRow
^ (dataCell delimitedBy: $, asParser token)
==> [ :nodes | nodes reject: [ :each | each class = PPToken ] ]
LEImportSKMPowerTools>>dataCell
^ nonComma star flatten
==> [ :nodes | nodes value ]
LEImportSKMPowerTools>>endOfLine
^ #newline asParser token
LEImportSKMPowerTools>>nonComma
^ PPPredicateObjectParser anyExceptAnyOf: {Character tab . Character cr . Character lf . $, }
// Export start : 12/18/12 18:18:07
// Datablock format : All Input Data
// Query name : *ALL COMPONENTS
// --- Bus ---
//<ComponentName>,<ComponentType>,<SystemNominalVoltage>,<AF_ArcType>,<AF_WorkingDistance>
00BFA10,10,415,In Box,609.6
00BFA20,10,415,In Box,609.6
// The following field(s) are enumerated data types
// Either the quoted text or integer may be used in importing
// <InService>: "Out"=0 "In"=1
// <Phase>: "None"=0 "A"=1 "B"=2 "C"=4 "AB"=3 "AC"=5 "BC"=6 "ABC"=7 "AB MidTap"=48 "BC MidTap"=96 "CA MidTap"=80 "ABC MidTap"=112
// <Size/Do Not Size>: "Size"=0 "Do Not Size"=1
// --- Cable ---
//<ComponentName>,<SystemNominalVoltage>,<Phase>,<CableSize>,<NeutralSize>,<Length>,<Size/Do Not Size>
11P08023,11500,ABC,240,,680.0,Do Not Size
11P08024,11500,ABC,240,,610.0,Do Not Size