Hi Ben,
I did not look into the details (mostly because you did not provide easily loadable
code), but from what I can read:
- you need to create a stack in your parser to make sure that the rows go to the right
component
- one way to do it is to define a components variable in the Parser
- make sure you declare it in ignoredNames
    Parser class>>ignoredNames
        ^ super ignoredNames , #(components)
- in columnNamesRow, add the newly created component to components
- always associate the dataRows to the last created component
Cheers,
Doru
On 26 Dec 2012, at 08:17, Ben Coman <btc(a)openInWorld.com> wrote:
I tried PetitParser for the first time a few days
ago, to import some data required for testing my Masters project (which is due
RealSoonNow(c)). I found it easy to adapt PetitCSV for part of my requirement, but am
having trouble extending it further. I was hoping to get a lot done this vacation on
completing my project but this is slowing me down.
Now since everyone has better things to do this festive break, I am offering two small
bounties of AUD $50 to hopefully draw the attention of someone more familiar with
PetitParser, for whom this will hopefully be much quicker than my random experimenting.
The attached LEImportSKMPowerToolsData.txt shows a sample of what I need to parse: a
single file containing multiple tables (12 of them), with the data of each table in CSV format.
Tables start with a line "// --- tablename ---", followed by column names on the
next line, followed by multiple lines of data.
Using the attached LEImportSKMPowerTools.st (see also the attached LEImportSKMPowerTools.txt,
formatted for quick reference), I can successfully parse the first table of the following
sample...
=========
// --- Bus ---
//<ComponentName>,<ComponentType>,<SystemNominalVoltage>,<AF_ArcType>,<AF_WorkingDistance>
00BFA10,10,415,In Box,609.6
00BFA20,10,415,In Box,609.6
// --- Cable ---
//<ComponentName>,<SystemNominalVoltage>,<Phase>,<CableSize>,<NeutralSize>,<Length>,<Size/Do Not Size>
11P08023,11500,ABC,240,,680.0,Do Not Size
11P08024,11500,ABC,240,,610.0,Do Not Size
=========
... but the next table ends up as a subpart of the first as shown in the following result
for the above sample...
#(
  #('Bus'
    #('ComponentName' 'ComponentType' 'SystemNominalVoltage' 'AF_ArcType' 'AF_WorkingDistance')
    #(
      #('00BFA10' '10' '415' 'In Box' '609.6')
      #('00BFA20' '10' '415' 'In Box' '609.6')
      #('')
      #('// --- Cable ---')
      #('//<ComponentName>' '<SystemNominalVoltage>' '<Phase>' '<CableSize>' '<NeutralSize>' '<Length>' '<Size/Do Not Size>')
      #('11P08023' '11500' 'ABC' '240' '' '680.0' 'Do Not Size')
      #('11P08024' '11500' 'ABC' '240' '' '610.0' 'Do Not Size')
    )
  )
)
Essentially I don't know how to close off the pattern matching at the blank line
following the data rows, so that the next table can be started. Also, something I
haven't looked at yet: LEImportSKMPowerToolsData.txt has comment lines starting with
'//' that need to be ignored, along with the blank lines.
So two things are required for the first $50.
1. The above result needs to instead have two close-round-brackets in place of the blank
line #('') so that #('Cable' appears at the same level as #('Bus'.
2. The comments and blank lines need to be ignored.
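
One possible way to meet both requirements, offered as an untested sketch rather than a confirmed fix. The nesting seems to arise because dataCell can match the empty string, so dataRow also succeeds on blank lines and on lines beginning with '//'. Guarding dataRow with lookahead should make dataRows stop at the first such line, leaving the rest for the next table. ignoredLine is a hypothetical new production (it would have to be added to instanceVariableNames). The sketch also assumes that every ignored line ends with a newline and that comment lines are not wrapped across several lines.

```smalltalk
"Untested sketch. ignoredLine is a hypothetical production, not part of the original class;
 it must be added to instanceVariableNames before it can be used."

LEImportSKMPowerTools>>dataRow
    "Refuse to start a row at a '//' line, at a blank line, or at the end of the input."
    ^ ('//' asParser not , #newline asParser negate and , (dataCell delimitedBy: $, asParser token))
        ==> [ :nodes | nodes third reject: [ :each | each class = PPToken ] ]

LEImportSKMPowerTools>>ignoredLine
    "A blank line, or a '//' comment line that is not a table header."
    ^ endOfLine
        / (tableName not , '//' asParser , #newline asParser negate star , endOfLine)

LEImportSKMPowerTools>>start
    "Skip ignored lines before each table and after the last one; answer only the tables."
    ^ ((ignoredLine star , table) star , ignoredLine star) end
        ==> [ :nodes | nodes first collect: [ :each | each second ] ]
```

With these changes the table production itself stays as it is: once dataRows stops at the blank line, the surrounding star in start can pick up '// --- Cable ---' as a new table at the same level as 'Bus'.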
In addition, all of the PetitParser documentation that I find says "parse a string
(or stream)" but I see no actual samples of using a stream. Changing from a string to a
stream is pretty basic and I could probably guess it, but while I have your attention I
may as well learn what is best practice.
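
A minimal sketch of the string-versus-stream question, assuming the stock PetitParser behaviour in which parse: converts its argument with asPetitStream, so that a String and a ReadStream over the same characters are interchangeable. The sample text is cut down from the data above, and FileStream class>>readOnlyFileNamed:do: is assumed to be available in this Pharo version.

```smalltalk
| parser eol text fromString fromStream fromFile |
parser := LEImportSKMPowerTools new.
eol := String with: Character cr.
text := '// --- Bus ---' , eol , '//<ComponentName>,<ComponentType>' , eol , '00BFA10,10'.

"Parsing a String and parsing a ReadStream on it should give equal results."
fromString := parser parse: text.
fromStream := parser parse: text readStream.

"A failed parse answers a failure object rather than raising an error."
fromStream isPetitFailure
    ifTrue: [ Transcript show: fromStream message; cr ].

"For a file, reading the contents first avoids depending on how file streams are converted."
fromFile := FileStream
    readOnlyFileNamed: 'LEImportSKMPowerToolsData.txt'
    do: [ :stream | parser parse: stream contents ].
```

Whether handing over the stream directly is preferable to reading the contents first is a judgment call; for files of this size either should be fine.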
------------
I also offer a second AUD $50 for the following...
1. A method to which I pass a filename, that will return an object which can be accessed
with the tablename to return a collection of rows,
2. A cell in one of those rows can be accessed by the column name.
This will just be a temporary step to move the data into my own internal format, so it
doesn't need to be elegant. I had two rough separate ideas:
a. A simple dictionary of arrays of dictionaries.
b. A method that when passed a column name will return the array index into a row.
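
Both ideas can be sketched with plain collections. The snippet below is only an illustration: it assumes the parser answers one { tableName . columnNames . dataRows } triple per table, and it uses a hand-written, shortened copy of the Bus sample instead of a real parse result.

```smalltalk
| parsed tables busRows columns |
"Stand-in for a parse result: one { tableName . columnNames . dataRows } triple per table."
parsed := {
    { 'Bus' .
      #('ComponentName' 'ComponentType' 'SystemNominalVoltage') .
      #( #('00BFA10' '10' '415') #('00BFA20' '10' '415') ) } }.

"Idea (a): a Dictionary of tables, each an Array of Dictionaries keyed by column name."
tables := Dictionary new.
parsed do: [ :table |
    | names |
    names := table second.
    tables
        at: table first
        put: (table third collect: [ :row |
            | record |
            record := Dictionary new.
            "with:do: signals an error if a row has a different number of cells than there are columns."
            names with: row do: [ :name :cell | record at: name put: cell ].
            record ]) ].

busRows := tables at: 'Bus'.
(busRows first at: 'ComponentType') = '10'.    "true"

"Idea (b): translate a column name into an index into the raw row."
columns := parsed first second.
(parsed first third first at: (columns indexOf: 'ComponentType')) = '10'.    "true"
```

Idea (a) costs one Dictionary per row but reads naturally at the call site; idea (b) keeps the raw arrays and only needs the column list of each table.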
------------
So I hope that it is worth someone's while to attend to this quickly. Technical
discussion should continue here. I am not quite sure yet how I'll handle multiple
solutions. The method of payment will need to be discussed privately with contributors
once requirements are met.
hope you are all enjoying your festive season...
cheers -ben
'From Pharo1.4 of 18 April 2012 [Latest update: #14457] on 26 December 2012 at 10:58:18 am'!
PPCompositeParser subclass: #LEImportSKMPowerTools
    instanceVariableNames: 'endOfLine nonComma start tableName columnNames columnName columnNamesRow table dataCell dataRow dataRows'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Lektrek-ImportExport'!

!LEImportSKMPowerTools methodsFor: 'as yet unclassified' stamp: 'BenComan 12/24/2012 15:23'!
dataCell
    ^ nonComma star flatten
        ==> [ :nodes | nodes value ]! !

!LEImportSKMPowerTools methodsFor: 'as yet unclassified' stamp: 'BenComan 12/24/2012 19:37'!
dataRows
    ^ (dataRow delimitedBy: endOfLine) ==> [ :nodes | nodes reject: [ :each | each class = PPToken ] ]! !

!LEImportSKMPowerTools methodsFor: 'as yet unclassified' stamp: 'BenComan 12/24/2012 12:18'!
endOfLine
    ^ #newline asParser token! !

!LEImportSKMPowerTools methodsFor: 'as yet unclassified' stamp: 'BenComan 12/24/2012 12:23'!
nonComma
    ^ PPPredicateObjectParser anyExceptAnyOf: {Character tab . Character cr . Character lf . $, }! !

!LEImportSKMPowerTools methodsFor: 'grammar' stamp: 'BenComan 12/24/2012 15:38'!
columnName
    ^ $< asParser , (#letter asParser / #digit asParser / #space asParser / $_ asParser / $/ asParser ) star flatten , $> asParser
        ==> [ :nodes | nodes second value ]! !

!LEImportSKMPowerTools methodsFor: 'grammar' stamp: 'BenComan 12/24/2012 14:54'!
columnNames
    ^ ((columnName delimitedBy: $, asParser token) ) ==> [ :nodes | nodes reject: [ :each | each class = PPToken ] ]! !

!LEImportSKMPowerTools methodsFor: 'grammar' stamp: 'BenComan 12/24/2012 14:56'!
columnNamesRow
    ^ '//' asParser, columnNames
        ==> [ :nodes | nodes second value ]! !

!LEImportSKMPowerTools methodsFor: 'grammar' stamp: 'BenComan 12/24/2012 15:23'!
dataRow
    ^ (dataCell delimitedBy: $, asParser token) ==> [ :nodes | nodes reject: [ :each | each class = PPToken ] ]! !

!LEImportSKMPowerTools methodsFor: 'grammar' stamp: 'BenComan 12/26/2012 10:30'!
start
    ^ table star end! !

!LEImportSKMPowerTools methodsFor: 'grammar' stamp: 'BenComan 12/26/2012 10:38'!
table
    ^ ( (tableName , endOfLine, columnNamesRow , endOfLine , dataRows)
        ==> [ :nodes | { nodes first . nodes third . nodes fifth } ] )! !

!LEImportSKMPowerTools methodsFor: 'grammar' stamp: 'BenComan 12/24/2012 14:24'!
tableName
    ^ '// --- ' asParser, #word asParser star flatten, ' ---' asParser
        ==> [ :nodes | nodes second value ]! !
PPCompositeParser subclass: #LEImportSKMPowerTools
    instanceVariableNames: 'endOfLine nonComma start table tableName columnNamesRow columnNames columnName dataRows dataRow dataCell'

LEImportSKMPowerTools>>start
    ^ table star end

LEImportSKMPowerTools>>table
    ^ ( (tableName , endOfLine, columnNamesRow , endOfLine , dataRows)
        ==> [ :nodes | { nodes first . nodes third . nodes fifth } ] )

LEImportSKMPowerTools>>tableName
    ^ '// --- ' asParser, #word asParser star flatten, ' ---' asParser
        ==> [ :nodes | nodes second value ]

LEImportSKMPowerTools>>columnNamesRow
    ^ '//' asParser, columnNames
        ==> [ :nodes | nodes second value ]

LEImportSKMPowerTools>>columnNames
    ^ ((columnName delimitedBy: $, asParser token) )
        ==> [ :nodes | nodes reject: [ :each | each class = PPToken ] ]

LEImportSKMPowerTools>>columnName
    ^ $< asParser , (#letter asParser / #digit asParser / #space asParser / $_ asParser / $/ asParser ) star flatten , $> asParser
        ==> [ :nodes | nodes second value ]

LEImportSKMPowerTools>>dataRows
    ^ (dataRow delimitedBy: endOfLine)
        ==> [ :nodes | nodes reject: [ :each | each class = PPToken ] ]

LEImportSKMPowerTools>>dataRow
    ^ (dataCell delimitedBy: $, asParser token)
        ==> [ :nodes | nodes reject: [ :each | each class = PPToken ] ]

LEImportSKMPowerTools>>dataCell
    ^ nonComma star flatten
        ==> [ :nodes | nodes value ]

LEImportSKMPowerTools>>endOfLine
    ^ #newline asParser token

LEImportSKMPowerTools>>nonComma
    ^ PPPredicateObjectParser anyExceptAnyOf: {Character tab . Character cr . Character lf . $, }

// Export start : 12/18/12 18:18:07
// Datablock format : All Input Data
// Query name : *ALL COMPONENTS
// --- Bus ---
//<ComponentName>,<ComponentType>,<SystemNominalVoltage>,<AF_ArcType>,<AF_WorkingDistance>
00BFA10,10,415,In Box,609.6
00BFA20,10,415,In Box,609.6
// The following field(s) are enumerated data types
// Either the quoted text or integer may be used in importing
// <InService>: "Out"=0 "In"=1
// <Phase>: "None"=0 "A"=1 "B"=2 "C"=4 "AB"=3 "AC"=5 "BC"=6 "ABC"=7 "AB MidTap"=48 "BC MidTap"=96 "CA MidTap"=80 "ABC MidTap"=112
// <Size/Do Not Size>: "Size"=0 "Do Not Size"=1
// --- Cable ---
//<ComponentName>,<SystemNominalVoltage>,<Phase>,<CableSize>,<NeutralSize>,<Length>,<Size/Do Not Size>
11P08023,11500,ABC,240,,680.0,Do Not Size
11P08024,11500,ABC,240,,610.0,Do Not Size
_______________________________________________
Moose-dev mailing list
Moose-dev(a)iam.unibe.ch
https://www.iam.unibe.ch/mailman/listinfo/moose-dev
"We are all great at making mistakes."