Well, you are certainly free to contribute.
Heuristic interpretation of data could be useful, but it looks like an addition on top; the
core library should be fast and efficient.
On 18 Feb 2015, at 10:35, Andrea Ferretti
<ferrettiandrea(a)gmail.com> wrote:
For an example of what I am talking about, see
http://pandas.pydata.org/pandas-docs/version/0.15.2/io.html#csv-text-files
I agree that this is definitely too many options, but it gets the job
done for quick and dirty exploration.
The fact is that working with a dump of a table from your DB, whose
content you know, requires different tools than exploring the latest
open data that your local municipality has put online in yet
another messy format.
Enterprise programmers deal more often with the former, data
scientists with the latter, and I think there is room for both kinds of
tools.
2015-02-18 10:26 GMT+01:00 Andrea Ferretti <ferrettiandrea(a)gmail.com>:
Thank you Sven. I think this should be emphasized and made prominent on the
home page*. Still, libraries such as pandas are even more lenient,
doing things such as:
- autodetecting which fields are numeric in CSV files
- allowing you to fill missing data based on statistics (for instance, you
can say: where the field `age` is missing, use the average age)
Probably there is room for something built on top of Neo
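For instance, here is a rough sketch of what such a layer might look like, working
on the plain array-of-arrays that NeoCSVReader produces in its schema-less mode
(the column index and the averaging policy are only assumptions for illustration):

| rows ages known average |
"Read the file without any schema, skipping the header row."
rows := 'my-data.csv' asFileReference readStreamDo: [ :in |
    (NeoCSVReader on: in) skipHeader; upToEnd ].
"Try to coerce the (hypothetical) second column to numbers, keeping nil where parsing fails."
ages := rows collect: [ :row |
    [ (row at: 2) asNumber ] on: Error do: [ :ex | nil ] ].
"Fill the missing entries with the average of the values that did parse."
known := ages reject: [ :each | each isNil ].
average := known sum / known size.
ages := ages collect: [ :each | each ifNil: [ average ] ].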
* by the way, I suggest that the documentation on Neo could benefit
from a reorganization. Right now, the first topic in the NeoJSON
paper introduces JSON itself. I would argue that everyone who tries
to use the library already knows what JSON is. Still, there is no
example of how to read JSON from a file anywhere in the document.
2015-02-18 10:12 GMT+01:00 Sven Van Caekenberghe <sven(a)stfx.eu>:
>
>> On 18 Feb 2015, at 09:52, Andrea Ferretti <ferrettiandrea(a)gmail.com> wrote:
>>
>> Also, these tasks
>> often involve consuming data from various sources, such as CSV and
>> JSON files. NeoCSV and NeoJSON are still a little too rigid for the
>> task - libraries like pandas allow you to just feed in a CSV file and try to
>> make heads or tails of the content without having to define too much of
>> a schema beforehand.
>
> Both NeoCSV and NeoJSON can operate in two ways: (1) without the definition of any
> schemas, or (2) with the definition of schemas and mappings. The quick and dirty
> explore style is most certainly possible.
>
> 'my-data.csv' asFileReference readStreamDo: [ :in | (NeoCSVReader on: in) upToEnd ].
>
> => an array of arrays
>
> 'my-data.json' asFileReference readStreamDo: [ :in | (NeoJSONReader on: in) next ].
>
> => objects structured using dictionaries and arrays
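>
> To give an idea of mode (2), a minimal sketch (the Person class with #name: and
> #age: setters is hypothetical):
>
> 'my-data.csv' asFileReference readStreamDo: [ :in |
>     (NeoCSVReader on: in) skipHeader; recordClass: Person; addField: #name:; addIntegerField: #age:; upToEnd ].
>
> => an array of Person instances
>
> 'my-data.json' asFileReference readStreamDo: [ :in |
>     (NeoJSONReader on: in) mapInstVarsFor: Person; nextAs: Person ].
>
> => a Person instance with its instance variables filled from the JSON fields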
>
> Sven
>
>