On Fri, Jan 21, 2011 at 5:12 PM, Tudor Girba <tudor.girba@gmail.com> wrote:
Thanks for this very sensible analysis.
I would definitely trade any database support for an image that can scale to
use the full promise of 64 bits and of multiple VMs. However, it looks like it
will take a while until we get that. In the meantime, it is more practical to
use what exists.
Even if relational databases are not at all my preferred option, we know that
for Glorp there was an implementation in VW that served the purpose of storing
the objects and enabling mining algorithms (even at the cost of heavy database
interaction). So, it would be great to salvage this effort and have similar
support in Pharo.
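To make this concrete, here is a rough sketch of what Glorp-based storage
could look like in Pharo. The descriptor system (MooseEntityDescriptorSystem)
and the connection details are made up for illustration, and the Login/session
API follows the VW Glorp conventions, so names may differ in a Pharo port:

    "Hypothetical sketch: persist Moose entities through Glorp."
    | login session bigClasses |
    login := Login new
        database: PostgreSQLPlatform new;
        username: 'moose';
        password: 'secret';
        connectString: 'localhost:5432_moose';
        yourself.
    session := MooseEntityDescriptorSystem sessionForLogin: login.
    session login.
    "Register the entities of a model in one unit of work; Glorp generates the SQL."
    session inUnitOfWorkDo: [
        model entities do: [ :each | session register: each ] ].
    "Read back only what a mining algorithm needs."
    bigClasses := session read: FAMIXClass
        where: [ :each | each numberOfMethods > 50 ].
    session logout.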
In any case, I would definitely like to start an effort to look into Gemstone
and other object-oriented databases. Anybody interested in joining?
Just a question: would the new solid state disks not alleviate the problem of
disk speed?
For Gemstone, Hernan Wilkinson did some tests 1 or 2 years ago, and the
difference was... mmmm, I don't remember, but I think 40x faster. All the
Gemstone migrations, and I don't remember what else, were done with that SSD.
So yes, at least with Gemstone it changes a lot.
Cheers,
Doru
On 21 Jan 2011, at 12:15, Stephan Eggermont wrote:
On 20 Jan 2011, at 21:32, Tudor Girba wrote:
> The goal of Moose is to help us analyze data. This means: modeling, mining,
measuring, querying, visualizing, browsing etc. To do this, the prerequisite
is being able to manipulate the data. Right now, we have all objects in
memory. To be able to scale we need database support.
Currently, the models have to fit into a 32-bit address space. Modern machines
support much more than that (data points: 16 GB at 160 euros for my current
machine; standard workstations support 192 GB). Do you have many models that
wouldn't fit in 192 GB?
The kinds of analysis Moose does are not supported efficiently by standard
relational databases at all. They are optimized for a very different access
scheme: selecting a very small subset of the data and changing that. That
means that they are only able to provide reasonable results for datasets that
(nearly) fit into memory. In short: they allow you to avoid using a 64-bit
Pharo image and are able to use more cores. The price is having to copy data
to and from the database and having to generate queries that don't fit the
object model well. They are unlikely to provide better performance than a
straightforward 64-bit Pharo image would, but they can provide a short-term
solution.
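To illustrate the mismatch (using FAMIX-style names for illustration): a
one-line navigation in the image turns into a join plus a network round trip
per entity once the model sits behind an object-relational mapping:

    "In-image navigation: a direct pointer traversal."
    callers := aMethod incomingInvocations collect: [ :inv | inv from ].
    "Behind an O/R mapping the same step becomes something like
        SELECT m.* FROM invocation i
        JOIN method m ON i.from_id = m.id
        WHERE i.to_id = ?
    and an analysis that touches most of the model pays that cost per
    entity, unless the mapping layer prefetches aggressively."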
Datawarehouse-style databases (Vertica) and object-oriented databases
(Gemstone) are probably able to do better: datawarehouse databases by
pregenerating all kinds of cross sections and projections of the data, and
OODBs by navigating instead of joining (and Gemstone by being able to use all
memory). But even there, a lot of the Moose analyses seem to touch a large
part of the model, and the interactivity needed means that disk-based models
will never become popular.
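For the OODB case, a hedged sketch of what navigating instead of joining looks
like with Gemstone-style persistence by reachability (session setup omitted;
the model accessors are assumed):

    "Make the model persistent by attaching it to a committed root."
    UserGlobals at: #mooseModel put: model.
    System commitTransaction.
    "Later, possibly from another session: plain navigation, no query generation."
    bigClasses := (UserGlobals at: #mooseModel) allClasses
        select: [ :each | each methods size > 50 ].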
Scaling of Moose is more likely to come from going 64-bit and distributing the
model over multiple VMs.
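There is no ready-made API for that, but as a back-of-the-envelope sketch: a
model could be sharded over n images by hashing a stable entity ID, with
proxies for references that cross a partition boundary:

    "Assign each entity to one of numberOfImages worker images by hashing its ID."
    partitionIndex := (entity mooseID hash \\ numberOfImages) + 1.
    "Queries then fan out to all images and merge the partial results;
    references that cross a partition boundary have to go through proxies."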
Stephan
--
www.tudorgirba.com
"We cannot reach the flow of things unless we let go."