Hi guys
In the past Marco did a bridge to databases for meta-described models, but since FAMIX was not regular we could not get FAMIX models into databases. So I would be interested to know if there is an ongoing effort to do that, because this would be a real plus for Fame-described models. Maybe using Glorp. I was also thinking that it would be good to get all the source code of Pharo into a DB using Ring, and to get Torch in there too.
Stef
Hi,
Indeed, this is an important project. Marco's project was MetaDB, and it was done in VW for Meta using Glorp.
Alberto and I expressed interest in working on FameDB because he needs it for his experiments. The project is still at the beginning; we have only discussed the intention. It would be great if we could get someone else around this, especially someone with some DB know-how.
Cheers, Doru
-- www.tudorgirba.com
"Beauty is where we see it."
I was thinking that we could allocate a few months of Cyrille's time to that task. And Mariano is an expert on DBs, so we could ask him for feedback and advice.
Stef
That would be so great! We also said that Cyrille would work on some Morphs, but I guess it would be better to focus on the DB.
Cheers, Doru
-- www.tudorgirba.com
"Reasonable is what we are accustomed with."
On Jan 19, 2011, at 10:13 AM, Tudor Girba wrote:
That would be so great! We also said that Cyrille would work on some Morphs, but I guess it would be better to focus on the DB.
Yes, right now he is focusing on RPackage integration. We should give him feedback.
Stef
What do you hope to achieve by having these models in a relational database?
Stephan Eggermont
A database would provide scalability.
Of course, it does not have to be a relational database. I would also like to experiment with an OO one, but I know even less about those :)
Cheers, Doru
-- www.tudorgirba.com
"In a world where everything is moving ever faster, one might have better chances to win by moving slower."
What do you hope to achieve by having these models in a relational database?
Query all the versions of a class. All the differences between two changesets over a stream of changes. We do not care whether it is relational or not, but we should start somewhere.
Stef
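The two queries Stef mentions can be sketched over a plain in-memory stream of change records. The record shape below (dicts with 'changeset', 'class', 'version', 'selector') is invented for illustration; it is not the actual Ring or FAMIX schema:

```python
# A sketch of the two queries over an in-memory stream of change records.
# The record shape is an assumption, not the real Ring/FAMIX schema.

changes = [
    {"changeset": 1, "class": "Point", "version": 1, "selector": "x:"},
    {"changeset": 1, "class": "Point", "version": 1, "selector": "y:"},
    {"changeset": 2, "class": "Point", "version": 2, "selector": "r"},
    {"changeset": 2, "class": "Rectangle", "version": 1, "selector": "area"},
]

# Query 1: all the versions of a class.
point_versions = sorted({c["version"] for c in changes if c["class"] == "Point"})

# Query 2: the difference between two changesets over the stream of changes.
def touched(changeset_id):
    return {(c["class"], c["selector"]) for c in changes
            if c["changeset"] == changeset_id}

diff = touched(2) - touched(1)

print(point_versions)  # [1, 2]
print(sorted(diff))    # [('Point', 'r'), ('Rectangle', 'area')]
```

Whatever backend is chosen, these are the kinds of set-oriented lookups it would have to answer efficiently over a much larger stream.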
Hi, I strongly recommend NOT using a relational database. It is a very bad fit for OO (for several reasons; the best description I ever read is: "it is like disassembling your car every night before going to sleep, and reassembling it every morning before going to work"). I have worked with relational databases a lot and I can assure you it is a pain. Yes, with Mariano we did SqueakDBX, but only because in lots of jobs the database is not an option: you must use a relational DB (and usually Oracle). It is not because we think it is good for programming. Of course, GemStone can be a better solution... but if we want scalability for free, a NoSQL solution can be a good choice here (and there are some implementations in Pharo to choose from).
By the way, a document-oriented database (like MongoDB, already available for Pharo) is a good approach for non-regular structures.
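As a minimal illustration of why document stores fit non-regular structures, here is a sketch using plain Python dicts to mimic a MongoDB-style collection; the field names are invented, not the actual FAMIX metamodel or a real driver API:

```python
# A toy stand-in for a document store: a "collection" of dicts where
# entities of different kinds carry different fields, yet live together
# and are queried uniformly. Field names are illustrative assumptions.

documents = [
    {"kind": "Class", "name": "OrderedCollection", "numberOfMethods": 58},
    {"kind": "Method", "name": "add:", "parent": "OrderedCollection"},
    {"kind": "Attribute", "name": "firstIndex", "parent": "OrderedCollection"},
]

def find(docs, **criteria):
    """Mimic a MongoDB-style find({...}) over a list of dicts."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]

# One query fetches all members of a class, whatever their kind,
# with no schema forcing every entity to share the same columns:
members = find(documents, parent="OrderedCollection")
print([d["name"] for d in members])  # ['add:', 'firstIndex']
```

Nothing here requires a fixed table layout, which is exactly the property an irregular metamodel needs.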
my 2c.
best, Esteban
Thanks for the pointer.
Stef
Thanks for the tip, Esteban; that's interesting news.
FYI https://twitter.com/#!/renggli/status/17317173076
#moose got it right from the beginning: "For truly deep program analyses, relational database still don't work." Oege de Moor at #tools2010
:)
-- Simon Denier
Stef,
Are there any specific requirements for the DB? Is this for long-term storage of a lot of data, or just to extend main memory for a limited amount of time? What will be the main operations: searching for symbols/methods/class names, or traversing structures?
In case you want to add a lot of code over time and want general search capabilities stored in a central location, it might be a good idea to ask GemStone to sponsor a full license and install it somewhere at INRIA. Or am I missing the point?
Norbert
Hi Norbert,
The goal of Moose is to help us analyze data. This means: modeling, mining, measuring, querying, visualizing, browsing etc. To do this, the prerequisite is being able to manipulate the data. Right now, we have all objects in memory. To be able to scale we need database support.
So, all in all, it's not for a specific use case, but for any model that is described by Fame.
Cheers, Doru
But I believe the main goal is storing a lot of data and being able to query it fast.
So querying and scalability are the main requirements. Other requirements, like lots of updates or top-level security (redundancy, ...), are less important in this case.
nicolas
On 20 jan 2011, at 21:32, Tudor Girba wrote:
The goal of Moose is to help us analyze data. This means: modeling, mining, measuring, querying, visualizing, browsing etc. To do this, the prerequisite is being able to manipulate the data. Right now, we have all objects in memory. To be able to scale we need database support.
Currently, the models have to fit into a 32-bit address space. Modern machines support much more than that (data points: 16 GB @ 160 Euro for my current machine; standard workstations support 192 GB). Do you have many models that wouldn't fit in 192 GB?
The kinds of analyses Moose does are not supported efficiently by standard relational databases at all. They are optimized for a very different access pattern: selecting a very small subset of the data and changing it. That means they can only provide reasonable results for datasets that (nearly) fit into memory. In short: they allow you to avoid using a 64-bit Pharo image, and they can use more cores. What you lose is having to copy data to and from the database, and having to generate queries that do not fit the object model well. They are unlikely to provide better performance than a straightforward 64-bit Pharo image would, but they can provide a short-term solution.
Data-warehouse-style databases (Vertica) and object-oriented databases (GemStone) can probably do better: data-warehouse databases by pregenerating all kinds of cross-sections and projections of the data, and OODBs by navigating instead of joining (and GemStone by being able to use all available memory). But even there, a lot of the Moose analyses seem to touch a large part of the model, and the interactivity needed means that disk-based models will never become popular.
Scaling of Moose is more likely to come from going 64 bit and distributing the model over multiple vms.
Stephan
Thanks for this very sensible analysis.
I would definitely trade any database support for an image that can scale to use the entire promise of 64 bits and of multiple VMs. However, it looks like it will take a while until we get that. In the meantime, it is more practical to use what exists.
Even if relational databases are not at all my preferred option, we know that there was a Glorp-based implementation in VW that served the purpose of storing the objects and enabling mining algorithms (even at the expense of interactivity). So, it would be great to salvage this effort and have similar support in Pharo.
In any case, I definitely would like to start an effort of looking into GemStone and into other object-oriented databases. Anybody interested in joining the effort?
Just a question: would the new solid-state disks not alleviate the problem of disk speed?
Cheers, Doru
-- www.tudorgirba.com
"We cannot reach the flow of things unless we let go."
I would like to learn more about GemStone. Right now I'm a bit swamped, though.
Stef
On Fri, Jan 21, 2011 at 5:12 PM, Tudor Girba tudor.girba@gmail.com wrote:
Just a question: would the new solid-state disks not alleviate the problem of disk speed?
For GemStone, Hernán Wilkinson did some tests 1 or 2 years ago, and the difference was... mmm, I don't remember, but I think 40x faster. All the GemStone migrations (and I don't remember what else) were done with that disk. So yes, at least with GemStone it changes a lot.
On 22 jan 2011, at 11:10, Mariano Martinez Peck wrote:
On Fri, Jan 21, 2011 at 5:12 PM, Tudor Girba tudor.girba@gmail.com wrote: Just a question: would the new solid-state disks not alleviate the problem of disk speed?
For GemStone, Hernán Wilkinson did some tests 1 or 2 years ago, and the difference was... mmm, I don't remember, but I think 40x faster. All the GemStone migrations (and I don't remember what else) were done with that disk. So yes, at least with GemStone it changes a lot.
Latency for reading is 65-75 microseconds, which is a lot better than hard disks. For writing, the situation is different: there SSDs can be comparable to an HD (up to a few times faster) when a page has to be cleared. You'll only notice this after all pages of the SSD have been written; before that, performance will be spectacular. Documentation on real (long-term) write performance of SSDs is hard to get and to interpret.
http://www.slideshare.net/matsunobu/ssd-deployment-strategies-for-mysql
And of course SSDs are still more than 100 times slower than RAM.
Stephan
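To put rough numbers on the latency gaps discussed above, here is a back-of-the-envelope comparison. The figures are illustrative assumptions (typical 2011-era orders of magnitude), not measurements; actual values vary by hardware.

```python
# Back-of-the-envelope latency comparison; the figures below are rough
# assumptions for illustration, not measurements of any specific device.
ram_ns = 100          # ~100 ns for a DRAM access
ssd_ns = 70_000       # ~70 us SSD read latency (midpoint of the 65-75 us above)
hdd_ns = 10_000_000   # ~10 ms average seek on a spinning disk

print(f"SSD vs RAM: {ssd_ns // ram_ns}x slower")   # 700x
print(f"HDD vs SSD: {hdd_ns // ssd_ns}x slower")   # 142x
```

This is consistent with both claims in the thread: SSDs beat hard-disk seeks by two orders of magnitude, yet remain well over 100 times slower than RAM.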
On 21.01.2011, at 17:12, Tudor Girba wrote:
Just a question: would the new solid state disks not alleviate the problem of disk speed?
It is hard to tell. Usually you can assume that an OODB is more partitioned (data-wise) than a relational database. This leads to more seeks on a hard drive. SSDs being much faster at seeks than normal hard drives may speed up a database like Gemstone a lot. Taking Gemstone as an example, though, this is only one of the important factors. All object reads go through a shared page cache, and this is the biggest performance-gaining factor. If you have a huge shared page cache that your model largely fits into, this will be really fast. The less your active object memory fits into it, the more important disk performance becomes.
Norbert
On 24 jan 2011, at 23:47, Norbert Hartl wrote:
It is hard to tell. Usually you can assume that an OODB is more partitioned (data wise) than a relational database. This leads to more seeks on a hard drive.
Not when using an OO model and following the navigation: objects that are created together are close together. A simple example: an order has orderlines. In an OODB you would expect the orderlines to be on the same (or the next) page on disk. With an RDBMS, you need to access two tables, and so have two seeks. A full table scan over orders, though, is of course slower with an OODB, as more pages need to be loaded.
Stephan
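The order/orderlines argument can be made concrete with a toy page model. The layout below is a hypothetical sketch (not any real database's on-disk format): clustering objects by creation order lets one page serve the whole navigation, while the two-table layout needs at least two page reads.

```python
# Toy model of disk pages illustrating the order/orderlines example.
# The layouts below are hypothetical sketches, not a real database format.
PAGE_SIZE = 4  # objects or rows per page

def pages_touched(slots, wanted):
    """Count the distinct pages that must be read to fetch the wanted items."""
    return len({slots.index(w) // PAGE_SIZE for w in wanted})

# OODB: an order and its orderlines are created together, so they end up
# in adjacent slots of the same extent - often on a single page.
oodb_slots = ["order1", "line1a", "line1b", "line1c",
              "order2", "line2a", "line2b", "line2c"]
print(pages_touched(oodb_slots, ["order1", "line1a", "line1b", "line1c"]))  # 1

# RDBMS: orders and orderlines live in separate tables, hence separate
# pages and at least two seeks for the same navigation.
orders_table = ["order1", "order2"]
lines_table = ["line1a", "line1b", "line1c", "line2a", "line2b", "line2c"]
reads = (pages_touched(orders_table, ["order1"])
         + pages_touched(lines_table, ["line1a", "line1b", "line1c"]))
print(reads)  # 2
```

The same model also shows the flip side Stephan mentions: a full scan over all orders touches every page of the OODB extent, but only one small page of the orders table.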
On 28.01.2011, at 11:56, Stephan Eggermont wrote:
On 24 jan 2011, at 23:47, Norbert Hartl wrote:
It is hard to tell. Usually you can assume that an OODB is more partitioned (data wise) than a relational database. This leads to more seeks on a hard drive.
Not when using an OO model and following the navigation: objects that are created together are close together. A simple example: an order has orderlines. In an OODB you would expect the orderlines to be on the same (or the next) page on disk. With an RDBMS, you need to access two tables, and so have two seeks. A full table scan over orders, though, is of course slower with an OODB, as more pages need to be loaded.
I don't know if we are talking about the same thing. You say "two seeks" - do you mean database seeks? I meant hard disk seeks, of which there are probably a few hundred when doing a query like the one you stated above. I think your example is a bit misleading because it is, IMHO, oversimplified and combined with assumptions I cannot share.
Can you give evidence for what you are stating? I don't think you can assume that objects created together (time-wise) are written together; that only seems valid for a specific database technology. It would be the case if you just add objects and claim a new page whenever the current one is full. Furthermore, I think there is a big distinction between objects that are created together and objects that are used together, the latter being much more important. In the RDBMSes I know, the index and the data are stored in a format that needs less overhead if you query a lot of objects of the same kind. So the load of an RDBMS can be anticipated from the query, whereas that is seldom the case for an OODB. It is exactly as you said: you follow the navigation, but then you have to be sure that the data is laid out that way. Put differently, in an RDBMS it is rather easy to know that you have to fetch 3 rows from one page; in an OODB you only know it at the end. Without caching you could read the same page three times. And here Gemstone does a great job of having a shared page cache that makes this a non-issue.
Norbert
Thanks Stephan. Interesting.
On Jan 21, 2011, at 12:15 PM, Stephan Eggermont wrote:
On 20 jan 2011, at 21:32, Tudor Girba wrote:
The goal of Moose is to help us analyze data. This means: modeling, mining, measuring, querying, visualizing, browsing etc. To do this, the prerequisite is being able to manipulate the data. Right now, we have all objects in memory. To be able to scale we need database support.
Currently, the models have to fit into a 32-bit address space. Modern machines support much more than that (data points: 16 GB at 160 euro for my current machine; standard workstations support 192 GB). Do you have many models that wouldn't fit in 192 GB?
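The 32-bit ceiling referred to here is easy to quantify; the 192 GB figure is the workstation number quoted above.

```python
# Address-space arithmetic behind the 32-bit limitation.
addressable_bytes = 2 ** 32
print(addressable_bytes // 2 ** 30)   # 4 GiB: the most a 32-bit image can address

workstation_gb = 192                  # workstation RAM figure quoted above
print(workstation_gb // 4)            # 48x more RAM than a 32-bit image can use
```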
Stephan
Stef,
are there any specific requirements on the db? Is this for long-term storage of a lot of things
For Pharo, I would like to have all the versions of all releases, so that we can do decent version analyses and compute related changes and more.
or just to extend main memory for a limited amount of time? What will be the main operations: searching for symbols/methods/class names, or traversing structures?
For Pharo, yes.
In case you want to add a lot of code over time and want general search capabilities stored in a central location, it might be a good idea to ask Gemstone to sponsor a full license and install it somewhere at Inria.
That would be a good possibility, even if I have no idea about Gemstone.
Or am I missing the point?
Norbert