Hello Moose!
With the current memory limit of Pharo, and the size of the generated Moose models being potentially huge, maybe some of you have already thought about (or even experimented with) persistence solutions with query mechanisms that would instantiate FAMIX objects only "on demand", so that only part of a model is in memory when working on a specific area.
If so, I would be really interested to hear about (or play with) it :)
At first look, I see that there is a MooseGroupStorage class. This kind of object responds to the usual collection messages (add:, remove:, select:, detect:, ...). I guess that when we perform queries over a Moose model, or when we add or remove entity objects, we end up using this protocol.
So, if I wanted to implement a database persistence solution for Moose, my first instinct would be to implement a specific kind of MooseGroupStorage and plug a communication layer to a database into it. Does that make sense?
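To make the idea concrete, here is a minimal sketch of that kind of storage (in Python rather than Pharo, purely for illustration; the `loader` callable is a hypothetical stand-in for the database communication layer): it keeps only entity ids in memory and materializes objects on demand.

```python
# Hypothetical sketch of an "on demand" storage: entity ids are kept in
# memory, but full entity objects are only materialized when accessed.
class LazyEntityStorage:
    def __init__(self, loader):
        # loader: callable id -> entity, e.g. a database query
        self._loader = loader
        self._ids = []
        self._cache = {}

    def add(self, entity_id):
        # Registering an entity stores only its id, not the object.
        self._ids.append(entity_id)

    def entity(self, entity_id):
        # Materialize on first access, then serve from the cache.
        if entity_id not in self._cache:
            self._cache[entity_id] = self._loader(entity_id)
        return self._cache[entity_id]

    def select(self, predicate):
        # NOTE: this still touches every entity; a real implementation
        # would push the query down to the database instead.
        return [e for e in (self.entity(i) for i in self._ids) if predicate(e)]
```

The `select` comment points at the catch Stephan raises below: unless querying is pushed to the database, a scan still loads everything.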
I have not played with Moose for a long time (but I am back to play with it a lot more :)) and my vision of things may be naive. So do not hesitate to tell me if what I am saying sounds crazy, and to push me back onto the right path!
Has anyone already thought about solutions to deal with memory limits when generating big Moose models?
Hi Cyrille, Long time no see!
On 30/03/17 10:07, Cyrille Delaunay wrote:
The current FAMIX-based models are not suitable for large models. The inheritance-based modeling results in very large, nearly empty objects.
Moose models tend to be highly connected and tend to be accessed with poorly predictable patterns. That makes "standard databases" a bad match, especially if you cannot push the querying down to them.
We are very close to having 64-bit Moose everywhere, shifting the problem from the size of the model directly to speed. As the VM uses only one native thread and 8-thread machines are everywhere, the best speed-up should be expected from splitting the model over multiple Pharo images, and possibly over multiple machines.
Stephan
Hi Stephan,
thanks for your thoughts
(further comments below)
On 30/03/2017 13:31, Stephan Eggermont wrote:
We are very close to having 64bit Moose everywhere, shifting the problem from size of the model directly to speed.
"Very close" seems a bit optimistic; for example, it will still take some time for Windows. The problem is that Synectique is already having difficulties right now and is looking for shorter-term solution(s).
As the VM uses only one native thread and 8-thread machines are everywhere, the best speed-up should be expected from splitting the model over multiple pharo images, and possibly over multiple machines.
Interesting idea. I am having some difficulty seeing how to split a model into several parts that would have to link to one another somehow. Do you have any further thoughts on this point?
nicolas
On Thu, Mar 30, 2017 at 07:15 Nicolas Anquetil nicolas.anquetil@inria.fr wrote:
Interesting idea. I am having some difficulty seeing how to split a model into several parts that would have to link to one another somehow.
How do they link?
-- Nicolas Anquetil -- MCF (HDR) Project-Team RMod
Moose-dev mailing list Moose-dev@list.inf.unibe.ch https://www.list.inf.unibe.ch/listinfo/moose-dev
On 30/03/2017 16:39, Kjell Godo wrote:
How do they link?
Well, a model is a big graph where all entities (transitively) relate to all other entities, so splitting the model over several Pharo images implies having entities in one image referencing entities in other images. Not at all impossible, but it would be an interesting engineering problem.
nicolas
2017-03-30 16:54 GMT+02:00 Nicolas Anquetil nicolas.anquetil@inria.fr:
Well, a model is a big graph where all entities (transitively) relate to all other entities, so splitting the model over several Pharo images implies having entities in one image referencing entities in other images. Not at all impossible, but it would be an interesting engineering problem.
With Onil Goubier, we tried to publish a paper describing that mechanism in Smalltalk in 1998, where the mechanism to establish links between images was unified with the one storing the objects on disk. It was rejected, but the reviews were encouraging.
The main engineering difficulty we saw back then was GC-ing over that thing.
Regards,
Thierry
On 30/03/17 17:02, Thierry Goubier wrote:
Is that paper available somewhere?
Stephan
+1
On 30/03/2017 17:06, Stephan Eggermont wrote:
2017-03-30 17:06 GMT+02:00 Stephan Eggermont stephan@stack.nl:
Is that paper available somewhere?
I suspect I may have a backup of that on a Sun MD drive I haven't been able to read since at least mid-1998 :( So the answer is no.
But the core idea was simple: use proxy objects, and when you touch a proxy, it either loads the object from disk or forwards the message over the network. Kind of what you would do in a distributed virtual shared memory implementation combined with persistent storage. Use a page-based mechanism for loading/unloading objects so as to reduce costs.
There is a guy in my lab working on DVSM; maybe that would be an interesting subject.
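That proxy idea can be sketched in a few lines. This is an illustration only (Python rather than Smalltalk; the `resolver` callable is a hypothetical stand-in for a disk read or a network call to another image):

```python
# Illustrative sketch (not Moose code): a proxy that materializes its
# target lazily, either from a local store or from a remote image.
class EntityProxy:
    def __init__(self, entity_id, resolver):
        self._id = entity_id        # identity of the remote/on-disk entity
        self._resolver = resolver   # id -> real object (disk read or RPC)
        self._target = None         # cached real object, once loaded

    def _resolve(self):
        # Load the real object on first touch, then reuse it.
        if self._target is None:
            self._target = self._resolver(self._id)
        return self._target

    def __getattr__(self, name):
        # Called only for attributes the proxy itself lacks: forward the
        # "message" to the real object, loading it on demand (analogous
        # to intercepting doesNotUnderstand: in Smalltalk).
        return getattr(self._resolve(), name)
```

A page-based variant would have the resolver fetch a whole page of related entities at once, amortizing the disk or network round-trip, as described above.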
Thierry
On 30/03/17 16:15, Nicolas Anquetil wrote:
"Very close" seems a bit optimistic; for example, it will still take some time for Windows. The problem is that Synectique is already having difficulties right now and is looking for shorter-term solution(s).
Short term would mean running 64-bit Linux in a VM or via a remote desktop.
Interesting idea. I am having some difficulty seeing how to split a model into several parts that would have to link to one another somehow. Do you have any further thoughts on this point?
Splitting a model is indeed the interesting aspect. Either do it automatically based on usage, or use a heuristic. The navigation can be made distribution-aware to avoid making only network calls. Easiest is a hierarchical model that fits the subject well, e.g. package-based: everything inside a given set of packages is guaranteed to be in the image, and everything else is a remote pointer. If you have enough images, you can have different combinations of packages in different images, plus some mechanism to determine whether you have received a full answer yet.
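As a rough illustration of that package-based split (Python pseudocode, not Moose; `ImageShard` and `RemotePointer` are names invented for this sketch), each image holds the entities of its packages locally and represents everything else as a remote pointer to the owning image:

```python
# Illustrative sketch: package-based partitioning of a model over images.
class RemotePointer:
    def __init__(self, image_name, entity_id):
        self.image_name = image_name  # which image owns the entity
        self.entity_id = entity_id

class ImageShard:
    def __init__(self, name, local_packages, package_owner):
        self.name = name
        self.local_packages = set(local_packages)
        self.package_owner = package_owner  # package -> owning image name
        self.entities = {}                  # id -> entity, local only

    def add_entity(self, entity_id, package, entity):
        # Only entities of locally owned packages are stored in this image.
        if package in self.local_packages:
            self.entities[entity_id] = entity

    def lookup(self, entity_id, package):
        # Local entities are returned directly; everything else becomes
        # a remote pointer to the image that owns that package.
        if package in self.local_packages:
            return self.entities[entity_id]
        return RemotePointer(self.package_owner[package], entity_id)
```

With overlapping package sets in different images, a query coordinator could pick whichever image resolves the most entities locally, which is one way to read the "different combinations of packages" suggestion.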
Stephan