Hello Moose!
With the current memory limit of Pharo, and the size of the generated Moose models being potentially huge, maybe some of you have already thought about (or even experimented with) persistence solutions with query mechanisms that would instantiate FAMIX objects only "on demand", so that only part of a model is in memory when working on a specific area.
If so, I would be really interested to hear about (or play with) it :)
At first look, I see that there is a MooseGroupStorage class. This kind of object responds to the usual collection messages (add:, remove:, select:, detect:, ...). I guess that when we perform queries over a Moose model, or when we add or remove entity objects, we end up using this protocol.
So, if I wanted to implement a database persistence solution for Moose, my first instinct would be to implement a specific kind of MooseGroupStorage and plug a communication layer to a database into it. Does that make sense?
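To make the idea concrete, here is a minimal sketch of that kind of storage (in Python rather than Pharo, purely for illustration; the `loader` callable is a hypothetical stand-in for the database communication layer): it keeps only entity ids in memory and materializes objects on demand.

```python
# Hypothetical sketch of an "on demand" storage: entity ids are kept in
# memory, but full entity objects are only materialized when accessed.
class LazyEntityStorage:
    def __init__(self, loader):
        # loader: callable id -> entity, e.g. a database query
        self._loader = loader
        self._ids = []
        self._cache = {}

    def add(self, entity_id):
        # Registering an entity stores only its id, not the object.
        self._ids.append(entity_id)

    def entity(self, entity_id):
        # Materialize on first access, then serve from the cache.
        if entity_id not in self._cache:
            self._cache[entity_id] = self._loader(entity_id)
        return self._cache[entity_id]

    def select(self, predicate):
        # NOTE: this still touches every entity; a real implementation
        # would push the query down to the database instead.
        return [e for e in (self.entity(i) for i in self._ids) if predicate(e)]
```

The `select` comment points at the catch Stephan raises below: unless querying is pushed to the database, a scan still loads everything.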
I have not played with Moose for a long time (but I am back to play with it a lot more :)) and my vision of things may be naive. So do not hesitate to tell me if what I am saying sounds crazy, and to push me back onto the right path!
Has anyone already thought about solutions to deal with memory limits when generating big Moose models?
Hi Cyrille, Long time no see!
On 30/03/17 10:07, Cyrille Delaunay wrote:
The current FAMIX-based models are not suitable for large models. The inheritance-based modeling results in very large, nearly empty objects.
Moose models tend to be highly connected and tend to be accessed with poorly predictable patterns. That makes "standard databases" a bad match, especially if you cannot push the querying down to them.
We are very close to having 64-bit Moose everywhere, shifting the problem from the size of the model directly to speed. As the VM uses only one native thread and 8-thread machines are everywhere, the best speed-up should be expected from splitting the model over multiple Pharo images, and possibly over multiple machines.
Stephan
Hi Stephan,
thanks for your thoughts
(further comments below)
On 30/03/2017 13:31, Stephan Eggermont wrote:
We are very close to having 64bit Moose everywhere, shifting the problem from size of the model directly to speed.
"Very close" seems a bit optimistic; for example, it will still take some time for Windows. The problem is that Synectique is already having difficulties right now and is looking for shorter-term solution(s).
As the VM uses only one native thread and 8-thread machines are everywhere, the best speed-up should be expected from splitting the model over multiple pharo images, and possibly over multiple machines.
Interesting idea. I am having some difficulty seeing how to split a model into several parts that would have to link to one another somehow. Do you have any further thoughts on this point?
nicolas
On Thu, Mar 30, 2017 at 07:15 Nicolas Anquetil nicolas.anquetil@inria.fr wrote:
Interesting idea. I am having some difficulty seeing how to split a model into several parts that would have to link to one another somehow.
How do they link?
-- Nicolas Anquetil -- MCF (HDR) Project-Team RMod
Moose-dev mailing list Moose-dev@list.inf.unibe.ch https://www.list.inf.unibe.ch/listinfo/moose-dev
On 30/03/2017 16:39, Kjell Godo wrote:
How do they link?
Well, a model is a big graph where all entities (transitively) relate to all other entities, so splitting the model over several Pharo images implies having entities in one image referencing entities in other images. Not at all impossible, but it would be an interesting engineering problem.
nicolas
2017-03-30 16:54 GMT+02:00 Nicolas Anquetil nicolas.anquetil@inria.fr:
Well, a model is a big graph where all entities (transitively) relate to all other entities, so splitting the model over several Pharo images implies having entities in one image referencing entities in other images. Not at all impossible, but it would be an interesting engineering problem.
With Onil Goubier, we tried to publish a paper describing that mechanism in Smalltalk in 1998, where the mechanism to establish links between images was unified with the one storing the objects on disk. It was rejected, but the reviews were encouraging.
The main engineering difficulty we saw back then was GC-ing over that thing.
Regards,
Thierry
On 30/03/17 17:02, Thierry Goubier wrote:
Is that paper available somewhere?
Stephan
+1
On 30/03/2017 17:06, Stephan Eggermont wrote:
2017-03-30 17:06 GMT+02:00 Stephan Eggermont stephan@stack.nl:
Is that paper available somewhere?
I suspect I may have a backup of that on a Sun MD drive I haven't been able to read since at least mid-1998 :( So the answer is no.
But the core idea was simple: use proxy objects, and when you touch a proxy, it either loads the object from disk or forwards the message over the network. Kind of what you would do in a distributed virtual shared memory implementation combined with persistent storage. Use a page-based mechanism for loading/unloading objects so as to reduce costs.
There is a guy in my lab working on DVSM; maybe that would be an interesting subject.
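That proxy idea can be sketched in a few lines. This is an illustration only (Python rather than Smalltalk; the `resolver` callable is a hypothetical stand-in for a disk read or a network call to another image):

```python
# Illustrative sketch (not Moose code): a proxy that materializes its
# target lazily, either from a local store or from a remote image.
class EntityProxy:
    def __init__(self, entity_id, resolver):
        self._id = entity_id        # identity of the remote/on-disk entity
        self._resolver = resolver   # id -> real object (disk read or RPC)
        self._target = None         # cached real object, once loaded

    def _resolve(self):
        # Load the real object on first touch, then reuse it.
        if self._target is None:
            self._target = self._resolver(self._id)
        return self._target

    def __getattr__(self, name):
        # Called only for attributes the proxy itself lacks: forward the
        # "message" to the real object, loading it on demand (analogous
        # to intercepting doesNotUnderstand: in Smalltalk).
        return getattr(self._resolve(), name)
```

A page-based variant would have the resolver fetch a whole page of related entities at once, amortizing the disk or network round-trip, as described above.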
Thierry
On 30/03/17 16:15, Nicolas Anquetil wrote:
"Very close" seems a bit optimistic; for example, it will still take some time for Windows. The problem is that Synectique is already having difficulties right now and is looking for shorter-term solution(s).
Short term would mean running 64-bit Linux in a VM or via a remote desktop.
Interesting idea. I am having some difficulty seeing how to split a model into several parts that would have to link to one another somehow. Do you have any further thoughts on this point?
Splitting a model is indeed the interesting aspect. Either do it automatically based on usage, or use a heuristic. The navigation can be made distribution-aware to avoid making only network calls. Easiest is a hierarchical model that fits the subject well, e.g. package-based: everything inside a given set of packages is guaranteed to be in the image, and everything else is a remote pointer. If you have enough images, you can have different combinations of packages in different images, plus some mechanism to determine whether you have received a full answer yet.
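As a rough illustration of that package-based split (Python pseudocode, not Moose; `ImageShard` and `RemotePointer` are names invented for this sketch), each image holds the entities of its packages locally and represents everything else as a remote pointer to the owning image:

```python
# Illustrative sketch: package-based partitioning of a model over images.
class RemotePointer:
    def __init__(self, image_name, entity_id):
        self.image_name = image_name  # which image owns the entity
        self.entity_id = entity_id

class ImageShard:
    def __init__(self, name, local_packages, package_owner):
        self.name = name
        self.local_packages = set(local_packages)
        self.package_owner = package_owner  # package -> owning image name
        self.entities = {}                  # id -> entity, local only

    def add_entity(self, entity_id, package, entity):
        # Only entities of locally owned packages are stored in this image.
        if package in self.local_packages:
            self.entities[entity_id] = entity

    def lookup(self, entity_id, package):
        # Local entities are returned directly; everything else becomes
        # a remote pointer to the image that owns that package.
        if package in self.local_packages:
            return self.entities[entity_id]
        return RemotePointer(self.package_owner[package], entity_id)
```

With overlapping package sets in different images, a query coordinator could pick whichever image resolves the most entities locally, which is one way to read the "different combinations of packages" suggestion.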
Stephan