On 26/12/16 13:19, Udo Schneider wrote:
The classical way to implement this in our industry is to use a Graph/RDF database and describe everything in terms of relations/tuples (Subject/Predicate/Object).
The more classical way to provide real-time data analysis is using GemStone technology http://www.gemstone.com/customers/DISA
Defining the (Meta-)Model: I assume I need to define my own subject model representing the entities I’m interested in with Fame (FM3 is just the Fame classes prefix, correct?). *So I’d end up with something like FAMIX just for my problem domain, correct?*
You could, but there might not be much added value in doing that compared with just creating a domain model directly in Smalltalk.
Continuous import: The model continues to grow (in terms of subjects and links) over time. This can happen due to actions (see above) or mass import of new data (e.g. DNS zone files). *Is this kind of “growing” model compatible with Moose or does Moose expect a static model?*
Moose expects a dynamic model, but one that is not so large. With a 64-bit image the size is less of a problem than the processing speed. You might want to distribute the processing over many images, and create different mappings of data sources to images to optimize your queries.
Model in DB/Multiple clients: As you might imagine the model for a forensic case can grow pretty big very fast. Even for a quick investigation of data we’re easily talking about 10+Gigs of data.
By default log files are extremely redundant. Compressing them by using a decent domain model and introducing value objects to reduce data size can be useful.
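To make the value-object point concrete, here is a minimal sketch in Python (chosen only for brevity; the same shape carries over to Smalltalk classes). The log line layout, the `Host` class and the `Interner` helper are all assumptions made up for illustration, not anything Moose or Fame provides. The idea is that repeated host/IP pairs are stored once and shared by every event that mentions them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """Value object: equality and hash come from content, not identity."""
    name: str
    ip: str


class Interner:
    """Hands out one shared instance per distinct value (hypothetical helper)."""

    def __init__(self):
        self._pool = {}

    def intern(self, value):
        # First sighting stores the value; later equal values get the stored one.
        return self._pool.setdefault(value, value)

    def __len__(self):
        return len(self._pool)


def parse_line(line, hosts):
    """Parse an assumed 'timestamp host ip message' log line."""
    timestamp, name, ip, message = line.split(" ", 3)
    return timestamp, hosts.intern(Host(name, ip)), message


# Made-up sample data, only to show the sharing.
LOG = [
    "2016-12-26T13:19:00 mail01 10.0.0.5 login ok",
    "2016-12-26T13:19:02 mail01 10.0.0.5 login failed",
    "2016-12-26T13:19:05 web02 10.0.0.9 GET /index.html",
]

hosts = Interner()
events = [parse_line(line, hosts) for line in LOG]
```

With three log lines only two `Host` objects exist afterwards, and the first two events point at the very same instance. On real logs the saving depends on how repetitive the fields actually are.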
In addition we might need to work on the same model (although on different aspects) with multiple people at the same time. So storing everything “in-image” is a no-go IMHO. I have pretty good experience with MongoDB/Voyage and assume this could be made to work. Especially because I have to define my own model anyway - thus taking care of database operations. *Are there any experiences with keeping the model in a DB and working with a local image “workbench” on it? Esp. with multiple clients?*
Most databases work badly with this kind of problem. You might want to explicitly model the mapping of data to multiple images. You are basically creating a blackboard system.
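As a rough illustration of what I mean by an explicit mapping plus a blackboard, here is a small Python sketch. Everything in it is hypothetical: `Worker` stands in for one image holding a slice of the data, `ROUTING` is the explicit source-to-image mapping, and `Blackboard` is the shared place where partial findings are collected. Real images would of course talk over some transport rather than live in one process.

```python
from collections import defaultdict


class Blackboard:
    """Shared store of findings that every worker can add to."""

    def __init__(self):
        self.findings = defaultdict(list)

    def post(self, topic, finding):
        self.findings[topic].append(finding)


class Worker:
    """Stands in for one image that holds part of the data."""

    def __init__(self, name):
        self.name = name
        self.records = []

    def load(self, record):
        self.records.append(record)

    def contribute(self, blackboard, needle):
        # Each worker searches only its own slice and posts what it finds.
        for record in self.records:
            if needle in record:
                blackboard.post(needle, (self.name, record))


# Explicit, hand-written mapping of data sources to images (assumed names).
ROUTING = {"dns": "image-a", "mail": "image-b", "proxy": "image-a"}

workers = {name: Worker(name) for name in sorted(set(ROUTING.values()))}


def ingest(source, record):
    """Send a record to the image responsible for its data source."""
    workers[ROUTING[source]].load(record)


# Made-up sample records.
ingest("dns", "example.org A 10.0.0.9")
ingest("mail", "from=alice@example.org to=bob@example.net")
ingest("proxy", "GET http://example.com/ 200")

board = Blackboard()
for worker in workers.values():
    worker.contribute(board, "example.org")
```

Changing `ROUTING` is then the knob for optimizing queries: sources that are usually queried together can be placed in the same image.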
“Lab Notebook”: This one is a bit fuzzy - sorry if the intention is unclear. During a case investigation (IMHO an investigation is also part of “the” model) different findings (e.g. data, visualisations) need to be documented. My impression from Moose is that you “play” with the data (in inspectors) until you come to a conclusion based on an assessment. Once you have the result you can decide and tackle the next problem. In my context I’d need to document not only the result but also the way taken to achieve it. Maybe something simple like accumulating the research way through all inspector panes would be enough. I’m not quite sure though. *So how do I document different “results” (e.g. data/visualizations) including the “way” they were achieved?*
This probably needs a double registration. From a governance p.o.v. you need the actual process steps, and you also need a log of how you could have reached your conclusions with perfect hindsight http://www.ics.uci.edu/~taylor/classes/121/IEEE86_Parnas_Clement.pdf
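One possible shape for such a double registration, sketched in Python under my own assumptions (the `Notebook` and `Step` names are invented, and this is not an existing Moose feature): an append-only list of the steps actually taken, and next to it a curated selection of those steps that forms the path you would present with hindsight.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    number: int
    action: str
    result: str


@dataclass
class Notebook:
    steps: list = field(default_factory=list)      # what actually happened, in order
    rationale: list = field(default_factory=list)  # step numbers in idealized order

    def record(self, action, result):
        """Append a step to the chronological log; nothing is ever removed."""
        step = Step(len(self.steps) + 1, action, result)
        self.steps.append(step)
        return step

    def justify(self, *numbers):
        """Select the recorded steps that make up the hindsight path."""
        known = {step.number for step in self.steps}
        missing = [n for n in numbers if n not in known]
        if missing:
            raise KeyError(f"unknown steps: {missing}")
        self.rationale = list(numbers)

    def idealized_path(self):
        by_number = {step.number: step for step in self.steps}
        return [by_number[n] for n in self.rationale]


# Made-up investigation, only to show the two views.
nb = Notebook()
nb.record("list DNS queries from host X", "many queries")
nb.record("plot queries per hour", "no pattern")  # dead end, still logged
nb.record("filter queries by rare domains", "suspicious domains found")
nb.justify(1, 3)
```

The full `steps` list serves the governance side, while `idealized_path()` gives the cleaned-up derivation for the report; both refer to the same recorded steps, so the report cannot cite something that was never done.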
All in all I’m pretty impressed how Moose might fit my problem domain. Especially using it as a “workbench” to explore/investigate the data is something which is much more of a pleasure to work with than digging through various SQL/noSQL/Splunk sources with different tools. Not to mention that the visualizations I get are simply astounding and can be used 1:1 in the reports we generate. However, ignoring my first impression, I’m not sure whether the data model I need to work on is “compatible” with the expectations Moose had/was designed with. Hence the questions.
Moose supports creating your own tools very well. We did a data migration with Moose, and created 60+ Glamour browsers for it, and daily visualisations of the process. We managed to keep things in a single image, though we ran out of memory several times. To me it sounds like you'd need to develop something to manage lots of images, starting new ones on demand. That fits well with our Pharo vision.
Stephan