On 26/12/16 13:19, Udo Schneider wrote:
The classical way to implement this in our industry is to use a
Graph/RDF database and
describe everything in terms of relations/tuples
(Subject/Predicate/Object).
The more classical way to provide real-time data analysis is to use
GemStone technology:
http://www.gemstone.com/customers/DISA
Defining the (Meta-)Model: I assume I need to define my own subject
model representing the entities I'm interested in with Fame (FM3 is
just the Fame class prefix, correct?). *So I’d end up with
something like FAMIX just for my problem domain, correct?*
You could, but there might not be much added value in doing that
compared with just creating a domain model directly in Smalltalk.
Continuous import: The model continues to grow (in terms of subjects
and links) over time. This can happen due to actions (see above) or
mass import of new data (e.g. DNS zone files). *Is this kind of
“growing” model compatible with Moose or does Moose expect a static
model?*
Moose expects a dynamic model, but one that is not so large.
With a 64-bit image the size is less of a problem than the
processing speed. You might want to distribute the processing
over many images, and create different mappings of data sources
to images to optimize your queries.
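To make the idea of mapping data sources to images a bit more concrete,
here is a minimal sketch. It is written in Python purely for brevity, not
in Smalltalk, and it is only one possible scheme: a stable hash assigns
each data source to one of a fixed number of images. The names
image_for and plan and the source labels are hypothetical, and a real
mapping would more likely be chosen by hand to suit the queries.

```python
import hashlib


def image_for(source, image_count):
    """Pick an image index for a data source using a stable hash.

    sha256 is used instead of the built-in hash() so that the same
    source name maps to the same image across runs and machines.
    """
    digest = hashlib.sha256(source.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % image_count


def plan(sources, image_count):
    """Return a dict: image index -> list of data sources it should load."""
    mapping = {index: [] for index in range(image_count)}
    for source in sources:
        mapping[image_for(source, image_count)].append(source)
    return mapping


# Hypothetical data sources for one case.
sources = ["dns-zone-files", "proxy-logs", "mail-logs", "firewall-logs"]
assignment = plan(sources, image_count=3)
```

A different mapping (for example by time range or by host) is just a
different image_for function, which is what makes it easy to try several
mappings and see which one suits the queries best.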
Model in DB/Multiple clients: As you might imagine the model for a
forensic case can grow pretty big very fast. Even for a quick
investigation of data we’re easily talking about 10+ GB of data.
By default, log files are extremely redundant. Compressing them with
a decent domain model, and introducing value objects to reduce the
data size, can be useful.
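A rough sketch of what I mean by value objects, written in Python only to
keep it short (in the image you would of course do this in Smalltalk).
It assumes a made-up log format of "timestamp host message"; Host,
LogEntry, Interner and parse are hypothetical names, not part of Moose.
Each distinct host is stored once and shared by all entries that
mention it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """Immutable value object: equal names mean equal hosts."""
    name: str


@dataclass(frozen=True)
class LogEntry:
    timestamp: int
    host: Host
    message: str


class Interner:
    """Hands out one shared instance per distinct value."""

    def __init__(self):
        self._pool = {}

    def intern(self, value):
        # Store the value on first sight, otherwise return the stored one.
        return self._pool.setdefault(value, value)

    def __len__(self):
        return len(self._pool)


def parse(lines, hosts):
    """Turn raw lines into entries that share their Host objects."""
    entries = []
    for line in lines:
        timestamp, host, message = line.split(" ", 2)
        entries.append(LogEntry(int(timestamp), hosts.intern(Host(host)), message))
    return entries


# Made-up sample lines in the assumed format.
lines = [
    "1482754740 ns1.example.org query A www.example.org",
    "1482754741 ns1.example.org query AAAA www.example.org",
    "1482754742 ns2.example.org query A mail.example.org",
]
hosts = Interner()
entries = parse(lines, hosts)
```

The three entries refer to only two Host objects. With real logs the same
trick applies to user names, URLs, status codes and so on.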
In
addition we might need to work on the same model (although on
different aspects) with multiple people at the same time. So storing
everything “in-image” is a no-go IMHO. I have pretty good experience
with MongoDB/Voyage and assume this could be made working. Especially
because I have to define my own model anyway - thus taking care of
database operations. *Are there any experiences with keeping the
model in a DB and working with a local image “workbench” on it. Esp.
with multiple clients?*
Most databases work badly with this kind of problem. You might want to
explicitly model the mapping of data to multiple images. You are
basically creating a blackboard system.
“Lab Notebook”: This one is a bit fuzzy - sorry if the intention is
unclear. During a case investigation (IMHO an investigation is also
part of “the” model) different findings (e.g. data, visualisations)
need to be documented. My impression from Moose is that you “play”
with the data (in inspectors) until you come to a conclusion based
on an assessment. Once you have the result you can decide and tackle
the next problem. In my context I’d need to document not only the
result but also the way taken to achieve it. Maybe something simple
like accumulating the research way through all inspector panes would
be enough. I’m not quite sure though. *So how do I document different
“results” (e.g. data/visualizations) including the “way” they were
achieved?*
This probably needs a double registration. From a governance p.o.v.
you need the actual process steps, and you also need a log of how you
could have reached your conclusions with perfect hindsight:
http://www.ics.uci.edu/~taylor/classes/121/IEEE86_Parnas_Clement.pdf
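As a sketch of the double registration (Python for brevity, all names
hypothetical): the notebook keeps an append-only list of the steps as
they actually happened, and separately a cleaned-up path that only
refers to those recorded steps.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Step:
    number: int
    action: str
    finding: str


@dataclass
class CaseNotebook:
    steps: list = field(default_factory=list)      # chronological, append-only
    rationale: list = field(default_factory=list)  # step numbers, hindsight order

    def record(self, action, finding):
        """Register a step as it actually happened."""
        step = Step(len(self.steps) + 1, action, finding)
        self.steps.append(step)
        return step

    def explain_with(self, *numbers):
        """Choose the recorded steps that justify the conclusion."""
        known = {step.number for step in self.steps}
        missing = [n for n in numbers if n not in known]
        if missing:
            raise ValueError(f"unknown steps: {missing}")
        self.rationale = list(numbers)

    def report(self):
        """The cleaned-up path, built only from recorded steps."""
        by_number = {step.number: step for step in self.steps}
        return [by_number[n] for n in self.rationale]


# Invented example investigation.
notebook = CaseNotebook()
notebook.record("filter DNS log by client", "many NXDOMAIN answers")
notebook.record("plot queries per hour", "nothing unusual")  # dead end
notebook.record("group NXDOMAIN names by suffix", "one suffix dominates")
notebook.explain_with(1, 3)
```

The dead end stays in steps for governance, while report() gives the
short path you would put in the case report.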
All in all I’m pretty impressed by how well Moose might fit my problem
domain. Especially using it as a “workbench” to explore/investigate
the data is something which is much more pleasure to work with than
digging through various SQL/noSQL/Splunk sources with different
tools. Not to mention that the visualizations I get are simply
astounding and can be used 1:1 in the reports we generate. However,
first impressions aside, I’m not sure whether the data model I
need to work on is “compatible” with the expectations Moose was
designed with. Hence the questions.
Moose supports creating your own tools very well. We did a data
migration with Moose, and created 60+ Glamour browsers for it,
and daily visualisations of the process. We managed to keep things
in a single image, though we ran out of memory several times.
To me it sounds like you'd need to develop something to manage lots
of images, starting new ones on demand. That fits well with our
Pharo vision.
Stephan