Thanks everyone for the suggestions. I will try to look into them.
Cheers, Doru
On 5 Sep 2011, at 07:04, Lukas Renggli wrote:
Stanford has many large graph-like datasets to download: social networks, web graphs, peer-to-peer networks, shopping networks, road networks, wikipedia networks, etc.
http://snap.stanford.edu/data/
Lukas
On 5 September 2011 06:24, Guillermo Polito guillermopolito@gmail.com wrote:
As a data mining example, I've used a dataset about car accidents that we got from http://www.nhtsa.gov/NASS .
Hope it helps :) Guille
On Sun, Sep 4, 2011 at 11:58 PM, Hernán Morales Durand hernan.morales@gmail.com wrote:
2011/9/4 Tudor Girba tudor@tudorgirba.com:
Hi,
Thanks, but I am looking for data sets that contain graphs of entities with properties, rather than numbers.
Oh, that was just the tip of the iceberg. Look at cellular interaction networks such as protein-protein interactions, relations between genes and QTLs, phylogenetic trees, gene ontology classifications, etc. They probably have more "properties" and relationships than you ever imagined. Check for example http://www.nature.com/msb/journal/v3/n1/fig_tab/msb4100166_F2.html or the one from the Human Interactome here http://www.blog.republicofmath.com/archives/2005, or http://www.biomedcentral.com/content/supplementary/1471-2164-9-96-s6.jpeg for Gene Ontology "objects". Also, PubMed has thousands of related papers about real case studies.
To give an idea, an example would be a set of persons that have multiple properties, such as age or function, and have various kinds of relationships with other persons. Ideally, it should be something containing more than 5-10 types of entities.
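A minimal Python sketch of the kind of structure described above: entities with properties, linked by relationships of different kinds. Every name in it (`Entity`, `Relationship`, `Model`, the sample persons and relationship kinds) is hypothetical and chosen only for illustration; it is not Moose's meta-model or API.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    kind: str                      # entity type, e.g. "Person"
    name: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    kind: str                      # relationship type, e.g. "manages"
    source: Entity
    target: Entity


class Model:
    """Hypothetical container for entities and typed relationships."""

    def __init__(self):
        self.entities = []
        self.relationships = []

    def add_entity(self, kind, name, **properties):
        entity = Entity(kind, name, properties)
        self.entities.append(entity)
        return entity

    def relate(self, kind, source, target):
        relationship = Relationship(kind, source, target)
        self.relationships.append(relationship)
        return relationship

    def related(self, entity, kind):
        # Targets reachable from `entity` through relationships of one kind.
        return [r.target for r in self.relationships
                if r.source is entity and r.kind == kind]

    def entity_kinds(self):
        # Distinct entity types present in the model, sorted by name.
        return sorted({e.kind for e in self.entities})


# Illustrative data only: two persons with properties and two relationship kinds.
model = Model()
alice = model.add_entity("Person", "Alice", age=34, function="manager")
bob = model.add_entity("Person", "Bob", age=28, function="developer")
model.relate("manages", alice, bob)
model.relate("friend_of", bob, alice)

managed_by_alice = [p.name for p in model.related(alice, "manages")]
```

A data set matching the request would populate such a model with more than 5-10 entity kinds, so that `entity_kinds()` returns a correspondingly long list.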
Cheers, Doru
On 5 Sep 2011, at 02:51, Hernán Morales Durand wrote:
Hi Tudor,
I don't know if you want a few data sets or many, but in either case I found "Selecting genes with dissimilar discrimination strength for sample class prediction", which reports case studies on two real cancer microarray datasets (CAR and LUNG) for gene expression profiling. The Lymphoma case study in humans contains 30 case study genes; you can read about it in "Examples and Applications of Fuzzy Measure Similarity Using GO Terms". In general, you can find many case studies from SNP data experiments doing all kinds of predictions, for example from protein structure prediction studies that use LiveBench data sets (http://en.wikipedia.org/wiki/LiveBench); search for "Consensus fold recognition by predicting model quality". If you need more or something more specific, just ask :) Cheers,
Hernán
2011/9/4 Tudor Girba tudor@tudorgirba.com:
Hi,
To show how Moose can support the analysis of various data sets, I am looking for a case study containing a complex data structure that does not represent a software system, and a set of questions associated with it. Ideally, the data should be freely available and it should contain a set of entities with various properties and various relationships with other entities.
Does anyone have any idea regarding such a case study?
Cheers, Doru
-- www.tudorgirba.com
"There are no old things, there are only old ways of looking at them."
-- www.tudorgirba.com
"Every successful trip needs a suitable vehicle."
-- Lukas Renggli www.lukas-renggli.ch
-- www.tudorgirba.com
"Reasonable is what we are accustomed with."