Call for ideas on the proposed "Resources" section

Submitted by hilario on Thu, 05/01/2012 - 11:23

 

We would like to develop a "Resources" section that gathers information about, and links to, all available resources on data mining ontologies and related topics. The goal is to make the DMO Foundry portal a one-stop shop for information on data mining ontologies, their use cases, and potentially other types of ontologies (e.g. domain ontologies) that have been used to support the data mining process. This section should be a collaborative undertaking; the purpose of this post is to initiate public brainstorming on its scope and content.

Among the preliminary ideas are: a page on existing data mining ontologies, potentially classified according to their purposes and the competency questions they are meant to answer; a page on publications; and a page on related topics such as DM experiment databases and semantic data/meta-mining. Please add your suggestions and other thoughts on the types of resources and related topics that you would like to see included in this new section. I look forward to a lively discussion.


 

I agree with the previous suggestions on the contents of the "Resources" section. In addition, it would be useful to have a dataset repository (possibly with artificial data) where "prototype" scenarios are replicated. For example, we could have one simple, linearly separable dataset; another that requires more complex decision boundaries; one where locality is needed to separate different regions of the input space; and so on. We need a repository where we can compare different experiments and explain the effects of different strategies on the data.
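To make the three prototype scenarios a bit more concrete, here is a minimal Python sketch of how such artificial datasets could be generated. Everything in it is an assumption made for illustration: the function names, the two-dimensional input space, the margin, the circle radius, and the checkerboard layout are my own choices, not an agreed specification.

```python
import random

def linearly_separable(n, seed=0):
    """Two classes split by the line x + y = 0, with a small margin."""
    rng = random.Random(seed)
    data = []
    while len(data) < n:
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if abs(x + y) < 0.1:  # reject points too close to the boundary
            continue
        data.append((x, y, 1 if x + y > 0 else 0))
    return data

def concentric(n, seed=0):
    """Class 1 inside a circle of radius 0.5, class 0 outside.
    No straight line separates the classes."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        data.append((x, y, 1 if x * x + y * y < 0.25 else 0))
    return data

def checkerboard(n, cells=4, seed=0):
    """Labels alternate between neighbouring grid cells, so only
    local structure in the input space is informative."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x, y = rng.random(), rng.random()
        col, row = int(x * cells), int(y * cells)
        data.append((x, y, (col + row) % 2))
    return data
```

Each function returns a list of (x, y, label) tuples, and fixing the seed makes a scenario reproducible, which matters if several groups are to compare experiments on the same data.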

 

 

On the suggested dataset repository

This is an excellent idea. There are a few dataset repositories around, but none in which datasets are collected or generated for the purpose of investigating specific DM issues or data/algorithm/model characteristics (e.g., linear vs. more complex decision boundaries). Something similar has been attempted in the past, but on a very small scale; cf. P. A. Benedict's DGP2 (http://archive.ics.uci.edu/ml/datasets/DGP2+-+The+Second+Data+Generation...). This kind of research requires precisely the kind of massive collaborative effort that we can muster through the DMO Foundry. For instance, we can initiate a discussion to identify a set of issues or topics that we want to study, and agree on a data generation and DM experimentation workplan for each. The work can then be assigned as student projects by those of us who teach ML/DM courses.

To align this dataset repository with the goals of the DMO Foundry, I suggest that we apply the e-LICO Data Characterization Tool (based on the Metal DCT developed by Carlos Soares) to all these datasets and gather the results in an RDF triple store (let's call it DCHARS-DB) that can be queried using SPARQL. Since we'll surely conduct DM experiments on these datasets, we can also build the corresponding DM Experiments Database (DMEX-DB), where each record describes the application of a DM workflow to a dataset in DCHARS-DB, the algorithms involved and their characteristics, and the assessment of the learned hypothesis (model or pattern set). Thus, on top of the dataset repository you suggested, we can also build a meta-data repository that can be used as input data for different meta-analytical or meta-learning investigations. These two meta-level databases will be structured based on DMOP, whose data mining content can also be brought to bear in other ways on the meta-learning process.
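As a rough illustration of the DCHARS-DB idea, the Python sketch below computes a few simple dataset characteristics and stores them as (subject, predicate, object) triples. It is only a toy under stated assumptions: it is not the e-LICO Data Characterization Tool, the predicate names (numInstances, classEntropy, etc.) are hypothetical and not taken from DMOP, and a plain Python list with a pattern-matching function stands in for a real RDF triple store and SPARQL.

```python
import math
from collections import Counter

def characterize(name, rows):
    """Return triples describing a dataset.
    rows: list of tuples whose last element is the class label."""
    labels = [r[-1] for r in rows]
    n = len(rows)
    counts = Counter(labels)
    # Shannon entropy of the class distribution, in bits
    entropy = sum(-(c / n) * math.log2(c / n) for c in counts.values())
    return [
        (name, "numInstances", n),          # hypothetical predicate names
        (name, "numAttributes", len(rows[0]) - 1),
        (name, "numClasses", len(counts)),
        (name, "classEntropy", entropy),
    ]

def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

store = []
store += characterize("toy_balanced",
                      [(0.1, 0.2, 0), (0.9, 0.8, 1), (0.2, 0.1, 0), (0.7, 0.9, 1)])
store += characterize("toy_single", [(0.1, 0), (0.2, 0)])

# Which datasets have exactly two classes?
binary = [s for s, _, _ in match(store, p="numClasses", o=2)]
```

A DMEX-DB record could be represented in the same way, with the experiment as subject and predicates linking it to the dataset, the workflow, and the evaluation results, so that both databases can be queried together.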

In short, a full implementation of your suggestion will gather in the DMO Foundry all the needed ingredients for semantic or ontology-based meta-learning/mining. Let's hope we find enough committed volunteers to push it through.
