Data Mining OPtimization (DMOP) Ontology

The overall goal of DMOP is to provide support for all decision-making steps that have an impact on the outcome of the knowledge discovery process. It focuses specifically on tasks (e.g., learning, feature extraction) whose accomplishment requires non-trivial search in the space of alternative methods. For each such task, the decision process involves two steps that can be guided by prior knowledge from the ontology: algorithm selection and model selection. While data mining practitioners can profitably consult DMOP to perform "manual" algorithm and model selection, the ontology has been designed to automate these two operations. A third use of DMOP is meta-learning, i.e., the analysis of meta-data describing learning episodes with a view to extracting patterns and rules that improve algorithm and model selection. Finally, generalizing meta-learning to the complete DM process, DMOP's most innovative objective is to support meta-mining, i.e., the meta-analysis of complete data mining processes in order to extract workflow patterns that are predictive of good or bad performance. In short, DMOP charts the higher-order feature space in which meta-learning and meta-mining can take place.

Structure of DMOP

DMOP's overall structure is shown in the figure below.

Like any ontology, DMOP consists of a TBox (terminological box) and an ABox (assertional box). The TBox provides a comprehensive conceptual framework for describing core data mining entities such as tasks, algorithms, models, and workflows, together with their relationships (see the figure below). The ABox, or DM Knowledge Base (DMKB), is a compendium of facts concerning individual algorithms and their implementations in widely used data mining software such as RapidMiner and Weka. Also shown in the figure, though not strictly speaking an integral part of DMOP, are DM experiment databases whose schemas are designed using concept and property definitions from DMOP.
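The TBox/ABox split can be illustrated with a minimal Python sketch. It is a toy model, not DMOP itself: the class names, the "type" and "implements" predicates, and the helper functions are all hypothetical and chosen only for illustration. The one real-world fact it relies on is that Weka's J48 is an implementation of C4.5.

```python
# Toy TBox: each class maps to its direct superclass.
# Class names are illustrative assumptions, not actual DMOP identifiers.
TBOX_SUBCLASS_OF = {
    "DecisionTreeAlgorithm": "ClassificationAlgorithm",
    "ClassificationAlgorithm": "InductionAlgorithm",
    "InductionAlgorithm": "DMAlgorithm",
}

# Toy ABox: facts about individuals as (subject, predicate, object) triples.
# The predicate names "type" and "implements" are assumptions as well.
ABOX = [
    ("C4.5", "type", "DecisionTreeAlgorithm"),
    ("Weka-J48", "implements", "C4.5"),
]


def superclasses(cls):
    """Return cls followed by all of its ancestors in the toy TBox."""
    chain = [cls]
    while cls in TBOX_SUBCLASS_OF:
        cls = TBOX_SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def instances_of(cls):
    """Individuals whose asserted class is cls or any subclass of cls."""
    return sorted(
        s for s, p, o in ABOX if p == "type" and cls in superclasses(o)
    )


def implementations_of(cls):
    """Implementations of any algorithm that is an instance of cls."""
    algorithms = set(instances_of(cls))
    return sorted(
        s for s, p, o in ABOX if p == "implements" and o in algorithms
    )


print(instances_of("InductionAlgorithm"))
print(implementations_of("ClassificationAlgorithm"))
```

The point of the sketch is that a query phrased at a general level of the TBox (for example, all classification algorithms) reaches individual facts in the ABox through the subclass hierarchy, without those facts having to mention the general class explicitly.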

 

Use of DMOP in semantic meta-mining

DMOP has been used to support ontology-based meta-learning, aka semantic meta-learning. However, it displays its full potential in semantic meta-mining, defined as ontology-based, process-oriented meta-learning. The overall context is the e-LICO virtual lab (figure below), which features an Intelligent Discovery Assistant (IDA) composed of an AI planner and a probabilistic ranker.

Given a user goal and dataset, the AI planner uses brute-force search to generate a set of valid (but not necessarily optimal) workflows, which are then ranked by the probabilistic ranker. Initially the ranker relies on usage statistics of operators/algorithms to rank workflows. However, as the system gains experience, experimental meta-data stored in the DMEX-DB are analysed by a meta-miner to extract frequent workflow patterns and correlate them with expected workflow performance. A few meta-mining techniques used to obtain this ranking have been described in [1]. Initial results have shown that the use of DMOP allows the meta-miner to generalize over algorithms and workflows, and hence to make useful predictions on algorithms and workflows not previously seen by the meta-miner.
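The initial plan-then-rank stage described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the e-LICO implementation: the operator names, the validity constraint, the usage counts, and the scoring rule (a product of per-step relative usage frequencies) are all hypothetical.

```python
from itertools import product
from math import prod

# Hypothetical operator catalogue: one list of candidate operators per step.
STEPS = [
    ["Normalize", "Discretize"],   # preprocessing step
    ["LearnerA", "LearnerB"],      # learning step
]

# Hypothetical usage counts standing in for operator usage statistics.
USAGE = {"Normalize": 30, "Discretize": 10, "LearnerA": 20, "LearnerB": 40}


def is_valid(workflow):
    """Assumed constraint: LearnerB only accepts discretized input."""
    preprocess, learner = workflow
    return not (learner == "LearnerB" and preprocess != "Discretize")


def plan():
    """Brute-force enumeration of every operator combination, kept if valid."""
    return [wf for wf in product(*STEPS) if is_valid(wf)]


def score(workflow):
    """Assumed scoring rule: product of each operator's relative usage
    frequency among the candidates for its step."""
    return prod(
        USAGE[op] / sum(USAGE[c] for c in STEPS[i])
        for i, op in enumerate(workflow)
    )


def rank(workflows):
    """Order workflows from highest to lowest score."""
    return sorted(workflows, key=score, reverse=True)


for wf in rank(plan()):
    print(wf, round(score(wf), 3))
```

In this sketch the ranking depends only on how often each operator has been used. The meta-mining stage described in the text would replace `score` with a model learned from the experimental meta-data, which is where ontology-level descriptions of operators allow predictions for workflows that were never observed.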

  • To browse the DMOP ontology, click on the Ontologies tab on top of this page, then select DMOP > DMOP Browser.

[1] M. Hilario, P. Nguyen, H. Do, A. Woznica, A. Kalousis. Ontology-based meta-mining of knowledge discovery workflows. In N. Jankowski et al., Meta-Learning in Computational Intelligence, Springer, 2011. Please note that this paper describes the state of the DMOP ontology in 2010 and does not perfectly reflect the current version.
