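Dissertation for the Master's Degree in Engineering
Research on MOLAP Model and Key Algorithms
Si Chengxiang
Harbin Institute of Technology
June 2006
Classified Index: TP311.131
U.D.C.: 618.3
Dissertation for the Master's Degree in Engineering
Research on MOLAP Model and Key Algorithms
Candidate: Si Chengxiang
Supervisor: Prof. Xu Xiaofei
Associate Supervisor: Associate Prof. Ye Yunming
Academic Degree Applied for: Master of Engineering
Specialty: Computer Science and Technology
Affiliation: Shenzhen Graduate School
Date of Defence: June 2006
Degree-Conferring Institution: Harbin Institute of Technology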
Classified Index: TP311.131
U.D.C: 618.3
Dissertation for the Master's Degree in Engineering
Research on MOLAP Model and
Key Algorithms
Candidate: Si Chengxiang
Supervisor: Prof. Xu Xiaofei
Associate Supervisor: Associate Prof. Ye Yunming
Academic Degree Applied for: Master of Engineering
Specialty: Computer Science and Technology
Affiliation: Shenzhen Graduate School
Date of Defence: June 2006
Degree-Conferring Institution: Harbin Institute of Technology
Harbin Institute of Technology Dissertation for the Master's Degree in Engineering
Abstract
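To support decision analysis effectively, the concept of the data warehouse has been proposed in recent years. A data warehouse is a subject-oriented, integrated, nonvolatile, and time-variant collection of data that supports management decision-making. OLAP (online analytical processing) performs multidimensional data analysis on top of the data warehouse and is one of its most important applications. Depending on how the data is organized, OLAP is divided into ROLAP and MOLAP. MOLAP uses an array-based multidimensional storage engine to support multidimensional views of the data and delivers fast query performance. This is mainly due to its distinctive multidimensional data structure, stored in the form of a data cube, and to the highly preprocessed data (that is, aggregate data) stored in that cube.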
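The role of precomputed aggregate data in an array-based cube can be illustrated with a minimal sketch. It assumes a small dense base cube stored as a flat row-major list and a sum measure; the names `offset`, `aggregate_all`, and `lookup` are hypothetical and chosen for this illustration only. The sketch computes every group-by naively from the base cube. It does not implement multiway array aggregation or the aggregation-order method studied in this thesis.

```python
from itertools import combinations, product


def offset(coords, shape):
    """Row-major position of a cell inside a flat array of the given shape."""
    pos = 0
    for c, n in zip(coords, shape):
        pos = pos * n + c
    return pos


def aggregate_all(base, shape):
    """Precompute every group-by (cuboid) of a dense base cube by summation.

    Returns {kept_dims: (cuboid_shape, flat_array)}, where kept_dims is the
    tuple of dimension indices that the cuboid retains.
    """
    ndim = len(shape)
    cuboids = {}
    for k in range(ndim + 1):
        for kept in combinations(range(ndim), k):
            sub_shape = tuple(shape[d] for d in kept)
            size = 1
            for n in sub_shape:
                size *= n
            agg = [0] * size
            # Naive approach: scan the whole base cube once per cuboid.
            for coords in product(*(range(n) for n in shape)):
                sub = tuple(coords[d] for d in kept)
                agg[offset(sub, sub_shape)] += base[offset(coords, shape)]
            cuboids[kept] = (sub_shape, agg)
    return cuboids


def lookup(cuboids, kept, coords):
    """Answer an aggregate query with a single array access."""
    sub_shape, agg = cuboids[kept]
    return agg[offset(coords, sub_shape)]
```

For a 2 x 3 cube such as `[1, 2, 3, 4, 5, 6]` with shape `(2, 3)`, `lookup(cuboids, (0,), (1,))` returns the total of the second row without rescanning the base cells. This is the source of the fast query performance described above; the cost is moved into the precomputation step, which is why the order of aggregation matters.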
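This thesis studies aggregate data in the MOLAP model, focusing on how to select the aggregation order quickly, how to store aggregate data efficiently, and how to query and update that data. Building on a systematic survey of the latest research on aggregate data in the MOLAP model, combined with the author's own analysis, the thesis makes the following contributions.
1. To address the shortcomings of selecting the optimal aggregation order in multiway array aggregation of a data cube, this thesis proposes an optimized method for finding the aggregation order. The method speeds up the precomputation of multidimensional data stored as a data cube and reduces the computation time of multiway cube aggregation.
2. Dimension information stored in the traditional way scales poorly and is difficult to update. This thesis proposes a dimension hierarchy storage tree structure that addresses these problems.
3. Aggregate data is usually stored in multidimensional arrays, which loses the hierarchy information of the multidimensional data set schema and leaves the hierarchical semantics of the data unclear. This thesis proposes a storage structure based on a hierarchical aggregate cube that addresses these problems.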
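The abstract does not specify the layout of the proposed structures, so the following is only a generic sketch of what it can mean to keep dimension hierarchy information together with aggregates. It assumes a single dimension whose members form a tree (for example year, quarter, month) and a sum measure. The class `DimNode` and the functions `add_fact` and `rollup` are hypothetical names, not the structures proposed in this thesis.

```python
class DimNode:
    """One member of a dimension hierarchy, holding the aggregate below it."""

    def __init__(self, name):
        self.name = name
        self.children = {}  # member name -> DimNode at the next level
        self.value = 0      # sum of all measures recorded under this member

    def child(self, name):
        """Return the named child, creating it on first use."""
        if name not in self.children:
            self.children[name] = DimNode(name)
        return self.children[name]


def add_fact(root, path, measure):
    """Record a measure at the leaf given by path, updating every ancestor."""
    node = root
    node.value += measure
    for name in path:
        node = node.child(name)
        node.value += measure


def rollup(root, path):
    """Return the aggregate for the member reached by path (any level)."""
    node = root
    for name in path:
        node = node.children[name]
    return node.value
```

In this sketch a new member, such as a new year, is added by creating a node rather than by resizing an array, and each aggregate remains attached to a named level of the hierarchy. These are the two properties, ease of update and explicit hierarchical semantics, that items 2 and 3 above identify as missing from plain array storage.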
Keywords: data warehouse; MOLAP; aggregate data
Abstract
To support decision analysis, the concept of the data warehouse has been proposed. A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data whose main purpose is to support decision analysis. Online analytical processing (OLAP) is the most important application of data warehouses. According to how the data is organized, there are two kinds of OLAP servers: ROLAP (relational OLAP), which is built on a relational database, and MOLAP (multidimensional OLAP), which is built on multidimensional arrays. MOLAP supports multidimensional views of the data and offers high query performance, which is attributed to its particular multidimensional data structure and to its extensively precomputed aggregate data.
This thesis explores aggregate data in MOLAP and concentrates on how to find the optimal order for multiway array aggregation, how to store the data efficiently, and how to index and query the data. On the basis of a systematic study of the latest research, the thesis makes the following contributions.
1. To address the shortcomings of order selection in multiway array aggregation, we propose a new method for finding the optimal aggregation order, which speeds up precomputation.
2. Dimension information stored in the traditional way suffers from poor extensibility and is difficult to update. We propose a storage structure for dimension data that solves this problem.
3. Aggregate data is usually stored in multidimensional arrays, which leads to the loss of dimension information and of the hierarchical semantics of the data. The thesis proposes a hierarchy-based storage structure for multidimensional data that solves this problem.
Keywords: data warehouse, MOLAP, aggregate data