Application of Materialized View as Aggregate Table in Data Warehouse

2012-01-15 WANG Hui, LI Lang

Journal of Hengyang Normal University, 2012, No. 6

WANG Hui, LI Lang

(Dept. of Computer Science, Hengyang Normal University, Hengyang, Hunan 421002, China)

Abstract: Aggregate tables store pre-calculated summaries and are critical to query performance tuning in a data warehouse. This paper analyzes three methods of building aggregate data: triggers, stored procedures, and materialized views. It illustrates the usage and advantages of materialized views through examples, namely flexible refresh methods, reduced programming workload, and a query rewrite mechanism that ensures application independence. It concludes that materialized views are an ideal choice for aggregate tables.

Key words: materialized view; aggregate table; data warehouse

0 Introduction

With the spread of database application systems and the information explosion of the 1990s, organizations have accumulated more and more data. Business analysts and decision-makers face the question of how to use these data wisely to make business decisions and direct business activities, and Decision Support Systems (DSS) and On-Line Analytical Processing (OLAP) emerged in response. The foundation of all DSS processing is a data warehouse.

A data warehouse is a subject-oriented, integrated, nonvolatile, and time-variant collection of data in support of management's decisions [1]. It is applied in many industries, such as finance, telecommunications, manufacturing, retail, and transportation. A data warehouse stores a huge volume of historical data, its major operations are query and load, and its data are usually accessed in groups and infrequently. All of this differs from a traditional legacy database system, so data modeling for a data warehouse must differ as well. The industry has largely concluded that dimensional modeling is the most viable technique for data warehouses [2].

1 Aggregate tables and data building

A dimensional model contains fact tables, which store the numerical performance measurements of the business, and dimension tables, which contain its textual descriptors. In addition, to improve query performance, a data warehouse contains aggregate tables, which are pre-calculated summaries of the most granular data at higher levels along the dimension hierarchies [3].

Because aggregate tables are pre-calculated summaries derived from the fact and dimension tables, how they are loaded and updated is critical. There are three methods of building aggregate data.

1) Trigger: Create triggers on the fact and dimension tables. The advantage is that the aggregate table is updated immediately, without delay, whenever a fact or dimension table is loaded or updated. The disadvantage is that the update time is not flexible: a change to a fact or dimension table completes only after all derived aggregate tables have been updated, which is time-consuming. This seriously impacts the performance of the data warehouse, especially during business hours.

2) Stored procedure: Create database stored procedures that update the aggregate tables. The benefit is flexibility, since the procedures can be run on a schedule or manually. The drawback is that writing stored procedures is demanding, burdensome work that requires experienced developers.

3) Materialized view: Create materialized views to serve as the aggregate tables. The advantage is that the refresh of a materialized view is also flexible and can be performed immediately, periodically, or manually [4]. The materialized view is a practical DBMS feature and should be exploited to reduce the workload.
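As a rough illustration of the first two methods, the following Oracle-style sketch maintains a hand-built aggregate table with a trigger and, alternatively, rebuilds it with a stored procedure. All table, column, trigger, and procedure names (sales_fact, product_dim, store_dim, time_dim, sales_agg, trg_sales_fact_ai, rebuild_sales_agg) are hypothetical and are not taken from the paper's figures. The trigger handles inserts only; updates and deletes would need further code.

```sql
-- Hypothetical star schema and a hand-built aggregate table (all names assumed).
CREATE TABLE product_dim (product_id NUMBER PRIMARY KEY, category VARCHAR2(30));
CREATE TABLE store_dim   (store_id   NUMBER PRIMARY KEY, city     VARCHAR2(30));
CREATE TABLE time_dim    (time_id    NUMBER PRIMARY KEY, month_no NUMBER(6));
CREATE TABLE sales_fact  (product_id NUMBER REFERENCES product_dim,
                          store_id   NUMBER REFERENCES store_dim,
                          time_id    NUMBER REFERENCES time_dim,
                          amount     NUMBER);
CREATE TABLE sales_agg   (category VARCHAR2(30), city VARCHAR2(30),
                          month_no NUMBER(6), total_amount NUMBER);

-- Method 1: a row-level trigger updates sales_agg inside every fact insert.
CREATE OR REPLACE TRIGGER trg_sales_fact_ai
AFTER INSERT ON sales_fact
FOR EACH ROW
DECLARE
  v_category product_dim.category%TYPE;
  v_city     store_dim.city%TYPE;
  v_month    time_dim.month_no%TYPE;
BEGIN
  -- Look up the aggregate-level attributes of the new fact row.
  SELECT category INTO v_category FROM product_dim WHERE product_id = :NEW.product_id;
  SELECT city     INTO v_city     FROM store_dim   WHERE store_id   = :NEW.store_id;
  SELECT month_no INTO v_month    FROM time_dim    WHERE time_id    = :NEW.time_id;

  -- Add the new amount to the matching summary row, or create that row.
  MERGE INTO sales_agg a
  USING (SELECT v_category AS category, v_city AS city, v_month AS month_no
         FROM dual) n
  ON (a.category = n.category AND a.city = n.city AND a.month_no = n.month_no)
  WHEN MATCHED THEN
    UPDATE SET a.total_amount = a.total_amount + NVL(:NEW.amount, 0)
  WHEN NOT MATCHED THEN
    INSERT (category, city, month_no, total_amount)
    VALUES (n.category, n.city, n.month_no, NVL(:NEW.amount, 0));
END;
/

-- Method 2: a stored procedure rebuilds sales_agg from scratch when called.
CREATE OR REPLACE PROCEDURE rebuild_sales_agg AS
BEGIN
  DELETE FROM sales_agg;
  INSERT INTO sales_agg (category, city, month_no, total_amount)
  SELECT p.category, s.city, t.month_no, SUM(f.amount)
  FROM   sales_fact f
  JOIN   product_dim p ON p.product_id = f.product_id
  JOIN   store_dim   s ON s.store_id   = f.store_id
  JOIN   time_dim    t ON t.time_id    = f.time_id
  GROUP  BY p.category, s.city, t.month_no;
  COMMIT;
END rebuild_sales_agg;
/
```

Even in this small form, the trigger adds three lookups and a MERGE to every insert on the fact table, and the procedure has to be rewritten by hand whenever the aggregate definition changes.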

2 Materialized view

A materialized view (MV) is a schema object that provides indirect access to table data by storing the results of a query, and can be used to summarize, compute, replicate, and distribute data [5].

An MV is similar to an index in several ways [6], while the CREATE MATERIALIZED VIEW statement is somewhat similar to CREATE VIEW.

MVs can be used in several areas:

1) Data replication. MVs allow copies of remote data to be maintained on the local node.

2) ETL implementation [7][8]. MVs are one option for implementing ETL, which transfers data from the data sources to the data warehouse.

3) Aggregate table building. Creating an MV is a good choice for an aggregate table derived from the fact and dimension tables of a data warehouse.

3 MV as aggregate table

As mentioned above, creating an MV is a good choice for an aggregate table. The following example illustrates its usage.

Figure 1 Fact and dimension in star schema

Suppose a star schema contains one fact table and three dimension tables, as shown in Figure 1. One aggregate table could then summarize the totals per category, per city, and per month, as shown in Figure 2.

Figure 2 Aggregate table
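An aggregate table of this kind could be defined by a statement of roughly the following form. This is a sketch under assumptions, not the statement used in the paper: the MV name sales_agg_mv and the table and column names are hypothetical, and the clause values are chosen only to show where each option appears.

```sql
-- Assumed base tables (hypothetical): sales_fact(product_id, store_id, time_id, amount),
-- product_dim(product_id, category), store_dim(store_id, city), time_dim(time_id, month_no).
CREATE MATERIALIZED VIEW sales_agg_mv
  BUILD IMMEDIATE          -- populate the MV as part of this statement
  REFRESH FORCE ON DEMAND  -- FAST when possible, else COMPLETE; refresh only on request
  ENABLE QUERY REWRITE     -- let the optimizer answer matching queries from the MV
AS
SELECT p.category,
       s.city,
       t.month_no,
       SUM(f.amount)   AS total_amount,
       COUNT(f.amount) AS amount_count,  -- kept alongside SUM so fast refresh stays possible
       COUNT(*)        AS row_count      -- likewise needed for fast refresh of aggregates
FROM   sales_fact f, product_dim p, store_dim s, time_dim t
WHERE  f.product_id = p.product_id
AND    f.store_id   = s.store_id
AND    f.time_id    = t.time_id
GROUP  BY p.category, s.city, t.month_no;
```

REFRESH FORCE is used here so that the statement does not depend on materialized view logs already existing; REFRESH FAST could be written instead once a log exists on every base table.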

In the CREATE MATERIALIZED VIEW statement, "BUILD IMMEDIATE" (the default) populates the MV when it is created. The alternative, "BUILD DEFERRED", populates it later.

"REFRESH" has three options: (a) "FAST" selects the incremental refresh method; (b) "COMPLETE" selects the complete refresh method; (c) "FORCE" (the default) uses "FAST" if possible and "COMPLETE" otherwise.

"ON DEMAND" (the default) means the MV is refreshed by calling the refresh procedure. The alternative, "ON COMMIT", means a fast refresh occurs whenever an operation on a base table is committed.

"ENABLE QUERY REWRITE" makes the MV available for rewriting queries. The alternative, "DISABLE QUERY REWRITE" (the default), makes it unavailable for rewriting queries.

To use the "FAST" refresh method for an MV, which is the most likely case in a data warehouse, a materialized view log must be created on each base table.
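A minimal sketch of such logs is given below, assuming a hypothetical star schema with base tables sales_fact, product_dim, store_dim, and time_dim. The column lists are assumptions as well; in general they should cover every column that the MV query references.

```sql
-- One log per base table. ROWID, SEQUENCE, and INCLUDING NEW VALUES are the
-- options generally needed for fast refresh of an MV with joins and aggregates.
CREATE MATERIALIZED VIEW LOG ON sales_fact
  WITH SEQUENCE, ROWID (product_id, store_id, time_id, amount)
  INCLUDING NEW VALUES;

CREATE MATERIALIZED VIEW LOG ON product_dim
  WITH SEQUENCE, ROWID (product_id, category)
  INCLUDING NEW VALUES;

CREATE MATERIALIZED VIEW LOG ON store_dim
  WITH SEQUENCE, ROWID (store_id, city)
  INCLUDING NEW VALUES;

CREATE MATERIALIZED VIEW LOG ON time_dim
  WITH SEQUENCE, ROWID (time_id, month_no)
  INCLUDING NEW VALUES;
```

Each log records the changes made to its base table, so that a later fast refresh only has to apply those changes instead of recomputing the whole summary.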

If "ON DEMAND" is specified in the CREATE MATERIALIZED VIEW statement, which is also the most likely case in a data warehouse, the system procedure "dbms_mview.refresh" must be called to refresh the MV. The procedure can be run manually or periodically through job queues.
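Both ways of calling the procedure might look as follows. The MV name SALES_AGG_MV and the job name are hypothetical, and the nightly schedule is only an example. The periodic variant uses DBMS_SCHEDULER, which is one way to implement a job queue and is an assumption here, since the paper does not say which job mechanism it has in mind.

```sql
-- Manual refresh. Method '?' means FORCE (fast if possible, else complete);
-- 'F' would request a fast refresh and 'C' a complete refresh.
BEGIN
  DBMS_MVIEW.REFRESH(list => 'SALES_AGG_MV', method => '?');
END;
/

-- Periodic refresh: a scheduler job that runs the same call every night at 02:00.
BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'REFRESH_SALES_AGG_MV',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN DBMS_MVIEW.REFRESH(''SALES_AGG_MV'', ''?''); END;',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=DAILY; BYHOUR=2; BYMINUTE=0; BYSECOND=0',
    enabled         => TRUE);
END;
/
```

Scheduling the refresh outside business hours keeps the maintenance cost away from the query workload, which is the flexibility that a trigger-based approach lacks.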

The advantages of an MV as an aggregate table in a data warehouse are:

(1) The MV is a built-in DBMS feature. The DBMS handles building and refreshing its data, so no burdensome programming is needed.

(2) An MV physically stores the query results, so its pre-calculated summaries improve the query performance of the data warehouse.

(3) An MV can be refreshed immediately, manually, or periodically, which is very flexible. In a data warehouse environment, an MV is usually refreshed periodically, or manually if necessary.

(4) An MV in a data warehouse is usually accessed through the query rewrite mechanism, which is transparent to the end user and the application. Like an index, it can be added, altered, or dropped without affecting the validity or availability of the SQL statements in the applications.
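The transparency described in point (4) can be checked with a sketch like the one below. It assumes a hypothetical MV named sales_agg_mv, created with ENABLE QUERY REWRITE, that summarizes amount by category, city, and month over the hypothetical tables sales_fact, product_dim, store_dim, and time_dim. The query itself names only base tables.

```sql
-- Query rewrite must be enabled for the session (or system-wide).
ALTER SESSION SET query_rewrite_enabled = TRUE;

-- An application query written purely against the base tables. It rolls the
-- monthly totals up to category and city, which the MV's stored sums can answer.
EXPLAIN PLAN FOR
SELECT p.category, s.city, SUM(f.amount) AS total_amount
FROM   sales_fact f, product_dim p, store_dim s, time_dim t
WHERE  f.product_id = p.product_id
AND    f.store_id   = s.store_id
AND    f.time_id    = t.time_id
GROUP  BY p.category, s.city;

-- Inspect the plan to see whether the optimizer chose the MV.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

If the rewrite takes place, the plan should reference sales_agg_mv instead of joining the base tables. Whether it does depends on the optimizer's cost estimate, on the MV being fresh, and on the rewrite integrity settings, so the result may differ between systems. In either case the SQL text in the application stays the same.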

4 Conclusion

Aggregate tables play an important role as a performance-tuning tool in a data warehouse, and the MV is an ideal choice for building them. This paper has described the methods, usage, and advantages of MVs as aggregate tables, and we believe the approach is practical and helpful in data warehouse applications.

References

[1] INMON W H. Building the Data Warehouse [M]. 4th ed. Indiana, USA: Wiley Publication Inc., 2005.

[2] KIMBALL R, ROSS M. The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling [M]. 2nd ed. New York, USA: John Wiley and Sons, Inc., 2002.

[3] PONNIAH P. Data Warehousing Fundamentals for IT Professionals [M]. 2nd ed. New Jersey, USA: John Wiley & Sons, Inc., 2010.

[4] ZHU Wen, MAO Qin-hui, XUE Yan, et al. Analysis and comparison on maintenance algorithms of materialized view in data warehouse [J]. Modern Computer, 2008(4): 58-60.

[5] CYRAN M, LANE P. Oracle Database Concepts 10g release [M]. California, USA: Oracle Corp., 2003.

[6] CHEN Yi-xin, NI Zi-wei. Application of materialized view in Oracle large DB query [J]. Fujian Computer, 2011(10): 147-148.

[7] KIMBALL R, CASERTA J. The Data Warehouse ETL Toolkit [M]. Indiana, USA: Wiley Publication Inc., 2004.

[8] XIE Ren-dong, YANG Jun. Application of Oracle materialized view in data warehouse [J]. Computer Knowledge and Technology, 2008, 2(3): 421-423.

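Application of Materialized Views as Aggregate Tables in Data Warehouses

WANG Hui, LI Lang
(Department of Computer Science, Hengyang Normal University, Hengyang, Hunan 421002, China)

Abstract: Aggregate tables store summary data calculated in advance and are critical to query tuning in a data warehouse. This paper analyzes three methods of generating aggregate table data, namely triggers, stored procedures, and materialized views. It uses examples to explain the usage and advantages of materialized views, namely flexible data refresh, reduced programming workload, and a query rewrite mechanism that guarantees application independence, and it points out that materialized views are an ideal choice for implementing aggregate tables.

Key words: materialized view; aggregate table; data warehouse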

CLC number: TP311

Document code: A

Article ID: 1673-0313(2012)06-0059-04

Date: 2012-10-12

Supported by the Scientific Research Fund of Hunan Provincial Education Department (No. 11B018) and the Scientific Research Fund of Hengyang Normal University (No. 11B43).

Biography: WANG Hui (1964- ), male, born in Nanjing, Jiangsu Province, MS. He worked in Canada for 10 years as an architect at several high-tech companies, where he designed several data warehouse applications. Research fields: data warehousing, BI, OLAP, data mining, etc. LI Lang (1971- ), male, born in Hunan Province, PhD, professor at Hengyang Normal University. Research fields: embedded security, data processing, etc.