An introduction to online analytical processing - OLAP

  • Overview
In computing, online analytical processing, or OLAP /ˈoʊlæp/, is an approach to answering multi-dimensional analytical (MDA) queries swiftly. Typical applications of OLAP include business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting, financial reporting and similar areas, with new applications coming up, such as agriculture. The term OLAP was created as a slight modification of the traditional database term Online Transaction Processing ("OLTP").

  • Core of OLAP
Core of any OLAP system is an OLAP cube (also called a 'multidimensional cube' or a hypercube). It consists of numeric facts called measures which are categorized by dimensions. The measures are placed at the intersections of the hypercube, which is spanned by the dimensions as a Vector space.

  • Types
    • MOLAP
MOLAP stands for Multidimensional Online Analytical Processing.
MOLAP differs significantly in that (in some software) it requires the pre-computation and storage of information in the cube — the operation known as processing. Most MOLAP solutions store these data in an optimized multidimensional array storage.
- Advantages
+ Fast query performance due to optimized storage, multidimensional indexing and caching.
+ Smaller on-disk size of data compared to data stored in relational database due to compression techniques.
+ Automated computation of higher level aggregates of the data.
+ Array models provide natural indexing.
+ Effective data extraction achieved through the pre-structuring of aggregated data.
- Disadvantages
+ MOLAP tools traditionally have difficulty querying models with dimensions with very high cardinality
+ Some MOLAP products have difficulty updating and querying models with more than ten dimensions.
+ Some MOLAP methodologies introduce data redundancy.

    • ROLAP
ROLAP stands for Relational Online Analytical Processing.
ROLAP differs significantly in that it does not require the pre-computation and storage of information. Instead, ROLAP tools access the data in a relational database and generate SQL queries to calculate information at the appropriate level when an end user requests it. With ROLAP, it is possible to create additional database tables (summary tables or aggregations) which summarize the data at any desired combination of dimensions.
- Advantages
+ ROLAP is considered to be more scalable in handling large data volumes, especially models with dimensions with very high cardinality (i.e., millions of members).
+ Have variety of data loading tools available, and the ability to fine tune the ETL code to the particular data model, load times are generally much shorter than with the automated MOLAP loads.
  + The data are stored in a standard relational database and can be accessed by any SQL reporting tool (the tool does not have to be an OLAP tool).
  + ROLAP tools are better at handling non-aggregatable facts (e.g., textual descriptions).
+ The ROLAP approach can leverage database authorization controls such as row-level security, whereby the query results are filtered depending on preset criteria applied.

- Disadvantages
  + When the step of creating aggregate tables is skipped, the query performance then suffers because the larger detailed tables must be queried.
  + ROLAP relies on the general purpose database for querying and caching, and therefore several special techniques employed by MOLAP tools are not available (such as special hierarchical indexing).
  + Since ROLAP tools rely on SQL for all of the computations, they are not suitable when the model is heavy on calculations which don't translate well into SQL. Examples of such models include budgeting, allocations, financial reporting and other scenarios.

    • HOLAP
HOLAP stands for Hybrid Online Analytical Processing.
HOLAP is a combination of ROLAP and MOLAP OLAP) which are other possible implementations of OLAP. HOLAP allows storing part of the data in a MOLAP store and another part of the data in a ROLAP store, allowing a tradeoff of the advantages of each. The degree of control that the cube designer has over this partitioning varies from product to product.

  • OLAP with Nosql
The goal of OLAP with Nosql is use very high read and write throughput, data structure, distributed storage, distributed computing, and some special fearture with each type of Nosql.

- Advantages
+ High read / write throughput
  + Special data structure, fearture, ..
  + Distributed storage (big data)
  + Distributed computing

- Disadvantages
  + Don't have ability to use real SQL
  + No data contrains
  + Don't have a lot of great tools to management, administration, ...
  + Poor programing client