OLAP VS OLTP
OLPT and OLAP are complementing technologies. You can't live without OLTP: it runs your business day by day. So, using getting strategic information from OLTP is usually first “quick and dirty” approach, but can become limiting later.
This post explores key differences between two technologies.
OLTP stands for On Line Transaction Processing and is a data modeling approach typically used to facilitate and manage usual business applications. Most of applications you see and use are OLTP based.
OLAP stands for On Line Analytic Processing and is an approach to answer multi-dimensional queries. OLAP was conceived for Management Information Systems and Decision Support Systems but is still widely underused: every day I see too much people making out business intelligence from OLTP data!
Starschemas are not normalized. Not updating and deleting tuples that u do in normal database. Facitilates queries instead of other operation. When selecting there is not update to index which makes use of Index easier.
GRANULARITY - data warehouse typically stores data in different levels of granularity or summarization, depending on the data requirements of the business. If an enterprise needs data to assist strategic planning, then only highly summarized data is required. The lower the level of granularity of data required by the enterprise, the higher the number of resources (specifically data storage) required to build the data warehouse. The different levels of summarization in order of increasing granularity are:
- Current operational data
- Historical operational data
- Aggregated data
- Metadata
AYEG
Revenue Sources:
Pro Shop and juice bar
Membership and Day Passes
Problem : How to generate more revenue.Which set of customers should the gym target. Has the customer pattern changed over time ?
BI is all about how to maximize revenue based on internal or extranal data
Now Organize corp events. Customer pay fixed fees
What are the thoughts about
Need to have a look at the OLTP data. Need to look at the trends and patterns. Good point to start with. Need to relate with DW Tool Kit by Ralph and Kimball. Can the abve questions be answered using OLTP DB. Yes ... Need to do some messaging the data over time.
For some questions you need the historical data. Even if u have multiple sources of data you need to dump it into single datawarehouse.
A data mart is the access layer of the data warehouse environment that is used to get data out to the users. The data mart is a subset of the data warehouse which is usually oriented to a specific business line or team.
This can also be done by Virtual Integration. This could be a step towards complete integration.
With a virtual data warehouse, you don't have to go through all the hard work of integrating data, rummaging around old files and trying to understand what some programmer back in the 1960s had in mind. You don't need to take up space with replicated data. What a lovely idea. You just string some files together and access them as if they were a data warehouse.
Apache Cassandra is an open source distributed database management system used by Hadoop.
A dimension is a structure that categorizes data in order to enable users to answer business questions. Commonly used dimensions are customers, products, and time. For example, each sales channel of a clothing retailer might gather and store data regarding sales and reclamations of their Cloth assortment. The retail chain management can build a data warehouse to analyze the sales of its products across all stores over time and help answer questions such as:
What is the effect of promoting one product on the sale of a related product that is not promoted?
What are the sales of a product before and after a promotion?
How does a promotion affect the various distribution channels?
The data in the retailer's data warehouse system has two important components: dimensions and facts. The dimensions are products, customers, promotions, channels, and time. One approach for identifying your dimensions is to review your reference tables, such as a product table that contains everything about a product, or a promotion table containing all information about promotions. The facts are sales (units sold) and profits. A data warehouse contains facts about the sales of each product at on a daily basis.
A typical relational implementation for such a data warehouse is a Star Schema. The fact information is stored in the so-called fact table, whereas the dimensional information is stored in the so-called dimension tables. In our example, each sales transaction record is uniquely defined as for each customer, for each product, for each sales channel, for each promotion, and for each day (time).
BI Rules Learnt in today's Class :
- Facts are quantitative attributes that can be added across every dimension !!!
- Items are not quantitative. Quantity and $ amount are quantitative.
- Dimension Tables are short and Fat
- Fact table are tall(exhaustive) and skinny(very few attributes)
- Surrogate key represent dimensions.
Assumption : Every GYM has a local database. Think beyond something which is not there in a regular database.
Dimensions :
Time : Day, Hour
Space : Location where the sale is happening. Location of the Gym. Gym Dimension
Membership : Type
Items : Things that you buy, Products
Facts:
Sales : $ sales amountTotal # of member :
Quantity has Members, Time and Item dimensions.
What do you do about memberships ? Separate Schema for memberships. Only include retail sales.
Attributes for membership : Types of membership, Membership ID
Importance of Surrpogate
Data warehouses typically use a surrogate, (also known as artificial or identity key), key for the dimension tables primary keys. They can use Infa sequence generator, or Oracle sequence, or SQL Server Identity values for the surrogate key. It is useful because the natural primary key (i.e. Customer Number in Customer table) can change and this makes updates more difficult.
Slice and Dice
The ability to move between different combinations of dimensions when viewing data with an OLAP browser.
Staging -> Operational Data Store -> Datawarehouse