Sunday, April 22, 2012

My Infographic Resume

Wednesday, April 18, 2012

Data Visualization / Social Media Business Models

Google Motion Charts
Blog by Stephen Few: http://www.perceptualedge.com/blog/

Teradata link: MicroStrategy

How do we make money from social media?

Advertising. How is Google making money, and how has it monetized that? By taking your own data and selling it to you. Ad-based revenue is the primary way to monetize social media, e.g. posting ads on the display network.
Google AdSense program.

What's the point of Google+?
Refinement of search results
Opportunity to get more and more data about you

Facebook model
Advertising

Facebook Vs Google+
User Data Vs Search Data
Facebook knows much more about you
When / Where / What / Music / Family / Books / Kids / Gender / Photos... Google wanted this kind of data, which is why Google+ was started
Google acquisition of YouTube

Facebook acquisition of Instagram

  • Easy to upload photos 
  • Increased User Base

Facebook Insights

Twitter Business Model
Firehose -> access to tweets in real time
Brand and sponsored

Freemium Business Model

LinkedIn targeting companies
Huge market for recruiters. Attractive candidates for future recruitment.

Virtual Goods Model

Location Based Business Model
Foursquare
Twitter + American Express + Foursquare Partnership

Groupon and LivingSocial

URL Shortener - BitURL
Diffusion patterns in news media

Yelp / Trip Advisor
Selling opinions to third parties!
Sponsored vs. Regular Links

What is the unique value proposition, and will it sustain you over time?




Tuesday, April 17, 2012

Social Media Twitter Project and more on Dimensional Modelling

The weekend before April 9th is the time to run through the presentation with feedback for the GOMC!

Profile and clean up the AYFG operational database and put it into SSIS.
Build a data mart - OLAP, Visualize and Dashboard
Network Analysis - Advanced Analysis

Deliverables: Dashboards and Network Analysis reports
Going to submit the video
May 7th is the deliverable date !
OLAP Reporting on Monday
SSIS

Dimensional Modelling
MON -> MS BI Suite OLAP Demo
Wed -> OBIEE Suite Demo

Differences in how the OBIEE and MSBI tools work. Create a data mart.
Next HW: Analysis of the airline DB (Team)
Next HW: Infographic resume (Individual)
Next HW: Learnings from the class (Team)

Social Media Analytics

Facts for AYFG :
Quantity
Items
Time
Members
Gym
Customers
Store


Granularity: a specific item at a particular gym at a particular time. At this grain you can aggregate across multiple facts.
Dimensions act as a mechanism to describe individual facts.

What happens when promotions are running? What if several promotions run simultaneously?
You can model promotions as a dimension. Not all products will be on promotion.

Promotion Dimension Table
PSK
Name
Details
Can be a simple price discount, or Buy 1 Get 1 Free!
You might not want to have a separate table.

Factless Fact Table

ISK -> Item
SSK -> Store
TSK -> Time
PSK -> Promotion



Set difference will tell you what was on sale and did not sell or the other way.

A factless fact table is a table that contains nothing but dimensional keys.

There are two types of factless tables. One is for capturing an event. An event establishes the relationship among dimension members from various dimensions, but there is no measured value. The existence of the relationship itself is the fact.

This type of fact table can itself be used to generate useful reports. You can count the number of occurrences with various criteria. For example, you can have a factless fact table to capture student attendance (the example used by Ralph Kimball). The following questions can be answered:

  • Which class has the least attendance?
  • Which teachers taught the most students?
  • What is the average attendance for a given course?

All the queries are based on COUNT() with GROUP BY. I think the interesting metrics come from nested GROUP BY queries, where you first count and then apply other aggregate functions such as AVG, MAX, MIN.
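The attendance questions above can be tried out on a small factless fact table. This is a minimal sketch using Python's built-in `sqlite3` module; the table name `attendance_fact`, its key columns, and the rows are all hypothetical. The second query shows the nested pattern: count per class per day first, then average those counts per class.

```python
import sqlite3

# Hypothetical factless fact table: only dimension keys, no measured values.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attendance_fact ("
    " date_key INTEGER, student_key INTEGER, class_key INTEGER, teacher_key INTEGER)"
)
rows = [
    (1, 101, 10, 900), (1, 102, 10, 900), (1, 103, 20, 901),
    (2, 101, 10, 900), (2, 103, 20, 901), (2, 104, 20, 901),
    (2, 102, 30, 902),
]
conn.executemany("INSERT INTO attendance_fact VALUES (?, ?, ?, ?)", rows)

# Which class has the least attendance? A plain COUNT(*) with GROUP BY.
least = conn.execute(
    "SELECT class_key, COUNT(*) AS n FROM attendance_fact "
    "GROUP BY class_key ORDER BY n ASC LIMIT 1"
).fetchone()

# Nested aggregation: count per class per day, then average those counts per class.
avg_per_class = conn.execute(
    "SELECT class_key, AVG(n) FROM ("
    "  SELECT class_key, date_key, COUNT(*) AS n FROM attendance_fact"
    "  GROUP BY class_key, date_key"
    ") AS per_day GROUP BY class_key ORDER BY class_key"
).fetchall()

print(least)
print(avg_per_class)
```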

The other type of factless table is called a coverage table by Ralph. It is used to support negative analysis reports, for example, a store that did not sell a product during a given period. To produce such a report, you need a fact table that captures all the possible combinations. You can then figure out what is missing.
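The coverage idea and the set difference it enables can be sketched with plain Python sets. The item, store, and day keys below are hypothetical; in a real warehouse this would more likely be a SQL `EXCEPT` or an anti-join between the coverage table and the sales fact table.

```python
# Hypothetical coverage table: every (item, store, day) combination that was on promotion.
promotion_coverage = {
    ("ITEM1", "STORE_A", "2012-04-16"),
    ("ITEM2", "STORE_A", "2012-04-16"),
    ("ITEM1", "STORE_B", "2012-04-16"),
}

# Hypothetical sales fact rows: (item, store, day, quantity). Only actual sales produce rows.
sales_fact = [
    ("ITEM1", "STORE_A", "2012-04-16", 3),
    ("ITEM3", "STORE_B", "2012-04-16", 5),
]

# Project the sales facts onto the same dimension keys as the coverage table.
sold = {(item, store, day) for item, store, day, _qty in sales_fact}

# On promotion but did not sell: coverage minus sales.
promoted_not_sold = promotion_coverage - sold
# The other way round: sold but not on promotion.
sold_not_promoted = sold - promotion_coverage

print(sorted(promoted_not_sold))
print(sorted(sold_not_promoted))
```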

No tuple appears if there is no sale/purchase. You can have NULL values for surrogate keys.

Social media data could also use a factless fact table. Need to think about it!

Degenerate Key
Market Basket Analysis and associated promotions
Market Basket Analysis is a modelling technique based upon the theory that if you buy a certain group of items, you are more (or less) likely to buy another group of items. For example, if you are in an English pub and you buy a pint of beer and don't buy a bar meal, you are more likely to buy crisps (US. chips) at the same time than somebody who didn't buy beer. 
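Market basket analysis is commonly quantified with support, confidence, and lift. The sketch below uses made-up baskets, and these three measures are the standard association-rule metrics rather than something specified in these notes. It also checks the pub example directly by comparing how often crisps appear in beer-without-bar-meal baskets versus baskets with no beer.

```python
# Hypothetical baskets: each set holds the items bought in one transaction.
baskets = [
    {"beer", "crisps"},
    {"beer", "crisps"},
    {"beer", "bar meal"},
    {"crisps"},
    {"bar meal", "soda"},
]

def support(itemset, baskets):
    """Fraction of baskets containing every item in itemset."""
    return sum(1 for b in baskets if itemset <= b) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)

def lift(antecedent, consequent, baskets):
    """Confidence divided by the consequent's base rate; > 1 suggests a positive association."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)

def rate(item, group):
    """Share of baskets in group that contain item."""
    return sum(item in b for b in group) / len(group)

# The pub example: beer without a bar meal vs. no beer at all.
beer_no_meal = [b for b in baskets if "beer" in b and "bar meal" not in b]
no_beer = [b for b in baskets if "beer" not in b]

print(confidence({"beer"}, {"crisps"}, baskets))  # about 0.667
print(lift({"beer"}, {"crisps"}, baskets))        # about 1.11
print(rate("crisps", beer_no_meal), rate("crisps", no_beer))
```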

FACTS -> As granular as possible
Dimensions


AYFG Data Marts







Eller GOMC Presentation

GOMC
20-minute slot (10 minutes of presentation)
Deadlines will be posted. PPT submission 2 days before the presentation. June 15th is the last day to submit.
Generally within 2 weeks after completion of your campaign.

Video of the presentation screened / Points of Presentation (Check List)

  • Choosing the Client
  • Why NGO
  • Why OPCS 
  • OPCS's Existing Website
  • Campaign Structure
  • Name Campaigns, connect to website, AdWords / population connections
  • Goals / Volunteering: connect to Mission Statement goals, 20% more page views / conversions
  • Revenue generation: 20% to download wish list
  • Metrics used: high traffic
  • Campaign Strategy: what did you do / Geo Targeting
  • Budget Allocation
  • Weekly objectives / Budget allocation
  • First week Disappointment / Ad rank -> Placement -> Conversion
  • Required drastic conversion
  • Increased Max CPC
  • Problems and Resolutions
  • First major changes -> Change in Landing pages / Conversion
  • Second Major Changes -> Ad group Changes / Changes to download the calendar etc
  • Redefine conversion
  • Third Major Change -> More descriptive ads using keywords
  • Increased daily budget, added new keywords
  • Week-wise events: focus on primary goals
  • Overall Results
  • Impressions vs. CTR
  • Why we chose a particular schedule for the campaigns
  • Impact of a particular event?
  • When were most conversions?
  • How much money is earned per conversion?
  • Wish List ?
  • Campaign and ad-words wise objectives !
  • Increase in page traffic
  • Future Recommendations Budget / Social Platform / Receptive audience / Direct to all audience / Optimize Marketing strategy
  • Learning outcomes
  • Client Relationships
  • Blog snapshot
Questions:
  • Comparison with the previous three weeks?
  • Biggest challenge?
  • What did you do when you could not make changes to the website?
  • What matters is what you did to deal with the challenge

What they did well:
  • Clearly defined conversion goals
  • What conversion meant
  • Numbers? Forecasts?
So many downloads, page views, volunteers


What stood out in this presentation?
Divided up the presentation; undergraduate team; passionate about the cause!
Analytical thinking: numbers on every slide to justify their claims.
Scenario before and after.
Clear conversion goals!
Why only a particular number of volunteers? What are you going to do with excess volunteers?
Numbers that justify the results and relate to business goals.

Practice and split up presentations! Dr. Ram is ready to meet individually to go over the PPT and will help us review the report.

ROI calculation is very important and needs to be included.
Try to quantify the ROI.
Time yourself.
Introduce your client and thank them.

Staging area data warehouse basics

Staging areas are used to temporarily store data extracted from source systems so that it can be prepared for subsequent cleansing and transformation.

Table structures usually are very similar to the table structure or file layouts in the source systems and are a series of “flat” table structures. In other words, they have no indexes and no relationships to other tables in the staging area.

Data structures may be designed to include:
  • In-scope data e.g. only data in scope is loaded in the staging area; or
  • All data, e.g. all data in the source table or files is loaded in the staging area and subsequent transformations eliminate non-essential data.
Data is loaded into staging area using fast load techniques and is transformed to ensure basic compatibility with the staging area structures e.g.
  • Dates might be transformed from mm/dd/yy to yy/mm/dd; or
  • Numeric values might be converted to varchar (if not needed for reporting).
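A minimal sketch of the two staging transformations listed above, assuming the source delivers dates as `mm/dd/yy` strings; the column names `order_date` and `store_no` are hypothetical. Python's `%y` applies a pivot (69-99 become 1969-1999, 00-68 become 2000-2068), which is worth confirming against the source system.

```python
from datetime import datetime

def to_staging_date(source_value: str) -> str:
    """Convert a source date from mm/dd/yy to yy/mm/dd for the staging table."""
    return datetime.strptime(source_value, "%m/%d/%y").strftime("%y/%m/%d")

def to_staging_text(value) -> str:
    """Store a numeric value as text (varchar) when it is not needed for reporting."""
    return str(value)

# Hypothetical source rows as extracted from a flat file.
source_rows = [
    {"order_date": "04/17/12", "store_no": 42},
    {"order_date": "12/01/11", "store_no": 7},
]

staged = [
    {
        "order_date": to_staging_date(r["order_date"]),
        "store_no": to_staging_text(r["store_no"]),
    }
    for r in source_rows
]
print(staged)
```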

Why Union vs. Join?

Consider all related dimensions while loading it
Need to identify the cube that you are going to build
How logical is your DB ?

OLAP vs. OLTP

Slowly Changing Dimensions

Type 0 -> No Changes Ever, Ostrich technique
Type 1 -> Overwrite the attribute in place (e.g. change the brand). The history is completely lost. A very simple technique, but it cannot preserve history, so use it only if you are not interested in history.
Type 2 -> The surrogate key must be unique. Every time an attribute value changes, add a new row to the dimension table with a NEW surrogate key and keep the old row as history. Old facts stay under the old surrogate key; new facts go under the NEW surrogate key. This supports an unlimited number of changes.
Type 3 -> Add a new column (e.g. to hold the previous value alongside the current one).
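A Type 2 dimension can be sketched in a few lines. This is an in-memory illustration only; the `ProductDimension` class, the `brand` attribute, and the `valid_from` / `valid_to` / `is_current` housekeeping columns are assumptions (a common convention, not something these notes prescribe). Each change closes the old row and adds a new row with a new surrogate key, so old facts keep pointing at the old version.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ProductRow:
    surrogate_key: int        # unique per row, never reused
    product_id: str           # natural (business) key from the source system
    brand: str                # the tracked attribute
    valid_from: date
    valid_to: Optional[date]  # None while the row is current
    is_current: bool


class ProductDimension:
    """Minimal in-memory Type 2 slowly changing dimension (illustration only)."""

    def __init__(self) -> None:
        self.rows: List[ProductRow] = []
        self._next_key = 1

    def current(self, product_id: str) -> Optional[ProductRow]:
        for row in self.rows:
            if row.product_id == product_id and row.is_current:
                return row
        return None

    def upsert(self, product_id: str, brand: str, as_of: date) -> int:
        """Return the surrogate key that new facts for this product should reference."""
        row = self.current(product_id)
        if row is not None and row.brand == brand:
            return row.surrogate_key      # no change: keep using the existing key
        if row is not None:
            row.valid_to = as_of          # close the old version; it stays as history
            row.is_current = False
        new_row = ProductRow(self._next_key, product_id, brand, as_of, None, True)
        self._next_key += 1
        self.rows.append(new_row)
        return new_row.surrogate_key


dim = ProductDimension()
k1 = dim.upsert("P100", "Acme", date(2012, 1, 1))    # first load -> key 1
k2 = dim.upsert("P100", "Acme", date(2012, 2, 1))    # unchanged -> still key 1
k3 = dim.upsert("P100", "Zenith", date(2012, 4, 1))  # brand changed -> new key 2
```

A Type 1 dimension would instead overwrite `brand` on the existing row, which is simpler but loses the Acme history.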

Junk Dimension
In a data warehouse, a dimension is a data element that categorizes each item in a data set into non-overlapping regions. A data warehouse dimension provides the means to "slice and dice" data in a data warehouse. Dimensions provide structured labeling information to otherwise unordered numeric measures. For example, "Customer", "Date", and "Product" are all dimensions that could be applied meaningfully to a sales receipt. A dimensional data element is similar to a categorical variable in statistics.

A star schema that has many dimensions is sometimes called a centipede schema[2]. Having dimensions of only a few attributes, while simpler to maintain, results in queries with many table joins and makes the star schema less easy to use.
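The paragraph above describes dimensions in general. A junk dimension is one common way to keep a star schema from turning into a centipede: miscellaneous low-cardinality flags and indicators are combined into a single dimension table, and each fact row carries one key for the whole combination. The flag names, values, and keys below are hypothetical.

```python
from itertools import product

# Hypothetical low-cardinality flags that would otherwise become tiny separate dimensions.
payment_types = ["cash", "credit"]
order_channels = ["in_store", "online"]
gift_wrap_flags = [False, True]

# Junk dimension: one row per combination of flags, keyed by a surrogate key.
junk_dimension = {
    key: {"payment_type": p, "channel": c, "gift_wrap": g}
    for key, (p, c, g) in enumerate(
        product(payment_types, order_channels, gift_wrap_flags), start=1
    )
}

# Reverse lookup used while loading facts (dict order matches the column order above).
junk_key_for = {tuple(row.values()): key for key, row in junk_dimension.items()}

# A fact row now carries one junk key instead of three separate foreign keys.
fact_row = {
    "item_key": 17,
    "store_key": 3,
    "junk_key": junk_key_for[("credit", "online", True)],
    "quantity": 2,
}
print(len(junk_dimension), fact_row["junk_key"])
```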

Types of Facts


There are three types of facts:


Additive: Additive facts are facts that can be summed up through all of the dimensions in the fact table.
Semi-Additive: Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but not the others.
Non-Additive: Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table.
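To make the three categories concrete, here is a small sketch with hypothetical numbers: deposits are additive, account balances are semi-additive (they sum across accounts but not across days), and a margin percentage is non-additive (it has to be recomputed from its additive parts).

```python
# Hypothetical daily account snapshot: (day, account, balance, deposits), in dollars.
snapshot = [
    ("2012-04-16", "A", 100.0, 20.0),
    ("2012-04-16", "B", 50.0, 10.0),
    ("2012-04-17", "A", 120.0, 20.0),
    ("2012-04-17", "B", 40.0, 0.0),
]

# Additive: deposits can be summed across every dimension (days and accounts).
total_deposits = sum(dep for _day, _acct, _bal, dep in snapshot)  # 50.0

# Semi-additive: balances sum across accounts for a single day...
daily_totals = {}
for day, _acct, bal, _dep in snapshot:
    daily_totals[day] = daily_totals.get(day, 0.0) + bal  # 150.0 and 160.0
# ...but adding the daily totals together (310.0) means nothing.
# Across time, report the period-end value or an average instead.
period_end_balance = daily_totals[max(daily_totals)]  # 160.0
average_balance = sum(daily_totals.values()) / len(daily_totals)  # 155.0

# Non-additive: a margin percentage must be recomputed from additive parts.
sales = [(200.0, 150.0), (100.0, 50.0)]  # (revenue, cost) per transaction
summed_row_margins = sum((rev - cost) / rev for rev, cost in sales)  # 0.75, meaningless
total_revenue = sum(rev for rev, _cost in sales)
total_cost = sum(cost for _rev, cost in sales)
overall_margin = (total_revenue - total_cost) / total_revenue  # 100 / 300, about 0.333
```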

Monday, April 16, 2012

Visualization


Learnings:
Using average data is dangerous.
In OLAP reporting we aggregate, and aggregates can give you a different picture. Drill-down capability is very critical.
Different types of graphics give different visual representations.
Explain behavior: spot patterns and understand why the numbers are the way they are.
Volume of data
Accessibility of data
Cleaning the data is very critical
Ability to drill down to extreme level
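The warning about averages can be illustrated with two hypothetical gyms whose daily check-ins have the same mean but very different shapes; only drilling down to the daily values reveals the difference. The numbers are invented for illustration.

```python
from statistics import mean, pstdev

# Hypothetical daily check-ins at two gyms over one week.
checkins = {
    "Gym A": [50, 52, 48, 51, 49, 50, 50],  # steady traffic
    "Gym B": [5, 5, 5, 5, 5, 5, 320],       # one huge event day
}

for gym, days in checkins.items():
    # Same mean, very different spread: the average alone hides the event day.
    print(gym, "mean:", mean(days), "spread:", round(pstdev(days), 1), "max:", max(days))
```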


Infographics 

Wednesday, April 11, 2012

Balanced Scorecard / Strategy Maps / Usage

The Balanced Scorecard (BSC) is a strategic performance management tool - a semi-standard structured report, supported by proven design methods and automation tools, that can be used by managers to keep track of the execution of activities by the staff within their control and to monitor the consequences arising from these actions. Kaplan and Norton came up with this idea.

Evaluation should not be based on financial performance alone; otherwise you might not be able to sustain performance.
Apart from the financial perspective, there are other perspectives:

  • Customer
  • Learning and Growth
  • Internal Process -> Investing in our learning and growth. Positioning and Training etc
Can be applied at any level of the company !
Example: Zappos. Managing and maintaining performance. How do we do it? How do we apply it?



Strategy Maps - How are you going to achieve this ? What are the series of steps and actions to achieve the level of performance.

Airline Industry Models :
Hub and Spoke Model - American, Delta, United, Continental
Point to Point Model - Southwest

Airline Industry Pressure Points :
  1. Competition
  2. Fuel prices
  3. Federal regulation
  4. Security

Theme of the strategy map for Southwest: low-cost efficiency!
Two key metrics in the airline industry:
CASM (cost per available seat mile)
RASM (revenue per available seat mile)

Margin per available seat mile = RASM - CASM ... maximize the difference


CASM is a commonly used measure of unit cost in the airline industry. CASM is expressed in cents to operate each seat mile offered, and is determined by dividing operating costs by available seat miles (ASMs). This number is frequently used to compare costs between different airlines, or for the same airline across different time periods (say, one year vs. the preceding year). A lower CASM means that it is easier for the airline to make a profit, as it has to charge less to break even.
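Using the definitions above, CASM and RASM are both dollar amounts divided by available seat miles, where one ASM is one seat flown one mile. The flights and dollar figures below are hypothetical, not real airline data; the sketch only shows the arithmetic and that the per-ASM margin is RASM minus CASM.

```python
# Hypothetical flights for one period: (seats offered, miles flown).
flights = [(150, 1000), (140, 500)]

# Available seat miles: seats offered times miles flown, summed over flights.
asms = sum(seats * miles for seats, miles in flights)  # 150,000 + 70,000 = 220,000

operating_revenue = 26_400.0  # dollars (hypothetical)
operating_costs = 24_200.0    # dollars (hypothetical)

def cents_per_asm(amount_dollars: float, available_seat_miles: int) -> float:
    """Express a dollar amount in cents per available seat mile."""
    return amount_dollars * 100 / available_seat_miles

rasm = cents_per_asm(operating_revenue, asms)  # 12.0 cents
casm = cents_per_asm(operating_costs, asms)    # 11.0 cents
margin_per_asm = rasm - casm                   # 1.0 cent of operating profit per ASM
```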