вторник, 27 октября 2015 г.

Mobile Dashboard in near real time

I chose Klipfolio as a perfect visualization tool for operation reporting. 

There is overview of working architecture:

Using Amazon Redshift Instance I collect  data from adjust. Adjust sends data to ourt local database via callbacks. Than using pentaho data integration as ETL tool I take data and load it to the redshift. Job in pentaho runs every 10 minutes and incrementally update last 7 days. Considering mobile applications we should considering long period, because some phones can send data to mobile tracker with long delayed, that's why I prefer to take 7 days and update them with each load.

Data Flow

Using Pentaho data integration I create 2 transformations and put them in one job.

Run Pentaho DI

Pentaho home path: /home/dasboard/Downloads/data-integration
in order to start pentaha use terminal and run spoon.sh

Transformation 1: stg_adjust

This transformation extract data from mob.adjust for last 7 days and send it to s3://s3/webanalys/adjust_data in gzip format.

Transformation 2: sql_adjust

This transformation run sql queries:
--copy data from file to stg.stg_temp_adjsut

copy stg.stg_temp_adjust from 's3://webanalys/adjust_data'
delimiter '|'
timeformat 'YYYY-MM-DD HH24:MI:SS'
--delete data from stg.stg_adjust in order to write new fresh data
DELETE FROM stg.stg_adjust where stamp>= (SELECT MIN(stamp) FROM stg.stg_temp_adjust);
INSERT INTO stg.stg_adjust (SELECT * FROM stg.stg_temp_adjust);
--optimaze stg_adjsut table
analyze stg.stg_adjust;
vacuum stg.stg_adjust;

--delete from temp table all data
DELETE from stg.stg_temp_adjust;
vacuum stg.stg_temp_adjust;

Job: jb_adjsuttoAWS

Kitchen job

Pentaha allow as to schedule jobs via kitchen. Using crontab I run my job. Moreover, kitchen can perform only shell script. As a result I put my job in shell script:

/home/dasboard/Downloads/data-integration/kitchen.sh -file=/home/dasboard/Downloads/pentaho_di_files/jb_adjusttoAWS.kjb -level=Minimal

When we successful load data to redshift, we can build new klip using database connection in klipfolio.
Just choose postgres database and fill credentials. Unfortunately, klipfolio has one disadvantage, you should create separate query for almost every klip. 
My job runs every 10 minutes. As a result we get fresh data and can quickly check health of mobile applications and their performance.

Klips: Installs, Average Session duration in sec, registrations, daily active unique users, weekly active unique users, monthly unique users and installs by channels: