Reporting notes 01

Working for over 10 years in IT technology, I realized that lots of experiences were lost with my memory when time passed by. So I wrote this note to keep my memory. If anyone is also interested in this matter, please contact me.

Problem: business requires an report in excel format to be automated in web application.

Final web format:

Original data format:

Key points:

  • Data must be summarized by date, week, month, year, and job name dimensions.
  • Fact table contains dozen millions of records for several years;
  • The reporting requires to rotate the data by 90 degrees to display summarized data
  • Multiple data presentations in a single column, witch includes integer number, decimal, and percentage.

Solution:

  1. Get data from data source
    OpenQuery
    for linked server does not accept variables, therefore we have to use dynamic
    query statement.

    set @openquery = '
    SELECT  [Date],
            [Day],
            JobName,
            Volume_to_Work,
            Volume_Worked,
            Volume_RPC
    FROM OPENQUERY([S1],''
       SELECT cast(a.Date as Date) [Date],
            datename(dw,date) as [Day],
            a.JobName,
            a.Volume_to_Work,
            a.Volume_Worked,
            isnull(b.Volume_RPC,0)
    as Volume_RPC
       FROM
       (
            –VOLUME
            select BACKUPDATE
    date,JobName,sum(VOLUME_SENT) as Volume_to_Work,sum(VOLUME_WORKED) as Volume_Worked
            from TBL_METRICS_RAPPORTVOLUME
            where JobName in ('''''+
            replace(
            case @JOB_NAME_LIST when 'All|'
    then 'Ucm_Arics_ON_Preview|Ucm_Caprice_QC_Preview|UCM_OB_E_PREVIEW|UCM_OB_F_PREVIEW'
            else @JOB_NAME_LIST end
            ,'|',''''',''''')
            +'''''
            )
            and year(BACKUPDATE)
    = '+cast(@REPORT_YEAR as varchar)+'
            group by JobName,BACKUPDATE
       ) a
       left join
       (
            –RPC
            select Backupdate,session,count(session) as Volume_RPC
            from TBL_ACCOUNTS_WORKED a
            left join ( select
    RPC,Result from TBL_REFERENCE_DIALERCODES where site = ''''Internal'''' ) b on a.RELEASECODE=
    b.RESULT
            where session in ('''''+
            replace( case @JOB_NAME_LIST when 'All|' then 'Ucm_Arics_ON_Preview|Ucm_Caprice_QC_Preview|UCM_OB_E_PREVIEW|UCM_OB_F_PREVIEW'
            else @JOB_NAME_LIST end
            ,'|',''''',''''')
            +'''''
            )
            and
    year(Backupdate) = '+cast(@REPORT_YEAR as varchar)+'
            and b.RPC = 1
            group by
    Backupdate,session
       ) b on a.Date = b.Backupdate
    and a.JobName = b.session
       where datename(dw,date)
    <> ''''Sunday''''''
    )'
      
    create table #campaign ([Date] date,[Day] varchar(20),JobName varchar(255),Volume_to_work int,Volume_worked int,Volume_RPC int)
    insert
    #campaign exec(@openquery)
    generated openquery look like:
    SELECT
    [Date],
            [Day],
            JobName,
            Volume_to_Work,
            Volume_Worked,
            Volume_RPC
    FROM OPENQUERY([BREPP1],'
            SELECT cast(a.Date as Date) [Date],
                   datename(dw,date) as [Day],
                   a.JobName,
                   a.Volume_to_Work,
                   a.Volume_Worked,
                   isnull(b.Volume_RPC,0)
    as Volume_RPC
            FROM
            (
                    –VOLUME
                    select
    BACKUPDATE date,JobName,sum(VOLUME_SENT) as Volume_to_Work,sum(VOLUME_WORKED)
    as Volume_Worked
                    from [Source_Dialer].[dbo].[TBL_METRICS_RAPPORTVOLUME]
                    where JobName in (''All'',''Ucm_Arics_ON_Preview'',''Ucm_Caprice_QC_Preview'',''UCM_OB_E_PREVIEW'',''UCM_OB_F_PREVIEW''
                    )
                    and year(BACKUPDATE) = 2017
                    group by JobName,BACKUPDATE
            ) a
            left join
            (
                    –RPC
                    select Backupdate,session,count(session) as Volume_RPC
                    from Source_Dialer.dbo.TBL_METRICS_ACCOUNTS_WORKED a
                    left join ( select RPC,Result from
    Source_Dialer.dbo.TBL_METRICS_REFERENCE_DIALERCODES where site = ''Internal'')
    b on cast(a.RELEASECODE as varchar)= b.RESULT
                    where session in (''All'',''Ucm_Arics_ON_Preview'',''Ucm_Caprice_QC_Preview'',''UCM_OB_E_PREVIEW'',''UCM_OB_F_PREVIEW''
                    )
                    and year(Backupdate) = 2017
                    and b.RPC = 1
                    group by Backupdate,session
            ) b on a.Date = b.Backupdate and a.JobName = b.session
            where datename(dw,date) <> ''Sunday'''
    )
  2. Get summarized data
    select
       row_number() over(order by d.[JobName],d.date) as Id
       ,d.[JobName]
       ,year(d.[Date]) [Year]
       ,right(cast(d.DateISO as varchar),6) + cast(d.DayOfWeek as varchar) [Weekday]
       ,d.[Month]
       ,datepart(wk,d.[Date]) Week
       ,d.[Date]
       ,sum(Volume_to_work) Inventory
       ,sum(Volume_worked) Worked
       ,sum(Volume_RPC) RPC
    into
    #view_campaign
    from
    #campaign c
    right join (
       select [JobName],cd.[Date],cd.DateISO,cd.DayOfWeek,cd.month
       from (select distinct [JobName] from
    #campaign) c
       full outer join dbo.tbl_dim_date cd on 1=1
       where cd.year=@PRM_REPORT_YEAR
    ) d on d.[JobName]=c.[JobName] and c.[Date]=d.[Date]
    group by
         d.[JobName]
         ,year(d.[Date])
         ,d.[Month]
         ,datepart(Wk,d.[Date])
         ,d.[Date]
         ,right(cast(d.DateISO as varchar),6) + cast(d.DayOfWeek as varchar)

  3. Summarized by week, month, year for each jobname, use rollup to get weekly, monthly total columns in final report 

  4. Remove duplicated week that cross months
    
    –Sum the value of the week that cross months (Line38 + Line 44)
    update #tmp set Inventory=t.Inventory, Worked=t.Worked, rpc=t.rpc
    from #tmp v join
    (
       select JobName,Agg
             ,sum([Inventory])[Inventory]
             ,sum([Worked])[Worked]
             ,sum([RPC])[RPC]
       from #tmp
       where Agg like 'w%'
       group by Agg,JobName
    ) t on t.Agg=v.[Agg]
    where v.agg=t.agg and v.JobName=t.JobName

    Delete (Line38) the value of the first week of the duplicated week
    delete from #tmp where agg+cast([Month] as varchar) in
    (select agg+cast(min([Month]) as varchar)
     from #tmp
     where agg like 'w%'
     group by JobName,agg
     having count(agg)>1
    )

    )

  5. Add calculated columns: CompleteRate, ReachRate
    They are the result of division, so dot not use avg to get wrong result

    ALTER TABLE #tmp add
       CompleteRate decimal(14,4),
       ReachRate decimal(14,4)
    update #tmp
       set CompleteRate = case [Inventory] when 0 then 0 else [Worked]*1.0/[Inventory] end,
       ReachRate = case [Inventory] when 0 then 0 else [RPC]*1.0/[Inventory] end
    select * from #tmp

  6. Generate dynamic columns definition and header columns
    –@colDesc is for table header, # followed by datatype is for display formatting
    select @col += ',' + quotename(agg)+' varchar(15)'
    from #tmp
    where 
    	[year]=@REPORT_YEAR and
    	JobName='All'
    	
    select
    @colDesc += ',[' + dbo.[ufn_GetCampaignHeader](agg)+'#string] varchar(15)'
    from #tmp
    where
        [year]=@REPORT_YEAR and
        JobName='All'
    
    select @col = stuff(@col,1,1,''), @colDesc = stuff(@colDesc,1,1,'')
    

    The value of the variable

    
    @col =
    [YTD2017]    varchar (15),
    [Month1]     varchar (15),
    [2017-01-01] varchar (15),
    [2017-01-02] varchar (15),
    [2017-01-03] varchar (15),
    [2017-01-04] varchar (15),
    [2017-01-05] varchar (15),
    [2017-01-06] varchar (15),
    [2017-01-07] varchar (15),
    [Wk1]        varchar (15),
    [2017-01-08] varchar (15),
    [2017-01-09] varchar (15),
    ……
    [2017-12-28] varchar (15),
    [2017-12-29] varchar (15),
    [2017-12-30] varchar (15),
    [Wk52]       varchar (15),
    [2017-12-31] varchar (15),
    [Wk53]       varchar (15)
    
    

    The value of the variable

    
    @colDesc =
    [YTD2017#string]   varchar(15),
    [Jan#string]       varchar(15),
    [Sun170101#string] varchar(15),
    [Mon170102#string] varchar(15),
    [Tue170103#string] varchar(15),
    [Wed170104#string] varchar(15),
    [Thu170105#string] varchar(15),
    [Fri170106#string] varchar(15),
    ……
    [Fri171229#string] varchar(15),
    [Sat171230#string] varchar(15),
    [Wk52#string]      varchar(15),
    [Sun171231#string] varchar(15),
    [Wk53#string]      varchar(15)
    
  7. Create temp summary table to store pivoted final data
    
    --table tmp_summary, add spid value in the global temp table name to isolate concurrent users
    For using dynamic sql query we have to use global temp table, or we are unable to access the temp table.
    Seq (sequence) column is for displaying data in order for each job name
    
       declare @tmptable varchar(1000)
       select @tmptable = '##tmp_summary_' + cast(@@SPID as varchar)
       exec('if object_id(''tempdb..' + @tmptable + ''') is not null drop table ' + @tmptable)
       exec('create table ' + @tmptable + ' ([!Year] int,[JobName#string] varchar(max),[!Seq] int,[Campaign#string] varchar(max),'+ @colDesc + ')')
    

    Result:

    create table tmp_summary_51 ([Year] int,JobId int,Seq int,JobName varchar(max),[Year2017 Total] varchar(20),[Month1 Total] varchar(20),[2017-01-01] varchar(20),[2017-01-02] varchar(20),[2017-01-03] varchar(20),[2017-01-04] varchar(20),[2017-01-05] varchar(20),[2017-01-06] varchar(20),[2017-01-07] varchar(20),[Week1 Total] varchar(20),[2017-01-08] varchar(20),[2017-01-09]
    ......
    varchar(20),[2017-06-23] varchar(20),[2017-06-24] varchar(20),[Week25 Total] varchar(20),[2017-06-25] varchar(20),[2017-06-26] varchar(20),[2017-06-27] varchar(20),[2017-06-28] varchar(20),[2017-06-29] varchar(20),[2017-06-30] varchar(20),[Week26 Total] varchar(20))
    
  8. Turning table in 90 degree with pivot
    
    declare @job varchar(255)
    
    declare csr cursor FAST_FORWARD for select Name from @tbl_job
    
    open csr
    fetch next from csr into @job
    while @@fetch_status = 0
    begin
       -- get column list
       if @columns is null
            select
       @columns = replace (@col,' varchar (15)','')
      
       -- rotate
       -- inventory
       set @sql ='insert into ' + @tmptable + '
       select Year, JobName, 1 seq, Campaign,' + @columns +'
       from (
            select JobName,JobName+'' Inventory'' Campaign,[year],[agg],Inventory
            from #tmp
            where JobName = ''' + @job + '''
            ) x
       pivot (
            sum(Inventory) for [agg] in ('+@columns+')
       ) as pvt
       -- where [year] is not null
       order by [year], JobName, seq'
    
       exec(@sql)   
            
      
       -- worked
       set @sql ='insert into ' + @tmptable + '
       select Year, JobName, 2 seq, Campaign,' + @columns +'
       from (
            select JobName, JobName + '' Worked'' Campaign, [year], [agg], Worked
            from #tmp
            where JobName = ''' + @job + '''
            ) x
       pivot(
            sum(worked) for [agg] in ('+@columns+')
       ) as pvt'
       exec(@sql)
      
       -- % of completion
       set @sql ='insert into ' + @tmptable + '
       select Year, JobName, 3 seq, Campaign,' + @columns +'
       from (
            select JobName,''% of Completion'' Campaign, [year], [agg], CompleteRate
            from #tmp
            where JobName = ''' + @job + '''
            ) x
       pivot (
            avg(CompleteRate) for [agg] in ('+@columns+')
       ) as pvt'
       exec(@sql)
       --RPC
       set @sql ='insert into ' + @tmptable + '
       select Year, JobName, 4 seq, Campaign,' + @columns +'
       from (
            select JobName,''RPC'' Campaign, [year], [agg], RPC
            from #tmp
            where JobName = ''' + @job + '''
            ) x
       pivot (
            sum(RPC) for [agg] in ('+@columns+')
       ) as pvt'
       exec(@sql)
      
       --ReachRate
       set @sql ='insert into ' + @tmptable + '
       select Year,JobName,5 seq, Campaign,' + @columns +'
       from (
            select JobName,''Reach Rate'' Campaign, [year], [agg], ReachRate
            from #tmp
            where JobName = ''' + @job + '''
            ) x
       pivot (
            avg(ReachRate) for [agg] in ('+@columns+')
       ) as pvt'
       exec(@sql)
      
    fetch next from csr into @job
    end
    close csr
    deallocate csr
    

    pivoted data look like this:

    dynamic pivot statement:

  9. Rendered on web page
  10. Report filter:

  11. 5 temp tables are used
    #campaign      (Date, Day, JobName, Volume_to_work, Volume_worked, Volume_RPC)     -- stores original data
    #view_campaign (JobName, Year, Weekday, Month, Week, Date, Inventory, Worked, RPC) -- summarized data
    #tmp           (JobName, Year, Month, Week, Agg, Inventory, Worked, RPC)           -- target aggregation data
    ##tmp_summary_spid(Year, JobName, Seq, Campaign, YTD, Jan, Sun, Mon, Tue, … )      -- final output data
  12. Further action:
    Stores pivoted data in a table with ETL process to improve the performance.

Note:

  • the data used in this note is test date, not real.

Original post: http://itwebsys.blogspot.com/2017/07/reporting-notes-01.html

 514 total views

Author: Albert

Leave a Reply