Is there an easy way to create and populate an age dimension with age groups?
Solution
Creating an age dimension is usually done once and probably not in SSIS, but with a TSQL script.
For each new assignment I use a script similar to this and adjust it to the requirements for that particular assignment.
-- Drop dimension table if exists
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[dim_age]') AND TYPE IN (N'U'))
BEGIN
DROP TABLE [dbo].[dim_age]
END
-- Create table dim_age
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[dim_age](
[dim_age_id] [int] IDENTITY(-1,1) NOT NULL,
[Age] [smallint] NULL,
[AgeGroup1] [nvarchar](50) NULL,
[AgeGroup1Sort] [int] NULL,
[AgeGroup2] [nvarchar](50) NULL,
[AgeGroup2Sort] [int] NULL,
CONSTRAINT [PK_dim_age] PRIMARY KEY CLUSTERED
(
[dim_age_id] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
-- Enter unknown dimension value (in case a person's date of birth is unknown)
INSERT INTO [dbo].[dim_age]
([Age]
,[AgeGroup1]
,[AgeGroup1Sort]
,[AgeGroup2]
,[AgeGroup2Sort])
VALUES
(-1
,'Unknown'
,0
,'Unknown'
,0)
GO
-- Enter all ages
declare @age smallint;
set @age = 0;
-- Loop through ages 0 to 130
WHILE @age < 131
BEGIN
INSERT INTO [dbo].[dim_age]
([Age]
,[AgeGroup1]
,[AgeGroup1Sort]
,[AgeGroup2]
,[AgeGroup2Sort])
VALUES
(@age
-- Use the common age groups/categories of your region/branch/industry
-- This is just an example
, CASE
WHEN @age < 15 THEN '0 till 15 year'
WHEN @age < 25 THEN '15 till 25 year'
WHEN @age < 35 THEN '25 till 35 year'
WHEN @age < 45 THEN '35 till 45 year'
WHEN @age < 55 THEN '45 till 55 year'
WHEN @age < 65 THEN '55 till 65 year'
ELSE '65 year and older'
END
-- Add value to sort on in SSAS
, CASE
WHEN @age < 15 THEN 1
WHEN @age < 25 THEN 2
WHEN @age < 35 THEN 3
WHEN @age < 45 THEN 4
WHEN @age < 55 THEN 5
WHEN @age < 65 THEN 6
ELSE 7
END
, CASE
WHEN @age < 19 THEN 'Juvenile'
ELSE 'Mature'
END
-- Add value to sort on in SSAS
, CASE
WHEN @age < 19 THEN 1
ELSE 2
END
)
-- Goto next age
set @age = @age + 1
END
The result: filled age dimension |
How could you use this dimension?
A while ago I also posted an example to create and populate a date dimension. So now you can combine those in a datamart. I have an employee table and an absence table with a start- and enddate.
Employee table |
Absence table |
I will use the date dimension to split the absence time periods in separate days and then calculate the employee's age of each day of absence. This will go in to a fact table and then I can use the age dimension to see absence per age group.
-- Split absence time periode in separate days, but go back 2 years max and 1 year forward if end date is unknown
SELECT Absence.AbsenceId
, Absence.EmployeeNumber
-- Date of absence
, dim_date.Date as AbsenceDate
, Absence.ReasonCode
-- Calculation of age at time of absence
, DATEDIFF(YEAR, Employee.DateOfBirth, dim_date.Date)
-
(CASE
WHEN DATEADD(YY, DATEDIFF(YEAR, Employee.DateOfBirth, dim_date.Date), Employee.DateOfBirth)
> dim_date.Date THEN 1
ELSE 0
END) as Age
FROM EmployeeApplication.dbo.Absence
INNER JOIN EmployeeApplication.dbo.Employee
on Absence.EmployeeNumber = Employee.EmployeeNumber
INNER JOIN DM_Staff.dbo.dim_date
on dim_date.Date
-- change start date to lower bound if it's below it
BETWEEN CASE WHEN YEAR(Absence.AbsenceStartDate) >= YEAR(GETDATE()) - 2 THEN Absence.AbsenceStartDate
ELSE DATEADD(yy, DATEDIFF(yy, 0, getdate()) - 2, 0) END
-- change end date to upper bound if it's null
AND ISNULL(Absence.AbsenceEndDate, DATEADD(yy, DATEDIFF(yy, 0, getdate()) + 2, -1))
-- Filter absence record with an enddate below the lower bound (perhaps a bit superfluous with the inner join)
WHERE YEAR(ISNULL(Absence.AbsenceEndDate, GETDATE())) >= YEAR(GETDATE()) - 2
Result of query that can be used in a fact package |
fact absence |
Note: this is a simplified situation to keep things easy to explain.