Channel: Informatica Training & Tutorials

Design approach to Update Huge Tables Using Oracle MERGE

Design approach to Update Huge Tables in an Informatica PowerCenter workflow
One of the issues we come across during ETL design is "Update Large Tables". This is a very common ETL scenario, especially when you deal with large volumes of data, like loading an SCD Type 2 dimension. We discussed a design approach for this scenario in one of our prior articles. In this updated article let's discuss a different approach to update large tables using an Informatica mapping.

High level Design Approach.

  1. Use Database JOIN to identify the records to be updated.
  2. Insert the records identified for UPDATE into a TEMP table.
  3. Use post session SQL to update the target table.
update-large-table-design
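This design assumes a TEMP table with the same structure as the target; a minimal sketch of how it could be created, using the table names from this example, is given below.

--Minimal sketch: create the TEMP table as an empty copy of the target.
--T_DIM_CUST_TEMP holds only the records identified for UPDATE in each load.
CREATE TABLE T_DIM_CUST_TEMP AS
SELECT *
  FROM T_DIM_CUST
 WHERE 1 = 0;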

Design Assumptions.

  1. Source and target tables are relational tables.
  2. Both source and target tables are in the same database.
  3. Tables are accessible using a single database user.

Informatica Implementation.

For demonstration purposes, let's consider the Customer Dimension table T_DIM_CUST, which has 100 M records. In each load we expect to update 100 K records in the dimension table.

Let's start building the mapping. As the first step, let's OUTER JOIN the source table CUST_STAGE and the target table T_DIM_CUST. Use the SQL below as the SQL override in the Source Qualifier.

SELECT
--Columns From Source Tables
CUST_STAGE.CUST_ID,
CUST_STAGE.CUST_NAME,
CUST_STAGE.ADDRESS1,
CUST_STAGE.ADDRESS2,
CUST_STAGE.CITY,
CUST_STAGE.STATE,
CUST_STAGE.ZIP,
--Columns from Target Tables.
--If any column from T_DIM_CUST has NULL value, record to be set as INSERT else UPDATE
T_DIM_CUST.CUST_ID,
T_DIM_CUST.AS_OF_START_DT,
T_DIM_CUST.AS_OF_END_DT,
T_DIM_CUST.CUST_NAME,
T_DIM_CUST.ADDRESS1,
T_DIM_CUST.ADDRESS2,
T_DIM_CUST.CITY,
T_DIM_CUST.STATE,
T_DIM_CUST.ZIP
FROM CUST_STAGE
--Outer Join is Used
LEFT OUTER JOIN T_DIM_CUST
ON CUST_STAGE.CUST_ID = T_DIM_CUST.CUST_ID
AND T_DIM_CUST.AS_OF_END_DT = TO_DATE('12-31-4000','MM-DD-YYYY')


Now, using a Router Transformation, route the records to the INSERT/UPDATE paths. Records identified as INSERT will be mapped to T_DIM_CUST, and records identified as UPDATE will be mapped to T_DIM_CUST_TEMP.

Use T_DIM_CUST_CUST_ID, the column coming from the target table, to identify the records to be inserted/updated. If it is NULL, the record will be set for insert; else the record will be set for update. Below are the Router group filter conditions, and you can see how the mapping looks in the below image (the mapping image does not show any transformation logic).
  • INSERT : IIF(ISNULL( T_DIM_CUST_CUST_ID ), TRUE, FALSE)
  • UPDATE : IIF(NOT ISNULL( T_DIM_CUST_CUST_ID ), TRUE, FALSE)

Now that the mapping development is complete, add the below SQL as the post-session SQL statement during session configuration, as shown below. This MERGE INTO SQL will update the records in the T_DIM_CUST table with the values from T_DIM_CUST_TEMP.
MERGE INTO T_DIM_CUST
USING T_DIM_CUST_TEMP
ON T_DIM_CUST.CUST_ID = T_DIM_CUST_TEMP.CUST_ID
WHEN MATCHED THEN
  UPDATE
     SET T_DIM_CUST.AS_OF_END_DT = T_DIM_CUST_TEMP.AS_OF_END_DT,
         T_DIM_CUST.UPDATE_DT    = T_DIM_CUST_TEMP.UPDATE_DT,
         T_DIM_CUST.CUST_NAME    = T_DIM_CUST_TEMP.CUST_NAME,
         T_DIM_CUST.ADDRESS1     = T_DIM_CUST_TEMP.ADDRESS1,
         T_DIM_CUST.ADDRESS2     = T_DIM_CUST_TEMP.ADDRESS2,
         T_DIM_CUST.CITY         = T_DIM_CUST_TEMP.CITY,
         T_DIM_CUST.STATE        = T_DIM_CUST_TEMP.STATE,
         T_DIM_CUST.ZIP          = T_DIM_CUST_TEMP.ZIP
   WHERE T_DIM_CUST.AS_OF_END_DT = TO_DATE('12-31-4000', 'MM-DD-YYYY')


That is all we need... Hope you enjoyed this design technique. Please let us know if you have any difficulties in implementing this technique.

Informatica PowerCenter Constraint Based Loading

Informatica PowerCenter Constraint Based Loading
The constraint based loading technique has been available in Informatica PowerCenter for the last couple of versions. This PowerCenter feature lets you load, in a single session, multiple tables that have a database-level primary key - foreign key constraint or parent - child relationship. In this article let's see what is needed to set up a session for constraint based loading.

What is Constraint Based Loading

In the Workflow Manager, you can specify constraint-based loading for a session. When you select this option, the Integration Service orders the target load on a row-by-row basis. For every row, the Integration Service loads the row first to the primary key table, then to any foreign key tables.

What is Needed to Setup Constraint Based Loading

There are a couple of rules for setting up constraint-based loading in a particular session; let's see those in detail.
      1. Key Relationships.
      2. Active Source.
      3. Target Connection Groups.
      4. Treat Rows as Insert.

1. Key Relationships.

When target tables have no key relationships, the Integration Service does not perform constraint-based loading. Similarly, when target tables have circular key relationships, the Integration Service reverts to a normal load.

When the target definition is imported into the Target Designer, PowerCenter will detect the key relationship defined on the database. You can manually create the relationship within the Target Designer as well.
Informatica PowerCenter Target Designer
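For context, the sketch below (hypothetical table and column names) shows the kind of database-level key relationship PowerCenter detects on import.

--Hypothetical parent - child tables: DEPT is the primary key table,
--EMP is the foreign key table whose rows must be loaded after their parent rows.
CREATE TABLE DEPT (
    DEPT_ID   NUMBER        PRIMARY KEY,
    DEPT_NAME VARCHAR2(50)
);

CREATE TABLE EMP (
    EMP_ID    NUMBER        PRIMARY KEY,
    EMP_NAME  VARCHAR2(50),
    DEPT_ID   NUMBER        REFERENCES DEPT (DEPT_ID)
);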

2. Active Source.

Both the parent and child table need to be mapped from a single active source such as Source Qualifier, Aggregator, Normalizer etc. When target tables receive rows from different active sources, the Integration Service reverts to normal loading for those tables.

In the below mapping, both targets get rows from the active transformation Normalizer.
Informatica PowerCenter Mapping

3. Target Connection Groups.

The Integration Service enforces constraint-based loading for targets in the same target connection group. If the tables with the primary key-foreign key relationship are in different target connection groups, the Integration Service cannot enforce constraint-based loading when you run the workflow.

To verify that all targets are in the same target connection group, complete the following tasks.
  • Verify that all targets are in the same target load order group and receive data from the same active source.
  • Use the default partition properties and do not add partitions or partition points.
  • Define the same target type for all targets in the session properties.
  • Define the same database connection name for all targets in the session properties.
  • Choose normal mode for the target load type for all targets in the session properties.
    The below image shows the targets are in the same load order group. You can verify the load order from the Mapping Designer.
    Target load order
    Other required properties for constraint based loading are highlighted in the below image. We can set these properties at the session level from the Workflow Manager.
    Informatica PowerCenter Session Properties

    4. Treat Rows as Insert.

    Use constraint-based loading when the session option Treat Source Rows As is set to Insert. You might get inconsistent data if you select a different Treat Source Rows As option and you configure the session for constraint-based loading.

    Set this property at the session level as shown in the below image.
    Informatica PowerCenter session properties

    How to Setup Constraint Based Loading

    Now that we know the requirements for setting up a session for constraint based target loading, let's see how we configure the session.

    To enable constraint-based loading, you need to set the "Constraint based load ordering" property as shown in the below image.
    Informatica PowerCenter Constraint Based Loading
    That is all we need to setup a session for constraint based target loading.

    Hope you enjoyed this article and will be helpful in your live projects. Please feel free to leave a comment or question below, we are more than happy to help.

    Informatica PowerCenter Repository Contents Upgrade

    Informatica PowerCenter Repository Upgrade
    After the existing Informatica PowerCenter server binaries are upgraded to a higher version, we will have to upgrade the existing repository contents before we can enable the repository service and access the repository objects such as mappings, sessions and workflows from the client tools. This article illustrates the step by step instructions for upgrading the Informatica PowerCenter repository contents.

    Repository Contents Upgrade

    Step : 1
    Log on to the Administrator Console using your Admin user ID and password as shown in the below image.
    Informatica PowerCenter Repository Upgrade
    Step : 2

    From the Domain Navigator Click Actions -> New -> PowerCenter Repository Service as shown below.
    Informatica PowerCenter Repository Upgrade
    Step : 3

    Provide the Repository details. 
    • Repository Name : The repository name should match with the prior version repository.
    • Description : An optional description about the repository.
    • Location : Choose the Domain you have already created. If you have only one Domain, this value will be pre-populated.
    • License : Choose the license key from the drop down list.
    • Node : Choose the node name from the drop down list.
    Informatica PowerCenter Repository Upgrade
    Step : 4

    A new screen will appear; provide the repository database details. The repository database details should match the prior version repository database.
    • Database Type : Choose your Repository database (Oracle/SQL Server/Sybase)
    • Username : Database user ID to connect database.
    • Password : Database user Password.
    • Connection String : Database Connection String.
    • Code Page : Database Code Page
    • Table Space : Database Table Space Name
    • Choose "Content exists under specified connection string. Do not create new content."
    Click Finish.
    Informatica PowerCenter Repository Upgrade
    Step : 5

    Now you will see the added repository in the Domain Navigator.
    Choose the repository you just added from the Domain Navigator. Click on 'Actions' at the top right, then Actions -> Repository Contents -> Upgrade.
    Informatica PowerCenter Repository Upgrade
    Step : 6

    A new window pops up. Provide the Administrator user name and password.
    Informatica PowerCenter Repository Upgrade
    Click OK to complete the upgrade. This process might take some time depending on the size of your repository.

    We are ready to enable the repository service and access the contents from the PowerCenter Client tools.

    Optimize Upgrade Performance

    Before the repository contents are upgraded, we should optimize them. We can optimize the repository contents by completing the below activities.

    Purge unnecessary versions from the repository : If your repository is enabled for version control, the repository can quickly grow. If the repository is very large, purge versions that you do not need.

    Truncate the workflow and session log file : Use the Repository Manager or the pmrep TruncateLog command to truncate the workflow and session log file and delete run-time information that is not required.

    Update statistics : PowerCenter identifies and updates the statistics of all repository tables and indexes when you upgrade a repository. To increase upgrade performance, you can update the statistics before you upgrade the repository.

    SCD Type 1 Implementation using Informatica PowerCenter

    SCD Type 1 Implementation using Informatica PowerCenter
    Unlike SCD Type 2, Slowly Changing Dimension Type 1 does not preserve any history versions of data. This methodology overwrites old data with new data, and therefore stores only the most current information. In this article let's discuss the step by step implementation of SCD Type 1 using Informatica PowerCenter.

    The number of records we store in SCD Type 1 does not increase exponentially, as this methodology overwrites old data with new data. Hence we may not need the performance improvement techniques used in the SCD Type 2 tutorial.

    Understand the Staging and Dimension Table.

    For our demonstration purpose, let's consider the CUSTOMER dimension. Below is the detailed structure of both the staging and dimension tables.

    Staging Table

    In our staging table, we have all the columns required for the dimension table attributes. So no tables other than the dimension table will be involved in the mapping. Below is the structure of our staging table.
    Informatica Source Definition

    Key Points

      1. The staging table will have only one day's data. Change Data Capture is not in scope.
      2. Data is uniquely identified using CUST_ID.
      3. All attributes required by the dimension table are available in the staging table.

    Dimension Table

    Here is the structure of our Dimension table.
    SCD Type 1 Implementation using Informatica

    Key Points

      1. CUST_KEY is the surrogate key.
      2. CUST_ID is the Natural key, hence the unique record identifier.

    Mapping Building and Configuration

    Step 1
    Let's start the mapping building process. For that, pull the CUST_STAGE source definition into the Mapping Designer.
    Slowly Changing Dimension Type 1
    Step 2
    Now, using a LookUp Transformation, fetch the existing customer columns from the dimension table T_DIM_CUST. This lookup will give NULL values if the customer does not already exist in the dimension table.
    • LookUp Condition : IN_CUST_ID = CUST_ID
    • Return Columns : CUST_KEY
    Slowly Changing Dimension Type 1
    Step 3
    Use an Expression Transformation to identify the records for Insert and Update using below expression. 
      • INS_UPD :- IIF(ISNULL(CUST_KEY),'INS', 'UPD')    
    Additionally create two output ports.
      • CREATE_DT :- SYSDATE
      • UPDATE_DT :- SYSDATE
    See the structure of the mapping in the below image.
    Slowly Changing Dimension Type 1
    Step 4
    Map the columns from the Expression Transformation to a Router Transformation and create two groups (INSERT, UPDATE) in the Router Transformation using the below expressions. The mapping will look like the one shown in the image.
      • INSERT :- IIF(INS_UPD='INS',TRUE,FALSE)
      • UPDATE :- IIF(INS_UPD='UPD',TRUE,FALSE)
    Slowly Changing Dimension Type 1

    INSERT Group

    Step 5
    Every record coming through the 'INSERT Group' will be inserted into the dimension table T_DIM_CUST.

    Use a Sequence Generator transformation to generate the surrogate key CUST_KEY, and map the columns from the Router Transformation to the target, as shown in the below image.
    Slowly Changing Dimension Type 1

    Note : An Update Strategy is not required if the records are set for Insert.

    UPDATE Group

    Step 6
    Records coming from the 'UPDATE Group' will update the customer dimension with the latest customer attributes. Add an Update Strategy Transformation before the target instance and set it to DD_UPDATE. Below is the structure of the mapping.
    Slowly Changing Dimension Type 1
    We are done with the mapping building and below is the structure of the completed mapping.
    Slowly Changing Dimension Type 1
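    For reference, the net effect of this SCD Type 1 mapping on the dimension table is roughly equivalent to the SQL below; this is a hedged sketch that assumes the column list shown above and a hypothetical sequence CUST_KEY_SEQ for the surrogate key.

    --Hedged sketch of the SCD Type 1 net effect: update matching customers in
    --place, insert new customers with a fresh surrogate key.
    MERGE INTO T_DIM_CUST d
    USING CUST_STAGE s
       ON (d.CUST_ID = s.CUST_ID)
    WHEN MATCHED THEN
      UPDATE
         SET d.CUST_NAME = s.CUST_NAME,
             d.ADDRESS1  = s.ADDRESS1,
             d.CITY      = s.CITY,
             d.STATE     = s.STATE,
             d.ZIP       = s.ZIP,
             d.UPDATE_DT = SYSDATE
    WHEN NOT MATCHED THEN
      INSERT (CUST_KEY, CUST_ID, CUST_NAME, ADDRESS1, CITY, STATE, ZIP, CREATE_DT, UPDATE_DT)
      VALUES (CUST_KEY_SEQ.NEXTVAL, s.CUST_ID, s.CUST_NAME, s.ADDRESS1, s.CITY, s.STATE, s.ZIP, SYSDATE, SYSDATE);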

    Workflow and Session Creation

    No specific properties need to be set during the session configuration.
    informatica worklet
    Below is a sample data set taken from the Dimension table T_DIM_CUST.

    Initial inserted value for CUST_ID 1003
    Slowly Changing Dimension Type 1
    Updated value for CUST_ID 1003
    Slowly Changing Dimension Type 1
    Hope you guys enjoyed this. Please leave us a comment in case you have any questions or difficulties implementing this.

    Working with Aggregator and Sorter Transformation

    Working with Aggregator and Sorter Transformation
    This tutorial shows the process of creating an Informatica PowerCenter mapping and workflow which pulls data from multiple data sources and uses the Aggregator and Sorter Transformations. Using a Sorter transformation, you can sort data in either ascending or descending order, and an Aggregator can be used to summarize data.

    For demonstration purposes, let's consider the generation of a company report which will show all order details in descending order of order amount.

    Solution

    1. Import Order, Items, Order_Items tables from the database 
    2. Calculate the total Order Amount for each Order 
    3. Create a target, which will show the total order amount in descending order 
    Below is the structure of the completed mapping.
    Informatica Sorter transformation demo
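    In SQL terms, the result this mapping produces is roughly the query below; this is a hedged sketch that assumes the join key and the placement of PRICE, QUANTITY and DISCOUNT on the ORDER_ITEMS table.

    --Hedged sketch: total order amount per order, sorted in descending order.
    SELECT   o.ORDER_ID,
             o.DATE_ENTERED,
             o.CUSTOMER_ID,
             SUM(oi.PRICE * oi.QUANTITY - oi.DISCOUNT) AS ORDER_AMOUNT
        FROM ORDERS o
        JOIN ORDER_ITEMS oi ON oi.ORDER_ID = o.ORDER_ID
    GROUP BY o.ORDER_ID, o.DATE_ENTERED, o.CUSTOMER_ID
    ORDER BY ORDER_AMOUNT DESC;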

    I. Import Source and Target Definition

    Note : Click the link to Learn more on Source Definition and Target Definition.
    1. Connect to the repository and open the project folder. 
    2. Import all the sources definitions Orders, Items, Order_Items from the database . 
    3. Create target table Tgt_OrderListing_x as shown below. 
    target definition

    II. Source Qualifier and Aggregator Transformation

    Note : Click the link to Learn more on Aggregator Transformation.
    1. Create a Source Qualifier transformation and name it SQ_OrderListing_x.
    2. Create an Aggregator transformation and group on the Order_id column
    3. Link ports ORDER_ID, DATE_ENTERED, CUSTOMER_ID, QUANTITY, PRICE, DISCOUNT into the Aggregator.
    4. Add a new output port Order_Amount.
    5. The expression for Order_Amount is SUM(PRICE * QUANTITY - DISCOUNT)
    6. Make QUANTITY, PRICE, DISCOUNT only input ports.
    Below is the structure of the mapping at this point.
    informatica mapping

    III. Create Sorter Transformation

    1. To create the Sorter Transformation, use one of the following methods.
    • Select TRANSFORMATION | CREATE and select the Sorter transformation from the drop down. Enter the name as Srt_OrderListing_x, or
    • Click on the Sorter transformation icon from the Transformations toolbar and rename the transformation to Srt_OrderListing_x.
    2. Drag the output ports from the Aggregator transformation to the Sorter transformation.
    3. Select the Ports tab in the Sorter transformation as shown below. Check the Key column of the Order_Amount port and select Descending from the Direction drop down.
    Informatica Sorter transformation

    IV. Map the Target Columns

    1. Link all ports from Sorter Transformation to target table.
    2. Your mapping should look like the one given below:
    Informatica Sorter transformation

    V. Load the Target

    1. Create a Workflow with the name wf_OrderList_x.
    2. Create a session task with the name s_OrderList_x.
    3. Run the Workflow.
    4. Monitor the Workflow.
    5. Verify the results for target table Tgt_OrderListing_x.
    Your results should look something like this.

    Hope you enjoyed this tutorial. Please let us know if you have any difficulties in trying out this exercise, and subscribe to the mailing list to get the latest tutorials in your mail box.

    Working with Router Transformation and Aggregator Transformation

    Working with Router Transformation and Aggregator Transformation
    This tutorial shows the process of creating an Informatica PowerCenter mapping and workflow which pulls data from multiple data sources and uses the Aggregator and Router Transformations. A Router transformation can be used to split the data into different groups, and an Aggregator can be used to summarize data.

    A Router transformation is similar to a Filter transformation; it can be used to split the data into different groups. A Router transformation consists of input and output groups, input and output ports, group filter conditions, and properties that you configure in the Designer.

    For demonstration purposes, let's consider the generation of a report which requires store-wise order details.

    Solution

    1. Import Items, Orders, Order-Items and Stores tables from the database. 
    2. Calculate order amount for each order for each store.
    3. Route the output based on store_id and load the data in different tables created for each store.
    4. Retrieve store wise order details.
    Below image shows the completed Mapping Layout.
    Informatica PowerCenter Router Transformation

    Create a Mapping

    I. Create Sources and Targets

    1. Import source tables from the database (Items, Orders, Order-Items and Stores).
    2. Create three target tables as shown below and name them as follows.
      • Tgt_KAUAIFRANCHISE_x
      • Tgt_MAUIFRANCHISE_x
      • Tgt_OAHUFRANCHISE_x
    3. The ports in all three target tables are as shown below.
    Informatica PowerCenter Router Transformation

      II. Drag Sources and Targets into the Mapping

      1. Drag all the source tables into the Mapping Designer.
      2. Create the Source Qualifier transformation and link the sources to the transformation.

      III. Create an Aggregator Transformation

      1. Drag all columns from Source qualifier into the transformation and group on Store_id and Order_id.
      2. Create an output port ORDER_AMOUNT.
      3. Create the expression: SUM(PRICE * QUANTITY - DISCOUNT)
      4. Change PRICE, QUANTITY and DISCOUNT to input ports only.

      IV. Create a Router Transformation

      1. To create a Router transformation
      • Select TRANSFORMATION | CREATE and select Router from the drop down, or
      • Click the icon from the Transformation toolbar. 
      2. Link all the output ports from the Aggregator Transformation to the Router Transformation.
      3. Enter the name of the Router as Rtr_StoreOrder_x.
      4. Select the Groups tab and enter the values under Group Name and Group Filter Condition as shown in the figure below.
      Informatica PowerCenter Router Transformation
      5. The Router transformation will generate three groups : Kauai, Maui, Oahu and a default group.
      Informatica PowerCenter Router Transformation
      6. Link columns from each group to the respective targets. For example, the ports under the Kauai group are linked to the Tgt_KAUAIFRANCHISE_x target. This target table contains the order details for the store where store id = 2014.
      7. The final mapping will look like the one given below:
      Informatica PowerCenter Router Transformation
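      In SQL terms, each router group effectively selects the rows below for its own store-specific target; this is a hedged sketch in which only store id 2014 (Kauai) comes from the text, and the placement of STORE_ID and the item columns is assumed.

      --Hedged sketch of the rows the Kauai group sends to Tgt_KAUAIFRANCHISE_x;
      --the Maui and Oahu groups filter on their own store ids.
      SELECT   o.STORE_ID,
               o.ORDER_ID,
               SUM(oi.PRICE * oi.QUANTITY - oi.DISCOUNT) AS ORDER_AMOUNT
          FROM ORDERS o
          JOIN ORDER_ITEMS oi ON oi.ORDER_ID = o.ORDER_ID
         WHERE o.STORE_ID = 2014
      GROUP BY o.STORE_ID, o.ORDER_ID;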

      V. Load the Target

      1. Create a Workflow with the name wf_StoresOrders_x.
      2. Create a session task with the name s_StoresOrders_x.
      3. Run the Workflow.
      4. Monitor the Workflow.
      5. Verify the results for the target tables Tgt_KAUAIFRANCHISE_x, Tgt_MAUIFRANCHISE_x and Tgt_OAHUFRANCHISE_x.
      Hope you enjoyed this tutorial. Please let us know if you have any difficulties in trying out this exercise, and subscribe to the mailing list to get the latest tutorials in your mail box.

      Source, Target Command Makes File Processing Easier than Before

      Source, Target Command Makes File Processing Easier than Before
      Most of the time when we process flat files in Informatica PowerCenter, we do some kind of file pre- or post-processing, such as unzipping the source file or creating a custom header or footer for the target file. Such processing is normally done using Unix or Windows scripts, which are called using pre- or post-session scripts. Now Informatica PowerCenter provides the Source and Target Command properties to make such processing easier than before.

      File Command Property

      Using the Source or Target Command property, either a Unix or a Windows command can be used to generate flat file source data input rows or a file list for a session. The command writes data to stdout and PowerCenter interprets this as a file list or source data. We can use service process variables like $PMSourceFileDir in the command.

      Use Cases for File Command Property

      In this article, let's discuss a couple of use cases which can be handled easily using the file Source and Target Commands. These properties can be used further as per different business needs, but let's see a couple of them here.

      Use Case 1 : Read a Compressed Source File.

      Before the file is read, the file needs to be unzipped. We do not need any pre-session script to achieve this; it can be done easily with the below session setting.
      Informatica File Source Target Command Property
      This command configuration generates rows to stdout and the flat file reader reads directly from stdout, hence removing the need for staging data.

      Use Case 2 : Generating a File List.

      For reading multiple file sources with the same structure, we use the indirect file method. Indirect file reading is made easy using the File Command property in the session configuration as shown below.
      Informatica File Source Target Command Property
      The command writes a list of file names to stdout and PowerCenter interprets this as a file list.

      Use Case 3 : Zip the Target File.

      We can zip the target file using a post-session script, but this can also be done without one, as shown in the below session configuration.
      Informatica File Source Target Command Property

      Use Case 4 : Custom Flat File Column Headings.

      You can get the column heading for a flat file using the session configuration shown below. This session setting will give a file with the header record 'Cust ID,Name, Street #,City,State,ZIP'.
      Informatica File Source Target Command Property

      Use Case 5 : Custom Flat File Footer.

      You can get the footer for a flat file using the session configuration given in the below image. This configuration will give you a file with '*****   End Of The Report   *****' as the last row of the file.
      Informatica File Source Target Command Property
      Hope you enjoyed this article. Please leave your comments and questions; we would like to hear from you.

      Restartability Design for Different Type ETL Loads

      ETL Restartability Design for Informatica Workflows
      Restartable ETL jobs are very crucial to the job failure recovery, supportability and data quality of any ETL system. You need to build your ETL system around the ability to recover from the abnormal ending of a job and restart, so a well designed ETL system should have a good restart mechanism. In this article let's discuss ETL restartability approaches to support different types of ETL jobs such as dimension loads, fact loads etc.

      What is ETL Restartability

      Restartability logic or recovery logic is the ability to restart ETL processing if a processing step fails to execute properly. You want the ability to restart processing at the step where it failed as well as the ability to restart the entire ETL session.

      Let's discuss ETL restartability approaches to support commonly used ETL job types.

          1. Slowly Changing Dimension
          2. Fact Table
          3. Snapshot Table
          4. Current State Table
          5. Very Large Table

      1. Slowly Changing Dimension Load

      The below diagram shows the high level steps required for an SCD loading ETL job.
      Restartability Design for Different Type ETL Loads
      Let's see this in a bit more detail.

      Step 1 : In this step, we will read all the data from the staging table. This will include joining data from different tables and applying any incremental data capturing logic.
      Step 2 : Data will be compared between source and target to identify if any change in any of the attributes. CHECKSUM Number can be used to make this process simple.
      Step 3 :  If the CHECKSUM number is different, the data is processed further; else it is ignored.
      Step 4 :  Do any transformation required, including the error handling.
      Step 5 :  Load the data into the Dimension Table.
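      A hedged sketch of the CHECKSUM comparison in Steps 2 and 3, assuming an Oracle source and illustrative column names, is given below.

      --Hedged sketch: only rows that are new or whose checksum differs from the
      --current dimension row are processed further.
      SELECT s.CUST_ID,
             s.CUST_NAME,
             s.ADDRESS1
        FROM CUST_STAGE s
        LEFT OUTER JOIN T_DIM_CUST d
          ON d.CUST_ID = s.CUST_ID
       WHERE d.CUST_ID IS NULL
          OR ORA_HASH(s.CUST_NAME || '|' || s.ADDRESS1) <>
             ORA_HASH(d.CUST_NAME || '|' || d.ADDRESS1);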

      Note : Click the link to Learn more on Slowly Changing Dimension Load

      2. Fact Table Load

      The high level design for the fact table load is given in the below image.
      Restartability Design for Different Type ETL Loads
      Some more details on the high level design.

      Step 1 : In this step, we will read all the data from the source table. This will include joining data from different tables and applying any incremental data capturing logic.
      Step 2 : Perform any transformation required, including the error handling.
      Step 3 : Load the data into the TEMP Table.
      Step 4 : Load the data from the TEMP Table into the FACT table. This can be done either using a database script or using an Informatica PowerCenter session.

      Note : Data movement from the TEMP table to the FACT table is assumed to be very unlikely to fail. Any error in this process will require manual intervention.
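      If the database script route is taken for Step 4, it can be as simple as the hedged sketch below (hypothetical table names).

      --Hedged sketch: move fully transformed rows from the TEMP table into the
      --fact table in a single set-based statement.
      INSERT INTO SALES_FACT
      SELECT *
        FROM SALES_FACT_TEMP;

      COMMIT;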

      3. Snapshot Table Load

      Many times we create snapshot tables and build reporting on top of them. This particular restartability technique is appropriate for such scenarios. The below image shows the high level steps.
      Restartability Design for Different Type ETL Loads
      Detailed steps are as below.

      Step 1 : In this step, we will read all the data from the source table. This will include joining data from different tables and applying any incremental data capturing logic.
      Step 2 : Truncate the data from the target table.
      Step 3 : Perform any transformation required, including the error handling.
      Step 4 : Load the data into Target Table.
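      The truncate-and-reload pattern in Steps 2 to 4 is what makes this job safely restartable; a hedged sketch with hypothetical table and column names is given below.

      --Hedged sketch: rerunning the job after a failure simply repeats the
      --truncate, so no duplicate rows can be left behind.
      TRUNCATE TABLE DAILY_BALANCE_SNAPSHOT;

      INSERT INTO DAILY_BALANCE_SNAPSHOT (ACCT_ID, BALANCE_AMT, SNAPSHOT_DT)
      SELECT ACCT_ID, BALANCE_AMT, SYSDATE
        FROM ACCT_BALANCE;

      COMMIT;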

      4. Current State Table Load

      Just like SCD Type 1, there are scenarios where you are interested in keeping only the latest state of the data. Here we are discussing a very common and simple approach to achieve restartability for such scenarios.
      image
      More about the Steps.

      Step 1 : In this step, we will read all the data from the source table. This will include joining data from different tables and applying any incremental data capturing logic.
      Step 2 : Identify records for INSERT/UPDATE and perform any transformations that are required, including the error handling.
      Step 3 : Insert the record which is identified for Insert.
      Step 4 : Update the record which is identified for Update.

      Note : Click the link to Learn more on Slowly Changing Dimension Load 

      5. Very Large Table Load

      The approach we are discussing here is appropriate for loading a very large snapshot table which is required to be available 24/7. You can read the complete design in this article.

      Below is the high level design.
      image
      Let's see this in a bit more detail.

      Step 1 : In this step, we will read all the data from the source table. This will include joining data from different tables and applying any incremental data capturing logic.
      Step 2 : Perform any transformations that are required, including the error handling.
      Step 3 : Load the data into the TEMP Table.
      Step 4 : Rename the TEMP table to the Target table. This will move the data from the TEMP table to the actual target table. 

      Note : Click the link to Learn more on this restartability design.
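      The Step 4 swap is a quick metadata operation on the database; a hedged sketch with hypothetical table names is shown below.

      --Hedged sketch: the fully loaded TEMP table becomes the live target.
      ALTER TABLE BIG_SNAPSHOT RENAME TO BIG_SNAPSHOT_OLD;
      ALTER TABLE BIG_SNAPSHOT_TEMP RENAME TO BIG_SNAPSHOT;
      DROP TABLE BIG_SNAPSHOT_OLD;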

      Please leave us a comment below, if you have any other thoughts or scenarios to be covered. We will be more than happy to help you.

      Sequence Generator Transformation for Unique Key Generation

      Sequence Generator Transformation for Unique Key Generation
      The Sequence Generator transformation generates numeric values in a sequential order. Use the Sequence Generator to create unique primary key values, replace missing primary keys, or cycle through a sequential range of numbers. In this tutorial let's see a practical implementation of the Sequence Generator transformation.

      For demonstration purposes, let's consider the below scenario.

      Customer source data arrives at each store in a flat file. Each file contains the customer name and other customer details. However, there is no unique id to identify each customer. The unique id for each customer will be generated through the mapping.

      Solution 

      1. Use a Sequence Generator transformation to generate a unique id for each customer.    
      2. Use this generated Customer id as the primary key in the target table.   
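      For comparison, the sketch below (hypothetical sequence and staging table names) shows the same idea done on the database side; in the mapping, the Sequence Generator's NEXTVAL port plays the role of the sequence.

      --Hedged sketch: a database sequence generating the surrogate CUSTOMER_ID.
      CREATE SEQUENCE CUSTOMER_ID_SEQ START WITH 1 INCREMENT BY 1;

      INSERT INTO Tgt_Customer_x (CUSTOMER_ID, CUSTOMER_NAME)
      SELECT CUSTOMER_ID_SEQ.NEXTVAL, CUSTOMER_NAME
        FROM CUSTOMERS_STG;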

      Mapping Layout

      clip_image001

      I. Import Sources, Targets and create a Mapping

      Note : Click the link to Learn more on Source Definition and Target Definition.
      1. Import source definition for Customers, which is a flat file uploaded on the server.
      2. Create a target table, which is similar to the source. Add the CUSTOMER_ID port as a Primary Key. Name the target as Tgt_Customer_x.
      3. Create a mapping by the name M_Custid_x.
      4. Drag the source definition Customers and target definition for Tgt_Customer_x in the designer workspace.

      II. Drag Sources and Targets into the Mapping

      Note : Click the link to Learn more on  Mapping Designer.
      1. Drag all the source tables into the Mapping Designer.
      2. Create the Source Qualifier transformation and link the sources to the transformation.

      III. Create a Sequence Generator Transformation

      1. Create the Sequence Generator transformation.
        1. Select TRANSFORMATION | CREATE and select Sequence Generator or
        2. Click on the Informatica Powercenter Sequence Generator Transformation icon from the Transformation toolbar.
        3. Enter the name as Seq_Custid_x.
      2. Set the Start Value, End Value, Increment Value and other attributes as shown below. Check the Reset box.
      Informatica Powercenter Sequence Generator Transformation
      3. Link the NEXTVAL column of the Sequence Generator to the target table.
      4. Link remaining columns from Source qualifier to target table.
      5. Your mapping should look like the one shown below.
      Informatica Powercenter Sequence Generator Transformation

      IV. Load the Target

      1. Create a Workflow with the name wf_Custid_x.
      2. Create a session task with the name s_Custid_x
      3. Run the Workflow.
      4. Monitor the Workflow.

      V. Verify the Results

      Select the data from the target table to see the results. You will see the CUSTOMER_ID starting from 1 and increasing in a sequence.
      Informatica Powercenter Sequence Generator Transformation
      Hope you enjoyed this tutorial. Please let us know if you have any difficulties in trying out this exercise, and subscribe to the mailing list to get the latest tutorials in your mail box.

      Initial History Building Algorithm for Slowly Changing Dimensions

      Initial History Building Algorithm for Slowly Changing Dimensions
      Building initial history for a Data Warehouse is a complex and time consuming task. It involves taking into account all the date intervals during which the source system's representation of the data changed in any of the tables feeding into the dimension tables. So we can imagine the complexity of history building and the need for a reusable algorithm.

      In this article let's see a history building algorithm which can take care of the different history building scenarios.

      History Building Date Scenarios.

      Let's see the different date scenarios.
      1. Single Source Scenario : The dimension table needs only one source table to populate all its data elements.
      2. Multiple Source Scenario : The dimension table needs multiple source tables to populate its data elements. Join conditions and 'where' conditions need to be applied to construct the dimension table row.
      3. Start & End Date Scenario : Source table records have both a start and an end date to represent the active time period of any particular row.
      4. Start Date Scenario : Source table records have only a start date to represent the active time period of a particular row. The record is assumed to be active from the start date till the end of the source system's existence.
      5. No Date Scenario : Source table records have no start or end date. The record is assumed to be active for the entire life of the source system.
      The history building process involves identifying the date columns that are interpreted as system changes, such as create, update and delete dates. Those dates are the input for the algorithm we are discussing here.

      The figure below illustrates the history dates in a potential scenario with multiple input tables, and how the end result of the record constructed in the dimension table will contain history date intervals that are more detailed than those of each individual source table.
      History Building Algorithm
      Note : Points on each line shows different date buckets. 

      History Building Algorithm.

      Let's understand the history building algorithm with a real-time example.

      Step 1 : Gather all the dates from the different source tables and tag each date as S if it is a start date or E if it is an end date.
      Note : Add a high date 12/31/2099 for each data source, which represents the still active record.
      History Building Algorithm
      Step 2 : Sort the dates by date and type: date ascending and type descending.
      History Building Algorithm
      Step 3 : Remove duplicate dates. Consider both the date and type columns to identify the duplicate rows.
      History Building Algorithm
      Step 4 : Set the end date to the next start. Use the next row's date to build the end date for the current row.
      History Building Algorithm
      Step 5 : Remove adjacent pairs. Remove any start and end date pair which is only one day apart.
      History Building Algorithm
      Step 6 : Revise Dates. If End date = Next Start date, set it to next Start Date - 1
      History Building Algorithm
      With this step we have all the time buckets created.
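      One way to approximate Steps 1 to 6 in set-based SQL is sketched below; the source table and column names are hypothetical, and a production version would also carry the S/E tags through the sort.

      --Hedged sketch: collect candidate dates from the sources, order them, and
      --turn each date and its successor into an AS_OF_START_DT / AS_OF_END_DT bucket.
      WITH all_dates AS
        (SELECT EFF_START_DT AS DT FROM SRC_TABLE_A
         UNION
         SELECT EFF_END_DT FROM SRC_TABLE_A
         UNION
         SELECT EFF_START_DT FROM SRC_TABLE_B
         UNION
         SELECT TO_DATE('12/31/2099', 'MM/DD/YYYY') FROM DUAL),
      ordered AS
        (SELECT DT,
                LEAD(DT) OVER (ORDER BY DT) AS NEXT_DT
           FROM all_dates)
      SELECT DT          AS AS_OF_START_DT,
             NEXT_DT - 1 AS AS_OF_END_DT
        FROM ordered
       WHERE NEXT_DT IS NOT NULL
         AND NEXT_DT - DT > 1;   --drop adjacent pairs that are only one day apart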

      We can have this algorithm built into a reusable component, which can be used across different ETL processes.

      Hope you enjoyed this tutorial, Please let us know if you have any questions and subscribe to the mailing list to get the latest tutorials in your mail box.

      Change Data Capture (CDC) Implementation for Multi Sourced ETL Processes

      Change Data Capture (CDC) Implementation for Multi Sourced ETL Processes
      We have discussed a couple of different options for Change Data Capture, including a Change Data Capture Framework, in our prior discussions. Implementing change capture for an ETL process which involves multiple data sources needs special care to capture changes from any of the data sources. Here in this article let's see a CDC implementation for an ETL process which involves multiple data sources.

      Change Scenarios

      Let's see the different possible scenarios to be considered when we implement Change Data Capture for a multi-sourced ETL.
      1. Multiple data sources : Multiple data sources may be required to generate all the data elements required for the Dimension or Target table.
      2. Change in data source : Change can be in one of the data sources or in multiple data sources. Any change needs to be captured.
      3. Parent Child Relation : Data source can have parent child relation and all parent records may not have child records.
      4. Reference & Lookup Tables : Reference and Lookup tables may be used to generate the required data elements for the Dimension or Target table.
      5. Change identification : Changed data from a data source is identified using a date column. Any change in any of the data sources will result in a change of the date column, say "UPDATE_DT".
      The below chart shows the different scenarios mentioned above. Records to be pulled by the change data process from both the CUST and ADDR tables are highlighted in blue, based on the assumption that the last ETL run was on 03/09/2013.
      Change Data Capture Implementation for Multi Sourced ETL Processes

      Preserving Last Run Timestamp

      We will have to store the last ETL run timestamp, so that subsequent ETL runs can identify any changes made since that point. We have discussed a couple of different options for Change Data Capture, including a Change Data Capture Framework, in our prior discussions. Please visit the below links for more details.
        1. Change Data Capture Made Easy Using Mapping Variables
        2. An ETL Framework for Change Data Capture
        3. Change Data Capture Implementation Using CHECKSUM Number

      Querying Changed Records

      The important part of Change Data Capture involving multiple data sources is building a SQL query to pull all the required data. The data sources need to be joined and queried such that any of the changes discussed in the above change scenarios are captured.

      The below SQL query is built on the CUST and ADDR tables to cover all the scenarios we discussed before.

      SQL Query Option 1

                SELECT CUST_ID,
                       CUST_NAME,
                       CUST_DOB,
                       CUST_UPDATE_DT,
                       ADDRESS_LINE,
                       CITY,
                       ZIP,
                       STATE,
                       ADDR_UPDATE_DT
                FROM  (SELECT C.CUST_ID      AS CUST_ID,
                              C.CUST_NAME    AS CUST_NAME,
                              C.CUST_DOB     AS CUST_DOB,
                              C.UPDATE_DT    AS CUST_UPDATE_DT,
                              A.ADDRESS_LINE AS ADDRESS_LINE,
                              A.CITY         AS CITY,
                              A.ZIP          AS ZIP,
                              A.STATE        AS STATE,
                              A.UPDATE_DT    AS ADDR_UPDATE_DT
                         FROM CUST C LEFT OUTER JOIN ADDR A ON C.CUST_ID = A.CUST_ID)
                WHERE CUST_UPDATE_DT > TO_DATE ('03/09/2013', 'MM/DD/YYYY')
                   OR ADDR_UPDATE_DT > TO_DATE ('03/09/2013', 'MM/DD/YYYY')

      How this SQL query Works

      Part 1 :  The inner query (the SELECT with the LEFT OUTER JOIN) returns all the records from both the CUST and ADDR tables.
      Part 2 :  The outer WHERE clause filters out the records which do not have any changes.

      SQL Query Option 2

                      SELECT C.CUST_ID AS CUST_ID,
                             C.CUST_NAME AS CUST_NAME,
                             C.CUST_DOB AS CUST_DOB,
                             C.UPDATE_DT AS CUST_UPDATE_DT,
                             A.ADDRESS_LINE AS ADDRESS_LINE,
                             A.CITY AS CITY,
                             A.ZIP AS ZIP,
                             A.STATE AS STATE,
                             A.UPDATE_DT AS ADDR_UPDATE_DT
                      FROM CUST C LEFT OUTER JOIN ADDR A ON C.CUST_ID = A.CUST_ID
                      WHERE C.UPDATE_DT > TO_DATE ('03/09/2013', 'MM/DD/YYYY')

                      UNION ALL
                      SELECT C.CUST_ID AS CUST_ID,
                             C.CUST_NAME AS CUST_NAME,
                             C.CUST_DOB AS CUST_DOB,
                             C.UPDATE_DT AS CUST_UPDATE_DT,
                             A.ADDRESS_LINE AS ADDRESS_LINE,
                             A.CITY AS CITY,
                             A.ZIP AS ZIP,
                             A.STATE AS STATE,
                             A.UPDATE_DT AS ADDR_UPDATE_DT
                      FROM CUST C LEFT OUTER JOIN ADDR A ON C.CUST_ID = A.CUST_ID
                      WHERE A.UPDATE_DT > TO_DATE ('03/09/2013', 'MM/DD/YYYY')

      How this SQL query Works

      Part 1 :  The first SELECT (before the UNION ALL) returns all the changes from the CUST table and the corresponding data from the ADDR table.
      Part 2 :  The second SELECT returns all the changes from the ADDR table and the corresponding data from the CUST table.

      Once the data is pulled correctly from the data sources, we need to apply the ETL logic to load the Dimension or Target table.

      Hope you enjoyed this article. Please let us know if you have any questions on this article, or share any of your experiences with change data capture.

      Stored Procedure Transformation to Leverage the Power of Database Scripts

      Stored Procedure Transformation to Leverage the Power of Database Scripts
      A Stored Procedure is an important tool for populating and maintaining databases. Since stored procedures allow greater flexibility than SQL statements, database developers and programmers use stored procedures for various tasks within databases. Informatica PowerCenter provides the Stored Procedure Transformation to leverage the power of database scripting. In this article let's see in more detail how to use the Stored Procedure Transformation.

      For demonstration purposes, let's consider a scenario.

      Customer source data arrives in a flat file from each store. At times, the customer names may contain some invalid data. All customer names should be validated to check for spaces, digits, special characters, etc. so that there is valid customer data in the Data Mart.

      Solution

      • Use a Stored Procedure transformation to validate the customer name.
        1. Connected Stored Procedure Transformation OR
        2. Un-Connected Stored Procedure Transformation
      • The customer name is passed as a parameter to the Stored Procedure.
      • The Stored Procedure returns a ‘V’ value for valid names and ‘I’ for invalid names.
      • Below is the layout of the completed mapping.
        Informatica PowerCenter Stored Procedure Transformation
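        The tutorial assumes the validation procedure already exists in the database; a minimal PL/SQL sketch of what it might look like (the name and the validation rule are assumptions) is given below.

        --Hedged PL/SQL sketch (assumed name and rule): flag a name as invalid
        --('I') if it contains anything other than letters and spaces.
        CREATE OR REPLACE PROCEDURE SP_CHECKCUSTNAME (
            p_name IN  VARCHAR2,
            p_flag OUT VARCHAR2
        ) AS
        BEGIN
            IF REGEXP_LIKE(p_name, '^[A-Za-z ]+$') THEN
                p_flag := 'V';
            ELSE
                p_flag := 'I';
            END IF;
        END SP_CHECKCUSTNAME;
        /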

        I. Copy the mapping

        We will be using the mapping created in the prior demo article. Copy the mapping to continue this exercise.
        1. In the Navigator Window, select the mapping M_Custid_x.
        2. Select the menu option EDIT | COPY and then select EDIT | PASTE.
        3. Rename the mapping as M_CheckCustName_x.

        II. Use a Connected Stored Procedure Transformation

        1. To create a Stored Procedure transformation
          1. Select TRANSFORMATION | CREATE and select Stored Procedure from the drop down, or
          2. Click on the Informatica PowerCenter Stored Procedure Transformation icon from the Transformation toolbar.
          3. Enter the name of the transformation as SP_CheckCustName_x.
          Informatica PowerCenter Stored Procedure Transformation
        2. Select the procedure name from the PROCEDURES folder.
        Informatica PowerCenter Stored Procedure Transformation
        3. The Stored Procedure transformation appears with two ports: Name and Flag as shown below. 
        4. Double click on the stored procedure transformation.
        Informatica PowerCenter Stored Procedure Transformation
        Note : The procedure contains two parameters, Name which is an IN parameter and FLAG, which is an OUT parameter.
        5. Delete the existing links between the Source Qualifier and Tgt_Customer_x.
        6. Link Firstname port from Source Qualifier into the Name port of the Stored Procedure transformation.
        7. Create a Filter transformation and link all ports from Source Qualifier into Filter transformation.
        8. Link the FLAG port from Stored Procedure into the Filter.
        9. Create the filter condition : FLAG = ‘V’.
        10. Link all ports except FLAG into the target.
        11. The Sequence Generator transformation will generate the Customer_id in the target. Only rows with valid customer names will pass to the target.
        12. The final mapping should look as given below:
        Informatica PowerCenter Stored Procedure Transformation

        III. Load the Target

        1. Create a Workflow with the name wf_CheckCustName_Connected_x.
        2. Create a session task with the name s_CheckCustName_Connected_x.
        3. Run the Workflow.
        4. Monitor the Workflow.

        IV. Verify the Results

        Select the data from the target table. All the names are clean with no special characters or numbers.
        Informatica PowerCenter Stored Procedure Transformation

        V. Using an Unconnected Stored Procedure Transformation

        1. Using the same mapping, remove the existing Stored Procedure transformation.
        2. Create the Stored Procedure transformation again. Do not link it to any other transformation. Note : An Unconnected Stored Procedure transformation does not contain any links to other transformations.
        3. The ports in the Unconnected Stored Procedure will appear as follows :
        Informatica PowerCenter Stored Procedure Transformation
        4. In the same mapping, create an Expression transformation before the Filter transformation. Link relevant Ports.
        5. To call the Stored Procedure from the Expression transformation, enter the expression for the FLAG column, the newly added output port, as shown below:
        clip_image002
        6. FirstName is passed as a parameter to the Stored Procedure and the value returned by the Stored Procedure will be available in the PROC_RESULT variable.
        7. Link all ports from Expression transformation into the filter. Complete the rest of the mapping as shown below:
        Informatica PowerCenter Stored Procedure Transformation

        VI. Load the Target

        1. Create a Workflow with the name wf_CheckCustName_Unconnected_x.
        2. Create a session task with the name s_CheckCustName_Unconnected_x
        3. Run the Workflow.
        4. Monitor the Workflow.

        Informatica PowerCenter Stored Procedure Transformation

        Hope you enjoyed this tutorial. Please let us know if you have any difficulties in trying out this exercise, and subscribe to the mailing list to get the latest tutorials in your mail box.

        Data Manipulation Using Update Strategy in Informatica PowerCenter

        Data Manipulation Using Update Strategy in Informatica PowerCenter
        It is obvious that we need data manipulation such as Insert, Update and Delete in an ETL job. Informatica PowerCenter provides the Update Strategy transformation to handle any such data manipulation operations. Let's understand the Update Strategy Transformation in detail.

        Let's consider a real-time scenario for the demonstration.

        The operational source system that supplies data to your data mart tracks all items that your company has ever sold, even if they have since been discontinued. Your Sales Department wants to run queries against a Data Mart table that contains only currently selling items. They don’t want to use views or SQL, and they want this table updated on a regular basis.

        Solution

        1. Use the operational source table ITEMS to build a new Data Mart table, CURRENT_ITEMS, which will contain only current selling items.
        2. Create an Unconnected Lookup transformation object to match source items against current items in the Data Mart.
        3. Create an Update Strategy transformation to test the result of the lookup and determine the appropriate row action to take on the first and subsequent runs of the session.
        4. New current items will be inserted, discontinued items will be rejected, current items already in the target will be updated, and current items already in the target but discontinued since the last session run will be deleted.

        Mapping Layout

        Data Manipulation Using Update Strategy in Informatica PowerCenter

        I. Analyze the source files

        1. Use the Source Analyzer to analyze the ITEMS table from the operational source database. If the source table has already been imported and analyzed, it is not necessary to reanalyze it.

        II. Design the target schema

        1. Use the Warehouse Designer to create an automatic target definition named Tgt_CurrentItems_x using the ITEMS source definition.
        2. Create the table in the target database using your student ID and password. The table should appear as below.
        Data Manipulation Using Update Strategy in Informatica PowerCenter

        III. Create the Mapping and Transformations

        1. Use the Mapping Designer to create a mapping called M_CurrentItems_x. Drag source and target into the designer workspace.
        2. Create an Unconnected Lookup transformation to match ITEMS.ITEM_ID against Tgt_CurrentItems_x.ITEM_ID.
        3. Click on the Target button to select the Lookup table Tgt_CurrentItems_x . Click OK.
        4. Double-click on the Lookup and rename it LKP_CURRENT_ITEMS_x.
        5. Click the Ports tab.
        6. Add a new input port, ITEM_ID_IN, with the same data type as ITEM_ID.
        7. Make ITEM_ID the return (R) port. The ports should appear as shown below.
        Data Manipulation Using Update Strategy in Informatica PowerCenter
        8. Click the Properties tab.
        9. Verify that the database connection is set to the correct target database string. For example, $Target.
        10. Click the Condition tab.
        11. Click on the icon to add a new condition.
        12. Add the Lookup condition: ITEM_ID = ITEM_ID_IN.
        13. Click OK to save changes and close the Lookup transformation.

        IV. Create an Update Strategy transformation

        1. Drag all ports from Source Qualifier into Update Strategy transformation. 
        2. Test the result of the lookup and determine the appropriate row action to take on the first and subsequent runs of the session. The logic is that new current items will be inserted, discontinued items will be rejected, current items already in the target will be updated, and current items already in the target but discontinued since the last session run will be deleted.
        3. The pseudo code for the logic is as follows:
          if (the record does not exist in the target table) then
              if (the discontinued flag is not set) then INSERT else REJECT
          else (the record exists)
              if (the discontinued flag is not set) then UPDATE the record else DELETE the record

      • Create an expression for the above pseudo code and enter it in the Update Strategy expression editor. The expression will call the Unconnected Lookup transformation (a SQL sketch of the equivalent net effect is given after this list).
      • Data Manipulation Using Update Strategy in Informatica PowerCenter
      • The completed expression will look like the one in the below image.
      • Data Manipulation Using Update Strategy in Informatica PowerCenter

      • Map all the columns from the update strategy to the target table.
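      For reference, the net effect of this row-by-row logic is roughly the SQL below; this is a hedged sketch in which the non-key column names and the DISCONTINUED_FLAG values are assumptions.

      --Hedged sketch: insert new current items, update existing ones, delete
      --items that have been discontinued since the last run, ignore the rest.
      MERGE INTO TGT_CURRENTITEMS_X t
      USING ITEMS s
         ON (t.ITEM_ID = s.ITEM_ID)
      WHEN MATCHED THEN
        UPDATE
           SET t.ITEM_NAME = s.ITEM_NAME,
               t.PRICE     = s.PRICE
        DELETE WHERE s.DISCONTINUED_FLAG = 1
      WHEN NOT MATCHED THEN
        INSERT (ITEM_ID, ITEM_NAME, PRICE)
        VALUES (s.ITEM_ID, s.ITEM_NAME, s.PRICE)
        WHERE s.DISCONTINUED_FLAG = 0;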
        V. Load the target

        1. Use the Workflow Manager to create a Workflow wf_CurrentItems_x.
        2. Create a Session Task s_CurrentItems_x based on the M_CurrentItems_x mapping.
        3. Run and monitor the Workflow.

        VI. Verify the results

        1. Using a SQL query tool, connect to the target database and verify that the CURRENT_ITEMS table now contains data.

        2. SELECT * FROM TGT_CURRENTITEMS_X;
          The data returned from the above statement should be similar to this:
          Data Manipulation Using Update Strategy in Informatica PowerCenter
        3. After the session is run once, the items table can be modified to simulate changes. You should run the session again to see the results of the logic in the Update Strategy transformation.

        Video Tutorial



        Hope you enjoyed this tutorial. Please let us know if you have any difficulties in trying out this exercise.

        Re-Keying Surrogate Key For Dimension & Fact Tables. Need, Impact and Fix

        Re-Keying Surrogate Key For Dimension & Fact Tables. Need, Impact and Fix
        A surrogate key is an artificial key that is used as a substitute for a natural key. Every surrogate key points to a dimension record, which represents the state of the dimension record at a point in time. We join dimension tables and fact tables using surrogate keys to get the factual information at a point in time. In this article let's see the need for surrogate key re-keying, the impact of re-keying and a possible fix.

        Need and Impact of Surrogate Key Re-Keying

        Typically we never re-generate or re-key surrogate keys, because these keys link dimension and fact records to represent the state of factual data at a point in time. At times, though, we come across situations where re-keying cannot be avoided.

        Let's consider an SCD Type 1 customer dimension, which stores basic customer information and the customer income group, and a sales Fact table.
        Re-Keying Surrogate Key For Dimension and Fact Tables. The Need, Impact and Fix
        Here CUST_DIM does not keep the historical changes of customer attributes. From this data we cannot analyze how sales per customer changed when the income group changed. So the business users decided to track the historical changes of customer attributes using SCD Type 2.

        This change in turn creates more records per customer, adjusting the as-of start and end dates for many customer records. Here CUST_ID 672 changed income group from MEDIUM to HIGH, so we have two records, with surrogate keys CUST_SKEY 101 and 301: one (301) effective till 25-July-12 and the other (101) still active.

        The changed values for both records of CUST_ID 672 are highlighted in red.
        Re-Keying Surrogate Key For Dimension and Fact Tables. The Need, Impact and Fix
        This change to the Dimension table alone will not give us the capability of historical analysis. We will have to update the Fact table to refer to the correct historical Dimension record. Shown below is the correct reference from the Fact table to the Dimension record.

        We can imagine how painful it will be to adjust the surrogate keys for a Fact or Dimension table with millions of records. The corrected surrogate keys are highlighted in red in the image below.
        Re-Keying Surrogate Key For Dimension and Fact Tables. The Need, Impact and Fix

        Fix for Surrogate Key Re-Keying

        By now we know the complexity involved in re-keying a surrogate key. Let's walk through the high-level steps involved in fixing the issue.

        Dimension Table

        We are not left with much option other than recreating the Dimension table, which will involve building the history retroactively. To reduce the impact of Dimension rebuilding, we can build the dimension into a temporary table and finally convert the temporary table into the actual Dimension table.

        Fact Table

        The Fact table can be rebuilt from the source tables as long as the historical source data is available. Special care should be taken to make sure that each fact record points to the surrogate key that is in effect for the time period of the fact creation.

        If the historical source data is not available, we can use the existing data from the Fact table to derive the new re-keyed Fact table. Join the existing Fact table with the existing Dimension table to get the natural key, and in turn join it with the new re-keyed Dimension table to pick up the new surrogate key, as sketched below.
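        A minimal SQL sketch of this derivation, assuming illustrative table and column names (SALES_FACT, CUST_DIM_OLD, CUST_DIM_NEW) and that the new dimension carries as-of start and end dates:

        INSERT INTO SALES_FACT_TEMP (CUST_SKEY, SALE_DT, SALE_AMT)
        SELECT ND.CUST_SKEY,
               F.SALE_DT,
               F.SALE_AMT
          FROM SALES_FACT F
          JOIN CUST_DIM_OLD OD
            ON OD.CUST_SKEY = F.CUST_SKEY
          JOIN CUST_DIM_NEW ND
            ON ND.CUST_ID = OD.CUST_ID
           AND F.SALE_DT BETWEEN ND.AS_OF_START_DT AND ND.AS_OF_END_DT;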

        To reduce the impact of Fact rebuilding, we can build the Fact into a temporary table and finally convert the temporary table to the actual Fact table.

        Hope you enjoyed this tutorial. Please let us know if you have experienced a re-keying crisis and how you handled the situation. We are happy to hear from you.

        Tasks and Task Developer in Informatica PowerCenter Workflow Manager

        The Informatica PowerCenter Workflow Manager contains many types of tasks to help you build workflows and worklets. You can create reusable tasks in the Task Developer, or create and add tasks in the Workflow or Worklet Designer as you develop the workflow. In this article let's look at some commonly used Tasks for Workflow or Worklet development.

        Background

        Let's consider the following scenario.

        People who are authorized to receive the session status should get an email once the session has completed. The email gives details of the number of rows loaded, rows rejected, time taken to complete, etc. The workflow should also clean up the reject files created during a Workflow run.

        Solution

        1. Create an Email task and place it in a Workflow
        2. A Command Task can be configured to specify shell or DOS commands, to delete reject files, copy a file, or archive target files.
        3. Use the Command Task to delete reject files.

        Workflow Layout

        Below is the completed workflow layout.
        Tasks and Task Developer in  Informatica PowerCenter Workflow Manager

        I. Create an Email Task

        1. Create an Email Task in the Task Developer.
        2. Enter the name for the Email Task as On_Success_Mail.
        3. Double-click on the email task. Click on the General tab, enter the description for the task as shown below.
          Tasks and Task Developer in  Informatica PowerCenter Workflow Manager
        4. Select the Properties tab and enter the Email User Name and Email Subject details.
          Tasks and Task Developer in  Informatica PowerCenter Workflow Manager
        5. Create one more Email task, give the name as On_Failure_Mail and set its properties.

        II. Configure the Workflow

        1. Switch to the Workflow Designer and drag in the wf_OrderListing_x Workflow created in the prior article.
        2. Double-click on the Session Task s_OrderList_x.
        3. Click on the Components tab.
        4. Click On Success E-Mail option; from the drop down list select Reusable. 
          Tasks and Task Developer in  Informatica PowerCenter Workflow Manager
        5. Click on the icon next to the option and select On_Success_Mail from the drop-down list. 
          Tasks and Task Developer in  Informatica PowerCenter Workflow Manager
        6. Click on the icon highlighted in the figure below.
          Tasks and Task Developer in  Informatica PowerCenter Workflow Manager
        7. Enter the email text. Here you can use any of the post-session built-in Email variables, which are useful for including important session information; a sample email body is shown after this list.
        8. Similarly, set the reusable Email task On_Failure_Mail for the On Failure E-Mail option and enter the details required.
          Tasks and Task Developer in  Informatica PowerCenter Workflow Manager
        9. Click OK.
        Note: The concerned people will receive an email regarding the status of the Workflow, subject to mail server configuration.
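        For illustration, a sample email body built from post-session Email variables such as %s (session name), %e (session status), %l (rows loaded), %r (rows rejected), %b (start time), %c (completion time) and %i (elapsed time); adjust the wording to your needs:

        Session %s completed with status %e.
        Total rows loaded: %l, total rows rejected: %r.
        Start time: %b, completion time: %c, elapsed time: %i.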

        III. Switch to Task Developer

        1. Create a Command Task, or click on the Command Task icon on the Tasks toolbar.
        Tasks and Task Developer in Informatica PowerCenter Workflow Manager
        2. Edit the Task.
        3. In the Commands tab, click on the add command icon. Enter the name of the command as DeleteFiles, then click on the edit icon to enter the command text.
        4. Enter the command as shown below.
        Tasks and Task Developer in Informatica PowerCenter Workflow Manager

        Note: The command can be any valid UNIX command or shell script for UNIX servers, or, any valid DOS or batch file for Windows servers.
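        For example, on a UNIX server the reject files could be removed with a one-line command; a sketch assuming the default reject-file directory variable $PMBadFileDir and the default .bad extension:

        rm -f $PMBadFileDir/*.bad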

        IV. Configure the Workflow

        1. Open the Workflow wf_OrderListing_x created in the prior article.
        2. Link the session task s_OrderList_x to Command_Delete_x.
        3. Run the Workflow.
        4. Verify the results.
        Note: The commands specified in the Command Task are executed on the Informatica Server. To verify the execution of the commands given in the Command Task you need to have privileges to login to the Informatica Server and view the BadFiles directory that has all the reject files.

        Hope you enjoyed this tutorial. Please let us know if you have any difficulties in trying out these exercises.


        SCD Type 6, a Combination of SCD Type 1, 2 and 3

        In a couple of our previous articles, we discussed how to design and implement SCD Type 1, Type 2 and Type 3. We cannot always fulfill all the business requirements with these basic SCD types alone. So here let's see what SCD Type 6 is and what it offers beyond the basic SCD types.

        Read More »

        Use Informatica Persistent Cache and Reduce Fact Table Load Time

        In a mature data warehouse environment, it is very common to see fact tables with dozens of dimension tables linked to them. If we are using Informatica to build the ETL process, we would expect to see dozens of Lookup transformations as well, unless other design techniques are used. Since the Lookup is the predominant transformation, tuning it will help us gain some performance. Let's see how we can use a persistent lookup cache for this performance improvement.
        Read More »

        SCD Type 6 Implementation using Informatica PowerCenter

        In one of our prior articles we described the SCD Type 6 dimensional modeling technique. This technique is the combination of SCD Type 1, Type 2 and Type 3, which gives much more flexibility in terms of the number of queries it can answer, but of course at the cost of complexity. In this article let's discuss the step-by-step implementation of SCD Type 6 using Informatica PowerCenter.
        Read More »

        SCD Type 4, a Solution for Rapidly Changing Dimension

        SCD Type 2 is designed to generate a new record for every change of a dimension attribute, so that the complete change history can be tracked correctly. When we have dimension attributes which change very frequently, the dimension grows very rapidly, causing considerable performance and maintenance issues. In this article let's see how we can handle this rapidly changing dimension issue using SCD Type 4.

        Let's consider a customer dimension with the following structure. Customer attributes such as Name, Date of Birth and Customer State change very rarely or not at all, whereas the Age Band, Income Band and Purchase Band are expected to change much more frequently.

        If this Customer dimension is used by an organization with 100 million customers, we can expect it to grow to 200 or 300 million records, assuming at least two or three changes per customer in a year.

        SCD Type 4, a Solution for Rapidly Changing Dimension

        Add Mini Dimension

        We can split the dimension into two dimensions: one with the attributes which change less frequently, and one with the attributes which change frequently, as shown below. The frequently changing attributes are grouped into the Mini Dimension. 

        SCD Type 4, a Solution for Rapidly Changing Dimension

        The Mini Dimension will contain one row for each possible combination of attributes. In our case all possible combinations of AGE_BAND, INCOME_BAND and PURCHASE_BAND will be available in CUST_DEMO_DIM with the surrogate key CUST_DEMO_KEY.

        If we have 20 different Age Bands, four different Income Bands and three Purchase Bands, we will have 20 X 4 X 3 = 240 distinct possible combinations. These values can be populated into the Mini Dimension table once and for all, with surrogate keys ranging from 1 to 240; a sketch of this one-time load is shown below.
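        A minimal SQL sketch of this one-time load, assuming the band values are kept in small reference tables (the table and column names are illustrative):

        INSERT INTO CUST_DEMO_DIM (CUST_DEMO_KEY, AGE_BAND, INCOME_BAND, PURCHASE_BAND)
        SELECT ROW_NUMBER() OVER (ORDER BY A.AGE_BAND, I.INCOME_BAND, P.PURCHASE_BAND),
               A.AGE_BAND,
               I.INCOME_BAND,
               P.PURCHASE_BAND
          FROM AGE_BAND_REF A
         CROSS JOIN INCOME_BAND_REF I
         CROSS JOIN PURCHASE_BAND_REF P;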

        Note: The Mini Dimension does not store historical attribute values; instead, the fact table preserves the history of the dimension attribute assignments.

        Below is the model for the Customer dimension with a Mini Dimension for the Sales data mart.

        SCD Type 4, a Solution for Rapidly Changing Dimension

        Mini Dimension Challenges

        When the Mini Dimension itself starts growing rapidly, multiple Mini Dimensions can be introduced to handle such scenarios. If there are no fact records to associate the main dimension with the Mini Dimension, a factless fact table can be used to associate them; a sketch is shown below.
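        A minimal sketch of such a factless fact table, assuming illustrative names (the Mini Dimension key CUST_DEMO_KEY follows the model above; CUST_KEY and AS_OF_DT are assumptions):

        CREATE TABLE CUST_DEMO_FACTLESS
        (
          CUST_KEY       NUMBER NOT NULL,
          CUST_DEMO_KEY  NUMBER NOT NULL,
          AS_OF_DT       DATE   NOT NULL
        );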

        Hope you enjoyed this. Please leave us a comment in case you have any questions or difficulties implementing this.

        Build Reusable Code in Informatica PowerCenter Using Mapplets

        Reusability is a great feature in Informatica PowerCenter which developers can take advantage of. Its general purpose is to reduce unnecessary coding, which ultimately reduces development time and increases supportability. In this article let's see how we can build a Mapplet in Informatica PowerCenter to make your code reusable.

        What is a Mapplet

        A Mapplet is a reusable object that you create in the Mapplet Designer. It contains a set of transformations and lets you reuse the transformation logic in multiple mappings. When you use a mapplet in a mapping, you use an instance of the mapplet. Any change made to the mapplet is inherited by all instances of the mapplet.

        Solution

        Let's consider a real-world scenario for the demonstration.

        The Sales Department is interested in getting both quarterly and yearly sales. This calculation is required in multiple ETL processes, so we decided to create reusable code using a Mapplet.
        • Build a mapplet that uses multiple sources and aggregate functions.
        • Create a variable within the mapplet for use in the aggregate functions.

        Mapplet Layout

        Reuse Informatica PowerCenter Code Using Mapplets

        I. Set the Mapplet Designer Options

        1. Manually create a Source Qualifier to pull in data from multiple source definitions. To build a custom Source Qualifier, you must set the Mapplet Designer Options correctly.
        2. Select TOOLS | OPTIONS.
        3. Click the Format tab.
        4. In the Category section, choose Mapplet Designer from the pull-down list.
        5. In the Tables section, uncheck the Create Source Qualifiers When Opening Sources box.
        6. Click OK.

        II. Create a New Mapplet

        1. Switch to Mapplet Designer.
          1. Select TOOLS | MAPPLET DESIGNER, or
          2. Click on the Mapplet Designer button on the toolbar.
        2. Create a Mapplet, by selecting MAPPLETS | CREATE.
        3. Name the mapplet MPLT_QtrSales_x.

        III. Analyze the source tables.

        1. Bring the source definitions ITEMS, ORDER_ITEMS, and ORDERS into the Mapplet Designer Workspace by dragging them from the Navigator Window into the workspace.
        2. Create the Source Qualifier either from TRANSFORMATIONS | CREATE, or use the Source Qualifier icon from the transformation toolbar. Name it SQ_SalesByQtr_x.

        IV. Create an Aggregator transformation

        1. Create an Aggregator transformation and name it Agg_SalesByQtr_x
        2. Copy and link ITEM_ID and ITEM_NAME from the Source Qualifier (SQ_SalesByQtr_x) into the Aggregator (Agg_SalesByQtr_x).
        3. Double-click on Agg_SalesByQtr_x to edit the Aggregator.
        4. On the Columns tab, add the following ports:
        • YEAR
        • MONTH
        • Q1SALES
        • Q2SALES
        • Q3SALES
        • Q4SALES
      • Enter the expression for YEAR as TO_CHAR(SQ_SalesByQtr_x.DATE_ENTERED, 'YYYY')
        NOTE: Notice that you now have a new input port DATE_ENTERED in your Aggregator transformation. The local input port is automatically added when the external reference made to the SQ_SalesByQtr_x is validated.
      • Build the expressions for the Variable and Output ports as follows:
      • Group records by: ITEM_ID, ITEM_NAME and YEAR.
      • Add the Aggregate functions in the expressions; a sketch of these port expressions is shown after this list. Your Ports tab will look something like the image below.  
        Reuse Informatica PowerCenter Code Using Mapplets
      • Select the Properties tab.
      • Check the Sorted Input box.
      • Exit the Edit Transformation dialog box by clicking the OK button.
      • Edit the Source Qualifier.
      • You must now identify to Informatica that the data will be sorted by ITEM_ID, ITEM_NAME, and DATE_ENTERED.
        NOTE: The ports in the Source Qualifier must be in the same order as the ports in the Aggregator, in order to facilitate the correct summarization by the groupings you have specified, above.
      • Double-click the Source Qualifier transformation.
      • Select the Properties tab.
      • Open the SQL query window by clicking on the browse button next to the SQL Query attribute.
      • Click the Generate SQL button.
      • Append the following text to the end of the default SQL statement: ORDER BY ITEMS.ITEM_ID, ITEMS.ITEM_NAME, ORDERS.DATE_ENTERED
        Reuse Informatica PowerCenter Code Using Mapplets
      • Enter the ODBC data source, User name, and Password given by your instructor.
      • Click the Validate button.
      • Confirm that there are no errors in the SQL.
      • Click OK to exit the SQL editor.
      • Click OK to exit the Source Qualifier Transformation.
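        For reference, a minimal sketch of the Aggregator port expressions described above. The QUANTITY and PRICE input ports are assumed to come from ORDER_ITEMS, and MONTH is a variable port used inside the conditional aggregate functions (all names are illustrative):

        YEAR    (output)   : TO_CHAR(DATE_ENTERED, 'YYYY')
        MONTH   (variable) : GET_DATE_PART(DATE_ENTERED, 'MM')
        Q1SALES (output)   : SUM(QUANTITY * PRICE, MONTH <= 3)
        Q2SALES (output)   : SUM(QUANTITY * PRICE, MONTH > 3 AND MONTH <= 6)
        Q3SALES (output)   : SUM(QUANTITY * PRICE, MONTH > 6 AND MONTH <= 9)
        Q4SALES (output)   : SUM(QUANTITY * PRICE, MONTH > 9)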
        V. Create a Mapplet Output Transformation

        1. Select TRANSFORMATION | CREATE, or click on the Mapplet Output transformation icon on the Transformations toolbar. Name it Output_SalesByQtr_x.
        2. Select all of the output ports of the Aggregator and drag them into the mapplet Output transformation.
        3. Select MAPPLET | VALIDATE from the menu.
        4. Verify the results of the mapplet validation in the Output Window.
        5. Save the repository.
        We are all done and below is the completed Mapplet layout.

        Reuse Informatica PowerCenter Code Using Mapplets

        Video Tutorial



        Hope you enjoyed this tutorial. Please let us know if you have any difficulties in trying out these exercises.