Wednesday, June 1, 2011

ODI - Manual Commits

--

In this post, we will look at one of the ways to manually commit data in ODI.
There are cases during data loads where you want to commit data when all your interfaces have run successfully. If anyone fails , commit should not be issued.

This is in context of a database only.

Most of the KM's provide a "COMMIT" option which can be either set to true or false.

What happens when the COMMIT option is set to false ? Does it commits the data or it does not?

Note: All the cases below have interfaces with COMMIT option set to false.

Case 1)  Executing an Interface or scenario for an Interface -

Lets a take simple example of a KM - IKM SQL Control Append applied to an Interface EMP_LOAD.


Note: TRUNCATE option is set to true.

Executing the scenario for the interface EMP_LOAD.


The COMMIT task was not executed.  The only task that were executed were Truncate and Insert.
But the data in the table was still populated.


This clearly indicates that ODI will issue COMMIT at the end of a successful session.

Case 2) Where an Interface successfully completes and other interface fails in an ODI Package - 

Here is an example of an ODI package with three interfaces

Dropped the target table for interface JOBS_LOAD so that the package fails.

Executing the package

DEPT_LOAD interface executed sucessfully and the JOBS_LOAD interface failed. The session failed.
The data was inserted into Department table by DEPT_LOAD interface but it was not committed as the session failed.


Case 3) Where all Interface successfully completes in an ODI Package -

Executing the package after fixing the Interface.

The ODI package executed successfully , no commits were issued by any of the Interface.
But looking into any of  the table, data is populated. As seen in Case 1 , ODI issues commit at end of an successful session.


Conclusion - Manual Commits can be achieved by having all the interfaces within an ODI package with COMMIT option set to false.

Few things to consider:
1) Adding scenarios of the interface to the package will not work as ODI will launch a new session for each
    scenario.
2) DDL's cannot be reversed.
3) Performance can degrade, infact manual commits should be avoided whenever possible.


--

Monday, May 23, 2011

ODI Session Restart - Scenarios & Load Plans

--


In this post we will look at ODI session restarts. Basically, how ODI behaves when you restart a session and different restart options for Load Plans.

We will consider three cases:

Case 1) Restarting a Scenario with no sub-scenarios -

I have here is a "LOAD_ALL" package which is executing three steps (interfaces using IKM SQL Control Append). I have dropped the target table so that JOBS_LOAD interface fails.
Executing the ODI package



ODI generated a session number 99045 and the interface JOBS_LOAD failed at task "Truncate target table".

ODI stores all the task (commands in your KM or procedure)execution details in the work repository table SNP_SESS_TASK_LOG


After fixing the error and restarting the session 99045. ODI restarts the session from the failed task which is the "Truncate target table" for interface JOBS_LOAD. If the interface had failed while executing the "Insert new rows" task then ODI would have restarted it from the same task.



In cases where there are no sub-scenarios inside an scenario , ODI will restart from the failed task rather than the failed step.
If you create a scenario for an ODI object(like Interface,Procedure) , restarting the failed scenario will result in same behavior.

Case 2) Restarting a Scenario with sub-scenarios -

Created a scenario for each of the three interfaces and added them in a package LOAD_ALL_SCEN.
Again before executing the package , i will drop the table to fail the interface JOBS_LOAD.



Executing the scenario for the package

ODI created session for each scenario.  The Parent session 101045 (scenario for package LOAD_ALL_SCEN ) spawns a new session (102045,103045) for each sub-scenario (interfaces). The session 103045 failed (scenario for JOBS_LOAD interface).

Fixed the error and restarted the parent session 101045.

The parent session starts from the failed step. It does not starts from the failed task.. In this case it created a new session 104045 for the JOBS_LOAD scenario.
Note: Sessions can be restarted at the failed sub-scenario(step). This should be avoided because they execute the current failed step and stop. Restarts should be done at top most parent session.

View of the SNP_SESS_TASK_LOG -


Case 3) Restarting Load Plans -

To explain the restart concept , we will take an example of simple load plans with no exception handling ( If anything fails load plans fails ).

Load plans provides different restart options.  Executing a load plan creates a session.

Load plans consists of steps (Serial,Parallel,Case,When,Else). Steps can have sub-steps and scenarios. The scenarios are the lowest entity in the Load Plan.

The first step is always a Serial step. In the example above its named as the ROOT_STEP.

If a scenario under a step fails then the step itself fails.

Restart options at the Step level of Load Plans
1)  Restart from failure -
     If the load plan is restarted then the scenario which failed will be the place where restart will initiate.
     Look  at the restart options at Scenario level.
2)  Restart all children -
     This will re-excute(new sessions) all the successfully executed scenarios, the scenario which failed will be
     restarted as per the restart option set at the scenario level.


Restart options at the Scenario level of Load Plans
Note: The "Step" discussed here is this section is a step within an ODI Package which is executed as a scenario.
1) Restart from new Session -
    The failed scenario will be re-executed with a new session.
    If the scenario has sub-scenarios as steps, a new session will be launched again for each scenario.
2) Restart from failed Step -
    If the failed scenario executes other scenarios (sub-scenarios) as steps then a new session will be launched
    for failed sub-scenario as discussed in "Case 2".
    If there is no sub-scenario as step then restart will start from the failed step.
3) Restart from failed Task -
    If the failed scenario has sub-scenario as steps then a new session will be launched for failed sub-scenario.
    If there is no sub-scenario as step then restart will start from the failed task as discussed in "Case 1" above.

Load plans provide additional benefits of restart which are not provided by packages.
A good practice is always create scenarios for each and every object.

--

ODI - Groovy JDBC connection

--

Groovy is now Officially supported in ODI 11.1.1.5.
It provides another option to a team whose developers are comfortable using Groovy and also opens new ways to integrate,code which adds more flexibility.
Now Groovy can be used in Knowledge Modules and ODI Procedures.

In this post we will look at how to make JDBC connections using Groovy in ODI.
It's always nice to know which version of groovy is uesd in ODI (documentation does not reflects).


Snippet of code to connect using Groovy to different databases in ODI using ODI Procedure (No rocket science).

Command on Source:



Command on Target:



Executing the procedure on a standalone agent will print the output:



Throwing an exception to print the Groovy version on the operators output -




Executing the procedure -



 

The Groovy version is 1.7.4.

--

Monday, February 21, 2011

OWB and Microsoft Sharepoint - Part II

--

In Part-I, I had covered how data\documents can be inserted\uploaded into Sharpoint. In this post, I will demonstrate how to extract data. Please read the previous post, if you have not read it.

The extraction can be done in two ways.
1) Use the code template mapping to invoke the webservice and store the XML in to a table. Then use a conventional mapping to extract the data using SQL.
2) Use the code template mapping to invoke the webservice to hold the data into C$ table and then parse it using a table function or leveraging XDB in the  Default Oracle Code Template.

This blog post will demonstrate using the first approach.
I will be extracting the data that was inserted into sharepoint list "Employee" from the previous post.

Extracting data from sharepoint using GetListItems -

Selecting soap action "GetListItems" under "lists.asmx" webservice provides details which can be used to extract listitems from sharepoint.


CAML Format to extract the listitems -

Target is to extract all the listitems. The default row limit of SharePoint is 100 rows which can be overrided as shown below.

Note: If you want to extract data incrementally, all the list have inbuild column "Modified" which can be used as the filter in the CAML query.

Details of how CAML for GetListItems can be found here on msdn. CAML can be constructed to select the columns of your choice,order them , filter them , etc.




Data required to invoke the webservice to get listitems.



Logical View of the Mapping to extract sharepoint listitems -




Execution View of the Mapping to get listitems -



After executing the mapping -



All the listitems (rows) returned in the XML are in the CLOB field.

Parsing the XML -

Conventional OWB Map using SQL Override feature to parse the XML.



Undbounded View Query -

The Unbounded view using the table EMPLOYEE_LISTITEMS.  The query parses the XML to get all the listitems.


Executing the Map -


Integrating with SharePoint  - change data can be easily tracked but there is no webservice which returns deleted data.  Either a minus operation after complete extraction or creating a webservice on sharepoint server can easily fix this issue.

Webservices are slow by nature but again with parallel option and cdc techniques data can be easily pushed into sharepoint.

--

Sunday, February 20, 2011

OWB and Microsoft Sharepoint - Part I

--

In this blog post, We will look at one of the ways to integrate with beast of a content management tool "Microsoft SharePoint Portal".

Sharepoint is used in wide variety of ways like for collaboration, content management , reporting tool , etc within an organization. There is always a need to integrate with it. Some organziations need the data from sharepoint into the warehouse or move data from database into sharepoint.

How difficult is integration with Sharepoint ? Can data be pushed in realtime into sharepoint or extracted incrementally from sharepoint ? Can documents be handled ?

Well I will try to answer these questions in this post with a solution and a demo.

For demo will be concentrate on Sharepoint Listitems and attaching\uploading documents (Yes Documents) to listitems using OWB 11gR2. (ODI blogs will follow soon :-) )

Sharepoint is built on .NET framework and there is absolutely no support for JAVA. No jar files. No JDBC drivers. Thats one of the reason, so many ETL\ELT tools in the market do not have any adapter to integrate with sharepoint. As far as I know only Microsoft's ETL tool SSIS comes with an adapter.

Here are few more options to extract and push data into sharepoint -
1) Microsoft BDC (Business Data Catalog) or BCS (Business connectivity services) sharepoint utilities to sync data between databases and sharepoint lists.
2) Sharepoint Object Model programming
3) Third party products
4) MOSS Webservices ( CAML + SOAP )

Top 3 options belong to the .NET family and they need to be coded\configured\executed at sharepoint server.

If you are only limited to using MOSS webservices (which is the case in almost all cases in all the projects - nobody wants to pay for additional licenses) here is a way to go. I will try to explain most of the things in this post in detail to explain how sharepoint works.

MOSS webservices exposes every action(add\delete\update) in sharepoint.
It is a soap based webservice with the infamous CAML (Collabartive Application Markup Language ). CAML contains specific groups of tags to both define and display data. In short, CAML is like a sql query with a XML syntax. You can find more details about CAML over here.

There are couple of challenges using MOSS webservices -
1) IIS Authentication (NTLM V1/V2) used by sharepoint to authenticate users
2) Document handling - requires base64 binary format

If you have tried Oracle PL/SQL to invoke MOSS webservice , it fails with "401 Authentication" error. This is because of the IIS authentication. Oracle PL/SQL just supports basic authentication.Many unanswered queries can be found on the forum.

To integrate with sharepoint, I will use a custom Code Template which would use CAML.
Code template takes care of all the challenges and abstract's the complexity from the user. One code template to invoke\consume sharepoint service.This makes mapping design very simple.




The code template uses SOAP protocol and designed using Java , Jython and Java Bean Shell.
Sharepoint Portal contains many sites\sub-sites ( like a database schema). Each site\subsite have list (imagine them like tables). There can be multiple views(like a database view) on top of a list but one view is created by default. List have listitems (basically rows).MOSS Webservices works for each site\subsite.





"Employee" List will be the demo example.

Note: Custom Metadata Interface (CMI) (Java based OWB api) can be used to reverse engineer sharepoint lists to represent it within OWB as tables and code templates can be created to generated CAML on the fly. This is out of context of this blog post but i will try to demonstrate CMI in a future blog post with some other simple example.

To invoke MOSS webservices, I will use the technique as demonstrated in one of my previous blogpost about invoking/consuming RESTful webservice using OWB11gR2.

MOSS webservice allows to perform different kinds of action on entire site\subsite like add\update\delete list\listitems , uploading documents, adding users , groups etc. Every SharePoint site has different webservices for it and every webservice has multiple actions (soapactions) to carry out different tasks.

Listitems webservice:

On (any) sharepoint site homepage:
http://sharepoint.yourorganization.net/sites/yoursite/orsubsite/default.aspx
replace "default.aspx" with "_vti_bin/lists.asmx"
this will give you the webservice endpoint
http://sharepoint.yourorganization.net/sites/yoursite/orsubsite/_vti_bin/lists.asmx
to handle list and listitems.
Soap Action and Webservice Endpoint will be used as inputs by the code template.




Inserting Listitmes :

Selecting operation "UpdateListItems" under "lists.asmx" webservice provides details of the soap message to insert/update/delete listitems.



CAML Format generation for Inserting Listitems -

The details of how to create the CAML query for "UpdateListItems" can be found here.
For the example above ,  soap message for insert will look like this.
All the column names of Employee list are substituted in the soap message.

Note: The "%s" will be replaced by values from the rows by the code template.




I will be using the default Employees table which is under the sample HR schema , to insert all rows from it into the sharepoint portal. I had customized the table to include three more columns .
LISTNAME(1st column) - SharePoint Listname under which all the data will appear as listitems
SUCCESS_FLAG(Second last column) - Flag to indicate whether the record is successfully updated\inserted\deleted in sharepoint
LISTITEM_ID (last column) - SharePoint generates ID(unique number) for each row which can be used to update\delete data from sharepoint




Logical view of the code template mapping to insert listitmes -

Note: Sharepoint assigns every listitem with a unique number called ID (like a surrogate in sharepoint). This is very handy to update delete records from sharepoint.


The MOSS_XML field will store the return XML after invoking sharepoint webservice from which ID will be extracted and stored in LISTITEM_ID column.





Execution view of the mapping to insert listitems-

The important inputs are CAML_FORMAT , WEBSERVICE_ENDPOINT and SOAP_ACTION.




Once the mapping is executed , the results can be seen. The Code template does the required IIS Authentication and updates the SUCCESS_FLAG for inserted row in SharePoint and captures the ID from the sharepoint server.





The Sharepoint View -






Attaching(Uploading) document to SharePoint Listitems:

Selection operation "AddAttachment" under "lists.asmx" webservice provides details of soap message to upload documents to listitems.


CAML Format to upload files as attachment -

The CAML format is simple soap message.




The list of files to be uploaded.





Logical view of the map to upload the files -



Execution view of the mapping to upload attachments -

The column FILE_PATH which holds the path along with filename is specified as input. Code template reads the file and converts it into base64 binary to upload it to sharepoint.


After executing the mapping.


The sharepoint view -


On clicking the listitiems, the attachments can be viewed.




Using the Code template approach any sharepoint webservice can be invoked. All kinds of things like handling users,groups,lists,calenders,etc can be managed easily.

Customizing the template to include Journalized data and now you can push data in near realtime into sharepoint.

In the Part-II , I will cover extraction of data from sharepoint into oracle and discuss the pros and cons.

--