Sunday, February 20, 2011

OWB and Microsoft Sharepoint - Part I

--

In this blog post, We will look at one of the ways to integrate with beast of a content management tool "Microsoft SharePoint Portal".

Sharepoint is used in wide variety of ways like for collaboration, content management , reporting tool , etc within an organization. There is always a need to integrate with it. Some organziations need the data from sharepoint into the warehouse or move data from database into sharepoint.

How difficult is integration with Sharepoint ? Can data be pushed in realtime into sharepoint or extracted incrementally from sharepoint ? Can documents be handled ?

Well I will try to answer these questions in this post with a solution and a demo.

For demo will be concentrate on Sharepoint Listitems and attaching\uploading documents (Yes Documents) to listitems using OWB 11gR2. (ODI blogs will follow soon :-) )

Sharepoint is built on .NET framework and there is absolutely no support for JAVA. No jar files. No JDBC drivers. Thats one of the reason, so many ETL\ELT tools in the market do not have any adapter to integrate with sharepoint. As far as I know only Microsoft's ETL tool SSIS comes with an adapter.

Here are few more options to extract and push data into sharepoint -
1) Microsoft BDC (Business Data Catalog) or BCS (Business connectivity services) sharepoint utilities to sync data between databases and sharepoint lists.
2) Sharepoint Object Model programming
3) Third party products
4) MOSS Webservices ( CAML + SOAP )

Top 3 options belong to the .NET family and they need to be coded\configured\executed at sharepoint server.

If you are only limited to using MOSS webservices (which is the case in almost all cases in all the projects - nobody wants to pay for additional licenses) here is a way to go. I will try to explain most of the things in this post in detail to explain how sharepoint works.

MOSS webservices exposes every action(add\delete\update) in sharepoint.
It is a soap based webservice with the infamous CAML (Collabartive Application Markup Language ). CAML contains specific groups of tags to both define and display data. In short, CAML is like a sql query with a XML syntax. You can find more details about CAML over here.

There are couple of challenges using MOSS webservices -
1) IIS Authentication (NTLM V1/V2) used by sharepoint to authenticate users
2) Document handling - requires base64 binary format

If you have tried Oracle PL/SQL to invoke MOSS webservice , it fails with "401 Authentication" error. This is because of the IIS authentication. Oracle PL/SQL just supports basic authentication.Many unanswered queries can be found on the forum.

To integrate with sharepoint, I will use a custom Code Template which would use CAML.
Code template takes care of all the challenges and abstract's the complexity from the user. One code template to invoke\consume sharepoint service.This makes mapping design very simple.




The code template uses SOAP protocol and designed using Java , Jython and Java Bean Shell.
Sharepoint Portal contains many sites\sub-sites ( like a database schema). Each site\subsite have list (imagine them like tables). There can be multiple views(like a database view) on top of a list but one view is created by default. List have listitems (basically rows).MOSS Webservices works for each site\subsite.





"Employee" List will be the demo example.

Note: Custom Metadata Interface (CMI) (Java based OWB api) can be used to reverse engineer sharepoint lists to represent it within OWB as tables and code templates can be created to generated CAML on the fly. This is out of context of this blog post but i will try to demonstrate CMI in a future blog post with some other simple example.

To invoke MOSS webservices, I will use the technique as demonstrated in one of my previous blogpost about invoking/consuming RESTful webservice using OWB11gR2.

MOSS webservice allows to perform different kinds of action on entire site\subsite like add\update\delete list\listitems , uploading documents, adding users , groups etc. Every SharePoint site has different webservices for it and every webservice has multiple actions (soapactions) to carry out different tasks.

Listitems webservice:

On (any) sharepoint site homepage:
http://sharepoint.yourorganization.net/sites/yoursite/orsubsite/default.aspx
replace "default.aspx" with "_vti_bin/lists.asmx"
this will give you the webservice endpoint
http://sharepoint.yourorganization.net/sites/yoursite/orsubsite/_vti_bin/lists.asmx
to handle list and listitems.
Soap Action and Webservice Endpoint will be used as inputs by the code template.




Inserting Listitmes :

Selecting operation "UpdateListItems" under "lists.asmx" webservice provides details of the soap message to insert/update/delete listitems.



CAML Format generation for Inserting Listitems -

The details of how to create the CAML query for "UpdateListItems" can be found here.
For the example above ,  soap message for insert will look like this.
All the column names of Employee list are substituted in the soap message.

Note: The "%s" will be replaced by values from the rows by the code template.




I will be using the default Employees table which is under the sample HR schema , to insert all rows from it into the sharepoint portal. I had customized the table to include three more columns .
LISTNAME(1st column) - SharePoint Listname under which all the data will appear as listitems
SUCCESS_FLAG(Second last column) - Flag to indicate whether the record is successfully updated\inserted\deleted in sharepoint
LISTITEM_ID (last column) - SharePoint generates ID(unique number) for each row which can be used to update\delete data from sharepoint




Logical view of the code template mapping to insert listitmes -

Note: Sharepoint assigns every listitem with a unique number called ID (like a surrogate in sharepoint). This is very handy to update delete records from sharepoint.


The MOSS_XML field will store the return XML after invoking sharepoint webservice from which ID will be extracted and stored in LISTITEM_ID column.





Execution view of the mapping to insert listitems-

The important inputs are CAML_FORMAT , WEBSERVICE_ENDPOINT and SOAP_ACTION.




Once the mapping is executed , the results can be seen. The Code template does the required IIS Authentication and updates the SUCCESS_FLAG for inserted row in SharePoint and captures the ID from the sharepoint server.





The Sharepoint View -






Attaching(Uploading) document to SharePoint Listitems:

Selection operation "AddAttachment" under "lists.asmx" webservice provides details of soap message to upload documents to listitems.


CAML Format to upload files as attachment -

The CAML format is simple soap message.




The list of files to be uploaded.





Logical view of the map to upload the files -



Execution view of the mapping to upload attachments -

The column FILE_PATH which holds the path along with filename is specified as input. Code template reads the file and converts it into base64 binary to upload it to sharepoint.


After executing the mapping.


The sharepoint view -


On clicking the listitiems, the attachments can be viewed.




Using the Code template approach any sharepoint webservice can be invoked. All kinds of things like handling users,groups,lists,calenders,etc can be managed easily.

Customizing the template to include Journalized data and now you can push data in near realtime into sharepoint.

In the Part-II , I will cover extraction of data from sharepoint into oracle and discuss the pros and cons.

--

4 comments:

  1. Hi, very good post. Just have one question, how were you able to address the first challenge regarding NTLM authentication?

    Thanks!

    ReplyDelete
  2. Hi

    Have you got an ODI vers of this yet??

    ReplyDelete
  3. Anybody knows how to do same with ODI?

    ReplyDelete
  4. Naveen,

    You will have to write a custom KM or Open tool to achieve the same in ODI.

    ReplyDelete