I have encountered a situation.
I want to pull data from a webpage and, using whatever transformation is needed, load it into a target (Oracle, DB2, Teradata, flat file, or XML file). Any format is fine.
The data will be updated every day on the webpage.
So I have to extract that data using any transformation and load it into any format.
Please suggest a solution. I thought of using the HTTP transformation. Please let me know how to set the parameters, etc. Can you explain it step by step?
Can you send me a step-by-step solution? I have never done this type of ETL before.
There are a couple of ways you can address the issue.
1. An XML file can be generated from the front end. Import the XML definition into Informatica. Each time the XML is generated, it can trigger a workflow through web services; the workflow uses the XML as its source and loads the data into a relational database.
PS: You need to use the XML Source Qualifier to parse the XML in Informatica.
2. The second approach would be to use the Unstructured Data transformation. If the contents of the webpage can be saved to a text file, then the text file can be used as input to the transformation. You can then load it into a table column with a data type of BLOB or CLOB.
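For the second approach, the "save the webpage to a text file" step could be done outside Informatica by a small scheduled script. What follows is only a minimal sketch in Python using the standard library; the function name save_page and its parameters are made up for illustration, and the URL and target path are whatever applies in your environment.

```python
import urllib.request
from pathlib import Path


def save_page(url: str, target: str, timeout: float = 30.0) -> int:
    """Download `url` and write the raw bytes to `target`.

    Hypothetical helper: returns the number of bytes written so the
    caller can check that something was actually downloaded.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read()
    Path(target).write_bytes(body)
    return len(body)


# Example call (both arguments are placeholders, not real locations):
# save_page("<page URL>", "<path of the file the mapping reads>")
```

The saved file can then be used as the input described in approach 2. Run the script from a scheduler, or as a pre-session command, before the workflow starts.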
Let me know if this helps.
Thanks for responding and helping me find a solution. I am new to Informatica.
Can you tell me the step-by-step approach, i.e. how to do it in Informatica, which transformations I have to use, and which parameters I have to set? And how can I save a web page as XML? Can you please let me know how to do that?
Sorry for the delay in answering.
The XML file can be prepared from the front end, whether the front end is developed in .NET or Java. Once the XML is prepared, it can be used as a source in an Informatica mapping. Please look into the XML source details in the developer's guide.
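To give an idea of what "preparing the XML" means, here is a rough sketch in Python rather than .NET or Java. It turns a list of records into a simple XML document. The element names (rates, rate, date, value) and the function name rows_to_xml are assumptions for illustration only; the real structure has to match the XML definition you import into Informatica.

```python
import xml.etree.ElementTree as ET


def rows_to_xml(rows, root_tag="rates", row_tag="rate"):
    """Build an XML string with one <row_tag> element per record.

    `rows` is a list of dicts; each key becomes a child element name,
    so the keys must be valid XML names. All tag names are illustrative.
    """
    root = ET.Element(root_tag)
    for row in rows:
        item = ET.SubElement(root, row_tag)
        for name, value in row.items():
            ET.SubElement(item, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Example with made-up data:
# rows_to_xml([{"date": "2010-01-04", "value": 0.05}])
```

Whatever produces the file, the important part is that its structure stays the same on every run so that the imported XML definition keeps matching.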
Thanks for responding.
I do not have any information about how the website or its front end is built. It is a federal government (Federal Reserve) website. They update the data periodically, and I have to get the data on a weekly basis. Here is the page from which I have to pull the data
Thanks for helping out. Please let me know how to pull the data from it, and can you describe the steps in more detail?
Thanks for the update. Just wanted to know: where are they updating the data? Do they have an underlying database for it?
Saranik, First of all thanks for the help.
They do not have a particular time for updating the data (as far as I know). They update it on a weekly or fortnightly basis. I do not know whether the Federal Reserve has a database for that or not, but I believe they maintain one to store the data each day.
I really appreciate you taking the time to respond and assist me.
On the top right side of the webpage you provided, there is a "Data Download Program" link that lets you download the data as a CSV file.
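Once the CSV file is downloaded, it can be read as an ordinary flat file source. If you want to check or clean it first, something like the Python sketch below would do. It is only an illustration: the column names date and value and the function name read_observations are assumptions, since the actual layout of the downloaded file has to be checked by opening it.

```python
import csv
import io


def read_observations(csv_text, date_col="date", value_col="value"):
    """Return (date, value) pairs from CSV text with a header row.

    Rows whose value is blank or not numeric are skipped. The column
    names are placeholders, not the real headers of the download.
    """
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        raw = (record.get(value_col) or "").strip()
        try:
            value = float(raw)
        except ValueError:
            continue  # blank or non-numeric placeholder, ignore the row
        rows.append((record[date_col].strip(), value))
    return rows


# Example with made-up data:
# read_observations("date,value\n2010-01-04,0.05\n")
```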
I think you could use Informatica's Unstructured Data Option (UDO), where unstructured HTML data is first converted into XML and then the data from the XML tags is written to a .csv file. All of this happens in UDO. The .csv file that gets created can be read by PowerCenter to load the data into a table. I am exploring this option myself these days; the only difference is that I am pulling data from unstructured Excel files. I hope this information helps. Just an FYI, UDO is a separate module that you or your company will need to buy a licence for. I believe it does not come with PowerCenter.
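To make the HTML-to-.csv idea concrete, here is a small Python sketch that pulls the cells of an HTML table and writes them out as CSV text. To be clear, this is not UDO and not how UDO works internally; it is a plain standard-library stand-in for the same kind of conversion, and the names TableExtractor and html_table_to_csv are made up for illustration. It assumes the page contains a simple, non-nested table.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_table_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()
```

The resulting CSV text can be written to a file and read by PowerCenter as a flat file source, which is the last step described above.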
Thanks for responding. Is the Unstructured Data Option (UDO) a transformation? Can you provide me with detailed information about it?
And can you tell me how to pull data from an Excel file? I have a workbook with multiple sheets, but I am unable to pull the data from it.
Can you let me know how to pull the data from an Excel workbook that has multiple sheets and some calculations?
As I said in the previous post, the Unstructured Data Option (UDO) is a separate module in itself, with a dedicated client and server. You need a separate licence to use it; it does not come with PowerCenter. And it is not a transformation that you get in Designer.
Use the Help in Designer to learn how to read data from Excel. It is under "Working with Sources" and is very well explained. This will answer both the second and third questions you asked.