Is the HParser Commercial Edition available for Hadoop the same parser that is currently distributed as Data Transformation Studio, Version 9.1.0, Build ID 17.0? If not, how do they differ?
Would projects created in either of the two parsing design environments run unaltered in the other environment?
Can existing Data Transformation Studio (Version 9.1.0, Build ID 17.0) projects be run or called via a command-line interface?
HParser and Data Transformation share the same engine and studio; a transformation implemented in one will work in the other.
HParser itself is the jar provided by Informatica to run Data Transformation as a MapReduce job within Hadoop.
I suggest the following:
1. Download HParser community edition from Informatica marketplace - https://community.informatica.com/solutions/1679
2. It bundles the relevant components as well as docs.
3. Take a few minutes to view the recorded end-to-end demo under the demo tab; it explains usage.
Here is a sample HParser execution command as a MapReduce job.
From the name node, where the HParser jar is located, run the following MapReduce command:
hadoop jar dt-hadoop-0.1.6-job.jar com.informatica.b2b.dt.hadoop.DataTransformationJob -Ddt.debug=true -Dmapred.child.env=IFCONTENTMASTER_HOME=/usr/lib/hparser/,LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/hparser/bin -Dmapred.child.java.opts=-Xmx200M -Djava.library.path=/usr/lib/hparser/bin <hdfs_input> <hdfs_output> <transformation name>
<hdfs_input> stands for the HDFS folder where the input files reside.
<hdfs_output> is the name of the HDFS output folder where the output file will be created; make sure it doesn't exist before you run this command.
<transformation name> stands for the name of the DT transformation you would like to execute.
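The command above can be wrapped in a small script so the three placeholders become arguments. This is a sketch, not an official Informatica utility: the jar name and /usr/lib/hparser paths are taken from the sample command, the default HDFS paths and transformation name are placeholders, and the script only echoes the final command line (a dry run) — remove the `echo` to actually submit the job.

```shell
#!/bin/sh
# Dry-run wrapper around the HParser MapReduce invocation shown above.
# Adjust HPARSER_HOME and JAR to match your installation.
HPARSER_HOME=/usr/lib/hparser
JAR=dt-hadoop-0.1.6-job.jar

IN_DIR=${1:-/user/hparser/input}     # <hdfs_input>: HDFS folder with input files
OUT_DIR=${2:-/user/hparser/output}   # <hdfs_output>: must NOT exist before the run
TX_NAME=${3:-MyTransformation}       # <transformation name>: placeholder name

# Drop the leading `echo` to submit the job for real. Before doing so, remove
# any stale output folder, e.g.: hadoop fs -rm -r "$OUT_DIR"
echo hadoop jar "$JAR" com.informatica.b2b.dt.hadoop.DataTransformationJob \
  -Ddt.debug=true \
  -Dmapred.child.env=IFCONTENTMASTER_HOME=$HPARSER_HOME/,LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HPARSER_HOME/bin \
  -Dmapred.child.java.opts=-Xmx200M \
  -Djava.library.path=$HPARSER_HOME/bin \
  "$IN_DIR" "$OUT_DIR" "$TX_NAME"
```

Run it as, for example, `sh run_hparser.sh /data/in /data/out CSV_to_XML` once you have verified the printed command matches your environment.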
Hope it helps,
The jar - dt-hadoop-0.1.6-job.jar - is provided as part of the download zip file available via the Informatica marketplace link above.
It should be placed on the name node of your Hadoop cluster, the one you will initiate the MapReduce job from.
A pdf guide is provided as part of this zip as well; it contains detailed setup and execution instructions.
Please let me know if I can be of any further assistance.
This is what I have done so far:
1. Hadoop setup - standalone mode.
2. RPM executed; the three required directories are created.
3. Following the configuration doc -
Configure one node in the cluster as the command node for running Data Transformation jobs.
i. Copy the HParser JAR file to the command node - where do I put this jar (path)?
ii. Create the HParser configuration file, and then save it to the command node - where do I put this conf file (path)?
4. I also have HParser Studio; how do I install that?
The jar and config file should be placed together under a path accessible to you.
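The staging step above can be sketched as a couple of shell commands. The directory name and the config file name here are placeholders, not paths mandated by the HParser documentation — use whatever directory your user can access and the config file name you created per the guide:

```shell
# Stage the HParser jar and its configuration file side by side on the
# command (name) node. ./hparser is an example location; any directory
# accessible to you will do.
STAGE_DIR="${STAGE_DIR:-./hparser}"
mkdir -p "$STAGE_DIR"

# Copy in the two files you extracted from the marketplace download zip,
# e.g. (uncomment and adjust the config file name to yours):
#   cp dt-hadoop-0.1.6-job.jar "$STAGE_DIR"/
#   cp <your-config-file> "$STAGE_DIR"/

ls -l "$STAGE_DIR"   # verify both files end up together
```

Then initiate the `hadoop jar` command from that directory so the jar and config resolve together.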
To install the HParser Studio, simply unzip the HParserStudio901.zip file, run the setup file, and follow the on-screen instructions.
We can do it all over webex together if you wish.
I am done with the jar and config file. I was trying to run Setup.exe for HParser Studio but realized that Setup.exe is meant for Windows, so either I'll have to run it using Wine on Linux, or it's meant to be run and used on Windows only.
WebEx is not possible. Thanks for the help offered; I'll let you know if I can arrange it.
HParser Studio is to be used on Windows only. Once a transformation is designed on Windows, it is deployed (copied) to the ServiceDB folder on your non-Windows run-time environment and executed as a MapReduce job via the jar provided by Informatica.
If you install Linux or Ubuntu 12.04 Server, you can get connectivity to a Windows box via a Samba file server. The files, disks, paths, and HDFS had me confused, and a quick intro demo on setting up and testing a single-node cluster is helpful.
I'm working with the setup that "Michael G. Noll" has on the internet and will be setting up the MapR single-node cluster configuration shortly. As a matter of clarification, HParser is Eclipse-aware on the Windows OS. By following Noll's brief tutorial, substituting Ubuntu 12.04 Server, Hadoop-1.0.4.tar.gz, and Oracle JDK 1.7 for the Java, everything worked flawlessly, and the folder questions became clearer. You can of course run GNOME or even Ubuntu Desktop on the server to assist your setup.
The VMware image from MapR ran poorly on my hardware at home. You could work with HParser under it if you had adequate hardware, which I would suggest is at minimum 8 GB of memory, preferably 16 GB, with full VT support so that a 64-bit OS can be used under VMware Player. If you do have adequate hardware, VMware is probably the easiest means to get HParser working; Wine is of course the alternative.
Finally, MapR has a download of the M3 edition and a VM available for immediate use, but I still suggest "pendrivelinux" running from a USB stick if your hardware lacks adequate memory to run under VMware.