- 8.5+ years of experience in Information Technology and 3.4+ years of experience in Big Data ecosystem-related technologies.
- 3.5+ years of experience in Big Data technologies using Talend Open Studio, Hadoop, Spark, Pig & Hive.
- Experience in scheduling jobs on Job Conductor and creating execution plans on Talend Admin Console (TAC).
- 1+ years of experience in Talend MDM 6.x.
- Experience in Informatica, Amazon S3 and Redshift.
- Used ETL methodologies and best practices to create Talend ETL jobs for Data Services on the Big Data platform (version 6.2.1).
- Developed ETL processes (Talend Open Studio & Informatica) to load data from multiple data sources into HDFS using Splunk and Sqoop, and performed structural modifications using MapReduce, Spark, and Hive.
- Experience with MapR/Hortonworks clusters, including management of single-node and multi-node Hadoop clusters (Hadoop 2.5.1-mapr-1501).
- Experience in requirement analysis, development, implementation, testing, documenting and maintenance of BI applications.
- Excellent programming skills in Python, Core Java and SQL.
- Experience in continuous monitoring and managing the Hadoop cluster through Resource Manager.
- Very good knowledge in Business Intelligence, Data warehousing concepts, Agile methodology.
- Experienced in Hive schema design, data imports, and analysis.
- In-depth and extensive knowledge of analyzing data using HiveQL, Pig Latin & HBase.
- Hands-on experience in writing Pig Latin scripts, working with grunt shell and job scheduling with Oozie.
- Good knowledge of Tableau and Qlik.
- Strong understanding of NoSQL databases like HBase, MongoDB & Cassandra.
- Strong understanding of Data Modeling and experience with Data Cleansing, Data Profiling and Data analysis.
- Experience in ETL (Talend) analysis, designing, developing, testing and implementing ETL processes including performance tuning and query optimizing of databases.
- Experience in extracting source data from Sequential files, XML files, Excel files, transforming and loading it into the target data warehouse.
- Experience in writing SQL Queries, T-SQL, SAS, writing stored procedures, functions and statements for data transformation and manipulation.
- Experience in working with version control tools like GitHub and BitBucket.
- Good knowledge of scalable, secure cloud architecture based on Amazon Web Services (leveraging AWS cloud services such as EC2, CloudFormation, VPC, S3, etc.).
- Strong communication skills and a professional attitude; able to work under pressure and drive work with enthusiasm and full commitment.
- Authorized to work in the US for any employer
Confidential, Atlanta, GA
- Worked on the Data Integration team to perform data and application integration, with the goal of moving data effectively, efficiently, and with high performance to support business-critical projects involving large-scale data extraction.
- Developed Spark jobs in Python to perform big data analytics on ingested data in Amazon S3.
- Designed and developed the Business Rules and workflow system in Talend MDM 6.2.2.
- Developed Talend ETL jobs to push data into Talend MDM and developed jobs to extract data from MDM.
- Developed data validation rules to confirm the golden record.
- Developed data matching/linking rules to standardize records in Talend MDM.
- Performed technical analysis, ETL design, development, testing, and deployment of IT solutions as needed by business or IT.
- Performed data manipulations using various Talend components such as tMap, tJavaRow, tJava, tSqlRow, tMSSqlInput, tJDBCInput, tJDBCOutput, and many more.
- Used tStatCatcher, tDie, and tLogRow to create a generic joblet that stores processing stats in a database table to record job history.
- Analyzed the source data to know the quality of data by using Talend Data Quality.
- Troubleshoot data integration issues and bugs, analyze reasons for failure, implement optimal solutions, and revise procedures and documentation as needed.
- Worked on migration projects to migrate data from data warehouses on SQL Server to Netezza.
- Used SQL queries and other data analysis methods, as well as the Talend Enterprise Data Quality Platform, to profile and compare data, informing decisions on how to measure business rules and data quality.
- Worked on the Talend RTX ETL tool; developed jobs and scheduled them in Talend Integration Suite.
- Used Talend reusable components such as routines, context variables, and globalMap variables.
- Responsible for tuning ETL mappings, workflows, and the underlying data model to optimize load and query performance in Spark.
- Created Talend Mappings to load data from S3 to Amazon Redshift DWH.
- Supported existing MapReduce programs developed in Java.
- Developed Talend ESB services and deployed them on ESB servers on different instances.
- Implemented fast and efficient data acquisition using Big Data processing techniques and tools.
- Created projects in TAC, assigned appropriate roles to developers, and integrated GitHub.
- Used the Talend Admin Console Job Conductor to schedule ETL jobs on a daily, weekly, monthly, and yearly basis.
- Monitored and supported the Talend jobs scheduled through Talend Admin Center (TAC).
- Utilized Agile Scrum Methodology to help manage and organize offshore team of 12 developers with regular code review sessions.
Environment: Talend, Spark, Python, Hadoop, Scala, Amazon S3, Amazon Redshift, Core Java.
Talend/Hadoop/Data Lake Developer
Confidential, Milwaukee, WI
- Gathered the business requirements from the Business Partners and Subject Matter Experts.
- Involved with ingesting data received from various providers, on HDFS for big data operations.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data in various formats such as text, zip, XML, and JSON.
- Wrote MapReduce jobs to perform big data analytics on ingested data using Pig scripts.
- Expertise in writing Pig UDFs in Python.
- Installed and configured Pig and wrote Pig Latin scripts.
- Used Sqoop on a regular basis to import data from Oracle into HDFS and HBase, depending on requirements.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Created Talend mappings to populate data into dimension and fact tables.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW tables and historical metrics.
- Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
- Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
- Led the design and development of solutions in Talend, an open-source ETL tool, for data integration: extracting data from sources such as Salesforce, SAP, flat files, and in-house systems, then transforming and loading it into the data warehouse.
- Debugged and troubleshot complex Talend ETL processes.
- Supported MapReduce programs running on the cluster and wrote custom MapReduce scripts in Java for data processing.
- Utilized Agile Scrum Methodology to help manage and organize a team of 6 developers with regular code review sessions.
- Weekly meetings with technical collaborators and active participation in code review sessions with senior and junior developers.
- Worked on various Talend components such as tMap, tHive components, tAggregateRow, tFile components, tHDFS components & tPig components.
- Used ETL methodologies and best practices to create Talend ETL jobs.
Environment: Pig, Hive, HBase, MapReduce, Sqoop, Talend, Splunk, Core Java, Pentaho, SFTP.
Confidential, Minneapolis, MN
- Primary responsibilities include building scalable distributed data solutions using Hadoop ecosystem.
- Knowledge of Hadoop architecture and various components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
- Developed custom MapReduce programs to analyze data and used Pig Latin to clean unwanted data.
- Built reusable Hive UDF libraries.
- Used Sqoop to import data from RDBMS (Oracle and SQL Server) into the Hadoop Distributed File System (HDFS) and later analyzed the imported data using the Talend ETL tool.
- Used Talend components such as tMap, tSqlInput, tSqlOutput, tFileDelimited, tFileOutputDelimited, tMSSqlOutputBulkExec, tUnique, tFlowToIterate, tIntervalMatch, tLogCatcher, tFlowMeterCatcher, tFileList, tAggregate, tSort, tHDFSInput, tHDFSOutput, tFilterRow, tSchemaComplianceCheck, tHiveConnection, tHiveRow, and tHiveClose.
- Responsible for validating and cleansing the data.
- Developed Oozie workflows for scheduling and orchestrating the ETL process.
- Created Hive tables and applied HiveQL queries to them, which automatically invoked and ran MapReduce jobs.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data, and analyzed them by running Hive queries and Pig scripts.
- Extensively used Pig for data cleansing and created partitioned tables in Hive.
- Supported MapReduce programs running on the cluster.
- Performed Hadoop installation, updates, patches and version upgrades when required.
- Performed cluster maintenance, monitoring, and troubleshooting; managed and reviewed data backups and log files.
- Provided production rollout support, including monitoring the solution post-go-live and resolving issues discovered by the client and client services teams.
Environment: HDFS, Talend, Hive (0.7.1), Sqoop (v1), Core Java, Flume, ZooKeeper, Oozie, Oracle 11g/10g, SQL Server 2008, HBase, PL/SQL, SQL*Plus, Windows NT, Linux, UNIX shell scripting.
- Participated in business requirements analysis, data modeling, and creating design documents.
- Designed and created ETL processes to migrate data from Relational and Flat File sources into the target RDBMS.
- Developed mappings by implementing complex business rules using Router, Lookup, Expression, Sequence Generator, Rank, Joiner, Update Strategy, Union etc. to manipulate the data flow from source to target with versioning concepts included.
- Used the Constraint Based Loading option in sessions to load tables related by primary and foreign keys.
- Designed mappings to generate reports and send notifications to users.
- Used command tasks in workflows to execute shell scripts.
- Used Workflow Manager to create various tasks and used the Workflow Monitor to monitor the workflows.
- Created post-session and pre-session shell scripts and email notifications.
- Scheduled workflows and shell scripts using Autosys.
- Tuned the mappings and sessions for optimum performance.
- Documented Design and Unit Test plans for Informatica mappings, design and validation rules.
- Wrote stored procedures, Functions & Triggers to support the ETL processes.
- Helped create the OBIEE repository (.rpd), which involved schema import, implementation of business logic through customization, dimensional hierarchies, aggregate navigation, level-based measures, security management (data-level/object-level security), variables, initialization blocks, and cache management.
- Determined the data sources, imported the source tables into the physical layer using connection pool objects, and populated the metadata into schema objects.
- Created Dashboards by embedding reports and providing intuitive drilldowns and links to exploit the full benefits of Analytics.
Environment: Informatica PowerCenter 8.6 & 9.1, OBIEE 10.1.3x, Oracle 11g, Sybase 12, SQL, PL/SQL, Windows 2003 Server, Sun Solaris UNIX.
- Designed ETL mappings based on existing ETL logic.
- Involved in developing Informatica mappings and tuned them for better performance.
- Extensively used ETL to load data from flat files to Oracle.
- Made adjustments in Data Model and SQL scripts to create and alter tables.
- Worked extensively on SQL, PL/SQL and UNIX shell scripting.
- Coordinated with the client's Business & Systems teams for QA.
- Worked on a new data model to form a single security master for all the data marts and retrofitted existing and new securities into the FACT tables.
- Extracted data from flat files and other RDBMS databases into the staging area and populated the data warehouse.
- Used Session parameters, Mapping variable/parameters and created Parameter files for imparting flexible runs of workflows based on changing variable values.
- Worked as a fully contributing team member.
Environment: Informatica PowerCenter 8.1, Oracle 9i, PL/SQL, MySQL, SQL Developer, HP Quality Center, MS Office, Business Objects XI, and Linux.