- 7+ years of professional IT experience, including 3 years in Big Data technologies.
- 3+ years of experience in Talend Big Data Integration.
- Worked extensively with Hive and Pig for data analysis.
- Experience managing HBase databases and using them for fast data updates and modifications.
- Experience importing and exporting data between HDFS and relational database systems using Sqoop.
- Hands-on experience with major Hadoop ecosystem components, including MapReduce, Spark, HDFS, YARN, Cassandra, Hive, Pig, HBase, Sqoop, Oozie, and Kafka.
- Good knowledge of MapReduce and Spark design patterns.
- Experience with file formats such as XML, JSON, CSV, and Avro.
- Experience with distributed systems, large-scale non-relational data stores, NoSQL map-reduce systems, data modeling, database performance tuning, and multi-terabyte data warehouses.
- Experience in ingesting log data into HDFS using Kafka.
- Experience in managing and reviewing Hadoop log files.
- Developed complex ETL transformations and performed performance tuning.
- Experience developing, supporting, and maintaining ETL (Extract, Transform, and Load) processes using Talend Integration Suite.
- Experience submitting Talend jobs for scheduling through the Talend scheduler available in the Admin Console.
- Experience using Spark to process large data streams.
- Experienced in running MapReduce and Spark jobs over YARN.
- Good knowledge of Talend integration with AWS and Azure.
- Hands-on experience in application development using Java, RDBMS, and Linux shell scripting.
- Excellent working knowledge of the System Development Life Cycle (SDLC), Software Testing Life Cycle (STLC), and Defect Life Cycle.
- Well versed in installing, configuring, supporting, and managing Big Data platforms and the underlying Hadoop cluster infrastructure.
- Experienced in working with Agile methodologies.
- Experience with the version control tools SVN and GitHub.
- Excellent analytical capabilities and good communication skills.
- Ability to handle multiple tasks and work independently as well as in a team.
- Proven history of outstanding performance in high-pressure environments.
- Good leadership and interpersonal skills; committed and hardworking, with a zeal to learn new technologies and undertake challenging tasks.
Big Data Ecosystems: Spark SQL, HiveQL, HBase, Sqoop, HDFS, YARN, Kerberos, TD Encryption
Hadoop Distributions: Hortonworks 2.4, Hortonworks 2.5, Cloudera
ETL Tools: Talend Big Data Integration 6.3, Talend Data Integration, Talend Admin Console, IBM DataStage, SSIS
NoSQL Databases: HBase, Cassandra
SQL Databases: Oracle, MySQL
Programming Languages: Java, SQL, Pig Latin, Scala
Tools and IDE: Eclipse, IntelliJ
Defect Management Tools: HPQC, Jira
Version Control Tools: TortoiseSVN, GitHub
Talend Big Data Developer
- Worked closely with Business Analysts to review the project's business specifications and gather the ETL requirements.
- Created many complex ETL jobs for data exchange to and from database servers and various other systems, including RDBMS, XML, CSV, and flat-file structures. Integrated Java code in Talend Studio using components such as tJavaRow, tJava, and tJavaFlex, as well as routines.
- Used Talend's debug mode to troubleshoot jobs and fix errors.
- Responsible for developing, supporting, and maintaining ETL (Extract, Transform, and Load) processes using Talend Integration Suite.
- Conducted JAD sessions with business users and SMEs to better understand the reporting requirements.
- Used Spark to process large data sets.
- Created Talend jobs to copy files from one server to another using Talend FTP components.
- Created and managed source-to-target mapping documents for all fact and dimension tables.
- Analyzed source data using Talend Data Quality to assess data quality.
- Wrote SQL queries with joins to access data from Oracle and MySQL.
- Prepared ETL mapping documents for every mapping and a data migration document to ensure smooth transfer of the project from development to testing and then to production.
- Designed and implemented ETL to load data from heterogeneous sources into SQL Server and Oracle target databases, including fact tables and Slowly Changing Dimensions.
- Used commonly used Talend components (tMap, tDie, tConvertType, tFlowMeter, tLogCatcher, tRowGenerator, tSetGlobalVar, tHashInput, tHashOutput, and others).
- Used Big Data components such as tHDFSInput, tHDFSOutput, tPigLoad, tPigFilterRow, tPigFilterColumn, tPigStoreResult, tHiveLoad, tHiveInput, tHBaseInput, tHBaseOutput, tSqoopImport, and tSqoopExport.
- Developed Talend jobs to load claims data into a star-schema data warehouse.
- Used Talend Admin Console Job Conductor to schedule ETL jobs on a daily, weekly, monthly, and yearly basis.
- Worked extensively with Talend Admin Console and scheduled jobs in Job Conductor.
Environment: Talend Real-Time Big Data Platform, Talend Administrator Console, Oracle 11g, MS SQL Server 2012/2008, Hive, HBase, HDFS, Spark, Java, Sqoop, UNIX.
Sr. ETL/Talend Developer
Confidential, Denver, CO
- Processed the data and pushed the valid records to HDFS.
- Imported data from MySQL to HDFS using Sqoop.
- Tuned MapReduce, Pig, and Hive jobs to improve performance and reduce execution time.
- Compressed files downloaded from the servers before storing them in the cluster to save cluster resources.
- Wrote core Java programs to convert JSON files to CSV or TSV files for further processing.
- Optimized existing long-running MapReduce and Pig jobs for better performance and more accurate results.
- Created Hive databases and tables over the HDFS data and wrote HiveQL queries on the tables.
- Scheduled Hadoop and UNIX jobs using Oozie.
- Worked with NoSQL databases such as HBase.
- Wrote Pig and Hive UDFs for processing and analyzing log files.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Visualized complex data analysis on dashboards according to business requirements.
- Integrated Hive, Pig, and MapReduce jobs with Elasticsearch to publish metrics to the dashboards.
- Used common Talend components such as tMap, tFilterRow, tAggregateRow, tFileExist, tFileCopy, tFileList, and tDie.
- Used Big Data components such as tSqoopExport, tSqoopImport, tHDFSInput, tHDFSOutput, tHiveLoad, tHiveInput, tPigLoad, tPigFilterRow, tPigFilterColumn, tPigStoreResult, tHBaseInput, and tHBaseOutput; executed jobs in Debug mode and used the tLogRow component to view sample output.
- Submitted Talend jobs for scheduling through the Talend scheduler available in the Admin Console.
- Deployed Talend jobs to various environments, including dev, test, and production.
- Participated in the analysis, design, and testing phases and documented the technical specifications.
Environment: Hadoop 2.x, YARN, HDFS, MapReduce, Pig, Hive, HBase, Shell Scripting, Java, Oozie, Talend, Linux.
Big Data Talend Developer
Confidential, Hoffman Estates, IL
- Participated in client meetings to gather and analyze business requirements.
- Designed, developed, tested, implemented, and supported data warehousing ETL using Talend and Hadoop technologies.
- Designed and implemented ETL processes to import data from and into Microsoft Azure.
- Researched, analyzed, and prepared logical and physical data models for new applications and optimized the data structures to improve data load times and end-user data access response times.
- Created Pig and Hive scripts to process various types of data sets and load them into a data warehouse built on Hive.
- Developed stored procedures and views in Snowflake and used them in Talend to load dimensions and facts.
- Developed merge scripts to UPSERT data into Snowflake from an ETL source.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Created complex mappings in Talend using tMap, tJoin, tReplicate, tParallelize, tJava, tJavaRow, tJavaFlex, tAggregateRow, tDie, tWarn, tLogCatcher, etc.
- Created Talend joblets for processes reused across most jobs in a project, such as Start job and Commit job.
- Developed jobs to move inbound files to vendor server location based on monthly, weekly and daily frequency.
- Implemented Change Data Capture in Talend to load deltas into a data warehouse.
- Performed ETL from different sources such as databases, flat files, and XML files.
- Migrated the Snowflake database to Windows Azure and updated the connection strings as required.
- Managed and reviewed Hadoop log files.
- Wrote ETL jobs to read from web APIs using REST and HTTP calls and load the data into HDFS using Java and Talend.
- Shared responsibility for administration of Hadoop, Hive, Pig, and Talend.
- Gained good experience reading messages from JMS queues.
- Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
Environment: Talend, HDFS, HBase, MapReduce, Eclipse, XML, JUnit, Hadoop, Apache Pig, Hive, Java, JMS, Elasticsearch, Web Services, Microsoft Office.
SQL Server SSIS ETL Developer
- Deployed the reports to the Report Manager. Troubleshot and maintained the reports, applying enhancements or changes as needed.
- Set up report subscriptions as required by the BRD and management.
- Dropped and recreated Indexes in the SQL Server DSS while migrating legacy data to SQL Server 2005.
- Used BCP to transfer data.
- Created highly complex SSIS packages using various data transformations such as Conditional Split, Fuzzy Lookup, Foreach Loop, Multicast, column conversion, Fuzzy Grouping, and Script Component.
- Created scripts for data validation and massaging of legacy data in the staging area before moving it to the DSS.
- Scheduled jobs to run SSIS packages nightly to feed daily data into the Decision Support System.
- Wrote documentation for the packages, scripts and the jobs created for DTS to SSIS migration.
- Created various reports, including graphs, charts, matrix, drill-down, drill-through, parameterized, sub-reports, and linked reports.
- Generated server-side T-SQL scripts for data manipulation and validation and created various snapshots and materialized views for remote instances.
- Modified existing databases by adding and removing tables, thereby altering referential integrity, primary key constraints, and relationships according to requirements.
- Created traces in SQL Profiler and used Database Engine Tuning Advisor to tune the performance of stored procedures and scripts.
- Documented the migration process, reports and the DSS structure and various objects.
- Worked comfortably with a combination of Agile and Spiral methodologies.
- Facilitated meetings between various development teams, DBAs, and Testing teams for timely progress of the migration process.
- Participated in testing during the UAT (QA testing). Mentored and monitored team development and provided regular team status updates to the management.
Environment: SQL Server 2005/2000, DTS, SSIS, SSRS, SQL Profiler, .NET Framework 3.5, C#, Visual SourceSafe 2005.