- 8+ years of professional IT experience, including 3 years of experience in Big Data technologies.
- Hands-on experience with major components of the Hadoop ecosystem, including Hadoop MapReduce, HDFS, YARN, Cassandra, Hive, Pig, HBase, Sqoop, Oozie, and Flume.
- Good knowledge of MapReduce design patterns.
- Experience with distributed systems, large-scale non-relational data stores, NoSQL map-reduce systems, data modeling, database performance tuning, and multi-terabyte data warehouses.
- Worked extensively with Hive and Pig to perform data analysis.
- Experience in managing HBase databases and using them to update and modify data quickly.
- Experience in importing and exporting data between HDFS and relational database systems using Sqoop.
- Experience in ingesting log data into HDFS using Flume.
- Experience in managing and reviewing Hadoop log files.
- Involved in developing complex ETL transformations and performance tuning.
- Experience in developing, supporting, and maintaining ETL (Extract, Transform, and Load) processes using Talend Integration Suite.
- Experience in submitting Talend jobs for scheduling using the Talend scheduler available in the Admin Console.
- Experience in using Spark to process large streams of data.
- Experienced in running MapReduce and Spark jobs over YARN.
- Good knowledge of MongoDB concepts and architecture.
- Hands-on experience in application development using Java, RDBMS, and Linux shell scripting.
- Excellent working knowledge of the System Development Life Cycle (SDLC), Software Testing Life Cycle (STLC), and Defect Life Cycle.
- Well versed in installing, configuring, supporting, and managing Big Data and the underlying infrastructure of Hadoop clusters.
Talend and Hadoop Developer
Confidential, Overland Park, KS
- Worked closely with Business Analysts to review the business specifications of the project and gather the ETL requirements.
- Created Talend jobs using Talend FTP components to copy files from one server to another.
- Created and managed source-to-target mapping documents for all fact and dimension tables.
- Analyzed the source data to assess data quality using Talend Data Quality.
- Wrote SQL queries using joins to access data from Oracle and MySQL.
- Prepared ETL mapping documents for every mapping and a data migration document for smooth transfer of the project from development to testing and then to production.
- Designed and implemented ETL to load data from heterogeneous sources into SQL Server and Oracle target databases, including fact tables and Slowly Changing Dimensions (SCD Type 1 and SCD Type 2).
- Utilized Big Data components like tHDFSInput, tHDFSOutput, tPigLoad, tPigFilterRow, tPigFilterColumn, tPigStoreResult, tHiveLoad, tHiveInput, tHbaseInput, tHbaseOutput, tSqoopImport and tSqoopExport.
- Used the most common Talend components (tMap, tDie, tConvertType, tFlowMeter, tLogCatcher, tRowGenerator, tSetGlobalVar, tHashInput, tHashOutput, and many more).
- Created many complex ETL jobs for data exchange between the database server and various other systems, including RDBMS, XML, CSV, and flat file structures. Integrated Java code inside Talend Studio using components such as tJavaRow, tJava, tJavaFlex, and Routines.
- Used Talend's debug mode to debug jobs and fix errors.
- Responsible for developing, supporting, and maintaining ETL (Extract, Transform, and Load) processes using Talend Integration Suite.
- Conducted JAD sessions with business users and SMEs to better understand the reporting requirements.
- Developed Talend jobs to populate claims data into the data warehouse star schema.
- Used the Talend Admin Console Job Conductor to schedule ETL jobs on a daily, weekly, monthly, and yearly basis.
- Worked with various Talend components such as tMap, tFilterRow, tAggregateRow, tFileExist, tFileCopy, tFileList, tDie, etc.
- Worked extensively with the Talend Admin Console and scheduled jobs in Job Conductor.
Environment: Talend Open Studio, Talend Administrator Console, MS SQL Server 2012/2008, Oracle 11g, Hive, HDFS, Java, Sqoop, TOAD, UNIX.
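The SCD Type 2 loading mentioned in this role can be illustrated with a minimal in-memory Java sketch. It is not the Talend job itself: the class `Scd2Sketch`, the `DimRow` fields, and the method `applyChange` are hypothetical names chosen for illustration, and a single tracked attribute is assumed. When a tracked attribute changes, the current row is closed out and a new current row is appended, so history is preserved (a Type 1 load would overwrite the attribute in place instead).

```java
import java.time.LocalDate;
import java.util.List;

class Scd2Sketch {
    // One version of a dimension member (hypothetical layout).
    static final class DimRow {
        final String businessKey;
        final String attribute;
        final LocalDate validFrom;
        LocalDate validTo;      // null while the row is the current version
        boolean current = true;

        DimRow(String businessKey, String attribute, LocalDate validFrom) {
            this.businessKey = businessKey;
            this.attribute = attribute;
            this.validFrom = validFrom;
        }
    }

    // Applies one incoming source record to the dimension using Type 2 semantics.
    static void applyChange(List<DimRow> dimension, String key,
                            String attribute, LocalDate loadDate) {
        for (DimRow row : dimension) {
            if (row.current && row.businessKey.equals(key)) {
                if (row.attribute.equals(attribute)) {
                    return; // nothing changed, keep the current version
                }
                // Expire the old version instead of overwriting it.
                row.current = false;
                row.validTo = loadDate;
                break;
            }
        }
        // New key, or changed attribute: add a fresh current version.
        dimension.add(new DimRow(key, attribute, loadDate));
    }
}
```

In a warehouse load the same logic would typically be expressed as an update of the expiring row followed by an insert of the new version.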
Hadoop and Talend Developer
Confidential, Ottawa, KS
- Processed the data and pushed valid records to HDFS. Imported data from MySQL to HDFS using Sqoop.
- Tuned MapReduce, Pig, and Hive jobs to improve performance and reduce execution time.
- Compressed files downloaded from the servers before storing them in the cluster to save cluster resources.
- Wrote core Java programs to convert JSON files to CSV or TSV files for further processing.
- Optimized existing long-running MapReduce and Pig jobs for better performance and accurate results.
- Created Hive databases and tables over the HDFS data and wrote HiveQL queries on the tables.
- Scheduled Hadoop and UNIX jobs using Oozie.
- Worked with NoSQL databases such as HBase.
- Wrote Pig and Hive UDFs for processing and analyzing log files.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Visualized complex data analyses on dashboards as per the business requirements.
- Integrated Hive, Pig, and MapReduce jobs with Elasticsearch to publish metrics to the dashboards.
- Used the most common Talend components, such as tMap, tFilterRow, tAggregateRow, tFileExist, tFileCopy, tFileList, tDie, etc.
- Used Big Data components such as tSqoopExport, tSqoopImport, tHDFSInput, tHDFSOutput, tHiveLoad, tHiveInput, tPigLoad, tPigFilterRow, tPigFilterColumn, tPigStoreResult, tHbaseInput, and tHbaseOutput, executed the jobs in debug mode, and used the tLogRow component to view sample output.
- Submitted Talend jobs for scheduling using the Talend scheduler available in the Admin Console.
- Deployed Talend jobs to various environments, including dev, test, and production.
- Involved in the analysis, design, and testing phases and responsible for documenting the technical specifications.
Environment: Hadoop 2.x, YARN, HDFS, MapReduce, Pig, Hive, HBase, Shell Scripting, Java, Oozie, Talend Open Studio, Linux.
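The JSON to CSV/TSV conversion listed in this role can be sketched in core Java without external libraries. This is a minimal illustration under loud assumptions, not the original program: the class `JsonLinesToCsv` and its methods are hypothetical, the input is assumed to be one flat JSON object per line (no nesting, no arrays, no escaped quotes inside strings), and bare values such as numbers are copied verbatim. A real job would more likely use a full JSON parser.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JsonLinesToCsv {
    // Matches "key": "string" or "key": bareValue in a flat object.
    // Assumption: no escape sequences and no nested objects or arrays.
    private static final Pattern FIELD = Pattern.compile(
        "\"([^\"\\\\]*)\"\\s*:\\s*(?:\"([^\"\\\\]*)\"|([^,}\\s]+))");

    // Extracts the key/value pairs of one flat JSON object.
    static Map<String, String> parseFlatObject(String line) {
        Map<String, String> record = new LinkedHashMap<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            String value = m.group(2) != null ? m.group(2) : m.group(3);
            record.put(m.group(1), value);
        }
        return record;
    }

    // Quotes a field if it contains the delimiter, a quote, or a newline.
    static String escape(String field, char delimiter) {
        if (field == null) {
            return ""; // missing key becomes an empty cell
        }
        boolean needsQuotes = field.indexOf(delimiter) >= 0
            || field.indexOf('"') >= 0 || field.indexOf('\n') >= 0;
        return needsQuotes ? "\"" + field.replace("\"", "\"\"") + "\"" : field;
    }

    // Converts JSON lines to delimited lines; use ',' for CSV or '\t' for TSV.
    static List<String> convert(List<String> jsonLines, List<String> columns,
                                char delimiter) {
        String sep = String.valueOf(delimiter);
        List<String> out = new ArrayList<>();
        out.add(String.join(sep, columns)); // header row
        for (String line : jsonLines) {
            Map<String, String> record = parseFlatObject(line);
            List<String> cells = new ArrayList<>();
            for (String column : columns) {
                cells.add(escape(record.get(column), delimiter));
            }
            out.add(String.join(sep, cells));
        }
        return out;
    }
}
```

Passing the column list explicitly keeps the output schema stable even when some input records omit a key.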
Sr. ETL/Talend Developer
Confidential, Overland Park, KS
- Participated in Client Meetings and gathered business requirements and analyzed them.
- Designed, developed, tested, implemented, and supported data warehousing ETL using Talend and Hadoop technologies.
- Designed and implemented ETL processes to import data from and into Microsoft Azure.
- Researched, analyzed, and prepared logical and physical data models for new applications and optimized the data structures to improve data load times and end-user data access response times.
- Created Pig and Hive scripts to process various types of data sets and load them into a data warehouse built on Hive.
- Developed stored procedures and views in Snowflake and used them in Talend for loading dimensions and facts.
- Developed merge scripts to UPSERT data into Snowflake from an ETL source.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Created complex mappings in Talend using tMap, tJoin, tReplicate, tParallelize, tJava, tJavaRow, tJavaFlex, tAggregateRow, tDie, tWarn, tLogCatcher, etc.
- Created joblets in Talend for processes that can be reused in most jobs in a project, such as Start Job and Commit Job.
- Developed jobs to move inbound files to the vendor server location on a monthly, weekly, and daily basis.
- Implemented Change Data Capture in Talend to load deltas into the data warehouse.
- Performed ETL from different sources such as databases, flat files, and XML files.
- Migrated the Snowflake database to Windows Azure and updated the connection strings as required.
- Managed and reviewed Hadoop log files.
- Wrote ETL jobs to read from web APIs using REST and HTTP calls and loaded the data into HDFS using Java and Talend.
- Shared responsibility for administration of Hadoop, Hive, Pig, and Talend.
- Tested raw data and executed performance scripts.
- Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
Environment: Talend, HDFS, HBase, MapReduce, Eclipse, XML, JUnit, Microsoft Azure, Hadoop, Apache Pig, Hive, Java, Elasticsearch, Web Services, Microsoft Office.
SQL Server SSIS ETL Developer
Confidential, Overland Park, KS
- Created highly complex SSIS packages using various data transformations such as Conditional Split, Fuzzy Lookup, For Each Loop, Multicast, column conversion, Fuzzy Grouping, and Script Components.
- Created scripts for data validation and massaging of legacy data in the staging area before moving it to the DSS.
- Scheduled jobs to run SSIS packages at night to feed daily data into the Decision Support System.
- Wrote documentation for the packages, scripts and the jobs created for DTS to SSIS migration.
- Created various reports, including graphs, charts, matrix, drill-down, drill-through, parameterized, sub-reports, and linked reports.
- Deployed the reports to the Report Manager. Performed troubleshooting and maintenance of the reports, with enhancements or changes as needed.
- Set up report subscriptions as required by the BRD and management.
- Dropped and recreated Indexes in the SQL Server DSS while migrating legacy data to SQL Server 2005.
- Used BCP to transfer data.
- Generated server-side T-SQL scripts for data manipulation and validation and created various snapshots and materialized views for remote instances.
- Modified existing databases by adding and removing tables, thereby altering referential integrity, primary key constraints, and relationships according to requirements.
- Created traces in SQL Profiler and used Database Engine Tuning Advisor for performance tuning of stored procedures and scripts.
- Documented the migration process, reports and the DSS structure and various objects.
- Worked comfortably with a combination of Agile and Spiral methodologies.
- Facilitated meetings between various development teams, DBAs, and Testing teams for timely progress of the migration process.
- Participated in testing during the UAT (QA testing). Mentored and monitored team development and provided regular team status updates to the management.
Environment: SQL Server 2005/2000, DTS, SSIS, SSRS, SQL Profiler, .NET Framework 3.5, C#, Visual SourceSafe 2005.