
Senior Hadoop Developer Resume


Chicago, IL

SUMMARY:

  • 9+ years of IT experience in gathering and analyzing customers' technical requirements and in development, management and maintenance projects on platforms such as Hadoop, QlikView and Talend.
  • Expertise in processing 25 PB of data across 700-node development and production clusters.
  • Excellent understanding of Hadoop architecture and the different daemons of a Hadoop cluster, including Job Tracker, Task Tracker, Name Node and Data Node.
  • Working experience with the Snowflake Elastic Data Warehouse, a cloud-based data warehouse for storing and analyzing data.
  • Designed and developed an automation framework that reduced the cost and effort of routine development tasks.
  • Working experience with Talend components such as tJavaRow, tMap, tHMap and tJDBC, context variables, HBase, Hive and Pig components, and Big Data batch and streaming processing components.
  • Experience working with Hadoop in standalone, pseudo-distributed and fully distributed modes.
  • Hands-on experience with big data tools such as Facebook Presto, Apache Drill and Snowflake.
  • Hands-on experience with Kafka streaming and Spark Streaming in Scala.
  • Proficient in writing MapReduce programs and using the Apache Hadoop API to analyze structured and unstructured data.
  • Experienced in handling different file formats such as text files, Sequence files and JSON files.
  • Expertise in implementing ad-hoc queries using HiveQL.
  • Performed extensive data validation using Hive dynamic partitioning and bucketing (a minimal HiveQL sketch follows this summary).
  • Working experience with Apache Phoenix, a massively parallel relational database layer for analyzing data on Apache Hadoop.
  • Implemented data quality and price-gap rules in the Talend ETL tool.
  • Expertise in developing Hive generic UDFs to implement complex business logic and incorporate it into HiveQL.
  • Experienced in using aggregate functions and table-generating functions, and in implementing UDFs to handle complex objects.
  • Experienced with join optimizations such as map joins and sorted bucketed map joins, along with Hive merge, update and delete operations and the Hue interface (see the join-tuning sketch at the end of this summary).
  • Worked with QlikView Extensions like SVG Maps, HTML Content.
  • Developed Set Analysis to provide custom functionality in QlikView application.
  • Used binary load, resident load, preceding load and incremental load during data modelling.
  • Experienced in performance-tuning Hive queries using configurable Hive parameters.
  • Developed Pig Latin scripts to extract data from data files and load it into HDFS.
  • Experience using Apache Sqoop to import and export data between different source systems and HDFS/Hive.
  • Worked with file formats such as RCFile, SequenceFile, ORC and Avro.
  • Hands-on experience configuring and working with Flume to load data from multiple sources directly into HDFS.
  • Good at working on low-level design documents and System Specifications.
  • Experience working with BI teams to translate big data requirements into Hadoop-centric solutions.
  • Good experience with Hadoop administration, including cluster configuration, single-node and multi-node setup, and installation in distributed environments.
  • Trained around 50 associates on Hadoop, QlikView and its relative components.
  • Exposure to shell scripting for build and deployment processes.
  • Worked as part of an agile team serving as a developer to customize, maintain and enhance a variety of applications for Hadoop.
  • Comprehensive knowledge of Software Development Life Cycle coupled with excellent communication skills.
  • Strong technical and interpersonal skills combined with great commitment towards meeting deadlines.
  • Experience working in both team and individual settings. Always eager to learn new technologies and implement them in challenging environments.
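
A minimal HiveQL sketch of the dynamic partitioning and bucketing pattern referenced in the summary; the table and column names (sales_raw, sales_part, store_id, load_dt) are hypothetical and used only for illustration.

    -- enable dynamic partitioning for the session
    SET hive.exec.dynamic.partition = true;
    SET hive.exec.dynamic.partition.mode = nonstrict;
    -- required on older Hive releases so inserts honor the bucket definition
    SET hive.enforce.bucketing = true;

    -- target table partitioned by load date and bucketed on the join key
    CREATE TABLE sales_part (
      order_id BIGINT,
      store_id INT,
      amount   DOUBLE
    )
    PARTITIONED BY (load_dt STRING)
    CLUSTERED BY (store_id) INTO 32 BUCKETS
    STORED AS ORC;

    -- the partition value is taken from the last column of the SELECT
    INSERT OVERWRITE TABLE sales_part PARTITION (load_dt)
    SELECT order_id, store_id, amount, load_dt
    FROM sales_raw;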
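
And a hedged sketch of the map-join and sorted-bucketed-map-join tuning mentioned above; the SET parameters are standard Hive settings, while the table names (sales_part, store_dim) are hypothetical.

    -- let Hive convert joins against a small dimension table into map joins
    SET hive.auto.convert.join = true;
    SET hive.mapjoin.smalltable.filesize = 25000000;

    -- enable sort-merge-bucket joins when both tables are bucketed and
    -- sorted on the join key
    SET hive.optimize.bucketmapjoin = true;
    SET hive.optimize.bucketmapjoin.sortedmerge = true;
    SET hive.auto.convert.sortmerge.join = true;

    SELECT f.order_id, d.store_name
    FROM sales_part f
    JOIN store_dim d ON f.store_id = d.store_id;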

TECHNICAL SKILLS:

Hadoop Technologies and Distributions: Apache Hadoop, Cloudera Hadoop Distribution (CDH3, CDH4, CDH5), IBM BigInsights and Hortonworks

Hadoop Ecosystem: HDFS, MapReduce, Spark, Kafka, Hive, Pig, Sqoop, Oozie, Flume, Hue

NoSQL Database: HBase

Databases: Oracle, MySQL, Greenplum, Snowflake, Phoenix

Operating Systems: Linux (Red Hat, CentOS), Windows XP/7/8

ETL+Reporting: Talend

Cluster Management Tools: Cloudera Manager

BI Tools: QlikView

PROFESSIONAL EXPERIENCE:

Confidential, Chicago, IL

Senior Hadoop Developer

Responsibilities:

  • Designed and developed the data lake enterprise-layer gold conformed process, which is available to the consumption team and business users for analytics.
  • Responsible for designing and developing a framework that automates the development process in the data lake.
  • Integrated Talend with HBase for storing the processed Enterprise Data into separate column families and column qualifiers.
  • Used crontab and Zena to schedule and trigger jobs in production.
  • Worked with cross functional consulting teams within the data science and analytics team to design, develop, and execute solutions to derive business insights and solve clients' operational and strategic problems.
  • Involved in migrating Teradata queries to the Snowflake data warehouse.
  • Worked in Agile Scrum model and involved in sprint activities.
  • Gathered and analyzed business requirements.
  • Worked on various Talend integrations with HBase (Avro format), Hive, Phoenix and Pig components.
  • Worked with GitHub, Zena, Jira and Jenkins, and deployed projects into production environments.
  • Involved in Cluster coordination services through Zookeeper.
  • Worked on integration with Phoenix thick and thin clients, and installed and developed Phoenix-Hive and Hive-HBase integrations.
  • Wrote automated UNIX shell scripts and developed an automation framework with Talend and UNIX.
  • Created merge, update and delete scripts in Hive and performance-tuned Hive joins (a hedged sketch follows this list).
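
A minimal sketch of the Hive merge/update/delete pattern mentioned above, assuming a transactional (ACID, ORC) target table; the table and column names (customer_gold, customer_stage, op_type) are hypothetical.

    -- MERGE requires a transactional (ACID) target table stored as ORC
    MERGE INTO customer_gold AS t
    USING customer_stage AS s
    ON t.customer_id = s.customer_id
    WHEN MATCHED AND s.op_type = 'D' THEN DELETE
    WHEN MATCHED THEN UPDATE SET email = s.email, updated_at = s.updated_at
    WHEN NOT MATCHED THEN INSERT VALUES (s.customer_id, s.email, s.updated_at);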

Environment: Hadoop, HDFS, Hive, QlikView, UNIX shell scripting, Hue, HBase, Avro format, Phoenix, Talend, Snowflake.

Confidential, Deerfield, IL

Hadoop Developer

Responsibilities:

  • Responsible for designing and implementing ETL process to load data from different sources, perform data mining and analyze data using visualization/reporting.
  • Developed Big Data Solutions that enabled the business and technology teams to make data-driven decisions on the best ways to acquire customers and provide them business solutions.
  • Involved in migrating Teradata queries such as updates, inserts and deletes to Hive queries.
  • Developed Pig scripts for noise reduction and Sqoop scripts to pull historical data from big tables of more than 4 TB.
  • Worked in Agile Scrum model and involved in sprint activities.
  • Gathered and analyzed business requirements.
  • Optimized query performance and data load times in Pig, Hive and MapReduce applications.
  • Optimized Hive performance using partitioning and bucketing.
  • Worked with data scientists to implement ad-hoc queries using HiveQL, partitioning, bucketing and custom Hive UDFs.
  • Optimized Hive queries and joins, and handled different data files using custom SerDes.
  • Designed the historical/incremental load process (see the sketch after this list).
  • Sqooped more than 20 TB of data from Teradata to Hadoop.
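
A minimal sketch of one way the historical/incremental load described above can be structured in HiveQL; the tables (orders_stage, orders_hist), columns and the hiveconf variable are hypothetical, and the watermark is assumed to be supplied by the scheduling shell script.

    SET hive.exec.dynamic.partition = true;
    SET hive.exec.dynamic.partition.mode = nonstrict;

    -- the watermark is computed by the orchestration script and passed in, e.g.
    --   hive -hiveconf last_ts='2016-01-01 00:00:00' -f incremental_load.hql
    INSERT INTO TABLE orders_hist PARTITION (load_dt)
    SELECT order_id, status, updated_at, load_dt
    FROM orders_stage
    WHERE updated_at > '${hiveconf:last_ts}';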

Environment: Hadoop, HDFS, Hive, QlikView, UNIX shell scripting, Hue, HBase, Pig, Sqoop, Talend.

Confidential, Bentonville, AR

Hadoop Developer

Responsibilities:

  • Developed Big Data Solutions that enabled the business and technology teams to make data-driven decisions on the best ways to acquire customers and provide them business solutions.
  • Analyzed assigned user stories in JIRA (Agile tooling) and created design documents.
  • Attended daily stand-ups and updated burned-down hours in JIRA.
  • Worked in Agile Scrum model and involved in sprint activities.
  • Gathered and analyzed business requirements.
  • Created and implemented business, validation, coverage and price-gap rules in Talend on Hive and Greenplum databases.
  • Involved in development of Talend components to validate the data quality across different data sources.
  • Analyzed business validation rules and evaluated options for implementing them in Talend.
  • Exceptions thrown during data validation rule execution were sent back to business users to remediate the data and ensure clean data across data sources.
  • Worked on the Global ID tool to apply the business rules.
  • Automated and scheduled the rules on a weekly and monthly basis in the Talend Administration Center (TAC).
  • Created a weekly scheduling process using cron.
  • Created and maintained Hive and Greenplum tables on a weekly basis.
  • Collected data from the FTP server and loaded it into Hive tables.
  • Partitioned the collected logs by date/timestamp and host name (a hedged sketch follows this list).
  • Developed Data Quality Rules on top of the External Vendors Data.
  • Imported data frequently from MySQL to HDFS using Sqoop.
  • Supported operations team in Hadoop cluster maintenance activities including commissioning and decommissioning nodes and upgrades.
  • Used QlikView for visualization and report generation.
  • Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
  • Managed and scheduled jobs on a Hadoop cluster using Oozie.
  • Involved in data modelling using QlikView integration of data sources and ETL with QlikView reports.
  • Involved in defining job flows, managing and reviewing log files.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Installed the Oozie workflow engine to run multiple MapReduce, Hive and Pig jobs.
  • Responsible for loading and transforming large sets of structured, semi-structured and unstructured data.
  • Responsible for managing data coming from different sources.
  • Implemented pushing data from Hadoop to Greenplum.
  • Worked on pre-processing the data using Pig regular expressions.
  • Gained experience with NoSQL databases.
  • Worked on scheduling jobs through the Resource Manager.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Talend jobs.
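
A hedged HiveQL sketch of the log-landing and partitioning pattern in the bullets above; the table name, columns and HDFS paths (vendor_logs, /data/landing/vendor_logs) are hypothetical.

    -- external table over files landed from the FTP server,
    -- partitioned by date and host
    CREATE EXTERNAL TABLE vendor_logs (
      event_time STRING,
      message    STRING
    )
    PARTITIONED BY (log_date STRING, host STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/data/landing/vendor_logs';

    -- register each day's directory after it is copied into HDFS
    ALTER TABLE vendor_logs ADD IF NOT EXISTS
      PARTITION (log_date = '2014-07-01', host = 'ftp01')
      LOCATION '/data/landing/vendor_logs/log_date=2014-07-01/host=ftp01';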

Environment: Hadoop, HDFS, Hive, QlikView, UNIX shell scripting, Hue, Greenplum, Talend.

Confidential

QlikView Developer

Responsibilities:

  • Worked in Agile Scrum model and involved in sprint activities.
  • Analyzed business requirements and implemented customer-friendly dashboards.
  • Implemented Section Access for security.
  • Involved in data modelling using QlikView integration of data sources and ETL with QlikView reports.
  • Identify and improve weak areas in the applications, performance reviews and code walk through to ensure quality.
  • Created QVDs and designed QlikView dashboards using different types of QlikView objects.
  • Modified ETL scripts while loading data, resolving loops and ambiguous joins.
  • Wrote complex expressions using the Aggregation functions to match the logic with the business SQL.
  • Performance tuning by analyzing and comparing the turnaround times between SQL and QlikView.
  • Worked with QlikView Extensions like SVG Maps, HTML Content.
  • Developed Set Analysis to provide custom functionality in QlikView application.
  • Used binary load, resident load, preceding load and incremental load during data modelling.

Environment: Hadoop, HDFS, Hive, SQL and QlikView.

Confidential

Hadoop Developer

Responsibilities:

  • Worked on analyzing the Hadoop cluster using different big data analytic tools, including Pig, Hive and MapReduce.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Worked on debugging, performance tuning of Hive & Pig Jobs.
  • Created HBase tables to store various formats of PII data coming from different portfolios.
  • Implemented test scripts to support test driven development and continuous integration.
  • Worked on tuning the performance of Pig queries.
  • Provided cluster coordination services through ZooKeeper.
  • Experience in managing development time, bug tracking, project releases, development speed, release forecasting and scheduling.
  • Involved in loading data from LINUX file system to HDFS.
  • Importing and exporting data into HDFS and Hive using Sqoop.
  • Developed Java program to extract the values from XML using XPaths.
  • Processed unstructured data using Pig and Hive (a hedged sketch follows this list).
  • Supported MapReduce programs running on the cluster.
  • Gained experience in managing and reviewing Hadoop log files.
  • End-to-end performance tuning of Hadoop clusters and Hadoop Map/Reduce routines against very large data sets.
  • Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Assisted in monitoring Hadoop cluster using tools like Cloudera Manager.
  • Optimized MapReduce jobs using combiners and partitioners to deliver the best results, and worked on application performance optimization for HDFS.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
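
One common way to handle the unstructured-data processing mentioned above is a Hive regex SerDe; a minimal sketch, assuming a hypothetical web-server-style log layout and path:

    -- parse raw log lines into columns at query time;
    -- this SerDe returns every column as STRING
    CREATE EXTERNAL TABLE raw_access_logs (
      host    STRING,
      ts      STRING,
      request STRING,
      status  STRING
    )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
    WITH SERDEPROPERTIES (
      'input.regex' = '^(\\S+) \\[([^\\]]+)\\] "([^"]*)" (\\d{3}).*$'
    )
    LOCATION '/data/raw/access_logs';

    SELECT status, COUNT(*) AS hits
    FROM raw_access_logs
    GROUP BY status;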

Environment: Hadoop (CDH4), MapReduce, HBase, Hive, Sqoop, Oozie.

Confidential

Jr Java Hadoop Developer

Responsibilities:

  • Analyzed the requirements.
  • Developed MapReduce programs.
  • Developed components to integrate the web tier, HDFS and reports.
  • Developed statistical charts using PrimeFaces.

Environment: Hadoop (CDH4), MapReduce, HBase, Hive, Sqoop, Oozie.
