Senior Hadoop Developer Resume
Chicago, IL
SUMMARY:
- 9+ years of IT experience in gathering and analyzing customers' technical requirements and in developing, managing and maintaining projects on platforms such as Hadoop, QlikView and Talend.
- Expertise in processing 25 PB of data on 700 nodes across Dev and Prod clusters.
- Excellent understanding of Hadoop architecture and the different daemons of a Hadoop cluster, including Job Tracker, Task Tracker, Name Node and Data Node.
- Working experience with the Snowflake elastic data warehouse, a cloud-based data warehouse for storing and analyzing data.
- Designed and developed an automation framework to reduce the cost and effort of development tasks.
- Working experience with Talend components such as tJavaRow, tMap, tHMap, tJDBC, context variables, and the HBase, Hive and Pig components, as well as Big Data batch-processing and streaming components.
- Experience working with Hadoop in standalone, pseudo-distributed and fully distributed modes.
- Hands-on experience with big data tools such as Facebook Presto, Apache Drill and Snowflake.
- Hands-on experience with Kafka streaming and Spark Streaming in Scala.
- Efficient in writing MapReduce programs and using the Apache Hadoop API to analyze structured and unstructured data.
- Experienced in handling different file formats such as text files, sequence files and JSON files.
- Expertise in implementing ad-hoc queries using HiveQL.
- Performed extensive data validation using Hive dynamic partitioning and bucketing (a brief sketch follows this summary).
- Working experience with Apache Phoenix, a massively parallel relational query layer for analyzing data on Apache Hadoop.
- Implemented data quality and price-gap rules in the Talend ETL tool.
- Expertise in developing Hive generic UDFs that implement complex business logic and incorporate it into HiveQL.
- Experienced in using aggregate functions and table-generating functions, and in implementing UDFs to handle complex objects.
- Experienced in join optimizations such as map joins and sorted bucketed map joins, as well as Hive Merge, Update and Delete operations and the Hue interface.
- Worked with QlikView Extensions like SVG Maps, HTML Content.
- Developed Set Analysis to provide custom functionality in QlikView application.
- Used Binary, Resident, Preceding and Incremental loads during data modeling.
- Experienced in performance-tuning Hive queries using configurable Hive parameters.
- Developed Pig Latin scripts using Pig operators to extract data from data files and load it into HDFS.
- Experience in using Apache Sqoop to import and export data between different sources and HDFS/Hive.
- Worked with different file formats such as RCFile, SequenceFile, ORC and Avro.
- Hands-on experience in configuring and working with Flume to load data from multiple sources directly into HDFS.
- Good at working on low-level design documents and System Specifications.
- Experience working with BI teams to translate big data requirements into Hadoop-centric solutions.
- Good experience in Hadoop administration, including cluster configuration, single-node and multi-node setup, and installation in distributed environments.
- Trained around 50 associates on Hadoop, QlikView and its relative components.
- Exposure to shell scripting for the build and deployment process.
- Worked as part of an agile team serving as a developer to customize, maintain and enhance a variety of applications for Hadoop.
- Comprehensive knowledge of Software Development Life Cycle coupled with excellent communication skills.
- Strong technical and interpersonal skills combined with great commitment towards meeting deadlines.
- Experience working in both team and individual environments. Always eager to learn new technologies and implement them in challenging environment.
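The Hive dynamic-partitioning and bucketing work noted above can be illustrated with a minimal sketch. This is only an illustration: the database, table, column names and bucket count are hypothetical, and the SET statements shown are the standard switches for enabling dynamic partitions (the bucketing flag matters on older Hive versions).

```sh
#!/usr/bin/env bash
# Minimal sketch: create a bucketed, partitioned Hive table and populate it
# with dynamic partitioning. Database, table and column names are hypothetical.
set -euo pipefail

hive -e "
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.enforce.bucketing=true;   -- required on older Hive releases

CREATE TABLE IF NOT EXISTS sales_db.orders_part (
  order_id    BIGINT,
  customer_id BIGINT,
  amount      DECIMAL(10,2)
)
PARTITIONED BY (order_date STRING)
CLUSTERED BY (customer_id) INTO 32 BUCKETS
STORED AS ORC;

-- The dynamic partition value is taken from the last column of the SELECT list.
INSERT OVERWRITE TABLE sales_db.orders_part PARTITION (order_date)
SELECT order_id, customer_id, amount, order_date
FROM sales_db.orders_staging;
"
```

Bucketing on a high-cardinality key such as the customer id also keeps the data evenly distributed and enables bucketed map joins downstream.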
TECHNICAL SKILLS:
Hadoop Technologies and Distributions: Apache Hadoop, Cloudera Distribution (CDH3, CDH4, CDH5), IBM BigInsights, Hortonworks
Hadoop Ecosystem: HDFS, MapReduce, Spark, Kafka, Hive, Pig, Sqoop, Oozie, Flume, Hue
NoSQL Database: HBase
Databases: Oracle, MySQL, Greenplum, Snowflake, Phoenix
Operating Systems: Linux (Red Hat, CentOS), Windows XP/7/8
ETL+Reporting: Talend
Cluster Management Tools: Cloudera Manager
BI Tools: QlikView
PROFESSIONAL EXPERIENCE:
Confidential, Chicago, IL
Senior Hadoop Developer
Responsibilities:
- Designed and developed the data lake enterprise-layer gold conformed process, which is available to the consumption team and business users for analytics.
- Responsible for designing and developing an automated framework that automates the development process in the data lake.
- Integrated Talend with HBase, storing the processed enterprise data in separate column families and column qualifiers.
- Used crontab and Zena to schedule trigger jobs in production.
- Worked with cross-functional consulting teams within the data science and analytics team to design, develop and execute solutions that derive business insights and solve clients' operational and strategic problems.
- Involved in migrating Teradata queries to Snowflake data warehouse queries.
- Worked in Agile Scrum model and involved in sprint activities.
- Gathered and analyzed business requirements.
- Worked on various Talend integrations with HBase and Avro format, Hive, Phoenix and Pig components.
- Worked with GitHub, Zena, Jira and Jenkins, and deployed projects into production environments.
- Involved in Cluster coordination services through Zookeeper.
- Worked on integration with Phoenix thick and thin clients, and was involved in installing and developing Phoenix-Hive and Hive-HBase integrations.
- Wrote automated UNIX shell scripts and developed an automation framework with Talend and UNIX.
- Created Merge, Update and Delete scripts in Hive and worked on performance-tuning Hive joins (see the sketch after this section).
Environment: Hadoop, HDFS, Hive, QlikView, UNIX shell scripting, Hue, HBase, Avro format, Phoenix, Talend, Snowflake.
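The cron-triggered Hive Merge/Update/Delete scripts mentioned above follow a pattern along these lines. This is a sketch, not the actual production job: the schedule, HiveServer2 URL, script path and table names are placeholders, and the target table is assumed to be a transactional (ACID) ORC table.

```sh
#!/usr/bin/env bash
# run_customer_merge.sh -- upsert daily changes into an ACID Hive table.
# Illustrative only: connection details, schedule and table names are placeholders.
# Example crontab entry to trigger the job nightly at 2 AM:
#   0 2 * * * /opt/jobs/run_customer_merge.sh >> /var/log/jobs/customer_merge.log 2>&1
set -euo pipefail

beeline -u "jdbc:hive2://hiveserver2:10000/edw" -e "
MERGE INTO edw.customer AS t
USING edw.customer_changes AS s
ON t.customer_id = s.customer_id
WHEN MATCHED AND s.op = 'D' THEN DELETE
WHEN MATCHED THEN UPDATE SET name = s.name, email = s.email
WHEN NOT MATCHED THEN INSERT VALUES (s.customer_id, s.name, s.email);
"
```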
Confidential, Deerfield, IL
Hadoop Developer
Responsibilities:
- Responsible for designing and implementing ETL process to load data from different sources, perform data mining and analyze data using visualization/reporting.
- Developed Big Data Solutions that enabled the business and technology teams to make data-driven decisions on the best ways to acquire customers and provide them business solutions.
- Involved in migrating Teradata queries, including updates, inserts and deletes, to Hive queries.
- Developed Pig scripts for noise reduction and Sqoop scripts for historical data from large tables exceeding 4 TB.
- Worked in Agile Scrum model and involved in sprint activities.
- Gathered and analyzed business requirements.
- Involved in optimizing query performance and data load times in Pig, Hive and MapReduce applications.
- Expert in optimizing Hive performance using partitioning and bucketing concepts.
- Interacted with data scientists to implement ad-hoc queries using HiveQL, partitioning, bucketing and custom Hive UDFs.
- Experienced in optimizing Hive queries and joins, and in using different data files with custom SerDes.
- Designed the process to do historical/incremental load.
- Involved in Sqooping more than 20 TB of data from Teradata to Hadoop (see the Sqoop sketch after this section).
Environment: Hadoop, HDFS, Hive, QlikView, UNIX shell scripting, Hue, HBase, Pig, Sqoop, Talend.
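The Teradata-to-Hadoop loads referenced above can be sketched with a single Sqoop import. The JDBC URL, credentials, table names and mapper count below are placeholders, and the Teradata JDBC driver is assumed to be available on the Sqoop classpath.

```sh
#!/usr/bin/env bash
# Sketch of a Sqoop import from Teradata into a Hive staging table.
# All connection details and names are illustrative placeholders.
set -euo pipefail

sqoop import \
  --connect "jdbc:teradata://td-prod/DATABASE=edw" \
  --driver com.teradata.jdbc.TeraDriver \
  --username "$TD_USER" \
  --password-file /user/etl/.td_password \
  --table ORDERS_HISTORY \
  --split-by ORDER_ID \
  --num-mappers 16 \
  --hive-import \
  --hive-table staging.orders_history
```

Splitting on a numeric, evenly distributed key and tuning the mapper count are the main levers for moving multi-terabyte tables in a reasonable window.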
Confidential, Bentonville, AR
Hadoop Developer
Responsibilities:
- Developed Big Data Solutions that enabled the business and technology teams to make data-driven decisions on the best ways to acquire customers and provide them business solutions.
- Analyzed the assigned user stories in JIRA (Agile software) and created design documents.
- Attended daily stand-ups and updated burned-down hours in JIRA.
- Worked in Agile Scrum model and involved in sprint activities.
- Gathered and analyzed business requirements.
- Created and implemented business, validation, coverage and price-gap rules in Talend against Hive and Greenplum databases.
- Involved in development of Talend components to validate the data quality across different data sources.
- Involved in analyzing business validation rules and finding options for implementing them in Talend.
- Exceptions raised during data validation rule execution were sent back to the business users to remediate the data and ensure clean data across data sources.
- Worked on the Global ID tool to apply the business rules.
- Automated and scheduled the rules on a weekly and monthly basis in TAC (Talend Administration Center).
- Created a weekly scheduling process using cron.
- Created and maintained the Hive and Greenplum tables on a weekly basis.
- Collected data from an FTP server and loaded it into Hive tables.
- Partitioned the collected logs by date/timestamp and host name (see the FTP-to-Hive sketch after this section).
- Developed Data Quality Rules on top of the External Vendors Data.
- Imported data frequently from MySQL to HDFS using Sqoop.
- Supported operations team in Hadoop cluster maintenance activities including commissioning and decommissioning nodes and upgrades.
- Used QlikView for visualizing and to generate reports.
- Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
- Managing and scheduling Jobs using Oozie on a Hadoop cluster.
- Involved in data modeling using QlikView, integrating data sources (ETL) with QlikView reports.
- Involved in defining job flows, managing and reviewing log files.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Installed Oozie workflow engine to run multiple Map Reduce, Hive and Pig jobs.
- Responsible for loading and transforming large sets of structured, semi structured and unstructured data.
- Responsible to manage data coming from different sources.
- Implemented pushing the data from Hadoop to Greenplum.
- Worked on pre-processing the data using regular expressions in Pig.
- Gained experience with NoSQL databases.
- Worked on scheduling the jobs through Resource Manager.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Talend jobs.
Environment: Hadoop, HDFS, Hive, QlikView, UNIX shell scripting, Hue, Greenplum, Talend.
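The FTP collection and date/host partitioning described above roughly follow the pattern below. The FTP host, directory layout, file-naming convention (one log file per host) and table names are all assumptions made for illustration.

```sh
#!/usr/bin/env bash
# Sketch: pull vendor log files from an FTP server, stage them in HDFS and
# load them into a Hive table partitioned by date and host.
# Server, paths and table names are hypothetical.
set -euo pipefail

LOAD_DATE=$(date +%Y-%m-%d)
LOCAL_DIR=/data/landing/vendor_logs/${LOAD_DATE}
HDFS_DIR=/user/etl/staging/vendor_logs/${LOAD_DATE}

mkdir -p "${LOCAL_DIR}"

# 1. Collect the day's files from the FTP server (credentials via ~/.netrc).
wget --no-verbose --directory-prefix="${LOCAL_DIR}" \
     "ftp://ftp.vendor.example.com/logs/${LOAD_DATE}/*.log"

# 2. Stage the files in HDFS.
hdfs dfs -mkdir -p "${HDFS_DIR}"
hdfs dfs -put -f "${LOCAL_DIR}"/*.log "${HDFS_DIR}/"

# 3. Load into a partitioned Hive table (one partition per date and host).
for f in "${LOCAL_DIR}"/*.log; do
  host=$(basename "${f}" .log)   # assumes files are named <host>.log
  hive -e "
    LOAD DATA INPATH '${HDFS_DIR}/${host}.log'
    INTO TABLE logs_db.vendor_logs
    PARTITION (log_date='${LOAD_DATE}', host='${host}');
  "
done
```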
Confidential
QlikView Developer
Responsibilities:
- Worked in Agile Scrum model and involved in sprint activities.
- Analyzed business requirements and implemented customer-friendly dashboards.
- Implemented Section Access for security.
- Involved in data modeling using QlikView, integrating data sources (ETL) with QlikView reports.
- Identify and improve weak areas in the applications, performance reviews and code walk through to ensure quality.
- Created QVD's and Designed QlikView Dashboards using different types of QlikView Objects.
- Modified ETL scripts while loading the data, resolving loops and ambiguous joins.
- Wrote complex expressions using the Aggregation functions to match the logic with the business SQL.
- Performance tuning by analyzing and comparing the turnaround times between SQL and QlikView.
- Worked with QlikView Extensions like SVG Maps, HTML Content.
- Developed Set Analysis to provide custom functionality in QlikView application.
- Used Binary, Resident, Preceding and Incremental loads during data modeling.
Environment: Hadoop, HDFS, Hive, SQL and QlikView.
Confidential
Hadoop Developer
Responsibilities:
- Worked on analyzing the Hadoop cluster using different big data analytics tools, including Pig, Hive and MapReduce.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis (see the Flume sketch after this section).
- Worked on debugging, performance tuning of Hive & Pig Jobs.
- Created HBase tables to store various formats of PII data coming from different portfolios.
- Implemented test scripts to support test driven development and continuous integration.
- Worked on performance tuning of Pig queries.
- Provided cluster coordination services through ZooKeeper.
- Experience in managing development time, bug tracking, project releases, development speed, release forecast, scheduling and many more.
- Involved in loading data from LINUX file system to HDFS.
- Importing and exporting data into HDFS and Hive using Sqoop.
- Developed Java program to extract the values from XML using XPaths.
- Experience working on processing unstructured data using Pig and Hive.
- Supported MapReduce programs running on the cluster.
- Gained experience in managing and reviewing Hadoop log files.
- End-to-end performance tuning of Hadoop clusters and Hadoop Map/Reduce routines against very large data sets.
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs (see the Oozie sketch after this section).
- Assisted in monitoring Hadoop cluster using tools like Cloudera Manager.
- Experience in optimizing MapReduce jobs using combiners and partitioners to deliver the best results, and worked on application performance optimization over HDFS.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
Environment: Hadoop (CDH4), MapReduce, HBase, Hive, Sqoop, Oozie.
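The Flume-based log aggregation mentioned above can be sketched as a single agent that reads from a spooling directory and writes to HDFS. The agent name, directories, channel sizing and HDFS path are assumptions made for illustration.

```sh
#!/usr/bin/env bash
# Sketch: write a minimal Flume agent config (spooling-directory source ->
# memory channel -> HDFS sink) and start the agent. Names and paths are
# illustrative only.
set -euo pipefail

cat > /etc/flume/conf/log-agent.conf <<'EOF'
log-agent.sources  = spool-src
log-agent.channels = mem-ch
log-agent.sinks    = hdfs-sink

log-agent.sources.spool-src.type     = spooldir
log-agent.sources.spool-src.spoolDir = /var/log/app/spool
log-agent.sources.spool-src.channels = mem-ch

log-agent.channels.mem-ch.type     = memory
log-agent.channels.mem-ch.capacity = 10000

log-agent.sinks.hdfs-sink.type          = hdfs
log-agent.sinks.hdfs-sink.channel       = mem-ch
log-agent.sinks.hdfs-sink.hdfs.path     = /user/etl/raw/applogs/%Y-%m-%d
log-agent.sinks.hdfs-sink.hdfs.fileType = DataStream
log-agent.sinks.hdfs-sink.hdfs.useLocalTimeStamp = true
EOF

flume-ng agent --name log-agent \
  --conf /etc/flume/conf \
  --conf-file /etc/flume/conf/log-agent.conf
```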
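The Oozie scheduling noted above is typically driven by a small job.properties file and the Oozie CLI. In this sketch the NameNode/ResourceManager addresses, Oozie URL and application path are placeholders, and the workflow.xml containing the Hive and Pig actions is assumed to already be deployed in HDFS.

```sh
#!/usr/bin/env bash
# Sketch: submit an Oozie workflow that chains Hive and Pig actions.
# All hosts, paths and names below are placeholders.
set -euo pipefail

cat > job.properties <<'EOF'
nameNode=hdfs://namenode:8020
jobTracker=resourcemanager:8032
queueName=default
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/etl/apps/daily-etl
EOF

# Submit and start the workflow, then check its status.
JOB_ID=$(oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run | awk -F': ' '{print $2}')
oozie job -oozie http://oozie-host:11000/oozie -info "${JOB_ID}"
```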
Confidential
Jr Java Hadoop Developer
Responsibilities:
- Analyzed the requirements.
- Developed MapReduce programs.
- Developed components to interact with the web layer, HDFS and reports.
- Developed statistical charts using PrimeFaces.
Environment: Hadoop (CDH4), MapReduce, HBase, Hive, Sqoop, Oozie.