Hadoop Developer Resume Hoffman Estates, IL - Hire IT People

SUMMARY

8+ years of total experience in the IT industry, including recent experience with Big Data Hadoop ecosystem components such as Hive, Pig and Spark Streaming, and with data warehousing tools.
Experience in mapping, creating, analyzing, designing, implementing and supporting application software based on client requirements, including n-tier architecture solutions with distributed internet/intranet components.
4.5+ years of hands-on experience with Apache Hadoop ecosystem components such as MapReduce, Sqoop, Flume, Spark, Pig, Hive, HBase, Oozie, Kafka, and ZooKeeper.
Excellent understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, ResourceManager and MapReduce.
Experience in analyzing and cleansing raw data using HiveQL and Pig Latin.
Knowledge of workflow and sub-workflow scheduling and monitoring tools such as Oozie.
Experience with Hadoop distributions such as Cloudera 5.3 (CDH 4, CDH 5) and Hortonworks Data Platform (HDP).
Experience in optimizing MapReduce algorithms using Combiners and Partitioners to deliver efficient results.
Hands-on experience with Spark SQL for complex data transformations using the Scala programming language.
Experienced in using Kafka as a distributed publisher-subscriber messaging system.
Experience in importing and exporting data using Sqoop from HDFS/Hive/HBase to relational database systems and vice versa.
Experience in data transformations using MapReduce, Hive, Pig scripts and Spark for different file formats.
Expertise in analyzing data using Hive and writing custom UDFs in Java to extend Hive and Pig core functionality.
Hands-on experience with BI tools such as Splunk/Hunk and Tableau.
Experience in understanding the security requirements for Hadoop and integrating with Kerberos authentication and authorization infrastructure.
Solid understanding of HDFS design, daemons and HDFS high availability (HA).
Experience with scripting languages such as Linux/Unix shell and Python.
Experience in understanding and managing Hadoop log files.
Experience in managing the Hadoop infrastructure with Cloudera Manager.
Experienced with data warehousing and ETL processes.
Expert knowledge of data warehousing concepts, with hands-on experience developing ETL applications in a dimensional data mart/data warehouse environment.
Involved in creating an MVC architecture using Java, validation files and the Struts framework.
Participated in reviews of the test plans, test cases and test scripts prepared by the system integration testing team.
Monitored the performance and identified performance bottlenecks in ETL code.
Strong experience in client interaction and in understanding business applications, business data flows and data relationships.
Self-starter who adapts quickly to new software applications and products, with excellent communication skills and a good understanding of business workflows.

TECHNICAL SKILLS

Scripting/Languages: Shell scripting, Unix scripts, Pig Latin, HiveQL

Big Data Technologies: HDFS, MapReduce, Hive, Hue, Pig, Sqoop, Flume, Spark, ZooKeeper, Oozie, Kafka, Impala, HBase

RDBMS: MySQL, Oracle, Teradata, MSSQL

NoSQL Databases: HBase, MarkLogic, Cassandra

Programming Languages: Python, Scala, SQL, Java

IDEs: NetBeans, Eclipse

Tools: Maven, Tableau

Virtual Machines: VMware, VirtualBox

Operating Systems: CentOS 5.5, Unix, Red Hat Linux, Windows 7, Debian, Kali

Software Methodologies: SDLC (Waterfall), Agile, Scrum

PROFESSIONAL EXPERIENCE

Confidential - Hoffman Estates, IL

Hadoop Developer

Responsibilities:

Worked with the source team to understand the format & delimiters of the data files.
Responsible for generating actionable insights from complex data to drive real business results for various application teams.
Developed Spark scripts using the Scala shell as per requirements.
Developed and implemented API services using Python in Spark.
Troubleshot and resolved data quality issues and maintained a high level of accuracy in the reported data.
Implemented several POCs on migrating to Spark Streaming to process live data.
Ingested data from RDBMS, performed data transformations, and exported the transformed data to Cassandra as per business requirements.
Rewrote existing MapReduce jobs to use new features and improvements for faster results.
Analyzed large data sets to determine the optimal way to aggregate and report on them.
Performance-tuned slow-running, resource-intensive jobs.
Worked with data serialization formats (Avro, Parquet, JSON, CSV) for converting complex objects into sequences of bits.
Used Apache Spark, an in-memory processing framework, for ETL transformations.
Converted Hive/SQL queries into Spark transformations using Spark RDDs and Python.
Developed multiple POCs using PySpark, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
Developed Flume configurations to extract log data from different sources and transfer data in different file formats (JSON, XML, Parquet) to Hive tables using different SerDes.
Set up Oozie workflow and sub-workflow jobs for Hive/Sqoop/HDFS actions.
Accessed the Kafka cluster to consume data into Hadoop.
Imported real-time data into Hadoop using Kafka and implemented an Oozie job for daily imports.
Worked with the requirements-gathering team on product business and functional requirements, updating user comments in JIRA and documentation in Confluence.
Worked with the team's project manager daily to complete project scheduling and admin tasks.
Maintained an accurate roadmap for the project and for specific products.
Monitored sprints and burndown charts and completed monthly reports.

Confidential - Columbus, OH

Hadoop Developer

Responsibilities:

Worked on Spark SQL, reading and writing data from JSON files, text files, Parquet files and SchemaRDDs.
Identified data sources and created appropriate data ingestion procedures.
Transformed data using Spark, Hive and Pig so the BI team could perform visual analytics according to client requirements.
Populated big data customer marketing data structures.
Performed complex joins on tables in Hive with various optimization techniques.
Implemented lateral view in conjunction with UDFs in Hive according to the client requirement.
Created internal and external Hive tables as per requirements, defined with appropriate static and dynamic partitions for efficiency.
Performed ETL procedures on the data in HDFS.
Connected Hive and Impala to the Tableau reporting tool and generated graphical reports.
Worked extensively with Hive DDL and Hive Query Language (HQL).
Developed Pig Latin scripts for business transformations and wrote Pig scripts and Hive queries for data processing.
Extended Hive and Pig core functionality by writing custom UDFs in Java.
Developed HBase tables to store input data in variable formats coming from different portfolios.
Loaded large volumes of row and column data into HBase.
Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
Moved all transaction files generated from various sources to HDFS through Flume for further processing.
Developed Sqoop jobs to extract data from Teradata/Oracle into HDFS.
Developed Impala scripts for extraction, transformation and loading of data into the data warehouse.
Worked on importing and exporting data from Oracle and DB2 into HDFS and Hive using Sqoop.
Automated all jobs that pull data from the FTP server and load it into Hive tables using Oozie workflows.
Loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.

Confidential - Columbus, OH

Data Warehouse Developer

Responsibilities:

Designed jobs involving various cross-reference lookups and joins, and shared containers that can be reused across multiple jobs.
Created job-level sequencers to include multiple jobs and a layer-level sequence that includes all job-level sequences.
Loaded data into dimension tables.
Used DataStage Director extensively to validate, run, schedule and monitor jobs, and followed the job logs carefully to debug them.
Monitored performance statistics and fine-tuned jobs to improve processing time.
Developed UNIX scripts to call DataStage jobs.
Responsible for integrating the DataStage jobs with the ETL control framework and Tivoli scheduling.
Involved in fine-tuning, troubleshooting, bug fixing, defect analysis and enhancement of DataStage jobs for multiple admin systems.
Involved in designing data marts and dimension and fact tables.

Environment: DataStage 7.5, Teradata, Mainframes, SQL, ETL

Confidential - Kalamazoo, MI

Jr. Java Developer

Responsibilities:

The application was developed in J2EE using an MVC-based architecture.
Implemented the MVC design using the Struts 1.3 framework, JSP custom tag libraries and various in-house custom tag libraries for the presentation layer.
Created tile definitions, Struts config files, validation files and resource bundles for all modules using the Struts framework.
Wrote prepared statements and called stored procedures using callable statements in MySQL.
Executed SQL queries to perform CRUD operations on customer records.
Used Apache Tomcat as the application server for deployment.
Used Web services for transmission of large blocks of XML data over HTTP.

Environment: Java, JSP, MySQL, Struts, Tomcat Web Server, HTML, XML, Eclipse, CSS

Confidential - Chicago, IL

Support Engineer

Responsibilities:

Developed and executed shell scripts to automate the jobs.
Wrote complex Hive queries and UDFs.
Read multiple data formats on HDFS through Hive using SerDes.
Wrote Pig scripts to run ETL jobs on the data in HDFS.
Used Hive to do analysis on the data and identify different correlations.
Wrote numerous Pig UDFs to process complex data.
Worked on importing and exporting data from Oracle and DB2 into HDFS and Hive using Sqoop.

Environment: Hive, SQL, Pig, Flume, MySQL
