Hadoop Developer Resume Kansas City, MO - Hire IT People

SUMMARY

Hadoop Developer with over 7 years of experience building big data applications and data pipelines, creating data lakes to manage structured and semi-structured data, and implementing workflows using big data ecosystems such as Hadoop, Spark, and Kafka.
Expertise in developing applications using Java, Scala and Python.
Expertise in working with Hadoop Distributions like EMR, Hortonworks, and Cloudera.
In-depth understanding of Hadoop architecture and its components, such as Resource Manager, Node Manager, Application Master, NameNode, and DataNode.
Worked extensively on real-time streaming data pipelines using Spark Streaming and Kafka.
Extensive experience writing end-to-end Spark applications in both Scala and Python, utilizing Spark RDDs, Spark DataFrames, Spark SQL, and Spark Streaming.
Experienced in troubleshooting long-running Spark jobs and resolving performance bottlenecks.
Expertise in writing DDL and DML scripts for analytics applications in Hive.
Experienced in Python development for various ETL and data analytics applications, as well as working with Python libraries like Matplotlib, NumPy, SciPy, and Pandas for data analysis.
Expertise in working with AWS cloud services like EC2, S3, Redshift, EMR, Lambda, DynamoDB, RDS, Glue, and Athena for big data development.
Expertise in Hive optimization techniques such as partitioning, bucketing, vectorization, map-side joins, bucket map joins, and skew joins.
Expertise in debugging and tuning failed and long-running Spark applications using optimization techniques for executor tuning, memory management, serialization, broadcasting, and persistence to ensure optimal application performance.
Experience working with batch processing and operational data sources and migration of data from traditional databases to Hadoop and NoSQL databases.
Experienced with different file formats like Parquet, ORC, CSV, Text, XML, JSON, and Avro files.
Expertise in data ingestion using Flume, Sqoop, and NiFi.
Experience in orchestrating workflows using Oozie and Airflow.
Good knowledge of building and maintaining highly scalable, fault-tolerant infrastructure in AWS environments spanning multiple availability zones.
Passionate about gleaning insightful information from massive datasets and developing a culture of sound, data-driven decision making.
Good team player who likes to take initiative and seek out new challenges.
Excellent communication skills; able to work in a fast-paced, multitasking environment both independently and in a collaborative team; a self-motivated, enthusiastic learner.
Involved in all phases of the Software Development Life Cycle (Requirements Analysis, Design, Development, Testing, Deployment, and Support) and in Agile methodologies.

TECHNICAL SKILLS

Big Data Technologies: Spark, Hive, HDFS, Apache NiFi, MapReduce, Sqoop, HBase, Oozie, Impala, and Kafka

Hadoop Distributions: Cloudera, HDP and EMR

Languages: Java, Scala, Python and SQL

NoSQL Databases: HBase, Cassandra, and MongoDB

AWS Services: EC2, EMR, Redshift, RDS, S3, AWS Lambda, CloudWatch, Glue, Athena

Databases: MySQL, Teradata, Oracle

Other tools: JIRA, GitHub, Jenkins

PROFESSIONAL EXPERIENCE

Confidential, Kansas City, MO

Hadoop Developer

Responsibilities:

Worked on building a centralized data lake on AWS Cloud utilizing primary services like S3, EMR, Redshift, Athena, and Glue.
Hands-on experience in building and deploying Spark applications for performing ETL workloads on large datasets.
Built a series of Spark applications and Hive scripts to produce various analytical datasets needed by digital marketing teams.
Worked extensively on building and automating data ingestion pipelines and moving terabytes of data from existing data warehouses to the cloud.
Worked extensively on fine-tuning Spark applications and providing production support for various pipelines running in production.
Worked closely with business teams and data science teams and ensured all requirements were translated accurately into our data pipelines.
Developed PySpark-based pipelines using Spark DataFrame operations to load data to EDL, using EMR for job execution and AWS S3 as the storage layer.
Worked on full spectrum of data engineering pipelines: data ingestion, data transformations and data analysis/consumption.
Developed AWS Lambda functions in Python and used Step Functions to orchestrate data pipelines.
Worked on automating infrastructure setup, including launching and terminating EMR clusters.
Created Hive external tables on top of datasets loaded in S3 buckets and created various Hive scripts to produce a series of aggregated datasets for downstream analysis.
Built a real-time streaming pipeline utilizing Kafka, Spark Streaming, and Redshift.
Worked on creating Kafka producers using the Kafka Java Producer API to connect to an external REST live-stream application and produce messages to Kafka topics.
Implemented a Continuous Delivery pipeline with Maven, GitHub, and Jenkins.
Documented operational problems following standards and procedures using Jira.

Environment: AWS S3, EMR, Lambda, Redshift, Athena, Glue, Spark, Scala, Python, Java, Hive, Kafka, PySpark, GitHub, Jira.

Confidential, Deerfield, IL

Hadoop Developer

Responsibilities:

Responsible for ingesting large volumes of user behavioral data and customer profile data into the analytics data store.
Developed custom multi-threaded Java-based ingestion jobs as well as Sqoop jobs for ingesting from FTP servers and data warehouses.
Developed PySpark- and Scala-based Spark applications for data cleansing, event enrichment, data aggregation, de-normalization, and data preparation for consumption by machine learning and reporting teams.
Troubleshot Spark applications to make them more error tolerant.
Fine-tuned Spark applications to improve the overall processing time of the pipelines.
Wrote Kafka producers to stream data from external REST APIs to Kafka topics.
Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase and MongoDB.
Experienced in handling large datasets using Spark's in-memory capabilities, broadcast variables, efficient joins, transformations, and other capabilities.
Worked extensively with Sqoop for importing data from Oracle.
Designed and customized data models for a data warehouse supporting data from multiple sources in real time.
Experience working with EMR clusters in the AWS cloud and with S3, Redshift, and Snowflake.
Wrote Glue jobs to migrate data from HDFS to an S3 data lake.
Involved in creating Hive tables and in loading and analyzing data using Hive scripts.
Implemented partitioning, dynamic partitions, and buckets in Hive.
Good experience with continuous integration of applications using Bamboo.
Used reporting tools like Tableau, connected to Impala, to generate daily reports.
Collaborated with the infrastructure, network, database, application and BA teams to ensure data quality and availability.
Documented operational problems following standards and procedures using JIRA.

Environment: Hadoop, Spark, Scala, Python, Hive, HBase, MongoDB, Sqoop, Oozie, Kafka, Snowflake, Amazon EMR, Glue, YARN, JIRA, Amazon AWS, Shell Scripting, SBT, GitHub, Maven

Confidential

Hadoop Developer

Responsibilities:

Involved in writing Spark applications in Scala to perform data cleansing, validation, transformation, and summarization activities according to the requirements.
Loaded data into Spark RDDs and performed in-memory computation to generate output as per the requirements.
Developed data pipelines using Spark, Hive and Sqoop to ingest, transform and analyse operational data.
Developed Spark and Hive jobs to summarize and transform data.
Worked on performance tuning of Spark applications to reduce job execution times.
Tuned Spark jobs by changing configuration properties and using broadcast variables.
Streamed data in real time using Spark with Kafka; responsible for handling streaming data from web server console logs.
Worked with different file formats such as text files, Avro, Parquet, JSON, XML, and flat files using MapReduce programs.
Developed a daily process for incremental import of data from DB2 and Teradata into Hive tables using Sqoop.
Wrote Pig scripts to generate transformations and performed ETL procedures on the data in HDFS.
Solved performance issues in Hive and Pig scripts by applying an understanding of joins, grouping, and aggregation and how they translate to MapReduce jobs.
Worked with cross-functional consulting teams within the data science and analytics team to design, develop, and execute solutions to derive business insights and solve clients' operational and strategic problems.
Exported the analysed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
Extensively used HiveQL queries to query data in Hive tables and loaded data into HBase tables.
Worked extensively with partitions, dynamic partitioning, and bucketed tables in Hive; designed both managed and external tables; and optimized Hive queries.
Involved in collecting and aggregating large amounts of log data using Flume and staging data in HDFS for further analysis.
Assisted analytics team by writing Pig and Hive scripts to perform further detailed analysis of the data.
Designed Oozie workflows for job scheduling and batch processing.

Environment: Java, Scala, Apache Spark, MySQL, CDH, IntelliJ IDEA, Hive, HDFS, YARN, MapReduce, Sqoop, Pig, Flume, Unix Shell Scripting, Python, Apache Kafka.
