Hadoop/Spark Developer Resume
Bentonville
SUMMARY:
- 5+ years of IT experience across the stages of the Software Development Life Cycle (analysis, design, development, testing, deployment, and support) using Waterfall and Agile methodologies.
- Experience in data analysis and data migration using Hadoop ecosystem components such as Spark, MapReduce, Sqoop, Hive, and Hue.
- Designed Hive queries and table layouts to perform data analysis, transfer data, and load it into the Hadoop environment.
- Implemented Sqoop and TDCH to transfer large datasets between HDFS and RDBMS systems in both directions.
- Experience with the Oozie workflow scheduler, managing Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
- Hands-on experience designing and developing Spark applications using PySpark.
- Experienced with the Cloudera and Hortonworks Hadoop distributions.
- Used the AORTA ETL tool to extract data from different sources and transform and load it into data warehouse systems for reporting and analysis.
- Good knowledge of NoSQL databases such as HBase and Cassandra.
- Very good understanding of object-oriented programming (OOP).
- Hands-on experience with SequenceFile, RCFile, Avro, and Parquet file formats.
- Good knowledge of Jenkins for CI/CD and Maven for the build process.
- Worked in Agile environments, including daily scrum meetings and sprint planning.
- Good knowledge of AWS cloud components: EMR, S3, and EC2.
- Hands-on experience with Spark SQL queries and DataFrames: importing data from data sources, performing transformations and read/write operations, and saving results to output directories in HDFS, as in the sketch following this summary.
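A minimal PySpark sketch of this read-transform-write pattern (the paths, schema, and aggregation below are illustrative assumptions, not taken from any specific project):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("summary-etl-sketch").getOrCreate()

# Import data from a source into a DataFrame (a CSV landing area here).
orders = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("hdfs:///data/raw/orders"))

# Perform transformations with Spark SQL over a temporary view.
orders.createOrReplaceTempView("orders")
daily_totals = spark.sql("""
    SELECT order_date, SUM(amount) AS total_amount
    FROM orders
    GROUP BY order_date
""").withColumn("loaded_at", F.current_timestamp())

# Save the results to an output directory in HDFS.
daily_totals.write.mode("overwrite").parquet("hdfs:///data/output/daily_totals")
```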
TECHNICAL SKILLS:
Big Data Ecosystem: HDFS, MapReduce, Hive, YARN, Hue, Oozie, Apache Spark, Sqoop
Hadoop Distributions: Cloudera, Hortonworks
Programming Languages: Java, C/C++, Scala, Python, HiveQL
Scripting Languages: Shell Scripting, JavaScript
Databases: MySQL, Oracle, Teradata, DB2
Version Control Tools: SVN, Git, GitHub
Operating Systems: Windows 10/8/Vista/XP, Linux
Development IDEs: Eclipse, IDLE (Python)
Packages: Microsoft Office, PuTTY, MS Visual Studio
PROFESSIONAL EXPERIENCE:
Hadoop/Spark Developer
Confidential, Bentonville
Responsibilities:
- Developed TDCH scripts for importing and exporting data between Teradata and HDFS/Hive.
- Used the Fair Scheduler to allocate resources in YARN.
- Responsible for managing data coming from different sources.
- Scheduled automated jobs using the cron scheduler.
- Involved in creating Hive tables, loading them with data, and writing Hive queries.
- Implemented workflows using the Apache Oozie framework to automate tasks.
- Read ORC files into DataFrames for use in Spark, as in the sketch after this job entry.
- Experienced in working with Spark Core and Spark SQL using PySpark.
- Performed data transformations and analytics on large datasets using Spark.
- Integrated Spark jobs with the MLP platform.
Environment: Hadoop, Hive, Spark, Python, Hue, Teradata, YARN, Java
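A minimal sketch of reading ORC files into a DataFrame for use in Spark, as described above (the input path and column names are assumptions):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("orc-read-sketch").getOrCreate()

# Read ORC files from HDFS into a DataFrame.
events = spark.read.orc("hdfs:///data/landing/events")

# Typical downstream use: filter, aggregate, and inspect the result.
(events.filter(events.event_type == "purchase")
       .groupBy("store_id")
       .count()
       .show())
```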
Data Engineer
Confidential, Bentonville
Responsibilities:
- Developed YAML scripts for importing and exporting data into and out of HDFS and Hive.
- Imported table data from different sources into the data lake using the AORTA ETL framework.
- Responsible for creating and scheduling jobs using the Atomic job scheduler.
- Involved in creating Hive tables, loading them with data, and writing Hive queries.
- Used bucketing and dynamic partitioning on Hive tables, as in the sketch after this job entry.
- Validated table data after loading into the data lake using Hive queries.
Environment: Hadoop, Hive, YAML, Hue, Teradata, DB2
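A minimal sketch of the bucketing and dynamic-partitioning pattern: the original tables were defined in Hive, and this PySpark version expresses the same layout through Spark's DataFrame writer; the database, table, and column names are hypothetical.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("bucket-partition-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical staging table loaded earlier in the pipeline.
sales = spark.table("staging_db.sales_raw")

# Partition by load_date (each distinct date in the data lands in its own
# partition, i.e. dynamic partitioning) and bucket by store_id so that
# joins and sampling on store_id are cheaper.
(sales.write
    .mode("overwrite")
    .partitionBy("load_date")
    .bucketBy(32, "store_id")
    .sortBy("store_id")
    .saveAsTable("curated_db.sales_curated"))
```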
Hadoop Developer
Confidential, Chicago, IL
Responsibilities:
- Extracted data from RDBMS sources into HDFS using Sqoop.
- Used ETL processes to load data from flat files into the target database, applying business logic in the transformation mappings to insert and update records during the load.
- Performed performance optimizations on Spark/Scala jobs.
- Worked with different file formats, including Avro, RCFile, and ORC.
- Created and ran Sqoop jobs with incremental loads to populate Hive external tables.
- Implemented workflows using the Apache Oozie framework to automate tasks.
- Designed and developed Hive tables to store staging and historical data.
- Created Hive tables per requirements; internal and external tables were defined with appropriate static and dynamic partitions for efficiency.
- Used the ORC file format with Snappy compression for optimized storage of Hive tables, as in the sketch after this job entry.
- Developed Oozie workflows for scheduling and orchestrating the ETL process.
Environment: Apache Hadoop, HDFS, MapReduce, Sqoop, Hive, Oozie, Spark, Linux
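A minimal sketch of storing Hive table data as ORC with Snappy compression, per the responsibilities above (the database, table, schema, and HDFS location are assumptions):

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("orc-snappy-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical staging data produced by an upstream Sqoop import.
staged = spark.table("staging_db.orders_raw")

# Write the data to HDFS as ORC files compressed with Snappy.
(staged.write
    .mode("overwrite")
    .option("compression", "snappy")
    .orc("hdfs:///data/curated/orders"))

# Expose the files as an external Hive table over the same location.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS curated_db.orders (
        order_id BIGINT,
        store_id INT,
        amount   DOUBLE
    )
    STORED AS ORC
    LOCATION 'hdfs:///data/curated/orders'
""")
```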
Software Engineer II
Confidential
Responsibilities:
- Understood software requirement specifications coming from the software management team.
- Involved in project design, coding, and unit testing using C++/Java.
- Implemented feature enhancements.
- Involved in product maintenance, enhancements, and debugging of field issues.
- Involved in the design of protocols and socket programming for client-server communication.
- Developed message queues and semaphores for inter-process communication.
- Implemented client-server communication using TCP and UDP, as in the sketch after this job entry.
- Performed thorough code reviews and unit tests to ensure software quality, maintain standards, and avoid rework.
- Effectively used GDB to resolve crash issues.
- Implemented inter-process communication between processes running on different PCs.
- Actively involved in code reviews and future-enhancement meetings.
- Worked in a Waterfall methodology while adapting to dynamic environments.
- Involved in all phases of the Software Development Life Cycle (SDLC).
- Provided extensive pre-delivery support for code review and bug fixing.
- Involved in implementing applications on various platforms, such as Windows and Linux.
Environment: C++, Java, Linux OS, GDB debugger, TCP/IP, Socket Programming
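A minimal sketch of the TCP client-server pattern referenced above; the original work was done in C++, so this Python version is purely illustrative, and the host/port are assumptions:

```python
import socket

HOST, PORT = "127.0.0.1", 50007  # assumed address and port for the demo

def run_server():
    """Accept one connection and echo back whatever the client sends."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((HOST, PORT))
        srv.listen(1)
        conn, _addr = srv.accept()
        with conn:
            while data := conn.recv(4096):
                conn.sendall(data)  # echo the payload back

def run_client(message: bytes) -> bytes:
    """Connect to the server, send one message, and return the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
        cli.connect((HOST, PORT))
        cli.sendall(message)
        return cli.recv(4096)
```

Running run_server() in one process and run_client(b"ping") in another exercises the round trip.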
Software Engineer
Confidential
Responsibilities:
- Understood software requirement specifications coming from the client.
- Involved in startup and team meetings to discuss coding standards and coding parameters.
- Involved in developing different modules using C/C++.
- Involved in unit, integration, and system testing to validate the product.
- Involved in code debugging using the Memscope debugging tool.
- Involved in different enhancements requested by software management.
- Involved in the successful delivery of modules within the timelines.
- Involved in field testing and analysis of different log files.
Environment: C++, VxWorks, Memscope debugger, GDB debugger, RS232