Hadoop/Spark Developer Resume

Bentonville

SUMMARY:

  • 5+ years of IT experience across all stages of the Software Development Life Cycle (analysis, design, development, testing, deployment, and support) using Waterfall and Agile methodologies.
  • Experience in data analysis and data migration using Hadoop ecosystem components such as Spark, MapReduce, Sqoop, Hive, and Hue.
  • Designed Hive queries for data analysis and data transfer, and designed tables for loading data into the Hadoop environment.
  • Implemented Sqoop and TDCH to transfer large datasets between HDFS and RDBMS systems in both directions.
  • Experience with the Oozie workflow scheduler, which manages Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
  • Hands-on experience designing and developing applications in Spark using PySpark.
  • Extensive experience working with the Cloudera and Hortonworks Hadoop distributions.
  • Used the AORTA ETL tool to extract data from different data sources, transform it, and load it into data warehouse systems for reporting and analysis.
  • Good knowledge on NoSQL databases like HBase and Cassandra.
  • Very good understanding of Object-Oriented Programming (OOP).
  • Hands-on experience with Sequence, RC, Avro, and Parquet file formats.
  • Good knowledge on Jenkins for CI/CD and Maven for build process.
  • Involved in Agile methodologies, including daily scrum meetings and sprint planning.
  • Good knowledge of AWS cloud components: EMR, S3, and EC2.
  • Hands-on experience with Spark SQL queries and DataFrames: importing data from sources, applying transformations, performing read/write operations, and saving results to output directories in HDFS (a brief sketch follows this list).
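
A minimal PySpark sketch of that DataFrame workflow, assuming a Hive-backed source table; the application name, table, columns, and output path are illustrative assumptions, not from any specific project.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Hypothetical session; enableHiveSupport lets spark.table() read Hive tables.
    spark = (SparkSession.builder.appName("sales-etl")
             .enableHiveSupport().getOrCreate())

    # Import from a source table, transform, and write the result to HDFS.
    df = spark.table("staging.sales")                      # illustrative table
    daily = (df.filter(F.col("amount") > 0)
               .groupBy("store_id", "sale_date")
               .agg(F.sum("amount").alias("total_amount")))
    daily.write.mode("overwrite").parquet("hdfs:///user/etl/output/daily_sales")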

TECHNICAL SKILLS:

Big Data Ecosystem: HDFS, MapReduce, Hive, YARN, Hue, Oozie, Apache Spark, Sqoop

Hadoop Distributions: Cloudera, Hortonworks

Programming Languages: Java, C/C++, Scala, HiveQL

Scripting Languages: Shell Scripting, JavaScript

Databases: MySQL, Oracle, Teradata, DB2

Version Control Tools: SVN, Git, GitHub

Operating Systems: Windows 10/8/Vista/XP

Development IDEs: Eclipse, Python (IDLE)

Packages: Microsoft Office, PuTTY, MS Visual Studio

PROFESSIONAL EXPERIENCE:

Hadoop/Spark Developer

Confidential, Bentonville

Responsibilities:

  • Developed TDCH scripts to move data between Teradata and HDFS/Hive.
  • Used fair scheduling to allocate resources in YARN.
  • Responsible for managing data coming from different sources.
  • Scheduled automated jobs using the cron scheduler.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries.
  • Implemented workflows using the Apache Oozie framework to automate tasks.
  • Read ORC files and created DataFrames for use in Spark (see the sketch after this list).
  • Worked with Spark Core and Spark SQL using PySpark.
  • Performed data transformations and analytics on large datasets using Spark.
  • Integrated Spark jobs with the MLP platform.
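
A short PySpark sketch of the ORC-to-DataFrame flow above; the HDFS path, view name, and columns are hypothetical.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("orc-reader").getOrCreate()

    # Read ORC files from HDFS into a DataFrame (path is illustrative).
    orders = spark.read.orc("hdfs:///data/warehouse/orders_orc")

    # Register a temp view so the transformation can be written in Spark SQL.
    orders.createOrReplaceTempView("orders")
    top_items = spark.sql("""
        SELECT item_id, COUNT(*) AS order_count
        FROM orders
        GROUP BY item_id
        ORDER BY order_count DESC
        LIMIT 10
    """)
    top_items.show()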

Environment: Hadoop, Hive, Spark, Python, Hue, Teradata, YARN, Java

Data Engineer

Confidential, Bentonville

Responsibilities:

  • Developed YAML scripts for importing data into and exporting data from HDFS and Hive.
  • Imported table data from different sources into the data lake using the AORTA ETL framework.
  • Responsible for creating and scheduling jobs using the Atomic job scheduler.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries.
  • Used bucketing and dynamic partitioning on Hive tables (see the sketch after this list).
  • Validated table data after loading into the data lake using Hive queries.
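
One way the bucketed, dynamically partitioned tables above might be defined, sketched through PySpark's Hive support; the database, table, columns, and bucket count are assumptions.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder.appName("hive-ddl")
             .enableHiveSupport().getOrCreate())

    # Bucketed table, partitioned by load date (all names are illustrative).
    spark.sql("""
        CREATE TABLE IF NOT EXISTS lake.customer (
            customer_id BIGINT,
            name        STRING
        )
        PARTITIONED BY (load_date STRING)
        CLUSTERED BY (customer_id) INTO 16 BUCKETS
        STORED AS ORC
    """)

    # The dynamic-partition load itself is typically run in Hive (beeline),
    # since Spark does not populate Hive-compatible buckets on insert:
    #   SET hive.exec.dynamic.partition.mode=nonstrict;
    #   INSERT OVERWRITE TABLE lake.customer PARTITION (load_date)
    #   SELECT customer_id, name, load_date FROM staging.customer;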

Environment: Hadoop, Hive, YAML, Hue, Teradata, DB2

Hadoop Developer

Confidential, Chicago, IL

Responsibilities:

  • Extracted data from RDBMS into HDFS using Sqoop.
  • Used ETL processes to load data from flat files into the target database, applying business logic in transformation mappings to insert and update records during the load.
  • Performed performance optimizations on Spark/Scala jobs.
  • Worked with different file formats: Avro, RC, and ORC.
  • Created and worked on Sqoop jobs with incremental load to populate Hive external tables.
  • Implemented the workflows using Apache Oozie framework to automate tasks.
  • Designed and developed Hive tables to store staging and historical data.
  • Created internal and external Hive tables per requirements, defined with appropriate static and dynamic partitions for efficiency.
  • Used the ORC file format with Snappy compression for optimized storage of Hive tables (see the sketch after this list).
  • Developed Oozie workflow for scheduling and orchestrating the ETL process.
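
A brief sketch of the ORC-with-Snappy storage mentioned above, shown through PySpark's Hive support; the database, table, column names, and paths are hypothetical.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder.appName("orc-snappy")
             .enableHiveSupport().getOrCreate())

    # Hive table stored as ORC with Snappy compression (names are illustrative).
    spark.sql("""
        CREATE TABLE IF NOT EXISTS warehouse.events_orc (
            event_id BIGINT,
            payload  STRING
        )
        STORED AS ORC
        TBLPROPERTIES ("orc.compress" = "SNAPPY")
    """)

    # Spark writes can request the same codec directly when writing ORC files.
    df = spark.table("staging.events")  # hypothetical source table
    df.write.mode("overwrite").option("compression", "snappy").orc(
        "hdfs:///warehouse/events_orc_files")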

Environment: Apache Hadoop, HDFS, MapReduce, Sqoop, Hive, Oozie, Spark, Linux

Software Engineer II

Confidential

Responsibilities:

  • Analyzed software requirement specifications from the software management team.
  • Involved in project design, coding, and unit testing using C++/Java.
  • Implemented feature enhancements.
  • Involved in product maintenance, enhancements, and debugging of field issues.
  • Involved in the design of protocols and socket programming for client-server communication.
  • Developed message queues and semaphores for inter-process communication.
  • Implemented client-server communication over TCP and UDP (a minimal TCP sketch follows this list).
  • Performed thorough code reviews and unit testing to maintain quality standards and avoid rework.
  • Used GDB effectively to resolve crash issues.
  • Implemented inter-process communication between processes running on different machines.
  • Actively involved in code reviews and future enhancement meetings.
  • Worked in the Waterfall methodology while adapting to dynamic environments.
  • Involved in all phases of the Software Development Life Cycle (SDLC).
  • Provided extensive pre-delivery support for code reviews and bug fixing.
  • Involved in implementing the applications on various platforms, including Windows and Linux.
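
The client/server work above was done in C++/Java; purely as an illustration of the TCP request/response pattern it used, here is a minimal Python sketch with a hypothetical host and port.

    import socket

    HOST, PORT = "127.0.0.1", 5000  # hypothetical endpoint

    def run_server():
        # TCP server: accept one client and echo its bytes back.
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
            srv.bind((HOST, PORT))
            srv.listen(1)
            conn, _addr = srv.accept()
            with conn:
                conn.sendall(conn.recv(1024))

    def run_client(message: bytes) -> bytes:
        # TCP client: connect, send a message, read the echoed reply.
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
            cli.connect((HOST, PORT))
            cli.sendall(message)
            return cli.recv(1024)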

Environment: C++, Java, Linux OS, GDB debugger, TCP/IP, Socket Programming

Software Engineer

Confidential

Responsibilities:

  • Analyzed software requirement specifications from the client.
  • Involved in startup and team meetings to discuss coding standards and parameters.
  • Involved in the development of different modules using C/C++.
  • Involved in unit, integration, and system testing to validate the product.
  • Involved in code debugging using the Memscope debugging tool.
  • Involved in different enhancements requested by software management.
  • Involved in the successful delivery of modules within the timelines.
  • Involved in field testing and analysis of different log files.

Environment: C++, VxWorks, Memscope debugger, GDB debugger, RS232
