Hadoop/Spark Developer Resume
Bentonville
SUMMARY:
- 5+ years of IT experience across the stages of the Software Development Life Cycle (analysis, design, development, testing, deployment, and support) using Waterfall and Agile methodologies.
- Experience in data analysis and data migration using Hadoop ecosystem components such as Spark, MapReduce, Sqoop, Hive, and Hue.
- Designed Hive queries and table layouts to perform data analysis, transfer data, and load it into the Hadoop environment.
- Implemented Sqoop and TDCH to transfer large datasets between HDFS and RDBMS systems in both directions.
- Experience with the Oozie workflow scheduler, managing Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
- Hands-on experience designing and developing Spark applications using PySpark.
- Experienced with the Cloudera and Hortonworks Hadoop distributions.
- Used the AORTA ETL tool to extract data from different sources and transform and load it into data warehouse systems for reporting and analysis.
- Good knowledge of NoSQL databases such as HBase and Cassandra.
- Very good understanding of object-oriented programming (OOP).
- Hands-on experience with SequenceFile, RCFile, Avro, and Parquet file formats.
- Good knowledge of Jenkins for CI/CD and Maven for the build process.
- Worked in Agile environments, including daily scrum meetings and sprint planning.
- Good knowledge of AWS cloud components: EMR, S3, and EC2.
- Hands-on experience with Spark SQL queries and DataFrames: importing data from data sources, performing transformations and read/write operations, and saving results to output directories in HDFS, as in the sketch following this summary.
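A minimal PySpark sketch of this read-transform-write pattern (the paths, schema, and aggregation below are illustrative assumptions, not taken from any specific project):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("summary-etl-sketch").getOrCreate()

# Import data from a source into a DataFrame (a CSV landing area here).
orders = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("hdfs:///data/raw/orders"))

# Perform transformations with Spark SQL over a temporary view.
orders.createOrReplaceTempView("orders")
daily_totals = spark.sql("""
    SELECT order_date, SUM(amount) AS total_amount
    FROM orders
    GROUP BY order_date
""").withColumn("loaded_at", F.current_timestamp())

# Save the results to an output directory in HDFS.
daily_totals.write.mode("overwrite").parquet("hdfs:///data/output/daily_totals")
```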
TECHNICAL SKILLS:
Big Data Ecosystem: HDFS, MapReduce, Hive, YARN, Hue, Oozie, Apache Spark, Sqoop
Hadoop Distributions: Cloudera, Hortonworks
Programming Languages: Java, C/C++, Scala, Python, HiveQL
Scripting Languages: Shell Scripting, JavaScript
Databases: MySQL, Oracle, Teradata, DB2
Version Control Tools: SVN, Git, GitHub
Operating Systems: Windows 10/8/Vista/XP, Linux
Development IDEs: Eclipse, IDLE (Python)
Packages: Microsoft Office, PuTTY, MS Visual Studio
PROFESSIONAL EXPERIENCE:
Hadoop/Spark Developer
Confidential, Bentonville
Responsibilities:
- Developed TDCH scripts for importing and exporting data between Teradata and HDFS/Hive.
- Used the Fair Scheduler to allocate resources in YARN.
- Responsible for managing data coming from different sources.
- Scheduled automated jobs using the cron scheduler.
- Involved in creating Hive tables, loading them with data, and writing Hive queries.
- Implemented workflows using the Apache Oozie framework to automate tasks.
- Read ORC files into DataFrames for use in Spark, as in the sketch after this job entry.
- Experienced in working with Spark Core and Spark SQL using PySpark.
- Performed data transformations and analytics on large datasets using Spark.
- Integrated Spark jobs with the MLP platform.
Environment: Hadoop, Hive, Spark, Python, Hue, Teradata, YARN, Java
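A minimal sketch of reading ORC files into a DataFrame for use in Spark, as described above (the input path and column names are assumptions):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("orc-read-sketch").getOrCreate()

# Read ORC files from HDFS into a DataFrame.
events = spark.read.orc("hdfs:///data/landing/events")

# Typical downstream use: filter, aggregate, and inspect the result.
(events.filter(events.event_type == "purchase")
       .groupBy("store_id")
       .count()
       .show())
```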
Data Engineer
Confidential, Bentonville
Responsibilities:
- Developed YAML scripts for importing and exporting data into and out of HDFS and Hive.
- Imported table data from different sources into the data lake using the AORTA ETL framework.
- Responsible for creating and scheduling jobs using the Atomic job scheduler.
- Involved in creating Hive tables, loading them with data, and writing Hive queries.
- Used bucketing and dynamic partitioning on Hive tables, as in the sketch after this job entry.
- Validated table data after loading into the data lake using Hive queries.
Environment: Hadoop, Hive, YAML, Hue, Teradata, DB2
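A minimal sketch of the bucketing and dynamic-partitioning pattern: the original tables were defined in Hive, and this PySpark version expresses the same layout through Spark's DataFrame writer; the database, table, and column names are hypothetical.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("bucket-partition-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical staging table loaded earlier in the pipeline.
sales = spark.table("staging_db.sales_raw")

# Partition by load_date (each distinct date in the data lands in its own
# partition, i.e. dynamic partitioning) and bucket by store_id so that
# joins and sampling on store_id are cheaper.
(sales.write
    .mode("overwrite")
    .partitionBy("load_date")
    .bucketBy(32, "store_id")
    .sortBy("store_id")
    .saveAsTable("curated_db.sales_curated"))
```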
Hadoop Developer
Confidential, Chicago, IL
Responsibilities:
- Extracted data from RDBMS sources into HDFS using Sqoop.
- Used ETL processes to load data from flat files into the target database, applying business logic in the transformation mappings to insert and update records during the load.
- Performed performance optimizations on Spark/Scala jobs.
- Worked with different file formats, including Avro, RCFile, and ORC.
- Created and ran Sqoop jobs with incremental loads to populate Hive external tables.
- Implemented workflows using the Apache Oozie framework to automate tasks.
- Designed and developed Hive tables to store staging and historical data.
- Created Hive tables per requirements; internal and external tables were defined with appropriate static and dynamic partitions for efficiency.
- Used the ORC file format with Snappy compression for optimized storage of Hive tables, as in the sketch after this job entry.
- Developed Oozie workflows for scheduling and orchestrating the ETL process.
Environment: Apache Hadoop, HDFS, MapReduce, Sqoop, Hive, Oozie, Spark, Linux
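A minimal sketch of storing Hive table data as ORC with Snappy compression, per the responsibilities above (the database, table, schema, and HDFS location are assumptions):

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("orc-snappy-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical staging data produced by an upstream Sqoop import.
staged = spark.table("staging_db.orders_raw")

# Write the data to HDFS as ORC files compressed with Snappy.
(staged.write
    .mode("overwrite")
    .option("compression", "snappy")
    .orc("hdfs:///data/curated/orders"))

# Expose the files as an external Hive table over the same location.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS curated_db.orders (
        order_id BIGINT,
        store_id INT,
        amount   DOUBLE
    )
    STORED AS ORC
    LOCATION 'hdfs:///data/curated/orders'
""")
```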
Software Engineer II
Confidential
Responsibilities:
- Understood software requirement specifications coming from the software management team.
- Involved in project design, coding, and unit testing using C++/Java.
- Implemented feature enhancements.
- Involved in product maintenance, enhancements, and debugging of field issues.
- Involved in the design of protocols and socket programming for client-server communication.
- Developed message queues and semaphores for inter-process communication.
- Implemented client-server communication using TCP and UDP, as in the sketch after this job entry.
- Performed thorough code reviews and unit tests to ensure software quality, maintain standards, and avoid rework.
- Effectively used GDB to resolve crash issues.
- Implemented inter-process communication between processes running on different PCs.
- Actively involved in code reviews and future-enhancement meetings.
- Worked in a Waterfall methodology while adapting to dynamic environments.
- Involved in all phases of the Software Development Life Cycle (SDLC).
- Provided extensive pre-delivery support for code review and bug fixing.
- Involved in implementing applications on various platforms, such as Windows and Linux.
Environment: C++, Java, Linux OS, GDB debugger, TCP/IP, Socket Programming
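A minimal sketch of the TCP client-server pattern referenced above; the original work was done in C++, so this Python version is purely illustrative, and the host/port are assumptions:

```python
import socket

HOST, PORT = "127.0.0.1", 50007  # assumed address and port for the demo

def run_server():
    """Accept one connection and echo back whatever the client sends."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((HOST, PORT))
        srv.listen(1)
        conn, _addr = srv.accept()
        with conn:
            while data := conn.recv(4096):
                conn.sendall(data)  # echo the payload back

def run_client(message: bytes) -> bytes:
    """Connect to the server, send one message, and return the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
        cli.connect((HOST, PORT))
        cli.sendall(message)
        return cli.recv(4096)
```

Running run_server() in one process and run_client(b"ping") in another exercises the round trip.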
Software Engineer
Confidential
Responsibilities:
- Understood software requirement specifications coming from the client.
- Involved in startup and team meetings to discuss coding standards and coding parameters.
- Involved in developing different modules using C/C++.
- Involved in unit, integration, and system testing to validate the product.
- Involved in code debugging using the Memscope debugging tool.
- Involved in different enhancements requested by software management.
- Involved in the successful delivery of modules within the timelines.
- Involved in field testing and analysis of different log files.
Environment: C++, VxWorks, Memscope debugger, GDB debugger, RS232