Big Data Engineer Resume
SUMMARY:
- 6 years of experience in software development, including 4 years on the Big Data/Hadoop ecosystem: Hive, Spark, Scala, MapReduce, YARN, and Elastic MapReduce (AWS EMR).
- Hands-on experience with Apache Hive (HQL), MySQL, HDFS, AWS, and shell scripting. Worked with the MR, Tez, and Spark execution engines for Hive.
- Experience running AWS EMR on EC2 instances, taking advantage of its elasticity.
- Experience writing Spark jobs in Scala and MapReduce jobs in Java (a representative sketch follows this summary).
- Experience migrating Greenplum, Netezza, and Oracle SQL programs to Hadoop and the cloud using Apache Hive, AWS EMR, and S3.
- Experience migrating Pig scripts to Spark jobs using Scala.
- Experience leading a team and serving as Scrum Master.
- Strong experience optimizing HQL queries and resource utilization on distributed clusters based on input data size.
- Experience working with the Java MapReduce and HBase APIs.
- Hands-on experience running jobs on petabyte-scale data.
- Strong knowledge of algorithms and data structures, object-oriented programming, and the software development life cycle.
- Experience with the Hortonworks and Cloudera Hadoop distributions.
- Hands-on experience with Big Data technologies such as Pig, Sqoop, and Oozie.
- Hands-on experience with the version control systems SVN and Git.
- Experience designing and coding web applications in core Java/Java EE, XML, and AJAX using the MVC architecture.
- Experience in all phases of the SDLC using Agile software development methodology.
- Experience coding and debugging in C# using the .NET Framework and Visual Studio.
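A minimal sketch of the kind of Spark-on-EMR job described above: a Scala job that reads raw data from S3 and writes it out as a partitioned Hive table so downstream HQL can prune partitions. All bucket, database, table, and column names are hypothetical placeholders, not taken from any actual project.

```scala
import org.apache.spark.sql.SparkSession

object S3ToHiveJobSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("S3ToHiveJobSketch")
      .enableHiveSupport() // allows Spark to create and write Hive tables
      .getOrCreate()

    // Read raw CSV input from S3 (hypothetical bucket/prefix).
    val orders = spark.read
      .option("header", "true")
      .csv("s3://example-bucket/raw/orders/")

    // Partitioning by date lets Hive/Spark prune partitions at query time,
    // one of the standard HQL optimization levers mentioned above.
    orders.write
      .partitionBy("order_date")
      .mode("overwrite")
      .saveAsTable("analytics.orders")

    spark.stop()
  }
}
```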
TECHNICAL SKILLS:
Languages: Java, Scala, HQL (Hive query language), SQL, SparkSQL, Shell Scripting (Bash)
Big Data: Apache Hadoop, Apache Spark, HDFS, MapReduce, Apache Hive, Amazon S3, Amazon Elastic MapReduce (EMR), EC2, Apache HBase, Apache Sqoop, Apache Pig
Databases: Oracle, Greenplum, Netezza, Postgres, MySQL, AWS RDS
Tools: Splunk monitoring tool, Jenkins, Ant, JIRA, Crucible/Fisheye, Aqua Data Studio, pgAdmin for Greenplum and Postgres, Aginity for Netezza
Version Control: Subversion and Git
Methodologies: Agile, Waterfall
EXPERIENCE:
Confidential
Big Data Engineer
Responsibilities:
- Analyze and work on development tasks in every sprint.
- Write Spark jobs in Scala and MapReduce jobs in Java, and operationalize them using shell scripts and JAMS jobs.
- Analyze and work on enhancements.
- Support production issues.
- Participate in spikes and POCs.
- Write unit tests for both Java and Scala code.
- Migrate Pig scripts to Spark jobs using Scala (see the migration sketch after this list).
- Deploy the code to the QA environment and test an end-to-end run on an AWS cluster.
- Perform performance analysis of Spark and MapReduce jobs on EC2 clusters with different configurations.
- Perform parallel runs and verify that the build produces the expected results.
- Use Spark to compare large datasets between the parallel and production environments (see the comparison sketch after this list).
- Port legacy C++ code to a new Java framework for one of the OATS components.
- Fix bugs after the initial Java port, write unit tests, perform parallel runs, and compare the legacy C++ output with the new Java output.
- Write scripts to automate the data retrieval and comparison.
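The Pig-to-Spark migrations mentioned above typically map Pig relational operators onto the DataFrame API. A minimal sketch under that assumption follows; the Pig script, paths, and schema are hypothetical examples, not code from the actual migration.

```scala
// Hypothetical Pig script being migrated:
//   logs    = LOAD 's3://example-bucket/logs' USING PigStorage('\t')
//             AS (user:chararray, bytes:long);
//   grouped = GROUP logs BY user;
//   totals  = FOREACH grouped GENERATE group, SUM(logs.bytes);
//   STORE totals INTO 's3://example-bucket/totals';
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

object PigToSparkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("PigToSparkSketch").getOrCreate()

    // LOAD ... AS (schema) becomes a schema-aware delimited read.
    val logs = spark.read
      .option("sep", "\t")
      .schema("user STRING, bytes LONG")
      .csv("s3://example-bucket/logs")

    // GROUP BY + FOREACH ... GENERATE SUM(...) becomes groupBy/agg.
    logs.groupBy("user")
      .agg(sum("bytes").alias("total_bytes"))
      .write.mode("overwrite")
      .csv("s3://example-bucket/totals")

    spark.stop()
  }
}
```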
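For the parallel-run comparisons, one straightforward approach is a two-way set difference over the two outputs. A sketch assuming Parquet outputs; the S3 paths are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object ParallelRunCompareSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ParallelRunCompareSketch").getOrCreate()

    val prod     = spark.read.parquet("s3://example-bucket/prod/output/")
    val parallel = spark.read.parquet("s3://example-bucket/parallel/output/")

    // Rows present in one environment but missing from the other;
    // both counts should be zero when the parallel run matches prod.
    val onlyInProd     = prod.except(parallel)
    val onlyInParallel = parallel.except(prod)

    println(s"Rows only in prod: ${onlyInProd.count()}")
    println(s"Rows only in parallel: ${onlyInParallel.count()}")

    spark.stop()
  }
}
```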
Confidential
Big Data Engineer
Responsibilities:
- Analyze pattern requirements, create user stories and sub-tasks, and assign story points and estimated hours for the sub-tasks in JIRA.
- Lead a team of 4 developers while setting up an offshore team, and serve as Scrum Master.
- Design the implementation and walk stakeholders through the design.
- Create Hive external tables on top of S3 data, add partitions, and code to the requirements using Hive (HQL), Java UDFs, and shell scripting (see the sketch after this list).
- Integrate the developed pattern with the Pattern Toolkit Component (PTC) framework for running the pattern on AWS.
- Integrate the patterns with the Spark environment and run Hive on top of Spark.
- Perform performance analysis of Hive jobs on the Spark, Tez, and MR execution engines.
- Spin up clusters on AWS, run the pattern step by step, and copy the resulting output of the Hive code back to S3 in the final step.
- Perform code walkthroughs and participate in peer reviews.
- Deploy the pattern and hand over the build label to the QA team for further testing.
- Help the QA team write Ant targets to simulate the execution process in the QA environment.
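A minimal sketch of the external-table setup described above, issued through Spark's Hive support; the database, table, columns, partition value, and S3 locations are hypothetical. For the engine comparison in the same list, the corresponding Hive-side switch is `SET hive.execution.engine=mr|tez|spark;` before running the same HQL.

```scala
import org.apache.spark.sql.SparkSession

object ExternalTableSetupSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ExternalTableSetupSketch")
      .enableHiveSupport()
      .getOrCreate()

    // External table over data that already lives in S3, partitioned by load date.
    spark.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS patterns.events (
        event_id STRING,
        payload  STRING
      )
      PARTITIONED BY (load_date STRING)
      STORED AS PARQUET
      LOCATION 's3://example-bucket/patterns/events/'
    """)

    // Register a newly landed partition so queries can see it.
    spark.sql("""
      ALTER TABLE patterns.events
      ADD IF NOT EXISTS PARTITION (load_date = '2016-01-01')
      LOCATION 's3://example-bucket/patterns/events/load_date=2016-01-01/'
    """)

    spark.stop()
  }
}
```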
Confidential, Charlotte
Freelancer/Volunteer
Responsibilities:
- Assist in installing and maintaining a multi-terabyte Cloudera Hadoop cluster.
- Manage data coming from different sources.
- Support application users in debugging their MapReduce applications.
- Assist with Hadoop maintenance and ongoing support.
- Help PhD students with data analysis tasks using Hive and Pig.
Confidential
Associate Software Engineer
Responsibilities:
- Analyze the functional requirements documents and prepare user stories as part of the requirements phase, following Agile/Scrum development.
- Use the Atlassian tool JIRA for project and issue tracking, raising user stories for development tasks.
- Develop the assigned user stories on time in C# (.NET Framework 3.5, Visual Studio 2010).
- Develop automated test scripts using NUnit.
- Write and execute unit and integration test cases.
Confidential
Associate Software Engineer
Responsibilities:
- Prepare the user stories for new requirements and improvements/RFCs.
- Fix the assigned RFCs (Requests for Change) in C#/.NET.
- Check the developed code into VSS and Subversion.
- Write unit test cases using NUnit.
- Use NCover for code coverage.
- Adhere to quality standards; conduct and participate in peer reviews.
- Assist in integration, functional, regression and media check testing.
Confidential
Intern
Responsibilities:
- Learn the firmware of the company’s products and become familiar with the working environment.
- Communicate with on-site employees and self-learn the technologies used for product development.
- Create batch files to automate nightly builds and to execute unit test cases.
- Write unit test cases for the developed code.
- Perform code coverage using NCover.