Big Data Engineer Resume
SUMMARY:
- 6 years of experience in software development, including 4 years on the Big Data/Hadoop ecosystem: Hive, Spark, Scala, MapReduce, YARN, and Elastic MapReduce (AWS EMR).
- Hands-on experience with Apache Hive (HQL), MySQL, HDFS, AWS, and shell scripting. Worked with the MR, Tez, and Spark execution engines for Hive.
- Experience running AWS EMR on EC2 instances, taking advantage of its elasticity.
- Experience writing Spark jobs in Scala and MapReduce jobs in Java (a representative sketch follows this summary).
- Experience migrating Greenplum, Netezza, and Oracle SQL programs to Hadoop and the cloud using Apache Hive, AWS EMR, and S3.
- Experience migrating Pig scripts to Spark jobs using Scala.
- Experience leading a team and serving as Scrum Master.
- Strong experience optimizing HQL queries and resource utilization on distributed clusters based on input data size.
- Experience working with the Java MapReduce and HBase APIs.
- Hands-on experience running jobs on petabyte-scale data.
- Strong knowledge of algorithms and data structures, object-oriented programming, and the software development life cycle.
- Experience with the Hortonworks and Cloudera Hadoop distributions.
- Hands-on experience with Big Data technologies such as Pig, Sqoop, and Oozie.
- Hands-on experience with the version control systems SVN and Git.
- Experience designing and coding web applications in core Java/Java EE, XML, and AJAX using the MVC architecture.
- Experience in all phases of the SDLC using Agile software development methodology.
- Experience coding and debugging in C# using the .NET Framework and Visual Studio.
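A minimal sketch of the kind of Spark-on-EMR job described above: a Scala job that reads raw data from S3 and writes it out as a partitioned Hive table so downstream HQL can prune partitions. All bucket, database, table, and column names are hypothetical placeholders, not taken from any actual project.

```scala
import org.apache.spark.sql.SparkSession

object S3ToHiveJobSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("S3ToHiveJobSketch")
      .enableHiveSupport() // allows Spark to create and write Hive tables
      .getOrCreate()

    // Read raw CSV input from S3 (hypothetical bucket/prefix).
    val orders = spark.read
      .option("header", "true")
      .csv("s3://example-bucket/raw/orders/")

    // Partitioning by date lets Hive/Spark prune partitions at query time,
    // one of the standard HQL optimization levers mentioned above.
    orders.write
      .partitionBy("order_date")
      .mode("overwrite")
      .saveAsTable("analytics.orders")

    spark.stop()
  }
}
```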
TECHNICAL SKILLS:
Languages: Java, Scala, HQL (Hive query language), SQL, SparkSQL, Shell Scripting (Bash)
Big Data: Apache Hadoop, Apache Spark, HDFS, MapReduce, Apache Hive, Amazon S3, Amazon Elastic MapReduce (EMR), EC2, Apache HBase, Apache Sqoop, Apache Pig
Databases: Oracle, Greenplum, Netezza, Postgres, MySQL, AWS RDS
Tools: Splunk monitoring tool, Jenkins, Ant, JIRA, Crucible/Fisheye, Aqua Data Studio, pgAdmin for Greenplum and Postgres, Aginity for Netezza
Version Control: Subversion and Git
Methodologies: Agile, Waterfall
EXPERIENCE:
Confidential
Big Data Engineer
Responsibilities:
- Analyze and work on development tasks in every sprint.
- Write Spark jobs in Scala and MapReduce jobs in Java, and operationalize them using shell scripts and JAMS jobs.
- Analyze and work on enhancements.
- Support production issues.
- Participate in spikes and POCs.
- Write unit tests for both Java and Scala code.
- Migrate Pig scripts to Spark jobs using Scala (see the migration sketch after this list).
- Deploy the code to the QA environment and test an end-to-end run on an AWS cluster.
- Perform performance analysis of Spark and MapReduce jobs on EC2 clusters with different configurations.
- Perform parallel runs and verify that the build produces the expected results.
- Use Spark to compare large datasets between the parallel and production environments (see the comparison sketch after this list).
- Port legacy C++ code to a new Java framework for one of the OATS components.
- Fix bugs after the initial Java port, write unit tests, perform parallel runs, and compare the legacy C++ output with the new Java output.
- Write scripts to automate the data retrieval and comparison.
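The Pig-to-Spark migrations mentioned above typically map Pig relational operators onto the DataFrame API. A minimal sketch under that assumption follows; the Pig script, paths, and schema are hypothetical examples, not code from the actual migration.

```scala
// Hypothetical Pig script being migrated:
//   logs    = LOAD 's3://example-bucket/logs' USING PigStorage('\t')
//             AS (user:chararray, bytes:long);
//   grouped = GROUP logs BY user;
//   totals  = FOREACH grouped GENERATE group, SUM(logs.bytes);
//   STORE totals INTO 's3://example-bucket/totals';
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

object PigToSparkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("PigToSparkSketch").getOrCreate()

    // LOAD ... AS (schema) becomes a schema-aware delimited read.
    val logs = spark.read
      .option("sep", "\t")
      .schema("user STRING, bytes LONG")
      .csv("s3://example-bucket/logs")

    // GROUP BY + FOREACH ... GENERATE SUM(...) becomes groupBy/agg.
    logs.groupBy("user")
      .agg(sum("bytes").alias("total_bytes"))
      .write.mode("overwrite")
      .csv("s3://example-bucket/totals")

    spark.stop()
  }
}
```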
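For the parallel-run comparisons, one straightforward approach is a two-way set difference over the two outputs. A sketch assuming Parquet outputs; the S3 paths are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object ParallelRunCompareSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ParallelRunCompareSketch").getOrCreate()

    val prod     = spark.read.parquet("s3://example-bucket/prod/output/")
    val parallel = spark.read.parquet("s3://example-bucket/parallel/output/")

    // Rows present in one environment but missing from the other;
    // both counts should be zero when the parallel run matches prod.
    val onlyInProd     = prod.except(parallel)
    val onlyInParallel = parallel.except(prod)

    println(s"Rows only in prod: ${onlyInProd.count()}")
    println(s"Rows only in parallel: ${onlyInParallel.count()}")

    spark.stop()
  }
}
```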
Confidential
Big Data Engineer
Responsibilities:
- Analyze pattern requirements, create user stories and sub-tasks, and assign story points and estimated hours for the sub-tasks in JIRA.
- Lead a team of 4 developers while setting up an offshore team, and serve as Scrum Master.
- Design the implementation and walk stakeholders through the design.
- Create Hive external tables on top of S3 data, add partitions, and code to the requirements using Hive (HQL), Java UDFs, and shell scripting (see the sketch after this list).
- Integrate the developed pattern with the Pattern Toolkit Component (PTC) framework for running the pattern on AWS.
- Integrate the patterns with the Spark environment and run Hive on top of Spark.
- Perform performance analysis of Hive jobs on the Spark, Tez, and MR execution engines.
- Spin up clusters on AWS, run the pattern step by step, and copy the resulting output of the Hive code back to S3 in the final step.
- Perform code walkthroughs and participate in peer reviews.
- Deploy the pattern and hand over the build label to the QA team for further testing.
- Help the QA team write Ant targets to simulate the execution process in the QA environment.
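A minimal sketch of the external-table setup described above, issued through Spark's Hive support; the database, table, columns, partition value, and S3 locations are hypothetical. For the engine comparison in the same list, the corresponding Hive-side switch is `SET hive.execution.engine=mr|tez|spark;` before running the same HQL.

```scala
import org.apache.spark.sql.SparkSession

object ExternalTableSetupSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ExternalTableSetupSketch")
      .enableHiveSupport()
      .getOrCreate()

    // External table over data that already lives in S3, partitioned by load date.
    spark.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS patterns.events (
        event_id STRING,
        payload  STRING
      )
      PARTITIONED BY (load_date STRING)
      STORED AS PARQUET
      LOCATION 's3://example-bucket/patterns/events/'
    """)

    // Register a newly landed partition so queries can see it.
    spark.sql("""
      ALTER TABLE patterns.events
      ADD IF NOT EXISTS PARTITION (load_date = '2016-01-01')
      LOCATION 's3://example-bucket/patterns/events/load_date=2016-01-01/'
    """)

    spark.stop()
  }
}
```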
Confidential, Charlotte
Freelancer/Volunteer
Responsibilities:
- Assist in installing and maintaining a multi-terabyte Cloudera Hadoop cluster.
- Manage data coming from different sources.
- Support application users in debugging their MapReduce applications.
- Assist with Hadoop maintenance and ongoing support.
- Help PhD students with data analysis tasks using Hive and Pig.
Confidential
Associate Software Engineer
Responsibilities:
- Analyze the functional requirements documents and prepare user stories as part of the requirements phase, following Agile/Scrum development.
- Use the Atlassian tool JIRA for project and issue tracking, raising user stories for development tasks.
- Develop the assigned user stories on time in C# (.NET Framework 3.5, Visual Studio 2010).
- Develop automated test scripts using NUnit.
- Write and execute unit and integration test cases.
Confidential
Associate Software Engineer
Responsibilities:
- Prepare the user stories for new requirements and improvements/RFCs.
- Fix the assigned RFCs (Requests for Change) in C#/.NET.
- Check the developed code into VSS and Subversion.
- Write unit test cases using NUnit.
- Use NCover for code coverage.
- Adhere to quality standards; conduct and participate in peer reviews.
- Assist in integration, functional, regression and media check testing.
Confidential
Intern
Responsibilities:
- Learn the firmware of the company’s products and become familiar with the working environment.
- Communicate with on-site employees and self-learn the technologies used for product development.
- Create batch files to automate nightly builds and to execute unit test cases.
- Write unit test cases for the developed code.
- Perform code coverage using NCover.