Sr Big Data Engineer/Hadoop DevOps Developer Resume

Sunnyvale, CA


  • 8+ years of overall experience in the design and deployment of data management and data warehousing projects, in roles including Data Modeler and Data Analyst, on Big Data technologies.
  • 3+ years of Hadoop experience in the design and development of Big Data applications involving Apache Hadoop MapReduce, HDFS, Hive, HBase, Pig, Oozie, Sqoop, Flume and Spark.
  • Expertise in developing solutions around NoSQL databases like MongoDB and Cassandra.
  • Experience with multiple Hadoop distributions, including Cloudera and Hortonworks.
  • Excellent understanding of Hadoop architecture, including MapReduce MRv1 and MRv2 (YARN).
  • Developed multiple MapReduce programs to process large volumes of semi-structured and unstructured data files using different MapReduce design patterns.
  • Strong experience in writing MapReduce jobs in Java and Pig.
  • Experience with various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins when writing MapReduce jobs.
  • Excellent understanding of Hadoop components such as HDFS, JobTracker, TaskTracker, NameNode and DataNode, and of the MapReduce programming paradigm.
  • Worked extensively with semi-structured data (fixed-length and delimited files) for data sanitation, report generation and standardization.
  • Excellent hands-on experience in analyzing data using Pig Latin, HQL, HBase and MapReduce programs in Java.
  • Used Spark with Kafka to stream real-time data.
  • Developed unit test cases using the JUnit, EasyMock and MRUnit testing frameworks.
  • Experience working with MapReduce programs on Apache Hadoop to process Big Data.
  • Good knowledge of Linux shell scripting and shell commands.
  • Hands-on experience with compression codecs such as Snappy and Gzip.
  • Involved in the requirements and design phases of a streaming Lambda Architecture for real-time processing with Spark and Kafka.
  • Developed real-time data synchronization systems using reactive programming tools such as Akka and Kafka.
  • Good understanding of data mining and machine learning techniques.
  • Experience in importing and exporting data between HDFS and relational database systems using Sqoop.
  • Experience in importing streaming logs and aggregating the data to HDFS through Flume.
  • Expert in Java 1.8 lambdas, streams and type annotations.


Hadoop Core Services: HDFS, Map Reduce, Spark, YARN

Hadoop Distribution: Hortonworks, Cloudera, Apache

NoSQL Databases: MongoDB, Cassandra

Hadoop Data Services: Hive, Pig, Sqoop, Flume

Hadoop Operational Services: ZooKeeper, Oozie

Monitoring Tools: Ambari, Cloudera Manager

Cloud Computing Tools: Amazon AWS

Languages: C, Java, Python, SQL, PL/SQL, Pig, HiveQL, Unix Shell Scripting

Databases: Oracle, MySQL, MongoDB

Operating Systems: UNIX, Windows, Linux

Build Tools: Jenkins, Maven, Ant

Development Tools: Microsoft SQL Studio, Toad, Eclipse

Development Methodologies: Agile/Scrum, Waterfall


Sr Big Data Engineer/Hadoop DevOps Developer

Confidential, Sunnyvale, CA


  • Architected and built innovative application and processing layers over Hadoop to meet specific enterprise requirements.
  • Delivered a reliable, fully automated deployment process for all cluster environments using Chef, Jenkins and JFrog, with Ruby and Python programs.
  • Implemented and maintained branching and build/release strategies using SVN, Maven and Git.
  • Responsible for using the AWS Console and AWS CLI to deploy and operate AWS services, specifically VPC, EC2, S3, EBS, IAM, ELB, CloudFormation and CloudWatch.
  • Responsible for MapR Hadoop cluster administration (installations, upgrades, configuration and maintenance, performance tuning, cluster monitoring and migrations).
  • Configured monitoring systems with comprehensive checks and alerts on Nagios, MCS and CloudWatch.
  • Managed clusters and routine system backups, scheduled jobs, and enabled system and network logging on servers; handled performance tuning, server status monitoring and maintenance.
  • Created data pipeline strategies, analyzed data and business processes, and recommended solutions for analytics reporting needs.
  • Created Unix shell, Python, Spark and Hive programs to convert between various data formats for data massaging.

Environment: MapR, AWS (EC2, S3, VPC, ELB, EBS, Route 53, SNS, CloudFormation, CloudWatch), Java, Spark, Python, Ruby, Perl, Chef, Jenkins, Maven, Git, SVN, Cron, Jira, Rally, Tomcat, Oracle Big Data Discovery, Looker, TensorFlow, Tableau, Nagios.

Hadoop/Spark Big Data Developer

Confidential, Richfield, MN


  • Worked on Sqoop jobs for ingesting data from Oracle and MySQL.
  • Created Hive external tables for querying the data.
  • Used Spark DataFrame APIs to ingest S3 data.
  • Wrote scripts to load data from Redshift.
  • Processed complex/nested JSON and CSV data using the DataFrame API.
  • Applied transformation rules on top of DataFrames.
  • Scheduled Spark jobs using Oozie.
  • Processed Hive, CSV, JSON and Oracle data.
  • Validated the source and final output data.
  • Tested the data using the Dataset API.
  • Partitioned (dynamic as well as static partitions) and bucketed tables to improve query performance.
  • Improved HQL performance by analyzing the query plan with EXPLAIN and applying optimization techniques such as map-side joins, join optimization and container tuning (CPU cores, memory, etc.).
  • Applied different optimization and transformation rules based on new Spark versions.
  • Debugged scripts to minimize data shuffling.
  • Analyzed data and created reports using Tableau.
  • Created dashboards in Tableau.

Environment: Hadoop, Spark/Scala, MapReduce, HDFS, HBase, Hive, Pig, Java, SQL, Sqoop, Flume, Oozie, UNIX, Maven, Eclipse

Senior Spark Developer/Hadoop Developer

Confidential, Pottsville, PA


  • Implemented Spark RDD Transformations and Actions.
  • Developed DataFrames and case classes for the required input data and performed the data transformations using Spark Core.
  • Used Hive queries in Spark SQL for analyzing and processing the data.
  • Used Scala to perform transformations and apply business logic.
  • Implemented partitioning, dynamic partitioning, indexing and bucketing in Hive.
  • Loaded the dataset into Hive for ETL Operation.
  • Stored processed data in Parquet file format.
  • Streamed data from the data source using Kafka.
  • Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Worked with SparkContext, Spark SQL, DataFrames, pair RDDs and Spark Streaming.
  • Developed a Flume ETL job with an HTTP source and an HDFS sink.
  • Implemented advanced procedures like text analytics and processing using the in-memory computing capabilities of Spark.
  • Involved in creating Hive tables, loading them with data and writing Hive queries, which run as MapReduce jobs in the backend.
  • Imported and exported data to and from HDFS and Hive using Kafka.
  • Developed data pipeline using Sqoop to ingest customer behavioral data into HDFS for analysis.
  • Monitored jobs using the Hadoop ResourceManager and Ambari Views.

Hadoop Developer

Confidential, Boca Raton, FL


  • Responsible for coding MapReduce programs and Hive queries, and for testing and debugging the MapReduce programs.
  • Responsible for installing, configuring and managing a Hadoop cluster spanning multiple racks.
  • Developed Pig Latin scripts to analyze large data sets in areas where extensive coding needed to be reduced.
  • Used Sqoop to extract data from a relational database into Hadoop.
  • Involved in performance enhancements of the code and optimization by writing custom comparators and combiner logic.
  • Worked closely with data warehouse architect and business intelligence analyst to develop solutions.
  • Good understanding of job schedulers such as the Fair Scheduler, which assigns resources so that all jobs get, on average, an equal share of resources over time, and familiarity with the Capacity Scheduler.
  • Developed the presentation layer using JSP, HTML, CSS and client side validations using JavaScript.
  • Collaborated with the ETL/Informatica team to determine the necessary data models and UI designs to support Cognos reports.
  • Used Eclipse for Java/J2EE application development, JBoss as the application server, Node.js for standalone UI testing, Oracle as the backend, Git for version control and Ant for build scripts.
  • Involved in coding, code reviews and JUnit testing; prepared and executed unit test cases.
  • Responsible for performing peer code reviews, troubleshooting issues and maintaining status reports.
  • Involved in creating Hive tables, loading them with data and writing Hive queries, which run as MapReduce jobs in the backend.
  • Involved in identifying possible ways to improve the efficiency of the system. Involved in requirements analysis, design, development and unit testing using MRUnit and JUnit.
  • Prepared daily and weekly project status reports and shared them with the client.
  • Supported setting up the QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop.

Environment: Apache Hadoop, Java (JDK 1.7), Oracle, MySQL, Hive, Pig, Sqoop, Linux, CentOS, JUnit, MRUnit, Cloudera

Java Developer / Hadoop Developer

Confidential, Denver, CO


  • Administered, installed, upgraded and managed CDH3, Pig, Hive and HBase.
  • Architected and implemented the product platform, as well as all data transfer, storage and processing from the data center to Hadoop file systems.
  • Experienced in defining job flows.
  • Implemented CDH3 Hadoop cluster on CentOS.
  • Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning and slots configuration.
  • Wrote custom MapReduce programs in Java for data processing.
  • Imported and exported data into HDFS and Hive using Sqoop, and used Flume to extract data from multiple sources.
  • Responsible for managing data coming from different sources.
  • Supported MapReduce programs running on the cluster.
  • Involved in loading data from the UNIX file system to HDFS.
  • Created Hive tables to store data in HDFS, loaded data and wrote Hive queries that run internally as MapReduce jobs.
  • Used Flume to channel data from different sources to HDFS.
  • Created HBase tables to store variable data formats of PII data coming from different portfolios.
  • Implemented best income logic using Pig scripts. Wrote custom Pig UDFs to analyze data.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Provided cluster coordination services through ZooKeeper.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.

Environment: Hadoop, MapReduce, Hive, HBase, Flume, Pig, ZooKeeper, Java, ETL, SQL, CentOS, Eclipse.
