
Senior Big Data Developer Resume



  • Highly motivated and skilled professional with 7+ years of IT experience in architecture, analysis, design, development, implementation, maintenance, and support, including developing strategies for deploying big data technologies to efficiently meet big data processing requirements.
  • 5+ years of experience in big data using the Hadoop framework and related technologies such as HDFS, HBase, MapReduce, Hive, Pig, Flume, Oozie, Sqoop, and ZooKeeper.
  • Knowledge of and experience with Spark using Python and Scala.
  • Knowledge of the NoSQL databases HBase, MongoDB, and Cassandra.
  • Experience working with Hadoop clusters on the Cloudera and Hortonworks distributions.
  • Experience includes Requirements Gathering, Design, Development, Integration, Documentation, Testing and Build.
  • Experience building Spark applications in Scala to ease the transition from Hadoop.
  • Experience extending Hive and Pig core functionality by writing custom UDFs.
  • Solid knowledge of Hadoop architecture and its core components (NameNode, DataNode, JobTracker, TaskTracker), as well as ecosystem tools such as Oozie, Scribe, Hue, Flume, and HBase.
  • Extensively worked on developing and optimizing MapReduce programs, Pig scripts, and Hive queries to create structured data for data mining.
  • Ingested data from RDBMS sources, performed data transformations, and exported the transformed data to Cassandra per business requirements.
  • Expertise in various AWS services for compute, storage, networking, databases, monitoring, and security.
  • Wrote many Shell and Python scripts to automate daily tasks and query data on AWS resources.
  • Set up AWS Config and built tools to track configuration changes, with automatic notifications sent through AWS SNS.
  • Designed roles and groups using AWS Identity and Access Management (IAM).
  • Working knowledge of AWS Lambda for serverless computing using Python.
  • Good experience with Kafka management tools such as Kafka Manager and Kafka Offset Monitor.
  • Implemented custom Kafka connectors (sink and source) for HDFS and MySQL.
  • Good experience with Kafka log aggregation, Kafka's commit-log re-sync mechanism that lets failed nodes restore their data, stream processing, and event sourcing.
  • Implemented Kafka security using Kerberos and SSL.
  • Implemented custom Spark-Kafka offset management using ZooKeeper.
  • Used KSQL to run SQL operations on Kafka data.
  • Provisioned and managed multi-tenant Hadoop clusters on a public cloud (Amazon Web Services) and on private cloud infrastructure (OpenStack).
  • Extensive experience implementing Kafka and engineering Kafka sizing, security, replication, and monitoring.
  • Strong experience working with Amazon Elastic MapReduce (EMR) and setting up environments on Amazon EC2 instances.
  • Loaded selected data into Cassandra for fast retrieval.
  • Worked with both Scala and Java; created frameworks for processing data pipelines through Spark.
  • Implemented a batch-processing solution for large volumes of unstructured data using the Hadoop MapReduce framework.
  • Knowledge of ETL methods for data extraction, transformation, and loading in corporate-wide ETL solutions, and of data warehouse tools for data analysis.
  • Experience in database design, data analysis, and SQL programming.
  • Experience writing custom classes, functions, procedures, and reusable components, as well as problem management and library controls.
  • Experienced in integrating Java-based web applications in a UNIX environment.
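Custom Spark-Kafka offset management with ZooKeeper, as listed above, usually follows one pattern: read the last committed offset for a partition from ZooKeeper, process a micro-batch starting there, and write the next offset back only after processing succeeds. The minimal Python sketch below models that pattern under stated assumptions: `InMemoryZk` is a hypothetical dictionary stand-in for a real ZooKeeper client (no Kafka or Spark APIs are used), a Python list stands in for one partition's log, and the znode path mirrors the layout used by Kafka's legacy ZooKeeper-based consumers.

```python
from typing import Callable, Dict, List, Optional


class InMemoryZk:
    """Hypothetical stand-in for a ZooKeeper client: znode path -> bytes."""

    def __init__(self) -> None:
        self._nodes: Dict[str, bytes] = {}

    def get(self, path: str) -> Optional[bytes]:
        return self._nodes.get(path)

    def set(self, path: str, data: bytes) -> None:
        self._nodes[path] = data


def offset_path(group: str, topic: str, partition: int) -> str:
    # Modeled on the znode layout of Kafka's legacy ZooKeeper-based consumers.
    return f"/consumers/{group}/offsets/{topic}/{partition}"


def load_offset(zk: InMemoryZk, group: str, topic: str, partition: int) -> int:
    data = zk.get(offset_path(group, topic, partition))
    # No stored offset yet: start from the beginning of the partition.
    return int(data.decode()) if data is not None else 0


def process_batch(
    zk: InMemoryZk,
    group: str,
    topic: str,
    partition: int,
    partition_log: List[str],
    handle: Callable[[str], None],
    max_records: int = 3,
) -> int:
    """Process one micro-batch, then commit the next offset (at-least-once)."""
    start = load_offset(zk, group, topic, partition)
    batch = partition_log[start:start + max_records]
    for record in batch:
        handle(record)
    # Commit only after every record in the batch was handled; a crash before
    # this line means the batch is re-read on restart (possible duplicates).
    next_offset = start + len(batch)
    zk.set(offset_path(group, topic, partition), str(next_offset).encode())
    return next_offset
```

Committing after processing gives at-least-once delivery; committing before processing would instead risk losing records if the job fails mid-batch.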


Hadoop/Big Data: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, and Spark (Python and Scala)

NoSQL Databases: HBase, Cassandra, MongoDB

Languages: C, C++, Java, J2EE, PL/SQL, Pig Latin, HiveQL, Unix shell scripts, R

Java/J2EE Technologies: Applets, Swing, JDBC, JSON, JSTL, JMS, JavaScript, JSP, Servlets, EJB, JSF, jQuery

Frameworks: MVC, Struts, Spring, Hibernate.

ETL: IBM WebSphere/Oracle

Operating Systems: Sun Solaris, UNIX, Red Hat Linux, Ubuntu Linux and Windows XP/Vista/7/8

Web Technologies: HTML, DHTML, XML, WSDL, SOAP

Web/Application servers: Apache Tomcat, WebLogic, JBoss

Databases: Oracle, SQL Server, MySQL

Tools and IDE: Eclipse, NetBeans, Developer, DB Visualizer.

Network Protocols: TCP/IP, UDP, HTTP, DNS


Senior Big Data Developer

Confidential, GA


  • Deployed and configured a 15-node Kafka cluster.
  • Implemented custom Kafka connectors (sink and source) for HDFS and MySQL.
  • Worked with various AWS services for compute, storage, networking, databases, monitoring, and security.
  • Wrote many Shell and Python scripts to automate daily tasks and query data on AWS resources.
  • Set up AWS Config and built tools to track configuration changes, with automatic notifications sent through AWS SNS.
  • Designed roles and groups using AWS Identity and Access Management (IAM).
  • Worked with AWS Lambda for serverless computing using Python.
  • Worked with Spark SQL, DataFrames, and pair RDDs.
  • Handled large DataFrames during the ingestion process itself using partitioning, Spark's in-memory capabilities, Spark broadcasts, efficient joins, and other transformations.
  • Wrote Spark UDFs for use in Spark SQL to help optimize code.
  • Actively applied Spark tuning techniques, such as caching RDDs and increasing the number of executors per node.
  • Worked with different data formats and compression codecs, such as JSON, Parquet, and Snappy, and wrote output to S3 with the desired partitioning.
  • Created workflows using the Databricks REST APIs to run production data.
  • Created metric files (JSON format) for the Kibana dashboards.
  • Developed Unix shell scripts to automate Spark SQL jobs.
  • Implemented data transformations using Spark and Scala.
  • Created Spark jobs using Scala.
  • Used Spark SQL to query the headers to learn about the composition of the data, allowing comparison of data from various sources.
  • Used Amazon Redshift to load data from the cloud.
  • Involved in gathering business requirements and preparing the information template used to identify data elements for future reporting needs.
  • Maintained documentation supporting system configuration, training, and user experience.
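The AWS Config change notifications sent through SNS, listed above, can be consumed by a Python Lambda function subscribed to the SNS topic. The sketch below is a minimal illustration, not the original tooling. It relies on the standard shape of an SNS-triggered Lambda event (`Records[].Sns.Message`, where `Message` is a string); the fields read from the message body (`messageType`, `configurationItem`, `resourceType`, `resourceId`, `configurationItemStatus`) are assumptions based on AWS Config's configuration-item change notifications and should be checked against a real payload.

```python
import json
from typing import Any, Dict, List


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Summarize AWS Config change notifications delivered through SNS."""
    changes: List[Dict[str, str]] = []
    for record in event.get("Records", []):
        # SNS delivers the published message body as a JSON string.
        message = json.loads(record["Sns"]["Message"])
        # Assumed message type for configuration-item changes; skip anything else.
        if message.get("messageType") != "ConfigurationItemChangeNotification":
            continue
        item = message.get("configurationItem", {})
        changes.append(
            {
                "resourceType": item.get("resourceType", "unknown"),
                "resourceId": item.get("resourceId", "unknown"),
                "status": item.get("configurationItemStatus", "unknown"),
            }
        )
    for change in changes:
        # In Lambda, print() output is written to CloudWatch Logs.
        print(f"Config change: {change['resourceType']} {change['resourceId']} ({change['status']})")
    return {"changeCount": len(changes), "changes": changes}
```

Because the handler only parses JSON, it can be exercised locally by passing a hand-built event dictionary and `None` as the context.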


Hadoop/Spark Developer

Confidential, GA


  • Implemented Spark jobs using Scala and Spark SQL for faster testing and processing of data.
  • Streamed data in real time using Spark with Kafka.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups and Hadoop log files.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Upgraded the Hadoop cluster from CDH3 to CDH4, set up a high-availability cluster, and integrated Hive with existing applications.
  • Explored Spark to improve the performance of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Installed Oozie workflow engine to run multiple Hive and Pig jobs.
  • Imported data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from Teradata into HDFS using Sqoop.
  • Worked extensively with Sqoop for importing metadata from Oracle.
  • Configured Sqoop and developed scripts to extract data from MySQL into HDFS.
  • Created HBase tables to store PII data in various formats coming from different portfolios.
  • Helped with the sizing and performance tuning of the Cassandra cluster.
  • Involved in the process of Cassandra data modelling and building efficient data structures.
  • Trained and mentored the analyst and test teams on the Hadoop framework, HDFS, MapReduce concepts, and the Hadoop ecosystem.
  • Assisted with adding Hadoop processing to the IT infrastructure.

Environment: Hadoop, MapReduce, HDFS, Hive, Java, SQL, Spark, Cloudera Manager, Storm, Cassandra, Pig, Sqoop, PL/SQL, MySQL, Windows, Hortonworks, Oozie, HBase

Hadoop Developer



  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Developed Oozie workflows to automate loading data into HDFS and pre-processing it with Pig.
  • Mentored the analyst and test teams in writing Hive queries.
  • Developed and maintained complex outbound notification applications that run on custom architectures, using diverse technologies including Core Java, J2EE, SOAP, XML, JMS, JBoss, and web services.
  • Used Nagios Remote Plugin Executor (NRPE) for the remote system monitoring.
  • Involved in running Hadoop jobs for processing millions of records of text data.
  • Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
  • Used the Ganglia PHP web front end to view the gathered monitoring information through web pages.
  • Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Loaded data from the Linux file system into HDFS.
  • Responsible for managing data from multiple sources.
  • Ran Hadoop Streaming jobs to process terabytes of XML data.
  • Assisted in exporting analysed data to relational databases using Sqoop.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Exported data to an RDBMS via Sqoop to check whether the power-saving program was successful.
  • Extensively used Sqoop to import data from RDBMS sources into HDFS.
  • Used ZooKeeper to coordinate the clusters.
  • Transformed, cleaned, and filtered the imported data using Hive and MapReduce.
  • Enabled speedy reviews and first-mover advantage by using Oozie to automate data loading into the Hadoop Distributed File System (HDFS) and Pig to pre-process the data.
  • Installed and configured MapReduce, Hive, and HDFS; implemented a CDH3 Hadoop cluster on CentOS; assisted with performance tuning and monitoring.
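Hadoop Streaming, used in the streaming jobs listed above, runs any executable as the mapper or reducer: each reads lines on stdin and writes tab-separated key/value lines to stdout, and the framework sorts mapper output by key before it reaches the reducer. The minimal Python sketch below shows that contract with a word-count style aggregation plus light cleaning (lowercasing, stripping punctuation). It is illustrative only: it treats input as plain text lines rather than XML records, and the script name `wc.py` in the usage note is hypothetical.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'word<TAB>1' for every cleaned word in the input lines."""
    for line in lines:
        for word in line.strip().lower().split():
            word = word.strip(".,;:!?\"'()")  # basic cleaning step
            if word:
                yield f"{word}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Sum counts per key; relies on input already sorted by key (the shuffle)."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{key}\t{total}"


if __name__ == "__main__" and len(sys.argv) > 1 and sys.argv[1] in ("map", "reduce"):
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

To try it locally, the shuffle can be simulated with `cat input.txt | python wc.py map | sort | python wc.py reduce`; on a cluster, the same script would be passed to the Hadoop Streaming jar through its `-mapper` and `-reducer` options (the jar's path depends on the distribution).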


Java Developer



  • Involved in the design, coding, and testing phases of the software development life cycle.
  • Designed use-case, sequence, and class diagrams (UML).
  • Developed rich web user interfaces using a pre-developed JavaScript library.
  • Created modules in Java, C++, and Python.
  • Developed JSP pages with Struts framework, Custom tags and JSTL.
  • Developed Servlets, JSP pages, Beans, JavaScript and worked on integration.
  • Developed a SOAP/WSDL interface to exchange usage, image, and terrain information from Geomaps.
  • Developed unit test cases for the classes using JUnit.
  • Developed stored procedures to extract data from the Oracle database.
  • Developed and maintained Ant scripts for builds in the testing and production environments.
  • Designed and developed user interface components using AJAX, jQuery, JSON, JSP, JSTL & Custom Tag library.
  • Involved in building and parsing XML documents using SAX parser.
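SAX parsing, listed above, is event-driven: the parser calls handler methods as it meets start tags, text, and end tags, so the whole document never has to be held in memory. The original work was in Java; to keep one language across these examples, the sketch below uses Python's standard-library `xml.sax`, whose `ContentHandler` callbacks (`startElement`, `characters`, `endElement`) follow the same model as Java's SAX `ContentHandler`. The `<record>` element, its `id` attribute, and the child field names are hypothetical.

```python
import xml.sax
from typing import Dict, List, Optional


class RecordHandler(xml.sax.ContentHandler):
    """Collects each hypothetical <record> element into a dict of its child fields."""

    def __init__(self) -> None:
        super().__init__()
        self.records: List[Dict[str, str]] = []
        self._current: Optional[Dict[str, str]] = None
        self._field: Optional[str] = None
        self._text: List[str] = []

    def startElement(self, name, attrs):
        if name == "record":
            self._current = {"id": attrs.get("id", "")}
        elif self._current is not None:
            # A child of <record>: start collecting its text.
            self._field = name
            self._text = []

    def characters(self, content):
        # The parser may deliver one text node in several chunks, so accumulate.
        if self._field is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == "record" and self._current is not None:
            self.records.append(self._current)
            self._current = None
        elif self._current is not None and name == self._field:
            self._current[name] = "".join(self._text).strip()
            self._field = None


def parse_records(xml_bytes: bytes) -> List[Dict[str, str]]:
    handler = RecordHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.records
```

The handler keeps only the record currently being read plus the finished results, which is the main reason to choose SAX over a DOM parser for large XML inputs.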

