
Sr. Hadoop Developer/ Big Data Resume


Baltimore, MD


  • Around 9 years of IT industry experience, including 6 years working with Apache Hadoop components such as HDFS, MapReduce, Spark, Hive, Pig, Sqoop, Oozie, Zookeeper, HBase, Cassandra, MongoDB and Amazon Web Services.
  • Experience in performing in-memory data processing and real-time streaming analytics using Apache Spark with Scala, Java and Python.
  • Developed applications for Distributed Environment using Hadoop, MapReduce and Python.
  • Developed MapReduce jobs to automate transfer of data from HBase.
  • Developed and maintained web applications on the Tomcat web server.
  • Experience in integrating Hadoop with Ganglia and have good understanding of Hadoop metrics and visualization using Ganglia.
  • Good experience working with Hortonworks Distribution and Cloudera Distribution.
  • Very good understanding of Hadoop architecture and its components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
  • Experience in data extraction and transformation using MapReduce jobs.
  • Proficient in working with Hadoop, HDFS, writing PIG scripts and Sqoop scripts.
  • Performed data analysis using Hive and Pig.
  • Expert in creating Pig and Hive UDFs using Java in order to analyze the data efficiently.
  • Experience in importing and exporting data using Sqoop from Relational Database Systems to HDFS and vice-versa.
  • Strong understanding of NoSQL databases such as HBase, MongoDB and Cassandra.
  • Experience in working on various Hadoop data access components like MapReduce, Pig, Hive, HBase, Spark and Kafka.
  • Strong understanding of Spark real-time streaming and SparkSQL, and experience in loading data from external data sources such as MySQL and Cassandra for Spark applications.
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
  • Extensive working experience with Amazon Web Services (AWS), using S3 for storage, EC2 for compute, and RDS and EBS.
  • Excellent programming skills at a higher level of abstraction using Scala and Java.
  • Well versed with job workflow scheduling and monitoring tools like Oozie.
  • Loaded streaming log data from various web servers into HDFS using Flume.
  • Experience in using Sqoop, Oozie and Cloudera Manager.
  • Experience with source control systems such as SVN, CVS and Git.
  • Experience in improving the search focus and quality in ElasticSearch by using aggregations and Python scripts.
  • Hands on experience in application development using RDBMS, and Linux shell scripting.
  • Have experience with working on Amazon EMR and EC2 Spot instances.
  • Solid understanding of relational database concepts.
  • Worked extensively with Unified Modeling Language (UML) tools, designing use cases, activity flow diagrams, class diagrams, and sequence and object diagrams using Rational Rose and MS Visio.
  • Very good experience in complete project life cycle (design, development, testing and implementation) of Client Server and Web applications.
  • Good Knowledge on Hadoop Cluster architecture and monitoring the cluster.
  • Adequate knowledge and working experience in Agile & Waterfall methodologies.
  • Support development, testing, and operations teams during new system deployments.
  • Practical knowledge on implementing Kafka with third-party systems, such as Spark and Hadoop.
  • Good team player and can work efficiently in multiple team environments and multiple products. Easily adaptable to the new systems and environments.
  • Possess excellent communication and analytical skills along with a can-do attitude.


Programming languages: C, Java, Python, Scala, SQL

HADOOP/BIG DATA: MapReduce, Spark, SparkSQL, PySpark, SparkR, Pig, Hive, Sqoop, HBase, Flume, Kafka, Cassandra, YARN, Oozie, Zookeeper, ElasticSearch

Databases: MySQL, PL/SQL, MongoDB, HBase, Cassandra.

Operating Systems: Windows, Unix, Linux, Ubuntu.

Web Development: HTML, JSP, JavaScript, JQuery, CSS, XML, AJAX.

Web/Application Servers: Apache Tomcat, Sun Java Application Server

Tools: IntelliJ, Eclipse, NetBeans, Nagios, Ganglia, Maven

Scripting: BASH, JavaScript

Version Controls: GIT, SVN


Sr. Hadoop Developer/ Big Data

Confidential - Baltimore, MD


  • Analyzed SQL scripts and designed the solution to implement using PySpark.
  • Used Spark Streaming to consume live data from Kafka and store the stream in HDFS.
  • Configured Kafka brokers to pipe server log data into Spark Streaming.
  • Completed data extraction, aggregation and analysis in HDFS using PySpark and stored the required data in Hive.
  • Worked on MySQL database to retrieve information from storage using Python.
  • Experienced in implementing and running Python code via shell scripts.
  • Responsible for developing a data pipeline with Apache NiFi and Spark/Scala to extract data from the vendor and store it in HDFS and Redshift.
  • Developed Python code to gather data from HBase (Cornerstone) and designed the solution to implement using PySpark.
  • Worked on integrating Python with Web development tools for developing Web Services in Python using XML, JSON.
  • Optimized poorly written Spark/Scala jobs by monitoring the YARN UI and increasing parallelism.
  • Worked with Python's unittest and pytest frameworks.
  • Used Python/ HTML / CSS to help the team implement dozens of new features in a massively scaled Google App Engine web application.
  • Loaded all datasets from source CSV files into Hive using Spark and into Cassandra using Spark/PySpark.
  • Installed and configured Hadoop MapReduce, HDFS, Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
  • Strong experience with Python automation, covering REST API automation and UI automation using Selenium WebDriver.
  • Good knowledge of Apache Cassandra for storing data in a cluster.
  • Developed and analyzed SQL scripts and designed the solution to implement using PySpark.
  • Built an ingestion framework using Apache NiFi to ingest files, including financial data, from SFTP into HDFS.
  • Developed a fully automated continuous integration system using Git and Jenkins.
  • Performed job functions using Spark APIs in Scala for real-time analysis and fast querying.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
  • Designed and developed scalable Azure APIs using the Flask web framework in Python and integrated them with Azure API Management, Logic Apps and other Azure services.
  • Perform data analysis on NoSQL databases such as HBase and Cassandra.
  • Provided a responsive, AJAX- and test-driven design using JavaScript, jQuery, AngularJS and Bootstrap.js. Used Subversion for version control.
  • Developed and implemented core API services using Python and Spark (PySpark).
  • Used PySpark to process and analyze the data.
  • Wrote test cases using the PyUnit test framework and Selenium automation testing for better manipulation of test scripts.
  • Leveraged cloud and GPU computing technologies, such as AWS and GCP, for automated machine learning and analytics pipelines.
  • Conducted systems design and feasibility studies, and recommended cost-effective cloud solutions such as Amazon Web Services (AWS), Microsoft Azure and Rackspace.
  • Worked on microservices for a continuous delivery environment using Docker and Jenkins.
  • Built and architected multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP.
  • Developed and deployed data pipelines in cloud platforms such as AWS and GCP.
  • Used Hive join queries to join multiple tables of a source system and load them into Elasticsearch.
  • Performed batch processing of data sources using Apache Spark and Elasticsearch.
  • Implemented a Python-based distributed random forest via Python streaming.
  • Worked on the MySQL migration project to make the system completely independent of the database being used.

Environment: Python, MapReduce, Spark, Hadoop, HBase, Scala, Kafka, Hive, Pig, Sqoop, RBAC, ACL, Pandas, Docker, ReactJS, Google APIs, SOAP, REST, IntelliJ, Azure APIs, Shell Scripting, Selenium, AWS.

Sr. Hadoop Developer/ Big Data

Confidential - Denver, CO


  • Involved in installation, configuration and maintenance of Hadoop clusters for application development with Cloudera distribution.
  • Developed Kafka consumer APIs in Scala for consuming data from Kafka topics.
  • Developed end-to-end, scalable, distributed data pipelines that receive data through the Kafka distributed messaging system and persist it into HDFS with Apache Spark using Scala.
  • Involved in performance tuning of Spark jobs using caching and taking full advantage of the cluster environment.
  • Worked on a framework that requires only the table names, schemas, and source file locations or Sqoop parameters, and generates the entire code, including Workflow.xml.
  • Performed advanced operations like text analytics and processing, using in-memory computing capabilities of Spark using Scala.
  • Experience querying data using Spark SQL and implementing Spark RDDs in Scala.
  • Experienced in working with scripting technologies such as Python and UNIX shell scripts.
  • Performed a POC on writing Spark applications in Scala, Python and R.
  • Worked on partitioning, bucketing, parallel execution and map-side joins to optimize Hive queries.
  • Used HiveQL to create Hive tables and write Hive queries for data analysis.
  • Collected log data from web servers and pushed it to HDFS using Flume and to the NoSQL database Cassandra.
  • Used Oozie workflows to manage and schedule jobs on a Hadoop cluster and used Zookeeper for cluster coordination services.
  • Started using Apache NiFi to copy data from the local file system to HDP.
  • Implemented best offer logic using Pig scripts and Pig UDFs.
  • Developed a data pipeline using Kafka, Spark and Hive to ingest, transform and analyze data.
  • Used NiFi for the transformation of data from different components of the Big Data ecosystem.
  • Worked with data sources such as Oracle, Netezza, MySQL and flat files, and with AWS components such as Amazon EC2 instances, S3 buckets and CloudFormation templates.
  • Developed Python scripts to update content in the database and manipulate files.
  • Used Qlik Sense to build customized interactive reports, worksheets, and dashboards.
  • Worked on different file formats such as Sequence files, XML files and Map files using MapReduce programs.
  • Developed and implemented core API services using Python and Spark (PySpark).
  • Strong expertise on MapReduce programming model with XML, JSON, CSV file formats.
  • Involved in managing and organizing developers with regular code review sessions using Agile and Scrum methodologies.
  • Implemented NiFi flow topologies to perform cleansing operations before moving data into HDFS.
  • Experience in implementing Spark RDD transformations and actions, DataFrames and case classes for the required data using Spark Core.
  • Migrated the computational code from HQL to PySpark.
  • Completed data extraction, aggregation and analysis in HDFS using PySpark and stored the required data in Hive.
  • Provided support to data analysts in running Pig and Hive queries.
  • Worked with HiveQL and Pig Latin.
  • Performed various POCs in data ingestion, data analysis and reporting using Hadoop, MapReduce, Hive, Pig, Sqoop, Flume and Elasticsearch.
  • Used Jira for bug tracking and Bitbucket for code hosting and code review.
  • Performed data migration to GCP.
  • Implemented an Apache Airflow DAG to find popular items in Redshift and ingest them into the main PostgreSQL database via a web service call.
  • Implemented Spark applications in a data processing project to handle data from various sources, creating DStreams and DataFrames on input data received from streaming services such as Kafka.

Environment: MapReduce, HDFS, Hive, Spark, Spark-SQL, Sqoop, IntelliJ, Apache Kafka, Java 7, Cassandra, Scala, Apache Pig, Apache Hive, Oozie, NiFi, Linux, AWS EC2, Agile development, Oracle 11g/10g, UNIX Shell scripting, Ambari, Tez, Eclipse, Qlik Sense and Cloudera.

Hadoop Developer/ Big Data

Confidential - New York, NY


  • Used Cassandra Query Language to design Cassandra database and tables with various configuration options.
  • Developed Pig UDFs for manipulating the data according to business requirements and also worked on developing custom Pig loaders.
  • Experience in integrating Hadoop with Ganglia and have good understanding of Hadoop metrics and visualization using Ganglia.
  • Involved in the review of functional and non-functional requirements.
  • Practical experience in developing Spark applications in Eclipse with Maven.
  • Strong understanding of Spark real-time streaming and SparkSQL.
  • Used Apache Kafka for collecting, aggregating, and moving large amounts of data from application servers.
  • Loading data from external data sources like MySQL and Cassandra for Spark applications.
  • Developed Python and shell scripts to automate the end-to-end implementation process of an AI project.
  • Experience in selecting and configuring the right Amazon EC2 instances and accessing key AWS services using client tools and AWS SDKs.
  • Responsible for setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
  • Knowledge of using AWS Identity and Access Management (IAM) to secure access to EC2 instances and of configuring auto-scaling groups using CloudWatch.
  • Firm understanding of optimizations and performance-tuning practices while working with Spark.
  • Good knowledge of compression and serialization to improve performance in Spark applications.
  • Performed interactive querying using SparkSQL.
  • Involved in designing and monitoring a multi-data-center Kafka cluster.
  • Responsible for importing real-time data from sources into Kafka clusters.
  • Developed predictive analytics using the Apache Spark Scala APIs.
  • Worked on different file formats like Sequence files, XML files and Map files using Map Reduce Programs.
  • Strong expertise on MapReduce programming model with XML, JSON, CSV file formats.
  • Designed and implemented partitioning (static and dynamic) and bucketing in Hive.
  • Practical knowledge on Apache Sqoop to import datasets from MySQL to HDFS and vice-versa.
  • Good knowledge on building predictive models focusing on customer service using R programming.
  • Experience in reviewing and managing Hadoop log files.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Experience in building batch and streaming applications with Apache Spark and Python.
  • Used libraries built on MLlib to perform data cleaning and used R for dataset reorganization.
  • Debugged CQL queries and implemented performance enhancement practices.
  • Strong knowledge on Apache Oozie for scheduling the tasks.
  • Practical knowledge on implementing Kafka with third-party systems, such as Spark and Hadoop.
  • Experience in configuring Kafka brokers, consumers and producers for optimal performance.
  • Knowledge of creating Apache Kafka consumers and producers in Java.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Practical knowledge of monitoring a Hadoop cluster using Nagios and Ganglia.
  • Experience with Git as a version control system.
  • Involved in loading data from UNIX file system to HDFS and developed UNIX scripts for job scheduling, process management and for handling logs from Hadoop.
  • Understanding technical specifications and documenting technical design documents.
  • Strong skills in agile development and Test-Driven development.
  • Have practical knowledge of implementing Internet of Things (IoT).

Environment: Hadoop Cloudera Distribution (CDH4), Java 7, Hadoop, Spark, SparkSQL, MLlib, R programming, Scala, Cassandra, IoT, MapReduce, Apache Pig, Apache Hive, HDFS, Sqoop, IntelliJ, Oozie, Kafka, Maven, Eclipse, Nagios, Ganglia, Zookeeper, AWS EC2, Git, Ambari, Tez, UNIX Shell scripting, Oracle 11g/10g, Linux, Agile development.

Hadoop Developer / Big Data

Confidential - Newport Beach, CA


  • Developed Spark scripts by using Scala as per the requirement.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Designed and implemented Incremental Imports into Hive tables.
  • Developed Apache Pig and Hive scripts to process the HDFS data.
  • Involved in defining job flows, managing and reviewing log files.
  • Involved in unit testing and delivered unit test plans and results documents using JUnit and MRUnit.
  • Supported MapReduce programs running on the cluster.
  • As a Big Data Developer, implemented solutions for ingesting data from various sources and processing the Data-at-Rest utilizing Big Data technologies such as Hadoop, MapReduce Frameworks, HBase, Hive, Oozie, Flume, Sqoop etc.
  • Configured, deployed and maintained multi-node Dev and Test Kafka clusters and implemented real-time data ingestion and processing using Kafka.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
  • Developed Spark scripts by using Scala shell commands as per the requirement.
  • Imported bulk data into HBase using MapReduce programs.
  • Performed analytics on time series data stored in HBase using the HBase API.
  • Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
  • Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
  • Extracted the data from Teradata into HDFS/Databases/Dashboards using Spark Streaming.
  • Responsible for continuous monitoring and managing Elastic MapReduce cluster through AWS console.
  • Wrote multiple Java programs to pull data from HBase.
  • Involved with File Processing using Pig Latin.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Created applications that monitor consumer lag within Apache Kafka clusters.
  • Experience in optimizing MapReduce algorithms using combiners and partitioners to deliver the best results; worked on application performance optimization for an HDFS cluster.
  • Worked on debugging and performance tuning of Hive and Pig jobs.
  • Used Hive to find correlations between customers' browser logs across different sites and analyzed them to build risk profiles for such sites.
  • Worked with the Spark ecosystem using Scala and Hive queries on different data formats such as text files and Parquet.
  • Implemented business logic by writing UDFs in Java and used various UDFs from Piggybank and other sources.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.

Environment: Java, Hadoop, MapReduce, Pig, Hive, Linux, Sqoop, Flume, Eclipse, AWS EC2, and Cloudera CDH 4.

Java/Hadoop Developer



  • Processed data into HDFS by developing solutions.
  • Analyzed the data using MapReduce, Pig and Hive and produced summary results from Hadoop for downstream systems.
  • Used Pig as ETL tool to do transformations, event joins and some pre-aggregations before storing the data onto HDFS.
  • Developed a data pipeline using Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
  • Used Sqoop to import and export data from HDFS to RDBMS and vice-versa.
  • Created Hive tables and involved in data loading and writing Hive UDFs.
  • Exported the analyzed data to the relational database MySQL using Sqoop for visualization and to generate reports.
  • Created HBase tables to load large sets of structured data.
  • Managed and reviewed Hadoop log files.
  • Worked on Data ingestion to Kafka and Processing and storing the data Using Spark Streaming.
  • Involved in providing inputs for estimate preparation for the new proposal.
  • Worked extensively with HIVE DDLs and Hive Query language (HQLs).
  • Developed UDF, UDAF and UDTF functions and implemented them in Hive queries.
  • Implemented Sqoop for large dataset transfers between Hadoop and RDBMSs.
  • Created MapReduce jobs to convert periodic XML messages into partitioned Avro data.
  • Used Sqoop widely in order to import data from various systems/sources (like MySQL) into HDFS.
  • Created components such as Hive UDFs for functionality missing in Hive for analytics.
  • Developed scripts and batch jobs to schedule a bundle (group of coordinators) which consists of various.
  • Used different file formats like Text files, Sequence Files, Avro.
  • Provided cluster coordination services through Zookeeper.
  • Assisted in creating and maintaining technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Assisted in Cluster maintenance, cluster monitoring, adding and removing cluster nodes and
  • Installed and configured Hadoop, MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.

Environment: Hadoop, HDFS, MapReduce, Hive, Kafka, Pig, Sqoop, HBase, Shell Scripting, Oozie, Oracle 11g.

Software Engineer



  • Designed, developed and executed data migration from a Db2 database to an Oracle database using Linux scripts, Java and SQL*Loader.
  • Played a key role on the team in articulating the design requirements for the development of automated tools that perform error-free configuration.
  • Developed UNIX and Java utilities for data migration from Db2 to Oracle. Sole developer and POC for the migration activity.
  • Developed JSP pages, Servlets and HTML pages as per requirement.
  • Developed the necessary Java Beans, PL/SQL procedures for the implementation of business rules.
  • Developed the user interface using JavaServer Pages (JSP), HTML and JavaScript for the presentation tier.
  • Developed JSP pages and client-side validation using JavaScript.
  • Developed a custom realm for the Apache Tomcat server to authenticate users.
  • Developed a front controller servlet to handle all requests.
  • Developed the web interface using JSP and developed Struts action classes.
  • Responsible for gathering both functional and non-functional requirements, performing impact analysis and testing the solutions on a build-by-build basis.
  • Coded in Java, JavaScript and HTML.
  • Used JDBC to provide database connectivity to database tables in Oracle.
  • Used WebSphere Application Server for application deployment.
  • Implemented Software Development Life Cycle (Requirements Analysis, Design, Development, Testing, Deployment and Support).

Environment: J2EE, IBM DB2, IBM WebSphere Application Server, EJB, JSP, Servlets, HTML, CSS, JavaScript, Oracle database, Unix Scripting and Windows 2000.

Java Developer



  • Communicated with clients for requirements gathering and explained the requirements to team members.
  • Analyzing the Requirements and Designing Screen Prototypes.
  • Involved in Project Documentation.
  • Involved in creation of Basic DB Architecture for the application.
  • Involved in adding solution to VSS.
  • Designing & Development of Screens.
  • Coded JS functions for client validations.
  • Created user controls for reusability.
  • Creation of Tables, Views, Packages, Sequences, Functions for all the modules of the project.
  • Developed Crystal Reports.
  • Integrating the functionality of all modules.
  • Involved in deploying the application.
  • Unit testing & integration testing.
  • Designing test plan, test cases and checking the validation.
  • Tested whether the application met the business requirements.
  • Implementation of the system at client Location.
  • Trained application users, interacted with the client and worked to understand any change requests from the client.
  • Responsible for Immediate Error Resolving.

Environment: Core Java, JavaScript, J2EE, Servlets, JSP, Design Patterns, JDBC, HTML, CSS, AJAX, Hibernate, WebLogic, Oracle 8i, ANT, LINUX, SVN, Windows XP
