Big Data Developer Resume

Bloomfield, CT


  • Over 7 years of experience in the IT industry, including 3+ years of experience in Big Data practices and technologies such as HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Oozie, Flume, Spark, and Kafka.
  • 3 years of extensive experience in the design and development of multi-tier applications using JDK 1.8, J2EE, and Spring frameworks.
  • Expertise in developing Spark Streaming applications in Python and Scala using Spark RDDs, DataFrames, and Spark SQL.
  • Expertise in creating Kafka topics, MapR Streams and MapR Topics.
  • Highly skilled in integrating Kafka with Spark Streaming for high-speed data processing.
  • Worked with NoSQL databases like HBase and MapR-DB for information extraction and storage of large amounts of data.
  • Extensive experience in Shell scripting.
  • Experience with Oozie workflow engine in running jobs with actions that run Sqoop, Pig and Hive jobs.
  • Experience importing data from relational databases such as MySQL and Oracle into HDFS and Hive, and exporting it back, using Sqoop.
  • Extensive experience working with Oracle, DB2, SQL Server, and MySQL databases.
  • Experience in cluster monitoring tools like MapR Control System (MCS), Cloudera Manager.
  • Experience in developing real-time streaming pipelines using Apache Kafka, MapR Streams, and Spark Streaming.
  • Hands-on experience with cloud services such as AWS.
  • Knowledge of different Hadoop distributions such as Cloudera and MapR.
  • Worked on data streaming tools and ETL tools like StreamSets Data Collector and Attunity Replicate.
  • Used Agile (SCRUM) methodologies for Software Development.
  • Worked on Hive queries to perform data analysis, data transfer, and table design to load data into the Hadoop environment.
  • Work closely with business partners to understand business requirements and design solutions based on those requirements.
  • Extensively involved in writing PL/SQL, stored procedures, functions and packages.
  • Experience in writing UDFs and configuring CRON Jobs.
  • Good knowledge of reporting and data visualization tools like Oracle Data Visualization Desktop, Tableau, and Grafana.
  • Outstanding communication and presentation skills, willing to learn, adapt to new technologies and third-party products.


Hadoop/Big data: HDFS, Hive, Map Reduce, Spark, Sqoop, HBase, Kafka, Oozie

No SQL Databases: HBase, MapR-DB

Languages: Python, Scala, Core Java, Unix Shell scripts, SQL

Web/Application Server: Apache Tomcat

Databases: Oracle, DB2, SQL Server, MySQL

IDEs: Eclipse, IntelliJ, DB Visualizer

Other Tools & Packages: CVS, SVN, JUnit, Maven, ANT, GitHub, StreamSets Data Collector, Oracle DVD, Grafana, Tableau

SDLC Methodology: Agile, Waterfall model

Operating Systems: Linux, UNIX, Windows

Office Tools: MS Office, Word, PowerPoint


Confidential - Bloomfield, CT

Big Data Developer


  • Configured real-time streaming pipeline from DB2 to HDFS using Apache Kafka.
  • Created Kafka topics.
  • Developed a PySpark application to consume data from Apache Kafka topics and publish to HDFS and HBase.
  • Worked on Apache Flume to stream data from Oracle to Apache Kafka topics.
  • Managed Docker images using Quay.
  • Created Hive managed and external tables.
  • Used Hue and Cloudera Manager to monitor Spark jobs.
  • Developed Sqoop scripts to load data from Oracle to Hive external tables.
  • Worked on Grafana for real-time visualizations.

Environment: CDH 5.7.0, Apache Kafka 1.0.0, Hive 1.1.0, HBase 1.2.0, Hue, Cloudera Manager, Spark 1.6.0, Python 2.6.6, Sqoop, Oozie, Pig, IntelliJ, Kafka Connect Framework, Grafana, Git.

Confidential - Boston, MA

Big Data Developer


  • Involved in requirement analysis, design, development, and testing of the application.
  • Configured Kafka Connect JDBC with SAP HANA and MapR Streams for both real-time streaming and batch processing.
  • Created MapR Event Streams and Kafka topics.
  • Worked on Attunity Replicate to load data from SAP ECC to Apache Kafka topics.
  • Developed a Spark Streaming application using Python to stream data from MapR Event Streams and Apache Kafka topics to Hive and MapR-DB, and to stream data from one topic to another within MapR Event Streams.
  • Worked on DStreams (Discretized Streams), RDDs (Resilient Distributed Datasets), DataFrames, and Spark SQL to build the Spark Streaming application.
  • Involved in creating SQL queries to extract data, to perform joins on the tables in SAP HANA and MySQL.
  • Worked with NoSQL databases like HBase, creating tables to load large sets of semi-structured data coming from source systems.
  • Implemented Partitioning, Dynamic Partition, and Bucketing in Hive for efficient data access.
  • Used Hue and MapR Control System (MCS) to monitor and troubleshoot Spark jobs.
  • Developed Sqoop scripts to move data from MapR-FS to SAP HANA.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Installed and configured Kafka Connect JDBC on an AWS EC2 instance.
  • Created stored procedures in MySQL to improve data handling and ETL transactions.
  • Worked on data validation using Hive and wrote Hive UDFs.
  • Managed Linux and Windows virtual servers on AWS EC2.
  • Used the Jenkins AWS CodeDeploy plugin to deploy to AWS.
  • Configured the SAP HANA source connector with SAP HANA as the source and an Apache Kafka topic as the target for real-time streaming and batch processing.
  • Provisioned, installed and configured SAP HANA enterprise edition on AWS cloud EC2 instance.
  • Developed a streaming application to stream data from MapR ES to HBase.
  • Streamed data from Apache Kafka topics to the time series database OpenTSDB.
  • Built dashboards and visualizations on top of MapR-DB and Hive using Oracle Data Visualization Desktop. Built real-time visualizations on top of OpenTSDB using Grafana.
  • Worked on UNIX shell scripts and automation of the ETL processes using UNIX shell scripting.

Environment: MapR 6.0, Apache Kafka 1.0.0, Hive 2.1, HBase 1.1.8, Hue, MapR-DB, MapR-FS, Attunity Replicate, Spark 2.1.0, Python, AWS, SAP HANA, Sqoop, Oozie, Pig, IntelliJ, Kafka Connect Framework, DB Visualizer, Oracle Data Visualization Desktop, StreamSets Data Collector, MapR-ES, MySQL, Git.

Confidential - Minneapolis, MN

Big Data Developer


  • Worked on improving the performance of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and Spark RDDs.
  • Worked on MySQL to identify the tables and views required for export into HDFS.
  • Loaded data from MySQL into HDFS on the development cluster for validation and cleansing.
  • Created Apache Kafka topics.
  • Configured StreamSets Data Collector with Apache Kafka to stream real-time data from different sources (databases and files) into Kafka topics.
  • Developed a streaming application to stream data from Kafka topics to Hive using Spark and Python.
  • Worked on real-time and batch processing of data sources using Apache Spark, Elasticsearch, Spark Streaming, and Apache Kafka.
  • Created scripts for importing data into HDFS/Hive using Sqoop from DB2.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Spark SQL.
  • Worked on migrating Pig scripts and MapReduce programs to the Spark DataFrames API and Spark SQL to improve performance.
  • Conducted POCs for real-time streaming of data from MySQL to Hive and HBase.
  • Worked on SequenceFiles, RCFiles, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
  • Handled importing & exporting of large data sets from various data sources into HDFS and vice-versa using Sqoop, performed transformations using Hive and loaded data into HDFS.
  • Built dashboards and visualizations on top of Hive using Tableau and published those reports to Tableau Online accounts and in the browser using an iframe.
  • Monitored the Hadoop cluster through Cloudera Manager and implemented alerts based on error messages.
  • Worked on the JavaMail API to send text and email notifications to customers.
  • Loaded data from UNIX file system to HDFS.

Environment: Cloudera, Apache Kafka, HDFS, Python, Hive, Spark, Spark SQL, Pig, MapReduce, Sqoop, IntelliJ, Tableau, StreamSets Data Collector, UNIX, MySQL, Git.

Confidential - San Diego, CA

Data Engineer


  • Involved in testing SQL scripts for report development and resolved performance issues.
  • Performed data cleaning as required using SQL to ensure data quality, completeness, and accuracy.
  • Worked extensively with Flume for importing data from various web servers to HDFS.
  • Developed end-to-end Spark applications using Scala to perform various data cleansing, validation, transformation, and summarization activities according to the requirements.
  • Monitored systems and services through Cloudera Manager to keep the clusters available for the business.
  • Worked on Flume to load the log data from multiple sources directly into HDFS.
  • Developed Sqoop scripts for importing and exporting data into HDFS and Hive.
  • Used Zookeeper to store offsets of messages consumed for a specific topic and partition by a specific Consumer Group in Kafka.
  • Worked with HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
  • Scheduled the workflows using Oozie workflow scheduler.
  • Worked in Agile and used JIRA to maintain the project's user stories.
  • Worked with Sqoop import and export functionalities to handle large data set transfers between the DB2 database and HDFS.
  • Created Hive queries for extracting data and sending it to clients.

Environment: Apache Kafka, HDFS, Hive, Spark, Git, MySQL, Pig, Python, DB2, Cron Jobs, UNIX, Spark SQL, Sqoop, Oozie.


Java Developer


  • Involved in implementing the design across the key phases of the software development life cycle.
  • Involved in design, development and testing of the application.
  • Implemented the object-oriented programming concepts for validating the columns of the import file.
  • Responsible for creating RESTful Web services using JAX-RS.
  • Used DOM Parser to parse the xml files.
  • Implemented a complex back-end component using Java multi-threading to quickly retrieve counts from a large MySQL database (about 4 crore, i.e. 40 million, rows).
  • Worked in Agile development following the Scrum process, with sprints and daily stand-up meetings.
  • Developed front-end screens using JSP, HTML, jQuery, JavaScript, and CSS.
  • Participated in OOAD, domain modelling, and system architecture.
  • Used WinSCP to transfer files from the local system to other systems.
  • Wrote test cases for unit testing before the QA release.
  • Worked closely with the QA team and coordinated on fixes.

Environment: Java, Core Java, Apache Tomcat, Maven, JavaScript, RESTful Web Services, WebLogic, JBoss, Eclipse IDE, Apache CXF, FTP, HTML, CSS.


Java/J2EE Developer


  • Analyzed and modified existing code wherever required and participated in developing the design document.
  • Used Rational Rose for model-driven development and UML modelling.
  • Took an active role in the analysis, design, implementation, and deployment across the full software development life cycle (SDLC) of the project.
  • Developed the presentation layer using JSP, HTML, and CSS.
  • Performed form validations using JavaScript and Struts validators.
  • Used JDBC to establish connections with the Oracle database and communicated with the database using PL/SQL.
  • Participated in understanding of business requirements, design and development of the project.
  • Migrated the existing JSP/Servlets/Beans-based application to a Struts-based system.
  • Developed JSP pages and client-side validation using JavaScript.
  • Developed Web services for sending and getting data from different applications using.
  • Implemented the Front Controller design pattern.
  • Resolved critical bugs.

Environment: Java, J2EE Servlet, JSF 2, XML, JSON, HTML, CSS, jQuery, Spring 3.0, Log4j, Git, Maven, Eclipse, Apache Tomcat 6, and Oracle 11g.