
Sr. Big Data/Spark Developer Resume

Cincinnati, OH


  • 8 years of professional IT experience in the development, testing and support of web-based applications, including 4 years of experience in the Big Data ecosystem.
  • Experience with enterprise Hadoop distributions such as Hortonworks and Cloudera, and good knowledge of Amazon EMR and the MapR distribution.
  • Hands-on experience with big data ingestion tools such as Sqoop, Flume and Kafka.
  • Good working experience using Sqoop to import data from RDBMS into HDFS and export it back.
  • Experience capturing data and importing it into HDFS using Flume and Kafka for semi-structured data and Sqoop for existing relational databases.
  • Implemented and optimized Hadoop/MapReduce algorithms for Big Data analytics.
  • Used Scala and Python to convert Hive/SQL queries into RDD transformations in Apache Spark.
  • Implemented Hive Partitioning and Bucketing on the collected data in HDFS.
  • Experience developing data pipelines using Sqoop, Flume and Pig to extract data from weblogs and store it in HDFS.
  • Good experience analyzing data using Hive Query Language, Pig Latin and custom MapReduce programs in Java, along with User Defined Functions.
  • Hands-on experience with Hadoop components such as HDFS, MapReduce, JobTracker, NameNode, DataNode and TaskTracker.
  • Expertise in writing Spark RDD transformations and actions, DataFrames and case classes for the required input data, and in performing data transformations using Spark Core.
  • Created User Defined Functions (UDFs) and User Defined Aggregate Functions (UDAFs) in Pig and Hive.
  • Worked with MongoDB for distributed storage and processing.
  • Implemented collections and used the Aggregation Framework in MongoDB.
  • Implemented B-tree indexing on the data stored in MongoDB.
  • Good knowledge of MongoDB CRUD operations.
  • Responsible for using a Flume sink to remove data from the Flume channel and deposit it in NoSQL databases such as MongoDB.
  • Implemented a Flume NG MongoDB sink to load JSON-style data into MongoDB.
  • Excellent understanding and knowledge of NoSQL databases such as HBase, MongoDB and Cassandra, and their integration with Hadoop clusters.
  • Expertise in performing real-time analytics on big data using HBase and Cassandra.
  • Expertise in cluster management and configuration of Cassandra databases.
  • Proficient in NoSQL databases including HBase, Cassandra and MongoDB, and their integration with Hadoop clusters.
  • Experience designing and executing time-driven and data-driven Oozie workflows.
  • Responsible for fetching real-time data using Kafka and processing it using Spark and Scala.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
  • Experienced in integrating Kafka with Spark Streaming for high-speed data processing.
  • Good knowledge of setting up batch intervals, split intervals and window intervals in Spark Streaming.
  • Very good understanding and working knowledge of Object-Oriented Programming (OOP), multithreading in Core Java, J2EE, Web Services (REST, SOAP), JDBC, JavaScript and jQuery.
  • Experience configuring ZooKeeper to coordinate servers in clusters, schedule Spark jobs and maintain data consistency.
  • Used Amazon EMR to process Big Data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
  • Experienced with faceted reader search and full-text search data querying using Solr.
  • Coordinated with the Scrum team to deliver agreed user stories on time for every sprint.
  • Experience in the complete Software Development Life Cycle (SDLC) under both Waterfall and Agile methodologies.
  • Developed stored procedures and queries using PL/SQL.
  • Development experience with DBMSs such as Oracle, MS SQL Server, Teradata and MySQL.
  • Good understanding of end-to-end content lifecycle, web content management, content publishing/deployment, and delivery processes.
  • Good analytical, communication and problem-solving skills; eager to learn new technical and functional skills.


Big Data Ecosystem: Hadoop, MapReduce, Hive, Spark, YARN, Sqoop, Oozie, Pig, Flume, Kafka, Impala, Zookeeper

Languages: C, C++, Java, Scala, Python, SQL, PL/SQL, Pig Latin, HiveQL, Unix, Shell Scripting

Java Technologies: JavaBeans, JSP, Servlets, JDBC, Struts and JNDI

XML Technologies: XML, XSD, DTD, JAXP (SAX, DOM), JAXB

Web Design: HTML, DHTML, AJAX, JavaScript, jQuery, CSS, AngularJS, Ext JS and JSON

Development Tools: Eclipse, IntelliJ, Microsoft SQL Studio, Toad, NetBeans

Operating systems: UNIX, Linux, Mac OS and Windows variants


Sr. Big Data/Spark Developer

Confidential, Cincinnati, OH


  • Analyzed business requirements and prepared detailed specifications, following the project guidelines required for development.
  • Used Sqoop to import data from relational databases such as MySQL and Oracle.
  • Involved in importing structured and unstructured data into HDFS.
  • Responsible for fetching real-time data using Kafka and processing it using Spark and Scala.
  • Involved in loading data from REST endpoints to Kafka producers and transferring the data to Kafka brokers.
  • Used Kafka to import real-time weblogs and ingest the data into Spark Streaming.
  • Developed business logic using Kafka Direct Stream in Spark Streaming and implemented business transformations.
  • Built and implemented a real-time streaming ETL pipeline using the Kafka Streams API.
  • Used Hive to implement web interfacing and stored the data in Hive tables.
  • Migrated MapReduce programs into Spark transformations using Spark and Scala.
  • Configured Spark Streaming to consume continuous data from Kafka and store the stream data in HDFS.
  • Experienced with SparkContext, Spark SQL and Spark on YARN.
  • Implemented Spark SQL with various data sources such as JSON, Parquet, ORC and Hive.
  • Implemented Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster processing of data.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Loaded Avro, Parquet and text files into Spark using Java/Scala, created Spark DataFrames and RDDs to process the data, and saved the output in Parquet format in HDFS to load into the fact table using the ORC Reader.
  • Good knowledge of setting up batch intervals, split intervals and window intervals in Spark Streaming.
  • Implemented data quality checks using Spark Streaming and assigned pass and bad flags to the data.
  • Implemented Hive Partitioning and Bucketing on the collected data in HDFS.
  • Performed data querying and summarization using Hive and Pig, and created UDFs, UDAFs and UDTFs.
  • Implemented Sqoop jobs for large data exchanges between RDBMS and Hive clusters.
  • Extensively used ZooKeeper as a backup server and job scheduler for Spark jobs.
  • Knowledge of the MLlib (Machine Learning Library) framework for auto-suggestions.
  • Developed traits, case classes, etc. in Scala.
  • Developed Spark scripts using Scala shell commands per the business requirements.
  • Worked on the Cloudera distribution deployed on AWS EC2 instances.
  • Loaded real-time data into NoSQL databases such as Cassandra.
  • Used the DataStax Spark Cassandra Connector to store data in the Cassandra database from Spark.
  • Involved in NoSQL (DataStax Cassandra) database design, integration and implementation; wrote scripts and invoked them using cqlsh.
  • Well versed in data manipulation, compaction and tombstones in Cassandra.
  • Retrieved data from the Cassandra cluster by running queries in CQL (Cassandra Query Language).
  • Connected the Cassandra database to the Amazon EMR File System (EMRFS) to store the database in S3.
  • Used Amazon EMR to process Big Data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
  • Deployed the project on Amazon EMR with S3 connectivity for backup storage.
  • Well versed in using Elastic Load Balancing with Auto Scaling for EC2 servers.
  • Configured workflows that involve Hadoop actions using Oozie.
  • Experienced with faceted reader search and full-text search data querying using Solr.
  • Used Python for pattern matching in build logs to format warnings and errors.
  • Coordinated with the Scrum team to deliver agreed user stories on time for every sprint.

Environment: Hadoop YARN, Spark SQL, Spark Streaming, AWS S3, AWS EMR, GraphX, Scala, Python, Kafka, Hive, Pig, Sqoop, Solr, Cassandra, Cloudera, Oracle 10g, Linux.

Big Data Developer

Confidential, Cleveland, OH


  • Designed and deployed Hadoop clusters and various Big Data analytic tools, including Pig, Hive, HBase, Oozie, Sqoop, Kafka, Spark and Impala.
  • Imported weblogs and unstructured data using Apache Flume and stored them in a Flume channel.
  • Loaded CDRs from relational databases into the Hadoop cluster using Sqoop, and from other sources using Flume.
  • Developed business logic in a Flume interceptor in Java.
  • Implemented quality checks and transformations using a Flume interceptor.
  • Worked on ad hoc queries, indexing, replication, load balancing and aggregation in MongoDB.
  • Managed the MongoDB environment from availability, performance and scalability perspectives.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data.
  • Developed multiple MapReduce jobs in Java for data processing.
  • Developed workflows using custom MapReduce, Pig, Hive, and Sqoop.
  • Implemented real-time data ingestion using Kafka.
  • Integrated Kafka with Spark Streaming for high-speed data processing.
  • Developed multiple Kafka Producers and Consumers as per the software requirement specifications.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster processing of data.
  • Configured Spark Streaming to receive real-time data and store the stream data in HDFS.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Documented the requirements, including the existing code to be implemented using Spark, Hive, HDFS and Solr.
  • Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames and saved it in Parquet format in HDFS.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala.
  • Optimized HiveQL/Pig scripts by using Spark as the execution engine.
  • Worked extensively with Amazon Web Services (AWS) cloud services such as EC2, S3, EBS, RDS and VPC.
  • Provisioned and managed multi-tenant Hadoop clusters on a public cloud environment (Amazon Web Services) and on private cloud infrastructure (OpenStack).
  • Worked with cloud services such as Amazon Web Services (AWS).
  • Strong experience working with Elastic MapReduce (EMR) and setting up environments on AWS EC2 instances.
  • Experience working with Apache Solr for indexing and querying.
  • Experience using Solr and ZooKeeper.
  • Worked with Apache Spark, a fast and general engine for large-scale data processing, integrated with the functional programming language Scala.
  • Developed a POC using Scala, Spark SQL and MLlib along with Kafka and other tools as required, then deployed it on the YARN cluster.
  • Performance-tuned a Cassandra cluster to optimize it for writes and reads.
  • Performed data modeling, connected to Cassandra from Spark and saved summarized DataFrames to Cassandra.
  • Experience designing and executing time-driven and data-driven Oozie workflows.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop.

Environment: Hadoop, HDFS, MapReduce, Pig, Hive, Pig scripts, YARN, Sqoop, Oozie, Python, AWS, Shell Scripting, Impala, Spark, Spark SQL, Talend, Cassandra, Solr, ZooKeeper, Scala, Kafka, Cloudera, MongoDB.

Big Data Developer

Confidential, Beachwood, OH


  • Handled large amounts of data from different sources, and was involved in HDFS maintenance and loading of structured and unstructured data.
  • Involved in defining job flows, managing and reviewing log files.
  • Developed MapReduce jobs in Java to perform data cleansing and pre-processing.
  • Migrated large amounts of data from various databases such as Oracle, Netezza and MySQL to Hadoop.
  • Imported bulk data into HBase using MapReduce programs.
  • Developed Apache Pig and Hive scripts to process HDFS data.
  • Imported the weblogs using Flume.
  • Performed analytics on time-series data stored in HBase using the HBase API.
  • Designed and implemented Incremental Imports into Hive tables.
  • Involved in unit testing and delivered unit test plans and results documents using JUnit and MRUnit.
  • Wrote Hive jobs to parse logs and structure them in tabular format to facilitate effective querying of the log data.
  • Designed and implemented batch jobs using Sqoop, MR2, Pig and Hive.
  • Performed file processing using Pig Latin.
  • Scheduled jobs using Oozie workflow Engine.
  • Worked on various compression techniques like GZIP and LZO.
  • Ingested log data from various web servers into HDFS using Apache Flume.
  • Created Hive tables, loaded them with data and wrote Hive queries that run internally as MapReduce jobs.
  • Optimized MapReduce algorithms using combiners and partitioners to deliver the best results, and worked on application performance optimization for an HDFS cluster.
  • Implemented business logic by writing UDFs in Java and used various UDFs from Piggybank and other sources.
  • Used Hive to find correlations between customers' browser logs across different sites and analyzed them to build risk profiles for those sites.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Actively involved in SDLC phases (design, development, testing), code reviews, etc.
  • Actively participated in Scrum meetings and followed Agile methodology for implementation.

Environment: Java, Hadoop, MapReduce, Pig, Hive, Oozie, Linux, Sqoop, Flume, Oracle, MySQL, Eclipse, AWS EC2, Cloudera.

Jr. Big Data Developer

Confidential, Akron, OH


  • Analyzed the Hadoop cluster and various big data analytic tools, including Pig, Hive and Sqoop.
  • Developed simple and complex MapReduce programs in Java for data analysis on different data formats.
  • Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
  • Developed Spark program in Scala to analyze reports using Machine Learning models.
  • Implemented static and dynamic partitioning in Hive.
  • Customized the MapReduce framework at different levels, such as input formats, data types and partitioners.
  • Extensively used Sqoop to import/export data between RDBMS and Hive tables, performed incremental imports and created Sqoop jobs based on the last saved value.
  • Involved in loading data from the Linux file system to HDFS.
  • Developed scripts and Autosys jobs to schedule an Oozie bundle (group of coordinators) consisting of various Hadoop programs.
  • Created Oozie workflows to run multiple Hive jobs.
  • Used compression techniques such as Snappy in Hive tables to save storage and optimize data transfer over the network.
  • Created Hive tables, loaded them with data and wrote Hive queries that run internally as MapReduce jobs.
  • Used ZooKeeper to provide coordination services to the cluster.

Environment: Hadoop, Cloudera (CDH 4), HDFS, Hive, Flume, Sqoop, Pig, Java, Eclipse, Teradata, MongoDB, Ubuntu, UNIX, and Maven.

Java Developer



  • Understood the requirements, technical aspects and architecture of the existing system.
  • Helped design the application using the Spring MVC framework, and designed interactive front-end pages using HTML, JSP, JSTL, CSS, JavaScript, jQuery and AJAX.
  • Utilized various JavaScript and jQuery libraries, AJAX for form validation and other interactive features.
  • Involved in writing SQL queries for fetching data from Oracle database.
  • Developed a multi-tiered web application using J2EE standards.
  • Designed and developed Web Services to store and retrieve user profile information from database.
  • Used Apache Axis to develop web services and SOAP protocol for web services communication.
  • Used Spring DAO concept to interact with Database using JDBC template and Hibernate template.
  • Deployed and configured applications on application servers such as WebLogic, WebSphere and Apache Tomcat.
  • Followed Agile methodology and Scrum to deliver the product with cross-functional skills.
  • Used JUnit to test persistence and service tiers. Involved in unit test case preparation.
  • Hands-on experience with software configuration/change control processes and tools such as Subversion (SVN), Git, CVS and ClearCase.
  • Worked closely with onshore and offshore team members on development dependencies.
  • Involved in sprint planning, code review, and daily standup meetings to discuss the progress of the application.

Environment: HTML, Ajax, Servlets, JSP, SQL, JavaScript, CSS, XML, SOAP, Tomcat Server, Hibernate, JDBC, Agile, Git, SVN.

Java Developer



  • Involved in gathering business requirements, analyzing the project and creating UML diagrams such as use cases, class diagrams, sequence diagrams and flowcharts for the Optimization module using Microsoft Visio.
  • Configured faces-config.xml for the page navigation rules and created managed and backing beans for the Optimization module.
  • Developed an enterprise application using Spring MVC, JSP and MySQL.
  • Developed client-side web service components using JAX-WS.
  • Extensively used JUnit to test the application code for server-client data transfer.
  • Developed and enhanced products in line with the design and business objectives.
  • Used SVN as a repository for managing and deploying application code.
  • Successfully participated in system integration and user acceptance testing.
  • Developed the front end using JSTL, JSP, HTML and JavaScript.
  • Used XML to maintain the Queries, JSP page mapping, Bean Mapping etc.
  • Used Oracle 10g as the backend database and wrote PL/SQL scripts.
  • Maintained and modified the system based on user feedback using OO concepts.
  • Implemented database transactions using Spring AOP and Java EE CDI capabilities.
  • Enhanced the organization's reputation by fulfilling requests and exploring opportunities.
  • Performed business analysis, built reporting services and integrated with Sage Accpac (ERP).
  • Developed new and maintained existing functionality using Spring MVC and Hibernate.
  • Developed test cases for integration testing using JUnit.
  • Created new and maintained existing web pages built with JSP and Servlets.

Environment: Java, Spring MVC, Hibernate, MSSQL, JSP, Servlet, JDBC, ODBC, JSF, NetBeans, GlassFish, Spring, Oracle, MySQL, Sybase, Eclipse, Tomcat, WebLogic Server
