Spark Developer Resume



  • 8 years of experience in enterprise application development, web applications and client-server technologies using languages and tools such as Python, Java, J2EE, JSP and Servlets.
  • 4+ years of experience in the design, development, maintenance and support of big data analytics using Hadoop ecosystem tools: Pig, Hive, MapReduce, Spark, Kafka and Flume.
  • Over 2 years of experience in Spark SQL, Spark Streaming, MLlib and GraphX.
  • Hands-on experience in developing and deploying enterprise applications using major components of the Hadoop ecosystem such as Hadoop 2.x, MapReduce 2.x, YARN, Hive, Pig, HBase, Flume, Sqoop, Spark, Storm, Kafka, Oozie and ZooKeeper.
  • Experience in designing and developing POCs in Scala, deploying them on a YARN cluster, and comparing the performance of Spark with Hive and SQL/Oracle.
  • Experience in converting Hive/SQL queries into Spark transformations using Python.
  • Hands-on experience with the NoSQL databases HBase, MongoDB and Cassandra.
  • Extensive experience in Cassandra and HBase database architecture.
  • Knowledge of cloud technologies such as AWS.
  • Experience in ETL development using Kafka, Flume and Sqoop.
  • Good knowledge of Kafka and messaging systems.
  • Good experience in Hive partitioning and bucketing, performing different types of joins on Hive tables, and implementing Hive SerDes such as Regex, JSON and Avro.
  • Configured and implemented applications with messaging systems such as Kafka to guarantee data quality in high-speed processing.
  • Extended Pig and Hive core functionality by writing custom UDFs.
  • Wrote ad hoc queries for analyzing data using HiveQL.
  • Excellent working experience with SQL, PL/SQL and Oracle.
  • Experience in UNIX shell scripting and a good understanding of OOP, OOAD, data structures and design patterns.
  • Comprehensive knowledge and experience in process improvement, normalization/de-normalization, data extraction, data cleansing and data manipulation in Hive.
  • Wrote multiple MapReduce programs in Java for data extraction, transformation and aggregation from multiple file formats, including XML, JSON, CSV and compressed file formats.
  • Familiar with data architecture, including data ingestion pipeline design, Hadoop information architecture, data modeling, data mining, machine learning and advanced data processing.
  • Extensive experience building and deploying multi-module projects using Ant, Maven and CI servers such as Jenkins.
  • Experience in working with web development technologies such as HTML, CSS, JavaScript and Python.
  • Extracted data from log files and pushed it into HDFS using Flume.
  • Scheduled workflows using the Oozie workflow engine.
  • Hands-on experience in designing and coding web applications using Core Java and J2EE technologies such as Spring, Hibernate, JMS and AngularJS.
  • Extensive experience in SOAP and RESTful web services.
  • Extensive knowledge of networking protocols such as TCP/IP, FTP, HTTP and HTTPS, and of socket programming.
  • Good experience working with Agile, Scrum and Waterfall methodologies.
  • Works successfully in fast-paced environments, both independently and in collaborative teams.


Big Data Technologies: Pig, Hive, Sqoop, Flume, HBase, Kafka, Storm, Spark, Oozie, ZooKeeper, MapReduce, YARN

Hadoop Distributions: Cloudera, Hortonworks

Java Technologies: Java/J2EE - JSP, Servlets, JDBC, JSTL, EJB, JUnit, RMI, JMS

Web Technologies: Ajax, JavaScript, jQuery, HTML, CSS, XML, Python

Programming Languages: Java, C/C++, Python, Scala, Shell Scripting

Databases: MySQL, MS-SQL Server, SQL, Oracle 11g, NoSQL (HBase, MongoDB, Cassandra)

Web Services: REST, AWS, SOAP, WSDL, UD

Operating System: Windows, Linux/Unix

Tools: Ant, Maven, JUnit, SoapUI

Servers: Apache Tomcat, WebSphere

IDEs: Eclipse, IntelliJ IDEA, NetBeans

Web/UI: HTML, CSS, JavaScript, XML, SOAP, WSDL

ETL/BI Tools: QlikView, Tableau, Kibana

Network Protocols: TCP/IP, TLS, SSL, SSH, FTP, HTTP and HTTPS


Confidential, DE

Spark Developer


  • Worked with team members on upgrading, configuring and maintaining various Hadoop infrastructure components such as Pig, Hive and HBase.
  • Implemented extraction of data from HBase using Spark.
  • Worked extensively on the Spark Core and Spark SQL modules.
  • Converted Cassandra/Hive/SQL queries into Spark transformations using Spark RDDs in Scala and Python.
  • Developed multiple POCs in Scala, deployed them on the YARN cluster, and compared the performance of Spark with Cassandra and SQL.
  • Analyzed the Cassandra/SQL scripts and designed a solution to implement them in Scala.
  • Wrote Python scripts to process data from HBase.
  • Extracted files from NoSQL databases (Cassandra, HBase) through Sqoop and placed them in HDFS for processing.
  • Worked on messaging frameworks such as Kafka, including tuning and optimization.
  • Created Kafka topics and partitions, and wrote custom partitioner classes.
  • Configured ZooKeeper and the Kafka cluster.
  • Created Hive tables, and loaded and analyzed data using Hive queries.
  • Wrote Hive queries on the analyzed data for aggregation and reporting.
  • Involved in creating Hive tables and loading them with data.
  • Used Pig Latin scripts to extract data from the output files, process it and load it into HDFS.
  • Imported and exported data from different databases into HDFS and Hive using Sqoop.
  • Used Sqoop to load existing metadata from Oracle into HDFS.
  • Automated all jobs, from extracting data from sources such as MySQL to pushing the result sets to the Hadoop Distributed File System (HDFS).
  • Used the BI tool Tableau to generate dashboard reports and data visualizations.
  • Developed graphs using Tableau.
  • Prepared Avro schema files for generating Hive tables.
  • Used Avro SerDes to handle Avro-format data in Hive and Impala.
  • Implemented MapReduce jobs using the Java API, Pig Latin and HiveQL.
  • Participated in the setup and deployment of a Hadoop cluster.
  • Hands-on design and development of an application using Hive UDFs.
  • Developed Pig UDFs to pre-process data for analysis.
  • Wrote client web applications using XML and SOAP web services.
  • Used SOAP web services to access information from a MySQL server.
  • Imported real-time data into Hadoop using Kafka and implemented an Oozie job for daily imports.
  • Worked in an AWS environment for development and deployment of custom Hadoop applications.
  • Installed Hadoop, MapReduce and HDFS on AWS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.

Environment: MapReduce, HDFS, Hive, Pig, Spark, Spark Streaming, Spark SQL, Apache Kafka, HBase, Flume, Avro, ZooKeeper, Oozie, Kibana, Elasticsearch, YARN, Linux, Sqoop, Java, Scala, Python, Tableau, SOAP, REST, CDH4, CDH5, AWS, Eclipse, Oracle, Git, Shell Scripting and Cassandra

Confidential, New York, NY

Spark Developer


  • Designed and configured Flume servers to collect data from the network proxy servers and store it in HDFS.
  • Loaded customer profile data, customer spending data and credit data from legacy warehouses onto HDFS using Sqoop.
  • Built a data pipeline using Pig and Java MapReduce to store data in HDFS.
  • Applied transformations and filtered both traffic using Pig.
  • Used pattern-matching algorithms to recognize customers across different sources, built risk profiles for each customer using Hive, and stored the results in HBase.
  • Performed unit testing using MRUnit.
  • Installed Storm and Kafka on a 4-node cluster and wrote a Kafka REST API to collect events from the front end.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
  • Explored using Spark to improve the performance of and optimize existing Hadoop algorithms, using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Used RESTful web services with JSON to develop server applications.
  • Wrote a Storm topology to accept events from a Kafka producer and emit them into Cassandra, and wrote JUnit test cases for the Storm topology.
  • Responsible for building scalable distributed data solutions using Hadoop
  • Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster
  • Set up and benchmarked Hadoop/HBase clusters for internal use.
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Optimized MapReduce jobs to use HDFS efficiently by applying various compression mechanisms.
  • Imported data from various data sources, performed transformations using Hive and MapReduce, loaded the data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Analyzed the data by performing Hive queries and running Pig scripts to study employee behavior
  • Installed Oozie workflow engine to run multiple Hive and Pig jobs

Environment: Hadoop, Java, Python, Linux, Hive, ZooKeeper, Kafka, MapReduce, Sqoop, Pig 0.10 and 0.11, JDK 1.6, HDFS, Flume, Oozie, DB2, MongoDB, Elasticsearch, Cassandra, SOAP, REST, HBase, Mahout.

Confidential, Buffalo, NY

Hadoop Developer


  • Worked on implementation and maintenance of a Cloudera Hadoop cluster.
  • Assisted in upgrading, configuring and maintaining various Hadoop infrastructure components such as Pig, Hive and HBase.
  • Exported data from HDFS to RDBMS via Sqoop for business intelligence, visualization and user report generation.
  • Developed a Python script to run the nightly batch process.
  • Applied transformations, cleaning and filtering to imported data using Hive and MapReduce, and loaded the final data into HDFS.
  • Analyzed user request patterns and implemented various performance optimization measures, including partitions and buckets in HiveQL.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Monitored workload, job performance and node health using Cloudera Manager.
  • Used Flume to collect and aggregate weblog data from different sources and pushed it to HDFS.
  • Integrated Oozie with MapReduce, Pig, Hive and Sqoop.
  • Analyzed business requirements and cross-verified them against the functionality and features of NoSQL databases such as HBase and Cassandra to determine the optimal database.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Used the Oozie scheduler to submit workflows.
  • Worked on SOAP web services and wrote web applications using XML.

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, HBase, Flume, ZooKeeper, Cloudera Manager, Oozie, Java (JDK 1.6), MySQL, SQL, Windows NT, Linux.

Confidential, Dallas, TX

Hadoop Developer


  • Installed, configured and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, Flume, Oozie, ZooKeeper and Sqoop.
  • Created Hive tables, worked on them using HiveQL, and performed data analysis using Hive and Pig.
  • Used Sqoop to import data from MySQL into HDFS and Hive on a regular basis.
  • Loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Wrote Pig scripts to run ETL jobs on the data in HDFS and performed further testing.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
  • Used Oozie workflows to automate all jobs that pull data from the FTP server and load it into Hive tables.
  • Extensively involved in installing and configuring the Cloudera distribution of Hadoop, including the NameNode and Secondary NameNode.
  • Wrote MapReduce jobs using the Java API and Pig Latin.

Environment: Cloudera, Hadoop, MapReduce, HDFS, Pig, Hive, Sqoop, Impala, Oozie, Java, J2EE, Python, Linux, SQL, Oracle, SOAP web services.


Sr Java Developer


  • Defined business rules according to client specifications and converted them into a high-level technical design.
  • Designed the entire system following OOP and UML principles using Rational tools.
  • Elaborated use cases and interface definition specifications in collaboration with the business team.
  • Used Oracle as the back-end database and JDBC for integration.
  • Extensively used TOAD for all database-related activities and integration testing.
  • Used build and deploy scripts written in Ant and UNIX shell.
  • Developed user interface screens using Servlets, JSP, JavaScript, CSS, AJAX and HTML.
  • Unit tested the developed business units using JUnit.
  • Worked with the development and QA teams to resolve issues in the SIT, UAT and production environments.
  • Coordinated closely with the architect, business analyst and business team on requirements analysis, development and implementation.
  • Used the Spring Framework caching mechanism to pre-load some of the master information.
  • Developed controller classes, command objects, action classes, form beans, transfer objects and singletons on the server side to handle requests and responses from the presentation layer.

Environment: Core Java, J2EE 1.5/1.6, Struts, Ajax, Rational Rose, Hibernate 3.0, CVS, RAD 7.0 IDE, Oracle 10g, JDBC, log4j, WebSphere 6.0, Servlets, JSP, JUnit.


Java Developer


  • Worked on the program setup, program profile, fees and card settings modules.
  • Developed action classes, business classes, helper classes and Hibernate POJO classes.
  • Developed Spring DAO classes and stored-procedure classes to connect to the database through Spring JDBC.
  • Developed Action Forms, Form Beans and Java Action classes using Struts framework
  • Participated in code reviews and ensured compliance with standards
  • Involved in preparing database scripts and deployment process
  • Used JDBC API to connect to the database and carry out database operations.
  • Involved in design and implementation of web tier using Servlets and JSP.
  • Used Apache POI to read Excel files.
  • Wrote complete technical specifications from business requirements.
  • Planned the rollout of the new architecture to production so that it did not affect existing business or functionality.
  • Used JSP and JSTL Tag Libraries for developing User Interface components.
  • Involved in developing UML Diagrams like Use Case, Class, Sequence diagrams.
  • Handled the session management to switch from classic application to new wizard and vice versa.

Environment: Java, J2EE (JSP, JSTL, Servlets), Hibernate, Struts (Validation Framework), Spring, Apache jQuery, JavaScript, SQL, TortoiseSVN, Maven, Visio.
