Spark Developer Resume

Minneapolis, Minnesota


  • 7 years of work experience in IT, which includes experience in Development and Implementation of Hadoop, Data warehousing solutions and Java.
  • Experience working with the Hadoop ecosystem using components like HDFS, MapReduce, Hive, HBase, Sqoop and Impala in the Cloudera distribution, and good knowledge of Hortonworks.
  • Hands-on experience in writing Sqoop scripts to import data from multiple RDBMS to HDFS.
  • Experience in Hive partitioning and bucketing, performing joins on Hive tables and implementing Hive SerDes.
  • Good knowledge of writing Spark applications in PySpark and Scala using DataFrames.
  • Hands-on experience in designing and developing Spark applications in Scala and PySpark to compare the performance of Spark with Hive and SQL/Oracle.
  • Software developer in core Java Application Development, Client/Server Applications, and Internet/Intranet based database applications and developing, testing and implementing application environment using J2EE, JDBC, JSP, Servlets, Web Services, Oracle, PL/SQL and Relational Databases.
  • Experience developing Kafka producers and consumers for streaming millions of events per second.
  • Experience in manipulating/analyzing large datasets and finding patterns and insights within structured and unstructured data.
  • Working experience building RESTful web services and RESTful APIs.
  • Solid design skills using Java Design Patterns and Unified Modeling Language (UML).
  • Strong understanding of real-time streaming technologies such as Spark and Kafka.
  • Strong understanding of logical and physical database models and entity-relationship modeling.
  • Replaced existing MR jobs and Hive scripts with Spark SQL, Spark data transformations for efficient data processing.
  • Strong understanding of the Java Virtual Machine and multithreading.
  • Experience in writing complex SQL queries, creating reports and dashboards.
  • Excellent analytical, communication and interpersonal skills, along with a positive attitude.


Programming/Scripting Languages: Scala, PySpark, Core Java, Python, SQL

Big Data: Hadoop, MapReduce, HDFS, Hive, Sqoop, Spark, Kinesis, Kafka

Other tools: Microsoft Office tools, VSTS, VMware, Git, NoSQL, Oracle, MySQL, Apache Cassandra, HBase

Big Data Ecosystem: HDFS, Oozie, Zookeeper, Spark SQL, Spark Streaming, Hue, Ambari, Impala

File Formats: Txt, XML, JSON, Avro, Parquet, ORC

Cloud Computing: AWS

Visualization and Reporting Tools: Tableau, Microsoft Power BI


Confidential - Minneapolis, Minnesota

Spark Developer


  • Working as a Big Data Engineer on Hortonworks distribution. Responsible for Data Ingestion, Data Cleansing, Data Standardization and Data Transformation.
  • Working with Hadoop 2.x version and Spark 2.x (Python and Scala).
  • Involved in extracting data from various data sources into Hadoop HDFS. This included data from SFTP server, GCS and AWS buckets.
  • Worked on creating Hive managed and external tables based on the requirement.
  • Implemented Partitioning and Bucketing on Hive tables for better performance.
  • Used Spark SQL to process data on the Spark engine.
  • Used Spark SQL and Scala to improve the performance of existing algorithms in Hadoop.
  • Worked on Google Cloud BigQuery to execute queries and analyze data quickly.
  • Worked on various file formats like Parquet, JSON and ORC.
  • Developed an end-to-end ETL pipeline using Spark SQL and Scala on the Spark engine.
  • Worked with external vendors or partners to onboard external data into Confidential GCS buckets.
  • Developed Oozie workflows to automate the ETL data pipeline.
  • Worked with visualization tools like Google visual studio and internal tools like Domo.
  • Used Spark for interactive queries, processing of streaming data and integration with NoSQL databases for large volumes of data.
  • Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts, efficient joins and transformations during the ingestion process itself.
  • Developed custom ETL solutions, batch processing and real-time data ingestion pipelines to move data in and out of Hadoop using PySpark and shell scripting.
  • Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL and DataFrames.
  • Worked with Sqoop import and export functionalities to handle large data set transfer between Oracle databases and HDFS.
  • Developed Spark jobs to clean data obtained from various feeds to make it suitable for ingestion into Hive tables for analysis.
  • Imported data from various sources into Spark RDD for analysis.
  • Configured Oozie workflows to run multiple Hive jobs that trigger independently based on time and data availability.
  • Utilized Hive tables and HQL queries for daily and weekly reports. Worked on complex data types in Hive like Structs and Maps.
  • Imported data from AWS S3 into Spark RDD, performed transformations and actions on RDDs.
  • Created Cassandra tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
  • Supported code/design analysis, strategy development and project planning.
  • Created reports for the BI team, using Sqoop to import data into HDFS and Hive.
  • Assisted with data capacity planning and node forecasting.
  • Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
  • Designed ETL processes using Informatica to load data from flat files, Oracle and Excel files into the Confidential Oracle data warehouse.

Environment: Spark, Spark SQL, Hive, Oozie, Sqoop, Flume, Java, Scala, PySpark, Shell scripting, Tableau.

Confidential - Atlanta, Georgia

Hadoop developer


  • Involved in review of functional and non-functional requirements.
  • Utilized Sqoop, Kafka, Flume and Hadoop File System APIs for implementing data ingestion pipelines.
  • Worked on real-time streaming and performed transformations on the data using Kafka and Spark Streaming.
  • Migrated existing on-premise application to AWS and used AWS services like EC2 and S3 to process and store small data sets.
  • Experienced in maintaining the Hadoop cluster on AWS EMR.
  • Worked on importing metadata into Hive and migrating existing tables and applications to work on Hive and AWS cloud.
  • Involved in managing and reviewing Hadoop log files. Designed and implemented an ETL framework using Java.
  • Worked on real-time streaming and performed transformations on the data using Kafka and Flume.
  • Developed Spark Streaming jobs in Scala to consume data from Kafka topics, transform the data and insert it into HBase tables.
  • Imported data from MySQL to HDFS using Sqoop on a regular basis.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Created Hive tables and worked on them using HiveQL.
  • Experienced in defining job flows.
  • Gained good experience with MySQL and Cassandra databases.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Developed a custom Filesystem plug-in for Hadoop so it can access files on Data Platform.
  • Designed and implemented MapReduce-based large-scale parallel relation-learning system.

Environment: Hadoop, MapReduce, HDFS, Hive, Java, Hortonworks and Cloudera Hadoop distributions, HBase, Linux, XML, MySQL, ETL, Kafka, YARN, Drill, Cassandra


Hadoop Developer


  • Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
  • Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
  • Enabled speedy reviews and first mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System.
  • Performed ETL operations using Hive to transform transactional data into de-normalized form.
  • Created ad hoc reports by gathering requirements from different teams.
  • Utilized Hive user defined functions to analyze the complex data to find specific user behavior.
  • Analyzed data using HiveQL to derive metrics like game duration, daily active users (DAU), weekly active users (WAU) etc.
  • Implemented Hive generic UDFs to incorporate business logic into Hive queries.
  • Worked with the admin team to assist them in adding/removing cluster nodes, cluster monitoring and troubleshooting.
  • Exported data to relational databases using Sqoop for visualization and to generate reports.
  • Created machine learning and statistical models such as SVM, CRF and HMM to assess gamer performance.
  • Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
  • Managed and reviewed Hadoop log files.
  • Tested raw data and executed performance scripts.
  • Shared responsibility for administration of Hadoop and Hive.

Environment: Hadoop, MapReduce, HDFS, Hive, Java, SQL, PL/SQL, MySQL, Sqoop, Linux, XML.


Core Java/J2EE Developer


  • Troubleshot various software issues using debugging and coding techniques.
  • Provided high-level customer support to remote clients using a support e-ticketing system.
  • Performed system administration for hosting server and client software.
  • Developed RESTful web services and RESTful APIs.
  • Developed screens using Java, HTML, DHTML, CSS, JSP and JavaScript.
  • Designed the database for the application.
  • Implemented all validations and performed testing.
  • Implemented and managed SQL database for use in background for security and internal proprietary processes.
  • Diagnosed and corrected errors within Java/HTML/PHP code to allow for connection and utilization of proprietary applications.
  • Provided end-user support and administrative functions, including password and account management.
  • Developed a PL/SQL view function in the Oracle 9i database for the get-available-date module.
  • Used Quartz schedulers to run jobs sequentially within the given time.
  • Used JSP and JSTL Tag Libraries for developing User Interface components.

Environment: JDK 5.0, JavaScript, HTML, DHTML, XML, Struts, JSP, Servlet, JNDI, J2EE, Tomcat, Oracle, RESTful API.


Java/J2EE Developer


  • Analyzed and reviewed client requirements and design.
  • Worked on testing, debugging and troubleshooting all types of technical issues.
  • Good knowledge of OOP concepts.
  • Used JDBC for database connectivity and manipulation.
  • Used Eclipse for the development, testing and debugging of the application.
  • Worked as a Java/J2EE backend developer, creating the Maven web application project.
  • Used DOM Parser to parse the XML files.
  • Used the Log4j framework for logging debug, info and error data.
  • Used WinSCP to transfer files from the local system to other systems.
  • Performed Test Driven Development (TDD) using JUnit.
  • Used JProfiler for performance tuning.
  • Built the application using Maven and deployed it using WebSphere Application Server.
  • Gathered and collected information from various programs, analyzed time requirements and prepared documentation to change existing programs.
  • Used SOAP for exchanging XML based messages.
  • Used Microsoft VISIO for developing Use Case Diagrams, Sequence Diagrams and Class Diagrams in the design phase.
  • Developed Custom Tags to simplify the JSP code.
  • Designed UI screens using JSP and HTML.

Environment: Java, HTML, Servlets, JSP, Hibernate, JUnit, Oracle DB, SQL, Jasper Reports, iReport, Maven, Jenkins.
