
Big Data and Spark Developer Resume

Dallas, TX


  • 7+ years of IT experience in analysis, design, development, implementation, maintenance, and support, including developing strategies for deploying big data technologies to efficiently meet big data processing requirements.
  • Around 5 years of big data experience with the Hadoop framework and related technologies such as HDFS, HBase, MapReduce, Spark, Hive, Pig, Flume, Oozie, Sqoop, and ZooKeeper.
  • Experience in data analysis using Hive, Pig Latin, HBase, and custom MapReduce programs in Java.
  • Experience writing custom UDFs in Java and Scala to extend the functionality of Hive and Pig.
  • Experience with Cloudera and Hortonworks distributions.
  • Over 3 years of experience with Spark, Scala, HBase, and Kafka.
  • Developed analytical components using Kafka, Scala, Spark, HBase, and Spark Streaming.
  • Experience working with Flume to load log data from multiple sources directly into HDFS.
  • Good knowledge of Hortonworks administration and security features such as Apache Ranger, Knox Gateway, and High Availability.
  • Implemented a Hadoop backup strategy covering Hive, HDFS, HBase, Oozie, etc.
  • Experience importing and exporting data with Sqoop between HDFS and relational database systems (RDBMS), in both directions.
  • Involved in creating an HDInsight cluster in the Confidential Azure portal; also created Event Hubs and Azure SQL databases.
  • Worked on a Hadoop cluster on Windows Azure using HDInsight and the Hortonworks Data Platform for Windows.
  • Built a real-time streaming data pipeline using Event Hubs / Confidential Azure Queue and Spark Streaming.
  • Loaded the aggregated data into HBase for reporting purposes.
  • Read data from HBase into Spark to perform joins across different tables.
  • Created HBase tables for validation, audit, and offset management.
  • Created logical views instead of tables to improve the performance of Hive queries.
  • Developed Hive DDL statements to create, alter, and drop Hive tables.
  • Good knowledge of Hive optimization techniques such as vectorization and column-based optimization.
  • Wrote Oozie workflows to run jobs at predefined intervals.
  • Expert in scheduling Oozie coordinators based on input data events, so that an Oozie workflow starts when its input data becomes available.
  • In parallel, working on a POC with Kafka and NiFi to pull real-time events into the Hadoop cluster.
  • Exploring Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and YARN.
  • Experienced in managing Hadoop clusters using Hortonworks Ambari.



Confidential, Dallas, TX


  • Developed a framework to encrypt sensitive data (SSN, account number, etc.) in all kinds of datasets and to move datasets from one S3 bucket to another.
  • Processed datasets in formats such as text, Parquet, Avro, fixed width, Zip, GZ, JSON, and XML.
  • Developed a framework to check the data quality of datasets against schemas defined in the cloud. Worked on Amazon Web Services (AWS) to integrate EMR with Spark 2, S3 storage, and Snowflake.
  • Configured Spark Streaming to receive real-time data from Kafka and store the streamed data in AWS S3 using Scala.
  • Experienced in handling large datasets during the ingestion process itself using partitioning, Spark in-memory capabilities, broadcast variables, effective and efficient joins, transformations, and other techniques.
  • Enhanced and optimized product Spark code to aggregate, group and run data mining tasks using the Spark framework.
  • Used Spark Streaming APIs to perform the required transformations and actions on the learner data model, which receives data from Kafka in near real time.
  • Migrated MapReduce programs to Spark transformations, and used File Broker to schedule workflows that run Spark jobs to transform data on a persistent schedule.
  • Experience developing and deploying shell scripts for automation, notification, and monitoring.
  • Extensively used Apache Kafka, Apache Spark, HDFS, and Apache Impala to build near-real-time data pipelines that ingest, transform, store, and analyze clickstream data to provide a better personalized user experience.
  • Performed performance tuning on Spark applications.
  • Worked with Spark SQL and DataFrame functions to perform data transformations and aggregations on complex semi-structured data.
  • Hands-on experience creating RDDs and applying transformations and actions while implementing Spark applications.



Confidential, Plano, Texas


  • Developed a data pipeline using Event Hubs, Spark, Hive, Pig, and Azure SQL Database to ingest customer behavioral data and financial histories into an HDInsight cluster for analysis.
  • Involved in creating an HDInsight cluster in the Confidential Azure portal; also created Event Hubs and Azure SQL databases.
  • Worked on a Hadoop cluster on Windows Azure using HDInsight and the Hortonworks Data Platform for Windows.
  • Used Spark Streaming to collect data from Event Hubs in near real time and perform the necessary transformations and aggregations on the fly to build the common learner data model, persisting the results in the Azure database.
  • Used Pig to perform transformations, event joins, bot traffic filtering, and some pre-aggregations before storing the data in the Azure database.
  • Expertise with Hadoop ecosystem tools including Pig, Hive, HDFS, YARN, Oozie, and ZooKeeper, and with Hadoop architecture and its components.
  • Involved in integrating the Hadoop cluster with the Spark engine to perform batch and GraphX operations.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Experienced with Spark Streaming to ingest data into the Spark engine.
  • Imported data from different sources such as Event Hubs and Cosmos into Spark RDDs.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Developed multiple POCs using Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
  • Worked extensively with the Spark SQL and Spark Streaming modules and used Scala to write code for all Spark use cases.
  • Used the DataFrame API in Scala to work with distributed collections of data organized into named columns.
  • Converted JSON data into DataFrames and stored it in Hive tables.
  • Experienced with AzCopy, Livy, Windows PowerShell, and cURL to submit Spark jobs on the HDInsight cluster.
  • Analyzed SQL scripts and designed solutions to implement them using Scala.





  • Imported and exported data into HDFS and Hive using Sqoop.
  • Used Bash shell scripting, Sqoop, Avro, Hive, Pig, Java, and MapReduce daily to develop ETL, batch processing, and data storage functionality.
  • Used Pig to perform data transformations, event joins, and some pre-aggregations before storing the data on HDFS.
  • Used the Hadoop MySQL connector to store MapReduce results in an RDBMS.
  • Analyzed large datasets to determine the optimal way to aggregate and report on them.
  • Loaded all tables from the source database schema using Sqoop.
  • Designed, coded, and configured server-side J2EE components such as JSP, AWS, and Java.
  • Collected data from different databases (e.g., Oracle, MySQL) into Hadoop.
  • Used Oozie and ZooKeeper for workflow scheduling and monitoring.
  • Designed and developed ETL workflows in Java for processing data in HDFS/HBase using Oozie.
  • Experienced in managing and reviewing Hadoop log files.
  • Involved in loading and transforming large sets of structured, semi-structured, and unstructured data from relational databases into HDFS using Sqoop imports.
  • Extracted files from MySQL through Sqoop, placed them in HDFS, and processed them.
  • Supported MapReduce programs running on the cluster.
  • Managed cluster coordination services through ZooKeeper.
  • Involved in loading data from the UNIX file system into HDFS.
  • Created several Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
  • Developed simple to complex MapReduce jobs using Hive and Pig.




  • Involved in the analysis, design, development, and testing phases of the Software Development Life Cycle (SDLC).
  • Designed and developed framework components; involved in designing the MVC pattern using the Struts and Spring frameworks.
  • Responsible for developing use case, class, and sequence diagrams for the modules using UML and Rational Rose.
  • Developed Action classes and ActionForm classes, created JSPs using Struts tag libraries, and configured them in the struts-config.xml and web.xml files.
  • Involved in deploying and configuring applications on WebLogic Server.
  • Used SOAP for exchanging XML-based messages.
  • Used Confidential Visio to develop use case diagrams, sequence diagrams, and class diagrams in the design phase.
  • Developed custom tags to simplify the JSP code. Designed UI screens using JSP and HTML.
  • Actively involved in designing and implementing the Factory Method, Singleton, MVC, and Data Access Object design patterns.
  • Used web services to send and receive data between different applications via SOAP messages, then used a DOM XML parser for data retrieval.
  • Wrote JUnit test cases for the controller, service, and DAO layers using Mockito and DbUnit.
  • Developed unit test cases using a proprietary framework similar to JUnit.
  • Used the JUnit framework for unit testing and Ant to build and deploy the application on WebLogic Server.
