Sr. Spark Developer Resume
Redmond, WA
SUMMARY
- Around 9 years of IT experience in architecture, analysis, design, development, implementation, maintenance, and support, with experience in developing strategic methods for deploying big data technologies to efficiently solve Big Data processing requirements.
- 7+ years of experience in Big Data using the Hadoop framework and related technologies such as Spark, Spark Streaming, HDFS, HBase, MapReduce, Hive, Pig, Flume, Oozie, Sqoop, and ZooKeeper.
- Experience in data analysis using Hive, Pig Latin, HBase, and custom MapReduce programs in Java.
- Experience in writing custom UDFs in Java and Python to extend Hive and Pig functionality (see the sketch at the end of this summary).
- Experience with Cloudera CDH3, CDH4 and CDH5 distributions.
- Excellent understanding/knowledge of Hadoop (Gen-1 and Gen-2) and its various components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and ResourceManager (YARN).
- Experience in managing and reviewing Hadoop log files.
- Experience working with HUE.
- Experience in working with Flume to load the log data from multiple sources directly into HDFS.
- Excellent understanding and knowledge of the NoSQL databases HBase and Cassandra.
- Used the DataFrame API in Scala for working with distributed collections of data organized into named columns.
- Experience in all stages of the SDLC (Agile, Waterfall): writing technical design documents, development, testing, and implementation of enterprise-level data marts and data warehouses.
- Extensive experience working with Oracle, DB2, SQL Server, and MySQL databases.
- Experience in J2EE technologies such as Struts, JSP/Servlets, and Spring.
- Good exposure to scripting technologies such as JavaScript, AngularJS, jQuery, and XML.
- Delivery Assurance - Quality Focused & Process Oriented:
- Ability to work in high-pressure environments, delivering to and managing stakeholder expectations.
- Application of structured methods to project scoping and planning, risks, issues, schedules, and deliverables.
- Strong analytical and problem-solving skills.
- Good interpersonal skills and the ability to work as part of a team; exceptional ability to learn and master new technologies and to deliver output on short deadlines.
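Illustrative sketch: a minimal Hive UDF of the kind described above. The production UDFs were written in Java and Python; this Scala equivalent targets the same Hive Java API, and the class and function names are hypothetical.

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Hypothetical UDF: trims and upper-cases a free-form code column.
    // Registered in Hive with:
    //   ADD JAR udfs.jar;
    //   CREATE TEMPORARY FUNCTION normalize_code AS 'NormalizeCode';
    class NormalizeCode extends UDF {
      def evaluate(input: Text): Text =
        if (input == null) null
        else new Text(input.toString.trim.toUpperCase)
    }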
TECHNICAL SKILLS
Technology: Hadoop Ecosystem, J2SE/J2EE, Databases
Operating Systems: Windows Vista/XP/NT/2000, Linux (Ubuntu, CentOS), UNIX
DBMS/Databases: DB2, MySQL, PL/SQL
Programming Languages: C, C++, Core Java, XML, JSP/Servlets, Struts, Spring, HTML, JavaScript, jQuery, Web services
Big Data Ecosystem: HDFS, MapReduce, Oozie, Hive, Pig, Sqoop, Flume, Splunk, ZooKeeper, Kafka, and HBase
Methodologies: Agile, Waterfall
NoSQL Databases: HBase
Version Control Tools: SVN, CVS
ETL Tools: IBM DataStage 8.1, Informatica
PROFESSIONAL EXPERIENCE
Sr. Spark Developer
Confidential, Redmond, WA
Responsibilities:
- Developed a data pipeline using Event Hubs, Spark, Hive, Pig, and Azure SQL Database to ingest customer behavioral data and financial histories into an HDInsight cluster for analysis.
- Involved in creating the HDInsight cluster in the Confidential Azure portal; also created the Event Hubs and Azure SQL databases.
- Worked on a clustered Hadoop environment for Windows Azure using HDInsight and the Hortonworks Data Platform for Windows.
- Spark Streaming collects this data from Event Hubs in near-real time and performs the necessary transformations and aggregations on the fly to build the common learner data model, persisting the data in the Azure database (see the streaming sketch after this list).
- Used Pig for transformations, event joins, filtering of bot traffic, and some pre-aggregations before storing the data in the Azure database.
- Expertise with the tools in the Hadoop ecosystem, including Pig, Hive, HDFS, YARN, Oozie, and ZooKeeper, as well as Hadoop architecture and its components.
- Involved in integrating the Hadoop cluster with the Spark engine to perform batch and GraphX operations.
- Explored Spark for improving the performance and optimizing the existing algorithms in Hadoop, using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Experienced with Spark Streaming to ingest data into the Spark engine.
- Imported data from different sources, such as Event Hubs and Cosmos, into Spark RDDs.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Developed multiple POCs using Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
- Worked extensively on the Spark SQL and Spark Streaming modules of Spark and used Scala to write code for all Spark use cases.
- Involved in converting JSON data into DataFrames and storing it in Hive tables (see the JSON-to-Hive sketch after this list).
- Experienced with AzCopy, Livy, Windows PowerShell, and cURL for submitting Spark jobs on the HDInsight cluster.
- Analyzed SQL scripts and designed the solution to implement them using Scala.
- Developed an Event Hubs producer application in Scala to generate events into Event Hubs.
- Analyzed SQL scripts and designed the solution to implement them using PySpark.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Involved in developing Hive DDLs to create, alter, and drop Hive tables, and worked with Storm.
- Created scalable, high-performance web services for data tracking.
- Involved in loading data from the UNIX file system into HDFS; installed and configured Hive, wrote Hive UDFs, and used ZooKeeper for cluster coordination services.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Involved in using HCatalog to access Hive table metadata from MapReduce or Pig code.
- Involved in using Sqoop to import and export data into HDFS.
- Used Eclipse and Ant to build the application; proficient work experience with NoSQL (Mongo) databases, including pivoting HDFS data from rows to columns and columns to rows.
- Involved in developing shell scripts to orchestrate the execution of all other scripts (Pig, Hive, and MapReduce) and to move data files within and outside of HDFS.
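A condensed sketch of the Event Hubs-to-Azure SQL streaming flow referenced above, using the azure-event-hubs-spark connector with Structured Streaming. The namespace, credentials, payload schema, paths, and table names are placeholder assumptions, not the production values.

    import org.apache.spark.eventhubs.{ConnectionStringBuilder, EventHubsConf}
    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.types._

    object LearnerStream {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("LearnerStream").getOrCreate()
        import spark.implicits._

        // Assumed Event Hubs namespace, hub name, and credentials
        val connStr = ConnectionStringBuilder(
          "Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=listen;SharedAccessKey=<key>")
          .setEventHubName("learner-events").build
        val raw = spark.readStream.format("eventhubs")
          .options(EventHubsConf(connStr).toMap).load()

        // Assumed JSON payload carried in the Event Hubs message body
        val schema = new StructType()
          .add("learnerId", StringType)
          .add("action", StringType)
          .add("eventTime", TimestampType)
        val events = raw
          .select(from_json($"body".cast("string"), schema).as("e"))
          .select("e.*")

        // Windowed aggregation on the fly, building the learner activity model
        val counts = events
          .withWatermark("eventTime", "10 minutes")
          .groupBy(window($"eventTime", "5 minutes"), $"learnerId", $"action")
          .count()

        // Persist each micro-batch to Azure SQL Database over JDBC
        val jdbcUrl = "jdbc:sqlserver://<server>.database.windows.net:1433;database=learner" // assumed
        val props = new java.util.Properties()
        props.setProperty("user", "<user>")
        props.setProperty("password", "<password>")

        counts.writeStream
          .outputMode("update")
          .option("checkpointLocation", "wasb:///checkpoints/learner")
          .foreachBatch { (batch: DataFrame, _: Long) =>
            batch.write.mode("append").jdbc(jdbcUrl, "dbo.learner_activity", props)
          }
          .start()
          .awaitTermination()
      }
    }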
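And a sketch of the JSON-to-Hive flow from the bullets above: raw JSON is parsed into a DataFrame, landed in a partitioned Hive table, and queried for reporting metrics. The database, table, and path names are invented for illustration, and bucketing is omitted here since Spark's writer does not populate Hive-compatible buckets.

    import org.apache.spark.sql.SparkSession

    object JsonToHive {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("JsonToHive")
          .enableHiveSupport().getOrCreate()

        // Assumed raw JSON landing zone on the cluster's default storage
        val tx = spark.read.json("wasb:///raw/transactions/")
        tx.createOrReplaceTempView("tx_stage")

        // Hive DDL: a partitioned table for downstream reporting (names assumed)
        spark.sql("CREATE DATABASE IF NOT EXISTS finance")
        spark.sql("""
          CREATE TABLE IF NOT EXISTS finance.transactions (
            customer_id STRING, amount DOUBLE, category STRING)
          PARTITIONED BY (tx_date STRING)
          STORED AS ORC""")

        // Dynamic-partition insert from the staged JSON
        spark.sql("SET hive.exec.dynamic.partition=true")
        spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
        spark.sql("""
          INSERT INTO TABLE finance.transactions PARTITION (tx_date)
          SELECT customer_id, amount, category, tx_date FROM tx_stage""")

        // Metric computation over the partitioned data
        spark.sql("""
          SELECT tx_date, category, COUNT(*) AS txns, SUM(amount) AS total
          FROM finance.transactions
          GROUP BY tx_date, category""").show()
      }
    }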
Environment: Azure HDInsight, Spark, Hive, Spark SQL, Event Hubs, Hortonworks, Scala IDE, Python, Scala, Maven, Jupyter Notebook, Visual Studio, UNIX shell scripting.
Spark Developer
Confidential
Responsibilities:
- Developed Spark code using Scala, DataFrames, and Spark SQL for faster processing of data.
- Used Spark DataFrames and Spark SQL extensively.
- Configured Spark Streaming to receive ongoing information from Kafka and store the streamed information in HDFS (see the sketches after this list).
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, DataFrames, Spark SQL, and Scala.
- Performed advanced procedures such as text analytics and processing, using the in-memory computing capabilities of Spark with Scala.
- Developed Scala and Spark SQL code to extract data from various databases.
- Used Spark SQL to process huge amounts of structured data and implemented Spark RDD transformations and actions to migrate MapReduce algorithms.
- Experienced with Kafka for ingesting data into the Spark engine.
- Worked on Spark Streaming using Apache Kafka for real-time data processing.
- Performed data ingestion from multiple internal clients using Apache Kafka.
- Developed a data pipeline using Spark, Hive, Sqoop, and Kafka to ingest customer behavioral data into the Hadoop platform for analysis.
- Extensively worked with all kinds of unstructured, semi-structured, and structured data.
- Used Spark Streaming APIs to perform the necessary transformations and actions on the data obtained from Kafka, persisting it to HDFS.
- Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved the data in Parquet format in HDFS.
- Developed multiple Kafka producers and consumers as per the software requirement specifications (see the producer sketch after this list).
- Involved in data acquisition, data pre-processing, and data exploration for the project in Spark using Scala.
- Developed Spark scripts using Scala shell commands as per the requirements.
- Experienced in handling large datasets using partitioning, Spark's in-memory capabilities, broadcasts, effective and efficient joins, and transformations during the ingestion process itself.
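A minimal sketch of the Kafka-to-HDFS flow described above, using the spark-streaming-kafka-0-10 integration: each micro-batch RDD is converted to a DataFrame and appended as Parquet. Broker addresses, topic names, group id, and paths are placeholder assumptions.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    object KafkaToParquet {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("KafkaToParquet").getOrCreate()
        val ssc = new StreamingContext(spark.sparkContext, Seconds(30))

        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "broker1:9092",  // assumed
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "behavior-ingest",        // assumed
          "auto.offset.reset" -> "latest")

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("clickstream"), kafkaParams))

        // Each micro-batch: RDD of records -> DataFrame -> Parquet on HDFS
        stream.foreachRDD { rdd =>
          if (!rdd.isEmpty()) {
            import spark.implicits._
            val df = spark.read.json(rdd.map(_.value()).toDS())
            df.write.mode("append").parquet("hdfs:///data/clickstream")
          }
        }
        ssc.start()
        ssc.awaitTermination()
      }
    }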
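And a minimal Kafka producer of the kind mentioned above; the topic, broker, and payload are again placeholder values.

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    object ClickProducer {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put("bootstrap.servers", "broker1:9092")  // assumed
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

        val producer = new KafkaProducer[String, String](props)
        try {
          // Key by user id so one user's events stay in a single partition
          producer.send(new ProducerRecord("clickstream", "user-42",
            """{"action":"view","page":"/home"}"""))
        } finally {
          producer.close()
        }
      }
    }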
Hadoop Developer
Confidential
Responsibilities:
- Worked extensively on creating MapReduce jobs to power data for search and aggregation. Designed a data warehouse using Hive.
- Involved in analyzing system failures, identifying root causes, and recommending courses of action.
- Loaded data into HDFS and processed it using Hive and MapReduce.
- Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Worked extensively with Sqoop for importing metadata from Oracle.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Extensively used Pig for data cleansing.
- Created partitioned tables in Hive.
- Worked with business teams and created Hive queries for ad hoc access.
- Evaluated usage of Oozie for Workflow Orchestration.
- Mentored analysts and the test team in writing Hive queries.
- Experience in writing MapReduce programs with the Java API to cleanse structured and unstructured data (see the sketch after this list).
- Experience with RDBMSs such as Oracle and Teradata.
- Worked on loading data from MySQL to HBase, where necessary, using Sqoop.
- Launched Amazon EC2 cloud instances using Amazon Machine Images (Linux/Ubuntu) and configured the launched instances for specific applications.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Exported data to an RDBMS via Sqoop to check whether the power-saving program was successful.
- Extensively used Sqoop for importing data from RDBMSs into HDFS.
- Used ZooKeeper to coordinate the clusters.
- Installed and configured MapReduce, Hive, and HDFS; implemented a CDH3 Hadoop cluster on CentOS; assisted with performance tuning and monitoring.
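A compact sketch of the kind of cleansing MapReduce job referenced above. The originals were written against the Java API; this Scala version targets the same Hadoop API, and the record layout (pipe-delimited rows with 12 fields) is a made-up example.

    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.{LongWritable, NullWritable, Text}
    import org.apache.hadoop.mapreduce.{Job, Mapper}
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

    // Map-only job that keeps well-formed records and drops the rest
    class CleanseMapper extends Mapper[LongWritable, Text, NullWritable, Text] {
      override def map(key: LongWritable, value: Text,
                       ctx: Mapper[LongWritable, Text, NullWritable, Text]#Context): Unit = {
        // Assumed layout: pipe-delimited rows with exactly 12 fields
        if (value.toString.split("\\|", -1).length == 12)
          ctx.write(NullWritable.get(), value)
      }
    }

    object CleanseJob {
      def main(args: Array[String]): Unit = {
        val job = Job.getInstance()
        job.setJarByClass(classOf[CleanseMapper])
        job.setMapperClass(classOf[CleanseMapper])
        job.setNumReduceTasks(0) // map-only: no aggregation needed
        job.setOutputKeyClass(classOf[NullWritable])
        job.setOutputValueClass(classOf[Text])
        FileInputFormat.addInputPath(job, new Path(args(0)))
        FileOutputFormat.setOutputPath(job, new Path(args(1)))
        System.exit(if (job.waitForCompletion(true)) 0 else 1)
      }
    }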
Java Developer
Confidential
Responsibilities:
- Involved in analysis, design and implementation of business requirements.
- Extensively worked in integrating the different layers of the project.
- Designed and developed the application based on the Struts framework, using MVC design patterns.
- Developed Struts Action classes and Form Beans.
- Designed and developed the UI using Struts view component, JSP, HTML, CSS and JavaScript.
- Extensively used JSTL tags and Struts tag libraries. Used Struts tiles as well in the presentation tier.
- Developed custom tags to simplify the JSPs. Designed UI screens using JSP, CSS, XML, and HTML. Used JavaScript for client-side validation.
- Responsible for writing the JavaScript code for validating the application.
- Used jQuery for validations and UI development.
- Configured JSP content in XML files when developing deployment descriptor files.
- Developed numerous UI screens using JSP, HTML, CSS, and JavaScript.
- Involved in production support of the application.
- Implemented various design patterns in the project.
- Involved in System Testing and integration testing of the application.
- Analysis and understanding of business requirements.
- Effectively participated in weekly client communications with Business Analysts.
- Involved in the architecture team for design and implementation of system.
- Developed the application using Spring MVC, JSP, JSTL, and AJAX on the presentation layer; the business layer is built using Spring and the persistence layer uses Hibernate (see the sketch after this list).
- Developed custom tags to represent data in the desired table format and to implement paging logic.
- Developed views and controllers for the client and manager modules using Spring MVC and Spring Core.
- Implemented business logic using Spring Core and Hibernate.
- Performed data operations using Spring ORM wired with Hibernate, and implemented HibernateTemplate and the Criteria API for querying the database.
- Developed web services using XML messages over SOAP; created the WSDL and the SOAP envelope.
- Developed an exception-handling framework and used Log4j for logging.
- Developed and modified database objects as per the requirements.
- Involved in unit and integration testing, bug fixing, acceptance testing with test cases, and code reviews.
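A minimal sketch of the Spring MVC plus HibernateTemplate/Criteria layering described above. The project itself was Java; this Scala rendering of the same Spring and Hibernate APIs is for illustration only, and the Client entity, view names, and URL are invented.

    import org.hibernate.criterion.{DetachedCriteria, Restrictions}
    import org.springframework.orm.hibernate3.HibernateTemplate
    import org.springframework.stereotype.Controller
    import org.springframework.web.bind.annotation.RequestMapping
    import org.springframework.web.servlet.ModelAndView

    // Hypothetical mapped entity (Hibernate mapping file omitted)
    class Client {
      var id: Long = 0L
      var status: String = ""
    }

    // Persistence layer: Spring ORM wiring via HibernateTemplate and the Criteria API
    class ClientDao(template: HibernateTemplate) {
      def activeClients(): java.util.List[_] =
        template.findByCriteria(
          DetachedCriteria.forClass(classOf[Client])
            .add(Restrictions.eq("status", "ACTIVE")))
    }

    // Presentation layer: Spring MVC controller returning a JSP view
    @Controller
    class ClientController(dao: ClientDao) {
      @RequestMapping(Array("/clients"))
      def list(): ModelAndView =
        new ModelAndView("clientList", "clients", dao.activeClients())
    }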
Environment: Java 1.5, Spring Framework, Hibernate 3.6, UML, XML, CSS, HTML, Linux/UNIX, Oracle 9i, JavaScript, WebLogic, Rational Rose, J2EE, Struts, JavaBeans, Servlets, JSP, JDBC, Apache Tomcat, DB2, SQL, Windows XP, Eclipse, Log4j, JUnit, ANT, JBoss.