
Sr. Spark Developer Resume


Philadelphia

PROFESSIONAL SUMMARY:

  • Around 8 years of IT experience in architecture, analysis, design, development, implementation, maintenance, and support, with experience in developing strategic methods for deploying big data technologies to efficiently solve Big Data processing requirements.
  • Over 5 years of experience in Big Data using the Hadoop framework and related technologies such as Spark, Spark Streaming, HDFS, HBase, MapReduce, Hive, Pig, Flume, Oozie, Sqoop, and ZooKeeper.
  • Experience in data analysis using Hive, Pig Latin, HBase, and custom MapReduce programs in Java.
  • Experience in writing custom UDFs in Java and Python to extend Hive and Pig functionality (see the Hive UDF sketch after this list).
  • Experience with Cloudera CDH3, CDH4 and CDH5 distributions.
  • Excellent understanding/knowledge of Hadoop (Gen-1 and Gen-2) and its various components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and ResourceManager (YARN).
  • Experience in managing and reviewing Hadoop log files.
  • Strong working knowledge of and experience with Hue.
  • Experience in working with Flume to load the log data from multiple sources directly into HDFS.
  • Excellent understanding and knowledge of the NoSQL databases HBase and Cassandra.
  • Built near-real-time pipelines in which Spark Streaming collects data from Azure Event Hubs, performs transformations and aggregations on the fly to build a common learner data model, and persists the results to Azure SQL Database.
  • Used Pig for transformations, event joins, bot-traffic filtering, and pre-aggregations before storing data in the Azure database; expertise with Hadoop ecosystem tools including Pig, Hive, HDFS, YARN, Oozie, and ZooKeeper, and with Hadoop architecture and its components.
  • Integrated Hadoop clusters with the Spark engine for batch and GraphX operations, using Spark Streaming to ingest data from sources such as Event Hubs and Cosmos into Spark RDDs.
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala, and used the DataFrame API to convert JSON data into DataFrames stored in Hive tables (see the JSON-to-Hive sketch after this list).
  • Developed multiple PoCs in Scala, deployed them on YARN clusters, and compared the performance of Spark with Hive and SQL/Teradata.
  • Used AzCopy, Livy, Windows PowerShell, and cURL to submit Spark jobs to HDInsight clusters; developed an Event Hubs producer application in Scala to generate events.
  • Analyzed SQL scripts and designed solutions implemented in Scala and PySpark; used Hive to analyze partitioned and bucketed data and compute various metrics for reporting.
  • Experience in all stages of the SDLC (Agile, Waterfall): writing technical design documents, development, testing, and implementation of enterprise-level data marts and data warehouses.
  • Extensive experience working with Oracle, DB2, SQL Server, and MySQL databases.
  • Experience in J2EE technologies such as Struts, JSP/Servlets, and Spring.
  • Good exposure to scripting and markup languages, including JavaScript, AngularJS, jQuery, and XML.
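
The custom-UDF work above was delivered in Java and Python; the sketch below shows the same Hive UDF pattern in Scala (compiled against the same Hive Java API) so all sketches in this resume share one language. The class name and the cleansing rule are hypothetical illustrations, not details from an actual project.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical UDF: strip everything but digits from a free-text column.
// Hive locates evaluate() by reflection, so only the signature matters.
class DigitsOnly extends UDF {
  def evaluate(input: Text): Text =
    if (input == null) null
    else new Text(input.toString.filter(_.isDigit))
}
```

Packaged into a JAR, a UDF like this would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before use in queries.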
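
A minimal sketch of the JSON-to-DataFrame-to-Hive flow referenced above, assuming Spark 2.x with Hive support; the input path, selected columns, and table name are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession

object JsonToHive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("JsonToHive")
      .enableHiveSupport()   // lets saveAsTable target the Hive metastore
      .getOrCreate()

    // Read newline-delimited JSON; Spark infers a schema, yielding a
    // distributed collection of data organized into named columns.
    val events = spark.read.json("hdfs:///data/raw/events.json")   // hypothetical path

    // Keep only the columns reporting needs (hypothetical names).
    val learners = events.select("learner_id", "event_type", "event_ts")

    // Persist as a Hive table for downstream HiveQL analysis.
    learners.write.mode("overwrite").saveAsTable("analytics.learner_events")
    spark.stop()
  }
}
```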

TECHNICAL SKILLS:

Technology: Hadoop Ecosystem, J2SE/J2EE, Databases

Operating Systems: Windows Vista/XP/NT/2000, Linux (Ubuntu, CentOS), UNIX

DBMS/Databases: DB2, MySQL, PL/SQL

Programming Languages: C, C++, Core Java, XML, JSP/Servlets, Struts, Spring, HTML, JavaScript, jQuery, Web Services.

Big Data Ecosystem: HDFS, MapReduce, Oozie, Hive, Pig, Sqoop, Flume, Splunk, ZooKeeper, Kafka, and HBase.

Methodologies: Agile, Waterfall

NoSQL Databases: HBase

Version Control Tools: SVN, CVS

ETL Tools: IBM DataStage 8.1, Informatica

PROFESSIONAL EXPERIENCE:

Sr. Spark Developer

Confidential, Philadelphia

Responsibilities:

  • Developed a data pipeline using Event Hubs, Spark, Hive, Pig, and Azure SQL Database to ingest customer behavioral data and financial histories into an HDInsight cluster for analysis.
  • Created the HDInsight cluster in the Microsoft Azure portal, along with the Event Hubs and Azure SQL databases.
  • Worked on clustered Hadoop for Windows Azure using HDInsight and the Hortonworks Data Platform for Windows.
  • Used Spark Streaming to collect data from Event Hubs in near real time and perform the necessary transformations and aggregations on the fly to build the common learner data model, persisting the results to the Azure database.
  • Used Pig for transformations, event joins, bot-traffic filtering, and some pre-aggregations before storing the data in the Azure database.
  • Applied expertise with Hadoop ecosystem tools, including Pig, Hive, HDFS, YARN, Oozie, and ZooKeeper, as well as Hadoop architecture and its components.
  • Integrated the Hadoop cluster with the Spark engine to perform batch and GraphX operations.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Used Spark Streaming to ingest data into the Spark engine.
  • Imported data from sources such as Event Hubs and Cosmos into Spark RDDs.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala (a sketch of this conversion follows this list).
  • Developed multiple PoCs in Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
  • Worked extensively on the Spark SQL and Spark Streaming modules of Spark and used Scala for all Spark use cases.
  • Used the DataFrame API in Scala to work with distributed collections of data organized into named columns.
  • Converted JSON data into DataFrames and stored them in Hive tables.
  • Used AzCopy, Livy, Windows PowerShell, and cURL to submit Spark jobs to the HDInsight cluster.
  • Analyzed SQL scripts and designed solutions implemented in Scala and PySpark.
  • Developed an Event Hubs producer application in Scala to generate events into Event Hubs.
  • Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting (see the partitioned-table sketch after this list).
  • Developed Hive DDL to create, alter, and drop Hive tables.
  • Created scalable, high-performance web services for data tracking.
  • Loaded data from the UNIX file system into HDFS; installed and configured Hive, wrote Hive UDFs, and provided cluster coordination services through ZooKeeper.
  • Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
  • Used HCatalog to access Hive table metadata from MapReduce and Pig code.
  • Computed various metrics that define the user experience using Java MapReduce.
  • Used Sqoop to import and export data into and out of HDFS.
  • Used Eclipse and Ant to build the application; worked with NoSQL (MongoDB) databases and pivoted HDFS data from rows to columns and columns to rows.
  • Developed shell scripts to orchestrate the execution of the other scripts (Pig, Hive, and MapReduce) and to move data files within and outside of HDFS.
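
As a concrete illustration of the Hive/SQL-to-Spark conversion mentioned above, the sketch below expresses one aggregation both ways; the database, table, and column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, count}

object HiveQueryToSpark {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveQueryToSpark")
      .enableHiveSupport()
      .getOrCreate()

    // The original HiveQL form of the query (hypothetical schema).
    val viaSql = spark.sql(
      """SELECT customer_id, COUNT(*) AS txns, AVG(amount) AS avg_amount
        |FROM finance.transactions
        |GROUP BY customer_id""".stripMargin)

    // The same logic expressed as DataFrame transformations, which Spark's
    // optimizer compiles to an equivalent plan.
    val viaDf = spark.table("finance.transactions")
      .groupBy("customer_id")
      .agg(count("*").as("txns"), avg("amount").as("avg_amount"))

    viaDf.show(10)
    spark.stop()
  }
}
```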
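
The partitioned-and-bucketed Hive analysis above depends on table layout; below is a sketch of creating such a table from Spark, assuming the Spark 2.x partitionBy/bucketBy writer API. The source table, target table, and column choices are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object PartitionedBucketedTable {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("PartitionedBucketedTable")
      .enableHiveSupport()
      .getOrCreate()

    val metrics = spark.table("staging.learner_metrics_raw")  // hypothetical source

    // Partition by date so queries prune whole directories; bucket by
    // learner_id so joins and aggregations on that key shuffle less.
    metrics.write
      .partitionBy("ingest_date")
      .bucketBy(32, "learner_id")
      .sortBy("learner_id")
      .saveAsTable("analytics.learner_metrics")

    spark.stop()
  }
}
```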

Environment: Azure HDInsight, Spark, Hive, Spark SQL, Event Hubs, Scala IDE, Python, Scala, Maven, Jupyter Notebook, Visual Studio, UNIX shell scripting.

Spark Developer

Confidential, Dallas, TX

Responsibilities:

  • Developed Spark code using Scala, Data Frames and Spark-SQL for faster processing of data.
  • Used Spark Data frames, Spark-SQL extensively.
  • Configured Spark Streaming to receive ongoing data from Kafka and store the streamed data in HDFS.
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs, DataFrames, Spark SQL, and Scala.
  • Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark using Scala.
  • Developed Scala and Spark SQL code to extract data from various databases.
  • Used Spark SQL to process large amounts of structured data and implemented Spark RDD transformations and actions to migrate MapReduce algorithms.
  • Used Kafka to ingest data into the Spark engine.
  • Worked on Spark streaming using Apache Kafka for real time data processing.
  • Performed Data Ingestion from multiple internal clients using Apache Kafka.
  • Developed data pipeline using Spark, Hive, Sqoop and Kafka to ingest customer behavioral data into Hadoop platform for analysis.
  • Worked extensively with unstructured, semi-structured, and structured data.
  • Used the Spark Streaming APIs to perform the necessary transformations and actions on data received from Kafka and persist it into HDFS.
  • Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved it in Parquet format in HDFS (a sketch follows this list).
  • Developed multiple Kafka producers and consumers per the software requirement specifications.
  • Involved in the process of data acquisition, data pre-processing and data exploration of project in Spark using Scala.
  • Developed Spark scripts by using Scala shell commands as per the requirement.
  • Experienced in handling large datasets using partitions, Spark's in-memory capabilities, broadcasts, and effective, efficient joins and transformations during the ingestion process itself.
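
A minimal sketch of the Kafka-to-HDFS flow these bullets describe: a direct DStream from Kafka whose micro-batch RDDs are converted to DataFrames and appended as Parquet. It assumes the spark-streaming-kafka-0-10 integration; the broker, topic, group id, and output path are hypothetical.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaToParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("KafkaToParquet").getOrCreate()
    val ssc = new StreamingContext(spark.sparkContext, Seconds(30))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",            // hypothetical broker
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "behavior-ingest",                  // hypothetical group
      "auto.offset.reset" -> "latest")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Each micro-batch arrives as an RDD; convert it to a DataFrame and
    // append it as Parquet on HDFS.
    stream.map(_.value).foreachRDD { rdd =>
      import spark.implicits._
      if (!rdd.isEmpty())
        rdd.toDF("raw_event").write.mode("append").parquet("hdfs:///data/events")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```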

Environment: Hadoop, MapReduce, HDFS, Hive, Java, SQL, Spark, Cloudera Manager, Storm, Cassandra, Pig, Sqoop, PL/SQL, MySQL, Windows, Hortonworks, Oozie, HBase.

Sr. Hadoop Developer

Confidential

Responsibilities:

  • Worked extensively in creating Map Reduce jobs to power data for search and aggregation. Designed a data warehouse using Hive.
  • Analyzed system failures, identified root causes, and recommended courses of action.
  • Loaded data into HDFS and processed it using Hive and MapReduce.
  • Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Worked extensively with Sqoop to import metadata from Oracle.
  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
  • Extensively used Pig for data cleansing.
  • Created partitioned tables in Hive.
  • Worked with business teams and created Hive queries for ad hoc access.
  • Evaluated usage of Oozie for Workflow Orchestration.
  • Mentored the analyst and test teams in writing Hive queries.
  • Wrote MapReduce programs with the Java API to cleanse structured and unstructured data (a sketch follows this list).
  • Experience with RDBMSs such as Oracle and Teradata.
  • Worked on loading the data from MySQL to HBase where necessary using Sqoop.
  • Launched Amazon EC2 cloud instances using Amazon Machine Images (Linux/Ubuntu) and configured the launched instances for specific applications.
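
The MapReduce cleansing above was written with the Java API; the sketch below shows the same map-only pattern in Scala (which compiles against the identical Hadoop Java API) to keep one language across these sketches. The delimiter, expected field count, and paths are hypothetical.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, NullWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

// Map-only cleansing: keep pipe-delimited records with the expected field
// count and no empty fields; drop everything else (real rules vary by feed).
class CleanseMapper extends Mapper[LongWritable, Text, NullWritable, Text] {
  override def map(key: LongWritable, value: Text,
                   ctx: Mapper[LongWritable, Text, NullWritable, Text]#Context): Unit = {
    val fields = value.toString.split('|')
    if (fields.length == 12 && fields.forall(_.nonEmpty))
      ctx.write(NullWritable.get, value)
  }
}

object CleanseJob {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance(new Configuration(), "cleanse")
    job.setJarByClass(classOf[CleanseMapper])
    job.setMapperClass(classOf[CleanseMapper])
    job.setNumReduceTasks(0)                 // map-only: no aggregation step
    job.setOutputKeyClass(classOf[NullWritable])
    job.setOutputValueClass(classOf[Text])
    FileInputFormat.addInputPath(job, new Path(args(0)))
    FileOutputFormat.setOutputPath(job, new Path(args(1)))
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}
```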

Environment: Hadoop, Big Data, HDFS, MapReduce, Sqoop, Oozie, Pig, Hive, Flume, LINUX, Java, Eclipse.

MS SQL Server Developer

Confidential

Responsibilities:

  • Upgraded the SQL Server 2005 databases to SQL Server 2008.
  • Developed T-SQL Queries, Triggers, Functions and Stored Procedures.
  • Used DDL and DML for writing triggers, stored procedures, and data manipulation.
  • Used SQL Query Analyzer and SQL Profiler to analyze SQL statements and procedures.
  • Created schema objects such as tables and views, maintained referential integrity, and granted roles to users.
  • Developed internal reports using SSRS.
  • Created different Data sources and Datasets for the reports.
  • Created views to restrict access to data in a table for security.
  • Trained a team of junior developers.
  • Created detailed technical documentation.

Environment: SQL Server 2005/2008 Enterprise Manager, SSMS, Windows Enterprise Server 2003, SQL Query Analyzer, IIS 6.0, OLAP, VB.NET, .NET, HTML.
