Spark Developer Resume

SUMMARY

9+ years of professional IT experience, including work in the Big Data ecosystem and Java/J2EE technologies.
Good exposure to production-environment processes such as change management, incident management, and managing escalations.
Experience with AWS components such as Amazon EC2 instances, S3 buckets, and EBS volumes.
Experience with Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
Hands-on experience installing, configuring, and using Hadoop ecosystem components such as Hadoop MapReduce, HDFS, HBase, Hive, Sqoop, Pig, Flume, and Kafka.
In-depth knowledge of analyzing data using HiveQL, Pig Latin, HBase, and custom MapReduce programs in Java.
Good knowledge of Hadoop cluster architecture and of working with Hadoop clusters using the Cloudera (CDH5) and Hortonworks distributions.
Excellent understanding of NoSQL databases such as MongoDB and HBase.
Extensive knowledge of file formats such as Avro, SequenceFile, Parquet, ORC, and RCFile.
Experience using Sqoop to import data from relational database systems into HDFS and to export it back.
Good knowledge of job scheduling and workflow design tools such as Oozie.
Good experience creating real-time data streaming solutions using Apache Spark, Kafka, and Flume.
Experienced in developing Spark applications using the Spark Core, Spark SQL, and Spark Streaming APIs.
Experience managing Hadoop clusters using Cloudera Manager and Ambari.
Experience installing and monitoring standalone multi-node Kafka and Storm clusters.
Very good experience across the complete project life cycle (design, development, testing, and implementation) using Rapid Application Development (RAD), Agile, and Scrum software development processes.
Highly skilled in object-oriented architectures and patterns, systems analysis, software design, effective coding practices, databases, and servers.
Designed and maintained Oozie workflows to manage the flow of jobs in the cluster.
Developer in a Big Data team; worked with Hadoop and its ecosystem on the AWS cloud.
Hands-on experience in application development using Java, RDBMS, and Linux shell scripting.
Experience with Java, JSP, Servlets, WebLogic, WebSphere, Hibernate, Spring, JBoss, JDBC, JavaScript, Ajax, jQuery, XML, and HTML.
Experience with compression codecs such as Gzip, LZO, Snappy, and Bzip2.
Experience working with multiple operating systems such as Windows and Linux, with strong skills in troubleshooting, finding, and fixing critical problems.
Functional knowledge of the banking and health insurance domains.
Ability to adapt to evolving technology, with a strong sense of responsibility and accomplishment.
Proficient with Core Java and AWT, with markup languages such as HTML5, XHTML, DHTML, CSS, XML 1.1, XSL, XSLT, XPath, and XQuery, and with AngularJS and Node.js.
Worked with version control systems such as Subversion, Perforce, and Git to provide a common platform for all developers.
Articulate in written and verbal communication along with strong interpersonal, analytical, and organizational skills.
Experience preparing deployment packages, deploying to Dev and QA environments, and preparing deployment instructions for the Production Deployment Team.
Highly motivated team player with the ability to work independently and adapt quickly to new and emerging technologies.
Communicate and present models creatively to business customers and executives, using a variety of formats and visualization methods.

TECHNICAL SKILLS

Big Data Technologies: Apache Hadoop (MRv1, MRv2), Hive, Pig, Sqoop, HBase, MongoDB, Flume, Spark, ZooKeeper, Oozie.

Languages: C, Java, SQL/PLSQL.

Methodologies: Agile, RAD, V-model.

Databases: Oracle, MySQL, MongoDB, HBase, MS SQL Server.

Web Technologies: HTML, JSP, JSF, CSS, JavaScript, JSON & AJAX

IDEs: Eclipse, NetBeans

Build tools: Maven, Ant.

Web services: SOAP & RESTful Web Services

Cloud solutions: Amazon Web Services (AWS)

Monitoring Tools: Wireshark, Nagios, Ganglia

Operating Systems: Windows, Ubuntu, Red Hat Linux, CentOS.

Scripting languages: JavaScript, Shell Scripting.

PROFESSIONAL EXPERIENCE

Confidential, MA

Spark Developer

Responsibilities:

Used Sqoop to ingest various types of financial data into HDFS (encryption zone).
Created an SFTP module using Spark to pull data from the FTP server for daily ingestion.
Performed integrity checks that validate files after ingestion using a message-digest algorithm.
Performed benchmarking to optimize the performance of Spark jobs, improving overall batch processing.
Created Hive tables on the ingested data and maintained the data in Text, Avro, and ORC file formats.
Optimized Hive query performance using various techniques.
Developed shell scripts to generate Hive CREATE statements from the data and load the data into the tables.
Analyzed data with Hive on Tez and with Spark SQL, and compared the results of the two engines.
Used the ORC file format and various optimization techniques to improve query performance.
Analyzed user transactions and implemented performance optimizations, including but not limited to partitioning and bucketing in HiveQL.
Provided user support and application support on the Hadoop infrastructure.
Worked on encryption zones in AWS and secured the financial data accordingly.
Created Apache Ranger policies and masked confidential data according to business requirements.
Connected Tableau and SAP BO to Hive tables to perform business analytics and reporting.
All metrics data is published directly to Kafka, where it is consumed by a Spark Streaming consumer group.
Used the Spark Streaming API to perform transformations and actions on the fly to build a common learner data model, which receives data from Kafka in near real time and persists it to AWS S3.
Loaded DStream data into Spark RDDs and performed in-memory computation to generate the output response.
Assisted in monitoring the Hadoop cluster using tools such as Ambari.
Gained hands-on experience with AWS cloud services such as EC2, S3, EBS, and VPC.
Used Hadoop for ETL instead of Informatica and migrated an existing on-premises application to AWS. Used AWS services such as EC2 and S3 for processing and storing small data sets, and maintained the Hadoop cluster on AWS EC2.
Implemented a continuous delivery pipeline with GitLab.
Developed and maintained Oozie workflow scheduling jobs for importing data from RDBMS into Hive.
Worked with Kerberos, Ranger, Active Directory/LDAP, and Unix-based file systems.

Environment: Hortonworks Distribution of Hadoop, Apache Ranger, AWS, Java (JDK 1.8), MySQL, Apache Kafka, DB2, Sybase IQ, Tez, Spark SQL, Spark Streaming, UNIX Shell Scripting
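A post-ingestion integrity check like the one described above can be sketched in a few lines. This is a minimal illustration, not the production module: it assumes MD5 as the message-digest algorithm and that the source system supplies an expected checksum for each file. The helper names `md5_of_file` and `verify_file` are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path: Path, expected_md5: str) -> bool:
    """Hypothetical helper: True if the landed file matches the checksum sent by the source.
    Assumption: the expected value is a hex MD5 string, possibly upper-case or padded."""
    return md5_of_file(path) == expected_md5.strip().lower()


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        landed = Path(tmp) / "transactions.csv"
        landed.write_bytes(b"id,amount\n1,100.00\n")
        # Assumption: in practice the expected checksum comes from the source system.
        expected = hashlib.md5(b"id,amount\n1,100.00\n").hexdigest()
        print(verify_file(landed, expected))  # True
        print(verify_file(landed, "0" * 32))  # False
```

Reading in fixed-size chunks keeps memory use flat regardless of file size, which matters for large daily drops.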

Confidential, San Jose, CA

Big Data Consultant

Responsibilities:

Worked with the Hortonworks Distribution and was involved in installing and configuring various Hadoop ecosystem components, including HDFS, Pig, Hive, Sqoop, and HBase.
Developed a data pipeline using Sqoop to ingest customer behavioural data into HDFS for analysis.
Imported data into Hive using Sqoop from RDBMS systems (SAP HANA) and Teradata.
Wrote Hive queries and structured the data in tabular format to facilitate effective querying of the log data for business analytics.
Implemented Hive partitioning and bucketing, performed different types of joins on Hive tables, and implemented Hive SerDes.
Optimized Hive query performance while handling 20 TB of data per day.
Provided production support for cluster maintenance and tuning.
Triggered workflows based on time or data availability using Oozie.
Monitored and debugged Hadoop jobs/applications running in production using Ambari.
Provided user support and application support on the Hadoop infrastructure.

Environment: Hortonworks Distribution of Hadoop, HDFS, Oozie, Java (JDK 1.6), Eclipse, Tez, UNIX Shell Scripting
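The partitioning and bucketing work above rests on how Hive lays data out: one directory per partition value, and a fixed number of buckets chosen by hashing the bucketing column. The sketch below is a simplified model, assuming a single INT bucketing column and Hive's original bucketing scheme (bucket = (hash & Integer.MAX_VALUE) mod N, where an INT hashes to itself); newer Hive releases (3.x) default to a Murmur3-based hash, so treat this as an illustration only. `layout` and the sample column names are hypothetical.

```python
from collections import defaultdict

JAVA_INT_MAX = 0x7FFFFFFF  # Java's Integer.MAX_VALUE


def bucket_for_int_key(key, num_buckets):
    """Simplified model (assumption) of Hive's original bucketing for one INT column:
    the value is its own hash code; mask off the sign bit, then take it mod num_buckets."""
    java_int = ((key + 2**31) % 2**32) - 2**31  # wrap to 32-bit signed, as Java would
    return (java_int & JAVA_INT_MAX) % num_buckets


def layout(rows, partition_col, bucket_col, num_buckets):
    """Group rows as {partition directory: {bucket number: [rows]}} for a table
    PARTITIONED BY (partition_col) CLUSTERED BY (bucket_col) INTO num_buckets BUCKETS."""
    result = defaultdict(lambda: defaultdict(list))
    for row in rows:
        part_dir = f"{partition_col}={row[partition_col]}"  # Hive's key=value directory naming
        result[part_dir][bucket_for_int_key(row[bucket_col], num_buckets)].append(row)
    return result


if __name__ == "__main__":
    sample = [
        {"dt": "2020-01-01", "customer_id": 7},
        {"dt": "2020-01-01", "customer_id": 12},
        {"dt": "2020-01-02", "customer_id": -1},
    ]
    for part, buckets in layout(sample, "dt", "customer_id", 4).items():
        print(part, {b: [r["customer_id"] for r in rs] for b, rs in sorted(buckets.items())})
    # dt=2020-01-01 {0: [12], 3: [7]}
    # dt=2020-01-02 {3: [-1]}
```

Because each partition is its own directory, a filter on `dt` lets the engine skip whole directories, while rows with equal bucket keys always land in the same bucket number, which is what bucketed joins and sampling rely on.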

Confidential, Eden Prairie, MN

Big Data Consultant

Responsibilities:

Worked with the Hortonworks Distribution and was involved in installing and configuring various Hadoop ecosystem components, including HDFS, Pig, Hive, Sqoop, and HBase.
Launched Amazon EC2 cloud instances using Amazon Machine Images and configured Hadoop instances for specific applications.
Created private networks and subnets and placed instances in them based on requirements.
Created security groups for individual instances and for groups of instances within a network.
Evaluated, refined, and continuously improved the efficiency and accuracy of existing predictive models.
Developed data pipeline using Sqoop to ingest customer behavioural data into HDFS for analysis.
Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, and capacity planning.
Imported data into Hive using Sqoop from RDBMS systems.
Wrote Hive queries to parse logs and structure them in tabular format to facilitate effective querying of the log data for business analytics.
Implemented Hive partitioning and bucketing, performed different types of joins on Hive tables, and implemented Hive SerDes.
Used HBase to perform fast, random reads and writes on all stored data and integrated it with other components such as Hive.
Provided production support for cluster maintenance and tuning.
Triggered workflows based on time or data availability using Oozie.
Monitored and debugged Hadoop jobs/applications running in production.
Provided user support and application support on the Hadoop infrastructure.
Added authorization to the server, using each user's Kerberos identity to determine their role and which operations they could perform.
Used Spark for interactive queries, streaming data processing, and integration with a popular NoSQL database for large data volumes.
Implemented Spark applications in Java and Scala using DataFrames and the Spark SQL API for faster data processing.
Used Spark SQL to analyze web logs, and Spark transformations and actions in Scala to compute statistics for web server monitoring.

Environment: AWS, EC2, S3, Hortonworks Distribution of Hadoop, HDFS, Oozie, Java (JDK 1.6), Eclipse, MySQL, Kafka, Impala, Spark SQL, Spark Streaming, UNIX Shell Scripting
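The web server statistics mentioned above follow a map, filter, then reduce-by-key pattern. The sketch below shows that pattern in plain Python rather than Spark/Scala, assuming Apache Common Log Format input; `parse` and `stats` are hypothetical names. In Spark the same logic would be a `map`, a `filter`, and a `reduceByKey` over an RDD, or a `groupBy(...).count()` on a DataFrame.

```python
import re
from collections import Counter

# Assumed input format: Apache Common Log Format
#   host ident user [timestamp] "request" status bytes
CLF = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-)$')


def parse(line):
    """Map step: one log line -> (status, bytes), or None if the line is malformed."""
    m = CLF.match(line)
    if m is None:
        return None
    size = 0 if m.group(5) == "-" else int(m.group(5))
    return int(m.group(4)), size


def stats(lines):
    """Return (count per status code, total bytes served, 5xx error rate)."""
    parsed = [p for p in map(parse, lines) if p is not None]  # like rdd.map(...).filter(...)
    per_status = Counter(status for status, _ in parsed)  # like reduceByKey(_ + _) on (status, 1)
    total_bytes = sum(size for _, size in parsed)
    errors = sum(n for status, n in per_status.items() if status >= 500)
    error_rate = errors / len(parsed) if parsed else 0.0
    return dict(per_status), total_bytes, error_rate


if __name__ == "__main__":
    logs = [
        '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
        '10.0.0.2 - - [10/Oct/2000:13:55:40 -0700] "GET /missing HTTP/1.0" 404 -',
        '10.0.0.3 - - [10/Oct/2000:13:56:01 -0700] "POST /api HTTP/1.1" 500 512',
        "garbage line",
    ]
    counts, total, rate = stats(logs)
    print(counts, total, round(rate, 3))  # {200: 1, 404: 1, 500: 1} 2838 0.333
```

Dropping malformed lines before aggregating mirrors the usual Spark practice of filtering bad records early so that one corrupt line does not fail the whole job.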

Confidential, Omaha, NE

Hadoop Developer

Responsibilities:

Handled the installation and configuration of a Hadoop cluster.
Handled data exchange between HDFS and various web applications and databases using Flume and Sqoop.
Closely monitored and analyzed MapReduce job executions on the cluster at the task level.
Changed cluster configuration properties based on the volume of data being processed and cluster performance.
Commissioned and decommissioned DataNodes from the cluster in case of problems.
Worked on NoSQL databases such as MongoDB and ingested the data into HDFS.
Implemented MongoDB concepts such as replication and sharding.
Worked on NoSQL (HBase) to support enterprise production.
Loaded data into HBase using Hive and Sqoop.
Involved in upgrading the Hadoop cluster from CDH3 to CDH4.
Created Hive tables and worked with them using HiveQL.
Performed cluster coordination using ZooKeeper and assisted with data capacity planning and node forecasting.
Installed Hadoop, MapReduce, and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
Developed job flows in Oozie to automate the workflow for extraction of data from warehouses and weblogs.
Developed Pig UDFs for custom data processing (cleaning, editing, and formatting unstructured data).

Environment: Hadoop, MapReduce, HDFS, Hive, Java, Hortonworks Distribution of Hadoop, Pig, HBase, Linux, XML, MySQL, MySQL Workbench, Java 6, Eclipse, Oracle 10g, PL/SQL, SQL*Plus.
