Hadoop/Spark Developer Resume El Segundo, CA - Hire IT People

PROFESSIONAL SUMMARY:

6+ years of programming and software development experience, with skills in data analysis, design, development, testing, and deployment of software systems from development through production. Experience with Big Data technologies such as Spark, Kafka, Hive, Sqoop, HDFS, and Oozie, AWS services such as DynamoDB, SQS, SNS, and Lambda, and the Scala programming language. Keen interest in problem solving and in designing and implementing effective software applications.
Worked extensively with Spark in Scala on clusters for computational analytics.
Created end-to-end Spark applications in Scala to perform data cleansing, validation, transformation, and summarization on user behavioral data.
Worked extensively with Spark Core, Spark SQL, and Spark Streaming.
Integrated Spark Streaming with Kafka for real-time computational analytics.
Explored Spark performance tuning, optimization, and monitoring.
Experience migrating data between HDFS and relational database systems in both directions using Sqoop, according to client requirements.
Analyzed data using Hive queries (HiveQL), Pig scripts (Pig Latin), and custom MapReduce programs in Java to study customer behavior.
Worked extensively with Kafka applications for real-time processing of transactions.
Good experience working with file formats used in Hadoop, such as SequenceFile, JSON, ORC, and Avro.
Good experience working in the Amazon Web Services (AWS) cloud environment, including EMR, DynamoDB, SQS, SNS, EC2, and S3.
Imported data from sources such as AWS S3 and the local file system into Spark RDDs.
Hands-on experience with and good knowledge of Kafka, a real-time data feed platform, and its integration with the Spark framework.
Experience developing and maintaining applications for Amazon Simple Storage Service (S3), AWS Elastic MapReduce (EMR), and AWS CloudFormation.
Deployed instances on AWS EC2, used EBS volumes for persistent storage, and performed access management using the IAM service.
Ensured data integrity and data security on AWS by implementing AWS best practices.
Good knowledge of Data warehousing concepts and ETL processes.
Knowledge of manipulating and analyzing large datasets and finding patterns and insights in structured and unstructured data.
Experience in branching, merging, tagging, and maintaining versions across environments using SCM tools such as Subversion (SVN) and Git (GitHub, GitLab).
Knowledge of installing, configuring, debugging and troubleshooting Hadoop clusters.
Experienced with UNIX and Linux distributions (Red Hat, Ubuntu, CentOS, Debian).
Quick learner with strong team spirit and good communication skills.
Strong analytical and problem-solving skills.

TECHNICAL SKILLS:

Big Data Technologies: HDFS, MapReduce, Hive, Pig, Tez, Sqoop, Spark, Kafka, DynamoDB, Flume, Impala, Oozie, Hue, HBase, HiveQL, Pig Latin, ZooKeeper.

Advanced Big Data Technologies: Spark Core, Spark SQL, Spark Streaming, Kafka, HBase, Cassandra

Programming: Java, Scala, J2EE, SQL, UNIX, HiveQL, Pig Latin, HTML, XML, CSS, JavaScript, JDBC.

Cloud Computing: AMI, EC2, EMR, Volumes, Snapshots, S3, EBS, RDS, DynamoDB, ElastiCache, Redshift, SQS, SNS, Lambda, VPC, AWS CLI, CloudWatch.

DevOps: Jenkins, Git, Maven, SVN, SBT

RDBMS: MySQL, SQL Server, Oracle

Web Server: Apache Tomcat and Oracle WebLogic Server

Operating System: Windows, UNIX, Linux, and macOS

IDE and Software: IntelliJ, Eclipse, NetBeans

Version Control: Subversion (SVN), Git, Bitbucket

Others: PuTTY, WinSCP, Cygwin, Tectia, SFTP

PROFESSIONAL EXPERIENCE:

Confidential - El Segundo, CA

Hadoop/Spark Developer

Responsibilities:

Involved in various stages of the Software Development Life Cycle (SDLC) during application development.
Created and maintained data pipelines.
Created aggregation functions and groupings on sensor data to determine user behavioral patterns; responsible for storing and analyzing clickstream data.
Integrated data streams from Kafka sources with Spark Streaming for data analysis.
Enhanced and optimized production Spark code to aggregate, group, and run data mining tasks using the Spark framework.
Stored the data analyzed by Spark Streaming in AWS DynamoDB.
Worked with Redis as a cache for quick lookup of user information.
Used Amazon Simple Queue Service (SQS) to queue messages in case of application issues.
Used Amazon Simple Notification Service (SNS) to coordinate and manage the delivery of messages to subscribed SQS queues, sending notifications in case of application exceptions.
Worked with AWS Lambda serverless, event-driven functions to process data whenever a record is inserted into the DynamoDB table.
Used Sqoop to import historical data from RDBMS into HDFS.
Created Hive internal and external tables for the structured data available in HDFS.
Implemented bucketing in Hive and designed external tables to improve performance.
Applied Snappy compression to Avro and Parquet files.
Installed the Oozie workflow engine to run multiple Hive jobs.

Environment: Hadoop, HDFS, Amazon EMR, Spark Streaming, Kafka, DynamoDB, SQS, SNS, Hive, Sqoop, Oozie, ZooKeeper, Oracle.

Confidential - Nashua, NH

Hadoop Developer/Big Data Analyst

Responsibilities:

Worked closely with the functional team to gather and understand business requirements, determine their feasibility, and convert them into technical tasks in the design documents.
Extracted data from various SQL servers into HDFS using Sqoop for easier data manipulation.
Created Hive external and internal tables according to business requirements.
Created Hive tables, loaded data into them, and wrote Hive UDFs.
Developed Hive queries to process the data for visualization and reporting.
Implemented Hive queries to aggregate data and extract useful information by sorting it on the required attributes.
Implemented partitioning and dynamic partitioning for easy access to multiple days of data.
Implemented bucketing in Hive for efficient data access and optimized analysis.
Worked with complex data formats such as JSON files when processing data in Hive.
Implemented Pig Latin scripts to clean the data.
Created Oozie workflows to run Hive mappings sequentially on a schedule.
Applied compression techniques to Hive tables for storage and performance benefits.
Worked closely with the business team to gather requirements and add new support features.
Implemented partitioning, dynamic partitions, and buckets in Hive for more efficient data access.
Created Autosys jobs for scheduling, monitoring, and reporting on Oozie jobs.

Environment: Hadoop (MapReduce/YARN), HDFS, Hive, Impala, Pig, Oozie, ZooKeeper, Teradata, Cloudera CDH, Autosys, MicroStrategy.

Confidential, Atlanta, GA

Hadoop Developer

Responsibilities:

Installed and configured Hadoop, developed multiple MapReduce jobs in Java for data cleaning and preprocessing, and installed and configured Pig for ETL jobs.
Troubleshot the cluster by reviewing Hadoop log files.
Imported data from Teradata using Sqoop with the Teradata connector.
Used Oozie to orchestrate the workflow.
Created Hive tables and worked with them for data analysis to meet business requirements.
Designed and implemented a MapReduce-based, large-scale parallel relation-learning system.
Installed and benchmarked Hadoop/HBase clusters for internal use.
Wrote HBase client programs and web services in Java.
Modeled, serialized, and manipulated data in multiple forms (XML).
Experience with data modeling concepts: star schema, dimensional modeling, and relational (ER) design. Supported post-production enhancements.
Created tables and stored procedures in SQL for data manipulation and retrieval; performed database modifications using SQL, PL/SQL, stored procedures, triggers, and views in Oracle 9i.
Created the user interface using JSF.
Involved in integration testing of the business logic layer and the data access layer.
Integrated JSF with JSP and used JSF custom tag libraries to display the values of variables defined in configuration files.
Used JSP, JSTL, JavaScript, and Tiles for the presentation tier.
Performed unit testing of the application using the JUnit framework.

Environment: Hadoop, MapReduce, HDFS, Hive, Java, Cloudera, Pig, HBase, Linux, XML, MySQL Workbench, Java 6, Eclipse, Oracle 10g, PL/SQL, SQL*Plus.

Confidential, Cleveland, OH

Java Developer

Responsibilities:

Involved in all development phases of the SDLC, including gathering requirements and documenting them as use case documents.
Designed, deployed, and tested a multi-tier application using Java technologies.
Involved in front-end development using JSP, HTML, and CSS.
Implemented the application using the Spring MVC framework.
Deployed the application on Oracle WebLogic Server.
Implemented multithreading in Java classes, designed to avoid deadlocks.
Used a MySQL database to store data and executed SQL queries on the backend.
Prepared and maintained the test environment. Tested the application before it went live in production. Documented and communicated test results to the team lead daily.
Participated in weekly meetings with team leads and the manager to discuss project issues and status.

Environment: J2EE (Java, JSP, JDBC, Multithreading), HTML, Oracle WebLogic Server, Eclipse, MySQL, JUnit
