Hadoop/Spark Developer Resume
El Segundo, CA
PROFESSIONAL SUMMARY:
- 6+ years of overall programming and software development experience, with skills in data analysis, design and development, testing, and deployment of software systems from the development stage through production. Experience with Big Data and cloud technologies such as Spark, Kafka, Hive, Sqoop, HDFS, Oozie, DynamoDB, SQS, SNS, Scala, and Lambda. Keen interest in problem solving and in designing and implementing effective software applications.
- Worked extensively with Spark in Scala on clusters for computational analytics.
- Created end-to-end Spark applications in Scala to perform data cleansing, validation, transformation, and summarization on user behavioral data.
- Worked extensively with Spark Core, Spark SQL, and Spark Streaming.
- Integrated Spark Streaming with Kafka for real-time computational analytics.
- Explored Spark performance tuning, optimization, and monitoring.
- Experience migrating data between HDFS and relational database systems in both directions using Sqoop, according to client requirements.
- Analyzed data using Hive queries (HiveQL), Pig scripts (Pig Latin), and custom MapReduce programs in Java to study customer behavior.
- Worked extensively with Kafka applications for real-time processing of transactions.
- Good experience working with file formats such as SequenceFile, JSON, ORC, and Avro in Hadoop.
- Good experience working in the Amazon Web Services (AWS) cloud environment, including EMR, DynamoDB, SQS, SNS, EC2, and S3.
- Imported data from sources such as AWS S3 and the local file system into Spark RDDs.
- Hands-on experience with, and good knowledge of, the real-time data feed platform Kafka and its integration with the Spark framework.
- Experience developing and maintaining applications built on Amazon Simple Storage Service (S3), AWS Elastic MapReduce (EMR), and AWS CloudFormation.
- Deployed instances on AWS EC2, used EBS volumes for persistent storage, and performed access management using the IAM service.
- Ensured data integrity and data security on AWS by implementing AWS best practices.
- Good knowledge of Data warehousing concepts and ETL processes.
- Knowledge of manipulating/analyzing large datasets and finding patterns and insights within structured and unstructured data.
- Experience in branching, merging, tagging, and maintaining versions across environments using SCM tools such as Subversion (SVN) and Git (GitHub, GitLab).
- Knowledge of installing, configuring, debugging and troubleshooting Hadoop clusters.
- Experienced with UNIX and Linux distributions (Red Hat, Ubuntu, CentOS, Debian).
- Quick learner and effective team player with good communication skills.
- Strong analytical and problem-solving skills.
TECHNICAL SKILLS:
Big Data Technologies: HDFS, MapReduce, Hive, Pig, Tez, Sqoop, Spark, Kafka, DynamoDB, Flume, Impala, Oozie, Hue, HBase, HiveQL, Pig Latin, ZooKeeper.
Advanced Big Data Technologies: Spark Core, Spark SQL, Spark Streaming, Kafka, HBase, Cassandra
Programming: Java, Scala, J2EE, SQL, UNIX, HiveQL, Pig Latin, HTML, XML, CSS, JavaScript, JDBC.
Cloud Computing: AMI, EC2, EMR, Volumes, Snapshots, S3, EBS, RDS, DynamoDB, ElastiCache, Redshift, SQS, SNS, Lambda, VPC, AWS CLI, CloudWatch.
DevOps: Jenkins, Git, Maven, SVN, SBT
RDBMS: MySQL, SQL Server, Oracle
Web Server: Apache Tomcat and Oracle WebLogic Server
Operating System: Windows, UNIX, Linux, macOS
IDE and Software: IntelliJ IDEA, Eclipse, NetBeans
Version Control: Subversion (SVN), Git, Bitbucket
Others: PuTTY, WinSCP, Cygwin, Tectia, SFTP
PROFESSIONAL EXPERIENCE:
Confidential - El Segundo, CA
Hadoop/Spark Developer
Responsibilities:
- Involved in various stages of the Software Development Life Cycle (SDLC) during application development.
- Created and maintained data pipelines.
- Created aggregate functions and groupings on sensor data to determine user behavioral patterns; responsible for storing and analyzing clickstream data.
- Integrated data streams from Kafka sources with Spark Streaming for data analysis.
- Enhanced and optimized product Spark code to aggregate, group and run data mining tasks using the Spark framework.
- Stored the data analyzed by Spark Streaming in AWS DynamoDB.
- Worked with the Redis caching database for quick lookup of user information and fast data access.
- Used Amazon Simple Queue Service (SQS) as a message queuing service to send messages in case of application issues.
- Used Amazon Simple Notification Service (SNS) to coordinate and manage the delivery of messages from a topic to subscribed SQS queues, sending notifications in case of application exceptions.
- Worked with AWS Lambda serverless, event-driven functions to process data whenever a record is inserted into a DynamoDB table.
- Used Sqoop to import historical data from RDBMS into HDFS.
- Created Hive internal and external tables for the structured data available in HDFS.
- Implemented bucketing in Hive and designed external tables to improve performance.
- Applied Snappy compression to Avro and Parquet files.
- Installed the Oozie workflow engine to run multiple Hive jobs.
Environment: Hadoop, HDFS, Amazon EMR, Spark Streaming, Kafka, DynamoDB, SQS, SNS, Hive, Sqoop, Oozie, ZooKeeper, Oracle.
Confidential - Nashua, NH
Hadoop Developer/Big Data Analyst
Responsibilities:
- Worked closely with the functional team to gather and understand business requirements, determine feasibility, and convert them into technical tasks in the design documents.
- Extracted data from various SQL servers into HDFS using Sqoop for easier data manipulation.
- Created Hive external and internal tables according to business requirements.
- Created Hive tables, loaded data into them, and wrote Hive UDFs.
- Developed Hive queries to process data for visualization and reporting.
- Implemented Hive queries to aggregate data and extract useful information by sorting on the required attributes.
- Implemented partitioning and dynamic partitioning for easy access to multiple days of data.
- Implemented bucketing in Hive for efficient data access and optimized analysis.
- Worked with complex data formats such as JSON files for data processing in Hive.
- Implemented Pig Latin scripts to clean the data.
- Created Oozie workflows to run the Hive mappings and schedule jobs sequentially.
- Applied compression techniques to Hive tables for better storage performance.
- Worked closely with the business team to gather requirements and add new support features.
- Implemented partitioning, dynamic partitions, and buckets in Hive for more efficient data access.
- Created Autosys jobs for scheduling, monitoring, and reporting on Oozie jobs.
Environment: Hadoop (MapReduce/YARN), HDFS, Hive, Impala, Pig, Oozie, ZooKeeper, Teradata, Cloudera CDH, Autosys, MicroStrategy.
Confidential - Atlanta, GA
Hadoop Developer
Responsibilities:
- Installed and configured Hadoop; developed multiple MapReduce jobs in Java for data cleaning and preprocessing. Installed and configured Pig for ETL jobs.
- Troubleshot the cluster by reviewing Hadoop log files.
- Imported data from Teradata using Sqoop with the Teradata connector.
- Used Oozie to orchestrate the workflow.
- Created Hive tables and worked on them for data analysis to meet business requirements.
- Designed and implemented a MapReduce-based, large-scale parallel relation-learning system.
- Installed and benchmarked Hadoop/HBase clusters for internal use.
- Wrote HBase client programs in Java and web services.
- Modeled, serialized, and manipulated data in multiple forms (XML).
- Experience with data modeling concepts: star schema dimensional modeling and relational (ER) design. Supported post-production enhancements.
- Created tables and stored procedures in SQL for data manipulation and retrieval; performed database modifications using SQL, PL/SQL, stored procedures, triggers, and views in Oracle 9i.
- Created the user interface using JSF.
- Involved in integration testing of the business logic layer and the data access layer.
- Integrated JSF with JSP and used JSF custom tag libraries to display the values of variables defined in configuration files.
- Used technologies such as JSP, JSTL, JavaScript, and Tiles for the presentation tier.
- Performed unit testing of the application using the JUnit framework.
Environment: Hadoop, MapReduce, HDFS, Hive, Java, Cloudera, Pig, HBase, Linux, XML, MySQL Workbench, Java 6, Eclipse, Oracle 10g, PL/SQL, SQL*Plus.
Confidential - Cleveland, OH
Java Developer
Responsibilities:
- Involved in all development phases of the SDLC, including gathering requirements and documenting them as use case documents.
- Designed, deployed, and tested a multi-tier application using Java technologies.
- Involved in front-end development using JSP, HTML, and CSS.
- Implemented the application using the Spring MVC framework.
- Deployed the application on Oracle WebLogic Server.
- Applied multithreading concepts in Java classes, designing them to avoid deadlocks.
- Used a MySQL database to store data and executed SQL queries on the backend.
- Prepared and maintained the test environment. Tested the application before going live to production. Documented and communicated test results to the team lead daily.
- Participated in weekly meetings with team leads and the manager to discuss project issues and status.
Environment: J2EE (Java, JSP, JDBC, Multi-Threading), HTML, Oracle WebLogic Server, Eclipse, MySQL, JUnit
