Spark & Hadoop Developer Resume
PROFESSIONAL SUMMARY:
- 8 years of experience in IT, including Big Data technologies, the Hadoop ecosystem, data warehousing, and SQL-related technologies in the Retail, Manufacturing, Financial, and Communication sectors
- 5 years of experience in Big Data analytics using various Hadoop ecosystem tools and the Spark framework; currently working extensively on Spark and Spark Streaming, with Scala as the primary programming language
- Experience installing, configuring, and maintaining Apache Hadoop clusters for application development, as well as Hadoop tools such as Sqoop, Hive, Pig, Flume, HBase, Kafka, Hue, Storm, ZooKeeper, Oozie, and Cassandra, along with Python
- Worked with major distributions such as Cloudera (CDH 3 and 4), Hortonworks, and AWS; also worked on Unix and data warehousing in support of these distributions
- Hands-on experience developing and deploying enterprise applications using major Hadoop ecosystem components such as Hadoop 2.x, YARN, Hive, Pig, MapReduce, Spark, Kafka, Storm, Oozie, HBase, Flume, Sqoop, and ZooKeeper
- Experience handling large datasets using partitioning, Spark in-memory capabilities, and broadcast variables in Spark with Scala and Python, applying effective and efficient joins, transformations, and other operations during the ingestion process itself
- Experience developing data pipelines using Pig, Sqoop, and Flume to extract data from web logs and store it in HDFS, as well as developing Pig Latin scripts and HiveQL queries for data analytics
- Extensively dealt with Spark Streaming and Apache Kafka to fetch live stream data.
- Experience converting Hive/SQL queries into Spark transformations using Java, and in ETL development using Kafka, Flume, and Sqoop
- Good experience writing Spark applications in Scala and Java, using sbt to build Scala projects and running them with spark-submit (a brief sketch follows this summary)
- Experience working with NoSQL databases including HBase, Cassandra, and MongoDB, and using Sqoop to move data between RDBMS systems and HDFS in both directions
- Developed Spark scripts by using Scala shell commands as per the requirement
- Good experience writing Sqoop jobs for transferring bulk data between Apache Hadoop and structured data stores
- Substantial experience writing MapReduce jobs in Java and working with Pig, Flume, ZooKeeper, Hive, and Storm
- Created multiple MapReduce Jobs using Java API, Pig and Hive for data extraction
- Strong expertise in troubleshooting and performance fine-tuning Spark, MapReduce and Hive applications
- Good experience working with the Amazon EMR framework for processing data on EMR and EC2 instances
- Created AWS VPC networks for the provisioned instances and configured security groups and Elastic IPs accordingly
- Developed AWS CloudFormation templates to create custom-sized VPCs, subnets, EC2 instances, ELBs, and security groups
- Extensive experience developing applications that perform data processing tasks against Teradata, Oracle, SQL Server, and MySQL databases
- Worked with data warehousing, ETL, and BI tools such as Informatica, Pentaho, and Tableau
- Experience understanding Hadoop security requirements and integrating with Kerberos authentication and authorization infrastructure
- Familiar with Agile and Waterfall methodologies; responsible for handling several client-facing meetings, with strong communication skills
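As referenced above, a minimal sketch of converting a HiveQL query into Spark transformations, assuming hypothetical sales and customers Hive tables; table names, columns, and the output table are illustrative only:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{broadcast, count, lit}

    object HiveToSparkSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("HiveToSparkSketch")
          .enableHiveSupport()               // read Hive-managed tables directly
          .getOrCreate()

        // The original HiveQL can be run as-is through Spark SQL
        val viaSql = spark.sql(
          """SELECT c.region, COUNT(*) AS orders
            |FROM sales s JOIN customers c ON s.customer_id = c.id
            |GROUP BY c.region""".stripMargin)
        viaSql.show(5)   // sanity check against the DataFrame version below

        // The same logic as DataFrame transformations, broadcasting the smaller
        // dimension table so the join avoids a full shuffle
        val sales     = spark.table("sales")
        val customers = spark.table("customers")
        val viaDf = sales
          .join(broadcast(customers), sales("customer_id") === customers("id"))
          .groupBy("region")
          .agg(count(lit(1)).as("orders"))

        viaDf.write.mode("overwrite").saveAsTable("sales_by_region")
        spark.stop()
      }
    }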
SKILLS:
Big Data Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, ZooKeeper, Kafka, Cassandra, Apache Spark, Spark Streaming, HBase, Impala
Hadoop Distributions: Cloudera, Hortonworks, Apache, AWS
Languages: Java, SQL, PL/SQL, Python, Pig Latin, HiveQL, Scala, Regular Expressions
Web Technologies: HTML, CSS, JavaScript, XML, JSP, REST, SOAP
Operating Systems: Windows (XP/7/8/10), UNIX, Linux, Ubuntu, CentOS
Portals/Application Servers: WebLogic, WebSphere Application Server, WebSphere Portal Server, JBoss
Build Automation tools: SBT, Ant, Maven
Version Control: GIT
IDE & Build Tools, Design: Eclipse, Visual Studio, NetBeans, Rational Application Developer, JUnit
Databases: Oracle, SQL Server, MySQL, MS Access, NoSQL databases (HBase, Cassandra, MongoDB), Teradata
PROFESSIONAL EXPERIENCE:
Confidential
Spark & Hadoop Developer
Responsibilities:
- Worked on analyzing the Hadoop cluster and various big data analytics and processing tools, including Pig, Hive, Sqoop, Python, Spark (with Scala and Java), and Spark Streaming
- Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase, streaming data with Spark and Kafka
- Worked on a large-scale Hadoop YARN cluster for distributed data processing and analysis using Spark, Hive, and HBase
- Involved in creating a data lake by extracting customer data from various sources into HDFS, including Excel files, databases, and server log data
- Developed Apache Spark applications in Scala and Python for processing data from various streaming sources
- Used Scala to convert Hive/SQL queries into RDD transformations in Apache Spark
- Implemented Spark solutions to generate reports, fetch and load data in Hive
- Experienced in writing real-time processing and core jobs using Spark Streaming with Kafka as a data pipeline system
- Wrote HiveQL to analyze the number of unique visitors and their visit information, such as page views and most-visited pages
- Configured Spark Streaming to receive real-time data from Apache Kafka and store the stream data in HDFS using Scala (see the sketch at the end of this section)
- Monitored workload, job performance and capacity planning using Cloudera Manager
- Experienced in working with the Amazon EMR framework for processing data on EMR and EC2 instances
- Designed and implemented complete end-to-end Hadoop infrastructure including Pig, Hive, Sqoop, Oozie, Flume, and ZooKeeper
- Used Pig for transformations, event joins, the Elephant Bird API, and pre-aggregations performed before loading JSON-format files onto HDFS
- Involved in resolving performance issues in Pig and Hive by understanding MapReduce physical plan execution and using debugging commands to run code in an optimized way
- Used Spark to perform analytics on data in Hive, and experienced with ETL work involving Hive and MapReduce
Environment: HDP 2.6.0, HDFS, MapReduce, Spark Streaming, Spark Core, Spark SQL, Scala, Pig 0.14, Hive 1.2.1, Sqoop 1.4.4, Flume 1.6.0, Kafka, JSON, HBase.
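A minimal sketch of the Kafka-to-HDFS streaming flow referenced above, assuming the spark-streaming-kafka-0-10 integration; the broker addresses, consumer group, topic name, and output path are illustrative:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

    object KafkaToHdfsSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("KafkaToHdfsSketch")
        val ssc  = new StreamingContext(conf, Seconds(30))   // 30-second micro-batches

        val kafkaParams = Map[String, Object](
          "bootstrap.servers"  -> "broker1:9092,broker2:9092",
          "key.deserializer"   -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id"           -> "clickstream-consumers",
          "auto.offset.reset"  -> "latest"
        )

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("clickstream"), kafkaParams))

        // Keep only the message payloads and land each non-empty micro-batch in HDFS
        stream.map(_.value).foreachRDD { (rdd, time) =>
          if (!rdd.isEmpty()) {
            rdd.saveAsTextFile(s"hdfs:///data/raw/clickstream/batch-${time.milliseconds}")
          }
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }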
Confidential, Jacksonville, FL
Hadoop/Java Developer
Responsibilities:
- Responsible for architecting Hadoop clusters with CDH3, and involved in installing CDH3 and upgrading from CDH3 to CDH4
- Worked on creating keyspaces in Cassandra for saving Spark batch output
- Worked on a Spark application to compact the small files in the Hive warehouse into files sized close to the HDFS block size (see the sketch at the end of this section)
- Managed migration of on-premises servers to AWS by creating golden images for upload and deployment
- Managed multiple AWS accounts with multiple VPCs for both production and non-production environments, where the primary objectives were automation, build-out, integration, and cost control
- Implemented the real time streaming ingestion using Kafka and Spark Streaming
- Loaded data using Spark Streaming with Scala and Python
- Involved in the requirements and design phases to implement a streaming Lambda architecture for real-time processing using Spark, Kafka, and Scala
- Experience loading data into Spark RDDs and performing in-memory computation to generate the output responses
- Migrated complex MapReduce programs to in-memory Spark processing using transformations and actions
- Developed a full-text search platform using NoSQL, Logstash, and the Elasticsearch engine, allowing for much faster, more scalable, and more intuitive user searches
- Developed Sqoop scripts to enable interaction between Pig and the MySQL database
- Worked on Performance Enhancement in Pig, Hive and HBase on multiple nodes
- Worked with Distributed n-tier architecture and Client/Server architecture
- Supported MapReduce programs running on the cluster and developed multiple MapReduce jobs in Java for data cleaning and pre-processing
- Developed MapReduce applications using Hadoop, the MapReduce programming model, and HBase
- Evaluated usage of Oozie for Work Flow Orchestration and experienced in cluster coordination using Zookeeper
- Developed ETL jobs following organization- and project-defined standards and processes
- Experienced in enabling Kerberos authentication in ETL process
- Implemented data access using Hibernate persistence framework
- Designed the GUI using the Model-View-Controller architecture (Struts framework)
- Integrated Spring DAO for data access using Hibernate and involved in the Development of Spring Framework Controllers
Environment: Hadoop 2.X, HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, HBase, Java, J2EE, Eclipse, HQL.
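A minimal sketch of the small-file compaction job referenced above, assuming a directory of Parquet files behind a Hive table and a 128 MB HDFS block size; the paths, file format, and sizing rule are illustrative:

    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.spark.sql.SparkSession

    object CompactSmallFilesSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("CompactSmallFilesSketch").getOrCreate()

        // Hypothetical Hive-backed location full of small Parquet files
        val inputPath  = "hdfs:///warehouse/events/dt=2018-01-01"
        val outputPath = "hdfs:///warehouse/events_compacted/dt=2018-01-01"
        val blockSize  = 128L * 1024 * 1024   // target roughly one HDFS block per output file

        // Size the output by total input bytes so each file lands near the block size
        val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
        val totalBytes = fs.getContentSummary(new Path(inputPath)).getLength
        val numFiles = math.max(1, (totalBytes / blockSize).toInt)

        // Rewrite the data as a small number of block-sized files
        spark.read.parquet(inputPath)
          .coalesce(numFiles)
          .write.mode("overwrite")
          .parquet(outputPath)

        spark.stop()
      }
    }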
Confidential, Woodcliff Lake, NJ
Hadoop Developer
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleansing and preparation
- Worked on Spark Streaming to collect terabytes of data every hour from connected cars
- Worked on Spark batch processing to load the processed data into Cassandra (see the sketch at the end of this section)
- Moved log data periodically into HDFS using Flume, building multi-hop flows, fan-out flows, and failover mechanisms
- Developed MapReduce jobs to automate the transfer of data from HBase, read data files, and scrub the data
- Experience in Spark programming using Scala.
- Experienced in performing ETL using Spark, Spark SQL
- Transferred data between MySQL and HDFS using Sqoop connectors
- Created and populated Hive tables and wrote Hive queries for data analysis to meet business requirements
- Transformed Kafka-loaded data using Spark Streaming with Scala and Python
- Installed and configured Pig and wrote Pig Latin scripts
- Migrated data from the MySQL database to HBase, running MapReduce jobs to access HBase data from applications using the Java client APIs
- Installed and configured Hive, Pig, Sqoop, Flume, and Oozie on the Hadoop cluster and automated the jobs using Oozie
- Actively participated in software development lifecycle, including design and code reviews, test development, test automation
- Involved in solution-driven agile development methodology and actively participated in daily scrum meetings
- Monitored the Hadoop cluster using tools such as Cloudera Manager
- Wrote automation scripts to monitor HDFS and HBase through cron jobs
- Created a complete processing engine, based on Cloudera's distribution, enhanced for performance
Environment: Hadoop, MapReduce, HDFS, Sqoop, HBase, Oozie, SQL, Pig, Flume, Hive, Java.
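A minimal sketch of the Spark batch load into Cassandra referenced above, assuming the DataStax spark-cassandra-connector is on the classpath and the keyspace and table already exist; the host, input path, keyspace, and table names are illustrative:

    import org.apache.spark.sql.SparkSession

    object SparkToCassandraSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("SparkToCassandraSketch")
          .config("spark.cassandra.connection.host", "cassandra-node1")  // hypothetical host
          .getOrCreate()

        // Hypothetical batch of processed connected-car telemetry
        val telemetry = spark.read.parquet("hdfs:///data/processed/telemetry")

        // Append the batch into a pre-created Cassandra keyspace/table
        telemetry.write
          .format("org.apache.spark.sql.cassandra")
          .options(Map("keyspace" -> "vehicle_data", "table" -> "telemetry"))
          .mode("append")
          .save()

        spark.stop()
      }
    }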
Confidential
Hadoop Developer
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and preprocessing
- Installed and configured Apache Hadoop to test the maintenance of log files in the Hadoop cluster
- Imported and exported data to and from HDFS and Hive using Sqoop
- Experienced in defining job flows and managing and reviewing Hadoop log files
- Loaded and transformed large sets of structured, semi-structured, and unstructured data
- Responsible for managing data coming from different sources and for implementing MongoDB to store and analyze unstructured data
- Supported MapReduce programs running on the cluster and was involved in loading data from the UNIX file system into HDFS
- Installed and configured Hive and wrote Hive UDFs (see the sketch at the end of this section)
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs
- Involved in Hadoop cluster tasks such as adding and removing nodes without affecting running jobs or data
- Created HBase tables to store variable data formats of PII data coming from different portfolios
- Implemented best income logic using Pig scripts
- Provided cluster coordination services through ZooKeeper
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager
- Used the Hibernate ORM framework with the Spring framework for data persistence and transaction management, and was involved in building templates and screens in HTML and JavaScript
Environment: Hadoop, HDFS, MapReduce, Pig, Sqoop, Unix, HBase, Java, JavaScript, HTML
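A minimal sketch of a Hive UDF like the ones referenced above, written in Scala against the classic org.apache.hadoop.hive.ql.exec.UDF API; the masking rule, class name, jar path, and column are illustrative, echoing the PII handling noted in this section:

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Hypothetical UDF: mask all but the last four characters of a PII column
    class MaskPii extends UDF {
      def evaluate(input: Text): Text = {
        if (input == null) return null
        val value = input.toString
        val masked =
          if (value.length <= 4) value
          else "*" * (value.length - 4) + value.takeRight(4)
        new Text(masked)
      }
    }

    // Registered in Hive after packaging the class into a jar, for example:
    //   ADD JAR hdfs:///udfs/mask-pii.jar;
    //   CREATE TEMPORARY FUNCTION mask_pii AS 'MaskPii';
    //   SELECT mask_pii(ssn) FROM customer_profiles LIMIT 10;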
Confidential
SQL/Java Developer
Responsibilities:
- Worked with several clients on day-to-day requests and responsibilities
- Designed and developed a Struts-like MVC 2 web framework using the front-controller design pattern, which has been used successfully in a number of production systems
- Wrote SQL queries to perform back-end database operations
- Wrote various SQL and PL/SQL queries and stored procedures for data retrieval
- Prepared utilities for unit testing of the application using JSP and Servlets
- Developed Database applications using SQL and PL/SQL
- Applied design patterns and object-oriented design concepts to improve the existing Java/J2EE-based code base
- Identified and fixed transactional issues due to incorrect exception handling and concurrency issues due to unsynchronized blocks of code
- Resolved product complications at customer sites and relayed the findings to the development and deployment teams, helping them adopt a long-term product development strategy with minimal roadblocks
- Convinced business users and analysts to adopt alternative solutions that were more robust and simpler to implement from a technical perspective while satisfying the functional requirements from a business perspective
- Played a crucial role in developing the persistence layer
- Analyzed, developed, tuned, tested, debugged, and documented processes using SQL, PL/SQL, Informatica, UNIX, and Control-M
- Documented technical specs, class diagrams, and sequence diagrams, and developed technical design documents based on changes; analyzed Portal capabilities and scalability and identified areas where the Portal could be used to enhance usability and improve productivity
Environment: Java, J2EE, JSP, Eclipse, SQL, Windows, PL/SQL, Oracle, Informatica, Unix, Control-M.