Hadoop Spark Developer Resume
Las Vegas
SUMMARY
- 6+ years of IT experience, including 3+ years in large-scale distributed data processing for big data analytics using Hadoop ecosystem components such as MapReduce, HDFS, HBase, Hive, Sqoop, Pig, Oozie, Zookeeper, Flume, Kafka, and Spark with Scala on CDH 4/5 distributions, plus cloud computing with Microsoft Azure.
- Experienced in developing a framework for common, consistent design of batch and real-time streaming pipelines covering data ingestion, data processing, predictive analysis, and delivery of massive datasets.
- Strong experience in data ingestion, storage, and processing using Hadoop ecosystem tools such as Sqoop, Hive, Pig, Spark, MapReduce, Spark Streaming, Flume, Kafka, HBase, Oozie, Zookeeper, and HDFS.
- Good experience in Java, Scala, Python, and Unix shell scripting, including writing UDFs, as well as SQL, JSON, and XML.
- Excellent experience importing and exporting data between RDBMS and HDFS using Sqoop.
- Partnered with data scientists to plan and execute predictive analytics, machine learning, and deep learning initiatives; proficient with Spark MLlib.
- Experience with multiple distributions: Hortonworks and Cloudera platforms.
- Expertise in in-memory, high-speed cluster computing with Spark, and in cloud computing.
- Implemented partitioning, dynamic partitioning, and bucketing in Hive to compute data metrics.
- Experience in data modeling and predictive analytics, and in developing best practices for data integration, data analytics, and operational solutions.
- Extensively worked with SQL Server, HBase, and MySQL.
- Excellent understanding of and hands-on experience with NoSQL databases such as MongoDB and HBase.
- Strong experience working in a global delivery (onsite-offshore) model involving multiple vendors and cross-functional engineering teams.
- Experienced in J2EE, JDBC, Servlets, Struts, Hibernate, Ajax, JavaScript, jQuery, CSS, XML, and HTML.
- Experienced with IDEs such as Eclipse and Visual Studio, and with DBMSs such as SQL Server and MySQL.
- Good domain knowledge of gaming and telecom.
- Excellent oral and written communication skills; a strong team player.
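The Hive partitioning and bucketing work mentioned above typically takes the following HiveQL form; the table, columns, and bucket count here are illustrative, not from an actual project:

```sql
-- Illustrative Hive table: partitioned by load date, bucketed by player id.
CREATE TABLE player_activity (
  player_id    BIGINT,
  game_code    STRING,
  wager_amount DOUBLE
)
PARTITIONED BY (load_date STRING)
CLUSTERED BY (player_id) INTO 32 BUCKETS
STORED AS ORC;

-- Dynamic-partition insert: Hive routes each row to its load_date partition.
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE player_activity PARTITION (load_date)
SELECT player_id, game_code, wager_amount, load_date
FROM staging_player_activity;
```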
TECHNICAL SKILLS
Big Data: Hive, Pig, Sqoop, Oozie, HBase, Zookeeper, YARN, Kafka, Spark, Scala, Flume
Data Science: Spark MLlib
Database: SQL Server, MySQL
NoSQL Database: HBase
Cloud Services: Microsoft Azure
IDE/Build Tools: Eclipse, IntelliJ, Maven, SBT
Languages: SQL, PL/SQL, Shell Scripting, Java/J2EE, Python, PySpark, Scala, JSON, XML
Version Control: Git, Perforce
Platform: Linux/Unix, Windows
Agile Tools: JIRA
PROFESSIONAL EXPERIENCE
Confidential, Las Vegas
Hadoop Spark Developer
Responsibilities:
- Worked extensively with Sqoop to import and export data from SQL Server.
- Implemented preprocessing steps using Spark DataFrames for batch processing.
- Analyzed and fixed customer data issues.
- Built summary tables and implemented call-prediction and player gaming summary models with K-Means clustering in production using Spark MLlib and Scala.
- Worked with data science partners on predictive analysis; implemented a bonus recommendation engine using Spark MLlib and persisted the recommendation results in HBase.
- Provided bug fixing and QA support.
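The clustering work above ran on Spark MLlib with Scala; as a self-contained sketch of the K-Means step itself, here is a plain-Python stand-in (the feature values, cluster count, and starting centroids are hypothetical toy data, not the production pipeline):

```python
# Minimal K-Means (Lloyd's algorithm) on toy 2-D "player summary" features.
# Plain-Python stand-in for Spark MLlib's KMeans; data and k are hypothetical.

def kmeans(points, centroids, iters=10):
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            i = min(range(len(centroids)),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters

points = [(1.0, 1.0), (1.2, 0.8), (8.0, 8.0), (8.2, 7.9)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (9.0, 9.0)])
```

In Spark MLlib the assignment and update steps are distributed across the cluster; this sketch only shows the math being iterated.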
Confidential
Hadoop Spark Developer
Responsibilities:
- Worked extensively with Sqoop to import and export data from SQL Server.
- Implemented preprocessing steps using Spark DataFrames for batch processing.
- Built summary tables and implemented call-prediction and player gaming summary models with K-Means clustering in production using Spark MLlib and Scala.
- Worked with data science partners on predictive analysis; implemented a bonus recommendation engine using Spark MLlib and persisted the recommendation results in HBase.
- Used Spark Streaming APIs to perform transformations and actions on the fly, building a common learner data model that consumes data from Kafka in near real time and persists it into HBase.
- Developed a data ingestion framework, with error handling, to acquire data from SQL Server.
- Partnered with data scientists to perform data analysis, build summary datasets, and identify call-input predictors and machine learning algorithms using RStudio.
- Developed real-time ingestion of system and free-form remarks/messages using Kafka and Spark Streaming so that events appear in the customer's activity timeline view in real time.
- Coordinated with the Hadoop admin on cluster job performance and security issues, and with the Hortonworks team to resolve HDP, Hive, Spark, and Oozie compatibility and version issues.
- Automated the ingestion and prediction process using Oozie workflows and coordinator jobs, and supported running jobs on the cluster.
Environment: HDP, HDFS, Sqoop, Spark, Scala, Kafka, JDK 1.8, Maven, Eclipse, Tableau, SQL Server, Linux, Perforce, Oozie, and Zookeeper.
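The Kafka-to-HBase streaming path above hinges on a per-record transformation; below is a minimal plain-Python stand-in for that step (the message fields, timestamp handling, and row-key layout are illustrative assumptions, not the actual schema):

```python
import json
from datetime import datetime, timezone

# Stand-in for the per-record transform in the Spark Streaming job: parse a
# Kafka message (JSON) into an HBase-style (row_key, columns) pair.
# Field names and the row-key layout are hypothetical.

def to_hbase_put(raw_message):
    event = json.loads(raw_message)
    ts = datetime.fromtimestamp(event["ts"], tz=timezone.utc)
    # Row key: customer id plus a sortable timestamp, so one customer's
    # activity timeline scans as a contiguous range.
    row_key = f'{event["customer_id"]}#{ts.strftime("%Y%m%d%H%M%S")}'
    columns = {
        "activity:type": event["type"],
        "activity:remark": event.get("remark", ""),
    }
    return row_key, columns

row_key, columns = to_hbase_put(
    '{"customer_id": "C42", "ts": 1700000000, "type": "free_form", "remark": "VIP follow-up"}'
)
```

In the real job this function would run inside a Spark Streaming `foreachRDD`/`map` stage, with the resulting puts written through the HBase client.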
Confidential
Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond to warning or failure conditions.
- Managed and scheduled jobs on the Hadoop cluster.
- Deployed Hadoop clusters in different modes: standalone, pseudo-distributed, and fully distributed.
- Replaced Hive's default Derby metastore with MySQL.
- Executed Hive queries and developed MapReduce jobs to analyze data.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Implemented best practices to create Hive tables with appropriate partitioning methods and keep data processing consistent with enterprise standards.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Installed and maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase and Sqoop.
- Developed Pig UDFs to pre-process data for analysis.
- Developed Hive queries for analysts and for data analysis to meet business requirements.
- Developed Oozie workflows to automate loading data into HDFS and pre-processing it with Pig.
- Implemented the Fair Scheduler on the JobTracker to share cluster resources among users' MapReduce jobs.
- Took part in monitoring, troubleshooting and managing Hadoop log files.
Environment: CDH, HDFS, Sqoop, Hive, Pig, HBase, Java, Maven, Eclipse, MySQL, SQL Server, Linux, Oozie, and Zookeeper.
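Oozie automation of the kind described above (HDFS load followed by Pig pre-processing) is typically wired up in a workflow.xml along these lines; the action names, paths, and parameters are illustrative, not from an actual deployment:

```xml
<!-- Illustrative workflow.xml: HDFS staging step, then a Pig action. -->
<workflow-app name="daily-ingest" xmlns="uri:oozie:workflow:0.5">
  <start to="load-to-hdfs"/>
  <action name="load-to-hdfs">
    <fs>
      <mkdir path="${nameNode}/data/incoming/${wf:id()}"/>
    </fs>
    <ok to="preprocess"/>
    <error to="fail"/>
  </action>
  <action name="preprocess">
    <pig>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>preprocess.pig</script>
      <param>INPUT=/data/incoming</param>
    </pig>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Ingest failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

A coordinator definition would then trigger this workflow on a schedule (e.g. daily) and pass in the dated input path.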
Confidential
Java Developer
Responsibilities:
- Involved in the Design, Coding, Testing and Implementation of the web application.
- Developed JSPs (Java Server Pages) from HTML mockups and detailed technical design specification documents; pages included HTML, CSS, JavaScript, JSTL, and Hibernate integration.
- Developed SOAP based requests for communicating with Web Services.
- Used agile methodologies to provide quick, feasible solutions to the organization.
- Implemented HTTP modules for different applications in the Struts framework using Servlets, JSPs, ActionForms, Action classes, and ActionMappings.
- Developed web applications using the MVC pattern with Spring, Struts, and Hibernate.
- Analyzed and fixed defects in the Login application.
- Involved in configuring and deploying the application on the JBoss application server.
- Implemented dynamic, on-demand creation of error elements when an error occurs.
- Ensured design consistency with client’s development standards and guidelines.