Sr. Hadoop Developer Resume
SUMMARY:
- Overall 5+ years of professional IT experience in Big Data using Hadoop, with experience building distributed applications and high-quality software through object-oriented methods, project leadership, and rapid, reliable development.
- Good knowledge of Hadoop architecture and its components such as HDFS, MapReduce, JobTracker, TaskTracker, NameNode, and DataNode.
- Solid background working with DBMS technologies such as Oracle and MySQL and with data warehousing architectures; performed migrations from SQL Server, Oracle, and MySQL databases to Hadoop.
- Extensive knowledge of J2EE technologies such as JSP and JDBC, and of object-oriented programming (OOP) techniques.
- In-depth knowledge of analyzing data using HiveQL, Pig Latin, HBase, and custom MapReduce programs in Java.
- Good understanding of Zookeeper for monitoring and managing Hadoop jobs.
- Extensive knowledge of RDBMSs such as Oracle, Microsoft SQL Server, and MySQL.
- Experienced in migrating ETL transformations to Pig Latin scripts, including transformations and join operations.
- Hands-on experience with different software development approaches such as the Spiral, Waterfall, and Agile iterative models.
- Extended Pig and Hive core functionality by writing custom User Defined Functions for data analysis and file processing, invoked from Pig Latin scripts and HiveQL (see the sketch after this summary).
- Good understanding of NoSQL databases such as HBase, Cassandra, and MongoDB.
- Experience with operating systems: Linux, Red Hat, and UNIX.
- Supported MapReduce programs running on the cluster and wrote custom MapReduce scripts in Java for data processing.
- Experience developing Spark jobs using Scala in a test environment for faster data processing, and used Spark SQL for querying.
- Proficient in configuring Zookeeper, Cassandra, and Flume on an existing Hadoop cluster.
- Expertise in web technologies including HTML, XML, JDBC, JSP, JavaScript, AJAX, and SOAP.
- Extensive experience with IDEs such as Eclipse and NetBeans.
- Extensive experience using MVC architecture, Struts, and Hibernate to develop web applications with Java, JSPs, JavaScript, HTML, jQuery, AJAX, XML, and JSON.
- Excellent Java development skills using J2EE, J2SE, Spring, Servlets, JUnit, MRUnit, JSP, and JDBC.
- Excellent global exposure to various work cultures, with client interaction across diverse teams.
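The UDF work noted above is typically done in Python either as Jython UDFs in Pig or as streaming scripts invoked from HiveQL's TRANSFORM clause. Below is a minimal sketch of the streaming-script variant; the field names, formats, and table names are illustrative assumptions, not details from the projects listed later.

```python
#!/usr/bin/env python
# clean_events.py -- illustrative streaming script for Hive's TRANSFORM clause
# (or a Pig STREAM step). Reads tab-separated rows from stdin, normalizes a
# timestamp, drops malformed records, and writes tab-separated rows to stdout.
import sys
from datetime import datetime

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 3:
        continue                                  # drop malformed records
    user_id, raw_ts, amount = fields
    try:
        ts = datetime.strptime(raw_ts, "%d/%m/%Y %H:%M:%S")
        amount = float(amount)
    except ValueError:
        continue                                  # drop unparseable records
    print("\t".join([user_id, ts.strftime("%Y-%m-%d"), "%.2f" % amount]))
```

Such a script would be invoked from Hive roughly as: ADD FILE clean_events.py; SELECT TRANSFORM(user_id, event_ts, amount) USING 'python clean_events.py' AS (user_id, event_date, amount) FROM raw_events; (table and column names hypothetical).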
TECHNICAL SKILLS:
Programming Languages: Java, J2EE, C, SQL/PL-SQL, Pig Latin, Scala, HTML, XML.
Web Technologies: JDBC, JSP, JavaScript, AJAX, SOAP, HTML, jQuery, CSS, XML.
Hadoop/Big Data: MapReduce, Spark, Spark SQL, PySpark, Pig, Hive, Sqoop, HBase, Flume, Kafka, Cassandra, YARN, Oozie, Zookeeper, Elasticsearch.
RDBMS: MySQL, Oracle, Microsoft SQL Server, PL/SQL.
Cloud: Azure, AWS
NoSQL: MongoDB, HBase, Apache Cassandra.
Web/Application servers: Apache Tomcat, WebLogic, JBoss
Tools: NetBeans, Eclipse, Git, PuTTY.
Operating Systems: Linux, Windows, Ubuntu, Red Hat Linux, UNIX.
Methodologies: Agile, Waterfall.
PROFESSIONAL EXPERIENCE:
Sr. Hadoop Developer
Confidential
Responsibilities:
- Defined Oozie job workflows according to their dependencies.
- Enhanced and optimized product Spark code to aggregate, group, and run data mining tasks using the Spark framework.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and data processing (an illustrative sketch of the aggregation pattern follows this list).
- Gained good experience with the Oozie framework by automating daily import jobs.
- Experienced in using Zookeeper and Oozie Operational Services for coordinating the cluster and scheduling workflows.
- Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
- Worked in the complete Software Development Life Cycle (analysis, design, development, testing, implementation, and support) using Agile methodologies.
- Regularly tuned the performance of existing Hadoop Spark jobs to improve data processing and retrieval.
- Experience with the MapReduce programming model for analyzing data stored in HDFS.
- Queried both Managed and External tables created by Hive using Impala.
- Data processing: processed data using MapReduce and YARN; worked on Kafka as a proof of concept for log processing.
- Developed an Apache Storm, Kafka, and HDFS integration project for real-time data analysis.
- Involved in writing shell and Bash scripts on UNIX for application deployments to the production region.
- Developed various Python scripts to find vulnerabilities in SQL queries through SQL injection testing, permission checks, and performance analysis.
- Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Map-Reduce, Pig, Hive, and Sqoop) as well as system specific jobs (such as Java programs and shell scripts).
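The Spark bullets above describe aggregate-and-group work done in Scala with Spark-SQL. A minimal PySpark sketch of the same pattern is shown below purely for illustration (the production code was Scala); the table and column names are hypothetical.

```python
# Illustrative PySpark sketch of an aggregate-and-group job over a Hive table.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("daily-rollup")
         .enableHiveSupport()
         .getOrCreate())

events = spark.table("staging.sales_events")       # hypothetical source table

# DataFrame API: group by day and product, then aggregate.
daily = (events
         .where(F.col("event_date").isNotNull())
         .groupBy("event_date", "product_id")
         .agg(F.count("*").alias("txn_count"),
              F.sum("amount").alias("total_amount")))

# Equivalent Spark-SQL form, as mentioned in the Spark-SQL bullet above.
events.createOrReplaceTempView("sales_events")
daily_sql = spark.sql("""
    SELECT event_date, product_id,
           COUNT(*)    AS txn_count,
           SUM(amount) AS total_amount
    FROM sales_events
    WHERE event_date IS NOT NULL
    GROUP BY event_date, product_id
""")

daily.write.mode("overwrite").saveAsTable("analytics.daily_sales_rollup")
```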
Sr. Hadoop Developer
Confidential, Washington, DC
Responsibilities:
- Gathered User requirements and designed technical and functional specifications.
- Installed, configured, and maintained Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, Zookeeper, and Sqoop.
- Loaded data from different data sources (Teradata and DB2) into HDFS using Sqoop and loaded it into partitioned Hive tables.
- Created Hive tables and wrote Hive queries for data analysis to meet business requirements; used Sqoop to import and export data from Oracle and MySQL.
- Extended Hive and Pig core functionality with custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs), and User Defined Aggregate Functions (UDAFs), implemented in Python.
- Imported and exported the analysed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Worked on importing and exporting data into HDFS and Hive using Sqoop.
- Used Flume to handle streaming data and loaded the data into the Hadoop cluster.
- Developed and executed Hive queries for de-normalizing the data.
- Developed an Apache Storm, Kafka, and HDFS integration project for real-time data analysis.
- Responsible for executing Hive queries using the Hive command line, the Hue web GUI, and Impala to read, write, and query data in HBase.
- Developed Oozie workflows for daily incremental loads that pull data from Teradata and import it into Hive tables.
- Worked on a POC comparing the processing time of Impala with Apache Hive for batch applications, with a view to adopting the former in the project.
- Worked on a cluster of 130 nodes.
- Designed an Apache Airflow entity-resolution module for data ingestion into Microsoft SQL Server.
- Developed a batch processing pipeline to process data using Python and Airflow, and scheduled Spark jobs using Airflow (a minimal DAG sketch follows this list).
- Involved in writing, testing, and running MapReduce pipelines using Apache Crunch.
- Managed and reviewed Hadoop log files, analysed SQL scripts, and designed the solution for the process using Spark.
- Created reports in Tableau to visualize the data sets produced, and tested native Drill, Impala, and Spark connectors.
- Worked hands-on with NoSQL databases such as MongoDB for a POC on storing images and URIs.
- Designed and implemented a MongoDB data store and an associated RESTful web service.
- Developed various Python scripts to find vulnerabilities in SQL queries through SQL injection testing, permission checks, and performance analysis.
- Handled importing data from different data sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and loaded the transformed data back into HDFS.
- Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
- Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
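The Airflow bullets above mention scheduling Spark batch jobs; a minimal DAG sketch is shown below, assuming an Airflow 2.x BashOperator wrapping spark-submit. The DAG id, schedule, and script path are illustrative, not taken from the project.

```python
# Illustrative Airflow DAG that runs a daily spark-submit batch job.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "data-eng",
    "retries": 2,
    "retry_delay": timedelta(minutes=10),
}

with DAG(
    dag_id="daily_ingest_pipeline",          # hypothetical name
    default_args=default_args,
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:

    run_spark_batch = BashOperator(
        task_id="spark_batch_transform",
        bash_command=(
            "spark-submit --master yarn --deploy-mode cluster "
            "/opt/jobs/batch_transform.py --run-date {{ ds }}"
        ),
    )
```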
Environment: Hadoop, YARN, HBase, Teradata, DB2, NoSQL, Kafka, Python, Zookeeper, Oozie, Tableau, Apache Crunch, Apache Storm, MySQL, SQL Server, jQuery, JavaScript, HTML, AJAX, and CSS.
Hadoop Developer/Admin
Confidential, Boca Raton, FL
Responsibilities:
- Administered, provisioned, patched, and maintained Cloudera Hadoop clusters on Linux.
- Designed and developed applications on the data lake to transform the data according to business users' requirements for analytics.
- Developed shell scripts to perform data quality validations such as record counts, file name consistency, and duplicate file checks, and to create tables and views.
- Created views that mask PHI columns so that unauthorized teams cannot see data in those columns through the views.
- Worked with the Parquet file format to get better storage and performance for publish tables.
- Worked with NoSQL databases such as HBase, creating HBase tables to store the audit data of the RAWZ and APPZ tables.
- Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
- Developed a RESTful API using Scala for tracking open-source projects in GitHub and computing the in-process metrics information for those projects.
- Developed analytical components using Scala, Spark, Apache Mesos, and Spark Streaming.
- Experience using the Docker container system with Kubernetes integration.
- Developed a web application in Java using the Google Web Toolkit API with PostgreSQL.
- Used R to prototype data exploration on a sample to identify the best algorithmic approach, then wrote Scala scripts using Spark's machine learning module.
- Developed MapReduce/Spark Python modules for machine learning & predictive analytics in Hadoop on AWS. Implemented a Python-based distributed random forest via Python streaming.
- Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, Caffe, TensorFlow, MLlib, and Python for a broad variety of machine learning methods, including classification, regression, and dimensionality reduction.
- Excellent understanding of Hadoop architecture and its components, such as HDFS, HBase, Hive, JobTracker, TaskTracker, NameNode, and DataNode, and of the MapReduce programming paradigm.
- Built a Kafka-Spark-Cassandra simulator in Scala for Met stream, a big data consultancy, along with Kafka-Spark-Cassandra prototypes.
- Designed a data analysis pipeline in Python using Amazon Web Services such as S3, EC2, and Elastic MapReduce.
- Implemented applications in Scala with the Akka and Play frameworks.
- Expert in implementing advanced procedures such as text analytics and processing using in-memory computing capabilities like Apache Spark, within a Python- and Scala-based analytic system with ML libraries.
- Worked with NoSQL platforms and have an extensive understanding of relational databases versus NoSQL platforms; created and worked with large data frames with schemas of more than 300 columns.
- Ingested data into Amazon S3 using Sqoop and applied data transformations using Python scripts.
- Created Hive tables and loaded and analyzed data using Hive scripts; implemented partitioning and dynamic partitions in Hive.
- Deployed and analyzed large chunks of data using Hive as well as HBase.
- Worked on querying data using Spark SQL on top of the PySpark engine.
- Used Amazon EMR to run the PySpark jobs in the cloud.
- Created Hive tables to store various data formats of PII data coming from the raw hive tables.
- Developed Sqoop jobs to import/export data from RDBMS to S3 data store.
- Designed and implemented PySpark UDFs for evaluating, filtering, loading, and storing data (see the combined sketch after this list).
- Fine-tuned PySpark applications and jobs to improve the efficiency and overall processing time of the pipelines.
- Knowledge of writing Hive queries and running the scripts in Tez mode to improve performance on the Hortonworks Data Platform.
- Used a microservices architecture, with Spring Boot-based services interacting through REST.
- Built Spring Boot microservices for the delivery of software products across the enterprise.
- Created ALBs, ELBs, and EC2 instances to deploy the applications into the cloud environment.
- Provided service discovery for all microservices using the Spring Cloud Kubernetes project.
- Developed fully functional, responsive modules based on business requirements using Scala with Akka.
- Developed new listeners for producers and consumers for both RabbitMQ and Kafka.
- Used Microservices with Spring Boot interacting through a combination of REST and Apache Kafka message brokers.
- Built servers such as DHCP, PXE with Kickstart, DNS, and NFS, and used them to build infrastructure in a Linux environment; automated build, testing, and integration with Ant, Maven, and JUnit.
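Several bullets above (Hive dynamic partitioning, Spark SQL on the PySpark engine, and PySpark UDFs) describe pieces of the same publish flow; the sketch below combines them under assumed table, column, and function names.

```python
# Illustrative PySpark sketch: a masking UDF plus a dynamic-partitioned
# write of a publish table; all names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = (SparkSession.builder
         .appName("publish-pipeline")
         .enableHiveSupport()
         # Hive dynamic-partition settings (needed when inserting into
         # existing partitioned Hive tables).
         .config("hive.exec.dynamic.partition", "true")
         .config("hive.exec.dynamic.partition.mode", "nonstrict")
         .getOrCreate())

@F.udf(returnType=StringType())
def mask_id(value):
    """Keep only the last four characters of an identifier."""
    if value is None or len(value) < 4:
        return None
    return "XXXX" + value[-4:]

raw = spark.table("rawz.customer_events")            # hypothetical raw table

published = (raw
             .withColumn("customer_id", mask_id(F.col("customer_id")))
             .where(F.col("event_date").isNotNull()))

# One partition per event_date value in the published Hive table.
(published.write
 .mode("overwrite")
 .format("parquet")
 .partitionBy("event_date")
 .saveAsTable("appz.customer_events_published"))
```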
Environment: Apache Hive, HBase, PySpark, Python, Agile, StreamSets, Bitbucket, Cloudera, shell scripting, Amazon EMR, Amazon S3, PyCharm, Jenkins, Scala, Java.
Hadoop Developer
Confidential, Princeton, NJ
Responsibilities:
- Analysed the web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most purchased products on the website (a streaming MapReduce equivalent is sketched after this list).
- Developed optimal strategies for distributing the web log data over the cluster, and imported and exported the stored web log data into HDFS and Hive using Sqoop.
- Hands-on experience loading data from the UNIX file system and Teradata into HDFS.
- Installed, configured, and operated the Apache stack (Hive, HBase, Pig, Sqoop, Zookeeper, Oozie, Flume, and Mahout) on the Hadoop cluster.
- Created a high-level design for the data ingestion and data extraction module, including an enhancement of the Hadoop MapReduce job that joins the incoming slices of data and picks only the fields needed for further processing.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Involved in writing optimized Pig scripts and in developing and testing Pig Latin scripts.
- Tested raw data and executed performance scripts.
- Worked with the NoSQL database HBase to create tables and store data.
- Developed industry-specific UDFs (user-defined functions).
- Used Flume to collect, aggregate, and store web log data from various sources such as web servers and mobile and network devices, and pushed it to HDFS.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig, and Sqoop.
- Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Map-Reduce, Pig, Hive, and Sqoop) as well as system specific jobs (such as Java programs and shell scripts).
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Resolved issues, answered questions, and provided day-to-day support for users and clients on Hadoop and its ecosystem, including the HAWQ database.
- Installed, configured, and operated data integration and analytics tools (Informatica, Chorus, SQLFire, GemFire XD) for business needs.
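The web-log metrics in the first bullet of this section were computed in HiveQL; the same per-day page-view and unique-visitor counts can also be expressed as a Hadoop Streaming mapper/reducer pair in Python, sketched below under an assumed tab-separated log layout of (timestamp, visitor_id, url).

```python
# mapper.py -- emits (day, visitor_id) pairs from tab-separated web logs.
import sys

for line in sys.stdin:
    parts = line.rstrip("\n").split("\t")
    if len(parts) < 3:
        continue                      # skip malformed log lines
    timestamp, visitor_id, url = parts[:3]
    day = timestamp[:10]              # keep only the YYYY-MM-DD prefix
    print("%s\t%s" % (day, visitor_id))
```

```python
# reducer.py -- counts page views and unique visitors per day.
# Hadoop Streaming delivers mapper output sorted by key, so days arrive grouped.
import sys

current_day, visitors, views = None, set(), 0

def emit(day, visitors, views):
    if day is not None:
        print("%s\t%d\t%d" % (day, views, len(visitors)))

for line in sys.stdin:
    day, visitor_id = line.rstrip("\n").split("\t")
    if day != current_day:
        emit(current_day, visitors, views)
        current_day, visitors, views = day, set(), 0
    visitors.add(visitor_id)
    views += 1

emit(current_day, visitors, views)
```

The pair would be submitted with the standard hadoop-streaming jar, passing the two scripts via -mapper and -reducer; paths and options are illustrative.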
Environment: Cloudera, MapReduce, Hive, Solr, Pig, HDFS, HBase, Flume, Sqoop, Zookeeper, Python, flat files, AWS, Teradata, Unix/Linux.
Jr. Java Developer
Confidential
Responsibilities:
- Designed and developed Servlets and Session and Entity Beans to implement business logic, and deployed them on the JBoss Application Server.
- Developed many JSP pages and used JavaScript for client-side validation.
- Used an MVC framework for developing the J2EE-based web application.
- Involved in all the phases of SDLC including Requirements Collection, Design and Analysis of the Customer specifications, Development and Customization of the Application.
- Developed the User Interface Screens for presentation using Ajax, JSP and HTML.
- Used JDBC for data retrieval from the database for various inquiries.
- Gained good experience writing stored procedures and using JDBC for database interaction.
- Developed stored procedures in PL/SQL for Oracle 10g.
- Used Eclipse as the IDE to write and debug application code, and SQL Developer to test and run SQL statements.
- Implemented client-side and server-side data validations using JavaScript.
Environment: Java, Eclipse Galileo, HTML 4.0, JavaScript, SQL, PL/SQL, CSS, JDBC, JBoss 4.0, Servlets 2.0, JSP 1.0, Oracle.