Sr. Hadoop Developer Resume
SUMMARY:
- Overall 5+ years of professional IT experience in Big Data using Hadoop, with experience building distributed applications and high-quality software through object-oriented methods, project leadership, and rapid, reliable development.
- Good knowledge of Hadoop architecture and its components such as HDFS, MapReduce, JobTracker, TaskTracker, NameNode, and DataNode.
- Solid background working with DBMS technologies such as Oracle and MySQL and with data warehousing architectures; performed migrations from SQL Server, Oracle, and MySQL databases to Hadoop.
- Extensive knowledge of J2EE technologies such as JSP and JDBC, and of object-oriented programming (OOP) techniques.
- In-depth knowledge of analyzing data using HiveQL, Pig Latin, HBase, and custom MapReduce programs in Java.
- Good understanding of Zookeeper for monitoring and managing Hadoop jobs.
- Extensive knowledge of RDBMSs such as Oracle, Microsoft SQL Server, and MySQL.
- Experienced in migrating ETL transformations to Pig Latin scripts, including transformations and join operations.
- Hands-on experience with different software development approaches such as the Spiral, Waterfall, and Agile iterative models.
- Extended Pig and Hive core functionality by writing custom User Defined Functions for data analysis and file processing, invoked from Pig Latin scripts and HiveQL (see the sketch after this summary).
- Good understanding of NoSQL databases such as HBase, Cassandra, and MongoDB.
- Experience with operating systems: Linux, Red Hat, and UNIX.
- Supported MapReduce programs running on the cluster and wrote custom MapReduce scripts in Java for data processing.
- Experience developing Spark jobs using Scala in a test environment for faster data processing, and used Spark SQL for querying.
- Proficient in configuring Zookeeper, Cassandra, and Flume on an existing Hadoop cluster.
- Expertise in web technologies including HTML, XML, JDBC, JSP, JavaScript, AJAX, and SOAP.
- Extensive experience with IDEs such as Eclipse and NetBeans.
- Extensive experience using MVC architecture, Struts, and Hibernate to develop web applications with Java, JSPs, JavaScript, HTML, jQuery, AJAX, XML, and JSON.
- Excellent Java development skills using J2EE, J2SE, Spring, Servlets, JUnit, MRUnit, JSP, and JDBC.
- Excellent global exposure to various work cultures, with client interaction across diverse teams.
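The UDF work noted above is typically done in Python either as Jython UDFs in Pig or as streaming scripts invoked from HiveQL's TRANSFORM clause. Below is a minimal sketch of the streaming-script variant; the field names, formats, and table names are illustrative assumptions, not details from the projects listed later.

```python
#!/usr/bin/env python
# clean_events.py -- illustrative streaming script for Hive's TRANSFORM clause
# (or a Pig STREAM step). Reads tab-separated rows from stdin, normalizes a
# timestamp, drops malformed records, and writes tab-separated rows to stdout.
import sys
from datetime import datetime

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 3:
        continue                                  # drop malformed records
    user_id, raw_ts, amount = fields
    try:
        ts = datetime.strptime(raw_ts, "%d/%m/%Y %H:%M:%S")
        amount = float(amount)
    except ValueError:
        continue                                  # drop unparseable records
    print("\t".join([user_id, ts.strftime("%Y-%m-%d"), "%.2f" % amount]))
```

Such a script would be invoked from Hive roughly as: ADD FILE clean_events.py; SELECT TRANSFORM(user_id, event_ts, amount) USING 'python clean_events.py' AS (user_id, event_date, amount) FROM raw_events; (table and column names hypothetical).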
TECHNICAL SKILLS:
Programming Languages: Java, J2EE, C, SQL/PL-SQL, Pig Latin, Scala, HTML, XML.
Web Technologies: JDBC, JSP, JavaScript, AJAX, SOAP, HTML, jQuery, CSS, XML.
Hadoop/Big Data: MapReduce, Spark, Spark SQL, PySpark, Pig, Hive, Sqoop, HBase, Flume, Kafka, Cassandra, YARN, Oozie, Zookeeper, Elasticsearch.
RDBMS: MySQL, Oracle, Microsoft SQL Server, PL/SQL.
Cloud: Azure, AWS
NoSQL: MongoDB, HBase, Apache Cassandra.
Web/Application servers: Apache Tomcat, WebLogic, JBoss
Tools: NetBeans, Eclipse, Git, PuTTY.
Operating Systems: Linux, Windows, Ubuntu, Red Hat Linux, UNIX.
Methodologies: Agile, Waterfall.
PROFESSIONAL EXPERIENCE:
Sr. Hadoop Developer
Confidential
Responsibilities:
- Defined Oozie job workflows according to their dependencies.
- Enhanced and optimized product Spark code to aggregate, group, and run data mining tasks using the Spark framework.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and data processing (an illustrative sketch of the aggregation pattern follows this list).
- Gained good experience with the Oozie framework by automating daily import jobs.
- Experienced in using Zookeeper and Oozie Operational Services for coordinating the cluster and scheduling workflows.
- Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
- Worked in the complete Software Development Life Cycle (analysis, design, development, testing, implementation, and support) using Agile methodologies.
- Regularly tuned the performance of existing Hadoop Spark jobs to improve data processing and retrieval.
- Experience with the MapReduce programming model for analyzing data stored in HDFS.
- Queried both Managed and External tables created by Hive using Impala.
- Data processing: processed data using MapReduce and YARN; worked on Kafka as a proof of concept for log processing.
- Developed an Apache Storm, Kafka, and HDFS integration project for real-time data analysis.
- Involved in writing shell and Bash scripts on UNIX for application deployments to the production region.
- Developed various Python scripts to find vulnerabilities in SQL queries through SQL injection testing, permission checks, and performance analysis.
- Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Map-Reduce, Pig, Hive, and Sqoop) as well as system specific jobs (such as Java programs and shell scripts).
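The Spark bullets above describe aggregate-and-group work done in Scala with Spark-SQL. A minimal PySpark sketch of the same pattern is shown below purely for illustration (the production code was Scala); the table and column names are hypothetical.

```python
# Illustrative PySpark sketch of an aggregate-and-group job over a Hive table.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("daily-rollup")
         .enableHiveSupport()
         .getOrCreate())

events = spark.table("staging.sales_events")       # hypothetical source table

# DataFrame API: group by day and product, then aggregate.
daily = (events
         .where(F.col("event_date").isNotNull())
         .groupBy("event_date", "product_id")
         .agg(F.count("*").alias("txn_count"),
              F.sum("amount").alias("total_amount")))

# Equivalent Spark-SQL form, as mentioned in the Spark-SQL bullet above.
events.createOrReplaceTempView("sales_events")
daily_sql = spark.sql("""
    SELECT event_date, product_id,
           COUNT(*)    AS txn_count,
           SUM(amount) AS total_amount
    FROM sales_events
    WHERE event_date IS NOT NULL
    GROUP BY event_date, product_id
""")

daily.write.mode("overwrite").saveAsTable("analytics.daily_sales_rollup")
```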
Sr. Hadoop Developer
Confidential, Washington, DC
Responsibilities:
- Gathered User requirements and designed technical and functional specifications.
- Installed, configured, and maintained Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, Zookeeper, and Sqoop.
- Loaded data from different data sources (Teradata and DB2) into HDFS using Sqoop and loaded it into partitioned Hive tables.
- Created Hive tables and wrote Hive queries for data analysis to meet business requirements; used Sqoop to import and export data from Oracle and MySQL.
- Extended Hive and Pig core functionality with custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs), and User Defined Aggregate Functions (UDAFs), implemented in Python.
- Imported and exported the analysed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Worked on importing and exporting data into HDFS and Hive using Sqoop.
- Used Flume to handle streaming data and loaded the data into the Hadoop cluster.
- Developed and executed Hive queries for de-normalizing the data.
- Developed an Apache Storm, Kafka, and HDFS integration project for real-time data analysis.
- Responsible for executing Hive queries using the Hive command line, the Hue web GUI, and Impala to read, write, and query data in HBase.
- Developed Oozie workflows for daily incremental loads that pull data from Teradata and import it into Hive tables.
- Worked on a POC comparing the processing time of Impala with Apache Hive for batch applications, with a view to adopting the former in the project.
- Worked on a cluster of 130 nodes.
- Designed an Apache Airflow entity-resolution module for data ingestion into Microsoft SQL Server.
- Developed a batch processing pipeline to process data using Python and Airflow, and scheduled Spark jobs using Airflow (a minimal DAG sketch follows this list).
- Involved in writing, testing, and running MapReduce pipelines using Apache Crunch.
- Managed and reviewed Hadoop log files, analysed SQL scripts, and designed the solution for the process using Spark.
- Created reports in Tableau to visualize the data sets produced, and tested native Drill, Impala, and Spark connectors.
- Worked hands-on with NoSQL databases such as MongoDB for a POC on storing images and URIs.
- Designed and implemented a MongoDB data store and an associated RESTful web service.
- Developed various Python scripts to find vulnerabilities in SQL queries through SQL injection testing, permission checks, and performance analysis.
- Handled importing data from different data sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and loaded the transformed data back into HDFS.
- Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
- Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
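The Airflow bullets above mention scheduling Spark batch jobs; a minimal DAG sketch is shown below, assuming an Airflow 2.x BashOperator wrapping spark-submit. The DAG id, schedule, and script path are illustrative, not taken from the project.

```python
# Illustrative Airflow DAG that runs a daily spark-submit batch job.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "data-eng",
    "retries": 2,
    "retry_delay": timedelta(minutes=10),
}

with DAG(
    dag_id="daily_ingest_pipeline",          # hypothetical name
    default_args=default_args,
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:

    run_spark_batch = BashOperator(
        task_id="spark_batch_transform",
        bash_command=(
            "spark-submit --master yarn --deploy-mode cluster "
            "/opt/jobs/batch_transform.py --run-date {{ ds }}"
        ),
    )
```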
Environment: Hadoop, YARN, HBase, Teradata, DB2, NoSQL, Kafka, Python, Zookeeper, Oozie, Tableau, Apache Crunch, Apache Storm, MySQL, SQL Server, jQuery, JavaScript, HTML, AJAX, and CSS.
Hadoop Developer/Admin
Confidential, Boca Raton, FL
Responsibilities:
- Administered, provisioned, patched, and maintained Cloudera Hadoop clusters on Linux.
- Designed and developed applications on the data lake to transform the data according to business users' requirements for analytics.
- Developed shell scripts to perform data quality validations such as record counts, file name consistency, and duplicate file checks, and to create tables and views.
- Created views that mask PHI columns so that unauthorized teams cannot see data in those columns through the views.
- Worked with the Parquet file format to get better storage and performance for publish tables.
- Worked with NoSQL databases such as HBase, creating HBase tables to store the audit data of the RAWZ and APPZ tables.
- Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
- Developed a RESTful API using Scala for tracking open-source projects in GitHub and computing the in-process metrics information for those projects.
- Developed analytical components using Scala, Spark, Apache Mesos, and Spark Streaming.
- Experience using the Docker container system with Kubernetes integration.
- Developed a web application in Java using the Google Web Toolkit API with PostgreSQL.
- Used R to prototype data exploration on a sample to identify the best algorithmic approach, then wrote Scala scripts using Spark's machine learning module.
- Developed MapReduce/Spark Python modules for machine learning & predictive analytics in Hadoop on AWS. Implemented a Python-based distributed random forest via Python streaming.
- Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, Caffe, TensorFlow, MLlib, and Python for a broad variety of machine learning methods, including classification, regression, and dimensionality reduction.
- Excellent understanding of Hadoop architecture and its components, such as HDFS, HBase, Hive, JobTracker, TaskTracker, NameNode, and DataNode, and of the MapReduce programming paradigm.
- Built a Kafka-Spark-Cassandra simulator in Scala for Met stream, a big data consultancy, along with Kafka-Spark-Cassandra prototypes.
- Designed a data analysis pipeline in Python using Amazon Web Services such as S3, EC2, and Elastic MapReduce.
- Implemented applications in Scala with the Akka and Play frameworks.
- Expert in implementing advanced procedures such as text analytics and processing using in-memory computing capabilities like Apache Spark, within a Python- and Scala-based analytic system with ML libraries.
- Worked with NoSQL platforms and have an extensive understanding of relational databases versus NoSQL platforms; created and worked with large data frames with schemas of more than 300 columns.
- Ingested data into Amazon S3 using Sqoop and applied data transformations using Python scripts.
- Created Hive tables and loaded and analyzed data using Hive scripts; implemented partitioning and dynamic partitions in Hive.
- Deployed and analyzed large chunks of data using Hive as well as HBase.
- Worked on querying data using Spark SQL on top of the PySpark engine.
- Used Amazon EMR to run the PySpark jobs in the cloud.
- Created Hive tables to store various data formats of PII data coming from the raw hive tables.
- Developed Sqoop jobs to import/export data from RDBMS to S3 data store.
- Designed and implemented PySpark UDFs for evaluating, filtering, loading, and storing data (see the combined sketch after this list).
- Fine-tuned PySpark applications and jobs to improve the efficiency and overall processing time of the pipelines.
- Knowledge of writing Hive queries and running the scripts in Tez mode to improve performance on the Hortonworks Data Platform.
- Used a microservices architecture, with Spring Boot-based services interacting through REST.
- Built Spring Boot microservices for the delivery of software products across the enterprise.
- Created ALBs, ELBs, and EC2 instances to deploy the applications into the cloud environment.
- Provided service discovery for all microservices using the Spring Cloud Kubernetes project.
- Developed fully functional, responsive modules based on business requirements using Scala with Akka.
- Developed new listeners for producers and consumers for both RabbitMQ and Kafka.
- Used Microservices with Spring Boot interacting through a combination of REST and Apache Kafka message brokers.
- Built servers such as DHCP, PXE with Kickstart, DNS, and NFS, and used them to build infrastructure in a Linux environment; automated build, testing, and integration with Ant, Maven, and JUnit.
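Several bullets above (Hive dynamic partitioning, Spark SQL on the PySpark engine, and PySpark UDFs) describe pieces of the same publish flow; the sketch below combines them under assumed table, column, and function names.

```python
# Illustrative PySpark sketch: a masking UDF plus a dynamic-partitioned
# write of a publish table; all names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = (SparkSession.builder
         .appName("publish-pipeline")
         .enableHiveSupport()
         # Hive dynamic-partition settings (needed when inserting into
         # existing partitioned Hive tables).
         .config("hive.exec.dynamic.partition", "true")
         .config("hive.exec.dynamic.partition.mode", "nonstrict")
         .getOrCreate())

@F.udf(returnType=StringType())
def mask_id(value):
    """Keep only the last four characters of an identifier."""
    if value is None or len(value) < 4:
        return None
    return "XXXX" + value[-4:]

raw = spark.table("rawz.customer_events")            # hypothetical raw table

published = (raw
             .withColumn("customer_id", mask_id(F.col("customer_id")))
             .where(F.col("event_date").isNotNull()))

# One partition per event_date value in the published Hive table.
(published.write
 .mode("overwrite")
 .format("parquet")
 .partitionBy("event_date")
 .saveAsTable("appz.customer_events_published"))
```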
Environment: Apache Hive, HBase, PySpark, Python, Agile, StreamSets, Bitbucket, Cloudera, shell scripting, Amazon EMR, Amazon S3, PyCharm, Jenkins, Scala, Java.
Hadoop Developer
Confidential, Princeton, NJ
Responsibilities:
- Analysed the web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most purchased products on the website (a streaming MapReduce equivalent is sketched after this list).
- Developed optimal strategies for distributing the web log data over the cluster, and imported and exported the stored web log data into HDFS and Hive using Sqoop.
- Hands-on experience loading data from the UNIX file system and Teradata into HDFS.
- Installed, configured, and operated the Apache stack (Hive, HBase, Pig, Sqoop, Zookeeper, Oozie, Flume, and Mahout) on the Hadoop cluster.
- Created a high-level design for the data ingestion and data extraction module, including an enhancement of the Hadoop MapReduce job that joins the incoming slices of data and picks only the fields needed for further processing.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Involved in writing optimized Pig scripts and in developing and testing Pig Latin scripts.
- Tested raw data and executed performance scripts.
- Worked with the NoSQL database HBase to create tables and store data.
- Developed industry-specific UDFs (user-defined functions).
- Used Flume to collect, aggregate, and store web log data from various sources such as web servers and mobile and network devices, and pushed it to HDFS.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig, and Sqoop.
- Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Map-Reduce, Pig, Hive, and Sqoop) as well as system specific jobs (such as Java programs and shell scripts).
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Resolved issues, answered questions, and provided day-to-day support for users and clients on Hadoop and its ecosystem, including the HAWQ database.
- Installed, configured, and operated data integration and analytics tools (Informatica, Chorus, SQLFire, GemFire XD) for business needs.
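The web-log metrics in the first bullet of this section were computed in HiveQL; the same per-day page-view and unique-visitor counts can also be expressed as a Hadoop Streaming mapper/reducer pair in Python, sketched below under an assumed tab-separated log layout of (timestamp, visitor_id, url).

```python
# mapper.py -- emits (day, visitor_id) pairs from tab-separated web logs.
import sys

for line in sys.stdin:
    parts = line.rstrip("\n").split("\t")
    if len(parts) < 3:
        continue                      # skip malformed log lines
    timestamp, visitor_id, url = parts[:3]
    day = timestamp[:10]              # keep only the YYYY-MM-DD prefix
    print("%s\t%s" % (day, visitor_id))
```

```python
# reducer.py -- counts page views and unique visitors per day.
# Hadoop Streaming delivers mapper output sorted by key, so days arrive grouped.
import sys

current_day, visitors, views = None, set(), 0

def emit(day, visitors, views):
    if day is not None:
        print("%s\t%d\t%d" % (day, views, len(visitors)))

for line in sys.stdin:
    day, visitor_id = line.rstrip("\n").split("\t")
    if day != current_day:
        emit(current_day, visitors, views)
        current_day, visitors, views = day, set(), 0
    visitors.add(visitor_id)
    views += 1

emit(current_day, visitors, views)
```

The pair would be submitted with the standard hadoop-streaming jar, passing the two scripts via -mapper and -reducer; paths and options are illustrative.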
Environment: Cloudera, MapReduce, Hive, Solr, Pig, HDFS, HBase, Flume, Sqoop, Zookeeper, Python, flat files, AWS, Teradata, Unix/Linux.
Jr. Java Developer
Confidential
Responsibilities:
- Designed and developed Servlets and Session and Entity Beans to implement business logic, and deployed them on the JBoss Application Server.
- Developed many JSP pages and used JavaScript for client-side validation.
- Used an MVC framework for developing the J2EE-based web application.
- Involved in all the phases of SDLC including Requirements Collection, Design and Analysis of the Customer specifications, Development and Customization of the Application.
- Developed the User Interface Screens for presentation using Ajax, JSP and HTML.
- Used JDBC for data retrieval from the database for various inquiries.
- Gained good experience writing stored procedures and using JDBC for database interaction.
- Developed stored procedures in PL/SQL for Oracle 10g.
- Used Eclipse as the IDE to write and debug application code, and SQL Developer to test and run SQL statements.
- Implemented client-side and server-side data validations using JavaScript.
Environment: Java, Eclipse Galileo, HTML 4.0, JavaScript, SQL, PL/SQL, CSS, JDBC, JBoss 4.0, Servlets 2.0, JSP 1.0, Oracle.