Hadoop Developer Resume
WA
SUMMARY
- Overall 8+ years of software development experience across all phases of the SDLC, including 5+ years with the Hadoop ecosystem and 2+ years developing multi-tier web applications on Java and J2EE.
- Experienced in building Big Data solutions on Hadoop across multiple distributions (Cloudera, Hortonworks) and on NoSQL platforms (HBase).
- Ingested data from sources such as Amazon Web Services (S3, SQS, SNS, RDS), Salesforce, Oracle, MySQL, and Connect Enterprise into HDFS and HBase using Hortonworks DataFlow (HDF) and Kafka connectors.
- Implemented complex real-time streaming and batch applications using Hadoop, Spark, NiFi, Kafka, RabbitMQ, Teradata, and REST microservices, and developed Business Intelligence (ETL) applications serving finance reporting needs on AWS cloud services.
- Experienced in implementing Cloudera Impala and Hortonworks Phoenix for faster querying.
- Hands-on experience with Amazon AWS services such as EMR and EC2, which provide fast and efficient processing of Big Data.
- Experienced in migrating existing MapReduce programs to PySpark models.
- Experienced in designing and implementing Spark programs using Python.
- Hands-on experience with Hive-HBase integration and HBase-Phoenix integration.
- Developed shell and Python scripts to meet business requirements.
- Worked with data in JSON, XML, and CSV formats and loaded it into Hive tables in ORC format with Snappy compression.
- Experience in Hive performance tuning using bucketing and partitioning (see the sketch following this summary).
- Experience using IDEs such as Eclipse and NetBeans for debugging and following Java coding standards.
- Hands-on experience with databases including Teradata, Netezza, Oracle, MS SQL Server, MySQL, and DB2; also developed stored procedures and queries using PL/SQL.
- Knowledge of statistical data mining/data science tools such as SAS, Weka, RStudio, and RapidMiner.
- Working knowledge of Waterfall and Agile software development models and of project planning using Microsoft Project and JIRA.
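A hedged PySpark sketch of the partitioning/bucketing and ORC + Snappy layout mentioned above; the database, table, column, and path names are hypothetical placeholders, and a pure Hive-side equivalent would use PARTITIONED BY / CLUSTERED BY ... STORED AS ORC DDL instead.

```python
# Minimal sketch, assuming hypothetical payments data landed as CSV on HDFS.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive_layout_sketch")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("CREATE DATABASE IF NOT EXISTS finance")

payments = spark.read.csv("hdfs:///landing/payments/*.csv",
                          header=True, inferSchema=True)

(payments.write
         .format("orc")
         .option("compression", "snappy")   # Snappy-compressed ORC files
         .partitionBy("txn_date")           # partition pruning for date filters
         .bucketBy(32, "account_id")        # bucketing to help joins on account_id
         .sortBy("account_id")
         .mode("overwrite")
         .saveAsTable("finance.payments_orc"))
```

Partitioning by date limits scans to the relevant partitions, while bucketing on a join key helps joins and sampling.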
TECHNICAL SKILLS
Hadoop/Big Data Technologies: Hadoop (Hortonworks, Cloudera), HDFS, MapReduce, Pig, HBase, Kafka, Spark, ZooKeeper, Hive, Phoenix, Oozie, Sqoop, Flume, Storm, Impala, Solr, Tableau, NiFi
Programming Languages: Python, SQL, R, PL/SQL, Java (JDK 1.4/1.5/1.6), C/C++, HTML, AVS & JVS
Client Technologies: jQuery, JavaScript, AJAX, CSS, HTML5, XHTML
Operating Systems: UNIX, Linux, Windows
Application Servers: IBM WebSphere, Tomcat, WebLogic
Databases: Teradata, Oracle 8i/9i/10g, Netezza, MySQL 4.x/5.x
Java IDEs: Eclipse 3.x, IBM WebSphere Application Developer, IBM RAD 7.0
Development Tools: TOAD, SQL Developer, Jenkins, Nexus, Maven
PROFESSIONAL EXPERIENCE
Hadoop Developer
Confidential, WA
Responsibilities:
- Worked in the Finance and Payments domain.
- Developed PySpark applications to migrate historical and incremental data from file-based, RDBMS, and streaming sources into Hadoop.
- Used Oozie to automate data loading into AWS S3 buckets and Pig to pre-process the data.
- Converted existing MapReduce jobs to Spark workflows using Spark SQL, Hive, and Python.
- Teamed up with solution architects to build and automate PySpark workflows performing ETL on large volumes of data.
- Used PySpark to load JSON data and created schema RDDs to handle structured data.
- Developed PySpark applications to ingest JSON documents from MongoDB, transform them, and load them into Hive tables in ORC format (see the sketch at the end of this role).
- Designed and developed pipelines integrating DEEP.io, RabbitMQ, NiFi, and AWS to process event-based streams.
- Developed NiFi workflows that parse log messages using built-in and custom processors and ingest them into various Kafka topics.
- Focused on improving the performance and optimization of the Spark workflows.
- Developed REST APIs using Spring Boot to stream events in real time from external systems to Hadoop.
- Automated Spark jobs by scheduling them with the Control-M tool.
- Experienced in resolving Spark and YARN resource management issues, including shuffle problems, out-of-memory and heap-space errors, null pointer exceptions, and schema incompatibilities.
- Implemented Audit Balance Control functionality to keep data quality in check.
- Improved performance of existing batch jobs built with Pig, Oozie, Hive, and shell scripts while fixing bugs and adding new functionality.
- Successfully delivered projects working in two-week Agile sprints.
- Adopted a DevOps model under SOX compliance, developing in the development environment and deploying to production, thereby owning the jobs and checking for failure notifications daily.
- Worked in a fully developed Agile model, improving performance sprint by sprint through sprint planning, review, retrospective, and grooming sessions.
Environment: AWS, Hadoop, MapReduce, HDFS, Ambari, Hive, Phoenix, Pig, Sqoop, Oozie, Spark, Python, Solr, SQL, Java (JDK 1.6), Git, Jenkins, Nexus, and Control-M.
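A brief, hedged sketch of the MongoDB/JSON-to-Hive ingestion pattern described in this role; the documents are assumed to have been exported from MongoDB onto HDFS as JSON, and the paths, field names, and target table are illustrative assumptions rather than actual project values.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_date

spark = (SparkSession.builder
         .appName("mongo_json_to_hive")
         .enableHiveSupport()
         .getOrCreate())

# Read JSON documents landed on HDFS (hypothetical path)
docs = spark.read.json("hdfs:///landing/mongo/claims/*.json")

# Flatten the nested document into the columns the Hive table expects
claims = (docs.select(
              col("_id").alias("claim_id"),
              col("customer.id").alias("customer_id"),
              col("amount").cast("double"),
              to_date(col("created_at")).alias("created_dt"))
          .where(col("claim_id").isNotNull()))

# Append into an ORC-backed Hive table registered in the metastore
(claims.write
       .mode("append")
       .format("orc")
       .option("compression", "snappy")
       .saveAsTable("finance.claims_orc"))
```

Writing through saveAsTable registers the table in the Hive metastore, so downstream Hive queries can use it without separate DDL.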
Spark Developer
Confidential, NC
Responsibilities:
- Teamed up with architects to design a Spark model for the existing MapReduce model.
- Performed source data transformations using Hive.
- Migrated existing MapReduce programs to Spark models using Python (see the sketch at the end of this role).
- Used RDDs to perform transformations on datasets as well as actions such as count, reduce, and first.
- Checkpointed RDDs to disk at various stages to handle job failures and aid debugging.
- Reduced model run times through performance tuning, enabling the business to run the model hundreds of times a day.
- Good knowledge of Spark tuning parameters such as executor memory, cores, and number of executors.
- Used the Spark DataFrame API on the Cloudera platform to perform analytics on Hive data.
- Used Spark Streaming with the Akka actor system so that actors could act as receivers for the incoming stream.
- Designed and built signup and login pages using HTML and JavaScript.
- Set up a multi-node cluster on Amazon EC2 instances using AWS.
- Fixed various production issues during user acceptance testing.
- Implemented Cloudera Impala on top of Hive for faster user queries.
- Used SVN for version control and Nexus as the artifact repository for deployments.
- Created Autosys jobs to automate workflows.
- Streamlined log file processing using Apache Storm.
- Designed Tableau BI reports for business users.
Environment: Hadoop, MapReduce, HDFS, Hive, Impala, Sqoop, Oozie, Spark, HQL, Java (JDK 1.6), Tableau, Eclipse, SVN, Nexus, Autosys, Scala.
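A compact, hedged illustration of the MapReduce-to-Spark migration pattern used in this role, shown here as a word-count-style RDD job with checkpointing; the input path and checkpoint directory are hypothetical, and the real models were more involved.

```python
from pyspark import SparkContext

sc = SparkContext(appName="mr_to_spark_migration")
sc.setCheckpointDir("hdfs:///tmp/checkpoints")   # hypothetical checkpoint location

lines = sc.textFile("hdfs:///data/input/*.txt")  # hypothetical input path

# Transformations: the map/reduce stages of the old MR job expressed as an RDD pipeline
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))

counts.checkpoint()     # persist the lineage to disk to ease recovery and debugging

print(counts.count())   # action: number of distinct words
print(counts.first())   # action: one (word, count) pair
```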
Hadoop Developer
Confidential, WA
Responsibilities:
- Teamed up with Data Architects and Source System Analysts to map the data from various sources to target attributes.
- Knowledge of integrating various data sources into a Hadoop data lake in a communications model.
- Imported structured data from file systems using a file-copy component and from RDBMS sources through Apache Sqoop.
- Ingested streaming data into HBase using Apache Storm and Apache Kafka and designed Hive tables on top.
- Implemented a pre-preparation layer using data profiling as specified in the source-to-target mapping document.
- Handled various special character issues during ingestion using Apache Sqoop and Pig cleansing techniques.
- Wrote workflows that include data-cleansing Pig actions and Hive actions.
- Developed Java UDFs for date conversions and for generating MD5 checksum values.
- Created Phoenix tables and Phoenix queries on top of HBase tables to boost query performance.
- Implemented an Apache Spark (Python) data processing project to handle data from RDBMS and streaming sources.
- Designed batch processing jobs in Apache Spark using Python, achieving roughly ten-fold speedups over the equivalent MapReduce jobs.
- Designed a custom Spark REPL application to handle similar datasets.
- Developed Spark SQL jobs to load tables into HDFS and run select queries on top.
- Used Spark Streaming to divide streaming data into micro-batches as input to the Spark engine for batch processing (see the sketch at the end of this role).
- Reduced code redundancy by identifying and parameterizing frequently used values in the shell scripts.
- Knowledge of implementing various Slowly Changing Dimension (SCD) requirements.
- Optimized Pig joins to increase performance and debugged the scripts to avoid cross joins.
- Scheduled jobs with Apache Oozie to run during off-peak hours for cluster load management.
- Prepared and scheduled dispatch jobs, including Teradata BTEQ scripts, to load processed data into Teradata.
- Used the Resource Manager to monitor job status and debug mapper/reducer failures.
- Mentored the EQM team in creating Hive queries to test use cases.
- Worked with the application support team to ensure correct deployment of code components and to schedule them in TIDAL/Control-M.
Environment: Hadoop, MapReduce, HDFS, Ambari, Hive, Phoenix, Pig, Sqoop, Oozie, Spark, Impala, Python, Solr, SQL, Java (JDK 1.6), Tableau, Eclipse, Jenkins, Nexus, TIDAL, and Control-M.
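A minimal, hedged sketch of dividing a live stream into micro-batches with Spark Streaming, as referenced above; the socket source, host/port, and 10-second interval are placeholder assumptions (the project's actual feeds came through Kafka/Storm).

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="stream_batching")
ssc = StreamingContext(sc, batchDuration=10)   # 10-second micro-batches

# Placeholder source; a Kafka receiver would be configured here in practice
lines = ssc.socketTextStream("localhost", 9999)

counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
counts.pprint()          # print each micro-batch's results

ssc.start()
ssc.awaitTermination()
```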
Hadoop Developer
Confidential, Atlanta, GA
Responsibilities:
- Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources.
- Set up a multi-node cluster on Amazon EC2 instances using AWS.
- Knowledge of real-time data processing using Apache Kafka and Storm.
- Implemented Hive generic UDFs to encapsulate business logic.
- Created Hive tables on top of HDFS files and designed queries to run on top.
- Worked on NoSQL databases such as HBase and created Hive tables on top.
- Designed Pig UDFs to pre-process data for analysis (see the sketch at the end of this role).
- Exported the cleansed data to relational databases using Apache Sqoop for the BI team to generate reports and visualize results.
- Coordinated with end users on the design and implementation of user-based recommendation analytics in R, per project proposals.
- Extended Hive and Pig core functionality by designing custom UDFs.
- Used Impala for faster querying purposes.
- Implemented test scripts to support test driven development and continuous integration.
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, Spark, Oozie, GemFire XD, SQL, Java (JDK 1.6), Eclipse, Dashboard, R, RStudio, BI.
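The Pig UDFs in this role were written in Java; as a compact, hedged illustration of the same pre-processing idea, here is an equivalent Pig UDF written in Python (Jython), with hypothetical function, field, and file names.

```python
# cleanse_udf.py -- illustrative Pig UDF (Python/Jython) for record pre-processing.
# Names are hypothetical; the role's production UDFs were implemented in Java.
from pig_util import outputSchema

@outputSchema("clean_record:chararray")
def cleanse(value):
    """Strip control characters and surrounding whitespace, then lower-case."""
    if value is None:
        return None
    cleaned = "".join(ch for ch in value if ord(ch) >= 32 or ch == "\t")
    return cleaned.strip().lower()

# Example usage from a Pig script (hypothetical relation/field names):
#   REGISTER 'cleanse_udf.py' USING jython AS udfs;
#   cleaned = FOREACH raw_logs GENERATE udfs.cleanse(payload) AS payload;
```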
Java Developer
Confidential
Responsibilities:
- Designed new JSPs for the front end using HTML, JavaScript, jQuery, and AJAX.
- Implemented the Struts framework with MVC architecture.
- Developed JSP pages and configured the module in the application.
- Developed DAOs (Data Access Objects) using Hibernate as the ORM to interact with the Oracle DBMS.
- Applied J2EE design patterns like Business Delegate, DAO and Singleton.
- Deployed and tested the application using Tomcat web server.
- Performed client-side validation using JavaScript.
- Developed DAOs using JDBC.
- Used JBoss for application deployment and MySQL as the database.
- Used JUnit to evaluate use-case test cases as part of integration testing.
- Wrote SQL queries against the Oracle database to fetch business data.
- Developed the UI for customer service modules and reports using JSF, JSPs, and MyFaces components.
- Knowledge of Big Data tools and BI reporting.
- Used Log4j to log the running system's application events, trace errors, and record certain automated routine functions.
Environment: Java 1.5, J2EE 1.4, JDBC, JSP, Servlets, Spring 3.x, Hibernate, JavaScript, JNDI, Log4j, Ant, BI, Linux, and Amazon cloud Tomcat server.