Senior Hadoop Developer Resume
Charlotte, NC
SUMMARY:
- 7+ years of IT professional experience, including 4+ years with Big Data ecosystem technologies, covering full project development, implementation and deployment on Linux/Windows/Unix.
- Experience in Big Data analytics implementation using Hadoop, HDFS, Java, Hive, Impala, Sqoop, Spark, Scala, HBase, Kafka, Oozie, Pig and MapReduce programming.
- Excellent knowledge of Hadoop ecosystem architecture and components such as the Hadoop Distributed File System (HDFS), MRv1, MRv2, JobTracker, TaskTracker, NameNode, DataNode, ResourceManager and NodeManager.
- Developed MapReduce jobs for various use cases using Java, Pig and Hive.
- Experience in loading data using Hive and writing scripts for data transformations using Hive and Pig.
- Knowledge and working experience in developing Apache Spark programs using Scala.
- Good knowledge of writing Spark SQL queries to load tables into HDFS and run SELECT queries on top of them.
- Hands-on experience with message brokers such as Apache Kafka.
- Experience in writing workflows using Oozie and automating them with Autosys scheduling.
- Experience in creating Impala views on Hive tables for fast access to data.
- Developed UDFs and used them in Hive queries.
- Developed Pig Latin scripts for handling business transformations.
- Implemented Sqoop for transferring large datasets between Hadoop and RDBMSs.
- Experience with the Oozie Workflow Engine in running workflow jobs with actions that run Hadoop MapReduce and Pig jobs.
- Knowledge on installation and administration of multi-node virtualized clusters using Cloudera Hadoop and Apache Hadoop.
- Knowledge on Amazon Web Services (AWS).
- Hands-on experience working with CSV, Avro and Parquet file formats.
- Good understanding of NoSQL databases and hands on work experience in writing applications on NoSQL databases like HBase.
- Experience in database design, data analysis, and programming SQL, stored procedures, PL/SQL and triggers in Oracle and SQL Server.
- Working knowledge of Databases like Oracle, Netezza, MySQL and Teradata.
- Experience in using IDEs like Eclipse and NetBeans.
- Extensive programming experience in developing Java applications using Core Java, J2EE and JDBC.
- Well versed with the UNIX and Linux command line and shell scripting.
- Extensive experience with SVN and ClearCase for source control.
- Adequate knowledge and working experience with Agile methodology.
- Research-oriented, motivated, proactive, self-starter with strong technical, analytical and interpersonal skills.
- Project management skills such as schedule planning, offshore team management and design presentation.
TECHNICAL SKILLS:
Languages: Java, Scala, SQL
Big Data Technologies: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Spark, Kafka, Oozie, Flume, Impala, HBase, CDH5
Databases: Oracle 9i/10g/11g, MySQL, Netezza and Teradata
Scripting Languages: Shell Scripting, Python
IDE Tools: Eclipse, NetBeans
DB Tools: TOAD, SQL Assistant
Operating Systems: UNIX, LINUX, Windows
PROFESSIONAL EXPERIENCE:
Senior Hadoop Developer
Confidential, Charlotte, NC
Responsibilities:
- Assess current and future ingestion requirements, review data sources, data formats and recommend processes for loading data into Hadoop.
- Developed ETL applications using Hive, Spark, Impala and Sqoop; automated them with Oozie workflows and shell scripts with error handling, and scheduled them using Autosys.
- Built Sqoop jobs to import massive amounts of data from relational databases (Teradata and Netezza) and back-populate it onto the Hadoop platform.
- Working on creating a common workflow that converts mainframe source files from EBCDIC to ASCII and lands them as delimited files in Avro format on HDFS.
- Worked on Avro and Parquet file formats with Snappy compression.
- Developing scripts to perform business transformations on the data using Hive and Pig.
- Creating Impala views on top of Hive tables for faster access when analyzing data through Hue/TOAD.
- Connected Impala to BI tools such as TOAD and SQL Assistant to help the modeling team run the different risk models.
- Built a proof of concept on Spark, developing Spark programs with Scala and Spark SQL for business reports.
- Configured Spark Streaming in Scala to receive real-time data from Kafka and persist the stream to HDFS (see the sketch at the end of this section).
- Developed BTEQ scripts for moving data from staging tables to final tables in Teradata as part of the automation.
- Support architecture, design review, code review, and best practices to implement a Hadoop architecture.
Environment: CDH 4, CDH 5, HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, Impala, Spark, Kafka, Teradata, Linux, Java, Eclipse, SQL Assistant, TOAD
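Illustrative sketch for the Kafka-to-HDFS streaming flow described above: a minimal Spark Streaming job, shown here with the Spark Java API (the project work noted above was done in Scala). The broker address, consumer group, topic name and HDFS path are hypothetical placeholders.

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka010.ConsumerStrategies;
    import org.apache.spark.streaming.kafka010.KafkaUtils;
    import org.apache.spark.streaming.kafka010.LocationStrategies;

    public class KafkaToHdfsJob {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("KafkaToHdfsJob");
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(60));

            Map<String, Object> kafkaParams = new HashMap<>();
            kafkaParams.put("bootstrap.servers", "broker1:9092");           // hypothetical broker
            kafkaParams.put("key.deserializer", StringDeserializer.class);
            kafkaParams.put("value.deserializer", StringDeserializer.class);
            kafkaParams.put("group.id", "hdfs-ingest");                     // hypothetical consumer group
            kafkaParams.put("auto.offset.reset", "latest");

            JavaInputDStream<ConsumerRecord<String, String>> stream = KafkaUtils.createDirectStream(
                    jssc,
                    LocationStrategies.PreferConsistent(),
                    ConsumerStrategies.<String, String>Subscribe(Arrays.asList("events"), kafkaParams));

            // Persist each non-empty micro-batch to HDFS under a time-stamped directory (hypothetical path).
            stream.map(record -> record.value())
                  .foreachRDD((rdd, time) -> {
                      if (!rdd.isEmpty()) {
                          rdd.saveAsTextFile("hdfs:///data/raw/events/" + time.milliseconds());
                      }
                  });

            jssc.start();
            jssc.awaitTermination();
        }
    }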
Senior Hadoop Developer
Confidential,Minneapolis, MN
Responsibilities:
- Primary responsibilities include building scalable distributed data solutions using Hadoop ecosystem
- Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster
- Developed simple to complex MapReduce and streaming jobs using Java, Hive and Pig.
- Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) to study customer behavior.
- Used Hive to analyze data ingested into HBase by using Hive-HBase integration and compute various metrics for reporting on the dashboard.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
- Installed Oozie workflow engine to run multiple Hive and Pig jobs
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team
- Experienced in loading and transforming large sets of structured, semi-structured and unstructured data.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Wrote multiple MapReduce programs in Java for data extraction, transformation and aggregation from multiple file formats, including XML, JSON, CSV and other compressed file formats (see the sketch at the end of this section).
Environment: Hadoop, Pig, Hive, Apache Sqoop, Oozie, HBase, Kafka, ZooKeeper, Cloudera Manager, 30-node cluster with Linux (Ubuntu).
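A minimal sketch of the kind of Java MapReduce program described above, assuming a CSV input and a hypothetical column of interest: it counts occurrences of the values in that column and reuses the reducer as a combiner to cut shuffle volume.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class CsvFieldCount {

        // Mapper: parses each CSV line and emits (fieldValue, 1) for a chosen column.
        public static class FieldMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
            private static final LongWritable ONE = new LongWritable(1);
            private final Text outKey = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split(",");
                if (fields.length > 2) {              // hypothetical column index
                    outKey.set(fields[2].trim());
                    context.write(outKey, ONE);
                }
            }
        }

        // Reducer (also used as combiner): sums the counts for each distinct field value.
        public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
            @Override
            protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                    throws IOException, InterruptedException {
                long sum = 0;
                for (LongWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new LongWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "csv-field-count");
            job.setJarByClass(CsvFieldCount.class);
            job.setMapperClass(FieldMapper.class);
            job.setCombinerClass(SumReducer.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(LongWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }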
Senior Hadoop Developer
Confidential, Alpharetta, GA
Responsibilities:
- Involved in defining job flows, managing and reviewing log files.
- Supported MapReduce programs running on the cluster.
- As a Big Data Developer, implemented solutions for ingesting data from various sources and processing the data at rest using Big Data technologies such as Hadoop, MapReduce frameworks, HBase, Hive, Oozie, Flume, Kafka and Sqoop.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Imported bulk data into HBase using MapReduce programs.
- Wrote Apache Pig and Hive scripts to process the HDFS data.
- Performed analytics on time-series data stored in HBase using the HBase API.
- Designed and implemented Incremental Imports into Hive tables.
- Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
- Wrote Hive jobs to parse the logs and structure them in a tabular format to facilitate effective querying of the log data.
- Involved with File Processing using Pig Latin.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Experience in optimizing MapReduce jobs using combiners and partitioners to deliver the best results, and worked on application performance optimization for an HDFS cluster.
- Implemented business logic by writing Hive UDFs in Java (see the sketch at the end of this section) and used various UDFs from Piggybank and other sources.
- Worked on debugging and performance tuning of Hive and Pig jobs.
- Used Hive to find correlations between customers' browser logs across different sites and analyzed them to build risk profiles for those sites.
- Created and maintained Technical documentation for launching HADOOP Clusters and for executing Hive queries and Pig Scripts.
Environment: Java, Hadoop 2.1.0, MapReduce v2, Pig 0.12.0, Hive 0.13.0, Linux, Sqoop 1.4.2, Flume 1.3.1, Eclipse, AWS EC2, and Cloudera CDH 4.
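A minimal sketch of a Java Hive UDF of the kind described above, using the classic org.apache.hadoop.hive.ql.exec.UDF style supported in the Hive 0.13 era; the class name and normalization logic are hypothetical.

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Hypothetical UDF: normalizes free-form text before it is grouped or joined in HiveQL.
    public final class NormalizeText extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;           // pass NULLs through unchanged
            }
            String cleaned = input.toString().trim().toLowerCase().replaceAll("\\s+", " ");
            return new Text(cleaned);
        }
    }

Such a UDF is registered in HiveQL along the lines of ADD JAR /path/to/udfs.jar; CREATE TEMPORARY FUNCTION normalize_text AS 'NormalizeText'; (jar path and function name hypothetical).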
Hadoop Developer
Confidential, Memphis, TN
Responsibilities:
- Migrated the needed data from MySQL into HDFS using Sqoop and imported various formats of flat files into HDFS.
- Mainly worked on Hive queries to categorize data of different claims.
- Involved in loading data from LINUX file system to HDFS
- Wrote custom Hive UDFs in Java where the required functionality was too complex for built-in functions.
- Implemented partitioning, dynamic partitions and buckets in Hive.
- Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning and buckets (see the sketch at the end of this section).
- Generated final reporting data using Tableau for testing by connecting to the corresponding Hive tables using the Hive ODBC connector.
- Responsible for managing the test data coming from different sources.
- Reviewing peer table creation in Hive, data loading and queries.
- Weekly meetings with technical collaborators and active participation in code review sessions with senior and junior developers.
- Monitored system health and logs and responded to any warning or failure conditions.
- Gained experience in managing and reviewing Hadoop log files.
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
- Involved in unit testing, interface testing, system testing and user acceptance testing of the workflow tool.
- Created and maintained Technical documentation for launching Hadoop Clusters and for executing Hive queries and Pig Scripts
Environment: Apache Hadoop, HDFS, Hive, MapReduce, Core Java, Pig, Sqoop, Cloudera CDH4, Oracle, MySQL
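A minimal sketch of the external-table and dynamic-partitioning setup described above, driven through the HiveServer2 JDBC driver; the connection string, table and column names, bucket count and HDFS location are hypothetical, and a claims_staging source table is assumed.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class ClaimsTableSetup {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            try (Connection conn = DriverManager.getConnection(
                         "jdbc:hive2://hiveserver:10000/default", "etl_user", ""); // hypothetical connection
                 Statement stmt = conn.createStatement()) {

                // External table over files already landed in HDFS, partitioned by claim date
                // and bucketed by member id (all names and the location are hypothetical).
                stmt.execute("CREATE EXTERNAL TABLE IF NOT EXISTS claims_ext ("
                        + " claim_id STRING, member_id STRING, amount DOUBLE)"
                        + " PARTITIONED BY (claim_date STRING)"
                        + " CLUSTERED BY (member_id) INTO 32 BUCKETS"
                        + " ROW FORMAT DELIMITED FIELDS TERMINATED BY ','"
                        + " LOCATION '/data/claims/ext'");

                // Dynamic partitioning routes each inserted row to its claim_date partition.
                stmt.execute("SET hive.exec.dynamic.partition=true");
                stmt.execute("SET hive.exec.dynamic.partition.mode=nonstrict");
                stmt.execute("SET hive.enforce.bucketing=true");
                stmt.execute("INSERT OVERWRITE TABLE claims_ext PARTITION (claim_date)"
                        + " SELECT claim_id, member_id, amount, claim_date FROM claims_staging");
            }
        }
    }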
Java/Hadoop Developer
Confidential, Edina, MN
Responsibilities:
- Designed and developed multi-threaded batch processing to handle payments and restoration of services (see the sketch at the end of this section).
- Designed and developed the presentation layer using JSP, HTML and Struts for the reporting module.
- Used the Struts framework to maintain MVC and created action forms, action mappings, DAOs and application properties for internationalization.
- Created materialized views to replicate master tables of the billing system.
- Developed SQL queries to store and retrieve data from the database and used PL/SQL.
- Developed a common Data Integrity (DI) framework in Java to compare table counts between source and destination tables.
- Developed UDFs in Java as utility functions for Hive and Pig.
- Responsible for writing Hadoop MapReduce jobs in Java.
- Used the Eclipse IDE to develop the application and was involved in building the code and deploying it on the Apache Tomcat application server.
- Involved in writing test cases for the application using JUnit and supporting the application through debugging, bug fixing and production support.
Environment: Apache Hadoop, Core Java, J2EE, JSP, Oracle, Apache Tomcat, JUnit, Eclipse Helios, PuTTY, JavaScript, HTML, CSS, AJAX, SVN, Struts, XML.
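A minimal sketch of the multi-threaded batch-processing pattern described above, built on a fixed-size thread pool; the Payment type, pool size and posting logic are hypothetical placeholders.

    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class PaymentBatchProcessor {

        // Hypothetical payment record.
        static final class Payment {
            final String accountId;
            final double amount;
            Payment(String accountId, double amount) { this.accountId = accountId; this.amount = amount; }
        }

        private final ExecutorService pool = Executors.newFixedThreadPool(8); // hypothetical pool size

        // Submit each payment as an independent task so one slow record does not block the batch,
        // then wait for the whole batch to drain before reporting completion.
        public void process(List<Payment> batch) throws InterruptedException {
            for (Payment p : batch) {
                pool.submit(() -> applyPayment(p));
            }
            pool.shutdown();
            pool.awaitTermination(30, TimeUnit.MINUTES);
        }

        private void applyPayment(Payment p) {
            // Placeholder for the real posting / service-restoration logic (DB update, service call, etc.).
            System.out.printf("Posted %.2f to account %s%n", p.amount, p.accountId);
        }
    }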
Java Developer
Confidential
Responsibilities:
- Involved in requirements analysis, design, development and testing.
- Involved in setting up the different roles and maintaining authentication for the application.
- Designed, deployed and tested a multi-tier application using Java technologies.
- Involved in front-end development using JSP, HTML and CSS.
- Implemented the application using servlets (see the sketch at the end of this section).
- Deployed the application on the Oracle WebLogic server.
- Implemented multithreading concepts in Java classes to avoid deadlocks.
- Used MySQL database to store data and execute SQL queries on the backend.
- Prepared and maintained the test environment.
- Tested the application before going live to production.
- Documented and communicated test results to the team lead on a daily basis.
- Involved in weekly meeting with team leads and manager to discuss the issues and status of the projects.
Environment: J2EE (Java, JSP, JDBC, multithreading), HTML, Oracle WebLogic Server, Eclipse, MySQL, JUnit.
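A minimal sketch of a servlet backed by MySQL over JDBC, in the spirit of the application described above; the servlet class, table, connection URL and credentials are hypothetical.

    import java.io.IOException;
    import java.io.PrintWriter;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // Hypothetical servlet: looks up an order by id in MySQL and renders a minimal HTML response.
    public class OrderLookupServlet extends HttpServlet {

        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws ServletException, IOException {
            String orderId = req.getParameter("orderId");
            resp.setContentType("text/html");
            try (Connection conn = DriverManager.getConnection(
                         "jdbc:mysql://localhost:3306/shop", "appuser", "secret"); // hypothetical DB
                 PreparedStatement ps = conn.prepareStatement(
                         "SELECT status FROM orders WHERE order_id = ?")) {
                ps.setString(1, orderId);
                try (ResultSet rs = ps.executeQuery(); PrintWriter out = resp.getWriter()) {
                    if (rs.next()) {
                        out.println("<p>Order status: " + rs.getString("status") + "</p>");
                    } else {
                        out.println("<p>Order not found.</p>");
                    }
                }
            } catch (Exception e) {
                throw new ServletException(e);
            }
        }
    }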