Senior Hadoop Developer Resume
Wells Fargo
SUMMARY:
- 7+ years of IT professional experience, including 4+ years with Big Data ecosystem technologies, covering full project development, implementation, and deployment on Linux/Windows/Unix.
- Experience implementing Big Data analytics using Hadoop, HDFS, Spark, Hive, Sqoop, Impala, Kafka, Java, Scala, HBase, Oozie, Pig, and MapReduce programming.
- Experience ingesting data from various RDBMS sources into a Hadoop data lake using Sqoop and loading it into Hive tables with data transformations.
- Experience in developing Spark Applications using Scala.
- Experience writing Spark SQL queries and using DataFrames to load tables into HDFS for data enrichment.
- Hands-on experience with message brokers such as Apache Kafka, including integration with Apache Nifi for Spark Streaming.
- Experience in writing workflows using Oozie and automating them with Autosys scheduling.
- Developed UDFs and used them in Hive queries.
- Developed Pig Latin scripts to handle business transformations.
- Experience working on different enterprise Hadoop distributions such as Cloudera (CDH) and Hortonworks (HDP/HDF).
- Knowledge of installing and administering multi-node virtualized clusters using Cloudera Hadoop and Apache Hadoop.
- Experience working with Amazon Web Services (AWS) like EC2, S3, EMR, RDS and VPC.
- Hands-on experience with CSV, Avro, ORC, and Parquet file formats.
- Good understanding of NoSQL databases and hands-on experience writing applications on NoSQL databases such as HBase.
- Experience in database design, data analysis, and programming SQL, PL/SQL stored procedures, and triggers in Oracle and SQL Server.
- Working knowledge of databases including Oracle, SQL Server, Netezza, MySQL, and Teradata.
- Experience in using IDEs like Eclipse and NetBeans.
- Experience in using DevOps tools like Jenkins and UDeploy.
- Extensive programming experience in developing Java applications using Core Java, J2EE and JDBC.
- Well versed with the UNIX and Linux command line and shell scripting.
- Extensive experience using Subversion (SVN) and Git for source control.
- Working knowledge of and experience with Agile methodology.
- Research-oriented, motivated, proactive, self-starter with strong technical, analytical and interpersonal skills.
- Project management skills like schedule planning, Offshore Team management, and design presentation.
TECHNICAL SKILLS:
Languages: Java, Scala, SQL
Big Data Technologies: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Spark, Kafka, Nifi, Oozie, Impala, HBase, Cloudera (CDH), Hortonworks (HDP&HDF)
Databases: Oracle, SQL Server, MySQL, Netezza and Teradata
Scripting Languages: Shell Scripting, Python
IDE Tools: Eclipse, NetBeans
DB Tools: TOAD, SQL Assistant, Tableau
Operating Systems: UNIX, LINUX, Windows
PROFESSIONAL EXPERIENCE:
Senior Hadoop Developer
Confidential, Wells Fargo
Responsibilities:
- Developed applications using Hive and Sqoop to transfer data from Teradata and SQL Server to HDFS, automated them with shell scripts that include error handling, and scheduled them using Autosys.
- Developed Spark applications in Scala and implemented Spark data sourcing to ingest data from various RDBMS and streaming sources.
- Supported a Spark Streaming application that consumes data from a Nifi-Kafka integrated process, stores it in Hive tables, and keeps offset values in HBase.
- Developed Spark code to load ORC data into Hive tables and handled structured data with Spark SQL and DataFrames for sourcing and processing of data (a minimal sketch follows this section).
- Wrote Spark applications in Scala; hands-on experience creating RDDs and applying transformations and actions.
- Developed PySpark code for data quality checks using Spark RDD and DataFrame programming.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
- Performed End to End Application Testing which includes Unit Testing, Functional Testing and Regression Testing.
- Involved in Production activities like Source Code Deployment, scheduling Autosys jobs and Month End Production Support.
- Supported architecture, design reviews, code reviews, and best practices for implementing the Hadoop architecture.
Environment: Hortonworks (HDP, HDF), HDFS, Map Reduce, Hive, Sqoop, Spark, HBase, Ambari, Nifi, Kafka, Atlas, Ranger, Teradata, SQLServer, Linux, Scala, Eclipse, SQL Assistant
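The Spark applications above were written in Scala; as a rough illustration only, the sketch below shows the same ORC-to-Hive loading pattern through the Spark Java API, with the paths, column names, and table names being hypothetical assumptions rather than the production code.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class OrcToHiveLoad {
    public static void main(String[] args) {
        // Hive support lets saveAsTable write to Hive-managed tables.
        SparkSession spark = SparkSession.builder()
                .appName("orc-to-hive-load")
                .enableHiveSupport()
                .getOrCreate();

        // Hypothetical landing path for ORC files produced by the ingestion layer.
        Dataset<Row> raw = spark.read().orc("hdfs:///data/landing/accounts_orc");

        // Simple enrichment with Spark SQL before the table load.
        raw.createOrReplaceTempView("accounts_raw");
        Dataset<Row> enriched = spark.sql(
                "SELECT account_id, UPPER(region) AS region, balance "
              + "FROM accounts_raw WHERE balance IS NOT NULL");

        // Overwrite the (hypothetical) target Hive table.
        enriched.write().mode(SaveMode.Overwrite).saveAsTable("edw.accounts_enriched");

        spark.stop();
    }
}
```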
Senior Hadoop Developer
Confidential, Charlotte, NC.
Responsibilities:
- Assessed current and future ingestion requirements, reviewed data sources and formats, and recommended processes for loading data into Hadoop.
- Developed ETL applications using Hive, Spark, Impala, and Sqoop; automated them with Oozie workflows and shell scripts with error handling; and scheduled them using Autosys.
- Built Sqoop jobs to import large volumes of data from relational databases (Teradata and Netezza) and back-populate the Hadoop platform.
- Created a common workflow to convert mainframe-sourced EBCDIC data to ASCII and land it in HDFS as delimited files in Avro format.
- Worked with Avro and Parquet file formats with Snappy compression.
- Developed Hive and Pig scripts to perform business transformations on the data.
- Created Impala views on top of Hive tables for faster data analysis through Hue/TOAD.
- Connected Impala to BI tools such as TOAD and SQL Assistant to help the modeling team run different risk models.
- Developed Spark programs using Scala and Spark SQL for business reports.
- Configured Spark Streaming in Scala to receive real-time data from Kafka and store the stream data in HDFS (a minimal sketch follows this section).
- Developed BTEQ scripts to move data from staging tables to final tables in Teradata as part of automation.
- Supported architecture, design reviews, code reviews, and best practices for implementing the Hadoop architecture.
Environment: Cloudera (CDH4/CDH5), HDFS, Map Reduce, Hive, Pig, Sqoop, Oozie, Impala, Spark, Kafka, Teradata, Linux, Java, Eclipse, SQL Assistant, TOAD
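The streaming pipeline described above used Spark Streaming with Scala; the sketch below is only a hedged illustration of the same Kafka-to-HDFS idea, written against the Spark Structured Streaming Java API, with the broker list, topic name, and HDFS paths as assumptions.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

public class KafkaToHdfsStream {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("kafka-to-hdfs-stream")
                .getOrCreate();

        // Subscribe to a hypothetical topic on hypothetical brokers.
        Dataset<Row> stream = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
                .option("subscribe", "transactions")
                .load();

        // Keep only the message payload as a string column.
        Dataset<Row> payload = stream.selectExpr("CAST(value AS STRING) AS payload");

        // Append each micro-batch to HDFS; the checkpoint tracks Kafka offsets.
        StreamingQuery query = payload.writeStream()
                .format("parquet")
                .option("path", "hdfs:///data/streams/transactions")
                .option("checkpointLocation", "hdfs:///checkpoints/transactions")
                .outputMode("append")
                .start();

        query.awaitTermination();
    }
}
```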
Senior Hadoop Developer
Confidential, Minneapolis, MN
Responsibilities:
- Primary responsibilities included building scalable distributed data solutions using the Hadoop ecosystem.
- Installed and configured Hive, Pig, Sqoop, Flume, and Oozie on the Hadoop cluster.
- Developed simple to complex MapReduce and streaming jobs using Java, complemented by Hive and Pig.
- Optimized MapReduce jobs to use HDFS efficiently through various compression mechanisms.
- Imported data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) to study customer behavior.
- Used Hive to analyze data ingested into HBase by using Hive-HBase integration and compute various metrics for reporting on the dashboard.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Wrote multiple MapReduce programs in Java for data extraction, transformation, and aggregation across file formats including XML, JSON, CSV, and other compressed formats (a minimal sketch follows this section).
Environment: Hadoop, Pig, Hive, Apache Sqoop, Oozie, HBase, ZooKeeper, Cloudera Manager, 30-node cluster on Linux (Ubuntu).
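As an illustration of the MapReduce extraction and aggregation work listed above (not one of the original programs), the sketch below counts records per value of one CSV column; the column position and the input/output paths are assumptions.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CsvFieldCount {

    /** Emits (value of the 3rd CSV column, 1) for each well-formed line. */
    public static class FieldMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        private final Text outKey = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            String[] cols = value.toString().split(",");
            if (cols.length > 2) {            // skip malformed rows
                outKey.set(cols[2].trim());    // hypothetical grouping column
                ctx.write(outKey, ONE);
            }
        }
    }

    /** Sums the counts per key; also reused as a combiner. */
    public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context ctx)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable v : values) {
                sum += v.get();
            }
            ctx.write(key, new LongWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "csv-field-count");
        job.setJarByClass(CsvFieldCount.class);
        job.setMapperClass(FieldMapper.class);
        job.setCombinerClass(SumReducer.class);   // combiner mirrors the reducer
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```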
Senior Hadoop Developer
Confidential, Alpharetta, GA
Responsibilities:
- Involved in defining job flows, managing and reviewing log files.
- Supported MapReduce programs running on the cluster.
- As a Big Data developer, implemented solutions for ingesting data from various sources and processing data in motion using Big Data technologies such as Hadoop, the MapReduce framework, HBase, Hive, Oozie, Flume, Kafka, and Sqoop.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Imported bulk data into HBase using MapReduce programs (a minimal sketch follows this section).
- Developed Apache Pig and Hive scripts to process data in HDFS.
- Designed and implemented Incremental Imports into Hive tables.
- Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
- Wrote Hive jobs to parse logs and structure them in tabular format to facilitate effective querying of the log data.
- Involved with file processing using Pig Latin.
- Moved data to Amazon S3, worked with Elastic MapReduce (EMR), and set up environments on AWS EC2 instances.
- Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
- Optimized MapReduce jobs using combiners and partitioners to deliver the best results, and worked on application performance optimization for an HDFS cluster.
- Implemented business logic by writing UDFs in Java and used various UDFs from Piggybank and other sources.
- Worked on debugging and performance tuning of Hive and Pig jobs.
- Used Hive to find correlations between customers' browser logs across different sites and analyzed them to build risk profiles for those sites.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
Environment: Java, Hadoop 2.1.0, Map Reduce2, Pig 0.12.0, Hive 0.13.0, Linux, Sqoop 1.4.2, Flume 1.3.1, Eclipse, AWS EC2, S3, EMR
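A minimal sketch, assuming a CSV input of (id, name, balance) and a hypothetical customer_profile table with column family cf, of the kind of map-only job used to bulk-import data into HBase via TableOutputFormat; it is an illustration, not the original program.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class HBaseBulkImport {

    /** Map-only: each CSV line (id,name,balance) becomes one HBase Put. */
    public static class PutMapper
            extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            String[] cols = value.toString().split(",");
            if (cols.length < 3) {
                return;                                   // skip malformed rows
            }
            byte[] rowKey = Bytes.toBytes(cols[0].trim());
            Put put = new Put(rowKey);
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("name"), Bytes.toBytes(cols[1].trim()));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("balance"), Bytes.toBytes(cols[2].trim()));
            ctx.write(new ImmutableBytesWritable(rowKey), put);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set(TableOutputFormat.OUTPUT_TABLE, "customer_profile");  // hypothetical table
        Job job = Job.getInstance(conf, "hbase-bulk-import");
        job.setJarByClass(HBaseBulkImport.class);
        job.setMapperClass(PutMapper.class);
        job.setNumReduceTasks(0);                          // map-only job
        job.setOutputFormatClass(TableOutputFormat.class);
        job.setOutputKeyClass(ImmutableBytesWritable.class);
        job.setOutputValueClass(Put.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```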
Hadoop Developer
Confidential, Memphis, TN
Responsibilities:
- Migrated required data from MySQL into HDFS using Sqoop and imported flat files of various formats into HDFS.
- Worked mainly on Hive queries to categorize data for different claims.
- Loaded data from the Linux file system into HDFS.
- Wrote custom Hive UDFs in Java where the required functionality was too complex for built-in functions (a minimal sketch follows this section).
- Implemented partitioning, dynamic partitioning, and bucketing in Hive.
- Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning, and bucketing.
- Generated final reporting data in Tableau for testing by connecting to the corresponding Hive tables through the Hive ODBC connector.
- Managed test data coming from different sources.
- Reviewed peers' Hive table creation, data loading, and queries.
- Held weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
- Monitored system health and logs and responded to any warning or failure conditions.
- Gained experience in managing and reviewing Hadoop log files.
- Scheduled Oozie workflows to run multiple Hive and Pig jobs.
- Performed unit, interface, system, and user acceptance testing of the workflow tool.
- Created and maintained Technical documentation for launching Hadoop Clusters and for executing Hive queries and Pig Scripts
Environment: Apache Hadoop, HDFS, Hive, Map Reduce, Core Java, Pig, Sqoop, Cloudera CDH4, Oracle, Tableau, MySQL
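The actual UDFs are not included in the resume; the following is a generic, hypothetical example of a custom Hive UDF in Java of the kind referenced above, together with the HiveQL used to register such a function.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/**
 * Hypothetical UDF: normalizes free-text claim category values
 * (trim, lower-case, collapse whitespace) before grouping in HiveQL.
 *
 * Usage after packaging into a JAR:
 *   ADD JAR /tmp/claim-udfs.jar;
 *   CREATE TEMPORARY FUNCTION normalize_category AS 'NormalizeCategory';
 *   SELECT normalize_category(category), COUNT(*) FROM claims GROUP BY normalize_category(category);
 */
public final class NormalizeCategory extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;          // Hive passes NULLs through
        }
        String cleaned = input.toString().trim().toLowerCase().replaceAll("\\s+", " ");
        return new Text(cleaned);
    }
}
```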
Java Developer
Confidential
Responsibilities:
- Involved in requirements analysis, design, development, and testing.
- Set up the different application roles and maintained authentication for the application.
- Designed, deployed, and tested a multi-tier application using Java technologies.
- Involved in front-end development using JSP, HTML, and CSS.
- Implemented the application using servlets (a minimal sketch follows this section).
- Deployed the application on Oracle WebLogic Server.
- Implemented multithreading in Java classes while avoiding deadlocks.
- Used MySQL database to store data and execute SQL queries on the backend.
- Prepared and Maintained test environment.
- Tested the application before going live to production.
- Documented and communicated test results to the team lead on a daily basis.
- Participated in weekly meetings with team leads and the manager to discuss project issues and status.
Environment: J2EE (Java, JSP, JDBC, multithreading), HTML, Oracle WebLogic Server, Eclipse, MySQL, JUnit.
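As a minimal, hypothetical illustration of the servlet-based implementation mentioned above (the class name, URL mapping, and request parameter are assumptions, not the original code):

```java
import java.io.IOException;
import java.io.PrintWriter;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

/**
 * Hypothetical servlet returning a simple HTML greeting for the requesting user.
 * Mapped to a URL such as /greeting in web.xml for deployment on WebLogic.
 */
public class GreetingServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        String user = request.getParameter("user");
        if (user == null || user.isEmpty()) {
            user = "guest";
        }
        response.setContentType("text/html");
        try (PrintWriter out = response.getWriter()) {
            out.println("<html><body>");
            out.println("<h1>Welcome, " + user + "</h1>");
            out.println("</body></html>");
        }
    }
}
```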