Sr. Hadoop Developer Resume

SUMMARY

8+ years of overall experience with a strong emphasis on the design, development, implementation, testing and deployment of software applications.
4+ years of comprehensive IT experience in Big Data and Big Data analytics, Hadoop, HDFS, MapReduce, YARN, the Hadoop ecosystem and shell scripting.
5+ years of development experience using Java, J2EE, JSP and Servlets.
Highly capable of processing large structured, semi-structured and unstructured datasets and supporting Big Data applications.
Hands-on experience with Hadoop ecosystem components such as MapReduce (processing), HDFS (storage), YARN, Sqoop, Pig, Hive, HBase, Oozie, ZooKeeper and Spark for data storage and analysis.
Expertise in transferring data between the Hadoop ecosystem and structured data stores in an RDBMS such as MySQL, Oracle, Teradata and DB2 using Sqoop.
Experience in NoSQL databases like MongoDB, HBase and Cassandra.
Experience with Apache Spark clusters and stream processing using Spark Streaming.
Expertise in moving large amounts of log data, streaming event data and transactional data using Flume.
Experience in developing MapReduce jobs in Java for data cleaning and preprocessing.
Expertise in writing Pig Latin and Hive scripts and extending their functionality with User Defined Functions (UDFs).
Expertise in organizing data layouts in Hive using partitioning and bucketing.
Expertise in preparing interactive data visualizations in Tableau from different data sources.
Hands-on experience in developing Oozie workflows that execute MapReduce, Sqoop, Pig, Hive and shell scripts.
Experience working with Cloudera Hue Interface and Impala.
Hands-on experience developing Solr indexes using MapReduceIndexerTool.
Expertise in Object-Oriented Analysis and Design (OOAD), including UML and the use of various design patterns.
Experience in Java, JSP, Servlets, EJB, WebLogic, WebSphere, Hibernate, Spring, JBoss, JDBC, RMI, JavaScript, Ajax, jQuery, XML and HTML.
Fluent in core Java concepts such as I/O, multithreading, exceptions, regular expressions, data structures and serialization.
Performed unit testing with the JUnit framework and used Log4j to monitor error logs.
Experience in process improvement, normalization/denormalization, and data extraction, cleansing and manipulation.
Experience converting requirement specifications and source-system understanding into conceptual, logical and physical data models and data flow diagrams (DFDs).
Expertise in working with transactional databases like Oracle, SQL Server, MySQL and DB2.
Expertise in developing SQL queries and stored procedures, with excellent development experience in Agile methodology.
Ability to adapt to evolving technology, with a strong sense of responsibility and accomplishment.
Excellent leadership, interpersonal, problem-solving and time management skills.
Excellent communication skills, both written (documentation) and verbal (presentation).

TECHNICAL SKILLS

Technology: Hadoop Ecosystem/ J2SE/ J2EE/ Oracle.

Languages: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Spark, Scala, Impala, Kafka, Hue, Sqoop, Oozie, Flume, ZooKeeper, Cassandra, Cloudera CDH5, Python, PySpark, Solr and Hortonworks.

DBMS/Databases: Oracle, MySQL, SQL Server, DB2, MongoDB, Teradata, HBase, Cassandra.

Programming Languages: C, C++, JSE, XML, JSP/Servlets, Struts, Spring, HTML, JavaScript, jQuery, Web services.

Big Data Ecosystem: HDFS, MapReduce, Oozie, Hive, Pig, Sqoop, Flume, ZooKeeper, HBase, Storm, Kafka, Spark, Scala.

Methodologies: Agile, Waterfall.

NoSQL Databases: Cassandra, MongoDB, HBase.

Version Control Tools: SVN, CVS, VSS, PVCS.

Reporting Tools: Crystal Reports, SQL Server Reporting Services and Data Reports, Business Intelligence and Reporting Tools (BIRT).

PROFESSIONAL EXPERIENCE

Confidential

Sr. Hadoop Developer

Responsibilities:

Responsible for managing, analyzing and transforming petabytes of data, and for running quick validation checks on FTP files arriving from an S3 bucket into HDFS.
Responsible for analyzing large data sets and deriving customer usage patterns by developing new MapReduce programs.
Created Hive tables and loaded data into them incrementally using dynamic partitioning; worked with Avro files and JSON records.
Used Pig for data cleansing and developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
Worked on Hive by creating external and internal tables, loading them with data and writing Hive queries.
Involved in the development and usage of UDTFs and UDAFs for decoding log record fields and conversions, generating minute buckets for specified time intervals, and extracting JSON fields.
Developed Pig and Hive UDFs to analyze complex data and find specific user behavior.
Responsible for debugging and optimizing Hive scripts and for implementing deduplication logic in Hive using a rank key function (UDF).
Wrote Hive validation scripts used in a validation framework for daily analysis, with results presented to business users as graphs.
Developed a workflow in Oozie to automate loading data into HDFS and pre-processing it with Pig and Hive.
Involved in Cassandra database schema design.
Pushed data to Cassandra databases using the bulk load utility.
Responsible for creating Dashboards on Tableau Server.
Generated reports on Hive tables for different scenarios using Tableau.
Responsible for scheduling using ActiveBatch jobs and cron jobs.
Experienced in using Jenkins for JAR builds triggered by commits to GitHub.
Explored new data-tagging tools such as Tealium (POC report).
Provided upper management with daily updates on project progress, including the classification levels achieved on the data.
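
The rank-based deduplication mentioned in the responsibilities above can be illustrated with a minimal Python sketch. It is not the Hive UDF itself, whose details are not given here; the field names `user_id` and `event_ts` and the function name `dedupe_by_rank` are hypothetical. The idea is to order the records within each key by recency and keep only the top-ranked one.

```python
from itertools import groupby
from operator import itemgetter


def dedupe_by_rank(records, key="user_id", order_by="event_ts"):
    """Keep only the rank-1 (most recent) record for each key.

    The field names are illustrative assumptions, not taken from the
    original Hive tables.
    """
    # Sort newest first, then stable-sort by key so each key's records
    # stay in descending order of `order_by`.
    by_recency = sorted(records, key=itemgetter(order_by), reverse=True)
    by_key = sorted(by_recency, key=itemgetter(key))
    # The first record of each group is the one that would have rank 1.
    return [next(group) for _, group in groupby(by_key, key=itemgetter(key))]
```

Calling `dedupe_by_rank(rows)` on a list of dictionaries returns one dictionary per distinct key, sorted by key.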

Environment: Hadoop, HDFS, MapReduce, Hive, HBase, ZooKeeper, Impala, Java (JDK 1.6), Cloudera, Oracle, SQL Server, UNIX Shell Scripting, Flume, Oozie, Scala, Spark, Sqoop, Python, Kafka, PySpark.

Confidential

Sr. Hadoop Developer

Responsibilities:

Responsible for writing MapReduce jobs to perform operations such as copying data on HDFS and defining job flows on an EC2 server, and for loading and transforming large sets of structured, semi-structured and unstructured data.
Developed a process for Sqooping data from multiple sources like SQL Server, Oracle and Teradata.
Responsible for creating the mapping document from source fields to destination fields.
Developed a shell script to create staging and landing tables with the same schema as the source and to generate the properties used by Oozie jobs.
Developed Oozie workflows for executing Sqoop and Hive actions.
Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
Performed performance optimizations on Spark/Scala jobs; diagnosed and resolved performance issues.
Responsible for developing Python wrapper scripts that extract a specific date range using Sqoop by passing the custom properties required for the workflow.
Developed scripts to run Oozie workflows, capture the logs of all jobs that run on the cluster and create a metadata table that records the execution times of each job.
Developed Hive scripts to perform transformation logic and load the data from the staging zone to the final landing zone.
Worked with the Parquet file format to improve storage and performance for publish tables.
Involved in loading transactional data into HDFS using Flume for Fraud Analytics.
Developed a Python utility to validate HDFS tables against source tables.
Designed and developed UDFs to extend the functionality of both Pig and Hive.
Imported and exported data between MySQL and HDFS using Sqoop on a regular basis.
Responsible for developing multiple Kafka producers and consumers from scratch as per the software requirement specifications.
Involved in using the CA7 tool to set up dependencies at each level (Table Data, File and Time).
Automated all the jobs for pulling data from FTP server to load data into Hive tables using Oozie workflows.
Involved in developing Spark code using Scala and Spark SQL for faster testing and processing of data, and explored optimizing it using SparkContext, Spark SQL, pair RDDs and Spark on YARN.
Migrated the required data from Oracle and MySQL into HDFS using Sqoop, and imported flat files in various formats into HDFS.
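
The validation of HDFS tables against source tables described above can be pictured roughly as follows. This is a sketch under stated assumptions, not the original utility: it assumes row counts have already been collected from each side (how they are gathered is left out), and the function name `find_mismatches` and the table names in the usage note are hypothetical.

```python
def find_mismatches(source_counts, hdfs_counts):
    """Compare per-table row counts from the source system and from HDFS.

    Both arguments map table name -> row count. Returns a dict of
    table name -> (source_count, hdfs_count) for every table whose counts
    differ; a table missing on one side is reported with None there.
    """
    mismatches = {}
    for table in sorted(set(source_counts) | set(hdfs_counts)):
        src = source_counts.get(table)
        dst = hdfs_counts.get(table)
        if src != dst:
            mismatches[table] = (src, dst)
    return mismatches
```

For example, `find_mismatches({"orders": 10}, {"orders": 9})` reports the `orders` table with both counts, while an empty result means every table matched.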

Environment: Hadoop, HDFS, MapReduce, Hive, HBase, Kafka, ZooKeeper, Oozie, Impala, Java (JDK 1.6), Cloudera, Oracle, Teradata, SQL Server, UNIX Shell Scripting, Flume, Scala, Spark, Sqoop, Python.
