
Sr. Hadoop Developer Resume

SUMMARY

  • 8+ years of overall experience with a strong emphasis on the design, development, implementation, testing and deployment of software applications.
  • 4+ years of comprehensive IT experience in Big Data and Big Data analytics, including Hadoop, HDFS, MapReduce, YARN, the Hadoop ecosystem and shell scripting.
  • 5+ years of development experience using Java, J2EE, JSP and Servlets.
  • Highly capable of processing large sets of structured, semi-structured and unstructured data and supporting Big Data applications.
  • Hands-on experience with Hadoop ecosystem components such as MapReduce (processing), HDFS (storage), YARN, Sqoop, Pig, Hive, HBase, Oozie, ZooKeeper and Spark for data storage and analysis.
  • Expertise in transferring data between the Hadoop ecosystem and structured data storage in an RDBMS such as MySQL, Oracle, Teradata and DB2 using Sqoop.
  • Experience in NoSQL databases such as MongoDB, HBase and Cassandra.
  • Experience with Apache Spark clusters and stream processing using Spark Streaming.
  • Expertise in moving large volumes of log, streaming event and transactional data using Flume.
  • Experience in developing MapReduce jobs in Java for data cleaning and preprocessing.
  • Expertise in writing Pig Latin and Hive scripts and extending their functionality with User Defined Functions (UDFs); a brief UDF sketch follows this summary.
  • Expertise in organizing data layouts using partitioning and bucketing in Hive.
  • Expertise in preparing interactive data visualizations from different sources using Tableau.
  • Hands-on experience developing Oozie workflows that execute MapReduce, Sqoop, Pig, Hive and shell script actions.
  • Experience working with Cloudera Hue Interface and Impala.
  • Hands-on experience developing Solr indexes using the MapReduce Indexer Tool.
  • Expertise in Object-Oriented Analysis and Design (OOAD) with UML and various design patterns.
  • Experience in Java, JSP, Servlets, EJB, WebLogic, WebSphere, Hibernate, Spring, JBoss, JDBC, RMI, JavaScript, Ajax, jQuery, XML and HTML.
  • Fluent in core Java concepts such as I/O, multithreading, exceptions, regular expressions, data structures and serialization.
  • Performed unit testing with the JUnit framework and used Log4j to monitor error logs.
  • Experience in process improvement, normalization/denormalization, and data extraction, cleansing and manipulation.
  • Experienced in converting requirement specifications and source-system understanding into conceptual, logical and physical data models and data flow diagrams (DFDs).
  • Expertise in working with transactional databases such as Oracle, SQL Server, MySQL and DB2.
  • Expertise in developing SQL queries and stored procedures, with strong development experience in Agile methodology.
  • Ability to adapt to evolving technology, with a strong sense of responsibility and accomplishment.
  • Excellent leadership, interpersonal, problem solving and time management skills.
  • Excellent communication skills, both written (documentation) and verbal (presentation).
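As an illustration of the UDF work mentioned above, the sketch below shows a simple Hive UDF written in Scala; the class name NormalizeField and its trim/lower-case logic are hypothetical examples, not code from any of the projects listed later.

```scala
// Hypothetical example of a simple Hive UDF written in Scala.
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

class NormalizeField extends UDF {
  // Hive calls evaluate() once per row; a null input yields a null output.
  def evaluate(input: Text): Text =
    if (input == null) null
    else new Text(input.toString.trim.toLowerCase)
}
```

Once packaged into a JAR, a UDF like this would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being used in queries.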

TECHNICAL SKILLS

Technology: Hadoop Ecosystem, J2SE, J2EE, Oracle.

Big Data Technologies: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Spark, Scala, Impala, Kafka, Hue, Sqoop, Oozie, Flume, ZooKeeper, Cassandra, Cloudera CDH5, Python, PySpark, Solr and Hortonworks.

DBMS/Databases: Oracle, MySQL, SQL Server, DB2, MongoDB, Teradata, HBase, Cassandra.

Programming Languages: C, C++, Java SE, XML, JSP/Servlets, Struts, Spring, HTML, JavaScript, jQuery, Web services.

Big Data Ecosystem: HDFS, MapReduce, Oozie, Hive, Pig, Sqoop, Flume, ZooKeeper, HBase, Storm, Kafka, Spark, Scala.

Methodologies: Agile, Waterfall.

NOSQL Databases: Cassandra, MongoDB, HBase.

Version Control Tools: SVN, CVS, VSS, PVCS.

Reporting Tools: Crystal Reports, SQL Server Reporting Services and Data Reports, Business Intelligence and Reporting Tool (BIRT).

PROFESSIONAL EXPERIENCE

Confidential

Sr. Hadoop Developer

Responsibilities:

  • Responsible for managing, analyzing and transforming petabytes of data, as well as performing quick validation checks on FTP file arrivals from the S3 bucket to HDFS.
  • Responsible for analyzing large data sets and deriving customer usage patterns by developing new MapReduce programs.
  • Created Hive tables and loaded data into them incrementally using dynamic partitioning; worked with Avro files and JSON records.
  • Used Pig for data cleansing and developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
  • Worked on Hive by creating external and internal tables, loading them with data and writing Hive queries.
  • Involved in developing and using UDTFs and UDAFs for decoding and converting log record fields, generating minute buckets for specified time intervals, and extracting JSON fields.
  • Developed Pig and Hive UDFs to analyze complex data and identify specific user behavior.
  • Responsible for debugging and optimizing Hive scripts and for implementing deduplication logic in Hive using a rank key function (UDF); a brief sketch follows this project description.
  • Wrote Hive validation scripts used in a validation framework for daily analysis, presented to business users through graphs.
  • Developed Oozie workflows to automate loading data into HDFS and pre-processing it with Pig and Hive.
  • Involved in Cassandra database schema design.
  • Pushed data to Cassandra databases using the bulk load utility.
  • Responsible for creating Dashboards on Tableau Server.
  • Generated reports on Hive tables for different scenarios using Tableau.
  • Responsible for scheduling using ActiveBatch jobs and cron jobs.
  • Set up JAR builds in Jenkins triggered by commits to GitHub.
  • Explored new data-tagging tools such as Tealium (POC report).
  • Provided upper management with daily updates on project progress, including the classification levels achieved on the data.

Environment: Hadoop, HDFS, MapReduce, Hive, HBase, ZooKeeper, Impala, Java (JDK 1.6), Cloudera, Oracle, SQL Server, UNIX shell scripting, Flume, Oozie, Scala, Spark, Sqoop, Python, Kafka, PySpark.
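The sketch below roughly illustrates the dynamic-partition loading and deduplication pattern listed in this project, using Spark SQL against Hive. The table and column names (raw_events, events_dedup, event_id, payload, event_ts, dt) are hypothetical, and the built-in row_number() window function stands in for the custom rank key UDF the project actually used.

```scala
// A minimal sketch: keep only the latest record per key and load it into a
// dynamically partitioned Hive table. All names here are hypothetical.
import org.apache.spark.sql.SparkSession

object HiveDedupLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-dedup-load")
      .enableHiveSupport()
      .getOrCreate()

    // Allow fully dynamic partition inserts.
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    // Deduplicate on event_id, keeping the most recent event_ts per key,
    // and write into the date-partitioned table.
    spark.sql(
      """
        |INSERT OVERWRITE TABLE events_dedup PARTITION (dt)
        |SELECT event_id, payload, event_ts, dt
        |FROM (
        |  SELECT event_id, payload, event_ts, dt,
        |         row_number() OVER (PARTITION BY event_id
        |                            ORDER BY event_ts DESC) AS rn
        |  FROM raw_events
        |) ranked
        |WHERE rn = 1
      """.stripMargin)

    spark.stop()
  }
}
```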

Confidential

Sr. Hadoop Developer

Responsibilities:

  • Wrote MapReduce jobs to perform operations such as copying data on HDFS and defining job flows on EC2 servers, and to load and transform large sets of structured, semi-structured and unstructured data.
  • Developed a process for importing data with Sqoop from multiple sources such as SQL Server, Oracle and Teradata.
  • Responsible for creating the source-to-destination field mapping document.
  • Developed a shell script to create staging and landing tables with the same schema as the source and to generate the properties used by Oozie jobs.
  • Developed Oozie workflows for executing Sqoop and Hive actions.
  • Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
  • Performed optimizations on Spark/Scala code and diagnosed and resolved performance issues.
  • Developed Python wrapper scripts to extract specific date ranges with Sqoop by passing the custom properties required for the workflow.
  • Developed scripts to run Oozie workflows, capture the logs of all jobs that run on the cluster, and create a metadata table that specifies the execution times of each job.
  • Developed Hive scripts to perform transformation logic and load data from the staging zone to the final landing zone.
  • Used the Parquet file format for better storage and performance of published tables.
  • Involved in loading transactional data into HDFS using Flume for fraud analytics.
  • Developed a Python utility to validate HDFS tables against source tables.
  • Designed and developed UDFs to extend functionality in both Pig and Hive.
  • Imported and exported data between MySQL and HDFS using Sqoop on a regular basis.
  • Developed multiple Kafka producers and consumers from scratch per the software requirement specifications.
  • Used the CA7 tool to set up dependencies at each level (table data, file and time).
  • Automated all jobs that pull data from the FTP server and load it into Hive tables using Oozie workflows.
  • Developed Spark code using Scala and Spark SQL for faster testing and processing of data, and explored optimizing it with SparkContext, Spark SQL, pair RDDs and Spark on YARN; a brief sketch follows this project description.
  • Migrated the required data from Oracle and MySQL into HDFS using Sqoop and imported various formats of flat files into HDFS.

Environment: Hadoop, HDFS, MapReduce, Hive, HBase, Kafka, ZooKeeper, Oozie, Impala, Java (JDK 1.6), Cloudera, Oracle, Teradata, SQL Server, UNIX shell scripting, Flume, Scala, Spark, Sqoop, Python.
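The sketch below illustrates the staging-to-landing Spark/Scala pattern described in this project, with hypothetical table names (staging.transactions, landing.transactions) and columns; it is an assumption-based example, not the project's actual code.

```scala
// A rough sketch: read a Sqoop-populated staging table, apply transformation
// logic, and publish it as a partitioned Parquet landing table.
// All table and column names are hypothetical.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, to_date}

object StagingToLanding {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("staging-to-landing")
      .enableHiveSupport()
      .getOrCreate()

    // Staging table loaded from the RDBMS via Sqoop.
    val staged = spark.table("staging.transactions")

    // Basic cleansing plus a derived partition column.
    val cleaned = staged
      .filter(col("amount").isNotNull)
      .withColumn("txn_date", to_date(col("txn_ts")))

    // Publish to the final landing zone as Parquet, partitioned by date.
    cleaned.write
      .mode("overwrite")
      .partitionBy("txn_date")
      .format("parquet")
      .saveAsTable("landing.transactions")

    spark.stop()
  }
}
```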
