Hadoop Developer Resume

Phoenix, AZ

SUMMARY:

  • Over 6 years of overall IT experience, including 5 years in application integration and management in Cloud and Big Data.
  • Expertise in Hadoop, HDFS, MapReduce and the Hadoop ecosystem, including Hive, HBase, HBase-Hive integration, Pig, Sqoop, Flume, Oozie and ZooKeeper, with knowledge of the MapReduce/HDFS framework.
  • Good understanding of Hadoop architecture and its components, such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, Application Master, Resource Manager and Node Manager, and of the MapReduce programming paradigm.
  • Knowledge of NoSQL databases like MongoDB and Cassandra.
  • Experience in shell scripting, used extensively with Spark for data processing.
  • Developed various cross-platform products while working with Hadoop file formats such as SequenceFile, RCFile, ORC, Avro and Parquet.
  • Experience in Apache Spark, Spark Streaming, Spark SQL and NoSQL databases like Cassandra and HBase.
  • Analyzed data through HiveQL, Pig Latin and MapReduce programs in Java. Extended Hive and Pig core functionality by implementing custom UDFs.
  • Experience in AWS services like EMR, EC2 and S3.
  • Good exposure and experience in Spark, Scala, Big Data and AWS Stack.
  • Hands-on experience creating RDDs and DataFrames for the required input data and performing data transformations using Spark and Scala.
  • Hands-on experience with Cloudera and MapR Hadoop environments.
  • Experience in writing Maven and SBT scripts to build and deploy Java and Scala Applications.
  • Good understanding of Hadoop administration with Cloudera & MapR.
  • Used Talend Open Studio to load files into Hive tables and performed ETL aggregations in Hive.
  • Experience in performance tuning and monitoring of the Hadoop cluster by gathering and analyzing data on the existing infrastructure using Cloudera Manager.
  • Involved in production monitoring using workflow monitor and experience in development and support environments.
  • Experienced with Waterfall, Agile and Scrum software development models.
  • Strong knowledge of version control systems like SVN and Git.

TECHNICAL SKILLS:

Hadoop: HDFS, Spark, Flume, Kafka, Oozie.

Hadoop Distributions: Cloudera, Hortonworks, Azure HDInsight.

Languages: Java, Scala, Python, Linux Shell Scripting, Azure PowerShell.

Scripts: JavaScript, Shell Scripting.

Database: Oracle 10g, MySQL, MSSQL.

NoSQL Database: HBase, Cassandra, MongoDB.

Web Servers: Apache Tomcat.

Operating Systems: Windows, Linux (CentOS, Ubuntu).

PROFESSIONAL EXPERIENCE:

Confidential, Phoenix, AZ

Hadoop Developer

Roles & Responsibilities:

  • Developed various main and service classes in Scala using Spark SQL for requirement-specific tasks.
  • Worked on a scalable cluster (40 GB bandwidth nodes) with a two-step process: data ingestion followed by data processing.
  • Hands-on coding in Scala to leverage Apache Spark through the Scala APIs.
  • Performed Data preprocessing and data cleaning using Hive & Pig.
  • Populated HBase tables through automation and used the Spark shell to query the database.
  • Worked extensively with Hive joins and used HQL to query the databases, eventually developing complex Hive UDFs.
  • Checked the veracity of data from various sources: sequence files, Avro, text, Hive tables, batch streams and log files.
  • Wrote wrapper scripts in Unix shell for automation.
  • Used Maven and SBT as build tools.
  • Involved in data validation and fixing discrepancies in coordination with the Data Integration team.
  • Implemented Spark batch jobs.
  • Developed tasks and set up the required environment for running Hadoop in the cloud on various instances.
  • Automated complex workflows using the Oozie workflow scheduler.

Environment: MapR Hadoop Distribution, Hive, Python, HBase, Sqoop, Maven builds, Spark, Spark SQL, Oozie, Linux/Unix, Shell Scripting, Git.

Confidential, Irving, TX

Hadoop Developer

Roles & Responsibilities:

  • Worked in a team managing a 30-node cluster and expanded it by adding nodes; the additional data nodes were configured through the Hadoop commissioning process.
  • Monitored cluster status and health daily using the Hue UI.
  • Created data import pipelines from MySQL and Oracle into HDFS using Sqoop.
  • Stored data in AWS S3, used similarly to HDFS, and performed EMR operations on data stored in S3.
  • Scripted AWS environments using a secured VPC, various data pipelines and a Redshift cluster.
  • Wrote Hadoop MapReduce jobs in Java for processing data on HDFS.
  • Wrote Spark applications in Scala using Spark Core, DataFrames and Spark SQL.
  • Imported data from HDFS/HBase into Spark RDDs.
  • Created Hive tables and implemented partitioning, dynamic partitions, buckets and created external tables to optimize performance.
  • Extensively involved in loading data from the UNIX file system to HDFS.
  • Evaluated business requirements and prepared detailed specifications, following project guidelines, for the programs to be developed.
  • Performed CRUD operations in HBase.
  • Developed Hive queries to process the data.
  • Used Oozie to automate events for data ingestion and processing.
  • Generated aggregations, groupings and visualizations using Tableau.

Environment: HDFS, MapReduce, Sqoop, Hive, Pig, Oozie, HBase, Yarn, Spark, Tableau, Cloudera Manager.

Confidential, Bentonville, AR

Hadoop Developer

Roles & Responsibilities:

  • Involved in all phases of installing and upgrading the Hadoop big data platform, and implemented security for the platform.
  • Designed the sequence diagrams to depict the data flow into Hadoop.
  • Involved in importing and exporting data between HDFS and Relational Systems like Oracle, MySQL and DB2 using Sqoop.
  • Prepared SOPs for product installations, upgrades and other new processes. Analyzed encryption methodologies and implemented them in the environment.
  • Set up best practices for monitoring. Analyzed hardware and software requirements for the projects.
  • Helped the Application and Operations teams troubleshoot performance issues.
  • Implemented partitioning, dynamic partitions and bucketing in Hive for efficient data access.
  • Created final tables in Parquet format and used Impala to create and manage Parquet tables.
  • Implemented data ingestion and cluster handling for real-time processing using Apache Kafka.
  • Created Hive tables, loaded them with data and wrote Hive queries that run internally as MapReduce jobs.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.

Environment: Hadoop, HDFS, Pig, Hive, Spark, MapReduce, Java.

Confidential

Hadoop Engineer

Roles & Responsibilities:

  • Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
  • Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required. Applied ETL and BI concepts and testing methodologies.
  • Analyzed system failures, identified root causes and recommended courses of action.
  • Designed a data warehouse using Hive.
  • Loaded data into HDFS using Hive and MapReduce.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Worked extensively with Sqoop for importing metadata from Oracle.
  • Extensively used Pig for data cleansing.
  • Created partitioned tables in Hive.
  • Worked with business teams and created Hive queries for ad hoc access.
  • Evaluated usage of Oozie for Workflow Orchestration.
  • Mentored the analyst and test teams in writing Hive queries.
  • Wrote MapReduce programs with the Java API to cleanse structured and unstructured data.
  • Worked on loading the data from MySQL to HBase where necessary using Sqoop.
  • Launched Amazon EC2 instances using Amazon Machine Images (Linux/Ubuntu) and configured the launched instances for specific applications.

Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, AWS, Java, Oozie, MySQL.

Confidential

Associate Engineer

Roles & Responsibilities:

  • Created database objects like tables, views, indexes, stored procedures, triggers and user-defined functions
  • Wrote stored procedures and SQL scripts in SQL Server to implement business rules for various clients
  • Wrote T-SQL queries for data retrieval
  • Wrote and debugged T-SQL, stored procedures, views and user-defined functions
  • Migrated data (import and export via BCP) from text files to SQL Server
  • Implemented error handling using TRY...CATCH blocks
  • Performed normalization and denormalization of tables
  • Developed backup and restore scripts for SQL Server 2008
  • Installed and configured SQL Server 2008 with latest service packs
  • Customized the stored procedures and database triggers to meet the changing business rules
  • Implemented indexes for performance tuning.
  • Wrote triggers, stored procedures and T-SQL queries to capture updated and deleted data from OLTP systems
  • Designed data models using Erwin. Developed physical data models and created DDL scripts to create database schema and database objects
  • Wrote T-SQL queries using inner, outer, self and merge joins, and implemented duplicate-record removal using CTEs and ranking functions.

Environment: MS SQL Server 2008, T-SQL, DTS, SQL Server Enterprise Manager, SQL Profiler.

Environment: SSIS/SSRS, T-SQL, SQL Server 2008, MS Excel, MS Office 2007, Windows 7.