
Hadoop Developer Resume

Charlotte, NC

SUMMARY

  • Around 8 years of professional IT experience, including experience in developing, configuring, and implementing Hadoop and Big Data ecosystems on various platforms.
  • Experience working with Apache Hadoop components such as HDFS, MapReduce, Hive, HBase, Pig, Sqoop, Spark, and Flume for Big Data and Big Data analytics.
  • Experience in analyzing data using HiveQL (HQL) and Pig Latin.
  • Experience in managing and reviewing Hadoop log files.
  • Worked with Sqoop to import/export data between relational databases and Hadoop, and used Flume to collect data and populate HDFS.
  • Experience working with both Hadoop 1.0 and Hadoop 2.0 (YARN).
  • In-depth understanding of Hadoop architecture and its components, including HDFS, NameNode, DataNode, JobTracker, TaskTracker, and MapReduce concepts.
  • Worked on data ingestion from SQL Server into the data lake using Sqoop and shell scripts.
  • Involved in transferring data between HDFS and RDBMS using Sqoop.
  • Excellent knowledge of Spark Core architecture.
  • Hands-on expertise in writing RDD transformations and actions using Scala.
  • Created DataFrames and performed analysis using Spark SQL (see the sketch following this list).
  • Good experience in client-side design and validation using HTML, DHTML, and JavaScript.
  • Good knowledge of creating AutoSys jobs and Beatle jobs.
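
A minimal Scala sketch of the RDD transformations/actions and Spark SQL analysis referenced above; the input path, field layout, and table name are illustrative assumptions, not details from an actual project:

    import org.apache.spark.sql.SparkSession

    object SummarySketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("SummarySketch").getOrCreate()
        import spark.implicits._

        // RDD transformations and an action: clean raw lines, then count them
        val lines = spark.sparkContext.textFile("hdfs:///data/raw/events.txt") // hypothetical path
        val valid = lines.map(_.trim).filter(_.nonEmpty)
        println(s"valid records: ${valid.count()}")

        // DataFrame + Spark SQL: aggregate events per day
        val events = valid.map(_.split(","))
          .filter(_.length >= 2)
          .map(a => (a(0), a(1)))
          .toDF("event_date", "event_type")
        events.createOrReplaceTempView("events")
        spark.sql("SELECT event_date, COUNT(*) AS cnt FROM events GROUP BY event_date").show()

        spark.stop()
      }
    }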

TECHNICAL SKILLS

Big Data Tools: HDFS, MapReduce, YARN, Hive (HQL), Pig, Sqoop, Flume, Oozie, Kafka, Spark, Hortonworks.

Hadoop Distribution: Cloudera Distribution of Hadoop (CDH).

Web Technologies: HTML, XML, XHTML, JavaScript.

Programming Languages: SQL, PL/SQL, Scala.

Databases: MySQL, Oracle, HBase (NoSQL).

Operating Systems: UNIX, Linux, Windows Variants.

Tools: Eclipse, IntelliJ, SBT, SQL Server Management Studio, GitHub.

PROFESSIONAL EXPERIENCE

Confidential, Charlotte, NC

Hadoop developer

Responsibilities:

  • Involved in migration from Spark 1.x to Spark 2.x.
  • Involved in implementing Spark DataFrames using the Java API from different source systems.
  • Involved in writing AutoSys jobs, including box job and JIL file creation.
  • Used the HDFS FileSystem API to move files between the local file system (LFS) and HDFS.
  • Performed transformations and partitioning on incoming data, and created external Hive tables to store the processed results in Parquet format (see the sketch after this list).
  • Performed data validation, schema validation, and validation of data between staging and final tables in HULC.
  • Managed and monitored Hadoop, Spark, and Hive development issues and bugs.
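
A hedged Scala sketch of the flow described above (transform incoming data, write it partitioned in Parquet, and expose it through an external Hive table); the database, table, columns, and paths are hypothetical, and the Scala DataFrame API is shown where the project used the equivalent Java API:

    import org.apache.spark.sql.{SaveMode, SparkSession}

    object ParquetLoadSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("ParquetLoadSketch")
          .enableHiveSupport()
          .getOrCreate()

        // Hypothetical external Hive table over a Parquet location, partitioned by load_date
        spark.sql(
          """CREATE EXTERNAL TABLE IF NOT EXISTS analytics.transactions_processed (
            |  txn_id STRING, amount DOUBLE)
            |PARTITIONED BY (load_date STRING)
            |STORED AS PARQUET
            |LOCATION 'hdfs:///datalake/processed/transactions'""".stripMargin)

        // Transform incoming data and store the results partitioned, in Parquet format
        val incoming = spark.read.option("header", "true").csv("hdfs:///datalake/incoming/transactions")
        incoming.filter("amount IS NOT NULL")
          .select("txn_id", "amount", "load_date")
          .write.mode(SaveMode.Append)
          .partitionBy("load_date")
          .parquet("hdfs:///datalake/processed/transactions")

        // Register any new partition directories with the Hive metastore
        spark.sql("MSCK REPAIR TABLE analytics.transactions_processed")
        spark.stop()
      }
    }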

Confidential, Charlotte, NC

Hadoop developer

Responsibilities:

  • Created schema documents for Hive tables, and created Hive tables in the data lake bronze layer with and without partitions.
  • Created HQL files to load data from staging tables into permanent tables (see the sketch after this list).
  • Loaded Hive tables from source files through the file ingestion process.
  • Loaded Hive tables from external database server tables through the Sqoop ingestion process.
  • Created config files for the data loading process.
  • Created and scheduled AutoSys jobs for the load process.
  • Performed unit testing of the loading process.
  • Attended daily stand-up calls and provided updates on current task status.
  • Attended sprint planning meetings and provided estimates for upcoming tasks.
  • Supported the application in the production environment and provided fixes for issues.
  • Developed enhancements to the existing application.
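
A minimal Scala/Spark SQL sketch of the staging-to-permanent load described above; the resume refers to standalone HQL files, but the same statements are shown here executed through Spark's Hive support, with hypothetical database, table, and column names:

    import org.apache.spark.sql.SparkSession

    object StagingToPermanentSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("StagingToPermanentSketch")
          .enableHiveSupport()
          .getOrCreate()

        // Hypothetical bronze-layer target table, partitioned by business_date
        spark.sql(
          """CREATE TABLE IF NOT EXISTS bronze.customer (
            |  customer_id STRING, name STRING, city STRING)
            |PARTITIONED BY (business_date STRING)
            |STORED AS PARQUET""".stripMargin)

        // Load from the staging table into the permanent table using dynamic partitioning
        spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
        spark.sql(
          """INSERT OVERWRITE TABLE bronze.customer PARTITION (business_date)
            |SELECT customer_id, name, city, business_date
            |FROM staging.customer_stg""".stripMargin)

        spark.stop()
      }
    }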

Confidential, Boston, MA

Hadoop Developer

Responsibilities:

  • Helped business processes by developing, installing and configuring Hadoop ecosystem components that moved data from individual servers to HDFS.
  • Assessed existing and available data warehousing technologies and methods to ensure the data warehouse/BI architecture met the needs of the business unit and the enterprise and allowed for business growth.
  • Captured data from existing databases that provide SQL interfaces using Sqoop.
  • Worked extensively with Sqoop to import and export data between HDFS and relational database systems/mainframes, and to load data into HDFS.
  • Created Hive queries (HQL) that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
  • Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into HDFS and Pig to pre-process the data.
  • Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
  • Worked with Spark and Scala.
  • Responsible for end-to-end design and development with Spark SQL to meet requirements.
  • Worked with the Spark ecosystem using Spark SQL and Scala queries on different file formats such as text and CSV (see the sketch after this list).
  • Queried data using Spark SQL on top of the Spark engine, implementing Spark RDDs in Scala.
  • Performed DataFrame and Dataset operations on RDDs.
  • Involved in converting MapReduce programs into Spark transformations using Spark RDDs.
  • Managed and reviewed Hadoop log files.
  • Tested raw data and executed performance scripts.
  • Shared responsibility for administration of Hadoop, Hive and Pig.
  • Developed Hive queries for the analysts.
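
A short Scala sketch along the lines of the Spark SQL and RDD work listed above; the file paths, columns, and key extraction are illustrative assumptions:

    import org.apache.spark.sql.SparkSession

    object SparkSqlOnCsvSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("SparkSqlOnCsvSketch").getOrCreate()
        import spark.implicits._

        // A MapReduce-style aggregation expressed as Spark RDD transformations
        val hitsByIp = spark.sparkContext
          .textFile("hdfs:///data/weblogs/access.log")   // hypothetical text input
          .map(line => (line.split(" ")(0), 1))          // key by the first field (client IP)
          .reduceByKey(_ + _)
        hitsByIp.toDF("client_ip", "hits").show()

        // Spark SQL over a CSV-backed DataFrame
        val trades = spark.read.option("header", "true").csv("hdfs:///data/market/trades.csv")
        trades.createOrReplaceTempView("trades")
        spark.sql("SELECT symbol, COUNT(*) AS trade_count FROM trades GROUP BY symbol").show()

        spark.stop()
      }
    }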

Tools Used: Hadoop, HDFS, Hive (HQL), Sqoop, Oozie, Spark, Spark SQL, Cloudera, PL/SQL, SQL*Plus, Windows NT, UNIX shell scripting.

Confidential

Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Checked Hadoop daemon services and responded to any warning or failure conditions.
  • Deployed Hadoop clusters in different modes: standalone, pseudo-distributed, and fully distributed.
  • Wrote shell scripts to monitor the health of Hadoop daemon services and respond to warning or failure conditions.
  • Developed MapReduce programs to parse the raw data, populate staging tables, and store the refined data in partitioned tables in the EDW.
  • Replaced Hive's default Derby metastore with MySQL.
  • Executed queries using Hive and developed MapReduce jobs to analyze data.
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
  • Implemented best practices to create Hive tables with appropriate partitioning methods and to keep data processing consistent with enterprise standards. Developed scripts and batch jobs to schedule various Hadoop programs.
  • Developed Hive queries for the analysts and for data analysis to meet business requirements.
  • Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig.
  • Implemented the Fair Scheduler on the JobTracker to share cluster resources among users' MapReduce jobs.
  • Developed MapReduce, Hive, and Pig scripts to process data.
  • Imported data from RDBMS to HDFS using Sqoop.
  • Involved in Hive and HBase integration using the HBase storage handler.
  • Developed a custom Flume event sink responsible for collecting data in real time and storing it in a cache for analysis.
  • Analyzed the data using Hadoop ecosystem tools such as Hive and Flume.

Tools Used: Hadoop, Cloudera, Hive, HBase, SQL, HQL, Flume, Kafka, Oozie, Sqoop, Linux, MapReduce, HDFS, MapR, Java, MySQL, Hortonworks.

Confidential

PL/SQL Developer

Responsibilities:

  • Developed personal projects with guidance from database veterans as mentors.
  • Created databases.
  • Gained knowledge and experience working with relational database management systems, including normalization, stored procedures, constraints, querying, joins, keys, indexes, complex views, dynamic SQL, triggers, and cursors.
  • Worked with DDL and DML statements, RDBMS concepts, data dictionaries, and normal forms.
  • Wrote stored procedures using temporary tables, views, indexes, and triggers when required, as well as complex queries including correlated subqueries and queries with complex joins and aggregate functions.
  • Used the TRY...CATCH block introduced in SQL Server 2005.
  • Wrote complex T-SQL queries using inner joins, outer joins, and cross joins.
  • Performed SQL Server administration tasks including backups, disaster recovery, database maintenance, user authorization, and database creation.
  • Worked on cross-browser compatibility across browsers such as Google Chrome, Mozilla Firefox, Internet Explorer, and Safari.

Tools Used: SQL Server 2000/2005, SSIS, Microsoft Office, Windows 2003/XP.
