Hadoop Developer Resume

Durham, NC

PROFESSIONAL SUMMARY:

  • 8+ years of professional IT experience across all phases of the Software Development Life Cycle, including hands-on experience in Big Data analytics.
  • Experience in analysis, design, development, and integration using Big Data Hadoop technologies such as MapReduce, Hive, Pig, Sqoop, Oozie, Kafka, HBase, AWS, Cloudera, Hortonworks, Impala, and Avro, along with general data processing and SQL.
  • Good knowledge of Hadoop architecture and its components, such as HDFS, MapReduce, JobTracker, TaskTracker, NameNode, and DataNode.
  • Hands-on experience installing, configuring, and using Hadoop ecosystem components such as HDFS, Hive, Spark, Scala, Spark SQL, MapReduce, Pig, Sqoop, Flume, HBase, ZooKeeper, Oozie, and Tidal.
  • Hands-on experience developing PySpark jobs for data cleaning and pre-processing.
  • Extensive Hadoop experience covering data storage, query writing, processing, and analysis.
  • Defined the scope of automation, selected tools, and designed, developed, and maintained the automation framework.
  • Experience extending Pig and Hive functionality with custom UDFs for data analysis and file processing, using Pig Latin scripts and Hive Query Language.
  • Experience working with Amazon AWS services including EC2, EMR, S3, RDS, EBS, Elastic Beanstalk, and CloudWatch.
  • Good knowledge of job scheduling and coordination tools such as Oozie and ZooKeeper.
  • Expertise working with various databases, writing SQL queries, stored procedures, functions, and triggers using PL/SQL and SQL.
  • Experience with column-oriented NoSQL databases such as HBase and their integration with Hadoop clusters.
  • Experience installing, configuring, supporting, and managing Hadoop clusters using Apache and Cloudera (CDH 5.x) distributions and on Amazon Web Services (AWS).
  • Experience developing Spark jobs in Scala in test environments for faster data processing, with Spark SQL used for querying.
  • Good understanding of Spark Streaming with Kafka for real-time processing.

TECHNICAL SKILLS:

Programming Languages: SQL, PL/SQL, Pig Latin, Scala

Hadoop: HDFS, MapReduce, HBase, Hive, Pig, Impala, Sqoop, Flume, Oozie, Spark, Spark SQL, ZooKeeper, PySpark, AWS, Cloudera, Hortonworks, Kafka, Avro.

Scripting Languages: Python 2.7 & 3.0, Scala and Shell scripting.

RDBMS: Oracle, Microsoft SQL Server, MySQL.

NoSQL: HBase

IDEs: PyCharm, Eclipse, and IntelliJ.

Operating Systems: Linux, Windows, UNIX, CentOS.

Methodologies: Agile, Waterfall.

Other Tools: Attunity Replicate & Compose, Tidal, SVN, Apache Ant, TOAD, PL/SQL Developer, JIRA, Visual Studio.

PROFESSIONAL EXPERIENCE:

Confidential, Durham, NC

Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Developed Spark jobs and Hive Jobs to summarize and transform data.
  • Developed Spark scripts for data analysis in both Python and Scala.
  • Built on-premises data pipelines using Kafka and Spark for real-time data analysis (a streaming sketch follows this list).
  • Analysed SQL scripts and designed the equivalent solution in Scala.
  • Implemented complex Hive UDFs to execute business logic within Hive queries (a UDF sketch follows the environment list below).
  • Migrated MapReduce programs into Spark transformations using Spark and Scala; initial versions were written in Python (PySpark).
  • Evaluated the performance of Spark SQL vs. Impala vs. Drill on offline data as part of a proof of concept.
  • Imported data from various sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and loaded the results back into HDFS.
  • Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
  • Created DataFrames with a defined schema from raw data stored in Amazon S3 and Lambda using PySpark.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Responsible for developing data pipeline by implementing Kafka producers and consumers.
  • Worked on ETL scripts and fixed issues arising during data loads from various data sources.
  • Implemented Spark applications in Scala using higher-order functions for both batch and interactive analysis requirements.
  • Developed a program to extract named entities from ORC files.
  • Used Git for version control.
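
Illustrative sketch (not from the actual project): the Kafka-to-Spark real-time pipeline mentioned above could look roughly like the following Structured Streaming job in Scala. The broker address, topic name, payload schema, and output paths are assumptions made up for the example.

// Minimal Spark Structured Streaming job that reads events from Kafka and
// writes parsed records to HDFS. Broker, topic, schema, and paths are
// illustrative placeholders, not values from any actual project.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{StringType, StructType, TimestampType}

object KafkaToHdfsStream {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-to-hdfs")
      .getOrCreate()

    // Assumed JSON payload: {"userId": "...", "event": "...", "ts": "..."}
    val eventSchema = new StructType()
      .add("userId", StringType)
      .add("event", StringType)
      .add("ts", TimestampType)

    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092") // placeholder broker
      .option("subscribe", "clickstream")                // placeholder topic
      .load()

    // Kafka delivers the payload as bytes; cast to string and parse the JSON.
    val events = raw
      .select(from_json(col("value").cast("string"), eventSchema).as("e"))
      .select("e.*")

    events.writeStream
      .format("parquet")
      .option("path", "/data/clickstream")               // placeholder HDFS path
      .option("checkpointLocation", "/checkpoints/clickstream")
      .start()
      .awaitTermination()
  }
}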

Environment: Cloudera, Hadoop, HDFS, AWS, PIG, Hive, Impala, Spark-SQL, MapReduce, Flume, Sqoop, Oozie, Kafka, Spark, Scala, PySpark, Shell Scripting, HBase, ZooKeeper.
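
Illustrative sketch: the complex Hive UDFs mentioned in the responsibilities above are not shown in the resume; a minimal Hive UDF of that kind, written here in Scala (the choice of language is an assumption), might look like the following. The class name and masking logic are invented for the example.

import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Simple Hive UDF that masks all but the last four characters of an ID.
// Purely illustrative; not the business logic from the actual project.
class MaskId extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) return null
    val s = input.toString
    val visible = 4
    val masked =
      if (s.length <= visible) s
      else "*" * (s.length - visible) + s.takeRight(visible)
    new Text(masked)
  }
}

Such a UDF would typically be packaged into a JAR, registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION, and then called from a Hive query like any built-in function.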

Confidential, Nashville, TN

Sr. Hadoop Developer

Responsibilities:

  • Migrated Python scripts from Pivotal Hadoop to Hortonworks Hadoop.
  • Built functions and views in Hive to load data from various sources into the system.
  • Developed multiple PySpark jobs for data cleaning and pre-processing.
  • Built Sqoop jobs to load data from various RDBMS sources.
  • Wrote Python scripts to execute functions that load data from external to managed tables in HAWQ and Hive (a sketch of this pattern follows this list).
  • Troubleshot and fixed production issues.
  • Analysed HQL scripts and designed the solution to implement them using Python.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Managed and reviewed Hadoop log files to resolve configuration issues.
  • Implemented complex Hive UDFs to execute business logic within Hive queries.
  • Used TFS for version control.
  • Used Tidal (enterprise scheduler) for scheduling daily, weekly, and monthly jobs.
  • Provided 24/7 on-call production support.
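
Illustrative sketch: the external-to-managed table loads above were implemented in Python against HAWQ and Hive; the same pattern is sketched below in Spark SQL with Scala, the single example language used in this document. All database objects, columns, and paths are invented.

// Hedged sketch of loading from an external (staging) Hive table into a
// managed, partitioned table with Spark SQL. Names and locations are made up.
import org.apache.spark.sql.SparkSession

object ExternalToManagedLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("external-to-managed-load")
      .enableHiveSupport()
      .getOrCreate()

    // External table over files dropped by an upstream feed (placeholder location).
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS orders_ext (
        |  order_id STRING, customer_id STRING, amount DOUBLE, order_dt STRING)
        |ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
        |LOCATION '/landing/orders'""".stripMargin)

    // Managed table owned by the warehouse, partitioned by load date.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS orders (
        |  order_id STRING, customer_id STRING, amount DOUBLE)
        |PARTITIONED BY (order_dt STRING)
        |STORED AS PARQUET""".stripMargin)

    // Allow fully dynamic partition inserts.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    // Move cleaned rows from the external staging table into the managed table.
    spark.sql(
      """INSERT OVERWRITE TABLE orders PARTITION (order_dt)
        |SELECT order_id, customer_id, amount, order_dt
        |FROM orders_ext
        |WHERE amount IS NOT NULL""".stripMargin)

    spark.stop()
  }
}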

Environment: PySpark, Spark, HAWQ, Hive, Tidal, PL/SQL, shell scripting, and TFS.

Confidential, Nashville, TN

Hadoop/Spark Developer

Responsibilities:

  • Worked with Hadoop Ecosystem components like Sqoop, Flume, Oozie, Hive and Pig.
  • Developed Pig and Hive UDFs in Java to extend Pig and Hive, and wrote Pig scripts for sorting, joining, filtering, and grouping data.
  • Developed Spark programs using Scala, created Spark SQL queries, and developed Oozie workflows for Spark jobs.
  • Developed Oozie workflows with Sqoop actions to migrate data from relational databases such as Oracle and Teradata to HDFS.
  • Developed Hive queries to analyse the data and generate the end reports used by business users.
  • Implemented Spark applications in Scala, utilizing DataFrames and the Spark SQL API for faster data processing.
  • Developed Spark code and Spark SQL/Streaming jobs for faster testing and processing of data.
  • Developed a data pipeline using Kafka, Cassandra, and Hive to ingest, transform, and analyse customer behavioural data.
  • Strong familiarity with Hive joins; used HQL for querying the databases, eventually extending it with complex Hive UDFs.
  • Developed shell scripts to automate ETL execution in a Unix environment.
  • Migrated iterative MapReduce programs into Spark transformations using Spark and Scala.
  • Used Scala to write the code for all the use cases in Spark and Spark SQL.
  • Implemented Spark applications in Scala using higher-order functions for both batch and interactive analysis requirements; implemented Spark batch jobs.
  • Worked with the Spark Core, Spark Streaming, and Spark SQL modules.
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala (see the DataFrame sketch at the end of this list).
  • Explored various Spark modules, working with DataFrames, RDDs, and SparkContext.
  • Developed a data pipeline using Spark and Hive to ingest, transform, and analyse data.
  • Developed Spark scripts using Scala shell commands as per requirements.
  • Tuned Spark application performance by setting the right batch interval, the correct level of parallelism, and appropriate memory settings.
  • Analysed SQL scripts and designed the solution to implement them using Scala.
  • Responsible for developing a data pipeline with Amazon AWS to extract data from weblogs and store it in HDFS.
  • Implemented schema extraction for Parquet and Avro file formats in Hive.
  • Implemented partitioning, dynamic partitions, and buckets in Hive.
  • Worked with AWS cloud services such as EC2, S3, EMR, and RDS.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them (see the sketch after the environment list below).
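
Illustrative sketch: converting a Hive/SQL query into Spark transformations, as described in the list above, can be shown by expressing the same aggregation once as Spark SQL and once as DataFrame operations. The table name, columns, and filter date are invented for the example.

// Hedged sketch: the same aggregation written as a Hive/Spark SQL query and as
// equivalent DataFrame transformations. Table and column names are invented.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, sum}

object SqlToDataFrames {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sql-to-dataframes")
      .enableHiveSupport()
      .getOrCreate()

    // HQL-style version: total spend per customer for one load date.
    val viaSql = spark.sql(
      """SELECT customer_id, SUM(amount) AS total_amount
        |FROM orders
        |WHERE order_dt = '2017-01-01'
        |GROUP BY customer_id""".stripMargin)
    viaSql.show(5)

    // Equivalent DataFrame transformations producing the same result.
    val viaDataFrames = spark.table("orders")
      .filter(col("order_dt") === "2017-01-01")
      .groupBy("customer_id")
      .agg(sum("amount").as("total_amount"))

    // Persist one of the two equivalent plans as Parquet (placeholder path).
    viaDataFrames.write.mode("overwrite").parquet("/analytics/customer_totals")

    spark.stop()
  }
}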

Environment: Hadoop YARN, Spark Core, Spark Streaming, Spark SQL, Scala, Kafka, Hive, Sqoop, Amazon AWS, Oozie, Cloudera, Oracle, Linux.
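
Illustrative sketch: the S3-to-RDD import in the last bullet above, expressed as a small Scala job that reads weblog lines from S3, applies transformations, and runs an action. The bucket, path, log layout, and the availability of the s3a connector (hadoop-aws) are assumptions.

// Hedged sketch: read weblog lines from S3 into an RDD, apply transformations,
// and run an action. Bucket, path, and log format are illustrative assumptions.
import org.apache.spark.sql.SparkSession

object S3LogCounts {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("s3-log-counts")
      .getOrCreate()
    val sc = spark.sparkContext

    // Assumed space-delimited log layout: <timestamp> <status> <url> ...
    val lines = sc.textFile("s3a://example-bucket/weblogs/2017/*")

    // Transformations: parse, keep server-error responses, count per URL.
    val errorCountsByUrl = lines
      .map(_.split(" "))
      .filter(fields => fields.length >= 3 && fields(1).startsWith("5"))
      .map(fields => (fields(2), 1))
      .reduceByKey(_ + _)

    // Action: bring the top offenders back to the driver and print them.
    errorCountsByUrl
      .sortBy({ case (_, count) => count }, ascending = false)
      .take(10)
      .foreach { case (url, count) => println(s"$url -> $count") }

    spark.stop()
  }
}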

Confidential

SQL Developer

Responsibilities:

  • Planned, designed, and implemented application database code objects such as stored procedures and views.
  • Built and maintained SQL scripts, indexes, and complex queries for data analysis and extraction.
  • Provided database coding to support business applications using PL/SQL.
  • Performed quality assurance and testing of the SQL environment.
  • Developed new processes to facilitate data import and normalization, including data files for counterparties.
  • Worked with business stakeholders, application developers, production teams, and other functional units to identify business needs and discuss solution options.
  • Ensured best practices were applied and data integrity was maintained through security, documentation, and change management.

Environment: PL/SQL, XML, CSS.
