Hadoop Developer Resume

Bentonville, AR

PROFESSIONAL SUMMARY:

  • 5+ years of professional experience in Big Data technologies such as HDFS, MapReduce, Hive, Sqoop, Oozie, Spark, Spark SQL, and Spark Streaming, and in object-oriented and functional programming languages such as Java, Python, and Scala.
  • Worked on end-to-end data pipelines for individual big data analytics applications as well as for data warehousing, providing data in a ready-to-use form to data scientists who run queries and algorithms against it for predictive analytics, machine learning, and data mining.
  • Worked extensively with Spark Streaming to build real-time data ingestion pipelines.
  • Experienced in Data Ingestion using Sqoop and IBM queue.
  • Expertise in designing ETL data models on the Hadoop framework using Hive.
  • Handled job workflow and job coordination on the cluster using Oozie.
  • Experienced in Hadoop development on the Cloudera and Hortonworks distributions and on Amazon Web Services (AWS), including EC2 and S3.
  • Fine-tuned and enhanced the performance of MapReduce jobs. Developed analytical components using Scala and Spark.
  • Hands-on experience writing Linux/Unix shell scripts and Python scripts.
  • Ability to blend technical expertise with strong conceptual and analytical skills to provide quality solutions.
  • Experience in understanding Hadoop security requirements and integrating with Kerberos authentication and authorization infrastructure.
  • Expertise in Java concepts such as the Collections framework, garbage collection, and exception handling, as well as Scala collections.

TECHNICAL SKILLS:

Big Data/Data Engineering Technologies: HDFS, MapReduce, Hue, Hive, Spark, Pig, Sqoop, YARN, Oozie, Kafka, Flume, ZooKeeper, Hadoop distribution platforms (Hortonworks, Cloudera)

Programming Languages: C, Java, Python, Scala, Shell scripting

Java Technologies: Java, J2EE

Data Stores/Databases: Cassandra, HBase, Vertica, Teradata, MySQL, Oracle

Reporting Tools: Tableau, Business Objects

IDEs: IntelliJ, PyCharm, Anaconda Jupyter, Eclipse, NetBeans, RStudio

Tools: JIRA, JUnit, Maven, sbt

Version Control Systems: Git, SVN, GitHub

PROFESSIONAL EXPERIENCE:

Confidential, Bentonville, AR

Hadoop Developer

Responsibilities:

  • Developed YAML-based ETL jobs that provide fault-tolerant data ingestion, integration, and transformation using Sqoop, Hive, Spark, and shell scripting.
  • Developed a batch data ingestion pipeline using Sqoop and Hive to ingest historical and incremental data, transform it, and persist it in Hive for downstream analysis of supply chain data.
  • Ensured lineage and metadata were captured at every hop as data moved from its true source to the data lake.
  • Automated all ETL jobs with both event-based and schedule-based triggers using the Automic tool.
  • Working extensively on Hive queries to load data from sources such as Teradata, DB2, Oracle, and mainframe systems.
  • Migrated the transformed data to Azure Data Lake and to Google Cloud Platform (GCP), depending on business needs, for consumption by users.
  • Improved load performance and query latency by tuning queries, applying cost-based optimizations, persisting data in Hive in ORC format with appropriate compression and no data loss, using Hive dynamic partitioning, and tuning set parameters, mappers, and reducers.
  • Created shell scripts to identify incremental loads, validate data after loading, and reduce runtime when loading data to ADLS (Azure Data Lake Storage) to meet SLAs.
  • Created a Spark application in Scala to validate data between source and target.
  • Providing on-call support for all production jobs on alternate weeks.
  • Working in an Agile environment to meet product delivery deadlines.
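
The source-to-target validation mentioned in the list above can be illustrated with a minimal sketch. This is plain Python rather than the Spark/Scala application itself, and it assumes that "validation" means comparing row counts and row contents regardless of order; the function names `row_fingerprint` and `validate` are hypothetical.

```python
import hashlib
from collections import Counter


def row_fingerprint(row):
    """Return a stable hash for one row (hypothetical helper).

    repr() keeps None, "" and numeric types distinguishable from each other.
    """
    joined = "\x1f".join(repr(value) for value in row)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def validate(source_rows, target_rows):
    """Compare two collections of rows, ignoring row order.

    Counters are used so that duplicated rows are counted, not collapsed.
    """
    src = Counter(row_fingerprint(r) for r in source_rows)
    tgt = Counter(row_fingerprint(r) for r in target_rows)
    missing = sum((src - tgt).values())      # rows in source but not in target
    unexpected = sum((tgt - src).values())   # rows in target but not in source
    return {
        "source_count": sum(src.values()),
        "target_count": sum(tgt.values()),
        "missing_in_target": missing,
        "unexpected_in_target": unexpected,
        "match": missing == 0 and unexpected == 0,
    }


if __name__ == "__main__":
    source = [(1, "a"), (2, "b"), (3, None)]
    target = [(3, None), (1, "a")]
    print(validate(source, target))
```

In a distributed setting the same idea would typically be expressed as per-row hashes aggregated on each side, so that only counts and fingerprints are compared rather than full rows.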

Environment: Hadoop, MapReduce, Spark, Hive, Teradata, SQL, ETL, Sqoop, Azure Data Lake Storage, Google Cloud Platform, YAML, Scala, Linux/Unix, Hortonworks Hadoop Distribution Platform.

Confidential, Atlanta, GA

Big Data/Hadoop Developer

Responsibilities:

  • Designed an ETL framework to pull data from SQL Server and load it into Hive for reporting usage metrics and spending aggregates.
  • Implemented the framework using Spark to load user information and related customer data from SQL Server to HDFS using Sqoop.
  • Imported data into Spark and loaded it into Hive tables, ensuring correct format with no data loss.
  • Worked on a batch data ingestion pipeline using Sqoop and Hive to ingest, transform, and analyze customer behavioral data.
  • Developed shell scripts to trigger the Spark job.
  • Used the Spark framework to push batches of data, transform them, and load the results into Hive for batch data processing.
  • Developed Spark programs to perform transformations, joins, filters, and pre-aggregations before storing the data in Hive tables.
  • Tuned queries and applied cost-based optimizations to improve query performance and latency.
  • Developed error-notification code in Scala that emails a specified group when an error occurs while processing the JSON file.
  • Worked with Hadoop security and access controls such as Kerberos.
  • Hands-on experience writing Bash shell scripts on Linux/Unix.
  • Used Cloudera as the Hadoop distribution platform.

Environment: YARN, HDFS, MapReduce, Hive, Sqoop, Spark, Cloudera, Cloudera Manager, Oracle, Oozie, Spark SQL, Cassandra, Tableau, Bash, Tidal, Scala, Python, Java.

Confidential

Software Developer

Responsibilities:

  • Worked on a batch data ingestion pipeline using Apache Sqoop to import data from SQL Server.
  • Imported structured data using Sqoop and correlated it with weblog data to gain insights.
  • Created external Hive tables and performed aggregations to determine metrics such as bounce rate and miss rate.
  • Wrote Hive scripts and extended Hive core functionality with custom UDFs.
  • Experience with software development life cycle processes such as Agile.
  • Reviewed developed code to check for errors.
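
As an illustration of the bounce-rate aggregation mentioned in the list above, here is a minimal sketch in plain Python rather than Hive. It assumes the common web-analytics definition (the share of sessions with exactly one page view) and a hypothetical input of `(session_id, url)` pairs; the actual weblog schema and metric definitions used on the project are not stated.

```python
from collections import Counter


def bounce_rate(page_views):
    """Share of sessions that contain exactly one page view.

    page_views: iterable of (session_id, url) pairs (assumed layout).
    Returns 0.0 when there are no sessions.
    """
    views_per_session = Counter(session_id for session_id, _ in page_views)
    if not views_per_session:
        return 0.0
    bounces = sum(1 for n in views_per_session.values() if n == 1)
    return bounces / len(views_per_session)


if __name__ == "__main__":
    views = [
        ("s1", "/"), ("s1", "/cart"),
        ("s2", "/"),
        ("s3", "/"), ("s3", "/search"),
        ("s4", "/help"),
    ]
    print(bounce_rate(views))
```

The equivalent Hive query would group page views by session, count views per group, and divide the number of single-view sessions by the total number of sessions.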

Environment: Hadoop, MapReduce, Hive, Sqoop, SQL Server, Hortonworks
