
Hadoop/ETL Data Engineer Resume


New York

SUMMARY

  • Over nine years of professional experience in Information Technology, including five years of expertise in Big Data using the Hadoop framework, covering analysis, design, development, testing, documentation, deployment and integration with SQL and Big Data technologies.
  • Expertise in major components of the Hadoop ecosystem, including HDFS, YARN, MapReduce, Hive, Impala, Pig, Sqoop, HBase, Spark, Spark SQL, Kafka, Spark Streaming, Flume, Oozie, Zookeeper and Hue.
  • Good understanding of distributed systems, HDFS architecture, and the internal workings of the MapReduce and Spark processing frameworks.
  • Well experienced in ETL methods for data extraction, transformation and loading in corporate-wide ETL solutions, and in data warehouse tools for reporting and data analysis.
  • Developed data set processes for data modelling and mining; recommended ways to improve data reliability, efficiency and quality.
  • Experience in importing and exporting data between relational database systems and HDFS using Sqoop, and loading it into partitioned Hive tables.
  • Good knowledge of writing MapReduce jobs through Pig, Hive and Sqoop.
  • Extensive knowledge in writing Hadoop jobs for data analysis per business requirements using Hive, including HiveQL queries for data extraction, join operations and custom UDFs, with good experience in optimizing Hive queries.
  • Proficient in Hadoop, PySpark, Core Java and SQL.
  • Hands on experience in HDFS, Map Reduce, Pig, Hive, Sqoop, Oozie.
  • Expert in SQL Querying using Oracle.
  • Hands on Experience in Talend ETL Tool.
  • Proficient in Hive Query Language (HQL) for extracting data from Hive.
  • Working experience in ingesting data onto clusters using Sqoop (incremental) and Flume.
  • Developed customized UDFs in Java to extend Hive functionality (a PySpark analogue is sketched at the end of this summary).
  • Strong analytical and problem-solving skills.
  • Good understanding of object-oriented principles.
  • Conducted design and code reviews.
  • Good understanding of data structures.
  • Strong Knowledge in all phases of Software Development Life Cycle (SDLC).
  • Good written and verbal communication skills and the ability to perform tasks independently as well as in teams.
  • Prepared use case scenarios, business rules engine, functional specifications and technical documents.
  • Experience in troubleshooting Map Reduce jobs and addressing production issues (such as data issues, environmental issues, performance tuning and enhancements).
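
As a minimal illustration of the custom-UDF work noted above: the original UDFs were written in Java for Hive, and the sketch below shows an analogous pattern in PySpark, registering a custom function and using it in a Hive-style join query. The table names, column names and UDF logic are hypothetical placeholders, not taken from any actual engagement.

```python
# Minimal PySpark sketch: register a custom UDF and use it in a Hive-style
# join query. Table and column names (orders, customers, etc.) are
# hypothetical placeholders; the original work used Java UDFs in Hive.
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType

spark = (SparkSession.builder
         .appName("hive-udf-sketch")
         .enableHiveSupport()        # assumes a Hive metastore is configured
         .getOrCreate())

# Custom business rule wrapped as a UDF (analogous to a Hive UDF in Java).
def normalize_region(code):
    return (code or "UNKNOWN").strip().upper()

spark.udf.register("normalize_region", normalize_region, StringType())

# Join two Hive tables and apply the UDF in a plain HiveQL-style query.
result = spark.sql("""
    SELECT c.customer_id,
           normalize_region(c.region) AS region,
           SUM(o.amount)              AS total_amount
    FROM   orders o
    JOIN   customers c ON o.customer_id = c.customer_id
    GROUP  BY c.customer_id, normalize_region(c.region)
""")
result.show()
```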

TECHNICAL SKILLS

Big Data Technology: Hadoop, Hive, Hive LLAP, PySpark, Pig, Sqoop, Oozie, Apache Kafka, Apache Ignite, Zookeeper.

Languages: Shell Scripting, Java, SQL.

Operating System: Windows, Linux.

Databases: Oracle, DB2, Postgres.

Hadoop Distributions: CDH, HDP.

Automation Framework: LISA.

Tools Used: Hue, PuTTY, WinSCP, Eclipse, NetBeans, SQL Developer, SQuirreL, SVN, Autosys, Talend, Bitbucket.

Visualization Tool: Tableau Desktop.

PROFESSIONAL EXPERIENCE

Confidential, New York

Hadoop/ETL Data Engineer

Responsibilities:

  • Evaluated client needs and translated business requirements into functional specifications, thereby onboarding clients onto the Hadoop ecosystem.
  • Extracted and updated data in HDFS using Sqoop import and export.
  • Developed Hive UDFs to incorporate external business logic into Hive scripts and developed join data set scripts using Hive join operations.
  • Requirement gathering and analysis.
  • Created various Hive external tables and staging tables and joined the tables as per the requirement. Implemented static partitioning, dynamic partitioning and bucketing.
  • Worked with various HDFS file formats such as Parquet and JSON for serialization and deserialization.
  • Worked with Spark to improve performance and optimize the existing algorithms in Hadoop using Spark Context, Spark SQL, PySpark, pair RDDs and Spark on YARN. Used Spark for interactive queries, processing of streaming data and integration with popular NoSQL databases for huge volumes of data.
  • Strong knowledge of the architecture and components of Spark; efficient in working with Spark Core and Spark SQL.
  • Experience in using Kafka and Kafka brokers to initiate the Spark context and process live streaming data (see the sketch after this project).
  • Developed custom Kafka producers and consumers for publishing and subscribing to Kafka topics.
  • Used Hue for running Hive queries. Created day-wise partitions in Hive to improve performance.
  • Developed Oozie workflows to run multiple Hive, Pig, Sqoop and Spark jobs.
  • Involved in the Design phase of the project.
  • Data Modelling and Data Warehouse design in Oracle.
  • Implemented and supported the Suddenlink Live Wire implementation using Kafka and Java, which loads customer call data into the KOM database.
  • Performance tuning of ingest scripts that load large volumes of data and of ETL processing scripts.
  • History data migration of Genesys call data from the Suddenlink database to the KOM database.
  • Interacted with the client on technical aspects on a daily basis.
  • Acted as onsite lead, participated in knowledge management activities with the client and was responsible for sharing the same with the offshore team.
  • Developed and maintained shell scripts and SQL.
  • Performance tuning of SQL scripts to improve performance and minimize execution time.
  • Conducted reviews for self and peers.
  • Unit testing and Integration testing.

Environment: Hadoop (HDFS, MapReduce), Yarn, Spark, Hive, Hue, Sqoop, Flume, Oracle, Kafka, Shell Scripting, SQL.
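
A minimal sketch of how live Kafka data can be consumed with Spark Structured Streaming, as referenced in the Kafka bullets above. The broker addresses, topic name and JSON schema are hypothetical placeholders, and the actual project may have used DStream-based Spark Streaming instead.

```python
# Minimal sketch: read a Kafka topic with Spark Structured Streaming.
# Broker list, topic name ("customer-calls") and the schema are
# illustrative assumptions, not project specifics.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

call_schema = StructType([
    StructField("call_id",     StringType(),    True),
    StructField("customer_id", StringType(),    True),
    StructField("call_time",   TimestampType(), True),
])

# Subscribe to the topic; requires the Spark-Kafka integration package.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
       .option("subscribe", "customer-calls")
       .load())

# Kafka values arrive as bytes; cast to string and parse the JSON payload.
calls = (raw.selectExpr("CAST(value AS STRING) AS json")
         .select(from_json(col("json"), call_schema).alias("call"))
         .select("call.*"))

# Write the parsed stream out (console sink here just for illustration).
query = (calls.writeStream
         .format("console")
         .outputMode("append")
         .start())
query.awaitTermination()
```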

Confidential

Hadoop Data Engineer

Responsibilities:

  • Requirement gathering and analysis.
  • Involved in the Design phase of the project.
  • Data Modelling and Data Warehouse design in Hive and Oracle.
  • Involved in the development of ingest ETL into both HDFS and Oracle.
  • Created various Hive external tables and staging tables and joined the tables as per the requirement. Implemented static partitioning, dynamic partitioning and bucketing.
  • Ingesting the usage data into HDFS using Sqoop and into Oracle staging tables using SQL (see the sketch after this project).
  • Scheduling jobs through the Appworx scheduler.
  • Performance tuning of ingest scripts that load large volumes of data and of ETL processing scripts.
  • Conducting reviews for self and peers.
  • Unit testing and Integration testing.

Environment: HDP, Hive, Sqoop, Shell Scripting, Talend, PLSQL.
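
The Sqoop ingestion mentioned above was driven by scripts; below is a small, hypothetical Python wrapper that builds and runs a typical incremental Sqoop import. The JDBC URL, credentials path, table and check column are placeholders, and the exact Sqoop options used in the project may have differed.

```python
# Hypothetical wrapper around a typical Sqoop incremental import.
# JDBC URL, credentials file, table and check column are placeholders.
import subprocess

def sqoop_incremental_import(last_value):
    cmd = [
        "sqoop", "import",
        "--connect", "jdbc:oracle:thin:@dbhost:1521/ORCL",
        "--username", "etl_user",
        "--password-file", "/user/etl/.password",
        "--table", "USAGE_DATA",
        "--target-dir", "/data/staging/usage_data",
        "--incremental", "append",       # only pull rows added since last run
        "--check-column", "USAGE_ID",
        "--last-value", str(last_value),
        "--num-mappers", "4",
    ]
    # Fail loudly if Sqoop returns a non-zero exit code.
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # The last-value watermark would normally come from a control table or file.
    sqoop_incremental_import(last_value=1000000)
```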

Confidential

Hadoop Data Engineer

Responsibilities:

  • Requirement gathering and analysis.
  • Involved in the Design phase of the project.
  • Data Modelling and Data Warehouse design in Hive.
  • Involved in the development of automation tools.
  • Interacted with the client on technical aspects on a daily basis.
  • Acted as onsite lead, participated in knowledge management activities with the client and was responsible for sharing the same with the offshore team.
  • History data migration from Oracle/HANA to Hive using Sqoop.
  • Handling duplicate tables across schemas in Oracle and HANA.
  • Understanding BODS workflows and translating business logic into Spark (Python) scripts (see the sketch after this project).
  • Performance tuning of Spark scripts to match BODS and HANA performance.
  • Conducting reviews for self and peers.
  • Unit testing and Integration testing.

Environment: HDP, Spark, Python, Hive, Sqoop, Shell Scripting.
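
The BODS-to-Spark conversion above largely meant re-expressing lookup and transformation logic as PySpark jobs and tuning them. The sketch below shows one common tuning pattern (broadcasting a small lookup table to avoid a shuffle); the table names, columns and shuffle-partition setting are illustrative assumptions, not project specifics.

```python
# Illustrative PySpark sketch of a BODS-style lookup rewritten as a
# broadcast join, with a basic shuffle-partition tuning knob.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast, col

spark = (SparkSession.builder
         .appName("bods-lookup-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Reduce shuffle partitions for medium-sized data sets (tune per workload).
spark.conf.set("spark.sql.shuffle.partitions", "200")

transactions = spark.table("stg.transactions")      # large fact data (placeholder)
material_dim = spark.table("dim.material")          # small lookup table (placeholder)

# Broadcasting the small dimension avoids shuffling the large fact table.
enriched = (transactions
            .join(broadcast(material_dim), on="material_id", how="left")
            .withColumn("net_amount", col("amount") - col("discount")))

# Persist the result to a Hive table for downstream consumers.
enriched.write.mode("overwrite").saveAsTable("fnd.transactions_enriched")
```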

Confidential

Hadoop Engineer

Responsibilities:

  • Involved in requirement analysis and designing of PAF Outbound.
  • Developed and monitored Aggregate jobs.
  • Developed Hive scripts to create and load data into Hive tables.
  • Developed Shell Scripts for the data movement in the cluster.
  • Developed Oozie Workflows and tested to streamline multiple data ingestion & transformation actions.
  • Created and monitored the Autosys jobs.
  • Involved in Failover activities.

Environment: HDP 2.2, Hive, Oozie, Shell scripting, Java.

Confidential, El Segundo, CA

Hadoop Engineer

Responsibilities:

  • Migrated structured data from warehouse into the cluster using Sqoop.
  • Created external Hive tables and ingested data into them.
  • Created staging tables to store intermediate incremental data.
  • Implemented dynamic partitions to load from the staging tables (see the sketch after this project).
  • Developed HQL to analyze huge data sets.
  • Monitored job performance and status.

Environment: Hive, Sqoop, Map Reduce, Data Torrent (Real Time Streaming Application), UNIX Shell Scripting.
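
The staging-to-target load with dynamic partitions mentioned above typically looks like the HiveQL below (issued here through PySpark for consistency with the other sketches). The database, table and partition column names are hypothetical.

```python
# Sketch of loading a partitioned Hive table from a staging table using
# dynamic partitioning. Table and column names are hypothetical.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("dynamic-partition-load-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Allow fully dynamic partitioning for this session (Hive-compatible settings).
spark.sql("SET hive.exec.dynamic.partition = true")
spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

# External target table partitioned by load date.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS warehouse.sales (
        sale_id   BIGINT,
        store_id  INT,
        amount    DOUBLE
    )
    PARTITIONED BY (load_date STRING)
    STORED AS PARQUET
    LOCATION '/data/warehouse/sales'
""")

# The partition column comes last in the SELECT, so each incremental batch
# from staging lands in its own load_date partition.
spark.sql("""
    INSERT INTO TABLE warehouse.sales PARTITION (load_date)
    SELECT sale_id, store_id, amount, load_date
    FROM   staging.sales_incr
""")
```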

Confidential

Hadoop Engineer

Responsibilities:

  • Involved in the requirement analysis and designing of various VDCS & VDDS components.
  • Developed Sqoop jobs to transfer large volumes of data from Oracle to HDFS.
  • Developed Transformation scripts to transfer data between Hive Tables.
  • Developed and tested Oozie workflows with Sqoop, Hive and Shell actions.
  • Developed scripts to create and load data into Hive tables.
  • Developed Oozie workflows for the pipeline to ingest source data into the staging layer and from staging into the foundation layer.
  • Responsible for functional testing.
  • Involved in daily/weekly status calls with Clients.
  • Logging the bugs in the JIRA tool.
  • Involved in component deployments (regular build process).

Environment: Hadoop, Hive, Pig, Sqoop, Flume, Oozie, Zookeeper, Java.

Confidential, Delaware

Hadoop Engineer

Responsibilities:

  • Migrated transaction and customer data from log files and databases to HDFS.
  • Closely monitored MapReduce programs that parse the log data and structure it in tabular format to facilitate effective querying (see the sketch after this project).
  • Created external Hive tables to analyze the parsed data.
  • Set up cron jobs to delete Hadoop logs, local old job files and cluster temp files.
  • Created buckets to improve the performance of the jobs.
  • Performed unit testing on the jobs.
  • Developed UDFs for various business cases.
  • Debugged and tuned the performance of jobs.
  • Created workflows using Oozie.

Environment: HDFS, Map Reduce, Hive, Sqoop, Java, Oozie, Flume.
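
The log-parsing step described above was implemented as MapReduce programs; below is a small, hypothetical PySpark equivalent showing the same idea of turning raw log lines into queryable columns. The log layout, paths and regular expression are illustrative assumptions.

```python
# Hypothetical sketch of parsing raw log lines into a tabular form so they
# can be analyzed through an external Hive table. The log layout and regex
# are assumptions; the original work used MapReduce for this step.
from pyspark.sql import SparkSession
from pyspark.sql.functions import regexp_extract

spark = (SparkSession.builder
         .appName("log-parse-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Example line: "2015-06-01 10:22:31 TXN12345 CUST987 49.99"
pattern = r"^(\S+ \S+) (\S+) (\S+) (\S+)$"

logs = spark.read.text("/data/raw/transaction_logs/")

parsed = logs.select(
    regexp_extract("value", pattern, 1).alias("event_time"),
    regexp_extract("value", pattern, 2).alias("txn_id"),
    regexp_extract("value", pattern, 3).alias("customer_id"),
    regexp_extract("value", pattern, 4).cast("double").alias("amount"),
)

# Write the structured output where an external Hive table can point at it.
parsed.write.mode("overwrite").parquet("/data/parsed/transactions/")
```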

Confidential, Delaware

Software Engineer

Responsibilities:

  • Design, coding, testing and implementation of new or enhanced processes.
  • Coding in COBOL, COBOL-DB2, batch programs, online programs, JCL, SQL and the code generator tool TELON.
  • Preparing mainframe utilization reports.
  • GCAP batch job monitoring.

Environment: Z/OS, COBOL, JCL, TELON.
