Big Data Engineer Resume

SUMMARY:

  • Over 9 years of experience in IT working for Fortune 500 companies
  • Certified Hadoop Developer with experience in designing and developing data platforms to assist and guide business decisions
  • Expert in designing efficient and reliable ETL data pipelines using Hadoop and Spark
  • Designed near-real-time and batch-oriented ingestion of data into Hadoop Data Lake using Spark Streaming, Kafka and Sqoop
  • Hands-on experience using cloud platforms such as AWS and Azure
  • Experience working with Agile and Waterfall models
  • Experience in business domains like Payments, Banking & Finance, Airlines and Retail
  • An active team player with effective communication and interpersonal skills
  • Translated business requirements into detailed, production-level technical specifications, detailing new features and enhancements to existing business functionality

TECHNICAL SKILLS:

Big Data Platform: Hadoop, Hive, Spark

Programming: Python, Scala, Shell

DBMS: Teradata, Oracle, DB2

Version Control: SVN, GitHub

Cloud: AWS (Lambda, S3, EMR, Athena), Azure (Data Factory)

Orchestration: Oozie

PROFESSIONAL EXPERIENCE:

Confidential

Big Data Engineer

Responsibilities:

  • Develop ETL data pipelines using a combination of tools such as Hive, Spark with Scala, Spark Streaming and Sqoop
  • Develop UDFs in Java to mask sensitive data using hashing algorithms before exposing it to external vendors
  • Develop Pig scripts to deduplicate and merge historical and incremental data
  • Develop an automated JSON generation model from data in Hive using Java MapReduce
  • Develop an automated batch job to migrate data from cluster to cluster in Hadoop
  • Develop Oozie workflows and coordinators to schedule Hadoop jobs
  • Design Azure Data Factory pipelines to supply data to third-party vendors
  • Automate manually run reports using shell scripting
  • Develop an automated alerting system using the Oozie API to provide near-real-time status of the running coordinator instances

Environment: Hadoop, Hive, Spark, Azure

Confidential

Data Engineer

Responsibilities:

  • Design the data ingestion process using Sqoop
  • Design ETL data pipelines using a combination of tools such as Hive, Spark and Impala
  • Develop an automated script to deploy an on-demand Cloudera Hadoop cluster on AWS
  • Design a reporting layer on Athena using AWS Glue, Lambda and S3
  • Develop an approach to create and decommission an on-demand Hadoop cluster on a daily basis

Environment: Hadoop, Hive, Spark, AWS, Lambda

Confidential

Senior Data Engineer

Responsibilities:

  • Design the data ingestion process using Sqoop
  • Design ETL data pipelines using a combination of tools such as Hive, SparkSQL and PySpark
  • Migrate existing production Spark jobs to run via the “Spark Compute as a Service using Apache Livy” framework, which enabled SparkSession sharing and improved performance
  • Design and develop a homogeneous layer in Hive so that various data sources adhere to the same data model
  • Process and load real-time data onto HDFS every 30 minutes using HiveQL
  • Develop reporting queries using OLAP functions on top of the financial data in Hive and publish the results to business users at regular intervals
  • Automate manually run reports using shell scripting and Teradata, scheduled via crontab
  • Migrate Teradata tables to Hadoop using Hive, with orchestration via an internal Python framework
  • Work with the UC4 scheduler and Informatica

Environment: Hadoop, Hive, Teradata, Spark

Confidential, Austin, TX

Consultant

Responsibilities:

  • Design ETL pipelines using Hive and Spark (PySpark)
  • Develop job orchestration using Oozie
  • Refactor data ingestion into Hadoop Data Lake from disparate third-party vendors for better performance using SFTP, gsutil and the Teradata connector for Sqoop
  • Refactor HDFS schema design according to best practices
  • Design a scalable data layout in Hive by choosing the right file formats (Parquet, SequenceFile, ORC) and compression codecs (Snappy, LZO, etc.)
  • Develop SparkSQL code to replace traditional Hive MapReduce jobs
  • Develop automated testing scripts to perform QA

Environment: Hadoop, Spark, Hive

Confidential

Associate

Responsibilities:

  • Provide BI consulting solutions
  • Use Big Data technologies such as Hadoop and Cassandra in BI data delivery
  • Migrate data from existing Teradata systems to a Hortonworks HDInsight cluster on Azure
  • Leverage core expertise in solution design and in managing enterprise-wide BI (Data Warehousing/Data Integration) implementations
  • Perform data analysis over large datasets using Apache Pig, Apache Hive and Spark
  • Design and build data staging and summary (aggregated) area in Hive DW

Environment: Hadoop, Hive, Spark, Azure

Confidential

Associate

Responsibilities:

  • Create Technical Design and ETL mapping documents
  • Lead the offshore team; allocate and track assigned tasks
  • Design DataStage ETL jobs, DataStage Sequences and Shell scripts
  • Unit test DataStage ETL jobs
  • Design the flow of execution using DataStage
  • Tune the performance of SQL queries and DataStage jobs

Environment: DataStage, Teradata, UNIX Scripting

Confidential

Programmer Analyst

Responsibilities:

  • Create Technical Design and ETL mapping documents
  • Perform impact analysis pertaining to DML and DDL changes to the Banking Data Warehouse
  • Prepare DDL and DML scripts
  • Design DataStage ETL jobs, DataStage Sequences and Shell scripts
  • Unit test DataStage ETL jobs
  • Design the flow of execution using DataStage
  • Tune the performance of SQL queries and DataStage jobs

Environment: DataStage, Oracle, DB2, UNIX Scripting