Big Data Engineer Resume

SUMMARY:

Over 9 years of experience in IT working for Fortune 500 companies
Certified Hadoop Developer with experience in designing and developing data platforms to assist and guide business decisions
Expert in designing efficient and reliable ETL data pipelines using Hadoop and Spark
Designed near-real-time and batch-oriented ingestion of data into a Hadoop Data Lake using Spark Streaming, Kafka and Sqoop
Hands-on experience with cloud platforms such as AWS and Azure
Experience working with Agile and Waterfall models
Experience in business domains like Payments, Banking & Finance, Airlines and Retail
An active team player with effective communication and interpersonal skills
Translated business requirements into detailed, production-level technical specifications covering new features and enhancements to existing business functionality

TECHNICAL SKILLS:

Big Data Platform: Hadoop, Hive, Spark

Programming: Python, Scala, Shell

DBMS: Teradata, Oracle, DB2

Version Control: SVN, GitHub

Cloud: AWS (Lambda, S3, EMR, Athena), Azure (Data Factory)

Orchestration: Oozie

PROFESSIONAL EXPERIENCE:

Confidential

Big Data Engineer

Responsibilities:

Develop ETL data pipelines using a combination of tools such as Hive, Spark with Scala, Spark Streaming and Sqoop
Develop UDFs in Java to mask sensitive data using hashing algorithms before exposing it to external vendors
Develop Pig scripts to deduplicate and merge historical and incremental data
Develop an automated JSON generation model from data in Hive using Java MapReduce
Develop an automated batch job to migrate data from cluster to cluster in Hadoop
Develop Oozie workflows and coordinators to schedule Hadoop jobs
Design Azure Data Factory pipelines to supply data to third-party vendors
Automate manually run reports using shell scripting
Develop an automated alerting system using the Oozie API to provide near-real-time status of running coordinator instances

Environment: Hadoop, Hive, Spark, Azure

Confidential

Data Engineer

Responsibilities:

Design the data ingestion process using Sqoop
Design ETL data pipelines using a combination of tools such as Hive, Spark and Impala
Develop an automated script to deploy an on-demand Cloudera Hadoop cluster on AWS
Design a reporting layer on Athena using AWS Glue, Lambda and S3
Develop an approach to create and decommission on-demand Hadoop clusters on a daily basis

Environment: Hadoop, Hive, Spark, AWS, Lambda

Confidential

Senior Data Engineer

Responsibilities:

Design the data ingestion process using Sqoop
Design ETL data pipelines using a combination of tools such as Hive, SparkSQL and PySpark
Migrate existing Spark jobs in production to run via the “Spark Compute as a Service using Apache Livy” framework, which enabled SparkSession sharing and improved performance
Design and develop a homogeneous layer on Hive so that various data sources adhere to the same data model
Process and load real-time data onto HDFS every 30 minutes using HiveQL
Develop reporting queries using OLAP functions on top of the financial data in Hive and publish the results to business users at regular intervals
Automate manually run reports using shell scripting and Teradata, scheduled via crontab
Migrate Teradata tables to Hadoop using Hive, with orchestration via an internal Python framework
Work with the UC4 scheduler and Informatica

Environment: Hadoop, Hive, Teradata, Spark

Confidential, Austin, TX

Consultant

Responsibilities:

Design ETL pipelines using Hive and Spark (PySpark)
Develop job orchestration using Oozie
Refactor data ingestion into the Hadoop Data Lake from disparate third-party vendors for better performance using SFTP, gsutil and the Teradata connector for Sqoop
Refactor HDFS schema design according to best practices
Design a scalable data layout in Hive by choosing the right file formats (Parquet, SequenceFile, ORC) and compression codecs (Snappy, LZO, etc.)
Develop SparkSQL code to replace traditional Hive MapReduce jobs
Develop automated testing scripts to perform QA

Environment: Hadoop, Spark, Hive

Confidential

Associate

Responsibilities:

Provide BI consulting solutions
Use Big Data technologies such as Hadoop and Cassandra in BI data delivery
Migrate data from existing Teradata systems to a Hortonworks HDInsight cluster on Azure
Leverage core expertise in designing solutions and managing enterprise-wide BI (data warehousing/data integration) implementations
Perform data analysis over large datasets using Apache Pig, Apache Hive and Spark
Design and build data staging and summary (aggregated) areas in the Hive DW

Environment: Hadoop, Hive, Spark, Azure

Confidential

Associate

Responsibilities:

Create Technical Design and ETL mapping documents
Lead the offshore team, allocating and tracking assigned tasks
Design DataStage ETL jobs, DataStage Sequences and shell scripts
Unit test DataStage ETL jobs
Design the flow of execution using DataStage
Tune the performance of SQL queries and DataStage jobs

Environment: DataStage, Teradata, UNIX Scripting

Confidential

Programmer Analyst

Responsibilities:

Create Technical Design and ETL mapping documents
Perform impact analysis pertaining to DML and DDL changes to the Banking Data Warehouse
Prepare DDL and DML scripts
Design DataStage ETL jobs, DataStage Sequences and shell scripts
Unit test DataStage ETL jobs
Design the flow of execution using DataStage
Tune the performance of SQL queries and DataStage jobs

Environment: DataStage, Oracle, DB2, UNIX Scripting
