Big Data Engineer Resume
SUMMARY:
- Over 9 years of experience in IT working for Fortune 500 companies
- Certified Hadoop Developer with experience in designing and developing data platforms to assist and guide business decisions
- Expert in designing efficient and reliable ETL data pipelines using Hadoop and Spark
- Designed near real-time and batch-oriented ingestion of data into Hadoop Data Lake using Spark Streaming, Kafka and Sqoop
- Hands-on experience in using cloud platforms like AWS and Azure
- Experience in working with Agile and Waterfall models
- Experience in business domains like Payments, Banking & Finance, Airlines and Retail
- An active team player with effective communication and interpersonal skills
- Translated business requirements into detailed, production-level technical specifications covering new features and enhancements to existing business functionality
TECHNICAL SKILLS:
Big Data Platform: Hadoop, Hive, Spark
Programming: Python, Scala, Shell
DBMS: Teradata, Oracle, DB2
Version Control: SVN, GitHub
Cloud: AWS (Lambda, S3, EMR, Athena), Azure (Data Factory)
Orchestration: Oozie
PROFESSIONAL EXPERIENCE:
Confidential
Big Data Engineer
Responsibilities:
- Develop ETL data pipelines using a combination of tools such as Hive, Spark with Scala, Spark Streaming and Sqoop
- Develop UDFs in Java to mask sensitive data using hashing algorithms before exposing it to external vendors
- Develop Pig scripts to deduplicate and merge historical and incremental data
- Develop an automated JSON generation model from data in Hive using Java MapReduce
- Develop an automated batch job to migrate data between Hadoop clusters
- Develop Oozie workflows and coordinators to schedule Hadoop jobs
- Design Azure Data Factory pipelines to supply data to third-party vendors
- Automate manually run reports using shell scripting
- Develop an automated alerting system using the Oozie API to provide near real-time status of the running coordinator instances
Environment: Hadoop, Hive, Spark, Azure
Confidential
Data Engineer
Responsibilities:
- Design the data ingestion process using Sqoop
- Design ETL data pipelines using a combination of tools such as Hive, Spark and Impala
- Develop an automated script to deploy an on-demand Cloudera Hadoop cluster on AWS
- Design a reporting layer on Athena using AWS Glue, Lambda and S3
- Develop an approach to create and decommission on-demand Hadoop clusters on a daily basis
Environment: Hadoop, Hive, Spark, AWS, Lambda
Confidential
Senior Data Engineer
Responsibilities:
- Design the data ingestion process using Sqoop
- Design ETL data pipelines using a combination of tools such as Hive, SparkSQL and PySpark
- Migrate existing Spark jobs in production to run via the “Spark Compute as a Service using Apache Livy” framework, which enabled SparkSession sharing and improved performance
- Design and develop a homogeneous layer on Hive so that various data sources adhere to the same data model
- Process and load real-time data onto HDFS every 30 minutes using HiveQL
- Develop reporting queries using OLAP functions on top of the financial data in Hive and publish the results to business users at regular intervals
- Automate manually run reports using shell scripting and Teradata, scheduled via crontab
- Migrate Teradata tables to Hadoop using Hive, with orchestration via an internal Python framework
- Work with the UC4 scheduler and Informatica
Environment: Hadoop, Hive, Teradata, Spark
Confidential, Austin, TX
Consultant
Responsibilities:
- Design ETL pipelines using Hive and Spark (PySpark)
- Develop job orchestration using Oozie
- Refactor data ingestion into the Hadoop Data Lake from disparate third-party vendors for better performance, using SFTP, gsutil and the Teradata connector for Sqoop
- Refactor HDFS schema design according to best practices
- Design a scalable data layout in Hive by choosing the right file formats (Parquet, SequenceFile, ORC) and compression codecs (Snappy, LZO, etc.)
- Develop SparkSQL code to replace traditional Hive MapReduce jobs
- Develop automated testing scripts to perform QA
Environment: Hadoop, Spark, Hive
Confidential
Associate
Responsibilities:
- Provide BI consulting solutions
- Use Big Data technologies such as Hadoop and Cassandra in BI data delivery
- Migrate data from existing Teradata systems to a Hortonworks HDInsight cluster on Azure
- Leverage core expertise in solution design and managing enterprise-wide BI (Data Warehousing/Data Integration) implementations
- Perform data analysis over large datasets using Apache Pig, Apache Hive and Spark
- Design and build data staging and summary (aggregated) areas in the Hive DW
Environment: Hadoop, Hive, Spark, Azure
Confidential
Associate
Responsibilities:
- Create Technical Design and ETL mapping documents
- Lead the offshore team; allocate and track assigned tasks
- Design DataStage ETL jobs, DataStage Sequences and Shell scripts
- Unit test DataStage ETL jobs
- Design the flow of execution using DataStage
- Tune the performance of SQL queries and DataStage jobs
Environment: DataStage, Teradata, UNIX Scripting
Confidential
Programmer Analyst
Responsibilities:
- Create Technical Design and ETL mapping documents
- Perform impact analysis pertaining to DML and DDL changes to the Banking Data Warehouse
- Prepare DDL and DML scripts
- Design DataStage ETL jobs, DataStage Sequences and Shell scripts
- Unit test DataStage ETL jobs
- Design the flow of execution using DataStage
- Tune the performance of SQL queries and DataStage jobs
Environment: DataStage, Oracle, DB2, UNIX Scripting
