
Senior Data Engineer Resume


SUMMARY

  • Highly analytical and team-oriented Senior Data Engineer with 16+ years of experience designing, developing, and implementing highly scalable enterprise data solutions to meet complex business needs.
  • Expertise in Big Data tools, technologies, and ecosystems, including Apache Hadoop, Spark, Scala, MapReduce, Hive, Pig, HBase, ZooKeeper, Sqoop, YARN, Impala, and Stonebranch.
  • Strong experience with AWS cloud services such as EC2, EMR, IAM, Security Groups, Redshift, RDS, CloudWatch, and S3.
  • Hands-on experience in developing and implementing infrastructure, frameworks, and platforms for data ingestion, aggregation, integration, and analysis within Hadoop ecosystems.
  • Strong knowledge of all phases of the Data Life Cycle: Data Acquisition, Data Quality Management, Data Governance, and Metadata Management.
  • Well-versed in technical concepts, principles, and best practices related to ETL, Data Warehousing, Object-Oriented Programming, Cloud Platforms, DevOps, Data Science, and Data Analytics.
  • Excellent written and verbal communication skills with a demonstrated ability to interface with cross-functional Agile-Scrum teams and internal and external stakeholders at all levels of business.
  • Performance engineering of database SQL and ETL jobs.
  • Strong knowledge of object-oriented programming languages (Scala, Python).
  • Hands-on experience implementing complex Big Data pipelines, both batch and real-time (a batch sketch follows this list).
  • Strong skills in IBM DataStage 11.5/9.1/8.5/7.5, Informatica, Talend 5.3/6.2, SQL programming, IBM DB2, Teradata, Netezza, Oracle PL/SQL, debugging, performance tuning, and shell scripting.
  • Knowledge and experience in the design, development, and deployment of Big Data projects using Hadoop.
  • Coded complex SQL (ELT) scripts to load data into Data Warehouse foundation/aggregate tables.
  • Created detailed technical design artifacts, including Data Flow Diagrams (DFDs), program architecture designs, and data model designs; ETL/ELT jobs, ETL designs, SQL, and reference table designs; and RTMs, data loading methodologies, and post-implementation support documentation.
  • Experience using Sqoop to move data between RDBMSs and HDFS in both directions, and in extracting data from log files and copying it into HDFS.
  • Expertise in logical and physical modeling of the landing, staging, foundation, and mart layers.
  • Rich experience in the Retail, Sales, Supply Chain, and Banking domains.
  • Executed multiple end-to-end enterprise data warehousing projects.
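
A minimal Scala/Spark sketch of the kind of batch pipeline noted above. The source path, table names, and columns (for example /data/landing/orders and edw.daily_store_sales) are hypothetical placeholders, not client specifics:

```scala
import org.apache.spark.sql.{SparkSession, functions => F}

object BatchPipelineSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("orders-batch-load")
      .enableHiveSupport()
      .getOrCreate()

    // Read raw files landed in HDFS (path and schema are hypothetical).
    val raw = spark.read.parquet("/data/landing/orders")

    // Basic cleansing and aggregation into a foundation/aggregate shape.
    val daily = raw
      .filter(F.col("order_status") === "COMPLETE")
      .groupBy(F.col("order_date"), F.col("store_id"))
      .agg(F.sum("order_total").as("daily_sales"))

    // Persist to a Hive-managed aggregate table (name is hypothetical).
    daily.write.mode("overwrite").saveAsTable("edw.daily_store_sales")
  }
}
```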

TECHNICAL SKILLS

Big Data: Hadoop, Spark (SQL, DataFrames, Datasets), streaming (Apache Kafka, Kinesis, Spark Streaming), Hive, YARN, HDFS, MapReduce, Sqoop, Oozie, ZooKeeper, NiFi, StreamSets, Change Data Capture; file formats: Parquet, Avro, ORC, XML, JSON, CSV, TXT

Hadoop Distributions: Cloudera, AWS EMR, Azure HDInsight

ETL Tools: DataStage 11.3/8.5/8.1/7.5/7.0 (Designer, Administrator, Director, Manager, Parallel Extender)

Query Tools: Advanced Query Tool, SQuirreL SQL, Ambari, SQL Developer, Teradata SQL Assistant

Database: Teradata, DB2, Netezza, Oracle, MS SQL Server

Storage: HBase, Cassandra, MongoDB, Oracle, SQL Server, DB2, MySQL, PostgreSQL; NoSQL and columnar databases; PL/SQL, T-SQL

Versioning: Git, AWS Code Commit

Scripting: Scala, Java, Python, PowerShell, d3.js, REST, JavaScript, JSON, cURL

BI Concepts: OLTP, ODS, Data Warehousing, OLAP, Dimensional Modeling, Slowly Changing Dimensions

Languages: SQL, PL/SQL, Shell Scripting, Java, Python, Scala

Platforms: Cloudera CDH, Hortonworks HDP, Linux, ER/Studio Data Architect, Jupyter, SBT, NetBeans, GitHub, TFS, Jenkins, Web Services, Design Patterns

Scheduling Tools: CA7, Control-M, Crontab, Stonebranch

Other: Docker, Kubernetes, Jenkins, Ansible, C, Tableau, MDM

PROFESSIONAL EXPERIENCE

Confidential

Senior Data Engineer

Technology: Hadoop 3.0.1, Spark, Scala, Sqoop, Hive, HBase, HiveQL, Oracle, MSSQL, Unix, and the Stonebranch scheduler.

Responsibilities:

  • Analyzed and understood requirements from business teams.
  • Built real-time streaming analytics with Kafka Streams and Spark Structured Streaming (a streaming sketch follows this list).
  • Built a centralized one-stop shop, the Confidential Enterprise Data Platform (EDP) data lake, holding customer and product data for batch analytics.
  • The automated EDP data lake saves hundreds of hours per request for teams that previously had to pull data manually from various legacy RDBMS systems.
  • Responsible for building scalable and distributed data solutions using Cloudera CDH.
  • Developed Sqoop jobs for data ingestion and incremental data loads from RDBMS to HDFS.
  • Built pipelines integrated with the Big Data Hadoop platform using Spark, Scala, and Hive.
  • Built distributed real-time processing with Kafka and StreamSets.
  • Used Change Data Capture (CDC) scripts and Spark for SCD Type 1 and Type 2 processing, and performed complex transformations using Spark (SQL, Datasets) with Scala (an SCD sketch follows this list).
  • Acquired in-depth knowledge of Hadoop architecture and HDFS.
  • Applied working knowledge of MapReduce, HBase, Spark, Scala, and Hive.
  • Ensured the chosen Hadoop solution could be deployed without hindrance.
  • Automated all production jobs using Stonebranch, extracting data from sources such as Oracle, MSSQL, and mainframes and pushing the result sets to the Hadoop Distributed File System (HDFS).
  • Participated in the setup and deployment of Hadoop clusters.
  • Scheduled all Spark/Scala jobs using the Stonebranch scheduler.
  • Developed and supported the extraction, transformation, and load (ETL) processes for data migration.
  • Profiled data and performed validations.
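
A minimal sketch of the Kafka plus Spark Structured Streaming pattern referenced above; the broker address, topic, and checkpoint path are hypothetical placeholders:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object StreamingAnalyticsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-structured-streaming")
      .getOrCreate()
    import spark.implicits._

    // Subscribe to a Kafka topic (broker and topic names are hypothetical).
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "customer-events")
      .load()
      .selectExpr("CAST(value AS STRING) AS json", "timestamp")

    // Simple streaming aggregate: event counts per 5-minute window.
    val counts = events
      .withWatermark("timestamp", "10 minutes")
      .groupBy(window($"timestamp", "5 minutes"))
      .count()

    // Console sink for illustration; a real job would sink to Kafka or HDFS.
    counts.writeStream
      .outputMode("update")
      .format("console")
      .option("checkpointLocation", "/tmp/chk/customer-events")
      .start()
      .awaitTermination()
  }
}
```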
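
And a hedged sketch of the CDC-driven SCD processing mentioned above, shown here for Type 2 only. All column and key names (customer_id, eff_date, end_date, is_current) are hypothetical, the CDC batch is assumed to carry the same business columns as the dimension, and a production job would also handle deletes and late-arriving changes:

```scala
import org.apache.spark.sql.{DataFrame, functions => F}

object ScdType2Sketch {
  // Apply an SCD Type 2 merge: expire changed rows and append new versions.
  // `current` is the dimension as loaded; `changes` is the CDC batch.
  def scdType2(current: DataFrame, changes: DataFrame, loadDate: String): DataFrame = {
    val changedKeys = changes.select("customer_id").distinct()

    // Close out the open version of every key that appears in the CDC batch.
    val expired = current
      .join(changedKeys, Seq("customer_id"), "left_semi")
      .filter(F.col("is_current") === true)
      .withColumn("end_date", F.lit(loadDate))
      .withColumn("is_current", F.lit(false))

    // Keys untouched by this batch, plus already-closed history rows, pass through.
    val untouched = current.join(changedKeys, Seq("customer_id"), "left_anti")
    val history = current
      .join(changedKeys, Seq("customer_id"), "left_semi")
      .filter(F.col("is_current") === false)

    // Rows from the CDC batch become the new open versions.
    val fresh = changes
      .withColumn("eff_date", F.lit(loadDate))
      .withColumn("end_date", F.lit(null).cast("string"))
      .withColumn("is_current", F.lit(true))

    untouched.unionByName(history).unionByName(expired).unionByName(fresh)
  }
}
```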

Confidential

Technical Lead

Technology: Hadoop 3.0.1, Spark, Scala, Sqoop, Hive, HBase, HiveQL, DataStage, InfoSphere CDC, Oracle, MSSQL, HDFS, Netezza, and Oozie/Control-M.

Responsibilities:

  • Analyzed and understood requirements from business teams.
  • Built real-time streaming analytics with Kafka Streams and Spark Structured Streaming.
  • Built distributed real-time processing with Kafka and StreamSets.
  • Used Spark for ETL processes and performed complex transformations using Spark (SQL, Datasets) with Scala.
  • Acquired in-depth knowledge of Hadoop architecture and HDFS.
  • Applied working knowledge of MapReduce, HBase, Spark, Scala, and Hive.
  • Ensured the chosen Hadoop solution could be deployed without hindrance.
  • Automated all jobs that extract data from sources such as Oracle, MSSQL, and mainframes, pushing the result sets to the Hadoop Distributed File System (HDFS).
  • Performed data analysis and profiling to answer business and functional questions.
  • Designed patterns for history and incremental loads using CDC and Sqoop processes.
  • Designed ETL jobs to extract from Hive snapshot tables and transform and load data into stage/base tables using DataStage and Talend.
  • Loaded data from Hadoop Hive tables into EDW dimension and fact tables for further reporting and analysis (a fact-load sketch follows this list).
  • Identified table columns for compression to use disk space efficiently.
  • Created detailed step-by-step documents and executed them in production to load data into the target tables.
  • Migrated Netezza to Hive/MS-SQL Server and Control-M to Oozie.
  • Created DataStage ETL jobs as per business needs.
  • Extensively applied ETL methodology to support DataStage solutions, with strong knowledge of OLAP, OLTP, and star and snowflake schema design.
  • Configured jobs to run via Oozie.
  • Developed UNIX shell scripts, FTP processes, and DataStage jobs to process and load data files.
  • Developed complex SQL scripts to transform and load data (as per business needs).
  • Performed performance engineering of ETL jobs and SQL queries.
  • Responsible for assisting in the development, execution, and documentation of system and integration test plans, UAT, and regression phases; reviewed test cases and test strategy.
  • Performed quality monitoring and trending of data standardization processes.
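
A minimal Spark SQL sketch (in Scala, matching the Spark usage above) of the Hive-to-EDW dimension/fact load referenced in this list, resolving dimension surrogate keys for a snapshot; every table and column name here is a hypothetical placeholder:

```scala
import org.apache.spark.sql.SparkSession

object FactLoadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("edw-fact-load")
      .enableHiveSupport()
      .getOrCreate()

    // Resolve surrogate keys by joining the Hive snapshot to the current
    // dimension rows, then insert into the EDW fact table.
    spark.sql("""
      INSERT INTO edw.sales_fact
      SELECT d.customer_sk,
             p.product_sk,
             s.order_date,
             s.quantity,
             s.sale_amount
      FROM   staging.sales_snapshot s
      JOIN   edw.customer_dim d
        ON   s.customer_id = d.customer_id AND d.is_current = true
      JOIN   edw.product_dim p
        ON   s.product_id = p.product_id AND p.is_current = true
    """)
  }
}
```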
