Senior Data Engineer Resume
SUMMARY
- Highly analytical and team-oriented Senior Data Engineer with 16+ years of experience designing, developing, and implementing highly scalable enterprise data solutions to meet complex business needs.
- Expertise in Big Data tools, technologies, and ecosystems including Apache Hadoop, Spark, Scala, MapReduce, Hive, Pig, HBase, ZooKeeper, Sqoop, YARN, Impala, and Stonebranch.
- Strong experience with AWS Cloud Services such as EC2, EMR, IAM, Security Groups, Redshift, RDS, CloudWatch, and S3.
- Hands-on experience in developing and implementing infrastructure, frameworks, and platforms for data ingestion, aggregation, integration, and analysis within Hadoop ecosystems.
- Strong knowledge of all phases of the Data Life Cycle: Data Acquisition, Data Quality Management, Data Governance, and Metadata Management.
- Well-versed in technical concepts, principles, and best practices related to ETL, Data Warehousing, Object-Oriented Programming, Cloud Platforms, DevOps, Data Science, and Data Analytics.
- Excellent written and verbal communication skills with a demonstrated ability to interface with cross-functional Agile-Scrum teams and internal and external stakeholders at all levels of business.
- Performance tuning of database SQL and ETL jobs.
- Strong knowledge of Object-Oriented Programming languages (Scala, Python).
- Hands-on experience implementing complex Big Data pipelines, both batch and real-time.
- Strong skills in IBM DataStage 11.5/9.1/8.5/7.5, Informatica, Talend 5.3/6.2, SQL programming, IBM DB2, Teradata, Netezza, Oracle PL/SQL, debugging, performance tuning, and shell scripting.
- Knowledge and experience in the design, development, and deployment of Big Data projects using Hadoop.
- Coded complex SQLs (ELT scripts) to load data into Data Warehouse Foundation/Aggregate tables (see the sketch at the end of this summary).
- Created detailed design technical artifacts, including:
- Data Flow Diagrams (DFDs), program architecture designs, and data model designs.
- ETL/ELT jobs, ETL designs, SQLs, and reference table designs.
- RTM (Requirements Traceability Matrix), data loading methodologies, and post-implementation support.
- Experience using Sqoop to move data between RDBMS and HDFS in both directions, and in processing log files to extract data and copy it into HDFS.
- Expertise in logical and physical modeling of the Landing, Staging, Foundation, and Mart layers.
- Rich experience in the Retail, Sales, Supply Chain, and Banking domains.
- Executed multiple end-to-end enterprise data warehousing projects.
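The foundation-to-aggregate ELT work above typically reduces to SQL pushed down to Hive or the warehouse. A minimal Spark SQL sketch in Scala of such a load; all table and column names are hypothetical placeholders, not details of any actual project:

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch of a foundation-to-aggregate ELT load.
// Table and column names (foundation.sales, mart.daily_sales_agg)
// are hypothetical placeholders.
object DailySalesAggregateLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DailySalesAggregateLoad")
      .enableHiveSupport()
      .getOrCreate()

    // Push the aggregation down to SQL: rebuild the daily aggregate
    // table from the foundation-layer fact table.
    spark.sql(
      """INSERT OVERWRITE TABLE mart.daily_sales_agg
        |SELECT store_id,
        |       sale_date,
        |       SUM(sale_amount)       AS total_sales,
        |       COUNT(DISTINCT txn_id) AS txn_count
        |FROM   foundation.sales
        |GROUP  BY store_id, sale_date""".stripMargin)

    spark.stop()
  }
}
```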
TECHNICAL SKILLS
Big Data: Hadoop, Spark (SQL, DataFrames, Datasets), Streaming (Apache Kafka, Kinesis, Spark Streaming), Hive, YARN, HDFS, MapReduce, Sqoop, Oozie, ZooKeeper, NiFi, StreamSets, Change Data Capture; file formats: Parquet, Avro, ORC, XML, JSON, CSV, TXT
Hadoop Distributions: Cloudera, AWS EMR, Azure HDInsight
ETL Tools: DataStage v11.3/8.5/8.1/7.5/7.0 (Designer, Administrator, Director, Manager, Parallel Extender)
Query Tools: Advanced Query Tool, SQuirreL, Ambari, SQL Developer, Teradata SQL Assistant
Database: Teradata, DB2, Netezza, Oracle, MS SQL Server
Storage: HBase, Cassandra, MongoDB, Oracle, SQL Server, DB2, MySQL, PostgreSQL, NoSQL and Columnar Databases, PL/SQL, T-SQL
Versioning: Git, AWS Code Commit
Scripting: Scala, Java, Python, PowerShell, D3.js, REST, JavaScript, JSON, cURL
BI Concepts: OLTP, ODS, Data Warehousing, OLAP, Dimensional Modeling, Slowly Changing Dimensions
Languages: SQL, PL/SQL, Shell Scripting, Java, Python, Scala
Platforms: Cloudera CDH, Hortonworks HDP, Linux, ER/Studio Data Architect, Jupyter, SBT, NetBeans, GitHub, TFS, Jenkins, Web Services, Design Patterns
Scheduling Tools: CA7, Control-M, Crontab, Stonebranch
Other: Docker, Kubernetes, Jenkins, Ansible, C, Tableau, MDM
PROFESSIONAL EXPERIENCE
Confidential
Senior Data Engineer
Technology: Hadoop 3.0.1, Spark, Scala, Sqoop, Hive, HBase, HiveQL, Oracle, MS SQL, Unix, and Stonebranch scheduler.
Responsibilities:
- Analyzed requirements from business teams.
- Built real-time streaming analytics with Kafka Streams and Spark Structured Streaming (see the sketch after this list).
- Built a centralized one-stop shop, the Confidential Enterprise Data Platform (EDP) data lake, holding customer and product data for batch analytics.
- The automated EDP data lake saves hundreds of hours for teams that previously had to pull data manually from various legacy RDBMS systems.
- Responsible for building scalable and distributed data solutions using Cloudera CDH.
- Developed Sqoop jobs for data ingestion and incremental data loads from RDBMS to HDFS.
- Built pipelines integrated with the Big Data Hadoop platform using Spark, Scala, and Hive.
- Built distributed real-time processing with Kafka and StreamSets.
- Used Change Data Capture scripts and Spark for SCD Type 1 and Type 2 processes, and performed complex transformations using Spark (SQL, Datasets) and Scala.
- Acquired in-depth knowledge of Hadoop architecture and HDFS.
- Applied working knowledge of MapReduce, HBase, Spark, Scala, and Hive.
- Ensured the chosen Hadoop solution could be deployed without hindrance.
- Automated all production jobs using Stonebranch, extracting data from sources such as Oracle, MS SQL, and mainframes and pushing the result sets to the Hadoop Distributed File System (HDFS).
- Participated in the setup and deployment of Hadoop clusters.
- Scheduled all Spark/Scala jobs using the Stonebranch scheduler.
- Developed and supported the extraction, transformation, and load (ETL) process for data migration.
- Profiled data and performed validations.
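For illustration, a minimal Spark Structured Streaming sketch in Scala of the Kafka-driven analytics described above; the broker address, topic, event schema, and output paths are hypothetical placeholders, not details of the actual EDP jobs:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

// Minimal Structured Streaming sketch: consume JSON events from Kafka,
// count them per event type in one-minute windows, and land the results
// as Parquet. Broker, topic, schema, and paths are hypothetical.
object KafkaStreamingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("KafkaStreamingSketch")
      .getOrCreate()
    import spark.implicits._

    val eventSchema = StructType(Seq(
      StructField("event_type", StringType),
      StructField("event_ts", TimestampType)))

    // Read the Kafka topic as an unbounded streaming DataFrame.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "customer-events")
      .load()
      .select(from_json($"value".cast("string"), eventSchema).as("e"))
      .select($"e.event_type", $"e.event_ts")

    // The watermark bounds state for the windowed aggregation and lets
    // append mode emit finalized windows.
    val counts = events
      .withWatermark("event_ts", "10 minutes")
      .groupBy(window($"event_ts", "1 minute"), $"event_type")
      .count()

    // Checkpointing makes the query restartable and fault tolerant.
    counts.writeStream
      .format("parquet")
      .option("path", "/data/edp/event_counts")
      .option("checkpointLocation", "/data/edp/checkpoints/event_counts")
      .outputMode("append")
      .start()
      .awaitTermination()
  }
}
```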
Confidential
Technical Lead
Technology: Hadoop 3.0.1, Spark, Scala, Sqoop, Hive, HBase, HiveQL, DataStage, Oracle, MS SQL, InfoSphere CDC, HDFS, Netezza, and Oozie/Control-M.
Responsibilities:
- Analyzed requirements from business teams.
- Built real-time streaming analytics with Kafka Streams and Spark Structured Streaming.
- Built distributed real-time processing with Kafka and StreamSets.
- Used Spark for ETL processes and performed complex transformations using Spark (SQL, Datasets) and Scala.
- Acquired in-depth knowledge of Hadoop architecture and HDFS.
- Applied working knowledge of MapReduce, HBase, Spark, Scala, and Hive.
- Ensured the chosen Hadoop solution could be deployed without hindrance.
- Automated all jobs extracting data from sources such as Oracle, MS SQL, and mainframes, pushing the result sets to the Hadoop Distributed File System (HDFS).
- Performed data analysis and profiling to answer business/functional questions.
- Designed patterns for history/incremental loads using CDC and Sqoop processes (see the SCD sketch after this list).
- Designed ETL jobs to extract from Hive snapshot tables and to transform and load data into different stage/base tables using DataStage and Talend.
- Loaded data from Hadoop Hive tables into EDW dimension and fact tables for further reporting and analysis.
- Identified columns for compression in tables for efficient usage of disk space.
- Created detailed step-by-step documents and executed them in production to load data into different tables.
- Migrated Netezza to Hive/MS SQL Server and Control-M to Oozie.
- Created DataStage ETL jobs per business needs.
- Extensively supported ETL solutions using DataStage, applying strong knowledge of OLAP, OLTP, and Star and Snowflake schema methodologies.
- Configured jobs to run via Oozie.
- Developed UNIX shell scripts, FTP processes, and DataStage jobs to process and load data files.
- Developed complex SQL scripts to transform and load data (as per business needs).
- Performance tuning of ETL jobs and SQL queries.
- Assisted in the development, execution, and documentation of system and integration test plans and of the UAT and regression phases; reviewed test cases and test strategy.
- Performed quality monitoring and trending on data standardization processes.
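A minimal Spark/Scala sketch of the SCD Type 2 close-out-and-insert pattern behind the history/incremental loads above; the table, column, and key names are hypothetical, only one attribute is tracked, brand-new keys are omitted for brevity, and the real jobs consumed the InfoSphere CDC feed rather than a staged table:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// SCD Type 2 sketch: expire changed current rows and open new versions.
// Tables (edw.customer_dim, stage.customer_cdc), the business key
// (customer_id), and the tracked attribute (address) are hypothetical;
// the CDC extract is assumed to carry the same business columns as the
// dimension, with the SCD bookkeeping columns added below.
object ScdType2Sketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ScdType2Sketch")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    val dim     = spark.table("edw.customer_dim")   // full history
    val current = dim.filter($"is_current")
    val delta   = spark.table("stage.customer_cdc") // today's CDC extract

    // Keys whose tracked attribute differs from the current dimension row.
    val changedKeys = current.alias("d")
      .join(delta.alias("s"), Seq("customer_id"))
      .filter($"d.address" =!= $"s.address")
      .select($"customer_id")

    // Type 2 close-out: expire the current version of each changed key.
    val expired = current
      .join(changedKeys, Seq("customer_id"), "left_semi")
      .withColumn("is_current", lit(false))
      .withColumn("end_date", current_date())

    // Open a new current version from the CDC record.
    val opened = delta
      .join(changedKeys, Seq("customer_id"), "left_semi")
      .withColumn("start_date", current_date())
      .withColumn("end_date", lit(null).cast("date"))
      .withColumn("is_current", lit(true))

    // Rows of unchanged keys pass through, plus history rows of changed keys.
    val untouched = dim.join(changedKeys, Seq("customer_id"), "left_anti")
      .unionByName(dim.filter(!$"is_current")
        .join(changedKeys, Seq("customer_id"), "left_semi"))

    // Write to a new table; Spark cannot safely overwrite a table it is
    // still reading from in the same job.
    untouched.unionByName(expired).unionByName(opened)
      .write.mode("overwrite").saveAsTable("edw.customer_dim_out")
  }
}
```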
