Senior Data Engineer Resume
SUMMARY
- Highly analytical and team-oriented Senior Data Engineer with 16+ years of experience designing, developing, and implementing highly scalable enterprise data solutions to meet complex business needs.
- Expertise in Big Data tools, technologies, and ecosystems including Apache Hadoop, Spark, Scala, MapReduce, Hive, Pig, HBase, ZooKeeper, Sqoop, YARN, Impala, and Stonebranch.
- Strong experience with AWS Cloud Services such as EC2, EMR, IAM, Security Groups, Redshift, RDS, CloudWatch, and S3.
- Hands-on experience in developing and implementing infrastructure, frameworks, and platforms for data ingestion, aggregation, integration, and analysis within Hadoop ecosystems.
- Strong knowledge of all phases of the Data Life Cycle: Data Acquisition, Data Quality Management, Data Governance, and Metadata Management.
- Well-versed in technical concepts, principles, and best practices related to ETL, Data Warehousing, Object-Oriented Programming, Cloud Platforms, DevOps, Data Science, and Data Analytics.
- Excellent written and verbal communication skills with a demonstrated ability to interface with cross-functional Agile-Scrum teams and internal and external stakeholders at all levels of business.
- Performance tuning of database SQL and ETL jobs.
- Strong knowledge of Object-Oriented Programming languages (Scala, Python).
- Hands-on experience implementing complex Big Data pipelines, both batch and real-time.
- Strong skills in IBM DataStage 11.5/9.1/8.5/7.5, Informatica, Talend 5.3/6.2, SQL programming, IBM DB2, Teradata, Netezza, Oracle PL/SQL, debugging, performance tuning, and shell scripting.
- Knowledge and experience in the design, development, and deployment of Big Data projects using Hadoop.
- Coded complex SQLs (ELT scripts) to load data into Data Warehouse Foundation/Aggregate tables (see the sketch at the end of this summary).
- Created detailed design technical artifacts, including:
- Data Flow Diagrams (DFDs), program architecture designs, and data model designs.
- ETL/ELT jobs, ETL designs, SQLs, and reference table designs.
- RTM (Requirements Traceability Matrix), data loading methodologies, and post-implementation support.
- Experience using Sqoop to move data between RDBMS and HDFS in both directions, and in processing log files to extract data and copy it into HDFS.
- Expertise in logical and physical modeling of the Landing, Staging, Foundation, and Mart layers.
- Rich experience in the Retail, Sales, Supply Chain, and Banking domains.
- Executed multiple end-to-end enterprise data warehousing projects.
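The foundation-to-aggregate ELT work above typically reduces to SQL pushed down to Hive or the warehouse. A minimal Spark SQL sketch in Scala of such a load; all table and column names are hypothetical placeholders, not details of any actual project:

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch of a foundation-to-aggregate ELT load.
// Table and column names (foundation.sales, mart.daily_sales_agg)
// are hypothetical placeholders.
object DailySalesAggregateLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DailySalesAggregateLoad")
      .enableHiveSupport()
      .getOrCreate()

    // Push the aggregation down to SQL: rebuild the daily aggregate
    // table from the foundation-layer fact table.
    spark.sql(
      """INSERT OVERWRITE TABLE mart.daily_sales_agg
        |SELECT store_id,
        |       sale_date,
        |       SUM(sale_amount)       AS total_sales,
        |       COUNT(DISTINCT txn_id) AS txn_count
        |FROM   foundation.sales
        |GROUP  BY store_id, sale_date""".stripMargin)

    spark.stop()
  }
}
```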
TECHNICAL SKILLS
Big Data: Hadoop, Spark (SQL, DataFrames, Datasets), Streaming (Apache Kafka, Kinesis, Spark Streaming), Hive, YARN, HDFS, MapReduce, Sqoop, Oozie, ZooKeeper, NiFi, StreamSets, Change Data Capture; file formats: Parquet, Avro, ORC, XML, JSON, CSV, TXT
Hadoop Distributions: Cloudera, AWS EMR, Azure HDInsight
ETL Tools: DataStage v11.3/8.5/8.1/7.5/7.0 (Designer, Administrator, Director, Manager, Parallel Extender)
Query Tools: Advanced Query Tool, SQuirreL, Ambari, SQL Developer, Teradata SQL Assistant
Database: Teradata, DB2, Netezza, Oracle, MS SQL Server
Storage: HBase, Cassandra, MongoDB, Oracle, SQL Server, DB2, MySQL, PostgreSQL, NoSQL and Columnar Databases, PL/SQL, T-SQL
Versioning: Git, AWS Code Commit
Scripting: Scala, Java, Python, PowerShell, D3.js, REST, JavaScript, JSON, cURL
BI Concepts: OLTP, ODS, Data Warehousing, OLAP, Dimensional Modeling, Slowly Changing Dimensions
Languages: SQL, PL/SQL, Shell Scripting, Java, Python, Scala
Platforms: Cloudera CDH, Hortonworks HDP, Linux, ER/Studio Data Architect, Jupyter, SBT, NetBeans, GitHub, TFS, Jenkins, Web Services, Design Patterns
Scheduling Tools: CA7, Control-M, Crontab, Stonebranch
Other: Docker, Kubernetes, Jenkins, Ansible, C, Tableau, MDM
PROFESSIONAL EXPERIENCE
Confidential
Senior Data Engineer
Technology: Hadoop 3.0.1, Spark, Scala, Sqoop, Hive, HBase, HiveQL, Oracle, MS SQL, Unix, and Stonebranch scheduler.
Responsibilities:
- Analyzed requirements from business teams.
- Built real-time streaming analytics with Kafka Streams and Spark Structured Streaming (see the sketch after this list).
- Built a centralized one-stop shop, the Confidential Enterprise Data Platform (EDP) data lake, holding customer and product data for batch analytics.
- The automated EDP data lake saves hundreds of hours for teams that previously had to pull data manually from various legacy RDBMS systems.
- Responsible for building scalable and distributed data solutions using Cloudera CDH.
- Developed Sqoop jobs for data ingestion and incremental data loads from RDBMS to HDFS.
- Built pipelines integrated with the Big Data Hadoop platform using Spark, Scala, and Hive.
- Built distributed real-time processing with Kafka and StreamSets.
- Used Change Data Capture scripts and Spark for SCD Type 1 and Type 2 processes, and performed complex transformations using Spark (SQL, Datasets) and Scala.
- Acquired in-depth knowledge of Hadoop architecture and HDFS.
- Applied working knowledge of MapReduce, HBase, Spark, Scala, and Hive.
- Ensured the chosen Hadoop solution could be deployed without hindrance.
- Automated all production jobs using Stonebranch, extracting data from sources such as Oracle, MS SQL, and mainframes and pushing the result sets to the Hadoop Distributed File System (HDFS).
- Participated in the setup and deployment of Hadoop clusters.
- Scheduled all Spark/Scala jobs using the Stonebranch scheduler.
- Developed and supported the extraction, transformation, and load (ETL) process for data migration.
- Profiled data and performed validations.
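For illustration, a minimal Spark Structured Streaming sketch in Scala of the Kafka-driven analytics described above; the broker address, topic, event schema, and output paths are hypothetical placeholders, not details of the actual EDP jobs:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

// Minimal Structured Streaming sketch: consume JSON events from Kafka,
// count them per event type in one-minute windows, and land the results
// as Parquet. Broker, topic, schema, and paths are hypothetical.
object KafkaStreamingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("KafkaStreamingSketch")
      .getOrCreate()
    import spark.implicits._

    val eventSchema = StructType(Seq(
      StructField("event_type", StringType),
      StructField("event_ts", TimestampType)))

    // Read the Kafka topic as an unbounded streaming DataFrame.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "customer-events")
      .load()
      .select(from_json($"value".cast("string"), eventSchema).as("e"))
      .select($"e.event_type", $"e.event_ts")

    // The watermark bounds state for the windowed aggregation and lets
    // append mode emit finalized windows.
    val counts = events
      .withWatermark("event_ts", "10 minutes")
      .groupBy(window($"event_ts", "1 minute"), $"event_type")
      .count()

    // Checkpointing makes the query restartable and fault tolerant.
    counts.writeStream
      .format("parquet")
      .option("path", "/data/edp/event_counts")
      .option("checkpointLocation", "/data/edp/checkpoints/event_counts")
      .outputMode("append")
      .start()
      .awaitTermination()
  }
}
```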
Confidential
Technical Lead
Technology: Hadoop 3.0.1, Spark, Scala, Sqoop, Hive, HBase, HiveQL, DataStage, Oracle, MS SQL, InfoSphere CDC, HDFS, Netezza, and Oozie/Control-M.
Responsibilities:
- Analyzed requirements from business teams.
- Built real-time streaming analytics with Kafka Streams and Spark Structured Streaming.
- Built distributed real-time processing with Kafka and StreamSets.
- Used Spark for ETL processes and performed complex transformations using Spark (SQL, Datasets) and Scala.
- Acquired in-depth knowledge of Hadoop architecture and HDFS.
- Applied working knowledge of MapReduce, HBase, Spark, Scala, and Hive.
- Ensured the chosen Hadoop solution could be deployed without hindrance.
- Automated all jobs extracting data from sources such as Oracle, MS SQL, and mainframes, pushing the result sets to the Hadoop Distributed File System (HDFS).
- Performed data analysis and profiling to answer business/functional questions.
- Designed patterns for history/incremental loads using CDC and Sqoop processes (see the SCD sketch after this list).
- Designed ETL jobs to extract from Hive snapshot tables and to transform and load data into different stage/base tables using DataStage and Talend.
- Loaded data from Hadoop Hive tables into EDW dimension and fact tables for further reporting and analysis.
- Identified columns for compression in tables for efficient usage of disk space.
- Created detailed step-by-step documents and executed them in production to load data into different tables.
- Migrated Netezza to Hive/MS SQL Server and Control-M to Oozie.
- Created DataStage ETL jobs per business needs.
- Extensively supported ETL solutions using DataStage, applying strong knowledge of OLAP, OLTP, and Star and Snowflake schema methodologies.
- Configured jobs to run via Oozie.
- Developed UNIX shell scripts, FTP processes, and DataStage jobs to process and load data files.
- Developed complex SQL scripts to transform and load data (as per business needs).
- Performance tuning of ETL jobs and SQL queries.
- Assisted in the development, execution, and documentation of system and integration test plans and of the UAT and regression phases; reviewed test cases and test strategy.
- Performed quality monitoring and trending on data standardization processes.
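A minimal Spark/Scala sketch of the SCD Type 2 close-out-and-insert pattern behind the history/incremental loads above; the table, column, and key names are hypothetical, only one attribute is tracked, brand-new keys are omitted for brevity, and the real jobs consumed the InfoSphere CDC feed rather than a staged table:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// SCD Type 2 sketch: expire changed current rows and open new versions.
// Tables (edw.customer_dim, stage.customer_cdc), the business key
// (customer_id), and the tracked attribute (address) are hypothetical;
// the CDC extract is assumed to carry the same business columns as the
// dimension, with the SCD bookkeeping columns added below.
object ScdType2Sketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ScdType2Sketch")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    val dim     = spark.table("edw.customer_dim")   // full history
    val current = dim.filter($"is_current")
    val delta   = spark.table("stage.customer_cdc") // today's CDC extract

    // Keys whose tracked attribute differs from the current dimension row.
    val changedKeys = current.alias("d")
      .join(delta.alias("s"), Seq("customer_id"))
      .filter($"d.address" =!= $"s.address")
      .select($"customer_id")

    // Type 2 close-out: expire the current version of each changed key.
    val expired = current
      .join(changedKeys, Seq("customer_id"), "left_semi")
      .withColumn("is_current", lit(false))
      .withColumn("end_date", current_date())

    // Open a new current version from the CDC record.
    val opened = delta
      .join(changedKeys, Seq("customer_id"), "left_semi")
      .withColumn("start_date", current_date())
      .withColumn("end_date", lit(null).cast("date"))
      .withColumn("is_current", lit(true))

    // Rows of unchanged keys pass through, plus history rows of changed keys.
    val untouched = dim.join(changedKeys, Seq("customer_id"), "left_anti")
      .unionByName(dim.filter(!$"is_current")
        .join(changedKeys, Seq("customer_id"), "left_semi"))

    // Write to a new table; Spark cannot safely overwrite a table it is
    // still reading from in the same job.
    untouched.unionByName(expired).unionByName(opened)
      .write.mode("overwrite").saveAsTable("edw.customer_dim_out")
  }
}
```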
