Azure Data Engineer Resume
Sunnyvale, CA
SUMMARY
- 7 years of overall IT experience and technical proficiency in data warehousing, covering business requirements analysis, application design, data modeling, development, testing, and documentation.
- 3 years of experience as an Azure Data Engineer working with Azure Cloud, Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Azure Analysis Services, Azure Cosmos DB (NoSQL), Azure HDInsight, big data technologies (Hadoop and Apache Spark), and Databricks.
- 4 years of experience as a Teradata/ETL consultant in Teradata database design, implementation, and maintenance, mainly in large-scale data warehouse environments; experienced with the Teradata RDBMS using Teradata Parallel Transporter, BTEQ, FastLoad, MultiLoad, TPump, FastExport, and Teradata SQL Assistant.
- Experience designing Azure cloud architectures and implementation plans for hosting complex application workloads on Microsoft Azure.
- Provided Azure technical expertise, including strategic design, architectural mentorship, assessments, and POCs, in support of the overall sales lifecycle and consulting engagement process.
- Excellent knowledge of integrating Azure Data Factory V1/V2 with a variety of data sources and processing data using pipelines, pipeline parameters, activities, activity parameters, and manual, window-based, and event-based job scheduling.
- Experience designing and developing Azure Stream Analytics jobs to process real-time data from Azure Event Hubs.
- Experience with cloud-based storage systems such as S3, Azure Blob Storage, and Azure Data Lake Storage Gen1 and Gen2.
- Experience implementing data pipelines using Azure Data Factory.
- In-depth knowledge of Spark architecture, including Spark Core, Spark SQL, DataFrames, and Spark Streaming.
- Worked on data warehouse design, implementation, and support (SQL Server, Azure SQL DB, Azure SQL Data Warehouse, Teradata).
- Experience implementing ETL and ELT solutions over large data sets.
- Expertise in querying and testing RDBMSs such as Teradata, Oracle, and SQL Server using SQL to verify data integrity.
- Extensive experience in Teradata 12/13/14/15.x for developing ETL and ELT architectures.
- Proficient in data modeling techniques using star schema, snowflake schema, fact and dimension tables, RDBMS concepts, and physical and logical data modeling for data warehouses and data marts.
- Technical expertise in ETL methodologies and Informatica 9.x/8.6/8.5/8.1: PowerCenter, PowerMart, client tools (Mapping Designer, Workflow Manager/Monitor), and server tools.
- Excellent communication and interpersonal skills; proactive, dedicated, and keen to learn new technologies and tools.
- Strong commitment to quality, with experience in ensuring compliance with coding standards and the review process.
TECHNICAL SKILLS
Azure Cloud Platform: Azure Data Factory V2, Azure Blob Storage, Azure Data Lake Storage Gen1 & Gen2, Azure SQL DB, SQL Server, Logic Apps, Azure Synapse, Azure Analysis Services, Databricks, Azure Cosmos DB, Azure Stream Analytics, Azure Event Hubs, Key Vault, Azure App Services, Event Grid, Service Bus, Azure DevOps, ARM Templates
Databases: Azure SQL Data Warehouse, Azure SQL DB, Azure Cosmos DB, Teradata, Oracle, MySQL, Microsoft SQL Server
Programming Languages: Python, PySpark, T-SQL, Linux shell scripting, Azure PowerShell
ETL Tools: Teradata SQL Assistant, TPT, BTEQ, FastLoad, MultiLoad, FastExport, TPump, Informatica PowerCenter 9.x/8.6/8.5/8.1
Big data Technologies: Hadoop, Hive, HDFS, Apache Kafka, Apache Spark
Data Modeling: Erwin, Visio
PROFESSIONAL EXPERIENCE
Confidential, Sunnyvale, CA
Azure Data Engineer
Responsibilities:
- Met with business/user groups to understand business processes, gather requirements, and carry out analysis, design, development, and implementation according to client requirements.
- Designed and developed Azure Data Factory (ADF) pipelines extensively to ingest data from relational and non-relational source systems and meet business functional requirements.
- Designed and developed event-driven architectures using blob triggers and Data Factory.
- Created pipelines, data flows, and complex data transformations and manipulations using ADF and PySpark on Databricks (a representative sketch follows this list).
- Automated jobs using different ADF triggers: event, schedule, and tumbling window.
- Created and provisioned Databricks clusters, notebooks, and jobs, and configured autoscaling.
- Ingested a huge volume and variety of data from disparate source systems into Azure Data Lake Storage Gen2 using Azure Data Factory V2.
- Created several Databricks Spark jobs in PySpark to perform table-to-table operations.
- Performed data transformations using the ADF Data Flow activity.
- Implemented Azure and self-hosted integration runtimes in ADF.
- Developed streaming pipelines using Apache Spark with Python (a streaming sketch appears at the end of this section).
- Created and provisioned multiple Databricks clusters for batch and continuous streaming data processing and installed the required libraries on the clusters.
- Improved performance by reducing the compute time needed to process streaming data and saved the company cost by optimizing cluster run time.
- Performed ongoing monitoring, automation, and refinement of data engineering solutions.
- Designed and developed a new solution to process near-real-time (NRT) data using Azure Stream Analytics, Azure Event Hubs, and Service Bus queues.
- Created linked services to land data from an SFTP location into Azure Data Lake.
- Extensively used the SQL Server Import and Export Data tool.
- Worked with complex SQL views, stored procedures, triggers, and packages in large databases across various servers.
- Experience working with both agile and waterfall methodologies in a fast-paced environment.
- Generated alerts on daily event metrics for the product team.
- Extensively used SQL queries to verify and validate database updates.
- Suggested fixes for complex issues through thorough root-cause and impact analysis of defects.
- Provided 24/7 on-call production support for various applications, resolved night-time production job failures, and attended conference calls with business operations and system managers to resolve issues.
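A minimal PySpark sketch of the kind of table-to-table Databricks transformation described above; the database, table, and column names and the ADLS path mentioned in the comments are hypothetical placeholders, not the actual client schema.

```python
# Hypothetical Databricks PySpark job: read a raw table, apply basic
# cleansing/derivation, and write a curated table. Names and paths are
# illustrative placeholders only.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_curation").getOrCreate()

# Read the raw table registered in the metastore (placeholder name).
raw_orders = spark.table("raw_db.orders")

# Basic cleansing and derivation: drop duplicates, standardize a status
# column, and add a load date for downstream auditing.
curated_orders = (
    raw_orders
    .dropDuplicates(["order_id"])
    .withColumn("order_status", F.upper(F.trim(F.col("order_status"))))
    .withColumn("load_date", F.current_date())
)

# Write the curated result back as a managed table (or to an ADLS Gen2
# location such as an abfss:// path, if external storage is preferred).
curated_orders.write.mode("overwrite").saveAsTable("curated_db.orders")
```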
Environment: Azure Data Factory (ADF v2), Azure SQL Database, Azure Function Apps, Azure Data Lake, Blob Storage, SQL Server, Windows Remote Desktop, UNIX shell scripting, Azure PowerShell, Databricks, Python, ADLS Gen2, Azure Cosmos DB, Azure Event Hubs, Azure Machine Learning.
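A hedged Structured Streaming sketch along the lines of the streaming pipelines noted above. It reads from a Kafka-compatible endpoint (Azure Event Hubs exposes one) rather than the native Event Hubs connector, and the broker address, topic, event schema, and output/checkpoint paths are illustrative assumptions.

```python
# Hypothetical Spark Structured Streaming pipeline: consume events from a
# Kafka-compatible endpoint, parse the JSON payload, and append the result
# to a data lake path. Broker, topic, schema, and paths are placeholders.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("events_stream").getOrCreate()

event_schema = StructType([
    StructField("device_id", StringType()),
    StructField("metric", DoubleType()),
    StructField("event_time", StringType()),
])

# Read the raw event stream (requires the spark-sql-kafka package on the cluster).
raw_stream = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder endpoint
    .option("subscribe", "telemetry")                  # placeholder topic
    .load()
)

# Kafka delivers the payload as bytes; cast to string and parse the JSON body.
parsed = (
    raw_stream
    .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
    .withColumn("event_time", F.to_timestamp("event_time"))
)

# Append to storage with a checkpoint so the job can restart safely.
query = (
    parsed.writeStream
    .format("parquet")
    .option("path", "/mnt/datalake/telemetry")                     # placeholder output
    .option("checkpointLocation", "/mnt/datalake/_chk/telemetry")  # placeholder checkpoint
    .outputMode("append")
    .start()
)
```

The checkpoint location is what lets the stream restart without reprocessing events that were already committed.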
Confidential, Charlotte, NC
Teradata/ETL Consultant
Responsibilities:
- Involved in understanding the requirements of end users/business analysts and developed strategies for ETL processes.
- Worked closely with analysts to come up with detailed solution approach design documents.
- Developed Teradata macros and stored procedures to load data into incremental/staging tables, move data from staging to journal tables, and then move data from journal into base tables.
- Wrote UNIX shell scripts to support and automate the ETL process.
- Provided scalable, high-speed, parallel data extraction, loading, and updating using TPT.
- Performed query optimization with the help of explain plans, collected statistics, and primary and secondary indexes.
- Used volatile tables and derived queries to break complex queries into simpler ones; streamlined the migration process for Teradata scripts and shell scripts on the UNIX box.
- Extracted data from various source systems such as Oracle, SQL Server, and flat files per the requirements.
- Used Informatica Designer to create complex mappings using transformations such as Filter, Router, connected and unconnected Lookup, Stored Procedure, Joiner, Update Strategy, Expression, and Aggregator to pipeline data to the data mart.
- Used Informatica Workflow Manager to create, schedule, execute, and monitor sessions, worklets, and workflows.
- Performed data profiling and analysis using Informatica Data Explorer (IDE) and Informatica Data Quality (IDQ), supporting software development and data management projects that prepare large datasets for delivery to a shared cloud computing and storage environment.
- Provided strategies and requirements for the seamless migration of applications, web services, and data from local and server-based systems to the AWS cloud.
- Performed transformations, cleansing, and filtering on imported data using Hive (see the sketch at the end of this section).
- Worked with Kafka to bring data from source systems and land it in HDFS for filtering.
- Implemented Hadoop security on a Hortonworks cluster using Kerberos and two-way SSL.
- Extracted data from Teradata into HDFS using Sqoop and exported the analyzed patterns back to Teradata using Sqoop.
- Worked with Tableau for reporting needs.
- Created several Tableau dashboard reports and heat-map charts, and supported numerous dashboards, pie charts, and heat-map charts built on the Teradata database.
- Suggested fixes for complex issues through thorough root-cause and impact analysis of defects.
Environment: Teradata 16, Informatica PowerCenter 9.5, Workflow Manager, Workflow Monitor, Warehouse Designer, Source Analyzer, Transformation Developer, Mapplet Designer, Mapping Designer, Repository Manager, Informatica Cloud, Informatica Data Quality (IDQ), UC4, Control-M, Tableau, UNIX, SSH (secure shell), TOAD, Erwin.
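A minimal sketch of the Hive cleansing/filtering step referenced above, expressed here as Spark SQL over Hive-managed tables rather than HiveQL (an assumption for illustration); the database, table, and column names are hypothetical placeholders.

```python
# Hypothetical Spark-on-Hive sketch approximating the Hive cleansing/filtering
# step: read a Hive-managed staging table, filter bad records, and write the
# cleaned result to another Hive table. All names are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("hive_cleanup")
    .enableHiveSupport()  # resolve table names through the Hive metastore
    .getOrCreate()
)

staged = spark.table("staging_db.customer_raw")

cleaned = (
    staged
    .filter(F.col("customer_id").isNotNull())               # drop orphan rows
    .withColumn("email", F.lower(F.trim(F.col("email"))))   # normalize emails
    .dropDuplicates(["customer_id"])
)

cleaned.write.mode("overwrite").saveAsTable("curated_db.customer")
```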
Confidential, Bloomfield, CT
Teradata/ETL Consultant
Responsibilities:
- Involved in requirements gathering, business analysis, design, development, testing, and implementation of business rules.
- Analyzed the requirements and the existing environment to help determine the right load and extract strategy for the data warehouse.
- Prepared ETL specifications, mapping documents, and the ETL framework specification.
- Implemented slowly changing dimension (SCD) logic in mappings to effectively handle change data capture, which is typical in data warehousing systems.
- Prepared functional and technical specifications and design documents.
- Responsible for data profiling, data cleansing, and data conformance.
- Actively participated in the code migration process to higher environments and created the associated documentation.
- Created BTEQ scripts to preload work tables prior to the main load process (see the sketch at the end of this section).
- Proficient in Teradata EXPLAIN plans, the COLLECT STATISTICS option, secondary indexes (USI, NUSI), partitioned primary indexes (PPI), and volatile, global temporary, and derived tables.
- Reviewed SQL for missing joins, join constraints, data format issues, mismatched aliases, and casting errors.
- Used Teradata Manager, BTEQ, FastLoad, MultiLoad, TPump, SQL, and TASM for workload management.
- Wrote various TPT scripts for ad hoc requirements and used tdload to export data from one environment to another via TPT.
- Involved in data modeling to identify gaps with respect to business requirements and to translate the business rules.
- Performed workload management using tools such as Teradata Manager, FastLoad, MultiLoad, TPump, TPT, and SQL Assistant.
- Developed scripts using Teradata Parallel Transporter and implemented the extraction, transformation, and loading of data with TPT.
- Identified performance bottlenecks in production processes and key places where SQL could be tuned to improve the overall performance of the production process.
- Developed UNIX scripts to automate various tasks involved in the loading process.
Environment: Teradata 15, Teradata SQL Assistant, Informatica PowerCenter 9.5, TPump, BTEQ, MultiLoad, FastLoad, FastExport, Erwin Designer, Tableau, Power BI, UNIX, Korn shell scripts.
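The BTEQ work-table preload referenced above can be approximated in Python with the open-source teradatasql driver; the sketch below is an assumption-laden illustration (host, credentials, and table names are placeholders), not the original BTEQ implementation.

```python
# Hypothetical Python equivalent of the BTEQ work-table preload: clear the
# work table, repopulate it from staging, and refresh statistics before the
# main load runs. Connection details and table names are placeholders, and
# the original process used a BTEQ script rather than this driver.
import teradatasql

PRELOAD_STATEMENTS = [
    "DELETE FROM work_db.orders_wt",
    """INSERT INTO work_db.orders_wt (order_id, order_dt, amount)
       SELECT order_id, order_dt, amount
       FROM stg_db.orders_stg
       WHERE load_dt = CURRENT_DATE""",
    "COLLECT STATISTICS COLUMN (order_id) ON work_db.orders_wt",
]

# Placeholder credentials; in practice these would come from a secrets store.
with teradatasql.connect(host="tdprod", user="etl_user", password="***") as con:
    with con.cursor() as cur:
        for stmt in PRELOAD_STATEMENTS:
            cur.execute(stmt)
```

In the original process a similar DELETE / INSERT-SELECT / COLLECT STATISTICS sequence would run inside a BTEQ script driven by the UNIX load scripts.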