Azure Cloud Data Engineer Resume
Plano, TX
SUMMARY
- 7+ years of experience in data warehousing, with exposure to cloud architecture design, modelling, development, testing, maintenance, and customer support environments across multiple domains including Insurance, Financial, Telecom, and Banking.
- 2+ years of experience with Azure Cloud: Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Azure Analysis Services, Azure Cosmos DB (NoSQL), Azure HDInsight big data technologies (Hadoop and Apache Spark), and Databricks.
- Experience designing Azure cloud architecture and implementation plans for hosting complex application workloads on Microsoft Azure.
- Experience reading continuous JSON data from different source systems via Kafka into Databricks Delta, processing it with Spark Structured Streaming and PySpark, and writing the output as Parquet files (a minimal sketch follows this summary).
- Created data pipelines for batch, micro-batch streaming, and continuous streaming processing in Databricks, serving high-latency, low-latency, and ultra-low-latency data respectively, using built-in Apache Spark modules.
- Well versed in creating pipelines in Azure Data Factory (ADF v2) using activities such as Move & Transform, Copy, Filter, ForEach, and Databricks.
- Provided Azure technical expertise, including strategic design and architectural mentorship, assessments, and POCs, in support of the overall sales lifecycle or consulting engagement process.
- Hands-on experience with Hadoop ecosystem components such as Hadoop, Spark, HDFS, YARN, Tez, Hive, Sqoop, MapReduce, Pig, Oozie, Kafka, Storm, and HBase.
- Experience designing and developing Azure Stream Analytics jobs to process real-time data using Azure Event Hubs, Azure IoT Hub, and Service Bus queues.
- In-depth understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, and Spark Streaming.
- Expertise in using Spark SQL and U-SQL with data sources such as JSON, Parquet, and Hive.
- Experience writing HQL queries against the Hive data warehouse, performance tuning Hive scripts, resolving automated job failures, and reloading data into the Hive data warehouse when needed.
- Experience using accumulators, broadcast variables, and RDD caching with Spark Streaming.
- Experience in data processing tasks such as collecting, aggregating, and moving data from various sources using Apache Kafka.
- Strong experience writing Python applications using libraries such as Pandas, NumPy, SciPy, and Matplotlib.
- Good understanding of NoSQL databases and hands-on experience writing applications against NoSQL stores such as Cosmos DB.
- Extensive experience in data modelling: designing conceptual and logical data models and translating them into physical data models for high-volume datasets from sources such as Oracle, Teradata, Vertica, and SQL Server using the Erwin tool.
- 5+ years of proficient experience with the Teradata database and Teradata load/unload utilities (FastLoad, FastExport, MultiLoad, TPump, BTEQ, TPT, and TPT API).
- Expert knowledge and experience in business intelligence data architecture, data management, and modelling to integrate multiple complex data sources, both transactional and non-transactional, structured and unstructured.
- 4+ years of expert knowledge working with Informatica Power Center 9.6.x/8.x/7.x (Designer, Repository Manager, Repository Server Administration Console, Server Manager, Workflow Manager, Workflow Monitor).
- Design and develop relational databases for collecting and storing data, and design and build data input and data collection mechanisms.
- Well versed in relational and dimensional modelling techniques such as star and snowflake schemas, OLTP, OLAP, normalization, and fact and dimension tables.
- 4+ years of working experience in Vertica data architecture, designing and writing vsql scripts.
- Good knowledge of creating SQL queries, collecting statistics, Teradata SQL query performance tuning techniques, and reading Optimizer/EXPLAIN plans.
- Well versed in UNIX shell scripting.
- Self-motivated and hardworking, with strong analytical and problem-solving skills; results-oriented with a spirit of teamwork and effective communication and interpersonal skills. Eager to learn, quick to adapt, well organized, and reliable.
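Below is a minimal, illustrative PySpark sketch of the Kafka-to-Databricks Delta streaming pattern summarized above; the broker address, topic name, event schema, and storage paths are hypothetical placeholders rather than values from any actual engagement, and the Kafka connector is assumed to be available on the cluster (it is bundled with the Databricks runtime).

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("kafka-to-delta").getOrCreate()

# Hypothetical schema for the incoming JSON events
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
    StructField("payload", StringType()),
])

# Read the continuous JSON stream from Kafka (placeholder broker/topic names)
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")
       .option("subscribe", "source_events")
       .option("startingOffsets", "latest")
       .load())

# Parse the Kafka value bytes into typed columns
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), event_schema).alias("e"))
          .select("e.*"))

# Write to a Delta table (Parquet files under the hood), checkpointed for restartability
query = (events.writeStream
         .format("delta")
         .outputMode("append")
         .option("checkpointLocation", "/mnt/datalake/checkpoints/source_events")
         .start("/mnt/datalake/delta/source_events"))
```

Delta stores its data as Parquet files, and the checkpoint location lets the stream restart from where it left off.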
TECHNICAL SKILLS
Azure Cloud Platform: ADFv2, BLOB Storage, ADLS, Azure SQL DB, SQL Server, Azure Synapse, Azure Analysis Services, Databricks, Mapping Data Flow (MDF), Azure Cosmos DB, Azure Stream Analytics, Azure Event Hub, Azure Machine Learning, App Services, Logic Apps, Event Grid, Service Bus, Azure DevOps, Git Repository Management, ARM Templates
Teradata Tools and Utilities: FastLoad, FastExport, MultiLoad, TPump, TPT, Teradata SQL Assistant, BTEQ
Modelling & DA Specs Tools: CA Erwin Data Modeler, MS Visio
ETL Tools: Informatica Power Center 10.x/9.x/8.6/8.5/8.1/7, DataStage 11.x/9.x, SSIS
Programming Languages: PySpark, Python, U-SQL, T-SQL, Linux Shell Scripting, Azure PowerShell
Big data Technologies: Hadoop, HDFS, Hive, Apache Spark, Apache Kafka, Pig, Zookeeper, Sqoop, Oozie, HBase, YARN
Databases: Azure SQL Data Warehouse, Azure SQL DB, Azure Cosmos DB (NoSQL), Teradata, Vertica, RDBMS, MySQL, Oracle, Microsoft SQL Server
IDE and Tools: Eclipse, Tableau, IntelliJ, R Studio, SSMS, Maven, SBT, MS-Project, GitHub, Microsoft Visual Studio
Scheduler Tools: Autosys Scheduler, Control-M, Active Batch
Methodologies: Waterfall, Agile/Scrum, SDLC
PROFESSIONAL EXPERIENCE
Confidential, Plano, TX
Azure Cloud Data Engineer
Responsibilities:
- Attended requirement calls and worked with Business Analysts and Solution Architects to understand client requirements.
- Analyzed the data flow from different sources to targets to provide the corresponding design architecture in the Azure environment.
- Took initiative and ownership to deliver business solutions on time.
- Created high-level technical design documents and application design documents per the requirements, and delivered clear, well-communicated, and complete design documents.
- Created DA specs and Mapping Data Flows and provided the details to developers along with the HLDs.
- Created an Application Interface Document for the downstream team to build a new interface to transfer and receive files through Azure Data Share.
- Created pipelines, data flows, and complex data transformations and manipulations using ADF and PySpark with Databricks.
- Ingested data in mini-batches and performed RDD transformations on those mini-batches using Spark Streaming to deliver streaming analytics in Databricks.
- Created and provisioned the Databricks clusters needed for batch and continuous streaming data processing, and installed the required libraries on the clusters.
- Improved performance by optimizing compute time for the streaming data and saved the company cost by optimizing cluster run time.
- Performed ongoing monitoring, automation, and refinement of data engineering solutions; prepared complex SQL views and stored procedures in Azure SQL DW and Hyperscale.
- Loaded files from ADLS into the target Azure data warehouse using U-SQL scripts.
- Worked on complex U-SQL scripts for data transformation, table loading, and report generation.
- Designed and developed a new solution to process near-real-time (NRT) data using Azure Stream Analytics, Azure Event Hubs, and Service Bus queues.
- Created a linked service to land data from the Caesars SFTP location into Azure Data Lake.
- Created numerous pipelines in Azure Data Factory v2 to pull data from source databases such as Informix and Sybase, using activities like Move & Transform, Copy, Filter, ForEach, and Databricks.
- Created several Databricks Spark jobs with PySpark to perform table-to-table operations (see the sketch after these responsibilities).
- Extensively used SQL Server Import and Export Data tool.
- Created database users, logins, and permissions as part of environment setup.
- Worked with complex SQL, stored procedures, triggers, and packages in large databases across various servers.
- Created Data Lake Analytics accounts and Data Lake Analytics jobs in the Azure Portal using U-SQL scripts.
- Helped team members resolve technical issues; handled troubleshooting and project risk and issue identification and management.
- Addressed resource issues and conducted monthly one-on-ones and weekly team meetings.
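As referenced in the Databricks bullet above, here is a minimal sketch of a table-to-table PySpark job of the kind described; the database, table, and column names are hypothetical, and `spark` is the session that Databricks provides in a notebook or job.

```python
from pyspark.sql import functions as F

# Read a landed (bronze) table, apply transformations, and publish a curated (silver) table.
# All table and column names below are hypothetical placeholders.
orders = spark.table("bronze.orders")
customers = spark.table("bronze.customers")

daily_sales = (orders
               .filter(F.col("order_status") == "COMPLETED")
               .join(customers, "customer_id", "left")
               .groupBy(F.to_date("order_ts").alias("order_date"), "region")
               .agg(F.sum("order_amount").alias("total_sales"),
                    F.countDistinct("customer_id").alias("unique_customers")))

# Register the curated result in the metastore so downstream jobs and reports can query it
(daily_sales.write
 .format("delta")
 .mode("overwrite")
 .saveAsTable("silver.daily_sales_by_region"))
```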
Environment: Azure Cloud, Azure Data Factory (ADF v2), Azure Function Apps, Azure Data Lake, BLOB Storage, SQL Server, Teradata utilities, Windows Remote Desktop, UNIX Shell Scripting, Azure PowerShell, Databricks, Python, Erwin Data Modelling Tool, Azure Cosmos DB, Azure Stream Analytics, Azure Event Hub, Azure Machine Learning.
Confidential, Plano, TX
Azure Cloud Data Engineer
Responsibilities:
- Worked with the Hortonworks distribution of Hadoop.
- Played a lead role in architecting and developing the Confidential data lake and building the Confidential data cube on a Microsoft Azure HDInsight cluster.
- Responsible for managing data coming from disparate data sources.
- Ingested incremental updates from Inform web services onto the Hadoop data platform using Sqoop.
- Implemented OLAP multi-dimensional cube functionality using Azure SQL Data Warehouse.
- Worked with RESTful APIs.
- Created HBase tables to store various data formats coming from different applications.
- Developed Linux shell scripts for extracting and processing EDI POS sales data sourced from an SFTP server into the Hive data warehouse.
- Implemented a proof of concept to analyze streaming data using Apache Spark with Python; used Maven/SBT to build and deploy the Spark programs.
- Built the Confidential data cube on the Spark framework by writing Spark SQL queries in Python to improve data processing efficiency and reporting query response time (see the sketch after these responsibilities).
- Developed Spark code in Python within Databricks notebooks.
- Performance tuned Sqoop, Hive, and Spark jobs.
- Responsible for modification of ETL data load scripts, scheduling automated jobs and resolving production issues (if any) on time.
- Wrote Azure PowerShell scripts to copy or move data from the local file system to HDFS/Blob storage.
- Developed Oozie workflows to automate the ETL process by scheduling multiple Sqoop and Hive jobs.
- Monitored cluster status and health daily using the Ambari UI.
- Maintained technical documentation for launching and executing jobs on Hadoop clusters.
- Involved in a story-driven agile development methodology and actively participated in daily scrum meetings.
- Responsible for programming code independently for intermediate to complex modules following development standards.
- Planned and conducted code reviews for changes and enhancements to ensure standards compliance and systems interoperability.
- Responsible for modifying the code, debugging, and testing the code before deploying on the production cluster.
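As noted in the data cube bullet above, the sketch below shows one way such a multi-dimensional aggregate can be built with PySpark's cube() over a Hive table; the table and column names are hypothetical, and `spark` is the session provided by the notebook environment.

```python
from pyspark.sql import functions as F

# Build a multi-dimensional aggregate ("data cube") over a Hive table.
# Table and column names are hypothetical placeholders.
sales = spark.table("sales_db.pos_sales")

sales_cube = (sales
              .cube("region", "store_id", "product_category")
              .agg(F.sum("sale_amount").alias("total_sales"),
                   F.count(F.lit(1)).alias("txn_count")))

# Persist the cube back to Hive for downstream reporting queries
sales_cube.write.mode("overwrite").saveAsTable("sales_db.pos_sales_cube")
```

Pre-aggregating along all dimension combinations this way is what lets reporting queries answer from the cube table instead of re-scanning the detail data.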
Environment: Microsoft Azure HDInsight, Hadoop stack, Sqoop, Hive, Oozie, Microsoft SQL Server, HBase, YARN, Hortonworks, UNIX Shell Scripting, Azure PowerShell, Databricks, Python, Erwin Data Modelling Tool, Azure Cosmos DB, Azure Data Factory (ADF v2), Azure Function Apps, web services, Azure Data Lake, BLOB Storage, Azure SQL DB, Azure SQL Data Warehouse.
Confidential, San Jose, CA
ETL/Teradata Developer
Responsibilities:
- Collaborate with Lead Developers, System Analysts, Business Users, Architects, Test Analysts, Project Managers and peer developers to analyze system requirements.
- Worked with SQL, PL/SQL procedures and functions, stored procedures and packages within the mappings.
- Involved in all activities related to the development, implementation, and support of ETL processes using Informatica Power Center 10.x.
- Worked with most of the transformations such as the Source Qualifier, Expression, Aggregator and Connected & Unconnected lookups, Filter, Router, Sequence Generator, Sorter, Joiner, SQL and Update Strategy.
- Worked with ETL developers to create external batches to execute mappings and mapplets using Informatica Workflow Designer, integrating Shire's data from varied sources such as Oracle, DB2, flat files, and SQL databases and loading it into the landing tables of the Informatica MDM Hub.
- Developed complex stored procedures using input/output parameters, cursors, views, and triggers, and complex queries using temp tables and joins.
- Developed scripts for loading data into tables using the Teradata FastLoad, MultiLoad, and BTEQ utilities (see the sketch after these responsibilities).
- Used Control-M to schedule jobs.
- Used a snowflake schema, with dimension tables joined to the fact table.
- Involved in requirement analysis and ETL design and development for extracting data from source systems such as Salesforce, Mainframe, DB2, Sybase, Oracle, and flat files.
- Responsible for identifying bottlenecks and fixing them through performance tuning.
- Extensively involved in analysis, design, and modelling; worked on snowflake schemas, data modelling, data elements, issue/question resolution logs, source-to-target mappings, interface matrices, and design elements.
- Designed and developed logical and physical data models utilizing concepts such as star schema, snowflake schema, and slowly changing dimensions.
- Worked with test-driven development and conducted unit testing, system testing, and user acceptance testing.
- Created deployment packages to deploy the developed Informatica mappings, mapplets, worklets, and workflows into higher environments.
- Troubleshot deployment issues and coordinated deploying the code to production on the target date.
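The load scripts referenced above were written with the Teradata utilities themselves (FastLoad, MultiLoad, BTEQ); purely as an adjacent illustration in Python, the sketch below performs the same kind of staged batch insert with the open-source teradatasql driver. The host, credentials, file name, and table name are hypothetical placeholders.

```python
import csv
import teradatasql  # Teradata SQL Driver for Python

# Hypothetical connection details for a Teradata system
with teradatasql.connect(host="tdprod.example.com", user="etl_user", password="***") as con:
    with con.cursor() as cur:
        # Read the staged flat file into memory (hypothetical layout)
        rows = []
        with open("daily_policy_feed.csv", newline="") as fh:
            for rec in csv.DictReader(fh):
                rows.append((rec["policy_id"], rec["customer_id"], rec["premium_amt"]))

        # Batch insert into the staging table using "?" parameter markers
        cur.executemany(
            "INSERT INTO stg.policy_feed (policy_id, customer_id, premium_amt) VALUES (?, ?, ?)",
            rows,
        )
```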
Environment: Teradata SQL Assistant, Informatica Power Center, Snowflake, Outlook, Jenkins, MLOAD, TPUMP, FAST LOAD, FAST EXPORT, TPT, uDeploy.
Confidential, Tennessee, TN
ETL/Teradata Developer
Responsibilities:
- Involved in understanding the requirements of end users and business analysts and developed strategies for the ETL processes.
- Performed analysis of complex business issues and provided recommendations for possible solutions; wrote SQL queries.
- Extracted data from flat files (provided by disparate ERP systems) and loaded the data into Teradata staging using Informatica Power Center.
- Identify key data or components that fit within the business system/process and document the gaps that need solutions.
- Programmed with T-SQL in SQL Server and PL/SQL in Oracle, worked with Microsoft SSRS and Crystal Reports, and configured permits and licenses for individuals, properties, and businesses in the CSDC application suite.
- Extracted data from different sources such as flat files (pipe-delimited or fixed-width), Excel spreadsheets, and databases (see the sketch after these responsibilities).
- Used the Teradata utilities BTEQ, FastLoad, MultiLoad, and TPump to load data.
- Wrote, tested, and implemented Teradata FastLoad, MultiLoad, and BTEQ scripts, DML, and DDL.
- Managed all development and support efforts for the Data Integration/Data Warehouse team.
- Set and followed Informatica best practices, such as creating shared objects in shared folders for reusability and standard naming conventions for ETL objects; designed complex Informatica transformations, mapplets, mappings, reusable sessions, worklets, and workflows.
- Used the Informatica Data Quality tool (Developer) to scrub, standardize, and match customer addresses against the reference table, and performed unit testing for all the interfaces.
- Worked on creating Technical Design Documents (TDD) by performing the impact analysis on the application for the new functionality changes.
- Performance tuned and optimized various complex SQL queries.
- Used BTEQ and SQL Assistant (Query man) front-end tools to issue SQL commands matching the business requirements to Teradata RDBMS.
- Coordinated with the business analysts and developers to discuss issues in interpreting the requirements.
- Provided on-call support during releases of the product from lower-level to higher-level production environments.
- Used an Agile methodology with repeated testing cycles.
- Involved in unit testing and user acceptance testing to verify that data extracted from the different source systems loaded into the target according to user requirements.
- Prepared BTEQ import and export scripts for tables.
- Interacted with the source team and the business to validate the data and perform end-to-end testing.
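The flat-file extraction above was done through Informatica; as a small Python-side illustration of parsing the same kinds of files (pipe-delimited and fixed-width), the sketch below uses pandas with hypothetical file names, column widths, and column names.

```python
import pandas as pd

# Pipe-delimited flat file (hypothetical layout), read everything as text first
pos_sales = pd.read_csv("pos_sales.dat", sep="|", dtype=str)

# Fixed-width flat file: column boundaries and names are hypothetical placeholders
gl_feed = pd.read_fwf(
    "gl_feed.txt",
    colspecs=[(0, 10), (10, 30), (30, 42)],
    names=["account_id", "description", "amount"],
    dtype=str,
)

# Light standardization before handing off to the staging load
pos_sales.columns = [c.strip().lower() for c in pos_sales.columns]
gl_feed["amount"] = gl_feed["amount"].astype(float)

print(pos_sales.head(), gl_feed.head(), sep="\n")
```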
Environment: Oracle, Teradata, Teradata SQL Assistant, Informatica Power center, SQL, MLOAD, TPUMP, FAST LOAD, FAST EXPORT, Control-M.
Confidential
ETL Developer
Responsibilities:
- Gathered requirements from data analysts for development of the system.
- Extensively used ETL to load flat files, XML files, Oracle, and legacy data as sources, with Oracle and flat files as targets.
- Created mappings using Mapping Designer to load data from various sources, using transformations such as Source Qualifier, Expression, Lookup (connected and unconnected), Aggregator, Update Strategy, Joiner, Filter, and Sorter.
- Worked with Data Quality group to identify and research data quality issues.
- Extensive experience in performance tuning: identified and fixed bottlenecks and tuned complex Informatica mappings for better performance.
- Involved in debugging the mappings by creating breakpoints to gain troubleshooting information about data and error conditions.
- Set up Informatica schedules to execute the designed workflows in a timely fashion.
- Developed various scripts for the Teradata utilities.
- Developed unit test cases to ensure successful execution of the data loading processes.
- Assisted the QA team in finding and implementing fixes for production issues.
Environment: Informatica Power Center 8.6.1, Oracle 10g, Teradata, PL/SQL, Oracle SQL Developer, flat files, UNIX.