Azure Data Engineer Resume
Nashville, TN
SUMMARY
- 8 years of IT industry experience, including 3 years working with Big Data and 5 years utilizing Azure cloud services.
- Understanding of Azure Cloud, Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Azure Analysis Services, and Azure Cosmos DB (NoSQL), as well as Azure Big Data technologies (Hadoop and Apache Spark).
- Experience developing, supporting, and maintaining ETL (Extract, Transform, and Load) processes using Talend Integration Suite.
- Proficient in using Informatica to build highly complex mappings, reusable transformations, sessions, and workflows to extract data from various sources and load it into targets.
- Proficiency with a range of databases, including MS SQL Server, Cassandra, MongoDB, and MySQL.
- Experience building Spark applications that leverage Spark SQL in Databricks to extract, process, and aggregate data from a range of file formats to gain insights into user behavior (a brief sketch follows this summary).
- Worked with file formats including Avro, Parquet, SequenceFile, JSON, ORC, and plain text for data loading, parsing, information gathering, and transformation.
- Excellent experience working with Hortonworks and Cloudera's Apache Hadoop distributions.
- Designed and constructed Hive external tables with bucketing, indexing, and both static and dynamic partitioning on a shared metastore.
- Evaluated the use of SparkContext, Spark SQL, DataFrames, and pair RDDs to improve the efficiency and effectiveness of existing Hadoop workloads.
- Extensive hands-on experience tuning Spark jobs.
- Experienced in using HiveQL to work with structured data and in optimizing Hive queries.
- Familiar with Python libraries such as PySpark, NumPy, Pandas, Star Base, and Matplotlib.
- Writing complex SQL queries using joins, group by, and nested queries.
- Proven ability to write HBase queries and load data using NoSQL connectors.
- Extensive knowledge of exploratory data analysis, statistical analysis, and visualization using R, Python, SQL, and Tableau.
- In-depth familiarity with Snowflake cloud technology.
- Practical understanding of using Flume and Kafka to load log data from multiple sources directly into HDFS.
- Excellent knowledge of Teradata features such as UPI, NUPI, and secondary indexes, and of Teradata utilities including BTEQ, FastLoad, MultiLoad, and SQL Assistant.
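A minimal sketch of the Spark SQL work in Databricks described above, assuming hypothetical mount paths, column names, and a clickstream-style schema:

```python
# Minimal PySpark sketch: read mixed file formats and aggregate with Spark SQL.
# Paths and column names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("user-behavior").getOrCreate()

# Load the same logical event data from two of the formats listed above.
events_json = spark.read.json("/mnt/raw/events/json/")
events_parquet = spark.read.parquet("/mnt/raw/events/parquet/")

# Union the sources and expose them to Spark SQL.
events_json.unionByName(events_parquet).createOrReplaceTempView("events")

# Aggregate events per user per day to surface behavior patterns.
daily_activity = spark.sql("""
    SELECT user_id, to_date(event_ts) AS event_date, COUNT(*) AS events
    FROM events
    GROUP BY user_id, to_date(event_ts)
""")
daily_activity.write.mode("overwrite").parquet("/mnt/curated/daily_activity/")
```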
TECHNICAL SKILLS
Big Data Technologies: Hadoop, MapReduce, HDFS, Sqoop, Hive, HBase, Flume, Kafka, YARN, Apache Spark.
Databases: Oracle, MySQL, SQL Server, MongoDB, DynamoDB, Cassandra, Snowflake.
Programming Languages: Python, PySpark, Shell scripting, Perl scripting, SQL, Java.
Tools: PyCharm, Eclipse, Visual Studio, SQL*Plus, SQL Developer, SQL Navigator, SQL Server Management Studio, Postman.
Version Control: SVN, Git, GitHub, Maven
Operating Systems: Windows 10/7/XP/2000/NT/98/95, UNIX, Linux, macOS
Visualization/ Reporting: Tableau, ggplot2, matplotlib
PROFESSIONAL EXPERIENCE
Confidential, Nashville, TN
Azure Data Engineer
Responsibilities:
- Comprehend business requirements, analyze them, and convert them into operational and application requirements.
- Designed a one-time load approach for migrating large databases to Azure SQL DWH.
- Extracted, transformed, and loaded data from source systems to Azure data storage services using Azure Data Factory and HDInsight.
- Developed a system for handling rollback strategy, automatic batch pipeline restart, data profiling, and cleaning.
- Designed and implemented database solutions using Azure SQL Data Warehouse and Azure SQL Database.
- Led a team of six developers on the application migration.
- Protected sensitive data by using masking and encryption techniques.
- Implemented the SSIS Integration Runtime (IR) to run SSIS packages from ADF.
- Created a mapping document to map source columns to target columns.
- Created Azure Data Factory (ADF) pipelines using Azure Blob Storage.
- Performed ETL using Azure Databricks and replaced an on-premises Oracle ETL process with Azure Synapse Analytics.
- Automated script creation using Python and used Azure Databricks for data curation.
- Imported and transformed data using Hive together with PySpark, HDInsight, Azure SQL DW, and Azure Databricks.
- Designed and implemented Hive partitioning and bucketing.
- Implemented Kafka and Spark Structured Streaming for real-time data ingestion, using Azure Data Lake as the source and Azure Blob Storage to land the data (a brief sketch follows this list).
- Utilized ADF activities such as Stored Procedure, Lookup, Execute Pipeline, Data Flow, Copy Data, and Azure Function.
- Created a star schema to organize drilling data and developed PySpark procedures, functions, and packages to load it.
- Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics).
- Processed data in Azure Databricks after ingesting it into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure SQL DW).
- Responsible for sizing, monitoring, and troubleshooting the Spark Databricks cluster.
- Built SQL and Python Databricks notebooks and automated them using jobs.
- Built Spark clusters in Azure Databricks and configured high-concurrency clusters to speed up the preparation of high-quality data.
- Created and maintained an optimal data pipeline architecture on the Microsoft Azure cloud using Azure Databricks and Data Factory.
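A minimal sketch of the Kafka-to-Spark Structured Streaming ingestion noted above, assuming a hypothetical broker, topic, schema, and storage account:

```python
# Minimal PySpark Structured Streaming sketch: Kafka source to an Azure storage
# sink. Broker, topic, paths, and schema are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("realtime-ingest").getOrCreate()

schema = StructType([
    StructField("device_id", StringType()),
    StructField("reading", StringType()),
    StructField("event_ts", TimestampType()),
])

# Subscribe to the Kafka topic carrying the raw events.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "telemetry")
       .load())

# Kafka values arrive as bytes; cast to string and parse the JSON payload.
parsed = (raw.select(from_json(col("value").cast("string"), schema).alias("d"))
             .select("d.*"))

# Land the stream as Parquet in Azure storage, with checkpointing for restarts.
query = (parsed.writeStream
         .format("parquet")
         .option("path", "abfss://curated@mystorage.dfs.core.windows.net/telemetry/")
         .option("checkpointLocation", "abfss://curated@mystorage.dfs.core.windows.net/_checkpoints/telemetry/")
         .start())
query.awaitTermination()
```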
Environment: Hadoop, Hive, Impala, Oracle, Spark, Pig, Sqoop, Oozie, MapReduce, Teradata, SQL, Ab Initio, (S3, Redshift, CFT, EMR, CloudWatch), Kafka, Zookeeper, PySpark.
Confidential
Azure Data Engineer
Responsibilities:
- Participated in requirements gathering, analysis, design, development, change management, and deployment.
- Created a number of Spark applications in Scala to perform various enrichments on user profile and clickstream data.
- Deployed the data processing pipeline in Azure Data Factory (ADF) by creating JSON definitions for the SQL activity.
- Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL) and processed the data in Azure Databricks.
- Wrote Spark code in Python and Spark SQL to speed up testing and data processing, loading data into Spark RDDs and performing in-memory computations to produce results with lower memory usage.
- Developed a near-real-time data pipeline using Spark.
- Developed mapping documents, architecture documents, and ETL architecture.
- Created data conversion scripts and ETL (Extraction, Transformation, and Loading) procedures using the Pre-Stage, Stage, Pre-Target, and Target tables.
- Developed standards and best practices for ETL process documentation.
- Developed interactive reports, workbooks, and dashboards using Tableau.
- Created Python programs to decode JSON files and load the data into databases (a brief sketch follows this list). Worked with several data formats, including JSON and XML, and ran Python machine learning algorithms.
- Created simple to complex MapReduce jobs using Hive and Pig.
- Profiled structured, unstructured, and semi-structured data from a variety of sources to find patterns in the data.
- Applied data quality measures using the appropriate Python scripts or queries, depending on the source.
- Developed and implemented HIVE queries and functions for data evaluation, filtering, loading, and saving.
- Worked on PySpark APIs for data transformations.
- Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, then loaded the final data into HDFS.
- Examined the existing SQL scripts and designed the solution for the PySpark implementation.
- Extracted data from MongoDB using Sqoop and placed the files in HDFS.
- Used SQL queries and other tools to carry out data analysis and profiling.
- Participated in daily SCRUM meetings, sprint planning, showcases, and retrospectives while adhering to the agile process.
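A minimal sketch of the JSON-decoding load scripts noted above; the SQLite target, folder layout, and field names here are illustrative stand-ins for the actual databases and files:

```python
# Minimal Python sketch: decode JSON files and load records into a database.
# The SQLite target, directory layout, and field names are illustrative only.
import json
import sqlite3
from pathlib import Path

conn = sqlite3.connect("staging.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS orders (order_id TEXT, customer TEXT, amount REAL)"
)

for path in Path("incoming").glob("*.json"):
    with path.open() as fh:
        records = json.load(fh)  # each file is assumed to hold a list of order dicts
    conn.executemany(
        "INSERT INTO orders (order_id, customer, amount) VALUES (?, ?, ?)",
        [(r["order_id"], r["customer"], r["amount"]) for r in records],
    )

conn.commit()
conn.close()
```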
Environment: ADF, Databricks, ADLS, Spark, Hive, HBase, Sqoop, Flume, Blob Storage, Cosmos DB, MapReduce, HDFS, Cloudera, SQL, Apache Kafka, Azure, Python, Power BI, Unix, SQL Server.
Confidential
Big Data Developer
Responsibilities:
- Worked on requirements gathering, business analysis, and the technical design of Hadoop and Big Data systems.
- Participated in the implementation of Sqoop to load data from various RDBMS sources into Hadoop and vice versa.
- Created Python scripts to extract the data from the output files of the web server and load it into HDFS.
- Created a Python script that automates the EMR cluster launch and Hadoop application configuration.
- Worked extensively with Avro and Parquet files, parsing semi-structured JSON data and converting it to Parquet using DataFrames in PySpark.
- Participated in root cause analysis of system failures, recommended next steps, and documented the systems' processes and procedures for future use.
- Engaged in setting up the Hadoop cluster and distributing the load across the nodes.
- Participated in the configuration of several nodes utilizing the Hortonworks platform for Hadoop installation, commissioning, decommissioning, balancing, troubleshooting, monitoring, and debugging.
- Worked with Spark for interactive and batch analysis on top of YARN/MRv2.
- Managed and monitored the Hadoop cluster using Cloudera Manager.
- Built pipelines using Python and Shell scripting.
- Built a data pipeline to import Enterprise message delivery data into HDFS using Sqoop, HQL, Spark, and Kafka.
- Created workflows in Oozie and Airflow to automate pre-processing with Pig and Hive and load data into HDFS.
- Helped create and maintain technical documentation for everything from launching Hadoop clusters to running Hive queries and Pig scripts.
- Integrated Hadoop into conventional ETL to speed up the extraction, transformation, and loading of large volumes of semi-structured and unstructured data.
- Loaded unstructured data into the Hadoop Distributed File System (HDFS).
- Built Hive tables with efficient bucketing and both static and dynamic partitioning; external Hive tables were also constructed for staging (a brief sketch follows this list).
- Loaded data into Hive tables, wrote MapReduce queries, and created a custom BI product for management teams that use HiveQL for query analytics.
- Combined RDDs based on business needs and transformed them into DataFrames, which were then stored in HBase/Cassandra and RDBMSs as temporary Hive tables for processing.
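A minimal sketch of the partitioning and bucketing pattern noted above, expressed through PySpark's table APIs rather than raw HiveQL; table names, paths, and columns are hypothetical:

```python
# Minimal PySpark sketch: external Hive staging table, then a partitioned,
# bucketed curated table. Names, paths, and columns are hypothetical.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-staging")
         .enableHiveSupport()
         .getOrCreate())

# External staging table over raw Parquet files (HiveQL issued via spark.sql).
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS staging_events (
        user_id STRING, action STRING, event_ts TIMESTAMP, event_date STRING
    )
    STORED AS PARQUET
    LOCATION '/data/staging/events/'
""")

# Curated table partitioned by date and bucketed by user_id for efficient joins,
# written with Spark's DataFrame writer instead of a raw HiveQL CREATE TABLE.
(spark.table("staging_events")
     .write
     .mode("overwrite")
     .partitionBy("event_date")
     .bucketBy(32, "user_id")
     .sortBy("user_id")
     .saveAsTable("curated_events"))
```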
Environment: Hadoop 3.0, Hive 2.1, J2EE, JDBC, Pig 0.16, HBase 1.1, Sqoop, NoSQL, Impala, Java, Spring, MVC, XML, Spark 1.9, PL/SQL, HDFS, JSON, Hibernate, Bootstrap, jQuery.
