Data Engineer Resume

Dallas

SUMMARY

  • 9+ years of experience in the software industry, including 5 years with Azure cloud services and Big Data technologies such as Spark, MapReduce, Hive, YARN, and HDFS, using programming languages such as Scala and Python, and 4 years in data warehousing.
  • Expertise working with Azure cloud services such as Azure Data Lake Storage, Azure Data Factory, Azure Analysis Services, Azure Blob Storage, and Azure Synapse.
  • Excellent hands-on experience working with Snowflake virtual data warehouses and implementing complex pipelines.
  • Experience in writing UDFs and stored procedures in the Snowflake environment.
  • Hands-on experience in Spark Core, Spark SQL, and Spark Streaming, and in creating and handling DataFrames in Spark with Scala.
  • Working experience in developing applications involving Big Data technologies such as MapReduce, HDFS, Hive, Sqoop, Oozie, HBase, Pig, Spark, Scala, Kafka, and ETL (DataStage).
  • Experience in Data ingestion to one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processing the data in Azure Databricks.
  • Designed and developed Oracle PL/SQL programs and shell scripts, and used Linux and UNIX commands, to perform data import/export, data conversion, and data cleansing.
  • Strong experience in the Software Development Life Cycle (SDLC), including requirements analysis, design specification, and testing, in both Waterfall and Agile methodologies.
  • Ability to work effectively in cross-functional team environments, with excellent communication and interpersonal skills.

TECHNICAL SKILLS

Cloud Services: Azure Data Factory (ADF), Azure Data Lake Storage (ADLS), Azure Synapse Analytics (SQL Data Warehouse), Azure SQL Database, PolyBase, Azure Cosmos DB (NoSQL), Azure Key Vault, Azure DevOps, Azure Databricks

Big Data Technologies: Hadoop, Apache Spark, Hortonworks, Cloudera

Databases: MongoDB, DynamoDB, Cassandra, Snowflake, Oracle, MySQL, SQL Server

Programming Languages: Shell script, Perl script, SQL, Python, PySpark

Tools: Visual Studio, SQL*Plus, SQL Developer, SQL Navigator, SQL Server Management Studio, Sqoop, Hive, HBase, Flume, Kafka, YARN, Apache Spark, PyCharm, Eclipse, Postman

CI/CD Tools: Terraform, Jenkins, ADO

Version Control: SVN, Git, GitHub

Operating Systems: Windows 10/7/XP/2000/NT/98/95, UNIX, Linux, OS

Visualization/Reporting: Power BI, Tableau

PROFESSIONAL EXPERIENCE

Confidential, Dallas

Data Engineer

Responsibilities:

  • Designed and set up an enterprise data lake to support various use cases, including analytics, processing, storage, and reporting of voluminous, rapidly changing data.
  • Maintained quality reference data at the source by cleaning and transforming it and ensuring its integrity in a relational environment, working closely with the stakeholders and the solution architect.
  • Created tabular models on Azure Analysis Services to meet business reporting requirements.
  • Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks as part of cloud migration.
  • Created pipelines, data flows, and complex data transformations and manipulations using ADF and PySpark with Databricks.
  • Worked with Azure Blob Storage and Data Lake storage, loading data into Azure Synapse Analytics (SQL DW).
  • Experience working with Azure SQL Database Import and Export Service.
  • Developed Python, PySpark, and Bash scripts to extract, transform, and load data across on-premises and cloud platforms.
  • Used Apache Spark, including the Spark Core, Spark SQL, and Spark Streaming components, to support intraday and real-time data processing.
  • Set up and worked on Kerberos authentication principals to establish secure network communication on the cluster, and tested HDFS, Hive, Pig, and MapReduce access to the cluster for new users.
  • Used the Spark SQL Scala and Python interfaces, which automatically convert RDDs of case classes to schema RDDs.
  • Imported data from different sources such as HDFS and HBase into Spark RDDs and performed computations using PySpark to generate the output response.
  • Implemented performance optimization techniques such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
  • Good knowledge of Spark platform parameters such as memory, cores, and executors.
  • Developed a reusable framework, to be leveraged for future migrations, that automates ETL from RDBMS systems to the data lake using Spark data sources and Hive data objects.
  • Imported and exported databases using SQL Server Integration Services (SSIS) and Data Transformation Services (DTS packages).

Environment: Azure, Azure Data Factory, Databricks, PySpark, Python, Apache Spark, HBase, Hive, Sqoop, Snowflake, SSRS, Tableau.

Confidential - Schaumburg, IL.

Big Data Developer

Responsibilities:

  • In-depth understanding of Hadoop architecture and its components, such as HDFS, ApplicationMaster, NodeManager, ResourceManager, NameNode, and DataNode, and of MapReduce concepts.
  • Involved in developing a MapReduce framework that filters bad and unnecessary records.
  • Involved heavily in setting up the CI/CD pipeline using Jenkins, Maven, Nexus, GitHub, and AWS.
  • Developed a data pipeline using Flume, Sqoop, Pig, and MapReduce to ingest customer behavioral data and purchase histories into HDFS for analysis.
  • Used Spark SQL to load JSON data, create schema RDDs, and load them into Hive tables, and handled structured data using Spark SQL.
  • Used Hive to perform transformations, event joins, and some pre-aggregations before storing the data in HDFS.
  • Created Hive tables as internal or external tables per requirements, defined with appropriate static and dynamic partitions for efficiency.
  • Implemented the workflows using the Apache Oozie framework to automate tasks.
  • Developed design documents that considered all possible approaches and identified the best of them.
  • Wrote MapReduce code that takes log files as input, parses them, and structures them in tabular format to facilitate effective querying of the log data.
  • Developed scripts to automate end-to-end data management and synchronization between all clusters.
  • Implemented the Fair Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.

Environment: Cloudera CDH 3/4 Distribution, Tibco, HDFS, MapReduce, Hive, Oozie, Pig, Shell Scripting, MySQL.