
Sr. Data Engineer Resume


St. Louis

SUMMARY

  • Overall 7+ years of experience as a Data Engineer and Data Analyst, with expertise in Data Mapping, Data Validation, and statistical data analysis, including transforming business requirements into analytical models, designing algorithms, and building machine learning and strategic solutions that scale across massive volumes of data.
  • Experience with various Big Data technologies, tools, and databases, including Spark, Hive, Python, SQL, AWS, Snowflake, Hadoop, Sqoop, CDL (Cassandra), Teradata, Tableau, and Redshift, while always staying in the world I cherish most: the data world.
  • Experience in Software/Application Development using Python, Scala, C, and SQL, with an in-depth understanding of Distributed Systems Architecture and Parallel Processing Frameworks.
  • Deep knowledge and strong deployment experience in the Hadoop and Big Data ecosystems: HDFS, MapReduce, Spark, Pig, Sqoop, Hive, Oozie, Kafka, ZooKeeper, and HBase.
  • Strong experience in writing scripts using the Python, PySpark, and Spark APIs for analyzing data (a minimal PySpark sketch follows this list).
  • Experience with Azure Cloud, Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Azure Analysis Services, Big Data technologies (Apache Spark), and Databricks.
  • In-depth knowledge of Google Cloud Platform; built data pipelines in Airflow on GCP for ETL jobs using different Airflow operators.
  • Hands-on experience with GCP and Google BigQuery, including Dataproc, GCS, and Cloud Functions.
  • Hands-on experience with Spark Core, Spark SQL, and Spark Streaming, and in creating and handling DataFrames in Spark with Scala.
  • Implemented Apache Airflow for extracting data from multiple data sources and running Spark jobs for data transformation.
  • Experience with NoSQL databases, including table row-key design and loading and retrieving data for real-time processing, with performance improvements based on data access patterns.
  • Strong experience in the analysis, design, development, testing, and implementation of Business Intelligence solutions using data warehouse/data mart design, ETL, BI, and client/server applications, including writing ETL scripts with regular expressions and custom tools (Informatica, Pentaho, and SyncSort).
  • Deep understanding of MapReduce with Hadoop and Spark, and good knowledge of the Big Data ecosystem, including Hadoop 2.0 (HDFS, Hive, Pig, Impala) and Spark (Spark SQL, Spark MLlib, Spark Streaming).
  • Excellent performance in building and publishing customized interactive reports and dashboards in Tableau with custom parameters and user filters, including tables, graphs, and listings.
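
A minimal PySpark sketch of the kind of analysis scripting described above; the file path and column names (amount, order_date) are hypothetical:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Start a Spark session for ad hoc analysis
spark = SparkSession.builder.appName("order-analysis").getOrCreate()

# Read a hypothetical CSV of order events and compute daily totals
orders = spark.read.option("header", True).csv("/data/orders.csv")
daily = (orders
         .withColumn("amount", F.col("amount").cast("double"))
         .groupBy("order_date")
         .agg(F.sum("amount").alias("total_amount"),
              F.count("*").alias("order_count")))
daily.show()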

TECHNICAL SKILLS

Big Data Technologies: Hadoop, HDFS, Hive, MapReduce, Pig, Sqoop, Flume, Oozie, Hadoop distributions, HBase, PySpark, Spark, Teradata

Programming Languages: Java, Python, Scala

Databases: MySQL, SQL/PL-SQL, MS SQL Server, NoSQL (MongoDB, Cassandra, HBase)

Scripting/ Web Languages: JavaScript, HTML5, CSS3, XML, SQL, Shell

Operating Systems: Linux, Windows

Software Life Cycles: SDLC, Waterfall, and Agile models

Office Tools: MS-Office, MS-Project and Risk Analysis tools, Visio

Utilities/Tools: Eclipse, Tomcat, NetBeans, JUnit, SQL, SVN, Log4j, SOAP UI, ANT, Maven, Automation, MRUnit, Azure Data Lake Analytics, Azure DevOps, Azure Databricks

PROFESSIONAL EXPERIENCE

Confidential, St. Louis

Sr. Data Engineer

Responsibilities:

  • Performed data analysis and developed analytic solutions, investigating data to discover correlations and trends and to explain them.
  • Worked with Data Engineers and Data Architects to define back-end requirements for data products (aggregations, materialized views, tables, visualizations).
  • Applied an in-depth understanding of Snowflake multi-cluster sizing and credit usage; played a key role in migrating Teradata objects into the Snowflake environment.
  • Developed frameworks and processes to analyze unstructured information; assisted in Power BI architecture design on Azure.
  • Migrated some of the existing pipelines to Azure Databricks using PySpark notebooks for the analytics team.
  • Worked with Azure PaaS solutions such as Azure Web Apps, Web Roles, Worker Roles, SQL Azure, and Azure Storage.
  • Used Apache Airflow to orchestrate and automate ETL data pipelines.
  • Implemented Apache Airflow for authoring, scheduling, and monitoring data pipelines.
  • Worked on partitioning and bucketing of Hive tables and set tuning parameters to improve performance.
  • Tuned long-running Hive queries using various optimization techniques.
  • Involved in developing Impala scripts for ad hoc queries.
  • Involved in importing and exporting data from HBase using Spark.
  • Experience with Spark, Azkaban, Kafka, Pig, ZooKeeper, Flume, and streaming.
  • Developed mappings using transformations such as Expression, Filter, Joiner, and Lookup for better data massaging and to migrate clean, consistent data.
  • Used Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Python and in NoSQL databases such as HBase and Cassandra (a minimal streaming sketch follows this list).
  • Developed Python scripts for data cleaning, analysis, and automating day-to-day activities.
  • Developed Spark applications using Scala and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, using Kafka and integrating with Spark Streaming. Developed data analysis tools using SQL and Python.
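
A minimal sketch of the Kafka-to-HDFS streaming pattern noted above, written with Spark Structured Streaming in Python; the broker address, topic name, and HDFS paths are placeholders:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

# Subscribe to a Kafka topic and keep the message payload as a string column
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "events")
          .load()
          .selectExpr("CAST(value AS STRING) AS value"))

# Persist the raw stream to HDFS; the checkpoint enables fault-tolerant recovery
query = (events.writeStream
         .format("text")
         .option("path", "hdfs:///data/raw/events")
         .option("checkpointLocation", "hdfs:///checkpoints/events")
         .start())
query.awaitTermination()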

Environment: Spark, Hive, HBase, Sqoop, Flume, MapReduce, HDFS, SQL, Apache Kafka, Azure Cloud, Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Python.

Confidential, NC

Sr. Data Engineer

Responsibilities:

  • Implemented a Continuous Delivery pipeline with Docker, GitHub, and AWS.
  • Implemented Apache Airflow for authoring, scheduling, and monitoring Data Pipelines
  • Created YAML files for each data source, including Glue table stack creation.
  • Planned the move from vCloud to GCP.
  • Worked on a Python script to extract data from Netezza databases and transfer it to AWS.
  • Developed Lambda functions and assigned IAM roles to run Python scripts along with various triggers (SQS, EventBridge, SNS).
  • Created a Lambda deployment function and configured it to receive events from S3 buckets (a minimal handler sketch follows this list).
  • Ingested data into Hadoop using Sqoop and applied data transformations using Pig and Hive.
  • Worked extensively with AWS services like EC2, S3, VPC, ELB, Auto Scaling Groups, Route 53, IAM, CloudTrail, CloudWatch, CloudFormation, CloudFront, SNS, and RDS.
  • Developed Python scripts to parse XML, and JSON files and load the data in AWS Snowflake Data warehouse.
  • ETL development using EMR/Hive/Spark, Lambda, Scala, DynamoDB Streams, Amazon Kinesis Firehose, Redshift and S3.
  • Worked on developing an event-based data processing pipeline using AWS Lambda, SNS, and DynamoDB Streams.
  • Worked with data investigation, discovery, and mapping tools to scan every single data record from many sources.
  • Handled importing of data from various data sources, performed transformations using Hive, MapReduce, and loaded data into HDFS.
  • Extracted the data from Teradata into HDFS using Sqoop.
  • Analyzed the data by running Hive queries and Pig scripts to understand user behavior.
  • Collected data from an AWS S3 bucket in near real time using Spark Streaming, performed the necessary transformations and aggregations on the fly to build the common learner data model, and persisted the data in HDFS.
  • Performed exploratory data analysis (EDA) using Python and integrated Python with Hadoop MapReduce and Spark.
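
A minimal sketch of the S3-triggered Lambda pattern referenced above; it assumes the uploaded objects are JSON files, and the downstream step is only indicated in a comment:

import json
import urllib.parse
import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    # Each record describes one object-created event from the source bucket
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        payload = json.loads(body)
        # Downstream handling (e.g., publishing to SNS or writing to DynamoDB) would go here
        print(f"Processed s3://{bucket}/{key} with {len(payload)} records")
    return {"statusCode": 200}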

Environment: AWS, GCP, Java, BigQuery, GCS buckets, Cloud Functions, Apache Beam, Cloud Dataflow, Cloud Shell, gsutil, bq command-line utilities, Dataproc, Cloud SQL, MySQL, Postgres, SQL Server, Python, Scala, Spark, Hive, Spark SQL

Confidential, NC

Data Engineer

Responsibilities:

  • Analyzed and gathered business requirements by interacting with clients and understanding business requirement specification documents.
  • Designed a Data Quality Framework in Python to perform schema validation and data profiling.
  • Implemented web applications with the Flask framework following the MVC architecture.
  • Used the Python unittest library for testing many Python programs.
  • Wrote Python scripts to parse JSON documents and load the data into the database.
  • Designed and maintained databases using Python and developed a Python-based RESTful API using Flask, SQL, and PostgreSQL (a minimal endpoint sketch follows this list).
  • Created various types of reports such as drill-down and drill-through reports, matrix reports, sub-reports, and charts using SQL Server Reporting Services (SSRS).
  • Implemented naming Standards and Warehouse Metadata for facts and dimensions of Logical and Physical Data Models.
  • Involved in ETL processing using Pig & Hive in AWS EMR, S3, and Data Profiling, Mapping, and Integration from multiple sources to AWS S3.
  • Created new database objects like Procedures, Functions, Packages, Triggers, Indexes, and Views using T-SQL in SQL Server.
  • Created complex stored procedures and PL/SQL blocks with optimum performance using bulk binds (BULK COLLECT and FORALL), inline views, reference cursors, cursor variables, dynamic SQL, VARRAYs, external tables, nested tables, etc.
  • Used Sqoop to import/export data between RDBMS and Hive tables, performed incremental imports, and created Sqoop jobs based on the last saved value.
  • Designed Informatica mapping for Error handling and was involved in the preparation of the low-level design (LLD) documents for Informatica Mappings.
  • Designed and developed SQL Server databases, tables, indexes, stored procedures, views, user-defined functions, and other T-SQL statements.
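
A minimal sketch of a Flask/PostgreSQL RESTful endpoint of the kind described above; the customers table, its columns, and the connection settings are hypothetical:

from flask import Flask, jsonify
import psycopg2

app = Flask(__name__)

def get_connection():
    # Connection parameters are placeholders
    return psycopg2.connect(host="localhost", dbname="appdb",
                            user="app_user", password="secret")

@app.route("/api/customers/<int:customer_id>", methods=["GET"])
def get_customer(customer_id):
    conn = get_connection()
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT id, name, email FROM customers WHERE id = %s",
                        (customer_id,))
            row = cur.fetchone()
    finally:
        conn.close()
    if row is None:
        return jsonify({"error": "not found"}), 404
    return jsonify({"id": row[0], "name": row[1], "email": row[2]})

if __name__ == "__main__":
    app.run()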

Environment: Python, Flask, Azure, PHP, HTML5, UNIX, Linux, MySQL, PostgreSQL, MongoDB

Confidential

Data Analyst

Responsibilities:

  • Imported Legacy data from SQL Server and Teradata into Amazon S3.
  • Created consumption views on top of metrics to reduce the running time for complex queries.
  • Exported data into Snowflake by creating staging tables to load data from different files in Amazon S3 (a minimal loading sketch follows this list).
  • As part of the data migration, wrote many SQL scripts to reconcile mismatched data and worked on loading the history data from Teradata SQL to Snowflake.
  • Developed SQL scripts to upload, retrieve, manipulate, and handle sensitive data (National Provider Identifier data, i.e., name, address, SSN, phone number) in Teradata, SQL Server Management Studio, and Snowflake databases for the project.
  • Implemented Defect Tracking process using JIRA tool by assigning bugs to Development Team
  • Tested Hadoop MapReduce jobs developed in Python, Pig, and Hive.
  • Implemented code in Python to retrieve and manipulate data.
  • Incorporated predictive modeling (a rule engine) to evaluate the customer/seller health score.
  • Involved in functional, integration, regression, and smoke testing; wrote Python scripts that performed computations and integrated with Tableau visualizations.
  • Worked with stakeholders to communicate campaign results, strategy, issues or needs.
  • Analyzed marketing campaigns from various perspectives including CTR, conversion rates, seasonal/geographical trends, search queries, landing page, conversion funnel, quality score, competitors, distribution channel, etc. to achieve maximum ROI for clients.
  • Understood business requirements thoroughly and developed a test strategy based on business rules.
  • Created metric tables and end-user views in Snowflake to feed data for Tableau refreshes.
  • Generated custom SQL to verify dependencies for the daily, weekly, and monthly jobs.
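
A minimal sketch of loading files from an S3 stage into a Snowflake staging table with the Python connector, as referenced above; the account details, stage name, and table layout are assumptions:

import snowflake.connector

# Connection parameters are placeholders
conn = snowflake.connector.connect(account="xy12345", user="etl_user",
                                   password="secret", warehouse="LOAD_WH",
                                   database="ANALYTICS", schema="STAGING")
cur = conn.cursor()
try:
    # Staging table mirrors the file layout; the external stage over S3 is assumed to exist
    cur.execute("""
        CREATE TABLE IF NOT EXISTS stg_orders (
            order_id NUMBER, customer_id NUMBER,
            amount NUMBER(12,2), order_date DATE)
    """)
    cur.execute("""
        COPY INTO stg_orders
        FROM @s3_orders_stage
        FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
    """)
finally:
    cur.close()
    conn.close()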
