
Sr. Data Engineer Resume


Rochester, MN

SUMMARY

  • Over 8 years of working experience as a Data Engineer with highly proficient knowledge in Data Analysis.
  • Experienced in Big Data work on Hadoop, Spark, PySpark, Hive, HDFS and other NoSQL platforms.
  • Experienced working extensively on Master Data Management (MDM) and the applications used for MDM.
  • Experience in transferring data from AWS S3 to AWS Redshift using Informatica.
  • Hands-on experience with Amazon Web Services, including provisioning and maintaining AWS resources such as EMR, S3 buckets, EC2 instances, RDS and others.
  • Hands-on experience with Google Cloud Platform (GCP) services such as BigQuery, GCS buckets and Cloud Functions.
  • Experienced in Informatica ILM (Information Lifecycle Management) and its tools.
  • Efficient in all phases of the development lifecycle, including Data Cleansing, Data Conversion, Data Profiling, Data Mapping, Performance Tuning and System Testing.
  • Good knowledge of SQL queries and creating database objects like stored procedures, triggers, packages and functions using SQL and PL/SQL to implement business logic.
  • Supported ad-hoc business requests, developed Stored Procedures and Triggers, and extensively used Quest tools like TOAD.
  • Good understanding and exposure to Python programming.
  • Excellent working experience in Scrum/Agile framework and Waterfall project execution methodologies.
  • Extensive experience working with business users/SMEs as well as senior management.
  • Experience with the Big Data Hadoop ecosystem in ingestion, storage, querying, processing and analysis of big data.
  • Experience in developing MapReduce programs using Apache Hadoop for analyzing big data as per requirements.
  • Experienced in technical consulting and end-to-end delivery with architecture, data modeling, data governance, and design, development and implementation of solutions.
  • Experience in installation, configuration, support and management of the Cloudera Hadoop platform along with CDH4 and CDH5 clusters.
  • Strong experience and knowledge of NoSQL databases such as MongoDB and Cassandra.
  • Proficient in Normalization/De-normalization techniques in relational/dimensional database environments and have done normalizations up to 3NF.
  • Good understanding of Ralph Kimball (Dimensional) and Bill Inmon (Relational) modeling methodologies.
  • Strong experience in using MS Excel and MS Access to dump the data and analyze based on business needs.
  • Good experience in Data Analysis; proficient in gathering business requirements and handling requirements management.
  • Experience in migrating data between HDFS/Hive and relational database systems in both directions using Sqoop, according to client requirements.

PROFESSIONAL EXPERIENCE

Sr. Data Engineer

Confidential, Rochester MN

Responsibilities:

  • Worked as a Data Engineer at BNY driving projects using Spark, SQL and the Azure cloud environment.
  • Worked on data governance to provide operational structure to previously ungoverned data environments.
  • Participated in the requirement gathering sessions to understand the expectations and worked with system analysts to understand the format and patterns of the upstream source data.
  • Performed data migration from an RDBMS to a NoSQL database, providing a complete picture of the data deployed across various data systems.
  • Designed and implemented end-to-end data solutions (storage, integration, processing and visualization) in Azure.
  • Analyzed, designed and built modern data solutions using Azure PaaS services to support visualization of data.
  • Designed and configured Azure Cloud relational servers and databases, analyzing current and future business requirements.
  • Worked on migration of data from On-prem SQL server to Cloud databases (Azure Synapse Analytics (DW) & Azure SQL DB).
  • Designed and developed a data pipeline in Azure which retrieves customer data from an API and processes it into Azure SQL DB.
  • Created external tables in Azure SQL Database for data visualization and reporting purposes.
  • Created and set up self-hosted integration runtimes on virtual machines to access private networks.
  • Orchestrated all Data pipelines using Azure Data Factory and built a custom alerts platform for monitoring.
  • Extracted, transformed and loaded data from source systems to Azure data storage services using Azure Data Factory.
  • Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
  • Performed SAP Data Migration by using Business Objects Data Services as the ETL tool.
  • Worked with Azure Blob and Data Lake storage, loading data into Azure Synapse Analytics (DW).
  • Built visuals and dashboards using the Power BI reporting tool.
  • Developed streaming pipelines using Apache Spark with Python.
  • Created PL/SQL packages and Database Triggers and developed user procedures and prepared user manuals for the new programs.
  • Wrote UDFs in Scala and PySpark to meet specific business requirements (a PySpark UDF sketch follows this list).
  • Developed JSON API scripts for deploying the pipeline in Azure Data Factory (ADF) that processes the data using the SQL activity.
  • Worked in an Agile environment and used the Rally tool to maintain user stories and tasks.
  • Worked with Enterprise data support teams to install Hadoop updates, patches and version upgrades as required, and fixed problems that arose after the upgrades.
  • Implemented test scripts to support test-driven development and continuous integration.
  • Used Spark for Parallel data processing and better performances.
  • Used Azure Key Vault as a central repository for maintaining secrets and referenced the secrets in Azure Data Factory and in Databricks notebooks (a secret-lookup sketch also follows this list).
  • Used Python for web scraping to extract data.
  • Collected data from social media websites such as Twitter to find out what's trending using social media scraping.
  • Conducted numerous training sessions, demonstration sessions on Big Data.
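
The PySpark UDF work mentioned above generally follows the pattern below. This is a minimal, hypothetical sketch: the column name and the masking rule are illustrative stand-ins, not the actual business requirement.

```python
# Minimal PySpark UDF sketch; the column name and masking rule are
# hypothetical examples, not the actual business logic.
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, col
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-example").getOrCreate()

@udf(returnType=StringType())
def mask_account(account_id):
    """Keep only the last four characters of an account id."""
    if account_id is None:
        return None
    return "****" + account_id[-4:]

df = spark.createDataFrame([("ACC1234567",), ("ACC7654321",)], ["account_id"])
df.withColumn("masked_id", mask_account(col("account_id"))).show()
```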
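
Referencing Key Vault-backed secrets from a Databricks notebook typically looks like the following sketch. The secret scope, key name and JDBC connection details are hypothetical placeholders, and dbutils/spark are assumed to be provided by the notebook environment.

```python
# Databricks notebook sketch: reading a secret from an Azure Key Vault-backed
# secret scope. The scope and key names ("kv-scope", "sqldb-password") and the
# connection details are hypothetical placeholders.
jdbc_password = dbutils.secrets.get(scope="kv-scope", key="sqldb-password")

df = (spark.read
      .format("jdbc")
      .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>")
      .option("dbtable", "dbo.customers")   # hypothetical table
      .option("user", "etl_user")           # hypothetical user
      .option("password", jdbc_password)
      .load())
```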

Environment: Hadoop 3.3, Spark 3.3, Azure, ADF, Scala 3.0, JSON, Power BI, Azure SQL DB, Azure Synapse, Python 3.9, PL/SQL and Agile.

Data Engineer

Confidential, CA

Responsibilities:

  • As a Data Engineer, assisted in leading the plan, build and run stages within the Enterprise Analytics Team.
  • Led the architecture and design of data processing, warehousing and analytics initiatives.
  • Engaged in solving and supporting real business issues using knowledge of the Hadoop Distributed File System and open-source frameworks.
  • Responsible for data governance rules and standards to maintain the consistency of the business element names in the different data layers.
  • Involved in various phases of development; analyzed and developed the system following the Agile Scrum methodology.
  • Performed detailed analysis of business problems and technical environments and used this data in designing the solution and maintaining data architecture.
  • Built a program with Python and Apache Beam and executed it in Cloud Dataflow to run data validation between raw source files and BigQuery tables (see the Beam sketch after this list).
  • Built the data pipelines that will enable faster, better, data-informed decision-making within the business.
  • Used REST APIs with Python to ingest data from external sites into BigQuery.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data using Hadoop/Big Data concepts.
  • Performed Data transformations in Hive and used partitions, buckets for performance improvements.
  • Optimized Hive queries to extract the customer information from HDFS.
  • Involved in scheduling Oozie workflow engine to run multiple Hive jobs.
  • Continuously monitored and managed data pipeline (CI/CD) performance alongside applications from a single console with GCP.
  • Developed Spark scripts using Python and bash shell commands as per requirements.
  • Worked on POC to check various cloud offerings including Google Cloud Platform (GCP).
  • Developed a POC for project migration from on prem Hadoop MapR system to GCP.
  • Compared self-hosted Hadoop with GCP's Dataproc, and explored Bigtable (managed HBase) use cases and performance evaluation.
  • Wrote a Python program to maintain raw file archival in a GCS bucket (an archival sketch follows this list).
  • Implemented business logic by writing UDFs and configuring CRON Jobs.
  • Designed Google Cloud Dataflow jobs that move data within a 200 PB data lake.
  • Implemented scripts that load Google BigQuery data and run queries to export data.
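
The Beam validation program referenced above could be structured roughly as follows. This is a hedged sketch assuming a simple row-count comparison; the project, bucket, file and table names are invented placeholders, and a local DirectRunner can be substituted for Dataflow while testing.

```python
# Rough Apache Beam sketch of a row-count validation between a raw file in GCS
# and a BigQuery table; bucket, file, table and project names are hypothetical.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",        # or "DirectRunner" for local testing
    project="my-gcp-project",       # hypothetical project id
    temp_location="gs://my-bucket/tmp",
    region="us-central1",
)

with beam.Pipeline(options=options) as p:
    raw_count = (
        p
        | "ReadRawFile" >> beam.io.ReadFromText("gs://my-bucket/raw/customers.csv")
        | "CountRawRows" >> beam.combiners.Count.Globally()
    )
    bq_count = (
        p
        | "ReadBQ" >> beam.io.ReadFromBigQuery(
            query="SELECT COUNT(1) AS cnt FROM `my-gcp-project.staging.customers`",
            use_standard_sql=True)
        | "ExtractCount" >> beam.Map(lambda row: row["cnt"])
    )
    (
        (raw_count, bq_count)
        | "PairCounts" >> beam.Flatten()
        | "LogCounts" >> beam.Map(print)  # in practice, compare and write results
    )
```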
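
The raw-file archival script could look something like this sketch built on the google-cloud-storage client; the bucket name, prefixes and retention window are hypothetical.

```python
# Hypothetical sketch of raw-file archival in GCS using the official
# google-cloud-storage client; bucket and prefix names are made up.
from datetime import datetime, timedelta, timezone
from google.cloud import storage

def archive_old_raw_files(bucket_name="raw-landing-bucket",
                          raw_prefix="raw/",
                          archive_prefix="archive/",
                          days=30):
    """Move raw objects older than `days` under an archive/ prefix."""
    client = storage.Client()
    bucket = client.bucket(bucket_name)
    cutoff = datetime.now(timezone.utc) - timedelta(days=days)

    for blob in client.list_blobs(bucket_name, prefix=raw_prefix):
        if blob.time_created < cutoff:
            new_name = archive_prefix + blob.name[len(raw_prefix):]
            bucket.copy_blob(blob, bucket, new_name)  # copy, then delete original
            blob.delete()

if __name__ == "__main__":
    archive_old_raw_files()
```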

Environment: Hadoop 3.3, Spark 3.1, Python, GCP, Data Lake, GCS, HBase, Oozie, Hive, CI/CD, BigQuery, REST API, Agile Methodology

Sr. Data Engineer

Confidential, Charlotte, NC

Responsibilities:

  • Worked as Data Engineer to collaborate with other Product Engineering team members to develop, test and support data-related initiatives.
  • Developed understanding of key business, product and user questions.
  • Followed Agile methodology for the entire project.
  • Defined the business objectives comprehensively through discussions with business stakeholders and functional analysts and by participating in requirement collection sessions.
  • Provided a summary of the project's goals, the specific expectations of business users from BI, and how these align with the project goals.
  • Led the estimation, reviewed the estimates, identified complexities and communicated them to all stakeholders.
  • Responsible for data governance rules and standards to maintain the consistency of the business element names in the different data layers.
  • Migrated the on-premises environment to the cloud using MS Azure.
  • Worked on migration of data from On-prem SQL server to Cloud databases (Azure Synapse Analytics (DW) & Azure SQL DB).
  • Performed data flow transformation using the data flow activity.
  • Performed ongoing monitoring, automation, and refinement of data engineering solutions.
  • Created pipelines, data flows and complex data transformations and manipulations using ADF and PySpark with Databricks.
  • Developed mapping document to map columns from source to target.
  • Created Azure Data Factory (ADF) pipelines using Azure PolyBase and Azure Blob storage.
  • Performed ETL using Azure Databricks.
  • Wrote UNIX shell scripts to support and automate the ETL process.
  • Worked on Python scripting to automate script generation; performed data curation using Azure Databricks.
  • Used Stored Procedure, Lookup, Execute Pipeline, Data Flow, Copy Data and Azure Function activities in ADF.
  • Worked on Kafka to bring data from data sources and land it in HDFS for filtering (see the streaming sketch after this list).
  • Created several Databricks Spark jobs with PySpark to perform table-to-table operations (see the table-to-table sketch after this list).
  • Built visuals and dashboards using the Power BI reporting tool.
  • Providing 24/7 On-call Production Support for various applications.
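
A Kafka-to-HDFS landing job of the kind described above can be sketched with Spark Structured Streaming as shown below; the broker, topic and path names are hypothetical, and the spark-sql-kafka connector is assumed to be available on the cluster.

```python
# Hedged sketch of a Spark Structured Streaming job that lands Kafka data in
# HDFS; broker addresses, topic and output paths are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")  # hypothetical broker
       .option("subscribe", "source-events")               # hypothetical topic
       .option("startingOffsets", "latest")
       .load())

events = raw.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)", "timestamp")

query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/landing/source-events")          # hypothetical
         .option("checkpointLocation", "hdfs:///checkpoints/source-events")
         .outputMode("append")
         .start())

query.awaitTermination()
```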
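
A typical table-to-table Databricks job follows the pattern below; the database, table and column names and the aggregation itself are made-up examples.

```python
# Minimal PySpark sketch of a table-to-table operation in Databricks;
# database/table names and the aggregation are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("table-to-table").getOrCreate()

orders = spark.table("staging.orders")        # hypothetical source table
customers = spark.table("staging.customers")  # hypothetical source table

daily_revenue = (
    orders.join(customers, "customer_id")
          .groupBy("order_date", "customer_region")
          .agg(F.sum("order_amount").alias("total_revenue"))
)

# Overwrite the curated target table with the aggregated result.
daily_revenue.write.mode("overwrite").saveAsTable("curated.daily_revenue")
```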

Environment: Hadoop, Spark, Kafka, Azure Databricks, ADF, Python, PySpark, HDFS, ETL, Agile & Scrum meetings

Data Engineer

Confidential

Responsibilities:

  • Participated in requirements sessions to gather requirements along with business analysts and product owners.
  • Involved in the Agile development methodology as an active member in scrum meetings.
  • Involved in the design, development and testing phases of the Software Development Life Cycle (SDLC).
  • Installed and configured Hive, wrote Hive UDFs, and set up cluster coordination services through ZooKeeper.
  • Architected, Designed and Developed Business applications and Data marts for reporting.
  • Involved in different phases of the development lifecycle including Analysis, Design, Coding, Unit Testing, Integration Testing, Review and Release as per business requirements.
  • Developed Big Data solutions focused on pattern matching and predictive modeling.
  • The objective of this project was to build a data lake as a cloud-based solution in AWS using Apache Spark.
  • Installed and configured Hadoop Ecosystem components.
  • Worked on implementation and maintenance of Cloudera Hadoop cluster.
  • Created Hive external tables to stage data and then moved the data from staging to main tables.
  • Implemented the Big Data solution using Hadoop, Hive and Informatica to pull/load the data into HDFS.
  • Pulled data from the data lake (HDFS) and massaged it with various RDD transformations (a PySpark RDD sketch follows this list).
  • Involved with Kafka and building use cases relevant to our environment.
  • Developed Oozie workflow jobs to execute Hive, Sqoop and MapReduce actions.
  • Provided thought leadership for the architecture and design of Big Data Analytics solutions for customers, and actively drove Proof of Concept (POC) and Proof of Technology (POT) evaluations to implement Big Data solutions.
  • Created integration relational 3NF models that can functionally relate to other subject areas, and was responsible for determining transformation rules accordingly in the Functional Specification Document.
  • Responsible for developing a data pipeline using Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
  • Imported the data from different sources like HDFS/HBase into Spark RDD and developed a data pipeline using Kafka and Storm to store data into HDFS.
  • Documented the requirements including the available code which should be implemented using Spark, Hive, HDFS, HBase and Elastic Search.
  • Developed Spark code using Scala for faster testing and processing of data.
  • Installed and configured Apache Hadoop on multiple nodes on AWS EC2.
  • Developed Pig Latin scripts to migrate the existing legacy process to Hadoop, with the data fed to AWS S3.
  • Collaborated with Business users for requirement gathering for building Tableau reports per business needs.
  • Developed continuous flow of data into HDFS from social feeds using Apache Storm Spouts and Bolts.
  • Involved in loading data from Unix file system to HDFS.
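
The RDD-based massaging mentioned above usually looks like the following PySpark sketch; the HDFS path and the tab-delimited weblog layout are assumed purely for illustration.

```python
# Illustrative PySpark RDD sketch: pull delimited data from HDFS and massage it
# with map/filter/reduceByKey; the path and record layout are hypothetical.
from pyspark import SparkContext

sc = SparkContext(appName="rdd-massaging")

lines = sc.textFile("hdfs:///data/weblogs/2020/")        # hypothetical HDFS path

page_hits = (
    lines.map(lambda line: line.split("\t"))             # parse tab-delimited records
         .filter(lambda fields: len(fields) >= 3)        # drop malformed rows
         .map(lambda fields: (fields[2], 1))             # key by page URL
         .reduceByKey(lambda a, b: a + b)                # count hits per page
)

for page, hits in page_hits.take(10):
    print(page, hits)
```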

Environment: Spark, 3NF, Flume 1.8, Sqoop 1.4, Pig 0.17, Hadoop 3.0, YARN, HDFS

Data Analyst

Confidential

Responsibilities:

  • Analyzed and reported client/customer data using large data sets like transactional and analytical data to meet business objectives.
  • Responsible for all aspects of management, administration, and support of IBM's internal Linux/UNIX cloud-based infrastructure as the premier hosting provider.
  • Worked with SQL and performed computations, log transformations and data exploration to identify insights and draw conclusions from complex data using R.
  • Used SPSS for data cleaning and reporting and developed efficient and modifiable statistical scenarios.
  • Extracted data from SQL Server using Talend to load it into a single data warehouse repository.
  • Utilized Digital analytics data from Heap in extracting business insights and visualized the trends from the customer events tracked.
  • Extensively used Star and Snowflake Schema methodologies.
  • Worked on different types of projects such as migration projects, ad-hoc reporting and exploratory research to guide predictive modeling.
  • Applied concepts of R-squared, RMSE and p-value in the evaluation stage to extract interesting findings through comparisons (a short metrics sketch follows this list).
  • Worked on the entire CRISP-DM life cycle and was actively involved in all phases of the project life cycle, including data acquisition, data cleaning and data engineering.
  • Extensively used Azure Machine Learning to set up experiments and create web services for predictive analytics.
  • Wrote complex SQL queries for data analysis using window functions and joins, and improved performance by creating partitioned tables.
  • Prepared dashboards with drill down functions such as date filters, parameters, actions using Tableau to reflect the data behavior over time.
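
The evaluation metrics mentioned above can be computed as in this small sketch, using scikit-learn on made-up predictions.

```python
# Small sketch of the evaluation metrics referenced above (R-squared and RMSE),
# computed with scikit-learn on hypothetical predictions.
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error

y_true = np.array([3.0, 5.0, 7.5, 10.0, 12.5])
y_pred = np.array([2.8, 5.4, 7.0, 10.3, 12.0])

r2 = r2_score(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # RMSE = sqrt of MSE

print(f"R-squared: {r2:.3f}, RMSE: {rmse:.3f}")
```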

Environment: Azure Cloud, Azure Machine Learning, UNIX, SQL, Talend, Star & Snowflake Schema, SQL queries and Ad-hoc reporting.
