Sr. Data Engineer Resume
Rochester, MN
SUMMARY
- Over 8 years of experience as a Data Engineer with highly proficient knowledge in data analysis.
- Experienced in Big Data work on Hadoop, Spark, PySpark, Hive, HDFS, and NoSQL platforms.
- Extensive working experience with Master Data Management (MDM) and the applications used for MDM.
- Experience in transferring data from AWS S3 to AWS Redshift using the Informatica tool.
- Hands-on experience with Amazon Web Services, including provisioning and maintaining AWS resources such as EMR, S3 buckets, EC2 instances, and RDS.
- Hands-on experience with Google Cloud Platform (GCP) services such as BigQuery, GCS buckets, and Cloud Functions.
- Experienced in Informatica Information Lifecycle Management (ILM) and its tools.
- Efficient in all phases of the development lifecycle, including Data Cleansing, Data Conversion, Data Profiling, Data Mapping, Performance Tuning, and System Testing.
- Good knowledge of SQL queries and of creating database objects such as stored procedures, triggers, packages, and functions using SQL and PL/SQL to implement business logic.
- Supported ad-hoc business requests, developed stored procedures and triggers, and extensively used Quest tools such as TOAD.
- Good understanding and exposure to Python programming.
- Excellent working experience in Scrum/Agile framework and Waterfall project execution methodologies.
- Extensive experience working with business users/SMEs as well as senior management.
- Experience in the Big Data Hadoop ecosystem for ingestion, storage, querying, processing, and analysis of big data.
- Experience in developing MapReduce programs using Apache Hadoop to analyze big data per requirements.
- Experienced in technical consulting and end-to-end delivery covering architecture, data modeling, data governance, and the design, development, and implementation of solutions.
- Experience in installation, configuration, support, and management of the Cloudera Hadoop platform, including CDH4 and CDH5 clusters.
- Strong experience and knowledge of NoSQL databases such as MongoDB and Cassandra.
- Proficient in normalization/denormalization techniques in relational and dimensional database environments; have normalized models up to 3NF.
- Good understanding of Ralph Kimball (dimensional) and Bill Inmon (relational) modeling methodologies.
- Strong experience in using MS Excel and MS Access to load and analyze data based on business needs.
- Experienced in data analysis; proficient in gathering business requirements and handling requirements management.
- Experience in migrating data between HDFS/Hive and relational database systems using Sqoop, in both directions, per client requirements.
PROFESSIONAL EXPERIENCE
Sr. Data Engineer
Confidential, Rochester MN
Responsibilities:
- Worked as a Data Engineer at BNY, driving projects using Spark, SQL, and the Azure cloud environment.
- Worked on data governance to provide operational structure to previously ungoverned data environments.
- Participated in the requirement gathering sessions to understand the expectations and worked with system analysts to understand the format and patterns of the upstream source data.
- Performed data migration from an RDBMS to a NoSQL database and provided a complete picture of data deployed across various data systems.
- Designed and implemented end-to-end data solutions (storage, integration, processing, and visualization) in Azure.
- Analyzed, designed, and built modern data solutions using Azure PaaS services to support data visualization.
- Designed and configured Azure cloud relational servers and databases, analyzing current and future business requirements.
- Worked on migration of data from On-prem SQL server to Cloud databases (Azure Synapse Analytics (DW) & Azure SQL DB).
- Designed and developed a data pipeline in the Azure cloud that ingests customer data from an API and processes it into Azure SQL DB.
- Created external tables in Azure SQL Database for data visualization and reporting purposes.
- Created and set up self-hosted integration runtimes on virtual machines to access private networks.
- Orchestrated all Data pipelines using Azure Data Factory and built a custom alerts platform for monitoring.
- Extracted, transformed, and loaded data from source systems into Azure data storage services using Azure Data Factory.
- Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
- Performed SAP Data Migration by using Business Objects Data Services as the ETL tool.
- Worked with Azure Blob and Data Lake storage, loading data into Azure Synapse Analytics (DW).
- Built visuals and dashboards using the Power BI reporting tool.
- Developed streaming pipelines using Apache Spark with Python.
- Created PL/SQL packages and Database Triggers and developed user procedures and prepared user manuals for the new programs.
- Wrote UDFs in Scala and PySpark to meet specific business requirements (see the PySpark UDF sketch below).
- Developed JSON API scripts for deploying pipelines in Azure Data Factory (ADF) that process data using the SQL activity.
- Worked in an Agile environment and used the Rally tool to maintain user stories and tasks.
- Worked with enterprise data support teams to install Hadoop updates, patches, and version upgrades as required, and fixed problems that arose after the upgrades.
- Implemented test scripts to support test-driven development and continuous integration.
- Used Spark for parallel data processing and better performance.
- Used Azure Key Vault as a central repository for maintaining secrets and referenced the secrets in Azure Data Factory as well as in Databricks notebooks (see the Key Vault sketch below).
- Used Python for web scraping to extract data.
- Collected data from social media websites such as Twitter to identify trending topics via social media scraping.
- Conducted numerous training sessions, demonstration sessions on Big Data.
Environment: Hadoop 3.3, Spark 3.3, Azure, ADF, Scala 3.0, JSON, Power BI, Azure SQL DB, Azure Synapse, Python 3.9, PL/SQL and Agile.
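Illustrative example for the UDF bullet above: a minimal PySpark sketch of defining and applying a column-masking UDF. The paths, column names, and masking rule are hypothetical, chosen only to show the pattern.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, col
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf_sketch").getOrCreate()

def mask_account(acct):
    """Keep only the last four characters of an account number (hypothetical rule)."""
    if acct is None:
        return None
    return "*" * max(len(acct) - 4, 0) + acct[-4:]

mask_account_udf = udf(mask_account, StringType())

# Hypothetical curated dataset and column names.
customers = spark.read.parquet("/mnt/datalake/curated/customers")
masked = customers.withColumn("account_masked", mask_account_udf(col("account_number")))
masked.write.mode("overwrite").parquet("/mnt/datalake/reporting/customers_masked")
```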
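Illustrative example for the Key Vault bullet above: a hedged sketch of referencing Key Vault-backed secrets from a Databricks notebook (where `spark` and `dbutils` are provided by the runtime) to read an Azure SQL table over JDBC. The secret scope, key names, server, database, table, and output path are placeholders.

```python
# Runs inside a Databricks notebook; `spark` and `dbutils` come from the runtime.
jdbc_user = dbutils.secrets.get(scope="kv-backed-scope", key="sql-username")  # placeholder scope/keys
jdbc_pwd = dbutils.secrets.get(scope="kv-backed-scope", key="sql-password")

jdbc_url = "jdbc:sqlserver://example-server.database.windows.net:1433;database=analytics"  # placeholder

orders = (spark.read.format("jdbc")
          .option("url", jdbc_url)
          .option("dbtable", "dbo.orders")      # placeholder table
          .option("user", jdbc_user)
          .option("password", jdbc_pwd)
          .load())

orders.write.mode("overwrite").parquet("/mnt/datalake/staged/orders")  # placeholder landing path
```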
Data Engineer
Confidential, CA
Responsibilities:
- As a Data Engineer, assisted in leading the plan, build, and run states within the Enterprise Analytics team.
- Led the architecture and design of data processing, warehousing, and analytics initiatives.
- Engaged in solving and supporting real business issues using knowledge of Hadoop Distributed File System (HDFS) and open-source frameworks.
- Responsible for data governance rules and standards to maintain the consistency of the business element names in the different data layers.
- Involved in various phases of development; analyzed and developed the system following the Agile Scrum methodology.
- Performed detailed analysis of business problems and technical environments and used this analysis in designing the solution and maintaining the data architecture.
- Built a program with Python and Apache Beam and executed it in Cloud Dataflow to run data validation between raw source files and BigQuery tables (see the Beam sketch below).
- Built the data pipelines that will enable faster, better, data-informed decision-making within the business.
- Used REST APIs with Python to ingest data from external sites into BigQuery.
- Loaded and transformed large sets of structured, semi structured and unstructured data using Hadoop/Big Data concepts.
- Performed data transformations in Hive and used partitions and buckets for performance improvements.
- Optimized Hive queries to extract the customer information from HDFS.
- Involved in scheduling Oozie workflow engine to run multiple Hive jobs.
- Continuously monitored and managed data pipeline (CI/CD) performance alongside applications from a single console in GCP.
- Developed Spark scripts using Python and Bash shell commands as per requirements.
- Worked on POC to check various cloud offerings including Google Cloud Platform (GCP).
- Developed a POC for project migration from the on-prem Hadoop MapR system to GCP.
- Compared self-hosted Hadoop with GCP's Dataproc, and explored Bigtable (managed HBase) use cases and performance evaluation.
- Wrote a Python program to maintain raw file archival in a GCS bucket (see the GCS archival sketch below).
- Implemented business logic by writing UDFs and configuring cron jobs.
- Designed Google Cloud Dataflow jobs that move data within a 200 PB data lake.
- Implemented scripts that load data into Google BigQuery and run queries to export data.
Environment: Hadoop 3.3, Spark 3.1, Python, GCP, Data Lake, GCS, HBase, Oozie, Hive, CI/CD, BigQuery, REST API, Agile Methodology
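Illustrative example for the Dataflow validation bullet above: a hedged Apache Beam (Python SDK) sketch that compares the row count of a raw GCS file against a BigQuery table. The bucket, file, and table names are placeholders, and runner/project options would normally be supplied on the command line.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

RAW_FILE = "gs://example-raw-bucket/customers/2021-01-01.csv"   # placeholder source file
BQ_TABLE = "example-project:analytics.customers"                # placeholder BigQuery table

def run():
    # Runner, project, region, and temp_location flags come from the command line
    # (e.g. --runner=DataflowRunner) when executed on Cloud Dataflow.
    with beam.Pipeline(options=PipelineOptions()) as p:
        raw_count = (
            p
            | "ReadRawFile" >> beam.io.ReadFromText(RAW_FILE, skip_header_lines=1)
            | "CountRaw" >> beam.combiners.Count.Globally()
            | "TagRaw" >> beam.Map(lambda n: ("row_count", ("raw_file", n)))
        )
        bq_count = (
            p
            | "ReadBigQuery" >> beam.io.ReadFromBigQuery(table=BQ_TABLE)
            | "CountBQ" >> beam.combiners.Count.Globally()
            | "TagBQ" >> beam.Map(lambda n: ("row_count", ("bigquery", n)))
        )
        (
            (raw_count, bq_count)
            | "Merge" >> beam.Flatten()
            | "Group" >> beam.GroupByKey()
            | "Compare" >> beam.Map(
                lambda kv: "MATCH" if len({n for _, n in kv[1]}) == 1
                else "MISMATCH: " + str(dict(kv[1])))
            | "WriteResult" >> beam.io.WriteToText("gs://example-raw-bucket/validation/result")
        )

if __name__ == "__main__":
    run()
```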
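Illustrative example for the archival bullet above: a minimal sketch, using the google-cloud-storage client, that copies landed files into a dated archive prefix and deletes the originals. The bucket name and prefixes are placeholders.

```python
from datetime import datetime, timezone
from google.cloud import storage

def archive_raw_files(bucket_name="example-raw-bucket",   # placeholder bucket
                      landing_prefix="incoming/",
                      archive_prefix="archive/"):
    client = storage.Client()
    bucket = client.bucket(bucket_name)
    stamp = datetime.now(timezone.utc).strftime("%Y/%m/%d")

    for blob in client.list_blobs(bucket_name, prefix=landing_prefix):
        file_name = blob.name.split("/")[-1]
        destination = f"{archive_prefix}{stamp}/{file_name}"
        bucket.copy_blob(blob, bucket, destination)  # copy into the dated archive folder
        blob.delete()                                # then remove the original landing file

if __name__ == "__main__":
    archive_raw_files()
```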
Sr. Data Engineer
Confidential, Charlotte, NC
Responsibilities:
- Worked as Data Engineer to collaborate with other Product Engineering team members to develop, test and support data-related initiatives.
- Developed understanding of key business, product and user questions.
- Followed Agile methodology for the entire project.
- Defined the business objectives comprehensively through discussions with business stakeholders, functional analysts and participating in requirement collection sessions.
- Provided a summary of the project's goals, the specific expectations of business users from BI, and how they align with the project goals.
- Led the estimation, reviewed the estimates, identified complexities, and communicated them to all stakeholders.
- Responsible for data governance rules and standards to maintain the consistency of the business element names in the different data layers.
- Migrated the on-premises environment to the cloud using MS Azure.
- Worked on migration of data from On-prem SQL server to Cloud databases (Azure Synapse Analytics (DW) & Azure SQL DB).
- Performed data flow transformations using the Data Flow activity.
- Performed ongoing monitoring, automation, and refinement of data engineering solutions.
- Created pipelines, data flows, and complex data transformations and manipulations using ADF and PySpark with Databricks.
- Developed mapping document to map columns from source to target.
- Created Azure Data Factory (ADF) pipelines using Azure PolyBase and Azure Blob storage.
- Performed ETL using Azure Databricks.
- Wrote UNIX shell scripts to support and automate the ETL process.
- Worked on Python scripting to automate the generation of scripts; performed data curation using Azure Databricks.
- Used Stored Procedure, Lookup, Execute Pipeline, Data Flow, Copy Data, and Azure Function activities in ADF.
- Worked on Kafka to bring data from source systems into HDFS for filtering (see the streaming sketch below).
- Created several Databricks Spark jobs with PySpark to perform table-to-table operations.
- Built visuals and dashboards using the Power BI reporting tool.
- Providing 24/7 On-call Production Support for various applications.
Environment: Hadoop, Spark, Kafka, Azure Databricks, ADF, Python, PySpark, HDFS, ETL, Agile & Scrum meetings
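Illustrative example for the Kafka bullet above: a hedged PySpark Structured Streaming sketch that reads events from a Kafka topic, filters them, and lands them as Parquet in the data lake. The broker address, topic, schema, filter, and paths are placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("kafka_to_lake_sketch").getOrCreate()

# Hypothetical schema for the JSON payload on the topic.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("customer_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker-1:9092")   # placeholder broker
       .option("subscribe", "customer-events")               # placeholder topic
       .option("startingOffsets", "earliest")
       .load())

filtered = (raw.selectExpr("CAST(value AS STRING) AS json")
            .select(from_json(col("json"), event_schema).alias("event"))
            .select("event.*")
            .where(col("event_type") == "purchase"))         # hypothetical filter

query = (filtered.writeStream
         .format("parquet")
         .option("path", "/mnt/datalake/filtered/customer_events")             # placeholder path
         .option("checkpointLocation", "/mnt/datalake/checkpoints/customer_events")
         .outputMode("append")
         .start())

query.awaitTermination()
```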
Data Engineer
Confidential
Responsibilities:
- Participated in requirements sessions to gather requirements along with business analysts and product owners.
- Involved in the Agile development methodology; active member in scrum meetings.
- Involved in the design, development, and testing phases of the Software Development Life Cycle (SDLC).
- Installed and configured Hive, wrote Hive UDFs, and handled cluster coordination services through ZooKeeper.
- Architected, Designed and Developed Business applications and Data marts for reporting.
- Involved in different phases of the development life cycle including Analysis, Design, Coding, Unit Testing, Integration Testing, Review, and Release, per the business requirements.
- Developed Big Data solutions focused on pattern matching and predictive modeling.
- The objective of this project was to build a data lake as a cloud-based solution in AWS using Apache Spark.
- Installed and configured Hadoop Ecosystem components.
- Worked on implementation and maintenance of Cloudera Hadoop cluster.
- Created Hive external tables to stage data and then moved the data from staging to main tables.
- Implemented the Big Data solution using Hadoop, Hive, and Informatica to pull/load the data into HDFS.
- Pulled data from the data lake (HDFS) and massaged it with various RDD transformations (see the RDD sketch below).
- Involved in Kafka and built use cases relevant to our environment.
- Developed Oozie workflow jobs to execute Hive, Sqoop, and MapReduce actions.
- Provided thought leadership for the architecture and design of Big Data analytics solutions for customers, actively drove Proof of Concept (POC) and Proof of Technology (POT) evaluations, and implemented a Big Data solution.
- Created integration relational 3NF models that can functionally relate to other subject areas and was responsible for determining the corresponding transformation rules in the Functional Specification Document.
- Responsible for developing data pipelines using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
- Imported data from different sources such as HDFS/HBase into Spark RDDs and developed a data pipeline using Kafka and Storm to store data in HDFS.
- Documented the requirements, including the available code to be implemented using Spark, Hive, HDFS, HBase, and Elasticsearch.
- Developed Spark code using Scala for faster testing and processing of data.
- Performed Apache Hadoop installation and configuration of multiple nodes on AWS EC2.
- Developed Pig Latin scripts to replace the existing legacy process with Hadoop, with the data fed to AWS S3.
- Collaborated with Business users for requirement gathering for building Tableau reports per business needs.
- Developed continuous flow of data into HDFS from social feeds using Apache Storm Spouts and Bolts.
- Involved in loading data from Unix file system to HDFS.
Environment: Spark, 3NF, Flume 1.8, Sqoop 1.4, Pig 0.17, Hadoop 3.0, YARN, HDFS
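Illustrative example for the RDD bullet above: a minimal PySpark RDD sketch that pulls delimited weblog lines from HDFS, drops malformed rows, and computes the average response time per page. The HDFS paths and field layout are hypothetical.

```python
from pyspark import SparkContext

sc = SparkContext(appName="weblog_rdd_sketch")

# Hypothetical line layout: "2017-06-01T12:00:00|user_id|page|response_ms"
raw = sc.textFile("hdfs:///data/lake/weblogs/")              # placeholder HDFS path

cleaned = (raw
           .map(lambda line: line.split("|"))
           .filter(lambda parts: len(parts) == 4)            # drop malformed rows
           .map(lambda p: (p[2], int(p[3]))))                # (page, response_ms)

# Average response time per page via reduceByKey on (sum, count) pairs.
avg_by_page = (cleaned
               .mapValues(lambda ms: (ms, 1))
               .reduceByKey(lambda a, b: (a[0] + b[0], a[1] + b[1]))
               .mapValues(lambda t: t[0] / t[1]))

avg_by_page.saveAsTextFile("hdfs:///data/curated/weblog_avg_response")
```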
Data Analyst
Confidential
Responsibilities:
- Analyzed and reported client/customer data using large data sets like transactional and analytical data to meet business objectives.
- Responsible for all aspects of management, administration, and support of IBM's internal Linux/UNIX cloud-based infrastructure as the premier hosting provider.
- Worked with SQL and performed computations, log transformations, and data exploration to identify insights and conclusions from complex data using R.
- Used SPSS for data cleaning and reporting, and developed efficient and modifiable statistical scenarios.
- Extracted data from SQL Server using Talend to load it into a single data warehouse repository.
- Utilized digital analytics data from Heap to extract business insights and visualized trends from the tracked customer events.
- Extensively used Star and Snowflake Schema methodologies.
- Worked on different types of projects such as migration projects, ad-hoc reporting, and exploratory research to guide predictive modeling.
- Applied concepts of R-squared, RMSE, and p-value in the evaluation stage to extract interesting findings through comparisons.
- Worked on the entire CRISP-DM life cycle and was actively involved in all phases of the project life cycle, including data acquisition, data cleaning, and data engineering.
- Extensively used Azure Machine Learning to set up experiments and create web services for predictive analytics.
- Wrote complex SQL queries for data analysis using window functions and joins, improving performance by creating partitioned tables (see the window-function sketch below).
- Prepared dashboards with drill-down functions such as date filters, parameters, and actions using Tableau to reflect data behavior over time.
Environment: Azure Cloud, Azure Machine Learning, UNIX, SQL, Talend, Star & Snowflake Schema, SQL queries and ad-hoc reporting.
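Illustrative example for the window-function bullet above: a hedged Python sketch that runs a ROW_NUMBER() window query against SQL Server via pyodbc. The connection string, table, and column names are placeholders.

```python
import pyodbc

# Placeholder connection details.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=example-server;DATABASE=sales;UID=analyst;PWD=<password>"
)

# Rank each customer's orders by amount within the customer partition (hypothetical schema).
sql = """
SELECT customer_id,
       order_id,
       order_amount,
       ROW_NUMBER() OVER (PARTITION BY customer_id
                          ORDER BY order_amount DESC) AS amount_rank
FROM dbo.orders
WHERE order_date >= '2016-01-01'
"""

cursor = conn.cursor()
for customer_id, order_id, order_amount, amount_rank in cursor.execute(sql):
    if amount_rank == 1:                      # each customer's largest order
        print(customer_id, order_id, order_amount)
conn.close()
```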