Sr. Data Engineer Resume
Rochester, MN
SUMMARY
- Over 8 years of experience as a Data Engineer with highly proficient knowledge in data analysis.
- Experienced in Big Data work on Hadoop, Spark, PySpark, Hive, HDFS, and NoSQL platforms.
- Extensive working experience with Master Data Management (MDM) and the applications used for MDM.
- Experience in transferring data from AWS S3 to AWS Redshift using the Informatica tool.
- Hands-on experience with Amazon Web Services, including provisioning and maintaining AWS resources such as EMR, S3 buckets, EC2 instances, and RDS.
- Hands-on experience with Google Cloud Platform (GCP) services such as BigQuery, GCS buckets, and Cloud Functions.
- Experienced in Informatica Information Lifecycle Management (ILM) and its tools.
- Efficient in all phases of the development lifecycle, including Data Cleansing, Data Conversion, Data Profiling, Data Mapping, Performance Tuning, and System Testing.
- Good knowledge of SQL queries and of creating database objects such as stored procedures, triggers, packages, and functions using SQL and PL/SQL to implement business logic.
- Supported ad-hoc business requests, developed stored procedures and triggers, and extensively used Quest tools such as TOAD.
- Good understanding and exposure to Python programming.
- Excellent working experience in Scrum/Agile framework and Waterfall project execution methodologies.
- Extensive experience working with business users/SMEs as well as senior management.
- Experience in the Big Data Hadoop ecosystem for ingestion, storage, querying, processing, and analysis of big data.
- Experience in developing MapReduce programs using Apache Hadoop to analyze big data per requirements.
- Experienced in technical consulting and end-to-end delivery covering architecture, data modeling, data governance, and the design, development, and implementation of solutions.
- Experience in installation, configuration, support, and management of the Cloudera Hadoop platform, including CDH4 and CDH5 clusters.
- Strong experience and knowledge of NoSQL databases such as MongoDB and Cassandra.
- Proficient in normalization/denormalization techniques in relational and dimensional database environments; have normalized models up to 3NF.
- Good understanding of Ralph Kimball (dimensional) and Bill Inmon (relational) modeling methodologies.
- Strong experience in using MS Excel and MS Access to load and analyze data based on business needs.
- Experienced in data analysis; proficient in gathering business requirements and handling requirements management.
- Experience in migrating data between HDFS/Hive and relational database systems using Sqoop, in both directions, per client requirements.
PROFESSIONAL EXPERIENCE
Sr. Data Engineer
Confidential, Rochester MN
Responsibilities:
- Worked as a Data Engineer at BNY, driving projects using Spark, SQL, and the Azure cloud environment.
- Worked on data governance to provide operational structure to previously ungoverned data environments.
- Participated in the requirement gathering sessions to understand the expectations and worked with system analysts to understand the format and patterns of the upstream source data.
- Performed data migration from an RDBMS to a NoSQL database and provided a complete picture of data deployed across various data systems.
- Designed and implemented end-to-end data solutions (storage, integration, processing, and visualization) in Azure.
- Analyzed, designed, and built modern data solutions using Azure PaaS services to support data visualization.
- Designed and configured Azure cloud relational servers and databases, analyzing current and future business requirements.
- Worked on migration of data from On-prem SQL server to Cloud databases (Azure Synapse Analytics (DW) & Azure SQL DB).
- Designed and developed a data pipeline in the Azure cloud that ingests customer data from an API and processes it into Azure SQL DB.
- Created external tables in Azure SQL Database for data visualization and reporting purposes.
- Created and set up self-hosted integration runtimes on virtual machines to access private networks.
- Orchestrated all Data pipelines using Azure Data Factory and built a custom alerts platform for monitoring.
- Extracted, transformed, and loaded data from source systems into Azure data storage services using Azure Data Factory.
- Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
- Performed SAP Data Migration by using Business Objects Data Services as the ETL tool.
- Worked with Azure Blob and Data Lake storage, loading data into Azure Synapse Analytics (DW).
- Built visuals and dashboards using the Power BI reporting tool.
- Developed streaming pipelines using Apache Spark with Python.
- Created PL/SQL packages and Database Triggers and developed user procedures and prepared user manuals for the new programs.
- Wrote UDFs in Scala and PySpark to meet specific business requirements (see the PySpark UDF sketch below).
- Developed JSON API scripts for deploying pipelines in Azure Data Factory (ADF) that process data using the SQL activity.
- Worked in an Agile environment and used the Rally tool to maintain user stories and tasks.
- Worked with enterprise data support teams to install Hadoop updates, patches, and version upgrades as required, and fixed problems that arose after the upgrades.
- Implemented test scripts to support test-driven development and continuous integration.
- Used Spark for parallel data processing and better performance.
- Used Azure Key Vault as a central repository for maintaining secrets and referenced the secrets in Azure Data Factory as well as in Databricks notebooks (see the Key Vault sketch below).
- Used Python for web scraping to extract data.
- Collected data from social media websites such as Twitter to identify trending topics via social media scraping.
- Conducted numerous training sessions, demonstration sessions on Big Data.
Environment: Hadoop 3.3, Spark 3.3, Azure, ADF, Scala 3.0, JSON, Power BI, Azure SQL DB, Azure Synapse, Python 3.9, PL/SQL and Agile.
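Illustrative example for the UDF bullet above: a minimal PySpark sketch of defining and applying a column-masking UDF. The paths, column names, and masking rule are hypothetical, chosen only to show the pattern.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, col
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf_sketch").getOrCreate()

def mask_account(acct):
    """Keep only the last four characters of an account number (hypothetical rule)."""
    if acct is None:
        return None
    return "*" * max(len(acct) - 4, 0) + acct[-4:]

mask_account_udf = udf(mask_account, StringType())

# Hypothetical curated dataset and column names.
customers = spark.read.parquet("/mnt/datalake/curated/customers")
masked = customers.withColumn("account_masked", mask_account_udf(col("account_number")))
masked.write.mode("overwrite").parquet("/mnt/datalake/reporting/customers_masked")
```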
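Illustrative example for the Key Vault bullet above: a hedged sketch of referencing Key Vault-backed secrets from a Databricks notebook (where `spark` and `dbutils` are provided by the runtime) to read an Azure SQL table over JDBC. The secret scope, key names, server, database, table, and output path are placeholders.

```python
# Runs inside a Databricks notebook; `spark` and `dbutils` come from the runtime.
jdbc_user = dbutils.secrets.get(scope="kv-backed-scope", key="sql-username")  # placeholder scope/keys
jdbc_pwd = dbutils.secrets.get(scope="kv-backed-scope", key="sql-password")

jdbc_url = "jdbc:sqlserver://example-server.database.windows.net:1433;database=analytics"  # placeholder

orders = (spark.read.format("jdbc")
          .option("url", jdbc_url)
          .option("dbtable", "dbo.orders")      # placeholder table
          .option("user", jdbc_user)
          .option("password", jdbc_pwd)
          .load())

orders.write.mode("overwrite").parquet("/mnt/datalake/staged/orders")  # placeholder landing path
```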
Data Engineer
Confidential, CA
Responsibilities:
- As a Data Engineer, assisted in leading the plan, build, and run states within the Enterprise Analytics team.
- Led the architecture and design of data processing, warehousing, and analytics initiatives.
- Engaged in solving and supporting real business issues using knowledge of Hadoop Distributed File System (HDFS) and open-source frameworks.
- Responsible for data governance rules and standards to maintain the consistency of the business element names in the different data layers.
- Involved in various phases of development; analyzed and developed the system following the Agile Scrum methodology.
- Performed detailed analysis of business problems and technical environments and used this analysis in designing the solution and maintaining the data architecture.
- Built a program with Python and Apache Beam and executed it in Cloud Dataflow to run data validation between raw source files and BigQuery tables (see the Beam sketch below).
- Built the data pipelines that will enable faster, better, data-informed decision-making within the business.
- Used REST APIs with Python to ingest data from external sites into BigQuery.
- Loaded and transformed large sets of structured, semi structured and unstructured data using Hadoop/Big Data concepts.
- Performed data transformations in Hive and used partitions and buckets for performance improvements.
- Optimized Hive queries to extract the customer information from HDFS.
- Involved in scheduling Oozie workflow engine to run multiple Hive jobs.
- Continuously monitored and managed data pipeline (CI/CD) performance alongside applications from a single console in GCP.
- Developed Spark scripts using Python and Bash shell commands as per requirements.
- Worked on POC to check various cloud offerings including Google Cloud Platform (GCP).
- Developed a POC for project migration from the on-prem Hadoop MapR system to GCP.
- Compared self-hosted Hadoop with GCP's Dataproc, and explored Bigtable (managed HBase) use cases and performance evaluation.
- Wrote a Python program to maintain raw file archival in a GCS bucket (see the GCS archival sketch below).
- Implemented business logic by writing UDFs and configuring cron jobs.
- Designed Google Cloud Dataflow jobs that move data within a 200 PB data lake.
- Implemented scripts that load data into Google BigQuery and run queries to export data.
Environment: Hadoop 3.3, Spark 3.1, Python, GCP, Data Lake, GCS, HBase, Oozie, Hive, CI/CD, BigQuery, REST API, Agile Methodology
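Illustrative example for the Dataflow validation bullet above: a hedged Apache Beam (Python SDK) sketch that compares the row count of a raw GCS file against a BigQuery table. The bucket, file, and table names are placeholders, and runner/project options would normally be supplied on the command line.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

RAW_FILE = "gs://example-raw-bucket/customers/2021-01-01.csv"   # placeholder source file
BQ_TABLE = "example-project:analytics.customers"                # placeholder BigQuery table

def run():
    # Runner, project, region, and temp_location flags come from the command line
    # (e.g. --runner=DataflowRunner) when executed on Cloud Dataflow.
    with beam.Pipeline(options=PipelineOptions()) as p:
        raw_count = (
            p
            | "ReadRawFile" >> beam.io.ReadFromText(RAW_FILE, skip_header_lines=1)
            | "CountRaw" >> beam.combiners.Count.Globally()
            | "TagRaw" >> beam.Map(lambda n: ("row_count", ("raw_file", n)))
        )
        bq_count = (
            p
            | "ReadBigQuery" >> beam.io.ReadFromBigQuery(table=BQ_TABLE)
            | "CountBQ" >> beam.combiners.Count.Globally()
            | "TagBQ" >> beam.Map(lambda n: ("row_count", ("bigquery", n)))
        )
        (
            (raw_count, bq_count)
            | "Merge" >> beam.Flatten()
            | "Group" >> beam.GroupByKey()
            | "Compare" >> beam.Map(
                lambda kv: "MATCH" if len({n for _, n in kv[1]}) == 1
                else "MISMATCH: " + str(dict(kv[1])))
            | "WriteResult" >> beam.io.WriteToText("gs://example-raw-bucket/validation/result")
        )

if __name__ == "__main__":
    run()
```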
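Illustrative example for the archival bullet above: a minimal sketch, using the google-cloud-storage client, that copies landed files into a dated archive prefix and deletes the originals. The bucket name and prefixes are placeholders.

```python
from datetime import datetime, timezone
from google.cloud import storage

def archive_raw_files(bucket_name="example-raw-bucket",   # placeholder bucket
                      landing_prefix="incoming/",
                      archive_prefix="archive/"):
    client = storage.Client()
    bucket = client.bucket(bucket_name)
    stamp = datetime.now(timezone.utc).strftime("%Y/%m/%d")

    for blob in client.list_blobs(bucket_name, prefix=landing_prefix):
        file_name = blob.name.split("/")[-1]
        destination = f"{archive_prefix}{stamp}/{file_name}"
        bucket.copy_blob(blob, bucket, destination)  # copy into the dated archive folder
        blob.delete()                                # then remove the original landing file

if __name__ == "__main__":
    archive_raw_files()
```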
Sr. Data Engineer
Confidential, Charlotte, NC
Responsibilities:
- Worked as Data Engineer to collaborate with other Product Engineering team members to develop, test and support data-related initiatives.
- Developed understanding of key business, product and user questions.
- Followed Agile methodology for the entire project.
- Defined the business objectives comprehensively through discussions with business stakeholders, functional analysts and participating in requirement collection sessions.
- Provided a summary of the project's goals, the specific expectations of business users from BI, and how they align with the project goals.
- Led the estimation, reviewed the estimates, identified complexities, and communicated them to all stakeholders.
- Responsible for data governance rules and standards to maintain the consistency of the business element names in the different data layers.
- Migrated the on-premises environment to the cloud using MS Azure.
- Worked on migration of data from On-prem SQL server to Cloud databases (Azure Synapse Analytics (DW) & Azure SQL DB).
- Performed data flow transformations using the Data Flow activity.
- Performed ongoing monitoring, automation, and refinement of data engineering solutions.
- Created pipelines, data flows, and complex data transformations and manipulations using ADF and PySpark with Databricks.
- Developed mapping document to map columns from source to target.
- Created Azure Data Factory (ADF) pipelines using Azure PolyBase and Azure Blob storage.
- Performed ETL using Azure Databricks.
- Wrote UNIX shell scripts to support and automate the ETL process.
- Worked on Python scripting to automate the generation of scripts; performed data curation using Azure Databricks.
- Used Stored Procedure, Lookup, Execute Pipeline, Data Flow, Copy Data, and Azure Function activities in ADF.
- Worked on Kafka to bring data from source systems into HDFS for filtering (see the streaming sketch below).
- Created several Databricks Spark jobs with PySpark to perform table-to-table operations.
- Built visuals and dashboards using the Power BI reporting tool.
- Providing 24/7 On-call Production Support for various applications.
Environment: Hadoop, Spark, Kafka, Azure Databricks, ADF, Python, PySpark, HDFS, ETL, Agile & Scrum meetings
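Illustrative example for the Kafka bullet above: a hedged PySpark Structured Streaming sketch that reads events from a Kafka topic, filters them, and lands them as Parquet in the data lake. The broker address, topic, schema, filter, and paths are placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("kafka_to_lake_sketch").getOrCreate()

# Hypothetical schema for the JSON payload on the topic.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("customer_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker-1:9092")   # placeholder broker
       .option("subscribe", "customer-events")               # placeholder topic
       .option("startingOffsets", "earliest")
       .load())

filtered = (raw.selectExpr("CAST(value AS STRING) AS json")
            .select(from_json(col("json"), event_schema).alias("event"))
            .select("event.*")
            .where(col("event_type") == "purchase"))         # hypothetical filter

query = (filtered.writeStream
         .format("parquet")
         .option("path", "/mnt/datalake/filtered/customer_events")             # placeholder path
         .option("checkpointLocation", "/mnt/datalake/checkpoints/customer_events")
         .outputMode("append")
         .start())

query.awaitTermination()
```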
Data Engineer
Confidential
Responsibilities:
- Participated in requirements sessions to gather requirements along with business analysts and product owners.
- Involved in the Agile development methodology; active member in scrum meetings.
- Involved in the design, development, and testing phases of the Software Development Life Cycle (SDLC).
- Installed and configured Hive, wrote Hive UDFs, and handled cluster coordination services through ZooKeeper.
- Architected, Designed and Developed Business applications and Data marts for reporting.
- Involved in different phases of the development life cycle including Analysis, Design, Coding, Unit Testing, Integration Testing, Review, and Release, per the business requirements.
- Developed Big Data solutions focused on pattern matching and predictive modeling.
- The objective of this project was to build a data lake as a cloud-based solution in AWS using Apache Spark.
- Installed and configured Hadoop Ecosystem components.
- Worked on implementation and maintenance of Cloudera Hadoop cluster.
- Created Hive external tables to stage data and then moved the data from staging to main tables.
- Implemented the Big Data solution using Hadoop, Hive, and Informatica to pull/load the data into HDFS.
- Pulled data from the data lake (HDFS) and massaged it with various RDD transformations (see the RDD sketch below).
- Involved in Kafka and built use cases relevant to our environment.
- Developed Oozie workflow jobs to execute Hive, Sqoop, and MapReduce actions.
- Provided thought leadership for the architecture and design of Big Data analytics solutions for customers, actively drove Proof of Concept (POC) and Proof of Technology (POT) evaluations, and implemented a Big Data solution.
- Created integration relational 3NF models that can functionally relate to other subject areas and was responsible for determining the corresponding transformation rules in the Functional Specification Document.
- Responsible for developing data pipelines using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
- Imported data from different sources such as HDFS/HBase into Spark RDDs and developed a data pipeline using Kafka and Storm to store data in HDFS.
- Documented the requirements, including the available code to be implemented using Spark, Hive, HDFS, HBase, and Elasticsearch.
- Developed Spark code using Scala for faster testing and processing of data.
- Performed Apache Hadoop installation and configuration of multiple nodes on AWS EC2.
- Developed Pig Latin scripts to replace the existing legacy process with Hadoop, with the data fed to AWS S3.
- Collaborated with Business users for requirement gathering for building Tableau reports per business needs.
- Developed continuous flow of data into HDFS from social feeds using Apache Storm Spouts and Bolts.
- Involved in loading data from Unix file system to HDFS.
Environment: Spark, 3NF, Flume 1.8, Sqoop 1.4, Pig 0.17, Hadoop 3.0, YARN, HDFS
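Illustrative example for the RDD bullet above: a minimal PySpark RDD sketch that pulls delimited weblog lines from HDFS, drops malformed rows, and computes the average response time per page. The HDFS paths and field layout are hypothetical.

```python
from pyspark import SparkContext

sc = SparkContext(appName="weblog_rdd_sketch")

# Hypothetical line layout: "2017-06-01T12:00:00|user_id|page|response_ms"
raw = sc.textFile("hdfs:///data/lake/weblogs/")              # placeholder HDFS path

cleaned = (raw
           .map(lambda line: line.split("|"))
           .filter(lambda parts: len(parts) == 4)            # drop malformed rows
           .map(lambda p: (p[2], int(p[3]))))                # (page, response_ms)

# Average response time per page via reduceByKey on (sum, count) pairs.
avg_by_page = (cleaned
               .mapValues(lambda ms: (ms, 1))
               .reduceByKey(lambda a, b: (a[0] + b[0], a[1] + b[1]))
               .mapValues(lambda t: t[0] / t[1]))

avg_by_page.saveAsTextFile("hdfs:///data/curated/weblog_avg_response")
```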
Data Analyst
Confidential
Responsibilities:
- Analyzed and reported client/customer data using large data sets like transactional and analytical data to meet business objectives.
- Responsible for all aspects of management, administration, and support of IBM's internal Linux/UNIX cloud-based infrastructure as the premier hosting provider.
- Worked with SQL and performed computations, log transformations, and data exploration to identify insights and conclusions from complex data using R.
- Used SPSS for data cleaning and reporting, and developed efficient and modifiable statistical scenarios.
- Extracted data from SQL Server using Talend to load it into a single data warehouse repository.
- Utilized digital analytics data from Heap to extract business insights and visualized trends from the tracked customer events.
- Extensively used Star and Snowflake Schema methodologies.
- Worked on different types of projects such as migration projects, ad-hoc reporting, and exploratory research to guide predictive modeling.
- Applied concepts of R-squared, RMSE, and p-value in the evaluation stage to extract interesting findings through comparisons.
- Worked on the entire CRISP-DM life cycle and was actively involved in all phases of the project life cycle, including data acquisition, data cleaning, and data engineering.
- Extensively used Azure Machine Learning to set up experiments and create web services for predictive analytics.
- Wrote complex SQL queries for data analysis using window functions and joins, improving performance by creating partitioned tables (see the window-function sketch below).
- Prepared dashboards with drill-down functions such as date filters, parameters, and actions using Tableau to reflect data behavior over time.
Environment: Azure Cloud, Azure Machine Learning, UNIX, SQL, Talend, Star & Snowflake Schema, SQL queries and ad-hoc reporting.
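Illustrative example for the window-function bullet above: a hedged Python sketch that runs a ROW_NUMBER() window query against SQL Server via pyodbc. The connection string, table, and column names are placeholders.

```python
import pyodbc

# Placeholder connection details.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=example-server;DATABASE=sales;UID=analyst;PWD=<password>"
)

# Rank each customer's orders by amount within the customer partition (hypothetical schema).
sql = """
SELECT customer_id,
       order_id,
       order_amount,
       ROW_NUMBER() OVER (PARTITION BY customer_id
                          ORDER BY order_amount DESC) AS amount_rank
FROM dbo.orders
WHERE order_date >= '2016-01-01'
"""

cursor = conn.cursor()
for customer_id, order_id, order_amount, amount_rank in cursor.execute(sql):
    if amount_rank == 1:                      # each customer's largest order
        print(customer_id, order_id, order_amount)
conn.close()
```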