Sr. Data Engineer Resume
Eden Prairie, MN
SUMMARY
- Over 9 years of experience as a Data Engineer, including designing, developing and implementing data models for enterprise-level applications and systems.
- Proficient in managing the entire data science project life cycle and actively involved in all phases of a project.
- Experienced in Big Data work on Hadoop, Spark, PySpark, Hive, HDFS and NoSQL platforms.
- Experienced working extensively on Master Data Management (MDM) and the applications used for MDM.
- Experience in transferring data from AWS S3 to AWS Redshift using Informatica.
- Hands-on experience with Amazon Web Services, provisioning and maintaining AWS resources such as EMR, S3 buckets, EC2 instances, RDS and others.
- Hands-on experience with Google Cloud Platform (GCP) services such as BigQuery, GCS buckets and Cloud Functions.
- Experienced in Informatica ILM (Information Lifecycle Management) and its tools.
- Efficient in all phases of the development lifecycle, including Data Cleansing, Data Conversion, Data Profiling, Data Mapping, Performance Tuning and System Testing.
- Good knowledge of SQL queries and of creating database objects such as stored procedures, triggers, packages and functions using SQL and PL/SQL to implement business logic.
- Supported ad-hoc business requests, developed stored procedures and triggers, and extensively used Quest tools such as TOAD.
- Good understanding and exposure to Python programming.
- Excellent working experience in Scrum/Agile framework and Waterfall project execution methodologies.
- Extensive experience working with business users/SMEs as well as senior management.
- Experience with the Big Data Hadoop ecosystem for ingestion, storage, querying, processing and analysis of big data.
- Experience in developing MapReduce programs using Apache Hadoop to analyze big data as per requirements.
- Experienced in technical consulting and end-to-end delivery covering architecture, data modeling, data governance, and the design, development and implementation of solutions.
- Experience in installing, configuring, supporting and managing the Cloudera Hadoop platform, including CDH4 and CDH5 clusters.
- Strong experience with and knowledge of NoSQL databases such as MongoDB and Cassandra.
- Proficient in normalization/de-normalization techniques in relational and dimensional database environments, having performed normalization up to 3NF.
- Good understanding of Ralph Kimball (dimensional) and Bill Inmon (relational) modeling methodologies.
- Strong experience in using MS Excel and MS Access to load and analyze data based on business needs.
- Good experience in data analysis; proficient in gathering business requirements and handling requirements management.
- Experience in migrating data between HDFS/Hive and relational database systems using Sqoop, per client requirements.
TECHNICAL SKILLS
Data Modeling Tools: Erwin R9.7/9.6, ER Studio V17
Machine Learning: Linear regression, Logistic regression, Decision tree, Random forest, K-nearest neighbors, K-means
Big Data & Hadoop Ecosystem: MapReduce, Spark 3.3, HBase 2.3.4, Hive 2.3, Flume 1.9, Sqoop 1.4.6, Kafka 2.6, Oozie 4.3, Hue, Cloudera Manager, Neo4j, Hadoop 3.3, Apache NiFi 1.6
NoSQL Databases: MongoDB, Azure SQL DB, Cassandra 3.11.10
Databases: Microsoft SQL Server 2017, Teradata 15.0, Oracle 12c, and MS Access
Cloud Platforms: GCP, Google BigQuery, AWS (EC2, S3, Redshift) & MS Azure
BI Tools: Tableau 10, SSRS, Crystal Reports, Power BI.
Programming Languages: SQL, PL/SQL, UNIX shell Scripting, R
Operating Systems: Microsoft Windows Vista/7/8/10, UNIX, and Linux.
Methodologies: Agile, RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Waterfall Model.
PROFESSIONAL EXPERIENCE
Confidential
Sr. Data Engineer
Responsibilities:
- Worked as a Data Engineer at BNY, driving projects using Spark, SQL and the Azure cloud environment.
- Worked on data governance to provide operational structure to previously ungoverned data environments.
- Participated in the requirement gathering sessions to understand the expectations and worked with system analysts to understand the format and patterns of the upstream source data.
- Performed data migration from an RDBMS to a NoSQL database and provided a consolidated view of the data deployed across the various data systems.
- Designed and implemented end-to-end data solutions (storage, integration, processing and visualization) in Azure.
- Analyzed, designed and built modern data solutions using Azure PaaS services to support data visualization.
- Designed and configured Azure cloud relational servers and databases, analyzing current and future business requirements.
- Worked on migrating data from on-prem SQL Server to cloud databases (Azure Synapse Analytics (DW) and Azure SQL DB).
- Designed and developed a data pipeline in Azure that gets customer data from an API and processes it into Azure SQL DB.
- Created external tables in Azure SQL Database for data visualization and reporting purposes.
- Created and set up a self-hosted integration runtime on virtual machines to access private networks.
- Orchestrated all Data pipelines using Azure Data Factory and built a custom alerts platform for monitoring.
- Extracted, transformed and loaded data from source systems to Azure data storage services using Azure Data Factory.
- Ingested data into Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
- Performed SAP Data Migration by using Business Objects Data Services as the ETL tool.
- Worked with Azure Blob and Data Lake storage, loading data into Azure Synapse Analytics (DW).
- Built visuals and dashboards using the Power BI reporting tool.
- Developed streaming pipelines using Apache Spark with Python.
- Created PL/SQL packages and database triggers, developed user procedures and prepared user manuals for the new programs.
- Wrote UDFs in Scala and PySpark to meet specific business requirements (a PySpark sketch follows this section).
- Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process the data using SQL activities.
- Worked in an Agile environment and used the Rally tool to maintain user stories and tasks.
- Worked with enterprise data support teams to install Hadoop updates, patches and version upgrades as required, and fixed problems that arose after the upgrades.
- Implemented test scripts to support test-driven development and continuous integration.
- Used Spark for parallel data processing and better performance.
- Used Azure Key Vault as the central repository for maintaining secrets, and referenced the secrets in Azure Data Factory as well as in Databricks notebooks.
- Used Python for web scraping to extract data.
- Collected data from social media websites such as Twitter via scraping to identify trending topics.
- Conducted numerous training sessions, demonstration sessions on Big Data.
Environment: Hadoop 3.3, Spark 3.3, Azure, ADF, Scala 3.0, JSON, Power BI, Azure SQL DB, Azure Synapse, Python 3.9, PL/SQL and Agile.
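A minimal sketch of the kind of PySpark UDF mentioned above; the column names, sample data and masking logic are illustrative assumptions, not the project's actual code.

```python
# Illustrative PySpark UDF sketch; names and data are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-example").getOrCreate()

@F.udf(returnType=StringType())
def mask_account(account_id):
    # Keep the last four characters, mask the rest.
    if account_id is None:
        return None
    return "*" * max(len(account_id) - 4, 0) + account_id[-4:]

customers_df = spark.createDataFrame(
    [("1234567890", "Alice"), ("9876543210", "Bob")],
    ["account_id", "name"],
)
customers_df.withColumn("masked_id", mask_account("account_id")).show()
```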
Confidential - Eden Prairie, MN
Sr. Data Engineer
Responsibilities:
- Worked as a Data Engineer to review business requirements and compose source-to-target data mapping documents.
- Conducted technical orientation sessions using documentation and training materials.
- Gathered the business requirements from the Business Partners and Subject Matter Experts.
- Served as technical expert guiding choices to implement analytical and reporting solutions for the client.
- Worked closely with the business, other architecture team members and global project teams to understand, document and design data warehouse processes and needs.
- Involved in the Agile development methodology as an active member in scrum meetings.
- Designed and architected various layers of the Data Lake.
- Designed star schema in BigQuery.
- Used REST APIs with Python to ingest data from external sites into BigQuery.
- Monitored BigQuery, Dataproc and Cloud Dataflow jobs via Stackdriver across the entire environment.
- Opened SSH tunnels to Google Dataproc to access the YARN resource manager and monitor Spark jobs.
- Submitted Spark jobs, staged with gsutil, for execution on the Dataproc cluster.
- Used Google Cloud Functions with Python to load data into BigQuery on arrival of CSV files in the GCS bucket (see the sketch after this section).
- Wrote a program to download a SQL dump from the maintenance site and then load it into a GCS bucket.
- Loaded the SQL dump from the GCS bucket into MySQL (hosted in Google Cloud SQL) and loaded the data from MySQL into BigQuery using Python, Scala, Spark and Dataproc.
- Processed and loaded bounded and unbounded data from Google Pub/Sub topics to BigQuery using Cloud Dataflow with Python.
- Wrote a Python program to maintain raw file archival in the GCS bucket.
- Wrote Scala programs for Spark transformations in Dataproc.
Environment: BigQuery, DataStage, Agile/Scrum, Python, Spark, Scala, Dataproc, G-Cloud, GCS bucket, SQL, Star Schema & MySQL.
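A minimal sketch of a GCS-triggered Cloud Function that loads an arriving CSV into BigQuery, along the lines described above; the target dataset/table name is a hypothetical placeholder, not the project's actual code.

```python
# Illustrative GCS-triggered Cloud Function (background/1st-gen style) that
# loads an arriving CSV into BigQuery. The target table is a hypothetical name.
from google.cloud import bigquery

TABLE_ID = "my-project.staging.raw_events"  # hypothetical target table

def gcs_to_bigquery(event, context):
    """Triggered when an object is finalized in the GCS bucket."""
    uri = f"gs://{event['bucket']}/{event['name']}"
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    load_job = client.load_table_from_uri(uri, TABLE_ID, job_config=job_config)
    load_job.result()  # Wait for the load job to complete.
    print(f"Loaded {uri} into {TABLE_ID}")
```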
Confidential
Data Engineer
Responsibilities:
- Worked as a Data Engineer to review business requirements and compose source-to-target data mapping documents.
- Architected, designed and developed business applications and data marts for reporting.
- Developed Big Data solutions focused on pattern matching and predictive modeling.
- Built Hadoop solutions for big data problems using MR1 and MR2 on YARN.
- Developed complete end-to-end big data processing in the Hadoop ecosystem.
- Installed and configured a multi-node cluster in the cloud using Amazon Web Services (AWS) on EC2.
- Developed a reconciliation process to ensure the Elasticsearch index document count matches the source records.
- Developed data pipelines to consume data from the Enterprise Data Lake (MapR Hadoop distribution - Hive tables/HDFS) for the analytics solution.
- Created a data pipeline using processor groups and multiple processors in Apache NiFi for flat files, as part of a POC on Amazon EC2.
- Created Hive external tables to stage data and then moved the data from staging to main tables.
- Implemented the Big Data solution using Hadoop, Hive and Informatica to pull/load the data into HDFS.
- Developed incremental and full-load Python processes to ingest data into Elasticsearch from an Oracle database.
- Pulled the data from the data lake (HDFS) and massaged it with various RDD transformations.
- Created Airflow scheduling scripts (DAGs) in Python (see the sketch after this section).
- Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
- Developed REST services to write data into the Elasticsearch index using Python Flask.
- Used AWS Cloud with Infrastructure Provisioning / Configuration.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
- Worked on configuring and managing disaster recovery and backup for the NoSQL database.
- Utilized Oozie workflows to run Pig and Hive jobs; extracted files from MongoDB through Sqoop, placed them in HDFS and processed them.
- Continuously tuned Hive UDFs for faster queries by employing partitioning and bucketing.
- Implemented partitioning, dynamic partitions and buckets in Hive.
- Used Flume to collect, aggregate and store web log data from different sources such as web servers, mobile and network devices, and pushed it to HDFS.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop.
Environment: Apache Spark, Hive 2.3, Informatica, HDFS, Airflow, MapReduce, Scala, Apache NiFi 1.6, YARN, PL/SQL, MongoDB, Pig 0.16, Sqoop 1.2, Flume 1.8
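A minimal sketch of the kind of Airflow scheduling script referenced above; the DAG id, schedule and task callables are illustrative assumptions, not the project's actual pipeline.

```python
# Illustrative Airflow DAG sketch; dag_id, schedule and callables are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_from_hive():
    # Placeholder: pull the day's partition from a Hive table.
    print("extracting data from Hive")

def index_into_elasticsearch():
    # Placeholder: push transformed records into an Elasticsearch index.
    print("indexing documents into Elasticsearch")

with DAG(
    dag_id="daily_ingest_example",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_from_hive)
    index = PythonOperator(task_id="index", python_callable=index_into_elasticsearch)
    extract >> index  # Run the extract task, then the indexing task.
```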
Confidential - Houston, TX
Data Analyst/ Data Engineer
Responsibilities:
- Worked as a Data Analyst/Data Engineer on data validation to ensure the accuracy of the data between the warehouse and source systems.
- Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
- Used Agile methodology and the Scrum process for project development.
- Deployed and monitored scalable infrastructure on the Amazon Web Services (AWS) cloud environment.
- Analyzed and prepared data, identifying patterns in the dataset by applying historical models.
- Involved in Migrating Objects from Teradata to Snowflake.
- Designed and implemented effective Analytics solutions and models with Snowflake.
- Involved with data profiling for multiple sources and answered complex business questions by providing data to business users.
- Performed Data scrubbing for removing incomplete, irrelevant data and maintained consistency in the target data warehouse by cleaning the dirty data.
- Loaded data into Hive Tables from Hadoop Distributed File System (HDFS) to provide SQL-like access on Hadoop data.
- Performed data manipulation, data preparation, normalization and predictive modeling.
- Improved efficiency and accuracy by evaluating the model in R.
- Tested the ETL process both before and after the data validation process.
- Used the AWS Glue catalog with crawlers to get the data from S3 and perform SQL query operations.
- Designed data profiles for processing, including running SQL and PL/SQL queries and using R for data acquisition and data integrity checks, consisting of dataset comparisons and dataset schema checks.
- Used Python and R programming to improve the models.
- Written SQL queries against Snowflake.
- Designed the database tables and created table- and column-level constraints using the suggested naming conventions for constraint keys.
- Created on-demand tables on S3 files using Lambda functions and AWS Glue with Python and PySpark (a Glue sketch follows this section).
- Automated solutions to manual processes with big data tools (Spark, Python and AWS).
- Involved in loading data from Unix file system to HDFS.
- Worked on enhancing the data quality in the database.
- Maintained PL/SQL objects like packages, triggers, procedures etc.
- Designed and developed T-SQL stored procedures to extract, aggregate, transform, and insert data.
- Created several reports for claims handling which had to be exported out to PDF formats.
- Created Tableau dashboards, datasets, data sources and worksheets.
Environment: Snowflake, Hadoop 2.5, Teradata, R, Python, Spark, AWS S3, Glue, PySpark, PL/SQL, T-SQL, Tableau, and agile/Scrum.
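A minimal sketch of a Glue PySpark job along the lines described above; the database, table and S3 path are hypothetical placeholders, not the project's actual names.

```python
# Illustrative AWS Glue PySpark job sketch; catalog names and S3 path are hypothetical.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read a crawler-cataloged S3 dataset via the Glue Data Catalog.
claims = glue_context.create_dynamic_frame.from_catalog(
    database="claims_db", table_name="raw_claims"
)

# Light transformation, then write the result back to S3 as Parquet.
cleaned = claims.drop_fields(["temp_column"])
glue_context.write_dynamic_frame.from_options(
    frame=cleaned,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/curated/claims/"},
    format="parquet",
)
job.commit()
```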
Confidential - Boston, MA
Data Analyst
Responsibilities:
- As a Data Analyst I was responsible for gathering data migration requirements.
- Identified problematic areas and conducted research to determine the best course of action to correct the data.
- Analyzed problem and solved issues with current and planned systems as they relate to the integration and management of order data.
- Under the supervision of a Sr. Data Scientist, performed data transformation methods for rescaling and normalizing variables.
- Developed and implemented predictive models using Natural Language Processing Techniques and machine learning algorithms.
- Involved in Data Mapping activities for the data warehouse.
- Analyzed reports of data duplicates or other errors to provide ongoing appropriate inter-departmental communication and monthly or daily data reports.
- Monitored select data elements for timely and accurate completion.
- Collected, analyzed and interpreted complex data for reporting and/or performance trend analysis.
- Monitored data dictionary statistics.
- Involved in analyzing and adding new Oracle 10g features such as DBMS_SCHEDULER, Create Directory, Data Pump and CONNECT BY ROOT to the existing Oracle 10g application.
- Coded, tested, debugged, implemented and documented data using R.
- Applied the K-Means algorithm to determine the position of an agent based on the data collected (a clustering sketch follows this section).
- Applied regression to estimate the probability of an agent's location with respect to the insurance policies sold.
- Archived the old data by converting it into SAS data sets and flat files.
- Extensively used the Erwin tool for forward and reverse engineering, following corporate naming-convention standards and using conformed dimensions whenever possible.
- Enabled a smooth transition from the legacy system to the newer system through the change management process.
- Planned project activities for the team based on project timelines using Work Breakdown Structure.
- Compared data with original source documents and validated data accuracy.
- Used reverse engineering to create graphical representations (E-R diagrams) and to connect to the existing database.
- Generated weekly and monthly asset inventory reports.
- Created Technical Design Documents, Unit Test Cases.
- Wrote SQL and PL/SQL scripts to extract data from the database to meet business requirements and for testing purposes.
- Wrote complex SQL queries to validate the data against different kinds of reports generated by Business Objects XI R2.
- Involved in test case/data preparation, execution and verification of the test results.
- Created user guidance documentations.
- Created reconciliation report for validating migrated data.
Environment: UNIX, Shell Scripting, XML Files, K-Means, R, XSD, XML, SAS, PL/SQL, Oracle 10g, Erwin 9.5, Autosys.
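A minimal Python analogue of the K-Means clustering described above (the original work was done in R); the feature columns, sample values and cluster count are illustrative assumptions.

```python
# Illustrative K-Means clustering sketch in Python (the original work used R);
# the agent-location features and cluster count are hypothetical.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical agent location features (latitude, longitude).
agent_locations = np.array([
    [44.85, -93.47],
    [44.86, -93.45],
    [42.36, -71.06],
    [42.35, -71.05],
    [29.76, -95.37],
])

# Rescale variables before clustering, mirroring the normalization step above.
scaled = StandardScaler().fit_transform(agent_locations)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(scaled)
print(kmeans.labels_)           # Cluster assignment per agent
print(kmeans.cluster_centers_)  # Centroids in scaled feature space
```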