
Sr. Data Engineer Resume


Eden Prairie, MN

SUMMARY

  • Over 9 years of experience as a Data Engineer, including designing, developing, and implementing data models for enterprise-level applications and systems.
  • Proficient in managing the entire data science project life cycle and actively involved in all phases of a project.
  • Experienced in Big Data work on Hadoop, Spark, PySpark, Hive, and HDFS, as well as NoSQL platforms.
  • Experienced working extensively on Master Data Management (MDM) and the applications used for MDM.
  • Experience in transferring data from AWS S3 to AWS Redshift using Informatica.
  • Hands-on experience with Amazon Web Services, provisioning and maintaining AWS resources such as EMR, S3 buckets, EC2 instances, RDS, and others.
  • Hands-on experience with Google Cloud Platform (GCP) services such as BigQuery, GCS buckets, and Cloud Functions.
  • Experienced in Informatica Information Lifecycle Management (ILM) and its tools.
  • Efficient in all phases of the development lifecycle, including Data Cleansing, Data Conversion, Data Profiling, Data Mapping, Performance Tuning, and System Testing.
  • Good knowledge of SQL queries and of creating database objects such as stored procedures, triggers, packages, and functions using SQL and PL/SQL to implement business logic.
  • Supported ad-hoc business requests, developed stored procedures and triggers, and extensively used Quest tools such as TOAD.
  • Good understanding of and exposure to Python programming.
  • Excellent working experience with Scrum/Agile frameworks and Waterfall project execution methodologies.
  • Extensive experience working with business users/SMEs as well as senior management.
  • Experience in the Big Data Hadoop ecosystem across ingestion, storage, querying, processing, and analysis of big data.
  • Experience in developing MapReduce programs using Apache Hadoop to analyze big data as per requirements.
  • Experienced in technical consulting and end-to-end delivery covering architecture, data modeling, data governance, and design, development, and implementation of solutions.
  • Experience in installing, configuring, supporting, and managing the Cloudera Hadoop platform, including CDH4 and CDH5 clusters.
  • Strong experience with and knowledge of NoSQL databases such as MongoDB and Cassandra.
  • Proficient in normalization/de-normalization techniques in relational/dimensional database environments, with normalization performed up to 3NF.
  • Good understanding of the Ralph Kimball (Dimensional) and Bill Inmon (Relational) modeling methodologies.
  • Strong experience in using MS Excel and MS Access to load data and analyze it based on business needs.
  • Experienced in data analysis and proficient in gathering business requirements and handling requirements management.
  • Experience in migrating data between HDFS/Hive and relational database systems using Sqoop, according to client requirements.

TECHNICAL SKILLS

Data Modeling Tools: Erwin R9.7/9.6, ER Studio V17

Machine Learning: Linear regression, Logistic regression, Decision tree, Random Forest, K-nearest neighbors, K-means

Big Data & Hadoop Ecosystem: MapReduce, Spark 3.3, HBase 2.3.4, Hive 2.3, Flume 1.9, Sqoop 1.4.6, Kafka 2.6, Oozie 4.3, Hue, Cloudera Manager, Neo4j, Hadoop 3.3, Apache NiFi 1.6

NoSQL Databases: MongoDB, Azure SQL DB, Cassandra 3.11.10

Databases: Microsoft SQL Server 2017, Teradata 15.0, Oracle 12c, and MS Access

Cloud Platforms: GCP, Google BigQuery, AWS, EC2, S3, Redshift & MS Azure

BI Tools: Tableau 10, SSRS, Crystal Reports, Power BI.

Programming Languages: SQL, PL/SQL, UNIX shell Scripting, R

Operating Systems: Microsoft Windows Vista/7/8/10, UNIX, and Linux.

Methodologies: Agile, RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Waterfall Model.

PROFESSIONAL EXPERIENCE

Confidential

Sr. Data Engineer

Responsibilities:

  • Worked as a Data Engineer at BNY, driving projects using Spark, SQL, and the Azure cloud environment.
  • Worked on data governance to provide operational structure to previously ungoverned data environments.
  • Participated in the requirement gathering sessions to understand the expectations and worked with system analysts to understand the format and patterns of the upstream source data.
  • Performed data migration from an RDBMS to a NoSQL database, providing a complete picture of the data deployed across various data systems.
  • Designed and implemented end-to-end data solutions (storage, integration, processing, and visualization) in Azure.
  • Analyzed, designed, and built modern data solutions using Azure PaaS services to support data visualization.
  • Designed and configured Azure cloud relational servers and databases, analyzing current and future business requirements.
  • Worked on migration of data from On-prem SQL server to Cloud databases (Azure Synapse Analytics (DW) & Azure SQL DB).
  • Designed and developed a data pipeline in the Azure cloud that pulls customer data from an API and processes it into Azure SQL DB.
  • Created external tables in Azure SQL Database for data visualization and reporting purposes.
  • Created and set up a self-hosted integration runtime on virtual machines to access private networks.
  • Orchestrated all Data pipelines using Azure Data Factory and built a custom alerts platform for monitoring.
  • Extracted, transformed, and loaded data from source systems to Azure data storage services using Azure Data Factory.
  • Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
  • Performed SAP Data Migration by using Business Objects Data Services as the ETL tool.
  • Worked with Azure Blob and Data Lake storage, loading data into Azure Synapse Analytics (DW).
  • Built visuals and dashboards using the Power BI reporting tool.
  • Developed streaming pipelines using Apache Spark with Python.
  • Created PL/SQL packages and Database Triggers and developed user procedures and prepared user manuals for the new programs.
  • Wrote UDFs in Scala and PySpark to meet specific business requirements (see the PySpark sketch after this list).
  • Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process the data using the SQL activity.
  • Worked in an Agile environment and used the Rally tool to maintain user stories and tasks.
  • Worked with enterprise data support teams to install Hadoop updates, patches, and version upgrades as required, and fixed problems that arose after the upgrades.
  • Implemented test scripts to support test-driven development and continuous integration.
  • Used Spark for parallel data processing and better performance.
  • Used Azure Key Vault as a central repository for maintaining secrets and referenced those secrets in Azure Data Factory and in Databricks notebooks.
  • Used Python for web scraping to extract data.
  • Collected data from social media websites such as Twitter to identify trending topics using social media scraping.
  • Conducted numerous training sessions, demonstration sessions on Big Data.
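
A minimal, Databricks-notebook-style PySpark sketch of the pattern described in the bullets above: a UDF applied to ingested customer data, with Azure SQL credentials pulled from a Key Vault-backed secret scope. The secret scope, key names, JDBC URL, paths, and table names are hypothetical placeholders, not the actual project values; spark and dbutils are objects provided by the Databricks runtime.

```python
# Sketch only: scope, keys, URL, paths, and table names below are placeholders.
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

# Pull Azure SQL credentials from a Key Vault-backed secret scope (dbutils is
# supplied by the Databricks runtime).
jdbc_user = dbutils.secrets.get(scope="kv-scope", key="sqldb-user")
jdbc_password = dbutils.secrets.get(scope="kv-scope", key="sqldb-password")
jdbc_url = "jdbc:sqlserver://example-server.database.windows.net:1433;database=customerdb"

# Example PySpark UDF applied to ingested customer data (placeholder logic).
@F.udf(returnType=StringType())
def mask_email(email):
    if email is None:
        return None
    name, _, domain = email.partition("@")
    return name[:2] + "***@" + domain

customers = spark.read.format("delta").load("/mnt/datalake/raw/customers")
curated = customers.withColumn("email_masked", mask_email("email"))

# Write the curated data to Azure SQL DB over JDBC.
(curated.write.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "dbo.customers_curated")
    .option("user", jdbc_user)
    .option("password", jdbc_password)
    .mode("append")
    .save())
```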

Environment: Hadoop 3.3, Spark 3.3, Azure, ADF, Scala 3.0, JSON, Power BI, Azure SQL DB, Azure Synapse, Python 3.9, PL/SQL and Agile.

Confidential - Eden Prairie, MN

Sr. Data Engineer

Responsibilities:

  • Worked as a Data Engineer to review business requirements and compose source-to-target data mapping documents.
  • Conducted technical orientation sessions using documentation and training materials.
  • Gathered the business requirements from the Business Partners and Subject Matter Experts.
  • Served as technical expert guiding choices to implement analytical and reporting solutions for client.
  • Worked closely with the business, other architecture team members and global project teams to understand, document and design data warehouse processes and needs.
  • Followed the Agile development methodology and was an active member in Scrum meetings.
  • Designed and architected the various layers of the Data Lake.
  • Designed star schema in BigQuery.
  • Used REST APIs with Python to ingest data from source sites into BigQuery.
  • Monitored BigQuery, Dataproc, and Cloud Dataflow jobs via Stackdriver for the entire environment.
  • Opened SSH tunnels to Google Dataproc to access the YARN resource manager and monitor Spark jobs.
  • Staged job artifacts with gsutil and submitted Spark jobs for execution on the Dataproc cluster.
  • Used Google Cloud Functions with Python to load data into BigQuery on arrival of CSV files in a GCS bucket (see the sketch after this list).
  • Wrote a program to download a SQL dump from the maintenance site and then load it into a GCS bucket.
  • Loaded the SQL dump from the GCS bucket into MySQL (hosted in Google Cloud SQL) and loaded the data from MySQL to BigQuery using Python, Scala, Spark, and Dataproc.
  • Processed and loaded bounded and unbounded data from Google Pub/Sub topics into BigQuery using Cloud Dataflow with Python.
  • Wrote a Python program to maintain raw file archival in a GCS bucket.
  • Wrote Scala programs for Spark transformations in Dataproc.
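
A minimal sketch of the Cloud Function pattern referenced above: a Python background function triggered when a CSV file lands in a GCS bucket, loading it into BigQuery with the google-cloud-bigquery client. The project, dataset, and table names are hypothetical placeholders.

```python
# Sketch only: the destination table id is a placeholder.
from google.cloud import bigquery

def load_csv_to_bigquery(event, context):
    """Triggered by a GCS object-finalize event; loads the new CSV into BigQuery."""
    bucket = event["bucket"]   # bucket that fired the event
    name = event["name"]       # object path of the uploaded file
    if not name.lower().endswith(".csv"):
        return                 # ignore non-CSV objects

    client = bigquery.Client()
    table_id = "my-project.raw_zone.daily_feed"  # placeholder destination table
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    load_job = client.load_table_from_uri(
        f"gs://{bucket}/{name}", table_id, job_config=job_config
    )
    load_job.result()          # block until the load job completes
```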

Environment: BigQuery, DataStage, Agile/scrum, Python, Spark, Scala, Dataproc, G-cloud, GCS bucket, SQL, Star Schema & MYSQL.

Confidential

Data Engineer

Responsibilities:

  • Worked as a Data Engineer to review business requirements and compose source-to-target data mapping documents.
  • Architected, designed, and developed business applications and data marts for reporting.
  • Developed big data solutions focused on pattern matching and predictive modeling.
  • Built Hadoop solutions for big data problems using MR1 and MR2 on YARN.
  • Developed complete end-to-end big data processing in the Hadoop ecosystem.
  • Installed and configured a multi-node cluster in the cloud using Amazon Web Services (AWS) EC2.
  • Developed a reconciliation process to ensure the Elasticsearch index document count matches the source records.
  • Developed data pipelines to consume data from the Enterprise Data Lake (MapR Hadoop distribution - Hive tables/HDFS) for the analytics solution.
  • Created a data pipeline for flat files using process groups and multiple processors in Apache NiFi, as part of a POC using Amazon EC2.
  • Created Hive external tables to stage data and then moved the data from staging to the main tables.
  • Implemented the big data solution using Hadoop, Hive, and Informatica to pull/load the data into HDFS.
  • Developed incremental- and complete-load Python processes to ingest data into Elasticsearch from an Oracle database.
  • Pulled data from the data lake (HDFS) and massaged it with various RDD transformations.
  • Created Airflow scheduling scripts in Python (see the sketch after this list).
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
  • Developed REST services to write data into the Elasticsearch index using Python Flask.
  • Used the AWS cloud for infrastructure provisioning and configuration.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
  • Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
  • Worked on configuring and managing disaster recovery and backups for the NoSQL database.
  • Utilized Oozie workflows to run Pig and Hive jobs; extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
  • Continuously tuned Hive UDFs and queries for faster performance by employing partitioning and bucketing.
  • Implemented partitioning, dynamic partitions and buckets in Hive.
  • Used Flume to collect, aggregate, and store web log data from different sources such as web servers, mobile, and network devices, and pushed it to HDFS.
  • Supported in setting up QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop.
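
A minimal Airflow DAG sketch of the Python scheduling scripts mentioned above, assuming an Airflow 2.x deployment: a daily Spark transformation followed by an incremental Elasticsearch load. The DAG id, task ids, and script paths are hypothetical placeholders.

```python
# Sketch only: dag_id, task ids, and the submitted scripts are placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "data-engineering",
    "retries": 1,
    "retry_delay": timedelta(minutes=10),
}

with DAG(
    dag_id="daily_hdfs_to_elasticsearch",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    # Spark transformation that reads the Hive/HDFS data.
    transform = BashOperator(
        task_id="spark_transform",
        bash_command="spark-submit --master yarn /opt/jobs/transform.py",
    )

    # Incremental load of the transformed data into Elasticsearch.
    load_es = BashOperator(
        task_id="load_elasticsearch",
        bash_command="python /opt/jobs/load_es.py --mode incremental",
    )

    transform >> load_es
```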

Environment: Apache Spark, Hive 2.3, Informatica, HDFS, Airflow, MapReduce, Scala, Apache NiFi 1.6, YARN, PL/SQL, MongoDB, Pig 0.16, Sqoop 1.2, Flume 1.8

Confidential - Houston, TX

Data Analyst/ Data Engineer

Responsibilities:

  • Worked as a Data Analyst/Data Engineer on data validation to ensure the accuracy of the data between the warehouse and source systems.
  • Evaluated business requirements and prepared detailed specifications, following project guidelines, for the programs to be developed.
  • Used the Agile methodology and Scrum process for project development.
  • Deployed and monitored scalable infrastructure on the Amazon Web Services (AWS) cloud environment.
  • Analyzed and prepared data, identifying patterns in the datasets by applying historical models.
  • Involved in Migrating Objects from Teradata to Snowflake.
  • Designed and implemented effective Analytics solutions and models with Snowflake.
  • Involved with data profiling for multiple sources and answered complex business questions by providing data to business users.
  • Performed Data scrubbing for removing incomplete, irrelevant data and maintained consistency in the target data warehouse by cleaning the dirty data.
  • Loaded data into Hive Tables from Hadoop Distributed File System (HDFS) to provide SQL-like access on Hadoop data.
  • Performed data manipulation, data preparation, normalization, and predictive modeling.
  • Improved efficiency and accuracy by evaluating models in R.
  • Tested the ETL process both before and after the data validation process.
  • Used the AWS Glue catalog with crawlers to get data from S3 and perform SQL query operations.
  • Designed data profiles for processing, including running SQL and PL/SQL queries and using R for data acquisition and data integrity, consisting of dataset comparisons and dataset schema checks.
  • Used Python and R programming to improve the models.
  • Written SQL queries against Snowflake.
  • Designed the database tables and created table- and column-level constraints using the suggested naming conventions for constraint keys.
  • Responsible for creating on-demand tables on S3 files using Lambda functions and AWS Glue with Python and PySpark (see the sketch after this list).
  • Automated solutions to manual processes with big data tools (Spark, Python and AWS).
  • Involved in loading data from the Unix file system into HDFS.
  • Worked on enhancing the data quality in the database.
  • Maintained PL/SQL objects like packages, triggers, procedures etc.
  • Designed and developed T-SQL stored procedures to extract, aggregate, transform, and insert data.
  • Created several reports for claims handling that had to be exported to PDF format.
  • Created Tableau dashboards, datasets, data sources, and worksheets.
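
A minimal AWS Glue job sketch (PySpark) illustrating the catalog-driven pattern above: reading a table that a crawler registered over S3 files and writing an aggregated result back to S3. The catalog database, table, column, and bucket names are hypothetical placeholders.

```python
# Sketch only: database, table, column, and S3 path names are placeholders.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the table registered in the Glue Data Catalog by the S3 crawler.
claims = glue_context.create_dynamic_frame.from_catalog(
    database="claims_raw",      # placeholder catalog database
    table_name="claims_csv",    # placeholder crawled table
)

# Run a SQL-style aggregation with Spark and write the result back to S3 as Parquet.
claims.toDF().createOrReplaceTempView("claims")
summary = spark.sql(
    "SELECT claim_status, COUNT(*) AS claim_count FROM claims GROUP BY claim_status"
)
summary.write.mode("overwrite").parquet("s3://example-analytics-bucket/claims_summary/")

job.commit()
```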

Environment: Snowflake, Hadoop 2.5, Teradata, R, Python, Spark, AWS S3, Glue, PySpark, PL/SQL, T-SQL, Tableau, and agile/Scrum.

Confidential - Boston, MA

Data Analyst

Responsibilities:

  • As a Data Analyst, I was responsible for gathering data migration requirements.
  • Identified problematic areas and conducted research to determine the best course of action to correct the data.
  • Analyzed problems and resolved issues with current and planned systems as they relate to the integration and management of order data.
  • Under the supervision of a Sr. Data Scientist, performed data transformation methods for rescaling and normalizing variables.
  • Developed and implemented predictive models using Natural Language Processing Techniques and machine learning algorithms.
  • Involved in Data Mapping activities for the data warehouse.
  • Analyzed reports of data duplicates and other errors to provide appropriate ongoing inter-departmental communication and monthly or daily data reports.
  • Monitored select data elements for timely and accurate completion.
  • Collected, analyzed, and interpreted complex data for reporting and/or performance trend analysis.
  • Monitored data dictionary statistics.
  • Involved in analyzing and adding new Oracle 10g features such as DBMS_SCHEDULER, CREATE DIRECTORY, Data Pump, and CONNECT BY ROOT to the existing Oracle 10g application.
  • Coded, tested, debugged, implemented and documented data using R.
  • Applied the K-Means algorithm to determine the position of an agent based on the data collected (see the sketch after this list).
  • Applied regression to identify the probability of an agent's location with respect to the insurance policies sold.
  • Archived the old data by converting it into SAS data sets and flat files.
  • Extensively used the Erwin tool for forward and reverse engineering, following corporate standards in naming conventions and using conformed dimensions whenever possible.
  • Enabled a smooth transition from the legacy system to the newer system through the change management process.
  • Planned project activities for the team based on project timelines using Work Breakdown Structure.
  • Compared data with the original source documents and validated data accuracy.
  • Used reverse engineering to create a graphical representation (E-R diagram) and to connect to the existing database.
  • Generated weekly and monthly asset inventory reports.
  • Created Technical Design Documents, Unit Test Cases.
  • Wrote SQL and PL/SQL scripts to extract data from the database to meet business requirements and for testing purposes.
  • Wrote complex SQL queries to validate the data against different kinds of reports generated by Business Objects XI R2.
  • Involved in test case/data preparation, execution, and verification of the test results.
  • Created user guidance documentations.
  • Created reconciliation report for validating migrated data.
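
The clustering described above was done in R; purely as an illustration, here is an equivalent Python/scikit-learn sketch of the same approach (rescale the variables, then cluster agents with K-Means). The input file and column names are made up for the example.

```python
# Illustrative Python analogue of the R-based K-Means work; file and column
# names are placeholders.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical input: one row per agent with location and sales features.
agents = pd.read_csv("agent_data.csv")
features = agents[["latitude", "longitude", "policies_sold"]]

# Rescale/normalize the variables before clustering.
scaled = StandardScaler().fit_transform(features)

# Fit K-Means and attach each agent's cluster (candidate position/region).
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
agents["cluster"] = kmeans.fit_predict(scaled)

# Quick summary of policies sold per cluster.
print(agents.groupby("cluster")["policies_sold"].mean())
```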

Environment: UNIX, Shell Scripting, XML Files, K-Means, R, XSD, XML, SAS, PL/SQL, Oracle 10g, Erwin 9.5, Autosys.
