
Sr. Data Engineer Resume


New York, NY

SUMMARY

  • 7+ years of professional IT experience working with various legacy database systems as well as Big Data technologies.
  • Hands-on experience with the complete Software Development Life Cycle (SDLC) on projects using Agile and hybrid methodologies.
  • Experience analyzing data using the Big Data ecosystem, including HDFS, Hive, HBase, ZooKeeper, Pig, Sqoop, and Flume.
  • Working knowledge of and hands-on experience with big data tools such as Hadoop, Azure Data Lake, and Amazon Redshift.
  • Good understanding of Apache Airflow.
  • Experience in workflow scheduling with Airflow, AWS Data Pipeline, Azure Data Factory, SSIS, etc.
  • Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
  • Data Engineer with working experience in AWS, Spark, the Hadoop ecosystem, Python for data science, data pipelines, and Tableau.
  • Strong experience in AWS S3, EMR, EC2, Glue, Lambda, IAM, Kinesis, RDS, Route 53, VPC, CodeBuild, CodePipeline, CloudWatch, and CloudFormation.
  • Experienced in working with Amazon Web Services (AWS) using S3, EMR, Redshift, Athena, Glue Data Catalog, etc.
  • Good understanding of Big Data Hadoop and YARN architecture along with the Hadoop daemons such as Job Tracker, Task Tracker, NameNode, DataNode, and Resource/Cluster Manager, as well as Kafka (distributed stream processing).
  • Experience in text analytics and data mining solutions for various business problems, and in generating data visualizations using SAS and Python.
  • Experience developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation across multiple file formats to uncover insights into customer usage patterns (a minimal sketch follows this summary).
  • Good understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, driver and worker nodes, stages, executors, and tasks.
  • Strong experience and knowledge of NoSQL databases such as MongoDB and Cassandra.
  • Development and support experience with Oracle, SQL, PL/SQL, and T-SQL queries.
  • Experienced in configuring and administering the Hadoop Cluster using major Hadoop Distributions like Apache Hadoop and Cloudera.
  • Excellent experience creating cloud-based solutions and architectures using Amazon Web Services (Amazon EC2, Amazon S3, Amazon RDS) and Microsoft Azure.
  • Experienced in technical consulting and end-to-end delivery covering architecture, data modeling, data governance, and solution design, development, and implementation.
  • Experience with the Big Data Hadoop ecosystem for ingestion, storage, querying, processing, and analysis of big data.
  • Extensive working experience in agile environment using a CI/CD model.
  • Extensive experience working with structured data using Spark SQL, DataFrames, and HiveQL, optimizing queries, and incorporating complex UDFs into business logic.
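A minimal PySpark sketch of the kind of Spark SQL extraction and aggregation work described in the summary above; the paths, view, and column names are hypothetical placeholders rather than details from an actual engagement (assumes Spark 3.1+ for unionByName with missing columns).

```python
from pyspark.sql import SparkSession, functions as F

# Minimal sketch; paths, view, and column names are hypothetical.
spark = SparkSession.builder.appName("usage-patterns").getOrCreate()

# Extract from multiple file formats.
events_json = spark.read.json("/mnt/raw/events/*.json")
events_parquet = spark.read.parquet("/mnt/raw/events_parquet/")

# Transform and aggregate to summarize customer usage patterns.
usage = (
    events_json.unionByName(events_parquet, allowMissingColumns=True)
    .withColumn("event_date", F.to_date("event_ts"))
    .groupBy("customer_id", "event_date")
    .agg(
        F.count("*").alias("events"),
        F.countDistinct("session_id").alias("sessions"),
    )
)

# Expose the result to Spark SQL for ad-hoc analysis.
usage.createOrReplaceTempView("daily_usage")
spark.sql(
    "SELECT customer_id, AVG(events) AS avg_daily_events "
    "FROM daily_usage GROUP BY customer_id"
).show()
```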

PROFESSIONAL EXPERIENCE

Confidential, New York, NY

Sr. Data Engineer

Responsibilities:

  • Participated in Joint Application Development (JAD) sessions to communicate and manage expectations with business users and end users.
  • Worked in an Agile development methodology as an active member of scrum meetings.
  • Worked on building a centralized data lake on the AWS Cloud utilizing primary services such as S3, EMR, Redshift, Athena, and Glue.
  • Designed, implemented, and maintained all AWS infrastructure and services within a managed-service environment.
  • Designed, deployed, and maintained enterprise-class security, network, and systems-management applications within an AWS environment.
  • Involved in ingestion, transformation, manipulation, and computation of data using Kinesis, SQL, AWS Glue, and Spark.
  • Developed AWS Lambda functions in Python and used Step Functions to orchestrate data pipelines (see the Lambda sketch after this list).
  • Designed and Configured Azure Cloud relational servers and databases analyzing current and future business requirements.
  • Interacted with the SMEs (Subject Matter Experts) and stakeholders to get a better understanding of client business processes and gathered and analyzed business requirements.
  • Designed, documented and deployed systems pertaining to Enterprise Data Warehouse standards and best practices.
  • Installed the Hadoop distribution system.
  • Worked on migration of data from on-premises SQL Server to cloud databases (Azure Synapse Analytics (DW) and Azure SQL DB).
  • Designed and developed a data pipeline in the Azure cloud that pulls customer data from an API and processes it into Azure SQL DB.
  • Worked on pulling data from the database for consumption in Databricks.
  • Performance-tuned Spark applications by setting the right batch interval, the correct level of parallelism, and appropriate memory settings.
  • Wrote UDFs in Scala and PySpark to meet specific business requirements.
  • Pulled data from the data lake (HDFS) and massaged it with various RDD transformations.
  • Ingested data into HDFS using Sqoop and scheduled an incremental load to HDFS.
  • Worked with Hadoop infrastructure to store data in HDFS and used Hive SQL to migrate the underlying SQL codebase to Azure.
  • Involved with Azure Blob and Data Lake Storage and loading data into Azure Synapse Analytics (DW).
  • Involved in creating pipelines and datasets to load data into the data warehouse.
  • Built the data warehouse on the Azure platform using Azure Databricks and Data Factory.
  • Implemented a Python-based distributed random forest via Python streaming.
  • Worked on collecting large data sets using Python scripting.
  • Stored DataFrames into Hive as tables using Python (PySpark) (see the PySpark-to-Hive sketch after this list).
  • Used Python packages for processing JSON and HDFS file formats.
  • Implemented ad-hoc analysis solutions using Azure Data Lake Analytics/Store and HDInsight.
  • Worked on creating tabular models on Azure analysis services for meeting business reporting requirements.
  • Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process the data using the SQL activity.
  • Worked on all data management activities on the project, including data sources and data migration.
  • Worked with data compliance and data governance teams to maintain data models, metadata, and data dictionaries, and to define source fields and their definitions.
  • Used Azure reporting services to upload and download reports.
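The following is a hedged sketch of the kind of Python Lambda handler used in the Step Functions orchestration mentioned above; the Glue job name and payload fields are hypothetical.

```python
import boto3

# Hedged sketch of a pipeline-step Lambda handler; the Glue job name and
# event fields are hypothetical, not taken from the actual project.
glue = boto3.client("glue")

def handler(event, context):
    """Kick off a Glue job for the partition passed in by Step Functions."""
    partition_date = event["partition_date"]
    run = glue.start_job_run(
        JobName="customer-usage-etl",                  # hypothetical Glue job
        Arguments={"--partition_date": partition_date},
    )
    # A subsequent Step Functions state can poll this run id for completion.
    return {"job_run_id": run["JobRunId"], "partition_date": partition_date}
```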
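A minimal PySpark sketch of writing a DataFrame out as a Hive table, as mentioned above; the database, table, partition column, and source path are hypothetical.

```python
from pyspark.sql import SparkSession

# Minimal sketch of persisting a DataFrame as a Hive table; names are hypothetical.
spark = (
    SparkSession.builder
    .appName("hive-writer")
    .enableHiveSupport()
    .getOrCreate()
)

# Read curated data (assumes a load_date column exists for partitioning).
df = spark.read.parquet("/data/curated/orders/")

# Write the DataFrame as a managed Hive table, partitioned by load date.
(
    df.write
    .mode("overwrite")
    .partitionBy("load_date")
    .saveAsTable("analytics.orders")
)
```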

Confidential, Greenwood Village, CO

Data Analyst/ Data Engineer

Responsibilities:

  • Understood business processes, data entities, data producers, and data dependencies.
  • Conducted meetings with the business and technical team to gather necessary analytical data requirements in JAD sessions.
  • Created the automated build and deployment process for the application, re-engineered the setup for a better user experience, and led the effort toward building a continuous integration system.
  • Involved in the Complete Software development life cycle (SDLC) to develop the application.
  • Used and supported database applications and tools for extraction, transformation and analysis of raw data.
  • Created and analyzed business requirements to compose functional and implementable technical data solutions.
  • Identified integration impact, data flows and data stewardship.
  • Involved in data analysis, data discrepancy reduction in the source and target schemas.
  • Conducted detailed analysis of data issues, mapping data from source to target, with design and data cleansing on the data warehouse.
  • Created new data constraints and/or leveraged existing constraints for reuse.
  • Created data dictionary, Data mapping for ETL and application support, DFD, ERD, mapping documents, metadata, DDL and DML as required.
  • Participated in JAD sessions as the primary modeler for expanding existing databases and developing new ones.
  • Identified and analyzed source data coming from SQL server and flat files.
  • Evaluated and enhanced current data models to reflect business requirements.
  • Generated, wrote, and ran SQL scripts to implement DB changes, including table updates, addition or update of indexes, and creation of views and stored procedures.
  • Consolidated and updated various data models through reverse and forward engineering.
  • Restructured logical and physical data models to respond to changing business needs and to assure data integrity using PowerDesigner.
  • Created naming convention files and coordinated with DBAs to apply the data model changes.
  • Designed ETL specification documents to load the data in target using various transformations according to the business requirements.
  • Used Informatica PowerCenter for extracting, transforming, and loading data.
  • Performed Data profiling, Validation and Integration.
  • Created materialized views to improve performance and tuned the database design.
  • Involved in Data migration and Data distribution testing.
  • Developed and presented Business Intelligence reports and product demos to the team using SSRS (SQL Server Reporting Services).
  • Performed testing, knowledge transfer and mentored other team members.
  • Documented the extent to which data failed to meet threshold reporting requirements.
  • Designed, developed, implemented, and rolled out MicroStrategy Business Intelligence applications.
  • Ensured the compliance of the extracts with Data Quality Center initiatives.
  • Built reports and report models using SSRS to enable end user report builder usage.
  • Created daily and monthly reports using SQL and UNIX for Business Analysis.

Confidential

Sr. Data Engineer

Responsibilities:

  • Worked as a Data Engineer to review business requirements and compose source-to-target data mapping documents.
  • Installed and configured Apache Airflow for an S3 bucket and a Snowflake data warehouse, and created DAGs to run Airflow workflows (see the Airflow DAG sketch after this list).
  • Participated in requirements sessions to gather requirements along with business analysts and product owners.
  • Worked in an Agile development methodology as an active member of scrum meetings.
  • Involved in data profiling and merging data from multiple data sources.
  • Created data governance templates and standards for the data governance organization.
  • Designed a pseudo-automatic integration process to integrate Data governance portal with MDM hub for enforcing certain Data Governance standards.
  • Implemented a proof of concept deploying this product in an AWS S3 bucket and Snowflake.
  • Implemented AWS Lambda functions to run scripts in response to events in Amazon DynamoDB tables or S3 buckets, or to HTTP requests via Amazon API Gateway.
  • Migrated data from AWS S3 bucket to Snowflake by writing custom read/write snowflake utility function using Scala.
  • Wrote, compiled, and executed programs as necessary using Apache Spark in Scala to perform ETL jobs with ingested data.
  • Used Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing.
  • Prepared scripts in Python and Scala as needed to automate the ingestion process from various sources such as APIs, AWS S3, and Snowflake.
  • Designed and developed Spark workflows in Scala to pull data from AWS S3 buckets and Snowflake and apply transformations to it.
  • Worked on Snowflake schemas and data warehousing, and processed batch and streaming data load pipelines from the data lake to AWS S3 using Snowpipe and Matillion.
  • Listed each identified target system and gave an overview of the impacts on each, covering data landing, staging, core, data sharing, and data visualization.
  • Provided designs for data history and retention requirements.
  • Worked in the Snowflake environment to remove redundancy and loaded real-time data from various data sources into HDFS using Kafka.
  • Participated in weekly data analyst meetings and submitted weekly data governance status.
  • Worked on data pre-processing and cleaning the data to perform feature engineering and performed data imputation techniques for the missing values in the dataset using Python.
  • Implemented Spark RDD transformations to map business analysis logic and applied actions on top of the transformations.
  • Automated resulting scripts and workflow using Apache Airflow and shell scripting to ensure daily execution in production.
  • Created Python scripts to read CSV, JSON, and Parquet files from S3 buckets and load them into AWS S3, DynamoDB, and Snowflake (see the S3 ingestion sketch after this list).
  • Extensively used Code Cloud for code check-ins and checkouts for version control.
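A hedged sketch of an Airflow DAG of the kind described above for the S3-to-Snowflake load; the DAG id, schedule, connection details, bucket, and table names are placeholders, and the load step is stubbed where a custom Snowflake utility would run.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

# Hedged sketch; ids, bucket, and table names are hypothetical placeholders.

def load_s3_to_snowflake(**context):
    # In the real pipeline this would call a read/write utility (e.g. the
    # Snowflake connector) to copy the day's files into the target table.
    print(f"Loading s3://raw-bucket/{context['ds']}/ into RAW.EVENTS")

default_args = {"owner": "data-eng", "retries": 1, "retry_delay": timedelta(minutes=5)}

with DAG(
    dag_id="s3_to_snowflake_daily",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    default_args=default_args,
    catchup=False,
) as dag:
    load = PythonOperator(
        task_id="load_s3_to_snowflake",
        python_callable=load_s3_to_snowflake,
    )
```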
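A minimal Python sketch of reading a Parquet file from S3 and loading its records into DynamoDB, in the spirit of the ingestion scripts mentioned above; the bucket, key, and table names are hypothetical, and pyarrow is assumed for Parquet support.

```python
import io

import boto3
import pandas as pd

# Hedged sketch; bucket, key, and table names are hypothetical.
s3 = boto3.client("s3")
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("customer_events")   # hypothetical DynamoDB table

# Read a Parquet object from S3 into a DataFrame (requires pyarrow).
obj = s3.get_object(Bucket="raw-bucket", Key="events/2021-06-01.parquet")
df = pd.read_parquet(io.BytesIO(obj["Body"].read()))

# Batch-write each record; attribute names must match the table's key schema.
# Values are cast to strings to avoid DynamoDB's restriction on floats.
with table.batch_writer() as batch:
    for record in df.astype(str).to_dict(orient="records"):
        batch.put_item(Item=record)
```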

Confidential

Data Engineer

Responsibilities:

  • As a Data Engineer, worked with the analysis and management teams and supported them based on their requirements.
  • Architected, Designed and Developed Business applications and Data marts for reporting.
  • Implemented Installation and configuration of multi-node cluster on Cloud using Amazon Web Services (AWS) on EC2.
  • Developed a reconciliation process to make sure the Elasticsearch index document count matched the source records.
  • Maintained Tableau functional reports based on user requirements.
  • Created action filters, parameters, and calculated sets for preparing dashboards and worksheets in Tableau.
  • Used Agile (SCRUM) methodologies for Software Development.
  • Developed data pipelines to consume data from Enterprise Data Lake (MapR Hadoop distribution - Hive tables/HDFS) for analytics solution.
  • Created Hive External tables to stage data and then move the data from Staging to main tables.
  • Wrote complex Hive queries to extract data from heterogeneous sources (Data Lake) and persist the data into HDFS.
  • Implemented the big data solution using Hadoop, Hive, and Informatica to pull/load the data into the HDFS system.
  • Developed incremental and full-load Python processes to ingest data into Elasticsearch from Hive.
  • Pulled data from the data lake (HDFS) and massaged it with various RDD transformations.
  • Created Oozie workflow and Coordinator jobs to kick off the jobs on time for data availability.
  • Developed REST services to write data into an Elasticsearch index using Python Flask (see the Flask sketch after this list).
  • Developed complete end-to-end big data processing in the Hadoop ecosystem.
  • Used the AWS Cloud with infrastructure provisioning and configuration.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
  • Created dashboards for analyzing POS data using Tableau.
  • Developed Tableau visualizations and dashboards using Tableau Desktop.
  • Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
  • Continuously tuned Hive UDFs for faster queries by employing partitioning and bucketing.
  • Implemented partitioning, dynamic partitions and buckets in Hive.
  • Deployed RMAN to automate backups and maintained scripts in the recovery catalog.
  • Worked on QA of the data and on adding data sources, snapshots, and caching to the report.
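A hedged sketch of a Flask REST endpoint that writes documents into an Elasticsearch index, as mentioned above; the host, index name, and route are hypothetical, and a recent elasticsearch-py client (8.x-style `document=` argument) is assumed.

```python
from elasticsearch import Elasticsearch
from flask import Flask, jsonify, request

# Hedged sketch; host, index name, and route are hypothetical.
app = Flask(__name__)
es = Elasticsearch("http://localhost:9200")

@app.route("/documents", methods=["POST"])
def index_document():
    # Accept a JSON payload and index it as a document.
    doc = request.get_json(force=True)
    result = es.index(index="analytics-docs", document=doc)
    return jsonify({"id": result["_id"], "result": result["result"]}), 201

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```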

Confidential

SQL Server Developer

Responsibilities:

  • Performed data ETL by collecting, exporting, merging and massaging data from multiple sources and platforms including SSRS/SSIS (SQL Server Integration Services) in SQL Server.
  • Worked with cross-functional teams (including data engineer team) to extract data and rapidly execute from MongoDB through MongoDB connector.
  • Worked on automating AWS cloud provisioning using CloudFormation for ticket-routing techniques.
  • Worked with Amazon Redshift tools such as SQL Workbench/J, pgAdmin, DBHawk, and SQuirreL SQL.
  • Performed data cleaning and feature selection using the scikit-learn package in Python.
  • Partitioned the data into 100 clusters with k-means clustering using scikit-learn in Python (see the clustering sketch after this list).
  • Used Python to perform ANOVA tests to analyze the differences among hotel clusters.
  • Implemented various machine learning algorithms and statistical models, such as decision trees, text analytics, sentiment analysis, Naive Bayes, logistic regression, and linear regression, in Python to determine the accuracy rate of each model.
  • Worked with ARIMAX, Holt-Winters, and VARMAX models to predict sales at regular and seasonal intervals.
  • Determined the most accurate prediction model based on the accuracy rate.
  • Used a text-mining process on reviews to determine customer concentrations.
  • Delivered result analysis to the support team for hotel and travel recommendations.
  • Designed Tableau bar graphs, scatter plots, and geographical maps to create detailed summary reports and dashboards.
  • Developed a hybrid model to improve the accuracy rate.
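A minimal scikit-learn sketch of the 100-cluster k-means partitioning mentioned above; the input file and feature columns are hypothetical placeholders.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hedged sketch; the input file and feature columns are hypothetical.
df = pd.read_csv("hotel_features.csv")
features = df[["price", "rating", "review_count"]]

# Scale the features, then partition the rows into 100 clusters.
scaled = StandardScaler().fit_transform(features)
kmeans = KMeans(n_clusters=100, random_state=42, n_init=10)
df["cluster"] = kmeans.fit_predict(scaled)

# Quick check of cluster sizes.
print(df["cluster"].value_counts().head())
```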
