
Senior Data Engineer Resume


Manassas, VA

PROFESSIONAL SUMMARY:

  • 8+ years of experience as a Senior Data Engineer, combining strong technical expertise, business experience, and communication skills to drive high-impact business outcomes
  • Skilled in data cleansing and preprocessing with Python, building data workflows with SQL queries in Alteryx, and preparing Tableau Data Extracts (TDE)
  • Expertise in PySpark on AWS (EMR, S3), creating HDFS files with Structured Streaming, along with Apache NiFi workflows in a NoSQL environment
  • Experience writing SQL queries to validate data movement between layers in a data warehouse environment
  • Experience in logical data modeling, reverse engineering, and physical data modeling of CRM systems using Erwin and InfoSphere
  • Experience ingesting data into a Hadoop data lake from databases such as MySQL, Oracle, DB2, Teradata, and SQL Server using Sqoop
  • Experience designing visualizations in Tableau and publishing and presenting dashboards and stories on web and desktop platforms
  • Experience in data stream processing with Kafka (with ZooKeeper) to develop data pipelines in PySpark (see the Structured Streaming sketch after this list)
  • Expertise in all phases of the Agile SDLC: requirements analysis, design, development, testing, implementation, and maintenance
  • Experience scheduling ETL jobs with Airflow and extracting data from the AWS data warehouse with Glue and Athena
  • Designed NoSQL and Google BigQuery solutions for transforming unstructured data into structured data sets
  • Extensive experience applying machine learning and statistics to draw meaningful insights from data, with strong communication and data-storytelling skills
  • Strong knowledge of statistical methods (regression, time series, hypothesis testing, randomized experiments), machine learning, algorithms, data structures, and data infrastructure
  • Technical proficiency in design and data modeling for data warehouse/business intelligence applications
  • Defined job flows in Hadoop environments using tools such as Oozie for data scrubbing and processing
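
The Kafka-to-PySpark streaming work noted above follows a common pattern; the sketch below is a minimal illustration, not production code, and the broker address, topic name, schema, and data lake paths are hypothetical placeholders.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, col
    from pyspark.sql.types import StructType, StringType, TimestampType

    # Minimal Structured Streaming sketch: Kafka topic -> Parquet in the data lake.
    # Broker, topic, schema, and paths are hypothetical placeholders.
    spark = SparkSession.builder.appName("kafka-to-datalake").getOrCreate()

    schema = (StructType()
              .add("event_id", StringType())
              .add("event_ts", TimestampType())
              .add("payload", StringType()))

    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")
           .option("subscribe", "shipments")
           .load())

    # Kafka values arrive as bytes; cast to string and parse the JSON payload.
    events = (raw.selectExpr("CAST(value AS STRING) AS json")
              .select(from_json(col("json"), schema).alias("e"))
              .select("e.*"))

    # Append Parquet files continuously; checkpointing makes the stream restartable.
    (events.writeStream
     .format("parquet")
     .option("path", "s3a://datalake/shipments/")
     .option("checkpointLocation", "s3a://datalake/_checkpoints/shipments/")
     .outputMode("append")
     .start()
     .awaitTermination())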

TECHNICAL SKILLS:

Cloud/Frameworks: Amazon Web Services (AWS), Google Cloud, Spark (PySpark, MLlib)

Visualization: Tableau, Power BI, Data Studio

Tools: Excel, DataRobot, Apache NiFi, Alteryx

Python: pandas, scikit-learn, regular expressions, SQL

Data Analysis: Data cleansing, slicing, transformation of variables

ML: Regression (Linear, Logistic), Random forests

PROFESSIONAL EXPERIENCE:

Confidential, Manassas, VA

Senior Data Engineer

Responsibilities:

  • Responsible for building the data lake in AWS, ingesting structured shipment and master data from Azure Service Bus through AWS API Gateway, Lambda, and Kinesis Firehose into S3 buckets
  • Implemented data pipelines for big data processing using Spark transformations with the Python API on AWS clusters
  • Created complex SQL queries in the Teradata data warehouse environment to test data flow across all stages
  • Integrated data sources from Kafka (Producer and Consumer APIs) for data stream processing in Spark on AWS
  • Designed a rules engine in Spark SQL that processes millions of records on a Spark cluster on the Azure data lake
  • Extensively involved in designing SSIS packages to load data into the data warehouse
  • Built customer insights on customer/service utilization, bookings, and CRM data using Gainsight
  • Executed process improvements in data workflows using the Alteryx processing engine and SQL
  • Collaborated with product business owners to understand business needs, automate business processes, and deliver data storytelling in Tableau
  • Implemented Agile methodology for building data applications and framework development
  • Implemented business processing models using predictive and prescriptive analytics on transactional data with regression
  • Implemented logistic regression and random forest ML models with Python packages to predict insurance purchase by a Confidential member (see the scikit-learn sketch after this list)
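
A rough sketch of the logistic regression / random forest modeling in the last bullet; the file and feature names are hypothetical, since the actual member data is confidential.

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    # Hypothetical member-level features; the real data set is confidential.
    df = pd.read_csv("member_features.csv")
    X = df[["age", "tenure_months", "prior_claims", "annual_premium"]]
    y = df["purchased_insurance"]

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    # Compare a linear baseline against a random forest on held-out AUC.
    for model in (LogisticRegression(max_iter=1000),
                  RandomForestClassifier(n_estimators=200, random_state=42)):
        model.fit(X_train, y_train)
        auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
        print(type(model).__name__, "test AUC:", round(auc, 3))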

Environment: Python, SQL, Tableau, Big Data, data lake, Alteryx, Hive, CRM, OLAP, Excel, DataRobot

Confidential, Austin, TX

Senior Data Engineer

Responsibilities:
  • Responsible for designing and creating SSAS cubes from the data warehouse
  • Developed data processing pipelines (processing 40-50 GB daily) using Python libraries with Google internal tools such as Pantheon ETL and Plx scripts with SQL
  • Automated feature engineering mechanisms using Python scripts deployed on Google Cloud Platform (GCP) and BigQuery (see the BigQuery sketch after this list)
  • Pulled data from the data lake (HDFS) and massaged it with various RDD transformations
  • Prepared technical designs based on functional requirements; modified Spark scripts and resolved production bugs in various Python data-transformation scripts
  • Extensive expertise in data warehousing on different databases, as well as logical and physical data modeling with tools such as Erwin, PowerDesigner, and ER/Studio
  • Parsed JSON and log data and designed data flows using Apache NiFi processors and funnels
  • Built Tableau and Data Studio dashboards based on marketing campaign requirements and presented them to sales directors
  • Built a data lake as a cloud-based solution in AWS using Apache Spark and provided visualization of the ETL orchestration using the CDAP tool
  • Implemented project development using Agile processes with Kanban boards and 2-week sprints
  • Developed machine learning models such as random forests using TensorFlow
  • Prepared data models and schemas on GCP for different projects based on star and snowflake schema designs
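
A minimal sketch of the kind of BigQuery aggregation used in the GCP pipelines above; the project, dataset, and table names are hypothetical, and application-default credentials are assumed.

    from google.cloud import bigquery

    # Hypothetical project and tables; assumes application-default credentials.
    client = bigquery.Client(project="marketing-analytics")

    sql = """
        SELECT campaign_id,
               DATE(event_ts) AS event_date,
               COUNT(*) AS events
        FROM `marketing-analytics.raw.campaign_events`
        GROUP BY campaign_id, event_date
    """

    # Write the aggregate into a reporting table, replacing the previous load.
    dest = bigquery.TableReference.from_string(
        "marketing-analytics.reporting.daily_campaign_events")
    job_config = bigquery.QueryJobConfig(
        destination=dest, write_disposition="WRITE_TRUNCATE")
    client.query(sql, job_config=job_config).result()  # blocks until the job finishes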

Confidential, Henderson, NV

Data Engineer - Marketing

Responsibilities:

  • Used pandas, NumPy, Seaborn, SciPy, matplotlib, and scikit-learn in Python to develop various machine learning algorithms (see the pandas sketch after this list)
  • Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization; also performed gap analysis
  • Implemented logistic regression and TensorFlow models alongside R packages - dplyr, mice, rpart
  • Worked with Data Architects and IT Architects to understand the movement of data and its storage
  • Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI, and SmartView
  • Focused on integration overlap and Informatica's newer commitment to MDM following the acquisition of Identity Systems
  • Updated Python scripts to match training data with our database stored in AWS CloudSearch, so each document could be assigned a response label for further classification
  • Performed data transformation from various sources, data organization, and feature extraction from raw and stored data
  • Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS
  • Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions
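
A small illustration of the pandas cleaning and feature-extraction steps described above; the file and column names are hypothetical.

    import numpy as np
    import pandas as pd

    # Hypothetical marketing extract; file and column names are illustrative.
    raw = pd.read_csv("campaign_extract.csv")

    # Typical cleaning: de-duplicate, coerce dates, fill missing spend.
    raw = raw.drop_duplicates(subset="customer_id")
    raw["signup_date"] = pd.to_datetime(raw["signup_date"], errors="coerce")
    raw["spend"] = raw["spend"].fillna(raw["spend"].median())

    # Simple feature extraction for downstream models.
    raw["tenure_days"] = (pd.Timestamp.today() - raw["signup_date"]).dt.days
    raw["log_spend"] = np.log1p(raw["spend"])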

Environment: Python, Informatica, Big Data, Hive, OLAP, DB2, Metadata, H2O.ai

Confidential, Boston, MA

Data Engineer/Analyst

Responsibilities:

  • Worked with several R packages, including knitr, dplyr, SparkR, causal inference libraries, and spacetime
  • Implemented end-to-end systems for data analytics and data automation, integrated with custom visualization tools, using R, Mahout, Hadoop, and MongoDB
  • Gathered all required data from multiple data sources and created the datasets used in the analysis
  • Extracted data using SQL from data sources and performed exploratory data analysis (EDA) and data visualization using R and Tableau
  • Implemented univariate and bivariate analysis to understand intrinsic and combined effects
  • Worked with data governance, data quality, data lineage, and data architecture teams to design various models and processes
  • Independently coded new programs and designed tables to load and test programs effectively for the given POCs using Big Data/Hadoop
  • Designed data models and data flow diagrams using MS Visio
  • As an architect, implemented an MDM hub to provide clean, consistent data for an SOA implementation
  • Developed, implemented, and maintained conceptual, logical, and physical data models using Erwin for forward/reverse-engineered databases
  • Led the development and presentation of a data analytics data-hub prototype with the help of other members of the emerging solutions team
  • Performed data cleaning and imputation of missing values using R (a comparable Python sketch follows this list)
  • Took up ad-hoc requests from different departments and locations
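
The imputation above was done in R; to keep these sketches in one language, here is a comparable missing-value step in Python with scikit-learn. The data set and columns are hypothetical.

    import pandas as pd
    from sklearn.impute import SimpleImputer

    # Hypothetical data set; the original work used R (e.g., mice-style imputation).
    df = pd.read_csv("survey_responses.csv")
    num_cols = df.select_dtypes(include="number").columns

    # Median-impute numeric columns; a flag column records which rows were touched.
    df["had_missing"] = df[num_cols].isna().any(axis=1)
    df[num_cols] = SimpleImputer(strategy="median").fit_transform(df[num_cols])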

Environment: R, SQL, Informatica, ODS, OLTP, Oracle 10g, Hive, OLAP, Excel, MS Visio, Hadoop

Confidential, Boston, MA

Data Engineer/Analyst

Responsibilities:

  • Analyzed survey response data to determine consumer preferences on client products and proposed recommendations
  • Improved efficiency of business processes by 10% through implementation of data management procedures
  • Automated computations to determine market metrics from consumer demographic information
  • Implemented predictive modeling techniques to increase long-term growth by 12% for products in US regions
  • Developed a scoring mechanism using SAS based on customer segmentation to increase sales by 20%
  • Ran MapReduce programs on the cluster
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing (see the Hadoop Streaming sketch after this list)
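
The cleaning jobs above were written in Java; the sketch below shows the same idea as a Python Hadoop Streaming mapper. The three-column input schema is hypothetical.

    #!/usr/bin/env python
    # mapper.py - Hadoop Streaming analogue of the Java data-cleaning jobs:
    # drop malformed rows and emit (customer_id, spend) pairs for the reducer.
    import sys

    for line in sys.stdin:
        fields = line.rstrip("\n").split(",")
        if len(fields) != 3:           # skip malformed records
            continue
        customer_id, region, spend = fields
        try:
            spend = float(spend)       # skip rows with non-numeric spend
        except ValueError:
            continue
        print("%s\t%.2f" % (customer_id, spend))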

Environment: SQL/Server, Oracle, MS-Office, Teradata, Informatica, ER Studio, XML, Business Objects, Java, HIVE, AWS

Confidential

Data Analyst

Responsibilities:

  • Extracted and validated financial data from external data sources such as Quandl to generate reports for C-level executives (see the sketch after this list)
  • Designed a data-story framework and new financial benchmark metrics on costs and departmental expenditures
  • Implemented charts, graphs, and revenue distributions in Tableau for CFOs
  • Saved 500 man-hours by automating data cleaning with validations in Python and R to improve efficiency
  • Predicted revenue based on R&D and Sales expenses using financial econometric models
  • Worked with large amounts of structured and unstructured data.
  • Worked with business intelligence and visualization tools such as Business Objects and ChartIO
  • Configured the project on WebSphere 6.1 application servers
  • Communicated with other healthcare systems using web services via SOAP, WSDL, and JAX-RPC
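
A minimal sketch of the Quandl extraction-and-validation step in the first bullet; the dataset code and API key are placeholders.

    import quandl

    # Placeholder API key and dataset code; quandl.get returns a pandas DataFrame.
    quandl.ApiConfig.api_key = "YOUR_API_KEY"
    gdp = quandl.get("FRED/GDP", start_date="2010-01-01")

    # Basic validation before the data feeds executive reports.
    assert gdp.index.is_monotonic_increasing, "dates out of order"
    assert not gdp.isna().any().any(), "missing values in extract"
    print(gdp.tail())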

Environment: MDM, Tableau, Data modeling, PL/SQL, Python, JSON
