Senior Data Engineer Resume
Manassas, VA
PROFESSIONAL SUMMARY:
- 8+ years of experience as a Senior Data Engineer with strong technical expertise, business experience, and communication skills to drive high-impact business outcomes
- Skilled in data cleansing and preprocessing using Python, creating data workflows with SQL queries in Alteryx, and preparing Tableau Data Extracts (TDE)
- Expertise in PySpark on AWS (EMR, S3), creating HDFS files with Structured Streaming, along with Apache NiFi workflows in NoSQL environments
- Experience in writing SQL queries to validate data movement between different layers in a data warehouse environment
- Experience in logical data modeling, reverse engineering, and physical data modeling of CRM systems using Erwin and InfoSphere
- Experience in ingesting data into a Hadoop data lake from databases such as MySQL, Oracle, DB2, Teradata, and SQL Server using Sqoop
- Experience in designing stunning visualizations using Tableau and publishing and presenting dashboards and stories on web and desktop platforms
- Experience in data stream processing using Kafka (with ZooKeeper) to develop data pipelines with PySpark (a minimal sketch appears after this summary)
- Expertise in all aspects of the Agile SDLC: requirement analysis, design, development, coding, testing, implementation, and maintenance
- Experience with Airflow to schedule ETL jobs, and with Glue and Athena to extract data from the AWS data warehouse
- Designed NoSQL and Google BigQuery solutions for transforming unstructured data into structured data sets
- Extensive experience in machine learning and statistics to draw meaningful insights from data, with strong communication and data storytelling skills
- Strong knowledge of statistical methods (regression, time series, hypothesis testing, randomized experiments), machine learning, algorithms, data structures, and data infrastructure
- Technical proficiency in designing and data modeling for data warehouse/business intelligence applications
- Experience defining job flows in Hadoop environments using tools like Oozie for data scrubbing and processing
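To illustrate the Kafka-to-PySpark streaming work referenced above, the following is a minimal Structured Streaming sketch; the broker, topic, schema, and S3 bucket names are placeholders rather than details from any specific engagement.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("shipment-stream").getOrCreate()

# Placeholder event schema for the incoming Kafka messages.
schema = StructType([
    StructField("shipment_id", StringType()),
    StructField("status", StringType()),
    StructField("event_time", TimestampType()),
])

# Read a Kafka topic as a streaming DataFrame (Kafka delivers the value as bytes).
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
          .option("subscribe", "shipments")                   # placeholder topic
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Write micro-batches to S3 as Parquet, with a checkpoint for fault tolerance.
query = (events.writeStream
         .format("parquet")
         .option("path", "s3a://example-datalake/shipments/")  # placeholder bucket
         .option("checkpointLocation", "s3a://example-datalake/_checkpoints/shipments/")
         .trigger(processingTime="1 minute")
         .start())
query.awaitTermination()
```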
TECHNICAL SKILLS:
Cloud/Frameworks: Amazon Web Services (AWS), Google Cloud, Spark (PySpark, MLlib)
Visualization: Tableau, Power BI, Data Studio
Tools: Excel, DataRobot, Apache NiFi, Alteryx
Python: pandas, scikit-learn, regular expressions, SQL
Data Analysis: Data cleansing, slicing, transformation of variables
ML: Regression (Linear, Logistic), Random forests
PROFESSIONAL EXPERIENCE:
Confidential, Manassas, VA
Senior Data Engineer
Responsibilities:
- Responsible for building the data lake in AWS, ingesting structured shipment and master data from Azure Service Bus into S3 buckets using AWS API Gateway, Lambda, and Kinesis Firehose (a sketch of this ingestion path follows this section)
- Implemented data pipelines for big data processing using Spark transformations and the Python API on clusters in AWS
- Created complex SQL queries in the Teradata data warehouse environment to test the data flow across all stages
- Integrated data sources from Kafka (Producer and Consumer APIs) for data stream processing in Spark on AWS
- Designed the rules engine in Spark SQL to process millions of records on a Spark cluster against the Azure Data Lake
- Extensively involved in designing the SSIS packages to load data into the data warehouse
- Built customer insights on customer/service utilization, bookings, and CRM data using Gainsight
- Executed process improvements in data workflows using the Alteryx processing engine and SQL
- Collaborated with product business owners to understand business needs, automate business processes, and deliver data storytelling in Tableau
- Implemented Agile Methodology for building the data applications and framework development
- Implemented business processing models using predictive & prescriptive analytics on transactional data with regression
- Implemented logistic regression and random forest ML models with Python packages to predict insurance purchases by Confidential members
Environment: Python, SQL, Tableau, Bigdata, Data lake, Alteryx, Hive, CRM, OLAP, Excel, DataRobot
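A minimal sketch of the API Gateway to Lambda to Kinesis Firehose ingestion path described in this role; the delivery stream name and payload shape are assumptions for illustration only.

```python
import json
import boto3

firehose = boto3.client("firehose")
DELIVERY_STREAM = "shipment-ingest"  # placeholder Firehose delivery stream name


def handler(event, context):
    """Lambda handler invoked by API Gateway with shipment/master-data payloads.

    Each payload is forwarded to Kinesis Firehose, which buffers the records
    and lands them in the S3 bucket configured on the delivery stream.
    """
    body = json.loads(event.get("body") or "[]")
    records = body if isinstance(body, list) else [body]

    # Firehose accepts up to 500 records per batch call.
    response = firehose.put_record_batch(
        DeliveryStreamName=DELIVERY_STREAM,
        Records=[{"Data": (json.dumps(r) + "\n").encode("utf-8")} for r in records],
    )
    failed = response.get("FailedPutCount", 0)
    return {
        "statusCode": 200 if failed == 0 else 502,
        "body": json.dumps({"received": len(records), "failed": failed}),
    }
```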
Confidential, Austin, Texas
Senior Data Engineer
Responsibilities:
- Responsible for designing and creating SSAS cubes from the data warehouse
- Developed data processing pipelines (processing 40-50 GB daily) using Python libraries and Google internal tools such as Pantheon ETL and Plx scripts with SQL
- Automated feature engineering mechanisms using Python scripts and deployed them on Google Cloud Platform (GCP) and BigQuery (a sketch follows this section)
- Pulled data from the data lake (HDFS) and transformed it with various RDD transformations
- Prepared technical designs based on functional requirements, modified Spark scripts, and resolved production bugs in various Python data transformation scripts
- Extensive expertise in data warehousing on different databases, as well as logical and physical data modeling with tools like Erwin, PowerDesigner, and ER/Studio
- Parsed JSON and log data and designed the data flows using Apache NiFi processors and funnels
- Built Tableau and Data Studio dashboards based on marketing campaign requirements and presented them to Sales Directors
- Built a data lake as a cloud-based solution in AWS using Apache Spark and provided visualization of the ETL orchestration using the CDAP tool
- Implemented project development using Agile processes with Kanban boards and 2-week sprints
- Developed machine learning models such as random forests using TensorFlow
- Prepared Data models and schema on GCP for different projects based on star and snowflake schema designs
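A minimal sketch of the kind of BigQuery-based feature engineering described above, assuming hypothetical project, dataset, table, and column names.

```python
from google.cloud import bigquery

client = bigquery.Client()  # uses application-default GCP credentials

# Hypothetical dataset/table names; the aggregation mirrors campaign-level
# feature engineering of the kind described in this role.
FEATURE_SQL = """
SELECT
  customer_id,
  COUNT(*)                                              AS events_30d,
  COUNTIF(event_type = 'click')                         AS clicks_30d,
  SAFE_DIVIDE(COUNTIF(event_type = 'click'), COUNT(*))  AS click_rate_30d
FROM `example-project.marketing.events`
WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY customer_id
"""


def build_features():
    """Run the feature query and materialize the results into a features table."""
    job_config = bigquery.QueryJobConfig(
        destination="example-project.marketing.customer_features",
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )
    job = client.query(FEATURE_SQL, job_config=job_config)
    job.result()  # block until the query job completes
    return job.destination


if __name__ == "__main__":
    print(f"Features written to {build_features()}")
```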
Confidential, Henderson, NV
Data Engineer - Marketing
Responsibilities:
- Used pandas, numpy, Seaborn, scipy, matplotlib, and scikit-learn in Python for developing various machine learning algorithms (a minimal modeling sketch follows this section)
- Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization, and performed gap analysis
- Implemented logistic regression and TensorFlow models with R packages - dplyr, mice, rpart
- Worked with Data Architects and IT Architects to understand the movement of data and its storage
- Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI, and SmartView
- Focused on integration overlap and Informatica's newer commitment to MDM with the acquisition of Identity Systems
- Updated Python scripts to match training data with our database stored in AWS CloudSearch, so that each document could be assigned a response label for further classification
- Performed data transformation from various sources, data organization, and feature extraction from raw and stored data
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS
- Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions
Environment: Python, Informatica, Bigdata, Hive, OLAP, DB2, Metadata, H20.ai
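A minimal sketch of a pandas/scikit-learn classification workflow of the kind described above; the input file and column names are placeholders, not the original marketing dataset.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Hypothetical extract with a binary response column.
df = pd.read_csv("campaign_responses.csv")
X = df[["age", "tenure_months", "monthly_spend", "prior_purchases"]]
y = df["responded"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scale features, then fit a logistic regression classifier.
model = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)

auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Validation ROC-AUC: {auc:.3f}")
```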
Confidential, Boston, MA
Data Engineer/Analyst
Responsibilities:
- Worked with several R packages including knitr, dplyr, SparkR, causal inference, and space-time packages
- Implemented end-to-end systems for data analytics and data automation, integrated with custom visualization tools using R, Mahout, Hadoop, and MongoDB
- Gathered all required data from multiple data sources and created the datasets used in the analysis
- Extracted data using SQL from data sources and performed Exploratory Data Analysis (EDA) and data visualizations using R and Tableau
- Performed univariate and bivariate analysis to understand intrinsic and combined effects
- Worked with Data Governance, Data Quality, Data Lineage, and Data Architecture teams to design various models and processes
- Independently coded new programs and designed tables to load and test them effectively for the given POCs using Big Data/Hadoop
- Designed data models and data flow diagrams using MS Visio.
- As an architect, implemented an MDM hub to provide clean, consistent data for an SOA implementation
- Developed, implemented, and maintained conceptual, logical, and physical data models using Erwin for forward/reverse engineered databases
- Led the development and presentation of a data analytics data-hub prototype with the help of the other members of the emerging solutions team
- Performed data cleaning and imputation of missing values using R (an equivalent sketch follows this section)
- Took up ad-hoc requests from different departments and locations
Environment: R, SQL, Informatica, ODS, OLTP, Oracle 10g, Hive, OLAP, Excel, MS Visio, Hadoop
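The cleaning and imputation in this role were done in R; the sketch below shows equivalent missing-value handling in Python (pandas) for illustration, with a placeholder input file.

```python
import pandas as pd

# Placeholder extract; the original work used R, this is a pandas equivalent.
df = pd.read_csv("survey_extract.csv")

# Quick EDA: missing-value counts per column and basic summary statistics.
print(df.isna().sum().sort_values(ascending=False))
print(df.describe(include="all"))

# Impute numeric columns with the median and categorical columns with the mode.
for col in df.columns:
    if pd.api.types.is_numeric_dtype(df[col]):
        df[col] = df[col].fillna(df[col].median())
    else:
        mode = df[col].mode()
        if not mode.empty:
            df[col] = df[col].fillna(mode.iloc[0])

df.to_csv("survey_clean.csv", index=False)
```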
Confidential, Boston, MA
Data Engineer/Analyst
Responsibilities:
- Analyzed survey response data to determine consumer preferences on client products and proposed recommendations
- Improved efficiency of business processes by 10% through implementation of data management procedures
- Automated the computations to determine market metric information on consumer demographic information
- Implemented predictive modeling techniques to increase long-term growth by 12% for products in US regions
- Developed a scoring mechanism using SAS based on customer segmentation to increase sales by 20%
- Ran MapReduce programs on the cluster
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing
Environment: SQL/Server, Oracle, MS-Office, Teradata, Informatica, ER Studio, XML, Business Objects, Java, HIVE, AWS
Confidential
Data Analyst
Responsibilities:
- Extracted and validated financial data from external data sources like Quandl to generate reports for C-level executives
- Designed a data story framework and new financial benchmark metrics on Costs and departmental expenditures
- Implemented charts, graphs and distribution of revenues through visualization tools in Tableau for CFOs
- Reduced data-preparation effort by 500 man-hours by automating data cleaning with validations in Python and R to improve efficiency (a validation sketch follows this section)
- Predicted revenue based on R&D and Sales expenses using financial econometric models
- Worked with large amounts of structured and unstructured data.
- Worked in Business Intelligence tools and visualization tools such as Business Objects, ChartIO, etc.
- Configured the project on WebSphere 6.1 application servers
- Communicated with other healthcare systems using web services built with SOAP, WSDL, and JAX-RPC
Environment: MDM, Tableau, Data modeling, PL/SQL, Python, JSON
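A minimal sketch of the automated cleaning-with-validation approach mentioned above; the column names, file names, and rules are hypothetical placeholders.

```python
import pandas as pd

# Placeholder schema standing in for the departmental financial extracts.
REQUIRED_COLUMNS = ["department", "period", "cost", "revenue"]


def validate_and_clean(df: pd.DataFrame) -> pd.DataFrame:
    """Apply basic structural and value checks, then return the cleaned frame."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")

    # Drop duplicate department/period rows and rows without core figures.
    df = df.drop_duplicates(subset=["department", "period"])
    df = df.dropna(subset=["cost", "revenue"])

    # Reject obviously invalid financial values rather than silently fixing them.
    bad = df[(df["cost"] < 0) | (df["revenue"] < 0)]
    if not bad.empty:
        raise ValueError(f"{len(bad)} rows have negative cost/revenue")

    return df


if __name__ == "__main__":
    frame = pd.read_csv("department_financials.csv")  # placeholder extract
    validate_and_clean(frame).to_csv("department_financials_clean.csv", index=False)
```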