Senior Data Engineer Resume
Manassas, VA
PROFESSIONAL SUMMARY:
- 8+ years of experience as a Senior Data Engineer with strong technical expertise, business experience, and communication skills to drive high-impact business outcomes
- Skilled in data cleansing and preprocessing using Python, creating data workflows with SQL queries in Alteryx, and preparing Tableau Data Extracts (TDE)
- Expertise in PySpark on AWS (EMR, S3), creating HDFS files with Structured Streaming, along with Apache NiFi workflows in NoSQL environments
- Experience in writing SQL queries to validate data movement between different layers in a data warehouse environment
- Experience in logical data modeling, reverse engineering, and physical data modeling of CRM systems using Erwin and InfoSphere
- Experience in ingesting data into a Hadoop data lake from databases such as MySQL, Oracle, DB2, Teradata, and SQL Server using Sqoop
- Experience in designing stunning visualizations using Tableau and publishing and presenting dashboards and stories on web and desktop platforms
- Experience in data stream processing using Kafka (with ZooKeeper) to develop data pipelines with PySpark (a minimal sketch appears after this summary)
- Expertise in all aspects of the Agile SDLC: requirement analysis, design, development, coding, testing, implementation, and maintenance
- Experience with Airflow to schedule ETL jobs, and with Glue and Athena to extract data from the AWS data warehouse
- Designed NoSQL and Google BigQuery solutions for transforming unstructured data into structured data sets
- Extensive experience in machine learning and statistics to draw meaningful insights from data, with strong communication and data storytelling skills
- Strong knowledge of statistical methods (regression, time series, hypothesis testing, randomized experiments), machine learning, algorithms, data structures, and data infrastructure
- Technical proficiency in designing and data modeling for data warehouse/business intelligence applications
- Experience defining job flows in Hadoop environments using tools like Oozie for data scrubbing and processing
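To illustrate the Kafka-to-PySpark streaming work referenced above, the following is a minimal Structured Streaming sketch; the broker, topic, schema, and S3 bucket names are placeholders rather than details from any specific engagement.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("shipment-stream").getOrCreate()

# Placeholder event schema for the incoming Kafka messages.
schema = StructType([
    StructField("shipment_id", StringType()),
    StructField("status", StringType()),
    StructField("event_time", TimestampType()),
])

# Read a Kafka topic as a streaming DataFrame (Kafka delivers the value as bytes).
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
          .option("subscribe", "shipments")                   # placeholder topic
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Write micro-batches to S3 as Parquet, with a checkpoint for fault tolerance.
query = (events.writeStream
         .format("parquet")
         .option("path", "s3a://example-datalake/shipments/")  # placeholder bucket
         .option("checkpointLocation", "s3a://example-datalake/_checkpoints/shipments/")
         .trigger(processingTime="1 minute")
         .start())
query.awaitTermination()
```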
TECHNICAL SKILLS:
Cloud/Frameworks: Amazon Web Services (AWS), Google Cloud, Spark (PySpark, MLlib)
Visualization: Tableau, Power BI, Data Studio
Tools: Excel, DataRobot, Apache NiFi, Alteryx
Python: pandas, scikit-learn, regular expressions, SQL
Data Analysis: Data cleansing, slicing, transformation of variables
ML: Regression (Linear, Logistic), Random forests
PROFESSIONAL EXPERIENCE:
Confidential, Manassas, VA
Senior Data Engineer
Responsibilities:
- Responsible for building the data lake in AWS, ingesting structured shipment and master data from Azure Service Bus into S3 buckets using AWS API Gateway, Lambda, and Kinesis Firehose (a sketch of this ingestion path follows this section)
- Implemented data pipelines for big data processing using Spark transformations and the Python API on clusters in AWS
- Created complex SQL queries in the Teradata data warehouse environment to test the data flow across all stages
- Integrated data sources from Kafka (Producer and Consumer APIs) for data stream processing in Spark on AWS
- Designed the rules engine in Spark SQL to process millions of records on a Spark cluster against the Azure Data Lake
- Extensively involved in designing the SSIS packages to load data into the data warehouse
- Built customer insights on customer/service utilization, bookings, and CRM data using Gainsight
- Executed process improvements in data workflows using the Alteryx processing engine and SQL
- Collaborated with product business owners to understand business needs, automate business processes, and deliver data storytelling in Tableau
- Implemented Agile Methodology for building the data applications and framework development
- Implemented business processing models using predictive & prescriptive analytics on transactional data with regression
- Implemented logistic regression and random forest ML models with Python packages to predict insurance purchases by Confidential members
Environment: Python, SQL, Tableau, Bigdata, Data lake, Alteryx, Hive, CRM, OLAP, Excel, DataRobot
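A minimal sketch of the API Gateway to Lambda to Kinesis Firehose ingestion path described in this role; the delivery stream name and payload shape are assumptions for illustration only.

```python
import json
import boto3

firehose = boto3.client("firehose")
DELIVERY_STREAM = "shipment-ingest"  # placeholder Firehose delivery stream name


def handler(event, context):
    """Lambda handler invoked by API Gateway with shipment/master-data payloads.

    Each payload is forwarded to Kinesis Firehose, which buffers the records
    and lands them in the S3 bucket configured on the delivery stream.
    """
    body = json.loads(event.get("body") or "[]")
    records = body if isinstance(body, list) else [body]

    # Firehose accepts up to 500 records per batch call.
    response = firehose.put_record_batch(
        DeliveryStreamName=DELIVERY_STREAM,
        Records=[{"Data": (json.dumps(r) + "\n").encode("utf-8")} for r in records],
    )
    failed = response.get("FailedPutCount", 0)
    return {
        "statusCode": 200 if failed == 0 else 502,
        "body": json.dumps({"received": len(records), "failed": failed}),
    }
```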
Confidential, Austin, Texas
Senior Data Engineer
Responsibilities:
- Responsible for designing and creating SSAS cubes from the data warehouse
- Developed data processing pipelines (processing 40-50 GB daily) using Python libraries and Google internal tools such as Pantheon ETL and Plx scripts with SQL
- Automated feature engineering mechanisms using Python scripts and deployed them on Google Cloud Platform (GCP) and BigQuery (a sketch follows this section)
- Pulled data from the data lake (HDFS) and transformed it with various RDD transformations
- Prepared technical designs based on functional requirements, modified Spark scripts, and resolved production bugs in various Python data transformation scripts
- Extensive expertise in data warehousing on different databases, as well as logical and physical data modeling with tools like Erwin, PowerDesigner, and ER/Studio
- Parsed JSON and log data and designed the data flows using Apache NiFi processors and funnels
- Built Tableau and Data Studio dashboards based on marketing campaign requirements and presented them to Sales Directors
- Built a data lake as a cloud-based solution in AWS using Apache Spark and provided visualization of the ETL orchestration using the CDAP tool
- Implemented project development using Agile processes with Kanban boards and 2-week sprints
- Developed machine learning models such as random forests using TensorFlow
- Prepared Data models and schema on GCP for different projects based on star and snowflake schema designs
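A minimal sketch of the kind of BigQuery-based feature engineering described above, assuming hypothetical project, dataset, table, and column names.

```python
from google.cloud import bigquery

client = bigquery.Client()  # uses application-default GCP credentials

# Hypothetical dataset/table names; the aggregation mirrors campaign-level
# feature engineering of the kind described in this role.
FEATURE_SQL = """
SELECT
  customer_id,
  COUNT(*)                                              AS events_30d,
  COUNTIF(event_type = 'click')                         AS clicks_30d,
  SAFE_DIVIDE(COUNTIF(event_type = 'click'), COUNT(*))  AS click_rate_30d
FROM `example-project.marketing.events`
WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY customer_id
"""


def build_features():
    """Run the feature query and materialize the results into a features table."""
    job_config = bigquery.QueryJobConfig(
        destination="example-project.marketing.customer_features",
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )
    job = client.query(FEATURE_SQL, job_config=job_config)
    job.result()  # block until the query job completes
    return job.destination


if __name__ == "__main__":
    print(f"Features written to {build_features()}")
```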
Confidential, Henderson, NV
Data Engineer - Marketing
Responsibilities:
- Used pandas, numpy, Seaborn, scipy, matplotlib, and scikit-learn in Python for developing various machine learning algorithms (a minimal modeling sketch follows this section)
- Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization, and performed gap analysis
- Implemented logistic regression and TensorFlow models with R packages - dplyr, mice, rpart
- Worked with Data Architects and IT Architects to understand the movement of data and its storage
- Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI, and SmartView
- Focused on integration overlap and Informatica's newer commitment to MDM with the acquisition of Identity Systems
- Updated Python scripts to match training data with our database stored in AWS CloudSearch, so that each document could be assigned a response label for further classification
- Performed data transformation from various sources, data organization, and feature extraction from raw and stored data
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS
- Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions
Environment: Python, Informatica, Bigdata, Hive, OLAP, DB2, Metadata, H20.ai
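A minimal sketch of a pandas/scikit-learn classification workflow of the kind described above; the input file and column names are placeholders, not the original marketing dataset.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Hypothetical extract with a binary response column.
df = pd.read_csv("campaign_responses.csv")
X = df[["age", "tenure_months", "monthly_spend", "prior_purchases"]]
y = df["responded"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scale features, then fit a logistic regression classifier.
model = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)

auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Validation ROC-AUC: {auc:.3f}")
```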
Confidential, Boston, MA
Data Engineer/Analyst
Responsibilities:
- Worked with several R packages including knitr, dplyr, SparkR, causal inference, and space-time packages
- Implemented end-to-end systems for data analytics and data automation, integrated with custom visualization tools using R, Mahout, Hadoop, and MongoDB
- Gathered all required data from multiple data sources and created the datasets used in the analysis
- Extracted data using SQL from data sources and performed Exploratory Data Analysis (EDA) and data visualizations using R and Tableau
- Performed univariate and bivariate analysis to understand intrinsic and combined effects
- Worked with Data Governance, Data Quality, Data Lineage, and Data Architecture teams to design various models and processes
- Independently coded new programs and designed tables to load and test them effectively for the given POCs using Big Data/Hadoop
- Designed data models and data flow diagrams using MS Visio.
- As an architect, implemented an MDM hub to provide clean, consistent data for an SOA implementation
- Developed, implemented, and maintained conceptual, logical, and physical data models using Erwin for forward/reverse engineered databases
- Led the development and presentation of a data analytics data-hub prototype with the help of the other members of the emerging solutions team
- Performed data cleaning and imputation of missing values using R (an equivalent sketch follows this section)
- Took up ad-hoc requests from different departments and locations
Environment: R, SQL, Informatica, ODS, OLTP, Oracle 10g, Hive, OLAP, Excel, MS Visio, Hadoop
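The cleaning and imputation in this role were done in R; the sketch below shows equivalent missing-value handling in Python (pandas) for illustration, with a placeholder input file.

```python
import pandas as pd

# Placeholder extract; the original work used R, this is a pandas equivalent.
df = pd.read_csv("survey_extract.csv")

# Quick EDA: missing-value counts per column and basic summary statistics.
print(df.isna().sum().sort_values(ascending=False))
print(df.describe(include="all"))

# Impute numeric columns with the median and categorical columns with the mode.
for col in df.columns:
    if pd.api.types.is_numeric_dtype(df[col]):
        df[col] = df[col].fillna(df[col].median())
    else:
        mode = df[col].mode()
        if not mode.empty:
            df[col] = df[col].fillna(mode.iloc[0])

df.to_csv("survey_clean.csv", index=False)
```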
Confidential, Boston, MA
Data Engineer/Analyst
Responsibilities:
- Analyzed survey response data to determine consumer preferences on client products and proposed recommendations
- Improved efficiency of business processes by 10% through implementation of data management procedures
- Automated the computations to determine market metric information on consumer demographic information
- Implemented predictive modeling techniques to increase long-term growth by 12% for products in US regions
- Developed a scoring mechanism using SAS based on customer segmentation to increase sales by 20%
- Ran MapReduce programs on the cluster
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing
Environment: SQL/Server, Oracle, MS-Office, Teradata, Informatica, ER Studio, XML, Business Objects, Java, HIVE, AWS
Confidential
Data Analyst
Responsibilities:
- Extracted and validated financial data from external data sources like Quandl to generate reports for C-level executives
- Designed a data story framework and new financial benchmark metrics on Costs and departmental expenditures
- Implemented charts, graphs and distribution of revenues through visualization tools in Tableau for CFOs
- Reduced data-preparation effort by 500 man-hours by automating data cleaning with validations in Python and R to improve efficiency (a validation sketch follows this section)
- Predicted revenue based on R&D and Sales expenses using financial econometric models
- Worked with large amounts of structured and unstructured data.
- Worked in Business Intelligence tools and visualization tools such as Business Objects, ChartIO, etc.
- Configured the project on WebSphere 6.1 application servers
- Communicated with other healthcare systems using web services built with SOAP, WSDL, and JAX-RPC
Environment: MDM, Tableau, Data modeling, PL/SQL, Python, JSON
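A minimal sketch of the automated cleaning-with-validation approach mentioned above; the column names, file names, and rules are hypothetical placeholders.

```python
import pandas as pd

# Placeholder schema standing in for the departmental financial extracts.
REQUIRED_COLUMNS = ["department", "period", "cost", "revenue"]


def validate_and_clean(df: pd.DataFrame) -> pd.DataFrame:
    """Apply basic structural and value checks, then return the cleaned frame."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")

    # Drop duplicate department/period rows and rows without core figures.
    df = df.drop_duplicates(subset=["department", "period"])
    df = df.dropna(subset=["cost", "revenue"])

    # Reject obviously invalid financial values rather than silently fixing them.
    bad = df[(df["cost"] < 0) | (df["revenue"] < 0)]
    if not bad.empty:
        raise ValueError(f"{len(bad)} rows have negative cost/revenue")

    return df


if __name__ == "__main__":
    frame = pd.read_csv("department_financials.csv")  # placeholder extract
    validate_and_clean(frame).to_csv("department_financials_clean.csv", index=False)
```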