
Data Engineer Resume


Boston, MA

PROFESSIONAL SUMMARY:

  • More than 5 years of experience implementing Big Data Engineering, Cloud Data Engineering, Data Warehouse, Data Mart, Data Visualization, Reporting, Data Quality, and Data Virtualization solutions
  • Experience in Data transformation, Data mapping from source to target database schemas, and Data Cleansing procedures
  • Adept in programming languages such as R and Python, as well as Big Data technologies such as Hadoop and Hive
  • Experienced in Normalization (1NF, 2NF, 3NF and BCNF) and De-normalization techniques for effective and optimum performance in OLTP and OLAP environments
  • Collaborated with lead Data Architect to model Data warehouse in accordance with 3NF format, and Star/Snowflake schema
  • Excellent Knowledge of Relational Database Design, Data Warehouse/OLAP concepts, and methodologies
  • Experience in designing star schema, Snowflake schema for Data Warehouse, ODS architecture
  • Expertise in OLTP/OLAP System Study, Analysis and E-R modeling, developing Database Schemas like Star schema and Snowflake schema used in relational, dimensional and multidimensional modeling
  • Experienced in Data Management solutions covering DWH/Data Architecture design, Data Governance implementation, and Big Data
  • Experience in designing, building and implementing a complete Hadoop ecosystem comprising MapReduce, HDFS, Hive, Impala, Pig, Sqoop, Oozie, HBase, MongoDB, and Spark
  • Experienced in Data Architecture and data modeling using Erwin, ER-Studio and MS Visio
  • Experience in coding SQL for developing Procedures, Triggers, and Packages
  • Experience in creating separate virtual data warehouses with different size classes in AWS Snowflake
  • Good knowledge of Tableau Metadata tables
  • Experienced in handling Big Data using Hadoop ecosystem components like Sqoop, Pig and Hive
  • Experience writing Spark Streaming and Spark batch jobs, using Spark MLlib for analytics
  • Experience in importing and exporting data with Sqoop between HDFS and Relational Database Systems (RDBMS) such as Oracle, DB2 and SQL Server
  • Experienced in Data Analysis, Design, Development, Implementation and Testing using Data Conversions, Extraction, Transformation and Loading (ETL) and SQL Server, ORACLE and other relational and non-relational databases
  • Well experienced in Normalization, De-Normalization and Standardization techniques for optimal performance in relational and dimensional database environments
  • Solid understanding of AWS (Amazon Web Services) S3, EC2 and Apache Spark, Scala process, and concepts
  • Hands on experience in machine learning, big data, data visualization, R and Python development, Linux, SQL, GIT/GitHub
  • Experience with data visualization using tools like ggplot, Matplotlib, Seaborn and Tableau, and in using Tableau to publish and present dashboards and storylines on web and desktop platforms
  • Experienced in Python data manipulation for loading and extraction, as well as with Python libraries such as NumPy, SciPy and Pandas for data analysis and numerical computations (see the Pandas sketch after this list)
  • Hands on experience with RStudio for doing data pre-processing and building machine learning algorithms on different datasets
  • Experienced in Data Analysis, Data Migration, Data Cleansing, Transformation, Integration, Data Import and Data Export through use of multiple tools such as SSIS and Informatica Power Center
  • Experienced in Data Modeling retaining concepts of RDBMS, Logical and Physical Data Modeling up to 3NF, and Multidimensional Data Modeling Schemas (Star schema, Snowflake modeling, Facts and dimensions)
  • Experienced working on NoSQL databases like MongoDB and HBase.
  • Technical proficiency in designing and data modeling for online applications, Data Warehouses, and Business Intelligence applications
  • Worked and extracted data from various database sources like Oracle, SQL Server, and DB2
  • Extensive working experience with Python, including Scikit-learn, SciPy, Pandas, and NumPy, for developing machine learning models and manipulating and handling data
  • Extensive experience in Text Analytics, developing different Statistical Machine Learning, Data Mining solutions to various business problems and generating data visualizations using R, Python
  • Expertise in complex data design/development, Master Data and Metadata, with hands-on experience in data analysis and in planning, coordinating, and executing work on records and databases
  • Proficient in Machine Learning techniques (Decision Trees, Linear/Logistic Regression, Random Forest, SVM, Bayesian methods, XGBoost, K-means Clustering, K-Nearest Neighbors) and Statistical Modeling in Forecasting/Predictive Analytics, Segmentation methodologies, Regression-based models, Hypothesis testing, Factor analysis/PCA, and Ensembles
  • Implemented machine learning algorithms on large datasets to understand hidden patterns and capture insights
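A minimal Pandas/NumPy sketch of the kind of data loading, cleansing, and summarization work listed above; the file name and column names (customer_extract.csv, customer_id, revenue, segment, signup_date) are hypothetical placeholders, not objects from an actual engagement.

import numpy as np
import pandas as pd

# Load a raw extract, standardize types, and handle missing values
raw = pd.read_csv("customer_extract.csv", parse_dates=["signup_date"])
clean = (
    raw.drop_duplicates(subset="customer_id")
       .assign(revenue=lambda df: df["revenue"].fillna(0.0),
               segment=lambda df: df["segment"].str.strip().str.upper())
)

# Simple per-segment summary of the cleansed data for downstream reporting
summary = (
    clean.groupby("segment")["revenue"]
         .agg(["count", "mean", "median"])
         .assign(log_mean=lambda df: np.log1p(df["mean"]))
)
print(summary)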

TECHNICAL SKILLS:

Big Data Tools: Hadoop, HDFS, Sqoop, Hbase, Hive, MapReduce, Spark, Cassandra

Cloud Technologies: Snowflake, AWS

ETL Tools: SSIS, Informatica Power Center

Data Modeling: Erwin, ER Studio, Star-Schema Modeling, Snowflake-Schema Modeling, FACT and dimension tables, Pivot Tables

Database: Snowflake Cloud Database, Oracle, MS SQL Server, Teradata, MySQL, DB2

Operating Systems: Microsoft Windows and Unix

Reporting Tools: MS Excel, Tableau, Tableau server, Tableau Reader, Power BI, QlikView

Methodologies: Agile, UML, System Development Life Cycle (SDLC), Ralph Kimball, Waterfall Model

Machine Learning: Regression Models, Classification Models, Clustering, Linear Regression, Logistic Regression, Decision Trees, Random Forest, Gradient Boosting, K-Nearest Neighbors (KNN), K-Means, Naïve Bayes, Time Series Analysis, PCA, Avro, MLbase; R - tidyr, tidyverse, dplyr, lubridate, ggplot2, tseries; Python - Beautiful Soup, NumPy, SciPy, Matplotlib, Seaborn, Pandas, scikit-learn, Keras

Programming Languages: SQL, R (shiny, R-studio), Python (Jupyter Notebook, PyCharm IDE), Scala

PROFESSIONAL EXPERIENCE:

Confidential, Boston, MA

Data Engineer

Responsibilities:

  • Worked as a Data Engineer to review business requirements and compose source-to-target data mapping documents
  • Involved in Agile development methodology as an active member in scrum meetings
  • Involved in Data Profiling and merging data from multiple data sources
  • Involved in Big Data requirement analysis and the development and design of solutions for ETL and Business Intelligence platforms
  • Designed 3NF data models for ODS, OLTP systems and dimensional data models using Star and Snowflake Schemas
  • Worked on Snowflake environment to remove redundancy and load real time data from various data sources into HDFS using Kafka
  • Developed data warehouse model in Snowflake for over 100 datasets
  • Designed and implemented a fully operational, production-grade, large-scale data solution on Snowflake Data Warehouse
  • Worked with structured/semi-structured data ingestion and processing on AWS using S3 and Python; migrated on-premises big data workloads to AWS
  • Involved in migration of data from existing RDBMS to Hadoop using Sqoop for processing data, evaluate performance of various algorithms/models/strategies based on real-world data sets
  • Created Hive tables for loading and analyzing data and developed Hive queries to process data and generate data cubes for visualizing
  • Extracted data from HDFS using Hive and Presto, performed data analysis using Spark with Scala and PySpark, carried out feature selection, and created nonparametric models in Spark (see the PySpark sketch after this list)
  • Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS
  • Captured unstructured data that was otherwise not used and stored it in HDFS and HBase/MongoDB; scraped data using Beautiful Soup and saved it into MongoDB (JSON format)
  • Worked on AWS S3 buckets and secure intra-cluster file transfer between PNDA and S3
  • Design & Implementation of Data Mart, DBA coordination, DDL & DML generation & usage
  • Provide data architecture support to enterprise data management efforts, such as development of enterprise data model and master and reference data, as well as support to projects, such as development of physical data models, data warehouses and data marts
  • Worked with Data governance, Data quality, data lineage, Data architect to design various models and processes
  • Independently coded new programs and designed tables to load and test programs effectively for given POCs using Big Data/Hadoop
  • Developed MapReduce/Spark Python modules for machine learning & predictive analytics in Hadoop on AWS
  • Worked on Apache Spark with Python to develop and execute Big Data Analytics and Machine Learning applications, and executed machine learning use cases under Spark ML and MLlib
  • Built and analyzed datasets using R, SAS and Python, designed data models and data flow diagrams using Erwin and MS Visio
  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, and NLTK in Python, together with R, to develop machine learning algorithms for predictive modeling
  • Implemented a Python-based distributed random forest via Python streaming
  • Utilized machine learning algorithms such as linear regression, multivariate regression, Naive Bayes, Random Forests, K-means, & KNN for data analysis
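A minimal PySpark sketch of the Hive-to-Spark analysis described above; the database, table, and column names (sales_db.orders, amount, region, product_category, customer_id) are hypothetical placeholders rather than actual project objects.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hive-enabled Spark session so managed Hive tables can be read directly
spark = (
    SparkSession.builder
    .appName("hive-analysis-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# Pull a Hive table into a DataFrame and build a simple aggregate for visualization
orders = spark.table("sales_db.orders")
cube = (
    orders.filter(F.col("amount") > 0)
          .groupBy("region", "product_category")
          .agg(F.sum("amount").alias("total_amount"),
               F.countDistinct("customer_id").alias("customers"))
)
cube.write.mode("overwrite").saveAsTable("sales_db.orders_cube")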

Environment: Python, R/R studio, SQL, Oracle, Cassandra, MongoDB, AWS, Snowflake, Hadoop, Hive, MapReduce, Scala, Spark, Kafka, MLLib, regression, Tableau

Confidential, Boston MA

Data Engineer

Responsibilities:

  • Gathered, analyzed, and translated business requirements to technical requirements, communicated with other departments to collect client business requirements and access available data
  • Acquired, cleaned and structured data from multiple sources and maintained databases/data systems; identified, analyzed, and interpreted trends and patterns in complex data sets
  • Developed, prototyped and tested predictive algorithms; filtered and cleaned data, and reviewed reports and performance indicators
  • Developed and implemented data collection systems and other strategies that optimize statistical efficiency and data quality
  • Created and statistically analyzed large data sets of internal and external data
  • Worked closely with the marketing team to deliver actionable insights from huge volumes of data coming from different marketing campaigns and customer interaction metrics such as web portal usage, email campaign responses, public site interaction, and other customer-specific parameters
  • Performed incremental loads as well as full loads to transfer data from OLTP to a snowflake-schema Data Warehouse using different data flow and control flow tasks, and provided maintenance for existing jobs
  • Designed and implemented secure data pipelines into a Snowflake data warehouse from on-premises and cloud data sources
  • Created best practices and standards for data pipelining and integration with Snowflake data warehouses
  • Responsible for data cleaning, feature scaling, and feature engineering using NumPy and Pandas in Python
  • Conducted Exploratory Data Analysis using Python Matplotlib and Seaborn to identify underlying patterns and correlation between features
  • Worked with NoSQL databases like HBase in creating tables to load large sets of semi structured data coming from source systems
  • Used information value, principal components analysis , and Chi square feature selection techniques
  • Used Python and R scripting to implement machine learning algorithms to predict and forecast data for better results
  • Used Python and R scripting to visualize data and implement machine learning algorithms
  • Developed packages in RStudio with a Shiny interface
  • Improved model efficiency and accuracy by evaluating and refining models with Python and R scripts
  • Experimented with multiple classification algorithms, such as Logistic Regression, Support Vector Machine (SVM), Random Forest, AdaBoost and Gradient Boosting, using Python Scikit-Learn, and evaluated performance on customer discount optimization across millions of customers (see the sketch after this list)
  • Built models using Python and PySpark to predict the probability of attendance for various campaigns and events
  • Implemented classification algorithms such as Logistic Regression, K-Nearest Neighbors and Random Forests to predict customer churn and customer interface
  • Performed data visualization and designed dashboards with Tableau, and generated complex reports, including charts, summaries, and graphs, to interpret findings for the team and stakeholders
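A minimal scikit-learn sketch of the classifier comparison described above; a synthetic dataset from make_classification stands in for the proprietary customer data, and the model settings shown are illustrative assumptions.

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Placeholder data in place of the real customer feature matrix
X, y = make_classification(n_samples=5000, n_features=20, random_state=42)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "gradient_boosting": GradientBoostingClassifier(random_state=42),
    "adaboost": AdaBoostClassifier(random_state=42),
}

# Rank candidate classifiers by cross-validated ROC AUC before deeper tuning
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean ROC AUC = {auc:.3f}")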

Environment: OLTP Data Warehouse, Hadoop, Hive, HBase, Spark, Snowflake, R/R studio, Python (Pandas, NumPy, Scikit-Learn, TensorFlow, SciPy, Seaborn, Matplotlib), SQL, Machine Learning, ggplot, lattice, MASS, mice and logit.

Confidential

Data Engineer

Responsibilities:

  • Worked with Hadoop eco system covering HDFS, HBase, YARN and MapReduce
  • Worked with Oozie Workflow Engine in running workflow jobs with actions that run Hadoop MapReduce, Hive, Spark jobs
  • Performed Data Mapping, Data design (Data Modeling) to integrate data across multiple databases in to EDW
  • Responsible for the design and development of advanced R/Python programs to prepare, transform and harmonize data sets in preparation for modeling
  • Hands-on experience with Hadoop/Big Data technologies for storage, querying, processing and analysis of data
  • Developed Spark/Scala and Python code for a regular expression (regex) project in the Hadoop/Hive environment for big data resources; used clustering techniques like K-means to identify outliers and to classify unlabeled data
  • Data gathering, data cleaning and data wrangling performed using Python and R
  • Transformed raw data into actionable insights by incorporating various statistical techniques, data mining, data cleaning, and data quality and integrity checks, utilizing Python (Scikit-Learn, NumPy, Pandas, and Matplotlib) and SQL
  • Calculated errors using various machine learning algorithms such as Linear Regression, Ridge Regression, Lasso Regression, Elastic Net Regression, KNN, Decision Tree Regressor, SVM, Bagging Decision Trees, Random Forest, AdaBoost, and XGBoost, and eventually chose the best model based on MAE (see the sketch after this list)
  • Experimented with Ensemble methods to increase accuracy of training model with different Bagging and Boosting methods
  • Identified target groups by conducting Segmentation analysis using Clustering techniques like K-means
  • Conducted model optimization and comparison using stepwise function based on AIC value
  • Used cross-validation to test models with different batches of data to optimize models and prevent overfitting
  • Worked and collaborated with various business teams (operations, commercial, innovation, HR, logistics, safety, environmental, accounting) to analyze and understand changes in key financial metrics and provide ad-hoc analysis that can be leveraged to build long-term points of view on where value can be captured
  • Explored and analyzed customer specific features by using Matplotlib, Seaborn in Python and dashboards in Tableau
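A minimal scikit-learn sketch of the regression comparison by MAE described above; a synthetic dataset from make_regression stands in for the real feature matrix, and the hyperparameters shown are illustrative assumptions.

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Placeholder data in place of the real modeling data set
X, y = make_regression(n_samples=2000, n_features=15, noise=10.0, random_state=0)

candidates = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1),
    "elastic_net": ElasticNet(alpha=0.1),
    "knn": KNeighborsRegressor(n_neighbors=5),
    "decision_tree": DecisionTreeRegressor(random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
}

# Cross-validated mean absolute error; the model with the lowest MAE is kept
for name, model in candidates.items():
    mae = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_absolute_error").mean()
    print(f"{name}: MAE = {mae:.2f}")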

Environment: Machine Learning, R Language, Hadoop, Big Data, Python, DB2, MongoDB, Web Services

Confidential

Data Engineer

Responsibilities:

  • Analyzed and translated Functional Specifications and Change Requests into Technical Specifications
  • Designed and implemented a Big Data Analytics architecture, transferring data from an Oracle Data Warehouse, external APIs and flat files to Hadoop using Hortonworks
  • Designed and developed Use Cases, Activity Diagrams, Swim Lane Diagrams and process flows using the Unified Modeling Language (UML)
  • Ran SQL queries for data validation and performed quality analysis on data extracts to ensure data quality and integrity across various database systems
  • Involved with Data Profiling activities for new sources before creating new subject areas in warehouse
  • Created DDL scripts for implementing Data Modeling changes
  • Worked with ETL processes to transfer/migrate data from relational databases and flat files, in various formats, to common staging tables and then into meaningful data in Oracle and MS SQL
  • Tested the ETL process both before and after data validation; tested messages published by the ETL tool and data loaded into various databases
  • Responsible for different Data mapping activities from Source systems
  • Performed extensive data cleansing, data manipulation, data transformation and data auditing
  • Involved in SQL development, unit testing and performance tuning, ensuring testing issues were resolved based on defect reports
  • Involved in Data mapping specifications to create and execute detailed system test plans. Data mapping specifies what data will be extracted from an internal data warehouse, transformed and sent to an external entity
  • Developed Data Migration and Cleansing rules for Integration Architecture ( OLTP, ODS, DW )
  • Responsible for defining naming standards for data warehouse
  • Performed data discovery and built a stream that automatically retrieves data from a multitude of sources (SQL databases, external data such as social network data, user reviews) to generate KPIs using Tableau
  • Wrote SQL queries for visualization and reporting systems; good experience with the visualization tool Tableau
  • Wrote ETL scripts in SQL for extracting and validating data (see the validation sketch below)
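A minimal Python sketch of the kind of data-validation queries described above; SQLite stands in for SQL Server purely for illustration, and the table and column names (stg_orders, dw_orders, order_id, amount, order_date) are hypothetical.

import sqlite3

# In-memory database with a staging table and a warehouse table as placeholders
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE stg_orders (order_id INTEGER, amount REAL, order_date TEXT);
    CREATE TABLE dw_orders  (order_id INTEGER, amount REAL, order_date TEXT);
    INSERT INTO stg_orders VALUES (1, 10.5, '2015-01-02'), (2, NULL, '2015-01-03');
    INSERT INTO dw_orders  VALUES (1, 10.5, '2015-01-02');
""")

checks = {
    # Row counts in staging and warehouse should match after the load
    "row_count_diff":
        "SELECT (SELECT COUNT(*) FROM stg_orders) - (SELECT COUNT(*) FROM dw_orders)",
    # Business rule: amount must never be NULL once loaded into the warehouse
    "null_amounts_in_dw":
        "SELECT COUNT(*) FROM dw_orders WHERE amount IS NULL",
}

for name, sql in checks.items():
    value = cur.execute(sql).fetchone()[0]
    print(f"{name}: {value}")
conn.close()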

Environment: SQL Server, ETL, SSIS, SSRS, Tableau, Excel
