BI & Data Engineer Sr Resume
Buffalo, NY
SUMMARY
- Data scientist with 7 years of experience in financial services, technology, and e-commerce.
- Over 4 years of experience spanning the entire data science project life cycle, including data acquisition, data cleaning, data manipulation, data mining, machine learning algorithms, data validation, and data visualization.
- Expertise in transforming business requirements into analytical models, applied algorithms, and reporting solutions that scale across massive volumes of structured and unstructured data.
- Experienced with linear and logistic regression, Bayesian inference, SVMs, neural networks, ANOVA, Gaussian mixture models, recommendation systems, and maximum likelihood estimation.
- Strong skills in statistical methodologies and dimensionality reduction methods such as PCA, correspondence analysis, and variable clustering.
- Worked with testing and validation using k-fold cross-validation and regularization.
- Extensive experience in developing time series models, including but not limited to ARIMA and GARCH, using SAS 9.4, SAS Enterprise Miner, SAS Enterprise Guide, and SAS/JMP.
- Worked with Python 3.3 to develop machine learning algorithms such as decision trees, random forests, lasso regression, and k-means clustering, using the NumPy, Pandas, scikit-learn, SFrame, SciPy, Matplotlib, and NLTK packages.
- Strong ability to write and optimize diverse SQL queries; working knowledge of RDBMS and NoSQL databases such as MySQL, SQL Server, HBase, Cassandra, and MongoDB.
- Deep understanding of text mining and generating data visualizations, delivering projects using various R packages such as ggplot2, dplyr, caret, twitteR, NLP, rjson, openNLP, tm, googleVis, and Shiny.
- Working knowledge of MapReduce with Hadoop and Spark. Good knowledge of the big data ecosystem, including Hadoop 2.0 (HDFS, Hive, Pig, Impala), Splunk, and Spark (Spark SQL, MLlib, Spark Streaming).
- Excellent performance in building and publishing customized interactive reports and dashboards with custom parameters and user filters using Tableau 10.3/9+.
- Working knowledge of HTML5/CSS3/JavaScript.
- Excellent understanding of SDLC, Agile, and Scrum.
- Effective team player with strong communication and interpersonal skills; adapts quickly to new technologies and new business lines.
TECHNICAL SKILLS
BI Tools: Tableau 9.4/9.2, SharePoint 2016/2013, MS Office (Word/Excel/PowerPoint/Visio), ELK
Languages: Python 3.3/2.7, R 3, SQL, SAS 9.4, VBA, HiveQL, Pig Latin
Big Data Tools: Hadoop 2 (Hive, HDFS, Pig, Impala), Spark 2.1 (Spark SQL, MLlib), MapReduce, Splunk 6.0, SSIS
Operating Systems: Windows 10/8/7, UNIX, Linux
Packages: Python (NumPy, Pandas, scikit-learn, SFrame, SciPy, Matplotlib, NLTK); R (ggplot2, dplyr, caret, twitteR, NLP, openNLP, rjson, tm, googleVis, Shiny)
Database: Oracle 11g, MS Access 2013, SQL Server 2014/2012, MySQL 5.5, HBase 1.2, MongoDB 3.2, Cassandra 3.0
PROFESSIONAL EXPERIENCE
Confidential - Buffalo, NY
BI & Data Engineer Sr
Responsibilities:
- Extracted, transformed, and loaded transactional data from Oracle 12c and MS SQL Server
- Developed interactive dashboards to support user studies for different products - Zelle, Real time payments, Mobile Next Gen, Enterprise Message Hub, Digital card self-services, and Confidential &T Insurance Agency sales, using Tableau
- Monitored transactional database logs to identify inconsistent data formats and unexpected data loss
- Improved Tableau processing performance and reduced load times by scheduling extract refreshes, reducing calculation processing levels, and optimizing queries
- Applied various machine learning algorithms and statistical models (decision trees, logistic regression, gradient boosting machines) to build predictive models using scikit-learn in Python
- Worked continuously with the technology center to integrate Python scripts with Tableau in support of advanced data modeling and text analysis
- Created data mapping documents and followed bank-wide data architecture regulations and best practices to support ETL work
- Applied time series models to forecast customer enrollments and transactions, in support of adjusting promotions and strategies to meet KPIs
- Responsible for software license management; created a team-owned database on MS SQL Server and extracted, transformed, and loaded machine log data
- Visualized software license management analysis results in Tableau dashboards and presented findings to executive management on a weekly basis
- Helped automate the bank-wide software registration process by using license reports
- Led Tableau training programs across the bank; served as the bank's Tableau trainer
- Documented requirements, including the available code to be implemented using Spark, Hive, and HDFS
Environment: Tableau 10.3/9+, Python 3, NumPy, Pandas, NLTK, scikit-learn, Oracle 12c, SQL Server 2016/2012/2008
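The scikit-learn modeling described above might look roughly like the following sketch. This is illustrative only, on synthetic data; none of the features or names are from the actual bank project:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for transactional features (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Compare a linear baseline with a gradient boosting machine
models = {
    "logistic": LogisticRegression(max_iter=1000),
    "gbm": GradientBoostingClassifier(random_state=0),
}
aucs = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    aucs[name] = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
```

Comparing held-out AUC across a simple baseline and a boosted model is a common way to justify the final model choice.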
Confidential - Stamford, CT
Data Scientist
Responsibilities:
- Continuously collected business requirements throughout the project life cycle
- Worked on data cleaning and reshaping; generated segmented subsets using NumPy and Pandas in Python
- Applied various machine learning algorithms and statistical models such as decision trees, logistic regression, and gradient boosting machines to build predictive models using the scikit-learn package in Python
- Identified the variables that significantly affect the target
- Conducted model optimization and comparison using a stepwise function based on AIC values
- Worked on model selection based on confusion matrices, minimizing the Type II error
- Generated a cost-benefit analysis to quantify the impact of model implementation compared with the former situation
- Generated data analysis reports using Matplotlib and Tableau; successfully delivered and presented the results to C-level decision makers
- Wrote and optimized complex SQL queries involving multiple joins and advanced analytical functions to perform data extraction and merging from large volumes of historical data stored in Oracle 11g, validating the processed data in target database
- Developed Python scripts to automate data sampling process. Ensured the data integrity by checking for completeness, duplication, accuracy, and consistency
Environment: Tableau 9.4, Python 3.3, Numpy, Pandas, Matplotlib, Scikit-Learn, Machine Learning, Oracle 11g, SQL
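Model selection that minimizes Type II error, as described above, usually amounts to choosing a decision threshold from the confusion matrix. A minimal sketch on synthetic data (all names illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

# Synthetic binary classification data (illustrative only)
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = (X[:, 0] - X[:, 2] > 0).astype(int)

clf = LogisticRegression().fit(X, y)
proba = clf.predict_proba(X)[:, 1]

def type2_errors(threshold):
    """Count false negatives (Type II errors) at a given decision threshold."""
    pred = (proba >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y, pred).ravel()
    return fn

# Lowering the threshold trades false positives for fewer false negatives
fn_low, fn_high = type2_errors(0.3), type2_errors(0.7)
```

Because every positive prediction made at the higher threshold is also made at the lower one, the false-negative count can only decrease as the threshold drops.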
Confidential - Hartford, CT
Data Scientist
Responsibilities:
- Performed time series modeling (ARIMA) with SAS to capture data patterns and traffic trends, and forecast occupancy rates for different parking lots
- Communicated effectively with the business development team to implement and complete initiatives that could increase opportunities
- Delivered data interpretation efficiently by creating interactive analysis reports in Tableau to identify business solutions and support business decisions on marketing and operations
- Implemented and delivered all requirements outlined in the contractual agreement between the company and the university
- Prepared and executed complex SAS/PROC SQL involving multiple joins and advanced analytical functions to validate the processed data in target database
- Searched and collected data from external sources and integrated it with the primary database. Created a Spark SQL context to load data from JSON files and performed SQL queries
- Extracted and compiled data, conducted data manipulation to ensure data quality, consistency, and integrity using SFrame in Python
Environment: SAS 9.4/SAS Studio, Hadoop 2, Spark, SparkSql, MS Office (Excel), Tableau 9.4, Python, SFrame
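The occupancy forecasting above was done with ARIMA in SAS; as a rough Python-side illustration of the same idea, here is a deliberately simplified AR(1) fit and one-step forecast on synthetic data (not the actual SAS procedure or data):

```python
import numpy as np

# Simulate an AR(1) series as a stand-in for occupancy-rate data
rng = np.random.default_rng(2)
phi_true, n = 0.8, 300
series = np.zeros(n)
for t in range(1, n):
    series[t] = phi_true * series[t - 1] + rng.normal(scale=0.1)

# Estimate the AR coefficient by least squares on lagged values
x, y = series[:-1], series[1:]
phi_hat = float(x @ y / (x @ x))

# One-step-ahead forecast from the last observation
forecast = phi_hat * series[-1]
```

A production ARIMA fit would also difference the series and select p, d, q orders; this sketch only shows the autoregressive core.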
Confidential - West Hartford, CT
Data Scientist
Responsibilities:
- Collected business requirements and data analysis needs from other departments
- Worked on data cleaning and ensured data quality, consistency, and integrity using NumPy and SFrame in Python
- Experimented with text mining on customer complaints using NLTK in Python
- Used k-means clustering technique to identify outliers and to classify unlabeled data
- Applied Principal Component Analysis method in feature engineering to analyze high dimensional data
- Applied various machine learning algorithms and statistical models (decision trees, lasso regression, multivariate regression) to identify key features using the scikit-learn package in Python
- Evaluated models using k-fold cross validation, log loss function
- Ensured the model had a low false positive rate; validated the model by interpreting the ROC plot
- Built repeatable processes in support of implementation of new features and other initiatives
- Created various types of data visualizations using Tableau
- Performed data parsing and profiling on large volumes of varied data to learn about behavior across various features, based on transactional data, call center history, customer personal profiles, etc.
- Processed primary quantitative and qualitative market research and loaded the survey responses into the database in preparation for data exploration
- Developed Python scripts to automate the data sampling process. Ensured data integrity by checking for duplication, completeness, accuracy, and validity
Environment: Python 3.3, Hadoop 2, Tableau 9.4, Numpy, SFrame, Scikit-Learn, nltk
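The k-means outlier identification mentioned above can be sketched as flagging points far from their assigned cluster center. Synthetic data; the threshold rule is an illustrative assumption, not the project's actual cutoff:

```python
import numpy as np
from sklearn.cluster import KMeans

# Two inlier blobs plus a handful of far-away outliers (synthetic)
rng = np.random.default_rng(3)
blob_a = rng.normal(loc=0.0, scale=1.0, size=(100, 2))
blob_b = rng.normal(loc=5.0, scale=1.0, size=(100, 2))
outliers = rng.normal(loc=20.0, scale=0.5, size=(5, 2))
X = np.vstack([blob_a, blob_b, outliers])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Distance of each point to its assigned cluster center
dist = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)

# Flag points more than 3 standard deviations beyond the mean distance
flagged = dist > dist.mean() + 3 * dist.std()
```

The same distances can also feed a quantile-based cutoff when the outlier rate is roughly known in advance.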
Confidential
Data Analyst
Responsibilities:
- Extracted and amalgamated information from working data; gathered primary and secondary competitive intelligence for distribution in impactful bi-weekly and monthly reports
- Performed initial descriptive data analysis on datasets using SAS; generated statistical reports with PROC UNIVARIATE and PROC FREQ
- Conducted hypothesis tests and analysis on the content of clinical datasets to assess quality, completeness, and volumes of data
- Coordinated with the research team and system owners to understand the origins, contents, and structure of datasets, ensuring that research objectives could be met
- Effectively communicated the results and reported to colleagues and partners
- Created decision-driving competitive intelligence reporting from scientific conferences
- Developed comprehensive knowledge of drug development and commercial landscapes
Environment: SAS 9.4, SAS Enterprise Guide, SAS Studio, SQL server 2012, SPSS, Microsoft Office 2013 (Access/PowerPoint/Word/Excel)
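The descriptive analysis above was done in SAS with PROC UNIVARIATE and PROC FREQ; a rough pandas analogue of those two steps looks like this (synthetic clinical-style data, illustrative column names):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a clinical dataset (illustrative only)
rng = np.random.default_rng(4)
df = pd.DataFrame({
    "age": rng.integers(18, 90, size=100),
    "arm": rng.choice(["treatment", "control"], size=100),
})

univariate = df["age"].describe()   # count, mean, std, quartiles (cf. PROC UNIVARIATE)
freq = df["arm"].value_counts()     # one-way frequency table (cf. PROC FREQ)
missing = df.isna().sum()           # completeness check per column
```

Running these per column is a quick way to assess quality, completeness, and volume before deeper analysis.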
Confidential
Business Data Analyst
Responsibilities:
- Accomplished client studies covering buying behaviors, client profiles, and segmentation
- Analyzed the traffic and business performance of commercial and marketing operations as part of the continuous improvement of digital devices. Created data visualizations using Shiny in R to track the performance of business campaigns (newsletters, mailings)
- Implemented strategic initiatives with historical data; built and tested predictive models to better estimate the impact of new campaigns
- Developed a recommendation system by applying collaborative filtering and content-based filtering on a large-scale dataset, improving the accuracy and promptness of customized recommendations
- Created materials emphasizing product knowledge, brand heritage, website user experience, and luxury service to support CRM initiatives and drive sales results
- Prepared and executed complex SQL queries involving multiple joins and advanced analytical functions to validate data in target database
Environment: MySQL, R, dplyr, caret, mle2, Shiny
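The collaborative filtering half of the recommendation system above can be sketched as item-based filtering on a rating matrix. This toy example is an assumption about the approach, not the actual system (which also combined content-based filtering):

```python
import numpy as np

# Rows are users, columns are items; 0 means "not rated" (toy data)
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

# Cosine similarity between item rating columns
norms = np.linalg.norm(R, axis=0)
sim = (R.T @ R) / np.outer(norms, norms)

def predict(user, item):
    """Score an unrated item from the user's ratings, weighted by item similarity."""
    rated = R[user] > 0
    return float(R[user, rated] @ sim[item, rated] / sim[item, rated].sum())

# User 0 has not rated item 2; score it from their other ratings
score_item2 = predict(0, 2)
```

Items most similar to what a user already rated highly dominate the weighted average, so item 2 (similar to the item user 0 disliked) scores low here.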
Confidential
Business Analyst
Responsibilities:
- Participated in data entry, data extraction using SQL queries with MySQL
- Identified the key parameters by clearly defining treatment and control groups and marking target audiences who would be incremental and profitable to business
- Conducted A/B testing for the implementations of new initiatives and conducted documentation in support of the Web design team
- Created various types of charts to visualize data analysis results
- Successfully generated decision-driving reports
Environment: Python 2.7, MySQL, R, MS Office (Excel/PowerPoint/Word), Pandas
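The A/B testing mentioned above typically comes down to a two-proportion z-test on conversion counts. A minimal hand-rolled sketch with made-up numbers (not the actual experiment's data):

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z-statistic for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical: control converts 200/5000, treatment 260/5000
z = two_proportion_z(200, 5000, 260, 5000)
significant = abs(z) > 1.96  # two-sided test at the 5% level
```

The same statistic is what off-the-shelf A/B testing tools report; computing it by hand makes the pooled-variance assumption explicit.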