Data Scientist Resume
SUMMARY:
- Overall 7 years in IT, including 5+ years delivering end-to-end advanced analytics projects using statistical modeling, mathematical modeling, and machine learning. Strong project management skills and experience leading and collaborating with cross-functional teams.
- Proficient in implementing various statistical models and data mining techniques: predictive modeling, clustering, logistic regression, multivariate regression, decision trees, and neural networks.
- Solid understanding of EXALEAD CloudView components - CloudView Connectivity, CloudView Semantic Factory, CloudView Index and CloudView Mashup Builder.
- Experience processing massive amounts of structured and unstructured data using Spark/SQL/Hive
- Hands-on experience executing analyses using tools such as SQL, Python (NumPy, pandas, scikit-learn), and Tableau.
- Proficient in various data visualization tools and libraries such as Tableau, Matplotlib, Seaborn, Plotly, D3.js, and deck.gl to create visually powerful, actionable, interactive reports and dashboards.
- Excellent knowledge of and experience with the Hadoop architecture and its ecosystem components: HDFS, YARN, MapReduce, Hive, Pig, HBase, Flume, Sqoop, Oozie, Tez, and Spark.
- Comprehensive knowledge of provisioning virtual clusters on AWS (EC2, S3, EMR) and experience with other cloud platforms such as Microsoft Azure.
- Proficient in database development on RDBMSs: Oracle, PostgreSQL, MySQL, and MS SQL Server.
- Excellent knowledge of creating databases, tables, stored procedures, DDL/DML triggers, views, user-defined data types, functions, cursors, and indexes.
- Solid experience in Software Development Life Cycle (SDLC) including requirement gathering, analysis, design, development and testing in both Waterfall and Agile methodologies.
TECHNICAL SKILLS:
Project Management: JIRA, VersionOne, Smartsheet
Programming: Python 3, Linux, Spark 2.2, Java, MEL (familiarity with Scala)
Tools: PyCharm, Jupyter Notebook, Eclipse, Apache Zeppelin, Git, Salesforce Einstein, Alteryx 2
Cloud: CloudView, AWS (EMR, EC2, S3; also Hadoop directly on EC2, non-EMR)
SQL/NoSQL: Oracle, MySQL, PostgreSQL, Hadoop Hive, Cassandra, Presto
BI Tools: Tableau 10.0, Parse.ly, Plotly, Kibana, EXALEAD CloudView
Analytics: pandas, scikit-learn, Spark MLlib, RStudio, SAS, IBM SPSS, MS Excel
ML Algorithms: Linear Regression, Logistic Regression, ARIMA, Decision Trees, Random Forest, k-Nearest Neighbors, Support Vector Machine, Neural Networks, Deep Learning, PCA, Model Evaluation, Model Selection, Feature Engineering
Web: HTML, CSS, jQuery, JavaScript
EXPERIENCE:
Confidential
Data Scientist
Responsibilities:
- Created Hive tables, loaded the data into them, and then connected the Hive database to Jupyter/PyCharm.
- Checked the shape and quality of the data and created a fix-it list (e.g., changing data types).
- Performed univariate analysis to check the distribution of each variable and bivariate analysis (e.g., violin plots and bar graphs) to compare the distributions of different variables using a combination of the Matplotlib and Seaborn libraries.
- Derived features such as ‘aging’, aggregations, dummy variables, and target variables using the Python libraries NumPy and pandas.
- Applied imputation methods such as mean imputation, last value carried forward, and using information from related observations to deal with MCAR, MAR, and MNAR cases (see the sketch at the end of this section).
- Applied logistic regression, linear regression, the ordinary least squares method, mean-variance analysis, the law of large numbers, residuals, the Poisson distribution, Bayes' theorem, Naive Bayes, function fitting, and more to the data using the Python libraries scikit-learn, SciPy, NumPy, and pandas.
- Applied clustering algorithms on task-related data to study the underlying data patterns using a variety of techniques (e.g., PCA, factor analysis, k-means) with NumPy, pandas, and scikit-learn.
- Created a scoring model using logistic regression to maximize the lead conversion rate; used precision and recall as scoring metrics and R-squared, MSE, RMSE, log loss, Gini index, and AUC as cross-validation metrics.
- Helped machine learning engineers get started with Docker containers for deploying machine learning models.
- Created dataflows by joining various data objects in Salesforce Einstein, and derived additional features by writing SAQL code to generate the final dataset.
- Created various lenses (a predefined API in Einstein for creating charts) and merged all lenses into the final dashboard consumed by State Farm’s agents.
- Crafted technical reports summarizing data insights for stakeholders and management.
Environment: PyCharm, Python 3, Linux, Git, Docker, MLP, Hadoop, Hive, Salesforce Einstein Discovery
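A minimal sketch of the imputation approaches above, using pandas; the DataFrame, column names, and grouping key are illustrative, not from the original project:

```python
import numpy as np
import pandas as pd

# Illustrative data: 'reading' has missing values, 'account_id' links related observations
df = pd.DataFrame({
    "account_id": [1, 1, 1, 2, 2, 2],
    "reading":    [10.0, np.nan, 12.0, np.nan, 7.0, np.nan],
})

# Mean imputation (a reasonable default when data are MCAR)
df["reading_mean"] = df["reading"].fillna(df["reading"].mean())

# Last value carried forward (forward fill along the observation order)
df["reading_locf"] = df["reading"].ffill()

# Using information from related observations (one simple way to exploit
# structure under MAR): fill with each account's own mean
df["reading_group"] = df.groupby("account_id")["reading"].transform(
    lambda s: s.fillna(s.mean())
)
```

The right choice among these depends on the missingness mechanism (MCAR, MAR, or MNAR); the group-wise fill is only one way to borrow strength from related observations.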
Confidential, Needham, MA
Data Analyst
Responsibilities:
- Involved in creating database objects such as master tables, views, procedures, triggers, and functions using PostgreSQL to provide definition and structure and to maintain data efficiently.
- Built and published customized interactive reports and dashboards and scheduled reports using Tableau Server.
- Created action filters, parameters, and calculated sets for preparing dashboards and worksheets in Tableau.
- Extracted case-related data from Freshdesk in JSON format using its APIs, derived fields (e.g., the delta for each stage of a case) using Python, and then built BI dashboards in Tableau showing hidden actionable insights such as the time a ticket spends in each stage (see the sketch at the end of this section).
- Assisted data scientists by performing data cleansing and data wrangling on vast amounts of marketing attribution data using NumPy and pandas.
Environment: Tableau 10, PostgreSQL, Python 3, NumPy, pandas, Jupyter Notebook.
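A rough sketch of the extract-and-derive step, assuming the Freshdesk v2 tickets endpoint with placeholder credentials; the delta here is simplified to one interval per ticket, whereas the real pipeline computed one per stage:

```python
import pandas as pd
import requests

# Hypothetical account URL and placeholder API key -- real Freshdesk paths
# and auth are configured per account; this only sketches the JSON -> DataFrame flow.
resp = requests.get(
    "https://example.freshdesk.com/api/v2/tickets",
    auth=("API_KEY", "X"),
)
tickets = pd.json_normalize(resp.json())

# Derive a simple delta: elapsed hours between creation and last update
tickets["created_at"] = pd.to_datetime(tickets["created_at"])
tickets["updated_at"] = pd.to_datetime(tickets["updated_at"])
tickets["stage_delta_hours"] = (
    (tickets["updated_at"] - tickets["created_at"]).dt.total_seconds() / 3600
)

# Export as a flat file for Tableau to consume as a data source
tickets.to_csv("ticket_stage_deltas.csv", index=False)
```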
Confidential, FL
Data Scientist
Responsibilities:
- Explored the automatic meter reading (AMR) data from Confidential using NumPy, pandas, and Alteryx; applied various data wrangling methods such as reshaping, summarizing, and creating new derived features, and combined the data with other open-source data (e.g., weather).
- Performed feature engineering around date-time and location fields and then performed bivariate analysis (e.g., creating dual-axis line graphs and bar graphs) to compare the distributions of variables.
- Used 5-fold cross-validation along with stratified sampling so that the generated test set has target category proportions almost identical to those in the full dataset (see the sketch at the end of this section).
- Applied logistic regression, SVM, and random forest classifiers using pandas and scikit-learn, and then used a confusion matrix for model evaluation.
- Applied the ARIMA method to forecast the water consumption of each account, keeping the margin of error under 3% to flag a “faulty water meter”.
Environment: Alteryx 10.0, Tableau 10.4, Python 3.5, NumPy, pandas, Matplotlib, Seaborn, scikit-learn
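A compact sketch of the 5-fold stratified cross-validation and confusion-matrix evaluation described above, with synthetic stand-in data in place of the engineered AMR features:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold

# Stand-in imbalanced data; the real features came from the AMR dataset
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.9, 0.1], random_state=0
)

# Stratified sampling keeps each fold's target proportions
# almost identical to those of the full dataset
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, y):
    clf = RandomForestClassifier(random_state=0)
    clf.fit(X[train_idx], y[train_idx])
    preds = clf.predict(X[test_idx])
    print(confusion_matrix(y[test_idx], preds))
```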
Confidential, NJ
Data Scientist
Responsibilities:
- Contributed to new product development and the delivery of real-time business intelligence using a wide range of cloud and big data technologies such as Microsoft Azure, Hadoop/Hive, Apache Spark, R, Python, and Node.js.
- Applied linear regression, multiple regression, the ordinary least squares method, mean-variance analysis, the law of large numbers, logistic regression, dummy variables, residuals, the Poisson distribution, Bayes' theorem, Naive Bayes, function fitting, and more to the data using the Python libraries scikit-learn, SciPy, NumPy, and pandas.
- Applied clustering algorithms on market data to study the underlying data patterns using a variety of techniques (e.g., PCA, factor analysis, hierarchical clustering, k-means) through scikit-learn, for market projection.
- Designed and developed some of the complex modules of the system on AWS using a Spark cluster.
- Created, modified, and executed DDL to load data into Hive and AWS Redshift tables; performed data validation and quality checks in Redshift using Python (see the sketch at the end of this section).
- Applied linear regression in Python to understand the relationships between different attributes of the dataset and the causal relationships among them.
Environment: AWS, Python 3, Jupyter notebook, Hadoop, Hive, Spark 2.0, Galaxy 2021
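A minimal sketch of the kind of validation and quality check described above, assuming psycopg2 against Redshift (which speaks the PostgreSQL wire protocol); the connection details and table/column names are placeholders:

```python
import psycopg2

# Placeholder connection details for an example Redshift cluster
conn = psycopg2.connect(
    host="example-cluster.redshift.amazonaws.com",
    port=5439, dbname="analytics", user="user", password="password",
)

with conn.cursor() as cur:
    # Row-count check: the loaded table should match the source extract
    cur.execute("SELECT COUNT(*) FROM sales_staging;")  # illustrative table
    row_count = cur.fetchone()[0]

    # Null check on a required column
    cur.execute("SELECT COUNT(*) FROM sales_staging WHERE order_id IS NULL;")
    null_count = cur.fetchone()[0]

print(f"rows={row_count}, null order_ids={null_count}")
conn.close()
```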
Confidential, Waltham, MA
Data Scientist
Responsibilities:
- Transformed the global product marketing team into a cloud-based, data-driven team by digitizing and automating all kinds of customer, CRM, marketing, and pipeline data, providing a full 360-degree view of customers for the ENOVIA PLM brand, quality reporting, actionable insights, and dynamic user engagement, and improving productivity by more than 500%.
- Extracted and transformed customer and marketing datasets obtained from multiple sources (Oracle, PostgreSQL, CSVs, etc.), loaded them into EXALEAD for indexing, and developed web-based front-end/search-based applications using a combination of Grails/Java, jQuery, and d3.js.
- Extracted data from a variety of sources such as events, CRM, and the engagement platform; performed preprocessing and built a customer response model for marketing using logistic regression; used model evaluation techniques such as the F1 score, ROC curve, and confusion matrix.
- Created a product portfolio application using previous years of revenue data, using a combination of EXALEAD, SQL, HTML, CSS, and JavaScript.
- Added an Ontology Matcher and Annotation Manager to the analysis pipeline, created a new GPS point index field, added a mapping source and a mapping target, and then added the Google Maps widget to the Mashup UI app to enable geolocation based on place detection in CloudView.
- Created and deployed customized EXALEAD dashboards in production that included charts such as S-curves, bar graphs, multi-axis charts, and more.
- Created a demand and sales forecasting model using regression methods and integrated the model with the sales dashboard using D3.js.
- Performed A/B testing for banner ads on Google; for multiple alternatives, chose a multi-armed bandit algorithm to create more value from spending, yielding a 60% increase in ROI (see the sketch at the end of this section).
- Helped the product marketing team with segmentation, strategy, content creation, UI, and overall go-to-market plan for 2017 product launch.
- Improved the UX/UI of the brands’ webpages by collaborating with the design and engineering teams.
Environment: EXALEAD CloudView, HTML, CSS, JavaScript, Java, MEL, UQL, SQL Server, Siebel CRM, Data Mining
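An illustrative epsilon-greedy sketch of the multi-armed bandit approach mentioned above; the click-through rates and traffic volume are made up for demonstration:

```python
import random

# Simulated click-through rates for three banner variants (illustrative numbers)
true_ctr = [0.02, 0.05, 0.03]
clicks = [0, 0, 0]
shows = [0, 0, 0]
epsilon = 0.1  # exploration rate

for _ in range(10_000):
    if random.random() < epsilon:
        arm = random.randrange(len(true_ctr))  # explore a random variant
    else:
        # exploit the variant with the best observed CTR so far
        rates = [clicks[i] / shows[i] if shows[i] else 0.0 for i in range(len(true_ctr))]
        arm = max(range(len(true_ctr)), key=rates.__getitem__)
    shows[arm] += 1
    clicks[arm] += random.random() < true_ctr[arm]  # simulate a click

print("impressions per variant:", shows)  # spend concentrates on the best ad
```

Unlike a fixed-split A/B test, the bandit shifts impressions toward the better-performing variant while the test is still running, which is where the spend efficiency comes from.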
Confidential, Boston, MA
Data Analyst
Responsibilities:
- Analyzed business requirements, system requirements, data mapping requirement specifications, and responsible for documenting functional requirements and supplementary requirements.
- Provided scheduled financial reporting solutions using SQL Server Reporting Services (SSRS) and Excel (pivot tables, VLOOKUP, HLOOKUP, conditionals, etc.), and created several reporting formats to train new hires.
- Created BI dashboards for visual spend analytics and regression analysis using Tableau to identify opportunities to reduce cost, track contract compliance, and measure supplier performance, achieving an 11% cost reduction in FY15.
- Updated and maintained the custodian database in PeopleSoft Finance for asset management, and performed QA for duplicates and other data errors.
Environment: MS Excel, SQL Server, RStudio, Tableau, PeopleSoft, BuyWays
Confidential
Software Engineer - Product Manager
Responsibilities:
- Worked on a small team of software developers to build a Computer Adaptive Testing (CAT) application in C and Java using the Agile SDLC methodology; also conducted JAD sessions and brainstorming.
- Performed Unit Testing, Load Testing and Performance Testing in Java.
- Designed a real-time intelligent chatbot in Python to improve customer service and staff efficiency by 60%.
- Created the pricing strategy and go-to-market plan along with the CEO by analyzing consumer data across different geographical locations; also helped digital and advocacy marketing campaigns, resulting in a 2000% ROI.
Environment: Agile, C, Java, Python, JIRA, MS Excel, PowerPoint