Sr. Data Scientist Resume

Hartford, CT

SUMMARY

  • Data Scientist with over 7 years of experience executing data-driven solutions, with strong knowledge of Data Analytics, Text Mining, Machine Learning (ML), Predictive Modelling, and Natural Language Processing (NLP).
  • Experience in productionizing Machine Learning pipelines on Google Cloud Platform that perform data extraction, data cleaning and model training, and update the model based on its performance.
  • Utilized GCP resources, namely BigQuery, Cloud Composer, Compute Engine, Kubernetes clusters and GCP storage buckets, for building the production ML pipeline.
  • Expertise in building batch and streaming data pipelines that pull data from multiple sources into Google's BigQuery. These pipelines are built using Python, Kafka, Dataflow and Dataproc.
  • Expertise in building ML models that predict failure events on store self-checkout machines and provide root causes for those failure events.
  • Proficient in Statistical Modeling and Machine Learning techniques (Linear, Logistic, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, Bayesian, XGBoost) in Forecasting/Predictive Analytics.
  • Firsthand experience solving problems that bring significant business value by building predictive models on structured and unstructured data.
  • Built a Machine Learning model to predict hourly sales (Orders, Invoices and Shipments) for an ecommerce platform.
  • Hands-on experience in Machine Learning algorithms such as Linear Regression, GLM, CART, SVM, KNN, LDA/QDA, Naive Bayes, Random Forest and Boosting.
  • Experience in working on Windows, Linux and UNIX platforms, including programming and debugging skills in UNIX Shell Scripting.
  • Hands-on experience in creating data visualizations and dashboards in Tableau Desktop.
  • Expertise in performing time series analysis and building forecasting models to predict temperature and humidity spikes inside cold storage warehouses.
  • Expertise in building monitoring dashboards that visualize the present and predicted health of cold storage warehouses.
  • Proficient in Python and its libraries such as NumPy, Pandas, Scikit-learn, Matplotlib and Seaborn.
  • Experience in building data warehouses, data marts and data cubes for creating Power BI reports that visualize various key performance indicators of the business.
  • Utilized Python libraries, namely pandas, matplotlib and plotly, for data analysis and data visualization, and predicted unexpected reboot events on store self-checkout machines (POS systems).
  • Built a facial recognition model that is used for user authentication in an employee work-hours tracking system.
  • Utilized Python's Flask framework for building REST APIs on top of a data lake (BigQuery, Cloud SQL).
  • Built metrics data pipelines using Python, Telegraf and Kafka to push virtual infrastructure performance metrics and failed events to a Wavefront tenant, and built monitoring dashboards.
  • Expertise in building user interface (UI) applications using Bootstrap.
  • Expertise in containerizing applications using Docker Compose.
  • Achieved Continuous Integration & Continuous Deployment (CI/CD) for applications using Concourse.
  • Experience with test-driven development (TDD), Agile methodologies and SCRUM processes.
  • Experience in version control and collaboration tools like Git and Sourcetree.

TECHNICAL SKILLS

Languages: Python, Java, JavaScript, C, C++, SQL

ML/AI: TensorFlow, Keras, Scikit-learn, Prophet, PySpark, NLTK, Airflow, Pandas, OpenCV

Databases: MySQL, SQL Server, PostgreSQL, MongoDB

Reporting Tools: Power BI, Wavefront

Predictive and Machine Learning: Regression (Linear, Logistic, Bayesian, Polynomial, Ridge, Lasso), Classification (Logistic Reg., two/multiclass classification, Boosted Decision Tree, Random Forest, Decision Tree, Naïve Bayes, Support Vector Machines, k-Nearest Neighbors, Neural Network, and various other models), Clustering (K-means, Hierarchical), Anomaly Detection, LSTM, RNN

Cloud: Google Cloud Platform, Pivotal Cloud Foundry, Azure, AWS

GCP ML Resources: BigQuery, Cloud Composer, AI Platform, Kubeflow

GCP Other Resources: Dataflow, Dataproc, Compute Engine, Google Kubernetes Engine, App Engine

Other Cloud Resources: Azure Databricks, AWS Glue

Frameworks: Flask, Django, Falcon, Bottle

Tools: Apache Spark, Kafka, Docker, Git, Concourse, Swagger

Operating Systems: Linux, Windows, Unix, macOS

Automation Tools: Ansible, Telegraf

PROFESSIONAL EXPERIENCE

Sr. Data Scientist

Confidential, Hartford, CT

Responsibilities:

  • Applied Lean Six Sigma process improvement in the plant and developed capacity calculation systems using the purchase order tracking system, improving inbound efficiency by 23.56%.
  • Worked with machine learning algorithms such as regression (linear, logistic, etc.), SVMs and decision trees for classification of groups and for analyzing the most significant variables, such as FTE, waiting times of purchase orders and capacities available, and applied process improvement techniques.
  • Calculated a Process Cycle Efficiency of 33.2% and identified value-added and non-value-added activities.
  • Utilized SAS to develop Pareto charts identifying the highest-impact categories in modules, to find the workforce distribution, and created various data visualization charts.
  • Performed univariate, bivariate, and multivariate analysis of approx. 4890 tuples using bar charts, box plots and histograms.
  • Participated in feature engineering such as feature creation, feature scaling and one-hot encoding with Scikit-learn.
  • Converted raw data to processed data by merging, finding outliers, errors, trends, missing values, and distributions in the data.
  • Generated a detailed report after validating the graphs using R and adjusting the variables to fit the model.
  • Worked on Clustering and factor analysis for classification of data using machine learning algorithms.
  • Developed descriptive and inferential statistics for logistics optimization, average hours per job, and value throughput data at a 95% confidence level.
  • Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using HBase-Hive integration.
  • Created SQL tables with referential integrity and developed advanced queries using stored procedures and functions in SQL Server Management Studio.
  • Used Pandas, NumPy, seaborn, SciPy, Matplotlib, Scikit-learn, NLTK in Python for developing various machine learning algorithms and utilized machine learning algorithms such as linear regression, multivariate regression, Naive Bayes, Random Forests, K-means, & KNN for data analysis.
  • Used packages like dplyr, tidyr and ggplot2 in RStudio for data visualization and generated scatter plots and high-low graphs to identify relationships between variables.
  • Worked on business forecasting, segmentation analysis and data mining, and prepared management reports defining the problem, documenting the analysis and recommending courses of action to determine the best outcomes.
  • Worked with various statistical methods such as DOE, hypothesis testing, survey testing and queuing theory.
  • Experience with risk analysis, root cause analysis, cluster analysis, correlation and optimization, and the K-means algorithm for clustering data into groups.
  • Coordinated with data scientists and senior technical staff to identify client needs and document assumptions.

Environment: SQL Server, Jupyter, Python, MATLAB, SSRS, SSIS, SSAS, MongoDB, HBase, HDFS, Hive, Pig, Microsoft Office, SQL Server Management Studio, Business Intelligence Development Studio, MS Access.

Sr. Data Scientist/Data Analyst

Confidential, Stamford, CT

Responsibilities:

  • Performed data exploration to analyze patterns and select features using Python SciPy.
  • Built Factor Analysis and Cluster Analysis models using Python SciPy to classify customers into different target groups.
  • Designed an A/B experiment for testing the business performance of the new recommendation system.
  • Supported MapReduce Programs running on the cluster.
  • Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
  • Participated in Data Acquisition with Data Engineer team to extract historical and real-time data by using Hadoop MapReduce and HDFS.
  • Presented default customer profiles, reports built with Python and Tableau, analytical results, and strategic implications to senior management for strategic decision-making.
  • Developed Python scripts to automate the customer query addressing system, which decreased the time to resolve customer queries by 45%.
  • Collaborated with other functional teams across the Risk and Non-Risk groups to use standard methodologies and ensure a positive customer experience throughout the customer journey.
  • Performed data enrichment jobs to handle missing values, normalize data, and select features.
  • Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Analyzed the partitioned and bucketed data and computed various metrics for reporting.
  • Extracted data from Twitter using Java and the Twitter API. Parsed JSON-formatted Twitter data and uploaded it to a database.
  • Developed Hive queries for analysis and exported the result set from Hive to MySQL using Sqoop after processing the data.
  • Created HBase tables to store various data formats of data coming from different portfolios.
  • Worked on improving performance of existing Pig and Hive Queries.
  • Created reports and dashboards using D3.js and Tableau to explain and communicate data insights, significant features, model scores and performance of the new recommendation system to both technical and business teams.
  • Utilized SQL, Excel, and several Marketing/Web Analytics tools (Google Analytics, Bing Ads, AdWords, AdSense, Criteo, Smartly, SurveyMonkey, and Mailchimp) to complete business and marketing analysis and assessment.
  • Used Git 2.x for version control with the Data Engineer team and Data Scientist colleagues.
  • Used Agile methodology and SCRUM process for project development.
  • Conducted knowledge transfer (KT) sessions with the client to understand their various data management systems and their data.
  • Created metadata and a data dictionary for future data use/data refresh for the same client.
  • Structured the data marts to store and organize the customer's data.
  • Ran SQL scripts and created indexes and stored procedures for data analysis.
  • Applied data lineage methodology for data mapping and maintaining data quality.
  • Prepared Scripts in Python and Shell for Automation of administration tasks.
  • Maintained PL/SQL objects like packages, triggers, procedures etc.
  • Mapped the flow of trade cycle data from source to target and documented it.
  • Performed QA on the data extracted, transformed, and exported to Excel.
  • Participated in all phases of data mining: data collection, data cleaning, model development, validation and visualization, and performed gap analysis.
  • Extracted data from HDFS and prepared data for exploratory analysis using data munging.
  • Built models using statistical techniques like Bayesian HMM and Machine Learning classification models like XGBoost, SVM, and Random Forest.
  • Conducted a highly immersive Data Science program involving Data Manipulation & Visualization, Web Scraping, Machine Learning, Python programming, SQL, Git, Unix Commands, NoSQL, MongoDB, Hadoop.
  • Used pandas, numpy, seaborn, scipy, matplotlib, scikit-learn, NLTK in Python for developing various machine learning algorithms.
  • Worked with different data formats such as JSON and XML and applied machine learning algorithms in Python.

Environment: ER Studio, MDM, Git, Unix, Python (SciPy, NumPy, Pandas, StatsModels, Plotly, NLTK), MySQL, Excel, Google Cloud Platform, Tableau, SVM, Random Forests, Naïve Bayes Classifier, regression, logistic regression, A/B experiment, Agile/SCRUM, MLlib, SAS, Hadoop, HDFS, MapReduce, NoSQL, Teradata, OLTP, OLAP, ODS, JSON, XML

Data Scientist / Data Analyst / Machine Learning

Confidential

Responsibilities:

  • Built scalable and deployable machine learning models.
  • Utilized Sqoop to ingest real-time data. Used analytics libraries scikit-learn, MLlib and MLxtend.
  • Extensively used Python's multiple data science packages like Pandas, NumPy, matplotlib, Seaborn, SciPy, Scikit-learn and NLTK.
  • Performed Exploratory Data Analysis, trying to find trends and clusters.
  • Built models using techniques like regression, tree-based ensemble methods, time series forecasting, KNN, clustering and Isolation Forest.
  • Worked on data that was a combination of unstructured and structured data from multiple sources and automated the cleaning using Python scripts.
  • Extensively performed large data reads/writes to and from CSV and Excel files using pandas.
  • Tasked with maintaining RDDs using Spark SQL.
  • Communicated and coordinated with other departments to collect business requirements.
  • Tackled a highly imbalanced fraud dataset using undersampling with ensemble methods, oversampling and cost-sensitive algorithms.
  • Improved fraud prediction performance by using random forest and gradient boosting for feature selection with Python Scikit-learn.
  • Implemented machine learning models (logistic regression, XGBoost) with Python Scikit-learn.
  • Optimized the algorithm with stochastic gradient descent. Fine-tuned the algorithm parameters with manual tuning and automated tuning such as Bayesian optimization.
  • Developed a technical brief based on the business brief, containing the detailed steps and stages of developing and delivering the project, including timelines.
  • After client sign-off on the technical brief, started developing the SAS code.
  • Wrote the data validation SAS code using the UNIVARIATE and FREQ procedures.
  • Summarized the data at customer level by joining customer transaction, dimension and third-party datasets.
  • Separately calculated the KPIs for Target and Mass campaigns for the pre-promo-post periods with respect to their transactions, spend and visits.
  • Also measured the KPIs at MoM (Month on Month), QoQ (Quarter on Quarter) and YoY (Year on Year) with respect to pre-promo-post.
  • Measured the ROI based on the differences in pre-promo-post KPIs.
  • Extensively used SAS procedures like IMPORT, EXPORT, SORT, FREQ, MEANS, FORMAT, APPEND, UNIVARIATE, DATASETS and REPORT.
  • Standardized the data with the help of PROC STANDARD.
  • Implemented cluster analysis (PROC CLUSTER and PROC FASTCLUS) iteratively.
  • Worked extensively with data governance team to maintain data models, Metadata, and dictionaries.
  • Used Python to preprocess data and attempt to find insights.
  • Iteratively rebuilt models to handle changes in data, refining them over time.
  • Created and published multiple dashboards and reports using Tableau Server.
  • Extensively used SQL queries for legacy data retrieval jobs.
  • Tasked with migrating the Django database from MySQL to PostgreSQL.
  • Gained expertise in Data Visualization using matplotlib, Bokeh and Plotly.
  • Responsible for maintaining and analyzing large datasets used by domain experts to analyze risk.
  • Developed Hive queries that compared new incoming data against historic data. Built tables in Hive to store large volumes of data.
  • Used the big data tool Spark (Spark SQL) to conduct real-time analysis of credit card fraud on AWS.
  • Performed Data audit, QA of SAS code/projects and sense check of results.

Environment: Spark, Hadoop, AWS, SAS Enterprise Guide, SAS/MACROS, SAS/ACCESS, SAS/STAT, SAS/SQL, Oracle, MS Office, Python (scikit-learn, pandas, NumPy), Machine Learning (logistic regression, XGBoost), Gradient Descent algorithm, Bayesian optimization, Tableau.

Data Analyst

Confidential

Responsibilities:

  • Developed and implemented predictive models using Natural Language Processing Techniques and machine learning algorithms such as linear regression, classification, multivariate regression, Naive Bayes, Random Forests, K-means clustering, KNN, PCA and regularization for data analysis.
  • Designed and developed Natural Language Processing models for sentiment analysis.
  • Applied clustering algorithms such as hierarchical and K-means clustering with the help of scikit-learn and SciPy.
  • Developed visualizations and dashboards using ggplot, Tableau.
  • Worked on development of data warehouse, Data Lake and ETL systems using relational and non-relational tools like SQL, NoSQL.
  • Built and analyzed datasets using R, SAS, MATLAB, and Python.
  • Participated in all phases of data mining: data collection, data cleaning, model development, validation and visualization, and performed gap analysis.
  • Performed data manipulation and aggregation from different sources using Nexus, Toad, BusinessObjects, Power BI and SmartView.
  • Implemented Agile Methodology for building an internal application.
  • Good knowledge of Hadoop architecture and its components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and Secondary NameNode, and of MapReduce concepts.
  • As architect, delivered various complex OLAP databases/cubes, scorecards, dashboards, and reports.
  • Programmed a utility in Python that used multiple packages (scipy, numpy, pandas).
  • Implemented Classification using supervised algorithms like Logistic Regression, Decision trees, KNN, Naïve Bayes.
  • Used Teradata 15 utilities such as FastExport and MultiLoad (MLOAD) for handling various tasks of data migration/ETL from OLTP source systems to OLAP target systems.
  • Supported the testing team with system testing, integration testing and UAT.
  • Involved in preparation & design of technical documents like Bus Matrix Document, PPDM Model, and LDM & PDM.
  • Understood client business problems and analyzed the data using appropriate statistical models to generate insights.

Environment: Erwin, Tableau, MDM, QlikView, MLlib, PL/SQL, Teradata, JSON, Hadoop (HDFS), MapReduce, Pig, Spark, RStudio, Mahout, Java, Hive, AWS.
