Data Scientist Resume

Seattle, WA

SUMMARY:

  • Data Scientist with over 12 years of overall experience, including 5 years in statistical solutions, mathematical modeling, and machine learning.
  • Proficient in Statistical Modeling and Machine Learning techniques for Forecasting / Predictive Analytics, Segmentation methodologies, Regression-based models, and Hypothesis testing.
  • Involved in the entire data science project life cycle, including Data Acquisition, Data Cleansing, Data Manipulation, Feature Engineering, Modeling, Evaluation, Optimization, Testing, and Deployment.
  • Proficient with Python, including NumPy, Scikit-learn, Pandas, Matplotlib, and Seaborn.
  • Hands-on with data analytics, OLAP reporting, and machine learning models such as Linear and Logistic Regression, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, K-Means and Hierarchical Clustering, and Bayesian methods.
  • Strong experience in wrangling very large data sets to understand and identify patterns.
  • Proficient in SQL, with hands-on experience in Spark SQL, HiveQL, Pig, and PySpark.
  • Hands-on experience in Scala, Python, JavaScript, Java.
  • Experienced in Amazon Web Services (AWS) such as EC2, EMR, S3, RDS, and Redshift, and in Confidential Azure services such as Web and Mobile Apps, Azure Functions, Storage, Cognitive Services, and Data Lake.
  • Proficient in requirement gathering, writing, analysis, estimation, use case review, scenario preparation, test planning and strategy decision making.
  • 7+ years of experience working in an Agile environment.
  • Strong business sense and abilities to communicate data insights to both technical and non-technical stakeholders.

TECHNICAL SKILLS:

Programming: Python, Java, Scala, .Net, Spark, Spark SQL

Business Intelligence Tools: Power BI, MS Excel - Analytical Solver

Databases: SQL Server, DB2, Oracle 11g

Machine Learning: Linear, Lasso, and Ridge Regression, Logistic Regression, Random Forest, Support Vector Machine, Neural Networks, Decision Tree, Time Series Analysis, PCA & Factor Analysis, Clustering, Text Mining, Collaborative Filtering, Naïve Bayes

PROFESSIONAL EXPERIENCE:

Confidential, Seattle, WA

Data Scientist

Responsibilities:

  • Responsible for data aggregation, data pre-processing, missing value imputation, data enrichment, and end-user data quality.
  • Developed efficient and intelligent data pipelines using Spark for Bing.com.
  • Architected the data pipeline for ingesting and processing multi-million-record feeds into Bing (an illustrative sketch follows this list).
  • Used machine learning techniques to conflate data from various providers and implement multi-level rankers to serve search queries.
  • Developed several dashboards that handle huge data sets for business reports and quality control of Bing local data.
  • Implemented text-mining from user feedback to automatically enrich data.
  • Designed and deployed a classifier for identifying junk data and businesses closed in the real world, with accuracy above 90% (an illustrative sketch follows this list).
  • Achieved the goal of bringing the quality of top entities across 4 markets up from ~80% to 98% in just 14 months, which directly contributed to higher customer satisfaction.
  • Produced high-quality datasets that are used to train many new models.
  • Hands-on with deploying cloud services and REST API endpoints on Azure.
  • Mentored new hires and managed a team of vendors.
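
A minimal PySpark sketch of the kind of ingestion and conflation pipeline described in the bullets above. The paths, column names, and dedup rule are illustrative assumptions, not the actual production design.

    # Illustrative PySpark sketch; paths and schema are hypothetical.
    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.window import Window

    spark = SparkSession.builder.appName("local-entity-ingest").getOrCreate()

    # Ingest multi-million-record provider feeds (placeholder path).
    raw = spark.read.json("adl://datalake/providers/*.json")

    # Basic cleansing: drop records missing a name or location.
    clean = raw.filter(F.col("name").isNotNull() & F.col("latitude").isNotNull())

    # Conflate duplicates per entity: keep only the freshest record.
    w = Window.partitionBy("entity_id").orderBy(F.col("last_updated").desc())
    deduped = (clean
               .withColumn("rn", F.row_number().over(w))
               .filter(F.col("rn") == 1)
               .drop("rn"))

    deduped.write.mode("overwrite").parquet("adl://datalake/conflated_entities")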
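A hedged scikit-learn sketch of a junk/closed-listing classifier in the spirit of the classifier bullet above; the input file, text column, and label are assumptions made for illustration.

    # Illustrative scikit-learn sketch; file, columns, and labels are assumed.
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    df = pd.read_csv("entities_labeled.csv")  # hypothetical labeled set

    X_train, X_test, y_train, y_test = train_test_split(
        df["entity_text"], df["is_junk_or_closed"], test_size=0.2, random_state=42)

    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=5),
                        LogisticRegression(max_iter=1000))
    clf.fit(X_train, y_train)

    print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))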

Environment: Python, Spark, .Net, JavaScript, Azure Data Lake, Hadoop, Hive, T-SQL, U-SQL

Confidential, CA

Data Scientist / Sr. Software Engineer

Responsibilities:

  • Collaborated with data engineers and the operations team to implement the ETL process, and wrote and optimized SQL queries to perform data extraction to fit the analytical requirements.
  • Used NLP for topic segmentation of products and sentiment analysis of customers to optimize occasional offers (an illustrative topic-modeling sketch follows this list).
  • Performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from Redshift.
  • Explored and analyzed the customer specific features by using Spark SQL.
  • Performed data imputation using Scikit-learn package in Python.
  • Participated in feature engineering such as feature intersection generation, feature normalization, and label encoding with Scikit-learn preprocessing.
  • Conducted analysis assessing customer purchasing behavior and discovering customer value with RFM analysis; applied customer segmentation with clustering algorithms such as K-Means Clustering and Hierarchical Clustering (an illustrative sketch follows this list).
  • Built regression models including Lasso, Ridge, SVR, and XGBoost to predict Customer Lifetime Value.
  • Built classification models including Logistic Regression, SVM, Decision Tree, and Random Forest to predict Customer Churn Rate (an illustrative sketch follows this list).
  • Used F-score, AUC/ROC, confusion matrix, MAE, and RMSE to evaluate model performance.
  • Designed and implemented a recommendation system that used collaborative filtering techniques to recommend courses to different customers, and deployed it to an AWS EMR cluster (an illustrative ALS sketch follows this list).
  • Designed rich data visualizations to model data into human-readable form with Tableau and Matplotlib.
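
A hedged sketch of topic segmentation with TF-IDF and NMF, one common way to implement the NLP bullet above; the reviews file and column name are placeholders.

    # Illustrative topic-segmentation sketch; input file and column are assumed.
    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import NMF

    reviews = pd.read_csv("customer_reviews.csv")["review_text"]

    vec = TfidfVectorizer(max_features=5000, stop_words="english")
    tfidf = vec.fit_transform(reviews)

    # Factor the corpus into a small number of product topics.
    nmf = NMF(n_components=8, random_state=42).fit(tfidf)

    terms = vec.get_feature_names_out()
    for k, comp in enumerate(nmf.components_):
        top = [terms[i] for i in comp.argsort()[-8:][::-1]]
        print(f"topic {k}: {', '.join(top)}")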
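A minimal RFM and K-Means segmentation sketch corresponding to the customer-segmentation bullet; the transactions file and column names are assumptions.

    # Illustrative RFM + K-Means sketch; the transactions schema is assumed.
    import pandas as pd
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans

    tx = pd.read_csv("transactions.csv", parse_dates=["order_date"])
    snapshot = tx["order_date"].max()

    # Recency, Frequency, Monetary per customer.
    rfm = tx.groupby("customer_id").agg(
        recency=("order_date", lambda d: (snapshot - d.max()).days),
        frequency=("order_id", "nunique"),
        monetary=("amount", "sum"))

    scaled = StandardScaler().fit_transform(rfm)
    rfm["segment"] = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(scaled)
    print(rfm.groupby("segment").mean())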
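A hedged sketch of the churn-model comparison and the evaluation metrics listed above; the feature file and target column are placeholders.

    # Illustrative churn-model comparison; feature file and target are assumed.
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import f1_score, roc_auc_score, confusion_matrix

    df = pd.read_csv("churn_features.csv")
    X, y = df.drop(columns=["churned"]), df["churned"]
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)

    for name, model in [("logistic_regression", LogisticRegression(max_iter=1000)),
                        ("random_forest", RandomForestClassifier(n_estimators=300))]:
        model.fit(X_tr, y_tr)
        pred = model.predict(X_te)
        proba = model.predict_proba(X_te)[:, 1]
        print(name, "F1:", f1_score(y_te, pred), "AUC:", roc_auc_score(y_te, proba))
        print(confusion_matrix(y_te, pred))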
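A minimal Spark ML ALS sketch of the collaborative-filtering recommender mentioned above; the ratings path and column names are assumptions.

    # Illustrative Spark ML ALS recommender; path and columns are assumed.
    from pyspark.sql import SparkSession
    from pyspark.ml.recommendation import ALS

    spark = SparkSession.builder.appName("course-recommender").getOrCreate()
    ratings = spark.read.parquet("s3://bucket/course_ratings")  # placeholder

    als = ALS(userCol="customer_id", itemCol="course_id", ratingCol="rating",
              rank=20, regParam=0.1, coldStartStrategy="drop")
    model = als.fit(ratings)

    # Top-5 course recommendations per customer.
    model.recommendForAllUsers(5).show(truncate=False)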

Environment: Python, Spark, R, JavaScript, Spark Streaming, Spark ML

Confidential

Data Analyst

Responsibilities:

  • Gathered and translated business requirements into detailed, production-level technical specifications, new features, and enhancements to existing technical business functionality.
  • Performed data analysis and reporting using MS PowerPoint, MS Access, and SQL Assistant.
  • Generated periodic reports based on the statistical analysis of the data using SQL Server Reporting Services.
  • Used advanced Confidential Excel to create pivot tables, used VLOOKUP and other Excel functions.
  • Worked on CSV files while trying to get input from the MySQL database.
  • Created functions, triggers, views and stored procedures using MySQL.
  • Worked on database testing, wrote complex SQL queries to verify the transactions and business logic.
  • Worked on data profiling and data validation to ensure the accuracy of the data between the warehouse and source systems (an illustrative reconciliation sketch follows this list).
  • Developed SQL scripts for creating tables, Sequences, Triggers, views and materialized views.
  • Compiled data from various public and private databases to perform complex analysis and data manipulation for actionable results.
  • Used and maintained database in MS SQL Server to extract the data inputs from internal systems.
  • Interacted with the Client and documented the Business Reporting needs to analyze the data.
  • Used SAS for pre-processing data, SQL queries, data analysis, generating reports, graphics, and statistical analyses.
  • Migrated databases from legacy SQL Server systems to Oracle.
  • Performed data analysis and statistical analysis and generated reports, listings, and graphs using SAS tools: SAS/Base, SAS/Macros, SAS/Graph, SAS/SQL, SAS/Connect, and SAS/Access.
  • Developed data mapping documentation to establish relationships between source and target tables including transformation processes using SQL.
  • Followed the Waterfall methodology across the different phases of the software development life cycle.
  • Participated in all phases of data mining, data collection, data cleaning, model development, validation, and visualization, and performed gap analysis.
  • Demonstrated experience in design and implementation of Statistical models, Predictive models, enterprise data model, metadata solution and data life cycle management.
  • Performed extensive data cleansing and analysis using pivot tables, formulas (VLOOKUP and others), data validation, conditional formatting, and graph and chart manipulation in Excel.
  • Created pivot tables and charts using worksheet data and external resources, modified pivot tables, sorted items and group data, and refreshed and formatted pivot tables.
  • Developed SQL Queries to fetch complex data from different tables in remote databases using joins, database links and Bulk collects.
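
A hedged pandas sketch of the kind of source-versus-warehouse reconciliation mentioned above; the connection strings and table names are placeholders, not the actual systems.

    # Illustrative reconciliation sketch; connections and tables are assumed.
    import pandas as pd
    from sqlalchemy import create_engine

    source = create_engine("mysql+pymysql://user:pass@source-host/sales")
    warehouse = create_engine("mssql+pyodbc://user:pass@dw-host/dw"
                              "?driver=ODBC+Driver+17+for+SQL+Server")

    src = pd.read_sql(
        "SELECT order_date, COUNT(*) AS n FROM orders GROUP BY order_date", source)
    dw = pd.read_sql(
        "SELECT order_date, COUNT(*) AS n FROM fact_orders GROUP BY order_date", warehouse)

    # Flag any load dates where the row counts disagree.
    diff = src.merge(dw, on="order_date", suffixes=("_src", "_dw"))
    print(diff[diff["n_src"] != diff["n_dw"]])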

Environment: Erwin 9.0, SDLC, MS PowerPoint, MS Access, MS SQL Server 2008, SAS, Oracle 11g, Confidential Excel
