Data Scientist Resume

WA

SUMMARY

  • Qualified Data Scientist with 5+ years of experience in Data Science and Analytics, including Machine Learning, Data Mining, and Statistical Analysis
  • Involved in all phases of the data science project life cycle, including data extraction, data cleaning, statistical modeling, and data visualization, with large sets of structured and unstructured data
  • Knowledge of Apache Spark and of developing data processing and analysis algorithms using Python
  • Understanding of machine learning (“ML”) concepts and application of algorithms in non-academic environments
  • Strong software development background in functional and object-oriented programming
  • Experienced in developing machine learning algorithms using Python
  • Ability to manipulate, transform, and analyze abstract data structures such as DataFrames
  • Fundamental understanding of machine learning concepts, including model training and the interpretation of precision and recall in real-world settings
  • Strong programming experience in R, Python, and MATLAB
  • Performed collection, cleansing, and verification of structured and unstructured data with R and Python
  • Experience with visualization tools such as Tableau 9 and 10 for creating dashboards
  • Excellent understanding of Agile and Scrum development methodologies
  • Used version control tools such as Git
  • Passionate about gleaning insightful information from massive data assets and developing a culture of sound, data-driven decision making
  • Ability to maintain a fun, casual, professional and productive team atmosphere

TECHNICAL SKILLS

Machine Learning: classification, regression, clustering, feature engineering, deep learning, neural networks

Programming Languages: Python (pandas, scikit-learn, numpy, scipy, GraphLab Create), R, SQL, Hadoop (MapReduce, Sqoop, Flume), Scala, Java, Spark (PySpark, MLlib)

Operating Systems: Linux (CentOS, Ubuntu, Kali Linux), Windows, macOS

Software Tools: PyCharm, Jupyter Notebook, RStudio, Tableau, Microsoft Office, Eclipse IDE

PROFESSIONAL EXPERIENCE

Confidential, WA

Data Scientist

Responsibilities:

  • Designed and developed various machine learning frameworks using Python, R, and MATLAB.
  • Collaborated with data engineers to implement ETL processes; wrote and optimized SQL queries to extract data from the cloud and merge data from Oracle 12c.
  • Performed data manipulation in R with packages such as dplyr, data.table, and lubridate, and used ggplot2 for visualization.
  • Performed data integrity checks, data cleaning, exploratory analysis, and feature engineering using R 3.4.0.
  • Analyzed customer consumption behavior and assessed customer value with RFM (recency, frequency, monetary) analysis; segmented customers with clustering algorithms such as K-Means and Hierarchical Clustering.
  • Developed personalized product recommendations with machine learning algorithms, including collaborative filtering and gradient boosted trees, to better meet the needs of existing customers and acquire new customers.
  • Identified outliers with box plots and K-Means clustering using Pandas and NumPy
  • Participated in feature engineering, such as feature intersection generation, feature normalization, and label encoding, with Scikit-learn preprocessing
  • Used Python 3.0 (numpy, scipy, pandas, scikit-learn, seaborn, NLTK) and Spark 1.6 / 2.0 (PySpark, MLlib) to develop a variety of models and algorithms for analytic purposes.
  • Coordinated the execution of A/B tests to measure the effectiveness of the personalized recommendation system.
  • Performed data visualization with Tableau 10 and generated dashboards to present the findings.
  • Recommended and evaluated marketing approaches based on quality analytics of customer consumption behavior.
  • Determined customer satisfaction and helped enhance customer experience using NLP.
  • Used Git 2.6 to apply version control. Tracked changes in files and coordinated work on the files among multiple team members.

Environment: R (dplyr, data.table, lubridate, ggplot2), MATLAB, MongoDB, exploratory analysis, feature engineering, K-Means Clustering, Hierarchical Clustering, Machine Learning (Gradient Boosting Tree, NLP), Python (numpy, scipy, pandas, scikit-learn, NLTK), Spark (MLlib, PySpark), Tableau, Git
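
The customer segmentation work described above (RFM analysis followed by K-Means clustering) could look roughly like the sketch below. It is an illustration only, not the code used in this role: the column names (customer_id, date, amount), the toy transactions, and the choice of two clusters are all assumptions made for the example.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def rfm_table(tx: pd.DataFrame, as_of: pd.Timestamp) -> pd.DataFrame:
    """Build one row per customer: recency (days since last purchase),
    frequency (number of purchases), monetary (total spend)."""
    grouped = tx.groupby("customer_id")
    return pd.DataFrame({
        "recency": (as_of - grouped["date"].max()).dt.days,
        "frequency": grouped["date"].count(),
        "monetary": grouped["amount"].sum(),
    })


def segment(rfm: pd.DataFrame, n_clusters: int = 2, seed: int = 0) -> pd.DataFrame:
    """Standardize the RFM features, then assign a K-Means cluster label."""
    scaled = StandardScaler().fit_transform(rfm)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return rfm.assign(segment=model.fit_predict(scaled))


# Hypothetical toy transactions, used only to exercise the functions.
transactions = pd.DataFrame({
    "customer_id": ["a", "a", "a", "b", "c", "c"],
    "date": pd.to_datetime([
        "2024-06-01", "2024-06-15", "2024-06-30",
        "2024-01-10", "2024-05-20", "2024-06-25",
    ]),
    "amount": [20.0, 35.0, 50.0, 15.0, 200.0, 180.0],
})

rfm = rfm_table(transactions, as_of=pd.Timestamp("2024-07-01"))
segments = segment(rfm, n_clusters=2)
print(segments)
```

Standardizing before clustering keeps the monetary column from dominating the distance metric simply because its values are larger.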

Confidential, MN

Data Scientist

Responsibilities:

  • Communicated and coordinated with other departments to collect business requirements
  • Worked on missing value imputation and outlier identification with statistical methodologies using Pandas and NumPy
  • Participated in feature engineering such as feature creation, feature scaling, and one-hot encoding with Scikit-learn
  • Tackled a highly imbalanced fraud dataset using undersampling with ensemble methods, oversampling, and cost-sensitive algorithms
  • Improved fraud prediction performance by using random forest and gradient boosting for feature selection with Python Scikit-learn
  • Implemented machine learning models (logistic regression, XGBoost) with Python Scikit-learn
  • Optimized the algorithm with stochastic gradient descent
  • Fine-tuned the algorithm parameters with manual tuning and automated tuning such as Bayesian Optimization
  • Validated and selected models using k-fold cross-validation and confusion matrices, and optimized models for a high recall rate
  • Implemented ensemble models with majority voting to enhance efficiency and performance
  • Designed rich data visualizations with Tableau 9.4

Environment: Python (scikit-learn, pandas, NumPy), Machine Learning (logistic regression, XGBoost), Gradient Descent algorithm, Bayesian optimization, Tableau
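
As a rough illustration of the validation approach listed above (cost-sensitive learning on an imbalanced dataset, k-fold cross-validation, a confusion matrix, and recall), a minimal sketch follows. The data is synthetic and the model settings are assumptions for the example; it does not reproduce the fraud dataset or the models from this role.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, recall_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic stand-in for an imbalanced dataset: about 5% positive class.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0
)

# class_weight="balanced" is one cost-sensitive option: errors on the
# rare class are weighted more heavily during training.
model = LogisticRegression(class_weight="balanced", max_iter=1000)

# Stratified folds keep the class ratio roughly constant in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Each sample is predicted by a model that did not see it in training.
y_pred = cross_val_predict(model, X, y, cv=cv)

cm = confusion_matrix(y, y_pred)   # rows: true class, columns: predicted class
recall = recall_score(y, y_pred)   # true positives / all actual positives
print(cm)
print(f"recall on the minority class: {recall:.3f}")
```

Recall is the natural metric to watch when a missed positive (for example, an undetected fraud case) costs more than a false alarm.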

Confidential, NY

Data Scientist

Responsibilities:

  • Built analytics systems and data structures, and gathered and manipulated data, using statistical techniques and predictive modeling to tell the people story.
  • Designed a suite of interactive dashboards that made it possible to scale and measure HR department statistics, which was not possible earlier, and scheduled and published reports.
  • Created data presentations to reduce bias and tell the true story of people by pulling millions of rows of data using SQL and analyzing the data.
  • Worked on machine learning to compare HR data metrics closely

Environment: R, Python (pandas, numpy, scikit-learn), Machine Learning (predictive modeling), SQL, Tableau

Confidential

Big Data Developer

Responsibilities:

  • Involved in loading, transforming, and analyzing data from various sources into HDFS (Hadoop Distributed File System) using Hive, Pig, and Sqoop.
  • Handled data from different datasets, joining and preprocessing them using Pig join operations.
  • Worked on a Pig script to count the number of times a particular URL was opened in a particular duration.
  • Developed Pig UDFs for the needed functionality, such as a custom Pig loader known as the timestamp loader.
  • Created Hive tables based on the business requirements.
  • Used Pig scripts and Hive queries to analyze large data sets.
  • Scheduled and managed jobs on a Hadoop cluster using Oozie workflows.
  • Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using HBase-Hive integration.
  • Developed custom MapReduce programs and User Defined Functions (UDFs) in Hive to transform large volumes of data according to business requirements.
  • Involved in creating Hive tables and working on them using HiveQL.
  • Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
  • Implemented frameworks using Java and Python to automate the ingestion flow.
  • Loaded and transformed large sets of structured and semi-structured data.
  • Exported filtered data into HBase for fast querying.
  • Handled real-time streaming data from different sources using Flume, with HDFS set as the destination.
  • Involved in the installation, configuration, and use of Hadoop ecosystem components such as MapReduce, HDFS, Pig, Hive, Flume, and HBase.

Environment: Hadoop, HDFS, Hive, Pig, Sqoop, HBase, MapReduce, Flume, UDFs
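
The URL-counting task mentioned above was done with a Pig script on Hadoop. The same logic (filter records to a time window, then group by URL and count) can be sketched in plain Python for illustration. The record layout of (timestamp, url) pairs, the function name, and the sample log are assumptions, and this is not the original Pig script.

```python
from collections import Counter
from datetime import datetime


def count_url_hits(records, start, end):
    """Count hits per URL for records whose timestamp falls in [start, end)."""
    return Counter(url for ts, url in records if start <= ts < end)


# Hypothetical log records: (timestamp, url).
log = [
    (datetime(2024, 1, 1, 9, 0), "/home"),
    (datetime(2024, 1, 1, 9, 5), "/help"),
    (datetime(2024, 1, 1, 9, 30), "/home"),
    (datetime(2024, 1, 1, 11, 0), "/home"),  # outside the window below
]

hits = count_url_hits(
    log,
    start=datetime(2024, 1, 1, 9, 0),
    end=datetime(2024, 1, 1, 10, 0),
)
print(hits["/home"], hits["/help"])
```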

Confidential

Python Developer

Responsibilities:

  • Analyzed customer Help data, contact volumes, and other operational data in MySQL to provide insights that enable improvements to Help content and customer experience.
  • Independently developed, implemented, and managed a data operation platform that reduced unnecessary repetitive operations in the company's routine work and improved working efficiency across departments.
  • Introduced and implemented updated analytical methods such as regression modeling, classification trees, statistical tests, and data visualization techniques with Python
  • Deployed machine learning models built using Mahout on a Hadoop cluster
  • Maintained and updated existing automated solutions.
  • Analyzed historical demand, filtered out outliers/exceptions, identified the most appropriate statistical forecasting algorithm, developed the base plan, analyzed variance, proposed improvement opportunities, and incorporated demand signals into the forecast; produced data visualizations using the plotly package in Python.
  • Improved data collection and distribution processes by using the pandas and numpy packages in Python while enhancing reporting capabilities to provide a clear line of sight into key performance trends and metrics.
  • Interacted with QA to develop test plans from high-level design documentation

Environment: Python (pandas, numpy, scipy, plotly, scikit-learn, matplotlib), MySQL, Hadoop (HDFS, Mahout), algorithms (regression, classification)
