
Data Scientist Resume

Bothell, WA


  • Over 5 years of experience in data manipulation, wrangling, model building, and visualization with large data sets.
  • An analytical and detail-oriented data science professional with a proven record of success in collecting and manipulating large datasets.
  • Demonstrated expertise in decisive leadership and in delivering research-based, data-driven solutions that move an organization's vision forward.
  • Highly competent at researching, visualizing, and analyzing raw data to identify recommendations for meeting organizational challenges.
  • Proven excellence in personal management and program development.
  • Ability to perform Data preparation and exploration to build the appropriate machine learning model.
  • Proficient in Statistical Modeling and Machine Learning techniques in Predictive Analytics, Segmentation methodologies, Regression based models, Hypothesis testing, PCA, Ensembles.
  • Expertise in Machine Learning models such as Linear and Logistic Regression, Decision Trees, Naive Bayes, SVM, Neural Networks, K-Nearest Neighbors, and clustering (K-means, Hierarchical).
  • Implement and practice Machine learning techniques on structured and unstructured data with equal proficiency.
  • Experience with Data Analytics, Data Reporting, Ad-hoc Reporting, Graphs, Scales, Pivot Tables.
  • Ability to use dimensionality reduction techniques and regularization techniques.
  • Highly skilled in using visualization tools like Matplotlib, ggplot2 and Seaborn for creating dashboards.
  • Experience working with Big Data tools such as Hadoop (HDFS and MapReduce), Hive, Sqoop, and Apache Spark (PySpark).
  • Experience working with RDBMS such as SQL Server and MySQL, and with NoSQL databases such as MongoDB, Cassandra, and HBase.
  • Experience in importing and exporting data from different RDBMS like MySQL, Oracle, and SQL Server into HDFS and Hive using Sqoop.
  • Good knowledge of scalable, secure cloud architecture based on Amazon Web Services (AWS cloud services: EC2, EMR, and S3).
  • Strong communication skills and a professional attitude; able to work under pressure and support the team with enthusiasm.


Programming: Python, R, Scala

Python: Data Manipulation, NumPy, Pandas, Matplotlib, Seaborn, Plotly, Scikit-learn (machine learning libraries and others)

Big Data: Hadoop, MapReduce, HDFS, Hive, Kafka, Pig, Oozie, Flume, Sqoop, Impala, Spark

Spark: Spark Core, Spark SQL, Spark Streaming, MLlib, GraphX, PySpark, DataFrame

Platforms: Ubuntu, Linux, macOS

Analytical Tools: SQL, Jupyter Notebook, Apache Zeppelin, MS Excel

Methodologies: Agile, Scrum, Software Development Life Cycle (SDLC)

NoSQL: MongoDB, Cassandra, HBase

Others: AWS, S3, EC2, EMR, MySQL, PostgreSQL


Confidential - Bothell, WA

Data Scientist


  • Studied the business, the problem statement, and the manual approaches the company had followed for years
  • Gathered the required data from multiple sources, such as the data warehouse and the billing department
  • Involved in creating a Data Lake by extracting the customer's Big Data from various data sources into Hadoop HDFS
  • This included data from Excel, flat files, RDBMS, SQL Server, and HBase, as well as log data from servers
  • Performed data cleaning and transformations with Pandas and NumPy to prepare the data for modeling
  • Performed transformations of data using Spark and Hive to generate the final dataset to be consumed by analytical applications
  • Performed Exploratory Data Analysis (EDA)
  • Participated in feature engineering, including feature generation, PCA, and feature normalization with Scikit-learn preprocessing
  • Analyzed data and predicted end customer behaviors and product performance by applying machine learning algorithms using Spark MLlib
  • Experimented and built predictive models using Logistic regression, Decision Tree, Support Vector Machine and KNN to predict customer churn
  • Evaluated model performance using the confusion matrix, precision, and recall
  • Developed a logistic regression model with 61 percent accuracy

Environment: HDFS, Hive, Sqoop, Spark, Spark MLlib, SQL, Excel, MongoDB, Python 3 (Scikit-learn/SciPy/NumPy/Pandas/Matplotlib/Seaborn), Machine Learning (Logistic Regression/Random Forests/KNN/K-Means Clustering/PCA)

Confidential - San Francisco, CA

Data Scientist


  • Responsible for researching and developing the action plan required for the development of the model
  • Participated in data acquisition with the data engineering team to extract historical and real-time data using Sqoop, Spark, Hive, Kafka, MapReduce, and HDFS
  • Worked with datasets of varying size and complexity, including both structured and unstructured data
  • Performed data integrity checks, data cleansing, exploratory analysis, and feature engineering using Python and data visualization packages such as Matplotlib and Seaborn
  • Utilized data wrangling tools and advanced statistical/machine learning techniques to create high-performing predictive models and actionable insights to address business objectives and client needs
  • Used various metrics (RMSE, MAE, F-Score, ROC and AUC) to evaluate the performance of each model
  • Used the big data tool Spark (PySpark, Spark SQL, MLlib) to conduct real-time analysis of customer behavior
  • Communicated effectively with internal stakeholders on product design, data specifications, and model implementations; with partners on collaboration ideas and specifics; and with clients and account teams on project and test results
  • Recommended and evaluated marketing approaches based on quality analytics on customer behavior
  • Designed rich data visualizations to model data into human-readable form with Seaborn and Matplotlib

Environment: Hadoop, Spark, HDFS, Hive, MongoDB, Cassandra, Kafka, Sqoop, SQL, Python 3 (Scikit-learn/SciPy/NumPy/Pandas/Matplotlib/Seaborn), Machine Learning (Logistic Regression/Random Forests/KNN/K-Means Clustering/PCA)

Confidential - Dublin, CA

Big Data Engineer


  • Responsible for data engineering functions such as data extraction, ingestion, and transformation
  • Imported data from various data sources, extracted data from RDBMS into HDFS using Sqoop, and performed transformations using Hive and MapReduce
  • Optimized Hive pipelines in the data lake by implementing partitioning and bucketing to improve performance
  • Exported the analyzed data to the RDBMS using Sqoop for visualization and to generate reports for the BI team
  • Stored the transformed data in HBase and MongoDB, and also in Parquet file format
  • Worked closely with data scientists to assist with feature engineering, model training frameworks, and model deployments at scale
  • Worked with application developers and DBAs to diagnose and resolve query performance problems
  • Collaborated with Marketing, Finance, Business Development, Product & other teams to help them uncover the insights from the data

Environment: HDFS, Pig, Hive, MapReduce, Linux, HBase, Flume, Sqoop, R, VMware, Cloudera, Python, MongoDB, Cassandra, MySQL

Confidential - Emeryville, CA

Data Engineer


  • Worked on a 30-node Hadoop cluster with 50 TB capacity
  • Extracted and loaded customer data from databases into HDFS and Hive tables using Sqoop
  • Loaded the data into Hive managed tables using partitions and buckets
  • Performed data transformations, cleaning and filtering, using Hive and Pig
  • Analyzed and studied customer behavior by running Hive queries
  • Stored the transformed data in Parquet, SequenceFile, and Avro file formats
  • Worked closely with the business and analytics teams to gather system requirements
  • Documented day-to-day tasks

Environment: Hadoop, HDFS, YARN, MapReduce, Hive, Pig, Sqoop, Linux, Python

Confidential - Mountain View, CA

Jr. SQL Developer


  • Worked closely with all teams within the organization to understand business processes, gather requirements, understand complexities and end goals to come up with the best plan of execution
  • Created database objects like Tables, Indexes, Stored Procedures, Views, User Defined Functions, Cursors and Triggers.
  • Developed reports using SQL Server Reporting Services (SSRS)
  • Assisted managers and business analysts in developing reports, presentations, and analysis for upper management

Environment: Python, MySQL, SQL
