Data Scientist Resume

SUMMARY

Over 5 years of experience in data manipulation, wrangling, model building, and visualization with large data sets.
An analytical, detail-oriented data science professional with a proven record of success in the collection and manipulation of large datasets.
Demonstrated expertise in decisive leadership and in delivering research-based, data-driven solutions that move an organization's vision forward.
Highly competent at researching, visualizing, and analyzing raw data in order to identify recommendations for meeting organizational challenges.
Proven excellence in personal management and program development.
Ability to perform data preparation and exploration to build the appropriate machine learning model.
Proficient in statistical modeling and machine learning techniques, including predictive analytics, segmentation methodologies, regression-based models, hypothesis testing, PCA, and ensembles.
Expertise in machine learning models such as Linear and Logistic Regression, Decision Trees, Naive Bayes, SVM, Neural Networks, K-Nearest Neighbors, and clustering (K-means, Hierarchical).
Apply machine learning techniques to both structured and unstructured data with equal proficiency.
Experience with Data Analytics, Data Reporting, Ad-hoc Reporting, Graphs, Scales, Pivot Tables.
Ability to use dimensionality reduction techniques and regularization techniques.
Highly skilled in using visualization tools like Matplotlib, ggplot2 and Seaborn for creating dashboards.
Experience working with Big Data tools such as Hadoop - HDFS and MapReduce, Hive, Sqoop, and Apache Spark (PySpark).
Experience working with RDBMS such as SQL Server, MySQL and NoSQL databases such as MongoDB, Cassandra, HBase.
Experience in importing and exporting data between RDBMS such as MySQL, Oracle, and SQL Server and HDFS and Hive using Sqoop.
Good knowledge of scalable, secure cloud architecture on Amazon Web Services (AWS), including EC2, EMR, and S3.
Strong communication skills and a professional attitude; able to work under pressure and contribute with enthusiasm.

TECHNICAL SKILLS

Programming: Python, R, Scala

Python: Data Manipulation, NumPy, Pandas, Matplotlib, Seaborn, Plotly, scikit-learn (machine learning libraries and others)

Big Data: Hadoop, MapReduce, HDFS, Hive, Kafka, Pig, Oozie, Flume, Sqoop, Impala, Spark

Spark: Spark Core, Spark SQL, Spark Streaming, MLlib, GraphX, PySpark, DataFrame

Platforms: Ubuntu, Linux, macOS

Analytical Tools: SQL, Jupyter Notebook, Apache Zeppelin, MS Excel

Methodologies: Agile, Scrum, Software Development Life Cycle (SDLC)

NoSQL: MongoDB, Cassandra, HBase

Others: AWS, S3, EC2, EMR, MySQL, PostgreSQL

PROFESSIONAL EXPERIENCE

Confidential

Data Scientist

Responsibilities:

Studied the business, the problem statement, and the manual approaches the company had followed for years
Gathered the required data from multiple sources, such as the data warehouse and the billing department
Involved in creating a data lake by extracting the customer's big data from various sources into Hadoop HDFS, including Excel, flat files, RDBMS, SQL Server, HBase, and server log data
Performed data cleaning and transformations to prepare the data for modeling using Pandas and NumPy
Performed transformations of data using Spark and Hive to generate the final dataset to be consumed by analytical applications
Performed Exploratory Data Analysis (EDA)
Participated in feature engineering, including feature generation, PCA, and feature normalization with scikit-learn preprocessing
Analyzed data and predicted end customer behaviors and product performance by applying machine learning algorithms using Spark MLlib
Experimented with and built predictive models using Logistic Regression, Decision Tree, Support Vector Machine, and KNN to predict customer churn
Evaluated model performance using confusion matrix, precision, and recall
Developed a logistic regression model with 61 percent accuracy

Environment: HDFS, Hive, Sqoop, Spark, Spark MLlib, SQL, Excel, MongoDB

Confidential - San Francisco, CA

Data Scientist

Responsibilities:

Responsible for researching and developing the action plan required for the development of the model
Participated in data acquisition with the data engineering team to extract historical and real-time data using Sqoop, Spark, Hive, Kafka, MapReduce, and HDFS
Worked with datasets of varying size and complexity, including both structured and unstructured data
Performed data integrity checks, data cleansing, exploratory analysis, and feature engineering using Python and data visualization packages such as Matplotlib and Seaborn
Utilized data wrangling tools and advanced statistical/machine learning techniques to create high-performing predictive models and actionable insights to address business objectives and client needs
Used various metrics (RMSE, MAE, F-score, ROC, and AUC) to evaluate the performance of each model
Used the big data tool Spark (PySpark, Spark SQL, MLlib) to conduct real-time analysis of customer behavior
Communicated effectively with internal stakeholders on product design, data specifications, and model implementations; with partners on collaboration ideas and specifics; and with clients and account teams on project and test results
Recommended and evaluated marketing approaches based on quality analytics on customer behavior
Designed rich data visualizations to model data into human-readable form with Seaborn and Matplotlib

Environment: Hadoop, Spark, HDFS, Hive, MongoDB, Cassandra, Kafka, Sqoop, SQL, Python 3 (scikit-learn/SciPy/NumPy/Pandas/Matplotlib/Seaborn), Machine Learning (Logistic Regression/Random Forests/KNN/K-Means Clustering/PCA)

Confidential - San Francisco, CA

Big Data Engineer

Responsibilities:

Responsible for data engineering functions such as data extraction, ingestion, and transformation
Imported data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from RDBMS into HDFS using Sqoop
Optimized Hive pipelines in the data lake by implementing partitioning and bucketing to improve performance
Exported the analyzed data to the RDBMS using Sqoop for visualization and to generate reports for the BI team
Stored the transformed data in HBase and MongoDB, and also in Parquet file format
Worked closely with data scientists to assist with feature engineering, model training frameworks, and model deployments at scale
Worked with application developers and DBAs to diagnose and resolve query performance problems
Collaborated with Marketing, Finance, Business Development, Product & other teams to help them uncover the insights from the data

Environment: HDFS, Pig, Hive, MapReduce, Linux, HBase, Flume, Sqoop, R, VMware, Cloudera, Python, MongoDB, Cassandra, MySQL

Confidential - Emeryville, CA

Data Engineer

Responsibilities:

Worked on a 30-node Hadoop cluster with 50 TB capacity
Extracted customer data from databases and loaded it into HDFS and Hive tables using Sqoop
Loaded data into Hive managed tables using partitions and buckets
Performed data transformations, cleaning and filtering, using Hive and Pig
Analyzed and studied customer behavior by running Hive queries
Stored the transformed data in Parquet, SequenceFile, and Avro file formats
Worked closely with the business and analytics teams to gather system requirements
Documented day-to-day tasks

Environment: Hadoop, HDFS, YARN, MapReduce, Hive, Pig, Sqoop, Linux, Python

Confidential

Jr. SQL Developer

Responsibilities:

Worked closely with all teams within the organization to understand business processes, gather requirements, understand complexities and end goals to come up with the best plan of execution
Created database objects such as tables, indexes, stored procedures, views, user-defined functions, cursors, and triggers
Developed reports using SQL Server Reporting Services (SSRS)
Assisted managers and business analysts in developing reports, presentations, and analysis for upper management

Environment: Python, MySQL, SQL
