Data Scientist Resume

Los Angeles, CA

PROFESSIONAL SUMMARY:

  • 8+ years of experience in Data Analysis, Data Profiling, Data Integration, Migration, Data Governance, Metadata Management, Master Data Management, and Configuration Management.
  • Extensive experience in Text Analytics, developing different Statistical, Machine Learning, Data Mining solutions to various business problems and generating data visualizations using R, Python.
  • Hands-on experience with Spark MLlib utilities, including classification, regression, clustering, collaborative filtering, and dimensionality reduction.
  • Hands-on experience in implementing LDA and Naïve Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis, with good knowledge of Recommender Systems.
  • Proficient in the complete Data Science stack: analyzing the business scenario, data analysis using Exploratory Data Analysis techniques, data pre-processing, dimensionality reduction and feature engineering, model building and evaluation, and interpreting and reporting results for data-driven decision making.
  • Hands-on knowledge across diverse domains: Health Insurance, Marketing, Customer Analytics, Digital Security, and Logistics.
  • Proficient in designing and implementing Data Science solutions using a range of tools and technologies: Python (Scikit-Learn, Pandas, NumPy), TensorFlow, SQL, Apache Spark, Spark MLlib, and Microsoft Azure HDInsight.
  • Expertise in major Machine Learning algorithms, implemented from scratch in Python: Regression (Linear, Ridge, Lasso, and Elastic Net), Classification (Decision Trees, Logistic Regression, Naïve Bayes, k-nearest neighbors), Ensemble methods (Bagging, AdaBoost, Functional Gradient Boosting), and Clustering (k-means, Mixture Models).
  • Expertise in advanced Deep Learning architectures - Artificial Neural Networks, Recurrent Neural Networks (Long Short-Term Memory, Gated Recurrent Units), and Convolutional Neural Networks - and in Deep Learning model tuning techniques (Dropout, Xavier Initialization, Gradient Checking, Batch Normalization) through implementation in TensorFlow and Python.
  • Proficient in optimization techniques: Gradient Descent (Stochastic, Mini-Batch, with Momentum), RMSProp, and Adam optimization (see the sketch after this list).
  • Proficient in Deep Learning for Natural Language Processing: pre-processing text data (cleaning, annotation, normalization), Word2Vec embeddings, language and sequence modeling with LSTMs, and machine translation with attention-based networks.
  • Hands-on implementation of Machine Learning models in the cloud using Microsoft Azure.
  • Proficient in interpreting and reporting results to top management and clients to assist in data driven decision making.
  • Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
  • Strong experience in Software Development Life Cycle (SDLC) including Requirements Analysis, Design Specification and Testing as per Cycle in both Waterfall and Agile methodologies.
  • Adept in statistical programming languages like R and Python, as well as Big Data technologies like Hadoop and Hive.
  • Skilled in using dplyr in R and pandas in Python for performing exploratory data analysis.
  • Experience in designing Star and Snowflake schemas for Data Warehouse and ODS architectures.
  • Experience in designing visualizations using Tableau and publishing and presenting dashboards and storylines on web and desktop platforms.
  • Experience with Data Analytics, Data Reporting, Ad-hoc Reporting, Graphs, Scales, Pivot Tables and OLAP reporting.
  • Highly skilled in using visualization tools like Tableau, ggplot2 and d3.js for creating dashboards.
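
A minimal sketch of one of the optimization techniques listed above (mini-batch gradient descent with momentum, applied to linear regression), assuming NumPy only; the data, learning rate, and batch size are illustrative:

```python
# Mini-batch gradient descent with momentum for linear regression.
# Synthetic data; hyperparameters (lr, beta, batch) are illustrative.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=1000)

w = np.zeros(3)                     # model weights
v = np.zeros(3)                     # momentum (velocity) term
lr, beta, batch = 0.05, 0.9, 64
for epoch in range(50):
    idx = rng.permutation(len(y))   # shuffle before each epoch
    for start in range(0, len(y), batch):
        b = idx[start:start + batch]
        grad = 2 * X[b].T @ (X[b] @ w - y[b]) / len(b)  # gradient of MSE
        v = beta * v + (1 - beta) * grad                # momentum update
        w -= lr * v
print("estimated weights:", w)      # should approach true_w
```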

TECHNICAL SKILLS:

Big Data/Hadoop Technologies: Hadoop, HDFS, YARN, MapReduce, Hive, Pig, Impala, Sqoop, Flume, Spark, Kafka, Zookeeper, and Oozie

Languages: C, C++, HTML5, DHTML, WSDL, CSS3, XML, R/R Studio, SAS Enterprise Guide, SAS, R (caret, Weka, ggplot2), Python (NumPy, SciPy, Pandas), SQL, PL/SQL, Pig Latin, HiveQL, Shell Scripting.

Cloud Computing Tools: Amazon AWS, Azure.

Databases: Microsoft SQL Server 2008, MySQL 4.x/5.x, Oracle 10g/11g/12c, DB2, Teradata, Netezza

NoSQL Databases: HBase, Cassandra, MongoDB, MariaDB

Build Tools: Maven, ANT, Toad, SQL Loader, RTC, RSA, Control-M, Oozie, Hue, SOAP UI

Development Tools: Microsoft SQL Studio, Eclipse, NetBeans, IntelliJ

Development Methodologies: Agile/Scrum, Waterfall, UML, Design Patterns

Version Control Tools and Testing: Git, GitHub, SVN, and JUnit

ETL Tools: Informatica PowerCenter, SSIS

Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio/Outlook), Crystal Reports XI, SSRS, Cognos 7.0/6.0.

Operating Systems: All versions of UNIX, Windows, Linux, macOS, Sun Solaris

PROFESSIONAL EXPERIENCE:

Confidential, Los Angeles, CA

Data Scientist

Responsibilities:

  • Worked closely with business, data governance, SMEs and vendors to define data requirements.
  • Created deep learning models to detect various objects.
  • Designed the prototype of the data mart and documented possible outcomes from it for end users.
  • Analyzed various data aspects to understand user behavior.
  • Developed and maintained a data dictionary to create metadata reports for technical and business purposes.
  • Developed various QlikView data models by extracting and using data from various sources: files, DB2, Excel, flat files, and big data platforms.
  • Collected data from various sources and collaborated on exploratory data analysis (EDA) and visualization.
  • Built prediction models using Linear and Ridge Regression for predicting future customers based on historical data; developed the model with ~3 million data points and evaluated it with F-score and adjusted R-squared.
  • Built customer profiling models using k-means and k-means++ clustering to enable targeted marketing; developed the model with ~1.4 million data points and used the elbow method with sum of squared errors to find the optimal value of k (see the sketch after this list).
  • Designed and implemented a probabilistic churn prediction model with ~80k customer records to predict the probability of customer churn using Logistic Regression in Python; the client used the results to finalize the list of customers offered a discount.
  • Implemented dimensionality reduction using Principal Component Analysis and k-fold cross validation as part of Model Improvement.
  • Implemented Pearson’s Correlation and Maximum Variance techniques to find the key predictors for the Regression models.
  • Data analysis using Exploratory Data Analysis techniques in Python and R, including:
  • Generating Univariate and Multivariate graphical plots.
  • Correlation Analysis (chi-square and Pearson correlation test)
  • Coordinated with onsite actuaries, senior management, and the client to interpret and report results, assisting in the incorporation of results into business scenarios.
  • Implemented various big data pipelines to build machine learning models.
  • Analyzed customer behavior on the product to find the root causes of problems.
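
A minimal sketch of the elbow-method selection of k described above, assuming scikit-learn; the feature matrix is a synthetic stand-in for the actual customer records:

```python
# Elbow method: fit k-means for a range of k and compare the sum of
# squared errors (inertia). Data here is synthetic and illustrative.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(300, 4))
               for c in (0.0, 3.0, 6.0)])   # three synthetic clusters
X = StandardScaler().fit_transform(X)       # scale features before clustering

for k in range(1, 11):
    km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=42)
    km.fit(X)
    # Pick k at the "elbow", where the marginal drop in SSE flattens out.
    print(f"k={k}: SSE={km.inertia_:.1f}")
```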

Environment: Python, Pandas, NumPy, SQL, IBM DB2 stored procedures, Tableau, Hadoop, PL/SQL, etc.

Confidential, Los Angeles, CA

Data Scientist

Responsibilities:

  • Extracted data from HDFS and prepared it for exploratory analysis using data mining techniques.
  • Built models using Statistical techniques like Bayesian HMM and Machine Learning classification models like XGBoost, SVM, and Random Forest.
  • Participated in a highly immersive data science program involving data manipulation and visualization, web scraping, machine learning, and Python programming.
  • Set up storage and data analysis tools in Amazon Web Services cloud computing infrastructure.
  • Used pandas, numpy, seaborn, scipy, matplotlib, scikit-learn, NLTK in Python for developing various machine learning algorithms.
  • Worked on different data formats such as JSON, XML and performed machine learning algorithms in Python.
  • Participated in all phases of data mining: data collection, data cleaning, developing models, validation, and visualization; also performed gap analysis.
  • Good knowledge of Hadoop architecture and components such as HDFS, Job Tracker, and Task Tracker.
  • Programmed a utility in Python that used multiple packages (SciPy, NumPy, Pandas).
  • Implemented Classification using supervised algorithms like Logistic Regression, Decision trees, KNN, Naïve Bayes.
  • Experience in Hadoop ecosystem components like Hadoop Map Reduce, HDFS, HBase, Hive, Sqoop, Pig, Flume including their installation and configuration.
  • Updated Python scripts to match training data with our database stored in AWS Cloud Search, so that we would be able to assign each document a response label for further classification.
  • Performed data transformation from various sources, data organization, and feature extraction from raw and stored data.
  • Validated the machine learning classifiers using ROC curves and lift charts (see the sketch after this list).
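
A minimal sketch of the ROC-based validation mentioned above, assuming scikit-learn; the dataset is a synthetic stand-in:

```python
# Validate a classifier with an ROC curve and its AUC.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]   # probability of the positive class

fpr, tpr, thresholds = roc_curve(y_te, scores)
print(f"AUC = {roc_auc_score(y_te, scores):.3f}")
```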

Environment: Unix, Python 3.5, MLlib, SAS, regression, logistic regression, Hadoop 2.7, NoSQL, Teradata, OLTP, random forest, OLAP, HDFS, ODS, NLTK, SVM, JSON, XML, and MapReduce.

Confidential, Los Angeles, CA

Data Scientist

Responsibilities:

  • Designed, implemented, and evaluated predictive models to predict the rate quoted by carriers for customers and shipping contractors.
  • Built a dynamic pricing model using Recurrent Neural Networks in TensorFlow, considering temporal features such as local and destination weather, fuel prices, produce season, time of day/month/year, and order seasonality (see the sketch after this list).
  • Built a prediction model using deep artificial neural networks in TensorFlow for static features: lane, equipment type, dead time, headhaul vs. backhaul, etc.
  • Implemented Xavier initialization and dropout regularization for tuning the models.
  • Implemented ANOVA and forward selection based on adjusted R-squared to select the top features.
  • Developed various QlikView data models by extracting and using data from various sources: files, DB2, Excel, flat files, and big data platforms.
  • Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
  • Interaction with Business Analyst, SMEs, and other Data Architects to understand Business needs and functionality for various project solutions
  • Researched, evaluated, architected, and deployed new tools, frameworks, and patterns to build sustainable Big Data platforms for the clients.
  • Designed both 3NF data models for ODS, OLTP systems and dimensional data models using Star and Snowflake Schemas.
  • Collaborated on the data mapping document from source to target and on the data quality assessments for the source data.
  • Applied expert-level understanding of different databases for data extraction and loading, joining data extracted from different databases and loading it into a target database.
  • Created PL/SQL packages and Database Triggers and developed user procedures and prepared user manuals for the new programs.
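
A minimal sketch of the kind of recurrent model described above, assuming TensorFlow 2.x/Keras; the sequence length, feature count, and layer sizes are hypothetical, and "glorot_uniform" is Keras's name for Xavier initialization:

```python
# RNN (LSTM) regression model with dropout and Xavier/Glorot initialization.
import tensorflow as tf

TIMESTEPS, N_FEATURES = 30, 8   # hypothetical sequence length / feature count

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, dropout=0.2,
                         kernel_initializer="glorot_uniform",
                         input_shape=(TIMESTEPS, N_FEATURES)),
    tf.keras.layers.Dense(32, activation="relu",
                          kernel_initializer="glorot_uniform"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1),   # single predicted rate
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```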

Environment: Erwin r9.0, Informatica 9.0, Hadoop, TensorFlow, Python, Apache Spark, Pandas, NumPy, matplotlib, Windows 10

Confidential, Farmington, MI

Data Analyst

Responsibilities:

  • Responsible for applying machine learning techniques (regression and classification) to predict outcomes.
  • Responsible for design and development of advanced R/Python programs to prepare, transform, and harmonize data sets for modeling.
  • Developed large data sets from structured and unstructured data and performed data mining.
  • Partnered with modelers to develop data frame requirements for projects.
  • Performed Ad-hoc reporting/customer profiling, segmentation using R/Python.
  • Tracked various campaigns, generating customer profiling analyses and performing data manipulation.
  • Provided R/SQL programming, with detailed direction, in the execution of data analysis that contributed to the final project deliverables. Responsible for data mining.
  • Analyzed large datasets to answer business questions by generating reports and outcome- driven marketing strategies.
  • Used Python to apply time series models, identifying fast-growth opportunities for our clients (see the sketch after this list).
  • Involved in fixing bugs and minor enhancements for the front-end modules.
  • Created class models, sequence diagrams, and activity diagrams for the SDLC process of the application.
  • Supported the testing team in system testing, integration testing, and UAT.
  • Conducted Design reviews and Technical reviews with other project stakeholders.
  • Was a part of the complete life cycle of the project from the requirements to the production support.
  • Involved in loading data from RDBMS and weblogs into HDFS using Sqoop and Flume.
  • Worked on loading the data from MySQL to HBase where necessary using Sqoop.
  • Developed Hive queries for Analysis across different banners.
  • Extracted data from Twitter using Java and the Twitter API; parsed JSON-formatted Twitter data and uploaded it to the existing system's database.
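
A minimal sketch of a time series model of the kind mentioned above, assuming statsmodels; the monthly series is synthetic and the ARIMA order is illustrative:

```python
# Fit an ARIMA model to a synthetic monthly series and forecast ahead.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(7)
idx = pd.date_range("2015-01-01", periods=120, freq="MS")   # monthly data
y = pd.Series(100 + 0.5 * np.arange(120) + rng.normal(0, 2, 120), index=idx)

model = ARIMA(y, order=(1, 1, 1)).fit()   # AR(1), first difference, MA(1)
print(model.forecast(steps=6))            # six months ahead
```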

Environment: R 3.0, Erwin 9.5, Tableau 8.0, MDM, QlikView, MLlib, PL/SQL, HDFS, Teradata 14.1, JSON, MapReduce, MySQL, Spark, RStudio.

Confidential

Research Analyst

Responsibilities:

  • Created datasets per the approved specifications; collaborated with project teams to complete scientific reports and reviewed reports to ensure accuracy and clarity.
  • Applied Agile Scrum methodology to implement the project life cycle of report design and development.
  • Implemented Functional Gradient Boosting from scratch in Python and compared the model with scikit-learn's, achieving equal accuracy of 98.3%; also implemented Relational Functional Gradient Boosting (RFGB), a state-of-the-art statistical relational learning algorithm, in Python (see the sketch after this list).
  • Developed and integrated a GUI in Java for human-in-the-loop machine learning, to impart advice to the learning algorithms (RFGB and transfer learning) and to model Boosted Relational Dependency Networks for probabilistic inference on the data.
  • Implemented Tree Boosting for Relational Imitation Learning (TIBRIL) in Python with an AUC-PR of 93.5%.
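
A minimal sketch of functional gradient boosting in the from-scratch spirit described above, assuming scikit-learn only for the depth-limited regression trees used as weak learners; the data and hyperparameters are illustrative:

```python
# Functional gradient boosting for squared loss: each round fits a small
# tree to the negative gradient (the residuals) and adds it to the ensemble.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, n_rounds=100, lr=0.1, max_depth=3):
    pred = np.full(len(y), y.mean())    # initial constant model
    trees = []
    for _ in range(n_rounds):
        residuals = y - pred            # negative gradient of squared loss
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        pred += lr * tree.predict(X)
        trees.append(tree)
    return y.mean(), trees

def predict(init, trees, X, lr=0.1):
    return init + lr * sum(t.predict(X) for t in trees)

# Illustrative usage on synthetic data.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)
init, trees = gradient_boost(X, y)
print("train MSE:", np.mean((predict(init, trees, X) - y) ** 2))
```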

Environment: SQL Server, Python, Pandas, Numpy, matplotlib, Java, Windows 10.

Confidential

Data Analyst

Responsibilities:

  • Developed business process models using MS Visio, creating case diagrams and flow diagrams to show the required flow of steps.
  • Worked with other teams to analyze customers and marketing parameters.
  • Used MS Excel, MS Access and SQL to write and run various queries.
  • Used a traceability matrix to trace the organization's requirements.
  • Recommended structural changes and enhancements to systems and databases.

Environment: UNIX, SQL, Oracle, MS Office, MS Visio
