
Data Scientist Resume


Miami, FL

SUMMARY

  • 5+ years of experience in compiling statistical and personnel reports, identifying patterns within data, analyzing data, and interpreting results.
  • Strong ability to analyze sets of data for signals, patterns, ways to group data to answer questions and solve complex data puzzles.
  • Proficient in Data Acquisition, Storage, Analysis, Integration, Predictive Modeling, Logistic Regression, Decision Trees, Data Mining Methods, Forecasting, Factor Analysis, Cluster Analysis, Neural Networks and other advanced statistical and econometric techniques.
  • Adept in writing R and T-SQL scripts to manipulate data for data loads and extracts.
  • Proficient in data entry, data auditing, creating data reports & monitoring data for accuracy.
  • Skilled in web search, data collection, and web data mining, including extracting data from websites, data entry, and data processing.
  • Extensive experience creating MapReduce jobs, running SQL on Hadoop using Hive, building ETL with Pig scripts, and using Flume to transfer unstructured data to HDFS.
  • Strong Oracle/SQL Server programming skills, with experience in working with functions, packages and triggers.
  • Experience in all phases of Data warehouse development from Requirements, analysis, design, development, testing and post production support.
  • Strong in-depth knowledge in doing data analysis, data quality and source system analysis.
  • Independent, enthusiastic team player with strong adaptability to new technologies.

TECHNICAL SKILLS

Development: R, Python.

Concepts: Regression (Linear, Polynomial, Decision tree, Random Forest, SVR), Classification (Logistic regression, KNN, SVM, Decision tree, Random Forest), Text Mining, Sentiment analysis, Topic Modeling, Clustering, Neural Networks, Time series forecast, Business Process Mining.

Databases: Oracle, MySQL.

Tools: R studio, Azure ML, SAS Enterprise Miner.

PROFESSIONAL EXPERIENCE

Confidential, Miami, FL

Data Scientist

Responsibilities:

  • Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, R, and a broad variety of machine learning methods including classification, regression, and dimensionality reduction; used the resulting engine to increase user lifetime by 45% and triple user conversions for target categories.
  • Participated in feature engineering such as feature-interaction generation, feature normalization, and label encoding with scikit-learn preprocessing.
  • Used Python 3.X (numpy, scipy, pandas, scikit-learn, seaborn) and Spark2.0 (PySpark, MLlib) to develop variety of models and algorithms for analytic purposes.
  • Developed and implemented predictive models using machine learning algorithms such as linear regression, classification, multivariate regression, Naive Bayes, Random Forests, K-means clustering, KNN, PCA and regularization for data analysis.
  • Ensured solution and technical architectures were documented and maintained, set standards, and offered consultative advice to technical and management teams; recommended the roadmap and approach for implementing the data integration architecture, with cost, schedule, and effort estimates.
  • Designed and developed NLP models for sentiment analysis.
  • Led discussions with users to gather business processes requirements and data requirements to develop a variety of Conceptual, Logical and Physical Data Models. Expert in Business Intelligence and Data Visualization tools: Tableau, Microstrategy.
  • Developed and evangelized best practices for statistical analysis of Big Data.
  • Designed and implemented system architecture for Amazon EC2 based cloud-hosted solution for client.
  • Designed the Enterprise Conceptual, Logical, and Physical Data Model for a Bulk Data Storage System using Embarcadero ER/Studio; the data models were designed in 3NF.
  • Worked on machine learning on large size data using Spark and MapReduce.
  • Collaborated with data engineers and operation team to implement ETL process, wrote and optimized SQL queries to perform data extraction to fit the analytical requirements.
  • Performed data analysis by using Hive to retrieve the data from Hadoop cluster, SQL to retrieve data from RedShift.
  • Explored and analyzed the customer specific features by using SparkSQL.
  • Performed data imputation using Scikit-learn package in Python.
  • Led the implementation of new statistical algorithms and operators on Hadoop and SQL platforms, utilizing optimization techniques, linear regression, K-means clustering, Naive Bayes, and other approaches.
  • Developed Spark/Scala, SAS and R programs for regular expression (regex) project in the Hadoop/Hive environment with Linux/Windows for big data resources.
  • Conducted analysis of customer consumption behavior and discovered customer value with RFM analysis; applied customer segmentation with clustering algorithms such as K-Means Clustering and Hierarchical Clustering.
  • Built regression models including Lasso, Ridge, SVR, and XGBoost to predict customer lifetime value.
  • Built classification models including Logistic Regression, SVM, Decision Tree, and Random Forest to predict customer churn rate.
  • Used F-score, AUC/ROC, confusion matrix, MAE, and RMSE to evaluate and compare model performance.
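The evaluation metrics named in the bullets above (confusion matrix, F-score) can be sketched in plain Python. The churn labels below are illustrative, not taken from any actual project:

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, where 1 = positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def f_score(y_true, y_pred, beta=1.0):
    """F-beta score from precision and recall; beta=1 gives the usual F1."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical churn labels: 1 = churned, 0 = retained
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(confusion_counts(y_true, y_pred))  # (3, 1, 1, 3)
print(round(f_score(y_true, y_pred), 2))  # 0.75
```

In practice `sklearn.metrics` (`confusion_matrix`, `f1_score`, `roc_auc_score`) provides these directly; the hand-rolled versions just make the definitions explicit.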

Environment: AWS RedShift, EC2, EMR, Hadoop Framework, S3, HDFS, Spark (Pyspark, MLlib, Spark SQL), Python 3.x (Scikit-Learn/Scipy/Numpy/Pandas/Matplotlib/Seaborn), Tableau Desktop (9.x/10.x), Tableau Server (9.x/10.x), Machine Learning (Regressions, KNN, SVM, Decision Tree, Random Forest, XGboost, LightGBM, Collaborative filtering, Ensemble), Teradata, Git 2.x, Agile/SCRUM

Confidential, Tampa, FL

Data Scientist

Responsibilities:

  • Worked on an AML transaction-monitoring project. Compliance programs depend on accurate and timely information; AML compliance centers on sifting through thousands of transactions and matching them against risk profiles, producing a focused examination of transactions and identification of suspicious activity.
  • Performed data extraction and analysis to develop business process mining model using BupaR.
  • Built forecast models using Prophet that improved planning and productivity by 25%.
  • Created Anomaly detection model, and auto answering chatbot to improve task management which was highly recognized by top management.
  • Developed SLA prediction models to help teams enhance work quality well before deadline.
  • Migrated legacy SPSS models to an R framework, improving execution speed and accuracy by 20%.
  • Developed sophisticated data models to support automated reporting and analytics.
  • Analyzed & processed complex data sets using advanced query, visualization and analytics tools.
  • Collaborated with teams to develop and support internal data platform, ongoing analyses of client behavior and business outcomes, deployment of models on R server.
  • Performed data integrity checks, data cleaning, exploratory analysis, and feature engineering.
  • Developed personalized products recommendation with Machine Learning algorithms including collaborative filtering and Gradient Boosting Tree to meet the needs of existing customers and acquire new customers.
  • Coordinated the execution of A/B tests to measure the effectiveness of personalized recommendation system.
  • Recommended and evaluated marketing approaches based on analytics of customer consumption behavior.
  • Implemented Dynamic Time Warping for time series classification.
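Dynamic Time Warping, mentioned in the last bullet, aligns two series that may be shifted or stretched in time. A minimal sketch of the standard dynamic-programming recurrence (illustrative sequences, not project data):

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    INF = float("inf")
    # dp[i][j] = minimal cost of aligning a[:i] with b[:j]
    dp = [[INF] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            dp[i][j] = cost + min(dp[i - 1][j],      # stretch a
                                  dp[i][j - 1],      # stretch b
                                  dp[i - 1][j - 1])  # step both
    return dp[n][m]

# A time-shifted copy of a series aligns at zero cost under DTW
print(dtw_distance([0, 1, 2, 1, 0], [0, 0, 1, 2, 1, 0]))  # 0.0
print(dtw_distance([0, 1, 2], [0, 1, 4]))                  # 2.0
```

For classification, the DTW distance is typically plugged into a k-nearest-neighbors classifier in place of Euclidean distance.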

Environment: R, Oracle 12c, Tableau.

Confidential

Data Scientist

Responsibilities:

  • The team of Data analysts focused on providing analytics insights and decision support tools for executives for accurate demand planning and task allocation.
  • Identified, measured and recommended improvement strategies for KPIs across all business areas.
  • Assisted in defining, implementing, and utilizing business metrics calculations and methodologies.
  • Managed a team of Research, Development and Analysis (RDA) professionals for 1 year.
  • Assisted in demand planning by delivering the accurate forecasts and allocation plans for tickets.
  • Provided analytical support to underwriting and pricing by preparing and analyzing data to be used in actuarial calculations.
  • Designed dashboards with Tableau and Meteor JS, providing complex reports including summaries, charts, and graphs to interpret findings for the team and stakeholders.
  • Identified process improvements that significantly reduce workloads or improve quality.
  • Worked for BI & Analytics team to conduct A/B testing, data extraction and exploratory analysis.
  • Generated dashboards and presented the analysis to researchers explaining insights on the data.
  • Improved a probabilistic prediction model used by the client to forecast raw-material prices for the following year.
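The A/B testing work mentioned above typically reduces to comparing conversion rates between two groups. A minimal two-proportion z-test sketch using only the standard library; the sample sizes and conversion counts below are hypothetical:

```python
import math

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (pooled variance)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF via erf
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical experiment: 10.0% vs 13.0% conversion, 2000 users per arm
z, p = two_proportion_ztest(200, 2000, 260, 2000)
print(round(z, 2), p < 0.05)  # 2.97 True
```

Production analyses would more likely use `statsmodels.stats.proportion.proportions_ztest`, but the arithmetic is the same.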

Environment: Excel 2010, R, MS SQL Server.

Confidential

Data Analyst

Responsibilities:

  • The team of developers and consultants worked across multiple projects to implement OLAP and OLTP systems, data modeling, system migration from an old data warehouse, and system maintenance.
  • Built data pipelines from multiple data sources by performing necessary ETL tasks.
  • Performed exploratory data analysis using R and Apache Spark, including text analysis and tf-idf analysis.
  • Worked on Data Cleaning, features scaling, features engineering.
  • Handled natural language processing to extract features from text data.
  • Visualized bigram networks to investigate the importance of individual terms.
  • Built a forecasting model to predict future sales for anti-diabetes vaccines in global market.
  • Built multiple time-series models like ARIMA, ARIMAX (Dynamic Regression), TBATS, ETS.
  • Evaluated model performance on multiple test metrics such as MAPE, MAE, and MASE.
  • Developed a shiny app to highlight Bayesian analysis and performed visualizations with ggplot2.
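The bigram-network bullet above starts from counting adjacent word pairs across documents. A minimal extraction sketch (the documents are illustrative, not project data); the counts would then feed a network visualization such as ggraph in R:

```python
from collections import Counter

def bigram_counts(docs):
    """Count adjacent lowercase word pairs (bigrams) across a list of documents."""
    counts = Counter()
    for doc in docs:
        words = doc.lower().split()
        counts.update(zip(words, words[1:]))
    return counts

docs = [
    "time series forecast models",
    "time series models like arima",
]
print(bigram_counts(docs).most_common(1))  # [(('time', 'series'), 2)]
```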

Environment: Oracle, SQL.
