Data Scientist Resume
Hamilton, NJ
PROFESSIONAL SUMMARY
- Data Scientist with 8 years of experience in Machine Learning; data mining on large structured and unstructured data sets; data acquisition, validation, predictive modeling, and visualization; web crawling and scraping; programming languages such as R and Python; and Big Data technologies such as Hadoop and Hive.
- Deep understanding of Statistical Analysis & Modelling, Algorithms and Multivariate Analysis. Familiar with model selection, testing, comparison and validation.
- Experienced in applying supervised, unsupervised, and reinforcement machine learning (ML) techniques in building models.
- Worked on large sets of structured, semi-structured, and unstructured data.
- Statistical: Descriptive statistics, Distance measures, Hypothesis testing, Chi-Square, ANOVA.
- Experienced in Linear/Logistic Regression, Random Forest, Decision Trees, CART, Naive Bayes, Association Mining, K-Means, and hierarchical clustering.
- Experienced in implementing Factor Analysis and Principal Component Analysis Dimension reduction techniques.
- Applied Support Vector Machines (SVM) and Artificial Neural Networks (ANN) in building models.
- Working knowledge of current techniques and approaches in natural language processing (NLP).
- Developed time series analyses that guide business stakeholders in determining key strategies.
- Capable of solving problems and producing flexible solutions using analytical and creative skills.
- Pruned rules extracted from decision trees using techniques such as cost-complexity pruning to improve accuracy and reduce overfitting.
- Implemented forward selection, backward elimination, and stepwise approaches to select the most significant independent variables.
- Evaluated model performance using RMSE, confusion matrices, ROC curves, cross-validation, and A/B testing in both simulated and real-world environments.
- Experience in improving model accuracy using Boosting and Bagging techniques.
- Experience in design, development, maintenance and support of Big Data Analytics using Hadoop Ecosystem components like HDFS, MapReduce, HBase, Hive, Impala and Pig.
- Experience in writing MapReduce programs in Java for data cleansing and preprocessing.
- Excellent understanding/knowledge in installation, configuration, supporting and managing Hadoop clusters using Amazon Web Services (AWS).
- Performed exploratory data analysis and visualized data using R, Python, and Hadoop.
- Applied clustering algorithms to segment clients using social media data.
- Extensive knowledge and work Experience in developing Android applications.
- Participated in daily agile meetings and weekly and monthly staff meetings, and collaborated with various teams to develop and support ongoing analyses.
- Conducted data accuracy analyses and supported stakeholders in decision-making.
- Analyzed data using advanced Excel features such as pivot tables, charts, and graphs.
- Sound knowledge of RDBMS concepts; extensively worked with Oracle 8i/9i/10g/11g, DB2, SQL Server 8.0/9.0/10.0/10.5/11.0, MySQL, and MS Access.
- Developed interactive dashboards and created various ad hoc reports for users in Tableau by connecting to various data sources.
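The cost-complexity pruning and cross-validated model selection mentioned above can be sketched with scikit-learn (a minimal sketch on synthetic data; the dataset and parameters are illustrative, not from the actual projects):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a real modeling data set
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Candidate alphas from the cost-complexity pruning path of the full tree
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)

# Pick the alpha with the best 5-fold cross-validated accuracy
# (the last alpha collapses the tree to a single node, so it is skipped)
scores = {a: cross_val_score(DecisionTreeClassifier(ccp_alpha=a, random_state=0),
                             X, y, cv=5).mean()
          for a in path.ccp_alphas[:-1]}
best_alpha = max(scores, key=scores.get)
pruned = DecisionTreeClassifier(ccp_alpha=best_alpha, random_state=0).fit(X, y)
```

Larger `ccp_alpha` values prune more aggressively, trading training accuracy for less overfitting.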
TECHNICAL SKILLS
Languages: R, SQL, Python, Shell scripting, Java
IDE: RStudio, Jupyter, Eclipse, NetBeans, Atom
Databases: Oracle 11g, SQL Server, MS Access, MySQL, MongoDB, PL/SQL
Big Data Ecosystems: Hadoop, MapReduce, HDFS, HBase, Hive, Pig, Impala, Spark MLlib, ETL
Operating Systems: Windows XP/7/8/10, Ubuntu, Unix, Linux
Packages: ggplot2, caret, dplyr, RWeka, gmodels, RCurl, tm, C50, wordcloud, kernlab, neuralnet, twitteR, NLP, reshape2, rjson, plyr, pandas, NumPy, seaborn, SciPy, matplotlib, scikit-learn, Beautiful Soup, rpy2
Web Technologies: HTML, CSS, PHP, JavaScript
Data Analytics Tools: R console, Python (numpy, pandas, scikit-learn, scipy), SPSS
BI and Visualization: Tableau, SSAS, SSRS
Version Controls: GIT, SVN
WORK EXPERIENCE
Confidential, Hamilton, NJ
Data Scientist
Responsibilities:
- Responsible for data pattern recognition and data cleaning: identified missing values, invalid values, and outliers, and analyzed and categorized dataset variables.
- Actively involved in designing and developing data ingestion, aggregation, integration and advanced analytics in Hadoop.
- Worked on large sets of structured and unstructured data.
- Developed multiple custom data models to drive innovative business solutions.
- Trained and tested supervised algorithms to predict policyholder lapse behavior.
- Pruned rules extracted from decision trees using techniques such as cost-complexity pruning to improve accuracy and reduce overfitting.
- Performed Principal Component Analysis (PCA) in R, to identify significant parameters in analysis.
- Evaluated models using ROC curves and k-fold cross-validation techniques.
- Improved model accuracy using Boosting and Bagging techniques.
- Used R and Python packages for generating various graphs and charts for analyzing the data.
- Involved in analyzing large data sets to develop multiple custom models and algorithms to drive innovative business solutions.
- Involved in Analysis, Design and Implementation/translation of Business User requirements.
- Developed strategic marketing opportunities by analyzing and investigating relationships between policyholder satisfaction, engagement, and policy information.
- Developed Tableau visualizations to support ad-hoc analyses and interacted with Business users in understanding their requirements.
Environment: R, Python, SQL Server, HDFS, HBase, MapReduce, Hive, Impala, Pig, Sqoop, Spark MLlib, MongoDB, NoSQL, Tableau, ETL, Unix/Linux
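The supervised lapse-behavior workflow above (dimension reduction, a bagged ensemble, cross-validated evaluation) can be sketched in Python; the data here is synthetic and the feature layout is hypothetical, since the real policyholder data is confidential:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for policyholder features; 1 = policy lapsed
X, y = make_classification(n_samples=1000, n_features=20, random_state=1)

# PCA for dimension reduction, then a random forest (a bagging-style ensemble)
model = make_pipeline(
    PCA(n_components=10),
    RandomForestClassifier(n_estimators=100, random_state=1),
)

# k-fold cross-validated ROC AUC, matching the evaluation step above
auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
```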
Confidential, RESTON, VA
Data Scientist
Responsibilities:
- Performed Data Profiling to learn about behavior with various features such as caller ID, traffic pattern, location, number validity.
- Applied various machine learning algorithms such as decision trees, regression models, neural networks, SVMs, and clustering to identify fraudulent profiles, using the scikit-learn package in Python.
- Used the K-Means clustering technique to identify outliers and classify unlabeled data.
- Evaluated models using Cross Validation, Log loss function, ROC curves and used AUC for feature selection.
- Analyzed traffic patterns by calculating autocorrelation at different time lags.
- Ensured that the model had a low false positive rate.
- Addressed overfitting by implementing regularization methods such as L1 and L2.
- Used Principal Component Analysis in feature engineering to analyze high dimensional data.
- Created and designed reports that used gathered metrics to infer and draw logical conclusions about past and future behavior.
- Applied Multinomial Logistic Regression, Random Forest, Decision Tree, and SVM classifiers to identify scammers and telemarketers.
- Performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from the Oracle database.
- Implemented a rule-based expert system from the results of exploratory analysis and information gathered from people in different departments.
- Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
- Developed MapReduce pipeline for feature extraction using Hive.
- Created Data Quality Scripts using SQL and Hive to validate successful data load and quality of the data. Created several types of data visualizations using Python and Tableau.
- Communicated the results with operations team for taking best decisions.
- Collected data needs and requirements by Interacting with the other departments.
Environment: Python 2.7, CDH5, HDFS, Hadoop 2.3, Hive, Impala, Linux, Tableau Desktop, SQL Server 2012, Microsoft Excel.
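The K-Means outlier-detection step above (flagging points far from their assigned centroid) can be sketched as follows; the features are random stand-ins for the real caller-traffic attributes, and the 98th-percentile threshold is illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

# Random stand-in for caller features (traffic pattern, location, etc.)
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Distance from each point to its assigned centroid;
# unusually large distances flag candidate outliers
dist = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
outliers = np.where(dist > np.percentile(dist, 98))[0]
```

In the fraud setting, the flagged indices would feed a review queue rather than an automatic block, which keeps the false positive rate low.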
Confidential, Oak Brook, IL
Data Scientist
Responsibilities:
- Implemented customer segmentation using unsupervised machine learning algorithms by implementing k-means algorithm and improved marketing advertisement traction by 20%.
- Explored and extracted data from source XML in HDFS, preparing data for exploratory analysis using data munging.
- Used R for exploratory data analysis, A/B testing, ANOVA, and hypothesis tests to compare and identify the effectiveness of creative email campaigns.
- Created clusters to classify Control and test groups and conducted group campaigns.
- Performed Market Basket analysis and identified business rules to boost the revenue sales by 12%.
- Created various types of data visualizations using R and Tableau.
- Used R and SQL to create machine learning models involving Multivariate Regression, Linear Regression, Logistic Regression, PCA, Random Forest, matrix factorization, and Bayesian collaborative models to target users with mobile offer campaigns and native ads.
- Implemented Text analytics on historical email subject lines to retrieve effective Keywords and suggested them to creative team for creating new subject line that would increase open rates and delivery rates.
- Performed time series analysis in R on historical revenue to forecast weekly, monthly, and quarterly revenue. Created a revenue optimization algorithm to divert click traffic to different advertisers throughout the day to maximize revenue.
- Scheduled weekly model runs in the workflow and automated the entire process flow for generating analyses and reports.
Environment: R 3.0, HDFS, Hadoop 2.3, Pig, Hive, Linux, R-Studio, Tableau 10, SQL Server, MS Excel.
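The core of the market basket analysis above is computing support and confidence for candidate rules. A minimal sketch in plain Python (toy transactions; the actual analysis ran in R over campaign and purchase data):

```python
from itertools import combinations

# Toy transactions standing in for real purchase baskets
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
    {"bread", "milk"},
]

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """P(consequent | antecedent) estimated from the transactions."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)

# Keep rules over 2-item combinations meeting illustrative thresholds
items = sorted(set().union(*transactions))
rules = [((a,), (b,), confidence({a}, {b}))
         for a, b in combinations(items, 2)
         if support({a, b}) >= 0.4 and confidence({a}, {b}) >= 0.6]
```

Rules that clear both thresholds (e.g. "bread implies milk" here) are the ones worth promoting in campaigns.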
Confidential
Data Scientist
Responsibilities:
- Involved in Analysis, Design and Implementation/translation of Business User requirements.
- Worked on large sets of Structured and Unstructured data.
- Actively involved in designing and developing data ingestion, aggregation, and integration in Hadoop environment.
- Developed Sqoop scripts to import and export data from relational sources and handled incremental loading of customer and transaction data by date.
- Created Hive tables with partitioning and bucketing.
- Performed data analysis and data profiling using complex SQL queries against various source systems, including Oracle 10g/11g and SQL Server 2012.
- Identified inconsistencies in data collected from different sources.
- Worked with business owners/stakeholders to assess Risk impact, provided solution to business owners.
- Determined trends and significant data relationships using advanced statistical methods.
- Carried out data processing and statistical techniques such as sampling, estimation, hypothesis testing, time series, correlation, and regression analysis using R.
- Applied various data mining techniques: linear and logistic regression, classification, and clustering.
- Took personal responsibility for meeting deadlines and delivering high quality work. Strived to continually improve existing methodologies, processes, and deliverable templates.
- Created a heat map in Tableau showing current service subscribers by color, broken into regions, allowing business users to see where we have the most and fewest users.
- Provided thought leadership for the framework of a front-end interactive visualization dashboard that delivers user-driven business intelligence using Tableau for end users.
Environment: R, SQL Server, Oracle, HDFS, HBase, MapReduce, Hive, Impala, Pig, Sqoop, NoSQL, Tableau, Unix/Linux
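The incremental-loading-by-date pattern above (Sqoop's check-column/last-value idea: import only rows newer than a saved watermark) can be sketched generically; the record layout and field names here are hypothetical:

```python
from datetime import date

# Hypothetical transaction records standing in for a relational source table
records = [
    {"id": 1, "txn_date": date(2015, 1, 10)},
    {"id": 2, "txn_date": date(2015, 2, 3)},
    {"id": 3, "txn_date": date(2015, 2, 20)},
]

def incremental_load(rows, last_value):
    """Return rows newer than the watermark, plus the advanced watermark,
    mirroring Sqoop's incremental import on a date check-column."""
    new_rows = [r for r in rows if r["txn_date"] > last_value]
    new_watermark = max((r["txn_date"] for r in new_rows), default=last_value)
    return new_rows, new_watermark

# Only rows after the saved watermark are pulled in this run
batch, watermark = incremental_load(records, date(2015, 2, 1))
```

Persisting `watermark` between runs is what makes repeated loads pick up only new customer and transaction rows.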
Confidential
Data Analyst/ Scientist
Responsibilities:
- Execute quantitative analyses that translate data into actionable insights. Provide analytical and data-driven decision-making support for key projects.
- Perform quantitative analysis of product sales trends to recommend pricing decisions.
- Conduct cost and benefits analysis on new ideas.
- Scrutinize and track customer behavior to identify trends and unmet needs.
- Develop statistical models to forecast inventory and procurement cycles.
- Assist in developing internal tools for data analysis.
Environment: Linux, MySQL, R, R-Studio, Tableau
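A minimal example of the kind of statistical forecasting used for inventory and procurement cycles is simple exponential smoothing; this is a hand-rolled sketch with made-up demand numbers, not the production model:

```python
def ses_forecast(series, alpha=0.5):
    """One-step-ahead simple exponential smoothing forecast.

    The smoothed level is updated as alpha * observation + (1 - alpha) * level,
    so recent demand is weighted more heavily than older demand.
    """
    level = series[0]
    for obs in series[1:]:
        level = alpha * obs + (1 - alpha) * level
    return level

# Illustrative weekly demand; the forecast drives the next procurement order
demand = [100, 120, 110, 130, 125]
forecast = ses_forecast(demand, alpha=0.5)
```

`alpha` near 1 tracks recent demand closely; `alpha` near 0 smooths aggressively, which suits slow-moving inventory.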
Confidential
Programmer Analyst
Responsibilities:
- Effectively communicated with the stakeholders to gather requirements for different projects
- Used the MySQLdb package and the Python MySQL connector to write and execute MySQL database queries from Python.
- Created functions, triggers, views, and stored procedures using MySQL.
- Worked closely with back-end developers to find ways to push the limits of existing web technology.
- Involved in the code review meetings.
Environment: Python, MySQL 5.1
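The query pattern used from Python against MySQL (cursor, parameterized statements, commit) looks roughly like this; an in-memory SQLite database stands in for MySQL so the sketch is self-contained, and with a MySQL connector the placeholder style is `%s` rather than `?`:

```python
import sqlite3

# In-memory stand-in for the MySQL database
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# Parameterized statements keep query values out of the SQL string
cur.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
cur.execute("SELECT name FROM users WHERE id = ?", (1,))
row = cur.fetchone()
conn.commit()
```

Parameterized queries avoid SQL injection and let the driver handle quoting, which is why they are preferred over string formatting.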