- Over 7 years of experience as a Data Scientist with strong technical expertise, business experience, and communication skills to drive high-impact business outcomes through data-driven innovations and decisions.
- Extensive experience in text analytics, developing statistical machine learning and data mining solutions for various business problems, and generating data visualizations using R, Python, and Tableau.
- Hands-on experience implementing Naive Bayes and skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis, with good knowledge of Recommender Systems.
- Experience using Amazon Web Services (AWS) cloud services, including EC2, S3, Amazon Machine Learning, and EMR.
- Deep knowledge of Hadoop and Spark, and experience with Big Data tools such as PySpark, Pig, Hive, and Flume.
- Good understanding of Artificial Neural Networks and Deep Learning models built with the Theano and TensorFlow packages in Python.
- Strong experience working with SQL Server 2008, RStudio, MATLAB, Oracle 10g, and Python.
- Experience with statistical and regression analysis and multi-objective optimization.
- Worked with several Python packages, including NumPy, matplotlib, Beautiful Soup, pickle, PySide, SciPy, and PyTables.
- Experience with text analytics and data visualization using R, Python, and Tableau, and with large transactional databases (Oracle) and HDFS.
- Developed time series forecasting models for various business databases using ARIMA.
- Solid knowledge of mathematics and experience in applying it to technical and research fields.
- Worked with clients to identify analytical needs and documented them for further use.
- Developed predictive models using Python and R to predict customer churn and classify customers.
- Hands-on advanced SQL experience summarizing, transforming, segmenting, and joining datasets.
- Automated recurring reports using SQL and Python and visualized them on BI platforms such as Tableau.
- Proficient at building and publishing interactive reports and dashboards in Tableau, with design customizations based on stakeholders' needs.
Data Sources: HDFS, SQL Server, Excel
Programming Languages: Python (NumPy, pandas, NLTK, scikit-learn, matplotlib), R, SQL, MATLAB, Hadoop
Data Visualization: R, Python, MS Power BI, Tableau
Data Exploration: R, Python, TabPy
Cloud Platforms: AWS
Repository: GitHub, Bitbucket (using Sourcetree), Eventhub
Machine Learning Techniques: Regression Analysis, Classification, K-Means Clustering, Bayesian Methods, Decision Trees, Random Forests, Support Vector Machines, Neural Networks, Collaborative Filtering, KNN, Ensemble Methods
IDEs: Canopy, Spyder, IPython Notebook, Jupyter Notebook, RStudio
- Worked as a Data Scientist, using predictive modeling, statistics, machine learning, data mining, and other data analytics techniques to collect, explore, and extract insights from structured and unstructured data.
- Prepared large volumes of structured and unstructured data; performed quality checks, cleaning, and preprocessing; and performed ETL with Hadoop, Impala, Hive, and Pig on HDFS and SQL Server.
- Used Spark with MLlib for test data analytics and analyzed performance to identify bottlenecks.
- Utilized machine learning algorithms such as linear regression, multivariate regression, Naive Bayes, Random Forests, K-means, and KNN for data analysis.
- Developed MapReduce/Spark modules for machine learning and predictive analytics in Hadoop on AWS.
- Implemented end-to-end systems for data analytics and data automation, integrated with custom visualization tools, using R, Mahout, Hadoop, and MongoDB.
- Performed advanced text analytics using deep learning techniques such as convolutional neural networks to determine the sentiment of texts.
- Accomplished multiple tasks from collecting data to organizing data and interpreting statistical information.
- Evaluated the performance of various algorithms, models, and strategies on real-world data sets.
- Interacted with other departments to understand and identify data needs and requirements, and worked with other members of the IT organization to deliver data visualization and reporting solutions that address those needs.
- Enhanced Pig scripts to cover more topics and created Pig UDFs to process user cookie data.
- Performed Exploratory Data Analysis and Data Visualizations using Tableau.
Environment: - Python 3.3, scipy, Pandas, AWS, Apache Spark, Hadoop, R Studio, SVN, Linux, Eclipse, Shell Scripting, Pig, MySQL, Hive, Impala, Mahout, Tableau.
Machine Learning Engineer
- Performed Data Profiling to learn about user behavior and merged data from multiple data sources.
- Applied K-means clustering, logistic regression, Random Forest, Decision Tree, Naive Bayes, PCA, and Support Vector Machines in Python and R.
- Worked with data formats such as JSON and XML and applied machine learning algorithms in R and Python.
- Used pandas, NumPy, seaborn, SciPy, matplotlib, scikit-learn, and NLTK in Python to develop various machine learning algorithms.
- Used k-fold cross-validation to evaluate the bank neural network. While it turned out to be a low-bias, low-variance model, its mean accuracy of 83% still left room for improvement.
- Used grid search to automate hyperparameter tuning for the bank model, evaluating performance across combinations of the parameters used to compile and train the model, such as optimization function, loss function, batch size, and number of epochs. Accuracy rose to 86% while maintaining low bias and low variance.
- The RNN accurately predicts the direction of the stock price, i.e., whether it is rising or falling, while appropriately smoothing out sharp spikes in price, but its predictions tend to come in below real-world values. Currently using grid search for hyperparameter tuning to minimize the root mean square error (RMSE).
- Worked on Business forecasting, segmentation analysis and Data mining.
- Performed time series analysis using an ARIMA model.
- Worked on R packages to interface with Caffe Deep Learning Framework.
- Data storyteller, mining data from different data sources such as SQL Server, Oracle, cube databases, web analytics, and Business Objects.
- Carried out segmentation, conjoint analysis, predictive modeling, social network analysis, and integration of secondary and primary data using R and SQL.
- Performed time series analysis using Tableau.
- Maintained version control of code using GitHub.
Environment: - Python 3.3, scipy, Pandas, scikit-learn, matplotlib, R Studio, SVN, SQL, Tableau, Oracle, Github.
Confidential, New York
Data Analyst/Data Scientist
- Performed data extraction, aggregation, and log analysis on real-time data using Spark Streaming.
- Prepared data models according to business requirements.
- Deployed various predictive models using the Python scikit-learn framework.
- Improved statistical model performance by using learning curves, feature selection methods, and regularization.
- Implemented Principal Component Analysis and Linear Discriminant Analysis.
- Worked on commercial data from disparate source systems, built data models, and transformed data to add value in IT applications by streamlining processes, reducing costs, maximizing profits, and rolling out business solutions that met the stated objectives.
- Wrote simple and advanced SQL queries and scripts to create standard and ad hoc reports for senior managers.
- Performed validation of machine learning output from R.
- Created models for time series forecasting, multivariate analysis, optimizer design, and simulation using EViews and R.
- Eliminated incomplete or unusable data.
- Performed Exploratory Data Analysis and Data Visualizations using R and Tableau.
Environment: - Python, scipy, Pandas, R Studio, Tableau, SQL, scikit-learn, matplotlib, numpy.
- Responsible for data identification, collection, exploration, and cleaning for modeling; participated in model development.
- Conducted data analyses in Python for company-level predictive models of key performance indicators, including cross-sectional analysis, industry/macro indicators, customer segmentation, and customer cohort analysis.
- Implemented a job that reads electronic medical records, extracts the data into an Oracle database, and generates an output.
- Parsed data, producing concise conclusions from raw data in a clean, well-structured and easily maintainable format. Developed clustering models for customer segmentation using Python.
- Created dynamic linear models to perform trend analysis on customer transactional data in Python.
- Designed, implemented and automated modeling and analysis procedures on existing and experimentally created data.
- Involved in data analysis, primarily identifying data sets, source data, source metadata, data definitions, and data formats.
- Used Hive to store data and perform data cleaning for huge datasets.
- Extracted data from SQL servers (Oracle SQL) in Excel format.
- Analyzed the data and provided customer insights using Tableau.
Environment: - Python 3.3, scipy, Pandas, R Studio, SQL, Oracle, Tableau, matplotlib, numpy.
- Programmed in Python using libraries such as SciPy, NumPy, and pandas.
- Performed estimation and requirements analysis for project timelines.
- Generated PDF reports daily using Aspose PDF kit.
- Generated property list applications using Python.
- Designed SQL procedures and Linux shell scripts for importing, exporting, and converting data.
- Wrote SQL queries, stored procedures, and triggers for MySQL databases.
- Coordinated with architects and senior technical staff to identify client needs and document assumptions.
- Developed new requirements to move code through user acceptance testing.
- Analyzed the data and provided customer insights using Tableau and Power BI.
Environment: - Python 3.3, scipy, Pandas, SQL, numpy, Linux, Tableau, Power BI.
- Developed and designed forms using Visual Basic with ODBC.
- Helped create the database access layer using JDBC and PL/SQL stored procedures.
- Developed transformations using jobs like Filter, Join, Lookup, Merge, Hashed file, Aggregator, Transformer and Dataset.
- Used MS Excel, MS Access and SQL to write and run various queries.
- Recommended structural changes and enhancements to systems and databases.
- Used dimensional modeling to design the data warehouse.
- Created functions, triggers, views, and stored procedures using MySQL.
- Supported operations with Oracle RDBMS database administration and with network, Windows XP, and Red Hat Linux system administration.
- Analyzed the data and provided customer insights using Tableau.
Environment: - Python 3.3, scipy, Pandas, Linux, MySQL, scikit-learn, numpy, Tableau.