Data Scientist Resume
Austin, TX
OBJECTIVE:
- Data Scientist with 5+ years of experience, seeking opportunities in Data Science. SQL, MS Excel, and R are my most frequently used tools. Strong mathematical and statistical skills, applying time series analysis, predictive modeling, data processing, and data mining algorithms to solve challenging business problems.
- Majored in Probability for my Bachelor's degree and in Statistics for my Master's degree.
- My goal is to help an organization succeed in implementing / deploying software.
SUMMARY
- Data Scientist/ML Engineer with 5+ years of experience in Data Science, Machine Learning, Natural Language Processing, Deep Learning, and Artificial Intelligence, with hands-on experience executing data-driven solutions to increase the efficiency, accuracy, and utility of internal data processing.
- Microsoft Azure Machine Learning Certified for Big Data Analytics
- Well versed in machine learning algorithms (Supervised, Unsupervised, and Reinforcement Learning) such as Linear Regression, Logistic Regression, Decision Tree, Random Forest, Support Vector Machine (SVM), Naive Bayes, Neural Network, K-Nearest Neighbors, Social Network Analysis, Association Rule, PCA, clustering algorithms (K-Means, Hierarchical), Singular Value Decomposition, A/B Testing, and Forecasting (Naïve forecasting, AR, MA, ARIMA, VAR, VARMA, SES, RNN, LSTM).
- Hands-on experience in Predictive Modeling, Predictive Analytics, Statistical Modeling, Trend Analysis, Time Series Analysis, Forecasting, and Risk Analysis.
- Sound knowledge of Recommendation Systems (collaborative filtering, content-based recommendation, item-based recommendation).
- Proficient in Natural Language Processing (NLP), Natural Language Understanding, and Text Analytics using R and Python (Stanford NLP, NLTK, gensim, TextBlob, etc.), with a strong background in statistical learning techniques for NLP (HMMs, CRFs, SVMs, LDA, LSI, MRFs).
- Extensive programming skills in analytical programming languages such as R, Python, MATLAB, SAS
- Hands-on experience in data mining with large sets of structured, semi-structured, and unstructured data, data munging and transformation, data acquisition, data validation, Exploratory Data Analysis (EDA), data wrangling, data warehousing, data mapping, and data review.
- Extensive experience in data visualization, including producing tables, graphs, and listings using tools such as Tableau, Power BI, R, R Shiny, ggplot2, matplotlib, and seaborn; expert in creating dashboards and reports using Tableau, MS Power BI, and Microsoft Azure.
- Deep understanding of Statistical Modeling, Multivariate Analysis, model testing, problem analysis, model comparison, and validation.
- Hands-on experience with Big Data tools like Hadoop, Spark, Hive, Pig, Impala, PySpark, and Spark SQL.
- Experience with mathematical and statistical Python libraries: NumPy, pandas, scikit-learn, SciPy, Matplotlib, Seaborn, Beautiful Soup, rpy2, NLTK, TensorFlow, Keras, Theano; R packages: caTools, caret, rpart, dplyr, ggplot2, rjson, RWeka, tidytext, tm (text mining), stringr, snowball, plyr, RCurl, gmodels, C50, reshape2, twitteR.
- Proficient in managing the entire data science project life cycle and actively involved in all its phases, including data acquisition, data cleaning, engineering, feature scaling, feature engineering, statistical modeling, dimensionality reduction using Principal Component Analysis and Factor Analysis, testing and validation using ROC plots and K-fold cross-validation, and data visualization.
- Adept in statistical programming languages like Python and R and in Big Data technologies like Hadoop and Hive, with a deep understanding of MapReduce with Hadoop and Spark. Good knowledge of the Big Data ecosystem, including Hadoop 2.0 (HDFS, Hive, Pig, Impala) and Spark (Spark SQL, Spark MLlib, Spark Streaming).
- Experienced with modern deep learning approaches to NLP and NLU (word/paragraph embeddings, structure prediction, sentiment analysis, disambiguation, entity/relation extraction, information extraction, summarization, semantics, document classification, ontology, question answering, and knowledge graphs).
- Skilled in data parsing, data manipulation, and data preparation, with methods including describing data contents, computing descriptive statistics, regex, split and combine, remap, merge, subset, reindex, melt, and reshape.
- Sound knowledge of Enterprise Data Warehousing and Business Intelligence, ETL (Extract, Transform, Load), dimensional/hierarchical data modeling, data mapping, and data dictionaries; used the IBM Cognos BI tool.
- Experience writing advanced SQL programs for joining multiple tables, sorting data, creating SQL views, creating indexes and metadata analysis.
- Experience running ad hoc queries for various applications and reports on a daily, weekly, and monthly basis using complex SQL queries.
- Good knowledge of Proofs of Concept (PoCs) and gap analysis; gathered the necessary data for analysis from different sources and prepared it for data exploration using data munging.
- Skilled in building and publishing customized interactive reports and dashboards with customized parameters and user filters using Tableau.
- Experience in implementing data analytics solutions on cloud platforms: AWS (EC2, EMR, S3), Google Cloud Platform, and Microsoft Azure.
- Excellent knowledge of Machine Learning as a Service (MLaaS): AWS Machine Learning, Microsoft Azure Machine Learning, Google Prediction API, and IBM Watson.
- Excellent understanding of Systems Development Life Cycle (SDLC), Agile, Scrum and waterfall
- Experience with version control tools: Git, Subversion, Bitbucket
- Hands-on experience with ETL automation tools: Talend, Informatica PowerCenter, IDQ, MDM, and Informatica Cloud
TECHNICAL SKILLS
Web Technologies: HTML/HTML5, CSS2/CSS3.
Database: PL/SQL (Oracle), MySQL, MSSQL, Oracle 11g, SQLite3.
Development Tools: Adobe Photoshop CS5, Yahoo Search Marketing (spring tool suite)
Cloud Platform: AWS
IDEs and Tools: Eclipse IDE, NetBeans, Dreamweaver, Firebug, Developer Tools, EditPlus, JSFiddle, WebStorm, Tatastrom, Sublime Text, Notepad++, PyCharm, PyScripter, Spyder.
Programming Languages: C, Core Java, Advance Java, Python.
Operating Systems: Windows 98/2000/XP/Vista/7/8, Mac OS X, Unix, Windows 10/10.1
Project Management Software: Script Editors VS 2008/2010/2012, MyEclipse, Dreamweaver, Microsoft Visio.
MS Project: Jira
Cloud Computing: Firebase
Methodologies: Waterfall, Agile/Scrum
Python Libraries/Packages: NumPy, SciPy, Pickle, PyQt, PySide, PyTables, Data Frames, Pandas, Matplotlib, SQLAlchemy, httplib2, urllib2, Beautiful Soup, PyQuery
Python Frameworks: PyJamas, Jython
Miscellaneous: Git, GitHub
Web Services/Protocols: TCP/IP, UDP, FTP, HTTP/HTTPS, SOAP, REST, RESTful
Reporting Tools: SQL Server, MS Access, SSIS, SSRS, Excel, Pivot
PROFESSIONAL EXPERIENCE
Confidential, Austin, TX
Data Scientist
Responsibilities:
- Involved in end-to-end predictive modeling for structured and unstructured data to solve business problems
- Built a recommendation engine to recommend skills to chatbot users
- Developed a personalized user recommendation model and built a collaborative filtering (item-based, popularity-based, user-based) recommendation engine to increase the efficiency of the chatbot.
- Built machine learning models using deep learning (Neural Network: LSTM) for text classification to classify the type of claim complaint made by email.
- Used spaCy, NLTK, and Stanford CoreNLP for email parsing and Named Entity Recognition to auto-reply to emails by identifying the type of claim and the area, and providing the appropriate manager's contact information
- Developed gradient boosting classification models (XGBoost, LightGBM, CatBoost) to address unnecessary dispatches for the V-repair problem
- Used AWS SageMaker for data processing, ML model development, hyperparameter tuning to optimize results, model evaluation, and deployment of models to production
- Developed deep learning models using the Keras and PyTorch frameworks
- Worked on user analytics and developed a recommendation engine that helps chatbot users find what they need more quickly and efficiently
- Developed Regression models to predict remaining useful lifetime (RUL) of the utility pole, and developed the survival models for the prediction of failure probability over time.
- Developed Spark/Scala, Python, and R code for a regular expression (regex) project in the Hadoop/Hive environment on Linux/Windows for big data resources. Used K-Means clustering to identify outliers and to classify unlabeled data.
- Built classification models to predict failure within a given time window
- Used anomaly detection techniques to flag anomalous behavior of the poles
- Used classification techniques including Random Forest and Logistic Regression to quantify the likelihood of circuit failure for each user in a given region
- Deployed the model via a Flask API and Plumber for the web application.
- Implemented Convolutional Neural Network and RNN (LSTM) models for a multiclass text classification problem using Keras/TensorFlow
Confidential, Cary, NC
Data Scientist
Responsibilities:
- Performed data profiling to learn about user behavior and merged data from multiple data sources.
- Implemented big data processing applications to collect, clean, and normalize large volumes of open data using Hadoop ecosystem tools such as Pig, Hive, and HBase.
- Gained hands-on experience with Hive, Hadoop, HDFS, and related big data technologies.
- Designed and developed various machine learning frameworks using Python, R, and MATLAB.
- Integrated R into MicroStrategy to expose metrics determined by more sophisticated and detailed models than those natively available in the tool.
- Participated in data acquisition with the Data Engineering team to extract historical and real-time data using Sqoop, Pig, Flume, Hive, MapReduce, and HDFS.
- Worked with different data formats such as JSON and XML and implemented machine learning algorithms in Python.
- Worked with Data Architects and IT Architects to understand the movement and storage of data, using ER/Studio 9.7.
- Processed huge datasets (over a billion data points, over 1 TB of data) for data association pairing and provided insights into meaningful data associations and trends
- Coded R functions to interface with Caffe Deep Learning Framework.
- Developed cross-validation pipelines for testing the accuracy of predictions
- Enhanced statistical models (linear mixed models) for predicting the best products for commercialization using Machine Learning Linear regression models, KNN and K-means clustering algorithms.
- Participated in all phases of data mining, including data collection, data cleaning, model development, validation, and visualization, and performed gap analysis.
- Developed highly scalable classifiers and tools by leveraging machine learning, Apache Spark, and deep learning.
- Performed data manipulation and aggregation from different sources using Nexus, Toad, BusinessObjects, Power BI, and SmartView.
- Developed NLP models for Topic Extraction, Sentiment Analysis.
- Independently coded new programs and designed tables to load and test the programs effectively for the given PoCs using Big Data/Hadoop.
- Developed documents and dashboards of predictions in MicroStrategy and presented them to the business intelligence team.
- Skilled in using collections in Python for manipulating and looping through different user-defined objects
- Designed and managed API system deployment using a fast HTTP server and Amazon AWS architecture.
- Developed various QlikView data models by extracting and using data from various source files, DB2, Excel, flat files, and big data.
- Ensembled deep learning models, CRF models, and NLP techniques to improve model results.
- Good knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, Secondary Name Node and MapReduce concepts.
- As Architect, delivered various complex OLAP databases/cubes, scorecards, dashboards, and reports.
- Implemented Classification using supervised algorithms like Logistic Regression, Decision trees, KNN, Naive Bayes.
- Extracted data from HDFS using Hive and Presto, performed data analysis using Spark with Scala, PySpark, and Redshift, carried out feature selection, and created nonparametric models in Spark.
- Applied various artificial intelligence (AI)/machine learning algorithms and statistical modeling techniques such as decision trees, text analytics, image and text recognition using OCR tools like ABBYY, natural language processing (NLP), supervised and unsupervised learning, and regression models.
Confidential
Jr Python Developer
Responsibilities:
- Performed systems analysis and detailed application design, and designed various databases in MySQL and Oracle.
- Implemented threshold points to keep the code running in a crash-free environment.
- Worked with Pandas to monitor, migrate, and develop database tables.
- Implemented SQL scripts and queries in Python code to work with various databases and data sources.
- Facilitated the overall management of projects, including risk analysis and mitigation plans, status reports, client presentations, defining milestone deliverables, and establishing critical success factors.
- Responsible for writing detailed descriptions of user needs, program functions, and steps required for developing or modifying software programs using MS Visio
- Planned, analyzed, and tracked the progress of implementation daily, ensuring no bottlenecks in requirements.
- Performed data migration and data filtering from various data sources and built different databases to store the raw and filtered data.
- Developed entire frontend and backend modules using Python on the Django web framework.
- Designed and developed the UI of the website using HTML, CSS and JavaScript
- Worked on CSS Bootstrap to develop web applications.
- Designed and developed Web services using XML and jQuery.
- Improved performance by using a more modular approach and more built-in methods.
- Experienced in Agile Methodologies and SCRUM Process.
- Maintained program libraries, users' manuals, and technical documentation.
- Wrote unit test cases for testing tools.
- Involved in the entire project lifecycle, including design, development, deployment, testing, implementation, and support.
- Used multi-threading for simple optimization, where each sub-thread waits for a URL to resolve and respond, then puts its contents on the queue.
- Created database using MySQL and wrote several queries to extract data from database.
- Developed remote integration with third-party platforms using RESTful web services and successfully implemented Apache Spark and Spark Streaming applications for large-scale data.
- Held overall responsibility for coordinating the implementation of software builds and releases and provided analytics to help determine the optimal way to assemble releases.
- Tracked any additions, deletions, or changes in scope on the published release plan, including efficient tracking of defects.
- Participated in unit, integration, and smoke testing.
- Involved in the Complete Software development life cycle (SDLC) to develop the application.
- Built RESTful web services using a Python REST API framework.
- Designed and built a reporting module that uses Apache Spark SQL to fetch and generate reports.
- Designed and developed data center management system using MySQL.
- Rewrote existing Python modules to deliver data in a specific format.
- Used the Django database API to access database objects.
- Wrote Python scripts to parse XML documents and load the data into the database.
- Developed SQL Queries, Stored Procedures, and Triggers Using Oracle.
- Handled all the client-side validation using JavaScript.
- Used the Subversion version control tool to coordinate team development.
