Data Scientist Resume
TN
SUMMARY:
- Around 5 years of IT industry experience as a Data Scientist, specializing in implementing advanced Machine Learning and Natural Language Processing algorithms on data from diverse domains and building highly efficient models that derive actionable insights for business environments, leveraging exploratory data analysis, feature engineering, statistical modeling, and predictive analytics.
- Experienced in implementing the entire data science project life cycle, actively involved in all phases including data extraction, data cleaning, statistical modeling, and data visualization with large sets of structured and unstructured data.
- Experience in problem solving, data science, machine learning, statistical inference, predictive, descriptive, and prescriptive analytics, graph analysis, and natural language processing, with particular depth in predictive analytics and recommendation systems.
- Built visualizations to convey technical research findings and to ensure alignment of data science ideas with client use cases.
- Created Machine Learning and NLP solutions for Big Data on top of Spark using Scala.
- Extensive experience in text analytics, developing Statistical Machine Learning and Data Mining solutions to various business problems, and generating data visualizations using R and Python.
- Experienced with Machine Learning algorithms such as Logistic Regression, KNN, SVM, Random Forest, Neural Networks (CNN, RNN, LSTM), Linear Regression, Lasso Regression, K-Means, and PCA.
- Ability to use dimensionality reduction techniques and regularization techniques.
- Designed and provisioned the platform architecture to execute Hadoop, Spark, and Machine Learning use cases on AWS cloud infrastructure (EMR, S3).
- Built ML pipelines and continuous integration/continuous deployment (CI/CD) workflows.
- Proficient in data entry, data auditing, creating data reports, and monitoring data for accuracy; able to perform web search and data collection, web data mining, and extraction of data from websites for entry and processing.
- Excellent communication and presentation skills along with good experience in communicating and working with various stakeholders.
TECHNICAL SKILLS:
Languages: Python (Pandas, NumPy, SciPy, Scikit-learn, Matplotlib, Seaborn, ggplot2, NLP, TensorFlow, Keras), R, Scala
Machine Learning: Regression, Classification, Clustering, Logistic Regression, K-Means, Simple Linear Regression, Polynomial Regression, Multiple Linear Regression, Decision Tree, Naïve Bayes, Random Forest, KNN, Kernel SVM, CNN, RNN, LSTM
Big Data: Spark, Hadoop, HDFS, Hive, HBase, Pig, PySpark
Databases: Oracle, MS SQL Server, MS Access, DB2
Operating Systems: Windows, Unix, Linux
Tools: Jupyter Notebook, Google Colab, GitHub, SQL, Tableau
ML Platforms: AWS, GCP, MLflow
Web Technologies: Web services, Web APIs (RESTful), XML, JSON, YAML, FastAPI
CI/CD: Docker, Kubernetes
AWS Services: EMR, Redshift, RDS, Athena, Glue, S3, Lambda
PROFESSIONAL EXPERIENCE:
Confidential
Data Scientist
Responsibilities:
- Involved in extensive ad hoc reporting, routine operational reporting, and data manipulation to produce routine metrics and dashboards for management.
- Implemented complete data science project involving data acquisition, data wrangling, exploratory data analysis, model development and model evaluation.
- Utilized Apache Spark with Python to develop and execute Big Data Analytics and Machine Learning applications, executed Machine Learning use cases under Spark ML and MLlib.
- Involved in running MapReduce jobs for processing millions of records.
- Worked on data discovery, handling structured and unstructured data, cleaning and performing descriptive analysis, and preparing data sets.
- Created dynamic linear models to perform trend analysis on customer transactional data in Python.
- Developed a Python program to manipulate data read from various Teradata sources and consolidate it into a single CSV file.
- Performed K-means clustering, Multivariate analysis and Support Vector Machines in Python.
- Created various types of data visualizations using Python libraries and Tableau.
- Created parameters, action filters and calculated sets for preparing dashboards and worksheets in Tableau.
- Worked on different data formats such as JSON, XML and performed machine learning algorithms in Python.
- Performed Statistical analysis and leveraged appropriate Data Visualization techniques to deliver meaningful insights of the data.
- Developed NLP methods that ingest large unstructured data sets, separate signal from noise, and provide personalized insights at the patient level that directly improve the analytics platform.
- Performed Logistic Regression, Random Forest, Decision Tree, and SVM to classify whether a package would be delivered on time for a new route.
- Worked on data profiling and data validation to ensure the accuracy of the data between the warehouse and source systems.
- Designed & developed scalable ML pipeline using Spark and HDFS on AWS EMR.
- Developed MapReduce pipeline for feature extraction using Hive.
- Created quality scripts using SQL and Hive to validate successful data load and quality of the data.
- Implemented data refreshes on Tableau Server for biweekly and monthly increments based on business changes to ensure that the views and dashboards displayed the changed data accurately.
- Actively involved in Analysis, Development and Unit testing of the data and delivery assurance of the user story in Agile Environment.
Environment: Machine Learning, Deep Learning, Python, Spark, Scikit-learn, R, Tableau, JSON, XML, SQL, Hive, Agile, Windows.
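As an illustrative sketch of the on-time delivery classification described above (not the original project code; the data, feature names, and label rule here are synthetic placeholders):

```python
# Hedged sketch: classifying whether a package is delivered on time,
# comparing logistic regression and random forest in scikit-learn.
# All features and labels below are synthetic, for illustration only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
n = 1000
# Hypothetical route features: distance (miles), stops, traffic index
X = np.column_stack([
    rng.uniform(1, 500, n),    # route distance
    rng.integers(1, 20, n),    # number of stops on the route
    rng.uniform(0, 1, n),      # traffic congestion index
])
# Synthetic rule: long, congested, many-stop routes tend to run late
y = ((X[:, 0] / 500 + X[:, 1] / 20 + X[:, 2]) < 1.2).astype(int)  # 1 = on time

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: test accuracy = {scores[name]:.3f}")
```

In practice the same comparison loop extends to the decision tree and SVM models named in the bullet; only the entries in `models` change.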
Confidential, TN.
Data Scientist
Responsibilities:
- Built models using statistical techniques and Machine Learning classification models like XGBoost, SVM, and Random Forest.
- Completed a highly immersive data science program involving data manipulation and visualization, web scraping, machine learning, Python programming, SQL, Git, Unix commands, NoSQL, and Hadoop.
- Identified outliers, anomalies, and trends in given data sets.
- Set up storage and data analysis tools in Amazon Web Services cloud computing infrastructure.
- Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, and NLTK in Python for developing various machine learning algorithms.
- Extensively used open source tools - PyCharm (Python) for statistical analysis and building machine learning algorithms.
- Applied various machine learning algorithms and statistical modeling like decision tree, logistic regression, Gradient Boosting Machine to build predictive model using scikit-learn package in Python.
- Generated data analysis reports using Matplotlib and successfully delivered and presented the results to C-level decision makers.
- Used Tableau and Python to refine and improve models.
- Translated business needs into mathematical models and algorithms, and built machine learning and Natural Language Processing (NLP) solutions using Python, Java, and NLP modules.
- Implemented Agile Methodology for building an internal application.
- Implemented Classification using supervised algorithms like Logistic Regression, Decision trees, KNN, Naive Bayes.
- Updated Python scripts to match data with our database stored in AWS CloudSearch so that each document could be assigned a response label for further classification.
- Performed data transformation from various sources, data organization, and feature extraction from raw and stored data.
- Validated the machine learning classifiers using ROC Curves.
- Extracted data from HDFS and prepared data for exploratory analysis using data munging.
Environment: ER Studio, AWS, Git, UNIX, Python, MLlib, Regression, Logistic Regression, Hadoop, OLTP, OLAP, HDFS, NLTK, SVM, JSON, XML.
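The gradient-boosting modeling and ROC-curve validation mentioned above can be sketched as follows; this is an assumed, minimal example on a synthetic data set, not the project code:

```python
# Hedged sketch: a scikit-learn Gradient Boosting classifier validated
# with an ROC curve, as in the bullets above. Synthetic data stands in
# for the real data set.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve, roc_auc_score

X, y = make_classification(n_samples=2000, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)

gbm = GradientBoostingClassifier(
    n_estimators=200, learning_rate=0.05, random_state=1
)
gbm.fit(X_train, y_train)

# Predicted probabilities for the positive class drive the ROC curve
proba = gbm.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, proba)
auc = roc_auc_score(y_test, proba)
print(f"ROC AUC = {auc:.3f}")
```

The `fpr`/`tpr` arrays are what would be plotted (e.g. with Matplotlib) when comparing classifiers, with AUC as the single-number summary.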
Confidential, Columbus, GA.
Data Scientist
Responsibilities:
- Involved in gathering, analyzing and translating business requirements into analytic approaches.
- Worked with machine learning algorithms like neural network models, regressions (linear, logistic, etc.), SVMs, and decision trees for classifying groups and analyzing the most significant variables.
- Converted raw data to processed data by merging data sets and identifying outliers, errors, trends, missing values, and distributions in the data.
- Implemented analytics algorithms in Python and R.
- Performed K-means clustering, Regression and Decision Trees in R.
- Worked on Naïve Bayes algorithms for Agent Fraud Detection using R.
- Performed data analysis, visualization, feature extraction, feature selection, feature engineering using Python.
- Generated detailed report after validating the graphs using Python and adjusting the variables to fit the model.
- Worked on Clustering and factor analysis for classification of data using machine learning algorithms.
- Used Power Map and Power View to present data effectively to both technical and non-technical users.
- Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using HBase-Hive integration.
- Used TensorFlow, Keras, Pandas, NumPy, SciPy, Scikit-learn, NLTK in Python for developing various machine learning algorithms such as Neural network models, Linear Regression, multivariate regression, naïve Bayes, random Forests, decision trees, SVMs, K-means and KNN for data analysis.
- Responsible for developing a data pipeline with AWS S3 to extract data and store it in HDFS, and for deploying all implemented machine learning models.
- Worked on business forecasting, segmentation analysis, and data mining; prepared management reports defining the problem, documenting the analysis, and recommending courses of action to determine the best outcomes.
- Worked with risk analysis, root cause analysis, cluster analysis, correlation, optimization, and the K-means algorithm for clustering data into groups.
Environment: Python, Jupyter, HBase, HDFS, Hive, Pig, Power Query, Power Pivot, Power Map, Power View, SQL Server, MS Access.
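The fraud-detection classification above was done with Naïve Bayes in R; the same idea can be sketched in Python with scikit-learn's `GaussianNB` on a synthetic, imbalanced data set (all data and class names here are hypothetical):

```python
# Hedged sketch: Naive Bayes classification on a rare-positive-class
# problem, mimicking agent fraud detection. Synthetic data only.
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, f1_score

# ~5% positive class to mimic rare fraud events
X, y = make_classification(
    n_samples=5000, n_features=8, weights=[0.95, 0.05], random_state=7
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=7
)

nb = GaussianNB()
nb.fit(X_train, y_train)
pred = nb.predict(X_test)
f1 = f1_score(y_test, pred)
print(classification_report(y_test, pred, target_names=["legitimate", "fraud"]))
```

With imbalance like this, per-class precision/recall and F1 (as printed by `classification_report`) are more informative than raw accuracy.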
Confidential, Plano TX.
Data Scientist
Responsibilities:
- Involved in defining source-to-target data mappings, business rules, and data definitions.
- Performed data validation and data reconciliation between disparate source and target systems for various projects.
- Identified the customer and account attributes required for MDM implementation from disparate sources and prepared detailed documentation.
- Prepared Data Visualization reports for the management using R.
- Used R machine learning library to build and evaluate different models.
- Utilized a broad variety of statistical packages, including R, MLlib, Python, and others.
- Performed data cleaning using R, filtered input variables using the correlation matrix, step-wise regression, and Random Forest.
- Performed Multinomial Logistic Regression, Random Forest, Decision Tree, and SVM to classify whether a package would be delivered on time for a new route.
- Provided input and recommendations on technical issues to Business & Data Analysts, BI Engineers, and Data Scientists.
- Segmented the customers based on demographics using K-means Clustering.
- Involved in loading data from RDBMS and weblogs into HDFS using Sqoop and Flume.
- Worked on loading the data from MySQL to HBase where necessary using Sqoop.
- Generated weekly and monthly reports for various business users according to the business requirements.
Environment: Python, R, ETL, Tableau, PL/SQL, HDFS, SQL, Machine Learning, MS Excel, Windows.
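The demographic K-means segmentation described above can be sketched as follows; this is an illustrative example with synthetic data and hypothetical feature names, not the project code:

```python
# Hedged sketch: customer segmentation with K-means on demographic
# features. All data below is randomly generated for illustration.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
n = 600
# Hypothetical demographics: age, annual income, household size
demographics = np.column_stack([
    rng.normal(40, 12, n),
    rng.normal(60_000, 15_000, n),
    rng.integers(1, 6, n),
])

# Scale features so income does not dominate the Euclidean distance
scaled = StandardScaler().fit_transform(demographics)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=3).fit(scaled)
labels = kmeans.labels_
print("segment sizes:", np.bincount(labels))
```

Choosing `n_clusters` in practice is usually guided by elbow plots or silhouette scores rather than fixed up front; scaling before clustering is the key step shown here.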