Data Scientist Resume
Norfolk, VA
SUMMARY:
- 6+ years of experience in Machine Learning, Data Mining, Data Architecture, Data Modeling, Data Analysis, and NLP with large structured and unstructured data sets, as well as Data Validation, Predictive Modeling, Data Visualization, and Text Mining to transform words and phrases in unstructured data into numerical values.
- Used Python libraries to develop neural networks and perform cluster analysis.
- Expertise in all phases of the Software Development Lifecycle (SDLC), including requirements analysis, design, development and coding, testing, implementation, and maintenance, following Agile methodologies.
- Experience designing visualizations in Tableau and publishing and presenting dashboards and storylines on web and desktop platforms.
- Hands-on experience implementing Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis, with knowledge of Recommender Systems.
- Experienced with machine learning algorithms such as logistic regression, random forest, XGBoost, KNN, SVM, neural networks, linear regression, lasso regression, and k-means.
- Developed logical data architecture in adherence to the Enterprise Architecture.
- Adept in Python for statistical programming and in Big Data technologies such as Hadoop 2, Hive, HDFS, MapReduce, and Spark.
- Experienced in Spark 2.1, Spark SQL, and PySpark.
- Performed clustering, regression, and classification using the Spark MLlib machine learning library (PySpark).
- Skilled in using NumPy, NLTK, and Pandas in Python for exploratory data analysis.
- Experienced in provisioning virtual clusters on AWS using services such as EC2, S3, SageMaker, and EMR.
- Good understanding of Teradata SQL Assistant, Teradata Administrator, and data load/export utilities such as FastLoad and MultiLoad.
- Proficient in statistics, mathematics, machine learning, recommendation algorithms, and analytics, with a strong understanding of business operations and of analytics tools for effective data analysis.
- Highly self-motivated, enthusiastic, and results-driven, able to communicate effectively with all levels of the organization, including senior management and executives.
- Guided development teams in breaking down large, complex user stories into simpler stories for execution.
TECHNICAL SKILLS:
Programming: Python (NumPy, Pandas, Scikit-Learn, Matplotlib, Seaborn), SQL, PySpark, Scala, C++, Java, JavaScript, HTML, CSS
Analytics and Visualization Tools: Tableau, MS Excel
Statistical methods: ANOVA, ARIMA, Regression Analysis, Hypothesis Testing, Time Series Analysis, Splines, Confidence Intervals, Principal Component Analysis, and Dimensionality Reduction
Amazon Web Services: S3, EC2, EMR, CloudFormation, SageMaker, and Rekognition
Machine Learning Algorithms: Logistic Regression, Linear Regression, Decision Tree, Random Forest, Gradient Boosting, Nearest Neighbor Classifier, Weight of Evidence & Information Value (WOE & IV), K-means Clustering, Affinity Propagation, Principal Component Analysis, Support Vector Machines, Naive Bayes, Autoregression, Lasso Regression & Moving Averages
Deep Learning: TensorFlow, Keras
Big Data Tools and Technologies: HDFS, Pig, MapReduce, Hive, Sqoop, Flume, HBase, Storm, Kafka, Elasticsearch, Redis
Other Tools: Jupyter Notebook, Git Version Control, IPython Notebook, Unix, Visual Studio Code, NetBeans
PROFESSIONAL EXPERIENCE:
Confidential, Norfolk, VA
Data Scientist
Responsibilities:
- Involved in all phases of data acquisition, data collection, data cleaning, model development, model validation, and visualization to deliver data science solutions.
- Created classification models that recognize web requests associated with products in order to classify orders and score products for analytics, which improved the online sales percentage by 16.78%.
- Used Pandas, NumPy, and scikit-learn in Python to develop machine learning models such as random forest and stepwise regression.
- Hands-on experience with dimensionality reduction, model selection, and model boosting methods, using Principal Component Analysis (PCA), K-Fold Cross-Validation, and Gradient Tree Boosting.
- Implemented a structured learning method based on search and scoring.
- Segmented customers by behavior or specific characteristics such as age, region, income, and geographic location, applying clustering algorithms to group customers with similar behavior patterns.
- Used the NLTK library in Python to perform sentiment analysis on customer product reviews.
- Migrated MapReduce programs to Spark transformations using Scala; the initial version was written in Python (PySpark).
- Developed Spark applications in Scala to enrich clickstream data merged with user profile data.
- Developed optimized Spark applications to perform data cleansing, validation, transformation, and summarization.
- Worked on a data pipeline consisting of Spark, Hive, Sqoop, and custom-built input adapters to ingest, transform, and analyze operational data.
- Created and maintained Tableau reports displaying the status and performance of deployed models and algorithms.
- Applied Pearson correlation and maximum-variance techniques to identify the key predictors for the regression models.
- Used Python data visualization libraries such as Matplotlib and Seaborn.
Confidential, Miami, FL
Data Scientist
Responsibilities:
- Developed machine learning, statistical analysis, and data visualization applications for challenging data processing problems in the sustainability and biomedical domains.
- Compiled data from various sources, including public and private databases, to perform complex analysis and data manipulation for actionable results.
- Designed and developed Natural Language Processing models for sentiment analysis.
- Applied Natural Language Processing with the Python NLTK module to develop an automated customer response application.
- Built predictive models using Python tools.
- Used Spark to improve the performance of existing algorithms in Hadoop, working with SparkContext, Spark SQL, Spark MLlib, DataFrames, pair RDDs, and Spark on YARN.
- Applied probability, distributions, and statistical inference to the given datasets to uncover findings through comparisons, t-tests, F-tests, R-squared, p-values, etc.
- Applied linear regression, multiple regression, ordinary least squares, mean-variance analysis, the law of large numbers, logistic regression, dummy variables, residual analysis, the Poisson distribution, Bayes' theorem, Naive Bayes, function fitting, etc. to data using the scikit-learn, SciPy, NumPy, and Pandas Python libraries.
- Applied clustering algorithms such as hierarchical clustering and K-means using scikit-learn and SciPy.
- Performed clustering and classification of data using machine learning algorithms.
- Used TensorFlow to build sentiment analysis and time series analysis models.
- Developed visualizations and dashboards using ggplot and Tableau.
- Built and analyzed datasets using Python, Seaborn, and MATLAB.
- Applied linear regression in Python and SAS to understand the relationships between different attributes of the dataset and to explore potential causal links between them.
- Designed and implemented a probabilistic churn prediction model using logistic regression in Python on data for 100k customers, predicting each customer's probability of churning. The client used the results to finalize the list of customers to receive a discount.
- Built data pipelines (ingest, munge, clean, transform) for feature extraction feeding downstream classification.
- Expertise in Business Intelligence and data visualization using Tableau.
Confidential, Collegeville, PA
Data Analyst
Responsibilities:
- Developed Python modules for machine learning and predictive analytics supporting day-to-day business activities.
- Preprocessed large volumes of data (collecting, formatting, cleaning, aggregating, and segregating), then sampled the data for statistical evaluation and drew conclusions from the results.
- Developed Natural Language Processing models to automatically classify customer incident queries into class levels, improving customer service.
- Implemented a number of customer clustering models and plotted the resulting clusters in Tableau, with legends, for senior management.
- Performed exploratory analysis, hypothesis testing, cluster analysis, correlation analysis, ANOVA, and ROC curve analysis, and built models using supervised and unsupervised machine learning, text analytics, and time series forecasting.
- Implemented the Porter stemmer (Natural Language Toolkit) and NLP bag-of-words models (CountVectorizer, IDF) to prepare the data.
- Implemented a machine learning model of customer sentiment patterns to better track customer trends.
- Conducted studies, produced rapid plots, and used advanced data mining and statistical modeling techniques to build a solution that improves data quality and performance.
- Developed simple to mid-level MapReduce jobs using Hive and Pig, and multiple MapReduce jobs in Python for data cleaning and preprocessing.
- Analyzed large data sets, applied machine learning techniques, and developed and enhanced predictive and statistical models using best-in-class modeling techniques.
- Performed parameter tuning and model evaluation using techniques such as confusion matrices and cross-validation. Built customer profiling models using K-means and K-means++ clustering to enable targeted marketing.
- Applied dimensionality reduction with Principal Component Analysis, together with k-fold cross-validation, as part of model improvement.
- Used Python data visualization libraries such as Matplotlib.
Confidential, Collegeville, PA
UI/UX Developer
Responsibilities:
- Provided design expertise to the organization and worked directly with web development and production teams.
- Worked closely with the Product Manager and team leads, contributing UX/UI design expertise to ensure the team developed high-quality applications.
- Collaborated with other designers, user researchers, game designers, engineering teams, and business/marketing stakeholders to prioritize UX activities throughout the game/application development lifecycle and deliver high-quality experiences on schedule.
- Worked in graphic design, applying a strong eye for typography, clean layout, purposeful color, and attention to detail; developed a deep appreciation for simple, fun, intuitive, and usable interfaces.
- Worked on in-house UI tools and a scripting language, solving problems with a combination of JavaScript, JSON, and jQuery.
- Knowledge of HTML, CSS, browser compatibility, and web standards for interactive prototypes, and of Adobe Creative Suite (primarily Photoshop) or similar tools for wireframes and static visual designs.