
Data Scientist Resume

SUMMARY

  • 8+ years of experience in the IT field, including 4+ years as a Data Scientist, with strong technical, business, and communication skills
  • Expertise in Statistical analysis, Predictive modeling, Text mining, Supervised learning, Unsupervised learning, and Reinforcement learning
  • Strong mathematical background in Linear algebra, Probability, Statistics, Differentiation and Integration
  • Extensively involved in Data preparation, Exploratory analysis and Predictive modeling with expert knowledge in building Propensity models
  • Proficient in Data Mining methods, Factor Analysis, ANOVA, Hypothesis testing, and the Normal distribution
  • Experience with Deep Learning techniques such as Convolutional Neural Networks and Recurrent Neural Networks using Keras and TensorFlow
  • Worked with several Python packages such as NumPy, SciPy, Matplotlib, Beautiful Soup, Pickle, and PyTables
  • Responsible for implementing data mining and statistical machine learning solutions to various business problems such as sales lead scoring, supply chain optimization, demand forecasting, and targeted marketing
  • Proficient in implementing Dimensionality Reduction techniques such as Principal Component Analysis (PCA), t-Distributed Stochastic Neighbor Embedding (t-SNE), and Linear Discriminant Analysis (LDA); a minimal sketch follows this list
  • Experience in implementing data analysis with various analytic tools, such as Anaconda 4.0 and Jupyter Notebook 4.x
  • Extensively worked on Spark with Scala on clusters for computational analytics; installed Spark on top of Hadoop and built advanced analytical applications using Spark with Hive and SQL/Oracle
  • Experience with statistical and quantitative modeling, forecasting, and trend analysis
  • Excellent understanding of Analytics concepts and Supervised Machine Learning algorithms such as Logistic Regression, Linear Regression, K-Nearest Neighbors, Naïve Bayes, Support Vector Machines, Decision Trees, and Ensemble models: Random Forests, Gradient Boosted Decision Trees, and Stacking models
  • Hands-on advanced SQL experience summarizing, transforming, segmenting, and joining datasets
  • Well experienced in Normalization and De-Normalization techniques for optimal performance in relational and dimensional database environments
  • Experience in balancing datasets using resampling methods such as oversampling and undersampling
  • Developed an algorithm in R that automated financial forecasting
  • Proficient in Natural Language Processing methods for Sentiment Analysis using Word Embeddings such as Word2Vec and GloVe, as well as tf-idf
  • Worked with NoSQL databases such as DynamoDB, MongoDB, HBase and Cassandra, as well as Amazon Redshift
  • Implemented infrastructure-as-code processes by creating templates and scripts to automate provisioning of services in GCP and AWS
  • Combined cybersecurity domain expertise and contemporary data science skills to enhance adversary detection, network defense, and SOC process improvement
  • Built analytics leveraging heuristics and machine learning to identify malicious network traffic, endpoint behavior, user behavior, and files
  • Developed Spark applications using Scala and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources
  • Utilized automation tools such as Terraform, CloudFormation, and Ansible to provision resources
  • Implemented monitoring and alerting for all services and applications hosted in GCP and AWS
  • Implemented sound IAM access-control policies and custom IAM roles to manage access to resources in GCP and AWS, and troubleshot access-related issues for users
  • Adept in employing data visualization tools such as Tableau and Python libraries Matplotlib, Seaborn and Plotly to create visually appealing plots and interactive dashboards
  • Excellent programming skills at a higher level of abstraction using Scala, Java, and Python
  • Experience in working on both Windows and Linux platforms
  • Experience in working with Agile and Scrum methodologies
  • Actively involved in all phases of the Data Science project life cycle including Data Extraction (ETL), Cleaning, Preprocessing, Visualization, Modeling and Version Control (Git)
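
As a brief illustration of the dimensionality reduction workflow referenced above, the following is a minimal scikit-learn sketch; the input file, feature columns, and component settings are hypothetical placeholders rather than project specifics.

```python
# Minimal dimensionality reduction sketch (hypothetical data and parameters).
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

df = pd.read_csv("features.csv")           # hypothetical numeric feature matrix
X = StandardScaler().fit_transform(df)     # scale features before projecting

# Linear reduction with PCA, keeping enough components to explain 95% variance
pca = PCA(n_components=0.95)
X_pca = pca.fit_transform(X)

# Non-linear 2-D embedding with t-SNE for visualizing the reduced data
X_2d = TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(X_pca)
```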

TECHNICAL SKILLS

Languages & Packages: Python, SQL, R, TensorFlow, PySpark, NumPy, Pandas, Keras, NLTK, Matplotlib, MySQL.

Machine Learning Algorithms: Logistic Regression, Linear Regression, K-Means Clustering, Decision Trees, Support Vector Machines, Naïve Bayes, Hierarchical Clustering, Density-based Clustering, Trigger Word Detection, Speech Recognition and Language Translation

Deep Learning Techniques: Artificial Neural Networks, Convolutional Neural Networks, Multi-layer Perceptrons, Recurrent Neural Networks, LSTM, Backpropagation, Chain Rule, Choosing Activation Functions, Dropout, Optimization Algorithms, Vanishing and Exploding Gradients, Optimized Weight Initializations, Max Pooling, Batch Normalization
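
As a brief, hedged illustration of several of these techniques working together (convolution, max pooling, batch normalization, dropout, and activation choice), the following Keras sketch uses placeholder input shapes and layer sizes; it is not a model from any of the projects below.

```python
# Illustrative Keras CNN combining max pooling, batch normalization, and dropout.
# Input shape, layer sizes, and class count are placeholder values.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    layers.BatchNormalization(),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```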

Databases: NoSQL (DynamoDB, MongoDB), Redshift

Business Intelligence Tools: Tableau, Power BI, SAS, SSIS, SSRS

Visualization Tools: Tableau, Plotly

IDE: Jupyter Notebook, Spyder, Sublime text

Microsoft Stack & Version Control: Microsoft Excel, Access, Visio, PowerPoint, Git

Operating Systems: Windows, Linux

Methodologies: Agile, Scrum

PROFESSIONAL EXPERIENCE

Data Scientist

Confidential

Responsibilities:

  • Performed Data Cleaning, Feature Scaling, Feature Engineering and Exploratory Data Analysis to maximize insight, detect outliers and extract important features for modeling
  • Implemented Principal Component Analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE) dimensionality reduction algorithms to reduce dataset dimensionality
  • Implemented Clustering algorithms for market segmentation to analyze customer behavior
  • Trained several machine learning models on selected features to predict Customer churn
  • Utilized cross-validation techniques and LASSO regularization to avoid overfitting, then evaluated the models with performance metrics robust to imbalanced classes using the Confusion Matrix and Classification Report (a minimal sketch follows this list)
  • Tuned the model hyperparameters using Bayesian optimization and grid search to achieve higher levels of model performance
  • Improved model accuracy by 5% by introducing Ensemble techniques: Bagging, Gradient Boosting, Extreme Gradient Boosting (XGBoost) and Adaptive Boosting (AdaBoost)
  • Feature engineered email data by employing NLP techniques like Word2Vec, BOW and tf-idf
  • Performed Sentiment Analysis on email feedback to understand the emotional tone behind words
  • Utilized data visualization tools such as Tableau and Python's data visualization libraries to communicate findings to the data science, marketing and engineering teams
  • Involved in various pre-processing phases of text data like Tokenizing, Stemming, Lemmatization and converting the raw text data to structured data
  • Generated ad-hoc reports for business teams in Tableau to help the client make impactful data-driven decisions
  • Used Tableau dashboards to communicate results to team members and to other data science, marketing and engineering teams
  • Communicated results to the operations team to support decision-making
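
The following is a minimal sketch of the churn-classification evaluation approach described above (stratified cross-validated grid search over an L1-regularized model, evaluated with imbalance-aware metrics); the feature matrix X and labels y are assumed to be prepared beforehand, and the hyperparameter grid is illustrative only.

```python
# Sketch: churn classification with LASSO (L1) regularization, cross-validated
# tuning, and imbalance-aware evaluation. X and y are assumed to exist already.
from sklearn.model_selection import train_test_split, GridSearchCV, StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}           # illustrative grid
clf = GridSearchCV(
    LogisticRegression(penalty="l1", solver="liblinear", max_iter=1000),
    param_grid,
    cv=StratifiedKFold(n_splits=5),
    scoring="f1",        # more informative than accuracy on imbalanced classes
)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
```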

Environment: NumPy, Pandas, Matplotlib, Seaborn, scikit-learn, Tableau, Linux, Git, Microsoft Excel, Hadoop, PCA, Logistic Regression, CRF, TensorFlow, Keras, Natural Language Toolkit, Named Entity Recognition, Natural Language Generation

Data Scientist

Confidential

Responsibilities:

  • Performed Data Profiling to learn about behavior across various features such as traffic pattern, location, date, and time
  • Applied various machine learning algorithms and statistical models such as decision trees, regression models and K-Means using Python and R
  • Used the K-Means clustering technique to identify outliers and to classify unlabeled data (a minimal sketch follows this list)
  • Ensured that the model had a low False Positive Rate
  • Designed and created reports that used gathered metrics to draw logical conclusions about past and future behavior
  • Worked on feature engineering such as feature creation, feature scaling and One-Hot encoding with scikit-learn
  • Applied Logistic Regression, Random Forest, Decision Tree and SVM models to classify whether a package would be delivered on time for a new route
  • Implemented public segmentation by applying the K-Means algorithm
  • Implemented a rule-based expert system from the results of exploratory analysis and information gathered from people in different departments
  • Generated detailed reports after validating the graphs using R and adjusting the variables to fit the model
  • Performed Data Cleaning, feature scaling and feature engineering using the pandas and NumPy packages in Python
  • Created Data Quality Scripts using SQL and Hive to validate successful data load and quality of the data.
  • Wrote MapReduce code to process and parse data from various sources, storing the parsed data in HBase and Hive using HBase-Hive integration
  • Created SQL tables with referential integrity and developed advanced queries using stored procedures and functions using SQL server management studio.
  • Used packages such as dplyr, tidyr and ggplot2 in RStudio for data visualization, generating scatter plots and high-low graphs to identify relationships between variables
  • Created various types of data visualizations using Python and Tableau.
  • Communicated results to the operations team to support decision-making
  • Collected data needs and requirements by interacting with other departments
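
The preprocessing and K-Means clustering steps above can be sketched as a single scikit-learn pipeline; the input file, column names, and cluster count below are hypothetical placeholders, not the actual project data.

```python
# Sketch: one-hot encoding + scaling + K-Means clustering in one pipeline.
# File name, column names, and n_clusters are hypothetical placeholders.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.cluster import KMeans

df = pd.read_csv("routes.csv")                      # hypothetical input data
categorical = ["route", "carrier"]                  # hypothetical columns
numeric = ["distance", "package_weight"]

preprocess = ColumnTransformer([
    ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical),
    ("scale", StandardScaler(), numeric),
])

pipeline = Pipeline([
    ("prep", preprocess),
    ("kmeans", KMeans(n_clusters=5, n_init=10, random_state=42)),
])
labels = pipeline.fit_predict(df)                   # cluster label for each row
```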

Environment: R, Python 2.x, Linux, Tableau Desktop, SQL Server.

Confidential, Cedar Rapids, Iowa

Data Scientist

Responsibilities:

  • Worked alongside the data engineering and data science teams to build high-performance, low-latency systems to manage high-velocity data streams
  • Cleaned and Processed the unstructured fraudulent wire extraction data via Tokenizing, Stemming and Parts of Speech tagging to extract customer and bank information
  • Employed Regular Expressions and built Named Entity Recognition models using the Natural Language Toolkit (NLTK) and spaCy to pull out relevant customer information (a minimal sketch follows this list)
  • Built a Conditional Random Fields (CRF) model in scikit-learn for pattern recognition and compared the model predictions against actual outputs
  • Employed Python's visualization libraries to draw patterns from customer credit history and payment activity in historical data, analyze customer behavior, and generate reports for the business team
  • Assigned probability scores to credit card applicants based on their feature attributes to help the client make better decisions regarding an applicant's creditworthiness
  • Built NLP pipelines and collaborated with the DevOps team to deploy code into production
  • Performed web scraping of Google Alert links to pull out information for analysis
  • Utilized Random Forests and SVMs to classify whether the crime in question was of a financial nature
  • Extracted fraudster information for financial crimes and wrote SQL queries to perform Teradata search to determine if the concerned person is a Fidelity customer
  • Utilized results from Teradata search and customer information from Fraudulent wire Transactions to generate auto narratives of potential threats
  • Conceptualized and implemented Artificial Neural Networks as well as LSTM-based Recurrent Neural Networks in the pipeline to process continuous data and gather information in sequence
  • Created distributed TensorFlow environments across multiple CPUs and GPUs to run in parallel
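
As a minimal sketch of the entity-extraction step described above, the snippet below combines a pre-trained spaCy pipeline with a simple regular expression; the model name, example text, and date pattern are illustrative and do not reflect the project's actual custom pipeline or rules.

```python
# Sketch: named-entity extraction with spaCy plus a regular expression.
# Model name, sample text, and regex are illustrative placeholders.
import re
import spacy

nlp = spacy.load("en_core_web_sm")        # pre-trained English pipeline

text = "Wire of $25,000 sent by John Doe to Example Bank on 03/14."
doc = nlp(text)

# Keep people, organizations, and monetary amounts found by the NER model
entities = [(ent.text, ent.label_) for ent in doc.ents
            if ent.label_ in {"PERSON", "ORG", "MONEY"}]

# Supplement the NER output with a simple MM/DD date pattern
dates = re.findall(r"\b\d{2}/\d{2}\b", text)
print(entities, dates)
```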

Environment: NumPy, Pandas, Matplotlib, Seaborn, scikit-learn, Tableau, Linux, Git, Microsoft Excel, Hadoop, PCA, Logistic Regression, CRF, TensorFlow, Keras, Natural Language Toolkit, Named Entity Recognition, Natural Language Generation

Confidential

Data Analyst

Responsibilities:

  • Involved in designing Context Flow Diagrams, Structure Chart and ER - diagrams
  • Implemented the full lifecycle of Data Warehouses and Business Data Marts with Star Schemas, Snowflake Schemas, Slowly Changing Dimensions (SCD) and Dimensional Modeling
  • Conducted sessions, wrote meeting minutes and documented the requirements
  • Collected requirements from business users and analyzed based on the requirements
  • Extracted data from various sources (SQL Server, Oracle, text files and Excel sheets) and used ETL load scripts to manipulate, concatenate and clean source data
  • Involved in the data transfer, creating tables from various tables, views, procedures and SQL scripts
  • Ensured that the business metadata definitions of all data attributes and entities in a given data model were documented to meet standards
  • Involved in database development by creating PL/SQL Functions, Procedures, and Packages, Cursors, Error handling and views
  • Involved in creating scripts for data manipulation and management
  • Carried out extensive system study, design, development and testing in the Oracle environment to meet customer requirements
  • Served as a member of a development team providing business data requirements analysis services, producing logical and physical data models using Erwin 7.1
  • Responsible for defining database schemas to support business data entities and transaction processing requirements
  • Responsible for executing customized SQL code for ad hoc reporting duties and used other tools for routine report generation
  • Applied organizational best practices to enable application project teams to produce data structures that provide accurate, timely, and consistent data fit for its intended purposes
