
Sr. Data Scientist Resume


Plano, TX

SUMMARY

  • More than 9 years of extensive experience as a data scientist and machine learning engineer in data analytics, with proven technical and analytical abilities and strong analytics and data management skills.
  • 5+ years of data science experience in various organizations and more than 5 years of data analysis and management experience during PhD work in physics, including strong problem-solving and advanced statistical analysis skills.
  • Completed computer simulations of the filtering, guiding, trapping, and trajectories of cold nitric oxide (NO) molecules in electrostatic, magnetic, and electromagnetic fields during PhD graduate work.
  • Performed extensive predictive modeling and analysis of lab data corresponding to the transition of electrons from one energy state to another when molecules/atoms are exposed to laser light. Predicted the electron density, transition probabilities, and energy levels of different electronic states from data collected with LabVIEW in the lab and from a theoretical model I developed.
  • Practical understanding of statistical modeling and supervised/unsupervised/reinforcement machine learning techniques, with keen interest in applying these techniques to predictive analytics.
  • Skilled at engineering, developing, deploying, and maintaining business systems, with technical expertise including hands-on solution development and implementation experience. Able to work effectively with diverse teams of co-workers and researchers; enthusiastic and self-motivated; flexible with work schedules and teamwork needs; project management skills; research-driven personality.
  • Able to adapt to a fast-paced, dynamic work environment, with interpersonal, leadership, and coordination skills.
  • Exploring opportunities in collaborative environments to develop solutions, including designing, coding, and testing innovative applications in areas such as data science, artificial intelligence, deep learning, computational linguistics, Natural Language Processing (NLP), and advanced semantic information search, extraction, induction, classification, and exploration.

TECHNICAL SKILLS

Data Analysis and Data Science: Generalized Linear Models, Boxplots, K-Means Clustering, SVN, PuTTY, WinSCP, Redmine (Bug Tracking, Documentation, Scrum), Neural Networks, AI, Teradata, Tableau, H2O Flow, Splunk, GitHub, Linear Regression, Logistic Regression, Decision Trees, Random Forests, KNN, XGBoost, Ensembles (Bagging, Boosting), Support Vector Machines, Deep Neural Networks, graph/network analysis, and time series analysis (ARIMA)

SQL: MySQL, SQL Server, Oracle, SQLite, PostgreSQL

Programming Languages: R (packages: stats, zoo, Matrix, data.table, openssl), Java 1.7/1.8, Maven, Scala, Spark 2.x (Spark SQL, Spark Streaming), Hadoop, MapReduce, HDFS, Eclipse, Anaconda, Jupyter Notebook

Python Versions: Python 2.7 and 3.6 (packages: NumPy, SciPy, Pandas, scikit-learn, Matplotlib, Seaborn, statsmodels).

Other Tools and Platforms: Excel, PowerPoint, Zeppelin, Tableau Desktop (10.0), MATLAB, Windows 7/8, Linux (Ubuntu), and macOS.

PROFESSIONAL EXPERIENCE

Confidential, Plano, TX

Sr. Data Scientist/ Machine Learning Engineer

Responsibilities:

  • Designed and used logistic regression, multivariate regression, clustering algorithms, NLP, dynamic programming, support vector machines, ensemble trees, simulation, scenario analysis, modeling, GLM/regression, random forest, boosting, text mining, social network analysis, and neural networks, using standard Python machine learning packages such as pandas, NumPy, SciPy, scikit-learn, TensorFlow, Keras, and Theano in Jupyter Notebook.
  • Performed hyperparameter tuning, debugging, and troubleshooting of machine learning models and automated processes to optimize performance.
  • Worked on data mining, data collection, data cleansing, dataset preparation, and ETL; stored information for machine learning models in an RDBMS using SQL.
  • Used predictive modeling, machine learning, statistics, trend analysis, and other data analysis techniques to collect, explore, and identify data from internal and external sources to explain customer behavior and segmentation and to support text analytics, big data analytics, sales analysis, product-level analysis, and customer experience analysis.
  • Solved NLP problems such as language modeling, intent recognition, and entity extraction; developed prototypes and implemented complete solutions for Google Dialogflow chatbot NLP problems.
  • Delivered analytics use cases with Aster Analytics functions, primarily nPath, Text Parser, collaborative filtering, SQL-MR, query vectors, regression models, correlations, pattern matching, and text mining. Used the Aster ACT query tool for bulk loading data from Aster to Aster and from Aster to Teradata. Used the BTEQ utility in Teradata to run DDL and DML statements, import data into Teradata tables from flat files, and extract data from tables into files or reports.
  • Designed, developed, and tested applications for text processing, such as entity matching, text categorization, named-entity extraction, sentiment analysis, customer segmentation, and customer behavior patterns, using advanced SQL, statistical, and machine learning methods in the Aster SQL-MapReduce framework, such as k-means, minhash, k-modes, frequent paths, sessionize, time series analysis (ARIMA), and other data science models.
  • Understood customer business use cases and translated them into analytical data applications and data science models. Utilized Aster technology, combining the Teradata Aster database and SQL-MR functionality, to deliver innovative analytic solutions to customers.
  • Used various Aster analytical functions, such as text analysis, KNN, linear and logistic regression, random forest, AdaBoost, XGBoost, decision tree, support vector machine, and other data science models, for prediction and data mining using Aster SQL in Teradata Studio.
  • Created complex SQL views and stored procedures in Teradata SQL, including regexes for more than 7K columns. Programmed stored procedures, user-defined functions, database triggers, packages, and all other database objects needed in the Teradata environment to implement business logic; wrote new stored procedures, modified existing ones, and tuned them for performance.
  • Created SSH tunnels using PuTTY for secure connections and wrote .sh scripts to deploy development work to the DEV, UAT, and PROD environments.
  • Used Autosys to define, schedule, monitor, and control workloads and to report on jobs in the automated job control system.
  • Developed responsive web apps and integrated APIs using Node.js; worked on Predictive Maintenance Systems (PMS) that use machine learning to monitor for future system failures.
  • Created Google Dialogflow natural language processing and machine learning chatbots and recommended Dialogflow chatbots for the company.
  • Solved exploratory data analysis (EDA) and classification/regression machine learning problems using Spark and PySpark DataFrames in Python.
  • Created applications that use deep learning, including an LSTM recurrent neural network (RNN) sequence-to-sequence (Seq2Seq) model; used convolutional neural networks (ConvNets or CNNs) for image recognition and classification.
  • Designed and evaluated an encoder-decoder network with and without attention for a sequence prediction problem; used an encoder-decoder recurrent neural network with attention to improve model accuracy (see the sketch after this list).
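The following is a minimal, illustrative sketch of the kind of Keras LSTM encoder-decoder (Seq2Seq) model described above, not the production code; the vocabulary size and hidden-state width are assumptions chosen for the example, and the attention variant is omitted for brevity.

```python
# Minimal Keras LSTM encoder-decoder (Seq2Seq) skeleton.
# num_tokens and latent_dim are illustrative assumptions.
from tensorflow.keras.layers import LSTM, Dense, Input
from tensorflow.keras.models import Model

num_tokens = 64    # hypothetical one-hot vocabulary size
latent_dim = 128   # LSTM hidden-state size

# Encoder: read the source sequence and keep only its final states.
encoder_inputs = Input(shape=(None, num_tokens))
_, state_h, state_c = LSTM(latent_dim, return_state=True)(encoder_inputs)

# Decoder: generate the target sequence, initialized with the encoder states.
decoder_inputs = Input(shape=(None, num_tokens))
decoder_seq = LSTM(latent_dim, return_sequences=True)(
    decoder_inputs, initial_state=[state_h, state_c])
decoder_outputs = Dense(num_tokens, activation="softmax")(decoder_seq)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.summary()
```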

Confidential, Irving, TX

Sr. Data Scientist

Responsibilities:

  • Involved in the design, implementation, development, and integration of an artificial intelligence solution. Utilized statistical Natural Language Processing for sentiment analysis, mined unstructured data to create insights, and analyzed and modeled structured data using advanced statistical methods and implemented algorithms.
  • Used Python libraries such as pandas, scikit-learn, NumPy, SciPy, Keras, and TensorFlow to predict categories based on location, time, and other features, applying linear regression, k-means clustering, deep neural networks, logistic regression, decision trees, random forests, KNN, XGBoost, ensembles (bagging, boosting), support vector machines, graph/network analysis, and time series analysis (ARIMA).
  • Trained a large list of models, evaluated and compared them, and selected the best models for prediction and forecasting. Established and maintained effective processes for k-fold validation and confusion-matrix evaluation and updated and validated different models. Developed convolutional neural networks (CNNs) in Python using the TensorFlow deep learning library to address image recognition problems.
  • Developed an LSTM recurrent neural network (RNN) in Python using the Keras and TensorFlow deep learning libraries to address time-series prediction problems, such as predicting sales through 2022. Used LSTM RNNs to solve use cases such as natural language processing (NLP).
  • Used the H2O Flow and Driverless AI automation tools for data science and machine learning workflows such as feature engineering, model validation, model tuning, model selection, model deployment, and visualization, and to improve models for data analytics and prediction.
  • Built ensemble models in machine learning, such as bootstrap aggregating (bagging: bagged decision trees and random forests) and boosting (gradient boosting, XGBoost, and AdaBoost), to improve accuracy, reduce variance and bias, and improve model stability. Used random forest importance, SelectKBest, and RFE for feature selection.
  • Familiar with Azure Machine Learning Model Management for managing and deploying machine learning workflows and models into production.
  • Used Spark SQL for ETL of raw data; worked on feature selection, data wrangling, and feature extraction, and performed ETL on Hadoop.
  • Used Tableau to connect to files, relational sources, and big data sources to acquire, visually analyze, and process data, and to create and distribute interactive, shareable dashboards showing the trends, variations, and density of the data in graphs and charts.
  • Took sole ownership of the analytics solution from requirements through to delivery.
  • Wrote complex database SQL queries in Oracle, PostgreSQL, and MySQL and developed data models for data analysis and extraction.
  • Familiar with AR (autoregressive), MA (moving average), and ARIMA (autoregressive integrated moving average) time series models; predicted CVS market demand using an ARIMA time series forecasting model with grid search (see the sketch after this list).
  • Captured and elaborated analytics solution requirements, working with customers and product managers, and created advanced analytics solutions. Combined Zeppelin, Spark SQL, and Spark MLlib to simplify exploratory data science.
  • Used the Apache Zeppelin web-based notebook to bring data ingestion, data exploration, visualization, sharing, and collaboration features to Hadoop and Spark.
  • Experienced with big data techniques, i.e., Hadoop, MapReduce, NoSQL, Pig/Hive, Spark, and Spark MLlib; programming knowledge in Java, Scala, Spark, SQL, and Python.
  • Used Kafka as a message broker to collect large volumes of data and analyze the collected data in a distributed system. Business data was fed into Kafka and then processed using Spark Streaming in Scala for real-time analytics and data science.
  • Used Spark Streaming for real-time sentiment analysis, crisis management, and service adjustment. Used Amazon Simple Storage Service (Amazon S3) to store and retrieve data and GitHub for version control.
  • Worked with cloud computing to store, retrieve, and share large quantities of data in AWS via the Amazon S3 object store; read and wrote to S3 from Apache Hadoop, Apache Spark, and Apache Hive. Used PCA for dimensionality reduction and created k-means clusterings.
  • Familiar with NoSQL databases and graphical analyses; evaluated and performed POCs on new strategic technical products and applications.
  • Used Apache Flume and Apache Sqoop to load both structured and unstructured streaming data to HDFS, Hive, and HBase.
  • Worked in the Agile methodology with self-organizing, cross-functional teams sprinting toward results in fast, iterative, incremental, and adaptive steps.
  • Developed scalable, reliable, and efficient enterprise applications using Java, Spring, Spring MVC, Hibernate, Web Services, RESTful, JSF, JDBC, JSP, Servlets, EJB, JMS, XML, JAXB, SQL, JAX-WS, and Unix shell scripting. Used Jenkins to perform continuous integration and build automation.
  • Created microservices with Spring Boot based on RESTful APIs. Used the Log4j framework to log/track the application. Wrote and tested RESTful and SOAP web services in SoapUI.
  • Used Splunk to search, investigate, troubleshoot, monitor, visualize, alert, and report on machine-generated big data. Used Splunk Enterprise Security (ES) to identify and track security incidents, analyze security risks, apply predictive analytics, and discover threats.
  • Used k-nearest neighbors, k-means clustering, and support vector machines (SVM) for anomaly detection; used k-means clustering for customer segmentation based on sales behavior and for text mining by clustering text documents in scikit-learn.
  • Designed, developed, and implemented end-to-end cloud-based machine learning production pipelines (data exploration, sampling, training data generation, feature engineering, model building, and performance evaluation)
  • Worked with different clustering methods such as k-means, DBSCAN, and Gaussian mixtures.
  • Used the YAML data serialization language for configuration files and debugging output.
  • Developed responsive web apps and integrated APIs using Node.js. Created the POC for Google Dialogflow chatbots.
  • Created and presented Google Dialogflow chatbot efficiency reports to senior management; developed system flow diagrams to automate a business function and identify impacted systems, and metrics to depict the cost-benefit analysis of the solutions developed.
  • As a data curator, worked with data engineers, data analysts, and data scientists to develop a deep understanding of how data is used by the business and IT, and to make the data available.
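Below is a minimal, illustrative sketch of the ARIMA grid-search approach referenced above, using statsmodels; the synthetic sales series, the search ranges, and the 12-step horizon are assumptions for the example, not the original demand data.

```python
# ARIMA order selection via grid search over (p, d, q), chosen by AIC.
import itertools
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Synthetic monthly demand series (illustrative stand-in for real data).
rng = np.random.default_rng(0)
sales = pd.Series(100 + np.cumsum(rng.normal(size=120)),
                  index=pd.date_range("2015-01-01", periods=120, freq="M"))

best_aic, best_order = np.inf, None
for p, d, q in itertools.product(range(3), range(2), range(3)):
    try:
        fit = ARIMA(sales, order=(p, d, q)).fit()
        if fit.aic < best_aic:
            best_aic, best_order = fit.aic, (p, d, q)
    except Exception:
        continue  # some orders fail to converge; skip them

print(f"best order {best_order} (AIC={best_aic:.1f})")
forecast = ARIMA(sales, order=best_order).fit().forecast(steps=12)
print(forecast.head())
```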

Confidential, Philadelphia, PA

Sr. Data Scientist/ Machine Learning Engineer

Responsibilities:

  • Gathered, documented, and implemented business requirements for analysis or as part of long-term document/report generation. Analyzed large volumes of data and provided results to technical and managerial staff.
  • Worked with machine learning algorithms on both unstructured and structured data, such as linear and logistic regression, decision trees, random forests, support vector machines, ensemble models, neural networks, KNN, and time series analysis. Evaluated, tested, compared, and validated these different models before selecting the best model for prediction.
  • Used pandas, scikit-learn, NumPy, SciPy, statsmodels, and other Python libraries to build predictive forecasts for time series analysis using AR (autoregressive), MA (moving average), and ARIMA (autoregressive integrated moving average) models.
  • Used the k-fold cross-validation technique to improve model performance and to test models on sample data before finalizing them.
  • Used confusion matrices and ROC charts to evaluate classification models.
  • Have knowledge of numerical optimization, anomaly detection and estimation, A/B testing, statistics, and Maple. Collaborated with product management and other departments to gather requirements.
  • Used k-means clustering to group similar data and documented the results.
  • Performed feature engineering to convert arbitrary data into well-behaved data, including handling categorical features, text features, image features, and missing data (see the sketch after this list).
  • Worked on data generation, machine learning for anti-fraud detection, data modeling, operations decisioning, and loss forecasting, such as product-specific fraud.
  • Used Monte Carlo simulation algorithms to obtain numerical results by running simulations many times in succession to calculate probabilities with machine learning.
  • Wrote complex SQL to interact with a database of more than 98 million rows using Teradata Studio.
  • Extracted, transformed, and loaded (ETL) data into a Postgres database using Python scripts.
  • Built and maintained SQL scripts, indexes, and complex queries for data analysis and extraction.
  • Trained and supervised up to 8 other team members in the SQL/Scala/Spark programming languages and assisted in the installation and upgrading of Python, Scala, SQL, Java, and Spark. Worked with various data pools and DBAs to gain access to data.
  • Splunk ES was used for application management, security, performance management, and analytics for the public APIs. Splunk was used for collecting, indexing, monitoring, and visualization of the data.
  • Used Paxata to combine, clean, and shape data prior to analytics, to bring together data, to find and fix dirty or missing data, and to share and reuse data projects across teams.
  • Used Kibana and Tableau as business intelligence tools to visually analyze data and to show the trends, variations, and density of the data in graphs and charts.
  • Used Kafka and Spark Streaming to collect large volumes of data and analyze the collected data in a distributed system. Used Spark MLlib for model evaluation and prediction and the Natural Language Toolkit (NLTK) for data extraction for sentiment analysis. Experienced with D3 and Django web apps.
  • Used Keras along with numerical computation libraries such as Theano and TensorFlow to develop and evaluate deep neural network models.
  • Worked with public/private cloud computing technologies (IaaS, PaaS, and SaaS) and Amazon Web Services (AWS) for customer analytics and predictions.
  • Used MapReduce/Hive/Pig and Hadoop to store, process, and analyze huge amounts of unstructured data from 100 million users, for example to offer a gift card to the top 10 customers.
  • Worked with Core Java, Spring, Spring MVC, Hibernate, and Spring RESTful web services to create the application and to get data from the database for data analytics. Wrote and tested RESTful web services in SoapUI.
  • Formulated procedures for integrating R programming plans with data sources and delivery systems; used the R language for prediction.
  • Experienced with continuous integration, testing, and deployment. Worked in an Agile software development paradigm (e.g., Scrum) and maintained up-to-date knowledge in the appropriate technical areas.
  • Hands-on experience building rich UIs and striking data visualizations using JavaScript.
  • Used the NLTK open-source natural language processing (NLP) platform in Python for textual tokenization in sentiment analysis of text messages.
  • Used Google Cloud Platform services for compute, storage, networking, big data, and machine learning, as well as cloud management, security, and developer tools.
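A minimal, illustrative sketch of the feature-engineering steps described above (categorical encoding and missing-value imputation with scikit-learn); the column names and toy data are assumptions for the example, not the original fraud data.

```python
# Turn "arbitrary" data (categoricals + missing values) into a
# well-behaved numeric matrix with a scikit-learn ColumnTransformer.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "product": ["card", "loan", "card", np.nan],  # categorical w/ missing
    "amount":  [120.0, np.nan, 85.5, 42.0],       # numeric w/ missing
})

prep = ColumnTransformer([
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("onehot", OneHotEncoder(handle_unknown="ignore"))]),
     ["product"]),
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]),
     ["amount"]),
])

X = prep.fit_transform(df)  # ready for any downstream model
print(X.shape)
```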

Confidential, New York

Sr. Data Scientist / Machine Learning Engineer

Responsibilities:

  • Performed advanced statistical analysis, including generalized linear models, linear and logistic regression, decision trees, random forests, ensembles (bagging, boosting), support vector machines, neural networks, KNN, k-means clustering, XGBoost, graph/network analysis, and time series analysis, using pandas, scikit-learn, NumPy, SciPy, and other Python libraries to develop predictive models.
  • Worked with big data techniques, i.e., Hadoop, MapReduce, NoSQL, Pig/Hive, Spark/Shark, and MLlib, along with Scala, NumPy, SciPy, pandas, and scikit-learn.
  • Used query languages such as SQL, Hive, and Pig; experienced with NoSQL databases such as MongoDB, Cassandra, and HBase.
  • Worked in an Agile software development paradigm (e.g., Scrum) and maintained up-to-date knowledge in the appropriate technical areas. Developed all weekly, monthly, and quarterly reports and dashboards.
  • Used complex Excel formulas and pivot tables to manipulate large datasets.
  • Splunk ES was used for application management, security, performance management, and analytics for the public APIs. Splunk was used for collecting, indexing, monitoring, and visualization of the data.
  • Used the k-fold cross-validation technique to improve model performance and to test models on sample data before finalizing them. Used confusion matrices, gain and lift charts, K-S charts, and ROC charts to evaluate classification models (see the sketch after this list).
  • Used QlikView and Tableau to create guided analytics applications as well as dashboards designed for business challenges.
  • Performed feature engineering to convert arbitrary data into well-behaved data, including handling categorical features, text features, image features, and missing data.
  • Used Tableau and Kibana for data visualization such as trend lines, charts, and graphs.
  • Collaborated with others in data science and analytics on data mining and predictive modeling as required; created anomaly detection systems and constantly tracked their performance.
  • Selected features and built and optimized classifiers using machine learning techniques.
  • Enhanced data collection procedures to include information relevant for building analytic systems; processed, cleansed, and verified the integrity of data used for analysis.
  • Worked with the team to select and implement model development processes from statistics and/or machine learning to answer business problems.
  • Extracted data from a variety of relational databases and manipulated and explored the data using quantitative, statistical, and visualization tools. Wrote complex SQL to interact with an Aster database of more than a few million rows using Teradata Studio.
  • Developed scalable, reliable, and efficient enterprise applications using Java, Spring, Spring MVC, Hibernate, Web Services, RESTful, JSF, JDBC, JSP, Servlets, EJB, JMS, XML, XSLT, JAXB, SQL, JAX-WS, and Unix shell scripting. Exposed microservices based on RESTful APIs using Spring Boot with Spring MVC. Used the Log4j framework to log/track the application.
  • Wrote and tested RESTful and SOAP web services in SoapUI.
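Below is a minimal, illustrative sketch of the k-fold validation and confusion-matrix/ROC evaluation workflow referenced above; the synthetic dataset and random-forest classifier are assumptions chosen for the example.

```python
# k-fold cross-validation plus confusion-matrix / ROC AUC evaluation.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0)

# k-fold cross-validation before finalizing the model.
print("5-fold accuracy:", cross_val_score(clf, X, y, cv=5).mean())

# Confusion matrix and ROC AUC on a held-out split.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0)
clf.fit(X_tr, y_tr)
print(confusion_matrix(y_te, clf.predict(X_te)))
print("ROC AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```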

Confidential, Wilmington, DE

Data Scientist/Big data Analyst

Responsibilities:

  • Gathered information from various programs, analyzed time requirements and prepared documentation to change existing programs.
  • Created and used advanced machine learning algorithms and statistics: regression, simulation, modeling, clustering, decision trees, neural networks, GLM/regression, social network analysis, random forests, boosting, trees, and text mining.
  • Experienced with k-means, hierarchical clustering, and mixture modeling (see the sketch after this list). Artificial intelligence and NLP: performed feature engineering to convert arbitrary data into well-behaved data, including handling categorical features, text features, image features, and missing data.
  • Working knowledge of Natural Language Processing (NLP) and Natural Language Generation (NLG) using Python. Worked collaboratively with customers and team members supporting large business initiatives.
  • Worked closely with other data scientists to build a better understanding of how data science integrates with the big database. Worked on the Data and Analytics Solutions team utilizing Scrum practices and techniques.
  • Java/J2EE application: implemented Java/J2EE technologies for application deployment using JSP, Servlets, Web Services (SOAP and RESTful), Spring MVC, and Hibernate. Worked with business stakeholders to identify business problems, opportunities, and/or initiatives for which analytics models may provide insights that can contribute to or drive the development of an approach or solution.
  • Used Eclipse, RAD, and RTC as IDEs. Used Tomcat, WebLogic, WebSphere, and JBoss as application servers. Worked with the latest version of Spring as the core business logic implementer; experienced with Spring annotations, Spring Boot, and Spring Data.
  • Programmed SQL scripts inside the R environment, including SQL commands such as joins over multiple tables.
  • Scripted ad hoc SQL queries for testing the database and explored data issues arising from other manual testing procedures.
  • Reviewed and independently tested the effectiveness and accuracy of image analytics, NLP, and machine learning models; utilized expertise in models that leverage the newest data sources, technologies, and tools.
  • Used QlikView to create advanced reports from multiple data sources; processed, cleaned, and verified the integrity of data used for analysis.
  • Worked with common data science toolkits such as R, Weka, NumPy, scikit-learn, and MATLAB, and with data visualization tools such as D3.js and ggplot.
  • Used query languages such as SQL, Hive, and Pig; experienced with NoSQL databases such as MongoDB, Cassandra, and HBase; applied statistics skills such as distributions, statistical testing, and regression.
  • Used Stata for data manipulation, production of tables and graphs, linear regression analysis, and logistic modeling.
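A minimal, illustrative sketch of the k-means clustering work mentioned above, using scikit-learn on synthetic data; the blob dataset and the candidate cluster counts are assumptions for the example, with the silhouette score as one common sanity check on the number of clusters.

```python
# k-means clustering on synthetic data, comparing cluster counts
# with the silhouette score (higher is better).
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

for k in (2, 3, 4, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(f"k={k}: silhouette={silhouette_score(X, labels):.3f}")
```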

Confidential

Graduate Teaching and Research Assistant

Responsibilities:

  • Used theoretical knowledge and hands-on experience in statistical techniques to analyze the collected data; used Excel, the Origin graphing software, and computer programming to analyze data and to sketch and draw charts and other visual materials required to supplement explanatory text; communicated effectively with professors to support research work using PowerPoint.
  • Took an effective scientific writing course for writing publications in scientific journals and presentations; assisted in grant writing; attended scientific conferences and meetings and presented research posters/papers at many of them; able to manage research budgets, resources, and timelines.
  • Calculated the trap depth along all directions of two, four, and six axially magnetized high-flux NdFeB permanent magnets that are positioned and aligned along three mutually orthogonal axes of rectangular bar magnets, ring cylindrical magnets and cylindrical magnets using computer programming.
  • Experience evaluating and monitoring social care tools and technologies, working directly with vendors and key stakeholders across the organization.
  • Numerically calculated the on-axis and off-axis magnetic field at any point between the two coils for both the gradient coil and the Maxwell coil using computer programming (see the sketch after this list).
  • Used computer programming to design and plan the magnetic trapping of neutral particles and analyzed the results; modeled the gradient coil and Maxwell coil using the big data.
  • Used Mathematica to calculate the trajectory of particles in the electromagnetic field, magnetic trap, magneto-optical trap (MOT), electromagnet, gradient coils, Stark hexapole guide, and permanent-magnet Zeeman slower. Wrote new LabVIEW programs and modified existing ones to collect data.
  • Used computer programming to calculate the time of flight, nitric oxide rotational synthetic spectra, Zeeman splitting, convolution spectra, flux from the hexapole guide, and enhancement curves.
  • Used the SIMION software package to calculate electric fields and the trajectories of charged particles in those fields; calculated spin density, relaxation times, flow, and spectral shifts; experienced with optimization of the permanent magnetic trap, gradient coil, Zeeman slower, and Stark hexapole guide.
  • Designed a magnetic trap using I-DEAS (Integrated Design and Engineering Analysis Software, a computer-aided design package) and NX 7.5.
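A minimal, illustrative sketch of the on-axis coil-pair calculation referenced above, built from the standard single-loop Biot-Savart result summed over a coaxial pair; the radius, separation, and current values are assumptions for the example, and maxwell=True flips one current to give the anti-parallel (gradient/Maxwell) configuration.

```python
# On-axis magnetic field of a two-coil pair from the single-loop
# Biot-Savart formula B(z) = mu0*I*R^2 / (2*(R^2 + z^2)^(3/2)).
import numpy as np

MU0 = 4e-7 * np.pi  # vacuum permeability (T*m/A)

def loop_bz(z, radius, current):
    """On-axis field of a single circular loop centered at z = 0."""
    return MU0 * current * radius**2 / (2.0 * (radius**2 + z**2) ** 1.5)

def pair_bz(z, radius, separation, current, maxwell=False):
    """Field of two coaxial loops at z = +/- separation/2.
    maxwell=True reverses one current (anti-parallel gradient coil)."""
    sign = -1.0 if maxwell else 1.0
    return (loop_bz(z - separation / 2, radius, current)
            + sign * loop_bz(z + separation / 2, radius, current))

z = np.linspace(-0.05, 0.05, 5)  # sample points between the coils (m)
print(pair_bz(z, radius=0.05, separation=0.05, current=10.0))
print(pair_bz(z, radius=0.05, separation=0.05, current=10.0, maxwell=True))
```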
