Sr. Data Scientist/ Machine Learning Engineer Resume
Plano, TX
SUMMARY:
- More than 8 years of extensive experience in data management and analytics, including social-media-related data, with proven technical and analytical abilities and strong data management skills.
- Several years of data scientist experience across various organizations, plus extensive data analysis and management experience during my PhD work in physics; my experience includes strong problem-solving and advanced statistical analysis skills.
- Completed computer simulations of the filtering, guiding, trapping, and trajectory of cold molecular nitric oxide (NO) in electrostatic, magnetic, and electromagnetic fields during my graduate work as a PhD student.
- Performed extensive predictive modeling and analytical work with lab data describing the transition of electrons from one energy state to another when molecules/atoms are exposed to laser light. Predicted the electron density, transition probability, and energy levels in different electronic states from data collected using LabVIEW in the lab and from a theoretical model that I developed.
- Practical understanding of statistical modeling and supervised, unsupervised, and reinforcement machine learning techniques, with a keen interest in applying these techniques to predictive analytics.
- Skilled in engineering, developing, deploying, and maintaining business systems, with hands-on solution development and implementation experience. Able to work effectively with diverse teams of co-workers and researchers; enthusiastic and self-motivated; flexible with work schedules and teamwork needs; project management skills; research-driven personality.
- Able to adapt to a fast-paced, dynamic work environment, with interpersonal, leadership, and coordination skills.
- Exploring opportunities in data science, including deep learning, natural language processing, and artificial intelligence.
TECHNICAL SKILLS:
Data Analysis: Generalized Linear Models, Logistic Regression, Boxplots, K-Means Clustering, SVN, PuTTY, WinSCP, Redmine (Bug Tracking, Documentation, Scrum), Neural Networks, AI, Teradata, Tableau, H2O Flow
SQL: MySQL, SQL Server, Oracle, SQLite, PostgreSQL
R (Packages): stats, zoo, Matrix, data.table, openssl. Java 1.7/1.8, Maven, Scala, Spark 2.x/2.3, Spark SQL, Spark Streaming, Hadoop, MapReduce, HDFS, Eclipse, Anaconda
Python (Versions 2.7 and 3.3; Packages): NumPy, SciPy, Pandas, scikit-learn, Matplotlib, seaborn, statsmodels
Other Tools: Microsoft Excel, PowerPoint, Zeppelin, Tableau Desktop (Version 10.0), MATLAB; Windows 8 and 7, Linux (Ubuntu), and macOS
PROFESSIONAL EXPERIENCE:
Confidential, Plano, TX
Sr. Data Scientist/ Machine Learning Engineer
Responsibilities:
- Gathered, documented, and implemented business requirements for analysis and for long-term document/report generation. Analyzed large volumes of data and provided results to technical and managerial staff.
- Worked with both unstructured and structured data using machine learning algorithms such as Linear and Logistic Regression, Decision Trees, Random Forests, Support Vector Machines, Ensemble models, Neural Networks, KNN, and time series analysis. Evaluated, tested, compared, and validated these models before selecting the best model for prediction.
- Used pandas, NumPy, SciPy, scikit-learn, statsmodels, and other Python libraries to build predictive forecasts for time series analysis using AR (Autoregressive), MA (Moving Average), and ARIMA (Autoregressive Integrated Moving Average) models.
- K-fold cross-validation was used to improve model performance and to test models on sample data before finalizing them.
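A minimal sketch of the K-fold cross-validation step, assuming scikit-learn with a stock dataset and classifier in place of the original data and model:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# 5-fold CV: train on 4 folds, validate on the held-out fold, rotate 5 times
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.mean())
```

Averaging the per-fold scores gives a less optimistic performance estimate than a single train/test split.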
- A confusion matrix and ROC curve were used to evaluate the classification models.
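A sketch of those two evaluation tools with scikit-learn; the synthetic dataset and logistic regression classifier are assumptions for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

cm = confusion_matrix(y_te, clf.predict(X_te))            # rows: true class, cols: predicted
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])  # area under the ROC curve
print(cm)
print(auc)
```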
- Knowledge of numerical optimization, anomaly detection and estimation, A/B testing, statistics, and Maple. Collaborated with product management and other departments to gather requirements.
- Used K-means clustering to group similar data and documented the results.
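The K-means grouping step can be sketched with scikit-learn; the two synthetic blobs below are illustrative stand-ins for the real data:

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated blobs of points (illustrative data)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (50, 2)),
               rng.normal(5, 0.5, (50, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_            # cluster assignment for each point
centers = km.cluster_centers_  # one centroid per cluster
```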
- Performed feature engineering, converting arbitrary data into well-behaved features: handling categorical features, text features, image features, and missing data.
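The categorical-feature and missing-data handling mentioned above has a direct pandas analogue; the columns and values below are illustrative assumptions:

```python
import pandas as pd

df = pd.DataFrame({
    "city":   ["Plano", "Dallas", "Plano", None],
    "income": [72000, None, 65000, 58000],
})

# Missing numeric data: impute with the column median
df["income"] = df["income"].fillna(df["income"].median())

# Categorical feature: one-hot encode (a missing category becomes an all-zero row)
encoded = pd.get_dummies(df, columns=["city"], prefix="city")
```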
- Worked on data generation, machine learning for anti-fraud detection, data modeling, operations decisioning, and loss forecasting, such as product-specific fraud.
- Monte Carlo simulation algorithms were used to obtain numerical results, running simulations many times in succession to estimate probabilities for machine learning. Analyzed data for fraud analysis and direct fraud.
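The Monte Carlo idea above — estimate a probability by repeated random simulation — can be sketched in a few lines; the dice example is purely illustrative, not the fraud data:

```python
import random

def monte_carlo_probability(trial, n_trials=100_000, seed=42):
    """Estimate P(event) by running `trial` many times and counting successes."""
    rng = random.Random(seed)
    hits = sum(trial(rng) for _ in range(n_trials))
    return hits / n_trials

# Example event: two dice sum to 7 (exact probability is 6/36 ≈ 0.1667)
p = monte_carlo_probability(lambda rng: rng.randint(1, 6) + rng.randint(1, 6) == 7)
```

The standard error shrinks roughly as 1/sqrt(n_trials), which is why many repetitions are needed.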
- Wrote complex SQL queries against a database with more than 98 million rows using Teradata Studio.
- Extracted, transformed, and loaded (ETL) data into a PostgreSQL database using Python scripts.
- Built and maintained SQL scripts, indexes, and complex queries for data analysis and extraction.
- Trained and supervised up to 8 team members in SQL, Scala, and Spark, and assisted with the installation and upgrading of Python, Scala, SQL, Java, and Spark. Worked with various data pools and DBAs to obtain access to data.
- Splunk ES was used for application management, security, performance management, and analytics for the public APIs. Splunk was used for collecting, indexing, monitoring, and visualization of the data.
- Paxata was used to combine, clean and shape the data prior to analytics and it was also used to bring together data, find and fix dirty or missing data, and share and re-use data projects across teams.
- Kibana and Tableau were used as Business Intelligence tools to visually analyze data and to show trends, variations, and density of the data in the form of graphs and charts.
- Kafka and Spark Streaming were used to collect large volumes of data and to analyze it in a distributed system. Spark MLlib was used for model evaluation and prediction. The Natural Language Toolkit (NLTK) was used for data extraction for sentiment analysis. Experienced with D3 and Django web apps.
- Keras, along with numerical computation libraries such as Theano and TensorFlow, was used for developing and evaluating deep neural network models.
- Worked with public/private cloud computing technologies (IaaS, PaaS & SaaS) and Amazon Web Services (AWS) on customer analytics and predictions.
- MapReduce/Hive/Pig on Hadoop was used to store, process, and analyze huge amounts of unstructured data covering 100 million users, for example to identify the top 10 customers by prior-year spend for a gift-card offer.
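The top-N-customers job above follows a classic map/reduce aggregation pattern; a standalone Python sketch of the same logic, with toy transactions and N=2 instead of the real data and N=10:

```python
import heapq
from collections import defaultdict

# Toy transactions: (customer_id, amount) — stand-in for the real purchase data
transactions = [("c1", 120.0), ("c2", 75.0), ("c1", 40.0),
                ("c3", 300.0), ("c2", 10.0), ("c4", 5.0)]

# "Map" emits (customer, amount); "reduce" sums per customer
totals = defaultdict(float)
for customer, amount in transactions:
    totals[customer] += amount

# Top-N customers by total spend
top = heapq.nlargest(2, totals.items(), key=lambda kv: kv[1])
print(top)  # [('c3', 300.0), ('c1', 160.0)]
```

In Hive this same aggregation is a `GROUP BY` with `ORDER BY ... LIMIT`; the in-memory version only works because the toy data is small.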
- Worked with Core Java, Spring, Spring MVC, Hibernate, and Spring RESTful web services to create the application and to retrieve data from the database for analytics. Wrote and tested RESTful web services in SoapUI.
- Formulated procedures for integrating R programs with data sources and delivery systems; R was used for prediction.
- Experience with continuous integration, testing, and deployment. Worked in an agile software development paradigm (e.g., Scrum) and maintained up-to-date knowledge in the relevant technical areas.
- Hands-on experience with rich UI and stunning data visualization using JavaScript
- The NLTK open source natural language processing (NLP) platform in Python was used for text tokenization in sentiment analysis of text messages.
Confidential, TX
Jr. Data Scientist/Big Data Analyst
Responsibilities:
- Performed advanced statistical analysis, including generalized linear models, linear and logistic regression, Decision Trees, Random Forests, Ensembles (Bagging, Boosting), Support Vector Machines, Neural Networks, KNN, K-means clustering, XGBoost, graph/network analysis, and time series analysis, using pandas, NumPy, SciPy, scikit-learn, and other Python libraries to develop predictive models.
- Worked with Big Data techniques, i.e., Hadoop, MapReduce, NoSQL, Pig/Hive, Spark/Shark, MLlib, and Scala, along with NumPy, SciPy, Pandas, and scikit-learn.
- Used query languages such as SQL, Hive, and Pig; experience with NoSQL databases such as MongoDB, Cassandra, and HBase.
- Worked in an agile software development paradigm (e.g., Scrum) and maintained up-to-date knowledge in the relevant technical areas. Developed all weekly, monthly, and quarterly reports and dashboards.
- Used complex Excel formulas and pivot tables to manipulate large datasets.
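The Excel pivot-table manipulation above has a direct pandas analogue; a minimal sketch with illustrative columns and values:

```python
import pandas as pd

sales = pd.DataFrame({
    "region":  ["East", "East", "West", "West"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "revenue": [100, 150, 80, 120],
})

# Equivalent of an Excel pivot table: regions as rows, quarters as columns,
# summed revenue in the cells
pivot = sales.pivot_table(index="region", columns="quarter",
                          values="revenue", aggfunc="sum")
```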
- Splunk ES was used for application management, security, performance management, and analytics for the public APIs. Splunk was used for collecting, indexing, monitoring, and visualization of the data.
- K-fold cross-validation was used to improve model performance and to test models on sample data before finalizing them. Confusion matrices, gain and lift charts, K-S charts, and ROC curves were used to evaluate the classification models.
- QlikView and Tableau were used to create guided analytics applications as well as dashboards designed for business challenges.
- Performed feature engineering, converting arbitrary data into well-behaved features: handling categorical features, text features, image features, and missing data.
- Tableau and Kibana were used for data visualization such as trend lines, charts, and graphs.
- Collaborated with others in data science and analytics on data mining and predictive modeling as required; created anomaly detection systems and continuously tracked their performance.
- Selected features and built and optimized classifiers using machine learning techniques.
- Enhanced data collection procedures to include information relevant for building analytic systems; processed, cleansed, and verified the integrity of data used for analysis.
- Worked with team to select and implement model development process from statistics and/or machine learning to answer business problems.
- Extracted data from a variety of relational databases and manipulated and explored it using quantitative, statistical, and visualization tools. Wrote complex SQL queries against an Aster database with more than a few million rows using Teradata Studio.
Confidential, Bethesda, MD
Sr. Data Scientist / Data Analyst
Responsibilities:
- Predicted categories based on location, time, and other features using linear and logistic regression, Decision Trees, Random Forests, Ensembles (Bagging, Boosting), Support Vector Machines, Neural Networks, KNN, K-means clustering, XGBoost, graph/network analysis, and time series analysis with pandas, NumPy, SciPy, scikit-learn, and other Python libraries.
- Trained a large set of models, evaluated and compared them, and selected the best models for prediction and forecasting. Established and maintained effective processes for K-fold validation, confusion matrices, and updating predictive models.
- The H2O Flow notebook, the user interface for H2O, was used to capture, rerun, present, and share workflows. Used H2O to import files, build models, and improve models for analytics and prediction.
- Built Ensemble Models in machine learning such as Bootstrap aggregating (bagging - Bagged Decision Trees and Random Forest) and Boosting (Gradient boosting, XGBoost and AdaBoost) to improve accuracy, reduce variance and bias, and improve stability of a model
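The bagging-versus-boosting contrast above can be sketched with scikit-learn; the synthetic dataset and default hyperparameters are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Bagging: many deep trees on bootstrap samples, averaged to reduce variance
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Boosting: shallow trees fit sequentially to the previous trees' errors,
# which mainly reduces bias
gb = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

print(rf.score(X_te, y_te), gb.score(X_te, y_te))
```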
- Involved in the design, implementation, development & integration of an Artificial Intelligence solution. Utilized statistical Natural Language Processing for sentiment analysis, mine unstructured data, and create insights; analyze and model structured data using advanced statistical methods and implement algorithms.
- Used Spark SQL for ETL of raw data. Worked on feature selection, data wrangling, and feature extraction, and performed ETL on Hadoop.
- Tableau was used to connect to files, relational and Big Data sources to acquire, visually analyze, and process data. Tableau was also used to create and distribute an interactive and shareable dashboard to see the trends, variations, and density of the data in the form of graphs and charts. Took sole ownership of the analytics solution from requirements through to delivery
- Strong understanding of relational databases (Oracle, PostgreSQL, SQL Server)
- Wrote complex SQL queries in Oracle, PostgreSQL, and MySQL and developed data models for data analysis and extraction.
- Keras and TensorFlow were used for developing and evaluating deep neural network models.
- Familiar with AR (Autoregressive), MA (Moving Average), and ARIMA (Autoregressive Integrated Moving Average) time series models. AT&T market demand was predicted using an ARIMA time series forecasting model with grid search.
- Captured and elaborated analytics solution requirements, working with customers and product managers. Created advanced analytics solutions. Zeppelin, Spark SQL, and Spark MLlib were combined to simplify exploratory data science.
- Gathered, documented, and implemented business requirements for analysis and for long-term document/report generation. Analyzed large volumes of data and provided results to technical and managerial staff. Recognized, reported, and analyzed trends that developed from data streams.
- Apache Zeppelin web-based notebook was used to bring data ingestion, data exploration, visualization, sharing and collaboration features to Hadoop and Spark.
- Experience with Big Data techniques, i.e., Hadoop, MapReduce, NoSQL, Pig/Hive, Spark, and Spark MLlib. Programming knowledge in Java, Scala, Spark, SQL, and Python.
- Kafka was used as a message broker to collect large volumes of data and to analyze it in a distributed system. Business data was fed into Kafka and then processed using Spark Streaming in Scala for real-time analytics and data science.
- Spark Streaming was used to populate real-time sentiment analysis, crisis management and service adjusting.
- Comfort and proficiency in using cloud compute (e.g. AWS, Azure). Familiarity with NoSQL databases, graphical analyses. Evaluated and performed POC on new strategic technical products and applications.
- Apache Flume and Apache Sqoop were used to load both structured and unstructured streaming data into HDFS, Hive, and HBase.
- Worked in the Agile methodology with self-organizing, cross-functional teams sprinting toward results in fast, iterative, incremental, and adaptive steps.
- Developed scalable, reliable, and efficient enterprise applications using Java, Spring, Spring MVC, Hibernate, Web Services, RESTful, JSF, JDBC, JSP, Servlets, EJB, JMS, XML, XSLT, JAXB, SQL, JAX-WS, and Unix shell scripting. Exposed microservices built with Spring Boot and Spring MVC through RESTful APIs. Used the Log4j framework to log/track the application. Wrote and tested RESTful and SOAP web services in SoapUI.
Confidential, Wilmington, DE
Data Scientist/ Data analyst
Responsibilities:
- Gathered information from various programs, analyzed time requirements and prepared documentation to change existing programs.
- Experience with K-means, hierarchical clustering, and mixture modeling. Artificial intelligence and NLP: performed feature engineering, converting arbitrary data into well-behaved features: handling categorical features, text features, image features, and missing data.
- Working knowledge on Natural Language Processing (NLP) and Natural Language Generation (NLG) using Python. Worked collaboratively with customers and team members supporting large business initiatives.
- Worked closely with other Data Scientists to build a better understanding on how Data Science integrates with the big database. Worked on the Data and Analytics Solutions team utilizing Scrum practices and techniques.
- JAVA/J2EE application: implemented JAVA/J2EE technologies for application deployment using JSP, Servlets, Web Services (SOAP and RESTful), Spring MVC, and Hibernate. Worked with business stakeholders to identify business problems, opportunities, and/or initiatives for which analytics models may provide insights that can contribute to or drive the development of an approach or solution.
- Used Eclipse, RAD, and RTC as IDEs. Used Tomcat, WebLogic, WebSphere, and JBoss as application servers. Worked with the latest version of Spring as the core business logic implementer; experienced with Spring annotations, Spring Boot, and Spring Data.
- Programmed SQL scripts inside the R environment, including SQL commands such as joins over multiple tables.
- Scripted ad hoc SQL queries for testing the database and explored data issues arising from other manual testing procedures.
- Reviewed and independently tested the effectiveness and accuracy of image analytics, NLP, and machine learning models. Utilized expertise in models that leverage the newest data sources, technologies, and tools, such as machine learning, Python, Hadoop, Spark, and Azure/AWS, as well as Big Data.
- QlikView was used to create advanced reports from multiple data sources; processed, cleaned, and verified the integrity of data used for analysis.
- Worked with common data science toolkits, such as R, Weka, NumPy, scikit-learn, and MATLAB, and with data visualization tools, such as D3.js and ggplot2.
- Used query languages such as SQL, Hive, and Pig; experience with NoSQL databases such as MongoDB, Cassandra, and HBase; applied statistics skills such as distributions, statistical testing, and regression.
- Stata was used for data manipulation, production of tables and graphs, linear regression analysis, and logistic modeling.
Confidential
Graduate Teaching and Research Assistant- MS/PhD Thesis
Responsibilities:
- Used theoretical knowledge and hands-on experience in statistical techniques to analyze the collected data; used Excel, Origin graph and computer programming to analyze data and to sketch and draw charts and other visual materials required to supplement explanatory text; communicated effectively with professors to support research work using PowerPoint.
- Took a course on effective scientific writing for publications in scientific journals and/or presentations; assisted in grant writing; attended scientific conferences and meetings and presented research posters/papers at many of them; able to manage research budgets, resources, and timelines.
- Calculated the trap depth along all directions of two, four, and six axially magnetized high-flux NdFeB permanent magnets that are positioned and aligned along three mutually orthogonal axes of rectangular bar magnets, ring cylindrical magnets and cylindrical magnets using computer programming.
- Experience in evaluating and monitoring social care tools and technologies, working directly with our vendors and key stakeholders across the organization.
- Numerically calculated the on-axis and off-axis magnetic field at any point between the two coils for both the gradient coil and the Maxwell coil using computer programming.
- Used computer programming to design and plan magnetic trapping of neutral particles and analyzed the results; carried out gradient coil and Maxwell coil calculations using large datasets.
- Used Mathematica to calculate the trajectory of the particles in the electromagnetic field, magnetic trap, magneto-optical trap (MOT), electromagnet, gradient coils, Stark hexapole guide, and permanent magnet Zeeman slower. Wrote LabVIEW programs and modified existing LabVIEW programs to collect data.
- Used computer programming to calculate the time of flight, nitric oxide rotational synthetic spectra, Zeeman splitting, convolution spectra, flux from the hexapole guide, and enhancement curves;
- Used the SIMION software package to calculate electric fields and the trajectories of charged particles in those fields; calculated spin density, relaxation times, flow, and spectral shifts; experience with optimization of the permanent magnetic trap, gradient coil, Zeeman slower, and Stark hexapole guide.
- Designed magnetic trap using I-DEAS (Integrated Design and Engineering Analysis Software, a computer-aided design software package) and NX 7.5.