Sr. Data Scientist / Data Analyst Resume
Plano, TX
SUMMARY:
- More than 9 years of extensive experience in customer engagement and data management, with experience in analytics and digital or social media and proven technical and analytical abilities. Strong analytics and data management skills.
- 5+ years of data science experience in various organizations and more than 5 years of data analysis and management experience during my PhD work in physics, including strong problem-solving and advanced statistical analysis skills.
- Completed computer simulations of the filtering, guiding, trapping, and trajectories of cold molecular nitric oxide (NO) in electrostatic, magnetic, and electromagnetic fields during my graduate work as a PhD student.
- Carried out extensive predictive modeling and social analytical work with our lab data, which corresponds to the transition of electrons from one energy state to another when molecules or atoms are exposed to laser light. I predicted the electron density, transition probability, and energy levels of the different electronic states from data collected in the lab using LabVIEW and from data calculated with the theoretical model that I developed.
- Practical understanding of statistical modeling and supervised, unsupervised, and reinforcement machine learning techniques, with a keen interest in applying these techniques to predictive analytics.
- Skilled in engineering, developing, deploying, and maintaining business systems, with technical expertise including hands-on solution development and implementation experience. Able to work effectively with diverse teams of co-workers and researchers; enthusiastic and self-motivated; flexible in adjusting to work schedules and teamwork needs; project management skills; research-driven personality.
- Worked closely with the e-mail marketing and social analytics manager to define comprehensive measurement strategies for clients and to ensure delivery of actionable insights. Motivated and self-driven.
- Able to adapt to a fast-paced and dynamic work environment, with interpersonal, leadership, and coordination skills.
- Exploring opportunities in data science, including deep machine learning, natural language processing, and artificial intelligence.
TECHNICAL SKILLS:
Data Analysis: Generalized Linear Models, Logistic Regression, Boxplots, K-Means, Clustering, SVN, PuTTY, WinSCP, Redmine (Bug Tracking, Documentation, Scrum), Neural Networks, AI, Teradata, Tableau
SQL: MySQL, SQL Server, Oracle, SQLite, PostgreSQL
R: (Packages: Stats, Zoo, Matrix, data.table, OpenSSL). Java 1.7, 1.8, Maven, Scala, Spark 2, 2.3, Spark SQL, Spark Streaming, Hadoop, MapReduce, HDFS, Eclipse, Anaconda
Python Versions: 2.7 and 3.3 (Packages: NumPy, SciPy, Pandas, scikit-learn, Matplotlib, seaborn, statsmodels).
Microsoft Office: Excel, PowerPoint, Zeppelin, Tableau Desktop (Version 10.0), MATLAB, Windows 8 and 7, Linux Ubuntu and Mac. Tableau/QlikView, Market Research Data understanding, HMM, LSTM, Mixture modeling, Stochastic. Strong SQL programming knowledge and experience. Hadoop including Hive, HDFS, MapReduce and Spark
PROFESSIONAL EXPERIENCE:
Confidential, Plano, TX
Sr. Data Scientist / Data Analyst
Responsibilities:
- Captured and elaborated analytics solution requirements, working with customers and product managers. Created an advanced analytics solution. Zeppelin, Spark SQL, and MLlib were combined to simplify exploratory data science.
- Gathered, documented, and implemented business requirements for analysis or as part of long-term document/report generation. Analyzed large volumes of data and provided results to technical and managerial staff. Recognized, reported, and analyzed trends that developed from data streams.
- The Apache Zeppelin multi-purpose web-based notebook was used to bring data ingestion, data exploration, visualization, sharing, and collaboration features to Hadoop and Spark. Took sole ownership of the analytics solution from requirements through to delivery.
- Big data Analysis: Big data related techniques i.e., Hadoop, MapReduce, NoSQL, Pig/Hive, Spark/Shark, MLlib and Scala, numpy, scipy, Pandas, scikit-learn.
- Programming knowledge in Java, Scala, Spark, SQL, and Python.
- Kafka was used as a message broker to collect large volumes of data and to analyze the collected data in the distributed system. Business data was fed into Kafka and then processed using Spark Streaming in Scala for real-time analytics.
- Zeppelin was used for data ingestion, data exploration, visualization, sharing, and collaboration features on Hadoop and Spark. Involved in the design, development, and integration of an Artificial Intelligence solution. Used Spark SQL for ETL of raw data.
- Spark Streaming was used to populate real-time sentiments for crisis management and service adjustment. Worked on Natural Language Processing for sentiment analysis of text, including data cleaning and feature extraction.
- Utilized statistical natural language processing to mine unstructured data and create insights; analyzed and modeled structured data using advanced statistical methods and implemented the algorithms and software needed to perform the analyses.
- Design, implementation and deep understanding of advanced statistical and predictive modeling concepts, machine-learning approaches, K means clustering and classification techniques
- Comfort and proficiency in using cloud compute (e.g. AWS, Azure). Familiarity with NoSQL databases, graphical analyses, and large-scale data processing frameworks (e.g. Apache Spark). Evaluated and performed POC on new strategic technical products and applications.
- Established and maintained effective processes for k-fold validation, confusion matrix evaluation, and updating of predictive models.
- Digital Communication Analytics is designed to give a broad-based approach to extract, report, model and analyze communications data that is required to get ahead in public relations, marketing or advertising
- Developed scalable, reliable, and efficient enterprise applications using Java, Spring, Spring MVC, Hibernate, Web Services, RESTful, JSF, JDBC, JSP, Servlets, EJB, JMS, XML, XSLT, JAXB, SQL, JAX-WS, and Unix shell scripting. Exposed microservices as RESTful APIs using Spring Boot with Spring MVC. Used the Log4j framework to log and track the application.
- Wrote and tested RESTful and SOAP web services in SoapUI.
- Predicted categories based on location, time, and other features using linear regression (including polynomial regression), logistic regression, decision trees, random forests, ensembles (bagging, boosting), support vector machines, neural networks, KNN, clustering, XGBoost, and graph/network analysis with pandas, Python, scikit-learn, and other Python libraries. Tableau was used to connect to files, relational, and Big Data sources to acquire and process data.
- Strong understanding of relational databases (Oracle, PostgreSQL, SQL Server).
- Wrote complex SQL queries in Oracle, PostgreSQL, and MySQL and developed data models for data analysis and extraction.
- Keras and TensorFlow were used for developing and evaluating deep neural network models.
- Familiar with AR (Autoregressive), MA (Moving Average), and ARIMA (Autoregressive Integrated Moving Average) time series analysis models. Confidential & Confidential market demand was predicted using ARIMA time series analysis and a statistical forecasting model with grid search.
- Tableau was used for visually analyzing the data. Tableau was also used to create and distribute an interactive and shareable dashboard to see the trends, variations, and density of the data in the form of graphs and charts
- IBM Watson Explorer Analytical tool was used to collect and analyze structured and unstructured content in documents, email, databases, websites, and other enterprise repositories.
Confidential, Plano, TX
Sr. Data Scientist/ Machine Learning Engineer
Responsibilities:
- Gathered, documented, and implemented business requirements for analysis or as part of long-term document/report generation. Analyzed large volumes of data and provided results to technical and managerial staff.
- Wrote complex SQL queries against a database with more than 98 million rows using Teradata Studio.
- Worked with various data pools and DBAs to have access to data. Have knowledge of NLP, NLTK or Text Mining
- Trained and supervised up to 8 other team members in SQL, Scala, and Spark programming and assisted in the installation and upgrading of Python, Scala, Java, and Spark.
- Programming knowledge in Java, Scala, Spark, SQL, and Python.
- Used K-means clustering for grouping similar data and documented.
- Extracted, transformed, and loaded data into a Postgres database using Python scripts.
- Data visualization: Pentaho, Tableau, D3, Django web app. Knowledge of numerical optimization, anomaly detection and estimation, A/B testing, statistics, and Maple. Applied big data analysis techniques, i.e., Hadoop, MapReduce, NoSQL, Pig/Hive, Spark/Shark, MLlib and Scala, NumPy, SciPy, Pandas, scikit-learn.
- Worked to research and develop statistical learning models for data analysis. Collaborated with product management and engineering departments
- SAS was used for analyzing client business needs, managing large data sets, and storing and extracting information. Performed feature engineering, converting arbitrary data into well-behaved data by handling categorical features, text features, image features, and missing data.
- Worked with Core Java, Spring, and Spring MVC, and with RESTful web services. Wrote and tested RESTful web services in SoapUI.
- Hadoop (MapReduce, Hive, Pig) was used to store, process, and analyze a large volume of unstructured data covering 100 million users, in order to offer a gift card to the top 10 customers who spent the most in the previous year. Kafka was used as a message broker to collect large volumes of data and to analyze the collected data in the distributed system.
- Splunk ES was used for application management, security, performance management, and analytics for the public APIs.
- Splunk was used for collecting, indexing, monitoring, and visualization of the data.
- Paxata was used to combine, clean, and shape the data prior to analytics, to bring data together, to find and fix dirty or missing data, and to share and reuse data projects across teams.
- Worked on data generation, machine learning for Anti-Fraud detection, Data Modeling, operations decisioning, and loss forecasting such as product-specific fraud, or buyer vs. seller fraud.
- Monte Carlo simulation algorithms were used to obtain numerical results by running simulations many times in succession in order to calculate probabilities with machine learning. Analyzed data for Fraud Analysis and Direct Fraud.
- K-fold cross-validation was used to improve model performance and to test the model on sample data before finalizing it.
- Confusion Matrix, Gain and Lift Charts, K-S Chart, and ROC Chart were used to evaluate the classification model.
- Worked with public/private Cloud Computing technologies (IaaS, PaaS & SaaS) and Amazon Web Services (AWS) and worked for customer analytics and predictions
- Kibana and Tableau were used as Business Intelligence tools for visually analyzing the data and showing its trends, variations, and density in the form of graphs and charts.
- QlikView was used to create guided analytics applications as well as dashboards designed for business challenges.
- Formulated procedures for integrating R programs with data sources and delivery systems, and used R for prediction.
- Built advanced analytics solutions deployed into production for prediction, forecasting, and optimization, including: Data mining, Statistical analysis, Modeling, Machine learning, Visualization
- Big data technologies such as Hadoop, Hive, Pig, and Spark were used for developing and executing models.
- Used query languages such as SQL, Hive, and Pig, and worked with NoSQL databases such as MongoDB, Cassandra, and HBase.
- Worked with data visualization tools like Tableau and Kibana. Worked with SOA, IaaS, and cloud computing technologies in the AWS environment. Experience with continuous software integration, test, and deployment. Worked in an agile software development paradigm (e.g., Scrum). Maintained up-to-date knowledge in the appropriate technical area.
- Big data Analysis: Big data related techniques i.e., Hadoop, MapReduce, NoSQL, Pig/Hive, Spark/Shark, MLlib and Scala, numpy, scipy, Pandas, scikit-learn.
- Hands-on experience with rich UI and stunning data visualization using JavaScript
- Worked with machine learning algorithms on both unstructured and structured data, such as linear regression, logistic regression, decision trees, random forests, support vector machines, neural networks, KNN, and time series analysis.
- Keras, along with numerical computation libraries such as Theano and TensorFlow, was used for developing and evaluating deep neural network models.
- Tableau was used for analysing the data to show the trends, variations and density of the data in form of graphs and charts. Tableau was connected to files, relational and Big data sources to acquire and process data
- Built and maintained SQL scripts, indexes, and complex queries for data analysis and extraction.
- Created and executed complex SQL statements in both SQL production and development environments.
- Used scikit-learn, Pandas, and the statsmodels Python libraries to build predictive forecasting for time series analysis using AR (Autoregressive), MA (Moving Average), and ARIMA (Autoregressive Integrated Moving Average) models.
Confidential, TX
Sr. Data Scientist/Big Data Analyst
Responsibilities:
- Built advanced analytics solutions deployed into production for prediction, forecasting, and optimization, including: Data mining, Statistical analysis, Modeling, Machine learning, Visualization
- Worked with big data technologies such as Hadoop, Hive, Pig, and Spark for developing and executing models.
- Used query languages such as SQL, Hive, and Pig, and worked with NoSQL databases such as MongoDB, Cassandra, and HBase.
- Worked with data visualization tools like Tableau. Worked with SOA, IaaS, and Cloud Computing technologies, in the AWS environment. Experience with continuous software integration, test and deployment.
- Worked in an agile software development paradigm (e.g., Scrum). Maintained up-to-date knowledge in the appropriate technical area. Developed all weekly, monthly, and quarterly reports and dashboards.
- Worked with Big data related techniques i.e., Hadoop, MapReduce, NoSQL, Pig/Hive, Spark/Shark, MLlib and Scala, numpy, scipy, Pandas, scikit-learn.
- Used complex Excel formulas and pivot tables to manipulate large datasets.
- Splunk ES was used for application management, security, performance management, and analytics for the public APIs. Splunk was used for collecting, indexing, monitoring, and visualization of the data.
- Used Eclipse, RAD, and RTC as IDEs. Used Tomcat, WebLogic, WebSphere, and JBoss as application servers.
- K-fold cross-validation was used to improve model performance and to test the model on sample data before finalizing it. Confusion Matrix, Gain and Lift Charts, K-S Chart, and ROC Chart were used to evaluate the classification model.
- QlikView was used to create guided analytics applications as well as dashboards designed for business challenges
- Converted and scrubbed data for uploading to MS SQL tables, using tools such as R, Python, Excel, and shell scripts (PDF to Text).
- Created queries to correct data issues in MS SQL tables. Built Excel and MS SQL tables to consolidate bank statements, including vendor data.
- Worked for feature engineering that involves converting the arbitrary data to well-behaved data such as dealing with categorical features, text features, image features, and missing data.
- Tableau and Kibana were used for data visualization such as trend lines, charts, and graphs.
- Collaborated with others in data science and analytics on data mining and predictive modeling as required. Created anomaly detection systems and continuously tracked their performance.
- Selected features and built and optimized classifiers using machine learning techniques.
- Enhanced data collection procedures to include information relevant for building analytic systems. Processed, cleansed, and verified the integrity of data used for analysis.
- Worked with team to select and implement model development process from statistics and/or machine learning to answer business problems.
- Extracted data from a variety of relational databases and manipulated and explored the data using quantitative, statistical, and visualization tools. Wrote complex SQL queries against an Aster database with more than a few million rows using Teradata Studio.
- Developed scalable, reliable, and efficient enterprise applications using Java, Spring, Spring MVC, Hibernate, Web Services, RESTful, JSF, JDBC, JSP, Servlets, EJB, JMS, XML, XSLT, JAXB, SQL, JAX-WS, and Unix shell scripting. Exposed microservices as RESTful APIs using Spring Boot with Spring MVC. Used the Log4j framework to log and track the application.
- Wrote and tested RESTful and SOAP web services in SoapUI.
- Used machine learning algorithms on both unstructured and structured data, such as linear regression, logistic regression, decision trees, random forests, support vector machines (SVM), neural networks, KNN, graph/network analysis, K-Means, and dimensionality reduction algorithms.
Confidential, Wilmington, DE
Data Scientist/ Data analyst
Responsibilities:
- Gathered information from various programs, analyzed time requirements and prepared documentation to change existing programs.
- Experience with K-means, hierarchical clustering, and mixture modeling. Artificial Intelligence and NLP: performed feature engineering, converting arbitrary data into well-behaved data by handling categorical features, text features, image features, and missing data.
- Working knowledge on Natural Language Processing (NLP) and Natural Language Generation (NLG) using Python. Worked collaboratively with customers and team members supporting large business initiatives.
- Worked closely with other Data Scientists to build a better understanding on how Data Science integrates with the big database. Worked on the Data and Analytics Solutions team utilizing Scrum practices and techniques.
- Java/J2EE application: implemented Java/J2EE technologies for application deployment using JSP, Servlets, Web Services (SOAP and RESTful), Spring MVC, and Hibernate. Worked with business stakeholders to identify business problems, opportunities, and/or initiatives for which analytics models may provide insights that can contribute to or drive the development of an approach or solution.
- Used Eclipse, RAD, and RTC as IDEs. Used Tomcat, WebLogic, WebSphere, and JBoss as application servers. Worked with the latest version of Spring as the core business logic implementer; experienced in Spring annotations, Spring Boot, and Spring Data.
- Programmed SQL scripts inside R environment, including SQL commands such as joins over multiple tables.
- Scripted ad hoc SQL queries for testing the database and explored data issues arising from other manual testing procedures.
- Reviewed and independently tested the effectiveness and accuracy of image analytics, NLP, and machine learning models. Utilized expertise in models that leverage the newest data sources, technologies, and tools, such as machine learning, Python, Hadoop, Spark, Azure/AWS, and Big Data.
- QlikView was used to create advanced reports from multiple data sources. Processed, cleaned, and verified the integrity of data used for analysis.
- Worked with common data science toolkits, such as R, Weka, NumPy, scikit-learn, and MATLAB, and with data visualization tools, such as D3.js and ggplot.
- Used query languages such as SQL, Hive, and Pig; worked with NoSQL databases such as MongoDB, Cassandra, and HBase; and applied statistics skills such as distributions, statistical testing, and regression.
Confidential
Graduate Teaching and Research Assistant
Responsibilities:
- Used theoretical knowledge and hands-on experience in statistical techniques to analyze the collected data; used Excel, Origin graph and computer programming to analyze data and to sketch and draw charts and other visual materials required to supplement explanatory text; communicated effectively with professors to support research work using PowerPoint.
- Took an effective scientific writing course for writing publications in scientific journals and/or presentations; assisted in grant writing; attended scientific conferences and meetings and presented research posters and papers at many of them; able to manage research budgets, resources, and timelines.
- Calculated the trap depth along all directions of two, four, and six axially magnetized high-flux NdFeB permanent magnets that are positioned and aligned along three mutually orthogonal axes of rectangular bar magnets, ring cylindrical magnets and cylindrical magnets using computer programming.
- Experience in evaluating and monitoring social care tools and technologies, working directly with our vendors and key stakeholders across the organization.
- Numerically calculated the on-axis and off-axis magnetic field at any point between the two coils for both the gradient coil and the Maxwell coil using computer programming.
- Used computer programming to design and plan for magnetic trapping of neutral particles, and analyzed the result; executed gradient coil, and Maxwell coil using the big data.
- Used Mathematica to calculate the trajectory of the particles in the electromagnetic field, magnetic trap, magneto optical trap (MOT), electromagnet, gradient coils, stark hexapole guide, and permanent magnet Zeeman slower. Wrote LabVIEW programs and changed existing LabVIEW programs to collect data.
- Used computer programming to calculate the time of flight, nitric oxide rotational synthetic spectra, Zeeman splitting, convolution spectra, flux from the hexapole guide, and enhancement curves.
- Used the SIMION software package to calculate electric fields and the trajectories of charged particles in those fields; calculated spin density, relaxation times, flow, and spectral shifts; experience with optimization of the permanent magnetic trap, gradient coil, Zeeman slower, and Stark hexapole guide.
- Designed magnetic trap using I-DEAS (Integrated Design and Engineering Analysis Software, a computer-aided design software package) and NX 7.5.
Confidential
Research Assistant
Responsibilities:
- Commercial pharma analytics experience in Life science Analytics domain.
- Researched and developed in Imaging technologies for Digital Pathology products and solutions.
- Utilized expertise in image processing and computer vision on initiatives pertaining to the development of imaging technologies for Digital Pathology.
- Investigated a wide variety of scientific principles and concepts resulting in potential inventions, products and problems.
- Served as an in-house and outside expert on imaging related applications.
- Researched, designed and developed new and robust solutions in image segmentation, registration, machine learning for analysis of digitized histopathology slides.
- Designed, conducted and led advanced independent or multi-disciplinary research driven by strategic business needs in digital pathology product development.
- Provided technical direction, mentorship, guidance and feedback to others.
- Strong experience with Customer Monitoring/Listening Platforms and with maintaining the database.
- R&D experience in one or more of the following topics: image processing, computer vision, machine learning and Medical Image Analysis.
- Hands on software development and implementation of image processing algorithms.
- Proven track record of innovation in problem solving and analysis.
- Good mathematical and analytical skills and thorough knowledge of basic and advanced image processing techniques.
