Data Scientist Resume
Baltimore, MD
SUMMARY
- Over 6 years of experience in Machine Learning, Deep Learning, and Data Mining with large datasets of structured and unstructured data, including Data Acquisition, Data Validation, and Predictive Modeling.
- Experience with AWS cloud computing, Spark (especially AWS EMR), Kibana, Node.js, Tableau, Looker.
- Experience in the healthcare domain.
- Strong technical communication skills; both written and verbal.
- Ability to understand and articulate the “big picture” and simplify complex ideas.
- Strong problem solving and structuring skills.
- Ability to identify and learn applicable new techniques independently as needed.
- Ability to create new methods of solutions through a combination of foundational research and collaboration with ongoing initiatives.
- Experience in stochastic optimization, with results adopted in commercial applications and open-source algorithms.
- Experience formulating and solving discrete and continuous optimization problems.
- Experience developing and applying novel methods of stochastic optimization and optimization under uncertainty algorithms for large scale problems, including mixed integer type problems.
- Expertise with design optimization methods with computational efficiency considerations.
- Able to research statistical machine learning methods including forecasting, supervised learning, classification, and Bayesian methods.
- Conduct complex, advanced research projects in areas of interest to Business Units.
- Develop cutting-edge techniques and algorithms.
- Transfer and implement results and technology in hardware and software prototypes and demo systems relevant to the businesses.
- Survey relevant technologies and stay abreast of the latest developments.
- Draft and submit papers and patents based on research.
- Contributed to several research projects combining new data sources and computational tools.
- Wrote efficient code for working with large datasets.
- Exceptional mathematical and statistical modeling and computer programming skills
- Use of mathematical and statistical modeling and computer programming skills in an innovative manner.
- Ability to work comfortably and effectively within an interdisciplinary research environment.
- Able to advance the technical sophistication of solutions using machine learning and other advanced technologies.
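The stochastic-optimization experience above can be illustrated with a minimal sketch: plain stochastic gradient descent on a least-squares objective. The data, learning rate, and weights here are invented for illustration, not drawn from any actual project; only NumPy is assumed.

```python
import numpy as np

# Synthetic regression data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=200)

# Stochastic gradient descent: one randomly ordered sample per update.
w = np.zeros(3)
lr = 0.05
for epoch in range(50):
    for i in rng.permutation(len(y)):
        grad = (X[i] @ w - y[i]) * X[i]  # gradient of 0.5 * (x.w - y)^2
        w -= lr * grad

print(np.round(w, 2))
```

With a well-conditioned problem like this, the iterates settle near the true weights; noisier or mixed-integer settings call for the more specialized methods the summary mentions.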
TECHNICAL SKILLS
Data Science Specialties: Natural Language Processing, Machine Learning, Internet of Things (IoT) analytics, Social Analytics, Predictive Maintenance
Analytic Skills: Bayesian Analysis, Inference, Models, Regression Analysis, Linear models, Multivariate analysis, Stochastic Gradient Descent, Sampling methods, Forecasting, Segmentation, Clustering, Sentiment Analysis, Predictive Analytics
Analytic Methods and Tools: Classification and Regression Trees (CART), Support Vector Machine, Random Forest, Gradient Boosting Machine (GBM), TensorFlow, PCA, RNN, Regression, Naïve Bayes
Analytic Languages and Scripts: R, Python, HiveQL, Spark, Spark SQL, Scala, Impala, MapReduce
Languages: Java, Python, R, JavaScript, SQL, MATLAB
Version Control: GitHub, Git, SVN
IDE: Jupyter, Spyder
Data Query: Azure, Google Cloud, Amazon Redshift, Kinesis, EMR; HDFS, RDBMS, various SQL and NoSQL databases, data warehouses, and data lakes.
Deep Learning: Machine perception, Data Mining, Machine Learning algorithms, Neural Networks, TensorFlow, Keras
Soft Skills: Able to deliver presentations and highly technical reports; collaborate with stakeholders and cross-functional teams; advise on leveraging analytical insights; develop clear analytical reports that directly address strategic goals.
PROFESSIONAL EXPERIENCE
Data Scientist
Confidential, Baltimore, MD
Responsibilities:
- Applied advanced analytics skills, with proficiency at integrating and preparing large, varied datasets, architecting specialized database and computing environments, and communicating results.
- Developed analytical approaches to strategic business decisions.
- Performed analysis using predictive modeling, data/text mining, and statistical tools.
- Collaborated cross-functionally to arrive at actionable insights.
- Synthesized analytic results with business input to drive measurable change.
- Assisted in continual improvement of AWS data lake environment.
- Identified, gathered, and analyzed complex, multi-dimensional datasets utilizing a variety of tools.
- Performed data visualization and developed presentation material utilizing Tableau.
- Responsible for defining the key business problems to be solved while developing and maintaining relationships with stakeholders, SMEs, and cross-functional teams.
- Used Agile approaches, including Extreme Programming, Test-Driven Development, and Agile Scrum.
- Provided knowledge and understanding of current and emerging trends within the analytics industry.
- Participated in product redesigns and enhancements to determine how changes would be tracked and to suggest product direction based on data patterns.
- Applied statistics and organized large datasets of both structured and unstructured data.
- Used algorithms, data structures, and performance optimization.
- Worked with applied statistics and applied mathematics.
- Facilitated the data collection to analyze document data processes, scenarios, and information flow.
- Determined data structures and their relations in supporting business objectives and provided useful data in reports.
- Promoted enterprise-wide business intelligence by enabling report access in SAS BI Portal and on Tableau Server.
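The dataset-preparation and Tableau-visualization work above typically reduces to aggregating raw records to the grain a dashboard consumes. A minimal pandas sketch, with entirely hypothetical column names and values (not from the actual engagement):

```python
import pandas as pd

# Hypothetical claims extract; columns and values are illustrative only.
df = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West"],
    "claims": [120, 80, 200, 150, 50],
    "denied": [12, 8, 30, 15, 5],
})

# Aggregate to the per-region grain a Tableau dashboard would consume.
summary = (
    df.groupby("region", as_index=False)
      .agg(total_claims=("claims", "sum"), denied=("denied", "sum"))
)
summary["denial_rate"] = summary["denied"] / summary["total_claims"]
print(summary)
```

The resulting extract can be written to CSV or published to a database table that Tableau connects to directly.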
Data Scientist
Confidential, San Jose, CA
Responsibilities:
- Strong experience in the Software Development Life Cycle (SDLC), including requirements analysis, design specification, and testing, in both Waterfall and Agile methodologies.
- Worked in Git development environment.
- Experienced in data integration validation and data quality controls for ETL processes and data warehousing using MS Visual Studio, SSIS, SSAS, and SSRS.
- Adept at using the SAS Enterprise suite, Python, and Big Data technologies including Hadoop, Hive, Sqoop, Oozie, Flume, and MapReduce.
- Proficient in Predictive Modeling, Data Mining methods, Factor Analysis, ANOVA, Hypothesis Testing, and the Normal Distribution.
- Preprocessed content (normalization, POS tagging, and parsing) for natural language processing tasks, including named entity recognition, opinion mining, and event extraction.
- Utilized spaCy for industrial-strength natural language processing.
- Solved problems related to social media analysis with NLP techniques involving supervised techniques with word-level features, sometimes combined with social media and social network metadata.
- Transformed business requirements into analytical models, designed algorithms, built models, and developed data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
- Professional competency in Statistical NLP /Machine Learning, especially Supervised Learning- Document classification, information extraction, and named entity recognition in-context.
- Worked on Proofs of Concept (PoCs) and gap analysis; gathered necessary data from different sources and prepared it for exploration using data wrangling.
- Implemented neural networks; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, and Principal Component Analysis, with good knowledge of Recommender Systems.
- Strong SQL Server and Python programming skills, with experience working with functions.
- Efficient in developing logical and physical data models and organizing data per business requirements using Sybase PowerDesigner and ER Studio in both OLTP and OLAP applications.
- Experience in designing star and snowflake schemas for data warehouse and ODS architectures.
- Technical proficiency in designing and data modeling online applications; solution lead for architecting data warehouse/business intelligence applications.
- Worked with languages like Python and Scala and software packages such as Stata, SAS and SPSS to develop neural network and cluster analysis.
- Designed visualizations using Tableau software and publishing and presenting dashboards, Storyline on web and desktop platforms.
- Used dplyr in R and pandas in Python for exploratory data analysis.
- Used statistical programming languages R and Python together with Big Data technologies including Hadoop 2, Hive, HDFS, MapReduce, and Spark.
- Used Spark 2, Spark SQL, and PySpark.
- Responsible for Data Analytics, Data Reporting, Ad-hoc Reporting, Graphs, Scales, PivotTables and OLAP reporting.
- Extracted data from Hadoop for basic analysis and summarization within the infrastructure.
- Created dashboards using visualization tools such as Tableau, ggplot2, d3.js, Plotly, and R Shiny.
- Worked with and extracted data from various database sources such as Oracle, SQL Server, and DB2.
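The word-level feature extraction used in the social-media NLP work above can be sketched with standard-library Python. The normalization here is deliberately minimal (lowercasing plus a simple token pattern), and the example posts are invented:

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split into word tokens: a simple normalization step."""
    return re.findall(r"[a-z']+", text.lower())

def bag_of_words(texts):
    """Word-level count features, the kind fed to supervised social-media classifiers."""
    return [Counter(tokenize(t)) for t in texts]

# Invented example posts.
posts = ["Great product, love it!", "terrible support, love lost"]
features = bag_of_words(posts)
print(features[0]["love"], features[1]["terrible"])  # Counter returns 0 for absent words
```

In practice these counts would be combined with the social-network metadata mentioned above before training a classifier; spaCy replaces this toy tokenizer when POS tags or entities are needed.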
Data Scientist
Confidential, McLean, VA
Responsibilities:
- Identified and executed process improvements; hands-on in various technologies such as Oracle, Informatica, and Business Objects.
- Developed large datasets from structured and unstructured data and performed data mining.
- Partnered with modeling experts to develop data frame requirements for projects.
- Performed Ad-hoc reporting/customer profiling, segmentation using R/Python.
- Created statistical models from the collected data, covering exploratory analysis and pre-processing, to provide conclusions and decision guides.
- Programmed a utility in Python using multiple packages (SciPy, NumPy, pandas).
- Implemented Classification using supervised algorithms like Logistic Regression, Decision trees, KNN, Naive Bayes.
- Validated the machine learning classifiers using ROC Curves and Lift Charts.
- Extracted data from HDFS and prepared data for exploratory analysis using data munging.
- Updated Python scripts to match training data with our database stored in AWS Cloud Search, so that we would be able to assign each document a response label for further classification.
- Assisted with exploring the use of NLP to enhance the analytics frameworks and to gain a better understanding of their clients and their broader operational environments.
- Identified use of NLP to verify the consistency between company reports and financial statements.
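The classification-and-validation workflow above (supervised classifiers checked with ROC curves) can be sketched minimally with scikit-learn. The data here is synthetic; the real labels came from matching documents against the database, as described above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the labeled documents (illustrative only).
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.normal(size=500) > 0).astype(int)

# Hold out a test split, fit a supervised classifier.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = LogisticRegression().fit(X_tr, y_tr)

# Validate via ROC AUC, the summary statistic behind an ROC curve.
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(round(auc, 3))
```

The same held-out scores feed `sklearn.metrics.roc_curve` when the full curve (or a lift chart) is needed rather than the single AUC number.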
Data Analyst
Confidential, Atlanta, GA
Responsibilities:
- Generated SQL scripts to retrieve data from multiple tables and to load data into UAT and production environment.
- Handled large databases in Dev, UAT, and Production environments.
- Maintained high security while sharing data within internal and external teams.
- Performed data analysis to measure individual performance and analyzed priorities of tickets to draw insights.
- Developed reports and dashboards for daily/weekly/monthly performance metrics using SQL, MS Excel, MS PowerPoint, and SharePoint.
- Recognized the connection between business operations and analytics to influence business strategies and solutions.
- Recommended reporting and analytic views to the business; simplified processes, helping clear backlogs and reducing SLA breaches by 36%.
- Gathered and prepared data from multiple sources to support information analytics.
- Monitored and tracked data reporting details for all data sources.
- Documented requests for data enhancements.
- Proactively reviewed data for areas requiring improvement, change, or infill.
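The multi-table SQL retrieval and performance-measurement work above can be sketched with Python's standard-library sqlite3 as a stand-in for the production database. Table and column names are illustrative assumptions, not the actual schema:

```python
import sqlite3

# In-memory stand-in for the UAT database; schema is hypothetical.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE analysts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tickets (id INTEGER PRIMARY KEY, analyst_id INTEGER, priority TEXT);
    INSERT INTO analysts VALUES (1, 'Lee'), (2, 'Kim');
    INSERT INTO tickets VALUES (10, 1, 'high'), (11, 1, 'low'), (12, 2, 'high');
""")

# Join across tables to measure individual performance, as in the reporting above.
rows = con.execute("""
    SELECT a.name, COUNT(*) AS n_tickets
    FROM tickets t JOIN analysts a ON t.analyst_id = a.id
    GROUP BY a.name
    ORDER BY a.name
""").fetchall()
print(rows)
```

The same SELECT, pointed at the production connection instead, is the kind of script used to load UAT data and feed the Excel/SharePoint reporting.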