Data Scientist Resume
Washington, DC
SUMMARY
- Around 8 years of experience in Machine Learning and Data Mining with large datasets of structured and unstructured data, Data Acquisition, Data Validation, Predictive Modeling, and Data Visualization.
- Used Pandas, Numpy, Seaborn, Matplotlib and Scikit-learn in Python for developing various machine learning algorithms and utilized machine learning algorithms such as Linear Regression, Multivariate Regression, Logistic Regression, Naive Bayes, Random Forests, K-Means, and KNN for Data Analysis.
- Hands-on experience in implementing Clustering and Principal Component Analysis techniques for Segmentation and Recommender Systems.
- Strong experience in Software Development Life Cycle (SDLC) including Requirements Analysis, Design Specification and Testing as per Cycle in both Waterfall and Agile methodologies.
- Adept in statistical programming languages like R and Python, and in Big Data technologies like Hadoop and Hive.
- Skilled in using dplyr in R and pandas in Python for performing exploratory data analysis.
- Experience working with data workflow tools like Draw.io, Visio and ER Studio.
- Experience in designing visualizations using Power BI and in publishing and presenting dashboards and storylines on web and desktop platforms.
- Designed and implemented the system architecture for a cloud-hosted client solution based on Microsoft Azure and AWS SageMaker.
- Skilled in using Hadoop (Pig and Hive) for basic analysis and extraction of data in the infrastructure to provide data summarization.
- Highly skilled in using visualization tools like Power BI, ggplot2 and matplotlib for creating dashboards.
- Extracted data from various database sources like Oracle, SQL Server, and MySQL; experienced in Normalization and De-Normalization techniques for optimum performance.
- Regularly used JIRA, Asana, and other internal issue trackers for project development.
TECHNICAL SKILLS
Scripting/programming language: R (dplyr, ggplot2, shiny, plotly), Python (Numpy, Scipy, Pandas, Scikit-learn, Matplotlib, NLTK, Beautiful Soup, Selenium, Python IDE)
Machine learning/Deep learning: Classification, Regression (Linear, Logistic, Elastic Net), Clustering analyses using neural nets (MLP), RF, KNN, SVM, GLM, MLR, Logit, K-means algorithms
Database management systems: RDBMS (Microsoft SQL server, Oracle DB)
Big Data: MySQL, Spark, Hadoop/MapReduce, Hive, Impala
Data storage/processing framework: Hadoop and Spark
Data visualization/reporting: Power BI, Draw.io, OGMA, ggplot2, shiny, and matplotlib
Operating System: Windows, Unix
ETL Tools: Alteryx, Trifacta, Adaptive, Airflow
Cloud: Microsoft Azure, Azure Machine Learning Studio, AWS Sagemaker
PROFESSIONAL EXPERIENCE
Confidential, Washington, DC
Data Scientist
Responsibilities:
- Led the team in implementing Data Lineage activities for the clients' cybersecurity metrics and acted in an advisory capacity, sharing industry best practices at an enterprise scale.
- Improved the performance of existing machine learning models and ETLs to ensure accuracy of the data processing so that the metrics/KPIs are up to date for the senior management.
- Determined data sources, aggregated and modeled the data to make threat predictions.
- Used Sqoop ingestion to consume data from SQL Server, Oracle, and MySQL and exported it into Hive databases.
- Utilized Cloudera Navigator to connect the Hive databases to external outputs like dashboards for user consumption.
- Analyzed data dictionaries and workflows for the ETL engines.
- Designed PoCs of ETL workflows for the PHP and Perl scripts using the Trifacta, Adaptive, Alteryx, and Airflow tools.
- Implemented a fully vetted analytical solution using Python, React/D3, and third-party tools, with a user base aligned to that of the dashboard metrics consumers.
- Implemented visualization of the data flow using Draw.io, D3, and React to simplify the complex state of the data solutions.
- Performed reverse engineering tasks and analysis work for quality checks and suggested alert mechanisms.
Environment: Python, Cloudera Navigator, Oracle, SQL Server, Hive, Draw.io, React, D3, PHP, Alteryx, Trifacta, Adaptive, Airflow
Confidential, Washington, DC
Data Scientist and Research Manager
Responsibilities:
- Implemented machine learning solutions by understanding clients’ needs and correlating Confidential’s data with the clients’ data, and supported the Research team in developing reproducible statistical modeling.
- Provided executives with analytics and decision-support tools used as the basis for reorganization, consolidation and relocation strategies.
- Collaborated with cross-functional stakeholder groups across the organization to ensure business, technology and research alignment. Prototyped and created dashboards in Power BI.
- Projects included applications in higher education, sports/esports performance, fan profiling, health insurance, and customized content delivery.
- Used Pandas, Numpy, Seaborn, Scipy, Matplotlib, and Scikit-learn in Python and dplyr, ggplot2, and SparkR in R for developing various machine learning algorithms.
- Automated the models in Azure ML Studio for production and user consumption.
- Developed Python and Hive scripts to filter, map, and aggregate data. Used Sqoop to transfer data to and from Hadoop.
- Developed a Machine Learning test-bed with 12 different model learning and feature learning algorithms.
- Utilized Hadoop, HBase, Kafka, Spark Streaming, MLlib, Python, and a broad variety of machine learning methods, including classification, regression, and dimensionality reduction, and used the resulting engine to increase user lifetime by 45% and triple user conversations for target categories.
- Developed various Power BI Data Models by extracting and using data from various source files, DB2, Excel, flat files, and Big Data.
- Participated in all phases of data mining, data collection, data cleaning, model development, validation, and visualization, and performed Gap Analysis.
- Designed and developed Use Case Diagrams, Activity Diagrams, Sequence Diagrams, and OOD (Object-Oriented Design) using Draw.io and Visio.
Environment: R, Python, Power BI, Azure ML Studio, HDFS, Oracle, Hive, OLAP, DB2, Metadata, MS Excel, MS Visio, MapReduce, SQL.
Confidential, Washington, DC
Data Analyst/ Data Scientist
Responsibilities:
- Developed a Player Recruitment Strategy by analyzing Prozone's Major League Soccer data and applying Principal Component Analysis and Clustering techniques.
- Analyzed the playing styles of the Major League Soccer clubs in the 2015 and 2016 seasons using pie-radar visualization charts in Power BI.
- Conducted research on current work in Soccer Analytics that the clubs can use.
- Designed a static pipeline in MS Azure for data ingestion and dashboarding. Used MS ML Studio for modeling and MS Power BI for dashboarding.
- Analyzed large datasets to provide strategic direction to the organization.
- Used Pandas, Numpy, Seaborn, Matplotlib, Scikit-learn, Scipy, and NLTK in Python for developing various machine learning algorithms.
- Assisted in developing internal tools for Data Analysis by carrying out specified data processing and statistical techniques and advised on the suitability of methodologies and suggested improvements.
- Familiar with the Hadoop cluster environment and its resource management configurations for analysis work; used Python, PySpark, and Hive for analytics and developing dashboards.
- Created statistical models using distributed and standalone models to build various descriptive, predictive and prescriptive solutions.
- Produced effective strategies based on accurate and meaningful data reports, analysis, and keen observations.
- Developed web applications using R Shiny and Python, and promptly resolved bugs and issues arising in the production environment.
Environment: Power BI, MS Azure, HDFS, Python, R, Shiny, MapReduce, Pig, Hive, UNIX, XML, JSON etc.
Confidential, Washington, DC
Data Science Research Assistant
Responsibilities:
- Assisted the Institute for the Study of International Migration at Confidential by creating and debugging robots using Kapow to scrape websites.
- Utilized NLP applications and sentiment analysis to identify trends and patterns within massive scraped data sets.
- Developed scalable machine learning solutions and created automated detection systems with continuous tracking of their performance.
- Knowledge of ML and Statistical libraries (e.g. Scikit-learn, Pandas) and strong command of data architecture and data modelling techniques.
- Drew inferences and conclusions, and created Dashboards and Visualizations of processed data to identify trends and anomalies.
- Worked closely with internal stakeholders such as business teams, product managers, engineering teams, and partner teams.
- Performed data mining using state-of-the-art methods and extended the company’s data with third-party sources of information when needed.
- Enhanced data collection procedures to include information that is relevant for building analytical systems.
- Processed, cleansed, and verified the integrity of data used for analysis and created automated metrics using complex databases.
- Performed ad-hoc analysis, presented results in a clear manner, and fostered a culture of continuous engineering improvement through mentoring, feedback, and metrics.
Environment: Kapow, Python, HDFS, Hadoop, Tableau, Scikit-Learn, Pandas, Spark, Storm etc.
Confidential
Analytical Solutions/Data Engineer
Responsibilities:
- Guided a team of 12 on analysis, design, development, testing, and deployment of mobile applications, internal data systems and analytical solutions for 350,000 internal employees.
- Increased the usage of the mobile apps by 20% in eight months using a data-driven approach and creating actionable intelligence from the internal data systems.
- Supported the system integration and the databases that record and store the usage of the mobile applications, and analyzed large data sets from multiple sources to understand the usage behavior of the mobile apps.
- Managed and maintained relationships with external stakeholders including User Experience, Testing, Security, Performance, and Deployment teams for the successful implementation of the entire software development life cycle.
- Built dashboards and reports using MS Office and SQL scripts to present relevant trends and significant data points on the usage of the apps to the senior management for decision making.
- Initiated process improvements that ensured the team maintained data quality, and documented all processes.
- Collaborated with Data Analysts and others to gain insights and understanding of the data and performed Data Mining, Data Analytics, Data Collection, and Data Cleaning.
- Developed models, performed validation and visualization, and carried out Gap Analysis.
- Cleansed and transformed the data by treating Outliers and handling the missing values.
- Used Elasticsearch to retrieve data into the application as required.
- Performed Statistical Modelling with ML to bring insights from data under the guidance of the Principal Data Scientist.
- Performed Data Modeling with Pig, Hive, and Impala.
- Analyzed the partitioned and bucketed data and computed various metrics for reporting.
- Involved in loading data from RDBMS and web logs into HDFS using Sqoop and Flume.
- Worked on loading the data from MySQL to HBase where necessary using Sqoop.
- Developed Hive queries for Analysis across different banners.
- Extracted user feedback data using Java and API. Parsed JSON-formatted data and uploaded it to the database for analysis to study customer behavior.
- Exported the result set from Hive to MySQL using Sqoop after processing the data.
- Hands-on experience working with Avro file formats and compression.
- Experience in writing MapReduce programs with Java API to cleanse Structured and unstructured data.
- Created HBase tables to store data in various formats coming from different portfolios.
Environment: Objective C, Java, HDFS, HBase, Sqoop, Hive, Impala, Java API, MapReduce, PostgreSQL, MS Access etc.
