Data Scientist Resume
Baltimore, MD
SUMMARY:
- 6+ years of experience as a Data Scientist, including deep expertise in statistical analysis, data mining, and machine learning using R, Python, and SQL.
- Data-driven and highly analytical, with working knowledge of statistical modeling approaches and methodologies (clustering, regression analysis, hypothesis testing, decision trees, machine learning), business rules, and an ever-evolving regulatory environment.
- Professional experience with machine learning algorithms such as Linear Regression, Logistic Regression, Random Forests, Decision Trees, K-Means Clustering, and Association Rules.
- Knowledge of cloud services such as Microsoft Azure and Amazon Web Services (AWS).
- Experience with AWS in planning, designing, implementing, and maintaining system applications in the AWS Cloud in Windows and Linux environments.
- Experience with AWS services (S3, EC2, ELB, EBS, Route 53, VPC, Auto Scaling, etc.), deployment services (Lambda and CloudFormation), and security practices (IAM, CloudWatch, and CloudTrail).
- Experience working in Agile Scrum software development.
- Knowledge of Big Data toolkits such as Mahout, SparkML, and H2O.
- Knowledge of NoSQL databases such as HBase, Cassandra, and MongoDB.
- Working experience in statistical analysis using R, SPSS, MATLAB, and Excel.
- Experience working with Version Control (Git).
- Experience with traditional analytics tools (Excel and Tableau).
- Knowledge of Microsoft Azure PaaS services such as SQL Server, HDInsight, and the cloud-based Kusto platform.
- Hands-on experience writing queries in SQL and R to Extract, Transform, and Load (ETL) data from large datasets.
- Experience with Python libraries and packages such as Pandas, NumPy, SciPy, Matplotlib, and PIL.
- Experience working with various kinds of data files, such as images (.jpeg, .jpg, .png) and audio (.mp4, .wav).
- Experience working with SAS Language for validating data and generating reports.
- Experience working with web technologies such as HTML, CSS, and R Shiny.
- Strong data analysis skills using business intelligence, SQL, and/or MS Office tools.
- Experience working in Agile/Scrum Methodologies to accelerate Software Development iteration.
- Experience in applying predictive modeling and machine learning algorithms for analytical reports.
- Profound analytical and problem-solving skills, along with the ability to understand current business processes and implement efficient solutions to issues.
- Experience using technology to work efficiently with datasets such as scripting, data cleansing tools, statistical software packages.
- Strong understanding of how analytics supports a large organization, including the ability to articulate the linkage between business objectives, analytical approaches and findings, and business decisions.
- Excellent analytical skills with demonstrated ability to solve problems.
- Ability to work with large transactional databases across multiple platforms (Teradata, Oracle, HDFS, SAS).
- High proficiency in Excel, including complex data analysis and manipulation.
- Good oral and written communication skills.
- Strong interpersonal skills to successfully build long-term relationships with colleagues and business partners.
- A results-driven individual with a passion for data/analytics who can work collaboratively with others to solve business problems that drive business growth.
- Demonstrated leadership and self-direction. Demonstrated willingness to both teach others and learn new techniques.
- Ability to work with managers and executives to understand the business objectives and deliver as per the business needs and a firm believer in teamwork.
TECHNICAL SKILLS:
Programming & Scripting Languages: R, C, C++, Java, JCL, COBOL, HTML, CSS, SAS, JavaScript, Scala
Databases: SQL, MySQL, MS Access, Oracle; NoSQL: HBase, Cassandra, MongoDB; Query engines: Pig, Hive, Impala
Statistical Software: SPSS, R, SAS
Web Packages: Google Analytics, Adobe Test & Target, WebTrends, R Shiny
Development Tools: RStudio, Notepad++, PyCharm IDE, Jupyter, Spyder IDE
Big Data (Hadoop): Mahout, SparkML, H2O, MapReduce, Pig, Hive, HBase
Version Control: Git
Writing Tools: LaTeX
Packages: dplyr, rjson, ggplot2, rpart, NumPy, SciPy, Pandas, Matplotlib, tesseract, PIL, R Shiny
Techniques: Machine learning, Regression, Clustering, Data mining.
Machine Learning: Naive Bayes, Decision trees, Regression models, Random Forests, Time-series, K-means, PCA, SVM.
Business Analysis: Requirements Engineering, Business Process Modeling & Improvement, Financial Modeling
Cloud: AWS (EC2, S3, RDS, EBS, VPC, IAM, Security Groups), Azure
Operating Systems: Microsoft Windows 7/8/8.1/10/Vista/XP, Linux (Ubuntu)
PROFESSIONAL EXPERIENCE:
Confidential, Baltimore, MD
Data Scientist
Responsibilities:
- Work independently and collaboratively throughout the complete analytics project lifecycle including data extraction/preparation, design and implementation of scalable machine learning analysis and solutions, and documentation of results (In Agile Environment).
- Responsible for launching Amazon EC2 instances using Amazon Web Services (Windows & Linux).
- Created roles for EC2, S3 and EBS resources to communicate within the team using IAM.
- Responsible for S3 bucket creation, bucket policies, and IAM role-based policies, and for creating alarms and notifications for EC2 hosts using CloudWatch.
- Built and configured a Virtual Data Center in the AWS cloud to support EDW hosting, including Virtual Private Cloud (VPC), public and private subnets, security groups, and route tables, and launched EC2 and RDS instances in the defined virtual private cloud.
- Worked with different data types in R such as vectors, lists, matrices, arrays, and data frames.
- Read and wrote data from various .csv, .xml, and .json files in RStudio.
- Worked with SAS for extracting, manipulating, and validating data and generating reports.
- Used various PROC and DATA statements such as MEANS, UNIVARIATE, PRINT, LABEL, and FORMAT, along with loops, in SAS to read and write data.
- Updated the R Shiny web application on a regular basis with insurance trends for Confidential.
- Performed data visualization, such as histograms, pie charts, bar charts, and scatter plots, using the Matplotlib library in Python and base graphics in R.
- Responsible for creating repositories in Git for each new user story.
- Worked with datatypes in Python such as Strings, Lists and Dictionaries.
- Used various libraries and packages in Python, such as Pandas, NumPy, SciPy, and scikit-learn, for reading, writing, calculations, and modeling.
- Installed RStudio Server on an AWS Linux AMI (free tier).
- Applied various machine learning algorithms and statistical models, such as decision trees, regression models, clustering, and SVM, to identify volume, using the scikit-learn package in Python and equivalent packages in R.
- Hands-on experience implementing Naive Bayes, Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, and Principal Component Analysis in Python and R.
- Experience working with machine learning algorithms in python using scikit-learn libraries.
- Knowledge of A/B Testing, ANOVA, Multivariate Analysis, Association Rules, and Text Analysis using R.
- Knowledge of the Hadoop ecosystem framework, including Pig and Hive.
- Performed extensive data cleansing and analysis using pivot tables, formulas (VLOOKUP and others), data validation, conditional formatting, and graph and chart manipulation in MS Excel.
- Verify that the proper files are uploaded into the right git repositories.
- Worked with python scipy and numpy libraries for performing statistical analysis.
- Collaborate with technical and non-technical resources across the business to leverage their support and integrate our efforts.
- Coordinated and monitored the project progress to ensure the timely flow and complete delivery of the project.
- Trained and tested data as part of the data modeling process for each machine learning algorithm.
- Created pivot tables and charts using worksheet data and external resources, modified pivot tables, sorted items and group data, and refreshed and formatted pivot tables.
- Analyzed root causes in code and incorporated program changes as a cost-effective solution.
- Worked on different data formats such as JSON, XML and performed machine-learning algorithms in Python.
- Prepared test documents and results, and assured delivery and completion of each user story.
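The train/test modeling workflow described in the bullets above can be sketched as follows; this is a minimal illustration only, and the dataset, model choice, and hyperparameters are assumptions rather than details from the actual project.

```python
# Minimal sketch of a scikit-learn train/test workflow.
# The iris dataset and tree depth are illustrative assumptions.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load a sample dataset and hold out 20% of it for testing
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit a shallow decision tree and score it on the held-out data
model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
```

The same pattern (split, fit, evaluate) applies to the other scikit-learn estimators mentioned above, such as random forests or logistic regression.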
Confidential
Data Scientist
Responsibilities:
- Responsible for retrieving data using SQL/Hive queries from the Cyber Life database and performing analysis enhancements.
- Used R, SAS and SQL to manipulate data, and develop and validate quantitative models.
- Worked as a RLC (Regulatory and Legal Compliance) Team Member and undertook user stories (tasks) with critical deadlines in Agile Environment.
- Read data from various files, including .html, .csv, and .sas7bdat, using SAS, R, and Python.
- Involved in analyzing system failures, identifying root causes, and recommending courses of action.
- Coded, tested, debugged, implemented and documented data using Python and R.
- Responsible for testing and modeling the data in order to migrate it to the production environment.
- Experience retrieving unstructured data from different sites in formats such as HTML and XML.
- Ensured that all requirements were gathered before working on the data.
- Performed exploratory data analysis on the data provided by the client.
- Retrieved data from various files such as .json, .xml, and .csv using R and Python.
- Worked with Pandas libraries in Python for storing the retrieved data in a file.
- Conducted root cause analysis, independently and collaboratively, to resolve multiple issues.
- Worked with data frames and other data interfaces in R for retrieving and storing data.
- Responsible for ensuring the data was accurate and free of outliers.
- Applied various machine learning algorithms such as Decision Trees, K-Means, Random Forests, Regression in R with the required packages installed.
- Applied the K-Means algorithm to determine an agent's position based on the collected data.
- Applied regression to estimate the probability of insurance policy sales given the agent's location.
- Used advanced Microsoft Excel functions such as pivot tables and VLOOKUP to analyze the data and prepare programs.
- Performed various statistical tests and presented the results clearly to the client.
- Provided training to Beginners regarding the Cyber Life system and other basics.
- Actively involved in Analysis, Development and Unit testing of the data and delivery assurance of the user story in Agile Environment.
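A minimal sketch of the K-Means clustering step described above; the original work was done in R, and the 2-D "agent location" points and cluster count below are synthetic assumptions for illustration.

```python
# Illustrative K-Means clustering on synthetic 2-D location data,
# analogous to the agent-positioning analysis described above.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two well-separated synthetic groups of 20 points each
points = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(20, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.5, size=(20, 2)),
])

# Fit K-Means with a hypothetical choice of 2 clusters
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
labels = kmeans.labels_
```

Cluster centers (`kmeans.cluster_centers_`) could then be mapped back to candidate agent positions.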
Confidential
Jr. Data Scientist
Responsibilities:
- Responsible for collecting patients' data from various sources, including hospitals and clinics.
- Prepared regular patient reports by collecting samples from diagnosed patients using Excel spreadsheets.
- Ensured that the dataset had no missing values and could be used for further analysis.
- Cleaned data by analyzing and eliminating duplicate and inaccurate data (outliers) using R.
- Worked in Agile Environment and responsible for designing analytic frameworks for data mining, ETL, analysis, and reporting under the supervision of the Manager.
- Trained in data science fundamentals and applied those software applications to collect and manage patient data in Excel/SPSS.
- Trained in Python packages and libraries such as Pandas, Matplotlib, SciPy, and NumPy.
- Involved in analyzing image files using tools such as Tesseract and Isomap.
- Assisted in performing statistical analysis of the data and storing them in a database.
- Worked with Quality Control Teams to develop Test Plan and Test Cases.
- Involved in designing and implementing the data extraction (XML DATA stream) procedures.
- Generated graphs and reports using ggplot2 in RStudio for analyzing models.
- Generated results and evaluated model accuracy.
- Prepared the final documents and ensured delivery to the client before EOD.
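The duplicate-removal and outlier-screening steps above were performed in R; a minimal pandas equivalent might look like the sketch below, where the patient values and the "plausible range" threshold are made-up assumptions.

```python
# Sketch of duplicate removal and simple outlier screening in pandas.
# Patient values and the valid glucose range are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "patient_id": [1, 2, 2, 3, 4],
    "glucose":    [95, 110, 110, 102, 999],  # 999 is an implausible outlier
})

df = df.drop_duplicates()                 # remove the duplicated record
df = df[df["glucose"].between(50, 300)]   # keep only plausible readings
```

Range-based screening is a simple stand-in here; statistical rules (e.g. IQR-based fences) are common alternatives when valid ranges are not known in advance.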