Data Scientist Resume
Cleveland, OH
SUMMARY:
- 10+ years of IT experience with a strong background in object-oriented programming, data warehousing, and SQL.
- 4+ years of experience in data science using data mining, machine learning, and advanced statistical techniques.
- Strong knowledge of computer science fundamentals, including robust algorithm design and data structure concepts; developed reusable application objects in several programming languages.
- Worked on a variety of analytical products and models such as patient matching and risk prediction, customer retention analysis, sentiment analysis of product reviews, and a job recommendation engine (collaborative filtering).
- Expertise in supervised and unsupervised learning techniques (clustering, classification, PCA, decision trees, KNN, SVM), predictive analytics, optimization methods, natural language processing (NLP), and time series analysis.
- Experience with basic and advanced statistical modeling techniques such as ARIMA, hypothesis testing, nonparametric testing, regression, logistic regression, missing value analysis, and survival analysis using Kaplan-Meier and Cox proportional hazards models.
- Hands-on experience with Python data science libraries such as Pandas, NumPy, SciPy, scikit-learn, Matplotlib, Seaborn, BeautifulSoup, Orange, Rpy2, LibSVM, neurolab, and NLTK.
- Hands-on experience with R packages and libraries such as ggplot2, Shiny, h2o, dplyr, reshape2, plotly, RMarkdown, ElemStatLearn, and caTools.
- Hands-on experience with Hadoop ecosystem components such as Hive, Pig, Sqoop, MapReduce, Flume, and Oozie, and an excellent understanding of HDFS and the Spark framework.
- Experience performing operations such as union, join, group, filter, collate join, and grouping data in bags using Pig, and retrieving data from HBase using Hive and Impala queries.
- Extensive experience using Unix and writing shell scripts to schedule and automate jobs.
- Sound knowledge of data warehousing and database concepts. Expertise in performance tuning, optimization, data integrity, and statistics using SQL Profiler.
- Extensive experience in data analysis, mapping, and extracting, transforming, and loading (ETL) data using various business intelligence technologies.
- Extensive experience using SQL Server, Oracle, Sybase, and DB2 databases, including writing complex SQL queries, procedures, and triggers.
- Experience in the healthcare, retail, insurance, corporate banking, and ERP domains.
TECHNICAL SKILLS:
Operating System: Windows, Unix, Mac
Machine Learning Python: pandas, numpy, scikit-learn, scipy, statsmodels, ggplot2
Databases: Vertica, Oracle, MS SQL Server, MySQL, Sybase ASE 12.5, DB2
Big Data & Analytical: R, Python, SAS EMiner, NodeXL, XLMiner, Stata, Analytical Solver, Hadoop Pig, Hive, Impala, Ruby
Visualization: Tableau, ggplot2 and matplotlib, seaborn
Business Intelligence: SSIS, SSRS
PROFESSIONAL EXPERIENCE:
Confidential, Cleveland, OH
Data Scientist
Responsibilities:
- Responsible for creating new and maintaining existing statistical and machine learning models for patient matching, risk prediction, and grouping.
- Develop algorithms for data manipulation, abstraction, and standardization.
- Use logistic regression on EMR patient data with over ten thousand variables automatically extracted from the EMR data.
- Use a 10-year risk estimate model based on multivariate regression equations, with a measure of calibration to accurately assess the absolute level of risk.
- Use the likelihood ratio test to measure model fit.
- Perform risk reclassification analysis to assess the utility of risk prediction models.
- Train and evaluate models using variable cases; perform hypothesis testing for patient matching models.
- Use Python NumPy, SciPy, Pandas packages to perform dataset manipulation.
- Use R to write multivariate regression models and risk prediction algorithms.
- Make heavy use of complex SQL queries to retrieve and validate data from HBase and Vertica databases.
- Follow agile methodology, with weekly sprints and daily scrums for status updates on tasks.
Tools and Technologies used: R, Python, Hadoop/Hive/Pig/Impala, Vertica, Statistics & Machine Learning
Confidential, Cincinnati, OH
Data Scientist
Responsibilities:
- Performed exploratory data analysis, removed unwanted symbols, tags, punctuation, spaces, etc., and stemmed words from reviews and
- Identified industry-specific positive/negative lexicons and created a data dictionary to analyze people's opinions.
- Generated the document-term matrix and term frequency matrix on the corpus.
- Generated word clouds and other plots to analyze sentiment trends.
Tools and Technologies used: Text Mining, R, SQL, Excel
Confidential
Data Scientist
Responsibilities:
- Performed exploratory data analysis on job seekers and postings dataset.
- Analyzed clusters using various plots such as entropy plots, uncertainty plots, and cluster density plots.
- Used collaborative filtering to recommend appropriate jobs to job seekers.
- Used cosine similarity to measure the similarity between skill pairs and created a job seeker behavior matrix.
- Wrote algorithms in R and generated interactive plots using ggplot2.
- Wrote heavy SQL queries using MySQL, Hive, and Impala to analyze data.
Environment: R, Machine Learning, Data Mining, Hadoop, Hive, Impala
Confidential
Data Scientist
Responsibilities:
- Cleaned and identified sample data to build a single view of the customer as the starting point for the analysis.
- Applied clustering techniques to identify customer segments.
- Built a propensity model to predict the likelihood of an outcome, such as how likely a customer is to default, churn, or lapse.
- Calculated a customer value metric to prioritize customers based on probability and value.
- Created dashboards in Tableau to present the analysis.
Tools and Technologies used: Python, SQL Server, Tableau, Excel
Confidential
Data Analyst
Responsibilities:
- Involved in analyzing user specifications for workability, completeness and business flow.
- Extracted data from various sources such as SQL Server 2008/2005, DB2, CSV, Excel, and text files from client servers and through FTP.
- Developed, deployed, and monitored SSIS packages for the data warehouse created for the project.
- The packages included a variety of transformations, for example Slowly Changing Dimension, Lookup, Aggregate, Derived Column, Conditional Split, Fuzzy Lookup, Multicast, and Data Conversion.
- Reviewed and tested packages, reports, and cubes, fixing any bugs using SQL Server 2008 Business Intelligence Development Studio.
- Created cubes and dashboards to show Zurich asset trends to higher management.
Environment: Windows XP, SQL 2008, SSIS, SSRS
Confidential
Data Warehousing Consultant
Responsibilities:
- Converted existing Base SAS packages into SSIS 2005.
- Analyzed and documented the SAS code logic and prepared the database design for SSIS packages.
- Developed SSIS packages to pull data from various heterogeneous sources such as Excel, flat files, DB2, Netezza, and SQL Server.
- Converted reports generated during the process into SSRS technology.
- Developed Stored Procedures, Triggers, and SQL scripts for performing automation tasks.
- Performed system testing of bug fixes and enhancements and supported the RcardETL application.
Environment: Windows XP, SQL 2005, SSIS 2005, SSRS 2005
Confidential
ETL Developer
Responsibilities:
- Evaluated existing manual processes in the application and proposed process automation to the business to reduce manual intervention and save customer cost.
- Enhanced and maintained the application using PowerBuilder 10.
- Estimated, designed, developed, and implemented new ETL processes.
- Developed Stored Procedures, Views, and T-SQL scripts.
- Developed SSIS packages to pull data from mainframe flat files.
- Developed reports in SSRS and in Excel format, generated as part of the ETL process.
- System testing of bug fixes.
Environment: Windows XP, SQL 2005, SSIS, SSRS
Confidential
Consultant - Application Development
Responsibilities:
- Migrated the application frontend from PowerBuilder 6.5 to 12.5 and the backend from Sybase ASE 12.5 to MS SQL Server 2008.
- Converted mainframe/Unix jobs into SSIS technology.
- Independently managed all migration and development activities of the project.
- Presented demos and prototypes of the migrated or newly developed application components to the customer from time to time.
- Participated in client review and status meetings.
- System testing of bug fixes / enhancements.
- Production Support of the application.
Environment: Windows XP, SQL Server 2008, SSIS, Sybase ASE 12.5
Confidential
Senior Software Engineer
Responsibilities:
- Involved in Database design and development.
- Developed various reports using SSRS and customized existing reports according to the functional specifications.
- Tuned SQL queries, making the procedures run faster and more efficiently.
- Client communication.
- Developed SSIS packages extracting data from various sources such as Oracle, flat files, and SQL Server.
Environment: Windows XP, SQL 2005, SSIS
Confidential
Senior Software Engineer
Responsibilities:
- Analyzed requirements and handled design and development.
- Involved in Database design and development.
- Developed complex reports using SSRS.
- Created SSIS packages to extract data from the Oracle database and developed complex SSRS reports.
Environment: Windows XP, SQL 2005, SSRS, SSIS
Confidential
Software Engineer
Responsibilities:
- Developed complex stored procedures, triggers, functions, and SQL scripts.
- Troubleshot database-related issues in a timely fashion.
Environment: Windows XP, SQL 2005