Associate Data Scientist Resume

Chicago, IL

PROFESSIONAL SUMMARY

  • Over 8 years of industry experience in Statistics, Data Mining, Data Warehousing, Machine Learning, Data Analysis, Data Integration, and Data Migration, working with large sets of structured and unstructured data to solve scientific and business problems using R, Python, SQL, and SAS.
  • Hands-on experience with predictive and statistical analysis tools such as R, Python, Excel, Weka, SAS, and SPSS.
  • Experience using various R and Python packages such as ggplot2, caret, dplyr, RWeka, pandas, RCurl, tm, C50, numpy, NLP, reshape2, rjson, plyr, Beautiful Soup, rpy2, and tkinter.
  • Expertise in building robust Machine Learning and Deep Learning models, including Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and LSTM (Time Series Analysis), using TensorFlow and Keras.
  • Strong working experience using Pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, and NLTK in Python to develop various Machine Learning algorithms, both supervised (Linear Regression, Logistic Regression, Naïve Bayes, Decision Trees, SVM, Random Forest, Neural Networks, KNN) and unsupervised (K-Means Clustering, Apriori).
  • Knowledge of Text Mining, Graph Processing, Regression Modelling, Correlation, GAP analysis, SWOT analysis, Time Series Analysis, MANOVA, Hypothesis Testing, Principal Component Analysis (PCA), cluster analysis, factor analysis, Ensembles, and testing and validation using ROC plots, LOOCV, k-fold cross-validation, and AUC.
  • Knowledge in Markov-Chain Monte-Carlo (MCMC) simulations, mean-variance analysis, Black-Scholes model, volatility estimation and other statistical models.
  • Strong proficiency in supporting Cloud environments like Amazon Web Services (AWS), Azure, OpenStack, IaaS, PaaS and SaaS tools.
  • Knowledge of AWS services such as Auto Scaling, CloudFormation, CloudTrail, CloudWatch, DynamoDB, EBS, EC2, ELB, IAM, Redshift, RDS, S3, VPC, Route 53, CloudFront, security groups, and AWS Lambda.
  • Expertise in handling large transactional databases across multiple platforms such as Oracle, Teradata, SAS, and HDFS, and in NoSQL databases like HBase, Cassandra, and MongoDB.
  • Extensive knowledge of and experience working with Big Data Hadoop ecosystem tools like Pig, Hive, HBase, Apache Kafka, Apache Flume-ng, Apache Spark, Sqoop, Oozie, and Cassandra.
  • Experience writing UDFs (User Defined Functions) in Python for Hadoop (Hive, Pig) and Python modules to extract data from MySQL databases and from HTML pages.
  • Good familiarity with Spark clusters to manipulate RDDs (Resilient Distributed Datasets) using RDD partitions and building models with PySpark, Spark SQL, Spark MLlib, and Spark ML.
  • Strong Oracle/SQL Server programming skills with experience working with functions, packages, joins, group by, merges, and triggers, and in data mining concepts like Dimensional Data Modeling, Star Schema modeling, Snowflake Schema modeling, and fact constellation tables.
  • Expertise in Business Intelligence solutions using Data Warehousing/Decision Support Systems, OLAP, and OLTP, and in developing Process Modeling Data Flow Diagrams (DFD) and flow charts.
  • Knowledge of ticketing tools such as Jira/Confluence and ClearQuest, version control tools such as GitHub, Git, and Subversion (SVN), and ETL tools like Informatica, SSIS (SQL Server Integration Services), and SSRS (SQL Server Reporting Services).
  • Extensive knowledge of business intelligence tools like Teradata, MicroStrategy, Tableau, and Plotly for creating histograms, pie charts, dot charts, box plots, dashboards, and storylines, and in developing Tabular Reports, Custom Reports, and Matrix Reports using SSIS in BIDS.

TECHNICAL SKILLS

SDLC Methodologies: Waterfall, RUP, Agile/Scrum.

Operating Systems: Windows, Linux.

Programming/Scripting Languages: Shell, BASH, Python 2.7/3, R 3.4, SQL, SAS, SPSS.

Cloud/IaaS/SaaS/PaaS: Amazon Web Services (AWS), Microsoft Azure.

AWS Services: Auto Scaling, CloudFormation, CloudTrail, CloudWatch, DynamoDB, EBS, EC2, ELB, IAM, Redshift, RDS, S3, VPC, Route 53, CloudFront, security groups, Glacier, Lambda, SNS, SQS, SES.

Databases: Oracle (12c/11g/10g), MySQL, Teradata, SQL Server, MS Access, PostgreSQL, MongoDB, Cassandra, Netezza, DB2.

Algorithms: Linear Regression, Logistic Regression, Naïve Bayes, Decision Trees, SVM, Random forest, Neural networks, K-Means Clustering, KNN, Ensembles.

Ticketing Tools: Atlassian JIRA, REMEDY, Bugzilla, Redmine.

ETL Tools: Informatica 9.0, SSIS.

Big Data Tools: Scala, Spark, Sqoop, Flume-ng, Hive, Pig, Kafka, Oozie, HBase.

Version Control Tools: GitHub, Git, Subversion (SVN).

Packages: ggplot2, caret, dplyr, RWeka, gmodels, RCurl, tm, C50, NLP, reshape2, rjson, plyr, Beautiful Soup, rpy2, tkinter, numpy, pandas, matplotlib.

Visualization/Reporting tools: Tableau, ggplot, Plotly, QlikView, Matplotlib, Power View, SAP Lumira, D3, SAP BO.

Data Modelling Tools: Rational Rose, MS Visio, ERWin r7, MySQL Workbench, ER/Studio.

IDE: Jupyter, Spyder, RStudio, PyCharm, IPython, Visual Studio R, Anaconda, SAS ODS, MATLAB, Weka.

BI Tools: MicroStrategy, Clear Analytics, Tableau.

PROFESSIONAL EXPERIENCE

Associate Data Scientist

Confidential - Chicago, IL

Responsibilities:

  • Experience in working with health care-related data, including a sound understanding of medical claims/EMR, Medicare and Medicaid, pulled from both structured and unstructured data using Hadoop big data systems.
  • Created roles for EC2, S3 and EBS resources to communicate within the team using IAM. Responsible for S3 bucket creation, policies and the IAM role-based policies, and for creating alarms and notifications for EC2 hosts using CloudWatch.
  • Involved in data collection and extraction of data from Netezza and Teradata as per requirements, and conducted data preparation, data profiling, data preprocessing, data cleansing and outlier detection.
  • Performed analysis using R and Python on data derived from big data Hadoop systems using distributed processing paradigms and from databases using SQL and NoSQL.
  • Deployed automated Sqoop jobs to import and export data from Relational DBMS to HDFS environment.
  • Created internal and external tables using Hive for querying and analyzing data sets, and converted Hive queries into Spark transformations using Spark RDDs.
  • Devised PL/SQL/Pig/Hive statements - Stored Procedures, Functions, Triggers, Views and packages. Made use of Indexing, Aggregation and materialized views to optimize query performance.
  • Performed data profiling to learn about the behavior of various features and find dependencies among them, and used imputation techniques for missing data.
  • Due to complexities in the raw data set, used Principal Component Analysis (PCA) and Factor Analysis in feature engineering to analyze high-dimensional data in Python.
  • Used Pandas, NumPy, SciPy, Matplotlib, scikit-learn, and TensorFlow to design various algorithms.
  • Applied various machine learning algorithms and statistical modeling techniques, such as decision trees, text analytics, supervised and unsupervised learning, regression models, neural networks, deep learning, and clustering, using the scikit-learn package in Python.
  • Developed machine learning models such as Decision Tree, linear regression, multivariate regression, Naive Bayes, Random Forests and KNN using Python in the Hadoop/Hive environment on an AWS EC2 instance.
  • Used clustering algorithms like K-Means to identify outliers and to classify unlabeled data.
  • Performed regression analysis to predict the incidence of the disease, classifying patients by probability of occurrence of disease and providing health insurance plans.
  • Developed dashboards using Tableau and Excel to identify disease incidence in geographic locations and factors such as high BMI, BP, triglycerides, level of education, and level of income influencing the outcome.

Environment: AWS, Teradata 15, MS Office Suite - Excel (Pivot, VLOOKUP), R, Python, Netezza, Tableau 10.1.12, Hadoop, MapReduce, PySpark, Spark, Hive, Spark MLlib, Sqoop, SQL, MongoDB, Machine learning.

Data Analyst

Confidential - Austin, TX

Responsibilities:

  • Performed data profiling to learn about user behavior and merged data from multiple data sources such as Teradata and MySQL into HDFS.
  • Implemented big data processing applications to collect, clean and normalize large volumes of open data using Hadoop ecosystem tools such as Pig, Hive, and HBase.
  • Performed data analysis with the pandas library on the sales data of HP products and consumer survey reports to create reports for the technical team.
  • Also worked with several R packages including knitr, dplyr, SparkR, CausalInfer, spacetime.
  • Evaluated current data against historic data to capture competitive market data using data analysis and market feedback, and built a data pipeline for predicting the likelihood of customers, increasing profit margin by over 9.5%.
  • Improved demand forecasting that reduced backorders to retail partners by 14% and predicted future demands.
  • Developed affinity and market basket association rule models that pinpoint specific patterns of purchase behavior on services.
  • Performed linear regression to predict the yearly sales figures and used pivot tables to determine which vendor sold the most.
  • Provided root cause analysis and preventative measures for any data quality issues that occurred in day to day operations to clients.
  • Used logistic regression (in R and Python) to predict subscription response rate based on customer variables like promotions, demographics, interests and hobbies.
  • Predicted the impact of various promotional decisions on the overall sales of the company using clustering techniques such as K-means, Maximum Likelihood, and Hierarchical Clustering.
  • Designed and deployed parameterized graphic visualizations with drill-down and drop-down menu options using Tableau with reporting objects like Facts, Attributes, Hierarchies.
  • Developed dashboards using Tableau and Excel to identify sales trends, profitability ratio and product consumption patterns across various geographic locations, and prepared a comprehensive summary report for Business Consultants.

Environment: DB2, MySQL, Teradata 11g, Tableau 8, Excel, Python, Hive, SparkR, HDFS, MS Visio, QlikView, Machine learning, Spark MLlib, SQL.

Data Analyst / R programmer

Confidential - Centerville, OH

Responsibilities:

  • Worked on Financial Services industry issues (Credit Risk, Liquidity Risk, Basel II/III, CCAR), identifying data issues and data gaps and recommending solutions. Defined conceptual and logical Data Models using the 3NF model.
  • Developed strategies to access, integrate, and analyze data from disparate sources/platforms for transactions involving credit card, debit card, check, and other payment methods.
  • Knowledge of big data technologies such as MapReduce, Hive, Scala, and HBase to process large amounts of unstructured data.
  • Determined customers' risk in terms of money laundering, terrorist finance or identity theft, and utilized statistical analysis and NLP techniques such as topic modelling and sentiment analysis to identify trends and patterns within massive data sets in R.
  • Developed a classification model in R that predicted policies likely to cancel before the end of the term, which helped the company increase customer retention by 2.3%.
  • Developed working documents to support findings such as Business requirements document (BRD), Use Case Specifications, Functional Specifications (FSD), Systems Design Specification (SDS), Requirement Traceability Matrix (RTM) and testing documents.
  • Performed and coordinated System testing, Regression testing of various builds during integration, Performance testing and User Acceptance Testing (UAT) using Rational Test. Configured and developed XML Schema using Stylus Studio XML editor.
  • Compiled output from the Data Warehouse and the web analytics tool. Developed reports in Excel and Clear Analytics.

Environment: R, Hive, MapReduce, HBase, Excel, Clear Analytics, PostgreSQL, NLP, Big data, Hadoop, HDFS.

PL/SQL Developer

Confidential

Responsibilities:

  • Wrote sequences for automatic generation of unique keys to support primary and foreign key constraints in data conversions.
  • Created SQL tables with referential integrity and developed queries using SQL, SQL PLUS and PL/SQL to retrieve data using cursors and exception handling.
  • Converted various SQL statements into stored procedures, thereby reducing the number of database accesses.
  • Created indexes on tables to improve performance by eliminating full table scans, and created views to hide the actual tables and to reduce the complexity of large queries.
  • Developed DTS packages to extract, transform and load data into the Campaign database from the OLTP database using SQL Server Integration Services (SSIS).
  • Worked on migration of packages from DTS using SQL Server Integration Services (SSIS).
  • Generated reports using SQL Server Reporting Services (SSRS) for customized and ad-hoc queries.
  • Conducted Data Modeling exercises in support of subject areas and/or specific client needs for data, reports, or analyses, with a concern towards reuse of existing data elements, and set the logical and physical relationships using MS Visio and ERWin r7.
  • Developed PL/SQL packages, procedures, triggers, functions, indexes and collections to implement business logic using SQL Navigator.
  • Migrated the SQL data to the BI tool MicroStrategy for visualization using charts and reports.
  • Used Crystal Reports to track logins, mouse-overs, click-throughs, session durations and demographic comparisons with the SQL database of customer information.

Environment: T-SQL, MySQL, SSIS, SSRS, ERWin r7, DTS, MicroStrategy, SQL PLUS, PL/SQL, Informatica 9.0, MS Visio.

Tableau Developer

Confidential

Responsibilities:

  • Experience in Extracting, Transforming and Loading (ETL) data from Excel, Flat file, Oracle to MS SQL Server by using BCP utility, DTS and SSIS services.
  • Wrote custom SQL queries to optimize the extracts, and relied heavily on data blending as the end workbooks ultimately required multiple fields from different data sources.
  • Connected to Tableau PostgreSQL database to create customized interface for Tableau workbooks to provide native style navigation.
  • Developed various Tableau Data Reports by extracting and using the data from various sources: DB2, Excel, Flat Files and Big data.
  • Created interactive insights using combined axis and dual axis charts with multiple measures.
  • Developed Tableau workbooks to perform year over year, YTD, QTD and MTD type of analysis.
  • Designed complex dashboards taking advantage of Tableau functions including data blending, actions, parameters, and filters.
  • Created different visualizations using cross tables, bar charts, pie charts, maps, line charts, scatter plots, heat maps, bullet charts, stacked bars, Gantt charts, and geographical maps.
  • Actively involved in testing new/upgraded features like Forecasting, Data Blending, Parallelized Dashboard, Hyperlink Objects, Color-coded tabs etc. in Tableau 8 desktop/Server.
  • Developed dashboards analyzing call handling forecasts, call rate, call waiting time, call forwarding, and average call time using Tableau BI.
