Data Scientist Resume
Dallas, TX
SUMMARY
- Data scientist with 7 years of cross-industry experience, including healthcare, handling large volumes of structured and unstructured data using Python, R, SQL, Microsoft Excel, and the Hadoop ecosystem (Hive, Sqoop, PySpark, Spark SQL) for data mining, data cleansing, data munging, and machine learning.
- Experienced in data preprocessing steps such as exploration, aggregation, missing-data imputation, sampling, feature selection, dimensionality reduction, and outlier detection.
- Well versed in machine learning algorithms such as linear and logistic regression, decision trees, random forests, support vector machines, and k-nearest neighbors.
- Experienced in time series forecasting with auto-ARIMA models and in NLP projects such as text analytics and sentiment analysis in RStudio.
- Experienced with various Python IDEs, including PyCharm, PyScripter, Jupyter, Spyder, and Sublime Text.
- Experienced in data wrangling, data visualization, and reporting using Python and Tableau.
- Built time series and statistical models for sales prediction, with descriptive visualizations of sales data to surface hidden trends and anomalies.
- Built models to associate product sales with products from the same category, aiding business decisions by cleansing, standardizing, and pre-processing data and inferring hidden trends.
- Built models to validate the effect of holiday-season seasonality on product sales, supporting shelf and store cluster analysis.
- Experienced in extracting data from non-traditional sources such as web scraping (see the sketch after this list).
- Extensive experience developing dashboards and reports with tools such as Tableau and Power BI.
- Extensive knowledge of developing Spark SQL jobs using DataFrames.
- Wrote Sqoop scripts for moving data between relational databases, HDFS, and S3 storage.
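An illustrative sketch of the web-scraping extraction mentioned above, assuming a requests + BeautifulSoup workflow; the URL and CSS selectors are hypothetical placeholders, not project specifics.

```python
# Minimal web-scraping sketch: pull a product listing into a DataFrame.
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = "https://example.com/products"  # hypothetical source page
resp = requests.get(url, timeout=30)
resp.raise_for_status()

soup = BeautifulSoup(resp.text, "html.parser")
rows = [
    {
        "name": item.select_one("h2").get_text(strip=True),
        "price": item.select_one("span.price").get_text(strip=True),
    }
    for item in soup.select("div.product")  # hypothetical selectors
]

df = pd.DataFrame(rows)  # hand off to the usual pandas cleaning steps
```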
TECHNICAL SKILLS
Languages: Python, R Programming, SAS, SQL
Cloud Platforms: AWS, GCP (BigQuery), Azure Databricks
Big Data: Hive, Impala, Sqoop, Pig, PySpark
Analysis: Supervised Learning (Linear Regression, Logistic Regression, Decision Tree, Random Forest, SVM, k-NN), Unsupervised Learning (Clustering, Factor Analysis, PCA), Natural Language Processing, Time Series Forecasting
Serverless ML: BigML, DataRobot, H2O.ai
Relational Databases: Oracle 10g, IBM DB2, PostgreSQL, SAP HANA
NoSQL: MongoDB
Data Visualization: Tableau, Power BI, AWS QuickSight
Specialties (Machine Learning / Predictive Analytics / Text Mining / Market Basket Analysis):
Regression: Simple Linear Regression, Multiple Linear Regression, Logistic Regression
Ensemble: Boosting, Bagging, Stacking
Instance-based: k-Nearest Neighbor (kNN)
Decision Tree Learning: Classification and Regression Tree (CART), Gradient Boosting Machines (GBM)
Clustering: k-Means
Deep Learning: TensorFlow, Keras, PyTorch
Time Series: Moving Average, ARIMA
Text Analytics: NLTK, Pandas, Word Cloud
PROFESSIONAL EXPERIENCE
Data Scientist
Confidential, Dallas, TX
Techniques: Data wrangling, Logistic Regression, Dataset creation, Predictive Modeling, PCA, Time Series Analysis, Random Forest, Decision Tree
Tools: AWS (SageMaker, ECS, Kinesis)
Responsibilities:
- Performed exploratory analysis on product data in Python to understand the structure, attributes, dimensions, missing values, and outliers in the data.
- Detected and treated outliers, then ran stepwise and all-subsets regression to choose effective variables for the machine learning model (see the variable-selection sketch after this list).
- Worked in Jupyter and RStudio on EC2 instances, accessing unstructured data stored in S3 and leveraging AWS machine learning services.
- Well versed in the Hadoop big data ecosystem on AWS EMR clusters and in serverless ML frameworks such as BigML, DataRobot, and H2O.ai.
- Resolved performance issues in Hive and Pig scripts through a solid understanding of joins, grouping, and aggregation.
- Loaded and transformed large sets of structured data from SQL Server via Sqoop into HDFS/Hive for further processing.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Enabled speedy reviews and first-mover advantage by using Oozie to automate data loading into HDFS and Pig to pre-process the data.
- Created Hive tables, loaded them with data, and wrote Hive queries to run test cases.
- Created rich dashboards in AWS QuickSight and Tableau, preparing user stories so each dashboard delivered actionable insights.
- Wrote Dockerfiles, pushed images to ECR, and wrote task definitions to orchestrate containers across target groups, using ECS as the container clustering/management resource.
- Architected a continuous machine learning system that generates predictive models for every product channel, evaluates model performance, selects the best model per channel, and deploys it to production automatically using AWS Lambda and Docker containers (see the model-selection sketch below).
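A minimal sketch of the stepwise variable selection mentioned above, assuming an AIC-driven forward search with statsmodels; the helper name and column names are hypothetical.

```python
# Forward stepwise variable selection by AIC (a common stepwise variant).
import pandas as pd
import statsmodels.api as sm

def forward_stepwise(X: pd.DataFrame, y: pd.Series) -> list:
    """Greedily add the predictor that most improves AIC; stop when none helps."""
    selected, remaining = [], list(X.columns)
    best_aic = float("inf")
    while remaining:
        aic, col = min(
            (sm.OLS(y, sm.add_constant(X[selected + [c]])).fit().aic, c)
            for c in remaining
        )
        if aic >= best_aic:  # no candidate improves the fit; stop searching
            break
        best_aic = aic
        selected.append(col)
        remaining.remove(col)
    return selected

# usage (hypothetical columns): chosen = forward_stepwise(df[features], df["sales"])
```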
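And a rough sketch of the per-channel "pick the best model" step, assuming scikit-learn candidates scored by cross-validation; the estimator list and the hand-off to the Lambda/Docker deployment are illustrative, not the production code.

```python
# Score several candidate models and keep the best one per product channel.
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

CANDIDATES = {
    "logreg": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(max_depth=5),
    "forest": RandomForestClassifier(n_estimators=200),
}

def best_model(X, y):
    """Return (fitted_model, name, score) for the highest mean CV accuracy."""
    scores = {name: cross_val_score(est, X, y, cv=5).mean()
              for name, est in CANDIDATES.items()}
    name = max(scores, key=scores.get)
    return CANDIDATES[name].fit(X, y), name, scores[name]

# The winning model per channel would then be containerized and promoted
# to production by the Lambda-driven deployment step described above.
```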
Data Scientist
Confidential
Projects / Techniques: Cluster Analysis, Customer Attrition, Predictive Modeling, Data Wrangling
Tools Used: Hadoop cluster, Python, SQL
Responsibilities:
- Delivered end-to-end analytical solutions to business problems: understood the problem and the data, built solutions using statistical techniques in Python or R, and provided recommendations.
- Extracted data from databases and other sources using SAS, SQL, and Excel; prepared it for analysis and modeling; and validated it to ensure integrity and consistency.
- Carried out data cleansing: converted data into structured format, removed outliers, dropped irrelevant columns, and handled missing values by dropping or imputing them with the median, mode, mean, min, or max (see the imputation sketch after this list).
- Dropped highly correlated and low-variance variables and applied transformations to bring distributions closer to normal.
- Developed and scaled machine learning models such as logistic regression, random forests, gradient boosting machines, and support vector machines (SVM) for classification.
- Used Python to prototype and deploy machine learning, deep learning, predictive, probabilistic, and statistical models, along with user interface development.
- Analyzed customer survey data with text analytics (Python's Natural Language Toolkit, NLTK) to enable better customer targeting and to understand concerns raised by consumers.
- Built time series forecasting models (ARIMA, ARIMAX, exponential smoothing) and visualizations using tools such as Tableau, Excel, ggplot2, and Matplotlib (see the ARIMA sketch after this list).
- Applied big data tools including Hadoop, Sqoop, MapReduce, Hive, and Pig.
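A minimal sketch of the median/mode imputation step described above, using pandas; the per-column rule is an illustrative assumption.

```python
# Impute numeric columns with the median and categorical columns with the mode.
import pandas as pd

def impute(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())        # numeric -> median
        elif out[col].notna().any():
            out[col] = out[col].fillna(out[col].mode().iloc[0])  # categorical -> mode
    return out
```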
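And a brief ARIMA forecasting sketch with statsmodels; the file name, column names, and (p, d, q) order are hypothetical placeholders rather than tuned project values.

```python
# Fit an ARIMA model to a sales series and forecast the next 12 periods.
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

sales = pd.read_csv("sales.csv", parse_dates=["date"], index_col="date")["units"]

fit = ARIMA(sales, order=(1, 1, 1)).fit()  # hypothetical order
print(fit.forecast(steps=12))              # point forecasts, next 12 periods
```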
Senior Data Analyst
Confidential
Responsibilities:
- Created SQL and SWOT analysis reports and dashboards to provide visibility into products in the supply chain pipeline at major trans-load sites and in transit to delivery centers, improving delivery efficiency by 7%
- Performed data analysis and reporting and supported strategic initiatives for Regional Distribution Center operations, working cross-functionally with Yard Operations, Finance, and Quality to collect, measure, visualize, and interpret large data sets
- Collected data from disparate sources such as MS SQL Server, Excel, and flat files; integrated, analyzed, and interpreted the data; and presented findings as reports and briefings for senior management
- Utilized, created, and maintained SQL scripts used for data exchange and validation
- Investigated ETL job and process failures by checking underlying queries and log files
- Developed database structures, SQL queries, and ETL scripts to extract, transform, and load data from the departmental transaction processing systems into the data warehouse.
Associate - Marketing Analytics
Confidential
Responsibilities:
- Collected and analyzed data on customer demographics, preferences, needs, and buying habits to identify potential markets and factors affecting product demand.
- Helped strategize sampling and data collection for research data analysis
- Maintained, updated, and cleaned datasets generated from various sources to ensure data integrity for downstream analysis
- Explored candidate analytical methods using intuitive, interactive data visualization techniques
- Conducted statistical analyses, including correlation tests and univariate and multivariate modeling, on cross-sectional and longitudinal data to serve the aims of each study (see the correlation sketch after this list)
- Provided clear interpretation of results to meeting participants to facilitate collaboration on current and future projects
- Communicated daily with the project manager and weekly with the principal investigator to provide updates, develop methods, and prevent problems
- Prepared reports of findings, illustrating data graphically and translating complex results into written text.
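A minimal sketch of the correlation testing mentioned above, assuming SciPy on a tabular extract; the file and column names are hypothetical.

```python
# Pearson correlation test between two hypothetical survey variables.
import pandas as pd
from scipy import stats

df = pd.read_csv("survey.csv")  # hypothetical survey extract
r, p = stats.pearsonr(df["ad_spend"], df["purchases"])
print(f"Pearson r = {r:.3f}, p-value = {p:.4f}")
```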
Data Analyst
Confidential
Responsibilities:
- Evaluated outcomes experienced by children and adults with severe mental illness (SMI), including the incidence of restrictive behavioral health treatment, involuntary examinations, and criminal justice encounters occurring before and after plan enrollment
- Served as a technical consultant to the Agency, developing reports on patient demographics, disease, and related MMA data
- Developed SAS code to analyze data from multiple sources, such as Baker Act data and healthcare claims data
- Identified and tracked high-risk Medicaid recipients by analyzing patient-, prescriber-, plan-, and provider-level data
- Processed monthly MMA encounter data in SAS, including institutional, professional, and dental claims files
- Extracted data from flat files using PROC IMPORT and merged the datasets as required using PROC SQL (see the sketch after this list)
- Developed SAS macros for weekly, monthly, and quarterly reports
- Created RTF- and PDF-formatted reports using SAS ODS for presentation to the Agency for Health Care Administration
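A rough pandas analogue of the flat-file import and merge step above (PROC IMPORT to read flat files, plus a PROC SQL-style join); the file names, delimiter, and join key are hypothetical placeholders.

```python
# Pandas analogue of PROC IMPORT (read flat files) + PROC SQL (inner join).
import pandas as pd

claims = pd.read_csv("claims.txt", sep="|")          # hypothetical flat file
recipients = pd.read_csv("recipients.txt", sep="|")  # hypothetical flat file

# Merge the datasets on a hypothetical recipient identifier.
merged = claims.merge(recipients, on="recipient_id", how="inner")
```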