Data Scientist Resume
New York, NY
SUMMARY:
- Over 6 years of experience in Database Management, Statistical Modeling, Machine Learning, Time Series Forecasting, Risk Management, Data Mining, Data Visualization, Sampling Optimization, Social Network Development, Data Management, and Reporting.
- Experienced in the entire data project life cycle, including Data Acquisition, Data Cleansing, Data Manipulation, Visualization, Feature Engineering, Modeling, Testing, and Optimization.
- Queried relevant data from Microsoft SQL Server using complex Structured Query Language (SQL) and manipulated large structured and unstructured datasets to build insightful solutions to complex problems, presenting the results in visually engaging and intuitive reports and dashboards.
- Excellent understanding of designing and developing T-SQL queries, ETL packages, and business reports using SQL Server Management Studio (SSMS), BI Suite (SSIS/SSRS), and Tableau.
- Proficient in Machine Learning algorithms and Predictive Modeling, including Linear Regression, Logistic Regression, Naïve Bayes, Decision Tree, Random Forest, Gradient Boosting, SVM, KNN, K-means clustering, and Artificial Neural Networks.
- Expertise in developing time series forecasting models such as Autoregressive Integrated Moving Average model, Exponential Smoothing model, Seasonal Exponential Smoothing model, and Holt-Winters model in Python and SAS.
- Knowledge of statistical methodologies such as Hypothesis Testing, Principal Component Analysis (PCA), ANOVA, Chi-Square, and Cross Validation.
- Strong skills in sampling methodologies such as the Synthetic Minority Oversampling Technique (SMOTE) to address class imbalance through oversampling or undersampling.
- Familiar with the Hadoop ecosystem and Apache Spark framework, including HDFS, MapReduce, Pig Latin, HiveQL, Spark SQL, and PySpark.
- Knowledge of and experience with cloud services on Amazon Web Services (AWS) and Microsoft Azure, including EC2, EMR, RDS, and S3 as well as Azure HDInsight, Machine Learning Studio, and Azure Data Lake, to support big data tools, solve storage issues, and work on deployment solutions.
- Excellent understanding of the Software Development Life Cycle (SDLC) in an Agile environment.
- Creative in finding solutions to problems and determining modifications for optimal use of organizational data; expert at providing realistic projections and establishing scenarios to determine viable process strategies.
TECHNICAL SKILLS:
Databases: Microsoft SQL Server, MySQL, IBM Netezza, and MongoDB 3.x.
Languages: Python 2.x/3.x, SAS, R, SQL, and T-SQL.
BI Tools: Tableau 9.x/10.x, Microsoft Suite (SSIS/SSRS), SAS Enterprise Miner, and Power BI.
Cloud Services: Amazon Web Services (AWS), and Microsoft Azure.
Machine Learning Algorithms: Linear Regression, Naïve Bayes, Logistic Regression, K-nearest Neighbors (KNN), K-means Clustering, Decision Tree, Random Forest, AdaBoost, Gradient Boosting, Support Vector Machine (SVM), Neural Network, Autoregressive Integrated Moving Average model (ARIMA), Seasonal Exponential Smoothing model, and Holt-Winters model.
Hadoop Ecosystem: Hadoop 2 and HDFS.
Operating Systems: Windows 7/8/10, Linux, and Mac.
Project Management Tools: JIRA, GitHub, Slack, and Google Shared Services.
PROFESSIONAL EXPERIENCE:
Confidential, New York, NY
Data Scientist
Responsibilities:
- Participated in all phases of the project life cycle, including data collection, data mining, data cleaning, model development, validation, and report creation.
- Implemented business intelligence dashboards in Tableau that produce different summary results based on requirements and user roles.
- Utilized MapReduce and PySpark programs to process data for analysis reports.
- Worked on data cleaning and ensured data quality, consistency, and integrity using Pandas and NumPy.
- Performed data preprocessing on messy data, including imputation, normalization, scaling, and feature engineering, using scikit-learn.
- Conducted exploratory data analysis using Python Matplotlib and Seaborn to identify underlying patterns and correlations between features.
- Built classification models based on Logistic Regression, Decision Trees, Random Forest, Support Vector Machine, and Ensemble algorithms to predict the probability of patient absence.
- Performed complex pattern recognition on automotive time series data and forecast demand using ARMA, ARIMA, and exponential smoothing models on multivariate time series data.
- Used metrics such as F-score, ROC, and AUC to evaluate the performance of each model, and K-fold cross-validation to test the models on different folds of the data and optimize them.
- Implemented and tested the model on AWS EC2 and collaborated with the development team to select the best algorithm and parameters.
- Performed data visualization, designed dashboards with Tableau, and generated complex reports including charts, summaries, and graphs to interpret the findings for the team and stakeholders.
Environment: Microsoft SQL Server 2012, SQL Server Management Studio, T-SQL, Spark (PySpark, MLlib, Spark SQL), MapReduce, Python, JIRA, AWS, and Tableau.
Confidential, Hartford, CT
Data Scientist
Responsibilities:
- Gathered, analyzed, documented, and translated application requirements into data models; supported standardization of documentation and the adoption of standards and practices related to data and applications.
- Collaborated with data engineers and the operations team to implement ETL processes; wrote and optimized SQL queries for data extraction to fit the analytical requirements.
- Developed Spark Python modules for machine learning and predictive analytics, and implemented Python-based distributed algorithms via PySpark.
- Used different feature engineering methods in Python to cleanse high-dimensional datasets and prepare them for data modeling.
- Developed and implemented predictive models using machine learning algorithms such as linear regression, classification, multivariate regression, Naive Bayes, Random Forests, K-means clustering, KNN, PCA, and regularization for data analysis.
- Deployed machine learning models on cloud services including AWS Lambda and EC2.
- Generated reports on predictive analytics using Python and Tableau, including visualizations of model performance and prediction results.
- Delivered and communicated research results, recommendations, and opportunities to the managerial and executive teams, and implemented the techniques for priority projects.
Environment: Microsoft SQL Server 2012, SQL Server Management Studio, T-SQL, Spark (PySpark, MLlib, Spark SQL), Visual Studio, Python, JIRA, AWS, and Tableau.
Confidential, Philadelphia, PA
Data Scientist
Responsibilities:
- Designed SSIS packages to extract, transform, and load (ETL) data across different platforms, validate the data, and archive data from the database; also implemented error handling to aid future debugging.
- Explored and visualized the data to check patterns, distributions, descriptive statistics, and correlations using Python Matplotlib and Seaborn in Jupyter Notebook.
- Assisted in business modeling, gathered user and project requirements from different stakeholders, and converted them into the documentation required for the project.
- Worked with Amazon Web Services (AWS) EC2-based cloud-hosted architecture to provide solutions for clients and loaded data files from cloud servers in the AWS environment.
- Performed feature selection through exploratory data analysis, created a correlation matrix, and used PCA to reduce dimensionality.
- Performed regression analysis, logistic regression, discriminant analysis, and cluster analysis using Python.
- Provided statistical research analysis and data modeling support in Python for a billboard pricing system.
- Evaluated parameters with K-Fold Cross Validation and optimized performance of models.
- Collaborated with business leaders to analyze problems, optimize processes, and build presentation dashboards.
Environment: Microsoft SQL Server 2008, SQL Server Management Studio, MS BI Suite (SSIS, SSRS), T-SQL, Visual Studio, Amazon Web Services, and Python.
Confidential, Oak Ridge, NJ
Data Analyst
Responsibilities:
- Optimized query performance by modifying SQL queries, eliminating redundant and inconsistent data, and normalizing tables.
- Maintained and developed complex SQL queries, stored procedures, views, functions and reports that meet customer requirements.
- Used SSIS to create ETL packages to Validate, Extract, Transform and Load data into Data Warehouse and Data Mart.
- Used SAS/SQL to pull data out from databases and aggregate to provide detailed reporting based on the user requirements.
- Used SAS for pre-processing data, SQL queries, data analysis, generating reports, graphics, and statistical analyses.
- Performed analyses such as regression analysis, logistic regression, and cluster analysis using SAS programming.
- Published workbooks and data source extracts to Tableau Server, implemented row-level security, and scheduled automatic extract refreshes.
- Collaborated with Business Analysts across departments to gather business requirements and identify workable items for further development.
Environment: Microsoft SQL Server 2008, SQL Server Management Studio, MS BI Suite (SSIS, SSRS), T-SQL, Visual Studio, SAS.
Confidential, New York, NY
BI Developer
Responsibilities:
- Assessed, captured, and translated complex business problems and requirements into structured analytics use cases, including rapid learning of industry, domain, and client dynamics and development of effective workstream plans.
- Used SSIS to create ETL packages to Validate, Extract, Transform, and Load data into Data Warehouse and Data Mart.
- Optimized query performance by modifying T-SQL queries, removing unnecessary columns and redundant data, normalizing tables, establishing joins, and creating indexes.
- Maintained and developed complex SQL queries, stored procedures, triggers, user-defined functions (UDFs), and clustered and non-clustered indexes that meet user requirements.
- Created SSIS packages using Pivot Transformation, Fuzzy Lookup, Derived Column, Conditional Split, Aggregate, Execute SQL Task, Data Flow Task, and Execute Package Task.
- Developed parameterized dynamic performance reports and maintained existing reports using Microsoft SQL Server Reporting Services (SSRS).
- Implemented data refreshes on SQL Server for weekly and monthly reports based on business changes to ensure the views and dashboards displayed the updated data accurately.
Environment: Microsoft SQL Server 2012, SQL Server Management Studio, MS BI Suite (SSIS, SSRS), T-SQL, Visual Studio.