Data Scientist Resume

Atlanta, GA

SUMMARY

  • Around 5 years of IT experience as a Data Scientist, combining strong technical expertise, business experience, and communication skills to drive high-impact business outcomes through data-driven innovations and decisions.
  • Experience with Statistical Analysis, Data Mining, and Machine Learning using R, Python, and SQL.
  • Data-driven and highly analytical, with working knowledge of statistical modeling approaches and methodologies (Clustering, Regression analysis, Hypothesis testing, Decision trees, Machine learning), business rules, and the ever-evolving regulatory environment.
  • Strong practical understanding of statistical modeling and supervised, unsupervised, and reinforcement machine learning techniques, with a keen interest in applying these techniques to predictive analytics.
  • Familiar with the entire Data Science project life cycle, including Data Acquisition, Data Cleansing, Data Manipulation, Feature Engineering, Modeling, Evaluation, Optimization, Testing, and Deployment.
  • Experience in problem solving, data science, machine learning, statistical inference, descriptive and prescriptive analytics, graph analysis, natural language processing, and computational linguistics, with extensive experience in predictive analytics and recommendation.
  • Hands-on experience with clustering algorithms such as K-means and K-medoids, and with predictive and descriptive algorithms.
  • Expertise in Model Development, Data Mining, Predictive Modeling, Descriptive Modeling, Data Visualization, Data Cleaning and Management, and Database Management.
  • Experience using machine learning models such as random forest, KNN, SVM, and logistic regression, with packages and procedures such as ggplot, dplyr, lm, rpart, randomForest, nnet, PROC (pca, dtree, corr, princomp, gplot, logistic, cluster), NumPy, scikit-learn, and pandas in R, SAS, and Python.
  • Experience with Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis, and good knowledge of Recommender Systems.
  • Proficient in Statistical Modeling and Machine Learning techniques (Linear and Logistic Regression, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, Bayesian, XGBoost) for Forecasting/Predictive Analytics, Segmentation methodologies, Regression-based models, Hypothesis testing, Factor analysis/PCA, and Ensembles.
  • Extracted data from various database sources such as Oracle, SQL Server, DB2, and Teradata.
  • Regularly used JIRA and other internal issue trackers for project development.
  • Skilled in System Analysis, E-R/Dimensional Data Modeling, Database Design, and implementing RDBMS-specific features.
  • Expertise in all aspects of the Software Development Lifecycle (SDLC), from requirement analysis through design, development/coding, testing, implementation, and maintenance.

TECHNICAL SKILLS

Languages: C, C++, Python, R, SAS, Java-SQL, PL/SQL, SQL, MATLAB, DAX

Databases: SQL Server, MS-Access, Oracle 11g/10g/9i, Teradata, Hadoop (big data)

DWH / BI Tools: Microsoft Power BI, Tableau, SSIS, SSRS, SSAS, Business Intelligence Development Studio

Big Data technologies: Hadoop, Hive, HDFS, MapReduce, Pig, Kafka.

Tools and Utilities: SQL Server Management Studio, SQL Server Enterprise Manager, SQL Server Profiler, Import & Export Wizard, Visual Studio .Net, Microsoft Management Console, Visual Source Safe 6.0, DTS, Crystal Reports, ProClarity, Microsoft Office, Excel Power Pivot

PROFESSIONAL EXPERIENCE

Confidential, Atlanta, GA

Data Scientist

Responsibilities:

  • Performed KVC/KVI detection using a model that identified whether a category or item was a demand or traffic driver, basket driver, convenience driver, destination driver, or margin enhancer. Programmed in R.
  • Drove and supported marketing, merchandising and supply chain data analytics and machine learning initiatives under high visibility. Programmed in Python.
  • Used PCA, together with feature engineering, feature normalization, and label encoding from Scikit-learn preprocessing, to reduce high-dimensional data (>150 features).
  • Experimented with predictive models including Logistic Regression, Support Vector Machine (SVM), and Random Forest from Scikit-learn, XGBoost, LightGBM, and neural networks in Keras to predict showing probability and visit counts.
  • Worked with the Strategy head to provide actionable, aggregated results on store-level performance aimed at improving the organization's overall revenue. Programmed in Python.
  • Worked on pricing analytics, promotion forecasting, media mix, customer analytics and segmentation, automation, and dashboarding. Programmed in R and Tableau.
  • Implemented a hypothesis testing kit for sparse sample data by writing R packages.
  • Collected feedback after deployment and retrained the model to improve performance.
  • Designed, developed and maintained daily and monthly summary, trending and benchmark reports in Tableau Desktop.

Technology Stack: SQL Server 2012/2014, AWS EC2, AWS Lambda, AWS S3, AWS EMR, Linux, Python 3.x (Scikit-Learn, NumPy, Pandas, Matplotlib), R, Machine Learning algorithms, Tableau.
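The label encoding and feature normalization steps listed above can be sketched in plain Python. This is a minimal illustration using only the standard library, not the Scikit-learn preprocessing calls used on the project; the helper names (label_encode, standardize) and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def label_encode(values):
    """Map each distinct category to an integer, using sorted label order."""
    classes = sorted(set(values))
    index = {label: i for i, label in enumerate(classes)}
    return [index[v] for v in values], classes


def standardize(column):
    """Rescale a numeric column to zero mean and unit (population) variance."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


# Hypothetical store-level sample, invented for illustration only.
store_types = ["urban", "rural", "urban", "suburban"]
weekly_sales = [120.0, 80.0, 100.0, 100.0]

encoded, classes = label_encode(store_types)
scaled = standardize(weekly_sales)
```

After scaling, features measured in different units contribute comparably to variance-based methods such as PCA, which is why normalization usually comes before dimensionality reduction.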

Confidential, Plano, TX

Data Scientist

Responsibilities:

  • Analyzed the data using Python and Spark; performed feature engineering to clean the data for further analysis.
  • Performed exploratory analysis on complex data sets using SQL.
  • Performed correlation analysis to narrow down and choose the best attributes for building the machine learning model.
  • Assisted in building a predictive model in Python using classification techniques such as decision trees to help identify which applicants would sign their loan application, improving sales by 31%.
  • Created customized reports in Tableau for data visualization.
  • Worked closely with stakeholders and subject matter experts to elicit and gather business requirements.
  • Worked directly with the Dev team to ensure the product backlog was understood at the level needed and to keep the sprint on track.
  • Facilitated Agile team ceremonies including Daily Standup, Backlog Grooming, Sprint Review, Sprint Planning etc.
  • Used Pandas, NumPy, seaborn, SciPy, and Matplotlib in Python to develop machine learning models, and applied algorithms such as linear regression and multivariate regression for data analysis.
  • Assisted the Data Scientist team in gathering requirements, developing the process model and detailed business policies, and modifying the business requirement document.
  • Credibly challenged remediation data analytics to ensure reasonable, complete, accurate, and consistent requirements, strategies, data logic, and implementation.
  • Created business intelligence reports using Tableau as needed.
  • Translated detailed logical flow charts into object-oriented Python code; handled bug fixing, feature additions, and code optimization.
  • Created a database in PostgreSQL and wrote several queries to extract data from it.
  • Wrote scripts in Python for extracting data from HTML files.
  • Tracked Velocity, Capacity, Burn Down Charts, and other metrics during iterations.
  • Automated, in Python, a process to extract data and various document types from a website, save the documents to a specified file path, and upload them into an Excel template.

Environment: SQL Server, Oracle, MS-Office, Teradata 14.1, Informatica, ER Studio, XML, JSON, Business Objects, Hadoop (HDFS), MapReduce, Pig, Hive, Spark, Mahout, Python, RStudio, Java, AWS.
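As an illustration of the HTML extraction scripts mentioned above, the following minimal sketch pulls table cells out of an HTML document with Python's standard html.parser module. The class name, the helper function, and the sample markup are hypothetical; the original scripts and the libraries they used are not described in this resume.

```python
from html.parser import HTMLParser


class TableTextParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (for example whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html_text):
    """Return the table contents of html_text as a list of rows of strings."""
    parser = TableTextParser()
    parser.feed(html_text)
    parser.close()
    return parser.rows


# Hypothetical sample document, invented for illustration only.
sample_html = """
<table>
  <tr><th>Applicant</th><th>Status</th></tr>
  <tr><td>A-001</td><td>Signed</td></tr>
  <tr><td>A-002</td><td>Pending</td></tr>
</table>
"""
table_rows = extract_rows(sample_html)
```

The resulting list of rows can be written out with the csv module or loaded into a Pandas DataFrame for further analysis.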

Confidential, Pennington, NJ

Data Scientist

Responsibilities:

  • Worked directly with Risk Management in LMA to understand standards.
  • Extensively involved in all phases of data acquisition, data collection, data cleaning, model development, model validation, and visualization to deliver data science solutions.
  • Built machine learning models to identify fraudulent applications for loan pre-approvals and to identify fraudulent credit card transactions using the history of customer transactions with supervised learning methods.
  • Extracted data from the database, copied it into HDFS, and used Hadoop tools such as Hive and Pig Latin to retrieve the data required for building models.
  • Worked on data cleaning and ensured data quality, consistency, and integrity using Pandas and NumPy.
  • Tackled a highly imbalanced fraud dataset using sampling techniques such as down-sampling, up-sampling, and SMOTE (Synthetic Minority Over-Sampling Technique) using Python Scikit-learn.
  • Used PCA and other feature engineering techniques to reduce high-dimensional data, and applied feature normalization and label encoding with the Scikit-learn library in Python.
  • Used Pandas, NumPy, Seaborn, Matplotlib, and Scikit-learn in Python to develop machine learning models such as Logistic Regression, Gradient Boosting Decision Tree, and Neural Network.
  • Used cross-validation to test the models with different batches of data to optimize the models and prevent overfitting.
  • Experimented with Ensemble methods to increase the accuracy of the training model with different Bagging and Boosting methods.
  • Implemented a Python-based distributed random forest via PySpark and MLlib.
  • Used AWS S3, DynamoDB, AWS Lambda, and AWS EC2 for data storage and model deployment.
  • Created and maintained Tableau reports to display the status and performance of deployed models and algorithms.

Environment: Python 2.x, CDH5, ML, HDFS, Hadoop 2.3, Hive, Impala, Linux, Spark, Tableau Desktop, SQL Server 2012, Microsoft Excel, MATLAB, Spark SQL, PySpark.
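The up-sampling approach listed above for imbalanced fraud data can be sketched as follows. This is a plain-Python illustration of random up-sampling only (not SMOTE, which synthesizes new points rather than duplicating existing ones); the function name and the toy data are hypothetical.

```python
import random
from collections import Counter


def upsample_minority(rows, labels, seed=0):
    """Randomly duplicate rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Draw with replacement from the class's own rows to fill the gap.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical toy data: label 1 = fraud, label 0 = legitimate.
rows = [[10.0], [12.5], [11.0], [9.5], [300.0], [14.0], [13.0], [8.0]]
labels = [0, 0, 0, 0, 1, 0, 0, 0]

balanced_rows, balanced_labels = upsample_minority(rows, labels)
counts = Counter(balanced_labels)
```

Resampling is normally applied to the training split only, so that duplicated fraud cases do not leak into the validation or test data and inflate the measured performance.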

Confidential

Data Scientist

Responsibilities:

  • Applied expertise in quantitative analysis and data mining techniques in Python/SQL to present data in a way that looks beyond the numbers and shows how users interact with core products.
  • Collaborated with data engineers and the operations team to implement the ETL process; wrote and optimized SQL queries to perform data extraction that fit the analytical requirements.
  • Worked on data cleaning and ensured data quality, consistency, integrity using Pandas, NumPy.
  • Explored and analyzed the customer specific features by using Matplotlib, Seaborn in Python and dashboards in Tableau.
  • Performed data imputation using the Scikit-learn package in Python.
  • Participated in feature engineering such as feature generation, PCA, feature normalization, and label encoding with Scikit-learn preprocessing.
  • Used Python 2.x/3.x (NumPy, SciPy, Pandas, Scikit-learn, seaborn) to develop a variety of models and algorithms for analytic purposes.
  • Experimented with and built predictive models, including ensemble methods such as gradient boosting trees and neural networks in Keras, to predict sales amount.
  • Analyzed patterns in customers' shopping habits across locations, categories, and months using time series modeling techniques.
  • Used RMSE/MSE to evaluate different models' performance.
  • Designed rich data visualizations to model data into human-readable form with Tableau and Matplotlib.
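The RMSE/MSE comparison mentioned above can be illustrated with a short standard-library sketch. The sales figures and model names are invented for illustration and are not results from the project.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error between two equal-length, non-empty sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, expressed in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))


# Hypothetical sales amounts and predictions from two candidate models.
actual = [100.0, 150.0, 200.0]
model_a = [110.0, 140.0, 190.0]
model_b = [100.0, 150.0, 230.0]

scores = {"model_a": rmse(actual, model_a), "model_b": rmse(actual, model_b)}
best = min(scores, key=scores.get)  # lower RMSE means a better fit
```

Because the errors are squared before averaging, a single large miss (as in model_b) is penalized more heavily than several small ones.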

Confidential

Data Scientist

Responsibilities:

  • Captured business requirements after discussing with clients which functionality they wanted to incorporate in the product.
  • Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
  • Responsible for defining the key identifiers for each mapping/interface.
  • Documented the complete process flow to describe program development, logic, coding, testing, implementation, and application integration.
  • Involved in defining the business/transformation rules applied for sales and service data.
  • Worked with internal architects, assisting in the development of current and target state data architectures.
  • Implemented a metadata repository and maintained data quality, data cleanup procedures, transformations, data standards, the data governance program, scripts, stored procedures, triggers, and execution of test plans.
  • Performed data quality checks in Talend Open Studio.
  • Documented data quality and traceability for each source interface.
  • Established standards of procedure.
  • Generated weekly and monthly asset inventory reports.
