
Data Scientist Resume

NY

SUMMARY

  • Around 7 years of experience in IT as a Data Scientist, with strong technical expertise, business experience, and communication skills to drive high-impact business outcomes through data-driven innovations and decisions.
  • Experience with Statistical Analysis, Data Mining and Machine Learning Skills using R, Python and SQL.
  • Data-driven and highly analytical, with working knowledge of statistical modeling approaches and methodologies (clustering, regression analysis, hypothesis testing, decision trees, machine learning), business rules, and an ever-evolving regulatory environment.
  • Strong practical understanding of statistical modeling and supervised/ unsupervised/ reinforcement machine learning techniques with keen interests in applying these techniques to predictive analytics.
  • Good familiarity with the entire Data Science project life cycle, including Data Acquisition, Data Cleansing, Data Manipulation, Feature Engineering, Modeling, Evaluation, Optimization, Testing, and Deployment.
  • Experience in problem solving, data science, machine learning, statistical inference, predictive analytics, descriptive analytics, prescriptive analytics, graph analysis, natural language processing, and computational linguistics, with extensive experience in predictive analytics and recommendation systems.
  • Hands-on experience with clustering algorithms like K-Means and K-Medoids, as well as predictive and descriptive algorithms.
  • Expertise in Model Development, Data Mining, Predictive Modeling, Descriptive Modeling, Data Visualization, Data Cleansing and Management, and Database Management.
  • Experience using machine learning models such as random forest, KNN, SVM, and logistic regression, with packages such as ggplot2, dplyr, lm, rpart, randomForest, and nnet in R; SAS PROCs (PCA, DTREE, CORR, PRINCOMP, GPLOT, LOGISTIC, CLUSTER); and NumPy, scikit-learn, and pandas in Python.
  • Hands-on experience with Logistic Regression, SVM, clustering, neural networks, and Principal Component Analysis, and good knowledge of Recommender Systems.
  • Proficient in Statistical Modeling and Machine Learning techniques (Linear, Logistic, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, Bayesian, XGBoost) in Forecasting/Predictive Analytics, Segmentation methodologies, Regression-based models, Hypothesis testing, Factor analysis/PCA, and Ensembles (a representative workflow is sketched after this list).
  • Worked and extracted data from various database sources like Oracle, SQL Server, DB2, and Teradata.
  • Well experienced in Normalization & De-Normalization techniques for optimum performance in relational and dimensional database environments.
  • Regularly used JIRA and other internal issue trackers during project development.
  • Skilled in System Analysis, E-R/Dimensional Data Modeling, Database Design and implementing RDBMS specific features.
  • Expertise in all aspects of the Software Development Life Cycle (SDLC), from requirement analysis and design through development/coding, testing, implementation, and maintenance.
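
For illustration, a minimal Python sketch of the kind of supervised-learning workflow these bullets describe, using scikit-learn; the synthetic dataset and parameters are assumptions for demonstration, not client data:

    # Assumed illustrative workflow: preprocess, fit, evaluate.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    # Synthetic stand-in for a real project dataset.
    X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    # Scale features, then fit a logistic regression classifier.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))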

TECHNICAL SKILLS

Languages: C, C++, Python, R, SAS, Java, SQL, PL/SQL, MATLAB, DAX

Databases: SQL Server, MS Access, Oracle 11g/10g/9i, Teradata, Hadoop (big data)

DWH / BI Tools: Microsoft Power BI, Tableau, SSIS, SSRS, SSAS, Business Intelligence Development Studio (BIDS), Visual Studio, Crystal Reports.

Big Data technologies: Hadoop, Hive, HDFS, MapReduce, Pig, Kafka.

Tools and Utilities: SQL Server Management Studio, SQL Server Enterprise Manager, SQL Server Profiler, Import & Export Wizard, Visual Studio .NET, Microsoft Management Console, Visual SourceSafe 6.0, DTS, Crystal Reports, Power Pivot, ProClarity, Microsoft Office, Excel

Machine Learning: Linear Regression, Logistic Regression, Gradient Boosting, Random Forests, Maximum Likelihood Estimation, Clustering, Classification & Association Rules, K-Nearest Neighbors (KNN), K-Means Clustering, Decision Trees (CART & CHAID), Neural Networks, Principal Component Analysis, Weight of Evidence (WOE) and Information Value (IV), Factor Analysis, Sampling Design, Time Series Analysis (ARIMA, ARMA, GARCH), Market Basket Analysis, Text Mining

Cloud Technologies: Amazon Web Services (EC2, EBS, S3, VPC, RDS, SES)

Methodologies: Agile, Scrum

PROFESSIONAL EXPERIENCE

Confidential, NY

Data Scientist

Responsibilities:

  • KVC/KVI detection using a model that identified whether a category or item was a demand or traffic driver, basket driver, convenience driver, destination driver, or margin enhancer. Programmed in R.
  • Drove and supported marketing, merchandising and supply chain data analytics and machine learning initiatives under high visibility. Programmed in Python.
  • Used PCA and other feature engineering, feature normalization, and label encoding techniques from Scikit-learn preprocessing to reduce the high-dimensional data (>150 features); see the PCA sketch after this list.
  • Experimented with predictive models including Logistic Regression, Support Vector Machines (SVM), and Random Forests from Scikit-learn, along with XGBoost, LightGBM, and Keras neural networks, to predict showing probability and visit counts.
  • Worked with the Strategy head to provide actionable, aggregated results on store-level performance, helping improve the organization's overall revenue performance. Programmed in Python.
  • Delivered pricing analytics, promotion forecasting and media mix, customer analytics and segmentation, automation, and dashboarding. Programmed in R and Tableau.
  • Implemented a hypothesis-testing kit for sparse sample data by writing R packages.
  • Collected feedback after deployment and retrained the model to improve its performance.
  • Designed, developed, and maintained daily and monthly summary, trending, and benchmark reports in Tableau Desktop.

Technology Stack: SQL Server 2012/2014, AWS EC2, AWS Lambda, AWS S3, AWS EMR, Linux, Python 3.x (Scikit-Learn, NumPy, Pandas, Matplotlib), R, machine learning algorithms, Tableau.
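
A hedged sketch of the Scikit-learn preprocessing and PCA step referenced in this role; the input file, column names, and variance threshold are illustrative assumptions, not the production code:

    import pandas as pd
    from sklearn.preprocessing import LabelEncoder, StandardScaler
    from sklearn.decomposition import PCA

    df = pd.read_csv("store_features.csv")                         # hypothetical input file
    df["category"] = LabelEncoder().fit_transform(df["category"])  # hypothetical column

    # Normalize, then keep enough components to explain 95% of the variance.
    X = StandardScaler().fit_transform(df.drop(columns=["target"]))  # hypothetical target column
    X_reduced = PCA(n_components=0.95).fit_transform(X)
    print(X_reduced.shape)  # >150 features reduced to far fewer components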

Environment: Python, SQL, Oracle 12c, PL/SQL, T-SQL, Tableau, MLlib, regression, cluster analysis, Scala NLP, Spark, Kafka, MongoDB, logistic regression, Hadoop, Hive, Teradata, random forest, OLAP, MariaDB, SAP CRM, HDFS, ODS, NLTK, SVM, JSON, XML, Cassandra, MapReduce, AWS.

Confidential - Houston, TX

Data Scientist

Responsibilities:

  • Analyzed the data using Python and Spark; performed feature engineering to clean the data for further analysis.
  • Implemented exploratory analysis on complex datasets using SQL.
  • Performed correlation analysis to narrow down and choose the best attributes for building the machine learning model.
  • Assisted in building a predictive model in Python using classification techniques such as decision trees to help identify which applicants would sign their loan application, improving sales by 31%.
  • Created customized reports in Tableau for data visualization.
  • Worked closely with stakeholders and subject matter experts to elicit and gather business requirements.
  • Worked directly with the Dev team to ensure the product backlog was understood to the level needed and to keep sprints on track.
  • Facilitated Agile team ceremonies, including Daily Standup, Backlog Grooming, Sprint Review, and Sprint Planning.
  • Used Pandas, NumPy, Seaborn, SciPy, and Matplotlib in Python to develop various machine learning algorithms, and applied algorithms such as linear regression and multivariate regression for data analysis.
  • Assisted the Data Scientist team in gathering requirements, developing the process model and detailed business policies, and modifying the business requirement document.
  • Credibly challenged remediation data analytics to ensure reasonable, complete, accurate, and consistent requirements, strategies, data logic, and implementation.
  • Created business intelligence reports using Tableau as needed.
  • Translated detailed logical flowcharts into object-oriented Python code; handled bug fixes, feature additions, and code optimizations.
  • Created a database using PostgreSQL and wrote several queries to extract data from it.
  • Wrote scripts in Python for extracting data from HTML files.
  • Tracked Velocity, Capacity, Burn Down Charts, and other metrics during iterations.
  • Using Python, automated a process to extract data and various document types from a website, save the documents to a specified file path, and upload the documents into an Excel template (see the sketch below).
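
A minimal sketch of that website-to-Excel automation, assuming the requests, BeautifulSoup, and pandas/openpyxl libraries; the URL, link selector, and file names are placeholders, not the actual site:

    import os
    import requests
    import pandas as pd
    from bs4 import BeautifulSoup

    BASE_URL = "https://example.com/documents"   # placeholder site
    OUT_DIR = "downloads"
    os.makedirs(OUT_DIR, exist_ok=True)

    # Find document links on the page and save each file locally.
    soup = BeautifulSoup(requests.get(BASE_URL, timeout=30).text, "html.parser")
    rows = []
    for link in soup.select("a[href$='.pdf']"):  # assumed PDF documents
        url = requests.compat.urljoin(BASE_URL, link["href"])
        path = os.path.join(OUT_DIR, os.path.basename(url))
        with open(path, "wb") as f:
            f.write(requests.get(url, timeout=30).content)
        rows.append({"document": link.text.strip(), "saved_to": path})

    # Stand-in for the Excel template: write an index sheet.
    pd.DataFrame(rows).to_excel("document_index.xlsx", index=False)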

Environment: SQL Server, Oracle, MS Office, Teradata 14.1, Informatica, ER Studio, XML, Business Objects, HDFS, JSON, Hadoop (HDFS), Python, MapReduce, Pig, Spark, RStudio, Mahout, Java, Hive, AWS.

Confidential - Englewood Cliffs, NJ

Data Scientist

Responsibilities:

  • Extensively involved in all phases of data acquisition, data collection, data cleaning, model development, model validation, and visualization to deliver data science solutions.
  • Built machine learning models to identify fraudulent applications for loan pre-approvals and to identify fraudulent credit card transactions, using the history of customer transactions with supervised learning methods.
  • Extracted data from the database, copied it into the HDFS file system, and used Hadoop tools such as Hive and Pig Latin to retrieve the data required for building models.
  • Worked on data cleaning and ensured data quality, consistency, integrity using Pandas, NumPy.
  • Tackled the highly imbalanced fraud dataset using sampling techniques like down-sampling, up-sampling, and SMOTE (Synthetic Minority Over-sampling Technique) with Python Scikit-learn (see the SMOTE sketch after this list).
  • Used PCA and other feature engineering techniques to reduce the high-dimensional data, plus feature normalization and label encoding, with the Scikit-learn library in Python.
  • Used Pandas, NumPy, Seaborn, Matplotlib, and Scikit-learn in Python to develop various machine learning models such as Logistic Regression, Gradient Boosted Decision Trees, and Neural Networks.
  • Used cross-validation to test the models with different batches of data, optimize the models, and prevent overfitting.
  • Experimented with ensemble methods, trying different bagging and boosting approaches to increase the accuracy of the training model.
  • Implemented a Python-based distributed random forest via PySpark and MLlib.
  • Used AWS S3, DynamoDB, AWS Lambda, and AWS EC2 for data storage and model deployment.
  • Created and maintained Tableau reports to display the status and performance of the deployed model and algorithm.
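
A sketch of the SMOTE rebalancing step described above; SMOTE itself is commonly provided by the companion imbalanced-learn library used alongside scikit-learn, and the fraud data here is simulated rather than the real dataset:

    from collections import Counter
    from sklearn.datasets import make_classification
    from imblearn.over_sampling import SMOTE

    # Simulate a highly imbalanced binary problem (~1% positives).
    X, y = make_classification(n_samples=10000, weights=[0.99], random_state=0)
    print("Before:", Counter(y))

    # Synthesize minority-class samples until the classes are balanced.
    X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
    print("After:", Counter(y_res))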

Environment: Python 2.x, CDH5, ML, HDFS, Hadoop 2.3, Hive, Impala, Linux, Spark, Tableau Desktop, SQL Server 2012, Microsoft Excel, MATLAB, Spark SQL, PySpark.

Confidential - New York, NY

Data Scientist

Responsibilities:

  • Collaborated with data engineers and the operations team to implement the ETL process; wrote and optimized SQL queries to perform data extraction that fit the analytical requirements.
  • Worked on data cleaning and ensured data quality, consistency, and integrity using Pandas and NumPy.
  • Explored and analyzed customer-specific features using Matplotlib and Seaborn in Python and dashboards in Tableau.
  • Performed data imputation using the Scikit-learn package in Python.
  • Participated in feature engineering such as feature generation, PCA, feature normalization, and label encoding with Scikit-learn preprocessing.
  • Used Python 2.x/3.x (NumPy, SciPy, Pandas, Scikit-learn, Seaborn) to develop a variety of models and algorithms for analytic purposes.
  • Experimented with and built predictive models, including ensemble methods such as gradient boosted trees and Keras neural networks, to predict sales amounts.
  • Analyzed patterns in customers' shopping habits across different locations, categories, and months using time series modeling techniques.
  • Used RMSE/MSE to evaluate the performance of the different models (see the evaluation sketch after this list).
  • Designed rich data visualizations to render data in human-readable form with Tableau and Matplotlib.
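
A minimal sketch of the RMSE-based model comparison mentioned above; the regression data is synthetic and the two candidate models are illustrative choices, not the deployed ones:

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.model_selection import cross_val_score
    from sklearn.linear_model import LinearRegression
    from sklearn.ensemble import GradientBoostingRegressor

    # Synthetic stand-in for the sales dataset.
    X, y = make_regression(n_samples=2000, n_features=15, noise=10.0, random_state=1)

    # Cross-validated MSE per model; take the square root for RMSE.
    for model in (LinearRegression(), GradientBoostingRegressor(random_state=1)):
        mse = -cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
        print(type(model).__name__, "RMSE:", np.sqrt(mse).mean())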

Confidential

Data Scientist

Responsibilities:

  • Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
  • Responsible for defining the key identifiers for each mapping/interface.
  • Documented the complete process flow to describe program development, logic, testing, implementation, application integration, and coding.
  • Involved in defining the business/transformation rules applied to sales and service data.
  • Worked with internal architects, assisting in the development of current- and target-state data architectures.
  • Implemented a metadata repository and maintained data quality and data cleanup procedures.
  • Built transformations, data standards, a data governance program, scripts, stored procedures, and triggers, and executed test plans.
  • Performed data quality checks in Talend Open Studio.
  • Prepared data quality and traceability documents for each source interface.
  • Established standard operating procedures.
  • Generated weekly and monthly asset inventory reports (see the export sketch below).
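
BTEQ is Teradata's command-line batch utility; as a stand-in in Python (this resume's primary scripting language, the actual work used BTEQ scripts), a hedged sketch of the same export-to-report pattern using the teradatasql driver, with placeholder connection details and a hypothetical table:

    import csv
    import teradatasql

    # Placeholder credentials and table; not the actual environment.
    with teradatasql.connect(host="tdhost", user="user", password="pwd") as con:
        cur = con.cursor()
        cur.execute("SELECT asset_id, asset_name, status FROM inventory.assets")  # hypothetical table
        with open("weekly_asset_inventory.csv", "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow([d[0] for d in cur.description])  # header row from cursor metadata
            writer.writerows(cur.fetchall())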
