
Data Scientist Resume

Melville, NY

SUMMARY

  • Over 7 years of experience in data mining, predictive modeling, statistical analytics, econometric modeling and data visualization with large sets of structured and unstructured data.
  • Strong experience in Data Analysis, Data Cleaning, Data Migration, Data Conversion, Data Export and Import, Data Integration.
  • Experience in using Python libraries such as NumPy, SciPy, Pandas, Matplotlib and scikit-learn.
  • Proficient in Statistical Modeling and Machine Learning techniques (Linear Regression, Logistic Regression, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, Bayesian, XGBoost) in Forecasting/Predictive Analytics, Segmentation methodologies, Regression-based models, Hypothesis testing, Factor analysis/PCA, Ensembles.
  • Experienced in integration of various relational and non-relational sources such as Teradata, Oracle, SQL Server, NoSQL, COBOL, XML and flat files.
  • Performed extensive data profiling and analysis for detecting and correcting inaccurate data from the databases and to track data quality.
  • Experience in performance tuning and query optimization techniques in transactional and data warehouse environments.
  • Experience working with BI visualization tools (Tableau and QlikView).
  • Extensive experience in Text Analytics, developing different Statistical Machine Learning, Data Mining solutions to various business problems and generating data visualizations using R, Python and Tableau.
  • Created reports for the users using Tableau by connecting to multiple data sources like Flat Files, MS Excel, CSV files, SQL Server, and Oracle.
  • Hands-on experience in writing queries in SQL and R to extract, transform and load (ETL) data from large datasets using Data Staging.
  • Experience evaluating data sources, with a strong understanding of data warehouse/data mart design, ETL, BI, OLAP and client/server applications.
  • Experience in creating partitions, indexes, indexed views to improve the performance, reduce contention and increase the availability of data.

TECHNICAL SKILLS

Data Modelling Tools: Erwin 8.0, ER/Studio, SAP PowerDesigner, MS Visio

Programming Languages: Python, R, SQL, NoSQL, SAS

Scripting Languages: Python (NumPy, SciPy, Pandas, Matplotlib, scikit-learn and Seaborn), R (ggplot, Weka, dplyr, knitr, caret)

BI and Visualization: Tableau, Tableau Server, QlikView, SAP Business Objects, OBIEE, Crystal Reports XI, Power BI, SSRS and SPSS

Databases: MS SQL Server, Oracle Database, MySQL, Teradata and NoSQL (Hive, Oracle NoSQL)

Modeling techniques: Predictive Modeling, ANOVA, Linear Regression, Logistic Regression, Cluster Analysis

Machine Learning: Naïve Bayes, Decision Trees, Regression models, random forests, K-means clustering, Market Basket Analysis, Time-series and support vector machines

Tools: HQL, Pig, Apache Spark, Microsoft Visual Studio, SPSS, SQL Server Integration Services, SQL Server Analysis Services, SQL Server Reporting Services, Oracle Data Integrator.

ETL tools: Informatica PowerCenter, SSIS, SSAS.

PROFESSIONAL EXPERIENCE

Confidential, Melville, NY

Data Scientist

Responsibilities:

  • Involved in exploratory data analysis using descriptive statistics and data visualization to determine the baseline MLAs.
  • Involved in building and automating a robust model with very good accuracy for the given customer base.
  • Analyzed customer consumption behavior and assessed customer value with RFM analysis; applied customer segmentation with clustering algorithms such as K-Means Clustering and Hierarchical Clustering.
  • Coordinated the execution of A/B tests to measure the effectiveness of a personalized recommendation system.
  • Applied the Wilcoxon signed-rank test in R to pre-acquisition and post-acquisition stock performance data for different sectors to assess statistical significance.
  • Recommended and evaluated marketing approaches based on quality analytics of customer consumption behavior.
  • Utilized SQL and HiveQL to query and manipulate data from a variety of data sources including Oracle and HDFS, while maintaining data integrity.
  • Worked on data cleaning, data preparation and feature engineering with Python including Numpy, SciPy, Pandas, Matplotlib, Seaborn and Scikit-learn.
  • Predicted the claim severity to understand future loss and ranked importance of features.
  • Used Python and Spark to implement different machine learning algorithms including Generalized Linear Model, SVM, Random Forest, Boosting.
  • Identified internal and external information sources and built effective working relationships with subject matter experts across research groups within the firm and the external marketplace.
  • Developed logistic regression models to predict subscription response rate based on customer variables such as past transactions, response to prior mailings, promotions, demographics and interests.
  • Involved in data preparation using tasks such as:
  • Data reduction - obtaining a representation that is reduced in volume but produces the same or similar analytical results.
  • Data discretization - transforming quantitative data into qualitative data.
  • Data cleaning - filling in missing values, handling noisy data, identifying or removing outliers and resolving inconsistencies.
  • Data integration - integrating multiple databases, data cubes or files.
  • Data transformation - normalization, standardization and aggregation.
  • Designed dashboards with Tableau and D3.js and provided complex reports, including summaries, charts, and graphs to interpret findings to team and stakeholders.
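
The data preparation tasks listed above (cleaning, transformation and discretization) can be illustrated with a minimal sketch. This is not the code used on the project; the function names, the sample amounts and the bin edges are all hypothetical, and only the Python standard library is assumed.

```python
import statistics

def fill_missing(values):
    """Data cleaning: replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

def min_max_normalize(values):
    """Data transformation: rescale values to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Data transformation: z-scores using the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

def discretize(values, edges, labels):
    """Data discretization: map each number to the label of its bin.

    `edges` holds the inclusive upper bound of every bin except the last,
    so `labels` must have exactly one more entry than `edges`.
    """
    result = []
    for v in values:
        for edge, label in zip(edges, labels):
            if v <= edge:
                result.append(label)
                break
        else:
            # Larger than every edge: falls into the last bin.
            result.append(labels[-1])
    return result

# Hypothetical amounts with one missing value (illustrative data only).
amounts = [120.0, None, 80.0, 400.0, 200.0]
cleaned = fill_missing(amounts)
scaled = min_max_normalize(cleaned)
zscores = standardize(cleaned)
bands = discretize(cleaned, edges=[100.0, 250.0], labels=["low", "medium", "high"])
```

In practice the same steps are usually done with Pandas and scikit-learn on full tables; the plain-Python version is used here only to keep each step visible.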

Environment: Tableau, Oracle, Teradata, R, Python, Spark, SQL, HiveQL, Machine learning algorithms, HDFS.

Confidential, Irving, TX

Data Scientist

Responsibilities:

  • Worked closely with the Data Governance Office team in assessing the source systems for project deliverables.
  • Developed descriptive and inferential statistics for logistics optimization (average hours per job, value throughput data) at a 95% confidence level.
  • Analyzed customer behavior and developed a Churn Prediction Model and a Regression model to estimate Customer Lifetime Value (CLV) of users.
  • Performed univariate, bivariate and multivariate analysis of approx. 4890 tuples using bar charts, box plots and histograms
  • Extensively used open source tools - RStudio (R) and Spyder (Python) - for statistical analysis and building machine learning models.
  • Used packages such as dplyr, tidyr and ggplot2 in RStudio for data visualization and generated scatter plots and high-low graphs to identify relationships between different variables.
  • Used T-SQL queries to pull the data from disparate systems and Data warehouse in different environments.
  • Presented DQ analysis reports and score cards on all the validated data elements to the business teams and stakeholders.
  • Involved in defining the Source to Target data mappings, business rules, data definitions.
  • Interacted with the business teams and project managers to clearly articulate the anomalies, issues and findings during data validation.
  • Extracted data from different databases as per the business requirements using SQL Server Management Studio (SSMS).
  • Wrote complex SQL queries to validate the data against different kinds of reports generated by Cognos.
  • Extensively used MS Excel (pivot tables, VLOOKUP) for data validation.
  • Interacted with the ETL and BI teams to understand and support various ongoing projects.
  • Generated weekly and monthly reports for various business users according to the business requirements, manipulating and mining data from database tables (Oracle, data warehouse).
  • Provided analytical network support to improve quality and standard work results.
  • Created data pipelines using big data technologies such as Hadoop and Spark.
  • Created statistical models using distributed and standalone models to build various diagnostic, predictive and prescriptive solutions.
  • Utilized a broad variety of statistical packages such as SAS, R, MLlib, Graphs, Spark, MapReduce, Pig and others.
  • Provided input and recommendations on technical issues to BI Engineers, Business & Data Analysts and Data Scientists.

Environment: Machine learning, AWS, MS Azure, Spark, HDFS, Hive, Linux, Python (Scikit-Learn/SciPy/Numpy/Pandas), R, MySQL, Eclipse, PL/SQL, SQL connector, Tableau

Confidential, Philadelphia PA

Data Analyst

Responsibilities:

  • Actively involved in gathering and analyzing the project requirements and responsible for writing Business Approach Documents and Technical Documents for new projects.
  • Worked with the team in mortgage domain to implement designs based on the free cash flow, acquisition, and capital efficiency.
  • Extensively worked on production support, including deploying SSIS packages to Development and Production servers and creating, scheduling and managing SQL jobs.
  • Created SSIS package configurations and maintained their tables by editing the values of the variables as per the requirement.
  • Worked on a data migration project, validating data from three sources row by row and field by field.
  • Used SQL Server Integration Services (SSIS) to migrate data from heterogeneous sources and legacy systems (Oracle, Access, Excel) to centralized SQL Server databases, overcoming transformation constraints and limitations.
  • Developed SSIS packages to generate individual and consolidated Excel reports by dynamically passing the carrier IDs from a SQL table, and placed the reports in a shared folder for business users.
  • Used transformations such as SCD, CDC, Data Conversion, Conditional Split, Merge Join, Derived Column, Lookup, Cache Transform and Union All to convert raw data into the form required by the business.
  • Redesigned a few SSIS packages that involved very complex logic and more than a hundred tables and tasks, and developed simplified versions of the packages.
  • Created Rich dashboards using Tableau Desktop and prepared user stories to create compelling dashboards to deliver actionable insights.
  • Performed data refresh on Tableau Server on weekly, monthly and quarterly basis and ensured that the views and dashboards were accurately displaying the changes in data.
  • Designed and developed a business intelligence dashboard using Tableau Desktop, allowing executive management to view past, current and forecast sales data.
  • Created multiple visualization reports and dashboards using Dual Axes charts, Histograms, Filled maps, Bubble charts, Bar charts, Line charts, Tree maps, Box and Whisker Plots, Stacked Bars, etc.
  • Created users, groups, projects, workbooks and the appropriate permission sets for Tableau Server logons and security checks.
  • Created advanced analytical dashboards using Reference Lines, Bands and Trend Lines.
  • Developed and rendered monthly, weekly and daily reports as per the requirement and assigned roles and user access with respect to security in Report Manager.
  • Extensively used Teradata utilities (BTEQ, FastLoad, MultiLoad, TPump) to import, export and load data from Oracle and flat files.

Environment: Microsoft SQL Server 2012/2008 R2, Teradata, SSIS/SSAS/SSRS 2012/2008, Microsoft Office, MS Visio, Crystal Reports XI, Visual SourceSafe, Microsoft TFS, BIDS, Visual Studio, Tableau
