
Data Scientist Resume


Melville, NY

SUMMARY:

  • Over 7 years of experience in data mining, predictive modeling, statistical analysis, econometric modeling, and data visualization with large sets of structured and unstructured data.
  • Strong experience in data analysis, data cleaning, data migration, data conversion, data export and import, and data integration.
  • Experience in using Python libraries such as NumPy, SciPy, Pandas, Matplotlib, and scikit-learn.
  • Proficient in statistical modeling and machine learning techniques (linear and logistic regression, decision trees, random forests, SVM, k-nearest neighbors, Bayesian methods, XGBoost) applied to forecasting/predictive analytics, segmentation methodologies, regression-based models, hypothesis testing, factor analysis/PCA, and ensembles.
  • Experienced in integration of various relational and non-relational sources such as Teradata, Oracle, SQL Server, NoSQL, COBOL, XML, and flat files.
  • Performed extensive data profiling and analysis to detect and correct inaccurate data in databases and to track data quality.
  • Experience in performance tuning and query optimization techniques in transactional and data warehouse environments.
  • Experience working with BI visualization tools (Tableau and QlikView).
  • Extensive experience in text analytics, developing statistical, machine learning, and data mining solutions to various business problems, and generating data visualizations using R, Python, and Tableau.
  • Created reports for users in Tableau by connecting to multiple data sources such as flat files, MS Excel, CSV files, SQL Server, and Oracle.
  • Hands-on experience writing queries in SQL and R to extract, transform, and load (ETL) data from large datasets using data staging.
  • Experienced in evaluating data sources, with a strong understanding of data warehouse/data mart design, ETL, BI, OLAP, and client/server applications.
  • Experience in creating partitions, indexes, and indexed views to improve performance, reduce contention, and increase the availability of data.

PROFESSIONAL EXPERIENCE:

Confidential, Melville, NY

Data Scientist

Responsibilities:

  • Involved in exploratory data analysis using descriptive statistics and data visualization to determine baseline machine learning algorithms (MLAs).
  • Built and automated a robust, high-accuracy model for the given customer base.
  • Conducted analysis of customer consumption behavior and discovered customer value with RFM analysis; applied customer segmentation with clustering algorithms such as K-Means and hierarchical clustering (see the K-Means sketch after this list).
  • Coordinated the execution of A/B tests to measure the effectiveness of a personalized recommendation system.
  • Applied the Wilcoxon signed-rank test in R to pre- and post-acquisition stock performance data across sectors to test for statistical significance (an equivalent SciPy call is sketched after this list).
  • Recommended and evaluated marketing approaches based on quality analytics of customer consumption behavior.
  • Utilized SQL and HiveQL to query and manipulate data from a variety of sources, including Oracle and HDFS, while maintaining data integrity.
  • Worked on data cleaning, data preparation, and feature engineering with Python, including NumPy, SciPy, Pandas, Matplotlib, Seaborn, and scikit-learn.
  • Predicted claim severity to understand future loss and ranked the importance of features.
  • Used Python and Spark to implement machine learning algorithms including Generalized Linear Models, SVM, Random Forests, and boosting.
  • Identified internal and external information sources, building effective working relationships with subject matter experts across research groups within the firm and in the external marketplace.
  • Developed logistic regression models to predict subscription response rate based on customer variables such as past transactions, response to prior mailings, promotions, demographics, and interests.
  • Involved in data preparation using tasks such as the following (a pandas sketch of these steps follows this list):
  • Data reduction - obtain a reduced representation of the data volume that produces the same or similar analytical results.
  • Data discretization - transform quantitative data into qualitative data.
  • Data cleaning - fill in missing values, handle noisy data, identify or remove outliers, and resolve inconsistencies.
  • Data integration - integrate multiple databases, data cubes, or files.
  • Data transformation - normalization, standardization, and aggregation.
  • Designed dashboards with Tableau and D3.js and delivered complex reports, including summaries, charts, and graphs, to interpret findings for the team and stakeholders.
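
As an illustration of the RFM segmentation approach referenced above, here is a minimal sketch using pandas and scikit-learn. The toy data, column names (customer_id, order_date, amount), and the choice of two clusters are assumptions for the example, not details of the actual engagement.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Toy transaction data; real column names and sources are assumptions.
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2, 3, 4, 4],
    "order_date": pd.to_datetime([
        "2024-01-05", "2024-03-01", "2024-02-10", "2024-02-20",
        "2024-03-15", "2023-11-30", "2024-01-20", "2024-03-10"]),
    "amount": [120.0, 80.0, 35.0, 45.0, 60.0, 500.0, 15.0, 25.0],
})

snapshot = orders["order_date"].max() + pd.Timedelta(days=1)

# Recency, Frequency, Monetary features per customer.
rfm = orders.groupby("customer_id").agg(
    recency=("order_date", lambda d: (snapshot - d.max()).days),
    frequency=("order_date", "count"),
    monetary=("amount", "sum"),
)

# Standardize so no single feature dominates the distance metric,
# then cluster customers into k segments with K-Means.
X = StandardScaler().fit_transform(rfm)
rfm["segment"] = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(rfm)
```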
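
The Wilcoxon signed-rank analysis above was done in R; the equivalent call in Python's SciPy is shown below for illustration. The paired pre-/post-acquisition return figures are invented toy numbers.

```python
from scipy import stats

# Paired pre-/post-acquisition returns for one sector (toy numbers;
# the real data came from stock performance by sector).
pre  = [0.021, -0.013, 0.034, 0.008, -0.002, 0.017, 0.025, -0.009]
post = [0.030, -0.001, 0.041, 0.015,  0.004, 0.012, 0.038,  0.002]

# Wilcoxon signed-rank test: a non-parametric paired test that does not
# assume the return differences are normally distributed.
stat, p_value = stats.wilcoxon(pre, post)
print(f"W = {stat:.1f}, p = {p_value:.4f}")
```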
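
A minimal pandas sketch of the data preparation tasks listed above (cleaning, discretization, transformation); the columns, bins, and thresholds are assumptions for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [25, np.nan, 47, 51, 38, 200],      # a missing value and an outlier
    "income": [40_000, 52_000, 61_000, np.nan, 58_000, 75_000],
})

# Data cleaning: fill missing values with the median and cap outliers.
df["age"] = df["age"].fillna(df["age"].median()).clip(upper=100)
df["income"] = df["income"].fillna(df["income"].median())

# Data discretization: turn a quantitative column into qualitative bins.
df["age_band"] = pd.cut(df["age"], bins=[0, 30, 50, 100],
                        labels=["young", "middle", "senior"])

# Data transformation: z-score standardization of a numeric column.
df["income_z"] = (df["income"] - df["income"].mean()) / df["income"].std()

print(df)
```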

Environment: Tableau, Oracle, Teradata, R, Python, Spark, SQL, HiveQL, machine learning algorithms, HDFS.

Confidential, Irving, TX

Data Scientist

Responsibilities:

  • Worked closely with the Data Governance Office team in assessing the source systems for project deliverables.
  • Developed descriptive and inferential statistics for logistics optimization (average hours per job, value throughput data) at a 95% confidence interval.
  • Analyzed customer behavior, developed a churn prediction model, and built a regression model to estimate customer lifetime value (CLV) (see the churn-model sketch after this list).
  • Performed univariate, bivariate, and multivariate analysis of approximately 4,890 tuples using bar charts, box plots, and histograms.
  • Extensively used open-source tools - RStudio (R) and Spyder (Python) - for statistical analysis and building machine learning models.
  • Used packages such as dplyr, tidyr, and ggplot2 in RStudio for data visualization, and generated scatter plots and high-low graphs to identify relationships between variables.
  • Used T-SQL queries to pull data from disparate systems and the data warehouse in different environments.
  • Presented data quality (DQ) analysis reports and scorecards on all validated data elements to the business teams and stakeholders.
  • Involved in defining source-to-target data mappings, business rules, and data definitions.
  • Interacting with the business teams and project managers to clearly articulate anomalies, issues, and findings during data validation.
  • Extracting data from different databases as per the business requirements using SQL Server Management Studio (SSMS).
  • Writing complex SQL queries for validating the data against different kinds of reports generated by Cognos.
  • Extensively using MS Excel (Pivot tables, VLOOKUP) for data validation.
  • Interacting with the ETL, BI teams to understand / support on various ongoing projects.
  • Generating weekly, monthly reports for various business users according to the business requirements. Manipulating/mining data from database tables (Oracle, Data Warehouse).
  • Providing analytical network support to improve quality and standard work results.
  • Created data pipelines using big data technologies such as Hadoop and Spark (see the PySpark sketch after this list).
  • Created statistical models using distributed and standalone approaches to build diagnostic, predictive, and prescriptive solutions.
  • Utilized a broad variety of statistical and big data packages, including SAS, R, MLlib, Graphs, Spark, MapReduce, and Pig.
  • Provided input and recommendations on technical issues to BI engineers, business and data analysts, and data scientists.
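
A minimal sketch of the churn prediction pattern described above, using scikit-learn. The feature names and the gradient boosting model are assumptions for illustration; the original work may have used a different algorithm and feature set.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Toy customer features; the real model used behavioral variables.
data = pd.DataFrame({
    "tenure_months": [3, 24, 1, 36, 12, 48, 2, 60, 6, 30],
    "monthly_spend": [20, 80, 15, 95, 40, 110, 18, 120, 25, 70],
    "support_calls": [4, 0, 5, 1, 2, 0, 6, 0, 3, 1],
    "churned":       [1, 0, 1, 0, 0, 0, 1, 0, 1, 0],
})

X = data.drop(columns="churned")
y = data["churned"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Score held-out customers by churn probability and evaluate with AUC.
proba = model.predict_proba(X_test)[:, 1]
print("AUC:", roc_auc_score(y_test, proba))
```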
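
A minimal PySpark sketch of the read-transform-aggregate-write pipeline pattern mentioned above; the HDFS paths, schema, and column names are placeholders, not project details.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# Read raw events from HDFS (path and schema are placeholders).
events = spark.read.csv("hdfs:///data/raw/events.csv",
                        header=True, inferSchema=True)

# Transform: filter bad rows, derive a date column, aggregate per day.
daily = (events
         .filter(F.col("amount") > 0)
         .withColumn("event_date", F.to_date("event_ts"))
         .groupBy("event_date")
         .agg(F.sum("amount").alias("total_amount"),
              F.count("*").alias("n_events")))

# Write the curated table back out in a columnar format.
daily.write.mode("overwrite").parquet("hdfs:///data/curated/daily_totals")
spark.stop()
```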

Environment: Machine learning, AWS, MS Azure, Spark, HDFS, Hive, Linux, Python (scikit-learn/SciPy/NumPy/Pandas), R, MySQL, Eclipse, PL/SQL, SQL connector, Tableau

Confidential, Philadelphia PA

Data Analyst

Responsibilities:

  • Actively involved in gathering and analyzing requirements for the project, and responsible for writing business approach documents and technical documents for new projects.
  • Worked with the team in the mortgage domain to implement designs based on free cash flow, acquisition, and capital efficiency.
  • Extensively worked on production support, including deployment of SSIS packages to development and production servers and creating, scheduling, and managing SQL jobs.
  • Created SSIS package configurations and maintained their tables by editing variable values as required.
  • Worked on a data migration project, validating data from three sources row by row and field by field.
  • Using SQL Server Integration Services (SSIS), migrated data from heterogeneous sources and legacy systems (Oracle, Access, Excel) to centralized SQL Server databases to overcome transformation constraints and limitations.
  • Developed an SSIS package to generate individual and consolidated Excel reports by dynamically passing carrier ids from a SQL table, writing the output to a shared folder for business users (a Python sketch of the equivalent logic follows this list).
  • Used transformations such as SCD, CDC, Data Conversion, Conditional Split, Merge Join, Derived Column, Lookup, Cache Transform, and Union All to convert raw data into the form required by the business.
  • Worked on the redesign of a few SSIS packages that involved very complex logic and more than a hundred tables and tasks, and developed simplified versions of the packages.
  • Created Rich dashboards using Tableau Desktop and prepared user stories to create compelling dashboards to deliver actionable insights.
  • Performed data refresh on Tableau Server on weekly, monthly and quarterly basis and ensured that the views and dashboards were accurately displaying the changes in data.
  • Designed and developed a business intelligence dashboard using Tableau Desktop, allowing executive management to view past, current and forecast sales data.
  • Created multiple visualization reports/dashboards using Dual Axes charts, Histograms, Filled Maps, Bubble charts, Bar charts, Line charts, Tree Maps, Box-and-Whisker plots, and Stacked Bars.
  • Experience with creation of users, groups, projects, workbooks and the appropriate permission sets for Tableau server logons and security checks.
  • Created advanced analytical dashboards using Reference Lines, Bands and Trend Lines.
  • Developed and rendered monthly, weekly, and daily reports as per requirements, and assigned roles and user access with respect to security in Report Manager.
  • Extensively used Teradata utilities (BTEQ, FastLoad, MultiLoad, TPump) to import/export and load data from Oracle and flat files.
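
The dynamic per-carrier reporting above was built as an SSIS package; the sketch below shows the equivalent logic in Python with pandas and SQLAlchemy, purely as an illustration. The connection string, table names, and shared-folder path are placeholders.

```python
from pathlib import Path

import pandas as pd
from sqlalchemy import create_engine, text

# Placeholder connection string; the real source was a SQL Server database.
engine = create_engine(
    "mssql+pyodbc://user:pass@server/reports_db"
    "?driver=ODBC+Driver+17+for+SQL+Server")

out_dir = Path(r"\\shared\reports")  # shared folder for business users

# Carrier ids stored in a SQL table drive the per-carrier reports.
carriers = pd.read_sql(text("SELECT carrier_id FROM dbo.Carriers"), engine)

frames = []
for cid in carriers["carrier_id"]:
    df = pd.read_sql(
        text("SELECT * FROM dbo.ClaimSummary WHERE carrier_id = :cid"),
        engine, params={"cid": int(cid)})
    df.to_excel(out_dir / f"carrier_{cid}.xlsx", index=False)  # one file per carrier
    frames.append(df)

# Consolidated report across all carriers.
pd.concat(frames).to_excel(out_dir / "all_carriers.xlsx", index=False)
```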

Environment: Microsoft SQL Server 2012/2008 R2, Teradata, SSIS/SSAS/SSRS 2012/2008, Microsoft Office, MS Visio, Crystal Reports XI, Visual SourceSafe, Microsoft TFS, BIDS, Visual Studio, Tableau

Confidential

Data Analyst

Responsibilities:

  • Requirement gathering from the users by participating in JAD sessions. A series of meetings were conducted with the business system users to gather the requirements for reporting.
  • Responsible for conceptual, logical, and physical data modeling, database design, star schema, snowflake schema design, data analysis, documentation, implementation and support.
  • Created and maintained logical and physical models for the data mart, which supports the credit, fraud and risk retail reporting for credit card portfolio.
  • Used forward engineering to create a physical data model with DDL that best suited the requirements of the logical data model (a generic SQLAlchemy sketch of the star schema pattern follows this list).
  • Collected necessary analytical data requirements by conducting meetings with the business and technical teams.
  • Created DDL scripts for implementing data modeling changes. Created Erwin reports in HTML and RTF formats as required, published the data model in the model mart, created naming convention files, and coordinated with DBAs to apply the data model changes.
  • Coordinated with DBAs and generated SQL code from the data models.
  • Verified that the correct authoritative sources were being used and that the extract, transform, and load (ETL) routines would not compromise the integrity of the source data. Supported UAT (User Acceptance Testing) by writing SQL queries.
  • Managed day-to-day operational and tactical aspects of maintenance programs and capital improvement projects to ensure profitable and successful manufacturing operations.
  • Responsible for defining the naming standards for data warehouse.
  • Performed extensive data analysis and data validation on Teradata.
  • Worked with the reporting analyst and reporting development team to understand reporting requirements.
  • Involved in generating ad-hoc reports using Crystal Reports 9.
  • Designed and developed Oracle database tables, views, and indexes with proper privileges, and maintained and updated the database by removing old data.
  • Created the conceptual model for the data warehouse with emphasis on insurance (life and health), mutual funds and annuity using ER Studio data modeling tool.
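
As a generic illustration of the star schema pattern referenced above (not the actual credit card mart), a fact table keyed to its dimensions can be declared and forward-engineered into DDL with SQLAlchemy Core; all table and column names here are invented for the example.

```python
from sqlalchemy import (MetaData, Table, Column, Integer, Numeric,
                        Date, String, ForeignKey, create_engine)

metadata = MetaData()

# Dimension tables: descriptive attributes, one row per entity.
dim_customer = Table("dim_customer", metadata,
    Column("customer_key", Integer, primary_key=True),
    Column("segment", String(30)))

dim_date = Table("dim_date", metadata,
    Column("date_key", Integer, primary_key=True),
    Column("calendar_date", Date))

# Fact table: measures plus foreign keys into each dimension.
fact_transactions = Table("fact_transactions", metadata,
    Column("txn_id", Integer, primary_key=True),
    Column("customer_key", Integer, ForeignKey("dim_customer.customer_key")),
    Column("date_key", Integer, ForeignKey("dim_date.date_key")),
    Column("amount", Numeric(12, 2)))

# Forward-engineer the physical model: emit DDL against a database.
engine = create_engine("sqlite:///star_schema.db")
metadata.create_all(engine)
```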

Environment: Erwin 8.0, ER Studio, Oracle 11g/10g, SQL, Teradata, Informatica, Windows.

Confidential

Business Data Analyst

Responsibilities:

  • Involved in all phases of the SDLC (Software Development Life Cycle) from Requirement Gathering, analysis and design, development, testing, maintenance with timely delivery against aggressive deadlines.
  • Developed and customized stored procedures, functions, packages, and triggers.
  • Used Bulk Collections for better performance and easy retrieval of data, by reducing context switching between SQL and PL/SQL engines.
  • Handled errors using Exception Handling extensively for the ease of debugging and displaying the error messages in the application.
  • Involved in ETL data validation, count checks, and source-to-target table mapping using SQL queries and back-end testing (see the validation sketch after this list).
  • Involved in writing back-end SQL queries for source and target table validation.
  • Worked with Informatica to understand transformations.
  • Troubleshooting performance issues and fine-tuning queries and stored procedures.
  • Worked under the supervision of a DBA and created database objects such as tables, views, sequences, synonyms, table/column constraints, and indexes for enhancements.
  • Involved in the design of the overall database using Entity Relationship diagrams.
  • Involved in code review and unit testing.
  • Involved in Functional Testing, Integration Testing, Regression Testing.
  • Created several database triggers for implementing Integrity Constraints.
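
A minimal sketch of the source-to-target count validation pattern mentioned above, driving the SQL checks from Python; the connection strings, table names, and aggregate checks are placeholders.

```python
import pandas as pd
from sqlalchemy import create_engine, text

# Placeholder connections; source and target were Oracle schemas in practice.
src = create_engine("oracle+oracledb://user:pass@src-host/ORCL")
tgt = create_engine("oracle+oracledb://user:pass@tgt-host/ORCL")

# Aggregate checks to compare between source and target tables.
checks = {
    "row_count": "SELECT COUNT(*) AS n FROM {tbl}",
    "total_amount": "SELECT SUM(amount) AS n FROM {tbl}",
}

# Run each check against both sides and flag mismatches.
for name, sql in checks.items():
    src_val = pd.read_sql(text(sql.format(tbl="stg.orders")), src).iloc[0, 0]
    tgt_val = pd.read_sql(text(sql.format(tbl="dw.orders")), tgt).iloc[0, 0]
    status = "OK" if src_val == tgt_val else "MISMATCH"
    print(f"{name}: source={src_val} target={tgt_val} -> {status}")
```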

Environment: Oracle 11g, Informatica Power Center 8.5, SQL developer, SQL, PL/SQL.

TECHNICAL SKILLS:

Data Modelling Tools: Erwin 8.0, ER/Studio, SAP PowerDesigner, MS Visio

Programming Languages: Python, R, SQL, NoSQL, SAS

Scripting Languages: Python (NumPy, SciPy, Pandas, Matplotlib, scikit-learn, and Seaborn), R (ggplot2, Weka, dplyr, knitr, caret)

BI and Visualization: Tableau, Tableau Server, QlikView, SAP BusinessObjects, OBIEE, Crystal Reports XI, Power BI, SSRS, and SPSS

Databases: MS SQL Server, Oracle, MySQL, Teradata, and NoSQL (Hive, Oracle NoSQL)

Modeling Techniques: Predictive Modeling, ANOVA, Linear Regression, Logistic Regression, Cluster Analysis

Machine Learning: Naïve Bayes, Decision Trees, Regression Models, Random Forests, K-Means Clustering, Market Basket Analysis, Time Series, and Support Vector Machines

Tools: HQL, Pig, Apache Spark, Microsoft Visual Studio, SPSS, SQL Server Integration Services (SSIS), SQL Server Analysis Services (SSAS), SQL Server Reporting Services (SSRS), Oracle Data Integrator.

ETL Tools: Informatica PowerCenter, SSIS, SSAS.
