Data Scientist Resume

New York, NY

SUMMARY:

  • Around 8 years of IT experience as a Data Scientist, including deep expertise in statistical data analysis: transforming business requirements into analytical models, designing algorithms, and building strategic solutions that scale across massive volumes of data.
  • Proficient in statistical methods such as regression models, hypothesis testing, confidence intervals, principal component analysis, and dimensionality reduction.
  • Expert in R and Python scripting; worked with statistical functions in NumPy, visualization using Matplotlib/Seaborn, and Pandas for organizing data.
  • 4 years of experience in Scala and Spark.
  • Experience in using various packages in R and Python such as ggplot2, caret, dplyr, RWeka, rjson, plyr, SciPy, scikit-learn, Beautiful Soup, and Rpy2.
  • Extensive experience in text analytics, generating data visualizations using R and Python, and creating dashboards using tools like Tableau.
  • Experience in writing code in R and Python to manipulate data for data loads, extracts, statistical analysis, modeling, and data munging.
  • Extensively worked on using major statistical analysis tools such as R, SQL, SAS, and MATLAB.
  • Utilized analytical applications like R, SPSS, Rattle and Python to identify trends and relationships between different pieces of data, draw appropriate conclusions and translate analytical findings into risk management and marketing strategies that drive value.
  • Skilled in data parsing, manipulation, and preparation, including describing data contents, computing descriptive statistics, regex matching, splitting and combining, remapping, merging, subsetting, re-indexing, melting, and reshaping (a minimal sketch follows this summary).
  • Highly skilled in using visualization tools like Tableau, ggplot2, and d3.js for creating dashboards.
  • Professional working experience in machine learning algorithms such as linear regression, logistic regression, Naive Bayes, decision trees, clustering, and Principal Component Analysis.
  • Hands-on experience with big data tools like Hadoop, Spark, Hive, Pig, Impala, PySpark, and Spark SQL.
  • Good knowledge of database creation and maintenance of physical data models with Oracle, Teradata, Netezza, DB2, MongoDB, HBase, and SQL Server databases.
  • Experienced in writing complex SQL queries, stored procedures, triggers, joins, and subqueries.
  • Interpret problems and provide solutions to business problems using data analysis, data mining, optimization tools, machine learning techniques, and statistics.
  • Knowledge of working with Proofs of Concept (PoCs) and gap analysis; gathered necessary data for analysis from various sources and prepared data for exploration using data munging and Teradata.
  • Experience with data analytics, data reporting, ad-hoc reporting, graphs, scales, PivotTables, and OLAP reporting.
  • Ability to work with managers and executives to understand business objectives and deliver as per business needs; a firm believer in teamwork.
  • Experience and domain knowledge in various industries such as healthcare, insurance, retail, banking, media and technology.
  • Work closely with customers, cross-functional teams, research scientists, software developers, and business teams in an Agile/Scrum environment to drive data initiatives.
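
As a minimal illustration of the data parsing and preparation work summarized above, the sketch below shows typical pandas operations (describe, regex split, remap, subset, melt, reshape); the file and column names are hypothetical and not drawn from any specific engagement.

```python
import pandas as pd

# Hypothetical monthly sales extract; file and column names are illustrative only.
df = pd.read_csv("sales_extract.csv")

# Describe data contents and compute descriptive statistics.
print(df.dtypes)
print(df.describe(include="all"))

# Regex-based parsing: split a combined "region-segment" code into two columns.
df[["region", "segment"]] = df["region_segment"].str.extract(r"^(\w+)-(\w+)$")

# Remap categorical codes, subset, and re-index.
df["segment"] = df["segment"].map({"RET": "Retail", "WHL": "Wholesale"})
retail = df[df["segment"] == "Retail"].reset_index(drop=True)

# Melt wide monthly columns into a long format, then reshape back with a pivot.
long_df = retail.melt(id_vars=["region", "segment"],
                      value_vars=["jan", "feb", "mar"],
                      var_name="month", value_name="revenue")
wide_df = long_df.pivot_table(index="region", columns="month",
                              values="revenue", aggfunc="sum")
print(wide_df.head())
```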

TECHNICAL SKILLS:

Programming: C, Python, SQL, PL/SQL, SQL*Plus, R.

Machine Learning: Regression, clustering, SVM, Decision trees, Classification, Recommendation systems, Association Rules, Survival Analysis etc.

Libraries: Scikit-learn, Keras, TensorFlow, NumPy, Pandas, Scala, NLP, NLTK, Gensim, Matplotlib, ggplot2.

Analysis & Modeling Tools: Erwin, Sybase PowerDesigner, Oracle Designer, Rational Rose, ER/Studio, TOAD, MS Visio, and SAS.

ETL Tools: Informatica PowerCenter, DataStage, Ab Initio, Talend.

OLAP Tools: MS SQL Analysis Manager, DB2 OLAP, Cognos Power Play.

Languages: SQL, PL/SQL, T-SQL, XML, HTML, UNIX Shell Scripting, C, C++, AWK, JavaScript.

Databases: Oracle, Teradata, DB2 UDB, MS SQL Server, Netezza, Sybase ASE, Informix, MongoDB, HBase, Cassandra, AWS RDS.

Tools & Software: TOAD, MS Office, BTEQ, Teradata SQL Assistant.

Methodologies: Ralph Kimball, COBOL.

Reporting Tools: Business Objects XI R2, Cognos Impromptu, Informatica Analytics Delivery Platform, MicroStrategy, SSRS, Tableau.

Other Tools: Oracle SQL*Plus, TOAD, IBM InfoSphere DataStage 11.5/9.1, RStudio, IPython Notebook, Spyder, MS Office Suite, MariaDB, SAS, Spark MLlib, Kibana, Elasticsearch packages, VSS.

Operating Systems: Windows NT/XP/Vista, UNIX (Sun Solaris, HP-UX), MS-DOS.

PROFESSIONAL EXPERIENCE:

Confidential, New York, NY.

Data Scientist

Responsibilities:

  • Involved in extensive ad-hoc reporting, routine operational reporting, and data manipulation to produce routine metrics and dashboards for management.
  • Created parameters, action filters and calculated sets for preparing dashboards and worksheets in Tableau.
  • Interacted with other data scientists and architects on custom solutions for data visualization using tools such as Tableau, R Shiny, and R packages.
  • Involved in running MapReduce jobs for processing millions of records.
  • Wrote complex SQL queries using joins and OLAP functions such as COUNT, CSUM, and RANK.
  • Built and published customized interactive reports and dashboards, with report scheduling, using Tableau Server.
  • Developed Python programs for manipulating data read from various Teradata tables and consolidating it into a single CSV file.
  • Wrote several Teradata SQL queries using Teradata SQL Assistant for ad-hoc data pull requests.
  • Performed statistical data analysis and data visualization using R and Python.
  • Worked on creating filters and calculated sets for preparing dashboards and worksheets in Tableau.
  • Created data models in Splunk using pivot tables by analyzing vast amounts of data and extracting key information to suit various business requirements.
  • Created new scripts for Splunk scripted inputs to collect system CPU and OS data.
  • Implemented data refreshes on Tableau Server for biweekly and monthly increments based on business change to ensure that the views and dashboards were displaying the changed data accurately.
  • Developed normalized Logical and Physical database models for designing an OLTP application.
  • Knowledgeable in the AWS environment for loading data files from on-premises sources to a Redshift cluster.
  • Performed SQL testing on AWS Redshift databases.
  • Developed Teradata SQL scripts using OLAP functions such as RANK and RANK OVER to improve query performance while pulling data from large tables.
  • Developed and implemented SSIS, SSRS and SSAS application solutions for various business units across the organization.
  • Designed the data marts in dimensional data modeling using star and snowflake schemas.
  • Analyzed data sets with SAS programming, R, and Excel.
  • Published interactive dashboards and scheduled automatic data refreshes.
  • Developed MapReduce Python modules for machine learning and predictive analytics in Hadoop on AWS.
  • Maintained large data sets, combining data from various sources using Excel, SAS (Enterprise and Grid), Access, and SQL queries.
  • Performed Tableau administration using Tableau admin commands.
  • Created Hive queries that helped market analysts spot emerging trends by comparing incremental data with Teradata reference tables and historical metrics.
  • Created UDFs to calculate the pending payment for a given residential or small-business customer's quotation data and used them in Pig and Hive scripts.
  • Worked on moving data from Hive tables into HBase for real-time analytics.
  • Handled importing of data from various data sources and performed transformations using Hive (external tables, partitioning).
  • Responsible for creating Hive tables, loading the structured data produced by MapReduce jobs into the tables, and writing Hive queries to further analyze the logs to identify issues and behavioral patterns.
  • Design and development of ETL processes using Informatica ETL tools for dimension and fact file creation.
  • Develop and automate solutions for a new billing and membership Enterprise data Warehouse including ETL routines, tables, maps, materialized views, and stored procedures incorporating Informatica and Oracle PL/SQL toolsets.
  • Analyzed the implementation of Spark with Scala and wrote sample Spark programs using PySpark (a minimal sketch follows this list).
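
A minimal PySpark sketch of the kind of sample program mentioned in the last bullet, mirroring the Teradata-style OLAP ranking in Spark; the paths, table, and column names are assumptions for illustration only.

```python
from pyspark.sql import SparkSession, functions as F, Window

spark = SparkSession.builder.appName("teradata-extract-demo").getOrCreate()

# Hypothetical CSV extracts previously pulled from Teradata; paths and columns are illustrative.
orders = spark.read.csv("s3://example-bucket/teradata_extracts/orders/*.csv",
                        header=True, inferSchema=True)

# Windowed aggregation analogous to Teradata OLAP functions (SUM/RANK OVER).
w = Window.partitionBy("customer_id").orderBy(F.col("order_amount").desc())
ranked = (orders
          .withColumn("amount_rank", F.rank().over(w))
          .withColumn("customer_total",
                      F.sum("order_amount").over(Window.partitionBy("customer_id"))))

# Consolidate the result into a single CSV output for downstream reporting.
ranked.coalesce(1).write.mode("overwrite").csv("/tmp/orders_ranked", header=True)
```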

Environment: SQL Server, Oracle, MS Office, Teradata, Informatica, ER Studio, XML, Hive, HDFS, Flume, Sqoop, R connector, Python, R, Tableau.

Confidential, St. Louis, MO.

Data Scientist

Responsibilities:

  • Conducted analysis assessing customer consumption behavior and discovering customer value with RFM analysis; applied customer segmentation with clustering algorithms such as K-Means and hierarchical clustering (a minimal sketch follows this list).
  • Collaborated with data engineers to implement ETL process, wrote and optimized SQL queries to perform data extraction and merging from Oracle.
  • Involved in managing backup and restoring data in the live Cassandra Cluster.
  • Used R, Python, and Spark to develop a variety of models and algorithms for analytic purposes.
  • Coordinated the execution of A/B tests to measure the effectiveness of a personalized recommendation system.
  • Collected unstructured data from MongoDB and completed data aggregation.
  • Performed data integrity checks, data cleaning, exploratory analysis, and feature engineering using R and Python.
  • Developed personalized product recommendation with Machine learning algorithms, including Gradient Boosting Tree and Collaborative filtering to better meet the needs of existing customers and acquire new customers.
  • Developed logistic regression models to predict subscription response rate based on customer variables such as past transactions, promotions, response to prior mailings, demographics, interests, and hobbies.
  • Predicted the claim severity to understand future loss and ranked the importance of features.
  • Used Python and Spark to implement different machine learning algorithms, including Generalized Linear Model, Random Forest, SVM, Boosting and Neural Network.
  • Evaluated parameters with K-Fold cross-validation and optimized model performance (a hedged sketch follows this list).
  • Worked on benchmarking Cassandra Cluster using the Cassandra stress tool.
  • Completed a highly immersive data science program involving data manipulation and visualization, web scraping, machine learning, Git, SQL, UNIX commands, Python programming, NoSQL, MongoDB, and Hadoop.
  • Worked on data cleaning, data preparation, and feature engineering with Python, including NumPy, SciPy, Matplotlib, Seaborn, Pandas, and scikit-learn.
  • Identified risk level and eligibility of new insurance applicants with Machine Learning algorithms.
  • Provided analytical support to underwriting and pricing by preparing and analyzing data to be used in actuarial calculations.
  • Determined customer satisfaction and helped enhance the customer experience using NLP.
  • Recommended and evaluated marketing approaches based on quality analytics on customer consuming behavior.
  • Utilized SQL and HiveQL to query and manipulate data from a variety of data sources, including Oracle and HDFS, while maintaining data integrity.
  • Performed data visualization and designed dashboards with Tableau and D3.js, and provided complex reports, including charts, summaries, and graphs, to interpret the findings for the team and stakeholders.
  • Identified process improvements that significantly reduce workloads or improve quality.
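
A minimal sketch of RFM-based K-Means segmentation like that described in the first bullet above, assuming a simple transactions extract; the column names and cluster count are illustrative, not taken from the actual project.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Hypothetical transaction extract; column names are illustrative only.
tx = pd.read_csv("transactions.csv", parse_dates=["order_date"])

snapshot = tx["order_date"].max()
rfm = tx.groupby("customer_id").agg(
    recency=("order_date", lambda d: (snapshot - d.max()).days),
    frequency=("order_id", "nunique"),
    monetary=("amount", "sum"),
)

# Standardize features and segment customers with K-Means
# (in practice k would be chosen with elbow/silhouette analysis).
X = StandardScaler().fit_transform(rfm)
rfm["segment"] = KMeans(n_clusters=4, random_state=42, n_init=10).fit_predict(X)
print(rfm.groupby("segment").mean())
```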
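
And a hedged sketch of the logistic-regression response model evaluated with K-Fold cross-validation, as referenced above; the feature list and file name are hypothetical.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score, KFold

# Hypothetical customer file with a binary "responded" target; columns are illustrative.
data = pd.read_csv("campaign_history.csv")
features = ["past_transactions", "promotions_received", "prior_mail_responses", "age"]
X, y = data[features], data["responded"]

# Scale then fit a logistic regression; score with 5-fold cross-validated AUC.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print(f"Mean CV AUC: {scores.mean():.3f} (+/- {scores.std():.3f})")
```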

Environment: Python, SQL Server, Oracle 11g, MS Office, Teradata, Informatica, ER Studio, XML, Hive, HDFS, Flume, Sqoop, R connector, R, Tableau.

Confidential, Albany, New York.

Data Scientist

Responsibilities:

  • Provided Configuration Management and Build support for more than 5 different applications, built and deployed to the production and lower environments.
  • Implemented public segmentation using unsupervised machine learning, implementing the k-means algorithm with PySpark.
  • Used Airflow to keep track of job statuses in repositories such as MySQL and PostgreSQL databases (a minimal DAG sketch follows this list).
  • Responsible for various data mapping activities from source systems to Teradata, and for text mining and building models using topic analysis and sentiment analysis for both semi-structured and unstructured data.
  • Used R and Python for exploratory data analysis, A/B testing, HQL, VQL, Data Lake, AWS Redshift, Oozie, PySpark, ANOVA tests, and hypothesis tests to compare and identify the effectiveness of creative campaigns.
  • Computed A/B testing frameworks, clickstream, and time-spent databases using Airflow.
  • Created clusters for control and test groups and conducted group campaigns using text analytics.
  • Created positive and negative clusters from merchants' transactions using sentiment analysis to test the authenticity of transactions and resolve any chargebacks.
  • Analyzed and calculated the lifetime cost of everyone in the welfare system using 20 years of historical data.
  • Created and developed classes and web page elements using C# and AJAX; used JSP for validating client-side responses and connected C# to the database to retrieve SQL data.
  • Developed Linux shell scripts using NZSQL/NZLOAD utilities to load data from flat files to a Netezza database.
  • Developed triggers, stored procedures, functions, and packages using cursor and ref cursor concepts associated with the project using PL/SQL.
  • Created various types of data visualizations using R, C#, Python, and Tableau/Spotfire, and connected Pipeline Pilot with Spotfire to create more interactive, business-driven layouts.
  • Used Python, R, and SQL to create statistical models involving multivariate regression, linear regression, logistic regression, PCA, random forests, decision trees, and Support Vector Machines for estimating the risk of welfare dependency.
  • Identified and targeted welfare high-risk groups with machine learning and deep learning algorithms.
  • Conducted campaigns and ran real-time trials to quickly determine what works and to track the impact of different initiatives.
  • Developed Tableau visualizations and dashboards using Tableau Desktop.
  • Used graphical Entity-Relationship diagramming to create new database designs via an easy-to-use graphical interface.
  • Created multiple custom SQL queries in Teradata SQL Workbench to prepare the right data sets for Tableau dashboards.
  • Performed analyses such as regression analysis, logistic regression, discriminant analysis, and cluster analysis using SAS programming.
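
A minimal Airflow sketch of the kind of status-tracked pipeline referenced above; the DAG id, schedule, and task callables are placeholders, and the production DAGs with their MySQL/PostgreSQL metadata stores were project-specific.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_clickstream(**context):
    # Placeholder for the real extract step (e.g., pulling clickstream data from the lake).
    print("extracting clickstream for", context["ds"])


def compute_ab_metrics(**context):
    # Placeholder for aggregating A/B test and time-spent metrics.
    print("computing A/B metrics for", context["ds"])


with DAG(
    dag_id="ab_testing_pipeline_demo",   # hypothetical DAG id
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"retries": 1, "retry_delay": timedelta(minutes=5)},
) as dag:
    extract = PythonOperator(task_id="extract_clickstream",
                             python_callable=extract_clickstream)
    metrics = PythonOperator(task_id="compute_ab_metrics",
                             python_callable=compute_ab_metrics)
    extract >> metrics
```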

Environment: R, C#, Pig, Hive, Linux, RStudio, Tableau, SQL Server, MS Excel, PySpark.

Confidential, Chicago, IL.

Data Modeler/Data Analyst

Responsibilities:

  • Created and maintained logical and physical models for the data mart; created partitions and indexes for the tables in the data mart.
  • Performed data profiling and analysis, applied various data cleansing rules, designed data standards and architecture, and designed the relational models.
  • Maintained metadata (data definitions of table structures) and version controlling for the data model.
  • Developed SQL scripts for creating tables, sequences, triggers, views, and materialized views.
  • Worked on query optimization and performance tuning using SQL Profiler and performance monitoring.
  • Developed mappings to load fact and dimension tables, SCD Type 1 and SCD Type 2 dimensions, and incremental loads, and unit tested the mappings (a conceptual SCD Type 2 sketch follows this list).
  • Utilized Erwin's forward/reverse engineering tools and the target database schema conversion process.
  • Worked on creating an enterprise-wide model (EDM) for products and services in the Teradata environment based on data from the PDM; conceived, designed, developed, and implemented this model from scratch.
  • Built and published customized interactive reports and dashboards, with report scheduling, using Tableau Server.
  • Wrote SQL scripts to test the mappings and developed a traceability matrix of business requirements mapped to test scripts to ensure that any change control in requirements leads to test case updates.
  • Responsible for development and testing of conversion programs for importing data from text files into the Oracle database utilizing Perl shell scripts and SQL*Loader.
  • Involved in extensive data validation by writing several complex SQL queries; involved in back-end testing and worked on data quality issues.
  • Developed and executed load scripts using the Teradata client utilities MultiLoad, FastLoad, and BTEQ.
  • Exported and imported data between different platforms such as SAS and MS Excel.
  • Generated periodic reports based on the statistical analysis of the data using SQL Server Reporting Services ( SSRS ).
  • Worked with the ETL team to document the Transformation Rules for Data Migration from OLTP to Warehouse Environment for reporting purposes.
  • Created SQL scripts to find data quality issues and to identify keys, data anomalies, and data validation issues.
  • Formatted data sets read into SAS using the FORMAT statement in the DATA step as well as PROC FORMAT.
  • Applied Business Objects best practices during development with a strong focus on reusability and better performance.
  • Developed Tableau visualizations and dashboards using Tableau Desktop.
  • Used graphical Entity-Relationship diagramming to create new database designs via an easy-to-use graphical interface.
  • Designed different types of star schemas for detailed data marts and plan data marts in the OLAP environment.
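
The SCD Type 2 loads above were implemented as ETL mappings; purely as a conceptual illustration, the pandas sketch below shows the expire-and-insert logic a Type 2 dimension load applies (table and column names are hypothetical).

```python
import pandas as pd

TODAY = pd.Timestamp("2020-01-01")        # hypothetical load date
FAR_FUTURE = pd.Timestamp("9999-12-31")

# Current dimension rows and the incoming source extract; columns are illustrative.
dim = pd.DataFrame({
    "customer_id": [1, 2],
    "address": ["10 Main St", "5 Oak Ave"],
    "effective_date": [pd.Timestamp("2019-01-01")] * 2,
    "end_date": [FAR_FUTURE] * 2,
    "is_current": [True, True],
})
incoming = pd.DataFrame({"customer_id": [1, 2, 3],
                         "address": ["10 Main St", "7 Pine Rd", "3 Elm Ct"]})

# Compare incoming rows to the current dimension rows on the tracked attribute.
current = dim[dim["is_current"]]
merged = incoming.merge(current[["customer_id", "address"]],
                        on="customer_id", how="left",
                        suffixes=("", "_dim"), indicator=True)
changed_ids = merged.loc[(merged["_merge"] == "both") &
                         (merged["address"] != merged["address_dim"]), "customer_id"]
new_ids = merged.loc[merged["_merge"] == "left_only", "customer_id"]

# Type 2: expire the changed rows, then insert new versions for changed and brand-new keys.
dim.loc[dim["customer_id"].isin(changed_ids) & dim["is_current"],
        ["end_date", "is_current"]] = [TODAY, False]
inserts = incoming[incoming["customer_id"].isin(changed_ids.tolist() + new_ids.tolist())].copy()
inserts["effective_date"], inserts["end_date"], inserts["is_current"] = TODAY, FAR_FUTURE, True
dim = pd.concat([dim, inserts], ignore_index=True)
print(dim)
```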

Environment: Erwin, MS SQL Server 2008, DB2, Oracle SQL Developer, PL/SQL, Business Objects, MS Office Suite, Windows XP, TOAD, SQL*Plus, SQL*Loader, Teradata, Netezza, SAS, Tableau, SSRS, SQL Assistant, Informatica, XML.

Confidential

Data Engineer

Responsibilities:

  • Designed and built multi-terabyte, full end-to-end data warehouse infrastructure from the ground up on Confidential Redshift, handling millions of records every day at large scale.
  • Implemented and managed ETL solutions and automated operational processes.
  • Optimized and tuned the Redshift environment, enabling queries to perform up to 100x faster for Tableau and SAS Visual Analytics.
  • Wrote various data normalization jobs for new data ingested into Redshift.
  • Advanced knowledge of Confidential Redshift and MPP database concepts.
  • Migrated on-premises database structures to the Confidential Redshift data warehouse.
  • Was responsible for ETL and data validation using SQL Server Integration Services.
  • Defined and deployed monitoring, metrics, and logging systems on AWS .
  • Implemented Workload Management (WLM) in Redshift to prioritize basic dashboard queries over more complex, longer-running ad-hoc queries. This allowed for a more reliable and faster reporting interface, giving sub-second query response for basic queries.
  • Worked on publishing interactive data visualization dashboards, reports, and workbooks on Tableau and SAS Visual Analytics.
  • Responsible for designing logical and physical data models for various data sources on Confidential Redshift.
  • Designed and developed ETL jobs to extract data from a Salesforce replica and load it into a data mart in Redshift (a minimal load sketch follows this list).
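
A minimal sketch of the kind of Redshift load step used in these ETL jobs, assuming the Salesforce extract has been staged to S3; the cluster endpoint, table, bucket, and IAM role are placeholders, and the production jobs were orchestrated through the pipeline tooling listed in the environment.

```python
import psycopg2

# Placeholder connection details; production credentials came from the pipeline's secret store.
conn = psycopg2.connect(host="example-cluster.redshift.amazonaws.com",
                        port=5439, dbname="analytics",
                        user="etl_user", password="***")

# Hypothetical target table, bucket, and IAM role.
copy_sql = """
    COPY sales_mart.salesforce_opportunities
    FROM 's3://example-bucket/salesforce/opportunities/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-copy'
    FORMAT AS CSV
    IGNOREHEADER 1
    TIMEFORMAT 'auto';
"""

cur = conn.cursor()
cur.execute(copy_sql)   # bulk load the staged extract into the data mart table
conn.commit()
cur.close()
conn.close()
```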

Environment: Confidential Redshift, AWS Data Pipeline, SQL Server Integration Services, SQL Server, AWS Data Migration Services, DQS, SAS Visual Analytics, SAS Forecast server and Tableau.

Confidential

Data Analyst/Data Modeler

Responsibilities:

  • Developed UI screens for the data entry application in Java Swing.
  • Worked on back-end services in Spring MVC and OpenEJB for interaction with Oracle and the mainframe using DAO and model objects.
  • Introduced Spring IoC to increase application flexibility and replace the need for hard-coded, class-based application functions.
  • Used Spring IoC for dependency injection to auto-wire different beans and the data source in the application.
  • Used Spring JDBC templates for database interactions and used declarative Spring AOP transaction management.
  • Used mainframe screen scraping for adding forms to mainframe through the claims data entry application.
  • Worked on Jasper reports (iReport) to generate reports for various people (executive secretary and commissioners) based on their authorization.
  • Generated electronic letters for attorneys and insurance carriers using iReport.
  • Worked on application deployment on various Tomcat server instances using PuTTY.
  • Worked in TOAD for PL/SQL in the Oracle database, writing queries, functions, stored procedures, and triggers.
  • Worked on JSP, Servlets, HTML, CSS, JavaScript, JSON, jQuery, and AJAX for the Vault web-based project and the Confidential application.
  • Used Spring MVC architecture with DispatcherServlet and view resolvers for the web applications.
  • Worked on web service integration for the Confidential project, integrating a third-party payment processing system with the Confidential application.
