Sr Data Scientist Resume

Colorado

SUMMARY

  • 7+ years of robust professional experience in Data collection, Data Extraction, Data Cleaning, Data Aggregation, Data Mining, Data verification, Data analysis, Reporting, and data warehousing environments.
  • Strong understanding of Data Warehouse/Data Mart Design, ETL, BI, OLAP, Client/Server applications.
  • Extensive experience with query languages, including SQL, PL/SQL, T-SQL and SAS.
  • Proficient in Data Analysis with sound knowledge in extraction of data from various database sources like MySQL, MSSQL, Oracle, Teradata and other database systems.
  • Expertise in developing advanced PL/SQL code through Stored Procedures, Triggers, Cursors, Tables, Views and User Defined Functions.
  • Experience in building Data Integration, Workflow Solutions and Extract, Transform, and Load (ETL) solutions for data warehousing using SQL Server Integration Services (SSIS) and SSRS.
  • Developed OLAP Cubes by using SQL Server Analysis Services (SSAS) and defined Data Source Views, Dimensions, Measures, Hierarchies, Attributes, Calculations using Multidimensional Expressions (MDX), Perspectives and Roles.
  • Worked on Big Data analytics with petabyte-scale data volumes on Microsoft’s Big Data platform (COSMOS) using SCOPE scripting.
  • Extracted, transformed and loaded data from source systems to Azure Data Storage services using a combination of Azure Data Factory, T-SQL and Spark SQL. Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL) and processed it in Azure Databricks.
  • Expertise in normalization/denormalization techniques for effective and optimum performance in OLTP and OLAP environments.
  • Developed Merge jobs in Python to extract and load data into MySQL database.
  • Hands-on experience in SAS programming for extracting data from flat files, Excel spreadsheets and external RDBMS (Oracle, MySQL, DB2, Sybase) tables using LIBNAME and the SQL pass-through facility.
  • Expertise in data manipulation using the SAS DATA step, including Formats/Informats, Merge, and procedures like PROC APPEND, PROC DATASETS, PROC SORT, PROC TRANSPOSE.
  • Experience in developing predictive models like Decision trees, Interactive decision tree, Gradient boosting, Regression, Neural networks, etc. using SAS Enterprise Miner.
  • Extensive knowledge of advanced STAT procedures including PROC REPORT, PROC TABULATE, PROC CORR, PROC GLM, PROC ANOVA, PROC LOGISTIC, PROC TTEST, PROC SGPLOT, PROC REG, PROC FREQ, PROC MEANS, PROC UNIVARIATE.
  • Automated the process of data extraction, transformation and processing using shell scripts.
  • Used a test-driven approach for developing the application and implemented the unit tests using the Python unittest framework.
  • Experience in developing programs by using SQL, SAS & shell scripts and scheduling the processes to run on a regular basis.
  • Highly Experienced in Statistical, Econometric & Machine Learning Techniques: Descriptive Statistics, Regression, Time series, Panel Data Regression, Bayesian Methods, Clustering, Dimensionality Reduction, Market Basket Analysis, Logit, Probit and Tobit Models, Ensemble Methods, Bagging, Boosting and Pasting methods, Perceptron, CNN and RNN, Monte-Carlo Simulations, NLP
  • Worked in creating different Visualizations in Tableau & Power BI using Bar charts, Line charts, Pie charts, Maps, Scatter Plot charts, Heat maps and Table reports.
  • Extensive experience in various reporting objects like Facts, Attributes, Hierarchies, Transformations, Filters, Prompts, Calculated Fields, Sets, Groups, Parameters in Tableau.
  • Created dashboard-style reports using QlikView components like List Box, Slider, Buttons, Charts and Bookmarks.
  • Worked with database analysis tools (Teradata SQL Assistant, TOAD and Erwin) and the data cleansing tool WinPure.
  • Responsible for creating ETL design specification document to load data from operational data store to data warehouse.
  • Prepared scripts to ensure proper data access, manipulation and reporting functions with the R programming language.
  • Formulated procedures for integration of R programming plans with data sources and delivery system.

TECHNICAL SKILLS

Programming: R, Python, SAS, Hive, COBOL, CICS, VSAM, JCL, Mainframe, Java, UNIX, JavaScript, HTML, XML

ETL: Informatica, Alteryx, Talend, DataStage, Pentaho, Azure Data Factory

Predictive Analytics: Decision tree, Interactive decision tree, Regression, Gradient boosting, Neural networks etc.

NOSQL: HBase, Cassandra, MongoDB, Accumulo

Databases: Microsoft SQL, SQL, PL/SQL, PostgreSQL, HIVE, Oracle SQL Developer

Operating System: Unix, Mac OS and Windows, Mainframe, Z/OS

Libraries: NumPy, Pandas, Seaborn, Bootstrap, Plotly

Machine Learning: scikit-learn, TensorFlow, Keras, spaCy, dplyr

BI tools: QlikView, Tableau, MSBI (SSIS, SSAS, SSRS), Data Cleaning, Data Blending, ETL, Data Wrangling, Data Mining, A/B Testing, Database design

PROFESSIONAL EXPERIENCE

Confidential, COLORADO

Sr Data Scientist

Responsibilities:

  • Hands-on experience creating Oracle scripts (SQL, PL/SQL) and converting them to Teradata scripts.
  • Proficient in importing/exporting large amounts of data from files to Teradata and vice versa.
  • Developed reports using advanced Teradata techniques like RANK and ROW_NUMBER.
  • Worked exclusively with the Teradata SQL Assistant to interface with Teradata.
  • Used text mining and predictive modeling feature of SAS Enterprise Miner to cleanse and mine collected data in order to provide modeling and analysis of structured and unstructured data used for major business initiatives.
  • Hands on experience in working with Hadoop Ecosystems Including Hive, HBase and Spark.
  • Excellent understanding of Hadoop architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, YARN, Spark and Map Reduce programming paradigm.
  • Experienced in working with various Python Integrated Development Environments like NetBeans, PyCharm, PyScripter, Spyder, PyStudio, PyDev and Sublime Text.
  • Responsible for maintaining reports on Tableau Server, entity-relationship data modeling, giving appropriate access to users, and monitoring performance. Automated regular analysis reports by creating a BI platform (Power BI, QlikView), involving ETL (Informatica, DataStage, Talend) of COSMOS log files, data modeling, database creation, and the design and development of SSRS reports.
  • Developed restricted user access to Tableau reports according to the user’s requirements at worksheet level.
  • Built models using K-means clustering algorithm to create user groups.
  • Modified program for Ad-hoc reporting in Enterprise Guide using Query Builder, BASE SAS programming, SAS/SQL and SAS/MACRO.
  • Scheduled data refresh on Tableau Server for weekly and monthly increments based on business change to ensure that the views and dashboards were displaying the changed data accurately.
  • Created visualizations in Tableau using Excel data extract source. Performance tuning by analyzing and comparing the turnaround times between SQL and Tableau.
  • Developed and validated SAS code for tables, listings and graphs in accordance with regulatory requirements, guidance, and corporate and departmental Standard Operating Procedures (SOPs) and work practices.
  • Used AWS SageMaker and Jupyter notebooks on top of AWS EMR servers for writing Python scripts for training/testing data sets.
  • Implemented Naïve Bayes, Decision Trees, Random Forest and Gradient Boosting for predictive analysis using Python's scikit-learn.
  • Carried out statistical analysis such as hypothesis tests and chi-square tests using R 3.4.
  • Initial models were built using supervised classification techniques like K-Nearest Neighbor (KNN), Logistic Regression and Random Forests with Principal component analysis to identify important features.
  • Created data visualizations and reports to convey results and analyze data using Tableau 9.3.
  • Analyzed data using data visualization tools and reported key features using statistical tools and supervised machine learning techniques to achieve project objectives.
  • Maintained data warehouse tables through the loading of data and monitored system configurations to ensure data integrity.
  • Performed data extraction, sampling, advanced data mining and statistical analysis using linear and logistic regression, time series analysis and multivariate analysis within R and Python.
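
As an illustration of the K-means user grouping mentioned in the list above, here is a minimal sketch in plain Python. It is not the production model: the two features, the sample values, and the "first k points" initialization are all assumptions made for the example, and a real project would more likely use a library implementation such as scikit-learn's.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def kmeans(points: Sequence[Point], k: int, iterations: int = 20) -> List[int]:
    """Lloyd's algorithm; returns a cluster label for each point."""
    # Simplistic initialization (assumption): use the first k points as centroids.
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return labels

# Hypothetical users described by (sessions per week, average basket size).
users = [(1.0, 2.0), (9.0, 8.5), (1.5, 1.8), (8.5, 9.0), (1.2, 2.2), (9.2, 8.8)]
labels = kmeans(users, k=2)
```

With this toy data the light users and the heavy users end up in separate groups; on real data the features would normally be scaled first and several initializations compared.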

Environment: Windows 8, SAS/EG, Python, MS SQL Server, Oracle, Teradata, MS-Excel, Tableau 8.2, AWS Redshift, SSIS, SSRS, Snowflake, Star schema, Hadoop, ETL, DB2, MDM, HDFS.

Confidential, NJ

Sr. Data Engineer

Responsibilities:

  • Gathered and documented MDM application, conversion and integration requirements.
  • Interacting with Business Analysts and Developers in identifying the requirements, designing and implementing the Database Schema.
  • Performing codebase maintenance and quality checks for Microsoft Azure.
  • Recreating existing application logic and functionality in the Azure Data Lake, Data Factory, SQL Database and SQL data warehouse environment.
  • Developing dashboards and visualizations to help business users analyze data and to provide data insight to upper management, with a focus on products like SSRS and Power BI.
  • Documenting and maintaining database system specifications, diagrams, and connectivity charts.
  • Participating in T-SQL code reviews and technical quality standards reviews with the development teams.
  • Involved with Query Optimization to increase the performance.
  • Supporting solution architects in problem analysis and solution design.
  • Developing and optimizing Stored Procedures, Views, and User-Defined Functions for the Application.
  • Supporting Data and Analytics/Transformation Architecture teams building a data strategy aligned with global strategic direction: developing canonical and other models as required, implementing data architecture platforms, solutions and data services, developing the MDM foundation, and participating in the design and implementation of a unified data warehouse.
  • Developing physical data models and creating DDL scripts to create database schema and database objects.
  • Created Clustered and Non-Clustered Indexes to improve data access performance.
  • Identified relationships between tables and enforced referential integrity using foreign key constraints.
  • Created Functional Design Documents and Transaction Definition Documents.
  • Implemented metadata standards, data governance and stewardship, master data management, ETL, ODS, data warehouse, data marts, reporting, dashboard, analytics, segmentation, and predictive modelling
  • Designing dashboards and reports, parameterized reports, predictive analysis in Power BI.
  • Creating dashboards with Combination Charts, Custom Charts based on the requirement.
  • Deploying and managing user permissions for reports and dashboards on Power BI web portal.
  • Creating DAX queries to generate computed columns in Power BI.
  • Evaluated data profiling, cleansing, integration and extraction tools (e.g. Informatica).
  • Responsible for the Database backup and Restoration using SQL native tool.
  • Partnering closely with business and IT teams in meeting the deadlines pertaining to design and development deliverables and maintaining audit and compliance needs.
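
The index and foreign-key items in the list above can be illustrated with a small sketch. It uses Python's built-in sqlite3 module purely as a stand-in for the SQL Server/Azure SQL environment described here (SQLite has no clustered/non-clustered distinction), and the table, column and index names are hypothetical.

```python
import sqlite3

def build_demo_db() -> sqlite3.Connection:
    """Create a hypothetical customer/orders schema with a foreign key and an index."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE orders ("
        " order_id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL REFERENCES customer(customer_id),"
        " amount REAL NOT NULL)"
    )
    # Secondary index to speed up lookups of orders by customer.
    conn.execute("CREATE INDEX ix_orders_customer ON orders (customer_id)")
    return conn

def try_insert_order(conn, order_id, customer_id, amount) -> bool:
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute(
            "INSERT INTO orders VALUES (?, ?, ?)", (order_id, customer_id, amount)
        )
        return True
    except sqlite3.IntegrityError:
        return False

conn = build_demo_db()
conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
accepted = try_insert_order(conn, 10, 1, 99.5)   # customer 1 exists
rejected = try_insert_order(conn, 11, 42, 5.0)   # customer 42 does not exist
```

The second insert fails because the foreign key constraint refuses an order that points at a missing customer, which is the referential-integrity guarantee the bullet refers to.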

Environment: SQL, Business Objects XIR2, ETL Tools Informatica 8.6/9.1, Oracle 11G, Enterprise BI in Azure with Azure Data Lake/Synapse, Microsoft Power BI

Confidential, Dayton OH

HealthCare Data Analyst

Responsibilities:

  • Analyzed client product data and ingested it into Master Data Management (MDM) with compliance oversight of data governance standards.
  • Investigated market sizing, competitive analysis, and positioning for product feasibility.
  • Wrote SQL scripts for various MDM tables that link each customer’s demographic details with their associated products and map them to persistent IDs that uniquely identify each client.
  • Automated the mastering of customers' daily transactions and their ingestion into MDM.
  • Performed data management projects and fulfilled ad-hoc requests per user specifications using data management software and tools like Perl, Toad, MS Access, Excel, and SQL.
  • Wrote SQL scripts to test the mappings and developed a Traceability Matrix of Business Requirements mapped to Test Scripts to ensure that any change control in requirements leads to a test case update.
  • Generated graphs and reports using the ggplot package in RStudio for analytical models.
  • Loaded, cleaned, exported and analyzed medical/pharmacy claims and eligibility data.
  • Applied knowledge of medical coding (ICD-9, ICD-10, HCPCS and CPT-4) to let clients know the outcome of medical claims.
  • Collaborated with operations team and developed analytics solution for company management to identify trends, demand and assist them with pricing strategy using Tibco Spotfire.
  • Leveraged R’s statistical packages with Spotfire, implementing data functions and TERR code in Spotfire analysis files to enhance data mining and predictive analysis capabilities.
  • Worked with AWS S3, AWS Glue and Amazon DynamoDB for extracting and transforming data from various data sources and ingesting it into MDM.
  • Developed various workbooks in Tableau from multiple data sources.
  • Created dashboards in Power BI to visualize data.
  • Used Alteryx Designer to blend the data and to validate data lineage.
  • Performed analysis using SAS JMP.
  • Wrote connectors to extract data from databases.
  • Analyzed mainframe data to generate reports for business users.
  • Identified and recorded defects with the information required for the development team to reproduce each issue.

Environment: Tableau, SQL, Business Objects XIR2, ETL Tools Informatica 8.6/9.1, Oracle 11G, Teradata V2R12/ R13.10, Teradata SQL Assistant 12, Spotfire Professional 4.5/5, Spotfire Web Player.

Confidential, NEW YORK

Sr Data Engineer

Responsibilities:

  • Responsible for building scalable distributed data solutions using EMR cluster environment with Amazon EMR 5.6.1.
  • Worked on the Kafka REST API to collect and load data onto the Hadoop file system, and used Sqoop to load data from relational databases.
  • Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved it in Parquet format in HDFS.
  • Used Spark Streaming APIs to perform the necessary transformations and actions on data received from Kafka and persisted it into HDFS.
  • Developed Spark scripts by writing custom RDDs in Scala for data transformations and performing actions on RDDs.
  • Worked on creating Spring-Boot services for Oozie orchestration.
  • Deployed Spring-Boot entity services for Audit Framework of the loaded data.
  • Worked with Avro, Parquet and ORC file formats and compression techniques like LZO.
  • Used Hive to form an abstraction on top of structured data residing in HDFS and implemented Partitions, Dynamic Partitions and Buckets on Hive tables.
  • Used the Spark API over Hadoop YARN as the execution engine for data analytics using Hive.
  • Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark using Scala.
  • Worked on migrating MapReduce programs into Spark transformations using Scala.
  • Designed, developed data integration programs in a Hadoop environment with NoSQL data store Cassandra for data access and analysis.
  • Used the Apache Oozie job scheduler to execute workflows.
  • Used Ambari to monitor node health and the status of jobs in Hadoop clusters.
  • Designing and implementing data warehouses and data marts using components of Kimball Methodology, like Data Warehouse Bus, Conformed Facts & Dimensions, Slowly Changing Dimensions, Surrogate Keys, Star Schema, Snowflake Schema, etc.
  • Worked on Tableau to build customized interactive reports, worksheets and dashboards.
  • Implemented Kerberos for strong authentication to provide data security.
  • Implemented LDAP and Active Directory for Hadoop clusters.
  • Involved in performance tuning of Spark jobs using caching and taking full advantage of the cluster environment.
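
The Kimball-style items in the list above (Slowly Changing Dimensions, Surrogate Keys) can be sketched as follows. This is a minimal in-memory illustration of a Type 2 change, assuming a hypothetical customer dimension with a single tracked attribute; it is not the warehouse code used on the project, where the same logic would typically be expressed in SQL or Spark.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

@dataclass
class DimRow:
    surrogate_key: int               # warehouse-generated key
    customer_id: str                 # natural (business) key
    city: str                        # tracked attribute
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version

def apply_scd2(dim: List[DimRow], customer_id: str, city: str, as_of: date) -> None:
    """Apply a Type 2 change in place: expire the current row, append a new version."""
    current = next(
        (r for r in dim if r.customer_id == customer_id and r.valid_to is None),
        None,
    )
    if current is not None and current.city == city:
        return  # attribute unchanged, nothing to record
    if current is not None:
        current.valid_to = as_of  # close out the old version
    next_key = max((r.surrogate_key for r in dim), default=0) + 1
    dim.append(DimRow(next_key, customer_id, city, as_of))

dim: List[DimRow] = []
apply_scd2(dim, "C1", "Denver", date(2020, 1, 1))  # first version
apply_scd2(dim, "C1", "Boston", date(2021, 6, 1))  # change: new version added
apply_scd2(dim, "C1", "Boston", date(2022, 1, 1))  # no change: ignored
```

Fact rows would reference `surrogate_key` rather than `customer_id`, so history is preserved: facts loaded before the move keep pointing at the Denver version.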

Environment: AWS (S3, EMR, Lambda, CloudWatch), Amazon Redshift, Athena, Hive, HDFS, Spark (Java, Scala), Oozie, Bitbucket, GitHub, Snowflake

Confidential

Sr Data Analyst

Responsibilities:

  • Gathered and documented MDM application, conversion and integration requirements.
  • Interacting with Business Analysts and Developers in identifying the requirements, designing and implementing the Database Schema.
  • Performing codebase maintenance and quality checks for Microsoft Azure.
  • Documenting and maintaining database system specifications, diagrams, and connectivity charts.
  • Participating in T-SQL code reviews and technical quality standards reviews with the development teams.
  • Involved with Query Optimization to increase the performance.
  • Supporting solution architects in problem analysis and solution design.
  • Developing and optimizing Stored Procedures, Views, and User-Defined Functions for the Application.
  • Supporting Data and Analytics/Transformation Architecture teams building a data strategy aligned with global strategic direction: developing canonical and other models as required, implementing data architecture platforms, solutions and data services, developing the MDM foundation, and participating in the design and implementation of a unified data warehouse.
  • Developing physical data models and creating DDL scripts to create database schema and database objects.
  • Created Clustered and Non-Clustered Indexes to improve data access performance.
  • Identified relationships between tables and enforced referential integrity using foreign key constraints.
  • Created Functional Design Documents and Transaction Definition Documents.
  • Implemented metadata standards, data governance and stewardship, master data management, ETL, ODS, data warehouse, data marts, reporting, dashboard, analytics, segmentation, and predictive modelling
  • Designing dashboards and reports, parameterized reports, predictive analysis in Power BI.
  • Creating dashboards with Combination Charts, Custom Charts based on the requirement.
  • Deploying and managing user permissions for reports and dashboards on Power BI web portal.
  • Creating DAX queries to generate computed columns in Power BI.
  • Evaluated data profiling, cleansing, integration and extraction tools (e.g. Informatica).
  • Partnering closely with business and IT teams in meeting the deadlines pertaining to design and development deliverables and maintaining audit and compliance needs.

Environment: SQL, Business Objects XIR2, ETL Tools Informatica 8.6/9.1, Oracle 11G, Enterprise BI in Azure with Azure Data Lake/Synapse, Microsoft Power BI

Confidential

Python Developer

Responsibilities:

  • Involved in developing an application using the Flask framework. Took a test-driven approach to developing the application and implemented unit tests using the Python unittest framework.
  • Used jQuery and Ajax calls for transmitting JSON data objects between frontend and controllers.
  • Developed and executed complex SQL queries on a PostgreSQL database using Python’s Psycopg connector library.
  • Worked on report writing using SQL Server Reporting Services (SSRS) and in creating various types of reports such as a table, matrix, chart report, web reporting by customizing URL Access.
  • Used Git as the version control tool to maintain the code base and to collaborate with team members. Debugged and troubleshot the application after every sprint release.
  • Used Jenkins pipelines to drive all microservices builds out to the Docker registry and then deployed to Kubernetes. Created and managed Kubernetes Pods.
  • Established a continuous integration process for the application deployments. Set up a Jenkins server and created Jenkins jobs to build and deploy the application in different environments using Maven and different plug-ins.
  • Developed transformation logic for BI tools (Informatica) for data transformation into various layers in Data warehouse.
  • Utilized SQL to develop stored procedures and views that create result sets to meet varying reporting requirements.
  • Used advanced Excel formulas and features (lookup functions, pivot tables, IF statements, etc.) for analyzing data.
  • Identified process improvements that significantly reduce workloads or improve quality.
  • Worked for BI Analytics team to conduct A/B testing, data extraction and exploratory analysis.
  • Generated dashboards and presented the analysis to researchers explaining insights on the data.
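
For the A/B testing item in the list above, one common analysis is a two-proportion z-test on conversion rates. The sketch below is a generic illustration of that test, not the team's actual method; the function name and the conversion counts are hypothetical.

```python
import math
from typing import Tuple

def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for H0: both variants convert at the same rate."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical experiment: 100/1000 conversions for A versus 150/1000 for B.
z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
```

A small p-value suggests the difference in conversion rates is unlikely to be due to chance alone; the normal approximation assumes reasonably large samples in both groups.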

Environment: Excel 2010, R, Informatica Power Center 9.0, MS SQL Server
