
Data Scientist Resume


NY

PROFESSIONAL SUMMARY:

  • Over 8 years of experience in designing, administering, analyzing, and managing Business Intelligence, data warehousing, web-based applications, and databases, with experience in industries such as Retail, Financial, Accounting, Distribution, Logistics, Inventory, Manufacturing, Marketing, Services, Networking, and Engineering.
  • Experience with the latest BI tools: Tableau, QlikView dashboard design, and SAS.
  • Strong data analysis skills using business intelligence tools, SQL, and/or MS Office tools.
  • Experience working in Agile/Scrum methodologies to accelerate software development iterations.
  • Familiar with building models on big data frameworks such as Hadoop managed with Cloudera Manager.
  • Experience in big data with Hadoop, HDFS, MapReduce, and Spark.
  • Experience managing the full life cycle of data science projects, including transforming business requirements into data collection, data cleaning, data preparation, data validation, data mining, and data visualization from structured and unstructured data sources.
  • Sound RDBMS concepts; worked extensively with Oracle 8i/9i/10g/11g, DB2, SQL Server 8.0/9.0/10.0/10.5/11.0, MySQL, and MS Access.
  • Expertise in project management, analysis, and estimation, with a unique mix of managerial, functional, domain, technical, and client-handling skills.
  • Expertise and experience in SQL, SAS, and relational databases; deep understanding of and exposure to data mining.
  • Excellent knowledge of normalization (1NF, 2NF, 3NF, and BCNF) and de-normalization techniques for improved database performance in OLTP, OLAP, and data warehouse/data mart environments.
  • 2+ years of Agile experience in software/data design, development, and deployment, building services and providing customer support for enterprise applications using Object-Oriented Analysis and Design (OOAD).
  • Experience in applying predictive modeling and machine learning algorithms for analytical reports.
  • Strong analytical and problem-solving skills, along with the ability to understand current business processes and implement efficient solutions to issues/problems.
  • Experience using technology to work efficiently with datasets: scripting, data-cleansing tools, and statistical software packages.
  • Strong understanding of how analytics supports a large organization, including the ability to articulate the linkage between business objectives, analytical approaches and findings, and business decisions.
  • Experience using Python and statistical software (R, Excel, Tableau).
  • Experience with Azure, SQL, and Oracle PL/SQL.
  • Experience working with Amazon Web Services (AWS) products such as S3.
  • Involved in start-up mode at Aveva, contributing to projects that used Amazon Web Services (AWS) to develop and deploy applications supported on device and cloud.
  • Hands-on experience with scripting languages such as Perl, Bash shell, and PHP (for automation).
  • Familiar with nightly build management tools like Visual Studio Team Foundation Server (VSTFS)
  • Experience developing software in traditional programming languages (C, C++) using tools such as MS Visual Studio Compact Framework (VSCF) for the Windows Mobile platform.
  • Familiar with configuration management and source-code repository management using Subversion (SVN), Git, and MS Team Foundation Version Control (TFVC).
  • Good experience in database design using PL/SQL, SQL, and T-SQL to write stored procedures, functions, triggers, and views.
  • Extensive experience in data modeling, data analysis, and design of OLTP and OLAP systems.
  • Expertise in data analysis, design, development, implementation, and testing using data conversions, Extraction, Transformation and Loading (ETL), SQL Server, Oracle, and other relational and non-relational databases.

TECHNICAL SKILLS:

Languages: T-SQL, PL/SQL, SQL, C, C++, XML, HTML, DHTML, HTTP, MATLAB, DAX, Python

Databases: SQL Server 2014/2012/2008/2005/2000, MS Access, Oracle 11g/10g/9i, Teradata, and Hadoop (big data)

DWH / BI Tools: Microsoft Power BI, Tableau, SSIS, SSRS, SSAS, Business Intelligence Development Studio (BIDS), Visual Studio, Crystal Reports, Informatica 6.1.

Database Design Tools and Data Modeling: MS Visio, ERWIN 4.5/4.0, Star Schema/Snowflake Schema modeling, Fact & Dimension tables, physical & logical data modeling, Normalization and De-normalization techniques, Kimball & Inmon Methodologies

Tools and Utilities: SQL Server Management Studio, SQL Server Enterprise Manager, SQL Server Profiler, Import & Export Wizard, Visual Studio .NET, Microsoft Management Console, Visual SourceSafe 6.0, DTS, Crystal Reports, Power Pivot, ProClarity, Microsoft Office, Excel Power Pivot, Excel Data Explorer, Tableau, JIRA, Spark MLlib

PROFESSIONAL EXPERIENCE:

Data Scientist

Confidential, NY

Responsibilities:

  • As an architect, designed conceptual, logical, and physical models using Erwin and built data marts using hybrid Inmon and Kimball DW methodologies.
  • Worked closely with business, data governance, SMEs and vendors to define data requirements.
  • Designed and provisioned the platform architecture to execute Hadoop and machine learning use cases on cloud infrastructure (AWS, EMR, and S3).
  • Selected statistical algorithms (two-class logistic regression, boosted decision trees, decision forest classifiers, etc.).
  • Used MLlib, Spark's machine learning library, to build and evaluate different models (see the sketch after this list).
  • Executed ad-hoc data analysis for customer insights using SQL on an Amazon AWS Hadoop cluster.
  • Worked on predictive and what-if analysis using R on data in HDFS; successfully loaded files to HDFS from Teradata and loaded data from HDFS into Hive.
  • Designed the schema, then configured and deployed AWS Redshift for optimal storage and fast retrieval of data.
  • Worked with Teradata 14 tools such as FastLoad, MultiLoad, TPump, FastExport, Teradata Parallel Transporter (TPT), and BTEQ.
  • Involved in creating a data lake by extracting customers' big data from various data sources into Hadoop HDFS, including data from Excel, flat files, Oracle, SQL Server, MongoDB, Cassandra, HBase, Teradata, and Netezza, as well as log data from servers.
  • Used Spark DataFrames, Spark SQL, and Spark MLlib extensively; designed and developed POCs using Scala, Spark SQL, and the MLlib libraries.
  • Created the high-level ETL design document and assisted ETL developers in the detailed design and development of ETL maps using Informatica.
  • Used R and SQL to create statistical algorithms involving multivariate regression, linear regression, logistic regression, PCA, random forest models, decision trees, and support vector machines for estimating the risks of welfare dependency.
  • Helped in migration and conversion of data from the Sybase database into Oracle database, preparing mapping documents and developing partial SQL scripts as required.
  • Generated ad-hoc SQL queries using joins, database connections and transformation rules to fetch data from legacy Oracle and SQL Server database systems.
  • Analyzed data and predicted end-customer behaviors and product performance by applying machine learning algorithms using Spark MLlib.
  • Performed data mining using very complex SQL queries to discover patterns, and used extensive SQL for data profiling/analysis to provide guidance in building the data model.
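
A minimal PySpark sketch of the Spark MLlib model-building step referenced above (building and evaluating a two-class logistic regression model); the input path and the column names (tenure, monthly_spend, churned) are hypothetical placeholders, not details from the original project:

    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.evaluation import BinaryClassificationEvaluator

    spark = SparkSession.builder.appName("churn-poc").getOrCreate()
    df = spark.read.csv("hdfs:///data/customers.csv", header=True, inferSchema=True)  # hypothetical path

    # Assemble the numeric predictors into a single feature vector.
    assembler = VectorAssembler(inputCols=["tenure", "monthly_spend"], outputCol="features")
    data = assembler.transform(df).select("features", "churned")
    train, test = data.randomSplit([0.8, 0.2], seed=42)

    # Fit a two-class logistic regression model and score the held-out split.
    model = LogisticRegression(labelCol="churned", featuresCol="features").fit(train)
    auc = BinaryClassificationEvaluator(labelCol="churned").evaluate(model.transform(test))
    print("AUC:", auc)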

Environment: R, Machine Learning, Teradata 14, Hadoop MapReduce, PySpark, Spark, Spark MLlib, Informatica, SQL, Excel, VBA, BO, CSV, Erwin, SAS, AWS Redshift, ScalaNLP, Cassandra, Oracle, MongoDB, Informatica MDM, Cognos, SQL Server 2012, DB2, SPSS, T-SQL, PL/SQL, Flat Files, XML, and Tableau

Data Scientist

Confidential, VA

Responsibilities:

  • Involved in data management including data modeling, metadata, data analysis, data mapping, and data dictionaries using Erwin 9.1; involved in data modeling (Oracle/MySQL/Netezza), data characterization, and workflow design and implementation.
  • Used Erwin Data Modeler and Erwin Model Manager to create conceptual, logical, and physical data models and to maintain model versions in Model Manager for further enhancements.
  • Worked on logical and physical modeling of various data marts as well as the data warehouse using Teradata 14.
  • Created and maintained the Logical Data Model (LDM) for the project, including documentation of all entities, attributes, data relationships, primary and foreign key structures, allowed values, codes, and glossary terms in accordance with the corporate data dictionary.
  • Implemented a bulk load process by converting existing triggers to Oracle 11g packages to improve the data loading process.
  • Designed and developed Oracle 11g PL/SQL procedures and UNIX shell scripts for data import/export and data conversions.
  • Maintained warehouse metadata, naming standards, and warehouse standards.
  • Involved in validation of OLAP unit testing and system testing of OLAP report functionality and the data displayed in the reports.
  • Worked with Teradata 14.1 tools such as FastLoad, MultiLoad, TPump, FastExport, Teradata Parallel Transporter (TPT), and BTEQ.
  • Designed the overall ETL solution, including data analysis and preparation of high-level and detailed design documents, test plans, and the deployment strategy.
  • Strong knowledge of entity-relationship concepts, fact and dimension tables, slowly changing dimensions, and dimensional modeling (star schema and snowflake schema).
  • Created the Software Development Life Cycle (SDLC) document and excelled in the change management, release management, and configuration processes.
  • Prepared data visualization reports for management using R.
  • Documented logical, physical, relational, and dimensional data models; designed the data marts with dimensional data modeling using star and snowflake schemas.
  • Involved in normalization and de-normalization of existing tables for faster query retrieval.
  • Created MDM base objects and landing and staging tables to follow the comprehensive data model in MDM.
  • Used ETL methodology to support data extraction, transformation, and loading in a complex MDM environment using Informatica.
  • Performed data validation and data cleaning using PROC SORT, PROC FREQ, and various SAS formats.
  • Designed the data staging area and data warehouse to integrate data from various sources, including flat files, to help management make more fact-based decisions.
  • Created jobs and alerts to run SSIS and SSRS packages periodically, and automated processes such as database backups and sequential SSIS/SSRS package runs using SQL Server Agent jobs and Windows Scheduler.
  • Managed data in different locations across different data marts using Azure.
  • Performed reverse engineering of physical data models from databases and SQL scripts.
  • Involved in normalization (third normal form) and de-normalization (star schema) for data warehousing.
  • Used SSIS to create ETL packages to validate, extract, transform, and load data into data warehouse, data mart, and OLAP databases.
  • Implemented slowly changing dimensions Type 2 and Type 3 to preserve the history of data changes (see the sketch after this list).
  • Extensively used SAS to query and subset data, summarize and present data, combine tables using joins and merges, and create and modify tables.
  • Combined views and reports into interactive dashboards in Tableau Desktop that were presented to business users, program managers, and end users.
  • Utilized SDLC and Agile methodologies such as SCRUM.
  • Worked on PL/SQL programming (stored procedures, triggers, packages) using Oracle (SQL, PL/SQL) and SQL Server 2008, with UNIX shell scripting for job scheduling.
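
A small, hypothetical Python/pandas illustration of the Slowly Changing Dimension Type 2 logic mentioned above (the project itself implemented it in SSIS/SQL): when a tracked attribute changes, the current row is expired and a new version is appended. The table and column names are placeholders.

    import pandas as pd

    def scd2_upsert(dim, updates, key, tracked, today):
        """Apply SCD Type 2 updates: expire changed rows, append new versions."""
        current = dim[dim["is_current"]]
        merged = current.merge(updates, on=key, suffixes=("", "_new"))
        # Keys whose tracked attributes differ from the incoming values.
        changed = merged[
            (merged[tracked].values != merged[[c + "_new" for c in tracked]].values).any(axis=1)
        ][key]
        # Expire the old versions of the changed rows.
        mask = dim[key].isin(changed) & dim["is_current"]
        dim.loc[mask, ["is_current", "end_date"]] = [False, today]
        # Append the new versions as the current rows.
        new_rows = updates[updates[key].isin(changed)].copy()
        new_rows["start_date"], new_rows["end_date"], new_rows["is_current"] = today, None, True
        return pd.concat([dim, new_rows], ignore_index=True)

    dim = pd.DataFrame({"cust_id": [1], "city": ["NY"], "start_date": ["2015-01-01"],
                        "end_date": [None], "is_current": [True]})
    updates = pd.DataFrame({"cust_id": [1], "city": ["VA"]})
    print(scd2_upsert(dim, updates, key="cust_id", tracked=["city"], today="2016-06-01"))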

Environment: Erwin 9.1, Teradata 14, Oracle 10g, PL/SQL, UNIX, Agile, Azure, TIDAL, MDM, ETL, BTEQ, SQL Server 2008, Informatica MDM, Netezza, DB2, SAS, Tableau, SSRS, SSIS, T-SQL, Informatica, SQL

Data/Business Analyst

Confidential, New York, NY

Responsibilities:

  • Coded R functions to interface with the Caffe deep learning framework.
  • Worked in the Amazon Web Services (AWS) cloud computing environment.
  • Worked with several R packages including knitr, dplyr, SparkR, CausalInfer, and spacetime.
  • Implemented end-to-end systems for data analytics and data automation, integrated with custom visualization tools, using R, Mahout, Hadoop, and MongoDB.
  • Gathered all required data from multiple data sources and created the datasets used in analysis.
  • Performed exploratory data analysis and data visualization using R and Tableau.
  • Performed thorough EDA, with univariate and bivariate analysis, to understand intrinsic and combined effects.
  • Worked with data governance, data quality, data lineage, and the data architect to design various models and processes.
  • Independently coded new programs and designed tables to load and test the programs effectively for the given POCs using big data/Hadoop.
  • Designed data models and data flow diagrams using Erwin and MS Visio.
  • Performed data cleaning and imputation of missing values using R (see the sketch after this list).
  • Worked with the Hadoop ecosystem, covering HDFS, HBase, YARN, and MapReduce.
  • Took up ad-hoc requests from different departments and locations.
  • Used Hive to store data and to perform data cleaning steps on huge datasets.
  • Created dashboards and visualizations on a regular basis using ggplot2 and Tableau.
  • Created customized business reports and shared insights with management.
  • Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
  • Interacted with other departments to understand and identify data needs and requirements, and worked with other members of the IT organization to deliver data visualization and reporting solutions that addressed those needs.
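
A hypothetical Python/pandas sketch of the cleaning and missing-value imputation step above (the original work used R); the file name and columns are illustrative placeholders:

    import pandas as pd

    df = pd.read_csv("survey.csv")  # hypothetical input file

    # Drop rows missing the key identifier entirely.
    df = df.dropna(subset=["customer_id"])

    # Impute numeric columns with the median and categorical columns with the mode.
    for col in df.select_dtypes(include="number").columns:
        df[col] = df[col].fillna(df[col].median())
    for col in df.select_dtypes(include="object").columns:
        mode = df[col].mode()
        if not mode.empty:
            df[col] = df[col].fillna(mode.iloc[0])

    print(df.isna().sum())  # verify which columns still contain missing values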

Environment: Erwin r9.0, Informatica 9.0, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, Rational Rose, Requisite Pro, Hadoop, PL/SQL, etc.

Data/Business Analyst

Confidential, Minneapolis MN

Responsibilities:

  • Participated in JAD sessions, gathered information from Business Analysts, end users and other stakeholders to determine the requirements.
  • Worked with data warehousing methodologies and dimensional data modeling techniques, such as star/snowflake schemas, using Erwin 9.1.
  • Extensively used Aginity Netezza Workbench to perform DDL, DML, and other operations on the Netezza database.
  • Performed daily monitoring of Oracle instances using Oracle Enterprise Manager, ADDM, and TOAD; monitored users, tablespaces, memory structures, rollback segments, logs, and alerts.
  • Used ER/Studio Data Modeler for data modeling (data requirements analysis, database design, etc.) of custom-developed information systems, including databases of transactional systems and data marts.
  • Involved in Teradata SQL development, unit testing, and performance tuning, ensuring that testing issues were resolved using defect reports.
  • Generated DDL scripts using forward engineering to create objects and deploy them into the databases.
  • Involved in database testing, writing complex SQL queries to verify transactions and business logic, such as identifying duplicate rows, using SQL Developer and PL/SQL Developer (see the sketch after this list).
  • Used Teradata SQL Assistant, Teradata Administrator, PMON, and data load/export utilities such as BTEQ, FastLoad, MultiLoad, FastExport, and TPump on UNIX/Windows environments, and ran the batch processes for Teradata.
  • Worked on data profiling and data validation to ensure the accuracy of the data between the warehouse and source systems.
  • Worked on data warehouse concepts such as data warehouse architecture, star schema, snowflake schema, data marts, and dimension and fact tables.
  • Developed SQL queries to fetch complex data from different tables in remote databases using joins, database links, and bulk collects.
  • Wrote T-SQL objects such as indexes, views, stored procedures, and triggers in SSMS to fulfill requirements.
  • Involved in database migrations from legacy systems and SQL Server to Oracle and Netezza.
  • Used SSIS to create ETL packages to validate, extract, transform, and load data from source servers to a staging database, and then to the Netezza and DB2 databases.
  • Worked on SQL Server components SSIS (SQL Server Integration Services), SSAS (Analysis Services), and SSRS (Reporting Services).
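
A hypothetical Python sketch of the duplicate-row check described above, run through pyodbc; the DSN, table, and column names are placeholders, not details from the original system:

    import pyodbc

    conn = pyodbc.connect("DSN=staging_dw")  # hypothetical ODBC data source

    # Business keys that occur more than once are duplicates.
    sql = """
        SELECT cust_id, order_date, COUNT(*) AS n
        FROM stage.orders
        GROUP BY cust_id, order_date
        HAVING COUNT(*) > 1
    """
    for cust_id, order_date, n in conn.cursor().execute(sql):
        print(f"duplicate key ({cust_id}, {order_date}) occurs {n} times")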

Environment: Windows XP, SQL Developer, MS SQL Server 2008 R2, MS Access, MS Excel, SQL*Plus, Java

Data/Business Analyst

Confidential

Responsibilities:

  • Wrote and executed customized SQL code for ad-hoc reporting duties and used other tools for routine reporting.
  • Developed stored procedures and complex packages extensively using PL/SQL and shell programs.
  • Involved in customized reports using the SAS/MACRO facility, PROC REPORT, PROC TABULATE, and other SAS procedures.
  • Generated ad-hoc SQL queries using joins, database connections, and transformation rules to fetch data from legacy SQL Server database systems.
  • Used existing UNIX shell scripts and modified them as needed to process SAS jobs, search strings, execute permissions over directories etc.
  • Extensively used star schema methodologies in building and designing the logical data model into dimensional models (a sample star-schema rollup query appears after this list).
  • Involved in designing context flow diagrams, structure charts, and ER diagrams.
  • Worked with database features and objects such as partitioning, change data capture, indexes, views, and indexed views to develop an optimal physical data model.
  • Worked with SQL Server Integration Services to extract data from several source systems, transform the data, and load it into the ODS.
  • Involved in data analysis, data validation, data cleansing, data verification, and identifying data mismatches.
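
A hypothetical Python sketch of the kind of ad-hoc star-schema rollup referenced above; the DSN, fact/dimension tables, and columns are illustrative placeholders:

    import pyodbc

    conn = pyodbc.connect("DSN=ods")  # hypothetical ODBC data source

    # Join the fact table to its dimensions: a typical star-schema rollup.
    sql = """
        SELECT d.calendar_year, p.product_category, SUM(f.sales_amount) AS total_sales
        FROM fact_sales AS f
        JOIN dim_date AS d ON f.date_key = d.date_key
        JOIN dim_product AS p ON f.product_key = p.product_key
        GROUP BY d.calendar_year, p.product_category
        ORDER BY d.calendar_year, total_sales DESC
    """
    for year, category, total in conn.cursor().execute(sql):
        print(year, category, total)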

Environment: Windows XP, SQL Developer, MS SQL Server 2008 R2, MS Access, MS Excel, SQL*Plus, Java

Data Analyst/Data Modeler

Confidential

Responsibilities:

  • Worked with project team representatives to ensure that logical and physical ER/Studio data models were developed in line with corporate standards and guidelines.
  • Involved in defining the source-to-target data mappings, business rules, and data definitions.
  • Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
  • Responsible for defining the key identifiers for each mapping/interface.
  • Responsible for defining the functional requirement documents for each source-to-target interface.
  • Documented, clarified, and communicated change requests with the requestor, and coordinated with the development and testing teams.
  • Worked with users to identify the most appropriate source of record and to profile the data required for sales and service.
  • Documented the complete process flow to describe program development, logic, testing, implementation, application integration, and coding.
  • Involved in defining the business/transformation rules applied to sales and service data.
  • Defined the list codes and code conversions between the source systems and the data mart (a small conversion-lookup sketch follows this list).
  • Worked with internal architects, assisting in the development of current- and target-state data architectures.
  • Coordinated with business users to design new reporting needs in an appropriate, effective, and efficient way based on the existing functionality.
  • Remained knowledgeable in all areas of business operations in order to identify system needs and requirements.
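
A hypothetical Python sketch of a source-to-target code conversion like the one described above: each source system's raw codes are mapped onto the data mart's conformed codes. All system names and code values here are illustrative placeholders.

    # Conformed-code lookup keyed by (source system, source code).
    CODE_MAP = {
        ("CRM", "A"): "ACTIVE",
        ("CRM", "I"): "INACTIVE",
        ("BILLING", "01"): "ACTIVE",
        ("BILLING", "09"): "CLOSED",
    }

    def convert_code(source_system, source_code):
        """Return the conformed data-mart code, or 'UNKNOWN' if unmapped."""
        return CODE_MAP.get((source_system, source_code), "UNKNOWN")

    assert convert_code("CRM", "A") == "ACTIVE"
    assert convert_code("BILLING", "02") == "UNKNOWN"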

Environment: Erwin r7.0, SQL Server 2000/2005, Windows XP/NT/2000, Oracle 8i/9i, MS-DTS, UML, UAT, SQL Loader, OOD, OLTP, PL/SQL, MS Visio, Informatica.
