
Data Scientist Resume

Warsaw, Indiana


  • Over 8 years of IT industry experience in Application Design, Development, and Data Management: Data Governance, Data Architecture, Data Modeling, Data Warehousing and BI, Data Integration, Metadata, Reference Data, and MDM.
  • Experience with emerging technologies such as Big Data, Hadoop, and NoSQL.
  • Experience in importing and exporting data from different relational databases like MySQL, Netezza, and Oracle 12c into HDFS and Hive using Sqoop.
  • Highly efficient in Data Mart design and creation of cubes using dimensional data modeling, identifying Facts and Dimensions, Star Schema and Snowflake Schema.
  • Solid hands-on experience in creating and implementing Conceptual, Logical and Physical Models for Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP).
  • Efficient in developing Logical and Physical Data Models and organizing data as per the business requirements using Sybase PowerDesigner, Erwin, and ER/Studio in both OLTP and OLAP applications.
  • Strong experience in analyzing and transforming large data sets by writing Pig scripts and Hive queries with AWS EMR and AWS RDS.
  • Extensive knowledge in Hadoop stack components viz. Apache Hive, Pig Scripting, etc.
  • Skillful in Data Analysis using SQL on Oracle, MS SQL Server, DB2 & Teradata.
  • Good knowledge of Data Extraction/Transformation/Loading, Data Conversion and Data Migration using SQL Server Integration Services (SSIS) and PL/SQL scripts.
  • Experience in writing SQL queries and optimizing the queries in Teradata, Netezza, Oracle and SQL Server.
  • Good experience in Normalization for OLTP and De-normalization of Entities for Enterprise Data Warehouse.
  • Excellent knowledge of Machine Learning, Mathematical Modeling and Operations Research. Comfortable with R, Python, SAS, Weka, MATLAB, and relational databases. Deep understanding of and exposure to the Big Data ecosystem.
  • Excellent communication and interpersonal skills in understanding the flow of business processes, and ability to interact with all levels in the Software Development Life Cycle (SDLC).
  • Experience in developing Entity-Relationship diagrams and modeling Transactional Databases and Data Warehouses using tools like Erwin, ER/Studio and PowerDesigner.
  • Good knowledge and experience with Normalization / De-normalization techniques for effective and optimum performance in OLTP and OLAP environments.
  • Efficient in building enterprise data warehouses using the Kimball and Inmon methodologies.
  • Experience in creating Mapping documents for data extraction, transformation and loading.
  • Experience in generating and documenting Metadata while designing OLTP and OLAP systems environment.
  • Experience working on creating models for Teradata master data management.
  • Experience with DBA tasks involving performance tuning, creation of indexes, creating and modifying table spaces for optimization purposes.
  • Experience working with data modeling tools like Erwin, Power Designer and ER Studio.
  • Experience in designing Star Schema and Snowflake Schema for Data Warehouse and ODS architecture.
  • Experience in Data Modeling and Data Analysis using Dimensional Data Modeling and Relational Data Modeling, Star Schema/Snowflake Modeling, Fact and Dimension tables, and Physical and Logical Data Modeling.
  • Experience in big data analysis and developing data models using Hive, Pig, MapReduce, and SQL, with strong data architecting skills in designing data-centric solutions.
  • Hands-on experience with big data tools like Hadoop, Spark, Hive, Pig, Impala, PySpark, and Spark SQL.
  • Very good knowledge of and experience with AWS, Redshift, S3 and EMR.
  • Excellent development experience with SQL and the Procedural Language (PL) extensions of databases like Oracle, Teradata, Netezza and DB2.


Machine Learning: Regression, Polynomial Regression, Random Forest, Logistic Regression, Decision Trees, Classification, Clustering, Association, Simple/Multiple Linear Regression, Kernel SVM, K-Nearest Neighbours (K-NN).

OLAP/ BI / ETL Tool: Business Objects 6.1/XI, MS SQL Server 2008/2005 Analysis Services (MS OLAP, SSAS), Integration Services (SSIS), Reporting Services (SSRS), Performance Point Server (PPS), Oracle 9i OLAP, MS Office Web Components (OWC11), DTS, MDX, Crystal Reports 10, Crystal Enterprise 10(CMC)

BI Tools: Tableau, Tableau Server, Tableau Reader, SAP Business Objects, OBIEE, QlikView, SAP Business Intelligence, Amazon Redshift, Azure Data Warehouse

Big Data Technologies: Hadoop, MapReduce, Sqoop, Pig, Hive, NoSQL, Spark, Apache Kafka, Shiny, YARN, Data Frames, pandas, ggplot2, scikit-learn, Theano, CUDA, Azure, HDInsight, etc.

Packages: ggplot2, caret, dplyr, RWeka, gmodels, RCurl, tm, C50, twitteR, NLP, reshape2, rjson, plyr, pandas, NumPy, seaborn, SciPy, matplotlib, scikit-learn, Beautiful Soup, rpy2, SQLAlchemy.

Web Technologies: JDBC, HTML5, DHTML and XML, CSS3, Web Services, WSDL

Tools: Erwin r9.6, r9.5, r9.1, 8.x, Rational Rose, ER/Studio, MS Visio, SAP PowerDesigner.

Languages: Java 8, Python, R.

R Packages: dplyr, sqldf, data.table, randomForest, gbm, caret, elastic net, and other machine learning packages.

Database Tools: SQL Profiler, SQL Query Analyzer, DTS Import/Export, SSRS, SSIS, SSAS, OLAP Services, Informatica PowerCenter, SQL Agents, SQL Alerts, Data Visualization.

Databases: SQL, Hive, Impala, Pig, Spark SQL, SQL Server, MySQL, MS Access, HDFS, HBase, Teradata, Netezza, MongoDB, Cassandra, SAP HANA.

Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio), Tableau, Crystal Reports XI, Business Intelligence, SSRS, Business Objects 5.x/6.x, Cognos 7.0/6.0.

ETL Tools: Informatica PowerCenter, SSIS.

Version Control Tools: SVN, GitHub.

Methodologies: Ralph Kimball and Bill Inmon data warehousing methodology, Rational Unified Process (RUP), Rapid Application Development (RAD), Joint Application Development (JAD).

Operating Systems: Windows, Linux, Unix, Macintosh, Red Hat.


Confidential, Warsaw, Indiana

Data Scientist


  • Heavily involved in a Data Architect role to review business requirements and compose source-to-target data mapping documents.
  • Responsible for data architecture design delivery, data model development, review, approval, and data warehouse implementation.
  • Interacted with other data scientists and architects to build custom solutions for data visualization using tools like Tableau, R Shiny, and R packages.
  • Set strategy and oversaw design for significant data modeling work, such as Enterprise Logical Models, Conformed Dimensions, and Enterprise Hierarchy.
  • Analyzed existing Conceptual and Physical data models and altered them using Erwin to support enhancements.
  • Applied various machine learning algorithms and statistical models, such as decision trees, regression models, neural networks, SVM, and clustering, to identify Volume using the scikit-learn package in Python and MATLAB.
  • Designed the Logical Data Model using Erwin with the entities and attributes for each subject area.
  • Led architectural design on Big Data and Hadoop projects with an idea-driven approach to design.
  • Wrote several Teradata SQL queries using Teradata SQL Assistant for ad hoc data pull requests.
  • Performed statistical data analysis and data visualization using R and Python.
  • Worked extensively with major statistical analysis tools such as R, SQL, SAS, and MATLAB.
  • Created filters and calculated sets for preparing dashboards and worksheets in Tableau.
  • Created data models in Splunk using pivot tables by analyzing vast amounts of data and extracting key information to suit various business requirements.
  • Created new scripts for Splunk scripted inputs to collect system CPU and OS data.
  • Implemented data refreshes on Tableau Server in biweekly and monthly increments, based on business changes, to ensure that the views and dashboards displayed the changed data accurately.
  • Developed and configured the Informatica MDM Hub to support the Master Data Management (MDM), Business Intelligence (BI) and Data Warehousing platforms to meet business needs.
  • Loaded data into Hive tables from the Hadoop Distributed File System (HDFS) to provide SQL access to Hadoop data.
  • Used Agile methodology for data warehouse development.
  • Designed and developed architecture for a data services ecosystem spanning relational, NoSQL, and Big Data technologies.
  • Implemented multi-data center and multi-rack Cassandra cluster.
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from NoSQL and a variety of portfolios.
  • Created entity-relationship diagrams and data flow diagrams, and enforced all referential integrity constraints using Rational Rose.
  • Worked with the ETL team to document the SSIS packages for data extraction to the warehouse environment for reporting purposes.
  • Created UDFs to calculate the pending payment from a given residential or small business customer's quotation data, and used them in Pig and Hive scripts.
  • Developed data marts for the base data in Star Schema and Snowflake Schema as part of developing the data warehouse.
  • Analyzed implementing Spark using Scala and wrote sample Spark programs using PySpark.
  • Involved in data loading using PL/SQL scripts and SQL Server Integration Services packages.
  • Involved in validation of the OLAP system, and in unit testing and system testing of the OLAP report functionality and the data displayed in the reports.
  • Worked on Amazon Redshift and AWS, architecting a solution to load data, create data models, and run BI on it.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from Oracle into HDFS using Sqoop.
  • Led database-level tuning and optimization in support of application development teams on an ad hoc basis.

Environment: Python 2.x, R, CDH5, AWS, HDFS, Hadoop 2.3, Hive, Impala, Linux, Spark, Tableau Desktop, SQL Server 2012, Microsoft Excel, MATLAB, Sqoop, Splunk, Spark SQL, PySpark.

Confidential, Memphis, TN.

Data Scientist


  • Understood the high-level design choices and the defined technical standards for software coding, tools and platforms, and ensured adherence to them.
  • Analyzed customer consumption behavior and assessed customer value with RFM analysis; applied customer segmentation with clustering algorithms such as K-Means Clustering and Hierarchical Clustering.
  • Collaborated with data engineers to implement the ETL process; wrote and optimized SQL queries to perform data extraction and merging from Oracle.
  • Responsible for design and development of advanced R/Python programs to prepare, transform and harmonize data sets in preparation for modeling.
  • Updated Python scripts to match training data with our database stored in AWS CloudSearch, so that each document could be assigned a response label for further classification.
  • Involved in managing backup and restoring data in the live Cassandra cluster.
  • Used R, Python, and Spark to develop a variety of models and algorithms for analytic purposes.
  • Coordinated the execution of A/B tests to measure the effectiveness of a personalized recommendation system.
  • Collected unstructured data from MongoDB and completed data aggregation.
  • Performed data integrity checks, data cleaning, exploratory analysis and feature engineering using R and Python.
  • Developed personalized product recommendations with machine learning algorithms, including Gradient Boosting Trees and Collaborative Filtering, to better meet the needs of existing customers and acquire new customers.
  • Used Agile methodology for data warehouse development using Kanbanize.
  • Rapidly created models in Python using pandas, NumPy, and sklearn, with plot.ly for data visualization. These models were then implemented in SAS, where they interfaced with MS SQL databases and were scheduled to update on a regular basis.
  • Extracted Mega Data from Amazon Redshift, AWS, and the Elasticsearch engine using SQL queries to create reports.
  • Designed both 3NF data models for ODS and OLTP systems and dimensional data models using Star and Snowflake Schemas.
  • Suggested implementing multitasking for the existing Hive architecture in Hadoop, and also suggested UI customization in Hadoop.
  • Involved in planning, defining and designing the database using ER/Studio based on business requirements, and provided documentation.
  • Completed a highly immersive Data Science program involving data manipulation and visualization, web scraping, machine learning, Git, SQL, UNIX commands, Python programming, NoSQL, MongoDB, and Hadoop.
  • Worked on data cleaning, data preparation and feature engineering with Python, including NumPy, SciPy, Matplotlib, Seaborn, pandas, and scikit-learn.
  • Completed enhancements for Master Data Management (MDM) and suggested an implementation for hybrid MDM.
  • Generated ad-hoc reports using Crystal Reports XI.
  • Utilized SQL and HiveQL to query and manipulate data from a variety of data sources, including Oracle and HDFS, while maintaining data integrity.
  • Integrated data from multiple sources, including HDFS, into the Hive data warehouse.
  • Worked very closely with Data Architects and the DBA team to implement data model changes in the database in all environments.

Environment: Machine learning, AWS, MS Azure, Cassandra, Spark, HDFS, Hive, Pig, Linux, Python (scikit-learn/SciPy/NumPy/pandas), R, SAS, SPSS, MySQL, Eclipse, PL/SQL, SQL connector, Tableau.

Confidential, Long Island City, New York.

Data Scientist


  • Developed logical data models and physical database design and generated database schemas using Erwin 8.5.
  • Documented all data mapping and transformation processes in the Functional Design documents based on the business requirements.
  • Prepared high-level Logical Data Models using Erwin, and later translated the model into a physical model using the Forward Engineering technique.
  • Developed MapReduce/Spark Python modules for machine learning and predictive analytics in Hadoop on AWS. Implemented a Python-based distributed random forest via Python streaming.
  • Generated DDL (Data Definition Language) scripts using Erwin and assisted the DBA in physical implementation of data models.
  • Generated SQL scripts and implemented the relevant databases with related properties such as keys, constraints, indexes and sequences.
  • Developed the batch program in PL/SQL for OLTP processing and used UNIX shell scripts scheduled in crontab to run it.
  • Performed advanced statistical analysis and multivariate Big Data manipulation programming in Python and R, depending on project requirements. Provided analytics for different types of IT research studies.
  • Created DDL scripts using Erwin and source-to-target mappings to bring the data from source to the warehouse.
  • Designed and developed SAS macros, applications and other utilities to expedite SAS Programming activities.
  • Involved in writing T-SQL and working on SSIS, SSRS, SSAS, Data Cleansing, Data Scrubbing and Data Migration.
  • Gathered and analyzed requirements from business people and management, and prepared a business requirements document to prioritize their needs.
  • Responsible for backing up data; wrote stored procedures and ad hoc queries for data mining.
  • Used SSRS to generate reports from databases, including sub-reports, drill-down reports, drill-through reports and parameterized reports.
  • Developed PL/SQL scripts to validate and load data into interface tables, and maintained data integrity between Oracle and SQL databases.
  • Worked heavily on SQL query optimization, tuning and reviewing the performance metrics of the queries.
  • Imported the customer data into Python using the pandas library and performed various data analyses, finding patterns in the data that informed key decisions for the company.
  • Performed data mapping and data design (data modeling) to integrate data across multiple databases into the EDW.
  • Collaborated with the Relationship Management and Operations teams to develop and present KPIs to top-tier clients.

Environment: Erwin 8.5, Oracle 10g, MS SQL Server 2008, SSRS, OLAP, OLTP, MS Excel, Flat Files, PL/SQL, SQL, IBM Cognos, Tableau.

Confidential, New York, New York.

Data Analyst


  • Coordinated with Track Leads and the Project Manager to set up the pre-validation and validation environments to execute the test scripts.
  • Created the Physical Data Model from the Logical Data Model using the Compare and Merge Utility in ER/Studio and worked with the naming standards utility.
  • Utilized SDLC and Agile methodologies such as Scrum.
  • Developed normalized Logical and Physical database models for designing an OLTP application.
  • Designed ER diagrams (Physical and Logical, using Erwin), mapped the data into database objects, and produced Logical/Physical Data Models.
  • Worked with developers on data normalization and de-normalization and performance tuning issues, and provided assistance with stored procedures as needed.
  • Extensively used Star Schema methodologies in building and designing the logical data model into Dimensional Models.
  • Enforced referential integrity in the OLTP data model for consistent relationships between tables and efficient database design.
  • Involved in administrative tasks, including creation of database objects such as database, tables, and views, using SQL, DDL, and DML requests.
  • Worked on data analysis, data profiling, data modeling, and data governance, identifying data sets, source data, source metadata, data definitions and data formats.
  • Loaded multi-format data from various sources like flat files, Excel, and MS Access, and performed file system operations.
  • Used T-SQL stored procedures to transfer data from OLTP databases to the staging area and finally into data marts.
  • Created database objects such as tables, views, materialized views, procedures, and packages using Oracle tools like PL/SQL, SQL*Plus, and SQL*Loader, and handled exceptions.
  • Worked on physical design for both SMP and MPP RDBMS, with understanding of RDBMS scaling features.
  • Wrote SQL queries, dynamic queries, sub-queries and complex joins for generating complex stored procedures, triggers, user-defined functions, views and cursors.
  • Wrote simple and advanced SQL queries and scripts to create standard and ad hoc reports for senior managers.
  • Performed ETL SQL optimization, designed the OLTP system environment, and maintained documentation of metadata.
  • Involved in data analysis, primarily identifying data sets, source data, source metadata, data definitions and data formats.
  • Used Teradata for OLTP systems by generating models to support Revenue Management Applications that connect to SAS.
  • Created SSIS Packages for import and export of data between Oracle database and others like MS Excel and Flat Files.
  • Worked in the capacity of ETL Developer (Oracle Data Integrator (ODI) / PL/SQL) to migrate data from different sources into the target Oracle Data Warehouse.
  • Designed and developed PL/SQL procedures, functions and packages to create summary tables.
  • Involved in creating tasks to pull and push data from Salesforce to Oracle Staging/Data Mart.
  • Created VBA macros to convert the Excel input files into the correct format and loaded them into SQL Server.
  • Helped the BI and ETL developers understand the data model, data flow and the expected output for each model created.

Environment: SQL Server, Oracle, MS Office, Teradata 14.1, Informatica, ER/Studio, XML, JSON, Business Objects, Hadoop (HDFS), MapReduce, Pig, Spark, RStudio, Mahout, Java, Hive, AWS.


Data Architect/Data Modeler


  • Analyzed data sources, requirements and business rules to perform logical and physical data modeling.
  • Analyzed and designed best-fit logical and physical data models and relational database definitions using DB2. Generated reports of data definitions.
  • Conducted source data analysis of various data sources and developed source-to-target mappings with business rules.
  • Involved in Normalization/De-normalization, Normal Form and database design methodology.
  • Maintained existing ETL procedures, fixed bugs and restored software to production environment.
  • Developed the code as per the client's requirements using SQL, PL/SQL and Data Warehousing concepts.
  • Developed an enterprise data model management process to manage multiple data models developed by different groups.
  • Designed and created Data Marts as part of a data warehouse.
  • Effectively used triggers and stored procedures necessary to meet specific application requirements.
  • Created SQL scripts for database modification and performed multiple data modeling tasks at the same time under tight schedules.
  • Reviewed new data development and ensured that it was consistent and well integrated with existing structures.
  • Wrote complex SQL queries for validating the data against different kinds of reports generated by Business Objects XI R2.
  • Involved in reviewing business requirements and analyzing data sources from Excel, Oracle, and SQL Server for design, development, testing, and production rollover of reporting and analysis projects.
  • Documented and published test results; troubleshot and escalated issues.
  • Worked on SAS and IDQ for Data Analysis.
  • Used the Erwin modeling tool to publish a data dictionary, review the model and dictionary with subject matter experts, and generate data definition language (DDL).
  • Coordinated with the DBA in implementing database changes and updating data models with changes implemented in development, QA and production.
  • Created and executed test scripts, cases, and scenarios to determine optimal system performance according to specifications.
  • Worked extensively with the DBA and Reporting teams to improve report performance with the use of appropriate indexes and partitioning.
  • Developed data mapping, transformation and cleansing rules for the Master Data Management architecture involving OLTP, ODS and OLAP.
  • Tuned and optimized code using techniques like dynamic SQL, dynamic cursors, tuning SQL queries, and writing generic procedures, functions and packages.
  • Analyzed the data and provided resolutions by writing analytical/complex SQL in cases of data discrepancies.
  • Experienced in GUI, Relational Database Management System (RDBMS), designing of OLAP system environments, as well as report development.
  • Extensively used SQL, T-SQL and PL/SQL to write stored procedures, functions, packages and triggers.
  • Prepared data analysis reports weekly, biweekly, and monthly using MS Excel, SQL and UNIX.

Environment: Erwin 7.5, Oracle 10g Application Server, Oracle Developer Suite, PL/SQL, T-SQL, DB2, SQL*Plus, Microsoft SQL Server 2005


Data Analyst/Data Modeler


  • Normalized and de-normalized the tables and maintained referential integrity by using triggers and primary and foreign keys.
  • Conducted and automated the ETL operations to extract data from multiple data sources, transform inconsistent and missing data into consistent and reliable data, and finally load it into the multi-dimensional data warehouse.
  • Developed packages using Fast Parse in SSIS to reduce the extraction time in the ETL process by 9.5%.
  • Involved in design and analysis of underlying database schema, altering and creation of the table structure.
  • Developed Informatica mappings and mapplets using heterogeneous sources like flat files and different relational databases with PowerCenter Designer.
  • Experience with routine DBA activities like query optimization, performance tuning, and effective SQL Server configuration for better performance and cost reduction. Installed and configured the SQL Mail client for SQL Server 2000.
  • Responsible for report generation using SQL Server Reporting Services (SSRS) and Crystal Reports based on business requirements.
  • Created a number of jobs, alerts and operators to be paged or emailed in case of failure for SQL Server 2000.
  • Created and configured Data Sources and Data Source Views, Dimensions, Cubes, Measures, Partitions, KPIs and MDX queries using SQL Server 2005 Analysis Services (SSAS).
  • Experience in creating backend validations using Insert/Update and Delete triggers, and created views, including indexed views, for generating reports.
  • Improved the performance of SQL Server queries using query plans, covering indexes, and indexed views, and by rebuilding and reorganizing the indexes.
  • Configured and managed database maintenance plans for updating statistics, database integrity checks and backup operations.

Environment: SQL Server 2008, SSRS, SSIS, SSAS, Microsoft Office, SQL Server Management Studio, Business Intelligence Development Studio, MS Access, Erwin Data Modeler
