
Data Scientist Resume

Naperville, IL

PROFESSIONAL SUMMARY:

  • 8+ years of professional IT experience in Machine Learning, Statistical Modeling, Predictive Modeling, Data Analytics, Data Modeling, Data Architecture, Data Analysis, Data Mining, Text Mining and Natural Language Processing (NLP), Artificial Intelligence algorithms and Business Intelligence, using analytics models (like Decision Trees and Linear & Logistic Regression) and tools such as Hadoop (Hive, Pig), R, Python, Spark, Scala, MS Excel, SQL, PostgreSQL and Erwin.
  • Proficient in managing the entire data science project life cycle and actively involved in all phases, including data acquisition, data cleaning, data engineering, feature scaling, feature engineering, statistical modeling (decision trees, regression models, neural networks, SVM, clustering), dimensionality reduction using Principal Component Analysis and Factor Analysis, testing and validation using ROC plots and K-fold cross validation, and data visualization (a minimal sketch of this workflow follows this summary).
  • Extensive experience in Text Analytics, generating data visualizations using R and Python, and creating dashboards using tools like Tableau.
  • Deep understanding of Statistical modeling, Multivariate Analysis, model testing, problem analysis, and model comparison and validation.
  • Good knowledge of database creation and maintenance of physical data models with Oracle, Teradata, Netezza, DB2 and SQL Server databases.
  • Expertise in Excel macros, Pivot Tables, VLOOKUPs and other advanced functions; expert R user with knowledge of the SAS statistical programming language.
  • Experience in Data Analysis, Data Migration, Data Cleansing, Transformation, Integration, Data Import, and Data Export using multiple ETL tools such as Ab Initio and Informatica PowerCenter.
  • Experienced in developing Conceptual, Logical and Physical Data Models using UML and IE notations for Online Transactional Processing (OLTP) and Online Analytical Processing (OLAP) systems using Erwin, ER/Studio, Enterprise Architect and PowerDesigner.
  • Experienced in developing Physical Data Model for multiple platforms - SQL Server/ DB2/ Oracle/ Teradata.
  • Experience in writing expressions in SSRS and expert in fine-tuning reports; created many drill-through and drill-down reports using SSRS.
  • Extensive experience implementing functionality such as grouping, sorting and derived report parameters using SSRS.
  • Experience in applying Predictive Modeling and Machine Learning algorithms for analytical projects.
  • Collaborated with the lead Data Architect to model the data warehouse in accordance with FSLDM subject areas, Snowflake schema and 3NF format.
  • Experience in coding SQL/PL/SQL using procedures, triggers and packages.
  • Experience in designing stunning visualizations using Tableau software and publishing and presenting dashboards and storylines on web and desktop platforms.
  • Highly skilled in using visualization tools like ggplot2, Tableau and d3.js for creating dashboards.
  • Proficient in statistical and other tools/languages: R, C, C++, Java, Python, SQL, UNIX, the QlikView data visualization tool and the Anaplan forecasting tool.
  • Proficient in Predictive Modeling, Data Mining methods, Factor Analysis, ANOVA, hypothesis testing, the normal distribution and other advanced statistical and econometric techniques.
  • Experienced in using ETL tools, including SSIS in MS SQL Server 2016/2014/2012/2008/2005 and DTS in MS SQL Server 2000.
  • Experience in deploying SSIS packages from development servers to production servers.
  • Expert in creating simple and parameterized reports, as well as complex reports involving sub-reports, matrix/tabular reports, charts and graphs, using SSRS in Business Intelligence Development Studio (BIDS).
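
A minimal sketch of the modeling-and-validation workflow described above (scaling, PCA for dimensionality reduction, a classifier, K-fold cross validation scored by ROC AUC), using scikit-learn on synthetic data; all names and parameters are illustrative, not taken from any engagement:

    # Scale -> reduce dimensionality with PCA -> classify, validated by K-fold CV.
    from sklearn.datasets import make_classification
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import StratifiedKFold, cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = make_classification(n_samples=1000, n_features=30, random_state=42)
    model = make_pipeline(StandardScaler(), PCA(n_components=10),
                          LogisticRegression(max_iter=1000))

    # 5-fold cross validation scored by area under the ROC curve.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
    scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    print("ROC AUC per fold:", scores.round(3), "| mean:", round(scores.mean(), 3))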

TECHNICAL SKILLS:

Data Modeling Tools: Erwin r9.6/9.5/9.1/8.x, Rational Rose, ER/Studio, Enterprise Architect, Oracle Designer, MS Visio, SAP PowerDesigner (formerly Sybase PowerDesigner).

SQL Server Tools: SQL Server Profiler, Enterprise Manager, SQL Server 2017/2012/2008/2005, Management Studio, DTS, SSIS, SSRS, SSAS, Performance Point Server 2010.

Databases: Oracle, Teradata, Netezza, Microsoft SQL Server, MongoDB, HBase, Cassandra.

Programming Languages: Oracle PL/SQL, Python, SQL, T-SQL, UNIX shell scripting, Java.

Scripting Languages: Python (NumPy, SciPy, Pandas, Gensim, Keras), R (caret, RWeka, ggplot2)

Big Data Technologies: Hadoop, Hive, HDFS, MapReduce, Pig, Kafka.

Reporting Tools: Crystal Reports XI, Business Intelligence, SSRS, Business Objects 5.x/6.x, Cognos 7.0/6.0, Tableau.

DWH / BI Tools: Microsoft Power BI, SSIS, SSRS, SSAS, Business Intelligence Development Studio (BIDS), Visual Studio, SAP BusinessObjects, SAP Crystal Reports 14.1 and Informatica 6.1.

Architecture: Relational DBMS, Client-Server Architecture, OLAP, OLTP, OLE-DB, XML, ASP, HTML, FTP.

Operating Systems: Windows, UNIX, MS DOS, Sun Solaris.

PROFESSIONAL EXPERIENCE:

Confidential - Naperville, IL

Data Scientist

Responsibilities:

  • Built models using statistical techniques like Bayesian HMM and machine learning classification models like XGBoost, SVM and Random Forest (see the first sketch following this list).
  • Worked with data compliance and data governance teams to maintain data models, metadata and data dictionaries, and to define source fields and their definitions.
  • Developed MapReduce/Spark Python modules for machine learning & predictive analytics in Hadoop on AWS. Implemented a Python-based distributed random forest via Python streaming.
  • Set up storage and data analysis tools in the Amazon Web Services cloud computing infrastructure.
  • Completed a highly immersive data science program covering data manipulation & visualization, web scraping, machine learning, Python programming, SQL, Git, Unix commands, NoSQL, MongoDB and Hadoop.
  • Transformed the logical data model into the physical data model in Erwin, ensuring Primary Key and Foreign Key relationships in the PDM, consistency of data attribute definitions and Primary Index considerations.
  • Developed Oracle 10g stored packages, procedures, functions and database triggers using PL/SQL for the ETL process, data handling, logging, archiving and to perform Oracle back-end validations for batch processes.
  • Documented logical, physical, relational and dimensional data models. Designed the Data Marts in dimensional data modeling using star and snowflake schemas.
  • Worked with Data Architects and IT Architects to understand the movement of data and its storage, using ER Studio 9.7.
  • Worked with the UNIX team and installed TIDAL job scheduler on QA and Production Netezza environment.
  • Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
  • Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
  • Designed and documented Use Cases, Activity Diagrams, Sequence Diagrams, OOD (Object Oriented Design) using UML and Visio.
  • Created Hive queries that helped analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics, and processed the data using HQL (a SQL-like language) on top of MapReduce.
  • Hands-on development and maintenance using Oracle SQL, PL/SQL, SQL*Loader and Informatica PowerCenter 9.1.
  • Designed the ETL process to extract, transform and load data from the OLTP Oracle database system to the Teradata data warehouse.
  • Created tables, sequences, synonyms, joins, functions and operators in Netezza database.
  • Created and implemented an MDM data model for Consumer/Provider for a HealthCare MDM product from Variant.
  • Built and published customized interactive reports and dashboards, with report scheduling, using Tableau Server.
  • Used the Oracle External Tables feature to read data from flat files into Oracle staging tables.
  • Analyzed web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration and the most purchased product on the website, and managed and reviewed Hadoop log files (see the second sketch following this list).
  • Used Erwin 9.1 for effective model management, sharing, dividing and reusing model information and designs for productivity improvement.
  • Designed and developed user interfaces and customized reports using Tableau and OBIEE, and designed cubes for data visualization and mobile/web presentation with parameterization and cascading.
  • Performed Data Analysis and Data Profiling and worked on data transformations and data quality rules.
  • Created SSIS packages using Pivot Transformation, Execute SQL Task, Data Flow Task, etc., to import data into the data warehouse.
  • Developed and implemented SSIS, SSRS and SSAS application solutions for various business units across the organization.
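
A minimal sketch of the classification modeling in the first bullet above, comparing XGBoost, SVM and Random Forest with scikit-learn and the xgboost package on synthetic data; the dataset and hyperparameters are illustrative assumptions:

    # Benchmark three classifiers with 5-fold cross validation on synthetic data.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC
    from xgboost import XGBClassifier

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    models = {
        "XGBoost": XGBClassifier(n_estimators=200, eval_metric="logloss"),
        "SVM": SVC(kernel="rbf"),
        "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    }
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
        print(f"{name}: mean ROC AUC = {scores.mean():.3f}")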
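
And a sketch of the web-log analysis described above, expressed as a HiveQL-style query run through Spark SQL; the table and column names (web_logs, visitor_id, log_date) are hypothetical stand-ins:

    # HiveQL-style aggregation of web logs via Spark SQL (assumes a Hive
    # metastore containing a web_logs table; all names here are hypothetical).
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("weblog-analysis")
             .enableHiveSupport()
             .getOrCreate())

    daily = spark.sql("""
        SELECT log_date,
               COUNT(DISTINCT visitor_id) AS unique_visitors,
               COUNT(*)                   AS page_views
        FROM web_logs
        GROUP BY log_date
        ORDER BY log_date
    """)
    daily.show()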

Environment: Erwin 9.x, Teradata, Oracle 10g, Hadoop, HDFS, Pig, Hive, MapReduce, PL/SQL, UNIX, Informatica PowerCenter, MDM, SQL Server, Netezza, DB2, Tableau, Aginity, SAS/GRAPH, SAS/SQL, SAS/CONNECT and SAS/ACCESS.

Confidential - Chicago, IL

Data Scientist

Responsibilities:

  • Provided the architectural leadership in shaping strategic, business technology projects, with an emphasis on application architecture.
  • Utilized domain knowledge and application portfolio knowledge to play a key role in defining the future state of large, business technology programs.
  • Participated in all phases of data mining: data collection, data cleaning, model development, validation and visualization, and performed Confidential analysis.
  • Developed MapReduce/Spark Python modules for machine learning & predictive analytics in Hadoop on AWS. Implemented a Python-based distributed random forest via Python streaming.
  • Performed Source System Analysis, database design, data modeling for the warehouse layer using MLDM concepts and package layer using Dimensional modeling.
  • Created ecosystem models (e.g. conceptual, logical, physical, canonical) that are required for supporting services within the enterprise data architecture (conceptual data model for defining the major subject areas used, ecosystem logical model for defining standard business meaning for entities and fields, and an ecosystem canonical model for defining the standard messages and formats to be used in data integration services throughout the ecosystem).
  • Used Pandas, NumPy, seaborn, SciPy, Matplotlib, Scikit-learn, NLTK in Python for developing various machine learning algorithms and utilized machine learning algorithms such as linear regression, multivariate regression, naive Bayes, Random Forests, K-means, & KNN for data analysis.
  • Demonstrated experience in design and implementation of statistical models, predictive models, enterprise data models, metadata solutions and data life cycle management in both RDBMS and Big Data environments.
  • Developed LINUX Shell scripts by using NZSQL/NZLOAD utilities to load data from flat files to Netezza database.
  • Designed and implemented system architecture for Amazon EC2 based cloud-hosted solution for client.
  • Tested Complex ETL Mappings and Sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables to target tables.
  • Analyzed large data sets, applied machine learning techniques, and developed and enhanced predictive and statistical models leveraging best-in-class modeling techniques.
  • Hands-on database design, relational integrity constraints, OLAP, OLTP, cubes, normalization (3NF) and de-normalization of databases.
  • Worked on customer segmentation using an unsupervised learning technique, clustering (see the sketch following this list).
  • Worked with various Teradata 15 tools and utilities like Teradata Viewpoint, MultiLoad, ARC, Teradata Administrator, BTEQ and other Teradata utilities.
  • Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib and Python with a broad variety of machine learning methods including classification, regression and dimensionality reduction.
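
A minimal sketch of the customer-segmentation clustering mentioned above, using scikit-learn's K-means on synthetic RFM-style features; the feature names and cluster count are illustrative assumptions:

    # Cluster customers on scaled features; sanity-check with a silhouette score.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 3))  # hypothetical recency/frequency/monetary features

    X_scaled = StandardScaler().fit_transform(X)
    km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X_scaled)
    print("silhouette:", round(silhouette_score(X_scaled, km.labels_), 3))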

Environment: Erwin r9.6, Python, SQL, Oracle 12c, Netezza, SQL Server, Informatica, Java, SSRS, PL/SQL, T-SQL, Tableau, MLlib, regression, cluster analysis, Scala NLP, Spark, Kafka, MongoDB, logistic regression, Hadoop, Hive, Teradata, random forest, OLAP, Azure, MariaDB, SAP CRM, HDFS, ODS, NLTK, SVM, JSON, XML, Cassandra, MapReduce, AWS.

Confidential - Waltham, MA

Data Modeler

Responsibilities:

  • Conducted one-to-one sessions with business users to gather data for Data Warehouse requirements.
  • Part of the team analyzing database requirements in detail with the project stakeholders through Joint Requirements Development (JRD) sessions.
  • Developed an object model in UML for the Conceptual Data Model using Enterprise Architect.
  • Developed Logical and Physical data models using Erwin to design OLTP systems for different applications.
  • Facilitated transition of logical data models into the physical database design and recommended technical approaches for good data management practices.
  • Worked with the DBA group to create a Best-Fit Physical Data Model with DDL from the Logical Data Model using forward engineering.
  • Created entity/process association matrices using the Zachman Framework, functional decomposition diagrams and data flow diagrams from business requirements documents.
  • Involved in detailed design of data marts using Star Schema and planned data marts involving shared dimensions (see the sketch following this list).
  • Used Model Manager Option in Erwin to synchronize the data models in ModelMart approach.
  • Gathered various reporting requirements from Business Analysts.
  • Worked on enhancements to the Data Warehouse model using Erwin as per the business reporting requirements.
  • Reverse-engineered the reports and identified the data elements (in the source systems), dimensions, facts and measures required for the reports.
  • Worked with the ETL team to document the transformation rules for data migration from OLTP to Warehouse environment for reporting purposes.
  • Extensive system study, design, development and testing were carried out in the Oracle environment to meet the customer requirements.
  • Developed Data Migration and Cleansing rules for the Integration Architecture (OLTP, ODS, DW).
  • Used Teradata utilities such as FastExport and MultiLoad for handling various tasks.
  • Involved in migration projects to move data from data warehouses on Oracle/DB2 to Teradata.
  • Developed data mapping documents between Legacy, Production, and User Interface Systems.
  • Generated comprehensive analytical reports by running SQL queries against current databases to conduct data analysis.
  • Developed and maintained Data Dictionary to create Metadata Reports for technical and business purpose using Erwin report designer.
  • Generated ad-hoc reports using Crystal Reports 9 and SQL Server Reporting Services (SSRS).
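
A toy sketch of the star-schema design mentioned above: one fact table keyed to two shared dimensions, created here in Python against SQLite; all table and column names are illustrative only:

    # Minimal star schema: fact_sales references shared dim_date and dim_product.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_date (
            date_key      INTEGER PRIMARY KEY,
            calendar_date TEXT NOT NULL
        );
        CREATE TABLE dim_product (
            product_key  INTEGER PRIMARY KEY,
            product_name TEXT NOT NULL
        );
        CREATE TABLE fact_sales (
            date_key     INTEGER REFERENCES dim_date(date_key),
            product_key  INTEGER REFERENCES dim_product(product_key),
            sales_amount REAL NOT NULL
        );
    """)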

Environment: Erwin r9.6, DB2, Teradata, SQL Server 2008, Informatica 8.1, Enterprise Architect, PowerDesigner, MS SSAS, Crystal Reports, SSRS, ER Studio, Lotus Notes, Windows XP, MS Excel, Word and Access.

Confidential - New York, NY

Data Modeler

Responsibilities:

  • Installed Talend Studio.
  • Developed integration jobs to transfer data from source systems to Hadoop.
  • Prepared technical design documents for transformation processes.
  • Allocated tasks for the ETL and reporting team.
  • Applied business rules to the data being transferred.
  • Communicated effectively with the client and their internal development team to deliver product functionality requirements.
  • Demonstrated the POC built for the prospective customer, provided guidance and gathered feedback; performed backend ETL testing on SQL Server 2008 using SSIS.
  • Conducted requirements gathering and analysis meetings with business users and documented meeting outcomes.
  • Architected and designed data warehouse ETL processes.
  • Created the operational manual document.
  • Designed and implemented the ETL data model and created staging, source and target tables in the SQL Server database (see the sketch following this list).
  • Created integration jobs to back up a copy of the data to a network file system.
  • Performed frontend report testing on Qlik Sense.
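
A minimal sketch of the staging-to-target ETL flow mentioned above, using pandas with SQLite standing in for the SQL Server database; the business rule and the table names are hypothetical:

    # Extract a source frame, apply a transform rule, land in staging, copy to target.
    import sqlite3

    import pandas as pd

    conn = sqlite3.connect(":memory:")

    # Extract: an in-memory frame stands in for a flat file or source-system pull.
    source = pd.DataFrame({"order_id": [1, 2], "amount": ["10.5", "n/a"]})

    # Transform: hypothetical business rule - non-numeric amounts become 0.
    source["amount"] = pd.to_numeric(source["amount"], errors="coerce").fillna(0)

    # Load: staging table first, then the target table.
    source.to_sql("stg_orders", conn, index=False)
    conn.execute("CREATE TABLE tgt_orders AS SELECT * FROM stg_orders")
    conn.commit()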

Environment: Hadoop, ETL, ODS, MS Office, SQL Server 2008, Talend Studio, and OLAP.

Confidential

Data Analyst

Responsibilities:

  • Worked with users to identify the most appropriate source of record and to profile the data required for sales and service.
  • Documented the complete process flow to describe program development, logic, testing, implementation, application integration and coding.
  • Involved in defining the business/transformation rules applied to sales and service data.
  • Defined the list codes and code conversions between the source systems and the data mart.
  • Worked with internal architects, assisting in the development of current and target state data architectures.
  • Worked with project team representatives to ensure that logical and physical ER/Studio data models were developed in line with corporate standards and guidelines.
  • Involved in defining the source to target data mappings, business rules, data definitions.
  • Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
  • Responsible for defining the key identifiers for each mapping/interface.
  • Responsible for defining the functional requirement documents for each source to target interface.
  • Documented, clarified and communicated change requests with the requestor, and coordinated with the development and testing teams.
  • Coordinated meetings with vendors to define requirements and system interaction agreement documentation between client and vendor system.
  • Updated the Enterprise Metadata Library with any changes or updates.
  • Documented data quality and traceability documents for each source interface.
  • Established standard operating procedures.
  • Generated weekly and monthly asset inventory reports.
  • Coordinated with business users to design new reporting needs in an appropriate, effective and efficient way based on existing functionality.
  • Remained knowledgeable in all areas of business operations in order to identify system needs and requirements.
  • Implemented a Metadata Repository; maintained data quality, data cleanup procedures, transformations, data standards, a data governance program, scripts, stored procedures, triggers and execution of test plans.
  • Performed data quality checks in Talend Open Studio.

Environment: SQL Server 2008R2/2005 Enterprise, SSRS, SSIS, Crystal Reports, Windows Enterprise Server 2000, DTS, SQL Profiler, and Query Analyzer

Confidential

Data Analyst

Responsibilities:

  • Designed and developed Natural Language Processing models for sentiment analysis (see the sketch following this list).
  • Applied clustering algorithms, i.e. hierarchical and K-means, with the help of scikit-learn and SciPy.
  • Developed visualizations and dashboards using ggplot2 and Tableau.
  • Worked on development of data warehouse, data lake and ETL systems using relational and non-relational tools (SQL and NoSQL).
  • Built and analyzed datasets using R, SAS, Matlab and Python (in decreasing order of usage).
  • Participated in all phases of data mining: data collection, data cleaning, model development, validation and visualization, and performed Confidential analysis.
  • Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI and Smart View.
  • Implemented Agile Methodology for building an internal application.
  • Good knowledge of Hadoop architecture and various components such as HDFS, Job Tracker, Task Tracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
  • As Architect, delivered various complex OLAP databases/cubes, scorecards, dashboards and reports.
  • Programmed a utility in Python that used multiple packages (SciPy, NumPy, pandas).
  • Implemented classification using supervised algorithms like Logistic Regression, Decision Trees, KNN and Naive Bayes.
  • Used Teradata 15 utilities such as FastExport and MultiLoad for various data migration/ETL tasks from OLTP source systems to OLAP target systems.
  • Supported the testing team in system testing, integration testing and UAT.
  • Involved in preparation and design of technical documents like the Bus Matrix document, PPDM model, and LDM & PDM.
  • Understood the client's business problems and analyzed the data using appropriate statistical models to generate insights.
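
A minimal sketch of rule-based sentiment scoring with NLTK's VADER analyzer, one simple way to approach the sentiment-analysis work mentioned above; the example sentence is illustrative:

    # Score a sentence's sentiment with VADER (fetches its lexicon on first run).
    import nltk
    from nltk.sentiment import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)
    sia = SentimentIntensityAnalyzer()
    print(sia.polarity_scores("The product works great and support was helpful."))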

Environment: R 3.0, Erwin 9.5, Tableau 8.0, MDM, QlikView, MLlib, PL/SQL, HDFS, Teradata 14.1, JSON, Hadoop, MapReduce, Pig, Spark, RStudio, Mahout, Java, Hive, AWS.
