Data Scientist Resume
Boston, MA
SUMMARY:
- Over 8 years of hands-on experience and comprehensive industry knowledge of Machine Learning, Statistical Modeling, Predictive Modeling, Data Analytics, Data Modeling, Data Architecture, Data Analysis, Data Mining, Text Mining and Natural Language Processing (NLP), Artificial Intelligence algorithms, Business Intelligence, and analytics models (such as Decision Trees, Linear & Logistic Regression), using Hadoop (Hive, Pig), R, Python, Spark, Scala, MS Excel, SQL, PostgreSQL and Erwin.
- Experienced in utilizing analytical applications like R, SPSS and Python to identify trends and relationships between different pieces of data, draw appropriate conclusions, and translate analytical findings into risk management and marketing strategies that drive value.
- Experienced in designing star schemas (identification of facts, measures and dimensions) and snowflake schemas for Data Warehouse and ODS architectures, using tools like Erwin Data Modeler, Power Designer, ER/Studio and Microsoft Visio.
- Extensive experience in Text Analytics, developing different Statistical Machine Learning and Data Mining solutions to various business problems, generating data visualizations using R and Python, and creating dashboards using tools like Tableau.
- Expertise in applying data mining and optimization techniques in B2B and B2C industries; proficient in Machine Learning, Data/Text Mining, Statistical Analysis and Predictive Modeling.
- Experienced in writing Spark Streaming and Spark batch jobs, using Spark MLlib for analytics (see the sketch at the end of this summary).
- Experienced in writing and optimizing SQL queries in Oracle, SQL Server, DB2, Netezza and Teradata.
- Experienced Data Modeler with conceptual, logical and physical data modeling skills, data profiling skills, and experience maintaining data quality on Teradata 15/14; experienced with JAD sessions for requirements gathering, creating data mapping documents, and writing functional specifications and queries.
- Hands-on experience with clustering algorithms like K-Means and K-Medoids, and with predictive algorithms.
- Expertise in Model Development, Data Mining, Predictive Modeling, Data Visualization, Data Cleansing and Management, and Database Management.
- Proficient in Hadoop, Hive, MapReduce, Pig and NoSQL databases like MongoDB, HBase and Cassandra.
- Excellent experience in SQL*Loader, SQL data modeling, reporting and SQL database development; loaded data from legacy systems into Oracle databases using control files, and used the Oracle External Tables feature to read data from flat files into Oracle staging tables.
- Used Oracle EXPORT/IMPORT utilities to help the DBA migrate databases across Oracle 11g/10g/9i.
- Excellent knowledge of Machine Learning, Mathematical Modeling and Operations Research; comfortable with R, Python, SAS, Weka, MATLAB and relational databases; deep understanding of and exposure to the Big Data ecosystem.
- Experienced in Data Modeling, covering RDBMS concepts, logical and physical data modeling up to Third Normal Form (3NF), and multidimensional data modeling schemas (star schema, snowflake modeling, facts and dimensions).
- Experienced in designing the architecture for modeling a Data Warehouse using tools like Erwin, Power Designer and ER/Studio.
- Experienced with databases including Oracle, XML, DB2, Teradata, Netezza, SQL Server, Big Data and NoSQL stores.
- Good knowledge of database creation and maintenance of physical data models with Oracle, Teradata, Netezza, DB2 and SQL Server databases.
- Expertise in Excel macros, pivot tables, VLOOKUPs and other advanced functions; expert R user with knowledge of the statistical programming language SAS.
- Experience in Data Analysis, Data Migration, Data Cleansing, Transformation, Integration, Data Import, and Data Export using multiple ETL tools such as Ab Initio and Informatica Power Center.
- Experience in SQL and good knowledge of PL/SQL programming, including developing stored procedures and triggers; worked with DataStage, DB2, UNIX, Cognos, MDM, Hadoop and Pig.
- Expertise in data acquisition, storage, analysis, integration, predictive modeling, logistic regression, decision trees, data mining methods, forecasting, factor analysis, cluster analysis, and other advanced statistical techniques.
- Very good knowledge of Data Analysis, Data Validation, Data Cleansing, Data Verification and identifying data mismatches.
- Excellent experience with Teradata SQL queries, Teradata indexes, and utilities such as MultiLoad, TPump, FastLoad and FastExport.
- Strong experience and knowledge in data visualization with Tableau, creating line and scatter plots, bar charts, histograms, pie charts, dot charts, box plots, time series, error bars, multiple chart types, multiple axes, subplots, etc.
- Experienced in SAS/BASE, SAS/STAT, SAS/SQL, SAS/MACROS, SAS/GRAPH, SAS/ACCESS, SAS/ODS, SAS/QC, SAS/ETS in Mainframe, Windows and UNIX environments.
- Experienced in database performance tuning and data access optimization, writing complex SQL queries and PL/SQL blocks such as stored procedures, functions, triggers, cursors and ETL packages.
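A minimal sketch of the kind of Spark MLlib batch job referenced above; the SparkSession setup, column names and toy data are illustrative assumptions, not a specific production job.

```python
# Minimal Spark MLlib batch job: assemble features and fit a logistic
# regression pipeline on toy data. All names and values are placeholders.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("mllib-batch-sketch").getOrCreate()

# Toy training data: two numeric features and a binary label.
train = spark.createDataFrame(
    [(1.0, 0.5, 1.0), (0.2, 1.5, 0.0), (2.1, 0.3, 1.0), (0.1, 2.2, 0.0)],
    ["feature_a", "feature_b", "label"],
)

assembler = VectorAssembler(inputCols=["feature_a", "feature_b"],
                            outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="label")
model = Pipeline(stages=[assembler, lr]).fit(train)

model.transform(train).select("label", "prediction").show()
spark.stop()
```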
TECHNICAL SKILLS:
Data Modeling Tools: Erwin r9.6/9.5/9.1/8.x, Rational Rose, ER/Studio, MS Visio, SAP PowerDesigner.
Programming Languages: Oracle PL/SQL, Python, SQL, T-SQL, UNIX shell scripting, Java.
Scripting Languages: Python (NumPy, SciPy, Pandas, Gensim, Keras), R (caret, RWeka, ggplot2)
Big Data Technologies: Hadoop, Hive, HDFS, MapReduce, Pig, Kafka.
Reporting Tools: Crystal Reports XI, Business Intelligence, SSRS, Business Objects 5.x/6.x, Cognos 7.0/6.0, Tableau.
ETL: Informatica Power Centre, SSIS.
Project Execution Methodologies: Ralph Kimball and Bill Inmon data warehousing methodology, Rational Unified Process (RUP), Rapid Application Development (RAD), Joint Application Development (JAD)
BI Tools: Tableau, Tableau Server, Tableau Reader, SAP BusinessObjects, OBIEE, QlikView, SAP Business Intelligence, Amazon Redshift, Azure SQL Data Warehouse
Tools: MS-Office suite (Word, Excel, MS Project and Outlook), Spark MLlib, Scala NLP, MariaDB, Azure, SAS.
Databases: Oracle, Teradata, Netezza, Microsoft SQL Server, MongoDB, HBase, Cassandra.
Operating Systems: Windows, UNIX, MS DOS, Sun Solaris.
PROFESSIONAL EXPERIENCE:
Confidential, Boston, MA
Data Scientist
Responsibilities:
- Provided the architectural leadership in shaping strategic, business technology projects, with an emphasis on application architecture.
- Utilized domain knowledge and application portfolio knowledge to play a key role in defining the future state of large, business technology programs.
- Participated in all phases of data mining, data collection, data cleaning, developing models, validation, and visualization and performed Gap analysis.
- Developed MapReduce/Spark Python modules for machine learning & predictive analytics in Hadoop on AWS. Implemented a Python-based distributed random forest via Python streaming.
- Created ecosystem models (e.g. conceptual, logical, physical, canonical) that are required for supporting services within the enterprise data architecture (conceptual data model for defining the major subject areas used, ecosystem logical model for defining standard business meaning for entities and fields, and an ecosystem canonical model for defining the standard messages and formats to be used in data integration services throughout the ecosystem).
- Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn and NLTK in Python to develop machine learning solutions, applying algorithms such as linear regression, multivariate regression, naive Bayes, random forests, K-Means and KNN for data analysis.
- Conducted studies and rapid plotting, using advanced data mining and statistical modeling techniques to build solutions that optimize the quality and performance of data.
- Demonstrated experience in the design and implementation of statistical models, predictive models, enterprise data models, metadata solutions and data life cycle management in both RDBMS and Big Data environments.
- Analyzed large data sets, applied machine learning techniques, and developed and enhanced predictive and statistical models by leveraging best-in-class modeling techniques.
- Worked on database design, relational integrity constraints, OLAP, OLTP, Cubes and Normalization (3NF) and De-normalization of database.
- Worked on customer segmentation using an unsupervised learning technique, clustering (see the sketch after this list).
- Worked with various Teradata 15 tools and utilities like Teradata Viewpoint, MultiLoad, ARC, Teradata Administrator, BTEQ and other Teradata utilities.
- Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib and Python, along with a broad variety of machine learning methods including classification, regression and dimensionality reduction.
- Developed LINUX Shell scripts by using NZSQL/NZLOAD utilities to load data from flat files to Netezza database.
- Designed and implemented system architecture for Amazon EC2 based cloud-hosted solution for client.
- Tested Complex ETL Mappings and Sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables to target tables.
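As referenced in the segmentation bullet above, a minimal sketch of K-Means customer segmentation with scikit-learn; the synthetic RFM-style features and the choice of three clusters are assumptions for illustration, not the project's actual data.

```python
# Illustrative K-Means customer segmentation; the RFM-style features and
# cluster count are assumptions for this sketch, not project specifics.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
# Synthetic customers: recency (days), frequency (orders), monetary (spend).
X = np.column_stack([
    rng.integers(1, 365, size=200),
    rng.poisson(5, size=200),
    rng.gamma(2.0, 50.0, size=200),
])

# Scale features so no single unit dominates the Euclidean distances.
X_scaled = StandardScaler().fit_transform(X)

labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X_scaled)

for cluster in range(3):
    members = labels == cluster
    print(f"Segment {cluster}: {members.sum()} customers, "
          f"mean spend {X[members, 2].mean():.2f}")
```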
Environment: Erwin r9.6, Python, SQL, Oracle 12c, Netezza, SQL Server, Informatica, Java, SSRS, PL/SQL, T-SQL, Tableau, MLlib, regression, cluster analysis, Scala NLP, Spark, Kafka, MongoDB, logistic regression, Hadoop, Hive, Teradata, random forest, OLAP, Azure, MariaDB, SAP CRM, HDFS, ODS, NLTK, SVM, JSON, XML, Cassandra, MapReduce, AWS.
Confidential, Des Moines, IA
Sr. Data Modeler
Responsibilities:
- Worked with data compliance and data governance teams to maintain data models, metadata and data dictionaries; defined source fields and their definitions.
- Performed Source System Analysis, database design, data modeling for the warehouse layer using MLDM concepts and package layer using Dimensional modeling.
- Documented logical, physical, relational and dimensional data models. Designed the Data Marts in dimensional data modeling using star and snowflake schemas.
- Enabled speedy reviews and first mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and PIG to pre-process the data.
- Transformed the logical data model into a physical data model in Erwin, ensuring primary key and foreign key relationships in the PDM, consistency of data attribute definitions, and primary index considerations.
- Developed Oracle 10g stored packages, procedures, functions and database triggers using PL/SQL for the ETL process, data handling, logging and archiving, and to perform Oracle back-end validations for batch processes.
- Used Netezza SQL, Stored Procedures, and NZload utilities as part of the DWH appliance framework.
- Worked with the UNIX team and installed TIDAL job scheduler on QA and Production Netezza environment.
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
- Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
- Designed and documented Use Cases, Activity Diagrams, Sequence Diagrams, OOD (Object Oriented Design) using UML and Visio.
- Created Hive queries that helped analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics, and processed the data using HiveQL (a SQL-like language) on top of MapReduce.
- Worked on development and maintenance using Oracle SQL, PL/SQL, SQL*Loader and Informatica PowerCenter 9.1.
- Designed the ETL process to extract, transform and load data from the OLTP Oracle database system to the Teradata data warehouse.
- Created tables, sequences, synonyms, joins, functions and operators in Netezza database.
- Created and implemented MDM data model for Consumer/Provider for HealthCare MDM product from Variant.
- Analyzed web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration and the most purchased product on the website, and managed and reviewed Hadoop log files (see the web-log sketch after this list).
- Used Erwin 9.1 for effective model management, sharing, dividing and reusing model information and designs for productivity improvement.
- Designed and developed user interfaces and customization of Reports using Tableau and OBIEE and designed cubes for data visualization, mobile/web presentation with parameterization and cascading.
- Performed Data Analysis and Data Profiling and worked on data transformations and data quality rules.
- Created SSIS packages using Pivot Transformation, Execute SQL Task, Data Flow Task, etc., to import data into the data warehouse.
- Performed administrative tasks, including creation of database objects such as database, tables, and views, using SQL DCL, DDL, and DML requests.
- Developed and implemented SSIS, SSRS and SSAS application solutions for various business units across the organization.
- Built and published customized interactive reports and dashboards, report scheduling using Tableau server.
- Used SQL Loader to load data from the Legacy systems into Oracle databases using control files extensively.
- Used Oracle External Tables feature to read the data from flat files into Oracle staging tables.
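A minimal sketch of the external-table load pattern in the last two bullets above, issued here through the python-oracledb driver; the connection details, the stage_dir directory object, and the file and table names are placeholder assumptions.

```python
# Read a flat file through an Oracle external table, then copy it into a
# staging table. Connection details, the stage_dir directory object, and
# file/table names are placeholder assumptions for this sketch.
import oracledb

conn = oracledb.connect(user="etl_user", password="secret", dsn="dbhost/orclpdb")
cur = conn.cursor()

# External table: Oracle reads the flat file in place via ORACLE_LOADER.
cur.execute("""
    CREATE TABLE customers_ext (
        customer_id NUMBER,
        name        VARCHAR2(100)
    )
    ORGANIZATION EXTERNAL (
        TYPE ORACLE_LOADER
        DEFAULT DIRECTORY stage_dir
        ACCESS PARAMETERS (
            RECORDS DELIMITED BY NEWLINE
            FIELDS TERMINATED BY ','
        )
        LOCATION ('customers.csv')
    )
""")

# Load from the external table into the real staging table.
cur.execute("INSERT INTO customers_stg SELECT * FROM customers_ext")
conn.commit()
conn.close()
```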
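And a minimal sketch of the HiveQL-style web-log analysis referenced earlier in this list, run here through Spark SQL against a toy in-memory view; the log schema (user_id, url, event_date) is an assumption.

```python
# HiveQL-style web-log metrics (unique visitors per day, page views) run
# through Spark SQL; the log schema here is an illustrative assumption.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("weblog-hql-sketch").getOrCreate()

logs = spark.createDataFrame(
    [("u1", "/home", "2016-06-01"),
     ("u2", "/home", "2016-06-01"),
     ("u1", "/cart", "2016-06-01"),
     ("u3", "/home", "2016-06-02")],
    ["user_id", "url", "event_date"],
)
logs.createOrReplaceTempView("web_logs")

# The same shape of aggregation a Hive query would push down to MapReduce.
spark.sql("""
    SELECT event_date,
           COUNT(DISTINCT user_id) AS unique_visitors,
           COUNT(*)                AS page_views
    FROM web_logs
    GROUP BY event_date
    ORDER BY event_date
""").show()
spark.stop()
```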
Environment: Erwin 9.x, Teradata, Oracle 10g, Hadoop, HDFS, Pig, Hive, MapReduce, PL/SQL, UNIX, Informatica PowerCenter, MDM, SQL Server, Netezza, DB2, Tableau, Aginity, SAS/GRAPH, SAS/SQL, SAS/CONNECT and SAS/ACCESS.
Confidential, Austin, TX
Sr. Data Modeler/Data Analyst
Responsibilities:
- Designed ER diagrams (physical and logical, using Erwin), mapped the data into database objects, and produced logical/physical data models.
- Performed Data Analysis, Data Migration and data profiling using complex SQL on various source systems including Oracle and Teradata 13.1.
- Participated in normalization/de-normalization, normal forms and database design methodology; expertise in using data modeling tools like MS Visio and Erwin for logical and physical database design.
- Participated in Dimensional modeling (Star Schema) of the Data warehouse and used Erwin to design the business process, dimensions and measured facts.
- Extensively used Base SAS, SAS/Macro, SAS/SQL and Excel to develop code and generate various analytical reports.
- Implemented the full lifecycle as Data Modeler/Data Analyst for data warehouses and data marts with star schemas, snowflake schemas, SCDs and dimensional modeling in Erwin.
- Used Erwin for reverse engineering to connect to existing databases and the ODS, creating graphical representations in the form of entity relationships and eliciting more information.
- Used the Model Manager option in Erwin to synchronize the data models via the Model Mart approach.
- Documented the transformation rules for data migration from OLTP to Warehouse environment for reporting purposes with help from ETL team.
- Created 3NF business area data models with de-normalized physical implementations, and performed data and information requirements analysis using Erwin.
- Performed data mining using very complex SQL queries, discovered patterns, and used extensive SQL for data profiling/analysis to provide guidance in building the data model (see the sketch after this list).
- Created high level ETL design document and assisted ETL developers in the detail design and development of ETL maps using Informatica.
- Integrated various relational and non-relational sources such as DB2, Teradata, Oracle, SFDC, Netezza, SQL Server, COBOL, XML and Flat Files.
- Imported and cleansed high-volume data from various sources like Teradata, Oracle, flat files and SQL Server 2008.
- Developed dimensional model for Data Warehouse/OLAP applications by identifying required facts and dimensions.
- Worked with SQL Server components: SSIS (SQL Server Integration Services), SSAS (Analysis Services) and SSRS (Reporting Services).
- Developed code as per the client's requirements using SQL, PL/SQL and data warehousing concepts.
- Participated in several facets of MDM implementations including Data Profiling, metadata acquisition and data migration.
- Identified and tracked the slowly changing dimensions (SCD), heterogeneous sources and determined the hierarchies in dimensions.
- Migrated the Current Optum Rx Data Warehouse from the series database environment to a Netezza appliance.
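A minimal pandas sketch of the column-level data profiling referenced above; the toy source extract is an assumption, while the actual profiling ran as SQL against Oracle and Teradata sources.

```python
# Column-level profiling: null counts, distinct counts and value ranges of
# the kind used to guide the data model. The toy extract is an assumption.
import pandas as pd

df = pd.DataFrame({
    "customer_id": [101, 102, 103, 103, None],
    "state": ["TX", "TX", None, "CA", "NY"],
    "balance": [250.0, 0.0, 125.5, 125.5, 980.25],
})

profile = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "nulls": df.isna().sum(),
    "distinct": df.nunique(),
    "min": df.min(),
    "max": df.max(),
})
print(profile)
```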
Environment: Erwin r9.1, SQL Server 2008, Business Objects XI, MS Excel 2010, Informatica, Rational Rose, Oracle 10g, SAS, SQL, PL/SQL, SSRS, SSIS, T-SQL, Netezza, Tableau, XML, DDL, TOAD for Data Analysis, Teradata SQL Assistant.
Confidential, Franklin Lakes, NJ
Data Analyst/Modeler
Responsibilities:
- Participated in JAD sessions, gathered information from Business Analysts, end users and other stakeholders to determine the requirements.
- Worked with data warehousing methodologies and dimensional data modeling techniques such as star/snowflake schemas using Erwin 9.1.
- Extensively used the Aginity Netezza workbench to perform DDL, DML and other operations on the Netezza database.
- Designed the Data Warehouse and MDM hub Conceptual, Logical and Physical data models.
- Performed daily monitoring of Oracle instances using Oracle Enterprise Manager, ADDM and TOAD, monitoring users, tablespaces, memory structures, rollback segments, logs and alerts.
- Used ER/Studio Data Modeler for data modeling (data requirements analysis, database design, etc.) of custom-developed information systems, including databases of transactional systems and data marts.
- Involved in Teradata SQL development, unit testing and performance tuning, and ensured testing issues were resolved using defect reports.
- Customized reports using the SAS/MACRO facility, PROC REPORT, PROC TABULATE and other SAS procedures.
- Used Normalization methods up to 3NF and De-normalization techniques for effective performance in OLTP and OLAP systems.
- Generated DDL scripts using Forward Engineering technique to create objects and deploy them into the databases.
- Worked on database testing; wrote complex SQL queries to verify transactions and business logic, such as identifying duplicate rows, using SQL Developer and PL/SQL Developer (see the sketch after this list).
- Used Teradata SQL Assistant, Teradata Administrator, PMON and data load/export utilities like BTEQ, FastLoad, MultiLoad, FastExport and TPump in UNIX/Windows environments, and ran batch processes for Teradata.
- Worked on data profiling and data validation to ensure the accuracy of the data between the warehouse and source systems.
- Worked on Data warehouse concepts like Data warehouse Architecture, Star schema, Snowflake schema, and Data Marts, Dimension and Fact tables.
- Developed SQL Queries to fetch complex data from different tables in remote databases using joins, database links and Bulk collects.
- Migrated database from legacy systems, SQL server to Oracle and Netezza.
- Used SSIS to create ETL packages that validate, extract, transform and load data from source servers to a staging database, and then to the Netezza and DB2 databases.
- Worked with SQL Server components: SSIS (SQL Server Integration Services), SSAS (Analysis Services) and SSRS (Reporting Services).
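As referenced in the testing bullet above, a minimal pandas sketch of the duplicate-row and source-to-target validation checks; the table shapes and the txn_id key are placeholder assumptions, while the real checks ran as SQL against the warehouse.

```python
# Toy source-vs-target validation: flag duplicate rows and keys that did
# not land in the target. Table shapes and the key column are assumptions.
import pandas as pd

source = pd.DataFrame({"txn_id": [1, 2, 2, 3, 4],
                       "amount": [10.0, 20.0, 20.0, 30.0, 40.0]})
target = pd.DataFrame({"txn_id": [1, 2, 3],
                       "amount": [10.0, 20.0, 30.0]})

# Fully duplicated source rows (the pandas analogue of a ROW_NUMBER() check).
print("Duplicate source rows:\n", source[source.duplicated(keep=False)])

# Keys present in the source but missing from the target.
missing = source.loc[~source["txn_id"].isin(target["txn_id"]), "txn_id"]
print("Keys missing from target:", sorted(missing.unique()))
```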
Environment: ER/Studio, Teradata 13.1, SQL, PL/SQL, BTEQ, DB2, Oracle, MDM, Netezza, ETL, RTF, UNIX, SQL Server 2010, Informatica, SSRS, SSIS, SSAS, SAS, Aginity.