Data Scientist Resume
Colorado
PROFESSIONAL SUMMARY:
- 8+ years of experience in Machine Learning and Data Mining with large datasets of structured and unstructured data, Data Acquisition, Data Validation, Predictive Modeling, and Data Visualization.
- Extensive experience in Text Analytics, developing statistical machine learning and data mining solutions to various business problems, and generating data visualizations using R, Python, and Tableau.
- Designed the physical data architecture of new system engines.
- Hands-on with Spark MLlib utilities including classification, regression, clustering, collaborative filtering, and dimensionality reduction.
- Good experience in NLP with Apache Hadoop and Python.
- Hands-on experience implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis, with good knowledge of Recommender Systems.
- Proficient in Statistical Modeling and Machine Learning techniques (Linear and Logistic Regression, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, Bayesian methods, XGBoost) in Forecasting/Predictive Analytics, Segmentation methodologies, Regression-based models, Hypothesis Testing, Factor Analysis/PCA, and Ensembles; a representative modeling sketch follows this summary.
- Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
- Developing Logical Data Architecture with adherence to Enterprise Architecture.
- Strong experience in Software Development Life Cycle (SDLC) including Requirements Analysis, Design Specification and Testing as per Cycle in both Waterfall and Agile methodologies.
- Adept in statistical programming languages like R and Python, as well as Big Data technologies like Hadoop and Hive.
- Skilled in using dplyr in R and pandas in Python for exploratory data analysis.
- Experience working with data modeling tools like Erwin, PowerDesigner, and ER/Studio.
- Experience in designing Star and Snowflake schemas for Data Warehouse and ODS architectures.
- Experience in designing visualizations using Tableau, and in publishing and presenting dashboards and storylines on web and desktop platforms.
- Technical proficiency in designing and data modeling for online applications; served as solution lead for architecting Data Warehouse/Business Intelligence applications.
- Good understanding of Teradata SQL Assistant, Teradata Administrator and data load/ export utilities like BTEQ, FastLoad, MultiLoad, FastExport.
- Experience with Data Analytics, Data Reporting, Ad-hoc Reporting, Graphs, Scales, Pivot Tables and OLAP reporting.
- Experience in maintaining database architecture and metadata that support the Enterprise Data Warehouse.
- Highly skilled in using Hadoop (Pig and Hive) for basic analysis and extraction of data in the infrastructure to provide data summarization.
- Highly skilled in using visualization tools like Tableau, ggplot2 and d3.js for creating dashboards.
- Extracted data from various database sources such as Oracle, SQL Server, and DB2; regularly used JIRA and other internal issue trackers during project development.
- Skilled in System Analysis, E-R/Dimensional Data Modeling, Database Design and implementing RDBMS specific features.
- Knowledge of working with Proofs of Concept (PoCs) and gap analysis; gathered necessary data for analysis from different sources and prepared it for exploration using data munging and Teradata.
- Well experienced in Normalization and De-Normalization techniques for optimum performance in relational and dimensional database environments.
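A minimal, illustrative sketch of the kind of model comparison described in this summary, using scikit-learn on a synthetic dataset (the data, models, and parameters here are assumptions for illustration, not project code):

# Compare a few of the classifiers named above via cross-validation.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)  # stand-in data
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "svm_rbf": SVC(kernel="rbf"),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)  # 5-fold cross-validation
    print(f"{name}: mean accuracy {scores.mean():.3f}")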
TECHNICAL SKILLS:
Languages: T-SQL, PL/SQL, SQL, C, C++, XML, HTML, DHTML, HTTP, MATLAB, DAX, Python
Databases: SQL Server 2014/2012/2008/2005/2000, MS Access, Oracle 11g/10g/9i, Teradata, Hadoop (Big Data)
DWH / BI Tools: Microsoft Power BI, Tableau, SSIS, SSRS, SSAS, Business Intelligence Development Studio (BIDS), Visual Studio, Crystal Reports, Informatica 6.1
Database Design Tools and Data Modeling: MS Visio, ERWIN 4.5/4.0, Star Schema/Snowflake Schema modeling, Fact & Dimension tables, physical & logical data modeling, Normalization and De-normalization techniques, Kimball & Inmon Methodologies
Tools and Utilities: SQL Server Management Studio, SQL Server Enterprise Manager, SQL Server Profiler, Import & Export Wizard, Visual Studio .NET, Microsoft Management Console, Visual SourceSafe 6.0, DTS, Crystal Reports, Power Pivot, ProClarity, Microsoft Office, Excel Power Pivot, Excel Data Explorer, Tableau, JIRA, Spark MLlib
PROFESSIONAL EXPERIENCE:
Confidential, Colorado
Data Scientist
Responsibilities:
- As an Architect, designed conceptual, logical, and physical models using Erwin and built data marts using hybrid Inmon and Kimball DW methodologies.
- Worked closely with business, data governance, SMEs, and vendors to define data requirements.
- Worked with data investigation, discovery, and mapping tools to scan every single data record from many sources.
- Designed the prototype of the Data mart and documented possible outcome from it for end-user.
- Involved in business process modeling using UML.
- Developed and maintained a data dictionary to create metadata reports for technical and business purposes.
- Created SQL tables with referential integrity and developed queries using SQL, SQL*Plus, and PL/SQL.
- Implemented Spark MLlib utilities including classification, regression, clustering, collaborative filtering, and dimensionality reduction; see the sketch after this section.
- Designed, coded, and unit tested ETL packages for source marts and subject marts using Informatica ETL processes against an Oracle database.
- Developed various QlikView data models by extracting and using data from various source files, DB2, Excel, flat files, and Big Data sources.
- Handled importing data from various data sources, performed transformations using Hive, Map Reduce, and loaded data into HDFS.
- Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions.
- Researched, evaluated, architected, and deployed new tools, frameworks, and patterns to build sustainable Big Data platforms for clients.
- Identified and executed process improvements, hands-on in various technologies such as Oracle, Informatica, and BusinessObjects.
- Designed both 3NF data models for ODS/OLTP systems and dimensional data models using Star and Snowflake schemas.
Environment: Erwin r9.0, Informatica 9.0, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, Rational Rose, Requisite Pro, Hadoop, PL/SQL, etc.
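A minimal sketch of the Spark MLlib work referenced above, using the DataFrame-based spark.ml API; the input path and column names are hypothetical:

# Train a logistic regression classifier with Spark MLlib.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("mllib-sketch").getOrCreate()
df = spark.read.csv("hdfs:///data/training.csv", header=True, inferSchema=True)  # hypothetical path
assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")  # hypothetical columns
train = assembler.transform(df)
model = LogisticRegression(featuresCol="features", labelCol="label").fit(train)
model.transform(train).select("label", "prediction").show(5)
spark.stop()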
Confidential, Indianapolis
Data Scientist
Responsibilities:
- Worked as a Data Modeler/Analyst to generate Data Models using Erwin and developed relational database system.
- Analyzed the business requirements of the project by studying the Business Requirement Specification document.
- Extensively worked with the Erwin Data Modeler tool to design the data models.
- Designed mappings to process incremental changes in the source tables; whenever source data elements were missing from source tables, they were modified/added in consistency with the third-normal-form OLTP source database.
- Designed tables and implemented the naming conventions for Logical and Physical Data Models in Erwin 7.0.
- Provided expertise and recommendations for physical database design, architecture, testing, performance tuning, and implementation.
- Designed logical and physical data models for multiple OLTP and analytic applications.
- Extensively used the Erwin design tool and Erwin Model Manager to create and maintain the Data Mart.
- Designed the physical model for implementation in an Oracle 9i physical database.
- Involved with data analysis, primarily identifying data sets, source data, source metadata, data definitions, and data formats.
- Tuned database performance, including indexing, optimizing SQL statements, and monitoring the server.
- Wrote simple and advanced SQL queries and scripts to create standard and ad-hoc reports for senior managers; see the reporting sketch after this section.
- Collaborated on the data mapping document from source to target and the data quality assessments for the source data.
- Applied expert-level understanding of different databases for data extraction and loading, joining data extracted from different databases and loading it into a specific target database.
- Coordinated with various business users, stakeholders, and SMEs for functional expertise, design and business test scenario reviews, UAT participation, and validation of financial data.
- Worked very closely with Data Architects and the DBA team to implement data model changes in the database across all environments.
- Created PL/SQL packages and database triggers, developed user procedures, and prepared user manuals for the new programs.
- Tuned the existing data warehouse applications to increase the efficiency of the existing system.
- Designed and developed Use Case, Activity, and Sequence Diagrams and OOD (Object-Oriented Design) using UML and Visio.
Environment: SQL Server 2008 R2/2005 Enterprise, SSRS, SSIS, Crystal Reports, Windows Enterprise Server 2000, DTS, SQL Profiler, and Query Analyzer.
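A minimal sketch of the ad-hoc reporting pattern mentioned above, pulling a SQL Server query into a flat file with Python; the connection string, table, and column names are hypothetical:

# Run an ad-hoc query and hand the result off as a flat file.
import pandas as pd
import pyodbc

conn = pyodbc.connect(
    "DRIVER={SQL Server};SERVER=reporting-db;DATABASE=Sales;Trusted_Connection=yes"
)
query = """
    SELECT region, SUM(amount) AS total_sales
    FROM dbo.Orders
    GROUP BY region
"""
report = pd.read_sql(query, conn)               # execute the ad-hoc query
report.to_csv("weekly_sales.csv", index=False)  # deliverable for senior managers
conn.close()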
Confidential, Dallas, TX
Data Scientist
Responsibilities:
- Coded R functions to interface with the Caffe deep learning framework.
- Worked in the Amazon Web Services (AWS) cloud computing environment.
- Worked with several R packages including knitr, dplyr, SparkR, CausalInfer, and spacetime.
- Implemented end-to-end systems for data analytics and data automation, integrated with custom visualization tools, using R, Mahout, Hadoop, and MongoDB.
- Gathered all required data from multiple data sources and created the datasets used in analysis.
- Performed exploratory data analysis and data visualizations using R and Tableau.
- Performed thorough EDA, with univariate and bivariate analysis, to understand intrinsic and combined effects.
- Worked with data governance, data quality, and data lineage teams and data architects to design various models and processes.
- Independently coded new programs and designed tables to load and test the programs effectively for the given PoCs using Big Data/Hadoop.
- Designed data models and data flow diagrams using Erwin and MS Visio.
- As an Architect, implemented an MDM hub to provide clean, consistent data for an SOA implementation.
- Developed, implemented, and maintained the Conceptual, Logical, and Physical Data Models using Erwin for forward/reverse-engineered databases.
- Established Data architecture strategy, best practices, standards, and roadmaps.
- Led the development and presentation of a data analytics data-hub prototype with the help of the other members of the emerging solutions team.
- Performed data cleaning and imputation of missing values using R; a pandas-based sketch of the same pattern follows this section.
- Worked with the Hadoop ecosystem covering HDFS, HBase, YARN, and MapReduce.
- Took up ad-hoc requests from different departments and locations.
- Used Hive to store the data and perform data cleaning steps for huge datasets.
- Created dashboards and visualizations on a regular basis using ggplot2 and Tableau.
- Created customized business reports and shared insights with management.
- Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
- Interacted with other departments to understand and identify data needs and requirements, and worked with other members of the IT organization to deliver data visualization and reporting solutions to address those needs.
Environment: Erwin r9.0, Informatica 9.0, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, Rational Rose, Requisite Pro, Hadoop, PL/SQL, etc.
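A minimal pandas sketch of the missing-value imputation pattern described above (the project itself used R; the file and column names are hypothetical):

# Impute missing values: median for numeric, mode for categorical.
import pandas as pd

df = pd.read_csv("survey.csv")                                 # hypothetical input
df["age"] = df["age"].fillna(df["age"].median())               # numeric: median
df["segment"] = df["segment"].fillna(df["segment"].mode()[0])  # categorical: mode
df = df.dropna(subset=["customer_id"])                         # drop rows missing the key
print(df.isna().sum())                                         # verify remaining nulls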
Confidential, Plano, TX
Data Scientist
Responsibilities:
- Supported MapReduce Programs running on the cluster.
- Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
- Configured the Hadoop cluster with a NameNode and slave nodes and formatted HDFS.
- Used the Oozie workflow engine to run multiple Hive and Pig jobs.
- Maintained MapReduce programs running on the cluster.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing; a Hadoop Streaming sketch of the pattern follows this section.
- Analyzed the partitioned and bucketed data and computed various metrics for reporting.
- Involved in loading data from RDBMS and web logs into HDFS using Sqoop and Flume.
- Worked on loading the data from MySQL to HBase where necessary using Sqoop.
- Developed Hive queries for Analysis across different banners.
- Extracted data from Twitter using Java and the Twitter API; parsed JSON-formatted Twitter data and uploaded it to the database.
- Launched Amazon EC2 cloud instances using Amazon Machine Images (Linux/Ubuntu) and configured the launched instances for specific applications.
- Exported the result set from Hive to MySQL using Sqoop after processing the data.
- Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
- Hands-on experience working with SequenceFile, Avro, and HAR file formats and compression.
- Used Hive to partition and bucket data.
- Experience writing MapReduce programs with the Java API to cleanse structured and unstructured data.
- Wrote Pig Scripts to perform ETL procedures on the data in HDFS.
- Created HBase tables to store various data formats of data coming from different portfolios.
- Worked on improving performance of existing Pig and Hive Queries.
Environment: SQL/Server, Oracle 9i, MS-Office, Teradata, Informatica, ER Studio, XML, Business Objects.
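A minimal sketch of the data-cleaning MapReduce pattern described above, expressed as a Hadoop Streaming job in Python (the original jobs were written in Java; the record layout is hypothetical):

# mapper.py: drop malformed records and normalize a field.
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 3:                       # skip malformed records
        continue
    user, ts, value = fields
    print(f"{user}\t{value.strip().lower()}")  # emit cleaned key/value

# reducer.py: count cleaned records per key (input arrives sorted by key).
import sys

current, count = None, 0
for line in sys.stdin:
    key, _ = line.rstrip("\n").split("\t", 1)
    if key != current:
        if current is not None:
            print(f"{current}\t{count}")
        current, count = key, 0
    count += 1
if current is not None:
    print(f"{current}\t{count}")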
Confidential
Data Architect/Data Modeler
Responsibilities:
- Deployed GUI pages by using JSP, JSTL, HTML, DHTML, XHTML, CSS, JavaScript, AJAX
- Configured the project on WebSphere 6.1 application servers
- Implemented the online application using Core Java, JDBC, JSP, Servlets, EJB 1.1, Web Services, SOAP, and WSDL.
- Communicated with other healthcare information systems using Web Services (SOAP, WSDL, JAX-RPC).
- Used the Singleton, Factory, and DAO design patterns based on the application requirements.
- Used SAX and DOM parsers to parse the raw XML documents; see the parsing sketch after this section.
- Used RAD as Development IDE for web applications.
- Preparing and executing Unit test cases
- Used Log4J logging framework to write Log messages with various levels.
- Involved in fixing bugs and minor enhancements for the front-end modules.
- Used Microsoft Visio and Rational Rose for designing the Use Case, Class, Sequence, and Activity diagrams for the SDLC process of the application.
- Performed functional and technical reviews.
- Supported the testing team for system testing, integration, and UAT.
- Ensured quality in the deliverables.
- Conducted Design reviews and Technical reviews with other project stakeholders.
- Was part of the complete life cycle of the project, from requirements to production support.
- Created test plan documents for all back-end database modules
- Implemented the project in Linux environment.
Environment: R 3.0, Erwin 9.5, Tableau 8.0, MDM, QlikView, MLLib, PL/SQL, HDFS, Teradata 14.1, JSON, HADOOP (HDFS), MapReduce, PIG, Spark, R Studio, MAHOUT, JAVA, HIVE, AWS.
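A minimal illustration of the SAX-versus-DOM trade-off noted above, shown in Python for brevity (the project itself used Java parsers; the XML structure is hypothetical):

# DOM loads the whole tree; SAX streams events, suiting large documents.
import xml.sax
from xml.dom.minidom import parseString

doc = "<claims><claim id='1'>ok</claim><claim id='2'>review</claim></claims>"

dom = parseString(doc)                       # DOM: build the full tree, then navigate
for node in dom.getElementsByTagName("claim"):
    print("DOM:", node.getAttribute("id"))

class ClaimHandler(xml.sax.ContentHandler):  # SAX: react to events as they stream by
    def startElement(self, name, attrs):
        if name == "claim":
            print("SAX:", attrs.getValue("id"))

xml.sax.parseString(doc.encode(), ClaimHandler())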
Confidential
Data Analyst/Data Modeler
Responsibilities:
- Worked with project team representatives to ensure that logical and physical ER/Studio data models were developed in line with corporate standards and guidelines.
- Involved in defining the source to target data mappings, business rules, data definitions.
- Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
- Responsible for defining the key identifiers for each mapping/interface.
- Responsible for defining the functional requirement documents for each source to target interface.
- Documented, clarified, and communicated change requests with the requestor, and coordinated with the development and testing teams.
- Worked with users to identify the most appropriate source of record and profiled the data required for sales and service.
- Documented the complete process flow, describing program development, logic, testing, implementation, application integration, and coding.
- Involved in defining the business/transformation rules applied to sales and service data.
- Defined the list codes and code conversions between the source systems and the data mart.
- Worked with internal architects, assisting in the development of current and target state data architectures.
- Coordinated with business users to design new reporting needs in an appropriate, effective, and efficient way based on existing functionality.
- Remained knowledgeable in all areas of business operations in order to identify systems needs and requirements.
- Implemented a Metadata Repository; maintained data quality, data cleanup procedures, transformations, data standards, the Data Governance program, scripts, stored procedures, triggers, and execution of test plans.
- Performed data quality checks in Talend Open Studio; a pandas-based illustration of similar checks follows this section.
- Coordinated meetings with vendors to define requirements and system interaction agreement documentation between client and vendor system.
- Updated the Enterprise Metadata Library with any changes or updates.
- Document data quality and traceability documents for each source interface.
- Established standard operating procedures.
- Generated weekly and monthly asset inventory reports.
Environment: Erwin r7.0, SQL Server 2000/2005, Windows XP/NT/2000, Oracle 8i/9i, MS-DTS, UML, UAT, SQL Loader, OOD, OLTP, PL/SQL, MS Visio, Informatica.
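A minimal pandas illustration of the kinds of data-quality checks described above (the project used Talend Open Studio; the file and column names are hypothetical):

# Basic source-to-target checks: completeness, uniqueness, reconciliation.
import pandas as pd

src = pd.read_csv("source_extract.csv")   # hypothetical source extract
tgt = pd.read_csv("target_load.csv")      # hypothetical target load

print("null key rows:", src["account_id"].isna().sum())        # completeness
print("duplicate keys:", src["account_id"].duplicated().sum())  # uniqueness
orphans = set(tgt["account_id"]) - set(src["account_id"])       # referential check
print("target rows with no source match:", len(orphans))
print("row count delta:", len(src) - len(tgt))                  # reconciliation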