Data Scientist Resume
FL
SUMMARY
- 8+ years of experience in Machine Learning and Data Mining with large datasets of structured and unstructured data, Data Acquisition, Data Validation, Predictive Modeling, and Data Visualization.
- Extensive experience in Text Analytics, developing statistical Machine Learning and Data Mining solutions for various business problems, and generating data visualizations using R, Python, and Tableau.
- Designed the physical data architecture of new system engines.
- Hands-on experience with Spark MLlib utilities, including classification, regression, clustering, collaborative filtering, and dimensionality reduction (a clustering sketch follows the Technical Skills section).
- Good experience in NLP with Apache Hadoop and Python.
- Hands-on experience in implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis, with good knowledge of Recommender Systems.
- Proficient in Statistical Modeling and Machine Learning techniques (Linear and Logistic Regression, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, Bayesian methods, XGBoost) for Forecasting/Predictive Analytics, Segmentation methodologies, Regression-based models, Hypothesis Testing, Factor Analysis/PCA, and Ensembles; a brief workflow sketch follows this summary.
- Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
- Developed Logical Data Architecture with adherence to Enterprise Architecture.
- Strong experience in the Software Development Life Cycle (SDLC), including Requirements Analysis, Design Specification, and Testing, in both Waterfall and Agile methodologies.
- Adept in statistical programming languages like R and Python, as well as Big Data technologies like Hadoop and Hive.
- Identified, recommended, and designed system solutions for automation and process improvement opportunities.
- Skilled in using dplyr in R and pandas in Python for exploratory data analysis.
- Experience working with data modeling tools like Erwin, PowerDesigner, and ER/Studio.
- Experience in designing Star and Snowflake schemas for Data Warehouse and ODS architectures.
- Experience in designing visualizations using Tableau, and in publishing and presenting dashboards and storylines on web and desktop platforms.
- Technical proficiency in designing and data modeling for online applications; solution lead for architecting Data Warehouse/Business Intelligence applications.
- Good understanding of Teradata SQL Assistant, Teradata Administrator, and data load/export utilities like BTEQ, FastLoad, MultiLoad, and FastExport.
- Experience with Data Analytics, Data Reporting, Ad-hoc Reporting, Graphs, Scales, Pivot Tables and OLAP reporting.
- Experience in maintaining the database architecture and metadata that support the Enterprise Data Warehouse.
- Understanding of SAS and R programming; conducted data extraction, analysis, profiling, and composite reporting with SAS, R, SQL, Python, and Tableau, including performing joins.
- Expertise in data fact collection, root cause analysis of business problems, and formulation of key findings for management.
- Highly skilled in using Hadoop (Pig and Hive) for basic analysis and extraction of data in the infrastructure to provide data summarization.
- Highly skilled in using visualization tools like Tableau, ggplot2 and d3.js for creating dashboards.
- Worked with and extracted data from various database sources like Oracle, SQL Server, and DB2; regularly used JIRA and other internal issue trackers for project development.
- Skilled in System Analysis, E-R/Dimensional Data Modeling, Database Design, and implementing RDBMS-specific features.
- Knowledge of working with Proofs of Concept (PoCs) and gap analysis; gathered data for analysis from different sources and prepared it for exploration using data munging and Teradata.
- Well experienced in Normalization and De-Normalization techniques for optimum performance in relational and dimensional database environments.
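Below is a minimal, illustrative sketch of the supervised-learning workflow referenced above, using scikit-learn; the file name, columns, and target are hypothetical placeholders, not from any actual project:

```python
# Sketch of a typical classification workflow (hypothetical data).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# "customers.csv" and its columns are placeholders.
df = pd.read_csv("customers.csv")
X = df.drop(columns=["churned"])
y = df["churned"]

# Hold out 20% of rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```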
TECHNICAL SKILLS
Languages: T-SQL, PL/SQL, SQL, C, C++, XML, HTML, DHTML, HTTP, MATLAB, DAX, Python
Databases: SQL Server 2014/2012/2008/2005/2000, MS Access, Oracle 11g/10g/9i, Teradata, Big Data (Hadoop)
DWH / BI Tools: Microsoft Power BI, Tableau, SSIS, SSRS, SSAS, Business Intelligence Development Studio (BIDS), Visual Studio, Crystal Reports, Informatica 6.1.
Database Design Tools and Data Modeling: MS Visio, ERWIN 4.5/4.0, Star Schema/Snowflake Schema modeling, Fact & Dimension tables, physical & logical data modeling, Normalization and De-normalization techniques, Kimball & Inmon methodologies
Tools and Utilities: SQL Server Management Studio, SQL Server Enterprise Manager, SQL Server Profiler, Import & Export Wizard, Visual Studio .NET, Microsoft Management Console, Visual SourceSafe 6.0, DTS, Crystal Reports, Power Pivot, ProClarity, Microsoft Office, Excel Power Pivot, Excel Data Explorer, Tableau, JIRA, Spark MLlib
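As a companion to the Spark MLlib experience noted in the summary, here is a minimal clustering sketch using the DataFrame-based MLlib API; the HDFS path and feature column names are hypothetical assumptions:

```python
# Sketch: k-means clustering with Spark MLlib (DataFrame API).
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("mllib-sketch").getOrCreate()

# Hypothetical input path and feature columns.
df = spark.read.csv("hdfs:///data/features.csv", header=True, inferSchema=True)
assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")

# Fit k-means on the assembled feature vectors.
model = KMeans(k=5, seed=1, featuresCol="features").fit(assembler.transform(df))
print(model.clusterCenters())
```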
PROFESSIONAL EXPERIENCE
Confidential, FL
Data Scientist
Responsibilities:
- As an architect, designed conceptual, logical, and physical models using Erwin and built data marts using hybrid Inmon and Kimball DW methodologies.
- Worked closely with business, data governance, SMEs, and vendors to define data requirements.
- Worked with data investigation, discovery and mapping tools to scan every single data record from many sources.
- Designed the prototype of the data mart and documented its possible outcomes for end users.
- Involved in business process modeling using UML
- Developed and maintained a data dictionary to create metadata reports for technical and business purposes.
- Designed and automated the score-cut process, achieving increased close and good rates, using advanced R programming.
- Involved in prediction model building, machine learning, business process improvements, visualization, and process implementation with R programming and DeepSee.
- Redesigned and developed SAS applications against a Netezza database, reducing application run time from 40 hours to 20 seconds using PostgreSQL, nzsql, Aginity Workbench, and SAS.
- Managed datasets using pandas DataFrames and MySQL; queried the MySQL relational database (RDBMS) from Python using the MySQLdb connector package to retrieve information (a brief sketch follows this list).
- Created SQL tables with referential integrity and developed queries using SQL, SQL*Plus, and PL/SQL.
- Utilized standard Python modules such as csv, itertools and pickle for development.
- Formulated procedures for integrating R programming plans with data sources and delivery systems; R was used for prediction.
- Implemented Spark MLlib utilities, including classification, regression, clustering, collaborative filtering, and dimensionality reduction.
- Designed, coded, and unit tested ETL packages for source marts and subject marts using Informatica ETL processes against an Oracle database.
- Developed statistical analysis and response modeling (logistic regression) for analytical database contributors.
- Developed various QlikView data models by extracting and using data from various source files, DB2, Excel, flat files, and Big Data.
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
- Tech stack: Python 2.7, PyCharm, Anaconda, pandas, NumPy, unittest, R, Oracle.
- Applied unsupervised and supervised learning methods to analyze high-dimensional data, with proficient use of the Python scikit-learn, pandas, and NumPy packages.
- Interacted with business analysts, SMEs, and other data architects to understand business needs and functionality for various project solutions.
- Performed data modeling operations using Power BI, pandas, and SQL.
- Utilized Python libraries including wxPython, NumPy, Twisted, matplotlib, and Beautiful Soup.
- Researched, evaluated, architected, and deployed new tools, frameworks, and patterns to build sustainable Big Data platforms for clients.
- Identified and executed process improvements; hands-on in various technologies such as Oracle, Informatica, and BusinessObjects.
- Generated graphical reports and built graphs for business decision-making using the Python NumPy and matplotlib packages.
- Designed both 3NF data models for ODS and OLTP systems and dimensional data models using Star and Snowflake schemas.
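A minimal sketch of the pandas/MySQLdb retrieval pattern mentioned above; the connection parameters, table, and columns are hypothetical placeholders:

```python
# Sketch: query MySQL from Python via MySQLdb into a pandas DataFrame.
import MySQLdb
import pandas as pd

# Connection parameters and table/column names are placeholders.
conn = MySQLdb.connect(host="localhost", user="analyst",
                       passwd="secret", db="sales")
try:
    # pandas issues the query over the DB-API connection.
    df = pd.read_sql("SELECT region, revenue FROM orders", conn)
    print(df.groupby("region")["revenue"].sum())
finally:
    conn.close()
```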
Environment: Erwin r9.0, Informatica 9.0, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, Rational Rose, Requisite Pro, Hadoop, PL/SQL, SAS, etc.
Confidential, Dallas, Texas
Data Scientist
Responsibilities:
- Worked as a Data Modeler/Analyst to generate data models using Erwin and developed a relational database system.
- Analyzed the business requirements of the project by studying the Business Requirement Specification document.
- Participated in the installation of SAS/EBI on the Linux platform.
- Extensively worked with Erwin Data Modeler to design the data models.
- Designed mappings to process the incremental changes that exist in the source tables.
- Whenever source data elements were missing in source tables, they were modified or added in consistency with the third-normal-form OLTP source database.
- Designed tables and implemented naming conventions for logical and physical data models in Erwin 7.0.
- Knowledge of Information Extraction and NLP algorithms coupled with Deep Learning (ANN and CNN) using Theano, Keras, and TensorFlow.
- Provided expertise and recommendations for physical database design, architecture, testing, performance tuning, and implementation.
- Designed logical and physical data models for multiple OLTP and analytic applications.
- Extensively used the Erwin design tool and Erwin Model Manager to create and maintain the data mart.
- Designed the physical model for implementation in an Oracle 9i physical database.
- Involved in data analysis, primarily identifying datasets, source data, source metadata, data definitions, and data formats.
- Performed performance tuning of the database, including indexing, optimizing SQL statements, and monitoring the server.
- Wrote simple and advanced SQL queries and scripts to create standard and ad-hoc reports for senior managers.
- Collaborated on the data mapping document from source to target and the data quality assessments for the source data.
- Used expert-level understanding of different databases in combination for data extraction and loading, joining data extracted from different databases and loading it into a specific database.
- Built and trained a deep learning network on the data using TensorFlow, reducing wafer scrap by 15% by predicting the likelihood of wafer damage (a brief sketch follows this list).
- Coordinated with various business users, stakeholders, and SMEs for functional expertise, design and business test scenario reviews, UAT participation, and validation of financial data.
- Worked very closely with data architects and the DBA team to implement data model changes in the database across all environments.
- A combination of z-plot features, image features (pigmentation), and probe features was used.
- Created PL/SQL packages and database triggers, developed user procedures, and prepared user manuals for the new programs.
- Improved the performance of existing data warehouse applications to increase the efficiency of the existing system.
- Designed and developed use case, activity, and sequence diagrams and performed object-oriented design (OOD) using UML and Visio.
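A minimal sketch of the kind of binary classifier described in the wafer-scrap bullet, using Keras on TensorFlow; the network shape and the stand-in data are illustrative assumptions, not the actual production model:

```python
# Sketch: small feed-forward network predicting P(wafer damage).
import numpy as np
from tensorflow import keras

n_features = 20  # placeholder feature count

model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(n_features,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),  # damage probability
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# Random stand-in data; real training would use measured wafer features.
X = np.random.rand(1000, n_features)
y = np.random.randint(0, 2, size=1000)
model.fit(X, y, epochs=5, batch_size=32, validation_split=0.2)
```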
Environment: SQL Server 2008R2 / 2005 Enterprise, SSRS, SSIS, Crystal Reports, Windows Enterprise Server 2000, DTS, SQL Profiler, and Query Analyzer.
Confidential, Dallas, TX
Data Scientist
Responsibilities:
- Coded R functions to interface with the Caffe deep learning framework.
- Worked in the Amazon Web Services (AWS) cloud computing environment.
- Worked with several R packages, including knitr, dplyr, SparkR, CausalInfer, and spacetime.
- Implemented end-to-end systems for data analytics and data automation, integrated with custom visualization tools, using R, Mahout, Hadoop, and MongoDB.
- Gathered all required data from multiple data sources and created the datasets used in analysis.
- Experienced in Artificial Neural Network (ANN) and Deep Learning models using the Theano, TensorFlow, and Keras packages in Python.
- Performed exploratory data analysis and data visualization using R and Tableau.
- Performed thorough EDA, with univariate and bivariate analysis, to understand intrinsic and combined effects.
- Worked with data governance, data quality, data lineage, and data architects to design various models and processes.
- Independently coded new programs and designed tables to load and test programs effectively for the given PoCs using Big Data/Hadoop.
- Designed data models and data flow diagrams using Erwin and MS Visio.
- As an architect, implemented an MDM hub to provide clean, consistent data for an SOA implementation.
- Developed, implemented, and maintained conceptual, logical, and physical data models using Erwin for forward/reverse-engineered databases.
- Established data architecture strategy, best practices, standards, and roadmaps.
- Led the development and presentation of a data analytics data-hub prototype with the help of other members of the emerging solutions team.
- Performed data cleaning and imputation of missing values using R (a brief sketch follows this list).
- Worked with the Hadoop ecosystem, covering HDFS, HBase, YARN, and MapReduce.
- Took up ad-hoc requests from different departments and locations.
- Used Hive to store the data and perform data cleaning steps on huge datasets.
- Created dashboards and visualizations on a regular basis using ggplot2 and Tableau.
- Created customized business reports and shared insights with management.
- Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
- Interacted with other departments to understand and identify data needs and requirements, and worked with other members of the IT organization to deliver data visualization and reporting solutions that address those needs.
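The imputation step above was done in R; purely as an illustration, here is the same idea sketched in Python with pandas and scikit-learn (toy data, not project data):

```python
# Sketch: median imputation of missing values (toy data).
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({"age": [25, np.nan, 40],
                   "income": [50e3, 60e3, np.nan]})

# Replace each NaN with its column's median.
imputer = SimpleImputer(strategy="median")
clean = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
print(clean)
```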
Environment: Erwin r9.0, Informatica 9.0, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, Rational Rose, Requisite Pro, Hadoop, PL/SQL, etc.
Confidential, Minneapolis MN
Data Scientist/ Python Developer
Responsibilities:
- Supported MapReduce Programs running on the cluster.
- Worked on development of internal testing tool framework written in Python.
- Developed a GUI using Python and Django for dynamically displaying block documentation and other features of Python code in a web browser.
- Used JavaScript and JSON to update a portion of a webpage.
- Followed the SDLC process and used PHP to develop website functionality.
- Performed extensive code reviews using GitHub pull requests, improved code quality, and conducted meetings among peers.
- Used Django configuration to manage URLs and application parameters.
- Scraped and retrieved web data as JSON using Scrapy and presented it with the pandas library (a brief sketch follows this list).
- Worked on Python OpenStack APIs.
- Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
- Configured a Hadoop cluster with the NameNode and slave nodes, and formatted HDFS.
- Used Oozie workflow engine to run multiple Hive and Pig jobs.
- Ran MapReduce programs on the cluster.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Successfully migrated the Django database from SQLite to MySQL to PostgreSQL with complete data integrity.
- Used RESTful APIs to gather network traffic data from servers.
- Supported the Apache Tomcat web server on the Linux platform.
- Designed and created backend data access modules using PL/SQL stored procedures.
- Involved in User Acceptance Testing and prepared UAT Test Scripts.
- Built database models, views, and APIs using Python for interactive web-based solutions.
- Placed data into JSON files using Python to test Django websites; used Python scripts to update database content and manipulate files.
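A minimal sketch of the Scrapy-to-JSON pattern mentioned above; the target site (Scrapy's public demo site) and the CSS selectors are placeholders, not the actual project targets:

```python
# Sketch: a Scrapy spider yielding JSON-serializable items.
import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ["http://quotes.toscrape.com/"]

    def parse(self, response):
        # Selectors below are placeholders for the real page structure.
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
            }

# Run with: scrapy runspider spider.py -o quotes.json
# Then inspect with pandas: pd.read_json("quotes.json")
```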
Environment: Hadoop, MapReduce, Python
Confidential
Data Architect/Data Modeler
Responsibilities:
- Deployed GUI pages using JSP, JSTL, HTML, DHTML, XHTML, CSS, JavaScript, and AJAX.
- Configured the project on WebSphere 6.1 application servers
- Implemented the online application using Core Java, JDBC, JSP, Servlets, EJB 1.1, Web Services, SOAP, and WSDL.
- Communicated with other healthcare systems using Web Services with the help of SOAP, WSDL, and JAX-RPC.
- Used the Singleton, Factory, and DAO design patterns based on the application requirements.
- Used SAX and DOM parsers to parse raw XML documents (a brief sketch follows this list).
- Used RAD as Development IDE for web applications.
- Prepared and executed unit test cases.
- Used the Log4j logging framework to write log messages at various levels.
- Involved in fixing bugs and minor enhancements for the front-end modules.
- Used Microsoft Visio and Rational Rose to design the use case diagrams, class models, sequence diagrams, and activity diagrams for the SDLC process of the application.
- Performed functional and technical reviews.
- Worked in the testing team on system testing, integration testing, and UAT.
- Ensured quality in the deliverables.
- Conducted Design reviews and Technical reviews with other project stakeholders.
- Participated in the complete life cycle of the project, from requirements to production support.
- Created test plan documents for all back-end database modules
- Implemented the project in Linux environment.
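The XML parsing in this project was done with Java SAX/DOM parsers; purely as an illustration of the DOM-parsing idea, here is an equivalent sketch using Python's standard library (the XML snippet is hypothetical):

```python
# Sketch: DOM-style parsing of a raw XML document (Python stdlib).
from xml.dom.minidom import parseString

# Hypothetical healthcare-style XML payload.
raw = "<patients><patient id='1'><name>Jane</name></patient></patients>"
doc = parseString(raw)

# Walk the DOM tree and read attributes and text nodes.
for node in doc.getElementsByTagName("patient"):
    name = node.getElementsByTagName("name")[0].firstChild.data
    print(node.getAttribute("id"), name)
```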
Environment: R 3.0, Erwin 9.5, Tableau 8.0, MDM, QlikView, MLlib, PL/SQL, HDFS, Teradata 14.1, JSON, Hadoop (HDFS), MapReduce, Pig, Spark, RStudio, Mahout, Java, Hive, AWS.
Confidential
Data Analyst/Data Modeler
Responsibilities:
- Developed entire frontend and backend modules using Python on Django, including the Tastypie web framework, using GitHub.
- Used Python and Django for creating graphics, XML processing, data exchange, and business logic implementation.
- Used the Model-View-Controller (MVC) framework to build modular and maintainable applications.
- Good knowledge of Python and the Python web framework Django.
- Experience in writing APIs and web services in PHP and Python.
- Created test cases during two-week sprints using Agile methodology.
- Designed data visualization to present current impact and growth.
- Experienced in developing internal auxiliary web apps using the Python Flask framework with CSS/HTML.
- Coded in Python utilizing the Web2py framework with the MVC methodology.
- Developed and tested various dashboard features using CSS, JavaScript, Django, and Bootstrap.
- Implemented client side logic using jQuery and JavaScript.
- Extracted data from sources and loaded it to generate CSV data files using Python programming and SQL queries.
- Experience in writing complex SQL queries and extracting data from Oracle, MS SQL Server, and IBM DB2 databases.
- Expertise in performing Data Analysis and Data Visualizations using R, Python and Tableau.
- Developed a server-based web traffic statistical analysis tool using Flask and pandas (a brief sketch follows this list).
- Created a multi-programmer development workflow using technologies such as Git and SSH.
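A minimal sketch of the Flask/pandas traffic-statistics tool mentioned above; the log file name and columns are hypothetical assumptions:

```python
# Sketch: a tiny Flask endpoint serving pandas-computed traffic stats.
import pandas as pd
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/stats")
def stats():
    # "access_log.csv" and its columns are placeholders.
    df = pd.read_csv("access_log.csv")
    summary = df.groupby("path")["bytes"].agg(["count", "mean"])
    return jsonify(summary.to_dict(orient="index"))

if __name__ == "__main__":
    app.run(debug=True)
```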
Environment: Python, Django, JavaScript, HTML, XML, CSS, Bootstrap, jQuery, CSV, R, Tableau, Flask, pandas, GitHub, Oracle, sqlite3, Excel, SQL Server 2012, MS Office, SQL, MySQL, Windows.