
Data Scientist Resume

Fort Worth, TX

SUMMARY

  • Over 8 years of experience in Machine Learning and Data Mining with large structured and unstructured datasets, spanning Data Acquisition, Data Validation, Predictive Modeling, and Data Visualization.
  • Experienced in developing Logical Data Architecture in adherence to Enterprise Architecture standards.
  • Good domain knowledge of Retail, BFSI, and Airlines.
  • Expertise in the complete software development life cycle (Analysis, Design, Development, Testing, and Implementation) within the Hadoop ecosystem, with extensive experience in Text Mining.
  • Integration Architect and Data Scientist experience in Analytics, Big Data, BPM, SOA, ETL, and Cloud technologies.
  • Highly skilled in using visualization tools such as Tableau, ggplot2, and d3.js for creating dashboards.
  • Well experienced in Normalization and De-Normalization techniques for optimum performance in relational and dimensional database environments.
  • Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
  • Extensive experience in Text Analytics, developing statistical Machine Learning and Data Mining solutions to various business problems, and generating data visualizations using R and Tableau.
  • Experienced in designing the Physical Data Architecture of new system engines.
  • Experience in designing compelling visualizations using Tableau and publishing and presenting dashboards and Storylines on web and desktop platforms.
  • Proficient in Statistical Modeling and Machine Learning techniques (Linear and Logistic Regression, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, Bayesian methods, XGBoost) applied to Forecasting/Predictive Analytics, Segmentation methodologies, Regression-based models, Hypothesis Testing, Factor Analysis/PCA, and Ensembles.
  • Hands-on experience implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis, with good knowledge of Recommender Systems (a short R sketch follows this list).
  • Experience with advanced SAS programming techniques such as PROC SQL (JOIN/UNION), PROC APPEND, PROC DATASETS, and PROC TRANSPOSE.
  • Knowledge of working with Proofs of Concept (PoCs) and gap analysis; gathered the data needed for analysis from different sources and prepared it for exploration using data munging.
  • Experience in machine learning techniques and algorithms such as k-NN, Naive Bayes, SVM, and Decision Forests.
  • Automated Quality Control reporting tools using R for product commercialization, resulting in more efficient reporting.
  • Facilitated and helped translate complex quantitative methods into simplified solutions for end users.
  • Highly experienced and knowledgeable in developing analytics and statistical models per organizational requirements, with the ability to produce alternative cost-effective and efficient models.
  • Analyzed data using R, Perl, and Hadoop, and queried data in structured and unstructured databases.
  • Experience in foundational machine learning models and concepts: regression, random forest, boosting, GBM, NNs, HMMs, CRFs, MRFs, and deep learning.
  • Extensive experience in Hive, Sqoop, Flume, Hue, and Oozie.
  • Worked with and extracted data from various database sources, including Oracle, SQL Server, DB2, and Teradata.
  • Skilled in System Analysis, E-R/Dimensional Data Modeling, Database Design, and implementing RDBMS-specific features.
  • Regularly used JIRA and other internal issue trackers for project development.
  • Developed data variation analysis, data aggregation analysis, and data pair association analysis in the Bioinformatics field.
  • Designed and implemented web applications with customized data visualizations using R Shiny (GUI), helping end users visualize data in real time.
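
As an illustration of the modeling workflow above, here is a minimal R sketch of training and evaluating a Random Forest classifier. It assumes the randomForest package is installed and uses the built-in iris data in place of any project data.

    # Train/test split on the built-in iris data (stand-in for project data)
    library(randomForest)
    set.seed(42)
    train_idx <- sample(nrow(iris), 0.7 * nrow(iris))
    train <- iris[train_idx, ]
    test  <- iris[-train_idx, ]

    # Fit a 500-tree Random Forest predicting species from the measurements
    model <- randomForest(Species ~ ., data = train, ntree = 500, importance = TRUE)

    # Out-of-sample accuracy and variable importance
    pred <- predict(model, newdata = test)
    mean(pred == test$Species)
    importance(model)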

TECHNICAL SKILLS

Languages: T-SQL, PL/SQL, SQL, C, C++, XML, HTML, DHTML, HTTP, MATLAB, DAX.

Databases: SQL Server 2014/2012/2008/2005/2000, MS Access, Oracle 11g/10g/9i, Teradata, Hadoop (Big Data)

DWH / BI Tools: Microsoft Power BI, Tableau, SSIS, SSRS, SSAS, Business Intelligence Development Studio (BIDS), Visual Studio, Crystal Reports, Informatica 6.1.

Database Design Tools and Data Modeling: MS Visio, Erwin 4.5/4.0, Star Schema/Snowflake Schema modeling, Fact & Dimension tables, physical & logical data modeling, Normalization and De-normalization techniques, Kimball & Inmon Methodologies

Tools and Utilities: SQL Server Management Studio, SQL Server Enterprise Manager, SQL Server Profiler, Import & Export Wizard, Visual Studio .NET, Microsoft Management Console, Visual SourceSafe 6.0, DTS, Crystal Reports, Power Pivot, ProClarity, Microsoft Office, Excel Power Pivot, Excel Data Explorer, Tableau, JIRA, Spark MLlib

PROFESSIONAL EXPERIENCE

Confidential, Fort Worth, TX

Data Scientist

Responsibilities:

  • Performing data profiling and analysis on the different source systems (CDH, RDH, CBS, TRUX) required for Customer Master.
  • Using T-SQL queries to pull data from disparate systems and the Data Warehouse across different environments.
  • Working closely with the Data Governance Office team to assess the source systems for project deliverables.
  • Presenting DQ analysis reports and scorecards on all validated data elements to the business teams and stakeholders.
  • Extensively using open-source tools such as RStudio for statistical analysis and for building machine learning models (see the R sketch after this list).
  • Defining source-to-target data mappings, business rules, and data definitions.
  • Performing Data Validation / Data Reconciliation between disparate source and target systems (Salesforce, Cisco-UIC, Cognos, Data Warehouse) for various projects.
  • Interacting with business teams and Project Managers to clearly articulate anomalies, issues, and findings during data validation.
  • Interacting with the ETL and BI teams to understand and support various ongoing projects.
  • Extensively using MS Excel (Pivot Tables, VLOOKUP) for data validation.
  • Creating automated metrics using complex databases.
  • Providing analytical network support to improve quality and standard work results.
  • Utilizing a broad variety of statistical and big data packages, including SAS, R, Spark MLlib, Hadoop, MapReduce, and Pig.
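
A minimal R sketch of the pull-and-validate step above: it runs a T-SQL query over ODBC and applies quick duplicate and null-rate checks. The DBI and odbc packages are assumed installed; the server, database, and table names are hypothetical.

    library(DBI)
    library(odbc)

    # Connect to the warehouse (hypothetical server and database names)
    con <- dbConnect(odbc::odbc(),
                     Driver   = "ODBC Driver 17 for SQL Server",
                     Server   = "dw-prod",
                     Database = "CustomerMaster",
                     Trusted_Connection = "yes")

    # Pull candidate records with a T-SQL query (hypothetical table)
    customers <- dbGetQuery(con, "
      SELECT customer_id, email, created_dt
      FROM dbo.customer_stage")

    # Simple DQ checks: duplicate keys and per-column null rates
    sum(duplicated(customers$customer_id))
    colMeans(is.na(customers))

    dbDisconnect(con)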

Environment: SQL Server 2014, DB2, R, ETL, MS Office Suite - Excel (Pivot, VLOOKUP), Visio, Spyder 3.6, Word, Azure, HP ALM 12, Agile, MDM 10.2, Data Quality, RStudio, Tableau, SharePoint, Reference Data Management, Data Governance.

Confidential, Dallas, TX

Data Scientist

Responsibilities:

  • As an Architect, designed conceptual, logical, and physical models using Erwin and built data marts using hybrid Inmon and Kimball DW methodologies.
  • Worked closely with business, data governance, SMEs, and vendors to define data requirements.
  • Designed the prototype of the Data Mart and documented its possible outcomes for end users.
  • Involved in business process modeling using UML.
  • Implemented Spark MLlib utilities for classification, regression, clustering, collaborative filtering, and dimensionality reduction (a sketch follows this list).
  • Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
  • Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions.
  • Researched, evaluated, architected, and deployed new tools, frameworks, and patterns to build sustainable Big Data platforms for clients.
  • Identified and executed process improvements; hands-on in various technologies such as Oracle and BusinessObjects.
Environment: Erwin r9.0, Informatica 9.0, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, Rational Rose, Requisite Pro, Hadoop, PL/SQL.

Confidential - King of Prussia, PA

Data Scientist

Responsibilities:

  • Developing Data Mapping, Data Governance, Transformation, and Cleansing rules for the Master Data Management (MDM) architecture involving OLTP, ODS, and OLAP.
  • Conducting JAD sessions, writing meeting minutes, and collecting and analyzing requirements from business users.
  • Defining source-to-target data mappings, business rules, and data definitions.
  • Transforming files received from clients for consumption by SQL Server.
  • Providing source-to-target mappings to the ETL team to perform initial, full, and incremental loads into the target data mart.
  • Performing data profiling on the source systems required for transferring data to ECH, using Informatica Analyst 10.1/9.6.1.
  • Working closely with the ETL, SSIS, and SSRS developers to explain complex data transformation logic.
  • Performing data profiling, cleansing, integration, and extraction using tools such as Informatica.
  • Utilizing the Informatica toolset (Informatica Data Explorer and Informatica Data Quality) to profile legacy data.
  • Applying data cleansing/data scrubbing techniques to ensure consistency among data sets.
  • Worked on DTS packages and DTS Import/Export for transferring data between SQL Server databases.
  • Using HP Quality Center v11 for defect tracking.
  • Performing end-to-end Informatica ETL testing for custom tables by writing complex SQL queries on the source database and comparing the results against the target database (see the reconciliation sketch after this list).
  • Extracting source data from Oracle tables, MS SQL Server, sequential files, and Excel sheets.
Environment: Informatica Analyst 10.1/9.6.1, PowerCenter 9.x/8.1, MDM, MS Excel, Agile, SQL Server, SOA, SSIS, SSRS, Oracle 10g, Metadata, IDQ, IDD, UNIX, T-SQL, HP Quality Center 11, RDM (Reference Data Management), Data Governance, Data Lineage, ETL.

Confidential, Minneapolis, MN

Data Scientist

Responsibilities:

  • Worked with project team representatives to ensure that logical and physical ER/Studio data models were developed in line with corporate standards and guidelines.
  • Involved in defining source-to-target data mappings, business rules, and data definitions.
  • Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
  • Responsible for defining the key identifiers for each mapping/interface.
  • Responsible for defining the functional requirement documents for each source-to-target interface.
  • Documented, clarified, and communicated change requests with the requestor and coordinated with the development and testing teams.
  • Worked with users to identify the most appropriate source of record and profiled the data required for sales and service.
  • Documented the complete process flow, describing program development, logic, testing, implementation, application integration, and coding.
  • Involved in defining the business/transformation rules applied to sales and service data.
  • Defined the list codes and code conversions between the source systems and the data mart.
  • Worked with internal architects, assisting in the development of current- and target-state data architectures.
  • Coordinated with business users to design new reporting needs in an appropriate, effective, and efficient way based on existing functionality.
  • Remained knowledgeable in all areas of business operations in order to identify systems needs and requirements.
Environment: Erwin r7.0, SQL Server 2000/2005, Windows XP/NT/2000, Oracle 8i/9i, MS-DTS, UML, UAT, SQL Loader, OOD, OLTP, PL/SQL, MS Visio, Informatica.

Confidential

Data Modeler

Responsibilities:

  • Developed applications of Machine Learning, Statistical Analysis, and Data Visualization for challenging data processing problems in the sustainability and biomedical domains.
  • Designed and developed Natural Language Processing models for sentiment analysis.
  • Applied concepts of probability, distributions, and statistical inference to the given datasets to unearth findings through comparisons, t-tests, F-tests, R-squared, p-values, etc. (see the R sketch after this list).
  • Used predictive modeling tools in SAS, SPSS, and R.
  • Applied clustering algorithms such as Hierarchical and K-means with the help of scikit-learn and SciPy.
  • Developed visualizations and dashboards using ggplot2 and Tableau.
  • Expertise in Business Intelligence and data visualization using R and Tableau.
  • Built and analyzed datasets using R, SAS, and MATLAB (in decreasing order of usage).
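
A minimal base-R sketch of the inference toolkit named above (t-test, F-test, R-squared, p-values), using the built-in mtcars data in place of the project datasets.

    # Two-sample t-test: does fuel economy differ between transmission types?
    t_result <- t.test(mpg ~ am, data = mtcars)
    t_result$p.value

    # F-test for equality of variances between the same two groups
    f_result <- var.test(mpg ~ am, data = mtcars)
    f_result$p.value

    # Linear model: R-squared and coefficient p-values come from the summary
    fit <- lm(mpg ~ wt + hp, data = mtcars)
    summary(fit)$r.squared
    coef(summary(fit))[, "Pr(>|t|)"]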

Environment: Machine learning, AWS, MS Azure, Cassandra, Spark, HDFS, Hive, Pig, Linux, R, SAS, SPSS, MySQL, Eclipse, PL/SQL, SQL connector, Tableau.

Confidential

Data Modeler

Responsibilities:

  • Implemented the online application using Core Java, JDBC, JSP, Servlets, EJB 1.1, Web Services, SOAP, and WSDL.
  • Communicated with other health care systems using Web Services with the help of SOAP, WSDL, and JAX-RPC.
  • Used Singleton, Factory, and DAO design patterns based on the application requirements.
  • Used SAX and DOM parsers to parse raw XML documents (see the parsing sketch after this list).
  • Used RAD as the development IDE for web applications.
  • Prepared and executed unit test cases.
  • Used the Log4J logging framework to write log messages at various levels.
  • Involved in fixing bugs and making minor enhancements to the front-end modules.
  • Used Microsoft Visio and Rational Rose to design the Use Case, Class, Sequence, and Activity diagrams for the application's SDLC process.
  • Performed functional and technical reviews.
  • Supported the testing team during System, Integration, and UAT testing.
  • Ensured quality in the deliverables.
  • Conducted design reviews and technical reviews with other project stakeholders.
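
The parsing work above used Java SAX and DOM parsers; as a language-consistent stand-in, here is a minimal DOM-style parse in R with the xml2 package. The claim/status message format is hypothetical.

    library(xml2)

    # Hypothetical raw XML message
    raw <- '<claims>
              <claim id="C-100"><status>APPROVED</status></claim>
              <claim id="C-101"><status>PENDING</status></claim>
            </claims>'

    # Parse into an in-memory tree, as a DOM parser does
    doc <- read_xml(raw)

    # Navigate the tree with XPath and pull out values
    statuses <- xml_find_all(doc, "//claim/status")
    xml_text(statuses)                    # "APPROVED" "PENDING"
    xml_attr(xml_parent(statuses), "id")  # "C-100" "C-101"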

Environment: R 3.0, Erwin 9.5, Tableau 8.0, MDM, QlikView, MLlib, PL/SQL, HDFS, Teradata 14.1, JSON, Hadoop, MapReduce, Pig, Spark, RStudio, Mahout, Java, Hive, AWS.
