
Data Scientist Resume


San Antonio, TX

PROFESSIONAL SUMMARY:

  • Over 8 years of experience in Data Analysis, Data Profiling, Data Integration, Migration, Data Governance and Metadata Management, Master Data Management, and Configuration Management.
  • Experience in various phases of the Software Development Life Cycle (analysis, requirements gathering, design) with expertise in documenting requirement specifications, functional specifications, test plans, data validation, source-to-target mappings, SQL joins, and data cleansing.
  • Experienced in using Python for data loading, extraction, and manipulation, working with libraries such as NumPy, Pandas, SciPy, and Matplotlib for data analysis (see the sketch after this list).
  • Documented new data to support source-to-target mapping, and updated documentation for existing data, assisting with data profiling to maintain data sanitation and validation.
  • Data-driven and highly analytical, with working knowledge of statistical modeling approaches and methodologies (clustering, segmentation, variable reduction, regression analysis, hypothesis testing, decision trees, machine learning), business rules, and an ever-evolving regulatory environment.
  • Collaborated with the lead Data Architect to model the Data warehouse in accordance with FSLDM subject areas, 3NF format, and Snowflake schema.
  • Proficient in SAS/BASE, SAS Enterprise Guide, SAS/SQL, SAS Macro, and SAS/ACCESS.
  • Proficient in Tableau and R Shiny data visualization tools, used to analyze large datasets and create visually powerful, actionable interactive reports and dashboards.
  • Experience in extracting data from databases such as DB2, Oracle, SME-IM, MAD, and M240, and from UNIX servers using SAS.
  • Extensive knowledge of Hadoop ecosystem technologies such as Apache Pig, Apache Hive, Apache Sqoop, Storm, Kafka, Elasticsearch, Redis, Flume, and Apache HBase.
  • Experienced in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
  • Experienced in writing Pig UDFs and Hive UDFs/UDAFs for data analysis.
  • Extensive experience in Hive, Sqoop, Flume, Hue, and Oozie.
  • Knowledge of Business Intelligence tools such as Business Objects, Cognos, Tableau, and OBIEE.
  • Experience with Teradata and big data platforms as targets for data marts; worked with BTEQ, FastLoad, and MultiLoad.
  • Good knowledge in using all complex data types in Pig and MapReduce for handling the data and formatting it as required.
  • Integration Architect and Data Scientist experience in Analytics, Big Data, BPM, SOA, ETL, and Cloud technologies.
  • Built CoE (Center of Excellence) competencies in the areas of Analytics, SOA/EAI, ETL, and BPM.
  • Experience in foundational machine learning models and concepts: regression, random forest, boosting, GBM, NNs, HMMs, CRFs, MRFs, deep learning.
  • Experience in machine learning techniques and algorithms, such as k-NN, Naive Bayes, SVM, Decision Forests, etc.
  • Good knowledge of Hadoop architecture and its components like HDFS, MapReduce, Job Tracker, Task Tracker, Name Node and Data Node.
  • Working knowledge of DICOM and Problem Loan Management applications.
  • Highly skilled in using visualization tools such as Tableau, ggplot2, and d3.js for creating dashboards.
  • Experienced in importing data from relational databases into HDFS and exporting it back using Sqoop.
  • Experience in designing star schema and snowflake schema for Data Warehouse and ODS architectures.
  • Experience in designing stunning visualizations using Tableau, and publishing and presenting dashboards and storylines on web and desktop platforms.
  • Experience working with data modeling tools like Erwin, PowerDesigner, and ERStudio.
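
A minimal sketch of the kind of pandas/NumPy/Matplotlib analysis referenced above; the file name and column names are hypothetical placeholders, not taken from any specific project.

```python
# Illustrative only: load, clean, profile, and plot a source extract.
# File name and column names are hypothetical.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Load and clean a source extract.
df = pd.read_csv("customer_extract.csv")
df = df.dropna(subset=["customer_id"]).drop_duplicates("customer_id")
df["balance"] = pd.to_numeric(df["balance"], errors="coerce").fillna(0.0)
df["log_balance"] = np.log1p(df["balance"])  # simple derived feature

# Simple profiling: summary statistics and a grouped aggregate.
print(df.describe())
monthly = df.groupby("month")["balance"].agg(["mean", "sum", "count"])

# Quick visual check with Matplotlib.
monthly["mean"].plot(kind="bar", title="Average balance by month")
plt.tight_layout()
plt.show()
```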

TECHNICAL SKILLS

Languages: T-SQL, PL/SQL, SQL, C, C++, XML, HTML, DHTML, HTTP, MATLAB, DAX, Python

Databases: SQL Server 2014/2012/2008/2005/2000, MS Access, Oracle 12c/11g/10g/9i, Teradata, Big Data, Hadoop

Big Data Ecosystem: HDFS, Pig, MapReduce, Hive, Sqoop, Flume, HBase, Storm, Kafka, Elasticsearch, Redis

Statistical Methods: Time Series, regression models, splines, confidence intervals, principal component analysis and Dimensionality Reduction, bootstrapping

BI Tools: Power BI, Tableau, SSIS, SSRS, SSAS, Business Intelligence Development Studio (BIDS), Visual Studio, Crystal Reports, Informatica 6.1.

Database Design Tools and Data Modeling: MS Visio, ERWIN 4.5/4.0, Star Schema/Snowflake Schema modeling, Fact & Dimension tables, physical & logical data modeling, Normalization and De-normalization techniques, Kimball & Inmon Methodologies

Big Data / Grid Technologies: Cassandra, Coherence, Mongo DB, Zookeeper, Titan, Elasticsearch, Storm, Kafka, Hadoop

Tools and Utilities: SQL Server Management Studio, SQL Server Enterprise Manager, SQL Server Profiler, Import & Export Wizard, Visual Studio .NET, Microsoft Management Console, Visual SourceSafe 6.0, DTS, Crystal Reports, Power Pivot, ProClarity, Microsoft Office, Excel Power Pivot, Excel Data Explorer, Tableau, JIRA, Spark MLlib.

PROFESSIONAL EXPERIENCE

Confidential, San Antonio, TX

Data Scientist

Responsibilities:

  • As an Architect, designed conceptual, logical, and physical models using Erwin and built data marts using hybrid Inmon and Kimball DW methodologies.
  • Worked closely with business, data governance, SMEs, and vendors to define data requirements.
  • Worked with data investigation, discovery and mapping tools to scan every single data record from many sources.
  • Designed the prototype of the data mart and documented its possible outcomes for end users.
  • Involved in business process modeling using UML
  • Involved in prediction model building, machine learning, business process improvements, visualization, and process implementation with R programming and DeepSee.
  • Redesigned and developed SAS applications running against a Netezza database as native Netezza applications, reducing application run time from 40 hours to 20 seconds using PostgreSQL, nzsql, Aginity Workbench, and SAS.
  • Created SQL tables with referential integrity and developed queries using SQL, SQL*Plus, and PL/SQL.
  • Formulated procedures for integrating R programs with data sources and delivery systems; R was used for prediction.
  • Implemented Spark MLlib utilities for classification, regression, clustering, collaborative filtering, and dimensionality reduction (see the sketch after this list).
  • Designed, coded, and unit tested ETL packages for source marts and subject marts using Informatica ETL processes against an Oracle database.
  • Developed Statistical Analysis and Response Modeling for Analytical Database contributors (logistic regression).
  • Used Pig and Hive in the analysis of data.
  • Used all complex data types in Pig for handling data.
  • Created/modified UDF and UDAFs for Hive whenever necessary.
  • Developed various QlikView data models by extracting and using data from various sources: files, DB2, Excel, flat files, and big data.
  • Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization, and performed gap analysis. Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI, and SmartView.
  • Handled importing data from various data sources, performed transformations using Hive, Map Reduce, and loaded data into HDFS.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Supported MapReduce programs running in the cluster.
  • Managed and reviewed Hadoop log files to identify issues when a job fails.
  • Designed 3NF data models for ODS and OLTP systems, and dimensional data models using star and snowflake schemas.
  • Developed Pig UDFs for preprocessing the data for analysis.
  • Involved in writing shell scripts for scheduling and automation of tasks.
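
The Spark MLlib work mentioned above could look roughly like the following minimal PySpark sketch; the HDFS path, the assumption that the label is the first CSV column, and the app name are hypothetical.

```python
# Illustrative sketch of RDD-based Spark MLlib classification.
from pyspark import SparkContext
from pyspark.mllib.classification import LogisticRegressionWithLBFGS
from pyspark.mllib.regression import LabeledPoint

sc = SparkContext(appName="mllib-classification-sketch")

def parse_line(line):
    # Label in the first column, numeric features in the rest (assumed layout).
    values = [float(x) for x in line.split(",")]
    return LabeledPoint(values[0], values[1:])

data = sc.textFile("hdfs:///data/training.csv").map(parse_line)
train, test = data.randomSplit([0.8, 0.2], seed=42)

model = LogisticRegressionWithLBFGS.train(train)

# Fraction of correctly classified test points.
predictions = test.map(lambda p: (model.predict(p.features), p.label))
accuracy = predictions.filter(lambda pl: pl[0] == pl[1]).count() / float(test.count())
print("Test accuracy: %.3f" % accuracy)
```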

Environment: Erwin r9.0, Informatica 9.0, ODS, OLTP, Oracle 12c/11g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, Rational Rose, Requisite Pro, Hadoop, PL/SQL, SAS, etc.

Confidential, Maryland

Data Scientist

Responsibilities:

  • Performed data profiling to learn about user behavior and merged data from multiple data sources.
  • Implemented big data processing applications to collect, clean, and normalize large volumes of open data using Hadoop ecosystem tools such as Pig, Hive, and HBase.
  • Designed and developed various machine learning frameworks using Python, R, and MATLAB.
  • Integrated R into MicroStrategy to expose metrics determined by more sophisticated and detailed models than are natively available in the tool.
  • Worked with different data formats such as JSON and XML, and applied machine learning algorithms in Python.
  • Worked with Data Architects and IT Architects to understand data movement and storage, using ER/Studio 9.7.
  • Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization, and performed gap analysis. Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI, and SmartView.
  • Independently coded new programs and designed tables to load and test them effectively for the given POCs using Big Data/Hadoop.
  • Developed documents and dashboards of predictions in MicroStrategy and presented them to the business intelligence team.
  • Developed various QlikView data models by extracting and using data from various sources: files, DB2, Excel, flat files, and big data.
  • Good knowledge of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
  • As an Architect, delivered various complex OLAP databases/cubes, scorecards, dashboards, and reports.
  • Programmed a utility in Python that used multiple packages (scipy, numpy, pandas)
  • Implemented classification using supervised algorithms such as logistic regression, decision trees, k-NN, and Naive Bayes.
  • Used Teradata 15 utilities such as FastExport and MultiLoad (MLOAD) for data migration/ETL tasks from OLTP source systems to OLAP target systems.
  • Collaborated with data engineers to implement ETL processes, writing and optimizing SQL queries to extract data from the cloud and merge it with data from Oracle 12c.
  • Collected unstructured data from MongoDB 3.3 and completed data aggregation.
  • Performed data integrity checks, data cleaning, exploratory analysis, and feature engineering using R 3.4.0.
  • Conducted analysis of customer purchasing behavior and assessed customer value with RFM analysis; applied customer segmentation with clustering algorithms such as K-Means and hierarchical clustering (see the sketch after this list).
  • Used Python 3 (NumPy, SciPy, pandas, scikit-learn, Seaborn, NLTK) and Spark 1.6/2.0 (PySpark, MLlib) to develop a variety of models and algorithms for analytic purposes.
  • Analyzed data and performed data preparation by applying the historical model to the dataset in Azure ML.
  • Determined customer satisfaction and helped enhance customer experience using NLP.
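
A minimal sketch of RFM-based segmentation with K-Means of the kind described above, using pandas and scikit-learn; the transactions file and column names (customer_id, order_date, amount) are hypothetical.

```python
# Illustrative: build RFM features per customer, then cluster into segments.
# Input file and column names are hypothetical placeholders.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# One row per transaction: customer_id, order_date, amount.
tx = pd.read_csv("transactions.csv", parse_dates=["order_date"])
snapshot = tx["order_date"].max()

rfm = tx.groupby("customer_id").agg(
    recency=("order_date", lambda d: (snapshot - d.max()).days),
    frequency=("order_date", "count"),
    monetary=("amount", "sum"),
)

# Scale the RFM features, then cluster customers into segments.
scaled = StandardScaler().fit_transform(rfm)
rfm["segment"] = KMeans(n_clusters=4, random_state=42).fit_predict(scaled)

# Average RFM profile per segment.
print(rfm.groupby("segment")[["recency", "frequency", "monetary"]].mean())
```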

Environment: ER Studio 9.7, Tableau 9.03, AWS, Teradata 15, MDM, GIT, Unix, Python 3.5.2, MLlib, SAS, regression, logistic regression, QlikView

Confidential, Midlothian, VA

Data Scientist

Responsibilities:

  • Involved in complete Implementation lifecycle, specialized in writing custom MapReduce, Pig and Hive programs.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Extensively used Hive/HiveQL queries to search for particular strings in Hive tables stored in HDFS.
  • Possess good Linux and Hadoop System Administration skills, networking, shell scripting and familiarity with open source configuration management and deployment tools such as Chef.
  • Experience in developing customized UDFs in Java to extend Hive and Pig Latin functionality.
  • Developed several business services using Java RESTful web services using Spring MVC framework
  • Managed and scheduled jobs to remove duplicate log data files in HDFS using Oozie.
  • Used Apache Oozie for scheduling and managing Hadoop jobs. Knowledge of HCatalog for Hadoop-based storage management.
  • Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions.
  • Identified and executed process improvements, hands-on with technologies such as Oracle, Informatica, and Business Objects.
  • Expert in creating and designing data ingestion pipelines using technologies such as Spring Integration and Apache Storm with Kafka.
  • Moved data from HDFS to a MySQL database and vice versa using Sqoop.
  • Experienced in analyzing Cassandra and comparing it with other open-source NoSQL databases to determine which best suits the current requirements.
  • Used File System check (FSCK) to check the health of files in HDFS.
  • Developed the UNIX shell scripts for creating the reports from Hive data.
  • Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
  • Involved in the pilot of Hadoop cluster hosted on Amazon Web Services (AWS)
  • Extensively used Sqoop to get data from RDBMS sources like Teradata and Netezza.
  • Created a complete processing engine based on Cloudera's distribution.
  • Involved in collecting metrics for Hadoop clusters using Ganglia and Ambari.
  • Extracted files from CouchDB and MongoDB through Sqoop and placed them in HDFS for processing.
  • Used Spark Streaming to collect data from Kafka in near real time, performing the necessary transformations and aggregations on the fly to build the common learner data model and persisting the data in a NoSQL store (HBase); see the sketch after this list.
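
A rough sketch of a Spark Streaming consumer of the kind described in the last bullet, assuming the spark-streaming-kafka-0-8 integration available in Spark 1.x/2.x; the topic, broker, event schema, and the HBase write (stubbed out here) are hypothetical.

```python
# Illustrative only: consume JSON events from Kafka, aggregate per learner
# per micro-batch, and hand the results to a (stubbed) HBase writer.
import json
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="learner-events-stream")
ssc = StreamingContext(sc, batchDuration=10)  # 10-second micro-batches

stream = KafkaUtils.createDirectStream(
    ssc, ["learner-events"], {"metadata.broker.list": "broker1:9092"})

# Each element is a (key, value) pair; the value is a JSON event string.
events = stream.map(lambda kv: json.loads(kv[1]))
counts = events.map(lambda e: (e["learner_id"], 1)).reduceByKey(lambda a, b: a + b)

def save_to_hbase(rdd):
    # Placeholder: in the real pipeline each partition would be written to
    # HBase through the chosen client/connector.
    for learner_id, count in rdd.collect():
        print(learner_id, count)

counts.foreachRDD(save_to_hbase)

ssc.start()
ssc.awaitTermination()
```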

Environment: AWS, ODS, OLTP, Oracle 11g, Hive, OLAP, DB2, Metadata, MS Excel, Cassandra, Mainframes, MS Visio, Rational Rose, Requisite Pro, Hadoop, PL/SQL, Sqoop, CouchDB, MongoDB, etc.

Confidential, NY

Data Scientist/ETL Data Engineer

Responsibilities:

  • This project focused on customer segmentation based on machine learning and statistical modeling, including building predictive models and generating data products to support customer segmentation.
  • Used Python to visualize the data and implemented machine learning algorithms.
  • Used R programming for further statistical analysis.
  • Designed and developed analytics, machine learning models, and visualizations that drive performance and provide insights, from prototyping to production deployment, for product recommendation and allocation planning.
  • Applied various machine learning algorithms and statistical models such as decision trees, regression models, neural networks, SVM, and clustering to identify volume, using packages in Python.
  • Performed data imputation using the scikit-learn package in Python (see the sketch after this list).
  • Experience in Big Data: Hadoop, Hive, PySpark, and HDFS.
  • Experience using databases such as MS SQL Server and Postgres.
  • Wrote complex Hive and SQL queries for data analysis to meet business requirements.
  • Hands-on experience implementing Naive Bayes; skilled in random forests, decision trees, linear and logistic regression, SVM, clustering, neural networks, and Principal Component Analysis.
  • Performed K-means clustering, Multivariate analysis, and Support Vector Machines in Python.
  • Wrote complex SQL queries to implement business requirements.
  • Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
  • Developed MapReduce pipeline for feature extraction using Hive.
  • Developed entire frontend and backend modules using Python on Django Web Framework.
  • Prepared Data Visualization reports for the management using R, Tableau, and Power BI.
  • Work independently or collaboratively throughout the complete analytics project lifecycle including data extraction/preparation, design and implementation of scalable machine learning analysis and solutions, and documentation of results.
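
A minimal sketch of the scikit-learn imputation, feature scaling, and modeling steps referenced above; the data file, the "churned" label column, the assumption that features are numeric, and the use of modern scikit-learn (0.20+) for SimpleImputer are all hypothetical.

```python
# Illustrative pipeline: impute missing values, scale features, fit a classifier.
# File name and column names are hypothetical; features are assumed numeric.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("customers.csv")
X = df.drop(columns=["churned"])
y = df["churned"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])
pipeline.fit(X_train, y_train)
print("Test accuracy:", pipeline.score(X_test, y_test))
```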

Environment: R/R studio, SAS, Python, Hive, Hadoop, MS Excel, MS SQL Server, Power BI, Tableau, T-SQL, ETL, MS Access, XML, JSON, MS office 2007, Outlook.

Confidential

Data Analyst

Responsibilities:

  • Implemented Microsoft Visio and Rational Rose for designing the Use Case Diagrams, Class model, Sequence diagrams, and Activity diagrams for SDLC process of the application
  • Implemented the online application using Core Java, JDBC, JSP, Servlets, EJB 1.1, Web Services, SOAP, and WSDL.
  • Communicated with other healthcare information systems using Web Services with SOAP, WSDL, and JAX-RPC.
  • Performed functional and technical reviews.
  • Deployed GUI pages by using JSP, JSTL, HTML, DHTML, XHTML, CSS, JavaScript, AJAX
  • Configured the project on WebSphere 6.1 application servers
  • Used the Singleton, Factory, and DAO design patterns based on application requirements.
  • Used SAX and DOM parsers to parse the raw XML documents
  • Used RAD as Development IDE for web applications.
  • Conducted Design reviews and Technical reviews with other project stakeholders.
  • Prepared and executed unit test cases.
  • Used Log4J logging framework to write Log messages with various levels.
  • Involved in fixing bugs and minor enhancements for the front-end modules.
  • Provided maintenance support to the testing team for system testing, integration testing, and UAT.
  • Implemented the project in Linux environment.

Environment: R 2.x, Erwin 8, Tableau 8.0, MDM, QlikView, MLlib, PL/SQL, HDFS, Teradata 14.1, JSON, Hadoop (HDFS), MapReduce, Pig, Spark, RStudio, Mahout, Java, Hive, AWS.

Confidential

Data Analyst

Responsibilities:

  • Implemented the Metadata Repository; maintained data quality; developed data cleanup procedures, transformations, data standards, the Data Governance program, scripts, stored procedures, and triggers; and executed test plans.
  • Responsible for defining the key identifiers for each mapping/interface.
  • Responsible for defining the functional requirement documents for each source to target interface.
  • Involved in defining the source to target data mappings, business rules, data definitions.
  • Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
  • Worked with users to identify the most appropriate source of record and profile the data required for sales and service.
  • Involved in defining the business/transformation rules applied to sales and service data.
  • Defined the list codes and code conversions between the source systems and the data mart.
  • Worked with internal architects, assisting in the development of current and target state data architectures.
  • Coordinated with business users to provide appropriate, effective, and efficient designs for new reporting needs based on existing functionality.
  • Remained knowledgeable in all areas of business operations in order to identify system needs and requirements.
  • Documented the complete process flow to describe program development, logic, testing, implementation, application integration, and coding.

Environment: Erwin r7.0, SQL Server 2000/2005, Windows XP/NT/2000, Oracle 8i/9i, MS-DTS, UML, UAT, SQL Loader, OOD, OLTP, PL/SQL, MS Visio, Informatica.
