
Data Scientist Resume


Lowell, Arkansas

PROFESSIONAL SUMMARY:

  • Highly efficient Data Scientist with 8+ years of experience in areas including data analysis, statistical analysis, machine learning, and data mining with large sets of structured and unstructured data in the manufacturing and healthcare industries.
  • Hands-on experience with R packages such as sqldf, plyr, forecast, and randomForest for predictive modeling.
  • Excellent working knowledge of Big Data Hadoop (Hortonworks), HDFS architecture, R, Python, Jupyter, Pandas, NumPy, scikit-learn, Matplotlib, PyHive, Keras, Hive, NoSQL (HBase), Sqoop, Pig, MapReduce, Oozie, and Spark MLlib.
  • Hands-on experience in Linear and Logistic Regression, K-Means cluster analysis, Decision Trees, KNN, SVM, Random Forest, Market Basket Analysis, NLTK/Naïve Bayes, Sentiment Analysis, Text Mining/Text Analytics, and Time Series Forecasting.
  • Worked with Python modules such as requests, boto, flake8, Flask, mock, and nose.
  • Extensive experience with business intelligence (BI) technologies such as OLAP, data warehousing, reporting and querying tools, data mining, and spreadsheets.
  • Efficient in developing logical and physical data models and organizing data per business requirements using Sybase PowerDesigner, Erwin, and ER/Studio in both OLTP and OLAP applications.
  • Solid ability to write and optimize diverse SQL queries; working knowledge of RDBMSs such as SQL Server 2008 and NoSQL databases such as MongoDB 3.2.
  • Strong experience in Big Data technologies such as Spark 1.6, Spark SQL, PySpark, Hadoop 2.x, HDFS, and Hive 1.x.
  • Experience with visualization tools such as Tableau 9.x and 10.x for creating dashboards.
  • Experienced with R programming for data visualization (ggplot2, matplot, and qplot).
  • Experienced in Big Data with Hadoop 2, HDFS, MapReduce, and Spark.
  • Experienced in Spark 2.1, Spark SQL and PySpark.
  • Performed data cleaning and feature selection using the MLlib package in PySpark.
  • Performed partitioned clustering into 100 clusters with k-means using the scikit-learn package in Python, grouping similar hotels for a search together (see the sketch after this list).
  • Adept at using SAS Enterprise Suite, R, Python, and Big Data technologies including Hadoop, Hive, Pig, Sqoop, Cassandra, Oozie, Flume, MapReduce, and Cloudera Manager to design business intelligence applications.
  • Experienced in using Python to manipulate data for loading and extraction, and worked with Python libraries such as Matplotlib, NumPy, SciPy, and Pandas for data analysis.
  • Worked with analytical applications such as R, SAS, MATLAB, and SPSS to develop neural networks and cluster analyses.
  • Strong SQL programming skills, with experience in working with functions, packages and triggers.
  • Good understanding of Teradata SQL Assistant, Teradata Administrator, and data load/export utilities such as BTEQ, FastLoad, MultiLoad, and FastExport.
  • Experience with Data Analytics, Data Reporting, Ad-hoc Reporting, Graphs, Scales, PivotTables and OLAP reporting.
  • Worked with NoSQL databases including HBase, Cassandra, and MongoDB.
  • Experienced in data integration validation and data quality controls for ETL processes and data warehousing using MS Visual Studio SSIS, SSAS, and SSRS.
  • Proficient in Tableau and R Shiny data visualization tools for analyzing large datasets and creating visually powerful, actionable interactive reports and dashboards.
  • Automated recurring reports using SQL and Python and visualized them on BI platforms such as Tableau.
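
Below is a minimal sketch of the k-means hotel grouping described above, assuming a hypothetical hotels.csv input; the file path, feature columns, and parameters are illustrative only, not taken from an actual project.

```python
# Minimal sketch: partitioned clustering into 100 groups with scikit-learn k-means.
# The input file and feature columns are hypothetical.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

hotels = pd.read_csv("hotels.csv")                     # hypothetical input file
features = hotels[["price", "rating", "distance_km"]]  # illustrative feature columns

scaled = StandardScaler().fit_transform(features)      # put features on a common scale
kmeans = KMeans(n_clusters=100, n_init=10, random_state=42)
hotels["cluster"] = kmeans.fit_predict(scaled)         # similar hotels share a cluster id

# Hotels in the same cluster can then be surfaced together for a search.
print(hotels.groupby("cluster").size().head())
```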

TECHNICAL SKILLS

Data Modeling Tools: Erwin r9.6/9.5/9.1/8.x, Rational Rose, ER/Studio, MS Visio, SAP PowerDesigner.

Programming Languages: C/C++, C#, Java, Oracle PL/SQL, Python, SQL, T-SQL, UNIX shell scripting, Bash, HTML5.

Scripting Languages: Python (NumPy, SciPy, Pandas, Gensim, Keras), R (Caret, Weka, ggplot), XML, JSON

Big Data Technologies: Hadoop, Hive, HDFS, MapReduce, Pig, Kafka, Spark, HBase.

Reporting Tools: Crystal Reports XI, Business Intelligence, SSRS, Business Objects 5.x/6.x, Cognos 7.0/6.0, Tableau.

ETL: Informatica PowerCenter, SSIS.

BI Tools: Tableau, Tableau Server, Tableau Reader, SAP BusinessObjects, OBIEE, QlikView, SAP Business Intelligence, Amazon Redshift, Azure Data Warehouse.

Tools: MS-Office suite (Word, Excel, MS Project and Outlook), Spark MLlib, Scala NLP, MariaDB, Azure, SAS.

Data Modeling Tools: Erwin … Sybase Power Designer, ER Studio, Enterprise Architect, Oracle Designer, MS Visio.

Operating Systems: Windows, UNIX, MS DOS, Sun Solaris.

Databases: Oracle, Teradata, Netezza, Microsoft SQL Server, MySQL, MongoDB, HBase, Cassandra.

Project Execution Methodologies: Ralph Kimball and Bill Inmon data warehousing methodology, Rational Unified Process (RUP), Rapid Application Development (RAD), Joint Application Development (JAD)

PROFESSIONAL EXPERIENCE:

Confidential, Lowell, Arkansas

Data Scientist

Responsibilities:

  • This project focused on customer segmentation based on machine learning and statistical modeling, including building predictive models and generating data products to support customer segmentation.
  • Used Python to visualize the data and implement machine learning algorithms.
  • Used R programming for further statistical analysis.
  • Developed a pricing model for various product and service bundled offerings to optimize and predict the gross margin.
  • Built a price elasticity model for various product and service bundled offerings.
  • Developed a predictive causal model using the annual failure rate and standard cost basis for the new bundled service offering.
  • Designed and developed analytics, machine learning models, and visualizations that drive performance and provide insights, from prototyping to production deployment, product recommendation, and allocation planning.
  • Applied various machine learning algorithms and statistical models such as decision trees, regression models, neural networks, SVM, and clustering to identify volume, using packages in Python.
  • Performed data imputation using the scikit-learn package in Python.
  • Performed data processing using Python libraries such as NumPy and Pandas.
  • Worked with data analysis using the ggplot2 library in R to create data visualizations for a better understanding of customers' behaviors.
  • Experience in using AWS Cloud Services.
  • Experience in using data science technologies such as JupyterHub.
  • Performed data analysis by using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from the Oracle database.
  • Experience in Big Data with Hadoop, Hive, PySpark, and HDFS.
  • Experience in using databases such as MS SQL Server and Postgres.
  • Wrote complex Hive and SQL queries for data analysis to meet business requirements.
  • Hands-on experience implementing Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis.
  • Performed K-means clustering, Multivariate analysis, and Support Vector Machines in Python.
  • Wrote complex SQL queries to implement business requirements.
  • Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python (see the sketch after this list).
  • Developed MapReduce pipeline for feature extraction using Hive.
  • Developed entire frontend and backend modules using Python on the Django web framework.
  • Implemented the presentation layer with HTML, CSS, and JavaScript.
  • Created data quality scripts using SQL and Hive to validate successful data loads and the quality of the data. Created various types of data visualizations using Python and Tableau.
  • Prepared data visualization reports for management using R, Tableau, and Power BI.
  • Worked independently and collaboratively throughout the complete analytics project lifecycle, including data extraction/preparation, design and implementation of scalable machine learning analyses and solutions, and documentation of results.
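
Below is a minimal sketch of the imputation and feature scaling/engineering steps referenced above, using scikit-learn, pandas, and NumPy; the DataFrame and column names are illustrative only, not taken from project data.

```python
# Minimal sketch: impute missing values with scikit-learn, then do simple
# feature engineering and scaling with pandas/NumPy. Columns are hypothetical.
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({
    "tenure_months": [12, 24, None, 48],
    "monthly_spend": [80.0, None, 55.0, 120.0],
})

# Fill missing numeric values with the column median.
num_cols = ["tenure_months", "monthly_spend"]
df[num_cols] = SimpleImputer(strategy="median").fit_transform(df[num_cols])

# Simple engineered features: log transform and min-max scaling.
df["log_spend"] = np.log1p(df["monthly_spend"])
df["tenure_scaled"] = (df["tenure_months"] - df["tenure_months"].min()) / (
    df["tenure_months"].max() - df["tenure_months"].min()
)
print(df)
```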

Environment: R/RStudio, SAS, Python, Hive, Hadoop, MS Excel, MS SQL Server, Power BI, Tableau, T-SQL, ETL, MS Access, XML, JSON, MS Office 2007, Outlook.

Confidential - Elmhurst, IL

Data Scientist

Responsibilities:

  • Performed data profiling to learn about user behavior and merged data from multiple data sources.
  • Implemented big data processing applications to collect, clean, and normalize large volumes of open data using Hadoop ecosystem tools such as Pig, Hive, and HBase.
  • Designed and developed various machine learning frameworks using Python, R, and MATLAB.
  • Integrated R into MicroStrategy to expose metrics determined by more sophisticated and detailed models than natively available in the tool.
  • Worked on different data formats such as JSON and XML and applied machine learning algorithms in Python.
  • Worked with Data Architects and IT Architects to understand the movement and storage of data, using ER/Studio 9.7.
  • Processed huge datasets (over a billion data points, over 1 TB of data) for data association pairing and provided insights into meaningful data associations and trends.
  • Developed cross-validation pipelines for testing the accuracy of predictions.
  • Enhanced statistical models (linear mixed models) for predicting the best products for commercialization using machine learning linear regression models, KNN, and K-means clustering algorithms.
  • Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization; also performed gap analysis.
  • Performed data manipulation and aggregation from different sources using Nexus, Toad, BusinessObjects, Power BI, and SmartView.
  • Independently coded new programs and designed tables to load and test the programs effectively for the given POCs using Big Data/Hadoop.
  • Developed documents and dashboards of predictions in MicroStrategy and presented them to the business intelligence team.
  • Developed various QlikView data models by extracting and using data from various sources: files, DB2, Excel, flat files, and Big Data.
  • Good knowledge of Hadoop architecture and various components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
  • As an architect, delivered various complex OLAP databases/cubes, scorecards, dashboards, and reports.
  • Programmed a utility in Python that used multiple packages (SciPy, NumPy, pandas).
  • Implemented classification using supervised algorithms such as Logistic Regression, Decision Trees, KNN, and Naive Bayes.
  • Used Teradata 15 utilities such as FastExport and MultiLoad for data migration/ETL tasks from OLTP source systems to OLAP target systems.
  • Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
  • Collaborated with data engineers to implement ETL processes; wrote and optimized SQL queries to extract data from the cloud and merge it from Oracle 12c.
  • Collected unstructured data from MongoDB 3.3 and completed data aggregation.
  • Performed data integrity checks, data cleaning, exploratory analysis, and feature engineering using R 3.4.0.
  • Conducted analysis of customer consumption behaviors and discovered customer value with RFM analysis; applied customer segmentation with clustering algorithms such as K-Means Clustering and Hierarchical Clustering (see the sketch after this list).
  • Worked on outlier identification with box plots and K-means clustering using Pandas and NumPy.
  • Participated in feature engineering such as feature intersection generation, feature normalization, and label encoding with scikit-learn preprocessing.
  • Used Python 3 (NumPy, SciPy, pandas, scikit-learn, Seaborn, NLTK) and Spark 1.6/2.0 (PySpark, MLlib) to develop a variety of models and algorithms for analytic purposes.
  • Analyzed data and performed data preparation by applying the historical model to the data set in Azure ML.
  • Performed data visualization with Tableau 10 and generated dashboards to present the findings.
  • Determined customer satisfaction and helped enhance the customer experience using NLP.
  • Worked on text analytics, Naïve Bayes, sentiment analysis, creating word clouds, and retrieving data from Twitter and other social networking platforms.
  • Used Git 2.6 for version control; tracked changes in files and coordinated work on the files among multiple team members.
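
Below is a minimal sketch of the RFM-based customer segmentation described above, assuming a hypothetical transactions DataFrame; the field names and the number of clusters are illustrative, not project values.

```python
# Minimal sketch: build RFM (recency, frequency, monetary) features from a
# hypothetical transactions table, then segment customers with K-means.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

tx = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "order_date": pd.to_datetime(["2017-01-05", "2017-03-20", "2017-02-14",
                                  "2017-01-02", "2017-02-10", "2017-03-28"]),
    "amount": [120.0, 80.0, 45.0, 200.0, 60.0, 35.0],
})

snapshot = tx["order_date"].max() + pd.Timedelta(days=1)
rfm = tx.groupby("customer_id").agg(
    recency=("order_date", lambda d: (snapshot - d.max()).days),
    frequency=("order_date", "count"),
    monetary=("amount", "sum"),
)

scaled = StandardScaler().fit_transform(rfm)   # normalize features before clustering
rfm["segment"] = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scaled)
print(rfm)
```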

Environment: ER/Studio 9.7, Tableau 9.03, AWS, Teradata 15, MDM, Git, Unix, Python 3.5.2, MLlib, SAS, regression, logistic regression, QlikView.

Confidential - Newark, CA

Data Scientist.

Responsibilities:

  • Worked with data investigation, discovery and mapping tools to scan every single data record from many sources.
  • Designed the prototype of the data mart and documented possible outcomes from it for end users.
  • Involved in business process modeling using UML.
  • Developed and maintained a data dictionary to create metadata reports for technical and business purposes.
  • Created SQL tables with referential integrity and developed queries using SQL, SQL*Plus, and PL/SQL.
  • Experience in maintaining database architecture and metadata that support the enterprise data warehouse.
  • Designed, coded, and unit tested ETL packages for source marts and subject marts using Informatica ETL processes for the Oracle database.
  • Developed various QlikView data models by extracting and using data from various sources: files, DB2, Excel, flat files, and Big Data.
  • Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
  • Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions.
  • Researched, evaluated, architected, and deployed new tools, frameworks, and patterns to build sustainable Big Data platforms for the clients.
  • Identified and executed process improvements; hands-on with various technologies such as Oracle, Informatica, and Business Objects.
  • Designed both 3NF data models for ODS, OLTP systems and dimensional data models using Star and Snowflake Schemas.
  • Coordinated with various business users, stakeholders, and SMEs to obtain functional expertise, review design and business test scenarios, participate in UAT, and validate financial data.
  • Worked very closely with Data Architects and the DBA team to implement data model changes in the database in all environments.
  • Created PL/SQL packages and database triggers, developed user procedures, and prepared user manuals for the new programs.
  • Designed and developed Use Case, Activity, and Sequence Diagrams and OOD (Object-Oriented Design) using UML and Visio.

Environment: r9.0, Informatica 9.0, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, Rational Rose, Requisite Pro, Hadoop, PL/SQL, etc.

Confidential, Chesterbrook, PA

Data Analyst.

Responsibilities:

  • Interacted with business users to identify and understand business requirements and identified the scope of the projects.
  • Identified and designed business entities, their attributes, and the relationships between the entities to develop a logical model, and later translated the model into a physical model.
  • Developed normalized Logical and Physical database models for designing an OLTP application.
  • Enforced Referential Integrity (RI) for consistent relationships between parent and child tables. Worked with users to identify the most appropriate source of record and profile the data required for sales and service.
  • Involved in defining the business/transformation rules applied for ICP data.
  • Defined the list codes and code conversions between the source systems and the data mart.
  • Developed the financial reporting requirements by analyzing the existing Business Objects reports.
  • Utilized the Informatica toolset (Informatica Data Explorer and Informatica Data Quality) to analyze legacy data for data profiling.
  • Reverse engineered the data models, identified the data elements in the source systems, and added new data elements to the existing data models.
  • Created XSDs for applications to connect the interface and the database.
  • Compared data with original source documents and validated data accuracy.
  • Used reverse engineering to create a graphical representation (E-R diagram) and to connect to the existing database; mapped the new reporting needs from the users to the existing functionality.
  • Also worked on the impact of low-quality and/or missing data on the performance of the data warehouse client.
  • Worked with nzload to load flat-file data into Netezza tables.
  • Good understanding of Netezza architecture.
  • Executed DDL to create databases, tables and views.
  • Generated comprehensive analytical reports by running SQL queries against current databases to conduct data analysis.
  • Involved in data mapping activities for the data warehouse.
  • Created and configured Workflows, Worklets, and Sessions to transport the data to the target warehouse Netezza tables using Informatica Workflow Manager.
  • Extensively worked on Performance Tuning and understanding Joins and Data distribution.
  • Experienced in generating and documenting Metadata while designing application.
  • Coordinated with DBAs and generated SQL codes from data models.
  • Generated reports for better communication between business teams (see the sketch after this list).
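
Below is a minimal sketch of running an analytical SQL query and summarizing the result for a report; sqlite3 stands in for the actual warehouse connection, and the table and column names are illustrative only.

```python
# Minimal sketch: run an analytical SQL query into pandas and print a summary
# that could feed a business report. sqlite3 is a stand-in for the warehouse.
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES ('East', 100.0), ('East', 250.0), ('West', 175.0);
""")

report = pd.read_sql_query(
    "SELECT region, COUNT(*) AS orders, SUM(amount) AS revenue "
    "FROM sales GROUP BY region ORDER BY revenue DESC",
    conn,
)
print(report)   # region-level order counts and revenue
```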

Environment: SQL Server, Oracle 9i, MS Office, Embarcadero, Crystal Reports, Netezza, Teradata, Enterprise Architect, Toad, Informatica, ER/Studio, XML, OBIEE.

Confidential

Data Modeler

Responsibilities:

  • Worked with large amounts of structured and unstructured data.
  • Knowledge of Machine Learning concepts (Generalized Linear models, Regularization, Random Forest, Time Series models, etc.)
  • Worked in Business Intelligence tools and visualization tools such as Business Objects, ChartIO, etc.
  • Deployed GUI pages by using JSP, JSTL, HTML, DHTML, XHTML, CSS, JavaScript, and AJAX.
  • Configured the project on WebSphere 6.1 application servers.
  • Implemented the online application using Core Java, JDBC, JSP, Servlets, EJB 1.1, Web Services, SOAP, and WSDL.
  • Handled end-to-end project from data discovery to model deployment.
  • Monitored the automated loading processes.
  • Communicated with other healthcare systems using Web Services with the help of SOAP, WSDL, and JAX-RPC.
  • Used the Singleton, Factory, and DAO design patterns based on the application requirements.
  • Used SAX and DOM parsers to parse the raw XML documents.
  • Used RAD as Development IDE for web applications.
  • Prepared and executed unit test cases.
  • Used Log4J logging framework to write Log messages with various levels.
  • Involved in fixing bugs and making minor enhancements to the front-end modules.
  • Used Microsoft Visio and Rational Rose to design the Use Case diagrams, Class model, Sequence diagrams, and Activity diagrams for the SDLC process of the application.
  • Performed functional and technical reviews.
  • Provided maintenance in the testing team for System testing/Integration/UAT.
  • Ensured quality in the deliverables.
  • Conducted Design reviews and Technical reviews with other project stakeholders.
  • Used pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, and NLTK in Python to apply various machine learning algorithms (see the sketch after this list).
  • Worked on different data formats such as JSON and XML and applied machine learning algorithms in Python.
  • Was part of the complete project life cycle, from requirements through production support.
  • Created test plan documents for all back-end database modules
  • Implemented the project in a Linux environment.
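
Below is a minimal sketch of NLTK tokenization feeding a scikit-learn Naive Bayes text classifier, in the spirit of the NLTK/scikit-learn work noted above; the tiny labelled dataset is illustrative only.

```python
# Minimal sketch: NLTK tokenization plus a scikit-learn Naive Bayes classifier
# for short texts. The labelled examples are hypothetical.
from nltk.tokenize import word_tokenize   # requires nltk.download('punkt') once
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["great service and friendly staff",
         "terrible wait times and rude support",
         "really happy with the product",
         "awful experience, would not recommend"]
labels = [1, 0, 1, 0]                      # 1 = positive, 0 = negative

vectorizer = CountVectorizer(tokenizer=word_tokenize)  # NLTK handles tokenizing
X = vectorizer.fit_transform(texts)

model = MultinomialNB().fit(X, labels)
new_X = vectorizer.transform(["friendly staff but awful wait"])
print(model.predict(new_X))                # predicted sentiment label
```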

Environment: R, Erwin, MDM, QlikView, Machine Learning, MLlib, PL/SQL, HDFS, Teradata, Python, JSON, Hadoop (HDFS), MapReduce, Pig, RStudio, Mahout, Java, Hive, AWS.

Confidential

Data Modeler

Responsibilities:

  • Worked with internal architects, assisting in the development of current and target state data architectures.
  • Worked with project team representatives to ensure that logical and physical ER/Studio data models were developed in line with corporate standards and guidelines.
  • Involved in defining the business/transformation rules applied for sales and service data.
  • Implemented the metadata repository, transformations, data quality maintenance, data standards, the data governance program, scripts, stored procedures, and triggers, and executed test plans.
  • Defined the list codes and code conversions between the source systems and the data mart.
  • Involved in defining the source-to-target business rules, data mappings, and data definitions.
  • Responsible for defining the key identifiers for each mapping/interface.
  • Responsible for defining the functional requirement documents for each source-to-target interface.
  • Documented, clarified, and communicated change requests with the requestor and coordinated with the development and testing teams.
  • Generated weekly and monthly asset inventory reports.
  • Remained knowledgeable in all areas of business operations in order to identify system needs and requirements.
  • Performed data quality checks in Talend Open Studio.
  • Updated the Enterprise Metadata Library with any changes or updates.

Environment: Windows Enterprise Server 2000, SSRS, SSIS, Crystal Reports, DTS, SQL Profiler, and Query Analyzer.
