Data Scientist Resume
Lowell, Arkansas
PROFESSIONAL SUMMARY:
- Highly efficient Data Scientist with 8+ years of experience in areas including data analysis, statistical analysis, machine learning, and data mining with large sets of structured and unstructured data in the manufacturing and healthcare industries.
- Hands-on experience with R packages such as sqldf, plyr, forecast, and randomForest for predictive modeling.
- Excellent working knowledge of Big Data Hadoop (Hortonworks), HDFS architecture, R, Python, Jupyter, Pandas, NumPy, scikit-learn, Matplotlib, PyHive, Keras, Hive, NoSQL (HBase), Sqoop, Pig, MapReduce, Oozie, and Spark MLlib.
- Hands-on experience in Linear and Logistic Regression, K-Means Cluster Analysis, Decision Trees, KNN, SVM, Random Forest, Market Basket Analysis, NLTK/Naïve Bayes, Sentiment Analysis, Text Mining/Text Analytics, and Time Series Forecasting.
- Worked on different types of Python modules such as requests, boto, flake8, flask, mock, and nose.
- Extensive experience with business intelligence (BI) tools and technologies such as OLAP, data warehousing, reporting and querying tools, data mining, and spreadsheets.
- Efficient in developing logical and physical data models and organizing data as per business requirements using Sybase PowerDesigner, Erwin, and ER Studio in both OLTP and OLAP applications.
- Solid ability to write and optimize diverse SQL queries, with working knowledge of RDBMSs like SQL Server 2008 and NoSQL databases like MongoDB 3.2.
- Strong experience in Big Data technologies like Spark 1.6, Spark SQL, PySpark, Hadoop 2.X, HDFS, and Hive 1.X.
- Experience in visualization tools like Tableau 9.X and 10.X for creating dashboards.
- Experienced with R programming for data visualization (ggplot2, matplot & qplot).
- Experienced in Big Data with Hadoop 2, HDFS, MapReduce, and Spark.
- Experienced in Spark 2.1, Spark SQL and PySpark.
- Performed data cleaning and feature selection using the MLlib package in PySpark.
- Performed partitioned clustering into 100 clusters by k-means using the scikit-learn package in Python, where similar hotels for a search are grouped together (see the sketch at the end of this summary).
- Adept at using the SAS Enterprise suite, R, Python, and Big Data-related technologies including Hadoop, Hive, Pig, Sqoop, Cassandra, Oozie, Flume, MapReduce, and Cloudera Manager for the design of business intelligence applications.
- Experienced in using Python to manipulate data for data loading and extraction, and worked with Python libraries like Matplotlib, NumPy, SciPy, and Pandas for data analysis.
- Worked with complex applications such as R, SAS, MATLAB, and SPSS to develop neural network and cluster analysis models.
- Strong SQL programming skills, with experience in working with functions, packages and triggers.
- Good understanding of Teradata SQL Assistant, Teradata Administrator, and data load/export utilities like BTEQ, FastLoad, MultiLoad, and FastExport.
- Experience with Data Analytics, Data Reporting, Ad-hoc Reporting, Graphs, Scales, PivotTables and OLAP reporting.
- Worked with NoSQL databases including HBase, Cassandra, and MongoDB.
- Experienced in data integration, validation, and data quality controls for ETL processes and data warehousing using MS Visual Studio SSIS, SSAS, and SSRS.
- Proficient in Tableau and R-Shiny data visualization tools to analyze and obtain insights into large datasets, create visually powerful and actionable interactive reports and dashboards.
- Automated recurring reports using SQL and Python and visualized them on BI platforms like Tableau.
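A minimal sketch of the k-means hotel grouping mentioned above, assuming numeric hotel features have already been assembled into a table; the file path and column names are illustrative, not from the actual project:

    import pandas as pd
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans

    # Hypothetical input: one row per hotel with numeric search features
    hotels = pd.read_csv("hotel_features.csv")               # illustrative path
    features = hotels[["price", "rating", "distance_km"]]     # illustrative columns

    # Scale the features so no single column dominates the distance metric
    scaled = StandardScaler().fit_transform(features)

    # Partition the hotels into 100 clusters; similar hotels share a label
    kmeans = KMeans(n_clusters=100, random_state=42, n_init=10)
    hotels["cluster"] = kmeans.fit_predict(scaled)

    print(hotels.groupby("cluster").size().head())

Hotels that land in the same cluster can then be surfaced together for a given search.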
TECHNICAL SKILLS:
Data Modeling Tools: Erwin r9.6, 9.5, 9.1, 8.x, Rational Rose, ER/Studio, MS Visio, SAP PowerDesigner.
Programming Languages: C/C++, C#, Java, Oracle PL/SQL, Python, SQL, T - SQL, UNIX shell scripting, Bash, HTML5.
Scripting Languages: Python (NumPy, SciPy, Pandas, Gensim, Keras), R (Caret, Weka, ggplot), XML, JSON
Big Data Technologies: Hadoop, Hive, HDFS, MapReduce, Pig, Kafka, Spark, HBase.
Reporting Tools: Crystal Reports XI, Business Intelligence, SSRS, Business Objects 5.x/6.x, Cognos 7.0/6.0, Tableau.
ETL: Informatica PowerCenter, SSIS.
BI Tools: Tableau, Tableau Server, Tableau Reader, SAP BusinessObjects, OBIEE, QlikView, SAP Business Intelligence, Amazon Redshift, Azure Data Warehouse.
Tools: MS Office suite (Word, Excel, MS Project, and Outlook), Spark MLlib, Scala NLP, MariaDB, Azure, SAS.
Data Modeling Tools: Erwin, Sybase PowerDesigner, ER Studio, Enterprise Architect, Oracle Designer, MS Visio.
Operating Systems: Windows, UNIX, MS DOS, Sun Solaris.
Databases: Oracle, Teradata, Netezza, Microsoft SQL Server, MySQL, MongoDB, HBase, Cassandra.
Project Execution Methodologies: Ralph Kimball and Bill Inmon data warehousing methodology, Rational Unified Process (RUP), Rapid Application Development (RAD), Joint Application Development (JAD)
PROFESSIONAL EXPERIENCE:
Confidential, Lowell, Arkansas
Data Scientist
Responsibilities:
- This project focused on customer segmentation, a machine learning and statistical modeling effort that included building predictive models and generating data products to support customer segmentation.
- Used Python to visualize the data and implemented machine learning algorithms.
- Used R programming for additional statistical analysis.
- Developed a pricing model for various bundled product and service offerings to optimize and predict gross margin.
- Built a price elasticity model for various bundled product and service offerings.
- Developed a predictive causal model using the annual failure rate and standard cost basis for the new bundled service offering.
- Designed and developed analytics, machine learning models, and visualizations that drive performance and provide insights, from prototyping to production deployment, for product recommendation and allocation planning.
- Applied various machine learning algorithms and statistical models such as decision trees, regression models, neural networks, SVM, and clustering to identify volume, using packages in Python.
- Performed data imputation using the scikit-learn package in Python.
- Performed data processing using Python libraries like NumPy and Pandas.
- Performed data analysis using the ggplot2 library in R to create data visualizations for a better understanding of customer behavior.
- Experience in using AWS cloud services.
- Experience in using data science technologies like JupyterHub.
- Performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from the Oracle database.
- Experience in Big Data technologies: Hadoop, Hive, PySpark, and HDFS.
- Experience in using databases like MS SQL Server and Postgres.
- Wrote complex Hive and SQL queries for data analysis to meet business requirements.
- Hands-on experience in implementing Naive Bayes and skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis.
- Performed K-means clustering, Multivariate analysis, and Support Vector Machines in Python.
- Wrote complex SQL queries to implement business requirements.
- Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python (see the sketch after this list).
- Developed MapReduce pipeline for feature extraction using Hive.
- Developed the entire frontend and backend modules using Python on the Django web framework.
- Implemented the presentation layer with HTML, CSS, and JavaScript.
- Created data quality scripts using SQL and Hive to validate successful data loads and the quality of the data. Created various types of data visualizations using Python and Tableau.
- Prepared data visualization reports for management using R, Tableau, and Power BI.
- Worked independently and collaboratively throughout the complete analytics project lifecycle, including data extraction/preparation, design and implementation of scalable machine learning analysis and solutions, and documentation of results.
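A minimal sketch of the imputation, feature engineering, and scaling steps referenced above, using scikit-learn's SimpleImputer and StandardScaler; the DataFrame and column names are hypothetical:

    import numpy as np
    import pandas as pd
    from sklearn.impute import SimpleImputer
    from sklearn.preprocessing import StandardScaler

    # Hypothetical customer attributes with missing values
    df = pd.DataFrame({
        "tenure_months": [12, 48, np.nan, 7],
        "monthly_spend": [80.0, np.nan, 45.5, 120.0],
    })

    # Impute missing numeric values with the column median
    imputer = SimpleImputer(strategy="median")
    imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)

    # Example engineered feature: approximate lifetime spend
    imputed["lifetime_spend"] = imputed["tenure_months"] * imputed["monthly_spend"]

    # Scale all features to zero mean and unit variance before modeling
    scaled = StandardScaler().fit_transform(imputed)
    print(scaled.shape)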
Environment: R/RStudio, SAS, Python, Hive, Hadoop, MS Excel, MS SQL Server, Power BI, Tableau, T-SQL, ETL, MS Access, XML, JSON, MS Office 2007, Outlook.
Confidential, Elmhurst, IL
Data Scientist
Responsibilities:
- Perform Data Profiling to learn about user behavior and merge data from multiple data sources.
- Implemented big data processing applications to collect, clean, and normalize large volumes of open data using Hadoop ecosystem tools such as Pig, Hive, and HBase.
- Designed and developed various machine learning frameworks using Python, R, and MATLAB.
- Integrated R into MicroStrategy to expose metrics determined by more sophisticated and detailed models than natively available in the tool.
- Worked on different data formats such as JSON and XML and performed machine learning algorithms in Python.
- Worked with Data Architects and IT Architects to understand the movement of data and its storage, using ER Studio 9.7.
- Processed huge datasets (over a billion data points, over 1 TB of data) for data association pairing and provided insights into meaningful data associations and trends.
- Developed cross-validation pipelines for testing the accuracy of predictions
- Enhanced statistical models (linear mixed models) for predicting the best products for commercialization using machine learning linear regression models, KNN, and K-means clustering algorithms.
- Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization, and performed gap analysis. Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI, and SmartView.
- Independently coded new programs and designed tables to load and test the programs effectively for the given POCs using Big Data/Hadoop.
- Developed documents and dashboards of predictions in MicroStrategy and presented them to the business intelligence team.
- Developed various QlikView data models by extracting and using data from various sources: files, DB2, Excel, flat files, and big data.
- Good knowledge of Hadoop architecture and its various components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
- As an architect, delivered various complex OLAP databases/cubes, scorecards, dashboards, and reports.
- Programmed a utility in Python that used multiple packages (SciPy, NumPy, pandas).
- Implemented classification using supervised algorithms like logistic regression, decision trees, KNN, and Naive Bayes.
- Used Teradata 15 utilities such as FastExport and MultiLoad for data migration/ETL tasks from OLTP source systems to OLAP target systems.
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
- Collaborated with data engineers to implement the ETL process, and wrote and optimized SQL queries to perform data extraction from the cloud and merging from Oracle 12c.
- Collected unstructured data from MongoDB 3.3 and completed data aggregation.
- Performed data integrity checks, data cleaning, exploratory analysis, and feature engineering using R 3.4.0.
- Conducted analysis of customer consumption behaviors and discovered the value of customers with RFM analysis; applied customer segmentation with clustering algorithms such as K-Means clustering and hierarchical clustering (see the sketch after this list).
- Worked on outlier identification with box plots and K-means clustering using Pandas and NumPy.
- Participated in feature engineering such as feature intersection generation, feature normalization, and label encoding with scikit-learn preprocessing.
- Used Python 3 (NumPy, SciPy, pandas, scikit-learn, Seaborn, NLTK) and Spark 1.6/2.0 (PySpark, MLlib) to develop a variety of models and algorithms for analytic purposes.
- Analyzed data and performed data preparation by applying the historical model to the data set in Azure ML.
- Performed data visualization with Tableau 10 and generated dashboards to present the findings.
- Determined customer satisfaction and helped enhance the customer experience using NLP.
- Worked on text analytics, Naïve Bayes, sentiment analysis, creating word clouds, and retrieving data from Twitter and other social networking platforms.
- Used Git 2.6 for version control; tracked changes in files and coordinated work on the files among multiple team members.
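A minimal sketch of the RFM-based segmentation with K-means referenced above, assuming a transaction log with customer IDs, order dates, and amounts; all file, column, and parameter names are illustrative:

    import pandas as pd
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans

    # Hypothetical transaction log: one row per purchase
    tx = pd.read_csv("transactions.csv", parse_dates=["order_date"])

    # Build the Recency, Frequency, Monetary table per customer
    snapshot = tx["order_date"].max()
    rfm = tx.groupby("customer_id").agg(
        recency=("order_date", lambda d: (snapshot - d.max()).days),
        frequency=("order_date", "count"),
        monetary=("amount", "sum"),
    )

    # Standardize R, F, M so the three scales are comparable, then cluster
    scaled = StandardScaler().fit_transform(rfm)
    rfm["segment"] = KMeans(n_clusters=4, random_state=0, n_init=10).fit_predict(scaled)

    print(rfm.groupby("segment")[["recency", "frequency", "monetary"]].mean())

Hierarchical clustering (for example, sklearn's AgglomerativeClustering) could be substituted for KMeans in the final step.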
Environment: ER Studio 9.7, Tableau 9.03, AWS, Teradata 15, MDM, Git, Unix, Python 3.5.2, MLlib, SAS, regression, logistic regression, QlikView.
Confidential - Newark, CA
Data Scientist
Responsibilities:
- Worked with data investigation, discovery and mapping tools to scan every single data record from many sources.
- Designed the prototype of the data mart and documented possible outcomes from it for end users.
- Involved in business process modeling using UML
- Developed and maintained a data dictionary to create metadata reports for technical and business purposes.
- Created SQL tables with referential integrity and developed queries using SQL, SQL*PLUS and PL/SQL
- Experience in maintaining database architecture and metadata that support the Enterprise Data Warehouse.
- Designed, coded, and unit tested ETL packages for source marts and subject marts using Informatica ETL processes for the Oracle database.
- Developed various QlikView data models by extracting and using data from various sources: files, DB2, Excel, flat files, and big data.
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
- Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions.
- Researched, evaluated, architected, and deployed new tools, frameworks, and patterns to build sustainable Big Data platforms for the clients.
- Identified and executed process improvements, hands-on in various technologies such as Oracle, Informatica, and Business Objects.
- Designed both 3NF data models for ODS, OLTP systems and dimensional data models using Star and Snowflake Schemas.
- Coordinated with various business users, stakeholders, and SMEs for functional expertise, design and business test scenario reviews, UAT participation, and validation of financial data.
- Worked closely with Data Architects and the DBA team to implement data model changes in the database across all environments.
- Created PL/SQL packages and Database Triggers and developed user procedures and prepared user manuals for the new programs.
- Designed and developed Use Case, Activity Diagrams, Sequence Diagrams, OOD (Object-oriented Design) using UML and Visio.
Environment: r9.0, Informatica 9.0, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, metadata, MS Excel, Mainframes, MS Visio, Rational Rose, Requisite Pro, Hadoop, PL/SQL, etc.
Confidential, Chesterbrook, PA
Data Analyst
Responsibilities:
- Interacted with business users to identify and understand business requirements and identified the scope of the projects.
- Identified and designed business entities, attributes, and relationships between the entities to develop a logical model, and later translated the model into a physical model.
- Developed normalized Logical and Physical database models for designing an OLTP application.
- Enforced referential integrity (RI) for consistent relationships between parent and child tables. Worked with users to identify the most appropriate source of record and profile the data required for sales and service.
- Involved in defining the business/transformation rules applied for ICP data.
- Defined the list codes and code conversions between the source systems and the data mart.
- Developed the financial reporting requirements by analyzing the existing Business Objects reports.
- Utilized Informatica toolset (Informatica Data Explorer, and Informatica Data Quality) to analyze legacy data for data profiling.
- Reverse engineered the data models, identified the data elements in the source systems, and added new data elements to the existing data models.
- Created XSDs for applications to connect the interface and the database.
- Compared data with original source documents and validated data accuracy.
- Used reverse engineering to create a graphical representation (E-R diagram), to connect to the existing database, and to map new reporting needs from users against the existing functionality.
- Also worked on assessing the impact of low-quality and/or missing data on the performance of the data warehouse client.
- Worked with nzload to load flat file data into Netezza tables.
- Good understanding of Netezza architecture.
- Executed DDL to create databases, tables and views.
- Generated comprehensive analytical reports by running SQL queries against current databases to conduct data analysis.
- Involved in Data Mapping activities for the data warehouse
- Created and configured workflows, worklets, and sessions to transport the data to target Netezza warehouse tables using Informatica Workflow Manager.
- Extensively worked on Performance Tuning and understanding Joins and Data distribution.
- Experienced in generating and documenting metadata while designing applications.
- Coordinated with DBAs and generated SQL codes from data models.
- Generated reports for better communication between business teams.
Environment: SQL Server, Oracle 9i, MS Office, Embarcadero, Crystal Reports, Netezza, Teradata, Enterprise Architect, Toad, Informatica, ER Studio, XML, OBIEE.
Confidential
J2EE Developer
Responsibilities:
- Involved in gathering the requirements from the Business.
- Worked on designing the algorithms, Business components, Java service programs.
- Maintained good rapport with Business Analysts and Business Users to identify the information required per the business requirements.
- Involved in the development of User Interface (UI) using HTML, JavaScript, CSS, XML using Spring MVC Framework.
- Responsible for writing the unit test cases and code coverage for each module.
- Actively involved in project module estimations and designs.
- Wrote new SQL and PL/SQL stored procedures and modified existing ones depending on the requirements in the MySQL database.
- Involved in reviewing the internal designs and code reviews.
- Performed JavaScript validations on the data submitted by the user.
- Used Spring MVC framework at the front end.
- Implemented modules in Node.js to integrate with designs and requirements.
- Used Core Java techniques like Multithreading, Collections, Generics in the development phase.
- Worked on JPA for persisting the objects into the system.
- Designed and developed an automation framework using Java, Selenium WebDriver, and JUnit.
- Actively participated in resolving the issues encountered in the development phase.
- Performed the smoke test after every release.
- Involved in creating scenarios for performance testing followed up with the performance testing team to run the scripts.
- Developed Unit test cases using JUnit.
- Generated the required XML files to transfer the data between the server and the web pages.
- Deployed the apps on a UNIX box and used FileZilla to get the logs from the UNIX box.
- Implemented the middle tier using Spring MVC to process client requests and implemented the server-side code to be executed.
- Developed SQL and stored procedures in the Oracle database.
Environment: Java 1.5, Spring, Hibernate, HTML, JavaScript, CSS, MySQL, JUnit, Eclipse IDE, XSLT, AJAX, Oracle 10g, XML, PL/SQL, Angular JS, IntelliJ, Node JS, jQuery, JPA.
Confidential
Java Developer
Responsibilities:
- Involved in Analysis and Design of the project, which is based on MVC (Model-View-Controller) Architecture and Design Patterns.
- Created UML Use Cases, Sequence diagrams, class diagrams and page flow diagrams using Rational Rose.
- Designed and developed UI using HTML, CSS, JSP and Struts where users have all the items listed for auctions.
- Involved in developing prototypes of the product.
- Involved in writing Detail Design Documents with UML Specifications.
- Implemented Socket Programming to communicate with all the customers.
- Developed Authentication and Authorization modules where authorized persons can only access the inventory related operations.
- Developed controller servlets and Action and Form objects for the process of interacting with the Oracle ADF database and retrieving dynamic data.
- Responsible for coding SQL statements and stored procedures for back-end communication using JDBC.
- Used the NetBeans IDE to develop the application.
- Wrote JavaScript validations on the client side.
- Involved in unit testing and system testing, and responsible for preparing test scripts for system testing in a UNIX environment.
- Responsible for packaging and deploying components into the JBoss Application Server.
Environment: Java, Java Beans, JSP, JavaScript, Servlets, JDBC, Net Beans, JBoss, XML, HTML, Struts, WSDL, Oracle.