
Data Scientist/Machine Learning Resume


Albuquerque, NM

SUMMARY

  • Over 8 years of experience in Machine Learning and Data Mining with large sets of Structured and Unstructured data, including Data Acquisition, Data Validation, Predictive Modeling, and Data Visualization.
  • Extensive experience in Text Analytics, developing Statistical Machine Learning and Data Mining solutions to various business problems, and generating Data Visualizations using R, Python, and Tableau.
  • Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
  • Sound knowledge of statistical learning theory with a postgraduate background in mathematics.
  • Experience in designing visualizations using Tableau and in publishing and presenting dashboards and storylines on web and desktop platforms.
  • Experience in using statistical procedures and Machine Learning algorithms such as ANOVA, Clustering, Regression, and Time Series Analysis to analyze data for further model building.
  • Designed the Physical Data Architecture of new system engines.
  • Hands-on experience implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis, with good knowledge of Recommender Systems.
  • Proficient in Statistical Modeling and Machine Learning techniques (Linear, Logistic, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, Bayesian, XGBoost) in Forecasting/Predictive Analytics, Segmentation methodologies, Regression-based models, Hypothesis Testing, Factor Analysis/PCA, and Ensembles (a brief sketch follows this list).
  • Experience with foundational Machine Learning models and concepts: Regression, Random Forest, Boosting, GBM, NNs, HMMs, CRFs, MRFs, and Deep Learning.
  • Developing Logical Data Architecture with adherence to Enterprise Architecture.
  • Adept in statistical programming languages such as R and Python, as well as Big Data technologies such as Hadoop and Hive.
  • Skilled in using dplyr in R and pandas in Python for performing exploratory data analysis.
  • Experience working with data modeling tools like Erwin, PowerDesigner, and ER/Studio.
  • Experience in designing Star and Snowflake schemas for Data Warehouse and ODS architectures.
  • Experience and technical proficiency in designing and data modeling online applications; Solution Lead for architecting Data Warehouse/Business Intelligence applications.
  • Good understanding of Teradata SQL Assistant and Teradata Administrator, and of data load/export utilities like BTEQ, FastLoad, MultiLoad, and FastExport.
  • Highly skilled in using visualization tools like Tableau, ggplot2 and d3.js for creating dashboards.
  • Worked with and extracted data from various database sources like Oracle, SQL Server, and DB2; regularly used JIRA and other internal issue trackers during project development.
  • Knowledge of working with Proofs of Concept (PoCs) and gap analysis; gathered the necessary data for analysis from different sources and prepared it for exploration using data munging and Teradata.
  • Well experienced in Normalization and Denormalization techniques for optimum performance in relational and dimensional database environments.
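
The following is a minimal, hypothetical sketch of the kind of classification workflow listed above, using Python and scikit-learn; the bundled dataset, model parameters, and variable names are placeholders for illustration, not work from any specific engagement.

```python
# Minimal sketch: fitting and comparing two of the classifiers named above
# with cross-validation. The dataset is a scikit-learn sample, not client data.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)  # placeholder dataset

models = {
    "logistic_regression": LogisticRegression(max_iter=5000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

for name, model in models.items():
    # 5-fold cross-validated accuracy gives a quick, like-for-like comparison
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```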

TECHNICAL SKILLS

Data Modeling Tools: Erwin r9.6/9.5, ER/Studio 9.7, Star-Schema Modeling, Snowflake-Schema Modeling, FACT and Dimension Tables, Pivot Tables.

Databases: Oracle, MS Access, SQL Server, Sybase, DB2, Teradata, Hive, MySQL, SQLite, HBase, MongoDB.

Machine Learning Tools: OpenCV, Theano, TensorFlow, Pygame, OpenGL, NumPy, SymPy, SciPy, Pandas.

Big Data Tools: Hadoop, Hive, Spark, Pig, HBase, Sqoop, Flume.

Web Technologies: Django, HTML5, CSS3, XHTML, JavaScript, React.js, XML, SOAP, REST, Bootstrap, JSON, AJAX.

R Packages: dplyr, sqldf, data.table, randomForest, gbm, caret, elasticnet, and assorted other Machine Learning packages.

BI Tools: Tableau 7.0/8.2, Tableau Server 8.2, Tableau Reader 8.1, SAP BusinessObjects, Crystal Reports.

Packages: Microsoft Office 2010, Microsoft Project 2010, SAP, Microsoft Visio, SharePoint Portal Server.

Operating Systems: Microsoft Windows 8/7/XP, Linux and UNIX.

Languages: SAS/STAT, SAS/ETS, SAS E-Miner, SPSS, SQL, PL/SQL, T-SQL, ASP, Visual Basic, XML, Python, C, C++, Java, HTML, UNIX shell scripting, Perl, R, Scala, MATLAB, Spark, Power BI.

Applications: Toad for Oracle, Oracle SQL Developer, MS Word, MS Excel, MS PowerPoint, Teradata, Designer 6i.

Methodologies: RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Waterfall Model.

PROFESSIONAL EXPERIENCE

Confidential, Albuquerque, NM

Data Scientist/Machine Learning

Responsibilities:

  • Built models using statistical techniques like Bayesian HMM and Machine Learning classification models like XGBoost, SVM, and Random Forest.
  • Completed a highly immersive Data Science program involving Data Manipulation and Visualization, Web Scraping, Machine Learning, Python programming, SQL, Git, Unix commands, NoSQL, MongoDB, and Hadoop.
  • Set up storage and data analysis tools in the Amazon Web Services cloud computing infrastructure.
  • Used pandas, NumPy, seaborn, SciPy, matplotlib, scikit-learn, and NLTK in Python to develop various machine learning algorithms.
  • Installed and used the Caffe Deep Learning Framework.
  • Worked with different data formats such as JSON and XML and applied machine learning algorithms in Python.
  • Worked with Data Architects and IT Architects to understand the movement and storage of data, using ER/Studio 9.7.
  • Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization; also performed gap analysis.
  • Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI, and Smart View.
  • Implemented Agile Methodology for building an internal application.
  • Focused on integration overlap and Informatica's newer commitment to MDM following its acquisition of Identity Systems.
  • Good knowledge of Hadoop architecture and various components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
  • As Architect, delivered various complex OLAP databases/cubes, scorecards, dashboards, and reports.
  • Programmed a utility in Python that used multiple packages (SciPy, NumPy, pandas).
  • Implemented classification using supervised algorithms like Logistic Regression, Decision Trees, KNN, and Naive Bayes.
  • Used Teradata 15 utilities such as FastExport and MultiLoad to handle various data migration/ETL tasks from OLTP source systems to OLAP target systems.
  • Experience with Hadoop ecosystem components like Hadoop MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, and Flume, including their installation and configuration.
  • Updated Python scripts to match training data with our database stored in AWS CloudSearch, so that each document could be assigned a response label for further classification.
  • Performed data transformation from various sources, data organization, and feature extraction from raw and stored data.
  • Validated the machine learning classifiers using ROC curves and Lift charts (a brief sketch follows this list).
  • Extracted data from HDFS and prepared data for exploratory analysis using data munging.
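
As an illustration of the ROC-based validation step above, here is a minimal sketch using scikit-learn; the synthetic data and logistic model are placeholders standing in for the project's actual classifiers.

```python
# Sketch: validating a binary classifier with a ROC curve and AUC, as
# referenced above. Data and model are synthetic placeholders.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)  # stand-in data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
probs = clf.predict_proba(X_test)[:, 1]  # predicted probability of class 1

fpr, tpr, _ = roc_curve(y_test, probs)   # points tracing the ROC curve
print(f"AUC = {roc_auc_score(y_test, probs):.3f}")
```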

Environment: ER/Studio 9.7, Tableau 9.03, AWS, Teradata 15, MDM, Git, Unix, Python 3.5.2, MLlib, SAS, Regression, Logistic Regression, Hadoop, NoSQL, OLTP, Random Forest, OLAP, HDFS, ODS, NLTK, SVM, JSON, XML, MapReduce.

Confidential, Seattle, WA

Data Scientist/Data Architecture

Responsibilities:

  • Responsible for performing machine learning techniques (regression/classification) to predict outcomes.
  • Coded R functions to interface with the Caffe Deep Learning Framework.
  • Worked in the Amazon Web Services cloud computing environment.
  • Used Tableau to automatically generate reports; worked with partially adjudicated insurance flat files, internal records, third-party data sources, JSON, XML, and more.
  • Identified and evaluated various distributed machine learning libraries like Mahout, MLlib (Apache Spark), and R.
  • Evaluated the performance of various Classification and Regression algorithms using R to predict future power.
  • Worked with several R packages including knitr, dplyr, SparkR, CausalInfer, and spacetime.
  • Involved in detecting patterns with Unsupervised Learning such as K-Means Clustering (a brief sketch follows this list).
  • Implemented end-to-end systems for Data Analytics and Data Automation, integrated with custom visualization tools, using R, Mahout, Hadoop, and MongoDB.
  • Gathered all required data from multiple data sources and created datasets for use in analysis.
  • Performed Exploratory Data Analysis and Data Visualizations using R and Tableau.
  • Performed thorough EDA, including univariate and bivariate analysis, to understand intrinsic and combined effects.
  • Worked with data governance, data quality, and data lineage teams and Data Architects to design various models and processes.
  • Independently coded new programs and designed tables to load and test the programs effectively for the given PoCs using Big Data/Hadoop.
  • Designed data models and data flow diagrams using Erwin and MS Visio.
  • As an Architect implemented MDM hub to provide clean, consistent data for a SOA implementation.
  • Developed, implemented, and maintained Conceptual, Logical, and Physical Data Models using Erwin for forward/reverse-engineered databases.
  • Established Data Architecture strategy, best practices, standards, and roadmaps.
  • Led the development and presentation of a data analytics data-hub prototype with the help of the other members of the emerging solutions team.
  • Performed data cleaning and imputation of missing values using R.
  • Worked with the Hadoop ecosystem, covering HDFS, HBase, YARN, and MapReduce.
  • Took up ad-hoc requests from different departments and locations.
  • Used Hive to store the data and performed data cleaning steps on huge datasets.
  • Created dashboards and visualizations on a regular basis using ggplot2 and Tableau.
  • Created customized business reports and shared insights with management.
  • Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
  • Interacted with the other departments to understand and identify data needs and requirements and work with other members of the IT organization to deliver data visualization and reporting solutions to address those needs.
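
The K-Means pattern-detection work referenced above was done in R; the sketch below shows the equivalent idea in Python with scikit-learn, on synthetic stand-in data.

```python
# Sketch of K-Means pattern detection, as referenced above. The project
# work was in R; this is the equivalent idea in Python on synthetic data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=500, centers=3, random_state=0)  # stand-in data

X_scaled = StandardScaler().fit_transform(X)  # scale features before clustering
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_scaled)

print("cluster sizes:", np.bincount(kmeans.labels_))
print("inertia:", round(kmeans.inertia_, 2))  # within-cluster sum of squares
```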

Environment: R 3.0, Erwin 9.5, Tableau 8.0, MDM, QlikView, MLlib, PL/SQL, Teradata 14.1, JSON, Hadoop (HDFS), MapReduce, Pig, Spark, RStudio, Mahout, Java, Hive, AWS.

Confidential

Data Scientist

Responsibilities:

  • As an Architect, designed conceptual, logical, and physical models using Erwin and built data marts using hybrid Inmon and Kimball DW methodologies.
  • Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions.
  • Researched, evaluated, architected, and deployed new tools, frameworks, and patterns to build sustainable Big Data platforms for the clients
  • Identified and executed process improvements, working hands-on with various technologies such as Oracle, Informatica, and Business Objects.
  • Worked closely with business, data governance, SMEs and vendors to define data requirements.
  • Worked with data investigation, discovery and mapping tools to scan every single data record from many sources.
  • Designed the prototype of the Data Mart and documented possible outcomes from it for end users.
  • Involved in business process modeling using UML.
  • Developed and maintained a data dictionary to create metadata reports for technical and business purposes.
  • Created SQL tables with referential integrity and developed queries using SQL, SQL*Plus, and PL/SQL (a brief sketch follows this list).
  • Maintained the database architecture and metadata that support the Enterprise Data Warehouse.
  • Developed various QlikView data models by extracting and using data from various source files, DB2, Excel, flat files, and Big Data.
  • Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
  • Designed both 3NF data models for ODS and OLTP systems and dimensional data models using Star and Snowflake schemas.
  • Designed, coded, and unit tested ETL packages for source marts and subject marts using Informatica ETL processes for an Oracle database.
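
To illustrate the referential-integrity design mentioned above: the project used Oracle with SQL*Plus and PL/SQL, but the self-contained sketch below uses Python's sqlite3 so it runs anywhere; the table and column names are hypothetical.

```python
# Sketch: tables with referential integrity, as referenced above. sqlite3
# stands in for Oracle so the example is self-contained; names are made up.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when on

conn.execute("""
    CREATE TABLE dim_customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE fact_orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES dim_customer(customer_id),
        amount      REAL NOT NULL
    )""")

conn.execute("INSERT INTO dim_customer VALUES (1, 'Acme Corp')")
conn.execute("INSERT INTO fact_orders VALUES (100, 1, 250.0)")

try:
    # Violates the foreign key: customer 99 does not exist
    conn.execute("INSERT INTO fact_orders VALUES (101, 99, 10.0)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```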

Environment: Erwin r9.0, Informatica 9.0, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, Rational Rose, Requisite Pro, Hadoop, PL/SQL.

Confidential

Java Developer

Responsibilities:

  • Participated in system design, planning, estimation, and implementation.
  • Involved in developing Use Case Diagrams, Class Diagrams, Sequence Diagrams and Process Flow Diagrams for the modules using UML and Rational Rose.
  • Developed the presentation layer using JSP, AJAX, HTML, XHTML, CSS and client validations using JavaScript.
  • Developed and implemented the MVC Architectural Pattern using Spring Framework.
  • Made effective use of J2EE design patterns, namely Session Facade, Factory Method, Command, and Singleton, to develop various base framework components in the application.
  • Modified the Account View functionality to enable display of blocked account details that have tags; this involved modifying beans, JSP changes, and middle-tier enhancements.
  • Worked on generating web service classes using WSDL, UDDI, and SOAP.
  • Consumed third-party Web Services using WSDL, SOAP, and UDDI for authorizing payments to/from customers.
  • Involved in unit and integration testing using JUnit, bug fixing, and user acceptance testing with test cases.
  • Used CVS for version control and Maven as a build tool.
  • Designed and developed systems based on JEE specifications and used Spring Framework with MVC architecture.
  • Used the Spring Roo framework, Design/Enterprise Integration patterns, and REST architecture compliance for the design and development of applications.
  • Involved in application development using the Spring Core, Spring Roo, Spring JEE, and Spring Aspects modules and Java web-based technologies such as Web Services (REST/SOA/microservices), including microservices implementations, and Hibernate ORM.
  • Used LDAP and Microsoft Active Directory services for authorization and authentication.
  • Implemented design patterns such as Singleton (sketched after this list), Session Facade, Factory, Business Delegate, and DAO within the MVC architecture.
  • Used JPA object mapping for backend data persistence.
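
The design-pattern work above was implemented in Java/J2EE; as a compact, language-neutral illustration, here is the Singleton idea sketched in Python with a hypothetical ConnectionManager.

```python
# Sketch of the Singleton pattern referenced above. The project code was
# Java/J2EE; this Python class is only a compact illustration.
class ConnectionManager:
    """Hypothetical shared resource: exactly one instance should exist."""
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.connected = False  # one-time initialization
        return cls._instance

    def connect(self):
        self.connected = True

a = ConnectionManager()
b = ConnectionManager()
a.connect()
print(a is b, b.connected)  # True True: both names share the one instance
```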

Environment: Java, J2EE, Spring Framework, Spring Roo, Hibernate, JPA, JSP, AJAX, HTML, XHTML, CSS, JavaScript, Web Services (WSDL, SOAP, UDDI, REST), JUnit, CVS, Maven, LDAP, Rational Rose, UML.
