
Data Scientist / Machine Learning Resume

Dallas, TX

SUMMARY:

  • Over 8 years of experience in Machine Learning and Data Mining with large datasets of structured and unstructured data, including Data Acquisition, Data Validation, Predictive Modeling, and Data Visualization.
  • Extensive experience in Text Analytics, developing Statistical, Machine Learning, and Data Mining solutions to various business problems and generating data visualizations using R, Python, and Tableau.
  • Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
  • Experience in designing visualizations using Tableau software and publishing and presenting dashboards and storylines on web and desktop platforms.
  • Designed Physical Data Architecture for new system engines.
  • Hands-on experience implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis, with good knowledge of Recommender Systems.
  • Proficient in Statistical Modeling and Machine Learning techniques (Linear and Logistic Regression, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, Bayesian methods, XGBoost) for forecasting/predictive analytics, segmentation methodologies, regression-based models, hypothesis testing, factor analysis/PCA, and ensembles (a minimal Python sketch follows this list).
  • Developing Logical Data Architecture with adherence to Enterprise Architecture.
  • Strong experience in the Software Development Life Cycle (SDLC), including Requirements Analysis, Design Specification, and Testing, in both Waterfall and Agile methodologies.
  • Adept in statistical programming languages such as R and Python, as well as Big Data technologies such as Hadoop and Hive.
  • Skilled in using dplyr in R and pandas in Python for exploratory data analysis.
  • Experience working with data modeling tools such as Erwin, PowerDesigner, and ER/Studio.
  • Experience in designing Star Schema and Snowflake Schema for Data Warehouse and ODS architectures.
  • Technical proficiency in design and data modeling for online applications; served as Solution Lead for architecting Data Warehouse / Business Intelligence applications.
  • Good understanding of Teradata SQL Assistant, Teradata Administrator, and data load/export utilities such as BTEQ, FastLoad, MultiLoad, and FastExport.
  • Experience with Data Analytics, Data Reporting, Ad-hoc Reporting, Graphs, Scales, Pivot Tables, and OLAP reporting.
  • Highly skilled in using Hadoop (Pig and Hive) for basic analysis and extraction of data in the infrastructure to provide data summarization.
  • Highly skilled in using visualization tools such as Tableau, ggplot2, and D3.js for creating dashboards.
  • Worked with and extracted data from various database sources such as Oracle, SQL Server, DB2, and Teradata.
  • Well experienced in Normalization and De-Normalization techniques for optimum performance in relational and dimensional database environments.
  • Regularly used JIRA and other internal issue trackers during project development.
  • Skilled in System Analysis, E-R / Dimensional Data Modeling, Database Design, and implementing RDBMS-specific features.
  • Worked on Proofs of Concept (PoCs) and gap analysis; gathered the necessary data for analysis from different sources and prepared it for exploration using data munging.
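
Illustrative sketch referenced above: a minimal Python/scikit-learn example of fitting and cross-validating two of the classifiers listed (Logistic Regression and Random Forest). The input file and column names are hypothetical placeholders, not from any specific engagement.

    # Minimal sketch: cross-validate two of the classifiers listed above.
    # The CSV file and column names are hypothetical placeholders.
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    df = pd.read_csv("customer_data.csv")          # hypothetical input file
    X = df.drop(columns=["churned"])               # hypothetical feature columns
    y = df["churned"]                              # hypothetical binary target

    models = {
        "logistic_regression": make_pipeline(StandardScaler(),
                                             LogisticRegression(max_iter=1000)),
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
    }

    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
        print(name, "mean ROC AUC: %.3f" % scores.mean())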

TECHNICAL SKILLS:

Data Modeling Tools: Erwin r9.6/9.5, ER/Studio 9.7, Star-Schema Modeling, Snowflake-Schema Modeling, FACT and dimension tables, Pivot Tables.

Databases: Oracle 11g/12c, MS Access, SQL Server 2012/2014, Sybase, DB2, Teradata 14/15, Hive

Big Data Tools: Hadoop, Hive, Spark, Pig, HBase, Sqoop, Flume.

BI Tools: Tableau 7.0/8.2, Tableau Server 8.2, Tableau Reader 8.1, SAP BusinessObjects, Crystal Reports

Packages: Microsoft Office 2010, Microsoft Project 2010, SAP, Microsoft Visio, SharePoint Portal Server

Operating Systems: Microsoft Windows 8/7/XP, Linux and UNIX.

Languages: SQL, PL/SQL, T-SQL, ASP, Visual Basic, XML, Python, R, C, C++, Java, HTML, UNIX shell scripting, Perl

Applications: Toad for Oracle, Oracle SQL Developer, MS Word, MS Excel, MS PowerPoint, Teradata, Designer 6i

Methodologies: RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Waterfall Model

PROFESSIONAL EXPERIENCE

Confidential, Dallas, TX

Data Scientist / Machine learning

Responsibilities:

  • Built models using statistical techniques like Bayesian HMMs and machine learning classification models like XGBoost, SVM, and Random Forest.
  • Worked on a highly immersive Data Science program involving Data Manipulation & Visualization, Web Scraping, Machine Learning, Python programming, SQL, Git, Unix commands, NoSQL, MongoDB, and Hadoop.
  • Set up storage and data analysis tools in the Amazon Web Services cloud computing infrastructure.
  • Used pandas, NumPy, seaborn, SciPy, matplotlib, scikit-learn, and NLTK in Python for developing various machine learning algorithms.
  • Installed and used the Caffe Deep Learning Framework.
  • Worked on different data formats such as JSON and XML and applied machine learning algorithms in Python.
  • Worked with Data Architects and IT Architects to understand the movement and storage of data, and used ER/Studio 9.7.
  • Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization; also performed gap analysis.
  • Performed data manipulation and aggregation from different sources using Nexus, Toad, BusinessObjects, Power BI, and Smart View.
  • Implemented Agile methodology for building an internal application.
  • Focused on integration overlap and Informatica's newer commitment to MDM with the acquisition of Identity Systems.
  • Good knowledge of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
  • As Architect, delivered various complex OLAP databases/cubes, scorecards, dashboards, and reports.
  • Programmed a utility in Python that used multiple packages (SciPy, NumPy, pandas).
  • Implemented classification using supervised algorithms like Logistic Regression, Decision Trees, KNN, and Naive Bayes.
  • Used Teradata 15 utilities such as FastExport and MultiLoad to handle data migration/ETL tasks from OLTP source systems to OLAP target systems.
  • Experience with Hadoop ecosystem components such as Hadoop MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, and Flume, including their installation and configuration.
  • Updated Python scripts to match training data with our database stored in AWS CloudSearch so that each document could be assigned a response label for further classification.
  • Performed data transformation from various sources, data organization, and feature extraction from raw and stored data.
  • Validated the machine learning classifiers using ROC curves and lift charts (see the sketch after this list).
  • Extracted data from HDFS and prepared data for exploratory analysis using data munging.
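
Sketch referenced in the ROC-curve bullet above: a minimal Python/scikit-learn example of validating a classifier with an ROC curve on a held-out test set. The synthetic data and model choice are assumptions for illustration only, not the project's actual code.

    # Sketch: validate a classifier with an ROC curve on a held-out test set.
    # Synthetic data and the Random Forest model are illustrative assumptions.
    import matplotlib.pyplot as plt
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import roc_auc_score, roc_curve
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                        random_state=0)

    clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_train, y_train)
    probs = clf.predict_proba(X_test)[:, 1]        # scores for the positive class

    fpr, tpr, _ = roc_curve(y_test, probs)
    print("Test AUC: %.3f" % roc_auc_score(y_test, probs))

    plt.plot(fpr, tpr, label="Random Forest")
    plt.plot([0, 1], [0, 1], linestyle="--", label="chance")
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate")
    plt.legend()
    plt.show()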

Environment: ER Studio 9.7, Tableau 9.03, AWS, Teradata 15, MDM, Git, Unix, Python 3.5.2, MLlib, SAS, regression, logistic regression, Hadoop, NoSQL, Teradata, OLTP, random forest, OLAP, HDFS, ODS, NLTK, SVM, JSON, XML, MapReduce

Confidential, Chicago, IL

Data Scientist

Responsibilities:

  • Coded R functions to interface with the Caffe Deep Learning Framework.
  • Worked in an Amazon Web Services cloud computing environment.
  • Used Tableau to automatically generate reports; worked with partially adjudicated insurance flat files, internal records, third-party data sources, JSON, XML, and more.
  • Worked with several R packages including knitr, dplyr, SparkR, CausalInfer, and spacetime.
  • Implemented end-to-end systems for Data Analytics and Data Automation, integrated with custom visualization tools, using R, Mahout, Hadoop, and MongoDB.
  • Gathered all required data from multiple data sources and created datasets to be used in analysis.
  • Performed Exploratory Data Analysis and data visualization using R and Tableau.
  • Performed thorough EDA with univariate and bivariate analysis to understand intrinsic and combined effects.
  • Worked with Data Governance, Data Quality, Data Lineage, and Data Architects to design various models and processes.
  • Independently coded new programs and designed tables to load and test them effectively for the given PoCs using Big Data / Hadoop.
  • Designed data models and data flow diagrams using Erwin and MS Visio.
  • As an Architect, implemented an MDM hub to provide clean, consistent data for an SOA implementation.
  • Developed, implemented, and maintained the Conceptual, Logical, and Physical Data Models using Erwin for forward- and reverse-engineered databases.
  • Established Data Architecture strategy, best practices, standards, and roadmaps.
  • Led the development and presentation of a data analytics data-hub prototype with the help of other members of the emerging solutions team.
  • Performed data cleaning and imputation of missing values using R (a Python/pandas analogue is sketched after this list for illustration).
  • Worked with the Hadoop ecosystem, covering HDFS, HBase, YARN, and MapReduce.
  • Took up ad-hoc requests from different departments and locations.
  • Used Hive to store the data and performed data cleaning steps on huge datasets.
  • Created dashboards and visualizations on a regular basis using ggplot2 and Tableau.
  • Created customized business reports and shared insights with management.
  • Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
  • Interacted with other departments to understand and identify data needs and requirements, and worked with other members of the IT organization to deliver data visualization and reporting solutions to address those needs.
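
The cleaning and imputation described above were done in R; purely for illustration, the sketch below shows an equivalent flow in Python/pandas. The input file, column types, and median/mode imputation strategy are assumptions, not the project's actual approach.

    # Illustration only: a pandas analogue of the R cleaning/imputation work.
    # File name and the median/mode imputation strategy are assumptions.
    import pandas as pd

    df = pd.read_csv("claims.csv")                          # hypothetical input file

    print(df.describe(include="all"))                       # quick univariate summary
    print(df.isna().mean().sort_values(ascending=False))    # missing-value rate per column

    # Impute numeric columns with the median and categorical columns with the mode.
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col].fillna(df[col].median())
        elif df[col].notna().any():
            df[col] = df[col].fillna(df[col].mode().iloc[0])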

Environment: R 3.0, Erwin 9.5, Tableau 8.0, MDM, QlikView, MLlib, PL/SQL, HDFS, Teradata 14.1, JSON, Hadoop (HDFS), MapReduce, Pig, Spark, RStudio, Mahout, Java, Hive, AWS.

Confidential, Washington, District of Columbia

Data Analyst

Responsibilities:

  • As an Architect, designed conceptual, logical, and physical models using Erwin and built data marts using hybrid Inmon and Kimball DW methodologies.
  • Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions.
  • Researched, evaluated, architected, and deployed new tools, frameworks, and patterns to build sustainable Big Data platforms for the clients.
  • Identified and executed process improvements, hands-on in various technologies such as Oracle, Informatica, and BusinessObjects.
  • Worked closely with business, data governance, SMEs, and vendors to define data requirements.
  • Worked with data investigation, discovery, and mapping tools to scan every single data record from many sources.
  • Designed the prototype of the data mart and documented possible outcomes from it for end users.
  • Involved in business process modeling using UML.
  • Developed and maintained a data dictionary to create metadata reports for technical and business purposes.
  • Created SQL tables with referential integrity and developed queries using SQL, SQL*Plus, and PL/SQL.
  • Maintained database architecture and metadata supporting the Enterprise Data Warehouse.
  • Developed various QlikView data models by extracting and using data from various sources: files, DB2, Excel, flat files, and Big Data.
  • Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS (a query sketch follows this list).
  • Designed both 3NF data models for ODS and OLTP systems and dimensional data models using Star and Snowflake schemas.
  • Designed, coded, and unit tested ETL packages for source marts and subject marts using Informatica ETL processes for the Oracle database.
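
Query sketch referenced above: a minimal Python example of pulling a Hive aggregation into pandas, assuming the PyHive client is available. The host, database, table, and column names are hypothetical.

    # Sketch: pull a Hive aggregation into pandas (assumes PyHive is installed).
    # Host, database, table, and column names are hypothetical.
    import pandas as pd
    from pyhive import hive

    conn = hive.Connection(host="hive-gateway.example.com", port=10000,
                           username="analyst", database="claims_db")

    query = """
        SELECT region, COUNT(*) AS claim_count, AVG(paid_amount) AS avg_paid
        FROM claims_fact
        GROUP BY region
    """
    df = pd.read_sql(query, conn)      # aggregated result as a DataFrame
    print(df.head())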

Environment: Erwin r9.0, Informatica 9.0, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, Rational Rose, Requisite Pro, Hadoop, PL/SQL, etc.

Confidential, SFO, CA

Python Developer

Responsibilities:

  • Participated in JAD sessions with business users and sponsors to understand and document the business requirements in alignment with the financial goals of the company.
  • Developed the logical and physical data models that capture current-state and potential-state data fundamentals and data flows using ER/Studio.
  • Created the conceptual model for the data warehouse using the Erwin data modeling tool.
  • Reviewed and implemented the naming standards for the entities, attributes, alternate keys, and primary keys of the logical model.
  • Performed second and third normalization for the ER data model of the OLTP system.
  • Used external loaders such as MultiLoad, TPump, and FastLoad to load data into Oracle, and carried out database analysis, development, testing, implementation, and deployment.
  • Designed and built the dimensions and cubes with Star Schema and Snowflake Schema using SQL Server Analysis Services (SSAS).
  • Translated business and data requirements into logical data models in support of Enterprise Data Models, ODS, OLAP, OLTP, Operational Data Structures, and analytical systems.
  • Worked with data compliance and data governance teams to maintain data models, metadata, and data dictionaries; defined source fields and their definitions.
  • Designed and modeled the reporting data warehouse considering current and future reporting requirements.
  • Worked with Data Scientists to create data marts for data-science-specific functions.
  • Created stored procedures using PL/SQL and tuned the databases and backend processes.
  • Determined data rules and conducted logical and physical design reviews with business analysts, developers, and DBAs.
  • Performed data analysis and data profiling using complex SQL on various source systems, including Teradata and SQL Server (see the profiling sketch after this list).
  • Involved in analysis of business requirements, design and development of high-level and low-level designs, and unit and integration testing.
  • Reviewed the logical model with application developers, the ETL team, DBAs, and the testing team to provide information about the data model and business requirements.
  • Involved in the daily maintenance of the database, which included monitoring the daily runs of the scripts as well as troubleshooting in the event of any errors in the process.
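
Profiling sketch referenced above: a minimal Python example that runs a column-level profiling query through SQLAlchemy and loads the result into pandas. The connection string, table, and column names are placeholders, not the actual source systems.

    # Sketch: simple column-level profiling over a SQL source via SQLAlchemy.
    # Connection string, table, and column names are placeholders.
    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine("mssql+pyodbc://user:password@dsn_name")  # hypothetical DSN

    profile_sql = """
        SELECT
            COUNT(*)                    AS row_count,
            COUNT(DISTINCT customer_id) AS distinct_customers,
            SUM(CASE WHEN email IS NULL THEN 1 ELSE 0 END) AS null_emails,
            MIN(created_date)           AS first_record,
            MAX(created_date)           AS last_record
        FROM dbo.customer
    """
    profile = pd.read_sql(profile_sql, engine)
    print(profile.T)                   # profile metrics, transposed for readability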

Environment: Erwin 8, Teradata 13, SQL Server 2008, Oracle 9i, SQL*Loader, PL/SQL, ODS, OLAP, OLTP, SSAS, Informatica PowerCenter 8.1.

Confidential

R & SAS Programmer

Responsibilities:

  • Worked as a Data Modeler/Analyst to generate data models using Erwin and developed a relational database system.
  • Analyzed the business requirements of the project by studying the Business Requirement Specification document.
  • Extensively worked with the Erwin Data Modeler tool to design the data models.
  • Designed mappings to process the incremental changes that exist in the source table; whenever source data elements were missing in source tables, these were modified/added in consistency with the third-normal-form-based OLTP source database.
  • Designed tables and implemented the naming conventions for Logical and Physical Data Models in Erwin 7.0.
  • Provided expertise and recommendations for physical database design, architecture, testing, performance tuning, and implementation.
  • Designed logical and physical data models for multiple OLTP and analytic applications.
  • Extensively used the Erwin design tool and Erwin Model Manager to create and maintain the Data Mart.
  • Designed the physical model for implementation in an Oracle 9i physical database.
  • Involved in Data Analysis, primarily identifying datasets, source data, source metadata, data definitions, and data formats.
  • Performed performance tuning of the database, including indexing, optimizing SQL statements, and monitoring the server.
  • Wrote simple and advanced SQL queries and scripts to create standard and ad-hoc reports for senior managers.
  • Collaborated on the data mapping document from source to target and the data quality assessments for the source data.
  • Applied an expert-level understanding of different databases in combination for data extraction and loading, joining data extracted from different databases and loading it into a specific database.
  • Coordinated with various business users, stakeholders, and SMEs to obtain functional expertise, review design and business test scenarios, support UAT participation, and validate financial data.
  • Worked very closely with Data Architects and the DBA team to implement data model changes in the database across all environments.
  • Created PL/SQL packages and database triggers, developed user procedures, and prepared user manuals for the new programs.
  • Improved the performance of existing data warehouse applications to increase the efficiency of the existing system.
  • Designed and developed Use Case, Activity, and Sequence Diagrams and OOD (Object-Oriented Design) using UML and Visio.

Environment: Erwin r7.0, SQL Server 2000/2005, Windows XP/NT/2000, Oracle 8i/9i, MS-DTS, UML, UAT, SQL*Loader, OOD, OLTP, PL/SQL, MS Visio, Informatica.

Confidential

SAS Developer/Analyst

Responsibilities:

  • Integrated all transaction data from multiple data sources used by Actuarial into a single repository.
  • Implemented a fully automated data flow into Actuarial front-end (Excel) models using SAS processes.
  • Created SAS programs using SAS DI Studio.
  • Validated the entire data process using SAS and BI tools.
  • Documented service requests from business users; developed code documentation, logs, and output documentation; and created Test Plans and Production Release Notices for the QC, QA, and Production teams to perform further analysis.
  • Extensively used PROC SQL for column modifications and field population on warehouse tables.
  • Additional responsibilities included requirements gathering, design, coding and analysis, testing, debugging, output generation in prescribed formats, and extensive documentation of SAS programs and macros.
  • Developed distinct OLAP cubes from SAS datasets and generated results into Excel sheets.
  • Involved in discussions with business users to define metadata for tables to perform the ETL process.

Environment: Python 2.7, Windows, MySQL, ETL, Ansible, Flask, Python libraries such as NumPy and SQLAlchemy, AngularJS, MySQL DB.
