Data Scientist Resume
Sterling, VA
SUMMARY
- 8+ years of experience in Data Analysis, Data Profiling, Data Integration, Migration, Data Governance, Metadata Management, Master Data Management, and Configuration Management.
- Extensive experience in Text Analytics, developing statistical Machine Learning and Data Mining solutions to various business problems, and generating data visualizations using R, Python, and Tableau.
- Designed the physical data architecture of new system engines.
- Hands-on with Spark MLlib utilities, including classification, regression, clustering, collaborative filtering, and dimensionality reduction.
- Good experience in NLP with Apache Hadoop and Python.
- Hands-on experience implementing LDA and Naïve Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis, with good knowledge of Recommender Systems.
- Excellent knowledge of Relational Database Design and Data Warehouse/OLAP concepts and methodologies.
- Collaborated with the lead Data Architect to model the Data Warehouse in accordance with FSLDM subject areas, 3NF format, and snowflake schema.
- Worked with and extracted data from various database sources such as Oracle, SQL Server, and DB2.
- Mapped and traced data from system to system to establish data hierarchy and lineage.
- Experience in coding SQL/PL/SQL procedures, triggers, and packages.
- Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
- Experience with advanced SAS programming techniques, such as PROC SQL (JOIN/UNION), PROC APPEND, PROC DATASETS, and PROC TRANSPOSE.
- Developed logical data architecture with adherence to Enterprise Architecture.
- Strong experience in the Software Development Life Cycle (SDLC), including requirements analysis, design specification, and testing, in both Waterfall and Agile methodologies.
- Adept in statistical programming languages such as R and Python, as well as Big Data technologies such as Hadoop and Hive.
- Skilled in using dplyr in R and pandas in Python for exploratory data analysis (a minimal sketch follows this summary).
- Experience working with data modeling tools such as Erwin, PowerDesigner, and ER/Studio.
- Experience in designing star and snowflake schemas for Data Warehouse and ODS architectures.
- Experience in designing visualizations using Tableau and in publishing and presenting dashboards and storylines on web and desktop platforms.
- Experience and technical proficiency in designing and data modeling online applications; solution lead for architecting Data Warehouse/Business Intelligence applications.
- Good understanding of Teradata SQL Assistant, Teradata Administrator, and data load/export utilities such as BTEQ, FastLoad, MultiLoad, and FastExport.
- Experience with Data Analytics, Data Reporting, ad-hoc reporting, graphs, scales, pivot tables, and OLAP reporting.
- Experience in maintaining database architecture and metadata that support the Enterprise Data Warehouse.
- Highly skilled in using Hadoop (Pig and Hive) for basic analysis and data extraction in the infrastructure to provide data summarization.
- Highly skilled in using visualization tools like Tableau, ggplot2 and d3.js for creating dashboards.
- Regularly accessed JIRA and other internal issue trackers for project development.
- Skilled in system analysis, E-R/dimensional data modeling, database design, and implementing RDBMS-specific features.
- Knowledge of working with proofs of concept (PoCs) and gap analysis; gathered necessary data for analysis from different sources and prepared data for exploration using data munging and Teradata.
- Well experienced in normalization and denormalization techniques for optimal performance in relational and dimensional database environments.
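The following is a minimal sketch of the pandas-based exploratory analysis mentioned above; the file name and column names (customer_extract.csv, region, revenue) are hypothetical placeholders, not project artifacts.

    import pandas as pd

    # Load a hypothetical extract (file and columns are placeholders)
    df = pd.read_csv("customer_extract.csv")

    # Basic profiling: shape, types, missing values, summary statistics
    print(df.shape)
    print(df.dtypes)
    print(df.isna().sum())
    print(df.describe(include="all"))

    # Simple aggregation, the pandas analogue of dplyr's group_by() + summarise()
    print(df.groupby("region")["revenue"].agg(["count", "mean", "sum"]))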
TECHNICAL SKILLS
Technologies: ActiveMatrix Suite (AMX BW, AMX BPM, BE, CIM), Active Spaces, EMS, Java, CLE, HL7, HIPAA, Spring ROO, GI, Oracle, JMS, ADB Adapter, File Adapter, Hawk, AS400.
Frameworks: Microsoft .NET 4.5/4.0/3.5/3.0, Entity Framework, Bootstrap, Microsoft Azure, Swagger.
Databases: Oracle, MongoDB, SQL Server 2014/2012/2008/2005/2000, MS Access, Teradata, Hadoop.
Data Modeling Tools: Erwin, ER/Studio, Star-Schema Modeling, Snowflake-Schema Modeling, FACT and dimension tables, Pivot Tables.
Version Control: TFS, Microsoft Visual SourceSafe, Git
Unit Testing: NUnit, MSUnit
Database Tools: SQL Server Query Analyzer.
Data Mining: Data reduction, Clustering, Classification, Anomaly detection, sequence mining (HMM model), Text mining
Other Technologies: PHP, Scala, Shark, Awk, Cascading, Cassandra, Clojure, Fortran, JavaScript, JMP, Mahout, Objective-C, QlikView, Redis, Redshift
Web Technologies: Windows API, Web Services, Web API (RESTful), HTML5, XHTML, CSS3, AJAX, XML, XAML, MSMQ, Silverlight, Kendo UI.
Web Servers: IIS 5.0, IIS 6.0, IIS 7.5, IIS ADMIN.
Programming Languages: C#, VB.NET (VB6), VBScript, OOP, Data Structures, Algorithms, Python, R, Java, JavaScript, SQL, J2EE, C, C++ and XML.
Development Tools: R, SQL, Python, Hadoop, SAS, Java, Hive, MATLAB, RStudio, MS Office, Visual Studio 2010
PROFESSIONAL EXPERIENCE
Confidential - Sterling, VA
Data Scientist
Responsibilities:
- As an architect, designed conceptual, logical, and physical models using Erwin and built data marts using hybrid Inmon and Kimball DW methodologies.
- Worked closely with business, data governance, SMEs, and vendors to define data requirements.
- Worked with data investigation, discovery and mapping tools to scan every single data record from many sources.
- Designed the prototype of the data mart and documented possible outcomes from it for end users.
- Involved in business process modeling using UML.
- Developed and maintained a data dictionary to create metadata reports for technical and business purposes.
- Created SQL tables with referential integrity and developed queries using SQL, SQL*Plus, and PL/SQL.
- Implemented Spark MLlib utilities, including classification, regression, clustering, collaborative filtering, and dimensionality reduction (a minimal sketch follows this section).
- Designed, coded, and unit tested ETL packages for source marts and subject marts using Informatica ETL processes against an Oracle database.
- Developed various QlikView data models by extracting and using data from sources including files, DB2, Excel, flat files, and Big Data.
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
- Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions.
- Researched, evaluated, architected, and deployed new tools, frameworks, and patterns to build sustainable Big Data platforms for the clients.
- Identified and executed process improvements, hands-on in various technologies such as Oracle, Informatica, and BusinessObjects.
- Designed both 3NF data models for ODS and OLTP systems and dimensional data models using star and snowflake schemas.
Environment: Erwin r9.0, Informatica 9.0, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, Rational Rose, Requisite Pro, Hadoop, PL/SQL, etc.
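Below is a minimal sketch of the Spark MLlib classification work referenced in this section, using the pyspark.ml API; the input path, feature columns (f1, f2, f3), and label column are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression

    spark = SparkSession.builder.appName("mllib-sketch").getOrCreate()

    # Hypothetical input: a CSV with numeric feature columns and a 0/1 label
    df = spark.read.csv("hdfs:///data/train.csv", header=True, inferSchema=True)

    # Assemble the feature columns into the single vector column MLlib expects
    assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")
    train = assembler.transform(df)

    # Fit a logistic regression classifier and inspect a few predictions
    model = LogisticRegression(labelCol="label", featuresCol="features").fit(train)
    model.transform(train).select("label", "prediction").show(5)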
Confidential, Tallahassee, Florida
Data Scientist
Responsibilities:
- Extracted data from HDFS and prepared it for exploratory analysis using data munging.
- Built models using statistical techniques like Bayesian HMM and machine learning classification models like XGBoost, SVM, and Random Forest (a minimal sketch follows this section).
- Completed a highly immersive data science program involving data manipulation and visualization, web scraping, machine learning, Python programming, SQL, Git, Unix commands, NoSQL, MongoDB, and Hadoop.
- Set up storage and data analysis tools in the Amazon Web Services cloud computing infrastructure.
- Used pandas, numpy, seaborn, scipy, matplotlib, scikit-learn, and NLTK in Python for developing various machine learning algorithms.
- Installed and used the Caffe deep learning framework.
- Worked with different data formats such as JSON and XML and applied machine learning algorithms in Python (a parsing sketch also follows this section).
- Worked with Data Architects and IT Architects to understand the movement of data and its storage, using ER/Studio 9.7.
- Participated in all phases of data mining: data collection, data cleaning, developing models, validation, and visualization; also performed gap analysis.
- Performed data manipulation and aggregation from different sources using Nexus, Toad, BusinessObjects, Power BI, and Smart View.
- Implemented Agile Methodology for building an internal application.
- Focused on integration overlap and Informatica's newer commitment to MDM following the acquisition of Identity Systems.
- Good knowledge of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
- As architect, delivered various complex OLAP databases/cubes, scorecards, dashboards, and reports.
- Programmed a utility in Python that used multiple packages (scipy, numpy, pandas).
- Implemented classification using supervised algorithms like Logistic Regression, Decision Trees, KNN, and Naive Bayes.
- Used Teradata 15 utilities such as FastExport and MLOAD to handle various data migration/ETL tasks from OLTP source systems to OLAP target systems.
- Experience with Hadoop ecosystem components like MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, and Flume, including their installation and configuration.
- Updated Python scripts to match training data with our database stored in AWS CloudSearch, so that each document could be assigned a response label for further classification.
- Performed data transformation from various sources, data organization, and feature extraction from raw and stored data.
- Validated the machine learning classifiers using ROC curves and lift charts, as in the sketch below.
Environment: Unix, Python 3.5, MLlib, SAS, regression, logistic regression, Hadoop 2.7, NoSQL, Teradata, OLTP, random forest, OLAP, HDFS, ODS, NLTK, SVM, JSON, XML, and MapReduce.
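A minimal sketch of the classification-and-validation workflow from this section: a Random Forest checked with an ROC curve, shown with scikit-learn on synthetic data since the project data is not available.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import roc_auc_score, roc_curve
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in for the project data
    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)

    # Fit a Random Forest and score held-out probabilities
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X_train, y_train)
    probs = clf.predict_proba(X_test)[:, 1]

    # ROC curve points and area under the curve, used to validate the classifier
    fpr, tpr, _ = roc_curve(y_test, probs)
    print(f"ROC AUC: {roc_auc_score(y_test, probs):.3f}")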
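And a minimal sketch of handling the JSON and XML formats mentioned above, using only the Python standard library; the record fields (id, text, amount) are hypothetical.

    import json
    import xml.etree.ElementTree as ET

    # Hypothetical JSON record
    record = json.loads('{"id": 1, "text": "claim approved", "amount": 120.5}')
    print(record["id"], record["amount"])

    # The same hypothetical record shape as XML
    root = ET.fromstring(
        "<record><id>2</id><text>claim denied</text><amount>80.0</amount></record>")
    row = {child.tag: child.text for child in root}
    print(row["id"], row["amount"])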
Confidential
Data Scientist
Responsibilities:
- Worked with data investigation, discovery and mapping tools to scan every single data record from many sources.
- Designed the prototype of the data mart and documented possible outcomes from it for end users.
- Involved in business process modeling using UML.
- Developed and maintained a data dictionary to create metadata reports for technical and business purposes.
- Created SQL tables with referential integrity and developed queries using SQL, SQL*Plus, and PL/SQL (a generic sketch follows this section).
- Experience in maintaining database architecture and metadata that support the Enterprise Data Warehouse.
- Designed, coded, and unit tested ETL packages for source marts and subject marts using Informatica ETL processes against an Oracle database.
- Developed various QlikView data models by extracting and using data from sources including files, DB2, Excel, flat files, and Big Data.
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
- Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions.
- Researched, evaluated, architected, and deployed new tools, frameworks, and patterns to build sustainable Big Data platforms for the clients.
- Identified and executed process improvements, hands-on in various technologies such as Oracle, Informatica, and BusinessObjects.
- Designed both 3NF data models for ODS and OLTP systems and dimensional data models using star and snowflake schemas.
- Collaborated on the data mapping document from source to target and on the data quality assessments for the source data.
- Applied expert-level understanding of different databases in combination for data extraction and loading, joining data extracted from different databases and loading it into a specific database.
- Coordinated with various business users, stakeholders, and SMEs for functional expertise, design and business test scenario reviews, UAT participation, and validation of financial data.
- Worked very closely with Data Architects and the DBA team to implement data model changes in the database across all environments.
- Created PL/SQL packages and database triggers, developed user procedures, and prepared user manuals for the new programs.
- Tuned the existing data warehouse applications to increase the efficiency of the system.
- Designed and developed use case, activity, and sequence diagrams and OOD (object-oriented design) using UML and Visio.
Environment: Erwin r9.0, Informatica 9.0, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, Rational Rose, Requisite Pro, Hadoop, PL/SQL, etc.
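Finally, a generic sketch of the referential-integrity work described in this section; it uses SQLite through Python's standard library purely so the example is self-contained (the original work targeted Oracle with SQL*Plus and PL/SQL), and the schema is hypothetical.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

    # Parent and child tables linked by a foreign-key constraint
    conn.execute("""CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL)""")
    conn.execute("""CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER NOT NULL REFERENCES department(dept_id))""")

    conn.execute("INSERT INTO department VALUES (10, 'Analytics')")
    conn.execute("INSERT INTO employee VALUES (1, 'A. Smith', 10)")

    # Inserting a row that points at a missing parent is rejected
    try:
        conn.execute("INSERT INTO employee VALUES (2, 'B. Jones', 99)")
    except sqlite3.IntegrityError as err:
        print("rejected:", err)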