Data Scientist Resume
Mansfield, MA
PROFESSIONAL SUMMARY:
- Around 8 years of IT industry experience encompassing Machine Learning, Data Mining with large datasets of structured and unstructured data, Data Acquisition, Data Validation, Predictive Modeling, and Data Visualization.
- Extensive experience in Text Analytics, developing statistical machine learning and Data Mining solutions to various business problems, and generating data visualizations using R, Python, and Tableau.
- Over 2 years of experience with machine learning techniques and algorithms (such as k-NN, Naive Bayes, etc.).
- Experience with object-oriented programming (OOP) concepts using Python, C++, and PHP.
- Knowledge of advanced SAS programming techniques such as PROC SQL (JOIN/UNION), PROC APPEND, PROC DATASETS, and PROC TRANSPOSE.
- Integration Architect and Data Scientist experience in Analytics, Big Data, BPM, SOA, ETL, and Cloud technologies.
- Highly skilled in using visualization tools like Tableau, ggplot2, and d3.js for creating dashboards.
- Experience with foundational machine learning models and concepts: regression, random forest, boosting, GBM, neural networks, HMMs, CRFs, MRFs, and deep learning.
- Proficient with statistical and other tools/languages: R, Python, C, C++, Java, SQL, UNIX, the QlikView data visualization tool, and the Anaplan forecasting tool.
- Proficient in integrating various data sources with multiple relational databases such as Oracle, MS SQL Server, DB2, Teradata, and flat files into the staging area, ODS, Data Warehouse, and Data Mart.
- Familiar with deep learning projects: CNNs for image identification, RNNs for stock price prediction, autoencoders for a movie recommender system (PyTorch), and image captioning (CNN-RNN encoder-decoder architecture).
- Exposure to AI and deep learning platforms/methodologies such as TensorFlow, RNNs, and LSTMs.
- Built LSTM neural networks for text fields such as item descriptions and comments (see the sketch after this list).
- Experience with artificial intelligence chatbots.
- Built deep neural networks combining LSTM outputs with other features.
- Experience extracting data and creating value-added datasets using Python, R, Azure, and SQL to analyze customer behavior, target specific customer segments, and surface hidden insights that support project objectives.
- Worked with NoSQL databases including HBase, Cassandra, and MongoDB.
- Extensively worked with statistical analysis tools; adept at writing code in advanced Excel, R, MATLAB, and Python.
- Implemented deep learning models and numerical computation with data flow graphs using TensorFlow.
- Worked with applications such as R, Stata, Scala, Perl, and SPSS to develop neural networks and cluster analyses.
- Experienced in the full software lifecycle under SDLC, Agile, and Scrum methodologies.
- Experience designing visualizations using Tableau and publishing and presenting dashboards and storylines on web and desktop platforms.
- Hands-on experience implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, neural networks, and Principal Component Analysis, with good knowledge of Recommender Systems.
- Experienced with machine learning algorithms such as logistic regression, random forest, XGBoost, KNN, SVM, neural networks, linear regression, lasso regression, and k-fold cross-validation.
- Strong experience in the Software Development Life Cycle (SDLC), including requirements analysis, design specification, and testing, in both Waterfall and Agile methodologies.
- Skilled in using dplyr in R and pandas in Python for exploratory data analysis.
- Experience working with data modeling tools like Erwin, PowerDesigner, and ER/Studio.
- Experience with data analytics, data reporting, ad-hoc reporting, graphs, scales, PivotTables, and OLAP reporting.
- Worked with and extracted data from various database sources such as Oracle, SQL Server, DB2, and Teradata.
- Proficient knowledge of statistics, mathematics, machine learning, recommendation algorithms, and analytics, with an excellent understanding of business operations and analytics tools for effective data analysis.
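Illustrative sketch (not project code) of the LSTM-plus-features text model described above, assuming Keras (TensorFlow); the vocabulary size, sequence length, feature count, and target are hypothetical placeholders:

```python
# Minimal sketch of an LSTM text model whose output is joined with other
# features, assuming Keras (TensorFlow). Vocabulary size, sequence length,
# and feature count are hypothetical placeholders.
from tensorflow.keras import Model, layers

VOCAB_SIZE, SEQ_LEN, N_EXTRA = 20000, 100, 8  # hypothetical sizes

text_in = layers.Input(shape=(SEQ_LEN,), name="item_description")
extra_in = layers.Input(shape=(N_EXTRA,), name="other_features")

x = layers.Embedding(VOCAB_SIZE, 64)(text_in)   # token ids -> dense vectors
x = layers.LSTM(32)(x)                          # summarize the text sequence
x = layers.concatenate([x, extra_in])           # join text summary with extra features
out = layers.Dense(1, activation="sigmoid")(x)  # hypothetical binary target

model = Model(inputs=[text_in, extra_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```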
TECHNICAL SKILLS:
Languages: Python, R
Machine Learning: Regression, Polynomial Regression, Random Forest, Logistic Regression, Decision Trees, Classification, Clustering, Association, Simple/Multiple Linear Regression, Kernel SVM, K-Nearest Neighbors (K-NN)
OLAP/BI/ETL Tools: Business Objects 6.1/XI, MS SQL Server 2008/2005 Analysis Services (MS OLAP, SSAS), Integration Services (SSIS), Reporting Services (SSRS), Performance Point Server (PPS), Oracle 9i OLAP, MS Office Web Components (OWC11), DTS, MDX, Crystal Reports 10, Crystal Enterprise 10 (CMC)
Web Technologies: JDBC, HTML5, DHTML, XML, CSS3, Web Services, WSDL
Tools: Erwin r9.6/9.5/9.1/8.x, Rational Rose, ER/Studio, MS Visio, SAP PowerDesigner
Big Data Technologies: Spark, Hive, HDFS, MapReduce, Pig, Kafka.
Databases: SQL Server, MySQL, MS Access, Teradata, Netezza, MongoDB, Cassandra, HBase, SAP HANA; Query Languages/Engines: SQL, Hive, Impala, Pig, Spark SQL, HDFS.
Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio), Tableau, Crystal Reports XI, Business Intelligence, SSRS, Business Objects 5.x/6.x, Cognos 7.0/6.0.
Version Control Tools: SVN, GitHub.
Project Execution Methodologies: Ralph Kimball and Bill Inmon data warehousing methodologies, Rational Unified Process (RUP), Rapid Application Development (RAD), Joint Application Development (JAD).
BI Tools: Tableau, Tableau Server, Tableau Reader, SAP Business Objects, OBIEE, QlikView, SAP Business Intelligence, Amazon Redshift, Azure Data Warehouse
Operating Systems: Windows, Linux, UNIX, macOS, Red Hat.
PROFESSIONAL EXPERIENCE:
Data Scientist
Confidential, Mansfield, MA
Responsibilities:
- Performing data profiling and analysis on the different source systems required for the Customer Master.
- Identifying the customer and account attributes required for MDM implementation from disparate sources and preparing detailed documentation.
- Used T-SQL queries to pull data from disparate systems and the Data Warehouse across different environments.
- Worked closely with the Data Governance Office team in assessing the source systems for project deliverables.
- Prepared DQ analysis reports and scorecards on all validated data elements and presented them to the business teams and stakeholders.
- Used Data Quality validation techniques to validate Critical Data Elements (CDEs) and identified various anomalies.
- Developed clinical NLP methods that ingest large unstructured clinical data sets, separate signal from noise, and provide personalized insights at the patient level that directly improve our analytics platform.
- Used NLP methods for information extraction, topic modeling, parsing, and relationship extraction.
- Worked with the NLTK library for NLP data processing and pattern finding (see the sketch after this list).
- Extensively used open-source tools - RStudio (R) and Spyder (Python) - for statistical analysis and building machine learning models.
- Involved in defining source business rules, target data mappings, and data definitions; performed data validation/data reconciliation between disparate source and target systems (Salesforce, Cisco-UIC, Cognos, Data Warehouse) for various projects.
- Interacted with business teams and Project Managers to clearly articulate anomalies, issues, and findings during data validation.
- Wrote complex SQL queries to validate the data against different kinds of reports generated by Cognos.
- Extracting data from different databases as per the business requirements using SQL Server Management Studio.
- Worked with the Data Governance group to identify, classify, and define each assigned Critical Data Element (CDE) and ensure that each element has a clear and unambiguous definition.
- Analyzed data lineage processes and documentation for the CDEs to identify vulnerable points, control gaps, data quality issues, and overall lack of data governance.
- Proposed data checks and standard operating procedures on the source systems to enhance data quality.
- Reviewed various project management documents, such as the Business Requirements document and the Functional Specification document, and suggested changes to ensure they comply with policies and standards.
- Worked with the Data Governance group in creating a custom data dictionary template to be used across the various business lines.
- Worked with data stewards to ensure awareness of data quality standards and data requirements.
- Linked data lineage to data quality and business glossary work within the overall data governance program.
- Managed communication with data owners/stewards to ensure awareness of policies and standards.
- Gathered requirements by working with business users on the Business Glossary, Data Dictionary, and data requirements.
- Generated weekly and monthly reports for various business users according to business requirements; manipulated/mined data from database tables (Redshift, Oracle, Data Warehouse).
- Created statistical models using distributed and standalone approaches to build various diagnostic, predictive, and prescriptive solutions.
- Interfaced with other technology teams to extract, transform, and load (ETL) data from a wide variety of data sources.
- Provided input and recommendations on technical issues to Business & Data Analysts, BI Engineers, and Data Scientists.
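Illustrative sketch (not project code) of the NLTK processing referenced above; the sample text and chunk grammar are hypothetical, and the NLTK data package names vary slightly across NLTK versions:

```python
# Minimal NLTK sketch: tokenize, tag, and surface frequent noun phrases.
# The sample text and the chunk grammar are hypothetical illustrations.
import nltk

# NLTK tokenizer/tagger data (resource names vary across NLTK versions).
for pkg in ("punkt", "punkt_tab",
            "averaged_perceptron_tagger", "averaged_perceptron_tagger_eng"):
    nltk.download(pkg, quiet=True)

text = "Patient reports mild chest pain; chest pain resolved after rest."
tokens = nltk.word_tokenize(text)   # split raw text into tokens
tagged = nltk.pos_tag(tokens)       # part-of-speech tag per token

# Chunk simple noun phrases (optional adjectives + nouns) to find patterns.
grammar = "NP: {<JJ>*<NN.*>+}"
tree = nltk.RegexpParser(grammar).parse(tagged)

phrases = [
    " ".join(word for word, _ in subtree.leaves())
    for subtree in tree.subtrees(filter=lambda t: t.label() == "NP")
]
print(nltk.FreqDist(phrases).most_common(3))  # most frequent noun phrases
```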
Environment: Data Governance, SQL Server, ETL, MS Office Suite - Excel (Pivot, VLOOKUP), DB2, R, Python, Visio, HP ALM, Agile, Azure, Data Quality, Tableau and Data Management.
Data Scientist
Confidential, EL Segundo, CA
Responsibilities:
- Set up storage and data analysis tools in the Amazon Web Services cloud computing infrastructure.
- Used pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, and NLTK in Python for developing various machine learning algorithms.
- Installed and used the Caffe deep learning framework.
- Worked on different data formats such as JSON and XML and performed machine learning algorithms in Python.
- Worked with Data Architects and IT Architects to understand the movement of data and its storage, using ER Studio 9.7.
- Participated in all phases of data mining: data collection, data cleaning, developing models, validation, and visualization, and performed gap analysis.
- Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Powerball, and Smart View.
- Implemented Agile Methodology for building an internal application.
- Good knowledge of Hadoop architecture and various components such as HDFS, Job Tracker, Task Tracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
- As Architect, delivered various complex OLAP databases/cubes, scorecards, dashboards, and reports.
- Programmed a utility in Python that used multiple packages (SciPy, NumPy, pandas).
- Implemented classification using supervised algorithms like Logistic Regression, Decision Trees, KNN, and Naive Bayes (see the sketch after this list).
- Responsible for the design and development of advanced R/Python programs to prepare, transform, and harmonize data sets in preparation for modeling.
- Updated Python scripts to match data with our database stored in AWS CloudSearch so that we could assign each document a response label for further classification.
- Performed data transformation from various sources, data organization, and feature extraction from raw and stored data.
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
- Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions.
- Researched, evaluated, architected, and deployed new tools, frameworks, and patterns to build sustainable Big Data platforms for the clients.
- Identified and executed process improvements; hands-on with various technologies such as Oracle, Informatica, and Business Objects.
- Designed both 3NF data models for ODS, OLTP systems and dimensional data models using Star and Snowflake Schemas.
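Illustrative sketch (not project code) of the supervised classification work listed above, assuming scikit-learn, with a synthetic dataset standing in for the project data:

```python
# Minimal scikit-learn sketch comparing the supervised classifiers above.
# A synthetic dataset stands in for the actual project data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=5),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "naive_bayes": GaussianNB(),
}
for name, model in models.items():
    model.fit(X_train, y_train)                     # train on the holdout split
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.3f}")
```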
Environment: R 9.0, ODS, OLTP, Big Data, Oracle 10g, Hive, OLAP, DB2, Metadata, Python, MS Excel, Mainframes, MS Visio, Rational Rose.
Data Analyst
Confidential, New York, NY
Responsibilities:
- Performed data analysis and reporting using MySQL, MS PowerPoint, MS Access, and SQL Assistant.
- Involved in MySQL and MS Access database design and designed a new database on Netezza to produce an optimized outcome.
- Involved in writing T-SQL and working on SSIS, SSRS, SSAS, data cleansing, data scrubbing, and data migration.
- Involved in writing scripts for loading data to the target data warehouse using BTEQ, FastLoad, and MultiLoad.
- Created ETL scripts using regular expressions and ETL tools (Informatica, Pentaho, and SyncSort); see the sketch after this list.
- Developed SQL Service Broker processes to flow and sync data from MS-I to Microsoft's Master Data Management (MDM) system.
- Involved in loading data between Netezza tables using the NZSQL utility.
- Worked on data modeling using dimensional data modeling, Star/Snowflake schemas, fact and dimension tables, and physical and logical data modeling.
- Generated Statspack/AWR reports from the Oracle database and analyzed the reports for Oracle wait events, time-consuming SQL queries, tablespace growth, and database growth.
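Illustrative sketch (not project code) of a regex-driven ETL cleanup step like the one mentioned above; the record layout, field names, and patterns are hypothetical placeholders:

```python
# Minimal regex-based ETL sketch: extract, clean, and reshape raw records.
# The input layout, field names, and patterns are hypothetical placeholders.
import csv
import re
from io import StringIO

raw = """ID:001|amt: $1,234.50 |date:2014-03-07
ID:002|amt: $88.00|date:2014-03-08"""

# One pattern per raw line: record id, dollar amount, and ISO date.
pattern = re.compile(
    r"ID:(?P<id>\d+)\|amt:\s*\$(?P<amount>[\d,]+\.\d{2})\s*\|date:(?P<date>\d{4}-\d{2}-\d{2})"
)

out = StringIO()
writer = csv.writer(out)
writer.writerow(["id", "amount", "date"])
for line in raw.splitlines():
    m = pattern.match(line)
    if m:  # skip malformed rows rather than failing the load
        writer.writerow([m["id"], m["amount"].replace(",", ""), m["date"]])

print(out.getvalue())  # clean CSV ready for a warehouse loader
```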
Environment: MySQL, MS PowerPoint, MS Access, Netezza, DB2, T-SQL, DTS, SSIS, SSRS, SSAS, ETL, MDM, Teradata, Oracle, Star Schema, and Snowflake Schema.
Data Scientist
Confidential, Boston, MA
Responsibilities:
- Provided architectural leadership in shaping strategic business technology projects, with an emphasis on application architecture.
- Utilized domain knowledge and application portfolio knowledge to play a key role in defining the future state of large, business technology programs.
- Participated in all phases of data mining: data collection, data cleaning, developing models, validation, and visualization, and performed gap analysis.
- Developed MapReduce/Spark Python modules for machine learning and predictive analytics in Hadoop on AWS; implemented a Python-based distributed random forest via Python streaming.
- Created ecosystem models (e.g. conceptual, logical, physical, canonical) that are required for supporting services within the enterprise data architecture (conceptual data model for defining the major subject areas used, ecosystem logical model for defining standard business meaning for entities and fields, and an ecosystem canonical model for defining the standard messages and formats to be used in data integration services throughout the ecosystem).
- Used pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, and NLTK in Python for developing various machine learning algorithms, and utilized algorithms such as linear regression, multivariate regression, Naive Bayes, Random Forests, K-means, and KNN for data analysis.
- Developed a classification system for finding clickbait in media content using scikit-learn and TensorFlow.
- Spearheaded chatbot development initiative to improve customer interaction with application.
- Developed the chatbot using api.ai.
- Automated CSV to chatbot-friendly JSON transformation by writing NLP scripts, reducing development time by 20%.
- Conducted studies and rapid plots, using advanced data mining and statistical modeling techniques to build a solution that optimizes the quality and performance of data.
- Demonstrated experience in the design and implementation of statistical models, predictive models, enterprise data models, metadata solutions, and data life cycle management in both RDBMS and Big Data environments.
- Analyzed large data sets, applied machine learning techniques, and developed predictive and statistical models, enhancing them by leveraging best-in-class modeling techniques.
- Designed a model to monitor illegal waste dumping activity using Caffe deep learning; implemented TensorRT on the Caffe model to increase memory efficiency.
- Worked on database design, relational integrity constraints, OLAP, OLTP, cubes, normalization (3NF), and de-normalization of the database.
- Worked on customer segmentation using an unsupervised learning technique, clustering (see the sketch after this list).
- Worked with various Teradata 15 tools and utilities like Teradata Viewpoint, MultiLoad, ARC, Teradata Administrator, BTEQ, and other Teradata utilities.
- Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, and Python, along with a broad variety of machine learning methods including classification, regression, and dimensionality reduction.
- Developed LINUX Shell scripts by using NZSQL/NZLOAD utilities to load data from flat files to Netezza database.
- Designed and implemented system architecture for Amazon EC2 based cloud-hosted solution for the client.
- Tested Complex ETL Mappings and Sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables to target tables.
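Illustrative sketch (not project code) of clustering-based customer segmentation as described above, assuming scikit-learn K-means over synthetic RFM-style features:

```python
# Minimal K-means customer segmentation sketch.
# Synthetic recency/frequency/monetary-style features stand in for real data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Hypothetical features per customer: recency (days), frequency, spend ($).
customers = np.column_stack([
    rng.integers(1, 365, 500),
    rng.integers(1, 50, 500),
    rng.gamma(2.0, 100.0, 500),
])

X = StandardScaler().fit_transform(customers)   # put features on one scale
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42).fit(X)

# Segment profiles: size and mean raw feature values per cluster.
for k in range(4):
    seg = customers[kmeans.labels_ == k]
    print(f"segment {k}: n={len(seg)}, means={seg.mean(axis=0).round(1)}")
```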
Environment: Erwin r9.6, Python, SQL, Oracle 12c, Netezza, SQL Server, SSRS, PL/SQL, T-SQL, Tableau, MLlib, regression, cluster analysis, Scala, NLP, Spark, Kafka, MongoDB, logistic regression, Hadoop, PySpark, Teradata, random forest, OLAP, Deep Learning (Keras, TensorFlow), Azure, MariaDB, SAP CRM, HDFS, ODS, NLTK, SVM, JSON, XML, Cassandra, MapReduce, AWS.
Data Analyst
Confidential
Responsibilities:
- Created new reports based on requirements.
- Responsible for generating weekly ad-hoc reports.
- Planned, coordinated, and monitored project levels of performance and activities to ensure project completion on time.
- Automated and scheduled recurring reporting processes using UNIX shell scripting and Teradata utilities such as MLOAD, BTEQ, and FastLoad.
- Experience with Perl .
- Worked in a Scrum Agile process, writing stories with two-week iterations and delivering a product each iteration.
- Worked on transferring data files to the vendor through the SFTP/FTP process.
- Involved in defining and constructing customer-to-customer relationships based on association with an account and customer.
- Created action filters, parameters and calculated sets for preparing dashboards and worksheets in Tableau.
- Experience performing Tableau administration using Tableau admin commands.
- Worked with architects, assisting in the development of current- and target-state enterprise-level data architectures.
- Worked with project team representatives to ensure that logical and physical data models were developed in line with corporate standards and guidelines.
- Involved in defining the source to target data mappings, business rules, and data definitions. Responsible for defining the key identifiers for each mapping/interface.
- Performed data analysis and data profiling using complex SQL on various source systems, including Oracle and Teradata (see the sketch after this list).
- Migrated three critical reporting systems to Business Objects and Web Intelligence on a Teradata platform
- Created Excel charts and pivot tables for ad-hoc data pulls.
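Illustrative sketch (not project code) of SQL-based data profiling like that described above, assuming pandas and SQLAlchemy; the connection string, table, and column names are hypothetical:

```python
# Minimal data-profiling sketch: row count, null rates, and distinct counts.
# Connection string, table, and column names are hypothetical placeholders;
# assumes SQLAlchemy 2.x with the python-oracledb driver installed.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("oracle+oracledb://user:pass@host:1521/?service_name=ORCL")

profile_sql = """
SELECT COUNT(*)                    AS row_count,
       COUNT(customer_id)          AS customer_id_non_null,
       COUNT(DISTINCT customer_id) AS customer_id_distinct,
       COUNT(email)                AS email_non_null
FROM   customers
"""

stats = pd.read_sql(profile_sql, engine)
stats.columns = stats.columns.str.lower()  # normalize column-name case
n = int(stats.at[0, "row_count"])

print(f"rows: {n}")
print(f"customer_id null rate: {1 - stats.at[0, 'customer_id_non_null'] / n:.2%}")
print(f"customer_id distinct values: {int(stats.at[0, 'customer_id_distinct'])}")
print(f"email null rate: {1 - stats.at[0, 'email_non_null'] / n:.2%}")
```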
Environment: MS Office Suite, MS Visio, MS SharePoint, Test Management Tool, MS Project, Crystal Reports, HTML.
