Data Scientist Consultant Resume

St Louis, MO

SUMMARY:

  • 7+ years of experience in data analysis, data mining, and building machine learning and statistical models with large data sets of structured, semi-structured, and unstructured data.
  • Extensively involved in data cleaning, data manipulation, feature engineering, modeling, evaluation, optimization, testing, and deployment using Python, R, and SQL.
  • Experience in building Machine Learning models (Decision Trees, SVM, KNN, Logistic Regression, Naïve Bayes, LDA, Linear Regression, XGBoost, Random Forest, PCA, K-means, DBSCAN, Hierarchical Clustering) and Deep Learning models (Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), LSTM) using TensorFlow, Keras, and PyTorch.
  • Experience working with data visualization tools: Tableau, Matplotlib, Seaborn, Bokeh, ggplot.
  • Experience working with Hadoop Big Data tools (Hadoop, MapReduce, Pig, Hive, Spark) on cloud platforms (AWS, Azure).
  • Worked on Azure ML Studio, building machine learning pipelines and creating web services for predictive analytics.

Additional Development Experience:

  • Experienced in the latest BI tools such as Tableau, Power BI, Qlik Sense, and QlikView.
  • Well versed in linear and non-linear regression and classification algorithms; built predictive analytics models to generate actionable insights.
  • Experience in developing statistical, machine learning, text analytics, and data mining solutions to various business problems and generating data visualizations using R and Tableau.
  • Created dashboards as part of Data Visualization using Tableau and Power BI.
  • Performed preliminary data analysis using descriptive statistics and handled anomalies by removing duplicates and imputing missing values using Talend.
  • Excellent knowledge of and experience in OLTP/OLAP system study with a focus on Oracle Hyperion Suite technology; developing database schemas such as Star and Snowflake schemas (Fact Tables, Dimension Tables) used in relational, dimensional, and multidimensional modeling; and physical and logical data modeling using the Erwin tool.
  • Expertise in building supervised and unsupervised machine learning experiments in Microsoft Azure, utilizing multiple algorithms to perform detailed predictive analytics and building web service models for all types of data: continuous, nominal, and ordinal.
  • Validated consolidated data and developed the model that best fits the data. Interpreted data from multiple sources, consolidated it, and performed data cleansing using RStudio.
  • Good understanding of relational databases, with application development experience using several RDBMSs such as Oracle, MS SQL Server, MySQL, DB2, and Informix.
  • Worked with NoSQL databases including HBase, Cassandra, and MongoDB.
  • Extensively worked with statistical analysis tools and adept at writing code in Advanced Excel, R, MATLAB, and Python.
  • Experience with the Git version control system and source code management client tools such as Git Bash, GitHub, and Git GUI, as well as other command-line applications.
  • Expertise in using Linear and Logistic Regression, Classification Modeling, Decision Trees, Principal Component Analysis (PCA), and Cluster and Segmentation analyses; authored and co-authored several scholarly articles applying these techniques.
  • Strong experience in the Software Development Life Cycle (SDLC), including Requirements Analysis, Design Specification, and Testing, in both Waterfall and Agile methodologies.
  • Skilled in bug fixing, debugging and problem solving.
  • Excellent communication, interpersonal, and analytical skills, with a strong ability to perform both in a team and individually.
  • Built Convolutional Neural Networks (CNN) for image classification, object detection, and facial recognition using YOLO and Keras.
  • Experience in building Natural Language Processing applications using RNN and LSTM for trigger word detection and name sequence generation.
  • Worked with interactive data visualization tools (Bokeh, Seaborn, Matplotlib, Tableau) for exploratory data analysis.

TECHNICAL SKILLS:

Languages: C, C++, Java, Python, R, SQL, PL/SQL

Databases: SQL Server, MS Access, Oracle 11g/10g/9i, Teradata, Big Data, Hadoop, HBase, Netezza, MongoDB, Cassandra

BI/ETL Tools: Business Objects 6.1/XI, MS SQL Server 2008/2005 Analysis Services (MS OLAP, SSAS), Integration Services (SSIS), Reporting Services (SSRS), PerformancePoint Server (PPS), Oracle 9i OLAP, MS Office Web Components (OWC11), DTS, MDX, Crystal Reports 10, Crystal Enterprise 10 (CMC)

Big Data Technologies: Hadoop, HDFS, MapReduce, Hive, PIG

Machine Learning: Linear Regression, Logistic Regression, Decision Trees, SVM, Naïve Bayes, LDA, KNN, XGBoost, Random Forest, PCA, K-means, DBSCAN, Hierarchical Clustering

ML Libraries: NumPy, scikit-learn, Pandas, Matplotlib, Seaborn, Bokeh, Beautiful Soup, TensorFlow, Keras, PyTorch

Deep Learning: Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), LSTM, Artificial Neural Network (ANN), Deep Neural Network (DNN)

Operating Systems: UNIX, Windows XP Home/Professional, Windows 7/10, Mac OS X

Reporting Tools: MS Office (Word/Excel/PowerPoint), Tableau, Business Intelligence, Business Objects 5.x/6.x, Cognos 7.0/6.0

Version Control Tools: SVN, GitHub

PROFESSIONAL EXPERIENCE:

Confidential, St. Louis, MO

Data Scientist Consultant

Responsibilities:

  • Worked closely with customers to understand their current technical environment, key business drivers, and future technology requirements.
  • Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization; also performed gap analysis.
  • Developed MapReduce/Spark Python modules for machine learning and predictive analytics in Hadoop on AWS.
  • Developed project proposals and Statements of Work based on the gathered requirements and the proposed solution.
  • Used MLlib, Spark's machine learning library, to build and evaluate different models.
  • Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
  • Worked according to the software development life cycle.
  • Used pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, and NLTK in Python for developing various machine learning algorithms.
  • Analyzed large data sets, applied machine learning techniques, and developed and enhanced predictive and statistical models by leveraging best-in-class modeling techniques.
  • Worked with various databases such as Oracle and SQL, and performed computations, log transformations, feature engineering, and data exploration to identify insights and conclusions from complex data using R programming in RStudio.
  • Worked on database design, relational integrity constraints, OLAP, OLTP, Cubes, and Normalization (3NF) and De-normalization of the database.
  • Worked with various Teradata 15 tools and utilities such as Teradata Viewpoint, MultiLoad, ARC, Teradata Administrator, BTEQ, and other Teradata utilities.
  • Performed in-depth statistical analysis and data mining methods using R, including Cluster Analysis, Logistic Regression and Boosting models.
  • Extensively used Azure Machine Learning to set up experiments and create web services for predictive analytics.
  • Created Report-Models for ad-hoc reporting and analysis.
  • Created logical/physical data models in Erwin and worked on loading tables into the data warehouse.
  • Worked extensively with the Erwin Model Mart for version control.
  • Performed feature scaling, feature engineering and statistical modeling.
  • Wrote complex SQL queries for data analysis using window functions and joins, and improved performance by creating partitioned tables.
  • Used simple methods such as PowerPoint presentations while conducting walkthroughs with stakeholders.
  • Extensively designed data mapping and filtering, consolidation, cleansing, integration, ETL, and customization of data marts.

Environment: Python, R, SQL, Oracle 12c, Netezza, SQL Server, SSRS, PL/SQL, T-SQL, Tableau, NoSQL, Erwin r9.6, Linux, NumPy, SciPy, scikit-learn, MS Office 2007, MS Visio 2003, Windows XP, AFS, MS Excel.

Confidential, OH

Data Science Engineer

Responsibilities:

  • Performed data collection, data cleaning, and data visualization, and developed machine learning algorithms using NumPy, Pandas, and Matplotlib.
  • Developed and updated SQL queries, stored procedures, clustered and non-clustered indexes, and functions that meet business requirements using SQL Server 2017.
  • Used Principal Component Analysis and Factor Analysis in feature engineering to analyze high-dimensional data in Python.
  • Visualized and interpreted data, reported findings, and developed strategic uses of data using Python libraries such as NumPy, Pandas, SciPy, and scikit-learn.
  • Utilized Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS and databases such as HBase, using Python/PySpark and Scala.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries.
  • Worked extensively with the Erwin Data Modeler tool to design data models.
  • Created SQL tables with referential integrity and developed queries using SQL, SQL*Plus, and PL/SQL.
  • Designed and developed Use Case Diagrams, Activity Diagrams, Sequence Diagrams, and Object-Oriented Design (OOD) using UML and Visio.
  • Created customized reports in Tableau for data visualization.
  • Designed SSIS packages to ETL existing data into SQL Server, using Pivot Transformation, Fuzzy Lookup, Derived Column, Conditional Split, Aggregation, Data Flow Task, and Execute Package Task.
  • Performed data analysis by using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from the Oracle database, and used ETL for data transformation.
  • Worked on database design, relational integrity constraints, OLAP, OLTP and Normalization (3NF) and De-normalization of database.
  • Performed in-depth statistical analysis and data mining methods using R, including Cluster Analysis, Logistic Regression, and Boosting models.
  • Tested complex ETL mappings and sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables to target tables.
  • Used R and SQL to manipulate data, and develop and validate quantitative models.
  • Responsible for the design and development of advanced R/Python programs to prepare, transform, and harmonize data sets for modeling.
  • Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI, and Smart View.
  • Involved in code migration, testing, and making sure that enhancements are deployed in production.
  • Created packages, procedures, functions, triggers, and other database objects using SQL and PL/SQL.
  • Analyzed large data sets, applied machine learning techniques, and developed and enhanced predictive and statistical models by leveraging best-in-class modeling techniques.

Environment: Python 3.x, Cloudera, Hadoop, Apache Spark, Hive, NumPy, NLTK, Pandas, SciPy, MapReduce, Tableau, Sqoop, HBase, Oozie, HDFS, PySpark, NoSQL, DynamoDB, MongoDB, Teradata, SQL Server, AWS Redshift.

Confidential, Chicago, IL

Data Scientist/Machine Learning Engineer

Responsibilities:

  • Built an in-depth understanding of the problem domain and available data assets.
  • Researched, designed, implemented, and evaluated machine learning approaches and models.
  • Followed RUP-based methods using Rational Rose to create Use Cases, Activity Diagrams / State Chart Diagrams, and Sequence Diagrams.
  • Worked on the client-server model, gathered the requirements, and documented them accordingly.
  • Involved in executing test cases to validate data from source to target, evaluating test results, and preparing test summary reports.
  • Automated and scheduled recurring reporting processes using UNIX shell scripting and Teradata utilities such as MLOAD, BTEQ, and FastLoad.
  • Performed data ETL by collecting, exporting, merging, and massaging data from multiple sources and platforms, including SSRS/SSIS (SQL Server Integration Services) in SQL Server.
  • Worked with several R packages including ggplot2, dplyr, and knitr.
  • Created SSIS packages using Pivot Transformation, Execute SQL Task, Data Flow Task, etc. to import data into the data warehouse.
  • Wrote Python scripts to establish continuous workflows with the different teams providing data.
  • Wrote unit and integration tests in Python to test the code.
  • Implemented LDAP authentication to authenticate and authorize customers using Python REST services.
  • Performed administrative tasks, including creation of database objects such as database, tables, and views, using SQL DCL, DDL, and DML requests.
  • Designed, modeled, validated, and tested statistical algorithms against various data sets, including behavioral data, and deployed predictive models using RStudio.
  • Set up storage and data analysis tools in Amazon Web Services cloud computing infrastructure.
  • Implemented end-to-end systems for data analytics and data automation, integrated with custom visualization tools, using R, Mahout, Hadoop, and MongoDB.
  • Designed different types of star schemas for detailed data marts and plan data marts in the OLAP environment.
  • Created SQL tables with referential integrity and developed queries using SQL, SQL*Plus, and PL/SQL.
  • Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions.
  • Used Data Quality Validation techniques to validate Critical Data Elements (CDE) and identified various anomalies.
  • Identified and executed process improvements, working hands-on with various technologies such as Oracle and Business Objects.
  • Implemented classification using supervised algorithms such as Logistic Regression, Decision Trees, KNN, and Naïve Bayes.
  • Involved in data analysis, primarily identifying data sets, source data, source metadata, data definitions, and data formats.

Environment: Python, R, OLTP, OLAP, DB2, Metadata, SQL, PL/SQL, MS Excel, Mainframes, MS Visio, MapReduce, Rational Rose, and MongoDB.

Confidential, Minneapolis, MN

Database Developer

Responsibilities:

  • Interacted extensively with Business Analysts and the Technical Architect to understand business requirements and prepare technical specifications.
  • Participated in daily meetings with the Client Manager, Technical Architect, and other team members/offshore team to discuss daily task status and other enhancements/issues.
  • Created materialized views, tables, complex SQL queries, correlated subqueries, views, and function-based indexes for effective business application development.
  • Used Explain Plan and DBMS Profiler for tuning the PL/SQL blocks.
  • Created bitmap indexes on low-cardinality columns, as they result in reduced query response time and a substantial reduction in storage space.
  • Loaded data from external CSV files into Oracle database tables using SQL Developer.
  • Used advanced features such as Cursors, Ref Cursors, Bulk Collect, Global Temporary Tables (GTT), Collections, and Dynamic SQL in the procedures.
  • Used the advanced analytical functions RANK, DENSE_RANK, LAG, and LEAD while writing SQL queries.
  • Created user-defined and system-defined exceptions to handle several types of errors such as NO_DATA_FOUND and TOO_MANY_ROWS, and logged the error details.
  • Responsible for creating stored procedures, functions, views, and triggers using MS SQL Server 2008/2012 to retrieve data from the Invest AI application source database.
  • Developed ETL jobs using DataStage by extracting data from different source systems/files and loading into Oracle database.
  • Prepared the technical design document and participated in the source-to-target mapping design document.
  • Developed DataStage parallel jobs using different processing stages such as Transformer, Aggregator, Funnel, Lookup, Filter, Join, Sort, Copy, and Merge.
  • Involved in creating/modifying UNIX shell scripts.
  • Created process flow diagrams using MS Visio.
  • Prepared the Unit Test Plan (UTP), Unit Test Plan Execution (UTPE), and test data for unit testing.
  • Managed the offshore team to complete tasks within the scheduled time while maintaining the quality of deliverables.
  • Involved in unit testing and code review prior to System Test release.
  • Prepared documents to support the QA team in System Test execution.
  • Involved in Deploying code to UAT and PROD using Rational ClearCase tool.

Environment: Oracle 11g, SQL Server, Autosys, PuTTY, SQL Developer, TOAD, Windows XP/7/8, MS Excel, MS Word, UNIX.

Confidential

SQL Server BI Developer

Responsibilities:

  • Involved in Normalization and De-Normalization of existing tables for faster query retrieval.
  • Created databases, tables, clustered/non-clustered indexes, unique/check constraints, and views.
  • Monitored performance and optimized SQL queries for maximum efficiency.
  • Responsible for creating and modifying T-SQL stored procedures and triggers for validating the integrity of the data.
  • Created stored procedures to apply banking rules.
  • Created indexed views and appropriate indexes to reduce the running time of complex queries.
  • Created several packages in SSIS.
  • Used SQL Server Agent for scheduling jobs and alerts.
  • Tuned SQL queries using SQL Profiler.
  • Wrote database triggers in T-SQL to check the referential integrity of the database.
  • Managed the use of disk space, memory, and connections.
  • Prepared project documentation.
  • Worked with application developers on data modeling.
  • Created materialized views and normal views as per requirement.
  • Imported data from PeopleSoft into the MS SQL Server database.
  • Developed with a focus on SQL, T-SQL, stored procedures, and triggers, automating and improving repetitive data tasks to make them more efficient.
  • Reviewed and fixed issues related to pre-existing SQL queries, Ensured all database servers and Backups.

Environment: MS SQL Server 2008/2008 R2, OLAP, ROLAP, Visual Studio 2008/2010, Windows 2007, T-SQL, XML, SSRS, SSI
