Sr. Data Scientist/Architect Resume

Minneapolis, MN

SUMMARY:

  • Over nine years of experience in Machine Learning, Statistical Modeling, Predictive Modeling, Data Analytics, Data Modeling, Data Architecture, Data Analysis, Data Mining, Text Mining & Natural Language Processing (NLP), Artificial Intelligence algorithms, and Business Intelligence, building analytics models (such as Decision Trees and Linear & Logistic Regression) with Hadoop (Hive, Pig), R, Python, Spark, Scala, MS Excel, SQL, PostgreSQL, and Erwin.
  • Excellent working knowledge of Big Data Hadoop (Hortonworks) and HDFS architecture, R, Python, Jupyter, Pandas, NumPy, scikit-learn, Matplotlib, PyHive, Keras, Hive, NoSQL (HBase), Sqoop, Pig, MapReduce, Oozie, and Spark MLlib.
  • Hands-on experience in Linear and Logistic Regression, K-Means cluster analysis, Decision Trees, KNN, SVM, Random Forest, Market Basket analysis, NLTK/Naïve Bayes, Sentiment Analysis, Text Mining/Text Analytics, and Time Series Forecasting (a minimal Python modeling sketch follows this summary).
  • Proficient in Hadoop, HDFS, Hive, MapReduce, Pig, and NoSQL databases such as MongoDB, HBase, and Cassandra; expert in applying data mining and optimization techniques in B2B and B2C industries, and proficient in Machine Learning, Data/Text Mining, Statistical Analysis & Predictive Modeling.
  • Hands-on experience in R programming for data extraction, preprocessing, analysis, management, sorting, merging, and summarization using lists, vectors, arrays, matrices, functions, data frames, tables, and ggplot, and for model building using lm, glm, ts, and k-means, validated with ANOVA, plots, lift charts, gain charts, and ROC curves.
  • Experienced in Data Modeling & Data Analysis using Dimensional and Relational Data Modeling, Star Schema/Snowflake Modeling, FACT & Dimension tables, and Physical & Logical Data Modeling.
  • Expert skills in SAS DATA step, PROC step, SQL, ETL, data mining, SAS Macro, SAS/ACCESS, SAS/STAT, SAS/GRAPH, SAS DI Studio, SAS BI Platform, SAS Web Report Studio, SAS BI Dashboard, SAS Stored Processes, SAS Management Console, Enterprise Guide, Enterprise Miner, SAS VA, and ODS, with procedures and analyses such as PROC SQL, import, export, means, summary, freq, tabulate, report, univariate, append, print, sort, transpose, format, glm, corr, factor, t-test, Chi-Square, ANOVA, ARIMA, ARMA, rank, reg, logistic, and boxplot.
  • Extensive experience in Text Analytics, developing Statistical Machine Learning and Data Mining solutions to various business problems, generating data visualizations using R and Python, and creating dashboards using tools such as Tableau.
  • Hands-on expertise in SQL, PL/SQL, ETL, Ab Initio, Oracle, Teradata, functional analysis, data mapping, data profiling, data cleansing, sorting, merging, data joins, and data transformation, building tables and generating reports.
  • Experienced in writing Spark Streaming and Spark batch jobs using Spark MLlib for analytics.
  • Experience in logical and physical modeling and dimensional modeling using Erwin in the DWH; performed data cleaning, data mapping, and data transformation using metadata, Type 1/Type 2 slowly changing dimensions, SQL queries, and joins/filters in the transformations, loading tables into data marts and generating customized reports in the DWH.
  • Excellent experience in creating cloud-based solutions and architectures using Amazon Web Services and Microsoft Azure.
  • Experienced in Data Scrubbing/Cleansing, Data Quality, Data Mapping, Data Profiling, and Data Validation in ETL.
  • Hands-on experience with clustering algorithms such as K-means and K-medoids and with predictive algorithms; expertise in Model Development, Data Mining, Predictive Modeling, Data Visualization, Data Cleaning and Management, and Database Management.
  • Excellent knowledge of Ralph Kimball's and Bill Inmon's approaches to Data Warehousing.
  • Extensive experience in development of T-SQL, DTS, OLAP, PL/SQL, Stored Procedures, Triggers, Functions, Packages, performance tuning and optimization for business logic implementation.
  • Extensively worked with Teradata utilities BTEQ, Fast Export and Multi Load to export and load data to/from different source systems including flat files.
  • Excellent experience with Teradata SQL queries, Teradata indexes, and utilities such as MLoad, TPump, FastLoad, and FastExport.
  • Strong experience and knowledge in data visualization with Tableau, creating line and scatter plots, bar charts, histograms, pie charts, dot charts, box plots, time series, error bars, multiple chart types, multiple axes, subplots, etc.
  • Experienced in SAS/BASE, SAS/STAT, SAS/SQL, SAS/MACROS, SAS/GRAPH, SAS/ACCESS, SAS/ODS, SAS/QC, and SAS/ETS in Mainframe, Windows, and UNIX environments.
  • Experience in UNIX shell scripting, Perl scripting and automation of ETL Processes.
  • Extensively used ETL to load data with PowerCenter/PowerExchange from source systems such as flat files and Excel files into staging tables and then into the Confidential Oracle database; analyzed the existing systems and performed a feasibility study.
  • Experienced in developing Entity-Relationship diagrams and modeling Transactional Databases and Data Warehouse using tools like ERWIN, ER/Studio and Power Designer.
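
The following is a minimal sketch, in Python with scikit-learn, of the supervised modeling and ROC-based validation workflow referenced above. The synthetic dataset, feature names, and parameter choices are illustrative assumptions, not details of any engagement on this resume.

    # Minimal sketch: fit a logistic regression classifier and validate it with ROC/AUC.
    # The synthetic data and all parameters are illustrative placeholders.
    import pandas as pd
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score, roc_curve

    # Generate a small synthetic binary-classification dataset.
    X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, random_state=42)
    X = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(10)])

    # Hold out a test set for honest validation.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

    # Fit the model and score the held-out data.
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    probs = model.predict_proba(X_test)[:, 1]

    # Discrimination summary: AUC plus the points that would feed an ROC plot.
    fpr, tpr, _ = roc_curve(y_test, probs)
    print("Test AUC:", round(roc_auc_score(y_test, probs), 3))

The same hold-out-and-score pattern applies whether the estimator is logistic regression, KNN, SVM, or a random forest.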

TECHNICAL SKILLS:

Data Analytics Tools/Programming: Python (NumPy, SciPy, pandas, Gensim, Keras), R (caret, Weka, ggplot), MATLAB, Microsoft SQL Server, Oracle PL/SQL, SQL, T-SQL, UNIX shell scripting, Java, SAS.

Big Data Techs: Hadoop, Hive, HDFS, MapReduce, Pig, Kafka, HBase, Cassandra, MongoDB.

Analysis and Modeling Tools: Erwin, Sybase PowerDesigner, Oracle Designer, BPwin, Rational Rose, ER/Studio, TOAD, MS Visio

ETL Tools: Informatica PowerCenter, DataStage 7.5, Ab Initio, Talend

OLAP Tools: MS SQL Analysis Manager, DB2 OLAP, Cognos PowerPlay

Languages: SQL, PL/SQL, T-SQL, XML, HTML, UNIX Shell Scripting, C, C++, AWK

Databases: Oracle 12c/11g/10g/9i/8i/8.0/7.x, Teradata 14.0, DB2 UDB 8.1, MS SQL Server 2012/2008/2005, Netezza, Sybase ASE 12.5.3/15, Informix 9, HBase, MongoDB, Cassandra, Amazon Redshift.

Operating Systems: Windows 2007/8, UNIX (Sun Solaris, HP-UX), Windows NT/XP/Vista, MS-DOS

Project Execution Methodologies: Ralph Kimball and Bill Inmon data warehousing methodologies, Rational Unified Process (RUP), Rapid Application Development (RAD), Joint Application Development (JAD)

Reporting Tools: Business Objects XI R2/6.5/5.0/5.1, Cognos Impromptu 7.0/6.0/5.0, Informatica Analytics Delivery Platform, MicroStrategy, Tableau.

Tools: MS Office suite (Word, Excel, MS Project, and Outlook), VSS

Others: Spark MLlib, Scala NLP, MariaDB, Microsoft Azure, SAS, IDEs, AWS

PROFESSIONAL EXPERIENCE:

Confidential, Minneapolis MN

Sr. Data Scientist/Architect

Responsibilities:

  • Developed MapReduce/Spark Python modules for machine learning & predictive analytics in Hadoop on AWS. Implemented a Python-based distributed random forest via Python streaming.
  • Performed exploratory analysis, hypothesis testing, cluster analysis, correlation, ANOVA, and ROC curve evaluation, and built models using supervised and unsupervised machine learning algorithms, text analytics, and time series forecasting.
  • Prepared the model data and built machine learning models using Python libraries such as pandas, scikit-learn, NumPy, and Keras in Anaconda Jupyter, programming Linear and Logistic Regression, KNN, K-Means Clustering, Sentiment/Text Analytics, NLP, Naïve Bayes, and Time Series forecasting using lm, glm, ARIMA, Apriori, and Forecast (a time series sketch follows this list).
  • Extracted data from the Big Data Hadoop data lake and Excel; analyzed, cleaned, sorted, merged, and reported on data, and created dashboards using Base SAS, SAS Macros, SQL, Hive, SAS VA, and Excel.
  • Used pandas, NumPy, seaborn, SciPy, matplotlib, scikit-learn, and NLTK in Python to develop various machine learning models, including linear regression, multivariate regression, Naïve Bayes, Random Forests, K-means, and KNN, for data analysis.
  • Currently building clustering and predictive models with Spark MLlib to predict fault code occurrences (a Spark MLlib clustering sketch follows this list).
  • Conducted studies and rapid plotting, and used advanced data mining and statistical modeling techniques to build solutions that optimize the quality and performance of data.
  • Demonstrated experience in the design and implementation of statistical models, predictive models, enterprise data models, metadata solutions, and data life cycle management in both RDBMS and Big Data environments.
  • Stored and retrieved data from data warehouses using Amazon Redshift, and designed and implemented the system architecture for an Amazon EC2-based cloud-hosted solution for the client.
  • Developed simple to complex MapReduce jobs using Hive and Pig, and developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Analyzed large data sets, applied machine learning techniques, and developed predictive and statistical models, enhancing them by leveraging best-in-class modeling techniques.
  • Worked with various Teradata 15 tools and utilities such as Teradata Viewpoint, MultiLoad, ARC, Teradata Administrator, BTEQ, and other Teradata utilities.
  • Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, Caffe, TensorFlow, MLlib, and Python with a broad variety of machine learning methods, including classification, regression, and dimensionality reduction.
  • Developed Linux shell scripts using the NZSQL/NZLOAD utilities to load data from flat files into the Netezza database.
  • Worked on information extraction from different kinds of text documents using NLP, text mining, and regular expressions.
  • Worked extensively in Tableau Desktop, applying filters and drill-downs and generating data visualizations and interactive dashboards that interact with views of the data; worked on options to query, display, analyze, sort, group, drill down, organize, summarize, generate charts, monitor and measure goals, and identify patterns.
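
As a minimal illustration of the time series forecasting mentioned above, here is a sketch in Python using statsmodels; the synthetic monthly series and the ARIMA(1, 1, 1) order are assumptions for illustration only.

    # Minimal sketch: fit an ARIMA model to a monthly series and forecast 12 periods ahead.
    # The synthetic series and the (p, d, q) order are illustrative placeholders.
    import numpy as np
    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA

    # Synthetic monthly demand series with a mild trend and noise.
    idx = pd.date_range("2015-01-01", periods=60, freq="MS")
    rng = np.random.default_rng(42)
    series = pd.Series(100 + 1.5 * np.arange(60) + rng.normal(0, 5, 60), index=idx)

    # Fit ARIMA(1, 1, 1) and produce a 12-month-ahead forecast.
    fitted = ARIMA(series, order=(1, 1, 1)).fit()
    print(fitted.forecast(steps=12))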
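
The fault-code clustering above is described as Spark MLlib work; the sketch below shows the general shape of such a job using PySpark's DataFrame-based MLlib API. The input path, column names, and k=5 are assumptions, not project details.

    # Minimal sketch: cluster fault-related feature vectors with Spark MLlib KMeans.
    # The input path, columns, and k are hypothetical placeholders.
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.clustering import KMeans

    spark = SparkSession.builder.appName("fault-code-clustering").getOrCreate()

    # Load engineered features (path and schema are assumptions for this sketch).
    df = spark.read.parquet("hdfs:///data/fault_features.parquet")

    # Assemble numeric columns into the single vector column MLlib expects.
    assembler = VectorAssembler(inputCols=["temp", "vibration", "runtime_hours"],
                                outputCol="features")
    features = assembler.transform(df)

    # Fit KMeans and attach a cluster label to each record.
    model = KMeans(k=5, seed=42, featuresCol="features", predictionCol="cluster").fit(features)
    model.transform(features).groupBy("cluster").count().show()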

Environment: Python, SQL, Oracle 12c, Netezza, SQL Server, Informatica, Java, SSRS, PL/SQL, T-SQL, Tableau, MLlib, regression, Scala NLP, Spark, Kafka, MongoDB, Workday, logistic regression, Hadoop, Hive, TensorFlow, Teradata, IDE, random forest, OLAP, Azure, MariaDB, SAP CRM, HDFS, ODS, NLTK, SVM, JSON, XML, AWS Redshift, Pandas, Cassandra, MapReduce, AWS, Caffe.

Confidential, Mentor OH

Sr. Data Scientist/ Architect

Responsibilities:

  • Implemented end-to-end systems for data analytics and data automation, integrated with custom visualization tools, using R, Hadoop, MongoDB, and Cassandra.
  • Built predictive models using tools such as SAS and R with very granular data stored in the big data platform.
  • Worked on different data formats such as JSON and XML, applied machine learning algorithms in R, used Spark MLlib for test data analytics, and analyzed performance to identify bottlenecks.
  • Worked with machine learning algorithms such as Decision Trees, Random Forest, Gradient Boosting, Support Vector Machines, K-Means Clustering, Naïve Bayes, Bayesian Belief Networks, and Artificial Neural Networks.
  • Developed predictive and machine learning models (supervised and unsupervised) using R for Machine Motor.
  • Created various B2B predictive and descriptive analytics using R and Tableau, and performed data cleaning and data preparation tasks in R to convert data into meaningful data sets.
  • Used R to verify the results of Mahout on small data sets, and developed missing but important ML algorithm features for Mahout.
  • Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods, including classification, regression, and dimensionality reduction.
  • Created partitioned and bucketed tables in Hive; involved in creating Hive internal and external tables, loading them with data, and writing Hive queries involving multiple join scenarios.
  • Performed K-means clustering, Multivariate analysis and Support Vector Machines in R.
  • Created analytical models using algorithms such as regression, decision trees, clustering, and text mining, leveraging tools such as R and Tableau to deliver actionable insights and recommendations.
  • Developed multiple Spark jobs using Scala for data cleaning and preprocessing
  • Designed the schema, configured and deployed AWS Redshift for optimal storage and fast retrieval of data.
  • Used external loaders such as MultiLoad, TPump, and FastLoad to load data into the Teradata 14.1 database.
  • Involved in Troubleshooting and quality control of data transformations and loading during migration from Oracle systems into Netezza EDW.
  • Used S3 buckets to store the JARs and input datasets, and used DynamoDB to store the processed output from the input data set.
  • Worked on classification and scripting of multiple attribute models by applying text mining, NLP, SVM, and regular expressions to product features such as title and description, predicting product attribute values using Python/R (a text-classification sketch follows this list).
  • Used Spark MLlib for test data analytics, analyzed performance to identify bottlenecks, and used supervised learning techniques such as classifiers and neural networks to identify patterns in these data sets.
  • Developed Tableau visualizations and dashboards using Tableau Desktop, and built Tableau workbooks from multiple data sources using data blending.
  • Developed a new data warehousing system based on Spark 2.x and Spark Streaming, utilizing Scala and Java 8.
  • Strong knowledge of data modeling concepts: Star Schema/Snowflake modeling, FACT & Dimension tables, and logical & physical data modeling.
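
The product-attribute prediction described above combined text mining, NLP, SVM, and regular expressions; the following is a minimal sketch of one plausible piece, a TF-IDF plus linear SVM classifier in Python. The toy product texts and the color attribute are invented for illustration.

    # Minimal sketch: predict a product attribute from title/description text
    # with a TF-IDF + linear SVM pipeline. The toy data and labels are made up.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import LinearSVC
    from sklearn.pipeline import make_pipeline

    # Hypothetical training examples: product text -> attribute value (color).
    texts = [
        "red cotton t-shirt crew neck",
        "blue denim jeans slim fit",
        "red leather handbag with zipper",
        "blue ceramic coffee mug 12 oz",
    ]
    labels = ["red", "blue", "red", "blue"]

    # Word unigram/bigram TF-IDF features feeding a linear SVM classifier.
    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
    clf.fit(texts, labels)
    print(clf.predict(["red wool scarf"]))  # expected: ['red']

In practice, regular-expression features (for example, sizes or units extracted from the description) would be added alongside the TF-IDF terms.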

Environment: R 3.x, Erwin 9.5.2, MDM, QlikView, MLlib, PL/SQL, Tableau, Teradata 14.1, JSON, Hadoop (HDFS), MapReduce, SQL Server, Scala NLP, SSMS, ERP, CRM, Netezza, Pandas, SAS, SPSS, Java, IDE, Cassandra, SQL, AWS, SSRS, Informatica, Pig, Spark, Azure, R Studio, MongoDB, Mahout, Hive, AWS Redshift.

Confidential, Stamford CT

Sr. Data Modeler/Data Architect

Responsibilities:

  • Designed databases, data models, ETL processes, data warehouse applications, and business intelligence (BI) reports using best practices and tools including Erwin, SQL, SSIS, SSRS, OLAP, and OLTP.
  • Transformed Logical Data Model to Physical Data Model ensuring the Primary Key and Foreign Key relationships in PDM, Consistency of definitions of Data Attributes and Primary Index Considerations.
  • Validated the data of reports by writing SQL queries in PL/SQL Developer against ODS.
  • Developed MapReduce programs to parse the raw data, populate staging tables, and store the refined data in partitioned tables in the EDW (a streaming-mapper sketch follows this list).
  • Involved with data analysis, primarily identifying data sets, source data, source metadata, data definitions, and data formats.
  • Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
  • Developed a Star and Snowflake schema-based dimensional model for the data warehouse.
  • Developed data mapping, data governance, and transformation and cleansing rules for the Master Data Management architecture involving OLTP and ODS.
  • Created Hive queries that helped analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
  • Designed the ER diagrams, logical model (relationships, cardinality, attributes, and candidate keys), and physical database (capacity planning, object creation, and aggregation strategies) for Oracle and Teradata as per business requirements using Erwin.
  • Involved with all the phases of Software Development Life Cycle (SDLC) methodologies throughout the project life cycle.
  • Extensively involved in the Physical/logical modeling and development of Reporting Data Warehousing System.
  • Performed reverse engineering of physical data models from databases and SQL scripts.
  • Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most purchased product on the website, and managed and reviewed Hadoop log files.
  • Used Normalization (1NF, 2NF&3NF) and De-normalization techniques for effective performance in OLTP and OLAP systems.
  • Created complex SQL queries using views, indexes, triggers, roles, stored procedures, and user-defined functions; worked with different methods of logging in SSIS.
  • Processed the data using HQL (a SQL-like language) on top of MapReduce.
  • Worked on importing and cleansing high-volume data from various sources such as DB2, Oracle, and flat files onto SQL Server.
  • Involved in Migrating the data model from one database to Teradata database and prepared a Teradata staging model.
  • Created and developed Slowly Changing Dimension tables (SCD2, SCD3) to facilitate maintenance of history.
  • Developed Tableau visualizations and dashboards using Tableau Desktop, and built Tableau workbooks from multiple data sources using data blending.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports by our BI team.
  • Documented and maintained SAS programs/macros, including independently developed macros, and utilized existing macros.
  • Designed and developed Business Objects Universes that suit standard, analytical, and ad-hoc reporting requirements.
  • Collected, aggregated, matched, consolidated, quality-assured, persisted, and distributed such data throughout the organization to ensure consistency through MDM.
  • Used Teradata SQL Assistant, Teradata Administrator, PMON and data load/export utilities like BTEQ, Fast Load, Multi Load, Fast Export, Tpump on UNIX/Windows environments and running the batch process for Teradata.
  • Connected to Tableau Server to publish dashboards to a central location for portal integration.
  • Responsible for creating Hive tables, loading data and writing hive queries.
  • Worked on different data formats such as flat files, SQL files, databases, XML schemas, and CSV files.
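
The raw-data parsing MapReduce programs above may well have been written in Java; as a language-consistent illustration, here is a minimal Hadoop Streaming mapper sketch in Python that parses pipe-delimited records and emits cleaned key/value output for a staging load. The field layout is a hypothetical placeholder.

    #!/usr/bin/env python
    # Minimal Hadoop Streaming mapper sketch: parse pipe-delimited raw records
    # and emit cleaned (customer_id, txn_date, amount) output for a staging load.
    import sys

    for line in sys.stdin:
        fields = line.rstrip("\n").split("|")
        if len(fields) < 3:
            continue  # skip malformed records
        customer_id, txn_date, amount = (f.strip() for f in fields[:3])
        try:
            amount = float(amount)
        except ValueError:
            continue  # drop rows with non-numeric amounts
        # Tab-separated output expected by Hadoop Streaming.
        print(f"{customer_id}\t{txn_date}\t{amount}")

Such a script would typically be submitted with the hadoop-streaming JAR's -mapper option, with a reducer or a downstream Hive load populating the partitioned EDW tables.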

Environment: Erwin 9.x, Teradata V14, Teradata SQL Assistant, Informatica PowerCenter, Oracle 11g, Netezza, SQL Server 2008, Mainframes, SQL, PL/SQL, XML, Hive, Hadoop, Pig, SPSS, SAS, Excel, Business Objects, Tableau, T-SQL, SSRS, SSIS.

Confidential, Columbus GA

Sr. Data Analyst/Data Modeler

Responsibilities:

  • Created and maintained Logical and Physical models for the data mart and created partitions and indexes for the tables in the data mart.
  • Performed data profiling and analysis, applied various data cleansing rules, designed data standards and architecture, and designed the relational models (a data-profiling sketch follows this list).
  • Maintained metadata (data definitions of table structures) and version controlling for the data model.
  • Developed SQL scripts for creating tables, sequences, triggers, views, and materialized views.
  • Analyzed and studied the source system data models to understand concept tie-outs so that integration into the existing data warehouse is seamless and data redundancy is eliminated.
  • Used Teradata OLAP functions such as RANK, ROW_NUMBER, QUALIFY, CSUM, and SAMPLE.
  • Involved in designing and developing Data Models and Data Marts that support the Business Intelligence Data Warehouse.
  • Conducted performance analysis and created partitions, indexes and Aggregate tables.
  • Utilized Erwin's forward/reverse engineering tools and Confidential database schema conversion process.
  • Developed SQL scripts for loading the aggregate tables and rollup dimensions.
  • Performed data profiling and data analysis to identify data gaps and become familiar with new source system data.
  • Analyzed the existing source system with the help of data profiling and source system data models, creating individual data models for various domains/subject areas for the proposed data warehouse solution.
  • Performed unit testing, system integrated testing for the aggregate tables.
  • Performed data analysis on the Confidential tables to make sure the data is as per business expectations.
  • Used Normalization methods up to 3NF and De-normalization techniques for effective performance in OLTP systems.
  • Developed SQL scripts for loading data from staging area to Confidential tables.
  • Proposed the EDW data design to centralize the data scattered across multiple datasets.
  • Worked on the development of Data Warehouse, Business Intelligence architecture that involves data integration and the conversion of data from multiple sources and platforms.
  • Worked on Teradata SQL queries, Teradata indexes, and utilities such as MLoad, TPump, FastLoad, and FastExport.
  • Worked on SQL and SAS script mapping.
  • Followed the Type 2 dimension methodology to design for and maintain historical data.
  • Used a metadata tool for importing metadata from the repository, creating new job categories, and creating new data elements.
  • Researched the database space savings attained as each module was released, producing before-and-after numbers for the release.
  • Performed data mapping and data design (data modeling) to integrate the data across multiple databases into the EDW.
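
As a minimal sketch of the data profiling that typically precedes cleansing-rule design, here is a Python/pandas example; the file name and columns are placeholders, not project data.

    # Minimal sketch: simple profiling checks that inform data cleansing rules.
    # The input file and columns are hypothetical placeholders.
    import pandas as pd

    df = pd.read_csv("customer_extract.csv")

    # Basic profile: dtype, null count, and distinct count per column.
    profile = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "nulls": df.isna().sum(),
        "distinct": df.nunique(),
    })
    print("rows:", len(df))
    print(profile)

    # Example cleansing rules suggested by such a profile: trim text, drop duplicates.
    text_cols = df.select_dtypes(include="object").columns
    df[text_cols] = df[text_cols].apply(lambda s: s.str.strip())
    df = df.drop_duplicates()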

Environment: Oracle 12c, SQL*Plus, Erwin, MS Visio, SourceOffSite (SOS), Windows XP, QC Explorer, SharePoint Workspace, Teradata, SQL, PL/SQL, IBM DB2, Business Objects XI 3.5, COBOL, QuickData.

Confidential

Sr. Data Analyst/Data Modeler

Responsibilities:

  • Used Erwin for effective model management of sharing, dividing and reusing model information and design for productivity improvement.
  • Involved in preparing Logical Data Models/Physical Data Models.
  • Worked extensively in both Forward Engineering as well as Reverse Engineering using data modeling tools.
  • Involved in the creation, maintenance of Data Warehouse and repositories containing Metadata.
  • Resolved the data type inconsistencies between the source systems and the Confidential system using the Mapping Documents and analyzing the database using SQL queries.
  • Extensively used both Star Schema and Snowflake schema methodologies in building and designing the logical data model with both Type 1 and Type 2 dimensional models.
  • Worked with DBA group to create Best-Fit Physical Data Model from the Logical Data Model using Forward Engineering.
  • Designed and customized data models for a data warehouse supporting data from multiple sources in real time; performed requirements elicitation and data analysis, and implemented ETL best practices.
  • Developed data migration and cleansing rules for the Integration Architecture (OLTP, ODS, DW).
  • Used Teradata SQL Assistant, Teradata Administrator, PMON and data load/export utilities like BTEQ, Fast Load, Multi Load, Fast Export, Tpump on UNIX/Windows environments and running the batch process for Teradata.
  • Documented logical, physical, relational and dimensional data models. Designed the data marts in dimensional data modeling using star and snowflake schemas.
  • Created dimensional model based on star schemas and designed them using ERwin.
  • Performed data modeling and design of the data warehouse and data marts in star schema methodology with conformed and granular dimensions and FACT tables.
  • Developed SQL Queries to fetch complex data from different tables in remote databases using joins, database links and Bulk collects.
  • Used OBIEE to create reports.
  • Worked on data modeling and produced data mapping and data definition specification documentation.

Environment: Erwin, Oracle, SQL Server 2008, MS Excel, MS Visio, Rational Rose, RequisitePro, SAS, SSIS, SSRS, Windows 7, PL/SQL, Teradata, MS Office, MS Access, SQL.
