
Sr. Data Scientist/Architect Resume


Dallas, TX

SUMMARY:

  • Over 9 years of hands-on experience and comprehensive industry knowledge of Machine Learning, Statistical Modeling, Predictive Modeling, Data Analytics, Data Modeling, Data Architecture, Data Analysis, Data Mining, Text Mining, Natural Language Processing (NLP), Artificial Intelligence algorithms, and Business Intelligence, using analytics models (such as Decision Trees and Linear & Logistic Regression) and tools including Hadoop (Hive, Pig, MapReduce), R, Python, Spark, Scala, AWS (EC2, S3, Redshift), MS Excel, SQL, PostgreSQL, and Erwin.
  • Experienced in utilizing analytical applications like R, SPSS, and Python to identify trends and relationships between different pieces of data, draw appropriate conclusions, and translate analytical findings into risk management and marketing strategies that drive value.
  • Experienced in designing Star schemas (identification of facts, measures, and dimensions) and Snowflake schemas for Data Warehouse and ODS architectures using tools like Erwin Data Modeler, PowerDesigner, ER/Studio, and Microsoft Visio.
  • Expertise with tools in the Hadoop ecosystem, including Pig, Hive, HDFS, MapReduce, Sqoop, Storm, Spark, Kafka, YARN, Oozie, and Zookeeper.
  • Extensive experience in Text Analytics, developing statistical Machine Learning and Data Mining solutions to various business problems, generating data visualizations using R and Python, and creating dashboards using tools like Tableau.
  • Experienced in designing and building a data lake using Hadoop and its ecosystem components.
  • Experienced with Spark for improving the performance and optimization of existing algorithms in Hadoop, using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Expertise in applying data mining and optimization techniques in B2B and B2C industries; proficient in Machine Learning, Data/Text Mining, Statistical Analysis, and Predictive Modeling.
  • Experienced in writing Spark Streaming and Spark batch jobs using Spark MLlib for analytics, with hands-on experience in clustering algorithms such as K-means and K-medoids as well as predictive algorithms (see the clustering sketch after this list).
  • Experienced in writing and optimizing SQL queries in Oracle, SQL Server, DB2, Netezza, and Teradata.
  • Experienced Data Modeler with conceptual, logical, and physical data modeling skills, data profiling skills, and experience maintaining data quality on Teradata 15/14; experienced with JAD sessions for requirements gathering, creating data mapping documents, and writing functional specifications and queries.
  • Expertise in Model Development, Data Mining, Predictive Modeling, Data Visualization, Data Cleansing and Management, and Database Management.
  • Proficient in Hadoop, Hive, MapReduce, Pig, and NoSQL databases like MongoDB, HBase, and Cassandra.
  • Excellent experience in SQL*Loader, SQL data modeling, reporting, and SQL database development; loaded data from legacy systems into Oracle databases using control files and used the Oracle external tables feature to read data from flat files into Oracle staging tables.
  • Excellent knowledge of Machine Learning, Mathematical Modeling, and Operations Research; comfortable with R, Python, SAS, Weka, MATLAB, and relational databases, with a deep understanding of and exposure to the Big Data ecosystem.
  • Experienced in Data Modeling using RDBMS concepts, logical and physical data modeling up to Third Normal Form (3NF), and multidimensional data modeling schemas (Star schema, Snowflake modeling, facts, and dimensions).
  • Good knowledge in Database Creation and maintenance of physical data models with Oracle, Teradata, Netezza, DB2 and SQL Server databases.
  • Expertise in Excel macros, pivot tables, VLOOKUPs, and other advanced functions; expert R user with knowledge of the statistical programming language SAS.
  • Experience in Data Analysis, Data Migration, Data Cleansing, Transformation, Integration, Data Import, and Data Export using multiple ETL tools such as Ab Initio and Informatica Power Center.
  • Experience in SQL and good knowledge of PL/SQL programming, including developing stored procedures and triggers, along with DataStage, DB2, UNIX, Cognos, MDM, Hadoop, and Pig.
  • Expertise in data acquisition, storage, analysis, integration, predictive modeling, logistic regression, decision trees, data mining methods, forecasting, factor analysis, cluster analysis, and other advanced statistical techniques.
  • Very good knowledge in Data Analysis, Data Validation, Data Cleansing, Data Verification and Identifying Data Mismatch.
  • Excellent experience on Teradata SQL queries, Teradata Indexes, Utilities such as MLOAD, TPump, Fast load and Fast Export.
  • Strong experience and knowledge in data visualization with Tableau, creating line and scatter plots, bar charts, histograms, pie charts, dot charts, box plots, time series, error bars, multiple chart types, multiple axes, subplots, etc.
  • Experienced in database performance tuning and data access optimization, writing complex SQL queries and PL/SQL blocks such as stored procedures, functions, triggers, cursors, and ETL packages.
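The following is a minimal, hedged sketch of the kind of K-means clustering work referenced above, using Spark MLlib; the data, column names, and parameters are illustrative assumptions rather than details from any specific project.

```python
# Minimal sketch: K-means clustering with Spark MLlib.
# Assumes a local PySpark installation; data and columns are toy placeholders.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("kmeans-sketch").getOrCreate()

# Toy data standing in for real feature vectors.
df = spark.createDataFrame(
    [(1.0, 2.0), (1.5, 1.8), (8.0, 8.0), (8.5, 7.5)],
    ["feature_a", "feature_b"],
)

# Assemble the raw columns into the single vector column MLlib expects.
features = VectorAssembler(
    inputCols=["feature_a", "feature_b"], outputCol="features"
).transform(df)

# Fit K-means with k=2 and attach a cluster label to each row.
model = KMeans(k=2, seed=42, featuresCol="features").fit(features)
model.transform(features).select("feature_a", "feature_b", "prediction").show()

spark.stop()
```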

TECHNICAL SKILLS:

Data Modeling Tools: Erwin r9.6, 9.5, 9.1, 8.x, Rational Rose, ER/Studio, MS Visio, SAP PowerDesigner.

Programming Languages: Oracle PL/SQL, Python, Scala, SQL, T-SQL, UNIX shell scripting.

Scripting Languages: Python (NumPy, SciPy, Pandas, Gensim, Keras), R (Caret, Weka, ggplot)

Big Data Technologies: Hadoop, Hive, HDFS, MapReduce, Pig, Kafka, Sqoop, Oozie, Spark, and Scala.

Reporting Tools: Crystal Reports, Business Intelligence, SSRS, Business Objects, Tableau.

ETL: Informatica Power Center, SSIS.

Project Execution Methodologies: Ralph Kimball and Bill Inmon data warehousing methodology, Rational Unified Process (RUP), Rapid Application Development (RAD), Joint Application Development (JAD)

BI Tools: Tableau, Tableau Server, Tableau Reader, SAP Business Objects, OBIEE, QlikView, SAP Business Intelligence, Amazon Redshift, Azure

Data Warehouse Tools: MS-Office suite (Word, Excel, MS Project and Outlook), Spark MLlib, Scala NLP, MariaDB, Azure, SAS.

Databases: Oracle, Teradata, Netezza, Microsoft SQL Server, MongoDB, HBase, Cassandra.

Operating Systems: Windows, UNIX, MS DOS, Sun Solaris.

WORK EXPERIENCE:

Confidential, DALLAS, TX

SR. DATA SCIENTIST/ARCHITECT

Responsibilities:

  • Provided the architectural leadership in shaping strategic, business technology projects, with an emphasis on application architecture.
  • Utilized domain knowledge and application portfolio knowledge to play a key role in defining the future state of large, business technology programs.
  • Participated in all phases of data mining, including data collection, data cleaning, model development, validation, and visualization, and performed gap analysis.
  • Developed MapReduce/Spark Python modules for machine learning & predictive analytics in Hadoop on AWS and implemented a Python-based distributed random forest via Python streaming.
  • Created ecosystem models (e.g. conceptual, logical, physical, canonical) that are required for supporting services within the enterprise data architecture (conceptual data model for defining the major subject areas used, ecosystem logical model for defining standard business meaning for entities and fields, and an ecosystem canonical model for defining the standard messages and formats to be used in data integration services throughout the ecosystem).
  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, and NLTK in Python to develop machine learning solutions, applying algorithms such as linear regression, multivariate regression, naive Bayes, random forests, K-means, and KNN for data analysis.
  • Involved in analyzing data coming from various sources, creating meta-files and control files to ingest the data into the data lake, and configuring the batch job that performs ingestion of the source files into the data lake.
  • Supported data analysis projects using Elastic MapReduce (EMR) on the Amazon Web Services (AWS) cloud and performed export and import of data to and from S3.
  • Worked on monitoring and troubleshooting the Kafka-Storm-HDFS data pipeline for real-time data ingestion into the data lake in HDFS.
  • Conducted studies and rapid plotting, using advanced data mining and statistical modeling techniques to build solutions that optimize data quality and performance.
  • Developed multiple POCs using PySpark, deployed them on the YARN cluster, compared the performance of Spark with Hive and SQL, and was involved in end-to-end implementation of ETL logic.
  • Demonstrated experience in design and implementation of Statistical models, Predictive models, enterprise data model, metadata solution and data life cycle management in both RDBMS, Big Data environments.
  • Responsible for developing data pipeline with Amazon AWS to extract the data from weblogs and store in HDFS.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python (a brief PySpark sketch follows this list).
  • Analyzed large data sets, applied machine learning techniques, and developed and enhanced predictive and statistical models by leveraging best-in-class modeling techniques.
  • Worked on database design, relational integrity constraints, OLAP, OLTP, Cubes and Normalization (3NF) and De-normalization of database.
  • Created data models for AWS Redshift and Hive from dimensional data models and worked on data modeling and advanced SQL with columnar databases on AWS.
  • Developed, deployed, and managed several MongoDB clusters, implementing robustness and scalability via sharding and replication and automating tasks with custom scripts and open-source tools for performance tuning and system monitoring.
  • Implemented data consolidation using Spark and Hive to generate data in the required formats, applying various ETL tasks for data repair, massaging data to identify sources for audit purposes, and filtering data before storing it back to HDFS.
  • Worked with various Teradata 15 tools and utilities such as Teradata Viewpoint, MultiLoad, ARC, Teradata Administrator, and BTEQ.
  • Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods, including classification, regression, and dimensionality reduction.
  • Developed LINUX Shell scripts by using NZSQL/NZLOAD utilities to load data from flat files to Netezza database.
  • Designed and implemented the system architecture for an Amazon EC2 based cloud-hosted solution for the client.
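The bullet above on converting Hive/SQL queries to Spark is illustrated below with a minimal PySpark sketch; the `sales` data, columns, and query are hypothetical stand-ins, not artifacts of the project itself.

```python
# Minimal sketch of rewriting a HiveQL aggregation as PySpark DataFrame
# transformations. Data and column names are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("hive-to-spark-sketch").getOrCreate()

# Stand-in for a Hive table; in practice this would be spark.table("sales").
sales = spark.createDataFrame(
    [("east", 250.0), ("east", 75.0), ("west", 40.0)],
    ["region", "amount"],
)

# Original HiveQL equivalent:
#   SELECT region, SUM(amount) AS total_amount
#   FROM sales GROUP BY region ORDER BY total_amount DESC
summary = (
    sales.groupBy("region")
         .agg(F.sum("amount").alias("total_amount"))
         .orderBy(F.desc("total_amount"))
)
summary.show()

spark.stop()
```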

Environment: Erwin r9.6, Python, SQL, Oracle 12c, Netezza, SQL Server, Informatica, Java, SSRS, PL/SQL, T-SQL, Tableau, MLlib, regression, cluster analysis, Scala NLP, Spark, Kafka, MongoDB, logistic regression, Hadoop, Hive, Teradata, random forest, OLAP, Azure, MariaDB, SAP CRM, HDFS, ODS, NLTK, SVM, JSON, XML, Cassandra, MapReduce, AWS.

Confidential, CHICAGO IL

SR. DATA SCIENTIST/ARCHITECT

Responsibilities:

  • Involved in the entire data science project life cycle, actively participating in all phases including data extraction, data cleaning, statistical modeling, and data visualization with large sets of structured and unstructured data.
  • Worked with data compliance and data governance teams to maintain data models, metadata, and data dictionaries, and to define source fields and their definitions.
  • Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
  • Applied breadth of knowledge in programming (Python, R); descriptive, inferential, and experimental design statistics; advanced mathematics; and database functionality (SQL, Hadoop).
  • Worked with machine learning algorithms such as logistic regression, random forest, XGBoost, KNN, SVM, neural networks, linear regression, lasso regression, and k-means.
  • Transformed the logical data model into a physical data model in Erwin, ensuring primary key and foreign key relationships in the PDM, consistency of data attribute definitions, and primary index considerations.
  • Worked on real-time data processing with Spark/Storm and Kafka using Scala; wrote Scala programs using Spark on YARN for data analysis and Spark/Spark SQL for aggregations; and developed web services in the Play framework using Scala to build a streaming data platform.
  • Developed Data Science content involving Data Manipulation & Visualization, Web Scraping, Machine Learning, Python programming, SQL, GIT and ETL for Data Extraction.
  • Converted Informatica ETL logic into Spark/Scala, rewriting it with the Spark DataFrames API for data transformations and ETL jobs and with Spark SQL for processing data per BI aggregation and reporting needs.
  • Designed and developed architecture for data services ecosystem spanning Relational, NoSQL, and Big Data technologies.
  • Used Netezza SQL, stored procedures, and NZLoad utilities as part of the DWH appliance framework, and worked with the UNIX team to install the TIDAL job scheduler on the QA and production Netezza environments.
  • Developed enhancements to MongoDB architecture to improve performance and scalability and worked on MongoDB database concepts such as locking, transactions, indexes, Sharding, replication, schema design.
  • Utilized Ansible, AWS Lambda, Kinesis, ElastiCache, and CloudWatch Logs to automate the creation of a log aggregation pipeline with the Elasticsearch, Logstash, Kibana (ELK) stack, sending all of the team's logs from CloudWatch through processing and into Elasticsearch.
  • Developed scripts in Python (Pandas, NumPy) for data ingestion, analysis, and data cleaning (a brief cleaning sketch follows this list).
  • Created Hive queries that helped analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics, and processed the data using HiveQL (SQL-like) on top of MapReduce.
  • Developed Python scripts to automate and provide Control flow to Pig scripts for extracting the data and load into HDFS.
  • Designed the ETL process to extract, transform, and load data from the OLTP Oracle database system to the Teradata data warehouse, and worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
  • Used different machine learning algorithms such as linear and logistic regression, ANOVA/ANCOVA, decision trees, support vector machines, KNN, random forest, deep learning neural networks, and XGBoost.
  • Analyzed the web log data using the HiveQL to extract number of unique visitors per day, page views, visit duration, most purchased product on website and managed and reviewed Hadoop log files.
  • Used Erwin for effective model management of sharing, dividing and reusing model information and design for productivity improvement.
  • Responsible for developing efficient MapReduce programs on the AWS cloud, such as processing claims data to detect and separate fraudulent claims.
  • Designed and developed user interfaces and customization of Reports using Tableau and OBIEE and designed cubes for data visualization, mobile/web presentation with parameterization and cascading.
  • Developed and implemented SSIS, SSRS, and SSAS application solutions for various business units across the organization and created SSIS packages using Pivot Transformation, Execute SQL Task, Data Flow Task, etc., to import data into the data warehouse.
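As a hedged illustration of the Python (Pandas, NumPy) cleaning work noted above, here is a minimal, self-contained sketch; the claim columns and values are hypothetical.

```python
# Minimal sketch of a Pandas/NumPy cleaning step.
# Columns and values are illustrative assumptions, not project data.
import numpy as np
import pandas as pd

raw = pd.DataFrame({
    "claim_id": [101, 101, 102, None],
    "claim_amount": ["250.00", "250.00", "bad-value", "75.50"],
    "region": ["east", "east", "west", "west"],
})

clean = (
    raw.drop_duplicates()                        # drop exact duplicate rows
       .dropna(subset=["claim_id"])              # require a claim id
       .assign(claim_amount=lambda d: pd.to_numeric(d["claim_amount"],
                                                    errors="coerce"))
)

# Impute any remaining numeric gaps with the column median.
numeric_cols = clean.select_dtypes(include=[np.number]).columns
clean[numeric_cols] = clean[numeric_cols].fillna(clean[numeric_cols].median())

print(clean)
```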

Environment: ERwin 9.x, Python, Spark, Scala, Teradata, Oracle 11g, Hadoop, HDFS, Pig, Hive, MapReduce, PL/SQL, UNIX, Informatica Power Center, MDM, SQL Server, Netezza, DB2, Tableau, Aginity, SAS/Graph, SAS/SQL, SAS/Connect and SAS/Access, HBase, MongoDB, Kafka, Sqoop, AWS S3, EMR, EC2, Redshift.

Confidential, TROY MI

SR. DATA MODELER/ARCHITECT

Responsibilities:

  • Providing technical leadership for new proposals and existing projects inclusive of analysis, requirements definition, architecture, design, validation, implementation, problem identification and resolution.
  • Designed ER diagrams (physical and logical, using ER/Studio), mapped the data into database objects, and produced logical/physical data models.
  • Worked on the delivery of Data & Analytics applications involving structured and unstructured data on Hadoop-based platforms on AWS EMR.
  • Performed data analysis, data migration, and data profiling using complex SQL on various source systems, including Oracle and Teradata 13.1.
  • Involved in Teradata utilities (BTEQ, Fast Load, Fast Export, Multiload, and Tpump) in both Windows and Mainframe platforms.
  • Participated in Dimensional modeling (Star Schema) of the Data warehouse and used ER STUDIO to design the business process, dimensions and measured facts.
  • Involved in all steps and the scope of the project's reference data approach to MDM, and created the data dictionary and source-to-target data mapping for the MDM data model.
  • Implemented the full lifecycle as Data Modeler/Data Analyst for data warehouses and data marts with Star schemas, Snowflake schemas, SCDs, and dimensional modeling in ER/Studio.
  • Handled importing data from various data sources, performed transformations using Hive, Map Reduce, and loaded data into HDFS.
  • Used Python to develop Spark code for faster processing of data in Hive, and used Spark Streaming to divide streaming data into micro-batches as input to the Spark engine for batch processing (see the streaming sketch after this list).
  • Designed both 3NF data models for ODS, OLTP systems and dimensional data models using Star and Snowflake Schemas.
  • Involved in increasing productivity through the ability to design data integration tasks and workflows graphically in Informatica, and then deploy and run them natively on Hadoop.
  • Provided support and hands-on Cognos report development, building packages and reports in Report Studio and Query Studio, including ad-hoc and active reports, and cleaned/prepared data in SPSS.
  • Worked on MongoDB database design and indexing techniques, and on creating documents in the MongoDB database.
  • Independently coded new programs and designed tables to load and test them effectively for the given POCs using Big Data/Hadoop.
  • Updated Python scripts to match training data with our database stored in AWS Cloud Search, so that we would be able to assign each document a response label for further classification.
  • Implemented end-to-end systems for Data Analytics, Data Automation and integrated with custom visualization tools using Hadoop and MongoDB.
  • Worked on Informatica Power Center tools - Source Analyzer, Target designer, Mapping Designer, Workflow Manager, Workflow monitor, Mapplet Designer and Transformation Developer.
  • Created drill down, drill through, sub reports using SSRS as well as managed subscription reports as per client's requirements using SSRS.
  • Developed, Implemented & Maintained the Conceptual, Logical & Physical Data Models using Erwin for Forward/Reverse Engineered Databases.
  • Created 3NF business area data modeling with de-normalized physical implementation data and information requirements analysis using ER STUDIO tool.
  • Performed data mining using very complex SQL queries, discovered patterns, and used extensive SQL for data profiling/analysis to provide guidance in building the data model.
  • Created high level ETL design document and assisted ETL developers in the detail design and development of ETL maps using Informatica.
  • Worked on SQL Server concepts SSIS (SQL Server Integration Services), SSAS (Analysis Services) and SSRS (Reporting Services).
  • Developed the code as per the client's requirements using SQL, PL/SQL, and data warehousing concepts.
  • Participated in several facets of MDM implementations including Data Profiling, metadata acquisition and data migration.
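The Spark Streaming micro-batching mentioned above is sketched below using Spark Structured Streaming's built-in `rate` source so the example is self-contained; the trigger interval and windowed count are illustrative assumptions, not the project's actual pipeline.

```python
# Minimal sketch of micro-batch stream processing with Spark Structured
# Streaming. The `rate` source and 10-second trigger are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("streaming-sketch").getOrCreate()

# The rate source emits `timestamp` and `value` columns for demonstration.
stream = spark.readStream.format("rate").option("rowsPerSecond", 5).load()

# Count events per 10-second window.
counts = (
    stream.groupBy(F.window("timestamp", "10 seconds"))
          .agg(F.count("value").alias("events"))
)

query = (
    counts.writeStream
          .outputMode("complete")
          .format("console")
          .trigger(processingTime="10 seconds")   # 10-second micro-batches
          .start()
)
query.awaitTermination(30)   # run briefly for the sketch
query.stop()
spark.stop()
```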

Environment: ER Studio, SQL server, Business Objects XI, MS Excel 2010, Informatica, Rational Rose, Oracle 10g, SAS, SQL, PL/SQL, SSRS, SSIS, T-SQL, Netezza, Tableau, XML, DDL, TOAD for Data Analysis, Teradata SQL Assistant, Hadoop, Spark, Scala, Python, Hive, HDFS, Sqoop, MongoDB, HBase, AWS.

Confidential, DETROIT, MI

SR. DATA MODELER/ANALYST

Responsibilities:

  • Participated in JAD sessions, gathered information from Business Analysts, end users and other stakeholders to determine the requirements.
  • Documented logical, physical, relational and dimensional data models. Designed the Data Marts in dimensional data modeling using star and snowflake schemas.
  • Involved in normalization (3rd normal form) and de-normalization (Star schema) for data warehousing, and implemented slowly changing dimensions Type 2 and Type 3 to track the history of reference data changes (a brief Type 2 sketch follows this list).
  • Involved in the analysis, design, testing, and implementation of Business Intelligence solutions using Enterprise Data Warehouse, ETL, OLAP, and client/server applications.
  • Worked with Teradata 14.1 tools such as FastLoad, MultiLoad, TPump, FastExport, Teradata Parallel Transporter (TPT), and BTEQ.
  • Extensively involved in the design and development of Tableau visualization solutions using operational dashboards, reports, and workbooks on various flavors of RDBMS such as Oracle, MS Access, and MS SQL Server.
  • Strong knowledge of Entity-Relationship concept, Facts and dimensions tables, slowly changing dimensions and Dimensional Modeling (Star Schema and Snow Flake Schema).
  • Designed the Data Warehouse and MDM hub Conceptual, Logical and Physical data models and generated DDL scripts using Forward Engineering technique to create objects and deploy them into the databases.
  • Performed daily monitoring of Oracle instances using Oracle Enterprise Manager, ADDM, and TOAD, monitoring users, tablespaces, memory structures, rollback segments, logs, and alerts.
  • Developed stored procedures on Netezza and SQL Server for data manipulation and data warehouse population.
  • Assisted in producing OLAP cubes and wrote queries to produce reports using SQL Server Analysis Services (SSAS) and Reporting Services (SSRS).
  • Involved in Teradata SQL development, unit testing, and performance tuning, and ensured testing issues were resolved using defect reports.
  • Customized reports using the SAS macro facility, PROC REPORT, and PROC TABULATE.
  • Used Normalization methods up to 3NF and De-normalization techniques for effective performance in OLTP and OLAP systems.
  • Designed and developed Oracle 12c PL/SQL procedures and UNIX shell scripts for data import/export and data conversions.
  • Used ETL methodology to support data extraction, transformation, and loading processes in a complex MDM environment using Informatica.
  • Worked on database testing, wrote complex SQL queries to verify the transactions and business logic like identifying the duplicate rows by using SQL Developer and PL/SQL Developer.
  • Used Teradata SQL Assistant, Teradata Administrator, PMON and data load/export utilities like BTEQ, Fast Load, Multi Load, Fast Export, TPump on UNIX/Windows environments and running the batch process for Teradata.
  • Worked on Data warehouse concepts like Data warehouse Architecture, Star schema, Snowflake schema, and Data Marts, Dimension and Fact tables.
  • Developed SQL Queries to fetch complex data from different tables in remote databases using joins, database links and Bulk collects.
  • Used SSIS to create ETL packages to validate, extract, transform and load data to pull data from Source servers to staging database and then to Netezza Database and DB2 Databases.
  • Worked on SQL Server concepts SSIS (SQL Server Integration Services), SSAS (Analysis Services) and SSRS (Reporting Services).
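The slowly changing dimension Type 2 pattern referenced earlier in this list is sketched below in Pandas purely for illustration; the actual implementation used the warehouse ETL stack, and the customer data and column names here are hypothetical.

```python
# Minimal sketch of a Type 2 slowly changing dimension update in Pandas.
# Data, columns, and dates are illustrative assumptions.
import pandas as pd

dim = pd.DataFrame({
    "customer_id": [1, 2],
    "city": ["Detroit", "Lansing"],
    "effective_date": ["2015-01-01", "2015-01-01"],
    "end_date": [None, None],
    "is_current": [True, True],
})
incoming = pd.DataFrame({"customer_id": [1], "city": ["Troy"]})
load_date = "2016-06-01"

# Find incoming rows whose tracked attribute differs from the current version.
merged = dim[dim["is_current"]].merge(incoming, on="customer_id",
                                      suffixes=("", "_new"))
changed = merged[merged["city"] != merged["city_new"]]

# Expire the superseded current rows...
expire = dim["customer_id"].isin(changed["customer_id"]) & dim["is_current"]
dim.loc[expire, ["end_date", "is_current"]] = [load_date, False]

# ...and append the new current versions.
new_rows = changed[["customer_id", "city_new"]].rename(columns={"city_new": "city"})
new_rows = new_rows.assign(effective_date=load_date, end_date=None, is_current=True)
dim = pd.concat([dim, new_rows], ignore_index=True)

print(dim)
```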

Environment: ER Studio, Teradata 13.1, SQL, PL/SQL, BTEQ, DB2, Oracle, MDM, Netezza, ETL, RTF, UNIX, SQL Server 2010, Informatica, SSRS, SSIS, SSAS, SAS, Aginity.

Confidential, WASHINGTON DC

SR. DATA ANALYST

Responsibilities:

  • Worked with SMEs and other stakeholders to determine the requirements and identify entities and attributes to build conceptual, logical, and physical data models.
  • Used Star Schema methodologies in building and designing the logical data model into Dimensional Models extensively.
  • Developed Star and Snowflake schema based dimensional models to build the data warehouse.
  • Designed Context Flow Diagrams, Structure Chart and ER- diagrams.
  • Worked on database features and objects such as partitioning, change data capture, indexes, views, and indexed views to develop an optimal physical data model.
  • Tested complex ETL mappings and sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables into target tables.
  • Worked with SQL Server Integration Services in extracting data from several source systems and transforming the data and loading it into ODS.
  • Worked with SQL, SQL PLUS, Oracle PL/SQL Stored Procedures, Triggers, SQL queries and loading data into Data Warehouse/Data Marts.
  • Worked with the DBA group to create a best-fit physical data model from the logical data model using forward engineering in Erwin.
  • Reviewed business requirements and analyzed data sources from Excel, Oracle, and SQL Server for the design, development, testing, and production rollout of reporting and analysis projects.
  • Created Logical and Physical data models with Star and Snowflake schema techniques using Erwin in Data warehouse as well as in Data Mart.
  • Performed data analysis and data profiling using complex SQL on various source systems, including Oracle 8i/9i.
  • Involved in Data Analysis, Data Validation, Data Cleansing, Data Verification and identifying data mismatch.
  • Designed data models and analyzed data for online transaction processing (OLTP) and online analytical processing (OLAP) systems.
  • Wrote and executed customized SQL code for ad-hoc reporting duties and routine tasks (a brief example follows this list).
  • Developed stored procedures and complex packages extensively using PL/SQL and shell programs.
  • Customized reports using the SAS macro facility, PROC REPORT, and PROC TABULATE.
  • Generated ad-hoc SQL queries using joins, database connections and transformation rules to fetch data from legacy SQL Server database systems.
  • Used existing UNIX shell scripts and modified them as needed to process SAS jobs, search strings, execute permissions over directories etc.
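The ad-hoc reporting SQL mentioned above is illustrated with a minimal sketch below; the work itself ran against Oracle and SQL Server, so an in-memory SQLite database and hypothetical tables are used here only to keep the example self-contained.

```python
# Minimal sketch of an ad-hoc reporting query with joins, run against an
# in-memory SQLite database for illustration; tables and columns are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    CREATE TABLE customers (customer_id INTEGER, region TEXT);
    INSERT INTO orders VALUES (1, 10, 250.0), (2, 10, 75.0), (3, 20, 40.0);
    INSERT INTO customers VALUES (10, 'East'), (20, 'West');
""")

report_sql = """
    SELECT c.region,
           COUNT(*)      AS order_count,
           SUM(o.amount) AS total_amount
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
    GROUP BY c.region
    ORDER BY total_amount DESC;
"""
for row in conn.execute(report_sql):
    print(row)

conn.close()
```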

Environment: Erwin, SQL Server, PL/SQL, SQL, T-SQL, ETL, OLAP, OLTP, SAS, ODS, UNIX, Oracle, DQ Analyzer, XML, IBM Rational Clear Case and Clear Quest.
