Data Scientist Resume

Chicago, IL

PROFESSIONAL SUMMARY:

  • 8+ years of experience in Data Analysis, Decision Trees, Random Forest, Data Profiling, Data Integration, Data Governance, Migration and Metadata Management, Master Data Management and Configuration Management.
  • Experience in various phases of the Software Development life cycle (Analysis, Requirements gathering, Designing) with expertise in documenting various requirement specifications, functional specifications, Data Validation, Test Plans, Source to Target mappings, SQL Joins, Data Cleansing.
  • Extensive experience in Text Analytics, developing different Statistical Machine Learning, Data Mining solutions to various business problems and generating data visualizations using R, Python and Tableau.
  • Expertise in transforming business requirements into analytical models, building models, designing algorithms, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
  • Proficient in developing and designing ETL packages and reporting solutions using MS BI Suite (SSIS/SSRS).
  • Experience in conducting Joint Application Development (JAD) sessions for analysis, requirements gathering and design, and in Rapid Application Development (RAD).
  • Experience in coding SQL/PL SQL using Triggers, Procedures and Packages.
  • Experience in end-to-end implementation of a data warehouse project based on SAS Enterprise Guide (EG).
  • Experience in building and publishing interactive reports and dashboards with design customizations based on the client requirements in Tableau, Looker, PowerBI and SSRS.
  • Proficient in data visualization tools such as Tableau, Python Matplotlib, Python Seaborn, R Shiny, R ggplot2 to create visually powerful and actionable interactive reports and dashboards.
  • Experience with Teradata and big data as the target for data marts; worked with BTEQ, FastLoad and MultiLoad.
  • Experience with Reporting tool Microsoft Power BI to make various Vendor spend and trade cycle reports.
  • Extensive working experience with Python including Scikit-learn, Pandas and Numpy.
  • Integration Architect & Data Scientist experience in Analytics, Big Data, SOA, ETL and Cloud technologies. Experienced in Big Data with Hadoop 2, Hive, HDFS, MapReduce, and Spark.
  • Experience working with data modeling tools like Power Designer, Erwin and ER Studio.
  • Experience in designing snowflake and star schemas for Data Warehouse and ODS architecture.
  • Experience in designing stunning visualizations using Tableau software, publishing and presenting dashboards, Storyline on web and desktop platforms.
  • Skilled in System Analysis, Dimensional Data Modeling, Database Design and implementing RDBMS specific features.
  • Worked and extracted data from various database sources like Oracle, SQL Server and Teradata.
  • Experience and technical proficiency in designing, analyzing and data modeling online applications; Solution Lead for architecting Data Warehouse/Business Intelligence applications.
  • Good understanding of Teradata SQL Assistant, Teradata Administrator and data load. Experience with Data Analytics, Data Reporting, Scales, Ad-hoc Reporting, Graphs, Pivot Tables and OLAP reporting.
  • Experience in advanced SAS programming techniques, such as PROC APPEND, PROC SQL (JOIN/ UNION), PROC DATASETS, and PROC TRANSPOSE.
  • Good understanding of Data Warehouse/OLAP, Relational Database Design concepts and methodologies.

TECHNICAL SKILLS:

Big Data/Hadoop Technologies: Hadoop, HDFS, Map Reduce, Hive, Pig, YARN, Impala, Sqoop, Flume, Spark, Kafka, Storm, Drill, Zookeeper and Oozie

Languages: HTML5, DHTML, CSS3, C, C++, XML, WSDL, R/R Studio, SAS Enterprise Guide, SAS, R (Caret, Weka, ggplot), Perl, MATLAB, Mathematica, FORTRAN, DTD, Schemas, JSON, Ajax, Java, Scala, Python (NumPy, SciPy, Pandas, Gensim, Keras), JavaScript, Shell Scripting

Databases: Microsoft SQL Server 2008, 2010/2012, MySQL 4.x/5.x, Oracle 11g, 12c, DB2, Netezza, Teradata

NoSQL Databases: Cassandra, HBase, MongoDB, MariaDB

Business Intelligence Tools: Tableau Server, Tableau Reader, Tableau, Splunk, SAP Business Objects, OBIEE, SAP Business Intelligence, QlikView, Amazon Redshift, Azure Data Warehouse

Development Tools: Microsoft SQL Studio, IntelliJ, Eclipse, NetBeans.

Development Methodologies: Agile/Scrum, UML, Design Patterns, Waterfall

Build Tools: Jenkins, Toad, SQL Loader, Maven, ANT, RTC, RSA, Control-M, Oozie, Hue, SOAP UI

Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio/Outlook), Crystal Reports XI, SSRS, Cognos 7.0/6.0.

Operating Systems: All versions of Windows, UNIX, LINUX, Macintosh HD, Sun Solaris

PROFESSIONAL EXPERIENCE:

Confidential, Chicago, IL

Data Scientist

Responsibilities:

  • Assessed customer consumption behavior and identified customer value with RFM (Recency, Frequency, Monetary) analysis; applied customer segmentation with clustering algorithms such as K-Means Clustering and Hierarchical Clustering.
  • Collaborated with data engineers to implement ETL process, wrote and optimized SQL queries to perform data extraction and merging from Oracle.
  • Involved in managing backup and restoring data in the live Cassandra Cluster.
  • Used R, Python, MATLAB and Spark to develop a variety of models and algorithms for analytic purposes.
  • Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs.
  • Created SQL tables with referential integrity and developed queries using SQL, SQL*Plus and PL/SQL. Designed both 3NF data models for ODS and OLTP systems and dimensional data models using Star and Snowflake schemas.
  • Performed data integrity checks, data cleaning, exploratory analysis and feature engineering using R and Python.
  • Developed personalized product recommendation with Machine Learning algorithms, including Gradient Boosting Tree and Collaborative filtering to better meet the needs of existing customers and acquire new customers.
  • Developed logistic regression models to predict subscription response rate based on customer’s variables like past transactions, promotions, response to prior mailings, demographics, interests and hobbies, etc.
  • Worked with Data Architects and IT Architects to understand the movement of data and its storage, using ER Studio 9.7.
  • Used Python and Spark to implement different machine learning algorithms, including Generalized Linear Model, Random Forest, SVM, Boosting and Neural Network.
  • Evaluated parameters with K-Fold Cross Validation and optimized performance of models.
  • Worked on benchmarking Cassandra Cluster using the Cassandra stress tool.
  • Completed a highly immersive Data Science program covering Data Manipulation and Visualization, Web Scraping, Machine Learning, Git, SQL, Unix Commands, Python programming, NoSQL, MongoDB and Hadoop.
  • Worked on data cleaning, data preparation and feature engineering with Python, including Numpy, Scipy, Matplotlib, Seaborn, Pandas, and Scikit-learn.
  • Identified risk level and eligibility of new insurance applicants with Machine Learning algorithms.
  • Determined customer satisfaction and helped enhance the customer experience using NLP.
  • Recommended and evaluated marketing approaches based on quality analytics on customer consuming behavior.
  • Utilized SQL and HiveQL to query and manipulate data from a variety of data sources, including Oracle and HDFS, while maintaining data integrity.
  • Performed data visualization and designed dashboards with Tableau and D3.js, and provided complex reports, including charts, summaries and graphs, to interpret the findings for the team and stakeholders.
  • Identified process improvements that significantly reduce workloads or improve quality.

Environment: R Studio, Python, Matlab, Tableau, SQL Server 2012,2014 and Oracle 10g, 11g

Confidential

Data Scientist

Responsibilities:

  • Utilized a broad variety of statistical packages and tools like SAS, R, MLlib, Graphs, Hadoop, Spark, MapReduce and others.
  • Provides input and recommendations on technical issues to Business & Data Analysts, BI Engineers and Data Scientists.
  • Created parameters, action filters and calculated sets for preparing dashboards and worksheets in Tableau.
  • Collaborated with other data scientists and architects on custom data visualization solutions using tools like Tableau, R Shiny and R packages.
  • Involved in running Map Reduce jobs for processing millions of records.
  • Wrote complex SQL queries using joins and OLAP functions like COUNT, CSUM and RANK.
  • Building, publishing customized interactive reports, report scheduling and dashboards using Tableau server.
  • Developed Python programs to read data from various Teradata sources, manipulate it and consolidate it into a single CSV file.
  • Wrote several Teradata SQL Queries using Teradata SQL Assistant for Ad Hoc Data Pull request.
  • Performing statistical data analysis and data visualization using R and Python.
  • Extensively used open source tools, RStudio (R) and Spyder (Python), for statistical analysis and building machine learning models.
  • Performing Data Validation / Data Reconciliation between disparate source and target systems (Salesforce, Cisco-UIC, Cognos, Data Warehouse) for various projects.
  • Created new scripts for Splunk scripted input for system, collecting CPU and OS data.
  • Implemented data refreshes on Tableau Server for biweekly and monthly increments based on business change to ensure that the views and dashboards were displaying the changed data accurately.
  • Developed normalized Logical and Physical database models for designing an OLTP application.
  • Knowledgeable in the AWS environment for loading data files from on-premises systems to a Redshift cluster.
  • Performed SQL Testing on AWS Redshift databases
  • Developed Teradata SQL scripts using OLAP functions like RANK and RANK() OVER to improve query performance while pulling data from large tables.
  • Developed and implemented SSIS, SSRS and SSAS application solutions for various business units across the organization.
  • Utilized a diverse array of technologies and tools as needed, to deliver insights such as R, SAS, Matlab, Tableau and more.
  • Analyzed Data Set with SAS programming, R and Excel.
  • Publish Interactive dashboards and schedule auto-data refreshes
  • Developed Map Reduce Python modules for machine learning & predictive analytics in Hadoop on AWS.
  • Maintained large data sets, combining data from various sources using Excel, Enterprise, SAS Grid, Access and SQL queries.
  • Performed Tableau administration using Tableau admin commands.
  • Extracted data from SQL Server Database, copied into HDFS File system and used Hadoop tools such as Hive and Pig Latin to retrieve the data required for building models.
  • Created UDFs to calculate the pending payment for the given residential or small business customer's quotation data and used in Pig and Hive Scripts.
  • Worked on moving data from Hive tables into HBase for real time analytics on Hive tables.
  • Handled importing of data from various data sources, performed transformations using Hive. (External tables, partitioning)
  • Responsible for creating Hive tables, loading the structured data resulted from Map Reduce jobs into the tables and writing hive queries to further analyze the logs to identify issues and behavioral patterns.
  • Design and development of ETL processes using Informatica ETL tools for dimension and fact file creation.
  • Develop and automate solutions for a new billing and membership Enterprise data Warehouse including ETL routines, tables, maps, materialized views, and stored procedures incorporating Informatica and Oracle PL/SQL toolsets.
  • Analyzed implementing Spark using Scala and wrote sample Spark programs using PySpark.

Environment: SQL/Server, Oracle 10g/11g, MS-Office, Teradata, Informatica, ER Studio, XML, Hive, HDFS, Flume, Sqoop, R connector, Python, R, Tableau 9.2

Confidential - Tampa, FL

Data Analyst

Responsibilities:

  • Work with users to identify the most appropriate source of record required to define the asset data for financing
  • Implemented Agile Methodologies, Scrum stories and sprints in a Python based environment, along with data analytics and Excel data extracts.
  • Developed normalized Logical and Physical database models for designing an OLTP application.
  • Developed new scripts for gathering network and storage inventory data and ingesting it into Splunk.
  • Exported the required data to RDBMS using Sqoop to make it available to the claims processing team to assist in processing claims.
  • Utilized a broad variety of OLAP functions like COUNT, SUM and CSUM, and worked in MS Excel using pivot tables and graphs.
  • Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
  • Involved in defining the source to target data mappings, business rules, business and data definitions.
  • Worked on MongoDB database concepts such as locking, transactions, indexes, Sharding, replication, schema design, etc.
  • Performed Data analysis using Python Pandas.
  • Performed Data analysis and Data profiling using complex SQL on various sources systems including Oracle and Teradata.
  • Created tables in Hive and loaded the structured (resulted from Map Reduce jobs) data
  • Using HiveQL developed many queries and extracted the required information.
  • Designed and deployed rich graphic visualizations with drill-down and drop-down menu options and parameters using Tableau.
  • Extracted data from the database using SAS SQL procedures, SAS/Access, and create SAS data sets.
  • Created Teradata SQL scripts using OLAP functions like RANK to improve the query performance while pulling the data from large tables.
  • Responsible for defining the key identifiers for each mapping/interface
  • Responsible for defining the functional requirement documents for each source to target interface.
  • Used advanced Excel features like Pivot tables and Charts for generating Graphs.
  • Designed and developed weekly, monthly reports by using MS Excel Techniques (Graphs, Charts, Pivot tables) and Power point presentations.
  • Applied Excel skills, including VLOOKUP, pivots, conditional formatting, handling large record sets, and data manipulation and cleaning.
  • Imported customer data into Python using the Pandas library and performed various data analyses, finding patterns in the data that informed key decisions for the company.

Environment: SQL/Server, Oracle 11g, MS-Office, Teradata, Informatica, ER Studio, XML, Hive, HDFS, Flume, Sqoop, R connector, Python, R, Tableau.

Confidential - Spokane, WA

Data Engineer

Responsibilities:

  • Analysis, Design, Architecture, Development and implementation of Data warehouse and enterprise application development projects based on a provided set of business requirements.
  • Designed and published visually rich and intuitive Tableau dashboards for executive decision making. Created various views in Tableau such as tree maps, scatter plots, heat maps, line charts, geographic maps and pie charts.
  • Designed and implemented appropriate ETL mappings to extract and transform data from various sources to meet the requirements.
  • Used Sqoop to migrate data between HDFS and MySQL or Oracle, and deployed Hive and HBase integration to perform OLAP operations on HBase data.
  • Assisted in batch processes using BTEQ, FastLoad, UNIX Shell and Teradata SQL to transfer, clean up and summarize data.
  • Developed SQL queries, PL/SQL programming Packages, Procedures and Functions to meet various user/business requirements.
  • Worked on multiple ETL tools to transform the data from Mainframe, DB2, Oracle, Flat file to target Oracle, Netezza & Teradata on a large Data Warehouse.
  • Worked in three different profiles including Hadoop ETL reporting, Mainframe Job monitoring, scheduling and user administration. Proposed, developed backup and recovery architecture for recurring (Extract, Transform and Load) reports.
  • Customized and developed the OBIEE Physical Layer, Business Model and Mapping layer.
  • Used Teradata Aster bulk load feature to bulk load flat files to Aster. Used Aster UDFs to unload data from staging tables and client data for SCD which resided on Aster Database.
  • Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using Spark SQL, DataFrames, SparkContext, pair RDDs and Spark on YARN.

Environment: Hadoop, Hive, Apache Spark, Tableau, MySQL, Apache Mesos, UNIX Shell Scripting, HBase, Teradata, Teradata SQL, Teradata SQL Assistant, Flume, Oozie, DB2, Delimited Flat files, Toad, Windows XP and MS Office Suite.

Confidential - Reston, VA

Database Developer/Analyst

Responsibilities:

  • Analyzed data collected in stores like SQL, jobs, Package, stored-procedures and queries
  • Identifying the long running jobs and queries
  • Document the unit and regression test results and fix issues.
  • Design and develop triggers/crontabs as per business requirement and schedule them weekly.
  • Compiled lists of stores respecting criteria given by client services, while minimizing collection costs (SQL, Queries, Views)
  • Partitioned tables using range and list partitioning to increase performance and manage large database objects.
  • Designed, developed and modified Tables, Materialized Views, Views, Stored Procedures, Packages and Functions.
  • Coded PL/SQL Packages and Procedures to perform data loading, Error Handling and logging.
  • Worked on performance tuning of queries using EXPLAIN PLAN
  • Wrote PL/SQL Database Triggers to implement the business rules in the application.
  • Created validation rules for performing data validations depending on the user's profile and their record type.
  • Peer review the code for data fixes and production break fixes
  • Ensure compliance of overall design with the enterprise development standards.

Environment: Oracle 9i / 10g, PL/SQL, SQL Developer, SQL Server 2005/2008, SQL*Plus, TOAD 9.5, SQL*Loader, Putty, Microsoft Visual Source Safe.

Confidential - Princeton, NJ

PL/SQL Developer

Responsibilities:

  • Involved in design, development and modification of PL/SQL stored procedures.
  • Tuned SQL queries, created database objects and optimally set storage parameters for tables and indexes.
  • Wrote PL/SQL packages consisting of functions and procedures
  • Wrote Shell scripts to insert data into database tables.
  • Wrote PL/SQL code to retrieve data for master and detail blocks.
  • Created Materialized Views, Sequences, Triggers and collections.
  • Tuned complex Stored Procedures for faster execution using bulk collections and DBMS_PROFILER.
  • Developed functions, packages and triggers to implement business rules in the application.
  • Extensively used cursors for fetching the rows.
  • Loaded data through UTL_FILE and SQL*Loader.
  • Developed complex before/after report triggers for validation of user input.
  • Designed and developed user interfaces using Oracle Forms.
  • Customized forms and reports as per user requirements using Oracle Developer Forms and Reports.
  • Interacted with clients in remote locations to verify customer requirements.
  • Involved in development and testing of oracle back-end objects like database triggers, stored procedures, Sequences and Synonyms.
  • Performed review of the business process, involved in Requirements Analysis, Flows, System Design Documents, Test Plan preparations and Development of Business Process / Work Flow charts.
  • Developed PL/SQL procedures, customizing existing programs according to the needs of the client testing.
  • Designed reports using Oracle Reports as per user requirements. Created reference and master/detail tables to store information, and tested forms and reports using test data.

Environment: Oracle 9i, SQL, PL/SQL, SQL*Plus, Forms 6i, Reports 6i, Windows.
