
Data Scientist Resume


WI

SUMMARY

  • 8+ years of experience in Machine Learning, Data Mining with large structured and unstructured data sets, Data Acquisition, Data Validation, Predictive Modeling, and Data Visualization.
  • Adept in statistical programming languages such as R and Python, as well as Big Data technologies such as Hadoop and Teradata, with strong domain expertise in multiple industries.
  • Expertise in Applied Statistics, Exploratory Data Analysis, and Visualization using tools such as R and Tableau, along with various Hadoop environments.
  • Hands-on experience in supervised machine learning for Regression and Classification, and unsupervised learning using K-means and Hierarchical clustering (see the K-means sketch after this list).
  • Data extraction, cleansing, analysis, and reporting using SAS, SQL, and advanced Excel. Analyzed and developed analytical reports for monthly risk management presentations to senior management. Maintained and enhanced the reports and dashboards.
  • Strong knowledge in building Data Integration, Workflow Solutions, and Extract, Transform and Load (ETL) solutions for data warehousing using SQL Server Integration Services (SSIS) and Talend.
  • Experience in creating and implementing interactive charts, graphs, and other user interface elements.
  • Strong experience in writing complex SQL queries using T-SQL; involved in coding and pulling data from various tables.
  • Hands-on experience in writing sub-queries involving multiple joins and creating temporary tables and views, which helped in generating reports and graphs in Tableau.
  • Worked on all activities related to the development, implementation, administration, and support of ETL processes for large-scale Data Warehouses using SQL Server Integration Services (SSIS).
  • Proficient in the design and development of various dashboards and reports utilizing Tableau visualizations according to end-user requirements.
  • Good experience with Hadoop tools like MapReduce, Hive, and HBase.
  • Experience in all stages of Software Development Life Cycle (SDLC) including unit testing and system integration.
  • Experience in creating ETL mappings in SSIS to load data from different sources into different target databases, along with data analysis, data profiling, data transformation, data mapping, data cleansing, and data quality.
  • Good experience in using HBase for storing large volumes of data on top of HDFS and for random, real-time data access in HDFS.
  • Good knowledge in acquiring data using ETL builds from Audible and Amazon Data Warehousing (EC2/DMART/Redshift).
  • Expertise in writing complex database SQL queries with a focus on PostgreSQL and Redshift.
  • Proficient working knowledge in managing the entire data science project life cycle using Agile methodology; actively involved in all phases of the project life cycle, including data acquisition, data cleaning, and data engineering.
  • Performed joins, group-by, and other operations in MapReduce using Java or Pig Latin.
  • Strong testing and validation experience for various predictive and statistical models using ROC plots, K-fold cross-validation, and data visualization (see the validation sketch after this list).
  • Extensive experience in Text Analytics, developing Statistical Machine Learning and Data Mining solutions to various business problems in R and Python, and generating data visualizations using ggplot, Matplotlib, and Informatica.
  • Expert in programming Tables, Views, Stored Procedures, Constraints, Indexes, Temporary Tables, and Triggers to optimize retrieval.
  • Good awareness of writing complex queries using PL/SQL and T-SQL for extracting data from Oracle and MS SQL databases.
  • Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
  • Excellent skills in data parsing, data manipulation, and data modeling with methods including describing data contents, computing descriptive statistics, RegEx, split and combine, remap, merge, subset, melt, and reshape (see the reshaping sketch after this list).
  • Expert in creating and modifying different types of ad-hoc reports, including parameterized, linked, drill-down, drill-through, sub-report, matrix, tabular, and dashboard reports, as well as reports with complex join queries in SSRS.
  • Knowledge of networking protocols and technologies, web services, and version control tools such as Git and SVN.
  • Knowledge of working with Proofs of Concept (PoCs) and gap analysis; gathered necessary data for analysis from various sources and prepared data for exploration using data munging.
  • Good industry knowledge, analytical and problem-solving skills, and the ability to work well within a team as well as individually. Highly creative, innovative, committed, intellectually curious, and business-savvy, with effective communication and interpersonal skills.
  • Experienced in creating Test Cases, performing Manual Testing, identifying and tracking defects.
  • Comfortable interacting with business and end users and documenting the development process.
  • Skilled in documenting, tracking, and managing issues/risks; supported project metrics analysis, team communication, risk analysis, report generation, and documentation control.
  • Good knowledge of integrating complex Regular Expressions (RegEx) for extracting information into the workflow of the process.
  • Experience working with various operating systems such as Linux (RHEL, Ubuntu, CentOS) and Unix, along with Windows.
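
As context for the clustering experience above, a minimal K-means sketch in Python with scikit-learn; the synthetic features and the choice of four clusters are illustrative assumptions, not values from any project described here.

```python
# Minimal K-means sketch (feature data and k are illustrative assumptions)
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))                # stand-in for customer features

X_scaled = StandardScaler().fit_transform(X) # scale so no feature dominates
km = KMeans(n_clusters=4, n_init=10, random_state=42).fit(X_scaled)

print(km.labels_[:10])      # cluster assignment per row
print(km.cluster_centers_)  # centroids in scaled feature space
```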
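A validation sketch matching the ROC and K-fold bullet above: a 5-fold cross-validated AUC plus ROC curve points on a held-out split. The data is synthetic and the model choice (logistic regression) is an assumption for illustration.

```python
# Validation sketch: K-fold cross-validated AUC and ROC points (synthetic data)
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import (StratifiedKFold, cross_val_score,
                                     train_test_split)

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000)

# 5-fold cross-validated AUC
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print("CV AUC: %.3f +/- %.3f" % (scores.mean(), scores.std()))

# ROC curve points on a held-out split
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
probs = model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
fpr, tpr, _ = roc_curve(y_te, probs)   # coordinates for an ROC plot
print("held-out AUC:", roc_auc_score(y_te, probs))
```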
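A short pandas sketch of the melt/reshape operations listed above; the account and quarter columns are invented purely for illustration.

```python
# Reshaping sketch: melt a wide table to long form, then pivot it back
import pandas as pd

wide = pd.DataFrame({
    "account":  ["A1", "A2"],
    "q1_sales": [100, 250],
    "q2_sales": [120, 240],
})

# melt: one row per (account, quarter) pair
long = wide.melt(id_vars="account", var_name="quarter", value_name="sales")
print(long)

# reshape back to wide form with pivot
print(long.pivot(index="account", columns="quarter", values="sales"))
```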

TECHNICAL SKILLS

Reporting: Access, SSRS, Crystal Report, Excel, SQL*Plus

Programming Languages: Python, R, SAS, PL/SQL, MDX, T-SQL

Scripting Languages: Python, Linux Bash, Apache Pig, Hive, Sqoop

Operating Systems: Windows, Linux (CentOS, Ubuntu, RHEL), and Unix

DWH/BI Tools: SQL Server, Amazon Redshift, Teradata, Tableau, Informatica

Databases: MS SQL Server 2014/2012/2008 R2/2005, SQL Server …, Oracle 9i/10g/11g, MS Access 2007/2013, Hive, MySQL, Amazon Redshift, PostgreSQL

NoSQL Databases: HBase, Cassandra, MongoDB, Amazon DynamoDB, Redis

Tools/Utilities: Toad, Tableau, SQL Developer, PuTTY, SQL Profiler, Query Analyzer

Other Tools: MS Visual Studio, MS Office, MS Project, MS Visio, MS PowerPoint, MS Lync, SQL*Loader, QuickTest Professional and Quality Center (QC), JIRA

ETL Tools: Informatica, SSIS, Talend

PROFESSIONAL EXPERIENCE

Confidential - WI

Data Scientist

Responsibilities:

  • Performed Data Profiling to learn about user behavior and merged data from multiple data sources.
  • Participated in all phases of data mining: data collection, data cleaning, developing models, validation, and visualization; also performed gap analysis.
  • Performed K-means clustering, Multivariate analysis and Support Vector Machines in Python and R.
  • Developed Clustering algorithms and Support Vector Machines that improved Customer segmentation and Market Expansion.
  • Created new SSIS ETL packages to validate, extract, transform, and load data into data warehouse and data mart databases.
  • Data storyteller; mined data from different data sources such as SQL Server, Oracle, Cube databases, Web Analytics, Business Objects, and Hadoop.
  • Provided ad-hoc analysis and reports to the executive-level management team.
  • Performed data manipulation and aggregation from different sources using Toad and Tableau.
  • Designed, implemented, and maintained database models in RDBMS.
  • Implemented analytics delivery on cloud-based visualization platforms such as Amazon Web Services.
  • Single point of contact (SPOC) Data Scientist and predictive analyst for creating annual and quarterly business forecast reports.
  • Main source of the business regression report.
  • Involved in creating Hive tables, loading data, and writing Hive queries.
  • Developed multiple MapReduce programs in Java for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, and CSV.
  • Designed interactive Tableau KPI dashboards over Amazon Redshift by gathering requirements from business users and slicing the data accordingly.
  • Created various B2B predictive and descriptive analytics using R and Tableau.
  • Created and automated ad-hoc reports.
  • Experienced with technical documentation of code and processes.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance (see the Hive sketch after this list).
  • Responsible for planning & scheduling new product releases and promotional offers
  • Using SQL Server Reporting Services (SSRS), generated periodic reports based on the statistical analysis of the data.
  • Involved in data cleansing to remove unnecessary columns and eliminate redundant and inconsistent data using SSIS transformations.
  • Experienced in Agile methodologies and attended regular SCRUM meetings.
  • Experience working with different data formats such as flat files, CSV, Avro, and JSON.
  • Parsed data, producing concise conclusions from raw data in a clean, well-structured, and easily maintainable format.
  • Identified Key Performance Indicators (KPI) and Metrics for Business needs in SSAS
  • Worked on different data formats such as JSON, XML and performed machine learning algorithms in R and Python.
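
The bullets above mention designing partitioned managed and external Hive tables; below is one way that could look from Python. The PyHive client, host name, and table schema are all assumptions, and the same DDL could equally be run through beeline or the Hive CLI.

```python
# Hive sketch: a partitioned external table with partition pruning.
# Host, port, schema, and paths are hypothetical placeholders.
from pyhive import hive  # assumes the PyHive client is installed

ddl = """
CREATE EXTERNAL TABLE IF NOT EXISTS sales_events (
    event_id STRING,
    amount   DOUBLE
)
PARTITIONED BY (event_date STRING)
STORED AS ORC
LOCATION '/data/warehouse/sales_events'
"""

conn = hive.connect(host="hive-server", port=10000)  # hypothetical host
cur = conn.cursor()
cur.execute(ddl)
# Register one partition, then query only that slice (partition pruning)
cur.execute("ALTER TABLE sales_events ADD IF NOT EXISTS "
            "PARTITION (event_date='2016-01-01')")
cur.execute("SELECT count(*) FROM sales_events WHERE event_date='2016-01-01'")
print(cur.fetchone())
```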

Environment: R, Python, GitHub, SQL Server, HDFS, Pig, RDBMS, Centos, Talend, Java

Confidential - Jersey City, NJ

Data Scientist/Big Data Analyst

Responsibilities:

  • Performed Logistic Regression, Classification, Random Forests and Clustering in R and Python.
  • Worked on Developing Fraud Detection Platform using various machine learning algorithms in R and Python.
  • Used Hive to store the data and performed data cleaning steps for huge datasets.
  • Worked on Linear Discriminant Analysis, Greedy Forward Selection, Greedy Backward Selection, and feature selection/reduction algorithms such as Principal Component Analysis (PCA) and Factor Analysis (see the pipeline sketch after this list).
  • Performed system data analysis. Created, modified, documented, and enhanced data specifications, including identifying and mapping data elements from source systems into staging areas, the data warehouse, data marts, and OLAP cubes.
  • Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
  • Performed Exploratory Data Analysis and Data Visualizations using R, Python and Tableau.
  • Validated the machine learning classifiers using ROC curves and lift charts.
  • Worked on customer segmentation using an unsupervised learning technique - clustering.
  • Produced the high-level design of an ETL DTS package for integrating data from heterogeneous sources (Excel, CSV, Oracle, flat files, text-format data).
  • Designed and implemented Parameterized and cascading parameterized reports using SSRS
  • Experience working with Toad, Oracle SQL, Hadoop, and R in data analysis and data cleansing.
  • Performed Inventory Analysis wif Statistical and Data Visualization Tools.
  • Handled the reporting server and was actively involved in the creation of various types of complex reports.
  • Performed data visualizations using Tableau and R.
  • Used Hive to transform large volumes of data with respect to business requirements.
  • Performed Market Basket analysis to identify customer buying patterns, preferences and behaviors to better manage sales and inventory.
  • Performed Decision Tree Analysis and Random forests for strategic planning and forecasting.
  • Manipulated and cleaned data using the dplyr, tidyr, and sqldf packages in R.
  • Analyzed historical data to find trends in system First Allocation Fill and First Pass Fill.
  • Extensively used Xelus for planning high-cost, high-selling parts.
  • Analyzed demand data and inputs from Xelus to determine the safety stock for all parts.
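
A minimal sketch of the feature-reduction-plus-classifier pattern described above (PCA feeding a random forest) on synthetic stand-in data; the component count, class imbalance, and forest size are illustrative assumptions, not values from the engagement.

```python
# PCA + random forest pipeline on synthetic, fraud-like imbalanced data
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=2000, n_features=30,
                           weights=[0.95, 0.05],  # fraud-like imbalance
                           random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

clf = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=10)),   # reduce 30 features to 10 components
    ("rf", RandomForestClassifier(n_estimators=200, random_state=1)),
])
clf.fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```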

Environment: R, Tableau, Oracle BI, Xelus, MySQL, Hadoop, Pig, Hive, SQL Server, Python, GitHub.

Confidential - Los Angeles, CA

Big Data Analyst

Responsibilities:

  • Implemented end-to-end systems for Data Analytics and Data Automation, integrated with custom visualization tools using R, Mahout, Hadoop, and MongoDB.
  • Performed Market-Basket Analysis and implemented decision trees, random forests, and K-fold cross-validation.
  • Responsible for data identification, collection, exploration, and cleaning for modeling; participated in model development.
  • Facilitated ETL on large data sets utilizing Pig/Hive/HBase on the Hadoop ecosystem.
  • Visualized, interpreted, and reported findings, and developed strategic uses of data.
  • Analyzed transaction data and developed analytics insights with statistical models using Azure Machine Learning.
  • Selected statistical algorithms (Random Forest, SVM, Bagged CART, GBM, etc.).
  • Interacted with other departments to understand and identify data needs and requirements, and worked with other members of the IT organization to deliver data visualization and reporting solutions addressing those needs.
  • Built tools and analyzed, captured, and reported Business Intelligence metrics at various stages of the customer journey across the marketing funnel.
  • Responsible for providing reporting, analysis and insightful recommendations to business leaders on key performance metrics pertaining to sales & marketing
  • Strong knowledge of other database platforms such as Oracle, DB2, IMS-DB, and NoSQL (Elasticsearch).
  • Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java/Pig for data cleansing and preprocessing.
  • Used Decision trees and Random forests to find employee attrition rate.
  • Worked on time series forecasting in R to meet consumer demand through frequent tracking of inventory (see the forecasting sketch after this list).
  • Converted raw data to processed data by merging it and finding outliers, errors, trends, missing values, and distributions in the data.
  • Accomplished Data analysis, Statistical analysis, generated reports, listings and graphs.
  • Provided quantitative support for decision-making to the management team; responsible for methodology development for real-time data processing, using data mining, machine learning, and statistical analysis on Hadoop and big data analytics.
  • Developed sampling methods to better analyze big data, such as shipment data and real-time captures of inventory data.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Implemented advanced stochastic method based on Hierarchical Bayesian Networks to estimate and forecast travel demand from multiple data sources.
  • Conducted comprehensive scenario analysis to improve business procedures and area planning, and optimized freight network design.
  • Implemented quantitative data analysis to understand multimodal freight behavior in port area.
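
The time series work above was done in R; as a hedged equivalent, here is a Python sketch using a statsmodels ARIMA model on synthetic monthly demand data. The series, model order, and forecast horizon are all assumptions for illustration.

```python
# Forecasting sketch: ARIMA on synthetic monthly demand (statsmodels)
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
idx = pd.date_range("2015-01-01", periods=36, freq="MS")  # month starts
demand = pd.Series(100 + np.arange(36) * 2 + rng.normal(0, 5, 36), index=idx)

model = ARIMA(demand, order=(1, 1, 1)).fit()
print(model.forecast(steps=6))   # next six months of forecast demand
```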

Environment: R, Python, Mahout, MySQL, Hadoop, Pig, MongoDB, Oracle, DB2, Java

Confidential

Data Analyst

Responsibilities:

  • Analyzed business requirements, system requirements, data mapping requirement specifications, and responsible for documenting functional requirements and supplementary requirements in Quality Center 9.0
  • Involved in developing detailed test plan, test cases and test scripts using Quality Center for Functional and Regression Testing.
  • Handled external interactions with NSCC, DTCC, NFS, and various fund companies with which MMLISI is associated.
  • Involved in Data mapping specifications to create and execute detailed system test plans.
  • The data mapping specifies what data will be extracted from an internal data warehouse, transformed, and sent to an external entity.
  • Tested the reports using Business Objects functionalities like Queries, Slice and Dice, Drill Down, Cross Tab, Master Detail, and Formulae.
  • Involved in Teradata SQL Development, Unit Testing and Performance Tuning
  • Tested Complex ETL Mappings and Sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables to target tables.
  • Tested the ETL Informatica mappings and other ETL processes (Data Warehouse Testing).
  • Tested several stored procedures.
  • Validated several Business Objects reports. Reviewed and tested business requests for data and data usage.
  • Tested the ETL process both before and after the data validation process.
  • Tested the messages published by the ETL tool and the data loaded into various databases.
  • Responsible for data mapping testing by writing complex SQL queries using WINSQL (see the reconciliation sketch after this list).
  • Experience in creating UNIX scripts for file transfer and file manipulation.
  • Validated the data passed to downstream systems.
  • Worked with Data Extraction, Transformation, and Loading (ETL).
  • Involved in testing data mapping and conversion in a server based data warehouse.
  • Involved in testing the UI applications.
  • Involved in Security testing for different LDAP roles.
  • Tested whether the reports developed in Business Objects conform to company standards.
  • Used Quality Center to track and report system defects
  • Involved in testing the XML files and checked whether data was parsed and loaded into staging tables.
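
One common shape for the data mapping tests described above is a source-versus-target reconciliation query; below is a minimal Python sketch of that idea using pyodbc. The DSNs and table names are hypothetical placeholders, and the original work used WINSQL rather than Python.

```python
# Reconciliation sketch: compare source and target row counts for one mapping
import pyodbc

SRC = "DSN=source_dw"    # hypothetical ODBC DSNs
TGT = "DSN=target_mart"

def row_count(conn_str, table):
    """Return SELECT COUNT(*) for one table."""
    with pyodbc.connect(conn_str) as conn:
        return conn.cursor().execute(f"SELECT COUNT(*) FROM {table}").fetchval()

src_n = row_count(SRC, "stg.customer")   # hypothetical staging table
tgt_n = row_count(TGT, "dim.customer")   # hypothetical target table
assert src_n == tgt_n, f"row count mismatch: source={src_n} target={tgt_n}"
```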

Environment: Informatica 8.1/7.1, Informix, DB2, Java, Business Objects, SQL, SQL Server 2000/2005, Teradata V2R6 (MLOAD, FLOAD, FAST EXPORT, BTEQ), Teradata SQL Assistant 7.0, Toad, XML, XSLT, IBM AIX 5.3, UNIX, Shell Scripting, WINSQL, UltraEdit, Rumba UNIX Display, Quality Center

Confidential

Data Analyst

Responsibilities:

  • Analyzed the bank's data and business terms from a data quality and integrity perspective.
  • Performed root cause analysis on smaller, self-contained data analysis tasks related to assigned data processes.
  • Wrote various SQL statements to generate data for analysis.
  • Worked to ensure high levels of data consistency between diverse source systems, including flat files, XML, and SQL databases.
  • Developed and ran ad-hoc data queries against multiple database types to identify systems of record, data inconsistencies, and data quality issues.
  • Performed daily data quality checks and data profiling for accurate and better reporting and analysis.
  • Involved in translating the business requirements into data requirements across different systems.
  • Involved in understanding customer needs with regard to data, documenting requirements, writing complex SQL statements to extract the data, and packaging data for delivery to customers.
  • Participated in the development of enhancements for the current Commercial and Mortgage Securities.
  • Wrote SQL stored procedures and views and performed in-depth testing of new and existing systems.
  • Manipulated and prepared data, extracting data from the database for business analysts using Tableau.
  • Reviewed normalized schemas for effective and optimal performance tuning of queries and data validations in OLTP and OLAP environments.
  • Exploited the power of MS SQL to solve complex business problems through data analysis on large data sets.
  • Used Tableau on T-SQL-queried data for data analysis, generating reports, graphics, and statistical analyses.
  • Developed reports using advanced SQL techniques such as RANK, ROW_NUMBER, etc.
  • Analyzed data using Tableau for automation and determined business data trends.
  • Transferred data objects and queries from MS Excel to SQL Server (see the transfer sketch after this list).
  • Assisted the ETL team in defining Source-to-Target Mappings.
  • Compiled and generated reports in a presentable format for the project team.
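
A minimal sketch of the Excel-to-SQL Server transfer mentioned above, using pandas with SQLAlchemy; the workbook name, target table, and connection URL are assumptions rather than details from the engagement.

```python
# Excel-to-SQL Server transfer sketch (connection details are hypothetical)
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine(
    "mssql+pyodbc://user:password@dbserver/reporting"
    "?driver=ODBC+Driver+17+for+SQL+Server"   # hypothetical server and creds
)

df = pd.read_excel("monthly_report.xlsx", sheet_name="data")
df.to_sql("monthly_report", engine, if_exists="replace", index=False)
print(f"loaded {len(df)} rows into monthly_report")
```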

Environment: SQL Server, Tableau, MS Excel, XML, T-SQL
