
Data Scientist Resume


Chicago, IL

PROFESSIONAL SUMMARY:

  • Over 8 years of experience in Machine Learning, Statistical Modeling, Predictive Modeling, Data Analytics, Data Modeling, Data Architecture, Data Analysis, Data Mining, Text Mining and Natural Language Processing (NLP), Artificial Intelligence algorithms, Business Intelligence and analytics models (such as Decision Trees and Linear and Logistic Regression), using Hadoop (Hive, Pig), R, Python, Spark, Scala, MS Excel, SQL, PostgreSQL and Erwin.
  • Extensive experience with Hortonworks Hadoop, HDFS architecture, R, Python, Jupyter, pandas, NumPy, scikit-learn, Matplotlib, PyHive, Keras, Hive, NoSQL (HBase), Sqoop, Pig, MapReduce, Oozie and Spark MLlib.
  • Hands-on experience with Linear and Logistic Regression, K-Means Cluster Analysis, Decision Trees, KNN, SVM, Random Forest, Market Basket Analysis, NLTK/Naïve Bayes, Sentiment Analysis, Text Mining/Text Analytics and Time Series Forecasting.
  • Extensive experience with business intelligence (BI) tools and technologies such as OLAP, data warehousing, reporting and querying tools, data mining and spreadsheets.
  • Worked with a variety of Python modules such as requests, boto, flake8, Flask, mock and nose.
  • Proficient in developing logical and physical data models and organizing data according to business requirements using Sybase PowerDesigner, Erwin and ER/Studio for both OLTP and OLAP applications.
  • Strong understanding of when to use an ODS, a data mart or a data warehouse.
  • Experienced in using R, MATLAB, SAS, Tableau and SQL for data cleaning, data visualization, risk analysis and predictive analytics.
  • Adept at using the SAS Enterprise suite, R, Python and Big Data technologies including Hadoop, Hive, Pig, Sqoop, Cassandra, Oozie, Flume, MapReduce and Cloudera Manager to design business intelligence applications.
  • Able to provide wing-to-wing analytic support, including pulling data, preparing analyses, interpreting data, making strategic recommendations and presenting to client/product teams.
  • Hands-on experience with Machine Learning, Regression Analysis, Clustering, Boosting, Classification, Principal Component Analysis and data visualization tools.
  • Strong programming skills in a variety of languages such as Python and SQL
  • Familiar with Crystal Reports and SSRS for query, reporting, analysis and Enterprise Information Management.
  • Excellent knowledge of creating reports in Pentaho Business Intelligence.
  • Experienced with databases and data platforms including Oracle, XML, DB2, Teradata 15/14, Netezza, SQL Server, Big Data and NoSQL.
  • Worked with engineering teams to integrate algorithms and data into Return Path solutions
  • Worked closely with other data scientists to create data driven products
  • Strong experience in statistical modeling/machine learning and visualization tools.
  • Proficient in Hadoop, HDFS, Hive, MapReduce, Pig and NoSQL databases such as MongoDB, HBase and Cassandra; experienced in applying data mining and optimization techniques in B2B and B2C industries, along with machine learning, data/text mining, statistical analysis and predictive modeling.
  • Experienced in data modeling and data analysis using dimensional and relational data modeling, star/snowflake schema modeling, fact and dimension tables, and physical and logical data modeling.
  • Expertise in SAS DATA step, PROC step, SQL, ETL, data mining, SAS Macro, SAS/ACCESS, SAS/STAT, SAS/GRAPH, SAS DI Studio, SAS BI Platform, SAS Web Report Studio, SAS BI Dashboard, SAS Stored Process, SAS Management Console, Enterprise Guide, Enterprise Miner, SAS VA and ODS, and procedures such as PROC SQL, IMPORT, EXPORT, MEANS, SUMMARY, FREQ, TABULATE, REPORT, UNIVARIATE, APPEND, PRINT, SORT, TRANSPOSE, FORMAT, GLM, CORR, FACTOR, RANK, REG, LOGISTIC and BOXPLOT, as well as t-tests, chi-square tests, ANOVA, ARIMA and ARMA.

TECHNICAL SKILLS:

Data Modeling Tools: Erwin r9.6, 9.5, 9.1, 8.x, Rational Rose, ER/Studio, MS Visio, SAP PowerDesigner.

Programming Languages: C/C++, C#, Java, Oracle PL/SQL, Python, SQL, T-SQL, UNIX shell scripting, Bash, HTML5.

Scripting Languages: Python (NumPy, SciPy, Pandas, Gensim, Keras), R (Caret, Weka, ggplot), XML, JSON

Big Data Technologies: Hadoop, Hive, HDFS, MapReduce, Pig, Kafka, Spark, HBase.

Reporting Tools: Crystal Reports XI, Business Intelligence, SSRS, Business Objects 5.x/6.x, Cognos 7.0/6.0, Tableau.

ETL: Informatica PowerCenter, SSIS.

BI Tools: Tableau, Tableau Server, Tableau Reader, SAP Business Objects, OBIEE, QlikView, SAP Business Intelligence, Amazon Redshift, Azure Data Warehouse.

Tools: MS-Office suite (Word, Excel, MS Project and Outlook), Spark MLlib, Scala NLP, MariaDB, Azure, SAS.

Data Modeling Tools: Erwin, Sybase PowerDesigner, ER Studio, Enterprise Architect, Oracle Designer, MS Visio.

Operating Systems: Windows, UNIX, MS DOS, Sun Solaris.

Databases: Oracle, Teradata, Netezza, Microsoft SQL Server, MySQL, MongoDB, HBase, Cassandra.

Project Execution Methodologies: Ralph Kimball and Bill Inmon data warehousing methodology, Rational Unified Process (RUP), Rapid Application Development (RAD), Joint Application Development (JAD)

PROFESSIONAL EXPERIENCE:

Confidential, Chicago, IL

Data Scientist

Responsibilities:

  • Built data pipelines for reporting, alerting and data mining. Experienced with table design and data management using HDFS, Hive, Impala, Sqoop, MySQL, MemSQL, Grafana/InfluxDB and Kafka.
  • Applied statistical models, machine learning approaches, and recommendation and optimization algorithms to data analysis and predictive modeling.
  • Worked on business and data analysis, data profiling, data migration, data integration and metadata management services.
  • Worked extensively with databases, primarily Oracle 11g/12c, writing PL/SQL scripts for multiple purposes.
  • Built models using statistical techniques such as Bayesian HMM and machine learning classification models such as XGBoost, SVM and Random Forest using R and Python packages.
  • Worked with the data compliance and data governance teams to maintain data models, metadata and data dictionaries, and to define source fields and their definitions.
  • Worked with Big Data technologies such as Hadoop, Hive and MapReduce.
  • Developed MapReduce/Spark Python modules for machine learning & predictive analytics in Hadoop on AWS. Implemented a Python-based distributed random forest via Python streaming.
  • Set up storage and data analysis tools on Amazon Web Services cloud computing infrastructure.
  • Participated in a highly immersive data science program covering data manipulation and visualization, web scraping, machine learning, Python programming, SQL, Git, Unix commands, NoSQL, MongoDB and Hadoop.
  • Performed scoring and financial forecasting for collection priorities using Python, R and SAS machine learning algorithms.
  • Imported data from various data sources, performed transformations using Hive and MapReduce, and loaded the data into HDFS.
  • Managed existing team members and led the recruiting and onboarding of a larger data science team addressing analytical knowledge requirements.
  • Worked directly with senior executives to define the requirements of scoring models.
  • Developed a model for predicting repayment of debt owed to small and medium enterprise (SME) businesses.
  • Developed a generic model for predicting repayment of debt owed in the healthcare, large commercial, and government sectors.
  • Created SQL scripts, analyzed the data in MS Access/Excel, and worked on SQL and SAS script mapping.
  • Developed a legal model for predicting which debtors respond only to litigation.
  • Created multiple dynamic scoring strategies that adjust the score based on consumer behavior such as payments or right-party phone calls.
  • Rapidly prototyped models in Python using pandas, NumPy and scikit-learn, with Plotly for data visualization. These models were then implemented in SAS, interfaced with MS SQL Server databases and scheduled to update regularly.
  • Performed data analysis using regression, data cleaning, Excel VLOOKUP, histograms and the TOAD client, and presented the analysis and suggested solutions to investors.
  • Gained solid knowledge of Hadoop data lake implementation and Hadoop architecture for client business data management.
  • Identified relevant key performance factors and tested their statistical significance.
  • The scoring models above resulted in millions of dollars of added revenue for the company and a change in priorities across the entire company.
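
The repayment-scoring bullets above describe classification models that turn account attributes into a probability-like score. As a hedged illustration only, here is a minimal NumPy logistic regression fit by batch gradient descent on synthetic data. It is not the XGBoost, SVM or random forest models named above; the two features, the function names `fit_logistic` and `score`, and the learning-rate and epoch settings are hypothetical choices for the sketch.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression weights (intercept first) by batch gradient descent."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend an intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))          # current predicted probabilities
        grad = Xb.T @ (p - y) / len(y)              # gradient of the mean log-loss
        w -= lr * grad
    return w

def score(X, w):
    """Return a repayment probability in [0, 1] for each row of X."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

# Synthetic data: two hypothetical standardized account features.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
true_w = np.array([-0.5, 2.0, -1.5])
y = (rng.random(500) < score(X, true_w)).astype(float)

w = fit_logistic(X, y)
probs = score(X, w)
accuracy = np.mean((probs >= 0.5) == y)
print(w.round(2), round(float(accuracy), 3))
```

Because the log-loss is convex, plain gradient descent is enough here; a production scorecard would normally add regularization, calibration checks and a held-out evaluation set.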

Environment: R, SQL, Python 2.7.x, SQL Server 2014, regression, logistic regression, random forest, neural networks, topic modeling, NLTK, SVM (Support Vector Machine), JSON, XML, Hive, Hadoop, Pig, scikit-learn, SciPy, GraphLab, NoSQL, SAS, SPSS, Spark, Kafka, HBase, MLlib.

Confidential, Irving, TX

Data Scientist

Responsibilities:

  • Utilized a broad variety of statistical packages and tools such as SAS, R, MLlib, Graphs, Hadoop, Spark, MapReduce and Pig.
  • Refined and trained models based on domain knowledge and customer business objectives.
  • Delivered, or collaborated on delivering, effective visualizations to support client business objectives.
  • Extensive understanding of the BI and analytics space with special focus on the consumer and customer space
  • Converted time lag problems in order fulfillment into Data mining tasks
  • Performed data profiling to assess data quality using SQL against a complex internal database.
  • Improved sales and logistics data quality through data cleaning using NumPy, SciPy and pandas in Python.
  • Built Data warehouse to support end-user queries with Oracle and MS Visual Studio
  • Designed and implemented dimensional data modeling for the order fulfillment process.
  • Deployed SSIS packages to complete the ETL and data mapping process.
  • Transformed data using methods such as aggregation, slowly changing dimensions and splitting.
  • Produced business intelligence reports for order fulfillment using MS SSAS and SSRS.
  • Determined regression model predictors using a correlation matrix for factor analysis in R.
  • Built Regression model to understand order fulfillment time lag issue using Scikit-learn in Python
  • Optimized predictive model by reducing insignificant variables using Stepwise Regression
  • Empowered decision makers with data analysis dashboards using Tableau and Power BI
  • Interfaced with other technology teams to extract, transform and load (ETL) data from a wide variety of data sources.
  • Owned the functional and non-functional scaling of software systems in the assigned area of ownership.
  • Provided input and recommendations on technical issues to BI engineers, business and data analysts, and data scientists.
  • Applied strong analytical and problem-solving skills throughout.
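
Reducing insignificant variables with stepwise regression, as described above, can be sketched as backward elimination. The bullet does not say which criterion was used; this minimal NumPy version assumes AIC (the criterion R's `step()` uses by default) rather than p-values, and the function names `ols_aic` and `backward_stepwise` and the synthetic data are hypothetical.

```python
import numpy as np

def ols_aic(X, y):
    """AIC of an OLS fit with intercept: n*log(RSS/n) + 2k (Gaussian errors, constants dropped)."""
    n = len(y)
    Xb = np.hstack([np.ones((n, 1)), X])
    beta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    rss = float(np.sum((y - Xb @ beta) ** 2))
    k = Xb.shape[1]  # number of estimated coefficients, including the intercept
    return n * np.log(rss / n) + 2 * k

def backward_stepwise(X, y):
    """Drop the predictor whose removal lowers AIC the most; stop when no removal helps."""
    selected = list(range(X.shape[1]))
    best_aic = ols_aic(X[:, selected], y)
    while len(selected) > 1:  # this sketch always keeps at least one predictor
        candidates = []
        for j in selected:
            remaining = [c for c in selected if c != j]
            candidates.append((ols_aic(X[:, remaining], y), j))
        aic, drop = min(candidates)
        if aic >= best_aic:
            break
        best_aic = aic
        selected.remove(drop)
    return selected, best_aic

# Synthetic example: only columns 0 and 2 actually drive y.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 5))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(size=300)
selected, aic = backward_stepwise(X, y)
print(selected)
```

AIC trades fit against model size, so a noise column is removed only when the fit it buys is smaller than its 2-point penalty; a p-value threshold would be an equally common alternative.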

Environment: Python, Hive, C/C++, C#, Java, Bash, HTML5, Perl, Processing, jQuery, SoapUI, WCF, WPF, VSO, TFS, Git, XML, XSD, SQL Server 2008, Oracle 10g/11g.

Confidential, Plano, TX

Data Scientist

Responsibilities:

  • Supported MapReduce programs running on the cluster.
  • Evaluated business requirements and prepared detailed specifications, following project guidelines, for the programs to be developed.
  • Configured the Hadoop cluster with the NameNode and slave nodes and formatted HDFS.
  • Used Oozie workflow engine to run multiple Hive and Pig jobs.
  • Ran MapReduce programs on the cluster.
  • Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Analyzed the partitioned and bucketed data and computed various metrics for reporting.
  • Involved in loading data from RDBMS and web logs into HDFS using Sqoop and Flume.
  • Worked on loading the data from MySQL to HBase where necessary using Sqoop.
  • Developed Hive queries for analysis across different banners.
  • Extracted data from Twitter using Java and the Twitter API; parsed the JSON-formatted Twitter data and uploaded it to the database.
  • Launched Amazon EC2 cloud instances using Amazon Machine Images (Linux/Ubuntu) and configured the launched instances for specific applications.
  • Exported the result set from Hive to MySQL using Sqoop after processing the data.
  • Analyzed the data by running Hive queries and Pig scripts to study customer behavior.
  • Worked hands-on with SequenceFile, Avro and HAR file formats and compression.
  • Used Hive to partition and bucket data.
  • Wrote MapReduce programs with the Java API to cleanse structured and unstructured data.
  • Wrote Pig Scripts to perform ETL procedures on the data in HDFS.
  • Created HBase tables to store data in various formats coming from different portfolios.
  • Improved the performance of existing Pig scripts and Hive queries.
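
The MapReduce cleaning jobs above follow the map, shuffle/sort, reduce pattern. To keep all examples in one language, this sketch uses Python in the style of a Hadoop Streaming job (tab-separated key/value lines) rather than the Java API the bullets mention. The record layout (`user_id,event,amount`), the cleaning rules and the function names are hypothetical, and the shuffle is simulated locally with `sorted`.

```python
from itertools import groupby

def mapper(lines):
    """Parse raw CSV records, drop malformed rows, emit 'key<TAB>value' lines."""
    for line in lines:
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3 or not parts[0]:
            continue  # wrong field count or missing key: discard the record
        user_id, _event, amount = parts
        try:
            value = float(amount)
        except ValueError:
            continue  # non-numeric amount: discard the record
        yield f"{user_id.lower()}\t{value}"  # normalize key case

def reducer(sorted_lines):
    """Sum values per key; input must be grouped by key, as Hadoop's shuffle guarantees."""
    pairs = (line.split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(float(v) for _, v in group)
        yield f"{key}\t{total}"

def run_local(lines):
    """Simulate map -> shuffle/sort -> reduce in a single process."""
    return list(reducer(sorted(mapper(lines))))

raw = [
    "U1,click,2.5",
    "u1,purchase,10",
    "U2,click,abc",   # bad amount
    ",click,1.0",     # missing key
    "U2,purchase,4",
    "garbage line",   # wrong field count
]
result = run_local(raw)
print(result)
```

In an actual Hadoop Streaming job, the mapper and reducer would live in separate scripts that read `sys.stdin` and print to stdout, with Hadoop performing the sort between them.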

Environment: SQL Server, Oracle 9i, MS Office, Teradata, Informatica, ER Studio, XML, Business Objects, deep learning approaches.

Confidential, Fort Lauderdale, FL

Data Analyst/Modeler

Responsibilities:

  • Participated in JAD sessions, gathered information from Business Analysts, end users and other stakeholders to determine the requirements.
  • Developed logical and physical data models capturing current-state and future-state data elements and data flows using ER Studio.
  • Applied data warehousing methodologies and dimensional data modeling techniques such as star and snowflake schemas using Erwin 9.1.
  • Extensively used Aginity Workbench for Netezza to perform DDL, DML and other operations on the Netezza database.
  • Designed the Data Warehouse and MDM hub Conceptual, Logical and Physical data models.
  • Performed daily monitoring of Oracle instances using Oracle Enterprise Manager, ADDM and TOAD, monitoring users, tablespaces, memory structures, rollback segments, logs and alerts.
  • Involved in Teradata SQL development, unit testing and performance tuning, and ensured that testing issues were resolved based on defect reports.
  • Customized reports using SAS/MACRO facility, PROC REPORT, PROC TABULATE and PROC.
  • Translated business and data requirements into logical data models in support of Enterprise Data Models, ODS, OLAP, OLTP, operational data structures and analytical systems.
  • Performed database testing, writing complex SQL queries in SQL Developer and PL/SQL Developer to verify transactions and business logic, such as identifying duplicate rows.
  • Used Teradata SQL Assistant, Teradata Administrator, PMON and data load/export utilities such as BTEQ, FastLoad, MultiLoad, FastExport and TPump on UNIX/Windows environments, and ran batch processes for Teradata.
  • Debugged and tuned the performance of Oracle BIEE 10g/11g repositories, dashboards and reports by implementing aggregate tables, data source fragmentation, indexes and cache management.
  • Worked on data profiling and data validation to ensure the accuracy of the data between the warehouse and source systems.
  • Applied data warehouse concepts hands-on, including data warehouse architecture, star schema, snowflake schema, data marts, and dimension and fact tables.
  • Developed SQL queries to fetch complex data from different tables in remote databases using joins, database links and BULK COLLECT.
  • Migrated databases from legacy systems and SQL Server to Oracle and Netezza.
  • Reviewed the logical model with application developers, ETL Team, DBAs and testing team to provide information about the data model and business requirements.
  • Worked with the SQL Server components SSIS (SQL Server Integration Services), SSAS (Analysis Services) and SSRS (Reporting Services).
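
The duplicate-row checks mentioned above are usually a `GROUP BY ... HAVING COUNT(*) > 1` query. This sketch runs such a query against an in-memory SQLite database through Python's standard `sqlite3` module so that it is self-contained; the `transactions` table, its columns and the sample rows are hypothetical, and the same query shape also works in Oracle and Teradata.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (txn_id INTEGER, account TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [(1, "A", 10.0), (2, "B", 5.0), (1, "A", 10.0), (3, "C", 7.5), (1, "A", 10.0)],
)

# Every full-row combination that occurs more than once, with its occurrence count.
dup_query = """
    SELECT txn_id, account, amount, COUNT(*) AS n
    FROM transactions
    GROUP BY txn_id, account, amount
    HAVING COUNT(*) > 1
"""
duplicates = conn.execute(dup_query).fetchall()
print(duplicates)
conn.close()
```

Grouping by every column flags exact duplicates; grouping by a business key alone (for example, only `txn_id`) would instead flag conflicting records that share a key.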

Environment: ER Studio, OBIEE 11.1.1.6, Teradata 13.1, SQL, PL/SQL, BTEQ, DB2, Oracle, MDM, Netezza, ETL, RTF, UNIX, SQL Server 2010, Informatica, SSRS, SSIS, SSAS, SAS, Aginity.

Confidential

Data Analyst/Data Modeler

Responsibilities:

  • Worked with project team representatives to ensure that logical and physical ER/Studio data models were developed in line with corporate standards and guidelines.
  • Involved in defining the source to target data mappings, business rules, data definitions.
  • Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
  • Responsible for defining the key identifiers for each mapping/interface.
  • Responsible for defining the functional requirement documents for each source to target interface.
  • Documented the complete process flow, describing program development, logic, testing, implementation, application integration and coding.
  • Involved in defining the business/transformation rules applied for sales and service data.
  • Defined the list codes and code conversions between the source systems and the data mart.
  • Worked with internal architects, assisting in the development of current-state and target-state data architectures.
  • Coordinated with business users to design new reports effectively and efficiently, based on user needs and existing functionality.
  • Remained knowledgeable in all areas of business operations in order to identify system needs and requirements.
  • Implemented the metadata repository, data quality maintenance, data cleanup procedures, transformations, data standards, a data governance program, scripts, stored procedures and triggers, and executed test plans.
  • Performed data quality tasks in Talend Open Studio.
  • Coordinated meetings with vendors to define requirements and system interaction agreement documentation between client and vendor system.
  • Updated the Enterprise Metadata Library with any changes.
  • Generated weekly and monthly asset inventory reports.

Environment: Erwin r7.0, SQL Server 2000/2005, Windows XP/NT/2000, Oracle 8i/9i, MS-DTS, UML, UAT, SQL Loader, OOD, OLTP, PL/SQL, MS Visio, Informatica.

Confidential

Data Analyst

Responsibilities:

  • Analyzed business information requirements and modeled class diagrams and/or conceptual domain models.
  • Gathered and reviewed customer information requirements for OLAP and for building the data mart.
  • Performed document analysis, creating use cases and use case narratives in Microsoft Visio to present the gathered requirements.
  • Calculated and analyzed claims data for provider incentive and supplemental benefit analysis using Microsoft Access and Oracle SQL.
  • Analyzed business process workflows and assisted in the development of ETL procedures for mapping data from source to target systems.
  • Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
  • Responsible for defining the key identifiers for each mapping/interface
  • Responsible for defining the functional requirement documents for each source to target interface.
  • Coordinated meetings with vendors to define requirements and system interaction agreement documentation between client and vendor system.
  • Updated the Enterprise Metadata Library with any changes.
  • Documented data quality and traceability for each source interface.
  • Established standard procedures.
  • Generated weekly and monthly asset inventory reports.
  • Managed the project requirements, documents and use cases using IBM Rational RequisitePro.
  • Assisted in building an integrated logical data design and proposed a physical database design for building the data mart.
  • Documented all data mapping and transformation processes in the functional design documents based on the business requirements.

Environment: SQL Server 2008 R2/2005 Enterprise, SSRS, SSIS, Crystal Reports, Windows Enterprise Server 2000, DTS, SQL Profiler and Query Analyzer.
