
Sr. Data Scientist Resume


Reston, VA

SUMMARY:

  • Over 9 years of experience working as a Data Scientist/Data Architect/Data Analyst/Data Modeler, with emphasis on data mapping and data validation in data warehousing environments.
  • Extensive experience with business intelligence (BI) tools such as OLAP, data warehousing, reporting and querying tools, data mining, and spreadsheets.
  • Worked with Python modules such as requests, boto, flake8, flask, mock, and nose.
  • Efficient in developing logical and physical data models and organizing data per business requirements using Sybase PowerDesigner, Erwin, and ER/Studio in both OLTP and OLAP applications.
  • Strong understanding of when to use an ODS, a data mart, or a data warehouse.
  • Experienced in employing R, MATLAB, SAS, Tableau, and SQL for data cleaning, data visualization, risk analysis, and predictive analytics.
  • Adept at using the SAS Enterprise suite, R, Python, and Big Data technologies including Hadoop, Hive, Pig, Sqoop, Cassandra, Oozie, Flume, MapReduce, and Cloudera Manager for the design of business intelligence applications.
  • Able to provide wing-to-wing analytic support, including pulling data, preparing analyses, interpreting data, making strategic recommendations, and presenting to client/product teams.
  • Hands-on experience with machine learning, regression analysis, clustering, boosting, classification, principal component analysis, and data visualization tools (see the sketch after this summary).
  • Strong programming skills in languages such as Python and SQL.
  • Familiarity with Crystal Reports and SSRS: query, reporting, analysis, and enterprise information management.
  • Excellent knowledge of creating reports in Pentaho Business Intelligence.
  • Experienced with databases including Oracle, DB2, Teradata 15/14, Netezza, SQL Server, Big Data, and NoSQL stores, as well as XML data.
  • Worked with engineering teams to integrate algorithms and data into Return Path solutions
  • Worked closely with other data scientists to create data-driven products.
  • Strong experience in statistical modeling/machine learning and visualization tools.
  • Experienced in working with large-scale data sets.
  • Expert at full SDLC processes involving requirements gathering, source data analysis, data model creation, source-to-target data mapping, DDL generation, and performance tuning of data models.
  • Extensively used Agile methodology as the organizational standard to implement data models.
  • Experienced with machine learning tools and libraries such as scikit-learn, R, Spark, and Weka.
  • Experienced working with large, real-world data: big, messy, incomplete, and full of errors.
  • Hands-on experience with NLP and mining of structured, semi-structured, and unstructured data.
  • Experienced with, and in-depth knowledge of, SAS Enterprise Miner and the Python programming language.
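
The following is a minimal, hypothetical sketch of the kind of scikit-learn workflow referenced above (PCA for dimensionality reduction feeding a classifier); the data is synthetic and all parameters are invented for illustration:

    # Minimal sketch of a PCA + classification pipeline in scikit-learn.
    # Data and parameters are synthetic/hypothetical, for illustration only.
    from sklearn.datasets import make_classification
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline

    # Synthetic stand-in for a cleaned feature matrix.
    X, y = make_classification(n_samples=1_000, n_features=20, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    model = Pipeline([
        ("pca", PCA(n_components=10)),                 # dimensionality reduction
        ("clf", LogisticRegression(max_iter=1_000)),   # classification
    ])
    model.fit(X_train, y_train)
    print(f"holdout accuracy: {model.score(X_test, y_test):.3f}")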

TECHNICAL SKILLS:

Data Modeling Tools: Erwin r9.6/9.5/9.1/8.x, Rational Rose, ER/Studio, MS Visio, SAP PowerDesigner.

Programming Languages: Oracle PL/SQL, Julia, UNIX shell scripting, Java, Python (NumPy, SciPy, Pandas, Gensim, Keras), R (caret, Weka, ggplot)

Big Data Technologies: Hadoop (Hive, HDFS, MapReduce, Pig, Kafka, Spark), Cassandra, MongoDB, AWS (RDS, DynamoDB, Redshift)

Reporting Tools: Crystal Reports XI, Business Intelligence, SSRS, Business Objects 5.x/6.x

AWS Cloud: EC2, S3, RDS, DynamoDB, Kinesis, Redshift

ETL: Informatica PowerCenter, SSIS.

Project Execution Methodologies: Ralph Kimball and Bill Inmon data warehousing methodologies, Rational Unified Process (RUP), Rapid Application Development (RAD), Joint Application Development (JAD)

Operating Systems: Windows, UNIX, LINUX, Mac OS

Databases: Oracle 9i/10g/11g/12c, SQL Server, MySQL, PostgreSQL

PROFESSIONAL EXPERIENCE:

Confidential, Reston VA

Sr. Data Scientist

Responsibilities:

  • Built data pipelines for reporting, alerting, and data mining. Experienced with table design and data management using HDFS, Hive, Impala, Sqoop, MySQL, MemSQL, Grafana/InfluxDB, and Kafka.
  • Worked with statistical models for data analysis, predictive modeling, machine learning approaches, and recommendation and optimization algorithms.
  • Worked in business and data analysis, data profiling, data migration, data integration, and metadata management services.
  • Worked extensively on databases, primarily Oracle 11g/12c, writing PL/SQL scripts for multiple purposes.
  • Built models using statistical techniques such as Bayesian HMM and machine learning classification models such as XGBoost, SVM, and Random Forest, using R and Python packages.
  • Worked with data compliance and data governance teams to maintain data models, metadata, and data dictionaries, and to define source fields and their definitions.
  • Worked with Big Data technologies such as Hadoop, Hive, and MapReduce.
  • Developed MapReduce/Spark Python modules for machine learning and predictive analytics in Hadoop on AWS. Implemented a Python-based distributed random forest via Python streaming.
  • Set up storage and data analysis tools in the Amazon Web Services cloud computing infrastructure.
  • Completed a highly immersive data science program involving data manipulation and visualization, web scraping, machine learning, Python programming, SQL, Git, Unix commands, NoSQL, MongoDB, and Hadoop.
  • Performed scoring and financial forecasting for collection priorities using Python, R and SAS machine learning algorithms.
  • Handled importing data from various data sources, performed transformations using Hive, MapReduce, and loaded data into HDFS
  • Managed existing team members and led the recruiting and onboarding of a larger data science team to address analytical knowledge requirements.
  • Worked directly with upper executives to define requirements of scoring models.
  • Developed a model for predicting whether a debtor would set up a repayment rehabilitation program for student loan debt.
  • Developed a model for predicting repayment of debt owed to small and medium enterprise (SME) businesses.
  • Developed a generic model for predicting repayment of debt owed in the healthcare, large commercial, and government sectors.
  • Created SQL scripts, analyzed the data in MS Access/Excel, and worked on SQL and SAS script mapping.
  • Developed a legal model for predicting which debtors respond to litigation only.
  • Created multiple dynamic scoring strategies for adjusting the score upon consumer behavior such as payment or right-party phone call.
  • Rapidly created models in Python using pandas, numpy, sklearn, and plot.ly for data visualization (see the sketch after this list). These models were then implemented in SAS, where they interfaced with MSSQL databases and were scheduled to update on a regular basis.
  • Performed data analysis using regressions, data cleaning, Excel VLOOKUPs, histograms, and the TOAD client, and presented the analysis and suggested solutions to investors.
  • Attained good knowledge of Hadoop Data Lake implementation and Hadoop architecture for client business data management.
  • Identified relevant key performance factors and tested their statistical significance.
  • The above scoring models resulted in millions of dollars of added revenue to the company and a change in priorities across the entire company.
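
A minimal sketch of the kind of Python scoring-model workflow described above, assuming hypothetical feature names and synthetic data (the production models used proprietary collections data and were re-implemented in SAS):

    # Hypothetical sketch of a collections scoring model with a random forest.
    # Feature names, label rule, and data are invented for illustration.
    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "balance": rng.gamma(2.0, 500.0, size=5_000),       # amount owed
        "days_delinquent": rng.integers(0, 720, size=5_000),
        "prior_payments": rng.integers(0, 12, size=5_000),
        "right_party_calls": rng.integers(0, 8, size=5_000),
    })
    # Hypothetical label: did the debtor repay within the collection window?
    y = (df["prior_payments"] + df["right_party_calls"] > 6).astype(int)

    model = RandomForestClassifier(n_estimators=200, random_state=0)
    print("CV AUC:", cross_val_score(model, df, y, cv=5, scoring="roc_auc").mean())

    # Scores (repayment probabilities) would then be exported to the SAS layer.
    model.fit(df, y)
    df["score"] = model.predict_proba(df)[:, 1]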

Environment: R, SQL, Python 2.7.x, SQL Server 2014, regression, logistic regression, random forest, neural networks, topic modeling, NLTK, SVM (Support Vector Machine), JSON, XML, Hive, Hadoop, Pig, scikit-learn, SciPy, GraphLab, NoSQL, SAS, SPSS, Spark, Kafka, HBase, MLlib.

Confidential, New York City, NY

Sr. Data Analyst/Data Modeler

Responsibilities:

  • Facilitated JAD sessions for project scoping, requirements gathering & identification of business subject areas.
  • Designed the ER diagrams, logical model (relationships, cardinality, attributes, and candidate keys), and physical database (capacity planning, object creation, and aggregation strategies) for Oracle per business requirements using Erwin.
  • Walked through the Logical Data Models of all source systems for data quality analysis.
  • Deployed naming standard to the Data Model and followed company standard for Project Documentation.
  • Created Tasks, Workflows and Worklets using Workflow Manager.
  • Converted logical models into physical database models to build/generate DDL scripts.
  • Led an enterprise logical data modeling project (in third normal form) to gather data requirements for OLTP enhancements.
  • Converted third normal form ERDs into dimensional ERDs for data warehouse effort.
  • Involved in creating mapping spreadsheets that provided the Data Warehouse Development (ETL) team with source-to-target data mappings, inclusive of logical names, physical names, data types, domain definitions, and corporate metadata definitions.
  • Maintained warehouse metadata, naming standards and warehouse standards for future application development.
  • Designed and developed Informatica mappings for data loads and data cleansing.
  • Tuned Informatica Mappings for optimum performance and scheduling ETL Sessions.
  • Extensively used ETL to load data from DB2, Oracle databases.
  • Built, managed, customized ETL (Extraction, Transformation, and Loading) Mappings & workflows using Informatica workflow manager & Designer tools.
  • Extensively worked on ETL transformations: Lookup, Update Strategy, Joiner, Router, and Stored Procedure transformations.
  • Tuned performance of Informatica sessions for large data files by increasing block size, data cache size, and sequence buffer length.
  • Extensively analyzed the Ralph Kimball methodology and implemented it successfully.
  • Maintained Metadata Repository for storing table definitions, table spaces and entity definitions.

Environment: Erwin 4.0, Informatica PowerCenter 7.0/6.2 (Workflow Manager, Workflow Monitor, Worklets, Source Analyzer, Warehouse Designer, Mapping Designer, Mapplet Designer, Transformations), SQL Server 2000 (Query Analyzer, DTS, T-SQL), Oracle 9i/8i, PL/SQL, SQL*Loader, TOAD, Business Objects 6.0, Sun Solaris 2.7.

Confidential, Southfield, MI

Sr. Data Analyst/Data modeler

Responsibilities:

  • Performed data analysis and profiling of source data to better understand the sources.
  • Created ER diagrams and data flow diagrams, grouped and created the tables, validated the data, and identified PK/FK for lookup tables.
  • Created the logical data model from the conceptual model and converted it into the physical database design using Erwin.
  • Performed reverse and forward engineering of data models.
  • Conducted several Physical Data Model training sessions with the ETL Developers. Worked with them on day-to-day basis to resolve any questions on Physical Model.
  • Interacted with the database administrators and business analysts for data type and class words.
  • Conducted design sessions with business analysts and ETL developers.
  • Used Model Mart of Erwin for effective model management of sharing, dividing and reusing model information and design for productivity improvement.
  • Created 3NF business area data models with denormalized physical implementations, and performed data and information requirements analysis using Erwin.
  • Developed star and snowflake schemas when designing the logical model into the dimensional model.
  • Ensured the quality, consistency, and accuracy of data in a timely, effective, and reliable manner using data governance.
  • Involved in extensive data analysis on the Teradata and Oracle systems, querying and writing SQL in TOAD.
  • Involved in creating ETL mapping documents for data warehouse projects.
  • Used SQL joins, aggregate functions, analytical functions, group by, order by clauses and interacted with DBA and developers for query optimization and tuning.
  • Assisted the ETL team to document the transformation rules for data migration from OLTP to Warehouse environment for reporting purposes.

Environment: Oracle 11g, SQL Server 2005/2008, Teradata 14, MS Access, DataStage, Erwin r8.0, Windows XP, MS Excel, Business Objects XI, SSIS, Informatica.

Confidential, South Portland, ME

Data Analytics

Responsibilities:

  • Gathered, analyzed, documented, and translated application requirements into data models, and supported standardization of documentation and the adoption of standards and practices related to data and applications.
  • Participated in Data Acquisition with Data Engineer team to extract historical and real-time data by using Sqoop, Pig, Flume, Hive, MapReduce and HDFS.
  • Wrote user defined functions (UDFs) in Hive to manipulate strings, dates and other data.
  • Performed data cleaning, feature scaling, and feature engineering using the pandas and numpy packages in Python.
  • Applied clustering algorithms (hierarchical, K-means) using scikit-learn and SciPy.
  • Performed complex pattern recognition on automotive time series data and forecast demand using ARMA and ARIMA models, plus exponential smoothing for multivariate time series data (see the sketch after this list).
  • Delivered and communicated research results, recommendations, opportunities to the managerial and executive teams, and implemented the techniques for priority projects.
  • Designed, developed and maintained daily and monthly summary, trending and benchmark reports repository in Tableau Desktop.
  • Generated complex calculated fields and parameters, toggled and global filters, dynamic sets, groups, actions, custom color palettes, statistical analysis to meet business requirements.
  • Implemented visualizations and views such as combo charts, stacked bar charts, Pareto charts, donut charts, geographic maps, sparklines, and crosstabs.
  • Published workbooks and extract data sources to Tableau Server, implemented row-level security and scheduled automatic extract refresh.
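
A minimal sketch of ARIMA-based demand forecasting with statsmodels, as referenced above; the monthly series is synthetic, and the (p, d, q) order is an assumed placeholder that would normally be chosen from ACF/PACF plots or information criteria:

    # Minimal sketch of ARIMA demand forecasting with statsmodels.
    # Series is synthetic; the order is a placeholder assumption.
    import numpy as np
    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(1)
    idx = pd.date_range("2015-01-01", periods=48, freq="MS")  # monthly demand
    demand = pd.Series(1_000 + np.cumsum(rng.normal(10, 25, size=48)), index=idx)

    fit = ARIMA(demand, order=(1, 1, 1)).fit()
    print(fit.forecast(steps=6))  # six-month demand forecast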

Environment: Erwin r7.0, SQL Server 2000/2005, Windows XP/NT/2000, Oracle 8i/9i, MS-DTS, UML, UAT, SQL Loader, OOD, OLTP, PL/SQL, MS Visio, Informatica

Confidential, Irvine, CA

Data Analytics

Responsibilities:

  • Involved with Business Analysts team in requirements gathering and in preparing functional specifications and changing them into technical specifications
  • Applied the Agile software development process to establish a business analysis methodology
  • Analyzed the client data and business terms from a data quality and integrity perspective
  • Identified the most suitable source of record and outlined the data required for sales and service
  • Implemented metadata repository, maintained data quality, data cleanup procedures, transformations, data standards, data governance program, scripts, stored procedures, triggers and executed test plans
  • Researched and resolved issues with application teams regarding the integrity of data flow into databases & reporting to the client
  • Created customized SQL Queries using MS SQL Management Studio to pull specified data for analysis and report building in conjunction with Crystal Reports
  • Analyzed transaction-level datasets using SQL, combined with demographic information, to provide customer insights and generate ad-hoc reports (see the sketch after this list)
  • Prepared sales forecasts by collecting and analyzing sales data to evaluate current sales goals
  • Designed and developed Use cases, Activity diagrams, Sequence diagrams, OOD using UML and Business Process Modeling
  • Established the logical and physical ER/Studio data models in line with business standards and guidelines
  • Created logical and physical data models using best practices to ensure high data quality and reduced redundancy
  • Designed & developed various Ad hoc reports for different teams in Business (Teradata and MSACCESS, MSEXCEL)
  • Extracted Mainframe Flat Files (Fixed or CSV) onto UNIX Server and then converted them into TeradataTables for user convenience
  • Coordinated meetings with vendors to define requirements and system interaction agreement documentation between client and vendor system
  • Utilized SQL to develop stored procedures, views to create result sets to meet varying reporting requirements
  • Data visualization, reporting using Tableau and SSRS
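
A minimal sketch of the SQL-to-ad-hoc-report workflow described above, using pandas; SQLite stands in here for the production SQL Server/Teradata sources, and the table and column names are hypothetical:

    # Sketch of the ad-hoc SQL-to-report workflow: pull a transaction-level
    # result set into pandas, then aggregate for reporting. SQLite stands in
    # for the production sources; names are hypothetical.
    import sqlite3
    import pandas as pd

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE transactions (customer_id INT, region TEXT, amount REAL);
        INSERT INTO transactions VALUES (1, 'West', 120.0), (2, 'East', 80.0),
                                        (1, 'West', 45.5), (3, 'East', 200.0);
    """)

    df = pd.read_sql("SELECT region, amount FROM transactions", conn)
    report = df.groupby("region")["amount"].agg(["count", "sum", "mean"])
    print(report)  # ad-hoc summary handed off to Crystal Reports/Tableau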

Environment: Agile methodology, Python, SQL Server 2008 R2/2005 Enterprise, SSRS, SSIS, SSAS, Hadoop, Tableau, MS Access, MS Visio, MS Excel, MS Project, Teradata, ER/Studio, Crystal Reports, and Business Objects

Confidential - Santa Monica, CA

Responsibilities:

  • Developed and scaled REST APIs; good understanding of and working experience with JSON, JSON Schema, and REST APIs (see the sketch after this list).
  • Applied object-oriented design using common design patterns, software debugging, test-driven development, UML, and SVN.
  • Applied machine learning and statistical methods (SVM, CRF, HMM, sequential tagging).
  • Used data mining algorithms and approaches.
  • Developed tests using the Python unittest framework.
  • Used Python to productionize end-to-end systems.
  • Worked on file systems, server architectures, databases, SQL, and data movement (ETL).
  • Exposure to Python and Python packages.
  • Collaborated with Risk Analytics teams, the Stress Testing team, Middle Office, IT, and other departments.
  • Developed and implemented innovative AI and machine learning tools for use in the Risk function.
  • Responsible for creation and execution of test plans, protocols, and documentation.
  • Formulated and tested hypotheses, extracted signals from petabyte-scale unstructured data sets, and ensured that the display advertising business delivered the highest standards of performance.
  • Evaluated the trade-offs between competing approaches and identified the ones likely to have a real impact on the product.
  • Led a project team of systems engineers (HW, FW & SW) and internal and outsourced development partners to develop reliable, cost-effective, high-quality solutions.
  • Identified and assessed available machine learning and statistical analysis libraries (including regressors, classifiers, statistical tests, and clustering algorithms).
  • Worked as an NLP engineer with a profound interest in research and development of cutting-edge machine learning techniques.
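
A minimal sketch of a JSON REST endpoint in Flask together with a Python unittest-style check, as referenced above; the /score route, payload fields, and scoring rule are all hypothetical:

    # Minimal sketch of a JSON REST endpoint with Flask, plus a unittest
    # check using Flask's test client. Endpoint and payload are hypothetical.
    import unittest
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/score", methods=["POST"])
    def score():
        payload = request.get_json(force=True)
        # Placeholder "model": a real service would call the trained estimator.
        value = 0.5 if payload.get("feature", 0) > 10 else 0.1
        return jsonify({"score": value})

    class ScoreApiTest(unittest.TestCase):
        def test_score_endpoint(self):
            client = app.test_client()
            resp = client.post("/score", json={"feature": 42})
            self.assertEqual(resp.status_code, 200)
            self.assertAlmostEqual(resp.get_json()["score"], 0.5)

    if __name__ == "__main__":
        unittest.main()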
