
Data Scientist Resume


San Francisco, CA

PROFESSIONAL SUMMARY:

  • 8+ years of experience in Data Analysis, Decision Trees, Random Forest, Data Profiling, Data Integration, Data Governance, Migration and Metadata Management, Master Data Management, and Configuration Management.
  • Proficient in Predictive Modeling, Data Mining methods, Factor Analysis, Hypothesis Testing, the normal distribution, and other advanced statistical and econometric techniques.
  • Developed predictive models using Decision Tree, Random Forest, Naïve Bayes, Logistic Regression, Cluster Analysis, and Neural Networks (a minimal scikit-learn sketch follows this summary).
  • Experienced in the full software development lifecycle (SDLC) using Agile and Scrum methodologies.
  • Strong SQL programming skills, with experience in working with functions, packages, and triggers.
  • Excellent understanding of machine learning techniques and algorithms such as k-NN, Naive Bayes, SVM, decision forests, and natural language processing (NLP).
  • Worked with RDBMS including MySQL, DB2 and Oracle SQL.
  • Experienced in data integration validation and data quality controls for ETL processes and data warehousing using MS Visual Studio SSIS, SSAS, and SSRS.
  • Experience in implementing stored procedures, triggers, and functions using T-SQL.
  • Expert in developing data conversions/migrations from legacy systems and various sources (flat files, Oracle, and non-Oracle databases) to Oracle using SQL*Loader, external tables, and calls to the appropriate interface tables and APIs in Informatica.
  • Experienced in Performance tuning of Informatica (sources, mappings, targets, and sessions)
  • Extensive experience in Upgrading Talend versions.
  • Extensive experience with ETL tools such as Informatica PowerCenter 8.x/9.x and Talend DI.
  • Hands-on experience with Teradata SQL analytics and Teradata utilities; familiar with creating secondary indexes and join indexes in Teradata.
  • Strong working experience in Teradata query performance tuning by analyzing CPU, AMP distribution, table skewness, and I/O metrics.
  • Experience in managing and maintaining IAM policies for organizations in AWS to define groups, create users, assign roles and define rules for role-based access to AWS resources.
  • Hands on experience in setting up databases in AWS using RDS, storage using an S3 bucket and configuring instance backups to S3 bucket to ensure fault tolerance and high availability.
  • Maintained and monitored Docker in a cloud-based service during production; set up a system for dynamically adding and removing web services from a server using Docker, and used Kubernetes to manage Docker container clusters.
  • Extensively worked on Teradata Utility tools like BTEQ, Fast load, Fast Export, Multi-Load, TPUMP, and TPT.
  • Proficient in Tableau and R Shiny data visualization tools, used to analyze and obtain insights into large datasets and to create visually powerful, actionable interactive reports and dashboards.
  • Automated recurring reports using SQL and Python and visualized them on BI platforms such as Tableau.
  • Worked in development environments using Git and VMs.
  • Experience in developing and analyzing data models; involved in writing simple and complex SQL queries to extract data from the database for data analysis and testing.
  • Strong knowledge of all phases of the SDLC (Software Development Life Cycle), from analysis, design, development, testing, and implementation through maintenance, with timely delivery against deadlines.
  • Ability to thoroughly analyze a system's functional requirements and prepare BRDs (Business Requirement Documents), use cases, and testing documents.
  • Responsible for applying machine learning techniques (regression and classification) to predict outcomes.
  • Proficient in developing and designing ETL packages and reporting solutions using the MS BI suite (SSIS/SSRS); experienced in coding SQL/PL/SQL triggers, procedures, and packages.
  • Experience in end-to-end implementation of a data warehouse project based on SAS Enterprise Guide (SAS EG).
  • Experience in building and publishing interactive reports and dashboards with design customizations based on the client requirements in Tableau, Looker, PowerBI and SSRS.
  • Extensive working experience with Python, including Scikit-learn, Pandas, and NumPy.
  • Integration Architect & Data Scientist experience in Analytics, Big Data, SOA, ETL and Cloud technologies.
  • Experienced in Big Data with Hadoop 2, Hive, HDFS, MapReduce, and Spark.
  • Experience working with data modeling tools like Power Designer, Erwin and ER Studio.
  • Proficient knowledge of statistics, mathematics, machine learning, recommendation algorithms and analytics with an excellent understanding of business operations and analytics tools for effective analysis of data.
  • Excellent communication skills; works successfully in fast-paced, multitasking environments both independently and in collaborative teams; a self-motivated, enthusiastic learner.
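
As a minimal illustration of the predictive-modeling workflow referenced in this summary, the sketch below trains and compares two of the listed classifiers with scikit-learn; the synthetic dataset and parameter values are assumptions for demonstration only, not client data.

# Minimal predictive-modeling sketch; synthetic data stands in for real client features.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a feature matrix and binary target.
X, y = make_classification(n_samples=5000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

for name, model in [("logistic_regression", LogisticRegression(max_iter=1000)),
                    ("random_forest", RandomForestClassifier(n_estimators=200, random_state=42))]:
    model.fit(X_train, y_train)                                  # fit on the training split
    print(name)
    print(classification_report(y_test, model.predict(X_test)))  # evaluate on the hold-out split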

TECHNICAL SKILLS:

Programming & Languages: C, C++, Java, PL/SQL, HTML5, DHTML, WSDL, XML, CSS3, JSON, Ajax, R/RStudio, Scala.

Databases: MS Access, Oracle 12c/11g/10g/9i, MySQL, DB2

Development Tools: Microsoft SQL Studio, IntelliJ, Eclipse, NetBeans.

Software/Libraries: Keras, Caffe, TensorFlow, OpenCV, Scikit-learn, Pandas, NumPy, Microsoft Visual Studio, Microsoft Office

ETL/BI Tools: Informatica PowerCenter 9.x, Tableau, Cognos BI 10, MS Excel, SAS, SAS/Macro, SAS/SQL

Cloud: AWS, S3, EC2.

Statistical Methods: Time Series, regression models, splines, confidence intervals, principal component analysis and Dimensionality Reduction, bootstrapping

BI Tools: Microsoft Power BI, Tableau, SSIS, SSRS, SSAS, Business Intelligence Development Studio (BIDS), Visual Studio, Crystal Reports, Informatica 6.1.

Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio/Outlook), Crystal Reports XI, SSRS, Cognos 7.0/6.0.

Teradata Utilities: BTEQ, Fast load, Fast Export, Multi-load, TPUMP and TPT

Database Design Tools and Data Modeling: SQL Developer, PL/SQL Developer, SQL*Loader, SQL*Plus, Informatica PowerCenter 9.5.1, MS Visio, Erwin 4.5/4.0, star schema/snowflake schema modeling, fact & dimension tables, physical & logical data modeling, normalization and de-normalization techniques, Kimball & Inmon methodologies.

Operating Systems: Windows (10, 7, Vista, XP), UNIX, Linux.

PROFESSIONAL EXPERIENCE:

Confidential, San Francisco, CA

Data Scientist

Responsibilities:

  • Worked as a Data Modeler/Analyst to generate data models using Erwin and developed relational database systems.
  • Used R, Python, MATLAB and Spark to develop a variety of models and algorithms for analytic purposes.
  • Worked with Spark to improve performance and optimize existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and pair RDDs (a minimal PySpark sketch follows this list).
  • Created SQL tables with referential integrity and developed queries using SQL, SQL*Plus, and PL/SQL; designed 3NF data models for ODS and OLTP systems and dimensional data models using star and snowflake schemas.
  • Utilized convolutional neural networks to implement a machine learning image recognition component using TensorFlow (a small Keras sketch also follows this list).
  • Built data warehouse and data migration applications with extensive hands-on use of ETL tools such as Talend and Informatica.
  • Performed data integrity checks, data cleaning, exploratory analysis, and feature engineering using R and Python.
  • Experience with TensorFlow, Theano, Keras and other Deep Learning Frameworks.
  • Worked with Data Architects and IT Architects to understand the movement of data and its storage, using ER Studio 9.7.
  • Worked extensively in data migration using Talend Data services platform.
  • Completed a highly immersive data science program involving data manipulation and visualization, web scraping, machine learning, Git, SQL, Unix commands, Python programming, NoSQL, MongoDB, and Hadoop.
  • Extensively worked on Data Modeling tools Erwin Data Modeler to design the data models.
  • Setup storage and data analysis tools in Amazon Web Services cloud computing infrastructure.
  • Developed and deployed automated EDI functions using Talend Open Studio (TOS) and SQL Server Management Studio (SSMS).
  • Built analytical data pipelines to port data in and out of Hadoop/HDFS from structured and unstructured sources and designed and implemented system architecture for Amazon EC2 based cloud-hosted solution for client.
  • Performed K-means clustering, Multivariate analysis and Support Vector Machines in Python and R.
  • Professional Tableau user (Desktop, Online, and Server); experienced with Keras and TensorFlow.
  • Created MapReduce jobs running over HDFS for data mining and analysis using R, loaded and stored data via Pig scripts and R for MapReduce operations, and created various types of data visualizations using R and Tableau.
  • Worked on machine learning on large size data using Spark and MapReduce.
  • Performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from the Oracle database.
  • Implemented Python solutions using libraries such as matplotlib for charts and graphs, MySQLdb for database connectivity, pandas DataFrames, and NumPy.
  • Developed Spark/Scala and Python code for a regular expression (regex) project in the Hadoop/Hive environment with Linux/Windows for big data resources.
  • Collaborate with unit managers, end users, development staff, and other stakeholders to integrate data mining results with existing systems.
  • Designed mappings to process incremental changes in the source tables; whenever source data elements were missing in source tables, they were modified or added in consistency with the third-normal-form OLTP source database.
  • Designed tables and implemented the naming conventions for Logical and Physical Data Models in Erwin 7.0.
  • Provide expertise and recommendations for physical database design, architecture, testing, performance tuning and implementation.
  • Designed logical and physical data models for multiple OLTP and Analytic applications.
  • Designed the physical model for implementation in the Oracle 9i physical database.
  • Involved in data analysis, primarily identifying datasets, source data, source metadata, data definitions, and data formats.
  • Performed database performance tuning, including indexing, optimizing SQL statements, and monitoring the server.
  • Wrote simple and advanced SQL queries and scripts to create standard and ad hoc reports for senior managers.
  • Collaborated on the data mapping document from source to target and on the data quality assessments for the source data.
  • Created S3 buckets and managed roles and policies for S3 buckets. Utilized S3 buckets and Glacier for file storage and backup on AWS cloud. Used Dynamo DB to store the data for metrics and backend reports.
  • Worked with Elastic Beanstalk for quick deployment of services such as EC2 instances, Load balancer, and databases on the RDS on the AWS cloud environment.
  • Used Java code to connect AWS S3 buckets by using AWS SDK, to access media files related to the application.
  • Created Data Quality Scripts using SQL and Hive to validate successful data load and quality of the data. Created various types of data visualizations using Python and Tableau.
  • Worked on data pre-processing and cleaning the data to perform feature engineering and performed data imputation techniques for the missing values in the dataset using Python.
  • Used Amazon Simple Workflow Service (SWF) for data migration in data centers, which automated the process and tracked every step, with logs maintained in an S3 bucket.
  • Designed and developed user interfaces and customization of Reports using Tableau and OBIEE and designed cubes for data visualization, mobile/web presentation with parameterization and cascading.
  • Worked on different data formats such as JSON, XML and performed machine learning algorithms in Python.
  • Performed Data Analysis and Data Profiling and worked on data transformations and data quality rules.
  • Coded proprietary packages to analyze and visualize SPC file data to identify bad spectra and samples to reduce unnecessary procedures and costs.
  • Programmed a utility in Python that used multiple packages (numpy, scipy, pandas)
  • Implemented Classification using supervised algorithms like Logistic Regression, Decision trees, Naive Bayes, KNN.
  • As Architect delivered various complex OLAP databases/cubes, scorecards, dashboards and reports.
  • Updated Python scripts to match training data with our database stored in AWS Cloud Search, so that we would be able to assign each document a response label for further classification.
  • Used Teradata utilities such as FastExport and MultiLoad (MLOAD) for handling data migration/ETL tasks from OLTP source systems to OLAP target systems.
  • Created SSIS Packages using Pivot Transformation, Execute SQL Task, Data Flow Task, etc. to import data into the data warehouse.
  • Developed and implemented SSIS, SSRS and SSAS application solutions for various business units across the organization.
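
A minimal PySpark sketch of the Spark SQL / DataFrame work described in this list; the HDFS path, column names, and target Hive table are hypothetical placeholders rather than actual project objects.

# Illustrative Spark SQL / DataFrame sketch; paths, columns, and table names are assumed.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl_analysis").enableHiveSupport().getOrCreate()

# Load raw events (hypothetical location/schema) and register a temp view for SQL access.
events = spark.read.parquet("hdfs:///data/raw/events")
events.createOrReplaceTempView("events")

daily = spark.sql("""
    SELECT event_date, COUNT(*) AS event_count
    FROM events
    GROUP BY event_date
""")

# Equivalent DataFrame-API aggregation, cached for reuse and written to a hypothetical Hive table.
daily_df = events.groupBy("event_date").agg(F.count("*").alias("event_count")).cache()
daily_df.write.mode("overwrite").saveAsTable("analytics.daily_event_counts")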
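
A small, hedged sketch of a TensorFlow/Keras convolutional image-recognition model of the kind mentioned above; the input shape and class count are assumptions for illustration.

# Small CNN sketch in tf.keras; input shape and number of classes are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn(input_shape=(64, 64, 3), num_classes=10):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_cnn()
model.summary()  # training would call model.fit(...) on labeled image batches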

Environment: Python, SQL, Git, HDFS, Pig, Hive, Oracle, DB2, Unix commands, NoSQL, MongoDB, SSIS, SSRS, SSAS, AWS (S3, EC2, RDS, SWF, DynamoDB, Glacier), Erwin, Tableau, OBIEE.

Confidential

Data Scientist

Responsibilities:

  • Collected, cleaned, filtered, and transformed data into the specified format.
  • Created captivating interactive visualizations and presentations to enhance decision-making capabilities by the management.
  • Developed novel applications of classification, forecasting, simulation, optimization, and summarization techniques to enhance effective decisions.
  • Prepared the workspace for Markdown reporting; performed data analysis and statistical analysis and generated reports, listings, and graphs.
  • Identified outliers, anomalies, and trends in given datasets.
  • Developed scalable machine learning solutions within a distributed computation framework (e.g. Hadoop, Spark, Storm etc.).
  • Experienced with visualization technologies such as Tableau.
  • Drew inferences and conclusions, created dashboards and visualizations of processed data, and identified trends and anomalies.
  • Assisted in migrating data with Data Pump and the Export/Import utility.
  • Provided daily change management process support, ensuring that all changes to program baselines were properly documented and approved; maintained, managed, and issued change schedules.
  • Developed, installed, maintained, and monitored company databases in a high-performance/high-availability environment, with configuration support and performance tuning to ensure optimal resource usage.
  • Documented all programs and procedures to ensure an accurate historical record of work completed on the assigned project as well as to improve quality and efficacy.
  • Implemented various types of change data captures according to source data behavior and business requirements.
  • Implemented various Performance tuning techniques at ETL & Teradata BTEQ for efficient development and performance.
  • Used Simple Storage Service (S3) for snapshots and configured S3 lifecycle policies for application and database logs, including deleting old logs and archiving logs based on the retention policies of the applications and databases.
  • Created and maintained Logical and Physical models for the data mart and created partitions and indexes for the tables in the data mart.
  • Performed data profiling and analysis, applied various data cleansing rules, designed data standards and architecture, and designed the relational models.
  • Created new data designs and ensured that they fell within the overall enterprise BI architecture.
  • Built models using statistical techniques such as Bayesian HMMs and machine learning classification models such as XGBoost, SVM, and Random Forest (a minimal XGBoost sketch follows this list).
  • Setup storage and data analysis tools in Amazon Web Services cloud computing infrastructure.
  • Created logical data model from the conceptual model and its conversion into the physical database design using ERWIN 9.6.
  • Designed and Developed Oracle PL/SQL and Shell Scripts, Data Import/Export, Data Conversions and Data Cleansing.
  • Responsible for the development of target data architecture, design principles, quality control, and data standards for the organization.
  • Worked with DBA to create Best-Fit Physical Data Model from the Logical Data Model using Forward Engineering in Erwin.
  • Produced quality reports for management for decision making.
  • Participated in all phases of research including data collection, data cleaning, data mining, developing models and visualizations.
  • Redefined many attributes and relationships and cleansed unwanted tables/columns using SQL queries.
  • Utilized Spark SQL API in PySpark to extract and load data and perform SQL queries.
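
A minimal sketch of the classification modeling mentioned in this list, assuming the xgboost and scikit-learn Python packages; the synthetic, imbalanced dataset and the hyperparameters are illustrative assumptions.

# Hedged XGBoost vs. Random Forest comparison on synthetic, imbalanced data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=10000, n_features=30, weights=[0.9, 0.1], random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=7)

models = {
    "xgboost": XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1),
    "random_forest": RandomForestClassifier(n_estimators=300, random_state=7),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])  # compare by hold-out AUC
    print(f"{name}: AUC = {auc:.3f}")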

Environment: ETL, Teradata BTEQ, AWS S3, XGBoost, SVM, Random Forest, Oracle PL/SQL, Erwin 9.6, DBA, SQL, shell scripting, HMM, Spark SQL, PySpark.

Confidential -Chicago, IL

Data Scientist

Responsibilities:

  • Manipulating, cleansing & processing data using Excel, Access and SQL.
  • Responsible for loading, extracting and validation of client data.
  • Modeled clean data into the Kafka servers for use with the Spark engine.
  • Used ZooKeeper along with Kafka to stream data and support end-to-end client communication.
  • Performed transformations over the warehoused data using Scala and Python and modeled the data back into the Kafka servers for iterative transformations (a minimal PySpark structured-streaming sketch follows this list).
  • Modeled data using machine learning libraries (scikit-learn), including SVM- and KNN-based classification, to create a training dataset for use in a predictive model.
  • Liaising with end-users and 3rd party suppliers.
  • Analyzing raw data, drawing conclusions & developing recommendations
  • Writing T-SQL scripts to manipulate data for data loads and extracts.
  • Developing data analytical databases from complex financial source data.
  • Performing daily system checks.
  • Data entry, data auditing, creating data reports & monitoring all data for accuracy.
  • Designing, developing and implementing new functionality.
  • Monitoring the automated loading processes.
  • Advising on the suitability of methodologies and suggesting improvements.
  • Carrying out specified data processing and statistical techniques.
  • Supplying qualitative and quantitative data to colleagues & clients.
  • Used Informatica and SAS to extract, transform, and load source data from transaction systems.
  • Performed sequential analytics using SAS Enterprise Miner with jobs fed by SAS Grid Manager.
  • Loaded packages and stored procedures using Base SAS and integrated functional and business requirements using the EBI suite.
  • Created data pipelines using big data technologies such as Hadoop and Spark.
  • Created statistical models, both distributed and standalone, to build diagnostic, predictive, and prescriptive solutions.
  • Utilized a broad variety of statistical and big data packages such as SAS, R, MLlib, Graphs, Hadoop, Spark, MapReduce, and Pig.
  • Performed a check using quality parameters fed using the SAS QC engine.
  • Created a UI dashboard for end users and performed prototype testing using Tableau.
  • Refined and trained models based on domain knowledge and customer business objectives.
  • Delivered, or collaborated on delivering, effective visualizations to support the client's business objectives.
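
A hedged PySpark Structured Streaming sketch of consuming the Kafka feed described in this list; the broker address, topic name, and event schema are assumptions, and the Spark-Kafka connector package is assumed to be available on the cluster.

# Illustrative Kafka -> Spark Structured Streaming sketch; broker, topic, and schema are assumed.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

spark = SparkSession.builder.appName("kafka_stream").getOrCreate()

schema = StructType([
    StructField("client_id", StringType()),
    StructField("amount", DoubleType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")   # assumed broker address
       .option("subscribe", "client-events")               # assumed topic name
       .load())

# Parse the JSON payload and run a simple running aggregation to the console sink.
parsed = raw.select(F.from_json(F.col("value").cast("string"), schema).alias("event")).select("event.*")

query = (parsed.groupBy("client_id")
         .agg(F.sum("amount").alias("total_amount"))
         .writeStream
         .outputMode("complete")
         .format("console")
         .start())
query.awaitTermination()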

Environment: Cloudera, HDFS, Pig, Hive, MapReduce, Python, Sqoop, Storm, Kafka, Linux, HBase, Impala, Java, SQL, Cassandra, MongoDB, SVN.

Confidential -Des Moines, IA

Data Analyst

Responsibilities:

  • Studied and understood the business and its functionality through communication with business analysts.
  • Analyzed the existing database for performance and suggested methods to redesign the model for improving the performance of the system.
  • Supported ad-hoc, standard reporting and production projects.
  • Designed and implemented many standard processes that are maintained and run on a scheduled basis.
  • Created reports using MS Access and Excel, applying filters to retrieve the best results.
  • Developed stored procedures, SQL joins, and SQL queries for data retrieval and analysis, and exported the data into CSV and Excel files.
  • Developed Data mapping specifications to create and execute detailed system test plans. The data mapping specifies what data will be extracted from an internal data warehouse, transformed and sent to an external entity.
  • Analyzed business requirements, system requirements, and data mapping requirement specifications and communicated it to developers effectively.
  • Documented functional requirements and supplementary requirements in Quality Center.
  • Setting up of environments to be used for testing and the range of functionalities to be tested as per technical specifications.
  • Tested Complex ETL Mappings and Sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables to target tables.
  • Wrote and executed unit, system, integration and UAT scripts in a data warehouse projects.
  • Wrote and executed SQL queries to verify that data had been moved from the transactional system to the DSS, data warehouse, and data mart reporting systems in accordance with requirements (a minimal reconciliation sketch follows this list).
  • Troubleshoot test scripts, SQL queries, ETL jobs, and data warehouse/data mart/data store models.
  • Responsible for different Data mapping activities from Source systems to Teradata.
  • Developed SQL scripts, stored procedures, and views for data processing, maintenance etc., and other database operations.
  • Performed SQL tuning, optimized the database, and created the technical documents.
  • Imported Excel sheets, CSV and delimited data, and ODBC-compliant data sources into the Oracle database, using advanced Excel features, for data extraction, data processing, and business needs.
  • Designed and optimized SQL queries, pass-through queries, make-table queries, and joins in MS Access 2003, and exported the data into the Oracle database server.
  • Compiled sales production and market penetration data for executive management. Data included employee activity, client coverage, and territory alignment analysis.
  • Conducted business analysis, project assessment, and feasibility determination.
  • Analyzed data feed requirements for Risk Management, Customer Information Management, and Analytic Support.
  • Familiar with data and content migration using SAS migration utility for products that rely on metadata.
  • Developed CSV files and reported offshore progress to management using Excel templates, Excel macros, pivot tables, and functions.
  • Improved the accuracy and relevance of credit card clients' planning-process and budget reports used to make high-level decisions.
  • Manage all UAT deliverables to completion with overlapping releases.
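
A minimal, self-contained sketch of the kind of source-to-target verification query described in this list; an in-memory SQLite database stands in for the transactional source and the warehouse target, and the table names are hypothetical.

# Toy source-vs-target reconciliation; SQLite stands in for the real source and warehouse databases.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (order_id INTEGER, amount REAL);
    CREATE TABLE dw_orders  (order_id INTEGER, amount REAL);
    INSERT INTO src_orders VALUES (1, 100.0), (2, 250.5), (3, 75.0);
    INSERT INTO dw_orders  VALUES (1, 100.0), (2, 250.5), (3, 75.0);
""")

# Compare row counts and amount totals between the source and the warehouse copy.
src_count, src_sum = conn.execute("SELECT COUNT(*), SUM(amount) FROM src_orders").fetchone()
tgt_count, tgt_sum = conn.execute("SELECT COUNT(*), SUM(amount) FROM dw_orders").fetchone()

assert src_count == tgt_count, f"row-count mismatch: {src_count} vs {tgt_count}"
assert abs(src_sum - tgt_sum) < 1e-6, f"amount mismatch: {src_sum} vs {tgt_sum}"
print("reconciliation passed:", src_count, "rows,", src_sum, "total amount")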

Environment: SAS Enterprise Guide 4.0, OLAP Cube Studio, Stored Processes, SAS Management Console, Informatica 8.1, MS Excel, MS PowerPoint, MS Visio, MS Project Management, Teradata SQL Assistant, Enterprise Miner, SAS DI Studio, MS Access, SQL, SPSS, VBA, PL/SQL, shell scripting, Oracle 10g.

Confidential

PL/SQL Developer

Responsibilities:

  • Interacted with users, the team lead, DBAs, and the technical manager to fully understand the system requirements.
  • Performed Ad hoc analysis and data pulls for business users whenever needed.
  • Developed various Oracle Applications using PL/SQL, SQL*Plus, Forms, Reports, Workflow Builder and Application Object Library.
  • Worked with users through all Software Development Lifecycle (SDLC) phases, as directed by the project manager.
  • For complex business queries involving multiple tables from different databases used Joins, correlated and non-correlated sub-queries.
  • Developed PL/SQL packages, Oracle tables, stored procedures, triggers, and UNIX shell scripts (a minimal Python invocation sketch follows this list).
  • Wrote technical design documents for conversions, extensions, interfaces, and reports.
  • Coded, tested, debugged, documented and maintained programs.
  • Built Internet access and GUI interfaces to Relational Database Management Systems (RDBMS).
  • Generated Oracle Forms and database objects through Designer.
  • Created new procedures, functions, triggers, materialized views, packages, simple/ref/traditional cursors, dynamic SQL, and table functions as part of the requirements.
  • Enforced business rules via checks and constraints.
  • Helped design custom data models that integrated with Oracle Applications.
  • Developed SQL queries to fetch data as per the business requirements with proper tuning techniques.
  • Developed Oracle Forms and Reports for the user interface and performed data loads using PL/SQL and SQL*Loader.
  • Managed schema objects, such as tables, indexes, and views.
  • Created checks and constraints to maintain data integrity across the data in the database.
  • Created Stored Procedures to generate various Drill-through reports and functions to support efficient data storage and manipulation.
  • Migrated Actuate data-extract code to its PL/SQL equivalent and created PL/SQL functions, packages, procedures, and joins.
  • Provided Oracle database schema modifications.
  • Worked on various back-end Procedures and Functions using PL/SQL.
  • Wrote necessary PL/SQL API scripts for the successful running of the programs.
  • Scheduled SQL Server Agent Jobs to run every day to update changes in the base tables.
  • Used Exception Handling for handling errors of programs, scripts, and procedures.
  • Followed company guidelines and processes to complete the tasks.
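
Keeping to Python as in the earlier sketches, a hedged example of invoking a PL/SQL stored procedure such as the ones described in this list; the connection details and the procedure name are hypothetical.

# Hypothetical call to a PL/SQL stored procedure from Python via cx_Oracle.
import datetime
import cx_Oracle

# Connection details are placeholders, not real credentials.
conn = cx_Oracle.connect(user="app_user", password="change_me", dsn="dbhost/ORCLPDB1")
cur = conn.cursor()

# Assumes a procedure LOAD_DAILY_SALES(p_run_date IN DATE, p_rows_loaded OUT NUMBER).
rows_loaded = cur.var(int)
cur.callproc("LOAD_DAILY_SALES", [datetime.date(2020, 1, 31), rows_loaded])
print("rows loaded:", rows_loaded.getvalue())

conn.commit()
cur.close()
conn.close()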

Environment: Windows 2000, Oracle 9i, SQL Developer, UNIX, Windows XP, PL/SQL, Triggers, RDBMS.

Confidential

PL/SQL Developer

Responsibilities:

  • Analyzed, created, and modified existing forms, reports, scheduled reports, and Oracle PL/SQL stored procedures, functions, packages, triggers, etc.
  • Designed, developed and enhanced custom Forms and Reports according to the functional specification.
  • Performed code reviews for other developers and provided suggestions to improve performance.
  • Provided expertise to development managers in design or preparing proof-of-concept testing.
  • Converted Forms & Reports from 6i to 10g.
  • Wrote complicated stored Procedures, Functions, Database Triggers and Packages, Shell Scripts.
  • Involved in handling Exceptions through PL/SQL Blocks.
  • Created various feeds/downloads, nightly batch jobs, and other daily/monthly reports/downloads.
  • Developed SQL*Loader scripts to load data from flat files into Oracle 10g database tables.
  • Tuned batch jobs and other daily/monthly reports/downloads.
  • Used Toad for creating and modifying procedures, functions, and triggers.
  • Developed complex Oracle Forms providing extensive GUI features (multi-select drag and drop, graphical charts, automated system alerts and notifications etc.).
  • Created Test Plan for QA and implementation plan for Production implementation once the unit test is done.
  • Performed SQL performance tuning using EXPLAIN PLAN, the trace utility, and TKPROF.
  • Documented development specifications for all Oracle Reports, packages, procedures, and functions.
  • Tuned SQL queries and performed refinement of the database design leading to significant improvement in system response time and efficiency.
  • Prepared test plans for QA testing before moving from the pre-production to the production environment.
  • Modified forms, reports, stored procedures, packages, triggers, functions, etc. to meet business requirements.

Environment: Oracle 9i, Windows XP, SQL Developer, Toad, Informatica PowerCenter 9.0.
