Data Scientist Resume
Portland, OR
SUMMARY:
- Over 8 years of experience working as a Data Scientist/Data Architect/Data Analyst/Data Modeler, with emphasis on data mapping and data validation in data warehousing environments.
- Extensive experience with business intelligence (BI) tools and technologies such as OLAP, data warehousing, reporting and querying tools, data mining, and spreadsheets.
- Worked with a variety of Python modules such as requests, boto, flake8, flask, mock, and nose.
- Experienced with machine learning algorithms such as logistic regression, random forest, XGBoost, KNN, SVM, neural networks, linear regression, lasso regression, and k-means.
- Strong skills in statistical methodologies such as A/B testing, experiment design, hypothesis testing, and ANOVA.
- Extensively worked with Python 3.5/2.7 (NumPy, Pandas, Matplotlib, NLTK, and Scikit-learn).
- Experience in implementing data analysis with various analytic tools, such as Anaconda 4.0 with Jupyter Notebook 4.x, R 3.0 (ggplot2, caret, dplyr), and Excel.
- Solid ability to write and optimize diverse SQL queries, with working knowledge of RDBMSs like SQL Server 2008 and NoSQL databases like MongoDB 3.2.
- Strong experience in big data technologies like Spark 1.6, Spark SQL, PySpark, Hadoop 2.x, HDFS, and Hive 1.x.
- Experience in visualization tools like Tableau 9.x/10.x for creating dashboards.
- Experienced with the full software life cycle, SDLC, Agile, and Scrum methodologies.
- Skilled in advanced regression modelling, correlation, multivariate analysis, model building, business intelligence tools, and application of statistical concepts.
- Proficient in predictive modelling, data mining methods, factor analysis, ANOVA, hypothesis testing, normal distribution, and other advanced statistical and econometric techniques.
- Developed predictive models using decision trees, random forest, Naïve Bayes, logistic regression, cluster analysis, and neural networks.
- Experienced in machine learning and statistical analysis with Python Scikit-learn.
- Experienced in using Python to manipulate data for loading and extraction, and worked with Python libraries like Matplotlib, NumPy, SciPy, and Pandas for data analysis.
- Worked with analytical applications such as R, SAS, MATLAB, and SPSS to develop neural network and cluster analyses.
- Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
- Skilled in performing data parsing, data manipulation, and data preparation with methods including describing data contents, computing descriptive statistics, regex, split and combine, remap, merge, subset, reindex, melt, and reshape (a brief pandas sketch follows this list).
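A minimal pandas sketch of this kind of data preparation; the file names, columns, and remapped values below are hypothetical placeholders, not taken from any project described in this resume:

```python
import pandas as pd

# Hypothetical input files and column names, for illustration only.
orders = pd.read_csv("orders.csv")        # order_id, customer_id, amount, order_date
customers = pd.read_csv("customers.csv")  # customer_id, region

# Describe data contents and compute descriptive statistics.
print(orders.describe(include="all"))

# Merge, subset, and remap values.
df = orders.merge(customers, on="customer_id", how="left")
df = df[df["amount"] > 0]
df["region"] = df["region"].replace({"PNW": "Pacific Northwest"})

# Melt into long format, then reshape into a region-by-month pivot.
df["month"] = pd.to_datetime(df["order_date"]).dt.to_period("M")
long_df = df.melt(id_vars=["region", "month"], value_vars=["amount"], var_name="metric")
pivot = long_df.pivot_table(index="region", columns="month", values="value", aggfunc="sum")
print(pivot.sort_index())
```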
TECHNICAL SKILLS:
Database Design Tools and Data Modeling: Star Schema/Snowflake Schema modeling, Fact & Dimension tables, physical & logical data modeling, Normalization and De-normalization techniques, Kimball & Inmon Methodologies
Databases: SQL Server 2017/2016/2014, MS-Access, Oracle 11g/10g/9i, Sybase 15.02 and DB2 2016.
Big Data Tools: Hadoop 2.7.2, Hive, Spark 2.1.1, Pig, HBase, Sqoop, Flume
DWH / BI Tools: Microsoft Power BI, SSIS, SSRS, SSAS, Business Intelligence Development Studio (BIDS), Visual Studio, SAP Business Objects, SAP SE v 14.1(Crystal Reports) and Informatica 6.1.
Frameworks: Shogun, Accord Framework/AForge.net, Scala, Spark, Cassandra, DL4J, ND4J, Scikit-learn, Microsoft .Net 4.5/ 4.0/ 3.5/3.0, Entity Framework, Bootstrap, Microsoft Azure, Swagger.
OLAP/BI/ETL Tools: Business Objects 6.1/XI, MS SQL Server 2008/2005 Analysis Services (MS OLAP, SSAS), Integration Services (SSIS), Reporting Services (SSRS), Performance Point Server (PPS), Oracle 9i OLAP, MS Office Web Components (OWC11), DTS, MDX, Crystal Reports 10, Crystal Enterprise 10 (CMC)
Tools: SQL Server Profiler, Import & Export Wizard, Visual Studio v14, .Net, Power Pivot, ProClarity, Microsoft Office 2007/10/13, Excel Power Pivot, Excel Data Explorer, Tableau 8/10, JIRA
Data Modeling Tools: Erwin r9.6/9.5, ER/Studio 9.7, Star-Schema Modeling, Snowflake-Schema Modeling, FACT and dimension tables, Pivot Tables
Operating Systems: Microsoft Windows 8/7/XP, Linux and UNIX
Other Technologies: PHP, Scala 2, Shark, Awk, Cascading, Cassandra, Clojure, Fortran, JavaScript, JMP, Mahout, Objective-C, QlikView, Redis, Redshift
PROFESSIONAL EXPERIENCE:
Confidential, Portland,OR
Data Scientist
Responsibilities:
- Built data pipelines for reporting, alerting, and data mining. Experienced with table design and data management using HDFS, Hive, Impala, Sqoop, MySQL, MemSQL, Grafana/InfluxDB, and Kafka.
- Worked with statistical models for data analysis, predictive modelling, machine learning approaches, recommendation and optimization algorithms.
- Worked on business and data analysis, data profiling, data migration, data integration, and metadata management services.
- Worked extensively with databases, primarily Oracle 11g/12c, writing PL/SQL scripts for multiple purposes.
- Built models using statistical techniques like Bayesian HMM and machine learning classification models like XGBoost, SVM, and Random Forest using R and Python packages.
- Worked with data compliance and data governance teams to maintain data models, metadata, and data dictionaries, and to define source fields and their definitions.
- Worked with Big Data technologies such as Hadoop, Hive, and MapReduce.
- Used MLlib, Spark's machine learning library, to build and evaluate different models.
- Implemented a rule-based expert system from the results of exploratory analysis and information gathered from people in different departments.
- Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
- Developed MapReduce pipeline for feature extraction using Hive.
- Created data quality scripts using SQL and Hive to validate successful data loads and the quality of the data. Created various types of data visualizations using Python and Tableau.
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
- Used R and Python for exploratory data analysis, A/B testing, ANOVA, and hypothesis testing to compare and identify the effectiveness of creative campaigns.
- Created clusters to classify control and test groups and conducted group campaigns.
- Analyzed and calculated the lifetime cost of everyone in the welfare system using 20 years of historical data.
- Developed Linux shell scripts using the NZSQL/NZLOAD utilities to load data from flat files into a Netezza database.
- Developed triggers, stored procedures, functions, and packages using cursor and ref cursor concepts associated with the project using PL/SQL.
- Created various types of data visualizations using R, Python, and Tableau.
- Used Python, R, and SQL to create statistical algorithms involving multivariate regression, linear regression, logistic regression, PCA, random forest models, decision trees, and support vector machines for estimating the risks of welfare dependency (see the sketch following this list).
- Identified and targeted high-risk welfare groups with machine learning algorithms.
- Conducted campaigns and ran real-time trials to quickly determine what works and to track the impact of different initiatives.
- Used graphical entity-relationship diagramming to create new database designs via an easy-to-use graphical interface.
- Created multiple custom SQL queries in Teradata SQL Workbench to prepare the right data sets for Tableau dashboards.
- Performed analyses such as regression analysis, logistic regression, discriminant analysis, and cluster analysis using SAS programming.
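A minimal scikit-learn sketch of the risk-modelling workflow described above; the file name, feature columns, and target label are hypothetical placeholders, and logistic regression and random forest are shown only as two of the model families named in this section:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical case-history data; file and column names are illustrative only.
df = pd.read_csv("welfare_history.csv").dropna(subset=["dependency_flag"])
X = df[["age", "household_size", "months_on_program", "region"]]
y = df["dependency_flag"]

# Scale numeric features and one-hot encode the categorical one.
pre = ColumnTransformer([
    ("num", StandardScaler(), ["age", "household_size", "months_on_program"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["region"]),
])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

for name, clf in [("logistic", LogisticRegression(max_iter=1000)),
                  ("random_forest", RandomForestClassifier(n_estimators=200, random_state=42))]:
    model = Pipeline([("pre", pre), ("clf", clf)])
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```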
Environment: Erwin 8, Teradata 13, SQL Server 2008, Oracle 9i, SQL Loader, PL/SQL, ODS, OLAP, OLTP, SSAS, Informatica Power Center 8.1.
Confidential, Birmingham,AL
Data Scientist
Responsibilities:
- Designed an industry-standard data model specific to the company's group insurance offerings; translated the business requirements into detailed production-level specifications using workflow diagrams, sequence diagrams, activity diagrams, and use case modeling.
- Involved in the design and development of the data warehouse environment; served as liaison to business users and technical teams, gathering requirement specification documents, presenting findings, and identifying data sources, targets, and report generation needs.
- Worked with data governance, data quality, data lineage, and data architecture teams to design various models and processes.
- Independently coded new programs and designed tables to load and test them effectively for the given POCs using Big Data/Hadoop.
- Designed data models and data flow diagrams using Erwin and MS Visio.
- Utilized the ADO.NET object model to implement middle-tier components that interacted with an MS SQL Server 2000 database.
- Participated in the AMS (Alert Management System) Java and Sybase project: designed the Sybase database using Erwin, customized error messages using sp_addmessage and sp_bindmsg, created indexes, made query optimizations, and wrote stored procedures and triggers using T-SQL.
- Successfully optimized Python code for a variety of data mining and machine learning purposes.
- Built programming logic for developing analysis datasets by integrating with various data marts in the sandbox environment.
- Facilitated stakeholder meetings and sprint reviews to drive project completion.
- Successfully managed projects using Agile development methodology
- Project experience in data mining, segmentation analysis, business forecasting, and association rule mining on large data sets with machine learning.
- Automated diagnosis of blood loss during accidents by applying machine learning algorithms to vital signs (ECG, HF, GSR, etc.); demonstrated performance of 94.6%, on par with state-of-the-art models used in industry (see the sketch following this list).
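A brief illustrative sketch of training and cross-validating a classifier on vital-sign features; the file name, feature columns, label, and the choice of gradient boosting are assumptions for illustration, not the actual pipeline behind the 94.6% figure quoted above:

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical vital-signs dataset; feature and label names are illustrative only.
vitals = pd.read_csv("vitals.csv")  # ecg_hr, ecg_variability, gsr_level, blood_loss_label
X = vitals[["ecg_hr", "ecg_variability", "gsr_level"]]
y = vitals["blood_loss_label"]

# 5-fold cross-validation of a gradient boosting classifier.
clf = GradientBoostingClassifier(random_state=0)
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
print(f"5-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```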
Environment: Erwin r9.6, DB2, Teradata, SQL Server 2008, Informatica 8.1, Enterprise Architect, Power Designer, MS SSAS, Crystal Reports, SSRS, ER Studio, Lotus Notes, Windows XP, MS Excel, Word, and Access.
Confidential, Warren, NJ
Data Analyst
Responsibilities:
- Responsible for the Study/Creation of SAS Code, SQL Queries, Analysis enhancements and documentation of the system.
- Used R, SAS, and SQL to manipulate data, and develop and validate quantitative models.
- Participated in brainstorming sessions and proposed hypotheses, approaches, and techniques.
- Created and optimized processes in the Data Warehouse to import, retrieve and analyze data from the Cyber Life database.
- Analyzed data collected in stores (JCL jobs, stored-procedures, and queries) and provided reports to the Business team by storing the data in excel/SPSS/SAS file.
- Performed Analysis and Interpretation of the reports on various findings.
- Prepared Test documents for zap before and after changes in Model, Test, and Production regions.
- Responsible for production support abend resolution and other production support activities, and for comparing seasonal trends based on the data in Excel.
- Used advanced Microsoft Excel functions such as pivot tables and VLOOKUP to analyze the data and prepare programs.
- Successfully implemented migration of client's requirement application from Test/DSS/Model regions to Production.
- Prepared SQL scripts for ODBC and Teradata servers for analysis and modeling (see the sketch following this list).
- Provided complete assistance with analysis of trends in the financial time-series data.
- Performed various statistical tests to give the client a clear understanding of the findings.
- Implemented procedures for extracting Excel sheet data into the mainframe environment by connecting to the database using SQL.
- Provided training to Beginners regarding the Cyber Life system and other basics.
- Provided complete support to all regions (Test/Model/System/Regression/Production).
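A small sketch of pulling Teradata data over ODBC into pandas for trend analysis; the DSN, credentials placeholder, table, and column names are hypothetical, and the original work used SAS and SQL scripts rather than this exact approach:

```python
import pandas as pd
import pyodbc

# Hypothetical ODBC DSN and credentials; adjust to the actual Teradata setup.
conn = pyodbc.connect("DSN=TD_PROD;UID=analyst;PWD=***")

# Hypothetical claims table and columns, for illustration only.
query = """
SELECT policy_id, region, premium_amt, claim_amt, report_month
FROM claims_summary
WHERE report_month BETWEEN '2015-01-01' AND '2015-12-31'
"""
df = pd.read_sql(query, conn)

# Seasonal trend comparison analogous to the Excel pivot-table analysis above.
trend = df.pivot_table(index="report_month", columns="region",
                       values="claim_amt", aggfunc="sum")
print(trend)
```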
Environment: Python, R, SQL, Tableau, Spark, Machine Learning Software Package, recommendation systems.
Confidential, Pennsylvania
Data Analyst
Responsibilities:
- Worked with the BI team to gather report requirements and used Sqoop to export data into HDFS and Hive.
- Worked on MapReduce jobs in Java for data cleaning and pre-processing.
- Assisted with data capacity planning and node forecasting.
- Collaborated with the infrastructure, network, database, application, and BI teams to ensure data quality and availability.
- Developed MapReduce programs to extract and transform the data sets and results were exported back to RDBMS using Sqoop.
- Was responsible for importing the data (mostly log files) from various sources into HDFS using Flume.
- Created tables in Hive and loaded the structured data (resulting from MapReduce jobs).
- Developed many queries using HiveQL and extracted the required information (see the sketch following this list).
- Exported the required information to an RDBMS using Sqoop to make the data available to the claims processing team to assist in processing claims based on the data.
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
- Worked on MongoDB database concepts such as locking, transactions, indexes, sharding, replication, schema design, etc.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Enabled speedy reviews and first mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and PIG to pre-process the data.
- Involved in defining the source to target data mappings, business rules, business and data definitions.
- Responsible for defining the key identifiers for each mapping/interface.
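A short PySpark sketch of comparing fresh data against EDW reference tables, as described above; the original work used HiveQL and MapReduce directly, and the database, table, and column names here are hypothetical placeholders:

```python
from pyspark.sql import SparkSession

# Hive-enabled Spark session; schema, table, and column names are illustrative only.
spark = SparkSession.builder.appName("trend-compare").enableHiveSupport().getOrCreate()

# Aggregate today's freshly loaded data from a hypothetical staging table.
fresh = spark.sql("""
    SELECT product_id, SUM(claim_amt) AS claim_amt
    FROM staging.claims_daily
    WHERE load_date = CURRENT_DATE
    GROUP BY product_id
""")

# Hypothetical EDW reference table with historical averages.
baseline = spark.table("edw.claims_monthly_baseline")

# Compare fresh totals against the historical baseline per product.
comparison = (fresh.join(baseline, "product_id")
                   .withColumn("pct_change",
                               (fresh.claim_amt - baseline.avg_claim_amt) / baseline.avg_claim_amt))
comparison.orderBy("pct_change", ascending=False).show(10)
```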
Environment: SQL/Server, Oracle 9i, MS-Office, Teradata, Informatica, ER Studio, XML, Business Objects, Help-Point Claims Services
Confidential
Internship Data Analyst
Responsibilities:
- Used Microsoft Visio and Rational Rose to design the use case diagrams, class model, sequence diagrams, and activity diagrams for the SDLC process of the application.
- Worked with other teams to analyze customers and marketing parameters.
- Conducted Design reviews and Technical reviews with other project stakeholders.
- Was a part of the complete life cycle of the project from the requirements to the production support
- Created test plan documents for all back-end database modules
- Used MS Excel, MS Access and SQL to write and run various queries.
- Used traceability matrix to trace the requirements of the organization.
- Recommended structural changes and enhancements to systems and databases.
- Performed maintenance in the testing team for system testing, integration testing, and UAT.
- Ensured quality in the deliverables.
Environment: UNIX, C++, SQL, Oracle 10g, MS Office, MS Visio.
Confidential
Internship Data Analyst
Responsibilities:
- Developed an Internet traffic scoring platform for ad networks, advertisers, and publishers (rule engine, site scoring, keyword scoring, lift measurement, linkage analysis).
- Responsible for defining the key identifiers for each mapping/interface.
- Clients include eBay, Click Forensics, Cars.com, Turn.com, Microsoft, and Looksmart.
- Implemented a metadata repository; maintained data quality, data cleanup procedures, transformations, data standards, and the data governance program; wrote scripts, stored procedures, and triggers; and executed test plans.
- Designed the architecture for one of the first analytics 3.0 online platforms: all-purpose scoring with on-demand, SaaS, and API services; currently under implementation.
- Applied web crawling and text mining techniques to score referral domains, generate keyword taxonomies, and assess the commercial value of bid keywords (see the sketch following this list).
- Developed a new hybrid statistical and data mining technique known as hidden decision trees and hidden forests.
- Reverse engineered keyword pricing algorithms in the context of pay-per-click arbitrage.
- Coordinated meetings with vendors to define requirements and system interaction agreement documentation between client and vendor system.
- Automated bidding for advertiser campaigns based either on keyword or category (run-of-site) bidding.
- Created multimillion-keyword bid lists using extensive web crawling; identified metrics to measure the quality of each list (yield or coverage, volume, and average keyword financial value).
- Maintained the Enterprise Metadata Library with any changes or updates.
- Documented data quality and traceability for each source interface.
- Established standards of procedure.
- Generated weekly and monthly asset inventory reports.
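A toy sketch of crawling referral domains and computing a crude keyword-based commercial score; the domains, term list, and scoring rule are illustrative assumptions, not the platform's actual rule engine or scoring algorithms:

```python
import re
from collections import Counter

import requests

# Hypothetical referral domains and commercial-intent vocabulary.
domains = ["example-shop.com", "example-news.com"]
commercial_terms = {"buy", "price", "deal", "discount", "shipping"}

def score_domain(domain: str) -> float:
    """Crude commercial-intent score: share of page words that are commercial terms."""
    html = requests.get(f"http://{domain}", timeout=10).text
    words = re.findall(r"[a-z]+", html.lower())
    counts = Counter(words)
    total = sum(counts.values()) or 1
    return sum(counts[t] for t in commercial_terms) / total

for d in domains:
    print(d, round(score_domain(d), 4))
```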
Environment: Erwin r, SQL Server 2000/2005, Windows XP/NT/2000, Oracle, MS-DTS, UML, UAT, SQL Loader, OOD, OLTP, PL/SQL, MS Visio, Informatica.