Data Scientist Resume
Minneapolis, MN
SUMMARY:
- 8+ years of experience in the design, administration, analysis, and management of Business Intelligence, Data Warehousing, web-based applications, and databases, with experience in industries such as Retail, Financial, Accounting, Distribution, Logistics, Inventory, Manufacturing, Marketing, Services, Networking, and Engineering
- Experience with the latest BI tools, including Tableau, QlikView dashboard design, and SAS
- Analyze and extract relevant information from large amounts of Data to help automate self-monitoring, self-diagnosing, and self-correcting solutions and optimize key processes
- Experience in Data architecture design, development, and maintenance for Windows and Android device applications
- Familiar with building models on Big Data frameworks like Cloudera Manager and Hadoop
- Experience in Big Data with Hadoop, HDFS, MapReduce, and Spark.
- Experience in managing the full life cycle of Data Science projects, including transforming business requirements into Data Collection, Data Cleaning, Data Preparation, Data Validation, Data Mining, and Data Visualization from structured and unstructured Data Sources.
- Sound RDBMS concepts; worked extensively with Oracle 8i/9i/10g/11g, DB2, SQL Server 8.0/9.0/10.0/10.5/11.0, MySQL, and MS Access.
- Expertise in Project Management, Analysis, Estimation, with a unique mix of managerial, functional, domain, technical and client handling skills
- Expertise and experience in SQL, SAS, and relational databases. Deep understanding of and exposure to Data mining.
- Excellent knowledge of Normalization (1NF, 2NF, 3NF, and BCNF) and De-normalization techniques for improved database performance in OLTP, OLAP, and Data Warehouse/Data Mart environments
- 2+ years of experience in an Agile environment covering software and Data design, development, and deployment to build services and customer support for Enterprise applications using Object-Oriented Analysis and Design (OOAD)
- Work on gigabytes of text and image files (2-D and 3-D) to solve real-world problems and visualize the Data by generating reports in Google Data Studio for customer usability
- Experience in using Python and statistical software (R, Excel, Tableau)
- Good track record of working with complex Data sets and translating Data into insights to drive key business and product decisions
- Experience with Azure, SQL and Oracle PL/SQL
- Experience working with Amazon Web Services (AWS) products like S3
- Worked in start-up mode at Aveva and contributed to projects using Amazon Web Services (AWS) to develop and deploy application support on device and cloud
- Hands-on experience with scripting languages like Perl, Bash shell, and PHP (for automation)
- Good understanding of scalable Data processing to discover hidden patterns, conducting error analysis in the Data for financial and statistical modeling
- Familiar with multiple operating systems (OS) and development environments, including Linux (Ubuntu) and Windows
- Familiar with nightly build management tools like Visual Studio Team Foundation Server (VSTFS)
- Experience developing software in traditional programming languages (C, C++) using tools like MS Visual Studio Compact Framework (VSCF) for the Windows Mobile platform
- Familiar with configuration management and source code repository management using version control systems such as Subversion (SVN), Git, and MS Team Foundation Version Control (TFVC)
- Good experience in database design using PL/SQL, SQL, and T-SQL to write Stored Procedures, Functions, Triggers, and Views.
- Extensive experience in Data Modeling, Data Analysis, and design of OLTP and OLAP systems.
- Expertise in Data Analysis, Design, Development, Implementation, and Testing using Data Conversions, Extraction, Transformation and Loading (ETL), and SQL Server, Oracle, and other relational and non-relational databases
- Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing Data mining and reporting solutions that scale across massive volumes of Structured and Unstructured Data.
TECHNICAL SKILLS:
Data Modeling Tools: Erwin r9.6/9.5, ER/Studio 9.7, Star-Schema Modeling, Snowflake-Schema Modeling, FACT and dimension tables, Pivot Tables.
Databases: Oracle 11g/12c, MS Access, SQL Server 2012/2014, Sybase, DB2, Teradata 14/15, Hive.
Big Data Tools: Hadoop, Hive, Spark, Pig, HBase, Sqoop, Flume.
BI Tools: Tableau 7.0/8.2, Tableau Server 8.2, Tableau Reader 8.1, SAP Business Objects, Crystal Reports.
Packages: Microsoft Office 2010, Microsoft Project 2010, SAP, Microsoft Visio, SharePoint Portal Server.
Operating Systems: Microsoft Windows 8/7/XP, Linux, and UNIX.
Languages: SQL, PL/SQL, T-SQL, ASP, Visual Basic, XML, Python, SQL Server, C, C++, Java, HTML, UNIX shell scripting, Perl, R.
Applications: Toad for Oracle, Oracle SQL Developer, MS Word, MS Excel, MS PowerPoint, Teradata, Designer 6i.
Methodologies: RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Waterfall Model.
PROFESSIONAL EXPERIENCE:
Confidential, Minneapolis, MN
Data Scientist
Responsibilities:
- Heavily involved in the Data Architect role, reviewing business requirements and composing source-to-target data mapping documents.
- Responsible for data architecture design delivery, data model development, review, approval, and Data Warehouse implementation.
- Set strategy and oversaw design for significant data modeling work, such as Enterprise Logical Models, Conformed Dimensions, and Enterprise Hierarchy.
- Analyzed existing Conceptual and Physical data models and altered them using Erwin to support enhancements.
- Designed the Logical Data Model using Erwin with the entities and attributes for each subject area.
- Led architectural design in Big Data and Hadoop projects, providing idea-driven design direction.
- Developed and configured the Informatica MDM hub to support the Master Data Management (MDM), Business Intelligence (BI), and Data Warehousing platforms to meet business needs.
- Loaded data into Hive tables from Hadoop Distributed File System (HDFS) to provide SQL access on Hadoop data.
- Used Agile methodology for Data Warehouse development.
- Designed and implemented data ingestion techniques for real-time and batch processes for structured and unstructured data sources into Hadoop ecosystems and HDFS clusters.
- Designed and developed architecture for a data services ecosystem spanning Relational, NoSQL, and Big Data technologies.
- Implemented multi-datacenter and multi-rack Cassandra cluster.
- Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from NoSQL and a variety of portfolios.
- Involved in data model reviews as data architect with business analysts and business users, explaining the data model to make sure it is in line with business requirements.
- Created entity-relationship diagrams and data flow diagrams, and enforced all referential integrity constraints using Rational Rose.
- Worked with the ETL team to document the SSIS packages for data extraction to Warehouse environment for reporting purposes.
- Developed data marts for the base data in Star Schema and Snowflake Schema, and was involved in developing the data warehouse for the database.
- Involved in data loading using PL/SQL scripts and SQL Server Integration Services (SSIS) packages.
- Established data governance, monitoring of data quality, and clear documentation to ease implementation.
- Involved in the validation of the OLAP, unit testing and system testing of the OLAP report functionality and the data displayed in the reports.
- Generated ad hoc SQL queries using joins, database connections, and transformation rules to fetch data from the Teradata database.
- Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios.
- Worked on Amazon Redshift and AWS, architecting a solution to load data, create data models, and run BI on it.
- Created UNIX scripts for file transfer and file manipulation.
- Directed the creation of dashboards based on business requirements using SSRS/Cognos and helped the development team understand the requirements.
- Imported data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from Oracle into HDFS using Sqoop.
- Worked with various Teradata 15 tools and utilities like Teradata Viewpoint, MultiLoad, ARC, Teradata Administrator, BTEQ, and other Teradata utilities.
- Involved in several facets of MDM implementations, including data profiling, metadata acquisition, and data migration.
- Extensively used Aginity Netezza Workbench to perform various DML, DDL, etc. operations on the Netezza database.
- Created DDL scripts using Erwin and source-to-target mappings to bring the data from source to the warehouse.
- Led database-level tuning and optimization in support of application development teams on an ad hoc basis.
Environment: R Erwin 9.7, HDFS, AWS Redshift, MapReduce, Hive 2.3, HBase, MongoDB, Cassandra, Metadata, Netezza, MySQL, Hadoop 3.0, ODS, Oracle 12c, T-SQL, MDM, PL/SQL, Teradata R15, Teradata SQL Assistant 15.0, Flat Files.
Confidential, Boston, MA
Data Scientist
Responsibilities:
- Evaluated data analytics opportunities, such as fraud detection, to improve the efficiency of the claims handling process.
- Utilized various data analysis and data visualization tools to accomplish data analysis, report design and report delivery.
- Created statistical models based on researched information to provide conclusions that will guide the company and the industry into the future.
- Handled missing data after import and encoded categorical data when needed.
- Split the data into training and test sets and scaled both sets where necessary.
- Creatively communicated and presented models to business customers and executives, utilizing a variety of formats and visualization methodologies.
- Analyzed the impact of marketing tactics on sales and forecast the impact of future sets of tactics.
- Developed Scala and SQL code to extract data from various databases.
- Used R and Python for exploratory data analysis and hypothesis testing to compare and identify the effectiveness of creative campaigns.
- Used Scala, Python, R, and SQL to create statistical algorithms involving Linear Regression, Logistic Regression, Random Forest, Decision Trees, and Support Vector Machine for estimating risks.
- Developed statistical models to forecast inventory and procurement cycles.
- Created and designed reports that use gathered metrics to infer and draw logical conclusions about past and future behaviour.
- Created pipelines for data ingestion from various channels through scripts written in Hive and Java.
- Worked with a range of proprietary, industry-standard, and open-source data stores to assemble, organize, and analyze data.
- Mapped customers to revenue to predict the revenue (if any) from a new prospective customer.
- Created visualizations, summary reports, and presentations using R and Tableau.
- Uploaded data to Hadoop Hive and combined new tables with existing databases.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Developed PySpark code and Spark SQL/Streaming for faster testing and processing of data.
- Supported MapReduce programs running on the cluster.
- Created Data Quality Scripts using SQL and Hive to validate successful data load and quality of the data.
- Scheduled jobs and used a workflow scheduler to manage Hadoop jobs.
- Loaded the aggregated data into the Data Mart for reporting, dashboarding, and ad hoc analysis using Tableau, and developed a self-service BI solution for quicker turnaround of insights.
- Maintained SQL scripts to create and populate tables in data warehouse for daily reporting across departments.
Environment: R 3.x, Python 2.x, Tableau 9, SQL Server 2012, Spark/Scala, SBT, Hive, Sqoop, Spark ML.
Confidential - Downers Grove, IL
Data Scientist
Responsibilities:
- Participated in all phases of the project life cycle, including data collection, data mining, data cleaning, model building and validation, and report creation.
- Utilized MapReduce and PySpark programs to process data for analysis reports.
- Worked on data cleaning to ensure data quality, consistency, and integrity using Pandas/Numpy.
- Performed data preprocessing on messy data including imputation, normalization, scaling, feature engineering etc. using Scikit-Learn.
- Conducted exploratory data analysis using Python Matplotlib and Seaborn to identify underlying patterns and correlations between features.
- Built classification models based on Logistic Regression, Decision Trees, Random Forest, Support Vector Machine, and Ensemble algorithms to predict the probability of patient absence.
- Applied various metrics like recall, precision, F-Score, ROC, and AUC to evaluate the performance of each model and k-fold cross-validation to test the models with different batches of data to optimize the models.
- Implemented and tested the model on AWS EC2; collaborated with the development team to get the best algorithm and parameters.
- Performed data visualization and designed dashboards with Tableau, and generated complex reports including charts, summaries, and graphs to interpret the findings for the team and stakeholders.
Environment: Python (Scikit-Learn/Keras/SciPy/NumPy/Pandas/Matplotlib/Seaborn), Machine Learning (Linear and Non-linear Regressions, Deep Learning, SVM, Decision Tree, Random Forest, XGBoost, Ensemble and KNN), MS SQL Server 2017, AWS Redshift, S3, Hadoop Framework, HDFS, Spark (PySpark, MLlib, Spark SQL), Tableau Desktop and Tableau Server.
Confidential - New Albany, OH
Data Analyst/Data Modeler
Responsibilities:
- Analyzed and reviewed functional specifications and requirements to determine best data design approach and translate business requirements into data models.
- Created models for various schemas and created the metadata needed to deploy the models into MicroStrategy so that the definitions could be reused enterprise-wide.
- Performed data ingestion for the incoming web feeds into the Data Lake store, which includes both structured and unstructured data.
- Created the architectural artifacts for the Enterprise Data Warehouse and the Operational Dashboard, such as Entity Relationship Diagrams (ERD), the DDL scripts, the Conceptual Data Model, and technical as well as business documents.
- Conducted data profiling to ensure that the available data could support business needs. Worked with the developers on resolving the reported bugs and various technical issues.
- Involved in requirements gathering activities to analyze and document business processes, fundamentals, and strategic data needs.
- Created data source views from MySQL and Hadoop data sources.
- Migrated retired systems to new systems and customized them according to business requirements.
- Enforced database naming standards and maintained user domains.
- Supported data conversion activities and coordinated the resolution of conversion and data migration issues.
- Created and maintained the Data Dictionary and worked to reach consensus on it.
- Created data lineages and mappings for Data Lake schemas.
- Ensured error logs and audit tables were generated and populated properly.
- Involved in troubleshooting, resolving and escalating data related issues and validating data to improve data quality.
- Tracked and reported issues to the project team and management.
- Created mapping for horizontal data lineages for various systems.
- Contributed to the development of knowledge transfer documentation.
- Managed change requests by following change request management process for the project.
- Involved in preparing a simple yet detailed user guide and training manual for the application, intended for novice users.
Environment: Erwin 9.64, MS Access, MicroStrategy, MySQL, Oracle 10g, HeidiSQL, Hadoop, Toad 12.5, MS Visio, SVN
Confidential
Data Analyst
Responsibilities:
- Extensively worked on Informatica PowerCenter Transformations such as Source Qualifier, Lookup, Filter, Expression, Router, Joiner, Update Strategy, Rank, Aggregator, Sequence Generator etc.
- Used Informatica PowerCenter to design data conversions from a wide variety of sources.
- Used Informatica Workflow Manager and Workflow Monitor to create, schedule, and control workflows, tasks, and sessions.
- Created pivot tables and ran VLOOKUPs in Excel as part of data validation.
- Used Informatica PowerCenter for extraction, transformation, and loading (ETL) of data in the data warehouse.
- Worked on data analysis and data discrepancy reduction in the source and target schemas.
- Designed and developed complex mappings using varied transformation logic such as Unconnected and Connected Lookups, Router, Filter, Expression, Aggregator, Joiner, Update Strategy, and more.
- Prepared System Requirements Specifications (SRS), Database Specifications (DBS), and Software Design Documents (SDD).
- Responsible for the maintenance of a few applications in PowerBuilder 10.2.
- Used SQL Server 2005 to fix production issues in the background.
- Coordinated delivery and quality activities.
- Involved in testing, with validation of all fields, functions, programs, and agents from front-end and back-end code reviews across the application.
- Involved in preparing program specifications, unit tests, test cases, and user manual documents.
Environment: Informatica 8.x, PowerBuilder 10.2, SQL Server 2005.
Confidential
Data Analyst
Responsibilities:
- Processed data received from vendors and loaded it into the database. The process was carried out weekly and reports were delivered bi-weekly. The extracted data had to be checked for integrity.
- Documented requirements and obtained signoffs.
- Coordinated between the Business users and development team in resolving issues.
- Documented data cleansing and data profiling.
- Wrote SQL scripts to meet the business requirement.
- Analyzed views and produced reports.
- Tested cleansed data for integrity and uniqueness.
- Automated the existing system to achieve faster and more accurate data loading.
- Generated weekly and bi-weekly reports for the client business team using Business Objects and documented them.
- Learned to create Business Process Models.
- Managed multiple projects simultaneously, tracking them effectively against varying timelines through a combination of business and technical skills.
- Good understanding of clinical practice management, medical and laboratory billing, and insurance claim processing, with process flow diagrams.
- Assisted the QA team in creating test scenarios that cover a day in the life of the patient for Inpatient and Ambulatory workflows.
Environment: SQL, data profiling, data loading, QA team.
