Sr. Data Analyst Resume
Richmond, VA
SUMMARY:
- Over 8 years of experience working as a Data Analyst / Data Analytics Engineer.
- Experienced with big data technologies (Hadoop, Spark) and business intelligence (BI) tools.
- Experienced in using PySpark and SQL to build batch jobs and fraud defenses.
- Adept at using the SAS Enterprise suite, R, Python, and big data technologies including Hadoop, Hive, Pig, Sqoop, Cassandra, Spark, Oozie, Flume, MapReduce, and Cloudera Manager for the design of business intelligence applications.
- Hands-on experience with machine learning: regression analysis, clustering, boosting, classification, principal component analysis, and data visualization tools.
- Familiar with Crystal Reports and SSRS for query, reporting, analysis, and enterprise information management.
- Excellent knowledge of creating business intelligence reports.
- Experienced with databases including Oracle, DB2, Teradata 15/14, Netezza, SQL Server, and PostgreSQL, as well as XML data sources.
- Experienced with machine learning tools and libraries such as Python (scikit-learn), R, Spark, and Weka.
TECHNICAL SKILLS:
Data Modelling Tools: Erwin r9.6, 9.5, 9.1, 8.x, Rational Rose, ER/Studio, MS Visio
Programming Languages: Oracle PL/SQL, Julia, UNIX shell scripting, Java
Tools: Python 3.x (NumPy, SciPy, Pandas, PySpark), R (caret, Weka, ggplot2)
Big Data Technologies: Hadoop (Hive, HDFS, MapReduce, Pig, Kafka), Spark 2.2.4, Cassandra, MongoDB, AWS (RDS, DynamoDB, Redshift)
Reporting Tools: Crystal Reports XI, Business Intelligence, SSRS, Business Objects 5.x/6.x
AWS Cloud: EC2, S3, RDS, DynamoDB, Kinesis, Redshift
ETL: Informatica PowerCenter, SSIS
Project Execution Methodologies: Ralph Kimball and Bill Inmon data warehousing methodologies, Rational Unified Process (RUP), Rapid Application Development (RAD), Joint Application Development (JAD)
Operating Systems: Windows, UNIX, LINUX, Mac OS
Databases: Oracle 9i/10g/11g/12c, SQL Server, MySQL, PostgreSQL
PROFESSIONAL EXPERIENCE:
Confidential, Richmond, VA
Sr. Data Analyst
Responsibilities:
- Developed fraud batch defenses using Spark and AWS, building stable defenses against fraud with Python and Spark SQL. This involved identifying fields in the legacy Teradata database and the crosswalk-mapped fields migrated into the cloud and stored in Onelake-S3.
- Coordinated with different teams across Confidential, working across multiple work streams.
- The defenses built save Confidential millions of dollars against online fraud.
- Experienced working with all kinds of source formats, including CSV, JSON, Avro, and Parquet.
- Experienced working with datasets of billions of records coming from disparate source engines, namely AWS S3, Onelake-S3, Cerebra, and Snowflake.
- Identified the crosswalk between Teradata fields and the corresponding Onelake-S3 tables on Nebula during development.
- Used Spark mount functionality to access the Onelake datasets for consumption via the control plane, with the appropriate ASV, BAPCI, and LOB ARN IAM roles for a given Databricks NPI cluster.
- Experienced working with different source types, namely application data, process data, TSYS data, and Chordiant case data.
- Worked on moving data from Teradata to Snowflake for consumption on Databricks.
- Implemented the batch defenses on AIDE (Automated Investigations Decisioning Engine), a framework developed using the PySpark and Python APIs.
- Wrote extensive PySpark SQL queries for the defenses and validated them by comparing results against the legacy Teradata platform and Snowflake (a query sketch follows this list).
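A minimal PySpark sketch of the kind of batch defense query and validation described above; the bucket paths, table name, columns, and rule thresholds are hypothetical placeholders, not the actual defense logic:

    # Minimal sketch of a PySpark batch fraud defense; all paths, table names,
    # columns, and thresholds below are hypothetical placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("fraud_batch_defense").getOrCreate()

    # Read migrated Parquet data from an S3/Onelake-style location.
    txns = spark.read.parquet("s3://example-onelake-bucket/transactions/")
    txns.createOrReplaceTempView("transactions")

    # Example defense rule: flag accounts with unusually heavy online activity per day.
    flagged = spark.sql("""
        SELECT account_id,
               txn_date,
               COUNT(*)        AS txn_count,
               SUM(txn_amount) AS total_amount
        FROM transactions
        WHERE channel = 'ONLINE'
        GROUP BY account_id, txn_date
        HAVING SUM(txn_amount) > 10000 OR COUNT(*) > 50
    """)

    # Validation step: compare the flagged volume against the legacy extract.
    legacy = spark.read.parquet("s3://example-onelake-bucket/legacy_teradata_extract/")
    print(flagged.count(), legacy.count())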
Environment: Onelake-S3, AWS S3, Snowflake, Databricks, Spark, PySpark, Spark SQL, Python, SQL, Teradata BTEQ.
Confidential, Reston, VA
Data Analytics Engineer
Responsibilities:
- Performed scoring and financial forecasting for collection priorities using Python, R and SAS machine learning algorithms.
- Built data pipelines for reporting, alerting, and data mining. Experienced with table design and data management using HDFS, Hive, Impala, Sqoop, MySQL, and Kafka.
- Developed Python modules for machine learning & predictive analytics on AWS. Implemented a Python-based distributed random forest via Python streaming.
- Worked with statistical models for data analysis, predictive modeling, machine learning approaches, and recommendation and optimization algorithms.
- Worked on business and data analysis, data profiling, data migration, data integration, and metadata management services.
- Worked extensively with databases, primarily Oracle 11g/12c, writing PL/SQL scripts for multiple purposes.
- Built models using statistical techniques such as Bayesian HMMs and machine learning classifiers such as XGBoost, SVM, and Random Forest, using R and Python packages.
- Worked with data compliance and data governance teams to maintain data models, metadata, and data dictionaries, and to define source fields and their definitions.
- Worked with big data technologies such as Hadoop, Hive, and MapReduce.
- Set up storage and data analysis tools on Amazon Web Services cloud computing infrastructure.
- Participated in a highly immersive data science program involving data manipulation and visualization, web scraping, machine learning, Python programming, SQL, Git, Unix commands, NoSQL, MongoDB, and Hadoop.
- Handled importing data from various sources, performed transformations using Hive and MapReduce, and loaded the data into HDFS.
- Managed existing team members and led the recruiting and onboarding of a larger data science team to address analytical knowledge requirements.
- Worked directly with upper executives to define requirements of scoring models.
- Developed a model for predicting whether a debtor would set up a repayment rehabilitation program for student loan debt.
- Developed a model for predicting repayment of debt owed to small and medium enterprise (SME) businesses.
- Developed a generic model for predicting repayment of debt owed in the healthcare, large commercial, and government sectors.
- Created SQL scripts, analyzed the data in MS Access/Excel, and worked on SQL and SAS script mapping.
- Developed a legal model for predicting which debtors respond only to litigation.
- Created multiple dynamic scoring strategies for adjusting the score based on consumer behavior such as a payment or a right-party phone call.
- Rapidly created models in Python using pandas, NumPy, scikit-learn, and Plotly for data visualization; these models were then implemented in SAS, interfaced with MS SQL Server databases, and scheduled to update on a regular basis (a scoring sketch follows this list).
- Performed data analysis using regressions, data cleaning, Excel VLOOKUP, histograms, and the TOAD client, and presented the analysis and suggested solutions to investors.
- Gained strong knowledge of Hadoop data lake implementation and Hadoop architecture for client business data management.
- Identified relevant key performance factors and tested their statistical significance.
- The above scoring models resulted in millions of dollars of added revenue and a change in priorities across the entire company.
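A minimal pandas/scikit-learn sketch of the kind of collection-priority scoring model described above; the file name, feature columns, and target are hypothetical placeholders rather than the actual model inputs:

    # Hypothetical collection-priority scoring model (placeholder data and columns).
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    accounts = pd.read_csv("accounts.csv")  # placeholder extract
    features = ["balance", "days_delinquent", "prior_payments", "contact_attempts"]
    X_train, X_test, y_train, y_test = train_test_split(
        accounts[features], accounts["repaid"], test_size=0.3, random_state=42
    )

    model = RandomForestClassifier(n_estimators=300, random_state=42)
    model.fit(X_train, y_train)

    # The predicted repayment probability becomes the collection-priority score.
    scores = model.predict_proba(X_test)[:, 1]
    print("AUC:", roc_auc_score(y_test, scores))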
Environment: R, SQL, Python 2.7.x, SQL Server 2014, regression, logistic regression, random forest, neural networks, topic modeling, NLTK, SVM (Support Vector Machine), JSON, XML, Hive, Hadoop, Pig, scikit-learn, SciPy, GraphLab, NoSQL, SAS, SPSS, Spark, Kafka, HBase, MLlib.
Confidential, Alexandria, VA
Data scientist/ R Developer
Responsibilities:
- Designed an industry-standard data model specific to the company's group insurance offerings, and translated business requirements into detailed production-level designs using workflow diagrams, sequence diagrams, activity diagrams, and use case modeling.
- Involved in the design and development of the data warehouse environment; liaised with business users and technical teams, gathering requirement specification documents and presenting and identifying data sources, targets, and report generation needs.
- Conceptualized the most-used product module (Research Center) after building a business case for approval, gathering requirements, and designing the user interface.
- Member of the Analytical Group; assisted in the design and development of statistical models for end clients and coordinated with end users on the design and implementation of e-commerce analytics solutions per project proposals.
- Conducted market research for the client; developed and designed sampling methodologies and analyzed the survey data for pricing and availability of the clients' products. Investigated product feasibility through analyses including market sizing, competitive analysis, and positioning.
- Optimized Python code for a variety of data mining and machine learning tasks.
- Built programming logic for developing analysis datasets by integrating with various data marts in the sandbox environment.
- Facilitated stakeholder meetings and sprint reviews to drive project completion.
- Successfully managed projects using Agile development methodology
- Project experience in data mining, segmentation analysis, business forecasting, and association rule mining on large data sets using machine learning.
- Automated the diagnosis of blood loss during accidents by applying machine learning algorithms to vital signs (ECG, HF, GSR, etc.); demonstrated performance of 94.6%, on par with state-of-the-art models used in industry.
- Trained employees organization-wide for financial domain certification and business analysis certification exams.
- Prepared graphs and reports using the ggplot2 library to give an overview of the analytical models and results.
- Developed a Shiny application in R showcasing machine learning for improving business forecasting.
- Developed predictive models using Support Vector Machines, Decision Trees, Random Forest, and Naïve Bayes, and collaborated with marketing and DevOps teams on production deployment (a comparison sketch follows this list).
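A minimal Python sketch comparing the classifier families mentioned above via cross-validation (Python was also used for machine learning in this role); the dataset here is synthetic and the hyperparameters are illustrative assumptions:

    # Compare SVM, Decision Tree, Random Forest, and Naive Bayes on synthetic data.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

    models = {
        "SVM": SVC(kernel="rbf"),
        "Decision Tree": DecisionTreeClassifier(max_depth=6),
        "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
        "Naive Bayes": GaussianNB(),
    }

    # 5-fold cross-validated accuracy for each candidate model.
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=5)
        print(f"{name}: {scores.mean():.3f}")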
Environment: R, Windows XP/NT/2000, SQL Server 2005/2008, SQL, Oracle 8i/10g, DB2, MS Excel 2013, Mainframes, MS Visio, Crystal Reports 9, Python, RStudio, Shiny.
Confidential, NY
Data Modeler
Responsibilities:
- Responsible for technical data governance, enterprise-wide data modeling, and database design.
- Used ERwin's Model Mart for effective model management, enabling sharing, dividing, and reusing model information and designs for productivity improvement.
- Conducted detailed and comprehensive business analysis by working with IT staff, business staff, SMEs, and other stakeholders to identify system and operational requirements and process improvements.
- Designed an industry-standard data model specific to the company's group insurance offerings, and translated business requirements into detailed production-level designs using workflow diagrams, sequence diagrams, activity diagrams, and use case modeling.
- Involved in the design and development of the data warehouse environment; liaised with business users and technical teams, gathering requirement specification documents and presenting and identifying data sources, targets, and report generation needs.
- Worked with Development DBA to assist and support developers with SQL performance tuning, query tuning and code reviews
- Analyzed business requirements, system requirements, and data mapping requirement specifications, and was responsible for documenting functional and supplementary requirements in Quality Center.
- Responsible for evaluating various RDBMS and OLTP modeling, documentation, and metadata reporting tools including ERwin; developed logical and physical data models using ERwin across the subject areas based on the specifications and established referential integrity of the system.
- Worked extensively on source-to-target mapping for business needs and documentation purposes.
Environment: Erwin r8, Informatica, Windows XP/NT/2000, SQL Server 2005/2008, SQL, Oracle 8i/10g, DB2, MS Excel, Mainframes, MS Visio, Rational Rose, Requisite Pro.
Confidential
Data Modeler
Responsibilities:
- Conducted one-to-one sessions with business users to gather data for Data Warehouse requirements.
- Part of a team analyzing database requirements in detail with project stakeholders through Joint Requirements Development (JRD) sessions.
- Developed an object model in UML for the conceptual data model using Enterprise Architect.
- Developed logical and physical data models using ERwin to design OLTP systems for different applications.
- Facilitated transition of logical data models into the physical database design and recommended technical approaches for good data management practices.
- Wrote numerous SQL statements to load and extract data to and from the database for testing.
- Worked with SQL Server Integration Services to extract data from several source systems, transform it, and load it into the ODS.
- Worked with the DBA group to create a best-fit physical data model with DDL from the logical data model via forward engineering in ERwin.
- Involved in the detailed design of data marts using a star schema, and planned data marts involving shared dimensions (a DDL sketch follows this list).
- Used the Model Manager option in ERwin to synchronize the data models using the Model Mart approach.
- Gathered various reporting requirements from business analysts.
- Worked on enhancements to the Data Warehouse model using Erwin as per the business reporting requirements.
- Reverse engineered the reports and identified the data elements (in the source system), dimensions, facts, and measures required for the reports.
- Worked with the ETL team to document the transformation rules for data migration from OLTP to Warehouse environment for reporting purposes.
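A minimal sketch of the star-schema pattern referenced above (one fact table referencing shared dimensions), expressed as DDL executed through Python/sqlite3; all table and column names are hypothetical placeholders, not the actual warehouse model:

    # Illustrative star schema: two dimensions and one fact table (placeholder names).
    import sqlite3

    ddl = """
    CREATE TABLE dim_date (
        date_key      INTEGER PRIMARY KEY,
        calendar_date TEXT,
        month         INTEGER,
        year          INTEGER
    );

    CREATE TABLE dim_customer (
        customer_key  INTEGER PRIMARY KEY,
        customer_name TEXT,
        segment       TEXT
    );

    -- Fact table referencing the shared dimensions.
    CREATE TABLE fact_sales (
        date_key      INTEGER REFERENCES dim_date(date_key),
        customer_key  INTEGER REFERENCES dim_customer(customer_key),
        sales_amount  REAL,
        quantity      INTEGER
    );
    """

    conn = sqlite3.connect(":memory:")
    conn.executescript(ddl)
    print([row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")])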
Environment: Erwin r9.6, DB2, Teradata, SQL Server 2008, Informatica 8.1, Enterprise Architect, PowerDesigner, MS SSAS, Crystal Reports, SSRS, ER Studio, Lotus Notes, Windows XP, MS Excel, Word, and Access.