Data Analyst / Data Modeler Resume
St. Louis, MO
SUMMARY
- Data Analyst with 8+ years of professional experience in Finance, Retail, and E-commerce, performing statistical modelling, data extraction, cleaning, screening, exploration, and visualization of structured and unstructured datasets, and implementing large-scale Machine Learning and Deep Learning algorithms to deliver insights and inferences that significantly impacted business revenue and user experience.
- Experienced in facilitating the entire lifecycle of a data science project: data extraction, data pre-processing, feature engineering, dimensionality reduction, algorithm implementation, back-testing, and validation.
- Proficient in data transformations using log, square-root, reciprocal, cube-root, square, and Box-Cox transformations, depending upon the dataset.
- Adept at analysis of missing data by exploring correlations and similarities, introducing dummy variables for missingness, and choosing from imputation methods such as IterativeImputer in Python.
- Experienced in Machine Learning regression and classification models such as Linear, Polynomial, and Logistic Regression, Decision Trees, Support Vector Machines, and K-NN.
- Experienced in ensemble learning using Random Forests and clustering techniques such as K-Means.
- In-depth knowledge of Dimensionality Reduction (PCA, t-SNE), hyper-parameter tuning, model regularization (Ridge, Lasso, Elastic Net), and Grid Search techniques to optimize model performance (an illustrative sketch of these techniques follows this summary).
- Experienced in developing Artificial Neural Networks, Deep Learning models, and Convolutional Neural Networks to implement AI solutions.
- Proficient in data visualization tools such as Tableau and Power BI, Big Data tools such as Hadoop HDFS, Spark, and MapReduce, and databases such as MySQL, Oracle SQL, and Redshift SQL.
- Skilled in Big Data technologies like Spark, Spark SQL, PySpark, HDFS (Hadoop), MapReduce, Kafka, Apache Pig, and Apache Hive.
- Experience in web data mining with Python's Scrapy and Beautiful Soup packages, along with working knowledge of Natural Language Processing (NLP) to analyse text patterns.
- Excellent exposure to Data Visualization with Tableau, PowerBI, Seaborn, Matplotlib and ggplot2.
- Experience with Python libraries including NumPy, Pandas, SciPy, Scikit-Learn, Matplotlib, Seaborn, geopy, and NLTK, and R libraries such as ggplot2, dplyr, lattice, and highcharter.
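Illustrative only: a minimal Python sketch, with synthetic data and hypothetical parameter values, of the transformation, imputation, and regularized model tuning techniques listed above (SciPy and scikit-learn assumed available).

```python
# Illustrative sketch with synthetic data: a Box-Cox transformation,
# MICE-style iterative imputation of missing values, and a Ridge
# regression tuned with grid search. Parameter grids are hypothetical.
import numpy as np
from scipy import stats
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.gamma(shape=2.0, scale=3.0, size=(200, 4))          # skewed features
y = X @ np.array([1.5, -2.0, 0.7, 0.3]) + rng.normal(size=200)

# Box-Cox requires strictly positive values; transform one skewed column.
X[:, 0], _ = stats.boxcox(X[:, 0])

# Introduce missing values, then impute them iteratively.
X[rng.random(X.shape) < 0.05] = np.nan
X = IterativeImputer(random_state=0).fit_transform(X)

# Regularized regression with hyper-parameter tuning via grid search.
grid = GridSearchCV(Ridge(), {"alpha": [0.01, 0.1, 1.0, 10.0]}, cv=5)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```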
TECHNICAL SKILLS
Programming Languages: Python (NumPy, Pandas, SciPy), Java
Relational Database: MySQL 5.5, MS SQL Server 2008/2012, MS Access
Cloud Technologies: Microsoft Azure
Amazon Web Services: S3, EMR, Lambda
Google Cloud Platform: GCS, Dataproc, Pub/Sub
Data Visualization: Tableau, Power BI, SQL Server Reporting Services (SSRS), Excel Reports
Operating System: Windows, Linux, Mac OS
PROFESSIONAL EXPERIENCE
Confidential, St. Louis, MO
Data Analyst / Data Modeler
Responsibilities:
- Developed and automated solutions for a new billing and membership Enterprise Data Warehouse, including ETL routines, tables, maps, materialized views, and stored procedures, incorporating Informatica and Oracle PL/SQL toolsets.
- Performed data analysis and statistical analysis and generated reports, listings, and graphs using SAS tools: SAS/Base, SAS/Macros, SAS/Graph, SAS/SQL, SAS/Connect, and SAS/Access.
- Worked on claims data, extracting data from various sources such as flat files, Oracle, and Mainframes, and used data investigation, discovery, and mapping tools to scan every data record from many sources.
- Responsible for gathering requirements, defining project guidelines and the data migration strategy, and reporting to the Director of Data Management; covered data modelling, data profiling, data visualization, Big Data, streamlining of operations and finance, efficiency improvement, design workshops, JAD sessions, and UAT.
- Performed extensive data validation by writing several complex SQL queries, carried out back-end testing, resolved data quality issues, and worked with end users to gain an understanding of the information and core data concepts behind their business (an illustrative validation sketch follows this list).
- Generated ad-hoc SQL queries using joins, database connections, and transformation rules to fetch data from legacy DB2 and SQL Server database systems; exhaustively collected business and technical metadata and maintained naming standards.
- Installed and configured the Virtual Data Port (VDP) database in Denodo; managed end-to-end data flow through the virtualization layer at the data warehouse.
- Followed the Software Development Life Cycle (SDLC) to configure and develop processes, standards, and procedures; maintained continuous communication with end users, stakeholders, project managers, SMEs, technical teams, and quality analysts; prepared process flow/activity diagrams for the existing system using MS Visio and re-engineered the design based on business requirements.
- Analysed business requirements, system requirements, and data mapping requirement specifications, and was responsible for documenting functional and supplementary requirements in Quality Center.
- Tested Complex ETL Mappings and Sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables to target tables.
- Troubleshot data integration issues and bugs, analysed reasons for failure, implemented optimal solutions, and revised procedures and documentation as needed.
- Performed monthly quantitative reviews of all data for quality and worked with upstream data producers to track and improve data quality as needed; led additional research to address unexpected findings.
- Proficient in handling complex processes using SAS/Base, SAS/SQL, SAS/STAT, SAS/Graph, Merge, Join and Set statements, and SAS/ODS.
- Supported the production of the yearly, quarterly, and monthly HEDIS deliverables, Quality dashboard, and Benchmark reporting by increasing the availability of HEDIS reports from a yearly to a monthly run frequency and performing monthly quality checks on the data.
- Participated in the review, analysis, and reporting of HEDIS measure results to business teams, management, auditors, and regulatory and rating agencies.
- Created graphical representations of reports such as bar charts and pie charts, as per end-user requirements, using Business Objects and Crystal Reports.
- Performed business area analysis and logical and physical data modelling using Erwin for data warehouse/data mart applications as well as operational application enhancements and new development; the data warehouse/data mart design was implemented using the Ralph Kimball methodology.
- Prepared ETL technical Mapping Documents along with test cases for each Mapping for future developments to maintain SDLC and Migration process using SSIS.
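Illustrative only: the validation described above ran as SQL against DB2 and SQL Server; the sketch below uses SQLite and an invented claims staging table solely so the row-count, duplicate-key, and null checks are runnable as shown.

```python
# Hypothetical illustration of back-end data validation checks.
# The real work targeted DB2 / SQL Server; SQLite is used here only
# so the sketch is self-contained. Table and column names are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE claims_stg (claim_id INTEGER, member_id INTEGER, amount REAL);
    INSERT INTO claims_stg VALUES (1, 10, 120.5), (2, 11, NULL), (2, 11, NULL);
""")

checks = {
    "row_count": "SELECT COUNT(*) FROM claims_stg",
    "duplicate_claim_ids": (
        "SELECT COUNT(*) FROM (SELECT claim_id FROM claims_stg "
        "GROUP BY claim_id HAVING COUNT(*) > 1) AS d"
    ),
    "null_amounts": "SELECT COUNT(*) FROM claims_stg WHERE amount IS NULL",
}
for name, sql in checks.items():
    print(name, conn.execute(sql).fetchone()[0])
conn.close()
```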
Confidential, CA
Data Analyst
Responsibilities:
- Performed Data Modelling using Erwin Data Modeller for creating entities, domains, relationships, indexes, and forward and reverse engineering.
- Used conditional logic, cross-referencing, and record filtering from various sources as part of data mapping.
- Led the migration of MicroStrategy reports and data from Netezza to IIAS.
- Experienced in working with Mobile dashboard and Transaction Services.
- Created ad hoc reports using complex SQL queries including window functions, CTEs, recursive CTEs, subqueries, and derived tables.
- Consumed RESTful APIs from various sources across the business systems and clients.
- Created Volatile and Global Temporary tables on Teradata.
- Developed DDL and DML statements, created macros and stored procedures using the BTEQ utility in Teradata, and identified the compression feature to apply to data to reduce storage used by tables.
- Enhanced performance tuning by resolving spool space issues, handling data conversions, and dropping temporary tables.
- Created automated tasks to improve operational efficiency for report deliveries.
- Built ETL pipelines in DataStage and supported job failures from Tidal by connecting to servers and debugging with UNIX scripts.
- Performed analysis on large volumes of transactional, customer, and text data and managed email subscriptions for clients.
- Developed and maintained internal and external client-based reports; worked on IIAS (IBM Integrated Analytics System) to perform data analysis and business reporting.
- Experience working on Narrowcast Services for scheduling jobs and report triggers.
- Extracted data from Azure Data Lake into an HDInsight cluster (Intelligence + Analytics) and applied Spark transformations and actions before loading into HDFS and Netezza tables.
- Performed analysis using SQL API on top of Azure Cosmos DB for customer related JSON data.
- Prepared documentation on standards and best practices pertaining to the administration, deployment, and support of MicroStrategy, uploaded it to Confluence, and managed user security through security roles and security filters (MicroStrategy Admin).
- Worked on Schema design, creating various calculated metrics, advanced thresholds, report subscriptions and report optimization.
- Created and integrated MicroStrategy reports and objects (attributes, filters, facts, prompts, and custom groups).
- Delivered reporting services, Power BI dashboard reports, and Crystal Reports to the decision makers for strategic planning and Daily Ledger Balance reporting.
- Helped develop PL/SQL packages, conducted unit testing, coordinated the production deployment, and resolved post-implementation issues; engaged in data profiling to integrate data from different sources.
- Performed data wrangling to clean, transform, and reshape the data utilizing the pandas library (sketched after this list).
- Analysed data using SQL, Scala, Python, and presented analytical reports to management and technical teams.
- Extracted data from a SQL Server database, copied it into the HDFS file system, and used Hadoop tools such as Hive to retrieve and analyse the data required for building models.
- Worked on handling CSV tables, performing aggregations, filters and joins in Hive.
- Reviewed the performance of the models and suggested schema enhancements for performance tuning of reports and dashboards.
- Managed the offshore team and created data stories that a non-technical audience could also understand.
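Illustrative only: a minimal pandas sketch, with synthetic data and invented column names, of the clean/transform/reshape wrangling mentioned above.

```python
# Illustrative pandas wrangling with synthetic data: de-duplicate,
# coerce messy text to numbers, and reshape from wide to long format.
import pandas as pd

raw = pd.DataFrame({
    "customer": ["a", "b", "b", "b"],
    "jan_amt":  ["100", " 80", " 80", None],
    "feb_amt":  ["120", "95", "95", "70"],
})

clean = raw.drop_duplicates().copy()          # drop exact duplicate rows
for col in ["jan_amt", "feb_amt"]:
    clean[col] = pd.to_numeric(clean[col].str.strip(), errors="coerce")

# Wide-to-long reshape: one row per customer/month observation.
long = (
    clean.melt(id_vars="customer", var_name="month", value_name="amount")
         .dropna(subset=["amount"])
)
print(long)
```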
Environment: SQL, Python, Erwin, MicroStrategy, Azure, Putty, Power BI, Big Data Tools (Hive), AWS, Machine Learning.
Confidential
Data Analyst
Responsibilities:
- Developed Stored Procedures, Views, Triggers, Indexes, User defined Functions, Constraints on various database objects to obtain the required results.
- Developed SSIS packages to load the staging tables, data marts, and to run jobs. Data sources include flat files, spreadsheet, Access, and other third-party sources.
- Created derived columns from the present columns for the given requirements.
- Used Text Files/ SQL Server Logging for ETL load at package level and task level to log number of records processed by each package and each task in a package using SSIS.
- Identified long running stored procedures using Execution Plan and optimized queries for more effective data retrieval.
- Deployed and supported SSIS packages on the production server, and jobs were scheduled to run the packages.
- Analysed data by defining quality requirements, making recommendations for changes to the source system, working with users to correct historical data, and applying standard data cleansing in Integration Services flows using SQL Server.
- Involved in performing partitions on the SQL table, and performance tuning such as building indexes, dynamic cube creation.
- Developed and extended SSAS cubes, partitioning, and data mining models; deployed and processed SSAS objects; wrote MDX queries.
- Helped business users reconcile reports generated from the Data Warehouse with reports from legacy systems, and validated the data before and after the migration/integration.
- Created drill-through, drill-down, cross-tab, sub-report, and periodic reports based on statistical analysis of the data using SQL Server Reporting Services and deployed them onto SharePoint.
- Applied normalization to 3NF and de-normalization techniques for optimum performance in relational and dimensional database environments.
- Wrote standard SQL queries to perform data validation and created Excel summary reports (pivot tables and charts); see the pivot summary sketch after this list.
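Illustrative only: a small pandas sketch, with invented data, of summarising validated detail rows into an Excel pivot-style report (the .xlsx export assumes the openpyxl engine is installed).

```python
# Hypothetical illustration: validated detail rows summarised into an
# Excel pivot-style report. Data and file name are invented.
import pandas as pd

detail = pd.DataFrame({
    "region":  ["East", "East", "West", "West"],
    "product": ["A", "B", "A", "B"],
    "sales":   [100, 150, 90, 60],
})

summary = detail.pivot_table(index="region", columns="product",
                             values="sales", aggfunc="sum", margins=True)
summary.to_excel("sales_summary.xlsx", sheet_name="Pivot")
print(summary)
```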
Confidential
Data Analyst
Responsibilities:
- Gathered business requirements, definition, and design of the data sourcing, and worked with the data warehouse architect on the development of logical data models.
- Created sophisticated visualizations, calculated columns, and custom expressions, and developed map charts, cross tables, bar charts, tree maps, and complex reports involving property controls and custom expressions.
- Investigated market sizing, competitive analysis, and positioning for product feasibility. Worked on Business forecasting, segmentation analysis, and Data mining.
- Extensively used Agile methodology as the Organization Standard to implement the data Models.
- Created several types of data visualizations using Python and Tableau, and extracted large volumes of data from AWS using SQL queries to create reports.
- Performed reverse engineering using Erwin to redefine entities, attributes, and relationships in the existing database.
- Analysed functional and non-functional business requirements, translated them into technical data requirements, and created or updated existing logical and physical data models.
- Created customized Web Intelligence reports from various sources of data.
- Created AWS DMS replication instances to pick up data from the source endpoint and place it in the target endpoint.
- Created Snowflake roles, databases, warehouse & schemas, and granted permission for the same.
- Implemented a storage integration to access files in S3 and load the data into Snowflake.
- Implemented CDC logic using Snowpipe, streams, and tasks.
- Created a Snowflake external table holding the entire dataset, including historical data, and created a view on top of the external table that displays only the current state of the data by eliminating history.
- Scheduled the jobs using AWS CloudWatch service to monitor the hourly, daily & weekly frequencies.
- Built data pipelines to ingest the structured data.
- Set up AWS SNS for reporting error messages and error handling.
- Loaded data into the warehouse from different flat files.
- The project involved migrating on-premises Oracle data to a cloud data lake (AWS S3) and implementing ETL pipelines using Snowflake and Python (a hedged loading sketch follows this list).
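Illustrative only: a hedged sketch of the Snowflake/Python loading step using the snowflake-connector-python package; the account, credentials, stage, and table names are placeholders rather than the actual project objects.

```python
# Hedged sketch: COPY S3-staged files into Snowflake from Python.
# All object names and credentials below are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",        # placeholder
    user="etl_user",             # placeholder
    password="***",              # use a secrets manager in practice
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="STAGING",
)
try:
    cur = conn.cursor()
    # COPY data from an external S3 stage into a staging table.
    cur.execute("""
        COPY INTO STAGING.ORDERS_RAW
        FROM @S3_LANDING_STAGE/orders/
        FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
        ON_ERROR = 'CONTINUE'
    """)
    print(cur.fetchall())        # per-file load results
finally:
    conn.close()
```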
Confidential
Data Analyst
Responsibilities:
- Worked in Software Development Life Cycle (SDLC) and Waterfall Methodologies.
- Imported the claims data into Python using the pandas library and performed various data analyses.
- Worked on Natural Language Processing with Python's NLTK module to develop an automated customer-response application (see the NLTK sketch at the end of this section).
- Utilized Informatica toolset (Informatica Data Explorer and Data Quality) to inspect legacy data for data profiling.
- Made predominant use of Python's Matplotlib package and Power BI to visualize and graphically analyse the data.
- Developed SSIS packages using a Foreach Loop container in the Control Flow to process all Excel files within a folder and a File System Task to move each file into an Archive folder after processing.
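Illustrative only: a minimal NLTK sketch, with invented intent keywords, of the kind of preprocessing and keyword matching behind an automated customer-response flow.

```python
# Hypothetical sketch: tokenize, lowercase, drop stopwords, then match
# the remaining tokens against invented intent keywords.
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)

INTENT_KEYWORDS = {                      # hypothetical intents
    "claim_status": {"claim", "status"},
    "billing":      {"bill", "invoice", "payment"},
}

def classify(message: str) -> str:
    tokens = {t.lower() for t in word_tokenize(message) if t.isalpha()}
    tokens -= set(stopwords.words("english"))
    for intent, keywords in INTENT_KEYWORDS.items():
        if tokens & keywords:
            return intent
    return "fallback"

print(classify("What is the status of my claim?"))   # -> claim_status
```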