
Senior Data Analyst Resume


CA

SUMMARY

  • 8+ years of technical IT experience in all phases of the Software Development Life Cycle (SDLC), with skills in data analysis, design, development, testing, and deployment of software systems.
  • Extensive knowledge of Business Intelligence and Data Warehousing concepts, with emphasis on ETL and the System Development Life Cycle (SDLC).
  • Excellent working knowledge in Data Analysis, Data Validation, Data Cleansing, Data Verification, and identifying data mismatches.
  • Extensive experience in relational data modeling, dimensional data modeling, logical/physical design, ER diagrams, and OLTP and OLAP system study and analysis.
  • Proficiency in multiple databases, including MongoDB, Cassandra, MySQL, Oracle, and MS SQL Server.
  • Worked on different file formats such as delimited files, Avro, JSON, and Parquet; Docker container orchestration using ECS, ALB, and Lambda.
  • Utilized all Tableau tools including Tableau Desktop, Tableau Server, Tableau Reader, and Tableau Public.
  • Experience in creating Power BI Dashboards (Power View, Power Query, Power Pivot, Power Maps).
  • Utilized analytical applications like R, SPSS, Rattle and Python to identify trends and relationships between different pieces of data, draw appropriate conclusions and translate analytical findings into risk management and marketing strategies that drive value.
  • Strong understanding of the principles of data warehousing, fact tables, dimension tables, and Star and Snowflake schema modeling.
  • Experienced working with Excel Pivot Tables and VBA macros for various business scenarios.
  • Strong experience in Data Analysis, Data Migration, Data Cleansing, Transformation, Integration, Data Import, and Data Export using multiple ETL tools such as Ab Initio and Informatica PowerCenter.
  • Experience in testing and writing SQL and PL/SQL statements - stored procedures, functions, triggers, and packages.
  • Created Snowflake schemas by normalizing the dimension tables as appropriate, and created a sub-dimension named Demographic as a subset of the Customer dimension.
  • Hands-on experience in test-driven development (TDD), behavior-driven development (BDD), and acceptance test-driven development (ATDD) approaches.
  • Experience in developing data pipelines using AWS services including EC2, S3, Redshift, Glue, Lambda functions, Step Functions, CloudWatch, SNS, DynamoDB, and SQS.
  • Extensive experience in text analytics, generating data visualizations using R and Python, and creating dashboards using tools like Tableau and Power BI.
  • Involved in data migration projects from DB2 and Oracle to Teradata; created automated scripts to do the migration using UNIX shell scripting and Oracle/TD SQL.
  • Proficient with Data Warehouse models like Star Schema and Snowflake Schema
  • Expertise in Java programming, with a good understanding of OOP, I/O, collections, exception handling, lambda expressions, and annotations.
  • Worked in the Microsoft Azure environment (Blob Storage, Data Lake, AzCopy) using Hive as the extraction language.
  • Experienced in building automated regression scripts for validation of ETL processes between multiple databases such as Oracle, SQL Server, Hive, and MongoDB using Python (see the validation sketch following this summary).
  • Extensively worked with the Teradata utilities FastExport and MultiLoad to export and load data to/from different source systems, including flat files.
  • Utilized Azure Data Factory for transforming and moving data from virtual machines to Data Factory, Blob Storage, and SQL Server.
  • Experience in Tableau Desktop and Power BI for data visualization, reporting, and analysis: crosstabs, scatter plots, geographic maps, pie charts, bar charts, page trails, and density charts.
  • Worked with the Informatica Data Quality toolkit: analysis, data cleansing, data matching, data conversion, exception handling, and the reporting and monitoring capabilities of IDQ.
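For illustration, a minimal sketch of the kind of Python regression check mentioned above for validating an ETL load between Oracle and MongoDB; the connection details, table name, and collection name are hypothetical placeholders, not the actual validation scripts.

```python
# Hypothetical sketch: reconcile row counts between an Oracle source table
# and a MongoDB target collection after an ETL load.
import cx_Oracle              # assumes the Oracle client libraries are installed
from pymongo import MongoClient

ORACLE_DSN = "dbhost:1521/ORCLPDB1"        # hypothetical connection string
SOURCE_TABLE = "SALES.ORDERS"              # hypothetical source table
TARGET_DB, TARGET_COLLECTION = "etl_db", "orders"  # hypothetical target

def oracle_row_count(user: str, password: str) -> int:
    """Count rows in the Oracle source table."""
    conn = cx_Oracle.connect(user=user, password=password, dsn=ORACLE_DSN)
    try:
        cur = conn.cursor()
        cur.execute(f"SELECT COUNT(*) FROM {SOURCE_TABLE}")
        return cur.fetchone()[0]
    finally:
        conn.close()

def mongo_doc_count(uri: str) -> int:
    """Count documents in the MongoDB target collection."""
    client = MongoClient(uri)
    return client[TARGET_DB][TARGET_COLLECTION].count_documents({})

def validate(user: str, password: str, mongo_uri: str) -> None:
    src = oracle_row_count(user, password)
    tgt = mongo_doc_count(mongo_uri)
    status = "PASS" if src == tgt else "FAIL"
    print(f"{status}: source={src}, target={tgt}")

if __name__ == "__main__":
    validate("etl_user", "secret", "mongodb://localhost:27017")
```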

TECHNICAL SKILLS

Programming Languages: R, Python, MATLAB, VB, Java, C, C++, SQL, MySQL, PL/SQL

ETL Tools: Informatica Power Center 9.1/8.6 (Designer, Workflow Manager/ Monitor, Repository), Ab Initio

Testing Tools: Jira, HP ALM, IBM ClearQuest, IBM RQM, MTM, SDLC

Database Tools: Oracle SQL Developer, Toad, Oracle 10g/11g/12c, MS SQL Server, SSIS, SSRS, Data Grid

BI and Analytics Tools: OBIEE, Oracle Reports Builder, Spotfire, Tableau 10.5, Pandas, Seaborn, Matplotlib, Cognos, Excel, SAS, SAS Enterprise Miner

Operating System/Framework: Windows, Linux, Macintosh, UNIX

Cloud Technologies: AWS (Amazon Web Services), Microsoft Azure

Data Modeling: Regression Modeling, Time Series Modeling, PDE Modeling, Star-Schema Modeling, Snowflake-Schema Modeling, FACT and Dimension tables

PROFESSIONAL EXPERIENCE

Confidential, CA

Senior Data Analyst

Responsibilities:

  • Created data dictionary, Data mapping for ETL and application support, DFD, ERD, mapping documents, metadata, DDL and DML as required.
  • Analysis of functional and non-functional categorized data elements for data profiling and mapping from source to target data environment.
  • Developed working documents to support findings and assign specific tasks
  • Analyzed DB discrepancies and synchronized the Staging, Development, UAT and Production DB environments with data models.
  • Worked on data profiling and development of various data quality rules using Informatica Data Quality.
  • Performed data analysis and data profiling using complex SQL on various source systems, including Oracle and Teradata.
  • Constructed AWS data pipelines using VPC, EC2, S3, Auto Scaling Groups (ASG), EBS, Snowflake, IAM, CloudFormation, Route 53, CloudWatch, CloudFront, and CloudTrail.
  • Wrote AWS Lambda code in Python for nested JSON files: converting, comparing, sorting, etc. (see the Lambda sketch following this list).
  • Experienced in ETL concepts, building ETL solutions and Data modeling
  • Implemented Apache Airflow for authoring, scheduling, and monitoring Data Pipelines
  • Architected, Designed and Developed Business applications and Data marts for reporting.
  • Involved in different phases of the development life cycle, including analysis, design, coding, unit testing, integration testing, review, and release, as per the business requirements.
  • Designed several DAGs (Directed Acyclic Graphs) for automating ETL pipelines (see the Airflow sketch at the end of this role).
  • Performed data extraction, transformation, loading, and integration in data warehouse, operational data stores and master data management
  • Performed Extraction, Transformation and Loading (ETL) using Informatica power center.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team using Tableau.
  • Writing UNIX shell scripts to automate the jobs and scheduling cron jobs for job automation using commands with Crontab.
  • Connected to AWS EC2 using SSH and ran spark-submit jobs.
  • Designed and implemented configurable data delivery pipeline for scheduled updates to customer facing data stores built with Python.
  • Designed and constructed AWS data pipelines using various AWS resources, including API Gateway, AWS Lambda, Snowflake, DynamoDB, and S3; API Gateway receives the response from a Lambda function that retrieves data from Snowflake and converts it into JSON format.
  • Proficient in machine learning techniques (decision trees, linear/logistic regression) and statistical modeling.
  • Optimized the TensorFlow model for efficiency.
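For illustration, a minimal sketch of a Python Lambda handler of the kind described in this list for nested JSON files; the event shape, bucket, key, and sort field are hypothetical.

```python
# Hypothetical sketch: a Lambda handler that reads a nested JSON file from S3,
# flattens it, and returns the records sorted by a chosen field.
import json
import boto3

s3 = boto3.client("s3")

def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into dotted keys, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    items = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            items.update(flatten(value, new_key, sep))
        else:
            items[new_key] = value
    return items

def handler(event, context):
    # Bucket/key are assumed to arrive in the triggering event (hypothetical shape).
    bucket = event["bucket"]
    key = event["key"]
    sort_field = event.get("sort_field", "customer.id")   # hypothetical field

    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    records = [flatten(r) for r in json.loads(body)]
    records.sort(key=lambda r: str(r.get(sort_field, "")))

    return {"statusCode": 200, "body": json.dumps(records)}
```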

Environment: PySpark, Apache Beam, Erwin, AWS (EC2, Lambda, S3, VPC, Snowflake, CloudTrail, CloudWatch, Auto Scaling, IAM, DynamoDB), Cloud Shell, Tableau, Cloud SQL, MySQL, Postgres, SQL Server, Python, Scala, Spark, Informatica, Spark SQL, NoSQL, MongoDB, TensorFlow, Jira, GitLab.
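For illustration, a minimal Airflow DAG sketch of the kind used to automate the ETL pipelines in this role; the DAG id, schedule, and task callables are placeholders (Airflow 2.x style imports), not the production pipeline.

```python
# Hypothetical sketch of an Airflow DAG scheduling a daily extract -> transform -> load run.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract(**_):
    print("pull data from the source system")

def transform(**_):
    print("apply business rules and conform the data")

def load(**_):
    print("write the result to the warehouse")

with DAG(
    dag_id="daily_etl_sketch",          # hypothetical DAG id
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    t_extract >> t_transform >> t_load
```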

Confidential, VA

Senior Data Analyst

Responsibilities:

  • Analysis of functional and non-functional categorized data elements for data profiling and mapping from source to target data environment.
  • Developed working documents to support findings and assign specific tasks
  • Involved with data profiling for multiple sources and answered complex business questions by providing data to business users.
  • Worked with data investigation, discovery, and mapping tools to scan every single data record from many sources.
  • Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data from different sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, as well as write-back in the reverse direction.
  • Involved in all steps and the scope of the project's reference data approach to MDM; created a data dictionary and mapping from the sources to the target in the MDM data model.
  • Writing UNIX shell scripts to automate the jobs and scheduling cron jobs for job automation using commands with Crontab.
  • Utilized Power BI to create various analytical dashboards that help business users get quick insights into the data.
  • Wrote research reports describing the experiments conducted, results, and findings, and made strategic recommendations to technology, product, and senior management.
  • Worked closely with regulatory teams; prepared an ETL technical document maintaining the naming standards.
  • Wrote production-level machine learning classification models and ensemble classification models from scratch using Python and PySpark to predict binary values for certain attributes within a certain time frame (see the classification sketch following this list).
  • Made Power BI reports more interactive by using storytelling features such as bookmarks, selection panes, and drill-through filters; also created custom visualizations using the R programming language.
  • Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process the data using the SQL activity.
  • Built an ETL job that invokes a Spark JAR, which executes the business analytical model.
  • Worked on data that was a combination of unstructured and structured data from multiple sources and automated the cleaning using Python scripts (see the cleaning sketch at the end of this role).
  • Prepared dashboards using calculated fields, parameters, calculations, groups, sets, and hierarchies in Tableau.
  • Prepared technical specification to load data into various tables in Data Marts.
  • Created deployment groups in one environment for the Workflows, Worklets, Sessions, Mappings, Source Definitions, and Target Definitions, and imported them to other environments.
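For illustration, a minimal PySpark sketch of a binary classification pipeline like the ones described in this list; the input path, feature columns, and the choice of a random forest model are assumptions for the example.

```python
# Hypothetical sketch of a PySpark binary-classification pipeline.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("binary-classifier-sketch").getOrCreate()

# Assumed input: numeric feature columns plus a 0/1 label column.
df = spark.read.parquet("s3://example-bucket/training-data/")   # hypothetical path
feature_cols = ["feature_a", "feature_b", "feature_c"]          # hypothetical features

assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
rf = RandomForestClassifier(labelCol="label", featuresCol="features", numTrees=100)
pipeline = Pipeline(stages=[assembler, rf])

train, test = df.randomSplit([0.8, 0.2], seed=42)
model = pipeline.fit(train)

predictions = model.transform(test)
auc = BinaryClassificationEvaluator(labelCol="label").evaluate(predictions)
print(f"AUC on the held-out split: {auc:.3f}")
```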

Environment: Power BI, NoSQL, Data Lake, Zookeeper, Python, Tableau, Azure, ADF, UNIX/Linux Shell Scripting, PyCharm, Informatica PowerCenter
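For illustration, a minimal pandas sketch of the kind of Python cleaning script used in this role for mixed structured/unstructured feeds; the file path, business key, and column names are placeholders.

```python
# Hypothetical sketch: standardize a raw extract before loading it downstream.
import pandas as pd

def clean(path: str) -> pd.DataFrame:
    df = pd.read_csv(path)

    # Normalize column names coming from inconsistent source extracts.
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]

    # Drop exact duplicates and rows missing the business key (hypothetical key).
    df = df.drop_duplicates().dropna(subset=["customer_id"])

    # Standardize free-text fields and coerce dates/numbers to proper types.
    df["city"] = df["city"].str.strip().str.title()
    df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")
    df["amount"] = pd.to_numeric(df["amount"], errors="coerce").fillna(0.0)

    return df

if __name__ == "__main__":
    print(clean("raw_orders.csv").head())
```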

Confidential, NY

Data Analyst

Responsibilities:

  • Identified the business-critical measures by working closely with the SMEs.
  • Involved in data mapping specifications to create and execute detailed system test plans.
  • The data mapping specifies what data will be extracted from an internal data warehouse, transformed, and sent to an external entity.
  • Analyzed business requirements, system requirements, and data mapping requirement specifications; responsible for documenting functional requirements and supplementary requirements in Quality Center.
  • Setting up of environments to be used for testing and the range of functionalities to be tested as per technical specifications.
  • Tested complex ETL mappings and sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables to target tables.
  • Evaluated and enhanced current data models to reflect business requirements.
  • Used PowerExchange to create and maintain data maps for each file and to connect PowerCenter to the mainframe.
  • Developed Mappings using Source Qualifier, Expression, Filter, Look up, Update Strategy, Sorter, Joiner, Normalizer and Router transformations.
  • Integrated DataStage metadata with Informatica metadata and created ETL mappings and workflows.
  • Involved in writing, testing, and implementing triggers, stored procedures and functions at Database level using PL/SQL.
  • Migrated repository objects, services, and scripts from the development to the production environment.
  • Created UNIX scripts and environment files to run batch jobs.
  • Worked with DBAs for performance tuning and to get privileges on different tables in different environments.
  • Developed scripts using both DataFrames/SQL and RDDs in PySpark (Spark with Python) 1.x/2.x for data aggregation (see the sketch below).
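For illustration, a minimal sketch contrasting DataFrame and RDD aggregation in PySpark as described above; the input path and column names are placeholders.

```python
# Hypothetical sketch: the same aggregation expressed with the DataFrame API and with an RDD.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("aggregation-sketch").getOrCreate()

# Hypothetical input path and schema (region, amount, ...).
df = spark.read.csv("hdfs:///data/transactions.csv", header=True, inferSchema=True)

# DataFrame/SQL-style aggregation.
by_region_df = df.groupBy("region").agg(F.sum("amount").alias("total_amount"))
by_region_df.show()

# Equivalent RDD-style aggregation.
by_region_rdd = (
    df.rdd
    .map(lambda row: (row["region"], row["amount"]))
    .reduceByKey(lambda a, b: a + b)
)
print(by_region_rdd.take(5))
```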

Environment: Python, PL/SQL, Metadata, Cloudera, Java, PySpark, Scala, UNIX, Tableau.

Confidential, TX

Data Analyst

Responsibilities:

  • Designed and built multi-terabyte, end-to-end data warehouse infrastructure from the ground up on Confidential Redshift for large-scale data, handling millions of records every day.
  • Migrated the on-premises database structure to the Confidential Redshift data warehouse (see the load sketch following this list).
  • Was responsible for ETL and data validation using SQL Server Integration Services.
  • Defined and deployed monitoring, metrics, and logging systems on AWS (see the monitoring sketch at the end of this role).
  • Defined facts, dimensions and designed the data marts using the Ralph Kimball's Dimensional Data Mart modeling methodology using Erwin.
  • Evaluated and enhanced current data models to reflect business requirements.
  • Worked on big data workloads on AWS cloud services, i.e., EC2, S3, EMR, and DynamoDB.
  • Created Entity Relationship Diagrams (ERD), Functional diagrams, Data flow diagrams and enforced referential integrity constraints and created logical and physical models using Erwin.
  • Created ad hoc queries and reports to support business decisions using SQL Server Reporting Services (SSRS).
  • Analyzed the existing application programs and tuned SQL queries using execution plans, Query Analyzer, SQL Profiler, and Database Engine Tuning Advisor to enhance performance.
  • Involved in forward engineering of the logical models to generate the physical model using Erwin, and subsequent deployment to the Enterprise Data Warehouse.
  • Used Hive SQL, Presto SQL, and Spark SQL for ETL jobs, choosing the right technology to get the job done.
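For illustration, a minimal sketch of loading staged S3 data into Redshift with a COPY command issued over psycopg2, one common pattern for this kind of migration; the cluster endpoint, credentials, table, S3 path, and IAM role are placeholders.

```python
# Hypothetical sketch: issue a Redshift COPY to load a staged Parquet file from S3.
import psycopg2

COPY_SQL = """
    COPY analytics.daily_events                       -- hypothetical target table
    FROM 's3://example-bucket/staged/daily_events/'   -- hypothetical staged path
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
    FORMAT AS PARQUET;
"""

def load_to_redshift() -> None:
    conn = psycopg2.connect(
        host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",  # hypothetical
        port=5439,
        dbname="analytics",
        user="etl_user",
        password="secret",
    )
    try:
        with conn.cursor() as cur:
            cur.execute(COPY_SQL)
        conn.commit()
    finally:
        conn.close()

if __name__ == "__main__":
    load_to_redshift()
```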

Environment: Tableau, AWS, EC2, S3, SQL Server, Erwin, Oracle, Redshift, Informatica, SQL, NoSQL, Snowflake Schema, GitHub.
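For illustration, a minimal boto3 sketch of publishing a custom ETL metric to CloudWatch as part of the monitoring described in this role; the region, namespace, metric, and dimension names are assumptions.

```python
# Hypothetical sketch: publish a "rows loaded" metric after each warehouse load.
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")  # hypothetical region

def publish_rows_loaded(table_name: str, rows_loaded: int) -> None:
    cloudwatch.put_metric_data(
        Namespace="ETL/Warehouse",                 # hypothetical namespace
        MetricData=[
            {
                "MetricName": "RowsLoaded",
                "Dimensions": [{"Name": "Table", "Value": table_name}],
                "Value": float(rows_loaded),
                "Unit": "Count",
            }
        ],
    )

if __name__ == "__main__":
    publish_rows_loaded("daily_events", 125_000)
```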

Confidential

ETL Developer

Responsibilities:

  • Extensively worked on views, stored procedures, triggers, and SQL queries for loading data into staging to enhance and maintain the existing functionality.
  • Responsible for developing, supporting, and maintaining the ETL (Extract, Transform, and Load) processes using Oracle and Informatica PowerCenter.
  • Designed new database tables to meet business information needs; designed the mapping document, which serves as a guideline for ETL coding.
  • Analyzed the source, the requirements, and the existing OLTP system, and identified the required dimensions and facts from the database.
  • Designed the dimensional model of the data warehouse and confirmed source data layouts and needs.
  • Extensively used Oracle ETL process for address data cleansing.
  • Developed and tuned all the Affiliations received from data sources using Oracle and Informatica and tested with high volume of data.
  • Developed Logical and Physical data models that capture current state/future state data elements and data flows using Erwin.
  • Extracted Data from various sources like Data Files, different customized tools like Meridian and Oracle.
  • Used ETL to extract files for the external vendors and coordinated that effort.
  • Created various Documents such as Source-to-Target Data Mapping Document, and Unit Test Cases Document.

Environment: Informatica PowerCenter, SQL, Oracle, MS Office, MS Excel, Windows.
