Sr. Data Analyst Resume

San Jose, CA

SUMMARY

  • 7 years of IT experience and comprehensive industry knowledge as a Data Analyst in Data Analysis, Data Manipulation, Data Mining, Data Visualization, and Business Intelligence.
  • Proficient in Statistical methodologies such as Hypothesis Testing, ANOVA and Time Series Analysis.
  • Adept in statistical programming languages such as Python and R, as well as Big Data technologies including PySpark, Snowflake, and Databricks.
  • Experienced in the Insurance payer domain, including Encounters, Revenue and Remediation efforts, Claims, and Clinical Trials systems configuration (Facets), along with Medicare/Medicaid knowledge.
  • Good understanding of data ingestion, Airflow operators for data orchestration, and other related Python libraries (a minimal DAG sketch follows this summary).
  • Proficient in designing and creating data visualization dashboards, worksheets, and analytical reports in Tableau, tailored to end-user requirements, to help users identify critical KPIs and facilitate strategic planning across the organization.
  • Expertise in building dashboards using Tableau, Power BI, and QuickSight.
  • Experienced in developing web applications using Plotly Dash and R Shiny.
  • Experience in designing Star and Snowflake schemas for Data Warehouse and ODS architectures.
  • Expertise in data extraction and manipulation with Python, using widely adopted libraries such as NumPy, Pandas, and Matplotlib for data analysis.
  • Experienced in AWS services such as EC2, S3, SNS, Lambda, Glue, Step Functions, and CloudWatch.
  • Designed and developed complex Tableau dashboards using table calculations, multiple data sources, and custom SQL.
  • Expert in creating PL/SQL schema objects such as Packages, Procedures, Functions, Subprograms, Triggers, Views, Materialized Views, Indexes, Constraints, and Sequences, as well as Exception Handling, Dynamic SQL/Cursors, Native Compilation, and Collection Types (Record Type, Object Type) using SQL Developer.
  • Knowledge and experience working in Waterfall as well as Agile environments including the Scrum process and using Project Management tools like Project Libre, Jira/Confluence, and version control tools such as GitHub/Git.
  • Quick learner with strong business domain knowledge who can communicate data insights clearly to both technical and non-technical clients.
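
A minimal sketch of the Airflow orchestration pattern mentioned above, assuming Airflow 2.x; the DAG id, task names, and callables are hypothetical placeholders rather than actual project code:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def ingest_raw_files(**context):
        # Placeholder ingestion step; in practice this would land files in S3/Snowflake
        print("ingesting raw files for", context["ds"])

    def load_to_warehouse(**context):
        # Placeholder load step; in practice this would run a COPY/merge into the warehouse
        print("loading curated data for", context["ds"])

    with DAG(
        dag_id="daily_ingestion_example",
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        ingest = PythonOperator(task_id="ingest_raw_files", python_callable=ingest_raw_files)
        load = PythonOperator(task_id="load_to_warehouse", python_callable=load_to_warehouse)

        ingest >> load  # downstream task runs only after ingestion succeeds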

TECHNICAL SKILLS

Database: MS SQL Server 2005/2008/2012, MS Access, Snowflake

Languages: Python 3.x, R, SQL

Methodologies: Agile, Scrum and Waterfall

Libraries: Scikit-learn, Keras, TensorFlow, NumPy, Pandas, Matplotlib, Seaborn

Statistical Methods: Hypothesis Testing, ANOVA, Time Series, Confidence Intervals, Bayes' Law, Principal Component Analysis (PCA), Dimensionality Reduction, Cross-Validation

Machine Learning: Regression Analysis, Bayesian Methods, Decision Trees, Random Forests, Support Vector Machines (SVM), Neural Networks, K-Means Clustering, KNN, Ensemble Methods, Classification, Recommendation Systems, Natural Language Processing

Reporting Tools: Tableau 10.x/9.x/8.x (Desktop, Server, and Online), Microsoft Power BI

Data Visualization: Tableau, Matplotlib, Seaborn, Microsoft Power BI

Big data Framework: Amazon EC2, S3 and EMR

ETL/Data Warehouse Tools: Web Intelligence, Talend, Tableau; Data Modeling: Star-Schema Modeling, Snowflake-Schema Modeling, FACT and Dimension Tables, Pivot Tables

PROFESSIONAL EXPERIENCE

Confidential

Sr. Data Analyst

Responsibilities:

  • Responsible for data management (data cleanup, extraction, and updates) through SQL; performed analysis and prospective research to identify KPIs.
  • Extracted data from various data sources and prepared interactive dashboards using a variety of charts and graphs, filtering data with logical filters and publishing the results.
  • Created dashboards using Tableau/Power BI for an analysis project by analyzing historical data and identifying KPIs that had a positive impact on the business.
  • Developed and maintained Tableau dashboards and KPIs to help identify areas of growth, track changes, and provide visibility to stakeholders.
  • Practiced full Agile methodology with sprint meetings, sprint planning, sprint reviews, and retrospective meetings.
  • Worked with a cross-functional team to identify risks, plan risk responses and mitigation, and develop risk resolutions.
  • Developed dashboards and reports using BI tools such as QlikView and Tableau in the Insurance domain.
  • Involved in creating database objects such as tables, views, procedures, triggers, and functions; created Tableau/Power BI dashboards to maintain a tracking system/database for identifying trends required for project deliveries.
  • Responsible for testing products and each app OS on different devices to ensure products and apps were in working condition.
  • Involved in requirement analysis, formulating test plans for products and apps, testing all functions and use cases against specifications, and validating issues.
  • Worked on AWS Data Pipeline to configure data loads from S3 into Redshift.
  • Extracted, transformed, and loaded data from various heterogeneous data sources and destinations using AWS Redshift.
  • Developed a Python script to hit REST APIs and extract data to AWS S3.
  • Created Tables, Stored Procedures, and extracted data using T-SQL for business users whenever required.
  • Performed data analysis and design; created and maintained large, complex logical and physical data models and metadata repositories using ERWIN and MB MDR.
  • Selected and generated data into CSV files, stored them in AWS S3 using AWS EC2, and then structured and stored the data in AWS Redshift.
  • Designed the schema and configured and deployed AWS Redshift for optimal storage and fast retrieval of data; used Spark DataFrames, Spark SQL, and Spark MLlib extensively during development.
  • Generated ETL scripts to transform, flatten, and enrich data from source to target using AWS Glue, and created event-driven ETL pipelines with AWS Glue.
  • Utilized Spark SQL API in PySpark to extract and load data and perform SQL queries.
  • Used PySpark and Pandas to calculate moving averages and RSI scores for stocks and loaded the results into the data warehouse (a minimal sketch follows this list).
  • Designed SNS notifications for Matillion pipelines to notify on specific successes/failures of crucial data processing stages.
  • Worked on insurance data related to Medicare, Medicaid, and Insurance claims.
  • Implemented AWS SQS queues to create dependencies between Matillion jobs across projects.
  • Involved in integrating the Hadoop cluster with the Spark engine to perform batch and GraphX operations.
  • Performed data preprocessing and feature engineering for further predictive analytics using Python Pandas.
  • Worked on the tuning of SQL Queries to bring down run time by working on Indexes and Execution Plan.
  • Performed ETL testing activities such as running jobs, extracting data from the database with the necessary queries, transforming it, and uploading it into the data warehouse servers.
  • Developed a detailed project plan and helped manage the data conversion migration from the legacy system to the target Snowflake database.
  • Developed data pipelines using Spark, Hive, Pig, Python, Impala, and HBase to ingest customer data.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
  • Used ETL to implement Slowly Changing Dimension transformations to maintain historical data in the data warehouse.
  • Created a Lambda deployment function and configured it to receive events from S3 buckets.
  • Experience in deploying code through Jenkins and creating pull requests using Bitbucket.
  • Used Git for version control with colleagues.
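
A minimal sketch of the moving-average calculation described above, using the PySpark window API; the column names (symbol, trade_date, close_price) and S3 paths are illustrative assumptions rather than actual project values:

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.window import Window

    spark = SparkSession.builder.appName("stock_moving_average").getOrCreate()

    # Hypothetical source path standing in for the ingested stock feed
    prices = spark.read.parquet("s3://example-bucket/stock-prices/")

    # 20-row trailing window per symbol, ordered by trading date
    w = Window.partitionBy("symbol").orderBy("trade_date").rowsBetween(-19, 0)

    with_ma = prices.withColumn("moving_avg_20", F.avg("close_price").over(w))

    # Persist to the warehouse layer (destination path is a placeholder)
    with_ma.write.mode("overwrite").parquet("s3://example-bucket/curated/stock-prices-ma/")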

ENVIRONMENT: AWS, Tableau, SQL, PL/SQL, MS Web Technologies, Python, Agile Scrum, Confluence, MS Office Tools.

Confidential, San Jose, CA

Data Analyst

Responsibilities:

  • Analyzed product effectiveness based on customer feedback; involved in data collection, cleaning, pre-processing, and quantitative analysis to extract useful insights and create reports using SQL and Microsoft Excel (pivot tables, VLOOKUPs, etc.) to provide definition and structure and maintain data efficiently.
  • Gathered business requirements from stakeholders and ensured they were implemented as specified in the requirements document and project plan, delivering 100% of deliverables.
  • Involved in importing data from various sources into the Cassandra cluster using Sqoop.
  • Created queries to extract data from SQL Server (Source Database) to Flat Files or Excel files.
  • Managed and implemented the data extraction and data importing procedure to ensure Data Migration integrity and quality.
  • Hands-on technical experience in Python, Java, Q++ (Mastercraft), DB2 SQL, and R programming, with primary exposure to the P&C Insurance domain.
  • Worked on Spark architecture, including Spark Core, Spark SQL, DataFrames, driver and worker nodes, stages, executors and tasks, deployment modes, the execution hierarchy, fault tolerance, and collection.
  • Developed ETL processes in Databricks to extract data from Redshift, perform transformations, and load data to the S3 data layer in Databricks.
  • Executed and verified report data regarding claims (EDI), insurance, population, treatment plans, and other funding/financial data.
  • Implemented unit test cases in pytest and acceptance testing in the Gauge framework using Python.
  • Developed and maintained a tracking system/database in Microsoft Excel (using pivot tables, Vlookups, etc) to identify trends.
  • Created Tableau reports with complex calculations and worked on Ad-hoc reporting using Tableau.
  • Experience in debugging Jenkins pipelines for log errors.
  • Developed ETL processes in AWS Glue to migrate data from external sources such as S3 (ORC/Parquet/text files) into AWS Redshift.
  • Troubleshot and maintained ETL/ELT jobs running in Matillion.
  • Developed Python and SQL used in the transformation process in Matillion.
  • Developed scripts in BigQuery and connected it to reporting tools.
  • Worked on downloading BigQuery data into Spark DataFrames for advanced ETL capabilities.
  • Implemented Amazon EMR for processing Big Data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
  • Created AWS Lambda functions and assigned IAM roles to schedule Python scripts using CloudWatch triggers, supporting the infrastructure needs (SQS, EventBridge, SNS).
  • Developed a Python script to hit REST APIs and extract data to AWS S3 (a minimal sketch follows this list).
  • Conducted ETL data integration, cleansing, and transformations using AWS Glue Spark scripts.
  • Worked on Lambda functions that aggregate data from incoming events and store the result data in Amazon DynamoDB.
  • Worked on NoSQL databases such as DynamoDB and MongoDB.
  • Deployed the project on Amazon EMR with S3 connectivity for backup storage.
  • Worked on AWS Data Pipeline to configure data loads from S3 into Redshift.
  • Connected Redshift to Tableau to create dynamic dashboards for the analytics team.
  • Used JIRA to track issues and Change Management.
  • Involved in creating Jenkins jobs for CI/CD using git, Maven and Bash scripting.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs with Scala.
  • Used Spark API over Hadoop YARN as execution engine for data analytics using Hive.
  • Exported the analyzed data to the relational databases using Sqoop to further visualize and generate reports for the BI team.
  • Worked with Spark Ecosystem using Scala and Hive Queries on different data formats like Text file and parquet.
  • Responsible for migrating the code base to Amazon EMR and evaluated Amazon ecosystem components such as Redshift.
  • Collected log data from web servers and integrated it into HDFS using Flume.
  • Developed Python scripts to clean the raw data.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
  • Used AWS services such as EC2 and S3 for small data set processing and storage.
  • Implemented NiFi flow topologies to perform cleansing operations before moving data into HDFS.
  • Worked on different file formats (ORCFILE, Parquet, Avro) and different Compression Codecs (GZIP, SNAPPY, LZO).
  • Created applications using Kafka that monitor consumer lag within Apache Kafka clusters.
  • Worked on importing and exporting data into HDFS and Hive using Sqoop, built analytics on Hive tables using Hive Context in spark Jobs.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS.
  • Worked in Agile environment using Scrum methodology.
  • Worked on requirements gathering, analysis, and design of the systems.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Worked with Spark techniques such as refreshing tables, handling parallelism, and modifying Spark defaults for performance tuning.
  • Implemented Spark RDD transformations to map business analysis and applied actions on top of the transformations.
  • Involved in migrating MapReduce jobs into Spark jobs and used SparkSQL and Data frames API to load structured data into Spark clusters.
  • Involved in using the Spark API over Hadoop YARN as the execution engine for data analytics with Hive, and submitted the data to the BI team for report generation after processing and analyzing it in Spark SQL.
  • Performed SQL Joins among Hive tables to get input for Spark batch process.
  • Worked with data science team to build statistical model with Spark MLLIB and PySpark.
  • Used Sqoop to import functionality for loading Historical data present in RDBMS to HDFS.
  • Developed Oozie workflow for scheduling & orchestrating the ETL process.
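
A minimal sketch of the REST-to-S3 extraction mentioned above, using requests and boto3; the endpoint URL, bucket, and key are placeholders, not actual project values:

    import json

    import boto3
    import requests

    API_URL = "https://api.example.com/v1/records"  # hypothetical endpoint
    BUCKET = "example-raw-zone"                     # hypothetical landing bucket
    KEY = "rest-extracts/records.json"

    # Pull the payload from the REST API and fail fast on HTTP errors
    response = requests.get(API_URL, timeout=30)
    response.raise_for_status()

    # Write the raw JSON payload to the S3 landing zone
    s3 = boto3.client("s3")
    s3.put_object(Bucket=BUCKET, Key=KEY, Body=json.dumps(response.json()).encode("utf-8"))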

ENVIRONMENT: Agile Methodology, ETL, Spark, PySpark, RDBMS, Matillion, HDFS, Big Data, MySQL, DbVisualizer Pro, APIs, Agile Scrum, Jira Board, ServiceNow, MS Office Tools, UML.

Confidential, Bronx, NY

Data Analyst/Engineer

Responsibilities:

  • Analyzed business processes and recommended areas for improvement through system enhancements and more efficient process flows.
  • Worked directly with internal finance users and external departments to develop operational reports using business intelligence tools.
  • Gathered Service Level Agreement and API requirements and goals from stakeholders with the project team in DevOps.
  • Processed OLAP and OLTP datasets and applied security, encryption, masking, and obfuscation.
  • Created and maintained SQL and T-SQL query scripts for data transformation and for the design of DW tables and views.
  • Developed and published Power BI reports with hierarchies, slicers, and various visuals.
  • Maintained, automated, and improved existing reports and added features for better user interaction.
  • Monitored, managed, and developed Logic Apps for pipeline workflows and email notifications using web APIs.
  • Used Azure Data Factory (ADF), Synapse, Databricks, Blob Storage, Data Lake Storage, Blobs, and Azure DB.
  • Wrote performance-tuned Spark SQL query and subquery scripts for both ETL and ELT workloads.
  • Gathered and documented requirements; ingested and normalized retail sales data from disparate sources for sales reports.
  • Mapped application source data (CSV, JSON, Excel, XML); validated, scrubbed, and cleaned it in Data Flow into staging tables.
  • Extracted, transformed, and loaded data to DW tables; built tabular models in Visual Studio and deployed them to SSAS.
  • Imported data using Sqoop to load data from Teradata to HDFS on a regular basis.
  • Developed PySpark applications to apply business validation rules to incoming transactional data (a minimal sketch follows this list).
  • Developed data processing applications using PySpark to process transactional data and persist it in data lakes.
  • Developed HQL scripts to create external tables in Hive on top of ingested and processed data.
  • Developed PySpark applications to join transactional data with multiple dimension tables, process the data, and persist it to Cassandra.
  • Experienced in the Insurance payer domain, including Encounters, Revenue and Remediation efforts, Claims, and Clinical Trials systems configuration (Facets), along with Medicare/Medicaid knowledge.
  • Created analytical applications to perform analytics and push the data to an RDBMS for business analysis.
  • Built an ingestion framework that ingested files from SFTP to HDFS using Apache NiFi.
  • Expert in writing, configuring, and maintaining the Hibernate configuration files and writing and updating Hibernate mapping files for each Java object to be persisted.
  • Involved in application performance tuning and fixing bugs.
  • Performed various POCs in data ingestion, data analysis, and reporting using Hadoop, MapReduce, Hive, Pig, Sqoop, Flume, and Elasticsearch.
  • Research and recommend various tools and technologies on the Hadoop stack considering the workloads of the organization.
  • Extensively used SQL queries, PL/SQL stored procedures & triggers in data retrieval and updating of information in the Oracle database using JDBC.
  • Performed Code Reviews and responsible for Design, Code, and Test signoff.
  • Assigned work to team members and assisted them in development, clarifying design issues and fixing problems.
  • Worked on a new data warehouse development project using the Data Vault 2.0 methodology in the Insurance domain.
  • Created ETL data pipelines using Azure Data Factory to load data into an Azure SQL Server data warehouse.
  • Created COSMOS/SCOPE scripts to extract data from big data streams for analysis.
  • Created SQL scripts to support Power BI reports and ad-hoc data requests.
  • Created Power BI dashboard reports utilizing DAX and interactive report visualizations.
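
A minimal sketch of the PySpark validation pass described above; the rule set, column names, and paths are assumptions standing in for the actual business validation rules:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("txn_validation").getOrCreate()

    # Placeholder path for the incoming transactional feed
    txns = spark.read.parquet("hdfs:///data/incoming/transactions/")

    # Example rules: non-null key, positive amount, recognized status code
    rules = (
        F.col("transaction_id").isNotNull()
        & (F.col("amount") > 0)
        & F.col("status").isin("POSTED", "PENDING")
    )

    valid = txns.filter(rules)
    rejected = txns.filter(~rules)

    # Persist validated rows for downstream processing and rejects for review
    valid.write.mode("overwrite").parquet("hdfs:///data/validated/transactions/")
    rejected.write.mode("overwrite").parquet("hdfs:///data/rejects/transactions/")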

ENVIRONMENT: Team Forge, Teradata, Teradata SQL Assistant, SQL, Adobe Analytics, HP Agile Manager (HP AGM), HP Quality Center (HP QC), SAFe, Windows, MS Office Suite.

Confidential, Boron, CA

Data Analyst

Responsibilities:

  • Utilized SQL and other data warehousing programs, as well as dashboard/visualization toolkits for data analysis.
  • Converted data into actionable insights with Tableau for future outcomes.
  • Used statistical techniques for hypothesis testing to validate data.
  • Devised simple and complex SQL scripts to check and validate Dataflow in various applications.
  • Reviewed dashboard summaries with senior management, interpreting and presenting trends and providing strategies and recommendations to increase effectiveness.
  • Devised PL/SQL Stored Procedures, Functions, Triggers, Views and packages.
  • Made use of Indexing, Aggregation and Materialized views to optimize query performance.
  • Created Tableau dashboards/reports for data visualization, reporting, and analysis, and presented them to the business.
  • Created Data Connections, Published on Tableau Server for usage with Operational or Monitoring Dashboards.
  • Worked with senior management to plan, define, and clarify dashboard goals, objectives, and requirements.
  • Responsible for daily communications to management and internal organizations regarding status of all assigned projects and tasks.

Environment: PL/SQL, Tableau, SQL Server 2012, SSIS, SSRS, SharePoint, Oracle, MS Office 2007, MS Access, Windows Server 2008.
