
Sr. Data Analyst Resume


SUMMARY

  • Over 5 years of IT experience as a data analyst/data scientist, focusing on data warehousing, data modeling, data integration, data migration, ETL processes, and business intelligence.
  • Expertise in Informatica ETL and reporting tools. Deep understanding of the Data Warehousing SDLC and architecture of ETL, reporting and BI tools.
  • Experience in all aspects of analytics/data warehousing solutions: database issues, data modeling, data mapping, ETL development, metadata management, data migration, and reporting solutions.
  • Strong understanding of data modeling (relational, dimensional, Star and Snowflake schemas), data analysis, and data warehousing implementations on Windows and UNIX.
  • Involved in the Data Science project life cycle, including data cleaning, data extraction, and visualization with large datasets of structured and unstructured data; created ER diagrams and schemas.
  • Experience with Machine Learning algorithms such as logistic regression, KNN, SVM, random forest, neural networks, linear regression, lasso regression, and k-means (see the illustrative classifier sketch after this list).
  • Working experience with Python 3/2.7 libraries such as NumPy, SQLAlchemy, Beautiful Soup, pickle, PySide, PyMongo, SciPy, and PyTables.
  • Experience in Big Data technologies such as Spark 1.6, Spark SQL, PySpark, Hadoop 2.x, HDFS, and Hive 1.x.
  • Worked on Machine Learning algorithms like Classification and Regression with KNN Model, Decision Tree Model, Naïve Bayes Model, Logistic Regression, SVM Model and Latent Factor Model.
  • Used SQL queries and stored procedures extensively to retrieve content from MySQL; also implemented SQL tuning techniques such as Join Indexes (JIs), Aggregate Join Indexes (AJIs), statistics collection, and table changes including indexes.
  • Developed mappings in Informatica to load data from various sources into the data warehouse using different transformations such as Source Qualifier, Java, Expression, Lookup, Aggregator, Update Strategy, and Joiner.
  • Extensive experience in developing stored procedures, functions, views and triggers, complex queries using SQL Server, TSQL and Oracle PL/SQL.
  • Experience in working on Talend Administration Activities and Talend Data Integration ETL Tool.
  • Created Physical Data Model from the Logical Data Model using Compare and Merge Utility in ER/Studio and worked with the naming standards utility.
  • Experience in resolving on-going maintenance issues and bug fixes, monitoring Informatica sessions as well as performance tuning of mappings and sessions.
  • Databases: experience using Oracle 10g/9i/8i/8.0/7.0, DB2 8.0/7.0/6.0, MS SQL Server 2000/2005/2008, Sybase, SQL, PL/SQL, SQL*Plus, SQL*Loader, and TOAD.
  • Broad experience in database development, including effective use of database objects, SQL Trace, EXPLAIN PLAN, different types of optimizers, hints, indexes, table partitions, sub-partitions, materialized views, global temporary tables, autonomous transactions, bulk binds, and Oracle built-in functions, as well as performance tuning of Informatica mappings and workflows.
  • Dimensional data modeling experience using Erwin 7.3/9.7 and the Ralph Kimball approach, Star/Snowflake modeling, data marts, OLAP, fact and dimension tables, physical and logical data modeling, and data modeling tools such as Visio.
  • Data warehousing experience using Informatica PowerCenter 9.1/9.5/9.6.x versions.
  • Superior communication skills, strong decision making and organizational skills along with outstanding analytical and problem-solving skills to undertake challenging jobs. Able to work well independently and in a team by helping to troubleshoot technology and business-related problems.
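
A minimal, illustrative scikit-learn sketch of the kind of classifier comparison listed above; the dataset is synthetic and the model parameters are assumptions for illustration, not tied to any client work.

```python
# Compare a few of the listed classifiers on a synthetic dataset (illustrative only).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in data; real work would read from the warehouse or HDFS.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, accuracy_score(y_test, model.predict(X_test)))
```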

PROFESSIONAL EXPERIENCE

Sr. Data Analyst

Confidential

Responsibilities:

  • Conceptualizing, designing, developing and implementing the business requirements through Informatica mappings.
  • Performed business analysis and requirements gathering; designed and customized data models for a data warehouse supporting data from multiple sources in real time.
  • Designed and developed stored procedures using PL/SQL and tuned SQL queries for better performance.
  • Involved in the initiative to convert and enhance the current marketing campaign process in SAS to a setup that includes Talend, Tableau, SPSS, SQL Server, and DB2 in an agile environment with 2-3 week sprints.
  • Responsible for the development, support, and maintenance of ETL (Extract, Transform, Load) processes using Informatica PowerCenter 8.5.
  • Created Talend flows, SQL scripts, procedures to extract, clean, scrub, validate and load the historical data to the target tables.
  • Created scripts and Talend flows to do data quality checks, transform and load healthcare and enrollment data coming in from various sources such as flat files (delimited, positional), SAS data sets, Salesforce, SQL Server, Excel.
  • Worked with ETL teams and used Informatica Designer, Workflow Manager and Repository Manager to create source and target definition, design mappings, create repositories.
  • Project life cycle - from analysis to production implementation, with emphasis on identifying the source and source data validation, developing logic and transformation as per the requirement and creating mappings and loading the data into different targets.
  • Design, development, testing, and implementation of ETL processes using Informatica Cloud.
  • Tested complex ETL mappings and sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables into target tables.
  • Designed and developed ETL mappings enabling the extraction, transport, and loading of data into target tables using ETL Package 2017.
  • Responsible for creating and modifying PL/SQL packages, procedures, functions, cursors, ref cursors, views, materialized views, collections, etc., according to business requirements.
  • Tested the ETL process for both the before-data-validation and after-data-validation stages; tested the messages published by the ETL tool and the data loaded into various databases.
  • Worked with Memory cache for static and dynamic cache for the better throughput of sessions containing Rank, Lookup, Joiner, Sorter, source qualifier and Aggregator transformations.
  • Analyzed dimension and fact tables in a campaign data mart detailing inclusion, exclusion, and segmentation criteria, offer assignment, and campaign response by campaign.
  • Extensively used almost all the transformations of Informatica including lookups, stored procedures, Update Strategy and others.
  • Extensively worked with PowerConnect, PowerExchange, and PowerCenter to pull data from sources such as Oracle; strong experience with RDBMS concepts.
  • Strong experience in OLAP Data Modeling using Dimensional Data Modeling, Star Join Schema Modeling, Snow-Flake Modeling, FACT and Dimensions Tables, Physical and Logical Data Modeling.
  • Extensively worked with XML files as the Source and Target, used transformations like XML Generator and XML Parser to transform XML files, used Oracle XMLTYPE data type to store XML files.
  • Applied various machine learning algorithms and statistical models such as decision trees, regression models, SVM, and clustering to identify volume, using the scikit-learn package in Python and MATLAB.
  • Utilized Apache Spark with Python to develop and execute Big Data analytics and machine learning applications; executed machine learning use cases under Spark ML and MLlib (see the Spark ML sketch after this list).
  • Built analytical data pipelines to port data in and out of Hadoop/HDFS from structured and unstructured sources, and designed and implemented the system architecture for an Amazon EC2-based cloud-hosted solution for the client.
  • Involved in creating a Data Lake by extracting the customer's Big Data from various data sources into Hadoop HDFS, including data from Excel, flat files, Oracle, SQL Server, MongoDB, Cassandra, HBase, Teradata, and Netezza, as well as log data from servers.
  • Created MapReduce jobs running over HDFS for data mining and analysis using Python; loaded and stored data with Pig scripts and Python for MapReduce operations; created various types of data visualizations using Python and Tableau.
  • Performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from the Oracle database.
  • Created data quality scripts using SQL and Hive to validate successful data loads and the quality of the data.
  • Extracted data from HDFS and prepared data for exploratory analysis using data munging.
  • Worked on Python OpenStack APIs, used NumPy for numerical analysis, and worked on real-time, in-memory distributed systems.
  • Developed Python OpenStack APIs using NumPy for numerical analysis, and used Ajax and jQuery to transmit JSON data objects between the front end and controllers.
  • Developed analytics solutions based on Machine Learning platform and demonstrated creative problem-solving approach and strong analytical skills.
  • Designed and managed API system deployment using a fast HTTP server and Amazon AWS architecture; developed remote integration with third-party platforms using RESTful web services; successfully implemented Apache Spark and Spark Streaming applications for large-scale data.
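
A minimal PySpark ML sketch of the kind of Spark ML classification use case referenced above; the input path, feature columns, and label are hypothetical placeholders rather than the actual project data.

```python
# Illustrative Spark ML pipeline: assemble numeric features and fit a logistic regression.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("churn-model").getOrCreate()

# Hypothetical feature table with numeric columns and a binary "label" column.
df = spark.read.parquet("hdfs:///analytics/features.parquet")

assembler = VectorAssembler(
    inputCols=["tenure_months", "monthly_spend", "support_calls"],  # placeholder columns
    outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="label")

train, test = df.randomSplit([0.8, 0.2], seed=42)
model = Pipeline(stages=[assembler, lr]).fit(train)

# Evaluate area under the ROC curve on the held-out split.
auc = BinaryClassificationEvaluator(labelCol="label").evaluate(model.transform(test))
print(f"Test AUC: {auc:.3f}")
```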

Data Analyst

Confidential

Responsibilities:

  • Developed T-SQL queries, scripts and stored procedures used for extracting, validating, transforming and loading data provided by clients.
  • Dug deep into complex SQL queries and PL/SQL procedures to identify items that could be converted to Informatica Cloud ISD.
  • Analyzed the data in the multiple databases created for marketing campaigns; performed thorough data quality analysis and profiling using Talend EDQ and DataFlux to identify anomalies in terms of accuracy and redundancy, and implemented complex data quality rules to prevent and cleanse bad data.
  • Worked on analyzing the Hadoop cluster and different Big Data components including Pig, Hive, Spark, HBase, Kafka, Elasticsearch, databases, and Sqoop. Installed Hadoop, MapReduce, and Confidential, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Using Informatica PowerCenter Designer, analyzed the source data to extract and transform it from various source systems (Oracle 10g, DB2, SQL Server, and flat files), incorporating business rules through the different objects and functions that the tool supports.
  • Responded to ad-hoc requests from business analysts and project managers by providing accurate data/reports in a timely manner.
  • Tested and implemented applications built using Python, and coordinated with the campaign and reporting teams on change requests and data availability.
  • Extensively used ETL methodology for data migration, data profiling, extraction, transformation, and loading with Talend, and designed data conversions from a wide variety of source systems such as SQL Server, Oracle, and DB2, and from non-relational sources such as XML, flat files, and mainframe files.
  • Developed high level data dictionary of ETL data mapping and transformations from a series of complex Talend data integration jobs.
  • Designed ETL packages to load data from the staging server to data marts in the data warehouse.
  • Worked with various SSIS control flow tasks and data transformation tasks such as Data Conversion, Derived Column, Lookup, Fuzzy Lookup, and Script Task as part of ETL.
  • Worked with configuring checkpoints, package logging, error logging and event handling to redirect error rows and fix the errors in SSIS.
  • Loaded data into destination tables with both full and incremental loads, performing different kinds of transformations (Row Count, Lookup, Derived Column, Merge, Script Task) using SSIS packages, and truncated, modified, and created tables in the database as per requirements.
  • Extended Hive and Pig core functionality with custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs), and User Defined Aggregating Functions (UDAFs) written in Python (see the transform-script sketch after this list).
  • Used various transformations like Filter, Router, Expression, Lookup (connected and unconnected), Aggregator, Sequence Generator, Update Strategy, Joiner, Normalizer, Sorter and Union to develop robust mappings in the Informatica Designer.
  • Developed numerous Teradata SQL queries by creating SET or MULTISET tables, views, and volatile tables, using inner and outer joins, date and string functions, and advanced techniques such as the RANK and ROW_NUMBER functions.
  • Involved heavily in writing complex SQL queries to pull the required information from the database using Teradata SQL Assistant.
  • Worked extensively creating and maintaining DAX calculations, DAX queries and MDX queries.
  • Provided Operational Support to modify existing Tabular SSAS models to satisfy new business requirements.
  • Created hierarchies in Power BI reports using data visualizations such as bar charts, line charts, pie charts, and forecast charts, and deployed them.
  • Created SSIS packages for File Transfer from one location to the other using FTP task with Master SSIS Package to run all other packages.
  • Automated execution process, scheduling, deploying of packages using SQL Server Agent by creating jobs and error reports using Alerts, SQL Mail Agent and FTP.
  • Developed Scala scripts and UDFs using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark 1.6 for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
  • Deployed SSIS packages with minimal changes using XML configuration file.
  • Created SSRS reports using complex SQL queries and stored procedures, including sub-reports, drill-down reports, and charts.
  • Created test data to verify the performance and functionality of the SSRS reports; also created database objects such as procedures, functions, triggers, and indexes, and made DML and DDL changes.
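
A small, hypothetical example of the Python-based Hive extension work mentioned above, using Hive's TRANSFORM/streaming mechanism; the column layout and the phone-number cleanup rule are assumptions made purely for illustration.

```python
#!/usr/bin/env python
# Hypothetical Hive TRANSFORM script: reads tab-separated rows from stdin,
# normalizes a phone-number column, and writes the cleaned rows to stdout.
#
# Invoked from Hive roughly as:
#   ADD FILE clean_phone.py;
#   SELECT TRANSFORM(member_id, phone, state)
#   USING 'python clean_phone.py' AS (member_id, phone_clean, state)
#   FROM members;
import re
import sys

for line in sys.stdin:
    member_id, phone, state = line.rstrip("\n").split("\t")
    digits = re.sub(r"\D", "", phone)                 # keep digits only
    phone_clean = digits[-10:] if len(digits) >= 10 else ""
    print("\t".join([member_id, phone_clean, state]))
```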

Data Engineer

Confidential

Responsibilities:

  • Involved in understanding the business requirements, discussing them with business analysts, analyzing the requirements, and helping the architect prepare business rules.
  • Designed and developed complex mappings using Lookup, Expression, Sequence Generator, Update Strategy, Aggregator, Router, and Stored Procedure transformations to implement complex logic.
  • Developed mappings and workflows using Informatica to load data between homogeneous and heterogeneous sources and targets such as relational tables and flat files.
  • Developed mappings, transformations, and mapplets using the Mapping Designer, Transformation Developer, and Mapplet Designer in Informatica PowerCenter.
  • Contributed to and actively provided comments in user story review meetings within an Agile Scrum environment.
  • Extensively worked on SQL override in Source Qualifier, Look Up, Aggregator Transformation for better performance.
  • Hands-on experience using query tools such as TOAD and SQL.
  • Created and maintained job groups, jobs, and job activity in Control-M to schedule workflows, and provided Informatica job support.
  • Worked with Teradata 14 tools such as FastLoad, MultiLoad, TPump, FastExport, Teradata Parallel Transporter (TPT), and BTEQ.
  • Involved in creating a Data Lake by extracting the customer's Big Data from various data sources into Hadoop HDFS, including data from Excel, flat files, Oracle, SQL Server, MongoDB, Cassandra, HBase, Teradata, and Netezza, as well as log data from servers.
  • Helped in the migration and conversion of data from the Sybase database into the Oracle database, preparing mapping documents and developing partial SQL scripts as required.
  • Designed the schema and configured and deployed AWS Redshift for optimal storage and fast retrieval of data, and executed ad-hoc data analysis for customer insights using SQL on an Amazon AWS Hadoop cluster.
  • Integrated Azure Active Directory (Azure AD) and other Azure services to enable building a modern data warehouse and machine learning workloads.
  • Involved in building database models, APIs, and views using Python in order to build an interactive web-based solution.
  • Involved in the development of web services using SOAP for sending and receiving data from the external interface in XML format.
  • Wrote Python scripts to parse JSON files and load the data into the console, and used the PyCharm IDE for developing the code and performing unit tests.
  • Worked on different Tasks in Workflow Manager like session, event wait, decision, e-mail, command, Timer and scheduling of the workflow.
  • Created and scheduled worklets; set up workflows and tasks to schedule the loads Confidential required frequency using Workflow Manager.
  • Designed and developed Informatica mappings for data loads and data cleansing. Extensively worked on Informatica Designer.
  • Created Design Documents for source to target mappings. Developed mappings to send files daily to AWS.
  • Created visualizations for logistics calculations and departmental spend analysis. Utilized sample data to create dashboards while the ETL team was cleaning data from source systems, and was responsible for later switching the connection to Google BigQuery.
  • Used SQL queries Confidential the custom SQL level to pull data into Tableau Desktop, and validated the results in Tableau by running SQL queries in SQL Developer and Google BigQuery.
  • Worked on Business Intelligence standardization to create database layers with user friendly views in Teradata that can be used for development of various Tableau reports/ dashboards.
  • Used Spark Streaming APIs to perform the necessary transformations and actions on the fly to build the common learner data model, which receives data from Kafka in near real time and persists it into Cassandra (see the streaming sketch after this list).
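
A minimal sketch of the Kafka-to-Cassandra flow described in the last bullet, written here with the newer Structured Streaming API rather than the original DStream-based job; the broker address, topic, schema, keyspace, and table names are placeholders, and it assumes the spark-sql-kafka and spark-cassandra-connector packages are on the classpath.

```python
# Hypothetical sketch: read learner events from Kafka and persist them to Cassandra.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("learner-events").getOrCreate()

# Placeholder schema for the JSON payload carried in the Kafka message value.
event_schema = StructType([
    StructField("learner_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_time", TimestampType()),
])

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
          .option("subscribe", "learner-events")              # placeholder topic
          .load()
          .select(from_json(col("value").cast("string"), event_schema).alias("e"))
          .select("e.*"))

def write_to_cassandra(batch_df, batch_id):
    # Append each micro-batch to a Cassandra table via the Spark Cassandra connector.
    (batch_df.write
     .format("org.apache.spark.sql.cassandra")
     .options(keyspace="learner", table="events")             # placeholder names
     .mode("append")
     .save())

query = events.writeStream.foreachBatch(write_to_cassandra).start()
query.awaitTermination()
```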

Data Engineer

Confidential

Responsibilities:

  • Responsible for the design, implementation and architecture ofvery large-scale data intelligence solutions around Snowflake Data Warehouse.
  • Worked on all phases of data warehouse development lifecycle, from gathering requirements to testing, implementation, and support.
  • Responsible for ETL (Extract, Transform, Load) processes to bring data from multiple sources into a single warehouse environment.
  • Experience in the development of ETL processes and frameworks for large-scale, complex datasets
  • Created ETL jobs using Matillion to load server data into the Snowflake data warehouse.
  • Design and build ETL pipelines to automate ingestion of structured and unstructured data
  • Developed ETL pipelines in and out of the data warehouse using a combination of Python and Snowflake's SnowSQL, writing SQL queries against Snowflake (see the Python-to-Snowflake sketch after this list).
  • Maintain existing ETL workflows, data management and data query components
  • Substantial experience with the use of relational databases using SQL for data extraction, management, and queries
  • Experience developing data management systems, tools and architectures using relational databases, Redshift and/or other distributed computing systems.
  • Implement data models, database designs, data access, table maintenance and code changes together with our development team.
  • Demonstrated ability to move data between production systems and across multiple platforms.
  • Led the migration of a legacy data warehouse from on-premises to AWS and Spark/EMR.
  • Encoded and decoded JSON objects using PySpark to create and modify DataFrames in Apache Spark.
  • Added support for Amazon AWS S3 to host static/media files and the database into Amazon Cloud.
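
A minimal sketch of the Python-to-Snowflake pattern referenced above, using the snowflake-connector-python package; the account, credentials, warehouse, stage, and table names are placeholders for illustration.

```python
# Illustrative Snowflake load/query step in a Python ETL pipeline.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",        # placeholder account identifier
    user="etl_user",             # placeholder credentials
    password="***",
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="PUBLIC",
)
try:
    cur = conn.cursor()
    # Stage-to-table load; the stage and table names are illustrative.
    cur.execute(
        "COPY INTO sales_raw FROM @etl_stage/sales/ "
        "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)"
    )
    # Simple validation query against the loaded table.
    cur.execute("SELECT COUNT(*) FROM sales_raw")
    print(cur.fetchone()[0])
finally:
    conn.close()
```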
