Application Architect Resume
Pennington, NJ
PROFESSIONAL SUMMARY:
- 8+ years of experience in the analysis, development, and implementation of data integration solutions using tools such as Apache Spark and Informatica Powercenter in the financial industry.
- Possess a solid understanding of Spark abstractions (RDD, DataFrame, Dataset) and of Spark architecture and internals such as the Catalyst optimizer and the Tungsten execution engine.
- Developed an in-house data loader tool in Scala and Spark that takes an HQL file as input and writes the result to a target Hive table or extract (or from a Hive table to an extract); it covers most of our use cases, where data from a couple of tables is transformed and loaded to a target (see the loader sketch after this list).
- Developed various utility programs that come in handy in big data applications, such as partition/small-file consolidation, Hive data purging, and CDC-to-snapshot conversion, using Apache Spark and Scala.
- Improved the performance of slow-running Spark jobs through caching, partition pruning, repartitioning to fix data skew, insights from explain plans, and tuning of memory parameters (see the tuning sketch after this list).
- Worked on migrating various layers of our data platform from Spark 1.6 to Spark 2.3.
- Experienced in writing HQL (Hive Query Language) statements for complex business rules.
- Experienced with various AWS services such as S3, EC2, EMR, and Redshift.
- Experienced in using other relevant data integration tools such as Talend Open Studio, Informatica Cloud, SSIS, and Informatica Metadata Manager.
- Experienced in writing and implementing unit test cases using JUnit 4.12.
- Have a good understanding of setting up development environments with tools like Git 2.18, Eclipse Oxygen, and Maven.
- Have a good understanding of Data Warehousing concepts and Dimensional Modeling using Star and Snowflake schemas.
- Extensively involved in optimization and tuning of Informatica transformations, SQLs, and sessions by identifying and eliminating bottlenecks and managing memory.
- Experienced with Informatica advanced techniques - Dynamic Caching, Incremental Aggregation, Parallel Processing, and Pushdown Optimization - to increase performance.
- Extensive experience in automating ETL processes (workflow runs, XML imports and exports) using UNIX shell scripting and the Autosys scheduler.
- Expertise in installing, configuring, and administering Informatica on Linux servers. Experienced in upgrading Informatica Powercenter from version 8.6.1 to 9.1.0 and from 9.1.0 to 10.0.
- Experienced in configuring nodes to work in a grid architecture for our Informatica jobs.
- Experienced in creating stored procedures using PL/SQL and tuning queries to improve the performance.
- Used UNIX shell scripts extensively to handle large numbers of upstream and downstream files, with a good command of common UNIX utilities such as tr, grep, sed, and awk.
- Extensive knowledge of automating Excel formatting, pivot table creation, and other processing with VBA and Windows batch scripting to eliminate manual effort on repetitive tasks.
- Quick learner, open to suggestions and recommendations, and adapts easily to change.
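A minimal sketch of the HQL-driven loader described above, written against the Spark 2.x API; the file path and table name are illustrative, and the real tool also handles the Hive-to-extract direction:

```scala
import scala.io.Source
import org.apache.spark.sql.SparkSession

// Minimal HQL-driven loader: read a query from a file, run it on Hive,
// and overwrite the target table with the result.
object HqlLoader {
  def main(args: Array[String]): Unit = {
    val Array(hqlPath, targetTable) = args // illustrative: ("/jobs/positions.hql", "mart.positions")

    val spark = SparkSession.builder()
      .appName("HqlLoader")
      .enableHiveSupport()
      .getOrCreate()

    val hql = Source.fromFile(hqlPath).mkString // the transformation lives in the HQL file

    spark.sql(hql)             // run the query against Hive
      .write
      .mode("overwrite")
      .insertInto(targetTable) // target table must already exist

    spark.stop()
  }
}
```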
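And a sketch of the skew-handling technique from the tuning bullet: salt the hot join key so one heavy value spreads across partitions. The table names, the ACTIVE filter, and the 16-bucket salt are assumptions for illustration:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SkewJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("SkewJoinSketch").enableHiveSupport().getOrCreate()

    val trades   = spark.table("stage.trades") // large fact, skewed on account_id (assumed names)
    val accounts = spark.table("ref.accounts") // smaller dimension

    // Cache a DataFrame that several downstream actions reuse.
    val active = accounts.filter(col("status") === "ACTIVE").cache()

    // Salt the skewed key so a single hot account spreads over many partitions,
    // and explode the dimension with every salt value so the join still matches.
    val buckets   = 16
    val salted    = trades.withColumn("salt", (rand() * buckets).cast("int"))
    val saltedDim = active.withColumn("salt", explode(array((0 until buckets).map(lit): _*)))

    val joined = salted.join(saltedDim, Seq("account_id", "salt")).drop("salt")

    joined.explain(true) // inspect the Catalyst plan, e.g. to confirm partition pruning
    joined.write.mode("overwrite").saveAsTable("mart.trades_enriched")

    spark.stop()
  }
}
```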
TECHNICAL EXPERTISE:
Big Data Tools: Spark 2.3(4), Hive(4), HDFS(4), YARN(1) and Oozie(1)
Programming Languages: Scala 2.11(4), Visual Basic(3), Java(3) and Python(2)
ETL Tools: Informatica Powercenter 10.x(5), Informatica Cloud(4), Metadata Manager 10.x(2), SSIS(2) and Talend Open Studio(2)
Databases: Microsoft SQL Server 2015(4), Amazon Redshift(3), Oracle 11g(3), DB2 9.x(2), MS Access 2016(2) and Teradata 14(1)
Scripting Languages: UNIX Shell Scripting(4), VBA(4) and Windows batch Scripting(2)
Developer Tools: Eclipse 4.7(4), Git(4), Bitbucket(4) and Maven(3)
AWS Services: S3(3), EC2(3), Redshift(2), EMR(1)
Scheduling Tools: Autosys(5) and ActiveBatch(3)
Reporting Tools: TIBCO Spotfire(2) and Tableau(1)
PROFESSIONAL EXPERIENCE:
Confidential, Pennington, NJ
Application Architect
Responsibilities:
- Was a key player in developing the sourcing framework called Autosource and owned sourcing for 10+ of the 25 sources.
- Developed Scala/Spark packages for preprocessing files into a standard format that can be loaded through Autosource, and built packages for converting incremental Hive partitions into consolidated snapshots to simplify analysis (see the sketch after this list).
- Created a shell-script process for FTPing files from the source location to our landing zone, and built the post-sourcing file archival and retention process.
- Created a parallel pipeline to load the Talend-processed files into Hive for 100+ tables in just 2 days with zero defects; this was a last-minute effort because the pipeline that translates Talend logic into HQL did not pass regression testing.
- Optimized delivery of the sink process called backfeed, improving the full-volume run time from 2+ hours to under 20 minutes.
- Decommissioned legacy flows, translating business logic from technologies such as Talend, Teradata SQL, and Microsoft SQL into HQL and building the wrapper that loads the data into Hive tables.
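A sketch of the incremental-partition-to-snapshot consolidation mentioned above, shown with the Spark 2.x API for brevity (this project ran on Spark 1.6, where a HiveContext would be used instead); the table, key, and column names are assumptions:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

object CdcToSnapshot {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("CdcToSnapshot").enableHiveSupport().getOrCreate()

    // Incremental CDC table, partitioned by load_date, one row per change (assumed layout).
    val cdc = spark.table("stage.positions_cdc")

    // Keep only the latest change per business key, dropping deletes.
    val latestFirst = Window.partitionBy("position_id").orderBy(col("load_date").desc)
    val snapshot = cdc
      .withColumn("rn", row_number().over(latestFirst))
      .filter(col("rn") === 1 && col("op_type") =!= "D")
      .drop("rn")

    // Coalesce before writing so the snapshot is a handful of files, not thousands.
    snapshot.coalesce(32).write.mode("overwrite").saveAsTable("mart.positions_snapshot")

    spark.stop()
  }
}
```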
Environment: Apache Spark 1.6, Hive 1.1, Scala 2.10, Linux, Oracle 12c, Talend 5.4 and Autosys
Confidential, Nashville, TN
Data Engineer
Responsibilities:
- Worked with business teams such as Supply Chain Management (SCM), Finance, and Microsoft Dynamics (DAX) to understand their analytics operations and prepare a plan to bring their data onto the Atlas platform.
- Worked with an offshore team of 6 to develop Informatica objects for synchronizing on-premises data with the Atlas platform and to convert SQL Server stored procedures into equivalent Redshift queries.
- Followed up with business teams to get sign-off on the data sync-up and the Redshift queries that load the data marts.
- Helped business teams switch from legacy reporting platforms to Atlas-supplied tools such as Spotfire, Tableau, and Denodo.
- Conducted 4 full-day sessions for various business teams on using the Atlas platform and its benefits over legacy on-premises systems.
Environment: Informatica Powercenter 10.0, Informatica Cloud, Informatica Metadata Manager, AWS S3, AWS Redshift, AWS Lambda, AWS Elastic MapReduce (EMR), Denodo, SQL Server 2012, TIBCO Spotfire, UNIX and ActiveBatch.
Confidential, Jersey City, New Jersey
Lead ETL Developer
Responsibilities:
- Set up meetings with the DQ team and manager to understand the requirements, finalize the format, and complete the low-level design for the Lineage submission.
- Trained a team of 9 ETL developers (2 onsite and 7 offshore) on gathering Lineage from Informatica objects and stored procedures and writing it to *.csv files.
- Cross-verified the offshore work against Informatica objects and stored procedures daily and provided necessary corrections.
- Reviewed weekly status and held brainstorming sessions with the manager to keep the work progressing.
- Coordinated with our team to develop Informatica objects and a SQL stored procedure to process the *.csv files and load them into SQL tables.
- Used Informatica Metadata Manager and Custom Metadata Configurator to display the Lineage data in graphical form.
- Followed up with DQ team to get the Lineage data reviewed and applied necessary fixes.
- Prepared technical and process documents outlining guidelines, process and object details of the work.
Environment: Informatica Powercenter 9.1.0, Metadata Manager 9.1.0, Custom Metadata Configurator, SQL Server 2012, UNIX and Autosys.
Confidential
Lead ETL Developer
Responsibilities:
- Participated in meetings with Business Analysts and Managers to understand the requirements, and prepared the Source-to-Target mapping spec and Low-Level Design document.
- Worked with an offshore team of 5 ETL developers and 2 Python developers to socialize the requirements and design.
- Developed Informatica objects per the Source-to-Target mapping spec and unit tested the results. Utilized persistent caching, partitioning, and flat-file loads to improve performance.
- Developed UNIX scripts for automating workflow runs and used Autosys for scheduling.
- Established SSH connections to downstream servers.
- Created Python scripts to generate flat files from SQL tables using the SQL Server bcp utility and to transmit them to the downstream server using a UNIX credential object (see the sketch after this list).
- Coordinated with TQMS team on the Functional and Regression testing and fixed the reported issues.
- Coordinated with downstream on the UAT and acceptance testing.
- Made code changes as per the enhancement requests and followed SDLC process to promote it to Production.
- Prepared Flow diagram, design document and support handbook.
- Worked with Production support team on the setup and knowledge transfer.
- Monitored daily and monthly batches and helped Production support team with any questions, failures and other emergency updates.
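A sketch of the extract-and-ship flow from the flat-file bullet above. The original was a Python 2.7 script; it is rendered in Scala here to keep the examples in one language, and the server, table, and path names are made up for illustration:

```scala
import scala.sys.process._

// Bulk-copy a SQL Server table to a flat file with bcp, then ship it downstream.
object ExtractAndShip {
  def main(args: Array[String]): Unit = {
    val table   = "dbo.daily_positions"           // illustrative table name
    val outFile = "/data/out/daily_positions.dat" // illustrative landing path

    // bcp exports the table to a pipe-delimited character-mode file
    // (-T uses a trusted connection; credential handling is omitted here).
    val bcp = Seq("bcp", table, "out", outFile, "-S", "sqlprod01", "-T", "-c", "-t", "|")
    require(bcp.! == 0, s"bcp extract failed for $table")

    // Push the extract to the downstream server over pre-established SSH trust.
    val scp = Seq("scp", outFile, "etl@downstream01:/inbound/")
    require(scp.! == 0, s"transfer failed for $outFile")
  }
}
```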
Environment: Informatica Powercenter 9.1.0, SQL Server 2012, DB2 9.7, UNIX, Python 2.7.1 and Autosys.
Confidential
Lead ETL Developer
Responsibilities:
- Participated in meetings with Users, the Business Analyst, the DQ team, and the Manager to understand the requirements and complete the low-level design.
- Worked with an offshore team of 5 ETL developers to complete the source-to-target mappings, development, unit testing, and documentation for all the Informatica processes (generating queries from metadata tables for the controls, running the queries, storing the results in tables, calculating the DQI, and generating reports).
- Developed Excel macros using VBA to create data quality reports, formatting and displaying DQ metrics at multiple levels.
- Used UNIX shell scripting to transfer the Excel files from the Windows server to users via email.
- Developed Unix Scripts for automating workflow run and used Autosys for scheduling.
- Coordinated with 4 ETL developers in offshore for development and unit testing.
- Made code changes as per the enhancement requests and followed SDLC process to promote it to Production.
- Prepared production monitoring and support handbook for ETL Process.
- Worked with Production support team on the setup and knowledge transfer.
- Monitored daily and monthly batches and helped Production support team with any questions, failures and other emergency updates.
Environment: Informatica Powercenter 9.1.0, SQL Server 2012, DB2 9.7, UNIX, Autosys, Excel VBA and Windows Batch scripting.
Confidential
ETL Developer
Responsibilities:
- Worked with the ETL Lead to understand the requirements and prepare the Source-to-Target mapping spec.
- Created Informatica objects per the Source-to-Target mapping spec and unit tested the results.
- Utilized Persistent caching, Partitioning and Flatfile load to improve the performance.
- Created SQL Stored procedure to calculate Holding Period and Haircut factor of the collateral and Exposure at Default (EAD) of counterparties.
- Developed Unix Scripts for automating workflow run, stored procedure execution and used Autosys for scheduling.
- Coordinated with TQMS team on the Functional and Regression testing and fixed the reported issues.
- Made code changes as per the enhancement requests and followed SDLC process to promote it into Production.
- Prepared Flow diagram, design document and support handbook.
- Worked with Production support team on the setup and knowledge transfer.
- Monitored daily and monthly batches and helped Production support team with any questions, failures and other emergency updates.
Environment: Informatica Powercenter 8.6.1/9.1.0, SQL Server 2008/2012, DB2 9.7, UNIX and Autosys.
Confidential
ETL Developer
Responsibilities:
- Worked with ETL Lead to understand the existing requirement, Informatica objects, UNIX scripts and Autosys jobs.
- Improved performance of the existing process by persistent caching, partitioning, optimizing SQL queries and Flatfile load.
- Created Informatica objects, UNIX scripts and Autosys jobs for new reports and enhanced the existing reports as per the change requests.
- Coordinated with ETL Lead daily to understand the change requests, issue analysis and code review.
- Automated report formatting, creating Pivot tables and other cosmetic changes to the report using VBA and Windows batch scripting.
- Prepared a production monitoring and support handbook for ETL Process.
- Monitored daily and monthly batches and helped Production support team with any questions, failures and other emergency updates.
Environment: Informatica Powercenter 8.6.1/9.1.0, SQL Server 2005/2008/2012, DB2 9.7, UNIX, Autosys, Excel VBA and Windows Batch scripting.