Senior Software Architect - Data Integration Resume
Texas
SUMMARY
- A collaborative engineering professional with substantial experience designing and executing solutions for complex business problems involving large-scale data warehousing, real-time analytics, and business intelligence reporting solutions.
- Information Technology professional with 13+ years of experience as a Data Analyst, Data Engineer, and ETL Data Integration Engineer across application design, development, implementation, and testing in data warehousing, including 4+ years developing on the Big Data Hadoop framework, with industry experience in banking, financial services, and telecom.
- Detail-oriented and resourceful in the completion of projects, with an ability to multitask and meet deadlines.
- Organized individual with exceptional follow-through capabilities.
- Extensive experience in gathering functional requirements from business users and in system design, coding, testing, and production support for a number of large projects. Participated in the development and enhancement of large projects using Informatica.
- Highly regarded for a proactive attitude, analytical thinking, and learning ability.
- Participated in various stages of the Software Development Life Cycle (SDLC), including analysis, design, development, debugging, conversion, testing, implementation, and production support.
- In-depth knowledge of Hadoop architecture and its components, such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, and the YARN/MapReduce programming paradigm, for analyzing large data sets efficiently.
- Extensive experience in Hadoop augmented data warehouse environment.
- Extensive experience in using the Informatica PowerExchange platform to set up real-time CDC processing from source systems.
- Extensive experience in migrating on-premises data warehouses to the cloud and on-premises big data clusters to the cloud using AWS EC2, AWS RDS, and AWS EMR.
- Experience with cloud object storage such as AWS S3 for data ingestion and for storing processed data.
- Expert in understanding the cloud framework (AWS) with regard to data processing and data integration platforms.
- Experience with the DevOps framework, using tools like Ansible for configuration management, along with acquaintance with container frameworks like Docker and code management repositories like GitHub to manage the code base.
- Extensive experience working with Hadoop ecosystem tools such as Hive, Sqoop, and PySpark.
- Experience in importing and exporting terabytes of data using Sqoop from HDFS to Relational Database Systems and vice-versa.
- Experience working on processing data using Hive. Involved in creating Hive tables, data loading and writing hive queries.
- Experience with the Spark framework, with hands-on Spark SQL and Spark RDD/DataFrame-level programming.
- Experience in Python scripting using the Spark framework.
- Experience in analyzing data using HiveQL and Spark SQL.
- Extensive experience in using the Informatica Big Data Management platform to integrate with big data clusters.
- Knowledge of job workflow scheduling and monitoring tools like Control-M and Appworx.
- Experience in performance tuning of ETL processing, decreasing the batch window by 25%, and in tuning ad hoc SQL, reducing latency by 30%.
- Experience with UNIX Shell Scripting (KSH - Korn Shell Scripting).
- In-depth knowledge in handling migration of traditional Data warehouse environment to Hadoop augmented data warehouse.
- Experience in developing PySpark scripts for aggregating data and performing complex computational tasks to make optimal use of big data computing power (a brief illustrative sketch follows this summary).
- Experience in performance tuning of ETL Sources, Targets, Mappings, transformations & Sessions.
- Experience in designing mappings/sessions and SQL Server SSIS packages.
- Expert in Teradata SQL, Teradata Utilities (BTEQ, Fast Load, Fast Export, Multiload, TD Administrator 7.1, TD SQL Assistant 13/14.10).
- Experience in troubleshooting Teradata scripts, fixing bugs, addressing production issues, and performance tuning.
- Experience in troubleshooting complex SQL queries, addressing production issues, and performance tuning.
- Experienced in writing and tuning complex SQL queries, Triggers and Stored procedures in SQL Server, Oracle, Teradata.
- Strong experience in designing and developing Business Intelligence solutions in Data Warehousing/Decision Support Systems using Informatica Power Center and the SQL Server suite of products.
- Experience in BI development and deployment of DTS and SSIS packages from MS Access, Excel, and Oracle.
- Experienced in programming tasks such as stored procedures, triggers, and cursors using SQL Server 2012/2014 with T-SQL.
- Extensive experience in implementation of Data Cleanup procedures, transformations, Scripts, Stored Procedures and execution of test plans for loading the data successfully into the targets.
- Used SQL and stored procedures to write complex queries to retrieve and manipulate data from various tables to match business functionality.
- Extracted and transferred source data from different databases such as Oracle, SQL Server, and DB2, and from flat files, into Oracle.
- Data modeling experience using Dimensional Data Modeling, Star Schema Modeling, Snow-Flake Modeling, FACT and Dimensions Tables, Physical and Logical Data Modeling, Erwin 4.x.
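Illustrative PySpark aggregation sketch, as referenced above. This is a minimal, hypothetical example of the kind of aggregation work described in this summary; the table and column names (dw.sales_fact, sale_date, region, sale_amt, customer_id) are placeholders, not actual project objects.

    # Minimal PySpark aggregation sketch; all names are hypothetical placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("daily_sales_aggregation")
             .enableHiveSupport()
             .getOrCreate())

    # Read a Hive table as a DataFrame and aggregate with Spark SQL functions.
    sales = spark.table("dw.sales_fact")
    daily_totals = (sales
                    .groupBy("sale_date", "region")
                    .agg(F.sum("sale_amt").alias("total_amt"),
                         F.countDistinct("customer_id").alias("customers")))

    # Write the aggregate back to Hive in ORC format, partitioned by date.
    (daily_totals.write
     .mode("overwrite")
     .format("orc")
     .partitionBy("sale_date")
     .saveAsTable("dw.daily_sales_summary"))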
TECHNICAL SKILLS
Operating Systems: UNIX, Windows NT/2000/XP/10
Database: Oracle 10g/9i/8.x, SQL Server 2005/2008/2012, DB2, MS Access, Teradata 12.x/13.x/14.x, AWS RDS.
Languages & Tools: Informatica Power Center 9.x/8.x, Teradata, Hadoop, Cloudera 5.x, Hive 2.x, Sqoop, Impala shell, Apache Spark, Python 2.x, YARN, VersionOne, Jira, PySpark, C, TOAD for Oracle, Lotus Notes, MS Office Suite, MS Visio, PL/SQL, Shell Scripting, SSIS, SSRS, HDP 2.x, MapReduce, Informatica PowerExchange 10.x for CDC, Informatica BDM 10.x.
Teradata Utilities: Teradata - MLOAD, Fast Load, BTEQ, Fast Export, TPUMP, and TPT
Teradata Tools: Teradata SQL Assistant
ETL tools: Informatica Power Center 9.x, Informatica BDM 10.x, Informatica PowerExchange 10.x for CDC, SSIS
BI Reporting Tools: SSRS
Process/Methodologies: Waterfall Methodology, Agile Methodology
MS Office Applications: Word, Excel, PowerPoint, Visual Basic, SharePoint
Advanced Excel Skills: Pivot tables, VLOOKUP, HLOOKUP, IF statements, List functions
Testing Tools: HP Quality Center
PROFESSIONAL EXPERIENCE
Confidential, TEXAS
Senior Software Architect - Data Integration
Responsibilities:
- Involved in modernization of a traditional data warehouse by migrating it from Oracle into Hadoop to reduce processing windows significantly and provide near real-time analytics to customers.
- Started with a POC on Hortonworks Hadoop, converting a small/medium-complexity traditional data warehouse into Hadoop.
- Developed data ingestion pipelines from RDBMS (Oracle, SQL Server) to Hadoop using SQOOP framework.
- Created Hive tables and stored data in ORC format for optimal performance and storage of data.
- Involved in designing and developing the Hive data model and loading data into Hive.
- Created Python scripts using Spark Framework.
- Involved in analyzing data using HiveQL and Spark SQL.
- Migrated the on-premises data warehouse infrastructure to the AWS cloud.
- Created architecture for front-end development using AWS tools and services such as RDS, S3, AWS Glue, Lambda functions, Apache Spark, API Gateway, data lakes, EC2, CloudFront, RESTful APIs, ECS, EKS, SES, and VPC, and used Control Tower to automate ongoing policy management.
- Built serverless architecture with Lambda integrated with SNS, SES, CloudWatch and other AWS Services.
- Configured a CI/CD pipeline using Jenkins connected to GitHub to build environments (Dev, Stage, and Prod).
- Developed Ansible playbooks for configuring Hive tables and databases across environments.
- Good exposure to the Jinja2 templating framework used within Ansible to configure servers across environments.
- Created Hive external and internal tables and extensively used Hive partitioning and bucketing to gain performance benefits (see the sketch at the end of this list).
- Developed and deployed ETL strategies using Informatica BDM to load data into Hadoop Cluster.
- Developed and tuned HiveQL queries to achieve optimal compute performance.
- Used the Spark and Blaze engines during ETL processing on the Hadoop cluster.
- Involved in setting up real-time data capture using Informatica PowerExchange and established strategies to capture data into the data warehouse for near real-time insights.
- Developed SQOOP ingestion framework for full loads as well as incremental loads.
- After the successful POC, began converting the existing Oracle data warehouse systems built on other ETL platforms to modern ETL platforms such as Informatica Big Data Management edition.
- Mainly worked on HiveQL to categorize data from different systems such as call center, sales, and billing. Implemented partitioning, dynamic partitions, and buckets in Hive.
- Monitored full/incremental/daily loads and supported all scheduled ETL jobs for performance improvements.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Involved in optimizing the Data ingestion strategies by reducing the data ingestion window from approximately 23 hours to 6 hours.
- Loaded data into Hive using different compression strategies and ORC file format.
- Involved in technical decisions for business requirements, interaction with Business Analysts, the client team, and the development team, capacity planning, and upgrades of system configuration.
- Worked on UNIX scripting for data preprocessing before loading them to DB.
- Extensive experience in analyzing and optimizing complex SQL queries.
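Illustrative sketch of the partitioned ORC Hive tables referenced in this list, issued through spark.sql; the database, table, and column names are hypothetical placeholders, and the bucketing clause mentioned above would be added in the Hive DDL in the same way.

    # Hypothetical partitioned ORC Hive table created and loaded from PySpark.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    spark.sql("""
        CREATE TABLE IF NOT EXISTS dw.call_detail (
            call_id      BIGINT,
            customer_id  BIGINT,
            duration_sec INT
        )
        PARTITIONED BY (call_date STRING)
        STORED AS ORC
    """)

    # Dynamic-partition insert from a staging table populated by the Sqoop pipeline.
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""
        INSERT OVERWRITE TABLE dw.call_detail PARTITION (call_date)
        SELECT call_id, customer_id, duration_sec, call_date
        FROM staging.call_detail_raw
    """)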
Confidential, TEXAS
Data Integration Consultant
Responsibilities:
- Involved in designing the dimensional model and implemented star and snowflake schemas.
- Involved in development and design of a Hadoop cluster using Apache Hadoop for POC and sample data analysis.
- Performed current state system analysis, developed functional and technical specifications, enhancing existing objects like procedures, functions, packages, views etc. and maintaining documentation.
- Started with a POC on Cloudera Hadoop, converting a small/medium-complexity traditional data warehouse into Hadoop.
- After the successful POC, began converting the existing Teradata systems built on other ETL platforms into suitable Spark SQL/DataFrame scripts, PySpark, Hive, and Impala shell. A background in all the ETL technologies helped in analyzing and converting the existing system into Hadoop faster.
- Developed a Hive data model using Hive query language to serve the same purpose as the traditional data warehouse.
- Developed multiple POCs using Python scripts, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
- Migrated high volumes of data from Teradata into HDFS using Sqoop and Informatica ETL, and imported flat files in various formats into HDFS.
- Created Hive external and internal tables and extensively used Hive partitioning and bucketing concepts to gain performance benefits.
- Used Spark scripts, DataFrames/SQL, and RDDs in Spark 1.6 for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
- Mainly worked on HiveQL to categorize data from different systems such as call center, sales, and billing. Implemented partitioning, dynamic partitions, and buckets in Hive.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Worked on projects involving migration of data from different sources, including Teradata, to the HDFS data lake, and created reports by performing transformations on the data in the Hadoop data lake.
- Developed Hive queries to process the data and generate data cubes for visualization.
- Implemented schema extraction for Parquet and Avro file formats in Hive.
- Used Hive queries in Spark SQL for analyzing and processing the data.
- Used RDDs to perform various transformations on datasets (see the sketch at the end of this role).
- Experience in importing and exporting data using Sqoop from HDFS to relational database systems and vice versa.
Environment: Informatica Power Center 9.x, Oracle 10g/9i, Teradata, Toad 8.6, Sun Solaris 9/10, SQL, MS Office Visio 2003, Appworx Enterprise Job Scheduler, Tidal Enterprise Job Scheduler, Hadoop, HDFS, Spark, Python, Pig, Hive, SQOOP, Kafka, Linux shell scripting, Eclipse, Cloudera, Jira.
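Illustrative sketch of the RDD-level transformation and Parquet write referenced in this role; the modern SparkSession API is used for brevity (the project itself used Spark 1.6 equivalents), and all table, column, and path names are hypothetical.

    # Hypothetical RDD-level transformation followed by a Parquet write.
    from pyspark.sql import SparkSession, Row

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    calls = spark.table("dw.call_detail")

    # Drop to the RDD layer for a row-level transformation, then back to a DataFrame.
    long_calls = (calls.rdd
                  .filter(lambda r: r["duration_sec"] > 600)
                  .map(lambda r: Row(customer_id=r["customer_id"],
                                     duration_min=r["duration_sec"] / 60.0))
                  .toDF())

    # Persist the result as Parquet for downstream reporting.
    long_calls.write.mode("overwrite").parquet("/data/lake/long_calls")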
Confidential, TEXAS
Lead Programmer Analyst
Responsibilities:
- Worked as an onsite coordinator, coordinating requirements between business teams and offshore teams.
- Monitored full/incremental/daily loads and supported all scheduled ETL jobs for performance improvements.
- Extensively used Pushdown Optimization in Teradata to gain maximum Performance by using Full PDO.
- Worked on Fast Load, Multiload, TPT, TPUMP and Fast Export loading techniques through Informatica into Teradata.
- Used BTEQ and SQL Assistant front-end tools to issue SQL commands matching the business Requirements to Teradata RDBMS.
- Experienced in programming tasks such as stored procedures, triggers, and cursors using SQL Server 2012/2014 with T-SQL.
- Experience in importing and exporting data using Sqoop from HDFS to relational database systems and vice versa.
- Extensively used transformations such as Source Qualifier, Aggregator, Expression, Lookup, Router, Filter, Update Strategy, Joiner, Transaction Control and Stored Procedure.
- Hands on experience in application development using RDBMS, Linux shell scripting, Python.
- Used Hive, Sqoop, Impala, and Cloudera Manager to ingest and process data in Hadoop environment.
- Analyzed and developed partitioned tables using Partitioning, Dynamic Partition, Indexing and buckets in Hive.
- Coordinated testing in multiple environments with multiple teams.
- Developed, deployed and monitored SSIS Packages for new ETL Processes and upgraded the existing DTS packages to SSIS for the on-going ETL Processes.
- Extensively worked with Informatica tools - Source Analyzer, Warehouse Designer, Transformation developer, Mapplet Designer, Mapping Designer, Repository manager, Workflow Manager, Workflow Monitor, Repository server and Informatica server to load data from flat files, legacy data.
- Worked on UNIX scripting for data preprocessing before loading them to DB.
- Involved in migration of databases from SQL Server 2005 to SQL Server 2008 using SSIS.
- Involved in technical decisions for business requirements, interaction with Business Analysts, the client team, and the development team, capacity planning, and upgrades of system configuration.
- Created reports using SQL Server Reporting Services (SSRS) for customized and ad-hoc queries.
- Migrated DTS packages to SSIS packages.
- Experience in BI development and deployment of DTS and SSIS packages from MS Access, Excel, and Oracle.
- Resolved critical issues in the production environment on numerous occasions while also handling production support.
- Used the pmcmd command to start and run workflows from the UNIX environment (see the sketch below).
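Illustrative sketch of starting an Informatica workflow with pmcmd from a script, as referenced above; the integration service, domain, folder, workflow, and credential values are hypothetical placeholders.

    # Hypothetical wrapper around the Informatica pmcmd CLI.
    import subprocess

    def start_workflow(folder, workflow):
        """Start an Informatica workflow and wait for it to complete."""
        cmd = [
            "pmcmd", "startworkflow",
            "-sv", "INT_SVC_DEV",   # integration service (placeholder)
            "-d", "Domain_Dev",     # domain (placeholder)
            "-u", "etl_user",       # user; the password would normally come
            "-p", "********",       # from an environment variable or vault
            "-f", folder,
            "-wait",                # block until the workflow finishes
            workflow,
        ]
        return subprocess.call(cmd)

    if __name__ == "__main__":
        raise SystemExit(start_workflow("DW_LOADS", "wf_daily_sales_load"))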