Lead Data Architect Resume
Hartford, CT
SUMMARY:
- Highly effective leader, architect, and developer with 11+ years of experience across big data, cloud, data, and analytics platforms, including the Hadoop (HDFS) framework, Spark, PySpark, Hive, Pig, HBase, Apache Kafka real-time systems, Apache Solr, MQ Series, and data warehousing (DWH) processes with Talend, Ab Initio, Unix, and Oracle.
- Worked with some of the largest financial and insurance groups, including Confidential, Capital One Financial Services, BNP Paribas, ABSA, and Confidential.
- Recognized as a thoughtful leader with expert ability to implement end-to-end architecture and design for big data, cloud, data warehouse, and business intelligence.
- Successfully transformed 14+ enterprise proofs of concept into production applications using the Hadoop ecosystem.
- Led development teams in designing and migrating AWS cloud-based solutions.
- Experienced with AWS EC2 and EMR instances and services.
- Designed and developed ingestion pipelines and data analysis systems for multiple Fortune 500 clients to process over 300 TB of data.
- Provided technical expertise and created software design proposals for upcoming components.
- Utilized MapReduce, HDFS, Spark, Hive, Pig, Kafka & MongoDB to design multiple data platforms.
- Developed real-time data management services for IntelliDrive policy data using Splunk.
- Designed, deployed, and maintained enterprise-class security, network, and systems management applications within an AWS environment.
- Evaluated, designed, developed, and deployed additional technologies and automation for managed services on AWS.
- Working on customer interaction history integration into HDFS using Flume, RabbitMQ, and MongoDB.
- Initiated fine-tuning of databases such as Hive, MongoDB, and Oracle, as well as their queries, to complete given jobs and tasks in optimal time.
- Evaluated project proposals and actively worked on project resourcing and estimation.
- Developed frameworks, metrics, and reporting to ensure progress can be measured, evaluated, and continually improved.
- Performed in-depth analysis of research information to identify opportunities and develop proposals and recommendations for management.
TECHNICAL SKILLS:
Frameworks & Technologies: Hadoop 1.0, Hadoop 2.0, HDFS, YARN, Spark 1.6.2, Spark 2.0, Spark 2.1.1, PySpark, Flume Agent, Sqoop, Hive, Pig, AWS, Oozie, Hue, Ambari, Atlas, MapReduce, Apache Kafka, ActiveMQ, Apache Solr, Talend, Ab Initio, Maven, Parquet, ORC, ZooKeeper, Jenkins, UrbanCode Deploy (UCD)
Languages: Python, PySpark API, HiveQL, Pig, Java, JavaScript, Shell scripting, HTML, XML, Perl
Databases: MongoDB, HBase, Hive, PostgreSQL, Oracle, SQL Server, Teradata, DB2, MySQL
Concepts: NoSQL, REST API, SOAP, ETL, DWH, Master Data Management (MDM), RDBMS, PL/SQL
PROFESSIONAL EXPERIENCE:
Confidential, Hartford, CT
Lead Data Architect
- Designed, developed, and implemented the Contact Center Data Platform (CCDP) using PySpark, HDFS, Flume Agent, Hive, Sqoop, and MongoDB.
- Designed and implemented a data lake by extracting customer data from various sources into HDFS, including data from SQL Server, Teradata, CSV, sequence files, JSON, and XML files.
- Involved in the design and development of data transformation framework components to support ETL and ELT processes using Hive, producing a single, complete, actionable view of each customer.
- Developed an ingestion module to ingest data into HDFS from heterogeneous data sources.
- Started proofs of concept to implement real-time services for Customer 360 data sources.
- Designed and developed DevOps processes for continuous integration (CI) in the Analytics LOB using UrbanCode Deploy (UCD), Jenkins, and UMT.
- Supported the development of performance dashboards encompassing key metrics reviewed with senior leadership and sales management.
- Optimized and de-normalized MongoDB collections to improve Customer 360 UI search and data visualizations.
- Working on customer interaction history integration with HDFS using Flume agent, Hive, and MongoDB.
- Designed and developed the Hive metastore, databases, and tables.
- Developed real-time data management services for IntelliDrive policy data using Splunk.
- Exploring machine learning algorithms using AWS SageMaker for insurance campaigns.
- Worked on HDFS container design and creation for Hive table implementations.
- Designing and developing unified search for Customer 360 using Apache Solr and MongoDB.
- Working on a proposal for the Business Insurance (BI) data ingestion pipeline.
- Designed, developed, and implemented a new architecture with Talend Administration Center (TAC) and AutoSys.
- Successfully completed three proofs of concept to convert three Ab Initio DWH applications to Talend for data warehousing projects.
- Created a DevOps continuous integration (CI) pipeline process for the EBIA line of business using Jenkins (CloudBees) and UrbanCode Deploy.
- Designed, developed, and implemented Talend jobs with HDFS configuration and a generic framework.
- Reviewed Talend jobs and created code review standards and processes for developers.
- Created and maintained Talend projects and roles for different teams on TAC.
- Designed and developed real-time services in PySpark to process IntelliDrive policy and customer information and push it to Salesforce Marketing Cloud (SFMC) via REST API calls; a minimal sketch of this pattern follows.
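Illustrative only: a minimal PySpark sketch of the policy-event push pattern described in the last bullet, not the production service. The Hive table name, column names, SFMC endpoint URL, and token handling below are assumptions for illustration.

```python
# Minimal sketch: reads (assumed) IntelliDrive policy events from a Hive table and
# POSTs them to a placeholder SFMC REST endpoint, one partition at a time.
import json

import requests
from pyspark.sql import SparkSession

SFMC_URL = "https://example.rest.marketingcloudapis.com/hub/v1/dataevents/key:policy_events/rowset"  # placeholder
SFMC_TOKEN = "REPLACE_WITH_OAUTH_TOKEN"  # placeholder; a real service would fetch this securely


def push_partition(rows):
    """POST one partition of policy events to SFMC as a JSON rowset."""
    payload = [
        {"keys": {"policyId": r["policy_id"]},
         "values": {"customerId": r["customer_id"], "event": r["event_type"]}}
        for r in rows
    ]
    if payload:
        resp = requests.post(
            SFMC_URL,
            headers={"Authorization": "Bearer " + SFMC_TOKEN,
                     "Content-Type": "application/json"},
            data=json.dumps(payload),
            timeout=30,
        )
        resp.raise_for_status()


if __name__ == "__main__":
    spark = (SparkSession.builder
             .appName("intellidrive-sfmc-push")
             .enableHiveSupport()
             .getOrCreate())
    # Assumed table and column names, for illustration only.
    events = spark.table("intellidrive_policy_events").select("policy_id", "customer_id", "event_type")
    events.foreachPartition(push_partition)  # each executor POSTs its own partition
    spark.stop()
```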
Confidential, Plano, TX and Richmond, VA
Data Architect, Lead, and Developer
Project: Credit Hub, Sales Reporting; Location: Plano, TX and Richmond, VA
- Designed, developed, and implemented a real-time data processing application moving data from Kafka to PostgreSQL using Flume, HDFS, Spark, Python, and Hive.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Designed and implemented a data lake for retail bank data using Spark, HDFS, Python, Hive, and Teradata.
- Improved data processing and storage throughput by using the Hadoop framework for distributed computing across a cluster of up to 16 nodes.
- Used Sqoop and Flume solutions to move data from HDFS to RDBMS and vice-versa.
- Designed and created Hive tables with parquet for performing complex aggregations.
- Collected and aggregated large amounts of log data using Apache Flume and staged it in HDFS for further analysis.
- Worked on Talend code deployments using Jenkins and UCD for DevOps
- Installed and configured Hadoop (HDFS) clusters in test and production environments.
- Performed both major and minor upgrades to the existing CDH cluster
- Implemented commissioning and decommissioning of nodes in the existing cluster.
- Analyzed and transformed data with Hive and Pig.
- Launched, configured, and maintained AWS EC2 instances for solution designs.
- Built customized in-memory indexes for high-performance information retrieval using Apache Lucene and Apache Solr.
- Talend Open Studio installation, environment setup and infrastructure design and implementation.
- Worked on data denormalization for all source lookup tables, denormalizing 64 primary-key tables into 3 tables with business confirmation.
- Worked on JSON file processing using Talend and defined XPath expressions to extract JSON from the Kafka queue.
- Designed jobs using repository metadata in the Talend DI tool and performed build operations for import/export.
- Designed and implemented real-time Kafka data streaming to PostgreSQL using AWS cloud solutions; a minimal sketch of this pattern appears at the end of this section.
- Architected, developed, and implemented a Talend big data application within the Credit Hub application.
- Created repository metadata for databases, files, and contexts using Talend and implemented best practices.
- Created a sandbox utility using shell scripts for individual development and testing environments.
- Converted major credit division applications and processes from Ab Initio to the Talend (6.2) big data platform.
- Worked extensively with fact/dimension modeling applications.
- Designed, developed, tested, and implemented Ab Initio graphs.
- Worked on high-level design flows and design-level documentation (DLD).
- Performed graph development, mapping review, code review, and mapping coordination with the offshore team.
- Implemented delta processing and upsert logic.
- Improved the performance of Ab Initio graphs using techniques such as in-memory lookups, joins, and rollups.
- The Sales Trends application provides data marts and the ability to view daily and weekly sales trends for metrics such as sales dollars and gross profit across the product hierarchy; any deviation from a metric's average value triggers an alert.
- Developed wrapper scripts to periodically notify users in case of any failures with debugging information.
- Developed UNIX Korn Shell wrappers to initialize variables, run graphs and perform error handling.
- Developed, tested, and reviewed complex Ab Initio graphs, sub-graphs, DML, psets, XFRs, deployed scripts, and DBC files for connectivity; created packages and exports.
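Illustrative only: a minimal sketch of the Kafka-to-PostgreSQL streaming pattern referenced above, written with Spark Structured Streaming's foreachBatch (Spark 2.4+), which may differ from the original implementation. The topic name, message schema, JDBC connection details, and table name are assumptions, not the actual Credit Hub configuration.

```python
# Minimal sketch of a Kafka-to-PostgreSQL stream using Spark Structured Streaming.
# Requires the spark-sql-kafka package and the PostgreSQL JDBC driver on the classpath.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

EVENT_SCHEMA = StructType([
    StructField("account_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])


def write_to_postgres(batch_df, batch_id):
    """Append one micro-batch to PostgreSQL over JDBC."""
    (batch_df.write
        .format("jdbc")
        .option("url", "jdbc:postgresql://pg-host:5432/credithub")  # placeholder connection
        .option("dbtable", "credit.events")                         # placeholder table
        .option("user", "etl_user")
        .option("password", "REPLACE_ME")
        .mode("append")
        .save())


if __name__ == "__main__":
    spark = SparkSession.builder.appName("kafka-to-postgres").getOrCreate()
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092")    # placeholder brokers
              .option("subscribe", "credit-events")                 # placeholder topic
              .load()
              .select(from_json(col("value").cast("string"), EVENT_SCHEMA).alias("e"))
              .select("e.*"))
    (events.writeStream
        .foreachBatch(write_to_postgres)                            # JDBC write per micro-batch
        .option("checkpointLocation", "/tmp/checkpoints/credit-events")
        .start()
        .awaitTermination())
```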
Confidential, Warren, NJ
Ab Initio Developer / Project Coordinator
Project: Strategic Solar (S2)
Responsibilities:
- Built multiple graphs to unload all required data from different source databases by configuring the .dbc file in the Input Table component.
- Developed graphs in the GDE using Partition by Round Robin, Partition by Key, Rollup, Sort, Scan, Dedup Sorted, Reformat, Join, Merge, and Gather components.
- Developed Ab Initio graphs using parallelism techniques, data parallelism, and MFS techniques with conditional components and conditional DML.
- Applied home loans and mortgages business understanding to enhancements related to data handling.
- Optimized Ab Initio code to handle new small-scale business requirements.
- Gathered business requirements and mapping documents.
- Identified the required dependencies between ETL processes and triggers to schedule the jobs that populate the data marts on a scheduled basis, and performed tuning of the Ab Initio graphs to reduce processing time.
- Created several SQL queries and reports against the above data mart for UAT and user reporting.
- Worked in the data management team on data extraction, subsetting, data cleansing, and data validation.
- Facilitated effective communication between the development team and the analysis/design team while working at the client site, and scheduled a weekly status meeting for critical issues and dependencies between the design and development teams.
- Liaised with business and functional owners during risk engineering and high-level review sessions to drive and execute action plans, meeting deadlines and standards.
- Developed a number of Ab Initio graphs based on business requirements using various Ab Initio components such as Partition by Key, Partition by Round Robin, Reformat, Rollup, Join, Scan, Normalize, Gather, and Merge.
- Extensively used Ab Initio's component, data, and pipeline parallelism features.
- Used air commands to perform dependency analysis for all Ab Initio objects.
- Moved data feeds from the mainframe (MVS) system into the Ab Initio development environment.
- Developed shell scripts for Archiving, Data Loading procedures and Validation.
- Tested and tuned Ab Initio graphs and Teradata SQL for better performance.
- Took part in quality assurance activities, design and code reviews, unit testing, defect fixes, and operational readiness.
- Performed special follow-up activities such as quarter-end and month-end processing.