
Big Data Architect / ETL - Hadoop Architect - Technical Lead Resume


Hartford, CT

SUMMARY

  • Over eighteen years of professional IT experience in all phases of the Software Development Life Cycle, including hands-on experience in Big Data analytics. Excellent business domain experience in Healthcare, Supply Chain Management (Procurement), Sales, Planning, Finance, Pricing, Hedge Funds, Stocks/Securities, Trading, Reinsurance, Accounting, Telecom, Retail, and Insurance/Actuarial data domains.
  • Hands-on experience in different phases of big data applications, including data ingestion, data analytics, and data visualization, as well as building data-lake-based data marts to support data science, machine learning, and AI.
  • Hands-on experience with Hadoop ecosystem components such as Hadoop, HDFS, HBase/Phoenix, ZooKeeper, Hive, Oozie, Sqoop, Pig, Flume, Kafka, Storm, Spark, and Cassandra/DataStax on Cloudera and Hortonworks distributions.
  • Good understanding of Hadoop Gen1/Gen2 architecture and hands-on experience with Hadoop components such as JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts, as well as the YARN architecture with NodeManager and ResourceManager.
  • Hands-on experience with big data ingestion tools such as Talend Real-Time Integration 6.2.1.
  • Experience in transferring streaming data and data from different data sources into HDFS and NoSQL databases using Apache Flume and Apache Kafka.
  • Well versed in developing and implementing MapReduce programs on Hadoop to work with semi-structured and unstructured data using the Talend Real-Time Integration tool.
  • Good understanding of data mining and machine learning techniques such as Random Forest, logistic regression, and K-Means using Spark MLlib (see the sketch following this list).
  • Experienced in working with data scientists to implement ad-hoc queries using HiveQL, partitioning, bucketing, Hue, Python, and custom Hive UDFs. Experience in writing Groovy scripts.
  • Experience in optimizing Hive queries and joins and in handling different data file formats with custom SerDes.
  • Exposure to the Apache Storm architecture and its integration with Kafka for streaming operations.
  • Experience with different file formats such as VSAM, CSV, SequenceFile, Avro, RC, ORC, JSON, and Parquet, and with compression techniques such as LZO, Gzip, Bzip2, and Snappy.
  • Exposure to implementing advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
  • Experience in using UNIX scripting and the Oozie scheduler to implement cron-style jobs that execute different kinds of Hadoop actions.
  • Good knowledge of and hands-on experience with NoSQL databases such as HBase and Cassandra.
  • Good understanding of AWS, Azure and other cloud platforms, Bigtable, natural language processing, etc.
  • Good understanding of and experience with software development methodologies such as Agile and Waterfall.
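
A minimal Scala sketch of the kind of Spark MLlib clustering referenced above, using the DataFrame-based API. The HDFS paths, column names, and k value are illustrative assumptions, not details from any actual engagement:

    import org.apache.spark.ml.clustering.KMeans
    import org.apache.spark.ml.feature.VectorAssembler
    import org.apache.spark.sql.SparkSession

    object MemberSegmentation {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("MemberSegmentationKMeans")
          .getOrCreate()

        // Hypothetical feature file on HDFS; path and columns are placeholders.
        val raw = spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("hdfs:///data/analytics/member_features.csv")

        // Assemble the numeric columns into the single vector column MLlib expects.
        val assembler = new VectorAssembler()
          .setInputCols(Array("claims_per_year", "avg_paid_amount", "visits_per_year"))
          .setOutputCol("features")
        val features = assembler.transform(raw)

        // Fit a K-Means model; k = 5 is purely illustrative.
        val kmeans = new KMeans().setK(5).setSeed(42L).setFeaturesCol("features")
        val model = kmeans.fit(features)

        // Attach cluster assignments and persist them for downstream analysis.
        model.transform(features)
          .select("member_id", "prediction")
          .write
          .mode("overwrite")
          .parquet("hdfs:///data/analytics/member_segments")

        spark.stop()
      }
    }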

TECHNICAL SKILLS

  • Hadoop
  • HDFS
  • Hive
  • MapReduce
  • Pig
  • Sqoop
  • Flume
  • Oozie
  • Impala
  • Spark
  • Storm
  • Kafka
  • ZooKeeper
  • NiFi
  • HBase
  • Cassandra
  • MongoDB
  • Python
  • Unix Shell
  • Java
  • Scala
  • HTML
  • CSS
  • JavaScript
  • Struts
  • SQL
  • PL/SQL
  • Oracle 12c
  • MySQL
  • DB2
  • SAP HANA
  • Informatica 9.5
  • DataStage 9.0
  • Ab Initio 3.0.2
  • Talend
  • Maven
  • Jenkins
  • sbt
  • SVN
  • GIT
  • NetBeans
  • Eclipse
  • Netezza
  • Teradata 15.10
  • Erwin 8

PROFESSIONAL EXPERIENCE

Confidential, Hartford, CT

Big Data Architect / ETL - Hadoop Architect - Technical Lead

Responsibilities:

  • Designing and architecting a Hortonworks-based big data platform (data lake) to support Confidential Retirement, Finance, Actuarial, and web data analytics requirements.
  • Designing different zones for organizing data in the data lake on HDFS, based on data domain.
  • Designing and developing data ingestion to integrate data from multiple source systems such as mainframes, SAP BPC, Oracle, web logs, and other unstructured data using the Talend Real-Time Data Integration tool, and building integration data marts on Netezza to support analytics, Cognos dashboards, and OLAP cubes. Worked on different file types such as JSON, CSV, and unstructured data.
  • Building out Spark jobs in Talend to improve performance through in-memory processing.
  • Involved in converting Hive/HQL queries into Spark transformations using Spark RDDs and Scala (see the sketch following this list); good experience using the Spark shell and Spark Streaming.
  • Exposure to data ingestion and cluster handling for real-time processing using Apache Kafka.
  • Developed generic Sqoop scripts and Talend jobs for importing and exporting data between HDFS and relational systems such as Oracle, MySQL, SAP BPC, and Netezza.
  • Exposure to creating Cassandra tables to load large sets of structured, semi-structured, and unstructured data coming from Linux, NoSQL, and a variety of portfolios.
  • Involved in executing various Oozie workflows and automating parallel Hadoop MapReduce jobs.
  • Working with vendor and client support teams to resolve critical production issues based on SLAs.
  • Created Talend mappings to populate data into dimension and fact tables. Hands-on in developing Talend DI jobs to transfer data from source views to Hadoop staging and target layers.
  • Developed Talend jobs to move inbound files to vendor server locations on monthly, weekly, and daily frequencies. Wrote MapReduce jobs to parse the web logs stored in HDFS (an illustrative parsing sketch follows this list).
  • Knowledge of ingesting data into Cassandra and consuming the ingested data from Cassandra into the Hadoop data lake. Familiar with Python, Groovy, and HDFS commands.
  • Good understanding of the Cassandra data modeling process and of building efficient data structures.
  • Exposure to using a SolrCloud implementation to provide real-time search capabilities on a repository with terabytes of data.
  • Involved in restarting failed Hadoop jobs in the production environment.
  • Participated in designing data governance and lineage using Talend Metadata Manager.
  • Leading the technical team, conducting interviews, assigning tasks, and following up.
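
A minimal Scala sketch of the Hive/HQL-to-Spark conversion mentioned above. The database, table, and column names are hypothetical placeholders rather than the actual warehouse schema:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.sum

    object ClaimsSummaryJob {
      def main(args: Array[String]): Unit = {
        // Hive support lets Spark read the same metastore tables the HQL used.
        val spark = SparkSession.builder()
          .appName("ClaimsSummaryJob")
          .enableHiveSupport()
          .getOrCreate()
        import spark.implicits._

        // Equivalent of a grouped-aggregate HQL query, expressed as DataFrame
        // transformations so it runs in memory on the Spark executors.
        val claims = spark.table("finance.claims")   // hypothetical table
        val summary = claims
          .filter($"plan_year" === 2017)
          .groupBy($"product_line")
          .agg(sum($"paid_amount").alias("total_paid"))

        // Write the result back to the warehouse for Cognos/OLAP consumption.
        summary.write.mode("overwrite").saveAsTable("finance.claims_summary")

        spark.stop()
      }
    }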
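
The web-log parsing described above was implemented as MapReduce jobs; as an illustrative, Spark-flavored equivalent in Scala of the same kind of parsing (the combined-log regex and HDFS paths are assumptions):

    import org.apache.spark.sql.SparkSession

    object WebLogParser {
      // Common/combined log format: ip, timestamp, request line, status, bytes.
      private val LogLine =
        """^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\S+).*""".r

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("WebLogParser").getOrCreate()
        val sc = spark.sparkContext

        // Raw logs landed in HDFS by the ingestion jobs (path is a placeholder).
        val logs = sc.textFile("hdfs:///data/raw/weblogs/")

        // Keep only lines that match the expected format; count hits per URI.
        val hitsPerUri = logs.flatMap {
          case LogLine(ip, ts, method, uri, status, bytes) => Some((uri, 1L))
          case _                                            => None
        }.reduceByKey(_ + _)

        hitsPerUri.map { case (uri, count) => s"$uri\t$count" }
          .saveAsTextFile("hdfs:///data/curated/weblog_hits")

        spark.stop()
      }
    }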

Environment: Hortonworks HDFS, Hive, HBase/Phoenix, Pig, Oozie, Cassandra, Sqoop, Apache Kafka, Linux, Talend Real-Time Integration, TAC, Talend Metadata Manager, Spark, Confluence, GitHub, Nexus, Netezza, Cognos 10/11, Kerberos, Oracle 12c, Python, SQL Developer, Toad, Hue, Aginity Workbench, Zaloni Bedrock, Mica, SQL Server Management Studio, SSIS, SSAS, Data Tools

Confidential, Bloomfield, CT

ETL Engineer - Architect

Responsibilities:

  • Involved in importing and exporting data between the Hadoop data lake and relational systems such as Oracle, DB2, and Teradata, as well as unstructured data, using Sqoop.
  • Participated in architecting a Teradata-based IDS and enterprise EDW for Cigna.
  • Worked on creating Hive partitions, writing custom SerDes, and creating data marts.
  • Developed scripts to load data into Teradata from different source systems and built analytics dashboards to support insights using Tableau and Looker.
  • Enhanced Hive query performance using Tez for customer attribution datasets.
  • Involved in loading data from the UNIX file system into HDFS and responsible for writing generic UNIX scripts. Involved in developing and testing Pig Latin scripts.
  • Managing and scheduling jobs on a Hadoop cluster using Oozie.
  • Involved in troubleshooting errors in shell, Hive, and MapReduce.
  • Worked on debugging and performance tuning of Hive and Pig jobs.
  • Wrote Hive generic UDFs to perform business logic operations at the record level.
  • Implemented partitioning, dynamic partitions, and bucketing in Hive for efficient data access (see the sketch following this list).
  • Developed Oozie workflows to automate the tasks of loading data into Hive tables.
  • Involved in running queries using Impala and used BI tools to run ad-hoc queries directly on Hadoop.
  • Generated various reports using Tableau with Hadoop and Teradata as data sources.
  • Wrote MapReduce code that takes customer-related flat files as input and parses them to extract meaningful (domain-specific) information for further processing.
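
A minimal sketch of the Hive partitioning and dynamic-partition loading mentioned above, expressed as HiveQL issued through a Scala SparkSession. The schema, table names, and settings are illustrative assumptions:

    import org.apache.spark.sql.SparkSession

    object BuildMemberClaims {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("BuildMemberClaims")
          .enableHiveSupport()
          .getOrCreate()

        // Partitioned target table; ORC keeps scans efficient for attribution queries.
        // Bucketing would typically be added in the Hive DDL with a clause such as
        // CLUSTERED BY (member_id) INTO 32 BUCKETS.
        spark.sql("""
          CREATE TABLE IF NOT EXISTS edw.member_claims (
            member_id   STRING,
            claim_id    STRING,
            paid_amount DOUBLE
          )
          PARTITIONED BY (claim_year INT, claim_month INT)
          STORED AS ORC
        """)

        // Allow dynamic partitions so each year/month lands in its own partition.
        spark.sql("SET hive.exec.dynamic.partition=true")
        spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

        // Dynamic-partition insert from a hypothetical staging table.
        spark.sql("""
          INSERT OVERWRITE TABLE edw.member_claims
          PARTITION (claim_year, claim_month)
          SELECT member_id, claim_id, paid_amount, claim_year, claim_month
          FROM staging.claims_raw
        """)

        spark.stop()
      }
    }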

Environment: Teradata 15.10, QueryGrid 2.0, Cloudera, MapReduce, HDFS, Hive, Pig, Oozie, Sqoop, Apache Kafka, Storm, Impala, Linux, Tableau, Jira, Confluence, GitHub, Jenkins, uDeploy, DataStage 9.0, Informatica PowerCenter 9.5, Oracle 11g, DB2, TPT, Looker, ESP Scheduler, Viewpoint, Erwin 8

Confidential, Bloomfield, CT

Senior Application Architect - ETL Data Engineer

Responsibilities:

  • Designing the required infrastructure for the application to meet the committed service levels.
  • Participating in business meetings to gather requirements and data needs.
  • Participating in the design of conceptual and logical data models for different subject areas.
  • Designing proper security for production data access to ensure HIPAA and IIPI rules are strictly followed and applied.
  • Participating in defining different project scopes at a high level from an architecture perspective. Creating blueprints and architectural artifacts and holding walkthroughs with business and senior management.
  • Designing the ETL architecture for the CCDR and providing the ETL development team with a vision for reusable ETL components toward implementing a service-oriented architecture.
  • Designing data feeds for different data marts for reporting applications.
  • ETL code development and support for different project UATs.
  • Worked on MLR reporting and data analysis related to HEDIS reporting.
  • Wrote shell scripts run as cron jobs to automate the data migration process from external servers and FTP sites. Developed a control, auditing, and balancing framework for batch applications.
  • Created tasks for incremental loads into staging tables and scheduled them to run in AutoSys.
  • Created mappings to load the data mart using transformations such as Aggregator, Filter, Router, Expression, Joiner, Sequence Generator, and Update Strategy in Informatica and DataStage.

Environment: Oracle 11g on RAC, UNIX, Informatica PowerCenter 9.5, DataStage 9.0, Tableau, Looker, ESP Scheduler, Teradata, Viewpoint, Erwin 8, Sybase, SQL Server, DB2, SAS, Cognos, Visual Studio, Tivoli, etc.

Confidential, New Britain, CT

Data Architect

Responsibilities:

  • Supporting broad, efficient global rollouts with rapid user and supplier adoption, integration of multiple legacy systems, and capture of all spend. Master data management and data governance.
  • Facilitating faster, more effective communication, compliance, and collaboration with multiple stakeholders worldwide. Participating in the development of procurement strategies and policies and the integration of systems across product lines, identifying cultural and other business issues.
  • Leading the implementation of Ariba Spend Management solutions and standards such as UNSPSC, with spend visibility at the invoice line-item (SKU) level, for the enterprise to gain control and improved visibility into the entire procurement lifecycle through integration of multiple legacy and ERP source systems. The integrated source systems include SAP and packaged products such as BPCS.
  • Designing data flow diagrams and developing conceptual, logical, and physical star data models for different subject areas such as sales, planning, and procurement. Designing the overall process flow for the applications, coordinating with stakeholders, and obtaining business approval. Developing an XML-based standard extract format for integrating data across different source systems and for reusability of ETL design and code.

Environment: MS SQL Server 2012 and 2008, T-SQL, SSAS, Query Analyzer, Enterprise Manager, Sybase PowerDesigner 12.5, IBM Information Server, DataStage 7.5 and WebSphere DataStage 8, DataStage Administrator, DataStage Designer, Director, and DataStage Manager, XML, XMLSpy, BusinessObjects Enterprise XI Release 2, BusinessObjects Planning, SAP, SAP BW, BPCS (Business Planning and Control System, by System Software Associates).

Confidential

Teradata and ETL Solution Architect

Responsibilities:

  • Assisting project teams with blueprints and holding design walkthroughs. Assisting in performance measurement and monitoring of the production Teradata environment. Defining and developing performance-tuning measurements for the very large Teradata database. Defining infrastructure and application standards and approaches for hardware, software, and methodology.
  • Designing data integration flows, diagrams, and road maps for using XML from rapidly changing SOA web-based source systems. Analyzing different types of XML documents (free-form, DTD, XML Schema, etc.) to find entity references and relationships for use in the data design.
  • Design of the ETL processes and of the data warehouse control infrastructure.
  • Providing insight into the EDW Teradata AMP and node server farms.
  • Designing reusable ETL components to be useful across different data domains and lines of business.
  • Designing high-level ETL architecture for different warehouse projects. Working with vendors on the development, testing, and production rollouts of applications.
  • Developing processes for ACORD XML transactions to conform to the industry standard.

Environment: Teradata, Ab Initio 1.14.28, Enterprise Meta Environment (EME), Data Profiler 1.7.5.1, FastLoad, MultiLoad, TPump, Parallel Transporter, UNIX, Erwin 7.1, Teradata Administrator, Microsoft Analysis Services, MDX, Cognos PowerPlay, Impromptu, DB2, SQL Server 2005, XML, ACORD, XMLSpy

Confidential, Stamford, CT

Senior ETL Data Developer

Responsibilities:

  • Developing processes for reconciling daily trading activities on stock exchanges in the US and UK.
  • Designing and developing the ETL framework and processes using Informatica 8.0 to run on the domain, integration services, and nodes of a highly available grid ETL infrastructure. Excellent hands-on knowledge of integration services and load balancing using PowerCenter 8.0. Developing T-SQL routines in Sybase and SQL Server.
  • Experience with data sources such as XML, PDF files, Excel, relational databases, and mainframe files. Performance tuning the ETL processes to use Informatica pushdown optimization in the staging and source databases, maximizing flexibility and providing optimal performance for data- and process-intensive transformations.
  • Involved in the design of the ODS, star schema, and staging area.
  • Used Striva Detail (PowerConnect) to read and write data from mainframe files and COBOL sources. Did incremental updates, used constraint-based loading, etc.
  • Used session, mapping, and workflow variables; created partitions and database indexes; and fine-tuned the mappings/sessions to improve the performance of the ETL process.
  • Participating from concept through design, development, and testing of the entire ETL process.
  • Used AutoSys on the UNIX platform to schedule the ETL process using pmcmd, and wrote UNIX shell scripts.
  • Leading the data warehouse and reporting initiatives.

Environment: Informatica PowerCenter 6.2, Designer 6.2, Workflow Manager 6.2, Workflow Monitor 6.2, Repository Manager 6.2, PowerConnect (Striva Detail), Toad, Oracle 9i, COBOL, UNIX, AutoSys, WebFOCUS

Confidential, Harrisburg, PA

Data Architect - Oracle DBA

Responsibilities:

  • Participated in the analysis and design of the conceptual, logical, and physical star schema.
  • Creating and administering the Oracle database.
  • Designing the staging/ODS environment for the warehouse using Oracle Replication to minimize the impact of ETL on the transactional system. Configuring the Windows 2000 server for database creation. Capacity planning and designing the physical database architecture.
  • Administering the PowerCenter repository. Backup/recovery of the PM server; created and migrated repositories and folders for the development, testing, and production environments.
  • Applying database patches, migrating the database, and installing software upgrades.
  • Monitoring data loads and data cube creation; 24x7 production on-call support.
  • Involved in researching and re-architecting the ETL environment and coming up with a design for a staging area to consolidate order and billing details at an enterprise level.
  • Detailed gap analysis of new reporting initiatives and existing data marts.
  • Modeling conformed dimensions for client definition (the client base includes banks, credit unions, and other financial institutions). Identifying data dependencies.

Environment: Oracle 9i, PL/SQL, Model Repository & PowerDesigner, Informatica PowerMart 5.0, Server Manager, Repository Manager, DB2, BusinessObjects, Web Intelligence tools, UNIX, SQL*Plus, Toad, Visio

Confidential

ETL Data Engineer

Responsibilities:

  • Developed the system from scratch through production implementation. The design enables users to drill from summary data down to details. The system has an n-tier front-end architecture with a Java and JSP user interface and reports, report business logic in the middle tier, and an Oracle 9i/8i back-end database, using WebLogic web server 5.1 and connection pooling.
  • Designed surrogate key lookup tables to keep track of changing dimensions. Designed data staging areas (back room) for the warehouse. Designed summary tables and materialized views for data publishing. Performed integrity and quality checks of data in the staging areas.
  • Standardized data elements, filled data gaps, and extracted, cleansed, reconciled, transported, summarized, and aggregated data from the ODS, online, and external systems and loaded it into the DW.
  • The project created an Oracle-based enterprise data hub, migrating data from COBOL/DB2 legacy systems running on an NCR mainframe. Involved from concept through specifying the physical infrastructure, system study and design, development, testing, deployment, user training, and production support. The designed multi-dimensional warehouse held data from finance, trade, import, sales, retail, manufacturing, and supply chain (SAP) inventory.
  • Created conformed dimensions, facts, degenerate and junk dimensions, etc.

Environment: DB2, MF COBOL, COBOL85 on NCR mainframe, Oracle 7.x on SCO UNIX, PowerDesigner, SQL*Plus, PL/SQL, Developer 2000, Crystal Reports, Forms 4.5, Report Server, Reports 2.5, Graphics, Pro*COBOL, SQL*Loader, Designer 2000, Embedded SQL, ABAP, SAP
