
Big Data Architect / Data / ETL Lead Engineer - Hadoop Architect - Technical Lead Resume


Hartford, CT

SUMMARY

  • Over eighteen years of professional IT experience across all phases of the Software Development Life Cycle, including hands-on experience in Big Data analytics, data warehousing, ETL, and reporting and analytics. Excellent business domain experience in healthcare, supply chain management (procurement), insurance, sales, planning, finance, pricing, hedge funds, stocks/securities, trading, reinsurance, accounting, telecom, retail, and actuarial data domains.
  • Hands-on experience in the different phases of big data applications, including data ingestion, data analytics, and data visualization, and in building data lake based data marts that support business intelligence, reporting and dashboards, data science, machine learning, and AI.
  • Hands-on experience with Hadoop ecosystem components such as Hadoop, HDFS, HBase/Phoenix, ZooKeeper, Hive, Oozie, Sqoop, Pig, Flume, Kafka, Storm, Spark, and Cassandra/DataStax on the Cloudera and Hortonworks distributions.
  • Good understanding of Hadoop Gen1/Gen2 architecture and hands-on experience with Hadoop components such as JobTracker, TaskTracker, NameNode, DataNode, MapReduce concepts, and the YARN architecture, including the NodeManager and ResourceManager.
  • Hands-on experience with big data ingestion tools such as Talend Real-Time Data Integration 6.2.1.
  • Experience and exposure in transferring streaming data and data from different data sources into HDFS and NoSQL databases using Apache Flume and Apache Kafka.
  • Well versed in developing and implementing MapReduce programs on Hadoop to work with semi-structured and unstructured data using the Talend Real-Time Data Integration tool.
  • Excellent experience in data modeling, data design, and the architecture of building data warehouses and related data marts for supporting BI and analytics dashboards.
  • Excellent hands-on experience with ETL tools such as Informatica, DataStage, and Ab Initio. ETL and data technical design/architecture experience ingesting data into data warehouses and related data marts on platforms such as Oracle, Teradata, Netezza, and DB2.
  • Designed and built out control and compliance mechanisms supporting auditing and balancing of ETL and data for analytics and KPI-level reporting on analytics dashboards.
  • Good understanding of data mining and machine learning techniques such as Random Forest, Logistic Regression, and K-Means using Spark MLlib (a minimal K-Means sketch follows this summary).
  • Experienced in working with data scientists to implement ad hoc queries using HiveQL, partitioning, bucketing, Hue, Python, and custom Hive UDFs. Experience in writing Groovy scripts.
  • Experience in optimizing Hive queries and joins and in using different data file formats with custom SerDes.
  • Exposure to the Apache Storm architecture and its integration with Kafka to perform streaming operations.
  • Experience in using different file formats such as VSAM, CSV, SequenceFile, Avro, RC, ORC, JSON, and Parquet, and different compression techniques such as LZO, Gzip, Bzip2, and Snappy.
  • Exposure to implementing advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark written in Scala.
  • Experience in using UNIX scripting and Oozie schedulers to implement cron jobs that execute different kinds of Hadoop actions.
  • Good knowledge of and hands-on experience with NoSQL databases such as HBase and Cassandra.
  • Good understanding of AWS, Azure, cloud platforms, Bigtable, and Natural Language Processing.
  • Good understanding and experience with Software Development methodologies like Agile and Waterfall.
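Below is a minimal, illustrative K-Means sketch using Spark's machine learning library (DataFrame-based API), in the spirit of the MLlib bullet above. The table name (analytics.member_features), its numeric columns, and the cluster count are hypothetical placeholders, not details from an actual engagement.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.feature.VectorAssembler

// Minimal K-Means sketch; table and column names are hypothetical examples.
object KMeansSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kmeans-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Assemble numeric columns into a single feature vector.
    val assembler = new VectorAssembler()
      .setInputCols(Array("annual_premium", "claim_count", "avg_claim_amount"))
      .setOutputCol("features")

    val features = assembler.transform(spark.table("analytics.member_features"))

    // Cluster members into 5 groups.
    val model = new KMeans().setK(5).setSeed(42L).fit(features)
    val clustered = model.transform(features) // adds a "prediction" column

    clustered.select("member_id", "prediction").show(10)
    spark.stop()
  }
}
```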

TECHNICAL SKILLS

  • Hadoop
  • HDFS
  • Hive
  • MapReduce
  • Pig
  • Sqoop
  • Flume
  • Oozie
  • Impala
  • Spark
  • Storm
  • Kafka
  • ZooKeeper
  • NiFi
  • HBase
  • Cassandra
  • MongoDB
  • Python
  • Unix Shell
  • Java
  • Scala
  • HTML
  • CSS
  • JavaScript
  • Struts
  • SQL
  • PL/SQL
  • Oracle 12c
  • MySQL
  • DB2
  • SAP HANA
  • Informatica 9.5
  • DataStage 9.0
  • Ab Initio 3.0.2
  • Talend
  • Maven
  • Jenkins
  • sbt
  • SVN
  • GIT
  • NetBeans
  • Eclipse
  • Netezza
  • Teradata 15.10
  • Erwin 8

PROFESSIONAL EXPERIENCE

Confidential, Hartford, CT

Big Data Architect / Data / ETL Lead Engineer - Hadoop Architect - Technical Lead

Responsibilities:

  • Designing and architecting a Hortonworks based big data platform and data lake supporting Confidential retirement, finance, actuarial, and web data analytics requirements.
  • Designing different zones for organizing data in the data lake on HDFS, based on data domain.
  • Designing and developing data ingestion to integrate data from multiple source systems such as mainframes, SAP BPC, Oracle, web logs, and other unstructured data using the Talend Real-Time Data Integration tool. Building integration data marts on Netezza to support analytics, Cognos dashboards, and SSAS OLAP cubes. Architecting and building out the semantic layer for BI.
  • Worked on different file types such as JSON, CSV, and unstructured data.
  • Building out Spark jobs in Talend to improve performance through in-memory processing.
  • Involved in converting Hive/HQL queries into Spark transformations using Spark RDDs and Scala (a minimal sketch follows this list); good experience in using spark-shell and Spark Streaming.
  • Exposure to data ingestion and handling clusters for real-time processing using Apache Kafka.
  • Developed generic Sqoop scripts and Talend jobs for importing and exporting data between HDFS and relational systems such as Oracle, MySQL, SAP BPC, and Netezza.
  • Exposure to creating Cassandra tables to load large sets of structured, semi-structured, and unstructured data coming from Linux, NoSQL, and a variety of portfolios.
  • Involved in executing various Oozie workflows and automating parallel Hadoop MapReduce jobs.
  • Working with vendor and client support teams to resolve critical production issues based on SLAs.
  • Created Talend mappings to populate dimension and fact tables. Hands-on in developing Talend DI jobs to transfer data from source views to the Hadoop staging and target layers.
  • Developed Talend jobs to move inbound files to vendor server locations on monthly, weekly, and daily frequencies.
  • Knowledge of ingesting data into Cassandra and consuming the ingested data from Cassandra into the Hadoop data lake. Familiar with Python, Groovy, and HDFS commands.
  • Good understanding of the Cassandra data modeling process and of building efficient data structures.
  • Exposure to using a SolrCloud implementation to provide real-time search capabilities on a repository with terabytes of data. Involved in restarting failed Hadoop jobs in the production environment.
  • Wrote MapReduce jobs to parse web logs stored in HDFS.
  • Participated in designing Data governance and Lineage using Talend Metadata Manager.
  • Leading the technical team, conducting interviews, assigning tasks, and following up.
  • Architecting and building out a semantic layer based on the lake and Netezza to support BI and KPI reporting and data analytics on the lake.
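Below is a minimal sketch of the Hive-to-Spark conversion pattern referenced in this list, assuming a Spark 2.x SparkSession with Hive support. The database, table, and column names (finance.policy_txn, policy_id, premium) are hypothetical placeholders, not the project's actual objects.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Minimal sketch: express a Hive aggregation query as Spark transformations.
// Table and column names are hypothetical; premium is assumed to be DOUBLE.
object HiveToSparkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-spark-sketch")
      .enableHiveSupport() // read the same tables registered in the Hive metastore
      .getOrCreate()

    // Original HiveQL:
    //   SELECT policy_id, SUM(premium) FROM finance.policy_txn GROUP BY policy_id
    val totals = spark.table("finance.policy_txn")
      .groupBy("policy_id")
      .agg(sum("premium").alias("total_premium"))

    // The same logic expressed directly on the underlying RDD.
    val totalsRdd = spark.table("finance.policy_txn").rdd
      .map(r => (r.getAs[String]("policy_id"), r.getAs[Double]("premium")))
      .reduceByKey(_ + _)

    totals.write.mode("overwrite").saveAsTable("finance.policy_premium_agg")
    spark.stop()
  }
}
```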

Environment: Hortonworks HDFS, Hive, HBase/Phoenix, Pig, Oozie, Cassandra, Sqoop, Apache Kafka, Linux, Talend Real-Time Data Integration, TAC, Talend Metadata Manager, Spark, Confluence, GitHub, Nexus, Netezza, Cognos 10/11, Kerberos, Oracle 12c, PL/SQL, Python, SQL Developer, Toad, Hue, Aginity Workbench, Zaloni Bedrock, Mica, SQL Server Management Studio, SQL, T-SQL, SSIS, SSAS, Data Tools, ACORD information model, etc.

Confidential, Bloomfield, CT

Data / ETL Engineer - Architect - Technical Lead - Hadoop and Teradata - EDW and Data Lake

Responsibilities:

  • Involved in importing and exporting data between the Hadoop data lake and relational systems such as Oracle, DB2, and Teradata, as well as unstructured data, using Sqoop.
  • Participated in architecting a Teradata-based IDS and enterprise EDW for Cigna.
  • Worked on creating Hive partitions, writing custom SerDes, and creating data marts.
  • Developed scripts to load data into Teradata from different source systems and built analytics dashboards for supporting insights using Tableau and Looker.
  • Enhanced Hive query performance using Tez for customer attribution datasets.
  • Involved in loading data from the UNIX file system to HDFS and also responsible for writing generic UNIX scripts. Involved in developing and testing Pig Latin scripts.
  • Managing and scheduling jobs on a Hadoop cluster using Oozie. Involved in troubleshooting errors in shell, Hive, and MapReduce. Worked on debugging and performance tuning of Hive and Pig jobs.
  • Wrote Hive generic UDFs to perform business logic operations at the record level.
  • Implemented partitioning, dynamic partitions, and bucketing in Hive for efficient data access (see the sketch after this list).
  • Developed workflows in Oozie to automate the tasks of loading data into Hive tables.
  • Involved in running queries using Impala and used BI tools to run ad-hoc queries directly on Hadoop.
  • Generated various reports using Tableau with Hadoop and Teradata as a source for data.
  • Wrote MapReduce code that takes customer-related flat files as input and parses the data to extract meaningful (domain-specific) information for further processing.
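Below is a minimal sketch of the Hive dynamic-partitioning pattern noted in this list, issued through Spark SQL against the Hive metastore. The database, table, and column names (analytics.member_claims, staging.member_claims_raw, member_id, load_month) are hypothetical examples. Bucketing would be added in native Hive DDL with CLUSTERED BY (member_id) INTO N BUCKETS; it is left out of the write path here because Spark does not populate Hive-compatible buckets on insert.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch of Hive partitioning with a dynamic-partition insert.
// All object names are hypothetical placeholders.
object HivePartitioningSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-partitioning-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Allow inserts to create partitions on the fly.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    // Partition the target table by load month.
    spark.sql("""
      CREATE TABLE IF NOT EXISTS analytics.member_claims (
        member_id STRING,
        claim_id  STRING,
        amount    DOUBLE
      )
      PARTITIONED BY (load_month STRING)
      STORED AS ORC
    """)

    // Dynamic-partition insert: the partition value comes from the last column.
    spark.sql("""
      INSERT OVERWRITE TABLE analytics.member_claims PARTITION (load_month)
      SELECT member_id, claim_id, amount, load_month
      FROM staging.member_claims_raw
    """)

    spark.stop()
  }
}
```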

Environment: Teradata 15.10, QueryGrid 2.0, Cloudera, MapReduce, HDFS, Hive, Pig, Oozie, Sqoop, Apache Kafka, Storm, Impala, Linux, Tableau, Jira, Confluence, GitHub, Jenkins, uDeploy, DataStage 9.0, Informatica PowerCenter 9.5, Oracle 11g, DB2, TPT, Looker, ESP Scheduler, Viewpoint, Erwin 8

Confidential, Bloomfield, CT

Senior Application Architect – ETL Data Engineer – Enterprise Data Warehouse

Responsibilities:

  • Designing the required infrastructure for the application to meet the committed service levels.
  • Participating in the business meetings to gather the requirements and data needs.
  • Participating in the design of conceptual and logical data model for different subject areas.
  • Designing proper security for production data access to ensure that HIPAA and IIPI rules are strictly followed and applied.
  • Participating in defining project scopes at a high level from an architecture perspective. Creating blueprints and architectural artifacts and holding walkthroughs with business and senior management.
  • Designing the ETL design and architecture for the CCDR and providing the vision to the ETL development team for reusable ETL components toward implementing a service-oriented architecture.
  • Designing data feeds for different data marts for reporting applications to work on.
  • ETL code development and support for different project UATs.
  • Worked on MLR reporting and data analysis related to HEDIS reporting.
  • Wrote shell scripts to run cron jobs that automate the data migration process from external servers and FTP sites. Developed a control, auditing, and balancing framework for batch applications.
  • Created tasks for incremental loads into staging tables and scheduled them to run in AutoSys.
  • Created mappings to load the data mart using transformations such as Aggregator, Filter, Router, Expression, Joiner, Sequence Generator, and Update Strategy in Informatica and DataStage.
  • Supporting broad, efficient global rollouts with rapid user and supplier adoption, integration of multiple legacy systems, and capture of all spend. Master data management and data governance.
  • Facilitating faster, more effective communication, compliance, and collaboration with multiple stakeholders worldwide. Participating in the development of procurement strategies and policies and the integration of systems across product lines, identifying cultural and other business issues.
  • Leading the implementation of Ariba Spend Management solutions and standards such as UNSPSC, with spend visibility at the invoice line-item (SKU) level, enabling the enterprise to gain control and improved visibility into the entire procurement lifecycle through the integration of multiple legacy and ERP source systems. The integrated source systems include SAP and packaged products such as BPCS.
  • Designing data flow diagrams and developing the conceptual, logical, and physical star data models for subject areas such as sales, planning, and procurement. Designing the overall process flow for the applications, coordinating with stakeholders, and getting business approval. Developing an XML-based standard extract format for integrating data across different source systems and for the reusability of ETL design and code.

Environment: MS SQL Server 2012 and 2008, T-SQL, SSAS, Query Analyzer, Sybase PowerDesigner 12.5, IBM Information Server, DataStage 7.5 and WebSphere DataStage 8, DataStage Administrator, DataStage Designer, Director and DataStage Manager, XML, XMLSpy, Business Objects Enterprise XI Release 2, Business Objects Planning, SAP, SAP BW, BPCS (Business Planning and Control System)

Confidential

Teradata, ETL, and Data Solution Architect

Responsibilities:

  • Assisting project teams with blueprints and holding design walkthroughs. Assisting in performance measurement and monitoring of the production Teradata environment. Defining and developing performance-tuning measurements for the very large Teradata database. Defining infrastructure and application standards and approaches for hardware, software, and methodology.
  • Designing the data integration flows, diagrams, and road maps to use XML from rapidly changing SOA web-based source systems. Analyzing different types of XML documents (free form, DTD, XML Schema, etc.) to find the entity references and relationships for use in the data design.
  • Design of the ETL processes and design of data warehouse control infrastructure
  • Providing insight into the EDW Teradata AMP and node server farms.
  • Designing re-usable ETL components to be useful across different data domains and lines of business
  • Designing high-level ETL architecture for different warehouse projects. Working with vendors on the development, testing, and production rollouts of applications.
  • Developing processes for ACORD XML transactions to conform to the industry standard.
  • Developing processes for reconciling daily trading activities on stock exchanges in the USA and UK.
  • Designing and developing the ETL framework and processes using Informatica 8.0 to run on the domain, integration services, and nodes on a highly available grid ETL infrastructure. Excellent hands-on knowledge of integration services and load balancing using PowerCenter 8.0. Developing T-SQL routines in Sybase and SQL Server.
  • Experience with data sources such as XML, PDF files, Excel, relational, and mainframe files. Performance-tuning the ETL processes to use Informatica pushdown optimization in the staging and source databases, maximizing flexibility and providing optimal performance for data- and process-intensive transformations.

Environment: Informatica PowerCenter 8.5, Designer 8.5, Workflow Manager 8.5, Repository Manager 8.5, Workflow Monitor, Sybase, MS SQL Server, T-SQL, DBArtisan, Lotus Notes, Windows, Unix, Cygwin, Perl, PowerDesigner, XML, Visual SourceSafe, .NET, etc.

The Hartford Insurance

Lead ETL / Data Engineer – Data Marts and Reporting and ODS

Responsibilities:

  • Involved in designing the ODS, star schema, and staging area.
  • Used Striva Detail (PowerConnect) to read and write data from mainframe files and COBOL sources. Did incremental updates, used constraint-based loading, etc.
  • Used session, mapping, and workflow variables; created partitions and database indexes; and fine-tuned the mappings/sessions to improve the performance of the ETL process.
  • Participating from concept through design, development, and testing of the entire ETL process.
  • Used AutoSys on the Unix platform to schedule the ETL process using pmcmd; wrote Unix shell scripts.
  • Leading the Data Warehouse – Reporting initiatives
  • Participate in the Analysis and Design of the Conceptual, Logical and Physical Star Schema.
  • Creating and administering Oracle Database.
  • Designing the staging/ODS environment for the warehouse using Oracle replication to minimize the impact of ETL on the transactional system. Configuring the Windows 2000 server for database creation. Capacity planning and designing the database physical architecture.
  • Administering the PowerCenter repository. Backup/recovery of the PM server; created and migrated repositories and folders for the development, testing, and production environments.
  • Applying patches to the database, migrating the database, and installing software upgrades.
  • Monitoring the data loads and data cube creation; 24x7 production on-call support.

Environment: Oracle 9i, Enterprise Manager, SQL*Plus, Oracle Management Server, Cognos Impromptu, PowerPlay, Informatica PowerCenter 6, Designer 6, Repository Manager, Workflow Manager, Workflow Monitor, Windows 2000, Veritas Backup Exec 8.60, Apache, Erwin, SAS
