Big Data Architect Resume

Chicago, IL


  • Accomplished software professional with broad technical skills and a passion for resolving complex problems with ease.
  • Adept at maintaining focus on achieving the right end results while formulating and implementing advanced technology and business solutions to meet a diversity of business needs.
  • Performed individual and leadership roles involving enterprise software and consulting services across many industry verticals. Technical areas of focus include Big Data, business intelligence/data warehousing, data integration and master data management.
  • 11+ years of experience in Big Data, Cloud, Data Warehousing (Dimensional Modeling, Database design and ETL) and Business Intelligence (Report, KPI, Scorecards, Dashboards)
  • Subject matter expert on Big Data ventures for analytics and on replacing or extending relational database systems with Big Data solutions for data warehouse projects. SME on designing hybrid data models to replace conventional DWH systems and migrate them to the Hadoop platform.
  • Confidential Big Data certified developer with 5+ years of experience in end-to-end Big Data implementation and strong experience with major components of the Hadoop ecosystem, including Hadoop MapReduce, HDFS, Hive, Pig, Hbase, Zookeeper, Sqoop, Oozie, Flume, Apache Spark, BigSQL, MongoDB and BigSheets.
  • SME in implementing end-to-end Big Data solutions for migrating and supplementing existing enterprise data warehouses on Hadoop.
  • Experienced in building data lakes and data hubs on HDFS from the ground up, covering industry-wide best practices in data ingestion, queryable archival and near-real-time analytics.
  • Expertise in building data warehouses and data marts for analytics on structured data using Hive queries, operations, joins, query tuning, SerDes and UDFs.
  • Adept in developing end-to-end ETL on Hadoop using MapReduce, Spark and Pig.
  • Expertise in implementing a semantic layer for data presentation using Hbase and visualizing data for analytics using Cloudera workbench, Confidential Infosphere BigSheets and Zeppelin.
  • Adept in creating a secure information access layer on top of Big Data technologies using OAuth2 and microservice architecture.
  • Databricks certified Spark developer adept in developing ETL using RDDs, DataFrames and Datasets with structured as well as unstructured data.
  • Expertise includes creating a semantic layer on NoSQL databases like Hbase and MongoDB and enabling a data visualization layer using Zeppelin and BigSheets.
  • Technical consultant for analytics and machine learning on large datasets. Designed machine learning models for predictive analysis for the insurance sector using a tech stack of Hadoop, Apache Spark (MLlib) and Zeppelin.
  • Technical SME on building from the ground up on PaaS Confidential Bluemix, illustrating how data integration techniques can meet specific business requirements while reducing cost and time to market. Part of the technology architecture team deciding on technology adoption on the Hadoop platform. Conducted extensive use case analysis of reporting tools on Hadoop (Zeppelin, notebooks, Tableau) as well as Big Data ETL tools for data integration (Confidential Infosphere Big Integrate, Informatica BDM, Talend Data studio).
  • Data Architecture (Conceptual, Logical, Physical) experience involves building Normal form & Star Schema {Dimension (Slowly Changing, Snowflake, Conformed Dimensions), Fact, Metrics, and KPI} using Erwin.
  • Architected ETL framework for credit data warehouse system to increase the reusability and maintain modularity of ETL components in data integration.
  • Architected virtual data environments in development and test environment to simulate complex source systems.
  • ETL architecture experience involves defining layers (landing/staging, recovery point layers), the complete ETL schedule and flow to fit SLAs, mapping design, optimization, error handling and workflow design on Confidential Information service (Datastage 5.x.x) and Informatica Power Center 7x, 8x, 9x. Hands-on mainly with Informatica and Datastage. Worked on relational databases like Oracle Exadata, Oracle 9i/10g/11g/12c, DB2 9.5, Netezza and Teradata, NoSQL databases like MongoDB and Cassandra, and Hadoop ecosystem storage like Hive and Hbase.
  • Extensive hands-on experience with web and application development technologies like PHP, Node.js and Scala.
  • Led projects in an agile environment to design and develop quick prototypes for end-to-end solutions.
  • Coached technical teams on paradigms like Test driven development and Acceptance test driven development.
  • Designed and developed automation suite for characteristic testing and end to end testing for Data Integrator environment.
  • Strong expertise as solution architect designing and delivering solution deployment architecture, Infrastructure sizing, performance and scalability options, benchmarking & tuning (application, database and hardware) for test, development and production environment
  • Worked with high-performance teams developing and implementing state-of-the-art data warehouse applications. Able to understand (or get the customer to articulate) business needs in terms of business value.
  • Managed and technically mentored a team of 10+ to deliver solutions in an agile environment, both collaborate and company.
  • Technology enthusiast who spends time on blogs, social development, meetups and volunteer technology teaching. Some of this work can be found on GitHub.


Platforms: Hadoop Ecosystem (Infosphere BigInsights/ Cloudera - CDH 5.10/5.9) - Apache Spark, Map Reduce, Hbase, Pig, Hive, Cassandra, Zeppelin

Services: Micro service architecture, reactive micro services using Akka, Scala

Data Warehousing: Dimensional Modeling - Star, Snowflake, Fact Table, Dimension Table, Slowly Changing Dimension, Metrics, KPIs, OLAP

Domain Expertise: Banking and Insurance (campaign management, credit, customer, master data management, billing and customer service), Telecommunication

Languages: Scala, Python, Node.js, PHP, Ruby

ETL Tools: Infosphere Datastage, Informatica Power Center 7x, 8x, 9x

Test Automation: Ruby - Cucumber

Data Modeling Tool: Erwin

Source Control Tool: Subversion, Visual SourceSafe

Databases: Oracle 9.x/10g/11g/12c, Exadata, MySQL, Teradata 13, Netezza, DB2 9.5, MongoDB, Cassandra

Project Management: MS Office Suite, MS Project, MS Visio 2003

Reporting and Analytical Tool: Cognos 8, Business Objects XI, Business Objects Web Intelligence SDK, Informatica Power Analyzer 3.5.2

Master Data Management Products: Confidential Infosphere MDM


Confidential, Chicago, IL

Big Data Architect


  • Served as solution architect for the entire operations platform, built on polyglot storage including Hadoop, Teradata and MongoDB.
  • Designed an operations-wide data lake on Hadoop across domains and enabled analytics on various domains.
  • Architected a secure service layer enabling data as a service for downstream applications, both internal and external.
  • Designed and developed an integration framework that enables seamless integration of disparate source systems into the data lake.
  • Architected the data provisioning layer into 3 zones (Bronze, Silver and Gold) based on user access patterns and expertise.
  • Led the data science initiative for demand forecasting.

Key Technologies: Cloudera Hadoop Distribution 5.10, AWS, Reactive web services, Micro Services Architecture, Spark 1.6, Hbase, Solr, Akka

Confidential, Columbus, OH

Big Data Architect/Technology Consultant


  • Designed a data lake on Hadoop, considering data source availability and data volume.
  • Designed a data ingestion layer as a data-as-a-service layer that enables seamless integration of data sources into the data lake.
  • Designed a data archival system that provides a quick interface for query and reporting tools.
  • Designed data hub based on subject area on Hive and Hbase.
  • Designed and developed an analytics exploratory zone for a Big Data project (Google Analytics). The project gives insight into product trends obtained from Google Analytics on Nationwide websites.
  • This was a one-shot project that read data from Google BigQuery and unloaded it onto on-premise HDFS.
  • Architected and designed the data ingestion layer for acquiring data from Google using Big Table APIs.
  • Designed a data hub, which serves as an exploratory zone.
  • Created machine learning patterns and models using Apache Spark MLlib for analyzing customer behavior on the Nationwide product portal.
  • Designed and developed the SmartRide project on Hadoop. The project goal is to develop driver scoring models based on driver data obtained from OBDII devices in vehicles. The program is partnered with third-party vendors who provide data to Confidential (smartphone applications, OBDII - Lexis Nexis, Orion (General Motors)). The driver score is eventually used to determine the premium paid, driver discounts, etc.
  • Led the team developing Big Data components on Hive, Pig and Spark to enable data warehouse capability in Big Data.
  • Designed and developed a machine learning algorithm on Google Analytics data to determine product interests, returning customers, new customer conversion rate, etc.
  • Architected and designed queryable archive and resource management extension components for DW projects in Big Data (Hive storage).
  • Developed a test framework for an automated test suite, enabling test-driven development on HDFS and Hive.
  • Member of the technology architecture team deciding on tools and technology to be used on the Hadoop platform. Conducted extensive use case studies on Big Data ETL tools (Confidential BigIntegrate, Talend Data studio, Informatica BDM) as well as reporting/visualization tools like WebFocus, Zeppelin, Notebook and Tableau.

Key Technologies: Infosphere BigInsights - Apache Spark, Hive, HDFS, Node.js, MongoDB, Scala, Java

Confidential, Columbus, OH

Technology Consultant


  • Designed and led development of a new data integration system for commission calculation for Nationwide Exclusive and Independent Agents.
  • Led the team in adopting true acceptance test-driven development.
  • Designed an integration framework to group ETL (Extract, Transform, Load) objects with similar patterns and increase code reusability.
  • Architected and designed an end prototype during the design phase; together with the automated test suite, this made test-driven development possible in short iterations (DDIT cycle).
  • Designed dashboards and sample reports ahead of the DDIT cycle to get consensus from the business on the end goal.
  • Modeled the data integration system on Oracle 11g/12c, following Inmon’s approach.

Key Technologies: Infosphere BigInsights, Informatica 9.1, Perl, Business Objects XI, Erwin, Oracle 12c, Ruby, Ruby-Cucumber (framework), MapReduce, Java

Confidential, Columbus, OH

System Test Lead & Test Data Architect


  • Worked closely with requirements analysts and held workshops with business stakeholders to understand the overall functional requirements.
  • Assessed the system impact and strategized the master test plan for all systems.
  • Designed the logical and physical data models for the TDW implementation.
  • Worked on a PoC for moving the customer service billing DB2 model to a NoSQL database (MongoDB).
  • Worked on a PoC for data virtualization using Green Hat.
  • Designed a logical and presentation layer for easily feeding and retrieving test data from the billing system.

Key Technologies: Confidential z/OS mainframe systems, SoapUI, Node.js, Express.js, MongoDB, Ruby-Cucumber

Confidential, Dublin, OH

Tech Lead


  • System analysis and development of the mortgage system into the integrated credit data warehouse, which follows the Confidential banking data warehouse model.
  • Responsible for leading a project team in delivering solution to our customer in the finance sector.
  • Responsible for analyzing the current client framework for ETL and feasibility study of integrating the new legacy systems to the Big Data platform.
  • Deliver new and complex high quality solutions to clients in response to varying business requirements
  • Responsible for effective communication between the project team and the customer. Provide day to day direction to the project team and regular project status to the customer.
  • Translate customer requirements into formal requirements and design documents, establish specific solutions, and lead the efforts, including programming and testing, that culminate in client acceptance of the results.
  • Developed file extraction and archival system on Hadoop (HDFS) for data warehouse.

Key Technologies: Cloudera CDH, Confidential BigInsights, Oracle 11g, Teradata 13, Confidential Banking data warehouse, Unix shell script, Hadoop, Apache Spark, Apache Kafka, MapReduce


Tech Lead


  • Feasibility analysis and estimation for moving the campaign data mart from RDBMS to the Hadoop platform, integrating the data federation layer with a NoSQL database (Hbase).
  • Led the conversion of ETL built on Informatica and Netezza to MapReduce and Hbase.
  • Estimation, analysis and development. Involved in analysis and design of the migration project from Oracle to Netezza in the same account. Participated in client meetings for requirement briefings for the AK BANK maintenance project and migration project.
  • Work with Business Analyst in translating business requirements into Functional Requirements Document and to Detailed Design.
  • Lead analysis sessions, gather requirements and write specification and functional design documents for enhancements and customization; Analyze product impact
  • Coordinate and communicate tasks with developers
  • Ensure that development is performed as per requirements
  • Communicate activities/progress to project managers, business development, business analysts and clients
  • Designed and developed PowerExchange extraction from VSAM files.
  • As a value add, developed a dashboard for monitoring ETL load progress, which gives the on-duty person a single-window view of the load stats and sends out alerts as jobs succeed or fail. The dashboard was developed in PHP 5.

Key Technologies: Cloudera CDH, Sqoop, MapReduce, Informatica 8.6, Oracle 10g, Exadata, Netezza, PHP 5, Unix shell scripts


Tech Lead


  • Analysis, HLD creation and coordination of development activity. The CPW DE data warehouse was not part of the integrated, centralized DWH in the UK, which caused the business trouble with analysis, commission calculation, reporting, etc. The project was planned in phases, and I was responsible for integrating service provider data from Vodafone and T-Mobile, which required extensive analysis of the DE DWH and UK DWH. The HLD was delivered and, after business approval, development proceeded.
  • System analyst and single point of contact for the migration of the client's German-based system to the central UK system. The project was executed in phases. The first phase covered technical migration, i.e. database migration from Oracle to Netezza. Second phase details in Project 02.

Key Technologies: Datastage 5.x.x, Netezza, Unix shell scripts