- Solutions-oriented Architect with notable success in directing a broad range of IT initiatives while participating in team leadership, analysis, and implementation of data warehouse solutions in direct support of business objectives.
- Expertise in Big Data, Hadoop, Teradata, and Informatica; architectural design and development of BI & DWH solutions; implementation of end-to-end BI & DW projects; demonstrated competency in BI & DWH products; conceptualization and creation of flexible, expandable DWH architectures to meet business requirements.
- Outstanding leadership abilities; able to coordinate and direct all phases of project-based efforts while managing, motivating, and leading project teams.
- Hands-on experience leading all stages of system development efforts, including requirement definition, design, architecture, testing, and support.
- Windows 2000/NT
- Oracle 11i/9i/8i/10g
- PowerCenter 6i/7i/8i/9i and PowerExchange (PWX), IDQ (data quality), CDC, QlikView, SSRS & SSAS (Reporting), Big Data, Hadoop, Talend
- SAP BO, Cognos, MicroStrategy, MLOAD, FASTLOAD, TPUMP, BTEQ, Xcelsius, HDFS, MapReduce, YARN, Pig, Hive, HBase, Oozie, ZooKeeper, PySpark
- ERWIN, SYBASE, Jenkins, GitHub, Splunk, Confluence, Oozie, Sqoop
- Gathered requirements from business users and end users; performed data analysis, data profiling, and source system analysis.
- Prepared documentation, including Functional Specifications, Technical (ETL) Specifications, and Reporting Layer Functional and Technical Specifications, and presented it to the BI Team (ETL/BI architects) for review.
- Performed logical and physical data modeling; designed and modeled the EDW and data marts.
- Served as Solution/Data Architect and ETL/BI Architect: defined ETL standards and processes for batch processing, bulk loading, initial data capture, and change data capture; defined the implementation process and identified opportunities for reuse.
- Held day-to-day operational responsibility for IT, reporting to the Chief Information Officer.
- Interacted with the project team to negotiate timelines, responsibilities, and deliverables, and ensured on-time delivery to the end client.
- Responsible for installing and configuring Hive, Pig, HBase, and Sqoop on the Hadoop cluster; created Hive tables to store the processed results in tabular format.
- Configured Spark Streaming to receive real-time data from Apache Kafka and store the stream data in HDFS using Scala. Developed Sqoop scripts to move data between Hive and the Vertica database.
- Developed solutions to load data into HDFS and analyzed it using MapReduce, Pig, and Hive to produce summary results for downstream systems.
- Built servers on AWS: imported volumes, launched EC2 instances, and set up security groups, auto-scaling, load balancers, Route 53, SES, and SNS in the defined virtual private cloud (VPC).
- Wrote MapReduce code to process and parse data from various sources and stored the parsed data in HBase and Hive using HBase-Hive integration. Streamed AWS log groups to a Lambda function to create ServiceNow incidents.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data and analyzed them by running Hive queries and Pig scripts.
- Created managed and external tables in Hive and loaded data from HDFS. Developed Spark code using Scala and Spark SQL for faster processing and testing, and ran complex HiveQL queries on Hive tables. Scheduled several time-based Oozie workflows by developing Python scripts.
- Developed Pig Latin scripts using operators such as LOAD, STORE, DUMP, FILTER, DISTINCT, FOREACH, GENERATE, GROUP, COGROUP, ORDER, LIMIT, UNION, and SPLIT to extract data from data files and load it into HDFS.
- Exported data to RDBMS servers using Sqoop and processed it for ETL operations. Used S3 buckets on AWS to store CloudFormation templates and created EC2 instances on AWS.
- Designed the ETL data pipeline flow to ingest data from RDBMS sources into Hadoop using shell scripts, Sqoop, packages, and MySQL.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, Pig, and Sqoop.
- Implemented Hadoop on AWS EC2 using a few instances to gather and analyze log files.
- Worked with Spark and Spark Streaming, creating RDDs and applying transformations and actions. Created partitioned tables and loaded data using both static and dynamic partitioning.
- Developed custom Apache Spark programs in Scala to analyze and transform unstructured data. Imported data from various sources, performed transformations using Hive and MapReduce, loaded the data into HDFS, and extracted data from Oracle into HDFS using Sqoop. Used Kafka for publish-subscribe messaging as a distributed commit log, gaining experience with its speed, scalability, and durability.
- Followed a Test-Driven Development (TDD) process; extensive experience with Agile and Scrum methodologies.
- Implemented a POC to migrate MapReduce jobs to Spark RDD transformations using Scala. Scheduled MapReduce jobs in the production environment using the Oozie scheduler.
- Involved in cluster maintenance, cluster monitoring, and troubleshooting; managed and reviewed data backups and log files.
- Designed and implemented MapReduce jobs to support distributed processing using Java, Hive, and Apache Pig. Analyzed the Hadoop cluster and different Big Data analytic tools, including Pig, Hive, HBase, and Sqoop. Improved performance by tuning Hive and MapReduce. Researched, evaluated, and used modern technologies, tools, and frameworks around the Hadoop ecosystem.
Sr. Architect, Lead & Developer
- Interacted with client and business users to gather requirements; analyzed and documented business processes and flows; performed data analysis, data profiling, and source system analysis.
- Imported data from various data sources, performed transformations using Hive and MapReduce, and loaded the data into HDFS.
- Involved in architectural design; responsible for creating the data models.
- Responsible for gathering all required information and requirements for the project.
- Created automated Python scripts that reduced the testing team's manual effort.
- Used Spark for data processing. Handled platform-related Hadoop production support tasks by analyzing job logs.
- Developed UDFs to implement complex transformations on Hadoop.
- Built a Spark POC in PySpark and presented it to the client. Implemented a history-based application using PySpark for better performance.
- Worked on a large-scale Hadoop YARN cluster (600+ nodes) for distributed data storage, processing, and analysis.
- Architected, designed, developed, tested, deployed, and supported Big Data applications on a Hadoop cluster with MapReduce (Java), Sqoop, HBase, Pig, Hive, Oozie, PySpark, and Spark SQL.
- Worked with the Fuzion client manager on daily status updates for deliverables.
Environment: Hadoop, Hive, Python, PySpark, Linux, Sqoop, Oozie, Kafka, UML, Teradata, MySQL, Git, Jenkins
Lead & Architect (BI DQ PROJECT)
- Collaborate with users to gather requirements; analyze and document business processes and flows; perform data analysis, data profiling, and source system analysis.
- Work with the Business Analyst to finalize the complete ETL design and solution.
- Manage the capacity and allocation of resources on DQ tickets. Provide estimates/work plan for all DQ development.
- As Informatica ETL Architect, define ETL standards and processes for batch processing, bulk loading, initial data capture, and change data capture; define the implementation process and identify opportunities for reuse.
- Work with data modelers on the design of logical and physical data models.
- Work with DBAs to review and complete DDL creation. Review the DQ ETL architecture. Coordinate with the offshore team.
- Tune performance and recommend tuning strategies, including indexing and defining the required statistics on tables (Collect Stats). Write PL/SQL blocks. Coordinate work schedules, assignments, and communications (including status and issue reporting) with the customer-designated Project Manager.
- Build Informatica ETL to load Slowly Changing Dimension Type 2 data into the enterprise warehouse and data mart.
- Develop complex Informatica ETL mappings, sessions, and workflows.
- Generate Cognos reports, perform unit testing, and promote them to different environments.
- Project implementation and deliverables. QA and Production deployment & Support.
- Support the QA team during their testing phase and support UAT.
Environment: Oracle 11i, Unix, SQL Server 2008, Informatica 9.1.0, ERWIN, COGNOS, DB2, Mainframe, TOAD