- 7+ years of IT experience, with special emphasis on the analysis, design, development and testing of ETL methodologies across all phases of data warehousing.
- Expertise in OLTP/OLAP system study, analysis and E-R modeling, and in developing database schemas such as star and snowflake schemas for relational and dimensional modeling.
- Experience in performance tuning of mappings and in implementing complex business rules by creating reusable transformations, mapplets and tasks.
- Created projections such as query-specific projections, pre-join projections and live aggregate projections.
- 3+ years of experience in Hadoop development, with strong exposure to the Hadoop ecosystem: HDFS, MapReduce, Hive, HBase, Sqoop, HCatalog, Pig, Oozie and Impala.
- Performed all phases of development, including extraction, transformation and loading of data from various sources into data warehouses and data marts using Informatica PowerCenter (Repository Manager, Designer, Workflow Manager and Workflow Monitor).
- Queried Vertica and SQL Server for data validation and developed validation worksheets in Excel to validate Tableau dashboards.
- Extensively used SQL and PL/SQL for development of Procedures, Functions, Packages and Triggers.
- Experienced with Tableau Desktop and Tableau Server, with a good understanding of Tableau architecture.
- Extensive experience in ETL processes covering data sourcing, data transformation, mapping and loading of data from multiple source systems into the data warehouse using Informatica PowerCenter.
- Experienced in developing business reports by writing complex SQL queries using views, macros, volatile and global temporary tables.
- Around 3 years of experience developing OLAP reports and dashboards in Tableau Desktop 8.x/9.x, Tableau Server, Business Objects Desktop Intelligence, Web Intelligence, Universe Designer, Crystal Reports and Central Management Console.
- Experience in designing both time-driven and data-driven automated workflows using Oozie.
- Experienced with workflow schedulers and data architecture, including data ingestion pipeline design and data modeling.
- Coordinated with business users, the functional design team and the testing team during the different phases of project development, and resolved issues.
- Resolved performance issues in Hive and Pig scripts through an understanding of how joins, grouping and aggregation translate into MapReduce jobs.
Specialties: Data warehousing/ETL/BI Concepts, Data Architecture, Software Development methodologies, Data Modeling
Business Tools: Tableau 8.x/9.x, Business Objects XI R2, Informatica PowerCenter 8.x, OLAP/OLTP, Talend, Teradata 13.x, Teradata SQL Assistant
Big Data: Hadoop, MapReduce 1.0/2.0, Pig, Hive, HBase, Sqoop, Oozie, ZooKeeper, Kafka, Spark, Flume, Storm, Impala, Scala, Mahout, Hue, Tez, HCatalog, Cassandra
Databases: DB2, MySQL, MS SQL Server, Vertica, MongoDB, Oracle, SQL 2008
Operating System: Mac OS, Unix, Linux (Various Versions), Windows 2003/7/8/8.1/XP
Application Servers: Apache Tomcat, WebLogic, WebSphere
Tools: Eclipse, NetBeans
Data Engineer/Senior ETL Developer
- Analyzed the requirements and the existing environment to help define the right strategy for migrating from Salesforce to Facebook's internal CRM system.
- Worked on Dataswarm (a Facebook tool) with Presto to process large datasets.
- Worked on the core tables of the Revenue DataFeed (RDF), which calculates advertiser revenue for Facebook.
- Worked on data extraction strategies using Facebook’s Lineage.
- Involved in testing and migration to Presto.
- Used Spark APIs in Scala for real-time analysis and fast querying.
- Worked extensively on data migration, data cleansing, data profiling and ETL processes for data warehouses.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Experienced in writing complex SQL Queries, Stored Procedures, Triggers, Views, Cursors, Joins, Constraints, DDL, DML and User Defined Functions to implement the business logic.
- Developed custom ETL solutions, batch processing and real-time data ingestion pipelines to move data in and out of Hadoop using Python and shell scripts.
- Experience in large-scale data processing and transformation using Hive and Sqoop on Hadoop.
- Experience with Tableau for data acquisition and visualization.
- Expertise in the concepts of data warehousing, data marts, ER modeling, dimensional modeling, and fact and dimension tables.
- Monitored System health and logs and responded accordingly to any warning or failure conditions.
Environment: Amazon Web Services, AWS S3, Redshift, Apache Hadoop, Hive, Pig, Shell Script, ETL, Tableau, Agile Methodology.
Confidential, Texas
Senior ETL Developer
- Worked with variables and parameter files, and designed an ETL framework that creates parameter files to make the process dynamic.
- Currently working on the Teradata to HP Vertica data migration project, using the COPY command extensively to load data from files into Vertica, monitoring the ETL jobs and validating the data loaded into the Vertica DW.
- Responsible for data extraction and ingestion from different data sources into the Hadoop data lake by creating ETL pipelines using Pig and Hive.
- Preprocessed logs and semi-structured content stored on HDFS using Pig and imported the processed data into the Hive warehouse, enabling business analysts to write Hive queries.
- Involved in designing the ETL testing strategies for functional, integration and system testing for Data warehouse implementation.
- Wrote Hive and Pig scripts as ETL tools to perform transformations, event joins, traffic filtering and pre-aggregations before storing the data in HDFS. Developed Vertica UDFs to preprocess the data for analysis.
- Built a custom batch aggregation framework for creating reporting aggregates in Hadoop.
- Worked with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing, and writing and optimizing Hive queries. Built a real-time pipeline for streaming data using Kafka and Spark Streaming.
- Experienced with NoSQL databases such as HBase, MongoDB and Cassandra; wrote a Storm topology to consume events published by a Kafka producer and emit them into Cassandra.
- Wrote Python scripts to access databases and execute scripts and commands.
- Extensive work on custom ETL processes covering data transformation, data sourcing, mapping, conversion and loading using Python and shell scripts.
- Built and published customized interactive reports and dashboards, and scheduled reports using Tableau Server. Created new schedules and checked the tasks daily on the server.
Environment: Hadoop, Hive, Apache Spark, Apache Kafka, Apache Cassandra, HBase, MongoDB, SQL, Sqoop, Flume, Oozie, Java (JDK 1.6), Eclipse, Informatica PowerCenter 9.1, Tableau, Teradata 13.x, Teradata SQL Assistant.
Confidential, Los Angeles
- Designed the ETL process, load strategy and requirements specification after gathering requirements from the end users.
- Worked in three different profiles: Hadoop ETL reporting, mainframe job monitoring and scheduling, and user administration. Proposed and developed a backup and recovery architecture for recurring ETL (Extract, Transform and Load) reports.
- Performed data migration and data conversion using PL/SQL, SQL and Python, implemented as custom ETL tasks.
- Built a custom batch aggregation framework for creating reporting aggregates in Hadoop.
- Well versed in developing complex SQL queries, unions and multi-table joins, with experience using views.
- Experience using Sqoop to migrate data between HDFS and MySQL or Oracle; deployed Hive and HBase integration to perform OLAP operations on HBase data.
- Extensive experience working with Business Intelligence Data Visualization Tools with specialization on Tableau.
- Designed and implemented a highly scalable and reliable distributed data design using NoSQL HBase.
- Developed UDFs in Python as needed for use in Pig and Hive queries.
- Designed and published visually rich and intuitive Tableau dashboards for executive decision making. Created various views in Tableau such as tree maps, heat maps, scatter plots, geographic maps, line charts and pie charts.
Environment: Hadoop, Hive, HDFS, Pig, Apache Spark, Tableau, MySQL, Apache Mesos, HBase, Teradata SQL, Sqoop, Flume, Delimited Flat files, Oozie, DB2, Teradata 13.x, Teradata SQL Assistant, Informatica PowerCenter 9.1, UNIX Shell Scripting, Toad, Windows XP and MS Office Suite.
Confidential, San Francisco
- Responsible for the definition, development and testing of processes/programs necessary to extract data from operational databases, transform and cleanse the data, and load it into the data warehouse using Informatica PowerCenter.
- Strong knowledge of OBIEE as a Business Intelligence tool and Data Extraction using Informatica as ETL tool.
- Responsible for implementing and following the different SDLC processes as defined by the organization.
- Developed ETL processes to load data from multiple data sources into HDFS using Sqoop, and analyzed the data using MapReduce, Hive and Pig Latin.
- Reported Data Designer bugs in Adaptive Integration to the support team and created workarounds until the bugs were fixed.
- Used UNIX commands and shell scripting to interact with the server, move flat files and load the files on the server.
- Worked extensively on the Teradata database, creating objects such as macros and procedures, and tuning queries.
- Developed transformation logic as per the requirement, created mappings and loaded data into respective targets.
- Extensively used transformations such as Router, Aggregator, Normalizer, Joiner, Expression, Lookup, Update Strategy, Sequence Generator and Stored Procedure.
- Used an Informatica Command task to transfer files to a bridge server, from which they were sent to a third-party vendor.
Environment: Informatica PowerCenter, Oracle, UNIX, HTML, SQL, Hive, Pig, MapReduce, Python, Django, Sqoop, HDFS, Teradata.
Confidential
- Involved in developing, testing and implementation of the system using Struts, JSF, and Hibernate.
- Developed, modified, fixed, reviewed, tested and migrated Java, JSP, XML, Servlet, SQL, JSF, Spring and Hibernate programs.
- Created enterprise deployment strategy and designed the enterprise deployment process to deploy Web Services, J2EE programs on more than 7 different SOA/WebLogic instances across development, test and production environments.
- Designed the user interface using HTML, Swing, CSS, XML, JavaScript and JSP.
- Implemented the presentation using a combination of Java Server Pages (JSP) to render the HTML and well-defined API interface to allow access to the application services layer.
- Used Enterprise JavaBeans (EJBs) extensively in the application; developed and deployed session beans to perform user authentication.
- Involved in requirement analysis, design, code testing and debugging, and implementation activities.
- Involved in performance tuning of the database and Informatica; improved performance by identifying and rectifying bottlenecks.
- Wrote PL/SQL Packages and Stored procedures to implement business rules and validations.