- Over 11 years of experience in Application Design, Development and Implementation, which includes experience in Hadoop Ecosystem, Big Data and Enterprise Resource Planning.
- Around 4 years of experience in Big Data applications using the Hadoop stack: MapReduce, Hive, Spark, Oozie, Sqoop, Flume, HBase and NoSQL databases.
- Expertise in working with Hive: creating tables, distributing data through Partitioning and Bucketing, and developing, tuning and optimizing HQL queries.
- Architected and designed data integration processes to be reusable and to run within resource constraints.
- Designed the real-time analytics and ingestion platform using Big Data technologies.
- Hands-on experience with multiple NoSQL databases including DynamoDB and Cassandra.
- Experienced in Spark SQL and Spark DataFrames using Scala.
- Created Proof of Concepts from scratch illustrating how these data integration techniques can meet specific business requirements reducing cost and time to market.
- Hands-on experience with "Productionalizing" Hadoop applications (such as administration, configuration management, debugging, and performance tuning).
- Hands-on experience with HDFS, Spark (Python & Scala), Presto, Hive, Oozie & Sqoop.
- Developed In-house Frameworks based on Metadata to handle complex data integration processes.
- Led and mentored a team of engineers, including training for prod-ops, execution, and presentations to product and executive teams.
- Developed MapReduce (Python) programs to parse the raw data, populate staging tables and store the refined data in partitioned tables.
- Developed In-House Scheduling methodologies (used UNIX, Metadata Tables on MySQL) to replace Third-Party tools.
- Practical understanding of data modeling concepts like Star-Schema Modeling, Snowflake Schema Modeling, Fact and Dimension tables & 3NF.
- Excellent knowledge of RDBMS architecture and concepts, and extensive experience with logical/physical database designs/models.
- Worked in an agile development environment, handling multiple concurrent tasks and prioritizing effectively.
- Managed an on-shore/off-shore delivery model to execute projects in a cost-effective manner.
- Commendable knowledge of Spark architecture including Spark Core and Spark SQL DataFrames.
- In depth understanding of Hadoop Architecture including YARN and various components such as HDFS, Resource Manager, Node Manager, Name Node, Data Node and MR v1 & v2 concepts.
- Worked extensively on CDH and HDP distributions.
- Worked in Hadoop & Spark system engineering teams to define various design & implementation standards.
- Experienced in writing Spark programs/application in Scala using Spark APIs for Data Extraction, Transformation and Aggregation.
Apache Hadoop/HDP/Cloudera, Oracle, Hive 0.9/0.10/0.11/0.14/1.2.1, Spark 1.2.1/1.6.X/2.0.1
MySQL, Python 2.x, Oozie, AWS (Stream, S3, FH, Lambda & EMR)
Linux, SQL/HQL/CQL, Unix Scripting (sh, bash), MapReduce (MRv1, v2)
Presto, YARN, CDH and HDP Distributions, ORC, Parquet and Avro (Snappy/Zlib)
Confidential, Hartford, CT
Sr. Data Engineer
- Analyzed requirements and implemented functions and models according to the design.
- Developed Spark code using Python and Spark-SQL for large data sets.
- Involved in creating External and Managed tables in Hive.
- Generated a report based on the Outbound file, using the parameters set by Business.
- Involved in converting Hive queries into various Spark Actions and Transformations by creating RDDs from the required files in HDFS.
- Created internal and external Hive tables and defined static and dynamic partitions for optimized performance.
- Developed Spark code for calling APIs for both Inbound and Outbound data sets.
- Developed the data format file required by the Model to perform API calls for both Outbound and Inbound data, using Spark SQL and Hive query language.
- Provided the JSON-format file to the API call to validate the received Inbound file.
- Monitored the Spark jobs through the Spark URL and was involved in unit testing and debugging after development.
- Created managed tables in Hive for analytics data that is updated regularly.
Confidential, Denver, CO
Sr. BigData Engineer
- Developed Spark code using Python and Spark-SQL for large data sets through batch processing.
- Involved in creating Hive tables, loading structured data and writing Hive queries.
- Developed OOZIE workflows for automating Sqoop, Spark and Hive scripts.
- Developed a process to read files and perform ETL through Spark RDDs and DataFrames. Wrote UDFs on Spark DataFrames.
- Used Sqoop to perform data transfers across applications involving HDFS and RDBMS.
- Involved in loading and transforming large sets of Structured, Semi-Structured data and analyzed them by running Hive queries.
- Involved in designing and developing HBase tables and storing aggregated data from Hive table.
- Implemented Spark RDD transformations to map business analysis and applied actions on top of the transformations.
- Involved in converting Hive/HQL queries into Spark transformations using Spark RDD, Scala and Python.
- Expertise in processing large sets of structured and semi-structured data in Spark & Hadoop and storing them in HDFS.
- Developed a generic Sqoop import utility to load data from various RDBMS sources.
- Loaded and transformed large data sets of structured, semi-structured and unstructured data using Hadoop/Big Data concepts.
- Converted SQL queries into Spark transformations using Spark RDDs, DataFrames and Scala, and performed broadcast joins on RDDs/DataFrames.
- Developed Spark code using Scala and Spark-SQL for faster testing and processing of data.
- Developed Spark RDD transformations, actions, DataFrames and case classes for the required input data and performed the data transformations using Spark-Core.
- Also used Scala to perform transformations and apply business logic.
- Developed Hive queries in Spark-SQL for analyzing and processing the data.
- Worked with Spark Context, Spark-SQL, DataFrames, Pair RDDs and Spark Streaming.
Confidential, Austin, TX
Data Warehouse Engineer
- Implemented partitioning, dynamic partition, indexing and buckets in Hive.
- Involved in review of functional and non-functional requirements.
- Worked on Partitioning and Bucketing concepts in Hive and designed both Managed and External tables in Hive for optimized performance.
- Worked with RC/ORC & Avro Hive tables with Snappy compression.
- Developed shell scripts for running Hive scripts in Hive and Impala.
- Created Oozie workflows to run Hive, Unix shell scripts, MapReduce and Python programs.
- Transformed and aggregated data for analysis by implementing work flow management of Sqoop and Hive scripts.
- Implemented map-side or reduce-side joins to process large data sets.
- Involved in importing data from Oracle tables to HDFS and Hbase tables using Sqoop.
- Developed job flows using Oozie for scheduling and managing Apache Hadoop jobs.
- Designed and implemented Hive queries and functions for evaluation, filtering, loading and storing of data.
- Involved in loading data from UNIX file system to HDFS.
- Responsible for writing Hive queries for data analysis to meet the business requirements.
- Used Hive to create ad-hoc queries and data metrics for Business Users.
- Experienced in managing and reviewing Hadoop log files.
- Responsible for creating Hive tables and working on them using HiveQL.
- Created Hive tables to store the processed results in a tabular format.
- Developed Hive queries and UDFs to analyze/transform the data in HDFS.
- Designed and implemented Partitioning (Static, Dynamic) and Buckets in Hive.
- Modeled Hive partitions extensively for data separation and faster data processing, and followed Hive best practices for tuning.
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop.
- Implemented merge process to supplement update process.
- Data Extraction Framework: Developed a metadata-driven data extraction framework; the metadata holds details about the hosts and tables to extract. A wrapper process reads the metadata and extracts data as files onto SAN, and a further process then copies the data into HDFS. The framework supports most databases (MySQL, Oracle, etc.); fetching data from a different source only requires configuring the metadata tables.
- Data Acquisition API: Created/adopted an open-standard API (any format: JSON, XML, CSV, etc.) to stream data sets from application servers and web-based applications.
- Data Translation Framework: Developed a framework to map HDFS files to Hive tables with predefined Apache SerDes (regex, JSON, CSV, etc.). Once files are in HDFS, only the path and table details need to be provided; the wrapper process then reads the configuration file and alters partitions at date level by checking done files for each date.
- Introduced a merge process to overcome Hive's lack of update support.
- Extensively used all sorts of joins (map-side and reduce-side joins) to process large data sets efficiently.
- Created UDFs to support in-house requirements that are not part of Hive, such as Rank & OLAP functions.
- Involved in analysis towards customer engagement, monetization, acquisition and title level comparisons.
- Developed a framework to identify positive and negative responses to games and in-game features from forum data, based on keywords (regular expressions).
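
The date-level partition step described in the Data Translation Framework bullet above might look roughly like the following. This is a minimal sketch, not the original framework: the function name `partition_statements`, the partition column `dt`, the `dt=YYYY-MM-DD` directory layout, and passing the dates of seen done files as a set are all assumptions made for illustration.

```python
from datetime import date, timedelta


def partition_statements(table, base_path, start, end, done_dates):
    """Build one Hive ALTER TABLE statement per date whose done file was seen.

    `done_dates` is a set of dates for which a done file exists; how done
    files are detected in HDFS is left out of this sketch (assumption).
    """
    statements = []
    day = start
    while day <= end:
        if day in done_dates:
            ds = day.isoformat()
            # Hypothetical layout: one directory per date under base_path.
            statements.append(
                f"ALTER TABLE {table} ADD IF NOT EXISTS "
                f"PARTITION (dt='{ds}') LOCATION '{base_path}/dt={ds}'"
            )
        day += timedelta(days=1)
    return statements


if __name__ == "__main__":
    done = {date(2015, 1, 1), date(2015, 1, 3)}
    for stmt in partition_statements(
        "events", "/data/events", date(2015, 1, 1), date(2015, 1, 3), done
    ):
        print(stmt)
```

Dates without a done file are skipped, so a partition is only registered once its data is complete; the generated statements would then be handed to Hive by the wrapper process.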
Data Integration Developer
- Worked in Production Support Environment as well as QA/TEST environments for projects, work orders, maintenance requests, bug fixes, enhancements, data changes, etc.
- Performed script optimizations in PL/SQL.
- Developed complex database objects like Stored Procedures, Functions, Packages and Triggers using SQL and PL/SQL.
- Used exception handling extensively to ease debugging and to log error messages in the application. Worked on SQL*Loader to load data from flat files obtained from various facilities every day.
- Created and modified several UNIX shell scripts according to the changing needs of the project and client requirements.
- Developed the Data mart for the base data in Star Schema, Snowflake Schema and Multi Star Schema and was involved in developing the Data warehouse for the database.
- Generated server-side PL/SQL scripts for data manipulation and validation, and materialized views for remote instances.
- Created PL/SQL scripts to extract the data from the operational database into simple flat text files.
Oracle Techno Functional Consultant
- Designing, development and implementation of Oracle Applications 11i
- BI Publisher reports for customer invoices and agreements and various templates as per business needs
- Worked on Check Printing Report as per the Business requirement for the Pre-printe
- Created functional (MD50) and technical (MD70) design documents and the installation steps document (MD120) using the Oracle AIM standard
- Customization of Defaulting Rules in Order Management
- Worked on Forms Personalization, Reports, Workflow, Complex Interfaces and Oracle XML Publisher Reports
- Responsible for custom program maintenance during the production support.
- Developed Interface program that validates and loads data into OM interface tables
- Involved in Registration/Migration of concurrent programs, executables, Creation of Responsibilities, request Groups, Menus, value sets, Request Sets, Forms, Functions in System Administration
- Developed stored procedures, functions and packages using PL/SQL to augment the business
- Customization of Standard Reports as per the business need
- Developed Interface Program to load item details into staging tables, validate them and insert them into Interface tables.
- Developed BOM Bills Conversion Interface
- Integrated Order Management with Purchasing in Drop Ship flow, Internal Sales Order (ISO), Back to Back Order (B2B), and Internal Requisition and Internal Sales Order