
Hadoop Developer And Production Support Analyst Resume


SUMMARY

  • 14+ years of IT experience with a strong background in analysis, design, development, customization, and testing of backend systems.
  • Analysis, design, and development on Hadoop infrastructure using Hive, Sqoop, Oozie, Kafka, and Spark with Python and Scala.
  • Experience working with the Cloudera distribution on multi-node clusters running Spark on YARN.
  • Hands-on experience using Hive and Spark to extract, transform, and load (ETL) data into reportable formats.
  • Experience importing and exporting gigabytes of data between HDFS and relational database systems (Oracle, Teradata, and MySQL) using Sqoop.
  • Knowledge of NoSQL databases such as HBase.
  • Experience using file formats such as Avro, Parquet, and ORC in Hive and Spark.
  • Clear understanding of Hadoop architecture and its components, including ResourceManager, NodeManager, NameNode, DataNode, and HDFS.
  • Used Tableau connected to Hive to generate daily reports.
  • Programming experience consuming data from Kafka using PySpark and loading it into HDFS and other downstream applications (see the sketch after this list).
  • Hands-on experience designing ETL with Oracle PL/SQL and Informatica PowerCenter 9.1. Experience with data warehousing concepts such as dimensional modeling (SCD Type 1, SCD Type 2), star and snowflake schemas, and denormalization.
  • Proficient in data analysis, physical and logical data modeling (including ER diagrams and data flow diagrams), and database design.
  • Experience on production support projects, including ticket investigation and resolution and monitoring of regular data-loading jobs.
  • Proficient in performance tuning using execution plans, hints, bulk binding, pipelined functions, partitioning, indexing, etc.
  • Experience in UNIX shell programming using Bash and ksh.
  • In addition to technical skills, good exposure to team leadership, project planning, estimation, and project methodologies such as Agile and Waterfall.
  • Experience with Core Java.
  • Hands-on experience with Tableau (BI reporting) Desktop 8.2, Tableau Reader, and Tableau Server.
  • PySpark and Scala programming experience.
  • Good understanding of Amazon AWS.
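A minimal sketch of the Kafka-to-HDFS ingestion pattern described above, using PySpark Structured Streaming. The broker address, topic name, and HDFS paths are placeholders rather than values from any of the projects below, and the job assumes the spark-sql-kafka connector is on the classpath.

```python
# Minimal sketch: consume a Kafka topic with PySpark Structured Streaming and land
# the messages in HDFS as Parquet. Broker, topic, and paths are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka_to_hdfs_ingest").getOrCreate()

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder broker
       .option("subscribe", "events_topic")                 # placeholder topic
       .option("startingOffsets", "latest")
       .load())

# Kafka delivers key/value as binary; cast to strings before persisting.
events = raw.select(col("key").cast("string").alias("key"),
                    col("value").cast("string").alias("value"),
                    "timestamp")

query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/raw/events")                # placeholder raw-zone path
         .option("checkpointLocation", "hdfs:///checkpoints/events")
         .outputMode("append")
         .start())

query.awaitTermination()
```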

TECHNICAL SKILLS

  • Domains: HR, Finance (credit card applications and loans), Manufacturing
  • Operating Systems: UNIX, Windows
  • Databases: Oracle 12c, Oracle 11g, Oracle 9i, Oracle 8i, MySQL, SQL Server 2008
  • Languages: PL/SQL, T-SQL, UNIX shell scripting, Core Java, Python, Scala
  • Tools: Toad, Cloudera, Hadoop YARN, Spark, Hive, Sqoop, Eclipse, SQL Developer, SQL*Loader, Informatica PowerCenter 9.1/8.x, Oracle Forms 6i, Oracle Reports 6i, HP Quality Center, PVCS, SVN, AutoSys, Erwin

PROFESSIONAL EXPERIENCE

Confidential

Hadoop Developer and Production Support Analyst

Responsibilities:

  • A very large enterprise data warehousing system that receives feeds from various applications. PFDS maintains records for neglected or victimized children. DHS workers investigate each report and determine whether the child is eligible for services. Many kinds of services (in-home services, foster care, adoption, permanent legal custody, etc.) are provided by DHS or through preferred agencies. DHS also connects with the state via CWIS (Child Welfare Information System), through which it sends investigation outcomes. A separate data warehouse is also maintained for various student-record metrics used for data analytics.
  • The reporting process extracts data from different data warehouse systems and Sqoops it into HDFS. From HDFS, scrambling, transformation, and enrichment logic built on PySpark is applied, and the transformed datasets are shared with business intelligence tools.
  • Responsible for building a framework used to Sqoop data and store it in HDFS as the raw zone, from which the data is then transformed for downstream consumption (a wrapper sketch follows this list).
  • Created Sqoop jobs for importing/exporting data to/from RDBMS systems into the Hadoop ecosystem.
  • Developed shell scripts for loading data via Sqoop import.
  • Created Hive tables and loaded and analyzed data using Hive queries in Hadoop.
  • Developed Hive queries to process the data and generate data cubes for visualization.
  • Implemented schema extraction for Parquet and Avro file formats in Hive.
  • Performed performance tuning of Hive queries in the Hadoop ecosystem.
  • Created Spark jobs using PySpark.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Developed Python scripts and UDFs using both DataFrames/Spark SQL and RDD/MapReduce APIs in Spark for data aggregation and queries, writing data back into the OLTP system through Sqoop (an aggregation sketch follows this list).
  • Processed data in various file formats (JSON, CSV, text files, etc.) in Spark using PySpark.
  • Improved Spark application performance through caching, broadcast joins, etc.
  • Experience writing Python applications using libraries such as pandas and NumPy.
  • Understood requirements and created/modified database objects accordingly.
  • Involved in a data migration project: understood the legacy data model and application and migrated data into the new application, which involved creating and modifying Oracle objects such as tables, views, stored procedures, and functions.
  • Responsible for performance tuning of database objects using explain plans, hints, query tuning, etc.
  • Implemented several Oracle 12c features for performance improvement.
  • Performed logical and physical data modeling using Erwin.
  • Responsible for L3 production support.
  • Worked closely with the data analytics team, providing SQL queries for business analysis and BI report design.
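A hedged sketch of the kind of ingestion wrapper the Sqoop framework bullets above describe: a small Python driver that shells out to sqoop import for each source table. The JDBC URL, credentials file, table names, and target directories are placeholders, and only standard Sqoop import options are used.

```python
# Hypothetical wrapper around `sqoop import` for landing RDBMS tables in the HDFS raw zone.
# JDBC URL, password file, table names, and target directories are placeholders.
import subprocess

def sqoop_import(table, target_dir, num_mappers=4):
    """Run a single sqoop import and fail fast on a non-zero exit code."""
    cmd = [
        "sqoop", "import",
        "--connect", "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB",  # placeholder connection
        "--username", "etl_user",
        "--password-file", "/user/etl/.dbpassword",              # placeholder HDFS password file
        "--table", table,
        "--target-dir", target_dir,
        "--as-parquetfile",
        "--num-mappers", str(num_mappers),
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    for tbl in ["CASES", "SERVICES"]:                            # placeholder table list
        sqoop_import(tbl, f"/data/raw/{tbl.lower()}")
```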
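A minimal PySpark sketch of the DataFrame aggregation and broadcast-join tuning mentioned in the bullets above; the database, table, and column names are illustrative only.

```python
# Illustrative PySpark aggregation with a broadcast join and caching.
# Database, table, and column names are made up for the example.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("case_aggregation")
         .enableHiveSupport()
         .getOrCreate())

cases = spark.table("raw.cases")              # large fact table (placeholder)
services = spark.table("raw.service_codes")   # small lookup table (placeholder)

# Broadcasting the small lookup avoids shuffling the large table during the join.
enriched = cases.join(F.broadcast(services), on="service_code", how="left").cache()

summary = (enriched
           .groupBy("county", "service_type")
           .agg(F.count("case_id").alias("case_count"),
                F.countDistinct("child_id").alias("children_served")))

summary.write.mode("overwrite").saveAsTable("curated.case_summary")
```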

Environment: Oracle 12c, UNIX, Control-M, Hadoop, Sqoop, Hive, Spark.

Confidential, New York

Hadoop /Oracle Developer

Responsibilities:

  • Worked extensively with Sqoop for importing data into HDFS from different data sources such as Oracle, Teradata, and SQL Server.
  • Involved in creating Hive tables, loading data into HDFS, and analyzing data using Hive queries.
  • Developed Hive queries to process the data and generate data cubes for visualization.
  • Used Tableau connected to Hive to generate daily reports.
  • Implemented partitioning, dynamic partitions, and buckets in Hive, and read/wrote data to/from HDFS (a partitioned-write sketch follows this list).
  • Created UDFs in Python scripts in the Spark environment for data processing.
  • Developed Scala/PySpark scripts for data aggregation and queries, writing data back into the OLTP system through Sqoop.
  • Experience using Avro, Parquet, ORC, and JSON file formats and UDFs in Hive.
  • Converted existing Hadoop MapReduce jobs to Spark using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Improved the performance of Sqoop jobs and worked on tuning Hive queries and Spark jobs.
  • Worked with BAs to understand requirements and modified/developed database objects accordingly for backend and frontend needs, including creating and modifying Oracle procedures, functions, packages, ref cursors, and views.
  • Converted existing BO reports to Tableau dashboards.
  • Developed Tableau data visualizations using cross tabs, heat maps, and whisker charts.
  • Utilized Tableau Server to publish and share reports with business users.
  • Implemented partitioning, dynamic partitions, and buckets in Hive for the data warehouse system.
  • Created logical and physical data models using Erwin.
  • Created DFDs and database documentation; also provided production/UAT support for issues.
  • Imported and exported data through SQL*Loader and performed catalog data analysis.
  • Used Oracle bulk binding, hints, indexes, table partitioning, etc. for performance improvement.
  • Followed programming best practices to avoid future performance issues.
  • Created SQL scripts for Prod/UAT/QA deployment.
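A hedged PySpark sketch of the Hive partitioning/bucketing and Python UDF work mentioned above; database, table, and column names are placeholders.

```python
# Illustrative PySpark job: a small Python UDF plus a partitioned, bucketed Hive table write.
# Database, table, and column names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = (SparkSession.builder
         .appName("partitioned_load")
         .enableHiveSupport()
         .getOrCreate())

# Example Python UDF: normalize free-text region codes before loading.
@F.udf(returnType=StringType())
def normalize_region(value):
    return value.strip().upper() if value else "UNKNOWN"

orders = (spark.table("staging.orders")   # placeholder source table
          .withColumn("region", normalize_region(F.col("region"))))

# Partition by load date and bucket by customer id so downstream joins and
# partition-pruned queries stay cheap; bucketBy requires saveAsTable.
(orders.write
 .mode("overwrite")
 .partitionBy("load_date")
 .bucketBy(16, "customer_id")
 .sortBy("customer_id")
 .saveAsTable("dw.orders_partitioned"))
```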

Environment: Oracle 12c, UNIX, AutoSys, Hadoop, Sqoop, Hive, Spark, Tableau.

Confidential, Wilmington, DE

Sr Oracle ETL Developer

Responsibilities:

  • Participated in Agile grooming and planning sessions and allocated tasks to team members.
  • Involved in data modeling with data modelers.
  • Created and customized database jobs and DataStage jobs as per requirements.
  • Provided requirement documents and runbooks to the Control-M team for job creation.
  • Responsible for performance tuning issues and for creating/modifying database objects such as procedures, functions, and packages.
  • Responsible for production issues and provided on-call support for various partners.
  • Coordinated with release management for production deployments.
  • Performed performance tuning of batch jobs.
  • Performed data modeling using Erwin.
  • Modified UNIX scripts for batch jobs and used scripts for file transfers.
  • Involved in quantitative analysis and provided summarized data for different types of analytical reports.

Environment: Oracle 11g, UNIX, Windows, Control-M

Confidential

Sr Oracle Informatica ETL Developer

Responsibilities:

  • Participated in sprint planning sessions, analyzed user stories, and provided inputs and work estimates.
  • Imported data sources and targets in various formats (flat files, Oracle tables) for mapping design.
  • Followed the STTM document and designed mappings using multiple transformations (Expression, Union, Joiner, Filter) for various data sources using Informatica PowerCenter; also created the corresponding sessions and workflows.
  • Used different data sources such as SQL Server, flat files, and XML for integration.
  • Used CDC to identify new/updated records.
  • Created JIL scripts for job scheduling in AutoSys.
  • Involved in end-to-end integration testing through AutoSys and fixed issues as they arose.
  • Implemented Oracle exchange partition and bulk update/insert via the Stored Procedure transformation to improve performance on very large data sets.
  • Created a generic procedure using dynamic SQL to load data from staging to target tables.
  • Created large data sets through PL/SQL programs for performance testing.
  • Created packages, functions, procedures, and dynamic queries; also involved in PL/SQL and SQL tuning.
  • Used PRAGMA AUTONOMOUS_TRANSACTION for auditing in batch processing, along with exception handling.
  • Used index-by tables, nested tables, and dynamic SQL for bulk binding; handled errors using the LOG ERRORS and SAVE EXCEPTIONS clauses (a bulk-load sketch follows this list).
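A hedged sketch of the bulk-binding pattern referenced above: an anonymous PL/SQL block using BULK COLLECT and FORALL ... SAVE EXCEPTIONS, driven here from Python with the python-oracledb driver. The connection details and the stg_orders/tgt_orders tables are placeholders, not objects from the original project.

```python
# Illustrative bulk-binding load: BULK COLLECT + FORALL ... SAVE EXCEPTIONS inside an
# anonymous PL/SQL block, executed from Python via python-oracledb.
# Connection details and the stg_orders / tgt_orders tables are placeholders.
import oracledb

PLSQL_BULK_LOAD = """
DECLARE
    TYPE t_rows IS TABLE OF stg_orders%ROWTYPE;
    l_rows   t_rows;
    l_errors PLS_INTEGER := 0;
BEGIN
    SELECT * BULK COLLECT INTO l_rows FROM stg_orders;

    FORALL i IN 1 .. l_rows.COUNT SAVE EXCEPTIONS
        INSERT INTO tgt_orders VALUES l_rows(i);
EXCEPTION
    WHEN OTHERS THEN
        IF SQLCODE = -24381 THEN            -- ORA-24381: rows rejected under SAVE EXCEPTIONS
            l_errors := SQL%BULK_EXCEPTIONS.COUNT;  -- failed rows could be logged to an audit table here
        ELSE
            RAISE;
        END IF;
END;
"""

with oracledb.connect(user="etl_user", password="***", dsn="dbhost/ORCLPDB") as conn:  # placeholders
    with conn.cursor() as cur:
        cur.execute(PLSQL_BULK_LOAD)
    conn.commit()
```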

Environment: UNIX, Windows, Informatica PowerCenter 9.1, AutoSys.

Confidential, Austin, TX

Sr Oracle Developer

Responsibilities:

  • Initiated, monitored, and managed loading of PDM data from the ODS (Operational Data Store).
  • Reviewed results of PDM loading processes (validation and verification reports).
  • Participated in project planning calls and provided estimates to business analysts and project managers.
  • Involved in database architecture design and made database changes as per requirements.
  • Designed logical (ER diagrams, DFDs) and physical (tables, constraints, etc.) data models as per requirements.
  • Developed SQL scripts, packages, procedures, cursors, tables, views, materialized views, and functions as per the business requirements.
  • Used the STM (source-to-target mapping) document for ETL design, covering one-to-one and one-to-many mappings, and implemented bulk binding and DBMS_PARALLEL_EXECUTE for performance improvement in a large database.
  • Involved in maintaining batch-processing PL/SQL procedures for data extraction and feed creation.
  • Improved query performance using EXPLAIN PLAN, TKPROF, indexing, hints, query rewriting, etc.
  • Implemented range and list partitioning for performance improvement in a very large database.
  • Created/customized SSIS packages to load ODS data into different SQL Server databases for student metrics.
  • Fixed performance issues by modifying SQL queries in SSIS, SQL Server 2008, and Oracle databases.

Environment: Java, Oracle 11g, SQL Server 2008, SSIS
