
Data Engineer Resume


SUMMARY

  • Around 10 years of experience in the IT industry, including 3.2+ years in Big Data ecosystem technologies.
  • Around 7 years of IT experience in developing BI applications using Informatica, Oracle, and Unix.
  • 3 years of experience with the NoSQL databases HBase and MongoDB.
  • 2 years of experience in installing, tuning, and operating Apache Spark technologies such as Spark SQL and Spark Streaming.
  • 2 years of experience in Oracle Business Intelligence Enterprise Edition (11.x).
  • Excellent understanding/knowledge of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
  • Hands on experience in installing, configuring, and using Hadoop ecosystem components like Hadoop MapReduce, HDFS, HBase, Hive, Sqoop, Pig, Zookeeper, Oozie and Flume.
  • Experience in analyzing data using HiveQL, Pig Latin, HBase, and custom MapReduce programs in Python (a minimal streaming-mapper sketch appears after this list).
  • Extended Hive and Pig core functionality by writing custom UDFs.
  • Experience in managing and reviewing Hadoop log files.
  • Experience in managing scalable Hadoop clusters, including cluster design, provisioning, custom configuration, monitoring, and maintenance, across different Hadoop distributions: Cloudera CDH, Hortonworks HDP, and Apache Hadoop.
  • Hands-on experience in application development using Oracle (SQL, PL/SQL), Python, and shell scripting.
  • Knowledge of various NoSQL storage technologies (Key-Value, Column-Family, Document).
  • Experience in managing lifecycle of MongoDB database including database sizing, deployment automation, monitoring and tuning
  • Hands on experience with designing and developing applications that use MongoDB
  • Good experience integrating MongoDB with Hadoop.
  • Experience performing regular MongoDB backups, testing recoverability, and monitoring overall MongoDB performance.
  • Exposure to Apache Kafka and Apache YARN.
  • Experience with Solr, Elasticsearch
  • Experience in JavaScript.
  • Experience in the functional programming language Scala.
  • Experience in the Financial, Telecom, and Healthcare domains.
  • Expert in creating SQL queries, PL/SQL packages, functions, stored procedures, triggers, and cursors; created database objects such as tables, views, sequences, synonyms, and indexes using Oracle tools like SQL*Plus, SQL Developer, and Toad.
  • Proficient in advanced features of Oracle 11g for PL/SQL programming, such as Records and Collections, Bulk Binds, Ref Cursors, Nested Tables, Dynamic SQL, and Oracle Advanced Queuing.
  • Extensively worked on ETL using Informatica PowerCenter.
  • Designed complex mappings, with expertise in performance tuning.
  • Extensive experience with data Extraction, Transformation, and Loading (ETL) from disparate data sources, including multiple relational databases; worked on integrating data from flat files, CSV files, and XML files into a common reporting and analytical data model.
  • Experienced in Tuning Informatica Mappings to identify and remove processing bottlenecks
  • Experience in performance tuning of Oracle BI Repository & Dashboards / Reports, by implementing Aggregate tables, Indexes and managing Cache.
  • Experience in data modeling using dimensional data modeling, star/snowflake schemas, fact and dimension tables, and physical and logical data modeling in OBIEE.
  • Expertise in developing/customizing the OBIEE repository (.rpd) at all three layers (Physical Layer, Business Model & Mapping Layer, and Presentation Layer) using the Oracle Business Intelligence Administration Tool.
  • Automation and scheduling of UNIX shell scripts and Informatica sessions and batches using Autosys.
  • Strong knowledge of Star Schema, Snow Flake Schema
  • Expertise in generating customized reports using OBIEE
  • Good experience in UNIX shell scripting and Perl scripting for automating file loading and job scheduling tasks.
  • Expert knowledge in data warehousing and business intelligence concepts.
  • Good Experience in Agile methodology
  • Highly motivated team player with good communication skills and excellent problem-solving abilities; able to work independently or as part of a team.
  • Good experience with build and configuration tools: Autosys, Harvest, and uDeploy.
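
A minimal sketch of the kind of custom Python MapReduce program referenced above, written for Hadoop Streaming. The log layout, field positions, and file names are illustrative assumptions, not the original production code.

    #!/usr/bin/env python
    # mapper.py -- illustrative Hadoop Streaming mapper: emit (HTTP status, 1)
    # for each weblog line. The field position is an assumption about the log layout.
    import sys

    for line in sys.stdin:
        fields = line.strip().split()
        if len(fields) > 8:              # crude guard against malformed lines
            status = fields[8]           # status column in a combined access log
            print("%s\t%d" % (status, 1))

    #!/usr/bin/env python
    # reducer.py -- illustrative Hadoop Streaming reducer: sum counts per key
    # from the sorted mapper output.
    import sys

    current_key, total = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t", 1)
        if key != current_key:
            if current_key is not None:
                print("%s\t%d" % (current_key, total))
            current_key, total = key, 0
        total += int(value)
    if current_key is not None:
        print("%s\t%d" % (current_key, total))

A job like this would typically be submitted through the Hadoop Streaming jar, passing mapper.py and reducer.py via the -mapper and -reducer options.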

TECHNICAL SKILLS

Big Data Technologies: Hadoop, HDFS, Hive, MapReduce, Pig, Spark, Sqoop, Flume, ZooKeeper, YARN, Kafka, Oozie

Languages: SQL, PL/SQL, Core Java

Web Technologies: JavaScript, JSON, HTML, XML

Oracle Tools: SQL Developer, SQL*Plus, Eclipse, Toad, OBIEE (10g, 11g)

Data Integration Tools: Informatica 8.x, 9.x

Databases: Oracle 8i/9.2/10g/11g, SQL Server, DB2, MS-Access

NoSQL Databases: Cassandra, HBase, MongoDB

Operating System: Windows 7/XP/NT/Vista, MS-DOS, Linux and UNIX.

Scripting: UNIX shell script, Perl, Python, JavaScript, Scala

PROFESSIONAL EXPERIENCE

Confidential

Data Engineer

Responsibilities:

  • Developed data pipelines using Flume, Sqoop, Pig, and Python MapReduce to ingest data from various sources into HDFS for analysis.
  • Developed job flows in Oozie to automate the workflow for extraction of data from warehouses and weblogs.
  • Used Pig as an ETL tool for transformations, event joins, bot-traffic filtering, and pre-aggregations before storing the data in HDFS.
  • Optimized MapReduce code and Pig scripts; performed user interface analysis, performance tuning, and analysis.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
  • Developed Pig Latin scripts to extract and filter relevant data from the web server output files to load into HDFS.
  • Responsible for building scalable distributed data solutions using Hadoop
  • Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster
  • Developed Hive queries and Pig scripts to customize the large data sets into JSON.
  • Involved in loading JSON datasets into MongoDB and validating the data using Mongo shell.
  • Loaded the aggregated data into MongoDB for reporting on the dashboard.
  • Worked on MongoDB schema/document modeling, querying, indexing and tuning
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Optimized MapReduce jobs to use HDFS efficiently by applying various compression mechanisms.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from Oracle into HDFS using Sqoop.
  • Created and maintained Technical documentation for all the tasks performed like executing Pig scripts and Hive queries
  • Used Informatica for Hadoop to load data to and from HDFS and Hive tables.
  • Worked on Slowly Changing Dimensions, both Type 1 and Type 2.
  • Designing ETL processes using Informatica to load data from Flat Files, Oracle and Excel files to target Oracle Data Warehouse database.
  • Developed mappings in Informatica to load the data from various sources into the Data Warehouse, using different transformations like Joiner, Aggregator, Update Strategy, Rank, Router, Lookup, Sequence Generator, Filter, Sorter, Source Qualifier.
  • Designed workflows with many sessions with decision, assignment task, event wait, and event raise tasks, used Informatica scheduler to schedule jobs
  • Created HBase tables to store various formats of PII data coming from different portfolios.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Worked on tuning the performance of Pig queries.
  • Used Elasticsearch for multi-tenant, real-time search.
  • Used Spark SQL for data massaging and cleansing in the Spark environment (a minimal sketch follows this list).
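
A minimal PySpark sketch of the Spark SQL cleansing step mentioned above, using the Spark 2.x-style SparkSession API; the HDFS paths, column names, and bot-filter rule are illustrative assumptions rather than the original job.

    # Illustrative PySpark sketch: cleanse weblog JSON with Spark SQL before
    # aggregation. Paths and column names are hypothetical.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("weblog_cleanse").getOrCreate()

    logs = spark.read.json("hdfs:///data/raw/weblogs/")   # raw events landed by Flume
    logs.createOrReplaceTempView("weblogs")

    clean = spark.sql("""
        SELECT user_id, page, status, event_ts
        FROM weblogs
        WHERE status IS NOT NULL
          AND lower(user_agent) NOT LIKE '%bot%'          -- drop obvious bot traffic
    """)

    clean.write.mode("overwrite").parquet("hdfs:///data/clean/weblogs/")
    spark.stop()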

Environment: RedHat Linux, Cloudera HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Zookeeper, Oozie, HBase, MongoDB, Informatica 9.x, OBIEE 11.3, Spark

Confidential

Hadoop Dev/Admin, Data integration Developer

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Developed multiple MapReduce jobs in Python for data cleaning and preprocessing.
  • Designed Oozie workflows.
  • Installed and configured Hive and wrote Hive UDFs.
  • Implemented a CDH3 Hadoop cluster.
  • Installed the cluster and handled monitoring, administration of cluster recovery, capacity planning, and slot configuration.
  • Created HBase tables to store variable data formats of PII data coming from different portfolios.
  • Implemented best income logic using Pig scripts.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
  • Wrote Hadoop MapReduce programs to collect logs and feed them into Cassandra for analytics (see the sketch after this list).
  • Built, packaged, and deployed code to the Hadoop servers.
  • Wrote UNIX scripts to manage Hadoop operations.
  • Wrote Stored Procedures, Functions, Packages and triggers using PL/SQL to implement business rules and processes.
  • Extensive ETL testing experience using Informatica 9.x (PowerCenter/PowerMart): Designer, Workflow Manager, Workflow Monitor, and Server Manager.
  • Worked on Informatica Power Center tools- Designer, Repository Manager, Workflow Manager, and Workflow Monitor.
  • Used advanced SQL like analytical functions, aggregate functions for mathematical and statistical calculations.
  • Optimized SQL used in reports to improve performance dramatically.
  • Tuned and optimized the complex SQL queries.
  • Worked with Business users to gather requirements for developing new Reports or changes in the existing Reports.
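
A minimal sketch of feeding parsed log records into Cassandra from Python using the DataStax driver, as referenced above; the contact point, keyspace, table, and column names are hypothetical assumptions.

    # Illustrative sketch: insert parsed log records into Cassandra with the
    # DataStax Python driver. Keyspace, table, and columns are hypothetical.
    import sys
    from cassandra.cluster import Cluster

    cluster = Cluster(["127.0.0.1"])          # contact point is an assumption
    session = cluster.connect("analytics")    # hypothetical keyspace

    insert = session.prepare(
        "INSERT INTO access_log (host, event_day, status) VALUES (?, ?, ?)")

    for line in sys.stdin:                    # e.g. tab-separated output of a log parser
        parts = line.rstrip("\n").split("\t")
        if len(parts) == 3:
            host, event_day, status = parts
            session.execute(insert, (host, event_day, int(status)))

    cluster.shutdown()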

Environment: Hadoop, MapReduce, HDFS, Hive, Python, SQL, Pig, Sqoop, CentOS, Cloudera, Oracle 10g/11g, Autosys, shell scripting, MongoDB, OBIEE 11g, Informatica 9.x

Confidential

Data integration Developer

Responsibilities:

  • Performed data mapping and developed ETL specification documents.
  • Developed ETL processes using PL/SQL in Oracle 10g and 11g to extract, transform, and load data from OLTP systems into the warehouse.
  • Designed ETL processes using Informatica to load data from flat files, Oracle, and Excel files into the target Oracle Data Warehouse database.
  • Developed mappings in Informatica to load the data from various sources into the Data Warehouse, using different transformations like Joiner, Aggregator, Update Strategy, Rank, Router, Lookup, Sequence Generator, Filter, Sorter, Source Qualifier
  • Used Workflow Manager to create sessions and scheduled them to run at specified times with the required frequency.
  • Utilized PL/SQL nested tables for conditional trafficking of data within the ETL process.
  • Massaged and filtered data by developing reusable PL/SQL functions.
  • Performance tuning using Oracle Hints and Result caching where appropriate.
  • Utilized the PL/SQL bulk collect feature to optimize ETL performance (an illustrative batched-load sketch follows this list).
  • Developed Bash shell scripts to set up batch jobs on a Solaris 10 UNIX server.
  • Documented the application from a technical maintenance point of view and hosted knowledge transfer sessions.
  • Maintained and enhanced the Oracle PL/SQL batch process for patient-level data collected in a clinical trial and reporting system.
  • Loaded data from MS Excel into Oracle tables and from Oracle tables into MS Excel.
  • Loaded data using SQL*Loader, imports, and UTL_FILE, based on the file formats.
  • Debugged production issues using the Toad 9.7 debugger.
  • Used SQL Trace and TKPROF to analyze performance issues.
  • Performed root cause analysis to identify and deploy bug fixes.
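
The bulk loading above was implemented in PL/SQL; purely as an illustrative Python analogue of the same batched-insert idea (minimizing round trips, much as BULK COLLECT/FORALL does), a cx_Oracle sketch might look like this. The connection string, file, table, and columns are hypothetical.

    # Illustrative analogue: batched load of a flat file into Oracle via
    # cx_Oracle executemany(). DSN, table, and columns are hypothetical.
    import csv
    import cx_Oracle

    conn = cx_Oracle.connect("etl_user", "etl_pwd", "dbhost/ORCL")
    cur = conn.cursor()

    with open("patients.csv") as fh:
        reader = csv.reader(fh)
        next(reader)                          # skip the header row
        batch = [(row[0], row[1], row[2]) for row in reader]

    # One round trip per batch keeps context switching low.
    cur.executemany(
        "INSERT INTO stg_patient (patient_id, site_code, visit_dt) "
        "VALUES (:1, :2, TO_DATE(:3, 'YYYY-MM-DD'))",
        batch)

    conn.commit()
    cur.close()
    conn.close()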

Environment: Oracle 10g, UNIX, Windows, Informatica 8.6

Confidential

Data integration Developer

Responsibilities:

  • Involved in the Extraction, Transformation and loading of the data from various sources into the dimensions and the fact tables in the Data Warehouse.
  • Created reusable Transformations and Mapplets and used them in various mappings.
  • Involved in extensive performance tuning by identifying bottlenecks at various points such as targets, sources, mappings, sessions, or the system, which led to better session performance.
  • Created Informatica mappings with PL/SQL procedures to build business rules to load data.
  • Used most of the core transformations, such as Source Qualifier, Aggregator, Connected and Unconnected Lookup, Filter, and Sequence Generator.
  • Coded and developed packages, procedures, cursors, tables, views, and functions as per the business requirements.
  • Supported QA and resolved the defects raised by QA.
  • Created cursors and ref cursors as part of procedures to retrieve the selected data (a consumption sketch follows this list).
  • Fine-tuned procedures for maximum efficiency in various schemas across databases using Oracle hints, Explain Plan, and trace sessions.
  • Wrote complex SQL using joins, subqueries, and correlated subqueries.
  • Handled errors using system defined exceptions and user defined exceptions.
  • Involved in creating INDEXES to avoid the need for large-table, full-table scans for fast retrieval of data from database objects.
  • Used collections to access complex data resulting from joins across a large number of tables.
  • Worked with bulk collects to improve the performance of multi-row queries by reducing context switching.
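
A minimal sketch of consuming one of the REF CURSOR procedures described above from Python via cx_Oracle; the package, procedure, and column layout are hypothetical assumptions.

    # Illustrative sketch: call a PL/SQL procedure that returns a REF CURSOR
    # and iterate its rows. Procedure and columns are hypothetical.
    import cx_Oracle

    conn = cx_Oracle.connect("app_user", "app_pwd", "dbhost/ORCL")
    cur = conn.cursor()

    ref_cursor = conn.cursor()                # bound as the OUT REF CURSOR parameter
    cur.callproc("billing_pkg.get_open_invoices", [ref_cursor])

    for invoice_id, amount in ref_cursor:     # rows fetched through the ref cursor
        print(invoice_id, amount)

    ref_cursor.close()
    cur.close()
    conn.close()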

Environment: Oracle 10g, PL/SQL, UNIX, Windows, Shell, DPL (Data Presentation Language), Perl

Confidential

DB Developer

Responsibilities:

  • Interacted with business community and gathered requirements based on changing needs and incorporated identified factors into Informatica mappings to build Data Marts.
  • Extensively worked on Power Center Designer to develop mappings using several transformations such as Filter, Joiner, Lookup, Rank, Sequence Generator, Aggregator and Expression transformations.
  • Implemented Type 1 and Type 2 Slowly Changing Dimensions.
  • Created parameter files and used mapping parameters and variables for incremental loading of data (a parameter-file sketch follows this list).
  • Involved in performance Tuning of Transformations, mappings, and Sessions for better performance.
  • Designed tables, constraints, views, and indexes in coordination with application development.
  • Developed stored procedures/functions on request for enhancement of business logic.
  • Developed job scheduler scripts for data migration using UNIX shell scripting.
  • Built different applications under the Telegence and Lightspeed billing applications.
  • Involved in automating the project's build activity using shell scripts.
  • Built emergency fixes for different applications.
  • Developed data migration scripts using the UTL_FILE package.
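
A minimal sketch of generating an Informatica parameter file to drive the incremental loads mentioned above; the folder, workflow, session, and parameter names are hypothetical, and the section-header layout follows common PowerCenter parameter-file conventions.

    # Illustrative sketch: write a PowerCenter parameter file for an incremental load.
    # Folder, workflow, session, and parameter names are hypothetical.
    from datetime import date, timedelta

    last_run = date.today() - timedelta(days=1)   # e.g. pull rows changed since yesterday

    lines = [
        "[DW_FOLDER.WF:wf_load_sales.ST:s_m_load_sales]",
        "$$LAST_EXTRACT_DATE=%s" % last_run.strftime("%m/%d/%Y"),
        "$DBConnection_SRC=Oracle_OLTP",
    ]

    with open("wf_load_sales.prm", "w") as fh:
        fh.write("\n".join(lines) + "\n")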

Environment: Oracle 10g, PL/SQL, UNIX, Windows, Shell, CVS, PowerBuilder, Change Tracker (CT), Mercury Quality Center, Informatica 8.x
