Data Engineer (Big Data/Hadoop) - Current Resume

SUMMARY:

  • Data engineer with 8+ years of experience and expertise in full life cycle software development, design, and implementation of data warehousing and business intelligence applications.
  • Strong programming background in languages such as Python, Unix shell scripting, and PL/SQL.
  • Strong experience in Big Data solutions using Hadoop ecosystem tools such as Hive, Sqoop, Spark (RDD, PySpark SQL), Kafka, Flume, and Oozie.
  • Expertise in data movement from various sources to HDFS and in data transformation using Hadoop tools such as Hive, Sqoop, and Spark.
  • Knowledge of RDD architecture and of implementing and optimizing Spark operations on RDDs and DataFrames.
  • End-to-end expertise in the data warehouse domain, with knowledge of data modeling and ETL tools such as Informatica 9.0. Strong knowledge of ETL methods for data extraction, transformation, and loading in corporate-wide ETL solutions, and of data warehouse tools for reporting and data analysis, using Informatica PowerCenter 10.x/9.x/8.x, UNIX shell scripting, and databases such as DB2, Postgres, Oracle, MS SQL Server, and Teradata.
  • Designed, developed, and implemented complex ETL jobs using Informatica and DB2 for data cleansing, data transformation, data integration, and data profiling; involved in performance tuning of jobs.
  • Experience with job monitoring and scheduling tools such as Oozie, Control-M, and crontab.
  • Excellent at analyzing data using SQL, HiveQL, and UNIX shell scripting.
  • Understanding of SAFe Agile concepts; worked using Agile methodology for the SDLC (Software Development Life Cycle) and used Agile scrum meetings to identify stories and work on them creatively.
  • Experience using GitLab for code version control and VersionOne for agile lifecycle management.
  • Researching Amazon Web Services (AWS) cloud services such as EC2 and S3 for hosting existing data in the future.
  • Exploring machine learning concepts.
  • Strong impact analysis and requirement gathering experience, acquired through constant communication with clients, technical teams, and functional business analysts.
  • Worked in an offshore-onsite model for various project releases, with onsite experience in project implementation and requirement gathering.
  • Excellent troubleshooting, problem-solving, and teamwork skills. High level of integrity and dedication to quality, excellence, and corporate objectives.
  • Self-motivated, with the ability to work in fast-paced environments and a belief in delivering results.
  • Led and mentored teams for various project deliveries, ensuring high quality in development, testing, and implementation of releases in an onsite-offshore operating model.
  • Completed the “Cloudera data analyst training” and “Developer training for Apache Spark and Hadoop” courses from Cloudera.

TECHNICAL SKILLS:

Big Data tools: Hive, Sqoop, Impala, Spark 1.6/2.1, Beeline, Hue, Oozie

ETL tools: Informatica 10.x/9.x/8.x

Languages: PL/SQL, Shell Scripting, Python 2.7/3

Database: IBM DB2, PostgreSQL, Oracle 10, Oracle 9i, NCR Teradata V2R5

OS/Environment: UNIX, Windows

Job scheduler: Crontab, Control-M

Project Methodology: SAFe Agile and SDLC Waterfall methodology

Agile tools: VersionOne, GitLab

Service Management: HP Service Manager

PROFESSIONAL EXPERIENCE:

Confidential

Data Engineer (Big Data/Hadoop) - Current

Tools: Hadoop (Hive, Sqoop, Spark), Python, Unix Shell scripting

Location: Bloomington, IL

Responsibilities:

  • Created Hive tables and wrote Hive queries for data analysis of massive data sets to meet business requirements.
  • Extracted data from IBM DB2 into HDFS using HiveQL, SQL, and shell scripting.
  • Used Jupyter Notebook to create Python (2.7/3) scripts for data analysis and transformation using Python libraries such as pandas and NumPy.
  • Developed Spark SQL scripts for exploratory analysis using Apache Spark DataFrames.
  • Worked with RDDs and DataFrames in Apache Spark for data analysis and transformation.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data and analyzed them by running Hive queries.
  • Worked extensively with the Jenkins team on continuous integration and end-to-end automation of all builds and deployments.
  • Worked with cross-functional consulting teams within the data science and analytics team to design, develop, and execute solutions that derive business insights and solve clients' operational and strategic problems.
  • Continuously monitored and managed the Hadoop/Spark cluster using Cloudera Manager.
  • Developed Unix shell scripts and automated them using the cron job scheduler.
  • Worked in Agile methodology and used VersionOne and GitLab to maintain project stories.
  • Worked with Architecture and Development teams to understand usage patterns and workload requirements of new projects, ensuring that the Hadoop platform can effectively meet the performance requirements and service levels of the application.
  • Researching Amazon Web Services (AWS) cloud services such as EC2 and S3 for hosting existing data in the future.
  • Exposed to machine learning concepts.

Data Analyst (Big Data/ETL)

Confidential

Tools: Hadoop (Hive, Sqoop, Spark, Oozie), Python, Unix Shell scripting, Informatica, Control-M

Responsibilities:

  • Developed and implemented scalable solutions for ingesting data from various sources and processing it with Big Data (Hadoop) technologies such as Hive, Spark, Oozie, and UNIX shell scripting.
  • Extracted data from DB2 and other data sources into HDFS using SQL, HiveQL, and Unix shell scripting.
  • Created internal, external, and Parquet Hive tables and used them for data analysis and transformation.
  • Exposed to the concept of loading data into Spark RDDs and performing in-memory computation to generate output in the form of XML or JSON files.
  • Exposed to coding in Spark for data processing and optimization of existing algorithms using SparkContext, DataFrames, pair RDDs, and Spark on YARN.
  • Worked extensively on performance tuning of Hive and Spark applications: setting the right parameters, the correct level of parallelism, and memory tuning for faster processing of high-volume data.
  • Developed Oozie workflows to manage and schedule jobs on the Hadoop cluster, triggering daily, weekly, and monthly batch cycles.
  • Developed Unix shell scripts and automated them using the cron job scheduler.
  • Used Cloudera Manager for continuous monitoring of jobs running in the cluster.
  • Used GitLab for code version control.
  • Worked in an agile environment, implementing stories in two-week sprints.
  • Worked on a POC for data movement from Kafka to HDFS and Hive.
  • Developed various Informatica mappings covering all sources, targets, and transformations using PowerCenter Designer.
  • Made substantial contributions to simplifying the development and maintenance of ETL by creating reusable mapplets and transformation objects.
  • Used transformations such as Aggregator, Filter, Router, Stored Procedure, Sequence Generator, Lookup, Expression, and Update Strategy to implement business logic in the mappings.
  • Involved in designing tables and implementing Informatica mappings and workflows that extract data from the source systems to populate the staging area, dimension tables, and fact tables.
  • Worked extensively with lookup caches such as static, dynamic, and persistent caches.
  • Worked with pmcmd to interact with the Informatica server from the command line and execute the Unix scripts.
  • Developed various SQL queries in DB2 for processing and analysis of data.
  • Scheduled and ran extraction and loading processes using Control-M.
  • Tested the data and data integrity among various sources and targets. Used the debugger with breakpoints to monitor data movement, and identified and fixed bugs.
  • Developed unit test case scenarios for thoroughly testing ETL processes and shared them with the testing team.
  • Scheduled walkthroughs of design documents, specifications, code, test plans, etc. as appropriate throughout the project lifecycle.
  • Tuned the performance of various Informatica mappings for better processing time.
  • Actively involved in evaluating business requirements with stakeholders; prepared detailed specifications, following project guidelines, as required to develop the application.
  • Developed data mapping documents that contain the transformation rules for implementing the business logic.
  • Translated high-level requirements documents into low-level design documents.
  • Applied ITIL change management concepts when implementing code in production and pre-production environments.
  • Provided on-call support during and after code promotion as needed.

Confidential, Moline, IL

Senior Informatica Developer

Tools: Informatica, Mainframe, Unix

Responsibilities:

  • Re-engineered existing mappings to support new and changing business requirements.
  • Used mappings, variables/parameters, and parameter files to support change data capture and the workflow execution process.
  • Modified several existing mappings based on user requirements and maintained existing mappings, sessions, and workflows.
  • Used various transformations such as Source Qualifier, Expression, Aggregator, Joiner, Filter, Lookup, and Update Strategy for designing and optimizing the mappings.
  • Tuned the performance of mappings by following Informatica best practices and applied several methods to decrease the run time of workflows.
  • Prepared SQL queries to validate the data in both source and target databases.
  • Involved in unit testing, integration testing, and UAT by creating test cases and test plans, and helped the Informatica administrator deploy code across Dev, Test, and Prod repositories.
  • Created test plans and unit tested the Informatica mappings and stored procedures.
  • Involved in unit, functional, integration, and system testing and prepared review documents for the same.
  • Created sessions and workflows according to the data load into different systems.
  • Validated the mappings, sessions, and workflows; generated and loaded the data into the target database.
  • Monitored batches and sessions for weekly and monthly extracts from various data sources to the target.

Confidential

Senior Software Engineer/Developer

Tools: Ab Initio, Oracle, Unix

Responsibilities:

  • Created interface contract documents, high-level design documents, and low-level design documents based on the broadcast received from the bank.
  • Designed and developed code using Ab Initio 3.12, with extensive use of PL/SQL for analysis and coding.
  • Improved the performance of existing and newly built applications.
  • Exhaustively tested applications using SQL scripts in Oracle.
  • Prepared various documents such as unit test specifications, system test cases, and SQL scripts.
  • Used Korn shell scripts for data loading and execution of mappings.
  • Reviewed and exhaustively tested components before production implementation together with the groups involved, covering quality analysis, defect logging, and resolution.

Confidential

Maintenance/Support

Tools: Informatica, Oracle, Unix

Responsibilities:

  • Involved in requirement definition and analysis in support of data warehousing efforts.
  • Involved in designing the data warehouse with the Informatica 8.6 ETL tool, using Source Analyzer, Warehouse Designer, Mapping Designer, and transformations.
  • Developed ETL procedures that comply with standards and avoid redundancy, translating business rules and functionality requirements into ETL procedures.
  • Wrote Korn shell scripts to load data to the target and to run mappings.
  • Developed most transformation types, including Filter, Aggregator, Expression, Router, Lookup, Joiner, Update Strategy, and Rank.
  • Extensively involved in resolving performance tuning issues.
  • Created tasks, worklets, and workflows, and scheduled, ran, and monitored sessions using Workflow Manager and Workflow Monitor.
  • Migrated mappings, sessions, and workflows from Development to Test and then to the Pre-Production environment.
  • Tested the data and data integrity among various sources and targets. Involved in developing ad hoc and business reports according to requirements.

Confidential

Test Lead

Tools: Informatica, Mainframe, Unix

Responsibilities:

  • Re-engineered existing mappings to support new and changing business requirements.
  • Used mappings, variables/parameters, and parameter files to support change data capture and the workflow execution process.
  • Modified several existing mappings based on user requirements and maintained existing mappings, sessions, and workflows.
  • Used various transformations such as Source Qualifier, Expression, Aggregator, Joiner, Filter, Lookup, and Update Strategy for designing and optimizing the mappings.
  • Tuned the performance of mappings by following Informatica best practices and applied several methods to decrease the run time of workflows.
  • Prepared SQL queries to validate the data in both source and target databases.
  • Validated the mappings, sessions, and workflows; generated and loaded the data into the target database.
  • Monitored batches and sessions for weekly and monthly extracts from various data sources to the target database.
  • Exhaustively tested applications using the OPC scheduler (Mainframe).
  • Monitored the weekly batch run and provided resolutions in case of errors.
  • Created change management records describing the components involved and the plan for executing tasks in production.