
Hadoop Consultant Resume


Fort Lee

SUMMARY

  • 10+ years of comprehensive and in-depth experience in Information Technology with a strong background in Hadoop/Big Data technologies
  • 5+ years of experience in Hadoop/Big Data technologies
  • 5+ years of experience in Java
  • 3+ years of experience working onsite with US clients
  • Delivered continuous service improvements across projects
  • As an Idea Champion, proposed ideas to improve process and performance
  • Extensive knowledge of Hadoop and Spark architecture and core components
  • Experienced in writing Spark scripts in Scala and using Spark SQL to access Hive tables for faster data processing (a minimal sketch follows this list)
  • Experienced with the Apache Spark Streaming API on Big Data distributions in an active cluster environment
  • Proficient in Scala programming for writing Apache Spark applications
  • Performed performance tuning in Spark
  • Expertise in Hadoop, HDFS, Spark, Hive, HBase, Pig, Flume, Kafka, Oozie
  • Implemented real-time processing using Kafka and Spark Streaming
  • Experienced in Impala for massively parallel processing (MPP) of data
  • In-depth understanding of Hadoop architecture and its components such as HDFS, Name Node, Data Node, Job Tracker, Task Tracker and MapReduce
  • Knowledge of administrative tasks such as cluster setup, Hadoop installation and configuration, and installation of ecosystem components such as HBase, Pig and Flume
  • Good knowledge of the HDFS Balancer daemon for redistributing blocks
  • Experienced in high-volume ingestion of event-based data into Hadoop using Flume
  • Expertise in writing Hadoop jobs to analyze large datasets using MapReduce, Pig and Hive
  • Work experience in writing HBase queries
  • Experience in importing and exporting data using SQOOP from HDFS to Relational Database Systems and vice-versa
  • Experienced with Oozie for job workflow scheduling and monitoring
  • Created MapReduce programs in Java to meet project requirements
  • Strong Data Warehousing experience in application development and Quality Assurance testing using Informatica Power Center 9.6.1/8.6 (Source Analyzer, Data Warehousing Designer, Mapping Designer, Mapplet Designer, Transformation Developer)
  • Experienced in redesigning Informatica Analytics applications into Big Data Analytics applications
  • Experience in redesigning other applications into Informatica Analytics applications
  • Experience in designing complex ETL/ELT solutions for Data Warehouse Programs
  • Extensive Experience in identifying bottlenecks and performance tuning in Informatica
  • Hands-on experience with industry-standard methodologies, Waterfall and Agile Scrum, within the Software Development Life Cycle
  • Expertise in working independently as an individual contributor (IC)
  • Proposed and implemented process changes within the project to provide better service to the client
  • Defined and proposed Agile Scrum standards for the project and trained project teams in Agile Scrum
  • Good domain experience in Banking, Insurance, Automobile and Retail
  • Experienced on UNIX, Windows and Mainframe platforms
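A minimal Scala sketch of the Spark SQL on Hive pattern referenced above. The SparkSession/enableHiveSupport API is standard Spark 2.x; the database, table and column names (bank.transactions, account_id, txn_amount, txn_date) are hypothetical placeholders, not details from any engagement listed here.

```scala
import org.apache.spark.sql.SparkSession

object HiveQueryExample {
  def main(args: Array[String]): Unit = {
    // Build a SparkSession with Hive support so existing Hive tables are visible
    val spark = SparkSession.builder()
      .appName("HiveQueryExample")
      .enableHiveSupport()
      .getOrCreate()

    // Placeholder table and columns; a real job would use the project's schema
    val transactions = spark.sql(
      "SELECT account_id, txn_amount FROM bank.transactions WHERE txn_date = '2017-01-01'")

    // Simple aggregation expressed in Spark SQL instead of hand-written MapReduce
    transactions.groupBy("account_id").sum("txn_amount").show(20)

    spark.stop()
  }
}
```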

TECHNICAL SKILLS

Big Data Technologies: Hadoop, Spark, Spark SQL, Spark Streaming, Hive, HBase, Pig, SQOOP, Flume, Oozie, Kafka, HDFS, Impala, Cloudera, Avro, Parquet, Hortonworks, Phoenix, Ranger, Atlas, Ambari

Web Technologies: Java, HTML, XML, JSON

ETL Technologies: Informatica Power Center 10.x/9.x/8.x

IDE/Build Tools: Eclipse, IntelliJ, SBT, Maven

Languages: Java, Python, Scala, COBOL, SQL, CICS

Databases: Oracle, DB2, VSAM, MSSQL Server

DB Tools: TOAD, SQL Developer

Operating System: Windows, Unix, Z/800

Job Scheduler: Informatica Scheduler, JCL

Version Control: Git

Methodology: Agile Scrum

Other Tools: WinSCP, Notepad++, MS Visio 2010, PuTTY, Facets, SQuirreL SQL

PROFESSIONAL EXPERIENCE

Confidential, Fort Lee

Hadoop Consultant

Responsibilities:

  • Worked on the Big Data proposal on the Hortonworks distribution
  • Coded Spark SQL to access HBase tables and load the results into MSSQL Server after applying business rules (a sketch follows this list)
  • Accessed HBase tables using Phoenix
  • Implemented Hive on HBase (Hive tables backed by HBase)
  • Configured Kafka to send data streams to Spark Streaming and store the results in HBase tables
  • Imported Data from Databases using Spark and JDBC
  • Performed real-time data processing using Spark Streaming and Kafka connectors
  • Performed performance tuning of Spark scripts
  • Created reports using Spark SQL and Scala
  • Performed real-time analytics using Spark Streaming and Kafka
  • Imported and exported data using SQOOP between HDFS and relational database systems
  • Used Impala to query the data and performed performance tuning
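A hedged sketch of the HBase-via-Phoenix to MSSQL Server load described above, assuming Spark 2.x with the Phoenix and SQL Server JDBC drivers on the classpath; the hostnames, table names, filter rule and credentials are placeholders, not details from the engagement.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object PhoenixToSqlServer {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("PhoenixToSqlServer").getOrCreate()

    // Read an HBase table exposed through Phoenix over JDBC (placeholder host/table)
    val events = spark.read.format("jdbc")
      .option("driver", "org.apache.phoenix.jdbc.PhoenixDriver")
      .option("url", "jdbc:phoenix:zk-host:2181")
      .option("dbtable", "EVENTS")
      .load()

    // Example business rule: keep only completed events
    val completed = events.filter(events("STATUS") === "COMPLETED")

    // Load the filtered result into MS SQL Server over JDBC (placeholder target)
    completed.write.format("jdbc")
      .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
      .option("url", "jdbc:sqlserver://mssql-host:1433;databaseName=reporting")
      .option("dbtable", "dbo.CompletedEvents")
      .option("user", "etl_user")
      .option("password", "*****")
      .mode(SaveMode.Append)
      .save()

    spark.stop()
  }
}
```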

Environment: Hadoop 2.0, Hortonworks, Kafka, Spark, Spark Streaming, Spark SQL, SBT, IntelliJ, Impala, MSSQL Server, CentOS, WinSCP, HBase, Hive, SQOOP, Unix Shell Scripts, Windows 10, Notepad++, Atlas, Ranger, Phoenix, Ambari

Confidential

Hadoop/ Spark / Informatica Big Data Management (BDM) Developer

Responsibilities:

  • Worked on the Big Data proposal on the Cloudera distribution
  • Worked on application improvement proposals
  • Involved in the complete project life cycle: requirement gathering, estimation, design, coding, testing and production support
  • Worked on redesigning applications and solutions, and prepared the related presentations
  • Prepared estimates for the requirements
  • Performed analysis and prepared design documents based on client requirements
  • Coded Spark scripts in Scala and used Spark SQL to access Hive tables for faster data processing
  • Configured Kafka to send data streams to Spark Streaming and store the results in HDFS (a sketch follows this list)
  • Coded Spark scripts using various transformations and actions to cleanse the data
  • Imported Data from Databases using Spark and JDBC
  • Performed real-time data processing using Spark Streaming and Kafka connectors
  • Performed performance tuning of Spark scripts
  • Created reports by using Spark SQL and Scala
  • Performed real-time analytics using Spark Streaming and Scala
  • Performed cluster setup, Hadoop installation, and configuration and installation of Hadoop ecosystem components on the Cloudera distribution
  • Imported and exported healthcare subscriber, plan, benefit, product and claims data from various applications using SQOOP between HDFS and relational database systems
  • Created NoSQL tables in HBase for processing XML data from healthcare member portals
  • Coded Pig scripts and Hive queries for analyzing the structured and semi-structured data used in the project
  • Used Impala to query the data and performed performance tuning
  • Configured Oozie for scheduling and monitoring the Hadoop jobs
  • Installed and configured Flume for collecting and replicating data from subscribers and claims information from web portals of healthcare providers
  • Involved in testing large data volumes on a multi-node Hadoop cluster in the Cloudera distribution
  • Involved in daily scrum meetings as part of the Scrum methodology followed in the project
  • Managed a team of 10
  • Reviewed team members' design documents, code, unit testing and unit test cases (UTC)
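A sketch of the Kafka to Spark Streaming to HDFS flow described above, using the standard spark-streaming-kafka-0-10 direct stream API; the broker address, topic, consumer group and HDFS path are placeholders rather than project details.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaToHdfs")
    val ssc = new StreamingContext(conf, Seconds(30)) // 30-second micro-batches

    // Placeholder broker, consumer group and deserializers
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "kafka-broker:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "claims-ingest",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // Direct stream from a placeholder topic
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("claims-events"), kafkaParams)
    )

    // Persist each batch of message values under an HDFS path prefix
    stream.map(_.value).saveAsTextFiles("hdfs:///data/claims/raw/batch")

    ssc.start()
    ssc.awaitTermination()
  }
}
```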

Environment: Hadoop 2.0, Cloudera, Spark, Spark Streaming, Spark SQL, SBT, Eclipse, Impala, Informatica BDM 10.1.0, Oracle, Unix, WinSCP, SQL Developer, HBase, Hive, SQOOP, Flume, Pig, Unix Shell Scripts, Windows 7, Notepad++, MS Visio

Confidential, O'Fallon, MO

Hadoop Developer

Responsibilities:

  • Involved in business analysis, requirement gathering, and designing logical and physical data models
  • Worked with business and technology teams to redesign ETL Analytics applications as Big Data Analytics applications
  • Performed analysis and created design documents based on client requirements
  • Designed and developed Spark applications using Scala in Cloudera distribution
  • Coded in Spark SQL against various data sources such as JSON, Parquet and Hive (a sketch follows this list)
  • Coded real-time data extracts using Kafka and Spark Streaming, storing the results in HDFS
  • Developed Spark scripts to perform Transformations and Actions in Spark RDDs
  • Performed performance tuning of Spark scripts
  • Configured the Hadoop cluster to add new nodes on the Cloudera distribution
  • Imported and exported data between Oracle 10g and Hive using Sqoop
  • Created Sqoop jobs with incremental load to populate Hive external tables
  • Loaded streaming of log and XML data from various web servers into HDFS using Flume
  • Developed queries in HBase for loading and querying semi structured data
  • Involved in loading bank promotion details from Unix file system to HDFS
  • Developed Oozie workflows for scheduling
  • Created Pig scripts for join queries from multiple tables
  • Created partitioned tables in Hive for the fast-growing bank transactions table
  • Worked on Informatica Power Center tool - Source Analyzer, Data warehousing designer, Mapping & Mapplet Designer and Transformation Developer
  • Developed complex mappings to load data from multiple source systems
  • Performed performance tuning at the session, mapping and system levels
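A sketch of reading JSON and Parquet sources with Spark SQL and writing a partitioned Hive table, combining the data-source and partitioned-table bullets above; the paths, column names and join key (account_id, txn_date) are hypothetical.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object PromotionsToHive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("PromotionsToHive")
      .enableHiveSupport()
      .getOrCreate()

    // Placeholder landing paths; JSON and Parquet are built-in Spark SQL sources
    val promotions = spark.read.json("hdfs:///data/landing/promotions/*.json")
    val transactions = spark.read.parquet("hdfs:///data/landing/transactions/")

    // Join the two sources on a placeholder key and write a Hive table
    // partitioned by transaction date, mirroring the partitioning described above
    promotions.join(transactions, Seq("account_id"))
      .write
      .mode(SaveMode.Overwrite)
      .partitionBy("txn_date")
      .saveAsTable("bank.promotion_transactions")

    spark.stop()
  }
}
```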

Environment: Java, Hadoop, Cloudera, Spark, Spark SQL, SBT, Eclipse, Hive, HBase, Pig, SQOOP, Flume, Oozie, Windows 8, Red Hat Linux 6, Amazon AWS, Informatica 9.6.1, Unix, Oracle 10g, Microsoft Visio 2010

Confidential, Michigan, MI

Lead Informatica Developer

Responsibilities:

  • Worked with Business and technology teams for redesigning ETL Analytics applications to Big Data Analytics applications
  • Worked with the business analysts in requirement analysis to implement the ETL process.
  • Designed the scripts for constraints, triggers and stored procedures.
  • Created design documents for creating maps to load the data from the ODS to warehouse system.
  • Used loading techniques like Slowly Changing Dimensions and Incremental Loading using parameter files and mapping variables.
  • Extensively worked with Slowly Changing Dimensions Type1, Type2, and Type3 for Data Loads.
  • Developed batch file to automate the task of executing the different workflows and sessions associated with the mappings.
  • Created workflows using Workflow manager for different tasks like sending email notifications, timer that triggers when an event occurs, and sessions to run a mapping.
  • Used designer debugger to test the data flow and fix the mappings.
  • Performed performance tuning at the mapping, session, source, target and database levels
  • Created re-usable transformations and mapplets.
  • Created Pig scripts for join queries from multiple tables
  • Imported and exported data between Oracle 10g and Hive using Sqoop
  • Created Sqoop jobs with incremental load to populate Hive external tables
  • Used Spark for in-memory processing
  • Used HBase for loading data for further processing
  • Responsible for identifying bottlenecks and fixing them through performance tuning
  • Involved in Unit Testing, User Acceptance Testing (UAT) and integration testing

Environment: Java, Hadoop, Spark, Hive, HBase, Pig, SQOOP, Flume, Oozie, Windows 8, Red Hat Linux 6, Amazon AWS, Informatica 9.6.1, Oracle 10g, Eclipse, SBT

Confidential, Providence, RI

Informatica Onsite Lead

Responsibilities:

  • Closely worked with Business users, Project manager & Stakeholder for gathering requirements
  • Performed impact analysis, design, coding, test-plan preparation and testing
  • Participated in the full life cycle for multiple projects: analysis, design, documentation, development and testing
  • Used session parameters and mapping variables/parameters, and created parameter files to enable flexible workflow runs based on changing variable values
  • Implemented reusable mappings & sessions for the Operational Audit Process
  • Migrated code from the development environment to test using deployment groups
  • Performance tuning of sources, targets, mappings and SQL queries in the transformations

Environment: Informatica Power Center 8.6.1, Oracle 11g/10g, SQL Server, Flat files, SQL, TOAD, Windows XP, UNIX Shell Scripts, Microsoft Visio

Confidential

SME (Subject Matter Expert)

Responsibilities:

  • Interacted with user and gathered information related to the production issues and enhancements
  • As part of enhancements, involved in estimation, analysis, design and coding of Windows and mainframe applications
  • As part of production support, maintained 13 applications, including 2 Windows-based applications
  • Performed analysis and prepared design documents based on client requirements
  • Developed and maintained ETL/ELT applications
  • Performed performance tuning of the applications
  • Used scripts to import and export data to and from the database
  • Developed UNIX Shell scripts for File Manipulation, FTP, Executing DB2 SQLs and archiving log files
  • Implemented various Performance Tuning techniques on Sources, Targets, Mappings, Sessions, Workflows and database

Environment: Informatica Power Center 8.6.1, Oracle 11g/10g, SQL Server, Flat files, SQL, TOAD, Windows XP, UNIX Shell Scripts, Microsoft Visio, Z/800, COBOL, JCL, CICS, DB2, SPUFI, QMF, File Aid, CA-7
