Hadoop Data Engineer & Support Resume
SUMMARY:
- 3+ years of experience in designing, developing, and productionizing robust end-to-end data pipelines that consolidate data from various batch and streaming sources
- 3+ years of experience in log analytics on Hadoop using user click and cookie ID data; developed attribution models for digital marketing campaigns per business needs
- Proven knowledge of major Hadoop distributions such as Cloudera and MapR
- Expertise in Hadoop architecture, its core components, and ecosystem components such as Sqoop, Spark, Hive, YARN, Oozie, and ZooKeeper
- Thorough understanding of distributed storage and processing of large data sets and of the Hadoop cluster daemons, including the ResourceManager, NodeManager, ApplicationMaster, NameNode, and DataNode
- Excellent knowledge of data ingestion pipelines from external RDBMSs into the Hadoop cluster with Sqoop, and of loading data into and out of HDFS using Hadoop file system commands
- Hands-on experience setting up data pipelines between web applications (Kafka producers and consumers) and HDFS
- Excellent knowledge of developing applications that process ingested data with Spark transformations alongside pre-existing analytics applications; experience with Apache Spark integration (Spark SQL, Spark Streaming) - see the Spark sketch after this summary
- Proficient in developing Hive queries with various built-in functions to automate jobs
- Excellent knowledge of developing Hive UDFs per application needs - see the UDF sketch after this summary
- Excellent knowledge of improving query performance by creating partitioned tables in the Hive metastore
- Excellent knowledge of reading and writing various file formats and compression codecs to manage cluster storage and memory
- Experience developing shell scripts with marker files to track the progress of the attribution process
- Developed Hive jobs driven by shell scripts to automate test data setup for unit testing and SIT
- Developed a framework to set up and run Hive queries on the cluster from Excel to assist the test team
- Played a key role in deploying the application, running sanity tests, and sending results to the business team
- Experience running queries using Hue and using BI tools to run ad-hoc queries directly on Hive for business users
- Experienced in performance tuning of Spark applications, including memory configuration
- Worked in both Waterfall and Agile (Scrum) SDLC methodologies, with a clear understanding of all phases of the software development life cycle
- Worked closely with business and marketing teams to drive technical solutions, design, and data insights
- 7 years of working experience in an onshore/offshore model
- Developed documentation and other materials to support ongoing application development, change control, production maintenance, and support tasks
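
As a rough illustration of the Spark work summarized above, the sketch below shows a minimal Spark 1.6 (Scala) job that reads click logs from HDFS into a DataFrame, applies a few transformations, and writes compressed output; the job name, paths, and column names are hypothetical.

    // Illustrative sketch only: read raw click logs, filter to one run date,
    // derive a channel column, and persist as Parquet for downstream jobs.
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.sql.functions._

    object ClickLogJob {
      def main(args: Array[String]): Unit = {
        val runDate = args(0) // e.g. "2017-06-01"
        val sc = new SparkContext(new SparkConf().setAppName("ClickLogJob"))
        val sqlContext = new HiveContext(sc)

        // Read JSON click logs from HDFS into a DataFrame (hypothetical path and schema)
        val clicks = sqlContext.read.json("hdfs:///data/raw/clicks/")

        val cleaned = clicks
          .filter(col("event_date") === runDate)             // date-restriction filter
          .withColumn("channel", lower(col("utm_source")))   // normalize marketing channel
          .select("cookie_id", "event_ts", "channel", "campaign_id")

        // Compressed columnar output keeps the curated layer compact
        cleaned.write.mode("overwrite").parquet(s"hdfs:///data/curated/clicks/dt=$runDate")
      }
    }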
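The Hive UDF item above could, for example, take the shape below; this is a hedged sketch against the classic Hive 1.x UDF API, with the function name and logic invented purely for illustration.

    // Illustrative sketch only: a simple Hive UDF (Hive 1.x API) written in Scala
    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    class NormalizeChannel extends UDF {
      // Hive calls evaluate() once per row; keep it null-safe
      def evaluate(input: Text): Text =
        if (input == null) null
        else new Text(input.toString.trim.toLowerCase)
    }

    // Registration from a Hive session, after packaging the class into a JAR:
    //   ADD JAR /path/to/udfs.jar;
    //   CREATE TEMPORARY FUNCTION normalize_channel AS 'NormalizeChannel';
    //   SELECT normalize_channel(utm_source) FROM click_logs LIMIT 10;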
TECHNICAL SKILLS:
Operating Systems/Platform: Windows XP, Unix/Linux
Programming: Java, SQL, Python 2.x, Scala 2.11.x, Shell
Databases: Oracle 11g, MySQL, HBase
Hadoop Ecosystem: MapR distribution, Hive 1.2.1, Sqoop, Cloudera CDH 5.8, MapReduce, HDFS, YARN, Spark
Tools: Hue, Eclipse, Rally, Git, JIRA, and testing frameworks like TestNG
PROFESSIONAL EXPERIENCE:
Confidential
Hadoop Data Engineer & Support
Responsibilities:
- Analyzed business requirements and the rules of attribution modelling with user journey logs
- Played a key role in defining the captured user log data and mapping the indicators of the user journey
- Took part in analyzing data patterns from digital tracking partners such as Google, Facebook, and LinkedIn
- Developed a framework to load and read data from HDFS with Spark DataFrames per business needs
- Developed the date-restriction filter for journey logs
- Used transformation functions to convert unstructured data into a usable form for attribution
- Developed Unix shell scripts with marker files
- Developed Hive DDL schemas for partitioned input tables to track user data with various file formats and compression codecs (see the partition sketch after this project)
- Developed Hive DDL schemas to insert each day's attribution data and look it up by adding a daily partition
- Played a key role in implementing the conversion rules for attribution with Spark SQL and Hive jobs (see the attribution sketch after this project)
- Developed Hive jobs that accept parameters to run attribution modelling for a given day or a range of days
- Played a key role in developing data quality check scripts, based on available metrics, to detect anomalies
- Involved in unit testing and assisted with SIT and UAT testing
- Familiar with the rerun, replay, and replace model to re-run the process for a past day and replace data in PROD
- Involved in PROD support whenever required
- Involved in developing dashboards with Grafana to check daily progress on various affiliate and partner reports
Environment: MapR Hadoop Distribution (5.2.1.x), Hive 1.2.1, HDFS, Java 8, Scala 2.11, Spark Core 1.6.0, Spark Streaming 1.6.0, Grafana
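
A minimal sketch of the daily-partitioned attribution load described in the responsibilities above, driven by a run-date parameter through Spark's HiveContext; the table, column, and property names are hypothetical.

    // Illustrative sketch only: daily-partitioned attribution table plus a
    // parameterized load for one run date, so reruns for a past day are idempotent.
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object DailyAttributionLoad {
      def main(args: Array[String]): Unit = {
        val runDate = args(0) // e.g. "2017-06-01", passed in by the wrapper shell script
        val sqlContext = new HiveContext(new SparkContext(new SparkConf().setAppName("DailyAttributionLoad")))

        // ORC with Snappy compression keeps the daily partitions compact
        sqlContext.sql(
          """CREATE TABLE IF NOT EXISTS attribution_daily (
            |  cookie_id STRING,
            |  campaign_id STRING,
            |  conversions BIGINT)
            |PARTITIONED BY (dt STRING)
            |STORED AS ORC
            |TBLPROPERTIES ('orc.compress' = 'SNAPPY')""".stripMargin)

        // Overwrite only the partition for the requested day
        sqlContext.sql(
          s"""INSERT OVERWRITE TABLE attribution_daily PARTITION (dt = '$runDate')
             |SELECT cookie_id, campaign_id, COUNT(*) AS conversions
             |FROM conversion_events
             |WHERE event_date = '$runDate'
             |GROUP BY cookie_id, campaign_id""".stripMargin)
      }
    }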
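The attribution rules themselves were business-defined; purely as an illustration of the shape such a rule can take in Spark SQL, the sketch below expresses a hypothetical last-touch rule with a window function (DataFrame and column names are invented).

    // Illustrative sketch only: for each conversion, keep the most recent touch
    // that happened before it (a simple last-touch rule).
    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions._

    // touches: one row per (cookie_id, touch_ts, channel)
    // conversions: one row per (cookie_id, conv_ts)
    def lastTouch(touches: DataFrame, conversions: DataFrame): DataFrame = {
      val joined = conversions.join(touches, "cookie_id")
        .filter(col("touch_ts") <= col("conv_ts"))
      val w = Window.partitionBy("cookie_id", "conv_ts").orderBy(col("touch_ts").desc)
      joined.withColumn("rank", row_number().over(w))
        .filter(col("rank") === 1)
        .select("cookie_id", "conv_ts", "channel")
    }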
Confidential
Acceptance optimization for Mag
Responsibilities:
- Understood the logging of the FiPay global payments central system in order to fetch the log files in flat-file format
- Set up a data pipeline in the TEST environment as a PoC to accept the central server logs
- Set up Hive tables with nested partitions for year, month, and day for analysis (see the sketch after this project)
- Played a key role in assisting the infrastructure team in setting up the Hadoop cluster
- Played a key role in implementing the rules defined by business users in Hive for log analysis, such as region-wise segregation, joining data from different stores, merging files with the same payment type from different stores in the same region, and identifying the maximum approval amount per store and per region
- Involved in key discussions with the downstream team on file format compatibility
Environment: MapR Hadoop Distribution (5.0), Hive 1.2.1, HDFS, Java, Spark Core 1.6.0, Hue
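
To illustrate the nested year/month/day partitioning and the per-store and per-region rollups mentioned above, here is a hedged sketch through Spark's HiveContext; the table, column, and location names are hypothetical.

    // Illustrative sketch only: nested-partition payment log table and a
    // max-approval rollup for one day.
    import org.apache.spark.sql.hive.HiveContext

    def buildPaymentTables(hc: HiveContext): Unit = {
      hc.sql(
        """CREATE EXTERNAL TABLE IF NOT EXISTS payment_logs (
          |  store_id STRING,
          |  region STRING,
          |  payment_type STRING,
          |  approval_amount DOUBLE)
          |PARTITIONED BY (year INT, month INT, day INT)
          |ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
          |LOCATION '/data/fipay/payment_logs'""".stripMargin)

      // Maximum approval amount per store and per region for one day
      val maxApprovals = hc.sql(
        """SELECT region, store_id, MAX(approval_amount) AS max_approval
          |FROM payment_logs
          |WHERE year = 2016 AND month = 7 AND day = 15
          |GROUP BY region, store_id""".stripMargin)
      maxApprovals.show()
    }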
Confidential
Loyalty Reward Accumulation
Responsibilities:
- Understood the logging of payment-related tables in MySQL and the process used to dump the data into the mainframe system
- Worked with the mainframe team on the file format used to SFTP the flat files to the Hadoop cluster
- Proficient in developing extraction, transformation, and loading (ETL) strategies to fetch the Target proprietary card transactions
- Expert in designing parallel jobs using various stages such as Join, Merge, Lookup, Remove Duplicates, Filter, Dataset, Lookup File Set, and Complex Flat File
- Involved in creating Hive tables and in loading and analysing data with Hive queries; implemented partitioning and dynamic partitions in Hive and implemented the business rules for reward accumulation (see the dynamic-partition sketch after this project)
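
As one way the dynamic-partition loading mentioned above can look, here is a hedged sketch through Spark's HiveContext; the table and column names are hypothetical.

    // Illustrative sketch only: load a partitioned reward table using Hive
    // dynamic partitions, letting Hive derive the partition value from the SELECT.
    import org.apache.spark.sql.hive.HiveContext

    def loadRewards(hc: HiveContext): Unit = {
      hc.sql("SET hive.exec.dynamic.partition = true")
      hc.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

      // The partition column (txn_date) must come last in the SELECT list
      hc.sql(
        """INSERT OVERWRITE TABLE reward_accumulation PARTITION (txn_date)
          |SELECT card_id, SUM(points) AS points, txn_date
          |FROM card_transactions
          |GROUP BY card_id, txn_date""".stripMargin)
    }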
Confidential
Hadoop Data Engineer & Support
Responsibilities:
- Understood the logging of payment-related tables in MySQL and the process used to dump the data into the mainframe system
- Worked with the mainframe team on the file format used to SFTP the flat files to the Hadoop cluster
- Proficient in developing extraction, transformation, and loading (ETL) strategies for various card processors such as Mastercard, Visa, Discover, and American Express, as well as Target in-house cards
- Expert in designing parallel jobs using various stages such as Join, Merge, Lookup, Remove Duplicates, Filter, Dataset, Lookup File Set, and Complex Flat File
- Involved in creating Hive tables and in loading and analysing data with Hive queries; implemented partitioning, dynamic partitions, and buckets in Hive (see the bucketing sketch after this project)
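
The sketch below shows one possible form of the bucketed, partitioned Hive tables mentioned above, via Spark's HiveContext; the table, column, and bucket count are hypothetical.

    // Illustrative sketch only: a bucketed, partitioned card transaction table.
    // Bucketing by card_id spreads rows across a fixed number of files,
    // which helps sampling and bucketed map joins.
    import org.apache.spark.sql.hive.HiveContext

    def createBucketedTable(hc: HiveContext): Unit = {
      hc.sql(
        """CREATE TABLE IF NOT EXISTS card_txn_bucketed (
          |  card_id STRING,
          |  processor STRING,
          |  amount DOUBLE)
          |PARTITIONED BY (txn_date STRING)
          |CLUSTERED BY (card_id) INTO 32 BUCKETS
          |STORED AS ORC""".stripMargin)

      // Enforce bucketing on insert so Hive writes the expected file layout
      hc.sql("SET hive.enforce.bucketing = true")
    }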
Confidential
Siebel Developer and Support
Responsibilities:
- Implemented OAGIS (a standard message exchange between integrated systems)
- Integrated new source systems within short timeframes
- Customized rules for data survivorship and matching
- Customized the Siebel user interface for capturing master entity data
- Implemented workflows for batch uploads of large data volumes
- Wrote SQL queries to ensure that created records were stored in the database and to verify the correct association of pick lists and MVGs with a record
- Knowledge of the split tool to test data with batch loads
- Continuously monitored Siebel servers in all environments and sent reports to the team
- Framed the master test suite / regression test suite in Test Lab
- Managed test execution and produced daily reports / weekly status reports for onsite team members and the senior manager
- Knowledge of Unix to move the SRF to lower environments
- Knowledge of sending requests and receiving responses in SoapUI for integration testing
- Established “best practices” and planned for continuous improvement of processes
- Project Monitoring and Metrics Reporting