Hadoop Data Engineer & Support Resume
SUMMARY:
- 3+ years of experience in designing, developing, and productionizing robust end-to-end data pipelines that consolidate data from various batch and streaming sources
- 3+ years of experience in log analytics on Hadoop using user click and cookie ID data; developed attribution models for digital marketing campaigns per business needs
- Proven knowledge of major Hadoop distributions such as Cloudera and MapR
- Expertise in Hadoop architecture, its core components, and ecosystem components such as Sqoop, Spark, Hive, YARN, Oozie, and ZooKeeper
- Thorough understanding of distributed storage and processing of large data sets and of the Hadoop cluster daemons, including the ResourceManager, NodeManager, ApplicationMaster, NameNode, and DataNode
- Excellent knowledge of data ingestion pipelines from external RDBMSs into the Hadoop cluster with Sqoop, and of loading data into and out of HDFS using Hadoop file system commands
- Hands-on experience setting up data pipelines between web applications (Kafka producers and consumers) and HDFS
- Excellent knowledge of developing applications that process ingested data with Spark transformations alongside pre-existing analytics applications; experience with Apache Spark integration (Spark SQL, Spark Streaming) - see the Spark sketch after this summary
- Proficient in developing Hive queries with various built-in functions to automate jobs
- Excellent knowledge of developing Hive UDFs per application needs - see the UDF sketch after this summary
- Excellent knowledge of improving query performance by creating partitioned tables in the Hive metastore
- Excellent knowledge of reading and writing various file formats and compression codecs to manage cluster storage and memory
- Experience developing shell scripts with marker files to track the progress of the attribution process
- Developed Hive jobs driven by shell scripts to automate test data setup for unit testing and SIT
- Developed a framework to set up and run Hive queries on the cluster from Excel to assist the test team
- Played a key role in deploying the application, running sanity tests, and sending results to the business team
- Experience running queries using Hue and using BI tools to run ad-hoc queries directly on Hive for business users
- Experienced in performance tuning of Spark applications, including memory configuration
- Worked in both Waterfall and Agile (Scrum) SDLC methodologies, with a clear understanding of all phases of the software development life cycle
- Worked closely with business and marketing teams to drive technical solutions, design, and data insights
- 7 years of working experience in an onshore/offshore model
- Developed documentation and other materials to support ongoing application development, change control, production maintenance, and support tasks
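
As a rough illustration of the Spark work summarized above, the sketch below shows a minimal Spark 1.6 (Scala) job that reads click logs from HDFS into a DataFrame, applies a few transformations, and writes compressed output; the job name, paths, and column names are hypothetical.

    // Illustrative sketch only: read raw click logs, filter to one run date,
    // derive a channel column, and persist as Parquet for downstream jobs.
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.sql.functions._

    object ClickLogJob {
      def main(args: Array[String]): Unit = {
        val runDate = args(0) // e.g. "2017-06-01"
        val sc = new SparkContext(new SparkConf().setAppName("ClickLogJob"))
        val sqlContext = new HiveContext(sc)

        // Read JSON click logs from HDFS into a DataFrame (hypothetical path and schema)
        val clicks = sqlContext.read.json("hdfs:///data/raw/clicks/")

        val cleaned = clicks
          .filter(col("event_date") === runDate)             // date-restriction filter
          .withColumn("channel", lower(col("utm_source")))   // normalize marketing channel
          .select("cookie_id", "event_ts", "channel", "campaign_id")

        // Compressed columnar output keeps the curated layer compact
        cleaned.write.mode("overwrite").parquet(s"hdfs:///data/curated/clicks/dt=$runDate")
      }
    }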
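The Hive UDF item above could, for example, take the shape below; this is a hedged sketch against the classic Hive 1.x UDF API, with the function name and logic invented purely for illustration.

    // Illustrative sketch only: a simple Hive UDF (Hive 1.x API) written in Scala
    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    class NormalizeChannel extends UDF {
      // Hive calls evaluate() once per row; keep it null-safe
      def evaluate(input: Text): Text =
        if (input == null) null
        else new Text(input.toString.trim.toLowerCase)
    }

    // Registration from a Hive session, after packaging the class into a JAR:
    //   ADD JAR /path/to/udfs.jar;
    //   CREATE TEMPORARY FUNCTION normalize_channel AS 'NormalizeChannel';
    //   SELECT normalize_channel(utm_source) FROM click_logs LIMIT 10;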
TECHNICAL SKILLS:
Operating Systems/Platform: Windows XP, Unix/Linux
Programming: Java, SQL, Python 2.x, Scala 2.11.x, Shell
Databases: Oracle 11g, MySQL, HBase
Hadoop Ecosystem: MapR distribution, Hive 1.2.1, Sqoop, Cloudera CDH 5.8, MapReduce, HDFS, YARN, Spark
Tools: Hue, Eclipse, Rally, Git, JIRA, and testing frameworks like TestNG
PROFESSIONAL EXPERIENCE:
Confidential
Hadoop Data Engineer & Support
Responsibilities:
- Analyzed business requirements and the rules of attribution modelling with user journey logs
- Played a key role in defining the captured user log data and mapping the indicators of the user journey
- Took part in analyzing data patterns from digital tracking partners such as Google, Facebook, and LinkedIn
- Developed a framework to load and read data from HDFS with Spark DataFrames per business needs
- Developed the date-restriction filter for journey logs
- Used transformation functions to convert unstructured data into a usable form for attribution
- Developed Unix shell scripts with marker files
- Developed Hive DDL schemas for partitioned input tables to track user data with various file formats and compression codecs (see the partition sketch after this project)
- Developed Hive DDL schemas to insert each day's attribution data and look it up by adding a daily partition
- Played a key role in implementing the conversion rules for attribution with Spark SQL and Hive jobs (see the attribution sketch after this project)
- Developed Hive jobs that accept parameters to run attribution modelling for a given day or a range of days
- Played a key role in developing data quality check scripts, based on available metrics, to detect anomalies
- Involved in unit testing and assisted with SIT and UAT testing
- Familiar with the rerun, replay, and replace model to re-run the process for a past day and replace data in PROD
- Involved in PROD support whenever required
- Involved in developing dashboards with Grafana to check daily progress on various affiliate and partner reports
Environment: MapR Hadoop Distribution (5.2.1.x), Hive 1.2.1, HDFS, Java 8, Scala 2.11, Spark Core 1.6.0, Spark Streaming 1.6.0, Grafana
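
A minimal sketch of the daily-partitioned attribution load described in the responsibilities above, driven by a run-date parameter through Spark's HiveContext; the table, column, and property names are hypothetical.

    // Illustrative sketch only: daily-partitioned attribution table plus a
    // parameterized load for one run date, so reruns for a past day are idempotent.
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object DailyAttributionLoad {
      def main(args: Array[String]): Unit = {
        val runDate = args(0) // e.g. "2017-06-01", passed in by the wrapper shell script
        val sqlContext = new HiveContext(new SparkContext(new SparkConf().setAppName("DailyAttributionLoad")))

        // ORC with Snappy compression keeps the daily partitions compact
        sqlContext.sql(
          """CREATE TABLE IF NOT EXISTS attribution_daily (
            |  cookie_id STRING,
            |  campaign_id STRING,
            |  conversions BIGINT)
            |PARTITIONED BY (dt STRING)
            |STORED AS ORC
            |TBLPROPERTIES ('orc.compress' = 'SNAPPY')""".stripMargin)

        // Overwrite only the partition for the requested day
        sqlContext.sql(
          s"""INSERT OVERWRITE TABLE attribution_daily PARTITION (dt = '$runDate')
             |SELECT cookie_id, campaign_id, COUNT(*) AS conversions
             |FROM conversion_events
             |WHERE event_date = '$runDate'
             |GROUP BY cookie_id, campaign_id""".stripMargin)
      }
    }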
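The attribution rules themselves were business-defined; purely as an illustration of the shape such a rule can take in Spark SQL, the sketch below expresses a hypothetical last-touch rule with a window function (DataFrame and column names are invented).

    // Illustrative sketch only: for each conversion, keep the most recent touch
    // that happened before it (a simple last-touch rule).
    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions._

    // touches: one row per (cookie_id, touch_ts, channel)
    // conversions: one row per (cookie_id, conv_ts)
    def lastTouch(touches: DataFrame, conversions: DataFrame): DataFrame = {
      val joined = conversions.join(touches, "cookie_id")
        .filter(col("touch_ts") <= col("conv_ts"))
      val w = Window.partitionBy("cookie_id", "conv_ts").orderBy(col("touch_ts").desc)
      joined.withColumn("rank", row_number().over(w))
        .filter(col("rank") === 1)
        .select("cookie_id", "conv_ts", "channel")
    }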
Confidential
Acceptance optimization for Mag
Responsibilities:
- Understood the logging of the FiPay global payments central system in order to fetch the log files in flat-file format
- Set up a data pipeline in the TEST environment as a PoC to accept the central server logs
- Set up Hive tables with nested partitions for year, month, and day for analysis (see the sketch after this project)
- Played a key role in assisting the infrastructure team in setting up the Hadoop cluster
- Played a key role in implementing the rules defined by business users in Hive for log analysis, such as region-wise segregation, joining data from different stores, merging files with the same payment type from different stores in the same region, and identifying the maximum approval amount per store and per region
- Involved in key discussions with the downstream team on file format compatibility
Environment: MapR Hadoop Distribution (5.0), Hive 1.2.1, HDFS, Java, Spark Core 1.6.0, Hue
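
To illustrate the nested year/month/day partitioning and the per-store and per-region rollups mentioned above, here is a hedged sketch through Spark's HiveContext; the table, column, and location names are hypothetical.

    // Illustrative sketch only: nested-partition payment log table and a
    // max-approval rollup for one day.
    import org.apache.spark.sql.hive.HiveContext

    def buildPaymentTables(hc: HiveContext): Unit = {
      hc.sql(
        """CREATE EXTERNAL TABLE IF NOT EXISTS payment_logs (
          |  store_id STRING,
          |  region STRING,
          |  payment_type STRING,
          |  approval_amount DOUBLE)
          |PARTITIONED BY (year INT, month INT, day INT)
          |ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
          |LOCATION '/data/fipay/payment_logs'""".stripMargin)

      // Maximum approval amount per store and per region for one day
      val maxApprovals = hc.sql(
        """SELECT region, store_id, MAX(approval_amount) AS max_approval
          |FROM payment_logs
          |WHERE year = 2016 AND month = 7 AND day = 15
          |GROUP BY region, store_id""".stripMargin)
      maxApprovals.show()
    }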
Confidential
Loyalty Reward Accumulation
Responsibilities:
- Understood the logging of payment-related tables in MySQL and the process used to dump the data into the mainframe system
- Worked with the mainframe team on the file format used to SFTP the flat files to the Hadoop cluster
- Proficient in developing extraction, transformation, and loading (ETL) strategies to fetch the Target proprietary card transactions
- Expert in designing parallel jobs using various stages such as Join, Merge, Lookup, Remove Duplicates, Filter, Dataset, Lookup File Set, and Complex Flat File
- Involved in creating Hive tables and in loading and analysing data with Hive queries; implemented partitioning and dynamic partitions in Hive and implemented the business rules for reward accumulation (see the dynamic-partition sketch after this project)
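
As one way the dynamic-partition loading mentioned above can look, here is a hedged sketch through Spark's HiveContext; the table and column names are hypothetical.

    // Illustrative sketch only: load a partitioned reward table using Hive
    // dynamic partitions, letting Hive derive the partition value from the SELECT.
    import org.apache.spark.sql.hive.HiveContext

    def loadRewards(hc: HiveContext): Unit = {
      hc.sql("SET hive.exec.dynamic.partition = true")
      hc.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

      // The partition column (txn_date) must come last in the SELECT list
      hc.sql(
        """INSERT OVERWRITE TABLE reward_accumulation PARTITION (txn_date)
          |SELECT card_id, SUM(points) AS points, txn_date
          |FROM card_transactions
          |GROUP BY card_id, txn_date""".stripMargin)
    }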
Confidential
Hadoop Data Engineer & Support
Responsibilities:
- Understood the logging of payment-related tables in MySQL and the process used to dump the data into the mainframe system
- Worked with the mainframe team on the file format used to SFTP the flat files to the Hadoop cluster
- Proficient in developing extraction, transformation, and loading (ETL) strategies for various card processors such as Mastercard, Visa, Discover, and American Express, as well as Target in-house cards
- Expert in designing parallel jobs using various stages such as Join, Merge, Lookup, Remove Duplicates, Filter, Dataset, Lookup File Set, and Complex Flat File
- Involved in creating Hive tables and in loading and analysing data with Hive queries; implemented partitioning, dynamic partitions, and buckets in Hive (see the bucketing sketch after this project)
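
The sketch below shows one possible form of the bucketed, partitioned Hive tables mentioned above, via Spark's HiveContext; the table, column, and bucket count are hypothetical.

    // Illustrative sketch only: a bucketed, partitioned card transaction table.
    // Bucketing by card_id spreads rows across a fixed number of files,
    // which helps sampling and bucketed map joins.
    import org.apache.spark.sql.hive.HiveContext

    def createBucketedTable(hc: HiveContext): Unit = {
      hc.sql(
        """CREATE TABLE IF NOT EXISTS card_txn_bucketed (
          |  card_id STRING,
          |  processor STRING,
          |  amount DOUBLE)
          |PARTITIONED BY (txn_date STRING)
          |CLUSTERED BY (card_id) INTO 32 BUCKETS
          |STORED AS ORC""".stripMargin)

      // Enforce bucketing on insert so Hive writes the expected file layout
      hc.sql("SET hive.enforce.bucketing = true")
    }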
Confidential
Siebel Developer and Support
Responsibilities:
- Implemented OAGIS (a standard message exchange between integrated systems)
- Integrated new source systems within short timeframes
- Customized rules for data survivorship and matching
- Customized the Siebel user interface for capturing master entity data
- Implemented workflows for batch uploads of large data volumes
- Wrote SQL queries to ensure that created records were stored in the database and to verify the correct association of pick lists and MVGs with a record
- Knowledge of the split tool to test data with batch loads
- Continuously monitored Siebel servers in all environments and sent reports to the team
- Framed the master test suite / regression test suite in Test Lab
- Managed test execution and produced daily reports / weekly status reports for onsite team members and the senior manager
- Knowledge of Unix to move the SRF to lower environments
- Knowledge of sending requests and receiving responses in SoapUI for integration testing
- Established “best practices” and planned for continuous improvement of processes
- Project Monitoring and Metrics Reporting