- 6+ years of experience with the Apache Hadoop ecosystem, including HDFS, YARN, MapReduce, Pig, Hive, Sqoop, Flume, Impala, Hue, and Oozie
- 2+ years of experience in MySQL Database testing
- Good understanding of the Hadoop architecture, the underlying Hadoop framework, and its storage management system (HDFS)
- Experience in writing SQL query scripts for MySQL
- Experience in writing MapReduce programs using Java
- Experience in writing Sqoop scripts for importing and exporting structured data between HDFS and MySQL
- Experience in writing Flume agent configurations for ingesting unstructured data from the local file system into HDFS
- Experience in writing Pig Latin scripts for processing data imported into HDFS
- Experience in designing efficient Hive schemas using partitioning, bucketing, and data compression
- Experience in writing HiveQL scripts for data analysis
- Experience in writing Linux Bash scripts for HDFS maintenance and for automating and scheduling repetitive jobs
- Experience in using Oozie to schedule and coordinate Bash (shell), Sqoop, Pig, Hive, and Java actions
- Good knowledge of Waterfall and Agile development methodologies
- Able to collaborate effectively with other technology groups and business stakeholders
- Able to work independently or as a team member and handle multiple projects at the same time
- Experience working with offshore teams and with team members located across different time zones
- Excellent analytical, interpersonal, and communication skills
- Fast learner of new technologies with a positive attitude toward an ever-changing industry.
SDLC Methodologies: Agile (Scrum) and Waterfall
Hadoop Ecosystem: HDFS, YARN, MapReduce, Pig, Hive, HCatalog, Sqoop, Flume, Impala, Hue, and Oozie
SQL-based Technologies: MySQL
Operating Systems: Linux; Windows XP, 7, 8.x, and 10
Scripting Languages: Bash
Programming Languages: Java
Hadoop Distribution: Cloudera, Hortonworks
Documentation Suite: MS Office
Project Management Suite: ALM
- Work in a fast-paced Agile development environment
- Collaborate with the Business Analysts and Database teams to understand the business requirements
- Provide design recommendations for the design and development of the Data Flow System
- Ensure that the offshore team fully understands the business requirements, and answer their questions as needed
- Write Sqoop scripts to import data from MySQL into HDFS
- Write Pig Latin scripts to transform the imported HDFS data based on a set of criteria provided by the business
- Create partitioned and bucketed Hive table schemas that store data in a compressed format
- Use HCatalog to load the Pig output data into Hive
- Write HiveQL scripts to generate the desired results based on the business requirements
- Write Linux Bash scripts to perform HDFS maintenance and automate other routine activities
- Write Oozie workflows to schedule each stage of the process: data import, data transformation, loading the data into Hive, and HDFS maintenance
- Carry out end-to-end testing of the Data Flow System
- Take part in deploying code to Production.
Environment: Cloudera, MySQL, HDFS, YARN, Sqoop, MapReduce, Pig, HCatalog, Hive, Impala, Hue, Linux Bash, Oozie, and Microsoft Office.
- Work in a traditional Waterfall Development Environment
- Take part in review meetings and contribute to the development of the business requirements
- Develop data processing algorithms based on the outcomes of the review meetings
- Write Sqoop scripts and Flume configurations to import data from MySQL and the local file system, respectively, into HDFS
- Write Pig Latin scripts to filter the imported HDFS data according to the requirements and store the results in HDFS for Hive ingestion
- Create partitioned and bucketed Hive table schemas that store data in a compressed format for optimal performance
- Write HiveQL scripts to generate the desired results based on the business needs
- Write Linux Bash scripts to perform HDFS maintenance, automate routine activities, and schedule jobs.
Environment: MySQL, HDFS, YARN, Sqoop, MapReduce, Pig, Hive, Linux Bash, Oozie, and Microsoft Office.
- Worked in an Agile development environment
- Coordinated with the Development team to design a MySQL database
- Populated and maintained the MySQL database listing all of the products carried by the company
- Wrote SQL scripts used by the Development team during website development
- Carried out extensive Database testing of the product tables in conjunction with the online sales website.
Environment: Linux, MySQL, ALM, and Microsoft Office.