Hadoop Research Assistant Resume
SUMMARY:
- Hadoop Developer with 1 1/2 years of programming and software development experience, with skills in the design, development and deployment of software systems from the development stage through production in Big Data technologies.
- Experience with Big Data and Hadoop Ecosystem tools such as Pig, Hive, Sqoop, Oozie, Kafka and ZooKeeper.
- Experience in creating Pig Latin scripts and Java UDFs for efficient data analysis.
- Experience in creating Hive queries and Java UDFs for efficient data analysis.
- Knowledge of Hadoop Gen2 HDFS Federation, High Availability and the YARN architecture.
- Expert in using Sqoop to import data from different source systems into HDFS for analysis, and to export the results back to those systems for further processing.
- Experience with Oozie Workflow Engine in running workflow jobs with actions that run Hadoop Map/Reduce and Pig jobs.
- Excellent knowledge of Hadoop Ecosystem architecture and components such as the Hadoop Distributed File System (HDFS), MRv1, MRv2, JobTracker, TaskTracker, NameNode, DataNode, ResourceManager, NodeManager and MapReduce programming.
- Good experience in optimizing MapReduce algorithms using mappers, reducers, combiners and partitioners to deliver the best results for large datasets.
- Experience in using IDEs like Eclipse and NetBeans.
- Strong knowledge of Software Development Life Cycle and expertise in detailed design documentation.
- Extensive experience with Waterfall and Agile Scrum Methodologies.
- Experience in working with Apache Kafka.
- Strong expertise in the MapReduce programming model with XML, JSON and CSV file formats.
- Highly proficient in Object Oriented Programming concepts.
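The combiner optimization mentioned above can be illustrated with a minimal sketch in plain Java (no Hadoop dependencies; the class, methods and input data are hypothetical, for illustration only): it simulates how each map task's output is pre-aggregated locally before the shuffle, reducing the volume of data sent to the reducers.

```java
import java.util.*;
import java.util.stream.*;

// Illustrative simulation only, not the Hadoop API: shows how a combiner
// pre-aggregates each split's (word, 1) map output before the reduce phase.
public class CombinerSketch {

    // "Map" one input split to word counts, combining locally as we go.
    static Map<String, Integer> mapAndCombine(String split) {
        Map<String, Integer> partial = new HashMap<>();
        for (String word : split.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) partial.merge(word, 1, Integer::sum); // combiner: local sum
        }
        return partial;
    }

    // "Reduce": merge the combined partial counts from all splits.
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> totals = new HashMap<>();
        for (Map<String, Integer> p : partials)
            p.forEach((w, c) -> totals.merge(w, c, Integer::sum));
        return totals;
    }

    public static void main(String[] args) {
        List<String> splits = List.of("big data big cluster", "data data pipeline");
        Map<String, Integer> counts = reduce(
                splits.stream().map(CombinerSketch::mapAndCombine).collect(Collectors.toList()));
        System.out.println(counts.get("data")); // 3
    }
}
```

Because each split ships one count per distinct word instead of one record per occurrence, the shuffle stage moves far less data on large, repetitive datasets.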
TECHNICAL SKILLS:
Hadoop/Big Data Technologies: Hadoop 2.x, HDFS, MapReduce, Pig 0.14.0, Hive 1.1.0, Sqoop 1.4.6, Cloudera CDH 4, Kafka, Oozie, Avro, YARN and Zookeeper 3.5.0.
Programming Languages: Java, C and Matlab
Scripting/Web Technologies: JavaScript, HTML, XML, Shell Scripting, Python, JSON.
Databases: Oracle and MySQL
Operating Systems: Linux, UNIX and Windows.
Java IDEs: Eclipse and NetBeans.
Visualization Tools: Tableau
WORK EXPERIENCE:
Hadoop Research Assistant
Confidential
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Experience in installing, configuring and using Hadoop Ecosystem components.
- Experience in importing and exporting data into HDFS and Hive using Sqoop.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Responsible for managing data coming from different sources.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Applied the MapReduce programming model to XML, JSON and CSV file formats.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Responsible for building scalable distributed data solutions using Hadoop.
- Experience in managing and reviewing Hadoop log files.
- Involved in loading data from LINUX file system to HDFS.
- Implemented test scripts to support test driven development and continuous integration.
- Created Pig Latin scripts to sort, group, join and filter the enterprise-wide data.
- Worked on tuning the performance of Pig queries.
- Mentored the analyst and test teams in writing Hive queries.
- Installed Oozie workflow engine to run multiple MapReduce jobs.
- Extensive working knowledge of partitioned tables, UDFs, performance tuning and compression-related properties in Hive.
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
- Worked with the Data Science team to gather requirements for various data mining projects.
Environment: Cloudera CDH 4, HDFS, Hadoop 2.2.0 (YARN), Eclipse, MapReduce, Hive 1.1.0, Pig 0.14.0, Java, SQL, Sqoop 1.4.6, CentOS, ZooKeeper 3.5.0
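A representative Pig Latin script of the sort described above (field names, paths and the amount threshold are hypothetical, for illustration only) might combine the four operations, filter, join, group and sort, like this:

```pig
-- Hypothetical schemas and paths, for illustration only.
orders = LOAD '/data/orders.csv' USING PigStorage(',')
         AS (order_id:int, cust_id:int, amount:double);
custs  = LOAD '/data/customers.csv' USING PigStorage(',')
         AS (cust_id:int, region:chararray);

big_orders = FILTER orders BY amount > 1000.0;              -- filter
joined     = JOIN big_orders BY cust_id, custs BY cust_id;  -- join
by_region  = GROUP joined BY custs::region;                 -- group
totals     = FOREACH by_region GENERATE group AS region,
             SUM(joined.big_orders::amount) AS total;
ranked     = ORDER totals BY total DESC;                    -- sort
STORE ranked INTO '/output/region_totals';
```

Each relation is a named intermediate result, so scripts like this are straightforward to tune step by step, for example by filtering early to shrink the join input.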
Hadoop Java Developer
Confidential
Responsibilities:
- As a Big Data Developer, implemented solutions for ingesting data from various sources and processing the data at rest using Big Data technologies such as Hadoop, the MapReduce framework, Hive and Sqoop.
- Supported MapReduce programs running on the cluster.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Developed Apache Pig and Hive scripts to process data in HDFS.
- Involved in unit testing; delivered unit test plans and results documents using JUnit and MRUnit.
- Wrote Hive jobs to parse logs and structure them in tabular format to facilitate effective querying of the log data.
- Involved with File Processing using Pig Latin.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Experience in optimizing MapReduce algorithms using combiners and partitioners to deliver the best results, and worked on application performance optimization for an HDFS cluster.
- Implemented business logic by writing UDFs in Java and used various UDFs from Piggybank and other sources.
Environment: Java, Hadoop, MapReduce, Pig, Hive, Linux, Sqoop, Flume, Eclipse and Cloudera CDH.
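The partitioned-table and log-parsing work described above could look like the following HiveQL sketch (table name, schema, paths and dates are hypothetical, for illustration only):

```sql
-- Hypothetical log schema and partition layout, for illustration only.
CREATE TABLE access_logs (
  ip     STRING,
  ts     STRING,
  url    STRING,
  status INT
)
PARTITIONED BY (log_date STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- Load one day's parsed logs into its own partition.
LOAD DATA INPATH '/staging/logs/2015-06-01'
INTO TABLE access_logs PARTITION (log_date = '2015-06-01');

-- A query Hive compiles to MapReduce: server-error counts per URL for one day.
SELECT url, COUNT(*) AS errors
FROM access_logs
WHERE log_date = '2015-06-01' AND status >= 500
GROUP BY url;
```

Partitioning by date lets the query read only the one day's directory instead of scanning the whole log history, which is the main performance lever for tables like this.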