Hadoop Developer Resume
MN
SUMMARY
- 4+ years of IT experience across the complete software development life cycle, applying object-oriented analysis and design with Big Data/Hadoop, SQL, Java, and J2EE technologies.
- Excellent understanding of Hadoop architecture, its daemons, and core components such as HDFS, YARN, Resource Manager, Node Manager, Name Node, and Data Node.
- Hands-on experience working with Hadoop ecosystem components like HDFS, Map Reduce, Hive, Pig, Sqoop, Oozie, Flume, and HBase.
- Experience with AWS services such as S3, EMR, and EC2.
- Experience in writing Map Reduce programs in Java for Data Analysis.
- Experience in tuning performance using partitioning and bucketing in Hive (a brief sketch follows this summary).
- Hands-on experience working with structured and unstructured data in various file formats such as Avro, XML, JSON, Parquet, and Sequence files using Map Reduce programs, Hive, and Spark.
- Experience in importing and exporting data between relational database systems and HDFS/Hive using Sqoop.
- Working experience with Hadoop clusters using the Cloudera (CDH) and Hortonworks distributions.
- Experience running Spark on YARN.
- Experience in job management using the Autosys scheduler, developing job processing scripts with Oozie workflows, and working knowledge of ZooKeeper.
- Hands-on NoSQL database experience with HBase and Cassandra.
- Experienced in handling streaming data like web server log data using Flume.
- Experience in developing Pig Latin Scripts for transformations and using Hive Query Language for data analytics.
- Exposure to Spark, Spark SQL, and Spark Streaming, and to creating DataFrames in Spark with Scala and Python.
- Experience as a Java Developer in client/server technologies using J2EE Servlets, JSP, JDBC and SQL.
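The sketch below is a minimal, hypothetical illustration of the Hive partitioning and bucketing approach mentioned above, written in PySpark; the table and column names (web_logs, event_date, user_id) are illustrative only and not taken from any project.

```python
from pyspark.sql import SparkSession

# Hypothetical sketch of a partitioned, bucketed Hive table and a partition-pruned query.
spark = (SparkSession.builder
         .appName("hive-partition-bucket-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Partition by event_date so a date-filtered query scans only matching directories;
# bucket by user_id to help joins and sampling on that key.
spark.sql("""
    CREATE TABLE IF NOT EXISTS web_logs (
        user_id STRING,
        url     STRING,
        status  INT
    )
    PARTITIONED BY (event_date STRING)
    CLUSTERED BY (user_id) INTO 32 BUCKETS
    STORED AS PARQUET
""")

# Partition pruning: only the event_date='2017-01-01' partition is read.
daily = spark.sql(
    "SELECT user_id, count(*) AS hits "
    "FROM web_logs WHERE event_date = '2017-01-01' "
    "GROUP BY user_id")
daily.show()
```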
TECHNICAL SKILLS
Big Data Ecosystem: HDFS, Map Reduce, Hive, Pig, Sqoop, Flume, HBase, and Oozie.
Operating Systems: RedHat Linux, UNIX, Windows
Languages: Java, Scala, Python, Pig Latin, HQL, SQL.
Databases: Oracle, MS SQL Server, MySQL.
NoSQL Databases: Cassandra and HBase.
Hadoop Distributions: Cloudera and Hortonworks.
Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, Java Beans.
Tools: SQL Developer, Impala, Hue, MS Office, Tableau.
PROFESSIONAL EXPERIENCE:
Confidential, MN
Hadoop Developer
Responsibilities:
- Worked on analyzing the Hadoop cluster using different big data analytic tools including Pig, Hive, Sqoop, Spark, Oozie, and Impala with the Cloudera distribution.
- Implemented Map Reduce programs to handle semi-structured and unstructured data such as XML, JSON, Avro, and sequence files for log files.
- Worked with Sqoop to import data from RDBMS into Hive tables.
- Created Partitioned and Bucketed Hive tables in Parquet File Formats and then loaded data into Parquet Hive tables from Avro Hive tables.
- Implemented Spark scripts using Python and Spark SQL to access Hive tables in Spark for faster data processing.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs (a condensed sketch follows this project).
- Developed Spark DataFrames that read data from Spark RDDs.
- Used Tableau for data visualization and generating reports.
- Used Pig to perform transformations, event joins, filtering, and some pre-aggregations before storing the data on HDFS.
- Managed jobs using the Autosys scheduler and developed job processing scripts using Oozie workflows.
- Wrote HBase queries for finding different metrics.
- Developed shell scripts for running Hive scripts in Hive and Impala.
- Migrated existing Map Reduce programs to Spark using Java.
- Responsible for code reviews, finding bugs, and bug fixing to improve performance.
Environment: HDFS, Map Reduce, Hive, Pig, Impala, Apache Spark, SQL, Sqoop, HBase, Oozie, Tableau, Cloudera, AWS
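A condensed PySpark sketch of the S3-to-RDD-to-DataFrame flow described in this project; the bucket path, record layout, and table name are hypothetical placeholders.

```python
from pyspark.sql import SparkSession, Row

spark = (SparkSession.builder
         .appName("s3-rdd-dataframe-sketch")
         .enableHiveSupport()
         .getOrCreate())
sc = spark.sparkContext

# Read raw log lines from S3 into an RDD (bucket and path are placeholders).
lines = sc.textFile("s3a://example-bucket/raw/logs/")

# Transformations: split and filter malformed records; action: count the survivors.
parsed = (lines.map(lambda line: line.split("\t"))
               .filter(lambda fields: len(fields) == 3)
               .map(lambda f: Row(user_id=f[0], url=f[1], status=int(f[2]))))
print("valid records:", parsed.count())

# Build a DataFrame from the RDD and persist it as a Parquet-backed Hive table.
df = spark.createDataFrame(parsed)
df.write.mode("overwrite").format("parquet").saveAsTable("logs_parquet")
```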
Confidential, Orlando, FL
Hadoop Developer
Responsibilities:
- Involved in ingesting data received from various relational database providers into HDFS for analysis and other big data operations.
- Wrote Map Reduce jobs to perform operations like copying data on HDFS and defining job flows on EC2 server.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs (a brief sketch follows this project).
- Created Hive tables, loaded the data, and performed data manipulations using Hive queries in Map Reduce execution mode.
- Designed Hive external tables using a shared metastore instead of Derby, with dynamic partitioning and bucketing.
- Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
- Designed and implemented Spark jobs to support distributed data processing.
- Developed Pig Latin scripts to extract and filter relevant data from the web server output files to load into HDFS.
- Used complex data types like bags, tuples and maps in Pig for handling data.
- Imported data from various sources into the Cassandra cluster using Sqoop.
- Worked on creating data models for Cassandra from the existing Oracle data model.
- Used Hue to read and create tables in Hive.
- Read data from Flume and was involved in pushing batches of data to HDFS.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS.
- Developed UNIX shell scripts for creating reports from Hive data.
Environment: HDFS, Map Reduce, Hive, Pig, Hue, Spark, Oracle, Sqoop, Oozie, Flume, Cassandra, Hortonworks, AWS
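A hedged sketch of rewriting a Hive aggregation as Spark RDD transformations, as in the Hive-to-Spark conversion bullet above; the orders table and its columns are hypothetical.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-to-rdd-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Original Hive query (hypothetical table):
#   SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id
orders = spark.table("orders")  # Hive table read through the shared metastore

# The same aggregation expressed as RDD transformations instead of SQL.
totals = (orders.rdd
          .map(lambda row: (row["customer_id"], float(row["amount"])))
          .reduceByKey(lambda a, b: a + b))

for customer_id, total in totals.take(10):
    print(customer_id, total)
```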
Confidential, Dallas, Texas
Hadoop Developer
Responsibilities:
- Involved in entire software development life cycle of the project.
- Developed, Monitored and Optimized Map Reduce jobs for data cleaning and pre-processing.
- Handled importing data from various data sources, performed transformations using Hive, Map Reduce, and loaded data into HDFS.
- Involved in creating Hive tables, loading data, and running Hive queries on that data.
- Implemented partitions and buckets in Hive for optimization.
- Used DML statements to perform different operations on Hive tables.
- Worked on custom Pig loaders and storage classes to work with a variety of data formats such as JSON and XML.
- Imported and exported data into HDFS and Hive using Sqoop (a scripted example follows this project).
- Extracted files from the NoSQL database Cassandra using Sqoop.
- Implemented secondary sorting to sort reducer output globally in Map Reduce.
- Managed and scheduled jobs to remove duplicate log data files in HDFS using Oozie.
- Developed Pig functions to preprocess the data for analysis.
- Created HBase tables to store all data.
- Involved in loading data from UNIX file system to HDFS.
Environment: HDFS, Map Reduce, Hive, Pig, Sqoop, Cassandra
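One way to script the Sqoop import mentioned in this project from Python; the JDBC URL, credentials path, and table names are placeholders, and in practice such values would come from a secured configuration.

```python
import subprocess

# Placeholder connection details.
jdbc_url = "jdbc:mysql://db-host:3306/sales"

# Import an RDBMS table into Hive via Sqoop (standard sqoop import options).
cmd = [
    "sqoop", "import",
    "--connect", jdbc_url,
    "--username", "etl_user",
    "--password-file", "/user/etl/.sqoop_password",
    "--table", "orders",
    "--hive-import",
    "--hive-table", "staging.orders",
    "--num-mappers", "4",
]
subprocess.run(cmd, check=True)
```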
Confidential
Java Developer
Responsibilities:
- Involved in designing of shares and cash modules using UML.
- Followed Java & J2EE design patterns and the coding guidelines to design and develop the application.
- Worked on backend services in Spring MVC and OpenEJB for interaction with Oracle and the mainframe using DAO and model objects.
- Used HTML and JSP for the web pages and used JavaScript for Client-side validation.
- Introduced Spring IoC to increase application flexibility and replace the need for hard-coded, class-based application functions.
- Used mainframe screen scraping for adding forms to mainframe through the claims data entry application.
- Created View Objects, View Links, Association Objects, and Application Modules with data validation rules (exposing linked views in an application module), LOVs, dropdowns, value defaulting, and transaction management features.
- Used JDBC 2.0 extensively and was involved in writing several SQL queries for the data retrieval.
- Involved in configuring Spring MVC and integrating it with Hibernate.
- Prepared program specifications for the loans module and involved in database designing.
- Used the Eclipse 3.5 IDE for code development and deployed on WebLogic Server.
- Used servlet programming to connect to the database server and retrieve serialized data.
- Developed stored procedures and triggers using PL/SQL to calculate and update the tables to implement business logic.
Environment: Java, J2EE, JDBC 2.0, Servlets, Spring, JSP, Java Beans, Eclipse, PL/SQL