Hadoop Developer Resume
Richmond, VA
SUMMARY:
- Around 8 years of overall experience across a variety of industries, including 4+ years in Big Data technologies (the Apache Hadoop stack and Apache Spark) and 4+ years in Java technologies
- Hands on experience with Cloudera and Hortonworks.
- Hands on experience in Hadoop Ecosystem components such as Hive, Pig, Sqoop, Flume, Impala, Oozie, Zookeeper, HBase.
- Strong knowledge of Hadoop daemons such as Job Tracker, Task Tracker, Name Node and Data Node, as well as HDFS and Map Reduce concepts.
- Hands on experience in writing Map Reduce programs using Java to handle different data sets using Map and Reduce tasks.
- Hands on experience with various Apache Hadoop ecosystem components such as Hadoop, Spark, HDFS, MapReduce, YARN, Tez, HBase, Pig, Hive, Sqoop, Flume, Oozie, and Kafka
- Hands on experience in writing MapReduce jobs in Java, Pig, and Python
- Experience in dealing with SQL in Hadoop with Apache Hive
- Hands on experience in writing Apache Spark SQL and Spark Streaming programs with Scala and Python.
- Developed multiple Map Reduce jobs to perform data cleaning and preprocessing.
- Involved in designing the data model in Hive for migrating the ETL process into Hadoop and wrote Pig Scripts to load data into Hadoop environment
- Designed HIVE queries & Pig scripts to perform data analysis, data transfer and table design.
- Expertise in writing Hive UDFs and Generic UDFs to incorporate complex business logic into Hive queries (see the UDF sketch following this summary).
- Experienced in optimizing Hive queries by tuning configuration parameters.
- Implemented SQOOP for large dataset transfer between Hadoop and RDBMS.
- Extensively used Apache Flume to collect the logs and error messages across the cluster.
- Experience in implementing real-time streaming and analytics using Spark Streaming and Kafka
- Experience in data ingestion using Sqoop from RDBMS to HDFS and Hive and vice-versa
- Proficient in Java/J2EE technologies - Core Java, JSP, Java Beans, Java Servlets, Ajax, JDBC, ODBC, Web Services, Swing, Hibernate, Spring, Struts, XML and XSLT
- Performed data analysis using MySQL, SQL Server Management Studio and Oracle
- Experience with ETL tools including Informatica, Talend and SSIS
- Experience in working with Cloudera (CDH3, CDH4 & CDH5) and Hortonworks Hadoop distributions.
- Hands on experience on AWS infrastructure services Amazon Simple Storage Service (Amazon S3) and Amazon Elastic Compute Cloud (Amazon EC2).
- Worked with Oozie and Zookeeper to manage the flow of jobs and coordination in the cluster
- Experience in performance tuning and monitoring of the Hadoop cluster by gathering and analyzing the existing infrastructure using Cloudera Manager.
- Experience with configuration of Hadoop Ecosystem components: Map Reduce, Hive, HBase, Pig, Sqoop, Oozie, Zookeeper, Flume, Storm, Spark, Yarn, Tez.
- Experience with RESTful services and Amazon Web Services
- Hands on Experience on Amazon’s EC2, EMR and S3
- Conversant with Web/Application Servers - Tomcat, WebSphere, WebLogic and IIS
- Experience in writing Maven and SBT scripts to build and deploy Java and Scala Applications
- Around 2 years of experience with Spark and Scala
- Implemented unit testing with JUnit and MRUnit
- Expertise in Web Application Development with JSP, HTML, CSS, JavaScript, ASP.NET, C#.NET and jQuery
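As a small illustration of the Hive UDF work mentioned above, the sketch below shows what such a function can look like in Java. It is a minimal example rather than code from any of the projects listed here: the function name, the rating-code column and the normalization rule are all hypothetical.

```java
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical UDF: normalizes a vendor-supplied rating code to upper case
// and maps null or empty values to "UNKNOWN".
@Description(name = "normalize_rating",
             value = "_FUNC_(code) - returns a cleaned-up rating code")
public final class NormalizeRating extends UDF {
  public Text evaluate(Text code) {
    if (code == null || code.toString().trim().isEmpty()) {
      return new Text("UNKNOWN");
    }
    return new Text(code.toString().trim().toUpperCase());
  }
}
```

Once packaged into a JAR, a UDF like this is registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION and then called from HiveQL like any built-in function.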
TECHNICAL SKILLS:
Big data Technologies: Hadoop, Map Reduce, HDFS, Hive, Pig, Zookeeper, Sqoop, Oozie, Flume, Impala, HBase, Kafka, Storm
Big Data Frameworks: HDFS, YARN, Spark
Hadoop Distributions: Cloudera (CDH3, CDH4, CDH5), Hortonworks, Amazon EMR
Programming Languages: Java, C, C++, Shell scripting, Scala
Databases: MySQL, Oracle, Microsoft SQL Server, Teradata, DB2, PL/SQL, Cassandra, MongoDB
IDE and Tools: Eclipse, NetBeans, Tableau
Operating System: Windows XP/Vista/7, Linux/Unix
Frameworks: Spring, Hibernate, JSF, EJB, JMS
Scripting Languages: JSP & Servlets, JavaScript, XML, HTML, Python
Application Servers: Apache Tomcat, WebSphere, WebLogic, JBoss
Methodologies: Agile, SDLC, Waterfall
Web Services: RESTful, SOAP
ETL Tools: Talend, Informatica
Others: Solr, Elasticsearch
PROFESSIONAL EXPERIENCE:
Confidential, Richmond, VA
Hadoop Developer
Responsibilities:
- Imported retail and commercial data from various vendors into HDFS using the EDE process and Sqoop.
- Designed the Cascading flow setup from the edge node to HDFS (the data lake)
- Created the Cascading code to perform several types of data transformations as required by the DA (see the Cascading sketch after this list)
- Used Hue to create external Hive tables on both the imported data and the transformed data
- Developed the code for removing or replacing erroneous fields in the data using Cascading
- Created custom functions for several datatype conversions and for handling errors in the data provided by the vendor
- Monitored the Cascading flows using the Driven component to ensure the desired results were obtained
- Optimized a Confidential tool, Docs, for importing the data and converting it into the Parquet file format after validation.
- Involved in testing Spark for exporting data from HDFS to an external database in a POC
- Developed shell scripts for automating the Cascading jobs for the Control-M schedule.
- Involved in testing AWS Redshift connectivity with a SQL database for storing data in a POC
- Developed Hive queries to analyze the data by customer rating ID for several projects
- Converted raw files (CSV, TSV) to other file formats such as Parquet and Avro, with datatype conversion, using Cascading
- Wrote test cases for the Cascading jobs using the Plunger framework.
- Set up the Cascading environment and troubleshot environment issues related to Cascading.
- Assisted in creating and maintaining technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts
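As a rough illustration of the Cascading flows described above, the sketch below reads a delimited vendor file from an edge-node landing path, drops records with an empty key field, and writes the result to the HDFS data lake. It assumes Cascading 2.x on Hadoop; the paths, the comma-delimited layout and the rating_id field are hypothetical placeholders for the real vendor feeds and transformation rules.

```java
import java.util.Properties;
import cascading.flow.Flow;
import cascading.flow.FlowDef;
import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.operation.regex.RegexFilter;
import cascading.pipe.Each;
import cascading.pipe.Pipe;
import cascading.property.AppProps;
import cascading.scheme.hadoop.TextDelimited;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;
import cascading.tuple.Fields;

public class VendorCleanFlow {
  public static void main(String[] args) {
    String inPath = args[0];    // e.g. the edge-node landing directory
    String outPath = args[1];   // e.g. the data-lake target directory

    Properties props = new Properties();
    AppProps.setApplicationJarClass(props, VendorCleanFlow.class);

    // Source and sink taps; the feed is assumed to be comma-delimited with a header row.
    Tap source = new Hfs(new TextDelimited(true, ","), inPath);
    Tap sink = new Hfs(new TextDelimited(true, ","), outPath);

    // Keep only records whose (hypothetical) rating_id field is non-empty.
    Pipe pipe = new Pipe("clean-vendor-feed");
    pipe = new Each(pipe, new Fields("rating_id"), new RegexFilter("^.+$"));

    FlowDef flowDef = FlowDef.flowDef()
        .addSource(pipe, source)
        .addTailSink(pipe, sink);

    Flow flow = new HadoopFlowConnector(props).connect(flowDef);
    flow.complete();
  }
}
```

Flows of this shape are what the Driven monitoring and the Plunger test cases mentioned above operate on.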
Environment: MapReduce, HDFS, Sqoop, Cascading, Linux, Shell, Hadoop, Spark, Hive, AWS Redshift, Hadoop cluster
Confidential, New York, NY
Sr. Hadoop Developer
Responsibilities:
- Worked on analyzing data in the Hadoop cluster using different big data analytics tools including Pig, Hive, and MapReduce
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis
- Used Pig as ETL tool to do transformations, event joins, filtering and some pre-aggregations before storing the data onto HDFS.
- Hands on experience in writing and executing Pig scripts.
- Hands on experience in writing Pig UDFs.
- Configured Oozie workflows to automate data flow, preprocessing and cleaning tasks using Hadoop actions.
- Performed daily monitoring of cluster status and health, including Data Node, Job Tracker, Task Tracker, and Name Node.
- Configured Hadoop ecosystem components: Map Reduce, Hive, HBase, Pig, Sqoop, Oozie, Zookeeper, Flume, Storm, Spark, Yarn, Tez.
- Used the CDH distribution and Cloudera Manager to manage and monitor Hadoop clusters.
- Rendered and delivered reports in the desired formats using reporting tools such as Tableau.
- Worked on debugging and performance tuning of Hive & Pig jobs
- Worked on tuning the performance of Pig queries
- Gained experience in managing and reviewing Hadoop log files
- Created HBase tables to store various data formats coming from different applications (see the HBase sketch after this list)
- Developed ETL Scripts for Data acquisition and Transformation using Talend
- Extensive experience with Talend source and connection configuration, credentials management, and context management
- Implemented and assisted with Talend installations and Talend server setup, including the MDM server
- Implemented a proof of concept to analyze the streaming data using Apache Spark with Scala and Python; used Maven and SBT to build and deploy the Spark programs (see the Spark Streaming sketch after this list)
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs
- Developed simple to complex MapReduce jobs using Java, Pig and Hive
- Developed the application using Eclipse and used Maven as the build and deployment tool
- Exported the analyzed data to the relational databases using Sqoop for visualization
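As a minimal illustration of the HBase table creation mentioned above, the sketch below uses the HBase Java client API (assuming the HBase 1.x client on CDH); the table name and column families are hypothetical, since the real tables were driven by the formats arriving from each application.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class CreateEventTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();   // picks up hbase-site.xml
    try (Connection connection = ConnectionFactory.createConnection(conf);
         Admin admin = connection.getAdmin()) {
      // Hypothetical table with two column families: raw payload and parsed metadata.
      HTableDescriptor table = new HTableDescriptor(TableName.valueOf("app_events"));
      table.addFamily(new HColumnDescriptor("d"));   // raw payload
      table.addFamily(new HColumnDescriptor("m"));   // parsed metadata
      if (!admin.tableExists(table.getTableName())) {
        admin.createTable(table);
      }
    }
  }
}
```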
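The streaming proof of concept above was built with Scala and Python; to keep the examples in this document in one language, here is a comparable sketch using Spark Streaming's Java API with the Kafka direct stream (spark-streaming-kafka-0-10). The broker address, topic name and consumer group are placeholders, and the real analysis logic is omitted.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;
import scala.Tuple2;

public class ClickStreamCounts {
  public static void main(String[] args) throws InterruptedException {
    SparkConf conf = new SparkConf().setAppName("ClickStreamCounts");
    JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));

    Map<String, Object> kafkaParams = new HashMap<>();
    kafkaParams.put("bootstrap.servers", "broker1:9092");    // placeholder broker
    kafkaParams.put("key.deserializer", StringDeserializer.class);
    kafkaParams.put("value.deserializer", StringDeserializer.class);
    kafkaParams.put("group.id", "streaming-poc");             // hypothetical group id
    kafkaParams.put("auto.offset.reset", "latest");

    JavaInputDStream<ConsumerRecord<String, String>> stream =
        KafkaUtils.createDirectStream(
            jssc,
            LocationStrategies.PreferConsistent(),
            ConsumerStrategies.<String, String>Subscribe(
                Collections.singletonList("clicks"), kafkaParams));

    // Count occurrences of each message value in every 30-second batch
    // and print a sample of the counts to the driver log.
    stream.mapToPair(record -> new Tuple2<>(record.value(), 1L))
          .reduceByKey((a, b) -> a + b)
          .print();

    jssc.start();
    jssc.awaitTermination();
  }
}
```

The same flow in Scala or Python is structurally identical; only the language bindings change.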
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Sqoop, Flume, Java, Oracle 10g, MySQL, SQL Server, Ubuntu, Agile, YARN, Spark, Hortonworks, Teradata, Talend, UNIX Shell Scripting, Oozie, Maven, Eclipse
Confidential, NY
Hadoop Developer
Responsibilities:
- Used Sqoop to extract data from Oracle SQL and MySQL databases into HDFS
- Developed workflows in Oozie to extract the data using Sqoop per business requirements
- Developed MapReduce (YARN) jobs for cleaning, accessing and validating the data
- Used Hive and Impala to query the data in HBase
- Wrote multiple MapReduce programs in Java for data extraction, transformation and aggregation from multiple file formats including XML, JSON, CSV and other compressed file formats (see the sketch after this list)
- Wrote Hive scripts in HiveQL to de-normalize and aggregate the data
- Used Solr for querying and searching the HBase database
- Optimized the existing Hive and Pig Scripts
- Created external tables using Hive to perform analysis in HDFS
- Involved in loading data from UNIX file system to HDFS
- Designed workflows by scheduling Hive processes for log file data, which is streamed into HDFS using Flume
- Implemented a query in Neo4j to search for clients by their respective fields
- Developed schemas to handle reporting requirements using Tableau
- Worked with a team on NoSQL databases such as MongoDB for a POC (proof of concept) storing documents using GridFS.
- Developed a deep and thorough understanding of ETL tools and how they can be applied in a Big Data environment
- Worked with application teams to install operating systems, Hadoop updates, patches, version upgrades as required.
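As an illustration of the Java MapReduce programs described above, the sketch below parses a simple CSV feed and aggregates an amount per client. The column layout and field meanings are hypothetical; the XML and JSON variants mentioned above would swap in the appropriate parsing inside the mapper.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CsvAggregateJob {

  // Emits (client_id, amount) from each CSV line; the column positions are hypothetical.
  public static class ParseMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      String[] cols = value.toString().split(",");
      if (cols.length < 3 || !cols[2].matches("\\d+")) {
        return;   // skip malformed records
      }
      context.write(new Text(cols[0]), new LongWritable(Long.parseLong(cols[2])));
    }
  }

  // Sums the amounts per client id.
  public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
        throws IOException, InterruptedException {
      long total = 0;
      for (LongWritable v : values) {
        total += v.get();
      }
      context.write(key, new LongWritable(total));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "csv-aggregate");
    job.setJarByClass(CsvAggregateJob.class);
    job.setMapperClass(ParseMapper.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Because the aggregation is a plain sum, the reducer could also be registered as a combiner with job.setCombinerClass(SumReducer.class) to cut shuffle volume.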
Environment: Hadoop, Map Reduce, HiveQL, Hive, HBase, Sqoop, Solr, Flume, Tableau, Impala, Oozie, MySQL, Oracle SQL, Java, Unix Shell, YARN, Pig Latin.
Confidential, Great Neck, NY
Hadoop and Java Developer
Responsibilities:
- Worked as a senior developer for the project
- Used Enterprise Java Beans as a middleware in developing a three-tier distributed application
- Developed Session Beans and Entity Beans for business and data processing
- Implemented Web Services with REST
- Developed user interface using HTML, CSS, JSPs and AJAX
- Implemented client-side validation using JavaScript and jQuery
- Applied server-side validation to the web pages in addition to the client-side checks.
- Used JIRA for bug tracking of the web application.
- Wrote Spring Core and Spring MVC configuration to associate DAOs with the business layer (see the controller sketch after this list).
- Worked with HTML, DHTML, CSS, and JavaScript in UI pages.
- Wrote Web Services using SOAP for sending and getting data from the external interface.
- Extensively worked with JUnit framework to write JUnit test cases to perform unit testing of the application
- Implemented JDBC modules in Java beans to access the database.
- Designed the tables for the back-end Oracle database.
- Hosted the application under WebLogic and developed it using the Eclipse IDE.
- Used XSL/XSLT for transforming and displaying reports. Developed Schemas for XML.
- Involved in writing the ANT scripts to build and deploy the application.
- Developed a web-based reporting for monitoring system with HTML and Tiles using Struts framework.
- Implemented field level validations with AngularJS, JavaScript and jQuery
- Prepared unit test scenarios and unit test cases
- Branding the site with CSS
- Code review and unit testing the code
- Involved in unit testing using JUnit
- Implemented Log4J to trace logs and to track information
- Involved in project discussions with clients and analyzed complex project requirements as well as prepared design documents
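As a small illustration of the Spring MVC and RESTful web service work above, the sketch below shows a controller method that returns JSON, assuming Spring MVC 3.x with Jackson on the classpath. The URL, field names and hard-coded response are placeholders for the real DAO-backed business layer.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.ResponseBody;

// Hypothetical REST endpoint; names and URL layout are illustrative only.
@Controller
@RequestMapping("/api/clients")
public class ClientLookupController {

  @RequestMapping(value = "/{id}", method = RequestMethod.GET,
                  produces = "application/json")
  @ResponseBody
  public Map<String, Object> getClient(@PathVariable("id") long id) {
    // In the real application this delegated to the business layer and DAOs.
    Map<String, Object> client = new LinkedHashMap<String, Object>();
    client.put("id", id);
    client.put("status", "ACTIVE");
    return client;
  }
}
```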
Environment: Hive, Pig, HBase, Zookeeper, Sqoop, Cloudera, Java, JDBC, JNDI, Struts, Maven, Trac, Subversion, JUnit, SQL, Spring, Hibernate, Oracle, XML, Altova XMLSpy, PuTTY and Eclipse.
Confidential
Java Developer
Responsibilities:
- Coded using Java, JSP, and HTML.
- Developed front end validations using JavaScript and developed design and layouts of JSPs and custom taglibs for all JSPs.
- Participated in planning and development of UML diagrams like Use Case Diagrams, Object Diagrams, Class Diagrams and Sequence Diagrams to represent the detail design phase.
- Implemented several test cases using JUnit.
- Implemented the Apache Log4j logging component in the application (see the sketch after this list).
- Made builds and deployed them onto the common development test environment, a WebSphere Application Server environment, to verify functional requirements.
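A minimal sketch of the Log4j usage mentioned above; the class name and messages are hypothetical, and the appenders and log thresholds would be configured in a separate log4j.properties file.

```java
import org.apache.log4j.Logger;

// Hypothetical service class showing the tracing and error-logging pattern.
public class ReportUploadService {
  private static final Logger LOG = Logger.getLogger(ReportUploadService.class);

  public void upload(String fileName) {
    LOG.info("Starting upload for " + fileName);
    try {
      // ... actual upload logic would live here ...
      LOG.debug("Upload completed for " + fileName);
    } catch (RuntimeException e) {
      LOG.error("Upload failed for " + fileName, e);
      throw e;
    }
  }
}
```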
Environment: Java, J2EE, Tomcat, JSP and Struts Framework, Eclipse, SQL and Oracle.