Hadoop Developer Resume
SUMMARY:
- Proactive IT developer with 8+ years of experience in the design and development of scalable systems using Hadoop technologies across various environments.
- Experience in installing, configuring, supporting and managing Hadoop clusters using Hortonworks and Cloudera (CDH3, CDH4) distributions on Amazon Web Services (AWS).
- Excellent understanding of Hadoop architecture and hands-on experience with Hadoop components such as JobTracker, TaskTracker, NameNode, DataNode and the HDFS framework.
- Extensive experience in analyzing data using Hadoop ecosystem components including HDFS, Hive, Pig, Sqoop, Flume, MapReduce, Spark, Kafka, HBase, Oozie, Solr and ZooKeeper.
- Capable of processing large sets of structured, semi-structured and unstructured data and supporting systems application architecture.
- Extensive knowledge of NoSQL databases like HBase, Cassandra and MongoDB.
- Configured ZooKeeper, Cassandra and Flume on the existing Hadoop cluster.
- Experience in importing and exporting data between the Hadoop Distributed File System (HDFS) and relational database systems using Sqoop.
- Expertise in writing Hadoop jobs for analyzing data using HiveQL (queries), Pig Latin (data flow language) and custom MapReduce programs in Java.
- Experience in creating custom UDFs for Pig and Hive to incorporate methods and functionality of Python/Java into Pig Latin and HQL (HiveQL).
- Experience in converting Hive queries into Spark transformations using Spark RDDs and Scala.
- Hands-on experience in troubleshooting errors in HBase Shell, Pig, Hive and MapReduce.
- Hands-on experience in provisioning and managing multi-tenant Cassandra clusters on public cloud environments: Amazon Web Services (AWS) EC2 and OpenStack.
- Experience in NoSQL column-oriented databases like HBase and Cassandra and their integration with Hadoop clusters.
- Experience in maintaining the big data platform using open source technologies such as Spark and Elasticsearch.
- Experience in configuring Flume agents to transfer data from external systems to HDFS.
- Good experience with Solr and the NoSQL database HBase.
- Implemented clusters for the NoSQL tools Cassandra and MongoDB as part of a POC to address HBase limitations.
- Designed and developed a solution for real-time data ingestion using Kafka, Storm, Spark Streaming and various NoSQL databases.
- Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation and queries, and wrote data back into RDBMS through Sqoop.
- Experience in understanding the security requirements for Hadoop and integrating with Kerberos authentication and authorization infrastructure.
- Good hands-on experience in creating RDDs and DataFrames for the required input data and performing data transformations using Spark with Scala.
- Experience in working with various Cloudera distributions (CDH4/CDH5), Hortonworks and Amazon EMR Hadoop distributions.
- Knowledge of developing a NiFi flow prototype for data ingestion into HDFS.
- Developed automated scripts using Unix shell for performing RUNSTATS, REORG, REBIND, COPY, LOAD, BACKUP, IMPORT, EXPORT and other database-related activities.
- Experience in analyzing, designing and developing ETL strategies and processes, writing ETL specifications and Informatica development.
- Extensive experience working with Oracle, DB2, SQL Server, PL/SQL and MySQL databases and core Java concepts like OOP, multithreading, collections and I/O.
- Good working knowledge of object-oriented programming.
- Experienced in designing web applications using HTML5, CSS3, JavaScript, JSON, jQuery, AngularJS, Bootstrap and Ajax on the Windows operating system.
- Experience in service-oriented architecture using web services like SOAP and REST.
- Knowledge of service-oriented architecture (SOA), workflows and web services using XML, SOAP and WSDL.
- Extensive experience in middle-tier development using J2EE technologies like JDBC, JNDI, JSP, Servlets, JSF, Struts, Spring, Hibernate and EJB.
- Good experience in working with the Tableau visualization tool using Tableau Desktop, Tableau Server and Tableau Reader.
- Good interpersonal and communication skills, strong problem-solving skills, quick to adopt new technologies and a good team member.
TECHNICAL SKILLS:
Big Data Ecosystems: HDFS, MapReduce, Hive, YARN, Pig, Sqoop, Kafka, Storm, Flume, Oozie, ZooKeeper, Apache Spark, Apache Tez, Impala, NiFi, Apache Solr, ActiveMQ, Scala.
NoSQL Databases: HBase, Cassandra, MongoDB
Programming Languages: C, C++, Java, J2EE, PL/SQL, Pig Latin, Scala, Python
Java/J2EE Technologies: Applets, Swing, JDBC, JNDI, JSON, JSTL, RMI, JMS, JavaScript, JSP, Servlets, EJB, JSF, jQuery, AngularJS
Frameworks: MVC, Struts, Spring, Hibernate
Operating Systems: Sun Solaris, HP-UX, RedHat Linux, Ubuntu Linux and Windows XP/Vista/7/8
Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP
Web/Application servers: Apache Tomcat, WebLogic, JBoss
Version control: SVN, CVS
Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP
Business Intelligence Tools: Tableau, QlikView, Pentaho, IBM Cognos intelligence
Databases: Oracle 9i/10g/11g, DB2, SQL Server, MySQL, Teradata
Tools and IDE: Eclipse, NetBeans, Toad, Maven, ANT, Hudson, Sonar, JDeveloper, Assent PMD, DB Visualizer
Cloud Technologies: Amazon Web Services (AWS), CDH3, CDH4, CDH5, Hortonworks, Mahout, Microsoft Azure Insight, Amazon Redshift
PROFESSIONAL EXPERIENCE:
Hadoop Developer
Confidential
Responsibilities:
- Managed nodes on the Hadoop cluster and monitored Hadoop cluster job performance using Cloudera Manager.
- Developed optimal strategies for distributing web log data over the cluster, importing and exporting the stored web log data into HDFS and Hive using Sqoop.
- Involved in loading data from edge node to HDFS using shell scripting.
- Created MapReduce programs to handle semi-structured and unstructured data such as XML, JSON, Avro data files and sequence files for log files.
- Developed Spark scripts using Python shell commands as per requirements.
- Integrated Elasticsearch and implemented dynamic faceted search.
- Played a key role in the installation and configuration of various Hadoop ecosystem tools such as Solr, Kafka, Pig, HBase and Cassandra.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Wrote a Storm topology to accept events from a Kafka producer and emit them into Cassandra.
- Loaded large volumes of data into HDFS using Apache Kafka.
- Designed and developed Pig Latin scripts and Pig command-line transformations for data joins and custom processing of MapReduce outputs.
- Developed an end-to-end search solution using the Apache Nutch web crawler and the Apache Solr search platform.
- Developed ETL jobs in Talend to load data from ASCII and flat files.
- Used Pig loaders to load tables from Hadoop to various clusters.
- Designed Talend jobs for data ingestion, enrichment and provisioning.
- Designed and developed custom Java components for Talend.
- Migrated HiveQL queries to Impala to minimize query response time.
- Created Hive tables with dynamic partitions and buckets for sampling, and worked on them using HQL.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Used Spark stream processing to bring data in-memory and implemented RDD transformations and actions to process it as units.
- Implemented a proof of concept (POC) using Kafka, Storm and HBase for processing streaming data.
- Implemented advanced procedures like text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
- Used MRUnit for unit testing and Continuum for integration testing.
- Implemented Spark RDD transformations to map business analysis and applied actions on top of the transformations.
- Used Maven to build and deploy the JARs for MapReduce, Pig and Hive UDFs.
- Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts).
Environment: Hadoop, Scala, MapReduce, HDFS, Spark, Kafka, AWS, Apache Solr, Hive, Cassandra, Maven, Jenkins, Pig, UNIX, Python, MRUnit, Git.
Confidential, Mountain View, CA
Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Joined raw data with reference data using Pig scripting.
- Analyzed data using Hadoop components Hive and Pig.
- Implemented DataStax Enterprise Search with Apache Solr.
- Loaded and transformed large sets of structured, semi-structured and unstructured data using Hadoop/Big Data concepts.
- Implemented a DSE Solr solution to push incremental orders data into the centralized Hadoop cluster.
- Configured, designed, implemented and monitored Kafka clusters and connectors.
- Developed ETL jobs using Spark with Scala to migrate data from Oracle to new Hive tables.
- Developed and deployed applications using Apache Spark and Scala.
- Developed Oozie workflows for scheduling and orchestrating the ETL process.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Created a high-level design approach to build a data lake that incorporates the existing historical data and meets the need to process transactional data.
- Helped troubleshoot Scala problems while working with MicroStrategy to produce illustrative reports and dashboards along with ad hoc analysis.
- Developed Hive queries for the analysts and wrote scripts using Scala.
- Created and ran Sqoop jobs with incremental load to populate Hive external tables.
- Installed and configured Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
- Imported data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
- Wrote custom UDFs in Hive and Pig and used scripts written in Scala for performing MapReduce operations.
- Worked in continuous integration environments using Scrum and Agile methodologies.
- Extracted data from Teradata into HDFS using Sqoop.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs that run independently based on time and data availability.
- Managed real-time data processing and real-time data ingestion into HBase and Hive using Storm.
Environment: Hadoop, HDFS, Pig, Hive, Oozie, HBase, Kafka, Apache Solr, MapReduce, Sqoop, Storm, Spark, Scala, Linux, Cloudera, Maven, Jenkins, Java, SQL.
Confidential, Tampa, Florida
Hadoop Developer
Responsibilities:
- Exported data from DB2 to HDFS using Sqoop and developed MapReduce jobs using the Java API.
- Installed and configured Pig and wrote Pig Latin scripts.
- Created and maintained technical documentation for launching Cloudera Hadoop clusters and for executing Hive queries and Pig scripts.
- Developed workflows using Oozie for running MapReduce jobs and Hive queries.
- Implemented various advanced join operations using Pig Latin.
- Imported and exported data into HDFS and assisted in exporting analyzed data to relational databases using Sqoop.
- Developed monitoring and performance metrics for Hadoop clusters.
- Worked with both MapReduce 1 (JobTracker) and MapReduce 2 (YARN).
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Configured Hadoop system files to accommodate new sources of data and updated the existing Hadoop cluster configuration.
Environment: Hadoop, HDFS, Hive, Flume, Sqoop, HBase, Pig, Eclipse, Spark, MySQL, Ubuntu, ZooKeeper, Maven, Jenkins, Java (JDK 1.6), Oracle 10g.
Confidential, NJ
Java Developer
Responsibilities:
- Effectively interacted with team members and business users for requirements gathering.
- Involved in analysis, design and implementation phases of the software development lifecycle (SDLC).
- Implemented Spring core and J2EE patterns like MVC, Dependency Injection (DI) and Inversion of Control (IoC).
- Implemented REST Web Services with Jersey API to deal with customer requests.
- Developed test cases using JUnit and used Log4j as the logging framework.
- Worked with HQL and the Criteria API for retrieving data elements from the database.
- Developed the user interface using HTML, Spring tags, JavaScript, jQuery and CSS.
- Developed the application using Eclipse IDE and worked in an Agile environment.
- Designed and implemented front-end web pages using CSS, JSP, HTML, JavaScript, Ajax and Struts.
- Used Eclipse IDE as the development environment to design, develop and deploy Spring components on WebLogic.
Environment: Java, J2EE, HTML, JavaScript, CSS, jQuery, Spring 3.0, JNDI, Hibernate 3.0, JavaMail, Web Services, REST, Oracle 10g, JUnit, Log4j, Eclipse, WebLogic 10.3.
Java Developer
Confidential
Responsibilities:
- Involved in various stages of application enhancements, performing the required analysis, development and testing.
- Created use cases, class diagrams and sequence diagrams for analysis and design of the application.
- Developed web-based user interfaces using the Struts framework.
- Developed and maintained Java/J2EE code required for the web application.
- Handled client-side validations using JavaScript and was involved in the integration of various Struts actions in the framework.
- Involved in the development of the user interfaces using HTML, JSP, CSS and JavaScript.
- Developed, tested and debugged the Java, JSP and EJB components using Eclipse.
Environment: Java (JDK 1.5), J2EE, Servlets, Struts, JSP, HTML, CSS, JavaScript, EJB, Eclipse, WebLogic 8.1, Windows.
