Hadoop/Spark Developer Resume

SUMMARY:

  • Over 8 years of professional experience in the IT industry developing, implementing, configuring, and testing Hadoop ecosystem components, and maintaining web-based applications using Java and J2EE.
  • 5 years of experience in Hadoop/Big Data, working with Hadoop ecosystem components including HDFS, MapReduce, YARN, Pig, Hive, NiFi, HBase, Oozie, ZooKeeper, Sqoop, Spark, Kafka, and Flume.
  • Extensive hands-on experience writing complex MapReduce jobs in Java, Pig scripts, and Hive data models.
  • Excellent understanding of Hadoop Distributed File System (HDFS) data modeling, architecture, and design principles.
  • Knowledge of Apache NiFi for real-time analytical processing.
  • Experience writing Pig and Hive scripts and extending core Hive and Pig functionality with custom UDFs.
  • Expert in ETL testing of business intelligence, data migration, and data conversion solutions using manual and SQL-based ETL testing methodologies.
  • Experience writing MapReduce programs and tuning Hadoop cluster performance by analyzing the existing infrastructure.
  • Experience using the Oozie workflow engine to run workflow jobs with actions that launch Hadoop MapReduce and Pig jobs.
  • Hands-on experience importing data from databases such as MySQL, MongoDB, Cassandra, Oracle, Teradata, and Netezza into HDFS, and exporting it back, using Sqoop.
  • Good experience creating real-time data streaming solutions using Apache Spark/Spark Streaming, Apache Storm, Kafka, and Flume.
  • Created Impala views over Hive tables, validated the tables, and pulled sample data.
  • Implemented custom Kafka encoders for a custom input format to load data into Kafka partitions, and streamed the data in real time using Spark with Kafka for faster processing.
  • Experienced in Java application development, client/server applications, and Internet/intranet-based applications using Core Java, J2EE patterns, web services, Oracle, SQL Server, and other relational databases.
  • Extensive experience with scripting languages (Bash shell scripting and Python) and UNIX/Linux commands.
  • Worked with data delivery teams to set up new Hadoop users, including creating Linux users, setting up Kerberos principals, and testing HDFS, Hive, Pig, and MapReduce access for the new users.
  • Experience using Java tools in business, web, and client-server environments, including J2EE, EJB, JSP, Java Servlets, Struts, and Java Database Connectivity (JDBC).
  • Good experience developing and implementing web applications using Java, CSS, HTML, HTML5, XHTML, JavaScript, JSON, XML, and JDBC.
  • A strong team player able to communicate effectively with all levels of the organization, including technical staff, management, and customers, and to keep up with emerging technologies.

TECHNICAL SKILLS:

Big Data Technologies: Hadoop, MapReduce, HDFS, Hive, Pig, HBase, Sqoop, Flume, ZooKeeper, Oozie, Kafka, YARN, Spark, Scala, MongoDB, and Cassandra.

Databases: Oracle, MySQL, Teradata, Microsoft SQL Server, MS Access, DB2, and NoSQL

Programming Languages: C, C++, Java, J2EE, Scala, SQL, PL/SQL, Unix shell scripting, and Bash shell scripting.

Frameworks: MVC, Struts, Spring, JUnit, and Hibernate

Development Tools: Eclipse, NetBeans, Toad, Maven, and Ant

Web Languages: XML, HTML, HTML5, DHTML, DOM, JavaScript, AJAX, jQuery, JSON, and CSS

Operating Systems & others: Linux (CentOS, Ubuntu), Unix, Windows XP, Windows Server 2003, PuTTY, WinSCP, FileZilla, AWS, and Microsoft Office Suite

PROFESSIONAL EXPERIENCE:

Confidential

HADOOP/SPARK DEVELOPER

Responsibilities:

  • Installed and configured Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, Oozie, ZooKeeper, HBase, Flume, and Sqoop.
  • Implemented multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Developed pipeline designs and local, state, and federal pipeline relocations.
  • Worked in a team on a 30-node cluster and expanded it by adding nodes; the additional data nodes were configured through the Hadoop commissioning process.
  • Imported data from MySQL and Oracle into HDFS using Sqoop.
  • Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
  • Worked extensively with Teradata query submission and processing tools such as BTEQ and Teradata SQL Assistant.
  • Created error audit reports on Teradata for the data quality and support teams.
  • Used Pig as an ETL tool to perform transformations, joins, and some pre-aggregations before storing the data in HDFS.
  • Tested the initial and incremental loads of dimension and lookup tables and validated the data from source to target tables.
  • Used AWS services such as EC2 and S3 for small data sets.
  • Used Apache Kafka to get data from Kafka producers, which in turn push data to the broker.
  • Wrote robust, reusable HiveQL scripts and Hive UDFs in Java.
  • Implemented partitioning and bucketing in Hive for better organization of the data.
  • Loaded data from Oracle and Teradata into HDFS using Sqoop.
  • Designed and built unit tests and executed operational queries on HBase.
  • Proficient in data warehouse test planning and extensive use of tools such as DataStage, Teradata, MicroStrategy reporting, and WebFOCUS reporting.
  • Implemented a script to transfer data from Oracle to HBase using Sqoop.
  • Migrated Python MapReduce programs into Spark transformations.
  • Worked with the NoSQL database HBase to deliver real-time data analytics using Apache Spark with Python.
  • Installed the Oozie workflow engine to run multiple MapReduce, HiveQL, and Pig jobs.
  • Implemented a script to transfer data from web servers to Hadoop using Flume.
  • Used ZooKeeper to manage coordination among the clusters.
  • Used Apache Kafka and Apache Storm to gather log data and feed it into HDFS.
  • Developed a Scala program for data extraction using Spark Streaming.
  • Set up and managed Kafka for stream processing.
  • Created producer, consumer, and ZooKeeper setups for Kafka replication.
  • Integrated Splunk with the AWS deployment using Puppet to collect data from all EC2 systems into Splunk.
  • Experienced in batch processing of data sources using Apache Spark and Elasticsearch.
  • Implemented Spark RDD transformations and actions to support business analysis.
  • Used Spark with Scala and Spark SQL for faster testing and processing of data.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.

Environment: Hadoop, MapReduce, HDFS, Hive, Cloudera, Core Java, Scala, SQL, Flume, Spark, Pig, Sqoop, Oozie, Impala, Python, AWS, HBase, Kafka, Cassandra, ETL, Oracle, Unix.

Confidential

HADOOP DEVELOPER

Responsibilities:

  • Designed and developed big data processing components using HDFS, MapReduce, Pig, and Hive.
  • Analyzed data using the Hadoop components Hive and Pig.
  • Imported data from sources such as HDFS and HBase into Kafka.
  • Wrote MapReduce jobs using Scala and Pig Latin.
  • Developed Oozie workflows to automate loading data into HDFS and preprocessing it with Pig.
  • Worked on NoSQL databases including HBase and Cassandra. Configured a SQL database to store Hive metadata.
  • Produced reports and documentation for all automated testing efforts, results, activities, data, logging, and tracking.
  • Used SQL Profiler for troubleshooting, monitoring, and optimizing SQL Server and non-production database code, as well as T-SQL code from developers and QA.
  • Participated in the development and implementation of the Cloudera Impala Hadoop environment.
  • Used Sqoop to load data from MySQL into HDFS on a regular basis.
  • Used Pig as an ETL tool to perform transformations, joins, and some pre-aggregations before storing the data in HDFS.
  • Developed applications using Hadoop big data technologies: Pig, Hive, MapReduce, and Oozie.
  • Worked with AWS services including EC2, S3, ELB, Auto Scaling, Glacier, storage lifecycle rules, and Amazon EMR.
  • Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
  • Designed a technical solution for real-time analytics using Spark and HBase.
  • Created ETL (Informatica) jobs to generate and distribute reports from a MySQL database.
  • Involved in loading data from the Linux file system to HDFS and in exporting the analyzed data to relational databases using Sqoop, for visualization and report generation by the business intelligence (BI) team.
  • Installed and configured Flume, Hive, Pig, Sqoop, and Oozie on the Hadoop cluster.
  • Extracted data from Oracle into Hive using Sqoop.
  • Followed Agile methodology.

Environment: Hadoop, MapReduce, HDFS, Hive, Cloudera, Core Java, Scala, SQL, Flume, Spark, Pig, Sqoop, Oozie, Impala, Ruby, AWS, HBase, Kafka, Cassandra, ETL, Oracle, Python, Unix.

Confidential

Los Angeles, CA

Hadoop Developer

Responsibilities:

  • Worked on a Hadoop cluster that ranged from 4 to 8 nodes during pre-production and was sometimes extended to 24 nodes in production.
  • Used Sqoop to import data from RDBMS into the Hadoop Distributed File System (HDFS) and then analyzed the imported data using Hadoop components.
  • Developed custom MapReduce programs to analyze data and used Pig Latin to clean unwanted data.
  • Performed performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
  • Streamlined the technology roadmap by 40% by delivering the technical architecture for migration to a cloud-based AWS solution.
  • Expert in creating Pig and Hive UDFs in Java to analyze data efficiently.
  • Loaded data from Oracle and Teradata into HDFS using Sqoop.
  • Wrote data ingestion systems to pull data from traditional RDBMS platforms such as Oracle and Teradata and store it in NoSQL databases such as MongoDB.
  • Created Hive tables and ran HiveQL queries against them, which automatically invoke and run MapReduce jobs.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data and analyzed them by running Hive queries and Pig scripts.
  • Participated in gathering requirements from experts and business partners and converting them into technical specifications.
  • Analyzed the Cassandra database and compared it with other open-source NoSQL databases to determine which best suits the current requirements.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs that run independently based on time and data availability.
  • Assisted application teams in installing Hadoop updates, operating system patches, and version upgrades when required.
  • Assisted in cluster maintenance, monitoring, and troubleshooting, and managed and reviewed data backups and log files.

Environment: Hadoop, MapReduce, HDFS, Hive, Cloudera, Core Java, SQL, Flume, NoSQL, Pig, Sqoop, Oozie, HBase, Cassandra, ETL, Informatica, Oracle, Unix, AJAX, JSON

Confidential

Java Developer

Responsibilities:

  • Involved in all test cases and fixed bugs and issues identified during the testing period.
  • Used IE Developer Tools to debug the given HTML.
  • Wrote unit test cases using JUnit.
  • Implemented a logging mechanism using log4j.
  • Created a RESTful web service in the Doc-delete application to delete documents older than a given expiration date.
  • Involved in the complete Agile development process and tested the application in each iteration.
  • Designed and developed websites using cXML and REST for Cisco and multiple other clients.
  • Migrated the production database from SQL Server 2000 to SQL Server 2008 and upgraded production JBoss application servers.
  • Designed user interfaces and functionality using JavaScript, Ajax, CSS, and jQuery.
  • Used Swing for sophisticated GUI components.
  • Wrote Java utility classes.
  • Troubleshot and resolved defects.
  • Used IntelliJ as the IDE for application development and integration of the frameworks.
  • Designed the application using the Struts 2.0 MVC architecture.
  • Developed, enhanced, maintained, and supported Java J2EE applications.
  • Developed JSPs and Servlets to dynamically generate HTML and display data on the client side.
  • Implemented JSON along with Ajax to improve processing speed.
  • Deployed the applications on the Tomcat application server.
  • Prepared high- and low-level design documents for the business modules for future reference and updates.

Environment: Java, Apache Maven, SVN, Jenkins, Spring 3.2, Spring Integration, JBoss, Spring boot Strap, log4j, JUnit, IBM MQ, JMS, Web Services, HTML, jQuery, JavaScript, Java 1.5, Servlets 2.3, JSP 2.x, Hibernate.

Confidential

Java Developer

Responsibilities:

  • Coded Java Servlets and created JSPs to generate web pages dynamically.
  • Developed forms using HTML.
  • Developed Enterprise JavaBeans for the business flow and business objects.
  • Designed, coded, and configured server-side J2EE components such as JSPs, Servlets, JavaBeans, and XML.
  • Implemented business requirements using Spring Core, Spring Boot, and Spring Data.
  • Made extensive use of the Struts framework for controller and view components.
  • Learned XML for client communication, and consumed and created RESTful web services.
  • Developed database interaction classes using JDBC and Java.
  • Rigorously followed test-driven development (TDD) in coding.
  • Implemented Action classes and server-side validations for account activity, payment history, and transactions.
  • Implemented views using Struts tags, JSTL 2.0, and Expression Language.
  • Worked with Java design patterns such as Service Locator and Factory at the business layer for effective object behavior.
  • Used Hibernate to transfer the application data between client and server.
  • Worked with the Java Collections API to pass data objects between the business layers and the front end.
  • Worked with JAXB, SAXP, and XML Schema to export data to XML and import data from XML into the database, and used JAXB for marshalling and unmarshalling web service request and response data.
  • Coded MySQL statements and stored procedures for back-end communication using JDBC.
  • Developed an API to write XML documents from a database. Utilized XML and XSL Transformation for dynamic web content and database connectivity.
  • Developed a RESTful web service using the Spring framework.
  • Implemented the Hibernate API for database connectivity.
  • Maintained the source code; designed, developed, and deployed the application on Apache Tomcat Server.
  • Used Maven for continuous integration of builds and Ant for deploying the web applications.

Environment: Java/J2EE, Struts 1.2, Tiles, EJB, JMS, Servlets, JSP, JDBC, HTML, CSS, JavaScript, JUnit, WebSphere 7.0, Eclipse, SQL Server 2000, log4j, Subversion, Jenkins