
Hadoop Developer Resume


Sunnyvale, CA

SUMMARY:

  • Over 10 years of experience in Information Technology, including experience with Big Data and the Hadoop ecosystem. In-depth knowledge and hands-on experience with Apache Hadoop components such as HDFS, MapReduce, HiveQL, HBase, Pig, Hive, Sqoop, Oozie, Cassandra, Flume, and Confidential.
  • Experienced in developing web applications in various domains like Telecom, Insurance, Retail and Banking.
  • Excellent knowledge of Hadoop architecture and ecosystem components such as HDFS, Job Tracker, Task Tracker, NameNode, DataNode and MapReduce.
  • Developed scripts to automate end-to-end data management and synchronization across all the clusters.
  • Strong hands-on experience with the Hadoop framework and its ecosystem, including but not limited to HDFS architecture, MapReduce programming, Hive, Pig, Sqoop, HBase and Oozie.
  • Worked with system engineering team to plan and deploy Hadoop hardware and software environments.
  • Hands-on experience installing and configuring Cloudera's Apache Hadoop ecosystem components such as Flume-ng, HBase, ZooKeeper, Oozie, Hive, Storm, Sqoop, Kafka, Hue and Pig on CDH 3 and CDH 4 clusters.
  • Accomplished at creating test plans, defining test cases, reviewing and maintaining test scripts, interacting with team members to fix errors, and executing System Integration Testing (SIT), User Acceptance Testing (UAT), stage (PFIX), unit, regression and customer tests.
  • Extensive experience with core Java, MapReduce programming, advanced J2EE Frameworks such as Spring, Struts, JSF and Hibernate.
  • Extensive experience in Application Software Design, Object Oriented Design, Development, Documentation, Debugging, Testing and Implementation
  • Expert-level skills in designing and implementing web server solutions and deploying Java application servers such as Tomcat, JBoss, WebSphere and WebLogic on the Windows platform.
  • Excellent work experience with XML/database mapping and with writing SQL queries, stored procedures and triggers against major relational databases such as Oracle 11g/12c and SQL Server.
  • Extensively worked on Java/J2EE systems with different databases such as Oracle, MySQL and DB2.
  • Good knowledge of Oracle 9i, 10g and 11g databases and excellent skills in writing SQL queries and scripts.
  • Experience using Hibernate for mapping Java classes with database and using Hibernate Query Language (HQL).
  • Application integration with IDEs such as Eclipse and deployments to application servers such as Tomcat, IBM WebSphere and WebLogic.
  • Strong experience in web services, including components such as SOAP, WSDL, XSD, Axis2, JAX-WS and RESTful web services.
  • Experience in developing client-side web applications using HTML, JSP, jQuery, JSTL, AJAX and custom tags, implementing client-side validations with JavaScript and server-side validations with the Struts validation framework.
  • Experience in creating Entity Relationship Diagrams (ERDs) using data modeling tools, organizing the project's data into entities and defining the relationships between them.
  • Competence in using different Java IDEs such as Eclipse, JBuilder, NetBeans and RAD for developing dynamic web applications.
  • Developed PySpark code to read data from Hive, group the fields and generate XML files, and enhanced it to write the generated XML files to a directory and zip them into CDAs (a brief sketch of this pattern follows this list).
  • Implemented a REST call to submit the generated CDAs to the vendor website, and implemented Impyla to support JDBC/ODBC connections to HiveServer2.
  • Built a data validation dashboard in Solr to display the message records, and wrote a shell script to run a Sqoop job for bulk data ingestion from Oracle into Hive.
  • Created Hive tables for the ingested data and scheduled an Oozie job to run the Sqoop ingestion.
  • Worked with JSON file format for StreamSets. Worked with Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs.
  • Configured Confidential Streaming to receive real-time data from Kafka and store the streamed data in HDFS.
  • Enhanced and optimized product Confidential code to aggregate, group and run data mining tasks using the Confidential framework.
  • Worked on Maven 3.3.9 for building and managing Java based projects. Hands-on experience with using Linux and HDFS shell commands. Worked on Kafka for message queuing solutions.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data in various formats such as text, zip, XML and JSON.
  • Developed Java APIs for retrieval and analysis on NoSQL databases such as HBase and Cassandra.
  • Basic knowledge of application design using Unified Modeling Language (UML), sequence diagrams, use case diagrams, Entity Relationship Diagrams (ERDs) and Data Flow Diagrams (DFDs).
  • Familiar with Core Java, with a strong understanding and working knowledge of object-oriented concepts such as collections, multithreading, data structures, algorithms, exception handling and polymorphism.
  • Expertise in database design, creation and management of schemas, writing Stored Procedures, Functions, DDL, DML, SQL queries & Modeling.
  • Extensive experience in ETL architecture, development, enhancement, maintenance, production support, data modeling, data profiling and reporting, including business and system requirement gathering.
  • Excellent understanding/knowledge of Hadoop Distributed system architecture and design principles.
  • Developed a pipeline using Java, Kafka and Python to load data from a JMS server to Hive, with automated ingestion and quality audits of the data into the RAW layer of the data lake.
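
A minimal PySpark sketch of the Hive-to-XML pattern mentioned above (table, column and path names are hypothetical placeholders, and a Hive-enabled Spark deployment is assumed):

    # Read a Hive table, group records by a key, render each group as an XML
    # document, and write the documents to a staging directory for zipping.
    import os
    import xml.etree.ElementTree as ET
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-to-xml")
             .enableHiveSupport()      # requires Spark configured with Hive support
             .getOrCreate())

    OUT_DIR = "/tmp/cda_xml"           # assumed local staging directory
    os.makedirs(OUT_DIR, exist_ok=True)

    # Hypothetical source table and columns.
    rows = spark.sql("SELECT doc_id, field_name, field_value FROM staging.records")

    def to_xml(doc_id, records):
        """Render one document's records as a single XML string."""
        root = ET.Element("document", id=str(doc_id))
        for r in records:
            ET.SubElement(root, "field", name=str(r["field_name"])).text = str(r["field_value"])
        return ET.tostring(root, encoding="unicode")

    # Grouping on the driver is only reasonable for modest volumes; for large data,
    # keep the work distributed (e.g. groupBy(...).applyInPandas or mapPartitions).
    grouped = rows.rdd.map(lambda r: (r["doc_id"], r)).groupByKey()
    for doc_id, recs in grouped.toLocalIterator():
        with open(os.path.join(OUT_DIR, f"{doc_id}.xml"), "w") as fh:
            fh.write(to_xml(doc_id, list(recs)))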

TECHNICAL SKILLS:

Programming Languages: Java, SQL, PL/SQL, Python, Scala, Linux shell scripting

Hadoop/Big Data Technologies: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Confidential, Scala, Impala, Kafka, Hue, Sqoop, Oozie, Flume, ZooKeeper, Cassandra, Cloudera CDH5, Python, PySpark, Solr and Hortonworks.

Frameworks: Apache Storm, Samza, Confidential, Flink, MapR, YARN

IDE Tools: Eclipse, NetBeans, IntelliJ IDEA, JBuilder, JDeveloper 9.0/10g, Toad 4.0/8.5, SQL Developer, Rational Application Developer (RAD), Visual Studio & Spring Source Tool Suite.

Operating Systems: Windows, UNIX, Linux, CentOS & Ubuntu.

Web Technologies: HTML/DHTML, CSS/CSS3, JavaScript, AJAX, jQuery, Angular 2, XML, JSON, XSLT, Java Beans, JMS.

Version Controls: Git, CVS, SVN (Subversion), Microsoft VSS, Stash, ClearCase 7.0/7.1.

Web/App Servers: IBM WebSphere 6.x/5.x, Apache Tomcat 8/9, JBoss, WebLogic, GlassFish, Jetty.

Web Services: MVC, SOAP, REST API, AXIS

Development Tools: Eclipse, Rational Application Developer, Oracle JDeveloper, PL/SQL Developer, SQL Developer, WS FTP, Putty

Databases: Oracle, DB2, MS SQL Server, Teradata

Design: OOAD, Design Patterns

Middleware: ODBC, JDBC, RMI, Blaze DS.

Methodologies: Agile, Scrum, TDD, Design Patterns, Continuous Integration using CruiseControl, Waterfall

PROFESSIONAL EXPERIENCE:

Hadoop Developer

Confidential, Sunnyvale, CA

Responsibilities:

  • Responsible for managing, analyzing and transforming petabytes of data, as well as quick validation checks on FTP file arrivals from the S3 bucket to HDFS.
  • Responsible for analyzing large data sets and deriving customer usage patterns by developing new MapReduce programs.
  • Experienced in creating Hive tables and loading data into them incrementally using dynamic partitioning; worked with Avro files and JSON records.
  • Experienced in using Pig for data cleansing and developed Pig Latin scripts to extract the data from web server output files to load into HDFS.
  • Worked on Hive by creating external and internal tables, loading it with data and writing Hive queries.
  • Involved in the development and use of UDTFs and UDAFs for decoding and converting log record fields, generating minute buckets for specified time intervals, and extracting JSON fields.
  • Developed Pig and Hive UDFs to analyze complex data and find specific user behavior.
  • Responsible for debugging and optimizing Hive scripts, and for implementing deduplication logic in Hive using a rank key function (UDF).
  • Experienced in writing Hive validation scripts used in the validation framework (for daily analysis through graphs presented to business users).
  • Developed workflow in Oozie to automate the tasks of loading data into HDFS and pre-processing with Pig and Hive.
  • Analyzed and defined the research strategy and determined the system architecture and requirements needed to achieve the goals.
  • Developed multiple Kafka producers and consumers as per the software requirement specifications.
  • Used Kafka for log aggregation, gathering physical log files off servers and placing them in a central location such as HDFS for processing.
  • Configured Confidential Streaming to receive real-time data from Kafka and store the streamed data in HDFS.
  • Used various Confidential Transformations and Actions for cleansing the input data.
  • Developed shell scripts to generate Hive CREATE statements from the data and load the data into the tables.
  • Involved in writing custom MapReduce programs using the Java API for data processing.
  • Integrated Maven build and designed workflows to automate the build and deploy process.
  • Involved in developing a linear regression model, built using Confidential with the Scala API, to predict a continuous measurement and improve observations on the data.
  • Worked extensively on Confidential and MLlib to develop a regression model for logistic information.
  • Implemented Confidential scripts using Scala and Confidential SQL to load Hive tables into Confidential for faster data processing.
  • Created Hive tables as per requirements, as internal or external tables, defined with appropriate static or dynamic partitions and bucketing for efficiency.
  • Loaded and transformed large sets of structured and semi-structured data using Hive.
  • Extracted real-time feeds using Kafka and Confidential Streaming, converted them to RDDs, processed the data as DataFrames, and saved the data in Parquet format in HDFS.
  • Used Confidential and Confidential SQL with the Scala API to read the Parquet data and create the tables in Hive.
  • Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs.
  • Very good understanding of Cassandra cluster mechanisms, including replication strategies, snitches, gossip, consistent hashing and consistency levels.
  • Experienced in using the Confidential application master to monitor the Confidential jobs and capture the logs for the Confidential jobs.
  • Implemented Confidential using Scala, DataFrames and the Confidential SQL API for faster testing and processing of data.
  • Involved in making code changes for a workstation simulation module and running it across the cluster using Confidential-submit.
  • Involved in performing analytics and visualization on the log data, estimating the error rate and studying the probability of future errors using regression models.
  • Used the WebHDFS REST API to make HTTP GET, PUT, POST and DELETE requests from the web server to perform analytics on the data lake (a brief sketch of these calls follows this list).
  • Worked on Sequence files, RC files, Map side joins, bucketing, partitioning for Hive performance enhancement and storage improvement.
  • Implemented daily jobs that automate parallel tasks of loading data into HDFS using Autosys and Oozie coordinator jobs.
  • Performed streaming of data into Apache Ignite by setting up caches for efficient data analysis.
  • Created Sqoop jobs and Pig and Hive scripts for data ingestion from relational databases, to compare against historical data.
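
A hedged sketch of the WebHDFS REST calls referenced above; the namenode host/port, user name and HDFS paths are placeholders, and simple (non-Kerberos) authentication is assumed:

    import requests

    NAMENODE = "http://namenode.example.com:50070"   # assumed WebHDFS endpoint
    USER = "hdfs"                                     # assumed simple-auth user

    def webhdfs_url(path, op, **params):
        qs = "&".join([f"op={op}", f"user.name={USER}"] + [f"{k}={v}" for k, v in params.items()])
        return f"{NAMENODE}/webhdfs/v1{path}?{qs}"

    # GET: list a directory in the data lake.
    listing = requests.get(webhdfs_url("/data/lake/raw", "LISTSTATUS")).json()

    # PUT: create a file (two steps: the namenode redirects to a datanode, then we send the bytes).
    r = requests.put(webhdfs_url("/data/lake/raw/sample.txt", "CREATE", overwrite="true"),
                     allow_redirects=False)
    requests.put(r.headers["Location"], data=b"hello hdfs")

    # GET: read the file back (the redirect to the datanode is followed automatically).
    content = requests.get(webhdfs_url("/data/lake/raw/sample.txt", "OPEN")).content

    # DELETE: remove the file.
    requests.delete(webhdfs_url("/data/lake/raw/sample.txt", "DELETE"))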

Environment: Hadoop, Hive, HDFS, HPC, WebHDFS, WebHCat, Confidential, Confidential SQL, Kafka, Java, Scala, web servers, Maven and SBT builds, Pig, Hive, Sqoop, Oozie, shell scripting, SQL, Talend, Confidential, HBase, Hortonworks.

Hadoop Developer

Confidential, Franklin, TN

Responsibilities:

  • Developed PySpark code to read data from Hive, group the fields and generate XML files, and enhanced it to write the generated XML files to a directory and zip them into CDAs.
  • Implemented a REST call to submit the generated CDAs to the vendor website, and implemented Impyla to support JDBC/ODBC connections to HiveServer2.
  • Enhanced the PySpark code to replace Confidential with Impyla and performed the Impyla installation on the edge node.
  • Evaluated performance of the Confidential application by testing in cluster deployment mode vs. local mode.
  • Experimented with submissions of test OIDs to the vendor website.
  • Explored the StreamSets Data Collector and implemented it for ingestion into Hadoop.
  • Created a StreamSets pipeline to parse files in XML format and convert them to a format that is fed to Solr.
  • Built a data validation dashboard in Solr to display the message records, and wrote a shell script to run a Sqoop job for bulk data ingestion from Oracle into Hive (a brief sketch of such an import follows this list).
  • Created Hive tables for the ingested data and scheduled an Oozie job to run the Sqoop ingestion.
  • Worked with JSON file format for StreamSets. Worked with Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs.
  • Wrote shell scripts to dump data from MySQL to HDFS.
  • Analyzed large volumes of structured data using Spark SQL.
  • Configured Confidential Streaming to receive real-time data from Kafka and store the streamed data in HDFS.
  • Enhanced and optimized product Confidential code to aggregate, group and run data mining tasks using the Confidential framework.
  • Worked on Maven 3.3.9 for building and managing Java based projects. Hands-on experience with using Linux and HDFS shell commands. Worked on Kafka for message queuing solutions.
  • Developed unit test cases for Mapper, Reducer and Driver classes using MRUnit.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data in various formats such as text, zip, XML and JSON.
  • Developed Java APIs for retrieval and analysis on NoSQL databases such as HBase and Cassandra.
  • Wrote HBase client programs in Java and web services.
  • Migrated data from RDBMS to Hadoop using Sqoop for analysis, and implemented Oozie jobs for automatic data imports from the source.
  • Used the Hive data warehouse tool to analyze the data migrated to HDFS and developed Hive queries.
  • Created external tables with proper partitions for efficiency and loaded the structured data in HDFS resulting from the MR jobs.
  • Involved in importing real-time data into Hadoop using Kafka and implemented Oozie jobs for daily imports.
  • Worked with various onshore and offshore teams to understand the data imported from their sources.
  • Developed generic shell scripts to automate Sqoop jobs by passing parameters for data imports.
  • Involved in data visualization; provided the files required by the team by analyzing the data in Hive, and developed Pig scripts for advanced analytics on the data.
  • As part of a POC, used Amazon AWS S3 as the underlying file system for Hadoop and implemented Elastic MapReduce jobs on the data in S3 buckets.
  • Participated with the operations team in the Confidential installation on a secured cluster.
  • Provided updates in the daily Scrum, planned work at the start of each sprint, and tracked planned tasks in JIRA, syncing with the team to pick up priority tasks.
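
As a minimal stand-in for the shell script mentioned above, the sketch below wraps a parameterized Sqoop bulk import from Oracle into Hive in Python; the connection string, credentials path, and table names are placeholders:

    import subprocess

    def sqoop_oracle_to_hive(src_table, hive_table, mappers=4):
        """Run a Sqoop import from an Oracle table into a Hive table."""
        cmd = [
            "sqoop", "import",
            "--connect", "jdbc:oracle:thin:@//oracle.example.com:1521/ORCL",
            "--username", "etl_user",
            "--password-file", "/user/etl/.oracle.pwd",   # assumed HDFS path to the password file
            "--table", src_table,
            "--hive-import",
            "--hive-table", hive_table,
            "--num-mappers", str(mappers),
        ]
        subprocess.run(cmd, check=True)

    if __name__ == "__main__":
        sqoop_oracle_to_hive("CLAIMS", "staging.claims")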

Environment: Hadoop, HDFS, Pig, Sqoop, Confidential, MapReduce, Cloudera, Snappy, Zookeeper, NoSQL, HBase, Shell Scripting, Ubuntu, Linux Red Hat.

Hadoop Developer

Confidential, Seattle, WA

Responsibilities:

  • Worked on streaming analytics and data consolidation projects on the Lucidworks Fusion product.
  • Integrated Kafka, Confidential, Scala and HBase for streaming analytics, creating a predictive model and implementing machine learning workflows.
  • Developed Scala scripts and UDFs using DataFrames in Confidential for data aggregation and queries, writing data back into the OLTP system through Sqoop.
  • Used Solr/Lucene for indexing and querying the JSON-formatted data (a brief indexing sketch follows this list).
  • Handled cloud operations inside Rackspace for persistence logic.
  • Monitored OOTB requests with ATG, Akamai and TIBCO.
  • Used REST services for handling unfinished jobs, knowing the status and creating a dataset inside a URL.
  • Worked on ingesting, reconciling, compacting and purging base-table and incremental-table data using Hive and HBase, with job scheduling through Oozie.
  • Designed and implemented streaming data on the UI with Scala.js.
  • Utilized DevOps principles and components to ensure operational excellence before deploying to production.
  • Operated the cluster on AWS using EC2, Akka, EMR, S3 and CloudWatch.
  • Transported data to HBase using Flume.
  • Used Java UDFs for performance tuning in Hive and Pig by manually driving the MR part.
  • Used Java APIs such as machine learning library functions and graph algorithms for training and predicting with the linear model in Confidential Streaming.
  • Implemented unit testing in Java for Pig and Hive applications.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Involved in source system analysis, data analysis and data modeling for ETL (Extract, Transform and Load).
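
A hedged sketch of indexing JSON documents into Solr over its REST update handler, as referenced above; the Solr host, collection name and document fields are placeholders:

    import requests

    SOLR = "http://solr.example.com:8983/solr/events"   # assumed collection URL

    docs = [
        {"id": "evt-1", "user_id": "u42", "action": "click", "ts": "2016-05-01T12:00:00Z"},
        {"id": "evt-2", "user_id": "u42", "action": "purchase", "ts": "2016-05-01T12:05:00Z"},
    ]

    # Index (or update) the documents and commit so they become searchable.
    requests.post(f"{SOLR}/update?commit=true", json=docs,
                  headers={"Content-Type": "application/json"}).raise_for_status()

    # Query them back through the standard select handler.
    hits = requests.get(f"{SOLR}/select", params={"q": "user_id:u42", "wt": "json"}).json()
    print(hits["response"]["numFound"])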

Environment: Hadoop, HDFS, Pig, Sqoop, Confidential, MapReduce, Cloudera, Snappy, ZooKeeper, NoSQL, HBase, shell scripting, Ubuntu, Linux Red Hat.

Hadoop Admin

Confidential, Jessup, PA

Responsibilities:

  • Responsible for loading the customer's data and event logs from Oracle and Teradata databases into HDFS using Sqoop.
  • End-to-end performance tuning of Hadoop clusters and Hadoop MapReduce routines against very large data sets.
  • Developed Pig UDFs to pre-process the data for analysis.
  • Installed and configured Hadoop HDFS, MapReduce, Pig, Hive, and Sqoop.
  • Wrote Pig Scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
  • Prepared a Tez build from the source code and ran Hive query jobs using the Tez execution engine rather than MR jobs, for better performance.
  • Participated in the requirements gathering, design, development, testing and analysis phases of the project, documenting the business requirements by conducting workshops/meetings with various business users.
  • Imported and exported data between databases and HDFS using Sqoop.
  • Involved in writing Hive queries and Pig script files for processing data and loading it to HDFS, internally run as MapReduce jobs.
  • Wrote Hive queries and Pig script for data analysis to meet the Business requirements.
  • Strong Hadoop administration knowledge, with extensive experience building, configuring and administering large data clusters in big data environments using the Apache distribution.
  • Knowledge of Hive-HBase and Pig-HBase integration.
  • Used Confidential Streaming APIs to perform the necessary transformations and actions on the fly to build the common learner data model, which receives data from Kafka in near real time and persists it into Cassandra (a stand-in sketch of this path follows this list).
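
The streaming framework above is redacted, so as a stand-in illustration of the Kafka-to-Cassandra path this sketch uses the kafka-python consumer and the DataStax Cassandra driver directly; the topic, keyspace and table schema are hypothetical:

    import json
    from kafka import KafkaConsumer
    from cassandra.cluster import Cluster

    consumer = KafkaConsumer(
        "learner-events",                               # assumed topic
        bootstrap_servers=["kafka.example.com:9092"],
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    )

    session = Cluster(["cassandra.example.com"]).connect("learner")   # assumed keyspace
    insert = session.prepare(
        "INSERT INTO events (user_id, event_time, event_type) VALUES (?, ?, ?)"
    )

    # Consume messages in near real time and persist each one into Cassandra.
    for msg in consumer:
        e = msg.value
        session.execute(insert, (e["user_id"], e["event_time"], e["event_type"]))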

Environment: Hadoop, MapReduce, HDFS, Hive, Flume, Sqoop, Cloudera, Oozie, DataStage, Java cron jobs, UNIX scripts, Confidential.

Hadoop Admin

Confidential, Rensselaer, NY

Responsibilities:

  • Evaluated the suitability of Hadoop and its ecosystem for the project, implementing and validating various proof-of-concept (POC) applications to eventually adopt them and benefit from the Big Data Hadoop initiative.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Designed and implemented Map Reduce-based large-scale parallel relation-learning system.
  • Installed and configured Pig for ETL jobs and wrote Pig scripts with regular expressions for data cleaning.
  • Created Hive external tables to store the Pig script output and used them for data analysis to meet the business requirements (a brief sketch follows this list).
  • Developed MapReduce pipeline jobs to process the data and create the necessary HFiles.
  • Involved in loading the created HFiles into HBase for faster access to all the products in all the stores without taking a performance hit.
  • Extensively used Apache Sqoop for efficiently transferring bulk data between Apache Hadoop and relational databases (Oracle) for product level forecast.
  • Used Zookeeper for providing coordinating services to the cluster.
  • Used the Oozie scheduler system to automate the pipeline workflow and orchestrate the MapReduce jobs that extract the data on a timely basis.
  • Imported data using Sqoop to load data from MySQL and Oracle into HDFS on a regular basis.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Created Hive tables and worked on them using HiveQL.
  • Involved in loading the created HFiles into HBase for faster access to a large customer base without taking a performance hit.
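
A hedged sketch of creating and querying the Hive external table that stores the Pig output, using the PyHive client as a stand-in; the HiveServer2 host, table name, columns and HDFS location are placeholders:

    from pyhive import hive

    conn = hive.Connection(host="hiveserver.example.com", port=10000, username="etl")
    cur = conn.cursor()

    # External table over the directory the Pig job writes to.
    cur.execute("""
        CREATE EXTERNAL TABLE IF NOT EXISTS analytics.pig_output (
            product_id STRING,
            store_id   STRING,
            forecast   DOUBLE
        )
        ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'
        STORED AS TEXTFILE
        LOCATION '/data/pig/output/forecast'
    """)

    # Sample analysis query against the external table.
    cur.execute("SELECT store_id, SUM(forecast) FROM analytics.pig_output GROUP BY store_id")
    for store_id, total in cur.fetchall():
        print(store_id, total)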

Environment: Hadoop 0.20.2 - Pig, Hive, Apache Sqoop, Oozie, HBase, ZooKeeper, Cloudera Manager, 30-node cluster with Linux Ubuntu.

Java Developer

Confidential - Long Island, NY

Responsibilities:

  • Full life cycle experience including requirements gathering, business analysis, coding, testing, and creation of functional documentation.
  • Developed the application using Struts and JMS.
  • Deployed the application on WebSphere Application Server.
  • Built the overall application and deployed and tested it on WebSphere.
  • Created tables and stored procedures in Oracle 9.
  • Used Eclipse as the IDE to develop the application and TOAD as the IDE for DB and PL/SQL work.

Environment: Java, Struts, JSP, JMS, WebSphere, Oracle 9, Eclipse, Windows XP and UNIX

Java Developer

Confidential - Houston, TX

Responsibilities:

  • Involved in database design for the Application.
  • Created tables and stored procedures in Oracle.
  • Involved in Designing the Application Framework.
  • Implemented the MVC Struts framework.
  • Performed performance tuning.
  • Created client interfaces in JSP pages with Struts and implemented business logic in Servlets.
  • Created Business Services using Stateless Session Bean.
  • Designed and developed the application using MVC, Struts, JDBC, JSP and EJB.
  • Responsible for modifying the existing JSP pages, Action classes and the presentation facade layer, and for developing Product Improvement Requests.
  • Developed new JSP pages and Action classes for Customer Change Requests.
  • Wrote various SQL queries and stored procedures using Oracle.

Environment: Java, JSP, Servlets, Struts, EJB, JDBC, JDeveloper, WebSphere, Oracle, Windows XP, XML, WebLogic Application Server
