Senior Hadoop Developer Resume

Charlotte, NC

SUMMARY

  • 8+ years of experience in the analysis, design, development, and implementation of web-based distributed applications.
  • 4+ years of experience with Hadoop, HDFS, MapReduce, Sqoop, Pig, Hive, HBase, Linux, and UNIX.
  • Worked on a Hadoop cluster with a current size of 56 nodes and 896 terabytes of capacity.
  • Hands-on experience with workflow scheduling, coordination, and messaging tools such as Oozie, ZooKeeper, and Kafka.
  • Strong techno-functional skills for translating business scenarios into Big Data use cases.
  • In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.
  • Capable of designing and architecting Hadoop applications and recommending the right solutions and technologies for each application.
  • Developed many MapReduce programs in Java for data cleansing, data filtering, and data aggregation.
  • Re-engineered many legacy mainframe applications onto Hadoop using the MapReduce API to reduce mainframe MIPS and storage costs.
  • Developed UDFs in Java as needed for use in Pig and Hive queries.
  • Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
  • Good knowledge of Hortonworks Data Platform 2.2.
  • Experience using the Oozie workflow engine to run workflow jobs with actions that run Hadoop MapReduce and Pig jobs.
  • Experience working with the NoSQL databases MongoDB and Apache Cassandra.
  • Experience in managing and reviewing Hadoop log files.
  • Developed tools using Python, shell scripting, and XML to automate routine tasks.
  • Extended Hive and Pig core functionality by writing custom UDFs.
  • Good experience working with distributions such as MapR, Hortonworks, and Cloudera.
  • Good knowledge of AWS services such as EMR and EC2, which provide fast and efficient processing of Big Data.
  • Experienced in integrating various data sources, including Java applications, RDBMS, shell scripts, spreadsheets, and text files.
  • Experience in working with Oracle and DB2.
  • Experience in Web Services using XML, HTML and SOAP.
  • Excellent Java development skills using J2EE, J2SE, Servlets, JUnit, JSP, JDBC.
  • Familiar with popular frameworks such as Struts, Hibernate, Spring, MVC, and AJAX.
  • Implemented proofs of concept on the Hadoop stack and various big data analytics tools.
  • Experience migrating from different databases (e.g., VSAM, DB2, PL/SQL, and MySQL) to Hadoop.
  • Experience with NoSQL databases such as HBase, Cassandra, and MongoDB.
  • Committed to timely, high-quality work; a quick learner who adapts easily to new technologies and works well both within a team and across teams.
  • Defined and developed ETL processes to automate data conversions, catalog uploading, error handling, and auditing using Talend.
  • Highly motivated and a self-starter with effective communication and organizational skills, combined with attention to detail and business process improvements.

TECHNICAL SKILLS:

Technology: Hadoop Ecosystem/ J2SE/ J2EE/ Oracle.

Operating Systems: Windows Vista/XP/NT/2000 Series, UNIX/Linux (Ubuntu, CentOS, Red Hat), AIX, Solaris.

DBMS/Databases: DB2, MySQL, SQL, PL/SQL.

Programming Languages: C, C++, JSE, XML, JSP/Servlets, Struts, Spring, HTML, JavaScript, jQuery, Web services.

Big Data Ecosystem: HDFS, MapReduce, Oozie, Hive, Pig, Sqoop, Flume, ZooKeeper, HBase, Storm, Kafka, Spark, Scala.

Methodologies: Agile, Waterfall.

NoSQL Databases: Cassandra, MongoDB, HBase.

Version Control Tools: SVN, CVS, VSS, PVCS.

PROFESSIONAL EXPERIENCE:

Confidential, Charlotte, NC

Senior Hadoop Developer

Responsibilities:

  • Worked with Hive queries through Spark SQL, integrated with a Spark environment implemented in Scala.
  • Used the Spark Streaming API with Kafka to build live dashboards; worked on RDD transformations and actions, Spark Streaming, pair RDD operations, checkpointing, and SBT.
  • Implemented a POC to migrate MapReduce jobs to Spark RDD transformations using Scala IDE for Eclipse.
  • Created Hive tables to import large data sets from various relational databases using Sqoop, and exported the analyzed data back for visualization and report generation by the BI team.
  • Installed and configured Hive, Sqoop, Flume, and Oozie on the Hadoop clusters.
  • Scheduled Oozie workflows to run multiple Hive and Pig jobs.
  • Developed a process for batch ingestion of CSV files and Sqoop imports from different sources, and for generating views on the data sources, using shell scripting and Python.
  • Integrated a shell script to create collections, morphlines, and Solr indexes on top of table directories using the MapReduceIndexerTool within the batch ingestion framework.
  • Implemented partitioning, dynamic partitions, and buckets in Hive.
  • Configured the Message Driven Beans (MDB) for messaging to different clients and agents who are registered with the system.
  • Developed Hive scripts to create views and apply transformation logic in the target database.
  • Involved in the design of a data mart and data lake to provide faster insight into the data.
  • Used the StreamSets Data Collector tool and created data flows for one of the streaming applications.
  • Used Kafka as a data pipeline between JMS (producer) and a Spark Streaming application (consumer).
  • Involved in developing a Spark Streaming application for one of the data sources using Scala and Spark, applying the required transformations.
  • Developed web services and web service clients using both SOAP and REST implementations.
  • Designed and developed web-based applications using Hibernate, XML, EJB, and SQL to set up new web services.

Environment: Hadoop, HDFS, Hive, HBase, ZooKeeper, Oozie, Impala, Java (jdk1.6), Cloudera, Oracle, Teradata, SQL Server, UNIX Shell Scripting, Flume, Scala, Spark, Sqoop, Python.

Confidential, Charlotte, NC

Sr. Hadoop Developer

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data from relational databases into HDFS using Sqoop imports.
  • Ingested data into HDFS from various mainframe DB2 tables using Sqoop.
  • Populated HDFS and Cassandra with huge amounts of data using Apache Kafka.
  • Migrated existing MapReduce programs to Spark models using Python.
  • Automated the Spark Streaming process using Kafka.
  • Used RDDs to perform transformations on datasets and actions such as count, reduce, and first.
  • Good knowledge of Spark platform parameters such as memory, cores, and executors.
  • Developed simple and complex MapReduce programs in Java for Data Analysis on different data formats.
  • Tested data for conformance with standard or customized patterns using Talend.
  • Extracted files from CouchDB through Sqoop, placed them in HDFS, and processed them.
  • Ran Hadoop Streaming jobs to process terabytes of XML data.
  • Administered, installed, upgraded, and managed distributions of Hadoop, Hive, and HBase.
  • Loaded data into HBase tables using Java MapReduce.
  • Used AWS cloud infrastructure to manage product development and implementation.
  • Involved in performance troubleshooting and tuning of Hadoop clusters.
  • Created Hive tables, loaded data, and wrote Hive queries that run as MapReduce jobs.
  • Implemented business logic by writing Hive UDFs in Java.
  • Gained good experience with NoSQL databases.
  • Loaded data from the UNIX file system into HDFS.
  • Installed and configured Hive and wrote Hive UDFs.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Wrote XML scripts to build Oozie functionality.
  • Used Oozie operational services for batch processing and dynamic workflow scheduling.
  • Worked extensively on end-to-end data pipeline orchestration using Oozie.
  • Evaluated the suitability of Hadoop and its ecosystem for the project, implementing and validating various proof-of-concept (POC) applications with a view to adopting them as part of the Big Data Hadoop initiative.
  • Used Sqoop and mongodump to move data between MongoDB and HDFS.
  • Developed workflows using custom MapReduce.
  • Developed a suite of unit test cases for Mapper, Reducer, and Driver classes using the MR testing library.

Environment: Java 6, Python, Linux, Hadoop, HBase, Sqoop, Kafka, Pig, Hive, Cloudera Hadoop Distribution, HDFS, MapReduce, MongoDB, Shell scripting, Flume, Spark.

Confidential, Frederick, MD

Sr. Hadoop Developer

Responsibilities:

  • Involved in low-level design for MR, Hive, and shell scripts to process data.
  • Worked extensively in creating MapReduce jobs to power data for search and aggregation.
  • Designed a data warehouse using Hive.
  • Created partitioned tables and Hive queries for ad hoc access.
  • Worked extensively with Sqoop for importing metadata from Oracle.
  • Extensively used Pig for data cleansing.
  • Imported real-time data into Hadoop using Kafka and implemented the corresponding Oozie job.
  • Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
  • Worked with data serialization formats for converting complex objects into byte sequences, using Avro, Parquet, JSON, and CSV.
  • Administered, installed, upgraded, and managed distributions of Hadoop, Hive, and HBase.
  • Advanced knowledge of performance troubleshooting and tuning of Hadoop clusters.
  • Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources.
  • Used Oozie operational services for batch processing and dynamic workflow scheduling.
  • Worked extensively on Hive data stores for text, Avro, and RC storage formats.
  • Worked on populating analytical data stores for data science team.
  • Created tools using Java for performing balance tests.
  • Worked with architects to build efficient Oozie workflows with coordinators.
  • Evaluated and reconfigured the company's Unix/Linux/Oracle setup, including reallocating SAN disk space, to engineer a robust, scalable solution.
  • Integrated the Hive warehouse with HBase.
  • Wrote Python scripts to parse XML documents and load the data into the database.
  • Wrote custom Hive UDFs in Java where the required functionality was too complex for standard Hive queries.
  • Implemented partitioning, dynamic partitions, and buckets in Hive.
  • Generated final reporting data using Tableau for testing, connecting to the corresponding Hive tables through the Hive ODBC connector.
  • Maintained system integrity of all sub-components (primarily HDFS, MR, HBase, and Hive).

Environment: Hadoop, MapReduce, HDFS, Hive, Java (jdk1.6), Hortonworks Hadoop distribution, Oozie, Core Java, Pig, Sqoop, Shell scripting, Kafka, Linux, HBase, Oracle.

Confidential, Hillsboro, OR

Hadoop Developer

Responsibilities:

  • Wrote MapReduce jobs using Java API.
  • Importing and exporting data into HDFS and Hive using Sqoop.
  • Defined a logical architecture of the layers and components of the Apache Spark solution; selected the right products to implement a big data solution.
  • Reliability and Ease of Scalability over traditional MSMQ.
  • Expertise in monitoring and administering Spark applications; wrote PySpark scripts.
  • Wrote complex SQL queries on the Teradata platform based on the given requirements.
  • Extracted data from different source systems, such as Oracle, MySQL, and SQL Server databases, for application development and report maintenance.
  • Worked on several BTEQ scripts to transform the data and load it into the Teradata database.
  • Performed data analysis and prepared the physical database based on the requirements.
  • Created UNIX shell scripts and wrapper scripts used to run BTEQ and other Teradata jobs from the Control Panel tool.
  • Developed entire frontend and backend modules using Python on Django Web Framework.
  • Communicated actively with the offshore support team during the development, testing, and production implementation phases of the project.
  • Worked on tuning and troubleshooting the Teradata system at various levels. Performed unit, regression, and integration testing.
  • Assisted the testing team in developing SQL and PL/SQL scripts for automated testing.
  • Maintained system integrity of all Hadoop-related sub-components.
  • Involved in Cassandra database schema design; pushed data to Cassandra databases using the bulk load utility.

Environment: Hadoop, Hive, Spark, Spark SQL, Sqoop, Kafka, Teradata Viewpoint, UNIX, UNIX Shell Scripting, TPT, FastExport, BTEQ, GitHub, Framework, Pig, Oracle, MySQL.

Confidential, Irving, TX

Java/Hadoop Developer

Responsibilities:

  • Created class diagrams, sequence diagrams, data models, and object models using Rational Rose and MS Visio.
  • Used JSF Framework to develop the application.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Implemented Spring Quartz jobs to generate feeds for various downstream applications.
  • Used Rational Rose to draw UML diagrams and to develop the use cases, domain model, and design model.
  • Implemented functionality using Java, J2EE, JSP, AJAX, and Servlets.
  • Generated dynamic charts using the JFreeChart API in Java.
  • Developed Java batch jobs for performance updates, implementing multithreading concepts.
  • Involved in database programming in DB2.
  • Created the Stored Procedures, functions and triggers using PL/SQL.
  • Implemented the Struts MVC framework with Tiles and validators.
  • Application UI development using AJAX, HTML, JSP, XML and CSS.
  • Developed an automated mail notification system using the JavaMail API and Java FTP programming.
  • Involved in database programming in Oracle 10g.
  • Worked as module/tech lead for various modules of the application, such as GCSP and ORNIS.

Environment: Java, J2EE, JSP, MVC, Eclipse, web services, SOAP, WSDL, UDDI, JavaScript, MTG, AJAX, JDBC, WAS 5.1, Oracle 10g, PL/SQL, HTML, DHTML, XML.

Confidential

Software Engineer

Responsibilities:

  • Conducted requirements-gathering sessions with business users to collect business requirements (BRDs), data requirements, and user interface requirements.
  • Responsible for the initiation, planning, execution, control, and completion of the project.
  • Worked alongside the development team in solving critical issues during development.
  • Responsible for developing management reporting using the Cognos reporting tool.
  • Conducted user interviews and documented reconciliation workflows.
  • Worked with business and system analysts to complete development on time.
  • Implemented the presentation layer with HTML, CSS and JavaScript.
  • Developed web components using JSP, Servlets and JDBC.
  • Implemented secured cookies using Servlets.
  • Wrote complex SQL queries and stored procedures.
  • Implemented the persistence layer using the Hibernate API.
  • Implemented search queries using the Hibernate Criteria interface.
  • Conducted detailed analysis of current processes and developed new process flow, data flow, and workflow models and use cases using Rational Rose and MS Visio.
  • Maintained responsibility for database design, implementation, and administration.
  • Tested the functionality and behavioral aspects of the software.

Environment: UNIX, Windows, Core Java, SQL, JDBC, JavaScript, HTML, JSP, Servlets, Oracle, J2EE, JCL, DB2, CICS.
