Senior Hadoop Developer Resume
Charlotte, NC
SUMMARY
- 8+ years of experience in the analysis, design, development, and implementation of web-based distributed applications.
- 4+ years of experience in Hadoop, HDFS, Map Reduce, Sqoop, Pig, Hive and HBase, LINUX, UNIX.
- Worked on a Hadoop Cluster with current size of 56 Nodes and 896 Terabytes capacity.
- Hands-on experience with Oozie for job scheduling, ZooKeeper for coordination, and Kafka for messaging.
- Strong techno-functional skills for translating business scenarios into Big Data use cases.
- In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.
- Capable of Designing and Architecting Hadoop Applications and recommending the right solutions and technologies for the application.
- Developed many MapReduce programs in Java for data cleansing, data filtering, and data aggregation.
- Re-engineered many legacy mainframe applications into Hadoop using the MapReduce API to reduce mainframe MIPS and storage costs.
- Developed UDFs in Java as and when necessary to use in PIG and HIVE queries.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Good knowledge of Hortonworks Data Platform 2.2.
- Experience with the Oozie workflow engine, running workflow jobs with actions that run Hadoop MapReduce and Pig jobs.
- Experience working with the NoSQL databases MongoDB and Apache Cassandra.
- Experience in managing and reviewing Hadoop log files.
- Developed tools using Python, Shell scripting, XML to automate some of the menial tasks
- Extending Hive and Pig core functionality by writing custom UDFs.
- Good experience working with distributions such as MapR, Hortonworks, and Cloudera.
- Good knowledge of AWS services such as EMR and EC2, which provide fast and efficient processing of Big Data.
- Experienced in the integration of various data sources like Java, RDBMS, Shell Scripting, Spreadsheets, and Text files.
- Experience in working with Oracle and DB2.
- Experience in Web Services using XML, HTML and SOAP.
- Excellent Java development skills using J2EE, J2SE, Servlets, JUnit, JSP, JDBC.
- Familiar with popular frameworks like Struts, Hibernate, Spring, MVC and AJAX.
- Implemented Proofs of Concept on Hadoop stack and different big data analytic tools.
- Experience in migration from different databases (e.g., VSAM, DB2, PL/SQL and MySQL) to Hadoop.
- Experience in NoSQL databases like HBase, Cassandra and MongoDB.
- Committed to timely, quality work; quick learner who adapts readily to new technologies and works well both within a team and across teams.
- Defined and Developed ETL process to automate the data conversions, catalog uploading, error handling and auditing using Talend.
- Highly motivated and a self-starter with effective communication and organizational skills, combined with attention to detail and business process improvements.
TECHNICAL SKILLS:
Technology: Hadoop Ecosystem/ J2SE/ J2EE/ Oracle.
Operating Systems: Windows Vista/XP/NT/2000 Series, UNIX/LINUX (Ubuntu, CentOS, Red Hat)/ AIX/ Solaris.
DBMS/Databases: DB2, MySQL, SQL, PL/SQL.
Programming Languages: C, C++, JSE, XML, JSP/Servlets, Struts, Spring, HTML, JavaScript, JQuery, Web services.
Big Data Ecosystem: HDFS, Map Reduce, Oozie, Hive, Pig, Sqoop, Flume, Zookeeper and HBase, Storm, Kafka, Spark, Scala.
Methodologies: Agile, Waterfall.
NOSQL Databases: Cassandra, MongoDB, HBase.
Version Control Tools: SVN, CVS, VSS, PVCS.
PROFESSIONAL EXPERIENCE:
Confidential, Charlotte, NC
Senior Hadoop Developer
Responsibilities:
- Knowledge of handling Hive queries using Spark SQL that integrates with a Spark environment implemented in Scala.
- Used Spark Streaming API with Kafka to build live dashboards; Worked on Transformations & actions in RDD, Spark Streaming, Pair RDD Operations, Check-pointing, and SBT.
- Implemented POC to migrate map reduce jobs into Spark RDD transformation using Scala IDE for Eclipse
- Creating Hive tables to import large data sets from various relational databases using Sqoop and export the analyzed data back for visualization and report generation by the BI team.
- Installing and configuring Hive, Sqoop, Flume, Oozie on the Hadoop clusters.
- Involved in scheduling Oozie workflow engine to run multiple Hive and Pig jobs.
- Developed a process for the batch ingestion of CSV files and Sqoop imports from different sources, and for generating views on the data sources, using shell scripting and Python.
- Integrated a shell script to create collections/morphlines and Solr indexes on top of table directories using the MapReduce Indexer Tool within the batch ingestion framework.
- Implemented partitioning, dynamic partitions and buckets in HIVE.
- Configured the Message Driven Beans (MDB) for messaging to different clients and agents who are registered with the system.
- Developed Hive Scripts to create the views and apply transformation logic in the Target Database.
- Involved in the design of Data Mart and Data Lake to provide faster insight into the Data.
- Used the StreamSets Data Collector tool and created data flows for one of the streaming applications.
- Experienced in using Kafka as a data pipeline between JMS (producer) and a Spark Streaming application (consumer).
- Involved in the development of a Spark Streaming application for one of the data sources, applying transformations using Scala and Spark.
- Developed web services and web service clients using both SOAP and REST implementations.
- Designed and Developed web based applications using Hibernate, XML, EJB, and SQL to setup new web services.
Environment: Hadoop, HDFS, Hive, HBase, Zookeeper, Oozie, Impala, Java (jdk1.6), Cloudera, Oracle, Teradata, SQL Server, UNIX Shell Scripting, Flume, Scala, Spark, Sqoop, Python.
Confidential, Charlotte, NC
Sr. Hadoop Developer.
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Importing and exporting data into HDFS and Hive using Sqoop.
- Involved in loading and transforming large sets of structured, semi structured and unstructured data from relational databases into HDFS using Sqoop imports.
- Ingested data into HDFS from various mainframe DB2 tables using Sqoop.
- Populated HDFS and Cassandra with huge amounts of data using Apache Kafka.
- Migrated Existing Map Reduce programs to Spark Models using Python.
- Automated Spark streaming process using Kafka
- Used RDDs to perform transformations on datasets as well as actions like count, reduce, and first.
- Good knowledge of Spark platform parameters like memory, cores, and executors.
- Developed simple and complex MapReduce programs in Java for Data Analysis on different data formats.
- Tested data for conformance with standard or customized patterns using Talend.
- Extracted files from CouchDB through Sqoop, placed them in HDFS, and processed them.
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
- Administration, installing, upgrading, and managing distributions of Hadoop, Hive, Hbase.
- Loading data into HBase tables using Java MapReduce.
- Used AWS cloud infrastructure to manage product development and implementation.
- Involved in performance troubleshooting and tuning of Hadoop clusters.
- Created Hive tables, loaded data, and wrote Hive queries that run internally as MapReduce jobs.
- Implemented business logic by writing Hive UDFs in Java.
- Gained good experience with NoSQL databases.
- Involved in loading data from UNIX file system to HDFS.
- Installed and configured Hive and wrote Hive UDFs.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Wrote XML scripts to build OOZIE functionality.
- Used OOZIE Operational Services for batch processing and scheduling workflows dynamically.
- Extensively worked on creating End-End data pipeline orchestration using Oozie.
- Evaluated the suitability of Hadoop and its ecosystem for the current project, implementing and validating various Proof of Concept (POC) applications with a view to adopting them as part of the Big Data Hadoop initiative.
- Used Sqoop and mongodump to move data between MongoDB and HDFS.
- Developed workflows using custom MapReduce.
- Developed a suite of unit test cases for Mapper, Reducer and Driver classes using MR Testing library.
Environment: Java 6, Python, Linux, Hadoop, HBase, Sqoop, Kafka, Pig, Hive, Cloudera Hadoop Distribution, HDFS, MapReduce, MongoDB, Shell scripting, Flume, Spark.
Confidential, Frederick, MD
Sr. Hadoop Developer.
Responsibilities:
- Involved in Low level design for MR, Hive, Shell scripts to process data.
- Worked extensively in creating MapReduce jobs to power data for search and aggregation.
- Designed a data warehouse using Hive.
- Created partitioned tables and hive queries for ad hoc access in Hive.
- Worked extensively with Sqoop for importing metadata from Oracle.
- Extensively used Pig for data cleansing.
- Imported real-time data into Hadoop using Kafka and implemented the Oozie job.
- Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
- Worked on data serialization formats for converting complex objects into sequences of bits, using Avro, Parquet, JSON, and CSV formats.
- Administration, installing, upgrading, and managing distributions of Hadoop, Hive, HBase.
- Advanced knowledge of performance troubleshooting and tuning of Hadoop clusters.
- Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources.
- Used OOZIE Operational Services for batch processing and scheduling workflows dynamically.
- Extensively worked on HIVE data stores for text, Avro and RC storage formats.
- Worked on populating analytical data stores for data science team.
- Created tools using Java for performing balance tests.
- Worked with architects to build efficient Oozie workflows with coordinators.
- Evaluated and reconfigured the company's Unix/Linux/Oracle setup, including reallocating SAN disk space, to engineer a robust, scalable solution.
- Integrated the hive warehouse with HBase.
- Wrote Python scripts to parse XML documents and load the data in database
- Wrote customized Hive UDFs in Java where the functionality was too complex.
- Implemented Partitioning, Dynamic Partitions, Buckets in HIVE.
- Generated final reporting data using Tableau for testing by connecting to the corresponding Hive tables using the Hive ODBC connector.
- Maintained System integrity of all sub-components (primarily HDFS, MR, HBase, and Hive).
Environment: Hadoop, MapReduce, HDFS, Hive, Java (jdk1.6), Hortonworks Hadoop distribution, Oozie, Core Java, Pig, Sqoop, Shell scripting, Kafka, LINUX, HBase, Oracle.
Confidential, Hillsboro, OR
Hadoop Developer
Responsibilities:
- Wrote MapReduce jobs using Java API.
- Importing and exporting data into HDFS and Hive using Sqoop.
- Defining a logical architecture of the layers and components of Apache Spark solution. Selecting the right products to implement a big data solution.
- Reliability and Ease of Scalability over traditional MSMQ.
- Expertise in monitoring and administration of Spark applications. Involved in writing PySpark scripts.
- Involved heavily in writing complex SQL queries based on the given requirements on Teradata platform.
- Extracted Data through different source systems like Oracle, MySQL and SQL Server Databases for Applications Development and report Maintenances.
- Worked on several BTEQ scripts to transform the data and load into the Teradata database.
- Performed Data analysis and prepared the physical database based on the requirements.
- Involved in creating the Unix shell scripts/wrapper scripts that are used to run the BTEQ and other Teradata jobs from the Control Panel tool.
- Developed entire frontend and backend modules using Python on Django Web Framework.
- Involved in active communication and interaction with offshore support team during the development, Testing and production implementation phases of the project.
- Worked on Tuning, and troubleshooting Teradata system at various levels. Performed unit testing, regression testing and Integration testing.
- Assisted the Testing team in developing SQL/PLSQL scripts for Automated Testing.
- Maintained system integrity of all sub-components related to Hadoop.
- Involved in Cassandra database schema design; pushed data to Cassandra databases using the bulk load utility.
Environment: Hadoop, Hive, Spark, Spark-SQL, Sqoop, Kafka, Teradata Viewpoint, UNIX, UNIX Shell Scripting, TPT, Fast Export, and BTEQ, GitHub, Framework, Pig, SQOOP, ORACLE, MySQL.
Confidential, Irving, TX
Java/Hadoop Developer.
Responsibilities:
- Creating class diagrams, sequence diagrams, Data Model and Object Model using Rational Rose and MS-Visio.
- Used JSF Framework to develop the application.
- Responsible for building scalable distributed data solutions using Hadoop.
- Implemented Spring quartz Jobs for generating feed to the various downstream applications.
- Used Rational Rose to draw UML diagrams and to develop the Use cases, Domain model and Design Model
- Implemented the functionalities using Java, J2EE, JSP, and AJAX, Servlets.
- Dynamic chart generation using JFreeChart API in java
- Developed Java batch jobs and implemented multithreading concepts to improve performance.
- Involved in Database programming in DB2.
- Created the Stored Procedures, functions and triggers using PL/SQL.
- Implemented struts MVC framework with tiles and validators.
- Application UI development using AJAX, HTML, JSP, XML and CSS.
- Developed automation and a mail notification system using the JavaMail API and FTP programming in Java.
- Involved in database programming in Oracle 10g.
- Worked as a module/tech lead for various modules like GCSP, ORNIS of the application.
Environment: Java, J2EE, JSP, MVC, Eclipse, web services, SOAP, WSDL, UDDI, JavaScript, MTG, AJAX, JDBC, WAS 5.1, Oracle 10g, PL/SQL, HTML, DHTML, XML.
Confidential
Software Engineer
Responsibilities:
- Conducted requirements gathering sessions with the business user to collect business requirements (BRDs), data requirement, and user interface requirements.
- Responsible for the initiation, planning, execution, control and completion of the project
- Worked alongside the Development team in solving critical issues during the development.
- Responsible for developing management reporting using the Cognos reporting tool.
- Conducted User Interview and documented reconciliation work flows.
- Worked with Business and System Analyst to complete the development in time.
- Implemented the presentation layer with HTML, CSS and JavaScript.
- Developed web components using JSP, Servlets and JDBC.
- Implemented secured cookies using Servlets.
- Wrote complex SQL queries and stored procedures.
- Implemented the persistence layer using the Hibernate API.
- Implemented Search queries using Hibernate Criteria interface.
- Conducted detailed analysis of current processes and developed new process flow, data flow, and work flow models, Use Cases using Rational Rose & MS Visio
- Maintained responsibility for database design, implementation, and administration.
- Testing the functionality and behavioral aspect of the software.
Environment: UNIX, Windows, Core Java, SQL, JDBC, JavaScript, HTML, JSP, Servlets, Oracle, J2EE, JCL, DB2, CICS.
