Sr Hadoop Developer Resume
Kansas, MO
SUMMARY:
- 4 years of strong experience as an application developer responsible for building REST services, multi-threaded applications, and I/O programming using Java.
- Strong hands-on experience using major components of the Hadoop ecosystem such as Spark, MapReduce, Hive, Pig, HBase, Sqoop, Splunk, Oozie, Flume, and Kafka.
- Hands-on experience developing and debugging YARN (MR2) jobs to process large datasets.
- Excellent knowledge and understanding of distributed computing and parallel processing frameworks.
- Strong experience with developing end-to-end Spark applications in Scala.
- Worked extensively on troubleshooting issues related to memory management and resource management within Spark applications.
- Strong knowledge of fine-tuning Spark applications and Hive scripts.
- Wrote complex MapReduce jobs to perform various data transformations on large-scale datasets.
- Experience in installing, configuring, and monitoring Hadoop clusters, both in-house and in the cloud (AWS).
- Extended Hive and Pig core functionality by writing custom UDFs for data analysis.
- Handled importing data from various data sources, performed transformations, and developed and debugged MR2 jobs to process large data sets.
- Experienced in writing MapReduce programs and UDFs for both Hive and Pig in Java.
- Experience using Splunk and Apache Flume for collecting, aggregating, and moving large amounts of data from application servers.
- Used Sqoop extensively for ingesting data from relational databases.
- Good knowledge of Kafka for streaming real-time feeds from external REST applications to Kafka topics.
- Strong knowledge of the entire SDLC: requirement gathering and analysis, planning, design, development, testing, and implementation.
- Involved in the design and development of technical specifications using Hadoop ecosystem tools.
- Used NoSQL technologies such as HBase and MongoDB for data extraction and for storing huge volumes of data.
- Experience with the Oozie workflow engine in running workflow jobs with actions that run Hadoop MapReduce, Hive, Sqoop, and Spark jobs.
- Expertise in writing MapReduce jobs using native Java code, Pig, and Hive for data processing.
- Used SVN repository for version control of the developed code
- Experience working with NoSQL databases including Cassandra and HBase.
- Major strengths include familiarity with multiple software systems, the ability to learn new technologies quickly and adapt to new environments, and being a self-motivated, focused, and adaptive team player with excellent interpersonal, technical, and communication skills.
- Strong oral and written communication, initiative, interpersonal, and organizational skills, matched with the ability to manage time and people effectively.
TECHNICAL SKILLS:
Big Data Ecosystem: HDFS, MapReduce, Hive, Pig, HBase, Spark, Spark Streaming, Spark SQL, Kafka, Cloudera CDH4, CDH5, Hortonworks, Hadoop Streaming, Splunk, ZooKeeper, Oozie, Sqoop, Flume, Impala, Solr, and Ranger.
Database: Oracle 10g/11g, SQL Server 2005/2008 R2, MySQL, DB2, HBase, MongoDB, Cassandra.
Framework: Struts, Spring, Hibernate
Operating Systems: Windows Server 2008/2003/2000, Windows 95/98/XP/Vista/7, DOS, Red Hat Linux, Mac OS X.
Database Tools: SQL Enterprise Manager, SQL Profiler, Query Analyzer, SQL Server Setup, Security Manager, Service Manager, DTS, Import/Export Data, Bulk Insert, SQL Server Reporting Services (SSRS)
Programming Languages: Java, Scala, Python, SQL
Script Languages: JavaScript, jQuery, Shell Script (Bash)
Methodologies: Waterfall, Iterative, Agile/Scrum
PROFESSIONAL EXPERIENCE:
Confidential, Kansas, MO
Sr Hadoop Developer
Responsibilities:
- Developed Spark applications in Scala using DataFrames and the Spark SQL API for faster data processing.
- Developed highly optimized Spark applications to perform various data cleansing, validation, transformation, and summarization activities according to requirements.
- Built a data pipeline consisting of Spark, Hive, Sqoop, and custom-built input adapters to ingest, transform, and analyze operational data.
- Developed Spark jobs and Hive Jobs to summarize and transform data.
- Used Spark for interactive queries, processing of streaming data, and integration with a popular NoSQL database for huge volumes of data.
- Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames and Scala.
- Analyzed the SQL scripts and designed the solution to implement using Scala.
- Streamed data in real time using Spark with Kafka.
- Created applications that monitor consumer lag within Apache Kafka clusters; these are used in production by multiple report suites.
- Ingested syslog messages, parsed them, and streamed the data to Apache Kafka.
- Handled importing data from different data sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and then loaded the data into HDFS.
- Exported the analyzed data to relational databases using Sqoop for visualization and report generation by the BI team.
- Collected and aggregated large amounts of log data using Flume, staging the data in HDFS for further analysis.
- Analyzed the data by running Hive queries (HiveQL) to study customer behavior.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Developed Hive scripts in HiveQL to de-normalize and aggregate the data.
- Created HBase tables and column families to store the user event data.
- Scheduled and executed workflows in Oozie to run various jobs.
Environment: Hadoop, HDFS, HBase, Spark, Scala, Hive, MapReduce, Sqoop, ETL, Java, PL/SQL, Oracle 11g, Unix/Linux
Confidential (Ford Direct), Dearborn
Hadoop Developer
Responsibilities:
- Created end-to-end Spark applications in Scala to perform various data cleansing, validation, transformation, and summarization activities on user behavioral data.
- Developed a custom input adapter using the HDFS FileSystem API to ingest clickstream log files from an FTP server into HDFS.
- Developed an end-to-end data pipeline using the FTP adapter, Spark, Hive, and Impala.
- Implemented Spark jobs and used Spark SQL heavily for faster development and data processing.
- Explored Spark for improving the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Involved in converting Hive/SQL queries into Spark transformations using Spark with Scala.
- Used Scala collection framework to store and process the complex consumer information.
- Implemented a prototype for real-time data streaming using Spark Streaming with Kafka.
- Handled importing other enterprise data from different data sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and then loaded the data into HBase tables.
- Exported the analyzed data to relational databases using Sqoop for visualization and report generation by the BI team.
- Collected and aggregated large amounts of log data using Flume, staging the data in HDFS for further analysis.
- Analyzed the data by running Hive queries (HiveQL) and Pig scripts (Pig Latin) to study customer behavior.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Created Hive UDFs to provide functionality missing from Hive for analytics.
- Worked on various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Created, validated, and maintained scripts to load data manually using Sqoop.
- Created Oozie workflows and coordinators to automate Sqoop jobs on a weekly and monthly basis.
- Uploaded and processed more than 30 terabytes of data from various structured and unstructured sources into HDFS (AWS cloud) using Sqoop and Flume.
- Used Oozie and Oozie coordinators to deploy end-to-end data processing pipelines and schedule the workflows.
- Continuously monitored and managed the Hadoop cluster.
- Developed interactive shell scripts for scheduling various data cleansing and data loading processes.
- Performed data wrangling and created workable datasets.
Environment: HDFS, Pig, Hive, Sqoop, Flume, Spark, Scala, MapReduce, Oozie, Oracle 11g, YARN, UNIX Shell Scripting, Agile Methodology
Confidential, Warren, NJ
Big Data/Hadoop Developer
Responsibilities:
- Led a team of three developers that built a scalable distributed data solution using Hadoop on a 30-node AWS cluster to run analysis on 25+ terabytes of customer usage data.
- Developed several complex MapReduce programs to analyze and transform the data to uncover insights into the customer usage patterns.
- Used MapReduce to index large amounts of data for quick access to specific records.
- Performed ETL using Pig, Hive and MapReduce to transform transactional data to de-normalized form.
- Configured periodic incremental imports of data from DB2 into HDFS using Sqoop.
- Exported data from HDFS to Teradata using Sqoop on a regular basis.
- Developed ETL scripts for data acquisition and transformation using Informatica and Talend.
- Installed and configured Flume, Hive, Pig, Sqoop, and HBase on the Hadoop cluster.
- Exported analyzed data to relational databases using Sqoop for visualization and report generation by the BI team.
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop.
- Worked extensively on importing metadata into Hive and migrated existing tables and applications to work on Hive and the AWS cloud.
- Wrote Pig and Hive UDFs to analyze complex data and find specific user behavior.
- Used the Oozie workflow engine to schedule multiple recurring and ad-hoc Hive and Pig jobs.
- Created HBase tables to store various data formats coming from different portfolios.
- Created Python scripts to automate workflows.
- Extracted feeds from social media sites such as Facebook and Twitter using Python scripts.
- Designed and implemented Hive and Pig UDFs in Python for evaluating, filtering, loading, and storing data.
- Developed simple to complex MapReduce streaming jobs in Python that are invoked from Hive and Pig.
- Used TIBCO Jaspersoft for embedding BI reports.
- Wrote Python scripts for automated jobs.
- Assisted the team responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups and Hadoop log files.
- Conversion of Teradata, RDBMS are formulated in Hadoop backlog files.
- Worked actively with various teams to understand and gather data from different sources based on business requirements.
- Worked with the testing teams to fix bugs and ensure smooth and error-free code.
Environment: Hadoop, MapReduce, HDFS, Hive, Java, SQL, Cloudera Manager, Pig, Sqoop, Oozie, HBase, ZooKeeper, PL/SQL, MySQL, DB2, Teradata.
Confidential, Salt Lake City, UT
Hadoop Developer
Responsibilities:
- Developed efficient MapReduce programs on the AWS cloud to process more than 20 years' worth of claims data and to detect and separate fraudulent claims.
- Developed medium- to high-complexity MapReduce programs from scratch.
- Uploaded and processed more than 30 terabytes of data from various structured and unstructured sources into HDFS (AWS cloud) using Sqoop and Flume.
- Played a key role in setting up a 40-node Hadoop cluster running Apache MapReduce, working closely with the Hadoop administration team.
- Worked with the advanced analytics team to design fraud detection algorithms, then developed MapReduce programs to run the algorithms efficiently on huge datasets.
- Developed Java programs to perform data scrubbing for unstructured data.
- Responsible for designing and managing the Sqoop jobs that uploaded the data from Oracle to HDFS and Hive.
- Created Hive tables to import large data sets from various relational databases using Sqoop, and exported the analyzed data back for visualization and report generation by the BI team.
- Used Flume to collect log data containing error messages from across the cluster.
- Designed and maintained Oozie workflows to manage the flow of jobs in the cluster.
- Played a key role in the installation and configuration of various Hadoop ecosystem tools such as Hive, Pig, and HBase.
- Successfully loaded files into HDFS from Teradata, and from HDFS into Hive.
- Used ZooKeeper and Oozie for coordinating the cluster and scheduling workflows.
- Installed the Oozie workflow engine and scheduled it to run date/time-dependent Hive and Pig jobs.
- Designed and developed Dashboards for Analytical purposes using Tableau.
- Analyzed Hadoop log files using Pig scripts to identify errors.
- Provided upper management with daily updates on project progress, including the classification levels in the data.
Environment: Java, Hadoop, MapReduce, Hive, Pig, Sqoop, Flume, HBase, Teradata
Confidential (Capital One), VA
Java/J2EE Developer
Responsibilities:
- Played an effective role on the team by interacting with welfare business analysts/program specialists and transforming business requirements into system requirements.
- Involved in developing the application on the Java/J2EE platform. Implemented the Model-View-Controller (MVC) structure using Struts.
- Enhanced the portal UI using HTML, JavaScript, XML, JSP, Java, and CSS per the requirements, and provided client-side JavaScript validation and server-side validation with the Bean Validation framework (JSR 303).
- Developed Web services component using XML, WSDL, and SOAP with DOM parser to transfer and transform data between applications.
- Developed analysis level documentation such as Use Case, Business Domain Model, Activity, Sequence and Class Diagrams.
- Handled design reviews and technical reviews with other project stakeholders.
- Implemented services using Core Java.
- Developed and deployed the UI layer logic of sites using JSP.
- Used Spring MVC to implement business model logic.
- Used SoapUI to test web services by sending SOAP requests.
- Used AJAX for server communication and a seamless user experience.
- Created a test framework on Selenium and executed web tests in Chrome, IE, and Firefox through WebDriver.
- Worked with Struts MVC objects such as ActionServlet, controllers, validators, web application context, handler mapping, message resource bundles, and JNDI lookup for J2EE components.
- Developed dynamic JSP pages with Struts.
- Employed built-in/custom interceptors, and validators of Struts.
- Developed XML data objects to generate PDF documents and reports.
- Employed Hibernate, DAO, and JDBC for data retrieval and modifications in the database.
- Implemented messaging and web service interaction using SOAP.
- Developed JUnit test cases for unit testing as well as for system and user test scenarios.
Environment: Struts, Hibernate, Spring MVC, SOAP, WSDL, WebLogic, Java, JDBC, JavaScript, Servlets, JSP, JUnit, XML, UML, Eclipse, Windows.
Confidential
Java Developer
Responsibilities:
- Involved in designing the Project Structure, System Design and every phase in the project.
- Responsible for developing platform related logic and resource classes, controller classes to access the domain and service classes.
- Developed UI using HTML, JavaScript, and JSP, and developed Business Logic and Interfacing components using Business Objects, XML, and JDBC.
- Designed the user interface and implemented validations using JavaScript.
- Managed connectivity using JDBC for querying/inserting & data management including triggers and stored procedures.
- Involved in Technical Discussions, Design, and Workflow.
- Participated in requirement gathering and analysis.
- Developed Unit Testing cases using JUnit Framework.
- Implemented the data access using Hibernate and wrote the domain classes to generate the Database Tables.
- Involved in the design of JSPs and Servlets for navigation among the modules.
- Designed the cascading style sheets and XML portion of the Order Entry and Product Search modules, and implemented client-side validations with JavaScript.
- Involved in implementation of view pages based on XML attributes using normal Java classes.
- Involved in integration of App Builder and UI modules with the platform.
Environment: Hibernate, Java, JAXB, JUnit, XML, UML, Oracle11g, Eclipse, Windows XP.
