
Sr. Hadoop Developer Resume


Wilmington, DE

PROFESSIONAL SUMMARY:

  • Versatile Hadoop Developer with over 8 years of experience, including 4 years focused on Hadoop, 3 years in machine learning and deep learning, and 2+ years in Python and Java/J2EE enterprise application design, development, and maintenance.
  • Extensive experience implementing Big Data solutions using various distributions of Hadoop and its ecosystem tools.
  • Hadoop Developer with 4 years of working experience designing and implementing complete end-to-end Hadoop-based data analytical solutions using Spark, MapReduce, YARN, Kafka, Pig, Hive, Sqoop, Storm, Flume, Oozie, Impala, HBase, etc.
  • Experience with Mahout, applying machine learning algorithms for efficient data processing.
  • Good experience in creating data ingestion pipelines, data transformations, data management, data governance and real time streaming at an enterprise level.
  • Experience migrating data between RDBMS and HDFS, and ingesting data from unstructured sources into HDFS, using Sqoop and Flume.
  • Good experience in Object Oriented Programming, using Java & J2EE (Servlets, JSP, Java Beans, EJB, JDBC, RMI, XML, JMS, Web Services, AJAX).
  • Over a year of experience creating real-time data streaming solutions using Apache Spark/Spark Streaming and Kafka (see the sketch after this list).
  • Experience in data processing: collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.
  • Implemented TF, TF-IDF, and LSI, analyzed the results, and applied the K-Means clustering algorithm and cosine similarity.
  • Worked with various file formats, including delimited text files, clickstream log files, Apache log files, Avro files, JSON files, and XML files.
  • Good understanding of compression techniques used in Hadoop processing, such as Gzip, Snappy, and LZO.
  • Expertise in developing MapReduce jobs to scrub, sort, filter, join, and summarize data.
  • Experience developing Pig Latin and HiveQL scripts for data analysis and ETL, extending the default functionality with User Defined Functions (UDFs) and User Defined Aggregate Functions (UDAFs) for custom, data-specific processing.
  • Hands-on experience with full life cycle implementations using CDH (Cloudera) and HDP (Hortonworks Data Platform) distributions.
  • In-depth understanding of Hadoop architecture and its components, such as the Resource Manager, Application Master, NameNode, DataNode, and HBase design principles.
  • Strong knowledge of distributed systems architecture and parallel processing; in-depth understanding of the MapReduce programming paradigm and the Spark execution framework.
  • Profound understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
  • Experience handling messaging services using Apache Kafka (development of producers and consumers).
  • Extensive experience importing and exporting data using Flume and Kafka.
  • Analyzed data with Hue, using Apache Hive via Hue’s Beeswax and Catalog applications.
  • Strong experience collecting and storing stream data, such as log data, in HDFS using Apache Flume.
  • Good understanding of cloud configuration in Amazon Web Services (AWS).
  • Experienced in working with Amazon Web Services (AWS), using EC2 for computing and S3 as a storage mechanism.
  • Experience in job workflow scheduling and monitoring tools like Oozie.
  • Hands-on experience with NoSQL databases including HBase, Cassandra, and MongoDB.
  • Experience working with the Java HBase API to ingest processed data into HBase tables.
  • Experience with the Oozie Workflow Engine to automate and parallelize Hadoop MapReduce, Hive, and Pig jobs.
  • Proficient in using Cloudera Manager, an end-to-end tool to manage Hadoop operations in Cloudera Cluster.
  • Good experience working with cloud environments such as Amazon Web Services (AWS) EC2 and S3.
  • Exposure to AWS Lambda functions, AWS architecture, and EMR.
  • Experience in developing ETL scripts for data acquisition and transformation using Informatica and Talend.
  • Knowledge of Enterprise Data Warehouse (EDW) architecture and data modeling concepts such as the star schema.
  • Experience implementing auto-complete/auto-suggest functionality using Ajax, jQuery, DHTML, web service calls, and JSON.
  • Intensive work experience in developing enterprise solutions using Java, J2EE, Servlets, JSP, JDBC, Struts, Spring, Hibernate, JavaBeans, JSF, MVC.
  • Profound knowledge of core Java concepts such as I/O, multi-threading, exceptions, RegEx, collections, data structures, and serialization.
  • Expert at creating UML diagrams (use case, activity, class, and sequence diagrams) using Microsoft Visio.
  • Hands-on experience with Python for maintaining and improving web applications.
  • Excellent problem-solving, analytical, communication, presentation, and interpersonal skills that help me be a core member of any team.
  • Very good understanding of the Agile Scrum process.
  • Experience mentoring and working with offshore and distributed teams.
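
As a rough, illustrative sketch of the kind of real-time pipeline described above (Spark Streaming consuming from Kafka and summarizing events per micro-batch), assuming a hypothetical broker address, topic name, and record layout rather than any actual project code:

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object ClickStreamJob {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("ClickStreamJob")
        val ssc = new StreamingContext(conf, Seconds(10)) // 10-second micro-batches

        // Broker list and topic name are illustrative
        val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
        val topics = Set("clickstream")

        val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
          ssc, kafkaParams, topics)

        // Keep only well-formed tab-delimited records and count events per page in each batch
        stream.map(_._2)
          .filter(_.split("\t").length >= 3)
          .map(line => (line.split("\t")(1), 1L))
          .reduceByKey(_ + _)
          .print()

        ssc.start()
        ssc.awaitTermination()
      }
    }

The direct-stream approach lets Spark track Kafka offsets itself instead of relying on a receiver, which is the usual way such micro-batch pipelines are deployed.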

TECHNICAL SKILLS:

Hadoop/Big Data: HDFS, MapReduce, Spark, YARN, Kafka, Pig, Hive, Storm, Sqoop, Flume, Oozie, Impala, HBase, Hue, ZooKeeper, Mahout

Programming Languages: C, Java, PL/SQL, Pig Latin, Python, HiveQL, Scala, SQL

Java/J2EE & Web Technologies: J2EE, EJB, JSF, Servlets, JSP, JSTL, CSS, HTML, XML, AngularJS, AJAX

Development Tools: Eclipse, NetBeans, SVN, Git, Ant, Maven, SOAP UI, JMX Explorer, XML Spy, QC, QTP, Jira, SQL Developer, TOAD

Methodologies: Agile/Scrum, UML, Rational Unified Process and Waterfall.

NoSQL Technologies: Cassandra, MongoDB, HBase.

Frameworks: Struts, Hibernate, Spring MVC

Scripting Languages: Unix Shell Scripting, Perl

Distributed platforms: Hortonworks, Cloudera, MapR

Databases: Oracle 11g/12C, MySQL, MS-SQL Server, Teradata, IBM DB2

Operating Systems: Windows XP/Vista/7/8/10, UNIX, Linux

Software Package: MS Office 2007/2010/2016.

Web/Application Servers: WebLogic, WebSphere Application Server, Apache Tomcat

Visualization: Tableau and MS Excel

Version control: CVS, SVN, GIT, TFS.

PROFESSIONAL EXPERIENCE:

Confidential, Wilmington, DE

Sr. Hadoop Developer

Responsibilities:

  • Created end-to-end Spark applications using Scala to perform data cleansing, validation, transformation, and summarization on user behavioral data (see the sketch after this list).
  • Developed custom Input Adaptor utilizing the HDFS File system API to ingest click stream log files from FTP server to HDFS.
  • Developed end-to-end data pipeline using FTP Adaptor, Spark, Hive and Impala.
  • Used Scala to write code for all Spark use cases.
  • Implemented design patterns in Scala for the application.
  • Implemented Spark using Scala and utilized Spark SQL heavily for faster development and processing of data.
  • Explored Spark to improve the performance and optimization of existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, pair RDDs, and YARN.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Java and Scala.
  • Used Scala collection framework to store and process the complex consumer information.
  • Implemented a prototype to perform Real time streaming the data using Spark Streaming with Kafka
  • Handled importing other enterprise data from different data sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and then loaded the data into HBase tables.
  • Exported the analyzed data to the relational databases using Sqoop, to further visualize and generate reports for the BI team.
  • Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
  • Analyzed the data by performing Hive queries (Hive QL) and running Pig scripts (Pig Latin) to study customer behavior.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Created components like Hive UDFs for missing functionality in HIVE for analytics.
  • Worked on various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
  • Created, validated, and maintained scripts to load data manually using Sqoop.
  • Created Oozie workflows and coordinators to automate Sqoop jobs weekly and monthly.
  • Installed and configured Apache Hadoop, Hive, and Pig environments on AWS.
  • Implemented a proof-of-concept Spark cluster on AWS.
  • Uploaded and processed more than 30 terabytes of data from various structured and unstructured sources into HDFS (AWS cloud) using Sqoop and Flume.
  • Used TIBCO Jaspersoft Studio for iReport analysis on the AWS cloud.
  • Created reports for the BI team using Sqoop to export data into HDFS and Hive.
  • Used Oozie and Oozie coordinators to deploy end-to-end data processing pipelines and scheduling the workflows.
  • Continuously monitored and managed the Hadoop cluster.
  • Used the JUnit framework to perform unit testing of the application.
  • Developed interactive shell scripts for scheduling various data cleansing and data loading processes.
  • Performed data validation on the data ingested using Spark by building a custom model to filter all the invalid data and cleanse the data.
  • Experience with data wrangling and creating workable datasets.
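
A minimal sketch of the Scala cleansing and validation step referenced above; the record layout, HDFS paths, and validation rules are assumptions made for illustration only:

    import org.apache.spark.{SparkConf, SparkContext}

    object CleanseClickLogs {
      // Assumed layout: userId \t timestamp \t url \t status
      def isValid(fields: Array[String]): Boolean =
        fields.length == 4 &&
          fields(0).nonEmpty &&
          fields(1).matches("\\d{13}") && // epoch milliseconds
          fields(3).forall(_.isDigit)

      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("CleanseClickLogs"))

        val raw = sc.textFile("hdfs:///data/raw/clickstream/") // path is illustrative
        val clean = raw.map(_.split("\t"))
                       .filter(isValid)
                       .map(f => (f(0), f(2))) // (userId, url)

        // Summarize page views per user and write back to HDFS for Hive/Impala
        clean.mapValues(_ => 1L)
             .reduceByKey(_ + _)
             .map { case (user, views) => s"$user\t$views" }
             .saveAsTextFile("hdfs:///data/curated/user_page_views/")
      }
    }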

Environment: HDFS, Pig, Hive, Sqoop, Flume, Spark, Scala, MapReduce, Oozie, Oracle 11g, YARN, UNIX Shell Scripting, Agile Methodology

Confidential, Minnetonka, MN

Sr. Spark Developer

Responsibilities:

  • The main aim of the project was to tune the performance of existing Hive queries.
  • Implemented Spark using Scala and Java, utilizing DataFrames and the Spark SQL API for faster processing of data.
  • Created end-to-end Spark applications using Scala to perform data cleansing, validation, transformation, and summarization activities according to the requirements.
  • Worked with Mahout to apply machine learning algorithms for efficient data processing.
  • Developed data pipeline using Spark, Hive and Sqoop, to ingest, transform and analyze operational data.
  • Developed Spark jobs and Hive Jobs to summarize and transform data.
  • Used Spark for interactive queries, processing of streaming data, and integration with popular NoSQL databases for large volumes of data.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames and Scala (see the sketch after this list).
  • Analyzed customer transaction data of a large online retailer and derived recency, frequency, and monetary (RFM) variables to help understand customer behavior and predict the revenue customers would generate in the long run based on early transactions.
  • Analyzed the SQL scripts and designed the solution to implement using Scala.
  • Streamed data in real time using Spark with Kafka.
  • Created an application using Kafka that monitors consumer lag within Apache Kafka clusters; it is used in production by multiple companies.
  • Developed Python scripts to copy data between the clusters; the scripts enable copying large amounts of data very quickly.
  • Ingested syslog messages, parsed them, and streamed the data to Apache Kafka.
  • Handled importing data from different data sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and then loaded the data back into HDFS.
  • Exported the analyzed data to the relational databases using Sqoop, to further visualize and generate reports for the BI team.
  • Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
  • Analyzed the data by performing Hive queries (Hive QL) to study customer behavior.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Developed Hive scripts in Hive QL to de-normalize and aggregate the data.
  • Created HBase tables and column families to store the user event data.
  • Scheduled and executed workflows in Oozie to run Hive jobs.
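
A minimal sketch of converting a HiveQL aggregation into a Spark DataFrame job, as referenced above; the database, table, and column names are hypothetical:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.sql.functions._

    object HiveToSparkSql {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("HiveToSparkSql"))
        val hiveCtx = new HiveContext(sc)

        // Original HiveQL (illustrative):
        //   SELECT customer_id, SUM(amount) AS total_spend
        //   FROM sales.transactions
        //   WHERE txn_date >= '2016-01-01'
        //   GROUP BY customer_id
        val totals = hiveCtx.table("sales.transactions")
          .filter(col("txn_date") >= "2016-01-01")
          .groupBy("customer_id")
          .agg(sum("amount").as("total_spend"))

        // Persist the result as a Hive table for downstream reporting
        totals.write.mode("overwrite").saveAsTable("sales.customer_spend")
      }
    }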

Environment: Hadoop, HDFS, MapR 5.1, HBase, Spark, Scala, Hive, MapReduce, Sqoop, ETL, Java, PL/SQL, Oracle 11g, Unix/Linux, Mahout- Machine Learning

Confidential, New York, NY

Hadoop Developer

Responsibilities:

  • Led a team of three developers that built a scalable distributed data solution using Hadoop on a 30-node AWS cluster to run analysis on 25+ terabytes of customer usage data.
  • Developed several new MapReduce programs to analyze and transform the data to uncover insights into the customer usage patterns.
  • Used MapReduce to index the large volume of data for easy access to specific records.
  • Performed ETL using Pig, Hive and MapReduce to transform transactional data to de-normalized form.
  • Configured periodic incremental imports of data from DB2 into HDFS using Sqoop.
  • Exported data using Sqoop from HDFS to Teradata on a regular basis.
  • Developed ETL scripts for data acquisition and transformation using Informatica and Talend.
  • Installed and configured Flume, Hive, Pig, Sqoop, and HBase on the Hadoop cluster.
  • Exported and analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop.
  • Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive and AWS cloud.
  • Wrote Pig and Hive UDFs to analyze the complex data and find specific user behavior (see the UDF sketch after this list).
  • Used the Oozie workflow engine to schedule multiple recurring and ad-hoc Hive and Pig jobs.
  • Created HBase tables to store various data formats coming from different portfolios.
  • Created Python scripts to automate the workflows.
  • Extracted feeds from social media sites such as Facebook and Twitter using Python scripts.
  • Designed and implemented Hive and Pig UDFs using Python for evaluation, filtering, loading, and storing of data.
  • Developed simple to complex MapReduce streaming jobs using Python that integrate with Hive and Pig.
  • Used TIBCO Jaspersoft for embedding BI reports.
  • Experience writing Python scripts for automated jobs.
  • Assisted the team responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups and Hadoop log files.
  • Converted Teradata and other RDBMS data into Hadoop files as part of backlog processing.
  • Worked actively with various teams to understand and accumulate data from different sources based on the business requirements.
  • Worked with the testing teams to fix bugs and ensure smooth and error-free code.
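
A minimal sketch of a Hive UDF of the kind referenced above, written in Scala against the standard Hive UDF API; the class name and normalization rules are illustrative:

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Normalizes free-form user-agent strings to a coarse device class,
    // so that user behavior can be grouped by device in HiveQL.
    class DeviceClassUDF extends UDF {
      def evaluate(userAgent: Text): Text = {
        if (userAgent == null) return null
        val ua = userAgent.toString.toLowerCase
        val device =
          if (ua.contains("mobile") || ua.contains("android") || ua.contains("iphone")) "mobile"
          else if (ua.contains("tablet") || ua.contains("ipad")) "tablet"
          else "desktop"
        new Text(device)
      }
    }

Once packaged into a jar, a UDF like this is registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION and then called like any built-in function in a query.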

Environment: Hadoop, MapReduce, HDFS, Hive, Java, SQL, Cloudera Manager, Pig, Sqoop, Oozie, HBase, ZooKeeper, PL/SQL, MySQL, DB2, Teradata.

Confidential, Denver, CO

Hadoop Developer

Responsibilities:

  • Responsible for developing efficient MapReduce programs on the AWS cloud to detect and separate fraudulent claims across more than 20 years’ worth of claim data.
  • Developed MapReduce programs of medium to high complexity from scratch.
  • Uploaded and processed more than 30 terabytes of data from various structured and unstructured sources into HDFS (AWS cloud) using Sqoop and Flume.
  • Played a key role in setting up a 40-node Hadoop cluster utilizing Apache MapReduce, working closely with the Hadoop administration team.
  • Worked with the advanced analytics team to design fraud detection algorithms and then developed MapReduce programs to run the algorithms efficiently on huge datasets (see the sketch after this list).
  • Developed Java programs to perform data scrubbing for unstructured data.
  • Responsible for designing and managing the Sqoop jobs that uploaded the data from Oracle to HDFS and Hive.
  • Created Hive tables to import large data sets from various relational databases using Sqoop and exported the analyzed data back for visualization and report generation by the BI team.
  • Used Flume to collect log data with error messages across the cluster.
  • Designed and Maintained Oozie workflows to manage the flow of jobs in the cluster.
  • Played a key role in installation and configuration of the various Hadoop ecosystem tools such as, Hive, Pig, and HBase.
  • Successfully loaded files from Teradata into HDFS and from HDFS into Hive.
  • Experience using ZooKeeper and Oozie for coordinating the cluster and scheduling workflows.
  • Installed the Oozie workflow engine and scheduled it to run date/time-dependent Hive and Pig jobs.
  • Designed and developed Dashboards for Analytical purposes using Tableau.
  • Analyzed the Hadoop log files using Pig scripts to identify errors.
  • Actively updated the upper management with daily updates on the progress of project that include the classification levels in the data.
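
The fraud-screening jobs above were written in Java; the sketch below illustrates the same mapper/reducer pattern in Scala (kept in one language for consistency across these sketches), with an assumed record layout and a hypothetical review threshold:

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.{Mapper, Reducer}

    // Assumed record layout: claimId,providerId,amount,diagnosisCode
    class ClaimCountMapper extends Mapper[LongWritable, Text, Text, LongWritable] {
      private val one = new LongWritable(1L)
      private val providerId = new Text()

      override def map(key: LongWritable, value: Text,
                       context: Mapper[LongWritable, Text, Text, LongWritable]#Context): Unit = {
        val fields = value.toString.split(",")
        if (fields.length >= 2 && fields(1).nonEmpty) { // skip malformed rows
          providerId.set(fields(1))
          context.write(providerId, one)
        }
      }
    }

    // Emits only providers whose claim volume exceeds a (hypothetical) review threshold
    class SuspiciousProviderReducer extends Reducer[Text, LongWritable, Text, LongWritable] {
      private val Threshold = 10000L

      override def reduce(key: Text, values: java.lang.Iterable[LongWritable],
                          context: Reducer[Text, LongWritable, Text, LongWritable]#Context): Unit = {
        var total = 0L
        val it = values.iterator()
        while (it.hasNext) total += it.next().get()
        if (total > Threshold) context.write(key, new LongWritable(total))
      }
    }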

Environment: Java, Hadoop, Hive, Pig, Sqoop, Flume, HBase, Oracle 10g, Teradata, Cassandra

Confidential, Pleasanton, CA

Java/J2EE Developer

Responsibilities:

  • Played an effective role in the team by interacting with welfare business analysts/program specialists and transforming business requirements into system requirements.
  • Involved in developing the application using Java/J2EE platform. Implemented the Model View Control (MVC) structure using Struts.
  • Responsible for enhancing the portal UI using HTML, JavaScript, XML, JSP, Java, and CSS per the requirements, and providing client-side JavaScript validations and server-side Bean Validation Framework (JSR 303) validations.
  • Developed Web services component using XML, WSDL, and SOAP with DOM parser to transfer and transform data between applications.
  • Developed analysis level documentation such as Use Case, Business Domain Model, Activity, Sequence and Class Diagrams.
  • Handling of design reviews and technical reviews with other project stakeholders.
  • Implemented services using Core Java.
  • Developed and deployed the UI layer logic of sites using JSP.
  • Used Spring MVC for the implementation of business model logic.
  • Used SoapUI for testing web services by sending SOAP requests.
  • Used AJAX framework for server communication and seamless user experience.
  • Created test framework on Selenium and executed Web testing in Chrome, IE and Mozilla through Web driver.
  • Worked with Struts MVC objects such as the action servlet, controllers, validators, web application context, handler mappings, and message resource bundles, and used JNDI lookups for J2EE components.
  • Developed dynamic JSP pages with Struts.
  • Employed built-in/custom interceptors, and validators of Struts.
  • Developed the XML data object to generate the PDF documents, and reports.
  • Employed Hibernate, DAOs, and JDBC for data retrieval and modifications in the database.
  • Handled messaging and interaction with web services using SOAP.
  • Developed JUnit test cases for unit tests as well as system and user test scenarios.

Environment: Struts, Hibernate, Spring MVC, SOAP, WSDL, Web Logic, Java, JDBC, Java Script, Servlets, JSP, JUnit, XML, UML, Eclipse, Windows.

Confidential

Jr. Java Developer

Responsibilities:

  • Involved in designing the Project Structure, System Design and every phase in the project.
  • Responsible for developing platform related logic and resource classes, controller classes to access the domain and service classes.
  • Developed UI using HTML, JavaScript, and JSP, and developed Business Logic and Interfacing components using Business Objects, XML, and JDBC.
  • Designed the user interface and performed validation checks using JavaScript.
  • Managed connectivity using JDBC for querying/inserting & data management including triggers and stored procedures.
  • Involved in Technical Discussions, Design, and Workflow.
  • Participated in requirement gathering and analysis.
  • Developed Unit Testing cases using JUnit Framework.
  • Implemented the data access using Hibernate and wrote the domain classes to generate the Database Tables.
  • Involved in the design of JSPs and Servlets for navigation among the modules.
  • Designed cascading style sheets and the XML part of the Order Entry and Product Search modules, and performed client-side validations with JavaScript.
  • Involved in implementation of view pages based on XML attributes using normal Java classes.
  • Involved in integration of App Builder and UI modules with the platform.

Environment: Hibernate, Java, JAXB, JUnit, XML, UML, Oracle 11g, Eclipse, Windows XP.
