Sr. Hadoop Developer Resume Wilmington, DE - Hire IT People

PROFESSIONAL SUMMARY:

Versatile Hadoop Developer with 8+ years of experience, including 4 years working extensively with Hadoop, 3 years in machine learning and deep learning, and 2+ years in Python and Java/J2EE enterprise application design, development and maintenance.
Extensive experience implementing Big Data solutions using various distributions of Hadoop and its ecosystem tools.
Hadoop Developer with 4 years of experience designing and implementing complete end-to-end Hadoop-based data analytics solutions using Spark, MapReduce, YARN, Kafka, Pig, Hive, Sqoop, Storm, Flume, Oozie, Impala, HBase, etc.
Experience using Mahout machine learning algorithms for efficient data processing.
Good experience creating data ingestion pipelines, data transformations, data management, data governance and real-time streaming at an enterprise level.
Experience with migrating data to and from RDBMS and unstructured sources into HDFS using Sqoop & Flume.
Good experience in object-oriented programming using Java and J2EE (Servlets, JSP, JavaBeans, EJB, JDBC, RMI, XML, JMS, Web Services, AJAX).
Profound experience (1+ years) creating real-time data streaming solutions using Apache Spark/Spark Streaming and Kafka.
Experience in data processing such as collecting, aggregating and moving machine data from various sources using Apache Flume and Kafka.
Implemented TF, TF-IDF, LSI, K-Means clustering and cosine similarity, and analyzed the results.
Worked with various file formats such as delimited text files, clickstream log files, Apache log files, Avro files, JSON files and XML files.
Good understanding of the compression codecs used in Hadoop processing, such as Gzip, Snappy and LZO.
Expertise developing MapReduce jobs to scrub, sort, filter, join and summarize data.
Experience developing Pig Latin and HiveQL scripts for data analysis and ETL purposes, and extended the default functionality by writing User Defined Functions (UDFs) and User Defined Aggregate Functions (UDAFs) for custom data-specific processing.
Good hands-on experience with full life cycle implementation using the CDH (Cloudera) and HDP (Hortonworks Data Platform) distributions.
In-depth understanding of Hadoop architecture and its components, such as ResourceManager, ApplicationMaster, NameNode and DataNode, as well as HBase design principles.
Strong knowledge of distributed systems architecture and parallel processing; in-depth understanding of the MapReduce programming paradigm and the Spark execution framework.
Profound understanding of partitioning and bucketing in Hive; designed both managed and external Hive tables to optimize performance.
Experience handling messaging services using Apache Kafka (developing producers and consumers).
Extensive experience importing and exporting data using Flume and Kafka.
Analyzed data with Hue, using Apache Hive via Hue’s Beeswax and Catalog applications.
Strong experience collecting and storing stream data such as log data in HDFS using Apache Flume.
Good understanding of cloud configuration in Amazon Web Services (AWS).
Experienced in working with Amazon Web Services (AWS) using EC2 for computing and S3 as storage mechanism.
Experience with job workflow scheduling and monitoring tools such as Oozie.
Hands-on experience with NoSQL databases including HBase, Cassandra and MongoDB.
Experience working with the HBase Java API to ingest processed data into HBase tables.
Experience with the Oozie workflow engine to automate and parallelize Hadoop MapReduce, Hive and Pig jobs.
Proficient in using Cloudera Manager, an end-to-end tool to manage Hadoop operations in Cloudera Cluster.
Exposure to AWS, including Lambda functions, AWS architecture and EMR.
Experience in developing ETL scripts for data acquisition and transformation using Informatica and Talend.
Knowledge of Enterprise Data Warehouse (EDW) architecture and data modeling concepts such as star schema.
Experience implementing Auto Complete/Auto Suggest functionality using Ajax, jQuery, DHTML, web service calls and JSON.
Extensive work experience developing enterprise solutions using Java, J2EE, Servlets, JSP, JDBC, Struts, Spring, Hibernate, JavaBeans, JSF and MVC.
Profound knowledge of Core Java concepts such as I/O, multithreading, exceptions, RegEx, collections, data structures and serialization.
Expert at creating UML diagrams (use case, activity, class and sequence diagrams) using Microsoft Visio.
Hands-on experience using Python to maintain and improve internet applications.
Excellent problem-solving, analytical, communication, presentation and interpersonal skills that help me to be a core member of any team.
Very good understanding of the Agile Scrum process.
Experience mentoring and working with offshore and distributed teams.

TECHNICAL SKILLS:

Hadoop/Big Data: HDFS, MapReduce, Spark, YARN, Kafka, Pig, Hive, Storm, Sqoop, Flume, Oozie, Impala, HBase, Hue, ZooKeeper, Mahout

Programming Languages: C, Java, PL/SQL, Pig Latin, Python, HiveQL, Scala, SQL

Java/J2EE & Web Technologies: J2EE, EJB, JSF, Servlets, JSP, JSTL, CSS, HTML, XML, AngularJS, AJAX

Development Tools: Eclipse, NetBeans, SVN, Git, Ant, Maven, SoapUI, JMX, explorer, XML Spy, QC, QTP, Jira, SQL Developer, TOAD

Methodologies: Agile/Scrum, UML, Rational Unified Process and Waterfall.

NoSQL Technologies: Cassandra, MongoDB, HBase.

Frameworks: Struts, Hibernate and Spring MVC.

Scripting Languages: Unix Shell Scripting, Perl.

Distributed platforms: Hortonworks, Cloudera, MapR

Databases: Oracle 11g/12C, MySQL, MS-SQL Server, Teradata, IBM DB2

Operating Systems: Windows XP/Vista/7/8/10, UNIX, Linux

Software Package: MS Office 2007/2010/2016.

Web/Application Servers: WebLogic, WebSphere Application Server, Apache Tomcat

Visualization: Tableau and MS Excel

Version control: CVS, SVN, GIT, TFS.

PROFESSIONAL EXPERIENCE:

Confidential, Wilmington, DE

Sr. Hadoop Developer

Responsibilities:

Created end-to-end Spark applications in Scala to perform various data cleansing, validation, transformation and summarization activities on user behavioral data.
Developed a custom input adaptor using the HDFS FileSystem API to ingest clickstream log files from an FTP server into HDFS.
Developed an end-to-end data pipeline using the FTP adaptor, Spark, Hive and Impala.
Used Scala to write code for all Spark use cases.
Implemented design patterns in Scala for the application.
Implemented Spark jobs in Scala, using Spark SQL heavily for faster development and data processing.
Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs and YARN.
Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Java and Scala.
Used the Scala collections framework to store and process complex consumer information.
Implemented a prototype for real-time data streaming using Spark Streaming with Kafka.
Handled importing other enterprise data from different data sources into HDFS using Sqoop, performing transformations using Hive and MapReduce, and then loading the data into HBase tables.
Exported the analyzed data to relational databases using Sqoop for further visualization and report generation by the BI team.
Collected and aggregated large amounts of log data using Flume and staged it in HDFS for further analysis.
Analyzed the data by performing Hive queries (Hive QL) and running Pig scripts (Pig Latin) to study customer behavior.
Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
Created components such as Hive UDFs to supply functionality missing from Hive for analytics.
Worked on various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
Created, validated and maintained scripts to load data manually using Sqoop.
Created Oozie workflows and coordinators to automate Sqoop jobs weekly and monthly.
Installed and configured the Apache Hadoop, Hive and Pig environment on AWS.
Implemented a POC Spark cluster on AWS.
Uploaded and processed more than 30 terabytes of data from various structured and unstructured sources into HDFS (AWS cloud) using Sqoop and Flume.
Used TIBCO Jaspersoft Studio for iReport analysis on the AWS cloud.
Created reports for the BI team, using Sqoop to import data into HDFS and Hive.
Used Oozie and Oozie coordinators to deploy end-to-end data processing pipelines and schedule the workflows.
Continuously monitored and managed the Hadoop cluster.
Used the JUnit framework to perform unit testing of the application.
Developed interactive shell scripts for scheduling various data cleansing and data loading processes.
Performed data validation on the data ingested using Spark by building a custom model to filter out invalid data and cleanse the data.
Performed data wrangling and created workable datasets.

Environment: HDFS, Pig, Hive, Sqoop, Flume, Spark, Scala, MapReduce, Oozie, Oracle 11g, YARN, UNIX Shell Scripting, Agile Methodology

Confidential, Minnetonka, MN

Sr. Spark Developer

Responsibilities:

The main aim of the project was to tune the performance of the existing Hive queries.
Implemented Spark applications in Scala and Java, using the DataFrame and Spark SQL APIs for faster data processing.
Created end-to-end Spark applications in Scala to perform various data cleansing, validation, transformation and summarization activities according to the requirements.
Worked with Mahout machine learning algorithms for efficient data processing.
Developed data pipeline using Spark, Hive and Sqoop, to ingest, transform and analyze operational data.
Developed Spark jobs and Hive Jobs to summarize and transform data.
Used Spark for interactive queries, processing of streaming data and integration with popular NoSQL databases for huge volumes of data.
Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames and Scala.
Analyzed customer transaction data of a large online retailer and derived many recency, frequency and monetary (RFM) variables that could help understand customer behavior and predict, from early transactions, the revenue customers would generate in the long run.
Analyzed the SQL scripts and designed a solution to implement them in Scala.
Streamed data in real time using Spark with Kafka.
Created Kafka applications that monitor consumer lag within Apache Kafka clusters; these applications are used in production by multiple companies.
Developed Python scripts to copy data between clusters, enabling very fast copying of large volumes of data.
Ingested syslog messages, parsed them and streamed the data to Apache Kafka.
Handled importing data from different data sources into HDFS using Sqoop, performing transformations using Hive and MapReduce, and then loading the data into HDFS.
Exported the analyzed data to relational databases using Sqoop for further visualization and report generation by the BI team.
Collected and aggregated large amounts of log data using Flume and staged it in HDFS for further analysis.
Analyzed the data by performing Hive queries (Hive QL) to study customer behavior.
Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
Developed Hive scripts in Hive QL to de-normalize and aggregate the data.
Created HBase tables and column families to store the user event data.
Scheduled and executed workflows in Oozie to run Hive jobs.

Environment: Hadoop, HDFS, MapR 5.1, HBase, Spark, Scala, Hive, MapReduce, Sqoop, ETL, Java, PL/SQL, Oracle 11g, Unix/Linux, Mahout (Machine Learning)

Confidential, New York, NY

Hadoop Developer

Responsibilities:

Led a team of three developers that built a scalable distributed data solution using Hadoop on a 30-node AWS cloud cluster to run analysis on 25+ terabytes of customer usage data.
Developed several new MapReduce programs to analyze and transform the data to uncover insights into the customer usage patterns.
Used MapReduce to index large amounts of data for easy access to specific records.
Performed ETL using Pig, Hive and MapReduce to transform transactional data to de-normalized form.
Configured periodic incremental imports of data from DB2 into HDFS using Sqoop.
Exported data from HDFS to Teradata on a regular basis using Sqoop.
Developed ETL scripts for data acquisition and transformation using Informatica and Talend.
Installed and configured Flume, Hive, Pig, Sqoop and HBase on the Hadoop cluster.
Exported analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop.
Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive and AWS cloud.
Wrote Pig and Hive UDFs to analyze the complex data to find specific user behavior.
Used the Oozie workflow engine to schedule multiple recurring and ad-hoc Hive and Pig jobs.
Created HBase tables to store various data formats coming from different portfolios.
Created Python scripts to automate workflows.
Extracted feeds from social media sites such as Facebook and Twitter using Python scripts.
Designed and implemented Hive and Pig UDFs in Python for evaluating, filtering, loading and storing data.
Developed simple to complex MapReduce streaming jobs in Python and integrated them with Hive and Pig.
Used TIBCO Jaspersoft for embedding BI reports.
Wrote Python scripts for automated jobs.
Assisted the team responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups and Hadoop log files.
Formulated the conversion of Teradata and other RDBMS data into Hadoop backlog files.
Worked actively with various teams to understand and accumulate data from different sources based on the business requirements.
Worked with the testing teams to fix bugs and ensure smooth, error-free code.

Environment: Hadoop, MapReduce, HDFS, Hive, Java, SQL, Cloudera Manager, Pig, Sqoop, Oozie, HBase, ZooKeeper, PL/SQL, MySQL, DB2, Teradata.

Confidential, Denver, CO

Hadoop Developer

Responsibilities:

Responsible for developing efficient MapReduce programs on the AWS cloud for more than 20 years' worth of claims data to detect and separate fraudulent claims.
Developed medium to complex MapReduce programs from scratch.
Uploaded and processed more than 30 terabytes of data from various structured and unstructured sources into HDFS (AWS cloud) using Sqoop and Flume.
Played a key role in setting up a 40-node Hadoop cluster running Apache MapReduce, working closely with the Hadoop administration team.
Worked with the advanced analytics team to design fraud detection algorithms, then developed MapReduce programs to run the algorithms efficiently on huge datasets.
Developed Java programs to perform data scrubbing for unstructured data.
Responsible for designing and managing the Sqoop jobs that uploaded the data from Oracle to HDFS and Hive.
Created Hive tables to import large datasets from various relational databases using Sqoop, and exported the analyzed data back for visualization and report generation by the BI team.
Used Flume to collect log data containing error messages from across the cluster.
Designed and Maintained Oozie workflows to manage the flow of jobs in the cluster.
Played a key role in the installation and configuration of various Hadoop ecosystem tools such as Hive, Pig and HBase.
Successfully loaded files into HDFS from Teradata, and loaded data from HDFS into Hive.
Used ZooKeeper and Oozie for coordinating the cluster and scheduling workflows.
Installed the Oozie workflow engine and scheduled it to run data- and time-dependent Hive and Pig jobs.
Designed and developed Dashboards for Analytical purposes using Tableau.
Analyzed the Hadoop log files using Pig scripts to monitor errors.
Actively provided upper management with daily updates on the progress of the project, including the classification levels in the data.

Environment: Java, Hadoop, Hive, Pig, Sqoop, Flume, HBase, Oracle 10g, Teradata, Cassandra

Confidential, Pleasanton, CA

Java/J2EE Developer

Responsibilities:

Played an effective role in the team by interacting with welfare business analysts/program specialists and transforming business requirements into system requirements.
Involved in developing the application using the Java/J2EE platform. Implemented the Model-View-Controller (MVC) structure using Struts.
Responsible for enhancing the portal UI using HTML, JavaScript, XML, JSP, Java and CSS as per the requirements, and for providing client-side JavaScript validations and server-side Bean Validation (JSR 303).
Developed Web services component using XML, WSDL, and SOAP with DOM parser to transfer and transform data between applications.
Developed analysis level documentation such as Use Case, Business Domain Model, Activity, Sequence and Class Diagrams.
Handled design reviews and technical reviews with other project stakeholders.
Implemented services using Core Java.
Developed and deployed the UI layer logic of sites using JSP.
Used Spring MVC to implement business model logic.
Used SoapUI to test the web services by sending SOAP requests.
Used AJAX framework for server communication and seamless user experience.
Created a test framework on Selenium and executed web testing in Chrome, IE and Firefox through WebDriver.
Worked with Struts MVC objects such as ActionServlet, controllers and validators, web application context, handler mappings, message resource bundles, and JNDI lookup for J2EE components.
Developed dynamic JSP pages with Struts.
Employed built-in/custom interceptors, and validators of Struts.
Developed XML data objects to generate PDF documents and reports.
Employed Hibernate, DAO and JDBC for data retrieval and modifications from the database.
Implemented web service messaging and interaction using SOAP.
Developed JUnit test cases for unit testing as well as for system and user test scenarios.

Environment: Struts, Hibernate, Spring MVC, SOAP, WSDL, WebLogic, Java, JDBC, JavaScript, Servlets, JSP, JUnit, XML, UML, Eclipse, Windows.

Confidential

Jr. Java Developer

Responsibilities:

Involved in designing the Project Structure, System Design and every phase in the project.
Responsible for developing platform related logic and resource classes, controller classes to access the domain and service classes.
Developed UI using HTML, JavaScript, and JSP, and developed Business Logic and Interfacing components using Business Objects, XML, and JDBC.
Designed the user interface and implemented validations using JavaScript.
Managed connectivity using JDBC for querying/inserting & data management including triggers and stored procedures.
Involved in Technical Discussions, Design, and Workflow.
Participated in requirements gathering and analysis.
Developed unit test cases using the JUnit framework.
Implemented the data access using Hibernate and wrote the domain classes to generate the Database Tables.
Involved in the design of JSPs and Servlets for navigation among the modules.
Designed cascading style sheets and the XML part of the Order Entry and Product Search modules, and performed client-side validations with JavaScript.
Involved in implementation of view pages based on XML attributes using normal Java classes.
Involved in integration of App Builder and UI modules with the platform.

Environment: Hibernate, Java, JAXB, JUnit, XML, UML, Oracle11g, Eclipse, Windows XP.
