Hadoop Developer Resume
MN
PROFESSIONAL SUMMARY:
- Over 7 years of experience with an emphasis on Big Data technologies and the design and development of Java-based enterprise applications.
- Excellent understanding and knowledge of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Experience in installing, configuring, supporting, and managing Hadoop clusters using Apache, Hortonworks, and Cloudera (CDH4, CDH5) distributions, as well as on Amazon Web Services (AWS).
- Hands-on experience with major components of the Hadoop ecosystem, including Hive, HBase, HBase-Hive integration, Pig, Sqoop, and Flume, and knowledge of the MapReduce/HDFS framework.
- Set up standards and processes for Hadoop-based application design and implementation.
- Worked on NoSQL databases including HBase, Cassandra, and MongoDB.
- Good experience in data analysis using Pig and Hive, and understanding of Sqoop and Puppet.
- Extensive experience in data analysis using tools like Syncsort and HZ, along with shell scripting and UNIX.
- Experience in ETL development using Kafka, Flume, and Sqoop, and expertise in database performance tuning and data modeling.
- Developed automated Unix shell scripts for RUNSTATS, REORG, REBIND, COPY, LOAD, BACKUP, IMPORT, EXPORT, and other database maintenance activities.
- Experienced in developing MapReduce programs using Apache Hadoop to work with Big Data (a minimal sketch follows this summary).
- Good understanding of XML methodologies (XML, XSL, XSD), including Web Services and SOAP.
- Expertise in working with different databases such as Oracle, MS SQL Server, PostgreSQL, and MS Access 2000, along with exposure to Hibernate for mapping an object-oriented domain model to a traditional relational database.
- Familiarity and experience with data warehousing and ETL tools.
- Good knowledge of Apache Spark and Scala.
- Good understanding of Scrum methodologies, Test-Driven Development, and continuous integration.
- Experience in production and application support, including bug fixes.
- Major strengths include familiarity with multiple software systems, the ability to learn new technologies quickly and adapt to new environments, and being a self-motivated, focused team player with excellent interpersonal, technical, and communication skills.
- Good knowledge of Teradata Manager, TDWM, PMON, DBQL, SQL Assistant, and BTEQ.
- Expertise in creating databases, users, tables, triggers, macros, views, stored procedures, functions, packages, joins, and hash indexes in the Teradata database.
- Extensively worked with Teradata utilities like BTEQ, FastExport, FastLoad, and MultiLoad to export and load data to/from different source systems, including flat files.
- Hands-on experience using query tools like TOAD, SQL Developer, PL/SQL Developer, Teradata SQL Assistant, and Queryman.
- Expertise in writing large/complex queries using SQL.
- Proficient in performance analysis, monitoring, and SQL query tuning using EXPLAIN PLAN, collect statistics, hints, and SQL trace in both Teradata and Oracle.
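A minimal sketch of the kind of MapReduce program referenced above, assuming a simple count of records per key over tab-delimited input; the class name, key position, and paths are illustrative placeholders rather than code from the projects below.

    // Minimal MapReduce sketch: counts records per key in tab-delimited input.
    // Class name, key position, and paths are illustrative assumptions.
    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class KeyCountJob {

      public static class KeyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        private final Text outKey = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
          String[] fields = line.toString().split("\t");
          if (fields.length > 0 && !fields[0].isEmpty()) {
            outKey.set(fields[0]);            // first column is the grouping key (assumption)
            context.write(outKey, ONE);
          }
        }
      }

      public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> counts, Context context)
            throws IOException, InterruptedException {
          long total = 0;
          for (LongWritable c : counts) {
            total += c.get();
          }
          context.write(key, new LongWritable(total));
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "key-count");
        job.setJarByClass(KeyCountJob.class);
        job.setMapperClass(KeyMapper.class);
        job.setCombinerClass(SumReducer.class);   // counting is associative, so the reducer doubles as combiner
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

A job like this is packaged as a JAR and submitted with the standard hadoop jar command.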
TECHNICAL SKILLS:
Big Data Technologies: HDFS, Hive, MapReduce, Cassandra, Pig, Storm, Spark, Kafka, HBase, Sqoop, Flume, Zookeeper, Oozie, Avro
Apache Hadoop Distributions: Cloudera (CDH3/CDH4/CDH5), Hortonworks
Operating Systems: Windows, CentOS, Ubuntu, Red Hat Linux, Linux, UNIX
Programming/Scripting Languages: Java, SQL, Unix Shell Scripting, PL/SQL, Python
Databases: Oracle 8i/9i/10g, MySQL, Microsoft SQL Server, Teradata, NoSQL
Java Technologies: JDBC, Servlets, JSP, Spring and Hibernate
IDE Tools: Eclipse, NetBeans
Application Servers: Tomcat, WebLogic, WebSphere
ETL/Reporting Tools: Informatica, Pentaho, SSRS, SSIS, Cognos
Frameworks: MVC, Struts, Hibernate, Spring
Development Strategies: Agile, Lean Agile, Scrum, Waterfall, and Test-Driven Development
PROFESSIONAL EXPERIENCE:
Hadoop Developer
Confidential, MN
Responsibilities:
- Installed Hadoop (MapReduce, HDFS) on AWS and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Developed MapReduce programs to parse the raw data, populate staging tables, and store the refined data in partitioned tables in the EDW.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW tables and historical metrics.
- Responsible for working with message broker systems such as Kafka.
- Worked on a Storm topology integrated with Kafka to achieve near-real-time processing of the various tests involved in truck manufacturing (a consumer sketch follows this list).
- Worked on Amazon Redshift, the AWS data warehouse product.
- Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios.
- Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
- Implemented extraction of data from HBase using Spark.
- Shared responsibility for administration of Hadoop, Hive, and Pig, and built wrapper shell scripts to drive the Oozie workflows.
- Involved in testing AWS Redshift connectivity with a SQL database for testing and storing data in a POC.
- Designed and developed a data management system using MySQL.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Involved in creating Hadoop streaming jobs using Python.
- Provided ad-hoc queries and data metrics to business users using Hive and Pig.
- Worked with NoSQL databases including HBase and MongoDB.
- Developed Pig Latin scripts to extract the data from the web server output files and load it into HDFS.
- Used Pig as an ETL tool to do transformations, event joins, and some pre-aggregations before storing the data in HDFS.
- Worked on MapReduce joins to query multiple semi-structured datasets as per analytic needs.
- Worked on performance optimization of Spotfire applications and Spotfire server configuration.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Created many Java UDFs and UDAFs in Hive for functions not available out of the box, such as rank and cumulative sum (CSUM); a representative UDF sketch also follows this list.
- Used Hive and created Hive tables and involved in data loading and writing Hive UDFs.
- Developed a POC for Apache Kafka.
- Gained knowledge on building Apache Spark applications using Scala.
- Stored and loaded data from HDFS to Amazon S3 and backed up the NameNode namespace data to NFS filers.
- Enabled concurrent access to Hive tables with shared and exclusive locking, supported by the ZooKeeper ensemble in the cluster.
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond accordingly to any warning or failure conditions.
- Wrote and automated shell scripts for rolling day-to-day processes.
- Used Teradata utilities FastLoad, MultiLoad, and TPump to load data.
- Wrote Teradata macros and used various Teradata analytic functions.
- Involved in migration projects to move data from Oracle/DB2 data warehouses to Teradata.
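As context for the Kafka/Storm work above, the following is a minimal sketch of a plain Java Kafka consumer polling a topic; the broker address, topic name, and group id are illustrative placeholders, not details from this project.

    // Hypothetical Kafka consumer sketch; broker, topic, and group id are assumptions.
    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class TestResultConsumer {
      public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");                 // assumption
        props.put("group.id", "manufacturing-tests");                   // assumption
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
          consumer.subscribe(Collections.singletonList("truck-test-events")); // assumed topic
          while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            for (ConsumerRecord<String, String> record : records) {
              // Hand each event to the downstream processing pipeline here.
              System.out.printf("key=%s value=%s%n", record.key(), record.value());
            }
          }
        }
      }
    }

In a Storm-based pipeline, this consumption would typically be handled by a Kafka spout inside the topology rather than a standalone consumer.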
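The Hive UDF work above would follow the standard UDF pattern; below is an illustrative sketch of a simple string-normalizing UDF, where the function name and behavior are assumptions for the example rather than the project's actual functions.

    // Illustrative Hive UDF sketch; name and behavior are assumptions.
    import org.apache.hadoop.hive.ql.exec.Description;
    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    @Description(name = "normalize_code",
        value = "_FUNC_(str) - trims and upper-cases a code column")
    public class NormalizeCode extends UDF {
      public Text evaluate(Text input) {
        if (input == null) {
          return null;               // preserve NULL semantics
        }
        return new Text(input.toString().trim().toUpperCase());
      }
    }

Such a UDF is packaged as a JAR, registered with ADD JAR and CREATE TEMPORARY FUNCTION, and then called from HiveQL like any built-in function.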
Environment: Hadoop, MapReduce, Hive, HDFS, Pig, Sqoop, Oozie, Cloudera, Flume, HBase, Spotfire, Zookeeper, CDH3, MongoDB, AWS Redshift, Cassandra, Oracle, NoSQL, Unix/Linux, Spark, Kafka, Amazon Web Services.
Hadoop Consultant
Confidential, Morris Plains, NJ
Responsibilities:
- Installed and configured various components of Cloudera Hadoop ecosystem and maintained their integrity
- Implemented the Fair Scheduler on the JobTracker to allocate a fair amount of resources to small jobs.
- Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive (see the Spark sketch after this list).
- Worked on implementing custom Hive and Pig UDF's to transform large amounts of data.
- Installed and configured Cloudera Manager and was involved in building a multi-node Hadoop cluster.
- Used Apache Flume to ingest log data from multiple sources directly into Accumulo, file roll, and HDFS.
- Worked in an AWS environment for the development and deployment of custom Hadoop applications.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard
- Extracted data from mainframes, fed it to Kafka, and ingested it into HBase to perform analytics.
- Implemented installation and configuration of a multi-node cluster in the cloud on Amazon Web Services (AWS) EC2.
- Worked on NoSQL databases including HBase and Elasticsearch.
- Developed multiple POCs using Scala, deployed them on the YARN cluster, and compared the performance of Spark with Cassandra and SQL.
- Involved in extracting customers' Big Data from various sources into Hadoop HDFS, including data from mainframes, databases, and log data from servers.
- Worked on commissioning and decommissioning DataNodes and TaskTrackers.
- Installed and configured Cassandra cluster and CQL on the cluster.
- Developed multiple Map Reduce jobs in Java for data cleaning and processing.
- Analyzed data using Hive queries, Pig scripts, and MapReduce programs.
- Implemented Oozie workflow engine to manage inter-dependent Hadoop jobs and to automate Hadoop jobs such as Hive, Sqoop and system-specific jobs.
- Exported the business-required information to an RDBMS using Sqoop to make the data available for the BI team to generate reports.
- Worked on creating keyspaces and loading data on the Cassandra cluster.
- Monitored the nodes and the streaming between nodes during the start-up of new nodes, and cleared keys that were no longer used, using the nodetool utility.
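A sketch of the kind of Spark-on-YARN analysis over Hive data described above, using the Spark Java API; the database, table, and column names are placeholder assumptions.

    // Hedged sketch of analyzing a Hive table with the Spark Java API.
    // Table and column names are assumptions.
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class HiveAnalyticsJob {
      public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("hive-analytics")
            .enableHiveSupport()          // read Hive tables through the metastore
            .getOrCreate();               // master is supplied by spark-submit

        Dataset<Row> daily = spark.sql(
            "SELECT event_date, COUNT(*) AS events "
                + "FROM analytics.web_events "    // assumed database.table
                + "GROUP BY event_date");

        daily.show(20);
        spark.stop();
      }
    }

A job like this is typically submitted with spark-submit --master yarn so that YARN handles resource allocation.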
Environment: Linux, Java, MapReduce, AWS Redshift, HDFS, Oracle, SQL Server, Accumulo, Hive, Pig, Sqoop, Cloudera Manager, Cassandra
Java/Hadoop Developer
Confidential, Fremont, CA
Responsibilities:
- Exported data from DB2 to HDFS using Sqoop and developed MapReduce jobs using the Java API.
- Installed and configured Pig and wrote Pig Latin scripts.
- Created and maintained technical documentation for launching Cloudera Hadoop clusters and for executing Hive queries and Pig scripts.
- Wrote MapReduce jobs using Pig Latin and worked on cluster coordination services through ZooKeeper.
- Designed and developed a fully functional, generic n-tiered J2EE application platform in an Oracle technology-driven environment; the entire infrastructure application was developed using Oracle JDeveloper in conjunction with Oracle ADF-BC and Oracle ADF Rich Faces.
- Programmed ETL functions between Oracle and Amazon Redshift.
- Developed workflow using Oozie for running MapReduce jobs and Hive Queries.
- Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
- Optimized the configuration of Amazon Redshift clusters, data distribution, and data processing.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Implemented various Java/J2EE design patterns such as Singleton, Session Façade, Business Delegate, Value Object, Data Access Object
- Implemented various advanced join operations using Pig Latin.
- Involved in loading data from the Linux file system to HDFS.
- Imported data into HDFS using Sqoop.
- Ran Hadoop streaming jobs to process terabytes of XML-format data.
- Assisted in exporting analyzed data to relational databases using Sqoop.
- Used SoapUI for testing web service responses.
- Defined, designed, and developed Java applications, especially Hadoop MapReduce, leveraging frameworks such as Cascading and Hive.
- Documented designs and procedures for building and managing Hadoop clusters.
- Troubleshot operating system and cluster issues as well as Java-related bugs.
- Successfully loaded files into Hive and HDFS from MongoDB and Solr.
- Automated deployment, management, and self-serve troubleshooting of applications.
- Defined and evolved the existing architecture to scale with growth in data volume, users, and usage.
- Designed and developed a Java API (Commerce API) that provides functionality to connect to Cassandra through Java services (a connection sketch follows this list).
- Installed and configured Hive and wrote Hive UDFs.
- Managed development time, bug tracking, project releases, development velocity, release forecasting, and scheduling.
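To illustrate the kind of Java-to-Cassandra connectivity mentioned above, here is a minimal sketch using the DataStax 3.x Java driver; the contact point, keyspace, table, and columns are placeholder assumptions.

    // Illustrative Cassandra connection sketch with the DataStax 3.x driver.
    // Contact point, keyspace, table, and columns are assumptions.
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    public class CommerceDao {
      public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")                 // assumed contact point
                .build();
             Session session = cluster.connect("commerce")) { // assumed keyspace

          ResultSet rs = session.execute(
              "SELECT order_id, status FROM orders LIMIT 10"); // assumed table
          for (Row row : rs) {
            System.out.println(row.getString("order_id") + " " + row.getString("status"));
          }
        }
      }
    }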
Environment: Hadoop, HDFS, Hive, Flume, Sqoop, HBase, Oracle, Pig, Eclipse, MySQL, Ubuntu, Zookeeper, Redshift, Java (JDK 1.6)
Java Developer
Confidential
Responsibilities:
- Gathered user requirements followed by analysis and design. Evaluated various technologies for the client.
- Designed and developed the HTML based web pages for displaying the reports and developed Java classes and JSP files.
- Implemented the presentation layer with HTML, XHTML and JavaScript.
- Developed, Tested and Debugged the Java, JSP and EJB components using Eclipse.
- Prepared the installation, customer, and configuration guides that were delivered to the customer along with the product.
- Extensively used JSF framework.
- Extensively used XML documents with XSLT and CSS to translate the content into HTML for presentation in the GUI.
- Involved in the development of JavaScript code for client-side validations.
- Involved in the implementation of business logic in the Struts framework and Hibernate in the back end.
- Used J2EE to develop the application based on the MVC architecture.
- Wrote complex SQL queries and stored procedures; developed JavaMail for automatic emailing and JNDI to interact with the knowledge server (see the JavaMail sketch after this list).
- Used Struts Framework to implement J2EE design patterns (MVC).
- Developed Enterprise JavaBeans: entity beans, session beans (both stateless and stateful), and message-driven beans.
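A minimal sketch of automated emailing with JavaMail, as mentioned above; the SMTP host, addresses, and subject are placeholder assumptions.

    // Hedged JavaMail sketch for automated notification emails.
    // Host, addresses, and subject are placeholder assumptions.
    import java.util.Properties;

    import javax.mail.Message;
    import javax.mail.MessagingException;
    import javax.mail.Session;
    import javax.mail.Transport;
    import javax.mail.internet.InternetAddress;
    import javax.mail.internet.MimeMessage;

    public class ReportMailer {
      public static void send(String body) throws MessagingException {
        Properties props = new Properties();
        props.put("mail.smtp.host", "smtp.example.com");              // assumption

        Session session = Session.getInstance(props);
        Message message = new MimeMessage(session);
        message.setFrom(new InternetAddress("reports@example.com"));  // assumption
        message.setRecipient(Message.RecipientType.TO,
            new InternetAddress("team@example.com"));                 // assumption
        message.setSubject("Nightly report");
        message.setText(body);
        Transport.send(message);
      }
    }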
Environment: Java, J2EE, EJB 2.1, JSP 2.0, Servlets 2.4, JNDI 1.2, JavaMail 1.2, JDBC 3.0, Struts, HTML, XML, CORBA, XSLT, JavaScript, Eclipse 3.2, Oracle 10g, WebLogic 8.1, Windows 2003.
Java Developer
Confidential
Responsibilities:
- Involved in the analysis, design, implementation, and testing of the project
- Created the database, user, environment, activity, and class diagrams for the project (UML).
- Implemented the database using the Oracle database engine.
- Designed and developed a fully functional, generic n-tiered J2EE application platform in an Oracle technology-driven environment.
- The entire infrastructure application was developed using Oracle JDeveloper in conjunction with Oracle ADF-BC and Oracle ADF Rich Faces.
- Created entity objects (business rules and policies, validation logic, default value logic, security).
- Worked in Agile development following the Scrum process, with sprints and daily stand-up meetings.
- Created View objects, View Links, Association Objects, Application modules with data validation rules (Exposing Linked Views in an Application Module), LOV, dropdown, value defaulting, transaction management features.
- Developed the web application using J2EE: JSP, Servlets, JDBC, JavaBeans, Struts, Ajax, JSF, JSTL, custom tags, EJB, JNDI, Hibernate, ANT, JUnit, Apache Log4j, Web Services, and Message Queue (MQ); a servlet/JDBC sketch follows this list.
- Designed GUI prototypes using ADF 11g components before finalizing them for development.
- Used version control systems such as CVS, PVCS, and Rational ClearCase.
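An illustrative sketch of the Servlets/JDBC portion of the stack above, looking up a DataSource through JNDI; the JNDI name, table, and column are placeholder assumptions.

    // Illustrative servlet sketch using JDBC via a JNDI DataSource.
    // JNDI name, table, and column are assumptions.
    import java.io.IOException;
    import java.io.PrintWriter;
    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    import javax.naming.InitialContext;
    import javax.naming.NamingException;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import javax.sql.DataSource;

    public class CustomerListServlet extends HttpServlet {
      @Override
      protected void doGet(HttpServletRequest req, HttpServletResponse resp)
          throws ServletException, IOException {
        resp.setContentType("text/html");
        PrintWriter out = resp.getWriter();
        try {
          DataSource ds = (DataSource) new InitialContext()
              .lookup("java:comp/env/jdbc/AppDS");                        // assumed JNDI name
          try (Connection con = ds.getConnection();
               Statement st = con.createStatement();
               ResultSet rs = st.executeQuery("SELECT name FROM customers")) { // assumed table
            out.println("<ul>");
            while (rs.next()) {
              out.println("<li>" + rs.getString("name") + "</li>");
            }
            out.println("</ul>");
          }
        } catch (NamingException | SQLException e) {
          throw new ServletException(e);
        }
      }
    }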
Environment: Core Java, Servlets, JSF, ADF Rich Client UI Framework, ADF-BC (BC4J), web services using Oracle SOA (BPEL), Oracle WebLogic.