Hadoop Developer/Spark Resume
Beaverton, OR
SUMMARY:
- Offering 7+ years of overall IT experience in application development with Java and Big Data (Hadoop).
- Expertise in Hadoop, HDFS, MapReduce, and the Hadoop ecosystem, including Hive, HBase, HBase-Hive integration, Pig, Sqoop, Flume, Oozie, and ZooKeeper, with solid knowledge of the MapReduce/HDFS framework.
- Good working experience with Apache Hadoop MapReduce programming, Pig scripting, and HDFS.
- Knowledge of NoSQL databases such as MongoDB and Cassandra.
- Good understanding of Hadoop MR1 and MR2 (YARN) architectures.
- Good understanding of Hadoop architecture and its components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, ApplicationMaster, ResourceManager, and NodeManager, as well as the MapReduce programming paradigm.
- Involved in writing Pig scripts to reduce job execution time.
- Experienced in loading large volumes of data from the local file system and HDFS into Hive, and in writing complex queries to load data into internal (managed) tables.
- Good hands-on experience in Apache Spark with Scala.
- Used Spark Streaming to divide streaming data into micro-batches as input to the Spark engine for batch processing (a minimal sketch follows this summary).
- Developed Spark SQL programs to handle different data sets with better performance.
- Hands on experience in Cloudera and Hortonworks Hadoop environments.
- Good understanding of Hadoop administration with Hortonworks.
- Good knowledge of the real-time streaming platform Kafka, integration software such as Talend, and NoSQL databases such as MongoDB, HBase, and Cassandra.
- Experience working with the Tez framework for interactive workloads.
- Configured Tez as the execution engine for Hive and Pig to achieve better response times than MapReduce jobs.
- Experienced in loading data into partitioned and bucketed Hive tables.
- Experience in installing, configuring, supporting, and managing Hadoop clusters using Apache and Cloudera distributions.
- Worked with ETL/ELT tools (e.g., Talend).
- Good knowledge of Talend for data integration with Hadoop.
- Used Talend Open Studio to load files into Hive tables and performed ETL aggregations in Hive.
- Expertise in the Scala programming language and Spark Core.
- Designed and created ETL jobs in Talend to load large volumes of data into Cassandra, the Hadoop ecosystem, and relational databases.
- Experience using Maven 2.0 to compile, package, and deploy applications to application servers.
- Skilled in data management, extraction, manipulation, validation, and analysis of large data volumes.
- Extensive expertise in creating and automating workflows using the Oozie workflow engine.
- Scheduled jobs with the Oozie coordinator to execute on specific days (excluding weekends).
- Very good understanding of SQL, ETL, and data warehousing technologies.
- Extensive experience working with Oracle, SQL Server, and MySQL databases; hands-on experience in application development using Java and RDBMSs.
- Experience in UNIX Shell scripting.
- Expert in T-SQL: creating and using stored procedures, views, and user-defined functions, and implementing business intelligence solutions on SQL Server.
- Hands-on experience developing applications with Java, J2EE, JSP, EJB, SOAP, JDBC 2.0, XML, HTML, XSD, XSLT, PL/SQL, and Oracle 10g.
- Strong knowledge of version control systems such as SVN and Git.
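Illustrative sketch for the Spark Streaming point above: a minimal Scala job that splits an incoming text stream into 10-second micro-batches for the Spark engine. The socket source, host/port, and word-count logic are assumed placeholders, not details from any project listed below.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StreamingWordCount")
    // Every 10 seconds of the stream becomes one batch (an RDD) handed to the Spark engine.
    val ssc = new StreamingContext(conf, Seconds(10))

    // Placeholder source: text lines arriving on a socket (could equally be Flume or Kafka).
    val lines = ssc.socketTextStream("localhost", 9999)
    val counts = lines.flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.print()        // emit each batch's counts
    ssc.start()           // start receiving data and processing batches
    ssc.awaitTermination()
  }
}
```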
TECHNICAL SKILLS:
Hadoop: HDFS, MapReduce, YARN, Spark Core, Spark Streaming, Spark SQL, Hive, Tez, Pig, Sqoop, Flume, Kafka, Oozie, and ZooKeeper.
Languages: Java, Scala, PL/SQL, Pig Latin, HiveQL, UNIX shell scripts.
Database: Oracle 10g, MySQL.
NoSQL Databases: HBase, Cassandra, MongoDB.
Web Technologies: HTML, XML, CSS, XSLT, XHTML.
Web Servers: Apache Tomcat, JBoss.
J2EE Technologies: JDBC.
Cloud: Amazon AWS (S3, EC2).
Frameworks: Spring, MVC, Struts.
Tools & IDEs: Eclipse, NetBeans, Maven, Toad, DbVisualizer.
Operating Systems: Windows, Linux (CentOS, Ubuntu).
WORK EXPERIENCE:
Hadoop Developer/Spark
Confidential - Beaverton, OR
Responsibilities:
- Installed and configured Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, Oozie, ZooKeeper, HBase, Flume, and Sqoop.
- Worked on a large-scale Hadoop YARN cluster for distributed data storage, processing, and analysis.
- Worked entirely in an Agile methodology and developed Spark scripts using the Scala shell.
- Implemented multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Worked in a team running a 30-node cluster and expanded it by adding nodes; the additional DataNodes were configured through the Hadoop commissioning process.
- Imported data from MySQL and Oracle into HDFS using Sqoop.
- Improved the performance of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Used Impala to query data in HDFS.
- Developed and implemented two service endpoints (end to end) in Java using the Play framework, Akka, and Hazelcast.
- Used AWS services such as EC2 and S3 for small data sets.
- Developed Pig UDFs to pre-process the data for analysis.
- Used Apache Kafka to ingest data from producers, which push it to the brokers (see the producer sketch at the end of this project).
- Used the Spark API over Hadoop YARN to perform analytics on data in Hive.
- Wrote robust, reusable HiveQL scripts and Hive UDFs in Java.
- Experience with test-driven development (TDD) and acceptance testing using Behave.
- Implemented partitioning and bucketing in Hive for better organization of the data.
- Designed and built unit tests and executed operational queries on HBase.
- Built Apache Avro schemas for publishing messages to topics and enabled relevant serializing formats for message publishing and consumption.
- Implemented a script to transmit information from Oracle to HBase using Sqoop.
- Worked on migrating Python MapReduce programs into Spark transformations.
- Experience working with the NoSQL database HBase for real-time data analytics with Apache Spark.
- Implemented authentication and authorization using the Kerberos protocol.
- Installed the Oozie workflow engine to run multiple MapReduce, HiveQL, and Pig jobs.
- Implemented a script to transmit information from web servers to Hadoop using Flume.
- Used ZooKeeper to manage coordination across the cluster.
- Used Apache Kafka and Apache Storm to gather log data and feed it into HDFS.
- Developed Scala program for data extraction using Spark Streaming.
- Set up and managed Kafka for stream processing.
- Used Pig as an ETL tool for transformations, event joins, and pre-aggregations before storing data in HDFS.
- Developed Oozie workflow for scheduling and orchestrating the ETL process.
- Created producer, consumer, and ZooKeeper setups for Kafka replication.
- Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
- Experienced in implementing Spark RDD transformations and actions to carry out business analysis.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Developed NoSQL database solutions using CRUD operations, indexing, replication, and sharding in MongoDB.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
Environment: Hadoop, MapReduce, YARN, Agile methodologies, HDFS, Hive, Cloudera, Core Java, Scala, SQL, Flume, Spark, Pig, Sqoop, Oozie, Impala, Python, AWS, HBase, Kafka, Avro, Oracle, UNIX.
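A minimal sketch of the Kafka producer path referenced above (illustrative only, not project code): a Scala producer pushing a log record to a broker, from which downstream consumers such as Spark read. The broker address, topic name, and record contents are assumed placeholders.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object LogProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092") // assumed broker address
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    try {
      // Hypothetical topic; each record is pushed to the broker for downstream consumers.
      producer.send(new ProducerRecord[String, String]("web-logs", "hostA", "GET /index.html 200"))
    } finally {
      producer.close()
    }
  }
}
```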
Big Data Analyst
Confidential - Dallas, TX
Responsibilities:
- Responsible for managing data coming from different sources, loading structured and unstructured data, and maintaining HDFS.
- Wrote UNIX shell scripts in combination with Talend data maps to process source files and load them into the database.
- Worked in an Agile development methodology.
- Responsible for building scalable distributed data solutions using Hadoop.
- Created a data pipeline of MapReduce programs using chained mappers.
- Implemented Hadoop YARN jobs to write data in Avro format.
- Processed input from multiple data sources in the same reducer using GenericWritable and multiple input formats.
- Developed and executed Hive queries to denormalize the data.
- Imported data from MySQL into HDFS on a regular basis using Sqoop.
- Implemented optimized joins across different data sets using MapReduce to get the top claims by state.
- Worked on big data processing of clinical and non-clinical data using MapReduce.
- Performed validation on data ingested with Hadoop YARN, building a custom model to filter out invalid records and cleanse the data.
- Familiarity with NoSQL databases such as MongoDB and Cassandra.
- Used Flume for importing log files from various sources into HDFS.
- Loaded log data into HDFS using Flume and Kafka and performed ETL integrations.
- Created a customized BI tool for the management team that performs query analytics using HiveQL.
- Implemented partitioning and bucketing in Hive and designed both managed and external tables for optimized performance (see the sketch at the end of this project).
- Wrote a Hive UDF to sort struct fields and return a complex data type.
- Documented all Extract, Transform, and Load work: designed, developed, validated, and deployed Talend ETL processes for the data warehouse teams using Pig and Hive on Hadoop.
- Involved in installing and configuring Kerberos for authentication of users and Hadoop daemons.
- Worked on Pig Latin scripts and UDFs for ingestion, querying, processing, and analysis of data.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and automate several types of jobs such as Java MapReduce, Hive, Pig, and Sqoop.
- Implemented JMS for asynchronous auditing purposes.
- Developed data ingestion jobs in Talend to acquire, stage, and aggregate data in technologies such as HAWQ, Hive, Spark, and HDFS.
- Modeled Hive partitions extensively for data separation and faster data processing and followed Pig and Hive best practices for tuning.
- Developed Hive queries to process the data and generate data cubes for visualization.
- Worked on custom Pig loaders and storage classes to handle a variety of data formats such as JSON and XML.
Environment: Hadoop, Agile methodologies, Talend, HDFS, HBase, MongoDB, YARN, Java, Hive, Pig, Sqoop, Flume, Oozie, Hue, SQL, ETL, Cloudera Manager, Avro, Oracle, MySQL.
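A minimal sketch of the Hive partitioning and managed/external table design mentioned above, expressed as Spark SQL against the Hive metastore since Spark was part of this stack; it is illustrative only. The table names, columns, and HDFS path are assumed placeholders, and bucketing (CLUSTERED BY ... INTO n BUCKETS) would be declared in the same DDL.

```scala
import org.apache.spark.sql.SparkSession

object HivePartitionedLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HivePartitionedLoad")
      .enableHiveSupport() // talk to the Hive metastore
      .getOrCreate()

    // Hypothetical external table over raw delimited files already landed in HDFS.
    spark.sql("""CREATE EXTERNAL TABLE IF NOT EXISTS claims_raw
                 (claim_id STRING, state STRING, amount DOUBLE)
                 ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
                 LOCATION '/data/raw/claims'""")

    // Managed table partitioned by state, so filters on state prune whole partitions.
    spark.sql("""CREATE TABLE IF NOT EXISTS claims (claim_id STRING, amount DOUBLE)
                 PARTITIONED BY (state STRING)
                 STORED AS ORC""")

    // Dynamic-partition insert: one Hive partition per distinct state value.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""INSERT OVERWRITE TABLE claims PARTITION (state)
                 SELECT claim_id, amount, state FROM claims_raw""")

    spark.stop()
  }
}
```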
Big Data Analyst/Java Developer
Confidential - Kalamazoo, MI
Responsibilities:
- Installed and configured Apache Hadoop to test the maintenance of log files in Hadoop cluster.
- Installed and configured Hive, Pig, Sqoop, and Oozie on the Hadoop cluster.
- Installed and configured Hadoop MapReduce, HDFS and developed multiple MapReduce jobs in Java for data cleansing and preprocessing.
- Implemented project using Agile SCRUM methodology, involved in daily stand up meetings and sprint showcase and sprint retrospective.
- Deployed the Big Data Hadoop application using Talend on AWS.
- Extensively involved in loading data from the UNIX file system into HDFS.
- Involved in evaluating the business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Provided quick response to ad hoc internal and external client requests for data and experienced in creating ad hoc reports.
- Responsible for building scalable distributed data solutions using Hadoop.
- Implemented MapReduce jobs in Hive by querying the available data.
- Used Amazon Redshift to store and retrieve data from data warehouses.
- Experience using Hive and Pig as ETL tools for event joins, filters, transformations, and pre-aggregations.
- Developed Pig scripts to transform raw data into meaningful data as specified by business users.
- Supported setting up the QA environment and updating configurations for implementing Pig scripts.
- Performed unit testing for the development team within the sandbox environment.
- Created Hive tables, wrote Hive UDFs, and handled data loading (see the UDF sketch at the end of this project).
- Imported data into HDFS and Hive from other data systems by using Sqoop.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Generated aggregations, groupings, and visualizations using Tableau.
- Developed Hive queries to process the data.
- Presented data and data flows using Talend for reusability.
- Developed and maintained several batch jobs to run automatically per business requirements.
Environment: Apache Hadoop, Cloudera Manager, CDH2, CDH3, CentOS, Apache Hama, Talend, Eclipse Indigo, Java, MapReduce, Hive, Sqoop, Pig, Oozie, SQL, Struts, JUnit.
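A minimal sketch of a Hive UDF like those mentioned above, written in Scala against the classic org.apache.hadoop.hive.ql.exec.UDF API (the actual UDFs in this role may well have been Java); the function name and normalization rule are hypothetical.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical UDF that normalizes a free-text status column before aggregation.
// After packaging into a JAR, register it in Hive with:
//   ADD JAR hive-udfs.jar;
//   CREATE TEMPORARY FUNCTION normalize_status AS 'NormalizeStatus';
class NormalizeStatus extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) null
    else new Text(input.toString.trim.toUpperCase)
  }
}
```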
Java Developer
Confidential
Responsibilities:
- Involved in design and development phases of Software Development Life Cycle (SDLC).
- Involved in designing UML Use case diagrams, Class diagrams, and Sequence diagrams using Rational Rose.
- Followed Agile methodology and Scrum meetings to track, optimize, and tailor features to customer needs.
- Developed the user interface using JSP, JSP tag libraries, and JavaScript to simplify the complexities of the application.
- Implemented Model View Controller (MVC) architecture using Jakarta Struts frameworks at presentation tier.
- Developed a Dojo based front end including forms and controls and programmed event handling.
- Implemented SOA architecture with web services using JAX-RS (REST) and JAX-WS (SOAP).
- Developed various Enterprise Java Bean components to fulfill the business functionality.
- Created Action Classes which route submittals to appropriate EJB components and render retrieved information.
- Validated all forms using Struts validation framework and implemented Tiles framework in the presentation layer.
- Used core Java and object-oriented concepts.
- Used Spring Framework for Dependency injection and integrated it with the Struts Framework.
- Used JDBC to connect to backend databases, Oracle and SQL Server 2005.
- Proficient in writing SQL queries and stored procedures for multiple databases, including Oracle and SQL Server 2005.
- Wrote Stored Procedures using PL/SQL. Performed query optimization to achieve faster indexing and making the system more scalable.
- Deployed the application on Windows using IBM WebSphere Application Server.
- Used Java Messaging Services (JMS) for reliable and asynchronous exchange of important information such as payment status report.
- Used Web Services - WSDL and REST for getting credit card information from third party and used SAX and DOM XML parsers for data retrieval.
- Implemented an SOA architecture using JAX-WS web services.
- Used Ant scripts to build the application and deployed it on the WebSphere Application Server.
Environment: Core Java, Agile methodologies, J2EE, Oracle, SQL Server, JSP, Struts, Spring, JDK, JavaScript, HTML, CSS, AJAX, JUnit, Log4j, Web Services, Windows.
Jr Java/J2EE Developer
Confidential
Responsibilities:
- Involved in specification analysis and identifying the requirements.
- Participated in design discussions on the methodology for implementing requirements.
- Involved in preparing the Code Review Document and Technical Design Document.
- Designed the presentation layer by developing the JSP pages for the modules.
- Developed controllers and JavaBeans encapsulating the business logic.
- Developed classes to interface with the underlying web services layer.
- Used patterns including MVC, DAO, DTO, Front Controller, Service Locator and Business Delegate.
- Worked on Service Layer which provided business logic implementation.
- Involved in building PL/SQL queries and stored procedures for database operations.
- Used Jasper Reports to provide print preview of Financial Reports and Monthly Statements.
- Carried out integration testing and acceptance testing.
- Used JMeter to carry out performance tests on external web service calls, database connections and other dynamic resources.
- Participated in the team meetings and discussed enhancements, issues and proposed feasible solutions.
Environment: Java 1.4, J2EE 1.4, Servlets, JSP, JDBC, XML, Ant, Apache Tomcat 5.0, Oracle 8i, JUnit, PL/SQL, UML, NetBeans.