Hadoop Developer/Spark Resume
Beaverton, OR
SUMMARY:
- Offering 7+ years of overall IT experience in application development with Java and Big Data (Hadoop).
- Expertise in Hadoop, HDFS, MapReduce, and the Hadoop ecosystem, including Hive, HBase, HBase-Hive integration, Pig, Sqoop, Flume, Oozie, and ZooKeeper, with solid knowledge of the MapReduce/HDFS framework.
- Good working experience with Apache Hadoop MapReduce programming, Pig scripting, and HDFS.
- Knowledge of NoSQL databases such as MongoDB and Cassandra.
- Good understanding of Hadoop MR1 and MR2 (YARN) architectures.
- Good understanding of Hadoop architecture and its components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, ApplicationMaster, ResourceManager, and NodeManager, as well as the MapReduce programming paradigm.
- Involved in writing Pig scripts to reduce job execution time.
- Experienced in loading large volumes of data from the local file system and HDFS into Hive, and in writing complex queries to load data into internal (managed) tables.
- Good hands-on experience in Apache Spark with Scala.
- Used Spark Streaming to divide streaming data into micro-batches as input to the Spark engine for batch processing (a minimal sketch follows this summary).
- Developed Spark SQL programs to handle different data sets with better performance.
- Hands on experience in Cloudera and Hortonworks Hadoop environments.
- Good understanding of Hadoop administration with Hortonworks.
- Good knowledge of the real-time streaming platform Kafka, integration software such as Talend, and NoSQL databases such as MongoDB, HBase, and Cassandra.
- Experience working with the Tez framework for interactive workloads.
- Configured Tez as the execution engine for Hive and Pig to achieve better response times than MapReduce jobs.
- Experienced in loading data into partitioned and bucketed Hive tables.
- Experience in installing, configuring, supporting, and managing Hadoop clusters using Apache and Cloudera distributions.
- Worked with ETL/ELT tools (e.g., Talend).
- Good knowledge of Talend for data integration with Hadoop.
- Used Talend Open Studio to load files into Hive tables and performed ETL aggregations in Hive.
- Expertise in the Scala programming language and Spark Core.
- Designed and created ETL jobs in Talend to load large volumes of data into Cassandra, the Hadoop ecosystem, and relational databases.
- Experience using Maven 2.0 to compile, package, and deploy applications to application servers.
- Skilled in data management, extraction, manipulation, validation, and analysis of large data volumes.
- Extensive expertise in creating and automating workflows using the Oozie workflow engine.
- Scheduled jobs with the Oozie coordinator to execute on specific days (excluding weekends).
- Very good understanding of SQL, ETL, and data warehousing technologies.
- Extensive experience working with Oracle, SQL Server, and MySQL databases; hands-on experience in application development using Java and RDBMSs.
- Experience in UNIX Shell scripting.
- Expert in T-SQL: creating and using stored procedures, views, and user-defined functions, and implementing business intelligence solutions on SQL Server.
- Hands-on experience developing applications with Java, J2EE, JSP, EJB, SOAP, JDBC 2.0, XML, HTML, XSD, XSLT, PL/SQL, and Oracle 10g.
- Strong knowledge of version control systems such as SVN and Git.
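Illustrative sketch for the Spark Streaming point above: a minimal Scala job that splits an incoming text stream into 10-second micro-batches for the Spark engine. The socket source, host/port, and word-count logic are assumed placeholders, not details from any project listed below.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StreamingWordCount")
    // Every 10 seconds of the stream becomes one batch (an RDD) handed to the Spark engine.
    val ssc = new StreamingContext(conf, Seconds(10))

    // Placeholder source: text lines arriving on a socket (could equally be Flume or Kafka).
    val lines = ssc.socketTextStream("localhost", 9999)
    val counts = lines.flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.print()        // emit each batch's counts
    ssc.start()           // start receiving data and processing batches
    ssc.awaitTermination()
  }
}
```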
TECHNICAL SKILLS:
Hadoop: HDFS, MapReduce, YARN, Spark Core, Spark Streaming, Spark SQL, Hive, Tez, Pig, Sqoop, Flume, Kafka, Oozie, and ZooKeeper.
Languages: Java, Scala, PL/SQL, Pig Latin, HiveQL, UNIX shell scripts.
Database: Oracle 10g, MySQL.
NoSQL Databases: HBase, Cassandra, MongoDB.
Web Technologies: HTML, XML, CSS, XSLT, XHTML.
Web Servers: Apache Tomcat, JBoss.
J2EE Technologies: JDBC.
Cloud: Amazon AWS (S3, EC2).
Frameworks: Spring, MVC, Struts.
Tools & IDEs: Eclipse, NetBeans, Maven, Toad, DbVisualizer.
Operating Systems: Windows, Linux (CentOS, Ubuntu).
WORK EXPERIENCE:
Hadoop Developer/Spark
Confidential - Beaverton, OR
Responsibilities:
- Installed and configured Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, Oozie, ZooKeeper, HBase, Flume, and Sqoop.
- Worked on a large-scale Hadoop YARN cluster for distributed data storage, processing, and analysis.
- Worked entirely in an Agile methodology and developed Spark scripts using the Scala shell.
- Implemented multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Worked in a team running a 30-node cluster and expanded it by adding nodes; the additional DataNodes were configured through the Hadoop commissioning process.
- Imported data from MySQL and Oracle into HDFS using Sqoop.
- Improved the performance of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Used Impala to query data in HDFS.
- Developed and implemented two service endpoints (end to end) in Java using the Play framework, Akka, and Hazelcast.
- Used AWS services such as EC2 and S3 for small data sets.
- Developed Pig UDFs to pre-process the data for analysis.
- Used Apache Kafka to ingest data from producers, which push it to the brokers (see the producer sketch at the end of this project).
- Used the Spark API over Hadoop YARN to perform analytics on data in Hive.
- Wrote robust, reusable HiveQL scripts and Hive UDFs in Java.
- Experience with test-driven development (TDD) and acceptance testing using Behave.
- Implemented partitioning and bucketing in Hive for better organization of the data.
- Designed and built unit tests and executed operational queries on HBase.
- Built Apache Avro schemas for publishing messages to topics and enabled relevant serializing formats for message publishing and consumption.
- Implemented a script to transmit information from Oracle to HBase using Sqoop.
- Worked on migrating Python MapReduce programs into Spark transformations.
- Experience working with the NoSQL database HBase for real-time data analytics with Apache Spark.
- Implemented authentication and authorization using the Kerberos protocol.
- Installed the Oozie workflow engine to run multiple MapReduce, HiveQL, and Pig jobs.
- Implemented a script to transmit information from web servers to Hadoop using Flume.
- Used ZooKeeper to manage coordination across the cluster.
- Used Apache Kafka and Apache Storm to gather log data and feed it into HDFS.
- Developed Scala program for data extraction using Spark Streaming.
- Set up and managed Kafka for stream processing.
- Used Pig as an ETL tool for transformations, event joins, and pre-aggregations before storing data in HDFS.
- Developed Oozie workflow for scheduling and orchestrating the ETL process.
- Created producer, consumer, and ZooKeeper setups for Kafka replication.
- Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
- Experienced in implementing Spark RDD transformations and actions to carry out business analysis.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Developed NoSQL database solutions using CRUD operations, indexing, replication, and sharding in MongoDB.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
Environment: Hadoop, MapReduce, YARN, Agile methodologies, HDFS, Hive, Cloudera, Core Java, Scala, SQL, Flume, Spark, Pig, Sqoop, Oozie, Impala, Python, AWS, HBase, Kafka, Avro, Oracle, UNIX.
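A minimal sketch of the Kafka producer path referenced above (illustrative only, not project code): a Scala producer pushing a log record to a broker, from which downstream consumers such as Spark read. The broker address, topic name, and record contents are assumed placeholders.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object LogProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092") // assumed broker address
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    try {
      // Hypothetical topic; each record is pushed to the broker for downstream consumers.
      producer.send(new ProducerRecord[String, String]("web-logs", "hostA", "GET /index.html 200"))
    } finally {
      producer.close()
    }
  }
}
```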
Big Data Analyst
Confidential - Dallas, TX
Responsibilities:
- Responsible for managing data coming from different sources, loading structured and unstructured data, and maintaining HDFS.
- Wrote UNIX shell scripts in combination with Talend data maps to process source files and load them into the database.
- Worked in an Agile development methodology.
- Responsible for building scalable distributed data solutions using Hadoop.
- Created a data pipeline of MapReduce programs using chained mappers.
- Implemented Hadoop YARN jobs to write data in Avro format.
- Processed input from multiple data sources in the same reducer using GenericWritable and multiple input formats.
- Developed and executed Hive queries to denormalize the data.
- Imported data from MySQL into HDFS on a regular basis using Sqoop.
- Implemented optimized joins across different data sets using MapReduce to get the top claims by state.
- Worked on big data processing of clinical and non-clinical data using MapReduce.
- Performed validation on data ingested with Hadoop YARN, building a custom model to filter out invalid records and cleanse the data.
- Familiarity with NoSQL databases such as MongoDB and Cassandra.
- Used Flume for importing log files from various sources into HDFS.
- Loaded log data into HDFS using Flume and Kafka and performed ETL integrations.
- Created a customized BI tool for the management team that performs query analytics using HiveQL.
- Implemented partitioning and bucketing in Hive and designed both managed and external tables for optimized performance (see the sketch at the end of this project).
- Wrote a Hive UDF to sort struct fields and return a complex data type.
- Documented all Extract, Transform, and Load work: designed, developed, validated, and deployed Talend ETL processes for the data warehouse teams using Pig and Hive on Hadoop.
- Involved in installing and configuring Kerberos for authentication of users and Hadoop daemons.
- Worked on Pig Latin scripts and UDFs for ingestion, querying, processing, and analysis of data.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and automate several types of jobs such as Java MapReduce, Hive, Pig, and Sqoop.
- Implemented JMS for asynchronous auditing purposes.
- Developed data ingestion jobs in Talend to acquire, stage, and aggregate data in technologies such as HAWQ, Hive, Spark, and HDFS.
- Modeled Hive partitions extensively for data separation and faster data processing and followed Pig and Hive best practices for tuning.
- Developed Hive queries to process the data and generate data cubes for visualization.
- Worked on custom Pig loaders and storage classes to handle a variety of data formats such as JSON and XML.
Environment: Hadoop, Agile methodologies, Talend, HDFS, HBase, MongoDB, YARN, Java, Hive, Pig, Sqoop, Flume, Oozie, Hue, SQL, ETL, Cloudera Manager, Avro, Oracle, MySQL.
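A minimal sketch of the Hive partitioning and managed/external table design mentioned above, expressed as Spark SQL against the Hive metastore since Spark was part of this stack; it is illustrative only. The table names, columns, and HDFS path are assumed placeholders, and bucketing (CLUSTERED BY ... INTO n BUCKETS) would be declared in the same DDL.

```scala
import org.apache.spark.sql.SparkSession

object HivePartitionedLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HivePartitionedLoad")
      .enableHiveSupport() // talk to the Hive metastore
      .getOrCreate()

    // Hypothetical external table over raw delimited files already landed in HDFS.
    spark.sql("""CREATE EXTERNAL TABLE IF NOT EXISTS claims_raw
                 (claim_id STRING, state STRING, amount DOUBLE)
                 ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
                 LOCATION '/data/raw/claims'""")

    // Managed table partitioned by state, so filters on state prune whole partitions.
    spark.sql("""CREATE TABLE IF NOT EXISTS claims (claim_id STRING, amount DOUBLE)
                 PARTITIONED BY (state STRING)
                 STORED AS ORC""")

    // Dynamic-partition insert: one Hive partition per distinct state value.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""INSERT OVERWRITE TABLE claims PARTITION (state)
                 SELECT claim_id, amount, state FROM claims_raw""")

    spark.stop()
  }
}
```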
Big Data Analyst/Java Developer
Confidential - Kalamazoo, MI
Responsibilities:
- Installed and configured Apache Hadoop to test the maintenance of log files in Hadoop cluster.
- Installed and configured Hive, Pig, Sqoop, and Oozie on the Hadoop cluster.
- Installed and configured Hadoop MapReduce, HDFS and developed multiple MapReduce jobs in Java for data cleansing and preprocessing.
- Implemented project using Agile SCRUM methodology, involved in daily stand up meetings and sprint showcase and sprint retrospective.
- Deployed the Big Data Hadoop application using Talend on AWS.
- Extensively involved in loading data from the UNIX file system into HDFS.
- Involved in evaluating the business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Provided quick response to ad hoc internal and external client requests for data and experienced in creating ad hoc reports.
- Responsible for building scalable distributed data solutions using Hadoop.
- Implemented MapReduce jobs in Hive by querying the available data.
- Used Amazon Redshift to store and retrieve data from data warehouses.
- Experience using Hive and Pig as ETL tools for event joins, filters, transformations, and pre-aggregations.
- Developed Pig scripts to transform raw data into meaningful data as specified by business users.
- Supported setting up the QA environment and updating configurations for implementing Pig scripts.
- Performed unit testing for the development team within the sandbox environment.
- Created Hive tables, wrote Hive UDFs, and handled data loading (see the UDF sketch at the end of this project).
- Imported data into HDFS and Hive from other data systems by using Sqoop.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Generated aggregations, groupings, and visualizations using Tableau.
- Developed Hive queries to process the data.
- Presented data and data flows using Talend for reusability.
- Developed and maintained several batch jobs to run automatically per business requirements.
Environment: Apache Hadoop, Cloudera Manager, CDH2, CDH3, CentOS, Apache Hama, Talend, Eclipse Indigo, Java, MapReduce, Hive, Sqoop, Pig, Oozie, SQL, Struts, JUnit.
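A minimal sketch of a Hive UDF like those mentioned above, written in Scala against the classic org.apache.hadoop.hive.ql.exec.UDF API (the actual UDFs in this role may well have been Java); the function name and normalization rule are hypothetical.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical UDF that normalizes a free-text status column before aggregation.
// After packaging into a JAR, register it in Hive with:
//   ADD JAR hive-udfs.jar;
//   CREATE TEMPORARY FUNCTION normalize_status AS 'NormalizeStatus';
class NormalizeStatus extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) null
    else new Text(input.toString.trim.toUpperCase)
  }
}
```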
Java Developer
Confidential
Responsibilities:
- Involved in design and development phases of Software Development Life Cycle (SDLC).
- Involved in designing UML Use case diagrams, Class diagrams, and Sequence diagrams using Rational Rose.
- Followed Agile methodology and Scrum meetings to track, optimize, and tailor features to customer needs.
- Developed the user interface using JSP, JSP tag libraries, and JavaScript to simplify the complexities of the application.
- Implemented Model View Controller (MVC) architecture using Jakarta Struts frameworks at presentation tier.
- Developed a Dojo based front end including forms and controls and programmed event handling.
- Implemented SOA architecture with web services using JAX-RS (REST) and JAX-WS (SOAP).
- Developed various Enterprise Java Bean components to fulfill the business functionality.
- Created Action Classes which route submittals to appropriate EJB components and render retrieved information.
- Validated all forms using Struts validation framework and implemented Tiles framework in the presentation layer.
- Used core Java and object-oriented concepts.
- Used Spring Framework for Dependency injection and integrated it with the Struts Framework.
- Used JDBC to connect to backend databases, Oracle and SQL Server 2005.
- Proficient in writing SQL queries and stored procedures for multiple databases, including Oracle and SQL Server 2005.
- Wrote Stored Procedures using PL/SQL. Performed query optimization to achieve faster indexing and making the system more scalable.
- Deployed the application on Windows using IBM WebSphere Application Server.
- Used Java Messaging Services (JMS) for reliable and asynchronous exchange of important information such as payment status report.
- Used Web Services - WSDL and REST for getting credit card information from third party and used SAX and DOM XML parsers for data retrieval.
- Implemented an SOA architecture using JAX-WS web services.
- Used Ant scripts to build the application and deployed it on the WebSphere Application Server.
Environment: Core Java, Agile methodologies, J2EE, Oracle, SQL Server, JSP, Struts, Spring, JDK, JavaScript, HTML, CSS, AJAX, JUnit, Log4j, Web Services, Windows.
Jr Java/J2EE Developer
Confidential
Responsibilities:
- Involved in specification analysis and identifying the requirements.
- Participated in design discussions on the methodology for implementing requirements.
- Involved in preparing the Code Review Document and Technical Design Document.
- Designed the presentation layer by developing the JSP pages for the modules.
- Developed controllers and JavaBeans encapsulating the business logic.
- Developed classes to interface with the underlying web services layer.
- Used patterns including MVC, DAO, DTO, Front Controller, Service Locator and Business Delegate.
- Worked on Service Layer which provided business logic implementation.
- Involved in building PL/SQL queries and stored procedures for database operations.
- Used Jasper Reports to provide print preview of Financial Reports and Monthly Statements.
- Carried out integration testing and acceptance testing.
- Used JMeter to carry out performance tests on external web service calls, database connections and other dynamic resources.
- Participated in the team meetings and discussed enhancements, issues and proposed feasible solutions.
Environment: Java 1.4, J2EE 1.4, Servlets, JSP, JDBC, XML, Ant, Apache Tomcat 5.0, Oracle 8i, JUnit, PL/SQL, UML, NetBeans.