
Hadoop/Spark Developer Resume


San Jose, CA

PROFESSIONAL SUMMARY:

  • 8 years of IT experience in Design, Development, Deployment, Maintenance and Support of Java/J2EE applications. Focused on quality and efficiency.
  • 3 years of experience with the Hadoop Distributed File System (HDFS), Impala, Hive, HBase, Spark, Hue, the MapReduce framework, and Sqoop.
  • Experienced Hadoop developer with expertise in delivering end-to-end solutions to real-time big data problems by applying distributed processing concepts such as MapReduce on HDFS and other Hadoop ecosystem components.
  • Experienced with NoSQL databases like HBase, MongoDB and Cassandra.
  • Experience working on large-scale big data implementations in production environments.
  • Hands-on experience migrating data from relational databases to the Hadoop platform using Sqoop.
  • Extensively used Apache Flume to collect logs and error messages across the cluster.
  • Experienced in using Pig scripts to perform transformations, event joins, filters, and pre-aggregations before storing the data on HDFS.
  • Around 1 year of experience with Spark and Scala.
  • Developed analytical components using Scala, Spark, and Spark Streaming (an illustrative streaming sketch follows this summary).
  • Experience in Complete Software Development Life Cycle (SDLC) which includes Requirement Analysis, Design, Coding, Testing and Implementation using Agile (Scrum), TDD and other development methodologies.
  • Expertise in developing both front-end and back-end applications using Java, Servlets, JSP, Web Services, JavaScript, HTML, Spring, Hibernate, JDBC, XML, and JSON.
  • Worked on WebLogic and Tomcat web servers for development and deployment of Java/J2EE applications.
  • Good experience with Spring and Hibernate, and expertise in developing JavaBeans.
  • Working knowledge of WebLogic server clustering.
  • Proficient in various web-based technologies like HTML, XML, XSLT, and JavaScript.
  • Expertise in unit testing using JUnit.
  • Experience in error logging and debugging using Log4J.
  • Strong knowledge of creating and reviewing data models built in RDBMSs such as Oracle 10g and MySQL.
  • Worked with operating systems like Linux, UNIX, Solaris, and Windows 2000/XP/Vista/7.
  • Experience in working with versioning tools like Git, CVS, and ClearCase.
  • Goal oriented, organized, team player with good interpersonal skills; thrives well within group environment as well as individually.
  • Strong business and application analysis skills with excellent communication and professional abilities.
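
As an illustration of the Spark Streaming work mentioned above, the following is a minimal Scala sketch of a micro-batch counting job; the application name, socket source, and 10-second batch interval are assumptions for the example rather than details from any specific project.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object ClickStreamCounts {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("ClickStreamCounts")   // hypothetical app name
        val ssc  = new StreamingContext(conf, Seconds(10))           // 10-second micro-batches

        // Illustrative socket source; production streams typically come from Kafka or Flume.
        val lines  = ssc.socketTextStream("localhost", 9999)
        val counts = lines.flatMap(_.split("\\s+"))
                          .map(word => (word, 1))
                          .reduceByKey(_ + _)

        counts.print()        // emit per-batch counts to the driver log
        ssc.start()
        ssc.awaitTermination()
      }
    }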

TECHNICAL SKILLS:

Languages: Java, PL/SQL, Scala

Big Data: Apache Hadoop, Hive, HDFS, Spark, MapReduce, Sqoop

RDBMS: Oracle, SQL Server, Teradata

NoSQL DBMS: HBase

Scripting Languages: UNIX shell script, JavaScript, Python

Web Servers: Tomcat 7.x.

Tools and Utilities: MS Team Foundation Server, SVN, Maven, Gradle

Development Tools: Eclipse, IntelliJ IDEA

Operating systems: Windows NT/2000/XP, UNIX, Linux

Methodology: Waterfall, Agile

PROFESSIONAL EXPERIENCE:

Confidential, San Jose, CA

Hadoop/ Spark Developer

Responsibilities:

  • Implemented advanced procedures such as text analytics and processing using Apache Spark's in-memory computing capabilities, written in Scala.
  • Developed and executed shell scripts to automate the jobs
  • Wrote complex Hive queries and UDFs.
  • Worked on reading multiple data formats on HDFS using PySpark
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala (an illustrative sketch follows this list).
  • Developed multiple POCs using PySpark, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
  • Analyzed the SQL scripts and designed the solution to implement using PySpark
  • Involved in loading data from UNIX file system to HDFS
  • Extracted the data from Teradata into HDFS using Sqoop
  • Handled importing of data from various data sources, performed transformations using Hive, MapReduce, Spark and loaded data into HDFS.
  • Managed and reviewed Hadoop log files.
  • Involved in analysis, design, and testing phases, and responsible for documenting technical specifications.
  • Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive (a producer sketch follows the environment line for this role).
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.
  • Worked extensively on the Spark Core and Spark SQL modules.
  • Experienced in running Hadoop streaming jobs to process terabytes of data.
  • Imported real-time data into Hadoop using Kafka and implemented an Oozie job for daily imports.
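
As a hedged illustration of converting a Hive/SQL query into Spark transformations, the sketch below reads a Hive table with Spark SQL and rewrites an aggregation as DataFrame operations; the database, table, column names, and output path are hypothetical, and newer Spark versions use SparkSession where older code would have used HiveContext.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.sum

    object HiveToSpark {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("HiveToSpark")
          .enableHiveSupport()          // read existing Hive tables
          .getOrCreate()

        // Hypothetical source table; the original Hive query is replaced
        // by equivalent DataFrame transformations.
        val orders = spark.table("sales.orders")
        val daily = orders
          .filter(orders("status") === "CLOSED")
          .groupBy("order_date")
          .agg(sum("amount").alias("total_amount"))

        // Persist the result to HDFS as Parquet (hypothetical path).
        daily.write.mode("overwrite").parquet("/data/marts/daily_order_totals")

        spark.stop()
      }
    }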

Environment: Hadoop, HDFS, Hive, Python, Scala, Spark, SQL, Teradata, UNIX Shell Scripting
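
The Kafka producer work in this role can be illustrated with a minimal Scala sketch using the standard Kafka client API; the broker address, topic name, and message payload are assumptions for the example.

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    object EventProducer {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put("bootstrap.servers", "broker1:9092")   // hypothetical broker address
        props.put("key.serializer",   "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

        val producer = new KafkaProducer[String, String](props)
        // "events" is an illustrative topic name.
        producer.send(new ProducerRecord[String, String]("events", "event-key", "{\"type\":\"click\"}"))
        producer.flush()
        producer.close()
      }
    }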

Confidential

Hadoop Developer

Responsibilities:

  • Installed and configured Hadoop on a cluster.
  • Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Extended Hive and Pig core functionality by writing custom UDFs (a minimal UDF sketch follows this list).
  • Analyzed large data sets by running Hive queries and Pig scripts.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Experienced in defining job flows using Oozie
  • Experienced in managing and reviewing Hadoop log files
  • Experienced in collecting, aggregating, and moving large amounts of streaming data into HDFS using Flume.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Responsible for managing data coming from different sources and applications.
  • Working Knowledge in NoSQL Databases like HBase and Cassandra.
  • Good Knowledge of analyzing data in HBase using Hive and Pig.
  • Involved in Unit level and Integration level testing.
  • Prepared design documents and functional documents.
  • Added extra nodes to the cluster, based on requirements, to keep it scalable.
  • Involved in running Hadoop jobs for processing millions of records of text data
  • Involved in loading data from local file system (LINUX) to HDFS
  • Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
  • Assisted in exporting analyzed data to relational databases using Sqoop.
  • Created and maintained technical documentation for launching Hadoop clusters and executing jobs.
  • Submitted a detailed report about the daily activities on a weekly basis.
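
As a hedged illustration of the custom UDF work above, here is a minimal Hive simple-UDF sketch in Scala; the original UDFs were project-specific (and likely written in Java), and the cleaning logic shown is only an example.

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Hypothetical cleaning UDF: trims and lower-cases a string column.
    class CleanString extends UDF {
      def evaluate(input: Text): Text =
        if (input == null) null
        else new Text(input.toString.trim.toLowerCase)
    }

Once packaged into a JAR, a function like this would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being used in queries.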

Environment: Hadoop, HDFS, Pig, Sqoop, HBase, Hive, Flume, MapReduce, Cassandra, Oozie and MySQL

Confidential, Madison, WI

Hadoop Developer

Responsibilities:

  • Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Cassandra, Zookeeper and Sqoop.
  • Involved with Business Analysts in gathering requirements.
  • Involved in designing Logical/Physical Data Models.
  • Deployed Hadoop Cluster in Pseudo-distributed and Fully Distributed modes.
  • Involved in running ad-hoc queries through Pig Latin, Hive, or Java MapReduce.
  • Created complex mappings using different transformations such as Filter, Router, Connected and Unconnected Lookups, Stored Procedure, Joiner, Update Strategy, Union, Expression, and Aggregator to pipeline data to the data mart; also made use of variables and parameters.
  • Developed PowerCenter mappings to extract data from various databases and flat files and load it into the data mart using Informatica 8.6.1.
  • Developed PIG Latin scripts to extract the data from the web server output files to load into HDFS.
  • Involved in Big data analysis using Pig and User defined functions (UDF).
  • Managed and scheduled Jobs on a Hadoop cluster.
  • Involved in log file management: logs more than 7 days old were removed from the log folder, loaded into HDFS, and retained for 3 months (a minimal sketch follows this list).
  • Implemented NameNode metadata backup using NFS for high availability.
  • Worked on importing and exporting data from Oracle and DB2 into HDFS and HIVE using Sqoop.
  • Set up standards and processes for Hadoop based application design and implementation.
  • Created Hive External tables and loaded the data into tables and query data using HQL.
  • Implemented various Performance Tuning techniques on Sources, Targets, Mappings, and Workflows.
  • Wrote UNIX shell scripts to execute the workflow in a loop to process ‘n’ files, and FTP scripts to pull files from the FTP server to the Linux server.
  • Worked on Hadoop Backup Recovery and Upgrade.
  • Collected the logs data from web servers and integrated into HDFS using Flume.
  • Implemented the Fair Scheduler on the JobTracker to share cluster resources across users' MapReduce jobs.
  • Worked with the reporting team to generate reports from the data mart using Cognos.
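
The log-retention task above was most likely scripted; purely as an illustration, the sketch below uses the HDFS FileSystem API from Scala to move aged logs into HDFS, with the local log directory, archive path, and 7-day cutoff as stated assumptions.

    import java.io.File
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    object ArchiveOldLogs {
      val RetentionMillis: Long = 7L * 24 * 60 * 60 * 1000   // 7-day cutoff

      def main(args: Array[String]): Unit = {
        val localLogDir = new File("/var/log/webapp")   // assumed local log folder
        val hdfsTarget  = new Path("/archive/logs")     // assumed HDFS archive directory
        val fs          = FileSystem.get(new Configuration())

        val cutoff = System.currentTimeMillis() - RetentionMillis
        val logs   = Option(localLogDir.listFiles()).getOrElse(Array.empty[File])
        for (log <- logs if log.isFile && log.lastModified() < cutoff) {
          // Copy the aged log into HDFS, then remove it from the local folder.
          fs.copyFromLocalFile(new Path(log.getAbsolutePath), hdfsTarget)
          log.delete()
        }
        fs.close()
      }
    }

A companion housekeeping job would then drop archived logs older than roughly three months, matching the retention described above.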

Environment: Apache Hadoop, Informatica PowerCenter 8.6/8.1, SQL Server 2005, TOAD, Rapid SQL, Oracle 10g (RAC), HDFS, MapReduce, MongoDB, Java, VMware, Hive, Eclipse, Pig, HBase, Sqoop, Flume, Linux, UNIX, DB2.

Confidential

Software Engineer

Responsibilities:

  • Involved in different phases of Software Development Lifecycle (SDLC) like Requirements gathering, Analysis, Design and Development of the application.
  • Wrote several Action Classes and Action Forms to capture user input and created different web pages using JSTL, JSP, HTML, Custom Tags and Struts Tags.
  • Designed and developed Message Flows, Message Sets, and other service components to expose mainframe applications to enterprise J2EE applications.
  • Used standard data access technologies like JDBC and ORM tools like Hibernate.
  • Worked on various client websites that used Struts 1 framework and Hibernate
  • Wrote test cases using JUnit testing framework and configured applications on WebLogic Server
  • Involved in writing stored procedures, views, user-defined functions and triggers in SQL Server database for Reports module.

Environment: Java, JSP, JUnit, Eclipse, JIRA, JDBC, Struts 1, Hibernate, Visual Source Safe (VSS), WebLogic, Oracle 9i.

Confidential

Java Developer

Responsibilities:

  • Developed Web interface using JSP, Standard Tag Libraries (JSTL), and Struts Framework.
  • Used Struts as MVC framework for designing the complete Web tier.
  • Developed different GUI screens (JSPs) using HTML, DHTML, and CSS to design the pages according to Client Experience Workbench standards.
  • Developed Action Form Beans, Action classes for implementing business logic for the Struts Framework.
  • Validated the user input using Struts Validation Framework.
  • Client side validations were implemented using JavaScript.
  • Implemented the mechanism of logging and debugging with Log4j.
  • Version control of the code and configuration files was maintained with CVS.
  • Designed generic database Connection Pooling with JDBC using Oracle and involved in the SQL query optimization.
  • Developed PL/SQL packages and triggers.
  • Developed test cases for Unit testing and performed integration and system testing.

Environment: J2EE, Weblogic, Eclipse, Struts 1.0, JDBC, JavaScript, CSS, XML, ANT, Log4J, VSS, PL/SQL and Oracle 8i.
