Hadoop and Spark Developer Resume
Fremont, CA
SUMMARY:
- 6+ years of relevant experience in the IT industry in the design, development, and implementation of Big Data solutions with Hadoop, MapReduce, Pig, Hive, Sqoop, NiFi, Oozie, ZooKeeper, and Flume on CDH4 and CDH5 distributions
- Replaced existing jobs with Spark data transformations for more efficient data processing and better performance
- Developed Hive UDFs (User-Defined Functions) to preprocess data for analysis.
- Wrote Hive queries and Pig scripts for data analysis to meet requirements
- Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting
- Experienced in developing RDDs, DataFrames, and SQL queries in Spark SQL (see the sketch after this summary)
- Worked with different Hadoop distributions, including Cloudera and Hortonworks.
- Good knowledge of Spark and Spark SQL using Scala.
- Hands-on experience with Spark batch processing and a POC on a Kafka - Spark streaming process
- Experienced in using IDEs and tools such as Eclipse, NetBeans, GitHub, Maven, and IntelliJ
- Monitored and managed the Hadoop cluster using the Cloudera Manager web interface.
- Hands-on experience with NoSQL databases such as HBase and MongoDB.
- Knowledge of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, ResourceManager, NameNode, DataNode, and MapReduce concepts.
- Experience in Java development using J2EE, J2SE, Servlets, JSP, EJB, and JDBC.
- Implementation knowledge of enterprise, web, and client-server applications using Java and J2EE.
- Experience in analyzing data using HQL, Pig Latin, HBase, and custom MapReduce programs in Java
- Experienced with data formats such as JSON, Parquet, Avro, RC, and ORC
- Used Flume to collect log files and write them into HDFS.
- Experience in importing and exporting data between HDFS and RDBMS with Sqoop and migrating data according to client requirements
- Worked with YARN, Mesos, and Spark default schedulers.
- Experience in data ingestion, storage, and processing, and in analyzing big data.
- Worked on loading and transforming large sets of structured, semi-structured, and unstructured data.
- Used the GitHub version control tool to push and pull changes and keep code updated from the repository
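A minimal sketch of the RDD, DataFrame, and Spark SQL work summarized above. The HDFS path, file layout, and column names are hypothetical and used only for illustration:

```scala
import org.apache.spark.sql.SparkSession

object RddDataFrameSqlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("rdd-df-sql-sketch").getOrCreate()
    import spark.implicits._

    // Build an RDD from a tab-delimited text file in HDFS (hypothetical path and layout).
    val lines = spark.sparkContext.textFile("hdfs:///data/raw/clicks.txt")
    val pairs = lines.map(_.split("\t")).map(a => (a(0), a(1).toInt))

    // Convert the RDD to a DataFrame and register it as a temporary view.
    val clicks = pairs.toDF("user_id", "clicks")
    clicks.createOrReplaceTempView("clicks")

    // Query the same data through Spark SQL.
    spark.sql(
      "SELECT user_id, SUM(clicks) AS total_clicks FROM clicks GROUP BY user_id"
    ).show()

    spark.stop()
  }
}
```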
PROFESSIONAL EXPERIENCE:
Hadoop and Spark Developer
Confidential - Fremont, CA
Responsibilities:
- Worked with a large-scale Hadoop YARN cluster for distributed data processing and analysis using connectors, Spark Core, Spark SQL, Sqoop, Hive, and a NoSQL database
- Created partitions and buckets on Hive tables to improve processing performance and speed up data access (see the sketch after this list).
- Exported data from HDFS to the RDBMS with Sqoop export so the BI team could analyze it and generate reports.
- Worked with a 35-node cluster consisting of 2 NameNodes and 33 DataNodes, with each DataNode sized at 3 TB
- Worked with the Spark ecosystem, running Spark SQL queries on data formats such as text, CSV, and XML files
- Performed coalesce and repartition operations on DataFrames to control partition counts (see the sketch after this list).
- Used Snappy compression to save storage in HDFS.
- Worked with shell scripting to clean the base data.
- Applied Spark and Spark SQL with Scala.
- Created RDDs, DataFrames, and Datasets to process the data using Spark.
- Used Impala for faster processing of data.
- Integrated Spark with Hive to meet business requirements.
- Worked with ORC, Parquet, Avro, and JSON file formats
- Used Hive joins across multiple tables to meet business requirements.
- Replaced HiveQL queries with Spark SQL for better performance.
- Worked with Agile Methodology.
- Migrated all existing jobs to Spark for better performance and reduced execution time.
- Created Hive UDFs for Hive tables based on business requirements.
- After processing the data with Hive and Spark, delivered it to the BI team for analysis.
- Used Oozie for automation of tasks.
- Created NiFi flows to trigger Spark jobs, with email notifications sent for any failures.
- Created Pig scripts to transform HDFS data and loaded the data into Hive external tables.
- Worked with NoSQL databases such as HBase to load large volumes of semi-structured data coming from different sources.
- Monitored all NiFi flows to get notified whenever no data moved through a flow for longer than a specified time.
- Exported data to the RDBMS using Sqoop and processed the data for ETL operations.
- Imported data from different sources such as HDFS and HBase into Spark RDDs
- Developed a POC on streaming data using Apache Kafka and Spark Streaming (see the streaming sketch after this list)
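A minimal sketch of the Hive partitioning/bucketing, coalesce/repartition, and Snappy-compressed writes referenced in this list. The database, table, and column names, the HDFS paths, and the partition/bucket counts are hypothetical assumptions:

```scala
import org.apache.spark.sql.SparkSession

object HiveSparkSketch {
  def main(args: Array[String]): Unit = {
    // Spark session with Hive integration, as described in the bullets above.
    val spark = SparkSession.builder()
      .appName("hive-spark-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Partitioned and bucketed Hive table (hypothetical schema and names).
    spark.sql(
      """CREATE TABLE IF NOT EXISTS sales.orders_part (
        |  order_id BIGINT,
        |  customer_id STRING,
        |  order_total DOUBLE
        |)
        |PARTITIONED BY (order_date STRING)
        |CLUSTERED BY (customer_id) INTO 32 BUCKETS
        |STORED AS ORC""".stripMargin)

    // Read raw CSV from HDFS (hypothetical path), then control partition counts.
    val raw = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/raw/orders/*.csv")

    val repartitioned = raw.repartition(200)       // widen parallelism for heavy transforms
    val compacted     = repartitioned.coalesce(20) // shrink partition count before writing

    // Write Parquet with Snappy compression to save HDFS storage.
    compacted.write
      .mode("overwrite")
      .option("compression", "snappy")
      .parquet("hdfs:///data/curated/orders_parquet")

    spark.stop()
  }
}
```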
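A minimal sketch of the Kafka - Spark streaming POC mentioned above, written with Structured Streaming; the broker address, topic, and HDFS paths are assumed names, and the spark-sql-kafka connector is expected on the classpath:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object KafkaStreamingPoc {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-spark-streaming-poc")
      .getOrCreate()

    // Read a Kafka topic as a stream (hypothetical broker and topic names).
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "events")
      .load()
      .select(col("value").cast("string").as("payload"))

    // Land the streaming payloads in HDFS as Parquet with checkpointing.
    val query = events.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/streaming/events")
      .option("checkpointLocation", "hdfs:///checkpoints/events")
      .start()

    query.awaitTermination()
  }
}
```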
Environment: Hadoop, HDFS, Oozie, Hive, Impala, HBase, Spark, IntelliJ, Linux, Java, Hortonworks, NiFi, MapReduce, Sqoop, Shell Scripting, Apache Kafka
Hadoop Developer
Confidential - Fremont, CA
Responsibilities:
- Developed automated scripts to install Hadoop clusters
- Worked on loading and transforming large sets of structured, semi-structured, and unstructured data
- Worked on user-defined functions in Hive to load data from HDFS and run aggregation functions over multiple rows (see the UDF sketch after this list).
- Worked with a 25-node cluster consisting of 2 NameNodes and 23 DataNodes, with each DataNode sized at 3 TB.
- Moved log files generated from various sources to HDFS through Flume for further processing.
- Used the Bzip2 compression technique to compress files before loading them into Hive
- Performed loading and retrieval of unstructured data.
- Used ZooKeeper to manage Hadoop clusters and Oozie to schedule job workflows
- Used Sqoop to export the analyzed data to a relational database for analysis by the data analytics team.
- Implemented workflows using the Apache Oozie framework to automate tasks
- Used Sqoop to import and export data between HDFS and the RDBMS.
- Performed data analysis by running Hive queries.
- Used Impala for faster processing of data.
- Good experience in developing Hive DDL to create, alter, and drop Hive tables
- Involved in loading data from edge nodes to HDFS using shell scripting.
- Responsible for building scalable distributed data pipelines using Hadoop.
- Assisted in exporting analyzed data to relational databases using Sqoop
- Implemented partitioning and bucketing on Hive tables to meet business requirements
- Loaded and transformed large structured and semi-structured datasets in file formats including Avro, Parquet, and SequenceFile
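A minimal sketch of the kind of multi-row aggregation function described above. It is written as a Spark Aggregator registered for SQL use against Hive tables (Spark 3.x `functions.udaf`) rather than a native Hive UDAF, and the table and column names are hypothetical:

```scala
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
import org.apache.spark.sql.expressions.Aggregator
import org.apache.spark.sql.functions

// Running sum-and-count state for the average aggregation.
case class AvgState(sum: Double, count: Long)

// A typed aggregator that averages a numeric column across many rows.
object AverageAmount extends Aggregator[Double, AvgState, Double] {
  def zero: AvgState = AvgState(0.0, 0L)
  def reduce(b: AvgState, a: Double): AvgState = AvgState(b.sum + a, b.count + 1)
  def merge(b1: AvgState, b2: AvgState): AvgState =
    AvgState(b1.sum + b2.sum, b1.count + b2.count)
  def finish(r: AvgState): Double = if (r.count == 0) 0.0 else r.sum / r.count
  def bufferEncoder: Encoder[AvgState] = Encoders.product[AvgState]
  def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}

object UdafDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-udaf-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Register the aggregator so it can be called from SQL over Hive tables.
    spark.udf.register("avg_amount", functions.udaf(AverageAmount))

    // Hypothetical Hive table and columns, used only for illustration.
    spark.sql(
      """SELECT customer_id, avg_amount(order_total) AS avg_order
        |FROM sales.orders
        |GROUP BY customer_id""".stripMargin).show()

    spark.stop()
  }
}
```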
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Oozie, Java, UNIX, Flume, Spark, Scala, HBase, Impala, Eclipse.
SQL Developer
Confidential
Responsibilities:
- Developed ER diagrams and dimensional diagrams for business requirements using ERwin.
- Involved in data analysis and data discrepancy reduction for the source and target schemas.
- Developed SQL scripts to perform nested queries, joins, and subqueries and to insert, update, and delete data in MySQL database tables.
- Developed and implemented stored procedures, packages, and triggers.
- Extracted data from XML files into SQL tables and produced reports from the data files using SQL Server 2008.
- Built the data connection to the database using MS SQL Server
- Wrote advanced SQL queries, procedures, cursors, and triggers.
Environment: MySQL, SQL Server 2008 (SSRS & SSIS), PL/SQL, Visual Studio 2000/2005
Java Developer
Confidential
Responsibilities:
- Developed the application using Struts Framework that leverages classical Model View Controller (MVC) architecture.
- Designed the user interfaces using JSPs, developed custom tags, and used JSTL Taglib.
- Developed various java business classes for handling different functions.
- Developed controller classes using Struts and the Tiles API
- Involved in documentation and use-case design using UML modeling, including development of class diagrams, sequence diagrams, and use-case transaction diagrams.
- Participated in design and code reviews
- Developed the user interface using AJAX in JSP and performed client-side validation
- Developed JUnit test cases for all developed modules. Used SVN as the version control system
Environment: Java, J2EE, JSP, Struts 1.x, JNDI, DB2, HTML, XML, DOM, SAX, ANT, AJAX, Rational Rose, Eclipse Indigo 3.5, SOAP, Apache Tomcat, Oracle 10g, LOG4J, SVN.