Hadoop and Spark Developer Resume
Fremont, CA
SUMMARY:
- 6+ years of relevant experience in the IT industry in the design, development, and implementation of Big Data solutions with Hadoop, MapReduce, Pig, Hive, Sqoop, NiFi, Oozie, ZooKeeper, and Flume on CDH4 and CDH5 distributions
- Replaced existing jobs with Spark data transformations for more efficient data processing and better performance
- Developed Hive UDFs (User-Defined Functions) to preprocess data for analysis.
- Wrote Hive queries and Pig scripts for data analysis to meet requirements
- Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting
- Experienced in developing RDDs, DataFrames, and SQL queries in Spark SQL (see the sketch after this summary)
- Worked with different Hadoop distributions, including Cloudera and Hortonworks.
- Good knowledge of Spark and Spark SQL using Scala.
- Hands-on experience with Spark batch processing and a POC on a Kafka - Spark streaming process
- Experienced in using IDEs and tools such as Eclipse, NetBeans, GitHub, Maven, and IntelliJ
- Monitored and managed the Hadoop cluster using the Cloudera Manager web interface.
- Hands-on experience with NoSQL databases such as HBase and MongoDB.
- Knowledge of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, ResourceManager, NameNode, DataNode, and MapReduce concepts.
- Experience in Java development using J2EE, J2SE, Servlets, JSP, EJB, and JDBC.
- Implementation knowledge of enterprise, web, and client-server applications using Java and J2EE.
- Experience in analyzing data using HQL, Pig Latin, HBase, and custom MapReduce programs in Java
- Experienced with data formats such as JSON, Parquet, Avro, RC, and ORC
- Used Flume to collect log files and write them into HDFS.
- Experience in importing and exporting data between HDFS and RDBMS with Sqoop and migrating data according to client requirements
- Worked with YARN, Mesos, and Spark default schedulers.
- Experience in data ingestion, storage, and processing, and in analyzing big data.
- Worked on loading and transforming large sets of structured, semi-structured, and unstructured data.
- Used the GitHub version control tool to push and pull changes and keep code updated from the repository
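A minimal sketch of the RDD, DataFrame, and Spark SQL work summarized above. The HDFS path, file layout, and column names are hypothetical and used only for illustration:

```scala
import org.apache.spark.sql.SparkSession

object RddDataFrameSqlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("rdd-df-sql-sketch").getOrCreate()
    import spark.implicits._

    // Build an RDD from a tab-delimited text file in HDFS (hypothetical path and layout).
    val lines = spark.sparkContext.textFile("hdfs:///data/raw/clicks.txt")
    val pairs = lines.map(_.split("\t")).map(a => (a(0), a(1).toInt))

    // Convert the RDD to a DataFrame and register it as a temporary view.
    val clicks = pairs.toDF("user_id", "clicks")
    clicks.createOrReplaceTempView("clicks")

    // Query the same data through Spark SQL.
    spark.sql(
      "SELECT user_id, SUM(clicks) AS total_clicks FROM clicks GROUP BY user_id"
    ).show()

    spark.stop()
  }
}
```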
PROFESSIONAL EXPERIENCE:
Hadoop and Spark Developer
Confidential - Fremont, CA
Responsibilities:
- Worked with a large-scale Hadoop YARN cluster for distributed data processing and analysis using connectors, Spark Core, Spark SQL, Sqoop, Hive, and a NoSQL database
- Created partitions and buckets on Hive tables to improve processing performance and speed up data access (see the sketch after this list).
- Exported data from HDFS to the RDBMS with Sqoop export so the BI team could analyze it and generate reports.
- Worked with a 35-node cluster consisting of 2 NameNodes and 33 DataNodes, with each DataNode sized at 3 TB
- Worked with the Spark ecosystem, running Spark SQL queries on data formats such as text, CSV, and XML files
- Performed coalesce and repartition operations on DataFrames to control partition counts (see the sketch after this list).
- Used Snappy compression to save storage in HDFS.
- Worked with shell scripting to clean the base data.
- Applied Spark and Spark SQL with Scala.
- Created RDDs, DataFrames, and Datasets to process the data using Spark.
- Used Impala for faster processing of data.
- Integrated Spark with Hive to meet business requirements.
- Worked with ORC, Parquet, Avro, and JSON file formats
- Used Hive joins across multiple tables to meet business requirements.
- Replaced HiveQL queries with Spark SQL for better performance.
- Worked with Agile Methodology.
- Migrated all existing jobs to Spark for better performance and reduced execution time.
- Created Hive UDFs for Hive tables based on business requirements.
- After processing the data with Hive and Spark, delivered it to the BI team for analysis.
- Used Oozie for automation of tasks.
- Created NiFi flows to trigger Spark jobs, with email notifications sent for any failures.
- Created Pig scripts to transform HDFS data and loaded the data into Hive external tables.
- Worked with NoSQL databases such as HBase to load large volumes of semi-structured data coming from different sources.
- Monitored all NiFi flows to get notified whenever no data moved through a flow for longer than a specified time.
- Exported data to the RDBMS using Sqoop and processed the data for ETL operations.
- Imported data from different sources such as HDFS and HBase into Spark RDDs
- Developed a POC on streaming data using Apache Kafka and Spark Streaming (see the streaming sketch after this list)
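A minimal sketch of the Hive partitioning/bucketing, coalesce/repartition, and Snappy-compressed writes referenced in this list. The database, table, and column names, the HDFS paths, and the partition/bucket counts are hypothetical assumptions:

```scala
import org.apache.spark.sql.SparkSession

object HiveSparkSketch {
  def main(args: Array[String]): Unit = {
    // Spark session with Hive integration, as described in the bullets above.
    val spark = SparkSession.builder()
      .appName("hive-spark-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Partitioned and bucketed Hive table (hypothetical schema and names).
    spark.sql(
      """CREATE TABLE IF NOT EXISTS sales.orders_part (
        |  order_id BIGINT,
        |  customer_id STRING,
        |  order_total DOUBLE
        |)
        |PARTITIONED BY (order_date STRING)
        |CLUSTERED BY (customer_id) INTO 32 BUCKETS
        |STORED AS ORC""".stripMargin)

    // Read raw CSV from HDFS (hypothetical path), then control partition counts.
    val raw = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/raw/orders/*.csv")

    val repartitioned = raw.repartition(200)       // widen parallelism for heavy transforms
    val compacted     = repartitioned.coalesce(20) // shrink partition count before writing

    // Write Parquet with Snappy compression to save HDFS storage.
    compacted.write
      .mode("overwrite")
      .option("compression", "snappy")
      .parquet("hdfs:///data/curated/orders_parquet")

    spark.stop()
  }
}
```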
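A minimal sketch of the Kafka - Spark streaming POC mentioned above, written with Structured Streaming; the broker address, topic, and HDFS paths are assumed names, and the spark-sql-kafka connector is expected on the classpath:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object KafkaStreamingPoc {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-spark-streaming-poc")
      .getOrCreate()

    // Read a Kafka topic as a stream (hypothetical broker and topic names).
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "events")
      .load()
      .select(col("value").cast("string").as("payload"))

    // Land the streaming payloads in HDFS as Parquet with checkpointing.
    val query = events.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/streaming/events")
      .option("checkpointLocation", "hdfs:///checkpoints/events")
      .start()

    query.awaitTermination()
  }
}
```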
Environment: Hadoop, HDFS, Oozie, Hive, Impala, HBase, Spark, IntelliJ, Linux, Java, Hortonworks, NiFi, MapReduce, Sqoop, Shell Scripting, Apache Kafka
Hadoop Developer
Confidential - Fremont, CA
Responsibilities:
- Developed automated scripts to install Hadoop clusters
- Worked on loading and transforming large sets of structured, semi-structured, and unstructured data
- Worked on user-defined functions in Hive to load data from HDFS and run aggregation functions over multiple rows (see the UDF sketch after this list).
- Worked with a 25-node cluster consisting of 2 NameNodes and 23 DataNodes, with each DataNode sized at 3 TB.
- Moved log files generated from various sources to HDFS through Flume for further processing.
- Used the Bzip2 compression technique to compress files before loading them into Hive
- Performed loading and retrieval of unstructured data.
- Used ZooKeeper to manage Hadoop clusters and Oozie to schedule job workflows
- Used Sqoop to export the analyzed data to a relational database for analysis by the data analytics team.
- Implemented workflows using the Apache Oozie framework to automate tasks
- Used Sqoop to import and export data between HDFS and the RDBMS.
- Performed data analysis by running Hive queries.
- Used Impala for faster processing of data.
- Good experience in developing Hive DDL to create, alter, and drop Hive tables
- Involved in loading data from edge nodes to HDFS using shell scripting.
- Responsible for building scalable distributed data pipelines using Hadoop.
- Assisted in exporting analyzed data to relational databases using Sqoop
- Implemented partitioning and bucketing on Hive tables to meet business requirements
- Loaded and transformed large structured and semi-structured datasets in file formats including Avro, Parquet, and SequenceFile
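A minimal sketch of the kind of multi-row aggregation function described above. It is written as a Spark Aggregator registered for SQL use against Hive tables (Spark 3.x `functions.udaf`) rather than a native Hive UDAF, and the table and column names are hypothetical:

```scala
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
import org.apache.spark.sql.expressions.Aggregator
import org.apache.spark.sql.functions

// Running sum-and-count state for the average aggregation.
case class AvgState(sum: Double, count: Long)

// A typed aggregator that averages a numeric column across many rows.
object AverageAmount extends Aggregator[Double, AvgState, Double] {
  def zero: AvgState = AvgState(0.0, 0L)
  def reduce(b: AvgState, a: Double): AvgState = AvgState(b.sum + a, b.count + 1)
  def merge(b1: AvgState, b2: AvgState): AvgState =
    AvgState(b1.sum + b2.sum, b1.count + b2.count)
  def finish(r: AvgState): Double = if (r.count == 0) 0.0 else r.sum / r.count
  def bufferEncoder: Encoder[AvgState] = Encoders.product[AvgState]
  def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}

object UdafDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-udaf-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Register the aggregator so it can be called from SQL over Hive tables.
    spark.udf.register("avg_amount", functions.udaf(AverageAmount))

    // Hypothetical Hive table and columns, used only for illustration.
    spark.sql(
      """SELECT customer_id, avg_amount(order_total) AS avg_order
        |FROM sales.orders
        |GROUP BY customer_id""".stripMargin).show()

    spark.stop()
  }
}
```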
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Oozie, Java, UNIX, Flume, Spark, Scala, HBase, Impala, Eclipse.
SQL Developer
Confidential
Responsibilities:
- Developed ER diagrams and dimensional diagrams for business requirements using ERwin.
- Involved in data analysis and data discrepancy reduction for the source and target schemas.
- Developed SQL scripts to perform nested queries, joins, and subqueries and to insert, update, and delete data in MySQL database tables.
- Developed and implemented stored procedures, packages, and triggers.
- Extracted data from XML files into SQL tables and produced reports from the data files using SQL Server 2008.
- Built the data connection to the database using MS SQL Server
- Wrote advanced SQL queries, procedures, cursors, and triggers.
Environment: MySQL, SQL Server 2008 (SSRS & SSIS), PL/SQL, Visual Studio 2000/2005
Java Developer
Confidential
Responsibilities:
- Developed the application using Struts Framework that leverages classical Model View Controller (MVC) architecture.
- Designed the user interfaces using JSPs, developed custom tags, and used JSTL Taglib.
- Developed various java business classes for handling different functions.
- Developed controller classes using Struts and the Tiles API
- Involved in documentation and use-case design using UML modeling, including development of class diagrams, sequence diagrams, and use-case transaction diagrams.
- Participated in design and code reviews
- Developed the user interface using AJAX in JSP and performed client-side validation
- Developed JUnit test cases for all developed modules. Used SVN as the version control system
Environment: Java, J2EE, JSP, Struts 1.x, JNDI, DB2, HTML, XML, DOM, SAX, ANT, AJAX, Rational Rose, Eclipse Indigo 3.5, SOAP, Apache Tomcat, Oracle 10g, LOG4J, SVN.