Hadoop And Spark Developer Resume
Charlotte, NC
SUMMARY:
- 6+ years of relevant IT industry experience in the design, development and implementation of Big Data solutions using Hadoop, MapReduce, Pig, Hive, Sqoop, NiFi, Oozie, Zookeeper and Flume with CDH4 and CDH5 distributions
- Replaced existing jobs with Spark data transformations for efficient data processing and performance
- Developed Hive UDFs (User Defined Functions) to preprocess the data for analysis.
- Wrote Hive Queries, Pig Scripts for data analysis to meet the requirements
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting
- Experience developing RDDs, DataFrames and SQL queries in Spark SQL
- Worked with different Hadoop distributions, including Cloudera and Hortonworks.
- Good knowledge of Spark and Spark SQL using Scala.
- Hands-on experience with Spark batch processing; built a POC on a Kafka-Spark Streaming pipeline
- Experienced in using IDEs and tools like Eclipse, NetBeans, GitHub, Maven and IntelliJ
- Monitored and managed Hadoop cluster using the Cloudera Manager web-interface.
- Hands-on experience with NoSQL databases like HBase and MongoDB.
- Knowledge of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, ResourceManager, NameNode, DataNode and MapReduce concepts.
- Experience with Java development skills using J2EE, J2SE, Servlets, JSP, EJB, JDBC.
- Implementation knowledge of Enterprise/Web/Client Server using Java, J2EE.
- Experience in analyzing the data using HQL, Pig Latin, HBase and custom Map Reduce programs in Java
- Experienced with data formats such as JSON, Parquet, Avro, RC and ORC
- Utilized Flume to analyze log files and write into HDFS.
- Experience importing and exporting data between HDFS and RDBMS with Sqoop, and migrating data per client requirements
- Experience with data ingestion, storage, processing and analysis of big data.
- Worked on loading and transforming of large sets of structured, semi structured and unstructured data.
- Used the GitHub version control tool to push changes and pull updated code from the repository
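The MapReduce concepts listed above reduce to a map step that emits key-value pairs, a shuffle that groups pairs by key, and a reduce step that aggregates each group. A toy word count in plain Python, as an illustration only (a real job runs distributed over HDFS, not in-process like this):

```python
from collections import defaultdict

# Toy MapReduce word count: map emits (word, 1) pairs, shuffle
# groups them by key, reduce sums each group. Illustration only.
def map_phase(lines):
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

counts = reduce_phase(shuffle(map_phase(["big data big", "data pipeline"])))
# counts == {"big": 2, "data": 2, "pipeline": 1}
```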
TECHNICAL SKILLS:
Hadoop Ecosystem: HDFS, MapReduce, Hive, Sqoop, Pig, HBase, Flume, Zookeeper, Oozie, Impala, Kafka.
Spark Components: Spark Core, Spark SQL (RDD And Data Frames), Scala.
Programming Languages: SQL, Core Java, Scala, Shell, Pig Latin, Hive-QL
Databases: Oracle 12c, 11g, 10g, MySQL, SQL Server 2014/2012/2008 R2, MongoDB, HBase
Java/J2EE Technologies: Java, J2EE, JSP, JDBC, RESTFUL API
IDEs & Command Line Tools: Eclipse, NetBeans, Jenkins, IntelliJ, Maven
Web Technologies: HTML, XML, JavaScript, jQuery, AJAX, SOAP, and WSDL.
PROFESSIONAL EXPERIENCE:
Confidential, Charlotte, NC
Hadoop and Spark Developer
Responsibilities:
- Worked with a large-scale Hadoop YARN cluster for distributed data processing and analysis using connectors, Spark Core, Spark SQL, Sqoop, Pig, Hive and NoSQL databases
- Created partitions and buckets on Hive tables to improve process performance and speed up data access.
- Exported data from HDFS to the RDBMS with Sqoop export so the BI team could analyze it and generate reports.
- Performed coalesce and repartition on DataFrames to optimize Spark queries.
- Used Snappy compression to reduce storage in HDFS.
- Created RDDs, DataFrames and Datasets to process data with Spark.
- Used Impala for faster and better processing of data.
- Used Hive on Spark to run Hive queries on the Spark execution engine for better performance.
- Worked with ORC, Parquet, Avro and JSON file formats
- Used Hive joins across multiple tables to meet business requirements.
- Replaced HiveQL with Spark SQL for better performance.
- Worked with Agile Methodology.
- Hands on experience with Azure HDInsight/Spark/Event hub
- Worked with Kafka message queue for Spark streaming
- Enhanced Spark SQL functionality by writing UDFs in Scala.
- Migrated existing jobs to Spark for better performance and reduced execution time.
- Created Hive UDFs for Hive tables per business requirements.
- Delivered data processed with Hive and Spark to the BI team for analysis.
- Used Oozie for Automation of jobs in Hadoop.
- Created NiFi flows to trigger Spark jobs, with email notifications sent on failures.
- Worked with the NoSQL database HBase to load large volumes of semi-structured data from different sources.
- Monitored all NiFi flows, with notifications when no data moved through a flow for longer than a set interval.
- Exported the data using Sqoop to RDBMS and processed the data for ETL operations.
- Imported data from sources such as HDFS and HBase into Spark RDDs
- Developed a POC on streaming data using Apache Kafka and Spark Streaming
- Imported and exported data between HDFS, Hive and Pig using Sqoop.
Environment: Hadoop, HDFS, Oozie, Pig, Hive, Impala, HBase, Spark, IntelliJ, Linux, Java, Hortonworks, NiFi, MapReduce, Sqoop, Shell Scripting, Apache Kafka, Scala
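The bucketed Hive tables mentioned above distribute rows by hashing the clustering column modulo the bucket count. A simplified Python sketch of that assignment (Hive uses its own column hash; CRC32 stands in here as a deterministic substitute, and the key values are hypothetical):

```python
import zlib

# Simplified sketch of Hive-style bucket assignment: a row lands in
# bucket hash(key) % num_buckets. CRC32 is a deterministic stand-in
# for Hive's actual hash function.
def bucket_for(key: str, num_buckets: int) -> int:
    return zlib.crc32(key.encode("utf-8")) % num_buckets

rows = ["cust_001", "cust_002", "cust_003", "cust_004"]  # hypothetical keys
buckets = {key: bucket_for(key, 4) for key in rows}
# every key maps to a stable bucket id in [0, 4)
```

Because the assignment is deterministic, equal keys always land in the same bucket file, which is what makes bucketed joins and sampling efficient.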
Confidential, California
Hadoop Developer
Responsibilities:
- Developed automated scripts to install Hadoop clusters
- Worked on loading and transforming of large sets of structured, semi structured and unstructured data
- Wrote User Defined Functions in Hive to run aggregations over multiple rows of data loaded from HDFS.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Used Bzip2 compression to compress files before loading them into Hive.
- Used Zookeeper to manage Hadoop clusters and Oozie to schedule job workflows
- Used Sqoop to export the analyzed data to a relational database for the data analytics team.
- Implemented the workflows using Apache Oozie framework to automate tasks
- Used Sqoop to import and export data between HDFS and RDBMS.
- Used Impala for faster processing of Data.
- Developed Hive DDLs to create, alter and drop Hive tables
- Involved in loading data from edge node to HDFS using shell scripting.
- Responsible for building scalable distributed data pipelines using Hadoop.
- Assisted in exporting analyzed data to relational databases using Sqoop
- Implemented partitioning and bucketing on Hive tables to meet business requirements
- Loaded and transformed large structured and semi-structured datasets in file formats including Avro, Parquet and SequenceFile
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Oozie, Java, UNIX, Flume, Spark, Scala, HBase, Impala, Eclipse
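The Sqoop imports and exports described above are driven by a fixed CLI shape. A small Python helper that assembles the argument list for a Sqoop export (the JDBC URL, table, and directory below are placeholders, not values from an actual project):

```python
# Assemble a Sqoop export command line as an argument list.
# All connection details below are hypothetical placeholders.
def sqoop_export_cmd(jdbc_url, table, export_dir, num_mappers=4):
    return [
        "sqoop", "export",
        "--connect", jdbc_url,              # JDBC URL of the target RDBMS
        "--table", table,                   # destination table
        "--export-dir", export_dir,         # HDFS directory to read from
        "--num-mappers", str(num_mappers),  # parallel export tasks
    ]

cmd = sqoop_export_cmd("jdbc:mysql://dbhost/reports", "daily_metrics",
                       "/user/etl/output/daily")
```

In practice such a list would be passed to a scheduler step (e.g. an Oozie shell action) rather than built ad hoc.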
Confidential
SQL Developer/ETL Developer
Responsibilities:
- Developed ER diagrams/ Dimensional diagrams for business requirements using ERWIN.
- Involved in the data analysis and data discrepancy reduction for the source and target schemas.
- Developed SQL scripts with nested queries, joins, subqueries, and Insert, Update and Delete operations on MySQL database tables.
- Developed generic ETL framework components using Column Import and Column Export stages with RCP enabled
- Developed and implemented stored procedures, packages and triggers.
- Extracted data from XML files into SQL tables and produced reports from the data using SQL Server 2008.
- Built database connections using MS SQL Server.
- Wrote advanced SQL queries, procedures, cursors and triggers.
Environment: MySQL, SQL Server 2008 (SSRS & SSIS), PL/SQL, Visual Studio 2000/2005
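The nested queries, inserts, updates and deletes described above follow standard SQL. A self-contained sketch using Python's sqlite3 module in place of MySQL (the table and column names are invented for illustration):

```python
import sqlite3

# In-memory SQLite stands in for MySQL; schema is hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
cur.executemany("INSERT INTO orders (id, amount) VALUES (?, ?)",
                [(1, 50.0), (2, 120.0), (3, 80.0)])

# Nested subquery: orders strictly above the average amount
cur.execute("SELECT id FROM orders "
            "WHERE amount > (SELECT AVG(amount) FROM orders)")
above_avg = [row[0] for row in cur.fetchall()]  # above_avg == [2]

# Update one row, then delete all rows below a threshold
cur.execute("UPDATE orders SET amount = amount * 1.1 WHERE id = 1")
cur.execute("DELETE FROM orders WHERE amount < 60")
conn.commit()
```

The same statements run against MySQL with only minor dialect changes.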
Confidential
Java Developer
Responsibilities:
- Developed the application using Struts Framework that leverages classical Model View Controller (MVC) architecture.
- Designed the user interfaces using JSPs, developed custom tags, and used JSTL Taglib.
- Developed various java business classes for handling different functions.
- Developed controller classes using Struts and the Tiles API
- Involved in documentation and use case design using UML modeling, including development of Class diagrams, Sequence diagrams, and Use Case diagrams.
- Participated in design and code reviews
- Developed the user interface using AJAX in JSP and performed client-side validation
- Developed JUnit test cases for all developed modules; used SVN as version control
Environment: Java, J2EE, JSP, Struts 1.x, JNDI, DB2, HTML, XML, DOM, SAX, ANT, AJAX, Rational Rose, Eclipse Indigo 3.5, SOAP, Apache Tomcat, Oracle 10g, LOG4J, SVN.
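The Struts work above follows the classic MVC split: the controller maps a request path to an action, the action updates the model and returns a logical view name (as with Struts forwards), and the view renders the result. A minimal sketch of that flow, in Python for brevity (the original work used Struts 1.x in Java; all names here are invented):

```python
# Minimal MVC dispatch sketch: controller -> action -> view.
# Hypothetical names; illustrates the pattern, not Struts itself.
model = {"users": ["alice"]}           # toy model with made-up data

def add_user_action(params):
    model["users"].append(params["name"])
    return "user_list"                 # logical view name, like a Struts forward

def user_list_view():
    return "Users: " + ", ".join(model["users"])

actions = {"/addUser": add_user_action}
views = {"user_list": user_list_view}

def controller(path, params):
    view_name = actions[path](params)  # dispatch request to its action
    return views[view_name]()          # forward to the mapped view

page = controller("/addUser", {"name": "bob"})
# page == "Users: alice, bob"
```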