Hadoop And Spark Developer Resume
Charlotte, NC
SUMMARY:
- 6+ years of relevant IT industry experience in the design, development and implementation of Big Data solutions using Hadoop, MapReduce, Pig, Hive, Sqoop, NiFi, Oozie, Zookeeper and Flume with CDH4 and CDH5 distributions
- Replaced existing jobs with Spark data transformations for efficient data processing and performance
- Developed Hive UDFs (User Defined Functions) to preprocess the data for analysis.
- Wrote Hive Queries, Pig Scripts for data analysis to meet the requirements
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting
- Experience developing RDDs, DataFrames and SQL queries in Spark SQL
- Worked with different Hadoop distributions, including Cloudera and Hortonworks.
- Good knowledge of Spark and Spark SQL using Scala.
- Hands-on experience with Spark batch processing; built a POC on a Kafka-Spark Streaming pipeline
- Experienced in using IDEs and tools like Eclipse, NetBeans, GitHub, Maven and IntelliJ
- Monitored and managed Hadoop cluster using the Cloudera Manager web-interface.
- Hands-on experience with NoSQL databases like HBase and MongoDB.
- Knowledge of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, ResourceManager, NameNode, DataNode and MapReduce concepts.
- Experience with Java development skills using J2EE, J2SE, Servlets, JSP, EJB, JDBC.
- Implementation knowledge of Enterprise/Web/Client Server using Java, J2EE.
- Experience in analyzing the data using HQL, Pig Latin, HBase and custom Map Reduce programs in Java
- Experienced with data formats such as JSON, Parquet, Avro, RC and ORC
- Utilized Flume to analyze log files and write into HDFS.
- Experience importing and exporting data between HDFS and RDBMS with Sqoop, and migrating data per client requirements
- Experience with data ingestion, storage, processing and analysis of big data.
- Worked on loading and transforming of large sets of structured, semi structured and unstructured data.
- Used the GitHub version control tool to push changes and pull updated code from the repository
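The MapReduce concepts listed above reduce to a map step that emits key-value pairs, a shuffle that groups pairs by key, and a reduce step that aggregates each group. A toy word count in plain Python, as an illustration only (a real job runs distributed over HDFS, not in-process like this):

```python
from collections import defaultdict

# Toy MapReduce word count: map emits (word, 1) pairs, shuffle
# groups them by key, reduce sums each group. Illustration only.
def map_phase(lines):
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

counts = reduce_phase(shuffle(map_phase(["big data big", "data pipeline"])))
# counts == {"big": 2, "data": 2, "pipeline": 1}
```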
TECHNICAL SKILLS:
Hadoop Ecosystem: HDFS, MapReduce, Hive, Sqoop, Pig, HBase, Flume, Zookeeper, Oozie, Impala, Kafka.
Spark Components: Spark Core, Spark SQL (RDD And Data Frames), Scala.
Programming Languages: SQL, Core Java, Scala, Shell, Pig Latin, Hive-QL
Databases: Oracle 12c, 11g, 10g, MySQL, SQL Server 2014/2012/2008 R2, MongoDB, HBase
Java/J2EE Technologies: Java, J2EE, JSP, JDBC, RESTFUL API
IDEs & Command Line Tools: Eclipse, NetBeans, Jenkins, IntelliJ, Maven
Web Technologies: HTML, XML, JavaScript, jQuery, AJAX, SOAP, and WSDL.
PROFESSIONAL EXPERIENCE:
Confidential, Charlotte, NC
Hadoop and Spark Developer
Responsibilities:
- Worked with a large-scale Hadoop YARN cluster for distributed data processing and analysis using connectors, Spark Core, Spark SQL, Sqoop, Pig, Hive and NoSQL databases
- Created partitions and buckets on Hive tables to improve process performance and speed up data access.
- Exported data from HDFS to the RDBMS with Sqoop export so the BI team could analyze it and generate reports.
- Performed coalesce and repartition on DataFrames to optimize Spark queries.
- Used Snappy compression to reduce storage in HDFS.
- Created RDDs, DataFrames and Datasets to process data with Spark.
- Used Impala for faster and better processing of data.
- Used Hive on Spark to run Hive queries on the Spark execution engine for better performance.
- Worked with ORC, Parquet, Avro and JSON file formats
- Used Hive joins across multiple tables to meet business requirements.
- Replaced HiveQL with Spark SQL for better performance.
- Worked with Agile Methodology.
- Hands on experience with Azure HDInsight/Spark/Event hub
- Worked with Kafka message queue for Spark streaming
- Enhanced Spark SQL functionality by writing UDFs in Scala.
- Migrated existing jobs to Spark for better performance and reduced execution time.
- Created Hive UDFs for Hive tables per business requirements.
- Delivered data processed with Hive and Spark to the BI team for analysis.
- Used Oozie for Automation of jobs in Hadoop.
- Created NiFi flows to trigger Spark jobs, with email notifications sent on failures.
- Worked with the NoSQL database HBase to load large volumes of semi-structured data from different sources.
- Monitored all NiFi flows, with notifications when no data moved through a flow for longer than a set interval.
- Exported the data using Sqoop to RDBMS and processed the data for ETL operations.
- Imported data from sources such as HDFS and HBase into Spark RDDs
- Developed a POC on streaming data using Apache Kafka and Spark Streaming
- Imported and exported data between HDFS, Hive and Pig using Sqoop.
Environment: Hadoop, HDFS, Oozie, Pig, Hive, Impala, HBase, Spark, IntelliJ, Linux, Java, Hortonworks, NiFi, MapReduce, Sqoop, Shell Scripting, Apache Kafka, Scala
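The bucketed Hive tables mentioned above distribute rows by hashing the clustering column modulo the bucket count. A simplified Python sketch of that assignment (Hive uses its own column hash; CRC32 stands in here as a deterministic substitute, and the key values are hypothetical):

```python
import zlib

# Simplified sketch of Hive-style bucket assignment: a row lands in
# bucket hash(key) % num_buckets. CRC32 is a deterministic stand-in
# for Hive's actual hash function.
def bucket_for(key: str, num_buckets: int) -> int:
    return zlib.crc32(key.encode("utf-8")) % num_buckets

rows = ["cust_001", "cust_002", "cust_003", "cust_004"]  # hypothetical keys
buckets = {key: bucket_for(key, 4) for key in rows}
# every key maps to a stable bucket id in [0, 4)
```

Because the assignment is deterministic, equal keys always land in the same bucket file, which is what makes bucketed joins and sampling efficient.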
Confidential, California
Hadoop Developer
Responsibilities:
- Developed automated scripts to install Hadoop clusters
- Worked on loading and transforming of large sets of structured, semi structured and unstructured data
- Wrote User Defined Functions in Hive to run aggregations over multiple rows of data loaded from HDFS.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Used Bzip2 compression to compress files before loading them into Hive.
- Used Zookeeper to manage Hadoop clusters and Oozie to schedule job workflows
- Used Sqoop to export the analyzed data to a relational database for the data analytics team.
- Implemented the workflows using Apache Oozie framework to automate tasks
- Used Sqoop to import and export data between HDFS and RDBMS.
- Used Impala for faster processing of Data.
- Developed Hive DDLs to create, alter and drop Hive tables
- Involved in loading data from edge node to HDFS using shell scripting.
- Responsible for building scalable distributed data pipelines using Hadoop.
- Assisted in exporting analyzed data to relational databases using Sqoop
- Implemented partitioning and bucketing on Hive tables to meet business requirements
- Loaded and transformed large structured and semi-structured datasets in file formats including Avro, Parquet and SequenceFile
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Oozie, Java, UNIX, Flume, Spark, Scala, HBase, Impala, Eclipse
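The Sqoop imports and exports described above are driven by a fixed CLI shape. A small Python helper that assembles the argument list for a Sqoop export (the JDBC URL, table, and directory below are placeholders, not values from an actual project):

```python
# Assemble a Sqoop export command line as an argument list.
# All connection details below are hypothetical placeholders.
def sqoop_export_cmd(jdbc_url, table, export_dir, num_mappers=4):
    return [
        "sqoop", "export",
        "--connect", jdbc_url,              # JDBC URL of the target RDBMS
        "--table", table,                   # destination table
        "--export-dir", export_dir,         # HDFS directory to read from
        "--num-mappers", str(num_mappers),  # parallel export tasks
    ]

cmd = sqoop_export_cmd("jdbc:mysql://dbhost/reports", "daily_metrics",
                       "/user/etl/output/daily")
```

In practice such a list would be passed to a scheduler step (e.g. an Oozie shell action) rather than built ad hoc.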
Confidential
SQL Developer/ETL Developer
Responsibilities:
- Developed ER diagrams/ Dimensional diagrams for business requirements using ERWIN.
- Involved in the data analysis and data discrepancy reduction for the source and target schemas.
- Developed SQL scripts with nested queries, joins, subqueries, and Insert, Update and Delete operations on MySQL database tables.
- Developed generic ETL framework components using Column Import and Column Export stages with RCP enabled
- Developed and implemented stored procedures, packages and triggers.
- Extracted data from XML files into SQL tables and produced reports from the data using SQL Server 2008.
- Built database connections using MS SQL Server.
- Wrote advanced SQL queries, procedures, cursors and triggers.
Environment: MySQL, SQL Server 2008 (SSRS & SSIS), PL/SQL, Visual Studio 2000/2005
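The nested queries, inserts, updates and deletes described above follow standard SQL. A self-contained sketch using Python's sqlite3 module in place of MySQL (the table and column names are invented for illustration):

```python
import sqlite3

# In-memory SQLite stands in for MySQL; schema is hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
cur.executemany("INSERT INTO orders (id, amount) VALUES (?, ?)",
                [(1, 50.0), (2, 120.0), (3, 80.0)])

# Nested subquery: orders strictly above the average amount
cur.execute("SELECT id FROM orders "
            "WHERE amount > (SELECT AVG(amount) FROM orders)")
above_avg = [row[0] for row in cur.fetchall()]  # above_avg == [2]

# Update one row, then delete all rows below a threshold
cur.execute("UPDATE orders SET amount = amount * 1.1 WHERE id = 1")
cur.execute("DELETE FROM orders WHERE amount < 60")
conn.commit()
```

The same statements run against MySQL with only minor dialect changes.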
Confidential
Java Developer
Responsibilities:
- Developed the application using Struts Framework that leverages classical Model View Controller (MVC) architecture.
- Designed the user interfaces using JSPs, developed custom tags, and used JSTL Taglib.
- Developed various java business classes for handling different functions.
- Developed controller classes using Struts and the Tiles API
- Involved in documentation and use case design using UML modeling, including development of Class diagrams, Sequence diagrams, and Use Case diagrams.
- Participated in design and code reviews
- Developed the user interface using AJAX in JSP and performed client-side validation
- Developed JUnit test cases for all developed modules; used SVN as version control
Environment: Java, J2EE, JSP, Struts 1.x, JNDI, DB2, HTML, XML, DOM, SAX, ANT, AJAX, Rational Rose, Eclipse Indigo 3.5, SOAP, Apache Tomcat, Oracle 10g, LOG4J, SVN.
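The Struts work above follows the classic MVC split: the controller maps a request path to an action, the action updates the model and returns a logical view name (as with Struts forwards), and the view renders the result. A minimal sketch of that flow, in Python for brevity (the original work used Struts 1.x in Java; all names here are invented):

```python
# Minimal MVC dispatch sketch: controller -> action -> view.
# Hypothetical names; illustrates the pattern, not Struts itself.
model = {"users": ["alice"]}           # toy model with made-up data

def add_user_action(params):
    model["users"].append(params["name"])
    return "user_list"                 # logical view name, like a Struts forward

def user_list_view():
    return "Users: " + ", ".join(model["users"])

actions = {"/addUser": add_user_action}
views = {"user_list": user_list_view}

def controller(path, params):
    view_name = actions[path](params)  # dispatch request to its action
    return views[view_name]()          # forward to the mapped view

page = controller("/addUser", {"name": "bob"})
# page == "Users: alice, bob"
```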