Sr. Spark/Hadoop Developer Resume - Charlotte, NC

SUMMARY:

Over 7 years of IT experience, including 4 years of Hadoop/Big Data experience and 3 years of Java programming, covering the entire Software Development Life Cycle: design, development, implementation, testing, and maintenance of various web-based applications using Java and J2EE technologies.
Experience in working with the Cloudera, Hortonworks, and Amazon EMR Hadoop distributions.
Experience in dealing with large data sets and making performance improvements.
Experience in implementing Spark integrated with the Hadoop ecosystem.
Experience in using Spark RDDs for parallel processing of datasets in HDFS, MySQL, and other data sources.
Experience in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
Experience in using different build tools like SBT and Maven.
Implemented Spark Streaming for fast data processing.
Experience in designing and developing applications in Spark using Scala.
Skilled in integrating Kafka with Spark Streaming for high-speed data processing.
Experience in managing and scheduling Spark jobs on a Hadoop cluster using Oozie.
Experience in data cleansing using Spark map and filter functions.
Implemented a POC to migrate MapReduce programs to Spark RDD transformations and actions to improve performance.
Experience in developing and debugging Hive queries.
Experience in performing read and write operations on HDFS filesystem.
Experience in working on EC2 (Elastic Compute Cloud) cluster instances, setting up data buckets on S3 (Simple Storage Service), and setting up EMR (Elastic MapReduce).
Good experience in importing and exporting data to and from Hive and HDFS with Sqoop.
Experience in creating Hive tables and loading data from different file formats.
Experience in processing data using HiveQL for data analytics.
Extended Hive core functionality by writing UDFs for data analysis.
Implemented partitioning, dynamic partitioning, and bucketing in Hive.
Worked on Tableau with Hive by using JDBC/ODBC drivers.
Experience in dealing with different file formats such as Sequence files, Avro, and Parquet.
Good knowledge of NoSQL databases such as HBase and MongoDB.
Experience in working with Tableau visualization tool.
Experience in using the Producer and Consumer APIs of Apache Kafka.
Experience in creating and driving large-scale ETL pipelines.
Extensively used Apache Flume to collect the logs and error messages across the cluster.
Proficient in using version control tools such as GitHub and SVN.
Worked with MySQL, Oracle 11g, and MariaDB databases.
Strong knowledge of UNIX/Linux commands.
Strong knowledge of the Python scripting language.
Worked on Talend to import and export data between RDBMS and Hadoop.
Adequate knowledge of Scrum, Agile and Waterfall methodologies.

TECHNICAL SKILLS:

Big Data Technologies: Apache Hadoop, MapReduce, Hive, Pig, HBase, Sqoop, Spark, HDFS, Apache Kafka, Apache Flume, Apache Oozie, Apache ZooKeeper, Cassandra.

Hadoop Distributions: Cloudera, Hortonworks.

Programming Languages: Scala, Python, Java.

Shell Scripting: Shell Script.

Build Tools: Maven, SBT.

Version Control Tools: Git, SVN.

Cloud: AWS, Azure.

Databases: MySQL, Oracle 10g, 11g, 12c, MariaDB.

NoSQL Databases: HBase, Cassandra.

Operating Systems: Windows 7/10, Linux (CentOS, Red Hat, Ubuntu), Mac OS.

Development Tools: IntelliJ IDEA, Eclipse, NetBeans.

WORK EXPERIENCE:

Sr. Spark/Hadoop Developer

Confidential - Charlotte, NC

Responsibilities:

Worked on the Cloudera distribution, CDH version 5.13.
Involved in ingesting weblog data into HDFS using Kafka.
Processed JSON data with Spark SQL.
Cleansed the data to bring it into the desired format.
Involved in writing Spark SQL DataFrames to Parquet files.
Involved in tuning Spark jobs for optimal efficiency.
Wrote Scala functions, procedures, constructors, and traits.
Created Hive tables to load the transformed data.
Implemented partitioning and bucketing in Hive for easy data classification.
Involved in analyzing data by writing HiveQL queries for faster data processing.
Involved in working with Sqoop to load the data into RDBMS.
Created a data pipeline using Oozie that runs on a daily basis.
Involved in persisting metadata into HDFS for further data processing.
Loaded data from Linux filesystems to HDFS and vice versa.
Involved in creating tables, partitioning and bucketing tables, and creating UDFs, along with fine-tuning in Hive.
Loaded the cleaned data into the Hive tables and performed analysis based on the requirements.
Responsible for performing sort, join, aggregation, filter, and other transformations on the datasets.
Utilized Agile and Scrum Methodology to help manage and organize a team of developers with regular code review sessions.

Environment: HDFS, Apache Spark, Apache Hive, Scala, Oozie, Flume, Kafka, Agile Methodology, Cloudera, Cassandra.

Spark/Hadoop Developer

Confidential - Plano, TX

Responsibilities:

Worked on Hortonworks HDP Enterprise.
Worked on large sets of structured and semi-structured data.
Involved in copying large volumes of data from Amazon S3 buckets to HDFS using Flume.
Used Spark SQL with Scala to create DataFrames and perform transformations on them.
Involved in working with Avro files using Spark SQL.
Wrote UDFs in Spark SQL using Scala.
Performed data aggregation operations using Spark SQL queries.
Configured Spark Streaming to receive data from Kafka and store the streamed data in HDFS using Scala.
Implemented Hive partitioning and bucketing for data analytics.
Worked on performance tuning operations in Hive.
Extensively used the Maven build tool to build the project code.
Used Git as the version control system.
Involved in working with Sqoop to export the data from Hive to S3 buckets.
Used Oozie and Oozie coordinators to deploy end-to-end data processing pipelines and schedule the workflows.

Environment: Apache Spark, Apache Flume, Amazon S3, Apache Sqoop, Apache Oozie, Apache Kafka, Hive, Apache.

Hadoop Developer

Confidential - San Francisco, CA

Responsibilities:

Used Flume as a data pipeline system to ingest the unstructured events from various web servers to HDFS.
Worked on altering the unstructured events from web servers on the fly using various Flume interceptors.
Wrote various Spark transformations using Scala to perform data cleansing, validation, and summarization activities on user behavioral data.
Parsed the unstructured data into a semi-structured format by writing complex algorithms in Spark.
Developed a generic parser to transform any format of unstructured data into a consistent data model.
Configured Flume with Spark Streaming to transfer data from web servers into HDFS at regular intervals for processing.
Implemented the persistence of frequently used transformed data from DataFrames for faster processing.
Built Hive tables on the transformed data and used different SerDes to store the data in HDFS in different formats.
Loaded the transformed data into the Hive tables and performed analysis based on the requirements.
Implemented partitioning on the Hive data to improve data processing performance.
Analyzed the data by running Hive queries (HiveQL) to study customer behavior.
Created Pig Latin scripts to sort, group, join, and filter the data.
Worked on various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
Exported the analyzed data to the relational databases using Sqoop, to further visualize and generate reports for the BI team.
Implemented a custom workflow to automate the jobs on a daily basis.
Created custom workflows to automate Sqoop jobs weekly and monthly.

Environment: HDFS, Hive, Sqoop, Flume, Spark, Scala, MapReduce, Oracle 11g, YARN, UNIX Shell Scripting, Agile Methodology, Cloudera.

Java Developer

Confidential

Responsibilities:

Involved in various phases of Software Development Life Cycle (SDLC) such as requirements gathering, modeling, analysis, design and development.
Used Struts tag libraries in the JSP pages.
Worked with JDBC and Hibernate.
Used SVN as the version control system.
Developed Web Services using XML messages that use SOAP.
Configured Development Environment using Tomcat and Apache Web Server.
Developed and maintained complex outbound notification applications that run on custom architectures, using diverse technologies including Core Java, J2EE, SOAP, XML, JMS, JBoss, and Web Services.
Worked with Complex SQL queries, Functions and Stored Procedures.
Developed Test Scripts using JUnit and JMockit.
Worked with ANT and Maven to develop build scripts.
Worked with Hibernate, JDBC to handle data needs.

Environment: Java, J2EE, XML, Oracle 11g, MySQL, Apache Tomcat.
