Sr. Spark/Hadoop Developer Resume
Charlotte, NC
SUMMARY:
- 7+ years of IT experience, including 4 years of Hadoop/Big Data experience and 3 years of Java programming, covering the entire Software Development Life Cycle: design, development, implementation, testing, and maintenance of various web-based applications using Java and J2EE technologies.
- Experience in working with the Cloudera, Hortonworks, and Amazon EMR Hadoop distributions.
- Experience in dealing with large data sets and making performance improvements.
- Experience in implementing Spark integrated with the Hadoop ecosystem.
- Experience in using Spark RDDs for parallel processing of datasets in HDFS, MySQL, and other data sources.
- Experience in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Experience in using different build tools like SBT and Maven.
- Implemented Spark Streaming for fast data processing.
- Experience in designing and developing Applications in Spark using Scala.
- Skilled in integrating Kafka with Spark Streaming for high-speed data processing.
- Experience in managing and scheduling Spark jobs on a Hadoop cluster using Oozie.
- Experience in data cleansing using Spark Map and Filter Functions.
- Implemented a POC to migrate MapReduce programs into Spark RDD transformations and actions to improve performance.
- Experience in developing and debugging Hive queries.
- Experience in performing read and write operations on HDFS filesystem.
- Experience in working with EC2 (Elastic Compute Cloud) cluster instances, setting up data buckets on S3 (Simple Storage Service), and setting up EMR (Elastic MapReduce).
- Good experience in importing and exporting data to and from Hive and HDFS with Sqoop.
- Experience in creating Hive Tables and loading the data from different file formats.
- Experience in processing data using HiveQL for data analytics.
- Extended Hive core functionality by writing UDFs for data analysis.
- Implemented partitioning, dynamic partitioning, and bucketing in Hive.
- Worked on Tableau with Hive using JDBC/ODBC drivers.
- Experience in dealing with different file formats such as SequenceFile, Avro, and Parquet.
- Good knowledge of NoSQL databases such as HBase and MongoDB.
- Experience in working with the Tableau visualization tool.
- Experience in using the Producer and Consumer APIs of Apache Kafka.
- Experience in creating and driving large-scale ETL pipelines.
- Extensively used Apache Flume to collect the logs and error messages across the cluster.
- Proficient with version control tools such as GitHub and SVN.
- Worked with MySQL, Oracle 11g, and MariaDB databases.
- Strong knowledge of UNIX/Linux commands.
- Strong knowledge of the Python scripting language.
- Worked with Talend to import and export data between RDBMS and Hadoop.
- Adequate knowledge of Scrum, Agile and Waterfall methodologies.
TECHNICAL SKILLS:
Big Data Technologies: Apache Hadoop, MapReduce, Hive, Pig, HBase, Sqoop, Spark, HDFS, Apache Kafka, Apache Flume, Apache Oozie, Apache ZooKeeper, Cassandra.
Hadoop Distributions: Cloudera, Hortonworks.
Programming Languages: Scala, Python, Java.
Shell Scripting: Shell Script.
Build Tools: Maven, SBT.
Version Control Tools: Git, SVN.
Cloud: AWS, Azure.
Databases: MySQL, Oracle 10g/11g/12c, MariaDB.
NoSQL Databases: HBase, Cassandra.
Operating Systems: Windows 7/10, Linux (CentOS, Red Hat, Ubuntu), macOS.
Development Tools: IntelliJ IDEA, Eclipse, NetBeans.
WORK EXPERIENCE:
Sr. Spark/Hadoop Developer
Confidential - Charlotte, NC
Responsibilities:
- Worked on the Cloudera distribution (CDH 5.13).
- Involved in ingesting weblog data into HDFS using Kafka.
- Processed JSON data with Spark SQL.
- Cleansed the data to get the desired format.
- Involved in writing Spark SQL DataFrames to Parquet files.
- Involved in tuning Spark jobs for optimal efficiency.
- Wrote Scala functions, procedures, constructors, and traits.
- Created Hive tables to load the transformed data.
- Implemented partitioning and bucketing in Hive for easy data classification.
- Involved in analyzing data by writing HiveQL queries for faster data processing.
- Involved in working with Sqoop to load data into RDBMS.
- Created a data pipeline using Oozie that runs on a daily basis.
- Involved in persisting metadata into HDFS for further data processing.
- Loaded data from Linux filesystems to HDFS and vice versa.
- Involved in creating tables, partitioning and bucketing tables, and creating UDFs, along with fine-tuning in Hive.
- Loaded the cleaned data into Hive tables and performed analysis based on the requirements.
- Responsible for performing sort, join, aggregation, filter, and other transformations on the datasets.
- Utilized Agile and Scrum methodologies to help manage and organize a team of developers, with regular code review sessions.
Environment: HDFS, Apache Spark, Apache Hive, Scala, Oozie, Flume, Kafka, Agile Methodology, Cloudera, Cassandra.
Spark/Hadoop Developer
Confidential - Plano, TX
Responsibilities:
- Worked on the Hortonworks HDP Enterprise distribution.
- Worked on large sets of structured and semi-structured data.
- Involved in copying large volumes of data from Amazon S3 buckets to HDFS using Flume.
- Used Spark SQL with Scala to create DataFrames and perform transformations on them.
- Involved in working with Avro files using Spark SQL.
- Wrote UDFs in Spark SQL using Scala.
- Performed data aggregation operations using Spark SQL queries.
- Configured Spark Streaming to receive data from Kafka and store the streamed data in HDFS using Scala.
- Implemented Hive partitioning and bucketing for data analytics.
- Worked on performance tuning in Hive.
- Extensively used the Maven build tool to build the project code.
- Used Git as the version control system.
- Involved in working with Sqoop to export data from Hive to S3 buckets.
- Used Oozie and Oozie coordinators to deploy end-to-end data processing pipelines and schedule the workflows.
Environment: Apache Spark, Apache Flume, Amazon S3, Apache Sqoop, Apache Oozie, Apache Kafka, Hive, Apache.
Hadoop Developer
Confidential - San Francisco, CA
Responsibilities:
- Used Flume as a data pipeline system to ingest the unstructured events from various web servers to HDFS.
- Worked on altering the unstructured events from web servers on the fly using various Flume interceptors.
- Wrote various Spark transformations using Scala to perform data cleansing, validation, and summarization activities on user behavioral data.
- Parsed the unstructured data into a semi-structured format by writing complex algorithms in Spark.
- Developed a generic parser to transform any format of unstructured data into a consistent data model.
- Configured Flume with Spark Streaming to transfer data from web servers into HDFS at regular intervals for processing.
- Implemented persistence of frequently used transformed DataFrames for faster processing.
- Built Hive tables on the transformed data and used different SerDes to store the data in HDFS in different formats.
- Loaded the transformed data into the Hive tables and performed analysis based on the requirements.
- Implemented partitioning on the Hive data to improve processing performance.
- Analyzed the data by running Hive queries (HiveQL) to study customer behavior.
- Created Pig Latin scripts to sort, group, join, and filter the data.
- Worked on various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Exported the analyzed data to relational databases using Sqoop for further visualization and report generation for the BI team.
- Implemented a custom workflow to automate the jobs on a daily basis.
- Created custom workflows to automate Sqoop jobs weekly and monthly.
Environment: HDFS, Hive, Sqoop, Flume, Spark, Scala, MapReduce, Oracle 11g, YARN, UNIX Shell Scripting, Agile Methodology, Cloudera.
Java Developer
Confidential
Responsibilities:
- Involved in various phases of Software Development Life Cycle (SDLC) such as requirements gathering, modeling, analysis, design and development.
- Used Struts tag libraries in the JSP pages.
- Worked with JDBC and Hibernate.
- Used SVN for version control.
- Developed Web Services using XML messages that use SOAP.
- Configured Development Environment using Tomcat and Apache Web Server.
- Developed and maintained complex outbound notification applications that run on custom architectures, using diverse technologies including Core Java, J2EE, SOAP, XML, JMS, JBoss, and Web Services.
- Worked with Complex SQL queries, Functions and Stored Procedures.
- Developed Test Scripts using JUnit and JMockit.
- Worked with Ant and Maven to develop build scripts.
- Worked with Hibernate and JDBC to handle data needs.
Environment: Java, J2EE, XML, Oracle 11g, MySQL, Apache Tomcat.
