Sr. Spark/Hadoop Developer Resume
Charlotte, NC
SUMMARY
- 7+ years of IT experience, including 4 years of Hadoop/Big Data experience and 3 years of Java programming, covering the entire Software Development Life Cycle: design, development, implementation, testing, and maintenance of various web-based applications using Java and J2EE technologies.
- Experience working with the Cloudera, Hortonworks, and Amazon EMR Hadoop distributions.
- Experience in dealing with large data sets and making performance improvements.
- Experience in implementing Spark integrated with the Hadoop ecosystem.
- Experience in using Spark RDDs for parallel processing of datasets in HDFS, MySQL, and other data sources.
- Experience in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Experience in using different build tools like SBT and Maven.
- Implemented Spark Streaming for fast data processing.
- Experience in designing and developing Applications in Spark using Scala.
- Skilled in integrating Kafka with Spark Streaming for high-speed data processing.
- Experience in managing and scheduling Spark jobs on a Hadoop cluster using Oozie.
- Experience in data cleansing using Spark map and filter functions.
- Implemented a POC to migrate MapReduce programs to Spark RDD transformations and actions to improve performance.
- Experience in developing and debugging Hive queries.
- Experience in performing read and write operations on HDFS filesystem.
- Experience working on EC2 (Elastic Compute Cloud) cluster instances, setting up data buckets on S3 (Simple Storage Service), and setting up EMR (Elastic MapReduce).
- Good experience in importing and exporting data to Hive and HDFS with Sqoop.
- Experience in creating Hive tables and loading data from different file formats.
- Experience in processing data using HiveQL for data analytics.
- Extended Hive core functionality by writing UDFs for data analysis.
- Implemented partitioning, dynamic partitioning, and bucketing in Hive.
- Worked on Tableau with Hive by using JDBC/ODBC drivers.
- Experience in dealing with the different file formats like Sequence files, Avro and Parquet.
- Good knowledge of NoSQL databases such as HBase and MongoDB.
- Experience in working with Tableau visualization tool.
- Experience in using the Producer and Consumer APIs of Apache Kafka.
- Experience in creating and driving large-scale ETL pipelines.
- Extensively used Apache Flume to collect the logs and error messages across the cluster.
- Proficient with version control tools such as GitHub and SVN.
- Worked with MySQL, Oracle 11g, and MariaDB databases.
- Strong knowledge of UNIX/Linux commands.
- Strong knowledge of the Python scripting language.
- Worked on Talend to import/export data from RDBMS to Hadoop.
- Adequate knowledge of Scrum, Agile and Waterfall methodologies.
TECHNICAL SKILLS
Big Data Technologies: Apache Hadoop, MapReduce, Hive, Pig, HBase, Sqoop, Spark, HDFS, Apache Kafka, Apache Flume, Apache Oozie, Apache ZooKeeper, Cassandra.
Hadoop Distributions: Cloudera, Hortonworks.
Programming Languages: Scala, Python, Java.
Shell Scripting: Shell Script.
Build Tools: Maven, SBT.
Version Control Tools: Git, SVN.
Cloud: AWS, Azure.
Databases: MySQL, Oracle 10g/11g/12c, MariaDB.
NOSQL Databases: HBase, Cassandra.
Operating Systems: Windows 7/10, Linux (CentOS, Red Hat, Ubuntu), macOS.
Development Tools: IntelliJ IDEA, Eclipse, NetBeans.
PROFESSIONAL EXPERIENCE
Sr. Spark/Hadoop Developer
Confidential - Charlotte, NC
Responsibilities:
- Worked on the Cloudera distribution (CDH 5.13).
- Involved in ingesting weblog data into HDFS using Kafka.
- Processed JSON data with Spark SQL.
- Cleansed the data to bring it into the desired format.
- Involved in writing Spark SQL DataFrames to Parquet files.
- Involved in tuning Spark jobs for optimal efficiency.
- Wrote Scala functions, procedures, constructors, and traits.
- Created Hive tables to load the transformed data.
- Applied partitioning and bucketing in Hive for easy data classification.
- Involved in Analyzing data by writing queries using HiveQL for faster data processing.
- Involved in working with Sqoop for loading the data into RDBMS.
- Created a data pipeline using Oozie that runs on a daily basis.
- Involved in persisting metadata into HDFS for further data processing.
- Loaded data from Linux file systems to HDFS and vice versa.
- Involved in creating tables, partitioning and bucketing tables, and creating UDFs, along with fine-tuning in Hive.
- Loaded the cleaned data into Hive tables and performed analysis based on the requirements.
- Responsible for performing sort, join, aggregation, filter, and other transformations on the datasets.
- Utilized Agile and Scrum Methodology to help manage and organize a team of developers with regular code review sessions.
Environment: HDFS, Apache Spark, Apache Hive, Scala, Oozie, Flume, Kafka, Agile Methodology, Cloudera, Cassandra.
Spark/Hadoop Developer
Confidential - Plano, TX
Responsibilities:
- Worked on the Hortonworks HDP Enterprise distribution.
- Worked on large sets of structured and semi-structured data.
- Involved in copying large data from Amazon S3 buckets to HDFS using Flume.
- Used Spark SQL with Scala to create DataFrames and performed transformations on them.
- Involved in working with Avro files using Spark SQL.
- Wrote UDFs in Spark SQL using Scala.
- Performed data aggregation operations using Spark SQL queries.
- Configured Spark Streaming to receive data from Kafka and store the streamed data to HDFS using Scala.
- Implemented Hive Partitioning and bucketing for data analytics.
- Worked on performance tuning in Hive.
- Extensively used the Maven build tool for project builds.
- Used Git as the version control system.
- Involved in working with Sqoop to export data from Hive to S3 buckets.
- Used Oozie and Oozie coordinators to deploy end-to-end data processing pipelines and schedule the workflows.