
Spark/ Hadoop Engineer Resume

Nashville, TN

SUMMARY:

  • Spark/ Hadoop developer with 3+ years of experience in IT, including 2+ years on Hadoop, and strong programming experience in Scala and Java.
  • Experience with Big Data/ Hadoop Ecosystem: Spark, Hive, Sqoop, Kafka, Oozie, HBase, MapReduce, NIFI.
  • In-depth understanding of Spark Architecture and performed several batch and real-time data stream operations using Spark (Core, SQL, Streaming).
  • Experienced in handling large datasets using Spark in-memory capabilities, partitions, broadcast variables, accumulators, and effective and efficient joins (see the sketch after this list). Used Scala to develop Spark applications.
  • Tested and Optimized Spark applications.
  • Performed Hive operations on large datasets with proficiency in writing HiveQL queries using transactional and performance-efficient concepts: UPSERTs, partitioning, bucketing, windowing, etc.
  • Wrote custom UDFs, UDAFs, UDTFs, and generated optimized execution plans for faster performance.
  • Imported data from relational databases to HDFS/Hive, performed operations and exported the results back using Sqoop.
  • Wrote custom Kafka Consumer programs in Java and implemented a pipeline: Kafka, Spark, HDFS/S3.
  • Implemented NIFI data workflows in production, performing streaming and batch processing via micro-batches from multiple data sources; controlled and monitored the flows using the web UI.
  • Scheduled jobs and automated workflows using Oozie.
  • Experienced working on AWS using EMR; performed operations with EC2 instances, S3 buckets, RDS, Lambda, and Redshift for analytics.
  • Used HBase to work with large sets of structured, semi-structured and unstructured data coming from a variety of sources.
  • Used Tableau to generate reports and created visualization dashboards.
  • Experienced working with different file formats like Parquet, Avro, CSV, JSON, Text files.
  • Worked with Big Data Hadoop distributions: AWS EMR, Cloudera.
  • Developed MapReduce jobs using Java to process data sets by fitting the problem into the MapReduce programming paradigm.
  • Followed Agile-Scrum model and used DevOps tools like GitLab, JIRA, Confluence, Jenkins.
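
A minimal sketch, assuming hypothetical S3 paths and column names, of the in-memory join and partitioning style of Spark work described in the summary above:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object CustomerEnrichment {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("CustomerEnrichment").getOrCreate()
    import spark.implicits._

    // Large fact data, repartitioned on the join key to limit shuffle skew
    val transactions = spark.read.parquet("s3://bucket/transactions/")
      .repartition(200, $"customer_id")

    // Small dimension table broadcast to every executor to avoid a shuffle join
    val customers = spark.read.parquet("s3://bucket/customers/")
    val enriched  = transactions.join(broadcast(customers), Seq("customer_id"))

    enriched.write.mode("overwrite").partitionBy("txn_date").parquet("s3://bucket/enriched/")
    spark.stop()
  }
}
```

Broadcasting the small dimension table avoids shuffling the large fact table, which is usually the dominant cost of the join.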

TECHNICAL SKILLS:

Hadoop/ Big data: Spark, Hive, Sqoop, Kafka, YARN, NIFI, HBase, Oozie, MapReduce, Zookeeper

Programming: Scala, Java, SQL

Hadoop Distributions: Cloudera, Amazon EMR

Databases/Data Warehouses: Oracle, MySQL, HBase

Amazon Web Services: EMR, EC2, S3, Lambda, RDS, Redshift, IAM

Other tools & SDLC: Tableau, IntelliJ IDEA, Eclipse, SBT, Maven, Putty, JIRA, Confluence, Agile-Scrum

PROFESSIONAL EXPERIENCE:

Confidential, Nashville, TN

Spark/ Hadoop Engineer

Responsibilities:
  • Developed new Spark applications and modified existing ones in Scala to meet business needs, processing large datasets with DataFrames and RDDs and applying a range of transformations and actions.
  • Modified the existing identity scoring algorithm and JSON configuration files to suit our needs.
  • Developed Spark SQL applications to perform complex data operations on structured and semi-structured data stored as Parquet, JSON, and XML files in S3 buckets (see the sketch after this list).
  • Developed Scala scripts and UDFs using DataFrames/Datasets in Spark for aggregations, queries, and similarity matching across different types of datasets.
  • Tuned Spark application performance by setting the correct level of parallelism, tuning memory, and applying efficient join and caching strategies.
  • Implemented schema extraction for Parquet and Avro file formats when creating Hive tables.
  • Used Sqoop to transfer data from EMR to MySQL (S3 -> Sqoop -> Hive (EMR staging) -> Sqoop -> MySQL).
  • Performed Unit, Integration testing by mocking data.
  • Experienced working with different file formats like Parquet, Avro, JSON, XML and compression tools like Snappy for efficient storage, retrieval, and processing of files.
  • Involved in a POC to develop a pipeline that used Kafka to subscribe to messages from the relevant topics as clients made changes in the UI served by Apache Tomcat.
  • Performed UPSERTS to the data in the data lake (Linking/ Unlinking patient records).
  • Created Entity-Relationship diagrams for the relational database.
  • Experienced working on AWS using EMR; performed operations with EC2 instances, S3 storage, RDS, Lambda, and Redshift for analytics.
  • Involved in client meetings: understood business needs, gathered and analyzed functional requirements, participated in tool-selection discussions, and attended onshore/offshore meetings.
  • Experienced with Agile Scrum methodology, GitLab, IntelliJ IDEA, Confluence, JIRA, Jenkins for the project.
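
A minimal sketch of the Spark SQL work on S3-resident Parquet and JSON data referenced above; the bucket paths, view names, and columns (patient_id, source_system) are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object PatientMatch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("PatientMatch").getOrCreate()

    // Structured data in Parquet, semi-structured data in JSON, both in S3
    val records = spark.read.parquet("s3://bucket/records/")
    val updates = spark.read.json("s3://bucket/updates/")

    records.createOrReplaceTempView("records")
    updates.createOrReplaceTempView("updates")

    // Aggregate-style Spark SQL query joining the two sources
    val scored = spark.sql(
      """SELECT r.patient_id,
        |       u.source_system,
        |       count(*) AS match_count
        |FROM records r
        |JOIN updates u ON r.patient_id = u.patient_id
        |GROUP BY r.patient_id, u.source_system""".stripMargin)

    scored.write.mode("overwrite").parquet("s3://bucket/scored/")
    spark.stop()
  }
}
```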

Environment: Spark 2.2.0, Scala 2.11.8, Sqoop, Kafka, AWS (EMR, S3, RDS, Lambda, Redshift), IntelliJ IDEA, GitLab, Confluence, JIRA, Jenkins, Agile(Scrum).

Confidential, Dallas, TX

Spark/ Hadoop Developer

Responsibilities:
  • Developed Spark applications using Scala.
  • Used DataFrames/Datasets with Spark SQL to write SQL-style queries against the datasets.
  • Ran real-time streaming jobs with Spark Streaming, analyzing the incoming Kafka data over regular window intervals (see the sketch after this list).
  • Created a Kafka -> Spark -> HDFS data pipeline along with the team.
  • Collaborated with architects to design Spark equivalents of the existing MapReduce jobs and migrated them to Spark using Scala.
  • Tested and Optimized Spark applications.
  • Created Hive tables and had extensive experience with HiveQL.
  • Executed Hive queries on Parquet tables stored in Hive to perform data analysis to meet the business requirements.
  • Extended Hive functionality by writing custom UDFs, UDAFs, UDTFs to process large data.
  • Performed Hive UPSERTs, partitioning, bucketing, and windowing operations, writing efficient queries for faster data access.
  • Imported and exported data between relational database systems and HDFS/Hive using Sqoop.
  • Wrote custom Kafka consumer code and modified existing producer code in Java to push data to Spark-streaming jobs.
  • Scheduled jobs and automated workflows using Oozie.
  • Automated the movement of data using the NIFI dataflow framework, performing streaming and batch processing via micro-batches; controlled and monitored the data flow using the web UI.
  • Worked with the HBase database to perform operations on large sets of structured, semi-structured, and unstructured data coming from different data sources.
  • Exported analytical results to MS SQL Server and used Tableau to generate reports and visualization dashboards.
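
A minimal sketch of the Kafka -> Spark Streaming -> HDFS pipeline described above, using the spark-streaming-kafka-0-10 direct stream API; the broker address, topic name, group id, batch interval, and HDFS path are assumptions:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaToHdfs")
    val ssc  = new StreamingContext(conf, Seconds(60))   // micro-batch window

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "events-consumer",
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Persist each non-empty micro-batch to HDFS, keyed by batch time
    stream.map(_.value).foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty()) rdd.saveAsTextFile(s"hdfs:///data/raw/events/${time.milliseconds}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```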

Environment: Cloudera, Spark 2.0, Hive, Hadoop, Java, Scala, Kafka, Sqoop, MapReduce, Oozie, Zookeeper, Tableau, Agile, Eclipse.

Confidential

Hadoop/Java Developer

Responsibilities:
  • Created Hive tables, loaded data, executed HQL queries and developed MapReduce programs to perform analytical operations on data and to generate reports.
  • Created Hive internal and external tables, used MySQL to store table schemas. Wrote custom UDFs in Java.
  • Moved data between MySQL and HDFS using Sqoop.
  • Developed MapReduce jobs in Java for log analysis, analytics, and data cleaning.
  • Wrote complex MapReduce programs to perform operations by extracting, transforming, and aggregating to process terabytes of data.
  • Designed E-R diagrams to work with different tables.
  • Wrote SQL queries, stored procedures, PL/SQL, triggers, and views on top of Oracle.
  • Developed the application using Core Java, Multi-Threading, Collections, JMS, JSP, Servlet, Maven.
  • Developed a Java multi-threaded archival job using ExecutorService for thread pooling with Callable jobs and Future tasks (see the sketch after this list).
  • Redesigned and improved the tracking functionality using Java multi-threading with servlets, a concurrent queue, and worker threads.
  • Developed JUnit and mocking-based test code to test various modules.
  • Developed a RESTful web service to fetch database data for use in the UI.
  • Deployed the application on Apache Tomcat, applying OOP principles and design patterns.
  • Involved in the implementation of the Software development life cycle (SDLC) that includes Development, Testing, Implementation, and Maintenance Support.
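
Since the other sketches here use Scala, below is a Scala rendering of the ExecutorService-based archival pattern described above (the original work was in Java); the batch ids, record counts, and archive step are placeholders:

```scala
import java.util.concurrent.{Callable, Executors, TimeUnit}

object ArchivalJob {
  // Each Callable archives one batch and returns the number of records it moved
  // (the archive step itself is elided for illustration).
  class ArchiveTask(batchId: Int) extends Callable[Int] {
    override def call(): Int = {
      // ... move batch `batchId` to archive storage ...
      1000 // pretend each batch holds 1,000 records
    }
  }

  def main(args: Array[String]): Unit = {
    val pool    = Executors.newFixedThreadPool(8)                 // fixed-size thread pool
    val futures = (1 to 100).map(id => pool.submit(new ArchiveTask(id)))

    val archived = futures.map(_.get()).sum                       // Future.get blocks until each task finishes
    println(s"Archived $archived records")

    pool.shutdown()
    pool.awaitTermination(10, TimeUnit.MINUTES)
  }
}
```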

Environment: Java, Hive, Sqoop, MySQL, Multi-threading, JDK, JSP, JMS, Servlet, HTML, CSS, Eclipse, Tomcat, REST.
