Spark/Hadoop Engineer Resume
St Louis, MO
SUMMARY
- Spark/Hadoop developer with 3+ years of IT experience, including 2+ years on Hadoop, and strong programming experience in Scala and Java.
- Experience with the Big Data/Hadoop ecosystem: Spark, Hive, Sqoop, Kafka, Oozie, HBase, MapReduce, NiFi.
- In-depth understanding of Spark architecture; performed batch and real-time stream processing using Spark Core, Spark SQL, and Spark Streaming.
- Experienced in handling large datasets using Spark's in-memory capabilities, partitions, broadcast variables, accumulators, and efficient joins; used Scala to develop Spark applications (an illustrative sketch follows this summary).
- Tested and optimized Spark applications.
- Proficient in writing HiveQL for large datasets, using transactional and performance-efficient features such as UPSERTs, partitioning, bucketing, and windowing.
- Wrote custom UDFs, UDAFs, UDTFs, and generated optimized execution plans for faster performance.
- Imported data from relational databases to HDFS/Hive, performed operations and exported the results back using Sqoop.
- Wrote custom Kafka consumer programs in Java and implemented a Kafka -> Spark -> HDFS/S3 pipeline.
- Implemented NiFi data workflows in production for streaming and batch processing of micro-batches from multiple data sources; controlled and monitored flows through the NiFi web UI.
- Scheduled jobs and automated workflows using Oozie.
- Experienced with AWS using EMR; performed operations with EC2 instances, S3 buckets, RDS, Lambda, and Redshift for analytical workloads.
- Used HBase to work with large sets of structured, semi-structured and unstructured data coming from a variety of sources.
- Used Tableau to generate reports and created visualization dashboards.
- Experienced working with different file formats like Parquet, Avro, CSV, JSON, Text files.
- Worked with Big Data Hadoop distributions: AWS EMR, Cloudera.
- Developed MapReduce jobs in Java, modeling data-processing problems to fit the MapReduce programming paradigm.
- Followed Agile-Scrum model and used DevOps tools like GitLab, JIRA, Confluence, Jenkins.
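Illustrative sketch (not project code): a minimal Spark application in Scala showing the broadcast-join and accumulator techniques referenced in the summary; input paths and column names are hypothetical.

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.functions.broadcast

  object SummarySketch {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder().appName("broadcast-join-sketch").getOrCreate()

      // Hypothetical inputs: a large fact table and a small lookup table.
      val events = spark.read.parquet("hdfs:///data/events")
      val lookup = spark.read.parquet("hdfs:///data/country_codes")

      // Broadcasting the small table avoids shuffling the large one during the join.
      val enriched = events.join(broadcast(lookup), Seq("country_code"))

      // Accumulator counting malformed rows observed while filtering.
      val badRows = spark.sparkContext.longAccumulator("badRows")
      val cleaned = enriched.filter { row =>
        val ok = !row.isNullAt(row.fieldIndex("user_id"))
        if (!ok) badRows.add(1)
        ok
      }

      cleaned.write.mode("overwrite").parquet("hdfs:///data/enriched_events")
      spark.stop()
    }
  }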
TECHNICAL SKILLS
Hadoop/Big Data: Spark, Hive, Sqoop, Kafka, YARN, NiFi, HBase, Oozie, MapReduce, ZooKeeper
Programming: Scala, Java, SQL
Hadoop Distributions: Cloudera, Amazon EMR
Databases/Data Warehouses: Oracle, MySQL, HBase
Amazon Web Services: EMR, EC2, S3, Lambda, RDS, Redshift, IAM
Other tools & SDLC: Tableau, IntelliJ IDEA, Eclipse, SBT, Maven, PuTTY, JIRA, Confluence, Agile-Scrum
PROFESSIONAL EXPERIENCE
Confidential - St. Louis, MO
Spark/Hadoop Engineer
Responsibilities:
- Developed new Spark applications and modified existing ones in Scala to process large datasets through DataFrames and RDDs, applying a wide range of transformations and actions.
- Modified the existing identity-scoring algorithm and its JSON configuration files to suit business needs.
- Developed Spark SQL applications to perform complex operations on structured and semi-structured data stored as Parquet, JSON, and XML files in S3 buckets.
- Developed Scala scripts and UDFs using DataFrames/Datasets in Spark for aggregation, querying, and similarity matching across different types of datasets (see the sketch at the end of this section).
- Tuned the performance of Spark applications by setting an appropriate level of parallelism, tuning memory usage, and applying efficient join strategies.
- Implemented schema extraction for Parquet and Avro file formats when creating Hive tables.
- Used Sqoop to transfer data from EMR to MySQL (S3 -> Sqoop -> Hive (EMR staging)-> Sqoop -> MySQL).
- Performed unit and integration testing with mocked data.
- Experienced with file formats such as Parquet, Avro, JSON, and XML, and compression codecs such as Snappy for efficient storage, retrieval, and processing of files.
- Involved in a POC to develop a Kafka-based pipeline that subscribed to the relevant topics as clients made changes in the UI served by Apache Tomcat.
- Performed UPSERTs on data in the data lake (linking/unlinking patient records).
- Created Entity-Relationship diagrams for the relational database.
- Worked on AWS using EMR; performed operations with EC2 instances, S3 storage, RDS, Lambda, and Redshift for analytical workloads.
- Involved in client meetings, understanding business needs, gathering and analyzing functional requirements, tool selection discussions, attending on/off-shore meetings.
- Experienced with Agile Scrum methodology, GitLab, IntelliJ IDEA, Confluence, JIRA, Jenkins for the project.
Environment: Spark 2.2.0, Scala 2.11.8, Sqoop, Kafka, AWS (EMR, S3, RDS, Lambda, Redshift), IntelliJ IDEA, GitLab, Confluence, JIRA, Jenkins, Agile (Scrum).
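Illustrative sketch (not project code): the kind of Spark SQL work described in this role, reading Parquet and JSON from S3 and registering a Scala UDF for a similarity-style comparison; the bucket names, columns, and toy Jaccard scoring function are assumptions.

  import org.apache.spark.sql.SparkSession

  object S3SparkSqlSketch {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder().appName("s3-sql-sketch").getOrCreate()

      // Hypothetical S3 locations for structured and semi-structured inputs.
      val patients = spark.read.parquet("s3://example-bucket/curated/patients/")
      val updates  = spark.read.json("s3://example-bucket/raw/patient_updates/")
      patients.createOrReplaceTempView("patients")
      updates.createOrReplaceTempView("updates")

      // Toy similarity UDF (Jaccard over name tokens), standing in for the real scoring logic.
      spark.udf.register("name_sim", (a: String, b: String) => {
        val x = Option(a).getOrElse("").toLowerCase.split("\\s+").filter(_.nonEmpty).toSet
        val y = Option(b).getOrElse("").toLowerCase.split("\\s+").filter(_.nonEmpty).toSet
        if (x.isEmpty && y.isEmpty) 0.0
        else x.intersect(y).size.toDouble / x.union(y).size.toDouble
      })

      val candidates = spark.sql(
        """SELECT p.patient_id, u.update_id,
          |       name_sim(p.full_name, u.full_name) AS score
          |FROM patients p JOIN updates u ON p.zip = u.zip
          |WHERE name_sim(p.full_name, u.full_name) > 0.8""".stripMargin)

      candidates.write.mode("overwrite").parquet("s3://example-bucket/scored/candidates/")
      spark.stop()
    }
  }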
Confidential - Dallas, TX.
Spark/Hadoop Developer
Responsibilities:
- Developed Spark applications using Scala.
- Used DataFrames/Datasets with Spark SQL to write SQL-style queries against large datasets.
- Built real-time streaming jobs with Spark Streaming to analyze incoming Kafka data over regular window intervals.
- Created a Kafka -> Spark -> HDFS data pipeline along with the team (sketched after this section).
- Collaborated with architects to design Spark equivalents of existing MapReduce models and migrated them to Spark using Scala.
- Tested and optimized Spark applications.
- Created Hive tables and had extensive experience with HiveQL.
- Executed Hive queries on Parquet tables stored in Hive to perform data analysis to meet the business requirements.
- Extended Hive functionality by writing custom UDFs, UDAFs, and UDTFs to process large datasets (a sample UDF sketch appears after this section).
- Performed Hive UPSERTs, partitioning, bucketing, and windowing operations, and wrote efficient queries for faster data processing.
- Imported and exported data between relational database systems and HDFS/Hive using Sqoop.
- Wrote custom Kafka consumer code and modified existing producer code in Java to push data to Spark Streaming jobs.
- Scheduled jobs and automated workflows using Oozie.
- Automated data movement using the NiFi dataflow framework for streaming and batch processing via micro-batches; controlled and monitored data flows through the web UI.
- Worked with the HBase database to handle large sets of structured, semi-structured, and unstructured data coming from different data sources.
- Exported analytical results to MS SQL Server and used Tableau to generate reports and visualization dashboards.
Environment: Cloudera, Spark 2.0, Hive, Hadoop, Java, Scala, Kafka, Sqoop, MapReduce, Oozie, Zookeeper, Tableau, Agile, Eclipse.
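Illustrative sketch (not project code): a Kafka -> Spark Streaming -> HDFS micro-batch pipeline in Scala, following the standard spark-streaming-kafka-0-10 direct-stream pattern; the broker addresses, topic name, consumer group, and output path are hypothetical.

  import org.apache.kafka.common.serialization.StringDeserializer
  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}
  import org.apache.spark.streaming.kafka010.KafkaUtils
  import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
  import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

  object KafkaToHdfsSketch {
    def main(args: Array[String]): Unit = {
      val conf = new SparkConf().setAppName("kafka-to-hdfs-sketch")
      val ssc  = new StreamingContext(conf, Seconds(30))   // 30-second micro-batches

      // Hypothetical broker list, consumer group, and topic.
      val kafkaParams = Map[String, Object](
        "bootstrap.servers" -> "broker1:9092,broker2:9092",
        "key.deserializer" -> classOf[StringDeserializer],
        "value.deserializer" -> classOf[StringDeserializer],
        "group.id" -> "events-etl",
        "auto.offset.reset" -> "latest",
        "enable.auto.commit" -> (false: java.lang.Boolean)
      )
      val stream = KafkaUtils.createDirectStream[String, String](
        ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

      // Keep only the message payloads and write each non-empty micro-batch to HDFS.
      stream.map(_.value)
        .foreachRDD { (rdd, time) =>
          if (!rdd.isEmpty())
            rdd.saveAsTextFile(s"hdfs:///data/events/batch-${time.milliseconds}")
        }

      ssc.start()
      ssc.awaitTermination()
    }
  }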
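Illustrative sketch (not project code): a minimal custom Hive UDF written in Scala against the classic org.apache.hadoop.hive.ql.exec.UDF API; the package, class name, and normalization logic are hypothetical stand-ins for the project's actual UDFs.

  package com.example

  import org.apache.hadoop.hive.ql.exec.UDF
  import org.apache.hadoop.io.Text

  // After packaging into a jar, register in Hive with:
  //   ADD JAR hive-udfs.jar;
  //   CREATE TEMPORARY FUNCTION normalize_phone AS 'com.example.NormalizePhone';
  class NormalizePhone extends UDF {
    // Hive resolves evaluate() by reflection; this one keeps digits only.
    def evaluate(input: Text): Text = {
      if (input == null) null
      else new Text(input.toString.replaceAll("[^0-9]", ""))
    }
  }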
Confidential
Hadoop/Java Developer
Responsibilities:
- Created Hive tables, loaded data, executed HQL queries and developed MapReduce programs to perform analytical operations on data and to generate reports.
- Created Hive internal and external tables, using a MySQL-backed metastore to store table schemas; wrote custom UDFs in Java.
- Moved data between MySQL and HDFS using Sqoop.
- Developed MapReduce jobs in Java for log analysis, analytics, and data cleaning.
- Wrote complex MapReduce programs to extract, transform, and aggregate terabytes of data.
- Designed E-R diagrams to work with different tables.
- Wrote SQL queries, stored procedures, PL/SQL blocks, triggers, and views on top of Oracle.
- Developed the application using Core Java, Multi-Threading, Collections, JMS, JSP, Servlet, Maven.
- Developed a multi-threaded archival job in Java using ExecutorService for thread pooling, with Callable jobs and Future tasks (a sketch of this pattern follows this section).
- Redesigned and improved tracking functionality with Java multi-threading using servlets, concurrent queues, and worker threads.
- Developed JUnit and mocking-based tests for various modules.
- Developed RESTful web services to fetch database data for use in the UI.
- Deployed the application on Apache Tomcat, applying OOP principles and design patterns.
- Involved in the implementation of the Software development life cycle (SDLC) that includes Development, Testing, Implementation, and Maintenance Support.
Environment: Java, Hive, Sqoop, MySQL, Multi-threading, JDK, JSP, JMS, Servlet, HTML, CSS, Eclipse, Tomcat, REST.
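Illustrative sketch (not project code): the ExecutorService/Callable/Future archival pattern described above. The role used Java; the same pattern is shown here in Scala over the java.util.concurrent API, with hypothetical directory paths and a hypothetical ArchiveTask class.

  import java.nio.file.{Files, Path, Paths, StandardCopyOption}
  import java.util.concurrent.{Callable, Executors, Future, TimeUnit}
  import scala.collection.JavaConverters._

  object ArchivalJobSketch {
    // One Callable per file: move it into the archive directory and report the result.
    final class ArchiveTask(source: Path, archiveDir: Path) extends Callable[String] {
      override def call(): String = {
        val target = archiveDir.resolve(source.getFileName)
        Files.move(source, target, StandardCopyOption.REPLACE_EXISTING)
        s"archived ${source.getFileName}"
      }
    }

    def main(args: Array[String]): Unit = {
      val inputDir   = Paths.get("/data/outbox")   // hypothetical source directory
      val archiveDir = Paths.get("/data/archive")  // hypothetical archive directory
      Files.createDirectories(archiveDir)

      val pool  = Executors.newFixedThreadPool(4)  // fixed thread pool for parallel moves
      val files = Files.list(inputDir).iterator().asScala.filter(Files.isRegularFile(_)).toList

      // Submit one Callable per file and collect the resulting Futures.
      val futures: List[Future[String]] = files.map(f => pool.submit(new ArchiveTask(f, archiveDir)))

      // Block on each Future so failures surface before shutdown.
      futures.foreach(f => println(f.get()))

      pool.shutdown()
      pool.awaitTermination(1, TimeUnit.MINUTES)
    }
  }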