Spark / Hadoop Developer Resume

Dallas, TX

PROFESSIONAL SUMMARY:

  • Over 3 years of professional IT experience with a strong emphasis on development and testing of software applications.
  • Around 2 years of experience with the Hadoop Distributed File System (HDFS), Impala, Sqoop, Hive, HBase, Spark, Hue, the MapReduce framework, Kafka, YARN, Flume, Oozie, ZooKeeper and Pig.
  • Hands-on experience with core Hadoop ecosystem components such as JobTracker, TaskTracker, NameNode, DataNode, ResourceManager and ApplicationMaster.
  • Good knowledge of AWS infrastructure services: Amazon Simple Storage Service (Amazon S3), Amazon EMR and Amazon Elastic Compute Cloud (Amazon EC2).
  • Experience in working with Amazon EMR, Cloudera (CDH3 & CDH4) and Hortonworks Hadoop Distributions.
  • Involved in loading structured and semi-structured data into Spark clusters using Spark SQL and the DataFrames API.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (a brief illustrative sketch follows this summary).
  • Experience in implementing real-time event processing and analytics using Spark Streaming with messaging systems such as Kafka.
  • Capable of creating real-time data streaming solutions and batch-style, large-scale distributed computing applications using Apache Spark, Spark Streaming, Kafka and Flume.
  • Experience in analyzing data using Spark SQL, HiveQL, Pig Latin, Spark/Scala and custom MapReduce programs in Java.
  • Have experience in Apache Spark, Spark Streaming, Spark SQL and NoSQL databases like HBase, Cassandra, and MongoDB.
  • Experience in creating DStreams from sources such as Flume and Kafka and applying Spark transformations and actions to them.
  • Experience in integrating Apache Kafka with Apache Storm and creating Storm data pipelines for real-time processing.
  • Performed operations on real-time data using Storm and Spark Streaming with sources such as Kafka and Flume.
  • Implemented Pig Latin scripts to process, analyze and manipulate data files to get required statistics.
  • Experienced with file formats such as Parquet, ORC, Avro, SequenceFile, CSV, XML, JSON and plain text.
  • Worked with big data Hadoop distributions: Cloudera, Hortonworks and Amazon EMR on AWS.
  • Developed MapReduce jobs using Java to process large data sets by fitting the problem into the MapReduce programming paradigm.
  • Implemented custom Kafka encoders for custom input formats to load data into Kafka partitions; streamed the data in real time using Spark with Kafka for faster processing.
  • Experience in developing data pipelines that use Kafka to store data in HDFS.
  • Good experience in designing and creating data ingestion pipelines using technologies such as Apache Storm and Kafka.
  • Used SBT to build Scala-based Spark projects and executed them using spark-submit.
  • Experience working with data extraction, transformation and loading in Hive, Pig and HBase.
  • Orchestrated various Sqoop queries, Pig scripts and Hive queries using Oozie workflows and sub-workflows.
  • Responsible for handling different data formats like Avro, Parquet and ORC formats.
  • Experience in performance tuning and monitoring of Hadoop clusters, gathering and analyzing metrics from the existing infrastructure using Cloudera Manager.
  • Knowledge of job workflow scheduling and monitoring tools such as Oozie (for Hive and Pig jobs) and ZooKeeper (for HBase coordination).
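
The bullets above on converting Hive/SQL queries into Spark transformations and loading data through the DataFrames API refer to the pattern sketched below. This is a minimal, illustrative sketch only, assuming Spark 1.6-era APIs (HiveContext) as shipped with the CDH/HDP distributions mentioned above; the paths, table names and columns are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveToSparkSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hive-to-spark-sketch"))
    val hiveContext = new HiveContext(sc)
    import hiveContext.implicits._

    // Load semi-structured JSON and columnar Parquet data into DataFrames
    // (paths are placeholders).
    val events   = hiveContext.read.json("hdfs:///data/raw/events")
    val profiles = hiveContext.read.parquet("hdfs:///data/curated/profiles")

    // Equivalent of a Hive query expressed with the DataFrame API:
    //   SELECT p.region, COUNT(*) AS cnt
    //   FROM events e JOIN profiles p ON e.user_id = p.user_id
    //   GROUP BY p.region
    val regionCounts = events
      .join(profiles, "user_id")
      .groupBy($"region")
      .count()
      .withColumnRenamed("count", "cnt")

    // Persist the result as a Parquet-backed Hive table.
    regionCounts.write
      .format("parquet")
      .mode("overwrite")
      .saveAsTable("region_counts")

    sc.stop()
  }
}
```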

TECHNICAL SKILLS:

Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, YARN, Kafka, Flume, Sqoop, Impala, Oozie, Zookeeper, Spark, Ambari, Mahout, MongoDB, Cassandra, Avro, Storm & Parquet.

Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapR and Apache

Languages: Java, Python, SQL, HTML, DHTML, Scala, JavaScript, XML and C/C++

NoSQL Databases: Cassandra, MongoDB and HBase

Java Technologies: Servlets, JavaBeans, JSP, JDBC and Struts

Web Design Tools: HTML, DHTML, AJAX, JavaScript, jQuery, CSS, AngularJS, ExtJS and JSON

Development / Build Tools: Eclipse, Ant, Maven, IntelliJ, JUnit and Log4j

Frameworks: Struts, Spring and Hibernate

App/Web servers: WebSphere, WebLogic, JBoss and Tomcat

DB Languages: MySQL, PL/SQL, PostgreSQL and Oracle

RDBMS: Teradata, Oracle 9i/10g/11g, MS SQL Server, MySQL and DB2

Operating systems: UNIX, LINUX, Mac OS and Windows Variants

PROFESSIONAL EXPERIENCE:

Confidential, Dallas, TX

Spark / Hadoop Developer

Responsibilities:

  • Hands-on experience in Spark and Spark Streaming, creating RDDs and applying transformations and actions.
  • Developed Spark applications in Scala to ease the transition from existing Hadoop workloads.
  • Used Spark and Spark SQL with the Scala API to read Parquet data and create tables in Hive.
  • Performed advanced procedures such as text analytics and processing, using the in-memory computing capabilities of Spark in Scala.
  • Developed Spark code using Scala and Spark-SQL for faster processing and testing.
  • Used the Spark Streaming API to perform transformations and actions on the fly to build the common learner data model, which receives data from Kafka in near real time (see the sketch following this list).
  • Responsible for building data pipelines that load data from web servers and Teradata using Sqoop, Kafka and the Spark Streaming API.
  • Developed Kafka producers and consumers, Cassandra clients and Spark applications, along with components on HDFS and Hive.
  • Populated HDFS and HBase with large volumes of data using Apache Kafka.
  • Used Kafka to ingest data into the Spark engine.
  • Configured, deployed and maintained multi-node development and test Kafka clusters.
  • Managed and scheduled Spark jobs on a Hadoop cluster using Oozie.
  • Experienced with scripting languages such as Python and shell scripting.
  • Developed various Python scripts to find vulnerabilities in SQL queries through SQL injection testing, permission checks and performance analysis.
  • Experienced in Apache Spark for implementing advanced procedures such as text analytics and processing, using its in-memory computing capabilities in Scala.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDD, Scala and Python.
  • Worked on Spark SQL, created DataFrames by loading data from Hive tables, and prepared and stored data in AWS S3.
  • Experience with AWS services including IAM, Data Pipeline, EMR, S3, EC2, the AWS CLI and SNS.
  • Involved in creating custom UDFs for Pig and Hive to incorporate Python methods and functionality into Pig Latin and HiveQL.
  • Extensively worked on Text, ORC, Avro and Parquet file formats and compression techniques like Gzip and Zlib.
  • Implemented NiFi on Hortonworks (HDP 2.4) and recommended solutions for ingesting data from multiple data sources into HDFS and Hive using NiFi.
  • Developed various data loading strategies and performed various transformations for analyzing the datasets by using Hortonworks Distribution for Hadoop ecosystem.
  • Ingested data from RDBMSs, performed data transformations, and exported the transformed data to Cassandra per business requirements; accessed Cassandra through Java services.
  • Experience in NoSQL Column-Oriented Databases like Cassandra and its Integration with Hadoop cluster.
  • Created S3 buckets, managed their access policies, and used S3 and Glacier for storage and backup on AWS.
  • Performed AWS Cloud administration managing EC2 instances, S3, SES and SNS services.
  • Worked with Elasticsearch time-series data such as metrics and application events, an area where the extensive Beats ecosystem makes it easy to collect data from common applications.
  • Hands-on experience in developing applications with Java, J2EE (Servlets, JSP, EJB), SOAP web services, JNDI, JMS, JDBC, Hibernate, Struts, Spring, XML, HTML, XSD, XSLT, PL/SQL, Oracle 10g and MS SQL Server.
  • Delivered zero-defect code for three large projects involving changes to both the front end (Core Java, presentation services) and the back end (Oracle).
  • Along with the infrastructure team, involved in designing and developing a Kafka- and Storm-based data pipeline.
  • Used Oozie operational services for batch processing and for scheduling workflows dynamically.
  • Involved in loading and transforming large datasets between relational databases and HDFS using Sqoop imports and exports.
  • Handled importing data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and moved data between MySQL and HDFS using Sqoop.
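
A minimal sketch of the Kafka-to-Spark Streaming pattern referenced in the responsibilities above, assuming the Spark 1.6 direct-stream API from spark-streaming-kafka; the broker list, topic name, batch interval and output path are hypothetical placeholders rather than the actual project configuration.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaStreamingSketch {
  def main(args: Array[String]): Unit = {
    val sc  = new SparkContext(new SparkConf().setAppName("kafka-streaming-sketch"))
    val ssc = new StreamingContext(sc, Seconds(10)) // assumed 10-second batch interval

    // Broker list and topic are placeholders.
    val kafkaParams = Map[String, String]("metadata.broker.list" -> "broker1:9092,broker2:9092")
    val topics      = Set("learner-events")

    // Direct (receiver-less) stream of (key, value) pairs from Kafka.
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    // Typical on-the-fly transformations and actions: drop keys, filter out
    // empty records, then write each non-empty micro-batch to HDFS.
    stream
      .map { case (_, value) => value }
      .filter(_.nonEmpty)
      .foreachRDD { (rdd, time) =>
        if (!rdd.isEmpty()) {
          rdd.saveAsTextFile(s"hdfs:///data/streaming/learner/${time.milliseconds}")
        }
      }

    ssc.start()
    ssc.awaitTermination()
  }
}
```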

Environment: Hadoop, Hive, MapReduce, Sqoop, Kafka, Spark, YARN, Pig, Cassandra, Oozie, Shell Scripting, Scala, Maven, Java, JUnit, NiFi, MySQL, AWS, EMR, EC2, S3, Hortonworks.

Confidential, Hilmar, CA

Hadoop/Spark Developer

Responsibilities:

  • Optimized existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames and pair RDDs.
  • Developed Spark scripts using Java, Python and shell commands as per the requirements.
  • Involved in ingesting data received from various relational database providers onto HDFS for analysis and other big data operations.
  • Experienced in performance tuning of Spark applications: setting the right batch interval, choosing the correct level of parallelism and tuning memory.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Worked on Spark SQL and DataFrames for faster execution of Hive queries using the Spark sqlContext.
  • Performed analysis on implementing Spark using Scala.
  • Used DataFrames/Datasets to write SQL-style queries with Spark SQL against datasets residing on HDFS (see the sketch following this list).
  • Extracted files from MongoDB through Sqoop, placed them in HDFS and processed them.
  • Created and imported various collections and documents into MongoDB and performed operations such as query, project, aggregate, sort and limit.
  • Experience creating scripts for data modeling and for data import and export; extensive experience in deploying, managing and developing MongoDB clusters.
  • Experience in migrating HiveQL queries to Impala to minimize query response time.
  • Created Hive tables to import large data sets from various relational databases using Sqoop and exported the analyzed data back for visualization and report generation by the BI team.
  • Involved in creating shell scripts to simplify the execution of all other scripts (Pig, Hive, Sqoop, Impala and MapReduce) and to move data into and out of HDFS.
  • Implemented big data operations on the AWS cloud: created clusters using EMR and EC2 instances, used S3 buckets, ran analytical operations on Redshift, performed RDS and Lambda operations, and managed resources using IAM.
  • Utilized frameworks such as Struts, Spring, Hibernate and web services to develop backend code.
  • Used Hibernate reverse engineering tools to generate domain model classes, perform association mapping and inheritance mapping using annotations and XML, and implement second level caching using EHCache cache provider.
  • Developed Pig Scripts, Pig UDFs and Hive Scripts, Hive UDFs to analyze HDFS data.
  • Secured the cluster using Kerberos and kept it up and running at all times.
  • Implemented optimization and performance testing and tuning of Hive and Pig.
  • Developed a data pipeline using Kafka to store data into HDFS.
  • Worked on reading multiple data formats on HDFS using Scala.
  • Wrote shell and Python scripts for job automation.
  • Configured ZooKeeper to restart failed jobs without human intervention.
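
A minimal sketch of reading multiple data formats from HDFS and querying them with SQL-style Spark SQL, as referenced in the list above; it assumes Spark 1.x APIs (HiveContext, registerTempTable) consistent with the Cloudera environment below, and the paths, table names and columns are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HdfsFormatsSqlSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hdfs-formats-sql-sketch"))
    val hiveContext = new HiveContext(sc) // HiveContext also handles ORC in Spark 1.x

    // Read the same logical dataset stored in different formats on HDFS
    // (paths and schemas are placeholders).
    val ordersJson = hiveContext.read.json("hdfs:///data/orders/json")
    val ordersOrc  = hiveContext.read.format("orc").load("hdfs:///data/orders/orc")

    // Expose the DataFrames as temporary tables so they can be queried with SQL.
    ordersJson.registerTempTable("orders_json")
    ordersOrc.registerTempTable("orders_orc")

    // SQL-style query over data sitting on HDFS.
    val dailyTotals = hiveContext.sql(
      """SELECT order_date, SUM(amount) AS total_amount
        |FROM orders_orc
        |GROUP BY order_date""".stripMargin)

    dailyTotals.show(20)
    sc.stop()
  }
}
```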

Environment: Cloudera, HDFS, Hive, HiveQL scripts, MapReduce, Java, HBase, Pig, Sqoop, Kafka, Impala, Shell Scripts, Python Scripts, Spark, Scala, Oozie, ZooKeeper, Maven, JUnit, NiFi, AWS, EMR, EC2, S3.

Confidential

Jr. Java Developer

Responsibilities:

  • Actively involved from the start of the project, from requirements gathering through quality assurance testing.
  • Coded and developed a multi-tier architecture in Java, J2EE and Servlets.
  • Conducted analysis, requirements study and design according to various design patterns, developed against the use cases, and took ownership of features.
  • Used various design patterns such as Command, Abstract Factory, Factory, and Singleton to improve the system performance.
  • Analyzed critical coding defects and developed solutions.
  • Developed a configurable front end using Struts; also involved in component-based development of features reusable across modules.
  • Designed, developed and maintained the data layer using the Hibernate ORM framework.
  • Used the Hibernate framework for the persistence layer; involved in writing stored procedures for data retrieval, storage and updates in the Oracle database through Hibernate.
  • Developed and deployed archive files (EAR, WAR, JAR) using the Ant build tool.
  • Followed software development best practices for object-oriented design and methodologies throughout the object-oriented development cycle.
  • Responsible for developing SQL Queries required for the JDBC.
  • Designed the database, worked on DB2 and executed DDL and DML statements.
  • Active participation in architecture framework design and coding and test plan development.
  • Strictly followed the Waterfall development methodology for implementing projects.
  • Thoroughly documented the detailed process flow with UML diagrams and flow charts for distribution across various teams.
  • Involved in developing training presentations for developers (offshore support), QA and production support.
  • Presented the process logical and physical flow to various teams using PowerPoint and Visio diagrams.

Environment: Java, AJAX, Informatica PowerCenter 8.x/9.x, REST API, SOAP API, Apache, Oracle 10g/11g, SQL*Loader, MySQL Server, Flat Files, Targets, Aggregator, Router, Sequence Generator.
