
Spark / Hadoop Developer Resume


NY

PROFESSIONAL SUMMARY:

  • Overall 9 years of professional IT experience with a strong emphasis on the development and testing of software applications.
  • Around 4+ years of experience in Hadoop Distributed File System (HDFS), Impala, Sqoop, Hive, HBase, Spark, Hue, the MapReduce framework, Kafka, YARN, Flume, Oozie, Zookeeper and Pig.
  • Hands-on experience with Hadoop ecosystem components such as Job Tracker, Task Tracker, Name Node, Data Node, Resource Manager and Application Manager.
  • Good knowledge of AWS infrastructure services: Amazon Simple Storage Service (Amazon S3), EMR and Amazon Elastic Compute Cloud (Amazon EC2).
  • Experience in working with Amazon EMR, Cloudera (CDH3 & CDH4) and Hortonworks Hadoop Distributions.
  • Involved in loading structured and semi-structured data into Spark clusters using the Spark SQL and DataFrames APIs.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (a small illustrative sketch follows this list).
  • Experience in implementing real-time event processing and analytics using Spark Streaming with messaging systems like Kafka.
  • Capable of creating real-time data streaming solutions and batch-style large-scale distributed computing applications using Apache Spark, Spark Streaming, Kafka and Flume.
  • Experience in analyzing data using Spark SQL, HiveQL, Pig Latin, Spark/Scala and custom MapReduce programs in Java.
  • Have experience in Apache Spark, Spark Streaming, Spark SQL and NoSQL databases like HBase, Cassandra, and MongoDB.
  • Experience in creating DStreams from sources like Flume and Kafka and performing different Spark transformations and actions on them.
  • Experience in integrating Apache Kafka with Apache Storm and created Storm data pipelines for real time processing.
  • Performed operations on real-time data using Storm, Spark Streaming from sources like Kafka, Flume.
  • Implemented Pig Latin scripts to process, analyze and manipulate data files to get required statistics.
  • Experienced with different file formats like Parquet, ORC, Avro, Sequence, CSV, XML, JSON, Text files.
  • Worked with Big Data Hadoop distributions: Cloudera, Hortonworks and Amazon AWS.
  • Developed MapReduce jobs using Java to process large data sets by fitting the problem into the MapReduce programming paradigm.
  • Implemented custom Kafka encoders for custom input formats to load data into Kafka partitions, and streamed data in real time using Spark with Kafka for faster processing.
  • Experience in developing data pipelines using Kafka to store data into HDFS.
  • Good experience in creating and designing data ingest pipelines using technologies such as Apache Storm and Kafka.
  • Used Scala SBT to build Scala-based Spark projects and executed them using spark-submit.
  • Experience working with data extraction, transformation and loading in Hive, Pig and HBase.
  • Orchestrated various Sqoop queries, Pig scripts, Hive queries using Oozie workflows and sub-workflows.
  • Responsible for handling different data formats like Avro, Parquet and ORC formats.
  • Experience in performance tuning and monitoring of the Hadoop cluster by gathering and analyzing the existing infrastructure using Cloudera Manager.
  • Knowledge of job workflow scheduling and monitoring tools like Oozie (Hive, Pig) and Zookeeper (HBase).
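
As referenced in the Hive/SQL-to-Spark bullet above, the following is a minimal sketch of that conversion pattern in Scala, assuming a Spark 2.x SparkSession with Hive support; the table and column names (sales, region, amount) are hypothetical placeholders, not from an actual project:

    // Minimal sketch: a Hive aggregation rewritten as Spark DataFrame transformations.
    // Table and column names are illustrative only.
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object HiveToSparkExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("HiveToSparkExample")
          .enableHiveSupport()
          .getOrCreate()

        // Equivalent of: SELECT region, SUM(amount) AS total_amount FROM sales GROUP BY region
        val totals = spark.table("sales")
          .groupBy("region")
          .agg(sum("amount").as("total_amount"))

        totals.write.mode("overwrite").saveAsTable("sales_totals_by_region")
        spark.stop()
      }
    }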

TECHNICAL SKILLS:

Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, YARN, Kafka, Flume, Sqoop, Impala, Oozie, Zookeeper, Spark, Ambari, Mahout, MongoDB, Cassandra, Avro, Storm & Parquet.

Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapR and Apache

Languages: Java, Python, SQL, HTML, DHTML, Scala, JavaScript, XML and C/C++

NoSQL Databases: Cassandra, MongoDB and HBase

Java Technologies: Servlets, JavaBeans, JSP, JDBC, and Struts

Web Design Tools: HTML, DHTML, AJAX, JavaScript, jQuery, CSS, AngularJS, ExtJS and JSON

Development / Build Tools: Eclipse, Ant, Maven, IntelliJ, JUnit and Log4j

Frameworks: Struts, Spring and Hibernate

App/Web servers: WebSphere, WebLogic, JBoss and Tomcat

DB Languages: MySQL, PL/SQL, PostgreSQL and Oracle

RDBMS: Teradata, Oracle 9i/10g/11g, MS SQL Server, MySQL and DB2

Operating systems: UNIX, LINUX, Mac OS and Windows Variants

PROFESSIONAL EXPERIENCE:

Confidential, NY

Spark / Hadoop Developer

Responsibilities:

  • Hands-on experience with Spark and Spark Streaming, creating RDDs and applying transformations and actions on them.
  • Developed Spark applications using Scala for easy Hadoop transitions.
  • Used Spark and Spark SQL to read Parquet data and create tables in Hive using the Scala API.
  • Performed advanced procedures like text analytics and processing using the in-memory computing capabilities of Spark with Scala.
  • Developed Spark code using Scala and Spark SQL for faster processing and testing.
  • Used Spark Streaming APIs to perform the necessary transformations and actions on the fly to build the common learner data model, which gets data from Kafka in near real time (see the sketch after this list).
  • Responsible for loading Data pipelines from web servers and Teradata using Sqoop with Kafka and Spark Streaming API.
  • Developed Kafka producers and consumers, Cassandra clients and Spark components on top of HDFS and Hive.
  • Populated HDFS and HBase with huge amounts of data using Apache Kafka.
  • Used Kafka to ingest data into Spark engine.
  • Configured, deployed and maintained multi-node Dev and Test Kafka clusters.
  • Managed and scheduled Spark jobs on a Hadoop cluster using Oozie.
  • Experienced with scripting languages such as Python and shell scripts.
  • Developed various Python scripts to find vulnerabilities with SQL Queries by doing SQL injection, permission checks and performance analysis.
  • Experienced in Apache Spark for implementing advanced procedures like text analytics and processing using the in-memory computing capabilities written in Scala.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDD, Scala and Python.
  • Worked on Spark SQL, created DataFrames by loading data from Hive tables, created prepared data and stored it in AWS S3.
  • Experience with AWS cloud services including IAM, Data Pipeline, EMR, S3, EC2, AWS CLI, SNS and other services.
  • Created custom UDFs for Pig and Hive to incorporate Python methods and functionality into Pig Latin and HiveQL.
  • Extensively worked on Text, ORC, Avro and Parquet file formats and compression techniques like Gzip and Zlib.
  • Implemented NiFi on Hortonworks (HDP 2.4) and recommended a solution to ingest data from multiple data sources into HDFS and Hive using NiFi.
  • Developed various data loading strategies and performed various transformations for analyzing the datasets by using Hortonworks Distribution for Hadoop ecosystem.
  • Ingested data from RDBMS, performed data transformations, and then exported the transformed data to Cassandra per business requirements; accessed Cassandra through Java services.
  • Experience in NoSQL Column-Oriented Databases like Cassandra and its Integration with Hadoop cluster.
  • Created S3 buckets, managed their policies, and utilized S3 and Glacier for storage and backup on AWS.
  • Performed AWS Cloud administration managing EC2 instances, S3, SES and SNS services.
  • Worked with Elasticsearch on time-series data such as metrics and application events, where the large Beats ecosystem makes it easy to collect data from common applications.
  • Hands-on experience in developing applications with Java, J2EE (Servlets, JSP, EJB), SOAP, Web Services, JNDI, JMS, JDBC2, Hibernate, Struts, Spring, XML, HTML, XSD, XSLT, PL/SQL, Oracle 10g and MS SQL Server RDBMS.
  • Delivered zero-defect code for three large projects which involved changes to both the front end (Core Java, Presentation services) and the back end (Oracle).
  • Along with the infrastructure team, involved in designing and developing a Kafka and Storm based data pipeline.
  • Used Oozie operational services for batch processing and scheduling workflows dynamically.
  • Involved in loading and transforming large datasets from relational databases into HDFS and vice versa using Sqoop imports and exports.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS and vice versa using Sqoop.
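
As referenced above, the following is a minimal sketch of the Kafka-to-Spark Streaming ingestion pattern, assuming the spark-streaming-kafka-0-10 integration; the broker address, topic name, consumer group and HDFS path are hypothetical:

    // Minimal sketch: reading a Kafka topic with Spark Streaming and persisting raw records to HDFS.
    // Broker, topic, group id and output path are illustrative only.
    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    object KafkaStreamingExample {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("KafkaStreamingExample")
        val ssc  = new StreamingContext(conf, Seconds(10))   // 10-second micro-batches

        val kafkaParams = Map[String, Object](
          "bootstrap.servers"  -> "broker1:9092",
          "key.deserializer"   -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id"           -> "learner-model-consumer",
          "auto.offset.reset"  -> "latest"
        )

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("learner-events"), kafkaParams))

        // Simple transformation/action: extract message values and save each non-empty batch to HDFS.
        stream.map(_.value()).foreachRDD { rdd =>
          if (!rdd.isEmpty()) rdd.saveAsTextFile("hdfs:///data/learner/raw/" + System.currentTimeMillis())
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }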

Environment: Hadoop, Hive, MapReduce, Sqoop, Kafka, Spark, YARN, Pig, Cassandra, Oozie, Shell Scripting, Scala, Maven, Java, JUnit, NiFi, MySQL, AWS, EMR, EC2, S3, Hortonworks.

Confidential, Hilmar, CA

Hadoop/Spark Developer

Responsibilities:

  • Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs.
  • Developed Spark scripts using Java and Python shell commands as per requirements.
  • Involved in ingesting data received from various relational database providers onto HDFS for analysis and other big data operations.
  • Experienced in performance tuning of Spark applications by setting the right batch interval, choosing the correct level of parallelism and tuning memory.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Worked on Spark SQL and DataFrames for faster execution of Hive queries using the Spark SQLContext.
  • Performed analysis on implementing Spark using Scala.
  • Used DataFrames/Datasets to write SQL-type queries using Spark SQL to work with datasets sitting on HDFS (see the sketch after this list).
  • Extracted files from MongoDB through Sqoop and placed in HDFS and processed.
  • Created and imported various collections, documents into MongoDB and performed various actions like query, project, aggregation, sort and limit.
  • Experience with creating script for data modeling and data import and export. Extensive experience in deploying, managing and developing MongoDB clusters.
  • Experience in migrating HiveQL into Impala to minimize query response time.
  • Creating Hive tables to import large data sets from various relational databases using Sqoop and export the analyzed data back for visualization and report generation by the BI team.
  • Involved in creating Shell scripts to simplify the execution of all other scripts (Pig, Hive, Sqoop, Impala and MapReduce) and move the data inside and outside of HDFS.
  • Implemented some of the big data operations on the AWS cloud: created clusters using EMR and EC2 instances, used S3 buckets, ran analytical operations on Redshift, performed RDS and Lambda operations, and managed resources using IAM.
  • Utilized frameworks such as Struts, Spring, Hibernate and web services to develop backend code.
  • Used Hibernate reverse engineering tools to generate domain model classes, perform association mapping and inheritance mapping using annotations and XML, and implement second level caching using EHCache cache provider.
  • Developed Pig Scripts, Pig UDFs and Hive Scripts, Hive UDFs to analyze HDFS data.
  • Maintained the cluster securely using Kerberos and kept the cluster up and running at all times.
  • Implemented optimization and performance testing and tuning of Hive and Pig.
  • Developed a data pipeline using Kafka to store data into HDFS.
  • Worked on reading multiple data formats on HDFS using Scala.
  • Wrote shell scripts and Python scripts for job automation.
  • Configured Zookeeper to restart the failed jobs without human intervention.
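
As referenced above, the following is a minimal sketch of querying a dataset sitting on HDFS with Spark SQL DataFrames; the HDFS path, column names and view name are hypothetical:

    // Minimal sketch: SQL-type query over a Parquet dataset on HDFS via a temporary view.
    // Path and schema are illustrative only.
    import org.apache.spark.sql.SparkSession

    object HdfsSqlExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("HdfsSqlExample").getOrCreate()

        // Load a Parquet dataset stored on HDFS into a DataFrame and register it as a view.
        val events = spark.read.parquet("hdfs:///data/events/")
        events.createOrReplaceTempView("events")

        // SQL-style aggregation over the registered view.
        val dailyCounts = spark.sql(
          """SELECT event_date, COUNT(*) AS cnt
            |FROM events
            |GROUP BY event_date
            |ORDER BY event_date""".stripMargin)

        dailyCounts.show(20)
        spark.stop()
      }
    }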

Environment: Cloudera, HDFS, Hive, HQL Scripts, MapReduce, Java, HBase, Pig, Sqoop, Kafka, Impala, Shell Scripts, Python Scripts, Spark, Scala, Oozie, Zookeeper, Maven, JUnit, NiFi, AWS, EMR, EC2, S3.

Confidential

Spark/Hadoop Developer

Responsibilities:

  • Experience in developing customized UDFs in Java to extend Hive and Pig Latin functionality.
  • Responsible for installing, configuring, supporting and managing Hadoop clusters.
  • Imported and exported data into HDFS from an Oracle 10.2 database and vice versa using Sqoop.
  • Installed and configured Pig and wrote Pig Latin scripts.
  • Designed and implemented Hive queries and functions for evaluation, filtering, loading and storing of data.
  • Created HBase tables and column families to store the user event data.
  • Written automated HBase test cases for data quality checks using HBase command line tools.
  • Developed a data pipeline using HBase, Spark and Hive to ingest, transform and analyze customer behavioral data.
  • Experience in collecting log data from different sources (web servers and social media) using Flume and storing it on HDFS to perform MapReduce jobs.
  • Handled importing of data from machine logs using Flume.
  • Created Hive Tables, loaded data from Teradata using Sqoop.
  • Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive and AWS cloud.
  • Configured, monitored, and optimized Flume agent to capture web logs from the VPN server to be put into Hadoop Data Lake.
  • Responsible for loading data from UNIX file systems to HDFS. Installed and configured Hive and wrote Pig/Hive UDFs.
  • Wrote, tested and implemented Teradata FastLoad, MultiLoad and BTEQ scripts, DML and DDL.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala and Python.
  • Exported the analyzed data to the relational databases using Sqoop to further visualize and generate reports for the BI team.
  • Developed ETL processes using Spark, Scala, Hive and HBase (a small illustrative sketch follows this list).
  • Collecting and aggregating large amounts of log data using Flume and staging data in HDFS for further analysis.
  • Wrote Java code to format XML documents; upload them to Solr server for indexing.
  • Used NoSQL technology (Amazon DynamoDB) to gather and track event-based metrics.
  • Maintained all the services in the Hadoop ecosystem using Zookeeper.
  • Worked on implementing the Spark framework.
  • Designed and implemented Spark jobs to support distributed data processing.
  • Expertise in extraction, transformation and loading of data from Oracle, DB2, SQL Server, MS Access, Excel, flat files and XML using Talend.
  • Experienced in loading and transforming large sets of structured, semi-structured and unstructured data.
  • Helped design scalable Big Data clusters and solutions.
  • Followed agile methodology for the entire project.
  • Experience in working with Hadoop clusters using Cloudera distributions.
  • Involved in Hadoop cluster tasks like adding and removing nodes without any effect on running jobs and data.
  • Developed workflows using Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig .
  • Developed interactive shell scripts for scheduling various data cleansing and data loading process.
  • Converted the existing relational database model to the Hadoop ecosystem.
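
As referenced above, the following is a minimal sketch of a Spark/Scala ETL step that parses Flume-staged web logs from HDFS and loads them into a Hive table; the path, field layout and table name are hypothetical:

    // Minimal sketch: parse tab-separated log lines staged by Flume and append them to a Hive table.
    // Path, fields and table name are illustrative only.
    import org.apache.spark.sql.SparkSession

    object LogEtlExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("LogEtlExample")
          .enableHiveSupport()
          .getOrCreate()
        import spark.implicits._

        // Raw lines: timestamp <TAB> userId <TAB> url
        val parsed = spark.sparkContext.textFile("hdfs:///flume/weblogs/")
          .map(_.split("\t"))
          .filter(_.length == 3)                   // drop malformed records
          .map(f => (f(0), f(1), f(2)))
          .toDF("event_ts", "user_id", "url")

        // Append the cleaned records to a Hive table (created on first run).
        parsed.write.mode("append").saveAsTable("web_events")
        spark.stop()
      }
    }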

Environment: Hadoop, HDFS, Pig, Hive, Flume, Sqoop, Oozie, Python, Shell Scripting, SQL, Talend, Spark, HBase, Elasticsearch, Linux (Ubuntu), Cloudera.

Confidential

Java Developer

Responsibilities:

  • Implemented applications using Java, J2EE, JSP, Servlets, JDBC, RAD, XML, HTML, XHTML, Hibernate, Struts, Spring and JavaScript on Windows environments.
  • Experienced in developing web-based applications using Python, Django, PHP, XML, CSS, HTML, JavaScript and jQuery.
  • Designed and implemented the reports modules of the application using Servlets, JSP and Ajax.
  • Developed XML Web Services using SOAP, WSDL, and UDDI.
  • Created the UI tool using Java, XML, XSLT, DHTML and JavaScript.
  • Experience with the SDLC and involvement in all of its phases.
  • Developed action Servlets and JSPs for presentation in Struts MVC framework.
  • Worked with Struts MVC objects like Action Servlet, Controllers, validators, Web Application Context, Handler Mapping, Message Resource Bundles, Form Controller, and JNDI for look-up for J2EE components.
  • Developed a PL/SQL view function in the Oracle 9i database for the get-available-date module.
  • Used Oracle SQL 4.0 as the database and wrote SQL queries in the DAO layer.
  • Experience in application development using Core Java, JDBC, JSP, Servlets, Spring, Hibernate, Web Services, SOAP and WSDL.
  • Used RESTFUL Services to interact with the Client by providing the RESTFUL URL mapping.
  • Used SVN and GitHub as version control tool.
  • Implemented Hibernate in the data access object layer to access and update information in the Oracle 10g Database.
  • Used JIRA to track test results and interacted with the developers to resolve issues.
  • Used XSLT to transform my XML data structure into HTML pages.
  • Deployed EJB components on Tomcat. Used the JDBC API for interaction with Oracle DB.
  • Wrote build and deployment scripts using shell, Perl and ANT scripts.
  • Extensively used Java multi-threading to implement batch jobs with JDK 1.5 features.

Environment: HTML, JavaScript, Ajax, Servlets, JSP, SOAP, SDLC, Java, Hibernate, Scrum, JIRA, GitHub, jQuery, CSS, XML, ANT, Tomcat Server, Jasper Reports.

Confidential

Jr. Java Developer

Responsibilities:

  • Actively involved from the start of the project, from requirements gathering through quality assurance testing.
  • Coded and Developed Multi-tier architecture in Java, J2EE, Servlets.
  • Conducted analysis, requirements study and design according to various design patterns, and developed according to the use cases, taking ownership of the features.
  • Used various design patterns such as Command, Abstract Factory, Factory, and Singleton to improve system performance.
  • Analyzing the critical coding defects and developing solutions.
  • Developed a configurable front end using Struts technology. Also involved in component-based development of certain features which were reusable across modules.
  • Designed, developed and maintained the data layer using the ORM framework called Hibernate.
  • Used the Hibernate framework for the persistence layer; involved in writing stored procedures for data retrieval, data storage and updates in the Oracle database using Hibernate.
  • Developed and deployed archive files (EAR, WAR, JAR) using the ANT build tool.
  • Used software development best practices for object-oriented design and methodologies throughout the object-oriented development cycle.
  • Responsible for developing SQL Queries required for the JDBC.
  • Designed the database, worked on DB2, and executed DDLs and DMLs.
  • Active participation in architecture framework design and coding and test plan development.
  • Strictly followed Waterfall development methodologies for implementing projects.
  • Thoroughly documented the detailed process flow with UML diagrams and flow charts for distribution across various teams.
  • Involved in developing presentations for developers (offshore support), QA and Production support.
  • Presented the process logical and physical flow to various teams using PowerPoint and Visio diagrams.

Environment: Java, Ajax, Informatica Power Center 8.x/9.x, REST API, SOAP API, Apache, Oracle 10/11g, SQL Loader, MYSQL SERVER, Flat Files, Targets, Aggregator, Router, Sequence Generator.
