Spark / Hadoop Developer Resume
NY
PROFESSIONAL SUMMARY:
- Overall 9 years of professional IT experience with a strong emphasis on development and testing of software applications.
- Around 4+ years of experience in Hadoop Distributed File System (HDFS), Impala, Sqoop, Hive, HBase, Spark, Hue, the MapReduce framework, Kafka, YARN, Flume, Oozie, ZooKeeper and Pig.
- Hands-on experience with various Hadoop ecosystem components such as Job Tracker, Task Tracker, Name Node, Data Node, Resource Manager and Application Manager.
- Good knowledge of AWS infrastructure services: Amazon Simple Storage Service (Amazon S3), EMR and Amazon Elastic Compute Cloud (Amazon EC2).
- Experience in working with Amazon EMR, Cloudera (CDH3 & CDH4) and Hortonworks Hadoop Distributions.
- Involved in loading structured and semi-structured data into Spark clusters using Spark SQL and the DataFrames API.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (a brief sketch follows this summary).
- Experience in implementing real-time event processing and analytics using stream processing frameworks like Spark Streaming.
- Capable of creating real-time data streaming solutions and batch-style, large-scale distributed computing applications using Apache Spark, Spark Streaming, Kafka and Flume.
- Experience in analyzing data using Spark SQL, HiveQL, Pig Latin, Spark/Scala and custom MapReduce programs in Java.
- Have experience in Apache Spark, Spark Streaming, Spark SQL and NoSQL databases like HBase, Cassandra, and MongoDB.
- Experience in creating DStreams from sources like Flume and Kafka and performing different Spark transformations and actions on them.
- Experience in integrating Apache Kafka with Apache Storm and created Storm data pipelines for real time processing.
- Performed operations on real-time data using Storm, Spark Streaming from sources like Kafka, Flume.
- Implemented Pig Latin scripts to process, analyze and manipulate data files to get required statistics.
- Experienced with different file formats like Parquet, ORC, Avro, Sequence, CSV, XML, JSON, Text files.
- Worked with Big Data Hadoop distributions: Cloudera, Hortonworks and Amazon AWS.
- Developed MapReduce jobs using Java to process large data sets by fitting the problem into the MapReduce programming paradigm.
- Implemented custom Kafka encoders for custom input formats to load data into Kafka partitions, and streamed data in real time using Spark with Kafka for faster processing.
- Experience in developing a data pipeline using Kafka to store data into HDFS.
- Good experience in creating and designing data ingest pipelines using technologies such as Apache Storm and Kafka.
- Used SBT to build Scala-based Spark projects and executed them using spark-submit.
- Experience working with data extraction, transformation and loading in Hive, Pig and HBase.
- Orchestrated various Sqoop queries, Pig scripts, Hive queries using Oozie workflows and sub-workflows.
- Responsible for handling different data formats like Avro, Parquet and ORC formats.
- Experience in performance tuning and in monitoring the Hadoop cluster by gathering and analyzing metrics from the existing infrastructure using Cloudera Manager.
- Knowledge of job workflow scheduling and monitoring tools like Oozie (Hive, Pig) and ZooKeeper (HBase).
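To illustrate the Hive-to-Spark conversion pattern mentioned above, here is a minimal sketch in Scala; the table and column names (web_events, user_id, event_ts) are hypothetical and not from a specific engagement:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object HiveToSparkSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("HiveToSparkSketch")
          .enableHiveSupport()   // read tables registered in the Hive metastore
          .getOrCreate()

        // Equivalent of: SELECT user_id, COUNT(*) FROM web_events
        //                WHERE event_ts IS NOT NULL GROUP BY user_id
        val counts = spark.table("web_events")
          .filter(col("event_ts").isNotNull)
          .groupBy("user_id")
          .agg(count(lit(1)).as("event_count"))

        counts.write.mode("overwrite").saveAsTable("web_event_counts")
        spark.stop()
      }
    }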
TECHNICAL SKILLS:
Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, YARN, Kafka, Flume, Sqoop, Impala, Oozie, Zookeeper, Spark, Ambari, Mahout, MongoDB, Cassandra, Avro, Storm & Parquet.
Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapR and Apache
Languages: Java, Python, SQL, HTML, DHTML, Scala, JavaScript, XML and C/C++
NoSQL Databases: Cassandra, MongoDB and HBase
Java Technologies: Servlets, JavaBeans, JSP, JDBC, and Struts
Web Design Tools: HTML, DHTML, AJAX, JavaScript, jQuery and CSS, AngularJS, ExtJS and JSON
Development / Build Tools: Eclipse, Ant, Maven, IntelliJ, JUnit and Log4j
Frameworks: Struts, Spring and Hibernate
App/Web servers: WebSphere, WebLogic, JBoss and Tomcat
DB Languages: MySQL, PL/SQL, PostgreSQL and Oracle
RDBMS: Teradata, Oracle 9i/10g/11g, MS SQL Server, MySQL and DB2
Operating systems: UNIX, LINUX, Mac OS and Windows Variants
PROFESSIONAL EXPERIENCE:
Confidential, NY
Spark / Hadoop Developer
Responsibilities:
- Hands-on experience in Spark and Spark Streaming, creating RDDs and applying transformations and actions.
- Developed Spark applications using Scala for easy Hadoop transitions.
- Used Spark and Spark SQL to read Parquet data and create tables in Hive using the Scala API.
- Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark with Scala.
- Developed Spark code using Scala and Spark-SQL for faster processing and testing.
- Used Spark Streaming APIs to perform necessary transformations and actions on the fly for building the common learner data model, which gets data from Kafka in near real time (see the sketch after this list).
- Responsible for loading data pipelines from web servers and Teradata using Sqoop with Kafka and the Spark Streaming API.
- Developed Kafka producers and consumers, Cassandra clients and Spark components on top of HDFS and Hive.
- Populated HDFS and HBase with huge amounts of data using Apache Kafka.
- Used Kafka to ingest data into Spark engine.
- Configured, deployed and maintained multi-node Dev and Test Kafka clusters.
- Managing and scheduling Spark Jobs on a Hadoop Cluster using Oozie.
- Experienced with different scripting languages such as Python and shell scripts.
- Developed various Python scripts to find vulnerabilities with SQL Queries by doing SQL injection, permission checks and performance analysis.
- Experienced in Apache Spark for implementing advanced procedures like text analytics and processing using the in-memory computing capabilities written in Scala.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDD, Scala and Python.
- Worked on Spark SQL, created DataFrames by loading data from Hive tables, prepared the data and stored it in AWS S3.
- Experience with AWS IAM, Data Pipeline, EMR, S3, EC2, the AWS CLI, SNS and other services.
- Involved in creating custom UDFs for Pig and Hive to bring Python methods and functionality into Pig Latin and HiveQL.
- Extensively worked on Text, ORC, Avro and Parquet file formats and compression techniques like Gzip and Zlib.
- Implemented NiFi on Hortonworks (HDP 2.4) and recommended a solution to ingest data from multiple data sources into HDFS and Hive using NiFi.
- Developed various data loading strategies and performed various transformations for analyzing the datasets by using Hortonworks Distribution for Hadoop ecosystem.
- Ingested data from RDBMS sources and performed data transformations, then exported the transformed data to Cassandra per the business requirements and consumed Cassandra through Java services.
- Experience in NoSQL Column-Oriented Databases like Cassandra and its Integration with Hadoop cluster.
- Created S3 buckets and managed S3 bucket policies, and utilized S3 and Glacier for storage and backup on AWS.
- Performed AWS Cloud administration managing EC2 instances, S3, SES and SNS services.
- Worked with Elasticsearch for time-series data such as metrics and application events, an area where the large Beats ecosystem makes it easy to collect data from common applications.
- Hands on experience in developing the applications with Java, J2EE, J2EE - Servlets, JSP, EJB, SOAP, Web Services, JNDI, JMS, JDBC2, Hibernate, Struts, Spring, XML, HTML, XSD, XSLT, PL/SQL, Oracle10g and MS-SQL Server RDBMS.
- Delivered zero defect code for three large projects which involved changes to both front end (Core Java, Presentation services) and back-end (Oracle)
- Along with the infrastructure team, involved in designing and developing a Kafka and Storm based data pipeline.
- Used OOZIE Operational Services for batch processing and scheduling workflows dynamically.
- Involved in loading and transforming large Datasets from relational databases into HDFS and vice-versa using Sqoop imports and export.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS and moved data between MySQL and HDFS using Sqoop.
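A minimal sketch of the Kafka-to-Spark-Streaming pattern referenced in the list above, assuming the spark-streaming-kafka-0-10 integration; the broker address, topic name and group id are hypothetical:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    object KafkaStreamingSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("KafkaStreamingSketch")
        val ssc  = new StreamingContext(conf, Seconds(10))   // 10-second micro-batches

        val kafkaParams = Map[String, Object](
          "bootstrap.servers"  -> "broker1:9092",
          "key.deserializer"   -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id"           -> "learner-model",
          "auto.offset.reset"  -> "latest")

        // Subscribe to a hypothetical "learner-events" topic.
        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("learner-events"), kafkaParams))

        stream.map(_.value)
          .filter(_.nonEmpty)
          .foreachRDD { rdd =>
            // In the real pipeline each micro-batch would be transformed and written to HDFS/Hive.
            println(s"events in batch: ${rdd.count()}")
          }

        ssc.start()
        ssc.awaitTermination()
      }
    }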
Environment: Hadoop, Hive, MapReduce, Sqoop, Kafka, Spark, YARN, Pig, Cassandra, Oozie, Shell Scripting, Scala, Maven, Java, JUnit, NiFi, MySQL, AWS, EMR, EC2, S3, Hortonworks.
Confidential, Hilmar, CA
Hadoop/Spark Developer
Responsibilities:
- Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs.
- Developed Spark scripts using Java, and used Python and shell commands as per the requirements.
- Involved in ingesting data received from various relational database providers onto HDFS for analysis and other big data operations.
- Experienced in performance tuning of Spark Applications for setting right Batch Interval time, correct level of Parallelism and memory tuning.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Worked on Spark SQL and DataFrames for faster execution of Hive queries using the Spark SQLContext.
- Performed analysis on implementing Spark using Scala.
- Used DataFrames/Datasets to write SQL-style queries with Spark SQL against datasets sitting on HDFS (see the sketch after this list).
- Extracted files from MongoDB through Sqoop and placed in HDFS and processed.
- Created and imported various collections, documents into MongoDB and performed various actions like query, project, aggregation, sort and limit.
- Experience with creating script for data modeling and data import and export. Extensive experience in deploying, managing and developing MongoDB clusters.
- Experience in migrating HiveQL into Impala to minimize query response time.
- Creating Hive tables to import large data sets from various relational databases using Sqoop and export the analyzed data back for visualization and report generation by the BI team.
- Involved in creating Shell scripts to simplify the execution of all other scripts (Pig, Hive, Sqoop, Impala and MapReduce) and move the data inside and outside of HDFS.
- Implemented some of the big data operations on the AWS cloud: created clusters using EMR, EC2 instances and S3 buckets, ran analytical operations on Redshift, performed RDS and Lambda operations, and managed resources using IAM.
- Utilized frameworks such as Struts, Spring, Hibernate and web services to develop backend code.
- Used Hibernate reverse engineering tools to generate domain model classes, perform association mapping and inheritance mapping using annotations and XML, and implement second level caching using EHCache cache provider.
- Developed Pig Scripts, Pig UDFs and Hive Scripts, Hive UDFs to analyze HDFS data.
- Maintained the cluster securely using Kerberos and kept the cluster up and running at all times.
- Implemented optimization and performance testing and tuning of Hive and Pig.
- Developed a data pipeline using Kafka to store data into HDFS.
- Worked on reading multiple data formats on HDFS using Scala.
- Wrote shell and Python scripts for job automation.
- Configured Zookeeper to restart the failed jobs without human intervention.
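A minimal sketch of the DataFrame/Spark SQL usage described in the list above; the HDFS path and column names are hypothetical:

    import org.apache.spark.sql.SparkSession

    object HdfsSqlSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("HdfsSqlSketch").getOrCreate()

        // Read a Parquet dataset sitting on HDFS and expose it as a temporary view.
        val orders = spark.read.parquet("hdfs:///data/orders")
        orders.createOrReplaceTempView("orders")

        // SQL-style query executed by Spark SQL over the HDFS-backed data.
        val daily = spark.sql(
          """SELECT order_date, SUM(amount) AS total
            |FROM orders
            |GROUP BY order_date""".stripMargin)

        daily.show(20)
        spark.stop()
      }
    }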
Environment: Cloudera, HDFS, Hive, HQL scripts, MapReduce, Java, HBase, Pig, Sqoop, Kafka, Impala, Shell Scripts, Python Scripts, Spark, Scala, Oozie, ZooKeeper, Maven, JUnit, NiFi, AWS, EMR, EC2, S3.
Confidential
Spark/Hadoop Developer
Responsibilities:
- Experience in developing customized UDFs in Java to extend Hive and Pig Latin functionality.
- Responsible for installing, configuring, supporting and managing Hadoop clusters.
- Importing and exporting data into HDFS from Oracle 10.2 database and vice versa using SQOOP.
- Installed and configured Pig and wrote Pig Latin scripts.
- Designed and implemented HIVE queries and functions for evaluation, filtering, loading and storing of data.
- Created HBase tables and column families to store the user event data.
- Wrote automated HBase test cases for data quality checks using HBase command-line tools.
- Developed a data pipeline using HBase, Spark and Hive to ingest, transform and analyze customer behavioral data.
- Experience in collecting log data from different sources such as web servers and social media using Flume, and storing it on HDFS to run MapReduce jobs.
- Handled importing of data from machine logs using Flume.
- Created Hive Tables, loaded data from Teradata using Sqoop.
- Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive and AWS cloud.
- Configured, monitored, and optimized Flume agent to capture web logs from the VPN server to be put into Hadoop Data Lake.
- Responsible for loading data from UNIX file systems to HDFS. Installed and configured Hive and wrote Pig/Hive UDFs.
- Wrote, tested and implemented Teradata FastLoad, MultiLoad and BTEQ scripts, DML and DDL.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDD , Scala and Python.
- Exported the analyzed data to the relational databases using Sqoop to further visualize and generate reports for the BI team.
- Developed ETL processes using Spark, Scala, Hive and HBase (a brief sketch follows this list).
- Collecting and aggregating large amounts of log data using Flume and staging data in HDFS for further analysis.
- Wrote Java code to format XML documents and uploaded them to a Solr server for indexing.
- Used NoSQL technology (Amazon DynamoDB) to gather and track event-based metrics.
- Maintained all the services in the Hadoop ecosystem using ZooKeeper.
- Worked on implementing the Spark framework.
- Designed and implemented Spark jobs to support distributed data processing.
- Expertise in extracting, transforming and loading data from Oracle, DB2, SQL Server, MS Access, Excel, flat files and XML using Talend.
- Experienced in loading and transforming large sets of structured, semi-structured and unstructured data.
- Helped design scalable big data clusters and solutions.
- Followed agile methodology for the entire project.
- Experience in working with Hadoop clusters using Cloudera distributions.
- Involved in Hadoop cluster tasks such as adding and removing nodes without any effect on running jobs and data.
- Developed workflows using Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig .
- Developed interactive shell scripts for scheduling various data cleansing and data loading process.
- Converted the existing relational database model to the Hadoop ecosystem.
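A minimal sketch of the Spark/Scala/Hive/HBase ETL referenced in the list above; the Hive table, HBase table and column family names are hypothetical:

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
    import org.apache.hadoop.hbase.util.Bytes
    import org.apache.spark.sql.SparkSession

    object HiveToHBaseEtl {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("HiveToHBaseEtl")
          .enableHiveSupport()
          .getOrCreate()

        // Extract and transform: one aggregated row per customer (hypothetical schema).
        val profiles = spark.sql(
          "SELECT customer_id, COUNT(*) AS events FROM customer_events GROUP BY customer_id")

        // Load: write each partition to HBase through a single connection.
        profiles.rdd.foreachPartition { rows =>
          val conn  = ConnectionFactory.createConnection(HBaseConfiguration.create())
          val table = conn.getTable(TableName.valueOf("customer_profiles"))
          rows.foreach { row =>
            val put = new Put(Bytes.toBytes(row.getAs[String]("customer_id")))
            put.addColumn(Bytes.toBytes("stats"), Bytes.toBytes("events"),
              Bytes.toBytes(row.getAs[Long]("events")))
            table.put(put)
          }
          table.close()
          conn.close()
        }
        spark.stop()
      }
    }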
Environment: Hadoop, HDFS, Pig, Hive, Flume, Sqoop, Oozie, Python, Shell Scripting, SQL, Talend, Spark, HBase, Elasticsearch, Linux (Ubuntu), Cloudera.
Confidential
Java Developer
Responsibilities:
- Implemented applications using Java, J2EE, JSP, Servlets, JDBC, RAD, XML, HTML, XHTML, Hibernate, Struts, Spring and JavaScript on Windows environments.
- Experienced in developing web-based applications using Python, Django, PHP, XML, CSS, HTML, JavaScript and jQuery.
- Designed and implemented the reports modules of the application using Servlets, JSP and Ajax.
- Developed XML Web Services using SOAP, WSDL, and UDDI.
- Created the UI tool using Java, XML, XSLT, DHTML and JavaScript.
- Experience with the full SDLC and involvement in all of its phases.
- Developed action Servlets and JSPs for presentation in Struts MVC framework.
- Worked with Struts MVC objects like Action Servlet, Controllers, validators, Web Application Context, Handler Mapping, Message Resource Bundles, Form Controller, and JNDI for look-up for J2EE components.
- Developed a PL/SQL view function in the Oracle 9i database for the get-available-date module.
- Used Oracle SQL 4.0 as the database tool and wrote SQL queries in the DAO layer.
- Experience in building applications using Core Java, JDBC, JSP, Servlets, Spring, Hibernate, Web Services, SOAP and WSDL.
- Used RESTFUL Services to interact with the Client by providing the RESTFUL URL mapping.
- Used SVN and GitHub as version control tool.
- Implemented Hibernate in the data access object layer to access and update information in the Oracle 10g Database.
- Experience in JIRA; tracked test results and interacted with the developers to resolve issues.
- Used XSLT to transform my XML data structure into HTML pages.
- Deployed EJB components on Tomcat. Used the JDBC API for interaction with Oracle DB.
- Wrote build and deployment scripts using shell, Perl and Ant.
- Extensively used Java multi-threading to implement batch jobs with JDK 1.5 features.
Environment: HTML, JavaScript, Ajax, Servlets, JSP, SOAP, SDLC, Java, Hibernate, Scrum, JIRA, GitHub, jQuery, CSS, XML, Ant, Tomcat Server, Jasper Reports.
Confidential
Jr. Java Developer
Responsibilities:
- Actively involved from fresh start of the project, requirement gathering to quality assurance testing.
- Coded and Developed Multi-tier architecture in Java, J2EE, Servlets.
- Conducted analysis, requirements study and design according to various design patterns and developed according to the use cases, taking ownership of the features.
- Used various design patterns such as Command, Abstract Factory, Factory and Singleton to improve system performance.
- Analyzing the critical coding defects and developing solutions.
- Developed a configurable front end using Struts technology. Also involved in component-based development of certain features which were reusable across modules.
- Designed, developed and maintained the data layer using the ORM framework called Hibernate.
- Used the Hibernate framework for the persistence layer; involved in writing stored procedures for data retrieval, data storage and updates in the Oracle database using Hibernate.
- Developing & deploying Archive files (EAR, WAR, JAR) using ANT build tool.
- Used software development best practices for object-oriented design and methodologies throughout the object-oriented development cycle.
- Responsible for developing SQL Queries required for the JDBC.
- Designed the database, worked on DB2, and executed DDLs and DMLs.
- Active participation in architecture framework design and coding and test plan development.
- Strictly followed Waterfall development methodologies for implementing projects.
- Thoroughly documented the detailed process flow with UML diagrams and flow charts for distribution across various teams.
- Involved in developing presentations for developers (offshore support), QA and production support.
- Presented the process logical and physical flow to various teams using PowerPoint and Visio diagrams.
Environment: Java, Ajax, Informatica PowerCenter 8.x/9.x, REST API, SOAP API, Apache, Oracle 10g/11g, SQL*Loader, MySQL Server, Flat Files, Targets, Aggregator, Router, Sequence Generator.