Senior Hadoop/Spark Developer Resume
Berkeley Heights, NJ
SUMMARY:
- Around 7 years of experience in Information Technology, with a major concentration on Big Data tools and technologies, relational and NoSQL databases, the Java programming language, and J2EE technologies, following recommended software practices.
- Hands-on experience in developing applications using Hadoop ecosystem components such as Spark, Hadoop MapReduce, HDFS, YARN, Pig, Hive, Sqoop, Oozie, Avro, HBase, ZooKeeper, Flume, Hue, Kafka, and Storm.
- Extensive experience in developing applications using Scala, Python, Java, and Android.
- Experience with Hadoop distributions such as Cloudera (CDH 4 and 5) and knowledge of the Hortonworks Data Platform (HDP).
- Experience administering Cloudera Manager and monitoring Hadoop clusters using Cloudera Manager and Apache Ambari.
- Expertise in installing, designing, sizing, configuring, provisioning and upgrading Hadoop environments.
- Excellent understanding of Hadoop architecture and core components such as the NameNode (master), DataNodes (workers), and Secondary NameNode.
- Good experience in both MapReduce MRv1 and MapReduce MRv2 (YARN).
- Extensive experience in testing, debugging, and deploying MapReduce jobs on Hadoop platforms.
- Worked with the Spark engine to process large-scale data, with experience creating and operating on Spark RDDs.
- Expert in creating Hive tables and writing Hive queries to analyze HDFS data.
- Good experience in creating Pig and Hive UDFs in Java to analyze data efficiently (a sample Hive UDF sketch follows this summary).
- Experience in importing and exporting data with Sqoop between HDFS and relational database systems (RDBMS).
- Successfully loaded files into Hive and HDFS from Oracle and SQL Server using Sqoop.
- Hands-on NoSQL database experience with MongoDB, HBase, and Cassandra.
- Excellent relational database (RDBMS) experience with Oracle, MySQL, and SQL Server.
- Extensive experience in SQL (Structured Query Language) and PL/SQL - stored procedures, triggers, sequences, and indexes.
- Experience in creating MapReduce code in Java per business requirements.
- Extensive experience in developing Java applications using Spring MVC, Spring RESTful web services, Struts 2, JSP (JavaServer Pages), Servlets, ORM (object-relational mapping) with Hibernate, core Java, and Swing.
- Strong experience with Java I/O, JavaBeans, Strings, JDBC, JSTL, HTML, AngularJS, multithreading, JavaScript, Ajax, CSS, jQuery, Collections, JSON, XML, and the build automation tool Jenkins.
- Excellent experience in developing web-based and desktop reports using the JasperReports tool.
- Extensively worked on Amazon Web Services (AWS) using services such as EC2, S3, Relational Database Service (RDS), DynamoDB, Elastic Load Balancing (ELB), Auto Scaling, Elastic Block Store (EBS), and Elastic MapReduce (EMR).
- Good working knowledge on Eclipse IDE for developing and debugging Java applications.
- Experience in using version control tools such as Subversion (SVN) and Git.
- Experience in working with software methodologies like Agile and Waterfall.
- Thorough knowledge of Software Development Life Cycle (SDLC) with deep understanding of various phases like Requirements gathering, Analysis, Design, Development and Testing.
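The Hive UDF experience mentioned above can be illustrated with a minimal sketch; the class name and the normalization it performs are hypothetical, shown only as an example of the classic org.apache.hadoop.hive.ql.exec.UDF API.

```java
// Hypothetical example of a classic Hive UDF (org.apache.hadoop.hive.ql.exec.UDF API);
// the class name and the normalization it performs are illustrative only.
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class NormalizeStateCode extends UDF {
    // Normalizes free-text state codes (e.g. " nj " -> "NJ") before analysis.
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().trim().toUpperCase());
    }
}
```

Packaged as a JAR, a UDF like this would be registered with ADD JAR and CREATE TEMPORARY FUNCTION and then called from HiveQL like any built-in function.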
TECHNICAL SKILLS:
Hadoop/Big Data Framework: Apache Spark, HDFS, MapReduce, YARN, Hive, Pig, HBase, Sqoop, Oozie, ZooKeeper, Flume, Kafka and Storm
Programming Languages: Scala, Java (JDK 1.5/1.6/1.7), J2EE, Python, Pig Latin, HiveQL, Android, HTML, C, C++, JavaScript, jQuery, CSS, Ajax, shell scripting
Databases: MySQL 5.6/5.5/5.1, MongoDB, Oracle 10g, SQL Server, MS Access.
Java Framework and Tools: Spring 4/3, Struts 2, Hibernate 3/4, AngularJS 1.0
IDE Tools: Eclipse 4.5/4.3/3.1/3.0, NetBeans 4.1/4.0
Database GUI Tools: Robomongo, SQL Developer, SQLyog 5.26/11.11, MySQL Workbench, Toad, SQL Server Management Studio
Reporting Tool: JasperReports
Operating Systems: Linux (Fedora 10/18, Ubuntu 13/16), Windows XP/7/10
Other skills: AWS, Internet of Things, Git, SVN, ClearCase, JFrog Artifactory
Development Methodologies: Agile/Scrum, Waterfall
PROFESSIONAL EXPERIENCE:
Senior Hadoop/Spark Developer
Confidential, Berkeley Heights, NJ
Responsibilities:
- Hands-on experience with Spark and Spark Streaming, creating RDDs and applying transformations and actions.
- Developed Spark applications using Scala for easy Hadoop transitions.
- Used Spark and Spark SQL with the Scala API to read Parquet data and create tables in Hive.
- Performed advanced procedures such as text analytics and processing, using the in-memory computing capabilities of Spark with Scala.
- Developed Spark code using Scala and Spark-SQL for faster processing and testing.
- Implemented sample Spark programs in Python using PySpark.
- Analyzed the SQL scripts and designed the solution to implement them using PySpark.
- Developed PySpark code to mimic the transformations performed in the on-premises environment.
- Used Spark Streaming APIs to perform transformations and actions on the fly to build the common learner data model, which receives data from Kafka in near real time (see the streaming sketch after this section).
- Responsible for loading data pipelines from web servers and Teradata using Sqoop along with Kafka and the Spark Streaming API.
- Developed Kafka producers and consumers, Cassandra clients, and Spark components on HDFS and Hive.
- Populated HDFS and HBase with huge amounts of data using Apache Kafka.
- Used Kafka to ingest data into the Spark engine.
- Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
- Managed and scheduled Spark jobs on a Hadoop cluster using Oozie.
- Experienced with scripting languages such as Python and shell scripting.
- Developed various Python scripts to find vulnerabilities in SQL queries through SQL injection, permission checks, and performance analysis.
- Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs.
- Designed and implemented incremental imports into Hive tables and wrote Hive queries to run on Tez.
- Built data pipelines using Kafka and Akka to handle terabytes of data.
- Wrote shell scripts that run multiple Hive jobs to incrementally update Hive tables, which are used to generate Tableau reports for the business.
- Experienced with Apache Spark for implementing advanced procedures such as text analytics and processing, using its in-memory computing capabilities with Scala.
- Involved in developing web services using REST, the HBase native API, and the Big SQL client to query data from HBase.
- Developed Solr web apps to query and visualize Solr-indexed data from HDFS.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDD, Scala and Python.
- Worked on Spark SQL, creating DataFrames by loading data from Hive tables, and created prepped data stored in AWS S3.
- Used Spark Streaming APIs to perform transformations and actions on the fly for the common learner data model, which consumes data from Kafka in near real time and persists it into Cassandra.
- Involved in creating custom UDFs for Pig and Hive to incorporate Python methods and functionality into Pig Latin and HiveQL.
- Extensively worked on Text, ORC, Avro, and Parquet file formats and compression codecs such as Snappy, Gzip, and Zlib.
- Implemented Hortonworks NiFi (HDP 2.4) and recommended solutions for ingesting data from multiple data sources into HDFS and Hive using NiFi.
- Hands-on work administering applications and helping with DevOps tasks.
- Developed various data loading strategies and performed various transformations to analyze datasets using the Hortonworks distribution of the Hadoop ecosystem.
- Ingested data from RDBMS, performed data transformations, and exported the transformed data to Cassandra per business requirements; accessed Cassandra through Java services.
- Experience with NoSQL column-oriented databases such as Cassandra and their integration with the Hadoop cluster.
- Wrote ETL jobs to read from web APIs using REST and HTTP calls and load the data into HDFS using Java and Talend.
- Along with the infrastructure team, involved in designing and developing a Kafka- and Storm-based data pipeline.
- Developed software to process, cleanse, and report on vehicle data using analytics and REST APIs built with Java, Scala, and the Akka asynchronous programming framework.
- Created partitions and buckets based on state for further processing using bucket-based Hive joins.
- Worked on Sequence files, RC files, map-side joins, bucketing, and partitioning for Hive performance and storage improvements, utilizing Hive SerDes such as RegEx, JSON, and Avro.
- Experienced working in a DevOps model, with a passion for automation.
- Used Oozie operational services for batch processing and dynamic workflow scheduling.
- Worked entirely in an Agile methodology and developed Spark scripts using the Scala shell.
- Involved in loading and transforming large datasets between relational databases and HDFS using Sqoop imports and exports.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and moved data between MySQL and HDFS using Sqoop.
- Used the Hibernate ORM framework with the Spring framework for data persistence and transaction management.
Environment: Hadoop, Hive, MapReduce, Sqoop, Kafka, Spark, YARN, Pig, Cassandra, Oozie, shell scripting, Scala, Maven, Java, JUnit, Agile methodologies, NiFi, MySQL, Tableau, AWS, EC2, S3, Hortonworks, Power BI, Solr.
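As a companion to the streaming bullets above, the following is a minimal, illustrative sketch of a Kafka-to-HDFS flow. It uses Spark's structured streaming Kafka source and the Java API for consistency with the other sketches in this resume (the work described above used the Scala streaming APIs); the broker address, topic name, and HDFS paths are placeholders.

```java
// Illustrative sketch only: a Kafka-to-HDFS flow using Spark's structured streaming
// Kafka source. Broker, topic, and paths are placeholders.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

public class KafkaToHdfsStream {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("learner-data-stream")
                .getOrCreate();

        // Subscribe to a (hypothetical) learner-events topic.
        Dataset<Row> events = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "broker1:9092")
                .option("subscribe", "learner-events")
                .load()
                .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)");

        // Persist the raw stream to HDFS as Parquet with checkpointing.
        StreamingQuery query = events.writeStream()
                .format("parquet")
                .option("path", "hdfs:///data/learner/raw")
                .option("checkpointLocation", "hdfs:///checkpoints/learner")
                .start();

        query.awaitTermination();
    }
}
```

In the pipeline described above, the parsed records would be shaped into the learner data model and persisted to Cassandra rather than written straight to Parquet.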
Senior Hadoop/Spark Developer
Confidential, Raritan, NJ
Responsibilities:
- Worked on the Hadoop cluster and the data querying tool Hive to store and retrieve data.
- Involved in the complete Software Development Life Cycle (SDLC) while developing applications.
- Reviewed and managed Hadoop log files by consolidating logs from multiple machines using Flume.
- Developed Oozie workflow for scheduling ETL process and Hive Scripts.
- Started using Apache NiFi to copy data from the local file system to HDFS.
- Worked with teams to analyze anomaly detection and data ratings.
- Implemented a custom InputFormat and RecordReader to read XML input efficiently using a SAX parser.
- Analyzed the database and compared it with other open-source NoSQL databases to determine which better suited the current requirements.
- Integrated Cassandra as a distributed persistent metadata store to provide metadata resolution for network entities.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Experienced with RDD architecture, implementing Spark operations on RDDs, and optimizing transformations and actions in Spark.
- Worked with Impala for data retrieval.
- Designed multiple Python packages that were used within a large ETL process used to load 2TB of data from an existing Oracle database into a new PostgreSQL cluster.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
- Loaded data from the Linux file system to HDFS and vice versa.
- Developed UDFs in Spark using both DataFrames/SQL and RDDs for data aggregation queries, writing results back to OLTP systems through Sqoop.
- Experience with DevOps and automation frameworks, including Chef, Docker, Puppet, and Jenkins.
- Built a POC enabling member and suspect search using Solr.
- Worked on ETL methods for data extraction, transformation and loading in corporate-wide ETL Solutions and Data warehouse tools for reporting and data analysis.
- Used CSVExcelStorage to parse data with different delimiters in Pig.
- Installed and monitored Hadoop ecosystems tools on multiple operating systems like Ubuntu, CentOS.
- Performed advanced procedures such as text analytics and processing, using the in-memory computing capabilities of Spark with Scala.
- Used the REST API to access HBase data for analytics.
- Modified reports and Talend ETL jobs based on feedback from QA testers and users in development and staging environments; involved in setting up the QA environment by implementing Pig and Sqoop scripts.
- Worked on Apache NiFi: executed Spark and Sqoop scripts through NiFi, created scatter-and-gather patterns, ingested data from Postgres to HDFS, fetched Hive metadata and stored it in HDFS, and created a custom NiFi processor for filtering text from FlowFiles.
- Responsible for designing and implementing ETL processes using Talend to load data, and worked extensively with Sqoop for importing and exporting data between HDFS and relational database systems/mainframes.
- Developed Pig Latin scripts to do operations of sorting, joining and filtering enterprise data.
- Implemented test scripts to support test driven development and integration.
- Developed multiple MapReduce jobs in Java to clean datasets.
- Involved in loading data from Linux file systems, servers, and Java web services using Kafka producers and consumers.
- Involved in developing code to write canonical-model JSON records from numerous input sources to Kafka queues (a producer sketch follows this section).
- Streamed data into Apache Ignite by setting up caches for efficient data analysis.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Developed UNIX shell scripts for creating the reports from Hive data.
- Manipulated, serialized, and modeled data in multiple formats such as JSON and XML; involved in setting up MapReduce 1 and MapReduce 2.
- Prepared Avro schema files for generating Hive tables, created the Hive tables, loaded data into them, and queried the data using HiveQL.
- Installed and Configured Hadoop cluster using Amazon Web Services (AWS) for POC purposes.
Environment: Hadoop MapReduce 2 (YARN), NiFi, HDFS, Pig, Hive, Flume, Cassandra, Eclipse, Ignite, Core Java, Sqoop, Spark, Splunk, Maven, Spark SQL, Cloudera, Solr, Talend, Linux shell scripting.
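A minimal, illustrative sketch of the kind of Kafka producer mentioned above, publishing a canonical-model record as JSON; the broker address, topic name, and record fields are hypothetical.

```java
// Illustrative sketch only: a Kafka producer that publishes canonical JSON records,
// along the lines of the Kafka loading bullets above. Broker, topic, and payload
// are placeholders.
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class CanonicalRecordProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // A canonical-model record serialized as JSON (placeholder payload).
            String json = "{\"id\":\"42\",\"source\":\"web\",\"status\":\"active\"}";
            producer.send(new ProducerRecord<>("canonical-records", "42", json));
            producer.flush();
        }
    }
}
```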
Hadoop Developer
Confidential, Pasadena, CA
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing (a cleaning-job sketch follows this section).
- Experience in installing, configuring and using Hadoop Ecosystem components.
- Experience in Importing and exporting data into HDFS and Hive using Sqoop.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Worked on different file formats such as Sequence files, XML files, and Map files using MapReduce programs.
- Responsible for managing data coming from different sources.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Strong expertise in the MapReduce programming model with XML, JSON, and CSV file formats.
- Gained good experience with NoSQL databases.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Responsible for building scalable distributed data solutions using Hadoop.
- Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
- Experience in managing and reviewing Hadoop log files.
- Involved in loading data from the Linux file system to HDFS.
- Implemented test scripts to support test driven development and continuous integration.
- Created Pig Latin scripts to sort, group, join, and filter the enterprise-wide data.
- Worked on tuning the performance of Pig queries.
- Mentored analysts and the test team in writing Hive queries.
- Installed Oozie workflow engine to run multiple MapReduce jobs.
- Worked with different sources using multiple input formats with GenericWritable and ObjectWritable.
- Provided cluster coordination services through ZooKeeper.
- Extensive working knowledge of partitioned tables, UDFs, performance tuning, compression-related properties, and the Thrift server in Hive.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
Environment: Cloudera CDH 4, HDFS, Hadoop 2.2.0 (YARN), Flume 1.5.2, Eclipse, MapReduce, Hive 1.1.0, Pig Latin 0.14.0, Java, SQL, Sqoop 1.4.6, CentOS, ZooKeeper 3.5.0 and NoSQL database.
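A minimal, illustrative sketch of the kind of map-only cleaning job described above; the expected field count and the CSV layout are hypothetical.

```java
// Minimal sketch (field layout is hypothetical): a map-only cleaning job that drops
// malformed CSV lines before downstream processing.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CleanRecordsJob {

    // Emits only lines that have the expected number of comma-separated fields.
    public static class CleanMapper
            extends Mapper<LongWritable, Text, NullWritable, Text> {
        private static final int EXPECTED_FIELDS = 5; // hypothetical schema width

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",", -1);
            if (fields.length == EXPECTED_FIELDS) {
                context.write(NullWritable.get(), value);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "clean-records");
        job.setJarByClass(CleanRecordsJob.class);
        job.setMapperClass(CleanMapper.class);
        job.setNumReduceTasks(0); // map-only cleaning pass
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```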
Java Developer
Confidential
Responsibilities:
- Generated domain layer classes using DAOs from the database schema.
- Defined a set of classes for the helper layer that validates the data models from the service layer and prepares them for display in JSP views.
- Designed and developed the interface to interact with web services for card payments.
- Performed enhancements to existing SOAP web services for online card payments.
- Performed enhancements to existing payment screens by developing Servlets and JSP pages (a servlet sketch follows this section).
- Involved in the end-to-end batch loading process using Informatica ETL.
- Transformed the use cases into class diagrams, sequence diagrams, and state diagrams.
- Developed a validation layer providing validator classes for input validation, pattern validation, and access control.
- Used AJAX calls to dynamically assemble the data in JSP page, on receiving user input.
- Used Log4j to print logging, debugging, warning, and info messages on the server console.
- Involved in creation of Test Cases for JUnit Testing and carried out Unit testing.
- Used ClearCase as the configuration management tool for code versioning and release deployment on Oracle WebLogic Server 10.3.
- Used the Maven tool for deployment of the web application on the WebLogic Server.
- Interacted with business team to transform requirements into technical solutions.
- Involved in the functional tests of the application and also resolved production issues.
- Designed and developed the application using EJB and the Spring framework.
- Developed POJOs for the data model to map Java objects to relational database tables.
- Designed and developed the service layer using the Spring framework.
Environment: Java, J2EE, Spring, SOAP, Web Services, Maven, Solaris, WebLogic 7.0, Oracle 8i, Informatica 8.5, Mainframe, OSS/BSS, Log4j, Servlets, JSP, JSTL, JDBC, HTML, JavaScript, CSS, Rational Rose, UML
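A minimal, illustrative sketch of the kind of payment-screen servlet described above; the request parameters, JSP paths, and messages are hypothetical, and the actual SOAP web-service call is left as a placeholder.

```java
// Illustrative sketch only: a servlet handling a card-payment form post and forwarding
// to a JSP view. Parameter names and JSP paths are hypothetical.
import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class CardPaymentServlet extends HttpServlet {

    @Override
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        String cardNumber = request.getParameter("cardNumber");
        String amount = request.getParameter("amount");

        // Basic input validation before calling the payment web service.
        if (cardNumber == null || cardNumber.trim().isEmpty()) {
            request.setAttribute("error", "Card number is required");
            request.getRequestDispatcher("/WEB-INF/jsp/payment.jsp")
                   .forward(request, response);
            return;
        }

        // Placeholder for the SOAP web-service call that performs the payment.
        request.setAttribute("confirmation", "Payment submitted for amount " + amount);
        request.getRequestDispatcher("/WEB-INF/jsp/confirmation.jsp")
               .forward(request, response);
    }
}
```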
Java Developer
Confidential
Responsibilities:
- Actively involved from the start of the project, from requirement gathering through quality assurance testing.
- Coded and developed a multi-tier architecture in Java, J2EE, and Servlets.
- Conducted analysis, requirements study and design according to various design patterns and developed rendering to the use cases, taking ownership of the features.
- Used various design patterns such as Command, Abstract Factory, Factory, and Singleton to improve the system performance.
- Analyzed critical coding defects and developed solutions.
- Developed a configurable front end using Struts; also involved in component-based development of features reusable across modules.
- Designed, developed, and maintained the data layer using the Hibernate ORM framework.
- Used the Hibernate framework for the persistence layer; involved in writing stored procedures for data retrieval, storage, and updates in the Oracle database using Hibernate (see the DAO sketch after this section).
- Developed batch jobs that run at specified times to implement business logic on the Java platform.
- Developed and deployed archive files (EAR, WAR, JAR) using the Ant build tool.
- Used software development best practices for object-oriented design and methodologies throughout the object-oriented development cycle.
- Actively participated in architecture framework design, coding, and test plan development.
- Strictly followed the Waterfall development methodology for implementing projects.
- Thoroughly documented the detailed process flow with UML diagrams and flow charts for distribution across various teams.
- Involved in developing training presentations for developers (offshore support), QA, and production support.
Environment: Java, Ajax, Informatica Power Center 8.x/9.x, REST API, SOAP API, Apache, Oracle 10/11g, SQL Loader, MySQL Server, Flat Files, Targets, Aggregator, Router, Sequence Generator.
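A minimal, illustrative sketch of the kind of Hibernate persistence-layer DAO described above; the Order entity, its fields, and the HQL query are hypothetical.

```java
// Illustrative sketch only: a Hibernate-based DAO for the persistence layer described
// above. The Order entity and its fields are hypothetical.
import java.util.List;
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;

public class OrderDao {

    private final SessionFactory sessionFactory;

    public OrderDao(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    // Persists a new order inside an explicit transaction.
    public void save(Order order) {
        Session session = sessionFactory.openSession();
        Transaction tx = session.beginTransaction();
        try {
            session.save(order);
            tx.commit();
        } catch (RuntimeException e) {
            tx.rollback();
            throw e;
        } finally {
            session.close();
        }
    }

    // Retrieves orders for a customer using an HQL query.
    @SuppressWarnings("unchecked")
    public List<Order> findByCustomer(Long customerId) {
        Session session = sessionFactory.openSession();
        try {
            return session.createQuery("from Order o where o.customerId = :cid")
                    .setParameter("cid", customerId)
                    .list();
        } finally {
            session.close();
        }
    }
}

// Minimal hypothetical entity, mapped via hbm.xml or annotations (mapping omitted).
class Order {
    private Long id;
    private Long customerId;
    // getters and setters omitted for brevity
}
```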