Sr. Spark/Hadoop Developer Resume
Newport Beach, CA
SUMMARY:
- 8+ years of IT experience covering analysis, design, development, implementation and maintenance of Big Data projects using the Apache Hadoop/Spark ecosystems, as well as design and development of web applications using Java technologies.
- Experience in analysis, design, development and integration using Big Data Hadoop ecosystem components on Cloudera, working with various file formats such as Avro and Parquet.
- Worked with various compression codecs such as Snappy, LZO and GZip.
- Experience in developing customized partitioners and combiners for effective data distribution.
- Expertise in tuning Impala queries to handle multiple concurrent jobs and overcome out-of-memory errors for various analytics use cases.
- Rigorously applied transformations in Spark and R programs.
- Worked with an AWS cloud-based CDH 5.13 cluster and developed merchant campaigns using Hadoop.
- Developed multiple role-based modules for Apache Sentry in Hive.
- Expertise in using built-in Hive SerDes and developing custom SerDes.
- Developed multiple internal and external Hive tables using dynamic partitioning and bucketing.
- Designed and developed a full-text search feature with multi-tenant Elasticsearch after collecting real-time data through Spark Streaming.
- Experience in analyzing large-scale data to identify new analytics, insights, trends and relationships, with a strong focus on data clustering.
- Wrote multiple customized MapReduce programs for various input file formats.
- Experience in developing NoSQL applications using MongoDB, HBase and Cassandra.
- Tuned multiple Spark applications for better performance.
- Developed data pipelines for real-time use cases using Kafka, Flume and Spark Streaming.
- Experience in importing and exporting multiple terabytes of data using Sqoop between HDFS/Hive and relational database systems (RDBMS).
- Developed multiple Hive views for accessing HBase table data.
- Used complex Spark SQL programs for efficient joins and displayed the results on Kibana dashboards.
- Expertise in using various formats such as Text and Parquet while creating Hive tables.
- Expertise in collecting data from various source systems such as social media and databases.
- End-to-end hands-on experience in ETL processes and set up automation to load terabytes of data into HDFS.
- Good experience in developing applications using Core Java, Collections, Threads, JDBC, Servlets, JSP, Struts, Hibernate and XML components using various IDEs such as Eclipse 6.0 and MyEclipse.
- Experience in SQL programming, writing queries using joins, stored procedures, triggers and functions, and applying query optimization techniques with Oracle, SQL Server and MySQL.
- Excellent team player with good interpersonal skills and leadership qualities.
- Excellent organizational and communication skills.
- Excellent understanding of Agile and Scrum methodologies.
TECHNICAL SKILLS:
Programming Languages: C, C++, Java, Scala, R, Python, UNIX
Distributed Computing: Apache Hadoop, HDFS, MapReduce, Pig, Hive, Oozie, Hue, Kerberos, Sentry, Zookeeper, Kafka, Flume, Impala, HBase and Sqoop
AWS Components: EC2, S3, RDS, Redshift, EMR, DynamoDB, Lambda, SNS, SQS
Web Development: HTML, JSP, XML, JavaScript and AJAX
Web Application Server: Tomcat 6.0, JBoss 4.2 and WebLogic 8.1
Operating Systems: Windows, Unix, iOS, Ubuntu and RedHat Linux
Tools: Eclipse, NetBeans, Visual Studio, Agitator, Bugzilla, ArcStyler (MDA), Rational Rose, Enterprise Architect and Rational Software Architect
Source Control Tools: VSS, Rational ClearCase, Subversion
Application Framework: Struts 1.3, Spring 2.5, Hibernate 3.3, Jasper Reports, JUnit and JAXB
RDBMS: Oracle and SQL Server 2016
NoSQL: MongoDB, Cassandra and HBase
PROFESSIONAL EXPERIENCE:
Confidential, Newport Beach, CA
Sr. Spark/ Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Participated in all stages of the Software Development Life Cycle (SDLC), including requirements gathering, system analysis, system development, unit testing and performance testing.
- Installed the Oozie workflow engine to run multiple Hive, shell script, Sqoop, Pig and Java jobs.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Involved in story-driven Agile development methodology and actively participated in daily scrum meetings.
- Developed Apache Spark-based programs to implement complex business transformations.
- Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting.
- Developed unloading microservices using the Scala API of Spark DataFrames for the semantic layer.
- Used Spark SQL and Spark DataFrames extensively to cleanse and integrate imported data into more meaningful insights.
- Migrated associated business logic (PL/SQL procedures/functions) to Apache Spark.
- Involved in unit testing and delivered unit test plans and results documents using JUnit and MRUnit.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Created scripts to sync data between local MongoDB and Postgres databases and those on the AWS cloud.
- Used Oozie engine for creating workflow and coordinator jobs that schedule and execute various
- Created a Kafka producer API to send live-stream JSON data into various Kafka topics.
- Enabled concurrent access to Hive tables with shared and exclusive locking, which Hive supports with the help of a ZooKeeper implementation in the cluster.
- Created a web-based user interface for creating, monitoring and controlling data flows using Apache NiFi.
- Worked on a POC comparing the processing time of Impala with Apache Hive for batch applications in order to adopt the former in the project.
- Extensively used the Spark-Cassandra connector to load data to and from Cassandra.
- Involved in the design phase for getting live event data from the database to the front-end application using the Spark ecosystem.
- Worked on various compression and file formats such as Avro, Parquet and Text.
- Created a complete processing engine based on the Cloudera distribution, with enhanced performance.
- Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
- Wrote an event-driven link-tracking system to capture user events and feed them to Kafka for pushing to HBase.
- Implemented Spark using Scala and performed data cleansing by applying transformations and actions.
- Developed MapReduce programs in Java to search production logs and web analytics logs for application issues.
- Involved in building and deploying applications using Maven, integrated with the CI/CD server Jenkins.
Environment: Hadoop 3.0, Oozie 5.0, Hive 2.3, Sqoop 1.4, Pig 0.17, Java, Spark 2.4, Scala 2.13, Agile, PL/SQL, JUnit 5.4, AWS, Kafka 2.2, ZooKeeper 3.4, Cassandra 3.11, HCatalog, NoSQL, Maven 3.6, Jenkins 2.16
Confidential, Bowie, MD
Hadoop/Spark Developer
Responsibilities:
- End-to-end involvement in data ingestion, cleansing and transformation in Hadoop.
- Created Hive tables, and loaded and transformed large sets of structured and semi-structured data.
- Logical implementation and interaction with HBase.
- Developed multiple Scala/Spark jobs for data transformation and aggregation.
- Implemented Pig scripts to load data from and store data into Hive using HCatalog.
- Wrote scripts to automate application deployments and configurations, monitored YARN applications, and troubleshot and resolved cluster-related system problems.
- Optimized existing word2vec algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs in the development of a chatbot using OpenNLP and Word2Vec.
- Produced unit tests for Spark transformations and helper methods.
- Implemented various output formats, such as SequenceFile and Parquet, in MapReduce programs, and implemented multiple output formats in the same program to match the use cases.
- Designed and developed a data pipeline using Kafka, Flume and Spark Streaming.
- Performed benchmarking of the NoSQL databases Cassandra and HBase.
- Hands on experience with Lambda architectures.
- Created a data model for structuring and storing the data efficiently, and implemented partitioning and bucketing of tables in Cassandra.
- Implemented test scripts to support test driven development and continuous integration.
- Converted text files into Avro and then to Parquet format so the files could be used with other Hadoop ecosystem tools.
- Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts in Spark, and effective and efficient joins and transformations during the ingestion process itself.
- Experienced in loading and transforming large sets of structured, semi-structured and unstructured data.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data.
- Used Impala to read, write and query data in HDFS from Cassandra, and configured Kafka to read and write messages from external programs.
- Handled importing data from various sources (media, MySQL) and performed transformations using Hive and MapReduce.
- Ran Pig scripts in local, pseudo-distributed and distributed modes during various stages of testing.
- Imported and exported data between SQL Server and HDFS/Hive using Sqoop.
- Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Created a complete processing engine based on the Cloudera distribution, with enhanced performance.
- Developed a data pipeline using Flume, Spark and Hive to ingest, transform and analyze data.
- Wrote Scaladoc-style documentation for all code.
- Designed and modified database tables and used HBase queries to insert and fetch data from tables.
- Developed and supported multiple Spark programs running on the cluster.
- Prepared technical architecture and low-level design documents.
- Tested raw data and executed performance scripts.
Environment: Linux, Eclipse, JDK 1.8.0, Hadoop 2.9.0, Flume 1.7.0, HDFS, MapReduce, Pig 0.16.0, Spark 2.0, Hive 2.0, Apache Maven 3.0.3
Confidential, Atlanta, GA
Hadoop/Spark Developer
Responsibilities:
- Worked on improving the performance of existing Pig and Hive queries.
- Developed Oozie workflows to automate Hive and Pig jobs.
- Worked on performing Join operations.
- Exported the result set from Hive to MySQL using Sqoop after processing the data.
- Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
- Used Hive to partition and bucket data.
- Performed ingestion, cleansing and transformation of data from various sources in Hadoop.
- Designed and developed many Spark programs using PySpark.
- Produced unit tests for Spark transformations and helper methods.
- Created RDDs and pair RDDs for Spark programming.
- Implemented joins, grouping and aggregations for the pair RDDs.
- Wrote Scaladoc-style documentation for all code.
- Experienced in performance tuning of Spark applications by setting the right batch interval, the correct level of parallelism and memory tuning.
- Developed Pig scripts to perform ETL procedures on the data in HDFS.
- Analyzed the partitioned and bucketed data and computed various metrics for reporting.
- Created HBase tables to store data in various formats coming from different systems.
- Advanced knowledge in performance troubleshooting and tuning Cassandra clusters.
- Analyzed the source data to assess its quality using Talend Data Quality.
- Created Scala/Spark jobs for data transformation and aggregation.
- Involved in creating Hive tables, loading them with data and writing Hive queries.
- Used Impala to read, write and query the Hadoop data in HDFS from Cassandra, and configured Kafka to read and write messages from external programs.
- Prepared technical architecture and low-level design documents.
- Tested raw data and executed performance scripts.
Environment: Eclipse, JDK 1.8.0, Hadoop 2.8, HDFS, MapReduce, Spark 2.0, Pig 0.15.0, Hive 2.0, HBase, Apache Maven 3.0.3
Confidential, Chicago, IL
Hadoop Developer
Responsibilities:
- Involved in creating Hive tables, loading them with data and writing Hive queries to process the data.
- Developed and maintained workflow scheduling jobs in Oozie for importing data from RDBMS to Hive.
- Implemented partitioning and bucketing in Hive for better organization of the data.
- Worked with the team on fetching live-stream data from DB2 into an HBase table using Spark Streaming and Apache Kafka.
- Developed a Spark Streaming program in Scala for importing data from Kafka topics into HBase tables.
- Developed solutions to pre-process large sets of structured and semi-structured data in different file formats (text files, Avro data files, sequence files, XML and JSON files, ORC and Parquet).
- Involved in the design phase for getting live event data from the database to the front-end application using the Spark ecosystem.
- Imported data from Hive tables and ran SQL queries over the imported data and existing RDDs using Spark SQL.
- Responsible for loading and transforming large sets of structured, semi-structured and unstructured data.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Responsible for managing data coming from different sources.
- Extracted files from CouchDB, placed them into HDFS using Sqoop and pre-processed the data for analysis.
- Developed subqueries in Hive.
- Partitioned and bucketed the imported data using HiveQL.
- Partitioned data dynamically using the dynamic partition insert feature.
- Moved the partitioned data into different tables as per business requirements.
Environment: Eclipse, JDK 1.8.0, Hadoop 2.8, HDFS, MapReduce, Pig 0.15.0, Hive 2.0, HBase, Apache Maven 3.0.3
Confidential, Nashville, TN
Java Developer
Responsibilities:
- Actively involved in writing SQL using SQL Query Builder.
- Used JAXB to read and manipulate XML properties.
- Used JNI to call libraries and other functions implemented in C.
- Handled server-related issues, new requirements, changes and patch movements.
- Involved in developing business logic for skip pay module.
- Understood the requirements based on the design.
- Coded the business logic methods in Core Java.
- Involved in development of the Action classes and Action Forms based on the Struts framework.
- Participated in client-side validation and server-side validation.
- Involved in creation of the Struts configuration file and validation file for the skip module using the Struts framework.
- Developed java programs, JSP pages and servlets using Spring framework.
- Involved in creating database tables and writing complex T-SQL queries and stored procedures in SQL Server.
- Worked with AJAX framework to get the asynchronous response for the user request and used JavaScript for the validation.
- Used EJBs in the application and developed Session beans to implement business logic at the middle tier level.
- Developed RESTful web services for various XSD schemas.
- Used Servlets to implement Business components.
- Designed and developed the required Manager classes for database operations.
- Developed various Servlets for monitoring the application.
- Designed the UML class diagrams and sequence diagrams for Trade Services.
- Designed the complete Hibernate mapping for SQL Server for PDM.
- Designed the complete JAXB classes mapping for various XSD schemas.
- Involved in writing JUnit test Classes for performing Unit testing.
- Developed a uniform invoice standard across various customers.
- Automated generation of invoices for fixed-price contracts once deliverables had been transmitted to sponsor agencies.
- Prepared technical architecture and low-level design documents.
- Tested raw data and executed performance scripts.
Environment: Eclipse Neon, JDK 1.8.0, Java, Servlets, JSP, EJB, XML, SQL Server, Spring, JUnit, SQL, UNIX, UML, Apache Maven 3.0.3
Confidential, New York, NY
Java Developer
Responsibilities:
- Improved Code Quality to a commendable level.
- Interacted with business and reporting teams for requirements gathering, configuration walkthroughs and UAT.
- Developed RESTful web services for various XSD schemas.
- Used Servlets to implement Business components.
- Designed and developed the required Manager classes for database operations.
- Developed various Servlets for monitoring the application.
- Designed and developed the front-end using HTML and JSP.
- Involved in eliciting requirements, use case modeling, design, leading the development team and tracking project status.
- Understood the functional and technical requirement specifications of the customer's needs for the product.
- Performed impact analysis of the application under development.
- Prepared design specifications, user test plans and approach notes for the application under development.
- Created UML diagrams such as class, sequence and activity diagrams.
- Developed the application as per the given specifications.
- Created user-based modules implementing different levels of security.
- Involved in designing the front end and implementing functionality with business logic.
- Mentored and groomed juniors, technically as well as professionally, on Agile practices and Java/J2EE development issues.
- Provided online access to all customer and billing data.
- Allowed department access to accounts receivable information, subject to security and authorization constraints.
- Involved in design and code reviews.
- Assisted in troubleshooting architectural problems.
- Designed and developed report generation using the Velocity framework.
- Generated monthly sales, collection and receivable-aging reports.
- Prepared use cases and designed and developed object models and class diagrams.
- Worked on one of the most critical modules of the project from the beginning phase, including requirements gathering, analysis, design, review and development.
- Received roughly two weeks of knowledge transfer (KT) from the module lead before he relocated and was absorbed by the client.
- Took the initiative in building a new team of more than 6 members, conducting knowledge transfer sessions and assigning and managing tasks with JIRA.
- Learned Backbone.js and worked with the UI team on UI enhancements.
- Actively participated in daily scrums and in understanding new user stories.
- Implemented new requirements after discussion with Scrum Masters.
- Worked with BA and QA to identify and fix bugs and to raise new features and enhancements.
- Was greatly appreciated by the client, receiving an appreciation award and a client bonus of 10k and 50k, respectively.
- Analyzed the generated JUnit tests, added proper asserts to make them more code-specific and increased code coverage. This helped boost my product knowledge as well as my JUnit writing skills.
- Addressed issues related to application integration and compatibility
- Performed enhancements to the User Group Tree using Dojo and JSON to meet business needs.
- Provided technical solutions to create various database connections to Oracle, Teradata and DB2 in Server and BO Designer for Dev, QA and Prod environments.
Environment: Eclipse Helios 3.6 (64-bit), JDK 1.6.0_25, Java, Servlets, JSP, EJB, XML, JSON, Dojo, SQL Server, Spring, JUnit, SQL, Unix, UML