Hadoop/Spark Developer Resume
SUMMARY:
- 9+ years of IT experience covering analysis, design, development, implementation, and maintenance of Big Data projects using the Apache Hadoop/Spark ecosystem, along with design and development of web applications using Java technologies.
- Experience in analysis, design, development, and integration using Big Data Hadoop ecosystem components on Cloudera, working with file formats such as Avro and Parquet.
- Worked with various compression codecs such as Snappy, LZO, and GZip.
- Experience in developing custom partitioners and combiners for effective data distribution.
- Expertise in tuning Impala queries to handle highly concurrent jobs and avoid out-of-memory errors for various analytics use cases.
- Rigorously applied transformations in Spark and R programs.
- Worked with an AWS cloud-based CDH 5.13 cluster and developed merchant campaigns using Hadoop.
- Developed multiple role-based modules for Apache Sentry in Hive.
- Expertise in using built-in Hive SerDes and developing custom SerDes.
- Developed multiple internal and external Hive tables using dynamic partitioning and bucketing.
- Designed and developed a full-text search feature with multi-tenant Elasticsearch after collecting real-time data through Spark Streaming.
- Experience in analyzing large-scale data to identify new analytics, insights, trends, and relationships, with a strong focus on data clustering.
- Wrote multiple custom MapReduce programs for various input file formats.
- Experience in developing NoSQL applications using MongoDB, HBase, and Cassandra.
- Tuned multiple Spark applications for better performance.
- Developed data pipelines for real-time use cases using Kafka, Flume, and Spark Streaming.
- Experience in importing and exporting multiple terabytes of data between HDFS/Hive and relational database systems (RDBMS) using Sqoop.
- Developed multiple Hive views for accessing HBase table data.
- Wrote complex Spark SQL programs for efficient joins and displayed the results on Kibana dashboards (a minimal sketch follows this summary).
- Expertise in using formats such as Text and Parquet when creating Hive tables.
- Expertise in collecting data from various source systems such as social media and databases.
- End-to-end, hands-on experience in the ETL process and in setting up automation to load terabytes of data into HDFS.
- Good experience in developing applications using core Java, Collections, Threads, JDBC, Servlets, JSP, Struts, Hibernate, and XML components, using IDEs such as Eclipse 6.0 and MyEclipse.
- Experience in SQL programming: writing queries using joins, stored procedures, triggers, and functions, and applying query optimization techniques with Oracle, SQL Server, and MySQL.
- Excellent team worker with good interpersonal skills and leadership qualities.
- Excellent organizational and communication skills.
- Excellent understanding of Agile and Scrum methodologies.
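To give a flavor of the Spark SQL work summarized above, the following is a minimal, illustrative sketch in Scala; the database, table, and column names and the output path are hypothetical placeholders, not actual project code.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: join two Hive tables with Spark SQL and write the
// result as Snappy-compressed Parquet. All names are hypothetical.
object SparkSqlJoinExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-sql-join-example")
      .enableHiveSupport()          // read Hive tables from the metastore
      .getOrCreate()

    val result = spark.sql(
      """SELECT m.merchant_name, SUM(t.amount) AS total_amount
        |FROM analytics.transactions t
        |JOIN analytics.merchants m ON t.merchant_id = m.merchant_id
        |GROUP BY m.merchant_name""".stripMargin)

    // Results like these could feed a Kibana dashboard via Elasticsearch
    // or any other downstream consumer.
    result.write
      .option("compression", "snappy")
      .parquet("hdfs:///output/merchant_totals")

    spark.stop()
  }
}
```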
TECHNICAL SKILLS:
Programming Languages: C, C++, Java, Scala, R, Python, UNIX shell scripting
Distributed Computing: Apache Hadoop, HDFS, MapReduce, Pig, Hive, Oozie, Hue, Kerberos, Sentry, Zookeeper, Kafka, Flume, Impala, HBase and Sqoop
AWS Components: EC2, S3, RDS, Redshift, EMR, DynamoDB, Lambda, SNS, SQS
Web Development: HTML, JSP, XML, JavaScript and AJAX
Web Application Servers: Tomcat 6.0, JBoss 4.2 and WebLogic 8.1
Operating Systems: Windows, Unix, iOS, Ubuntu and RedHat Linux
Tools: Eclipse, NetBeans, Visual Studio, Agitator, Bugzilla, ArcStyler (MDA), Rational Rose, Enterprise Architect and Rational Software Architect
Source Control Tools: VSS, Rational Clear Case, Subversion
Application Frameworks: Struts 1.3, Spring 2.5, Hibernate 3.3, Jasper Reports, JUnit and JAXB
RDBMS: Oracle and SQL Server 2016
NOSQL: MongoDB, Cassandra and HBase
PROFESSIONAL EXPERIENCE:
Confidential
Hadoop/Spark Developer
Responsibilities:
- End-to-end involvement in data ingestion, cleansing, and transformation in Hadoop.
- Created Hive tables and loaded and transformed large sets of structured and semi-structured data.
- Handled logical implementation of, and interaction with, HBase.
- Developed multiple Scala/Spark jobs for data transformation and aggregation.
- Implemented Pig scripts to load data from and store data into Hive using HCatalog.
- Wrote scripts to automate application deployments and configurations, monitored YARN applications, and troubleshot and resolved cluster-related system problems.
- Optimized existing Word2Vec algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs while developing a chatbot with OpenNLP and Word2Vec.
- Produced unit tests for Spark transformations and helper methods.
- Implemented various output formats such as SequenceFile and Parquet in MapReduce programs, including multiple output formats in the same program to match the use cases.
- Designed and developed a data pipeline using Kafka, Flume, and Spark Streaming (a minimal sketch follows this list).
- Performed benchmarking of the NoSQL databases Cassandra and HBase.
- Hands on experience with Lambda architectures.
- Created data model for structuring and storing the data efficiently. Implemented partitioning and bucketing of tables in Cassandra.
- Implemented test scripts to support test driven development and continuous integration.
- Converted text files to Avro and then to Parquet format so the files could be used with other Hadoop ecosystem tools.
- Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcast variables, and effective and efficient joins and transformations during the ingestion process itself.
- Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
- Created HBase tables to load large sets of structured, semi-structured, and unstructured data.
- Used Impala to read, write, and query the data in HDFS and Cassandra, and configured Kafka to read and write messages from external programs.
- Handled importing of data from various data sources (media, MySQL) and performed transformations using Hive and MapReduce.
- Ran Pig scripts in local, pseudo-distributed, and fully distributed modes during various stages of testing.
- Imported and exported data between SQL Server and HDFS/Hive using Sqoop.
- Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Created a complete processing engine based on the Cloudera distribution with enhanced performance.
- Developed a data pipeline using Flume, Spark, and Hive to ingest, transform, and analyze data.
- Wrote Scaladoc-style documentation for all code.
- Designed and modified database tables and used HBase queries to insert and fetch data from tables.
- Developed and supported multiple Spark programs running on the cluster.
- Prepared technical architecture and low-level design documents.
- Tested raw data and executed performance scripts.
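A minimal sketch of the Kafka-to-HDFS leg of such a pipeline, in Scala with the Spark Streaming Kafka integration; the broker address, topic, consumer group, and output path are hypothetical placeholders.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

// Minimal sketch: consume a Kafka topic, apply a light cleansing step,
// and land the records on HDFS in time-stamped directories.
object KafkaIngestSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-ingest-sketch")
    val ssc  = new StreamingContext(conf, Seconds(30))    // 30-second batches

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",             // placeholder broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "ingest-sketch",
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams)
    )

    stream.map(_.value().trim)
      .filter(_.nonEmpty)                                  // drop blank records
      .saveAsTextFiles("hdfs:///staging/events")           // one directory per batch

    ssc.start()
    ssc.awaitTermination()
  }
}
```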
Environment: Linux, Eclipse, JDK 1.8.0, Hadoop 2.9.0, Flume 1.7.0, HDFS, MapReduce, Pig 0.16.0, Spark 2.0, Hive 2.0, Apache Maven 3.0.3
Confidential
Hadoop/Spark Developer
Responsibilities:
- Worked on improving the performance of existing Pig and Hive Queries.
- Developed Oozie workflows to automate Hive and Pig jobs.
- Worked on performing Join operations.
- Exported the result set from Hive to MySQL using Sqoop after processing the data.
- Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
- Used Hive to partition and bucket data.
- Performed various source data ingestions, cleansing, and transformation in Hadoop.
- Designed and developed many Spark programs using PySpark.
- Produced unit tests for Spark transformations and helper methods.
- Created RDDs and pair RDDs for Spark programming.
- Implemented joins, grouping, and aggregations on the pair RDDs (a minimal sketch follows this list).
- Wrote Scaladoc-style documentation for all code.
- Experienced in performance tuning of Spark applications: setting the right batch interval, choosing the correct level of parallelism, and tuning memory.
- Developed Pig Scripts to perform ETL procedures on the data in HDFS.
- Analyzed the partitioned and bucketed data and computed various metrics for reporting.
- Created HBase tables to store various data formats of data coming from different systems.
- Advanced knowledge in performance troubleshooting and tuning Cassandra clusters.
- Analyzed the source data to assess data quality using Talend Data Quality.
- Created Scala/Spark jobs for data transformation and aggregation.
- Involved in creating Hive tables, loading them with data, and writing Hive queries.
- Used Impala to read, write, and query the Hadoop data in HDFS and Cassandra, and configured Kafka to read and write messages from external programs.
- Prepared technical architecture and low-level design documents.
- Tested raw data and executed performance scripts.
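A minimal sketch of the pair-RDD join and aggregation pattern mentioned above, in Scala; the input paths, file layout, and output location are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: key two datasets by customer id, aggregate with
// reduceByKey, join them, and roll totals up by region.
object PairRddAggregationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("pair-rdd-aggregation-sketch").getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical CSV layouts: orders(customerId,amount), customers(customerId,region)
    val orders = sc.textFile("hdfs:///data/orders.csv")
      .map(_.split(","))
      .map(cols => (cols(0), cols(1).toDouble))             // pair RDD keyed by customerId

    val customers = sc.textFile("hdfs:///data/customers.csv")
      .map(_.split(","))
      .map(cols => (cols(0), cols(1)))                      // (customerId, region)

    // Aggregation: total order amount per customer
    val totalsPerCustomer = orders.reduceByKey(_ + _)

    // Join with the customer dimension, then group totals by region
    val totalsByRegion = totalsPerCustomer.join(customers)  // (customerId, (total, region))
      .map { case (_, (total, region)) => (region, total) }
      .reduceByKey(_ + _)

    totalsByRegion.saveAsTextFile("hdfs:///output/totals_by_region")
    spark.stop()
  }
}
```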
Environment: Eclipse, JDK 1.8.0, Hadoop 2.8, HDFS, MapReduce, Spark 2.0, Pig 0.15.0, Hive 2.0, HBase, Apache Maven 3.0.3
Confidential
Hadoop Developer
Responsibilities:
- Involved in creating Hive tables, loading them with data, and writing Hive queries to process the data.
- Developing and maintaining Workflow Scheduling Jobs in Oozie for importing data from RDBMS to Hive.
- Implemented partitioning and bucketing in Hive for better organization of the data.
- Involved with the team in fetching live stream data from DB2 into HBase tables using Spark Streaming and Apache Kafka.
- Developed Spark Streaming programs in Scala to import data from Kafka topics into HBase tables.
- Developed solutions to pre-process large sets of structured and semi-structured data in different file formats (text files, Avro data files, Sequence files, XML and JSON files, ORC, and Parquet).
- Involved in the design phase for delivering live event data from the database to the front-end application using the Spark ecosystem.
- Imported data from Hive tables and ran SQL queries over the imported data and existing RDDs using Spark SQL.
- Responsible for loading and transforming large sets of structured, semi structured and unstructured data.
- Collected the log data from web servers and integrated into HDFS using Flume.
- Responsible to manage data coming from different sources.
- Extracted files from CouchDB, placed them into HDFS using Sqoop, and pre-processed the data for analysis.
- Developed subqueries in Hive.
- Partitioned and bucketed the imported data using HiveQL.
- Partitioned data dynamically using the dynamic-partition insert feature (a minimal sketch follows this list).
- Moved the partitioned data into different tables as per business requirements.
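A minimal sketch of a dynamic-partition insert, expressed here through Spark's Hive support in Scala for consistency with the other examples; the staging and target table names and columns are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: create a partitioned Hive table and fill it with a
// dynamic-partition INSERT, where partition values come from the data.
object DynamicPartitionInsertSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dynamic-partition-insert-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Enable HiveQL dynamic-partition inserts.
    spark.sql("SET hive.exec.dynamic.partition = true")
    spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    // Hypothetical target table partitioned by order date.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS sales_partitioned (
        |  order_id STRING,
        |  amount   DOUBLE
        |)
        |PARTITIONED BY (order_date STRING)
        |STORED AS PARQUET""".stripMargin)

    // The partition column is the last column of the SELECT, so each
    // distinct order_date lands in its own partition automatically.
    spark.sql(
      """INSERT INTO TABLE sales_partitioned PARTITION (order_date)
        |SELECT order_id, amount, order_date FROM sales_staging""".stripMargin)

    spark.stop()
  }
}
```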
Environment: Eclipse, JDK 1.8.0, Hadoop 2.8, HDFS, MapReduce, Pig 0.15.0, Hive 2.0, HBase, Apache Maven 3.0.3
Confidential
Java Developer
Responsibilities:
- Actively involved in writing SQL using SQL Query Builder.
- Used JAXB to read and manipulate the XML properties.
- Used JNI to call libraries and other functions implemented in C.
- Handled server-related issues, new requirements, changes, and patch movements.
- Involved in developing business logic for skip pay module.
- Understanding the requirements based on the design.
- Coded the business logic methods in core Java.
- Involved in development of the Action classes and ActionForms based on the Struts framework.
- Participated in client-side validation and server-side validation.
- Involved in creating the Struts configuration file and validation file for the skip module using the Struts framework.
- Developed Java programs, JSP pages, and servlets using the Spring framework.
- Involved in creating database tables and writing complex T-SQL queries and stored procedures in SQL Server.
- Worked with AJAX framework to get the asynchronous response for the user request and used JavaScript for the validation.
- Used EJBs in the application and developed Session beans to implement business logic at the middle tier level.
- Developed the Restful Web Services for various XSD schemas.
- Used Servlets to implement Business components.
- Designed and developed required Manager Classes for database operations.
- Developed various Servlets for monitoring the application.
- Designed the UML class diagram, Sequence diagrams for Trade Services.
- Designed the complete Hibernate mapping for SQL Server for PDM.
- Designed the complete JAXB classes mapping for various XSD schemas.
- Involved in writing JUnit test Classes for performing Unit testing.
- Developed a uniform invoice standard across various customers.
- Automated generation of invoices for fixed-price contracts once deliverables have been transmitted to sponsor agencies.
- Prepared technical architecture and low-level design documents.
- Tested raw data and executed performance scripts.
Environment: Eclipse Neon, JDK 1.8.0, Java, Servlets, JSP, EJB, XML, SQL Server, Spring, JUnit, SQL, UNIX, UML, Apache Maven 3.0.3
Confidential
Java Developer
Responsibilities:
- Improved Code Quality to a commendable level.
- Interacted with business and reporting teams for requirement gathering, configuration walkthroughs, and UAT.
- Developed the Restful Web Services for various XSD schemas.
- Used Servlets to implement Business components.
- Designed and developed required Manager Classes for database operations.
- Developed various Servlets for monitoring the application.
- Designed and developed the front end using HTML and JSP.
- Involved in eliciting the requirement, use case modeling, design, leading the development team and tracking the project status.
- Understood functional and technical requirement specifications of the customer's needs for the product.
- Performed impact analysis of the application under development.
- Prepared design specifications, user test plans, and approach notes for the application under development.
- Created UML diagrams such as class, sequence, and activity diagrams.
- Developed the application as per the given specifications.
- Created user-based modules implementing different levels of security.
- Involved in designing the front end and implementing functionality with business logic.
- Mentoring and grooming juniors technically as well as professionally on Agile practices and Java/J2EE development issues.
- Provided online access to all customer and billing data.
- Allowed department access to accounts receivable information, subject to security and authorization constraints.
- Involved in design and code reviews.
- Assisted in troubleshooting architectural problems.
- Designed and developed report generation using the Velocity framework.
- Generated monthly sales reports, collection reports, and receivables-aging reports.
- Prepared use cases and designed and developed object models and class diagrams.
- Worked on one of the most critical modules for project, right from the beginning phase which included requirement gathering, analysis, design, review and development.
- Received roughly two weeks of knowledge transfer from the module lead, who was based at another location and was later absorbed by the client.
- Took the initiative in building a new team of more than six members, running proper knowledge transfer sessions and assigning and managing tasks with JIRA.
- Learned Backbone.js and worked with the UI team on UI enhancements.
- Actively participated in the daily Scrums and reviewed new user stories.
- Implemented new requirements after discussion with Scrum Masters.
- Worked with BA and QA teams to identify and fix bugs and to raise new features and enhancements.
- Recognized by the client with an appreciation certificate and client bonuses of 10k and 50k respectively.
- Analyzed the generated JUnit tests, added proper asserts, made them more code-specific, and increased code coverage, which strengthened both product knowledge and JUnit writing skills.
- Addressed issues related to application integration and compatibility.
- Performed enhancements to the User Group Tree using Dojo and JSON to meet business needs.
- Provided technical solutions to create database connections to Oracle, Teradata, and DB2 in the server and BO Designer for the Dev, QA, and Prod environments.
Environment: Eclipse Helios 3.6 (64-bit), JDK 1.6.0_25, Java, Servlets, JSP, EJB, XML, JSON, Dojo, SQL Server, Spring, JUnit, SQL, UNIX, UML