- Overall 8+ years of experience in development, testing, implementation, maintenance, and enhancement of various IT projects, including Big Data experience implementing end-to-end Hadoop solutions.
- Experience with all stages of the SDLC and the Agile development model, from requirements gathering through deployment and production support.
- Good knowledge of AWS services such as EMR and EC2, which provide fast and efficient processing.
- Extensive experience in installing, configuring and using Big Data ecosystem components like Hadoop MapReduce, HDFS, Sqoop, Pig, Hive, Impala, Spark and Zookeeper
- Expertise in using J2EE application servers such as IBM WebSphere and JBoss, and web servers like Apache Tomcat.
- Experience with different Hadoop distributions like Cloudera (CDH3 & CDH4) and Hortonworks (HDP).
- Experience in developing web services with XML-based technologies such as SOAP, Axis, UDDI, and WSDL.
- Solid understanding of Hadoop MRv1 and MRv2 (YARN) architecture.
- Well versed in writing and deploying Oozie workflows and coordinators.
- Highly skilled in integrating Amazon Kinesis streams with Spark Streaming applications to build long-running real-time applications.
- Good working experience in using Sqoop to import data into HDFS from RDBMS and vice versa.
- Extensive experience in Extraction, Transformation and Loading (ETL) of data from multiple sources into Data Warehouse and Data Mart.
- Good knowledge of SQL and PL/SQL for writing stored procedures and functions, and of writing unit test cases using JUnit.
- Hands-on experience in configuring and working with Flume to load data from multiple sources directly into HDFS.
- In-depth knowledge of handling large amounts of data using the Spark DataFrame/Dataset APIs and case classes.
- Good knowledge in implementing various data processing techniques using Apache HBase for handling the data and formatting it as required.
- Working experience in Impala, Mahout, Spark SQL, Storm, Avro, Kafka, and AWS.
- Experience with Java web framework technologies like Apache Camel and Spring Batch.
- Experience in version control and source code management tools like Git, SVN, and Bitbucket.
- Hands-on experience working with databases like Oracle, SQL Server, and MySQL.
- Great knowledge of working with Apache Spark Streaming API on Big Data Distributions in an active cluster environment.
- Proficiency in developing secure enterprise Java applications using technologies such as Maven, Hibernate, XML, HTML, CSS, and version control systems.
- Developed and implemented Apache NiFi across various environments and wrote QA scripts in Python for tracking files.
- Excellent understanding of Hadoop and underlying framework including storage management.
- Good knowledge of and experience with NoSQL databases such as Cassandra and MongoDB.
- Extensive experience in building and deploying applications on web/application servers like WebLogic, WebSphere, and Tomcat.
- Experience in using Ant for building and deploying projects to servers, and in using JUnit and Log4j for debugging.
- Excellent communication and interpersonal skills, which contribute to completing project deliverables on or ahead of schedule.
Hadoop/Big Data Technologies: Hadoop 3.0, HDFS, MapReduce, HBase 1.4, Apache Pig, Hive 2.3, Sqoop 1.4, Apache Impala 2.1, Oozie 4.3, Yarn, Apache Flume 1.8, Kafka 1.1, Zookeeper
Cloud Platform: Amazon AWS, EC2, S3, MS Azure, Azure SQL Database, Azure SQL Data Warehouse, Azure Analysis Services, HDInsight, Azure Data Lake, Data Factory
Hadoop Distributions: Cloudera, Hortonworks, MapR
Programming Language: Java, Scala, Python 3.6, SQL, PL/SQL, Shell Scripting, Storm 1.0, JSP, Servlets
Frameworks: Spring 5.0.5, Hibernate 5.2, Struts 1.3, JSF, EJB, JMS
Databases: Oracle 12c/11g, SQL
Operating Systems: Linux, Unix, Windows 10/8/7
IDE and Tools: Eclipse 4.7, NetBeans 8.2, IntelliJ, Maven
NoSQL Databases: HBase 1.4, Cassandra 3.11, MongoDB, Accumulo
Web/Application Server: Apache Tomcat 9.0.7, JBoss, WebLogic, WebSphere
SDLC Methodologies: Agile, Waterfall
Version Control: Git, SVN, CVS
Confidential - Arlington, VA
Sr. Big Data Developer
- Worked on Big Data infrastructure for batch processing as well as real-time processing. Responsible for building scalable distributed data solutions using Hadoop.
- Involved with all the phases of Software Development Life Cycle (SDLC) methodologies throughout the project life cycle.
- Involved in Agile methodologies, daily scrum meetings, and sprint planning.
- Developed a JDBC connection to get the data from Azure SQL and feed it to a Spark Job.
- Configured Spark Streaming to receive real-time data from Apache Kafka and store the streamed data in HDFS using Scala.
- Developed Sqoop scripts to handle the interaction between Hive and the Vertica database.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Primarily involved in the data migration process using Azure, integrating with a GitHub repository and Jenkins.
- Deployed the application on the Hadoop cluster in cluster mode using spark-submit scripts.
- Queried and analyzed data from Cassandra for quick searching, sorting and grouping through CQL.
- Worked on the large-scale Hadoop YARN cluster for distributed data processing and analysis using Spark, Hive.
- Implemented various Hadoop Distribution environments such as Cloudera and Hortonworks.
- Built a data warehouse on the Azure platform using Azure Databricks and Data Factory.
- Implemented monitoring and established best practices around usage of Elasticsearch.
- Worked with Apache NiFi as an ETL tool for batch processing and real-time processing.
- Moved relational database data into Hive dynamic-partition tables using Sqoop, by way of staging tables.
- Upgraded the Hadoop cluster from CDH3 to CDH4, set up a High Availability cluster, and integrated Hive with existing applications.
- Captured the data logs from web server into HDFS using Flume for analysis.
- Involved in developing code to write canonical model JSON records from numerous input sources to Kafka Queues.
- Built code for real-time data ingestion using Java, MapR Streams (Kafka), and Storm.
- Involved in identifying job dependencies to design workflow for Oozie & YARN resource management.
- Used GitHub as repository for committing code and retrieving it and Jenkins for continuous integration.
- Reviewed the HDFS usage and system design for future scalability and fault-tolerance.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data in various formats like text, zip, XML, and JSON.
- Utilized Oozie workflows to run Pig and Hive jobs.
- Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
- Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
- Worked on transferring data between HDFS and MySQL in both directions using Sqoop.
- Developed Python, shell, Perl, and PowerShell scripts for automation, and performed component unit testing using the Azure Emulator.
- Built the automated build and deployment framework using Jenkins and Maven.
Environment: Hadoop 3.0, Azure, HDFS, Scala 2.12, SQL, Hive 1.2, Spark 2.4, Kafka 2.1, YARN, Apache NiFi, ETL, Sqoop 1.4, Flume 1.8, Oozie 4.3, Jenkins, XML, PL/SQL, MySQL 8.0.15, Python 3.7, GitHub, Hortonworks, Cloudera, MongoDB 4.0.5.
Confidential - Bellevue, WA
Sr. Spark/Hadoop Developer
- Worked on Hadoop cluster scaling from 4 nodes in development environment to 8 nodes in pre-production stage and up to 24 nodes in production.
- Extensively migrated existing architecture to Spark Streaming to process the live streaming data.
- Involved in writing the shell scripts for exporting log files to Hadoop cluster through automated process.
- Utilized Agile and Scrum Methodology to help manage and organize a team of developers with regular code review sessions.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Extensively used HiveQL queries to search for particular strings in Hive tables stored in HDFS.
- Worked with multiple teams to provision AWS infrastructure for development and production environments.
- Gathered the business requirements from the Business Partners and Subject Matter Experts.
- Created custom columns, depending on the use case, while ingesting data into the Hadoop data lake using PySpark.
- Designed and developed applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
- Stored data in S3 buckets on AWS cluster on top of Hadoop.
- Developed time-driven automated Oozie workflows to schedule Hadoop jobs.
- Analyzed the Cassandra database and compared it with other open-source NoSQL databases to determine which best suits the current requirements.
- Designed and implemented MapReduce jobs to support distributed processing using Java, Hive, Scala, and Apache Pig.
- Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
- Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data.
- Worked with Apache NiFi to manage the flow of data from sources through automated data flows.
- Created, modified and executed DDL and ETL scripts for De-normalized tables to load data into Hive and AWS Redshift tables.
- Developed date and currency type-conversion UDFs and sales UDFs in Spark to reduce the load on Tableau.
- Implemented a POC to migrate MapReduce programs into Spark transformations using Spark and Scala.
- Managed and reviewed data backups and Hadoop log files on the Hortonworks cluster.
- Used Pig to validate data ingested through Sqoop and Flume, and pushed the cleansed data set into MongoDB.
- Extracted files from CouchDB through Sqoop, placed them in HDFS, and processed them.
- Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and Zookeeper.
- Participated in development/implementation of Cloudera Impala Hadoop environment.
- Used a test-driven approach to develop the application and implemented unit tests using the Python unittest framework.
- Involved in ad hoc stand-up and architecture meetings to set daily priorities.
Environment: Hadoop 3.0, Spark 2.4, Agile, Sqoop 1.4, Hive 2.3, AWS, PySpark, Scala 2.12, Oozie 4.3, Cassandra 3.11, Apache Pig 0.17, NoSQL, MapReduce, Hortonworks, MongoDB 4.0.5, Zookeeper
Confidential - Lowell, AR
- Installed and configured Hadoop Ecosystem components and Cloudera manager using CDH distribution.
- Worked with S3 buckets on AWS to store CloudFormation templates and created EC2 instances on AWS.
- Used Sqoop to import data from RDBMS into the Hadoop Distributed File System (HDFS) and later analyzed the imported data using Hadoop components.
- Used Oozie scripts for deployment of the application and Perforce as the secure versioning software.
- Involved in installing and configuring the Hadoop ecosystem and Cloudera Manager using CDH3 and CDH4 distributions.
- Worked on NoSQL databases like Cassandra and MongoDB for POC purposes, storing images and URIs.
- Created Hive tables and ran HiveQL queries against them, which invoke and run MapReduce jobs automatically.
- Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries and Pig scripts
- Highly skilled in integrating Kafka with Spark streaming for high-speed data processing
- Implemented POC to migrate MapReduce programs into Spark transformations using Spark and Scala.
- Developed different kinds of custom filters and handled pre-defined filters on HBase data using the API.
- Used the Enterprise Data Warehouse database to store information and make it accessible across the organization.
- Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
- Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
- Worked with Avro Data Serialization system to work with JSON data formats.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Worked on Spark for in-memory computations and compared DataFrames to optimize performance.
- Created web services request-response mappings by importing source and target definition using WSDL file.
- Developed custom UDFs to generate unique keys for use in Apache Pig transformations.
- Created conversion scripts using Oracle SQL queries, functions, and stored procedures, along with test cases and plans, before ETL migrations.
- Developed Shell scripts to read files from edge node to ingest into HDFS partitions based on the file naming pattern.
Environment: Hadoop 2.3, AWS, Sqoop 1.2, HDFS, Oozie 4.1, Cassandra 3.0, MongoDB 3.5, Hive 2.1, Kafka, Spark 2.1, MapReduce, Scala 2.0, MySQL, Flume, JSON
Confidential - McLean, VA
- Applied J2EE design patterns such as Factory, Singleton, Business Delegate, DAO, Front Controller, and MVC.
- Implemented modules using Java APIs, Java collections, threads, and XML, and integrated the modules.
- Successfully installed and configured the IBM WebSphere Application server and deployed the business tier components using EAR file.
- Designed and developed business components using Session and Entity Beans in EJB.
- Implemented CORS (Cross-Origin Resource Sharing) using Node.js and developed REST services using Node.js with the Express and Mongoose modules.
- Used Log4j for logging various levels of information, such as error, info, and debug, into the log files.
- Developed integrated applications and lightweight components using the Spring Framework, and used IoC features of Spring Web MVC to configure the application context for the Spring bean factory.
- Used JDBC prepared statements, called from servlets, for database access.
- Used AngularJS to connect the web application to back-end APIs and used RESTful methods to interact with several APIs.
- Involved in build/deploy applications using Maven and integrated with CI/CD server Jenkins.
- Implemented jQuery features to build dynamic queries that fetch data from the database.
- Involved in developing module for transformation of files across the remote systems using JSP and servlets.
- Worked on Development bugs assigned in JIRA for Sprint following agile process.
- Used ANT scripts to fetch, build, and deploy application to development environment.
- Implemented MVC architecture using Apache Struts, JSP, and Enterprise JavaBeans.
- Used AJAX and JSON to make asynchronous calls to the project server to fetch data on the fly.
- Developed batch programs to update and modify metadata of large number of documents in FileNet Repository using CE APIs
- Created a test harness using POJOs that ships with the installer and tests the services each time the installer is run.
- Worked on creating Packages, Stored Procedures & Functions in Oracle using PL/SQL and TOAD.
- Used JNDI to perform lookup services for the various components of the system.
- Deployed the application and tested on JBoss Application Server. Collaborated with Business Analysts during design specifications.
- Developed Apache Camel middleware routes, JMS endpoints, and Spring service endpoints, and used Camel FreeMarker to customize REST responses.
- Wrote complex SQL queries and programmed stored procedures, packages, and triggers.
- Designed HTML prototypes, visual interfaces and interaction of Web-based design.
- Designed and developed a business tier using EJBs and used session beans to encapsulate the business logic.
- Involved in the configuration of Spring MVC and Integration with Hibernate.
- Used Eclipse IDE to configure and deploy the application onto WebLogic application server using Maven build scripts to automate the build and deployment process
- Designed CSS based page layouts that are cross-browser compatible and standards-compliant.
- Used the Spring Framework for dependency injection and JDBC connectivity.
- Created data sources and connection pools in WebLogic and deployed applications on the server.
- Developed XML and XSLT pages to store and present data to the user using parsers.
- Developed RESTful Web services client to consume JSON messages
- Implemented business logic with POJO using multithreading and design patterns.
- Created test cases for the DAO layer and service layer using JUnit and tracked bugs using JIRA.
- Used Struts with the Front Controller and Singleton patterns to develop the action and servlet classes.
- Wrote REST Web Services for sending and getting data from the external interface
- Implemented the application using the Spring Boot framework and handled security using Spring Security.
- Developed stored procedures and triggers using PL/SQL to calculate and update the tables to implement business logic
- Used Subversion (SVN) as the configuration management tool to manage the code repository.
- Used Maven as the build tool and TortoiseSVN as the source version control client.
- Used jQuery for basic animation and end-user screen customization.
- Used Git as the version control system and performed module-level and unit-level testing with JUnit and Log4j.
- Participated in Unit testing and functionality testing for tracking errors and debugging the code.