- 8+ years of total experience in designing and developing client/server and web-based applications using J2EE technologies, including 5 years of experience in Big Data with good knowledge of HDFS and its ecosystem.
- Excellent understanding/knowledge of Hadoop architecture and its various components, such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, and the MapReduce programming paradigm.
- Hands-on experience in installing, configuring, and using Hadoop ecosystem components like MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, and Flume.
- Experience in large-scale Hadoop environment build and support, including design, configuration, installation, performance tuning, and monitoring.
- Experience in importing and exporting terabytes of data using Sqoop between HDFS and relational database systems.
- Experience in architecting Hadoop clusters using major Hadoop distributions: CDH3, CDH4, and CDH5.
- Experience in managing and troubleshooting Hadoop-related issues.
- Experience in installation, configuration, management and deployment of Big Data solutions and the underlying infrastructure of Hadoop Cluster.
- Knowledge in job/workflow scheduling and monitoring tools like Oozie & Zookeeper.
- Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java. Extended Hive and Pig core functionality with custom User Defined Functions (UDFs).
- Worked with application teams to install operating system, Hadoop updates, patches and version upgrades as required.
- Hands-on experience in virtualization; worked with VMware Virtual Center.
- Experience in designing, developing, and implementing connectivity products that allow efficient exchange of data between our core database engine and the Hadoop ecosystem.
- Designed and modeled projects using techniques in UML - Use Cases, Class Diagrams, Sequence Diagrams, etc.
- Extensive experience in requirements gathering, analysis, design, reviews, coding and code reviews, and unit and integration testing.
- Experience in using application development frameworks like Hibernate, Struts, and Spring for developing integrated applications and lightweight business components.
- Experience in developing service components using JDBC.
- Experience in developing and designing Web Services (SOAP and RESTful web services).
- Experience in developing Web Interface using Servlets, JSP and Custom Tag Libraries.
- Good knowledge and working experience in XML related technologies.
- Experience in using Java/J2EE design patterns like Singleton, Factory, MVC, and Front Controller to reuse the most effective and efficient strategies.
- Expertise in using IDEs like WebSphere (WSAD), Eclipse, NetBeans, MyEclipse, and WebLogic Workshop.
- Extensive experience in writing SQL queries for Oracle, Hadoop, and DB2 databases using SQL*Plus. Hands-on experience working with Oracle (9i/10g/11g), DB2, NoSQL, and MySQL, and knowledge of SQL Server.
- Extensive experience in using SQL and PL/SQL to write Stored Procedures, Functions and Triggers.
- Excellent technical, logical, code-debugging, and problem-solving capabilities, with the ability to carefully watch the evolving environment and the probable activities of competitors and customers.
- Proven ability to work effectively both independently and in teams, with positive results. Inclined toward building a strong team/work environment, with the ability to adapt to new technologies and situations with ease.
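As an illustrative sketch of the custom UDF work mentioned above: besides Java UDFs, Hive can stream rows through an external script via TRANSFORM, so a hypothetical Python row-normalizer (the user_id/status/amount schema is invented for illustration) can be as small as:

```python
def normalize_record(line):
    """Trim each tab-separated field and upper-case the status column.
    The (user_id, status, amount) schema is hypothetical."""
    user_id, status, amount = line.rstrip("\n").split("\t")
    return "\t".join([user_id.strip(), status.strip().upper(), amount.strip()])

# Hive would stream rows through this via:
#   ADD FILE normalize.py;
#   SELECT TRANSFORM(user_id, status, amount) USING 'normalize.py' ... FROM t;
row = normalize_record("u1\t ok \t3.50\n")
```

In a real deployment the script is shipped with ADD FILE and wired into the query with TRANSFORM; the sketch only shows the row-level contract.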
Hadoop/Big Data: Hadoop 1.x/2.x (Yarn), HDFS, Map Reduce, Spark, Hive, Zookeeper, Oozie, Tez, Impala, Mahout, Pig, Sqoop, Flume, Kafka, Storm, Ganglia, Nagios.
Development Tools: Eclipse, IBM DB2 Command Editor, QTOAD, SQL Developer, Microsoft Suite (Word, Excel, PowerPoint, Access), VMware
Programming/Scripting Languages: Java, SQL, Unix Shell Scripting, Python.
Databases: Oracle 11g/10g/9i, MySQL, SQL Server 2005/2008, PostgreSQL & DB2
NoSQL Databases: HBase, Cassandra, MongoDB
Visualization: Tableau, Plotly, Raw and MS Excel.
Modeling languages: UML Design, Use case, Class, Sequence, Deployment and Component diagrams.
Version Control Tools: Subversion (SVN), Concurrent Versions System (CVS) and IBM Rational ClearCase.
Methodologies: Agile/ Scrum, Waterfall
Operating Systems: Windows 98/2000/XP/Vista/7/8/10, Macintosh, Unix, Linux and Solaris.
Confidential, Cleveland, OH
- Involved in review of functional and non-functional requirements.
- Experience in upgrading the Cloudera Hadoop cluster from 5.3.8 to 5.8.0 and from 5.8.0 to 5.8.2.
- Hands-on experience with the Hadoop ecosystem components (HDFS, YARN, MapReduce, Hive, Spark, Flume, Oozie, Zookeeper, Impala, HBase, and Sqoop) through Cloudera Manager.
- Worked on migrating MapReduce programs, initially written in Python (PySpark), into Spark transformations using Spark and Scala.
- Developed Spark jobs using Scala on top of YARN/MRv2 for interactive and batch analysis.
- Experienced in querying data using Spark SQL on top of the Spark engine for faster processing of data sets.
- Worked on implementing Spark Framework, a Java-based web framework.
- Worked with Apache SOLR to implement indexing and wrote Custom SOLR query segments to optimize the search.
- Wrote Java code to format XML documents and uploaded them to the Solr server for indexing.
- Experienced with Apache Solr for indexing and load-balanced querying to search for specific data in larger datasets; implemented a Near Real Time Solr index on HBase and HDFS.
- Worked on Ad hoc queries, Indexing, Replication, Load balancing, and Aggregation in MongoDB.
- Processed web server logs by developing multi-hop Flume agents using the Avro sink, and loaded the data into MongoDB for further analysis; also extracted files from MongoDB through Flume and processed them.
- Expert knowledge of MongoDB NoSQL data modeling, tuning, and disaster-recovery backups; used it for distributed storage and processing via CRUD operations.
- Extracted and restructured data into MongoDB using the import and export command-line utility tools.
- Experience in setting up fan-out flows in Flume to design a V-shaped architecture that takes data from many sources and ingests it into a single sink.
- Implemented custom serializers and interceptors in Flume to mask confidential data and filter unwanted records from the event payload.
- Experience in creating, dropping, and altering tables at run time without blocking updates and queries, using HBase and Hive.
- Experience in working with different join patterns and implemented both Map and Reduce Side Joins.
- Wrote Flume configuration files for importing streaming log data into HBase with Flume.
- Imported several transactional logs from web servers with Flume to ingest the data into HDFS, using Flume with a spooling directory source to load data from the local file system (LFS) into HDFS.
- Installed and configured Pig; wrote Pig Latin scripts to convert data from text files to Avro format.
- Created Partitioned Hive tables and worked on them using HiveQL.
- Loaded data into HBase using both bulk and non-bulk loads.
- Installed, Configured Talend ETL on single and multi-server environments.
- Experience in monitoring the Hadoop cluster using Cloudera Manager, interacting with Cloudera support, logging issues in the Cloudera portal, and fixing them per the recommendations.
- Experience in Cloudera Hadoop upgrades and patches and installation of ecosystem products through Cloudera Manager, along with Cloudera Manager upgrades.
- Worked on the continuous integration tool Jenkins and automated nightly JAR builds.
- Worked with Tableau; integrated Hive with Tableau Desktop reports and published them to Tableau Server.
- Developed a data pipeline using Pig and Java MapReduce to consume customer behavioral data and financial histories into HDFS for analysis.
- Developed REST APIs using Java, Play framework and Akka.
- Experienced in designing RESTful services using Java-based APIs like Jersey.
- Worked in an Agile development environment using the Kanban methodology; actively involved in daily Scrum and other design-related meetings.
- Used Oozie operational services for batch processing and scheduling workflows dynamically.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop.
- Experienced in using agile approaches including Test-Driven Development, Extreme Programming, and Agile Scrum.
Environment: Hadoop, HDFS, Hive, MapReduce, AWS EC2, SOLR, Impala, MySQL, Oracle, Sqoop, Kafka, Spark, SQL, Talend, Python, PySpark, Yarn, Pig, Oozie, SBT, Akka, Linux (Ubuntu), Scala, Ab Initio, Tableau, Maven, Jenkins, Java (JDK 1.6), Cloudera, JUnit, Agile methodologies
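The Flume interceptor work in this role (masking confidential fields and filtering unwanted events) can be sketched with a simplified Python analog. The SSN pattern and user_id field are invented for illustration; real Flume interceptors implement the Java Interceptor interface and are declared in the agent configuration.

```python
import re

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def mask(event_body):
    """Masking-interceptor analog: redact SSN-like tokens in the event body."""
    return SSN.sub("XXX-XX-XXXX", event_body)

def keep(event_body):
    """Filtering-interceptor analog: drop events missing a required field."""
    return "user_id=" in event_body

events = [
    "user_id=42 ssn=123-45-6789 action=login",
    "ssn=987-65-4321 action=noise",        # no user_id -> filtered out
]
clean = [mask(e) for e in events if keep(e)]
```

In Flume itself the same pipeline is wired up declaratively, e.g. a regex-replace interceptor followed by a filtering interceptor on the source.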
- Involved in handling large amount of data coming from various sources and involved in HDFS maintenance and loading of structured and unstructured data.
- Developed Map Reduce jobs in Java to perform data cleansing and pre-processing.
- Migrated large amounts of data from various databases like Oracle, Netezza, and MySQL to Hadoop.
- Imported Bulk Data into HBase Using Map Reduce programs.
- Developed Apache Pig and Hive scripts to process the HDFS data.
- Performed analytics on time-series data stored in HBase using the HBase API.
- Designed and implemented Incremental Imports into Hive tables.
- Involved in unit testing; delivered unit test plans and results documents using JUnit and MRUnit.
- Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
- Designed and implemented batch jobs using Sqoop, MR2, Pig, and Hive.
- Involved with File Processing using Pig Latin.
- Scheduled jobs using Oozie workflow Engine.
- Worked on various compression techniques like GZIP and LZO.
- Ingesting Log data from various web servers into HDFS using Apache Flume.
- Involved in creating Hive tables, loading with data and writing Hive queries that will run internally in Map Reduce way.
- Experience in optimizing MapReduce algorithms using combiners and partitioners to deliver the best results; worked on application performance optimization for an HDFS cluster.
- Implemented business logic by writing UDFs in Java and used various UDFs from Piggybank and other sources.
- Used Hive to find correlations between customer's browser logs in different sites and analyzed them to build risk profile for such sites.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Performed Cluster tasks like adding, removing of nodes without any effect on running jobs.
- Installed Qlik Sense Desktop 2.x, developed applications for users, and built reports using QlikView.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Active involvement in SDLC phases (Design, Development, Testing), Code review etc.
- Active involvement in Scrum meetings and Followed Agile Methodology for implementation.
Environment: Apache Hadoop, MapReduce, HDFS, HBase, CentOS 6.4, Unix, REST web services, Hive, Pig, Oozie, Java (JDK 1.5), JSON, QlikView, Qlik Sense, Eclipse, Oracle Database, Jenkins, JUnit, Maven, Sqoop
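The combiner optimization mentioned in this role can be shown with a toy word count. This pure-Python analog (no Hadoop runtime; the two "splits" are invented input) pre-aggregates each mapper's output locally, so the shuffle carries one pair per distinct word per split instead of one per occurrence:

```python
from collections import Counter
from itertools import chain

def mapper(line):
    """Emit (word, 1) for every token, as a word-count mapper would."""
    return [(w, 1) for w in line.split()]

def combiner(pairs):
    """Pre-aggregate one mapper's output locally, shrinking the shuffle."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return list(counts.items())

def reducer(all_pairs):
    """Final aggregation across all mappers' combined output."""
    counts = Counter()
    for word, n in all_pairs:
        counts[word] += n
    return dict(counts)

# Two "input splits", each handled by its own mapper + combiner.
splits = [["to be or", "not to be"], ["to see"]]
partials = [combiner(list(chain.from_iterable(mapper(l) for l in split)))
            for split in splits]
counts = reducer(chain.from_iterable(partials))
```

Because word count's reduce is associative and commutative, the combiner can safely reuse the reduce logic; that is the property that makes a combiner a pure optimization.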
Confidential, Conway, AR
- Loaded data from different data sources (Teradata, DB2, Oracle, and flat files) into HDFS using Sqoop and loaded it into partitioned Hive tables.
- Created various Pig scripts and wrapped them as shell commands to provide aliases for common operations in the project business flow.
- Implemented various Hive queries for analysis and called them from a Java client engine to run on different nodes.
- Created several Hive UDFs to hide or abstract complex repetitive rules.
- Developed Oozie workflows for daily incremental loads, which get data from Teradata and import it into Hive tables.
- Involved in End to End implementation of ETL logic.
- Reviewed ETL application use cases before onboarding them to Hadoop.
- Developed bash scripts to bring log files from the FTP server and process them for loading into Hive tables.
- Scheduled all bash scripts using the Resource Manager scheduler.
- Moved data from HDFS to Cassandra using MapReduce and the BulkOutputFormat class.
- Developed Map Reduce programs for applying business rules to the data.
- Implemented Apache Kafka as a replacement for a more traditional message broker (JMS/Solace) to reduce licensing costs, decouple processing from data producers, and buffer unprocessed messages.
- Created HBase tables and column families to store the user event data.
- Wrote automated HBase test cases for data quality checks using HBase command-line tools.
- Implemented a receiver-based approach with Spark Streaming, linking with the StreamingContext using the Java API and handling proper closing and waiting for stages.
- Maintained the authentication module to support Kerberos.
- Experience in implementing rack topology scripts for the Hadoop cluster.
- Implemented fixes for issues related to the old Hazelcast EntryProcessor API.
- Participated with the admin team in designing and upgrading from CDH 3 to CDH 4.
- Developed helper classes for abstracting Cassandra cluster connections, acting as a core toolkit.
- Enhanced existing modules written in Python scripts.
- Used dashboard tools like Tableau.
Environment: Hadoop, Linux, MapReduce, HDFS, HBase, Hive, Pig, Tableau, NoSQL, Shell Scripting, Sqoop, Java, Eclipse, Oracle 10g, Maven, open-source technologies (Apache Kafka, Apache Spark), ETL, Hazelcast, Git, Mockito, Python.
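The daily incremental loads described in this role follow a common watermark pattern, comparable to Sqoop's --incremental mode with --last-value. A minimal sketch (the updated_at column and sample rows are hypothetical):

```python
def incremental_batch(rows, last_value):
    """Return rows newer than the saved watermark plus the new watermark,
    mirroring a Sqoop-style --incremental / --last-value import."""
    fresh = [r for r in rows if r["updated_at"] > last_value]
    new_value = max((r["updated_at"] for r in fresh), default=last_value)
    return fresh, new_value

source = [
    {"id": 1, "updated_at": "2016-01-01"},
    {"id": 2, "updated_at": "2016-01-03"},
    {"id": 3, "updated_at": "2016-01-05"},
]
batch, watermark = incremental_batch(source, "2016-01-02")
# The watermark is persisted (e.g. by the Oozie coordinator) for the next daily run.
```

ISO-formatted date strings compare correctly as plain strings, which is why the sketch can use string comparison for the watermark.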
- Involved in design, development and testing phases of the project.
- Implemented the GUI using HTML, JSP, Tiles, Struts tag libraries, and CSS components.
- Configured faces-config.xml for the page navigation rules and created managed and backing beans for the Optimization module.
- Used LDAP for user Authentication and authorization.
- Developed MapReduce programs in Java for parsing the raw data and populating staging Tables.
- Developed UNIX shell scripts to load large number of files into HDFS from Linux File System.
- Experience in setting up the whole application stack; set up and debugged Logstash to send Apache logs to AWS Elasticsearch.
- Collaborated with Database, Network, application and BI teams to ensure data quality and availability.
- Used Impala connectivity from the user interface (UI) and queried the results using Impala QL.
- Wrote and implemented Teradata FastLoad, MultiLoad, and BTEQ scripts, as well as DML and DDL.
- Used Zookeeper to coordinate the servers in clusters and to maintain the data consistency.
- Developed an enterprise application using Spring MVC, JSP, and MySQL.
- Worked on developing client-side Web Services components using JAX-WS technologies.
- Extensively used JUnit for testing the application code for server-client data transfer.
- Developed and enhanced products in design and in alignment with business objectives
- Used SVN as a repository for managing/deploying application code
- Involved in the system integration and user acceptance tests successfully
- Developed the front end using JSTL, JSP, HTML, and JavaScript.
- Used XML to maintain the Queries, JSP page mapping, Bean Mapping etc.
- Used Oracle 10g as the backend database and written PL/SQL scripts.
- Maintained and modified the system based on user feedback using OO concepts.
- Implemented database transactions using Spring AOP and the Java EE CDI capability.
- Enhanced the organization's reputation by fulfilling requests and exploring opportunities.
- Performed business analysis and reporting services, and integrated with Sage Accpac (ERP).
- Developed new and maintained existing functionality using Spring MVC and Hibernate.
- Developed test cases for integration testing using Junit.
- Extensively used tools like AccVerify, Checkstyle, and Klocwork to check the code.
- Created new and maintained existing web pages built in JSP and Servlets.
- Presented the process logical and physical flow to various teams using PowerPoint and Visio diagrams.
Environment: Java JDK (1.5), Java J2EE, Informatica, Oracle 11g (TOAD and SQL Developer), Servlets, JBoss Application Server, MapReduce, HDFS, Waterfall, JSPs, EJBs, DB2, RAD, XML, Web Server, JUnit, Hibernate, MS Access, Microsoft Excel.
- Understood the requirements and the technical aspects and architecture of the existing system.
- Involved in writing SQL queries for fetching data from the Oracle database.
- Developed a multi-tiered web application using J2EE standards.
- Designed and developed Web Services to store and retrieve user profile information from database.
- Used Apache Axis to develop web services and SOAP protocol for web services communication.
- Used Spring DAO concept to interact with Database using JDBC template and Hibernate template.
- Well experienced in deploying and configuring applications onto application servers like WebLogic, WebSphere, and Apache Tomcat.
- Created RESTful web services interface to Java-based runtime engine and accounts.
- Conducted thorough code walkthroughs with team members to check functional coverage and coding standards.
- Actively involved in writing SQL using SQL query builder.
- Followed AGILE Methodology and SCRUM to deliver the product with cross-functional skills.
- Used JUnit to test persistence and service tiers. Involved in unit test case preparation.
- Hands-on experience with software configuration/change control processes and tools like Subversion (SVN), Git, CVS, and ClearCase.
- Actively used the defect-tracking tool JIRA to create and track defects during the QA phase of the project.
- Worked closely with team members on and offshore in development when having dependencies.
- Involved in sprint planning, code review and daily standup meetings to discuss the progress of the application.