- 8+ years of professional IT experience, including 4+ years of hands-on Hadoop experience with Cloudera and Hortonworks; working environment includes MapReduce, HDFS, HBase, Zookeeper, Oozie, Hive, Sqoop, Pig, Spark, and Flume.
- Expertise in installing, configuring, and using Hadoop ecosystem components such as HDFS, MapReduce, YARN, Zookeeper, Sqoop, Flume, Hive, HBase, Spark, and Oozie on Hadoop (CDH3/CDH4 and Hortonworks) architectures, with MapReduce programming using Hive and Java.
- Experience in configuring files such as core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml based on job requirements.
- Excellent understanding of Hadoop architecture and the components of a Hadoop cluster (JobTracker, TaskTracker, NameNode, and DataNode).
- Experience analyzing large data sets using HiveQL, Pig Latin, HBase, and custom MapReduce programs in Java.
- Hands-on experience performing ad-hoc queries on structured data using HiveQL; used partitioning, bucketing, and join techniques in Hive for faster data access.
- Extensively worked on Hive, Pig and Sqoop for sourcing and transformations.
- Experience working on deploying a Hadoop cluster using Cloudera 5.X integrated with Cloudera Manager for monitoring and Alerting.
- Experience in implementing Real-Time event processing and analytics using messaging systems like Spark Streaming.
- Hands-on experience in data mining, implementing complex business logic, optimizing queries in HiveQL, and controlling data distribution through partitioning and bucketing to enhance performance.
- In-depth knowledge of Scala and experience building Spark applications in Scala.
- Experience across all phases of the Software Development Life Cycle: analysis, design, development, and testing.
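The bucketing technique mentioned above can be sketched as follows. This is a hedged toy model of how Hive assigns a row to a bucket (hash of the clustering key modulo the bucket count); Hive's real implementation uses its own hash functions, and the key names here are illustrative.

```python
# Toy model of Hive bucketing: bucket = hash(key) % num_buckets.
# Uses a Java-style string hash, similar in spirit to what Hive applies
# to string keys; not Hive's actual production code.

def java_string_hash(s: str) -> int:
    """Java String.hashCode()-style polynomial hash, as a signed 32-bit int."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h

def bucket_for(key: str, num_buckets: int) -> int:
    """Bucket index in [0, num_buckets); masked to stay non-negative."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_buckets

# Rows with the same key always land in the same bucket, which is what
# lets Hive do bucket-map joins and sampled scans efficiently.
rows = ["user_1", "user_2", "user_3", "user_42"]
buckets = {k: bucket_for(k, 4) for k in rows}
```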
- Experienced in creating and analyzing Software Requirement Specifications (SRS) and Functional Specification Document (FSD). Strong knowledge of Software Development Life Cycle (SDLC).
- Good knowledge of mainframes; worked extensively with VS COBOL II and JCL, well versed in DB2 and file systems such as PS files and VSAM, across several application domains.
- Hands-on experience with major Hadoop ecosystem components including Pig, Sqoop, Spark, Hive, HBase, and HBase-Hive integration, with knowledge of MapReduce/HDFS and the Spark framework.
- Experience in implementing software best practices, including Design patterns, Use Cases, Object oriented analysis and design, agile methodologies, and Software/System Modeling (UML).
- Worked extensively on core Java concepts such as polymorphism, collections, inheritance, serialization, synchronization, multi-threading, exception handling, and socket programming.
- Experience in software testing, JUnit testing, regression testing, and defect tracking and management using JIRA.
- Highly motivated, able to work independently or as an integral part of a team, and committed to the highest levels of professionalism.
Programming Languages : C, C++, Java, Unix Shell Scripting, PL/SQL
J2EE Technologies: Spring, Servlets, JSP, JDBC, Hibernate.
Big Data Ecosystem: HDFS, HBase, Map Reduce, Hive, Pig, Spark, Kafka, Storm, Sqoop, Impala, Cassandra, Oozie, Zookeeper, Flume.
DBMS: Oracle 11g, SQL Server, MySQL, IBM DB2.
Modeling Tools: UML on Rational Rose 4.0
Web Services: RESTful, SOAP.
IDEs: Eclipse, NetBeans, WinSCP, Visual Studio, and IntelliJ.
Operating Systems: Windows, UNIX, Linux (Ubuntu), Solaris, CentOS.
Version and Source Control: CVS, SVN and IBM Rational Clear Case.
Servers: Apache Tomcat, WebLogic, and WebSphere.
Frameworks: MVC, Spring, Struts, Log4j, JUnit, Maven, ANT.
Sr. Hadoop Developer
- Worked on loading disparate data sets coming from different sources to BDpaas (HADOOP) environment using SQOOP.
- Developed UNIX scripts for batch load and driver code to bring large volumes of data from relational databases to the big data platform.
- Developed Pig queries to load data to HBase. Leveraged Hive queries to create ORC tables.
- Worked with a team of developers on Python applications for RISK management.
- Used EMR (Elastic MapReduce) to perform big data operations in AWS; created ORC tables to improve reporting performance.
- Involved in the coding and integration of several business-critical modules of the CARE application using Java, Spring, Hibernate, and REST web services on a WebSphere application server.
- Knowledge of Cassandra maintenance and tuning, both database and server. Databases: Cassandra, MongoDB, MySQL, Oracle.
- Coordinated Kafka operations and monitoring (via JMX) with DevOps personnel; formulated balancing strategies and assessed the impact of producer and consumer message (topic) consumption to prevent overruns. Aggressively monitored partitioning versus topic production via JMX interfaces and developed a standalone Kafka setup.
- Built POCs with the Confluent Schema Registry, REST Proxy, and Kafka Connectors for Cassandra and HDFS (Hadoop 2.0).
- Developed efficient MapReduce programs for filtering out the unstructured data and developed multiple MapReduce jobs to perform data cleaning and preprocessing on Hortonworks.
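The filtering-and-cleaning idea behind such MapReduce jobs can be sketched as below. This is a hedged illustration in plain Python rather than the actual Hadoop Java code; the `id,name,amount` field layout is an assumption for the example.

```python
# Toy map/reduce pipeline: map parses raw lines and drops malformed
# records; reduce groups by key and deduplicates (last record wins).
from itertools import groupby

def map_phase(line):
    """Emit (key, record) for well-formed 'id,name,amount' lines; skip the rest."""
    parts = line.strip().split(",")
    if len(parts) != 3:
        return []  # filter out unstructured/malformed input
    rec_id, name, amount = parts
    if not amount.lstrip("-").isdigit():
        return []  # non-numeric amount field: also dropped
    return [(rec_id, (name, int(amount)))]

def reduce_phase(pairs):
    """Group by key (the 'shuffle' sort) and keep one record per key."""
    pairs = sorted(pairs, key=lambda kv: kv[0])
    return {k: list(vs)[-1][1] for k, vs in groupby(pairs, key=lambda kv: kv[0])}

raw = ["1,alice,10", "garbage line", "2,bob,xx", "1,alice,12"]
mapped = [kv for line in raw for kv in map_phase(line)]
cleaned = reduce_phase(mapped)
```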
- Very good experience in monitoring and managing the Hadoop cluster using Hortonworks.
- Responsible for migrating the code base from the Cloudera platform to Amazon EMR; evaluated Amazon ecosystem components such as Redshift and DynamoDB.
- Developed MapReduce/ EMR jobs to analyze the data and provide heuristics and reports. The heuristics were used for improving campaign targeting and efficiency.
- Continuous monitoring and managing the Hadoop cluster through Cloudera Manager.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Architected a lightweight custom Kafka broker design to reduce message retention from the default 7 days to 30 minutes.
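The retention change above can be illustrated with a minimal sketch. This is a toy in-memory model of time-based retention only, not Kafka's actual segment-file implementation; the timestamps are illustrative.

```python
# Toy model of time-based log retention: records older than the
# retention window (30 minutes here, versus Kafka's 7-day default)
# are pruned from the log.

RETENTION_SECONDS = 30 * 60  # 30-minute retention window

def prune_log(log, now):
    """Keep only (timestamp, message) records within the retention window."""
    return [(ts, msg) for ts, msg in log if now - ts <= RETENTION_SECONDS]

now = 10_000
log = [(now - 7200, "old"), (now - 1900, "stale"), (now - 60, "fresh")]
retained = prune_log(log, now)
```

In real Kafka this is governed by broker/topic settings such as `log.retention.ms`, and pruning happens at segment granularity rather than per record.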
- Worked with Parallel connectors for Parallel Processing to improve job performance while working with bulk data sources in Talend.
- Analyzed source data to assess data quality using Talend Data Quality.
- Deployed Talend jobs on various environments including dev, test and production environments.
- Implemented business logic using Python/Django. Created Health Allies eligibility and transactional feed extracts using Hive, HBase, and UNIX to migrate feed generation from a mainframe application called CES (Consolidated Eligibility Systems) to big data.
- Used bucketing concepts in Hive to improve the performance of HQL queries. Developed Spark scripts using Scala shell commands.
- Created a MapReduce program that inspects current and prior versions of HBase data to identify transactional updates; these updates are loaded into Hive external tables, which are in turn referenced by Hive scripts during transactional feed generation.
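The version-diff idea in the bullet above can be sketched as follows. This is a hedged illustration: real HBase cell versions are read through the HBase client API, whereas here the current and prior versions are modeled as two plain dicts keyed by row key.

```python
# Toy model of detecting transactional updates by comparing the current
# HBase cell version with the prior one per row key: a key counts as an
# update if it is new or its value changed.

def transactional_updates(current, prior):
    """Row keys that are new or whose value differs from the prior version."""
    return {k: v for k, v in current.items() if prior.get(k) != v}

prior = {"m1": "A", "m2": "B"}
current = {"m1": "A", "m2": "C", "m3": "D"}
updates = transactional_updates(current, prior)  # only m2 and m3 changed
```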
- Involved in Agile methodologies, daily Scrum meetings, Sprint planning.
Environment: HDFS, Pig, Hive, MapReduce, Spark, Sqoop, HBase, Apache Kafka 0.8/0.9/0.10, Zookeeper, Apache NiFi, Talend, NoSQL, and Linux.
Confidential, Jacksonville, Florida
- Responsible for building scalable distributed data solutions using Hadoop.
- This project downloads data generated by sensors from car activities; the data is collected into HDFS from online aggregators via Kafka.
- Experience in creating Kafka producer and Kafka consumer for Spark streaming which gets the data from different learning systems of the patients.
- Spark Streaming collects this data from Kafka in near-real-time and performs necessary transformations and aggregation on the fly to build the common learner data model.
- Used Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing.
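The micro-batching described above can be sketched as a toy model. This is a hedged illustration of Spark Streaming's batching concept only, not the DStream API itself; the 10-second interval and timestamps are assumptions for the example.

```python
# Toy model of Spark Streaming's micro-batch model: the incoming stream
# is cut into fixed-interval windows, and each window's events form one
# batch handed to the engine for batch processing.

BATCH_INTERVAL = 10  # seconds per micro-batch

def to_batches(events):
    """Group (timestamp, payload) events into consecutive batch windows."""
    batches = {}
    for ts, payload in events:
        window = ts // BATCH_INTERVAL  # which micro-batch this event lands in
        batches.setdefault(window, []).append(payload)
    return [batches[w] for w in sorted(batches)]

events = [(1, "a"), (3, "b"), (12, "c"), (25, "d")]
batches = to_batches(events)  # three windows: 0-9s, 10-19s, 20-29s
```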
- Experience in AWS spinning up EMR clusters to process large volumes of data stored in S3 and push it to HDFS.
- Implemented Spark SQL to access hive tables into spark for faster processing of data.
- Involved in Converting Hive/SQL queries into Spark transformations using Spark RDD, Scala.
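The Hive-to-RDD conversion above can be illustrated with a minimal sketch, written here in plain Python rather than Scala. The query shape (`SELECT dept, SUM(salary) ... GROUP BY dept`) and the data are assumptions for the example; `reduce_by_key` mimics Spark's `reduceByKey` transformation.

```python
# Toy illustration: a SQL GROUP BY aggregation expressed as RDD-style
# transformations, i.e. (key, value) pairs folded with a reduceByKey.

def reduce_by_key(pairs, fn):
    """Minimal reduceByKey: fold values sharing a key with fn."""
    acc = {}
    for k, v in pairs:
        acc[k] = fn(acc[k], v) if k in acc else v
    return acc

# Equivalent to: SELECT dept, SUM(salary) FROM emp GROUP BY dept
emp = [("eng", 100), ("eng", 120), ("ops", 80)]
totals = reduce_by_key(emp, lambda a, b: a + b)
```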
- Interacted with Cloudera support, logged issues in the Cloudera portal, and fixed them per recommendations.
- Upgraded the Cloudera Hadoop ecosystems in the cluster using Cloudera distribution packages.
- Debugged and resolved major Cloudera Manager issues by interacting with the Cloudera team.
- Used Apache Oozie for scheduling and managing the Hadoop Jobs. Extensive experience with Amazon Web Services (AWS).
- Developed Python/Django application for Google Analytics aggregation and reporting.
- Developed and updated social media analytics dashboards on regular basis.
- Good understanding of NoSQL databases such as HBase, Cassandra and MongoDB.
- Supported Map Reduce Programs running on the cluster and wrote custom Map Reduce Scripts for Data Processing in Java.
- Worked with Apache NiFi for data ingestion; triggered shell scripts and scheduled them using NiFi.
- Monitored all NiFi flows to receive notifications when no data passes through a flow for more than a specified time.
- Created NiFi flows to trigger Spark jobs, with email notifications sent on any failures.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Worked on migrating Pig scripts and MapReduce programs to the Spark DataFrames API and Spark SQL to improve performance. Involved in moving all log files generated from various sources to HDFS for further processing through Flume, and processed the files using Piggybank.
- Used Flume to collect, aggregate, and store web log data from sources such as web servers, mobile, and network devices, and pushed it into HDFS; used Flume to stream log data from various sources.
- Used the Avro file format compressed with Snappy in intermediate tables for faster processing, and the Parquet file format for published tables; created views on the tables.
- Created Sentry policy files to give business users access to the required databases and tables via Impala in the dev, test, and prod environments.
- Implemented test scripts to support test driven development and continuous integration.
- Good understanding of ETL tools and how they can be applied in a Big Data environment.
- Involved in Agile methodologies, daily Scrum meetings, Sprint planning.
Environment: Hadoop, MapReduce, Cloudera, Spark, Kafka, HDFS, Hive, Pig, Oozie, Scala, Eclipse, Flume, Oracle, UNIX Shell Scripting.
Confidential - Auburn Hills, MI
Sr. Java Developer
- Developed applications using Spring, Hibernate, Spring Batch, and web services (SOAP using Apache Axis containers, and RESTful web services).
- Used the Spring Framework at the business tier and Spring's BeanFactory for initializing services.
- Used Spring IoC to inject services and their dependencies.
- Developed server-side scripts in Python to customize GIT and integrate it with tools like JIRA and Jenkins.
- Designed and developed several EJBs using Session facade pattern.
- Handled Java multithreading, collections, file handling, and serialization in back-end components.
- Carried out the design, development, and testing phases of software development using Agile methodology and Test-Driven Development (TDD).
- Performed Test Driven Development (TDD) using JUnit and Mockito.
- Implemented Hibernate to persist the data into Database and wrote HQL based queries to implement CRUD operations on the data.
- Developed Web Services to communicate to other modules using XML based SOAP using Apache Axis Container and WSDL.
- Developed REST API components using Jersey REST framework following TDD approach for processing Product Upgrade requests.
- Wrote JUnit test cases for controller classes using Mockito and the JUnit framework.
- Developed test code in Java using the Eclipse IDE and a testing framework.
- Used the testing framework to run unit tests and Maven to build the project.
- Designed the complex BPDs (Business Process Definition) of the application.
- Extensively used design patterns like Singleton, Value Object, Service Delegator and Data Access Object.
- Used Spring core annotations for dependency injection and used Apache Camel to integrate with the Spring framework.
- Followed top down approach to implement SOAP based web services & used Apache AXIS commands to generate artifacts from WSDL file.
- Used SOAP-UI to test the Web Services using WSDL.
- Used the Jersey API to develop RESTful web services.
- Development and Integration of the Application using Eclipse IDE.
- Used Maven tool to build project and JUnit to develop unit test cases.
- Used the Log4j framework to log the system execution details to log files.
Environment: Java 1.7, Spring, Hibernate, HTML, HTML5, TDD, CSS, CSS3, Java Script, AJAX, Eclipse, XML, CVS, Maven, WSDL, SOAP, Apache AXIS, JSE, JAX-WS, AngularJS, Python, JAX-RS, JERSEY, SOAP UI, Log4J, DB2, Oracle 11g, IBM Web Sphere server, UNIX, DB2- SQL & PL/SQL.
- Played an active role in the team by interacting with welfare business analyst/program specialists and converted business requirements into system requirements.
- Developed analysis level documentation such as Use Case, Business Domain Model, Activity & Sequence and Class Diagrams.
- Conducted Design reviews and Technical reviews with other project stakeholders.
- Implemented Services using Core Java.
- Developed and deployed UI layer logics of sites using JSP.
- Used Struts (MVC) to implement the business model logic.
- Worked with Struts MVC objects like Action Servlet, Controllers, and validators, Web Application Context, Handler Mapping, Message Resource Bundles and JNDI for look-up for J2EE components.
- Developed dynamic JSP pages with Struts.
- Used built-in/custom Interceptors and Validators of Struts.
- Developed the XML data object to generate the PDF documents and other reports.
- Used Hibernate, DAO, and JDBC for data retrieval and modifications to the database.
- Messaging and interaction with web services is done using SOAP.
- Developed JUnit test cases for unit tests as well as system and user test scenarios.
- Involved in Unit Testing, User Acceptance Testing and Bug Fixing.
- Implemented mid-tier business services to integrate UI requests to DAO layer commands.
- Analyzed Business requirements based on the Business Requirement Specification document.
- Performed extensive query analysis and tuning with indexes and hints; wrote numerous complex queries involving sub-queries, correlated queries, UNION/UNION ALL, MINUS, inline SQL, and analytical functions.
- Developed program specifications for PL/SQL Procedures and Functions to do the data migration and conversion.
- Created a wide range of data types, tables, index types, and scoped variables.
- Designed the front-end interface for the users, using Oracle Forms.
- Involved in database development by creating Oracle PL/SQL Functions, Procedures, Triggers, Packages, Records and Collections.
- Involved in development of ETL process using SQL* Loader and PL/SQL Package.
- Developed and customized Forms/Reports Using Oracle D2K.
- Designed Data layouts and Developer Reports using Oracle D2K.
- Implemented batch jobs (shell scripts) for loading database tables from Flat Files using SQL*Loader.
- Participated in Performance Tuning using Explain Plan.
- Created numerous database triggers using PL/SQL.
- Involved in Technical Documentation, Unit test, Integration Test, writing the Test plan and version controlling with CVS.
- Created UNIX shell and Perl scripts for data file handling and manipulations.
Environment: Oracle 9i/10g, SQL, PL/SQL, SQL*Plus, Oracle D2K, SQL*Loader.