- Around 8 years of experience with emphasis on designing and implementing statistically significant analytic solutions on Big Data technologies and Java-based enterprise applications.
- 3+ years of implementation and extensive working experience with a wide array of tools in the Big Data stack, including HDFS, HBase, Hadoop MapReduce, Hive, Pig, Flume, Chukwa, Oozie, Sqoop, Avro, Kafka, ZooKeeper and Spark.
- Experienced with NoSQL databases like HBase, Cassandra and MongoDB.
- Comprehensive experience in building web-based, enterprise-level and standalone applications using JSP, Struts, Spring, Hibernate, JSF and web services.
- Experienced in the complete SDLC, including design, development, testing and production environments.
- Assisted in cluster maintenance, cluster monitoring and troubleshooting, and in managing and reviewing data backups and log files.
- Experience in business analysis, requirement gathering, impact analysis, estimation, review activities, testing, assigning and tracking work, status reporting and providing quick resolution of issues.
- Good communication skills and excellent customer relations in collecting & analyzing user requirements.
- Good knowledge of Cloudera and Hortonworks Hadoop distributions.
- Good exposure to NameNode Federation and MapReduce 2.0 (MRv2) or YARN.
- Hands-on experience in developing MapReduce programs using Apache Hadoop for analyzing Big Data.
- Extensive experience in handling complex unstructured data by writing MapReduce programs.
- Hands-on experience in using different MapReduce design patterns to solve complex MapReduce problems.
- Experienced in handling different file formats like text files, Avro data files, sequence files, XML and JSON files.
- Experienced in debugging and testing MapReduce programs using Counters, MRUnit and EasyMock.
- Expertise in composing MapReduce Pipelines with many user-defined functions using Apache Crunch.
- Experienced in migrating ETL projects into Hadoop using Pig Latin scripts.
- Experienced in handling different data sets using Pig join operations.
- Implemented business logic by writing Pig Latin UDFs in Java and used various UDFs from Piggybank and other sources.
- Experienced in data cleansing using Pig Latin operations and UDFs.
- Expertise in implementing ad hoc queries using HiveQL.
- Responsible for performing extensive data validation using Hive dynamic partitioning and bucketing.
- Expertise in developing Hive generic UDFs that implement complex business logic for incorporation into HiveQL.
- Experienced in using aggregate functions and table-generating functions, and in implementing UDFs to handle complex objects.
- Experienced in using optimized join operations such as map join and sorted bucketed map join.
- Experienced in tuning Hive queries using Hive configuration parameters.
- Experienced in writing queries using Spark SQL.
- Experienced in providing user-based recommendations using the Spark MLlib library.
- Comprehensive knowledge of the different components of the Spark framework.
- Experience in writing programs using Scala.
- Expert database engineer; NoSQL and relational data modeling.
- Responsible for building scalable distributed data solutions using DataStax Cassandra.
- Expertise in HBase Cluster Setup, Configurations, HBase Implementation and HBase Client API.
- Worked on importing data into HBase using HBase Shell and Java API.
- Experience in handling streaming data using Flume and memory channels.
- Experienced in importing/exporting data between relational databases and HDFS using Sqoop.
- Experienced in configuring workflows, submitting jobs and implementing schedulers using Oozie and shell scripts.
- Expertise in ETL using Informatica to facilitate extraction, transformation and loading of data from OLTP systems to OLAP systems.
- Good knowledge of AWS services such as EMR and EC2, which provide fast and efficient processing of Big Data.
- Knowledge of Splunk for logging.
- Experienced in handling distributed messaging using Kafka and JMS.
- Experienced with build tools such as Maven and Ant and continuous integration tools such as Jenkins.
- Developed unit test cases using the JUnit, EasyMock and MRUnit testing frameworks.
- Experienced in Agile SCRUM, RUP (Rational Unified Process) and TDD (Test Driven Development) software development methodologies.
- Expertise in several J2EE technologies like JDBC, Servlets, JSP, Struts, Spring, Hibernate, JPA, JSF, EJB, JMS, JAX-WS, SOAP, jQuery, AJAX, XML, JSON, HTML5/HTML, XHTML, Maven, and Ant.
- Expert knowledge of J2EE design patterns like MVC Architecture, Front Controller, Session Facade, Business Delegate and Data Access Object for building J2EE applications.
- Thorough knowledge of using JAX-WS to access external web services, retrieve the XML response and convert it back to Java objects.
- Experience in using Jenkins for continuous integration and Sonar jobs for Java code quality.
- Extensive experience in developing Internet and intranet applications using J2EE, Servlets, JSP, JBoss, WebLogic, Tomcat, and the Struts framework.
- Extensive experience with DB2 and Oracle 9i/10g/11g databases (database design and SQL queries).
- Good experience in SQL, PL/SQL, Perl Scripting, Shell Scripting, Partitioning, Data modeling, OLAP, Logical and Physical Database Design, Backup and Recovery procedures.
- Developed automated scripts using Unix Shell Scripting to perform database activities.
Hadoop/Big Data/NoSQL Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Avro, Hadoop Streaming, Storm, Kafka, YARN, Crunch, ZooKeeper, HBase, Cassandra
Programming Languages: Java (JDK 5/JDK 6), Python, C, SQL, PL/SQL, Shell Script
IDE Tools: Eclipse, Rational Team Concert, NetBeans
Framework: Hibernate, Spring, Struts, JMS, EJB, JUnit, MRUnit, JAXB
Application Servers: JBoss, Tomcat, WebLogic, WebSphere
Databases: Oracle 11g/10g/9i, MySQL, DB2, Derby, MS-SQL Server
Operating Systems: UNIX, Windows, Linux
Build Tools: Jenkins, Maven, Ant
Reporting Tools: Jasper Reports, iReport
Environment: CDH, MapReduce, Pig, Hive, HAWQ, Sqoop (v1), Oozie, Spark, Scala, Core Java, Oracle 11g, PL/SQL, Kafka, Cassandra, Maven, data modeling.
- Developed various Big Data workflows using custom MapReduce, Pig, Hive, Sqoop, and Flume.
- Built reusable Hive UDF libraries for business requirements, enabling users to call these UDFs in Hive queries.
- Preprocessed logs and semi-structured content stored on HDFS using Pig and imported the processed data into the Hive warehouse, enabling business analysts to write Hive queries.
- Expertise in performance tuning of Hive queries, joins and configuration parameters to improve query response time.
- Implemented custom DataTypes, InputFormat, RecordReader, OutputFormat, RecordWriter for MapReduce computations.
- Created partitions and buckets based on state for further processing using bucket-based Hive joins.
- Created Hive generic UDFs to process business logic that varies based on policy.
- Conducted a POC for Hadoop and Spark as part of the NextGen platform implementation. Implemented a recommendation engine using Scala.
- Developed data pipeline programs with the Spark Scala APIs, performed data aggregations with Hive, and formatted data as JSON for visualization.
- Used Spark Streaming to collect data from Kafka in near real time, perform the necessary transformations and aggregations on the fly to build the common learner data model, and persist the data in a NoSQL store (HBase).
- Used the JSON and Avro SerDes packaged with Hive for serialization and deserialization to parse the contents of streamed log data, and implemented custom Hive UDFs.
- Experienced in using Spark MLlib to provide user-based recommendations.
- Developed MapReduce programs to find the top-selling items by category.
- Implemented optimized joins to handle different data sets using MapReduce programs.
- Developed a workflow in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig.
- Wrote Python scripts for internal testing that read data from a file and push it into a Kafka queue, which in turn is consumed by the Storm application.
- Implemented custom input formats to handle the input data set formats required by the business data.
- Configured big data workflows to run on top of Hadoop using Control-M; these workflows comprise heterogeneous jobs such as Pig, Hive, Sqoop and MapReduce.
- Worked extensively on Pig scripts and Pig UDFs to perform ETL activities.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Worked with Impala for data analysis and exported HDFS data into Netezza tables.
- Experienced in using Sqoop jobs to import data into and export data from HDFS.
- Used Apache Maven extensively to build JAR files of MapReduce programs and deployed them to the cluster.
- Experienced in debugging jobs using the web UI and Cloudera Manager.
- Experienced in writing and implementing unit test cases using JUnit.
Environment: Hortonworks, Hadoop, HBase, MapReduce, HDFS, Hive, Java (JDK 1.7), Pig, Linux, XML, Cassandra, ZooKeeper, Sqoop, Oozie, Informatica, SQL and MySQL.
- Experienced in migrating ETL scripts to Pig Latin scripts to perform cleaning and transformation operations.
- Set up Hadoop clusters and benchmarked them for internal use.
- Designed and developed ETL applications and automated them using Oozie workflows and shell scripts with error handling and mailing systems.
- Knowledge and experience with Hortonworks Distribution setup.
- Used Cassandra to design and handle large amounts of data.
- Used Cassandra CQL with Java APIs to retrieve data from Cassandra tables.
- Developed multiple MapReduce jobs and used Hive and Pig scripts for analyzing the data.
- Developed several advanced MapReduce programs to process data files received.
- Created reports for the BI team using Sqoop to import data into HDFS and Hive.
- Used transformations such as Filter, Expression, Sequence Generator, Update Strategy, Joiner, SQL and Lookup (file and database) with Hive to develop robust mappings in Informatica Designer.
- Used Avro SerDes to handle Avro-format data in Hive.
- Developed Hive queries to process the data for visualization.
- Used Flume and Sqoop to load data from multiple sources into HDFS to be processed by Pig and Hive to create a more usable datastore.
- Developed a banker's rounding UDF for Hive and implemented Teradata-style rounding in Hive.
- Migrated data into HBase using MapReduce to perform real-time queries.
- Implemented coprocessors and observers in HBase to improve data-handling performance.
- Handled CRUD operations in HBase using the Java API.
- Extensive understanding of HBase architecture.
- Developed MapReduce jobs to validate and implement business logic.
- Applied Agile methodology to collect requirements and develop solutions in the Hadoop ecosystem.
- Implemented performance-tuning techniques along various stages of the Migration process.
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
Environment: HDFS, Sqoop, Pig, Hive, Cassandra, MapReduce, Cloudera Manager (CDH3), Java, Oracle 11g.
- Coordinated with business customers to gather business requirements.
- Involved in the design and development of technical specifications using Hadoop technology.
- Involved in implementing the data model in the Cassandra database.
- Integrated Cassandra with MapReduce to implement bulk data movement into Cassandra.
- Performed different kinds of transactions on Cassandra data using the Thrift API.
- Handled importing data from various sources, performed transformations using Pig and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Wrote custom MapReduce programs to analyze data and used Pig Latin to clean unwanted data.
- Used the Spring Framework, including the Spring AOP, Spring ORM and Spring JDBC modules.
- Developed the application using the Spring Framework with JSP and a Model-View-Controller (MVC) architecture.
- Worked on exporting the analyzed data to the existing relational databases using Sqoop, making it available for visualization and report generation by the BI team.
- Created Pig Latin scripts to sort, group, join and filter the enterprise-wide data.
- Analyzed large data sets by running Hive queries and Pig scripts.
- Involved in moving all log files generated from various sources to HDFS for further processing.
- Collected web server log data for analysis using Apache Flume.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Involved in creating a workflow engine to run multiple Hive and Pig jobs.
Environment: Core Java, Servlets, JSF, JDBC, ADF Rich Client UI Framework, ADF-BC (BC4J) 11g, web services using Oracle SOA (BPEL), Oracle WebLogic.
- Created the Database, User, Environment, Activity, and Class diagrams for the project (UML).
- Implemented the database using the Oracle database engine.
- Used iBATIS framework with Spring Framework for data persistence and transaction management.
- Designed and developed a fully functional, generic n-tiered J2EE application platform in an Oracle technology-driven environment. The entire infrastructure application was developed using Oracle JDeveloper in conjunction with Oracle ADF-BC and Oracle ADF Rich Faces.
- Created an entity object (business rules and policy, validation logic, default value logic, security).
- Created View objects, View Links, Association Objects, Application modules with data validation rules (Exposing Linked Views in an Application Module), LOV, dropdown, value defaulting, transaction management features.
- Web application development using J2EE: JSP, Servlets, JDBC, Java Beans, Struts, Ajax, JSF, JSTL, Custom Tags, EJB, JNDI, Hibernate, ANT, JUnit and Apache Log4J, Web Services, Message Queue (MQ).
- Designed GUI prototypes using ADF 11g GUI components before finalizing them for development.
- Created reusable components (ADF Library and ADF Task Flow).
- Experience using version control systems such as CVS, PVCS, and Rational ClearCase.
- Created modules using bounded and unbounded task flows.
- Generated WSDL (web services) and created workflows using BPEL.
- Handled the AJAX functions (partial trigger, partial submit, auto submit).
- Created the Skin for the layout.
Environment: Java, JavaScript, HTML, JDBC drivers, SOAP web services, UNIX, shell scripting, SQL Server.
- Implemented the project according to the Software Development Life Cycle (SDLC).
- Implemented JDBC for mapping an object-oriented domain model to a traditional relational database.
- Acted as the offshore point of contact (POC) for development.
- Created Stored Procedures to manipulate the database and to apply the business logic according to the user’s specifications.
- Developed generic classes encapsulating frequently used functionality so that it can be reused.
- Implemented an exception management mechanism using Exception Handling Application Blocks.
- Designed and developed user interfaces using JSP, JavaScript and HTML.
- Involved in database design and in developing SQL queries and stored procedures on MySQL.
- Used CVS for maintaining the source code.
- Implemented logging with log4j.
- Reported day-to-day work status by email to onsite and offshore leads.
- Wrote unit test cases and performed unit testing.