- 7+ years of experience in Java/J2EE technologies across all phases of the software development life cycle, plus Big Data analytics with hands-on experience writing MapReduce jobs on the Hadoop ecosystem, including Hive, Pig, HBase, Flume and Oozie.
- 3+ years of experience in Hadoop ecosystem development and administration, including MapReduce, Hive, Pig, HBase, Sqoop, Flume, Oozie and HDFS administration.
- Excellent knowledge of Hadoop architecture and its components, including HDFS, ResourceManager, NodeManager, NameNode, DataNode and the MapReduce programming paradigm.
- Experience in migrating data between HDFS and relational database systems, in both directions, using Sqoop.
- Experienced in configuring Flume to stream data into HDFS.
- Experienced in writing Pig Latin and Hive SQL scripts for data analysis, and in developing UDFs and UDAFs in Java to extend Hive and Pig Latin functionality.
- Good experience with Hive partitioning and bucketing, performing different types of joins on Hive tables, and implementing Hive SerDes such as Regex, JSON and Avro.
- Experience in using Microsoft Azure, including ADF, ADLS, Azure Blob Storage and Cosmos DB.
- Good experience in building pipelines with Azure Data Factory and moving data into Azure Data Lake Store.
- Experience in using HBase as a backend database for application development.
- Good knowledge of job scheduling and monitoring through Oozie and ZooKeeper.
- Good experience in setting up and configuring clusters in AWS.
- Used Avro and JSON SerDes to handle Avro-format data in Hive and Impala.
- Knowledge of Apache Solr/Lucene for developing open-source enterprise search platforms.
- Good understanding of NoSQL databases and hands-on experience writing applications against NoSQL databases such as HBase and Cassandra.
- Diverse experience in utilizing Java technologies in business, web and client-server environments, including the Java platform, JSP, Servlets, JavaBeans, JSTL, JSP custom tags, EL, JSF and JDBC.
- Experience in web technologies like XML, XSD, XSLT, CSS, WSDL, REST and SOAP.
- Contributed to the automation approach for Data-Driven Testing (DDT), regression testing and functional testing using WinRunner and QuickTest Professional (QTP).
- Specialized in performance testing of applications using load-testing tools such as LoadRunner and Performance Center.
- Developed stored procedures and queries using PL/SQL.
- Proficient in databases: Oracle, MS SQL Server, MySQL and DB2.
- Strong skills in performing System, Acceptance, Regression, Stress, Performance, Load, Functional, Front-End and Back-End testing.
- Expertise in Informatica, SSIS, SSRS, SSAS, Cognos and Erwin.
- Good experience in the Agile software delivery process using Scrum.
- Good at developing, publishing and executing test plans, test procedures and test results.
- Highly flexible and accustomed to working in both large and small group settings.
- Excellent analytical, problem-solving, communication and interpersonal skills, with a good work ethic and the ability to interact with individuals at all levels of the organization.
- Highly results-oriented and proactive, with a proven ability to learn new technologies quickly and implement them successfully in production.
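The Hive SerDe experience above amounts to mapping raw records onto named columns; a minimal pure-Python sketch of the same idea (hypothetical log format and field names, standing in for Hive's RegexSerDe and JsonSerDe):

```python
import json
import re

# Hypothetical access-log layout; the regex plays the role a RegexSerDe
# plays when mapping raw text onto table columns.
LOG_PATTERN = re.compile(r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<req>[^"]*)"')

def parse_regex_record(line):
    """Map a raw log line onto named columns, RegexSerDe-style."""
    m = LOG_PATTERN.match(line)
    return m.groupdict() if m else None

def parse_json_record(line):
    """Map a JSON document onto columns, JsonSerDe-style."""
    return json.loads(line)

if __name__ == "__main__":
    row = parse_regex_record('10.0.0.1 - - [01/Jan/2017:00:00:01] "GET /index.html"')
    print(row["ip"])      # 10.0.0.1
    doc = parse_json_record('{"user": "alice", "clicks": 3}')
    print(doc["clicks"])  # 3
```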
Hadoop/Big Data: Hadoop (Cloudera, Azure HDInsight, Hortonworks), HDFS, MapReduce, HBase, Spark, Pig, Hive, Sqoop, Flume, MongoDB, Kafka, Cassandra, Oozie, ZooKeeper, Impala, Solr
Java & J2EE Technologies: Java JDK 1.4/1.5/1.6 (JDK 5/JDK 6), HTML, Servlets, JSP, JDBC, JNDI, JavaBeans
IDEs: Eclipse, NetBeans
Frameworks: MVC, Struts, Hibernate, Spring
Programming Languages: C, C++, C#, Python, Ant scripts, Linux shell scripts, SQL, PL/SQL
Databases: Oracle 11g/10g/9i, MySQL, DB2, MS SQL Server
Web Servers: WebLogic, WebSphere, Apache CXF/XFire, Apache Axis, SOAP, REST
Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP
ETL Tools: Informatica, Pentaho
Development Tools: TOAD, Maven, Visio, Rational Rose, Endur 8.x/10.x/11.x
Operating Systems: Mac OS X, UNIX, Windows, Linux
Big Data Developer
- Developed Spark scripts using Python on Azure HDInsight for data aggregation and validation, and verified their performance against MapReduce jobs.
- Built pipelines to move hashed and un-hashed data from Azure Blob Storage to Azure Data Lake.
- Utilized Azure HDInsight to monitor and manage the Hadoop Cluster.
- Collaborated on insights with Data Scientists, Business Analysts and Partners.
- Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark with Python.
- Created pipelines to move data from on-premise servers to Azure Data Lake.
- Utilized Python pandas DataFrames for data analysis.
- Enhanced and optimized Spark scripts to aggregate, group and run data mining tasks.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and PySpark.
- Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames and pair RDDs.
- Used the Spark API over Hadoop YARN to perform analytics on data and monitor scheduling.
- Implemented schema extraction for Parquet and Avro file formats.
- Experienced in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism, and tuning memory.
- Developed Hive queries to process the data and generate the data cubes for visualization.
- Built functions to ingest specific columns into schemas for Spark applications.
- Experienced in handling large data sets using partitions, Spark's in-memory capabilities, and effective and efficient joins, transformations and other operations during the ingestion process itself.
- Developed data integration programs in a Hadoop and RDBMS environment with both traditional and non-traditional data sources for data access and analysis.
- Analyzed SQL scripts and designed the solution to implement using PySpark.
- Used reporting tools like Power BI for generating data reports daily.
- Handled several techno-functional responsibilities including estimates, identifying functional and technical gaps, requirements gathering, designing solutions, development, developing documentation and production support.
Environment: Hadoop (HDFS/Azure HDInsight), Hive, YARN, Python/Spark, Linux, MS SQL Server, Power BI.
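The aggregation work described above follows Spark's RDD reduceByKey pattern; a schematic pure-Python sketch of that pattern (hypothetical sales data; a real job would call PySpark's `rdd.reduceByKey` instead):

```python
from collections import defaultdict
from functools import reduce

def reduce_by_key(pairs, fn):
    """Group (key, value) pairs and fold each group, like Spark's reduceByKey."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce(fn, values) for key, values in groups.items()}

if __name__ == "__main__":
    # Hypothetical (region, amount) records standing in for RDD elements.
    sales = [("east", 10), ("west", 5), ("east", 7), ("west", 1)]
    totals = reduce_by_key(sales, lambda a, b: a + b)
    print(totals)  # {'east': 17, 'west': 6}
```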
Confidential, Albany, NY
- Worked closely with the source system analysts and architects to identify the attributes and convert the business requirements into technical requirements.
- Actively involved in setting up coding standards and prepared low-level and high-level documentation.
- Used Spark Streaming APIs to perform transformations and actions on the fly for building the common learner data model, which gets data from Kafka in near real time and persists it into Cassandra.
- Implemented the Hive queries for aggregating the data and extracting useful information by sorting the data according to required attributes.
- Imported and exported data into HDFS and Hive using Sqoop.
- Evaluated the use of Zookeeper in cluster co-ordination services.
- Involved in loading data from UNIX file system to HDFS.
- Assisted in migrating from On-Premises Hadoop Services to cloud based Data Analytics using AWS.
- Developed MapReduce jobs to validate and implement business logic.
- Used Avro SerDes to handle Avro-format data in Hive and Impala.
- Used Solr/Lucene to develop an open-source enterprise search platform in testing and development environments.
- Used AWS remote computing services such as S3, EC2.
- Implemented Spark RDD transformations and actions for business analysis, and worked with Spark accumulators and broadcast variables.
- Designed and implemented MapReduce based large-scale parallel relation-learning system.
- Involved in tuning the Cassandra cluster by changing the parameters for read operations, compaction, memory cache and row cache.
- Worked on the Oozie workflow engine for job scheduling.
- Imported required tables from RDBMS to HDFS using Sqoop, and used Storm/Spark Streaming and Kafka for real-time streaming of data into HBase.
- Experienced in designing ETL solutions using Informatica PowerCenter tools such as Designer, Repository Manager, Workflow Manager and Workflow Monitor.
- Gained knowledge in installing clusters, commissioning and decommissioning DataNodes, NameNode recovery, capacity planning and slot configuration.
- Utilized Apache Hadoop by Hortonworks to monitor and manage the Hadoop Cluster.
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
- Converted the feasible business requirements to technical tasks in Design Documents.
- Worked according to production environment configuration and functional change requests.
- Involved in Unit level and Integration level testing and prepared supporting documents for proper deployment.
Environment: Hadoop (HDFS/Hortonworks), AWS S3, EC2, Pig, Hive, UDFs, Sqoop, DataStax Cassandra, Scala/Spark, Linux, Storm, Solr, Lucene, Kafka, Impala.
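The MapReduce jobs mentioned above follow the classic map/shuffle/reduce shape; a minimal pure-Python word-count sketch of that flow (illustrative only, not the production Java code):

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word, as a mapper would."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle: group intermediate values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

if __name__ == "__main__":
    lines = ["hadoop spark hadoop", "spark hive"]
    print(reduce_phase(shuffle(map_phase(lines))))  # {'hadoop': 2, 'spark': 2, 'hive': 1}
```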
Confidential, Hamilton, NJ
- Developed scripts and UDFs using both Spark SQL and Spark Core in Scala for data aggregation and queries, and verified their performance against MapReduce jobs.
- Responsible for building scalable distributed data solutions using Hadoop components.
- Handled importing of data from various data sources, performed transformations using Hive and Spark, and loaded data into HDFS.
- Wrote job work flows as per the requirements and their dependencies.
- Used Sqoop to dump data from relational database into HDFS for processing.
- Worked on implementing partitions, dynamic partitioning and buckets in Hive for efficient data access.
- Hands-on experience copying log files into HDFS from Greenplum using Flume.
- Developed and implemented Pig UDFs to preprocess data and use it for analysis.
- Wrote custom MapReduce codes, generated JAR files for user defined functions and integrated with HIVE to help the analysis team with the statistical analysis.
- Used Zookeeper for job synchronization.
- Implemented the Fair Scheduler on the JobTracker to share cluster resources among users' MapReduce jobs.
- Prepared Avro schema files for generating Hive tables, and shell scripts for executing Hadoop commands in a single run.
- Wrote queries in DataStax Cassandra for searching, sorting and generating data.
- Gained working knowledge of NoSQL databases (HBase and DataStax Cassandra).
- Used Sqoop to load data from DB2 into the HBase environment.
- Provided in-depth technical and business knowledge to ensure efficient design, programming, implementation and ongoing support for the application.
- Responsible for complete SDLC management using Agile Methodology.
- Installed and configured Hadoop MapReduce, HDFS, Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
Environment: Hadoop, MapReduce, Cloudera Manager, HDFS, Hive, Pig, HBase, Solr, Sqoop, Spark/Scala, Flume, Oozie, UNIX shell scripting, SQL, Eclipse.
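Hive's dynamic partitioning and bucketing, as used above, boil down to routing each row by a partition key plus a hash of a bucket column; a small pure-Python sketch of that routing (hypothetical column names, and a simplified stand-in for Hive's hash function):

```python
def hash_bucket(value, num_buckets):
    """Deterministic hash-mod bucketing (Hive uses a similar hash-mod scheme)."""
    return sum(ord(c) for c in str(value)) % num_buckets

def route_row(row, num_buckets):
    """Return the (partition, bucket) a row would land in, Hive-style:
    partition by the row's date column, bucket by hash of the user id."""
    partition = row["event_date"]  # dynamic partition key
    bucket = hash_bucket(row["user_id"], num_buckets)
    return partition, bucket

if __name__ == "__main__":
    row = {"event_date": "2017-01-01", "user_id": "u42"}
    print(route_row(row, 4))  # ('2017-01-01', 3)
```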
Confidential, Charlotte, NC
- Part of core development team involved in the Re-engineering activities.
- Designed, implemented and tested the Spring domain model for the services using core Java.
- Participated in a feasibility study on JSF MVC architecture for the project.
- Wrote custom support modules for upgrade implementation using PL/SQL and UNIX shell scripts.
- JSF Migration: worked on the re-engineering effort to convert the proprietary servlet-based application to a JSF-based MVC architecture.
- Spring Introduction: involved in complete hands-on programming on the core product development using J2EE, JSF and Spring.
- POJO Architecture: re-engineered the application using IoC principles, moving from a heavyweight to a lightweight model by removing Enterprise JavaBeans and reworking the business model on a simple POJO-based architecture.
- Participated in the activities to convert services to web services using Axis.
- Developed and implemented an MVC architecture using JSF and Spring.
- Implemented AJAX functionality using RichFaces Components.
- Implemented custom converters and validators in JSF.
- Involved in writing the ANT scripts to build and deploy the application.
- Developed automated build scripts that check out the code from CVS and build the application using Apache ANT.
- Created stored procedures using PL/SQL for data modification (DML: insert, update, delete) in Oracle.
- Implemented interaction with the Oracle database using Hibernate.
- Used XSL/XSLT for transforming and displaying reports. Developed Schemas for XML.
- Responsible and active in analysis, design, implementation and deployment of full Software Development Life Cycle(SDLC) of the project.
- Developed Struts action classes and action forms, performed action mapping using the Struts framework, and performed data validation in form beans and action classes.
- Extensively used the Struts framework as the controller to handle client requests and invoke the model based on user actions.
- Defined the search criteria, pulled the customer record from the database, made the required changes, and saved the updated record back to the database.
- Developed build and deployment scripts using Apache Ant to customize WAR and EAR files.
- Used DAO and JDBC for database access.
- Developed stored procedures and triggers using PL/SQL in order to calculate and update the tables to implement business logic.
- Designed and developed XML processing components for dynamic menus in the application.
- Involved in post-production, support and maintenance of the application.
- Involved in analysis, design, and implementation and testing of the project.
- Developed web components using JSP, Servlets and JDBC.
- Implemented database using SQL Server.
- Designed tables and indexes.
- Wrote complex SQL and stored procedures.
- Involved in fixing bugs and unit testing with test cases using JUnit.
- Developed user and technical documentation.
Oracle PL/SQL Developer
- Involved in process development using SQL*Loader, PL/SQL packages and TOAD.
- Involved in Oracle database development, creating Oracle PL/SQL functions, stored procedures, triggers, packages, records and collections.
- Involved in Oracle SQL, PL/SQL, SQL*Plus, SQL*Loader and query performance tuning.
- Extensively used cursors, ref cursors, dynamic SQL and functions.
- Created records, tables, objects, collections (nested tables and VARRAYs), views, materialized views and global temporary tables, and performed error handling.
- Designed custom forms and reports to meet the business requirements.
- Ran batch jobs for loading database tables from flat files using SQL*Loader.
- Fixed Bugs during development/testing and Production phases.
- Performed SQL performance tuning.
- Created business area, new custom folders, complex folders, summary folder, hierarchies, and item classes and list of values using Discoverer 10g.
- Involved in Technical Documentation, Unit test, Integration Test and writing the Test plan.
- Used Forms Compare tool to compare the difference between forms.
Environment: SQL*Plus, PL/SQL, TOAD, SQL*Loader, SQL Discoverer, Oracle 9i, Oracle Forms/Reports 6i, Windows, UNIX.