- 8+ years of experience in the IT industry, including hands-on experience with Big Data ecosystem technologies.
- 4+ years of Hadoop experience in the design and development of Big Data applications using Apache Hadoop MapReduce, HDFS, Hive, HBase, Pig, Oozie, Sqoop, Kafka, Flume and Spark.
- Expertise in developing solutions around NoSQL databases such as MongoDB and Cassandra.
- Experience installing, configuring, supporting and monitoring Hadoop clusters using Apache Hadoop and the Hortonworks distribution (HDP 2.x).
- Experience with major Hadoop distributions, including Cloudera and Hortonworks.
- Excellent understanding of Hadoop architecture, including MapReduce MRv1 and MRv2 (YARN).
- Developed multiple MapReduce programs to process large volumes of semi-structured and unstructured data files using different MapReduce design patterns.
- Strong experience writing MapReduce jobs in Java and Pig.
- Hands-on experience with ecosystem components including Hive, Pig, Sqoop, MapReduce, Flume and Oozie.
- Worked extensively with semi-structured data (fixed-length and delimited files) for data sanitization, report generation and standardization.
- Expertise in loading data into HDFS using Flume and Sqoop.
- Experience in importing and exporting data between HDFS and Relational Database Management systems using Sqoop.
- Experience in handling continuous streaming data using Flume and memory channels.
- Resolved performance issues in Hive and Pig scripts by applying an understanding of joins, grouping and aggregation.
- Extended Hive and Pig core functionality by writing custom UDFs.
- Expertise in Hadoop administration tasks such as managing clusters and reviewing Hadoop log files.
- Extensive experience developing and maintaining Big Data streaming applications using Kafka, Storm, Spark and other Hadoop components.
- Good experience writing complex SQL queries against databases such as Oracle 10g, MySQL and SQL Server.
- Solid grasp of object-oriented concepts (objects, classes, their relationships and how to model them) and good hands-on experience with the Spring 2.5 framework.
- Knowledge of job workflow scheduling and monitoring tools such as Oozie and ZooKeeper. Expertise in using the ETL tool Informatica to extract, transform and load data into the data warehouse.
- Hands-on experience with Spark using Scala and Python.
- Hands-on experience working with JSON files.
- Hands-on experience with Spark architecture and its components, such as Spark SQL and the DataFrame and Dataset APIs.
- Built a proof of concept (POC) for real-time Spark Streaming from Kafka into HDFS.
- Hands-on experience with AWS (Amazon Web Services): using Elastic MapReduce (EMR), creating and storing data in S3 buckets, and creating Elastic Load Balancers (ELB) for Hadoop front-end web UIs.
- Extensive knowledge of creating Hadoop clusters on multiple EC2 instances in AWS, configuring them through Ambari, and using IAM (Identity and Access Management) to create groups and users.
- Worked extensively on object-oriented analysis, design and development of software using UML methodology.
- Exposure to methodologies such as Scrum, Agile and Waterfall.
- Good knowledge of normalization, fact tables and dimension tables, and experience with OLAP and OLTP systems.
- Hands-on experience in application development using Java, RDBMS and UNIX shell scripting.
- Strong experience in unit testing and system testing of Big Data applications.
- Experience with version control tools such as SVN and Git (GitHub), JIRA for issue tracking, and Crucible for code reviews.
- Expertise in using Linux OS including flavors like CentOS, Ubuntu and Linux Mint.
Confidential, Chicago, IL
- Involved in loading and transforming large sets of structured, semi-structured and unstructured data from relational databases into HDFS using Sqoop imports.
- Developed Sqoop scripts to import and export data from relational sources and handled incremental loading of customer and transaction data by date.
- Migrated an existing Java application to microservices using Spring Boot and Spring Cloud.
- Working knowledge in different IDEs like Eclipse, Spring Tool Suite.
- Working knowledge of Git and Ant/Maven for project dependency management, build and deployment.
- Developed simple and complex MapReduce programs in Java for Data Analysis on different data formats.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Worked as a part of AWS build team.
- Created, configured and managed S3 buckets for storage.
- Imported data from sources such as HDFS and HBase into Spark RDDs.
- Performed batch processing of data sources using Apache Spark and Elasticsearch.
- Implemented Spark RDD transformations and actions for business analysis.
- Migrated HiveQL queries on structured data to Spark SQL to improve performance.
- Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
- Partitioned Hive tables and ran the scripts in parallel to reduce their run time.
- Worked with data serialization formats (Avro, Parquet, JSON, CSV) to convert complex objects into byte sequences.
- Responsible for analyzing and cleansing raw data by performing Hive/Impala queries and running Pig scripts on data.
- Administered, installed, upgraded and managed distributions of Hadoop, Hive and HBase.
- Involved in troubleshooting and performance tuning of Hadoop clusters.
- Created Hive tables, loaded data and wrote Hive queries that run as MapReduce jobs.
- Implemented business logic by writing Hive UDFs in Java.
- Wrote XML scripts to build Oozie functionality.
- Used Oozie operational services for batch processing and dynamic workflow scheduling.
- Worked extensively on end-to-end data pipeline orchestration using Oozie.
Environment: MapReduce, HDFS, Spring Boot, Microservices, AWS, Hive, Pig, SQL, Sqoop, Oozie, Shell scripting, Cron Jobs, Apache Kafka, J2EE.
Confidential, Riverside, CA
- Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, and Sqoop.
- Developed and tested Hadoop jobs and implemented a data quality solution based on the design.
- Developed a data pipeline using Flume, Hadoop and Hive to ingest, transform and analyze data.
- Used Hive partitioning and bucketing and executed different types of joins on Hive tables.
- Used Sqoop to import data into HDFS and Hive from multiple data systems.
- Analyzed large data sets to determine the optimal way to aggregate and report on them. Imported data from various data sources and performed transformations using Hive and MapReduce.
- Helped with the sizing and performance tuning of the Cassandra cluster.
- Involved in converting Cassandra/Hive/SQL queries into Spark transformations using Spark RDDs.
- Developed multiple POCs using Spark and deployed them on the YARN cluster.
- Involved in the process of Cassandra data modeling and building efficient data structures.
- Imported and exported data between RDBMS and HDFS using Sqoop.
- Analyzed the data by running Hive queries and Pig scripts to understand user behavior such as shopping.
- Configured Oozie workflows to run multiple Hive and Pig jobs that run independently based on time and data availability.
- Optimized MapReduce code and Pig scripts, and performed performance tuning and analysis.
- Implemented advanced procedures such as text analytics and processing, using the in-memory computing capabilities of Spark.
- Exported the aggregated data into Oracle using Sqoop for reporting on the Tableau dashboard.
- Involved in developing Pig Scripts for change data capture and delta record processing between newly arrived data and already existing data in HDFS.
Environment: Hortonworks HDP 2.5, MapReduce, HDFS, Hive, Pig, SQL, Ambari, Sqoop, Flume, Oozie, HBase, Java (JDK 1.6), Eclipse, MySQL and Unix/Linux.
Confidential, Stamford, CT
- Imported data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from an Oracle database into HDFS using Sqoop.
- Gathered business requirements, defined and designed data sourcing and data flows, and performed data quality analysis, working with the data warehouse architect on the development of logical data models.
- Created, developed, modified and maintained database objects, including PL/SQL packages, functions, stored procedures, triggers, views and materialized views, to extract data from different sources.
- Extracted data from various locations and loaded it into Oracle tables using SQL*Loader.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
- Developed the Pig Latin code for loading, filtering and storing the data.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Imported and exported data to and from HDFS and Hive using Sqoop.
- Analyzed the data by running Hive queries and Pig scripts to understand user behavior.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Installed Oozie workflow engine to run multiple Hive and Pig jobs.
- Developed Hive queries to process the data and generate data cubes for visualization.
- Managed the extraction, transformation and loading (ETL) process, pulling large volumes of data from sources such as MS Access and Excel into a staging database using BCP.
- Responsible for detecting and rectifying errors in ETL operations.
- Incorporated Error Redirection during ETL Load in SSIS Packages.
- Implemented various types of SSIS Transformations in Packages including Aggregate, Fuzzy Lookup, Conditional Split, Row Count, Derived Column etc.
- Implemented the Master Child Package Technique to manage big ETL Projects efficiently.
- Involved in Unit testing and System Testing of ETL Process.
Environment: Hadoop, MapReduce, HDFS, Hive, Oracle 11g, Java, Struts, Servlets, HTML, XML, SQL, J2EE, JUnit, Tomcat 6, MS SQL Server 2005/08, SQL SSIS, SSAS, SSRS, SPSS
Confidential, Boston, MA
- Responsible for analyzing and documenting the requirements and architecting the application based on J2EE standards. Followed test-driven development (TDD) and participated in Scrum status reports.
- Provided full SDLC application development services, including designing, integrating, testing and deploying mission-critical enterprise billing solutions.
- Participated in designing use case, class and sequence diagrams for various engine components and used IBM Rational Rose to generate the UML notations.
- Developed Ant, Maven and shell scripts to automatically compile, package, deploy and test J2EE applications on a variety of WebSphere platforms.
- Experience in developing business applications using JBoss, WebSphere and Tomcat.
- Used Perl scripting, shell scripting and PL/SQL programming to resolve a variety of business problems.
- Implemented client-side and server-side validations according to business needs. Wrote test cases, performed unit testing, and wrote and executed JUnit tests.
- Wrote Ant scripts for project builds in a Linux environment.
- Involved in production implementation and post-production support.
- Involved in complete software development life cycle management using UML.
- Coded interfaces for web services.
- Developed the application using Spring MVC and Web Flow modules.
- Used the Spring Framework for application transaction management.
- Created database connections using the Hibernate SessionFactory and used Hibernate APIs to retrieve and store data with Hibernate transaction control.
- Involved in the development of page flows, business objects, Hibernate database mappings and POJOs. Used XML-style syntax for defining object-relational metadata.
- Used Spring ORM to integrate the Spring Framework with Hibernate and JPA.
Environment: Java 7, Spring Framework 3.0, Hibernate, Java 1.6, DHTML, HTML, CSS, Servlets, UML, J2EE, JSP, EJB, Struts Framework Taglibs, SQL.
- Video surveillance system design, database management, and web application development
- Node access control design, database management, and web application development
- Responsible for project management and scheduling, rolling out the project plans for new projects.
- Successfully managed cross-functional teams to keep the design process within the target date.
- Developed and managed projects for video surveillance system design, database management, and web application development.
- Developed applications for video surveillance system designs, such as user interface, onscreen displays, motion detection, and video storage in C++.
- Developed an HTML- and CSS-based web application for flashing embedded system code.
- Designed and implemented web applications for node control interfaces for surveillance-based software.
- Successfully implemented Node access control system management.
- Implemented an ASP.NET-based web application, backed by MS-SQL, for managing security and node control triggers, reporting mechanisms and alert systems.
Environment: Visual Basic, SQL, HTML, MS-SQL, C++, RTOS (Nucleus PLUS).