- Over 11 years of experience in Information Technology, including analysis, design, development, and testing of complex applications.
- Strong working experience with Big Data and the Hadoop ecosystem.
- Strong experience with Hadoop components: HBase, ZooKeeper, Hive, Pig, Sqoop, and Flume.
- Experience working with BI teams to transform big data requirements into Hadoop-centric solutions.
- Excellent understanding of Hadoop architecture and ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, MapReduce, and YARN.
- Proficient in writing HiveQL queries, Pig-based scripts, and MapReduce jobs, and in implementing HBase.
- Experience in installation, configuration, management, support, and monitoring of Hadoop clusters using distributions such as Apache Hadoop, Cloudera, and Hortonworks.
- Experience in importing and exporting data between HDFS and relational database systems using Sqoop.
- Experience in using Flume to load log data from multiple sources directly into HDFS.
- Experience in configuring ZooKeeper to coordinate servers in clusters and maintain data consistency.
- Experience in designing both time-driven and data-driven automated workflows using Oozie.
- Optimization and performance tuning of MapReduce, Pig, and Hive queries.
- Proficient in row-key and schema design for NoSQL databases.
- Good understanding of Cassandra and MongoDB implementations.
- Extensive experience in creating Tableau scorecards and dashboards using stacked bars, bar graphs, and geographical maps.
- Experience in creating visualizations using plots, histograms, heat maps, and highlight tables.
- Good understanding of writing Python scripts.
- Experience in supporting data analysts in running Pig and Hive queries.
- Experience in working with customer engineering teams to assist with their validation cycles.
- Experienced in migrating all historical data from ParAccel to Amazon S3 using Sqoop.
- Data Design and Development on Microsoft SQL Server … T-SQL
- Experience in using Business Intelligence tools (SSIS, SSAS, SSRS) in MS SQL Server 2008.
- Experience in handling offshore/onsite teams.
- Extensive experience with T-SQL in constructing triggers, tables, stored procedures, functions, views, user profiles, data dictionaries, and data integrity constraints.
- Proficient in creating dashboards and scorecards using PPS 2010.
- Good understanding of database and data warehousing concepts (OLTP and OLAP).
- Excellent T-SQL development skills, including complex queries involving multiple tables, with strong ability to develop and maintain stored procedures, triggers, and user-defined functions.
- Experience in performance tuning and query optimization.
- Experienced in using ETL tools: SSIS in MS SQL Server 2008/2005 and DTS in MS SQL Server 2000.
- Experience developing applications using Java, J2EE, JSP, MVC, Hibernate, JMS, JSF, EJB, XML, AJAX, and web-based development tools.
- Experience in developing service components using JDBC.
- Experience working with popular frameworks like Spring MVC and Hibernate.
- Implemented SOAP-based web services.
- Used curl scripts to test RESTful web services.
- Experience in database design using PL/SQL to write stored procedures, functions, and triggers, with strong experience in writing complex queries for Oracle.
- Experienced in both Waterfall and Agile (SCRUM) development methodologies.
- Strong problem-solving and analytical skills, with the ability to make balanced, independent decisions.
Hadoop/Big Data Stack: Hadoop, HDFS, MapReduce, Hive, Pig, Spark, Spark Streaming, Scala, Kafka, Storm, ZooKeeper, HBase, YARN, Sqoop, Flume.
Hadoop Distributions: Cloudera, Hortonworks.
Programming Languages: C++, Java, Python, Scala.
Query Languages: HiveQL, SQL, PL/SQL, Pig Latin.
Frameworks: MVC, Struts, Spring, Hibernate.
IDEs: Eclipse, NetBeans.
Databases: Oracle, MySQL, MS Access, DB2, Teradata.
NoSQL: HBase, Cassandra, MongoDB.
Operating Systems: Windows, Linux, UNIX, CentOS.
Confidential, New York, NY
Sr. Hadoop Developer
- Responsible for Installation and configuration of Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
- Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
- Designed workflows and coordinators in Oozie to automate and parallelize Hive and Pig jobs in a Hortonworks Apache Hadoop environment.
- Developed Sqoop scripts to import and export data from relational sources, and handled incremental loads of customer data by date.
- Developed simple and complex MapReduce programs in Java for data analysis on different data formats.
- Developed Spark scripts using Scala shell commands as per requirements.
- Imported real-time data into Hadoop using Kafka and implemented Oozie jobs for daily imports.
- Developed custom aggregate functions using Spark SQL and performed interactive querying at a POC level.
- Implemented business logic by writing Pig UDFs in Java, and used various UDFs from Piggybank and other sources.
- Reimplemented existing MapReduce applications in Spark for better performance.
- Implemented Kafka producers in Java, created custom partitioners, configured brokers, and implemented high-level consumers for the data platform.
- Optimized MapReduce jobs to use HDFS efficiently through various compression mechanisms.
- Partitioned Hive tables and ran scripts in parallel to reduce their run time.
- Worked with data serialization formats, converting complex objects into serialized form using Avro, Parquet, JSON, and CSV.
- Installed, upgraded, and managed Hadoop clusters and distributions of Hadoop, Hive, and HBase.
- Used ZooKeeper for cluster coordination services.
- Used Oozie operational services for batch processing and dynamic workflow scheduling.
- Knowledge of handling Hive queries using Spark SQL, which integrates with the Spark environment.
- Advanced knowledge of performance troubleshooting and tuning of Hadoop clusters.
- Created Hive tables, loaded data, and wrote Hive queries that run internally as MapReduce jobs.
- Extensively worked on creating end-to-end data pipeline orchestration using Oozie.
- Involved in converting MapReduce programs into Spark transformations using Spark RDDs and Scala.
- Involved in moving log files generated from various sources to HDFS for further processing through Flume.
- Involved in loading and transforming large sets of structured, semi-structured, and unstructured data from relational databases into HDFS using Sqoop imports.
- Designed and developed a Java API (Commerce API) that provides connectivity to Cassandra through Java services.
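The MapReduce-to-Spark conversions described above follow a common pattern: an explicit map/shuffle/reduce pipeline becomes one chained transformation. A minimal, dependency-free sketch of that pattern using the classic word count (plain Python standing in for the Java/Scala originals; all names and data are illustrative):

```python
from collections import defaultdict
from functools import reduce

lines = ["big data on hadoop", "spark on hadoop"]  # sample input

# MapReduce style: explicit map -> shuffle -> reduce phases.
def mapper(line):
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

mapped = [kv for line in lines for kv in mapper(line)]
mr_counts = {key: sum(values) for key, values in shuffle(mapped).items()}

# Spark-RDD style: the same logic expressed as flatMap + reduceByKey.
def flat_map(f, xs):
    return [y for x in xs for y in f(x)]

def reduce_by_key(f, pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce(f, values) for key, values in groups.items()}

rdd_counts = reduce_by_key(lambda a, b: a + b,
                           flat_map(lambda l: [(w, 1) for w in l.split()], lines))

# Both styles produce identical counts.
assert mr_counts == rdd_counts
```

The Spark version collapses the shuffle into `reduceByKey`, which is the main source of the performance gains mentioned above: combining happens per-partition before data moves.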
Environment: Hortonworks distribution (2.3), MapReduce, HDFS, Spark, Hive, Pig, HBase, SQL, Sqoop, Flume, Oozie, Apache Kafka, Storm, ZooKeeper, Tez, J2EE, Eclipse, Cassandra.
Confidential, Bloomington, IL
- Involved in review of functional and non-functional requirements.
- Facilitated knowledge transfer sessions.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Developed MapReduce jobs for business users; maintained, updated, and scheduled periodic jobs, ranging from updates to recurring MapReduce jobs to ad-hoc jobs for the business users.
- Imported and exported data into HDFS and Hive using Sqoop.
- Experienced in managing and reviewing Hadoop log files.
- Extracted files from various upstream applications through Sqoop, placed them in HDFS, and processed them.
- Ran Hadoop streaming jobs to process terabytes of XML-format data.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Responsible for managing data coming from different sources.
- Gained solid experience with NoSQL databases.
- Supported MapReduce programs running on the cluster.
- Involved in loading data from the UNIX file system to HDFS.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Developed a custom file system plug-in for Hadoop so it can access files on the Data Platform; the plug-in allows Hadoop MapReduce programs, HBase, Pig, and Hive to work unmodified and access files directly.
- Designed and implemented a MapReduce-based large-scale parallel relation-learning system.
- Gained very good business knowledge of non-life insurance, claim processing, fraud suspect identification, the appeals process, etc.
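Hadoop Streaming jobs like those above run any executable over stdin/stdout, emitting tab-separated key-value pairs. A minimal sketch of a streaming mapper for single-line XML records (the `<claim>` record shape and field names are hypothetical, chosen only to illustrate the pattern):

```python
import io
import sys
import xml.etree.ElementTree as ET

def map_record(line):
    """Parse one single-line XML record and return a (key, value) pair,
    or None for malformed input (streaming mappers must skip bad records)."""
    try:
        elem = ET.fromstring(line)
    except ET.ParseError:
        return None
    claim_id = elem.get("id")
    amount = elem.findtext("amount", default="0")
    return claim_id, amount

def run(stream, out):
    """Read records from stream, write tab-separated pairs to out,
    exactly as Hadoop Streaming expects from a mapper executable."""
    for line in stream:
        kv = map_record(line.strip())
        if kv:
            out.write("%s\t%s\n" % kv)

# Usage: in a real job this would be `run(sys.stdin, sys.stdout)`;
# here we feed a sample record through an in-memory stream.
out = io.StringIO()
run(io.StringIO('<claim id="7"><amount>120</amount></claim>\n'), out)
```

Because the mapper is an ordinary executable, the same script can be tested locally by piping sample data through it before submitting the streaming job.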
Environment: Hadoop, MapReduce, HDFS, Hive, HBase, Linux, Java (JDK 1.7), CDH 5.2.0, Oracle 11g/10g, PL/SQL, SQL*Plus, Windows NT, UNIX shell scripting.
Confidential, Milwaukie, WI
- Imported data from different relational data sources, including Teradata, into HDFS using Sqoop.
- Imported bulk data into HBase using MapReduce programs.
- Performed analytics on time-series data in HBase using the HBase API.
- Designed and implemented incremental imports into Hive tables.
- Used the REST API to access HBase data and perform analytics.
- Worked on loading and transforming large sets of structured, semi-structured, and unstructured data.
- Involved in collecting, aggregating, and moving data from servers to HDFS using Apache Flume.
- Wrote Hive jobs to parse logs and structure them in tabular format to facilitate effective querying on the log data.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Experienced in managing and reviewing Hadoop log files.
- Migrated ETL jobs to Pig scripts to perform transformations, joins, and some pre-aggregations before storing the data in HDFS.
- Worked with the Avro data serialization system to handle JSON data formats.
- Worked on different file formats, such as sequence files, XML files, and map files, using MapReduce programs.
- Involved in unit testing; delivered unit test plans and results documents using JUnit and MRUnit.
- Exported data from HDFS into an RDBMS using Sqoop for report generation and visualization purposes.
- Worked on the Oozie workflow engine for job scheduling.
- Created and maintained technical documentation for launching Hadoop clusters and executing Pig scripts.
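The incremental imports mentioned above rest on simple high-water-mark bookkeeping: track the largest value of a check column seen so far, and pull only rows beyond it on the next run (in Sqoop this is the `--incremental append` / `--check-column` / `--last-value` mechanism). A minimal sketch of that logic, with illustrative in-memory rows standing in for a source table:

```python
def incremental_import(rows, check_column, last_value):
    """Return only rows whose check_column exceeds last_value,
    plus the new high-water mark to persist for the next run."""
    new_rows = [r for r in rows if r[check_column] > last_value]
    new_mark = max((r[check_column] for r in new_rows), default=last_value)
    return new_rows, new_mark

# First run imported up to id 1; the next run picks up only ids 2 and 3
# and records 3 as the mark for the run after that.
rows = [{"id": 1}, {"id": 2}, {"id": 3}]
batch, mark = incremental_import(rows, "id", last_value=1)
```

Persisting the mark between runs is what lets daily loads into Hive tables stay idempotent and avoid re-importing history.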
Environment: CDH 5.3, MapReduce, Hive 0.14, Oozie, Sqoop, Pig 0.11, Java, REST API, Maven, MRUnit, JUnit, Cloudera.
Confidential, New York, NY
- Responsible for analyzing and understanding data sources such as iTunes, Spotify, YouTube, and Facebook data.
- Developed a multithreaded framework to grab data for playback, traffic source, social, device, and demographic reports from YouTube.
- Developed a reusable Java component to load data from the Hadoop distributed file system into ParAccel.
- Developed MapReduce jobs to process music metric data, along with scripts for uploading the data to the ParAccel server.
- Developed MapReduce code for data manipulation.
- Implemented a POC using Spark and Spark SQL.
- Worked as an architect providing solutions.
- Involved in designing and creating Hive tables to load data into Hadoop.
- Experienced in migrating all historical data from ParAccel to Amazon S3 using Sqoop for feeds such as iTunes Preorders and Radio Monitor.
- Responsible for all data flows and data quality.
- Responsible for end-to-end development for the client.
- Involved in designing, development, coding, and unit testing.
- Involved in designing and developing Hadoop MapReduce jobs in Java for batch processing to search and match the scores.
- Used Rational Rose to develop use case diagrams, activity flow diagrams, class diagrams, and object diagrams in the design phase.
- Used Struts with Tiles in the MVC framework for the application.
- Involved in a SolrCloud implementation to provide real-time search capabilities on a repository with terabytes of data.
- Involved in developing Hadoop MapReduce jobs for merging and appending repository data.
- Involved in an agile SCRUM methodology implementation, and in various performance projects to improve application response time.
- Involved in integrating legacy scoring and analytical models, such as SMG3, into the new application using web services.
- Hands-on experience setting up an HBase column-based storage repository for archival and retro data.
- Created various calculated fields, visualizations, and dashboards using Tableau Desktop.
- Knowledge of performance troubleshooting and tuning of Hadoop clusters.
- Used Crunch for transforming and analyzing time-series data.
- Wrote Crunch classes to iterate through sorted trades, applying incrementing sequence numbers.
- Prepared the installation, customer guide, and configuration documents, which were delivered to the customer along with the product.
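The multithreaded report-grabbing framework described above amounts to fanning independent report fetches out across a worker pool. A minimal sketch of that shape; the report names are taken from the bullet above, and `fetch_report` is a stand-in for the real YouTube Analytics call, not an actual API:

```python
from concurrent.futures import ThreadPoolExecutor

# Report types mentioned above; the payloads here are placeholders.
REPORTS = ["playback", "traffic_source", "social", "device", "demographic"]

def fetch_report(name):
    """Stand-in for a network call that downloads one report.
    In the real framework this would hit the YouTube reporting API."""
    return {"report": name, "rows": []}

def fetch_all(reports, workers=4):
    """Fetch all reports concurrently; map() preserves input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_report, reports))

results = fetch_all(REPORTS)
```

Threads suit this workload because each fetch is I/O-bound, so the pool overlaps network waits rather than competing for CPU.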
Environment: Java, J2EE, Tableau Desktop, Tableau Server, Hadoop, HBase, Kettle, ZooKeeper, SolrCloud, Pig Latin, Oozie scheduler, JavaBeans, Agile SCRUM, JProfiler, Hibernate 3.0, JBoss Application Server, CXF 2.2.4, JNDI, JavaScript, Servlet 2.3, JUnit, Maven, SVN, JBoss, XML web services.
Confidential
- Designed the application model using Rational Rose, utilizing the Struts framework (Model View Controller) and J2EE design patterns.
- Designed class diagrams of modules using Rational Rose (UML).
- Designed and developed user interfaces using JSP and HTML.
- Developed Struts components, Servlets, JSPs, EJBs, and other Java components to fulfill the requirements.
- Designed and implemented all front-end components using the Struts framework.
- Designed various applications using multithreading concepts, mostly to perform time-consuming tasks in the background.
- Developed JSP and Servlet classes to generate dynamic HTML.
- Developed JSP pages using Struts custom tags.
- Developed the presentation layer, built using Servlets, JSP, and MVC architecture, on WebSphere Studio Application Developer.
- Designed and developed XML processing components for dynamic menus in the application.
- Implemented the persistence layer using Entity Beans.
- Developed efficient SQL queries for retrieving data from the database.
- Used Rational ClearCase for controlling different versions of the application code; the Business Delegate and Service Locator patterns were used to decouple clients from direct business logic invocation and to prevent code duplication.
- Involved in integration testing and addressed integration issues between the different modules of the application.
- Deployed and ran the application on IBM WebSphere Application Server 5.1; the build process was controlled using Apache Ant.
- Used Log4J for logging purposes.
Environment: Java/J2EE, JDBC, Servlets 2.4, JSP 2.0, EJB 2.0, Struts 1.1, Rational ClearCase, WebSphere 5.1, WSAD, UML, UNIX, JavaScript, Ant 1.6.1, XML, DB2, and Log4J.
Confidential
- Extensively used SQL queries and PL/SQL stored procedures and triggers for retrieving and updating information in the Oracle database via JDBC.
- Involved in the design, development, and support phases of the Software Development Life Cycle (SDLC).
- Reviewed the functional, design, source code, and test specifications.
- Involved in developing the complete front end using JavaScript and CSS.
- Authored the functional, design, and test specifications.
- Developed web components using JSP, Servlets, and JDBC.
- Designed tables and indexes.
- Designed, implemented, tested, and deployed Enterprise JavaBeans (both session and entity beans) using WebLogic as the application server.
- Developed stored procedures, packages, and database triggers to enforce data integrity.
- Performed data analysis and created Crystal Reports for user requirements.
- Implemented the backend, configuration DAO, and XML generation modules of DIS.
- Analyzed, designed, and developed the component.
- Used JDBC for database access.
- Used the Spring Framework for developing the application and JDBC to map to the Oracle database.
- Used the Data Transfer Object (DTO) design pattern.
- Performed unit testing and rigorous integration testing of the whole application.
- Wrote and executed test scripts using JUnit.
- Actively involved in system testing.
- Developed an XML parsing tool for regression testing.
- Prepared the installation, customer guide, and configuration documents, which were delivered to the customer along with the product.
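The JDBC and DAO work described above centers on one discipline: route every query through a DAO and always parameterize it, as a JDBC PreparedStatement does. A minimal sketch of that pattern, with Python's built-in sqlite3 standing in for Oracle/JDBC and an illustrative `customers` table:

```python
import sqlite3

class CustomerDAO:
    """Minimal DAO: all SQL is parameterized (the ? placeholders),
    mirroring JDBC PreparedStatement usage; table and names are illustrative."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS customers (id INTEGER PRIMARY KEY, name TEXT)")

    def add(self, cid, name):
        # Parameter binding prevents SQL injection and handles quoting.
        self.conn.execute(
            "INSERT INTO customers (id, name) VALUES (?, ?)", (cid, name))

    def find(self, cid):
        row = self.conn.execute(
            "SELECT name FROM customers WHERE id = ?", (cid,)).fetchone()
        return row[0] if row else None

# Usage: callers see only DAO methods, never raw SQL.
dao = CustomerDAO(sqlite3.connect(":memory:"))
dao.add(1, "Acme")
```

Keeping SQL inside the DAO is what allowed the backend and configuration modules above to be tested and swapped independently of the persistence details.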