- Over 8 years of industry experience in application development and maintenance, data management, programming, data analysis and data visualization, following Agile methodology.
- Experience with Apache Hadoop ecosystem components such as HDFS, MapReduce, Hive, HBase, Pig, Sqoop, Oozie and Mahout, along with Python, Spark, Storm, Cassandra and MongoDB for Big Data and Big Data analytics.
- Experience in Java software development, including client/server applications and Internet/intranet database applications, and in developing, testing and implementing application environments using J2EE, JDBC, JSP, Servlets, Web Services, Oracle, PL/SQL and relational databases.
- Good experience across the complete project life cycle (design, development, testing and implementation) of client/server and web applications.
- Good understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and Secondary NameNode, as well as MapReduce concepts.
- Experience in all phases of diverse technology projects, specializing in Data Science and Machine Learning.
- Configured Spark Streaming to receive real-time data from Kafka and store the streamed data in HDFS using Scala.
- Experience using Kafka brokers with Spark, initializing the Spark context and processing live streaming data with RDDs.
- Hands-on experience writing Hadoop jobs to analyze data using HiveQL (queries), Pig Latin (data flow language) and custom MapReduce programs in Java.
- Experienced in developing Hadoop integrations for data ingestion, data mapping and data processing.
- Experienced in building analytics for structured and unstructured data and managing large-scale data ingestion using technologies such as Kafka, Avro and Thrift.
- Experienced in machine learning regression algorithms such as simple linear, multiple linear and polynomial regression, Support Vector Regression (SVR), Decision Tree Regression and Random Forest Regression.
- Exceptional ability to quickly master new concepts and capable of working in groups as well as independently.
- Experience debugging and troubleshooting production systems, profiling and identifying performance bottlenecks, and resolving Hadoop-related issues.
- Hands-on experience with virtualization using VMware Virtual Center.
- Good knowledge of Hadoop cluster architecture and cluster monitoring.
- Expertise in setting up standards and processes for Hadoop based application design and implementation.
- Experience importing and exporting data between relational database systems and HDFS using Sqoop.
- Experience in managing Hadoop clusters using Cloudera Manager.
- Experience using Impala for high-performance SQL queries.
- Hands-on experience with VPN, PuTTY, WinSCP, VNC Viewer, etc.
- Hands-on experience in application development using Java, RDBMS and Linux shell scripting.
- Performed data analysis using MySQL, SQL Server Management Studio and Oracle.
- Expertise in creating conceptual data models, process/data flow diagrams, use case diagrams and state diagrams.
- Experience with cloud computing platforms such as Amazon Web Services (AWS).
- Adapts readily to evolving technology; strong sense of responsibility and accomplishment; good team player.
Scripting Languages: Python, Perl, Shell.
Big Data Technologies: Hadoop, HDFS, Hive, MapReduce, Pig, Sqoop, Flume, ZooKeeper, Spark, AWS, Machine Learning Algorithms.
Programming Languages: Java.
Java Frameworks: Struts, Spring, Hibernate.
DB Languages: SQL, PL/SQL.
JMS: ActiveMQ, IBM MQ.
Databases/ETL: Oracle 9i/10g/11g, MySQL 5.2, DB2, Informatica 8.x.
NoSQL Databases: HBase, Cassandra, MongoDB.
Operating Systems: Windows, Unix and Linux.
Confidential, Sacramento, CA
Sr. Hadoop Big Data Developer
- Worked on analyzing the Hadoop cluster and various Big Data analytic tools, including Pig, Hive, the HBase database and Sqoop. Involved in unit testing and delivered unit test plans and results documents following Agile.
- Developed Hive UDFs to bring all customers' email IDs into a structured format.
- Collected and aggregated large amounts of web log data from various sources, such as web servers, mobile and network devices, using Apache Flume and stored the data in HDFS for analysis.
- Designed and implemented exports of data from HDFS to RDBMS using Sqoop for report generation and visualization.
- Designed, developed, debugged, tested and promoted Java/ETL code through the various environments from DEV to PROD.
- Extensively involved in the design phase and delivered design documents.
- Extensively involved in writing ETL specifications for development and conversion projects.
- Developed Oozie workflows for job scheduling.
- Implemented NoSQL storage with HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
- Supported all teams engaged in implementing new customers, including vendors helping to establish new products.
- Designing and developing the real-time time series analysis and contextualization module using Kafka, Spark Streaming and Spark SQL.
- Working on converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Developed Spark jobs using Scala in a test environment for faster data processing and used Spark SQL for querying.
- Built real-time predictive analytics capabilities using Spark Streaming, Spark SQL and Oracle Data Mining tools.
- Developed interactive shell scripts for scheduling various data cleansing and data loading processes.
- Experienced in managing and reviewing Hadoop log files and importing log files from various sources into HDFS using Flume.
- Used AWS S3 and local hard disks (HDFS) as the underlying file systems for Hadoop.
- Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting.
- Configured Kafka to read and write messages from external programs.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
- Working on importing metadata into Hive and migrating existing tables and applications to Hive and the AWS cloud.
- Stored data in S3 buckets for the Hadoop cluster on AWS.
- Developing MapReduce/Spark Python modules for machine learning and predictive analytics in Hadoop on AWS.
- Developed data models, tables, views, queries, etc. to support business requirements.
- Developing Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
Environment: HDFS, MapReduce, Flume, Hive, Informatica 9.1/8.1/7.1/6.1, Oracle 11g, Sqoop, Oozie, Pig, ETL, Hadoop 2.x, NoSQL, Flat files, AWS (Amazon Web Services), Shell Scripting.
Confidential, Atlanta, GA
Hadoop Big Data Developer
- Supported MapReduce programs running on the cluster and was involved in HDFS maintenance and administration through the Hadoop Java API.
- Implemented data ingestion from relational database management systems using Sqoop.
- Resolved configuration issues with Apache add-on tools.
- Used Pig as an ETL tool to perform transformations, event joins, traffic filtering and some pre-aggregations before storing the data in HDFS.
- Worked with different data formats such as JSON and XML and applied machine learning algorithms.
- Developed and implemented innovative AI and machine learning tools for use in Risk.
- Analyzed large data sets, applied machine learning techniques, and developed and enhanced predictive and statistical models by leveraging best-in-class modeling techniques.
- Responsible for applying machine learning techniques (regression/classification) to predict outcomes.
- Involved in writing Flume and Hive scripts to extract, transform and load data into the database.
- Involved in cluster capacity planning, performance tuning, cluster monitoring and troubleshooting.
- Developed Spark code using Java and Spark-SQL/Streaming for faster processing of data.
- Implemented test scripts to support test driven development and continuous integration.
- Wrote custom Hive UDFs to pull customized data.
- Experienced in loading and transforming large sets of structured and semi-structured data and optimizing existing algorithms in Hadoop using SparkContext, Hive SQL and DataFrames.
- Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark with Scala.
- Identified and assessed available machine learning and statistical analysis libraries (including regressors, classifiers, statistical tests and clustering algorithms).
- Experienced in handling large datasets during the ingestion process itself using partitions, Spark in-memory capabilities, broadcast variables, effective and efficient joins, transformations and other techniques.
- Experience writing complex SQL to optimize Hive queries.
- Converted text and CSV files to Parquet format for data analysis.
- Validated the machine learning classifiers using ROC Curves and Lift Charts.
- Prepared the mapping document specifying which fields to use from the Hive database and which transformations to perform.
- Developed Spark scripts using Scala shell commands as required.
- Developed UNIX shell scripts to send an email notification when a job completes with either a success or failure status.
- Performed data mining analytics and data visualization using Hive.
- Performed effectiveness testing of customer data from the source output database.
- Worked in Agile environment and used JIRA as a bug-reporting tool for updating the bug report.
Environment: Linux, AWS, HDFS, Hortonworks Hadoop ecosystem, Hive, Spark, Sqoop, Flume, Machine learning, Zookeeper, HBase.
Confidential, Centreville, VA
Sr. Hadoop Developer
- Involved in the architecture design, development and implementation of Hadoop deployment, backup and recovery systems, following Agile.
- Involved in and supported code/design analysis, strategy development and project planning.
- Contributing to the development of key data integration and advanced analytics solutions for leading organizations, leveraging Apache Hadoop and other big data technologies on the Cloudera Hadoop distribution.
- Implemented solutions on Amazon AWS (EC2, RDS, S3, Redshift, etc.) with tools including Hadoop, Hive, Pig, Sqoop, Oozie, HBase, Flume and Spark.
- Ingested log data directly into HDFS using Flume in Cloudera CDH and was involved in loading data from the Linux file system to HDFS in Cloudera CDH.
- Experience running Hadoop Streaming jobs to process terabytes of XML data and importing and exporting data into HDFS; assisted in exporting analyzed data to RDBMS using Sqoop in Cloudera.
- Designed and developed integration APIs using various data structure concepts and the Java Collections Framework, with an exception handling mechanism, to return responses within 500 ms. Used Java threads to handle concurrent requests.
- Installed and configured MapReduce, Hive and HDFS, and developed Spark scripts in Java as required to read/write JSON files. Worked on importing and exporting data into HDFS and Hive using Sqoop.
- Hands-on experience with Hadoop administration, development and NoSQL in Cloudera. Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs. Automated all jobs for pulling data from an FTP server and loading it into Hive tables using Oozie workflows.
- Implemented HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Created reports for the BI team, using Sqoop to export data into HDFS and Hive. Installed and configured Hadoop and Hadoop ecosystem tools (Hive/Pig/HBase/Sqoop/Flume).
- Designed and implemented a distributed data storage system based on HBase and HDFS. Imported and exported data into HDFS and Hive.
- Designed and implemented a data warehouse, creating fact and dimension tables and loading them with Informatica PowerCenter tools, fetching data from the OLTP system into the analytics data warehouse. Coordinated with business users to gather new requirements and worked on existing issues. Read multiple data formats on HDFS using Scala. Loaded data into Parquet files by applying transformations using Impala. Executed parameterized Pig, Hive, Impala and UNIX batches in production.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala; analyzed the SQL scripts and designed the solution for implementation in Scala.
- Involved in investigating issues as they came up. Experienced in resolving issues through Root Cause Analysis, Incident Management and Problem Management processes.
- Developed multiple POCs using Scala, deployed them on the YARN cluster and compared the performance of Spark with Hive and SQL/Teradata.
Environment: Hadoop 1.x/2.x MR1, Cloudera CDH3U6, HDFS, Spark, Scala, Impala, HBase 0.90.x, Flume, Java, Sqoop, Hive, Tableau.
Confidential, Tampa, FL
- Responsible for building scalable distributed data solutions using Hadoop.
- Wrote multiple MapReduce programs in Java for data analysis.
- Wrote MapReduce jobs using Pig Latin and the Java API.
- Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
- Developed Pig scripts for analyzing large data sets in HDFS.
- Experienced in migrating HiveQL queries to Impala to minimize query response time.
- Experience running Hive queries using Spark SQL, which integrates with the Spark environment.
- Responsible for creating Hive tables, loading the structured data resulting from MapReduce jobs into the tables and writing Hive queries to further analyze the logs and identify issues and behavioral patterns.
- Performed extensive Data Mining applications using Hive.
- Streamed data into Apache Ignite by setting up a cache for efficient data analysis.
- Responsible for performing extensive data validation using Sqoop jobs; created Pig and Hive scripts to ingest data from relational databases for comparison with historical data.
- Utilized Storm for processing large volumes of data.
- Used Kafka to load data into HDFS and move data into NoSQL databases (Cassandra).
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Involved in submitting and tracking MapReduce jobs using the JobTracker.
- Involved in creating Oozie workflow and coordinator jobs to kick off jobs on time based on data availability.
- Used visualization tools such as Power View for Excel and Tableau for visualizing data and generating reports.
- Exported data to Tableau and to Excel with Power View for presentation and refinement.
- Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources.
- Implemented test scripts to support test driven development and continuous integration.
- Actively participated in daily scrum meetings.
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Spark SQL, Sqoop, Flume, Oozie, Java, Linux, Maven, ZooKeeper, Tableau, HBase, Cassandra, RDBMS.
Java/J2EE Developer
- Worked on the complete life cycle (design, development and testing) using Agile methodology.
- Designed and developed presentation layer using JSP, Custom Tags and HTML.
- Understood client requirements, added them to the design document along with the business requirements, and created use cases.
- Implemented Servlets to forward each request to the appropriate server for processing and return the results to the client.
- Used JavaScript for client-side validation and to provide event-driven programming in HTML pages.
- Developed the user interface using JSP and JavaScript to view all online trading transactions.
- Developed both session and entity beans representing different types of business logic abstractions.
- Developed JavaServer Pages for dynamic front-end content that use Servlets and EJBs.
- Designed modules using JDBC for database connectivity.
- Created stored procedures in the Oracle database and accessed them through Java JDBC.
- Implemented Java Naming and Directory Interface (JNDI) to support transparent access to distributed components, directories and services.
- Used the JDBC API to connect to the database and carry out database operations.
- Clustered WebLogic and JBoss for high availability.
- Developed action Servlets and JSPs for presentation in Struts MVC framework.
- Implemented JSP and JSTL Tag Libraries for developing User Interface components.
- Developed helper classes and configured deployment descriptors.
- Involved in the maintenance and support of the application.
- Implemented Business Delegate, DAO, DTO, Service locator, Session Façade, View Helper and Value Object design patterns for all the modules.
- Worked with the iText API to generate spec sheets and design sheets in PDF format.
- Developed a PL/SQL view function in the Oracle 9i database for the get available date module.
- Involved in writing test cases for unit and integration testing of the application, and worked with the testing team to identify, categorize and fix bugs.
- Developed an Ant build script to create EAR files and deployed the application on the WebLogic application server.
- Deployed the application to Dev and Production servers, coordinating work with the offshore team.
Environment: Custom Tags, Java, J2EE, JavaScript, JSP, JDBC, HTML, Oracle WebLogic and Red Hat JBoss server, Oracle, PL/SQL.
Java/J2EE Developer
- Developed use case, class and sequence diagrams for the modules using UML and Visio.
- Involved in analysis, design and development of e-bill payment system as well as account transfer system and developed specs that include Use Cases, Class Diagrams, Sequence Diagrams and Activity Diagrams.
- Developed business logic in Core Java.
- Implemented the project according to the Software Development Life Cycle (SDLC), covering analysis, design, implementation and testing.
- Developed the web layer using Spring MVC framework.
- Used JDBC to map an object-oriented domain model to a traditional relational database.
- Developed stored procedures to manipulate the database and apply business logic according to the user's specifications.
- Developed UML diagrams like Use cases and Sequence diagrams as per requirement.
- Developed generic classes containing frequently used functionality, for reusability.
- Implemented an exception management mechanism using Exception Handling Application Blocks to handle exceptions.
- Designed and developed user interfaces using JSP, JavaScript, HTML and the Struts framework.
- Involved in Database design and developing SQL Queries, stored procedures on MySQL.
- Developed Action Forms and Action Classes in the Struts framework.
- Programmed session and entity EJBs to handle user information tracking and profile-based transactions.
- Involved in writing JUnit test cases, unit and integration testing of the application.
- Developed user and technical documentation.
- Provided on-call support for production systems, including analysis, troubleshooting and problem resolution.
- Involved in knowledge transfers and training to bring additional resources on board.
Environment: Java, JavaScript, Core Java, Use Cases, Class Diagrams, Sequence Diagrams, EJB, MySQL, JUnit, HTML, JSP, JDBC Drivers, UNIX, Shell scripting, SQL Server.