- Over 8 years of experience as a solutions-oriented IT software developer, including web application development using Hadoop and related Big Data technologies, experience using Java 2 Enterprise Edition (J2EE), and work through all phases of the SDLC.
- Experience in analysis, design, development, and integration using Big Data/Hadoop technologies such as MapReduce, Hive, Pig, Sqoop, Oozie, Kafka, HBase, AWS, Cloudera, Hortonworks, Impala, Avro, data processing, Java/J2EE, and SQL.
- Good knowledge on Hadoop Architecture and its components such as HDFS, MapReduce, Job Tracker, Task Tracker, Name Node, Data Node.
- Hands on experience in installing, configuring, and using Hadoop ecosystem components like HDFS, Hive, Spark, Scala, Spark-SQL, MapReduce, Pig, Sqoop, Flume, HBase, Zookeeper, and Oozie.
- Extensive knowledge of Hadoop technology, with experience in storage, writing queries, and processing and analysis of data.
- Experience in extending Pig and Hive functionality with custom UDFs for data analysis and file processing, by running Pig Latin scripts and using Hive Query Language.
- Experience working with the Amazon AWS cloud, including services such as EC2, S3, RDS, EBS, Elastic Beanstalk, and CloudWatch.
- Worked on data modeling using various machine learning (ML) algorithms in R and Python.
- Experienced in transferring data from different data sources into HDFS systems using Kafka.
- Experience in configuring the Hive metastore with MySQL, which stores the metadata for Hive tables.
- Extensive experience in creating data pipelines for real-time streaming applications using Kafka, Flume, Storm, and Spark Streaming, including sentiment analysis on a Twitter source.
- Strong knowledge in using Flume for streaming data to HDFS.
- Good knowledge in using job scheduling and coordination tools like Oozie and ZooKeeper.
- Expertise in working with various databases, writing SQL queries, stored procedures, functions, and triggers using PL/SQL and SQL.
- Experience in NoSQL databases like Cassandra, HBase, MongoDB, and FiloDB, and their integration with Hadoop clusters.
- Strong experience in troubleshooting operating systems such as Linux (Red Hat) and UNIX, maintaining cluster issues, and resolving Java-related bugs.
- Experience in Developing Spark jobs using Scala in test environment for faster data processing and used Spark SQL for querying.
- Good exposure to Service Oriented Architectures (SOA) built on Web services (WSDL) using SOAP protocol.
- Well versed in OOP principles (inheritance, encapsulation, polymorphism) and Core Java concepts (collections, multithreading, synchronization, exception handling).
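A minimal sketch of the Core Java concepts listed above (collections, multithreading, synchronization); the class and method names here are purely illustrative:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: a thread-safe word tally combining a concurrent
// collection with multithreaded updates, so no explicit lock is needed.
public class WordTally {
    private final Map<String, Integer> counts = new ConcurrentHashMap<>();

    public void add(String word) {
        counts.merge(word, 1, Integer::sum); // atomic read-modify-write
    }

    public int get(String word) {
        return counts.getOrDefault(word, 0);
    }

    public static void main(String[] args) throws InterruptedException {
        WordTally tally = new WordTally();
        Runnable worker = () -> {
            for (int i = 0; i < 1000; i++) tally.add("hadoop");
        };
        Thread t1 = new Thread(worker);
        Thread t2 = new Thread(worker);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(tally.get("hadoop")); // 2000 regardless of interleaving
    }
}
```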
BigData/Hadoop: HDFS, MapReduce, HBase, Hive, Pig, Impala, Sqoop, Flume, Oozie, Spark, Spark SQL, ZooKeeper, AWS, Cloudera, Hortonworks, Kafka, Avro
Programming Languages: Java, J2EE, C, SQL/PLSQL, Pig Latin, Scala, HTML, XML
NoSQL Databases: MongoDB, HBase, Apache Cassandra, FiloDB, CouchDB
Databases/ETL: Talend, Oracle, SQL Server, Teradata, Hadoop, IBM DB2, Sybase, MS Access
Utilities: Toad, Citrix server, Oracle SQL Developer, SQL Advantage, Putty
Testing Tools: HP Quality Center 9.2 / 10 / ALM, TFS, JIRA, Clear Quest, Rational Clear Case
Confidential, Foster city, CA
Sr. BigData Developer/Hadoop Developer
- Developed efficient MapReduce programs for filtering out unstructured data, and developed multiple MapReduce jobs to perform data cleaning and preprocessing on Hortonworks.
- Implemented a data interface to get customer information using REST APIs, pre-processed the data using MapReduce 2.0, and stored it into HDFS (Hortonworks).
- Extracted files from MySQL, Oracle, and Teradata through Sqoop 1.4.6, placed them in HDFS (Cloudera distribution), and processed them.
- Worked with various HDFS file formats like Avro 1.7.6, SequenceFile, and JSON, and various compression formats like Snappy and bzip2.
- Wrote a Spark Streaming application to read and analyze streaming Twitter records in real time using Kafka and Flume, and measured the performance of Spark Streaming.
- Proficient in designing row keys and schema design for the NoSQL database HBase, with knowledge of another NoSQL database, Cassandra.
- Used Hive to perform data validation on the data ingested using Sqoop and Flume, and pushed the cleansed data set into HBase.
- Good understanding of Cassandra Data Modeling based on applications.
- Wrote ETL jobs to read from web APIs using REST and HTTP calls, and loaded the data into HDFS using Java and Talend.
- Developed Pig 0.15.0 UDFs to pre-process the data for analysis, and migrated ETL operations into the Hadoop system using Pig Latin scripts and Python 3.5.1 scripts.
- Used Pig as ETL tool to do transformations, event joins, filtering and some pre-aggregations before storing the data into HDFS.
- Troubleshooting, debugging & altering Talend issues, while maintaining the health and performance of the ETL environment.
- Loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
- Used Spark to parse XML files, extract values from tags, and load them into multiple Hive tables.
- Experienced in running Hadoop streaming jobs to process terabytes of formatted data using Python scripts.
- Developed small distributed applications in our projects using ZooKeeper 3.4.7, and scheduled the workflows using Oozie 4.2.0.
- Proficient in writing Unix/Linux shell commands.
- Developed an SCP simulator that emulates the behavior of intelligent networking and interacts with the SSF.
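The per-record tag extraction in the XML-to-Hive work above can be sketched in plain Java with the standard DOM parser; the `<customer>`, `<id>`, and `<city>` element names are hypothetical stand-ins for the real schema:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import java.io.StringReader;

// Sketch of extracting tag values from one XML record before loading
// the fields into Hive tables. Element names are illustrative.
public class XmlExtract {
    public static String tagValue(String xml, String tag) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        // Take the first occurrence of the requested element.
        return doc.getElementsByTagName(tag).item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String record = "<customer><id>42</id><city>Foster City</city></customer>";
        System.out.println(tagValue(record, "id") + "," + tagValue(record, "city")); // 42,Foster City
    }
}
```

In the actual Spark job, the same logic would run per record inside a map transformation rather than in `main`.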
Environment: Hadoop, HDFS, MapReduce, YARN, Hive, Pig, HBase, Oozie, Sqoop, Kafka, Flume, Talend, Oracle 11g, Core Java, Spark, Scala, Cloudera HDFS, Eclipse, Node.js, Unix/Linux, AWS, jQuery, Ajax, Python, Perl, ZooKeeper.
Confidential - Atlanta, GA
Sr. BigData Developer/Hadoop Consultant
- Developed multiple MapReduce jobs in Java for data cleaning and pre-processing, and ran MapReduce programs on the cluster.
- Involved in loading data from RDBMS and web logs into HDFS using Sqoop and Flume. Worked on loading the data from MySQL to HBase where necessary using Sqoop.
- Configured the Hadoop cluster with a Name Node and slaves and formatted HDFS. Performed importing and exporting of data from Oracle to HDFS and Hive using Sqoop.
- Performed source data ingestion, cleansing, and transformation in Hadoop. Supported Map-Reduce Programs running on the cluster.
- Wrote Pig Scripts to perform ETL procedures on the data in HDFS. Used Oozie workflow engine to run multiple Hive and Pig jobs.
- Analyzed the partitioned and bucketed data and computed various metrics for reporting.
- Created HBase tables to store various formats of data coming from different portfolios. Worked on improving the performance of existing Pig and Hive queries.
- Developed Hive UDFs and reused them in other requirements. Worked on performing join operations.
- Developed fingerprinting rules in Hive that help uniquely identify a driver profile.
- Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
- Exported the result set from Hive to MySQL using Sqoop after processing the data.
- Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
- Used Hive to partition and bucket data.
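The bucketing mentioned above routes each row to `hash(bucket_column) mod num_buckets`. A sketch of that assignment logic, using Java's `String.hashCode()` as a stand-in for Hive's internal hash (the key values and bucket count are hypothetical):

```java
// Illustrates how bucketing spreads rows across a fixed number of files:
// each key deterministically maps to one bucket.
public class BucketSketch {
    static int bucketFor(String key, int numBuckets) {
        // Mask the sign bit so the modulus is always non-negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        String[] customerIds = {"c-1001", "c-1002", "c-1003"};
        for (String id : customerIds) {
            System.out.println(id + " -> bucket " + bucketFor(id, 4));
        }
    }
}
```

Because the mapping is deterministic, equal keys always land in the same bucket, which is what makes bucketed map-side joins and sampling efficient.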
Environment: Hadoop, MapReduce, HDFS, HBase, HDP (Hortonworks), Sqoop, Data Processing Layer, Hue, Azure, Erwin, MS Visio, Tableau, SQL, MongoDB, Oozie, UNIX, MySQL, RDBMS, Ambari, Solr Cloud, Lily HBase, Cron.
Confidential - Peachtree City, GA
BigData Consultant/ETL Consultant
- Wrote multiple Spark jobs to perform data quality checks before files were moved to the Data Processing Layer.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Designed and modified database tables and used HBase queries to insert and fetch data from tables.
- Responsible for creating a data pipeline using Kafka, Flume, and Spark Streaming on a Twitter source to collect tweets and analyze the sentiment of Confidential customers' reviews.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume 1.7.0.
- Involved in deploying the applications in AWS and maintained EC2 (Elastic Compute Cloud) and RDS (Relational Database Service) instances in Amazon Web Services.
- Implemented the file validation framework, UDFs, UDTFs and DAOs.
- Strong experience in working with UNIX/Linux environments and writing UNIX shell, Python, and Perl scripts.
- Created reporting views in Impala using Sentry Policy files.
- Built a REST web service with a Node.js server on the back end to handle requests sent from front-end jQuery Ajax calls.
- Importing and exporting data from different databases like MySQL, RDBMS into HDFS and HBASE using Sqoop.
- Advanced knowledge in performance troubleshooting and tuning of Cassandra clusters.
- Analyzed source data quality using Talend Data Quality.
- Involved in creating Hive tables, loading them with data, and writing Hive queries.
- Developed REST APIs using Java, Play framework and Akka.
- Modeled and created the consolidated Cassandra, FiloDB, and Spark tables based on data profiling.
- Used Oozie 1.2.1 Operational Services for batch processing and scheduling workflows dynamically, and created UDFs to store specialized data structures in HBase and Cassandra.
- Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Used Impala to read, write and query the Hadoop data in HDFS from Cassandra and configured Kafka to read and write messages from external programs.
- Optimized existing algorithms in Hadoop using Spark Context, Spark SQL, Data Frames, and Pair RDDs.
- Created a complete processing engine, based on the Cloudera distribution, enhanced for performance.
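The per-tweet scoring step in the Kafka/Flume/Spark Streaming sentiment pipeline above can be sketched as follows. This is a toy word-list scorer, not the production model; the word lists and tweet texts are illustrative stand-ins:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy sentiment scorer: +1 per positive word, -1 per negative word.
// A real pipeline would apply this logic per record inside a streaming job.
public class SentimentSketch {
    private static final Set<String> POSITIVE =
            new HashSet<>(Arrays.asList("good", "great", "love"));
    private static final Set<String> NEGATIVE =
            new HashSet<>(Arrays.asList("bad", "slow", "hate"));

    static int score(String tweet) {
        int s = 0;
        for (String w : tweet.toLowerCase().split("\\W+")) {
            if (POSITIVE.contains(w)) s++;
            else if (NEGATIVE.contains(w)) s--;
        }
        return s;
    }

    public static void main(String[] args) {
        List<String> stream = Arrays.asList(
                "Love the new release, great job",      // score 2
                "Service was slow and the app is bad"); // score -2
        stream.forEach(t -> System.out.println(score(t) + " : " + t));
    }
}
```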
Environment: Hadoop, HDFS, MapReduce, YARN, Hive, Pig, HBase, Oozie, Sqoop, Kafka, Flume, Oracle 11g, Core Java, FiloDB, Spark, Akka, Scala, Cloudera HDFS, Talend, Eclipse, Web Services (SOAP, WSDL), Node.js, Unix/Linux, AWS, jQuery, Ajax, Python, Perl, ZooKeeper.
Confidential, Wallingford, CT
- Involved in various phases of Software Development Life Cycle (SDLC) of the application like Requirement gathering, Design, Analysis and Code development.
- Developed a prototype of the application and demonstrated to business users to verify the application functionality.
- Expertise and knowledge of test-driven development concepts. Experience in embedded systems and programming object-oriented code.
- Configured the project using Apache Tomcat 7 Webserver. Experience in Infrastructure Automation.
- Developed and implemented the MVC architectural pattern using the Spring MVC framework, including JSP and Servlets. Implemented server-side tasks using Servlets and XML.
- Developed page templates using the Spring Tiles framework. Implemented the Spring Validation framework for server-side validation.
- Developed JSPs with custom tag libraries to control business processes in the middle tier, and was involved in their integration.
- Accessed dynamic data through Web services (SOAP) to interact with other components. Integrated Spring DAO for data access using Hibernate.
- Wrote JUnit test cases to perform unit testing. Used Rational ClearCase for version control.
- Implemented Java and J2EE Design patterns like Business Delegate and Data Transfer Object (DTO), Data Access Object (DAO).
- Worked with QA team for testing and resolve defects.
- Used ANT automated build scripts to compile and package the application and implemented Log4j for the project.
- Designed use case diagrams, class diagrams, and sequence diagrams using UML. Wrote stored procedures, triggers, and cursors using SQL.
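A minimal sketch of the DTO and DAO patterns mentioned above, with an in-memory DAO standing in for the real database-backed implementation; all class and field names are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class DaoDemo {
    // Data Transfer Object: a plain, immutable carrier of data between layers.
    static final class CustomerDTO {
        final String id;
        final String name;
        CustomerDTO(String id, String name) { this.id = id; this.name = name; }
    }

    // Data Access Object: hides the persistence mechanism behind an interface,
    // so callers never depend on JDBC, Hibernate, or any specific store.
    interface CustomerDAO {
        void save(CustomerDTO c);
        Optional<CustomerDTO> findById(String id);
    }

    // Illustrative in-memory implementation; a real DAO would talk to the DB.
    static final class InMemoryCustomerDAO implements CustomerDAO {
        private final Map<String, CustomerDTO> store = new HashMap<>();
        public void save(CustomerDTO c) { store.put(c.id, c); }
        public Optional<CustomerDTO> findById(String id) {
            return Optional.ofNullable(store.get(id));
        }
    }

    public static void main(String[] args) {
        CustomerDAO dao = new InMemoryCustomerDAO();
        dao.save(new CustomerDTO("42", "Acme"));
        System.out.println(dao.findById("42").map(c -> c.name).orElse("missing")); // Acme
    }
}
```

Swapping `InMemoryCustomerDAO` for a database-backed class changes nothing in the calling code, which is the point of the pattern.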
Confidential, Atlanta GA
- Providing Java programming support for existing and new applications. Developing web applications using Java, J2EE, Struts, and Hibernate.
- Developing ActionForm classes, form beans, and Action classes using Struts. Using Hibernate for back-end persistence.
- Used the Spring framework for dependency injection and integrated it with Hibernate and JSF. Involved in writing the Spring configuration XML file that contains object declarations and dependencies.
- Implementing MVC, DAO J2EE design patterns as a part of application development. Developing DAO interfaces for hibernate to interact with databases.
- Designing the front end using JSP, Dojo, CSS, and HTML as per the provided requirements.
- Using JavaScript and the Struts validation framework for performing front-end validations. Coding and maintaining Oracle packages, stored procedures, and tables.
- Participating in project design sessions to document technical specifications and to provide design options and solutions.
- Working on web technologies including Tomcat, Apache HTTP Server, and web service architectures.
- Migrating the web application from Tomcat to WebSphere deployment environments. Using SVN for software configuration management and version control.
Environment: Java, J2EE, JSP, Struts 2.0, JDBC 3.0, Web Services, XML, JNDI, Hibernate 3.0, JMS, Spring, JSF, WebLogic Application Server, jQuery.
- Designed the system based on the Spring MVC architecture. Developed a business logic layer using the Spring Framework and integrated Hibernate.
- Used Spring object-relational mapping (ORM) and Hibernate for persistence, and created DAOs. Implemented the Hibernate ORM framework for interacting with the database.
- Worked on Service-Oriented Architecture (SOA) and RESTful web services.
- Interacted with Java controllers (using JSON to read/write data from back-end systems).
- Created AngularJS controllers, services and used AngularJS filters. Involved in integrations using Maven and Jenkins.
Confidential, Dearborn, MI
- Extensively used Spring IoC, AOP, and Spring MVC to develop the application. Implemented the persistence layer using Hibernate ORM.
- Worked on the creation of SOAP-based web services using Axis 2.x, SOAP UI, and JAXB.
- Developed the persistence layer using Hibernate.
- Performed MySQL programming for production support management, and tuned SQL queries to enhance performance and resolve DB-related production issues.
- Used EJB session beans to enable third-party usage of the application. Used Git for version control.