- A dynamic professional with 8+ years of experience in the analysis, design, development, maintenance, and support of web, client-server, and distributed applications, including 4 years of experience with Big Data and Hadoop-related components such as Spark, HDFS, MapReduce, Pig, Hive, Impala, YARN, Sqoop, Scala, and Kafka.
- Extensive experience writing SQL queries using HiveQL to perform analytics on structured data.
- Experience in performing data validation using Hive dynamic partitioning and bucketing.
- Expertise in data ingestion using Sqoop, Apache Kafka, and Flume.
- Excellent understanding and experience of NoSQL databases like HBase and Cassandra.
- Experience working with structured and unstructured data in various file formats such as Avro data files, XML files, JSON files, sequence files, ORC, and Parquet.
- Experience implementing data analysis algorithms in Spark, using Scala and Spark SQL for faster processing of data.
- Experience using Spark Streaming to collect data from Kafka in near real time and perform the necessary processing.
- Good understanding of cloud configuration in Amazon Web Services (AWS).
- Experience in Java/J2EE related technologies.
- Experience in ingesting data from various sources into HDFS and building reports using Tableau and QlikView.
- Expertise with Application servers and web servers like Oracle WebLogic, IBM WebSphere and Apache Tomcat.
- Proven expertise in implementing the IoC/Dependency Injection features of the Spring Framework.
- Experience in using version control and configuration management tools like SVN, CVS.
- Experience working in environments using Agile (SCRUM) and Waterfall methodologies.
- Experience in designing applications using UML diagrams such as class, component, sequence, and deployment diagrams, created with MS Visio and Rational Rose.
- Expertise in database modeling and development using SQL and PL/SQL in Oracle (8i, 9i and 10g), MySQL, Teradata, DB2, and SQL Server environments.
Big Data: Spark, HDFS, MapReduce, Hive, Impala, Pig, YARN, Sqoop, Flume, Oozie, Kafka, HBase and Cassandra
Programming: Java, Advanced Java (JSP, Servlets and Spring), Scala, Python, Shell/Perl Scripting, PL/SQL, Hibernate
Databases: Oracle, MySQL and MS Access
Methodologies: Agile, Waterfall
Build Tools: Maven, ANT, Log4j
Reporting: QlikView, Tableau
Operating Systems: Linux/Unix, Windows
Confidential, Bridgewater, NJ
Sr. Big Data Hadoop Developer
- Designed a pipeline to collect, clean, and prepare data for analysis using MapReduce, Spark, Pig, Hive, and HBase, with reporting in Tableau.
- Used Kafka to ingest real-time data streams and push the data to HDFS and HBase as appropriate.
- Created and modified UDFs and UDAFs for Hive whenever necessary.
- Managed and reviewed Hadoop log files to identify issues when jobs failed.
- Followed a parameterized approach for schema details, file locations, and delimiter details to make the code efficient and reusable.
- Wrote shell scripts to schedule and automate regular daily tasks.
- Tested, built, and deployed applications on Linux.
- Automated the copying of files into the Hadoop system at regular intervals for testing purposes.
- Optimized performance on large datasets using partitions, Spark in-memory capabilities, Spark broadcasts, efficient joins, and transformations, performing the heavy lifting during the ingestion process itself.
- Used the Spark API to perform analytics on data in Hive.
- Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Created Hive tables and loaded them with data.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Shared responsibility for administration of Apache Hadoop and Hive.
- Created Tableau reports using HiveQL.
Environment: Hortonworks HDP 2.3, Java, Hive, Linux, Spark, Spark Streaming, Spark SQL, DataFrames, Flume, Kafka, Scala, HBase, Git, HiveQL, Eclipse, Maven, Tableau.
Confidential, Dallas, TX
Big Data Hadoop Developer
- Analyzed and wrote Hadoop MapReduce jobs using the MapReduce API, Pig, and Hive.
- Gathered the business requirements from the Business Partners and Subject Matter Experts.
- Involved in installing Hadoop Ecosystem components.
- Managed data coming from different sources.
- Supported MapReduce programs running on the cluster.
- Wrote MapReduce jobs using the Java API for data analysis and dimension and fact generation.
- Installed and configured Pig and wrote Pig Latin scripts.
- Worked on the backend using Scala and Spark.
- Wrote MapReduce jobs using Pig Latin.
- Used Sqoop to import data from MySQL into HDFS on a regular basis.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Developed Java MapReduce programs to transform mainframe data into a structured format.
- Performed data analysis in Hive by creating tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Developed optimal strategies for distributing the mainframe data over the cluster, and imported and exported the stored mainframe data to and from HDFS and Hive.
- Implemented Hive generic UDFs to incorporate business logic into Hive queries.
- Used the HBase API to store data from Hive tables into HBase tables.
- Wrote Hive queries to join multiple tables based on business requirements.
- Built applications using Maven and integrated them with CI servers such as Jenkins to run build jobs.
- Conducted a POC for Hadoop and Spark as part of the NextGen platform implementation.
Environment: CDH4, Java, MapReduce, HDFS, Hive, Spark, Scala, Pig, Linux, XML, MySQL, MySQL Workbench, Cloudera, Maven, Java 6, Eclipse, PL/SQL, SQL connector, Subversion.
Confidential, Dallas, TX
- Read multiple data formats on HDFS using Scala.
- Worked with Hadoop ecosystem components including HDFS, MapReduce, Hive, Pig, Sqoop, and HBase.
- Converted Cassandra/Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Developed multiple POCs using Scala, deployed them on the YARN cluster, and compared the performance of Spark with that of Cassandra and SQL.
- Loaded data from the UNIX file system into HDFS.
- Used Pig as an ETL tool to perform transformations, event joins, and some pre-aggregations before storing the data in HDFS.
- Extracted data from databases into HDFS using Sqoop.
- Imported data from various data sources, performed transformations using Hive and Spark, and loaded the data into HDFS.
- Managed and reviewed Hadoop log files. Implemented a lambda architecture as a solution to a problem.
- Participated in the analysis, design, and testing phases and was responsible for documenting technical specifications.
- Applied a very good understanding of partitioning and bucketing concepts, and of managed and external tables in Hive, to optimize performance.
- Migrated Hive queries into Spark transformations using DataFrames, Spark SQL, SQLContext, and Scala.
- Developed Hadoop Streaming MapReduce jobs using Python.
- Worked extensively on the Spark Core and Spark SQL modules.
- Experienced in running Hadoop Streaming jobs to process terabytes of data in Hive and designed both.
Environment: CDH5, Hadoop, HDFS, MapReduce, YARN, Hive, Pig, Oozie, Sqoop, Oracle 11g, SQL, Linux, Shell scripting, Java, Spark, Scala, SBT, Storm, Kafka, Eclipse, Amazon S3, JD Edwards EnterpriseOne, JIRA, Git Stash.
- Responsible for gathering and understanding the system requirements by interacting with clients.
- Generated class diagrams and sequence diagrams extensively for the entire process flow using RAD.
- Implemented Spring MVC to integrate business logic with model and DAO classes using Hibernate.
- Interpreted and modified Spring and Hibernate configuration files.
- Worked on JMS and Messaging Queue (MQ) configurations.
- Consumed external web services from different development centers (DCs) by creating service contracts through WSRR (WebSphere Service Registry and Repository), and validated the services using SoapUI.
- Worked on SOAP-based web services and tested them using SoapUI.
- Used Jenkins to build the application on the server.
- Worked extensively on deployment and configuration of the application on WebSphere server (DEV and QA-Smoke) and WebSphere Portal for integration of all modules.
- Developed documentation for QA Environment.
- Loaded records from the legacy database (DB2 v10) into the existing one (Cassandra 1.2.8).
- Synchronized the creation, update, and deletion of records between the legacy database (DB2 v10) and Cassandra 1.2.8.
- Created stored procedures, SQL statements, and triggers for the effective retrieval and storage of data in the database.
- Developed the application using Agile methodology (Scrum) and an iterative process.
- Used Apache Log4j logging API to log errors and messages.
- Involved in 24x7 support, maintenance, and enhancement of the application.
Environment: JDK, Spring Framework, XML, HTML, JSP, Hibernate, ANT, JavaScript, XSLT, CSS, AJAX, JMS, SOAP Web Services, WebSphere Application Server, Tomcat, DB2, Cassandra, PL/SQL, MQ Series, JUnit, Log4j, Shell scripting, UNIX.
- Developed front-end screens using JSP, HTML and CSS.
- Developed server-side code using Struts and Servlets.
- Developed core Java classes for exceptions, utility classes, business delegate, and test cases.
- Developed SQL queries using MySQL and established connectivity.
- Worked with Eclipse using Maven plugin for Eclipse IDE.
- Tested the application functionality with JUnit Test Cases.
- Developed all the User Interfaces using JSP and Struts framework.
- Extensively used jQuery for developing interactive web pages.
- Developed the DAO layer using Hibernate and used Hibernate's caching system for real-time performance.
- Experience in developing web services for production systems using SOAP and WSDL.
- Developed the user interface presentation screens using HTML, XML, and CSS.
- Experience in working with Spring using AOP, IoC, and JDBC template.
- Developed shell scripts to trigger the Java batch job and send a summary email with the batch job status and processing summary.
- Coordinated with the QA lead on development of the test plan, test cases, test code, and actual testing; responsible for defect allocation and for ensuring that defects were resolved.
- The application was developed in Eclipse IDE and was deployed on Tomcat server.
- Worked within the Scrum methodology.
- Provided support for bug fixes and functionality changes.