- 8+ years of overall experience in the IT industry, including Java development, database management, big data technologies and web applications in multi-tiered environments using Java, Hadoop, Spark, Hive, HBase, Pig, Sqoop, J2EE (Spring, JSP, Servlets), JDBC, HTML, CSS and JavaScript (AngularJS).
- 4+ years of comprehensive experience in Big Data Analytics, Hadoop and its ecosystem components.
- In-depth understanding of Hadoop architecture and its components, such as JobTracker, TaskTracker, NameNode, DataNode and ResourceManager, and of MapReduce concepts.
- Hands-on experience in Installing, Configuring, Testing Hadoop Ecosystem components.
- Experience in analyzing data using HiveQL, Pig Latin and custom MapReduce programs in Java.
- Experience in importing and exporting data using Sqoop from Relational Database Systems to HDFS and vice-versa.
- Very good understanding of partitioning and bucketing concepts in Hive; designed external tables in Hive to optimize performance.
- Collected and aggregated large amounts of log data using Apache Flume and stored it in HDFS for further analysis.
- Job workflow scheduling and monitoring using tools like Oozie.
- Comprehensive experience in building web-based applications using J2EE frameworks such as Spring, Hibernate, Struts and JMS.
- Worked in complete Software Development Life Cycle (analysis, design, development, testing, implementation and support) using Agile Methodologies.
- Transformed some existing programs into a Lambda Architecture.
- Experience in installation, configuration, support and monitoring of Hadoop clusters using Apache and Cloudera distributions and AWS.
- Experience in working with various Cloudera distributions (CDH4/CDH5), Hortonworks and Amazon EMR Hadoop Distributions.
- Experience in setting up Hadoop in a pseudo-distributed environment.
- Experience in setting up Hive, Pig, HBase and Sqoop on Ubuntu.
- Assisted in Cluster maintenance, Cluster Monitoring, Managing and Reviewing data backups and log files.
- Experience in different layers of Hadoop Framework - Storage (HDFS), Analysis (Pig and Hive), Engineering (Jobs and Workflows).
- Expertise in reducing network traffic using Combiners, joining datasets with different schemas using joins, and organizing data using partitions and buckets.
- Experienced in using IDEs such as Eclipse and NetBeans, and text editors such as Kate and gedit.
- Migrated data from databases such as Oracle, DB2, Cassandra and MongoDB to Hadoop.
- Generated ETL reports using Tableau and created statistics dashboards for Analytics.
- Familiarity with common computing environments (e.g. Linux, shell scripting).
- Familiar with Java virtual machine (JVM) and multi-threaded processing.
- Detailed understanding of Software Development Life Cycle (SDLC) and sound knowledge of project implementation methodologies including Waterfall and Agile.
- Development experience with Big Data/NoSQL platforms, such as MongoDB and Apache Cassandra.
- Worked on migrating RDBMS databases into different NoSQL databases.
- Strong hands-on experience with DW platforms and databases such as MS SQL Server 2012 and 2008, Oracle 11g/10g/9i, MySQL, DB2 and Teradata.
- Experience in designing and coding web applications using Core Java and web technologies (JSP, Servlets and JDBC).
- Extensive experience solving analytical problems by applying quantitative approaches and machine learning methods in R.
- Excellent knowledge of Java and SQL for application development and deployment.
- Familiar with data warehousing fact and dimension tables and star schemas, combined with Google Fusion Tables for visualization.
- Good working experience in PySpark and Spark SQL.
- Experience in creating various database objects like tables, views, functions, and triggers using SQL.
- Good team player with ability to solve problems, organize and prioritize multiple tasks.
- Excellent technical, communication, analytical, troubleshooting and problem-solving skills; works well with people from diverse cultural backgrounds.
Hadoop/Big Data Technologies: MapReduce, HDFS, Hive, Pig, Sqoop, Spark, Storm, Kafka, Flume, ZooKeeper, Oozie, Impala
Programming Languages: Java, C, C++, PL/SQL, Python, R, C#, Scala.
Java/J2EE Technologies: Servlets, JSP, JDBC, Java Beans, RMI & Web services.
Scripting Languages: Unix Shell Scripting, SQL, AngularJS
Web Services: RESTful, SOAP.
DBMS: Oracle 11g, SQL Server, MySQL, IBM DB2.
IDEs: Eclipse, NetBeans, WinSCP, Visual Studio and IntelliJ.
Operating systems: Windows, UNIX, Linux (Ubuntu), Solaris, CentOS.
Version and Source Control: CVS, SVN and IBM Rational ClearCase.
Servers: Apache Tomcat, WebLogic and WebSphere.
Frameworks: MVC, Spring, Struts, Log4J, JUnit, Maven, Ant.
ETL Tools: Talend, Informatica
Visualization: Tableau and MS Excel
Confidential, Phoenix, Arizona
- Installed and configured MapReduce, Hive and HDFS; implemented a CDH5 Hadoop cluster on CentOS. Assisted with performance tuning and monitoring.
- Conducted code reviews to ensure correct system operation and prepared code modules for staging.
- Served as project manager for this project, contributing to management and estimation activities.
- Ran a Scrum-based agile development group.
- Played a key role in driving high-performance infrastructure strategy, architecture and scalability.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Created end-to-end Spark-Solr applications using Scala to perform data cleansing, validation and transformation according to requirements.
- Experienced with SparkContext, Spark SQL, pair RDDs and Spark on YARN.
- Used Spark Streaming APIs to perform transformations and actions on the fly to build a common learner data model, which ingests data from Kafka in near real time and persists it to Cassandra.
- Utilized high-level information architecture to design modules for complex programs.
- Wrote scripts to automate application deployments and configurations; monitored YARN applications.
- Implemented HAWQ to run queries faster than with other Hadoop-based query interfaces.
- Wrote MapReduce programs to clean and preprocess data coming from different sources.
- Implemented output formats such as SequenceFile and Parquet in MapReduce programs, and used multiple output formats in the same program to match the use cases.
- Implemented test scripts to support test driven development and continuous integration.
- Converted text files into Avro and then into Parquet format so the files could be used with other Hadoop ecosystem tools.
- Experienced in loading and transforming large sets of structured, semi-structured and unstructured data.
- Exported the analyzed data to HBase using Sqoop to generate reports for the BI team.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Participated in the requirements gathering and analysis phase of the project, documenting business requirements by conducting workshops and meetings with business users.
- Worked on external HAWQ tables, loading data directly from CSV files and then loading it into internal tables.
- Responsible for implementation and ongoing administration of Hadoop infrastructure.
- Performance tuning of Hadoop clusters and Hadoop MapReduce routines.
- Manage and review Hadoop log files.
- File system management and monitoring.
- Point of Contact for Vendor escalation.
Environment: MapReduce, HDFS, Hive, Pig, Hue, Spark, Kafka, Oozie, Core Java, Eclipse, HBase, Flume, Cloudera Manager, Oracle 10g, DB2, IDMS, VSAM, SQL*PLUS, Toad, PuTTY, Windows NT, UNIX Shell Scripting, Pentaho Big Data, YARN, HAWQ, Spring XD, CDH.
Confidential, Tampa, FL
- Developed, automated and maintained scalable cloud infrastructure to help process terabytes of data.
- Solved most of the issues in Hive and introduced tools to optimize queries for better performance.
- Work with data science team and software engineers to automate and scale their work.
- Automated and built scalable infrastructure in AWS.
- Designed and implemented Hive and Pig UDFs for evaluating, filtering, loading and storing data.
- Created Hive tables as internal or external tables per requirements, defined with appropriate static and dynamic partitions for efficiency.
- Wrote a Storm topology to accept data from a Kafka producer and process it.
- Loaded and transformed large sets of structured and semi-structured data using Hive and Impala.
- Processed real-time streaming data using Spark with Kafka.
- Connected Hive and Impala to the Tableau reporting tool and generated graphical reports.
- Configured and tested Presto ODBC/JDBC connectivity, and documented and demonstrated its efficiency in querying data for business use.
- Worked on migrating MapReduce programs, initially rewritten in Python (PySpark), into Spark transformations using Scala.
- Queried data using Spark SQL on top of the Spark engine for faster data set processing.
- Developed Scala scripts using both DataFrames/SQL/Datasets and RDDs/MapReduce in Spark for data aggregation, queries and writing data back to the OLTP system.
- Worked on Active Batch Directory to automate the incremental scripts; identified and resolved many issues.
- Wrote many scripts in Redshift and moved their scope from Hive to Redshift because the data was relational.
- Wrote Lambda functions to stream incoming data from APIs, created the tables in DynamoDB and then ingested the data into AWS S3.
- Built the framework for incremental queries using shell scripts, working in SQL Server and PostgreSQL.
- Participated in multiple big data POCs to evaluate different architectures, tools and vendor products.
- Solved many issues in Hive, Impala and Presto.
- Analyzed big datasets and changed the existing workflow for efficiency, working in an Agile methodology.
Environment: MapReduce, HDFS, Core Java, Eclipse, Hive, Pig, Scala, Impala, Kafka, Tableau, Spark, Hue, Ganglia, Presto-Sandbox, Zeppelin-Sandbox, SQL Server, PostgreSQL, Agile, Python.
Confidential, Auburn Hills, MI
- Installed and configured Hadoop ecosystem components and Cloudera Manager using the CDH distribution.
- Frequent interactions with Business partners.
- Designed and developed a Medicare-Medicaid claims system using Model-driven architecture on a customized framework built on Spring.
- Moved data from HDFS to Cassandra using MapReduce and BulkOutputFormat class.
- Imported trading and derivatives data into the Hadoop Distributed File System and ecosystem (MapReduce, Pig, Hive, Sqoop).
- Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries and Pig scripts.
- Created tables in HBase and loaded data into HBase tables.
- Developed scripts to load data from HBase into the Hive metastore and run MapReduce jobs.
- Was part of an effort to set up the Hadoop ecosystem in dev and QA environments.
- Managed and reviewed Hadoop log files.
- Responsible for writing Pig scripts and Hive queries for data processing.
- Ran Sqoop to import data from Oracle and other databases.
- Created shell scripts to collect raw logs from different machines.
- Created static and dynamic partitions in Hive.
- Implemented Pig Latin scripts using operators such as LOAD, STORE, DUMP, FILTER, DISTINCT, FOREACH, GENERATE, GROUP, COGROUP, ORDER, LIMIT and UNION.
- Optimized the Hive tables using optimization techniques like partitions and bucketing to provide better performance with Hive QL queries.
- Defined Pig UDFs for financial functions such as swap, hedging, speculation and arbitrage.
- Coded many MapReduce programs to process unstructured log files.
- Imported and exported data into HDFS and Hive using Sqoop.
- Used different data formats (text and Avro) while loading data into HDFS.
- Used parameterized Pig scripts and optimized them using ILLUSTRATE and EXPLAIN.
- Involved in configuring HA, resolving Kerberos security issues and restoring from NameNode failures as part of maintaining zero downtime.
- Implemented the Fair Scheduler.
Environment: Hadoop, Linux, MapReduce, HDFS, HBase, Hive, Pig, Shell Scripting, Sqoop, CDH Distribution, Windows, Java 6, Eclipse, Ant, Log4j and JUnit
Confidential, Chicago, IL
- Involved in gathering requirements and analysis through interaction with the end users.
- Worked directly with clients in automating release management tasks, reducing defect counts in the testing phases to ensure smooth implementation of projects.
- Involved in the design and creation of Class diagrams, Sequence diagrams and Activity Diagrams using UML models.
- Wrote scripts to validate the data.
- Designed and developed the application using various Design Patterns such as Front controller, Session Facade and Service Locator.
- Involved in developing JSP pages using Struts custom tags, jQuery and Tiles Framework.
- Implemented Singleton classes for loading properties and static data from the DB.
- Debugged and developed applications using Rational Application Developer (RAD).
- Developed a Web service to communicate with the database using SOAP.
- Developed DAO (data access objects) using Spring Framework 3.
- Deployed the components into WebSphere Application Server 7.
- Generated build files using Maven tool.
- Developed a test environment for testing all the web services exposed as part of the core module and their integration with partner services in integration testing.
- Involved in writing queries, stored procedures and functions using SQL and PL/SQL, and in backend tuning of SQL queries and DB scripts.
- Responsible for performing end-to-end system testing of the application and writing JUnit test cases.
- As part of the development team, contributed to application support during the soft launch and UAT phases and to production support, using IBM ClearQuest for fixing bugs.
Environment: Java EE, IBM WebSphere Application Server, Apache Struts, EJB, Spring, JSP, Web Services, jQuery, Servlet, Struts-Validator, Struts-Tiles, Tag Libraries, Maven, JDBC, Oracle 10g/SQL, JUnit, CVS, AJAX, Rational ClearCase, Eclipse, JSTL, DHTML, Windows, UNIX.
- Involved in various SDLC phases such as design, development and testing.
- Involved in developing JSP pages using Struts custom tags, jQuery and Tiles Framework.
- Used various Core Java concepts such as Exception Handling, Collection APIs to implement various features and enhancements.
- Developed server-side components (servlets) for the application.
- Involved in coding, maintaining, and administering Servlets and JSP components to be deployed on a WebSphere application server.
- Used automated test scripts and tools to test the application in various phases. Coordinated with Quality Control teams to fix the issues identified.
- Implemented Hibernate ORM to map relational data directly to Java objects.
- Worked with Complex SQL queries, Functions and Stored Procedures.
- Involved in developing the Spring Web MVC framework for the portals application.
- Implemented the logging mechanism using log4j framework.
- Developed REST APIs and web services.
- Wrote test cases in JUnit for unit testing of classes.
- Used Maven to build the J2EE application.
- Used SVN to track and maintain the different versions of the application.
- Involved in maintenance of different applications with onshore team.
- Good working experience in Tepestry processing claims.
- Working experience with professional billing claims.
Environment: Java, Spring Framework, Struts, Hibernate, RAD, SVN, Maven, WebSphere Application Server, Web Services, Oracle Database 11g, IBM MQ, JMS, HTML, JavaScript, XML, CSS, REST API.