- Over 7 years of IT experience with strong emphasis on the design, development and implementation of Big Data Hadoop data warehouse/business intelligence solutions using HDFS, MapReduce and the Hadoop ecosystem, plus development experience using Java, J2EE, JSP and Servlets.
- Excellent understanding of Hadoop architecture and various components such as HDFS, YARN, High Availability, and MapReduce programming paradigm.
- Expertise in setting up processes for Hadoop based application design and implementation.
- Experience in importing and exporting data using Sqoop from HDFS to relational database systems and vice versa.
- Experience in managing and reviewing Hadoop log files.
- Experienced in processing Big data on the Apache Hadoop framework using MapReduce programs.
- In-depth knowledge of Hadoop architecture and various components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, MRv1 and MRv2 (YARN).
- Experienced in creating MapReduce jobs in Java as per the business requirements.
- Implemented Sqoop jobs for migrating large sets of structured and semi-structured data between HDFS and other data stores such as Hive or RDBMS.
- Experience in developing pipelines that ingest data from various sources and process it with Hive.
- Involved in converting Hive queries into Spark SQL transformations using Spark RDDs and Scala.
- Explored Spark to improve the performance of existing algorithms in Hadoop using SparkContext and Spark SQL.
- Extracted data from log files and pushed it into HDFS using Flume.
- Knowledge of job workflow scheduling and monitoring tools like Oozie and Zookeeper.
- Knowledge of the Kafka publish-subscribe messaging system.
- Used Kafka for message brokering, streaming and log aggregation to put physical logs into centralized locations.
- Extracted the data from MySQL, Oracle, SQL Server using Sqoop and loaded data.
- Extensive knowledge in using SQL Queries for backend database analysis.
- Strong understanding of NoSQL databases like HBase, MongoDB.
- Proficient in deploying applications on J2EE application servers like WebSphere, WebLogic, GlassFish and JBoss, and on the Apache Tomcat web server.
- Worked extensively on Web services and the Service-Oriented Architecture (SOA), Simple Object Access Protocol (SOAP).
- Motivated self-starter with excellent communication, presentation and problem-solving skills, committed to learning new technologies.
- Committed to professionalism and highly organized, with the ability to work under strict deadlines, attention to detail and excellent written communication skills.
Hadoop/Big Data: HDFS, MapReduce, Pig, Hive, HBase, Sqoop, Oozie, Spark, Scala, Kafka, Zookeeper, Impala, Cassandra, MongoDB
Programming languages: C, C++, Java, Linux shell script, Python
Database: NoSQL, Oracle, DB2, MySQL, SQL Server, MS Access, HBase
Operating Systems: Windows, UNIX, Linux
IDE Tools: Eclipse, NetBeans
Web Technologies: J2EE, Servlets, JSP, Struts, Hibernate, EJB, XML, MVC, Spring
Development Approach: Agile, Waterfall
Version Control: CVS, SVN, Git
Reporting Tools: Jaspersoft iReport, Tableau, QlikView
Hadoop/Big Data Developer
Confidential, South Portland, ME
- Responsible for building scalable distributed data solutions using Hadoop.
- Analyzed business needs and functional specifications and mapped them to the design and development of programs and algorithms.
- Involved in loading data from RDBMS into HDFS using Sqoop.
- Handled delta processing (incremental updates) using Hive and processed the data in Hive tables.
- Optimized MapReduce code and Hive scripts for better scalability, reliability and performance.
- Assisted with performance tuning and monitoring.
- Developed the Oozie workflows for application execution.
- Involved in creating Hive Tables, loading with data and writing Hive queries.
- Developed Pig Latin scripts for extracting data from source systems.
- Documented system processes and procedures for future reference, including design and code reviews and test development.
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
- Involved in all phases of the SDLC including analysis, design, development, testing, and deployment of Hadoop cluster.
- Extensively worked on Oozie for batch processing and scheduling workflows dynamically.
- Developed transformations and aggregated the data for large data sets using Pig and Hive scripts.
- Used partitioning and bucketing in Hive tables and ran the scripts in parallel to improve performance.
- Thorough knowledge of Spark architecture and how RDDs work internally.
- Streamed data in real time using Spark with Kafka.
- Have exposure to Spark SQL.
- Have experience in Scala programming language and used it extensively with Spark for data processing.
- Developed Spark scripts by using Scala shell commands as per the requirement.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Developed Scala scripts and UDFs using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark 1.6 for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
- Experienced in performance tuning of Spark applications, including setting the right batch interval, the correct level of parallelism and memory tuning.
- Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs.
Environment: HDFS, MapReduce, Java, Hive, Oozie, Spark, Scala, Shell Scripting, Linux, HUE, Sqoop, Flume, Kafka and Oracle.
Hadoop/Big Data Developer
Confidential, Denver, CO
- Loaded data into HDFS by developing solutions, analyzed the data using MapReduce, Pig and Hive, and produced summary results from Hadoop for downstream systems.
- Used Kettle widely to import data from various systems/sources like MySQL into HDFS.
- Performed various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Involved in creating Hive tables, and then applied HiveQL on those tables for data validation.
- Monitored the running MapReduce programs on the cluster.
- Installed and configured MapReduce, Hive and HDFS; implemented a CDH3 Hadoop cluster on CentOS.
- Involved in creating Hive Tables, loading with data and writing Hive queries which will invoke and run MapReduce jobs in the backend.
- Wrote MapReduce (Hadoop) programs to convert text files into Avro and load them into Hive (Hadoop) tables.
- Implemented the workflows using Apache Oozie framework to automate tasks.
- Developed design documents considering all possible approaches and identifying the best of them.
- Developed scripts that automated data management end to end and synchronized data between all the clusters.
- Explored Spark to improve the performance of existing algorithms in Hadoop.
- Imported data from different sources like HDFS/HBase into Spark RDDs.
- Experienced with SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Configured Spark Streaming to receive ongoing data from Kafka and store the stream data in HDFS.
- Used various Spark transformations and actions for cleansing the input data.
- Developed shell scripts to generate the Hive create statements from the data and load the data into the tables.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDD, Scala and Python.
- Involved in gathering the requirements, designing, development and testing.
- Followed agile methodology for the entire project.
- Prepared technical design documents and detailed design documents.
Environment: Hive, HBase, Flume, Java, Maven, Impala, Splunk, Pig, Spark, Oozie, Oracle, YARN, GitHub, JUnit, Tableau, Unix, Cloudera, Sqoop, HDFS, Tomcat, Scala, Python.
Confidential, Memphis, TN
- Performed Hadoop cluster administration, including adding and removing cluster nodes, cluster capacity planning, performance tuning, cluster monitoring and troubleshooting.
- Worked on a Hadoop cluster that ranged from 5-10 nodes during the pre-production stage and was sometimes extended up to 25 nodes during production.
- Involved in the configuration of System architecture by implementing Hadoop file system in master and slave systems in Red Hat Linux Environment.
- Developed Map Reduce programs to cleanse data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Responsible for building scalable distributed data solutions using Hadoop and migrating legacy retail ETL applications to Hadoop.
- Wrote SQL queries to process the data using Spark SQL.
- Extracted data from different databases and copied it into HDFS using Sqoop.
- Involved in loading and transforming large sets of structured, semi-structured and unstructured data from relational databases into HDFS using Sqoop imports.
- Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
- Used Maven with SOAP Web services (JAX-WS) using XML, WSDL and Apache CXF.
- Used Spring Integration (SI) to expose some services of our application for other applications in the company to use.
- Used SOAP UI to test the SOAP Web services.
- Created complex Stored Procedures, Triggers and User Defined Functions to support the front-end application.
- Participated in troubleshooting production issues and coordinated with team members on defect resolution under tight timelines.
- Involved in end-to-end implementation in the production environment, validating the implemented modules.
Environment: Apache Hadoop, Hive, Pig, HDFS, Java, UNIX, MySQL, Eclipse, Sqoop, REST/SOAP API
Confidential, Atlanta, GA
- Ingested historical medical claims data into HDFS from different data sources, including databases and flat files, and processed it using Spark, Scala and Python.
- Hive external tables were used for raw data and managed tables were used for intermediate tables.
- Developed Hive Scripts (HQL) for automating the joins for different sources.
- Responsible for data analysis, validation, cleansing, collection and reporting using R.
- Worked with Git, Jira and Tomcat in Linux/Windows environments.
- Experienced in shell scripting and automating jobs using crontab.
- Developed the Shell scripts for batch reports based on the given requirements.
- Wrote code using Teradata analytical functions and Teradata BTEQ SQL, and wrote UNIX scripts to validate, format and execute the SQL.
- Developed interactive dashboards, created various Ad hoc reports for users in Tableau by connecting various data sources.
- Implemented classification using supervised learning algorithms such as Logistic Regression, Decision Trees, KNN and Naive Bayes.
- Performed exploratory data analysis and developed interactive dashboards using Tableau.
- Involved in resolving defects found in testing and production support.
- Wrote Sub Queries, Stored Procedures, Triggers, Cursors, and Functions on Oracle database.
Environment: Hadoop, Hive, MapReduce, HDFS, Sqoop, HBase, Pig, Oozie, Java, Bash, MySQL, Oracle, Windows and Linux.
- Gathered specifications from the requirements.
- Developed the application using Struts MVC 2 architecture.
- Developed JSP custom tags and Struts tags to support custom User Interfaces.
- Developed front-end pages using JSP, HTML and CSS
- Developed core Java classes for utility classes, business logic, and test cases
- Developed SQL queries using MySQL and established connectivity
- Used Stored Procedures for performing different database operations
- Used JDBC for interacting with Database
- Developed servlets for processing the request
- Used Java exception handling to manage error conditions
- Designed sequence diagrams and use case diagrams for proper implementation
- Used Rational Rose for design and implementation
- Involved in preparing functional definition documents and in discussions with business users and the testing team to finalize the technical design documents.
- Enhanced the Web Application using Struts.
- Created business logic and application in Struts Framework using JSP, and Servlets.
- Documented the code using Javadoc-style comments.
- Wrote unit test cases for different modules and resolved the test findings.
- Implemented SOAP using Web services to communicate with other systems.
- Wrote JSPs, Servlets and deployed them on WebLogic Application server.
- Developed automated Build files using Maven.
- Used Subversion for version control and log4j for logging errors.
- Wrote Oracle PL/SQL stored procedures and triggers.
- Helped the production support team resolve trouble reports.
- Involved in Release Management and Deployment Process.