- Over 7 years of experience in the IT industry, including 3+ years of experience in Big Data practices and technologies such as HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Oozie, Flume, Spark and Kafka.
- 3 years of extensive experience in the design and development of multi-tier applications using JDK 1.8, J2EE and the Spring Framework.
- Expertise in developing Spark Streaming applications in Python and Scala using Spark RDDs, DataFrames and Spark SQL.
- Expertise in creating Kafka topics, MapR Streams and MapR Topics.
- Highly skilled in integrating Kafka with Spark Streaming for high-speed data processing.
- Worked with NoSQL databases such as HBase and MapR-DB for information extraction and for storing large volumes of data.
- Extensive experience in Shell scripting.
- Experience using the Oozie workflow engine to run workflows with actions that execute Sqoop, Pig and Hive jobs.
- Experience importing and exporting data between relational databases (MySQL, Oracle) and HDFS/Hive using Sqoop.
- Extensive experience working with Oracle, DB2, SQL Server and MySQL databases.
- Experience with cluster monitoring tools such as MapR Control System (MCS) and Cloudera Manager.
- Experience in developing real-time streaming pipelines using Apache Kafka, MapR Streams and Spark Streaming.
- Hands-on experience with cloud services such as AWS.
- Knowledge of different Hadoop distributions such as Cloudera and MapR.
- Worked with data streaming and ETL tools such as StreamSets Data Collector and Attunity Replicate.
- Used Agile (Scrum) methodologies for software development.
- Wrote Hive queries for data analysis and data transfer, and designed tables to load data into the Hadoop environment.
- Worked closely with business partners to understand business requirements and design solutions based on those requirements.
- Extensively involved in writing PL/SQL stored procedures, functions and packages.
- Experience in writing UDFs and configuring cron jobs.
- Good knowledge of reporting and data visualization tools such as Oracle Data Visualization Desktop, Tableau and Grafana.
- Outstanding communication and presentation skills; willing to learn and adapt to new technologies and third-party products.
Hadoop/Big Data: HDFS, Hive, MapReduce, Spark, Sqoop, HBase, Kafka, Oozie
NoSQL Databases: HBase, MapR-DB
Languages: Python, Scala, Core Java, Unix Shell scripts, SQL
Web/Application Server: Apache Tomcat
Databases: Oracle, DB2, SQL Server, MySQL
IDEs: Eclipse, IntelliJ, DbVisualizer
Other Tools & Packages: CVS, SVN, JUnit, Maven, ANT, GitHub, StreamSets Data Collector, Oracle DVD, Grafana, Tableau
SDLC Methodology: Agile, Waterfall model
Operating Systems: Linux, UNIX, Windows
Office Tools: MS Office, Word, PowerPoint
Confidential - Bloomfield, CT
Big Data Developer
- Configured real-time streaming pipeline from DB2 to HDFS using Apache Kafka.
- Created Kafka topics.
- Developed PySpark application to consume data from Apache Kafka topics and publish to HDFS and HBase.
- Worked on Apache Flume to stream data from Oracle to Apache Kafka topics.
- Managed Docker images using Quay.
- Created Hive managed and external tables.
- Used Hue and Cloudera Manager to monitor Spark jobs.
- Developed Sqoop scripts to load data from Oracle into Hive external tables.
- Built real-time visualizations in Grafana.
Environment: CDH 5.7.0, Apache Kafka 1.0.0, Hive 1.1.0, HBase 1.2.0, Hue, Cloudera Manager, Spark 1.6.0, Python 2.6.6, Sqoop, Oozie, Pig, IntelliJ, Kafka Connect Framework, Grafana, Git.
Confidential - Boston, MA
Big Data Developer
- Involved in requirement analysis, design, development and testing of the application.
- Configured Kafka Connect JDBC with SAP HANA and MapR Streams for both real-time streaming and batch processing.
- Created MapR-Event Streams and Kafka topics.
- Worked on Attunity Replicate to load data from SAP ECC to Apache Kafka topics.
- Developed a Spark Streaming application in Python to stream data from MapR Event Streams and Apache Kafka topics to Hive and MapR-DB, and to stream data from one topic to another within MapR Event Streams.
- Worked with DStreams (Discretized Streams), RDDs (Resilient Distributed Datasets), DataFrames and Spark SQL to build the Spark Streaming application.
- Wrote SQL queries to extract data and to perform joins on tables in SAP HANA and MySQL.
- Worked with NoSQL databases such as HBase, creating tables to load large sets of semi-structured data coming from source systems.
- Implemented partitioning, dynamic partitioning and bucketing in Hive for efficient data access.
- Used Hue and MapR Control System (MCS) to monitor and troubleshoot Spark jobs.
- Developed Sqoop scripts to move data from MapR-FS to SAP HANA.
- Developed an Oozie workflow to automate loading data into HDFS and pre-processing it with Pig.
- Installed and configured Kafka Connect JDBC on an AWS EC2 instance.
- Created stored procedures in MySQL to improve data handling and ETL transactions.
- Performed data validation using Hive and wrote Hive UDFs.
- Managed Linux and Windows virtual servers on AWS EC2.
- Used the Jenkins AWS CodeDeploy plugin to deploy to AWS.
- Configured the SAP HANA source connector with SAP HANA as the source and an Apache Kafka topic as the target for real-time streaming and batch processing.
- Provisioned, installed and configured SAP HANA enterprise edition on an AWS EC2 instance.
- Developed Streaming application to stream data from MapR ES to HBase.
- Streamed data from Apache Kafka topics to the time-series database OpenTSDB.
- Built dashboards and visualizations on top of MapR-DB and Hive using Oracle Data Visualization Desktop, and built real-time visualizations on top of OpenTSDB using Grafana.
- Wrote UNIX shell scripts and automated ETL processes using UNIX shell scripting.
Environment: MapR 6.0, Apache Kafka 1.0.0, Hive 2.1, HBase 1.1.8, Hue, MapR-DB, MapR-FS, Attunity Replicate, Spark 2.1.0, Python, AWS, SAP HANA, Sqoop, Oozie, Pig, IntelliJ, Kafka Connect Framework, DbVisualizer, Oracle Data Visualization Desktop, StreamSets Data Collector, MapR-ES, MySQL, Git.
Confidential - Minneapolis, MN
Big Data Developer
- Enhanced the performance of existing algorithms in Hadoop and optimized them using SparkContext, Spark SQL, DataFrames and Spark RDDs.
- Worked with MySQL to identify the tables and views to export into HDFS.
- Loaded data from MySQL into HDFS on the development cluster for validation and cleansing.
- Created Apache Kafka topics.
- Configured StreamSets Data Collector with Apache Kafka to stream real-time data from different sources (databases and files) into Kafka topics.
- Developed a streaming application to stream data from Kafka topics to Hive using Spark and Python.
- Worked on real-time and batch processing of data sources using Apache Spark, Elasticsearch, Spark Streaming and Apache Kafka.
- Created scripts to import data from DB2 into HDFS/Hive using Sqoop.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs, Python and Spark SQL.
- Migrated Pig scripts and MapReduce programs to the Spark DataFrames API and Spark SQL to improve performance.
- Conducted POCs for real-time streaming of data from MySQL to Hive and HBase.
- Worked with sequence files, RC files, map-side joins, bucketing and partitioning to improve Hive performance and storage efficiency.
- Handled importing and exporting of large data sets between various data sources and HDFS using Sqoop, performed transformations using Hive and loaded the data into HDFS.
- Built dashboards and visualizations on top of Hive using Tableau, published the reports to Tableau Online accounts and embedded them in web pages using iframes.
- Monitored the Hadoop cluster through Cloudera Manager and implemented alerts based on error messages.
- Used the JavaMail API to send text and email notifications to customers.
- Loaded data from the UNIX file system into HDFS.
Environment: Cloudera, Apache Kafka, HDFS, Python, Hive, Spark, Spark SQL, Pig, MapReduce, Sqoop, IntelliJ, Tableau, StreamSets Data Collector, UNIX, MySQL, Git.
Confidential - San Diego, CA
- Tested SQL scripts for report development and resolved performance issues.
- Performed data cleaning with SQL as required to ensure data quality, completeness and accuracy.
- Worked extensively with Flume to import data from various web servers into HDFS.
- Developed end-to-end Spark applications in Scala to perform data cleansing, validation, transformation and summarization according to requirements.
- Monitored systems and services through Cloudera Manager to keep the clusters available to the business.
- Worked on Flume to load the log data from multiple sources directly into HDFS.
- Developed Sqoop scripts for importing and exporting data into HDFS and Hive.
- Used ZooKeeper to store the offsets of messages consumed for a specific topic and partition by a specific consumer group in Kafka.
- Created HBase tables to load large sets of semi-structured data coming from various sources.
- Scheduled workflows using the Oozie workflow scheduler.
- Worked in an Agile environment and used JIRA to maintain the project's stories.
- Used Sqoop import and export functionality to transfer large data sets between the DB2 database and HDFS.
- Created Hive queries to extract data and send it to clients.
Environment: Apache Kafka, HDFS, Hive, Spark, Git, MySQL, Pig, Python, DB2, Cron Jobs, UNIX, Spark SQL, Sqoop, Oozie.
- Involved in implementing the design across the key phases of the software development life cycle.
- Involved in design, development and testing of the application.
- Applied object-oriented programming concepts to validate the columns of the import file.
- Responsible for creating RESTful Web services using JAX-RS.
- Used a DOM parser to parse XML files.
- Implemented a complex back-end component using Java multithreading to return counts quickly from a large MySQL database (about 40 million rows).
- Worked in Agile development following the Scrum process, with sprints and daily stand-up meetings.
- Participated in OOAD, domain modeling and system architecture.
- Used WinSCP to transfer files from the local system to other systems.
- Wrote test cases for unit testing before the QA release.
- Worked closely with the QA team and coordinated on fixes.
- Analyzed and modified existing code where required and participated in developing the design document.
- Used Rational Rose for model-driven development and UML modeling.
- Responsible for and active in the analysis, design, implementation and deployment phases of the project's full software development life cycle (SDLC).
- Developed the presentation layer using JSP, HTML and CSS.
- Used JDBC to establish connections with the Oracle database and communicated with the database using PL/SQL.
- Participated in understanding business requirements and in the design and development of the project.
- Migrated the existing JSP/Servlets/Beans-based application to a Struts-based system.
- Developed JSP pages and client-side validation using JavaScript.
- Developed Web services for sending and getting data from different applications using.
- Implemented the Front Controller design pattern.
- Resolved critical bugs.
Environment: Java, J2EE Servlet, JSF 2, XML, JSON, HTML, CSS, jQuery, Spring 3.0, Log4j, Git, Maven, Eclipse, Apache Tomcat 6 and Oracle 11g.