- 8+ years of extensive IT experience with multinational clients, including 4+ years of recent experience in the Big Data/Hadoop ecosystem.
- Hands-on experience working with Apache Hadoop ecosystem components such as MapReduce, Hive, Pig, Sqoop, Spark, Flume, HBase, Kafka, Oozie and ZooKeeper.
- Excellent knowledge of Hadoop components such as HDFS and the MapReduce and YARN programming paradigms.
- Experience installing, configuring, supporting and managing Hadoop clusters and their underlying Big Data infrastructure.
- Experience analyzing data using HiveQL and Pig Latin and extending Hive and Pig core functionality with custom UDFs.
- Proficient in Relational Database Management Systems (RDBMS).
- Extensive working knowledge of partitioned tables, UDFs, performance tuning and compression-related properties in Hive.
- Good understanding of NoSQL databases and hands-on experience writing applications on NoSQL databases such as HBase.
- Hands-on experience using Amazon Web Services such as EC2, EMR, Redshift, DynamoDB and S3.
- Hands-on experience using Apache Kafka to track data ingestion into the Hadoop cluster and implementing custom Kafka encoders to load data in custom input formats into Kafka partitions.
- Good knowledge of using Apache NiFi to automate data movement between different Hadoop systems.
- Experience with Spark Streaming to ingest data from multiple data sources into HDFS.
- Hands-on experience with stream processing, including Storm and Spark Streaming.
- Knowledge of job workflow scheduling and monitoring tools such as Oozie.
- Experience in analyzing data using HBase and custom MapReduce programs in Java.
- Proficient in importing and exporting data between HDFS and relational database systems using Sqoop.
- Excellent knowledge of data transformations using MapReduce, Hive and Pig scripts for different file formats.
- Experience with scripting languages such as Linux/Unix shell scripting and Python.
- Involved in importing streaming data into HDFS using Flume and analyzing it with Pig and Hive.
- Experience using Flume to aggregate log data from web servers and load it into HDFS.
- Experience in scheduling and monitoring Oozie workflows for parallel execution of jobs.
- Proficient in Core Java, Servlets, Hibernate, JDBC and Web Services.
- Experience in all Phases of Software Development Life Cycle (Analysis, Design, Development, Testing and Maintenance) using Waterfall and Agile methodologies.
- Experience using Sequence, Avro and Parquet file formats; managing and reviewing Hadoop log files.
- Experience developing and maintaining applications on the AWS platform.
- Hands-on experience working with RESTful web services using JAX-RS and SOAP web services using JAX-WS.
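The Sqoop import/export work described above typically takes the form of CLI invocations like the following sketch; the connection string, credentials, table names and HDFS paths here are hypothetical placeholders, not values from any actual engagement.

```shell
# Hypothetical example: import a MySQL table into HDFS, then export results back
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /data/raw/orders \
  --num-mappers 4

sqoop export \
  --connect jdbc:mysql://dbhost:3306/reports \
  --username etl_user -P \
  --table order_summary \
  --export-dir /data/curated/order_summary
```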
Hadoop Ecosystem: Pig, Hive, Sqoop, Flume, HBase, Kafka, Storm, Spark with Scala, Oozie, ZooKeeper, Impala, Hadoop Distributions (Cloudera, Hortonworks)
Web Technologies: Ajax, jQuery, HTML, CSS, XML
Programming Languages: Java, Scala, C/C++, Python
Databases: MySQL, MS SQL Server, Oracle 11g, NoSQL (HBase, Cassandra)
Web Services: REST, SOAP, AWS, Microservices
Tools: Ant, Maven, JUnit, Apache NiFi, Talend, Airflow
Servers: Apache Tomcat, WebSphere, JBoss
IDEs: MyEclipse, Eclipse, IntelliJ IDEA, NetBeans
AWS: EC2, EMR, S3, Redshift, DynamoDB
ETL/BI Tools: Talend, Tableau, Pig
Confidential, Chicago, IL
- Handled importing data from various data sources and performed transformations using Hive and MapReduce.
- Involved in setting up Hadoop along with MapReduce, Hive and Pig.
- Loaded data into HDFS and extracted the data from MySQL into HDFS using Sqoop.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Wrote MapReduce programs for refined queries on big data.
- Managed and scheduled jobs on the Hadoop cluster.
- Analyzed the data by performing Hive queries and running Pig scripts to understand user behavior.
- Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
- Developed simple to complex MapReduce jobs using Hive.
- Implemented Partitioning and bucketing in Hive.
- Mentored the analyst and test teams in writing Hive queries.
- Worked with HiveQL on big data logs to perform trend analysis of user behavior across various online modules.
- Wrote data ingestion systems to pull data from traditional RDBMS platforms such as Oracle and Teradata and store it in NoSQL databases such as MongoDB, Couchbase and Cassandra.
- Worked with Elastic MapReduce and set up a Hadoop environment on AWS EC2 instances.
- Used Amazon EMR to process big data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
- Experience in managing and reviewing Hadoop log files.
- Extensively used Pig for data cleansing.
- Developed Pig UDFs to pre-process the data for analysis.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
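The Hive partitioning and bucketing mentioned above typically looks like the following HiveQL sketch; the table and column names are illustrative, not taken from the actual project.

```sql
-- Hypothetical illustration of a partitioned, bucketed Hive table
CREATE TABLE user_events (
  user_id BIGINT,
  action  STRING,
  ts      TIMESTAMP
)
PARTITIONED BY (event_date STRING)
CLUSTERED BY (user_id) INTO 32 BUCKETS
STORED AS ORC;

-- Dynamic-partition insert from a staging table
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE user_events PARTITION (event_date)
SELECT user_id, action, ts, to_date(ts) FROM staging_events;
```

Partitioning by date prunes scans for time-bounded queries, while bucketing by `user_id` helps with sampling and bucketed joins.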
Environment: Hadoop, MapReduce, HDFS, Hive, Java, Scala 2.12.8, Spark 2.1.0, Kafka, SQL, Pig, Sqoop, HBase, Zookeeper, MySQL, DB2, Teradata, AWS, Git, Agile.
Confidential, San Jose, CA
- Worked on analyzing the Hadoop stack and different big data analytics tools, including Pig, Hive, the HBase database and Sqoop.
- In-depth understanding of classic MapReduce and YARN architectures.
- Developed MapReduce programs for refined queries on big data.
- Created Azure HDInsight instances and deployed Hadoop clusters on the cloud platform.
- Used Hive queries to import data into the Microsoft Azure cloud and analyzed the data using Hive scripts.
- Used Ambari in the Azure HDInsight cluster to record and manage the data logs of the NameNode and DataNodes.
- Built pipelines to move hashed and un-hashed data from Azure Blob Storage to Data Lake using Azure Data Factory.
- Created Hive tables and worked on them for data analysis to meet the requirements.
- Developed a framework to load and transform large sets of unstructured data from UNIX systems into Hive tables.
- Worked with the business team to create Hive queries for ad hoc access.
- Implemented Hive generic UDFs to encapsulate business logic.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Configured, deployed and maintained multi-node Dev and Test Kafka clusters.
- Developed Pig UDFs to pre-process the data for analysis.
- Developed a Spark ingestion framework to load data from Hive external tables into internal tables in one shot.
- Created pipelines to move data from on-premise servers to Azure Data Lake.
- Developed Spark code for the consumption layer, incorporating Informatica logic, and loaded the data into Hive fact and dimension tables.
- Deployed a Hadoop cluster on Azure for big data analytics and loaded data into it for the data lake.
- Started using Apache NiFi to copy data from the local file system to HDFS.
- Used sbt to compile and package the Scala code into a JAR and deployed it to the cluster using spark-submit.
- Analyzed the data by performing Hive queries and running Pig scripts, Spark SQL and Spark Streaming.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Understanding of machine learning and statistical analysis with Spark.
- Developed a Spark Streaming script that consumes topics from the distributed messaging source Kafka and periodically pushes batches of data to Spark for real-time processing.
- Involved in creating a generic Sqoop import script for loading data into Hive tables from RDBMS sources.
- Involved in continuous monitoring of operations using Storm.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Implemented indexing of logs from Oozie into Elasticsearch.
- Experienced in implementing MapReduce logic on the Hortonworks distribution (HDP 2.1, HDP 2.2 and HDP 2.3).
- Designed, developed, unit tested and supported ETL mappings and scripts for data marts using Talend.
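The sbt-and-spark-submit deployment flow described above usually amounts to something like this sketch; the class name, JAR path, resource sizes and input/output paths are hypothetical, and the Scala version is assumed from the Spark 1.6-era toolchain listed in this role's environment.

```shell
# Hypothetical build-and-deploy flow: package the Scala job, then submit to YARN
sbt package

spark-submit \
  --class com.example.IngestJob \
  --master yarn \
  --deploy-mode cluster \
  --executor-memory 4g \
  --num-executors 8 \
  target/scala-2.10/ingest-job_2.10-1.0.jar /data/in /data/out
```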
Environment: Hortonworks, Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, Apache Kafka, Azure, Apache Storm, Oozie, SQL, Flume, Spark 1.6.1, HBase and GitHub.
Confidential, Omaha, Nebraska
- Developed simple to complex MapReduce jobs in Java for processing and validating the data.
- Developed a data pipeline using Sqoop, Spark, MapReduce and Hive to ingest, transform and analyze customer behavioral data.
- As a Developer, worked directly with business partners discussing the requirements for new projects and enhancements to the existing applications.
- Wrote extensive shell scripts to run appropriate programs.
- Wrote multiple queries to pull data from HBase.
- Reporting on the project based on Agile-Scrum Method. Conducted daily Scrum meetings and updated JIRA with new details.
- Developed a custom file system plugin for Hadoop so it can access files on the data platform.
- This plugin allows Hadoop MapReduce programs, HBase, Pig and Hive to work unmodified and access files directly.
- Designed and implemented a MapReduce-based large-scale parallel relation-learning system.
- Involved in review of functional and non-functional requirements.
- Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Imported and exported data between HDFS and Hive using Sqoop.
- Wrote Pig Scripts to perform ETL procedures on the data in HDFS.
- Analyzed the data by performing Hive queries and running Pig scripts and Python scripts.
- Used Hive to partition and bucket data.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Gained good experience with NoSQL databases.
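The MapReduce jobs above for data cleaning and counting follow the classic map → shuffle → reduce flow. As an illustrative sketch only (plain single-process Python rather than the Hadoop Java API, with made-up sample data):

```python
from itertools import groupby
from operator import itemgetter

def map_phase(records):
    """Map: clean each record and emit (word, 1) pairs."""
    for record in records:
        for word in record.lower().split():
            yield (word.strip(".,"), 1)

def shuffle(pairs):
    """Shuffle: sort and group intermediate pairs by key, as the framework would."""
    return groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0))

def reduce_phase(grouped):
    """Reduce: sum the counts emitted for each word."""
    return {key: sum(count for _, count in values) for key, values in grouped}

if __name__ == "__main__":
    lines = ["the quick brown fox", "the lazy dog", "The END."]
    counts = reduce_phase(shuffle(map_phase(lines)))
    print(counts["the"])  # -> 3
```

In Hadoop the same three stages run as a Java Mapper, the framework's sort/shuffle, and a Reducer, distributed across the cluster.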
Environment: Java 1.6, Hadoop 2.2.0 (YARN), MapReduce, Hive, Pig, Sqoop, HBase 0.94, Storm 0.9.1, Linux CentOS 6.4, Agile, Maven, Jira, Hortonworks Data Platform (HDP).
- Developed JSP, JSF and Servlets to dynamically generate HTML and display the data on the client side.
- Used the Hibernate framework for persistence to the Oracle database.
- Wrote and debugged the Ant scripts for building the entire web application.
- Developed web services in Java; experienced with SOAP and WSDL, and used WSDL to publish the services to other applications.
- Implemented Java Message Services (JMS) using JMS API.
- Coded Servlets, SOAP clients and Apache CXF REST APIs to deliver data from our application to external and internal consumers.
- Created a SOAP web service using JAX-WS to enable clients to consume a SOAP web service.
- Experienced in designing and developing multi-tier scalable applications using Java and J2EE Design Patterns.
Environment: Java, HTML, JavaScript, SQL Server, PL/SQL, JSP, Web Services, SOAP, SOA, JSF, JMS, Oracle, Eclipse, XML, Apache Tomcat.
- Involved in the coding of JSP pages for the presentation of data on the View layer in MVC architecture.
- Used J2EE design patterns like Factory Methods, MVC, and Singleton Pattern that made modules and code more organized, flexible and readable for future upgrades.
- Used Struts tag libraries as well as the Struts Tiles framework.
- Used JDBC to access the database with the Oracle Type 4 thin driver for application optimization and efficiency.
- Actively involved in tuning SQL queries for better performance.
- Worked with XML to store and read exception messages through DOM.
- Wrote generic functions to call Oracle stored procedures, triggers and functions.