- Over 8 years of professional experience in project development, implementation, deployment and maintenance using Java/J2EE, Hadoop and Spark related technologies on the Cloudera and Hortonworks distributions.
- Hadoop Developer with 5 years of working experience in designing and implementing complete end-to-end Hadoop-based data analytical solutions using HDFS, MapReduce, Spark, YARN, Kafka, Pig, Hive, Sqoop, Storm, Flume, Oozie, Impala, HBase, etc.
- Experience in installation, configuration, management and deployment of Hadoop Cluster, HDFS, Map Reduce, Pig, Hive, Sqoop, Apache Storm, Flume, Oozie, HBase and Zookeeper.
- In-depth understanding/knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, Map Reduce, Spark.
- Experience with Hadoop-related ecosystem components as data storage and retrieval systems.
- Good experience in creating data ingestion pipelines, data transformations, data management, data governance and real time streaming at an enterprise level.
- Profound experience in creating real-time data streaming solutions using Apache Spark/Spark Streaming and Kafka.
- Good knowledge on Spark Ecosystem and Spark Architecture.
- Experience developing Pig Latin and HiveQL scripts for data analysis and ETL purposes, and extended the default functionality by writing User Defined Functions (UDFs) and User Defined Aggregate Functions (UDAFs) for custom, data-specific processing.
- Good hands-on experience with full life cycle implementation using CDH (Cloudera) and HDP (Hortonworks Data Platform) distributions.
- Strong knowledge of the architecture of distributed systems and parallel processing; in-depth understanding of the MapReduce programming paradigm and the Spark execution framework.
- Profound understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance (a minimal sketch follows this summary).
- Experience in handling messaging services using Apache Kafka.
- Experience with migrating data to and from RDBMS into HDFS using Sqoop.
- Experience in job workflow scheduling and monitoring tools like Oozie and Zookeeper.
- Worked on NoSQL databases including HBase, Cassandra and MongoDB.
- Strong experience in collecting and storing stream data like log data in HDFS using Apache Flume.
- Experience working with the Java HBase API for ingesting processed data into HBase tables.
- Experience with Oozie Workflow Engine to automate and parallelize Hadoop Map/Reduce, Hive and Pig jobs.
- Experienced in using Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Extensive knowledge in programming with Resilient Distributed Datasets (RDDs)
- Proficient in using Cloudera Manager, an end-to-end tool to manage Hadoop operations in Cloudera Cluster.
- Assisted in Cluster maintenance, Cluster Monitoring, Managing and Reviewing data backups and log files.
- Experience in Extraction, Transformation and Loading (ETL) of data from multiple sources like Flat files, XML files and Databases.
- Extensive experience in ETL process consisting of data transformation, data sourcing, mapping, conversion and loading in Talend.
- Used Talend for ETL processing based on business needs and extensively used Oozie workflow engine to run multiple Hive and Pig jobs.
- Experience with Talend and Informatica/Data Exchange.
- Solid experience in developing workflow using Oozie for running Map Reduce jobs and Hive Queries.
- Experience in managing and reviewing Hadoop log files.
- Responsible for performing advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark with Scala.
- Experience with Apache Solr's out-of-the-box replication, distribution, rebalancing and fault tolerance.
- Experience with architectural patterns like Apache Lucene search development: full-text search, cross-platform development, high-performance indexing and ranked searching.
- Good experience in working with cloud environment like Amazon Web Services (AWS) EC2 and S3.
- Experienced in Java application development, client/server applications, and internet/intranet-based applications using Core Java, J2EE patterns, Spring, Hibernate, Struts, JMS, Web Services (SOAP/REST), Oracle, SQL Server and other relational databases.
- Profound knowledge of core Java concepts like exceptions, collections, data structures, I/O, multi-threading, serialization and deserialization.
- Experience writing Shell scripts in Linux OS and integrating them with other solutions.
- Expert at creating UML diagrams (use case, activity, class and sequence diagrams) using Microsoft Visio and IBM Rational Rose.
- Good experience in development of software applications using Core Java, JDBC, Servlets, JSPs, Spring and RESTful Web Services.
- Experience in using PL/SQL to write Stored Procedures, Functions and Triggers.
- Excellent technical and analytical skills with clear understanding of design goals of ER modeling for OLTP and dimension modeling for OLAP.
- Experience in Requirements Gathering/Analysis, Design, Development, Versioning, Integration, Documentation, Testing, Build and Deployment.
- Efficient in packaging & deploying J2EE applications using ANT, Maven & Cruise Control on WebLogic, WebSphere & JBoss.
- Experience in using Jenkins and Maven to compile the package and deploy to the Application Servers.
- Deployment, distribution and implementation of enterprise applications in a J2EE environment.
- Good understanding of Bootstrap, Spring REST and integration.
- Strong Knowledge of Version Control Systems like SVN, GIT & CVS.
- Familiar with multiple software systems; quick to learn new technologies and adapt to new environments; self-motivated, focused team player with excellent interpersonal, technical and communication skills.
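A minimal sketch of the partitioned and bucketed managed/external Hive table design referenced in this summary, assuming a Hive-enabled Spark 2.x session; the table names, columns and HDFS location are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object HiveTableSketch {
  def main(args: Array[String]): Unit = {
    // Hive-enabled SparkSession so the DDL lands in the Hive metastore
    val spark = SparkSession.builder()
      .appName("hive-table-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Managed table: partitioned by load date and bucketed by customer id
    spark.sql(
      """CREATE TABLE IF NOT EXISTS sales_managed (
        |  customer_id BIGINT,
        |  amount      DOUBLE
        |)
        |PARTITIONED BY (load_date STRING)
        |CLUSTERED BY (customer_id) INTO 32 BUCKETS
        |STORED AS ORC""".stripMargin)

    // External table: schema over data already landed in HDFS
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS sales_raw (
        |  customer_id BIGINT,
        |  amount      DOUBLE,
        |  load_date   STRING
        |)
        |ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
        |LOCATION 'hdfs:///data/raw/sales'""".stripMargin)

    spark.stop()
  }
}
```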
Big Data Technologies: Hadoop 1.x/2.x (YARN), HDFS, MapReduce, Pig, Hive, HBase, Cassandra, Zookeeper, Oozie, Sqoop, Flume, HCatalog, Apache Spark, Scala, Impala, Kafka, Storm, Tez, Ganglia, Nagios, Splunk, Elasticsearch, Kibana
Hadoop Distributions: Cloudera, Hortonworks, AWS
Operating Systems: Windows, Macintosh, Linux, Ubuntu, Unix, CentOS, Red Hat
Programming Languages: C, Java, J2EE, SQL, Pig Latin, HiveQL, Scala, Python, Unix Shell Scripting
Java Technologies: JSP, Servlets, Spring, Hibernate, Maven
Databases: MS-SQL, Oracle, MS-Access, NoSQL, MySQL
Reporting Tools/ETL Tools: Tableau, Informatica, Data stage, Talend, Pentaho, Power View
Methodologies: Agile/Scrum, Waterfall, DevOps
Development Tools: Eclipse, NetBeans, IntelliJ, Hue, Microsoft Office Suite (Word, Excel, PowerPoint, Access)
Confidential, Longwood, FL
- Handled importing of data from various data sources, performed data control checks using Spark and loaded data into HDFS.
- Worked on the Spark SQL for analyzing the data.
- Used Scala to write code for all Spark use cases.
- Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, pair RDDs and YARN.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Worked with Spark SQL on different data formats like JSON and Parquet.
- Developed Spark scripts by using Scala shell commands as per the requirement.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in HDFS.
- Loaded data into Spark RDDs and performed in-memory data computation to generate the output response.
- Familiarity with the Hadoop open source stack, including YARN, Kafka and Hive.
- Experienced with Kafka for ingesting data into the Spark engine.
- Worked on a streaming pipeline that uses Spark to read data from Kafka, transform it and write it to HDFS (see the sketch after this section).
- Extensive experience using message-oriented middleware (MOM) with ActiveMQ, Apache Storm, Apache Spark, Kafka, Maven and ZooKeeper.
- Developed Kafka producer and consumers, HBase clients, Spark and Hadoop MapReduce jobs along with components on HDFS, Hive.
- Developed MapReduce (YARN) programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
- Designed and developed Map Reduce jobs to process data coming in different file formats like XML, CSV, JSON.
- Developed workflows using Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Involved in preparing JIL's for AutoSys jobs.
- Used Sqoop to import the data from RDBMS to Hadoop Distributed File System (HDFS) and later analyzed the imported data using Hadoop Components.
- Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries.
- Used Oozie for automating the end to end data pipelines and Oozie coordinators for scheduling the work flows.
- Implemented daily workflow for extraction, processing and analysis of data with Oozie.
- Performed Hadoop installation and configuration of multiple nodes on AWS EC2.
- Implemented Hadoop on AWS EC2 using a few instances for gathering and analyzing data log files.
- Implemented a Continuous Delivery pipeline with Docker, Jenkins and GitHub, Nexus, Maven and AWS.
Environment: Spark 1.5.2, Spark SQL, Java 1.8, Hive, HDFS, HQL, YARN, HBase, MapReduce, Sqoop, Flume, Oozie, Kafka, Scala, AWS, Oracle 12c.
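A minimal sketch of the Kafka-to-HDFS streaming pipeline described in this project, assuming the Spark 1.x streaming API with the direct Kafka connector; the broker list, topic name and output path are hypothetical:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaToHdfsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-to-hdfs-sketch")
    val ssc  = new StreamingContext(conf, Seconds(30))

    // Direct stream from Kafka; records arrive as (key, value) pairs
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("events"))

    // Simple transformation: keep the message value and drop empty records
    val cleaned = stream.map(_._2).filter(_.nonEmpty)

    // Each 30-second batch is written under the HDFS prefix with a timestamp suffix
    cleaned.saveAsTextFiles("hdfs:///data/streaming/events")

    ssc.start()
    ssc.awaitTermination()
  }
}
```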
Confidential, Chicago, IL
- Worked on a live 60-node Hadoop cluster running CDH 5.4.4, CDH 5.2.0 and CDH 5.2.1.
- Worked on the Hadoop cluster using different big data analytic tools including Kafka, Pig, Hive and MapReduce.
- Developed simple to complex MapReduce streaming jobs using Python language that are implemented using Hive and Pig.
- Implemented data access jobs through Pig, Hive, HBase (0.98.0), Storm (0.91)
- Involved in loading data from LINUX file system to HDFS
- Importing and exporting data into HDFS and Hive using Sqoop.
- Altered existing Scala programs to enhance performance and obtain partitioned results using Spark.
- Used the Spark SQL Scala and Python interfaces, which automatically convert RDDs of case classes to schema RDDs.
- Used Spark SQL to read and write tables stored in Hive.
- Involved in importing real-time data into Hadoop using Kafka and implemented daily Oozie jobs.
- Involved in developing Hive DDLs to create, alter and drop Hive tables, and worked with Storm and Kafka.
- Experienced in transferring data from different data sources into HDFS using Kafka producers, consumers and brokers (see the producer sketch after this section).
- Experience in data migration from RDBMS to Cassandra.
- Created data-models for customer data using the Cassandra Query Language.
- Experienced in developing Spark scripts for data analysis in both python and Scala.
- Worked on processing unstructured data using Pig and Hive.
- Collected and aggregated large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Used Impala to read, write and query the Hadoop data in HDFS or HBase.
- Involved in scheduling Oozie workflow engine to run multiple Hive and Pig jobs.
- Developed Pig Latin Scripts to extract data from the web server output files to load into HDFS.
- Responsible in taking backups and restoration of Tableau repository.
- Converted ETL operations to Hadoop system using Pig Latin operations, transformations and functions.
- Worked on a Talend migration project from one version to another.
- Worked on the majority of Talend components and designed ETL jobs to handle complex business logic.
- Knowledge of error handling and Performance tuning in Talend and SQL.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig Scripts.
- Exported the result set from Hive to MySQL using Shell Scripts.
- Actively involved in code review and bug fixing for improving the performance.
- Experience with Cassandra (DataStax distribution).
- Collaborated with development teams on the architecture and deployment of NoSQL database systems like Cassandra.
Environment: Hadoop, HDFS, Pig, Hive, Map Reduce, Sqoop, Storm, Kafka, LINUX, Hortonworks distribution, Bigdata, Java APIs, Java collection, SQL, NoSQL, MongoDB, Cassandra.
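A minimal sketch of the Kafka producer work described in this project, using the standard Kafka client API from Scala; the broker list, topic and sample record are hypothetical:

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object LogProducerSketch {
  def main(args: Array[String]): Unit = {
    // Broker list is illustrative; serializers send plain strings
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092,broker2:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("acks", "all")

    val producer = new KafkaProducer[String, String](props)
    try {
      // Key identifies the source system, value carries the raw log line
      val record = new ProducerRecord[String, String]("web-logs", "webserver-01", "GET /index.html 200")
      producer.send(record)
      producer.flush()
    } finally {
      producer.close()
    }
  }
}
```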
Confidential, Malvern, PA
- Responsible for installation, configuration, maintenance, monitoring, performance tuning and troubleshooting Hadoop Clusters in different environments such as Development Cluster, Test Cluster and Production.
- Used Job Tracker to assign MapReduce tasks to Task Tracker in cluster of nodes.
- Handled importing of data from various data sources, performed transformations using Hive, MapReduce, loaded data into HDFS and extracted the data from MySQL into HDFS using Sqoop.
- Processed HDFS data and created external tables using Hive and developed scripts to ingest and repair tables that can be reused across the project.
- Implemented Kerberos security in all environments.
- Defined file system layout and data set permissions.
- Implemented Capacity Scheduler to share the resources of the cluster for the MapReduce jobs given by the users.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Involved in loading data from Linux and Unix file system to HDFS.
- Involved in emitting processed data from Hadoop to relational databases or external file systems using Sqoop, HDFS get or copyToLocal.
- Involved in cluster planning and setting up the multi-node cluster.
- Commissioned and Decommissioned nodes from time to time.
- Involved in HDFS maintenance and administered it through the HDFS Java API (see the sketch after this section).
- Worked with Hadoop developers and designers in troubleshooting MapReduce job failures and issues.
Environment: Hadoop 1.2.1, MapReduce, HDFS, Pig, Hive, Sqoop, Cloudera Hadoop Distribution, HBase, Windows NT, LINUX, UNIX Shell Scripting.
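A minimal sketch of the kind of HDFS maintenance done through the HDFS Java API in this project, written in Scala against the Hadoop FileSystem API; the paths are hypothetical:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object HdfsMaintenanceSketch {
  def main(args: Array[String]): Unit = {
    // Picks up core-site.xml / hdfs-site.xml from the classpath
    val fs = FileSystem.get(new Configuration())

    // Copy a local file into HDFS (paths are illustrative)
    fs.copyFromLocalFile(new Path("/tmp/input.csv"), new Path("/data/landing/input.csv"))

    // List a data set directory and print file sizes while reviewing the layout
    fs.listStatus(new Path("/data/landing")).foreach { status =>
      println(s"${status.getPath}\t${status.getLen} bytes")
    }

    fs.close()
  }
}
```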
- Installed and configured Hadoop MapReduce, HDFS, developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Imported and exported data into HDFS from Oracle database and vice versa using Sqoop.
- Installed and configured Hadoop Cluster for major Hadoop distributions.
- Used Hive and Pig as an ETL tool for event joins, filters, transformations and pre-aggregations.
- Created partitions and buckets on the state column in Hive to handle structured data.
- Developed workflow in Oozie to orchestrate a series of Pig scripts to cleanse data such as removing personal information or merging many small files into a handful of very large, compressed files using Pig pipelines in the data preparation stage.
- Involved in moving all log files generated from various sources to HDFS through Kafka and Flume for further processing, and processed the files using Piggybank.
- Extensively used Pig to communicate with Hive using HCatalog and with HBase using storage handlers.
- Performed Sqoop transfers through HBase tables to move data into several NoSQL databases: Cassandra and MongoDB.
- Created tables, secondary indexes, join indexes and views in the Teradata development environment for testing.
- Captured data logs from web servers into HDFS using Flume for analysis.
- Managed and reviewed Hadoop log files.
Environment: Hive, Pig, MapReduce, Sqoop, Oozie, Flume, Kafka, Storm, HBase, Unix, Linux, Python, SQL, Hadoop 1.x, HDFS, GitHub, Talend, Python Scripting.
- Developed the HTML based web pages for displaying the reports.
- Performed data validation in Struts using form beans and Action classes.
- Developed dynamic content of presentation layer using JSP.
- Accessed stored procedures and functions using JDBC callable statements (see the sketch at the end of this section).
- Involved in designing use-case diagrams, class diagrams and interaction using UML model with Rational Rose.
- Implemented Hibernate to persist the data into Database and wrote HQL based queries to implement CRUD operations on the data.
- Developed coding using SQL, PL/SQL, Queries, Joins, Views, Procedures/Functions, Triggers and Packages.
- Developed rich internet web applications using Java applets, Silverlight and Java.
- Used JDBC for database access.
- Played a key role in the high-level design for the implementation of the application.
- Designed and established the process and mapped the functional requirements to the workflow process.
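A minimal sketch of calling a stored procedure through a JDBC callable statement, as described above; it is written in Scala to match the other sketches in this document, and the connection URL, credentials and procedure name are hypothetical:

```scala
import java.sql.{DriverManager, Types}

object StoredProcSketch {
  def main(args: Array[String]): Unit = {
    // Connection details are illustrative
    val conn = DriverManager.getConnection(
      "jdbc:oracle:thin:@dbhost:1521:orcl", "app_user", "app_password")
    try {
      // Hypothetical procedure with one IN parameter and one OUT parameter
      val stmt = conn.prepareCall("{call get_customer_name(?, ?)}")
      stmt.setLong(1, 12345L)
      stmt.registerOutParameter(2, Types.VARCHAR)
      stmt.execute()
      println(stmt.getString(2))
      stmt.close()
    } finally {
      conn.close()
    }
  }
}
```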