- Senior Hadoop developer with 8+ years of professional IT experience, including 4+ years of Big Data consulting experience with Hadoop ecosystem components in ingestion, data modeling, querying, processing, storage, analysis, data integration, and implementing enterprise-level Big Data systems.
- Hands-on development and implementation experience on a Big Data Management Platform (BMP) using Hadoop 2.x, HDFS, MapReduce/Yarn/Spark, Hive, Pig, Oozie, Apache NiFi, Talend, Sqoop, and other Hadoop ecosystem components as data storage and retrieval systems.
- Excellent knowledge of Hadoop architecture and daemons of Hadoop clusters, which include Name node, Data node, Resource manager, Node Manager and Job history server.
- Experience working with the Hortonworks, Cloudera, MapR, and EMR Hadoop distributions.
- Experience in using Amazon EMR for processing Big Data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
- Experience in developing web-based client/server applications. Designed and developed professional web applications using front-end technologies like HTML, CSS, jQuery, Bootstrap, and Angular 2, and back-end technologies like Servlets, JSP, JDBC, Spring, Hibernate, Spring MVC, and Web Services.
- Hands-on experience in coding MapReduce/Yarn programs using Java, Scala, and Python for analyzing Big Data, and strong experience in building data pipelines using Big Data technologies.
- Experience creating real-time data streaming solutions using Apache Spark Core, Spark SQL, Kafka, Spark Streaming and Apache Storm.
- Experience in importing and exporting data from relational databases like MySQL, Teradata, Oracle, and DB2 into HDFS using Sqoop, and experience with different data formats like JSON, Avro, Parquet, RC, and ORC and compression codecs like Snappy and Gzip.
- Experience in Extraction, Transformation and Loading (ETL) of data from multiple sources like Flat files, Databases and integration with popular NoSQL database for huge volume of data.
- Hands-on experience working on NoSQL databases including HBase, Cassandra, and MongoDB and their integration with the Hadoop cluster for huge volumes of data.
- Experience in data processing like collecting and aggregating data from various sources using Apache Kafka and Flume.
- Hands-on experience in working with Flume to load log data from multiple sources directly into HDFS.
- Configured Spark Streaming to receive real-time data from Kafka, store the stream data to HDFS, and process it using Spark and Scala; exposure to Apache Kafka for developing data pipelines that carry logs as streams of messages using producers and consumers.
- Strong experience with Pig, Hive, and Impala analytical functions; extended Hive, Impala, and Pig core functionality by writing custom User Defined Functions (UDFs).
- Expertise in working with Hive data warehouse tool-creating tables, data distribution by implementing partitioning and bucketing, writing and optimizing the HiveQL queries.
- Experience in working with internal and external tables in Hive, and developed batch processing jobs using MapReduce, Pig, and Hive.
- Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java, and in writing Hive ACID tables and Pig queries for data analysis to meet business requirements.
- In-depth understanding of Spark architecture including Spark Core, RDDs, DataFrames, Datasets, Spark SQL, and Spark Streaming, and experience in importing data from HDFS into Spark RDDs for in-memory computation to generate the output response.
- Hands-on experience using Hive tables from Spark, performing transformations, and creating DataFrames on Hive tables using Spark SQL.
- Experience in converting Hive/SQL queries into RDD transformations using Spark, Scala, and Python.
- Worked with Apache Spark components, which provide a fast and general engine for large-scale data processing integrated with the functional programming language Scala.
- Experience in developing and designing POCs deployed on the Yarn cluster, and compared the performance of Spark with Hive and SQL/Oracle.
- Good knowledge of using Apache NiFi to automate data movement between different Hadoop systems.
- Expertise in Oozie for configuring, scheduling, automating, and managing job workflows based on time-driven and data-driven triggers.
- Designed and developed automation test scripts using Python. Experience working with Python, Linux/UNIX and shell scripting.
- Experience with BI tools like Tableau for report creation and further analysis.
- Good knowledge of machine learning algorithms, including supervised and unsupervised techniques.
- Hands-on knowledge of Core Java concepts like Exceptions, Collections, data structures, I/O, multi-threading, and serialization and deserialization in streaming applications.
- Solid understanding and practical experience of Software Development Life Cycle principles.
- Experienced and skilled Agile Developer with a strong record of excellent teamwork and successful coding.
- Strong Problem Solving and Analytical skills and abilities to make Balanced & Independent Decisions.
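The Hive partitioning and bucketing experience above typically boils down to DDL like the following minimal sketch; the table, columns, and location are hypothetical, invented only to illustrate the technique:

```sql
-- Hypothetical partitioned + bucketed Hive table (all names illustrative).
-- Partitioning by load_date prunes whole directories at query time;
-- bucketing by member_id enables efficient sampling and bucket-map joins.
CREATE EXTERNAL TABLE claims (
  claim_id   BIGINT,
  member_id  BIGINT,
  amount     DECIMAL(12,2)
)
PARTITIONED BY (load_date STRING)
CLUSTERED BY (member_id) INTO 32 BUCKETS
STORED AS ORC
LOCATION '/data/warehouse/claims';
```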
Big Data Eco-system: HDFS, MapReduce, Yarn, Pig, Hive, Impala, Sqoop, Talend, Flume, Kafka, Oozie, Spark, Zookeeper, NiFi
Hadoop Technologies and Distributions: Apache Hadoop, Yarn, Cloudera CDH3, CDH4, Hortonworks, MapR
Operating System: Linux, Ubuntu, Windows (7/8/10)
Languages: C, Java, Scala, Python, Shell Scripting
Databases: Oracle, MySQL, Teradata, DB2
NoSQL: HBase, Cassandra, MongoDB
IDE Tools: Eclipse, NetBeans, IntelliJ
Java Technologies: Servlets, JSP, JDBC, Spring, Hibernate, Spring MVC, Spring boot, Spring security, Spring REST
Cloud Services: AWS (EC2, S3, EBS, RDS, EMR, IAM)
Build Tools: Maven, SBT, CBT
Version Control: SVN, Git
BI Tools: Power BI, Tableau
Confidential, Mason, Ohio
Big Data Developer
- Engaged with the complete Big Data flow of the application, starting from data ingestion from upstream sources into HDFS, through processing and analyzing the data in HDFS.
- Developed Spark applications to collect Members, Providers, and Claims data from various Oracle servers and ingested it into the Hadoop Distributed File System using Sqoop.
- Built the rules for the Provider use case by interacting with the Provider team in the organization and created the extract according to the business requirements.
- Created Hive tables by importing data from HDFS, which are later optimized by using optimization techniques partitioning and bucketing to provide better performance with HiveQL queries.
- Created Hive scripts to build the foundation tables by joining multiple tables and designed both Managed and External tables in Hive to optimize performance.
- Implemented a process to automatically update the Hive tables by reading a change file provided by users.
- Expertise in designing and optimizing complex Spark SQL queries, joins, and transformation rules to create DataFrames as per the requirements.
- Responsible for implementing Extract/Transform/Load process through Kafka-Spark-MongoDB integration as per the requirements.
- Transferred data from different data sources into HDFS using Kafka producers, consumers, and brokers, and used ZooKeeper as the coordinator between the different Kafka brokers.
- Used Spark-SQL to load JSON data and create Data Frames and loaded it into Hive Tables and handled structured data using SparkSQL.
- Developed Spark Core, Spark Streaming, and Spark SQL applications in Scala for faster testing and processing of data; loaded the data into Spark RDDs and performed in-memory computation to generate the output response with less memory usage.
- Involved in loading the data back into Hive tables after transformation for further processing.
- Developed and maintained workflow scheduling jobs in Oozie for importing data from RDBMS into Hive, and developed Spark and Hive jobs to summarize and transform data.
- Responsible for data extraction and data integration from different data sources into Hadoop by creating ETL pipelines Using Spark, Yarn, and Hive.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDD’s, Data frames, Scala.
- Developed Spark programs in Scala and applied principles of functional programming to process complex unstructured and structured data sets.
- Analyzed the cluster configuration and set the driver memory, executor memory, and number of cores accordingly.
- Connected to the MongoDB environment using Spark and transformed data into the required format before it was dumped into MongoDB.
- Automated all jobs, from pulling data from Oracle and pushing the result datasets to the Hadoop Distributed File System to running MapReduce and Hive jobs, using Oozie (workflow management).
- Developed flow XML files using Apache NiFi, a workflow automation tool, to ingest data into HDFS.
- Worked on performance tuning of Apache NiFi workflows to optimize data ingestion speeds.
- Hands on experience in Spark Streaming to ingest data from multiple data sources into HDFS.
- Migrated an existing on-premises application to AWS. Designed, built, and deployed a multitude of applications utilizing the AWS stack (EC2, S3, EMR), focusing on high availability, fault tolerance, and auto-scaling.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
- Worked with and learned a great deal from Amazon Web Services (AWS) Cloud services: Elastic Compute Cloud (EC2), Simple Storage Service (S3), EBS, RDS, and Elastic MapReduce (EMR).
- Dealt with several source systems (RDBMS/HDFS/S3) and file formats (text, JSON, ORC, Parquet, Avro) to ingest, transform, and persist data in Hive for further downstream consumption.
- Built Spark applications using the IntelliJ IDE and the SBT build tool.
- Involved in Agile methodologies, daily Scrum meetings, and sprint planning, with strong experience in the SDLC.
- Used Reporting tools like Tableau to connect with Hive for generating daily reports of data.
- Collaborated with the infrastructure, network, database, application, and BI teams to ensure data quality and availability.
Environment: HDFS, Hadoop MapReduce, Hive, Sqoop, NiFi, RDBMS, HBase, ZooKeeper, Shell Scripting, Spark, Scala, Spark SQL, Spark Streaming, Kafka, Oracle, MongoDB, IntelliJ, SBT, AWS
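The Sqoop ingestion from Oracle described above typically takes the form of an import command like this sketch; the connection string, user, table, and target directory are hypothetical placeholders:

```sh
# Hypothetical Sqoop import from an Oracle server into HDFS (all
# connection details and names are illustrative, not from the résumé).
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username etl_user -P \
  --table CLAIMS \
  --target-dir /data/raw/claims \
  --num-mappers 4
```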
- As a Hadoop Developer, worked on Hadoop ecosystem components including Hive, Spark, HBase, Zookeeper, Oozie, Spark Streaming, and MCS (MapR Control System) with the MapR distribution.
- Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Worked with data extracted from two different sources, MySQL and web servers, and used Sqoop to import and export data between HDFS and RDBMS for visualization and report generation.
- Developed simple and complex MapReduce programs in Java for Data Analysis on different data formats.
- Collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis and used Sqoop to efficiently transfer data between databases and HDFS.
- Built code for real-time data ingestion using Java, MapR Streams (Kafka), and Storm.
- Worked on importing metadata into Hive using Sqoop and migrated existing tables and applications to Hive for further analysis according to the business requirements.
- Created Hive internal and external tables with appropriate static and dynamic partitions for efficiency, and developed Hive queries and UDFs to analyze and transform the data in HDFS.
- Used HiveQL to analyze partitioned and bucketed data, and executed Hive queries on Parquet tables to perform data analysis meeting the business specification logic.
- Worked on Spark using Scala and Spark SQL for faster testing and processing of data.
- Involved in integrating Hive queries into the Spark environment using Spark SQL.
- Used Spark for interactive queries, processing of streaming data, and integration with popular SQL databases.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using the Spark context, Spark SQL, DataFrames, and pair RDDs; used Spark SQL to load JSON data, create DataFrames, load them into Hive tables, and handle structured data.
- Used Spark Streaming APIs to perform transformations and actions for building a common data model that gets data from Kafka in near real time and persists it to Cassandra.
- Good understanding of Cassandra architecture, replication strategy, gossip, and snitch, and used the Spark DataStax Cassandra Connector to load data to and from Cassandra.
- Experienced in creating data models for clients' transactional logs; analyzed the data from Cassandra tables for quick searching, sorting, and grouping using the Cassandra Query Language (CQL).
- Worked with different file formats and compression techniques to determine standards.
- Implemented a process to automatically update the Hive tables by reading a change file provided by business users.
- Used the Oozie and Control-M workflow engines for managing and scheduling Hadoop jobs.
- Wrote the shell scripts to monitor the health check of Hadoop daemon services and respond accordingly to any warnings and failure conditions.
- Used Oozie operational services for batch processing and scheduling workflows dynamically.
- Generated various kinds of reports using Power BI and Tableau based on Client specification.
- Responsible for generating actionable insights from complex data to drive real business results for various application teams and worked in Agile Methodology projects extensively.
- Evaluated the suitability of Hadoop and its ecosystem for the project and implemented and validated various proof-of-concept (POC) applications to eventually adopt them and benefit from the Big Data Hadoop initiative.
Environment: HDFS, Hadoop MapReduce, Hive, Sqoop, RDBMS, HBase, ZooKeeper, Shell Scripting, Spark, Scala, Kafka, Cassandra.
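The Java MapReduce jobs for data cleaning above follow the classic map/shuffle/reduce pattern. As a purely illustrative sketch (the actual jobs were in Java on a cluster), here is that pattern as a framework-free word count in plain Python:

```python
# Minimal map/shuffle/reduce sketch of a MapReduce word-count job.
# The map phase emits (word, 1) pairs, shuffle groups values by key,
# and the reduce phase sums each group -- the same logic Hadoop runs
# in parallel across a cluster.
from collections import defaultdict

def map_phase(lines):
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def shuffle(pairs):
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    return {word: sum(counts) for word, counts in grouped.items()}

if __name__ == "__main__":
    logs = ["error disk full", "warn retry", "error timeout"]
    print(reduce_phase(shuffle(map_phase(logs))))  # {'error': 2, ...}
```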
Jr. Hadoop Developer
- Experience in configuration, management, supporting and monitoring Hadoop cluster using Cloudera distribution.
- Worked in an Agile Scrum development model on analyzing the Hadoop cluster and different Big Data analytic tools including MapReduce, Pig, Hive, Flume, Oozie, and Sqoop.
- Configured Hadoop MapReduce, HDFS, developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Loaded data into cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
- Developed custom MapReduce programs to analyze data and used Pig Latin to clean unwanted data.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Used Pig as an ETL tool to perform transformations, event joins, and some pre-aggregations before storing the data in HDFS.
- Implemented partitioning, dynamic partitions, and buckets in Hive to improve performance and organize the data logically.
- Involved in loading and transforming large data sets of structured and semi-structured formats.
- Responsible to manage data coming from different sources.
- Involved in scheduling Oozie workflow engine to run jobs automatically.
- Implemented the NoSQL database HBase for storing and processing different formats of data.
- Involved in Testing and coordination with business in User testing.
- Involved in Unit testing and delivered Unit test plans and results documents.
Environment: Apache Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, Oozie, HBase, UNIX shell scripting, Zookeeper, Java, Eclipse.
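Oozie workflows like those scheduled above are declared as XML action graphs. A minimal hypothetical workflow chaining a Sqoop import to a Hive script might look like the following sketch (all names, paths, and properties are invented for illustration):

```xml
<!-- Hypothetical Oozie workflow: Sqoop import, then a Hive script. -->
<workflow-app name="daily-ingest" xmlns="uri:oozie:workflow:0.5">
  <start to="import"/>
  <action name="import">
    <sqoop xmlns="uri:oozie:sqoop-action:0.4">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <command>import --connect ${jdbcUrl} --table ORDERS --target-dir /data/raw/orders</command>
    </sqoop>
    <ok to="transform"/>
    <error to="fail"/>
  </action>
  <action name="transform">
    <hive xmlns="uri:oozie:hive-action:0.5">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>load_orders.hql</script>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail"><message>Ingest failed</message></kill>
  <end name="end"/>
</workflow-app>
```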
Java/ J2EE Developer
- Worked in the Agile/Scrum development environment and actively participated in scrum meetings and involved in the analysis, design, and development phase of the application.
- Developed the application using JSP, Servlets, and Hibernate.
- Designed components for the project using design patterns such as Model-View-Controller (MVC).
- Extensively used Spring Core for Dependency Injection (DI) and Inversion of Control (IoC).
- Used Hibernate as the ORM tool to communicate with the database and used Hibernate Query Language (HQL) for accessing data from the database.
- Created tables, triggers, stored procedures, SQL queries, joins, constraints & views for Oracle database.
- Used Jersey API to implement Restful web service to retrieve JSON response.
- Interacted with business analyst to understand the requirements to ensure correct UI modules been built to meet business requirements.
- Used Log4j for debugging and performed unit testing using JUnit.
- Used Maven scripts to create JAR and WAR files and deployed the application on the server.
- Worked with version control GIT to manage the code repository.
- Worked with JIRA, a tool for bug tracking, issue tracking, and project management.
- Created cross-browser-compatible, standards-compliant CSS-based page layouts and fixed bugs pertaining to various browsers.
Jr. Java/J2EE Developer
- Worked on design, development and maintenance of various applications for the Australian retailer Woolworths.
- Actively participated in designing class and sequence diagrams as part of design documents for an application.
- Actively participated in client meetings and suggested inputs for additional functionality.
- Developed web application using Java/J2EE.
- Used Spring-AOP module for implementing features like logging and user session validations.
- Managed databases, analyzed risks and problems, and provided solutions.
- Worked on writing complex SQL queries, Stored Procedures and triggers.
- Worked on optimization of Java Web Services implementation which improved the execution time.
- Took the responsibility of leading the team to work on many enhancements on web applications.
- Implemented a Maven repository management system for Java development and release builds environments.
- Developed PL/SQL queries to generate reports based on client requirements.
- Developed Servlets and Java Server Pages (JSP).
- Enhanced the system according to customer requirements.
- Created test case scenarios for Functional Testing.
- Used JavaScript validation in JSP pages.
- Helped design the database tables for optimal storage of data.
- Coded JDBC calls in the servlets to access the Oracle database tables.
- Responsible for integration, unit testing, system testing, and stress testing for all phases of the project.
- Prepared final guideline document that would serve as a tutorial for the users of this application.
Environment: Java, Servlets, J2EE, JDBC, PL/SQL, HTML, JSP, Eclipse, UNIX.