- 8+ years of experience as a Big Data/Hadoop Developer and Java Developer.
- Good knowledge of the Hadoop Distributed File System (HDFS) and ecosystem components such as MapReduce, Hive, Pig, HBase, Sqoop, Oozie, Storm, ZooKeeper, and Flume.
- Experience in Apache Spark, Spark Streaming, Spark SQL, and NoSQL databases such as Cassandra and HBase.
- Experience in Hive query language for data analytics.
- Experience with the HBase database; used Apache Phoenix over HBase to retrieve data with SQL queries.
- Experience with the CDH distribution and Cloudera Manager to manage and monitor Hadoop clusters.
- Well experienced in Cloudera and Hortonworks Hadoop distributions.
- Used Spark Streaming to divide streaming data into micro-batches that feed the Spark engine for batch processing.
- Experienced in loading data into partitioned and bucketed Hive tables.
- Implemented Spark scripts using Scala and Spark SQL to read Hive tables into Spark for faster data processing.
- Experienced in performance tuning of Spark applications: setting the right batch interval, choosing the correct level of parallelism, and tuning memory.
- Developed Pig Latin scripts for transformations and used Hive Query Language for data analytics.
- Detailed understanding of Hadoop internal architecture and the functionality of components such as JobTracker, TaskTracker, NameNode, DataNode, ApplicationMaster, ResourceManager, NodeManager, and the MapReduce programming paradigm.
- Experienced in importing and exporting data between databases such as MySQL and Oracle and HDFS using Sqoop.
- Developed cross-platform products using Hadoop file formats such as SequenceFile, RCFile, ORC, Avro, and Parquet.
- Experienced in shell, Scala, and Python scripting; used these extensively with Spark for data processing.
- Hands on experience with batch processing of data sources using Apache Spark.
- Implemented Spark RDD transformations and actions to deliver business analyses.
- Used Flume to collect, aggregate, and store web log data on HDFS.
- Used Zookeeper for various types of centralized configurations.
- Experienced in loading large volumes of data from the local file system and HDFS into Hive and writing complex queries to load data into internal tables.
- Experience in loading and transforming large sets of structured, semi-structured, and unstructured data.
- Imported and extracted the needed data from the server into HDFS using Sqoop, and bulk-loaded the cleaned data into HBase using MapReduce.
- Promoted a full-cycle approach: request analysis, creating/pulling datasets, report creation and implementation, and delivering the final analysis to the requestor.
- Very good understanding of SQL, ETL, and data warehousing technologies.
- Designed and created ETL jobs in Talend to load large volumes of data into the Hadoop ecosystem and relational databases.
- Developed Java software to read data files from a UK crime database and implemented a MapReduce program in Java to sort crime data by city.
- Implemented mappers and reducers across 24 nodes and distributed the data among them.
- Implemented MapReduce jobs to scale the database across HBase tables.
- Developed a website using RESTful APIs to fetch data from the web server.
- Java developer with extensive experience in various Java libraries, APIs, front-end and back-end development, and frameworks.
- Skilled in data management, extraction, manipulation, and validation, and in analyzing huge volumes of data.
- Strong ability to understand new concepts and applications.
- Excellent verbal and written communication skills, proven highly effective in interfacing across business and technical groups.
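The MapReduce programming paradigm cited in this summary can be sketched in plain Python; this is a toy word count illustrating the map, shuffle, and reduce phases, not tied to any particular cluster or job from the roles below:

```python
from collections import defaultdict

def map_phase(records):
    """Mapper: emit a (word, 1) pair for every word in each input line."""
    for line in records:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle: group all values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reducer: sum the counts emitted for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data on hadoop", "big data on spark"]
counts = reduce_phase(shuffle(map_phase(lines)))
```

On a real cluster the shuffle is performed by the framework across nodes; the mapper and reducer are the only parts a developer writes.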
Hadoop Ecosystem Development: HDFS, Hadoop MapReduce, Hive, Impala, Pig, Oozie, HBase, Sqoop, Flume, YARN, Scala, Kafka, ZooKeeper
Distribution Systems: Apache Hadoop 3.1.0, Cloudera, Hortonworks
Languages: Java, C/C++, SQL, Python
Databases: Oracle, MS-SQL, PL/SQL
NoSQL: Cassandra, MongoDB, HBase
Tools: Bitbucket, JIRA, Talend, Informatica
Web Design: HTML5, CSS, AJAX, REST, JSON
Frameworks: MVC, Struts, Hibernate, Spring, Spring Boot
OS: Linux (Ubuntu, Fedora), Unix, Windows
Confidential, Jersey City, NJ
- Migrated data from an Oracle database to the Big Data environment.
- Worked on Hadoop cluster scaling from 4 nodes in development environment to 8 nodes in pre-production stage and up to 24 nodes in production.
- Involved in the complete implementation lifecycle; specialized in writing custom MapReduce jobs.
- Hands-on experience with the Cloudera Hadoop distribution; implemented a highly scalable HBase with real-time read/write access on CDH.
- Created HBase tables for customer account data based on region.
- Used Phoenix over the HBase database for SQL querying.
- Wrote Spark scripts to calculate group-level metrics for customer groups.
- Generated customer account group reports based on inception date using Spark scripts; reports are delivered to customers through the Inview app.
- Wrote MapReduce jobs to scale the HBase database.
- Wrote SQL queries in Phoenix, which interacts with HBase to return the results.
- Wrote shell scripts to automate processes using Autosys.
- Wrote shell scripts for Autosys to regulate enabling and disabling of HBase table replication in the Prod and COB environments.
- Possess good Linux and Hadoop system administration skills, networking, and shell scripting, and familiarity with open-source configuration management and deployment tools such as Chef.
- Worked with Puppet for application deployment.
- Created HBase tables to store data in various formats coming from different sources.
- Used Maven to build and deploy code on the YARN cluster.
- Good knowledge of building Apache Spark applications using Scala.
- Developed several business services as Java RESTful web services using the Spring MVC framework.
- Managed and scheduled Oozie jobs to remove duplicate log data files in HDFS.
- Used Apache Oozie for scheduling and managing Hadoop jobs. Knowledge of HCatalog for Hadoop-based storage management.
- Expert in creating and designing data ingestion pipelines using technologies such as Spring Integration and Apache Storm with Kafka.
- Extensive experience with Spark Streaming (version 1.5.2) through the core Spark API, running Python, Scala, and Java scripts to transform raw data from several data sources into baseline data.
- Used Flume extensively in gathering and moving log data files from Application Servers to a central location in Hadoop Distributed File System (HDFS).
- Implemented test scripts to support test driven development and continuous integration.
- Responsible for managing data coming from different sources.
- Experienced in analyzing HBase and comparing it with other open-source NoSQL databases to determine which best suits the current requirements.
- Used the HDFS filesystem check (fsck) to verify the health of files in HDFS.
- Implemented Amazon EMR for processing Big Data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
- Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Extensively used Sqoop to get data from RDBMS sources such as Teradata and Netezza.
- Created a complete processing engine based on Cloudera's distribution.
- Involved in collecting metrics for Hadoop clusters using Ganglia and Ambari.
- Used Spark Streaming to collect data from Kafka in real time and perform the necessary transformations and aggregations on the fly to build the common learner data model, persisting the data in a NoSQL store (HBase).
- Configured Kerberos for the clusters.
Environment: Hadoop, MapReduce, HDFS, HBase, Sqoop, Apache Kafka, Python, Oozie, SQL, Apache ZooKeeper, Flume, Spark, Scala, Java.
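The Spark Streaming pattern described in this role (slicing a continuous stream into batches for the batch engine) can be illustrated with a small stand-alone sketch; the event stream and batch size here are hypothetical, and real Spark Streaming slices by a time-based batch interval rather than by record count:

```python
def micro_batches(stream, batch_size):
    """Group a continuous record stream into fixed-size micro-batches,
    mimicking how Spark Streaming hands batches to the Spark engine."""
    batch = []
    for record in stream:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch

events = [f"event-{i}" for i in range(7)]
batches = list(micro_batches(events, batch_size=3))
```

Each yielded batch would then go through the same transformations as a normal batch job, which is what makes the micro-batch model a good fit for reusing batch code on streams.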
Confidential, Chicago, IL
- Worked on analyzing the Hadoop stack and different big data analytics tools, including Pig, Hive, the HBase database, and Sqoop.
- Experienced in implementing the Hortonworks distribution (HDP 2.1, HDP 2.2, and HDP 2.3).
- Developed MapReduce programs for refined queries on big data.
- Created Azure HDInsight and deployed a Hadoop cluster in the cloud platform.
- Used Hive queries to import data into the Microsoft Azure cloud and analyzed the data using Hive scripts.
- Used Ambari in the Azure HDInsight cluster to record and manage the data logs of the name node and data nodes.
- Created Hive tables and worked on them for data analysis to meet the requirements.
- Developed a framework to load and transform large sets of unstructured data from UNIX systems into Hive tables.
- Worked with business team in creating Hive queries for ad hoc access.
- In depth understanding of Classic MapReduce and YARN architectures.
- Implemented Hive generic UDFs to apply business logic.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Installed and configured Pig for ETL jobs.
- Developed Pig UDF's to pre-process the data for analysis.
- Deployed Cloudera Hadoop Cluster on Azure for Big Data Analytics
- Analyzed the data by performing Hive queries and running Pig scripts, Spark SQL, and Spark Streaming jobs.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Used Apache NiFi to copy data from the local file system to HDFS.
- Developed a Spark Streaming script that consumes topics from the distributed messaging source Kafka and periodically pushes batches of data to Spark for near-real-time processing.
- Extracted files from Cassandra through Sqoop and placed in HDFS for further processing.
- Involved in creating generic Sqoop import script for loading data into Hive tables from RDBMS.
- Involved in continuous monitoring of operations using Storm.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Implemented indexing of logs from Oozie into Elasticsearch.
- Designed, developed, unit-tested, and supported ETL mappings and scripts for data marts using Talend.
Environment: Hortonworks, Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, Apache Kafka, Azure, Apache Storm, Oozie, SQL, Flume, Spark, HBase, Cassandra, Informatica, Java, GitHub.
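Partitioning and bucketing, used in this role to speed up Hive queries, amount to routing each row to a directory (the partition) and then to a fixed bucket file by hashing a clustering key; a minimal Python sketch of that routing rule, with made-up column names and a simple stand-in hash:

```python
def bucket_for(key, num_buckets):
    """Hive-style bucketing: a deterministic hash of the clustering key
    modulo the bucket count decides which bucket a row lands in."""
    return sum(ord(c) for c in str(key)) % num_buckets  # stand-in hash

rows = [{"region": "east", "account": "a1"},
        {"region": "east", "account": "a2"},
        {"region": "west", "account": "a1"}]

# Partition by region, then bucket by account within each partition.
layout = {}
for row in rows:
    partition = layout.setdefault(row["region"], {})
    partition.setdefault(bucket_for(row["account"], 4), []).append(row)
```

Because the hash is deterministic, queries filtering on the partition or bucketing column can skip every directory and bucket that cannot contain a match, which is the source of the performance gain.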
Confidential, Atlanta, GA.
- Extracted, updated, and loaded data from different data sources into HDFS using the Sqoop import/export command-line utility.
- Loaded data from the UNIX file system to HDFS, created Hive tables, and loaded and analyzed data using Hive queries.
- Loaded data back into Teradata for BASEL reporting and for business users to analyze and visualize using Datameer.
- Developed UDFs in Java for Hive and Pig and worked on reading multiple data formats on HDFS using Scala.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Developed multiple POCs using Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
- Analyzed the SQL scripts and designed the solution for implementation in Scala.
- Defined Pig UDFs for financial functions such as swaps, hedging, speculation, and arbitrage.
- Created end-to-end Spark-Solr applications using Scala to perform data cleansing, validation, transformation, and summarization according to requirements.
- Streamlined Hive tables using optimization techniques such as partitioning and bucketing to provide better performance for Hive queries.
- Created custom shell scripts to import data via Sqoop from Oracle databases.
- Created big data workflows to ingest data from various sources into Hadoop using Oozie; these workflows comprise heterogeneous jobs such as Hive and Sqoop.
- Experienced in Spark Context, Spark SQL, Pair RDD and Spark YARN.
- Handled moving of data from various data sources and performed transformations using Pig.
- Created Pig Latin scripts to sort, group, join, and filter the enterprise-wide data.
- Experience in improving the search focus and quality in Elastic Search by using aggregations.
- Worked with Elastic MapReduce and setup Hadoop environment in AWS EC2 Instances.
- Worked and learned a great deal from AWS Cloud services like EC2, S3, EBS, RDS and VPC.
- Migrated an existing on-premises application to AWS; used AWS services such as EC2 and S3 for small data set processing and storage, and maintained the Hadoop cluster on AWS EMR.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
- Used Spark Streaming APIs to perform transformations and actions on the fly for building common learner data model which gets the data from Kafka in near real time and persist it to Cassandra.
Environment: Hadoop, MapReduce, Sqoop, HDFS, HBase, Hive, Pig, Oozie, Spark, Kafka, Cassandra, AWS, Elastic Search, Java, Oracle 10g, MySQL, Ubuntu, HDP.
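The RDD transformations and actions used throughout this role follow a build-then-execute pattern: transformations such as `filter` and `map` describe a pipeline, and an action such as `collect` produces results. A tiny pure-Python analogue (no Spark dependency; class and variable names are illustrative, and unlike real RDDs this toy evaluates eagerly rather than lazily):

```python
class MiniRDD:
    """A toy stand-in for a Spark RDD to show the transformation/action split."""
    def __init__(self, data):
        self._data = list(data)

    def map(self, fn):        # transformation (lazy in real Spark; eager here)
        return MiniRDD(fn(x) for x in self._data)

    def filter(self, pred):   # transformation
        return MiniRDD(x for x in self._data if pred(x))

    def collect(self):        # action: materialize the results
        return list(self._data)

amounts = MiniRDD([120, -5, 300, 42])
positive_doubled = amounts.filter(lambda x: x > 0).map(lambda x: x * 2).collect()
```

In real Spark the same chain would read `sc.parallelize([...]).filter(...).map(...).collect()`, with the transformations deferred until the action triggers a distributed job.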
- Involved in the coding of JSP pages for the presentation of data on the View layer in MVC architecture.
- Used J2EE design patterns like Factory Methods, MVC, and Singleton Pattern that made modules and code more organized, flexible and readable for future upgrades.
- Used Struts tag libraries as well as Struts tile framework.
- Used JDBC to access the database with the Oracle thin driver (Type 4) for application optimization and efficiency.
- Used Data Access Object to make the application more flexible to future and legacy databases.
- Actively involved in tuning SQL queries for better performance.
- Worked with XML to store and read exception messages through DOM.
- Wrote generic functions to call Oracle stored procedures, triggers, functions.
- Involved in design and development phases of Software Development Life Cycle (SDLC).
- Involved in designing UML Use case diagrams, Class diagrams, and Sequence diagrams using Rational Rose.
- Followed Agile methodology and SCRUM meetings to track, optimize, and tailor features to customer needs.
- Developed user interface using JSP, JSP Tag libraries, and Java Script to simplify the complexities of the application.
- Implemented Model View Controller (MVC) architecture using Jakarta Struts frameworks at presentation tier.
- Developed a Dojo based front end including forms and controls and programmed event handling.
- Implemented SOA architecture with web services using JAX-RS (REST) and JAX-WS (SOAP).
- Developed various Enterprise Java Bean components to fulfill the business functionality.
- Created Action Classes which route submittals to appropriate EJB components and render retrieved information.
- Validated all forms using Struts validation framework and implemented Tiles framework in the presentation layer.
- Used core Java and object-oriented concepts.
- Used Spring Framework for Dependency injection and integrated it with the Struts Framework.
- Used JDBC to connect to backend databases, Oracle and SQL Server 2005.
- Proficient in writing SQL queries, stored procedures for multiple databases, Oracle and SQL Server 2005.
- Wrote Stored Procedures using PL/SQL. Performed query optimization to achieve faster indexing and making the system more scalable.
- Deployed the application on Windows using IBM WebSphere Application Server.
- Used Java Messaging Services (JMS) for reliable and asynchronous exchange of important information such as payment status report.
- Used Web Services - WSDL and REST for getting credit card information from third party and used SAX and DOM XML parsers for data retrieval.
- Implemented SOA architecture with web services using JAX-WS.
- Used ANT scripts to build the application and deployed it on WebSphere Application Server.
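The Data Access Object pattern mentioned in this role isolates persistence behind an interface so the storage backend can change without touching business code. A minimal sketch of the idea (class and method names are illustrative; the original work used Java with JDBC, but the structure is the same):

```python
class AccountDAO:
    """DAO interface: business code depends only on these methods."""
    def find(self, account_id):
        raise NotImplementedError

    def save(self, account_id, record):
        raise NotImplementedError

class InMemoryAccountDAO(AccountDAO):
    """One interchangeable backend; a JDBC/Oracle-backed DAO would
    implement the same interface with SQL calls instead of a dict."""
    def __init__(self):
        self._store = {}

    def find(self, account_id):
        return self._store.get(account_id)

    def save(self, account_id, record):
        self._store[account_id] = record

dao = InMemoryAccountDAO()
dao.save("acct-1", {"balance": 100})
```

Swapping the in-memory backend for a database-backed one changes only which class is instantiated, which is exactly the flexibility toward future and legacy databases described above.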