- Around 6 years of technical experience across all phases of the Software Development Life Cycle (SDLC), with a major concentration on Big Data analysis frameworks, relational databases, NoSQL databases and Java/J2EE technologies, following recommended software practices.
- 3+ years of industry IT experience in data manipulation using Big Data Hadoop ecosystem components: MapReduce, HDFS, Yarn/MRv2, Pig, Hive, HBase, Spark, Kafka, Flume, Sqoop, Oozie, Avro, AWS, Spark integration with Cassandra, Solr and Zookeeper.
- Extensive experience working with Cloudera (CDH4 and CDH5) and Hortonworks Hadoop distributions and Amazon EMR on AWS to fully leverage and implement new Hadoop features.
- Hands-on experience with the data ingestion tools Kafka and Flume and the workflow management tool Oozie.
- Strong experience and knowledge of real time data analytics using Spark Streaming, Kafka and Flume.
- Good experience in writing Spark applications using Python and Scala.
- Involved in converting Cassandra/Hive/SQL queries into Spark transformations using RDDs and Scala.
- Knowledge of unifying data platforms using Kafka producers and consumers and of implementing pre-processing using Storm topologies.
- Experience processing Avro data files using Avro tools and MapReduce programs.
- Hands-on experience in writing MapReduce programs in Java to handle different data sets using Map and Reduce tasks.
- Developed multiple MapReduce jobs to perform data cleaning and preprocessing.
- Designed Hive queries and Pig scripts to perform data analysis, data transfer and table design.
- Implemented ad-hoc queries in Hive to perform analytics on structured data.
- Expertise in writing Hive UDFs and Generic UDFs to incorporate complex business logic into Hive queries.
- Experienced in optimizing Hive queries by tuning configuration parameters.
- Involved in designing the data model in Hive for migration of ETL process into Hadoop and wrote Pig Scripts to load data into Hadoop environment.
- Compared the performance of Hive and Big SQL for our data warehousing systems.
- Implemented Sqoop for large dataset transfers between Hadoop and RDBMS.
- Extensively used Apache Flume to collect the logs and error messages across the cluster.
- Experience in composing shell scripts to dump the shared information from MySQL servers to HDFS.
- Worked on Implementing and optimizing Hadoop/MapReduce algorithms for Big Data analytics.
- Developed graphs using the Graphical Development Environment (GDE) with various Ab Initio components and migrated a few graphs to Hadoop.
Big Data Ecosystems: Hadoop, Map Reduce, HDFS, Zookeeper, Hive, Pig, Sqoop, Oozie, Flume, Yarn, Spark, NiFi
Database Languages: SQL, PL/SQL, Oracle
Programming Languages: Java, Scala, Python (can read and understand)
Frameworks: Spring, Hibernate, JMS
Web Services: RESTful web services
Databases: RDBMS, HBase, Cassandra
IDE: Eclipse, IntelliJ
Platforms: Windows, Linux, Unix
Application Servers: Apache Tomcat, WebSphere, WebLogic, JBoss
Methodologies: Agile, Waterfall
Confidential, Palo Alto
- Developed scalable transformation, aggregation and rollup operations with Hive and optimized SLAs by using Hive partitions and buckets and by storing the data in different file formats (Parquet, Avro, ORC) with suitable compression codecs (snappy, lz4, gzip, lzo, bzip) based on application needs.
- Migrated code from Ab Initio (ETL tool) to Hadoop using Hive and Spark, chosen according to the complexity of the Ab Initio graphs.
- Developed graphs using Graphical Development Environment (GDE) with various Ab Initio components
- Developed MapReduce batch jobs in Java for loading data to HDFS in sequential format.
- Ingested structured data from RDBMS to HDFS as incremental import using Sqoop.
- Involved in writing Pig scripts to wrangle the raw data, store it in HDFS and load it into Hive tables using HCatalog.
- Created Hive external tables with clustering and partitioning on the date for optimizing the performance of ad-hoc queries.
- Involved in creating Hive tables on a wide range of data formats such as text, sequential, Avro, Parquet and ORC.
- Transformed the semi-structured log data to fit into the schema of the Hive tables using Pig.
- Coordinated with Hadoop Admin team on implementing the DDLs for new applications.
- Worked on Incident and Change Management for creating tickets and CRQ using ASK NOW.
- Worked in an Agile framework, handling tasks on a sprint basis using a JIRA board.
- Worked on ESP and D-series to create collections for scheduling Job Docs in Production DCs.
- Followed the D2P process to test and debug the scripts from lower to higher environments.
- Worked with distributed copy to move application data across clusters.
- Developed Spark applications in Scala using DataFrames and the Spark SQL API for faster processing of data.
- Developed Spark jobs and Hive Jobs to summarize and transform data.
- Built real-time data pipelines by developing Kafka producers and Spark Streaming applications to consume the data.
- Worked on batch processing and real-time data processing with Spark Streaming.
- Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase.
- Developed Hive scripts in HiveQL to de-normalize and aggregate the data.
- Expertise in creating Hive tables and in loading and analyzing data using Hive queries.
- Performed transformations, cleaning and filtering on imported data using Hive and loaded final data into HDFS.
- Developed Hive queries on different tables to find insights. Automated the building of data pipelines that data scientists use for predictive, classification, descriptive and prescriptive analytics.
- Built a NiFi system for replicating the whole database.
- Developed a NiFi workflow to pick up data from the Data Lake as well as from a server and send it to a Kafka broker.
- Created NiFi flows to trigger Spark jobs and used PutEmail processors to send notifications of any failures.
Big Data Engineer
- Worked with Spark to improve the performance of existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Serialized JSON data and stored it in tables using Spark SQL.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Knowledge of cloud infrastructure technologies in Azure.
- Experience with Confidential Azure Cloud services: Storage Accounts, Azure data storage, Azure Data Factory, Data Lake and Virtual Networks.
- Used Scala to convert Hive/SQL queries into RDD transformations in Apache Spark.
- Worked with Azure Monitoring and Data Factory.
- Supported migrations from on-premises environments to Azure.
- Provided support services to enterprise customers for Confidential Azure Cloud networking, including handling critical-situation cases.
- Implemented Spark scripts using Scala and Spark SQL to load Hive tables into Spark for faster processing of data.
- Experience in writing Shell scripts to automate the process flow.
- Experience in writing business analytics scripts using Hive SQL.
- Provided consulting and cloud architecture for premier customers and internal projects running on the MS Azure platform, targeting high availability of services and low operational costs.
- Optimized test content and process with a reduction of 20% in false positives. Used SQL and Excel to pull, analyze, polish and visualize data.
- Followed Agile methodology with Scrum meetings to track, optimize and tailor features to customer needs.
- Worked with Spark to improve the performance of existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames and pair RDDs.
- Performed advanced procedures such as text analytics and processing with the in-memory computing capabilities of Spark, using Scala.
- Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames and saved it in Parquet format in HDFS.
- Experienced in writing real-time processing and core jobs using Spark Streaming with Kafka as a data pipeline system.
- Configured Spark Streaming to receive ongoing information from Kafka and store the stream information in HDFS.
- Used Spark and Spark SQL to read the Parquet data and create the tables in Hive using the Scala API.
- Experienced in using the Spark application master to monitor Spark jobs and capture their logs.
- Worked on Spark using Python and Spark SQL for faster testing and processing of data.
- Developed multiple Kafka Producers and Consumers as per the software requirement specifications.
- Used Spark Streaming APIs to perform transformations and actions on the fly to build a common learner data model, which gets data from Kafka in near real time and persists it to Cassandra.
- Experience in building Real-time Data Pipelines with Kafka Connect and Spark Streaming.
- Used Kafka and Kafka brokers, initialized the Spark context and processed live streaming information with RDDs, and used Kafka to load data into HDFS and NoSQL databases.
- Used Zookeeper to store offsets of messages consumed for a specific topic and partition by a specific Consumer Group in Kafka.
- Used Kafka features such as distribution, partitioning and the replicated commit log service to maintain feeds for messaging systems, and created applications that monitor consumer lag within Apache Kafka clusters.
- Used Spark Streaming APIs to perform transformations and actions on the fly to build the common learner data model.
- Involved in Cassandra cluster planning, with a good understanding of Cassandra cluster mechanisms.
- Used Sqoop to import data into Cassandra tables from relational databases such as Oracle and MySQL, and designed column families.
- Developed efficient MapReduce programs for filtering out the unstructured data and developed multiple MapReduce jobs to perform data cleaning and preprocessing on Hortonworks.
- Implemented a data interface to retrieve customer information using a REST API, pre-processed the data using MapReduce 2.0 and stored it in HDFS (Hortonworks).
- Maintained ELK (Elasticsearch, Logstash and Kibana) and wrote Spark scripts using the Scala shell.
- Worked in AWS environment for development and deployment of custom Hadoop applications.
- Strong experience in working with Elastic MapReduce (EMR) and setting up environments on AWS EC2 instances.
- Wrote Oozie workflows to run the Sqoop and HQL scripts in Amazon EMR.
- Involved in creating custom UDFs for Pig and Hive to incorporate Python methods and functionality into Pig Latin and HQL (HiveQL).
- Developed shell scripts to generate Hive CREATE statements from the data and load the data into the tables.
- Involved in writing custom MapReduce programs using the Java API for data processing.
- Created Hive tables as internal or external tables per requirements, defined with appropriate static and dynamic partitions and bucketing for efficiency.
- Prepared the Functional Requirement Specification and performed coding, bug fixing and support.
- Involved in various phases of the Software Development Life Cycle (SDLC), such as requirement gathering, data modeling, analysis, architecture design and development for the project.
- Developed JSPs and Servlets to dynamically generate HTML and display the data to the client side.
- Developed service classes, domain/DAOs and controllers using Java/J2EE technologies.
- Responsible for deploying the application using WebSphere Server and worked with SOAP, XML messaging.
- Used Eclipse as IDE, Maven for build management, JIRA for issue tracking, Confluence for documentation, Git for version control, ARC (Advanced REST Client) for endpoint testing, Crucible for code review and SQL Developer as DB client.
- Used JUnit to develop Test cases for performing Unit Testing.
- Used JSP and JSTL Tag Libraries for developing User Interface components.
- Configured log4j to log the warning and error messages.
- Developed new and maintained existing functionality using Spring MVC and Hibernate.
- Used JIRA as a bug-reporting tool for updating the bug report.
Junior Java Developer
- Actively involved from the start of the project, from requirement gathering through quality assurance testing.
- Coded and developed a multi-tier architecture in Java, J2EE and Servlets.
- Involved in gathering business requirements, analyzing the project and creating UML diagrams such as use cases, class diagrams, sequence diagrams and flowcharts.
- Developed client-side web services components using JAX-WS.
- Worked extensively with JUnit to test the application code for server-client data transfer.
- Developed the front end using JSTL, JSP, HTML and JavaScript.
- Created new and maintained existing web pages built with JSP and Servlets.
- Worked extensively on views, stored procedures, triggers and SQL queries for loading the data (staging) to enhance and maintain the existing functionality.
- Involved in developing web services using SOAP for sending data to and receiving data from an external interface.
- Involved in database design and in developing SQL queries and stored procedures on MySQL.
- Consumed web services (WSDL, SOAP and UDDI) from a third party for authorizing payments to and from customers.
- Developed Hibernate mapping files (.hbm.xml) for mapping declarations.
- Wrote and modified database queries and stored procedures for Oracle9i.