- Over 6.5 years of professional experience in the IT industry, including recent experience with the Big Data Hadoop ecosystem (Hive, Spark Streaming) and data warehousing tools. Experienced in the analysis, design, implementation, and support of application software based on client requirements.
- Experience in developing strategic methods for deploying Big Data technologies to efficiently solve Big Data processing requirements.
- In-depth knowledge of Hadoop YARN architecture and its daemons, including Resource Manager, Node Manager, Application Master, Job Tracker, Task Tracker, Containers, NameNode, DataNode, and MapReduce concepts.
- Experience working with CDH (Cloudera Distribution Hadoop) and HDP (Hortonworks Data Platform) distributions.
- Hands-on experience with data ingestion tools like Apache Sqoop for importing and exporting data.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Expertise in loading streaming log data from various web servers into HDFS using Flume.
- Experience developing PySpark code to create RDDs, Paired RDDs and DataFrames.
- Hands-on experience in controlling data distribution with partitioning and bucketing techniques to improve query performance, implementing complex business logic, and optimizing queries using HiveQL.
- Working knowledge of RDBMS databases like Oracle 11g, SQL Server, MySQL, and MS Access.
- Experienced in NoSQL databases like HBase and Apache Cassandra.
- Experience using Kafka brokers to feed Spark Streaming applications and processing live streaming data with RDDs.
- Worked on the Oozie Workflow Engine to automate and parallelize Hadoop MapReduce, Hive, and Pig jobs.
- Extensive experience in importing streaming data into HDFS using stream processing platforms like Flume and Kafka.
- Good knowledge in working with Amazon Web Services (AWS) using EC2 for computing and S3 as storage.
- Gained knowledge of IBM Watson Explorer Content Analytics Studio.
- Developed scripts and batch jobs to monitor and schedule various Spark jobs.
- Hands-on experience working with complex MapReduce programs using different file formats like Text, Sequence, XML, Parquet, and Avro.
- Good knowledge of BI tools like Qlik Sense and Tableau.
- Prior experience working as Software Developer in Java/J2EE and related technologies.
- Skilled in using version control tools like GIT and SVN.
- Experience in job workflow scheduling with Oozie and cluster coordination with Zookeeper.
- Excellent knowledge in Java and SQL in application development and deployment.
- Hands-on experience in creating database objects such as tables, stored procedures, functions, and triggers using SQL and PL/SQL on DB2.
- Detailed understanding of the Software Development Life Cycle (SDLC) and strong knowledge of project implementation methodologies like Waterfall and Agile.
- Excellent technical, communication, analytical, problem-solving, and troubleshooting skills, with the ability to work well with people from cross-cultural backgrounds.
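A minimal sketch of the Hive partitioning and bucketing approach described above, shown in Scala against a Hive-enabled SparkSession; all table, column, and path names are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

// Assumes a cluster with Hive metastore access; names below are illustrative only.
val spark = SparkSession.builder()
  .appName("HivePartitioningSketch")
  .enableHiveSupport()
  .getOrCreate()

// Partition by event date and bucket by user_id to prune scans and speed up joins.
spark.sql("""
  CREATE TABLE IF NOT EXISTS events_opt (user_id BIGINT, action STRING)
  PARTITIONED BY (event_date STRING)
  CLUSTERED BY (user_id) INTO 32 BUCKETS
  STORED AS PARQUET
""")

// Dynamic partitioning lets Hive derive the partition value from the data itself.
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
spark.sql("""
  INSERT OVERWRITE TABLE events_opt PARTITION (event_date)
  SELECT user_id, action, event_date FROM raw_events
""")
```

With this layout, a query filtered on `event_date` reads only the matching partition directory, and bucketing on `user_id` makes joins on that key cheaper.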
- MapReduce
- Cloud Search
- Storm, Zookeeper
- Data Pipeline
- Kafka and Flume
- Amazon Web Services
- Hive QL
- Apache Cassandra
- Windows 2000/98/XP/7/Vista
- MAC OS
- Qlik Sense
Big Data Engineer
- Designed, developed, implemented, tested, and maintained data ingestion and integration ETL pipelines using Kafka, batch processing, Spark Streaming, and Cassandra.
- Developed Kafka producers and consumers to efficiently ingest data from various data sources.
- Developed Spark Streaming programs to process real-time data from Kafka, applying both stateless and stateful transformations.
- Used Spark SQL, Spark Streaming, and Spark MLlib APIs to perform transformations and actions on the fly, building a common learner data model that consumes data from Kafka in near real time and persists it to Cassandra.
- Developed Spark programs with Scala and applied principles of functional programming to do batch processing
- Utilized the Spark SQL DataFrames API for efficient structured data processing.
- Created Hive external tables for each source table in Hadoop Data Lake.
- Written Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying on the log data.
- Optimized the data sets by creating dynamic partitioning and bucketing in Hive.
- Enabled concurrent access to Hive tables with shared and exclusive locking, backed by the Zookeeper ensemble in the cluster.
- Developed business-specific custom UDFs in Hive.
- Assembled large, complex data sets that met business requirements by using Cassandra.
- Followed Agile Scrum methodology to design, develop, deploy, and support solutions that leverage the big data platform.
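The Kafka-to-Cassandra streaming pattern above can be sketched as follows, using the Spark 1.6 / Kafka 0.8 APIs from the environment listed below. This is a sketch only: the broker address, topic, keyspace, and table names are hypothetical, it assumes an existing SparkContext `sc`, and it requires the spark-cassandra-connector on the classpath.

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils
import kafka.serializer.StringDecoder
import com.datastax.spark.connector.streaming._  // spark-cassandra-connector

val ssc = new StreamingContext(sc, Seconds(10))
ssc.checkpoint("/tmp/checkpoints")  // required for stateful transformations

// Direct (receiver-less) stream from Kafka 0.8 brokers.
val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("learner-events"))

// Stateful transformation: a running event count per learner key.
val counts = stream
  .map { case (_, value) => (value.split(",")(0), 1L) }
  .updateStateByKey[Long]((batch: Seq[Long], state: Option[Long]) =>
    Some(state.getOrElse(0L) + batch.sum))

// Persist the near-real-time model to a Cassandra table.
counts.saveToCassandra("learner_ks", "event_counts")

ssc.start()
ssc.awaitTermination()
```

The stateless stage is the `map`; `updateStateByKey` carries state across micro-batches, which is why checkpointing must be enabled.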
Environment: Apache Hadoop 2.6.0, YARN, HDFS, Scala 2.10.5, Spark Core 1.6, Spark SQL, Java API, Hive 1.2.1, Kafka 0.8.2, Eclipse Neon, Cassandra, Hortonworks HDP 2.4.3, Zookeeper
- Developed solutions to process data into HDFS (Hadoop Distributed File System), process within Hadoop and emit the summary results from Hadoop to downstream systems
- Primary responsibilities include building scalable distributed data solutions using Hadoop ecosystem
- Responsible for fetching real-time data using Kafka and processing it using Spark with Scala
- Worked on Kafka to import real-time weblogs and ingested the data into Spark Streaming
- Developed business logic using Kafka Direct Stream in Spark Streaming and implemented business transformations
- Experienced with SparkContext, Spark SQL, and Spark on YARN.
- Implemented Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster data processing
- Used Spark for interactive queries, processing of streaming data, and integration with popular NoSQL databases for huge volumes of data
- Implemented Hive Partitioning and Bucketing on the collected data in HDFS
- Involved in data querying and summarization using Hive and created UDFs
- Developed Spark scripts by using Scala Shell commands as per the requirement
- Designed technical solutions for real-time analytics using HBase
- Created HBase tables, loaded data into them via HBase sinks, and performed analytics using Tableau
- Exported the analyzed data to relational databases using Sqoop for visualization and report generation by the BI team
- Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability
- Oozie and Zookeeper were used to automate the flow of jobs and coordination in the cluster respectively
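The Hive querying and UDF work above can be sketched with the Spark 1.6-era HiveContext API; this assumes an existing SparkContext `sc`, and the table, column, and UDF names are hypothetical:

```scala
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)

// Register a simple Scala function as a UDF usable from HiveQL.
hiveContext.udf.register("domain_of", (url: String) =>
  url.stripPrefix("http://").stripPrefix("https://").takeWhile(_ != '/'))

// Summarize a partitioned Hive table of weblogs; the filter on the
// partition column log_date prunes the scan to a single partition.
val summary = hiveContext.sql("""
  SELECT domain_of(url) AS domain, COUNT(*) AS hits
  FROM web_logs
  WHERE log_date = '2016-01-01'
  GROUP BY domain_of(url)
""")
summary.show()
```

The same DataFrame could then be handed off to a relational store or a BI tool, as the bullets above describe.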
Environment: Hadoop 2.6, HDFS, Hive 1.2, Python, Spark, Oracle, Linux, Cloudera CDH 5.8, Oozie, MapReduce, Sqoop 1.4.5, Shell Scripting, HBase, Apache Kafka, Scala 2.10
Big Data Engineer
- Involved in complete Big Data flow of the application starting from data ingestion from upstream to HDFS, processing and analyzing the data in HDFS.
- Involved in creating Hive internal and external tables, loading them with data, and writing Hive queries that require multiple join scenarios.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Created end-to-end Spark applications using Scala to perform data cleansing and validation, loaded the data into Spark RDDs, and performed in-memory computation to generate the output response.
- Integrated multiple data sources (SQL Server, DB2, TD) into the Hadoop cluster and analyzed the data with Hive.
- Experience with various NoSQL databases and comprehensive knowledge of process improvement, normalization/de-normalization, data extraction, data cleansing, and data manipulation.
- Worked on different file formats like Text files, Sequence files, Avro, and Record Columnar (RC) files.
- Experience in importing and exporting the data using Sqoop from HDFS to Relational Database systems/mainframe and vice-versa.
- Imported data from various sources into the Cassandra cluster using Sqoop.
- Worked on creating Cassandra data models from the existing Oracle data model.
- Designed column families in Cassandra; ingested data from RDBMS sources, performed data transformations, and exported the transformed data to Cassandra per business requirements.
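The RDBMS-to-Cassandra flow above can be sketched in Scala; the connection URL, credentials, table, and keyspace names are all hypothetical, and the sketch assumes the spark-cassandra-connector and an Oracle JDBC driver on the classpath:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("OracleToCassandra").getOrCreate()

// Ingest a table from the RDBMS source over JDBC.
val orders = spark.read.format("jdbc")
  .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")
  .option("dbtable", "SALES.ORDERS")
  .option("user", "etl_user")
  .option("password", sys.env("DB_PASSWORD"))
  .load()

// Apply transformations, then export to the Cassandra column family.
orders
  .filter("STATUS = 'SHIPPED'")
  .select("ORDER_ID", "CUSTOMER_ID", "TOTAL")
  .write.format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "sales_ks", "table" -> "shipped_orders"))
  .mode("append")
  .save()
```

In practice the target Cassandra table is modeled around the query pattern (partition key first), which is what re-deriving the model from the Oracle schema, as described above, involves.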
Environment: Hadoop, Python, HDFS, YARN, Scala, Hive, Sqoop, Zookeeper, Cloudera, Linux Shell Scripting, Spark-SQL, Cassandra, XML, ETL
- Designed class and sequence diagrams for Enhancements
- Developed the user interface presentation screens using HTML, XML, CSS, and jQuery
- Coordinated with the QA leads on development of test plans, test cases, and unit test code.
- Worked on optimizing large complicated SQL statements as part of performance improvement for speedy client deliverables.
- Involved in building JUnit test cases for various modules.
- Maintained the existing code base developed in the Spring and Hibernate frameworks by incorporating new features and fixing bugs.
- Involved in application server configuration and production issue resolution
- Wrote SQL queries and Stored Procedures for interacting with the Oracle database.
- Documented common problems prior to go-live while actively involved in a production support role.
Environment: J2EE/J2SE, Java 1.5, JSP, Ajax4, JSF 1.2, JMS, CSS3, XML, HTML, Oracle, Shell Script, Maven, Eclipse
- Involved in all the phases of the SDLC, including Requirements Collection, Design & Analysis of the Customer Specifications, Development, and Customization of the Application.
- Communicated with the project manager, client, stakeholders, and scrum master for better understanding of project requirements and task delivery using Agile methodology.
- Involved in implementing all components of the application including database tables, server-side Java Programming and Client-side web programming.
- Designed and developed Web Services to provide services to the various clients using SOAP and WSDL.
- Involved in preparing technical Specifications based on functional requirements.
- Involved in development of new command Objects and enhancement of existing command objects using Servlets and Core java.
- Identified and implemented the user actions (Struts Action Classes) and forms (Struts Forms Classes) as a part of Struts framework.
- Responsible for coding SQL Statements and Stored procedures for back end communication using JDBC.
- Involved in documentation, review, analysis and fixed post production issues.
- Used SOAPUI to test the web services and mock response for unit testing web services.
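The JDBC stored-procedure pattern described above can be sketched as follows (shown in Scala against the standard java.sql API, which the original Java code would have used identically; the connection URL, procedure, and parameter names are hypothetical):

```scala
import java.sql.{Connection, DriverManager, Types}

// Hypothetical DB2 connection details; the driver must be on the classpath.
val conn: Connection = DriverManager.getConnection(
  "jdbc:db2://dbhost:50000/APPDB", "app_user", sys.env("DB_PASSWORD"))

// Call a stored procedure with one IN and one OUT parameter.
val stmt = conn.prepareCall("{ CALL GET_ORDER_COUNT(?, ?) }")
try {
  stmt.setString(1, "CUST-1001")               // IN: customer id
  stmt.registerOutParameter(2, Types.INTEGER)  // OUT: order count
  stmt.execute()
  println(s"order count: ${stmt.getInt(2)}")
} finally {
  stmt.close()
  conn.close()
}
```

Keeping the SQL in stored procedures, as described above, centralizes the back-end logic in the database while the application layer only marshals parameters.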
Environment: Java, J2EE, JDBC, Struts, JSP, jQuery, SOAP, Servlets, SQL, HTML, CSS, JavaScript, DB2
- Coordinated with the BA group for better understanding of functional requirements; analyzed and designed the business requirements, then documented and implemented them.
- Responsible for Design and development of Web pages using PHP, HTML, CSS including Ajax controls and XML.
- Coded Business Logic component using PHP.
- Worked extensively with the File management and image libraries.
- Fixed bugs and provided support services for the application.
- Managed and implemented all code changes via SVN; deployed builds across development, staging, and production instances while maintaining code integrity.