
Sr. Kafka Admin Resume


Herndon, VA

SUMMARY

  • Over 10 years of experience in Information Technology, including experience in Big Data and the Hadoop ecosystem. In-depth knowledge of and hands-on experience with Apache Hadoop components such as HDFS, MapReduce, Kafka, HiveQL, HBase, Pig, Hive, Sqoop, Oozie, Cassandra, Flume, and Spark.
  • Experienced in developing web applications in various domains like Telecom, Insurance, Retail and Banking.
  • Excellent knowledge of Hadoop architecture and ecosystem components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node and MapReduce.
  • Developed scripts to automate end-to-end data management and synchronization across all the clusters.
  • Strong hands-on experience with the Hadoop framework and its ecosystem, including but not limited to HDFS architecture, MapReduce programming, Hive, Pig, Sqoop, HBase and Oozie.
  • Worked with the system engineering team to plan and deploy Hadoop hardware and software environments.
  • Hands-on experience installing and configuring Cloudera's Apache Hadoop ecosystem components such as Flume-NG, HBase, ZooKeeper, Oozie, Hive, Storm, Sqoop, Kafka, Hue and Pig on CDH3 and CDH4 clusters.
  • Accomplished at creating test plans, defining test cases, reviewing and maintaining test scripts, interacting with team members to fix errors, and executing System Integration Testing (SIT), User Acceptance Testing (UAT), Stage (PFIX) Unit, System Integrated Test, Regression Test and Customer Test.
  • Extensive experience with core Java, MapReduce programming, and advanced J2EE frameworks such as Spring, Struts, JSF and Hibernate.
  • Extensive experience in Application Software Design, Object Oriented Design, Development, Documentation, Debugging, Testing and Implementation
  • Expert-level skills in designing and implementing web server solutions and deploying Java application servers such as Tomcat, JBoss, WebSphere and WebLogic on the Windows platform.
  • Excellent work experience with XML/database mapping and writing SQL queries, stored procedures and triggers against major relational databases: Oracle 11g, 12c and SQL Server.
  • Extensively worked on Java/J2EE systems with different databases such as Oracle, MySQL and DB2.
  • Good knowledge of Oracle 9i, 10g and 11g databases and excellent at writing SQL queries and scripts.
  • Experience using Hibernate for mapping Java classes to the database and using Hibernate Query Language (HQL).
  • Application integration with IDEs such as Eclipse and deployments to application servers such as Tomcat, IBM WebSphere and WebLogic.
  • Strong experience in web services that include components such as SOAP, WSDL, XSD, Axis2 and JAX-WS, as well as RESTful web services.
  • Experience developing client-side web applications using HTML, JSP, jQuery, JSTL, AJAX, and custom tags, implementing client-side validations with JavaScript and server-side validations with the Struts Validation Framework.
  • Experience creating Entity Relationship Diagrams (ERD) using data modeling tools, organizing the project's data into entities and defining the relationships between them.
  • Competence in using different java IDEs like Eclipse, JBuilder, NetBeans and RAD for developing Dynamic Web Applications.
  • Basic knowledge of application design using Unified Modeling Language (UML), sequence diagrams, use case diagrams, Entity Relationship Diagrams (ERD) and Data Flow Diagrams (DFD).
  • Familiar with core Java, with a strong understanding and working knowledge of object-oriented concepts such as Collections, multithreading, data structures, algorithms, exception handling and polymorphism.
  • Expertise in database design, creation and management of schemas, writing Stored Procedures, Functions, DDL, DML, SQL queries & Modeling.
  • Extensive experience in ETL architecture, development, enhancement, maintenance and production support, as well as data modeling, data profiling, reporting, and business and system requirement gathering.
  • Excellent understanding of Hadoop distributed system architecture and design principles.
  • Developed a pipeline using Java, Kafka and Python to load data from a JMS server into Hive, with automatic ingestion and quality audits of the data in the RAW layer of the data lake (a minimal producer sketch follows this summary).
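For illustration only: the JMS-to-Hive pipeline above hands records to Kafka through a plain producer. The sketch below shows that handoff in Scala; the broker list, topic name and payload are assumptions for the sketch, not details from the actual project.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}

object JmsToKafkaBridge {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    // Broker list and topic are placeholders for this sketch.
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092")
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringSerializer")
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringSerializer")
    props.put(ProducerConfig.ACKS_CONFIG, "all") // wait for full broker acknowledgement

    val producer = new KafkaProducer[String, String](props)
    try {
      // In the real pipeline each record would come from the JMS consumer;
      // here a single hard-coded message stands in for it.
      val record = new ProducerRecord[String, String]("raw-ingest-topic", "msg-key", "payload")
      producer.send(record).get() // block until the broker confirms the write
    } finally {
      producer.close()
    }
  }
}
```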

TECHNICAL SKILLS

Programming Languages: SQL, PL/SQL, Python, Scala, Linux Shell Scripting, Web Services, MVC, SOAP, REST

Hadoop/Big Data Technologies: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Spark, Scala, Impala, Kafka, Hue, Sqoop, Oozie, Flume, ZooKeeper, Cassandra, Cloudera CDH5, Python, PySpark, Solr and Hortonworks.

Frameworks: Apache Storm, Samza, Spark, Flink, MapR, YARN

IDE Tools: Eclipse, NetBeans, IntelliJ IDEA, JBuilder, JDeveloper 9.0/10g, Toad 4.0/8.5, SQL Developer, Rational Application Developer (RAD), Visual Studio & Spring Source Tool Suite.

Operating Systems: Windows, UNIX, Linux, CentOS & Ubuntu.

Web Technologies: HTML/DHTML, CSS, CSS3, JavaScript, AJAX, jQuery, Angular 2, XML, JSON, XSLT, Java Beans, JMS.

Version Controls: Git, CVS, SVN (Subversion), Microsoft VSS, Stash, ClearCase 7.0/7.1.

Web/App Servers: IBM WebSphere 5.x/6.x, Apache Tomcat 8/9, JBoss, WebLogic, GlassFish, Jetty; Web Services: SOAP, REST API, Axis.

Development Tools: Eclipse, Rational Application Developer, Oracle JDeveloper, PL/SQL Developer, SQL Developer, WS_FTP, PuTTY

Databases: Oracle, DB2, MS SQL Server, Teradata

Middleware: ODBC, JDBC, RMI, BlazeDS.

Methodologies: Agile, SCRUM, TDD, OOAD, Design Patterns, Continuous Integration using CruiseControl, Waterfall

PROFESSIONAL EXPERIENCE

Sr. Kafka Admin

Confidential, Herndon, VA

Responsibilities:

  • Responsible for managing, analyzing and transforming petabytes of data, as well as quick validation checks on FTP file arrival from the S3 bucket to HDFS.
  • Responsible for analyzing large data sets and deriving customer usage patterns by developing new MapReduce programs.
  • Created Hive tables and loaded data into them incrementally using dynamic partitioning; worked with Avro files and JSON records.
  • Used Pig for data cleansing and developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
  • Worked on Hive by creating external and internal tables, loading them with data and writing Hive queries.
  • Developed and used UDTFs and UDAFs for decoding log record fields and conversions, generating minute buckets for specified time intervals, and extracting JSON fields.
  • Developed Pig and Hive UDFs to analyze complex data and find specific user behavior.
  • Responsible for debugging and optimizing Hive scripts and for implementing deduplication logic in Hive using a rank key function (UDF).
  • Wrote Hive validation scripts used in the validation framework for daily analysis, presented to business users through graphs.
  • Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing with Pig and Hive.
  • Analyzed and defined the researchers' strategy and determined the system architecture and requirements needed to achieve project goals.
  • Developed multiple Kafka producers and consumers as per the software requirement specifications.
  • Used Kafka for log aggregation: gathering physical log files off servers and placing them in a central location such as HDFS for processing.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS.
  • Used various Spark transformations and actions to cleanse the input data.
  • Developed shell scripts to generate the Hive CREATE statements from the data and load the data into the tables.
  • Involved in writing custom MapReduce programs using the Java API for data processing.
  • Integrated Maven build and designed workflows to automate the build and deploy process.
  • Involved in developing a linear regression model to predict a continuous measurement for improving the observations on data, developed using Spark with the Scala API.
  • Worked extensively on Spark and MLlib to develop a regression model for logistic information.
  • Implemented Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster processing of data.
  • The Hive tables were created per requirement as internal or external tables defined with appropriate static or dynamic partitions and bucketing for efficiency (a DDL sketch follows this list).
  • Loaded and transformed large sets of structured and semi-structured data using Hive.
  • Extracted the real-time feed using Kafka and Spark Streaming, converted it to RDDs, processed the data as DataFrames and saved it in Parquet format to HDFS (sketched after this list).
  • Used Spark and Spark SQL to read the Parquet data and create the tables in Hive using the Scala API.
  • Tested Apache TEZ, an extensible framework for building high performance batch and interactive data processing applications, on Pig and Hive jobs
  • Very good understanding of Cassandra cluster mechanisms, including replication strategies, snitches, gossip, consistent hashing and consistency levels.
  • Used the Spark application master to monitor Spark jobs and capture their logs.
  • Implemented Spark using Scala, DataFrames and the Spark SQL API for faster testing and processing of data.
  • Involved in making code changes for a module in workstation simulation for processing across the cluster using spark-submit.
  • Performed analytics and visualization on the log data, estimating the error rate and studying the probability of future errors using regression models.
  • Used the WebHDFS REST API to make HTTP GET, PUT, POST and DELETE requests from the web server to perform analytics on the data lake (a request sketch follows this list).
  • Worked on Sequence files, RC files, map-side joins, bucketing and partitioning for Hive performance enhancement and storage improvement.
  • Implemented daily jobs that automate parallel tasks of loading data into HDFS using Autosys and Oozie coordinator jobs.
  • Performed streaming of data into Apache Ignite by setting up a cache for efficient data analysis.
  • Created Sqoop jobs and Pig and Hive scripts for data ingestion from relational databases to compare with historical data.
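For illustration only, the shape of such a partitioned, bucketed Hive table and an incremental, dynamically partitioned load is sketched below through the Spark/Hive SQL interface; the table, columns and paths are invented placeholders, not the actual schemas.

```scala
import org.apache.spark.sql.SparkSession

object HivePartitionedLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-partitioned-load")
      .enableHiveSupport()
      .getOrCreate()

    // Settings Hive needs for dynamic-partition inserts.
    spark.sql("SET hive.exec.dynamic.partition = true")
    spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    // Table and column names are illustrative placeholders.
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
        |  user_id STRING,
        |  url     STRING,
        |  status  INT
        |)
        |PARTITIONED BY (event_date STRING)
        |CLUSTERED BY (user_id) INTO 32 BUCKETS
        |STORED AS ORC
        |LOCATION 'hdfs:///warehouse/web_logs'""".stripMargin)

    // Incremental load: partitions are derived from the event_date column of the staging table.
    spark.sql(
      """INSERT INTO TABLE web_logs PARTITION (event_date)
        |SELECT user_id, url, status, event_date
        |FROM web_logs_staging""".stripMargin)

    spark.stop()
  }
}
```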
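A minimal sketch of the Kafka-to-HDFS Parquet path described above, assuming the spark-streaming-kafka-0-10 integration; broker addresses, topic, group id and the output path are placeholders rather than project details.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object KafkaToHdfsParquet {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-to-hdfs-parquet")
    val ssc = new StreamingContext(conf, Seconds(60)) // one micro-batch per minute

    // Broker list, group id and topic name are placeholders for this sketch.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092,broker2:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "hdfs-ingest",
      "auto.offset.reset" -> "latest")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("raw-events"), kafkaParams))

    // Each micro-batch RDD is converted to a DataFrame and appended to HDFS as Parquet.
    stream.foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        val spark = SparkSession.builder().config(rdd.sparkContext.getConf).getOrCreate()
        import spark.implicits._
        rdd.map(_.value()).toDF("raw_event")
          .write.mode("append").parquet("hdfs:///data/raw/events") // placeholder path
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```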
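An illustrative WebHDFS request from Scala using the Java 11 HTTP client; the NameNode host and port (9870 is the Hadoop 3 default), paths and user name are assumptions made for the sketch.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

object WebHdfsCalls {
  // NameNode host/port and the paths below are placeholders.
  private val base = "http://namenode.example.com:9870/webhdfs/v1"

  def main(args: Array[String]): Unit = {
    val client = HttpClient.newHttpClient()

    // LISTSTATUS is a read-only WebHDFS operation: list the contents of a directory.
    val listReq = HttpRequest.newBuilder()
      .uri(URI.create(s"$base/data/raw?op=LISTSTATUS&user.name=hdfs"))
      .GET()
      .build()
    val listResp = client.send(listReq, HttpResponse.BodyHandlers.ofString())
    println(listResp.body()) // JSON FileStatuses payload

    // DELETE removes a file; recursive=false keeps it from removing directories by mistake.
    val deleteReq = HttpRequest.newBuilder()
      .uri(URI.create(s"$base/data/tmp/old_file.json?op=DELETE&recursive=false&user.name=hdfs"))
      .DELETE()
      .build()
    val deleteResp = client.send(deleteReq, HttpResponse.BodyHandlers.ofString())
    println(deleteResp.statusCode())
  }
}
```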

Environment: Hadoop, Hive, HDFS, HPC, WebHDFS, WebHCat, Spark, Spark SQL, Kafka, Java, Scala, web servers, Maven and SBT builds, Pig, Sqoop, Oozie, Shell Scripting, SQL, Talend, HBase, Hortonworks.

Kafka Admin

Confidential, Mclean, VA

Responsibilities:

  • Developed PySpark code to read data from Hive, group the fields and generate XML files; enhanced the code to write the generated XML files to a directory and zip them into CDAs (a simplified sketch follows this list).
  • Implemented a REST call to submit the generated CDAs to the vendor website; implemented Impyla to support JDBC/ODBC connections to HiveServer2.
  • Enhanced the PySpark code to replace Spark with Impyla; performed the Impyla installation on the edge node.
  • Evaluated the performance of the Spark application by testing it in cluster deploy mode versus local mode.
  • Experimented with submissions of test OIDs to the vendor website.
  • Explored StreamSets Data Collector and implemented it for ingestion into Hadoop.
  • Created a StreamSets pipeline to parse files in XML format and convert them into a format that is fed to Solr.
  • Built a data validation dashboard in Solr to display the message records; wrote a shell script to run the Sqoop job for bulk data ingestion from Oracle into Hive.
  • Created tables for the ingested data in Hive and scheduled an Oozie job to run the Sqoop ingestion.
  • Worked with the JSON file format for StreamSets; worked with the Oozie workflow engine to manage interdependent Hadoop jobs and automate several types of Hadoop jobs.
  • Wrote shell scripts to dump data from MySQL to HDFS.
  • Analyzed large volumes of structured data using Spark SQL.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS.
  • Enhanced and optimized product Spark code to aggregate, group and run data mining tasks using the Spark framework.
  • Worked with Maven 3.3.9 for building and managing Java-based projects; hands-on experience with Linux and HDFS shell commands; worked on Kafka for message queuing solutions.
  • Developed unit test cases for Mapper, Reducer and Driver classes using MRUnit.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data in various formats such as text, zip, XML and JSON.
  • Generated Java APIs for retrieval and analysis on NoSQL databases such as HBase and Cassandra.
  • Wrote HBase client programs in Java and web services.
  • Migrated data from RDBMS to Hadoop using Sqoop for analysis and implemented Oozie jobs for automatic data imports from the source.
  • Used the Hive data warehouse tool to analyze the data migrated to HDFS and developed Hive queries.
  • Created external tables with proper partitions for efficiency and loaded the structured data in HDFS produced by the MapReduce jobs.
  • Involved in importing real-time data into Hadoop using Kafka and implemented the Oozie job for daily imports.
  • Worked with various onshore and offshore teams to understand the data imported from their sources.
  • Developed generic shell scripts to automate Sqoop jobs by passing parameters for data imports.
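The XML-generation job above was written in PySpark; purely as an illustration, the same read-group-write shape is sketched below with the Spark DataFrame API in Scala for consistency with the other sketches. The source table, column names and output path are invented placeholders, and a production CDA would be rendered with a proper template or XML library rather than string concatenation.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveGroupToXml {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-group-to-xml")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Source table and columns are placeholders; the real job read clinical-document fields.
    val src = spark.table("cda_source")

    // Group records by document id and collect the fields that belong to each document.
    val grouped = src.groupBy($"doc_id")
      .agg(collect_list(struct($"field_name", $"field_value")).as("fields"))

    // Render each group as a small XML fragment, one document per output record.
    val xmlDs = grouped.map { row =>
      val docId  = row.getAs[String]("doc_id")
      val fields = row.getAs[Seq[org.apache.spark.sql.Row]]("fields")
      val body = fields.map { f =>
        val name  = f.getAs[String]("field_name")
        val value = f.getAs[String]("field_value")
        s"  <$name>$value</$name>"
      }.mkString("\n")
      s"""<document id="$docId">""" + "\n" + body + "\n</document>"
    }

    // Write the XML strings to a directory for downstream zipping.
    xmlDs.write.mode("overwrite").text("hdfs:///out/cda_xml")
    spark.stop()
  }
}
```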

Environment: Hadoop, HDFS, Pig, Sqoop, Spark, MapReduce, Cloudera, Snappy, ZooKeeper, NoSQL, HBase, Shell Scripting, Ubuntu, Red Hat Linux.

Hadoop Kafka Admin

Confidential, Seattle, WA

Responsibilities:

  • Worked on streaming analytics and data consolidation projects on the Lucidworks Fusion product.
  • Integrated Kafka, Spark, Scala and HBase for streaming analytics to create a predictive model, and implemented the machine learning protocol.
  • Developed Scala scripts and UDFs using DataFrames in Spark for data aggregation and queries, writing data back into the OLTP system through Sqoop (an aggregation sketch follows this list).
  • Used Solr/Lucene for indexing and querying the JSON-formatted data.
  • Handled cloud operations inside Rackspace for the persistence logic.
  • Monitored OOTB requests with ATG, Akamai and TIBCO.
  • Used REST services for handling unfinished jobs, checking their status and creating a dataset at a URL.
  • Worked on ingesting, reconciling, compacting and purging base-table and incremental-table data using Hive and HBase, with job scheduling through Oozie.
  • Designed and implemented streaming data on the UI with Scala.js.
  • Applied DevOps principles and components to ensure operational excellence before deploying to production.
  • Operated the cluster on AWS using EC2, Akka, EMR, S3 and CloudWatch.
  • Transported data to HBase using Flume.
  • Used Java UDFs for performance tuning in Hive and Pig by manually driving the MR part.
  • Used Java APIs such as machine learning library functions and graph algorithms for training and predicting with the linear model in Spark Streaming.
  • Implemented unit testing in Java for Pig and Hive applications.
  • Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing with Pig.
  • Involved in source system analysis, data analysis, and data modeling for ETL (Extract, Transform and Load).
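A small, illustrative sketch of the DataFrame aggregation with a UDF mentioned above; the source table, column names and target table are assumed names, and the actual export to the OLTP system went through Sqoop rather than saveAsTable.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SessionAggregation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("session-aggregation")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Table and column names are illustrative, not from the actual engagement.
    val events = spark.table("web_events")

    // A small UDF that normalizes free-form device strings before aggregation.
    val normalizeDevice = udf((device: String) =>
      Option(device).map(_.trim.toLowerCase).getOrElse("unknown"))

    val daily = events
      .withColumn("device", normalizeDevice($"device_raw"))
      .groupBy($"event_date", $"device")
      .agg(count(lit(1)).as("events"), countDistinct($"user_id").as("users"))

    // Write the aggregates out; in the real job these fed a Sqoop export to the OLTP system.
    daily.write.mode("overwrite").saveAsTable("web_events_daily")
    spark.stop()
  }
}
```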

Environment: Hadoop, HDFS, Pig, Sqoop, Spark, MapReduce, Cloudera, Snappy, ZooKeeper, NoSQL, HBase, Shell Scripting, Ubuntu, Red Hat Linux.

Hadoop Admin

Confidential, San Antonio, TX

Responsibilities:

  • Responsible for loading the customer data and event logs from the Oracle database and Teradata into HDFS using Sqoop.
  • End-to-end performance tuning of Hadoop clusters and Hadoop MapReduce routines against very large data sets.
  • Developed Pig UDFs to pre-process the data for analysis.
  • Installed and configured Hadoop HDFS, MapReduce, Pig, Hive, and Sqoop.
  • Wrote Pig Scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
  • Prepared a Tez build from the source code and ran the Hive query jobs using the Tez execution engine rather than MapReduce jobs for better performance.
  • Participated in the requirements gathering, design, development, testing and analysis phases of the project, documenting the business requirements by conducting workshops and meetings with various business users.
  • Imported and exported data between databases and HDFS using Sqoop.
  • Wrote Hive queries and Pig script files for processing data and loading it into HDFS, internally run as MapReduce jobs.
  • Wrote Hive queries and Pig scripts for data analysis to meet the business requirements.
  • Knowledgeable Hadoop administrator with extensive abilities in building, configuring and administering large data clusters in big data environments using the Apache distribution.
  • Knowledge of Hive and HBase integration and Pig and HBase integration.
  • Used Spark Streaming APIs to perform the necessary transformations and actions on the fly for building the common learner data model, which gets the data from Kafka in near real time and persists it into Cassandra (a sketch follows this list).
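A hedged sketch of that Kafka-to-Cassandra path: the direct stream is wired up the same way as in the earlier Kafka sketch, and the spark-cassandra-connector is assumed to be on the classpath. The keyspace, table, topic, broker and the simple CSV payload format are illustrative assumptions, not details of the actual learner data model.

```scala
import com.datastax.spark.connector._   // spark-cassandra-connector (assumed dependency)
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

// Illustrative record; the real common learner data model had many more fields.
case class LearnerEvent(user_id: String, course_id: String, score: Double)

object KafkaToCassandra {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("kafka-to-cassandra")
      .set("spark.cassandra.connection.host", "cassandra-seed-1") // placeholder seed node
    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",          // placeholder broker
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "learner-model")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("learner-events"), kafkaParams))

    stream
      .map(_.value().split(","))                       // assumes simple CSV payloads
      .map(f => LearnerEvent(f(0), f(1), f(2).toDouble))
      .foreachRDD { rdd =>
        // Persist each micro-batch straight into the Cassandra table.
        rdd.saveToCassandra("learning", "learner_events",
          SomeColumns("user_id", "course_id", "score"))
      }

    ssc.start()
    ssc.awaitTermination()
  }
}
```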

Environment: Hadoop, MapReduce, HDFS, Hive, Flume, Sqoop, Cloudera, Oozie, Data Stage, Java Cron Jobs, UNIX Scripts, Spark.

Hadoop Admin

Confidential, New Jersey

Responsibilities:

  • Evaluated the suitability of Hadoop and its ecosystem for the above project, implementing and validating it with various proof-of-concept (POC) applications in order to adopt them and benefit from the Big Data Hadoop initiative.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Designed and implemented a MapReduce-based, large-scale parallel relation-learning system.
  • Installed and configured Pig for ETL jobs; wrote Pig scripts with regular expressions for data cleaning.
  • Created Hive external tables to store the Pig script output and worked on them for data analysis to meet the business requirements.
  • Developed MapReduce pipeline jobs to process the data and create the necessary HFiles.
  • Involved in loading the created HFiles into HBase for faster access to all the products in all the stores without taking a performance hit (a bulk-load sketch follows this list).
  • Extensively used Apache Sqoop for efficiently transferring bulk data between Apache Hadoop and relational databases (Oracle) for product-level forecasts.
  • Used ZooKeeper to provide coordination services to the cluster.
  • Used the Oozie scheduler to automate the pipeline workflow and orchestrate the MapReduce jobs that extract the data in a timely manner.
  • Used Sqoop to load data from MySQL and Oracle into HDFS on a regular basis.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Created Hive tables and worked on them using HiveQL.
  • Involved in loading the created HFiles into HBase for faster access for a large customer base without taking a performance hit.
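An illustrative sketch, in Scala, of how such an HFile-producing MapReduce job can be configured with HFileOutputFormat2 so the output lines up with the target table's regions; the CSV input layout, paths, table name and column family are assumptions for the sketch, not details from the actual system.

```scala
import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper}
import org.apache.hadoop.mapreduce.lib.input.{FileInputFormat, TextInputFormat}
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

// Mapper turning CSV lines (rowkey,qualifier,value -- an assumed layout) into HBase Puts.
class HFilePrepMapper extends Mapper[LongWritable, Text, ImmutableBytesWritable, Put] {
  override def map(key: LongWritable, value: Text,
                   ctx: Mapper[LongWritable, Text, ImmutableBytesWritable, Put]#Context): Unit = {
    val Array(rowKey, qualifier, cell) = value.toString.split(",", 3)
    val put = new Put(Bytes.toBytes(rowKey))
    put.addColumn(Bytes.toBytes("d"), Bytes.toBytes(qualifier), Bytes.toBytes(cell))
    ctx.write(new ImmutableBytesWritable(Bytes.toBytes(rowKey)), put)
  }
}

object HFileBulkPrep {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()
    val job = Job.getInstance(conf, "hfile-bulk-prep")
    job.setJarByClass(classOf[HFilePrepMapper])
    job.setMapperClass(classOf[HFilePrepMapper])
    job.setMapOutputKeyClass(classOf[ImmutableBytesWritable])
    job.setMapOutputValueClass(classOf[Put])
    job.setInputFormatClass(classOf[TextInputFormat])
    FileInputFormat.addInputPath(job, new Path("hdfs:///staging/products"))        // placeholder
    FileOutputFormat.setOutputPath(job, new Path("hdfs:///staging/products_hfiles")) // placeholder

    // Let HBase configure the reducer, partitioner and output format so the HFiles
    // line up with the target table's regions.
    val conn = ConnectionFactory.createConnection(conf)
    val tableName = TableName.valueOf("products")
    HFileOutputFormat2.configureIncrementalLoad(job,
      conn.getTable(tableName), conn.getRegionLocator(tableName))

    val ok = job.waitForCompletion(true)
    conn.close()
    // The generated HFiles can then be moved into HBase with the
    // `hbase completebulkload` tool (or LoadIncrementalHFiles).
    System.exit(if (ok) 0 else 1)
  }
}
```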

Environment: Hadoop, Pig, Hive, Apache Sqoop, Oozie, HBase, ZooKeeper, Cloudera Manager, 30-node cluster with Linux Ubuntu.
