Hadoop Developer Resume
Middletown, NJ
SUMMARY:
- Hands-on experience with HDFS, MapReduce, and the Hadoop ecosystem (Pig, Hive, Oozie, HBase, ZooKeeper, and Sqoop).
- Experience with the Spark processing framework, including Spark Core and Spark SQL.
- Experience with NoSQL databases such as HBase, MongoDB, and Apache Cassandra.
- Experience querying data using HiveQL, Pig Latin, and CQL (Cassandra Query Language).
- Experienced in writing custom UDFs and UDAFs to extend Hive and Pig core functionality.
- Experience in importing and exporting data between HDFS and relational database systems (RDBMS) using Sqoop.
- Working experience with the Cask Data Application Platform (CDAP), an integrated platform for big data.
- Experience streaming real-time data using Kafka and Spark Streaming.
- Knowledge of Spark integration with Hive using Shark.
- Good exposure to build tools: Maven for Java and SBT for Scala.
- Hands-on experience working with Docker and Docker Swarm and running isolated microservices.
- Worked with CI servers such as Jenkins and TeamCity.
- Extensive experience working with Java, JDBC, JSP, Spring, Spring Boot, and Java networking.
- Experience in testing applications using JUnit 4.
- Knowledge of popular Java libraries such as Joda-Time, Guava, SLF4J, Apache Commons, and Apache POI.
- Knowledge of the Spring MVC web framework and the Django framework for web applications.
- Hands-on work with Python and libraries such as Matplotlib, NumPy, SciPy, and pandas.
- Working experience with source version control using Git and logging frameworks such as Log4j and SLF4J.
- Extensive experience with SQL, PL/SQL and database concepts.
- Experience in designing, developing, and deploying applications using JSPs and Servlets on the Apache Tomcat web server.
- Good experience in identifying actors and use cases and in creating UML and E-R diagrams.
- Good web page design skills using HTML5, CSS3, and Bootstrap.
- Experience working with C, C++, and compiler programming.
- Strong understanding of OOP concepts and design patterns; experience in all stages of the SDLC (Agile, Waterfall), writing System Requirement Specifications, and applying design principles.
TECHNICAL SKILLS:
Hadoop Ecosystem: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Spark, Spark Streaming, Kafka, YARN, ZooKeeper, HBase, Impala, Cassandra
Hadoop Distributions: Cloudera, Hortonworks
Programming Languages: C, C++, Java, Python, Scala
NoSQL: HBase, Cassandra
RDBMS: Oracle, MySQL
Java APIs: JDBC, Apache POI, Guava, Joda-Time, Apache Commons
Build Tools: Maven, SBT
IDEs: Java: IntelliJ IDEA, NetBeans, Eclipse; Python: PyCharm, Jupyter Notebook
Web Frameworks: Spring MVC, Django
Operating Systems: Windows, Linux
File Formats: CSV, JSON, Avro, Parquet
PROFESSIONAL EXPERIENCE:
Confidential, Middletown, NJ
Hadoop Developer
Responsibilities:
- Worked with the DCAE Controller to manage network policies and lifecycle events.
- Built syslog collectors with Spring Boot and deployed them using Docker.
- Retrieved syslog collector CSV files, cleaned them, and converted them to JSON (Common Event Format) records written to a Kafka topic (see the sketch after this list).
- Published and subscribed to data on Kafka topics and used it for analysis.
- Converted between file formats and transformed data from various sources using Scala.
- Integrated the syslog collector with Data Movement as a Platform (DMaaP).
- Used CDAP as an integration environment in which data from Kafka was streamed into HDFS, then loaded into Hive tables and a threshold HBase table to reflect network changes in real time.
- Worked with Hive and HBase for storing tables, partitioning them as required, and joining tables.
- Used the CDAP integrated big data environment to automate data ingestion and transformation.
- Managed data retrieval from databases and Kafka into Hive and HBase using CDAP pipelines.
- Prepared data by joining and filtering records from various sources with CDAP.
- Ingested the prepared data into Hive, HBase, and Cassandra.
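Representative snippet: a minimal Scala sketch of the CSV-to-Kafka step described above. The broker address, topic name, file name, and field layout are illustrative assumptions, not the actual production configuration.

```scala
import java.util.Properties
import scala.io.Source
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object SyslogCsvToKafka {
  def main(args: Array[String]): Unit = {
    // Producer configuration (broker address is a placeholder).
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    val producer = new KafkaProducer[String, String](props)

    // Assume each CSV line holds: timestamp,host,severity,message (header row skipped).
    for (line <- Source.fromFile("syslog_collector.csv").getLines().drop(1)) {
      val Array(ts, host, severity, message) = line.split(",", 4)
      val event =
        s"""{"timestamp":"$ts","host":"$host","severity":"$severity","message":"${message.trim}"}"""
      // Key by host so events from one host land in the same partition.
      producer.send(new ProducerRecord[String, String]("syslog-events", host, event))
    }
    producer.close()
  }
}
```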
Confidential, Princeton, NJ
Java / Hadoop Developer
Responsibilities:
- Developed Spark programs in Scala to convert existing investment portfolios from CSV to Parquet and stored them in Hive (a sketch appears at the end of this section).
- Used Scala to format incoming portfolio data such as dates and investment information.
- Stored reference data as a cached RDD so it could be reused across multiple runs launched by different users.
- Performed various transformations and actions on RDDs to meet business requirements.
- Created Hive tables, loaded data, wrote Hive UDFs, and partitioned the data as needed.
- Queried the Hive tables using Spark SQL to validate the data and used the results for further analysis.
- Used ZooKeeper for automatic job scheduling and running the programs on a regular basis.
- Used HBase for storing data, with files kept in Parquet format on HDFS.
- Managed data coming in from various sources and maintained HDFS loading of structured and unstructured data.
- Created data pipelines using Flume and Sqoop to ingest data into Hive and HBase.
- Responsible for importing log files from various sources into HDFS using Flume.
- Imported JSON, Parquet, and CSV files into Hive using SerDes.
- Created partitions and buckets based on state to support further processing with Hive joins.
- Created Hive UDFs and UDAFs in Java to implement business logic.
- Moved data from MySQL into Hive, HBase, and HDFS using Sqoop.
- Worked with NoSQL databases such as HBase and Cassandra to create and store data.
- Worked on integrating Hive and HBase tables.
- Worked with JSON, CSV, XML, Parquet, and Avro file formats.
- Understood and analyzed the application requirements along with the SRS documentation.
- Implemented the application using the Spring Web MVC framework.
- Designed and developed views using JSP, following the MVC architecture.
- Developed controllers in Java to handle the application's various request flows.
- Wrote bean configuration files and DispatcherServlet properties.
- Maintained application properties using Spring Boot.
- Implemented server-side programs using Servlets and JSP.
- Designed, developed, and validated the user interface using HTML, Bootstrap, and CSS.
- Implemented PL/SQL stored procedures and triggers.
- Used JDBC prepared statements called from Servlets for database access.
- Designed and documented the stored procedures.
- Involved in unit testing of various components using JUnit 4.
- Modified and mined data files using the Apache POI library.
- Worked on the database interaction layer for insert, update, and retrieval operations against the Oracle database by writing stored procedures.
- Used Log4j for logging across the project and developed a logging module to create log files for debugging and tracing the application.
- Used the Jackson JSON Java API to parse JSON files and convert them to Java objects.
- Used Maven to package the applications.
- Source version control using Git.
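Representative snippet: a minimal Scala sketch of the CSV-to-Parquet conversion and Spark SQL validation described above. The paths, column names, date format, and database/table names are illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.to_date

object PortfolioCsvToParquet {
  def main(args: Array[String]): Unit = {
    // Spark session with Hive support so the result can be registered in the metastore.
    val spark = SparkSession.builder()
      .appName("PortfolioCsvToParquet")
      .enableHiveSupport()
      .getOrCreate()

    // Read raw portfolio CSV files (header row and schema inference assumed).
    val portfolios = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/raw/portfolios/*.csv")

    // Normalize the trade date column before persisting (column name and format are placeholders).
    val cleaned = portfolios.withColumn("trade_date", to_date(portfolios("trade_date"), "MM/dd/yyyy"))

    // Write as Parquet, partitioned by state, and register the result as a Hive table.
    cleaned.write
      .mode("overwrite")
      .partitionBy("state")
      .format("parquet")
      .saveAsTable("finance.portfolios")

    // Validate the load with Spark SQL.
    spark.sql("SELECT state, COUNT(*) AS positions FROM finance.portfolios GROUP BY state").show()

    spark.stop()
  }
}
```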