Hadoop Developer Resume

Lexington, KY

SUMMARY:

  • 8+ years of overall experience in designing and developing Hadoop MapReduce solutions.
  • Strong experience with Big Data and Hadoop technologies and excellent knowledge of the Hadoop ecosystem: Hive, Spark, Sqoop, Impala, Pig, HBase, Kafka, Flume, Storm, Zookeeper, and Oozie.
  • Experienced in transferring data from different data sources into HDFS using Kafka producers, consumers, and brokers.
  • Expert in designing and building data ingestion pipelines using technologies such as Apache Storm and Kafka.
  • Good knowledge of building data processing pipelines using Kafka and Spark Streaming (see the sketch after this list).
  • Experienced in data ingestion using Sqoop, Storm, Kafka and Apache Flume.
  • Experience in Apache Flume for efficiently collecting, aggregating, and moving large amounts of log data.
  • Knowledge of streaming data to HDFS using Flume.
  • Hands-on experience with Flume for real-time log processing for attribution reports.
  • Experience in importing and exporting data between HDFS and relational database systems/mainframes using Sqoop.
  • Experience with the Apache Spark ecosystem using Spark SQL, DataFrames, and RDDs, and knowledge of Spark MLlib.
  • Experience in ingestion, storage, querying, processing, and analysis of Big Data, with hands-on experience in Apache Spark, Spark SQL, and Spark Streaming.
  • Worked with the Spark engine to process large-scale data and experienced in creating Spark RDDs.
  • Knowledge of developing Spark Streaming jobs using RDDs and leveraging the Spark shell.
  • Expertise with the Talend Big Data tool, including architectural design and development of ingestion and extraction jobs for Big Data and Spark Streaming.
  • Experienced with RDD architecture, implementing Spark operations on RDDs, and optimizing transformations and actions in Spark.
  • Hands-on with Apache Spark jobs written in Scala in test environments for faster data processing, using Spark SQL for querying.
  • Knowledge of writing Spark and Spark SQL code in Scala for testing and processing data.
  • Knowledge of cloud services on Amazon Web Services (AWS).
  • Good at analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
  • Expertise in Hive and Pig MapReduce jobs to validate and cleanse data in HDFS obtained from heterogeneous data sources, making it suitable for analysis.
  • Good with Hive and Impala queries to load and process data in the Hadoop Distributed File System (HDFS).
  • Good understanding of NoSQL databases and hands-on experience writing applications on NoSQL databases such as Cassandra and MongoDB.
  • Hands-on with data lake clusters managed by Hortonworks Ambari on AWS using EC2 and S3.
  • Strong knowledge of MongoDB concepts, including CRUD operations, the aggregation framework, and document schema design.
  • Experience in maintaining and bug-fixing web-based applications on various platforms.
  • Experience in managing the MongoDB life cycle, including sizing, automation, monitoring, and tuning.
  • Experience in storing and processing unstructured data using NoSQL databases such as HBase.
  • Good at developing web services using REST and the HBase native client API to query data from HBase.
  • Experience working with MapReduce programs in Hadoop for processing Big Data.
  • Knowledge of ETL methods for data extraction, transformation, and loading in corporate-wide ETL solutions and data warehouse tools for reporting and data analysis.
  • Experienced with workflow scheduling and cluster coordination tools such as Oozie and Zookeeper.
  • Experience with the Oozie workflow engine in running workflows with actions that execute Java MapReduce and Pig jobs.
  • Good experience with both JobTracker (MapReduce v1) and YARN (MapReduce v2).
  • Experience in managing and reviewing Hadoop log files generated through YARN.
  • Experience in using Apache Solr for search applications.
  • Knowledge of Business Intelligence and Reporting.
  • Experienced in Java, Spring Boot, Apache Tomcat, Maven, Gradle, Hibernate, and other open-source frameworks and software.
  • Prepared dashboards using Tableau.
  • Experience with all stages of the SDLC and the Agile development model, from requirements gathering to deployment and production support.
  • Proficient in test-driven development (TDD) and Agile practices to produce high-quality deliverables.
  • Hands-on with Agile (Scrum) and Waterfall models, automation and enterprise tools such as Jenkins, Chef, JIRA, and Confluence, and Git for version control.
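
A minimal, illustrative Scala sketch of the Kafka-to-HDFS Spark Streaming ingestion described above. The broker address, topic name, consumer group, and output path are placeholders, not details from any specific engagement.

  import org.apache.kafka.common.serialization.StringDeserializer
  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}
  import org.apache.spark.streaming.kafka010.KafkaUtils
  import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
  import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

  object KafkaToHdfs {
    def main(args: Array[String]): Unit = {
      val ssc = new StreamingContext(new SparkConf().setAppName("KafkaToHdfs"), Seconds(30))

      // Consumer settings; the broker list and group id are placeholders.
      val kafkaParams = Map[String, Object](
        "bootstrap.servers"  -> "broker1:9092",
        "key.deserializer"   -> classOf[StringDeserializer],
        "value.deserializer" -> classOf[StringDeserializer],
        "group.id"           -> "hdfs-ingest",
        "auto.offset.reset"  -> "latest"
      )

      // Direct stream over a hypothetical "events" topic.
      val stream = KafkaUtils.createDirectStream[String, String](
        ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

      // Persist each non-empty micro-batch of message values to a timestamped HDFS directory.
      stream.map(_.value).foreachRDD { rdd =>
        if (!rdd.isEmpty()) rdd.saveAsTextFile(s"hdfs:///data/raw/events/${System.currentTimeMillis()}")
      }

      ssc.start()
      ssc.awaitTermination()
    }
  }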

TECHNICAL SKILLS:

Big Data: Hadoop HDFS, MapReduce, Hive, Pig, HBase, Zookeeper, Sqoop, Oozie, Cassandra, Spark, Scala, Storm, Flume, Kafka, and Avro.

Web Technologies: Core Java, J2EE, Servlets, JSP, JDBC, JBOSS, JSF, XML, AJAX, SOAP, WSDL.

Methodologies: Agile, SDLC, Waterfall model, UML, Design Patterns, Scrum.

Frameworks: ASP.NET, Java EE, MVC, Struts 2/1, Hibernate 3, Spring 3/2.5/2.

Programming Languages: Core Java, C, C++, SQL, Python, Scala, XML, Unix Shell scripting, HTML, CSS, JavaScript.

Databases: Oracle 11g/10g, IBM DB2, MongoDB, Microsoft SQL Server, MySQL, MS Access.

Web/Application Servers: Apache Tomcat, JSON-RPC, WebLogic, WebSphere.

NoSQL: Cassandra, MongoDB, HBase.

Monitoring, Reporting tools: Ganglia, Nagios, Custom Shell Scripts.

PROFESSIONAL EXPERIENCE:

Confidential, Lexington, KY

Hadoop Developer

Roles and Responsibilities:

  • Responsible for the design and development of analytic models, applications, and supporting tools that enable developers to create algorithms/models in a big data ecosystem.
  • Ensured appropriate data testing was completed and met test plan requirements.
  • Engaged with business stakeholders to design and own end-to-end solutions that empower data-driven decision making.
  • Involved in developing code to write canonical-model JSON records from numerous input sources to Kafka queues.
  • Developed UDFs in Scala, used with DataFrames/Spark SQL and RDDs in Spark for data aggregation and queries, writing data back into the RDBMS through Sqoop (see the sketch after this list).
  • Experienced in Scala programming as part of developing Spark batch and streaming data pipelines.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Exposed to Spark Structured Streaming and worked on an Agile spike to verify that the functionality suits the business requirement.
  • Improved performance and optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Hands-on experience in Apache Spark creating RDDs and DataFrames, applying transformations and actions, and converting RDDs to DataFrames.
  • Used the Spark API over Hadoop YARN to perform analytics on data in Hive with Spark SQL.
  • Collected log data from web servers and integrated it into HDFS using Flume.
  • Involved in setting up the QA environment by implementing Pig and Sqoop scripts.
  • Developed Pig Latin scripts to sort, join, and filter enterprise data.
  • Experience writing Storm topologies that accept events from Kafka producers and emit them to Cassandra.
  • Developed Oozie workflows for scheduling ETL processes and Hive scripts.
  • Hands-on experience installing and configuring Cloudera.
  • Involved in writing Spark SQL queries using Scala.
  • Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
  • Good experience with NoSQL databases such as Cassandra.
  • Analyzed Cassandra and compared it with other open-source NoSQL databases to determine which best suited the current requirement.
  • Integrated Cassandra as a distributed persistent metadata store to provide metadata resolution for network entities.
  • Worked on Hadoop cluster and data querying tools like Hive to store and retrieve data.
  • Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark using Scala.
  • Loaded data from the Linux file system to HDFS and vice versa.
  • Built a POC for enabling member and suspect search using Solr.
  • Knowledge of ETL methods for data extraction, transformation, and loading in corporate-wide ETL solutions and data warehouse tools for reporting and data analysis.
  • Used CSVExcelStorage to parse files with different delimiters in Pig.
  • Modified reports and Talend ETL jobs based on feedback from QA testers and users in the development and staging environments.
  • Worked extensively with Sqoop for importing and exporting data between HDFS and relational database systems/mainframes.
  • Implemented test scripts to support test-driven development and integration.
  • Developed multiple MapReduce jobs in Java to clean datasets.
  • Involved in loading data from Linux file systems, servers, and Java web services using Kafka producers and consumers.
  • Prepared Avro schema files for generating Hive tables.
  • Experienced in working with the applications team to install Hadoop updates and upgrades as required.
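
A hedged Scala sketch of the UDF / DataFrame aggregation pattern referenced above. The Hive table, column names, and JDBC connection details are hypothetical, and the JDBC write stands in for the Sqoop export step described in the bullet.

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.functions.{sum, udf}

  object ClaimsAggregation {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder().appName("ClaimsAggregation").enableHiveSupport().getOrCreate()
      import spark.implicits._

      // Hypothetical UDF: normalise free-text region codes before aggregating.
      val normalize = (s: String) => Option(s).map(_.trim.toUpperCase).getOrElse("UNKNOWN")
      val normalizeRegion = udf(normalize)
      spark.udf.register("normalize_region", normalize) // also callable from Spark SQL

      // "staging.claims" and its columns are illustrative, not a real table.
      val aggregated = spark.table("staging.claims")
        .withColumn("region", normalizeRegion($"region"))
        .groupBy("region")
        .agg(sum($"amount").as("total_amount"))

      // JDBC write shown as a stand-in for the Sqoop export; connection details are placeholders.
      aggregated.write
        .format("jdbc")
        .option("url", "jdbc:mysql://dbhost:3306/reporting")
        .option("dbtable", "claims_by_region")
        .option("user", "etl_user")
        .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
        .mode("overwrite")
        .save()

      spark.stop()
    }
  }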

Environment: Hortonworks, ETL, YARN, HDFS, Pig, Hive, Kafka, Talend, Cassandra, Eclipse, Java, Sqoop, Avro, Spark, Spark API, Spark SQL, Spark Streaming, Scala, Solr, Linux file system, Linux shell scripting, Oozie, Agile.

Confidential, Chicago, IL

Big Data Developer

Roles and Responsibilities:

  • Collected and aggregated large amounts of weblog data from different sources such as web servers, mobile devices, and network devices using Apache Flume and stored the data in HDFS for analysis.
  • Experience in setting up a fan-out workflow in Flume to design a V-shaped architecture that takes data from many sources and ingests it into a single sink.
  • Reviewed and managed Hadoop log files from multiple machines using Flume.
  • Processed raw data stored in Kafka in real time and stored the processed data in Hadoop using Spark Streaming (DStreams).
  • In the pre-processing phase, used Spark RDD transformations to remove missing data and create new features.
  • Developed Spark SQL queries for generating statistical summaries and filtering operations for specific use cases, working with Spark RDDs on a distributed cluster running Apache Spark (see the sketch after this list).
  • Involved in converting SQL queries into Apache Spark transformations using Spark DataFrames.
  • Worked with Spark to create structured data from the pool of unstructured data received.
  • Involved in working with Spark on top of YARN for interactive and batch analysis.
  • Importing and exporting data into HDFS and Hive using Sqoop.
  • Installed and configured Hive and wrote Hive UDFs and UDAFs.
  • Worked on analyzing the Hadoop cluster and different big data analytic tools including Pig, Hive, and MongoDB.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Worked on Impala to create views on top of Hive tables for business use-case requirements.
  • Experienced in managing and reviewing Hadoop log files.
  • Processed web server logs by developing multi-hop Flume agents using the Avro sink and loaded them into MongoDB for further analysis; extracted files from MongoDB through Flume and was involved in loading data from the UNIX file system to HDFS.
  • Extracted and restructured data into MongoDB using the import and export command-line utilities.
  • Used MongoDB as part of a POC and migrated a few of the SQL stored procedures to MongoDB.
  • Worked with NoSQL databases (MongoDB) and hybrid implementations.
  • Monitored document growth and estimated storage size for large MongoDB clusters.
  • Experienced in working with the Spark ecosystem using Spark SQL and Scala queries on different formats such as text and CSV files.
  • Expertise in implementing Spark using Scala and Spark SQL for faster testing and processing of data, with responsibility for managing data from different sources.
  • Developed interactive dashboards using Tableau connecting to Impala.
  • Supported MapReduce programs running on the cluster.
  • Involved in setting up the QA environment by implementing Pig and Sqoop scripts.
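
An illustrative Scala/Spark SQL sketch of the statistical-summary-and-filter queries over CSV data described above. The HDFS path, schema, and column names are assumed for the example only.

  import org.apache.spark.sql.SparkSession

  object WeblogSummary {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder().appName("WeblogSummary").getOrCreate()

      // Assumed CSV layout (device, status, bytes, ...); path and columns are illustrative.
      val logs = spark.read
        .option("header", "true")
        .option("inferSchema", "true")
        .csv("hdfs:///data/weblogs/*.csv")

      logs.createOrReplaceTempView("weblogs")

      // Statistical summary plus filtering, expressed in Spark SQL.
      val summary = spark.sql(
        """SELECT device,
          |       COUNT(*)   AS requests,
          |       AVG(bytes) AS avg_bytes,
          |       MAX(bytes) AS max_bytes
          |FROM weblogs
          |WHERE status = 200
          |GROUP BY device
          |ORDER BY requests DESC""".stripMargin)

      summary.show(20, truncate = false)
      spark.stop()
    }
  }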

Environment: Linux, MongoDB, Hadoop, Spark, HBase, Sqoop, Pig, Impala, Hive, Kafka, Flume, Cloudera, Design Patterns, Apache Tomcat, MS SQL Server 2008.

Confidential, Tampa, FL.

Hadoop Developer

Roles and Responsibilities:

  • Experience in integrating Hive with HBase for effective usage, and performed MRUnit testing for the MapReduce jobs.
  • Created BI reports (Tableau) and dashboards from HDFS data using Hive.
  • Experience in importing and exporting data between relational database systems and HDFS using Sqoop.
  • Developed a common framework to import the data from Teradata to HDFS and to export to Teradata using Sqoop.
  • Imported the log data from different servers into HDFS using Flume and developed MapReduce programs for analyzing the data.
  • Used Flume to handle the real time log processing for attribution reports.
  • Worked on tuning the performance of Pig queries.
  • Involved in loading data from UNIX file system to HDFS.
  • Performed operations using the partitioning pattern in MapReduce to move records into different categories.
  • Implemented the Fair Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
  • Involved in building templates and screens in HTML and JavaScript.
  • Created HBase tables to load large sets of data coming from UNIX and NoSQL sources (see the sketch after this list).
  • Implemented the web service client for login authentication, credit reports, and applicant information using Apache Axis 2 web services.
  • Designed, developed, tested, implemented, and supported data warehousing ETL using Talend and Hadoop technologies.
  • Built and deployed Java applications into multiple Unix-based environments and produced both unit and functional test results along with release notes.
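
A minimal Scala sketch of loading a record into an HBase table through the native client API, as referenced above. The table name, column family, qualifiers, and row key are hypothetical.

  import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
  import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
  import org.apache.hadoop.hbase.util.Bytes

  object HBaseLoader {
    def main(args: Array[String]): Unit = {
      // Picks up hbase-site.xml from the classpath; names below are hypothetical.
      val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
      val table = connection.getTable(TableName.valueOf("customer_events"))

      try {
        // One Put per record: row key plus a couple of cells in the "d" column family.
        val put = new Put(Bytes.toBytes("cust-0001#2016-05-01"))
        put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("source"), Bytes.toBytes("unix-feed"))
        put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("status"), Bytes.toBytes("loaded"))
        table.put(put)
      } finally {
        table.close()
        connection.close()
      }
    }
  }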

Environment: WebSphere 6.1, HTML, XML, ANT 1.6, MapReduce, Sqoop, UNIX, NoSQL, Java, JavaScript, MRUnit, Teradata, Node.js, JUnit 3.8, ETL, Talend, HDFS, Hive, HBase.

Confidential, Dallas, TX

Hadoop/Java Developer

Roles and Responsibilities:

  • Experience with the Hadoop ecosystem (MapReduce, Pig, Hive, HBase) and NoSQL.
  • Analyzed and determined the relationship of input keys to output keys in terms of both type and number, and identified the number, type, and value of the keys and values emitted from the mappers and reducers, along with the number and contents of the output files.
  • Developed MapReduce pipeline jobs to process data, create the necessary HFiles, and load them into HBase for faster access without taking a performance hit.
  • Designed and developed UI screens with Spring (MVC), HTML5, CSS, JavaScript, and AngularJS to provide interactive screens for displaying data.
  • Wrote multiple MapReduce programs in Java for data extraction, transformation, and aggregation from multiple file formats including XML, JSON, and CSV.
  • Analyzed requirements to determine the correct InputFormat and OutputFormat for each MapReduce job.
  • Created database tables and wrote T-SQL queries and stored procedures to create complex join tables and perform CRUD operations.
  • Ran MapReduce jobs to process data stored in AWS.
  • Used the automated build and continuous integration system Jenkins and the JUnit framework for test-driven unit testing.
  • Implemented access control for users based on logins, using HTML and jQuery for validations.
  • Created the web application using HTML, CSS, jQuery and JavaScript.
  • Used Eclipse as an IDE for developing the application.
  • Loaded the flat files data using Informatica to the staging area.
  • Used UI-router in AngularJS to make this a single page application.
  • Developed unit/assembly test cases and UNIX shell scripts to run along with daily/weekly/monthly batches to reduce or eliminate manual testing effort.
  • Developed mappings in Informatica to load the data including facts and dimensions from various sources into the Data Warehouse, using different transformations like Source Qualifier, Java, Expression, Lookup, Aggregate, Update Strategy and Joiner.

Environment: Windows XP/NT, Java, MapReduce, Pig, Hive, HBase, NoSQL, AWS, Jenkins, HTML, CSS, T-SQL, AngularJS, UI, jQuery, Korn Shell, Quality Center 10.

Confidential

Java Developer

Roles and Responsibilities:

  • Played an active role in the team by interacting with welfare business analysts/program specialists and converting business requirements into system requirements.
  • Implemented J2EE standards and MVC architecture using the Struts framework. Implemented Servlets, JSP, and Ajax to design the user interface.
  • Developed and deployed UI layer logic using JSP and dynamic JSP pages with Struts.
  • Worked with Struts MVC objects such as ActionServlet, controllers, validators, web application context, handler mappings, and message resource bundles, and used JNDI lookup for J2EE components.
  • Automated the process for downloading and installing packages, copying source files and executing code.
  • Redesigned several web pages with better interface and features.
  • Developed application based on Software Development Life Cycle (SDLC).
  • Developed the XML data object to generate the PDF documents and other reports.
  • Used Hibernate, DAOs, and JDBC for data retrieval and modification in the database.
  • Messaging and interaction of web services was done using SOAP and REST.
  • Developed JUnit test cases for unit, system, and user test scenarios, and was involved in unit testing, user acceptance testing, and bug fixing.

Environment: J2EE, JDBC, Java, Servlets, JSP, Hibernate, Web services, REST, SOAP, Design Patterns, SDLC, MVC, HTML, JavaScript, WebLogic 8.0, XML, JUnit, Oracle 10g, Eclipse.

Confidential

Java Developer

Responsibilities:

  • Developed the application under the JEE architecture and designed dynamic, browser-compatible user interfaces using JSP, custom tags, HTML, CSS, and JavaScript.
  • Deployed and maintained the JSP and Servlet components on WebLogic 8.0.
  • Developed Maven scripts to build and deploy the application onto the WebLogic application server, ran UNIX shell scripts, and implemented an auto-deployment process.
  • Developed the application server persistence layer using JDBC, SQL, and Hibernate.
  • Used Hibernate 3.0 in data access layer to access and update information in the database.
  • Wrote SQL queries, stored procedures, and triggers to perform back-end database operations.
  • Used JDBC to connect the web applications to databases.
  • Used the Log4j framework for application logging, tracking, and debugging.
  • Developed and utilized J2EE services and JMS components for messaging communication in WebLogic.
  • Configured the development environment using the WebLogic application server for testing.

Environment: Java/J2EE, SQL, Oracle 10g, JSP 2.0, Hibernate 3.0, Maven, UNIX, Log4j, AJAX, JavaScript, WebLogic 8.0, HTML, JDBC 3.0.
