Hadoop Developer Resume

Houston, TX

PROFESSIONAL SUMMARY:

  • 6+ years of IT experience in application development with Big Data (Hadoop) and SQL.
  • 3+ years of experience in Hadoop and its components, including HDFS, MapReduce, Hive, Sqoop, Impala, Pig, and Spark with Scala.
  • Implemented Big Data solutions using the Hadoop ecosystem, including MapReduce.
  • Experience on Hadoop with the Cloudera distribution, as a developer and as a Build and Release Engineer.
  • Excellent understanding of Hadoop architecture and ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, MapReduce, and YARN.
  • Hands-on experience with Hadoop HDFS commands.
  • Hands-on experience importing and exporting data between databases such as Oracle and MySQL and HDFS/Hive using Sqoop.
  • Worked extensively with Hive DDL and Hive Query Language (HQL).
  • Experience writing Hive queries to parse and process HDFS data.
  • Analyzed large data sets using Pig scripts.
  • Good experience integrating Hive, Pig, and HBase using HCatalog.
  • Good work experience with Apache Spark and Scala programming.
  • Good work experience creating DataFrames and RDDs using Scala.
  • Good work experience writing regular expressions.
  • Good work experience with Spark SQL (formerly Shark).
  • Hands-on experience creating RDDs and applying transformations and actions.
  • Good experience creating Maven projects in Scala IDE.
  • Good knowledge of Flume and Kafka.
  • Good knowledge of creating Hive tables backed by HBase.
  • Good knowledge of integrating the Hive metastore with HBase.
  • Good experience running spark-submit jobs in standalone mode.
  • Great team player and quick learner with effective communication, motivation, and organizational skills, combined with attention to detail and a focus on business improvement.
  • Working knowledge of Big Data components: Hadoop (HDFS, MapReduce), Pig, Hive, HBase, Sqoop, Oozie, and Flume.
  • Experience analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
  • Collected and aggregated large amounts of log data using Flume and stored it in HDFS for further analysis.
  • Experience importing and exporting data between HDFS and relational database systems (RDBMS) using Sqoop.
  • Experience developing Oozie workflows for daily incremental loads that pull data from Teradata and import it into Hive tables.
  • Work experience in Hadoop administration on the Cloudera distribution and knowledge of the Hortonworks distribution.
  • Knowledge of creating real-time streaming solutions using Spark Core and Spark SQL.
  • Good problem-solving and analytical skills.
  • Ability to learn new things and take on challenges; team player with good communication skills.
  • Deep expertise in the Hadoop ecosystem (HDFS, Pig, Hive, MapReduce (MR1), YARN, Sqoop, Oozie, HBase, Flume, ZooKeeper, Spark, Spark SQL).
  • Good experience importing and exporting data between different systems and HDFS using Sqoop.
  • Extracted the data from Teradata into HDFS using Sqoop.
  • Strong knowledge of Hadoop and Hive's analytical functions.
  • Experience with column-family databases (HBase).
  • Hands-on experience using Oozie to define and schedule jobs.
  • Good experience monitoring and managing Hadoop clusters using Cloudera Manager.
  • Knowledge of core Java.
  • Knowledge of Scala programming.
  • Knowledge of Spark SQL programming.
  • Hands-on experience with Spark.
  • Ability to dig into difficult technical problems and come up with solutions.
  • Proven ability to learn quickly and apply new technologies.

PROFESSIONAL EXPERIENCE:

Hadoop Developer

Confidential - Houston, TX

Responsibilities:

  • Supported and monitored production jobs and provided 24x7x365 technical support and production incident resolution.
  • Moved file data from various sources to HDFS for further processing.
  • Moved log/text files generated by various products into HDFS.
  • Wrote Sqoop commands for importing and exporting data from relational databases to HDFS and vice versa.
  • Good knowledge of the Hadoop ecosystem and of HDFS, Hadoop, and Spark architectures.
  • Expertise working with the Spark framework using Spark SQL and Spark Streaming.
  • Prepared and processed numerous customer input files; parsed and reformatted the data to meet product requirements.
  • Experience manipulating/analyzing large datasets and finding patterns and insights within structured data.
  • Good understanding of the production/application support life cycle, with strong analytical and programming skills.
  • Experience writing Pig scripts to access HDFS data in Hadoop systems.
  • Experience writing Hive reports and Oozie scheduling.
  • Highly experienced in importing and exporting data between HDFS and relational database management systems using Sqoop.
  • Knowledge of analyzing data using the K-Means algorithm with Spark MLlib.
  • Proficient in technologies such as SQL, PL/SQL, HiveQL, HBase, and Spark SQL.
  • Experience implementing Oozie workflows.
  • Hands-on experience with VPN, PuTTY, WinSCP, VNC Viewer, etc.
  • Good knowledge of UNIX commands.
  • Knowledge of and experience with complete installation of JDK 1.6.0, HDFS, Pig, Hive, and Eclipse.
  • Good knowledge of Python.
  • Good knowledge of and experience with other utilities such as TOAD, SQL*Loader, and SQL*Plus.
  • Strong experience in requirements gathering, analysis, conversion, and implementation of business requirements into business requirement documents and high-level and low-level design documents.
  • Knowledge of machine learning using Spark MLlib.
  • Proficient in handling Oracle procedures, functions, and packages.
  • Good experience handling end-to-end projects.
  • Wrote Hive queries to process HDFS data.
  • Loaded all data into Hive external tables in later sprints.
  • Involved in converting SQL to HQL and MySQL procedures to Scala code to run on the Spark engine.
  • Created RDDs to load unstructured data.
  • Used regular expressions to parse the raw records.
  • Used map and flatMap transformations on data stored in RDDs.
  • Created DataFrames to get the data into tabular form.
  • Used show and count actions after creating the DataFrames (see the sketch after this list).
  • Used Oozie to schedule jobs with workflow and coordinator XML files.
  • Interacted daily with the client to gather input on new enhancements.
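
A minimal Scala sketch of the RDD-to-DataFrame flow described in the bullets above: loading unstructured text from HDFS, parsing it with a regular expression, applying map/flatMap transformations, and finishing with show and count actions. The input path, record layout, and the Event case class are hypothetical placeholders, not details from the actual engagement.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical record type; the real schema came from the customer input files.
case class Event(user: String, action: String, ts: String)

object RddToDataFrameSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-to-dataframe-sketch")
      .getOrCreate()
    import spark.implicits._

    // Load unstructured/log text from HDFS into an RDD.
    val raw = spark.sparkContext.textFile("hdfs:///data/raw/events/*.log")

    // Parse each line with a regular expression; flatMap drops lines that do not match.
    val pattern = """(\S+)\s+(\S+)\s+(\S+)""".r
    val events = raw.flatMap {
      case pattern(user, action, ts) => Some(Event(user, action, ts))
      case _                         => None
    }

    // Convert the RDD to a DataFrame to get a tabular view of the parsed records.
    val df = events.toDF()

    // Actions: inspect a few rows and count the parsed records.
    df.show(20)
    println(s"parsed records: ${df.count()}")

    spark.stop()
  }
}
```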

Environment: Cloudera Hadoop (CDH), Hive, Sqoop, Oozie, HDFS, Spark.

Hadoop Developer

Confidential - Chicago, IL

Responsibilities:

  • Responsible for installation and configuration of Hadoop ecosystem components using the CDH 5.2 distribution.
  • Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
  • Processed input from multiple data sources in the same reducer using GenericWritable and multiple input formats.
  • Worked on big data processing of clinical and non-clinical data using MapReduce.
  • Visualized HDFS data for customers in a BI tool using the Hive ODBC driver.
  • Customized the BI tool for the manager team to perform query analytics using HiveQL.
  • Used Sqoop to load data from MySQL into HDFS on a regular basis.
  • Created partitions and buckets based on state for further processing with bucket-based Hive joins.
  • Created Hive generic UDFs to process business logic that varies by policy.
  • Moved relational database data into Hive dynamic-partition tables via staging tables using Sqoop (see the sketch after this list).
  • Experienced in monitoring the cluster using Cloudera Manager.
  • Involved in discussions with business users to gather requirements.
  • Capable of creating real-time data streaming solutions and batch-style large-scale distributed computing applications using Apache Spark, Spark Streaming, Kafka, and Flume.
  • Analyzed the requirements to develop the framework.
  • Designed and developed architecture for data services ecosystem spanning Relational, NoSQL and Big Data technologies.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data concepts.
  • Developed Java Spark Streaming scripts to load raw files and the corresponding processed metadata files into AWS S3 and an Elasticsearch cluster.
  • Developed Python scripts to get the recent S3 keys from Elasticsearch.
  • Wrote Python scripts to fetch S3 files using the Boto3 module.
  • Implemented PySpark logic to transform and process data in various formats such as XLSX, XLS, JSON, and TXT.
  • Built scripts to load PySpark-processed files into a Redshift database, using a variety of PySpark logic.
  • Developed scripts to monitor and capture the state of each file as it moves through the pipeline.
  • Developed MapReduce programs to cleanse data in HDFS obtained from heterogeneous data sources.
  • Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs, and used Oozie operational services for batch processing and scheduling workflows dynamically.
  • Included migration of existing applications and development of new applications using AWS cloud services.
  • Worked with data investigation, discovery, and mapping tools to scan every single data record from many sources.
  • Implemented shell scripts to automate the whole process.
  • Integrated Apache Storm with Kafka to perform web analytics.
  • Uploaded clickstream data from Kafka to HDFS, HBase, and Hive by integrating with Storm.
  • Extracted data from SQL Server to create automated visualization reports and dashboards on Tableau.
  • Responsible for Cluster maintenance, adding and removing cluster nodes, Cluster Monitoring and Troubleshooting, Managing and reviewing data backups & log files.
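
A hedged sketch of the staging-to-dynamic-partition pattern referenced above: data Sqooped into a flat staging table is inserted into a Hive table partitioned by state. Table and column names are hypothetical, and the same HiveQL could equally be run from the Hive shell rather than through Spark; the bucketing mentioned in the bullets would be defined separately in the Hive DDL.

```scala
import org.apache.spark.sql.SparkSession

object HiveDynamicPartitionLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-dynamic-partition-load")
      .enableHiveSupport()
      .getOrCreate()

    // Allow fully dynamic partition inserts.
    spark.sql("SET hive.exec.dynamic.partition = true")
    spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    // Hypothetical target table, partitioned by state.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS policies (
        |  policy_id BIGINT,
        |  holder    STRING,
        |  premium   DOUBLE)
        |PARTITIONED BY (state STRING)
        |STORED AS ORC""".stripMargin)

    // The staging table is assumed to have been loaded by a Sqoop import
    // from the relational source; the partition column goes last in the SELECT.
    spark.sql(
      """INSERT OVERWRITE TABLE policies PARTITION (state)
        |SELECT policy_id, holder, premium, state
        |FROM policies_staging""".stripMargin)

    spark.stop()
  }
}
```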

Environment: AWS S3, Java, Maven, Python, Spark, Kafka, Elasticsearch, MapR cluster, Amazon Redshift, shell scripts, Boto3, pandas, certifi, PySpark, Pig, Hive, Oozie, JSON.

Hadoop Developer

Confidential - Chicago, IL

Responsibilities:

  • Experience with Hadoop ecosystem components such as HBase, Sqoop, ZooKeeper, Oozie, Hive, and Pig on the Cloudera Hadoop distribution.
  • Conceived and designed custom POCs using Kafka 0.10 and the Twitter stream in standalone mode; architected the front-end near-real-time pub/sub non-blocking messaging system using Kafka/Confluent.io Enterprise.
  • Developed Pig and Hive UDFs in Java for extended use of Pig and Hive, and wrote Pig scripts for sorting, joining, filtering, and grouping data.
  • Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
  • Developed programs in Spark, based on the application, for faster data processing than standard MapReduce programs.
  • Designed a custom Kafka broker configuration to reduce message retention from the default 7 days to 30 minutes; architected a lightweight Kafka broker.
  • Wrote Spark programs using Scala, created Spark SQL queries, and developed Oozie workflows for Spark jobs.
  • Prepared Oozie workflows with Sqoop actions to migrate data from relational databases such as Oracle and Teradata to HDFS.
  • Designed and developed a real-time stream processing application using Spark, Kafka, Scala, and Hive to perform streaming ETL and apply machine learning (see the sketch after this list).
  • Developed Kafka producers and consumers in Java, integrated with Apache Storm, and ingested data into HDFS and HBase by implementing rules in Storm.
  • Built a prototype for real time analysis using Spark streaming and Kafka.
  • Worked with Kafka on a proof of concept for carrying out log processing on a distributed system. Worked with the NoSQL database HBase to create tables and store data.
  • Used the Spark-Cassandra Connector to load data to and from Cassandra. Streamed data in real time using Spark with Kafka.
  • Created Hive tables, dynamic partitions, and buckets for sampling, and worked on them using HiveQL.
  • Used Sqoop to store the data into HBase and Hive.
  • Wrote Hive queries to analyze the data and generate end reports for business users.
  • Worked on scalable distributed computing systems, software architecture, data structures, and algorithms using Hadoop, Apache Spark, and Apache Storm, and ingested streaming data into Hadoop using Spark, Storm, and Scala.
  • Responsible for developing data pipeline by implementing Kafka producers and consumers and configuring brokers.
  • Imported required tables from RDBMS to HDFS using Sqoop and used Spark and Kafka to get real time streaming of data into HBase.
  • Good experience with NOSQL databases like MongoDB.
  • Managed a mission-critical Hadoop cluster and Kafka at production scale, primarily on the Cloudera distribution.
  • Created event-processing data pipelines and handled messaging services using Apache Kafka.
  • Wrote Spark Python code for the model integration layer.
  • Experienced in handling large datasets using Spark's in-memory capabilities, broadcast variables, effective and efficient joins, transformations, and other features.
  • Wrote Spark code and Spark SQL/Streaming jobs for faster testing and processing of data.
  • Used Spark for interactive queries, processing of streaming data, and integration with popular NoSQL databases for huge volumes of data.
  • Developed a data pipeline using Kafka, HBase, Spark on Mesos, and Hive to ingest, transform, and analyze customer behavioral data.
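
A minimal Scala sketch of the streaming ETL path described above: consuming a Kafka topic with Spark Streaming's direct stream and landing the records on HDFS where a Hive external table can pick them up. The topic name, broker addresses, group id, and output path are hypothetical; the HBase and Storm legs of the pipeline are omitted to keep the example short.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaStreamingEtlSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-streaming-etl-sketch")
    val ssc  = new StreamingContext(conf, Seconds(30))

    // Hypothetical Kafka connection details.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092,broker2:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "clickstream-etl",
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("clickstream"), kafkaParams))

    // Basic streaming ETL: keep non-empty values and land each micro-batch on HDFS,
    // where an external Hive table (with partitions added separately) can read it.
    stream.map(_.value)
      .filter(_.nonEmpty)
      .foreachRDD { rdd =>
        if (!rdd.isEmpty()) {
          rdd.saveAsTextFile(s"hdfs:///data/clickstream/batch_${System.currentTimeMillis()}")
        }
      }

    ssc.start()
    ssc.awaitTermination()
  }
}
```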

Environment: Hadoop, HDFS, CDH, Pig, Hive, Oozie, ZooKeeper, HBase, Spark, Storm, Spark SQL, NoSQL, Scala, Kafka, Mesos, MongoDB.

Java Developer

Confidential

Responsibilities:

  • Identified System Requirements and Developed System Specifications, responsible for high-level design and development of use cases.
  • Involved in designing Database Connections using JDBC.
  • Organized and participated in meetings with clients and team members.
  • Developed the web-based Bristow application using J2EE (Spring MVC framework), POJOs, JSP, JavaScript, HTML, jQuery, business classes, and queries to retrieve data from the backend.
  • Developed client-side validation techniques using jQuery.
  • Worked with Bootstrap to develop responsive web pages.
  • Implemented client-side and server-side data validations using JavaScript.
  • Responsible for customizing the data model for new applications using Hibernate ORM. Involved in implementing DAOs and DTOs using Spring with Hibernate ORM.
  • Implemented Hibernate for the ORM layer in transacting with MySQL database.
  • Developed authentication and access control services for the application using Spring LDAP.
  • Experience in event-driven applications using AJAX, object-oriented JavaScript, JSON, and XML. Good knowledge of developing asynchronous applications using jQuery. Valuable experience with form validation using regular expressions and jQuery Lightbox.
  • Used MySQL for the EIS layer.
  • Involved in design and development of the UI using HTML, JavaScript, and CSS.
  • Designed and developed various data gathering forms using HTML, CSS, JavaScript, JSP and Servlets.
  • Developed user interface modules using JSP, Servlets and MVC framework.
  • Experience implementing J2EE standards and MVC2 architecture using the Struts framework.
  • Developed J2EE components on Eclipse IDE.
  • Used JDBC to invoke stored procedures and for connectivity to SQL databases.
  • Deployed the applications on Tomcat Application Server.
  • Developed web services using REST and JSON.
  • Created Java Beans accessed from JSPs to transfer data across tiers.
  • Database modification using SQL, PL/SQL, stored procedures, triggers, and views in Oracle 9i.

Environment: Java, JSP, Servlets, JDBC, Eclipse, Web services, Spring 3.0, Hibernate 3.0, MySQL, JSON, Struts, HTML, JavaScript, CSS

Build and Release Engineer

Confidential

Responsibilities:

  • As a DevOps Engineer, responsible for day-to-day builds and deployments into QA and pre-prod environments.
  • Automated the build and deployment of all internal environments using various continuous integration tools and scripting languages.
  • Integrated Subversion into uDeploy to automate the code check-out process.
  • Maintained and administered GIT source code tool.
  • Developed processes, tools, and automation for TFS (Team Foundation Server)-based software builds and for delivering software builds.
  • Managed build results in uDeploy and deployed using workflows in uDeploy.
  • Delivered specific versions of various components of an application into target environments using uDeploy.
  • Maintained and tracked inventory using uDeploy and set alerts when servers were full and needed attention.
  • Modeled the structure for multi-tiered applications and orchestrated the processes to deploy each tier.
  • Experience in JIRA to capture, organize and prioritize issues. Experience in partially administering JIRA for issue management.
  • Network administration and Network monitoring.
  • Developed build and deployment scripts using ANT and Maven as build tools in Jenkins to move artifacts from one environment to another.
  • Developed information security policies and coordinated the activities required to implement them. Created a compliance review plan and conducted periodic reviews to evaluate the compliance level.
  • Implemented TCP/IP and related services: DHCP/DNS/WINS.
  • Used Hudson/Jenkins for automating Builds and Automating Deployments.
  • Troubleshot TCP/IP and layer 1/2 problems and connectivity issues in a multi-protocol Ethernet environment.
  • Proficient in using iRules for redirecting HTTP traffic to HTTPS, HTTP acceleration, and HTTP header insertion and modification.
  • Used various plug-ins to extend the base functionality of Hudson/Jenkins to deploy, integrate tests and display reports.
  • Configuring network services such as DNS/NFS/NIS/NTP for UNIX/Linux Servers.
  • Owned the build farm and produced effective multi-branch builds to support parallel development.
  • Owned the release-to-production process and gathered approvals and sign-offs from stakeholders and QA before going to PROD.
  • Maintaining the VMware ESXi Servers through VMware Infrastructure Client (vSphere client).
  • Managed the Release Communication and Co-ordination Process.
  • Developed build scripts using ANT and Maven to create build artifacts such as WAR or EAR files.
  • Maintained Shell and Perl scripts for automation purposes.
  • Involved in editing existing ANT/Maven files in case of errors or changes in project requirements.

Environment: Windows, Solaris, UNIX, C++, Java, Eclipse 3.2, Ant, Jenkins, JBoss Application Server, CVS, Subversion, TFS, Jira, Cygwin, IBM ClearCase 7.0.

Environment: Java, JSP, J2EE, Servlets, Hibernate, JavaBeans, HTML, React.js, JavaScript, Groovy, JDeveloper, Apache Tomcat, web server, Oracle, JDBC, XML.

TECHNICAL SKILLS:

Languages: Java, C, SQL, PL/SQL, Python and Scala

Big Data: Hadoop, HDFS, Hive, Pig, Sqoop, HBase, Oozie, Zookeeper, Flume, Spark, Impala, Kafka.

Technologies: JSP, JavaBeans, JDBC

Web Services: SOAP, WSDL, REST API, Microservices.

Web Technologies: XML, XSL, XSLT, HTML5, JavaScript.

Web/Application Servers: Apache Tomcat, WebLogic, IBM WebSphere, JBoss.

Design Patterns: MVC, Front Controller, Singleton, DAO patterns.

Database: MySQL, Oracle, MS Access, MS SQL Server, NoSQL (MongoDB, HBase, Cassandra)

Build Tools: Maven, GIT, SVN, SOAP UI

ETL Tools: SSRS, SSIS

Operating Systems: Windows XP/2000/98, UNIX, Linux, DOS.
