Hadoop Developer Resume

Tampa, Florida

PROFESSIONAL SUMMARY:

  • Around 7 years of IT experience in software development and support, including developing strategies for deploying Big Data technologies to efficiently meet Big Data processing requirements.
  • Expertise in Hadoop ecosystem components (HDFS, MapReduce, YARN, HBase, Pig, Sqoop, and Hive) for scalable, distributed, and high-performance computing.
  • Experience in Apache Spark, Spark Streaming, Spark SQL, and NoSQL databases like Cassandra and HBase.
  • Experience in using Hive Query Language for data analytics.
  • Experienced in installing, maintaining, and configuring Hadoop clusters.
  • Hands-on experience with the Talend Open Studio ETL tool.
  • Strong knowledge of creating and monitoring Hadoop clusters on Amazon EC2, VMs, Hortonworks Data Platform 2.1 and 2.2, CDH3, CDH4, and Cloudera Manager on Linux and Ubuntu.
  • Capable of processing large sets of structured, semi-structured, and unstructured data and supporting systems application architecture.
  • Good knowledge of single-node and multi-node cluster configurations.
  • Strong knowledge of NoSQL databases like HBase, Cassandra, and MongoDB, and their integration with Hadoop clusters.
  • Executed batch jobs on data streams using Spark Streaming.
  • Experience with Hadoop distributions from Amazon, Cloudera, and Hortonworks.
  • Thorough knowledge of Spark architecture and how RDDs work internally, with exposure to Spark Streaming and Spark SQL.
  • Used Talend Open Studio to load files into Hive tables and performed ETL aggregations in Hive.
  • Expertise in the Scala programming language and Spark Core.
  • Designed and created ETL jobs in Talend to load huge volumes of data into Cassandra, the Hadoop ecosystem, and relational databases.
  • Experienced with job workflow scheduling and monitoring tools like Oozie and ZooKeeper.
  • Good knowledge of Amazon EMR, S3 buckets, DynamoDB, and Redshift.
  • Analyze data, interpret results, and convey findings in a concise and professional manner.
  • Partner with the Data Infrastructure team and business owners to implement new data sources and ensure consistent definitions are used in reporting and analytics.
  • Promote a full-cycle approach including request analysis, creating/pulling datasets, report creation and implementation, and providing final analysis to the requestor.
  • Very good understanding of SQL, ETL, and data warehousing technologies.
  • Knowledge of MS SQL Server 2012/2008/2005 and Oracle 11g/10g/9i and E-Business Suite.
  • Expert in T-SQL, including creating and using stored procedures, views, and user-defined functions, and implementing Business Intelligence solutions using SQL Server 2000/2005/2008.
  • Developed Web-Services module for integration using SOAP and REST.
  • NoSQL database experience with HBase and Cassandra.
  • Flexible with Unix/Linux and Windows environments, working with operating systems like CentOS 5/6, Ubuntu 13/14, and Cosmos.
  • Good experience with Kafka and Storm.
  • Knowledge of the Java Virtual Machine (JVM) and multithreaded processing.
  • Strong programming skills in designing and implementation of applications using Core Java, J2EE, JDBC, JSP, HTML, Spring Framework, Spring batch framework, Spring AOP, Struts, JavaScript, Servlets.
  • Experience writing build scripts with Maven and working with continuous integration systems like Jenkins.
  • Java Developer with extensive experience with various Java libraries, APIs, and frameworks.
  • Hands-on development experience with RDBMS, including writing complex SQL queries, stored procedures, and triggers.
  • Sound knowledge of designing data warehousing applications using tools like Teradata, Oracle, and SQL Server.
  • Experience using the Talend ETL tool.
  • Experience working with job schedulers like Autosys and Maestro.
  • Strong in databases like Sybase, DB2, Oracle, MS SQL.
  • Strong understanding of Agile Scrum and Waterfall SDLC methodologies.
  • Strong communication, collaboration & team building skills with proficiency at grasping new Technical concepts quickly and utilizing them in a productive manner.
  • Adept in analyzing information system needs, evaluating end-user requirements, custom designing solutions and troubleshooting information systems.
  • Strong analytical and problem-solving skills.

TECHNICAL SKILLS:

Big Data Ecosystem: HDFS, HBase, Hadoop Map Reduce, Hive, Pig, Sqoop, Spark, Flume, Oozie, Cassandra, Storm, and Impala.

Distributions: Apache Hadoop 1.0.4, Cloudera CDH3, CDH4.

Languages: C, C++, Java, SQL/PLSQL, Python.

Methodologies: Agile, Waterfall.

Database: Oracle 10g, DB2, MySQL, MS SQL Server.

Web Tools: HTML, JavaScript, XML, ODBC, JDBC, Hibernate, JSP, Servlets, Java, Struts, Spring, JUnit, JSON, and Avro.

IDE / Testing Tools: Eclipse, Visual Studio, NetBeans, PuTTY.

Operating System: Windows, UNIX, Linux.

Scripts: JavaScript, Shell Scripting.

Version Control: SVN, CVS, TFS.

PROFESSIONAL EXPERIENCE:

Confidential, TAMPA, FLORIDA

HADOOP DEVELOPER

RESPONSIBILITIES:

  • Imported and exported data to and from HDFS and Hive using Sqoop.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig, and Sqoop.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Extensively used Talend ETL tool.
  • Migrated MapReduce jobs to Spark jobs to achieve better performance.
  • Migrated the code base from the Cloudera platform to Amazon EMR and evaluated Amazon ecosystem components like Redshift and DynamoDB.
  • Designed and implemented real-time applications using Apache Storm, Storm Trident, Kafka, the Apache Ignite in-memory data grid, and Accumulo.
  • Developed MapReduce/EMR jobs to analyze the data and provide heuristics and reports. The heuristics were used to improve campaign targeting and efficiency.
  • Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
  • Streamed data into Apache Ignite by setting up caches for efficient data analysis.
  • Created Hive external tables, loaded data into them, and queried the data using HiveQL.
  • Managed data coming from different sources.
  • Good knowledge of cloud integration with Amazon Elastic MapReduce (EMR).
  • Installed and maintained a Hadoop-Spark cluster from scratch in a plain Linux environment and defined the code outputs as PMML.
  • Implemented data loading using Spark, Storm, Kafka, and Elasticsearch.
  • Experience in integrating Cassandra with Elasticsearch and Hadoop.
  • Stored data in AWS S3, used similarly to HDFS, and ran EMR jobs on data stored in S3.
  • Extensive experience in Spark Streaming (version 1.5.2) through the core Spark API, running Scala, Java, and Python scripts to transform raw data from several data sources into baseline data.
  • Hands-on expertise in running Spark and Spark SQL on Amazon Elastic MapReduce (EMR).
  • Implemented Spark batch jobs on AWS instances through Amazon Simple Storage Service (Amazon S3).
  • Tuned Spark Streaming performance, e.g. setting the right batch interval, the correct level of parallelism, the appropriate serialization, and memory settings.
  • Replaced MapReduce with GridGain.
  • Created HBase tables to store variable data formats of input data coming from different portfolios. Involved in loading huge volumes of data into HBase rows and columns.
  • Used Spark API over Hadoop YARN to perform analytics on data in Hive.
  • Developed Spark code using Scala and Spark SQL for batch processing of data.
  • Integrated Cassandra with Talend and automated jobs.
  • Involved in the requirement and design phases of a streaming Lambda Architecture for real-time streaming using Spark and Kafka.
  • Designed and developed database operations in PostgreSQL.
  • Explored Spark to improve the performance of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Hands-on experience with HBase (NoSQL) and PostgreSQL (relational) databases.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Used the Spark DataFrame API to process structured and semi-structured files and load them back into an S3 bucket.
  • Designed an application that receives data from several source systems and ingests it into a PostgreSQL database.

Environment: Hadoop, Map Reduce, Spark, Spark SQL, Kafka, Storm, HDFS, Hive, Sqoop, Oozie, Java, SQL, Shell script, Talend

Confidential, BLOOMINGTON, IL

Hadoop Developer

RESPONSIBILITIES:

  • Installed and configured Hadoop Map Reduce, HDFS and developed multiple Map Reduce jobs in Java for data cleansing and preprocessing.
  • Imported and exported data to and from HDFS and Hive using Sqoop.
  • Developed data pipeline programs with Spark Scala APIs, aggregated data with Hive, formatted data as JSON for visualization, and generated Highcharts charts (outlier, data distribution, correlation/comparison, and two-dimensional charts) using JavaScript.
  • Developed Scala and Python scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark.
  • Extracted files from CouchDB and MongoDB through Sqoop and placed them in HDFS for processing.
  • Used Flume to collect, aggregate, and store web log data from different sources like web servers, mobile devices, and network devices, and pushed it to HDFS.
  • Developed Puppet scripts to install Hive, Sqoop, etc. on the nodes.
  • Performed data backup and synchronization using Amazon Web Services.
  • Built and maintained scalable data pipelines using the Hadoop ecosystem and other open source components like Hive and HBase.
  • Worked on loading data from remote Postgres to HBase tables for fast transactional lookups.
  • Developed a Spark Streaming job to consume data from HDFS and perform lookups in HBase.
  • Created HBase tables to load large sets of structured data.
  • Executed queries in HBase to gather information about the data in real time.
  • Installed Hadoop, MapReduce, HDFS, and AWS and developed multiple MapReduce jobs in Pig and Hive for cleaning and pre-processing.
  • Connected Tableau from the client end to AWS IP addresses to view the end results.
  • Worked on Amazon Web Services as the primary cloud platform.
  • Migrated legacy and monolithic systems to Amazon Web Services using Packer, Terraform, and Ansible.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Supported MapReduce programs running on the cluster.
  • Loaded log data into HDFS using Flume and Kafka and performed ETL integrations.
  • Designed and implemented DR and OR procedures.
  • Wrote shell scripts to monitor the health of Hadoop daemon services and respond accordingly to any warning or failure conditions.
  • Involved in loading data from the UNIX file system to HDFS, configuring Hive, and writing Hive UDFs.
  • Used Java and MySQL day to day to debug and fix issues with client processes.
  • Used JAVA, J2EE application development skills with Object Oriented Analysis and extensively involved throughout Software Development Life Cycle (SDLC)
  • Hands-on experience with Sun ONE Application Server, WebLogic Application Server, WebSphere Application Server, WebSphere Portal Server, and J2EE application deployment technology.
  • Gained very good business knowledge on health insurance, claim processing, fraud suspect identification, appeals process, etc.
  • Worked on loading data from several flat file sources to staging using Teradata MultiLoad and FastLoad.
  • Monitored the Hadoop cluster using tools like Nagios, Ganglia, Ambari, and Cloudera Manager.
  • Wrote automation scripts to monitor HDFS and HBase through cron jobs.
  • Used MRUnit to debug MapReduce jobs that use sequence files containing key-value pairs.
  • Developed a high-performance cache, making the site more stable and improving its performance.
  • Created a complete processing engine based on Cloudera's distribution.
  • Proficient with SQL, with a good understanding of Informatica and Talend.
  • Provided administrative support for parallel computation research on a 24-node Fedora Linux cluster.

Environment: Hadoop, MapReduce, HDFS, Hive, Apache Spark, Kafka, CouchDB, Flume, AWS, Cassandra, Java, Struts, Servlets, HTML, XML, SQL, J2EE, MRUnit, JUnit, JDBC, Eclipse.

Confidential, BENTONVILLE, AR

Hadoop Developer

RESPONSIBILITIES:

  • Involved in Requirement gathering, Business Analysis and translated business requirements into Technical design in Hadoop and Big Data
  • Imported data from databases into HDFS and exported it back using Sqoop.
  • Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
  • Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using HBase-Hive integration.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Used the Spark SQL Scala interface, which automatically converts RDDs of case classes to SchemaRDDs.
  • Used Spark SQL to read and write tables stored in Hive.
  • Involved in creating workflows to run multiple Hive and Pig jobs that run independently based on time and data availability.
  • Developed shell scripts and automated data management as part of end-to-end integration work.
  • Used Pig as an ETL tool for transformations, event joins, and some pre-aggregations before storing data in HDFS.
  • Developed a MapReduce program for parsing information and loading it into HDFS.
  • Built reusable Hive UDF libraries for business requirements, which enabled users to use these UDFs in Hive queries.
  • Automated and scheduled the Sqoop jobs using Unix shell scripts.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig, and Sqoop.
  • Experienced in using ZooKeeper and Oozie operational services for coordinating the cluster and scheduling workflows.
  • Used HBase to store the majority of the data, which needed to be divided by region.
  • Developed MapReduce programs for data analysis and data cleaning.

Environment: HiveQL, MySQL, HDFS, Hive, HBase, Java, Eclipse, MS SQL, Spark, Azure, Pig, Sqoop, UNIX.

JAVA DEVELOPER

Confidential, Dallas, TX

RESPONSIBILITIES:

  • Involved in designing of shares and cash modules using UML.
  • Effectively used the iterative waterfall software development methodology during this time-constrained project.
  • Used HTML and JSP for the web pages and used JavaScript for Client side validation.
  • Created XML pages with DTDs for front-end functionality and information exchange.
  • Responsible for writing Java SAX parser programs.
  • Familiar with the state-of-the-art standards and design processes used in creating and designing optimal UIs using Web 2.0 technologies like Ajax, JavaScript, CSS, and XSLT.
  • Developed ANT build scripts to build and deploy the application in enterprise archive format (.ear).
  • Performed Unit testing using JUnit and Functional Testing.
  • Used Hibernate framework and Spring JDBC framework modules for backend communication in the extended application
  • Used the JSON response format to retrieve data from web servers.
  • Wrote ActionForm and Action classes, used the Struts HTML, Bean, and Logic tag libraries, and configured struts-config.xml for global forwards, error forwards, and action forwards.
  • Developed UI using JSP and Servlet and server-side code with Java.
  • Used JDBC 2.0 extensively and was involved in writing several SQL queries for the data retrieval.
  • Prepared program specifications for the loans module and was involved in database design.
  • Provided security for REST using SSL and for SOAP using encryption with X.509 digital signatures.
  • Involved in creating JUnit test cases and ran the test suite using the EMMA tool.
  • Developed Schema/Component Template/Template Building Block components in SDL Tridion.
  • Developed code using SQL and PL/SQL, including queries, joins, views, procedures/functions, triggers, and packages.
  • Implemented Hibernate to persist data into the database and wrote HQL-based queries to implement CRUD operations on the data.
  • Programmed in Java using Swing to complete the functionality of the cash lockers and security modules.
  • Used application servers like WebLogic, WebSphere, Apache Tomcat, GlassFish, and JBoss based on the client requirements and project specifications.
  • Wrote servlets to connect to the database server and retrieve the serialized data.
  • Programmed stored procedures using SQL and PL/SQL for the bulk calculations of general ledger.

Environment: Java, J2EE, EJB 2.0, Servlets, JavaScript, OO, JSP, JNDI, JavaBeans, WebLogic, XML, XSL, Eclipse, PL/SQL, Oracle 8i, HTML, DHTML, UML.
