Hadoop Developer Resume

Tampa, FL

SUMMARY

  • Over 10 years of professional IT experience, including work across the big data ecosystem and Java/J2EE technologies.
  • Excellent experience with Hadoop architecture and its components: HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
  • Hands-on experience installing, configuring, and using Hadoop ecosystem components such as MapReduce, HDFS, HBase, Hive, Sqoop, Pig, ZooKeeper, and Flume.
  • Experience managing and reviewing Hadoop log files.
  • Strong backend experience with Python, Scala, HiveQL, and Spark SQL.
  • Excellent understanding of NoSQL databases such as MongoDB, HBase, and Cassandra; helped establish standards and processes for Hadoop-based application design and implementation.
  • Developed simple to complex MapReduce streaming jobs in Python, integrated with Hive and Pig (a minimal streaming sketch follows this list).
  • Experience importing and exporting data between HDFS and relational database systems using Sqoop.
  • Experience in Object-Oriented Analysis and Design (OOAD) and UML-based software development, with good knowledge of J2EE and core Java design patterns.
  • Experience managing Hadoop clusters with Cloudera Manager.
  • Very good experience across the complete project life cycle: design, development, testing, and implementation of client-server and web applications.
  • Experience administering Red Hat Linux: installation, configuration, troubleshooting, security, backup, performance monitoring, and fine-tuning.
  • Extensive experience with Oracle, DB2, SQL Server, and MySQL databases; good hold on Shell, Perl, and Python scripting.
  • Wrote scripts to deploy monitors and checks and to automate critical system administration functions.
  • Hands-on application development experience with Java, RDBMSs, and Linux shell scripting.
  • Experience with Java, JSP, Servlets, EJB, WebLogic, WebSphere, Hibernate, Spring, JBoss, JDBC, RMI, JavaScript, Ajax, jQuery, XML, and HTML.
  • Ability to adapt to evolving technology, with a strong sense of responsibility and accomplishment.
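
As a hedged illustration of the Python streaming pattern mentioned above, here is a minimal word-count-style mapper/reducer pair; the real jobs carried project-specific parsing and validation logic, so every name below is a stand-in. Scripts like these are typically submitted through the hadoop-streaming JAR with -mapper and -reducer options.

    #!/usr/bin/env python
    # mapper.py - Hadoop Streaming mapper: emit (word, 1) for every token.
    import sys

    for line in sys.stdin:
        for word in line.strip().split():
            print("%s\t%d" % (word, 1))

    #!/usr/bin/env python
    # reducer.py - Hadoop Streaming reducer: input arrives sorted by key,
    # so counts for one word can be summed in a single pass.
    import sys

    current_word, total = None, 0
    for line in sys.stdin:
        word, count = line.rstrip("\n").split("\t", 1)
        if word != current_word and current_word is not None:
            print("%s\t%d" % (current_word, total))
            total = 0
        current_word = word
        total += int(count)
    if current_word is not None:
        print("%s\t%d" % (current_word, total))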

TECHNICAL SKILLS

Programming Languages: Scala, Python, Java

Hadoop/Big Data: HDFS, MapReduce, Spark 2.2.0, YARN, Kafka, Pig, Hive, Sqoop, Flume, Oozie, Impala, HBase, Hue, ZooKeeper

NoSQL Technologies: Cassandra, MongoDB, HBase

Big Data Distributions: Hortonworks, Cloudera, MapR

JAVA/J2EE Technologies: Servlets, JSP, JDBC, EJB, JAXB, JMS, JAX-RPC, JAX-WS, JAX-RS, Apache CXF.

Frameworks: Struts, Spring, Hibernate.

Web Technologies: HTML, CSS, JavaScript, jQuery, Ajax, Backbone.js, Node.js, Ext JS.

Development Tools: Eclipse, NetBeans, IntelliJ.

Databases: MySQL, MS SQL Server, IBM DB2, Oracle.

Operating Systems: Windows XP/Vista/7/8/10, UNIX, Linux, Mac OS.

Build Tools: Ant, sbt, Maven.

Web/Application Servers: WebSphere, Apache Tomcat, WebLogic, JBoss.

PROFESSIONAL EXPERIENCE

Hadoop Developer

Confidential - Tampa, FL

Responsibilities:

  • Hands-on experience with HDFS, Hive, Pig, the Hadoop MapReduce framework, Sqoop, and Spark.
  • Worked extensively with Hive DDL and the Hive Query Language (HiveQL).
  • Developed UDF, UDAF, and UDTF functions and used them in Hive queries.
  • Hands-on experience with Spark (fundamentals, RDD, and DataFrame APIs); a minimal sketch follows this list.
  • Designed complex ETL jobs for big data applications using Talend Studio.
  • Developed a data pipeline between Cassandra and the Hadoop data lake using Spark.
  • Developed Pig Latin scripts to handle business transformations.
  • Implemented Sqoop for large dataset transfers between Hadoop and RDBMSs.
  • Worked with join patterns and implemented map-side and reduce-side joins using MapReduce.
  • Built ETL reports and statistics dashboards for analytics using Tableau.
  • Adequate knowledge of and working experience with Agile (Scrum/Kanban) methodology.
  • Hands-on experience with Sequence files, RC files, combiners, counters, dynamic partitions, and bucketing for best practices and performance improvement.
  • Interacted directly with the Hortonworks team to resolve cluster-related issues.
  • Experience setting up Hadoop in a pseudo-distributed environment.
  • Experience setting up Hive, Pig, and Sqoop on the Ubuntu operating system.
  • Proficient with data visualization tools: Tableau Desktop, Plotly, RAW, Palladio, and MS Excel.
  • Familiarity with common computing environments (e.g., Linux, shell scripting).
  • Good team player with the ability to solve problems and to organize and prioritize multiple tasks.
  • Excellent communication and interpersonal skills.
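
A minimal sketch of the Spark usage described in this role, showing the same aggregation three ways: through the DataFrame API, as HiveQL via Spark SQL, and at the RDD level. The database, table, and column names are illustrative placeholders, not from an actual project.

    from pyspark.sql import SparkSession

    # Table and column names below are placeholders.
    spark = (SparkSession.builder
             .appName("HiveQuerySketch")
             .enableHiveSupport()
             .getOrCreate())

    # DataFrame API: aggregate a Hive table
    orders = spark.table("sales.orders")
    by_region = orders.groupBy("region").sum("amount")

    # Same aggregation expressed as HiveQL through Spark SQL
    by_region_sql = spark.sql(
        "SELECT region, SUM(amount) AS total FROM sales.orders GROUP BY region")

    # Same work at the RDD level
    totals = (orders.rdd
              .map(lambda row: (row["region"], row["amount"]))
              .reduceByKey(lambda a, b: a + b))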

Hadoop/Spark Developer

Confidential - San Jose, CA

Responsibilities:

  • Developed simple to complex MapReduce streaming jobs in Java for processing and validating data
  • Developed a data pipeline using MapReduce, Flume, and Sqoop to ingest customer behavioral data into HDFS for analysis
  • Migrated MapReduce jobs to Spark jobs to discover trends in data usage by users
  • Implemented Spark jobs using Scala and Spark SQL for faster data processing
  • Implemented algorithms for real-time analysis in Spark
  • Imported data from AWS S3 into Spark DataFrames and performed transformations and actions on them
  • Used Spark for interactive queries, processing of streaming data, and integration with popular NoSQL databases for high data volumes
  • Streamed data in real time using Kafka with Spark (a minimal sketch follows this list)
  • Used the Spark-Cassandra Connector to load data to and from Cassandra (see the second sketch after this list)
  • Handled importing data from different data sources into HDFS using Sqoop, performing transformations using Hive and MapReduce before loading the data into HDFS
  • Exported analyzed data to relational databases using Sqoop so the BI team could visualize it and generate reports
  • Collected and aggregated large amounts of log data using Flume, staged it in HDFS, and analyzed it with Hive queries (HiveQL)
  • Used Hive to analyze partitioned and bucketed data, computed various metrics for reporting, and contributed to Hive performance tuning
  • Developed HiveQL scripts to de-normalize and aggregate data; created HBase tables and column families to store user event data
  • Wrote automated HBase test cases for data quality checks using HBase command-line tools
  • Scheduled and executed workflows in Oozie to run Hive and Pig jobs
  • Used the Tez framework to build high-performance jobs in Pig and Hive
  • Configured Kafka to read and write messages from external programs
  • Configured Kafka to handle real-time data
  • Developed end-to-end data processing pipelines, from receiving data via the distributed messaging system Kafka through to persisting it in HBase
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager
  • Developed interactive shell scripts to schedule various data cleansing and data loading processes
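
A minimal sketch of the Kafka-with-Spark streaming pattern from this role, using the Spark 2.x DStream API; the topic name and broker address are placeholders, and the job assumes the spark-streaming-kafka integration package is on the classpath.

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="KafkaIngestSketch")
    ssc = StreamingContext(sc, 10)  # 10-second micro-batches

    # Topic and broker address are placeholders for illustration.
    stream = KafkaUtils.createDirectStream(
        ssc, ["events"], {"metadata.broker.list": "broker1:9092"})

    # Records arrive as (key, value) pairs; count events per batch.
    stream.map(lambda kv: kv[1]).count().pprint()

    ssc.start()
    ssc.awaitTermination()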
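
And a sketch of the Spark-Cassandra Connector round trip, assuming the DataStax connector package is supplied at submit time; the keyspace, table, and column names are invented for illustration.

    from pyspark.sql import SparkSession

    # Assumes e.g. --packages datastax:spark-cassandra-connector:2.0.x-s_2.11
    spark = (SparkSession.builder
             .appName("CassandraRoundTrip")
             .config("spark.cassandra.connection.host", "cassandra-host")
             .getOrCreate())

    # Read a Cassandra table into a DataFrame (placeholder names).
    events = (spark.read.format("org.apache.spark.sql.cassandra")
              .options(keyspace="analytics", table="user_events")
              .load())

    daily = events.groupBy("event_date").count()

    # Write the aggregate back to another Cassandra table.
    (daily.write.format("org.apache.spark.sql.cassandra")
     .options(keyspace="analytics", table="daily_counts")
     .mode("append")
     .save())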

Environment: Hadoop, Spark, MapReduce, Pig, Hive, Sqoop, Oozie, HBase, ZooKeeper, Kafka, Flume, Cloudera Manager, AWS S3, MySQL, Cassandra, multi-node Linux (Ubuntu) cluster, Windows, UNIX.

Hadoop and Spark Developer

Confidential - St. Pete, FL

Responsibilities:

  • Understood business needs, analyzed functional specifications, and mapped them into end-to-end data transformation pipeline designs.
  • Created Hive tables and loaded data from Teradata using Sqoop.
  • Imported and exported data between HDFS and relational databases using Sqoop. Extensively worked on importing metadata into Hive and migrated existing tables and applications to Hive and the AWS cloud.
  • Implemented Hive generic UDFs to incorporate business logic into Hive queries.
  • Extensively worked on HiveQL and join operations, wrote custom UDFs, and gained good experience optimizing Hive queries.
  • Wrote Hive jobs to parse logs and structure them in tabular format to facilitate effective querying of the log data.
  • Designed and implemented Hive and Pig UDFs in Python for evaluating, filtering, loading, and storing data; developed MapReduce jobs for cleaning, validating, and transforming data.
  • Performed debugging and performance tuning of Pig and Hive scripts by understanding the joins, groupings, and aggregations between them.
  • Wrote Pig scripts to transform raw data from several data sources.
  • Used different columnar file formats (RCFile, Parquet, and ORC); a minimal sketch follows this list.
  • Used Cloudera Manager to monitor workload and job performance and for capacity planning.
  • Took part in building applications with Maven, integrated with continuous integration servers such as Jenkins.
  • Performed data migration from legacy RDBMS databases to HDFS using Sqoop.
  • Hands-on experience with the whole ETL (extract, transform, and load) process.
  • Developed ETL to normalize data and publish it in Impala.
  • Worked with BI teams to generate reports and design ETL workflows in Tableau.
  • Worked with NoSQL databases (HBase, MongoDB) for hybrid implementations.
  • Used Impala to analyze data ingested into HBase and computed various metrics for reporting on the dashboard.
  • Designed and maintained Oozie workflows to manage the flow of jobs in the cluster.
  • Worked with the testing teams to fix bugs and ensure smooth, error-free code.
  • Participated in Agile methodologies, daily Scrum meetings, and sprint planning.
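
A minimal sketch of the columnar-format work in this role: landing a raw delimited feed and rewriting it as partitioned Parquet and ORC. Paths and column names are placeholders; Hive support is enabled because ORC in Spark 2.x of this vintage relied on it.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("ColumnarFormats")
             .enableHiveSupport()  # ORC writes in Spark 2.2 need Hive support
             .getOrCreate())

    # Placeholder path; schema inference used for illustration only.
    raw = spark.read.csv("/data/raw/logs", header=True, inferSchema=True)

    # Partitioned columnar output keeps date-filtered scans cheap.
    raw.write.partitionBy("load_date").parquet("/data/curated/logs_parquet")
    raw.write.partitionBy("load_date").orc("/data/curated/logs_orc")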

Environment: Hadoop, MapReduce, HDFS, Hive, Cassandra, Python, Java, SQL, Cloudera Manager, Pig, Sqoop, Oozie, HBase, ZooKeeper, MongoDB, PL/SQL, MySQL, DB2, Teradata.

Java/Hadoop Developer

Confidential - Durham, NC

Responsibilities:

  • Installed and configured Hadoop HDFS, MapReduce, Pig, Hive, and Sqoop.
  • Implemented high availability and automatic-failover infrastructure using ZooKeeper services to overcome the NameNode single point of failure.
  • Developed Pig scripts to transform raw data into intelligent data as specified by business users.
  • Demonstrated proficiency in Shell and Python scripting for file validation and processing, job scheduling, distribution, and automation (a minimal validation sketch appears at the end of this section).
  • Worked on the Hadoop cluster and used the data querying tool Hive to store and retrieve data.
  • Reviewed and managed Hadoop log files by consolidating logs from multiple machines using Flume.
  • Developed Spark applications in Java, Scala, and Python.
  • Used Spark for interactive queries, processing of streaming data, and integration with popular NoSQL databases for high data volumes.
  • Streamed data in real time using Kafka with Spark.
  • Exported analyzed data to HDFS using Sqoop for generating reports.
  • Imported and exported data into HDFS and Hive using Sqoop and Flume.
  • Worked with the Oozie workflow engine to run multiple MapReduce jobs.
  • Supported MapReduce programs running on the cluster.
  • Worked with the applications team to install Hadoop updates and upgrades as required.

Environment: Hadoop, MapReduce, HDFS, Pig, Sqoop, Cassandra, Spark, Kafka, Hive, Java, Oracle, Eclipse, and Shell/Python scripting.
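
A hedged sketch of the file-validation scripting mentioned in this role: reject a delimited feed whose records do not match the expected column count before any load job runs. The delimiter, column count, and exit-code contract are assumptions.

    #!/usr/bin/env python
    # validate_feed.py - fail fast if a delimited feed looks malformed.
    import sys

    EXPECTED_COLUMNS = 12   # placeholder value
    DELIMITER = "|"

    def validate(path):
        bad = 0
        with open(path) as feed:
            for lineno, line in enumerate(feed, start=1):
                if len(line.rstrip("\n").split(DELIMITER)) != EXPECTED_COLUMNS:
                    bad += 1
                    sys.stderr.write("bad record at line %d\n" % lineno)
        return bad == 0

    if __name__ == "__main__":
        # Exit 0 on success so a scheduler can gate the downstream load.
        sys.exit(0 if validate(sys.argv[1]) else 1)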

Hadoop Developer

Confidential

Responsibilities:

  • Analyzed data and wrote Hadoop MapReduce jobs using the Java API, Pig, and Hive.
  • Gathered business requirements from business partners and subject matter experts.
  • Installed Hadoop ecosystem components under the Cloudera distribution.
  • Managed data coming from different sources.
  • Supported MapReduce programs running on the cluster.
  • Wrote MapReduce jobs using the Java API for data analysis and dimension/fact generation.
  • Installed and configured Pig and wrote Pig Latin scripts.
  • Wrote MapReduce jobs using Pig Latin.
  • Prepared a Spark build from source and ran the Pig scripts on Spark rather than as MapReduce jobs, for better performance.
  • Imported data using Sqoop to load data from MySQL to HDFS on a regular basis.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Created Hive tables and worked on them using HiveQL.
  • Utilized Agile Scrum methodology to help manage and organize a team of four developers, with regular code review sessions.
  • Used Storm as an automatic mechanism for retrying downloads and data manipulation when there was a hiccup.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop, using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN (a minimal pair-RDD sketch follows this list).
  • Used Storm to analyze large amounts of non-unique data points with low latency and high throughput.
  • Held weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
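
A minimal sketch of the pair-RDD tuning referenced above: reduceByKey combines values map-side before the shuffle, while groupByKey ships every value across the network first, which is usually the optimization that matters. The sample data is illustrative.

    from pyspark import SparkContext

    sc = SparkContext(appName="PairRDDTuning")
    pairs = sc.parallelize([("a", 1), ("b", 2), ("a", 3), ("b", 4)])

    # groupByKey shuffles every value, then sums on the reduce side.
    slow = pairs.groupByKey().mapValues(sum)

    # reduceByKey pre-aggregates on each partition, shuffling far less.
    fast = pairs.reduceByKey(lambda a, b: a + b)

    print(fast.collect())  # [('a', 4), ('b', 6)] (order may vary)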

Environment: Java, MapReduce, Spark, HDFS, Hive, Pig, Linux, XML, MySQL, MySQL Workbench, Java 6, Eclipse, PL/SQL, SQL Connector, Subversion.

Java/Hadoop Developer

Confidential

Responsibilities:

  • Reviewed requirements and analyzed their impact.
  • Participated in requirement analysis and application design using UML/Rational Rose and Agile methodology.
  • Developed the application using core Java, J2EE, and JSPs.
  • Helped develop this web-based application in a J2EE framework that uses Hibernate for persistence, Spring for dependency injection, and JUnit for testing.
  • Used JSP to develop the front-end screens of the application.
  • Designed and developed several SQL scripts, stored procedures, packages, and triggers for the database.
  • Used indexing techniques in the database procedures to obtain search results.
  • Developed a web service client to get client details from third-party agencies.
  • Developed nightly batch jobs that interfaced with external third-party state agencies, and developed test scripts for performance and accessibility testing of the application.
  • Carried out different types of testing, such as unit, system, and integration testing, during the testing phase.
  • Provided production support to maintain the application.

Environment: Java, J2EE, Struts Framework, JSP, Spring Framework, Hibernate, Oracle, Eclipse, Subversion, PL/SQL, WebSphere, UML, Windows.
