
Sr. Hadoop Developer Resume


San Diego, CA

SUMMARY

  • 8 years of overall IT experience, including 5 years of comprehensive experience as an Apache Hadoop Developer. Experience in writing Hadoop jobs for analyzing data using Hive, Pig and Oozie, and 3+ years of experience with Java and SQL databases.
  • Expertise in Hadoop architecture and its components, such as YARN, HDFS, Job Tracker, Task Tracker, Name Node, Data Node and MapReduce concepts.
  • Experience in Spark, with in-depth knowledge of Spark SQL, RDDs, lazy transformations and actions (a short sketch follows at the end of this summary).
  • Worked in a Spark High Availability (HA) environment.
  • Expert knowledge in configuring AWS clusters both with EMR and manually. Expertise in S3 storage and in transferring data from S3 to HDFS and vice versa.
  • Expertise in importing and exporting data with Sqoop between HDFS and relational database systems.
  • Implemented ETL processes using Hive and Pig, including Python and Java UDFs for cleansing data.
  • Expertise in databases such as SQL, PL/SQL, MySQL, Cassandra and HBase.
  • Expertise in processing large structured, semi-structured and unstructured data sets in formats such as JSON, CSV, TXT, XML, Teradata, Avro, Sequence Files and Parquet, and in supporting systems/application architecture.
  • Expertise in file format compression using core Java classes and HDFS properties.
  • Expertise in compression algorithms such as Gzip and Snappy.
  • Data modeling skills: understanding data requirements and then building the logical and physical data models.
  • Designing and implementing complete end-to-end Hadoop Infrastructure including Pig, Hive, Sqoop, Oozie, Flume and Zookeeper.
  • Supported Hadoop developers and assisted in optimizing MapReduce jobs, Cassandra and Pig Latin scripts, Hive scripts and HBase ingestion as required.
  • Worked with the Spark ecosystem using Scala and Hive queries on different data formats.
  • Worked in a Cloudera Hadoop distribution (CDH) environment.
  • Knowledge of Tableau for reporting and analysis.
  • Experienced with FileZilla, WinSCP and PuTTY environments.
  • Good at requirements gathering, analysis, troubleshooting and debugging.
  • Expertise in scheduling tools such as AutoSys and Zeke.
  • Expertise in OOP design, analysis, development, testing and maintenance.
  • Able to work in a team as well as individually; a quick learner and smart worker.
  • Expertise in application design using UML, use case diagrams and ERDs; familiar with the Parquet and Arrow file types.
  • Good communication, documentation and strong interpersonal skills.
  • Expertise in creating and managing database tables, indexes and views.
  • Expertise in creating and managing user accounts and granting permissions at both the Linux and MongoDB level.
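
To illustrate the Spark SQL and RDD work summarized above, here is a minimal Scala sketch showing lazy transformations followed by actions; the input path, field layout and column names are illustrative assumptions (Spark 2.x style), not details from any specific project:

    import org.apache.spark.sql.SparkSession

    object LazyTransformDemo {
      def main(args: Array[String]): Unit = {
        // Assumes a Spark 2.x SparkSession; the HDFS path and record layout are hypothetical
        val spark = SparkSession.builder().appName("LazyTransformDemo").getOrCreate()
        import spark.implicits._

        // Transformations (map, filter) are lazy: nothing executes yet
        val events = spark.sparkContext.textFile("hdfs:///data/raw/events.txt")
        val parsed = events
          .map(_.split(","))
          .filter(_.length == 3)              // still lazy
          .map(f => (f(0), f(2).toDouble))    // still lazy

        // Actions (count, show) trigger execution of the whole lineage
        println(s"valid rows: ${parsed.count()}")

        // The same data as a DataFrame, queried through Spark SQL
        val df = parsed.toDF("userId", "amount")
        df.createOrReplaceTempView("events")
        spark.sql("SELECT userId, SUM(amount) AS total FROM events GROUP BY userId").show(10)

        spark.stop()
      }
    }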

TECHNICAL SKILLS

Hadoop Ecosystem: HDFS, MapReduce, Hive, Pig, Pentaho, HBase, ZooKeeper, Sqoop, Oozie, Cassandra, Flume, Spark, Teradata, Talend, Cloudera, Kafka, HDP (Hortonworks Data Platform) and Avro.

Web Technologies: Core Java, J2EE, Servlets, JSP, JDBC, XML, AJAX, SOAP, WSDL

Methodologies: Agile, UML, Design Patterns (Core Java and J2EE)

Frameworks: MVC, Struts 2/1, Hibernate 3, Spring 3/2.5/2, SOA

Programming Languages: Java, XML, Unix Shell scripting, HTML, Scala, Python, JavaScript

Databases: Oracle 11g, DB2, MS SQL Server, MySQL, MS Access, MongoDB

Application/Web Servers: WebLogic, WebSphere, Apache Tomcat

Monitoring & Reporting tools: Ganglia, Nagios, Custom Shell scripts

PROFESSIONAL EXPERIENCE

Confidential, San Diego, CA

Sr. Hadoop Developer

Responsibilities:

  • Experienced in designing and deploying Hadoop clusters and various Big Data analytic tools, including Pig, Hive, HBase, Oozie, Sqoop, Kafka, Spark, Impala and Teradata.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data.
  • Developed multiple MapReduce jobs in Java for data processing.
  • Developed workflows using custom MapReduce, Pig, Hive and Sqoop.
  • Implemented real-time data ingestion using Kafka.
  • Expertise in integrating Kafka with Spark Streaming for high-speed data processing.
  • Developed multiple Kafka Producers and Consumers as per the software requirement specifications.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster processing of data.
  • Configured Spark Streaming to receive real-time data and store the streamed data in HDFS.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Documented the requirements including the available code which should be implemented using Spark, Hive, HDFS and SOLR.
  • Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames and saved it in Parquet format in HDFS (see the sketch after this list).
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Optimized HiveQL and Pig scripts by using Spark as the execution engine.
  • Worked with and learned a great deal from Amazon Web Services (AWS) cloud services such as EC2, S3, EBS, RDS and VPC.
  • Worked on provisioning and managing multi-tenant Hadoop clusters on a public cloud environment, Amazon Web Services (AWS), and on private cloud infrastructure, the OpenStack cloud platform.
  • Worked with cloud administration on Amazon Web Services (AWS).
  • Strong experience working with Elastic MapReduce (EMR) and setting up environments on Amazon AWS EC2 instances.
  • Experience working with Apache Solr for indexing and querying.
  • Experience using Solr and ZooKeeper technologies.
  • Worked with Apache Spark, a fast and general engine for large-scale data processing, integrated with the functional programming language Scala.
  • Developed a POC using Scala, Spark SQL and the MLlib libraries along with Kafka and other tools as required, then deployed it on the YARN cluster.
  • Experience in performance tuning a Cassandra cluster to optimize it for writes and reads.
  • Experience in data modeling, connecting to Cassandra from Spark and saving summarized DataFrames to Cassandra.
  • Experience designing and executing time-driven and data-driven Oozie workflows.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop.
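
A rough Scala sketch of the Kafka-to-Parquet pipeline described above, assuming the spark-streaming-kafka-0-10 integration; the broker address, topic, group id, batch interval and HDFS output path are hypothetical placeholders rather than actual project settings:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.sql.{SaveMode, SparkSession}
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

    object KafkaToParquet {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("KafkaToParquet").getOrCreate()
        import spark.implicits._

        val ssc = new StreamingContext(spark.sparkContext, Seconds(10))   // hypothetical batch interval

        val kafkaParams = Map[String, Object](
          "bootstrap.servers"  -> "broker1:9092",                // hypothetical broker
          "key.deserializer"   -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id"           -> "feed-consumer",
          "auto.offset.reset"  -> "latest"
        )

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc,
          LocationStrategies.PreferConsistent,
          ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams))

        // Each micro-batch arrives as an RDD; convert it to a DataFrame and append as Parquet on HDFS
        stream.map(record => record.value).foreachRDD { rdd =>
          if (!rdd.isEmpty()) {
            val df = rdd.toDF("raw_event")
            df.write.mode(SaveMode.Append).parquet("hdfs:///data/parquet/events")   // hypothetical path
          }
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }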

Environment: Hadoop, HDFS, MapReduce, Pig, Hive, YARN, Sqoop, Oozie, Python, AWS, Shell Scripting, Impala, Spark, Spark SQL, HBase, Cassandra, Solr, ZooKeeper, Scala, Kafka, Cloudera.

Confidential, Franklin Lakes, NJ

Sr. Hadoop Developer

Responsibilities:

  • Responsible for cluster maintenance, monitoring, commissioning and decommissioning data nodes, troubleshooting, and managing and reviewing data backups and log files.
  • Monitoring systems and services through Ambari dashboard to make the clusters available for the business.
  • Responsible for performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism and appropriate memory settings (see the sketch after this list).
  • Installed and Configured Hive and Pig. Worked with developers to develop various Hive and Pig Latin scripts.
  • In-depth understanding of Hadoop architecture and its components, such as HDFS, Name Node, Job Tracker, Data Node, Task Tracker and MapReduce concepts.
  • Experience in installation, configuration, support and management of a Hadoop cluster.
  • Worked on setting up large-scale Hadoop environments: builds, capacity planning, clusters, performance tuning and monitoring.
  • Monitored system health and logs and responded accordingly to any warning or failure conditions.
  • Worked on performance tuning for Spark for faster processing times.
  • Experienced in writing Spark scripts to load data from different source files and save it in HDFS for further reuse.
  • Worked on Kafka and ActiveMQ messaging and monitored the results of the chained ETL jobs.
  • Tested raw market data and executed performance scripts on data to reduce the runtime.
  • Extracted real-time feeds using Spark Streaming, converted them to RDDs, processed the data as DataFrames and saved it in Parquet format in HDFS.
  • Used Spark for large-scale data processing, real-time analytics and streaming of data.
  • Worked with different types of Hive tables, such as external and managed tables.
  • Used Oozie workflow engine to run multiple Hive and Pig jobs.
  • Extensively used the Hadoop shell commands for copying and viewing the contents of the file.
  • Handled time-series data by using HBase to store it and performed time-based analytics to improve retrieval time.
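
A minimal Scala sketch of the kinds of knobs involved in the Spark performance-tuning bullet above (batch interval, level of parallelism and memory); the specific values are illustrative placeholders and would in practice be derived from the actual cluster size and input rate:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object TunedStreamingApp {
      def main(args: Array[String]): Unit = {
        // All settings below are placeholder values for illustration only
        val conf = new SparkConf()
          .setAppName("TunedStreamingApp")
          .set("spark.default.parallelism", "200")            // level of parallelism for shuffles
          .set("spark.sql.shuffle.partitions", "200")
          .set("spark.executor.memory", "4g")                 // memory tuning
          .set("spark.executor.cores", "4")
          .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
          .set("spark.streaming.backpressure.enabled", "true")

        // Batch interval chosen so each micro-batch finishes well within its window
        val ssc = new StreamingContext(conf, Seconds(5))

        // ... define the streaming pipeline here ...

        ssc.start()
        ssc.awaitTermination()
      }
    }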

Environment: Big Data, Hadoop, Spark, HDFS, HBase, Pig, Hive, Oozie, Ambari, Sqoop, Kafka, ActiveMQ, Linux, ZooKeeper, MySQL, Cassandra, Flume.

Confidential, Boston, MA

Sr. Hadoop Developer

Responsibilities:

  • Worked with the team to gather and analyze client requirements.
  • Analyzed large data sets distributed across a cluster of commodity hardware.
  • Worked on MapReduce phases using Core Java and scripting languages, created and exported jar files to HDFS, and used the web UIs for the name node, job tracker and task tracker.
  • Set up the required Hadoop environment for the cluster to perform MapReduce jobs.
  • Data was formatted using Hive queries and stored on HDFS.
  • Created complex schemas and tables for analysis using Hive.
  • Involved in extracting, transforming and loading data sets from local storage to HDFS using Hive.
  • Imported and exported data from RDBMS to HDFS and vice-versa using Sqoop.
  • Involved in writing Pig scripts to analyze or query structured, semi-structured and unstructured data in a file.
  • Worked with HBase and MySQL to optimize data storage and access.
  • Worked with Sequence file, Avro and Parquet file formats.
  • Monitored Hadoop cluster connectivity and security using tools such as ZooKeeper and Hue.
  • Managed and reviewed Hadoop log files.
  • Developed and implemented automation processes to increase deployment efficiency.
  • Developed scripts to perform ad hoc requests.
  • Coordinated and communicated with the team and prepared technical design documents.
  • Worked using Agile methodologies.
  • Involved in managing the backup and disaster recovery for Hadoop data.

Environment: CDH 5.0, Hadoop, HDFS, MapReduce, Hive, Sqoop, Pig, HBase, MySQL, ZooKeeper, Hue, Linux.

Confidential, Sacramento, CA

Hadoop Engineer

Responsibilities:

  • Hands on experience in installation, configuration, supporting and managing Hadoop Clusters.
  • Experience in Amazon AWS cloud services (EC2, EBS, S3). Experience in managing Hadoop clusters using Cloudera Manager.
  • Experience in Tableau administration.
  • Implemented LTM over Hive servers to achieve maximum utilization and failover.
  • Experience in benchmarking Hadoop cluster for analysis of queue usage.
  • Involved in setting up the chef server to push the configuration across the cluster.
  • Involved in extracting the data from various sources into Hadoop HDFS for processing.
  • Used Hive data warehouse tool to analyze the unified historic data in HDFS to identify issues and behavioral patterns.
  • Experience monitoring and troubleshooting issues with hosts in the cluster regarding memory, CPU, OS, storage and network.
  • Configured rack awareness for the cluster.
  • Good experience writing Hive SQL queries to pull reports.
  • Experience in loading data from various data sources to HDFS using Kafka.
  • Experience in analyzing log files for Hadoop and ecosystem services and finding root cause.
  • Experience in commissioning, decommissioning, balancing and managing nodes, and in tuning servers for optimal cluster performance.
  • As an admin, involved in cluster maintenance, troubleshooting and monitoring, and followed proper backup and recovery strategies.
  • Expertise using Apache Spark, a fast engine for large-scale data processing, and Shark (fast Hive SQL on Spark).
  • Worked on a POC to evaluate the performance of Apache Spark/Shark against Apache Hive (see the sketch after this list).
  • Experience in HDFS data storage and support for running MapReduce jobs.
  • Experience in scheduling jobs through Tidal and Oozie.
  • Experience in monitoring and scheduling Informatica jobs through Tidal.
  • Experience with monitoring tools such as Ganglia and Nagios.
  • Commissioning and decommissioning of Hadoop nodes to tune the cluster.
  • Involved in setting up the Kerberos authentication.
  • Experience in Vastool for managing the user and dataset permissions.
  • Copied data from one cluster to another using DistCp, and automated the copy procedure using shell scripts.
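
As a rough illustration of the Spark-versus-Hive POC mentioned above, a small Scala sketch (Spark 1.x HiveContext) that runs one HiveQL report query through Spark so its runtime can be compared with the same query on Hive; the database, table and query are hypothetical:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object SparkVsHivePoc {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("SparkVsHivePoc"))
        val hiveContext = new HiveContext(sc)   // reads the existing Hive metastore

        // Hypothetical report query; the same HiveQL would be timed on Hive for comparison
        val query =
          """SELECT dt, COUNT(*) AS events
            |FROM logs.page_views
            |GROUP BY dt
            |ORDER BY dt""".stripMargin

        val start = System.currentTimeMillis()
        hiveContext.sql(query).show(20)
        println(s"Spark execution took ${System.currentTimeMillis() - start} ms")

        sc.stop()
      }
    }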

Environment: MapReduce, HDFS, Hive, Pig, Sqoop, Python scripting, UNIX Shell Scripting, Nagios, Kerberos, Ganglia, Tidal, Tableau, Informatica.

Confidential

JAVA/J2EE Developer

Responsibilities:

  • Involved in Design, Development and Support phases of Software Development Life Cycle (SDLC)
  • Reviewed the functional, design, source code and test specifications.
  • Involved in the complete front-end development using JavaScript and CSS.
  • Authored functional, design and test specifications.
  • Implemented Backend, Configuration DAO, XML generation modules of DIS.
  • Analyzed, designed and developed the component.
  • Used JDBC for database access.
  • Used Data Transfer Object (DTO) design patterns.
  • Followed UML standards, created class and sequence diagrams.
  • Performed unit testing and rigorous integration testing of the whole application.
  • Prepared and executed test cases.
  • Actively involved in the system testing.
  • Developed XML parsing tool for regression testing.
  • Prepared the installation guide, customer guide and configuration document, which were delivered to the customer along with the product.

Environment: Java/J2EE, SQL, Oracle 10g, JSP 2.0, EJB, AJAX, JavaScript, WebLogic 8.0, HTML, JDBC 3.0, XML, JMS.

Confidential

Java Developer

Responsibilities:

  • Involved in the analysis, design, implementation and testing of the project.
  • Involved in understanding and analyzing the project requirements.
  • Responsible for coding and implementing the use cases.
  • Designed and developed the web applications using JavaScript, HTML, XML and CSS.
  • Implemented JavaScript validations for fields on the login and registration pages.
  • Developed JavaScript code for input validations.
  • Extensively used Java multi-threading to implement batch jobs with JDK 1.5 features.
  • Used JDBC prepared statements to call from Servlets for database access.
  • Implemented multi-threading concept for the parallel processing of the data.
  • Involved in Unit testing and system testing for various components.
  • Handled exceptions using try, catch and finally blocks.
  • Worked on the database interaction layer for insert, update and retrieval operations against the Oracle database by writing stored procedures.
  • Deployed the Application code in the IBM WebSphere Application Server.
  • Involved in post-production support and maintenance of the application.

Environment: Java, J2EE, Servlets, Struts 1.1, HTML, XML, CSS, SQL Server 2000, PL/SQL, Hibernate, Eclipse, Linux, IBM WebSphere Server.
