
Sr. Hadoop Developer Resume


Daytona Beach, FL

PROFESSIONAL SUMMARY

  • Around 9 years of programming experience spanning all phases of the Software Development Life Cycle (SDLC).
  • Over 5 years of Big Data experience building highly scalable data analytics applications.
  • Strong experience working with Hadoop ecosystem components such as HDFS, MapReduce, Spark, HBase, Oozie, Hive, Sqoop, Pig, Flume and Kafka.
  • Good hands-on experience working with various Hadoop distributions, mainly Cloudera (CDH), Hortonworks (HDP) and Amazon EMR.
  • Good understanding of distributed systems architecture and the design principles behind parallel computing.
  • Expertise in developing production-ready Spark applications using the Spark Core, DataFrame, Spark SQL, Spark Streaming and Spark MLlib APIs, along with scikit-learn and TensorFlow (a minimal sketch follows this summary).
  • Strong experience troubleshooting failures in Spark applications and fine-tuning Spark applications and Hive queries for better performance.
  • Worked extensively on Hive for building complex data analytical applications.
  • Strong experience writing complex MapReduce jobs, including development of custom InputFormats and custom RecordReaders.
  • Sound knowledge of map-side joins, reduce-side joins, shuffle and sort, distributed cache, compression techniques, and multiple Hadoop input and output formats.
  • Worked with Apache NiFi to automate and manage the flow of data between systems.
  • Good experience working with AWS Cloud services such as S3, EMR, Redshift and Athena.
  • Deep understanding of performance tuning and partitioning for optimizing Spark applications.
  • Worked on building real-time data workflows using Kafka, Spark Streaming and HBase.
  • Extensive knowledge of NoSQL databases such as HBase, Cassandra and MongoDB.
  • Solid experience working with CSV, text, SequenceFile, Avro, Parquet, ORC and JSON data formats.
  • Extensive experience performing ETL on structured and semi-structured data using Pig Latin scripts.
  • Designed and implemented Hive and Pig UDFs in Java for evaluating, filtering, loading and storing data.
  • Experience using the Hadoop ecosystem and reporting on processed data with Tableau.
  • Good knowledge of core programming concepts such as algorithms, data structures and collections.
  • Developed core modules in large cross-platform applications using Java, JSP, Servlets, Hibernate, RESTful services, JDBC, JavaScript, XML and HTML.
  • Extensive experience developing and deploying applications using WebLogic, Apache Tomcat and JBoss.
  • Development experience with RDBMS, including writing SQL queries, views, stored procedures, triggers, etc.
  • Strong understanding of Software Development Lifecycle (SDLC) and various methodologies (Waterfall, Agile).
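
A minimal, illustrative sketch of the kind of Spark DataFrame / Spark SQL summarization work referenced in this summary; it uses the Spark Java API, and the HDFS paths, table name and column names are hypothetical.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class OrdersSummaryJob {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("orders-summary")
                .getOrCreate();

        // Read a Parquet dataset from HDFS (path and schema are hypothetical)
        Dataset<Row> orders = spark.read().parquet("hdfs:///data/orders");

        // Expose the DataFrame to Spark SQL and compute a simple aggregation
        orders.createOrReplaceTempView("orders");
        Dataset<Row> summary = spark.sql(
                "SELECT customer_id, SUM(amount) AS total_amount "
              + "FROM orders GROUP BY customer_id");

        // Write the summarized result back out in Parquet
        summary.write().mode(SaveMode.Overwrite).parquet("hdfs:///data/orders_summary");

        spark.stop();
    }
}
```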

TECHNICAL SKILLS

Programming Languages: Java/J2EE, JSP, Servlets, AJAX, EJB, Struts, Spring, JDBC, JavaScript, PHP and Python.

Databases: MySQL, SQL, DB2 and Teradata

Web services: REST, AWS, SOAP, WSDL

Servers: Apache Tomcat, WebSphere, JBoss

Operating Systems: Unix, Linux, Windows, Solaris

IDE tools: MyEclipse, Eclipse, NetBeans

QA Tools: Crashlytics (Fabric)

Web UI: HTML, JavaScript, XML, SOAP, WSDL

PROFESSIONAL EXPERIENCE

Confidential, Daytona beach, FL

Sr. Hadoop Developer

Responsibilities:

  • Developed Spark applications in Scala using the DataFrame and Spark SQL APIs for faster processing of data.
  • Developed highly optimized Spark applications to perform data cleansing, validation, transformation and summarization activities according to requirements.
  • Built a data pipeline consisting of Spark, Hive, Sqoop and custom-built input adapters to ingest, transform and analyze operational data.
  • Developed Spark jobs and Hive Jobs to summarize and transform data.
  • Used Spark for interactive queries, processing of streaming data and integration with popular NoSQL databases for huge volumes of data.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames and Scala.
  • Used various data integration tools to move data between different databases and Hadoop.
  • Analyzed the SQL scripts and designed the solution to implement using Scala.
  • Built real-time data pipelines by developing Kafka producers and Spark Streaming consumer applications (see the sketch after this list).
  • Ingested syslog messages, parsed them and streamed the data to Kafka.
  • Handled importing data from different data sources into HDFS using Sqoop, performing transformations using Hive and MapReduce, and then loading the data into HDFS.
  • Exported the analyzed data to relational databases using Sqoop for the BI team to visualize and generate reports.
  • Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
  • Analyzed the data by performing Hive queries (HiveQL) to study customer behavior.
  • Helped DevOps engineers deploy code and debug issues.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Developed Hive scripts in HiveQL to de-normalize and aggregate the data.
  • Scheduled and executed workflows in Oozie to run various jobs.
  • Experience using the Hadoop ecosystem and processing data on Amazon AWS.
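
A minimal sketch of a Spark Streaming consumer of the kind described above, using the spark-streaming-kafka-0-10 integration through the Spark Java API (the original applications were written in Scala); the broker address, topic name, group id and parsing logic are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class SyslogStreamJob {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("syslog-stream");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        // Consumer configuration; broker address and group id are placeholders
        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "syslog-consumers");
        kafkaParams.put("auto.offset.reset", "latest");

        // Direct stream over a hypothetical "syslog" topic
        JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(
                                Collections.singletonList("syslog"), kafkaParams));

        // Naively take the first token of each syslog line as the host
        // and count messages per host in every 10-second batch
        stream.map(record -> record.value().split(" ")[0])
              .countByValue()
              .print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```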

Environment: Hadoop, HDFS, HBase, Spark, Scala, Hive, MapReduce, Sqoop, ETL, Java, PL/SQL, Oracle 11g, Unix/Linux.

Confidential, Union, New Jersey

Sr. Hadoop Developer

Responsibilities:

  • Developed multi-threaded Java-based input adaptors for ingesting clickstream data from external sources such as FTP servers and S3 buckets on a daily basis.
  • Created Spark applications in Scala to enrich this clickstream data with enterprise data about the users (see the sketch after this list).
  • Implemented batch processing jobs using the Spark Scala API.
  • Worked with Apache NiFi to automate and manage the flow of data between systems.
  • Developed Sqoop scripts to import/export data from Oracle to HDFS and into Hive tables.
  • Stored the data in columnar formats using Hive.
  • Involved in building and managing NoSQL database models using HBase.
  • Worked in Spark to read data from Hive and write it to HBase.
  • Optimized Hive tables using techniques like partitioning and bucketing to provide better performance for HiveQL queries.
  • Worked with multiple file formats, including Avro, SequenceFile, Parquet and ORC.
  • Converted existing MapReduce programs to Spark Applications for handling semi structured data like JSON files, Apache Log files, and other custom log data.
  • Loaded the final processed data to HBase tables to allow downstream application team to build rich and data driven applications.
  • Experience using the Hadoop ecosystem and processing data on Amazon AWS.
  • Worked with a team to improve the performance and optimization of existing Hadoop algorithms using Spark, Spark SQL and DataFrames.
  • Worked with Apache Ranger for enabling data security across the Hadoop ecosystem.
  • Implemented business logic in Hive and wrote UDFs to process the data for analysis.
  • Used Oozie to define a workflow to coordinate the execution of Spark, Hive and Sqoop jobs.
  • Addressed issues arising from huge volumes of data and transitions.
  • Designed and documented operational problems, following standards and procedures, using JIRA.
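
A minimal sketch of the enrichment step described above, using the Spark Java API with Hive support (the original applications were written in Scala); the Hive database, table and column names are hypothetical.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class ClickstreamEnrichmentJob {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("clickstream-enrichment")
                .enableHiveSupport()   // read and write Hive tables directly
                .getOrCreate();

        // Hypothetical Hive tables: raw clickstream events and enterprise user data
        Dataset<Row> clicks = spark.table("raw.clickstream");
        Dataset<Row> users  = spark.table("enterprise.users");

        // Enrich each click event with the user's enterprise attributes
        Dataset<Row> enriched = clicks.join(users, "user_id");

        // Persist as an ORC-backed, date-partitioned Hive table
        enriched.write()
                .mode(SaveMode.Overwrite)
                .format("orc")
                .partitionBy("event_date")
                .saveAsTable("analytics.clickstream_enriched");

        spark.stop();
    }
}
```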

Environment: Java 6, MongoDB, Apache Web Server, HTML, JDBC, NoSQL, Meteor.js, Eclipse, UNIX, CSS3, XML, jQuery, Oracle.

Confidential, Reston, VA

Hadoop Developer

Responsibilities:

  • Involved in requirement analysis, design, coding and implementation phases of the project.
  • Used Sqoop to load structured data from relational databases into HDFS.
  • Loaded transactional data from Teradata using Sqoop and created Hive Tables.
  • Worked on automation of delta feeds from Teradata using Sqoop and from FTP Servers to Hive.
  • Performed transformations such as de-normalization, data set cleansing, date transformations and parsing of complex columns.
  • Worked with different compression codecs like Gzip, Snappy and BZip2 in MapReduce, Pig and Hive for better performance (see the sketch after this list).
  • Worked with Apache NiFi to automate and manage the flow of data between systems.
  • Used Ansible to automate frameworks.
  • Handled Avro, JSON and Apache Log data in Hive using custom Hive SerDes.
  • Worked on batch processing and scheduled workflows using Oozie.
  • Implemented installation and configuration of multi-node cluster on the cloud using Amazon Web Services (AWS) on EC2.
  • Worked in an Agile sprint environment.
  • Used the Knox Gateway to secure Hadoop access between users and operators.
  • Deployed Hadoop applications on a multi-node cloud cluster, stored data in S3 and used Elastic MapReduce (EMR) to run MapReduce jobs.
  • Used HiveQL to create partitioned RCFile and ORC tables and applied compression techniques for optimized processing and faster retrieval.
  • Implemented Partitioning, Dynamic Partitioning and Buckets in Hive for efficient data access.
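
A minimal sketch of wiring compression codecs into a MapReduce job as described above; the identity mapper/reducer and the input/output paths are placeholders, and the codec choices (Snappy for intermediate map output, BZip2 for final output) are illustrative.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.BZip2Codec;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CompressedPassThroughJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Compress intermediate map output with Snappy to speed up the shuffle
        conf.setBoolean("mapreduce.map.output.compress", true);
        conf.setClass("mapreduce.map.output.compress.codec",
                SnappyCodec.class, CompressionCodec.class);

        Job job = Job.getInstance(conf, "compressed-pass-through");
        job.setJarByClass(CompressedPassThroughJob.class);

        // Identity mapper/reducer: the point of this sketch is the compression settings
        job.setMapperClass(Mapper.class);
        job.setReducerClass(Reducer.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Compress the final job output with BZip2 (splittable for downstream jobs)
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, BZip2Codec.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```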

Environment: Apache Hadoop, HDFS, Cloudera Manager, Java, MapReduce, Eclipse Indigo, Hive, HBASE, PIG, Sqoop, Oozie, SQL, Spring.

Confidential, Dallas, TX

Hadoop Developer

Responsibilities:

  • Communicated effectively with business customers to gather the required information for the project.
  • Worked Extensively on Cloudera Distribution.
  • Involved in loading data into HDFS from Teradata using Sqoop.
  • Moved huge amounts of log file data from different servers.
  • Worked on implementing complex data transformations using MapReduce framework.
  • Generated structured data through MapReduce jobs, stored the results in Hive tables and analyzed them with Hive queries based on the requirements.
  • Worked on performance improvement by implementing Dynamic Partitioning and Buckets in Hive and by designing managed and external tables.
  • Worked on migrating data from relational databases to Big Data technologies like Cassandra.
  • Developed Pig Latin scripts and used ETL tools such as Informatica for some pre-aggregations.
  • Worked on MapReduce programs to cleanse and pre-process data from various sources.
  • Worked with SequenceFiles and Avro files in MapReduce programs.
  • Created Hive generic UDFs to implement business logic and worked on incremental imports to Hive tables.
  • Handled importing data from different data sources into HDFS using Sqoop, performing transformations using Hive and MapReduce, and then loading the data into HDFS.
  • Exported the analyzed data to relational databases using Sqoop for the BI team to visualize and generate reports.
  • Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
  • Worked with Apache NiFi to automate and manage the flow of data between systems.
  • Worked with Talend to integrate data from different systems into Hadoop.
  • Used Kerberos authentication to provide authenticated access to distributed systems.
  • Analyzed the data by performing Hive queries (HiveQL) to study customer behavior.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Developed Hive scripts in HiveQL to de-normalize and aggregate the data.
  • Worked in an Agile sprint environment.
  • Loaded processed data into HBase tables using HBase Java API calls (see the sketch after this list).
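
A minimal sketch of loading a processed record into HBase through the Java client API, as referenced in the last bullet; the table name, row key and column layout are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseLoader {
    public static void main(String[] args) throws Exception {
        // Picks up hbase-site.xml from the classpath
        Configuration conf = HBaseConfiguration.create();

        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("customer_metrics"))) {

            // Row key and column family/qualifiers are hypothetical
            Put put = new Put(Bytes.toBytes("cust#12345"));
            put.addColumn(Bytes.toBytes("m"), Bytes.toBytes("total_spend"), Bytes.toBytes("482.75"));
            put.addColumn(Bytes.toBytes("m"), Bytes.toBytes("visit_count"), Bytes.toBytes("17"));

            table.put(put);
        }
    }
}
```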

Environment: Hadoop, HDFS, HBase, Spark, Scala, Hive, MapReduce, Sqoop, ETL, Java, PL/SQL, Oracle 11g, Unix/Linux.

Confidential

Manager

Responsibilities:

  • Maintaining sites and the workforce.
  • Liaising with clients and reporting on progress to staff and the public.
  • Supervising construction workers and hiring subcontractors.
  • Buying materials for each phase of the project.
  • Monitoring build costs and project progress.
  • Checking and preparing site reports, designs and drawings.
  • Maintaining quality control checks.
  • Day-to-day problem solving and dealing with any issues that arise.
  • Working on-site at clients’ businesses or in a site office.

Confidential

Java Developer

Responsibilities:

  • Implemented the presentation layer with HTML, CSS and JavaScript
  • Developed web components using JSP, Servlets and JDBC
  • Implemented secured cookies using Servlets.
  • Wrote complex SQL queries and stored procedures.
  • Implemented Persistent layer using Hibernate API
  • Implemented search queries using the Hibernate Criteria interface (see the sketch after this list).
  • Used CSS to build a clean user interface.
  • Provided support for loans reports for CB&T
  • Designed and developed Loans reports for Evans bank using Jasper and iReport.
  • Involved in fixing bugs and unit testing with test cases using JUnit.
  • Performed object-oriented analysis and design using UML, including the development of class, sequence and state diagrams, and created these diagrams in Microsoft Visio.
  • Maintained the Jasper server on the client server and resolved issues.
  • Actively involved in system testing.
  • Fine-tuned SQL queries for maximum efficiency to improve performance.
  • Designed tables and indexes following normalization principles.
  • Involved in Unit testing, Integration testing and User Acceptance testing
  • Utilized Java and SQL daily to debug and fix issues with client processes.
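
A minimal sketch of a Hibernate Criteria search query of the kind mentioned above, using the classic Criteria API; the Loan entity, its properties and the filter values are hypothetical.

```java
import java.util.List;

import org.hibernate.Criteria;
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;
import org.hibernate.criterion.Order;
import org.hibernate.criterion.Restrictions;

// Hypothetical Hibernate-mapped entity (mapping and getters/setters omitted for brevity)
class Loan {
    private Long id;
    private String status;
    private double amount;
}

public class LoanSearchDao {
    // SessionFactory built from hibernate.cfg.xml on the classpath
    private final SessionFactory sessionFactory =
            new Configuration().configure().buildSessionFactory();

    @SuppressWarnings("unchecked")
    public List<Loan> findApprovedLoansAbove(double minAmount) {
        Session session = sessionFactory.openSession();
        try {
            // Build the search query with Restrictions and ordering
            Criteria criteria = session.createCriteria(Loan.class);
            criteria.add(Restrictions.eq("status", "APPROVED"));
            criteria.add(Restrictions.gt("amount", minAmount));
            criteria.addOrder(Order.desc("amount"));
            return criteria.list();
        } finally {
            session.close();
        }
    }
}
```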

Environment: Java, Servlets, HTML, JavaScript, JSP, Hibernate, JUnit, Oracle DB, SQL, Jasper Reports, iReport, Maven, Jenkins.
