Hadoop/Spark Developer Resume

New York City, NY

PROFESSIONAL SUMMARY:

  • IT professional with 8+ years of experience in software design, development, deployment and maintenance of data analytics applications in the health, insurance, finance and retail sectors.
  • 5+ years of experience building high-performance Big Data applications, primarily using Hadoop ecosystem tools and the Spark framework.
  • Solid understanding of the architecture of Hadoop ecosystem components: MapReduce, Pig, Hive, HBase, Flume, Sqoop, Hue, Oozie, Spark and Kafka.
  • Good hands-on experience working with various Hadoop distributions, mainly Cloudera (CDH), Hortonworks (HDP) and Amazon EMR.
  • Expertise in developing production-ready Spark applications using the Spark Core, DataFrame, Spark SQL, Spark ML and Spark Streaming APIs.
  • Strong experience troubleshooting failures in Spark applications and fine-tuning them for better performance.
  • Experience using DStreams in Spark Streaming, accumulators, broadcast variables, various caching levels and optimization techniques in Spark.
  • Strong experience working with both batch and real-time processing using the Spark framework.
  • Worked extensively on Hive for building complex data analytical applications.
  • Strong experience writing complex MapReduce jobs, including development of custom InputFormats and custom RecordReaders.
  • Sound knowledge of map-side joins, reduce-side joins, shuffle and sort, distributed cache, compression techniques, and multiple Hadoop input and output formats.
  • Worked extensively on Sqoop for performing bulk and incremental ingestion of large datasets from Teradata to HDFS.
  • Good experience working with AWS Cloud services such as S3, EMR, Redshift and Athena.
  • Deep understanding of performance tuning and partitioning for optimizing Spark applications.
  • Worked on building real-time data workflows using Kafka, Spark Streaming and HBase.
  • Extensive knowledge of NoSQL databases such as HBase, Cassandra and MongoDB.
  • Strong working experience in extracting, wrangling, ingesting, processing, storing, querying and analyzing structured, semi-structured and unstructured data.
  • Solid understanding of Hadoop MRv1 and MRv2 (YARN) architecture.
  • Solid experience working with CSV, text, sequence file, Avro, Parquet, ORC and JSON data formats.
  • Extensive experience performing ETL on structured and semi-structured data using Pig Latin scripts.
  • Designed and implemented Hive and Pig UDFs in Java for evaluating, filtering, loading and storing data.
  • Created Talend mappings to populate dimension and fact tables.
  • Experience with RDBMS like SQL Server, MySQL, Oracle and data warehouses like Teradata and Netezza.
  • Experienced in job workflow scheduling and monitoring tools like Oozie.
  • Proficient, hands-on experience writing shell scripts on Linux.
  • Expertise in core Java and object-oriented design.
  • Developed core modules in large cross-platform applications using Java, JSP, Servlets, Hibernate, RESTful services, JDBC, JavaScript, XML and HTML.
  • Extensive experience developing and deploying applications on WebLogic, Apache Tomcat and JBoss.
  • Development experience with RDBMS, including writing SQL queries, views, stored procedures and triggers.
  • Strong understanding of the Software Development Lifecycle (SDLC) and various methodologies (Waterfall, Agile).

TECHNICAL SKILLS:

Big Data Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Hue, Ambari, ZooKeeper, Kafka, Apache Spark, Storm

Hadoop Distributions: CDH, HDP, AWS EMR

Languages: Java, Scala, Python, SQL, Pig Latin, HiveQL

IDE Tools: Eclipse, NetBeans, IntelliJ

Web Technologies: HTML, CSS, JavaScript, XML, JSP, RESTful

Operating Systems: Windows (XP, 7, 8, 10), UNIX, Linux, Ubuntu, CentOS

Reporting/ETL Tools: Tableau, Power View for Microsoft Excel, Talend

Databases: Oracle, SQL Server, MySQL, MS Access, NoSQL databases (HBase, Cassandra, MongoDB)

Build Automation Tools: SBT, Ant, Maven

PROFESSIONAL EXPERIENCE:

Confidential, New York City, NY

Hadoop/Spark Developer

Responsibilities:

  • Developed custom input adapters in Java for moving the data from raw sources (FTP, S3) to HDFS.
  • Developed Spark applications using Scala to perform data cleansing, data validation, data transformations and other enrichments.
  • Worked extensively on making Spark applications production-ready by applying best practices for scalability and fault tolerance.
  • Developed many Spark applications for data cleansing, event enrichment, data aggregation, de-normalization and the data preparation needed for machine learning exercises.
  • Troubleshot Spark applications to make them more error-tolerant.
  • Fine-tuned Spark applications to improve the overall processing time of the pipelines.
  • Wrote Kafka producers to stream data from an external REST API to Kafka topics.
  • Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase.
  • Used the DataFrame and Spark SQL APIs extensively.
  • The data pipeline consisted of Sqoop, custom-built input adapters, Spark and Hive.
  • Performed Hive modeling and wrote many Hive scripts for the data preparation needed to run machine learning models.
  • Worked closely with the data science team on automating and productionizing models such as logistic regression and k-means using Spark ML.
  • Built a working prototype of a real-time workflow for streaming user events from external applications.
  • Used Kafka and Spark Streaming to build the real-time pipeline.
  • Converted existing MapReduce jobs to Spark jobs.
  • Developed Oozie workflows to automate and productionize the data pipelines.
  • Implemented Logistic Regression and K-Means models and automated them to run in production.

Environment: Cloudera Distribution, Hadoop, HDFS, Spark, Scala, Kafka, HBase, Oozie, Hive, Flume, Sqoop, Java, SQL, Oracle 11g, Unix/Linux

Confidential, Bethlehem, PA

Hadoop Developer

Responsibilities:

  • Member of the Big Data Center of Excellence (CoE), responsible for designing and building enterprise data analytics solutions.
  • Worked with the respective business units to understand the scope of the analytics requirements.
  • Performed core ETL transformations in Spark.
  • Automated data pipelines involving data ingestion, data cleansing, data preparation and data analytics.
  • Created end-to-end Spark applications in Scala to perform data cleansing, validation, transformation and summarization on user behavioral data.
  • Converted existing MapReduce jobs into Spark transformations and actions using Spark RDDs, DataFrames and the Spark SQL API.
  • Developed an end-to-end data pipeline using an FTP adapter, Spark, Hive and Impala.
  • Used Scala to write code for all Spark use cases.
  • Implemented design patterns in Scala for the application.
  • Implemented Spark applications in Scala and used Spark SQL heavily for faster development and data processing.
  • Explored Spark for improving the performance of, and optimizing, existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs and YARN.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark SQL and Scala.
  • Imported enterprise data from different data sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and loaded the data into HBase tables.
  • Exported the analyzed data to relational databases using Sqoop for further visualization and report generation for the BI team.
  • Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Created components such as Hive UDFs to supply functionality missing from Hive for analytics.
  • Worked on performance optimizations such as using distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
  • Created Oozie workflows and coordinators to run data pipelines on daily, weekly and monthly schedules.
  • Automated cluster creation and termination in AWS.

Environment: Hortonworks, HDFS, Hive, Sqoop, Flume, Spark, Scala, HBase, Kafka, Impala, Oozie, Oracle 11g, YARN, UNIX Shell Scripting, Agile Methodology

Confidential, Minnetonka, MN

Hadoop Developer

Responsibilities:

  • Worked with the business team to gather requirements and participated in Agile planning meetings to finalize the scope of each development effort.
  • Responsible for building scalable distributed data solutions on CDH.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Implemented data pipelines by developing multiple mappers using the ChainMapper API.
  • Developed multiple MapReduce batch jobs in Java for loading data to HDFS in sequence file format.
  • Ingested structured data from a wide array of RDBMSs to HDFS as incremental imports using Sqoop.
  • Involved in writing Pig scripts to wrangle the raw data, store it in HDFS, and load it into Hive tables using HCatalog.
  • Configured Flume agents on different data sources to capture the streaming log data from the web servers.
  • Implemented Flume (multiplexing) to stream data from upstream pipes into HDFS.
  • Created Hive external tables with clustering and date-based partitioning to optimize the performance of ad-hoc queries.
  • Involved in writing HiveQL scripts on Beeline, Impala and the Hive CLI for consumer data analysis to meet business requirements.
  • Exported data from Hive to DWH using Sqoop.
  • Worked with different file formats and compression techniques to ensure optimal performance of Hive queries.
  • Involved in creating Hive tables from a wide range of data formats such as CSV, text, sequence file, Avro, Parquet, ORC, JSON and custom formats using SerDes.
  • Transformed the semi-structured log data to fit into the schema of the Hive tables using Pig.
  • Involved in scheduling Oozie workflows to run multiple Hive and Pig jobs.
  • Involved in testing and in preparing low-level and high-level design documentation for the business requirements.

Environment: Cloudera Hadoop, Eclipse, MapReduce, Java, Sqoop, Pig, Oozie, Hive, Flume, CentOS, MySQL, Oracle DB.

Confidential, Overland Park, KS

Hadoop Engineer

Responsibilities:

  • Imported data from relational data sources such as Teradata and Oracle to HDFS using Sqoop.
  • Imported bulk data into HBase tables using MapReduce programs.
  • Inserted time series data into HBase using the HBase Java API.
  • Designed and implemented incremental imports into Hive tables.
  • Used the REST API to access HBase data for analytics.
  • Worked on loading and transforming large sets of structured, semi-structured and unstructured data.
  • Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
  • Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Experienced in managing and reviewing the Hadoop log files.
  • Migrated ETL jobs to Pig scripts that perform transformations, joins and some pre-aggregations before storing the data in HDFS.
  • Used the Avro data serialization system to work with JSON data formats.
  • Worked on different file formats such as sequence files, XML files and map files using MapReduce programs.
  • Involved in unit testing and delivered unit test plans and results documents using JUnit and MRUnit.
  • Exported data from the HDFS environment into RDBMS using Sqoop for report generation and visualization purposes.
  • Worked on Oozie workflow engine for job scheduling.

Environment: Hadoop, HDFS, MapReduce, Hive, Teradata, Oozie, Sqoop, Pig, Java, REST API, Maven, MRUnit, JUnit.

Confidential, Richfield, MN

Java/Hadoop Developer

Responsibilities:

  • Developed web applications by coordinating requirements, user stories, use cases, screen mockups, schedules, and activities.
  • Worked closely with client business stakeholders on Agile development teams.
  • Supported users by developing documentation and assistance tools.
  • Developed the presentation layer using the Spring Framework, with multiple Spring modules such as Spring MVC and JDBC.
  • Implemented RESTful web services using Jersey to integrate different application components.
  • Developed RESTful web services for transmission of data in JSON/XML format.
  • Involved in writing SQL queries, functions, views, triggers and stored procedures in an Oracle relational database.
  • Used Sqoop to ingest structured data from an Oracle database to HDFS.
  • Involved in writing and running MapReduce batch jobs in Java for data wrangling on the cluster.
  • Developed map-side and reduce-side joins using distributed cache on various data sets.
  • Developed Pig Latin scripts to transform the data according to the business requirements.
  • Developed Pig UDFs in Java, extending eval and filter functions, to filter semi-structured data.

Environment: Java, J2EE, Eclipse, JSP, Servlets, Spring, JavaScript, HTML, RESTful, shell scripting, XML, Oracle 10g, Cloudera Hadoop, MapReduce, Pig, HDFS.

Confidential

Java/J2EE Developer

Responsibilities:

  • Involved in analysis, design and development of web applications based on J2EE.
  • Used the Struts framework to manage navigation and page flow.
  • Developed an EJB session bean that acts as a facade, accessing the business entities through their local home interfaces.
  • Designed the user interface using HTML, CSS, JavaScript and jQuery.
  • Used Log4j to debug and generate new logs for the application.
  • Used JDBC to access data in the Oracle database. Created database tables and stored procedures using PL/SQL in Oracle DB.
  • Implemented client-side validation on web forms as per the requirements.
  • Developed code to convert JSON data to custom JavaScript objects.
  • Developed Servlets and JSPs based on MVC pattern using Struts framework.
  • Provided support for production and implementation issues. Involved in end-user/client training on the application.
  • Performed unit tests on the application to verify and identify various scenarios.
  • Used Eclipse for development, testing and code review.
  • Involved in the release management process to QA/UAT/Production regions.
  • Used Maven to build the application EAR for deployment on WebLogic application servers.
  • Developed the project in an Agile environment.

Environment: J2EE, Java, Eclipse, EJB, Java Beans, JDBC, JSP, Struts, Design Patterns, BEA WebLogic, PL/SQL, DB2, UML, CVS, JUnit, Log4j.

Confidential

Java Developer

Responsibilities:

  • Involved in various phases of the Software Development Life Cycle (SDLC) of the application, such as requirements gathering, design, development and documentation.
  • The application was designed using J2EE design patterns and technologies based on MVC architecture.
  • Involved in designing the user interfaces using HTML and CSS, with JavaScript for client-side validation.
  • Developed custom tags and used JSTL to support custom user interfaces.
  • Handled business logic in the model using helper classes, and used Servlets as the controller to manage application flow and server-side validation.
  • Involved in Servlet and JavaBean programming on the server side for communication between clients and the server.
  • Involved in implementing web services to integrate different applications (internal and third-party components) using RESTful services.
  • Involved in writing unit tests covering positive and negative test cases.
  • Developed the Maven scripts for preparing WAR files used to deploy J2EE components.
  • Created tables, views, triggers and stored procedures on MySQL server for data manipulation and retrieval.
  • Used JDBC for database connectivity and to invoke stored procedures.
  • Used Log4j to capture logs, including runtime exceptions.
  • Involved in bug fixing and functionality enhancements.
  • Developed the project using the Waterfall model.

Environment: J2EE, Java, UNIX, Red Hat, PuTTY, MVC, JSP, JDBC, Eclipse IDE, Apache Tomcat, CSS, HTML, JavaScript, SQL Server.
