
Hadoop/Spark Developer Resume


San Mateo, CA

SUMMARY

  • 8+ years of cumulative IT experience in software development, ETL and Big Data, with an excellent understanding of and hands-on experience in the Hadoop framework.
  • 3+ years of experience as a Big Data developer working with the Hadoop ecosystem and various Big Data analytical tools.
  • Writing UDFs (User Defined Functions), UDAFs (User Defined Aggregate Functions) and UDTFs (User Defined Table Functions) for Hive and Spark (see the UDF sketch after this summary).
  • Creating internal (managed) and external tables in Hive to improve query performance.
  • Tuning the Catalyst optimizer in Spark to improve query execution plans.
  • Tuning the garbage collector (Eden, Survivor 1, Survivor 2 spaces) for memory optimization.
  • Memory optimization with Kryo and custom serialization techniques.
  • Providing high availability with ZooKeeper.
  • Providing security with Kerberos authentication.
  • Experience in developing applications using Hadoop ecosystem components such as Spark Streaming, Spark SQL, Hive, Flume, Kafka, Sqoop and HBase.
  • Writing Python scripts to scrape data from sources, store it in S3 buckets and run analytics on it.
  • Extensive knowledge of Hadoop and Spark architecture and core components.
  • Knowledge of containerization technologies such as Docker.
  • Knowledge of orchestration platforms such as Kubernetes.
  • Experience in writing queries for moving data from HDFS to Hive and analyzing the data using HiveQL.
  • Experience in importing and exporting data with Sqoop between relational database systems (RDBMS), Hive, HBase and HDFS; experience in creating a Hive context in spark-shell.
  • Creating connections between Kafka and Flume using connection pipelines.
  • Created Sqoop jobs with incremental load to populate Hive external tables.
  • Experience in creating Kafka connectors to ingest data into Spark Streaming.
  • Creating SNS (Simple Notification Service) topics in AWS.
  • Analyzed large data sets by writing Hive queries.
  • Knowledge of Hive partitioning, bucketing, join optimizations and query optimizations.
  • Involved in converting Hive SQL queries into Spark SQL.
  • Knowledge of Spark Core, Spark SQL, Spark Streaming, DataFrames and RDDs, using both the Spark Python API and Scala.
  • Experience in using DStreams, accumulators, broadcast variables and RDD caching for Spark Streaming.
  • Knowledge of Spark SQL UDFs and RDD partitioning.
  • Experience in Extraction, Transformation and Loading (ETL) of data with different file formats such as CSV, text, sequence, Parquet, XML, JSON and Avro files, based on business requirements.
  • Familiarity with the Hadoop architecture, data ingestion pipeline design, data mining and modeling, advanced data processing and machine learning.
  • Ability to plan, manage, motivate and work efficiently, both independently and as part of a team.
  • Good exposure to the overall SDLC, including requirement gathering, development, testing, debugging, deployment, documentation and production support.
  • Experience working with software methodologies such as Agile, Prototype and Waterfall.
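
A minimal sketch of the Spark/Hive UDF work listed above, assuming a Spark 2.x session; the trades table, its symbol column and the UDF name are illustrative placeholders rather than details from any specific project:

    import org.apache.spark.sql.SparkSession

    object UdfSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("udf-sketch").getOrCreate()

        // Register a simple UDF that normalizes ticker symbols to upper case.
        spark.udf.register("normalize_symbol", (s: String) =>
          if (s == null) null else s.trim.toUpperCase)

        // Call the UDF from Spark SQL, much as a Hive query would call a Hive UDF.
        spark.sql(
          """SELECT normalize_symbol(symbol) AS sym, COUNT(*) AS cnt
            |FROM trades
            |GROUP BY normalize_symbol(symbol)""".stripMargin).show()

        spark.stop()
      }
    }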

TECHNICAL SKILLS

Hadoop Ecosystem: Spark 2.0, Hive, MapReduce, Sqoop, Flume, Kafka.

Apache Spark: Spark SQL, Spark Streaming.

Languages: Scala 2.11, Python 2.7, SQL, Unix Shell Scripting.

Java technologies: Java, J2EE, Servlets, JDBC, Hibernate.

Databases: MS SQL Server 2012, MySQL.

No-SQL: HBase, Cassandra.

Hadoop Distribution: Cloudera 5.4, Hortonworks 2.5.

Development tools: Eclipse 4.5, IntelliJ IDEA.

Operating Systems: UNIX, Windows XP/7/10, OS X (El Capitan).

Cloud: AWS (Amazon Web Services), Databricks Cloud.

Software Development: Agile, Scrum.

File Formats: JSON, Parquet, CSV, Sequence.

Tracker: Jira, Pivotal Tracker.

Data Modeling: Erwin.

Containers: Docker, Kubernetes.

BI Reporting Tools: Tableau.

Web Services: RESTful, SOAP.

Authentication: Kerberos.

Serialization Techniques: Java Serialization, Kryo Serialization.

PROFESSIONAL EXPERIENCE

Confidential, San Mateo, CA

Hadoop/Spark Developer

Responsibilities:

  • Involved in the overall architecture design for the system.
  • Perform ELT jobs and automate ELT operations using Autosys or cron jobs.
  • Writing Python scripts for scraping data from various stock market websites.
  • Developing Scala scripts for Spark Streaming to identify share market trends.
  • Developing Python scripts for data filtering and cleansing.
  • Data ingestion from multiple sources such as RDBMS, Amazon S3 and web logs.
  • Performed Sqoop incremental import jobs, shell scripts and cron jobs for importing data into AWS S3.
  • Imported data from RDBMS tables into Hive using Hive commands.
  • Created Hive partitions and buckets to improve performance and optimize queries for low latency.
  • Developed a shell script that dynamically downloads Amazon S3 data files into HDFS.
  • Configuring AWS SNS for data ingestion into a Flume agent when data is added to MySQL, and writing MySQL triggers for the notification system.
  • Configuring Amazon Kinesis pipelines to S3 storage for data ingestion.
  • Created Datasets and DataFrames for data transformation.
  • Creating DStreams for processing micro-batches in Spark Streaming (see the streaming sketch after this list).
  • Implemented incremental import for S3 CSV files.
  • Configuring Kryo serialization for efficient data transmission across the network (see the Kryo configuration sketch after this list).
  • Using different SerDe techniques to improve performance.
  • Handling different data formats such as CSV, JSON and sequence files; knowledge of Avro and Parquet file formats.
  • Performed experiments with the latest Spark features such as Structured Streaming and BlinkDB.
  • Migrating data from HDFS to AWS S3 using DistCp.
  • Used Agile development process and practices.
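
A minimal sketch of the DStream processing referenced in this list, assuming the spark-streaming-kafka-0-10 integration; the broker address, the quotes topic, the consumer group id and the keying of records are illustrative assumptions:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    object TrendStreamSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("market-trend-stream")
        val ssc = new StreamingContext(conf, Seconds(10)) // 10-second micro-batches

        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "localhost:9092",       // placeholder broker
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "trend-consumer",                // placeholder group id
          "auto.offset.reset" -> "latest")

        // DStream over a hypothetical "quotes" topic keyed by ticker symbol.
        val quotes = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("quotes"), kafkaParams))

        // Count quotes per symbol within each micro-batch.
        quotes.map(record => (record.key, 1L))
          .reduceByKey(_ + _)
          .print()

        ssc.start()
        ssc.awaitTermination()
      }
    }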
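
A minimal sketch of the Kryo serialization setup mentioned in this list; the Quote and Trade case classes are placeholder application types standing in for whatever records are shuffled or cached:

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    object KryoSketch {
      // Placeholder application record types.
      case class Quote(symbol: String, price: Double)
      case class Trade(symbol: String, qty: Long, price: Double)

      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("kryo-sketch")
          // Replace the default Java serializer with Kryo.
          .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        // Register application classes so Kryo writes compact class ids instead of full class names.
        conf.registerKryoClasses(Array(classOf[Quote], classOf[Trade]))

        val spark = SparkSession.builder().config(conf).getOrCreate()
        // ... jobs run here with Kryo-serialized shuffle and cache data ...
        spark.stop()
      }
    }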

Confidential, Denver, CO

Hadoop Developer

Responsibilities:

  • Developed data pipelines using Sqoop and Flume to store data in HDFS for further processing with Spark.
  • Creating Hive tables with periodic backups and writing complex Hive queries to run on Spark.
  • Implemented partitioning and bucketing in Hive, using file formats and compression techniques with optimizations (see the partitioned-table sketch after this section).
  • Created Hive generic UDFs to process business logic that varies based on policy.
  • Experience in customizing the MapReduce framework at different levels, such as input formats, data types, custom SerDes and partitioners.
  • Pushed the data to RDBMS systems at a mount location for Tableau to import for reporting.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
  • Implemented Spark to migrate MapReduce jobs into Spark RDD transformations and streamed data using Spark Streaming.
  • Developed Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster data processing.
  • Configured build scripts for multi-module projects with Maven.
  • Automated the process of scheduling workflows using Oozie and Autosys.

Environment: Hadoop, Cloudera, HDFS, Hive, Spark, Sqoop, Flume, Java, Scala, Shell-script, Impala, Eclipse, MySQL.
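
A minimal sketch of the Hive partitioning mentioned in this section, run through Spark with Hive support; the policy_events table, its columns and the warehouse location are placeholders, not the actual schema:

    import org.apache.spark.sql.SparkSession

    object HiveLayoutSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hive-layout-sketch")
          .enableHiveSupport()   // use the Hive metastore for table definitions
          .getOrCreate()

        // External Hive table partitioned by date; name, columns and location are illustrative.
        spark.sql(
          """CREATE EXTERNAL TABLE IF NOT EXISTS policy_events (
            |  customer_id BIGINT,
            |  event_type  STRING,
            |  amount      DOUBLE)
            |PARTITIONED BY (event_date STRING)
            |STORED AS PARQUET
            |LOCATION '/warehouse/policy_events'""".stripMargin)

        // Filtering on the partition column prunes the scan to a single date directory.
        spark.sql(
          """SELECT event_type, SUM(amount) AS total
            |FROM policy_events
            |WHERE event_date = '2017-01-15'
            |GROUP BY event_type""".stripMargin).show()

        spark.stop()
      }
    }

Bucketing would typically be layered on top of this by adding a CLUSTERED BY clause on a join or sampling key in the Hive DDL.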

Confidential, Chicago, IL

Hadoop Developer

Responsibilities:

  • Using Flume to gather data from the source (user log data).
  • Filtered the data and ingested it into appropriate schemas and tables to support the rules and analytics.
  • Developed custom User Defined Functions (UDFs) in Hive to transform large volumes of data according to business requirements.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Involved in loading data from edge node to HDFS using shell scripting.
  • Developed Spark code using Scala and Spark-SQL for faster testing and processing of data.
  • Implemented scripts for loading data from UNIX file system to HDFS.
  • Implemented a script to transmit sysprin information from MySQL to Hive and HBase.
  • Experience in loading and transforming large sets of structured data.
  • Automated workflow using Shell Scripts.
  • Good experience in Hive partitioning, bucketing and performing different types of joins on Hive tables.
  • Experience in Hadoop 2.x with Spark and Scala.
  • Managed Hadoop jobs using Oozie workflow scheduler system for Map Reduce, Hive, Pig and Sqoop actions.
  • Good knowledge of data ingestion and data processing.
  • Used Spark SQL to process large amounts of structured data.
  • Experience in managing and reviewing Hadoop log files.
  • Used Oozie workflow engine to run multiple Hive jobs.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN (see the pair RDD sketch after this section).
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Responsible for managing test data coming from different sources.
  • Responsible for developing batch process using Unix Shell Scripting.

Environment: Apache Spark, Scala, Hadoop, HDFS, Hive, Sqoop, HBase, Unix, Kafka, Oozie, Cloudera CDH 5.x.
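
A minimal sketch of the pair RDD and Spark-on-YARN work described in this section, assuming tab-delimited user-log lines landed in HDFS by Flume; the input path and field positions are illustrative:

    import org.apache.spark.sql.SparkSession

    object LogAggregationSketch {
      def main(args: Array[String]): Unit = {
        // Typically submitted with spark-submit --master yarn.
        val spark = SparkSession.builder().appName("user-log-aggregation").getOrCreate()
        val sc = spark.sparkContext

        // Raw user-log lines previously landed in HDFS by Flume (placeholder path).
        val logs = sc.textFile("hdfs:///data/user_logs/*")

        // Pair RDD keyed by user id (assumed to be the first tab-delimited field).
        val eventsPerUser = logs
          .map(_.split("\t"))
          .filter(_.length > 1)
          .map(fields => (fields(0), 1L))
          .reduceByKey(_ + _)

        // Cache before reuse in several downstream reports.
        eventsPerUser.cache()
        eventsPerUser.take(20).foreach(println)

        spark.stop()
      }
    }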

Confidential, Chicago, IL

ETL Developer

Responsibilities:

  • Involved in the reverse engineering of the existing DCS system to gather the ETL requirements.
  • Analyzed various business rules engines available in the market and provided the list of features to the client.
  • Involved in the preparation of the high-level ETL architecture document.
  • Creating ETL jobs to extract data from NUL legacy file systems.
  • Data cleansing, noise reduction and improvement of data quality.
  • Stored the extracted data in data warehouse staging tables.
  • Involved in the analysis of the Paymaster source system.
  • Involved in the build of Informatica mappings.
  • Responsible for the creation of unit test plans and system test plans.
  • Involved in creating and reviewing the ETL test cases for both unit and system testing.

Environment: Informatica 8.6, Teradata, Tableau, Unix, Windows NT, Oracle 9i, SQL Server, DB2.

Confidential

Software Developer

Responsibilities:

  • Involved in Design, Development and Support phases of Software Development Life Cycle (SDLC)
  • Reviewed the functional, design, source code and test specifications.
  • Involved in complete front-end development using JavaScript and CSS.
  • Authored the functional, design and test specifications.
  • Developed web components using JSP, Servlets and JDBC.
  • Designed tables and indexes.
  • Implementing change requests.
  • Designed, Implemented, Tested and Deployed Enterprise Java Beans both Session and Entity using WebLogic as Application Server.
  • Developed stored procedures, packages and database triggers to enforce data integrity. Performed data analysis and created crystal reports for user requirements.
  • Implemented the presentation layer with HTML, XHTML and JavaScript.
  • Implemented the backend, configuration DAO and XML generation modules of DIS.
  • Analyzed, designed and developed the component.
  • Used JDBC/ODBC Connection for database access.
  • Used Spring Framework for developing the application and used JDBC to map to Oracle database.
  • Unit testing and rigorous integration testing of the whole application.
  • Wrote and executed test scripts using JUnit.
  • Developed XML parsing tool for regression testing.
  • Prepared the installation guide, customer guide and configuration document, which were delivered to the customer along with the product.

Environment: Java, JavaScript, HTML, CSS, JDK 1.5.1, JDBC, Oracle 10g, XML, XSL, Solaris and UML.
