Hadoop/Spark Developer Resume
San Mateo, CA
SUMMARY
- 8+ years of cumulative IT experience in software development, ETL, and Big Data, with an excellent understanding of and hands-on experience in the Hadoop framework.
- 3+ years of experience as a Big Data developer working with the Hadoop ecosystem and various Big Data analytical tools.
- Writing UDFs (User Defined Functions), UDAFs (User Defined Aggregate Functions), and UDTFs (User Defined Table Functions) for Hive and Spark (a brief sketch follows this summary).
- Creating internal (managed) and external tables in Hive to improve query performance.
- Tuning the Catalyst optimizer in Spark to improve query execution plans.
- Tuning the JVM garbage collector (Eden, Survivor 1, Survivor 2 spaces) for memory optimization.
- Memory optimization with Kryo and custom serialization techniques.
- Providing high availability with ZooKeeper.
- Providing security with Kerberos authentication.
- Experience in developing applications using Hadoop ecosystem components such as Spark Streaming, Spark SQL, Hive, Flume, Kafka, Sqoop, and HBase.
- Writing Python scripts to scrape data from sources, store it in S3 buckets, and run analytics.
- Extensive knowledge of Hadoop and Spark architecture and core components.
- Knowledge of containerization technologies such as Docker.
- Knowledge of orchestration platforms such as Kubernetes.
- Experience in writing queries for moving data from HDFS to Hive and analyzing the data using HiveQL.
- Experience in importing and exporting data with Sqoop between relational database systems (RDBMS), Hive, HBase, and HDFS; experience in creating a Hive context in spark-shell.
- Creating pipelines connecting Kafka and Flume.
- Created Sqoop jobs with incremental load to populate Hive external tables.
- Experience in creating Kafka connectors to ingest data into Spark Streaming.
- Creating SNS (Simple Notification Service) topics in AWS.
- Analyzed large data sets by writing Hive queries.
- Knowledge of Hive partitioning, bucketing, join optimizations, and query optimization.
- Involved in converting HiveQL queries into Spark SQL.
- Knowledge of Spark Core, Spark SQL, Spark Streaming, DataFrames, RDDs, the Spark Python API, and Scala for Spark.
- Experience in using DStreams, accumulators, broadcast variables, and RDD caching in Spark Streaming.
- Knowledge of Spark SQL UDFs and RDD partitioning.
- Experience in Extraction, Transformation, and Loading (ETL) of data in different file formats such as CSV, text, sequence files, Parquet, XML, JSON, and Avro, based on business requirements.
- Familiarity with Hadoop architecture, design of data ingestion pipelines, data mining and modeling, advanced data processing, and machine learning.
- Ability to plan, manage, motivate, and work efficiently both independently and in a team.
- Good exposure to the overall SDLC, including requirements gathering, development, testing, debugging, deployment, documentation, and production support.
- Experience working with software methodologies such as Agile, Prototype, and Waterfall.
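A minimal sketch of registering and using a Spark SQL UDF of the kind listed above, assuming Spark 2.x with Scala; the `mask_email` function, column names, and sample data are illustrative only, not taken from any project described here:

```scala
import org.apache.spark.sql.SparkSession

object UdfSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("udf-sketch")
      .master("local[*]")   // local master for illustration only
      .getOrCreate()
    import spark.implicits._

    // Hypothetical UDF: mask the local part of an e-mail address.
    spark.udf.register("mask_email", (email: String) => email.replaceAll("^[^@]+", "***"))

    val df = Seq("alice@example.com", "bob@example.com").toDF("email")
    df.createOrReplaceTempView("users")

    // The registered UDF is usable from Spark SQL (and HiveQL-style queries).
    spark.sql("SELECT mask_email(email) AS masked FROM users").show()

    spark.stop()
  }
}
```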
TECHNICAL SKILLS
Hadoop Ecosystem: Spark 2.0, HIVE, MapReduce, Sqoop, Flume, Kafka.
Apache Spark: Spark SQL, Spark Streaming.
Languages: Scala 2.11, Python 2.7, SQL, Unix Shell Scripting.
Java technologies: Java, J2EE, Servlets, JDBC, Hibernate.
Databases: MS SQL 2012, MySQL 10g.
NoSQL: HBase, Cassandra.
Hadoop Distribution: Cloudera 5.4, Hortonworks 2.5.
Development tools: Eclipse 4.5, IntelliJ IDEA.
Operating Systems: UNIX, Windows XP/7/10, OS X (El Capitan).
Cloud: AWS (Amazon Web Services), Databricks Cloud.
Software Development: Agile, Scrum.
File Formats: JSON, Parquet, CSV, Sequence.
Tracker: Jira, Pivotal Tracker.
Data Modeling: Erwin.
Containers: Docker, Kubernetes.
BI Reporting Tools: Tableau.
Web Services: RESTful, SOAP.
Authentication: Kerberos.
Serialization Techniques: Java Serialization, Kryo Serialization.
PROFESSIONAL EXPERIENCE
Confidential, San Mateo, CA
Hadoop/Spark Developer
Responsibilities:
- Involved in the overall architecture design for the system.
- Perform ELT jobs and automate ELT operations using Autosys or cron jobs.
- Writing Python scripts for scraping data from various stock market websites.
- Developing Scala scripts for Spark Streaming to identify stock market trends.
- Developing Python scripts for data filtering and cleansing.
- Data ingestion from multiple sources such as RDBMS, Amazon S3, and web logs.
- Performed Sqoop incremental import jobs and wrote shell scripts and cron jobs for importing data into AWS S3.
- Imported data from RDBMS tables into Hive using Hive commands.
- Created Hive partitions and buckets to improve performance and optimize queries for low latency.
- Developed a shell script that dynamically downloads Amazon S3 data files into HDFS.
- Configuring AWS SNS for data ingestion into a Flume agent when data is added to MySQL; writing MySQL triggers for the notification system.
- Configuring Amazon Kinesis pipelines to S3 storage for data ingestion.
- Created Datasets and DataFrames for data transformation.
- Creating DStreams for processing micro-batches in Spark Streaming (see the sketch at the end of this list).
- Implemented incremental import for S3 CSV files.
- Configuring Kryo serialization for efficient data transmission across the network.
- Using different SerDe techniques to improve performance.
- Handling different data formats such as CSV, JSON, and sequence files; knowledge of Avro and Parquet file formats.
- Performed experiments with features from the latest Spark versions, such as Structured Streaming and BlinkDB.
- Migrating data from HDFS to AWS S3 using DistCp.
- Used Agile development process and practices.
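A minimal sketch of the micro-batch DStream processing and Kryo configuration mentioned in the bullets above, assuming Spark 2.x; the socket source, host/port, `Trade` case class, and the "SYMBOL,PRICE" line format are illustrative assumptions rather than details of the actual pipeline:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical record type registered with Kryo for compact serialization.
case class Trade(symbol: String, price: Double)

object StreamingSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("streaming-sketch")
      .setMaster("local[2]") // local master for illustration only
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .registerKryoClasses(Array(classOf[Trade]))

    // 10-second micro-batches.
    val ssc = new StreamingContext(conf, Seconds(10))

    // Illustrative text source; a production job would typically read from Kafka or Flume.
    val lines = ssc.socketTextStream("localhost", 9999)

    // Parse "SYMBOL,PRICE" lines into Trade records.
    val trades = lines
      .map(_.split(","))
      .filter(_.length == 2)
      .map(parts => Trade(parts(0), parts(1).toDouble))

    // Count ticks per symbol within each micro-batch.
    val ticksPerSymbol = trades
      .map(trade => (trade.symbol, 1))
      .reduceByKey(_ + _)

    ticksPerSymbol.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```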
Confidential, Denver, CO
Hadoop Developer
Responsibilities:
- Developed Data pipeline using Sqoop, Flume to store data into HDFS and further processing through Spark.
- Creating Hive tables with periodic backups and writing complex Hive queries to run on Spark.
- Implemented partitioning and bucketing in Hive, using optimized file formats and compression techniques (a brief sketch follows this section).
- Created Hive generic UDFs to process business logic that varies based on policy.
- Experience in customizing the MapReduce framework at different levels, such as input formats, data types, custom SerDes, and partitioners.
- Pushed the data to RDBMS systems at a mount location for Tableau to import for reporting.
- Continuously monitoring and managing the Hadoop cluster using Cloudera Manager.
- Migrated MapReduce jobs to Spark RDD transformations and streamed data using Spark Streaming.
- Developed Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster data processing.
- Configured build scripts for multi-module projects with Maven.
- Automated the process of scheduling workflow using Oozie and Autosys.
Environment: Hadoop, Cloudera, HDFS, Hive, Spark, Sqoop, Flume, Java, Scala, Shell-script, Impala, Eclipse, MySQL.
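A minimal sketch of creating a partitioned, bucketed Hive table and querying it through Spark SQL with Scala, in the spirit of the Hive work above; the `sales` table, its columns, and the ORC/bucketing choices are hypothetical, and Hive support assumes a metastore (hive-site.xml) is available on the classpath:

```scala
import org.apache.spark.sql.SparkSession

object HiveSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-sketch")
      .enableHiveSupport() // requires a Hive metastore / hive-site.xml on the classpath
      .getOrCreate()

    // Hypothetical partitioned and bucketed Hive table stored as ORC.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS sales (
        |  order_id BIGINT,
        |  amount   DOUBLE,
        |  customer STRING)
        |PARTITIONED BY (order_date STRING)
        |CLUSTERED BY (customer) INTO 32 BUCKETS
        |STORED AS ORC""".stripMargin)

    // Query the Hive table through Spark SQL; partition pruning applies to order_date.
    spark.sql(
      """SELECT customer, SUM(amount) AS total
        |FROM sales
        |WHERE order_date = '2017-01-01'
        |GROUP BY customer""".stripMargin).show()

    spark.stop()
  }
}
```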
Confidential, Chicago, IL
Hadoop Developer
Responsibilities:
- Using Flume to gather data from the source (user log data).
- Filter the data and ingest it into appropriate schemas and tables to support the rules and analytics.
- Developed custom User Defined Functions (UDFs) in Hive to transform large volumes of data according to business requirements.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Responsible for building scalable distributed data solutions using Hadoop.
- Involved in loading data from edge node to HDFS using shell scripting.
- Developed Spark code using Scala and Spark-SQL for faster testing and processing of data.
- Implemented scripts for loading data from UNIX file system to HDFS.
- Implemented a script to transmit sysprin information from MySQL to Hive and HBase.
- Experience in loading and transforming large sets of structured data.
- Automated workflows using shell scripts.
- Good experience in Hive partitioning, bucketing, and performing different types of joins on Hive tables.
- Experience in Hadoop 2.x with Spark and Scala.
- Managed Hadoop jobs using the Oozie workflow scheduler for MapReduce, Hive, Pig, and Sqoop actions.
- Good knowledge of data ingestion and data processing.
- Used Spark SQL to process large amounts of structured data.
- Experience in managing and reviewing Hadoop log files.
- Used Oozie workflow engine to run multiple Hive jobs.
- Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN (see the sketch after this section).
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Responsible for managing test data coming from different sources.
- Responsible for developing batch processes using Unix shell scripting.
Environment: Apache Spark, Scala, Hadoop, HDFS, Hive, Sqoop, HBase, Unix, Kafka, Oozie, Cloudera CDH 5.x.
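A minimal sketch of the SparkContext / pair RDD processing referenced above, aggregating user log records with Scala; the HDFS path and the tab-separated log layout are assumptions made for illustration:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LogAggregationSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("log-aggregation-sketch")
    val sc = new SparkContext(conf)

    // Hypothetical tab-separated user log: userId <TAB> action <TAB> timestamp
    val logs = sc.textFile("hdfs:///data/userlogs/*") // illustrative path

    // Build a pair RDD of (userId, 1) and count events per user.
    val eventsPerUser = logs
      .map(_.split("\t"))
      .filter(_.length >= 3)
      .map(fields => (fields(0), 1))
      .reduceByKey(_ + _)
      .cache() // reused by the two actions below

    println(s"Distinct active users: ${eventsPerUser.count()}")

    // Ten most active users in this data set.
    eventsPerUser
      .sortBy(_._2, ascending = false)
      .take(10)
      .foreach { case (user, count) => println(s"$user\t$count") }

    sc.stop()
  }
}
```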
Confidential, Chicago, IL
ETL Developer
Responsibilities:
- Involved in the reverse engineering of the existing DCS system to derive the ETL requirements.
- Analyzed various business rules engines available in the market and provided the list of features to the client.
- Involved in the preparation of the high-level ETL architecture document.
- Creating ETL jobs to extract data from NUL legacy file systems.
- Data cleansing, noise reduction, and improving data quality.
- Stored the extracted data in data warehouse staging tables.
- Involved in the analysis of the Paymaster Source system.
- Involved in the Build of Informatica mappings.
- Responsible for creating unit test plans and system test plans.
- Involved in creating and reviewing ETL test cases for both unit and system testing.
Environment: Informatica 8.6, Teradata, Tableau, Unix, Windows NT, Oracle 9i, SQL Server, DB2.
Confidential
Software Developer
Responsibilities:
- Involved in the Design, Development, and Support phases of the Software Development Life Cycle (SDLC).
- Reviewed the functional, design, source code and test specifications.
- Involved in developing the complete front end using JavaScript and CSS.
- Authored functional, design, and test specifications.
- Developed web components using JSP, Servlets and JDBC.
- Designed tables and indexes.
- Implemented change requests.
- Designed, implemented, tested, and deployed Enterprise JavaBeans (both session and entity beans) using WebLogic as the application server.
- Developed stored procedures, packages, and database triggers to enforce data integrity. Performed data analysis and created Crystal Reports for user requirements.
- Implemented the presentation layer with HTML, XHTML, and JavaScript.
- Implemented Backend, Configuration DAO, XML generation modules of DIS.
- Analyzed, designed and developed the component.
- Used JDBC/ODBC Connection for database access.
- Used Spring Framework for developing the application and used JDBC to map to Oracle database.
- Unit testing and rigorous integration testing of the whole application.
- Wrote and executed test scripts using JUnit.
- Developed XML parsing tool for regression testing.
- Prepared the installation guide, customer guide, and configuration document, which were delivered to the customer along with the product.
Environment: Java, JavaScript, HTML, CSS, JDK 1.5.1, JDBC, Oracle 10g, XML, XSL, Solaris and UML.