Hadoop/Spark Developer Resume
San Mateo, CA
SUMMARY
- 8+ years of cumulative IT experience in software development, ETL, and Big Data, with an excellent understanding of and hands-on experience in the Hadoop framework.
- 3+ years of experience as a Big Data developer working with the Hadoop ecosystem and various Big Data analytical tools.
- Writing UDFs (User Defined Functions), UDAFs (User Defined Aggregate Functions), and UDTFs (User Defined Table Functions) for Hive and Spark (a brief sketch follows this summary).
- Creating internal (managed) and external tables in Hive to improve query performance.
- Tuning the Catalyst optimizer in Spark to improve query execution plans.
- Tuning the JVM garbage collector (Eden, Survivor 1, Survivor 2 spaces) for memory optimization.
- Memory optimization with Kryo and custom serialization techniques.
- Providing high availability with ZooKeeper.
- Providing security with Kerberos authentication.
- Experience in developing applications using Hadoop ecosystem components such as Spark Streaming, Spark SQL, Hive, Flume, Kafka, Sqoop, and HBase.
- Writing Python scripts to scrape data from sources, store it in S3 buckets, and run analytics.
- Extensive knowledge of Hadoop and Spark architecture and core components.
- Knowledge of containerization technologies such as Docker.
- Knowledge of orchestration platforms such as Kubernetes.
- Experience in writing queries for moving data from HDFS to Hive and analyzing the data using HiveQL.
- Experience in importing and exporting data with Sqoop between relational database systems (RDBMS), Hive, HBase, and HDFS; experience in creating a Hive context in spark-shell.
- Creating pipelines connecting Kafka and Flume.
- Created Sqoop jobs with incremental load to populate Hive external tables.
- Experience in creating Kafka connectors to ingest data into Spark Streaming.
- Creating SNS (Simple Notification Service) topics in AWS.
- Analyzed large data sets by writing Hive queries.
- Knowledge of Hive partitioning, bucketing, join optimizations, and query optimization.
- Involved in converting HiveQL queries into Spark SQL.
- Knowledge of Spark Core, Spark SQL, Spark Streaming, DataFrames, RDDs, the Spark Python API, and Scala for Spark.
- Experience in using DStreams, accumulators, broadcast variables, and RDD caching in Spark Streaming.
- Knowledge of Spark SQL UDFs and RDD partitioning.
- Experience in Extraction, Transformation, and Loading (ETL) of data in different file formats such as CSV, text, sequence files, Parquet, XML, JSON, and Avro, based on business requirements.
- Familiarity with Hadoop architecture, design of data ingestion pipelines, data mining and modeling, advanced data processing, and machine learning.
- Ability to plan, manage, motivate, and work efficiently both independently and in a team.
- Good exposure to the overall SDLC, including requirements gathering, development, testing, debugging, deployment, documentation, and production support.
- Experience working with software methodologies such as Agile, Prototype, and Waterfall.
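A minimal sketch of registering and using a Spark SQL UDF of the kind listed above, assuming Spark 2.x with Scala; the `mask_email` function, column names, and sample data are illustrative only, not taken from any project described here:

```scala
import org.apache.spark.sql.SparkSession

object UdfSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("udf-sketch")
      .master("local[*]")   // local master for illustration only
      .getOrCreate()
    import spark.implicits._

    // Hypothetical UDF: mask the local part of an e-mail address.
    spark.udf.register("mask_email", (email: String) => email.replaceAll("^[^@]+", "***"))

    val df = Seq("alice@example.com", "bob@example.com").toDF("email")
    df.createOrReplaceTempView("users")

    // The registered UDF is usable from Spark SQL (and HiveQL-style queries).
    spark.sql("SELECT mask_email(email) AS masked FROM users").show()

    spark.stop()
  }
}
```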
TECHNICAL SKILLS
Hadoop Ecosystem: Spark 2.0, HIVE, MapReduce, Sqoop, Flume, Kafka.
Apache Spark: Spark SQL, Spark Streaming.
Languages: Scala 2.11, Python 2.7, SQL, Unix Shell Scripting.
Java technologies: Java, J2EE, Servlets, JDBC, Hibernate.
Databases: MS SQL 2012, MySQL 10g.
NoSQL: HBase, Cassandra.
Hadoop Distribution: Cloudera 5.4, Hortonworks 2.5.
Development tools: Eclipse 4.5, IntelliJ IDEA.
Operating Systems: UNIX, Windows XP/7/10, OS X (El Capitan).
Cloud: AWS (Amazon Web Services), Databricks Cloud.
Software Development: Agile, Scrum.
File Formats: JSON, Parquet, CSV, Sequence.
Tracker: Jira, Pivotal Tracker.
Data Modeling: Erwin.
Containers: Docker, Kubernetes.
BI Reporting Tools: Tableau.
Web Services: RESTful, SOAP.
Authentication: Kerberos.
Serialization Techniques: Java Serialization, Kryo Serialization.
PROFESSIONAL EXPERIENCE
Confidential, San Mateo, CA
Hadoop/Spark Developer
Responsibilities:
- Involved in the overall architecture design for the system.
- Perform ELT jobs and automate ELT operations using Autosys or cron jobs.
- Writing Python scripts for scraping data from various stock market websites.
- Developing Scala scripts for Spark Streaming to identify stock market trends.
- Developing Python scripts for data filtering and cleansing.
- Data ingestion from multiple sources such as RDBMS, Amazon S3, and web logs.
- Performed Sqoop incremental import jobs and wrote shell scripts and cron jobs for importing data into AWS S3.
- Imported data from RDBMS tables into Hive using Hive commands.
- Created Hive partitions and buckets to improve performance and optimize queries for low latency.
- Developed a shell script that dynamically downloads Amazon S3 data files into HDFS.
- Configuring AWS SNS for data ingestion into a Flume agent when data is added to MySQL; writing MySQL triggers for the notification system.
- Configuring Amazon Kinesis pipelines to S3 storage for data ingestion.
- Created Datasets and DataFrames for data transformation.
- Creating DStreams for processing micro-batches in Spark Streaming (see the sketch at the end of this list).
- Implemented incremental import for S3 CSV files.
- Configuring Kryo serialization for efficient data transmission across the network.
- Using different SerDe techniques to improve performance.
- Handling different data formats such as CSV, JSON, and sequence files; knowledge of Avro and Parquet file formats.
- Performed experiments with features from the latest Spark versions, such as Structured Streaming and BlinkDB.
- Migrating data from HDFS to AWS S3 using DistCp.
- Used Agile development process and practices.
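A minimal sketch of the micro-batch DStream processing and Kryo configuration mentioned in the bullets above, assuming Spark 2.x; the socket source, host/port, `Trade` case class, and the "SYMBOL,PRICE" line format are illustrative assumptions rather than details of the actual pipeline:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical record type registered with Kryo for compact serialization.
case class Trade(symbol: String, price: Double)

object StreamingSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("streaming-sketch")
      .setMaster("local[2]") // local master for illustration only
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .registerKryoClasses(Array(classOf[Trade]))

    // 10-second micro-batches.
    val ssc = new StreamingContext(conf, Seconds(10))

    // Illustrative text source; a production job would typically read from Kafka or Flume.
    val lines = ssc.socketTextStream("localhost", 9999)

    // Parse "SYMBOL,PRICE" lines into Trade records.
    val trades = lines
      .map(_.split(","))
      .filter(_.length == 2)
      .map(parts => Trade(parts(0), parts(1).toDouble))

    // Count ticks per symbol within each micro-batch.
    val ticksPerSymbol = trades
      .map(trade => (trade.symbol, 1))
      .reduceByKey(_ + _)

    ticksPerSymbol.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```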
Confidential, Denver, CO
Hadoop Developer
Responsibilities:
- Developed Data pipeline using Sqoop, Flume to store data into HDFS and further processing through Spark.
- Creating Hive tables with periodic backups and writing complex Hive queries to run on Spark.
- Implemented partitioning and bucketing in Hive, using optimized file formats and compression techniques (a brief sketch follows this section).
- Created Hive generic UDFs to process business logic that varies based on policy.
- Experience in customizing the MapReduce framework at different levels, such as input formats, data types, custom SerDes, and partitioners.
- Pushed the data to RDBMS systems at a mount location for Tableau to import for reporting.
- Continuously monitoring and managing the Hadoop cluster using Cloudera Manager.
- Migrated MapReduce jobs to Spark RDD transformations and streamed data using Spark Streaming.
- Developed Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster data processing.
- Configured build scripts for multi-module projects with Maven.
- Automated the process of scheduling workflow using Oozie and Autosys.
Environment: Hadoop, Cloudera, HDFS, Hive, Spark, Sqoop, Flume, Java, Scala, Shell-script, Impala, Eclipse, MySQL.
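A minimal sketch of creating a partitioned, bucketed Hive table and querying it through Spark SQL with Scala, in the spirit of the Hive work above; the `sales` table, its columns, and the ORC/bucketing choices are hypothetical, and Hive support assumes a metastore (hive-site.xml) is available on the classpath:

```scala
import org.apache.spark.sql.SparkSession

object HiveSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-sketch")
      .enableHiveSupport() // requires a Hive metastore / hive-site.xml on the classpath
      .getOrCreate()

    // Hypothetical partitioned and bucketed Hive table stored as ORC.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS sales (
        |  order_id BIGINT,
        |  amount   DOUBLE,
        |  customer STRING)
        |PARTITIONED BY (order_date STRING)
        |CLUSTERED BY (customer) INTO 32 BUCKETS
        |STORED AS ORC""".stripMargin)

    // Query the Hive table through Spark SQL; partition pruning applies to order_date.
    spark.sql(
      """SELECT customer, SUM(amount) AS total
        |FROM sales
        |WHERE order_date = '2017-01-01'
        |GROUP BY customer""".stripMargin).show()

    spark.stop()
  }
}
```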
Confidential, Chicago, IL
Hadoop Developer
Responsibilities:
- Using Flume to gather data from the source (user log data).
- Filter the data and ingest it into appropriate schemas and tables to support the rules and analytics.
- Developed custom User Defined Functions (UDFs) in Hive to transform large volumes of data according to business requirements.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Responsible for building scalable distributed data solutions using Hadoop.
- Involved in loading data from edge node to HDFS using shell scripting.
- Developed Spark code using Scala and Spark-SQL for faster testing and processing of data.
- Implemented scripts for loading data from UNIX file system to HDFS.
- Implemented a script to transmit sysprin information from MySQL to Hive and HBase.
- Experience in loading and transforming large sets of structured data.
- Automated workflows using shell scripts.
- Good experience in Hive partitioning, bucketing, and performing different types of joins on Hive tables.
- Experience in Hadoop 2.x with Spark and Scala.
- Managed Hadoop jobs using the Oozie workflow scheduler for MapReduce, Hive, Pig, and Sqoop actions.
- Good knowledge of data ingestion and data processing.
- Used Spark SQL to process large amounts of structured data.
- Experience in managing and reviewing Hadoop log files.
- Used Oozie workflow engine to run multiple Hive jobs.
- Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN (see the sketch after this section).
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Responsible for managing test data coming from different sources.
- Responsible for developing batch processes using Unix shell scripting.
Environment: Apache Spark, Scala, Hadoop, HDFS, Hive, Sqoop, HBase, Unix, Kafka, Oozie, Cloudera CDH 5.x.
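A minimal sketch of the SparkContext / pair RDD processing referenced above, aggregating user log records with Scala; the HDFS path and the tab-separated log layout are assumptions made for illustration:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LogAggregationSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("log-aggregation-sketch")
    val sc = new SparkContext(conf)

    // Hypothetical tab-separated user log: userId <TAB> action <TAB> timestamp
    val logs = sc.textFile("hdfs:///data/userlogs/*") // illustrative path

    // Build a pair RDD of (userId, 1) and count events per user.
    val eventsPerUser = logs
      .map(_.split("\t"))
      .filter(_.length >= 3)
      .map(fields => (fields(0), 1))
      .reduceByKey(_ + _)
      .cache() // reused by the two actions below

    println(s"Distinct active users: ${eventsPerUser.count()}")

    // Ten most active users in this data set.
    eventsPerUser
      .sortBy(_._2, ascending = false)
      .take(10)
      .foreach { case (user, count) => println(s"$user\t$count") }

    sc.stop()
  }
}
```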
Confidential, Chicago, IL
ETL Developer
Responsibilities:
- Involved in the reverse engineering of the existing DCS system to derive the ETL requirements.
- Analyzed various business rules engines available in the market and provided the list of features to the client.
- Involved in the preparation of the high-level ETL architecture document.
- Creating ETL jobs to extract data from NUL legacy file systems.
- Data cleansing, noise reduction, and improving data quality.
- Stored the extracted data in data warehouse staging tables.
- Involved in the analysis of the Paymaster Source system.
- Involved in the Build of Informatica mappings.
- Responsible for creating unit test plans and system test plans.
- Involved in creating and reviewing ETL test cases for both unit and system testing.
Environment: Informatica 8.6, Teradata, Tableau, Unix, Windows NT, Oracle 9i, SQL Server, DB2.
Confidential
Software Developer
Responsibilities:
- Involved in the Design, Development, and Support phases of the Software Development Life Cycle (SDLC).
- Reviewed the functional, design, source code and test specifications.
- Involved in developing the complete front end using JavaScript and CSS.
- Authored functional, design, and test specifications.
- Developed web components using JSP, Servlets and JDBC.
- Designed tables and indexes.
- Implemented change requests.
- Designed, implemented, tested, and deployed Enterprise JavaBeans (both session and entity beans) using WebLogic as the application server.
- Developed stored procedures, packages, and database triggers to enforce data integrity. Performed data analysis and created Crystal Reports for user requirements.
- Implemented the presentation layer with HTML, XHTML, and JavaScript.
- Implemented Backend, Configuration DAO, XML generation modules of DIS.
- Analyzed, designed and developed the component.
- Used JDBC/ODBC Connection for database access.
- Used Spring Framework for developing the application and used JDBC to map to Oracle database.
- Unit testing and rigorous integration testing of the whole application.
- Wrote and executed test scripts using JUnit.
- Developed XML parsing tool for regression testing.
- Prepared the installation guide, customer guide, and configuration document, which were delivered to the customer along with the product.
Environment: Java, JavaScript, HTML, CSS, JDK 1.5.1, JDBC, Oracle 10g, XML, XSL, Solaris and UML.