
Sr. Hadoop/Spark Developer Resume

Chicago

SUMMARY

  • 9 years of professional IT experience, including 5 years in Big Data Hadoop development and data analytics, plus development and design of Java-based enterprise applications.
  • Very strong knowledge on Hadoop ecosystem components like HDFS, MapReduce, Spark, Hive, Pig, Sqoop, Scala, Impala, Flume, Kafka, Oozie and HBase.
  • Strong knowledge on Architecture of Distributed systems and Parallel processing frameworks.
  • In-depth understanding of the Spark execution model and internals of the MapReduce framework.
  • Expertise in developing production-ready Spark applications utilizing the Spark-Core, DataFrames, Spark-SQL, Spark-ML and Spark-Streaming APIs.
  • Experience with different Hadoop distributions, including Cloudera (CDH 3, 4 and 5) and Hortonworks (HDP).
  • Worked extensively on fine-tuning resources for long-running Spark applications to achieve better parallelism and more executor memory for caching.
  • Strong experience working with both batch and real-time processing using Spark frameworks.
  • Proficient in Apache Spark and Scala programming for analyzing large datasets, and in Storm for processing real-time data.
  • Experience in developing Pig Latin Scripts and using Hive Query Language.
  • Strong knowledge on performance tuning Hive queries and troubleshooting various issues related to Joins, memory exceptions in Hive.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both internal and external Hive tables to optimize performance.
  • Strong experience using different HDFS file formats, including row-based Avro and the columnar RCFile, ORC and Parquet formats.
  • Hands on experience in installing, configuring and deploying Hadoop distributions in cluster environments (Amazon Web Services).
  • Experience in optimizing MapReduce algorithms by using combiners and custom partitioners.
  • Experience in NoSQL databases such as the column-oriented HBase and Apache Cassandra and the document-oriented MongoDB, and their integration with Hadoop clusters.
  • Expertise in back-end/server-side Java technologies such as web services, Java Persistence API (JPA), Java Message Service (JMS) and Java Database Connectivity (JDBC).
  • Experienced with scripting languages such as Python and shell scripts.
  • Experienced in data processing tasks such as collecting, aggregating and moving data from various sources using Apache Flume and Kafka.
  • Extensive experience in ETL process consisting of data transformation, data sourcing, mapping, conversion and loading.
  • In-depth understanding of the Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and the MapReduce programming paradigm.
  • Worked with Sqoop to move (import / export) data from a relational database into Hadoop.
  • Knowledge in UNIX Shell Scripting for automating deployments and other routine tasks.
  • Experienced in using agile methodologies including Extreme Programming, Scrum and Test-Driven Development (TDD).
  • Used custom SerDes (Regex SerDe, JSON SerDe, CSV SerDe, etc.) in Hive to handle multiple data formats.
  • Intensive work experience in developing enterprise solutions using Java, J2EE, Servlets, JSP, JDBC, Struts, Spring, Hibernate, JavaBeans, JSF, MVC.
  • Experience in building and deploying web applications on multiple application servers and middleware platforms including WebLogic, WebSphere, Apache Tomcat and JBoss.
  • Experience in using version control tools like Bitbucket, Git and SVN.
  • Experience in writing build scripts using Maven, Ant and Gradle.
  • Flexible, enthusiastic and project-oriented team player with excellent communication and leadership skills, able to develop creative solutions for challenging client requirements.
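The combiner and custom-partitioner work mentioned above rests on one idea: routing every record for a given key to the same reducer. Below is a minimal plain-Python sketch of that routing (no Hadoop required); the `region_partition` strategy and the sample keys are invented for illustration, not taken from any project described here.

```python
# Sketch of key routing behind MapReduce partitioners: each key is mapped
# to one of N reducer "buckets", so all records for that key meet on the
# same reducer. Plain Python, illustrative names only.

def default_partition(key, num_reducers):
    """Mimics Hadoop's HashPartitioner: hash the key, mod reducer count."""
    return hash(key) % num_reducers

def region_partition(key, num_reducers):
    """A custom partitioner: route by a key prefix (e.g. a region code)
    so related keys land on the same reducer. Hypothetical scheme."""
    region = key.split("-", 1)[0]
    return sum(ord(c) for c in region) % num_reducers

records = ["us-order1", "us-order2", "eu-order3", "eu-order4"]

buckets = {}
for key in records:
    buckets.setdefault(region_partition(key, 4), []).append(key)

# All "us-" keys share one bucket, all "eu-" keys share another.
for bucket, keys in sorted(buckets.items()):
    print(bucket, keys)
```

A custom partitioner like this is how skewed or related keys are grouped deliberately instead of relying on the default hash.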

TECHNICAL SKILLS

Big Data Ecosystems: HDFS, MapReduce, YARN, Hive, Storm, Sqoop, Pig, Spark, HBase, Impala, Scala, Flume, Zookeeper, Oozie

NoSQL Databases: HBase, Cassandra, MongoDB

Java & J2EE Technologies: Java, J2EE, JDBC, SQL, PL/SQL, JavaScript, C, Hibernate 3.0, Spring 3.x, Struts

AWS technologies: Data Pipeline, Redshift, EMR

Languages: Java, Scala, Python, SQL, Pig Latin, HiveQL, Shell Scripting.

Database: Microsoft SQL Server, MySQL, Oracle, DB2

Web/Application Servers: WebLogic, WebSphere, JBoss, Tomcat

IDE’s & Utilities: Eclipse, JCreator, NetBeans

Operating Systems: UNIX, Windows, Mac, LINUX

GUI Technologies: HTML, XHTML, CSS, JavaScript, Ajax, AngularJS

Data Visualization tools: Tableau, Power BI, Apache Zeppelin

Development Methodologies: Agile, V-Model, Waterfall Model, Scrum

PROFESSIONAL EXPERIENCE

Confidential, Chicago

Sr. Hadoop/Spark Developer

Responsibilities:

  • Examined transaction data, identified outliers and inconsistencies, and manipulated data to ensure data quality and integrity.
  • Developed data pipeline using Sqoop, Spark and Hive to ingest, transform and analyze operational data.
  • Used Spark SQL with Scala for creating data frames and performed transformations on data frames.
  • Developed custom multi-threaded Java based ingestion jobs as well as Sqoop jobs for ingesting from FTP servers and data warehouses.
  • Implemented Spark using Scala and utilizing Data Frames and Spark SQL API for faster processing of data.
  • Streamed data in real time using Spark and Kafka.
  • Worked on troubleshooting Spark applications to make them more fault tolerant.
  • Worked on fine-tuning Spark applications to improve the overall processing time of the pipelines.
  • Wrote Kafka producers to stream data from external REST APIs to Kafka topics.
  • Wrote Spark-Streaming applications to consume the data from Kafka topics and write the processed streams to HBase.
  • Experienced in handling large datasets using Spark's in-memory capabilities, broadcast variables, effective and efficient joins, transformations and other capabilities.
  • Experience with Kafka, sustaining reads and writes of thousands of megabytes per second on streaming data.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Worked extensively with Sqoop for importing data from Oracle.
  • Experience working with EMR clusters in the AWS cloud and with S3.
  • Involved in creating Hive tables, loading and analyzing data using Hive scripts.
  • Created Hive tables, dynamic partitions, buckets for sampling and working on them using Hive QL.
  • Involved in building applications with Maven and integrating with continuous integration servers like Jenkins to build jobs.
  • Created documents for the data flow and ETL process using Informatica mappings to support the project once it went to production.
  • Performed data migration from legacy RDBMS databases to HDFS using Sqoop.
  • Performed tuning and increased operational efficiency on a continuous basis.
  • Worked on Spark SQL, reading/writing data from JSON, text and Parquet files and schema RDDs.
  • Worked on POCs with Apache Spark using Scala to introduce Spark into the project.
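The broadcast-variable joins described above follow a standard pattern: ship the small lookup table to every worker so each partition of the large dataset can be joined locally, with no shuffle. A plain-Python sketch of that idea follows; the table contents and column names are invented for illustration.

```python
# Sketch of a broadcast (map-side) join: the small dimension table is
# copied to every task, so each partition of the large fact data is
# joined locally with a dict lookup instead of a cluster-wide shuffle.

# Small lookup table -- in Spark this would be wrapped in a broadcast.
dim = {"c1": "retail", "c2": "wholesale"}   # customer_id -> segment

# Large "fact" data, split into partitions as a cluster would hold it.
fact_partitions = [
    [("c1", 100), ("c2", 250)],
    [("c1", 75), ("c3", 40)],   # c3 has no match -> dropped (inner join)
]

def join_partition(rows, lookup):
    """Runs independently on each partition; no data movement needed."""
    return [(cid, amount, lookup[cid]) for cid, amount in rows if cid in lookup]

joined = [row for part in fact_partitions for row in join_partition(part, dim)]
print(joined)
# -> [('c1', 100, 'retail'), ('c2', 250, 'wholesale'), ('c1', 75, 'retail')]
```

The design choice is the usual trade-off: a broadcast join avoids shuffling the large side but only works while the small side fits comfortably in each executor's memory.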

Environment: Hadoop YARN, Spark-Core, Spark Streaming, Spark SQL, Scala, Kafka, Hive, Sqoop, Amazon AWS, HBase, Teradata, Power Center, Tableau, Oozie, Oracle, Linux

Confidential, Ann Arbor, MI

Sr Hadoop/ Scala Developer

Responsibilities:

  • Used Cloudera distribution extensively.
  • Converted existing MapReduce jobs into Spark transformations and actions using Spark DataFrames and the Spark SQL APIs.
  • Developed Spark programs for Batch processing.
  • Wrote new Spark jobs in Python to analyze customer and sales history data.
  • Worked on Spark SQL and Spark Streaming.
  • Worked on different data formats such as JSON, XML and performed machine learning algorithms in Python.
  • Worked on reading multiple data formats on HDFS using Scala.
  • Created end-to-end Spark-Solr applications using Scala to perform data cleansing, validation, transformation and summarization activities according to requirements.
  • Developed Spark scripts by using Scala shell commands as per the requirement.
  • Used Kafka to get data from many streaming sources into HDFS.
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Good experience in Hive partitioning, bucketing and collections, and performed different types of joins on Hive tables.
  • Used Slick to query and store data in the database in an idiomatic Scala style using the powerful Scala collections framework.
  • Created Hive external tables to perform ETL on data that is generated on a daily basis.
  • Wrote HBase bulk-load jobs to load processed data into HBase tables by converting it to HFiles.
  • Performed validation on the data ingested to filter and cleanse the data in Hive.
  • Created Sqoop jobs to handle incremental loads from RDBMS into HDFS and applied Spark transformations.
  • Implemented Spark SQL to access Hive tables from Spark for faster data processing.
  • Loaded data into Hive tables from Spark, using the Parquet columnar format.
  • Developed Oozie workflows to automate and productionize the data pipelines.
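The incremental loads mentioned above follow Sqoop's watermark pattern: remember the highest value of a monotonically increasing check column and, on each run, pull only the rows beyond it. A minimal plain-Python sketch of that logic, with the table and column names invented for illustration:

```python
# Sketch of Sqoop-style incremental append: track the highest value of a
# check column (here `id`, like Sqoop's --check-column/--last-value) and
# import only rows beyond it. Plain Python stands in for the RDBMS + HDFS.

source_table = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 3, "name": "c"},
]

def incremental_import(table, last_value):
    """Return only the new rows, plus the updated watermark."""
    new_rows = [r for r in table if r["id"] > last_value]
    new_last = max((r["id"] for r in new_rows), default=last_value)
    return new_rows, new_last

# First run: everything is new.
batch1, watermark = incremental_import(source_table, last_value=0)

# More rows arrive in the source; the next run picks up only those.
source_table.append({"id": 4, "name": "d"})
batch2, watermark = incremental_import(source_table, watermark)

print(len(batch1), len(batch2), watermark)   # 3 1 4
```

The watermark must be persisted between runs (Sqoop stores it in its saved-job metastore) so repeated imports never re-copy rows already in HDFS.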

Environment: Hadoop, Hive, Flume, Shell Scripting, Java, Eclipse, HBase, Kafka, Spark, Spark Streaming, Python, Oozie, HQL/SQL, Teradata.

Confidential, San Mateo, CA

Hadoop Developer

Responsibilities:

  • Performed aggregations and analysis on large sets of log data; collected the log data using custom-built input adapters and Sqoop.
  • Developed MapReduce programs for data extraction, transformation and aggregation.
  • Monitored and troubleshot MapReduce jobs running on the cluster.
  • Implemented solutions for ingesting data from various sources and processing it using Hadoop services such as Sqoop, Hive, Pig, HBase and MapReduce.
  • Worked on creating combiners, partitioners and distributed cache to improve the performance of MapReduce jobs.
  • Wrote Pig scripts to generate Map Reduce jobs and performed ETL procedures on the data in HDFS.
  • Experienced in handling Avro data files by passing schema into HDFS using Avro tools and Map Reduce.
  • Optimized MapReduce algorithms using combiners and partitioners to deliver the best results, and worked on application performance optimization for an HDFS cluster.
  • Orchestrated many Sqoop scripts, Pig scripts, Hive queries using Oozie workflows and sub workflows.
  • Used Flume to collect, aggregate and store the web log data from different sources like web servers and pushed to HDFS.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that invoke and run MapReduce jobs in the backend.
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NOSQL and a variety of portfolios.
  • Involved in debugging MapReduce jobs using MRUnit framework and optimizing Map Reduce jobs.
  • Involved in troubleshooting errors in Shell, Hive and MapReduce.
  • Worked on debugging, performance tuning of Hive & Pig jobs.
  • Designed and implemented MapReduce jobs to support distributed processing using MapReduce, Hive and Apache Pig.
  • Created Hive external tables on the MapReduce output, then applied partitioning and bucketing on top of them.
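The combiner optimization used throughout this role pre-aggregates each mapper's output before the shuffle, so only one pair per distinct key per mapper crosses the network instead of one pair per occurrence. A plain-Python word-count sketch of that effect (sample inputs invented; no Hadoop assumed):

```python
from collections import Counter

# Sketch of why combiners shrink MapReduce shuffle traffic: each mapper
# locally pre-sums its (word, 1) pairs with the same logic the reducer
# uses, cutting the number of pairs sent over the network.

splits = [            # one input split per mapper
    "spark hive spark",
    "hive hive sqoop",
]

def map_phase(line):
    return [(word, 1) for word in line.split()]

def combine(pairs):
    """Local, per-mapper aggregation -- same logic as the reducer."""
    out = Counter()
    for word, count in pairs:
        out[word] += count
    return list(out.items())

def reduce_phase(pairs):
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

without_combiner = [p for line in splits for p in map_phase(line)]
with_combiner = [p for line in splits for p in combine(map_phase(line))]

print(len(without_combiner), len(with_combiner))  # 6 shuffled pairs vs 4
print(reduce_phase(with_combiner))
```

A combiner is only safe when the aggregation is associative and commutative (sums, counts, max), since Hadoop may apply it zero or more times per mapper.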

Environment: Hadoop, HDFS, MapReduce, HIVE, Pig, Sqoop, HBase, Oozie, MySQL, SVN, Putty, Zookeeper, UNIX, Shell Scripting, HiveQL, NOSQL database(HBASE), RDBMS, Eclipse, Oracle 11g.
