
Sr. Hadoop/Spark Developer Resume


Sr. Hadoop/Spark Developer, Tampa, FL

SUMMARY:

  • Highly skilled IT professional with 10+ years of experience in software engineering, with emphasis on Big Data application development and Java server-side programming.
  • Strong expertise in Big Data ecosystem components such as Spark, Hive, Sqoop, HDFS, MapReduce, Kafka, Oozie, YARN, Pig, HBase and Flume.
  • Strong expertise in building scalable applications using various programming languages (Java, Scala and Python).
  • In-depth knowledge of the architecture of distributed systems and parallel computing.
  • Experience implementing end-to-end data pipelines for serving reporting and data science capabilities.
  • Experienced in working with Cloudera, Hortonworks and Amazon EMR clusters.
  • Experience fine-tuning Spark and Hive applications to improve the overall performance of data pipelines.
  • Developed production-ready Spark applications using the Spark RDD API, DataFrames, Datasets, Spark SQL and Spark Streaming.
  • Hands-on experience fetching live stream data and ingesting it into HBase tables using Spark Streaming and Apache Kafka (a minimal sketch follows this list).
  • Experience with Hadoop shell commands, writing MapReduce programs, and verifying, managing and reviewing Hadoop log files.
  • In-depth knowledge of importing and exporting data between databases and HDFS using Sqoop.
  • Well versed in writing complex Hive queries using analytical functions.
  • Knowledge of writing custom UDFs in Hive to support custom business requirements.
  • Solid experience using various file formats such as CSV, TSV, Parquet, ORC, JSON and Avro.
  • Experience using compression codecs such as Gzip and Snappy within Hadoop.
  • Strong knowledge of NoSQL databases; worked with HBase, Cassandra and MongoDB.
  • Experience using cloud services such as Amazon EMR, S3, EC2, Redshift and Athena.
  • Extensively used IDEs such as IntelliJ, NetBeans and Eclipse.
  • Proficient in using RDBMS concepts with Oracle, MySQL, DB2, Teradata and experienced in writing SQL queries.
  • Knowledge in writing shell scripts and scheduling using cron jobs.
  • Experience working with Git repositories, Jenkins and Maven build tools.
  • Developed cross-platform applications using Java, JSP, Servlets, Hibernate, RESTful web services, JDBC, JavaScript, XML, and HTML.
  • Used Log4J for enabling runtime logging and performed system integration test to ensure quality of the system.
  • Experience in using SOAP UI tool to validate the web service.
  • Expertise in writing unit test cases using JUnit API.
  • Experience in database design, entity relationships, database analysis, SQL programming, and PL/SQL stored procedures, packages and triggers in Oracle.
  • Highly self-motivated with good technical, communication and interpersonal skills; able to work reliably under pressure. Committed team player with strong analytical and problem-solving skills and the ability to quickly adapt to new environments and technologies.
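
Illustrative only (not project code): a minimal Scala sketch of the Spark Streaming + Kafka + HBase pattern referenced above, assuming the spark-sql-kafka connector and the HBase client are on the classpath and an hbase-site.xml is available; broker, topic, table and column-family names are placeholders.

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
    import org.apache.hadoop.hbase.util.Bytes
    import org.apache.spark.sql.{DataFrame, SparkSession}

    object KafkaToHBaseStream {

      // Write one micro-batch into HBase using the plain HBase client API.
      // Opening a connection per partition keeps the sketch simple; a pooled
      // connection would be used in a tuned job.
      def writeBatchToHBase(batch: DataFrame, batchId: Long): Unit = {
        batch.rdd.foreachPartition { rows =>
          val conn  = ConnectionFactory.createConnection(HBaseConfiguration.create())
          val table = conn.getTable(TableName.valueOf("events"))   // hypothetical table
          rows.foreach { row =>
            val put = new Put(Bytes.toBytes(row.getAs[String]("key")))
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("value"),
              Bytes.toBytes(row.getAs[String]("value")))
            table.put(put)
          }
          table.close()
          conn.close()
        }
      }

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("kafka-to-hbase-sketch").getOrCreate()

        // Read the live stream from a Kafka topic (broker and topic names are placeholders).
        val events = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "events")
          .load()
          .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
          .filter("key IS NOT NULL")

        events.writeStream
          .foreachBatch(writeBatchToHBase _)
          .start()
          .awaitTermination()
      }
    }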

TECHNICAL SKILLS:

Big Data Ecosystem: MapReduce, HDFS, HIVE, HBase, Pig, Sqoop, Flume, Oozie, Zookeeper, Spark, Kafka

Cloud Platform: AWS (EMR, EC2, Redshift, Athena)

Programming Languages: Java, Scala, Python, SQL, UNIX Shell Scripting.

Databases: Oracle 12c/11g, MySQL, MS SQL Server 2016/2014

Version Control: GIT, GitLab, SVN

NoSQL Databases: HBase and MongoDB

Methodologies: Agile Model

Build Management Tools: Maven, Ant.

IDE & Command line tools: Eclipse, IntelliJ

PROFESSIONAL EXPERIENCE:

Sr. Hadoop/Spark Developer

Confidential, Tampa, FL

Responsibilities:

  • Developed highly efficient Spark batch and streaming applications running on AWS, using Spark APIs such as Datasets, case classes, lambda functions and RDD transformations while adhering to industry standards and development best practices.
  • Migrated long running Hadoop applications from legacy clusters to Spark applications running on Amazon EMR.
  • Used Spark SQL to load Parquet data, created Datasets defined by case classes, and handled structured data with Spark SQL before storing it in Hive tables for downstream consumption (see the sketch after this list).
  • Wrote ETL scripts to move data between HDFS and S3 and created Hive external tables on top of this data for use in Big Data applications.
  • Created scripts to sync data between local MongoDB and Postgres databases with those on AWS Cloud.
  • Implemented POC to migrate Hadoop Java applications to Spark on Scala.
  • Developed Scala scripts on Spark to perform data inspection, cleaning and loading, and to transform large sets of JSON data into Parquet format.
  • Prepared Linux shell scripts to configure, deploy and manage Oozie workflows of Big Data applications.
  • Worked on Spark Streaming with Amazon Kinesis for real-time data processing.
  • Created, configured, managed and destroyed transient non-production EMR clusters as well as a long-running production cluster on AWS.
  • Worked on Triggering and scheduling ETL jobs using AWS Glue and Automated Glue with CloudWatch Events.
  • Involved in developing Hive DDL templates which were hooked into Oozie workflows to create, alter and drop tables.
  • Created Hive snapshot tables and Hive Avro tables from data partitions stored on S3 and HDFS.
  • Involved in creating frameworks which utilized a large number of Spark and Hadoop applications running in series to create one cohesive E2E Big Data pipeline.
  • Used Amazon Cloudwatch to monitor and track resources on AWS.
  • Worked on Sequence, ORC, Avro, Parquet file formats and some compression techniques like LZO, Snappy.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Configured the GitHub plugin to integrate GitHub with Jenkins and was regularly involved in version control and source code management, including release build and snapshot build management.
  • Involved in writing unit test cases for Hadoop and Spark applications which were tested in MRUnit and ScalaUnit environments respectively.
  • Flexible with Unix/Linux and Windows environments, working with operating systems such as CentOS 5/6 and Ubuntu 13/14.
  • Used the PuTTY SSH client to connect remotely to the servers.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
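
Illustrative only: a minimal Scala sketch of the Parquet-to-Hive flow referenced above (loading Parquet into a Dataset defined by a case class and persisting it to a Hive table); the S3 path, table name and record layout are hypothetical.

    import org.apache.spark.sql.{SaveMode, SparkSession}

    // Hypothetical record layout; the real schemas were project-specific.
    case class Trade(tradeId: Long, symbol: String, price: Double, tradeDate: String)

    object ParquetToHive {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("parquet-to-hive-sketch")
          .enableHiveSupport()
          .getOrCreate()
        import spark.implicits._

        // Load Parquet data from S3 (path is a placeholder) into a typed Dataset.
        val trades = spark.read.parquet("s3://example-bucket/trades/").as[Trade]

        // Structured handling with Spark SQL / Dataset operations.
        val cleaned = trades
          .filter(t => t.price > 0)
          .dropDuplicates("tradeId")

        // Persist into a Hive table for downstream consumption.
        cleaned.write
          .mode(SaveMode.Overwrite)
          .partitionBy("tradeDate")
          .saveAsTable("analytics.trades_clean")   // hypothetical database.table

        spark.stop()
      }
    }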

Sr. Hadoop Developer

Confidential, Portland, OR

Responsibilities:

  • Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
  • Supported MapReduce programs running on the cluster.
  • Provisioned, installed, configured, monitored, and maintained HDFS, YARN, HBase, Flume, Sqoop, Oozie and Hive.
  • Involved in writing shell scripts for exporting log files to the Hadoop cluster through an automated process.
  • Real-time streaming of data using Spark with Kafka.
  • Creating Hive tables, dynamic partitioning, buckets for sampling, and working on them using HiveQL.
  • Used Pig to parse the data and store it in Avro format.
  • Stored data in tabular formats using Hive tables and Hive SerDes.
  • Collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Developed Kafka producers and consumers for message handling (a sketch follows at the end of this section).
  • Used Amazon CLI for data transfers to and from Amazon S3 buckets.
  • Executed Hadoop jobs on AWS EMR using programs and data stored in S3 buckets.
  • Involved in creating UNIX shell scripts for database connectivity and executing queries in parallel.
  • Involved in importing the real-time data to Hadoop using Kafka and implemented Oozie jobs for daily imports.
  • Wrote Apache Pig scripts and Hive scripts to process the data in HDFS.
  • Designed and implemented incremental imports into Hive tables.
  • Involved in unit testing and delivered unit test plans and results documents using JUnit and MRUnit.
  • Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
  • Wrote Hive jobs to parse the logs and structure them in a tabular format to facilitate effective querying of the log data.
  • Exported the analyzed data to the relational databases using Sqoop for visualization.
  • Analyzed large and critical datasets using HDFS, HBase, MapReduce, Hive, Hive UDF, Pig, Sqoop, Zookeeper.
  • Developed custom aggregate UDFs in Hive to parse log files.
  • Identified the required data to be pooled to HDFS and created Sqoop scripts which were scheduled periodically to migrate data to the Hadoop environment.
  • Involved with File Processing using Pig Latin.
  • Created MapReduce jobs involving combiners and partitioners to deliver better results and worked on application performance optimization for an HDFS cluster.
  • Worked on debugging, performance tuning of Hive & Pig Jobs.

    Environment: Cloudera, MapReduce, HDFS, Pig Scripts, Hive Scripts, HBase, Sqoop, Zookeeper, Oozie, Oracle, Shell Scripting.
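
Illustrative only: a minimal Scala sketch of the Kafka producer/consumer message handling mentioned in this role, assuming a recent kafka-clients library; the broker address, topic and group id are placeholders.

    import java.time.Duration
    import java.util.{Collections, Properties}

    import org.apache.kafka.clients.consumer.KafkaConsumer
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    object KafkaRoundTrip {
      private val Topic   = "log-events"    // hypothetical topic
      private val Brokers = "broker1:9092"  // placeholder broker list

      def main(args: Array[String]): Unit = {
        // Producer side: publish a message onto the topic.
        val producerProps = new Properties()
        producerProps.put("bootstrap.servers", Brokers)
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        val producer = new KafkaProducer[String, String](producerProps)
        producer.send(new ProducerRecord[String, String](Topic, "host-01", "application started"))
        producer.close()

        // Consumer side: read messages back from the same topic.
        val consumerProps = new Properties()
        consumerProps.put("bootstrap.servers", Brokers)
        consumerProps.put("group.id", "log-readers")
        consumerProps.put("auto.offset.reset", "earliest")
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
        val consumer = new KafkaConsumer[String, String](consumerProps)
        consumer.subscribe(Collections.singletonList(Topic))
        val records = consumer.poll(Duration.ofSeconds(5)).iterator()
        while (records.hasNext) {
          val r = records.next()
          println(s"${r.key} -> ${r.value}")
        }
        consumer.close()
      }
    }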

Hadoop Developer

Confidential, Brentwood, TN

Responsibilities:

  • Worked on analyzing data and writing Hadoop MapReduce jobs using the Java API, Pig and Hive; responsible for building scalable distributed data solutions using Hadoop.
  • Involved in loading data from edge node to HDFS using shell scripting.
  • Created HBase tables to store variable data formats of PII data coming from different portfolios.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Analyzed large and critical datasets using Cloudera, HDFS, HBase, MapReduce, Hive, Hive UDFs, Pig, Sqoop, Zookeeper and Spark.
  • Developed custom aggregate functions using Spark SQL and performed interactive querying (a sketch follows at the end of this section); used Pig to store the data into HBase.
  • Creating Hive tables, dynamic partitions, buckets for sampling, and working on them using HiveQL.
  • Used Pig to parse the data and store it in Avro format.
  • Stored the data in tabular formats using Hive tables and Hive SerDes.
  • Collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
  • Implemented a script to transmit information from Oracle to HBase using Sqoop.
  • Worked on tuning the performance of Pig queries.
  • Involved in writing shell scripts for exporting log files to the Hadoop cluster through an automated process.
  • Implemented MapReduce programs to handle semi-structured and unstructured data such as XML, JSON, and sequence files for log processing.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.

    Environment: Hadoop, HDFS, Pig, Sqoop, Spark, MapReduce, Cloudera, Snappy, Zookeeper, NoSQL, HBase, Shell Scripting, Ubuntu, Linux Red Hat.
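
Illustrative only: a minimal Scala sketch of a custom aggregate function usable from Spark SQL, as mentioned in this role. It uses the Aggregator API and the functions.udaf registration available in Spark 3.x (the Spark 1.x/2.x equivalent would be a UserDefinedAggregateFunction); column and view names are hypothetical.

    import org.apache.spark.sql.{Encoder, Encoders, SparkSession, functions}
    import org.apache.spark.sql.expressions.Aggregator

    // Running sum/count buffer for a custom average; names are illustrative.
    case class AvgBuffer(sum: Double, count: Long)

    object AvgAmount extends Aggregator[Double, AvgBuffer, Double] {
      def zero: AvgBuffer = AvgBuffer(0.0, 0L)
      def reduce(b: AvgBuffer, value: Double): AvgBuffer = AvgBuffer(b.sum + value, b.count + 1)
      def merge(b1: AvgBuffer, b2: AvgBuffer): AvgBuffer = AvgBuffer(b1.sum + b2.sum, b1.count + b2.count)
      def finish(b: AvgBuffer): Double = if (b.count == 0) 0.0 else b.sum / b.count
      def bufferEncoder: Encoder[AvgBuffer] = Encoders.product[AvgBuffer]
      def outputEncoder: Encoder[Double] = Encoders.scalaDouble
    }

    object CustomAggregateDemo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("custom-agg-sketch").getOrCreate()
        import spark.implicits._

        // Toy data standing in for portfolio transactions.
        val txns = Seq(("retail", 120.0), ("retail", 80.0), ("commercial", 300.0))
          .toDF("portfolio", "amount")
        txns.createOrReplaceTempView("txns")

        // Register the aggregator for interactive SQL querying.
        spark.udf.register("avg_amount", functions.udaf(AvgAmount))
        spark.sql("SELECT portfolio, avg_amount(amount) AS avg_amt FROM txns GROUP BY portfolio").show()

        spark.stop()
      }
    }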

Hadoop Developer

Confidential, Mount Juliet, TN

Responsibilities:

  • Worked on writing transformer/mapping Map-Reduce pipelines using Java.
  • Handling structured and unstructured data and applying ETL processes.
  • Collected log data from web servers and integrated it into HDFS using Flume.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that invoke and run MapReduce jobs in the backend.
  • Involved in loading data into HBase using HBase Shell, HBase Client API, Pig and Sqoop.
  • Designed and implemented Incremental Imports into Hive tables.
  • Worked on loading and transforming large sets of structured, semi-structured and unstructured data.
  • Extensively used Pig for data cleansing.
  • Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
  • Wrote Hive jobs to parse the logs and structure them in a tabular format to facilitate effective querying of the log data.
  • Worked extensively with Sqoop to import and export data between HDFS and relational database systems, and loaded data into HDFS.
  • Facilitated production move-ups of ETL components from the acceptance environment to the production environment.
  • Experienced in managing and reviewing the Hadoop log files.
  • Migrated ETL jobs to Pig scripts to perform transformations, joins and pre-aggregations before storing the data in HDFS.
  • Implemented the workflows using Apache Oozie framework to automate tasks.
  • Worked with the Avro data serialization system to handle JSON data formats (a sketch follows this list).
  • Worked on different file formats such as sequence files, XML files and map files using MapReduce programs.
  • Developed scripts to automate end-to-end data management and synchronization between all the clusters.
  • Involved in setting up and benchmarking Hadoop/HBase clusters for internal use.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Pig scripts.
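
Illustrative only: a minimal Scala sketch of Avro serialization for JSON-shaped data, as referenced above; the schema and field values are hypothetical.

    import java.io.File

    import org.apache.avro.Schema
    import org.apache.avro.file.{DataFileReader, DataFileWriter}
    import org.apache.avro.generic.{GenericData, GenericDatumReader, GenericDatumWriter, GenericRecord}

    object AvroRoundTrip {
      // Hypothetical schema mirroring a simple JSON event.
      private val SchemaJson =
        """{
          |  "type": "record",
          |  "name": "LogEvent",
          |  "fields": [
          |    {"name": "host", "type": "string"},
          |    {"name": "level", "type": "string"},
          |    {"name": "message", "type": "string"}
          |  ]
          |}""".stripMargin

      def main(args: Array[String]): Unit = {
        val schema = new Schema.Parser().parse(SchemaJson)

        // Serialize a record to an Avro container file.
        val record: GenericRecord = new GenericData.Record(schema)
        record.put("host", "web-01")
        record.put("level", "INFO")
        record.put("message", "request served")

        val outFile = new File("events.avro")
        val writer = new DataFileWriter[GenericRecord](new GenericDatumWriter[GenericRecord](schema))
        writer.create(schema, outFile)
        writer.append(record)
        writer.close()

        // Deserialize it back to confirm the round trip.
        val reader = new DataFileReader[GenericRecord](outFile, new GenericDatumReader[GenericRecord](schema))
        while (reader.hasNext) println(reader.next())
        reader.close()
      }
    }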

Hadoop Developer

Confidential, Phoenix, AZ

Responsibilities:

  • Collaborated in identifying current problems, constraints and root causes in data sets to arrive at descriptive and predictive solutions with the support of Hadoop HDFS, MapReduce, Pig, Hive and HBase, and further developed reports in Tableau.
  • Architected the Hadoop cluster in pseudo-distributed mode working with Apache Zookeeper, stored and loaded data between HDFS and Amazon AWS S3 for backup, and created tables in the AWS cluster with S3 storage.
  • Evaluated existing infrastructure, systems and technologies, provided gap analysis, documented requirements, evaluations and recommendations for systems, upgrades and technologies, and created a proposed architecture and specifications along with recommendations.
  • Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
  • Installed and Configured Sqoop to import and export the data into MapR-FS, HBase and Hive from Relational databases.
  • Administered large MapR Hadoop environments, including cluster build and support, setup, performance tuning and monitoring in an enterprise environment.
  • Installed and Configured MapR-zookeeper, MapR-cldb, MapR-jobtracker, MapR-tasktracker, MapR resource manager, MapR-node manager, MapR-fileserver, and MapR-webserver.
  • Installed and configured the Knox gateway to secure Hive through ODBC, WebHCat and Oozie services.
  • Loaded data from relational databases into the MapR-FS filesystem and HBase using Sqoop, and set up MapR metrics with a NoSQL database to log metrics data.
  • Closely monitored and analyzed MapReduce job executions on the cluster at the task level and optimized Hadoop cluster components to achieve high performance.
  • Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data into HDFS for analysis.
  • Integrated HDP clusters with Active Directory and enabled Kerberos for Authentication.
  • Worked on commissioning and decommissioning of DataNodes, NameNode recovery and capacity planning, and installed the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Worked on creating the Data Model for HBase from the current Oracle Data model.
  • Implemented high availability and automatic failover infrastructure to overcome the NameNode single point of failure, utilizing Zookeeper services.
  • Leveraged Chef to manage and maintain builds in various environments and planned for hardware and software installation on production cluster and communicated with multiple teams to get it done.
  • Monitored Hadoop cluster functioning through MCS and worked on NoSQL databases including HBase.
  • Used Hive, created Hive tables, and was involved in data loading and writing Hive UDFs (a sketch follows this list); worked with the Linux server admin team in administering the server hardware and operating system.
  • Worked closely with data analysts to construct creative solutions for their analysis tasks and managed and reviewed Hadoop and HBase log files.
  • Exported the analyzed data to relational databases using Sqoop for visualization and report generation, and worked on importing and exporting data between Oracle and HDFS/Hive using Sqoop.
  • Collaborating with application teams to install operating system and Hadoop updates, patches, version upgrades when required.
  • Automated workflows using shell scripts to pull data from various databases into Hadoop.
  • Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
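
Illustrative only: a minimal Scala sketch of a Hive UDF of the kind mentioned above, written against the classic org.apache.hadoop.hive.ql.exec.UDF API (newer Hive versions favor GenericUDF); the class name and masking logic are hypothetical. Once packaged into a JAR, it would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION and then called like any built-in function.

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Masks all but the last four characters of an account number.
    // Hive calls evaluate() once per row; a null input yields a null output.
    class MaskAccount extends UDF {
      def evaluate(input: Text): Text = {
        if (input == null) return null
        val s = input.toString
        val masked =
          if (s.length <= 4) s
          else "*" * (s.length - 4) + s.takeRight(4)
        new Text(masked)
      }
    }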
