
Big Data/Spark Developer Resume


Houston, TX

PROFESSIONAL SUMMARY:

  • Over 5 years of IT experience across Java, SQL, ETL, and Big Data, including 4+ years with Hadoop and NoSQL technologies in the insurance, finance, and healthcare domains. Passionate about working in Big Data environments.
  • In-depth knowledge of Hadoop architecture and the functioning of its components, including HDFS, NameNode, DataNode, JobTracker, TaskTracker, MapReduce, and Spark.
  • Extensive experience providing Big Data solutions using Hadoop 2.x, HDFS, MR2, YARN, Kafka, Pig, Hive, Sqoop, HBase, Cloudera Manager, Hortonworks, ZooKeeper, Oozie, and Hue.
  • Experience importing and exporting data with Sqoop between HDFS/Hive/HBase and relational database systems. Skilled in data migration and data generation within the Big Data ecosystem.
  • Experienced in building highly scalable Big Data solutions on multiple Hadoop distributions (Cloudera, Hortonworks) and NoSQL platforms (HBase).
  • Implemented Big Data batch processing using Hadoop, MapReduce, YARN, Pig, and Hive.
  • Hands-on experience in in-memory data processing with Apache Spark using Scala and Python (see the sketch after this summary).
  • Worked with Spark on EMR clusters alongside other Hadoop applications, leveraging the EMR File System (EMRFS) to access data directly in Amazon S3.
  • Interacted with clients to understand their business problems related to Big Data, cloud computing, and NoSQL technologies.
  • Experienced in using Kafka as a distributed publish-subscribe messaging system.
  • Good experience in writing Pig scripts and Hive Queries for processing and analyzing large volumes of data.
  • Developed Spark scripts using the Scala shell as per requirements.
  • Experience optimizing MapReduce jobs with combiners and partitioners to deliver the best results.
  • Extended Hive and Pig core functionality by writing custom UDFs.
  • Good knowledge of AWS services such as EMR and EC2, which provide fast and efficient processing of Big Data.
  • Experience assessing Hadoop security requirements and integrating clusters with Kerberos authentication and authorization infrastructure.
  • Strong scripting skills in Python and UNIX shell.
  • Experience in managing and reviewing Hadoop log files.
  • Hands-on experience in application development using RDBMS and Linux shell scripting. Good working experience with Agile/Scrum methodologies, including technical discussions with clients and daily scrum calls covering project analysis, specifications, and development.
  • Ability to work independently as well as in a team and able to effectively communicate with customers, peers and management at all levels in and outside the organization.
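The Spark in-memory processing pattern referenced above (Scala/Python bullet) looks roughly like the sketch below. This is a minimal illustration only: the HDFS paths and column names (state, claim_amount) are assumptions, not an actual client schema.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object ClaimsAggregationSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("claims-aggregation-sketch")
          .getOrCreate()

        // Hypothetical path: a delimited data set landed in HDFS (for example, via Sqoop).
        val claims = spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("hdfs:///data/claims/")

        claims.cache()                                  // keep the working set in memory
        println(s"rows loaded: ${claims.count()}")      // first action materializes the cache

        // Hypothetical columns: state, claim_amount.
        val totalsByState = claims
          .groupBy("state")
          .agg(sum("claim_amount").as("total_claim_amount"),
               count(lit(1)).as("claim_count"))

        totalsByState.write.mode("overwrite").parquet("hdfs:///output/claims_by_state/")
        spark.stop()
      }
    }

Caching before the aggregation only pays off when the data set is reused across multiple actions, as in the count-then-write sequence here.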

TECHNICAL SKILLS:

Big Data/Hadoop ecosystem: HDFS, MapReduce, YARN, Apache NiFi, Hive, Pig, HBase, Impala, ZooKeeper, Sqoop, Flume, Oozie, Spark, Apache Phoenix, Zeppelin, EMR.

Programming Languages: C, C++, Java, Scala, SQL, PL/SQL, Python, Linux shell scripts.

Methodologies: Agile, Scrum, Waterfall

NoSQL Databases: HBase, Cassandra, MongoDB

Database: Oracle, DB2, MS Azure.

Tools Used: Eclipse, IntelliJ, Git, PuTTY, WinSCP

Operating systems: Windows, Unix, Linux, Ubuntu

PROFESSIONAL EXPERIENCE:

Confidential, Houston, TX

Big Data/Spark Developer

Responsibilities:

  • Work in an Agile environment on data ingestion, transformation, and publication. Gather requirements, analyze them, create design documents, and perform impact analysis.
  • Participate in all aspects of application development, including upgrade analysis, implementation, testing, and the data-loading process, while adhering to requirements.
  • Collaborate with stakeholders to analyze business requirements, develop data model & data capture strategy, and develop technical design specifications.
  • Perform validation and consolidation of imported data, including data migration and data generation.
  • Ingest data sets from different databases and servers using Kafka.
  • Implement advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala and Java.
  • Develop Spark scripts using the Scala shell and Java as per requirements.
  • Manipulate data sets containing millions of rows.
  • Develop complex HiveQL queries using the JSON SerDe.
  • Convert Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Java (see the sketch after this list).
  • Work with different file formats such as Text, Sequence files, Avro, ORC and Parquet.
  • Develop Spark code using Java and Scala for faster testing and processing of data.
  • Import data from sources such as HDFS, HBase, and Oracle into Spark RDDs.
  • Design and implement Spark jobs to be deployed and run on existing active clusters.
  • As part of support, troubleshoot Spark jobs, Java scripts, and Hive queries.
  • Work on performance tuning of Hive.
  • Create test plans, test cases, test scripts and perform testing of the BI and related environments.
  • Plan and organize tasks and report progress to the manager.
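As a rough illustration of the Hive-to-Spark conversion bullet above, the sketch below expresses one hypothetical aggregation first as HiveQL submitted through Spark SQL and then as equivalent RDD transformations. Table and column names (orders, customer_id, amount) are assumptions, not the actual schema.

    import org.apache.spark.sql.SparkSession

    object HiveToSparkSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hive-to-spark-sketch")
          .enableHiveSupport()          // assumes a Hive metastore is configured
          .getOrCreate()

        // Original (hypothetical) HiveQL:
        //   SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id
        val viaSql = spark.sql(
          "SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id")

        // The same logic rewritten as RDD transformations.
        val viaRdd = spark.table("orders").rdd
          .map(row => (row.getAs[String]("customer_id"), row.getAs[Double]("amount")))
          .reduceByKey(_ + _)

        viaSql.show(10)
        viaRdd.take(10).foreach(println)
        spark.stop()
      }
    }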

Confidential, Princeton Junction, NJ.

Spark & NiFi Developer

Responsibilities:

  • Implemented solutions using advanced AWS components such as EMR and EC2, integrated with Big Data/Hadoop frameworks including ZooKeeper, YARN, Spark, Scala, and NiFi.
  • Designed and implemented Spark jobs to be deployed and run on existing active clusters.
  • Configured a Postgres database on EC2 instances, verified the application was up and running, and troubleshot issues to reach the desired application state.
  • Created and configured secure VPCs, subnets, and security groups across private and public networks.
  • Created alarms, alerts, and notifications that send Spark job status to email and Slack group messages, with logs captured in CloudWatch.
  • Worked on NiFi data pipelines to process large data sets and configured lookups for data validation and integrity.
  • Generated large sets of test data with data integrity using Java, used during the development and QA phases.
  • Worked in Spark with Scala, improving the performance and optimizing existing applications running on the EMR cluster.
  • Worked on a Spark job to convert CSV data into custom HL7/FHIR objects using FHIR APIs.
  • Deployed SNS, SQS, Lambda functions, IAM roles, custom policies, and EMR with Spark and Hadoop, along with bootstrap scripts to install additional software needed for jobs in QA and production environments, using Terraform scripts.
  • Worked on a Spark job to perform Change Data Capture (CDC) on Postgres tables and update target tables via JDBC (see the sketch after this list).
  • Integrated a Kafka publisher into a Spark job to capture errors from the Spark application and push them into a Postgres table.
  • Worked extensively on building NiFi data pipelines in a Docker container environment during the development phase.
  • Worked with the DevOps team to cluster the NiFi pipeline on EC2 nodes, integrated via SSL handshakes with Spark, Kafka, and Postgres running on other instances in QA and production environments.
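The Postgres CDC bullet above followed roughly the JDBC pattern sketched below. Connection details, table names, and the updated_at high-water-mark column are assumptions; a production job would merge the staged changes into the target table rather than simply appending.

    import java.util.Properties
    import org.apache.spark.sql.SparkSession

    object PostgresCdcSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("postgres-cdc-sketch").getOrCreate()

        // Hypothetical connection details; the Postgres JDBC driver must be on the classpath.
        val jdbcUrl = "jdbc:postgresql://db-host:5432/analytics"
        val props = new Properties()
        props.setProperty("user", "etl_user")
        props.setProperty("password", sys.env.getOrElse("PGPASSWORD", ""))
        props.setProperty("driver", "org.postgresql.Driver")

        // Pull only rows changed since the last run; the literal timestamp stands in
        // for a stored high-water mark.
        val changed = spark.read
          .jdbc(jdbcUrl, "public.orders", props)
          .filter("updated_at > '2019-01-01 00:00:00'")

        // Stage the captured changes; a real CDC job would upsert from this staging table.
        changed.write.mode("append").jdbc(jdbcUrl, "public.orders_changes", props)
        spark.stop()
      }
    }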

Confidential, New York City, New York

Big Data/Hadoop Developer

Responsibilities:

  • Working in Agile, completed stories related to data ingestion, transformation, and publication on time.
  • Performed validation and consolidation of imported data, including data migration and data generation.
  • Ingested data sets from different databases and servers using Sqoop, the Talend import tool, and a Managed File Transfer (MFT) inbound process with Elasticsearch.
  • Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala and Java.
  • Developed Spark scripts using the Scala shell and Java as per requirements.
  • Used Spark Streaming to consume topics from distributed messaging sources (Talend, Kafka) and periodically push batches of data for real-time processing into Elasticsearch (see the sketch at the end of this section).
  • Worked within teams to analyze anomaly detection and data ratings using the Talend ETL tool.
  • Developed complex HiveQL queries using the JSON SerDe.
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Imported real-time data from Elasticsearch into Hadoop using Kafka and Talend, and implemented an Oozie job for daily imports.
  • Wrote Pig Latin scripts and Hive queries using Avro schemas to transform data sets in HDFS.
  • As part of support, troubleshot MapReduce jobs, Java scripts, Pig jobs, and Hive queries.
  • Worked on performance tuning of Hive and Pig jobs.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data in Amazon EMR.

Environment: Hadoop, Cloudera, MapReduce, Spark, Shark, Hive, Apache NiFi, Pig, Sqoop, Shell Scripting, Storm, Talend, Kafka, Datameer, Oracle, Teradata, SAS, Arcadia, Java 7.0, Nagios, Spring, JIRA, EMR.
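The Kafka-to-Spark flow described in the streaming bullet above looks roughly like the sketch below, written against the Structured Streaming API for brevity (the original job may well have used DStreams). Broker addresses, the topic name, and the JSON schema are assumptions, and the console sink stands in for the Elasticsearch sink used in the real pipeline.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.types._

    object KafkaStreamSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("kafka-stream-sketch").getOrCreate()
        import spark.implicits._

        // Hypothetical brokers/topic; requires the spark-sql-kafka connector package.
        val raw = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "events")
          .load()

        // Assumed JSON payload shape, for illustration only.
        val schema = new StructType()
          .add("event_id", StringType)
          .add("event_type", StringType)
          .add("ts", TimestampType)

        val events = raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json($"json", schema).as("e"))
          .select("e.*")

        // Console sink for the sketch; production would write via the Elasticsearch
        // Spark connector (or another sink) instead.
        val query = events.writeStream
          .format("console")
          .outputMode("append")
          .start()

        query.awaitTermination()
      }
    }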

Confidential, Tampa, Florida

Hadoop and Spark Developer

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java and NiFi for data cleaning and preprocessing.
  • Imported and exported data into HDFS from Oracle database and vice versa using Sqoop.
  • Installed and configured Hadoop Cluster for major Hadoop distributions.
  • Used Hive, Pig, and Talend as ETL tools for event joins, filters, transformations, and pre-aggregations.
  • Created partitions and bucketing by state in Hive to handle structured data alongside Elasticsearch.
  • Developed workflow in Oozie to orchestrate a series of Pig scripts to cleanse data such as removing personal information or merging many small files into a handful of very large, compressed files using Pig pipelines in the data preparation stage.
  • Moved log files generated from various sources into HDFS for further processing through Elasticsearch, Kafka, Flume, and Talend, and processed the files using Piggybank.
  • Extensively used Pig to communicate with Hive via HCatalog and with HBase via storage handlers.
  • Used Spark SQL's Scala and Python interfaces, which automatically convert RDDs of case classes to schema RDDs (DataFrames).
  • Used Spark SQL to read and write tables stored in Hive on Amazon EMR (see the sketch at the end of this section).
  • Performed Sqoop transfers through HBase tables to move data for processing into several NoSQL databases, including Cassandra and MongoDB.
  • Created tables, secondary indexes, join indexes, and views in the Teradata development environment for testing.
  • Captured logs from web servers and Elasticsearch into HDFS using Flume for analysis.
  • Managed and reviewed Hadoop log files.

Environment: Hive, Pig, MapReduce, Apache NiFi, Sqoop, Oozie, Flume, Kafka, Talend, EMR, Storm, HBase, Unix, Linux, Python scripting, SQL, Hadoop 1.x, HDFS, GitHub.
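Reading and writing Hive tables from Spark SQL, as mentioned in the bullets above, follows the general shape below. A minimal sketch with hypothetical database, table, and partition-column names.

    import org.apache.spark.sql.SparkSession

    object HiveReadWriteSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hive-read-write-sketch")
          .enableHiveSupport()          // assumes a Hive metastore on the (EMR) cluster
          .getOrCreate()

        // Hypothetical database/table names.
        val clicks = spark.table("web.clickstream")

        val dailyCounts = clicks
          .groupBy("event_date", "page")
          .count()

        // Write back as a Hive table partitioned on the assumed event_date column.
        dailyCounts.write
          .mode("overwrite")
          .partitionBy("event_date")
          .saveAsTable("web.daily_page_counts")

        spark.stop()
      }
    }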

Confidential

Intern/Java Developer

Responsibilities:

  • Gathered business requirements, analyzed the project, and created UML diagrams such as use cases, class diagrams, sequence diagrams, and flowcharts for the optimization module using Microsoft Visio.
  • Developed new functionality and maintained existing functionality using Spring MVC and Hibernate.
  • Implemented solutions for ingesting data from various sources and processing data at rest using Big Data technologies such as Hadoop, MapReduce, HBase, Hive, Oozie, Flume, Sqoop, and Elasticsearch.
  • Designed and implemented real-time Big Data processing to enable real-time analytics, event detection, and notification for data in motion.
  • Loaded and transformed large sets of structured and semi-structured data using Hive and Impala with Elasticsearch.
  • Developed Hive queries in the BigSQL client for various use cases.
  • Developed shell scripts and automated them using the cron job scheduler.

Environment: Hadoop 1.x, Hive 0.10, Pig 0.11, Sqoop, HBase, Java, Spring, Hibernate, Oracle, MySQL, Big Data.
