
Hadoop/Spark Developer Resume


Charlotte, NC

PROFESSIONAL SUMMARY:

  • 5.5 years of professional IT experience in the analysis, design, development, deployment, and maintenance of critical software and big data applications.
  • Hands-on experience across the Hadoop ecosystem, including big data technologies such as MapReduce, YARN, HDFS, Apache Cassandra, HBase, Oozie, Hive, Sqoop, Pig, ZooKeeper, and Flume.
  • In-depth knowledge of HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce programming.
  • Expertise in converting MapReduce programs into Spark transformations using Spark RDDs (see the RDD sketch after this list).
  • Expertise in Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, and Spark MLlib.
  • Excellent understanding of Hadoop Gen-1 and Gen-2 and components such as JobTracker, TaskTracker, NameNode, DataNode, and the YARN ResourceManager.
  • Good experience with Microsoft Azure services such as HDInsight.
  • Configured Spark Streaming to receive real-time data from Kafka and store the streams to HDFS using Scala.
  • Extensive experience with Cloudera (CDH4 and CDH5), Hortonworks, and Amazon EMR Hadoop distributions on multi-node clusters.
  • Worked with job scheduler tools such as Control-M and Redwood.
  • Experience using Kafka and Kafka brokers with Spark Streaming to process live streaming data as RDDs.
  • Experienced in working with structured data using HiveQL: join operations, Hive UDFs, partitioning, bucketing, and internal/external tables.
  • Experienced in using Pig scripts for transformations, event joins, filters, and pre-aggregations before storing the data in HDFS.
  • Created custom UDFs for Pig and Hive to bring Python/Java methods and functionality into Pig Latin and HiveQL (see the UDF sketch after this list).
  • Designed logical and physical data models for various data sources on Amazon Redshift; worked with AWS Data Pipeline to configure data loads from S3 into Redshift.
  • Good experience with NoSQL databases such as HBase, MongoDB, and Cassandra.
  • Good working experience with AWS and Azure services.
  • Worked on importing data into HBase using HBase Shell and HBase Client API.
  • Experience in designing and developing tables in HBase and storing aggregated data from Hive Table.
  • Imported data from different sources such as AWS S3 and the local file system into Spark RDDs.
  • Experienced in designing both time-driven and data-driven automated workflows using Oozie and ZooKeeper.
  • Experience working with Solr to build search over unstructured data in HDFS.
  • Extensively used Solr indexing to enable searching on non-primary-key columns of Cassandra keyspaces.
  • Experience working with operating systems like Linux, UNIX, Solaris, and Windows.
  • Experience working with Hadoop in standalone, pseudo-distributed, and fully distributed modes.
  • Experience in writing stored procedures and complex SQL queries using relational databases like Oracle, SQL Server, and MySQL.
  • Skilled in data management, data extraction, manipulation, validation, and analysis of large volumes of data.
  • Good working knowledge of the Eclipse IDE for developing and debugging Java applications.
  • In-depth knowledge of cloud integration with AWS using Elastic MapReduce (EMR), Simple Storage Service (S3), EC2, and Redshift, as well as Microsoft Azure.
  • Detailed understanding of Software Development Life Cycle (SDLC) and strong knowledge in project implementation methodologies like Waterfall and Agile.
  • A good team player with the ability to solve problems and to organize and prioritize multiple tasks.
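
A minimal sketch of the MapReduce-to-RDD conversion noted above: a classic mapper/reducer word count re-expressed as Spark RDD transformations. Input and output paths are illustrative assumptions, not taken from the projects described here.

    import org.apache.spark.{SparkConf, SparkContext}

    object WordCountAsRdd {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("WordCountAsRdd"))

        // map phase -> flatMap/map; shuffle + reduce phase -> reduceByKey
        sc.textFile("hdfs:///data/input/logs")              // illustrative input path
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)
          .saveAsTextFile("hdfs:///data/output/wordcount")  // illustrative output path

        sc.stop()
      }
    }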
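
And a small sketch of a custom Hive UDF of the kind mentioned above, written against the classic org.apache.hadoop.hive.ql.exec.UDF API; the class name and normalization logic are illustrative assumptions.

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Trims and upper-cases a string column; returns null for null input
    class NormalizeCode extends UDF {
      def evaluate(input: Text): Text =
        if (input == null) null else new Text(input.toString.trim.toUpperCase)
    }

Such a UDF would typically be registered in HiveQL with ADD JAR followed by CREATE TEMPORARY FUNCTION pointing at the class (jar and function names being assumptions here).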

TECHNICAL SKILLS:

Big Data Technologies: Hadoop ecosystem, Spark, MapReduce, YARN, Kafka, Pig, Hive, HBase, Flume, Oozie, AWS, Elastic MapReduce, Azure.

Operating systems: Windows, UNIX, Linux

Programming Languages: Java, SQL, PL/SQL, Python, Linux shell scripting and REST.

RDBMS: Oracle 10g, MySQL, SQL Server, DB2.

NoSQL: HBase, Cassandra, MongoDB.

Tools Used: Eclipse, PuTTY, Jira, TFS, Control-M, Microsoft Suite, Tableau, Talend.

WORK EXPERIENCE:

Hadoop/Spark Developer

Confidential, Charlotte, NC

Responsibilities:

  • Worked on big data infrastructure for batch and real-time processing; built scalable, distributed data solutions using Hadoop.
  • Imported and exported terabytes of data using Sqoop, and ingested real-time data using Flume and Kafka.
  • Wrote Spark programs in Scala and Python for data quality checks.
  • Created various Hive external and staging tables and joined them as required; implemented static partitioning, dynamic partitioning, and bucketing in Hive using internal and external tables (see the partition-load sketch after this list).
  • Moved files between HDFS and AWS S3 and worked extensively with S3 buckets in AWS.
  • Wrote transformations and actions on DataFrames and used Spark SQL on DataFrames to access Hive tables from Spark for faster data processing.
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala.
  • Used Hive for transformations, joins, filters, and pre-aggregations after storing the data in HDFS.
  • Used Spark Streaming APIs to perform transformations and actions on the fly for building the common learner data model, which receives data from Kafka in near real time and persists it into Cassandra (see the streaming sketch after this list).
  • Worked extensively on importing metadata into Hive using Python and migrated existing tables and applications to the AWS cloud (S3).
  • Used Scala to convert Hive/SQL queries into RDD transformations in Apache Spark.
  • Implemented workflows using the Apache Oozie framework to automate tasks and used ZooKeeper to coordinate cluster services.
  • Applied Enterprise Data Warehouse (EDW) architecture and data modeling concepts such as star and snowflake schemas in the project.
  • Configured, deployed, and maintained multi-node Dev and Test Kafka clusters and implemented real-time data ingestion and handling using Kafka.
  • Performed benchmarking to optimize the performance of Spark jobs and improve overall processing.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive; created Hive tables, loaded them with data, and wrote Hive queries that invoke and run MapReduce jobs in the backend.
  • Used Amazon EMR for processing big data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
  • Designed ETL workflows in Tableau, deployed data from various sources to HDFS, and generated reports using Tableau.
  • Worked with the Scrum team to deliver agreed user stories on time in every sprint.
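
A minimal partition-load sketch, in Scala with Spark SQL and Hive support, of the staging-table-to-partitioned-table pattern referenced above. Table, column, and path names are illustrative assumptions; bucketing would be added with a CLUSTERED BY ... INTO n BUCKETS clause in the DDL.

    import org.apache.spark.sql.SparkSession

    object HivePartitionLoad {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("HivePartitionLoad")
          .enableHiveSupport()          // talk to the Hive metastore
          .getOrCreate()

        // External staging table over raw delimited files landed in HDFS
        spark.sql("""
          CREATE EXTERNAL TABLE IF NOT EXISTS stg_orders (
            order_id BIGINT, customer_id BIGINT, amount DOUBLE, order_dt STRING)
          ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
          LOCATION 'hdfs:///data/staging/orders'""")

        // Target table partitioned by date, stored as ORC
        spark.sql("""
          CREATE TABLE IF NOT EXISTS orders (
            order_id BIGINT, customer_id BIGINT, amount DOUBLE)
          PARTITIONED BY (order_dt STRING)
          STORED AS ORC""")

        // Dynamic-partition insert: partition values are derived from the data itself
        spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
        spark.sql("""
          INSERT OVERWRITE TABLE orders PARTITION (order_dt)
          SELECT order_id, customer_id, amount, order_dt FROM stg_orders""")

        spark.stop()
      }
    }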
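
And a streaming sketch of the Kafka-to-Cassandra path described above, assuming the spark-streaming-kafka-0-10 and spark-cassandra-connector libraries are on the classpath; broker, topic, keyspace, table, and payload layout are all illustrative assumptions.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
    import com.datastax.spark.connector._              // SomeColumns
    import com.datastax.spark.connector.streaming._    // saveToCassandra on DStreams

    object LearnerEventsToCassandra {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("LearnerEventsToCassandra")
          .set("spark.cassandra.connection.host", "cassandra-host")   // placeholder host
        val ssc = new StreamingContext(conf, Seconds(10))              // 10-second micro-batches

        val kafkaParams = Map[String, Object](
          "bootstrap.servers"  -> "broker1:9092",                      // placeholder broker
          "key.deserializer"   -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id"           -> "learner-model",
          "auto.offset.reset"  -> "latest")

        KafkaUtils.createDirectStream[String, String](
            ssc, PreferConsistent, Subscribe[String, String](Seq("learner-events"), kafkaParams))
          .map(_.value.split(","))                    // assume simple CSV payloads
          .map(f => (f(0), f(1), f(2)))               // (learner_id, event_type, event_ts)
          .saveToCassandra("learning", "learner_events",
            SomeColumns("learner_id", "event_type", "event_ts"))

        ssc.start()
        ssc.awaitTermination()
      }
    }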

Environment: Hadoop, MapReduce, HDFS, Yarn, Hive, Sqoop, Cassandra, Oozie, Spark, Scala, Python, AWS, Flume, Kafka, Tableau, Linux, Shell Scripting.

Hadoop Developer

Confidential, Baltimore, MD

Responsibilities:

  • Developed a 16-node cluster for designing the data lake with the Hortonworks distribution.
  • Responsible for data audits and data quality; first point of contact for issues in data from different sources.
  • Involved in designing the end-to-end data pipeline to ingest data into the data lake.
  • Worked on programming the Kafka producer and consumer, including connection parameters and methods, to carry data from Oracle Sonic JMS through to the data lake on HDFS (see the producer sketch after this list).
  • Worked in the Azure environment for the development and deployment of custom Hadoop applications.
  • Extracted data from Azure Data Lake into an HDInsight cluster, applied Spark transformations and actions, and loaded the results into HDFS.
  • Developed Python code to validate input XML files and separate out bad data before ingestion into Hive and the data lake.
  • Designed, developed, and delivered jobs and transformations to enrich the data and progressively promote it for consumption in the Pub layer of the data lake.
  • Worked with sequence files, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
  • Worked with linked services using Azure Data Factory (ADF) and Azure HDInsight to connect to Event Hubs.
  • Implemented various Azure services such as Azure SQL Database, Azure SQL Data Warehouse, Azure Analysis Services, HDInsight, Azure Data Lake, and Data Factory.
  • Responsible for developing unit test cases from the Kafka producer to the consumer, establishing the connection parameters from JMS to Kafka.
  • Worked on handling different data formats, including XML, JSON, ORC, and Parquet (see the format-reading sketch after this list).
  • Strong experience with Hadoop file system and Hive commands for data mappings.
  • Worked on ingestion from RDBMS sources using Sqoop and Flume.
  • Designed the Hive table structures, including creation, partitioning, and transformations over the data, based on business requirements, to enrich data in the three-layered BI data lake.
  • Developed documentation for individual modules such as Application, Hive, Operations, and Housekeeping, covering system integration and error handling.
  • Transformed the existing ETL logic to Hadoop mappings.
  • Extensive hands-on experience with Hadoop file system commands for file handling operations.
  • Worked on converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala.
  • Developed multiple POCs using PySpark, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
  • Developed code for reading multiple data formats from HDFS using PySpark.
  • Experienced in analyzing SQL scripts and designing solutions implemented in Hive.
  • Used Tableau capabilities such as data extracts, data blending, forecasting, dashboard actions, and table calculations to build dashboards.
  • Published customized interactive reports and dashboards, with report scheduling, using Tableau Server.
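
A minimal Scala sketch of a Kafka producer of the kind referenced above, using the standard Kafka clients API. The broker list, topic, and payload are illustrative assumptions, and the JMS-consumption side is omitted.

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    object JmsToKafkaPublisher {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put("bootstrap.servers", "broker1:9092")   // placeholder broker list
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("acks", "all")                         // wait for full acknowledgement

        val producer = new KafkaProducer[String, String](props)
        try {
          // In the real pipeline the payload would come from the JMS consumer callback
          val payload = """{"source":"jms","body":"example message"}"""
          producer.send(new ProducerRecord[String, String]("ingest-topic", "example-key", payload))
          producer.flush()
        } finally {
          producer.close()
        }
      }
    }

A matching consumer would subscribe to the same topic and write into the data lake, with unit tests exercising the producer-to-consumer path against the same connection parameters.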
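
And a format-reading sketch for the file formats mentioned above, shown here in Scala for consistency with the other sketches (the project itself used PySpark). Paths and the XML row tag are illustrative, and the XML read assumes the separate spark-xml package.

    import org.apache.spark.sql.SparkSession

    object MultiFormatReader {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("MultiFormatReader").getOrCreate()

        // Columnar and semi-structured formats supported out of the box
        val json    = spark.read.json("hdfs:///data/raw/events_json")
        val orc     = spark.read.orc("hdfs:///data/raw/events_orc")
        val parquet = spark.read.parquet("hdfs:///data/raw/events_parquet")

        // XML needs the external spark-xml package on the classpath
        val xml = spark.read
          .format("com.databricks.spark.xml")
          .option("rowTag", "record")                 // illustrative row tag
          .load("hdfs:///data/raw/events_xml")

        // Touch each DataFrame, e.g. to report row counts per format
        Seq("json" -> json, "orc" -> orc, "parquet" -> parquet, "xml" -> xml)
          .foreach { case (name, df) => println(s"$name: ${df.count()} rows") }

        spark.stop()
      }
    }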

Environment: Hadoop Frameworks, HDP 2.5.3, Kafka, Azure, Hive, Sqoop, Python, Spark, Shell Scripting, Oracle SONIC JMS, Java 8.0 & 7.0, Eclipse, Tableau.

Java Developer

Confidential

Responsibilities:

  • Involved in the design and development of the Search, Executive Connections, and Compensation modules.
  • Used the Hibernate ORM framework with the Spring Framework for data persistence and transaction management.
  • Designed the technical specifications and components based on client requirements.
  • Identified the stories from the backlog relating to the release plan prior to starting each sprint.
  • Involved in planning the stories and tasks to be taken up on a sprint-by-sprint basis.
  • Designed and documented REST/HTTP APIs, including JSON data formats and API versioning strategy.
  • Developed web services for sending data to and receiving data from different applications using SOAP messages.
  • Monitored and guided the team to resolve technical issues on a daily basis.
  • Developed dynamic web pages using JSP and servlets.
  • Effectively wrote Criteria queries and HQL using Hibernate.
  • Implemented REST web services using Restlet.
  • Developed the POJO classes and mapped them with Hibernate XML mapping files.

Environment: Java, J2EE, Servlets, REST, Oracle 10g, Hibernate, JavaScript, HTML
