
Hadoop/Spark Developer Resume


Jacksonville, FL

SUMMARY:

  • 6+ years of IT experience, including 4+ years working with Big Data and Cloudera in the Healthcare, Telecom, Hardware, and Banking domains. Involved in all SDLC phases from analysis, design, development, and testing through implementation and maintenance, with timely delivery against aggressive deadlines in both Agile/Scrum and Waterfall methodologies. Good experience in installing, configuring, and leveraging the Hadoop ecosystem to glean meaningful insights from semi-structured and unstructured data. Excellent communication, management, and presentation skills.
  • Good understanding of Software Development Life Cycle (SDLC) and sound knowledge of project implementation methodologies including Waterfall and Agile.
  • Strong working experience with ingestion, storage, processing and analysis of big data.
  • Successfully loaded files to HDFS from Oracle, SQL Server, and Teradata using Sqoop.
  • Worked with Sqoop to import and export data between databases such as MySQL and Oracle and HDFS/Hive.
  • Proficient in Java, J2EE, Servlets, JSP, Spring, and Hibernate.
  • Experience with cloud configuration in Amazon Web Services (AWS).
  • Experience working with structured and unstructured data in various file formats such as Avro, XML, JSON, SequenceFile, ORC, and Parquet.
  • Experience with Oozie Workflow Engine to automate and parallelize Hadoop, MapReduce and Pig jobs.
  • Experience in working with databases such as Oracle, SQL Server, and MySQL.
  • Extensive experience with ETL and Query tools for Big Data like Pig Latin and HiveQL.
  • Experience in developing front-end systems with HTML5, JavaScript, CSS3, Bootstrap, JSON, jQuery, and Ajax.
  • Experience in developing data pipelines using Kafka, Spark, and Hive to ingest, transform, and analyze data.
  • Experience in data modeling, connecting to Cassandra from Spark, and saving summarized DataFrames to Cassandra.
  • Used Spark to improve the performance and optimize existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Developed applications using Scala, Spark SQL, and MLlib along with Kafka and other tools as required, then deployed them on the YARN cluster.
  • Experience with the Big Data ecosystem using the Hadoop framework and related technologies such as HDFS, HBase, MapReduce, Hive, Pig, Flume, Oozie, Kafka, Sqoop, ZooKeeper, YARN, Spark (PySpark & spark-shell), Cassandra, and NiFi.
  • Adequate knowledge and working experience in Agile & Waterfall methodologies.
  • Developed and maintained web applications using the Tomcat and Confidential WebSphere web servers.
  • Experience in job workflow scheduling and monitoring tools like Oozie and NiFi.
  • Experience in front-end technologies like HTML, CSS, HTML5, CSS3, and Ajax.
  • Experience in building high performance and scalable solutions using various Hadoop ecosystem tools like Pig, Hive, Sqoop, Spark, Solr and Kafka.
  • Extensive experience in working with Oracle, MS SQL Server, DB2, MySQL RDBMS databases.
  • Well versed and hands-on with version control tools like Git, CVS, and SVN.
  • Expert in implementing advanced procedures like text analytics and processing using Apache Spark in Scala.
  • Good knowledge in job workflow scheduling and monitoring tools like Oozie and Zookeeper.
  • Responsible for deploying scripts to the GitHub version control repository hosting service and deploying the code using Jenkins.
  • Primarily involved in the data migration process on Azure, integrating with the GitHub repository and Jenkins.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Spark SQL in Scala (a minimal sketch follows this summary).
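
A minimal Scala sketch of this kind of Hive/SQL-to-Spark conversion is shown below. The `claims` table, its columns, and the query itself are hypothetical placeholders used only for illustration, not taken from any actual project.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveToSparkSketch {
  def main(args: Array[String]): Unit = {
    // Hive-enabled session so spark.sql() and spark.table() resolve against the Hive metastore.
    val spark = SparkSession.builder()
      .appName("HiveToSparkSketch")
      .enableHiveSupport()
      .getOrCreate()

    // The original HiveQL, submitted unchanged through Spark SQL.
    val viaSql = spark.sql(
      "SELECT state, COUNT(*) AS claim_count FROM claims WHERE status = 'OPEN' GROUP BY state")
    viaSql.show()

    // The same logic re-expressed as DataFrame transformations.
    val viaDataFrame = spark.table("claims")
      .filter(col("status") === "OPEN")
      .groupBy("state")
      .count()
      .withColumnRenamed("count", "claim_count")
    viaDataFrame.show()

    spark.stop()
  }
}
```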

TECHNICAL SKILLS:

Data Ingestion: Sqoop, Kafka, Flume, NiFi, Apache Hadoop ecosystem.

Monitoring: Ambari, Cloudera Manager.

Relational Databases: Oracle, MySQL, Microsoft SQL Server, Oracle SQL & Access.

Cloud (AWS): EMR, EC2, S3, DynamoDB.

NoSQL Databases: MongoDB, Cassandra, HBase, DynamoDB.

Version Control: GIT, SVN.

Operating Systems: Linux, Windows, UNIX.

Data Processing: Spark, Impala, YARN, MapReduce.

Distributed Storage and Computing: HDFS, Zookeeper.

Data Formats: Parquet, Sequence, AVRO, ORC, CSV, JSON.

Programming Languages: Python, Scala, SQL, Java.

PROFESSIONAL EXPERIENCE:

Confidential - Jacksonville, FL

Hadoop/Spark Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Cloudera Hadoop.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
  • Created RDDs and applied data filters in Spark, and created Cassandra tables and Hive tables for user access.
  • Used partitioning, bucketing, map-side joins, and parallel execution to optimize Hive queries, decreasing execution time from hours to minutes.
  • Worked with Amazon EMR to process data directly in S3 and to copy data from S3 to HDFS on the Amazon EMR cluster, setting up Spark Core for the analysis work.
  • Imported data from MySQL to HDFS and vice versa using Sqoop, and configured the Hive metastore on MySQL to store the metadata for Hive tables.
  • Designed Oozie workflows with different actions such as Sqoop, Pig, Hive, and shell actions.
  • Mastered major Hadoop distributions like Hortonworks and Cloudera and numerous open-source projects, and prototyped various applications that utilize modern Big Data tools.
  • Analyzed large and critical datasets using Cloudera, HDFS, HBase, MapReduce, Hive UDF, Pig, Zookeeper and Spark.
  • Developed Hive scripts, Pig scripts, Unix shell scripts, and Spark programs for all ETL load processes and for converting files into Parquet in HDFS (see the sketch after this list).
  • Loaded and transformed large sets of structured and semi-structured data through Sqoop.
  • Optimized existing algorithms in Hadoop using SparkContext, Hive SQL, and DataFrames.
  • Implemented Spark applications using higher-order functions for both batch and interactive analysis requirements; experienced in developing Spark scripts for data analysis using PySpark.
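
A minimal Scala sketch of one such ETL load (raw delimited files converted to Parquet in HDFS) follows. The HDFS paths, file layout, and column names are assumptions made for illustration and are not taken from the actual engagement.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.sql.functions._

object CsvToParquetLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CsvToParquetLoad")
      .getOrCreate()

    // Read raw CSV files from a landing zone in HDFS (path is a placeholder).
    val raw = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///landing/claims/*.csv")

    // Light cleanup before the write: drop fully empty rows and normalize a date column.
    val cleaned = raw
      .na.drop("all")
      .withColumn("claim_date", to_date(col("claim_date"), "MM/dd/yyyy"))

    // Write back to HDFS as Parquet, partitioned so downstream Hive queries can prune files.
    cleaned.write
      .mode(SaveMode.Overwrite)
      .partitionBy("claim_date")
      .parquet("hdfs:///warehouse/claims_parquet")

    spark.stop()
  }
}
```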

Environment: Hadoop 3.0, Spark, Cassandra, Hive 2.3, Redshift, HDFS, MySQL, Sqoop 1.4, NoSQL, Oozie 4.3, Pig, Hortonworks, MapReduce, HBase 1.4, Zookeeper, Unix, Kafka 1.1, JSON, Python 3.6, PySpark

Confidential - St. Louis, MO

Big Data/Hadoop Developer

Responsibilities:

  • Ingested terabytes of clickstream data from external systems like FTP servers and S3 buckets into HDFS using custom input adaptors.
  • Implemented end-to-end pipelines for performing user behavioral analytics to identify user-browsing patterns and provide rich experience and personalization to the visitors.
  • Used the HDFS FileSystem API to connect to the FTP server and HDFS, and the AWS S3 SDK to connect to S3 buckets.
  • Wrote Scala-based Spark applications for performing various data transformations, denormalization, and other custom processing.
  • Implemented data pipeline using Spark, Hive, Sqoop and Kafka to ingest customer behavioral data into Hadoop platform to perform user behavioral analytics.
  • Developed Spark Streaming jobs in Scala for real-time processing (see the streaming sketch after this list).
  • Involved in creating external Hive tables from the files stored in the HDFS.
  • Optimized the Hive tables using techniques like partitioning and bucketing to improve the execution of HiveQL queries.
  • Used Spark SQL to read data from Hive tables and perform various transformations such as changing date formats and splitting complex columns.
  • Wrote a Spark application to load the transformed data back into the Hive tables in Parquet format.
  • Imported unstructured data like logs from different web servers to HDFS using Flume and developed MapReduce jobs for log analysis, recommendations and analytics.
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
  • Worked with Hive and the NoSQL database HBase to create tables and store data.
  • Worked on setting up Pig, Hive and HBase on multiple nodes and developed using Pig, Hive, HBase and MapReduce.
  • Analyzed large amounts of data every day, including XML, JSON, and relational files from different data sources.
  • Applied MapReduce framework jobs in Java for data processing after installing and configuring Hadoop and HDFS.
  • Used Oozie Operational Services for batch processing and scheduling workflows dynamically.
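
A minimal Scala sketch of this kind of real-time job is given below, written against Spark's Structured Streaming Kafka source (the original jobs may equally have used the DStream API). The broker address, topic name, and event schema are hypothetical, and the spark-sql-kafka connector is assumed to be on the classpath.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object ClickstreamStreaming {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ClickstreamStreaming")
      .getOrCreate()
    import spark.implicits._

    // Expected JSON layout of each Kafka message (illustrative only).
    val schema = new StructType()
      .add("userId", StringType)
      .add("page", StringType)
      .add("eventTime", TimestampType)

    // Subscribe to the clickstream topic and parse the JSON payload.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "clickstream")
      .load()
      .select(from_json($"value".cast("string"), schema).as("event"))
      .select("event.*")

    // Count page views per user over 5-minute event-time windows.
    val counts = events
      .withWatermark("eventTime", "10 minutes")
      .groupBy(window($"eventTime", "5 minutes"), $"userId")
      .count()

    // Write the rolling counts to the console; a production job would target Hive or HBase instead.
    counts.writeStream
      .outputMode("update")
      .format("console")
      .start()
      .awaitTermination()
  }
}
```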

Environment: Hive 2.3, Pig 0.17, HDFS, Flume 1.8, MapReduce, Unix, NoSQL, HBase, NiFi, Python, MySQL, Cassandra 3.11, Scala, Kafka, Impala, Oozie, Oracle

Confidential - Dallas, TX

Hadoop Developer

Responsibilities:

  • Primary responsibilities include building scalable distributed data solutions using the Hortonworks Hadoop ecosystem.
  • Used Sqoop to transfer data between databases (Oracle & Teradata) and HDFS, and used Flume to stream the log data from servers.
  • Developed MapReduce programs for pre-processing and cleansing the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
  • Experienced in managing and reviewing Hadoop log files.
  • Developed MapReduce programs in Java for parsing the raw data and populating staging tables.
  • Conducted data extraction, including analyzing, reviewing, and modeling based on requirements, using higher-level tools such as Hive and Pig.
  • Implemented partitions and buckets in Hive for optimization (see the sketch after this list).
  • Involved in creating Hive tables, loading structured data, and writing Hive queries that run internally as MapReduce jobs.
  • Used Spark Core and Spark SQL for transformations in Python based on the business requirements.
  • Created HBase tables to store data in various formats coming from different portfolios.
  • Experience in troubleshooting MapReduce jobs by reviewing log files.
  • Developed end-to-end search solution using web crawler & Search Platform Apache SOLR.
  • Analytical, organized, and enthusiastic about working in a fast-paced, team-oriented environment; expertise in interacting with business users, understanding their requirements, and providing solutions to match them.
  • Proactive in time management and problem-solving, self-motivated, with good analytical skills.
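
A minimal Scala sketch of the partitioning side of that Hive optimization follows; the database, table, and column names are placeholders. Bucketing (CLUSTERED BY ... INTO n BUCKETS) would be added to the same DDL, with bucketed inserts typically populated from Hive itself.

```scala
import org.apache.spark.sql.SparkSession

object HivePartitionedLoad {
  def main(args: Array[String]): Unit = {
    // Hive-enabled session; all object names below are illustrative.
    val spark = SparkSession.builder()
      .appName("HivePartitionedLoad")
      .enableHiveSupport()
      .getOrCreate()

    // Partitioned target table stored as ORC.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS analytics.orders_opt (
        |  order_id    STRING,
        |  customer_id STRING,
        |  amount      DOUBLE
        |)
        |PARTITIONED BY (order_date STRING)
        |STORED AS ORC""".stripMargin)

    // Dynamic-partition insert from a staging table holding the Sqoop-imported rows.
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql(
      """INSERT OVERWRITE TABLE analytics.orders_opt PARTITION (order_date)
        |SELECT order_id, customer_id, amount, order_date
        |FROM analytics.orders_raw""".stripMargin)

    spark.stop()
  }
}
```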

Environment: Hortonworks, Hadoop, Teradata, Spark, Spark SQL, MapReduce, HBase, SQL, Sqoop, HDFS, Flume, UML, Apache Solr, Hive, Oozie, Cassandra, Maven, Pig, Shell Scripting, Python, and Git.

Confidential - San Francisco, CA

Java/J2EE Developer

Responsibilities:

  • Worked in SDLC methodology followed Waterfall environment including Acceptance Test Driven Design and Continuous Integration/Delivery.
  • Responsible for analyzing, designing, developing, coordinating, and deploying web-based applications.
  • Developed the application using Spring MVC Framework that uses Model View Controller (MVC) architecture with JSP as the view.
  • Used Spring MVC for the management of application flow by developing configurable handler mappings, view resolution.
  • Used Spring Framework to inject the DAO and Bean objects by auto wiring the components.
  • Developed front-end applications using HTML, CSS, JavaScript, and jQuery.
  • Designed and developed XSLT transformation of components to convert data from XML to HTML.
  • Implemented the project using JAX-WS based Web Services using WSDL, UDDI, and SOAP to communicate with other systems.
  • Monitored error logs using Log4j; Maven was used as a build tool, and continuous integration was done using Jenkins.
  • Used complex queries like SQL statements and procedures to fetch the data from the database.
  • Used Git as the version control repository and ServiceNow for issue tracking.
  • Developed test cases and performed unit testing using JUnit.
  • Used Ant as a build tool and developed build files for compiling the code and creating WAR files.
  • Used Tortoise SVN for Source Control and Version Management.

Environment: Java, J2EE, MVC, JUnit, JavaBeans, HTML, CSS, JavaScript, jQuery, Oracle, Hibernate, SQL, SOAP, Eclipse, Ant, Maven
