
Hadoop-Spark Developer Resume


San Jose, CA

SUMMARY:

  • 8+ years of experience in software design, development, implementation, and support of applications built on Big Data (Hadoop) and Java technologies.
  • 3.6 years of experience with the Hadoop ecosystem, including Spark, Scala, HDFS, MapReduce, Hive, Pig, Storm, Kafka, YARN, HBase, Oozie, ZooKeeper, Flume, and Sqoop.
  • Assisted in cluster maintenance, monitoring, and troubleshooting, and in managing and reviewing data backups and log files.
  • Excellent ability to use analytical tools to mine data, perform predictive analysis, evaluate underlying patterns, and implement complex algorithms for data analysis.
  • 1.5 years of hands-on experience with Spark, Spark Streaming, Spark MLlib, and Scala.
  • Created and worked with DataFrames in Spark using Scala.
  • Hands-on experience developing UDFs, DataFrames, and SQL queries in Spark SQL (illustrated in the sketch following this summary).
  • Developed Pig Latin scripts and Spark SQL scripts for data transformation.
  • Hands-on experience with real-time data tools such as Kafka and Storm.
  • Developed Sqoop scripts for importing large datasets from RDBMS to HDFS.
  • Created UDFs in Java and registered them in Pig and Hive.
  • Good understanding of Spark architecture and its components.
  • Experience in writing Pig Latin scripts.
  • Experience in writing UDFs in Java for Pig and Hive.
  • Efficient in writing MapReduce programs for analyzing structured and unstructured data.
  • Expertise in working with the Hive data warehouse tool: creating tables, distributing data by implementing partitioning and bucketing, and writing and optimizing HiveQL queries.
  • Experience in using Apache Sqoop to import and export data to and from HDFS and Hive.
  • Hands-on experience in setting up workflows using the Apache Oozie workflow engine for managing and scheduling Hadoop jobs.
  • Experience in scheduling jobs using Oozie Coordinator, Bundle, and crontab.
  • Cloud infrastructure: experience with AWS components such as Amazon EC2 instances, S3 buckets, CloudFormation templates, and the Boto library.
  • Experience with Azure components such as Azure SQL Database and Data Factory.
  • Experienced in working with different file formats: Avro, Parquet, RC, and ORC.
  • Hands-on experience in configuring and working with Flume to load data from multiple sources directly into HDFS.
  • Experienced and skilled Agile developer with a strong record of excellent teamwork and successful coding.
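A minimal sketch of the Spark SQL DataFrame and UDF work summarized above, written in Scala against the SparkSession API. The input path, column names, and the normalize_state function are hypothetical placeholders for illustration only, not details taken from the projects below.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, udf}

    object SparkSqlUdfSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("SparkSqlUdfSketch")
          .enableHiveSupport()
          .getOrCreate()

        // Read a delimited file into a DataFrame (path and columns are placeholders).
        val customers = spark.read
          .option("header", "true")
          .csv("hdfs:///data/customers.csv")

        // UDF that normalizes a state code; also registered for use from Spark SQL.
        val normalizeState = udf((s: String) => if (s == null) null else s.trim.toUpperCase)
        spark.udf.register("normalize_state", (s: String) => if (s == null) null else s.trim.toUpperCase)

        // The same transformation through the DataFrame API and through a SQL query.
        val cleaned = customers.withColumn("state", normalizeState(col("state")))
        cleaned.createOrReplaceTempView("customers")
        spark.sql(
          "SELECT normalize_state(state) AS state, COUNT(*) AS cnt FROM customers GROUP BY state"
        ).show()

        spark.stop()
      }
    }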

TECHNICAL SKILLS:

Hadoop Technologies and Distributions: Apache Hadoop, Cloudera Hadoop Distribution CDH3, CDH4, CDH5 and Hortonworks Data Platform (HDP)

Hadoop Ecosystem: HDFS, Hive, Pig, Sqoop, Oozie, Flume, Spark, ZooKeeper, MapReduce, Spark SQL, Spark Streaming and Spark MLlib.

NoSQL Databases: HBase, Cassandra

Programming and Build Tools: C, C++, Python, Java, Scala, PL/SQL, SBT, Maven

RDBMS: Oracle, MySQL, SQL Server

Web Development: HTML, JSP, Servlets, JavaScript, CSS, XML

IDEs: Eclipse 4.x, NetBeans, Microsoft Visual Studio

Operating Systems: Linux (RedHat, CentOS), Windows XP/7/8 and z/OS (mainframes)

Web Servers: Apache Tomcat

Cluster Management Tools: Cloudera Manager, Hortonworks Ambari and Hadoop Security Tools

PROFESSIONAL EXPERIENCE:

Confidential, San Jose, CA

Hadoop-Spark Developer

Responsibilities:

  • Involved in creating Hive tables and loading and analyzing data using Hive queries.
  • Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Worked on a cluster of 135 nodes.
  • Implemented Spark jobs in Scala, utilizing the DataFrame and Spark SQL APIs for faster data processing.
  • Created RDDs, DataFrames, and Datasets.
  • Created Hive tables and loaded data from Teradata using Sqoop.
  • Worked on tuning back-end stored procedures using TOAD.
  • Used Talend Open Studio to design ETL jobs for data processing.
  • Used ORC and Parquet file formats for storing data.
  • Wrote Java code to execute SQL queries and to retrieve SQL queries from text files.
  • Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Used Sqoop to transfer data between RDBMS and Hadoop Distributed File System.
  • Used Eclipse for development, testing, and debugging of the application.
  • Used Python scripts to move data from cluster to cluster.
  • Used the Log4j framework for logging debug, info, and error messages.
  • Created Hive external and managed tables.
  • Designed and maintained Airflow workflow configurations to manage the flow of jobs in the cluster.
  • Loaded data into Spark RDDs and performed in-memory computation to generate output responses.
  • Used Spark shared variables (broadcast variables and accumulators) when a value needed to be shared across the nodes, as illustrated in the sketch after this list.
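A minimal sketch of that shared-variable pattern in Spark 2.x-style Scala. The lookup map, input and output paths, and counter name are hypothetical placeholders, not project artifacts.

    import org.apache.spark.{SparkConf, SparkContext}

    object SharedVariablesSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("SharedVariablesSketch"))

        // Broadcast a small lookup table once so every executor reuses the same copy.
        val countryNames = sc.broadcast(Map("US" -> "United States", "IN" -> "India"))

        // Accumulator that counts records with an unknown country code
        // (counts are approximate if tasks are retried).
        val unknownCodes = sc.longAccumulator("unknownCountryCodes")

        // Placeholder input: one country code per line.
        val records = sc.textFile("hdfs:///data/events.txt")
        val named = records.map { code =>
          countryNames.value.get(code) match {
            case Some(name) => name
            case None =>
              unknownCodes.add(1L)
              "UNKNOWN"
          }
        }

        named.saveAsTextFile("hdfs:///data/events_named")
        println(s"Records with unknown country codes: ${unknownCodes.value}")
        sc.stop()
      }
    }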

Environment: Hadoop, MapReduce, Hive, Pig, Spring Batch, Scala, Sqoop, Bash scripting, Spark RDD, Spark SQL, Spark DataFrames

Confidential, Horsham, PA

Hadoop Developer

Responsibilities:

  • Involved in the design and development of technical specifications.
  • Wrote shell scripts to pull data from the Tumbleweed server to the Cornerstone staging area.
  • Converted data from EBCDIC to ASCII format.
  • Wrote Sqoop commands to pull data from the Teradata source.
  • Wrote Pig scripts to preprocess data before loading it into Cornerstone.
  • Optimized Hive scripts.
  • Registered feed metadata in MySQL tables.
  • Wrote shell scripts and scheduled jobs through UNIX cron.
  • Wrote job workflows using Spring Batch.
  • Worked on project deployment from the Gold cluster to the Platinum cluster.
  • Provided support for the PRD support team.
  • Worked closely with the Hadoop security team and infrastructure team to implement security.
  • Implemented authentication and authorization services using the Kerberos protocol.
  • Designed and implemented streaming data display in the UI with Scala.js.
  • Hands-on experience with systems-building languages such as Scala and Java.
  • Wrote programs for validating, normalizing, and enriching data, plus a REST API to drive a UI based on manual QA validation; used Spark SQL and Scala to run QA-oriented SQL queries.
  • Created RDDs and pair RDDs for Spark programming.
  • Implemented joins, groupings, and aggregations on the pair RDDs (see the sketch following this list).
  • Saved the results in Hive for downstream consumers to access.
  • Used DataFrames for data transformations.
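A minimal sketch of the pair-RDD join, grouping, and aggregation pattern referenced above. The member, claim, and plan data is invented purely for illustration.

    import org.apache.spark.{SparkConf, SparkContext}

    object PairRddJoinSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("PairRddJoinSketch"))

        // Hypothetical inputs: (memberId, claimAmount) and (memberId, planName).
        val claims = sc.parallelize(Seq(("m1", 120.0), ("m2", 75.5), ("m1", 40.0)))
        val plans  = sc.parallelize(Seq(("m1", "GOLD"), ("m2", "SILVER")))

        // Aggregation: total claim amount per member.
        val totals = claims.reduceByKey(_ + _)

        // Join the aggregated totals with the plan lookup on the member key.
        val joined = totals.join(plans) // (memberId, (totalAmount, planName))

        // Grouping: collect members and their totals under each plan.
        val byPlan = joined
          .map { case (memberId, (total, plan)) => (plan, (memberId, total)) }
          .groupByKey()

        byPlan.collect().foreach { case (plan, members) =>
          println(s"$plan -> ${members.mkString(", ")}")
        }
        sc.stop()
      }
    }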

Environment: Hadoop, MapReduce, Hive, Pig, Spring Batch, Scala, Sqoop, Bash scripting, Spark RDD, Spark SQL.

Confidential, Kansas, MO

Hadoop Developer

Responsibilities:

  • Developed Big Data Solutions that enabled the business and technology teams to make data-driven decisions on the best ways to acquire customers and provide them business solutions.
  • Involved in installing, configuring and managing Hadoop Ecosystem components like Spark, Hive, Pig, Sqoop, Kafka and Flume.
  • Involved in installing Hadoop and Spark clusters on Amazon Web Services (AWS).
  • Worked with Amazon EC2 instances, S3 buckets, CloudFormation templates, and the Boto library.
  • Migrated the existing data to Hadoop from RDBMS (SQL Server and Oracle) using Sqoop for processing the data.
  • Responsible for data ingestion using tools such as Flume and Kafka.
  • Responsible for loading and managing unstructured and semi-structured data coming from different sources into the Hadoop cluster using Flume.
  • Developed Spark Programs for Batch and Real Time Processing.
  • Developed Spark Streaming applications for Real Time Processing.
  • Developed MapReduce programs to cleanse and parse data in HDFS obtained from various data sources and to perform map-side joins using the distributed cache.
  • Used Hive data warehouse tool to analyze the data in HDFS and developed Hive queries.
  • Created internal and external tables with properly defined static and dynamic partitions for efficiency (see the sketch after this list).
  • Used the RegEx, JSON, and Avro SerDes packaged with Hive for serialization and deserialization when parsing the contents of streamed log data.
  • Implemented Hive custom UDFs to achieve comprehensive data analysis.
  • Used Talend Open Studio to load files into Hadoop Hive tables and performed ETL aggregations in Hive.
  • Designed and created ETL jobs in Talend to load huge volumes of data into Cassandra, the Hadoop ecosystem, and relational databases.
  • Implemented authentication and authorization services using the Kerberos authentication protocol.
  • Used Pig to develop ad-hoc queries.
  • Exported the business required information to RDBMS using Sqoop to make the data available for BI team to generate reports based on data.
  • Implemented daily workflow for extraction, processing and analysis of data with Oozie.
  • Responsible for troubleshooting MapReduce jobs by reviewing the log files.
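A minimal sketch of the partitioned internal/external table pattern referenced above, issued through Spark's Hive support. The table names, columns, and the choice of the HCatalog JSON SerDe (which must be on the classpath) are illustrative assumptions, not details of the actual project.

    import org.apache.spark.sql.SparkSession

    object PartitionedHiveTableSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("PartitionedHiveTableSketch")
          .enableHiveSupport()
          .getOrCreate()

        // External table over raw JSON log data, partitioned by ingestion date.
        spark.sql(
          """CREATE EXTERNAL TABLE IF NOT EXISTS raw_logs (
            |  event_id STRING,
            |  event_type STRING,
            |  payload STRING
            |)
            |PARTITIONED BY (ingest_date STRING)
            |ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
            |LOCATION 'hdfs:///data/raw_logs'""".stripMargin)

        // Register a partition for a specific day (static partition).
        spark.sql("ALTER TABLE raw_logs ADD IF NOT EXISTS PARTITION (ingest_date='2016-04-01')")

        // Dynamic-partition insert into a managed, ORC-backed table.
        spark.sql("SET hive.exec.dynamic.partition=true")
        spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
        spark.sql(
          """CREATE TABLE IF NOT EXISTS clean_logs (
            |  event_id STRING,
            |  event_type STRING
            |)
            |PARTITIONED BY (ingest_date STRING)
            |STORED AS ORC""".stripMargin)
        spark.sql(
          """INSERT OVERWRITE TABLE clean_logs PARTITION (ingest_date)
            |SELECT event_id, event_type, ingest_date FROM raw_logs""".stripMargin)

        spark.stop()
      }
    }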

Environment: Hadoop, Spark, Spark Streaming, Spark MLlib, Scala, Hive, Pig, HCatalog, MapReduce, Oozie, Sqoop, Flume, Kafka, Kerberos.

Confidential, Grand Rapids, Michigan

Hadoop Developer

Responsibilities:

  • Loaded files to HDFS and wrote Hive queries to process the required data.
  • Loaded data into Hive tables and wrote queries to process it.
  • Involved in loading data from the Linux file system to HDFS.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Experience in managing and reviewing Hadoop log files.
  • Worked on Hive to expose data for further analysis and to transform files from different analytical formats into text files.
  • Imported and exported data to and from HDFS and Hive using Sqoop.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Worked on configuring multiple MapReduce pipelines for the new Hadoop cluster.
  • Performance tuned and optimized Hadoop clusters to achieve high performance.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Monitored system health and logs and responded to any warning or failure conditions.
  • Responsible for managing test data coming from different sources.
  • Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Held weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Implemented schedulers on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
  • Extensive hands-on experience with Hadoop file system commands for file-handling operations (a sketch of the equivalent FileSystem API usage follows this list).
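A small sketch of the Linux-to-HDFS loading referenced above, using the Hadoop FileSystem API from Scala as a programmatic alternative to hdfs dfs shell commands. All paths are placeholders.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    object HdfsLoadSketch {
      def main(args: Array[String]): Unit = {
        // Picks up core-site.xml / hdfs-site.xml from the classpath or HADOOP_CONF_DIR.
        val conf = new Configuration()
        val fs = FileSystem.get(conf)

        val localFile = new Path("file:///var/staging/transactions.csv") // placeholder local path
        val hdfsDir   = new Path("/data/raw/transactions")               // placeholder HDFS path

        // Create the target directory if it does not exist, then copy the file in.
        if (!fs.exists(hdfsDir)) {
          fs.mkdirs(hdfsDir)
        }
        fs.copyFromLocalFile(localFile, hdfsDir)

        // Verify the load by listing the directory contents.
        fs.listStatus(hdfsDir).foreach(status => println(status.getPath))

        fs.close()
      }
    }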

Environment: Hadoop, Map Reduce, HDFS, Hive 0.10.1, Java, Hadoop distribution of Cloudera, Pig 0.11.1, HBase 0.94.1, Linux, Sqoop 1.4.4, Kafka, Zookeeper 3.4.3, Oozie 3.3.0, Tableau.

Confidential

Java Developer

Responsibilities:

  • Used Microsoft Visio and Rational Rose to design the use case diagrams, class models, sequence diagrams, and activity diagrams for the application's SDLC process.
  • Deployed GUI pages using JSP, JSTL, HTML, DHTML, XHTML, CSS, JavaScript, and AJAX.
  • Configured the project on WebSphere 6.1 application servers.
  • Implemented the online application using Core Java, JDBC, JSP, Servlets, EJB 1.1, web services, SOAP, and WSDL.
  • Communicated with other health care systems using web services built with SOAP, WSDL, and JAX-RPC.
  • Used the Singleton, Factory, and DAO design patterns based on application requirements.
  • Used SAX and DOM parsers to parse raw XML documents.
  • Used RAD as Development IDE for web applications.
  • Prepared and executed unit test cases.
  • Used the Log4j logging framework to write log messages at various levels.
  • Involved in fixing bugs and minor enhancements for the front-end modules.
  • Performed functional and technical reviews.
  • Supported the testing team for system testing, integration testing, and UAT.
  • Ensured quality in the deliverables.
  • Conducted Design reviews and Technical reviews with other project stakeholders.
  • Was a part of the complete life cycle of the project, from requirements to production support.
  • Created test plan documents for all back-end database modules.
  • Implemented the project in Linux environment.

Environment: JDK 1.5, JSP, WebSphere, JDBC, EJB 2.0, XML, DOM, SAX, XSLT, CSS, HTML, JNDI, Web
