Hadoop-Spark Developer Resume
San Jose, CA
SUMMARY:
- 8+ years of experience in software design, development, implementation, and support of applications built on Big Data (Hadoop) and Java technologies.
- 3.6 years of experience with the Hadoop ecosystem, including Spark, Scala, HDFS, MapReduce, Hive, Pig, Storm, Kafka, YARN, HBase, Oozie, ZooKeeper, Flume, and Sqoop.
- Assisted in cluster maintenance, cluster monitoring and troubleshooting, and managing and reviewing data backups and log files.
- Skilled in using analytical tools to mine data, perform predictive analysis, evaluate underlying patterns, and implement complex algorithms for data analysis.
- 1.5 years of hands-on experience with Spark, Spark Streaming, Spark MLlib, and Scala.
- Created and worked with DataFrames in Spark using Scala.
- Hands-on experience developing UDFs, DataFrames, and SQL queries in Spark SQL (a brief sketch follows this summary).
- Developed Pig Latin and Spark SQL scripts for data transformation.
- Hands-on experience with real-time data tools such as Kafka and Storm.
- Developed Sqoop scripts for importing large datasets from RDBMS into HDFS.
- Created UDFs in Java and registered them in Pig and Hive.
- Good understanding of Spark architecture and its components.
- Experience in writing Pig Latin scripts.
- Experience in writing UDFs in Java for Pig and Hive.
- Efficient in writing MapReduce programs for analyzing structured and unstructured data.
- Expertise in working with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries.
- Experience in using Apache Sqoop to import and export data to and from HDFS and Hive.
- Hands-on experience setting up workflows with the Apache Oozie workflow engine for managing and scheduling Hadoop jobs.
- Experience in scheduling jobs using Oozie Coordinator, Oozie Bundle, and crontab.
Cloud Infrastructure:
- Experience with AWS components such as Amazon EC2 instances, S3 buckets, CloudFormation templates, and the Boto library.
- Experience with Azure components such as Azure SQL Database and Data Factory.
- Experienced in working with different file formats: Avro, Parquet, RCFile, and ORC.
- Hands-on experience configuring and working with Flume to load data from multiple sources directly into HDFS.
- Experienced and skilled Agile developer with a strong record of excellent teamwork and successful coding.
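To ground the Spark SQL and UDF points above, here is a minimal sketch in Scala of registering a UDF and querying a DataFrame; the dataset, column names, and UDF logic are illustrative assumptions rather than code from any specific engagement.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object SparkSqlUdfSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-sql-udf-sketch")
      .getOrCreate()
    import spark.implicits._

    // Illustrative DataFrame; in practice the data would come from HDFS or Hive.
    val orders = Seq((1, "Retail ", 120.0), (2, "WHOLESALE", 890.5))
      .toDF("order_id", "channel", "amount")

    // Register the same normalization logic for both the DataFrame and SQL APIs.
    val normalizeChannel = udf((c: String) => c.trim.toLowerCase)
    spark.udf.register("normalize_channel", (c: String) => c.trim.toLowerCase)

    orders.withColumn("channel_norm", normalizeChannel($"channel")).show()

    orders.createOrReplaceTempView("orders")
    spark.sql(
      """SELECT normalize_channel(channel) AS channel, SUM(amount) AS total
        |FROM orders
        |GROUP BY normalize_channel(channel)""".stripMargin
    ).show()

    spark.stop()
  }
}
```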
TECHNICAL SKILLS:
Hadoop Technologies and Distributions: Apache Hadoop, Cloudera Distribution of Hadoop (CDH3, CDH4, CDH5), and Hortonworks Data Platform (HDP)
Hadoop Ecosystem: HDFS, Hive, Pig, Sqoop, Oozie, Flume, Spark, ZooKeeper, MapReduce, Spark SQL, Spark Streaming, and Spark MLlib
NoSQL Databases: HBase, Cassandra
Programming: C, C++, Python, Java, Scala, PL/SQL, SBT, Maven
RDBMS: Oracle, MySQL, SQL Server
Web Development: HTML, JSP, Servlets, JavaScript, CSS, XML
IDE: Eclipse 4.x, NetBeans, Microsoft Visual Studio
Operating Systems: Linux (Red Hat, CentOS), Windows XP/7/8, and z/OS (mainframes)
Web Servers: Apache Tomcat
Cluster Management Tools: Cloudera Manager, Hortonworks Ambari, and Hadoop security tools
PROFESSIONAL EXPERIENCE:
Confidential, San Jose, CA
Hadoop-Spark Developer
Responsibilities:
- Involved in creating Hive tables and loading and analyzing data using Hive queries.
- Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Worked on a cluster of 135 nodes.
- Implemented Spark jobs in Scala, utilizing DataFrames and the Spark SQL API for faster data processing.
- Created RDDs, DataFrames, and Datasets.
- Created Hive tables and loaded data from Teradata using Sqoop.
- Worked on tuning back-end stored procedures using TOAD.
- Designed ETL jobs for data processing in Talend Open Studio.
- Used the ORC and Parquet file formats for storing data.
- Wrote Java code to execute SQL queries, including code to read the SQL queries from text files.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Used Sqoop to transfer data between RDBMS and Hadoop Distributed File System.
- Used Eclipse for the Development, Testing and Debugging of the application.
- Wrote Python scripts to move data from cluster to cluster.
- Used the Log4j framework for logging debug, info, and error messages.
- Created Hive external and managed tables.
- Designed and maintained Airflow workflow configurations to manage the flow of jobs in the cluster.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Used Spark shared variables, broadcast variables and accumulators, when values needed to be shared across nodes (a brief sketch follows this list).
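A minimal sketch, on assumed data, of the shared-variable pattern mentioned in the last bullet: a broadcast lookup map reused on every executor, plus an accumulator counting records that fail the lookup. The region codes and record layout are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object SharedVariablesSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("shared-variables-sketch").getOrCreate()
    val sc = spark.sparkContext

    // Broadcast a small lookup table once so every executor reuses the same copy.
    val regionLookup = sc.broadcast(Map("01" -> "WEST", "02" -> "EAST"))

    // Accumulator to count records whose region code is unknown.
    val badRecords = sc.longAccumulator("badRecords")

    val records = sc.parallelize(Seq("01,100", "02,250", "99,75"))
    val enriched = records.map { line =>
      val Array(code, amount) = line.split(",")
      val region = regionLookup.value.getOrElse(code, {
        badRecords.add(1)
        "UNKNOWN"
      })
      (region, amount.toDouble)
    }

    enriched.collect().foreach(println)
    println(s"records with unknown region: ${badRecords.value}")
    spark.stop()
  }
}
```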
Environment: Hadoop, MapReduce, Hive, Pig, Spring Batch, Scala, Sqoop, Bash scripting, Spark RDD, Spark SQL, Spark DataFrames
Confidential, Horsham, PA
Hadoop Developer
Responsibilities:
- Involved in Design and Development of technical specifications.
- Wrote shell scripts to pull data from the Tumbleweed server to the Cornerstone staging area.
- Converted data from EBCDIC to ASCII format.
- Wrote Sqoop commands to pull data from the Teradata source.
- Wrote Pig scripts to preprocess data before loading it into Cornerstone.
- Optimized Hive scripts.
- Registered feed metadata in MySQL tables.
- Wrote shell scripts and scheduled jobs through UNIX cron.
- Wrote job workflows using Spring Batch.
- Worked on project deployment from the Gold cluster to the Platinum cluster.
- Provided support for the production (PRD) support team.
- Closely worked with Hadoop security team and infrastructure team to implement security.
- Implemented authentication and authorization services using the Kerberos protocol.
- Designed and implemented streaming data display in the UI with Scala.js.
- Hands-on experience with systems-building languages such as Scala and Java.
- Wrote programs for validating, normalizing, and enriching data, and a REST API to support a UI for manual QA validation; used Spark SQL and Scala to run QA-oriented SQL queries.
- Created RDDs and pair RDDs for Spark programming.
- Implemented joins, grouping, and aggregations on the pair RDDs (see the sketch after this list).
- Saved the results in Hive for downstream consumers to access.
- Used DataFrames for data transformations.
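A small sketch of the pair-RDD join/aggregation pattern described in the bullets above, with the aggregated result saved to Hive for downstream access; the claims/members data, database, and table names are illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession

object PairRddJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("pair-rdd-join-sketch")
      .enableHiveSupport()
      .getOrCreate()
    val sc = spark.sparkContext
    import spark.implicits._

    // Illustrative pair RDDs keyed by member id.
    val claims  = sc.parallelize(Seq(("m1", 120.0), ("m1", 80.0), ("m2", 45.0)))
    val members = sc.parallelize(Seq(("m1", "PA"), ("m2", "NJ")))

    // Aggregate claim amounts per member, then join with member attributes.
    val totals = claims.reduceByKey(_ + _)
    val joined = totals.join(members) // (memberId, (totalAmount, state))

    // Re-key by state, aggregate again, and save the result to Hive.
    val byState = joined
      .map { case (_, (total, state)) => (state, total) }
      .reduceByKey(_ + _)
      .toDF("state", "total_claim_amount")

    byState.write.mode("overwrite").saveAsTable("qa_db.claims_by_state")
    spark.stop()
  }
}
```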
Environment: Hadoop, MapReduce, Hive, Pig, Spring Batch, Scala, Sqoop, Bash scripting, Spark RDD, Spark SQL.
Confidential, Kansas City, MO
Hadoop Developer
Responsibilities:
- Developed Big Data Solutions that enabled the business and technology teams to make data-driven decisions on the best ways to acquire customers and provide them business solutions.
- Involved in installing, configuring and managing Hadoop Ecosystem components like Spark, Hive, Pig, Sqoop, Kafka and Flume.
- Involved in installing Hadoop and Spark clusters on Amazon Web Services.
- Worked with Amazon EC2 instances, S3 buckets, CloudFormation templates, and the Boto library.
- Migrated the existing data to Hadoop from RDBMS (SQL Server and Oracle) using Sqoop for processing.
- Responsible for data ingestion with tools such as Flume and Kafka.
- Responsible for loading unstructured and semi-structured data from different sources into the Hadoop cluster using Flume, and for managing that data.
- Developed Spark programs for batch and real-time processing.
- Developed Spark Streaming applications for real-time processing (a brief sketch follows this list).
- Developed MapReduce programs to cleanse and parse data in HDFS obtained from various data sources and to perform joins on the Map side using distributed cache.
- Used Hive data warehouse tool to analyze the data in HDFS and developed Hive queries.
- Created internal and external tables with properly defined static and dynamic partitions for efficiency.
- Used the RegEx, JSON, and Avro SerDes packaged with Hive for serialization and deserialization when parsing the contents of streamed log data.
- Implemented custom Hive UDFs for comprehensive data analysis.
- Used Talend Open Studio to load files into Hive tables and performed ETL aggregations in Hive.
- Designed and created ETL jobs in Talend to load large volumes of data into Cassandra, the Hadoop ecosystem, and relational databases.
- Implemented authentication and authorization services using the Kerberos protocol.
- Used Pig to develop ad-hoc queries.
- Exported the business required information to RDBMS using Sqoop to make the data available for BI team to generate reports based on data.
- Implemented daily workflow for extraction, processing and analysis of data with Oozie.
- Responsible for troubleshooting MapReduce jobs by reviewing the log files.
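As an illustration of the Spark Streaming work noted above, a minimal sketch of a direct Kafka stream that counts events per key in each micro-batch; the broker address, topic, group id, and record format are placeholder assumptions.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object StreamingSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-streaming-sketch")
    val ssc  = new StreamingContext(conf, Seconds(10))

    // Kafka connection details are illustrative placeholders.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "clickstream-consumer",
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("clickstream"), kafkaParams))

    // Count events per page in each 10-second batch and print a sample.
    stream.map(record => (record.value.split(",")(0), 1L))
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```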
Environment: Hadoop, Spark, Spark Streaming, Spark MLlib, Scala, Hive, Pig, HCatalog, MapReduce, Oozie, Sqoop, Flume, Kafka, Kerberos.
Confidential, Grand Rapids, Michigan
Hadoop Developer
Responsibilities:
- Loaded files into HDFS and wrote Hive queries to process the required data.
- Loaded data into Hive tables and wrote queries to process it.
- Involved in loading data from LINUX file system to HDFS.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Experience in managing and reviewing Hadoop log files.
- Worked with Hive to expose data for further analysis and to transform files from different analytical formats into text files.
- Imported and exported data into HDFS and Hive using Sqoop.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Worked on configuring multiple MapReduce Pipelines, for the new Hadoop Cluster.
- Performance tuned and optimized Hadoop clusters to achieve high performance.
- Written Hive queries for data analysis to meet the business requirements.
- Monitored system health and logs and responded to any warning or failure conditions.
- Responsible for managing test data coming from different sources.
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
- Held weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Implemented schedulers on the JobTracker to share cluster resources among users' MapReduce jobs.
- Extensive hands-on experience with Hadoop file system commands for file handling operations.
Environment: Hadoop, MapReduce, HDFS, Hive 0.10.1, Java, Cloudera distribution of Hadoop, Pig 0.11.1, HBase 0.94.1, Linux, Sqoop 1.4.4, Kafka, ZooKeeper 3.4.3, Oozie 3.3.0, Tableau.
Confidential
Java Developer
Responsibilities:
- Used Microsoft Visio and Rational Rose to design the use case diagrams, class model, sequence diagrams, and activity diagrams for the SDLC process of the application.
- Deployed GUI pages using JSP, JSTL, HTML, DHTML, XHTML, CSS, JavaScript, and AJAX.
- Configured the project on WebSphere 6.1 application servers.
- Implemented the online application using Core Java, JDBC, JSP, Servlets, EJB 1.1, Web Services, SOAP, and WSDL.
- Communicated with other health care systems using Web Services with the help of SOAP, WSDL, and JAX-RPC.
- Used the Singleton, Factory, and DAO design patterns based on the application requirements.
- Used SAX and DOM parsers to parse the raw XML documents
- Used RAD as Development IDE for web applications.
- Prepared and executed unit test cases.
- Used the Log4j logging framework to write log messages at various levels.
- Involved in fixing bugs and minor enhancements for the front-end modules.
- Performed functional and technical reviews.
- Supported the testing team during system testing, integration testing, and UAT.
- Ensured quality in the deliverables.
- Conducted Design reviews and Technical reviews with other project stakeholders.
- Was part of the complete project life cycle, from requirements through production support.
- Created test plan documents for all back-end database modules.
- Implemented the project in Linux environment.
Environment: JDK 1.5, JSP, WebSphere, JDBC, EJB 2.0, XML, DOM, SAX, XSLT, CSS, HTML, JNDI, Web Services.
