Sr. Spark/Scala Developer Resume
Cupertino, CA
SUMMARY:
- 8+ years of professional IT experience in analyzing requirements and designing and building highly distributed, mission-critical products and applications.
- 4+ years of strong end-to-end experience in Hadoop development and implementing Big Data technologies.
- Expertise in core Hadoop and its ecosystem tools like Spark, Cassandra, HDFS, MapReduce, Oozie, Hive, Sqoop, Pig, Flume, Kafka, HBase, and ZooKeeper.
- Experience with NoSQL databases like Cassandra, HBase, and MongoDB and their integration with Hadoop clusters.
- Experienced in Spark Core, Spark SQL, and Spark Streaming.
- Implemented Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs in Spark for data aggregation and queries, and moved data into HDFS through Sqoop (see the illustrative sketch after this summary).
- Developed fan-out workflows using Flume and Kafka to ingest data from sources such as web servers and REST APIs, landing the data in Hadoop through an HDFS sink.
- Exposure to Cassandra CQL with the Java APIs to retrieve data from Cassandra tables.
- Knowledge of Flume for extracting clickstream data from web servers.
- Experience with the Oozie workflow scheduler to manage Hadoop jobs as DAGs of actions with control flows.
- Experience using ZooKeeper and Oozie operational services to coordinate clusters and schedule workflows.
- Developed pipelines for continuous data ingestion using Kafka and Spark Streaming.
- Experience designing and building data ingestion pipelines using the Kafka, Flume, and NiFi frameworks.
- Analyzed Cassandra and compared it with other open-source NoSQL databases.
- Worked with Spark for parallel computing, strengthening knowledge of RDDs over Cassandra data.
- Good understanding of and working experience with Hadoop distributions like Cloudera and Hortonworks.
- Experience managing large sharded MongoDB clusters and the MongoDB life cycle, including sizing, automation, monitoring, and tuning.
- Developed Python scripts to monitor the health of MongoDB databases and perform ad hoc backups using mongodump and mongorestore.
- Experience with Apache Tez on Hive and Pig to achieve better response times than MapReduce jobs.
- Experience with Hadoop deployment and automation tools such as Ambari, Cloudbreak, and EMR.
- Good working knowledge of moving data between HDFS and RDBMSs using Sqoop.
- Hands-on experience analyzing data using HiveQL, Pig Latin, and custom MapReduce programs.
- Experienced in working with structured data using HiveQL, join operations, Hive UDFs, partitions, bucketing, and internal/external tables.
- Implemented POCs using Amazon cloud components: S3, EC2, Elastic Beanstalk, and SimpleDB.
- Hands-on experience creating custom UDFs for Pig and Hive to bring Python/Java functionality into Pig Latin and HQL (HiveQL).
- Experienced with full-text search; implemented data querying with faceted search using Solr.
- Hands-on experience with monitoring tools, checking cluster status using Cloudera Manager.
- Worked with Amazon AWS cloud infrastructure such as the EMR and EC2 web services, which provide fast and efficient processing of Big Data.
- Implemented data quality rules in the Talend ETL tool; good knowledge of data warehousing and ETL tools like IBM DataStage, Informatica, and Talend.
- Knowledge of integrating Kerberos into Hadoop to harden the cluster and secure it against unauthorized users.
- Monitored Hadoop cluster connectivity and security using the Ambari monitoring system.
- Experienced in backend development using SQL and stored procedures on Oracle 10g and 11g.
- Good working knowledge of multithreaded Core Java, J2EE, JDBC, jQuery, JavaScript, and web services (SOAP, REST).
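Illustrative sketch (referenced above): a minimal Spark/Scala example of a DataFrame aggregation with a UDF, alongside the equivalent sum-by-key expressed with the RDD API. The table, column names, and output path are hypothetical placeholders, not taken from any engagement described here.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, sum, udf}

object AggregationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("AggregationSketch").getOrCreate()
    import spark.implicits._

    // Hypothetical source table with columns (customer_id STRING, state STRING, amount DOUBLE).
    val orders = spark.table("sales.orders")

    // UDF to normalize state codes before grouping.
    val normalizeState = udf((s: String) => Option(s).map(_.trim.toUpperCase).orNull)

    // DataFrame/Spark SQL form: benefits from Catalyst optimization.
    val totalsDf = orders
      .withColumn("state", normalizeState(col("state")))
      .groupBy("state")
      .agg(sum("amount").alias("total_amount"))

    // The same sum-by-state aggregation expressed against the lower-level RDD API.
    val totalsRdd = orders
      .select("state", "amount")
      .as[(String, Double)]
      .rdd
      .reduceByKey(_ + _)

    totalsDf.write.mode("overwrite").parquet("/data/output/state_totals")
    spark.stop()
  }
}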
TECHNICAL SKILLS:
Big Data/Hadoop Ecosystem: Hortonworks, Cloudera, Apache Hadoop, EMR, Hive, Pig, Sqoop, Spark, Kafka, Oozie, Flume, ZooKeeper, NiFi, Impala, Tez
NoSQL Databases: Cassandra, HBase, MongoDB
Cloud Services: Amazon AWS
Languages: SQL, PL/SQL, Pig Latin, HiveQL, Unix Shell Scripting, C, Java, Python, Scala
ETL Tools: Informatica, IBM DataStage, Talend
Java & J2EE Technologies: Core Java, Servlets, Hibernate, Spring, Struts, JSP, JDBC
Application Servers: WebLogic, WebSphere, Tomcat
BI Tools: Tableau, Splunk
Databases: Oracle, MySQL, DB2, Teradata, MS SQL Server
Operating Systems: UNIX, Windows, iOS, Linux
IDEs: IntelliJ IDEA, Eclipse, NetBeans
WORK EXPERIENCE:
Confidential, Cupertino, CA
Sr. Spark /Scala Developer
Responsibilities:
- Imported data from Cassandra databases and stored it in AWS.
- Performed transformations on the data using different Spark modules.
- Responsible for Spark Core configuration based on the type of input source.
- Wrote Spark code in Scala for Spark Streaming and Spark SQL for faster data processing.
- Performed SQL joins among Hive tables to produce input for the Spark batch process.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
- Developed Kafka producers and brokers for message handling.
- Involved in importing data into Hadoop using Kafka and implemented Oozie jobs for daily imports.
- Created partitions and buckets based on state to support further processing with bucket-based Hive joins.
- Used the AWS CLI for data transfers to and from Amazon S3 buckets.
- Worked on POCs for stream processing using Apache NiFi.
- Worked on Hortonworks Hadoop solutions with real-time streaming using Apache NiFi.
- Executed Hadoop/Spark jobs on AWS EMR, with programs and data stored in S3 buckets.
- Built and configured Apache Tez on Hive and Pig to achieve better response times than MapReduce jobs.
- Pulled data from Amazon S3 buckets into the data lake, built Hive tables on top of it, and created Spark DataFrames for further analysis.
- Implemented Spark RDD transformations and actions to drive business analysis.
- Worked with data serialization formats, converting complex objects into sequences of bits using the Avro, Parquet, JSON, and CSV formats.
- Developed Spark scripts using Scala shell commands as per requirements.
- Involved in converting Cassandra/Hive/SQL queries into Spark transformations using Spark RDDs in Scala and Python.
- Worked with Cassandra for non-relational data storage and retrieval.
- Used Amazon CloudWatch to monitor and track resources on AWS.
- Used Spark Streaming APIs to transform data from Kafka in real time and persist it into Cassandra (see the sketch after this section).
- Analyzed and examined customer behavioral data using Cassandra.
- Experienced in automating event-driven and time-based jobs using Oozie workflows.
Environment: Cassandra, Kafka, Spark, Pig, Hive, Oozie, AWS, Solr, SQL, Scala, Tez, Python, Java, FileZilla, PuTTY, IntelliJ, GitHub.
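Sketch referenced in the Spark Streaming bullet above: a minimal Spark Streaming (DStream) job that reads from Kafka and persists parsed events into Cassandra. Broker address, topic, keyspace/table, and the event schema are placeholders, and the example assumes the spark-streaming-kafka-0-10 and DataStax spark-cassandra-connector libraries on the classpath; it is an illustrative sketch, not the production job.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
import org.apache.kafka.common.serialization.StringDeserializer
import com.datastax.spark.connector.streaming._

object KafkaToCassandraSketch {
  // Hypothetical event type matching a Cassandra table events.clicks(user_id, url, ts).
  case class ClickEvent(userId: String, url: String, ts: Long)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("KafkaToCassandraSketch")
      .set("spark.cassandra.connection.host", "127.0.0.1") // placeholder Cassandra host
    val ssc = new StreamingContext(conf, Seconds(10))       // 10-second micro-batches

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "localhost:9092",              // placeholder brokers
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "click-stream",
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("clicks"), kafkaParams)
    )

    // Parse comma-separated messages and write each micro-batch to Cassandra.
    stream
      .map(_.value.split(","))
      .filter(_.length == 3)
      .map(f => ClickEvent(f(0), f(1), f(2).toLong))
      .saveToCassandra("events", "clicks")

    ssc.start()
    ssc.awaitTermination()
  }
}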
Confidential, Fort Myers, FL
Sr. Big Data/Hadoop Developer
Responsibilities:
- Created reports for the BI team using Sqoop to export data into HDFS and Hive.
- Automated and scheduled the rules on a weekly and monthly basis in TAC (Talend Administration Center).
- Developed an Apache Flume client to send data as events to the Flume server and store it in files and HDFS.
- Developed Flume configurations to extract log data from different sources and transfer data in different file formats (JSON, XML, Parquet) to Hive tables using different SerDes.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Involved in developing Hive DDLs to create, alter, and drop Hive tables (see the DDL sketch after this section).
- Created a POC in the Cloudera environment for Oracle migration to HDFS using Talend ETL to replace Informatica.
- Enabled concurrent access to Hive tables with shared and exclusive locking, backed by the ZooKeeper implementation in the cluster.
- Performed various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Developed Sqoop scripts to move data between HDFS and RDBMSs (Oracle, MySQL).
- Planned, deployed, monitored, and maintained Amazon AWS cloud infrastructure consisting of multiple EC2 nodes.
- Stored and loaded data from HDFS to Amazon S3 and backed up the namespace data to NFS.
- Implemented NameNode backup using NFS for high availability.
- Developed ETL processes to transfer data from different sources, using Sqoop, Impala, and bash.
- Used Flume to stream log data from various sources.
- Configured Flume to extract data from web server output files and load it into HDFS.
- Designed and implemented the MongoDB schema.
- Used Pig to validate the data ingested through Sqoop and Flume, and pushed the cleansed data set into MongoDB.
- Wrote services to store and retrieve user data in MongoDB for the application on devices.
- Used the Mongoose API to access MongoDB from Node.js.
- Involved in developing Talend components to validate data quality across different data sources.
- Created and implemented business validation and coverage price-gap rules in Talend on Hive.
- Wrote shell scripts to monitor Hadoop daemon services and respond to any warning or failure conditions.
- Wrote shell scripts to automate rolling day-to-day processes.
Environment: Apache Flume, Hive, Pig, HDFS, Zookeeper, Sqoop, RDBMS, AWS, MongoDB, Talend, Shell Scripts, Eclipse.
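DDL sketch referenced above: a minimal example of the kind of Hive DDL involved, creating an external partitioned table and registering a newly landed partition. It is issued here through a Hive-enabled SparkSession only so that all examples in this resume stay in Scala; the same statements run unchanged in the Hive CLI or Beeline. Database, table, column names, and the HDFS location are hypothetical.

import org.apache.spark.sql.SparkSession

object HiveDdlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveDdlSketch")
      .enableHiveSupport()
      .getOrCreate()

    // External, partitioned table over data landed by Flume/Sqoop (hypothetical schema).
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS web.server_logs (
        |  host   STRING,
        |  url    STRING,
        |  status INT,
        |  bytes  BIGINT
        |)
        |PARTITIONED BY (log_date STRING)
        |STORED AS PARQUET
        |LOCATION '/data/raw/server_logs'""".stripMargin)

    // Register a newly landed partition.
    spark.sql("ALTER TABLE web.server_logs ADD IF NOT EXISTS PARTITION (log_date = '2016-01-01')")

    spark.stop()
  }
}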
Confidential, Chicago, IL
Hadoop Developer
Responsibilities:
- Developed MapReduce/EMR jobs to analyze the data and produce heuristics and reports; the heuristics were used to improve campaign targeting and efficiency.
- Responsible for building scalable distributed data solutions using Hadoop .
- Developed simple to complex MapReduce jobs implemented with Hive and Pig.
- Analyzed the data with Hive queries (HiveQL) and Pig scripts (Pig Latin) to study customer behavior, and used UDFs to implement business logic in Hadoop.
- Managed and reviewed Hadoop log files; deployed and maintained the Hadoop cluster.
- Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX and NoSQL sources.
- Supported HBase architecture design with the Hadoop architect group to develop a database design in HDFS.
- Created, dropped, and altered HBase tables at run time without blocking updates and queries.
- Wrote Flume configuration files to import streaming log data into HBase.
- Exported data from HDFS into an RDBMS using Sqoop for report generation and visualization purposes.
- Developed a data pipeline using Kafka and Storm to store data in HDFS.
- Used Oozie workflows and enabled email alerts for any failure cases.
- Managed and reviewed Hadoop log files to identify issues when jobs fail.
- Used Pig to convert fixed-width files to delimited files.
- Supported setting up and updating configurations for implementing scripts with Pig and Sqoop.
- Developed Kafka producer and consumer components for real-time data processing (see the producer sketch after this section).
Environment: MapReduce, Hive, Pig, Sqoop, Oozie, HBase, Flume, Cloudera, HDFS, RDBMS
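Producer sketch referenced above: a minimal Kafka producer in Scala using the standard Kafka Java client API. The broker address, topic name, key, and payload are placeholders; this is an illustrative sketch rather than the component built on this project.

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object LogEventProducerSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092") // placeholder broker
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("acks", "all") // wait for full replication before acknowledging

    val producer = new KafkaProducer[String, String](props)
    try {
      // Keying by host keeps events from one host in one partition (ordered per host).
      val record = new ProducerRecord[String, String]("web-logs", "host-01", "GET /index.html 200")
      producer.send(record)
    } finally {
      producer.flush()
      producer.close()
    }
  }
}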
Confidential, Woodstock, GA
Hadoop Developer
Responsibilities:
- Analyzed the Hadoop cluster and different big data analytics tools, including Pig, Sqoop, and ZooKeeper.
- Exported the analyzed data to the relational database using Sqoop for visualization and to generate reports for the BI team.
- Imported and exported data between SQL Server and HDFS/Hive using Sqoop.
- Developed Hive queries to process the data for visualization.
- Developed multiple MapReduce jobs in Java for data cleaning.
- Created Pig Latin scripts to sort, group, join, and filter enterprise-wide data.
- Involved in creating Hive tables, loading with data, and performing transformations on the data imported.
- Supported MapReduce programs running on the cluster.
- Implemented frameworks in Java and Python to automate the ingestion flow.
- Worked on tuning the performance of Pig queries.
- Worked with ZooKeeper to coordinate between the master node and data nodes.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
Environment: HDFS, MapReduce, Hive, Pig, Sqoop, Linux, Java, Python, Oozie, ZooKeeper, SQL Server.
Confidential
Java Developer
Responsibilities:
- Involved in end-to-end design and development of the UI layer, service layer, and persistence layer.
- Implemented Spring MVC to design and build the UI layer of the application.
- Implemented UI screens using JSF to define and execute UI flow in the application.
- Used AJAX to retrieve data from the server asynchronously in the background without interfering with the display of the existing page.
- Used DWR (Direct Web Remoting) generated scripts to make AJAX calls to Java.
- Worked with JavaScript to dynamically manipulate elements on the screen and validate input.
- Involved in writing Spring Validator classes to validate input data.
- Used JAXB to marshal and unmarshal Java objects to communicate with the backend mainframe system.
- Involved in writing complex PL/SQL and SQL blocks for the application.
- Involved in Unit Testing, User Acceptance Testing and Bug Fixing.
Environment: MVC architecture, Spring IoC, Hibernate JPA, Java web services (REST), Spring Batch, XML, JavaScript, Oracle Database (10g), JSF UI, JUnit.
Confidential
Java Developer
Responsibilities:
- Used Servlets, Struts, JSP, and JavaBeans to develop the Performance module using legacy code.
- Involved in coding JSP pages, form beans, and action classes in Struts.
- Created custom tag libraries to support the Struts framework.
- Involved in writing validations.
- Involved in database connectivity through JDBC.
- Implemented JDBC for mapping an object-oriented domain model to a traditional relational database (see the JDBC sketch after this section).
- Created SQL queries, sequences, and views for the backend Oracle database.
- Developed JUnit test cases and performed application testing for the QC team.
- Used JavaScript for client-side validations.
- Developed dynamic JSP pages with Struts .
- Participated in weekly project meetings and updates, and provided estimates for assigned tasks.
Environment: Java, J2EE, Struts, JSP, JDBC, Ant, XML, IBM WebSphere, WSAD, JUnit, DB2, Rational Rose, CVS, SOAP, and RUP.
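JDBC sketch referenced above: a minimal example of the JDBC access pattern, written in Scala for consistency with the other sketches in this resume (the java.sql calls are identical in plain Java). Connection details, table, and columns are placeholders, and the Oracle JDBC driver is assumed to be on the classpath.

import java.sql.DriverManager

object JdbcSketch {
  def main(args: Array[String]): Unit = {
    // Placeholder connection string and credentials.
    val conn = DriverManager.getConnection(
      "jdbc:oracle:thin:@//db-host:1521/ORCL", "app_user", "secret")
    try {
      val stmt = conn.prepareStatement(
        "SELECT emp_id, emp_name FROM employees WHERE dept_id = ?")
      stmt.setInt(1, 10)
      val rs = stmt.executeQuery()
      while (rs.next()) {
        // In the persistence layer, each row would be mapped to a domain object.
        println(s"${rs.getInt("emp_id")} -> ${rs.getString("emp_name")}")
      }
      rs.close()
      stmt.close()
    } finally {
      conn.close()
    }
  }
}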