Sr. Spark/Scala Developer Resume
Cupertino, CA
SUMMARY:
- 8+ years of professional IT experience in analyzing requirements and designing and building highly distributed, mission-critical products and applications.
- 4+ years of strong end-to-end experience in Hadoop development and implementing Big Data technologies.
- Expertise in core Hadoop and its ecosystem tools like Spark, Cassandra, HDFS, MapReduce, Oozie, Hive, Sqoop, Pig, Flume, Kafka, HBase, and ZooKeeper.
- Experience with NoSQL databases like Cassandra, HBase, and MongoDB and their integration with Hadoop clusters.
- Experienced in Spark Core, Spark SQL, and Spark Streaming.
- Implemented Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs in Spark for data aggregation and queries, and moved data into HDFS through Sqoop (see the illustrative sketch after this summary).
- Developed fan-out workflows using Flume and Kafka to ingest data from sources such as web servers and REST APIs, landing the data in Hadoop through an HDFS sink.
- Exposure to Cassandra CQL with the Java APIs to retrieve data from Cassandra tables.
- Knowledge of Flume for extracting clickstream data from web servers.
- Experience with the Oozie workflow scheduler to manage Hadoop jobs as DAGs of actions with control flows.
- Experience using ZooKeeper and Oozie operational services to coordinate clusters and schedule workflows.
- Developed pipelines for continuous data ingestion using Kafka and Spark Streaming.
- Experience designing and building data ingestion pipelines using the Kafka, Flume, and NiFi frameworks.
- Analyzed Cassandra and compared it with other open-source NoSQL databases.
- Worked with Spark for parallel computing, strengthening knowledge of RDDs over Cassandra data.
- Good understanding of and working experience with Hadoop distributions like Cloudera and Hortonworks.
- Experience managing large sharded MongoDB clusters and the MongoDB life cycle, including sizing, automation, monitoring, and tuning.
- Developed Python scripts to monitor the health of MongoDB databases and perform ad hoc backups using mongodump and mongorestore.
- Experience with Apache Tez on Hive and Pig to achieve better response times than MapReduce jobs.
- Experience with Hadoop deployment and automation tools such as Ambari, Cloudbreak, and EMR.
- Good working knowledge of moving data between HDFS and RDBMSs using Sqoop.
- Hands-on experience analyzing data using HiveQL, Pig Latin, and custom MapReduce programs.
- Experienced in working with structured data using HiveQL, join operations, Hive UDFs, partitions, bucketing, and internal/external tables.
- Implemented POCs using Amazon cloud components: S3, EC2, Elastic Beanstalk, and SimpleDB.
- Hands-on experience creating custom UDFs for Pig and Hive to bring Python/Java functionality into Pig Latin and HQL (HiveQL).
- Experienced with full-text search; implemented data querying with faceted search using Solr.
- Hands-on experience with monitoring tools, checking cluster status using Cloudera Manager.
- Worked with Amazon AWS cloud infrastructure such as the EMR and EC2 web services, which provide fast and efficient processing of Big Data.
- Implemented data quality rules in the Talend ETL tool; good knowledge of data warehousing and ETL tools like IBM DataStage, Informatica, and Talend.
- Knowledge of integrating Kerberos into Hadoop to harden the cluster and secure it against unauthorized users.
- Monitored Hadoop cluster connectivity and security using the Ambari monitoring system.
- Experienced in backend development using SQL and stored procedures on Oracle 10g and 11g.
- Good working knowledge of multithreaded Core Java, J2EE, JDBC, jQuery, JavaScript, and web services (SOAP, REST).
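Illustrative sketch (referenced above): a minimal Spark/Scala example of a DataFrame aggregation with a UDF, alongside the equivalent sum-by-key expressed with the RDD API. The table, column names, and output path are hypothetical placeholders, not taken from any engagement described here.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, sum, udf}

object AggregationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("AggregationSketch").getOrCreate()
    import spark.implicits._

    // Hypothetical source table with columns (customer_id STRING, state STRING, amount DOUBLE).
    val orders = spark.table("sales.orders")

    // UDF to normalize state codes before grouping.
    val normalizeState = udf((s: String) => Option(s).map(_.trim.toUpperCase).orNull)

    // DataFrame/Spark SQL form: benefits from Catalyst optimization.
    val totalsDf = orders
      .withColumn("state", normalizeState(col("state")))
      .groupBy("state")
      .agg(sum("amount").alias("total_amount"))

    // The same sum-by-state aggregation expressed against the lower-level RDD API.
    val totalsRdd = orders
      .select("state", "amount")
      .as[(String, Double)]
      .rdd
      .reduceByKey(_ + _)

    totalsDf.write.mode("overwrite").parquet("/data/output/state_totals")
    spark.stop()
  }
}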
TECHNICAL SKILLS:
Big Data/Hadoop Ecosystem: Hortonworks, Cloudera, Apache Hadoop, EMR, Hive, Pig, Sqoop, Spark, Kafka, Oozie, Flume, ZooKeeper, NiFi, Impala, Tez
NoSQL Databases: Cassandra, HBase, MongoDB
Cloud Services: Amazon AWS
Languages: SQL, PL/SQL, Pig Latin, HiveQL, Unix Shell Scripting, C, Java, Python, Scala
ETL Tools: Informatica, IBM DataStage, Talend
Java & J2EE Technologies: Core Java, Servlets, Hibernate, Spring, Struts, JSP, JDBC
Application Servers: WebLogic, WebSphere, Tomcat
BI Tools: Tableau, Splunk
Databases: Oracle, MySQL, DB2, Teradata, MS SQL Server
Operating Systems: UNIX, Windows, iOS, Linux
IDEs: IntelliJ IDEA, Eclipse, NetBeans
WORK EXPERIENCE:
Confidential, Cupertino, CA
Sr. Spark /Scala Developer
Responsibilities:
- Imported data from Cassandra databases and stored it in AWS.
- Performed transformations on the data using different Spark modules.
- Responsible for Spark Core configuration based on the type of input source.
- Wrote Spark code in Scala for Spark Streaming and Spark SQL for faster data processing.
- Performed SQL joins among Hive tables to produce input for the Spark batch process.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
- Developed Kafka producers and brokers for message handling.
- Involved in importing data into Hadoop using Kafka and implemented Oozie jobs for daily imports.
- Created partitions and buckets based on state to support further processing with bucket-based Hive joins.
- Used the AWS CLI for data transfers to and from Amazon S3 buckets.
- Worked on POCs for stream processing using Apache NiFi.
- Worked on Hortonworks Hadoop solutions with real-time streaming using Apache NiFi.
- Executed Hadoop/Spark jobs on AWS EMR, with programs and data stored in S3 buckets.
- Built and configured Apache Tez on Hive and Pig to achieve better response times than MapReduce jobs.
- Pulled data from Amazon S3 buckets into the data lake, built Hive tables on top of it, and created Spark DataFrames for further analysis.
- Implemented Spark RDD transformations and actions to drive business analysis.
- Worked with data serialization formats, converting complex objects into sequences of bits using the Avro, Parquet, JSON, and CSV formats.
- Developed Spark scripts using Scala shell commands as per requirements.
- Involved in converting Cassandra/Hive/SQL queries into Spark transformations using Spark RDDs in Scala and Python.
- Worked with Cassandra for non-relational data storage and retrieval.
- Used Amazon CloudWatch to monitor and track resources on AWS.
- Used Spark Streaming APIs to transform data from Kafka in real time and persist it into Cassandra (see the sketch after this section).
- Analyzed and examined customer behavioral data using Cassandra.
- Experienced in automating event-driven and time-based jobs using Oozie workflows.
Environment: Cassandra, Kafka, Spark, Pig, Hive, Oozie, AWS, Solr, SQL, Scala, Tez, Python, Java, FileZilla, PuTTY, IntelliJ, GitHub.
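Sketch referenced in the Spark Streaming bullet above: a minimal Spark Streaming (DStream) job that reads from Kafka and persists parsed events into Cassandra. Broker address, topic, keyspace/table, and the event schema are placeholders, and the example assumes the spark-streaming-kafka-0-10 and DataStax spark-cassandra-connector libraries on the classpath; it is an illustrative sketch, not the production job.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
import org.apache.kafka.common.serialization.StringDeserializer
import com.datastax.spark.connector.streaming._

object KafkaToCassandraSketch {
  // Hypothetical event type matching a Cassandra table events.clicks(user_id, url, ts).
  case class ClickEvent(userId: String, url: String, ts: Long)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("KafkaToCassandraSketch")
      .set("spark.cassandra.connection.host", "127.0.0.1") // placeholder Cassandra host
    val ssc = new StreamingContext(conf, Seconds(10))       // 10-second micro-batches

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "localhost:9092",              // placeholder brokers
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "click-stream",
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("clicks"), kafkaParams)
    )

    // Parse comma-separated messages and write each micro-batch to Cassandra.
    stream
      .map(_.value.split(","))
      .filter(_.length == 3)
      .map(f => ClickEvent(f(0), f(1), f(2).toLong))
      .saveToCassandra("events", "clicks")

    ssc.start()
    ssc.awaitTermination()
  }
}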
Confidential, Fort Myers, FL
Sr. Big Data/Hadoop Developer
Responsibilities:
- Created reports for the BI team using Sqoop to export data into HDFS and Hive.
- Automated and scheduled the rules on a weekly and monthly basis in TAC (Talend Administration Center).
- Developed an Apache Flume client to send data as events to the Flume server and store it in files and HDFS.
- Developed Flume configurations to extract log data from different sources and transfer data in different file formats (JSON, XML, Parquet) to Hive tables using different SerDes.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Involved in developing Hive DDLs to create, alter, and drop Hive tables (see the DDL sketch after this section).
- Created a POC in the Cloudera environment for Oracle migration to HDFS using Talend ETL to replace Informatica.
- Enabled concurrent access to Hive tables with shared and exclusive locking, backed by the ZooKeeper implementation in the cluster.
- Performed various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Developed Sqoop scripts to move data between HDFS and RDBMSs (Oracle, MySQL).
- Planned, deployed, monitored, and maintained Amazon AWS cloud infrastructure consisting of multiple EC2 nodes.
- Stored and loaded data from HDFS to Amazon S3 and backed up the namespace data to NFS.
- Implemented NameNode backup using NFS for high availability.
- Developed ETL processes to transfer data from different sources, using Sqoop, Impala, and bash.
- Used Flume to stream log data from various sources.
- Configured Flume to extract data from web server output files and load it into HDFS.
- Designed and implemented the MongoDB schema.
- Used Pig to validate the data ingested through Sqoop and Flume, and pushed the cleansed data set into MongoDB.
- Wrote services to store and retrieve user data in MongoDB for the application on devices.
- Used the Mongoose API to access MongoDB from Node.js.
- Involved in developing Talend components to validate data quality across different data sources.
- Created and implemented business validation and coverage price-gap rules in Talend on Hive.
- Wrote shell scripts to monitor Hadoop daemon services and respond to any warning or failure conditions.
- Wrote shell scripts to automate rolling day-to-day processes.
Environment: Apache Flume, Hive, Pig, HDFS, Zookeeper, Sqoop, RDBMS, AWS, MongoDB, Talend, Shell Scripts, Eclipse.
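DDL sketch referenced above: a minimal example of the kind of Hive DDL involved, creating an external partitioned table and registering a newly landed partition. It is issued here through a Hive-enabled SparkSession only so that all examples in this resume stay in Scala; the same statements run unchanged in the Hive CLI or Beeline. Database, table, column names, and the HDFS location are hypothetical.

import org.apache.spark.sql.SparkSession

object HiveDdlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveDdlSketch")
      .enableHiveSupport()
      .getOrCreate()

    // External, partitioned table over data landed by Flume/Sqoop (hypothetical schema).
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS web.server_logs (
        |  host   STRING,
        |  url    STRING,
        |  status INT,
        |  bytes  BIGINT
        |)
        |PARTITIONED BY (log_date STRING)
        |STORED AS PARQUET
        |LOCATION '/data/raw/server_logs'""".stripMargin)

    // Register a newly landed partition.
    spark.sql("ALTER TABLE web.server_logs ADD IF NOT EXISTS PARTITION (log_date = '2016-01-01')")

    spark.stop()
  }
}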
Confidential, Chicago, IL
Hadoop Developer
Responsibilities:
- Developed MapReduce/EMR jobs to analyze the data and produce heuristics and reports; the heuristics were used to improve campaign targeting and efficiency.
- Responsible for building scalable distributed data solutions using Hadoop .
- Developed simple to complex MapReduce jobs implemented with Hive and Pig.
- Analyzed the data with Hive queries (HiveQL) and Pig scripts (Pig Latin) to study customer behavior, and used UDFs to implement business logic in Hadoop.
- Managed and reviewed Hadoop log files; deployed and maintained the Hadoop cluster.
- Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX and NoSQL sources.
- Supported HBase architecture design with the Hadoop architect group to develop a database design in HDFS.
- Created, dropped, and altered HBase tables at run time without blocking updates and queries.
- Wrote Flume configuration files to import streaming log data into HBase.
- Exported data from HDFS into an RDBMS using Sqoop for report generation and visualization purposes.
- Developed a data pipeline using Kafka and Storm to store data in HDFS.
- Used Oozie workflows and enabled email alerts for any failure cases.
- Managed and reviewed Hadoop log files to identify issues when jobs fail.
- Used Pig to convert fixed-width files to delimited files.
- Supported setting up and updating configurations for implementing scripts with Pig and Sqoop.
- Developed Kafka producer and consumer components for real-time data processing (see the producer sketch after this section).
Environment: MapReduce, Hive, Pig, Sqoop, Oozie, HBase, Flume, Cloudera, HDFS, RDBMS
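Producer sketch referenced above: a minimal Kafka producer in Scala using the standard Kafka Java client API. The broker address, topic name, key, and payload are placeholders; this is an illustrative sketch rather than the component built on this project.

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object LogEventProducerSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092") // placeholder broker
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("acks", "all") // wait for full replication before acknowledging

    val producer = new KafkaProducer[String, String](props)
    try {
      // Keying by host keeps events from one host in one partition (ordered per host).
      val record = new ProducerRecord[String, String]("web-logs", "host-01", "GET /index.html 200")
      producer.send(record)
    } finally {
      producer.flush()
      producer.close()
    }
  }
}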
Confidential, Woodstock, GA
Hadoop Developer
Responsibilities:
- Analyzed the Hadoop cluster and different big data analytics tools, including Pig, Sqoop, and ZooKeeper.
- Exported the analyzed data to the relational database using Sqoop for visualization and to generate reports for the BI team.
- Imported and exported data between SQL Server and HDFS/Hive using Sqoop.
- Developed Hive queries to process the data for visualization.
- Developed multiple MapReduce jobs in Java for data cleaning.
- Created Pig Latin scripts to sort, group, join, and filter enterprise-wide data.
- Involved in creating Hive tables, loading with data, and performing transformations on the data imported.
- Supported MapReduce programs running on the cluster.
- Implemented frameworks in Java and Python to automate the ingestion flow.
- Worked on tuning the performance of Pig queries.
- Worked with ZooKeeper to coordinate between the master node and data nodes.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
Environment: HDFS, MapReduce, Hive, Pig, Sqoop, Linux, Java, Python, Oozie, ZooKeeper, SQL Server.
Confidential
Java Developer
Responsibilities:
- Involved in end-to-end design and development of the UI layer, service layer, and persistence layer.
- Implemented Spring MVC to design and build the UI layer of the application.
- Implemented UI screens using JSF to define and execute UI flow in the application.
- Used AJAX to retrieve data from the server asynchronously in the background without interfering with the display of the existing page.
- Used DWR (Direct Web Remoting) generated scripts to make AJAX calls to Java.
- Worked with JavaScript to dynamically manipulate elements on the screen and validate input.
- Involved in writing Spring Validator classes to validate input data.
- Used JAXB to marshal and unmarshal Java objects to communicate with the backend mainframe system.
- Involved in writing complex PL/SQL and SQL blocks for the application.
- Involved in Unit Testing, User Acceptance Testing and Bug Fixing.
Environment: MVC architecture, Spring IoC, Hibernate JPA, Java web services (REST), Spring Batch, XML, JavaScript, Oracle Database (10g), JSF UI, JUnit.
Confidential
Java Developer
Responsibilities:
- Used Servlets, Struts, JSP, and JavaBeans to develop the Performance module using legacy code.
- Involved in coding JSP pages, form beans, and action classes in Struts.
- Created custom tag libraries to support the Struts framework.
- Involved in writing validations.
- Involved in database connectivity through JDBC.
- Implemented JDBC for mapping an object-oriented domain model to a traditional relational database (see the JDBC sketch after this section).
- Created SQL queries, sequences, and views for the backend Oracle database.
- Developed JUnit test cases and performed application testing for the QC team.
- Used JavaScript for client-side validations.
- Developed dynamic JSP pages with Struts .
- Participated in weekly project meetings and updates, and provided estimates for assigned tasks.
Environment: Java, J2EE, Struts, JSP, JDBC, Ant, XML, IBM WebSphere, WSAD, JUnit, DB2, Rational Rose, CVS, SOAP, and RUP.
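JDBC sketch referenced above: a minimal example of the JDBC access pattern, written in Scala for consistency with the other sketches in this resume (the java.sql calls are identical in plain Java). Connection details, table, and columns are placeholders, and the Oracle JDBC driver is assumed to be on the classpath.

import java.sql.DriverManager

object JdbcSketch {
  def main(args: Array[String]): Unit = {
    // Placeholder connection string and credentials.
    val conn = DriverManager.getConnection(
      "jdbc:oracle:thin:@//db-host:1521/ORCL", "app_user", "secret")
    try {
      val stmt = conn.prepareStatement(
        "SELECT emp_id, emp_name FROM employees WHERE dept_id = ?")
      stmt.setInt(1, 10)
      val rs = stmt.executeQuery()
      while (rs.next()) {
        // In the persistence layer, each row would be mapped to a domain object.
        println(s"${rs.getInt("emp_id")} -> ${rs.getString("emp_name")}")
      }
      rs.close()
      stmt.close()
    } finally {
      conn.close()
    }
  }
}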