
Spark Developer Resume


Atlanta, GA

SUMMARY:

  • Overall 9+ years of experience in designing and building highly scalable distributed systems using Apache Hadoop, Apache Spark, and Java/J2EE.
  • Strong hands-on experience in Spark Core, Spark SQL, Spark Streaming, and Spark machine learning using the Scala and Python programming languages.
  • Strong understanding of RDD operations in Apache Spark, including transformations, actions, persistence (caching), accumulators, broadcast variables, and broadcast optimization (a brief sketch follows this summary).
  • In-depth understanding of Apache Spark job execution components such as the DAG, lineage graph, DAG scheduler, task scheduler, stages, and tasks.
  • Experience in submitting Apache Spark and MapReduce jobs to YARN.
  • Good understanding of the Spark driver, executors, and the Spark web UI.
  • Migrated Python machine learning modules to scalable, high-performance, fault-tolerant distributed systems such as Apache Spark.
  • Strong experience with Spark SQL UDFs, Hive UDFs, and Spark SQL performance tuning.
  • Experience in installing, configuring, supporting, and monitoring Hadoop clusters using Hortonworks, Cloudera Manager, and Amazon Web Services (AWS), including S3 storage and running instances on EC2.
  • Used Amazon EMR to run MapReduce jobs in the cloud.
  • Involved in setting up real-time log analytics using NiFi, Kafka, Storm, and HBase.
  • Used ZooKeeper on a distributed HBase cluster for configuration and management.
  • Hands-on experience using Amazon Kinesis Data Analytics to process streaming data.
  • Good knowledge and understanding of Hadoop architecture and the components of the Hadoop ecosystem: HDFS, MapReduce, Sqoop, and Hive.
  • Good understanding of the MapReduce framework architectures (MRv1 and YARN).
  • Developed various MapReduce applications to perform ETL workloads on metadata and terabytes of data.
  • Experience in importing and exporting data with Sqoop between HDFS and relational database/mainframe systems.
  • Experience working with Flume to load log data from multiple sources directly into HDFS.
  • Good knowledge of importing data from Teradata using Sqoop.
  • Good working knowledge of creating Hive tables and using HQL for data analysis to meet business requirements.
  • Experience in managing and reviewing Hadoop log files.
  • Good working experience with NoSQL databases such as HBase, Cassandra, and MongoDB.
  • Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
  • Experience in fine-tuning MapReduce jobs for better scalability and performance.
  • Experience in writing shell scripts to move shared data from landing zones to HDFS.
  • Experience in performance tuning the Hadoop cluster by gathering and analyzing the existing infrastructure.
  • Worked with the Avro data serialization system.
  • Hands-on experience working with input file formats such as ORC, Parquet, JSON, and Avro.
  • Good expertise in coding with Python, Scala and Java.
  • Expertise in client-side design and validation using HTML and JavaScript.
  • Well versed in Core Java, J2EE, Web Services (SOA), JDBC, Swing, MySQL, and DB2.
  • Software development experience in Java application development, client/server applications, and implementing application environments using MVC, J2EE, JDBC, JSP, Servlets, XML technologies (XML, XSL, XSD), Web Services, Confidential, and relational databases.
  • Exceptional ability to quickly master new concepts and capable of working in groups as well as independently.
  • Good analytical, programming, problem solving and troubleshooting skills.
  • Ability to adapt to evolving technology, strong sense of responsibility and accomplishment.
  • Excellent written and verbal communication and analytical skills with a customer-service-oriented attitude; worked with the offshore team as an onsite coordinator, providing daily updates.
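A minimal Scala sketch of the RDD concepts listed above: a transformation/action pipeline with caching, a broadcast variable, and an accumulator. The input path, field layout, and lookup values are illustrative placeholders, not details from any of the projects below.

```scala
import org.apache.spark.sql.SparkSession

object RddBasics {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("rdd-basics").getOrCreate()
    val sc = spark.sparkContext

    // Broadcast a small lookup table to every executor instead of shipping it per task.
    val countryNames = sc.broadcast(Map("US" -> "United States", "IN" -> "India"))
    // Accumulator used as a side channel to count rejected records.
    val badRecords = sc.longAccumulator("badRecords")

    val lines = sc.textFile("hdfs:///data/customers.csv")       // placeholder path
    val cleaned = lines
      .map(_.split(","))                                        // transformation (lazy)
      .filter { fields =>
        val ok = fields.length == 3
        if (!ok) badRecords.add(1)                              // counted when the stage runs
        ok
      }
      .map(fields => (countryNames.value.getOrElse(fields(2), "Unknown"), 1))
      .cache()                                                  // persisted: reused by two actions below

    val byCountry = cleaned.reduceByKey(_ + _).collect()        // action triggers the DAG
    println(s"countries=${byCountry.length}, rows=${cleaned.count()}, bad=${badRecords.value}")
  }
}
```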

TECHNICAL SKILLS:

Programming languages: Scala, Python and Java.

Big Data Frameworks: Apache Spark, Apache Hadoop, Hive, Cassandra, MongoDB, Flume, Sqoop, MapReduce, Cloudera.

Data visualization: Tableau

Big data distribution: Cloudera, Amazon Web Services (AWS), Pivotal Cloud Foundry.

Java/J2EE Technologies: Core Java, Hibernate, Spring.

Operating Systems: macOS, Windows.

Databases: Confidential Database, MySQL, GemFire.

Development Tools: Spring Boot, Eclipse, PyCharm, Jupyter Notebook, Databricks notebooks, IntelliJ.

Development methodologies: Agile, Waterfall

Messaging Services: ActiveMQ, Kafka, JMS.

Version control Tools: GitHub, SVN.

Workflow scheduling system: Apache Oozie, Airflow

PROFESSIONAL EXPERIENCE:

Confidential, Atlanta, GA

Spark Developer

Responsibilities:

  • Driving technical development and application standards across the enterprise data lake.
  • Maintaining the data lake supporting the flow of data throughout the project.
  • Responsible for setting up streaming jobs using Spark 2.3.1 to consume JSON data from Kafka topics.
  • The Spark job reads the Kafka topics in PCF for all the applications; responsible for compressing the JSON data into Parquet/ORC files and storing them in HDFS (see the streaming sketch after this list).
  • Extracting the required fields from these files per business requirements as a Spark batch job, scheduled with Autosys.
  • Creating Hive external tables and writing Hive queries for the data across the data lake (a table DDL sketch follows the environment line for this role).
  • Responsible for applying maps, filters, and other transformations and actions to the data.
  • Using different machine learning modules to run the Spark jobs on a daily and weekly basis.
  • Responsible for generating Tableau reports for some of the microservices using these Hive tables; the resulting visualizations support business decision-making.
  • Building a Kafka tool for managing and monitoring applications that use Apache Kafka clusters.
  • Responsible for creating and managing Kafka producer and consumer topics.
  • Responsible for writing and maintaining the Kafka truststore and keystore files.
  • Maintaining the Kafka clusters for production and non-prod environments.
  • Responsible for sending the whole JSON payload to the data lake team to store in HBase.
  • Responsible for capturing, indexing, and correlating real-time data in a searchable repository, from which Splunk generates graphs, reports, alerts, dashboards, and visualizations.
  • Writing Splunk queries to create dashboards for the services.
  • Responsible for supporting operations, incident, problem and change management for all production and non-production environments.
  • Responsible for developing end-to-end microservices using Spring Boot and core Java/J2EE, hosted on PCF.
  • Developing the complete functionality of each service in Spring Boot applications using different programming languages such as Java, Scala, and Python.
  • Working with the networking team on connectivity and proxy setup for the service.
  • Responsible for creating config servers on the PCF cloud and binding the services to them.
  • Responsible for the full Software Development Life Cycle (SDLC) process: interpreting requirements, building testable test cases, and validating and interpreting the results; responsible for single-handed delivery of microservices within a larger project.
  • Developing manual test cases in the HP ALM (Application Lifecycle Management) and JIRA tools.
  • Developing an automation framework to convert manual regression test cases into automation scripts using the HP UFT (Unified Functional Testing, formerly known as QTP) tool.
  • Reviewing and baselining test cases and scripts from the available business requirements, technical documents, and the test plan.
  • Developing DevOps automation scripts and APIs where possible to reduce operational overhead.
  • Developing a CI/CD pipeline involving Jenkins, Ansible, and Sonar coverage to automate the path from commit to deployment.
  • Documenting test results and compiling the results of other testers into a consolidated report for management, clients, and application staff; certifying the application for migration to production once all testing is complete.
  • Responsible for preparing the release plan and implementing it until the service goes live.
  • Responsible for preparing all release documentation, such as the Agile implementation and backout plan and the production validation plan.
  • Getting approvals from the security, QA, QCPR, and Infosec teams.
  • Working closely with the program manager, scrum master, and architects to convey technical impacts to the development timeline and risks.
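A minimal sketch of the Kafka-to-HDFS pattern described above, using Spark 2.3 Structured Streaming in Scala (assumes the spark-sql-kafka-0-10 connector is on the classpath). The broker address, topic name, schema, and paths are illustrative placeholders, not the actual project configuration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types._

object KafkaJsonToParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-json-to-parquet")
      .getOrCreate()
    import spark.implicits._

    // Assumed event schema; the real schema comes from the business requirements.
    val schema = new StructType()
      .add("eventId", StringType)
      .add("eventTime", TimestampType)
      .add("payload", StringType)

    // Read JSON messages from a Kafka topic and project them into columns.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")   // placeholder broker
      .option("subscribe", "app-events")                  // placeholder topic
      .load()
      .select(from_json($"value".cast("string"), schema).as("data"))
      .select("data.*")

    // Persist the stream as Parquet files on HDFS.
    events.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/events")                       // placeholder output path
      .option("checkpointLocation", "hdfs:///checkpoints/events")  // placeholder checkpoint path
      .start()
      .awaitTermination()
  }
}
```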

Environment: Spark 2.3.1, Kafka, HDFS, Autosys, Hive, machine learning modules, Tableau, HBase, Splunk, Pivotal Cloud Foundry, Scala, Python, HP ALM, JIRA, DevOps, automation, GemFire, Apache Pulsar.
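A minimal sketch of the Hive external table setup referenced in the responsibilities above, issued through Spark SQL with Hive support enabled. The database, table, columns, and HDFS location are illustrative placeholders that line up with the streaming sketch above.

```scala
import org.apache.spark.sql.SparkSession

object RegisterExternalTable {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("register-external-table")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("CREATE DATABASE IF NOT EXISTS datalake")   // placeholder database

    // External table over the Parquet files written by the streaming job.
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS datalake.events (
        |  eventId   STRING,
        |  eventTime TIMESTAMP,
        |  payload   STRING
        |)
        |STORED AS PARQUET
        |LOCATION 'hdfs:///data/events'
        |""".stripMargin)

    // Downstream analysis and Tableau extracts can then query the table, e.g.:
    spark.sql("SELECT count(*) FROM datalake.events").show()
  }
}
```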

Confidential - Kansas City, Missouri

Spark Developer

Responsibilities:

  • Worked on analyzing the Hadoop cluster using different big data analytic tools, including Flume, Hive, Sqoop, and Spark.
  • Developed Spark code using Scala for faster processing of data.
  • Followed the Agile development methodology to develop the application.
  • Installed and configured Hadoop clusters on different platforms such as Cloudera, Pivotal HD, and AWS EMR, along with ecosystem components such as Sqoop, HBase, Hive, and Spark.
  • Developed Spark SQL jobs to load tables into HDFS and run SELECT queries on top of them.
  • Developed Spark code and Spark SQL/Streaming jobs for faster testing and processing of data.
  • Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
  • Integrated Kafka with Spark Streaming for high-speed data processing.
  • Used Spark DataFrames, Spark SQL, and Spark MLlib extensively.
  • Integrated Apache Storm with Kafka to perform web analytics.
  • Uploaded clickstream data from Kafka to HDFS, HBase, and Hive by integrating with Storm.
  • Designed the ETL process and created the high-level design document, including the logical data flows, source data extraction process, database staging, extract creation, source archival, job scheduling, and error handling.
  • Collected data using Spark Streaming, stored it in an AWS S3 bucket in near real time, and performed the necessary transformations and aggregations to build the data model before persisting the data to HDFS (see the sketch after this list).
  • Worked on the Talend ETL tool, using features such as context variables and database components, including input to Confidential, output to Confidential, tFileCompare, tFileCopy, and Confidential close ETL components.
  • Created ETL mappings with Talend Integration Suite to pull data from the source, apply transformations, and load data into the target database.
  • Created, altered, and deleted topics (Kafka queues) as required.
  • Performed performance tuning using partitioning and bucketing of Impala tables.
  • Experience with NoSQL databases such as HBase and MongoDB.
  • Involved in cluster maintenance and monitoring.
  • Involved in loading data from UNIX file system to HDFS.
  • Implemented flow files, connections, flow controllers, and process groups as part of the NiFi process for automating the movement of data.
  • Worked on NOSQL databases which differ from classic relational databases.
  • Successful in creating and implementing complex code changes.
  • Experience with AWS EC2, configuring servers for Auto Scaling and Elastic Load Balancing.
  • Configured EC2 instances in a VPC network, managed security through IAM, and monitored server health through CloudWatch.
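A minimal sketch of the Spark Streaming micro-batch pattern referenced above: consuming from Kafka with the spark-streaming-kafka-0-10 integration and persisting per-batch aggregates to S3 (assumes the S3A/hadoop-aws connector is configured). The broker, topic, consumer group, and bucket names are illustrative placeholders.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object ClickstreamToS3 {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("clickstream-to-s3")
    val ssc = new StreamingContext(conf, Seconds(60))      // 60-second micro-batches

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker:9092",                // placeholder broker
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "clickstream-consumers",               // placeholder group
      "auto.offset.reset" -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("clickstream"), kafkaParams))

    // Count page hits per micro-batch and persist each batch's result to S3.
    stream.map(_.value)
      .countByValue()
      .foreachRDD { (rdd, time) =>
        rdd.saveAsTextFile(s"s3a://analytics-bucket/clickstream/${time.milliseconds}") // placeholder bucket
      }

    ssc.start()
    ssc.awaitTermination()
  }
}
```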

Environment: Hadoop v2/YARN 2.4, Spark, Amazon Web Services (AWS), MapReduce, Teradata 15.0, Hive, REST, Sqoop, Flume, Kafka, Impala, MongoDB.

Confidential, San Jose, CA

Hadoop Developer

Responsibilities:

  • Everyday responsibilities include resolving developer issues, deploying code from one environment to another, providing access to new users, providing immediate solutions to reduce impact, and documenting those solutions to prevent future issues.
  • Collaborating with application teams to install operating system and Hadoop updates, patches, and version upgrades.
  • Worked with cloud services such as Azure and was involved in ETL, data integration, and migration.
  • Wrote Azure PowerShell scripts to copy or move data from the local file system to HDFS/Blob storage.
  • Implemented OLAP multi-dimensional cube functionality using Azure SQL Data Warehouse.
  • Wrote Lambda functions in Python for Azure that invoke Python scripts to perform various transformations and analytics on large data sets in EMR clusters.
  • Played a lead role in the development of the Confidential data lake and in building the Confidential data cube on a Microsoft Azure HDInsight cluster.
  • Responsible for cluster maintenance, monitoring, commissioning and decommissioning data nodes, troubleshooting, managing and reviewing data backups, and managing and reviewing log files.
  • Designed, built, and deployed a multitude of applications utilizing almost all of the Azure stack, focusing on high availability, fault tolerance, and auto-scaling.
  • Designed and developed automation test scripts using Python.
  • Designed and implemented Azure cloud infrastructure utilizing Azure Resource Manager (ARM) templates.
  • Orchestrated hundreds of Sqoop scripts, Python scripts, and Hive queries using Oozie workflows and sub-workflows.
  • Experience in ingesting incremental updates from structured ERP systems residing on Microsoft SQL Server databases onto the Hadoop data platform using Sqoop.
  • Implemented custom interceptors for Flume to filter data and defined channel selectors to multiplex the data into different sinks.
  • Partitioned and queried the data in Hive for further analysis by the BI team (see the sketch after this list).
  • Involved in extracting the data from various sources into Hadoop HDFS for processing.
  • Worked on analyzing the Hadoop cluster and different big data analytic tools, including HBase and Sqoop.
  • Created and truncated HBase tables in Hue and took backups of submitter IDs.
  • Used Amazon EMR for MapReduce jobs and tested locally using Jenkins.
  • Creating and managing Azure Web-Apps and providing the access permission to Azure AD users.
  • Involved in loading data from the Linux file system to HDFS.
  • Experience in configuring Storm to load data from MySQL to HBase using the Java Message Service.
  • Worked with BI teams in generating reports and designing ETL workflows in Tableau.
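A minimal sketch of the Hive partitioning pattern mentioned above: landing staged records into a Hive table partitioned by ingest date so the BI team can prune partitions in their queries. The staging path, database, table, and column names are illustrative placeholders.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object PartitionedHiveLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partitioned-hive-load")
      .enableHiveSupport()
      .getOrCreate()

    // Assumed raw input already staged on HDFS as Parquet, with an ingest_date column present.
    val raw = spark.read.parquet("hdfs:///staging/erp_orders")   // placeholder path

    raw.write
      .mode(SaveMode.Append)
      .partitionBy("ingest_date")                                // partition column assumed present
      .format("parquet")
      .saveAsTable("analytics.erp_orders")                       // placeholder database.table

    // BI queries can then prune partitions, for example:
    // SELECT count(*) FROM analytics.erp_orders WHERE ingest_date = '2019-01-01'
  }
}
```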

Environment: HDFS, MapReduce, Hive, Hue, Azure, Flume, Oozie, Sqoop, Apache Hadoop, Spark, Python, Qlik, Hortonworks, Ambari, Red Hat, MySQL, and Confidential.

Confidential, St. Louis, MO

Hadoop Developer

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing (a mapper sketch follows this list).
  • Wrote reference architecture documents describing Hortonworks platform integration with other platforms (Informatica) and analytic applications (Excel, Tableau).
  • Wrote sandbox tutorials demonstrating the functionality of the Hortonworks platform: HDFS (Hadoop), Pig, Hive, HBase, and HCatalog. Topics included how to install and configure the Hortonworks ODBC drivers, refining data with the Hortonworks platform, and analyzing data with third-party BI tools (Excel, Tableau).
  • Installed virtual machines on Windows and Mac using Confidential VirtualBox and VMware. Installed and documented Hortonworks platform features on Windows 7, Windows Server 2012, and other operating systems.
  • Managed the delivery pipeline; tuned OS parameters, throttled the SOA application, and tuned the connection pool, Mediator worker threads, audit flow, and JVM heap garbage collection; applied security patches and bug fixes.

  • Developed server-side services using Java, Spring, web services (SOAP, REST, WSDL, JAXB, JAX-RPC), and SOA (service-oriented architecture).
  • Experience in administering, installing, upgrading, and managing CDH3, Pig, Hive, and HBase.
  • Importing and exporting data into HDFS and Hive using Sqoop and Flume.
  • Experienced in defining job flows.
  • Knowledge in performance troubleshooting and tuning Hadoop clusters.
  • Participated in the development and implementation of the Cloudera Hadoop environment.
  • Responsible for creating the system user interface.
  • Developed and implemented the GUI of the system (Java 2.0, Swing).
  • Developed server-side business logic modules using JDBC.
  • Proficient in database development with MySQL.
  • Assisted in developing the system management and outpatient modules using Java and Swing.
  • Developed applications based on J2EE using Hibernate, Spring, and JSP frameworks and SOAP/REST web services.
  • Prepared unit test cases.
  • Participated in code reviews conducted by peers.
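A minimal sketch of the kind of data-cleaning mapper described above, using the Hadoop MapReduce API. The original jobs were written in Java; this version is shown in Scala for consistency with the other sketches here, and the delimiter, field count, and filter rules are illustrative assumptions.

```scala
import org.apache.hadoop.io.{LongWritable, NullWritable, Text}
import org.apache.hadoop.mapreduce.Mapper

// Map-only cleansing step: drop malformed rows and normalize whitespace.
class CleansingMapper extends Mapper[LongWritable, Text, NullWritable, Text] {
  override def map(key: LongWritable, value: Text,
                   context: Mapper[LongWritable, Text, NullWritable, Text]#Context): Unit = {
    val fields = value.toString.split('|')                 // assumed pipe-delimited input
    if (fields.length == 5 && fields.forall(_.nonEmpty)) { // assumed 5-column layout
      context.write(NullWritable.get(), new Text(fields.map(_.trim).mkString("|")))
    }
    // Malformed rows are simply dropped; a real job might also count them via a Counter.
  }
}
```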

Environment: Apache Hadoop, HDFS, Hive, MapReduce, Java, Pig, Sqoop, Flume, Cloudera CDH3, Oozie, Confidential, MySQL, Swing, JSP, JDBC, JUnit.

Confidential

Java/J2EE Developer

Responsibilities:

  • Handled three modules in this application: Login, Register, and Update Profile (the project followed the waterfall methodology).
  • Involved in changing the look and feel (UI) of web pages per requirements by modifying the JSPs and HTML.
  • Involved in client-side form validation using JavaScript.
  • Involved in code reviews, design reviews, and documentation of the design document.
  • Involved in modifying the Java backend objects (EJBs, XMLs, and Servlets).
  • Modified the DAOs as part of the enhancements.
  • Modified the JDBC connection parameters in the application.
  • All Java code development was done through Net Beans.
  • Involved in resolving the fixes to the tickets at the time of production deployment.
  • Maintained the database by purging old data and inserting new data (user information) on a daily basis by running UNIX scripts.
  • Involved in the unit testing and UAT testing for each release.
  • Maintained all the standards of GE while doing the code change.

Environment: J2EE, EJB 1.1, XML, XSLT, WebSphere 4.0, Confidential 8i, JDBC, LDAP, Struts, PL/SQL, Toad, NetBeans, JSP, HTML.

Confidential, Redwood City, CA

Java developer

Responsibilities:

  • The entire SCADA application was developed from scratch using Java technologies due to the nature of the complex requirements.
  • Implemented an SOA architecture for communication between the individual PLCs and between the PC and PLCs using XML-RPC, which exposes each device as a service.
  • Implemented industry best practices to improve the quality of the product development.
  • Provided a client/server solution using XML-RPC communication for scalability of the system.
  • Provided reporting solutions with PDF export format.
  • Provided a complete multi-threaded communication platform for visualization, recipe download, configuration, setup, utilities, and maintenance.
  • Provided system security with task-based permissions, assigning task-based security privileges to users and roles.
  • Implemented Alarms Web service, Messages Web service, Error Report Web Service.
  • Experience in developing applications using Eclipse IDE and running test cases.
  • Analyzed, created and proposed remediation measures to fix the bugs in the application.
  • Developed action classes and configuration files for the Struts framework.
  • Developed Confidential stored procedures and complex SQL.
  • Created and maintained the PL/SQL procedures that run in the background, freeing the current threads from database deadlocks.
  • Tested the application thoroughly in the unit testing phase using JUnit.

Environment: Java, XML-RPC, Web services, Struts, JSP, Confidential, JUnit, JDBC, UML, Eclipse, PL/SQL, Core Java.

Confidential, Charlotte, NC

Java/J2EE Developer

Responsibilities:

  • Involved in the Software Development Life Cycle (SDLC): requirements gathering, design, coding, integration, deployment, and production phases.
  • Implemented Struts Framework 1.2 along with JSP 2.0, Struts Tiles, and Struts tag libraries to facilitate user interface design.
  • Developed validations using Struts validation framework.
  • Developed Stateless Session Beans to transfer calls from presentation tier to data services tier.
  • Adopted various design patterns such as Business Delegate, Singleton, Service Locator, Session Facade, Data Transfer Object (DTO), and Data Access Object (DAO).
  • Created reusable templates using Angular directives and worked with npm package manager tools (Node.js) and build tools like gulp.
  • Enhanced the user experience by designing new web features using MVC frameworks such as Backbone.js, Require.js, Node.js, Knockout.js, and Ember.js.
  • Designed and developed a proof-of-concept real-time notification system using Node.js.
  • Used web services to communicate with different applications.
  • Created knowledge bases in Drools Workbench (Business Rules Manager).
  • Used the JAXB parser for marshalling and unmarshalling.
  • Developed the credit check module using Servlets, JSP, and core Java components on WebLogic Application Server.
  • Designed EJB 2.1 stateless session beans for the Session Facade design pattern.
  • Used Hibernate 3.1 to store persistent data in the Oracle 9i database.
  • Used the Spring 2.0 Framework to integrate the application with Hibernate 3.1.
  • Used IBM MQSeries for enterprise-level messaging.
  • Worked with WSDL and SOAP messages.
  • Used JMS to send and receive asynchronous data from various message brokers.
  • Involved in writing the ANT scripts to build the application.
  • Involved in using Log4J to create log files to debug.
  • Used Rational ClearCase for version control.
  • Used JUnit Testing Framework for Unit Level Testing.
  • Worked in IBM RAD 6.0 to develop the complete application.
  • Deployed the application on WebSphere Application Server 6.0 (WAS).

Environment: WAS 6.0, RAD 6.0, Core Java, Struts 1.2, Spring 2.0, EJB 2.1, Servlet 2.3, HTML, JSP 2.0, JNDI, Web Services, JMS, SOAP, IBM MQSeries, JavaScript, jQuery, JProbe, PMD, WSDL, UNIX, Confidential 9.
