
Spark Developer Resume


Nashville, TN

SUMMARY

  • A dynamic professional with over 9 years of diversified experience in Information Technology, with an emphasis on the Big Data/Hadoop ecosystem, SQL/NoSQL databases, and Java/J2EE technologies and tools, using industry-accepted methodologies and procedures.
  • Hadoop Developer: Extensively worked on Hadoop tools including Pig, Hive, Oozie, Sqoop, Spark, DataFrames, HBase and MapReduce programming. Implemented partitioning and bucketing in Hive and designed both managed and external tables to optimize performance (a short sketch follows this list). Developed Spark applications using Scala for easy Hadoop transitions. Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive. Developed Spark code and Spark SQL/Streaming for faster testing and processing of data.
  • Hadoop Distributions: Worked with Apache Hadoop along with the enterprise distributions from Cloudera and Hortonworks. Good knowledge of the MapR distribution.
  • Data Ingestion into Hadoop (HDFS): Ingested data into Hadoop from various data sources such as Oracle, MySQL and Teradata using the Sqoop tool. Created Sqoop jobs with incremental loads to populate Hive external tables. Exported the analyzed data back to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • File Formats: Involved in running Hadoop streaming jobs to process terabytes of text data. Worked with different file formats such as Text, SequenceFile, Avro, ORC and Parquet.
  • Scripting and Reporting: Created scripts for performing data analysis with Pig, Hive and Impala. Used ANT scripts for creating and deploying .jar, .ear and .war files. Generated reports, extracts and statistics on the distributed data in the Hadoop cluster. Developed Java APIs for retrieval and analysis on NoSQL databases such as HBase and Cassandra.
  • Custom Coding: Wrote custom UDFs (User Defined Functions) in Java to extend the functionality of Hive and Pig. Used HCatalog for simple query execution. Wrote code and created JAR files for functionality unavailable in Pig and Hive, using Maven as the build automation tool for custom tasks.
  • Java Experience: Created applications in core Java and built client-server applications requiring database access and constant connectivity using JDBC, JSP, Spring and Hibernate. Implemented Java web services for network-related applications.
  • Interface Design: Created front-end user interfaces using HTML, CSS and JavaScript along with validation techniques. Implemented the AJAX toolkit for GUI validation. Worked with image-editing tools such as Photoshop and Adobe Lightroom.
  • Methodologies: Hands-on experience working with different software methodologies such as Waterfall and Agile.
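
A minimal Spark 1.6-era sketch in Scala of the partitioned and bucketed Hive table work summarized above; the table name, columns, bucket count and partition value are hypothetical placeholders, not project artifacts.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object HivePartitionAnalytics {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("HivePartitionAnalytics"))
        val hiveContext = new HiveContext(sc)

        // Managed, ORC-backed table that is both partitioned and bucketed (hypothetical schema).
        hiveContext.sql(
          """CREATE TABLE IF NOT EXISTS sales (id BIGINT, amount DOUBLE)
            |PARTITIONED BY (load_date STRING)
            |CLUSTERED BY (id) INTO 16 BUCKETS
            |STORED AS ORC""".stripMargin)

        // Filtering on the partition column means only that partition's files are scanned.
        val daily = hiveContext.sql(
          "SELECT load_date, SUM(amount) AS total FROM sales WHERE load_date = '2017-01-01' GROUP BY load_date")
        daily.show()

        sc.stop()
      }
    }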

TECHNICAL SKILLS

Languages/Tools: Java, XML, XSLT, HTML/XHTML, HDML, DHTML, Python, Scala, R, Git.

Big Data Technologies: Apache Hadoop, HDFS, Spark, Hive, Pig, Talend, HBase, Sqoop, Oozie, ZooKeeper, Mahout, Splunk, Flink, Solr, Kafka, Storm, Cassandra, Impala, Hue, NiFi, Tez, Greenplum, MongoDB.

Java Technologies: JSE: Java architecture, OOP concepts; JEE: JDBC, JNDI, JSF (JavaServer Faces), Spring, Hibernate, SOAP/REST web services

Web Technologies: HTML, XML, JavaScript, WSDL, SOAP, JSON, AngularJS

Databases/NoSQL: MS SQL Server, MySQL, HBase, Oracle, MS Access, Teradata, Netezza.

PROFESSIONAL EXPERIENCE

Confidential - Nashville, TN

Spark Developer

Responsibilities:

  • Used the Cloudera distribution for the Hadoop ecosystem.
  • Analyzed the Hadoop cluster and various big data analytics tools including MapReduce, Pig, Hive and Spark.
  • Created Sqoop jobs for importing the data from Relational Database systems into HDFS.
  • Extensively used Pig for data cleansing.
  • Scheduled multiple Hive and Pig jobs using the Oozie workflow engine and Python.
  • Wrote Python scripts to analyze customer data.
  • Created partitioned tables in Hive.
  • Developed Hive queries for the analysts.
  • Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs.
  • Captured data logs from the web server into HDFS using Flume and Splunk for analysis.
  • Worked on the Spark engine, creating batch jobs with incremental loads through Kafka, Splunk and Flume.
  • Performed advanced procedures like text analytics and processing using the in-memory computing capabilities of Spark.
  • Implemented Spark RDD transformations to map business analysis logic and applied actions on top of the transformations.
  • Created Spark-based Talend Big Data Integration jobs to run high-speed analytics over the Spark cluster.
  • Involved in migrating MapReduce jobs into Spark (version 1.6.0) jobs and used the Spark SQL and DataFrames APIs to load structured data into Spark clusters.
  • Used DataFrames for data transformations.
  • Designed and Developed Scala workflows for data pull from cloud based systems and applying transformations on it.
  • Collected data using Spark Streaming and loaded it into HBase (see the sketch after this list).
  • Tuned HBase and MySQL to optimize data access.
  • Fetched and generated monthly reports and visualized them using Tableau.
  • Developed Tableau visualizations and dashboards using Tableau Desktop.
  • Created action filters, parameters and calculated sets for preparing dashboards and worksheets in Tableau.
  • Loaded data from various sources into HDFS and built reports using Tableau.
  • Worked extensively on creating MapReduce jobs to prepare data for search and aggregation.
  • Deployed Cloudera Hadoop Cluster on AWS for Big Data Analytics.
  • Utilized Git for code versioning while following a Gitflow workflow.
  • Configured, monitored and optimized Flume agents to capture web logs from the VPN server into the Hadoop data lake.
  • Involved in developing code to write canonical-model JSON records from numerous input sources to Kafka queues (a producer sketch follows this role's Environment line).
  • Involved in loading data from Linux file systems, servers and Java web services using Kafka producers and consumers.
  • Worked with individuals at various levels to coordinate and prioritize multiple projects; estimated scope, and scheduled and tracked projects throughout the SDLC.
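
A minimal sketch (Spark 1.6-era DStream API in Scala) of the Spark Streaming-to-HBase flow referenced above; the broker address, topic, HBase table and column family names are hypothetical.

    import kafka.serializer.StringDecoder
    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
    import org.apache.hadoop.hbase.util.Bytes
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object KafkaToHBase {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(new SparkConf().setAppName("KafkaToHBase"), Seconds(30))

        val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
        val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
          ssc, kafkaParams, Set("weblogs"))

        stream.foreachRDD { rdd =>
          rdd.foreachPartition { records =>
            // Open one HBase connection per partition rather than per record.
            val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
            val table = connection.getTable(TableName.valueOf("weblog_events"))
            records.foreach { case (key, value) =>
              val rowKey = if (key != null) key else java.util.UUID.randomUUID().toString
              val put = new Put(Bytes.toBytes(rowKey))
              put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("raw"), Bytes.toBytes(value))
              table.put(put)
            }
            table.close()
            connection.close()
          }
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }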

Environment: Hadoop, Hive, Linux, MapReduce, Sqoop, Kafka, Spark, HBase, Shell scripting, Eclipse, Maven, Java, AngularJS, Agile methodologies, AWS, Talend, Splunk, Tableau, Oozie.
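
A minimal Scala sketch of the Kafka producer work from this role (writing canonical JSON records to a topic); the broker address, topic name and JSON fields are hypothetical.

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    object CanonicalJsonProducer {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put("bootstrap.servers", "broker1:9092")
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

        val producer = new KafkaProducer[String, String](props)

        // In the real pipeline the JSON was assembled from Linux files, servers and web services.
        val record = """{"source":"webservice","event":"claim_created","id":"CLM-1001"}"""
        producer.send(new ProducerRecord[String, String]("canonical-events", "CLM-1001", record))

        producer.close()
      }
    }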

Confidential -Columbus, GA

Hadoop Developer

Responsibilities:

  • Used the Hortonworks distribution for the Hadoop ecosystem.
  • Created Sqoop jobs within Oozie workflows.
  • Monitored multiple Hadoop clusters environments using Ganglia.
  • Installed and configured Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
  • Built a process based on RabbitMQ, Spark and Spring Boot to send and receive data from Symedical for reference data management.
  • Used Celery as task queue and RabbitMQ, Redis as messaging broker to execute asynchronous tasks.
  • Involved in designing MapReduce jobs on the Greenplum Hadoop system (HDFS).
  • Responsible for developing efficient MapReduce programs on the AWS cloud to process more than 4 years' worth of claim data and detect and separate fraudulent claims.
  • Implemented monitoring and established best practices around the usage of Elasticsearch.
  • Monitored local file system disk space and CPU usage using Ambari.
  • Wrote Python scripts to update content in the database and manipulate files.
  • Developed a scalable, cost-effective and fault-tolerant data warehouse system on the Amazon EC2 cloud.
  • Automated API test cases using REST, SOAP and Splunk Web.
  • Imported data from various sources into the Cassandra cluster using Java APIs (see the sketch after this list).
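
A minimal Scala sketch of loading rows into Cassandra through the DataStax Java driver, as referenced in the last bullet; the contact point, keyspace, table and column names are hypothetical.

    import com.datastax.driver.core.Cluster

    object CassandraImport {
      def main(args: Array[String]): Unit = {
        val cluster = Cluster.builder().addContactPoint("cassandra-node1").build()
        val session = cluster.connect("claims")

        val insert = session.prepare(
          "INSERT INTO claim_events (claim_id, status, amount) VALUES (?, ?, ?)")

        // Rows normally arrive from upstream source systems; a single literal row is shown here.
        session.execute(insert.bind("CLM-1001", "OPEN", java.lang.Double.valueOf(425.75)))

        session.close()
        cluster.close()
      }
    }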

Environment: Hadoop 1.x, HDFS, MapReduce, Hive 0.10, Pig, Sqoop, Ganglia, Cassandra, Shell scripting, AWS, MySQL, Hortonworks, Ubuntu 13.04.

Confidential - San Francisco, CA

Hadoop/ETL Developer

Responsibilities:

  • Extracted data from flat files and other RDBMS sources into the staging area and populated it into the data warehouse.
  • Installed and configured Hadoop Map-Reduce, HDFS and developed multiple Map-Reduce jobs in Java for data cleansing and preprocessing.
  • Scheduled multiple Hive and Pig jobs using the Airflow workflow engine and Python.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Responsible for coding batch pipelines, RESTful services, MapReduce programs and Hive queries, as well as testing, debugging, peer code review, troubleshooting and maintaining status reports.
  • Developed jobs in Talend Enterprise edition from stage to source, intermediate, conversion and target.
  • Utilized S3 buckets to store the JARs and input datasets, and used DynamoDB to store the processed output from the input datasets.
  • Developed several reports using Kibana on top of Elasticsearch.
  • Involved in loading data from UNIX file system to HDFS.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs (see the sketch after this list).
  • Worked on JVM performance tuning to improve MapReduce job performance.
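
A minimal Scala sketch of the Hive table creation and querying mentioned above, issued through the HiveServer2 JDBC driver; the host, user, table and column names are hypothetical.

    import java.sql.DriverManager

    object HiveJdbcExample {
      def main(args: Array[String]): Unit = {
        Class.forName("org.apache.hive.jdbc.HiveDriver")
        val conn = DriverManager.getConnection("jdbc:hive2://hiveserver:10000/default", "etl_user", "")
        val stmt = conn.createStatement()

        // External table over flat files staged in HDFS by the upstream extract.
        stmt.execute(
          """CREATE EXTERNAL TABLE IF NOT EXISTS stage_orders (order_id BIGINT, amount DOUBLE)
            |ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
            |LOCATION '/data/stage/orders'""".stripMargin)

        // Aggregate queries like this are compiled by Hive into MapReduce jobs on the cluster.
        val rs = stmt.executeQuery("SELECT COUNT(*) FROM stage_orders")
        while (rs.next()) println(s"rows staged: ${rs.getLong(1)}")

        rs.close(); stmt.close(); conn.close()
      }
    }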

Environment: Hadoop, MapReduce, HDFS, Hive, DynamoDB, Oracle 11g, Java, Struts, Servlets, HTML, Airflow, XML, SQL, J2EE, JUnit, Teradata, Tomcat 6, Talend.

Confidential - St Louis, Mo

JAVA/ETL Developer

Responsibilities:

  • Developed Maven scripts to build and deploy the application.
  • Developed Spring MVC controllers for all the modules.
  • Developed DAOs using Hibernate for data access from database.
  • Implemented jQuery validator components.
  • Extracted data from Oracle as one of the source databases.
  • Involved in the ETL design and its documentation.
  • Used the DataStage ETL tool to copy data from Teradata to Netezza.
  • Used JSON and XML documents extensively with the MarkLogic NoSQL database. Made REST API calls using Node.js and the Java API.
  • Built data transformation with SSIS including importing data from files.
  • Loaded the flat files data using Informatica to the staging area.
  • Created shell scripts for generic use.

Environment: Java, Spring, Windows XP/NT, Informatica PowerCenter 9.1/8.6, UNIX, Teradata V14, Oracle Designer, Autosys, Korn shell, Quality Center 10.

Confidential

Java Developer

Responsibilities:

  • Involved in the analysis, design, implementation, and testing of the project.
  • Implemented the presentation layer with HTML, XHTML and JavaScript.
  • Developed web components using JSP, Servlets and JDBC.
  • Implemented database using SQL Server.
  • Implemented the Spring IoC framework.
  • Developed Spring REST services for all the modules.
  • Developed custom SAML and SOAP integration for healthcare.
  • Validated the fields of user registration screen and login screen by writing JavaScript validations.
  • Used DAO and JDBC for database access.
  • Built responsive Web pages using Kendo UI mobile.
  • Designed dynamic and multi-browser-compatible pages using HTML, CSS, jQuery, JavaScript, RequireJS and Kendo UI.

Environment: Oracle 11g, Java 1.5, Struts, Servlets, HTML, XML, SQL, J2EE, JUnit, Tomcat 6, JSP, JDBC, JavaScript, MySQL, Eclipse IDE, REST.

Confidential

Jr. Java Developer

Responsibilities:

  • Analyzed requirements and prepared the Requirement Analysis Document.
  • Deployed the application to the JBoss Application Server.
  • Implemented web services using the SOAP protocol with Apache Axis.
  • Gathered requirements from the various parties involved in the project.
  • Used J2EE and EJB to handle the business flow and functionality.
  • Involved in the complete SDLC of the development with full system dependency.
  • Actively coordinated with the deployment manager for the application's production launch.
  • Monitored test cases to verify actual results against expected results.
  • Carried out regression testing as part of problem tracking.

Environment: Java, J2EE, EJB, UNIX, XML, Workflow, JMS, JIRA, Oracle, JBoss, SOAP.
