Spark Developer Resume

San Jose, CA

CAREER OBJECTIVE

  • Spark and Hadoop Developer with a focus on machine learning and artificial intelligence

PROFESSIONAL SUMMARY:

  • Over 5 years of IT experience, including expertise in Big Data, Hadoop, Apache Spark, Java, and Scala technologies
  • Solid foundation in mathematics and probability, with broad practical experience in statistical data mining techniques
  • Strong technical, administration, and mentoring knowledge in Linux and Big Data/Hadoop technologies
  • Hands-on experience with major components of the Hadoop ecosystem, including Hadoop MapReduce, SparkSQL, Spark Streaming, Kafka, HDFS, Hive, Pig, HBase, Zookeeper, Sqoop, Oozie, and Flume
  • Working experience designing, building, and configuring large Hadoop environments
  • Work experience with Hadoop distributions and platforms such as Hortonworks, Cloudera, and Databricks
  • Experience importing and exporting data with Sqoop between HDFS and relational database systems/mainframes
  • Excellent understanding of Hadoop architecture and its components, such as HDFS, Ambari, JobTracker, and TaskTracker
  • Hands-on experience installing, configuring, and using Hadoop ecosystem components such as Spark, HBase, Oozie, Hive, Sqoop, Drill, Pig, Storm, Kafka, Zookeeper, and YARN
  • Experienced in monitoring Hadoop cluster environments using Ambari and Oozie
  • Experience in object-oriented analysis and design using core Java patterns
  • Experience in SQL and NoSQL databases, including MySQL and HBase
  • Articulate in written and verbal communication, with strong interpersonal, analytical, and organizational skills
  • Ability to adapt to evolving technology, with a strong sense of responsibility and accomplishment
  • Ability to meet deadlines and handle multiple tasks, flexibility in work schedules, and excellent communication skills
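
As one illustration of the Sqoop import/export workflow noted above, a typical command pair might look like the following sketch; the connection string, credentials, table names, and HDFS paths are hypothetical placeholders, not details from this resume:

```shell
# Import a relational table into HDFS (all names are illustrative)
sqoop import \
  --connect jdbc:mysql://db.example.com:3306/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /data/raw/orders \
  --num-mappers 4

# Export aggregated results from HDFS back to the relational side
sqoop export \
  --connect jdbc:mysql://db.example.com:3306/sales \
  --username etl_user -P \
  --table orders_agg \
  --export-dir /data/out/orders_agg
```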

TECHNICAL SKILLS:

Big Data Ecosystems: Hadoop, MapReduce, HDFS, HBase, Zookeeper, Hive, Pig, Sqoop, Cassandra, Oozie, Scala, Spark, Flume

Programming Languages: Java, C++, Scala, Matlab, SparkSQL, HiveQL

Web Technologies: JavaScript, XML, HTML5

Databases: NoSQL, MySQL, SQL Server, Oracle, HBase

IDE Tools: Eclipse, JDeveloper, Netbeans, MS Visual Studio.

Tools: Adobe, SQL, Flume, Sqoop, and Storm

Operating Systems: Windows, Unix, Linux, and Mac OS X

EXPERIENCE:

Spark Developer

Confidential, San Jose, CA

Responsibilities:

  • Created a Spark Streaming application that consumed data from Kafka, parsed the messages, and stored the results in HBase
  • Developed HBase tables to load large sets of structured data arriving from Spark Streaming
  • Used the Oozie scheduler to automate batch machine learning model runs
  • Wrote Hive object creation scripts
  • Designed administrative objects such as Kafka topics and HBase and Hive tables
  • Loaded and transformed large sets of structured data in real time
  • Managed a spark.ml application that read data from HBase and generated recommendations using the ALS algorithm
  • Collaborated with the infrastructure, network, database, application, and Business Intelligence teams to ensure data quality and availability

Environment: Apache Spark, HBase, CentOS, Spark Streaming, Kafka, Hortonworks
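
The parse step of the Kafka-to-HBase pipeline above can be sketched as a pure function; the event field names (device_id, ts, temp, status) and the row-key scheme are illustrative assumptions, not details from this resume:

```python
import json


def parse_event(raw: bytes):
    """Parse one Kafka message payload (JSON) into an HBase row.

    Returns (row_key, columns), where the row key is 'device_id#timestamp'
    and columns maps 'cf:qualifier' names to byte-encoded values.
    NOTE: field names and key scheme are hypothetical, for illustration only.
    """
    event = json.loads(raw.decode("utf-8"))
    row_key = f"{event['device_id']}#{event['ts']}"
    columns = {
        "cf:temp": str(event["temp"]).encode("utf-8"),
        "cf:status": event["status"].encode("utf-8"),
    }
    return row_key, columns


# In the streaming job, this function would be applied per record
# (e.g. in a map over each micro-batch) before the HBase put.
```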

Big Data Engineer

Confidential, San Jose, CA

Responsibilities:

  • Collaborated on insights with other Data Scientists, Business Analysts, and Partners.
  • Created data lakes and supplied data to the data science team to continuously improve the efficiency and accuracy of existing predictive models.
  • Utilized various data analysis and data visualization tools to accomplish data analysis, report design, and report delivery.
  • Developed DataFrames using SparkSQL from external databases.
  • Uploaded data to Hadoop Hive and combined new tables with existing databases.
  • Implemented data backup strategies for the data in MongoDB.
  • Created data lakes using Hive and Apache Spark on a provisioned Hadoop cluster.
  • Implemented an ETL design to load the data lakes into MongoDB.
  • Implemented a POC using Apache Impala for data processing on top of Hive.
  • Imported data from relational databases into HDFS using Flume with near-real-time latency
  • Excellent understanding of data storage and retrieval techniques, ETL, and databases

Environment: Apache Spark, Flume, SparkSQL, Scala, MongoDB, Hive, Storm, Big Data
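
The MongoDB backup strategy mentioned above could, for example, be built around mongodump/mongorestore; the host names, database name, and backup paths below are hypothetical placeholders:

```shell
# Nightly dump of one database to a dated directory (names are illustrative)
mongodump --host mongo.example.com --db analytics \
  --out /backups/mongo/$(date +%F)

# Periodic restore drill against a staging instance to verify the backups
mongorestore --host staging-mongo.example.com \
  /backups/mongo/2024-01-01/analytics
```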

Hadoop Developer

Confidential

Responsibilities:

  • Built a data flow pipeline using Flume, Java MapReduce, and Pig
  • Used Flume to capture streaming mobile sensor data and store it on HDFS
  • Used Hive scripts to compute aggregates and store them in HBase for low-latency applications
  • Analyzed HBase and compared it with other open-source NoSQL databases to determine which best suited the current requirements
  • Integrated HBase as a distributed persistent metadata store to provide metadata resolution for network entities
  • Used Oozie to orchestrate the scheduling of MapReduce jobs and Pig scripts
  • Created HBase tables to load large sets of structured, semi-structured, and unstructured data

Environment: JDK 1.7, Ubuntu Linux, Big Data, Hive, Pig, Sqoop, Flume, Zookeeper, Oozie, DB2, HBase
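
The Hive-aggregates-into-HBase pattern described above can be sketched in HiveQL; the schemas, table names, and row-key format are illustrative assumptions, not details from this resume:

```sql
-- Raw sensor readings landed on HDFS by Flume (illustrative schema)
CREATE EXTERNAL TABLE sensor_raw (
  device_id STRING,
  ts BIGINT,
  temp DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/flume/sensor';

-- HBase-backed Hive table for low-latency lookups of the aggregates
CREATE TABLE sensor_daily_agg (
  rowkey STRING,
  avg_temp DOUBLE,
  reading_cnt BIGINT
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:avg_temp,cf:reading_cnt');

-- Daily aggregate keyed by device and day
INSERT OVERWRITE TABLE sensor_daily_agg
SELECT concat(device_id, '#', to_date(from_unixtime(ts))),
       avg(temp),
       count(*)
FROM sensor_raw
GROUP BY device_id, to_date(from_unixtime(ts));
```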
