Spark Developer Resume
San Jose, CA
CAREER OBJECTIVE:
- Spark and Hadoop Developer with a focus on machine learning and artificial intelligence
PROFESSIONAL SUMMARY:
- Over 5 years of IT experience, including expertise in Big Data, Hadoop, Apache Spark, Java, and Scala technologies
- Solid background in mathematics and probability, with broad practical experience in statistical data mining techniques
- Strong technical, administration, and mentoring knowledge of Linux and Big Data/Hadoop technologies
- Hands-on experience with major components of the Hadoop ecosystem, including MapReduce, Spark SQL, Spark Streaming, Kafka, HDFS, Hive, Pig, HBase, ZooKeeper, Sqoop, Oozie, and Flume
- Working experience designing, building, and configuring large Hadoop environments
- Work experience with Hadoop/Spark platforms such as Hortonworks, Cloudera, and Databricks
- Experience importing and exporting data using Sqoop between HDFS and relational database systems/mainframes
- Excellent understanding of Hadoop architecture and Big Data ecosystem components such as HDFS, Ambari, JobTracker, and TaskTracker
- Hands-on experience installing, configuring, and using Hadoop ecosystem components such as Spark, HBase, Oozie, Hive, Sqoop, Drill, Pig, Storm, Kafka, ZooKeeper, and YARN
- Experienced in monitoring Hadoop cluster environments using Ambari and Oozie
- Experience in object-oriented analysis and design using core Java design patterns
- Experience with SQL and NoSQL databases, including MySQL and HBase
- Articulate in written and verbal communication along with strong interpersonal, analytical and organizational skills
- Ability to adapt to evolving technology, strong sense of responsibility and accomplishment
- Ability to meet deadlines and handle multiple tasks, with a flexible work schedule and excellent communication skills
TECHNICAL SKILLS:
Big Data Ecosystem: Hadoop, MapReduce, HDFS, HBase, ZooKeeper, Hive, Pig, Sqoop, Cassandra, Oozie, Spark, Flume
Programming Languages: Java, C++, Scala, MATLAB, Spark SQL, HiveQL
Web Technologies: JavaScript, XML, HTML5
Databases: MySQL, SQL Server, Oracle, HBase (NoSQL)
IDE Tools: Eclipse, JDeveloper, NetBeans, MS Visual Studio
Tools: Adobe, SQL, Flume, Sqoop, Storm
Operating Systems: Windows, Unix, Linux, Mac OS X
EXPERIENCE:
Spark Developer
Confidential, San Jose, CA
Responsibilities:
- Created a Spark Streaming application that consumed data from Kafka, parsed it, and stored it in HBase (see the first sketch after this section)
- Developed HBase tables to store large sets of structured data arriving from Spark Streaming
- Used the Oozie scheduler to automate batch machine learning model runs
- Wrote Hive object creation scripts
- Designed administrative objects such as Kafka topics and HBase and Hive tables
- Loaded and transformed large sets of structured data in real time
- Managed a spark.ml application that read data from HBase and generated recommendations using the ALS algorithm (see the second sketch after this section)
- Collaborated with the infrastructure, network, database, application, and Business Intelligence teams to ensure data quality and availability
Environment: Apache Spark, Spark Streaming, Kafka, HBase, Hortonworks, CentOS
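A minimal sketch of the Kafka-to-HBase streaming flow described above (Scala, Spark Streaming with the kafka-0-10 integration). The broker address, topic, table, column family, and CSV record layout are illustrative assumptions, not details from the original project.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

object KafkaToHBase {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("KafkaToHBase"), Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",              // hypothetical broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "events-consumer",           // hypothetical group
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams)) // hypothetical topic

    stream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // Open one HBase connection per partition rather than per record
        val conn  = ConnectionFactory.createConnection(HBaseConfiguration.create())
        val table = conn.getTable(TableName.valueOf("events")) // hypothetical table
        records.foreach { rec =>
          // Assume CSV-encoded values: id,field1
          val cols = rec.value().split(",")
          val put  = new Put(Bytes.toBytes(cols(0)))
          put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("field1"), Bytes.toBytes(cols(1)))
          table.put(put)
        }
        table.close()
        conn.close()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```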
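A minimal spark.ml ALS sketch matching the recommendation bullet above. The HBase read is elided (it would typically go through an HBase-Spark connector or TableInputFormat); the inline ratings data, column names, and hyperparameters are illustrative assumptions.

```scala
import org.apache.spark.ml.recommendation.ALS
import org.apache.spark.sql.SparkSession

object AlsRecommend {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("AlsRecommend").getOrCreate()
    import spark.implicits._

    // Stand-in for ratings loaded from HBase: (userId, itemId, rating)
    val ratings = Seq((1, 10, 4.0f), (1, 11, 2.0f), (2, 10, 5.0f), (2, 12, 3.0f))
      .toDF("userId", "itemId", "rating")

    val als = new ALS()
      .setUserCol("userId")
      .setItemCol("itemId")
      .setRatingCol("rating")
      .setRank(10)        // illustrative hyperparameters
      .setMaxIter(10)
      .setRegParam(0.1)

    val model = als.fit(ratings)
    model.recommendForAllUsers(5).show()  // top 5 items per user

    spark.stop()
  }
}
```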
Big Data Engineer
Confidential, San Jose, CA
Responsibilities:
- Collaborated on insights with other Data Scientists, Business Analysts, and Partners.
- Created data lakes and provided data to continuously improve the efficiency and accuracy of the data science team's existing predictive models.
- Utilized various data analysis and visualization tools for analysis, report design, and report delivery.
- Developed DataFrames using Spark SQL from external databases (see the first sketch after this section).
- Loaded data into Hadoop Hive and combined new tables with existing databases.
- Implemented backup strategies for the data in MongoDB.
- Created data lakes using Hive and Apache Spark on a provisioned Hadoop cluster.
- Implemented an ETL design to export the data lakes to MongoDB (see the second sketch after this section).
- Implemented a POC using Apache Impala for data processing on top of Hive.
- Imported data from relational databases into HDFS using Flume at near-real-time latency.
- Demonstrated excellent understanding of data storage and retrieval techniques, ETL, and databases.
Environment: Apache Spark, Spark SQL, Scala, Flume, MongoDB, Hive, Storm
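A minimal sketch of building DataFrames from an external database with Spark's JDBC source and landing them in Hive; the connection URL, credential handling, and table names are hypothetical, and the matching JDBC driver must be on the classpath.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object JdbcToHive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("JdbcToHive")
      .enableHiveSupport()   // needed to write managed Hive tables
      .getOrCreate()

    // Read the source table into a DataFrame over JDBC
    val orders = spark.read.format("jdbc")
      .option("url", "jdbc:mysql://dbhost:3306/sales")   // hypothetical URL
      .option("dbtable", "orders")                       // hypothetical table
      .option("user", "etl_user")
      .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
      .load()

    // Append the extract to an existing Hive table
    orders.write.mode(SaveMode.Append).saveAsTable("sales.orders_raw")

    spark.stop()
  }
}
```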
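A minimal sketch of the data-lake-to-MongoDB ETL step, assuming the MongoDB Spark Connector (the format name and write options vary across connector versions); the Hive table, database, and collection names are hypothetical.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object HiveToMongo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveToMongo")
      .enableHiveSupport()
      .getOrCreate()

    // Read a curated data-lake table from Hive (hypothetical name)
    val curated = spark.sql("SELECT * FROM lake.customer_metrics")

    // Dump to MongoDB via the MongoDB Spark Connector
    curated.write.format("mongo")                    // "mongodb" in connector 10.x
      .mode(SaveMode.Overwrite)
      .option("uri", "mongodb://mongohost:27017")    // hypothetical host
      .option("database", "analytics")
      .option("collection", "customer_metrics")
      .save()

    spark.stop()
  }
}
```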
Hadoop Developer
Confidential
Responsibilities:
- Built a data flow pipeline using Flume, Java MapReduce, and Pig
- Used Flume to capture streaming mobile sensor data and store it on HDFS
- Used Hive scripts to compute aggregates and store them in HBase for low-latency applications (see the sketch after this section)
- Analyzed HBase and compared it with other open-source NoSQL databases to determine which best suited the current requirements
- Integrated HBase as a distributed persistent metadata store to provide metadata resolution for network entities
- Used Oozie to orchestrate the scheduling of MapReduce jobs and Pig scripts
- Created HBase tables to load large sets of structured, semi-structured, and unstructured data
Environment: JDK 1.7, Ubuntu Linux, Hive, Pig, Sqoop, Flume, ZooKeeper, Oozie, DB2, HBase
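The original Hive scripts are not included in this resume; the sketch below recreates the same aggregate-then-serve idea in Spark/Scala terms: compute hourly averages over a hypothetical Hive sensor table, then write one row per device-hour into HBase for low-latency reads. All table, column, and column-family names are assumptions.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.sql.SparkSession

object SensorAggregatesToHBase {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SensorAggregatesToHBase")
      .enableHiveSupport()
      .getOrCreate()

    // Hourly average per device from a hypothetical Hive table
    val hourly = spark.sql(
      """SELECT device_id,
        |       date_format(event_time, 'yyyyMMddHH') AS hour,
        |       avg(reading) AS avg_reading
        |FROM sensors.readings
        |GROUP BY device_id, date_format(event_time, 'yyyyMMddHH')""".stripMargin)

    // Key each HBase row by device#hour for fast point lookups
    hourly.rdd.foreachPartition { rows =>
      val conn  = ConnectionFactory.createConnection(HBaseConfiguration.create())
      val table = conn.getTable(TableName.valueOf("sensor_hourly")) // hypothetical
      rows.foreach { row =>
        val put = new Put(Bytes.toBytes(row.getString(0) + "#" + row.getString(1)))
        put.addColumn(Bytes.toBytes("m"), Bytes.toBytes("avg"),
          Bytes.toBytes(row.getDouble(2)))
        table.put(put)
      }
      table.close()
      conn.close()
    }

    spark.stop()
  }
}
```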