Big Data Developer Resume
Toronto
PROFILE SUMMARY:
- MBA professional with over 3 years of experience in Big Data
- Proficiency in developing, testing and deploying ETL solutions
- Extensive experience in using Spark to transform data using RDDs and dataframes
- Experience in developing and using Hive Query Language for data analytics
- Proficiency in Kafka, Hive, Impala, Sqoop, Flume, Oozie and MongoDB
- Good knowledge of Hadoop Architecture and its various components such as HDFS and YARN
- Proficiency in using Flume, NiFi and Kafka to ingest real-time and near-real-time streaming data into HDFS
- Experience in using Spark Streaming to process streaming data from Kafka and Flume
- Proficiency in database programming with MySQL, including creating indexes, functions, views and joins
- Experience in using Spark SQL and in performance tuning of Spark jobs
- Strong team player, able to work both independently and in a team
- Ability to adapt to a rapidly changing environment along with strong commitment towards learning
- Experience in using both major Hadoop distributions, Cloudera and Hortonworks
- Experience in different phases of Software Development Life Cycle (SDLC)
- Programming Knowledge in Java and Scala
TECHNICAL SKILLS:
- Spark
- Kafka
- Sqoop
- HiveQL & Impala
- Bitbucket, Docker
- Flume, NiFi
- Scala & Java
- MongoDB & HBase
- Spark Streaming
- MySQL
PROFESSIONAL EXPERIENCE
Big Data Developer
Confidential, Toronto
Responsibilities:
- Designed, created and tested complete data pipelines
- Worked closely with the data architect
- Took care of the complete ETL process
- Worked with 600 GB of structured, semi-structured and unstructured data every day
- Created a complete system by using different Big Data tools
- Used NiFi to ingest real-time streaming data from different sources into HDFS
- Automated ETL job using NiFi to load JSON data and server data into MongoDB
- Loaded disparate data sets into HDFS from RDBMS like MySQL and vice-versa using Sqoop
- Created Hive schemas optimized for performance using partitioning and bucketing
- Wrote HQL queries for processing of data
- Used Flume to load log data from e-commerce application into HDFS
- Used Spring Boot to publish log data from web applications to a Kafka topic
- Developed Spark jobs in Scala for data processing using Spark SQL and dataframes
- Used NoSQL databases such as MongoDB as the data-access tier in data streaming
- Used Spring Boot to receive log data from a web application and publish it to a Kafka topic
- Consumed messages from the Kafka topic in Spark Streaming for data processing
- Stored the processed log data in MongoDB, a NoSQL database
- Deployed Docker containers to improve workflow and performance
- Automated build and deployment using Jenkins to speed up the process
- Migrated data from MySQL to HDFS and Hive, and vice versa, using Sqoop in the hybrid system
Big Data Developer
Confidential
Responsibilities:
- Imported and exported data using Sqoop from HDFS and Hive to RDBMS and vice-versa
- Replaced Hive's default Derby metadata store with MySQL
- Created Spark RDDs from data files and performed transformations and actions on them
- Used Spark SQL to run analysis on huge datasets
- Created Hive tables with partitions and bucketing for efficiency
- Developed Hive queries for the analysts
- Utilized the Apache Hadoop environment by Hortonworks
- Automated dataflow process using Apache NiFi
- Worked on import & export of data into HDFS and Hive using Sqoop
- Involved in managing and reviewing Hadoop log files
- Created Hadoop Streaming jobs to process gigabytes of XML data
- Created transformations for large sets of structured, semi-structured and unstructured data
- Worked with Hive partitioned tables to load data for analysis
Hadoop Developer
Confidential
Responsibilities:
- Used Talend for data integration and data management
- Developed the whole process using the Talend ETL Big Data tool
- Imported data to HDFS using Sqoop
- Analyzed the data using Spark
- Filtered, mapped and reduced RDDs using Spark
- Created Hive schemas using partitioned tables and bucketing
- Developed Scala scripts to run Spark code
- Used Sqoop to transfer data back to the RDBMS
- Developed Oozie workflow to implement jobs
- Made REST API call to get JSON data
- Stored the data in a local directory
- Transferred the data from the local directory to HDFS
- Read JSON data from HDFS using Spark, converted it to a DataFrame and then saved it as a table
- Developed Hive scripts to query the table
- Used Oozie to run the jobs
MySQL Developer
Confidential
Responsibilities:
- Monitored and fine-tuned the running database server
- Performed database development and implementation
- Planned database growth in terms of capacity and scalability
- Enabled extraction, transformation and loading of data and packages
- Ensured continuous database availability, integrity and security in the production environment
