Big Data Developer Resume

Woonsocket, Rhode Island

SUMMARY

  • 5+ years of professional IT experience in Big Data technologies, architecture, and systems.
  • Experience using CDH and HDP Hadoop ecosystem components such as Hadoop, MapReduce, YARN, Hive, Pig, Sqoop, HBase, Cassandra, Spark, Oozie, ZooKeeper, Kafka, and Flume.
  • Configured Spark Streaming to receive real-time data from Kafka and stored the stream data to HDFS using Scala (a sketch follows this list).
  • Experienced in importing and exporting data using stream-processing platforms such as Flume and Kafka.
  • Wrote Hive UDFs as required and executed complex HiveQL queries to extract data from Hive tables.
  • Used partitioning and bucketing in Hive and designed both managed and external tables for performance optimization (also sketched after this list).
  • Converted Hive/SQL queries into Spark transformations using Spark DataFrames and Scala.
  • Used the Spark DataFrames API on the Cloudera platform to perform analytics on Hive data.
  • Good understanding and knowledge of NoSQL databases like MongoDB, HBase and Cassandra.
  • Experienced in workflow scheduling and locking tools/services like Oozie and Zookeeper.
  • Practiced ETL methods in enterprise-wide solutions, data warehousing, reporting and data analysis.
  • Experienced in working with AWS, using EMR and EC2 for compute and S3 for storage.
  • Developed Impala scripts for extraction, transformation, and loading of data into the data warehouse.
  • Used Pig scripts for transformations, event joins, filters, and pre-aggregations for HDFS storage.
  • Imported and exported data with Sqoop between HDFS and RDBMSs including Oracle, MySQL, and MS SQL Server.
  • Good knowledge of UNIX shell scripting for automating deployments and other routine tasks.
  • Experienced in using IDEs like Eclipse, NetBeans, IntelliJ.
  • Used JIRA and Rally for bug tracking, and GitHub and SVN for version control, code reviews, and unit testing.
  • Experienced in working in all phases of the SDLC under both agile and waterfall methodologies.
  • Good understanding of Agile Scrum methodology, Test-Driven Development, and CI/CD.
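
A minimal sketch of the Kafka-to-HDFS streaming ingestion noted above, using Spark Structured Streaming in Scala; the broker address, topic, and HDFS paths are placeholders, and the spark-sql-kafka connector is assumed to be on the classpath:

    import org.apache.spark.sql.SparkSession

    object KafkaToHdfs {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("kafka-to-hdfs")
          .getOrCreate()

        // Subscribe to a Kafka topic and expose key/value as strings.
        val events = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092") // placeholder broker
          .option("subscribe", "events")                     // placeholder topic
          .load()
          .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

        // Persist the stream to HDFS as Parquet; the checkpoint directory
        // tracks progress so the job can restart without data loss.
        events.writeStream
          .format("parquet")
          .option("path", "hdfs:///data/events")
          .option("checkpointLocation", "hdfs:///checkpoints/events")
          .start()
          .awaitTermination()
      }
    }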
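A sketch of the partitioned and bucketed managed/external Hive table design mentioned above, issued through Spark SQL from Scala; database, table, and column names are illustrative:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("hive-table-design")
      .enableHiveSupport()
      .getOrCreate()

    // Managed table: partitioned by date so queries prune partitions,
    // bucketed by customer_id to speed up joins and sampling.
    spark.sql("""
      CREATE TABLE IF NOT EXISTS sales.transactions (
        txn_id STRING,
        customer_id STRING,
        amount DOUBLE
      )
      PARTITIONED BY (txn_date STRING)
      CLUSTERED BY (customer_id) INTO 32 BUCKETS
      STORED AS PARQUET
    """)

    // External table over files already landed in HDFS; dropping the
    // table leaves the underlying data in place.
    spark.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS sales.raw_transactions (line STRING)
      PARTITIONED BY (txn_date STRING)
      LOCATION 'hdfs:///landing/transactions'
    """)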

PROFESSIONAL EXPERIENCE

Big Data Developer

Confidential, Woonsocket, Rhode Island

Responsibilities:

  • Built scalable distributed data solutions using Hadoop.
  • Involved in importing data from various sources into HDFS using Sqoop, applying transformations with Hive and Apache Spark, and loading the results into Hive tables or AWS S3 buckets (a sketch of this flow follows the list).
  • Involved in moving data from various DB2 tables to AWS S3 buckets using the Sqoop process.
  • Processed raw data at scale on the Hadoop big data platform, loading disparate data sets from various environments.
  • Developed ETL data flows in Scala using Hadoop and Spark ecosystem components.
  • Led the development of large-scale, high-speed, and low-latency data solutions in the areas of large-scale data manipulation, long-term data storage, data warehousing, low-latency retrieval systems, and real-time reporting and analytics applications.
  • Implemented Spark applications using Scala for faster testing and processing of data.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, RDDs, and Spark on YARN.
  • Implemented advanced Spark procedures such as text analytics and processing using in-memory computing capabilities.
  • Enhanced and optimized product Spark code to aggregate, group, and run data mining tasks.
  • Developed Spark jobs for faster data processing and used Spark SQL for querying.
  • Implemented Spark best practices such as partitioning, caching, and checkpointing for faster processing (sketched after this list).
  • Wrote jobs to process unstructured data into structured data for analysis, pre-processing, fuzzy matching, and ingestion.
  • Importing and exporting data into HDFS and Hive using Sqoop.
  • Wrote business logic with Hive with the help of MAGELLAN for analytical credit bureau reporting.
  • Involved in the ingestion process to CORNERSTONE after data was cleaned and business logic applied.
  • Created various analytical reports using Hive and HiveQL in a MapReduce Hadoop environment.
  • Involved in designing various configurations of Hadoop and Hive for better performance.
  • Debugged Hive's backend MapReduce jobs to fix issues and tune performance.
  • Involved in scheduling all Jobs with the central event engine.
  • Involved in loading structured and semi-structured data into Spark clusters using Spark SQL and the DataFrames API.
  • Involved in developing code that generated various DataFrames based on business requirements and created temporary tables in Hive.
  • Utilized AWS CloudWatch to monitor environment instances for operational and performance metrics during load testing.
  • Experienced in writing build scripts using Maven and performing continuous integration with tools such as Bamboo.
  • Used JIRA for creating user stories and created branches in the Bitbucket repositories based on each story.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
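
A minimal sketch, in Scala, of the ingest-transform-publish flow described above, assuming Sqoop has already landed the source extracts on HDFS; paths, bucket, and table names are placeholders:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder()
      .appName("ingest-and-curate")
      .enableHiveSupport()
      .getOrCreate()

    // Raw extract landed on HDFS by the Sqoop import.
    val raw = spark.read.parquet("hdfs:///landing/accounts")

    // Apply the business transformations before publishing.
    val curated = raw
      .filter(col("status") === "ACTIVE")
      .withColumn("load_dt", current_date())

    // Register a temporary view for downstream HiveQL steps, then
    // publish to a Hive table and to an S3 bucket.
    curated.createOrReplaceTempView("accounts_curated")
    curated.write.mode("overwrite").saveAsTable("curated.accounts")
    curated.write.mode("overwrite").parquet("s3a://analytics-bucket/curated/accounts/")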
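The partitioning, caching, and checkpointing practices mentioned above, sketched with illustrative values; the partition count and column names are assumptions, not taken from the project:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    val spark = SparkSession.builder().appName("spark-tuning").getOrCreate()
    // Reliable location for the truncated lineage written by checkpoint().
    spark.sparkContext.setCheckpointDir("hdfs:///checkpoints/etl")

    val events = spark.read.parquet("hdfs:///data/events")

    // Repartition to a parallelism that matches the cluster (illustrative value)
    // and cache the result because several downstream aggregations reuse it.
    val shaped = events.repartition(200, col("account_id")).cache()

    // Checkpoint the expensive intermediate result so a failure does not
    // force recomputation of the full lineage.
    val stable = shaped.checkpoint()

    val byDay = stable.groupBy("event_date").count()
    byDay.show()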

Environment: Java, Scala, Spark, HDFS, MapReduce, YARN, Hive, Sqoop, Unix, Airflow Scheduler, Shell Scripts

Big Data Developer

Confidential, Herndon, VA

Responsibilities:

  • Built scalable distributed data solutions using Hadoop, Spark with Scala.
  • Installed and configured Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
  • Collaborated with Senior Engineer on configuring Kafka for streaming data.
  • Developed Spark scripts by using Scala shell commands as per the requirement.
  • Performed processing on large sets of structured, unstructured, and semi-structured data.
  • Created Kafka-based applications that monitor consumer lag within Apache Kafka clusters (a sketch follows this list).
  • Managed importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from Oracle into HDFS using Sqoop.
  • Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior (a sketch of this style of analysis follows the list).
  • Worked with the Spark ecosystem using Scala and Hive queries on different data formats such as text files and Parquet.
  • Implemented business logic by writing custom UDFs in Java alongside the built-in UDFs.
  • Responsible for migrating from Hadoop MapReduce to the Spark framework's in-memory distributed computing for real-time fraud detection.
  • Used Spark to cache data in memory.
  • Implemented batch processing of data sources using Apache Spark.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig and HiveQL.
  • Developed Pig UDFs to pre-process the data for analysis.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Developed predictive analytics using Apache Spark Scala APIs.
  • Provided cluster coordination services through ZooKeeper.
  • Used Apache Kafka for collecting, aggregating, and moving large amounts of data from application servers.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • As part of a POC, set up Amazon Web Services (AWS) to evaluate whether Hadoop was a feasible solution.
  • Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
  • Installed Oozie workflow engine to run multiple Hive and Pig jobs.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
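
A minimal sketch, in Scala, of the consumer-lag monitoring described above, using the Kafka AdminClient; the broker address and consumer group name are placeholders (assumes Scala 2.13 for CollectionConverters):

    import java.util.Properties
    import scala.jdk.CollectionConverters._
    import org.apache.kafka.clients.admin.{AdminClient, AdminClientConfig, OffsetSpec}

    object ConsumerLagCheck {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092")
        val admin = AdminClient.create(props)

        // Offsets the consumer group has committed, per partition.
        val committed = admin
          .listConsumerGroupOffsets("orders-consumer")
          .partitionsToOffsetAndMetadata()
          .get()
          .asScala

        // Latest (log-end) offsets for the same partitions.
        val latest = admin
          .listOffsets(committed.keys.map(tp => tp -> OffsetSpec.latest()).toMap.asJava)
          .all()
          .get()
          .asScala

        // Lag per partition = log-end offset minus committed offset.
        committed.foreach { case (tp, meta) =>
          println(s"$tp lag=${latest(tp).offset() - meta.offset()}")
        }
        admin.close()
      }
    }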
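A sketch, in Scala with Spark SQL, of the HiveQL-style customer-behavior analysis referenced above; paths, view, and table names are illustrative:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("customer-behavior")
      .enableHiveSupport()
      .getOrCreate()

    // Curated Parquet source; path is a placeholder.
    val orders = spark.read.parquet("hdfs:///curated/orders")
    orders.createOrReplaceTempView("orders")

    // HiveQL-style aggregation to study customer behavior.
    val topCustomers = spark.sql("""
      SELECT customer_id, COUNT(*) AS order_cnt, SUM(amount) AS total_spend
      FROM orders
      GROUP BY customer_id
      ORDER BY total_spend DESC
      LIMIT 100
    """)

    // Publish the results for the BI team.
    topCustomers.write.mode("overwrite").saveAsTable("analytics.top_customers")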

Environment: Hadoop, MapReduce, HDFS, Hive, Java, SQL, Cloudera Manager, Pig, Apache Sqoop, Spark, Oozie, HBase, AWS, PL/SQL, MySQL
