
Big Data Developer Resume


Sunnyvale, CA

SUMMARY

  • An engineering professional with 2 years of experience developing and executing solutions in the Hadoop ecosystem using Spark, Scala, Hive, HBase, Pig, Sqoop and Java for complex business problems that involve real-time analysis and processing of terabytes of structured, semi-structured and unstructured data.
  • In-depth knowledge of Hadoop Architecture and components such as HDFS, Map Reduce (YARN), Job Tracker, Task Tracker, Name Node and Data Node.
  • Extensive hands-on experience in cleansing, analyzing, processing and optimizing large data sets efficiently using the Hadoop ecosystem, including Spark, Scala, Hive, Pig, Sqoop and Oozie.
  • Procedural knowledge in cleansing and analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
  • Strong knowledge of Pig and Hive’s analytical functions, extending Hive and Pig core functionality by writing custom UDFs.
  • Hands-on experience in writing ad-hoc queries for migrating data from HDFS to Hive and analyzing the data using HiveQL.
  • Excellent understanding and knowledge of NoSQL databases like HBase and messaging systems like Kafka.
  • Experienced in designing both managed and external tables in Hive to optimize performance.
  • Used Pig to do transformations, event joins, filter traffic and some pre-aggregations before storing the data onto HDFS.
  • Experience with Oozie Workflow Engine in running workflow jobs with actions that run Hadoop MapReduce and Pig jobs.
  • Experienced with different file formats like CSV, Text files, Sequence files, ORC, XML and JSON.
  • Hands-on experience in writing Resilient Distributed Datasets (RDDs), Spark SQL, Spark Context and DataFrames in Spark using Scala (see the sketch after this list).
  • Strong knowledge of Entity-Relationship, Facts and Dimensions tables, slowly changing dimensions and Dimensional Modeling (Star Schema and Snow Flake Schema).
  • Familiar with data warehousing and ETL tools like Informatica.
  • Experience in working under Waterfall and Agile methodologies.
  • Good knowledge of the continuous integration tool Jenkins.
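To illustrate the Spark experience summarized above, here is a minimal Scala sketch of the RDD, DataFrame and Spark SQL flow; the application name, HDFS paths and column names are hypothetical placeholders rather than details from the projects below.

```scala
import org.apache.spark.sql.SparkSession

object ClickstreamSummary {
  def main(args: Array[String]): Unit = {
    // SparkSession wraps the SparkContext and the SQL context
    val spark = SparkSession.builder()
      .appName("ClickstreamSummary")
      .getOrCreate()
    import spark.implicits._

    // Load raw text from HDFS as an RDD and parse it into (user, page) pairs
    val raw = spark.sparkContext.textFile("hdfs:///data/raw/clicks/*.txt")
    val parsed = raw.map(_.split("\t"))
      .filter(_.length >= 2)
      .map(f => (f(0), f(1)))

    // Convert the RDD to a DataFrame and expose it to Spark SQL
    val clicks = parsed.toDF("user_id", "page")
    clicks.createOrReplaceTempView("clicks")

    // Aggregate with Spark SQL and write the result back to HDFS as ORC
    val pageCounts = spark.sql(
      "SELECT page, COUNT(*) AS hits FROM clicks GROUP BY page")
    pageCounts.write.mode("overwrite").orc("hdfs:///data/curated/page_hits")

    spark.stop()
  }
}
```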

TECHNICAL SKILLS

Big Data Ecosystem: Hadoop, MapReduce, HDFS, Hive, Pig, ZooKeeper, Sqoop, Spark, Oozie

Programming Languages: Core Java, Scala, SQL

Scripting/Markup: Shell, Pig Latin, XML, HTML, Log4j

ETL Tools: Informatica Power Center

Messaging Systems: Kafka

Methodologies: Agile, Waterfall

Application Servers: Apache Tomcat, JBoss

Platforms: Windows, Linux, Ubuntu

Databases: HBase, Oracle 11g/10g/9i, MS SQL Server, MySQL, Teradata

Tools: Eclipse, Maven, Ant, Jenkins, Git, SVN, CVS

PROFESSIONAL EXPERIENCE

Confidential, Sunnyvale, CA

Big Data Developer

Responsibilities:

  • Worked on analyzing the Hadoop cluster and different big data analytic tools including Pig, Hive, Spark, the HBase database and Sqoop.
  • Imported and exported data between different relational data sources, such as Oracle and SQL Server, and HDFS using Sqoop.
  • Developed Pig Latin scripts to extract the data from the web server output files to load in HDFS.
  • Used the Spark SQL API to create temporary tables over HDFS data for querying, performed computations on Spark RDDs and stored the results back into the HDFS file system.
  • Worked on Spark SQL and DataFrames for faster execution of Hive queries using Spark.
  • Implemented partitioning, dynamic partitions and bucketing in Hive to improve performance and organize data logically (see the sketch after this list).
  • Optimized Hive queries to extract the customer information from HDFS.
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
  • Participated in daily SCRUM meetings and gave the daily status report.
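A minimal sketch of the Hive partitioning approach mentioned above, assuming Spark with Hive support; the customer_events table, its columns and the staging table are hypothetical, and bucketing would be added to the DDL with a CLUSTERED BY ... INTO N BUCKETS clause.

```scala
import org.apache.spark.sql.SparkSession

object CustomerEventsLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CustomerEventsLoad")
      .enableHiveSupport()   // run HiveQL DDL and inserts through Spark
      .getOrCreate()

    // Allow dynamic partitions so the INSERT can route rows by event_date
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    // Partitioned target table (table name and columns are hypothetical)
    spark.sql(
      """CREATE TABLE IF NOT EXISTS customer_events (
        |  customer_id STRING,
        |  event_type  STRING,
        |  amount      DOUBLE)
        |PARTITIONED BY (event_date STRING)
        |STORED AS ORC""".stripMargin)

    // Dynamic-partition insert: each distinct event_date in the staging
    // table lands in its own partition directory
    spark.sql(
      """INSERT OVERWRITE TABLE customer_events PARTITION (event_date)
        |SELECT customer_id, event_type, amount, event_date
        |FROM customer_events_stage""".stripMargin)

    spark.stop()
  }
}
```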

Environment: Core Java, Cloudera, Apache Hadoop, Apache Pig, Hive, Oozie, Sqoop, Spark, Scala, UNIX, Oracle, HBase, Eclipse, Git, Agile methodology

Confidential, San Ramon, CA

Hadoop Developer

Responsibilities:

  • Developed Sqoop scripts to import datasets from different RDBMS servers to HDFS and Hive and to export datasets back to the RDBMS servers on a daily basis.
  • Involved in loading data from UNIX file system to HDFS.
  • Wrote Apache Pig scripts for loading, filtering and storing data in HDFS, to be consumed later by Hive or MapReduce jobs.
  • Created Hive tables and wrote multiple Hive queries to load them for analyzing market data coming from distinct sources; implemented partitions, dynamic partitions and bucketing on Hive tables.
  • Performed complex Joins on the tables in Hive for generating the reports.
  • Handled performance tuning on Hive queries and Pig queries.
  • Scheduled and executed workflows in Oozie to run Hive and Pig jobs.
  • Worked with various formats of files like delimited text files, Apache log files, JSON files, XML Files.
  • Used Zookeeper to provide coordination services to the cluster.
  • Experienced in managing and reviewing Hadoop log files.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Experienced with Spark Context, Spark SQL and RDDs.
  • Developed Spark code using Scala and Spark SQL for faster testing and processing of data (see the sketch after this list).
  • Mentored test team for writing Hive Queries.
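As a sketch of the Spark-on-Hive processing described above, the snippet below reads Sqoop-loaded Hive tables, joins them and writes a report table back to Hive; the market database, table names and columns are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

object MarketDataReport {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("MarketDataReport")
      .enableHiveSupport()   // read the Sqoop-loaded Hive tables directly
      .getOrCreate()

    // Hive tables populated by the daily Sqoop imports (names are hypothetical)
    val trades  = spark.table("market.trades")
    val symbols = spark.table("market.symbols")

    // The same join the HiveQL report expresses, executed on the Spark engine
    val report = trades.join(symbols, Seq("symbol_id"))
      .groupBy("exchange", "trade_date")
      .agg(sum("volume").as("total_volume"))

    // Persist the report back to Hive for downstream consumers
    report.write.mode("overwrite").saveAsTable("market.daily_volume")
    spark.stop()
  }
}
```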

Environment: Hadoop, MapReduce, HDFS, Informatica Power Center, Pig, Hive, Zookeeper, Oozie, Oracle 11g, Windows, UNIX, Putty, WinSCP, SQL, Spark, Scala, Core Java, Maven, distributions like Cloudera, SVN, HBase, Eclipse, Kafka, Agile methodology

Confidential, Chicago, IL

Hadoop Developer

Responsibilities:

  • Involved in various phases of development; analyzed and developed the system following the Agile Scrum methodology.
  • Responsible for data ingestion from RDBMS to Hadoop using Sqoop; performed data cleansing and transformations and used the Pig Piggybank library for further data analytics.
  • Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Used Pig Latin to analyze datasets and perform transformation according to business requirements.
  • Used Pig as ETL tool to do transformations, event joins, filters and some pre-aggregations before storing the data onto HDFS.
  • Created Hive external tables, loaded the data into them and queried it using HiveQL (see the sketch after this list).
  • Used Hive on the RDBMS data for analysis and stored the results back to the database.
  • Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
  • Hands-on experience in analyzing log files for Hadoop and ecosystem services and finding root causes.
  • Configured various property files like core-site.xml, hdfs-site.xml, mapred-site.xml based upon the job requirement.
  • Installed and configured Hadoop ecosystem components such as Hive, Pig and Sqoop.
  • Participated in daily SCRUM meetings and gave the daily status report.
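A minimal sketch of the Hive external table pattern referenced above, issued through Spark with Hive support; the web_logs table, its columns and the HDFS location are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object ExternalTableSetup {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ExternalTableSetup")
      .enableHiveSupport()
      .getOrCreate()

    // External table keeps the data files in place on HDFS; dropping the
    // table removes only the metadata (location and columns are hypothetical)
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
        |  ip     STRING,
        |  ts     STRING,
        |  url    STRING,
        |  status INT)
        |ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
        |STORED AS TEXTFILE
        |LOCATION 'hdfs:///data/raw/web_logs'""".stripMargin)

    // Query the external table with HiveQL through Spark
    spark.sql(
      "SELECT status, COUNT(*) AS requests FROM web_logs GROUP BY status")
      .show()

    spark.stop()
  }
}
```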

Environment: Java, Eclipse IDE, SQL Developer, WinSCP, Putty, SSH, Unix script, Cloudera Distribution for Hadoop, HDFS, MapReduce, Pig, Hive, Sqoop, Maven, SVN, Agile Methodology, Spark RDD, Mapping, JSON, CSV
