Hadoop/Big Data Developer Resume

Alpharetta, GA

PROFESSIONAL SUMMARY:

  • Overall 5+ years of IT experience in Big Data technologies, Spark, and database development.
  • Strong experience working with HDFS, MapReduce, Spark, Hive, Sqoop, Flume, Kafka, Oozie, Pig and HBase.
  • Experience working with DataFrames, RDD, Spark SQL, Spark Streaming, APIs, System Architecture, and Infrastructure Planning.
  • Experience using Hadoop distributions such as Cloudera and Hortonworks.
  • Worked extensively on integrating Hortonworks with the BI tool Tableau.
  • Expertise in developing solutions to analyze large data sets efficiently.
  • Integrated Kafka with Spark Streaming for real-time data processing.
  • Built real-time Big Data solutions using HBase, handling millions of records.
  • Implemented Hadoop-based data warehouses and integrated Hadoop with Enterprise Data Warehouse systems.
  • Experience with the Hadoop framework, the Hadoop Distributed File System (HDFS), and parallel processing implementation.
  • Skilled in writing MapReduce jobs using Pig and Hive.
  • Built and supported large-scale Hadoop environments, including design, configuration, installation, performance tuning, and monitoring.
  • Experience with NoSQL and SQL databases.
  • Involved in the creation of databases, tables, stored procedures, triggers, and user-defined functions, and in the installation and configuration of SQL Server.
  • Extensive hands-on experience with Linux and Windows.

TECHNICAL SKILLS:

HADOOP ECOSYSTEM: HDFS, MapReduce, YARN (Cloudera, Hortonworks)

DATA ANALYTICS: Hadoop, Hive, Spark, Pig and Tableau

BIG DATA TOOLS: Sqoop, Oozie, HBase, and Flume

PROGRAMMING LANGUAGES: Python, Scala and Java

DATABASE TOOLS: Oracle, MySQL, MS SQL Server

OPERATING SYSTEMS: Windows, Unix, Linux

PROFESSIONAL EXPERIENCE:

Confidential, Alpharetta, GA

Hadoop/Big Data Developer

Responsibilities:

  • Work with the Hadoop ecosystem and implement Spark using Scala, utilizing DataFrames and the Spark SQL API for faster data processing.
  • Develop Hive queries to process the data, and implement partitions and buckets in Hive.
  • Develop RDDs/DataFrames in Spark and apply several transformations to load data from Hadoop data lakes.
  • Provide proofs of concept converting file data into Parquet format to improve query processing in Hive.
  • Develop Hive and Pig scripts for joining the raw data with the lookup data and for aggregation operations as per the business requirements.
  • Filter and clean data using Scala code and SQL queries.
  • Install the Oozie workflow engine to run multiple MapReduce programs that run independently based on time and data.
  • Import and export data into HDFS and Hive using Sqoop.
  • Work with Flume to load the log data from multiple sources directly into HDFS.

Environment: Hadoop, HDFS, Spark, Hive, Kafka, JSON, Linux, HBase, Python, Parquet, Hortonworks, Scala, Sqoop, Flume, SQL.

Confidential, Sandy Springs, GA

Pyspark / Hadoop Developer

Responsibilities:

  • Developed Spark programs with Python and applied principles of functional programming to process complex structured and unstructured data sets.
  • Analyzed SQL scripts and designed solutions to implement them using PySpark.
  • Developed data processing tasks using PySpark, such as reading data from external sources, merging data, performing data enrichment, and loading into target data destinations.
  • Used JSON and XML SerDes for serialization and deserialization to load JSON and XML data into Hive tables.
  • Used Spark SQL to load JSON data, create a SchemaRDD, and load it into Hive tables; handled structured data using Spark SQL.
  • Set up HBase and stored data in HBase for use in analysis.
  • Converted MapReduce programs into Spark transformations using Spark RDDs in PySpark.
  • Implemented Spark using the PySpark API, utilizing DataFrames and the Spark SQL API for faster data processing.

Environment: Hadoop, HDFS, Spark, Spark Core, Spark Streaming, YARN, Hive, Sqoop, ZooKeeper, Flume, Kafka, HBase, Python, SQL scripting, Kerberos, Linux shell scripting, JSON, Parquet, Hortonworks.

Confidential, Phoenix, AZ

Hadoop/Big Data Developer

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs for data cleaning and preprocessing.
  • Developed different components of the system, such as Hadoop processes that involve MapReduce and Hive.
  • Worked with Hive on large volumes of log data to perform trend analysis of user behavior across various online modules.
  • Installed the Oozie workflow engine to run multiple jobs.
  • Worked on sequence files, bucketing, and partitioning for Hive performance enhancement and storage.
  • Used Sqoop to efficiently transfer data between databases and HDFS, and used Flume to stream the log data from servers.
  • Involved in HDFS maintenance and loading of structured and unstructured data.
  • Debugged and identified issues reported by QA with Hadoop jobs by configuring them to run against the local file system.

Environment: Hadoop, Cloudera, Linux, CentOS, MapReduce, HBase, Sqoop, Flume, HDFS, Python, Hive, SQL.

Confidential, Bellevue, WA

Big Data Developer

Responsibilities:

  • Developed MapReduce programs that work seamlessly on Hadoop clusters.
  • Worked with SQL, NoSQL, data warehousing, and database administration.
  • Designed web services for swift data tracking and high-speed data querying.
  • Tested software prototypes, proposed standards, and smoothly transferred them to operations.
  • Translated complex functional and technical requirements into detailed designs.
  • Performed analysis of vast data stores to uncover insights.
  • Maintained security and data privacy.
  • Managed and deployed HBase.
  • Worked on cluster coordination services through ZooKeeper.
  • Ran Hadoop streaming jobs to process terabytes of XML-format data.
  • Created and maintained technical documentation for launching clusters and for executing Hive queries and building UDFs.
  • Worked with BI tools such as Tableau to create dashboards, including daily, weekly, and monthly reports, using Tableau Desktop, and published them to the HDFS cluster.

Environment: Hadoop, HDFS, MapReduce, Hive, HBase, Oozie, ZooKeeper, Tableau, Java, JSON, Linux, CentOS, Cloudera, Sqoop, Flume, SQL.

Confidential

SQL Developer

Responsibilities:

  • Gathered business and user requirements.
  • Involved in the creation of databases, tables, stored procedures, triggers, and user-defined functions, and in the installation and configuration of SQL Server.
  • Backed up and restored SQL Server databases.
  • Designed logical models and architecture.
  • Collected data through the design of survey questionnaires.
  • Optimized database systems for performance efficiency.
  • Performed data mapping between the source and the destination, and documented it in the data mapping spreadsheet.
  • Supervised data entry into the SQL Server database through application interfaces, and performed debugging.

Environment: Oracle 10g, SQL Developer, PL/SQL, shell scripts, Oracle Forms, SQL*Loader, WebFOCUS reporting, triggers, Wise Package Studio 7.0.