Hadoop Developer Resume

Renton, WA

SUMMARY

  • 8 years of experience in the IT industry across all phases of the Software Development Life Cycle (SDLC).
  • Extensive experience with mapping, analysis, transformation and support of application software.
  • Experience with Hadoop distributions such as Cloudera 5.3 (CDH5, CDH3) and Hortonworks.
  • Understanding of and experience working with cloud infrastructure services such as Microsoft Azure.
  • Experience with Azure big data technologies such as Azure Data Lake and HDInsight.
  • Provisioned and configured proof-of-concept (POC) environments for MapReduce, Hive, Oozie, Flume, HBase, and other major components of the Hadoop distributed system.
  • Developed Spark/Scala scripts for change data capture (CDC) and delta record processing between newly arrived data and existing data in HDFS and Blob Storage (see the first sketch after this list).
  • Experience in Implementing and automating models created in Spark, Scala, Hive.
  • Experience working with code repositories and continuous integration in Git.
  • Worked with multiple Databases including RDBMS Technologies and NoSQL.
  • Implemented partitioning, dynamic partitioning, and bucketing in Hive to compute data metrics (see the second sketch after this list).
  • Performed data analytics using Hive and Spark/Scala for the data architects on our team.
  • Experience creating Spark jobs and automating them using shell scripts.
  • Exported data from MongoDB and other sources to Azure Storage using Spark/Scala, and performed transformations on it using Hive, Pig, and Spark.
  • Worked with structured, semi-structured, and unstructured data formats.
  • Worked on Oozie workflow engine for job scheduling.
  • Developed UNIX shell scripts for creating reports from Hive data and automated them using the CRON job scheduler.
  • Experience using the Ambari user interface to monitor the Hadoop cluster.
  • Experience in monitoring Hadoop Log files.
  • Worked extensively on Spark with Scala for cluster-based analytics; built advanced analytical applications on top of Hadoop using Spark with Hive and SQL.
  • Experience with data visualization tools such as Power BI and Tableau.
  • Loaded data into Power BI via Azure Blob Storage and Spark clusters.
  • Responsible for generating actionable insights from complex data to drive real business results for various application teams.
  • Quick to learn new concepts; a hardworking Hadoop enthusiast.
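
A minimal sketch of the change-data-capture/delta merge described above, assuming Parquet inputs in HDFS and hypothetical key and load-timestamp columns (customer_id, load_ts); paths are placeholders:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

object DeltaMerge {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("cdc-delta-merge").getOrCreate()
    import spark.implicits._

    // Existing master data in HDFS and the newly arrived batch.
    val existing = spark.read.parquet("hdfs:///data/master/customers")
    val incoming = spark.read.parquet("hdfs:///data/incoming/customers")

    // Union both sets, rank rows per business key by load timestamp,
    // and keep only the newest version of each record.
    val w = Window.partitionBy($"customer_id").orderBy($"load_ts".desc)
    val merged = existing.unionByName(incoming)   // unionByName: Spark 2.3+
      .withColumn("rn", row_number().over(w))
      .filter($"rn" === 1)
      .drop("rn")

    // Write the reconciled snapshot to a new location, then swap it in.
    merged.write.mode("overwrite").parquet("hdfs:///data/master/customers_new")
    spark.stop()
  }
}
```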
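A second sketch, for the Hive dynamic partitioning and bucketing bullet, issued through a Hive-enabled SparkSession; table and column names are hypothetical:

```scala
// Assumes: val spark = SparkSession.builder().enableHiveSupport().getOrCreate()
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

// Partitioned by order_date, bucketed by customer_id into 32 buckets.
spark.sql("""
  CREATE TABLE IF NOT EXISTS sales_bucketed (
    order_id BIGINT, customer_id BIGINT, amount DOUBLE)
  PARTITIONED BY (order_date STRING)
  CLUSTERED BY (customer_id) INTO 32 BUCKETS
  STORED AS ORC
""")

// Dynamic partition insert: Hive routes each row to its order_date partition.
spark.sql("""
  INSERT OVERWRITE TABLE sales_bucketed PARTITION (order_date)
  SELECT order_id, customer_id, amount, order_date FROM sales_staging
""")
```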

TECHNICAL SKILLS

Hadoop Distributions: Cloudera, Hortonworks

Big Data Technologies: HDFS, MapReduce, Pig, Hive, Spark, YARN, Kafka, Flume, Sqoop, Impala, Oozie, ZooKeeper

RDBMS: MySQL, Oracle, Teradata, MS SQL Server, DB2

Programming languages: SQL, Java, Scala

NoSQL Databases: MongoDB, HBase

Development Tools: NetBeans, Eclipse, Git, Maven, IntelliJ

Virtual Machines: VMware, Virtual Box

OS: CentOS 5.5, UNIX, Red Hat Linux, Ubuntu

Cloud Environment: Microsoft Azure

BI Tools: Power BI, Tableau

PROFESSIONAL EXPERIENCE

Confidential, Renton, WA

Hadoop Developer

Responsibilities:

  • Worked with cloud infrastructure services on Microsoft Azure.
  • Provisioned HDInsight clusters of both the Hadoop and Spark cluster types.
  • Developed Spark applications in Scala and implemented an Apache Spark data processing project to handle data from various data sources.
  • Implemented and automated models created in Spark, Scala, Hive, etc.
  • Hands-on experience with Spark SQL jobs that load data into HDFS/Blob Storage instead of Sqoop, which improves performance (see the sketch after this list).
  • Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs and DataFrames.
  • Developed multiple POCs using Spark, deployed them on the YARN cluster, and compared the performance of Spark with Hive and MySQL.
  • Experience working with code repositories and continuous integration in Git.
  • Experience using the Ambari user interface to monitor the Hadoop cluster.
  • Experience in monitoring Hadoop Log files.
  • Developed UNIX shell scripts for creating reports from Hive data and automated them using the CRON job scheduler.
  • Developed Oozie coordinators to schedule Hive scripts and create data pipelines.
  • Experience with data visualizations tools like Power BI, Tableau.
  • Understanding of development and project methodologies.
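
A minimal sketch of loading a relational table through Spark SQL's JDBC reader and landing it on Blob Storage, as an alternative to Sqoop; the connection details, table, bounds, and paths are placeholders:

```scala
import org.apache.spark.sql.SparkSession

object JdbcToBlob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("jdbc-load").getOrCreate()

    // Read the source table over JDBC, split into 8 parallel partitions.
    val orders = spark.read.format("jdbc")
      .option("url", "jdbc:sqlserver://dbhost:1433;databaseName=sales")
      .option("dbtable", "dbo.orders")
      .option("user", "etl_user")
      .option("password", sys.env("DB_PASSWORD"))
      .option("partitionColumn", "order_id")
      .option("lowerBound", "1")
      .option("upperBound", "10000000")
      .option("numPartitions", "8")
      .load()

    // Land the data as Parquet on Azure Blob Storage (a WASB path here).
    orders.write.mode("overwrite")
      .parquet("wasb://data@myaccount.blob.core.windows.net/raw/orders")
    spark.stop()
  }
}
```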

Confidential, Denver, CO

Hadoop Developer

Responsibilities:

  • Developed Spark applications in Scala and Java and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
  • Used Spark to improve performance and optimize existing algorithms in Hadoop, working with SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN (see the sketch after this section).
  • Created Hive tables, loaded data, and analyzed it using HiveQL queries in Hue.
  • Used HBase alongside Hive when real-time, low-latency queries were required.
  • Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce.
  • Worked on optimizing and tuning MapReduce jobs to achieve optimal performance.
  • Used Sqoop to import data from IBM DB2 into HDFS on a regular basis.
  • Developed UNIX shell scripts for creating reports from Hive data and automated them using the CRON job scheduler.
  • Provided hands-on support for the UNIX, storage, and backup tracks in day-to-day activity, troubleshooting problems at different locations for multiple clients.
  • Used Flume to collect, aggregate, and store web log data from sources such as web servers and mobile and network devices, and pushed it to HDFS.
  • Developed Oozie coordinators to schedule Hive scripts to create Data pipelines.
  • Worked with different teams to ensure data quality and availability.

Environment: Hadoop, HDFS, MapReduce, Hive, Sqoop, Oozie, Spark, Kafka, NoSQL, HBase, UNIX Shell Scripting, Linux, Java (JDK SE 6, 7), Eclipse.
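
A minimal sketch of the pair-RDD style of optimization noted above: reduceByKey aggregates map-side before the shuffle, the usual replacement for groupByKey in per-key aggregations. The log path and record layout are hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PairRddAggregation {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("pair-rdd-agg"))

    // Hypothetical input: tab-separated (userId, bytes) web-log records.
    val records = sc.textFile("hdfs:///logs/weblog")
      .map(_.split("\t"))
      .collect { case Array(user, bytes) => (user, bytes.toLong) }

    // reduceByKey combines values within each partition before shuffling,
    // cutting network traffic compared with groupByKey-then-sum.
    val bytesPerUser = records.reduceByKey(_ + _)

    bytesPerUser.saveAsTextFile("hdfs:///reports/bytes_per_user")
    sc.stop()
  }
}
```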

Confidential, Sacramento, CA

Hadoop Developer

Responsibilities:

  • Worked with structured and semi-structured data of approximately 100 TB.
  • Worked on Kafka to bring data in from source systems and land it in HDFS for filtering (see the sketch after this section).
  • Extensively used Hive/HQL to query or search for particular strings in Hive tables in HDFS.
  • Experience developing custom UDFs in Python to extend Hive functionality.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Cross-examined data loaded into Hive tables against the source data in MySQL.
  • Worked on Sqoop to retrieve data from RDBMS into HDFS and Hive as needed.
  • Wrote UNIX shell scripts to automate the required jobs to run at any time.

Environment: Hadoop, HDFS, Hive, Python, Spark, SQL, Teradata, YARN, Sqoop, Kafka, UNIX Shell Scripting.
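
A minimal sketch of the Kafka-to-HDFS landing described above, using the Spark Streaming Kafka 0-10 direct stream; the broker, group id, topic, and landing path are placeholders:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("kafka-to-hdfs"), Seconds(60))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "hdfs-landing",
      "auto.offset.reset"  -> "latest")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams))

    // Write each micro-batch under a timestamped HDFS path for later filtering.
    stream.map(_.value).saveAsTextFiles("hdfs:///landing/events/batch")

    ssc.start()
    ssc.awaitTermination()
  }
}
```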

Confidential

Hadoop Developer

Responsibilities:

  • Implemented complex MapReduce programs in Java to perform map-side joins using the distributed cache (see the sketch after this section).
  • Developed MapReduce programs and Hive queries to analyze shipping patterns and the customer satisfaction index over the history of the data.
  • Experience writing Pig user-defined functions and Hive UDFs.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, Pig, and Sqoop.
  • Created MapReduce programs to handle semi-structured and unstructured data such as XML, JSON, and Avro data files, and sequence files for log files.
  • Used Sqoop to import data from RDBMS into HDFS to ensure data reliability.
  • Developed Pig scripts to perform transformations, event joins, bot-traffic filtering, and some pre-aggregations before storing the data in HDFS.

Environment: Hadoop, HDFS, MapReduce, Sqoop, Hive, Pig, Oozie, HBase, Java, Flume 1.2.0, Eclipse IDE, CDH3.
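
A minimal sketch of the distributed-cache map-side join pattern (the original work was in Java; this is the same Hadoop API sketched in Scala for consistency). The small lookup table ships to every mapper, so the join runs map-only; paths and the tab-separated layout are hypothetical:

```scala
import java.net.URI
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
import scala.collection.mutable
import scala.io.Source

class JoinMapper extends Mapper[LongWritable, Text, Text, Text] {
  private val lookup = mutable.Map[String, String]()

  // Load the cached lookup file once per mapper; "lookup" is the symlink
  // created by the "#lookup" fragment on the cache URI below.
  override def setup(ctx: Mapper[LongWritable, Text, Text, Text]#Context): Unit =
    for (line <- Source.fromFile("lookup").getLines()) {
      val Array(k, v) = line.split("\t", 2)
      lookup(k) = v
    }

  // Emit only records whose key matches the in-memory lookup table.
  override def map(key: LongWritable, value: Text,
                   ctx: Mapper[LongWritable, Text, Text, Text]#Context): Unit = {
    val Array(k, rest) = value.toString.split("\t", 2)
    lookup.get(k).foreach(v => ctx.write(new Text(k), new Text(rest + "\t" + v)))
  }
}

object MapSideJoin {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance()
    job.setJarByClass(classOf[JoinMapper])
    job.setMapperClass(classOf[JoinMapper])
    job.setNumReduceTasks(0)                        // map-only join
    job.setOutputKeyClass(classOf[Text])
    job.setOutputValueClass(classOf[Text])
    job.addCacheFile(new URI("hdfs:///ref/lookup.tsv#lookup"))
    FileInputFormat.addInputPath(job, new Path(args(0)))
    FileOutputFormat.setOutputPath(job, new Path(args(1)))
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}
```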

Confidential

SQL Developer

Responsibilities:

  • Involved in all phases of the Software Development Life Cycle (SDLC) and produced UML diagrams such as use case, class, and sequence diagrams to represent the detailed design phase.
  • Designed database tables, indexes, and constraints.
  • Loaded files into the system using MS SQL Server 2008.
  • Created and managed structured objects such as tables, views, stored procedures, and triggers.
  • Managed employee records as needed to keep the system up to date.
  • Combined data from multiple tables using inner joins and left outer joins, and created sub-queries for complex queries involving multiple tables (see the sketch after this section).
  • Developed and unit-tested extract, transform, load (ETL) programs using SQL.

Environment: MS SQL Server 2008 and SQL Server 2008 R2, UNIX shell scripts, Eclipse, Java/J2EE, Teradata.
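
A minimal sketch of the join/sub-query pattern from this role, issued over JDBC from Scala to keep the examples in one language; the connection string, schema, and column names are hypothetical:

```scala
import java.sql.DriverManager

object EmployeeReport {
  def main(args: Array[String]): Unit = {
    val conn = DriverManager.getConnection(
      "jdbc:sqlserver://dbhost:1433;databaseName=hr",
      "report_user", sys.env("DB_PASSWORD"))

    // Inner join attaches the department; the left outer join keeps
    // employees with no manager; the sub-query filters to above-average pay.
    val sql =
      """SELECT e.emp_id, e.name, d.dept_name, m.name AS manager
        |FROM employees e
        |INNER JOIN departments d ON e.dept_id = d.dept_id
        |LEFT OUTER JOIN employees m ON e.manager_id = m.emp_id
        |WHERE e.salary > (SELECT AVG(salary) FROM employees)""".stripMargin

    val rs = conn.createStatement().executeQuery(sql)
    while (rs.next())
      println(s"${rs.getString("name")}\t${rs.getString("dept_name")}\t${rs.getString("manager")}")
    conn.close()
  }
}
```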

Confidential

SQL Developer

Responsibilities:

  • Performed extensive data extraction from the web and other sources, and handled data preparation (missing values, formatting, transformations) using SSIS.
  • Wrote stored procedures and SQL scripts in both SQL Server and Oracle to implement business rules for various clients (see the sketch after this section).
  • Designed T-SQL scripts to identify long-running queries and blocking sessions.
  • Wrote and debugged T-SQL stored procedures, views, and user-defined functions.
  • Performed data migration (import and export via BCP) from text files to SQL Server.
  • Created database objects such as tables, views, indexes, stored procedures, triggers, and user-defined functions.
  • Customized stored procedures and database triggers to meet changing business rules.

Environment: SQL Server 2005, SQL Server 2000, Oracle 9i, Visual Basic, Excel, Tableau.
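
A minimal sketch of invoking one of these stored procedures from Scala over JDBC; the procedure name, parameters, and connection details are hypothetical:

```scala
import java.sql.{DriverManager, Types}

object UpdateCustomerStatus {
  def main(args: Array[String]): Unit = {
    val conn = DriverManager.getConnection(
      "jdbc:sqlserver://dbhost:1433;databaseName=sales",
      "app_user", sys.env("DB_PASSWORD"))

    // CallableStatement runs a T-SQL stored procedure with an OUT parameter.
    val call = conn.prepareCall("{call usp_update_customer_status(?, ?, ?)}")
    call.setLong(1, 42L)                        // @customer_id   (hypothetical)
    call.setString(2, "ACTIVE")                 // @new_status    (hypothetical)
    call.registerOutParameter(3, Types.INTEGER) // @rows_affected (OUT)
    call.execute()
    println(s"rows affected: ${call.getInt(3)}")
    conn.close()
  }
}
```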
