Hadoop Developer Resume
Renton, WA
SUMMARY
- 8 years of experience in the IT industry across all phases of the Software Development Life Cycle (SDLC).
- Extensive experience with mapping, analysis, transformation and support of application software.
- Experience with Hadoop distributions such as Cloudera 5.3 (CDH5, CDH3) and Hortonworks.
- Understanding of and experience with cloud infrastructure services such as Microsoft Azure.
- Experience with Azure big data technologies such as Azure Data Lake and HDInsight.
- Provisioned and configured proof-of-concept (POC) environments for MapReduce, Hive, Oozie, Flume, HBase and other major components of the Hadoop distributed system.
- Involved in developing Spark/Scala scripts for change data capture (CDC) and delta record processing between newly arrived data and existing data in HDFS and Blob Storage (see the first sketch after this summary).
- Experience implementing and automating models created in Spark, Scala and Hive.
- Experience working with code repositories and continuous integration in Git.
- Worked with multiple databases, including RDBMS technologies and NoSQL stores.
- Implemented partitioning, dynamic partitioning and bucketing in Hive to compute data metrics (see the second sketch after this summary).
- Performed data analytics using Hive and Spark/Scala for the data architects on our team.
- Experience creating Spark jobs and automating them with shell scripts.
- Experience exporting data from MongoDB and other sources to Azure Storage using Spark/Scala, and transforming it with Hive, Pig and Spark.
- Worked with structured, semi-structured and unstructured data formats.
- Worked on Oozie workflow engine for job scheduling.
- Developed UNIX shell scripts for creating reports from Hive data and automated them using the CRON job scheduler.
- Experience using the Ambari UI to monitor the Hadoop cluster.
- Experience monitoring Hadoop log files.
- Extensively worked on Spark with Scala on the cluster for analytics; built advanced analytical applications on top of Hadoop using Spark with Hive and SQL.
- Experience with data visualization tools such as Power BI and Tableau.
- Loaded data into Power BI via Azure Blob Storage and Spark clusters.
- Responsible for generating actionable insights from complex data to drive real business results for various applications teams.
- Quick to learn new concepts; hardworking and a Hadoop enthusiast.
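A minimal Spark/Scala sketch of the change-data-capture and delta-merge work described in this summary. The paths and the customer_id key are hypothetical placeholders, and both snapshots are assumed to share one schema:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    object DeltaMerge {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("cdc-delta-merge").getOrCreate()

        // Current snapshot and the newly arrived batch (placeholder paths).
        val existing = spark.read.parquet("hdfs:///data/customers/current")
        val arrived  = spark.read.parquet("hdfs:///data/customers/incoming")

        // Keep existing rows whose key is not superseded by the new batch,
        // then union in the arrived records: an upsert keyed on customer_id.
        val merged = existing
          .join(arrived.select(col("customer_id")), Seq("customer_id"), "left_anti")
          .unionByName(arrived)

        merged.write.mode("overwrite").parquet("hdfs:///data/customers/merged")
        spark.stop()
      }
    }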
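And a sketch of the Hive dynamic-partitioning bullet, issued as HiveQL through spark.sql; the events tables and columns are invented for illustration, and bucketing would add a CLUSTERED BY (user_id) INTO 32 BUCKETS clause to the same DDL:

    import org.apache.spark.sql.SparkSession

    object HiveDynamicPartitions {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hive-dynamic-partitions")
          .enableHiveSupport()
          .getOrCreate()

        // Both settings are required for dynamic-partition inserts.
        spark.sql("SET hive.exec.dynamic.partition=true")
        spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

        spark.sql("""CREATE TABLE IF NOT EXISTS events_by_day
                     (user_id BIGINT, action STRING)
                     PARTITIONED BY (event_date STRING) STORED AS ORC""")

        // Hive derives each row's partition from the trailing SELECT column.
        spark.sql("""INSERT OVERWRITE TABLE events_by_day PARTITION (event_date)
                     SELECT user_id, action, event_date FROM events_raw""")

        spark.stop()
      }
    }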
TECHNICAL SKILLS
Hadoop Distributions: Cloudera, Hortonworks
Big Data Technologies: HDFS, MapReduce, Pig, Hive, Spark, YARN, Kafka, Flume, Sqoop, Impala, Oozie, Zookeeper
RDBMS: MySQL, Oracle, Teradata, MSSQL Server, DB2
Programming languages: SQL, Java, Scala
NoSQL Databases: MongoDB, HBase
Development Tools: NetBeans, Eclipse, Git, Maven, IntelliJ
Virtual Machines: VMware, VirtualBox
OS: CentOS 5.5, Unix, Red Hat Linux, Ubuntu
Cloud Environment: Microsoft Azure
BI Tools: Power BI, Tableau
PROFESSIONAL EXPERIENCE
Confidential, Renton, WA
Hadoop Developer
Responsibilities:
- Worked with cloud infrastructure services on Microsoft Azure.
- Provisioned HDInsight clusters of both the Hadoop and Spark cluster types.
- Developed Spark applications in Scala and implemented an Apache Spark data processing project to handle data from various data sources.
- Implemented and automated models created in Spark, Scala, Hive, etc.
- Hands-on experience writing Spark SQL jobs to load data into HDFS/Blob Storage in place of Sqoop imports, which improved performance.
- Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and DataFrames (sketched after this list).
- Developed multiple POCs using Spark and deployed on the Yarn cluster, compared the performance of Spark, with HIVE and MySQL.
- Experience working with code repositories and continuous integration in Git.
- Used the Ambari UI to monitor the Hadoop cluster.
- Monitored Hadoop log files.
- Developed UNIX shell scripts for creating reports from Hive data and automated them using the CRON job scheduler.
- Developed Oozie coordinators to schedule Hive scripts to create Data pipeline.
- Worked with data visualization tools such as Power BI and Tableau.
- Understanding of development and project methodologies.
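A minimal sketch of the Hive-to-DataFrame conversion noted above, landing the result in Azure Blob Storage through the wasbs:// connector; the sales table, its columns and the storage account are assumptions for illustration:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{avg, col}

    object HiveToDataFrame {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hive-to-dataframe")
          .enableHiveSupport()
          .getOrCreate()

        // The HiveQL form of a report...
        val viaSql = spark.sql(
          "SELECT region, AVG(amount) AS avg_amount FROM sales GROUP BY region")
        viaSql.show(5)

        // ...and the same logic as DataFrame transformations.
        val viaDf = spark.table("sales")
          .groupBy(col("region"))
          .agg(avg(col("amount")).alias("avg_amount"))

        // Land the result in Blob Storage (container/account are placeholders).
        viaDf.write.mode("overwrite")
          .parquet("wasbs://reports@myaccount.blob.core.windows.net/sales_by_region")

        spark.stop()
      }
    }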
Confidential, Denver, CO
Hadoop Developer
Responsibilities:
- Developed Spark applications in Scala and Java and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
- Worked with Spark to improve performance and optimize existing algorithms in Hadoop using Spark Context, Spark-SQL, DataFrames, pair RDDs and Spark on YARN (see the sketch after this section).
- Created Hive tables, loaded data and analyzed data using Hive queries and HiveQL in HUE.
- Used HBase alongside Hive when real-time, low-latency queries were required.
- Performed transformations, cleaning and filtering on imported data using Hive and MapReduce.
- Worked on optimizing and tuning MapReduce jobs to achieve optimal performance.
- Used Sqoop to import data from IBM DB2 into HDFS on a regular basis.
- Developed UNIX shell scripts for creating reports from Hive data and automated them using the CRON job scheduler.
- Provided hands-on support for the UNIX, storage and backup track in day-to-day activity, troubleshooting problems across locations for multiple clients.
- Used Flume to collect and aggregate web log data from sources such as web servers and mobile and network devices, and pushed it to HDFS.
- Developed Oozie coordinators to schedule Hive scripts to create Data pipelines.
- Worked with different teams to ensure data quality and availability.
Environment: Hadoop, HDFS, MapReduce, Hive, Sqoop, Oozie, Spark, Kafka, NoSQL, HBase, UNIX Shell Scripting, Linux, Java (JDK SE 6, 7), Eclipse.
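A small sketch of the pair-RDD style optimization mentioned in this section: aggregating bytes per user with reduceByKey, which combines on the map side before the shuffle (unlike groupByKey). The log format and paths are illustrative assumptions:

    import org.apache.spark.sql.SparkSession

    object PairRddAggregation {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("pair-rdd-aggregation").getOrCreate()
        val sc = spark.sparkContext

        // Web-log lines assumed to look like "userId bytes".
        val logs = sc.textFile("hdfs:///logs/access")

        val bytesPerUser = logs
          .map(_.split("\\s+"))
          .collect { case Array(user, bytes) => (user, bytes.toLong) }
          .reduceByKey(_ + _) // map-side combine happens before the shuffle

        bytesPerUser.saveAsTextFile("hdfs:///logs/bytes_per_user")
        spark.stop()
      }
    }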
Confidential, Sacramento, CA
Hadoop Developer
Responsibilities:
- Worked with structured and semi-structured data of approximately 100 TB.
- Used Kafka to ingest data from source systems into HDFS for filtering (see the sketch after this section).
- Extensively used Hive/HQL to query or search for particular strings in Hive tables in HDFS.
- Developed custom UDFs in Python to extend Hive functionality.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Cross-examined data loaded into Hive tables against the source data in MySQL.
- Used Sqoop to retrieve data from RDBMS into HDFS and Hive as needed.
- Used UNIX shell scripting to automate required jobs to run at any time.
Environment: Hadoop, HDFS, Hive, Python, Spark, SQL, Teradata, YARN, Sqoop, Kafka, UNIX Shell Scripting.
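The Kafka ingestion above could be written several ways; this sketch assumes Spark Structured Streaming (the resume does not name the consumer used), with the topic, broker and paths as placeholders. It needs the spark-sql-kafka connector on the classpath:

    import org.apache.spark.sql.SparkSession

    object KafkaToHdfs {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("kafka-to-hdfs").getOrCreate()

        // Read the raw records off the topic as strings.
        val lines = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "weblogs")
          .load()
          .selectExpr("CAST(value AS STRING) AS line")

        // Append them to HDFS for downstream filtering in Hive.
        lines.writeStream
          .format("text")
          .option("path", "hdfs:///landing/weblogs")
          .option("checkpointLocation", "hdfs:///checkpoints/weblogs")
          .start()
          .awaitTermination()
      }
    }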
Confidential
Hadoop Developer
Responsibilities:
- Implemented complex MapReduce programs in Java to perform map-side joins using the distributed cache (an analogous Spark broadcast-join sketch follows this section).
- Developed MapReduce programs and Hive queries to analyze shipping patterns and customer satisfaction index across historical data.
- Wrote Pig user-defined functions (UDFs) and Hive UDFs.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig and Sqoop.
- Created MapReduce programs to handle semi-structured and unstructured data such as XML, JSON and Avro data files, and sequence files for log data.
- Used Sqoop to import data from RDBMS to HDFS reliably.
- Developed Pig scripts to perform transformations, event joins, bot-traffic filtering and some pre-aggregations before storing the data in HDFS.
Environment: Hadoop, HDFS, MapReduce, Sqoop, Hive, Pig, Oozie, HBase, Java, Flume 1.2.0, Eclipse IDE, CDH3.
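The map-side join above is done in Java with the Hadoop distributed cache; the analogous technique in Spark is a broadcast join, sketched here with hypothetical shipment and region datasets:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.broadcast

    object MapSideJoin {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("map-side-join").getOrCreate()

        val shipments = spark.read.json("hdfs:///data/shipments") // large fact data
        val regions   = spark.read.json("hdfs:///data/regions")   // small lookup

        // broadcast() ships the small table to every executor, so the join
        // runs map-side with no shuffle -- the same idea as a distributed-cache
        // join in MapReduce.
        val joined = shipments.join(broadcast(regions), Seq("region_id"))

        joined.write.mode("overwrite").parquet("hdfs:///data/shipments_by_region")
        spark.stop()
      }
    }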
Confidential
SQL Developer
Responsibilities:
- Involved in all phases of the Software Development Life Cycle (SDLC) and created UML diagrams (use case, class and sequence diagrams) to represent the detailed design phase.
- Designed database tables, indexes, constraints etc.
- Loaded files into system using MS SQL Server 2008.
- Created and managed structured objects like tables, views, stored procedures, and triggers.
- Managed employee records as needed to keep the system up to date.
- Combined data from multiple tables using inner and left outer joins, and created sub-queries for complex queries involving multiple tables (see the sketch after this section).
- Developed and unit-tested Extract, Transform, Load (ETL) programs using SQL.
Environment: MS SQL Server 2008 and SQL Server 2008 R2, UNIX shell scripts, Eclipse, Java/J2EE, Teradata.
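A small JDBC sketch of the join and sub-query work described above, written in Scala to match the other sketches; the connection string, credentials and schema are placeholders:

    import java.sql.DriverManager

    object EmployeeReport {
      def main(args: Array[String]): Unit = {
        val url  = "jdbc:sqlserver://localhost:1433;databaseName=hr"
        val conn = DriverManager.getConnection(url, "user", "password")
        try {
          // A left outer join plus a sub-query, as described above.
          val rs = conn.createStatement().executeQuery(
            """SELECT e.name, d.dept_name
              |FROM employees e
              |LEFT OUTER JOIN departments d ON d.dept_id = e.dept_id
              |WHERE e.salary > (SELECT AVG(salary) FROM employees)""".stripMargin)
          while (rs.next())
            println(s"${rs.getString("name")}\t${rs.getString("dept_name")}")
        } finally conn.close()
      }
    }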
Confidential
SQL Developer
Responsibilities:
- Performed extensive data extraction from web and other sources and handled data preparation - missing values, formatting, transformations using SSIS.
- Wrote stored procedures and SQL scripts in both SQL Server and Oracle to implement business rules for various clients.
- Designed T-SQL scripts to identify long-running queries and blocking sessions.
- Wrote and debugged T-SQL stored procedures, views and user-defined functions.
- Performed data migration (import and export via BCP) from text files to SQL Server.
- Created database objects like tables, views, indexes, stored-procedures, triggers, and user defined functions.
- Customized the stored procedures and database triggers to meet the changing business rules.
Environment: SQL Server 2005, SQL Server 2000, Oracle 9i, Visual Basic, Excel, Tableau.