Hadoop Consultant Resume
Lawrenceville, NJ
SUMMARY
- 5+ years of professional experience in requirements analysis, design, development, and implementation of Big Data technologies.
- Good experience with Big Data technologies and Hadoop ecosystem components such as Spark, MapReduce, Hive, Pig, YARN, HDFS, NoSQL databases (HBase and Cassandra), Oozie, Sqoop, Flink, Flume, and Kafka.
- Strong knowledge of distributed systems architecture and parallel processing; in-depth understanding of the MapReduce framework and the Spark execution model.
- Expertise in writing end-to-end data processing jobs to analyze data using MapReduce, Spark, and Hive.
- Extensive experience working with structured data using HiveQL, including join operations and custom UDFs, and experienced in optimizing Hive queries.
- Experience using various Hadoop distributions (Cloudera, MapR, Hortonworks, and Amazon AWS) to fully implement and leverage new Hadoop features.
- Proficient in using Cloudera Manager, an end-to-end tool for managing Hadoop operations in a Cloudera cluster.
- Experience with Apache Flume and Kafka for collecting, aggregating, and moving large volumes of data from various sources such as web servers and telnet sources.
- Extensive experience in writing Pig scripts to transform raw data from several data sources into baseline data.
- Extensive experience in importing/exporting data from/to RDBMS and Hadoop Ecosystem using Apache Sqoop.
- Good knowledge of and experience with the real-time streaming technologies Spark and Kafka.
- Experience optimizing MapReduce jobs using combiners and partitioners to deliver the best results.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance (illustrated in the sketch after this summary).
- Extensive experience working with semi-structured and unstructured data by implementing complex MapReduce programs using design patterns.
- Proficient in writing advanced SQL and in SQL performance tuning.
- Strong problem-solving, organizing, team management, communication, and planning skills, with the ability to work in a team environment.
- Ability to write clear, well-documented, well-commented, and efficient code as per the requirement.
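The following is a minimal, illustrative Scala/Spark sketch of the partitioned external Hive table pattern mentioned above; it is not taken from any project on this resume, and the table name, columns, and HDFS location are assumed placeholders.

```scala
import org.apache.spark.sql.SparkSession

object HivePartitionSketch {
  def main(args: Array[String]): Unit = {
    // Hive support so the DDL is registered in the Hive metastore
    val spark = SparkSession.builder()
      .appName("hive-partition-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // External table: Hive tracks only metadata and the files stay at LOCATION,
    // so dropping the table does not delete the data (unlike a managed table).
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS raw_events (
        |  event_id STRING,
        |  payload  STRING
        |)
        |PARTITIONED BY (load_date STRING)
        |STORED AS PARQUET
        |LOCATION '/data/raw/events'""".stripMargin)

    // Register a day's partition once its files land; queries that filter on
    // load_date then read only that partition's directory.
    spark.sql("ALTER TABLE raw_events ADD IF NOT EXISTS PARTITION (load_date = '2018-01-01')")
    spark.sql("SELECT COUNT(*) FROM raw_events WHERE load_date = '2018-01-01'").show()

    spark.stop()
  }
}
```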
TECHNICAL SKILLS
Hadoop/ Big Data: HDFS, MapReduce, Pig, Hive, Sqoop, Spark, Spark SQL, ZooKeeper, Oozie Workflow, Cloudera Manager, Pig Latin, HCatalog, Spark Streaming, Hortonworks
NoSQL Databases: HBase, Cassandra, MongoDB
Languages: SQL, Java, Unix Shell
Development Tools: Eclipse, Toad for Cloud and NoSQL, SQL Developer, Toad, SQL Navigator
Methodologies: Agile, Waterfall.
Databases: Oracle 11g/10g/9i/8i, DB2, Microsoft SQL Server
Operating Systems: Windows, Linux, UNIX
PROFESSIONAL EXPERIENCE
Confidential, Lawrenceville, NJ
Hadoop Consultant
Responsibilities:
- Responsible for managing data from multiple sources.
- Loaded data from different data sources (SQL Server, DB2, and Oracle) into HDFS using Sqoop and loaded it into Hive tables.
- Developed various Big Data workflows using custom MapReduce, Pig, Hive and Sqoop.
- Assisted in exporting analyzed data to relational databases using Sqoop.
- Created Oozie workflows and coordinator jobs to recurrently trigger Hadoop jobs (Java MapReduce, Pig, Hive, Sqoop) as well as system-specific jobs (Java programs and shell scripts) based on time (frequency) and data availability.
- Installed, upgraded, and managed the Hadoop cluster on Hortonworks.
- Developed Spark Streaming jobs in Scala to consume data from Kafka topics, transform the data, and insert it into HBase (illustrated in the sketch after this section).
- Used Spark as a fast, general-purpose processing engine compatible with Hadoop data.
- Used Spark to design and perform both batch processing (similar to MapReduce) and newer workloads such as streaming, interactive queries, and machine learning.
- Set up a Hadoop cluster on AWS, including configuration of the different Hadoop components.
- Analyzed large data sets by running Hive queries, and Pig scripts.
- Implemented Pig Latin scripts using operators such as LOAD, STORE, DUMP, FILTER, DISTINCT, FOREACH, GENERATE, GROUP, COGROUP, ORDER, LIMIT, and UNION.
- Built reusable Hive UDF libraries that enabled business analysts to use these UDFs in their Hive queries.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Developed MapReduce framework jobs in Java for data processing; installed and configured Hadoop and HDFS.
- Importing and exporting data into HDFS and Hive using Sqoop.
- Worked on NoSQL databases including HBase, MongoDB, and Cassandra.
- Performed data analysis in Hive by creating tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, the HBase NoSQL database, and Sqoop.
- Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
- Involved in creating Hive tables and loading and analyzing data using Hive queries.
- Used Flume to export application server logs into HDFS.
- Worked closely with the data science and platform consulting teams to validate the architectural approach and to check design constraints in the setup of enterprise-level data ingest stores.
Environment: Hadoop, MapReduce, Hortonworks, HDFS, Linux, Sqoop, Spark, Pig, Hive, Oozie, Flume, Pig Latin, Java, AWS, Python, HBase, Eclipse and Windows.
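A minimal sketch of the Kafka-to-HBase Spark Streaming pattern described in this section, assuming the spark-streaming-kafka-0-10 integration and the HBase client API; the broker list, topic, consumer group, HBase table, and column family are illustrative placeholders, and the transformation is deliberately trivial.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes

object KafkaToHBaseSketch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("kafka-to-hbase"), Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",        // placeholder broker list
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "events-consumer",     // placeholder consumer group
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Simple transformation, then write each partition of each micro-batch to HBase.
    stream.map(record => (record.key, record.value.trim.toUpperCase))
      .foreachRDD { rdd =>
        rdd.foreachPartition { records =>
          // One HBase connection per partition; a pooled connection would be used in production.
          val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
          val table = connection.getTable(TableName.valueOf("events")) // placeholder table
          records.foreach { case (key, value) =>
            val rowKey = Option(key).getOrElse(value.hashCode.toString)
            val put = new Put(Bytes.toBytes(rowKey))
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("value"), Bytes.toBytes(value))
            table.put(put)
          }
          table.close()
          connection.close()
        }
      }

    ssc.start()
    ssc.awaitTermination()
  }
}
```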
Confidential, Plano TX
Hadoop Developer/Admin
Responsibilities:
- Responsible for implementation and ongoing administration of the Hadoop infrastructure, including the initial infrastructure setup.
- Developed data pipelines using StreamSets Data Collector to store data from Kafka into HDFS, Solr, Elasticsearch, HBase, and MapR-DB.
- Implemented event streaming across different StreamSets Data Collector stages, running a MapReduce job on event triggers to convert Avro to Parquet.
- Implemented real-time streaming, performing transformations on the data using Kafka and Kafka Streams.
- Developed Spark Streaming jobs in Scala to consume data from Kafka topics, transform the data, and insert it into HBase.
- Loaded stream data into HDFS using Flume, Kafka and Spark Streaming.
- Created Phoenix tables mapped to HBase tables and implemented SQL queries to retrieve data.
- Streamed events from HBase to Solr using the Lily HBase Indexer.
- Developed Flink jobs to stream data from ActiveMQ to Kafka.
- Loaded data from CSV files into Spark, created DataFrames, and queried the data using Spark SQL (illustrated in the sketch after this section).
- Created external tables in Hive, loaded JSON-format log files, and ran queries using HiveQL.
- Designed HBase row keys and data models for inserting data into HBase tables using lookup table and staging table concepts.
- Created HBase tables using the HBase API and HBase shell commands and loaded data into the tables.
- Captured log metrics with Kibana, Logstash, and Elasticsearch; used Grafana for monitoring.
- Worked with the MapR, Cloudera, and Hortonworks platforms as part of a proof of concept.
- Created and maintained Technical documentation for launching Hadoop Clusters and for executing Hive queries.
- Automated the process of copying files into the Hadoop system at regular intervals for testing purposes.
- Importing and exporting structured data from different relational databases into HDFS and Hive using Sqoop.
- Configured Zookeeper to implement node coordination in clustering support.
- Used Pig as an ETL tool to perform transformations, event joins, and some pre-aggregations before storing the data in HDFS.
- Created Pig scripts to run on large data buckets.
- Configured Flume for efficient collection, aggregation, and transformation of large volumes of log data from various sources into HDFS.
- Worked with cloud services such as Amazon Web Services (AWS) and was involved in ETL, data integration, and migration.
- Used the Agile Scrum methodology (Scrum Alliance) for development.
- Worked with different teams to ensure data quality and availability.
Environment: HDFS, Hadoop, Kafka, MapReduce, Hortonworks, Cloudera, Elasticsearch, Spark, Pig, AWS, ETL, Hive, Avro, Grafana, Scala, HBase, ZooKeeper, Oozie, Sqoop, Agile and Windows.
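A minimal sketch of the CSV-to-DataFrame-to-Spark-SQL flow mentioned in this section; the HDFS path and column names are illustrative placeholders.

```scala
import org.apache.spark.sql.SparkSession

object CsvSparkSqlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("csv-spark-sql").getOrCreate()

    // Read CSV files from HDFS into a DataFrame (path and schema inference are placeholders)
    val transactions = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/landing/transactions/")

    // Expose the DataFrame as a temporary view so it can be queried with Spark SQL
    transactions.createOrReplaceTempView("transactions")

    spark.sql(
      """SELECT account_id, SUM(amount) AS total_amount
        |FROM transactions
        |GROUP BY account_id
        |ORDER BY total_amount DESC""".stripMargin).show(20)

    spark.stop()
  }
}
```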
Confidential, Long Beach, CA
Hadoop Developer
Responsibilities:
- Assisted in upgrading, configuration and maintenance of various Hadoop infrastructures.
- Developed a Big Data solution on the Hortonworks platform that enabled the business and technology teams to make data-driven decisions on the best ways to acquire customers and provide them with business solutions.
- Implemented Hive tables and HQL queries for the reports; wrote and used complex data types in Hive.
- Developed Hive queries to analyze reducer output data.
- Used Ambari to configure the initial development environment on the Hortonworks standalone sandbox and to monitor the Hadoop ecosystem.
- Worked on moving some of the data pipelines from the CDH cluster to run on AWS.
- Designed a workflow by scheduling Hive processes for log file data streamed into HDFS using Flume.
- Created HBase tables to store data arriving in various formats from different sources.
- Installed and configured Hive, Pig, Sqoop, Flume, and Oozie on the Hadoop cluster.
- Configured Spark Streaming to get ongoing information from Kafka and stored the stream data in HDFS.
- Developed a data pipeline using Flume and Sqoop to ingest customer behavioral data and purchase histories into HDFS for analysis.
- Extracted, transformed, and loaded (ETL) data from multiple federated data sources (JSON, relational databases, etc.) with DataFrames in Spark (illustrated in the sketch after this section).
- Responsible for managing data coming from different sources.
- Developed MapReduce (YARN) programs to cleanse the data in HDFS obtained from heterogeneous data sources and make it suitable for ingestion into the Hive schema for analysis.
- Designed and developed MapReduce jobs to process data coming in different file formats such as XML, CSV, and JSON.
- Used Sqoop to import data from RDBMSs into the Hadoop Distributed File System (HDFS).
- Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries and Pig scripts.
- Migrated ETL jobs to Pig scripts that perform transformations, event joins, and some pre-aggregations before storing the data in HDFS.
- Used Pig to perform data validation on the data ingested with Sqoop and Flume; the cleansed data set was pushed into HBase.
- Worked with ZooKeeper, Oozie, and Data Pipeline Operational Services for coordinating the cluster and scheduling workflows.
- Implemented a daily workflow for extraction, processing, and analysis of data with Oozie.
- Worked with configuration management groups to provide the setup of various deployment environments, including system integration testing and quality control testing.
Environment: Hadoop 2.x, Hive, HQL, HDFS, MapReduce, Sqoop, Flume, Oozie, PuTTY, Cloudera Manager 4 and CDH 5, Hortonworks, Spark, Pig, Kafka, HBase, ZooKeeper, Ambari, Teradata, ETL, and Windows.
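A minimal sketch of the federated DataFrame ETL mentioned in this section, assuming a JSON source already landed in HDFS and a relational source read over JDBC; the paths, JDBC connection details, and column names are illustrative placeholders.

```scala
import org.apache.spark.sql.SparkSession

object FederatedEtlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("federated-etl").getOrCreate()

    // JSON clickstream files landed in HDFS (placeholder path and columns)
    val clicks = spark.read.json("hdfs:///data/raw/clickstream/")

    // Customer reference data pulled from a relational database over JDBC
    val customers = spark.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@dbhost:1521:orcl")   // placeholder connection string
      .option("driver", "oracle.jdbc.OracleDriver")
      .option("dbtable", "customers")
      .option("user", "etl_user")
      .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
      .load()

    // Join the two sources, keep the columns downstream needs, and write curated Parquet
    clicks.join(customers, Seq("customer_id"))
      .select("customer_id", "page", "event_time", "segment")
      .write.mode("overwrite")
      .parquet("hdfs:///data/curated/clicks_by_customer/")

    spark.stop()
  }
}
```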
Confidential, Jacksonville, FL
Hadoop Administrator
Responsibilities:
- Installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, ZooKeeper, and Sqoop.
- Installed, configured and deployed data node hosts for Hadoop Cluster deployment.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging the data in HDFS for further analysis.
- Worked on installing cluster, commissioning & decommissioning of DataNodes, NameNode recovery, capacity planning, and slots configuration.
- Worked with different distributions of Hadoop and Big Data technologies, including Hortonworks and Cloudera.
- Extracted files from Cassandra through Sqoop, placed them in HDFS, and processed them.
- Conducted a POC for Hadoop and Cassandra as part of the NextGen platform implementation, which included connecting to the Hadoop cluster and the Cassandra ring and executing sample programs on the servers.
- Installed Oozie workflow engine to run multiple Sqoop, Hive and Pig Jobs.
- Secured the cluster using Kerberos, kept the cluster up and running at all times, and troubleshot any problems that arose.
- Administered Pig, Hive, and HBase, installing updates, patches, and upgrades.
- Implemented optimization, performance testing, and tuning of Hive and Pig.
- Defined workflow using Oozie framework for automation.
- Used Sqoop to export data from HDFS to RDBMS.
- Worked with HiveQL on large volumes of log data to perform trend analysis of user behavior across various online modules.
- Worked on debugging, performance tuning of Hive & Pig Jobs.
- Wrote Hadoop Job Client utilities and integrated them into monitoring system.
- Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
- Wrote queries in MongoDB to generate reports for display on the dashboard.
- Responsible for creating and modifying Kafka topics as and when required, with varying configurations for replication factors and partitions (illustrated in the sketch after this section).
Environment: Hive, HQL scripts, MapReduce, HBase, Pig, Sqoop, Hortonworks, CDH, ZooKeeper, Oozie, Cassandra, HDFS, HiveQL, MongoDB, Kafka, Shell Scripts, MySQL and Windows.
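A minimal sketch of programmatic Kafka topic creation with explicit partition and replication settings, using the Kafka AdminClient API (assuming a Kafka version that ships it, 0.11+; the kafka-topics command-line tool is the other common route); the broker address, topic name, and counts are illustrative placeholders.

```scala
import java.util.{Collections, Properties}
import org.apache.kafka.clients.admin.{AdminClient, AdminClientConfig, NewTopic}

object CreateTopicSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092") // placeholder broker

    val admin = AdminClient.create(props)
    try {
      // Topic name, partition count, and replication factor are illustrative values
      val topic = new NewTopic("user-events", 6, 3.toShort)
      admin.createTopics(Collections.singletonList(topic)).all().get()
    } finally {
      admin.close()
    }
  }
}
```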