Hadoop Developer Resume

Nashville, TN

SUMMARY

  • 9+ years of experience in the IT industry, with proven expertise in Big Data analytics and development.
  • Experience in installing, configuring and maintaining multiple Hadoop clusters of different sizes.
  • Exposure to design and development of database driven systems.
  • Good knowledge of Hadoop architectural components such as the Hadoop Distributed File System (HDFS), NameNode, DataNode, TaskTracker, JobTracker, and MapReduce programming.
  • Experience in developing and deploying applications using Hadoop-based components such as Hadoop MapReduce (MR1), YARN (MR2), HDFS, Hive, Pig, HBase, Flume, Sqoop, Spark (Streaming, Spark SQL, Spark ML), Storm, Kafka, Oozie, ZooKeeper, and Avro.
  • Exposure to Big Data technologies and the Hadoop ecosystem, with an in-depth understanding of MapReduce and Hadoop infrastructure.
  • Experience in writing MapReduce jobs using native Java code, Pig, and Hive for data processing.
  • Hands-on experience in importing and exporting data into HDFS and Hive using Sqoop (a representative import is sketched after this summary).
  • Exposure to column-oriented NoSQL databases such as HBase and Cassandra.
  • Extensive experience working with structured, semi-structured, and unstructured data by implementing complex MapReduce programs using design patterns.
  • Excellent knowledge of multiple platforms such as Cloudera, Hortonworks, and MapR.
  • Familiar with data architecture, including data ingestion pipeline design, Hadoop information architecture, data modeling, data mining, machine learning, and advanced data processing.
  • Hands-on experience with major Big Data components: Apache Kafka, Apache Spark, ZooKeeper, and Avro.
  • Experienced in implementing unified data platforms using Kafka producers/consumers and in implementing pre-processing with Storm topologies.
  • Strong experience in architecting real-time streaming applications and batch-style, large-scale distributed computing applications using tools such as Spark Streaming, Spark SQL, Kafka, Flume, MapReduce, and Hive.
  • Experience using various Hadoop distributions (Cloudera, Hortonworks, MapR, etc.) to fully implement and leverage new Hadoop features.
  • Experience with Apache Flume for collecting, aggregating, and moving large volumes of data from various sources such as web servers and Telnet sources.
  • Great team player and quick learner with effective communication, motivation, and organizational skills, combined with attention to detail and a focus on business improvements.
  • Experienced across the complete SDLC, including requirements gathering, design, development, testing, and production environments.
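
The Sqoop usage noted above can be illustrated with a minimal import sketch; the connection string, credentials, table, and paths below are hypothetical placeholders for a MySQL source:

    # Import a relational table into HDFS and register it as a Hive table.
    # Connection string, credentials, table, and paths are placeholders.
    sqoop import \
      --connect jdbc:mysql://dbhost:3306/sales \
      --username etl_user -P \
      --table transactions \
      --split-by txn_id \
      --num-mappers 4 \
      --target-dir /user/etl/staging/transactions \
      --hive-import \
      --hive-table staging.transactions

Splitting on a numeric key (--split-by) lets the import run as several parallel mappers instead of a single stream.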

TECHNICAL SKILLS

Hadoop/Big Data: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Data warehousing

Informatica: PowerCenter, PowerExchange, ETL, Metadata, Data Mining, SQL, OLAP, OLTP, Workflow Manager and Workflow Monitor

Programming languages: Java, Python, Linux shell scripts

Databases: MS SQL Server, HBase, Cassandra (NoSQL)

Web Servers: WebLogic, WebSphere, Apache Tomcat, AWS

Web Technologies: HTML, XML, JavaScript, Python

Operating Systems: Linux, Unix, Windows 8, Windows 7, Windows Server …

PROFESSIONAL EXPERIENCE

Hadoop Developer

Confidential - Nashville, TN

Responsibilities:

  • Developed an Apache Flume client to send data as events to the Flume server and store them in files and HDFS.
  • Developed Flume configurations to extract log data from different sources and transfer data in different file formats (JSON, XML, Parquet) to Hive tables using different SerDes.
  • Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
  • Involved in developing Hive DDLs to create, alter and drop Hive tables.
  • Enabled concurrent access to Hive tables with shared and exclusive locking, using the ZooKeeper implementation in the cluster.
  • Performed various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
  • Developed Sqoop scripts to move data between HDFS and RDBMSs (Oracle, MySQL).
  • Planned, deployed, monitored, and maintained Amazon AWS cloud infrastructure consisting of multiple EC2 nodes and VMware, as required in the environment.
  • Stored and loaded data from HDFS to Amazon S3 and backed up namespace data to NFS.
  • Implemented NameNode backup using NFS for high availability.
  • Used Pig to perform data validation on the data ingested with Sqoop and Flume, and pushed the cleansed data set into MongoDB.
  • Designed and implemented the MongoDB schema.
  • Wrote services to store and retrieve user data from the MongoDB for the application on devices.
  • Used Mongoose API to access the MongoDB from NodeJS.
  • Created and implemented business validation and coverage price gap rules on Hive using Talend.
  • Involved in development of Talend components to validate the data quality across different data sources.
  • Automated and scheduled the rules on a weekly and monthly basis in TAC (Talend Administration Center).
  • Wrote shell scripts to automate rolling day-to-day processes.
  • Wrote shell scripts to monitor Hadoop daemon services and respond to any warning or failure conditions (a sketch follows this list).
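
A minimal sketch of the kind of daemon-monitoring shell script referenced in the last bullet, assuming a single-host check; the daemon list, alert address, and mail transport are hypothetical:

    #!/bin/bash
    # Check that core Hadoop daemons are running and that HDFS reports no
    # missing blocks; mail an ops alias (hypothetical) on any problem.
    ALERT="hadoop-ops@example.com"

    for daemon in NameNode DataNode ResourceManager NodeManager; do
      if ! jps | grep -q "$daemon"; then
        echo "$daemon is not running on $(hostname)" \
          | mail -s "Hadoop daemon down: $daemon" "$ALERT"
      fi
    done

    # Alert if HDFS reports missing blocks.
    missing=$(hdfs dfsadmin -report | awk '/Missing blocks:/ {print $3; exit}')
    if [ "${missing:-0}" -gt 0 ]; then
      hdfs dfsadmin -report | mail -s "HDFS missing blocks: $missing" "$ALERT"
    fi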

Hadoop/Spark Developer

Confidential, Los Angeles, CA

Responsibilities:

  • Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest transaction logs and financial histories into HDFS for analysis.
  • Improved the performance of and optimized existing algorithms in Hadoop using the Spark context, Spark SQL, and Spark on YARN with Scala.
  • Involved in Sqooping terabytes of data from traditional systems to Hadoop.
  • Used Pig to do transformations, event joins, filter boot traffic and some pre-aggregations before storing the data onto HDFS.
  • Involved in developing Pig UDFs for needed functionality not available out of the box in Apache Pig.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Involved in developing Hive DDLs to create, alter and drop Hive tables.
  • Wrote complex Hive scripts and Hive UDFs, and implemented cluster coordination services through ZooKeeper.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries.
  • Involved in developing Hive UDFs for needed functionality not available out of the box in Apache Hive.
  • Gained Knowledge in performance troubleshooting and tuning Hadoop clusters.
  • Involved in using HCatalog to access Hive table metadata from MapReduce or Pig code.
  • Responsible for developing a data pipeline using Flume, Sqoop, and Pig to extract data from web logs and store it in HDFS.
  • Involved in using Sqoop for importing and exporting data into HDFS.
  • Used Eclipse and Ant to build the application; proficient work experience with NoSQL (MongoDB) databases; also transformed HDFS data from rows to columns and columns to rows.
  • Involved in developing shell scripts to orchestrate the execution of all other scripts (Pig, Hive, and MapReduce) and to move the data files within and outside of HDFS (a sketch follows this list).
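
A minimal sketch of the orchestration shell script referenced in the last bullet; the directory layout and the Pig/Hive script names are hypothetical:

    #!/bin/bash
    # Daily orchestration wrapper: stage raw logs into HDFS, run the Pig and
    # Hive steps, then pull a summary extract back out.
    set -e

    RUN_DATE=$(date +%Y-%m-%d)
    RAW=/data/incoming/weblogs/$RUN_DATE
    HDFS_IN=/user/etl/weblogs/raw/$RUN_DATE
    HDFS_OUT=/user/etl/weblogs/clean/$RUN_DATE

    # 1. Move the day's raw files into HDFS.
    hdfs dfs -mkdir -p "$HDFS_IN"
    hdfs dfs -put "$RAW"/*.log "$HDFS_IN"/

    # 2. Run the Pig transformations (filtering, joins, pre-aggregation).
    pig -param INPUT="$HDFS_IN" -param OUTPUT="$HDFS_OUT" clean_weblogs.pig

    # 3. Load the cleansed data into the partitioned Hive table.
    hive -hiveconf dt="$RUN_DATE" -f load_weblogs.hql

    # 4. Pull a summary extract back out of HDFS for downstream systems.
    hdfs dfs -get "$HDFS_OUT"/summary /data/outgoing/weblogs/$RUN_DATE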

Environment: MapReduce, HDFS, Hive, Pig, Hue, Oozie, Core Java, Perl/Shell scripts, Eclipse, HBase, Flume, Spark, Kafka, Cloudera Manager, Cassandra, REST API, Python, Greenplum DB, IDMS, VSAM, SQL*PLUS, Toad, Putty, Windows NT, UNIX Shell Scripting, Pentaho, Talend, Bigdata, YARN.

Big Data/ Hadoop Developer

Confidential - Dallas, TX

Responsibilities:

  • Participated in Hadoop Deployment and infrastructure scaling.
  • Involved in creating Hive tables, and loading and analyzing data using Hive queries.
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig (a sample submission is sketched after this list).
  • Migrated complex MapReduce programs to in-memory Spark processing using transformations and actions.
  • Parsed high-level design spec to simple ETL coding and mapping standards.
  • Maintained warehouse metadata, naming standards and warehouse standards for future application development.
  • Worked with Linux systems and RDBMS database on a regular basis to ingest data using Sqoop.
  • Implemented Kafka consumers to move data from Kafka partitions into Cassandra for near real-time analysis.
  • Involved in Hadoop cluster tasks such as adding and removing nodes.
  • Managed and reviewed Hadoop log files and loaded log data into HDFS using Sqoop.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries, Pig scripts, and Sqoop jobs.
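
A minimal sketch of submitting such an Oozie workflow from the command line; the Oozie server URL and the property names assumed in job.properties are placeholders for this environment:

    # Submit the Oozie workflow that stages data into HDFS and runs the Pig
    # pre-processing step. job.properties (not shown) defines nameNode,
    # jobTracker, and oozie.wf.application.path, which points at the
    # directory in HDFS holding workflow.xml.
    oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run

    # Check the status of a running workflow by its job id.
    oozie job -oozie http://oozie-host:11000/oozie -info <job-id>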

Environment: Hortonworks, Hadoop, HDFS, Spark, Oozie, Pig, Hive, MapReduce, Sqoop, Cassandra, Linux.

Hadoop Developer

Confidential - Atlanta, GA

Responsibilities:

  • Experienced in querying data using Spark SQL on top of the Spark engine for faster processing of data sets.
  • Expert knowledge of MongoDB NoSQL data modeling, tuning, and disaster recovery backups; used it for distributed storage and processing with CRUD operations.
  • Experience in creating, dropping, and altering tables at run time without blocking updates and queries, using HBase and Hive.
  • Experience in working with different join patterns and implemented both Map and Reduce Side Joins.
  • Wrote Flume configuration files for importing streaming log data into HBase with Flume.
  • Imported several transactional logs from web servers with Flume to ingest the data into HDFS.
  • Used Flume with a spooling directory source to load data from the local file system (LFS) into HDFS (see the sketch after this list).
  • Installed and configured Pig, and wrote Pig Latin scripts to convert data from text files to Avro format.
  • Created partitioned Hive tables and worked on them using HiveQL.
  • Loaded data into HBase using bulk and non-bulk loads.
  • Installed, Configured Talend ETL on single and multi-server environments.
  • Experience in monitoring the Hadoop cluster using Cloudera Manager, interacting with Cloudera support, logging issues in the Cloudera portal, and fixing them per the recommendations.
  • Experience in Cloudera Hadoop upgrades and patches and installation of ecosystem products through Cloudera Manager, along with Cloudera Manager upgrades.
  • Worked on the continuous integration tool Jenkins and automated jar builds at the end of each day.
  • Used Zookeeper to coordinate the servers in clusters and to maintain the data consistency.
  • Experienced in designing RESTful services using Java-based APIs such as Jersey.
  • Worked in an Agile development environment using the Kanban methodology; actively involved in daily Scrum and other design-related meetings.
  • Used Oozie operational services for batch processing and scheduling workflows dynamically.
  • Supported setting up the QA environment and updating configurations for implementing scripts with Pig, Hive, and Sqoop.
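
A minimal sketch of the Flume spooling-directory ingestion referenced above: an agent configuration plus the shell command to launch it. The agent name, spool directory, and HDFS path are hypothetical:

    # agent.conf -- read files dropped into a local spool directory and write them to HDFS.
    a1.sources  = src1
    a1.channels = ch1
    a1.sinks    = snk1

    a1.sources.src1.type     = spooldir
    a1.sources.src1.spoolDir = /var/log/app/spool
    a1.sources.src1.channels = ch1

    a1.channels.ch1.type     = memory
    a1.channels.ch1.capacity = 10000

    a1.sinks.snk1.type                   = hdfs
    a1.sinks.snk1.hdfs.path              = /user/etl/flume/applogs/%Y-%m-%d
    a1.sinks.snk1.hdfs.fileType          = DataStream
    a1.sinks.snk1.hdfs.useLocalTimeStamp = true
    a1.sinks.snk1.channel                = ch1

    # Launch the agent defined above (shell):
    flume-ng agent --conf ./conf --conf-file agent.conf --name a1 -Dflume.root.logger=INFO,console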

Environment: Hadoop, HDFS, Hive, MapReduce, AWS EC2, SOLR, Impala, MySQL, Oracle, Sqoop, Kafka, SQL, Talend, Python, YARN, Pig, Oozie, Linux (Ubuntu), Scala, Tableau, Maven, Jenkins, Java (JDK 1.6), Cloudera, JUnit, Agile methodologies

Big Data Developer

Confidential

Responsibilities:

  • Worked on analyzing Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase database and Sqoop.
  • Coordinated with business customers to gather business requirements, interacted with other technical peers to derive technical requirements, and delivered the BRD and TDD documents.
  • Extensively involved in Design phase and delivered Design documents.
  • Set up a 3-node Hadoop cluster with IBM BigInsights.
  • Worked with highly unstructured and semi-structured data.
  • Extracted the data from Oracle into HDFS using Sqoop (version 1.4.3) to store and generate reports for visualization purpose.
  • Leveraged Solr API to search user interaction data for relevant matches.
  • Designed the Solr schema and used the SolrJ client API for storing, indexing, and querying the schema fields.
  • Collected and aggregated large amounts of web log data from different sources, such as web servers and mobile devices, using Apache Flume, and stored the data in HDFS for analysis.
  • Extensive experience in writing Pig (version 0.12.0) scripts to transform raw data into baseline data.
  • Developed Hive (version 0.12.0) scripts to analyze data, categorize mobile numbers into different segments, and offer promotions to customers based on those segments.
  • Developed UDFs in Java as and when necessary for use in Pig and Hive queries.
  • Worked on Oozie workflow engine for job scheduling.
  • Created Hive tables and partitions, and loaded the data for analysis using HiveQL queries (a sketch follows this list).
  • Loaded data into HBase using bulk load and the HBase API.
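
A minimal sketch of the partitioned Hive table work referenced above, run through the Hive CLI; the database, table, column names, and paths are hypothetical:

    # Create a partitioned table, load one day of data, and run a segment rollup.
    hive -e "
      CREATE DATABASE IF NOT EXISTS telecom;
      CREATE TABLE IF NOT EXISTS telecom.call_records (
        msisdn       STRING,
        call_minutes DOUBLE,
        segment      STRING
      )
      PARTITIONED BY (dt STRING)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

      LOAD DATA INPATH '/user/etl/call_records/2014-06-01'
      INTO TABLE telecom.call_records PARTITION (dt = '2014-06-01');

      SELECT segment, COUNT(DISTINCT msisdn) AS subscribers, SUM(call_minutes) AS total_minutes
      FROM telecom.call_records
      WHERE dt = '2014-06-01'
      GROUP BY segment;
    "

Partitioning by date keeps each daily load and query scoped to a single partition rather than scanning the whole table.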

Environment: IBM Big Insights 2.1.2, Java, Hive, Pig, HBase, Sqoop, Flume, Oozie, Solr, Shell script.
