Big Data/Hadoop Developer Resume

Austin, TX

SUMMARY:

  • Overall 6 years of professional IT experience in Analysis, Design, Development, Deployment and Maintenance of critical software and big data applications.
  • 3+ years of hands-on experience across the Hadoop ecosystem, including extensive experience with Big Data technologies like MapReduce, YARN, HDFS, HBase, Oozie, Hive, Sqoop, Pig, ZooKeeper and Flume.
  • In-depth knowledge of HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce programming.
  • Expertise in converting MapReduce programs into Spark transformations using Spark RDDs (a minimal sketch follows this summary).
  • Good business management knowledge, including business/organizational and operational design principles and customer and stakeholder management.
  • Built distributed, scalable, and reliable data pipelines that ingest and process data at scale and in real time.
  • Configured Spark Streaming to receive real-time data from Kafka and store the streamed data in HDFS using Scala.
  • Experience in implementing real-time event processing and analytics using Spark Streaming with messaging systems like Kafka.
  • Excellent understanding of Hadoop architecture and the underlying Hadoop framework, including storage management.
  • Hands-on experience in installing, configuring, and using Hadoop components like Hadoop MapReduce, HDFS, HBase 1.3.0, Hive 2.1.1, Sqoop 1.99.7 and Flume 1.7.0.
  • Managed data coming from different sources and involved in HDFS maintenance and loading of structured and unstructured data.
  • Experience in analyzing data using HiveQL 2.1.1 and custom MapReduce programs in Java.
  • Experience working with HBase 1.3.0 (NoSQL) and Impala 2.7.0.
  • Hands-on experience in Linux shell scripting. Worked with the Cloudera Big Data distribution.
  • Expert in writing complex SQL queries and in database analysis for good performance.
  • Excellent analytical, interpersonal and communication skills; fast learner, hardworking and a good team player.
  • Experience in writing stored procedures and complex SQL queries using relational databases like Oracle, SQL Server, and MySQL.
  • Experience in Extraction, Transformation and Loading (ETL) of data from multiple sources like Flat files, XML files, and Databases.
  • Supported various reporting teams and have experience with the data visualization tool Tableau.
  • Implemented Data Quality in the ETL tool Talend and have good knowledge of Data Warehousing and ETL tools like IBM DataStage, Informatica and Talend.
  • Experience with and in-depth knowledge of cloud integration with AWS using Elastic MapReduce (EMR), Simple Storage Service (S3), EC2 and Redshift, as well as Microsoft Azure.
  • Detailed understanding of Software Development Life Cycle (SDLC) and strong knowledge in project implementation methodologies like Waterfall and Agile.
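Illustrative only: a minimal Scala sketch of the MapReduce-to-Spark conversion mentioned above, expressing the classic word-count mapper and reducer as Spark RDD transformations. The HDFS paths and application name are placeholder assumptions, not details from an actual engagement.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCountRdd {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("WordCountRdd"))

    // The MapReduce word-count pattern as RDD transformations:
    // flatMap/map replace the mapper, reduceByKey replaces the shuffle + reducer.
    sc.textFile("hdfs:///data/input/")                    // placeholder input path
      .flatMap(_.split("\\s+"))                           // tokenize each line
      .map(word => (word, 1))                             // emit (word, 1) pairs
      .reduceByKey(_ + _)                                 // sum counts per word
      .saveAsTextFile("hdfs:///data/output/wordcount")    // placeholder output path

    sc.stop()
  }
}
```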

TECHNICAL SKILLS:

Languages: Python, R, PL/SQL, Java, HiveQL, Pig Latin, Scala

Hadoop Ecosystem: HDFS, YARN, Scala, MapReduce, Hive, Pig, ZooKeeper, Sqoop, Oozie, Flume, Kafka, Impala, MongoDB, and HBase.

Databases: Oracle, MS-SQL Server, MySQL, NoSQL (HBase, MongoDB).

Tools: Eclipse, NetBeans, Talend.

Hadoop Platforms: Cloudera, Amazon Web Services (AWS).

Operating Systems: Windows XP/2000/NT, Linux, UNIX.

Amazon Web Services: Redshift, EMR, EC2, S3, RDS, CloudSearch, Data Pipeline, Lambda.

Version Control: GitHub, SVN, CVS.

Packages: MS Office Suite, MS Visio, MS Project Professional.

PROFESSIONAL EXPERIENCE:

Confidential, Austin, TX

Big Data/Hadoop Developer

Responsibilities:

  • Developed real-time data processing applications using Scala and Python and implemented Apache Spark Streaming from streaming sources like Kafka (a streaming sketch follows this list).
  • Experienced in writing live real-time processing and core jobs using Spark Streaming with Kafka as the data pipeline.
  • Worked with AWS services like EMR and EC2 for fast and efficient processing of Big Data.
  • Involved in importing real-time data into Hadoop using Kafka and implemented an Oozie job to run the load daily.
  • Implemented Spark SQL to read Hive tables into Spark for faster data processing (see the Spark SQL sketch below).
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Developed MapReduce jobs using the MapReduce Java API and HiveQL.
  • Developed UDF, UDAF and UDTF functions and used them in Hive queries.
  • Developed scripts and batch jobs to schedule an Oozie bundle (a group of coordinators) consisting of various Hadoop programs.
  • Experience in using the Avro data serialization system to handle Avro data files in MapReduce programs.
  • Experienced in optimizing Hive queries and joins to handle different data sets.
  • Configured Oozie schedulers to handle different Hadoop actions on a timely basis.
  • Involved in ETL, Data Integration and Migration by writing Pig scripts.
  • Used different file formats like text files, Sequence Files and Avro via Hive SerDes.
  • Integrated Hadoop with Solr and implemented search algorithms.
  • Experience in Storm for handling real-time processing.
  • Hands-on experience working in the Cloudera distribution.
  • Assisted in creating and maintaining technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Worked hands-on with NoSQL databases like MongoDB for a POC on storing images and URIs.
  • Designed and implemented MongoDB and the associated RESTful web service.
  • Worked on analyzing and examining customer behavioral data using MongoDB.
  • Designed data aggregations on Hive for ETL processing on Amazon EMR to process data as per business requirements.
  • Involved in writing test cases and implementing test classes using MRUnit and mocking frameworks.
  • Developed Sqoop scripts to extract data from MySQL and load it into HDFS.
  • Set up EMR to process huge data sets stored in Amazon S3.
  • Experience in processing large volumes of data and skills in parallel execution of processes using Talend functionality.
  • Used Talend to create workflows for processing data from multiple source systems.
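A minimal, hedged sketch of the Spark Streaming ingestion described above, using the Kafka direct stream API (spark-streaming-kafka-0-10) in Scala to consume records from Kafka and persist each micro-batch to HDFS. The broker address, topic, consumer group, batch interval and output path are illustrative assumptions.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("KafkaToHdfs"), Seconds(30))

    // Placeholder Kafka connection settings
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "hdfs-ingest",
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Write each non-empty micro-batch of raw event values to a timestamped HDFS directory
    stream.map(_.value).foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty()) rdd.saveAsTextFile(s"hdfs:///data/raw/events/${time.milliseconds}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```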

Environment: MapReduce, HDFS, Sqoop, LINUX, Oozie, Hadoop, Pig, Hive, Solr, Spark Streaming, Kafka, Storm, Spark, Scala, Python, MongoDB, Hadoop Cluster, Amazon Web Services, Talend.
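Relatedly, a short sketch of the Spark SQL over Hive usage noted in the bullets above: a SparkSession with Hive support runs HiveQL against metastore tables and writes the aggregate back to Hive. The database, table and column names (sales.orders, customer_id, total) are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession

object HiveOnSpark {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport lets Spark SQL resolve tables registered in the Hive metastore
    val spark = SparkSession.builder()
      .appName("HiveOnSpark")
      .enableHiveSupport()
      .getOrCreate()

    // Run HiveQL against an existing Hive table (names are placeholders)
    val revenue = spark.sql(
      """SELECT customer_id, SUM(total) AS revenue
        |FROM sales.orders
        |GROUP BY customer_id""".stripMargin)

    // Persist the aggregate back to the Hive metastore
    revenue.write.mode("overwrite").saveAsTable("sales.customer_revenue")
    spark.stop()
  }
}
```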

Confidential, Dallas, TX

Big Data Engineer

Responsibilities:

  • Developed solutions to ingest data into HDFS (Hadoop Distributed File System), process it within Hadoop and emit summary results from Hadoop to downstream systems.
  • Developed a wrapper script around the Teradata Connector for Hadoop (TDCH) to support optional parameters.
  • Used Sqoop extensively to ingest data from various source systems into HDFS.
  • Used Hive to quickly produce results for the requested reports.
  • Played a major role in working with the team to leverage Sqoop for extracting data from Teradata.
  • Imported data from different relational data sources like Oracle and Teradata to HDFS using Sqoop.
  • Integrated HiveServer2 with Tableau using the Cloudera Hive ODBC driver for auto-generation of Hive queries for non-technical business users.
  • Integrated data from multiple sources (SQL Server, DB2, Teradata) into the Hadoop cluster and analyzed it through Hive-HBase integration.
  • Involved in Hive-HBase integration by creating Hive external tables with storage specified as HBase format (a sketch follows this list).
  • Implemented data validation using MapReduce programs to remove unnecessary records before moving data into Hive tables.
  • Developed Pig UDFs for needed functionality, such as a custom Pig loader known as the timestamp loader.
  • Involved in writing optimized Pig scripts and in developing and testing Pig Latin scripts.
  • Worked on custom Pig loader and storage classes to work with a variety of data formats such as JSON and compressed CSV.
  • Used Oozie and ZooKeeper to automate job flow and cluster coordination, respectively.
  • Involved in moving log files generated by various sources to HDFS through Flume for further processing.
  • Worked on different file formats like text files, Parquet, Sequence Files, Avro and Record Columnar (RC) files.
  • Developed several shell scripts that act as wrappers to start these Hadoop jobs and set the configuration parameters.
  • Implemented Kerberos security to safeguard the cluster.
  • Worked on a stand-alone as well as a distributed Hadoop application.
  • Tested the performance of the data sets on various NoSQL databases.
  • Understood complex data structures of different types (structured, semi-structured) and de-normalized them for storage in Hadoop.
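A hedged sketch of the Hive-HBase integration DDL referenced above, shown here being issued through the HiveServer2 JDBC driver from Scala purely for illustration; the HiveServer2 URL, table name, column family and column names are assumptions, and the project itself may have run equivalent DDL directly from the Hive CLI.

```scala
import java.sql.DriverManager

object HiveHBaseTable {
  def main(args: Array[String]): Unit = {
    // Placeholder HiveServer2 endpoint and credentials (Hive JDBC driver must be on the classpath)
    val conn = DriverManager.getConnection(
      "jdbc:hive2://hiveserver2.example.com:10000/default", "hive", "")
    val stmt = conn.createStatement()

    // Hive external table backed by an existing HBase table via the HBase storage handler;
    // hbase.columns.mapping ties Hive columns to the HBase row key and the "info" column family
    stmt.execute(
      """CREATE EXTERNAL TABLE IF NOT EXISTS hbase_customers (
        |  rowkey STRING,
        |  name   STRING,
        |  city   STRING
        |)
        |STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
        |WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:name,info:city")
        |TBLPROPERTIES ("hbase.table.name" = "customers")""".stripMargin)

    stmt.close()
    conn.close()
  }
}
```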

Environment: Hadoop, HDFS, Pig, Flume, Hive, MapReduce, Sqoop, Oozie, Zookeeper, HBase, Java Eclipse, SQL Server, Shell Scripting.

Confidential

Data Engineer

Responsibilities:

  • Involved in end-to-end design, development, integration and testing of the historical and incremental data loads.
  • Wrote COPY scripts to load data from Amazon S3 into Redshift (a minimal sketch follows this list).
  • Wrote UNLOAD scripts to unload data from Redshift tables to S3 buckets based on the client requirement.
  • Automated the incremental data load from source to target tables using data pipelines.
  • Wrote shell scripts for file-level operations and other tasks.
  • Validated the sample data loaded into Redshift at the count, column and row level.
  • Trained internally on the Hadoop ecosystem and actively participated in the installation of Hadoop on a 24-node cluster.
  • Involved in the requirement analysis phase, installing Hadoop and setting up the Hadoop cluster.
  • Loaded data into HDFS and transferred large data sets from RDBMS sources to HDFS using the Sqoop tool.
  • Involved in Flume agent setup for data collection from an HTTP source to an HDFS sink.
  • Used Hadoop shell commands, wrote MapReduce programs and verified the Hadoop log files.
  • Wrote MapReduce programs in Java and Pig scripts for data processing on HDFS and created structured data.
  • Created Hive external partitioned tables and Hive UDFs. Actively involved in configuring Hive server integration with Tableau.
  • Knowledge of scheduling Hive jobs through the Azkaban scheduler on a weekly and monthly basis.
  • Wrote Unix shell scripts and cron jobs for scheduling Pig scripts.
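A hedged Scala sketch of the Redshift COPY and UNLOAD scripts described above, issued here through a JDBC connection; the cluster endpoint, credentials, IAM role ARN, schema/table/column names and S3 paths are all placeholder assumptions.

```scala
import java.sql.DriverManager

object RedshiftLoadUnload {
  def main(args: Array[String]): Unit = {
    // Placeholder Redshift endpoint and credentials (Redshift JDBC driver must be on the classpath)
    val conn = DriverManager.getConnection(
      "jdbc:redshift://example-cluster.abc123.us-east-1.redshift.amazonaws.com:5439/dev",
      "etl_user", "etl_password")
    val stmt = conn.createStatement()

    // Bulk-load gzipped CSV files from S3 into a Redshift table
    stmt.execute(
      """COPY analytics.daily_sales
        |FROM 's3://example-bucket/incoming/daily_sales/'
        |IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
        |FORMAT AS CSV
        |IGNOREHEADER 1
        |GZIP""".stripMargin)

    // Export the previous day's rows back to S3 as gzipped, pipe-delimited files
    stmt.execute(
      """UNLOAD ('SELECT * FROM analytics.daily_sales WHERE sale_date = CURRENT_DATE - 1')
        |TO 's3://example-bucket/exports/daily_sales_'
        |IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
        |DELIMITER '|'
        |GZIP
        |ALLOWOVERWRITE""".stripMargin)

    stmt.close()
    conn.close()
  }
}
```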

Confidential

Support Analyst

Responsibilities:

  • Debugging the stored procedures.
  • Creating and developing the SSIS packages.
  • Writing stored procedures for implementing the business logic.
  • Writing views to get the desired data outputs.
  • Monitoring the jobs in Maestro.
  • Resolving the issues in packages causing job failures.
  • Following the process for IT alerts.
  • Communicating with other teams for issue resolution.
  • Supporting issues across multiple applications.
  • Providing batch status updates.
  • Attending end-user calls and providing solutions.
