Hadoop Developer Resume

Irving, TX

SUMMARY

  • Around 7 years of professional IT experience in analyzing requirements and designing and building highly distributed, mission-critical products and applications.
  • Strong end-to-end experience in Hadoop development and in implementing Big Data technologies.
  • Expertise in core Hadoop and its ecosystem tools like Spark, Cassandra, HDFS, MapReduce, Oozie, Hive, Sqoop, Pig, Flume, Kafka, HBase and ZooKeeper.
  • Experience with NoSQL databases like Cassandra, HBase and MongoDB and their integration with Hadoop clusters.
  • Experienced in Spark Core, Spark SQL and Spark Streaming.
  • Implemented Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs/MapReduce in Spark for data aggregation and queries, and moved data into HDFS through Sqoop (see the sketch after this list).
  • Developed fan-out workflows using Flume and Kafka to ingest data from sources such as web servers and REST APIs, landing the data in Hadoop via the HDFS sink.
  • Exposure to Cassandra CQL with Java APIs to retrieve data from Cassandra tables.
  • Experience with the Oozie workflow scheduler, managing Hadoop jobs as DAGs of actions with control flows.
  • Experience in using ZooKeeper and Oozie operational services to coordinate clusters and schedule workflows.
  • Developed pipelines for continuous data ingestion using Kafka and Spark Streaming.
  • Experienced in using Spark SQL and Spark DataFrames to cleanse and integrate data.
  • Hands-on experience developing Scala scripts using DataFrames/Spark SQL/Datasets and RDDs/MapReduce in Spark for data aggregation and queries.
  • Analyzed the Cassandra database and compared it with other open-source NoSQL databases.
  • Worked with Spark on parallel computing, building experience with RDDs over Cassandra data.
  • Good understanding of and working experience with Hadoop distributions like Cloudera and Hortonworks.
  • Experience in managing a large shared MongoDB cluster and managing the MongoDB life cycle, including sizing, automation, monitoring and tuning.
  • Developed Python scripts to monitor the health of Mongo databases and perform ad-hoc backups using mongodump and mongorestore.
  • Experience with Apache Tez on Hive and Pig to achieve better response times than plain MapReduce jobs.
  • Experience with Hadoop deployment and automation tools such as Ambari, Cloudbreak, and EMR.
  • Hands-on experience analyzing data using HiveQL, Pig Latin, and custom MapReduce programs.
  • Experienced in working with structured data using HiveQL, JOIN operations, Hive UDFs, partitions, bucketing and internal/external tables.
  • Implemented modules using Amazon cloud components S3, EC2, Elastic Beanstalk and SimpleDB.
  • Hands-on experience creating custom UDFs for Pig and Hive to bring Python/Java logic into Pig Latin and HiveQL.
  • Experienced with full-text search; implemented data querying with faceted search using Solr.
  • Hands-on experience with monitoring tools; checked cluster status using Cloudera Manager.
  • Implemented data quality processing in the ETL tool Talend; good knowledge of data warehousing and ETL tools like IBM DataStage, Informatica and Talend.
  • Knowledge of integrating Kerberos into Hadoop to harden the cluster and secure it against unauthorized users.
  • Monitored Hadoop cluster connectivity and security using the Ambari monitoring system.
  • Experienced in backend development using SQL and stored procedures on Oracle 10g and 11g.
  • Good working knowledge of multithreaded Core Java, J2EE, JDBC, jQuery, JavaScript and web services (SOAP, REST).
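A minimal sketch of the Spark/Scala work summarized above: registering a UDF, running a DataFrame aggregation, and writing the result to HDFS. The table name, column names and output path are hypothetical placeholders, not taken from any actual project.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, sum, udf}

object SalesAggregation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SalesAggregation")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical UDF: normalize a region code before grouping
    val normalizeRegion =
      udf((region: String) => Option(region).map(_.trim.toUpperCase).getOrElse("UNKNOWN"))

    // Hypothetical Hive staging table
    val sales = spark.table("staging.sales_events")

    val totals = sales
      .withColumn("region", normalizeRegion(col("region")))
      .groupBy("region")
      .agg(sum("amount").alias("total_amount"))

    // Persist the aggregated result to HDFS as Parquet (hypothetical path)
    totals.write.mode("overwrite").parquet("hdfs:///data/curated/sales_totals")

    spark.stop()
  }
}
```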

TECHNICAL SKILLS

Big Data/Hadoop Ecosystem: Hortonworks, Cloudera, Apache, EMR, Hive, Pig, Sqoop, Spark, Kafka, Oozie, Flume, Zookeeper, NiFi, Impala, Tez

NoSQL Databases: Cassandra, HBase, MongoDB

Cloud Services: Amazon AWS

Languages: SQL, PL/SQL, Pig Latin, HiveQL, UNIX Shell Scripting, C, Java, Python, Scala

ETL Tools: Informatica, IBM DataStage, Talend

Java/J2EE Technologies: Core Java, Servlets, Hibernate, Spring, Struts, JSP, JDBC, EJB

Application Servers: WebLogic, WebSphere, Tomcat

BI Tools: Tableau, Splunk

Databases: Oracle, MySQL, DB2, Teradata, MS SQL Server

Operating Systems: UNIX, Windows, iOS, Linux

IDEs: IntelliJ IDEA, Eclipse, NetBeans

PROFESSIONAL EXPERIENCE

Confidential - Irving, TX

Hadoop Developer

Responsibilities:

  • Involved in loading data from the UNIX file system into HDFS.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Developed Oozie workflows for daily incremental loads, which pull data from Teradata and import it into Hive tables.
  • Worked extensively on Apache NiFi as an ETL tool for batch and real-time processing.
  • Involved in ETL, data integration and migration; used Sqoop to load data from Oracle to HDFS on a regular basis.
  • Managed and reviewed Hadoop log files; took part in deploying and maintaining the Hadoop cluster.
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX and NoSQL sources.
  • Worked on different file formats like Text, ORC, Avro and Parquet, and compression techniques like snappy, gzip and zlib.
  • Experience creating, dropping, and altering HBase tables at run time without blocking updates and queries.
  • Wrote shell scripts to run multiple Hive jobs, automating incremental loads of the different Hive tables.
  • Imported data from different data sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and loaded the results back into HDFS.
  • Executed Hive queries using the Hive command line and the Hue web GUI to read, write and query data in HBase.
  • Used Flume to load log data into HDFS.
  • Used Pig to convert fixed-width files to delimited files.
  • Developed workflows in Apache NiFi to ingest, prepare and publish data.
  • Developed Kafka producer and consumer components for real-time data processing (see the sketch below).
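
A minimal sketch, assuming a plain kafka-clients setup, of the kind of producer and consumer components described in the last bullet; the broker address, topic and group id are hypothetical.

```scala
import java.time.Duration
import java.util.Properties
import scala.collection.JavaConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object LogEventsPipeline {
  val topic = "log-events"        // hypothetical topic
  val brokers = "localhost:9092"  // hypothetical broker list

  // Publish one log event as a key/value string record
  def produce(): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", brokers)
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    producer.send(new ProducerRecord[String, String](topic, "host-01", """{"level":"INFO","msg":"started"}"""))
    producer.close()
  }

  // Poll the same topic and print whatever arrives within five seconds
  def consume(): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", brokers)
    props.put("group.id", "log-events-consumers")
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(List(topic).asJava)
    consumer.poll(Duration.ofSeconds(5)).asScala.foreach(r => println(s"${r.key} -> ${r.value}"))
    consumer.close()
  }
}
```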

Environment: Apache NiFi, Hive, Pig, HDFS, Hortonworks, Flume, Zookeeper, Sqoop, Oozie, RDBMS, Teradata, Apache Zeppelin, Shell Scripts, NoSQL, Java, FileZilla, PuTTY, PostgreSQL.

Confidential - Stamford, CT

Spark/Scala Developer

Responsibilities:

  • Imported data from Cassandra NoSQL databases and stored it in AWS.
  • Performed transformations on the data using different Spark modules.
  • Responsible for Spark Core configuration based on the type of input source.
  • Wrote Spark code in Scala, using Spark Streaming and Spark SQL for faster processing of data.
  • Performed SQL joins among Hive tables to get input for the Spark batch process.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
  • Developed Kafka producers and set up brokers for message handling.
  • Analyzed the Cassandra database and compared it with other open-source NoSQL databases.
  • Involved in importing the data to Hadoop using Kafka and implemented the Oozie job for daily imports.
  • Created partitions and buckets based on state to further process using bucket-based Hive joins.
  • Used the AWS CLI for data transfers to and from Amazon S3 buckets.
  • Involved in setup of Kafka producers and consumers along with Kafka brokers and topics.
  • Developed a data pipeline using Kafka and Storm to store data into HDFS.
  • Executed Hadoop/Spark jobs on AWS EMR, with data stored in S3 buckets.
  • Enabled vectorized query execution in Hive to process batches of rows at a time.
  • Used various Hive optimization techniques like partitioning and bucketing.
  • Performed ad-hoc queries on structured data using HiveQL and used skew joins, map-side joins, and sort-merge-bucket (SMB) joins in Hive for faster data access.
  • Pulled data from Amazon S3 buckets into the data lake, built Hive tables on top of it, and created DataFrames in Spark to perform further analysis.
  • Implemented Spark RDD transformations and actions to carry out business analysis.
  • Worked with data serialization formats (Avro, Parquet, JSON, CSV) for converting complex objects into byte sequences.
  • Developed Spark scripts using Scala as per requirements.
  • Involved in converting Cassandra/Hive/SQL queries into Spark transformations using Spark RDDs in Scala and Python.
  • Worked with Cassandra for non-relational data storage and retrieval.
  • Used Amazon CloudWatch to monitor and track resources on AWS.
  • Used Spark Streaming APIs to perform transformations on the data from Kafka in real time and persist it into Cassandra (see the sketch after this list).
  • Worked on analyzing and examining customer behavioral data using Cassandra.
  • Automated event-driven and time-based jobs using Oozie workflows.
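
A minimal sketch of the Kafka-to-Cassandra streaming flow described above, assuming the spark-streaming-kafka-0-10 integration and the DataStax spark-cassandra-connector; the topic, keyspace, table and message layout are hypothetical.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import com.datastax.spark.connector._   // adds saveToCassandra on RDDs

// Hypothetical row model: keyspace "events", table "user_clicks"
case class UserClick(userId: String, ts: Long, page: String)

object KafkaToCassandra {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("KafkaToCassandra")
      .set("spark.cassandra.connection.host", "127.0.0.1")  // hypothetical Cassandra host
    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",              // hypothetical brokers
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "clicks-stream",
      "auto.offset.reset" -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("user-clicks"), kafkaParams))

    // Parse CSV-style messages ("userId,timestamp,page") and persist each micro-batch to Cassandra
    stream.map(_.value.split(","))
      .flatMap {
        case Array(user, ts, page) => Seq(UserClick(user, ts.toLong, page))
        case _                     => Seq.empty[UserClick]
      }
      .foreachRDD(rdd => rdd.saveToCassandra("events", "user_clicks"))

    ssc.start()
    ssc.awaitTermination()
  }
}
```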

Environment: Cassandra, Kafka, Spark 2.3, Pig, Hive, Oozie, AWS, Solr, NoSQL, SQL, Scala 2.11, Python, Java, FileZilla, PuTTY, IntelliJ, GitHub.

Confidential - Omaha, NE

Hadoop Developer

Responsibilities:

  • Created reports for the BI team using Sqoop to export data into HDFS and Hive.
  • Migrated HiveQL queries to Impala to minimize query response time.
  • Responsible for analyzing the performance of Hive queries using Impala.
  • Developed an Apache Flume client to send data as events to the Flume server and store it in files and HDFS.
  • Developed Flume configurations to extract log data from different sources and transfer data in different file formats (JSON, XML, Parquet) to Hive tables using different SerDes.
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
  • Involved in developing Hive DDLs to create, alter and drop Hive tables.
  • Developed Spark applications for data transformations and loading into HDFS using RDDs and DataFrames (see the sketch after this list).
  • Enabled concurrent access to Hive tables with shared and exclusive locking, backed by the ZooKeeper ensemble in the cluster.
  • Applied performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
  • Developed the Sqoop scripts to make the interaction between HDFS and RDBMS (Oracle, MySQL).
  • Planned, deployed, monitored, and maintained Amazon AWS cloud infrastructure consisting of multiple EC2 nodes.
  • Stored and loaded data between HDFS and Amazon S3 and backed up the namespace data to NFS.
  • Implemented NameNode metadata backup using NFS for high availability.
  • Developed ETL processes to transfer data from different sources, using Sqoop, Impala, and bash.
  • Used Flume to stream through the log data from various sources.
  • Configured Flume to extract the data from the web server output files to load into HDFS.
  • Designed and implemented the MongoDB schema.
  • Used Pig to perform data validation on the data ingested using Sqoop and Flume, and pushed the cleansed data set into MongoDB.
  • Wrote services to store and retrieve user data from MongoDB for the application on devices.
  • Used the Mongoose API to access MongoDB from Node.js.
  • Implemented Spark operations on RDDs and converted Hive/SQL queries into Spark transformations using RDDs.
  • Wrote shell scripts to monitor the health of Hadoop daemon services and respond to any warning or failure conditions.
  • Wrote shell scripts to automate rolling day-to-day processes.
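
A minimal sketch of converting a HiveQL query into equivalent Spark DataFrame transformations and writing the result to HDFS, as in the Spark bullets above; table names, columns and the output path are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, col}

object HiveQueryToSpark {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveQueryToSpark")
      .enableHiveSupport()
      .getOrCreate()

    // Original HiveQL (hypothetical tables):
    //   SELECT c.state, AVG(o.amount) AS avg_amount
    //   FROM orders o JOIN customers c ON o.customer_id = c.id
    //   WHERE o.status = 'SHIPPED'
    //   GROUP BY c.state

    val orders    = spark.table("sales.orders")
    val customers = spark.table("sales.customers")

    // Equivalent DataFrame transformations
    val avgByState = orders
      .filter(col("status") === "SHIPPED")
      .join(customers, orders("customer_id") === customers("id"))
      .groupBy(customers("state"))
      .agg(avg(orders("amount")).alias("avg_amount"))

    // Load the result into HDFS as Parquet (hypothetical path)
    avgByState.write.mode("overwrite").parquet("hdfs:///data/out/avg_order_by_state")

    spark.stop()
  }
}
```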

Environment: Apache Flume, Hive, Pig, HDFS, Zookeeper, Sqoop, RDBMS, AWS, Cloudera, MongoDB, Shell Scripts, Eclipse.

Confidential, Irving, TX

Java Developer

Responsibilities:

  • Involved in End to End Design and Development of UI Layer, Service Layer, and Persistence Layer.
  • Implemented Spring MVC for designing and implementing the UI Layer for the application.
  • Implemented UI screens using JSF for defining and executing UI flow in the application.
  • Used AJAX to retrieve data from the server asynchronously in the background, without interfering with the display of the existing page, for an interactive user experience.
  • Used DWR (Direct Web Remoting) generated scripts to make AJAX calls to Java.
  • Worked with JavaScript for dynamic manipulation of the elements on the screen and to validate the input.
  • Involved in writing Spring Validator Classes for validating the input data.
  • Used JAXB to marshal and unmarshal Java objects to communicate with the backend mainframe system.
  • Involved in writing complex PL/SQL and SQL blocks for the application.
  • Involved in Unit Testing, User Acceptance Testing and Bug Fixing.

Environment: MVC Architecture, Spring IoC, Hibernate/JPA, Java Web Services (REST), Spring Batch, XML, JavaScript, Oracle 10g database, JSF UI, JUnit.

Confidential

Java Developer

Responsibilities:

  • Used Servlets, Struts, JSP and JavaBeans for developing the Performance module using legacy code.
  • Involved in coding for JSP pages, Form Beans and Action Classes in Struts.
  • Created Custom Tag Libraries to support the Struts framework.
  • Involved in Writing Validations.
  • Involved in Database Connectivity through JDBC.
  • Implemented JDBC for mapping an object-oriented domain model to a traditional relational database.
  • Created SQL queries, sequences, and views for the backend Oracle database.
  • Developed JUnit Test cases and performed application testing for QC team.
  • Used JavaScript for client-side validations.
  • Developed dynamic JSP pages with Struts.
  • Participated in weekly project meetings and updates, and provided estimates for assigned tasks.

Environment: Java, J2EE, Struts, JSP, JDBC, ANT, XML, IBM WebSphere, WSAD, JUnit, DB2, Rational Rose, CVS, SOAP, and RUP.
