Hadoop Developer Resume
Irving, TX
SUMMARY
- Around 7 years of professional IT experience in analyzing requirements and designing and building highly distributed, mission-critical products and applications.
- Strong end-to-end experience in Hadoop development and implementing Big Data technologies.
- Expertise in core Hadoop and its ecosystem tools like Spark, Cassandra, HDFS, MapReduce, Oozie, Hive, Sqoop, Pig, Flume, Kafka, HBase, and ZooKeeper.
- Experience with NoSQL databases like Cassandra, HBase, and MongoDB and their integration with Hadoop clusters.
- Experienced in Spark Core, Spark SQL and Spark Streaming.
- Implemented Scala scripts and UDFs using both DataFrames/SQL and RDDs/MapReduce in Spark for data aggregation and queries, and wrote data into HDFS through Sqoop.
- Developed fan-out workflows using Flume and Kafka to ingest data from sources such as web servers and REST APIs via network sources, landing the data in Hadoop through an HDFS sink.
- Exposure to Cassandra CQL with Java APIs to retrieve data from Cassandra tables.
- Experience with the Oozie workflow scheduler to manage Hadoop jobs as DAGs of actions with control flow nodes.
- Experience in using ZooKeeper and Oozie operational services to coordinate clusters and schedule workflows.
- Developed pipelines for continuous data ingestion using Kafka and Spark Streaming.
- Experienced in using Spark SQL and Spark DataFrames to cleanse and integrate data.
- Hands-on experience developing Scala scripts using DataFrames/SQL/Datasets and RDDs/MapReduce in Spark for data aggregation and queries (see the sketch following this list).
- Analyzed Cassandra and compared it with other open-source NoSQL databases.
- Worked with Spark on parallel computing to deepen knowledge of RDDs backed by Cassandra.
- Good understanding of and working experience with Hadoop distributions like Cloudera and Hortonworks.
- Experience managing large sharded MongoDB clusters and the MongoDB life cycle, including sizing, automation, monitoring, and tuning.
- Developed Python scripts to monitor the health of Mongo databases and perform ad-hoc backups using mongodump and mongorestore.
- Experience with Apache Tez on Hive and Pig to achieve better response times than plain MapReduce jobs.
- Experience with Hadoop deployment and automation tools such as Ambari, Cloudbreak, and EMR.
- Hands-on experience analyzing data using HiveQL, Pig Latin, and custom MapReduce programs.
- Experienced in working with structured data using HiveQL, join operations, Hive UDFs, partitions, bucketing, and internal/external tables.
- Implemented modules using Amazon cloud components: S3, EC2, Elastic Beanstalk, and SimpleDB.
- Hands-on experience creating custom UDFs for Pig and Hive to bring Python/Java functionality into Pig Latin and HiveQL.
- Experienced with full-text search and implemented data querying with faceted search using Solr.
- Hands-on experience with monitoring tools to check cluster status using Cloudera Manager.
- Implemented data quality checks in the ETL tool Talend; good knowledge of data warehousing and ETL tools like IBM DataStage, Informatica, and Talend.
- Knowledge of integrating Kerberos into Hadoop to harden the cluster and secure it against unauthorized access.
- Monitored Hadoop cluster connectivity and security using the Ambari monitoring system.
- Experienced in backend development using SQL and stored procedures on Oracle 10g and 11g.
- Good working knowledge of multithreaded Core Java, J2EE, JDBC, jQuery, JavaScript, and web services (SOAP, REST).
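The following is a minimal sketch of the kind of Spark aggregation work described above, assuming a Spark 2.x session with Hive support; the table, column, and path names are hypothetical illustrations rather than project artifacts.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DailyAggregation {
  def main(args: Array[String]): Unit = {
    // Spark session with Hive support so existing Hive tables are visible
    val spark = SparkSession.builder()
      .appName("daily-aggregation")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Hypothetical source table previously landed in Hive (e.g., via Sqoop)
    val orders = spark.table("staging.orders")

    // DataFrame/SQL-style aggregation: totals per customer per day
    val dailyTotals = orders
      .filter($"status" === "COMPLETED")
      .groupBy($"customer_id", to_date($"order_ts").as("order_date"))
      .agg(sum($"amount").as("total_amount"), count(lit(1)).as("order_count"))

    // Write the result back to HDFS as Parquet, partitioned by date
    dailyTotals.write
      .mode("overwrite")
      .partitionBy("order_date")
      .parquet("hdfs:///warehouse/aggregates/daily_customer_totals")

    spark.stop()
  }
}
```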
TECHNICAL SKILLS
Big Data/Hadoop Ecosystem: Hortonworks, Cloudera, Apache Hadoop, EMR, Hive, Pig, Sqoop, Spark, Kafka, Oozie, Flume, ZooKeeper, NiFi, Impala, Tez
NoSQL Databases: Cassandra, HBase, MongoDB
Cloud Services: Amazon AWS
Languages: SQL, PL/SQL, Pig Latin, HiveQL, UNIX Shell Scripting, C, Java, Python, Scala
ETL Tools: Informatica, IBM DataStage, Talend
Java/J2EE Technologies: Core Java, Servlets, Hibernate, Spring, Struts, JSP, JDBC, EJB
Application Servers: WebLogic, WebSphere, Tomcat
BI Tools: Tableau, Splunk
Databases: Oracle, MySQL, DB2, Teradata, MS SQL Server
Operating Systems: UNIX, Windows, iOS, Linux
IDE’s: IntelliJ IDEA, Eclipse, NetBeans
PROFESSIONAL EXPERIENCE
Confidential - Irving, TX
Hadoop Developer
Responsibilities:
- Involved in loading data from the UNIX file system to HDFS.
- Responsible for building scalable distributed data solutions using Hadoop.
- Developed Oozie workflows for daily incremental loads, which get data from Teradata and import it into Hive tables.
- Worked extensively on Apache NiFi as an ETL tool for batch and real-time processing.
- Involved in ETL, data integration, and migration, and used Sqoop to load data from Oracle to HDFS on a regular basis.
- Managed and reviewed Hadoop log files and took part in deploying and maintaining the Hadoop cluster.
- Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX file systems and NoSQL sources.
- Worked on different file formats like Text, ORC, Avro, and Parquet, and compression codecs like Snappy, gzip, and zlib.
- Experience creating, dropping, and altering HBase tables at run time without blocking updates and queries.
- Wrote shell scripts to run multiple Hive jobs that load different Hive tables incrementally.
- Imported data from different sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and loaded the processed data back into HDFS.
- Executed Hive queries using the Hive command line and the Hue web GUI to read, write, and query data in HBase.
- Used Flume to load log data into HDFS.
- Used Pig to convert fixed-width files to delimited files.
- Developed workflows in Apache NiFi to ingest, prepare, and publish data.
- Developed Kafka producer and consumer components for real-time data processing (see the sketch at the end of this section).
Environment: Apache NiFi, Hive, Pig, HDFS, Hortonworks, Flume, ZooKeeper, Sqoop, Oozie, RDBMS, Teradata, Apache Zeppelin, Shell Scripts, NoSQL, Java, FileZilla, PuTTY, PostgreSQL.
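A minimal sketch of a Kafka producer of the kind referenced above, written in Scala against the standard Kafka clients API; the broker list, topic name, and log source are hypothetical.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer

object WebLogProducer {
  def main(args: Array[String]): Unit = {
    // Standard producer configuration; broker list is a placeholder
    val props = new Properties()
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092")
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
    props.put(ProducerConfig.ACKS_CONFIG, "all") // wait for full acknowledgment for durability

    val producer = new KafkaProducer[String, String](props)
    try {
      // Each web-server log line becomes one message on a hypothetical "weblogs" topic
      scala.io.Source.stdin.getLines().foreach { line =>
        producer.send(new ProducerRecord[String, String]("weblogs", line))
      }
    } finally {
      producer.flush()
      producer.close()
    }
  }
}
```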
Confidential - Stamford, CT
Spark/Scala Developer
Responsibilities:
- Imported data from Cassandra NoSQL databases and stored it in AWS.
- Performed transformations on the data using different Spark modules.
- Responsible for Spark Core configuration based on the type of input source.
- Executed Spark code written in Scala for Spark Streaming/Spark SQL for faster processing of data.
- Performed SQL joins among Hive tables to get input for the Spark batch process.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
- Developed Kafka producers and configured brokers for message handling.
- Analyzed Cassandra and compared it with other open-source NoSQL databases.
- Involved in importing data to Hadoop using Kafka and implemented Oozie jobs for daily imports.
- Created partitions and buckets based on state to enable bucket-based Hive joins for further processing.
- Used the AWS CLI for data transfers to and from Amazon S3 buckets.
- Involved in setup of Kafka producers and consumers along with Kafka brokers and topics.
- Developed a data pipeline using Kafka and Storm to store data in HDFS.
- Executed Hadoop/Spark jobs on AWS EMR, with data stored in S3 buckets.
- Used Hive vectorized query execution to process batches of rows.
- Used various Hive optimization techniques like partitioning and bucketing.
- Performed ad-hoc queries on structured data using HiveQL and used joins like skew joins, map-side joins, and SMB joins in Hive for faster data access.
- Pulled data from Amazon S3 buckets into the data lake, built Hive tables on top of it, and created DataFrames in Spark to perform further analysis.
- Implemented Spark RDD transformations and actions to implement business analysis logic.
- Worked with data serialization formats (Avro, Parquet, JSON, CSV) for converting complex objects into sequences of bytes.
- Developed Spark scripts using Scala as per requirements.
- Involved in converting Cassandra/Hive/SQL queries into Spark transformations using Spark RDDs in Scala and Python.
- Worked with Cassandra for non-relational data storage and retrieval.
- Used Amazon CloudWatch to monitor and track resources on AWS.
- Used Spark Streaming APIs to perform transformations on data from Kafka in real time and persist it into Cassandra (see the sketch at the end of this section).
- Worked on analyzing and examining customer behavioral data using Cassandra.
- Experienced in automating event-driven and time-based jobs using Oozie workflows.
Environment: Cassandra, Kafka, Spark 2.3, Pig, Hive, Oozie, AWS, Solr, NoSQL, SQL, Scala 2.11, Python, Java, FileZilla, PuTTY, IntelliJ, GitHub.
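A minimal sketch of the Kafka-to-Cassandra streaming flow referenced above, assuming Spark Streaming with the spark-streaming-kafka-0-10 integration and the DataStax Spark Cassandra connector; the topic, keyspace, table, message layout, and host names are hypothetical.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import com.datastax.spark.connector._ // adds saveToCassandra on RDDs

object EventsToCassandra {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("events-to-cassandra")
      .set("spark.cassandra.connection.host", "cassandra-host") // placeholder host
    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "events-consumer",
      "auto.offset.reset"  -> "latest"
    )

    // Direct stream from a hypothetical "events" topic
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Parse simple CSV-style messages and persist each micro-batch into Cassandra
    stream.map(record => record.value.split(","))
      .flatMap {
        case Array(id, ts, amount) => Seq((id, ts, amount.toDouble))
        case _                     => Seq.empty
      }
      .foreachRDD { rdd =>
        rdd.saveToCassandra("analytics", "events", SomeColumns("id", "event_ts", "amount"))
      }

    ssc.start()
    ssc.awaitTermination()
  }
}
```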
Confidential - Omaha, NE
Hadoop Developer
Responsibilities:
- Created reports for the BI team by using Sqoop to move data into HDFS and Hive.
- Migrated Hive QL into Impala to minimize query response time.
- Responsible for analyzing the performance of Hive queries using Impala.
- Developed an Apache Flume client to send data as events to the Flume server, to be stored in files and HDFS.
- Developed Flume configurations to extract log data from different sources and transfer data in different file formats (JSON, XML, Parquet) to Hive tables using different SerDes.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Involved in developing Hive DDLs to create, alter, and drop Hive tables.
- Developed Spark applications for data transformations and loading into HDFS using RDDs and DataFrames.
- Created concurrent access for Hive tables with shared and exclusive locking, enabled in Hive with the help of the ZooKeeper deployment in the cluster.
- Performed various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Developed Sqoop scripts to move data between HDFS and RDBMSs (Oracle, MySQL).
- Plan, deploy, monitor, and maintain Amazon AWS cloud infrastructure consisting of multiple EC2 nodes.
- Stored and loaded data from HDFS to Amazon S3 and backed up the namespace data to NFS.
- Implemented NameNode backup using NFS for high availability.
- Developed ETL processes to transfer data from different sources, using Sqoop, Impala, and bash.
- Used Flume to stream through the log data from various sources.
- Configured Flume to extract the data from the web server output files to load into HDFS.
- Designed and implemented the MongoDB schema.
- Used Pig to perform data validation on the data ingested with Sqoop and Flume, and pushed the cleansed data set into MongoDB.
- Wrote services to store and retrieve user data from the MongoDB for the application on devices.
- Used the Mongoose API to access MongoDB from Node.js.
- Implemented Spark operations on RDDs and converted Hive/SQL queries into Spark transformations using RDDs (see the sketch at the end of this section).
- Wrote shell scripts to monitor Hadoop daemon services and respond to any warning or failure conditions.
- Wrote shell scripts to automate rolling day-to-day processes.
Environment: Apache Flume, Hive, Pig, HDFS, ZooKeeper, Sqoop, RDBMS, AWS, Cloudera, MongoDB, Shell Scripts, Eclipse.
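A minimal sketch of rewriting a Hive-style aggregation as Spark RDD transformations, as referenced above; the HDFS paths and log layout are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Roughly equivalent to a Hive query like:
//   SELECT page, COUNT(*) FROM web_logs WHERE status = 200 GROUP BY page;
// expressed as RDD transformations over raw files in HDFS.
object PageHitCounts {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("page-hit-counts"))

    val logs = sc.textFile("hdfs:///data/web_logs/*")

    val hitsPerPage = logs
      .map(_.split("\t"))                        // tab-delimited: ts, page, status, ...
      .filter(f => f.length >= 3 && f(2) == "200")
      .map(f => (f(1), 1L))                      // key by page
      .reduceByKey(_ + _)                        // COUNT(*) ... GROUP BY page

    hitsPerPage.saveAsTextFile("hdfs:///output/page_hit_counts")
    sc.stop()
  }
}
```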
Confidential, Irving, TX
Java Developer
Responsibilities:
- Involved in End to End Design and Development of UI Layer, Service Layer, and Persistence Layer.
- Implemented Spring MVC for designing and implementing the UI Layer for the application.
- Implemented UI screens using JSF for defining and executing UI flow in the application.
- Used AJAX to retrieve data from the server asynchronously in the background without interfering with the display of the existing page.
- Used DWR (Direct Web Remoting) generated scripts to make AJAX calls to Java.
- Worked with JavaScript for dynamic manipulation of the elements on the screen and to validate the input.
- Involved in writing Spring Validator Classes for validating the input data.
- Used JAXB to marshal and unmarshal Java objects to communicate with the backend mainframe system.
- Involved in writing complex PL/SQL and SQL blocks for the application.
- Involved in Unit Testing, User Acceptance Testing and Bug Fixing.
Environment: MVC architecture, Spring IoC, Hibernate/JPA, Java REST web services, Spring Batch, XML, JavaScript, Oracle Database 10g, JSF UI, JUnit.
Confidential
Java Developer
Responsibilities:
- Used Servlets, Struts, JSP, and JavaBeans for developing the Performance module using legacy code.
- Involved in coding for JSP pages, Form Beans and Action Classes in Struts.
- Created Custom Tag Libraries to support the Struts framework.
- Involved in writing validations.
- Involved in database connectivity through JDBC.
- Implemented JDBC data access to map the object-oriented domain model to a traditional relational database.
- Created SQL queries, sequences, and views for the backend Oracle database.
- Developed JUnit Test cases and performed application testing for QC team.
- Used JavaScript for client-side validations.
- Developed dynamic JSP pages with Struts.
- Participated in weekly project meetings and updates, and provided estimates for assigned tasks.
Environment: Java, J2EE, Struts, JSP, JDBC, Ant, XML, IBM WebSphere, WSAD, JUnit, DB2, Rational Rose, CVS, SOAP, and RUP.