Hadoop Developer Resume
Irving, TX
SUMMARY
- Proactive IT developer with 8 years of experience in the design and development of scalable systems using Hadoop technologies across a variety of environments.
- Strong understanding of Hadoop architecture and hands-on experience with Hadoop components such as JobTracker, TaskTracker, NameNode, DataNode, and the HDFS framework.
- Extensive experience in analyzing data with Hadoop ecosystem tools including Sqoop, Flume, Kafka, Storm, HDFS, Hive, Pig, Impala, Oozie, Zookeeper, Solr, NiFi, Spark SQL, and Spark Streaming.
- Capable of processing large sets of structured, semi-structured, and unstructured data and supporting systems and application architecture.
- Configured Zookeeper, Cassandra, and Flume on existing Hadoop clusters.
- Experienced in importing and exporting data with Sqoop between the Hadoop Distributed File System (HDFS) and relational database systems.
- Expertise in writing Hadoop jobs for data analysis using HiveQL, Pig Latin (data flow language), and custom MapReduce programs in Java.
- Created custom UDFs for Pig and Hive to bring Python/Java methods and functionality into Pig Latin and HiveQL.
- Experience in converting Hive queries into Spark transformations using Spark RDDs and Scala (see the sketch following this summary).
- Hands-on experience troubleshooting errors in the HBase shell, Pig, Hive, and MapReduce.
- Hands-on experience provisioning and managing multi-tenant Cassandra clusters in public cloud environments: Amazon Web Services (AWS) EC2 and OpenStack.
- Experience with NoSQL databases such as HBase, Cassandra, and MongoDB, and their integration with Hadoop clusters.
- Experience in maintaining big data platforms using open-source technologies such as Spark and Elasticsearch.
- Experience in installing, configuring, supporting, and managing Hadoop clusters using Hortonworks and Cloudera (CDH3, CDH4) distributions on Amazon Web Services (AWS).
- Experience in configuring Flume agents to transfer data from external systems into HDFS.
- Good understanding of YARN and Mesos.
- Designed and built solutions for real-time data ingestion using Kafka, Storm, Spark Streaming, and various NoSQL databases.
- Experienced in using Apache Hue and Ambari to manage and monitor Hadoop clusters.
- Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs/MapReduce in Spark for data aggregation and queries, and for writing data back into RDBMS through Sqoop.
- Experience in understanding Hadoop security requirements and integrating with Kerberos authentication and authorization infrastructure.
- Good hands-on experience creating RDDs and DataFrames for the required input data and performing data transformations using Spark and Scala.
- Good knowledge of Spark Datasets.
- Knowledge of developing NiFi flow prototypes for data ingestion into HDFS.
- Extensive experience working with Oracle, DB2, SQL Server, MySQL, and PL/SQL, and with core Java concepts such as OOP, multithreading, collections, and I/O.
- Experienced in designing web applications using HTML5, CSS3, JavaScript, JSON, jQuery, AngularJS, Bootstrap, and Ajax on the Windows operating system.
- Experience in service-oriented architecture using SOAP and RESTful web services.
- Knowledge of service-oriented architecture (SOA), workflows, and web services using XML, SOAP, and WSDL.
- Extensive experience in middle-tier development using J2EE technologies such as JDBC, JNDI, JSP, Servlets, JSF, Struts, Spring, Hibernate, and EJB.
- Good interpersonal and communication skills, strong problem-solving skills, quick to explore new technologies, and a good team player.
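As a brief illustration of the Hive-to-Spark conversion mentioned above, the following is a minimal Scala sketch; the table name sales and its columns region and amount are hypothetical placeholders, not taken from any specific project.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveToSparkExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-spark")
      .enableHiveSupport()          // read the same metastore the HiveQL version used
      .getOrCreate()

    // HiveQL version:
    //   SELECT region, SUM(amount) AS total FROM sales GROUP BY region;
    // DataFrame translation of the same query
    val totals = spark.table("sales")
      .groupBy("region")
      .agg(sum("amount").alias("total"))

    // Equivalent RDD-level transformation, for cases needing row-by-row control
    val totalsRdd = spark.table("sales").rdd
      .map(row => (row.getAs[String]("region"), row.getAs[Double]("amount")))
      .reduceByKey(_ + _)

    totals.write.mode("overwrite").saveAsTable("sales_totals_by_region")
    spark.stop()
  }
}
```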
TECHNICAL SKILLS
Big Data Ecosystem: HDFS, MapReduce, Hive, YARN, Pig, Sqoop, Kafka, Storm, Flume, Oozie, Zookeeper, Apache Spark, Apache Tez, Impala, NiFi, Apache Solr, ActiveMQ, Scala
NoSQL Databases: HBase, MongoDB, Cassandra
Programming Languages: C, C++, Java, J2EE, PL/SQL, Pig Latin, Scala, Python
Java/J2EE Technologies: JDBC, JNDI, JSON, JSTL, RMI, JMS, JavaScript, JSP, Servlets, EJB, JSF, jQuery, AngularJS
Frameworks: MVC, Struts, Spring, Hibernate
Operating Systems: Sun Solaris, HP-UNIX, Red Hat Linux, Ubuntu Linux and Windows XP/Vista/7/8
Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP
Web/Application servers: Apache Tomcat, WebLogic, JBoss.
Version control: GIT, SVN, CVS
Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP
PROFESSIONAL EXPERIENCE
Confidential, Irving, TX
Hadoop Developer
Responsibilities:
- Involved in managing nodes on the Hadoop cluster and monitoring Hadoop cluster job performance using Cloudera Manager.
- Optimized Hive queries using various file formats such as JSON, Avro, ORC, and Parquet.
- Worked on Spark RDD transformations to map business analysis requirements and applied actions on top of the transformations.
- Experienced in working with the Spark ecosystem, using Spark SQL and Scala queries on different formats such as text, Avro, and Parquet files.
- Worked on Spark Streaming to consume real-time data from Kafka and store the streamed data in HDFS (a sketch follows this list).
- Developed Pig Latin scripts and Pig command-line transformations for data joins, custom processing of MapReduce outputs, and loading tables from Hadoop to various clusters.
- Built Talend jobs for data ingestion, enrichment, and provisioning.
- Worked on migrating HiveQL queries to Impala to minimize query response time.
- Involved in loading data from edge node to HDFS using shell scripting.
- Worked with Kerberos and integrated it into the Hadoop cluster to strengthen security against unauthorized access.
- Created Hive tables with dynamic partitions and buckets for sampling, and worked on them using HiveQL.
- Worked on Spark using Scala and Spark SQL for faster testing and processing of data.
- Delivered a proof of concept using Kafka and MongoDB for processing streaming data.
- Involved in advanced procedures such as text analytics and processing, using the in-memory computing capabilities of Apache Spark with Scala.
- Loaded huge amounts of data into HDFS using Apache Kafka.
- Implemented Talend jobs to load data from different sources and integrated them with Kafka.
- Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts).
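A minimal sketch of the Kafka-to-HDFS streaming item above, assuming the spark-streaming-kafka-0-10 integration; the broker address, topic name, group id, and output path are placeholders.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-to-hdfs")
    val ssc  = new StreamingContext(conf, Seconds(30))   // 30-second micro-batches

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",            // placeholder broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "hdfs-ingest",
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Persist each non-empty batch of message values under a timestamped HDFS path
    stream.map(_.value).foreachRDD { rdd =>
      if (!rdd.isEmpty())
        rdd.saveAsTextFile(s"hdfs:///data/raw/events/${System.currentTimeMillis()}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```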
Environment: MapReduce, HDFS, Spark, Scala, Kafka, Hive, Pig, Spark Streaming, MongoDB, Maven, Jenkins, UNIX, Python, MRUnit, Git.
Confidential, Perry, Iowa
Hadoop Developer
Responsibilities:
- Worked on Spark SQL to handle structured data in Hive (see the sketch after this list).
- Involved in creating Hive tables, loading data, writing Hive queries, and generating partitions and buckets for optimization.
- Involved in migrating tables from RDBMS into Hive using Sqoop and later generating visualizations using Tableau.
- Worked on complex MapReduce programs to analyze data residing on the cluster.
- Analyzed large data sets by running Hive queries and Pig scripts.
- Wrote Hive UDFs to sort struct fields and return complex data types.
- Worked in an AWS environment for development and deployment of custom Hadoop applications.
- Involved in creating shell scripts to simplify the execution of all other scripts (Pig, Hive, Sqoop, Impala, and MapReduce) and to move data into and out of HDFS.
- Created files and tuned SQL queries in Hive using Hue.
- Involved in collecting and aggregating large amounts of log data using Storm and staging data in HDFS for further analysis.
- Created Hive external tables using the Accumulo connector.
- Developed a NiFi flow prototype for data ingestion into HDFS.
- Managed real-time data processing and ingestion into MongoDB and Hive using Storm.
- Created custom Solr query components to optimize search matching.
- Developed Spark scripts using Python shell commands.
- Stored the processed results in a data warehouse and maintained the data using Hive.
- Experienced in working with the Spark ecosystem, using Spark SQL and Scala queries on different formats such as text and CSV files.
- Created Oozie workflow and coordinator jobs to kick off jobs on schedule and on data availability.
- Worked with NoSQL databases such as MongoDB, creating MongoDB collections to load large sets of semi-structured data.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs, which run independently based on time and data availability.
- Worked with Amazon Web Services (AWS) cloud services such as EC2, S3, and EMR.
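A minimal sketch of the Spark SQL work on Hive described above; the table names customer_orders and reports.orders_by_state and the normalize_state function are hypothetical, and the production Hive UDFs mentioned in this role were written against Hive's Java UDF API rather than Spark's register call.

```scala
import org.apache.spark.sql.SparkSession

object HiveReportingJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-reporting")
      .enableHiveSupport()
      .getOrCreate()

    // Register a simple Scala UDF for use in Spark SQL against Hive tables
    spark.udf.register("normalize_state", (s: String) =>
      if (s == null) null else s.trim.toUpperCase)

    // Aggregate a Sqoop-imported Hive table and write the result back to Hive
    val report = spark.sql(
      """SELECT normalize_state(state) AS state, COUNT(*) AS orders
        |FROM customer_orders
        |GROUP BY normalize_state(state)""".stripMargin)

    report.write.mode("overwrite").saveAsTable("reports.orders_by_state")
    spark.stop()
  }
}
```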
Environment: Cloudera, HDFS, MapReduce, Storm, Hive, Pig, Sqoop, MongoDB, Apache Spark, Python, Accumulo, Oozie Scheduler, Kerberos, AWS, Tableau, Java, UNIX shell scripts, Hue, NiFi, Solr, Git, Maven.
Confidential, Boston, MA
Hadoop Developer
Responsibilities:
- Responsible for importing log files from various sources into HDFS using Flume.
- Handled big data using a Hadoop cluster consisting of 40 nodes.
- Performed complex HiveQL queries on Hive tables.
- Implemented partitioning, dynamic partitions, and buckets in Hive.
- Imported data from DB2 into HDFS using Sqoop and developed MapReduce jobs using the Java API (see the sketch after this list).
- Created final tables in Parquet format.
- Developed PIG scripts for source data validation and transformation.
- Developed shell and Python scripts to automate and provide control flow to Pig scripts.
- Involved in unit testing of MapReduce jobs using MRUnit.
- Utilized Hive and Pig to create BI reports.
- Developed data integration programs in a Hadoop environment with the NoSQL data store Cassandra for data access and analysis.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig, and Sqoop.
- Worked with Informatica MDM to create a single view of the data.
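A compact sketch of the kind of MapReduce job described above (a word count over imported log lines); the production jobs were written with the Java API, and this version uses Scala over the same Hadoop classes to keep a single language across the examples in this resume. The input and output paths are supplied as hypothetical command-line arguments.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
import scala.collection.JavaConverters._

// Mapper: emit (word, 1) for every token in a log line
class TokenMapper extends Mapper[LongWritable, Text, Text, IntWritable] {
  private val one  = new IntWritable(1)
  private val word = new Text()
  override def map(key: LongWritable, value: Text,
                   context: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit =
    value.toString.split("\\s+").filter(_.nonEmpty).foreach { token =>
      word.set(token)
      context.write(word, one)
    }
}

// Reducer: sum the counts for each word
class SumReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
  override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                      context: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit =
    context.write(key, new IntWritable(values.asScala.map(_.get).sum))
}

object WordCountJob {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance(new Configuration(), "word-count")
    job.setJarByClass(getClass)
    job.setMapperClass(classOf[TokenMapper])
    job.setReducerClass(classOf[SumReducer])
    job.setOutputKeyClass(classOf[Text])
    job.setOutputValueClass(classOf[IntWritable])
    FileInputFormat.addInputPath(job, new Path(args(0)))
    FileOutputFormat.setOutputPath(job, new Path(args(1)))
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}
```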
Environment: Hortonworks, HDFS, Pig, Hive, MapReduce, Java (JDK 1.7), Informatica, Oozie, Linux/UNIX shell scripting, Cassandra, Python, Perl, Git, Maven, Jenkins.
Confidential, NJ
Java Developer
Responsibilities:
- Effectively interacted with team members and business users for requirements gathering.
- Involved in analysis, design, and implementation phases of the software development lifecycle (SDLC).
- Implemented Spring and core J2EE patterns such as MVC, Dependency Injection (DI), and Inversion of Control (IoC).
- Implemented REST Web Services with Jersey API to deal with customer requests.
- Developed test cases using JUnit and used Log4j as the logging framework.
- Worked with HQL and the Criteria API for retrieving data elements from the database.
- Developed the user interface using HTML, Spring tags, JavaScript, jQuery, and CSS.
- Developed the application using Eclipse IDE and worked under Agile Environment.
- Designed and implemented front-end web pages using CSS, JSP, HTML, JavaScript, Ajax, and Struts.
- Used the Eclipse IDE as the development environment to design, build, and deploy Spring components on WebLogic.
Environment: Java, J2EE, HTML, JavaScript, CSS, jQuery, Spring 3.0, JNDI, Hibernate 3.0, JavaMail, Web Services, REST, Oracle 10g, JUnit, Log4j, Eclipse, WebLogic 10.3.
Confidential
Java Developer
Responsibilities:
- Designed and implemented the training and reports modules of the application using Servlets, JSP and Ajax.
- Developed custom JSP tags for the application.
- Wrote queries for fetching and manipulating data using the iBATIS ORM framework.
- Used Quartz schedulers to run jobs sequentially at given times.
- Implemented design patterns like Filter, Cache Manager, and Singleton to improve the performance of the application.
- Implemented the reports module of the application using Jasper Reports to display dynamically generated reports for business intelligence.
- Deployed the application in client’s location on Tomcat Server.
Environment: HTML, JavaScript, Ajax, Java, Servlets, JSP, iBATIS, Tomcat Server, SQL Server, Jasper Reports.