
Hadoop Developer Resume


Overland Park, KS

SUMMARY:

  • 6+ years of experience in software development, deployment and maintenance of applications across various stages of the lifecycle.
  • 1+ years of experience with major components of the Hadoop ecosystem, including Hadoop MapReduce, HDFS, Hive, Pig, Talend, HBase, ZooKeeper, Sqoop, Oozie, Flume, Storm, YARN, Spark, Scala and Avro.
  • Extensively worked with build, testing and logging tools such as Maven, Ant, JUnit and Log4j.
  • Experience applying current development approaches, including Spark applications written in Scala to compare the performance of Spark against Hive and SQL/Oracle (a brief sketch follows this list).
  • Thorough knowledge of data extraction, transformation and loading in Hive, Pig and HBase.
  • Hands-on experience coding MapReduce/YARN programs in Java and Scala for analyzing big data.
  • Worked with Apache Spark, a fast, general-purpose engine for large-scale data processing, integrated with the functional programming language Scala.
  • Good working knowledge of installing and maintaining Cassandra, configuring the cassandra.yaml file as per requirements, and performing reads and writes from Java via JDBC connectivity.
  • Hands-on experience writing Pig Latin scripts, working with the Grunt shell and scheduling jobs with Oozie.
  • Experience in designing and implementing secure Hadoop clusters using Kerberos.
  • Processed streaming data using the Spark Streaming API with Scala.
  • Good exposure to MongoDB and its functionality, as well as Cassandra implementations.
  • Good experience working in Agile development environments, including the Scrum methodology.
  • Good knowledge of the Spark framework for both batch and real-time data processing.
  • Hands-on experience with Spark MLlib for predictive intelligence and customer segmentation, and for smooth maintenance of Spark Streaming jobs (see the MLlib sketch after this list).
  • Experience in data processing tasks such as collecting, aggregating and moving data from various sources using Apache Flume and Kafka.
  • Extensive experience importing and exporting data using Flume and Kafka.
  • Experience migrating data with Sqoop between HDFS and relational database systems, in either direction, according to client requirements.
  • Experienced in deploying Hadoop clusters using the Puppet tool.
  • Hands-on experience in ETL, data integration and migration; extensively used ETL methodology to support data extraction, transformation and loading.
  • Good knowledge of cluster coordination services through ZooKeeper and Kafka.
  • Excellent knowledge of migrating existing Pig Latin scripts into Spark code written in Java.
  • Hands-on experience setting up automated monitoring and escalation infrastructure for Hadoop clusters using Ganglia and Nagios.
  • Good understanding of MPP databases such as HP Vertica and Impala.
  • Involved in developing Impala scripts for extracting, transforming and loading data into the data warehouse.
  • Extended Hive and Pig core functionality using custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs) and User Defined Aggregate Functions (UDAFs).
  • Experience working on Cloudera distributions (CDH 4/CDH 5); knowledge of the Hortonworks and Amazon EMR Hadoop distributions.
  • Worked with version control tools such as CVS, Git and SVN.
  • Good experience working with cloud environments such as Amazon Web Services (AWS) EC2 and S3.
  • Experience developing web pages using Java, JSP, Servlets, JavaScript, HTML, jQuery, AngularJS, jQuery Mobile, JBoss 4.2.3, XML, WebLogic, SQL, PL/SQL, JUnit, Apache Tomcat and Linux.
  • Expertise in implementing and maintaining Apache Tomcat/MySQL/PHP, LDAP and LAMP web service environments.
  • Worked with BI (Business Intelligence) teams to generate reports and design ETL workflows in Tableau; deployed data from various sources into HDFS and built reports using Tableau.
  • Experience in all phases of Software development life cycle (SDLC).
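
The Spark-versus-Hive performance comparisons mentioned above could look roughly like the following minimal sketch, assuming Spark 2.x with Hive support enabled; the transactions table and its columns are hypothetical and only illustrate the pattern of timing the same aggregation in both engines.

```scala
import org.apache.spark.sql.SparkSession

object HiveVsSparkComparison {
  def main(args: Array[String]): Unit = {
    // Hive support lets Spark query the same warehouse tables the Hive baseline uses.
    val spark = SparkSession.builder()
      .appName("HiveVsSparkComparison")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical table and columns; the identical HiveQL would be timed in Hive for comparison.
    val start = System.nanoTime()
    spark.sql(
      """SELECT txn_date, SUM(amount) AS total_amount
        |FROM transactions
        |GROUP BY txn_date""".stripMargin)
      .show(20)
    val elapsedMs = (System.nanoTime() - start) / 1e6
    println(f"Spark SQL aggregation took $elapsedMs%.1f ms")

    spark.stop()
  }
}
```

The same statement would then be run directly in Hive (or in SQL form against Oracle) and the elapsed times compared.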
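
For the MLlib customer-segmentation work, a sketch along these lines shows the general shape of such a job; the input path, column names and number of clusters are assumptions, not details taken from the resume.

```scala
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

object CustomerSegmentation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CustomerSegmentation")
      .getOrCreate()

    // Hypothetical input: one row per customer with a few numeric behaviour features.
    val customers = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/customers.csv") // placeholder path

    // MLlib's KMeans expects a single vector column, assembled here from the numeric features.
    val assembler = new VectorAssembler()
      .setInputCols(Array("total_spend", "visits_per_month", "avg_basket_size"))
      .setOutputCol("features")
    val featurized = assembler.transform(customers)

    // Cluster customers into a handful of segments and attach a segment id to each row.
    val model = new KMeans().setK(4).setSeed(42L).fit(featurized)
    model.transform(featurized)
      .select("customer_id", "prediction")
      .show(20)

    spark.stop()
  }
}
```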

LANGUAGES/TOOLS/TECHNOLOGIES:

J2EE Technologies: Java, JSP, Servlets, JDBC

Web Technologies: HTML, VXML, XML, XSLT, XSD, REST/SOAP APIs

Languages/Scripting: Java, J2EE, Scala, Akka, Ant, Jenkins, JavaScript, Perl, Sybase

Frameworks: MVC architecture (Struts), Spark 1.6/2.0, Apache Camel

Application Servers: WebLogic, JBoss 5.x

Web Server: Tomcat

Databases: Kafka 0.9, MySQL 5.0, Cassandra, Hive

IDE Tools: Eclipse, NetBeans, SoapUI

Operating Systems: Windows, Linux

ORM Tools: Hibernate, EclipseLink

PROFESSIONAL EXPERIENCE:

Confidential, Overland Park, KS

Hadoop Developer

Responsibilities:

  • Experienced in the design and deployment of Hadoop clusters and various big data analytics tools, including Pig, Hive, Cassandra, Oozie, Sqoop, Kafka, Spark and Impala, with the Cloudera distribution.
  • Performed source data transformations using Hive.
  • Supported an infrastructure environment comprising RHEL and Solaris.
  • Involved in developing a MapReduce framework that filters out bad and unnecessary records.
  • Developed Spark scripts using Scala shell commands as per requirements.
  • Used Kafka to transfer data from different data systems to HDFS.
  • Created Spark jobs to analyze trends in data usage by users.
  • Used the Spark Cassandra Connector to load data to and from Cassandra.
  • Responsible for generating actionable insights from complex data to drive real business results for various application teams.
  • Designed the column families in Cassandra.
  • Ingested data from RDBMS sources, performed data transformations, and then exported the transformed data to Cassandra as per business requirements.
  • Developed Spark code using Scala and Spark SQL for faster processing and testing.
  • Experience with NoSQL column-oriented databases such as Cassandra and their integration with the Hadoop cluster.
  • Collected and aggregated large amounts of log data on EC2 using Flume, staging the data in HDFS for further analysis.
  • Used the Spark API over Hadoop YARN as the execution engine for data analytics with Hive.
  • Exported the analyzed data to relational databases using Sqoop for further visualization and report generation by the BI team.
  • Worked on different file formats like Text files and Avro.
  • Experience in installation, configuration, support and monitoring of Hadoop clusters using Apache Hadoop, Cloudera Manager and the AWS service console.
  • Created various kinds of reports using Power BI and Tableau based on the client's needs.
  • Worked on Agile Methodology projects extensively.
  • Experience designing and executing time-driven and data-driven Oozie workflows.
  • Setting up Kerberos principals and testing HDFS, Hive, Pig and MapReduce access for the new users.
  • Experienced in working with the Spark ecosystem using Scala and Hive queries on different data formats such as text files and Parquet.
  • Used the Log4j framework for logging debug, info and error messages.
  • Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning and slot configuration.
  • Experience in importing data from S3 to Hive using Sqoop and Kafka.
  • Collected log data from web servers and integrated it into HDFS using Flume.
  • Developed Hive scripts in HiveQL to de-normalize and aggregate the data.
  • Implemented MapReduce counters to gather metrics on good and bad records.
  • Work experience with cloud infrastructure like Amazon Web Services (AWS).
  • Developed customized UDFs in Java to extend Hive and Pig functionality.
  • Worked with SCRUM team in delivering agreed user stories on time for every sprint.
  • Implemented business logic using Pig scripts.
  • Worked with different file formats (ORC, Parquet, Avro) and different compression codecs (GZIP, Snappy, LZO); a format-conversion sketch follows this list.
  • Created applications around Kafka that monitor consumer lag within Apache Kafka clusters; used in production by multiple companies.
  • Used Spark Streaming APIs to perform transformations and actions on the fly to build the common learner data model, which receives data from Kafka in near real time and persists it into Cassandra (see the streaming sketch after this list).
  • Experience in using Apache Kafka for collecting, aggregating and moving large amounts of data from application servers.
  • Used Hibernate ORM framework with Spring framework for data persistence and transaction management.
  • Performed performance analysis of Spark streaming and batch jobs using Spark tuning parameters.
  • Worked on creating real-time data streaming solutions using Apache Spark/Spark Streaming and Kafka.
  • Worked alongside the Hadoop operations team on Hadoop cluster planning, installation, maintenance, monitoring and upgrades.
  • Used the file system check (fsck) utility to check the health of files in HDFS.
  • Worked in Agile development environment in sprint cycles of two weeks by dividing and organizing tasks. Participated in daily scrum and other design related meetings.
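
The common learner data model described above (Kafka ingested in near real time and persisted to Cassandra) could be approached roughly as in this sketch, assuming the spark-streaming-kafka-0-10 integration and the DataStax Spark Cassandra Connector; the broker address, topic, keyspace, table and record format are all placeholders.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import com.datastax.spark.connector.streaming._

// Mirrors a hypothetical Cassandra table learner.events(user_id text, event text).
case class LearnerEvent(userId: String, event: String)

object KafkaToCassandra {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("KafkaToCassandra")
      .set("spark.cassandra.connection.host", "127.0.0.1") // placeholder Cassandra host
    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092", // placeholder broker
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "learner-model",
      "auto.offset.reset" -> "latest")

    // Each record value is assumed to be a simple "userId,event" string.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("learner-events"), kafkaParams))

    stream.map(_.value.split(",", 2))
      .filter(_.length == 2)
      .map(fields => LearnerEvent(fields(0), fields(1)))
      .saveToCassandra("learner", "events") // keyspace and table are placeholders

    ssc.start()
    ssc.awaitTermination()
  }
}
```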
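
Working across file formats and compression codecs, as described above, often amounts to a conversion job like the following sketch, assuming the spark-avro package is on the classpath; the HDFS paths are placeholders.

```scala
import org.apache.spark.sql.SparkSession

object AvroToParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("AvroToParquet")
      .getOrCreate()

    // Read hypothetical raw Avro data (requires the spark-avro package on the classpath).
    val raw = spark.read
      .format("com.databricks.spark.avro")
      .load("hdfs:///data/raw/events_avro") // placeholder input path

    // Rewrite as Snappy-compressed Parquet for faster columnar scans downstream.
    raw.write
      .option("compression", "snappy")
      .parquet("hdfs:///data/curated/events_parquet") // placeholder output path

    spark.stop()
  }
}
```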

Environment: Hadoop, Hive, MapReduce, Sqoop, Kafka, Spark, YARN, Pig, Cassandra, Oozie, shell scripting, Scala, Maven, Java, JUnit, Agile methodologies, MySQL.

Confidential

Associate Software Engineer

Responsibilities:

  • Involved in the design, development and deployment of the Application using Java/J2EE Technologies.
  • Developed web components using JSP, Servlets and JDBC, and coded JavaScript for AJAX and client-side data validation.
  • Designed and developed mappings using different transformations such as Source Qualifier, Expression, Lookup (connected and unconnected), Aggregator, Router, Rank, Filter and Sequence Generator.
  • Imported data from various sources, then transformed and loaded it into data warehouse targets using Informatica PowerCenter.
  • Made substantial contributions to simplifying ETL development and maintenance by creating re-usable Source, Target, Mapplet and Transformation objects.
  • Experience in developing extract, transform and load (ETL) processes, and in maintaining and supporting the enterprise data warehouse system and corresponding data marts.
  • Prepared the disaster recovery (DR) plan and recovery process for the GDW application.
  • Developed JSP pages using custom tags, the Tiles framework and the Struts framework.
  • Used user interface technologies such as JSP, HTML, CSS and JavaScript to develop the GUI of the application.
  • Gained skills in web-based REST and SOAP APIs and Apache tooling for real-time data streaming.
  • Programmed Oracle SQL and T-SQL stored procedures, functions, triggers and packages as back-end processes to create and update staging tables, log and audit tables, and create primary keys (a call sketch follows this list).
  • Extensively used Transformations like Aggregator, Router, Joiner, Expression, Lookup, Update Strategy, and Sequence Generator.
  • Developed mappings, sessions and workflows using Informatica Designer and Workflow Manager based on source to target mapping documents to transform and load data into dimension tables.
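
The stored procedures described above were invoked as back-end processes; a call from application code might look like this sketch, shown in Scala for consistency with the other sketches although the original work was in Java. The procedure name, connection URL and credentials are placeholders, and an Oracle JDBC driver is assumed to be on the classpath.

```scala
import java.sql.DriverManager

object StagingRefreshCall {
  def main(args: Array[String]): Unit = {
    // Connection URL, credentials and procedure name are placeholders, not details from the resume.
    val conn = DriverManager.getConnection(
      "jdbc:oracle:thin:@//db-host:1521/ORCL", "etl_user", "etl_password")
    try {
      // Invoke a back-end procedure that refreshes a staging table and writes an audit row.
      val call = conn.prepareCall("{ call refresh_staging_table(?) }")
      call.setString(1, "DAILY_LOAD")
      call.execute()
      call.close()
    } finally {
      conn.close()
    }
  }
}
```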

Environment: Java, AJAX, Informatica PowerCenter 8.x/9.x, REST API, SOAP API, Apache, Oracle 10g/11g, SQL*Loader, MS SQL Server, flat files, Targets, Aggregator, Router, Sequence Generator.
