Hadoop Developer Resume
Texas
PROFESSIONAL SUMMARY
- 8+ years of experience in Information Technology, including analysis, design, development and testing of complex applications.
- Extensive implementation and hands-on experience with a wide array of Big Data stack tools, including HDFS, Spark, MapReduce, Hive, Pig, Flume, Oozie, Sqoop, Kafka, ZooKeeper and HBase.
- Expertise in writing Spark RDD transformations, actions, DataFrames and case classes for the required input data, and in performing data transformations using Spark Core (a brief illustrative sketch follows this summary).
- Good understanding of Cassandra architecture, replication strategies, the gossip protocol, snitches, etc.
- Responsible for data modeling in Cassandra, deciding on columns based on cardinality.
- Designed column families in Cassandra; ingested data from RDBMS, performed data transformations, and exported the transformed data to Cassandra as per business requirements.
- Used the DataStax Spark Cassandra Connector to load data to and from Cassandra.
- Experienced in creating data models for clients' transactional logs; analyzed data from Cassandra tables for quick searching, sorting and grouping using the Cassandra Query Language (CQL).
- Tested cluster performance using the cassandra-stress tool to measure and improve read/write throughput.
- Handled importing data from RDBMS into HDFS using Sqoop and vice-versa.
- Expertise in performing real time analytics on big data using HBase and Cassandra.
- Worked on building data pipelines using Kafka, Spark Streaming, Spark batch processing and Spark SQL, with ingestion into HBase/Hive, for near-real-time spend analytics.
- Developed analytical components using Kafka, Spark Streaming and Scala.
- Designed and developed a system to collect data from multiple portals using Kafka and then process it using Spark.
- Experienced in working with Amazon Web Services (AWS), using EC2 for compute and S3 as a storage mechanism.
- Responsible for developing data pipelines on AWS to extract data from weblogs and store it in HDFS.
- Implemented Hortonworks NiFi (HDP 2.4) and recommended solutions to ingest data from multiple data sources into HDFS and Hive using NiFi.
- Implemented Scala scripts and UDFs, using both DataFrames/SQL and RDDs/MapReduce in Spark, for data aggregation and queries and for writing data into HDFS through Sqoop.
- Experience in developing and designing POCs using Scala, deploying them on YARN clusters and comparing the performance of Spark with Hive and SQL/Teradata.
- Strong knowledge of Hadoop HDFS architecture, Spark Core architecture, Kafka, Cassandra and MongoDB.
- Understanding of ZooKeeper configuration for providing cluster coordination services.
- Experienced in automating Oozie workflows and job controllers for Shell, Hive and Sqoop actions with email notifications.
- Expertise in designing and deploying Hadoop clusters and different Big Data analytic tools, including Pig, Hive, HBase, Oozie, Sqoop, Flume and Spark, with the Cloudera distribution.
- Expertise on writing MapReduce programs for validating data.
- Extensive experience in importing and exporting streaming data into HDFS using stream processing platforms such as Flume and Kafka.
- Experience in developing data pipelines using Pig, Sqoop and Flume to extract data from weblogs and store it in HDFS.
- Experienced in loading streaming data into HDFS using the Kafka messaging system.
- Worked on the ELK stack (Elasticsearch, Logstash, Kibana).
- Experience with different Spark modules such as Spark SQL, Spark MLlib, Spark Streaming and GraphX.
- Experience with the complete Software Development Life Cycle (SDLC) in both Waterfall and Agile methodologies.
- Experience with Amazon Web Services (AWS) cloud services such as S3, EC2 and EMR, as well as Microsoft Azure.
- Experience in performance tuning and monitoring Hadoop clusters by gathering and analyzing the existing infrastructure using Cloudera Manager.
- Performed web-based UI development using JavaScript, jQuery, jQuery UI, CSS, HTML, HTML5 and XHTML.
- Diverse experience utilizing Java tools in business, web and client-server environments, including the Java Platform, J2EE, EJB, JSP, Java Servlets, JUnit, Java Database Connectivity (JDBC) and application servers such as WebSphere; 4 years of experience in the field of Big Data.
- Strengths include being a good team player with excellent communication, interpersonal and analytical skills and the ability to work effectively in a fast-paced, high-volume, deadline-driven environment.
- Committed to excellence; a self-motivated, far-sighted developer and team player with strong problem-solving skills and a zeal to learn new technologies.
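The Spark Core bullet above refers to patterns like the following minimal Scala sketch. It is illustrative only: the Order case class, the HDFS paths and the column names are hypothetical placeholders, not taken from any project listed in this resume.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical record layout used only for this illustration.
case class Order(orderId: String, customerId: String, amount: Double)

object OrderAggregation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("OrderAggregation").getOrCreate()
    import spark.implicits._

    // RDD transformations: parse raw CSV lines into case-class records.
    val ordersRdd = spark.sparkContext
      .textFile("hdfs:///data/raw/orders/*.csv")   // placeholder input path
      .map(_.split(","))
      .filter(_.length == 3)
      .map(f => Order(f(0), f(1), f(2).toDouble))

    // Convert the RDD to a DataFrame and aggregate with Spark SQL.
    ordersRdd.toDF().createOrReplaceTempView("orders")
    val spendPerCustomer = spark.sql(
      "SELECT customerId, SUM(amount) AS total_spend FROM orders GROUP BY customerId")

    // Write the result back to HDFS as Parquet.
    spendPerCustomer.write.mode("overwrite")
      .parquet("hdfs:///data/curated/spend_per_customer")  // placeholder output path

    spark.stop()
  }
}
```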
TECHNICAL SKILLS
Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, YARN, Kafka, Flume, Sqoop, Impala, Oozie, ZooKeeper, Spark, MongoDB, Cassandra, AWS
Hadoop Distributions: Cloudera, Hortonworks
Languages: Java, Scala, SQL, HTML, DHTML, JavaScript, XML and C/C++
NoSQL Databases: Cassandra, MongoDB and HBase
Java Technologies: Servlets, JavaBeans, JSP, JDBC and Eclipse
Frameworks: Struts, Spring and Hibernate
DB Languages: MySQL, PL/SQL, Python, PostgreSQL and Oracle
RDBMS: Teradata, Oracle 9i/10g/11g, MS SQL Server, MySQL and DB2
Operating systems: UNIX, LINUX, Mac OS and Windows Variants
PROFESSIONAL EXPERIENCE:
Confidential, Texas
Hadoop Developer
Roles & Responsibilities:
- Developed Spark applications using Scala and Java, and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
- Experienced with SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Implemented Spark using Scala and Java, utilizing DataFrames and the Spark SQL API for faster processing of data.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala and Python.
- Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
- Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Configured various views in Ambari, such as the Hive view, Tez view and YARN Queue Manager.
- Migrated large amounts of data from various databases, such as Oracle, Netezza and MySQL, to Hadoop.
- Optimized performance on large datasets using partitioning, Spark in-memory capabilities, broadcasts in Spark, effective and efficient joins, transformations and other heavy lifting during the ingestion process itself.
- Handled Hive and Spark tuning end to end, with partitioning/bucketing of ORC tables and tuning of executor/driver memory.
- Worked on Hive to gather billions of records and process them using Spark DataFrames.
- Built code in PySpark that connects to Hive and stores the data in DataFrames.
- Configured Spark Streaming to receive real time data and store the stream data to HDFS.
- Implemented MMS monitoring and backup (MongoDB Management Service) in the cloud and on local servers (on-premise, with Ops Manager).
- Documented MongoDB installation, operations, security and auditing across multiple environments.
- Performed CRUD operations such as update, insert and delete on data in MongoDB.
- Worked on MongoDB database concepts such as locking, transactions, indexes, sharding, replication and schema design.
- Evaluated Hortonworks NiFi (HDF 2.0) and recommended solutions to ingest data from multiple data sources into HDFS and Hive using NiFi.
- Loaded data from different sources (databases and files) into Hive using the Talend tool.
- Worked with developer teams to move data into HDFS through HDF NiFi.
- Developed a NiFi workflow to pick up multiple retail files from an FTP location and move them to HDFS on a daily basis.
- Expertise in integrating Kafka with Spark Streaming for high-speed data processing.
- Developed multiple Kafka producers and consumers as per the software requirement specifications.
- Used Kafka for log aggregation, collecting physical log files from servers and placing them in a central location such as HDFS for processing.
- Developed Kafka consumer APIs in Scala for consuming data from Kafka topics.
- Experience with EC2, CloudWatch, Elastic Load Balancing and managing security on AWS.
- Used Kafka streams and configured Spark Streaming to receive the information and store it in HDFS.
- Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames and saved the data in Parquet format in HDFS (see the sketch following this section).
- Hands-on experience with Spark Core, Spark SQL and Spark Streaming using Scala.
Environment: Spark, Scala, Spark RDD, Hive, MongoDB, Tableau, Pig, AWS (Amazon Web Services), Kafka, Spark Streaming, NiFi, Tez, ELK
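A minimal Scala sketch of the Kafka-to-Parquet flow referenced above (real-time feed via Spark Streaming, micro-batch RDDs converted to DataFrames and persisted to HDFS as Parquet), assuming the spark-streaming-kafka-0-10 integration. The broker address, topic, consumer group and output path are hypothetical placeholders.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object KafkaToParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("KafkaToParquet").getOrCreate()
    val ssc   = new StreamingContext(spark.sparkContext, Seconds(30))
    import spark.implicits._

    // Placeholder Kafka connection settings.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "spend-analytics",
      "auto.offset.reset"  -> "latest")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("weblogs"), kafkaParams))

    // Each micro-batch arrives as an RDD; convert it to a DataFrame and append it as Parquet.
    stream.map(_.value).foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        rdd.toDF("raw_event")
          .write.mode("append")
          .parquet("hdfs:///data/streaming/weblogs")   // placeholder output path
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```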
Confidential, Chicago, IL
Spark/Scala/Hadoop Developer
Roles & Responsibilities:
- Performance-tuned Spark jobs by changing configuration properties and using broadcast variables, and used Spark SQL to handle structured data in Hive.
- Configured Spark Streaming to get ongoing information from Kafka and store the stream information in HDFS.
- Used Spark and Spark SQL to read Parquet data and create tables in Hive using the Scala API.
- Implemented Spark using Scala, utilizing DataFrames and the Spark SQL API for faster processing of data.
- Responsible for data modeling in Cassandra, deciding on columns based on cardinality.
- Designed column families in Cassandra; ingested data from RDBMS, performed data transformations, and exported the transformed data to Cassandra as per business requirements.
- Used the DataStax Spark Cassandra Connector to load data to and from Cassandra (see the sketch following this section).
- Experienced in creating data models for clients' transactional logs; analyzed data from Cassandra tables for quick searching, sorting and grouping using the Cassandra Query Language (CQL).
- Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Experience in importing and exporting data using Sqoop between HDFS and relational database systems.
- Tested cluster performance using the cassandra-stress tool to measure and improve read/write throughput.
- Performed Hadoop installation and configuration of multiple nodes using the Cloudera platform.
- Provided day-to-day operational support for our Cloudera Hadoop clusters in lab and production, at multi-petabyte scale. Installed, configured and optimized Hadoop infrastructure using the Cloudera Hadoop distribution CDH5 with Puppet.
- Worked on installing and configuring CDH 5.8, 5.9 and 5.10 Hadoop clusters on AWS using Cloudera Director.
- Developed and configured Kafka brokers to pipeline server log data into Spark Streaming.
- Worked with developer teams on a NiFi workflow to pick up data from a REST API server, from the data lake and from an SFTP server and send it to the Kafka brokers.
- Implemented Data Ingestion in real time processing using Kafka.
- Active member in developing a POC on streaming data using Apache Kafka and Spark Streaming.
- Used Spark Streaming APIs to perform the necessary transformations and actions on the data obtained from Kafka and persisted it into Cassandra.
- Used external tables in Impala for data analysis.
- Good understanding of MPP databases such as HP Vertica and Impala.
- Planning, Installing and Configuring Hadoop Cluster in Cloudera Distributions.
- Administration, installing, upgrading and managing distributions of Hadoop (CDH5, Cloudera manager), HBase. Managing, monitoring and troubleshooting Hadoop Cluster using Cloudera Manager.
- Used the Spark API over Hadoop YARN as the execution engine for data analytics using Hive.
- Expertise in writing Spark Streaming applications using Scala higher-order functions.
- Implemented Sqoop for large data transfers from RDBMS to HDFS/HBase/Hive and vice versa.
- Involved in using HCatalog to access Hive table metadata from MapReduce or Pig code.
- Importing and exporting data into HDFS and Hive using Sqoop.
- Migrated data from Oracle and MS SQL Server into HDFS using Sqoop, and imported various formats of flat files into HDFS.
- Extracted data from Oracle, SQL Server and MySQL databases to HDFS using Sqoop.
- Involved in using ORC, Avro, Parquet, RCFile and JSON file formats; developed UDFs in Hive and Pig and used Sqoop to import files into Hadoop.
- Exported the analyzed patterns back to Teradata using Sqoop. Implemented a script to transmit sysprint information from Oracle to HBase using Sqoop.
- Developed Sqoop jobs to extract data and load it into HDFS from RDBMS systems such as DB2 and SQL Server.
Environment: Spark, Scala, DataStax Spark Cassandra Connector, Sqoop, HDFS, HBase, Hive, HCatalog, MapReduce, Pig, Hadoop, Cloudera, Kafka, Impala.
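A minimal sketch of loading data to and from Cassandra with the DataStax Spark Cassandra Connector, as referenced in the bullets above. The connection host, keyspace, table and column names are hypothetical placeholders, and the target table is assumed to already exist.

```scala
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

object CassandraRoundTrip {
  def main(args: Array[String]): Unit = {
    // The connector requires spark.cassandra.connection.host; the value here is a placeholder.
    val conf = new SparkConf()
      .setAppName("CassandraRoundTrip")
      .set("spark.cassandra.connection.host", "cassandra-node1")
    val sc = new SparkContext(conf)

    // Read a Cassandra table into an RDD of CassandraRow.
    val txLogs = sc.cassandraTable("analytics", "transaction_logs")

    // Simple transformation: count events per client.
    val perClient = txLogs
      .map(row => (row.getString("client_id"), 1L))
      .reduceByKey(_ + _)

    // Write the aggregate back to Cassandra
    // (assumes analytics.client_event_counts(client_id text PRIMARY KEY, events bigint)).
    perClient.saveToCassandra("analytics", "client_event_counts",
      SomeColumns("client_id", "events"))

    sc.stop()
  }
}
```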
Confidential, Reston, VA
Hadoop Developer
Roles & Responsibilities:
- Experienced in handling data from different datasets, join and preprocess them using Pig join operations.
- Worked on Pig script to count the number of times a URL was opened in a particular duration.
- Developed Pig UDFs for the needed functionality, such as a custom Pig loader known as the timestamp loader.
- Worked on analyzing Hadoop clusters using different Big Data analytic tools, including Pig, Hive, HBase and MapReduce.
- Extracted data on customers' everyday transactions from DB2, exported it to Hive and set up online analytical processing.
- Installed and configured Hadoop, MapReduce and HDFS clusters.
- Created Hive tables, loaded the data and performed data manipulations using Hive queries in MapReduce execution mode.
- Worked on developing custom MapReduce programs and User Defined Functions (UDFs) in Hive to transform large volumes of data according to business requirements.
- Involved in creating Hive tables and working on them using HiveQL.
- Analyzed data using Hadoop components Hive and Pig.
- Scripted complex HiveQL queries on Hive tables to analyze large datasets and wrote complex Hive UDFs to work with sequence files.
- Scheduled workflows using Oozie to automate multiple Hive and Pig jobs, which run independently with time and data availability.
- Responsible for creating Hive tables, loading data and writing Hive queries to analyze data.
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
- Imported data from Teradata database into HDFS and exported the analyzed patterns data back to Teradata using Sqoop.
- Developed MapReduce jobs for Log Analysis, Recommendation and Analytics.
- Wrote MapReduce jobs to generate reports on the number of activities created on a particular day from data dumped from multiple sources; the output was written back to HDFS.
- Created HBase tables, used HBase sinks and loaded data into them to perform analytics using Tableau.
- Designed HBase schemas based on the requirements and performed HBase data migration and validation.
- Extended Hive and Pig core functionality by writing custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs) and User Defined Aggregating Functions (UDAFs) for Hive and Pig using Scala (a sketch follows this section).
- Proficient in using Cloudera Manager, an end-to-end tool to manage Hadoop operations.
Environment: Pig, Hive, MapReduce, Hadoop, HDFS, HiveQL, Oozie, Cloudera, HBase
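A small illustration of extending Hive with a custom UDF written in Scala, as mentioned in the bullets above, using the classic org.apache.hadoop.hive.ql.exec.UDF API. The class name and the URL-normalization logic are hypothetical examples, not taken from the project.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Normalizes a URL by trimming whitespace, lower-casing it and stripping a trailing slash.
class NormalizeUrlUDF extends UDF {
  def evaluate(url: Text): Text =
    if (url == null) null
    else new Text(url.toString.trim.toLowerCase.stripSuffix("/"))
}
```

Once packaged into a JAR, a UDF like this would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION normalize_url AS 'NormalizeUrlUDF'; before being used in HiveQL queries.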
Confidential, MI
Java/Hadoop Developer
Roles & Responsibilities:
- Importing and exporting data into HDFS and Hive using Sqoop and Flume.
- Responsible for loading the customers' data and event logs from an Oracle database and Teradata into HDFS using Sqoop.
- Wrote Hive UDFs to extract data from staging tables.
- Developed Pig scripts/UDFs to manipulate and transform the loaded data.
- Developed optimal strategies for distributing the web log data over the cluster, importing and exporting the stored web log data into HDFS and Hive using Sqoop.
- Collected and aggregated large amounts of web log data from different sources such as webservers, mobile and network devices using Apache Flume and stored the data into HDFS for analysis.
- Extending Hive and Pig core functionality by writing custom UDFs.
- Worked on Importing and exporting data from different databases like MySQL, Oracle into HDFS and Hive using Sqoop.
- Contributed to building hands-on tutorials for the community on how to set up Hortonworks Data Platform (powered by Hadoop) and Hortonworks DataFlow (powered by NiFi).
- Responsible for architecting Hadoop clusters with the Hortonworks distribution platform HDP 1.3.2 and Cloudera CDH4.
- Installed, upgraded and managed Hadoop clusters on Hortonworks.
- Worked on evaluating, architecting and installing/setting up the Hortonworks 2.1/1.8 Big Data ecosystem, which includes Hadoop, Pig, Hive, Sqoop, etc.
- Hands-on experience installing and configuring MapR and Hortonworks clusters, and installed Hadoop ecosystem components such as Pig, Hive, HBase, Sqoop, Kafka, Oozie, Flume and ZooKeeper.
- Responsible for loading unstructured and semi-structured data coming from different sources into the Hadoop cluster using Flume, and managing it.
- Involved in running Hadoop jobs to process millions of records of text data.
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
- Involved in loading data from the Linux file system into HDFS (see the sketch following this section).
- Responsible for managing data from multiple sources
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
Environment: HDFS, Hive, Sqoop, Flume, Oracle, Hortonworks, Unix, Linux
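An illustrative Scala sketch of loading a file from the Linux file system into HDFS with the Hadoop FileSystem API, corresponding to the HDFS-loading bullet above. In practice this step is often done with hdfs dfs -put; the paths below are placeholders.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object LocalToHdfs {
  def main(args: Array[String]): Unit = {
    // Picks up core-site.xml / hdfs-site.xml from the classpath.
    val conf = new Configuration()
    val fs   = FileSystem.get(conf)

    // Placeholder paths for illustration only.
    val localLogs = new Path("file:///var/log/weblogs/access.log")
    val hdfsDir   = new Path("/data/raw/weblogs")

    // delSrc = false keeps the local file; overwrite = true replaces any existing copy.
    fs.copyFromLocalFile(false, true, localLogs, hdfsDir)
    fs.close()
  }
}
```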
Confidential
Java/J2EE developer
Roles & Responsibilities:
- Involved in analysis of requirements with business teams to deliver the best technical solution.
- Involved in use-case development, OOAD and modeling, including class diagrams and object diagrams using UML.
- Optimized system performance by writing stored procedures and calling them using JDBC callable statements (see the sketch following this section).
- Interacted with business analysts and architecture groups gathering requirements and use cases.
- Involved in Object Oriented Analysis and Design (OOAD) using UML for designing the application.
- Developed Class diagrams, Sequence diagrams, and State diagrams.
- Developed the application using the Struts framework.
- Developed JSP pages for the presentation layer; used custom tag libraries and the JSP Standard Tag Library (JSTL).
- Used client-side JavaScript extensively to make deployment of new changes easier.
- Involved in creating Data Structures in the required format.
- Designed AJAX pages to improve the efficiency of the web pages.
- Implemented paging using AJAX along with display tags across all modules in the project.
- Worked with JavaScript to perform client-side form validations.
- Used Struts tag libraries as well as Struts tile framework.
- Used JDBC to access the database with the Oracle thin (Type 4) driver for application optimization and efficiency.
- Actively involved in tuning SQL queries for better performance.
- Worked with XML to store and read exception messages through DOM
- Worked on Oracle database to design Databases Schema, created Database structure, Tables and Relationship diagrams.
- Handled overall exception handling and logging for the application; created the style sheets and XSLT for presentation-layer controls; developed end-to-end activities across the presentation layer, service layer, data access layer and database for the assigned modules.
- Performed enhancements to existing SOAP web services for online card payments
- Performed enhancements to existing payment screens by developing servlets and JSP Pages
- Involved in end to end batch loading process using ETL Informatica
- Deployed and maintained the JSP and Servlet components on WebLogic 8.0.
- Developed the application server persistence layer using JDBC and SQL.
- Used JDBC to connect the web applications to databases.
Environment: Java/J2EE, Hibernate, JSPs, EJB 2.0, UML, JMS, XML, Struts, HTML, JavaScript, AJAX, DHTML, WebSphere, T-SQL, JUnit, ANT, Windows NT, Unix.
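For the stored-procedure bullet above, the sketch below shows the JDBC CallableStatement pattern. It is written in Scala for consistency with the other examples in this document (the original work was done in Java), and the JDBC URL, credentials and the GET_ACCOUNT_BALANCE procedure are hypothetical placeholders.

```scala
import java.sql.{Connection, DriverManager, Types}

object StoredProcCall {
  def main(args: Array[String]): Unit = {
    // Placeholder connection details; assumes the Oracle JDBC driver is on the classpath.
    val conn: Connection = DriverManager.getConnection(
      "jdbc:oracle:thin:@//dbhost:1521/ORCL", "app_user", "secret")
    try {
      // Hypothetical procedure: GET_ACCOUNT_BALANCE(IN account_id, OUT balance).
      val call = conn.prepareCall("{call GET_ACCOUNT_BALANCE(?, ?)}")
      call.setString(1, "ACC-1001")
      call.registerOutParameter(2, Types.NUMERIC)
      call.execute()
      println(s"Balance: ${call.getBigDecimal(2)}")
      call.close()
    } finally {
      conn.close()
    }
  }
}
```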
Confidential
Java developer
Roles & Responsibilities:
- Contributed to the entire Software Development Life Cycle (SDLC), involving creation of business requirements document, technical requirements document, code development and testing.
- Created UML class diagrams that depict the code's design and its compliance with the functional requirements.
- Used Servlets to create the front end; manipulated the web.xml file.
- Designed several complex SQL queries involving sub queries and multiple joins.
- Developed application in Agile methodologies - Sprint & scrums.
- Developed Single Page Responsive web application in AngularJS and Bootstrap.
- Used Tableau JavaScript API to embed dashboard in Web application.
- Worked on a large database (20 billion records).
- Created heat map, donut, pie, histogram and other kinds of reports, and created dashboards out of them.
- Developed different kinds of interactive graphs in RStudio.
- Set up a Shiny Server on Linux CentOS and deployed reports on the server.
- Created multiple charts in D3.js. Developed dashboards in Kibana 4.
- Worked on loading data into Elasticsearch using Logstash.
- Worked on multiple functions such as creating indexes, loading data and aggregation functions on Elasticsearch.
- Created cron jobs on Linux CentOS.
Environment: AngularJS, JSP, JSF, jQuery, Apache Tomcat, JBoss, Elasticsearch, Oracle 10g, Oracle Forms 10g, Spring Data, D3.js, Elasticsearch.js, Kibana 4, Logstash, AJAX, SVN, CentOS, Windows 7, PostgreSQL, MySQL, Tableau 9.3, R, RStudio, Shiny Server.