Sr. Hadoop Developer Resume
IL
PROFESSIONAL SUMMARY:
- Over 8 years of IT industry experience in product development, implementation and maintenance of various cloud-based web applications using Java, J2EE technologies and Big Data ecosystems in Linux environments
- Over 4 years of experience working with analytics using Big Data technologies. Have hands-on experience in Storing, Querying, Processing and Data Analysis
- Comprehensive work experience in implementing Big Data projects using Apache Hadoop, Pig, Hive, HBase, Spark, Sqoop, Flume, Zookeeper, Oozie
- Experience with distributed systems, large-scale non-relational data stores and multi-terabyte data warehouses
- Excellent knowledge of Hadoop architecture: Hadoop Distributed File System (HDFS), Job Tracker, Task Tracker, Name Node, Data Node and the MapReduce programming paradigm
- Hands-on experience building data pipelines using Hadoop components Sqoop, Hive, Pig, MapReduce, Spark, Spark SQL
- Hands-on experience in various Big Data application phases like Data Ingestion, Data Analytics and Data Visualization
- Experience in developing efficient solutions to analyze large data sets
- Experience working on Hortonworks / Cloudera / MapR distributions
- Extensively worked on MRV1 and MRV2 Hadoop architectures
- Experience working on Spark, RDDs, DAGs, Spark SQL and Spark Streaming
- Experience in importing and exporting data using Sqoop between HDFS and Relational Database Management Systems
- Populated HDFS with huge amounts of data using Apache Kafka and Flume
- Excellent knowledge of data mapping, extracting, transforming and loading from different data sources
- Worked with different File Formats like TEXTFILE, SEQUENCE FILE, AVROFILE, ORC, and PARQUET for Hive querying and processing
- Experience in developing custom MapReduce Programs in Java using Apache Hadoop for analyzing Big Data as per the requirement
- Well experienced in data transformation using custom MapReduce, Hive and Pig scripts for different types of file formats
- Expertise in extending Hive and Pig core functionality by writing custom UDFs and UDAFs
- Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning and bucketing
- Experience building solutions with NoSQL databases, such as HBase, Cassandra, MongoDB
- Firm grip on data modeling, data mapping, database performance tuning and NoSQL map-reduce systems
- In-depth understanding of Spark architecture including Spark Core, Spark SQL, Data Frames, and Spark Streaming
- Hands on experience migrating complex MapReduce programs into Apache Spark RDD transformations
- Experienced in using Apache Spark, with code written in Scala, to implement advanced procedures like text analytics and processing using its in-memory computing capabilities
- Experience in Kafka installation & integration with Spark Streaming
- Used Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing
- Experience in designing both time driven and data driven automated workflows using Oozie
- Good understanding of ZooKeeper for monitoring and managing Hadoop jobs
- Monitoring Map Reduce Jobs and YARN Applications
- Hands-on experience with Amazon Elastic MapReduce (EMR), S3 storage, EC2 instances and Data Warehousing
- Experience with RDBMS and writing SQL and PL/SQL scripts used in stored procedures
- Used Git for source code and version control management
- Proficient in Java, J2EE, JDBC, Collection Framework, Servlets, JSP, Spring, Hibernate, JSON, XML, REST, SOAP Web Services
- Strong understanding of Agile and Waterfall SDLC methodologies
TECHNICAL SKILLS:
Big Data Technologies: HDFS, YARN, Map Reduce, Pig, Hive, HBase, Spark, Spark SQL, Spark Streaming, Sqoop, Flume, Kafka, ZooKeeper, Oozie
Big Data Distributions: Hortonworks, Cloudera, MapR, Amazon Elastic MapReduce (EMR)
Programming Languages: Java, Python, Scala, C++, R, JavaScript, Shell Script
Operating Systems: Linux, Windows, Unix
RDBMS: Oracle, MySQL, MS SQL Server
NoSQL Databases: HBase, Cassandra, MongoDB
Frameworks: Spring, Hibernate, Struts
Web Servers: Apache Tomcat, WebSphere, WebLogic
Version Control: Git, SVN, CVS
Integrated Development Environments (IDEs): Java Eclipse IDE, NetBeans, Microsoft SQL Studio
Web Technologies: HTML, CSS, Bootstrap, JavaScript, DOM, XML, Servlets
PROFESSIONAL EXPERIENCE:
Confidential, IL
Sr. Hadoop Developer
Responsibilities:
- Involved in complete project life cycle starting from design discussion to production deployment
- Developed Spark applications using Scala utilizing Data frames and Spark SQL API for faster processing of data.
- Developed highly optimized Spark applications to perform various data cleansing, validation, transformation and summarization activities according to the requirement
- The data pipeline consisted of Spark, Hive, Sqoop and custom-built input adapters to ingest, transform and analyze operational data.
- Developed Spark jobs and Hive Jobs to summarize and transform data.
- Used Spark for interactive queries, processing of streaming data and integration with a popular NoSQL database for huge volumes of data.
- Involved in converting Hive/SQL queries into Spark transformations using Spark Data Frames and Scala.
- Analyzed the SQL scripts and designed the solution to implement using Scala.
- Streamed data in real time using Spark with Kafka.
- Created applications using Kafka that monitor consumer lag within Apache Kafka clusters; used in production by multiple report suites.
- Ingested syslog messages, parsed them and streamed the data to Apache Kafka.
- Imported data from different data sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and loaded the resulting data into HDFS.
- Exported the analyzed data to the relational databases using Sqoop, to further visualize and generate reports for the BI team.
- Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
- Analyzed the data by performing Hive queries (HiveQL) to study customer behaviour.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Developed Hive scripts in HiveQL to de-normalize and aggregate the data.
- Created HBase tables and column families to store the user event data.
- Scheduled and executed workflows in Oozie to run various jobs.
Environment: Java, Scala, Hadoop, Hortonworks, AWS, HDFS, YARN, Map Reduce, Hive, Pig, Spark, Flume, Kafka, Sqoop, Oozie, Zookeeper, Oracle, Teradata, MySQL
Confidential, OH
Hadoop Developer
Responsibilities:
- Worked on cloud platform which was built with a scalable distributed data solution using Hadoop on a 40-node cluster using AWS cloud to run analysis on 25+ Terabytes of customer usage data.
- Worked on analyzing the Hadoop stack and different big data analytic tools including Pig, Hive, HBase database and Sqoop.
- Designed and implemented a semi-structured data analytics platform leveraging Hadoop.
- Worked on performance analysis and improvements for Hive and Pig scripts at MapReduce job tuning level.
- Installed and configured the Hadoop cluster; worked with the Cloudera support team to fine-tune the cluster.
- Developed a custom File System plugin for Hadoop so it can access files on Hitachi Data Platform.
- Developed connectors for Elasticsearch and Greenplum for data transfer from a Kafka topic.
- Performed data ingestion from multiple internal clients using Apache Kafka. Developed KStreams in Java for real-time data processing.
- Involved in Optimization of Hive Queries.
- Developed a framework to load and transform large sets of unstructured data from UNIX systems into Hive tables.
- Involved in Data Ingestion to HDFS from various data sources.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Extensively used Apache Sqoop for efficiently transferring bulk data between Apache Hadoop and relational databases.
- Automated Sqoop, Hive and Pig jobs using Oozie scheduling.
- Extensive knowledge in NoSQL databases like HBase
- Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive and AWS cloud.
- Responsible for continuous monitoring and managing Elastic MapReduce (EMR) cluster through AWS console.
- Good knowledge of writing and using user-defined functions in Hive, Pig and MapReduce.
- Helped the business team by installing and configuring Hadoop ecosystem components along with the Hadoop admin.
- Developed multiple Kafka Producers and Consumers from scratch as per the business requirements.
- Worked on loading log data into HDFS through Flume
- Created and maintained technical documentation for executing Hive queries and Pig Scripts.
- Worked on debugging and performance tuning of Hive & Pig jobs.
- Used Oozie to schedule various jobs on the Hadoop cluster.
- Used Hive to analyze the partitioned and bucketed data.
- Worked on establishing connectivity between Tableau and Hive.
Environment: Hortonworks 2.4, Hadoop, HDFS, MapReduce, MongoDB, Cloudera, Java, VMware, Eclipse, Pig, Hive, HBase, AWS, Tableau, Sqoop, Flume, Linux, UNIX
Confidential, FL
Hadoop Developer
Responsibilities:
- Worked on a Hortonworks cluster, which is responsible for providing an open source platform based on Apache Hadoop for analyzing, storing and managing big data
- Worked with analysts to determine and understand business requirements
- Loaded and transformed large datasets of structured, semi-structured and unstructured data using Hadoop/Big Data concepts
- Developed data pipeline using Flume, Sqoop, Pig and MapReduce to ingest customer data and financial histories into HDFS for analysis
- Used MapReduce and Flume to load, aggregate, store and analyze web log data from different web servers
- Created MapReduce programs to handle semi/unstructured data like XML, JSON, AVRO data files and sequence files for log files
- Involved in submitting and tracking MapReduce jobs using Job Tracker
- Experience writing Pig Latin scripts for data cleansing, ETL operations and query optimization of existing scripts
- Wrote a Hive UDF to sort struct fields and return complex data types
- Created Hive tables from JSON data using a data serialization framework like Avro
- Experience writing reusable custom Hive and Pig UDFs in Java and using existing UDFs from Piggybank and other sources
- Experience in working with NoSQL database HBase in getting real time data analytics
- Integrated Hive tables to HBase to perform row level analytics
- Developed Oozie workflows for daily incremental loads, which use Sqoop to pull data from Teradata and Netezza and then import it into Hive tables
- Involved in performance tuning using different execution engines such as Tez
- Tuned and troubleshot MapReduce jobs by analyzing and reviewing Hadoop log files
- Implemented Daily Cron jobs that automate parallel tasks of loading the data into HDFS using AutoSys and Oozie coordinator jobs
- Developed a suite of unit test cases for Mapper, Reducer and Driver classes using the MR testing library
Environment: Hortonworks, Java, Hadoop, HDFS, MapReduce, Tez, Hive, Pig, Oozie, Sqoop, Flume, Teradata, Netezza, Tableau
Confidential, CA
Hadoop Developer
Responsibilities:
- Installed Cloudera distribution of Hadoop Cluster and services HDFS, Pig, Hive, Sqoop, Flume and MapReduce
- Responsible for providing an open source platform based on Apache Hadoop for analyzing, storing and managing big data
- Loaded and transformed large sets of structured, semi-structured and unstructured data
- Responsible for managing data coming from different sources
- Imported and exported data into HDFS and Hive using Sqoop
- Wrote Hive queries
- Involved in loading data from UNIX file system to HDFS
- Created Hive tables, loaded them with data and wrote queries that run internally as MapReduce jobs, and performed data analysis as per the business requirements
- Worked with analysts to determine and understand business requirements
- Loaded and transformed large datasets of structured, semi structured and unstructured data using Hadoop/Big Data concepts
- Developed data pipeline using Flume, Sqoop, Pig and MapReduce to ingest customer data and financial histories into HDFS for analysis
- Used MapReduce and Flume to load, aggregate, store and analyze web log data from different web servers
- Created MapReduce programs to handle semi/unstructured data like XML, JSON, AVRO data files and sequence files for log files
- Involved in submitting and tracking MapReduce jobs using Job Tracker
- Experience writing Pig Latin scripts for data cleansing, ETL operations and query optimization of existing scripts
- Wrote a Hive UDF to sort struct fields and return complex data types
- Created Hive tables from JSON data using a data serialization framework like Avro
- Experience writing reusable custom Hive and Pig UDFs in Java and using existing UDFs from Piggybank and other sources
- Experience in working with NoSQL database HBase in getting real time data analytics
- Integrated Hive tables to HBase to perform row level analytics
- Tuned and troubleshot MapReduce jobs by analyzing and reviewing Hadoop log files
- Developed unit test cases for Mapper, Reducer and Driver classes using the MR testing library
- Supported the operations team in Hadoop cluster maintenance, including commissioning and decommissioning nodes and upgrades
- Provided technical assistance to all development projects
- Hands-on experience with Qlik Sense for Data Visualization and Analysis on large data sets, drawing various insights
- Created dashboards using Qlik Sense and performed Data extracts, Data blending, Forecasting, and table calculations
Environment: Hortonworks, Java, Hadoop, HDFS, MapReduce, Hive, Pig, Oozie, Sqoop, Flume, Netezza, Qlik Sense
Confidential
Java Developer
Responsibilities:
- Built the application based on Rational Unified Process (RUP)
- Analyzed and developed UML diagrams with Rational Rose, including class diagrams, sequence diagrams, use case diagrams and activity diagrams
- Implemented the middle tier employing design patterns like MVC, Business Delegate, Service Locator, Session Façade and Data Access Objects (DAOs)
- Developed using MVC architecture, employing the Struts Framework with the Validator Framework and Tiles Framework as Struts plug-ins
- Developed user interface using JSP, JSP Tag libraries (JSTL) and Struts Tag Libraries
- Used EJBs in the application and developed Session beans to house business logic at the middle tier level
- Used Java Message Service (JMS) for reliable and asynchronous exchange of important information
- Used Hibernate in data access layer to access and update the information in database
- Implemented various XML technologies like XML schemas, JAXB parsers for cross platform data transfer
- Used JSON to pass objects between web pages and server-side application
- Used XSL-FO to generate PDF reports
- Extensively worked on XML parsers (SAX/DOM)
- Used WSDL and SOAP protocol for Web Services implementation
- Used JDBC to access DB2 UDB database for accessing customer information
- Developed application level logging using Log4J
- Used CVS for version control and JUnit for unit testing
- Involved in development of Tables, Indices, Stored procedures, Database Triggers and Functions
- Involved in documenting the application
Environment: J2EE 1.7, WebSphere Application Server v8.0, RAD, JSP 2.0, EJB 3.1, Struts 2.0, JMS, JSON, JDBC, JNDI, XML, XSL, XSLT, XSL-FO, WSDL, SOAP, Hibernate 4.0, RUP, Rational Rose (2000), Log4J, JUnit, CVS, IBM DB2 v8.2, Red Hat Linux, RESTful web services
