Sr. Hadoop Developer Resume IL - Hire IT People

PROFESSIONAL SUMMARY:

Over 8 years of IT industry experience in product development, implementation and maintenance of various cloud-based web applications using Java, J2EE technologies and Big Data ecosystems in Linux environments
Over 4 years of experience working with analytics using Big Data technologies, with hands-on experience in storing, querying, processing and analyzing data
Comprehensive work experience in implementing Big Data projects using Apache Hadoop, Pig, Hive, HBase, Spark, Sqoop, Flume, ZooKeeper, Oozie
Experience with distributed systems, large-scale non-relational data stores and multi-terabyte data warehouses
Excellent knowledge of Hadoop architecture: Hadoop Distributed File System (HDFS), JobTracker, TaskTracker, NameNode, DataNode and the MapReduce programming paradigm
Hands-on experience building data pipelines using Hadoop components Sqoop, Hive, Pig, MapReduce, Spark, Spark SQL
Hands-on experience in various Big Data application phases like Data Ingestion, Data Analytics and Data Visualization
Experience in developing efficient solutions to analyze large data sets
Experience working on Hortonworks / Cloudera / MapR distributions
Extensively worked on MRv1 and MRv2 Hadoop architectures
Experience working on Spark, RDDs, DAGs, Spark SQL and Spark Streaming
Experience in importing and exporting data using Sqoop between HDFS and Relational Database Management Systems
Populated HDFS with huge amounts of data using Apache Kafka and Flume
Excellent knowledge of data mapping, extracting, transforming and loading from different data sources
Worked with different File Formats like TEXTFILE, SEQUENCE FILE, AVROFILE, ORC, and PARQUET for Hive querying and processing
Experience in developing custom MapReduce Programs in Java using Apache Hadoop for analyzing Big Data as per the requirement
Well experienced in data transformation using custom MapReduce, Hive and Pig scripts for different types of file formats
Expertise in extending Hive and Pig core functionality by writing custom UDFs and UDAFs
Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning and bucketing
Experience building solutions with NoSQL databases, such as HBase, Cassandra, MongoDB
Firm grip on data modeling, data mapping, database performance tuning and NoSQL map-reduce systems
In-depth understanding of Spark architecture including Spark Core, Spark SQL, DataFrames, and Spark Streaming
Hands on experience migrating complex MapReduce programs into Apache Spark RDD transformations
Experienced in using Apache Spark's in-memory computing capabilities, with code written in Scala, to implement advanced procedures such as text analytics and processing
Experience in Kafka installation & integration with Spark Streaming
Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing
Experience in designing both time driven and data driven automated workflows using Oozie
Good understanding of ZooKeeper for monitoring and managing Hadoop jobs
Monitored MapReduce jobs and YARN applications
Hands-on experience with Amazon Elastic MapReduce (EMR), S3 storage, EC2 instances and data warehousing
Experience with RDBMS and writing SQL and PL/SQL scripts used in stored procedures
Used Git for source code and version control management
Proficient in Java, J2EE, JDBC, Collection Framework, Servlets, JSP, Spring, Hibernate, JSON, XML, REST, SOAP Web Services
Strong understanding of Agile and Waterfall SDLC methodologies

TECHNICAL SKILLS:

Big Data Technologies: HDFS, YARN, MapReduce, Pig, Hive, HBase, Spark, Spark SQL, Spark Streaming, Sqoop, Flume, Kafka, ZooKeeper, Oozie

Big Data Distributions: Hortonworks, Cloudera, MapR, Amazon Elastic MapReduce (EMR)

Programming Languages: Java, Python, Scala, C++, R, JavaScript, Shell Script

Operating Systems: Linux, Windows, Unix

RDBMS: Oracle, MySQL, MS SQL Server

NoSQL Databases: HBase, Cassandra, MongoDB

Frameworks: Spring, Hibernate, Struts

Web Servers: Apache Tomcat, WebSphere, WebLogic

Version Control: Git, SVN, CVS

Integrated Development Environments (IDEs): Java Eclipse IDE, NetBeans, Microsoft SQL Studio

Web Technologies: HTML, CSS, Bootstrap, JavaScript, DOM, XML, Servlets

PROFESSIONAL EXPERIENCE:

Confidential, IL

Sr. Hadoop Developer

Responsibilities:

Involved in complete project life cycle starting from design discussion to production deployment
Developed Spark applications using Scala utilizing Data frames and Spark SQL API for faster processing of data.
Developed highly optimized Spark applications to perform various data cleansing, validation, transformation and summarization activities according to the requirement
The data pipeline consisted of Spark, Hive, Sqoop and custom-built input adapters to ingest, transform and analyze operational data.
Developed Spark jobs and Hive Jobs to summarize and transform data.
Used Spark for interactive queries, processing of streaming data and integration with a popular NoSQL database for huge volumes of data.
Involved in converting Hive/SQL queries into Spark transformations using Spark Data Frames and Scala.
Analyzed the SQL scripts and designed the solution to implement using Scala.
Streamed data in real time using Spark with Kafka.
Created applications using Kafka that monitor consumer lag within Apache Kafka clusters; used in production by multiple report suites.
Ingested syslog messages, parsed them and streamed the data to Apache Kafka.
Imported data from different data sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and then loaded the data into HDFS.
Exported the analyzed data to the relational databases using Sqoop, to further visualize and generate reports for the BI team.
Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
Analyzed the data by running Hive queries (HiveQL) to study customer behaviour.
Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
Developed Hive scripts in HiveQL to de-normalize and aggregate the data.
Created HBase tables and column families to store the user event data.
Scheduled and executed workflows in Oozie to run various jobs.

Environment: Java, Scala, Hadoop, Hortonworks, AWS, HDFS, YARN, MapReduce, Hive, Pig, Spark, Flume, Kafka, Sqoop, Oozie, ZooKeeper, Oracle, Teradata, MySQL

Confidential, OH

Hadoop Developer

Responsibilities:

Worked on a cloud platform built as a scalable distributed data solution using Hadoop on a 40-node AWS cluster to run analysis on 25+ terabytes of customer usage data.
Worked on analyzing the Hadoop stack and different big data analytics tools including Pig, Hive, HBase and Sqoop.
Designed and implemented a semi-structured data analytics platform leveraging Hadoop.
Worked on performance analysis and improvements for Hive and Pig scripts at MapReduce job tuning level.
Installed and configured the Hadoop cluster; worked with the Cloudera support team to fine-tune the cluster.
Developed a custom file system plugin for Hadoop so it can access files on Hitachi Data Platform.
Developed connectors for Elasticsearch and Greenplum for data transfer from a Kafka topic.
Performed data ingestion from multiple internal clients using Apache Kafka. Developed Kafka Streams (KStream) applications using Java for real-time data processing.
Involved in optimization of Hive queries.
Developed a framework to load and transform large sets of unstructured data from UNIX systems into Hive tables.
Involved in Data Ingestion to HDFS from various data sources.
Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
Extensively used Apache Sqoop for efficiently transferring bulk data between Apache Hadoop and relational databases.
Automated Sqoop, Hive and Pig jobs using Oozie scheduling.
Extensive knowledge of NoSQL databases like HBase.
Worked extensively on importing metadata into Hive and migrating existing tables and applications to work on Hive and AWS cloud.
Responsible for continuous monitoring and management of the Elastic MapReduce (EMR) cluster through the AWS console.
Good knowledge of writing and using user-defined functions in Hive, Pig and MapReduce.
Helped the business team by installing and configuring Hadoop ecosystem components along with the Hadoop admin.
Developed multiple Kafka Producers and Consumers from scratch as per the business requirements.
Worked on loading log data into HDFS through Flume
Created and maintained technical documentation for executing Hive queries and Pig Scripts.
Worked on debugging and performance tuning of Hive and Pig jobs.
Used Oozie to schedule various jobs on the Hadoop cluster.
Used Hive to analyze the partitioned and bucketed data.
Worked on establishing connectivity between Tableau and Hive.

Environment: Hortonworks 2.4, Hadoop, HDFS, MapReduce, MongoDB, Cloudera, Java, VMware, Hive, Eclipse, Pig, HBase, AWS, Tableau, Sqoop, Flume, Linux, UNIX

Confidential, FL

Hadoop Developer

Responsibilities:

Worked on a Hortonworks cluster, which provided an open source platform based on Apache Hadoop for analyzing, storing and managing big data
Worked with analysts to determine and understand business requirements
Loaded and transformed large datasets of structured, semi-structured and unstructured data using Hadoop/Big Data concepts
Developed a data pipeline using Flume, Sqoop, Pig and MapReduce to ingest customer data and financial histories into HDFS for analysis
Used MapReduce and Flume to load, aggregate, store and analyze web log data from different web servers
Created MapReduce programs to handle semi-structured and unstructured data such as XML, JSON and Avro data files, and sequence files for log files
Involved in submitting and tracking MapReduce jobs using JobTracker
Wrote Pig Latin scripts for data cleansing, ETL operations and query optimization of existing scripts
Wrote a Hive UDF to sort struct fields and return complex data types
Created Hive tables from JSON data using a data serialization framework such as Avro
Wrote reusable custom Hive and Pig UDFs in Java and used existing UDFs from Piggybank and other sources
Worked with the NoSQL database HBase for real-time data analytics
Integrated Hive tables with HBase to perform row-level analytics
Developed Oozie workflows for daily incremental loads, which use Sqoop to pull data from Teradata and Netezza and then import it into Hive tables
Involved in performance tuning using different execution engines such as Tez
Carried out performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files
Implemented daily cron jobs that automate parallel tasks of loading data into HDFS, using AutoSys and Oozie coordinator jobs
Developed a suite of unit test cases for Mapper, Reducer and Driver classes using the MR testing library

Environment: Hortonworks, Java, Hadoop, HDFS, MapReduce, Tez, Hive, Pig, Oozie, Sqoop, Flume, Teradata, Netezza, Tableau

Confidential, CA

Hadoop Developer

Responsibilities:

Installed the Cloudera distribution of the Hadoop cluster and its services: HDFS, Pig, Hive, Sqoop, Flume and MapReduce
Responsible for providing an open source platform based on Apache Hadoop for analyzing, storing and managing big data
Loaded and transformed large sets of structured, semi-structured and unstructured data
Responsible for managing data coming from different sources
Imported and exported data into HDFS and Hive using Sqoop
Wrote Hive queries
Involved in loading data from UNIX file system to HDFS
Created Hive tables, loaded them with data and wrote queries that run internally as MapReduce jobs, and performed data analysis as per the business requirements
Worked with analysts to determine and understand business requirements
Loaded and transformed large datasets of structured, semi-structured and unstructured data using Hadoop/Big Data concepts
Developed a data pipeline using Flume, Sqoop, Pig and MapReduce to ingest customer data and financial histories into HDFS for analysis
Used MapReduce and Flume to load, aggregate, store and analyze web log data from different web servers
Created MapReduce programs to handle semi-structured and unstructured data such as XML, JSON and Avro data files, and sequence files for log files
Involved in submitting and tracking MapReduce jobs using JobTracker
Wrote Pig Latin scripts for data cleansing, ETL operations and query optimization of existing scripts
Wrote a Hive UDF to sort struct fields and return complex data types
Created Hive tables from JSON data using a data serialization framework such as Avro
Wrote reusable custom Hive and Pig UDFs in Java and used existing UDFs from Piggybank and other sources
Worked with the NoSQL database HBase for real-time data analytics
Integrated Hive tables with HBase to perform row-level analytics
Carried out performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files
Developed unit test cases for Mapper, Reducer and Driver classes using the MR testing library
Supported the operations team in Hadoop cluster maintenance, including commissioning and decommissioning nodes and upgrades
Provided technical assistance to all development projects
Hands-on experience with Qlik Sense for Data Visualization and Analysis on large data sets, drawing various insights
Created dashboards using Qlik Sense and performed Data extracts, Data blending, Forecasting, and table calculations

Environment: Hortonworks, Java, Hadoop, HDFS, MapReduce, Hive, Pig, Oozie, Sqoop, Flume, Netezza, Qlik Sense

Confidential

Java Developer

Responsibilities:

Built the application based on Rational Unified Process (RUP)
Analyzed and developed UML diagrams with Rational Rose, including class diagrams, sequence diagrams, use case diagrams and activity diagrams
Implemented the middle tier employing design patterns like MVC, Business Delegate, Service Locator, Session Façade and Data Access Objects (DAOs)
Developed using MVC architecture with the Struts framework, using the Validator and Tiles frameworks as plug-ins with Struts
Developed user interface using JSP, JSP Tag libraries (JSTL) and Struts Tag Libraries
Used EJBs in the application and developed session beans to house business logic at the middle-tier level
Used Java Message Service (JMS) for reliable and asynchronous exchange of important information
Used Hibernate in data access layer to access and update the information in database
Implemented various XML technologies like XML schemas, JAXB parsers for cross platform data transfer
Used JSON to pass objects between web pages and server-side application
Used XSL-FO to generate PDF reports
Extensively worked on XML parsers (SAX/DOM)
Used WSDL and SOAP protocol for Web Services implementation
Used JDBC to access DB2 UDB database for accessing customer information
Developed application level logging using Log4J
Used CVS for version control and JUnit for unit testing
Involved in development of Tables, Indices, Stored procedures, Database Triggers and Functions
Involved in documenting the application

Environment: J2EE 1.7, WebSphere Application Server v8.0, RAD, JSP 2.0, EJB 3.1, Struts 2.0, JMS, JSON, JDBC, JNDI, XML, XSL, XSLT, XSL-FO, WSDL, SOAP, Hibernate 4.0, RUP, Rational Rose (2000), Log4J, JUnit, CVS, IBM DB2 v8.2, Red Hat Linux, RESTful web services
