Sr. Hadoop/Spark Developer Resume
West Des Moines, IA
SUMMARY:
- 8+ years of experience in Information Technology, with a major concentration on Big Data tools and technologies, various relational and NoSQL databases, the Java programming language and J2EE technologies, following highly recommended software practices.
- 4+ years of experience with the Hadoop Distributed File System (HDFS), Impala, Sqoop, Hive, HBase, Spark, Hue, the MapReduce framework, Kafka, YARN, Flume, Oozie, ZooKeeper and Pig.
- Hands-on experience with various components of the Hadoop ecosystem such as JobTracker, TaskTracker, NameNode, DataNode, ResourceManager and Application Manager.
- Good knowledge of AWS infrastructure services, including Amazon Simple Storage Service (Amazon S3), Amazon EMR and Amazon Elastic Compute Cloud (Amazon EC2).
- Experience in working with Amazon EMR, Cloudera (CDH3 & CDH4) and Hortonworks Hadoop Distributions.
- Involved in loading structured and semi-structured data into Spark clusters using the Spark SQL and DataFrames APIs.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (a minimal Scala sketch follows this list).
- Experience in implementing real-time event processing and analytics using streaming systems such as Spark Streaming.
- Capable of creating real-time data streaming solutions and batch-style, large-scale distributed computing applications using Apache Spark, Spark Streaming, Kafka and Flume.
- Experience in analyzing data using Spark SQL, HiveQL, Pig Latin, Spark/Scala and custom MapReduce programs in Java.
- Have experience in Apache Spark, Spark Streaming, Spark SQL and NoSQL databases like HBase, Cassandra, and MongoDB.
- Experience in creating DStreams from sources such as Flume and Kafka and performing different Spark transformations and actions on them.
- Experience in integrating Apache Kafka with Apache Storm and created Storm data pipelines for real time processing.
- Experience creating scripts for data modeling and for data import and export. Extensive experience in deploying, managing and developing MongoDB clusters.
- Implemented custom Kafka encoders for custom input formats to load data into Kafka partitions; streamed data in real time using Spark with Kafka for faster processing.
- Experience in developing data pipelines that use Kafka to store data into HDFS.
- Good experience in creating and designing data ingestion pipelines using technologies such as Apache Storm and Kafka.
- Involved in converting Cassandra/Hive/SQL queries into Spark transformations using Spark RDD in Scala and Python.
- Expertise in working with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries.
- Experience with data extraction, transformation and loading in Hive, Pig and HBase.
- Hands-on experience in writing MapReduce programs in Java to handle different data sets using map and reduce tasks.
- Worked with join patterns and implemented map-side joins and reduce-side joins using MapReduce.
- Orchestrated various Sqoop queries, Pig scripts, Hive queries using Oozie workflows and sub-workflows.
- Responsible for handling different data formats like Avro, Parquet and ORC formats.
- Experience in performance tuning and monitoring of the Hadoop cluster, gathering and analyzing the existing infrastructure using Cloudera Manager.
- Knowledge of job workflow scheduling and monitoring tools like Oozie (Hive, Pig) and Zookeeper (Hbase).
- Dealt with huge transaction volumes while interfacing with the front-end application written in Java, JSP, Struts, Hibernate and SOAP web services, deployed on the Tomcat web server.
- Extensive knowledge of cloud-based technologies on Amazon Web Services (AWS): VPC, EC2, Route 53, S3, DynamoDB, ElastiCache, Glacier, RRS, CloudWatch, CloudFront, Kinesis, Redshift, SQS, SNS and RDS.
- Hands-on experience in developing applications with Java, J2EE (Servlets, JSP, EJB), SOAP web services, JNDI, JMS, JDBC, Hibernate, Struts, Spring, XML, HTML, XSD, XSLT, PL/SQL, Oracle 10g and the MS SQL Server RDBMS.
- Delivered zero defect code for three large projects which involved changes to both front end (Core Java, Presentation services) and back-end (Oracle).
- Experience with all stages of the SDLC and the Agile development model, from requirements gathering through deployment and production support.
- Involved in daily Scrum meetings to discuss development progress and was active in making the meetings more productive.
- Experience in understanding existing systems and providing maintenance and production support on technologies such as Java, J2EE and various databases (Oracle, SQL Server).
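
The Hive-to-Spark conversion work noted above can be illustrated with a minimal Scala sketch; the sales.transactions table, its columns and the output table are hypothetical names used only for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveToSparkSketch {
  def main(args: Array[String]): Unit = {
    // Hive support lets Spark read tables registered in the Hive metastore.
    val spark = SparkSession.builder()
      .appName("hive-to-spark-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Equivalent of: SELECT state, SUM(amount) AS total_amount
    //                FROM sales.transactions WHERE year = 2017 GROUP BY state
    val totalsByState = spark.table("sales.transactions")   // hypothetical Hive table
      .filter(col("year") === 2017)
      .groupBy("state")
      .agg(sum("amount").as("total_amount"))

    totalsByState.write.mode("overwrite").saveAsTable("sales.state_totals")
    spark.stop()
  }
}
```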
TECHNICAL SKILLS:
Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, YARN, Kafka, Flume, Sqoop, Impala, Oozie, Zookeeper, Spark, Ambari, Mahout, MongoDB, Cassandra, Avro, Storm, Parquet and Snappy.
Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapR and Apache
Languages: Java, Python, JRuby, SQL, HTML, DHTML, Scala, JavaScript, XML and C/C++
NoSQL Databases: Cassandra, MongoDB and HBase
Java Technologies: Servlets, JavaBeans, JSP, JDBC, JNDI, EJB and Struts
XML Technologies: XML, XSD, DTD, JAXP (SAX, DOM), JAXB
Development Methodology: Agile, waterfall
Web Design Tools: HTML, DHTML, AJAX, JavaScript, jQuery, CSS, AngularJS, ExtJS and JSON
Development / Build Tools: Eclipse, Ant, Maven, IntelliJ, JUnit and Log4j
Frameworks: Struts, Spring and Hibernate
App/Web servers: WebSphere, WebLogic, JBoss and Tomcat
DB Languages: MySQL, PL/SQL, PostgreSQL and Oracle
RDBMS: Teradata, Oracle 9i/10g/11g, MS SQL Server, MySQL and DB2
Operating systems: UNIX, LINUX, Mac OS and Windows Variants
Data analytical tools: R and MATLAB
ETL Tools: Talend, Informatica, Pentaho
PROFESSIONAL EXPERIENCE:
Confidential, West Des Moines, IA.
Sr. Hadoop/Spark Developer
Responsibilities:
- Hands-on experience with Spark and Spark Streaming, creating RDDs and applying transformations and actions on them.
- Developed Spark applications using Scala for easy Hadoop transitions.
- Used Spark and Spark SQL to read Parquet data and create Hive tables using the Scala API.
- Performed advanced procedures such as text analytics and processing using Spark's in-memory computing capabilities with Scala.
- Developed Spark code using Scala and Spark SQL for faster processing and testing.
- Implemented sample Spark programs in Python using PySpark.
- Analyzed SQL scripts and designed solutions to implement them using PySpark.
- Developed PySpark code to mimic the transformations performed in the on-premises environment.
- Used Spark Streaming APIs to perform the necessary transformations and actions on the fly for building the common learner data model, which receives data from Kafka in near real time.
- Responsible for loading data pipelines from web servers and Teradata using Sqoop with the Kafka and Spark Streaming APIs.
- Developed Kafka producers and consumers, Cassandra clients and Spark components on HDFS and Hive (a hedged producer sketch follows this list).
- Populated HDFS and HBase with huge amounts of data using Apache Kafka.
- Used Kafka to ingest data into Spark engine.
- Configured, deployed and maintained multi-node Dev and Test Kafka clusters.
- Managed and scheduled Spark jobs on a Hadoop cluster using Oozie.
- Experienced with scripting languages such as Python and shell scripting.
- Developed various Python scripts to find vulnerabilities with SQL Queries by doing SQL injection, permission checks and performance analysis.
- Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs.
- Designed and implemented incremental imports into Hive tables and wrote Hive queries to run on Tez.
- Built data pipelines using Kafka and Akka for handling terabytes of data.
- Wrote shell scripts that run multiple Hive jobs to incrementally refresh different Hive tables, which are used to generate reports in Tableau for business use.
- Experienced in Apache Spark for implementing advanced procedures such as text analytics and processing, using its in-memory computing capabilities with code written in Scala.
- Developed Solr web apps to query and visualize Solr-indexed data from HDFS.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDD, Scala and Python.
- Worked on Spark SQL, creating DataFrames by loading data from Hive tables, preparing the data and storing it in AWS S3.
- Used Spark Streaming APIs to perform transformations and actions on the fly for building the common learner data model, which receives data from Kafka in near real time and persists it into Cassandra (a hedged streaming sketch follows this list).
- Created custom UDFs for Pig and Hive to incorporate Python methods and functionality into Pig Latin and HiveQL.
- Extensively worked on Text, ORC, Avro and Parquet file formats and compression techniques like Snappy, Gzip and Zlib.
- Implemented NiFi on Hortonworks (HDP 2.4) and recommended solutions to ingest data from multiple data sources into HDFS and Hive using NiFi.
- Developed various data loading strategies and performed various transformations for analyzing the datasets by using Hortonworks Distribution for Hadoop ecosystem.
- Ingested data from RDBMS sources, performed data transformations, exported the transformed data to Cassandra as per the business requirements and accessed Cassandra through Java services.
- Experience with NoSQL column-oriented databases such as Cassandra and their integration with the Hadoop cluster.
- Wrote ETL jobs to read from web APIs using REST and HTTP calls and loaded the data into HDFS using Java and Talend.
- Worked with the infrastructure team on the design and development of a Kafka- and Storm-based data pipeline.
- Created partitions and buckets based on state for further processing using bucket-based Hive joins.
- Worked on SequenceFiles, RCFiles, map-side joins, bucketing and partitioning for Hive performance enhancement and storage improvement, and utilized Hive SerDes such as RegEx, JSON and Avro.
- Used Oozie Operational Services for batch processing and scheduling workflows dynamically.
- Worked entirely in an Agile methodology and developed Spark scripts using the Scala shell.
- Involved in loading and transforming large data sets from relational databases into HDFS and vice versa using Sqoop imports and exports.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS and moved data between MySQL and HDFS using Sqoop.
- Used Hibernate ORM framework with Spring framework for data persistence and transaction management.
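
A minimal Scala sketch of a Kafka producer of the kind referenced above; the broker address, the learner-events topic and the record payload are assumptions for illustration.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object LearnerEventProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    // Broker list and serializers are illustrative; real values come from the cluster configuration.
    props.put("bootstrap.servers", "kafka-broker-1:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    try {
      // Publish one record to a hypothetical "learner-events" topic.
      producer.send(new ProducerRecord[String, String]("learner-events", "event-key", """{"id": 1}"""))
      producer.flush()
    } finally {
      producer.close()
    }
  }
}
```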
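A hedged Scala sketch of the Kafka-to-Spark Streaming-to-Cassandra flow described above, assuming the spark-streaming-kafka-0-10 integration and the DataStax Spark-Cassandra connector; the topic, keyspace, table and LearnerEvent fields are placeholders.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._
import org.apache.kafka.common.serialization.StringDeserializer
import com.datastax.spark.connector._

object KafkaToCassandraStream {
  // Hypothetical record shape; the real learner data model has more fields.
  case class LearnerEvent(id: String, payload: String)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("kafka-to-cassandra-sketch")
      .set("spark.cassandra.connection.host", "cassandra-host") // illustrative host
    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "kafka-broker-1:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "learner-model")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("learner-events"), kafkaParams))

    // Transform each micro-batch and persist it to a Cassandra table (keyspace and table are placeholders).
    stream.map(record => LearnerEvent(record.key, record.value))
      .foreachRDD(rdd => rdd.saveToCassandra("learner_ks", "events"))

    ssc.start()
    ssc.awaitTermination()
  }
}
```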
Environment: Hadoop, Hive, MapReduce, Sqoop, Kafka, Spark, YARN, Pig, Cassandra, Oozie, Shell scripting, Scala, Maven, Java, JUnit, Agile methodologies, NiFi, MySQL, Tableau, AWS, EC2, S3, Hortonworks, Power BI, Solr.
Confidential, Hilmar, CA.
Hadoop/Spark Developer
Responsibilities:
- Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs.
- Developed Spark scripts using Java and Python shell commands as per the requirements.
- Ingested data received from various relational database providers onto HDFS for analysis and other big data operations.
- Experienced in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism and memory tuning.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Worked on Spark SQL and DataFrames for faster execution of Hive queries using the Spark SQLContext.
- Performed analysis on implementing Spark using Scala.
- Responsible for creating, modifying topics (Kafka Queues) as and when required with varying configurations involving replication factors and partitions.
- Extracted files from MongoDB through Sqoop, placed them in HDFS and processed them.
- Created and imported various collections and documents into MongoDB and performed various actions such as query, project, aggregation, sort and limit.
- Experience creating scripts for data modeling and for data import and export. Extensive experience in deploying, managing and developing MongoDB clusters.
- Experience in migrating HiveQL into Impala to minimize query response time.
- Created Hive tables to import large data sets from various relational databases using Sqoop and exported the analyzed data back for visualization and report generation by the BI team.
- Involved in creating shell scripts to simplify the execution of all other scripts (Pig, Hive, Sqoop, Impala and MapReduce) and to move data into and out of HDFS.
- Collected data from various Flume agents deployed on different servers using multi-hop flows.
- Used Flume to collect log data from different resources and transferred the data to Hive tables using different SerDes to store it in JSON, XML and SequenceFile formats.
- Developed Scala scripts and UDFs using DataFrames/SQL/Datasets and RDDs/MapReduce in Spark 1.6 for data aggregation, queries and writing data back into the OLTP system through Sqoop (a hedged Spark 1.6 sketch follows this list).
- Developed Pig Scripts, Pig UDFs and Hive Scripts, Hive UDFs to analyze HDFS data.
- Maintained the cluster securely using Kerberos and kept the cluster up and running at all times.
- Implemented optimization and performance testing and tuning of Hive and Pig.
- Developed a data pipeline using Kafka to store data into HDFS.
- Worked on reading multiple data formats on HDFS using Scala.
- Wrote shell scripts and Python scripts for job automation.
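
A minimal Spark 1.6-style Scala sketch of the DataFrame and UDF work described above; the staging.orders table, the status-normalizing UDF and the output table are hypothetical, and the actual Sqoop export to the OLTP system would run as a separate downstream step.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.functions.udf

object OrderAggregationSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("spark16-dataframe-sketch"))
    // HiveContext was the entry point for Hive tables in Spark 1.6.
    val sqlContext = new HiveContext(sc)
    import sqlContext.implicits._

    // Hypothetical UDF that normalizes a free-text status column.
    val normalizeStatus = udf((s: String) => if (s == null) "UNKNOWN" else s.trim.toUpperCase)

    val orders = sqlContext.table("staging.orders")          // illustrative Hive table
      .withColumn("status", normalizeStatus($"status"))

    val dailyCounts = orders.groupBy($"order_date", $"status").count()

    // Stage the aggregate back to Hive; a Sqoop export would push it to the OLTP system downstream.
    dailyCounts.write.mode("overwrite").saveAsTable("staging.daily_order_counts")
    sc.stop()
  }
}
```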
Environment: Cloudera, HDFS, Hive, HQL scripts, MapReduce, Java, HBase, Pig, Sqoop, Kafka, Impala, Shell scripts, Python scripts, Spark, Scala, Oozie.
Confidential, Utica, NY
Hadoop Developer
Responsibilities:
- Collected and aggregated large amounts of web log data from different sources such as web servers, mobile and network devices using Apache Flume and stored the data into HDFS for analysis.
- Collecting data from various Flume agents that are imported on various servers using Multi-hop flow.
- Ingest real-time and near-real time (NRT) streaming data into HDFS using Flume.
- Extensively involved in the installation and configuration of the Cloudera distribution of Hadoop: NameNode, Secondary NameNode, JobTracker, TaskTrackers and DataNodes.
- Developed MapReduce programs in Java and used Sqoop to import data from the Oracle database.
- Responsible for building scalable distributed data solutions using Hadoop. Wrote various Hive and Pig scripts.
- Used various HBase commands and generated different Datasets as per requirements and provided access to the data when required using grant and revoke.
- Created HBase tables to store variable data formats of input data coming from different portfolios (a hedged HBase client sketch follows this list).
- Worked on HBase to support enterprise production and loaded data into HBase using Sqoop.
- Installed Oozie workflow engine to run multiple Hive and Pig jobs which run independently with time and data availability.
- Expertise in understanding Partitions, Bucketing concepts in Hive.
- Experience working with Apache Solr for indexing and querying.
- Created custom Solr query segments to optimize ideal search matching.
- Used the Oozie scheduler system to automate the pipeline workflow and orchestrate the MapReduce jobs that extract the data in a timely manner. Responsible for loading data from the UNIX file system to HDFS.
- Developed a suite of unit test cases for Mapper, Reducer and Driver classes using the MRUnit testing library.
- Analyzed the web log data using HiveQL and integrated Oozie with the rest of the Hadoop stack.
- Utilized cluster coordination services through ZooKeeper.
- Worked on the Ingestion of Files into HDFS from remote systems using MFT.
- Gained good experience with various NoSQL databases and comprehensive knowledge of process improvement, normalization/de-normalization, data extraction, data cleansing and data manipulation.
- Developed Pig scripts to convert the data from Text file to Avro format.
- Created Partitioned Hive tables and worked on them using HiveQL.
- Developed Shell scripts to automate routine DBA tasks.
- Used Maven extensively for building jar files of MapReduce programs and deployed to Cluster.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups and Hadoop log files.
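
A hedged Scala sketch using the HBase 1.x client API for the table-creation and loading work described above; the portfolio_events table, the d column family and the row-key format are illustrative assumptions.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, HColumnDescriptor, HTableDescriptor, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes

object PortfolioHBaseLoader {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()              // picks up hbase-site.xml from the classpath
    val connection = ConnectionFactory.createConnection(conf)
    try {
      val admin = connection.getAdmin
      val tableName = TableName.valueOf("portfolio_events")   // illustrative table name

      // Create the table with a single column family if it does not exist yet.
      if (!admin.tableExists(tableName)) {
        val descriptor = new HTableDescriptor(tableName)
        descriptor.addFamily(new HColumnDescriptor("d"))
        admin.createTable(descriptor)
      }

      // Write one row; the row key and column are placeholders for the real portfolio feed.
      val table = connection.getTable(tableName)
      val put = new Put(Bytes.toBytes("portfolio1#2016-01-01"))
      put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("raw_json"), Bytes.toBytes("""{"value": 42}"""))
      table.put(put)
      table.close()
    } finally {
      connection.close()
    }
  }
}
```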
Environment: HDFS, MapReduce, Pig, Hive, Oozie, Sqoop, Flume, HBase, Java, Maven, Avro, Cloudera, Eclipse and Shell scripting.
Confidential, Peoria, IL.
Hadoop Developer
Responsibilities:
- Experience in Importing and exporting data into HDFS and Hive using Sqoop.
- Developed Flume Agents for loading and filtering the streaming data into HDFS.
- Experienced in handling data from different data sets, join them and preprocess using Pig join operations.
- Moved bulk data into HBase using MapReduce integration.
- Developed MapReduce programs to clean and aggregate the data.
- Developed an HBase data model on top of HDFS data to perform real-time analytics using the Java API.
- Developed different kinds of custom filters and handled predefined filters on HBase data using the API (a hedged scan-and-filter sketch follows this list).
- Strong understanding of Hadoop eco system such as HDFS, MapReduce, HBase, Zookeeper, Pig, Hadoop streaming, Sqoop, Oozie and Hive.
- Implemented counters on HBase data to count total records on different tables.
- Experienced in handling Avro data files by passing schemas into HDFS using Avro tools and MapReduce.
- Worked on custom Pig Loaders and Storage classes to work with a variety of data formats such as JSON, Compressed CSV, etc.
- Implemented secondary sorting to sort reducer output globally in Mapreduce.
- Implemented a data pipeline by chaining multiple mappers using ChainMapper.
- Created Hive dynamic partitions to load time-series data.
- Experienced in handling different types of joins in Hive, such as map joins, bucket map joins and sorted bucket map joins.
- Created tables, partitions and buckets and performed analytics using Hive ad-hoc queries.
- Experienced in importing/exporting data between HDFS/Hive and relational databases, including Teradata, using Sqoop.
- Handled continuous streaming data coming from different sources using Flume, with HDFS set as the destination.
- Integrated Spring schedulers with the Oozie client as beans to handle cron jobs.
- Experience with the CDH distribution and Cloudera Manager to manage and monitor Hadoop clusters.
- Actively participated in software development lifecycle (scope, design, implement, deploy, test), including design and code reviews.
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
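
A minimal Scala sketch of scanning HBase with a predefined filter, as referenced above; the accounts table, the d column family and the ACTIVE status value are hypothetical.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Scan}
import org.apache.hadoop.hbase.filter.{CompareFilter, SingleColumnValueFilter}
import org.apache.hadoop.hbase.util.Bytes
import scala.collection.JavaConverters._

object ActiveAccountScan {
  def main(args: Array[String]): Unit = {
    val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
    try {
      val table = connection.getTable(TableName.valueOf("accounts"))   // illustrative table

      // Keep only rows whose d:status column equals "ACTIVE"; skip rows missing the column.
      val filter = new SingleColumnValueFilter(
        Bytes.toBytes("d"), Bytes.toBytes("status"),
        CompareFilter.CompareOp.EQUAL, Bytes.toBytes("ACTIVE"))
      filter.setFilterIfMissing(true)

      val scan = new Scan()
      scan.setFilter(filter)

      val scanner = table.getScanner(scan)
      try {
        for (result <- scanner.iterator().asScala) {
          val status = Bytes.toString(result.getValue(Bytes.toBytes("d"), Bytes.toBytes("status")))
          println(s"${Bytes.toString(result.getRow)} -> $status")
        }
      } finally {
        scanner.close()
      }
      table.close()
    } finally {
      connection.close()
    }
  }
}
```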
Environment: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Sqoop, RDBMS/DB, flat files, MySQL, CSV, Avro data files.
Confidential
Java Developer
Responsibilities:
- Implemented applications using Java, J2EE, JSP, Servlets, JDBC, RAD, XML, HTML, XHTML, Hibernate, Struts, Spring and JavaScript in Windows environments.
- Experienced in developing web-based applications using Python, Django, PHP, XML, CSS, HTML, JavaScript and jQuery.
- Designed and implemented the training and reports modules of the application using Servlets, JSP and Ajax.
- Developed XML Web Services using SOAP, WSDL, and UDDI.
- Created the UI tool using Java, XML, XSLT, DHTML and JavaScript.
- Experience with the full SDLC and involvement in all of its phases.
- Developed action Servlets and JSPs for presentation in Struts MVC framework.
- Worked with Struts MVC objects like Action Servlet, Controllers, validators, Web Application Context, Handler Mapping, Message Resource Bundles, Form Controller, and JNDI for look-up for J2EE components.
- Developed a PL/SQL view function in the Oracle 9i database for the get-available-date module.
- Used Oracle SQL 4.0 as the database and wrote SQL queries in the DAO layer.
- Experience building applications using Core Java, JDBC, JSP, Servlets, Spring, Hibernate, web services, SOAP and WSDL.
- Used RESTful services to interact with the client by providing RESTful URL mappings.
- Implemented the project using the Agile Scrum methodology; involved in daily stand-up meetings, sprint showcases and sprint retrospectives.
- Used SVN and GitHub as version control tools.
- Implemented Hibernate in the data access object layer to access and update information in the Oracle 10g Database.
- Developed the presentation layer using HTML, JSP, Ajax, CSS and jQuery.
- Used JIRA to track test results and interacted with developers to resolve issues.
- Used XSLT to transform XML data structures into HTML pages.
- Deployed EJB components on Tomcat. Used the JDBC API for interaction with the Oracle DB.
- Wrote build and deployment scripts using shell, Perl and Ant scripts.
- Extensively used Java multithreading to implement batch jobs with JDK 1.5 features.
Environment: HTML, JavaScript, Ajax, Servlets, JSP, SOAP, SDLC, Java, Hibernate, Scrum, JIRA, GitHub, jQuery, CSS, XML, Ant, Tomcat Server, Jasper Reports.
Confidential
Jr. Java Developer
Responsibilities:
- Actively involved from the very start of the project, from requirements gathering through quality assurance testing.
- Coded and Developed Multi-tier architecture in Java, J2EE, Servlets.
- Conducted analysis, requirements study and design according to various design patterns, and developed according to the use cases, taking ownership of the features.
- Used various design patterns such as Command, Abstract Factory, Factory and Singleton to improve system performance.
- Analyzed critical coding defects and developed solutions.
- Developed a configurable front end using Struts technology. Also involved in component-based development of certain features that were reusable across modules.
- Designed, developed and maintained the data layer using the Hibernate ORM framework.
- Used the Hibernate framework for the persistence layer; involved in writing stored procedures for data retrieval, storage and updates in the Oracle database using Hibernate.
- Developed batch jobs that run at specified times to implement certain logic on the Java platform.
- Developed and deployed archive files (EAR, WAR, JAR) using the Ant build tool.
- Used software development best practices for object-oriented design and methodologies throughout the object-oriented development cycle.
- Responsible for developing the SQL queries required for JDBC.
- Designed the database, worked on DB2 and executed DDLs and DMLs.
- Actively participated in architecture framework design, coding and test plan development.
- Strictly followed the Waterfall development methodology for implementing projects.
- Thoroughly documented the detailed process flow with UML diagrams and flow charts for distribution across various teams.
- Involved in developing training presentations for developers (offshore support), QA and production support.
- Presented the process logical and physical flow to various teams using PowerPoint and Visio diagrams.
Environment: Java, Ajax, Informatica PowerCenter 8.x/9.x, REST API, SOAP API, Apache, Oracle 10g/11g, SQL Loader, MySQL Server, flat files, Targets, Aggregator, Router, Sequence Generator.