
Sr. Spark/Hadoop Developer Resume


Omaha, NE

SUMMARY

  • 8 years of professional IT experience in analyzing requirements and designing and building highly distributed, mission-critical products and applications.
  • Highly dedicated and results-oriented Hadoop developer with 4+ years of strong end-to-end Hadoop development experience across a variety of Big Data environments and projects.
  • Expertise in the core Hadoop technology stack, including HDFS, MapReduce, Oozie, Hive, Sqoop, Pig, Flume, HBase, Spark, Kafka, and ZooKeeper.
  • Experience with the Spark RDD architecture, implementing Spark operations on RDDs and optimizing transformations and actions in Spark.
  • Reviewed and managed Hadoop log files by consolidating logs from multiple machines using Flume.
  • Collected log data from web servers and integrated it into HDFS using Flume.
  • Wrote Flume configuration files for importing streaming log data into HBase.
  • Experience in importing and exporting data using Sqoop between HDFS and relational database systems/mainframes.
  • Experience in installation and setup of Kafka producers and consumers along with Kafka brokers and topics.
  • Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
  • Experienced in managing Hadoop cluster using Cloudera Manager Tool.
  • Knowledge of ETL methods for data extraction, transformation and loading in corporate-wide ETL Solutions and Data warehouse tools for reporting and data analysis.
  • Experience with Oozie Workflow Engine in running workflow jobs with actions that run Java MapReduce and Pig jobs.
  • Strong hands-on experience with PySpark, using the Spark libraries through Python scripting for data analysis.
  • Implemented data science algorithms such as shift detection on critical data points using Spark, doubling performance.
  • Extended Hive and Pig core functionality by writing custom UDFs.
  • Experience in analyzing data using HiveQL, Pig Latin, and custom Map Reduce programs in Java.
  • Experience in Apache Flume for efficiently collecting, aggregating, and moving large amounts of log data.
  • Involved in developing RESTful web services that use the HBase native client API to query data from HBase.
  • Experienced in working with structured data using HiveQL, join operations, Hive UDFs, partitions, bucketing, and internal/external tables.
  • Created custom UDFs for Pig and Hive to bring Python/Java methods and functionality into Pig Latin and HiveQL.
  • Involved in converting Cassandra/Hive/SQL queries into Spark transformations using Spark RDDs in Scala and Python (a PySpark RDD sketch appears after this summary).
  • Used a highly available AWS environment to launch applications in multiple regions and implemented CloudFront with AWS Lambda to reduce latency.
  • Implemented CRUD operations using CQL on top of the Cassandra file system.
  • Used Cassandra CQL with Java APIs to retrieve data from Cassandra tables.
  • Set up Solr for distributed indexing and search.
  • Used Solr to enable indexing and searching on non-primary-key columns from Cassandra keyspaces.
  • Excellent working Knowledge in Spark Core, Spark SQL, Spark Streaming.
  • Real-time exposure to Amazon Web Services, the AWS command line interface, and AWS Data Pipeline.
  • Work experience with cloud infrastructure like Amazon Web Services (AWS).
  • Extensive experience in working with various distributions of Hadoop, including the enterprise versions of Cloudera (CDH4/CDH5) and Hortonworks, with good knowledge of the MapR distribution, IBM BigInsights, and Amazon's EMR (Elastic MapReduce).
  • Designed and developed POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
  • Developed automated processes for flattening upstream Cassandra data that arrives in JSON format, using Hive UDFs to flatten the JSON (a JSON-flattening sketch appears after this summary).
  • Expertise in developing responsive Front-End components with JavaScript, JSP, HTML, XHTML, Servlets, Ajax, and AngularJS.
  • Experience as a Java Developer in Web/intranet, client/server technologies using Java, J2EE, Servlets, JSP, JSF, EJB, JDBC and SQL.
  • Experience in setting up automated monitoring and escalation infrastructure for Hadoop Cluster using Ganglia and Nagios.
  • Good understanding of Apache Hue.
  • Techno-functional responsibilities include interfacing with users, identifying functional and technical gaps, estimates, designing custom solutions, development, leading developers, producing documentation, and production support.
  • Proficient with version control systems such as GitHub and SVN.
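
The following is a minimal, illustrative PySpark sketch of rewriting a SQL-style aggregation as RDD transformations and actions, as referenced in the summary above; the dataset, column names, and aggregation are hypothetical and not taken from any specific project.

    # Minimal sketch: a GROUP BY/SUM expressed as RDD transformations and an action.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("rdd-aggregation-sketch").getOrCreate()
    sc = spark.sparkContext

    # Equivalent of: SELECT category, SUM(amount) FROM sales GROUP BY category
    sales = sc.parallelize([("books", 12.50), ("games", 30.00), ("books", 7.25)])

    totals = sales.reduceByKey(lambda a, b: a + b)   # transformation: aggregate per key

    print(totals.collect())                          # action: collect per-category totals
    spark.stop()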
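
Below is a hedged sketch of the JSON-flattening approach mentioned above, using the built-in Hive/Spark UDF get_json_object from PySpark; the JSON layout and field names are hypothetical.

    # Flatten nested JSON strings into columns with get_json_object (a built-in Hive UDF).
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import get_json_object

    spark = SparkSession.builder.appName("json-flatten-sketch").getOrCreate()

    raw = spark.createDataFrame(
        [('{"user": {"id": 7, "city": "Omaha"}, "score": 42}',)], ["json"])

    flat = raw.select(
        get_json_object("json", "$.user.id").alias("user_id"),
        get_json_object("json", "$.user.city").alias("city"),
        get_json_object("json", "$.score").alias("score"))

    flat.show()
    spark.stop()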

TECHNICAL SKILLS

Hadoop Distribution: Hortonworks, Cloudera (CDH3, CDH4, CDH5), Apache, Amazon AWS (EMR), MapR, and Azure.

Hadoop Data Services: Hadoop HDFS, MapReduce, YARN, Hive, Pig, Pentaho, HBase, ZooKeeper, Sqoop, Oozie, Cassandra, Spark, Scala, Storm, Flume, Kafka, Avro, Parquet, Snappy, NiFi.

Hadoop Operational Services: Zookeeper, Oozie

NO SQL Databases: HBase, Cassandra, MongoDB, Neo4j, Redis

Cloud Services: Amazon AWS

Languages: SQL, PL/SQL, Pig Latin, HiveQL, Unix Shell Scripting, HTML, XML (XSD, XSLT, DTD), C, C++, Java, JavaScript, Python, Scala

ETL Tools: Informatica, IBM DataStage, Talend

Java & J2EE Technologies: Core Java, Servlets, Hibernate, Spring, Struts, JSP, JDBC, EJB

Application Servers: WebLogic, WebSphere, Tomcat.

Databases: Oracle, MySQL, DB2, Teradata, MS SQL Server, SQL/NoSQL, HBase, Cassandra, Neo4j

Operating Systems: UNIX, Windows, iOS, LINUX

Methodologies: Agile (Scrum), Waterfall

Other Tools: Putty, WinSCP, Stream Weaver.

PROFESSIONAL EXPERIENCE

Confidential, Omaha, NE

Sr. Spark/Hadoop Developer

Responsibilities:

  • Extensively migrated existing architecture to Spark Streaming to process the live streaming data.
  • Responsible for Spark Core configuration based on the type of input source.
  • Executed Spark code using Scala for Spark Streaming/SQL for faster processing of data.
  • Performed SQL Joins among Hive tables to get input for Spark batch process.
  • Gathered the business requirements from the Business Partners and Subject Matter Experts.
  • Developed Python code to gather the data from HBase and designed the solution to implement it using PySpark.
  • Developed PySpark code to mimic the transformations performed in the on-premises environment.
  • Analyzed the SQL scripts and designed PySpark solutions to implement them; created new custom columns, depending on the use case, while ingesting the data into the Hadoop data lake using PySpark.
  • Developed an environmental search engine using Java, Apache Solr, and MySQL.
  • Analyzed the Cassandra database and compared it with other open-source NoSQL databases to find which one better suits the current requirement.
  • Integrated Cassandra as a distributed persistent metadata store to provide metadata resolution for network entities on the network.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Designed multiple Python packages that were used within a large ETL process to load 2 TB of data from an existing Oracle database into a new PostgreSQL cluster.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
  • Loaded data from the Linux file system to HDFS and vice versa.
  • Developed UDFs using both DataFrames/SQL and RDDs in Spark for data aggregation queries, and exported results back into the OLTP system through Sqoop.
  • Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
  • Installed and monitored Hadoop ecosystems tools on multiple operating systems like Ubuntu, CentOS.
  • Exported the analyzed patterns back into Teradata using Sqoop.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
  • Participated in the development and implementation of the Cloudera Impala Hadoop environment.
  • Utilized the Apache Hadoop environment provided by Cloudera.
  • Collected data using Spark Streaming and persisted it into the Cassandra cluster (a streaming sketch follows this list).
  • Developed Scala scripts using DataFrames/SQL/Datasets and RDDs/MapReduce in Spark for data aggregation and queries, writing data back into the OLTP system through Sqoop.
  • Extensively used ZooKeeper as a job scheduler for Spark jobs.
  • Extended Hive and Pig core functionality by writing custom UDFs.
  • Wrote Java code to format XML documents and uploaded them to the Solr server for indexing.
  • Used AWS to migrate MapReduce jobs into Spark RDD transformations.
  • Wrote Terraform templates for automation requirements in AWS services.
  • Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
  • Deployed and configured AWS EC2 for client websites moving from self-hosted services, for scalability purposes.
  • Worked with multiple teams to provision AWS infrastructure for development and production environments.
  • Designed and monitored a multi-datacenter Kafka cluster.
  • Designed number of partitions and replication factor for Kafka topics based on business requirements.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala, initially done using Python (PySpark).
  • Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and Zookeeper.
  • Experience with Kafka and Spark integration for real-time data processing.
  • Developed Kafka producer and consumer components for real-time data processing (a Kafka client sketch follows this list).
  • Hands-on experience setting up Kafka MirrorMaker for data replication across clusters.
  • Experience in configuring, designing, implementing, and monitoring Kafka clusters and connectors.
  • Tuned Oracle SQL using explain plans.
  • Manipulated, serialized, and modeled data in multiple formats such as JSON and XML.
  • Involved in setting up MapReduce v1 and MapReduce v2 (YARN).
  • Prepared Avro schema files for generating Hive tables.
  • Used Impala connectivity from the user interface (UI) and queried the results using Impala QL.
  • Worked on physical transformations of data model which involved in creating Tables, Indexes, Joins, Views and Partitions.
  • Involved in analysis, design, system architecture design, process interface design, and documentation.
  • Used Jira for bug tracking and Bit Bucket to check-in and checkout code changes.
  • Involved in Cassandra data modeling to create keyspaces and tables in a multi-datacenter DSE Cassandra database.
  • Utilized Agile and Scrum Methodology to help manage and organize a team of developers with regular code review sessions.
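
As referenced above, the following is a hedged Spark Structured Streaming sketch that consumes a Kafka topic and persists each micro-batch to Cassandra through the DataStax Spark Cassandra Connector; the broker, topic, keyspace, and table names are hypothetical, and the Kafka source and Cassandra connector packages are assumed to be on the Spark classpath.

    # Consume Kafka and write each micro-batch to Cassandra (illustrative names only).
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("stream-to-cassandra-sketch")
             .config("spark.cassandra.connection.host", "cassandra-node1")
             .getOrCreate())

    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092")
              .option("subscribe", "clickstream")
              .load()
              .selectExpr("CAST(key AS STRING) AS event_id",
                          "CAST(value AS STRING) AS payload"))

    def write_batch(batch_df, batch_id):
        # Append this micro-batch to the target Cassandra table.
        (batch_df.write
         .format("org.apache.spark.sql.cassandra")
         .options(keyspace="analytics", table="events")
         .mode("append")
         .save())

    events.writeStream.foreachBatch(write_batch).start().awaitTermination()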
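
A minimal sketch of Kafka producer and consumer components, assuming the kafka-python client; the topic name, broker address, and message format are hypothetical, and the production components may have been written against a different client.

    # Produce one message and consume it back (illustrative topic and broker names).
    from kafka import KafkaProducer, KafkaConsumer

    producer = KafkaProducer(bootstrap_servers="broker1:9092")
    producer.send("clickstream", b'{"event": "page_view", "user": 7}')
    producer.flush()

    consumer = KafkaConsumer(
        "clickstream",
        bootstrap_servers="broker1:9092",
        group_id="realtime-processor",
        auto_offset_reset="earliest")

    for message in consumer:
        print(message.offset, message.value)   # hand each record to downstream processing
        break                                  # stop after one record in this sketch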

Environment: Cloudera, Spark, Impala, Sqoop, Flume, Cassandra, Kafka, Hive, ZooKeeper, Oozie, RDBMS, AWS.

Confidential, San Francisco, CA

Sr. Spark/Hadoop Developer

Responsibilities:

  • Analyzed large and critical datasets using Cloudera, HDFS, HBase, MapReduce, Hive, Hive UDF, Pig, Sqoop, Zookeeper and Spark.
  • Responsible for managing data coming from different sources.
  • Developed Batch Processing jobs using Pig and Hive.
  • Involved in gathering the business requirements from the Business Partners and Subject Matter Experts.
  • Worked with different File Formats like TEXTFILE, AVROFILE, ORC, and PARQUET for HIVE querying and processing.
  • Importing and exporting data into HDFS and Hive using Sqoop.
  • Implemented Elastic Search on Hive data warehouse platform.
  • Good experience in analyzing the Hadoop cluster with different analytic tools such as Pig and Impala.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs (a short sketch follows this list).
  • Experienced in managing and reviewing Hadoop log files.
  • Extracted files from CouchDB through Sqoop, placed them in HDFS, and processed them.
  • Experienced in running Hadoop Streaming jobs to process terabytes of XML-format data.
  • Experienced in working with the Spark ecosystem, using Spark SQL and Scala queries on different formats such as text and CSV files.
  • Enabled concurrent access to Hive tables with shared and exclusive locking, turned on in Hive with the help of the ZooKeeper implementation in the cluster.
  • Stored and loaded data between HDFS and Amazon S3 and backed up the namespace data to NFS.
  • Implemented NameNode backup using NFS for high availability.
  • Designed workflows and coordinators in Oozie to automate and parallelize Hive jobs on the Apache Hadoop environment provided by Hortonworks (HDP 2.2).
  • Responsible for building scalable distributed data solutions using a Hadoop cluster environment with the Hortonworks distribution.
  • Integrated HiveServer2 with Tableau using the Hortonworks Hive ODBC driver for auto-generation of Hive queries by non-technical business users.
  • Troubleshot, managed, and reviewed data backups and Hadoop log files on the Hortonworks cluster.
  • Used Pig to perform data validation on the data ingested with Sqoop and Flume; the cleansed data set was pushed into MongoDB.
  • Ingested streaming data with Apache NiFi into Kafka.
  • Worked with NiFi to manage the flow of data from sources through automated data flows.
  • Designed and implemented the MongoDB schema.
  • Wrote services to store and retrieve user data from the MongoDB for the application on devices.
  • Used Mongoose API to access the MongoDB from NodeJS.
  • Created and implemented business validation and coverage price-gap rules on Hive using the Talend tool.
  • Wrote shell scripts to automate rolling day-to-day processes.
  • Wrote shell scripts to monitor Hadoop daemon services and respond to any warning or failure conditions.
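
A minimal sketch of loading data from S3 into an RDD and applying transformations and actions, as referenced above; the bucket, path, and record layout are hypothetical, and the Hadoop S3A credentials are assumed to be configured on the cluster.

    # Count ERROR records per host from tab-delimited logs stored in S3 (illustrative layout).
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("s3-rdd-sketch").getOrCreate()
    sc = spark.sparkContext

    lines = sc.textFile("s3a://example-bucket/logs/*.log")

    errors_by_host = (lines
                      .map(lambda line: line.split("\t"))                 # parse fields
                      .filter(lambda f: len(f) > 2 and f[2] == "ERROR")   # keep error records
                      .map(lambda f: (f[0], 1))                           # key by host
                      .reduceByKey(lambda a, b: a + b))                   # count per host

    print(errors_by_host.take(10))                                        # action triggers the job
    spark.stop()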

Environment: Apache Flume, Hive, Pig, HDFS, Zookeeper, Sqoop, RDBMS, AWS, MongoDB, Talend, Shell Scripts, Eclipse, WinSCP, Hortonworks.

Confidential, San Diego, CA

Hadoop Developer

Responsibilities:

  • Launched and set up the Hadoop cluster, which included configuring the different Hadoop components.
  • Hands on experience in loading data from UNIX file system to HDFS.
  • Wrote MapReduce jobs to parse the web logs that are stored in HDFS (a streaming-style sketch follows this list).
  • Managed the Hadoop distribution with Cloudera Manager, Cloudera Navigator, and Hue.
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Provided cluster coordination services through ZooKeeper.
  • Designed and implemented Hive queries and functions for evaluation, filtering, loading and storing of data.
  • Installed Oozie workflow engine to run multiple Hive and Pig jobs which run independently with time and data availability.
  • Applied partitioning and bucketing concepts in Hive and analyzed the data using HiveQL.
  • Installed and configured Flume, Hive, PIG, Sqoop and Oozie on the Hadoop cluster.
  • Involved in creating Hive tables, loading data, and running Hive queries on that data.
  • Extensive working knowledge of partitioned tables, UDFs, performance tuning, compression-related properties, and the Thrift server in Hive.
  • Used Hive to analyze data ingested into HBase by using Hive-HBase integration and compute various metrics for reporting on the dashboard.
  • Loaded data into HBase using Pig, Hive, and the Java APIs.
  • Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data.
  • Experienced with performing CRUD operations in HBase.
  • Collected and aggregated large amounts of web log data from different sources such as web servers and mobile devices using Apache Flume and stored the data into HDFS/HBase for analysis.
  • Developed a Flume ETL job that handles data from an HTTP source with HDFS as the sink.
  • Involved in writing optimized Pig scripts and in developing and testing Pig Latin scripts.
  • Created MapReduce programs for refined queries on big data.
  • Working knowledge of writing Pig Load and Store functions.
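
As a hedged illustration of the web-log parsing mentioned above, here is a small Hadoop Streaming mapper/reducer pair in Python that counts requests per URL; the original jobs may well have been written as Java MapReduce, and the assumed log layout (the request URL as the seventh whitespace-separated field, as in Apache access logs) is only an example.

    # mapper.py -- emit (url, 1) for each web-log line read from standard input.
    import sys

    for line in sys.stdin:
        fields = line.split()
        if len(fields) > 6:
            print("%s\t1" % fields[6])

The matching reducer sums the counts per URL; Hadoop delivers the keys already sorted.

    # reducer.py -- sum the counts per URL emitted by mapper.py.
    import sys

    current_url, count = None, 0
    for line in sys.stdin:
        url, value = line.rstrip("\n").split("\t")
        if url == current_url:
            count += int(value)
        else:
            if current_url is not None:
                print("%s\t%d" % (current_url, count))
            current_url, count = url, int(value)
    if current_url is not None:
        print("%s\t%d" % (current_url, count))

The pair would typically be submitted with the hadoop-streaming JAR, passing -mapper mapper.py, -reducer reducer.py, and the HDFS input/output paths.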

Environment: Apache Hadoop 1.0.1, MapReduce, HDFS, CentOS, Zookeeper, Sqoop, Cassandra, Hive, PIG, Oozie, Java, Eclipse, Amazon EC2, JSP, Servlets.

Confidential, Dallas, TX

Java/Hadoop Developer

Responsibilities:

  • Developed the application using the Struts framework, which leverages the classical Model-View-Controller (MVC) architecture; UML diagrams such as use cases, class diagrams, interaction diagrams, and activity diagrams were used.
  • Participated in requirement gathering and converting the requirements into technical specifications.
  • Extensively worked on User Interface for few modules using JSPs, JavaScript and Ajax.
  • Created business logic using Servlets and session beans and deployed them on the WebLogic server.
  • Wrote complex SQL queries and stored procedures.
  • Developed the XML Schema and Amazon Web services for the data maintenance and structures.
  • Worked on analyzing and writing Hadoop MapReduce jobs using the Java API, Pig, and Hive.
  • Selecting the appropriate AWS service based upon data, compute, system requirements.
  • Implemented the Web Service client for the login authentication, credit reports and applicant information using Apache Axis 2 Web Service.
  • Experience in creating integration between Hive and HBase for effective usage, and performed MRUnit testing for the MapReduce jobs.
  • Gained good experience with NoSQL databases such as MongoDB.
  • Involved in loading data from UNIX file system to HDFS.
  • Installed and configured Hive and wrote Hive UDFs.
  • Designed the logical and physical data model, generated DDL scripts, and wrote DML scripts for Oracle 9i database.
  • Used the Hibernate ORM framework with the Spring framework for data persistence and transaction management.
  • Used the Struts validation framework for form-level validation.
  • Wrote test cases in JUnit for unit testing of classes.
  • Involved in building templates and screens in HTML and JavaScript.
  • Involved in integrating Web Services using WSDL and UDDI.
  • Suggested latest upgrades for Hadoop clusters.
  • Created HBase tables to load large sets of data coming from UNIX and NoSQL sources (a short client sketch follows this list).
  • Built and deployed Java applications into multiple Unix-based environments and produced both unit and functional test results along with release notes.
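
Below is a hedged Python sketch of creating an HBase table and loading a few rows, as referenced above, using the happybase client over the HBase Thrift gateway; the host, table, column family, and row layout are hypothetical, and the actual work in this role may have used the Java HBase client rather than Python.

    # Create a table with one column family, write a row, and scan it back.
    import happybase

    connection = happybase.Connection("hbase-thrift-host")
    connection.create_table("user_events", {"d": dict()})   # column family "d"

    table = connection.table("user_events")
    table.put(b"user7#2017-01-01", {b"d:event": b"login", b"d:device": b"mobile"})

    for key, data in table.scan(row_prefix=b"user7#"):
        print(key, data)

    connection.close()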

Environment: JDK 1.5, J2EE 1.4, Struts 1.3, JSP, Servlets 2.5, WebSphere 6.1, HTML, XML, ANT 1.6, JavaScript, Node.js, JUnit 3.8, HDFS, MongoDB, Hive, HBase, UNIX, AWS

Confidential

Java Developer

Responsibilities:

  • Played an active role in the team by interacting with welfare business analyst/program specialists and converted business requirements into system requirements.
  • Developed and deployed UI layer logics of sites using JSP.
  • Used Struts (MVC) to implement the business model logic.
  • Worked with Struts MVC objects such as ActionServlet, controllers, validators, web application context, handler mappings, and message resource bundles, and used JNDI for look-up of J2EE components.
  • Developed dynamic JSP pages with Struts.
  • Developed the XML data object to generate the PDF documents and other reports.
  • Used Hibernate, DAOs, and JDBC for data retrieval and modification in the database.
  • Handled messaging and interaction of web services using SOAP and REST.
  • Developed JUnit test cases for unit testing as well as for system and user test scenarios.
  • Involved in Unit Testing, User Acceptance Testing and Bug Fixing.

Environment: J2EE, JDBC, Java 1.4, Servlets, JSP, Struts, Hibernate, web services, RESTful services, SOAP, WSDL, design patterns, MVC, HTML, JavaScript 1.2, WebLogic 8.0, XML, JUnit, Oracle 10g, MyEclipse.

Confidential

Java Developer

Responsibilities:

  • Implemented the project according to the Software Development Life Cycle (SDLC)
  • Implemented JDBC for mapping an object-oriented domain model to a traditional relational database
  • Created Stored Procedures to manipulate the database and to apply the business logic according to the user’s specifications
  • Developed the Generic Classes, which includes the frequently used functionality, so that it can be reusable
  • Implemented an exception-management mechanism using exception-handling application blocks to handle exceptions
  • Designed and developed user interfaces using JSP, JavaScript and HTML
  • Involved in Database design and developing SQL Queries, stored procedures on MySQL
  • Used CVS for maintaining the source code
  • Implemented logging through log4j

Environment: Java, JavaScript, HTML, log4j, JDBC drivers, SOAP web services, UNIX, shell scripting, SQL Server.
