Sr. Spark/Hadoop Developer Resume
Omaha, NE
SUMMARY
- 8 years of professional IT experience in analyzing requirements and designing and building highly distributed, mission-critical products and applications.
- Highly dedicated, results-oriented Hadoop developer with 4+ years of strong end-to-end Hadoop development experience across a variety of Big Data environment projects.
- Expertise in core Hadoop and Hadoop technology stack which includes HDFS, MapReduce, Oozie, Hive, Sqoop, Pig, Flume, HBase, Spark, Kafka, and Zookeeper.
- Experience with Spark's RDD architecture, implementing Spark operations on RDDs, and optimizing transformations and actions in Spark.
- Reviewed and managed Hadoop log files by consolidating logs from multiple machines using Flume.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Wrote Flume configuration files to import streaming log data into HBase.
- Experience in importing and exporting data with Sqoop between HDFS and relational database systems/mainframes.
- Experience in the installation and setup of various Kafka producers and consumers along with Kafka brokers and topics.
- Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
- Experienced in managing Hadoop cluster using Cloudera Manager Tool.
- Knowledge of ETL methods for data extraction, transformation and loading in corporate-wide ETL Solutions and Data warehouse tools for reporting and data analysis.
- Experience with Oozie Workflow Engine in running workflow jobs with actions that run Java MapReduce and Pig jobs.
- Strong hands-on experience with PySpark, using Spark libraries through Python scripting for data analysis.
- Implemented data science algorithms such as shift detection in critical data points using Spark, doubling performance.
- Extended Hive and Pig core functionality by writing custom UDFs.
- Experience in analyzing data using HiveQL, Pig Latin, and custom Map Reduce programs in Java.
- Experience in Apache Flume for efficiently collecting, aggregating, and moving large amounts of log data.
- Involved in developing web services using REST and the HBase native client API to query data from HBase.
- Experienced in working with structured data using Hive QL, join operations, Hive UDFs, partitions, bucketing and internal/external tables.
- Involved in creating custom UDFs for Pig and Hive to incorporate Python/Java methods and functionality into Pig Latin and HiveQL.
- Involved in converting Cassandra/Hive/SQL queries into Spark transformations using Spark RDDs in Scala and Python (a minimal sketch appears after this summary list).
- Used a highly available AWS environment to launch applications in different regions and implemented CloudFront with AWS Lambda to reduce latency.
- Implemented CRUD operations using CQL on top of Cassandra file system.
- Used Cassandra CQL with Java APIs to retrieve data from Cassandra tables.
- Set up Solr for distributed indexing and search.
- Used Solr to enable indexing for searching on non-primary-key columns in Cassandra keyspaces.
- Excellent working Knowledge in Spark Core, Spark SQL, Spark Streaming.
- Real-time exposure to Amazon Web Services, the AWS command line interface, and AWS Data Pipeline.
- Work experience with cloud infrastructure like Amazon Web Services (AWS).
- Extensive experience working with various Hadoop distributions, such as the enterprise versions of Cloudera (CDH4/CDH5) and Hortonworks, with good knowledge of the MapR distribution, IBM BigInsights, and Amazon EMR (Elastic MapReduce).
- Experience designing and developing POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
- Developed automated processes for flattening upstream data from Cassandra, which arrives in JSON format, using Hive UDFs.
- Expertise in developing responsive Front-End components with JavaScript, JSP, HTML, XHTML, Servlets, Ajax, and AngularJS.
- Experience as a Java Developer in Web/intranet, client/server technologies using Java, J2EE, Servlets, JSP, JSF, EJB, JDBC and SQL.
- Experience in setting up automated monitoring and escalation infrastructure for Hadoop Cluster using Ganglia and Nagios.
- Good understanding of Apache Hue.
- Techno-functional responsibilities include interfacing with users, identifying functional and technical gaps, estimates, designing custom solutions, development, leading developers, producing documentation, and production support.
- Proficient with version control systems such as GitHub and SVN.
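To illustrate the Spark work referenced above (converting Hive/SQL queries into Spark RDD and DataFrame transformations), the following is a minimal PySpark sketch, not actual project code; the table name `web_logs` and its columns are hypothetical placeholders.

```python
# Minimal PySpark sketch: rewriting a Hive aggregation as DataFrame/RDD
# transformations. The table and column names (web_logs, user_id, bytes)
# are hypothetical placeholders, not taken from the resume.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("hive-to-spark-sketch")
         .enableHiveSupport()          # allows reading existing Hive tables
         .getOrCreate())

# Equivalent Hive query:
#   SELECT user_id, SUM(bytes) AS total_bytes
#   FROM web_logs WHERE bytes > 0 GROUP BY user_id;
logs_df = spark.table("web_logs")

totals_df = (logs_df
             .filter(F.col("bytes") > 0)                    # transformation
             .groupBy("user_id")
             .agg(F.sum("bytes").alias("total_bytes")))     # transformation

# The same aggregation expressed with the lower-level RDD API.
totals_rdd = (logs_df.rdd
              .map(lambda row: (row["user_id"], row["bytes"]))
              .filter(lambda kv: kv[1] > 0)
              .reduceByKey(lambda a, b: a + b))

totals_df.show(10)          # action: triggers execution
print(totals_rdd.take(10))  # action
```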
TECHNICAL SKILLS
Hadoop Distribution: Hortonworks, Cloudera (CDH3, CDH4, CDH5), Apache, Amazon AWS (EMR), MapR, and Azure.
Hadoop Data Services: HDFS, MapReduce, YARN, Hive, Pig, Pentaho, HBase, ZooKeeper, Sqoop, Oozie, Cassandra, Spark, Scala, Storm, Flume, Kafka, Avro, Parquet, Snappy, and NiFi.
Hadoop Operational Services: Zookeeper, Oozie
NO SQL Databases: HBase, Cassandra, MongoDB, Neo4j, Redis
Cloud Services: Amazon AWS
Languages: SQL, PL/SQL, Pig Latin, HiveQL, Unix Shell Scripting, HTML, XML (XSD, XSLT, DTD), C, C++, Java, JavaScript, Python, Scala
ETL Tools: Informatica, IBM DataStage, Talend
Java & J2EE Technologies: Core Java, Servlets, Hibernate, Spring, Struts, JSP, JDBC, EJB
Application Servers: WebLogic, WebSphere, Tomcat.
Databases: Oracle, MySQL, DB2, Teradata, MS SQL Server (SQL); HBase, Cassandra, Neo4j (NoSQL)
Operating Systems: UNIX, Windows, iOS, LINUX
Methodologies: Agile (Scrum), Waterfall
Other Tools: Putty, WinSCP, Stream Weaver.
PROFESSIONAL EXPERIENCE
Confidential, Omaha, NE
Sr. Spark/Hadoop Developer
Responsibilities:
- Extensively migrated existing architecture to Spark Streaming to process the live streaming data.
- Responsible for Spark Core configuration based on the type of input source.
- Executed Spark code written in Scala for Spark Streaming/SQL for faster data processing.
- Performed SQL Joins among Hive tables to get input for Spark batch process.
- Gathered the business requirements from the Business Partners and Subject Matter Experts.
- Developed Python code to gather data from HBase and designed the solution for implementation in PySpark.
- Developed PySpark code to mimic the transformations performed in the on-premises environment.
- Analyzed SQL scripts and designed PySpark implementations; created custom columns depending on the use case while ingesting data into the Hadoop data lake using PySpark.
- Developed an environmental search engine using Java, Apache Solr, and MySQL.
- Analyzed the Cassandra database and compared it with other open-source NoSQL databases to determine which best suited the current requirement.
- Integrated Cassandra as a distributed persistent metadata store to provide metadata resolution for network entities on the network.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Designed multiple Python packages that were used within a large ETL process used to load 2TB of data from an existing Oracle database into a new PostgreSQL cluster
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
- Loaded data from the Linux file system to HDFS and vice versa.
- Developed UDFs using both DataFrames/SQL and RDDs in Spark for data aggregation queries, writing results back into OLTP systems through Sqoop.
- Installed and monitored Hadoop ecosystems tools on multiple operating systems like Ubuntu, CentOS.
- Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark using Scala.
- Exported the analyzed patterns back into Teradata using Sqoop.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Participated in the development and implementation of a Cloudera Impala Hadoop environment.
- Utilized the Apache Hadoop environment provided by Cloudera.
- Collected data using Spark Streaming and loaded it into the Cassandra cluster (a minimal sketch of the Kafka-to-Spark Streaming flow appears after this list).
- Developed Scala scripts using DataFrames/SQL/Datasets and RDDs in Spark for data aggregation and queries, writing data back into OLTP systems through Sqoop.
- Extensively used Zookeeper as a job scheduler for Spark jobs.
- Extending Hive and Pig core functionality by writing custom UDFs.
- Wrote Java code to format XML documents and uploaded them to the Solr server for indexing.
- Used AWS to export MapReduce jobs into Spark RDD transformations.
- Writing AWS Terraform templates for any automation requirements in AWS services.
- Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
- Deployed and configured AWS EC2 for client websites moving from self-hosted services for scalability purposes.
- Worked with multiple teams to provision AWS infrastructure for development and production environments.
- Experience designing and monitoring multi-data-center Kafka clusters.
- Designed number of partitions and replication factor for Kafka topics based on business requirements.
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala, initially done using python (PySpark).
- Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and Zookeeper.
- Experience on Kafka and Spark integration for real time data processing.
- Developed Kafka producer and consumer components for real time data processing.
- Hands-on experience setting up Kafka MirrorMaker for data replication across clusters.
- Experience in configuring, designing, implementing, and monitoring Kafka clusters and connectors.
- Performed Oracle SQL tuning using explain plans.
- Manipulated, serialized, and modeled data in multiple formats such as JSON and XML.
- Involved in setting up MapReduce 1 and MapReduce 2.
- Prepared Avro schema files for generating Hive tables.
- Used Impala connectivity from the user interface (UI) and queried results using Impala QL.
- Worked on physical transformations of data model which involved in creating Tables, Indexes, Joins, Views and Partitions.
- Involved in analysis, design, system architecture design, process interface design, and documentation.
- Used Jira for bug tracking and Bitbucket to check in and check out code changes.
- Involved in Cassandra data modelling to create keyspaces and tables in a multi-data-center DSE Cassandra DB.
- Utilized Agile and Scrum Methodology to help manage and organize a team of developers with regular code review sessions.
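As a companion to the Kafka and Spark Streaming bullets above, the sketch below shows the general shape of that integration under stated assumptions: a broker at localhost:9092, a hypothetical topic named `events`, and a console sink standing in for the Cassandra/OLTP sinks used on the project. It is illustrative only, not the production job.

```python
# Minimal sketch of a Kafka -> Spark Structured Streaming flow.
# Broker address (localhost:9092) and topic name ("events") are hypothetical.
# The streaming read requires the spark-sql-kafka connector on the classpath.
from kafka import KafkaProducer            # kafka-python client
from pyspark.sql import SparkSession

# --- producer side: push a few test messages onto the topic ---
producer = KafkaProducer(bootstrap_servers="localhost:9092")
for i in range(5):
    producer.send("events", value=f"event-{i}".encode("utf-8"))
producer.flush()

# --- consumer side: read the topic as a streaming DataFrame ---
spark = SparkSession.builder.appName("kafka-streaming-sketch").getOrCreate()

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "events")
          .load()
          .selectExpr("CAST(value AS STRING) AS value"))

# Running count per message value; a real job would parse the payload and
# write to a sink such as Cassandra or HDFS instead of the console.
counts = events.groupBy("value").count()

query = (counts.writeStream
         .outputMode("complete")
         .format("console")
         .start())
query.awaitTermination()
```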
Environment: Cloudera, Spark, Impala, Sqoop, Flume, Cassandra, Kafka, Hive, ZooKeeper, Oozie, RDBMS, AWS.
Confidential, San Francisco, CA
Sr. Spark/Hadoop Developer
Responsibilities:
- Analyzed large and critical datasets using Cloudera, HDFS, HBase, MapReduce, Hive, Hive UDF, Pig, Sqoop, Zookeeper and Spark.
- Responsible for managing data coming from different sources.
- Developed Batch Processing jobs using Pig and Hive.
- Involved in gathering the business requirements from the Business Partners and Subject Matter Experts.
- Worked with different File Formats like TEXTFILE, AVROFILE, ORC, and PARQUET for HIVE querying and processing.
- Importing and exporting data into HDFS and Hive using Sqoop.
- Implemented Elastic Search on Hive data warehouse platform.
- Good experience analyzing the Hadoop cluster and using different analytic tools such as Pig and Impala.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them.
- Experienced in managing and reviewing Hadoop log files.
- Extracted files from CouchDB through Sqoop, placed them in HDFS, and processed them.
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
- Experienced in working with the Spark ecosystem, using Spark SQL and Scala queries on different formats such as text and CSV files.
- Enabled concurrent access to Hive tables with shared and exclusive locking, implemented in Hive with the help of Zookeeper in the cluster.
- Stored and loaded data from HDFS to Amazon S3 and backed up namespace data to NFS.
- Implemented NameNode backup using NFS for high availability.
- Designed workflows and coordinators in Oozie to automate and parallelize Hive jobs on Apache Hadoop environment by Hortonworks (HDP 2.2).
- Responsible for building scalable distributed data solutions using a Hadoop cluster environment with the Hortonworks distribution.
- Integrated HiveServer2 with Tableau using the Hortonworks Hive ODBC driver for auto-generation of Hive queries for non-technical business users.
- Troubleshooting; managed and reviewed data backups and Hadoop log files on the Hortonworks cluster.
- Used PIG to perform data validation on the data ingested using Sqoop and Flume and the cleansed data set is pushed into MongoDB.
- Ingested streaming data with Apache NiFi into Kafka.
- Worked with Nifi for managing the flow of data from sources through automated data flow.
- Designed and implemented the MongoDB schema.
- Wrote services to store and retrieve user data from MongoDB for the application on devices (a minimal sketch appears after this list).
- Used Mongoose API to access the MongoDB from NodeJS.
- Created and implemented business validation and coverage price-gap rules on Hive using the Talend tool.
- Wrote shell scripts to automate rolling day-to-day processes.
- Wrote shell scripts to monitor Hadoop daemon services and respond accordingly to any warning or failure conditions.
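The MongoDB bullets above mention designing the schema and writing services to store and retrieve user data. Below is a minimal pymongo sketch of that pattern; the connection URI, database/collection names, and document fields are hypothetical, and the project's actual services were part of the application stack rather than this standalone script.

```python
# Minimal pymongo sketch of storing and retrieving user data.
# URI, database/collection names, and document fields are placeholders.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
users = client["app_db"]["users"]          # database and collection

# Store (upsert) a user document keyed by a hypothetical user_id field.
users.update_one(
    {"user_id": "u-1001"},
    {"$set": {"name": "Test User", "plan": "basic", "active": True}},
    upsert=True,
)

# Retrieve a single user and a filtered list of active users.
user = users.find_one({"user_id": "u-1001"})
active_users = list(users.find({"active": True}).limit(10))

print(user)
print(len(active_users), "active users fetched")
```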
Environment: Apache Flume, Hive, Pig, HDFS, Zookeeper, Sqoop, RDBMS, AWS, MongoDB, Talend, Shell Scripts, Eclipse, WinSCP, Hortonworks.
Confidential, San Diego, CA
Hadoop Developer
Responsibilities:
- Launched and set up the Hadoop cluster, which included configuring the different Hadoop components.
- Hands-on experience loading data from the UNIX file system to HDFS.
- Wrote the Map Reduce jobs to parse the web logs which are stored in HDFS.
- Managed the Hadoop distribution with Cloudera Manager, Cloudera Navigator, and Hue.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Cluster coordination services through Zookeeper.
- Designed and implemented Hive queries and functions for evaluation, filtering, loading and storing of data.
- Installed Oozie workflow engine to run multiple Hive and Pig jobs which run independently with time and data availability.
- Expertise in partitioning and bucketing concepts in Hive; analyzed the data using HiveQL.
- Installed and configured Flume, Hive, PIG, Sqoop and Oozie on the Hadoop cluster.
- Involved in creating Hive tables, loading data and running Hive queries in those data.
- Extensive Working knowledge of partitioned table, UDFs, performance tuning, compression-related properties, thrift server in Hive.
- Used Hive to analyze data ingested into HBase by using Hive-HBase integration and compute various metrics for reporting on the dashboard.
- Loaded data into HBase using Pig, Hive, and Java APIs.
- Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data.
- Experienced in performing CRUD operations in HBase (a minimal sketch appears after this list).
- Collected and aggregated large amounts of web log data from different sources such as web servers, mobile using Apache Flume and stored the data into HDFS/HBase for analysis.
- Developed a Flume ETL job to handle data from an HTTP source with HDFS as the sink.
- Involved in writing optimized Pig scripts and in developing and testing Pig Latin scripts.
- Created Map Reduce programs for some refined queries on big data.
- Working knowledge of writing Pig Load and Store functions.
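The HBase bullets above reference creating tables, loading data, and performing CRUD operations. The sketch below shows equivalent CRUD calls using the Python happybase client purely for illustration (the role itself used Pig, Hive, and the Java APIs); the table name, column family, and Thrift host are assumptions, and the HBase Thrift server must be running.

```python
# Minimal HBase CRUD sketch using the happybase client (Thrift-based).
# The table name "web_logs", column family "cf", and host are illustrative
# assumptions; the project used Pig/Hive/Java APIs rather than this client.
import happybase

connection = happybase.Connection("localhost")   # HBase Thrift server

# Create the table once if it does not already exist.
if b"web_logs" not in connection.tables():
    connection.create_table("web_logs", {"cf": dict()})

table = connection.table("web_logs")

# Create / update: put a row keyed by a hypothetical log id.
table.put(b"log-0001", {b"cf:url": b"/index.html", b"cf:status": b"200"})

# Read: fetch one row and scan a key range by prefix.
print(table.row(b"log-0001"))
for key, data in table.scan(row_prefix=b"log-"):
    print(key, data)

# Delete: remove the row when it is no longer needed.
table.delete(b"log-0001")

connection.close()
```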
Environment: Apache Hadoop 1.0.1, MapReduce, HDFS, CentOS, Zookeeper, Sqoop, Cassandra, Hive, PIG, Oozie, Java, Eclipse, Amazon EC2, JSP, Servlets.
Confidential, Dallas, TX
Java/Hadoop Developer
Responsibilities:
- Developed the application using the Struts framework, which leverages the classic Model-View-Controller (MVC) architecture; UML diagrams such as use cases, class diagrams, interaction diagrams, and activity diagrams were used.
- Participated in requirement gathering and converting the requirements into technical specifications.
- Extensively worked on User Interface for few modules using JSPs, JavaScript and Ajax.
- Created Business Logic using Servlets, Session beans and deployed them on Web logic server.
- Wrote complex SQL queries and stored procedures.
- Developed the XML Schema and Amazon Web services for the data maintenance and structures.
- Worked on analyzing data and writing Hadoop MapReduce jobs using the Java API, Pig, and Hive.
- Selecting the appropriate AWS service based upon data, compute, system requirements.
- Implemented the Web Service client for the login authentication, credit reports and applicant information using Apache Axis 2 Web Service.
- Experience creating integration between Hive and HBase for effective usage, and performed MRUnit testing for the MapReduce jobs.
- Gained good experience with NoSQL databases such as MongoDB.
- Involved in loading data from UNIX file system to HDFS.
- Installed and configured Hive and wrote Hive UDFs.
- Designed the logical and physical data model, generated DDL scripts, and wrote DML scripts for Oracle 9i database.
- Used Hibernate ORM framework with spring framework for data persistence and transaction management.
- Used struts validation framework for form level validation.
- Wrote test cases in JUnit for unit testing of classes.
- Involved in developing templates and screens in HTML and JavaScript.
- Involved in integrating Web Services using WSDL and UDDI.
- Suggested latest upgrades for Hadoop clusters.
- Created HBase tables to load large sets of data coming from UNIX and NoSQL sources.
- Built and deployed Java applications into multiple Unix-based environments and produced both unit and functional test results along with release notes.
Environment: JDK 1.5, J2EE 1.4, Struts 1.3, JSP, Servlets 2.5, WebSphere 6.1, HTML, XML, ANT 1.6, JavaScript, Node.js, JUnit 3.8, HDFS, MongoDB, Hive, HBase, UNIX, AWS
Confidential
Java Developer
Responsibilities:
- Played an active role in the team by interacting with welfare business analyst/program specialists and converted business requirements into system requirements.
- Developed and deployed UI layer logics of sites using JSP.
- Used Struts (MVC) to implement the business model logic.
- Worked with Struts MVC objects such as ActionServlet, controllers, validators, web application context, handler mappings, and message resource bundles, and used JNDI for lookup of J2EE components.
- Developed dynamic JSP pages with Struts.
- Developed the XML data object to generate the PDF documents and other reports.
- Used Hibernate, DAOs, and JDBC for data retrieval from and modifications to the database.
- Handled messaging and interaction of web services using SOAP and REST.
- Developed JUnit test cases for unit testing as well as system and user test scenarios.
- Involved in Unit Testing, User Acceptance Testing and Bug Fixing.
Environment: J2EE, JDBC, Java 1.4, Servlets, JSP, Struts, Hibernate, Web services, RESTful services, SOAP, WSDL, Design Patterns, MVC, HTML, JavaScript 1.2, WebLogic 8.0, XML, JUnit, Oracle 10g, MyEclipse.
Confidential
Java Developer
Responsibilities:
- Implemented the project according to the Software Development Life Cycle (SDLC)
- Implemented JDBC for mapping an object-oriented domain model to a traditional relational database
- Created Stored Procedures to manipulate the database and to apply the business logic according to the user’s specifications
- Developed the Generic Classes, which includes the frequently used functionality, so that it can be reusable
- Implemented an exception management mechanism using exception-handling application blocks to handle exceptions
- Designed and developed user interfaces using JSP, JavaScript and HTML
- Involved in Database design and developing SQL Queries, stored procedures on MySQL
- Used CVS for maintaining the Source Code
- Logging was done through log4j
Environment: Java, JavaScript, HTML, log4j, JDBC drivers, SOAP web services, UNIX, shell scripting, SQL Server.