Sr. Spark/Hadoop Developer Resume
Grapevine, TX
SUMMARY
- 8+ years of professional IT experience in analyzing requirements and designing and building highly distributed, mission-critical products and applications.
- Highly dedicated and results-oriented Hadoop developer with 4+ years of strong end-to-end Hadoop development experience, with varying levels of expertise across different Big Data environment projects.
- Expertise in core Hadoop and the Hadoop technology stack, which includes HDFS, MapReduce, Oozie, Hive, Sqoop, Pig, Flume, HBase, Spark, Kafka, and ZooKeeper.
- Experience with the RDD architecture, implementing Spark operations on RDDs, and optimizing Spark transformations and actions.
- Reviewed and managed Hadoop log files by consolidating logs from multiple machines using Flume.
- Collected log data from web servers and ingested it into HDFS using Flume.
- Wrote Flume configuration files for importing streaming log data into HBase.
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice versa.
- Experience in installation and setup of various Kafka producers and consumers along with the Kafka brokers and topics.
- Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
- Experienced in managing Hadoop cluster using Cloudera Manager Tool.
- Knowledge of ETL methods for data extraction, transformation and loading in corporate-wide ETL Solutions and Data warehouse tools for reporting and data analysis.
- Experience with Oozie Workflow Engine in running workflow jobs with actions that run Java MapReduce and Pig jobs.
- Strong hands-on experience with PySpark, using Spark libraries through Python scripting for data analysis.
- Implemented data science algorithms like shift detection in critical data points using Spark, doubling the performance.
- Experience in importing and exporting the data using Sqoop from HDFS to Relational Database systems/mainframe and vice-versa.
- Extended Hive and Pig core functionality by writing custom UDFs.
- Experience in analyzing data using HiveQL, Pig Latin, and custom Map Reduce programs in Java.
- Experience in Apache Flume for efficiently collecting, aggregating, and moving large amounts of log data.
- Involved in developing web services using REST and the HBase native client API to query data from HBase.
- Experienced in working with structured data using Hive QL, join operations, Hive UDFs, partitions, bucketing and internal/external tables.
- Created custom UDFs for Pig and Hive to incorporate Python/Java methods and functionality into Pig Latin and HQL (HiveQL).
- Involved in converting Cassandra/Hive/SQL queries into Spark transformations using Spark RDDs in Scala and Python (a minimal PySpark sketch follows this summary).
- Used a highly available AWS environment to launch applications in different regions and implemented CloudFront with AWS Lambda to reduce latency.
- Implemented CRUD operations using CQL on top of Cassandra file system.
- Used Cassandra CQL with Java APIs to retrieve data from Cassandra tables.
- Set up Solr for distributed indexing and search.
- Used Solr indexing to enable searches on non-primary-key columns in Cassandra keyspaces.
- Excellent working Knowledge in Spark Core, Spark SQL, Spark Streaming.
- Hands-on exposure to Amazon Web Services, the AWS command line interface, and AWS Data Pipeline.
- Work experience with cloud infrastructure like Amazon Web Services (AWS).
- Extensive experience working with various Hadoop distributions, such as enterprise versions of Cloudera (CDH4/CDH5) and Hortonworks, with good knowledge of the MapR distribution, IBM BigInsights, and Amazon EMR (Elastic MapReduce).
- Experience designing and developing POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
- Developed automated processes for flattening upstream JSON data from Cassandra, using Hive UDFs to flatten the JSON data.
- Expertise in developing responsive front-end components with JavaScript, JSP, HTML, XHTML, Servlets, Ajax, and AngularJS.
- Experience as a Java Developer in Web/intranet, client/server technologies using Java, J2EE, Servlets, JSP, JSF, EJB, JDBC and SQL.
- Experience in setting up automated monitoring and escalation infrastructure for Hadoop Cluster using Ganglia and Nagios.
- Good understanding of Apache Hue.
- Techno-functional responsibilities include interfacing with users, identifying functional and technical gaps, estimates, designing custom solutions, development, leading developers, producing documentation, and production support.
- Proficient with version control systems such as GitHub and SVN.
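Illustrative example for the Hive/SQL-to-Spark conversion work noted above: a minimal PySpark sketch, with hypothetical table and column names, showing how a HiveQL aggregation can be expressed as DataFrame transformations.

```python
# Minimal PySpark sketch; table and column names are hypothetical, not from a real project.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("hiveql-to-spark-sketch")
         .enableHiveSupport()            # allow reading/writing Hive-managed tables
         .getOrCreate())

# Rough equivalent of:
#   SELECT region, COUNT(*) AS orders, SUM(amount) AS revenue
#   FROM sales.orders GROUP BY region;
orders = spark.table("sales.orders")     # hypothetical Hive table
summary = (orders
           .groupBy("region")
           .agg(F.count("*").alias("orders"),
                F.sum("amount").alias("revenue")))

# Persist the aggregate back to the warehouse for downstream reporting.
summary.write.mode("overwrite").saveAsTable("sales.orders_by_region")
```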
TECHNICAL SKILLS
Hadoop Distributions: Hortonworks, Cloudera (CDH3, CDH4, CDH5), Apache, Amazon AWS (EMR), MapR, and Azure.
Hadoop Data Services: HDFS, MapReduce, YARN, Hive, Pig, Pentaho, HBase, ZooKeeper, Sqoop, Oozie, Cassandra, Spark, Scala, Storm, Flume, Kafka, Avro, Parquet, Snappy, and NiFi.
Hadoop Operational Services: Zookeeper, Oozie
NoSQL Databases: HBase, Cassandra, MongoDB, Neo4j, Redis
Cloud Services: Amazon AWS
Languages: SQL, PL/SQL, Pig Latin, HiveQL, Unix shell scripting, HTML, XML (XSD, XSLT, DTD), C, C++, Java, JavaScript, Python, Scala
ETL Tools: Informatica, IBM DataStage, Talend
Java & J2EE Technologies: Core Java, Servlets, Hibernate, Spring, Struts, JSP, JDBC, EJB
Application Servers: WebLogic, WebSphere, Tomcat.
Databases: Oracle, MySQL, DB2, Teradata, MS SQL Server, SQL/NoSQL, HBase, Cassandra, Neo4j
Operating Systems: UNIX, Windows, iOS, LINUX
Methodologies: Agile (Scrum), Waterfall
Other Tools: Putty, WinSCP, Stream Weaver.
PROFESSIONAL EXPERIENCE
Confidential, Grapevine, TX
Sr. Spark/Hadoop Developer
Responsibilities:
- Extensively migrated the existing architecture to Spark Streaming to process live streaming data.
- Responsible for Spark Core configuration based on the type of input source.
- Executed Spark code written in Scala for Spark Streaming/SQL for faster data processing.
- Performed SQL joins among Hive tables to produce input for the Spark batch process.
- Gathered the business requirements from the Business Partners and Subject Matter Experts.
- Developed Python code to gather data from HBase and designed the solution for implementation using PySpark.
- Developed PySpark code to mimic the transformations performed in the on-premises environment.
- Analyzed the SQL scripts and designed solutions to implement them using PySpark; created custom new columns depending on the use case while ingesting data into the Hadoop data lake with PySpark.
- Analyzed the Cassandra database and compared it with other open-source NoSQL databases to determine which one better suits the current requirements.
- Integrated Cassandra as a distributed persistent metadata store to provide metadata resolution for network entities on the network.
- Implemented Spark using Scala and also used PySpark (Python) for faster testing and processing of data.
- Designed multiple Python packages that were used within a large ETL process to load 2 TB of data from an existing Oracle database into a new PostgreSQL cluster.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
- Loaded data from the Linux file system into HDFS and vice versa.
- Developed UDFs using both DataFrames/SQL and RDDs in Spark for data aggregation queries, writing results back into OLTP systems through Sqoop.
- Implemented advanced procedures like text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
- Installed and monitored Hadoop ecosystems tools on multiple operating systems like Ubuntu, CentOS.
- Exported the analyzed patterns back into Teradata using Sqoop.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Participated in the development/implementation of the Cloudera Impala Hadoop environment.
- Utilized the Apache Hadoop environment provided by Cloudera.
- Collected data using Spark Streaming and loaded it into the Cassandra cluster.
- Developed Scala scripts using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark for data aggregation and queries, writing data back into the OLTP system through Sqoop.
- Extensively used ZooKeeper as a job scheduler for Spark jobs.
- Extended Hive and Pig core functionality by writing custom UDFs.
- Wrote Java code to format XML documents and upload them to the Solr server for indexing.
- Used AWS to export MapReduce jobs into Spark RDD transformations.
- Wrote Terraform templates for automation requirements in AWS services.
- Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
- Deployed and configured AWS EC2 for client websites moving off self-hosted services, for scalability.
- Worked with multiple teams to provision AWS infrastructure for development and production environments.
- Experience designing and monitoring Kafka multi-data-center clusters.
- Designed number of partitions and replication factor for Kafka topics based on business requirements.
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala, initially done in Python (PySpark).
- Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and Zookeeper.
- Experience with Kafka and Spark integration for real-time data processing (see the Spark Streaming sketch following the Environment line for this project).
- Developed Kafka producer and consumer components for real-time data processing (see the producer/consumer sketch at the end of this list).
- Hands-on experience setting up Kafka MirrorMaker for data replication across clusters.
- Experience in configuring, designing, implementing, and monitoring Kafka clusters and connectors.
- Performed Oracle SQL tuning using explain plans.
- Manipulated, serialized, and modeled data in multiple formats such as JSON and XML.
- Involved in setting up MapReduce 1 and MapReduce 2.
- Prepared Avro schema files for generating Hive tables.
- Used Impala connectivity from the user interface (UI) and queried the results using Impala QL.
- Worked on physical transformations of the data model, which involved creating tables, indexes, joins, views, and partitions.
- Involved in analysis, design, system architecture design, process interface design, and documentation.
- Used Jira for bug tracking and Bitbucket to check in and check out code changes.
- Involved in Cassandra data modeling to create keyspaces and tables in a multi-data-center DSE Cassandra database.
- Utilized Agile and Scrum Methodology to help manage and organize a team of developers with regular code review sessions.
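Illustrative producer/consumer sketch for the Kafka work noted above: a minimal Python example using the kafka-python client; the client library, broker addresses, topic name, and consumer group are assumptions for illustration only.

```python
# Minimal Kafka producer/consumer sketch using the kafka-python client.
# Broker addresses, topic name, and consumer group are hypothetical.
import json
from kafka import KafkaProducer, KafkaConsumer

BROKERS = ["broker1:9092", "broker2:9092"]   # hypothetical brokers
TOPIC = "clickstream-events"                 # hypothetical topic

# Producer: serialize events as JSON and wait for full acknowledgement.
producer = KafkaProducer(
    bootstrap_servers=BROKERS,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    acks="all",
)
producer.send(TOPIC, {"user_id": 42, "action": "page_view"})
producer.flush()

# Consumer: read from the beginning of the topic as part of a consumer group.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKERS,
    group_id="spark-ingest",                 # hypothetical consumer group
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)
for message in consumer:
    print(message.partition, message.offset, message.value)
    break                                     # one message is enough for the sketch
```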
Environment: Cloudera, Spark, Impala, Sqoop, Flume, Cassandra, Kafka, Hive, ZooKeeper, Oozie, RDBMS, AWS.
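Illustrative sketch for the Kafka/Spark streaming integration in this project: a minimal PySpark Structured Streaming example; the broker, topic, output paths, and the Structured Streaming API choice (rather than DStreams) are assumptions for illustration only.

```python
# Minimal Spark Structured Streaming sketch reading from Kafka; all names are hypothetical.
# Requires the spark-sql-kafka package on the classpath
# (e.g. --packages org.apache.spark:spark-sql-kafka-0-10_2.12:<spark version>).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("kafka-structured-streaming-sketch")
         .getOrCreate())

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "clickstream-events")
          .option("startingOffsets", "latest")
          .load())

# Kafka delivers key/value as binary; cast the value to a string for downstream parsing.
lines = events.select(F.col("value").cast("string").alias("json_value"))

# Land the stream as Parquet files with checkpointing for fault tolerance.
query = (lines.writeStream
         .format("parquet")
         .option("path", "/data/streams/clickstream")
         .option("checkpointLocation", "/data/checkpoints/clickstream")
         .outputMode("append")
         .start())
query.awaitTermination()
```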
Confidential
Sr. Spark/Hadoop Developer
Responsibilities:
- Analyzed large and critical datasets using Cloudera, HDFS, HBase, MapReduce, Hive, Hive UDF, Pig, Sqoop, Zookeeper and Spark.
- Responsible for managing data coming from different sources.
- Developed Batch Processing jobs using Pig and Hive.
- Involved in gathering the business requirements from the Business Partners and Subject Matter Experts.
- Worked with different file formats such as TextFile, Avro, ORC, and Parquet for Hive querying and processing (see the format-conversion sketch at the end of this list).
- Imported and exported data into HDFS and Hive using Sqoop.
- Implemented Elastic Search on Hive data warehouse platform.
- Good experience analyzing the Hadoop cluster with different analytic tools like Pig and Impala.
- Experienced in managing and reviewing Hadoop log files.
- Extracted files from CouchDB through Sqoop, placed them in HDFS, and processed them.
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
- Experienced in working with the Spark ecosystem, using Spark SQL and Scala queries on different formats such as text and CSV files.
- Created concurrent access for Hive tables with shared and exclusive locking that can be enabled in Hive with the help of Zookeeper implementation in the cluster.
- Stored and loaded data from HDFS to Amazon S3 and backed up the namespace data to NFS.
- Implemented NameNode backup using NFS for high availability.
- Designed workflows and coordinators in Oozie to automate and parallelize Hive jobs on the Apache Hadoop environment by Hortonworks (HDP 2.2).
- Responsible for building scalable distributed data solutions using a Hadoop cluster environment with the Hortonworks distribution.
- Integrated HiveServer2 with Tableau using the Hortonworks Hive ODBC driver, for auto-generation of Hive queries for non-technical business users.
- Troubleshooting, managing, and reviewing data backups and Hadoop log files on the Hortonworks cluster.
- Used Pig to perform data validation on data ingested using Sqoop and Flume, and pushed the cleansed data set into MongoDB.
- Ingested streaming data with Apache NiFi into Kafka.
- Worked with NiFi to manage the flow of data from sources through automated data flows.
- Designed and implemented the MongoDB schema.
- Wrote services to store and retrieve user data from the MongoDB for the application on devices.
- Used Mongoose API to access the MongoDB from NodeJS.
- Created and implemented business validation and coverage price gap rules on Hive using the Talend tool.
- Wrote shell scripts to automate rolling day-to-day processes.
- Wrote shell scripts to monitor Hadoop daemon services and respond accordingly to any warning or failure conditions.
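Illustrative sketch for the file-format and Sqoop ingestion work in this project: a minimal PySpark example, with hypothetical HDFS path, column names, and target table, that reads delimited text landed in HDFS and persists it as a partitioned, Parquet-backed Hive table.

```python
# Minimal PySpark sketch; HDFS path, column names, and target table are hypothetical.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("text-to-parquet-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Raw tab-delimited files landed in HDFS (e.g. by a Sqoop import).
raw = (spark.read
       .option("delimiter", "\t")
       .csv("/data/landing/customers"))

# Assign column names to the positional CSV columns.
customers = raw.toDF("customer_id", "name", "state", "load_date")

# Persist as a partitioned, Parquet-backed Hive table for efficient querying.
(customers.write
 .mode("overwrite")
 .partitionBy("load_date")
 .format("parquet")
 .saveAsTable("warehouse.customers"))
```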
Environment: Apache Flume, Hive, Pig, HDFS, Zookeeper, Sqoop, RDBMS, AWS, MongoDB, Talend, Shell Scripts, Eclipse, WinSCP, Hortonworks.