Sr. Hadoop/Spark Developer Resume

Tampa, FL

PROFESSIONAL SUMMARY:

  • Over 8 years of IT experience in analysis, design, development and implementation of large-scale applications using Big Data and Java/J2EE technologies such as Apache Spark, Hadoop, Hive, Pig, Sqoop, Oozie, HBase, ZooKeeper, Python and Scala.
  • Strong experience writing Spark Core, Spark SQL, Spark Streaming, Java MapReduce and Spark-on-Java applications.
  • Experienced in the analytical functions of Apache Spark, Hive and Pig, and in extending their functionality by writing custom UDFs and hooking those UDFs into larger Spark applications as in-line functions.
  • Experience with installation, backup, recovery, configuration and development on multiple Hadoop distribution platforms (Cloudera and Hortonworks), as well as the cloud platforms Amazon AWS and Google Cloud.
  • Highly skilled in optimizing and moving large-scale pipeline applications from on-premise clusters to AWS Cloud.
  • Working knowledge of spinning up, configuring and maintaining long-running Amazon EMR clusters, both manually and through CloudFormation scripts on Amazon AWS.
  • Experienced in building frameworks for large-scale streaming applications in Apache Spark.
  • Worked on migrating Hadoop MapReduce programs to Apache Spark on Scala.
  • Extensive hands-on knowledge of working on the Amazon AWS and Google Cloud Architecture.
  • Highly skilled in integrating Amazon Kinesis streams with Spark Streaming applications to build long-running real-time applications.
  • Configured Kinesis shards for optimal throughput in Kinesis streams for Spark Streaming applications on AWS.
  • Solid understanding of RDD operations in Apache Spark i.e., Transformations & Actions, Persistence (Caching), Accumulators, Broadcast Variables, Optimizing Broadcasts.
  • In-depth knowledge of handling large amounts of data utilizing Spark DataFrames/Datasets API and Case Classes.
  • Experienced in running queries using Impala and in using BI tools to run ad-hoc queries directly on Hadoop.
  • Working knowledge of utilizing Hadoop file formats such as Sequence, ORC, Avro, Parquet as well as open source Text/CSV and JSON formatted files.
  • In-depth knowledge of Big Data architecture and the various components of Hadoop 1.x and 2.x, such as HDFS, Job Tracker, Task Tracker, Data Node and Name Node, and YARN concepts such as Resource Manager and Node Manager.
  • Hands-on experience with AWS cloud services (VPC, EC2, S3, RDS, Glue, Redshift, Data Pipeline, EMR, DynamoDB, WorkSpaces, Lambda, Kinesis, SNS, SQS).
  • Experience writing HiveQL and Pig Latin scripts, leading to a good understanding of MapReduce design patterns and of data analysis using Hive and Pig.
  • Great knowledge of working with Apache Spark Streaming API on Big Data Distributions in an active cluster environment.
  • Very capable at using AWS utilities such as EMR, S3 and CloudWatch to run and monitor Hadoop/Spark jobs on AWS.
  • Very well versed in writing and deploying Oozie workflows and coordinators, and in scheduling, monitoring and troubleshooting them through the Hue UI.
  • Proficient in importing and exporting data from Relational Database Systems to HDFS and vice versa, using Sqoop.
  • Good understanding of NoSQL databases such as HBase, Cassandra and MongoDB in enterprise use cases.
  • Very capable in processing large sets of structured, semi-structured and unstructured data and in supporting system application architecture in Hadoop, Spark and SQL databases such as Teradata, MySQL and DB2.
  • Working experience in Impala, Mahout, SparkSQL, Storm, Avro, Kafka, Hue and AWS.
  • Experience with installation, backup, recovery, configuration and development on multiple Hadoop distribution platforms such as Hortonworks Data Platform (HDP) and Cloudera Distribution for Hadoop (CDH).
  • Experienced in version control and source code management tools such as Git, SVN and Bitbucket.
  • Software development in Java Application Development, Client/Server Applications, and implementing application environment using MVC, J2EE, JDBC, JSP, XML methodologies (XML, XSL, XSD), Web Services, Relational Databases and NoSQL Databases.
  • Hands-on experience in application development using Java, RDBMS, Linux shell scripting and Perl.
  • Hands-on experience working with IDEs and development tools such as Eclipse, IntelliJ, NetBeans, Visual Studio, Git and Maven, and experienced in writing cohesive end-to-end applications on Apache Zeppelin.
  • Experience working in Waterfall and Agile/Scrum methodologies.
  • Ability to adapt to evolving technologies, a strong sense of responsibility and accomplishment.

TECHNICAL SKILLS:

Languages & Scripting: Scala, Java, Python, Shell Script, JavaScript, SQL

Big Data Frameworks: HDFS, MapReduce, YARN, Apache Hive, Apache Pig, Apache HBase, Impala, Apache Solr, Apache Spark, Spark Streaming, Spark SQL, Spark ML, Oozie, Hue, Sqoop, ZooKeeper, Storm, Flume, Kafka, MongoDB, Cassandra

Cloud Technologies: Amazon EC2, S3, EMR, DynamoDB, Lambda, Kinesis, ELB, RDS, Glue, SNS, SQS, EBS, CloudFormation

Development Tools: Microsoft SQL Studio, IntelliJ, Eclipse, NetBeans, Maven, JUnit, MRUnit, ScalaUnit

Database/RDBMS: MySQL, Sybase, MS SQL Server, Postgres, DB2, Oracle 11g/10g/9i

Web Development: HTML, XML, AJAX, SOAP, WSD

Application Servers: WebLogic, JBoss, Apache Tomcat 8.0, IBM WebSphere

Operating Systems: Unix, Linux, Windows, Mac

Version Control: Git, SVN, Bitbucket

BI Tools: Tableau, Qlik

PROFESSIONAL EXPERIENCE:

Confidential, Tampa, FL

Sr. Hadoop/Spark Developer

Responsibilities:

  • Developed highly efficient Spark batch and streaming applications running on AWS, using Spark APIs such as Datasets, case classes, lambda functions and RDD transformations, and adhering to market standards and development best practices.
  • Migrated long-running Hadoop applications from legacy clusters to Spark applications running on Amazon EMR.
  • Used Spark SQL to load Parquet data, created Datasets defined by case classes, and handled structured data with Spark SQL; the results were stored in Hive tables for downstream consumption.
  • Wrote ETL scripts to move data from HDFS to S3 and vice versa, and created Hive external tables on top of this data for use in Big Data applications.
  • Created scripts to sync data between local MongoDB and Postgres databases with those on AWS Cloud.
  • Implemented POC to migrate Hadoop Java applications to Spark on Scala.
  • Developed Scala scripts on Spark to perform operations such as data inspection, cleaning and loading, and to transform large sets of JSON data to Parquet format.
  • Prepared Linux shell scripts to configure, deploy and manage Oozie workflows of Big Data applications.
  • Worked on Spark Streaming with Amazon Kinesis for real-time data processing.
  • Created, configured, managed and destroyed transient non-prod EMR clusters as well as a long-running prod cluster on AWS.
  • Worked on triggering and scheduling ETL jobs using AWS Glue, and automated Glue with CloudWatch Events.
  • Involved in developing Hive DDL templates which were hooked into Oozie workflows to create, alter and drop tables.
  • Created Hive snapshot tables and Hive Avro tables from data partitions stored on S3 and HDFS.
  • Involved in creating frameworks which ran a large number of Spark and Hadoop applications in series to form one cohesive end-to-end Big Data pipeline.
  • Used Amazon CloudWatch to monitor and track resources on AWS.
  • Worked with Sequence, ORC, Avro and Parquet file formats and with compression techniques such as LZO and Snappy.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Configured the GitHub plugin to integrate GitHub with Jenkins, and was regularly involved in version control and source code management, including release build and snapshot build management.
  • Involved in writing unit test cases for Hadoop and Spark applications, which were tested in MRUnit and ScalaUnit environments respectively.
  • Flexible with Unix/Linux and Windows environments, working with operating systems such as CentOS 5/6 and Ubuntu 13/14.
  • Used the PuTTY SSH client to connect remotely to the servers.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.

Confidential, Dallas, TX

Sr. Hadoop/Spark Developer

Responsibilities:

  • Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
  • Prepared Spark builds from MapReduce source code for better performance.
  • Used Sqoop to import data from MySQL into HDFS on a regular basis.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Developed Kafka producers and consumers for message handling.
  • Used the AWS CLI for data transfers to and from Amazon S3 buckets.
  • Executed Hadoop/Spark jobs on AWS EMR using programs and data stored in S3 buckets.
  • Explored Spark to improve the performance of, and optimize, existing Hadoop MapReduce algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Deployed MapReduce and Spark jobs on Amazon Elastic MapReduce using datasets stored on S3.
  • Attended weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
  • Used Amazon CloudWatch to monitor and track resources on AWS.
  • Knowledge of designing and deploying Hadoop clusters and various Big Data analytic tools, including Hive, HBase, Oozie, Sqoop, Flume, Spark, Impala and Cassandra.
  • Implemented real-time streaming of data using Spark with Kafka.
  • Responsible for developing a data pipeline using Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
  • Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
  • Implemented Spark RDD transformations and actions to support business analysis.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
  • Integrated user data from Cassandra with data in HDFS.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Involved in importing real-time data to Hadoop using Kafka, and implemented Oozie jobs for daily imports.
  • Automated the process for extraction of data from warehouses and weblogs by developing work-flows and coordinator jobs in Oozie.
  • Created Hive tables and was involved in data loading and in writing Hive UDFs.

Environment: CDH4, CDH5, Scala, Spark, Spark Streaming, Spark SQL, HDFS, AWS, Hive, Pig, Linux, Eclipse, Oozie, Hue, Flume, MapReduce, Apache Kafka, Sqoop, Oracle, Shell Scripting and Cassandra, Hortonworks.

Confidential, Portland, OR

Hadoop Developer

Responsibilities:

  • Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
  • Supported MapReduce programs running on the cluster.
  • Provisioned, installed, configured, monitored and maintained HDFS, YARN, HBase, Flume, Sqoop, Oozie and Hive.
  • Involved in writing shell scripts to export log files to the Hadoop cluster through an automated process.
  • Implemented real-time streaming of data using Spark with Kafka.
  • Created Hive tables with dynamic partitioning and buckets for sampling, and worked on them using HiveQL.
  • Used Pig to parse the data and store it in Avro format.
  • Stored data in tabular formats using Hive tables and Hive SerDes.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Developed Kafka producers and consumers for message handling.
  • Used the AWS CLI for data transfers to and from Amazon S3 buckets.
  • Executed Hadoop jobs on AWS EMR using programs and data stored in S3 buckets.
  • Involved in creating UNIX shell scripts for database connectivity and for executing queries in parallel jobs.
  • Involved in importing real-time data to Hadoop using Kafka, and implemented Oozie jobs for daily imports.
  • Developed Apache Pig and Hive scripts to process the HDFS data.
  • Designed and implemented incremental imports into Hive tables.
  • Involved in unit testing, and delivered unit test plans and results documents using JUnit and MRUnit.
  • Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
  • Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
  • Exported the analyzed data to the relational databases using Sqoop for visualization.
  • Analyzed large and critical datasets using HDFS, HBase, MapReduce, Hive, Hive UDFs, Pig, Sqoop and ZooKeeper.
  • Developed custom aggregate UDFs in Hive to parse log files.
  • Identified the data to be pulled into HDFS and created Sqoop scripts, scheduled periodically, to migrate data to the Hadoop environment.
  • Involved in file processing using Pig Latin.
  • Created MapReduce jobs involving combiners and partitioners to deliver better results and worked on application performance optimization for an HDFS cluster.
  • Worked on debugging and performance tuning of Hive and Pig jobs.

Environment: Cloudera, MapReduce, HDFS, Pig Scripts, Hive Scripts, HBase, Sqoop, Zookeeper, Oozie, Oracle, Shell Scripting.

Confidential

Big Data/Hadoop Developer

Responsibilities:

  • Wrote MapReduce jobs to parse web logs stored in HDFS.
  • Designed and implemented a MapReduce-based, large-scale parallel relation-learning system.
  • Developed services to run MapReduce jobs as per the daily requirement.
  • Involved in creating Hive tables, loading them with data and writing Hive queries.
  • Involved in optimizing Hive queries and joins to get better results for Hive ad-hoc queries.
  • Used Pig to perform data transformations, event joins, filtering and some pre-aggregations before storing the data in HDFS.
  • Developed Oozie workflows to automate loading data into HDFS and pre-processing it with Pig.
  • Hands-on experience with NoSQL databases such as HBase, used in a proof of concept (POC) for storing URLs, images, and product and supplement information in real time.
  • Developed an integrated dashboard to perform CRUD operations on HBase data using the Thrift API.
  • Implemented an error notification module for the support team using HBase coprocessors (Observers).
  • Configured and integrated Flume sources, channels and sinks to analyze log data in HDFS.
  • Implemented custom Flume interceptors to perform cleansing operations before moving data onto HDFS.
  • Involved in troubleshooting errors in Shell, Hive and MapReduce.
  • Worked on debugging and performance tuning of Hive and Pig jobs.
  • Developed Oozie workflows which are scheduled monthly.
  • Designed and developed read lock capability in HDFS.
  • Developed unit test cases using MRUnit and involved in unit testing and integration testing.
  • Involved in coding, designing, documenting, debugging and maintenance of several applications.
  • Developed applications using Eclipse, with Maven as the build and deploy tool.
  • Involved in creating test cases for JUnit testing.
  • Involved in creating SQL tables and in writing queries to read/write data.
  • Used JDBC to connect the application to the databases.
  • Developed applications based on MVC architecture.
  • Involved in end user training.
  • Worked in a Linux environment.
  • Involved in problem analysis and coding.
  • Wrote stored procedures and database triggers.

Environment: Eclipse, Java, Linux, JavaScript, SQL, JUnit, JDBC, Shell Scripting, Tomcat, Oracle.

Confidential

Java Developer

Responsibilities:

  • Involved in coding, designing, documenting, debugging and maintenance of several applications.
  • Involved in creating SQL tables and indexes and in writing queries to read/manipulate data.
  • Used JDBC to establish the connection between the database and the application.
  • Created the user interface using HTML, CSS and JavaScript.
  • Maintained and supported the existing applications.
  • Responsible for the development of database SQL queries.
  • Created/modified shell scripts for scheduling and automating tasks.
  • Wrote unit test cases using the JUnit framework.

Environment: Eclipse, Java, HTML, CSS, JavaScript, SQL, JUnit, JDBC, Shell Scripting.