Sr Application Developer (Spark) Resume, Seattle, WA - Hire IT People

SUMMARY

8+ years of total IT experience, including Java application development, database management and Big Data technologies in the Hadoop ecosystem.
4 years of experience in Big Data analytics using various Hadoop ecosystem tools and the Spark framework.
Solid understanding of distributed systems architecture and of the MapReduce and Spark execution frameworks for large-scale parallel processing.
Worked extensively on Hadoop ecosystem components: MapReduce, Pig, Hive, HBase, Flume, Sqoop, Hue, Oozie, Spark and Kafka.
Experience working with all major Hadoop distributions, including Cloudera (CDH), Hortonworks (HDP) and AWS EMR.
Developed highly scalable Spark applications in Scala using the Spark Core, DataFrame, Spark SQL and Spark Streaming APIs.
Gained solid experience troubleshooting and fine-tuning Spark applications.
Experience working with DStreams in Spark Streaming, accumulators, broadcast variables, various caching (storage) levels and optimization techniques in Spark.
Worked on real-time data integration using Kafka, Spark Streaming and HBase.
In-depth understanding of NoSQL databases such as HBase and their integration with a Hadoop cluster.
Strong working experience in extracting, wrangling, ingesting, processing, storing, querying and analyzing structured, semi-structured and unstructured data.
Solid understanding of Hadoop MRv1 and MRv2 (YARN) architecture.
Developed, deployed and supported several MapReduce applications in Java to handle semi-structured and unstructured data.
Sound knowledge of map-side joins, reduce-side joins, shuffle and sort, Distributed Cache, compression techniques, and multiple Hadoop input and output formats.
Solid experience working with CSV, text, SequenceFile, Avro, Parquet, ORC and JSON data formats.
Expertise in the Hive data warehouse tool: creating tables, distributing data through static and dynamic partitioning and bucketing, and optimizing HiveQL queries.
Involved in ingesting structured data from SQL Server, MySQL and Teradata into HDFS and Hive using Sqoop. Experience writing ad-hoc queries in Hive and analyzing data using HiveQL.
Extensive experience performing ETL on structured and semi-structured data using Pig Latin scripts.
Expertise in moving structured schema data between Pig and Hive using HCatalog.
Proficient in creating Hive DDLs and Hive UDFs. Designed and implemented Hive and Pig UDFs in Python and Java for evaluating, filtering, loading and storing data.
Experience migrating data with Sqoop from HDFS and Hive to relational database systems and vice versa, according to client requirements.
Experienced with Amazon Web Services (AWS), using EC2 for compute and S3 for storage. Familiar with Kerberos.
Experienced with job workflow scheduling and monitoring tools such as Oozie.
Proficient in writing Linux shell scripts, with hands-on experience.
Developed core modules in large cross-platform applications using Java, JSP, Servlets, Hibernate, RESTful web services, JDBC, JavaScript, XML and HTML.
Extensive experience developing and deploying applications on WebLogic, Apache Tomcat and JBoss. Worked with Podium and Talend.
Development experience with RDBMS, including writing SQL queries, views, stored procedures and triggers, and working with data lakes.
Strong understanding of the Software Development Life Cycle (SDLC) and methodologies such as Waterfall and Agile.
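The map-side join and Distributed Cache items above can be illustrated with a minimal pure-Python sketch (not Hadoop API code). It assumes the small table fits in memory, as it must when shipped through the Distributed Cache: each mapper loads it into a dict once, then joins every record of the large input by key lookup, so no shuffle or reduce phase is needed. The function names, the comma-separated record layout and the sample data are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple


def load_small_table(lines: Iterable[str]) -> Dict[str, str]:
    """Build the in-memory lookup table, as a mapper would do once at setup
    from a file shipped via the Distributed Cache (hypothetical format: key,value)."""
    table: Dict[str, str] = {}
    for line in lines:
        key, value = line.strip().split(",", 1)
        table[key] = value
    return table


def map_side_join(records: Iterable[str], lookup: Dict[str, str]) -> Iterator[Tuple[str, str, str]]:
    """Inner-join each large-side record (key,payload) against the lookup table.
    Records without a match are dropped, mirroring an inner join."""
    for line in records:
        key, payload = line.strip().split(",", 1)
        if key in lookup:
            yield key, payload, lookup[key]


if __name__ == "__main__":
    customers = ["c1,Alice", "c2,Bob"]                           # small side (the cached file)
    orders = ["c1,order-100", "c3,order-101", "c2,order-102"]    # large side (the map input)
    for row in map_side_join(orders, load_small_table(customers)):
        print(row)
```

The trade-off is memory: this only works when one side is small enough to replicate to every mapper, which is also the condition under which Spark broadcast joins apply.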

TECHNICAL SKILLS

Big Data Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Hue, Ambari, ZooKeeper, Kafka, Apache Spark, Spark Streaming, Impala, HBase

Hadoop Distributions: Cloudera, Hortonworks, Apache, AWS EMR, Databricks

Languages: C, Java, PL/SQL, Python, Pig Latin, HiveQL, Scala, Regular Expressions

IDE & Design Tools: Eclipse, NetBeans, IntelliJ, JIRA, Microsoft Visio, PyCharm

Web Technologies: HTML, CSS, JavaScript, XML, JSP, RESTful, SOAP

Operating Systems: Windows (XP, 7, 8, 10), UNIX, Linux, Ubuntu, CentOS

Reporting Tools: Tableau, Power View for Microsoft Excel, Talend, MicroStrategy

Databases: Oracle, SQL Server, MySQL, MS Access, NoSQL databases (HBase, Cassandra, MongoDB), Teradata, IBM DB2

Build Automation tools: SBT, Ant, Maven

PROFESSIONAL EXPERIENCE

Confidential - Seattle, WA

Sr Application Developer (Spark)

Responsibilities:

Developed and reviewed Spark code, including Airflow DAGs, Databricks notebooks, Delta table DDLs, metadata SQL and other SQL scripts.
Deployed code to the Dev, QA, PreProd and Prod environments, adhering to the Git process flow and the standards set by the release management process.
Created technical design documentation and Support/Ops turnover documentation following the Ops checklist.
Raised change requests once the code was in PreProd.
Handled Airflow orchestration, especially configuring each DAG's start date, schedule and other parameters.
Mainly developed PySpark code in Databricks using the existing load patterns (full, incremental and backfill) for forecasting (region and country) of rawCustomerSales and pubCustomerSales.
Wrote Spark DataFrame code working mainly with CSV, Parquet and Delta file formats. Used Spark SQL, joins, views and partitioning extensively.
Validated the source data and generated output data in the required format using PySpark transformations.
Submitted jobs to clusters administered by other Linux teams.
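The resume does not describe how the full, incremental and backfill load patterns were implemented. The sketch below shows one common watermark-based way to select rows for each mode, in plain Python rather than PySpark; the Row type, the load_date column, the mode names and the half-open backfill window are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Row:
    key: str
    load_date: date  # hypothetical column used as the watermark


def select_rows(rows: List[Row], mode: str, watermark: Optional[date] = None,
                start: Optional[date] = None, end: Optional[date] = None
                ) -> Tuple[List[Row], Optional[date]]:
    """Return (rows to process, new watermark).

    full:        reprocess everything.
    incremental: only rows strictly newer than the stored watermark.
    backfill:    rows in the window start <= load_date < end; the watermark
                 is left unchanged so regular incremental runs are unaffected.
    """
    if mode == "full":
        selected = list(rows)
    elif mode == "incremental":
        if watermark is None:
            raise ValueError("incremental mode needs a watermark")
        selected = [r for r in rows if r.load_date > watermark]
    elif mode == "backfill":
        if start is None or end is None:
            raise ValueError("backfill mode needs start and end")
        return [r for r in rows if start <= r.load_date < end], watermark
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Advance the watermark to the newest processed row (or keep it if nothing new arrived).
    new_watermark = max((r.load_date for r in selected), default=watermark)
    return selected, new_watermark
```

Keeping backfill separate from the watermark is the key design choice: reloading an old window should not move the point from which the next incremental run starts.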

Environment: Databricks, Azure Data Lake Storage (Gen1), Oracle EDW, PySpark and Spark SQL (mainly), Scala Spark (occasionally), Jenkins, PyCharm, Git, Spark BDA server, PuTTY for tunneling into Airflow environments.

Confidential - Irving, Texas

Sr Application Developer

Responsibilities:

Responsible for mapping data before ingestion according to the business problem.
Responsible for ingesting large volumes of data into the Spark cluster from IBM DB2 databases using queries; also used HDFS and S3 alongside IBM DB2.
Developed Spark scripts with PySpark and Java, using PyCharm and Spring Boot, that perform the internalization process.
Mainly developed PySpark code using existing resources, such as QA code written in Python and the Hanweck BRD, to eliminate previous design flaws and improve performance.
Wrote Spark DataFrame, Dataset and RDD code working mainly with PSV files, as well as Avro and Parquet file formats. Used Spark SQL extensively.
Good experience with performance tuning of Spark applications using Spark performance tuning techniques.
Built a POC using Kafka and Spark Streaming to fetch data from the ONCORE application into our analytics application.

Environment: HDFS, S3, IBM DB2, PySpark (mainly), Java Spark (occasionally), Docker, Maven, Git, Kubernetes, Unix.

Confidential

Hadoop/Kafka Developer

Responsibilities:

Responsible for ingesting large volumes of IoT data into Kafka.
Developed microservices in Java using Spring Boot.
Reviewed the existing Scripted-syntax Jenkins pipelines and suggested converting them to Declarative syntax to reduce deployment time.
Wrote Kafka producers to stream data from external REST APIs to Kafka topics.
Experience working with security groups in the AWS cloud and with S3.
Good experience with continuous integration of applications using Jenkins.
Used Chef and Terraform as infrastructure as code (IaC) for defining Jenkins plugins.
Responsible for maintaining the inbound rules of security groups and preventing duplication of EC2 instances.
Used Git and Docker for builds.

Environment: Shell Scripting, Git, AWS EMR, Kafka, AWS S3, AWS EC2, Java, Spring Boot, Eclipse IDE, Maven, Chef, Jenkins, Terraform, Docker, Infrastructure as a Service (IaaS), Cloudera (CDH).

Confidential - Chicago, IL

Hadoop/Spark Developer

Responsibilities:

Responsible for ingesting large volumes of user behavioral data and customer profile data into the analytics data store.
Developed custom multi-threaded Java-based ingestion jobs, as well as Sqoop jobs, for ingesting from FTP servers and data warehouses.
Developed many Spark applications for data cleansing, event enrichment, data aggregation, de-normalization and the data preparation needed for machine learning exercises.
Worked on troubleshooting Spark applications to make them more fault tolerant.
Worked on fine-tuning Spark applications to improve the overall processing time of the pipelines.
Wrote Kafka producers to stream data from external REST APIs to Kafka topics.
Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase.
Experienced in handling large datasets using Spark's in-memory capabilities, broadcast variables, efficient joins, transformations and other features.
Worked extensively with Sqoop for importing data from Oracle.
Experience working with EMR clusters in the AWS cloud and with S3.
Involved in creating Hive tables and loading and analyzing data using Hive scripts.
Implemented partitioning, dynamic partitions and buckets in Hive.
Good experience with continuous integration of applications using Jenkins.
Used reporting tools such as Tableau, connected to Impala, to generate daily reports.
Collaborated with the infrastructure, network, database, application and BA teams to ensure data quality and availability.
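A minimal sketch of how dynamic partitioning and bucketing determine where each row lands. In Hive the partition value is taken from the data itself when dynamic partitioning is enabled (for example with hive.exec.dynamic.partition.mode=nonstrict), and a CLUSTERED BY ... INTO n BUCKETS clause assigns rows to a fixed number of files by hashing the bucketing column. The Python below only models that layout: the table and column names are hypothetical, the bucket_NNNNN file name is illustrative rather than Hive's actual naming, and zlib.crc32 stands in for Hive's own hash function.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Sale = Tuple[str, str, float]  # (customer_id, sale_date, amount), hypothetical columns


def bucket_for(key: str, num_buckets: int) -> int:
    """Assign a bucket by hashing the bucketing column.
    zlib.crc32 is a deterministic stand-in; Hive uses its own hash function."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def layout(rows: List[Sale], table: str, num_buckets: int) -> Dict[str, List[Tuple[str, float]]]:
    """Map each row to table/sale_date=<value>/bucket_<n>.
    The partition directory comes from the row's own sale_date (dynamic partitioning);
    the file inside it comes from the bucket of customer_id."""
    files: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for customer_id, sale_date, amount in rows:
        bucket = bucket_for(customer_id, num_buckets)
        path = f"{table}/sale_date={sale_date}/bucket_{bucket:05d}"
        files[path].append((customer_id, amount))
    return dict(files)
```

Because a given customer_id always hashes to the same bucket number in every partition, queries filtered on sale_date read only the matching directories, and joins or sampling on customer_id can work bucket by bucket.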

Environment: Spark, Hive, Sqoop, Shell Scripting, AWS EMR, Kafka, AWS S3, MapReduce, Scala, Eclipse, Maven, Cloudera (CDH)

Confidential - Seattle, WA

Hadoop Developer

Responsibilities:

Worked closely with business analysts to gather requirements and design reliable and scalable data pipelines using AWS EMR.
Developed Spark applications in Scala using the DataFrame and Spark SQL APIs for faster data processing.
Developed highly optimized Spark applications to perform various data cleansing, validation, transformation and summarization activities according to the requirements.
The data pipeline consists of Spark, Hive, Sqoop and custom-built input adapters to ingest, transform and analyze operational data.
Developed Spark jobs and Hive jobs to summarize and transform data.
Used Spark for interactive queries, processing of streaming data and integration with the NoSQL database DynamoDB.
Involved in converting Hive queries into Spark transformations using Spark DataFrames in Scala.
Built real-time data pipelines by developing Kafka producers and Spark Streaming consumer applications.
Imported data from relational databases into S3 using Sqoop and performed transformations using Hive and Spark.
Exported the processed data to Redshift using Redshift load utilities so the BI team could further visualize it and generate reports.
Used Hive to analyze the partitioned and bucketed data and computed various metrics for reporting.
Developed HiveQL scripts to de-normalize and aggregate the data.
Scheduled and executed workflows in Oozie to run various jobs.

Environment: AWS EMR, S3, Spark, Hive, Sqoop, Eclipse, Java, SQL, Linux (CentOS), DynamoDB, Maven.

Confidential - Denver, CO

Hadoop Developer

Responsibilities:

Worked with the business team to gather requirements and participated in Agile planning meetings to finalize the scope of each development iteration.
Responsible for building scalable distributed data solutions on Cloudera's Hadoop distribution (CDH).
Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
Implemented data pipelines by chaining multiple mappers with the ChainMapper API.
Developed multiple MapReduce batch jobs in Java for loading data into HDFS in SequenceFile format.
Ingested structured data from a wide array of RDBMSs into HDFS as incremental imports using Sqoop.
Involved in writing Pig scripts to wrangle the raw data, store it in HDFS and load it into Hive tables using HCatalog.
Configured Flume agents on different data sources to capture streaming log data from the web servers.
Implemented Flume multiplexing to stream data from upstream pipes into HDFS.
Created Hive external tables with bucketing (clustering) and partitioning on date to optimize the performance of ad-hoc queries.
Involved in writing HiveQL scripts in Beeline, Impala and the Hive CLI for consumer data analysis to meet business requirements.
Exported data from HDFS to the DWH using Sqoop export in allowinsert mode through a staging table.
Worked with different file formats and compression techniques to ensure optimal performance of Hive queries.
Involved in creating Hive tables from a wide range of data formats, such as CSV, text, SequenceFile, Avro, Parquet, ORC, JSON and custom formats, using SerDes.
Transformed the semi-structured log data to fit the schema of the Hive tables using Pig.
Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
Involved in testing and in writing low-level and high-level design documentation for the business requirements.
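Hadoop's ChainMapper runs several mappers in sequence inside a single map task, so intermediate output never has to be written to HDFS between steps. The plain-Python sketch below models only that composition idea, not the Hadoop ChainMapper API; the clean and tokenize mappers are hypothetical examples.

```python
from typing import Callable, Iterable, Iterator, Tuple

KV = Tuple[object, object]
Mapper = Callable[[object, object], Iterable[KV]]


def chain(*mappers: Mapper) -> Mapper:
    """Compose mappers so every (key, value) pair one mapper emits is fed to the next,
    all within a single pass over the input record."""
    def run(key: object, value: object) -> Iterator[KV]:
        pairs: Iterable[KV] = [(key, value)]
        for mapper in mappers:
            pairs = [out for k, v in pairs for out in mapper(k, v)]
        yield from pairs
    return run


def clean(offset: object, line: object) -> Iterator[KV]:
    """First stage (hypothetical): trim and lowercase; drop blank lines."""
    text = str(line).strip().lower()
    if text:
        yield offset, text


def tokenize(offset: object, text: object) -> Iterator[KV]:
    """Second stage (hypothetical): emit (word, 1) for each token."""
    for word in str(text).split():
        yield word, 1
```

For example, chain(clean, tokenize)(0, "  Hello World ") yields ("hello", 1) and ("world", 1), while a blank line yields nothing because the first stage filters it out.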

Environment: Cloudera Hadoop, Eclipse, Java, Sqoop, Pig, Oozie, Hive, Flume, CentOS, MySQL, Oracle DB.

Confidential - Denver, CO

Hadoop Developer

Responsibilities:

Responsible for developing efficient MapReduce programs over more than 20 years' worth of claims data to detect and separate fraudulent claims.
Developed medium-to-complex MapReduce programs from scratch.
Uploaded and processed more than 30 terabytes of data from various structured and unstructured sources into HDFS using Sqoop and Flume.
Played a key role in setting up a 100-node Hadoop cluster running MapReduce, working closely with the Hadoop administration team.
Worked with the advanced analytics team to design fraud detection algorithms, then developed MapReduce programs to run the algorithms efficiently on the huge datasets.
Developed Java programs to perform data scrubbing for unstructured data.
Responsible for designing and managing the Sqoop jobs that uploaded data from Oracle to HDFS and Hive.
Created Hive tables to import large data sets from various relational databases using Sqoop, and exported the analyzed data back for visualization and report generation by the BI team.
Used Flume to collect log data containing error messages across the cluster.
Designed and maintained Oozie workflows to manage the flow of jobs in the cluster.
Played a key role in installing and configuring various Hadoop ecosystem tools such as Hive, Pig and HBase.
Successfully loaded files into HDFS from Teradata, and from HDFS into Hive.
Experience using ZooKeeper for coordinating the cluster and Oozie for scheduling workflows.
Developed Oozie workflows and scheduled them to run data- and time-dependent Hive and Pig jobs.
Designed and developed dashboards for analytical purposes using Tableau.
Analyzed the Hadoop log files using Pig scripts to track down errors.
Kept senior management updated daily on project progress, including the classification levels in the data.

Confidential - Mechanicsburg, PA

Java Developer

Responsibilities:

Developed web applications by coordinating requirements, user stories, use cases, screen mockups, schedules, and activities.
Worked closely with client business stakeholders on agile development teams.
Supported users by developing documentation and assistance tools.
Developed the presentation layer using the Spring Framework, and used multiple Spring modules such as Spring MVC and Spring JDBC.
Implemented RESTful web services using Jersey to integrate different application components.
Developed RESTful web services for transmitting data in JSON/XML format.
Involved in writing SQL queries, functions, views, triggers and stored procedures against an Oracle relational database.
Used Sqoop to ingest structured data from the Oracle database into HDFS.
Involved in writing and running MapReduce batch jobs in Java for data wrangling on the cluster.
Developed map-side joins using DistributedCache, as well as reduce-side joins, on various data sets.
Developed Pig Latin scripts to transform the data according to business requirements.
Developed Pig UDFs in Java, extending the EvalFunc and FilterFunc classes, to filter semi-structured data.
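For contrast with a map-side join, a reduce-side join tags each record with its source in the map phase, lets shuffle and sort group records by join key, and pairs up the two sides in the reducer. The sketch below simulates those phases in plain Python, with list sorting and itertools.groupby standing in for the shuffle; it is an illustration under that assumption, not MapReduce code, and the names and sample data are hypothetical.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]        # (join_key, payload)
Joined = Tuple[str, str, str]   # (join_key, left_payload, right_payload)


def tag(records: Iterable[Record], source: str) -> Iterator[Tuple[str, str, str]]:
    """Map phase: emit (join_key, source_tag, payload) so the reducer can tell the sides apart."""
    for key, payload in records:
        yield key, source, payload


def reduce_side_join(left: Iterable[Record], right: Iterable[Record]) -> List[Joined]:
    """Inner join of two datasets, simulating map -> shuffle/sort -> reduce."""
    tagged = list(tag(left, "L")) + list(tag(right, "R"))
    # Stand-in for shuffle & sort: bring all records with the same key together.
    tagged.sort(key=itemgetter(0))
    joined: List[Joined] = []
    for key, group in groupby(tagged, key=itemgetter(0)):
        # Reduce phase: split this key's records by source, then pair every left with every right.
        left_vals: List[str] = []
        right_vals: List[str] = []
        for _, source, payload in group:
            (left_vals if source == "L" else right_vals).append(payload)
        for left_val in left_vals:
            for right_val in right_vals:
                joined.append((key, left_val, right_val))
    return joined
```

Unlike the map-side variant, neither input has to fit in memory up front, at the cost of shuffling both datasets across the network.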

Environment: Java, J2EE, Eclipse, JSP, Servlets, Spring, JavaScript, HTML, RESTful, shell scripting, XML, Oracle 10g, Cloudera Hadoop, MapReduce, Pig, HDFS.
