Senior Big Data Spark Developer Resume
SUMMARY
- Over 15 years of IT experience in the analysis, design, development, documentation, implementation, and testing of software systems using Python, Scala, Java, J2EE, Akka, REST APIs, MySQL, and AWS technologies.
- Expertise in the design and development of web and enterprise applications using Typesafe technologies such as Scala, Akka, the Play Framework, and Slick.
- Experience importing and exporting multi-terabyte datasets between RDBMS and HDFS using Sqoop.
- Experience working with Hadoop clusters on Cloudera (CDH3 & CDH4) and Hortonworks distributions.
- Experienced in running Hadoop workloads on AWS EMR clusters and on Microsoft Azure with Databricks.
- 7 years of experience administering, configuring, and managing open-source technologies such as Spark, Kafka, ZooKeeper, Docker, and Kubernetes on RHEL.
- Good experience in writing Spark applications using Scala/Java/Python.
- Experience creating Resilient Distributed Datasets (RDDs) and DataFrames with appropriate predicate pushdown filtering.
- Experience supporting and monitoring Spark jobs through the Spark web UI and Grafana.
- Performance-tuned Spark applications by choosing appropriate batch intervals and tuning memory.
- Experienced in fine-tuning Spark jobs using DataFrame repartition and coalesce techniques.
- Experienced in Scala functional programming using closures, currying, and monads.
- Experience with the Hadoop ecosystem: MapReduce, Hive, Pig, Oozie, HBase, and Flume.
- Implemented built-in operators in Spark and PySpark such as map, flatMap, filter, reduceByKey, groupByKey, aggregateByKey, and combineByKey (a short sketch follows this summary).
- Experienced with Git repositories and Maven builds.
- Added security to the cluster by integrating Kerberos.
- Experience working with the Hive data warehouse: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries to analyze large datasets.
- Experience in writing simple to complex Pig scripts for processing and analyzing large volumes of data.
- Experience using Impala on top of Hive for faster, interactive data processing.
- Experience producing Kafka streams and consuming them from Kafka topics with Spark DStreams.
- Extensive hands-on experience with Hadoop file system commands for file handling operations.
- Experience using Spark over MapReduce for faster, more efficient data processing and analytics.
- Experience creating Spark DataFrame transformations using withColumn, withColumnRenamed, and drop operations to modify DataFrame columns.
- Experience writing Spark SQL queries that use anti joins for upsert operations (a second sketch follows this summary).
- Hands-on experience building spark-submit pipelines and generating DAGs with Airflow.
- Experience creating AWS Glue Spark jobs and scheduling their execution.
- Hands-on experience setting up workflows with the Oozie workflow engine for managing and scheduling Hadoop jobs.
- Worked on developing ETL processes to load data from multiple sources into HDFS using Flume and Sqoop, performing structural modifications using Hive, and analyzing data with visualization/reporting tools.
- Experience in analyzing large datasets and deriving actionable insights for process improvement.
- Worked on loading and transforming large sets of structured, semi-structured, and unstructured data.
- Experience working with different file formats such as Avro, Parquet, ORC, SequenceFile, and JSON.
- Background with traditional databases such as Oracle, MySQL, MS SQL Server, PostgreSQL.
- Good understanding of Web Services like SOAP, REST and build tools like SBT, Maven, and Gradle.
- Experience with Jenkins for deployment automation.
- Detail-oriented with strong multitasking and interpersonal skills and the ability to produce high-quality results.
- Good analytical, interpersonal, communication, and problem-solving skills with the ability to quickly master new concepts, capable of working in a group as well as independently.
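
The Spark operator experience above can be illustrated with a minimal Scala sketch; the sample data, field layout, and object name are assumptions for illustration only, not taken from any specific project.

    import org.apache.spark.sql.SparkSession

    object RddOperatorsSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("rdd-operators-sketch")
          .master("local[*]")            // local master only for this sketch
          .getOrCreate()
        val sc = spark.sparkContext

        // Hypothetical input: one whitespace-separated line of words per record.
        val lines = sc.parallelize(Seq("spark kafka hive", "spark hive", "kafka"))

        val counts = lines
          .flatMap(_.split("\\s+"))      // split each line into words
          .filter(_.nonEmpty)            // drop empty tokens
          .map(word => (word, 1))        // pair each word with a count of 1
          .reduceByKey(_ + _)            // combine counts per word on the map side

        // aggregateByKey with a (sum, count) accumulator to derive an average per key.
        val scores = sc.parallelize(Seq(("spark", 10), ("spark", 20), ("hive", 5)))
        val avg = scores
          .aggregateByKey((0, 0))(
            (acc, v) => (acc._1 + v, acc._2 + 1),
            (a, b) => (a._1 + b._1, a._2 + b._2))
          .mapValues { case (sum, n) => sum.toDouble / n }

        counts.collect().foreach(println)
        avg.collect().foreach(println)
        spark.stop()
      }
    }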
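
The anti-join upsert pattern mentioned above can be sketched in Spark SQL as follows; the table and column names here are hypothetical. Rows of the target with no match in the incoming data are kept, and every incoming row either updates or inserts.

    import org.apache.spark.sql.SparkSession

    object AntiJoinUpsertSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("anti-join-upsert-sketch")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        // Hypothetical target and incoming datasets keyed by "id".
        val target   = Seq((1, "old"), (2, "old")).toDF("id", "value")
        val incoming = Seq((2, "new"), (3, "new")).toDF("id", "value")

        target.createOrReplaceTempView("target")
        incoming.createOrReplaceTempView("incoming")

        // LEFT ANTI JOIN keeps only unmatched target rows; UNION ALL appends
        // the incoming rows, giving the upserted result.
        val upserted = spark.sql(
          """
            |SELECT t.id, t.value
            |FROM target t
            |LEFT ANTI JOIN incoming i ON t.id = i.id
            |UNION ALL
            |SELECT id, value FROM incoming
            |""".stripMargin)

        upserted.orderBy("id").show()   // expected: (1, old), (2, new), (3, new)
        spark.stop()
      }
    }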
TECHNICAL SKILLS
Hadoop Distribution: Cloudera (CDH 4 and 5), Hortonworks
Hadoop Ecosystem: HDFS, Sqoop, Flume, Hive, Pig, Impala, Map Reduce, Spark Core, Spark SQL, Oozie
Databases: Oracle, MySQL, MS SQL Server, PostgreSQL
NoSQL Database: HBase, Cassandra, MongoDB
Data warehouse: Redshift
AWS: S3, EMR, EC2, Athena, Glue
Languages: Scala, Java, Python
Operating System: Windows, UNIX / Linux, Mac
PROFESSIONAL EXPERIENCE
Confidential
Senior Big Data Spark Developer
Responsibilities:
- Generated Spark DataFrames from MongoDB source collections and saved the transformed DataFrames into the Hive database.
- Performed incremental loads into Hive tables using the merge command and fine-tuned the tables with bloom filters.
- Generated Kafka streams produced from SQL Server and HTTP requests, and used Spark Streaming to consume the Kafka streams and write into Cassandra (a sketch follows this list).
- Built Spark ETL jobs that generated DataFrames from Cassandra tables and wrote the transformed tables back into Cassandra.
- Monitored Kafka stream activity (Kafka Connect and topics) and Spark DStream activity.
- Designed Cassandra tables to support Spark DataFrame pushdown filtering on primary, clustering, and secondary keys.
- Developed Spark SQL queries for better Spark job performance.
- Monitored Spark jobs using the Spark UI.
- Scheduled Spark jobs through shell scripts using Autosys.
- Optimized Spark jobs for better run times and to resolve out-of-memory failures.
- Experienced in implementing storage formats like Parquet, Avro, and ORC.
- Experienced in extracting JSON dumps to generate Spark DataFrames.
- Experienced with Git repositories, Maven builds, and deployments using Jenkins.
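
A minimal Scala sketch of the Kafka-to-Cassandra streaming flow described in this role; the broker address, topic, keyspace, table, and message layout are illustrative assumptions, and the Cassandra write relies on the DataStax Spark Cassandra Connector.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
    import com.datastax.spark.connector._

    object KafkaToCassandraSketch {
      def main(args: Array[String]): Unit = {
        // Hypothetical endpoints; a real job would take these from configuration.
        val conf = new SparkConf()
          .setAppName("kafka-to-cassandra-sketch")
          .set("spark.cassandra.connection.host", "127.0.0.1")
        val ssc = new StreamingContext(conf, Seconds(10))   // 10-second batch interval

        val kafkaParams = Map[String, Object](
          "bootstrap.servers"  -> "localhost:9092",
          "key.deserializer"   -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id"           -> "orders-consumer",
          "auto.offset.reset"  -> "latest")

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc,
          LocationStrategies.PreferConsistent,
          ConsumerStrategies.Subscribe[String, String](Seq("orders"), kafkaParams))

        // Parse hypothetical "id,amount" messages and persist each micro-batch to Cassandra.
        stream
          .map(record => record.value.split(","))
          .filter(_.length == 2)
          .map(fields => (fields(0), fields(1).toDouble))
          .foreachRDD { rdd =>
            rdd.saveToCassandra("sales", "orders", SomeColumns("id", "amount"))
          }

        ssc.start()
        ssc.awaitTermination()
      }
    }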
Environment: Hortonworks, Ambari, AWS (EMR, EC2, & S3), Scala, Python, Spark, Kafka, Cassandra, MongoDB, Airflow, Jenkins, MS SQL Server, Autosys, and Redshift.
Confidential
Hadoop Spark Scala Developer
Responsibilities:
- Built a Scala Spark framework for generating DataFrames from HDFS and writing the DataFrames to S3 buckets (a sketch follows this list).
- Mapped the Hive tables and designed the data transformations needed to move data to Redshift.
- Copied files of various formats (JSON, CSV, etc.) to HDFS using NiFi.
- Ran and scheduled spark-submit jobs through Oozie, Airflow, and Korn shell scripts.
- Performance-tuned Spark SQL and spark-submit jobs.
- Developed Hive tables using different storage formats and compression techniques.
- Optimized datasets by partitioning and bucketing Hive tables and by performance-tuning Hive queries.
- Created RDDs in Spark and extracted data from the data warehouse onto Spark RDDs.
- Experience working with Spark SQL and creating RDDs using the PySpark SparkContext and SparkSession.
- Experience collecting and moving logs from exec sources to HDFS using Flume.
- Experience in implementing efficient storage formats like Avro, Parquet and ORC.
- Monitored workloads and job performance; managed, reviewed, and troubleshot Hadoop log files.
- Created EMR clusters and designed their termination after successful completion of spark-submit jobs.
- Worked with Spark Structured Streaming and moved data from Hive to the Redshift data warehouse for Spark MLlib workloads.
- Involved in end-to-end data processing: ingestion, processing, quality checks, and splitting.
- Exported the analyzed data to relational databases using Sqoop for visualization and report generation.
- Actively interacted with team members on issue and problem resolution.
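
A minimal Scala sketch of the HDFS-to-S3 DataFrame framework described in this role; the paths, column names, and partitioning choices are illustrative assumptions.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object HdfsToS3Sketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("hdfs-to-s3-sketch").getOrCreate()

        // Hypothetical paths; a real job would take these as arguments.
        val sourcePath = "hdfs:///data/raw/events"
        val targetPath = "s3a://example-bucket/curated/events"

        val raw = spark.read.json(sourcePath)

        // Keep only valid rows, add a load date, and repartition before the write
        // so each output partition contains a manageable number of files.
        val curated = raw
          .filter(col("event_id").isNotNull)
          .withColumn("load_date", current_date())
          .repartition(col("load_date"))

        curated.write
          .mode("overwrite")
          .partitionBy("load_date")
          .parquet(targetPath)

        spark.stop()
      }
    }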
Environment: Hortonworks 2.6, Ambari, AWS (EMR, EC2, & S3), Python, Spark, Hive, Pig, Airflow, Flume, Sqoop, Jenkins, HBase, DataStage, and Redshift.
Confidential
Hadoop Spark DevOps Engineer
Responsibilities:
- Imported data from Oracle and MySQL databases into HDFS using Sqoop for further transformation.
- Created Hive tables on top of HDFS and developed Hive queries to analyze the data.
- Implemented AWS EMR clusters to build a Hadoop POC environment.
- Actively involved in SQL and Azure SQL DW code development using T-SQL.
- Developed Hive tables using different storage formats and compression techniques.
- Optimized datasets by partitioning and bucketing Hive tables and by performance-tuning Hive queries (a sketch follows this list).
- Created RDDs in Spark and extracted data from the data warehouse onto Spark RDDs.
- Consumed data from Kafka topics into Spark DStreams; configured different topologies for the Spark cluster and deployed them on a regular basis.
- Monitored Spark jobs in the production environment and took appropriate steps to fine-tune them.
- Used Hadoop file system commands extensively for file handling operations.
- Loaded data from local file systems to HDFS and vice versa using hadoop fs commands.
- Designed and developed ETL workflows in Oozie, automating extraction of data from different databases into HDFS with Sqoop scripts, transformation and analysis in Hive/Pig, and parsing of the raw data with Spark.
- Worked extensively with the Spark Core and Spark SQL modules for faster testing and processing of data.
- Experience working with Spark SQL and creating RDDs using the Scala SparkContext and SparkSession.
- Experience collecting and moving logs from exec sources to HDFS using Flume.
- Troubleshot Azure Data Factory and SQL issues and performance problems.
- Performed component unit testing using the Azure Emulator.
- Analyzed escalated incidents within the Azure SQL database; implemented test scripts to support test-driven development and continuous integration.
- Worked with the Informatica ETL tool, Oracle Database, PL/SQL, Python, and shell scripts.
- Involved in database design, creating tables, views, stored procedures, functions, triggers, and indexes; strong experience in data warehousing and ETL using DataStage.
- Worked on MicroStrategy report development and analysis, providing mentoring, guidance, and troubleshooting to analysis team members solving complex reporting and analytical problems.
- Extensively used filters, facts, consolidations, transformations, and custom groups to generate reports for business analysis.
- Assisted with the design and development of MicroStrategy dashboards and interactive documents using MicroStrategy Web and Mobile.
- Extracted data from SQL Server 2008 into data marts, views, and/or flat files for Tableau workbook consumption using T-SQL. Partitioned and queried the data in Hive for further analysis by the BI team.
- Managed Tableau extracts on Tableau Server and administered Tableau Server.
- Worked extensively on data extraction, transformation, and loading from Oracle to Teradata using BTEQ, FastLoad, and MultiLoad.
- Extensively used the Teradata FastLoad and MultiLoad utilities to load data into tables.
- Used Teradata SQL Assistant to build SQL queries.
- Performed data reconciliation across various source systems and in Teradata.
- Involved in writing complex SQL queries using correlated subqueries, joins, and recursive queries.
- Worked extensively on date manipulations in Teradata.
- Extracted data from Oracle using SQL scripts, loaded it into Teradata using FastLoad/MultiLoad, and transformed it according to business transformation rules to insert/update the data in data marts.
- Handled installation, configuration, and maintenance of the Hadoop cluster, cluster monitoring, troubleshooting, and certifying environments for production readiness.
- Monitored workloads and job performance; managed, reviewed, and troubleshot Hadoop log files.
- Developed workflows in Oozie to automate loading data into HDFS and pre-processing and transforming it with Sqoop scripts, Pig scripts, and Hive queries.
- Created Scala programs to develop reports for business users.
- Involved in end-to-end data processing: ingestion, processing, quality checks, and splitting.
- Exported the analyzed data to relational databases using Sqoop for visualization and report generation.
- Actively interacted with team members on issue and problem resolution.
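
A minimal Scala/HiveQL sketch of the Hive partitioning and bucketing optimization described in this role; the database, table, column names, and bucket count are illustrative assumptions.

    import org.apache.spark.sql.SparkSession

    object HivePartitionBucketSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hive-partition-bucket-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Partition by load date so queries prune whole directories,
        // and bucket by customer_id so joins on that key shuffle less data.
        spark.sql(
          """
            |CREATE TABLE IF NOT EXISTS sales.orders_optimized (
            |  order_id     BIGINT,
            |  customer_id  BIGINT,
            |  amount       DOUBLE
            |)
            |PARTITIONED BY (load_date STRING)
            |CLUSTERED BY (customer_id) INTO 32 BUCKETS
            |STORED AS ORC
            |""".stripMargin)

        // Dynamic-partition insert from a hypothetical staging table.
        spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
        spark.sql(
          """
            |INSERT OVERWRITE TABLE sales.orders_optimized PARTITION (load_date)
            |SELECT order_id, customer_id, amount, load_date
            |FROM sales.orders_staging
            |""".stripMargin)

        // A query filtered on load_date scans only that partition's files.
        spark.sql(
          "SELECT customer_id, SUM(amount) FROM sales.orders_optimized " +
          "WHERE load_date = '2020-01-01' GROUP BY customer_id").show()

        spark.stop()
      }
    }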
Environment: CDH (CDH4 & CDH5), Hadoop 2.4, Azure, Spark, PySpark, Hive, Pig, Oozie, Flume, Databricks, Sqoop, Cloudera Manager, Jenkins, Cassandra, MS SQL Server, Oracle RDBMS.