Senior Bigdata Spark Developer Resume

SUMMARY

  • Over 15 years of IT experience in the analysis, design, development, documentation, implementation, and testing of software systems using Python, Scala, Java, J2EE, Akka, REST APIs, MySQL, and AWS technologies.
  • Expertise in the design and development of web and enterprise applications using Typesafe technologies such as Scala, Akka, the Play framework, and Slick.
  • Experience importing and exporting multiple terabytes of data between RDBMS and HDFS using Sqoop.
  • Experience working with Hadoop clusters on Cloudera (CDH3 & CDH4) and Hortonworks distributions.
  • Experienced in implementing Hadoop on AWS EMR clusters and big data workloads on Microsoft Azure with Databricks.
  • 7 years of experience administering, configuring, and managing open-source technologies such as Spark, Kafka, ZooKeeper, Docker, and Kubernetes on RHEL.
  • Good experience in writing Spark applications using Scala/Java/Python.
  • Experience in creating Resilient Distributed Datasets (RDDs) and DataFrames with appropriate predicate pushdown filtering (illustrated in the first sketch after this list).
  • Experience in supporting and monitoring Spark Jobs through Spark web UI & Grafana.
  • Performance tuned Spark applications by choosing the right batch interval and tuning memory allocation.
  • Experienced in fine-tuning Spark jobs using DataFrame repartitioning and coalesce techniques.
  • Experienced in Scala functional programming using closures, currying, and monads.
  • Experience with the Hadoop ecosystem: MapReduce, Hive, Pig, Oozie, HBase, and Flume.
  • Implemented pre-defined operators in Spark and PySpark such as map, flatMap, filter, reduceByKey, groupByKey, aggregateByKey, and combineByKey (see the second sketch after this list).
  • Experienced with Git repositories and Maven builds.
  • Added security to clusters by integrating Kerberos.
  • Experience working with the Hive data warehouse: creating tables, distributing data through partitioning and bucketing, writing and optimizing HiveQL queries, and analyzing large datasets.
  • Experience writing simple to complex Pig scripts for processing and analyzing large volumes of data.
  • Experience using Impala on top of Hive for faster, low-latency query processing.
  • Experience producing Kafka streams and consuming data from Kafka topics with Spark DStreams.
  • Extensive hands-on experience with Hadoop file system commands for file handling operations.
  • Experience using Spark over MapReduce for faster, more efficient data processing and analytics.
  • Experience creating Spark DataFrame transformations using withColumn, withColumnRenamed, and drop operations to modify DataFrame columns.
  • Experience writing Spark SQL queries that use anti joins for upsert operations (see the third sketch after this list).
  • Hands-on experience building spark-submit pipelines and generating DAGs using Airflow.
  • Experience creating AWS Glue Spark jobs and scheduling their execution.
  • Hands-on experience setting up workflows with the Oozie workflow engine for managing and scheduling Hadoop jobs.
  • Developed ETL processes to load data from multiple data sources into HDFS using Flume and Sqoop, performed structural modifications in Hive, and analyzed the data using visualization/reporting tools.
  • Experience in analyzing large datasets and deriving actionable insights for process improvement.
  • Worked on loading and transforming large sets of structured, semi-structured, and unstructured data.
  • Experience in working with different file formats like Avro, Parquet, ORC, Sequence, and JSON files.
  • Background with traditional databases such as Oracle, MySQL, MS SQL Server, PostgreSQL.
  • Good understanding of Web Services like SOAP, REST and build tools like SBT, Maven, and Gradle.
  • Experience using Jenkins for deployment automation.
  • Detail-oriented, with strong multitasking and interpersonal skills and the ability to produce high-quality results.
  • Good analytical, communication, and problem-solving skills, with the ability to quickly master new concepts and to work both in a group and independently.
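
A minimal Scala sketch of the DataFrame patterns referenced above (predicate pushdown filtering, withColumn/withColumnRenamed/drop, and repartition/coalesce tuning); the paths and column names are hypothetical:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    val spark = SparkSession.builder().appName("dataframe-shaping").getOrCreate()

    // Filtering right after the scan lets Spark push the predicate down
    // into the Parquet reader, skipping non-matching row groups.
    val events = spark.read.parquet("hdfs:///data/events")
      .filter(col("event_date") === "2020-01-01")

    val shaped = events
      .withColumn("amount_usd", col("amount") * col("fx_rate")) // derive a column
      .withColumnRenamed("ts", "event_ts")                      // rename a column
      .drop("raw_payload")                                      // drop a column
      .repartition(200, col("customer_id")) // spread skewed keys before wide operations

    // coalesce() reduces the partition count without a shuffle, so the job
    // writes a few reasonably sized files instead of 200 small ones.
    shaped.coalesce(16).write.mode("overwrite").parquet("hdfs:///out/events")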
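
A second sketch showing the pre-defined RDD operators listed above, reusing the SparkSession from the previous sketch; reduceByKey and aggregateByKey combine values map-side before the shuffle, which is why they are usually preferred over groupByKey:

    // Classic word count built from flatMap, filter, map, and reduceByKey.
    val words = spark.sparkContext.textFile("hdfs:///data/lines.txt")
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .map(word => (word, 1))

    val counts = words.reduceByKey(_ + _) // aggregates map-side, unlike groupByKey

    // aggregateByKey with a (sum, count) accumulator: average word length
    // per first letter.
    val avgLen = words.map { case (w, _) => (w.head.toString, w.length) }
      .aggregateByKey((0, 0))(
        (acc, len) => (acc._1 + len, acc._2 + 1), // fold a value into the accumulator
        (a, b) => (a._1 + b._1, a._2 + b._2))     // merge two accumulators
      .mapValues { case (sum, n) => sum.toDouble / n }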
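
A third sketch of the anti-join upsert pattern: a LEFT ANTI JOIN keeps only the target rows whose key does not appear in the incoming batch, and unioning the batch back in applies both inserts and updates. Table and key names are hypothetical:

    val target  = spark.table("warehouse.customers")
    val updates = spark.table("staging.customer_updates")

    // Target rows untouched by the batch, plus the full batch = upsert.
    val untouched = target.join(updates, Seq("customer_id"), "left_anti")
    val upserted  = untouched.unionByName(updates)

    // Write to a new table; overwriting the same table being read is unsafe in Hive.
    upserted.write.mode("overwrite").saveAsTable("warehouse.customers_upserted")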

TECHNICAL SKILLS

Hadoop Distribution: Cloudera (CDH 4 and 5), Hortonworks

Hadoop Ecosystem: HDFS, Sqoop, Flume, Hive, Pig, Impala, MapReduce, Spark Core, Spark SQL, Oozie

Databases: Oracle, MySQL, MS SQL Server, PostgreSQL

NoSQL Databases: HBase, Cassandra, MongoDB

Data warehouse: Redshift

AWS: S3, EMR, EC2, Athena, Glue

Languages: Scala, Java, Python

Operating System: Windows, UNIX / Linux, Mac

PROFESSIONAL EXPERIENCE

Confidential

Senior Bigdata Spark Developer

Responsibilities:

  • Generated Spark DataFrames from MongoDB source collections and saved the transformed DataFrames into Hive.
  • Performed incremental loads into Hive tables with the MERGE command and fine-tuned the tables with bloom filters.
  • Generated Kafka streams produced from SQL Server and HTTP requests, and used Spark Streaming to consume the Kafka streams and write into Cassandra (see the sketch after this list).
  • Responsible for Spark ETL jobs that generate DataFrames from Cassandra tables and write the transformed tables back into Cassandra.
  • Responsible for monitoring Kafka streaming activity (Kafka Connect connectors and topics) and Spark DStream activity.
  • Designed Cassandra tables with Spark DataFrame pushdown filtering in mind (primary, clustering, and secondary keys).
  • Responsible for developing Spark SQL queries for better Spark job performance.
  • Responsible for monitoring Spark jobs using the Spark UI.
  • Responsible for scheduling Spark jobs through shell scripts using Autosys.
  • Responsible for optimizing Spark jobs for better run times and for eliminating out-of-memory failures.
  • Experienced in implementing storage formats such as Parquet, Avro, and ORC.
  • Experienced in extracting JSON dumps to generate Spark DataFrames.
  • Experienced with Git repositories, Maven builds, and deployments using Jenkins.
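
A condensed sketch of the Kafka-to-Cassandra path described above, assuming the Kafka 0.10 direct stream API, the DataStax spark-cassandra-connector, and an existing SparkSession named spark; broker, topic, keyspace, and table names are placeholders:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
    import com.datastax.spark.connector._

    val ssc = new StreamingContext(spark.sparkContext, Seconds(10)) // 10-second batches

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "orders-consumer",
      "auto.offset.reset"  -> "latest")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("orders"), kafkaParams))

    // Parse each CSV record and append it to a Cassandra table.
    stream.map(_.value.split(","))
      .map { case Array(id, item, qty) => (id, item, qty.toInt) }
      .foreachRDD { rdd =>
        rdd.saveToCassandra("sales", "orders", SomeColumns("id", "item", "qty"))
      }

    ssc.start()
    ssc.awaitTermination()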

Environment: Hortonworks, Ambari, AWS (EMR, EC2, & S3), Scala, Python, Spark, Kafka, Cassandra, MongoDB, Airflow, Jenkins, MS SQL Server, Autosys and Redshift.

Confidential

Hadoop Spark Scala Developer

Responsibilities:

  • Built a Scala Spark framework for generating DataFrames from HDFS and writing them to S3 buckets (see the sketch after this list).
  • Responsible for mapping the Hive tables and designing the data transformations to move data to Redshift.
  • Copied files of various formats (JSON, CSV, etc.) to HDFS using NiFi.
  • Ran and scheduled spark-submit jobs through Oozie, Airflow, and Korn shell scripts.
  • Performance tuned Spark SQL queries and spark-submit jobs.
  • Developed Hive tables over the data using different storage formats and compression techniques.
  • Optimized datasets by partitioning and bucketing in Hive and performance tuned Hive queries.
  • Created RDDs in Spark and extracted data from the data warehouse onto them.
  • Worked with Spark SQL and created RDDs using the PySpark SparkContext and SparkSession.
  • Collected and moved logs from exec sources to HDFS using Flume.
  • Experienced in implementing efficient storage formats such as Avro, Parquet, and ORC.
  • Monitored workloads and job performance; managed, reviewed, and troubleshot Hadoop log files.
  • Generated EMR clusters and designed their termination after successful completion of spark-submit jobs.
  • Worked on Spark Structured Streaming and moved data from Hive to the Redshift data warehouse for Spark MLlib workloads.
  • Involved in end-to-end data processing: ingestion, transformation, quality checks, and splitting.
  • Exported the analyzed data to relational databases using Sqoop for visualization and report generation.
  • Actively interacted with team members on issue and problem resolution.
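
A minimal sketch of the HDFS-to-S3 framework described in the first bullet above; the paths and partition column are hypothetical:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("hdfs-to-s3").getOrCreate()

    // Read landed files (JSON here; the same pattern covers CSV etc.).
    val orders = spark.read.json("hdfs:///landing/orders/")

    // Write Snappy-compressed Parquet to S3, partitioned by business date
    // so downstream readers can prune on order_date.
    orders.write
      .mode("overwrite")
      .partitionBy("order_date")
      .option("compression", "snappy")
      .parquet("s3a://example-curated-bucket/orders/")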

Environment: Hortonworks 2.6, Ambari, AWS (EMR, EC2, & S3), Python, Spark, Hive, Pig, Airflow, Flume, Sqoop, Jenkins, HBase, DataStage and Redshift.

Confidential

Hadoop Spark DevOps Engineer

Responsibilities:

  • Responsible for importing data from Oracle and MySQL databases into HDFS using Sqoop for further transformation.
  • Responsible for creating Hive tables on top of HDFS and developing Hive queries to analyze the data.
  • Implemented AWS EMR clusters to build a Hadoop POC environment.
  • Actively involved in SQL and Azure SQL DW code development using T-SQL.
  • Developed Hive tables over the data using different storage formats and compression techniques.
  • Optimized datasets by partitioning and bucketing in Hive and performance tuned Hive queries (see the sketch at the end of this list).
  • Created RDDs in Spark and extracted data from the data warehouse onto them.
  • Consumed data from Kafka topics into Spark DStreams. Configured different topologies for the Spark cluster and deployed them on a regular basis.
  • Monitored Spark jobs in the production environment and took appropriate steps to fine-tune them.
  • Extensive hands-on experience with Hadoop file system commands for file handling operations.
  • Loaded data between local file systems and HDFS using hadoop fs commands.
  • Designed and developed ETL workflows in Oozie, including automating data extraction from different databases into HDFS with Sqoop scripts, transformation and analysis in Hive/Pig, and parsing of raw data with Spark.
  • Extensively worked on the Spark Core and Spark SQL modules for faster testing and processing of data.
  • Worked with Spark SQL and created RDDs using the Scala SparkContext and SparkSession.
  • Collected and moved logs from exec sources to HDFS using Flume.
  • Troubleshot Azure Data Factory and SQL issues and performance problems.
  • Performed component unit testing using the Azure Emulator.
  • Analyzed escalated incidents within the Azure SQL database. Implemented test scripts to support test-driven development and continuous integration.
  • Worked on the Informatica ETL tool, Oracle Database and PL/SQL, Python, and shell scripts.
  • Involved in database design, creating tables, views, stored procedures, functions, triggers, and indexes. Strong experience in data warehousing and ETL using DataStage.
  • Worked on MicroStrategy report development and analysis, providing mentoring, guidance, and troubleshooting to analysis team members solving complex reporting and analytical problems.
  • Extensively used filters, facts, consolidations, transformations, and custom groups to generate reports for business analysis.
  • Assisted with the design and development of MicroStrategy dashboards and interactive documents using MicroStrategy Web and Mobile.
  • Extracted data from SQL Server 2008 into data marts, views, and/or flat files for Tableau workbook consumption using T-SQL. Partitioned and queried the data in Hive for further analysis by the BI team.
  • Managed Tableau extracts on Tableau Server and administered Tableau Server.
  • Extensively worked on data extraction, transformation, and loading using BTEQ, FastLoad, and MultiLoad from Oracle to Teradata.
  • Extensively used the Teradata FastLoad/MultiLoad utilities to load data into tables.
  • Used Teradata SQL Assistant to build SQL queries.
  • Performed data reconciliation across various source systems and Teradata.
  • Involved in writing complex SQL queries using correlated subqueries, joins, and recursive queries.
  • Worked extensively on date manipulations in Teradata.
  • Extracted data from Oracle using SQL scripts and loaded it into Teradata using FastLoad/MultiLoad, transforming it according to business transformation rules to insert/update data in the data marts.
  • Performed Hadoop cluster installation, configuration, and maintenance; monitored and troubleshot clusters; and certified environments for production readiness.
  • Monitored workloads and job performance; managed, reviewed, and troubleshot Hadoop log files.
  • Developed Oozie workflows to automate loading data into HDFS, with pre-processing and transformations using Sqoop scripts, Pig scripts, and Hive queries.
  • Created Scala programs to develop reports for business users.
  • Involved in end-to-end data processing: ingestion, transformation, quality checks, and splitting.
  • Exported the analyzed data to relational databases using Sqoop for visualization and report generation.
  • Actively interacted with team members on issue and problem resolution.
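
A sketch of the Hive partitioning and bucketing optimization mentioned above, expressed through the Spark DataFrame writer (bucketBy requires saveAsTable); the database, table, and column names are hypothetical, and an existing SparkSession named spark is assumed:

    // ORC with Snappy compression; partitioned by load date for partition
    // pruning, bucketed on customer_id so joins on that key shuffle less.
    spark.table("staging.transactions")
      .write
      .mode("overwrite")
      .format("orc")
      .option("compression", "snappy")
      .partitionBy("load_date")
      .bucketBy(32, "customer_id")
      .sortBy("customer_id")
      .saveAsTable("analytics.transactions")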

Environment: CDH (CDH4 & CDH5), Hadoop 2.4, Azure, Spark, PySpark, Hive, Pig, Oozie, Flume, Databricks, Sqoop, Cloudera Manager, Jenkins, Cassandra, MS SQL Server, Oracle RDBMS.
