
Spark Developer Resume


Plano, Texas

SUMMARY

  • 6 years of IT experience as a developer and designer, with cross-platform integration experience using the Hadoop ecosystem, Java, and related software.
  • Hands-on experience installing, configuring, and using Hadoop ecosystem components: HDFS, MapReduce, Pig, Hive, Oozie, Flume, HBase, Spark, and Sqoop.
  • Strong understanding of various Hadoop services, MapReduce, and YARN architecture.
  • Responsible for writing MapReduce programs.
  • Experienced in importing and exporting data to and from HDFS using Sqoop.
  • Experience loading data to Hive partitions and creating buckets in Hive.
  • Developed MapReduce jobs to automate data transfer from HBase.
  • Expertise in data analysis using Pig, Hive, and MapReduce.
  • Experienced in developing UDFs for Hive and Pig using Java.
  • Strong understanding of NoSQL databases like HBase, MongoDB, and Cassandra.
  • Scheduled Hadoop, Hive, Sqoop, and HBase jobs using Oozie.
  • Experience setting up clusters on Amazon EC2 and S3, including automating cluster provisioning and scaling in the AWS cloud.
  • Good understanding of Scrum methodologies, Test Driven Development and continuous integration.
  • Major strengths include familiarity with multiple software systems, the ability to learn new technologies quickly and adapt to new environments, and excellent interpersonal, technical, and communication skills as a self-motivated, focused team player and quick learner.
  • Experience in defining detailed application software test plans, including organization, participants, schedule, and test and application coverage scope.
  • Experience in gathering and defining functional and user interface requirements for software applications.
  • Experience in real-time analytics with Apache Spark (RDDs, DataFrames, and the Streaming API).
  • Used the Spark DataFrames API on the Cloudera platform to perform analytics on Hive data (see the sketch after this list).
  • Experience integrating Hadoop with Kafka, including loading clickstream data from Kafka into HDFS.
  • Expert in using Kafka as a publish-subscribe messaging system.
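
As an illustration of the Spark DataFrames analytics noted above, the following is a minimal PySpark sketch over a Hive table; the table and column names are hypothetical placeholders rather than details from an actual project:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Hive-enabled session; assumes a Hive metastore is configured on the cluster.
    spark = (SparkSession.builder
             .appName("clickstream-analytics")
             .enableHiveSupport()
             .getOrCreate())

    # Hypothetical Hive table holding clickstream events landed from Kafka.
    events = spark.table("web.clickstream_events")

    # Simple DataFrame aggregation: page views per day.
    daily_views = (events
                   .groupBy(F.to_date("event_ts").alias("event_date"))
                   .agg(F.count("*").alias("page_views"))
                   .orderBy("event_date"))

    daily_views.show()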

TECHNICAL SKILLS

Hadoop/Big Data: Hadoop, MapReduce, HDFS, ZooKeeper, Kafka, Hive, Pig, Sqoop, Airflow, YARN, HBase.

NoSQL Databases: HBase, Cassandra, MongoDB

Languages: Python 3.7.2 and earlier (NumPy, Pandas, Matplotlib), Scala (Apache Spark 2.4.3), Java, UNIX shell scripting

Java/J2EE Technologies: Applets, Swing, JDBC, JNDI, JSON, JSTL

Frameworks: MVC, Struts, Spring, Hibernate

Operating Systems: Red Hat Linux, Ubuntu Linux and Windows XP/Vista/7/8

Web Technologies: HTML, DHTML, XML

Web/Application servers: Apache Tomcat, WebLogic, JBoss

Databases: SQL Server, MySQL

Tools and IDEs: Anaconda, PyCharm, Jupyter, Eclipse, IntelliJ, Databricks

PROFESSIONAL EXPERIENCE

SPARK DEVELOPER

Confidential, Plano, Texas

Responsibilities:

  • Explored DAGs, their dependencies, and logs using Airflow pipelines for automation.
  • Implemented custom Flume interceptors to mask confidential data and filter unwanted records from the event payload.
  • Implemented custom serializers to perform encryption using the DES algorithm.
  • Developed collections in MongoDB and performed aggregations on them.
  • Used Spark SQL to load JSON data, create DataFrames (SchemaRDDs), and load them into Hive tables, and handled structured data using Spark SQL (see the sketch after this list).
  • Used Spark SQL to load data into Hive tables and wrote queries to fetch data from those tables.
  • Developed Spark programs using the Scala and Java APIs and performed transformations and actions on RDDs.
  • Designed and implemented Spark jobs to support distributed data processing.
  • Experienced in writing Spark Applications in Scala and Python (Pyspark).
  • Developed custom processors in Java (built with Maven) to add functionality to Apache NiFi for additional tasks.
  • Implemented NiFi flow topologies to perform cleansing operations before moving data into HDFS.
  • Created HBase tables, loaded data into them through HBase sinks, and performed analytics using Tableau.
  • Created HBase tables and column families to store user event data.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them.
  • Configured, monitored, and optimized Flume agent to capture web logs from the VPN server to be put into Hadoop Data Lake.
  • Expertise in extracting, transforming, and loading data from Oracle, DB2, SQL Server, MS Access, Excel, flat files, and XML using Talend.
  • Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
  • Experience in working with Hadoop clusters using Hortonworks distributions.
  • Wrote, tested, and implemented Teradata FastLoad, MultiLoad, and BTEQ scripts, including DML and DDL.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python.
  • Developed various Python scripts to find vulnerabilities in SQL queries through SQL injection testing, permission checks, and performance analysis.
  • Developed ETL processes using Spark, Scala, Hive, and HBase.
  • Developed workflows using Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Developed interactive shell scripts for scheduling various data cleansing and data loading process.
  • Used JSON and XML SerDes for serialization and deserialization to load JSON and XML data into Hive tables.
  • Developed Pig Latin scripts for the analysis of semi-structured data and conducted data analysis by running Hive queries and Pig scripts.
  • Used compression codecs such as Snappy and LZO when storing data in HDFS to improve performance.
  • Expert knowledge on MongoDB NoSQL data modeling, tuning, disaster recovery and backup.
  • Created HBase tables to store variable data formats of data coming from different Legacy systems.
  • Used Hive for transformations, event joins, and pre-aggregations before storing the data in HDFS.
  • Developed Sqoop jobs to load data from RDBMSs into HDFS and Hive.
  • Developed Oozie coordinators to schedule Pig and Hive scripts to create Data pipelines.
  • Involved in loading data from UNIX file systems and FTP to HDFS.
  • Imported data from different sources such as AWS S3 and the local file system into Spark RDDs.
  • Worked on Kerberos authentication to establish a more secure network communication on the cluster.
  • Performed troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
  • Worked with Network, database, application and BI teams to ensure data quality and availability.
  • Worked with Elastic MapReduce (EMR) and set up environments on AWS EC2 instances.
  • Experience maintaining clusters on AWS EMR.
  • Experienced in NOSQL databases like HBase, MongoDB and experienced with Hortonworks distribution of Hadoop.
  • Developed ETL jobs to integrate data from various sources and load it into the warehouse using Informatica 9.1.
  • Experienced in creating ETL mappings in Informatica.
  • Experienced in working with various Informatica transformations such as Filter, Router, Expression, and Update Strategy.
  • Scheduled the ETL jobs using ESP scheduler.
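
A minimal sketch of the Spark SQL JSON-to-Hive loading pattern referenced in the list above; the HDFS path, database, table, and field names are hypothetical:

    from pyspark.sql import SparkSession

    # Hive-enabled session (assumes a configured Hive metastore).
    spark = (SparkSession.builder
             .appName("json-to-hive")
             .enableHiveSupport()
             .getOrCreate())

    # Read raw JSON; Spark infers the schema into a DataFrame
    # (the SchemaRDD of older Spark releases).
    raw = spark.read.json("hdfs:///data/raw/events/")

    # Light structured handling with Spark SQL before persisting.
    raw.createOrReplaceTempView("raw_events")
    cleaned = spark.sql("""
        SELECT event_id, user_id, event_type,
               CAST(event_ts AS TIMESTAMP) AS event_ts
        FROM raw_events
        WHERE event_id IS NOT NULL
    """)

    # Persist as a Hive table so it can be queried from Hive or Impala.
    cleaned.write.mode("append").saveAsTable("analytics.events")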

Environment: Hadoop, YARN, Spark Core, Spark Streaming, Spark SQL, Python, Kafka, Hive, Sqoop, Amazon AWS, Elasticsearch, Impala, Cassandra, Tableau, Informatica, Cloudera, Oracle 10g, Linux.

HADOOP DEVELOPER

Confidential, New York, New York

Responsibilities:

  • Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, Hive and Sqoop.
  • Created Hive tables (external, internal) with static and dynamic partitions and performed bucketing on the tables to provide efficiency.
  • Used HiveQL to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Performed data transformations by writing MapReduce and Pig scripts as per business requirements.
  • Configured Spark Streaming to consume data from Kafka and store it in HDFS (see the sketch after this list).
  • Extracted real-time data with Kafka and Spark Streaming by creating DStreams, converting them into RDDs, processing the data, and storing it in Cassandra.
  • Good understanding of Cassandra architecture, including replication strategies, gossip, and snitches.
  • Used the DataStax Spark-Cassandra connector to load data into Cassandra and used CQL to analyze data from Cassandra tables for quick searching, sorting, and grouping.
  • Experience in NoSQL column-oriented databases like Cassandra and their integration with Hadoop clusters.
  • Used Spark Streaming APIs to perform on-the-fly transformations and actions to build the common learner data model, which consumes data from Kafka in near real time and persists it to Cassandra.
  • Identified problem areas using Elasticsearch and Kibana with Logstash to import .csv files; used Solr over a Lucene index to provide full-text search for analysis and quantification.
  • Used the Spark API over Hadoop YARN as the execution engine for data analytics with Hive.
  • Work experience with cloud infrastructure like Amazon Web Services (AWS).
  • Designed and documented REST/HTTP and SOAP APIs, including JSON data formats and an API versioning strategy.
  • Worked with SCRUM team in delivering agreed user stories on time for every sprint.
  • Analyzed the performance of Spark streaming and batch jobs using Spark tuning parameters.
  • Used the Log4j framework for logging debug, info, and error data.
  • Developed Spark applications using Scala and Spark-SQL for faster processing and testing.
  • Developed custom UDFs in Java to extend Hive and Pig functionality.
  • Imported data from RDBMS systems like MySQL into HDFS using Sqoop.
  • Developed Sqoop jobs to perform incremental imports into Hive tables.
  • Implemented MapReduce counters to gather metrics on good and bad records.
  • Involved in loading and transforming of large sets of structured and semi structured data.
  • Worked with different file formats (ORC, Parquet, Avro) and compression codecs (GZIP, Snappy, LZO).
  • Worked on decreasing database load for search by moving part of the search workload to the Elasticsearch engine.
  • Created Data Pipelines as per the business requirements and scheduled it using Oozie Coordinators.
  • Expertise in extracting, transforming, and loading data from Oracle, DB2, SQL Server, MS Access, Excel, flat files, and XML using Talend.
  • Implemented Elasticsearch on the Hive data warehouse platform.
  • Involved in analyzing log data to predict errors using Apache Spark.
  • Experience in using ORC, Avro, Parquet, RCFile and JSON file formats and developed UDFs using Hive and Pig.
  • Experience with CDH distribution and Cloudera Manager to manage and monitor Hadoop clusters.
  • Worked extensively with Amazon Web Services (AWS) cloud services such as EC2, S3, EMR, EBS, RDS, and VPC.
  • Integrated MapReduce with HBase to bulk-import data into HBase using MapReduce programs.
  • Used Impala and wrote queries to fetch data from Hive tables.
  • Developed several MapReduce jobs using the Java API.
  • Extracted data from Teradata into HDFS, databases, and dashboards using Spark Streaming.
  • Well versed in database and data warehouse concepts such as OLTP, OLAP, and star and snowflake schemas.
  • Read log files using Elasticsearch and Logstash, alerted users to issues, and saved alert details to MongoDB for analysis.
  • Worked with Apache Solr to implement indexing and wrote custom Solr query segments to optimize search.
  • Built near-real-time Solr indexes on HBase and HDFS.
  • Developed Pig and Hive UDFs to implement business logic for processing data per requirements.
  • Developed Oozie bundles to schedule Pig, Sqoop, and Hive jobs to create data pipelines.
  • Implemented the project using Agile methodology and attended daily Scrum meetings.
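
The Kafka-to-Spark Streaming work listed above used DStreams; the sketch below shows the same shape of pipeline with Spark Structured Streaming's Kafka source writing to HDFS (a Cassandra sink would instead go through the DataStax Spark-Cassandra connector). It assumes the spark-sql-kafka package is available, and the broker address, topic, schema, and paths are hypothetical:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StructType, StructField, StringType, TimestampType

    spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

    # Hypothetical schema for the JSON messages on the topic.
    schema = StructType([
        StructField("user_id", StringType()),
        StructField("action", StringType()),
        StructField("event_ts", TimestampType()),
    ])

    # Consume from Kafka (requires the spark-sql-kafka connector on the classpath).
    stream = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092")
              .option("subscribe", "learner-events")
              .load())

    # Parse the Kafka message value into structured columns.
    parsed = (stream
              .select(from_json(col("value").cast("string"), schema).alias("e"))
              .select("e.*"))

    # Persist to HDFS in Parquet; a Cassandra sink would use the
    # DataStax Spark-Cassandra connector instead.
    query = (parsed.writeStream
             .format("parquet")
             .option("path", "hdfs:///data/learner_events/")
             .option("checkpointLocation", "hdfs:///checkpoints/learner_events/")
             .start())

    query.awaitTermination()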

Environment: HDFS, Map Reduce, Hive 1.1.0, Kafka, Hue 3.9.0, Pig, Flume, Oozie, Sqoop, Apache Hadoop 2.6, Spark, SOLR, Storm, Cloudera Manager, Red Hat, MySQL, Prometheus, Docker, Puppet.

PYTHON DEVELOPER

Confidential, San Francisco, California

Responsibilities:

  • Involved in the full software development lifecycle (SDLC): requirements tracking and gathering, analysis, detailed design, development, system testing, and user acceptance testing.
  • Developed entire frontend and backend modules using Python on the Django web framework.
  • Designed interactive web pages for the front end of the application using HTML, JavaScript, AngularJS, jQuery, and AJAX, and implemented CSS for better look and feel.
  • Actively involved in developing Create, Read, Update, and Delete (CRUD) methods in Active Record.
  • Designed and set up MongoDB environments with shards and replica sets (Dev/Test and Production).
  • Built a private VPN using Ubuntu, Python, Django, Postgres, Redis, Bootstrap, jQuery, MongoDB, Fabric, Git, Tenjin, and Selenium.
  • Working knowledge of AWS services such as SQS queuing, SNS notifications, S3 storage, Redshift, Data Pipeline, and EMR.
  • Developed a fully automated continuous integration system using Git, Jenkins, MySQL, and custom tools developed in Python and Bash.
  • Implemented multithreading and complex networking operations such as traceroute, an SMTP mail server, and a web server using Python.
  • Used NumPy for numerical analysis of insurance premiums.
  • Implemented and modified SQL queries, functions, cursors, and triggers per client requirements.
  • Managed code versioning with GitHub, Bitbucket, and deployment to staging and production servers.
  • Implemented MVC architecture in developing the web application with the Django framework.
  • Used Celery as the task queue and RabbitMQ/Redis as message brokers to execute asynchronous tasks (see the sketch after this list).
  • Designed and managed API system deployment using a fast HTTP server and Amazon AWS architecture.
  • Involved in code reviews using GitHub pull requests, reducing bugs, improving code quality, and increasing knowledge sharing.
  • Installed and configured monitoring scripts for AWS EC2 instances.
  • Implemented task object to interface with data feed framework and invoke database message service setup and update functionality.
  • Worked in a UNIX environment developing applications in Python and familiar with its commands.
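
A minimal sketch of the Celery asynchronous-task setup referenced in the list above, assuming a Redis broker; the broker URLs and the task itself are illustrative only (RabbitMQ would be configured with an amqp:// broker URL instead):

    from celery import Celery

    # Celery app with Redis as the message broker and result backend;
    # RabbitMQ would use an amqp:// broker URL instead.
    app = Celery("tasks",
                 broker="redis://localhost:6379/0",
                 backend="redis://localhost:6379/1")

    @app.task
    def generate_premium_report(user_id):
        # Placeholder for the real work (e.g., building and emailing a report);
        # executes asynchronously on a Celery worker.
        return "report generated for user {}".format(user_id)

    # A caller (for example, a Django view) enqueues the task without blocking:
    # generate_premium_report.delay(42)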

Environment: Python 2.7, Django 1.4, HTML5, CSS, XML, MySQL, JavaScript, Backbone.js, jQuery, MongoDB, MS SQL Server, Git, GitHub, AWS, Linux, Shell Scripting, AJAX, Java.
