
Spark Developer Resume


Stamford, CT

SUMMARY

  • Adept and experienced Hadoop developer with several years of experience in programming, the Hadoop ecosystem, and Big Data systems such as Hive, Pig, Sqoop, Oozie, HBase, and Spark with Scala and Python.
  • Expertise in using YARN and tools such as Pig and Hive for data analysis, Sqoop for data ingestion, and Zookeeper for coordinating cluster resources.
  • Proficient in Spark for loading data from relational and NoSQL databases using Spark SQL and in building big data applications using Apache Hadoop (a brief Spark SQL sketch follows this list).
  • Excellent understanding of Hadoop architecture and Hadoop daemons such as ResourceManager, NodeManager, NameNode, and DataNode.
  • Used Apache Oozie for scheduling and managing Hadoop jobs. Knowledge of HCatalog for Hadoop-based storage management.
  • Expertise in implementing ad-hoc queries using HiveQL and in importing and exporting data with Sqoop between HDFS (Hive & HBase) and relational database systems (Oracle & Teradata).
  • Managed and scheduled Oozie jobs to remove duplicate log data files in HDFS.
  • Created, altered, and deleted Kafka topics as required, tuned performance using partitioning and bucketing of Impala tables, and converted data into relational format to load into Redshift.
  • Experienced in Apache Flume for collecting, aggregating, and moving large volumes of data from various sources such as web servers and telnet sources.
  • Extensive experience working with structured data using HiveQL and join operations, writing custom UDFs, and optimizing Hive queries.
  • Efficient in analyzing data using HiveQL and Pig Latin, partitioning existing data sets with static and dynamic partitions, and tuning data for optimal query performance.
  • Hands-on experience with NoSQL databases such as MongoDB, HBase, and Cassandra.
  • Created HBase tables to store various formats of PII data coming from different portfolios, and used file formats such as Avro and Parquet for data serialization.
  • Involved in file movements between HDFS and AWS S3 and worked extensively with S3 buckets in AWS. Used big data tooling to load large volumes of source files from S3 into Redshift.
  • Worked in an Agile development environment using the Scrum methodology, with good knowledge of Agile processes and the Scrum process in application development.
  • Monitored and troubleshot Hadoop jobs using the YARN ResourceManager, and EMR job logs using Genie and Kibana.
  • Extensive experience in requirements gathering, analysis, design, reviews, coding and code reviews, unit and integration testing, and UNIX shell scripting.
  • Expertise in MapReduce concepts such as custom file formats, custom Writables, custom partitioners, map-side joins, reduce-side joins, shuffle & sort, distributed cache, and compression codecs.
  • Excellent programming skills with experience in Java, C, and SQL programming.
  • Experienced in designing RDBMS schemas, writing SQL queries to maintain and extract data.
  • Developed applications using Java, J2EE, JDBC, Web services, HTML, JavaScript, JQuery, and CSS.
  • Deep understanding of Apache Spark and its components, including Spark Core and Spark Streaming, for better analysis and processing of data.
  • Strong analytical, problem solving, interpersonal, and time management skills.
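
To illustrate the Spark SQL data-loading pattern mentioned above, the following is a minimal Scala sketch of reading a relational table into a DataFrame over JDBC and persisting an extract to HDFS. The connection URL, table, predicate, and output path are hypothetical placeholders, not values from an actual engagement.

    import org.apache.spark.sql.SparkSession

    object JdbcIngestionSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("JdbcIngestionSketch").getOrCreate()

        // Load a relational table into a DataFrame over JDBC (all option values are placeholders)
        val orders = spark.read
          .format("jdbc")
          .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")  // hypothetical connection string
          .option("dbtable", "SALES.ORDERS")                      // hypothetical source table
          .option("user", sys.env.getOrElse("DB_USER", ""))
          .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
          .option("fetchsize", "10000")                           // fewer round trips on large tables
          .load()

        // Register the table for Spark SQL and persist a filtered extract to HDFS as Parquet
        orders.createOrReplaceTempView("orders")
        val extract = spark.sql("SELECT * FROM orders WHERE STATUS = 'OPEN'")  // hypothetical predicate
        extract.write.mode("overwrite").parquet("hdfs:///data/orders/open")    // hypothetical output path

        spark.stop()
      }
    }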

TECHNICAL SKILLS

Hadoop/Big Data: HDFS, MapReduce, Hive, Pig, HBase, Oozie, Sqoop, Flume, Spark, Scala, Kafka, Zookeeper, Impala

Database: Oracle, DB2, MySQL

Programming Languages: Java, J2EE, Struts, ODBC, JDBC, XML, CSS, JavaScript, JSP, Servlets, HTML, Python

Operating systems: Windows, Linux

Hadoop Distributions: Cloudera, Hortonworks

Build Tools: Ant, Maven and Jenkins

PROFESSIONAL EXPERIENCE

Spark Developer

Confidential

Responsibilities:

  • Worked intensively on data ingestion and integration into Hadoop from various sources such as SAP, Oracle, SQL Server, and EDW.
  • Involved in development of shell scripts and Spark SQL jobs to handle large volumes of ETL workload.
  • Used Pig to perform transformations, event joins, filtering of bot traffic, and some pre-aggregations before storing the data in an Azure database.
  • Worked on Sequence files, RC files, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement, and developed Hive scripts (HQL) to automate joins across different sources.
  • Worked on creating custom NiFi flows for batch processing; the data pipeline includes Apache Spark, Apache NiFi, and Hive.
  • Involved in file movements between HDFS and AWS S3 using NiFi.
  • Integrated Maven build and designed workflows to automate the build and deploy process.
  • Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved it in Parquet format in HDFS (see the Scala sketch after this list).
  • Used Spark SQL with the Scala API to read the Parquet data and create the tables in Hive.
  • Conducted daily stand-ups with the offshore team, updating them on applicable tasks and gathering updates for the onshore team on a day-to-day basis.
  • Coordinated with offshore and onsite teams to understand the requirements and prepared high-level and low-level design documents from the requirements specification.
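
A minimal Scala sketch of the Kafka-to-Parquet path described above, assuming the spark-streaming-kafka-0-10 integration; the broker address, consumer group, topic, and HDFS path are hypothetical placeholders.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

    object KafkaToParquetSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("KafkaToParquetSketch").getOrCreate()
        import spark.implicits._
        val ssc = new StreamingContext(spark.sparkContext, Seconds(60))

        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "broker1:9092",            // hypothetical broker
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "feed-consumers",                   // hypothetical consumer group
          "auto.offset.reset" -> "latest"
        )

        // Consume the real-time feed from a Kafka topic as a direct stream
        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

        // Convert each micro-batch RDD to a DataFrame and append it as Parquet in HDFS
        stream.foreachRDD { rdd =>
          val df = rdd.map(_.value).toDF("raw_event")
          df.write.mode("append").parquet("hdfs:///data/events/parquet")  // hypothetical path
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }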

Environment: Spark, Scala, HDFS, Hive, Pig, Kafka, AWS, Tableau, Avro, Parquet, NiFi, Linux.

Hadoop Spark Developer

Confidential, Stamford, CT

Responsibilities:

  • Designed and developed Spark jobs in Scala to enrich clickstream data, and used Spark SQL to access Hive tables from Spark for faster processing of data.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python.
  • Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase.
  • Implemented a Kafka event log producer to publish logs to a Kafka topic consumed by the ELK (Elasticsearch, Logstash, Kibana) stack for analyzing logs produced by the Hadoop cluster (see the producer sketch after this list).
  • Implemented Sqoop jobs to perform full and incremental imports of data from relational tables into Hive tables in formats such as text, Avro, Parquet, and SequenceFile.
  • Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
  • Used Spark Streaming APIs to perform necessary transformations and actions on the fly for building the common learner data model, which gets data from Kafka in near real time and persists it into Cassandra.
  • Developed shell scripts to periodically perform incremental imports of data from third-party APIs to Amazon AWS.
  • Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios.
  • Experienced in troubleshooting various Spark applications using spark-shell, spark-submit.
  • Implemented Spark RDD transformations and actions to migrate MapReduce algorithms.
  • Designed and developed Hadoop MapReduce programs and algorithms for analysis of cloud-scale classified data stored in Cassandra.
  • Worked on Sequence files, RC files, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
  • Implemented data integrity and data quality checks in Hadoop using Hive and Linux Scripts.
  • Used Impala to analyze data ingested into HBase and compute various metrics for reporting on the dashboard.
  • Used Tableau Desktop to present data from various sources so business and end users could access it easily.
  • Responsible for the analysis, design, and testing phases and for documenting technical specifications.
  • Coordinated effectively with the offshore team and managed project deliverables on time.
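
A minimal Scala sketch of the Kafka event log producer described above, using the standard kafka-clients producer API; the broker address, log path, and topic name are hypothetical placeholders.

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
    import org.apache.kafka.common.serialization.StringSerializer

    object EventLogProducerSketch {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put("bootstrap.servers", "broker1:9092")                  // hypothetical broker
        props.put("key.serializer", classOf[StringSerializer].getName)
        props.put("value.serializer", classOf[StringSerializer].getName)
        props.put("acks", "all")                                        // wait for full acknowledgement

        val producer = new KafkaProducer[String, String](props)
        try {
          // Publish each application log line to a topic that Logstash subscribes to
          scala.io.Source.fromFile("/var/log/app/app.log").getLines().foreach { line =>  // hypothetical log path
            producer.send(new ProducerRecord[String, String]("hadoop-app-logs", line))   // hypothetical topic
          }
        } finally {
          producer.close()
        }
      }
    }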

Environment: Spark, Scala, HDFS, Hive, Sqoop, Oozie, Impala, Pig, Kafka, Hbase, AWS, Tableau, Cassandra, Avro, Parquet, Python, Linux.

Hadoop Developer

Confidential, St. Louis, MO

Responsibilities:

  • Developed a financial model engine for the sales department on big data infrastructure using Scala and Spark.
  • Involved in migrating Hive queries into Spark transformations using DataFrames, Spark SQL, SQLContext, and Scala.
  • Prepared technical documentation of the POC with all the details of installation, configuration, issues faced and their resolutions, Pig scripts, Hive queries, and the process for executing them.
  • Analyzed the data using HiveQL to identify different correlations and used core Java technologies to create Hive/Pig UDFs for use in the project.
  • Experience in using and tuning relational databases (e.g. Microsoft SQL Server, Oracle, MySQL) and columnar databases (e.g. Amazon Redshift, Microsoft SQL Data Warehouse).
  • Worked with open source communities to commit and review code and drive enhancements, and with data center teams on testing and deployment. Implemented the Java HBase MapReduce paradigm to load data into an HBase database on a 4-node Hadoop cluster.
  • Designed and developed Hadoop MapReduce programs and algorithms for analysis of cloud-scale classified data stored in Cassandra.
  • Evaluated data import-export capabilities and data analysis performance of the Apache Hadoop framework.
  • Involved in installation of CDH4 Hadoop, and configuration of the cluster and ecosystem components such as Sqoop, Pig, Hive, HBase, and Oozie.
  • Worked closely with the data modelers to model new incoming data sets, and developed and maintained HiveQL, Pig Latin scripts, and MapReduce jobs written in Scala.
  • Developed Spark SQL scripts for handling different data sets and verified their performance against MR jobs.
  • Created HBase tables to store variable data formats of PII data coming from different portfolios.
  • Imported data from the local file system and RDBMS into HDFS using Sqoop, and developed Oozie workflows to automate the tasks of loading the data into HDFS.
  • Evaluated various data processing techniques available in Hadoop from various perspectives to detect aberrations in data, provide output to the BI tools, etc.
  • Cleaned up the input data, specified the schema, processed the records, wrote UDFs, and generated the output data using Pig and Hive.
  • Created MapReduce jobs for performing ETL transformations on the transactional and application-specific data sources.
  • Compared the execution times for functionality that needed joins between multiple data sets using MapReduce, Pig, and Hive.
  • Used Apache Spark with Scala for JSON data processing and developed the code to process it (a brief sketch follows this list).
  • Used real-time and batch processing to detect and discover customer buying patterns from historical data, then monitored customer activity to optimize the customer experience, leading to more sales and happier customers.
  • Compared the performance of the Hadoop based system to the existing processes used for preparing the data for analysis.
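
A minimal Scala sketch of the Spark JSON processing mentioned above, written against the SparkSession API; the input path, column names, and output table are hypothetical placeholders (the original work may have used the older SQLContext API).

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object JsonProcessingSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder
          .appName("JsonProcessingSketch")
          .enableHiveSupport()                               // assumes Hive support is available
          .getOrCreate()

        // Load semi-structured JSON records from HDFS; Spark infers the schema
        val events = spark.read.json("hdfs:///data/raw/events.json")  // hypothetical input path

        // Example transformation: aggregate records per customer before handing off to Hive
        val summary = events
          .filter(col("amount") > 0)                                  // hypothetical column names
          .groupBy(col("customer_id"))
          .agg(sum("amount").as("total_amount"), count("*").as("txn_count"))

        // Persist the result as a Hive table backed by Parquet
        summary.write.mode("overwrite").format("parquet").saveAsTable("analytics.customer_summary")  // hypothetical table

        spark.stop()
      }
    }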

Environment: CDH5, Hadoop, HDFS, MapReduce, Yarn, Hive, Oozie, Sqoop, Oracle, Linux, Shell scripting, Java, Spark, Scala, SBT, Eclipse, JD Edwards EnterpriseOne.

Hadoop Developer

Confidential - Chicago, IL

Responsibilities:

  • Worked on analyzing Hadoop cluster using different big data tools including Pig, Hive and MapReduce.
  • Responsible for the design and development of Spark SQL scripts based on functional specifications.
  • Wrote MapReduce jobs to parse the web logs stored in HDFS.
  • Imported and exported data into HDFS and Hive using Sqoop, and was responsible for loading data from UNIX file systems into HDFS.
  • Configured different topologies for the Spark cluster and deployed them on a regular basis.
  • Explored Spark for improving the performance and optimization of the existing algorithms in Hadoop.
  • Imported data from different sources such as HDFS and HBase into Spark RDDs.
  • Experienced with SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Implemented Partitioning, Dynamic Partitions, Buckets in Hive.
  • Wrote UDFs (user-defined functions) in Pig and Hive when needed, and developed Pig scripts for processing data.
  • Managed workflow and scheduling for complex MapReduce jobs using Apache Oozie.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data, and used Avro and Parquet file formats for serialization of data.
  • Responsible for Spark Streaming configuration based on the type of input source.
  • Developed a Flume ETL job for handling data from an HTTP source with HDFS as the sink, and developed a Kafka consumer API in Scala for consuming data from Kafka topics (see the consumer sketch after this list).
  • Performed optimization of MapReduce for effective usage of HDFS by compression techniques.
  • Performed validation and standardization of raw data from XML and JSON files with Pig and MapReduce.
  • Implemented complex MapReduce programs to perform joins on the Map side using Distributed Cache in Java.
  • Developed shell scripts for running Hive scripts in Hive and Impala.
  • Created Partitioned and Bucketed Hive tables in Parquet File Formats with Snappy compression and then loaded data into Parquet hive tables from Avro hive tables.
  • Involved in performance tuning of Hive from design, storage and query perspectives.
  • Developed custom classes for serialization and deserialization in Hadoop.
  • Collected JSON data from an HTTP source and developed Spark APIs that perform inserts and updates in Hive tables.
  • Created tables in HBase to handle terabytes of data and analyzed the data with Hive queries by implementing Hive and HBase integration.
  • Involved in performing the Linear Regression using Scala API and Spark.
  • Implemented daily cron jobs that automate parallel tasks of loading data into HDFS and pre-processing it with Pig, using Oozie coordinator jobs.
  • Used Jira for bug tracking and Bitbucket to check in and check out code changes.
  • Continuous monitoring and managing the Hadoop cluster through Cloudera Manager.
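
A minimal Scala sketch of the Kafka consumer described above, using the standard kafka-clients consumer API (poll with a Duration assumes kafka-clients 2.0 or later); the broker address, consumer group, and topic name are hypothetical placeholders.

    import java.time.Duration
    import java.util.Properties
    import scala.collection.JavaConverters._
    import org.apache.kafka.clients.consumer.KafkaConsumer
    import org.apache.kafka.common.serialization.StringDeserializer

    object TopicConsumerSketch {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put("bootstrap.servers", "broker1:9092")                      // hypothetical broker
        props.put("group.id", "etl-consumers")                              // hypothetical consumer group
        props.put("key.deserializer", classOf[StringDeserializer].getName)
        props.put("value.deserializer", classOf[StringDeserializer].getName)
        props.put("auto.offset.reset", "earliest")

        val consumer = new KafkaConsumer[String, String](props)
        consumer.subscribe(Seq("raw-events").asJava)                        // hypothetical topic

        try {
          while (true) {
            // Poll the topic and hand each record to downstream processing
            val records = consumer.poll(Duration.ofMillis(500))
            records.asScala.foreach { record =>
              println(s"offset=${record.offset()} value=${record.value()}")
            }
          }
        } finally {
          consumer.close()
        }
      }
    }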

Environment: Hive, Flume, Java, Maven, Impala, Spark, Oozie, Oracle, Yarn, GitHub, JUnit, Tableau, Unix, Cloudera, Sqoop, HDFS, Tomcat, Eclipse, Scala, Hbase.

Java Developer

Confidential

Responsibilities:

  • Developed the user interface screens using swing for accepting various system inputs such as contractual terms, monthly data pertaining to production, inventory and transportation.
  • Involved in designing database connections using JDBC.
  • Involved in design and development of UI using HTML, JavaScript and CSS.
  • Involved in creating tables and stored procedures in SQL for data manipulation and retrieval using SQL Server 2000, and in database modification using SQL, PL/SQL, triggers, and views in Oracle.
  • Used DispatchAction to group related actions into a single class.
  • Built the applications using the Ant tool and used Eclipse as the IDE.
  • Developed the business components used for the calculation module.
  • Involved in the logical and physical database design and implemented it by creating suitable tables, views and triggers.
  • Applied J2EE design patterns like business delegate, DAO and singleton.
  • Created the related procedures and functions used by JDBC calls in the above requirements.
  • Actively involved in testing, debugging and deployment of the application on WebLogic application server.
  • Developed test cases and performed unit testing using JUnit.
  • Involved in fixing bugs and minor enhancements for the front-end modules.

Environment: Java, HTML, JavaScript, CSS, Oracle, JDBC, Ant, SQL, Swing, and Eclipse.

Java Developer

Confidential

Responsibilities:

  • Worked as software developer for ECIL on developing a supply chain management system.
  • The application involved tracking invoices, raw materials and finished products.
  • Gathered user requirements and specifications.
  • Developed the entire application on Eclipse IDE.
  • Developed and programmed the required classes in Java to support the User account module.
  • Used HTML, JSP and JavaScript for designing the front-end user interface.
  • Implemented error checking/validation on the Java Server Pages using JavaScript.
  • Developed Servlets to handle the requests, perform server-side validation and generate result for user.
  • Used JDBC interface to connect to database.
  • Used SQL to access data from Microsoft SQL Server database.
  • Performed User Acceptance Test.
  • Deployed and tested the web application on Web Logic application server.

Environment: JDK 1.4, Servlet 2.3, JSP 1.2, JavaScript, HTML, JDBC 2.1, SQL, MySQL Server, UNIX and BEA Web Logic Application Server.
