
Spark Developer Resume


Tampa, FL

SUMMARY

  • 8 years of experience in software development, including 5+ years in the Big Data / Hadoop ecosystem: HDFS, MapReduce, Hive, Pig, Spark/Scala, HBase, Zookeeper, Sqoop, and Oozie.
  • Experience in analysis, design, development, testing, and implementation of big data applications.
  • In-depth understanding of installing and configuring Pig, Hive, HBase, Flume, and Sqoop on Hadoop clusters.
  • Experience in developing MapReduce applications for analyzing Big Data in different file formats.
  • Expertise in developing Pig Latin scripts and using Hive Query Language.
  • Ability to import and export data between HDFS and Relational Database Management Systems using Sqoop.
  • Knowledge of the Spark framework for batch and real-time data processing.
  • Knowledge of the Scala programming language; worked with Spark machine learning libraries (MLlib).
  • Created models using k-NN, k-means, and random forest algorithms.
  • Worked on different use cases and POCs that involved various Hadoop components.
  • Used Spark for interactive queries, processing of streaming data, and integration with popular NoSQL databases for large volumes of transactional data.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output responses.
  • Expert knowledge of data warehousing concepts, with hands-on experience developing ETL applications in a dimensional data mart / data warehouse environment.
  • Experience in job workflow scheduling and monitoring tools like Oozie and Zookeeper.
  • Good knowledge of Zookeeper for coordinating clusters.
  • Good knowledge of Hadoop cluster architecture and cluster monitoring.
  • Performed tasks like cleaning, filtering, identifying and removing duplicates of Big data.
  • Experienced in analyzing data using Pig Latin scripts.
  • Worked with evolving Spark APIs (SparkContext, Spark SQL, Spark Streaming, Spark DataFrames) to better optimize existing algorithms.
  • Strong experience in Hadoop development and in testing big data solutions using the Cloudera and Hortonworks distributions and Amazon Web Services (AWS).
  • Well versed with job workflow scheduling and monitoring tools like Oozie.
  • Excellent analytical, problem-solving, and communication skills, with the ability to work as part of a team as well as independently.
  • Involved in daily SCRUM meetings and used SCRUM agile methodologies.
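The k-means modelling mentioned above can be sketched without Spark. Below is a minimal pure-Python version of the algorithm as an illustration; the 2-D points, cluster count, and iteration budget are assumptions for the sketch, not details from the actual MLlib pipeline.

```python
import random

def kmeans(points, k, iters=10, seed=42):
    """Plain k-means on 2-D points; a local stand-in for the Spark MLlib version."""
    random.seed(seed)
    centroids = random.sample(points, k)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k),
                      key=lambda i: (p[0] - centroids[i][0]) ** 2
                                    + (p[1] - centroids[i][1]) ** 2)
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, c in enumerate(clusters):
            if c:
                centroids[i] = (sum(p[0] for p in c) / len(c),
                                sum(p[1] for p in c) / len(c))
    return centroids
```

In Spark MLlib the same assignment/update loop runs distributed over an RDD of vectors, but the logic per iteration is the same.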

TECHNICAL SKILLS

Big Data Technologies: Hadoop ecosystem (HDFS, MapReduce, Hive, Pig, Spark, Scala, HBase, Flume, Sqoop, AWS, Zookeeper, MongoDB, and Oozie).

Programming Languages: SQL, PL/SQL, Scala.

Web Development: HTML, CSS

RDBMS: MS SQL Server 2005/ 08/ 12

Frameworks: Hive, Pig, Spark

Operating Systems: Windows 98/XP/Vista/7, Linux/Ubuntu 13.10

Other Tools: Eclipse, Visual Studio 2008/2010, jGRASP, Dreamweaver, ArgoUML, Microsoft Visio, Adobe Photoshop 8

PROFESSIONAL EXPERIENCE:

Confidential, Tampa, FL

Spark Developer

Responsibilities:

  • Worked with the business analyst team to gather requirements and client needs.
  • Involved in data acquisition, data pre-processing, and data exploration for a telecommunications project in Scala.
  • Provided subject matter expertise and hands-on delivery of Extract, Load, and Transform on Hadoop distribution platforms such as Hortonworks and Cloudera; provided a domain perspective on Hadoop distribution tool usage.
  • As part of data acquisition, used Sqoop and Flume to ingest data from the servers into Hadoop using incremental imports.
  • In the pre-processing phase, used Spark to remove records with missing data and to apply data transformations that create new features.
  • In the data exploration stage, used Hive and Impala to gain insights into the customer data.
  • Used Flume, Sqoop, Hadoop, Spark, and Oozie to build the data pipeline.
  • Involved in developing Impala scripts for extraction, transformation, loading of data into data warehouse.
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and processing.
  • Worked with Spark Streaming APIs to perform transformations and actions on the fly for building the common learner data model, which receives data from Kafka in near real time and persists it into HBase.
  • Imported and exported data between HDFS and Hive using Sqoop.
  • Experienced in defining job flows.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Implemented Spark RDD transformations to map the business analysis and applied actions on top of the transformations.
  • Worked on Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Experienced in managing and reviewing Hadoop log files.
  • Involved in configuring Hadoop ecosystem components like HBase, Hive, Pig and Sqoop.
  • Involved in writing Sqoop jobs to load data from RDBMS into HDFS and vice-versa.
  • Used Pig as an ETL tool for transformations, event joins, and some pre-aggregations before storing the data in HDFS.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
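The pre-processing step described above (dropping records with missing fields, then deriving new features) can be sketched locally in plain Python. The dictionaries stand in for Spark RDD rows, and the field names (`duration_min`, `bytes`) are illustrative assumptions, not fields from the actual telecom data.

```python
# Sample call records; the None value marks a missing field.
records = [
    {"user": "a", "duration_min": 12.0, "bytes": 3_000_000},
    {"user": "b", "duration_min": None, "bytes": 1_000_000},  # incomplete -> dropped
    {"user": "c", "duration_min": 5.0,  "bytes": 500_000},
]

def clean_and_enrich(rows):
    # filter(): keep only complete records (Spark: rdd.filter(...)).
    complete = [r for r in rows if all(v is not None for v in r.values())]
    # map(): derive a new feature from existing fields (Spark: rdd.map(...)).
    return [{**r, "bytes_per_min": r["bytes"] / r["duration_min"]} for r in complete]
```

In Spark the same filter/map pair runs lazily and in parallel across partitions; the per-record logic is identical.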

Environment: Hadoop, Hive, Pig, Storm, Cassandra, Sqoop, Impala, Oozie, Java, Python, Shell Scripting, MapReduce, Java Collections, MySQL.

Confidential, Atlanta, GA

Hadoop Developer

Responsibilities:

  • Gathered the business requirements from the Business Partners and Subject Matter Experts.
  • Worked on reading/writing data from JSON files, text files, Parquet files, and SchemaRDDs.
  • Identified data sources and created appropriate data ingestion procedures.
  • Transformed the data using Hive and Pig for the BI team to perform visual analytics according to the client requirements.
  • Wrote MapReduce jobs to parse the web logs stored in HDFS.
  • Developed services to run the MapReduce jobs on an as-needed basis.
  • Imported and exported data between HDFS, Hive, and Pig using Sqoop.
  • Populated big data customer marketing data structures.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Involved in conceptual, logical, and physical data modelling; used a star schema in designing the data warehouse.
  • Created storage with Amazon S3 for storing data. Worked on transferring data from Kafka topic into Amazon Web Services (AWS) S3 storage.
  • Knowledge of using AWS Identity and Access Management (IAM) to secure access to EC2 instances and of configuring auto-scaling groups using CloudWatch.
  • Used Scala to write code for all Spark use cases.
  • Performed complex joins on tables in Hive with various optimization techniques.
  • Implemented lateral views in conjunction with UDFs in Hive according to the client requirements.
  • The Hive tables created per requirement were internal or external tables defined with appropriate static and dynamic partitions, intended for efficiency.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.
  • Wrote MapReduce (Hadoop) programs to convert text files into Avro and load them into Hive tables.
  • Implemented workflows using the Apache Oozie framework to automate tasks.
  • Developed design documents considering all possible approaches and identifying the best of them.
  • Loaded data into HBase using bulk and non-bulk loads.
  • Developed scripts and automated end-to-end data management and synchronization between all the clusters.
  • Imported data from different sources like HDFS and HBase into Spark RDDs.
  • Worked extensively with Hive DDLs and Hive Query Language (HQL).
  • Developed Sqoop jobs to extract data from Teradata/Oracle into HDFS.
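The dynamic partitioning mentioned above can be sketched locally: Hive routes each row to a directory named after its partition value (e.g. `dt=2016-01-01/`). The grouping below mirrors that routing; the `dt` column and sample rows are illustrative assumptions, not the project's actual schema.

```python
from collections import defaultdict

def route_to_partitions(rows, part_col):
    """Group rows by a partition column, mimicking Hive dynamic partitioning:
    each distinct value of the column becomes its own HDFS directory."""
    parts = defaultdict(list)
    for row in rows:
        parts[f"{part_col}={row[part_col]}"].append(row)
    return dict(parts)

sample = [
    {"dt": "2016-01-01", "order_id": 1},
    {"dt": "2016-01-02", "order_id": 2},
    {"dt": "2016-01-01", "order_id": 3},
]
partitions = route_to_partitions(sample, "dt")
```

Static partitioning is the degenerate case where the caller names the single target directory up front instead of deriving it per row.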

Environment: Hadoop, Hive, Pig, Storm, Cassandra, Sqoop, Impala, Oozie, Java, Python, Shell Scripting, MapReduce, Java Collections, MySQL.

Confidential

Hadoop Developer

Responsibilities:

  • Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, HBase, Sqoop, Cassandra, Zookeeper, and AWS.
  • Evaluated business requirements and prepared detailed specifications that followed project guidelines required to develop written programs.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Analyzed large amounts of data sets to determine optimal way to aggregate and report on it using Map Reduce programs.
  • Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted the data from HDFS to MySQL using Sqoop.
  • Worked on large node clusters, with strong experience in multi-node Hadoop cluster setup.
  • Used Spark to create APIs in Java and Python for Big Data analysis.
  • Used Spark for Parallel data processing and better performance.
  • Responsible for building data solutions in Hadoop using Cascading frameworks.
  • Used Pig for data cleansing and for extracting the data from the web server output files to load into HDFS.
  • Developed a data pipeline using Kafka and Storm to store data in HDFS.
  • Implemented Kafka Java producers, created custom partitions, configured brokers, and implemented high-level consumers to build the data platform.
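The Kafka-to-HDFS pipeline above can be sketched in a single process: a thread-safe queue stands in for the Kafka topic, a list for the HDFS sink, and the sentinel marks end of stream. The event names are illustrative assumptions, not real topic data.

```python
import queue
import threading

topic = queue.Queue()  # stand-in for a Kafka topic
sink = []              # stand-in for the HDFS sink

def producer(events):
    # Publish each event to the "topic", then signal end of stream.
    for e in events:
        topic.put(e)
    topic.put(None)  # sentinel

def consumer():
    # Drain the "topic" into the sink until the sentinel arrives.
    while (e := topic.get()) is not None:
        sink.append(e)

t1 = threading.Thread(target=producer, args=(["e1", "e2", "e3"],))
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()
```

A real Kafka producer would additionally choose a partition per key (the "custom partitions" above), so that all events for one key stay ordered on one partition.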

Environment: Scala, Spark SQL, Spark Streaming, Spark DataFrames, Spark MLlib, HDFS, Hive, Sqoop, Kafka, Shell Scripting, Cassandra, Python, AWS, Tableau, SQL Server, GitHub, Maven.

Confidential

Java Developer

Responsibilities:

  • Involved in design and development using UML with Rational Rose.
  • Played a significant role in performance tuning and in optimizing the memory consumption of the application.
  • Developed various enhancements and features using Java 5.0.
  • Developed advanced server-side classes using networking, I/O, and multi-threading.
  • Led the issue management team and achieved significant product stability by bringing the bug count down to single digits.
  • Designed and developed various complex and advanced user interfaces using Swing.
  • Used SAX/DOM XML parsers for parsing the XML files.
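The DOM-style parsing mentioned above has a close Python analogue in the standard library's `xml.dom.minidom`. A minimal sketch follows; the `<orders>` document shape is an illustrative assumption, not the project's actual XML schema.

```python
import xml.dom.minidom as minidom

# Parse a small document into a DOM tree, then walk the element nodes,
# analogous to the Java DOM parsing described above.
doc = minidom.parseString(
    "<orders><order id='1'>book</order><order id='2'>pen</order></orders>"
)
items = [(o.getAttribute("id"), o.firstChild.data)
         for o in doc.getElementsByTagName("order")]
```

A SAX parser would instead stream the same document through event callbacks (startElement/characters/endElement), trading the in-memory tree for constant memory on large files.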

Environment: Java, Oracle, HTML, XML, SQL, J2EE, JUnit, JDBC, JSP, Tomcat, SQL Server, MongoDB, JavaScript, GitHub, SourceTree, NetBeans.
