
Spark/Big Data Developer Resume


Dallas, TX

SUMMARY

  • 7+ years of professional experience in software design, analysis, and development.
  • 5+ years of experience using Hadoop ecosystem tools and technologies such as Flume, Kafka, Hadoop, HBase, Oozie, Sqoop, Hive, Pig, MapReduce, ZooKeeper, Storm, and YARN.
  • 2 years of experience with Apache Spark, PySpark, and Spark Streaming.
  • Hands-on experience with Spark using SparkContext, Spark SQL, DataFrames, pair RDDs, and YARN; developed Spark code in Python and used Spark SQL/Streaming for faster processing and testing (see the PySpark sketch after this list).
  • Experience in writing, testing, and implementing SQL scripts and macros.
  • Extensive experience implementing transformations, shell scripts, and stored procedures, and executing test plans to load data successfully into targets.
  • Solid hands-on experience in performance tuning of sources, targets, mappings, transformations, and sessions.
  • Performed all phases of development, including extracting, transforming, and loading data from various sources into data warehouses and data marts using PowerCenter (Repository Manager, Designer, Workflow Manager, and Workflow Monitor).
  • Experience developing big data solutions on Cloudera, Hortonworks, Google Cloud Platform (GCP), and Amazon Web Services (AWS).
  • Excellent knowledge of and experience in creating source-to-target mappings, edit rules and validations, transformations, and business rules.
  • Strong working experience in the analysis, design, development, implementation, and testing of data warehousing solutions, including data conversion, extraction, transformation, and loading (ETL).
  • Experience using Kafka clusters for data integration and secure cloud platforms such as AWS; performed data summarization, querying, and analysis of large datasets stored on HDFS and Amazon S3 using Hive Query Language (HiveQL).
  • Expertise in Java/J2EE technologies such as Core Java, Spring, Hibernate, JDBC, JSON, HTML, Struts, Servlets, JSP, JBoss, and JavaScript.
  • Experience writing MapReduce jobs in Java and Python.
  • Excellent knowledge and understanding of NoSQL databases such as HBase, MongoDB, and Cassandra.
  • Expert knowledge of real-time data analytics using Apache Spark and Apache Kafka.
  • Responsibilities have included requirements analysis, scope definition, solution design, development, project execution, and go-live/release activities; also handled production support projects.
  • Strong knowledge of the Software Development Life Cycle (SDLC), including requirements analysis, design, development, testing, and implementation; provided end-user training and support.
  • Hands-on experience with Git, SVN, Perforce, and JIRA.
  • Hands-on experience with Linux/UNIX shell scripts.
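
A minimal PySpark sketch of the DataFrame/Spark SQL pattern referenced in the summary above. The application name, file path, and column names are hypothetical placeholders, not details from any project listed here, and the sketch uses the SparkSession entry point for brevity:

    from pyspark.sql import SparkSession

    # Entry point for DataFrame and Spark SQL work.
    spark = SparkSession.builder.appName("example").getOrCreate()

    # Load a CSV into a DataFrame; the path and schema options are illustrative.
    events = spark.read.csv("/data/events.csv", header=True, inferSchema=True)

    # Expose the DataFrame to Spark SQL and query it.
    events.createOrReplaceTempView("events")
    top_users = spark.sql(
        "SELECT user_id, COUNT(*) AS cnt FROM events "
        "GROUP BY user_id ORDER BY cnt DESC LIMIT 10")
    top_users.show()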

TECHNICAL SKILLS

Specialties: Data warehousing/ETL/BI Concepts, Software Development methodologies, Data Modeling, Data Munging, Data Processing.

Hadoop Ecosystems: Sqoop, Hive, Kafka, Oozie, Spark, HBase, Flume

Languages: SQL, PL/SQL, C, Python, UNIX shell scripting, JavaScript, Java 8, PHP

Databases: Teradata, Oracle 9i, MongoDB, Cassandra, MySQL

Database Tools: Teradata SQL Assistant, TOAD, SQL*Plus, Oracle Enterprise Manager, Agility Workbench, MySQL Workbench; Python packages: NumPy, SciPy, pandas, scikit-learn; IDEs: Sublime Text, PyCharm, Eclipse; text editors: vi, Emacs, nano

Version Controllers: GIT, SVN, Perforce

Operating Systems: UNIX, Linux, Windows, Mac OS X.

Administrative Tools: JIRA

PROFESSIONAL EXPERIENCE

Confidential - Dallas, TX

Spark/Big Data Developer

Responsibilities:

  • Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded data into HDFS using Sqoop.
  • Involved in writing MapReduce jobs for ETL and aggregation.
  • Worked extensively with Kafka, Spark Streaming, Spark SQL, PySpark, and Hadoop.
  • Developed multiple POCs in PySpark, deployed them on a YARN cluster, and compared the performance of Spark SQL with Hive/Impala and SQL/Teradata.
  • Developed Pig scripts for source-data validation and transformation; used Oozie to automate data loading into HDFS and Pig to pre-process the data.
  • Worked with various data formats, including Avro, SequenceFile, JSON, MapFile, Parquet, and XML.
  • Cleansed weblog data with automated Python scripts using regular expressions; loaded, analyzed, and extracted data to and from MongoDB with Python.
  • Converted Hive and SQL queries into Spark using Spark SQL and RDDs with PySpark.
  • Applied object-oriented programming in Python.
  • Experienced in working with the MongoDB query language.
  • Used Sqoop to import customer information from MongoDB to HDFS for data processing.
  • Created Hive tables, loaded them with data, and performed HiveQL operations to process it; created partitions and used bucketing to improve performance, and applied user-defined functions (UDFs) for business requirements (see the HiveQL sketch after this list).
  • Implemented reservoir sampling and an event-detection framework in Spark using Python.
  • Built streaming data-processing pipelines for real-time analytics using Spark Streaming and Kafka (producers and consumers), persisting the analyzed data to MongoDB and HBase (see the streaming sketch after this list).
  • Worked on Spark data sources, DataFrames, Spark SQL, and Spark Streaming using Python.
  • Created Google Cloud Storage buckets for data storage and transferred data from Kafka topics into them.
  • Strong experience working with Hadoop on the Cloudera Data Platform, running services through Cloudera Manager.
  • Designed and published visually rich, intuitive Tableau dashboards and Crystal Reports for executive decision-making.
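
A hedged HiveQL sketch of the partitioning and bucketing pattern described above, run here through PySpark with Hive support. The table names, columns, bucket count, and staging table are illustrative assumptions:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # Partition by date and bucket by customer_id so common queries prune
    # partitions and join on pre-clustered data (names are hypothetical).
    spark.sql("""
        CREATE TABLE IF NOT EXISTS orders (
            order_id BIGINT,
            customer_id BIGINT,
            amount DOUBLE
        )
        PARTITIONED BY (order_date STRING)
        CLUSTERED BY (customer_id) INTO 16 BUCKETS
        STORED AS ORC
    """)

    # Dynamic-partition insert from a staging table (also hypothetical).
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""
        INSERT INTO TABLE orders PARTITION (order_date)
        SELECT order_id, customer_id, amount, order_date FROM staging_orders
    """)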
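
And a minimal sketch of the Kafka-to-Spark-Streaming pipeline above, showing the MongoDB sink and using the DStream API that matches the Spark 1.6/Kafka versions in the environment line. Broker, topic, database, and collection names are placeholders:

    import json
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils
    from pymongo import MongoClient

    sc = SparkContext(appName="kafka-to-mongo")
    ssc = StreamingContext(sc, 10)  # 10-second micro-batches

    # Direct stream from a Kafka topic; broker and topic are assumptions.
    stream = KafkaUtils.createDirectStream(
        ssc, ["weblogs"], {"metadata.broker.list": "broker1:9092"})

    def save_partition(records):
        # One MongoClient per partition, since connections cannot be
        # serialized and shipped to executors.
        client = MongoClient("mongodb://mongo-host:27017")
        collection = client.analytics.events  # database/collection assumed
        for _key, value in records:
            collection.insert_one(json.loads(value))
        client.close()

    stream.foreachRDD(lambda rdd: rdd.foreachPartition(save_partition))

    ssc.start()
    ssc.awaitTermination()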

Environment: Hadoop, Cloudera, GCP, Dataproc, MapReduce, Hive, MongoDB, Pig, Sqoop 1.6, Python, Apache Storm, Spark 1.6.2, MLlib, Kafka 0.10.1.1, ZooKeeper, HBase, HiveQL, Impala, Oozie, Solr, Java, ETL, Tableau, UNIX.

Confidential - Aurora, CO

Spark/Big Data Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Used Spark Streaming APIs to perform the necessary transformations and actions on the fly, building a common learner data model that consumes data from Kafka in near real time and persists it into Cassandra.
  • Configured, deployed, and maintained multi-node development and test Kafka clusters.
  • Developed Spark scripts using PySpark shell commands per requirements.
  • Used the Spark API over Cloudera Hadoop YARN and Ranger to perform analytics on data in Hive.
  • Developed Scala scripts and UDFs using DataFrames/SQL/Datasets and RDD/MapReduce in Spark 1.6 for data aggregation and queries, writing data back into the OLTP system through Sqoop.
  • Tuned Spark application performance by setting the right batch interval, the correct level of parallelism, and appropriate memory configuration.
  • Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Implemented the ELK stack (Elasticsearch, Logstash, Kibana) to collect and analyze the logs produced by the Spark cluster.
  • Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark.
  • Handled large datasets using partitioning, Spark's in-memory capabilities, broadcasts, and effective, efficient joins and transformations during the ingestion process itself (see the broadcast-join sketch after this list).
  • Designed, developed, and maintained data-integration programs in Hadoop and RDBMS environments, working with both traditional and non-traditional source systems as well as RDBMS and NoSQL data stores for data access and analysis.
  • Worked on a POC comparing the processing time of Impala against Apache Hive for batch applications, leading to Impala's adoption in the project.
  • Worked extensively with Sqoop for importing metadata from Oracle.
  • Analyzed existing SQL scripts and designed their PySpark implementations (see the JDBC sketch after this list).
  • Developed a data pipeline on Amazon AWS to extract weblog data and store it in HDFS.
  • Involved in creating Hive tables and loading and analyzing data using Hive queries.
  • Developed Hive queries to process the data and generate data cubes for visualization.
  • Implemented schema extraction for Parquet and Avro file formats in Hive.
  • Implemented partitioning, dynamic partitions, and bucketing in Hive.
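
A short sketch of the broadcast-join tuning mentioned above; the table names and output path are hypothetical. Broadcasting the small dimension table ships it to every executor and avoids shuffling the large fact table:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    facts = spark.table("clickstream")   # large fact table (illustrative)
    dims = spark.table("user_dim")       # small dimension table (illustrative)

    # Hint Spark to broadcast the small side of the join.
    enriched = facts.join(broadcast(dims), on="user_id", how="left")
    enriched.write.mode("overwrite").parquet("/warehouse/clickstream_enriched")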
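
And a hedged sketch of rewriting an existing SQL script in PySpark, with the source read from Oracle over JDBC. The connection string, credentials, and table and column names are assumptions for illustration, and the Oracle JDBC driver must be on the classpath:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    # Read the source table over JDBC (connection details are placeholders).
    orders = (spark.read.format("jdbc")
              .option("url", "jdbc:oracle:thin:@db-host:1521/ORCL")
              .option("dbtable", "SALES.ORDERS")
              .option("user", "etl_user")
              .option("password", "...")
              .load())

    # DataFrame equivalent of:
    #   SELECT region, SUM(amount) AS total_amount FROM orders GROUP BY region
    summary = orders.groupBy("region").agg(F.sum("amount").alias("total_amount"))
    summary.write.mode("overwrite").saveAsTable("mart.region_sales")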

Environment: Hadoop YARN, Spark Core, Spark Streaming, Spark SQL, Java, Python, Kafka, Hive, Sqoop, Amazon AWS, Elasticsearch, Impala, Cassandra, Talend, Oozie, Cloudera, Oracle 12c, Linux.

Confidential - Philadelphia, PA

Hadoop Developer

Responsibilities:

  • Involved in implementing the Hadoop cluster and Hive for the development and test environments.
  • Installed and configured Hadoop MapReduce and HDFS, and developed MapReduce jobs in Java for data preprocessing.
  • Implemented a POC on the Hadoop stack and various big data analytics tools, including exports and imports between relational databases and HDFS.
  • Collected and aggregated large amounts of log data using Apache Flume, staging the data in HDFS for further analysis.
  • Wrote custom UDFs as well as custom input and output formats.
  • Created Hive tables, loaded data, and generated ad-hoc reports from the table data.
  • Demonstrated a strong understanding of Hadoop architecture, including HDFS, MapReduce, Hive, Pig, Sqoop, and Oozie.
  • Gathered business requirements in meetings for a successful implementation and proof-of-concept (POC) of the Hadoop cluster.
  • Loaded existing data warehouse data from an Oracle database into the Hadoop Distributed File System (HDFS).
  • Developed MapReduce programs in Java to search production logs for application issues and to measure page-download performance in web analytics logs (a Hadoop Streaming sketch of this pattern follows this list).
  • Developed Oozie workflows to automate Sqoop, Hive, and Pig scripts.
  • Handled administration issues for HBase and other NoSQL databases such as Cassandra and MongoDB.
  • Actively interacted with cross-functional teams (web, UNIX, and DBA) for a successful Hadoop implementation.
  • Delivered user training on the Hadoop system for cross-functional teams.
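
The log-search jobs above were written in Java; as a hedged illustration of the same map/reduce pattern in Python (the resume's other MapReduce language), here is a Hadoop Streaming mapper and reducer that count matching log lines. The "ERROR" pattern and file names are assumptions:

    # mapper.py: emit a count of 1 for every log line matching the pattern.
    import sys

    for line in sys.stdin:
        if "ERROR" in line:  # illustrative search pattern
            print("error\t1")

    # reducer.py: sum the counts emitted by the mapper.
    import sys

    total = 0
    for line in sys.stdin:
        _key, count = line.rstrip("\n").split("\t")
        total += int(count)
    print("error\t%d" % total)

A job like this is submitted with the standard streaming jar, along the lines of: hadoop jar hadoop-streaming.jar -input /logs -output /out -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py (paths illustrative).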

Environment: Java, Eclipse, Hadoop, Hive, HBase, Oozie, Linux, MapReduce, HDFS, Shell Scripting, MySQL, Cassandra, MongoDB.

Confidential

Software Engineer

Responsibilities:

  • Used Eclipse for writing JSPs, Struts code, and other Java code snippets.
  • Optimized the performance of a critical web application, developing and integrating several modules to reduce defects and improve efficiency using Java, Spring, Hibernate, and REST web services.
  • Implemented the business logic using the Spring MVC framework with Hibernate for CRUD operations.
  • Implemented Spring REST web services to invoke backend and other systems to retrieve customer information.
  • Used Hibernate to support the backend systems.
  • Followed and coded algorithms specified by senior developers.
  • Developed procedures and functions for projects built in Java and HTML.
  • Implemented SQL queries and stored procedures for the MySQL database.
  • Used JDBC for data retrieval from the database for various inquiries.
  • Wrote PL/SQL stored procedures, functions, triggers, and complex SQL queries.
  • Deployed the web application on Apache Tomcat.
  • Experienced with software development life cycle processes such as Agile.
  • Developed triggers to ensure queries worked correctly and that proper error messages were generated on errors.
  • Reviewed developed code to check for errors.

Environment: Java, J2EE, Servlet, JSP, MySQL, Struts
