
Big Data Engineer/Spark Developer Resume


Boston, MA

SUMMARY

  • 8+ years of experience with emphasis on designing and implementing statistically significant analytic solutions on Big Data technologies and Java-based enterprise applications.
  • 5 years of implementation and extensive working experience in a wide array of tools in the Big Data stack, including HDFS, Spark, MapReduce, Hive, Pig, Flume, Oozie, Sqoop, Kafka, Zookeeper and HBase.
  • An accomplished Hadoop/Spark developer experienced in ingestion, storage, querying, processing and analysis of big data.
  • Experienced with the Big Data frameworks Kafka, Spark, HDFS, HBase and Zookeeper.
  • Experienced with Spark Streaming, SparkSQL and Kafka for real-time data processing.
  • Excellent Programming skills at a higher level of abstraction using Scala, Java and Python.
  • Well versed with Elasticsearch to extract, transform and index the source data.
  • Extensive experience in working with various distributions of Hadoop like enterprise versions of Cloudera (CDH4/CDH5), Hortonworks and good knowledge on MAPR distribution.
  • Hands-on expertise in designing row keys and schemas for NoSQL databases like MongoDB 3.0.1, HBase, Cassandra and DynamoDB (AWS).
  • Experience in using DStreams, accumulators, broadcast variables and RDD caching for Spark Streaming.
  • Experienced in implementing scheduler using Oozie, Airflow, Crontab and Shell scripts.
  • Extensive experience in importing and exporting streaming data into HDFS using stream processing platforms like Flume and Kafka messaging system.
  • Strong experience and knowledge of real time data analytics using Spark Streaming, Kafka and Flume.
  • Exposure to Data Lake implementation using Apache Spark; developed data pipelines and applied business logic using Spark.
  • Extensively worked on Spark with Scala on a cluster for analytics; installed Spark on top of Hadoop and built advanced analytical applications using Spark with Hive and SQL/Oracle.
  • Well-versed in Spark components like Spark SQL, MLlib, Spark Streaming and GraphX.
  • Used Scala and Python to convert Hive/SQL queries into RDD transformations in Apache Spark.
  • Expertise in writing Spark RDD transformations, actions, DataFrames and case classes for the required input data, and performed the data transformations using Spark Core (a brief sketch follows this summary).
  • Experience in integrating Hive queries into Spark environment using Spark SQL.
  • Expertise in performing real time analytics on big data using HBase and Cassandra.
  • Wrote multiple MapReduce jobs using the Java API, Pig and Hive for data extraction, transformation and aggregation from multiple file formats, including Parquet, Avro, XML, JSON, CSV and ORC, with compression codecs like gzip, Snappy and LZO.
  • Experienced in working with monitoring tools to check the status of the cluster using Cloudera Manager and Ambari.
  • Experience in developing data pipelines using Pig, Sqoop and Flume to extract the data from weblogs and store it in HDFS. Accomplished in developing Pig Latin scripts and using Hive Query Language for data analytics.
  • Developed customized UDFs and UDAFs in Java to extend Pig and Hive core functionality.
  • Great familiarity with creating Hive tables, Hive joins and HQL for querying the databases, eventually leading to complex Hive UDFs.
  • Experience in writing Complex SQL queries, PL/SQL, Views, Stored procedure, Triggers, etc.
  • Experience in optimizing MapReduce algorithms using mappers, reducers, combiners and partitioners to deliver the best results for large datasets.
  • Expert in coding Teradata SQL, Teradata stored procedures, macros and triggers.
  • Experienced in migrating data from various sources using the pub-sub model in Redis and Kafka producers and consumers, and preprocessing data using Storm topologies.
  • Competent in using Chef, Puppet and Ansible configuration and automation tools. Configured and administered CI tools like Jenkins, Hudson and Bamboo for automated builds.
  • Working knowledge of Amazon's Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as a storage mechanism.
  • Knowledge in installation, configuration, supporting and managing Hadoop Clusters using Apache, Cloudera (CDH3, CDH4) distributions and on Amazon web services (AWS).
  • Built AWS secured solutions by creating VPC with public and private subnets.
  • Proficient in developing, deploying and managing Solr from development to production.
  • Experienced in using build tools like Maven, Ant and SBT, and Log4j for logging, to build and deploy applications to the server.
  • Worked on data warehousing and ETL tools like Informatica, Talend, and Pentaho.
  • Worked on the ELK stack (Elasticsearch, Logstash, Kibana) for log management.
  • Extensive experience in developing applications that perform data processing tasks using Teradata, Oracle, SQL Server and MySQL databases.
  • Worked on various programming languages using IDEs like Eclipse, NetBeans and IntelliJ.
  • Extensive experience in using MVC architecture, Struts and Hibernate for developing web applications using Java, JSPs, JavaScript, HTML, jQuery, AJAX, XML and JSON.
  • Experienced in ticketing tools like Rally and JIRA for tracking issues and bugs related to code, GitHub for code reviews, and version control tools like CVS, Git and SVN.
  • Experience in maintaining Apache Tomcat, MySQL, LDAP, LAMP and web service environments.
  • Good working experience in importing data using Sqoop and SFTP from various sources like RDBMS, Teradata, Mainframes, Oracle and Netezza to HDFS, and performing transformations on it using Hive, Pig and Spark.
  • Designed ETL workflows in Tableau and deployed data from various sources to HDFS.
  • Experience in working with different data sources like Flat files, XML files and Databases. Various domain experiences like ERP, Software quality process.
  • Experience in complete Software Development Life Cycle (SDLC) in both Waterfall and Agile methodologies.
  • Good understanding of all aspects of Testing such as Unit, Regression, Agile, White-box, Black-box.
  • Experience with best practices of web services development and integration (REST and SOAP).
  • Experience in automated scripts using Unix shell scripting to perform database activities.
  • Working experience with Linux distributions like Red Hat and CentOS.
  • Good analytical, communication and problem-solving skills, and adore learning new technical and functional skills.
  • Experienced in Agile Scrum, Waterfall and Test-Driven Development methodologies.
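
The following is a minimal sketch, in Java, of the Hive-to-Spark DataFrame pattern referenced in the summary above; the table name, columns and output path are hypothetical placeholders rather than details from any engagement listed here.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.col;

public class OrderSummaryJob {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("order-summary")
                .enableHiveSupport()          // lets Spark SQL read existing Hive tables
                .getOrCreate();

        // Equivalent of a Hive/SQL aggregation expressed as DataFrame transformations
        Dataset<Row> orders = spark.table("sales.orders");   // hypothetical Hive table
        Dataset<Row> summary = orders
                .filter(col("status").equalTo("COMPLETED"))
                .groupBy(col("customer_id"))
                .sum("amount")
                .withColumnRenamed("sum(amount)", "total_amount");

        // Land the result back in HDFS as Parquet for downstream consumers
        summary.write().mode("overwrite").parquet("hdfs:///warehouse/order_summary");
        spark.stop();
    }
}
```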

TECHNICAL SKILLS

Big Data Ecosystem: HDFS, YARN, MapReduce, Hive, Pig, Impala, Sqoop, Storm, Flume, Spark, Apache Kafka, Zookeeper, Solr, Ambari, Oozie.

NO SQL Databases: HBase, Elastic Search, Cassandra, MongoDB, Amazon DynamoDB.

Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapR and Apache.

Languages: Java, C, C++, Scala, Python, XML, XHTML, HTML, AJAX, CSS, SQL, PL/SQL, Pig Latin, HiveQL, Unix, JavaScript, Shell Scripting

Java & J2EE Technologies: Core Java, Servlets, Hibernate, Spring, Struts, JMS, EJB

Source Code Control: GitHub, Bitbucket, SVN

Application Servers: WebSphere, WebLogic, JBoss, Tomcat

Cloud Computing Tools: Amazon AWS (S3, EMR, EC2, Lambda, VPC, Route 53, CloudWatch, CloudFront), Microsoft Azure

Databases: Teradata, Oracle 10g/11g, Microsoft SQL Server, MySQL, DB2

DB languages: MySQL, PL/SQL, PostgreSQL & Oracle

Build Tools: Jenkins, Maven, ANT, Log4j

Business Intelligence Tools: Tableau, Splunk, Dynatrace

Development Tools: Eclipse, IntelliJ, Microsoft SQL Studio, NetBeans

ETL Tools: Talend, Pentaho, Informatica.

Development Methodologies: Agile, Scrum, Waterfall.

PROFESSIONAL EXPERIENCE

Confidential, Boston MA

Big Data Engineer/Spark Developer

Responsibilities:

  • Worked on analyzing Hadoop cluster using different big data analytic tools including Pig, Hive, Oozie, Zookeeper, Sqoop, Spark, Kafka and Impala with Cloudera distribution.
  • Developed Spark applications using Scala and Java, and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
  • Worked with Spark to improve performance and optimize the existing algorithms in Hadoop using Spark Context, Spark SQL, Spark MLlib, DataFrames, pair RDDs and Spark on YARN.
  • Experience in implementing Spark RDDs in Scala.
  • Configured Spark Streaming to get ongoing information from Kafka and store the stream information to HDFS (a brief sketch follows this list).
  • Used Kafka features such as partitioning, replication and its distributed commit log to maintain messaging feeds.
  • Involved in loading data from REST endpoints to Kafka producers and transferring the data to Kafka brokers.
  • Used Apache Kafka to aggregate web log data from multiple servers and make it available to downstream systems for data analysis and engineering.
  • Developed a preprocessing job using Spark DataFrames to flatten JSON documents into flat files.
  • Loaded DStream data into Spark RDDs and performed in-memory computation to generate output responses.
  • Involved in performance tuning of Spark jobs using caching and taking full advantage of the cluster environment.
  • Experienced in writing live real-time processing and core jobs using Spark Streaming with Kafka as the data pipeline system.
  • Experienced in using Spark Core for joining data to deliver reports and for detecting fraudulent activities.
  • Designed column families in Cassandra, ingested data from RDBMS, performed data transformations, and then exported the transformed data to Cassandra as per the business requirement.
  • Used the DataStax Spark-Cassandra connector to load data into Cassandra and used CQL to analyze data from Cassandra tables for quick searching, sorting and grouping.
  • Experienced in creating data models for the client's transactional logs; analyzed the data from Cassandra tables for quick searching, sorting and grouping using the Cassandra Query Language (CQL).
  • Tested cluster performance using the cassandra-stress tool to measure and improve read/write performance.
  • Good understanding of Cassandra architecture, replication strategy, gossip, snitch etc.
  • Developed Sqoop Jobs to load data from RDBMS, External Systems into HDFS and HIVE.
  • Developed Oozie coordinators to schedule Pig and Hive scripts to create Data pipelines.
  • Worked and learned a great deal from AWS Cloud services like EC2, S3, EBS, RDS and VPC.
  • Migrated an existing on-premises application to AWS. Used AWS services like EC2 and S3 for small data sets processing and storage.
  • Experienced in maintaining the Hadoop cluster on AWS EMR.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
  • Implemented Elastic Search on Hive data warehouse platform.
  • Worked with Elastic MapReduce (EMR) and set up a Hadoop environment on AWS EC2 instances.
  • Used HiveQL to analyze the partitioned and bucketed data and executed Hive queries on Parquet tables stored in Hive to perform data analysis meeting the business specification logic.
  • Experience in using Avro, Parquet, RCFile and JSON file formats, developed UDFs in Hive and Pig.
  • Worked with Log4j framework for logging debug, info & error data.
  • Developed custom Pig UDFs in Java and used UDFs from PiggyBank for sorting and preparing the data.
  • Developed custom loaders and storage classes in Pig to work on several data formats like JSON, XML and CSV, and generated bags for processing in Pig.
  • Experienced with full-text search and faceted search using Solr and implemented data querying with Solr.
  • Well versed on Data Warehousing ETL concepts using Informatica Power Center, OLAP, OLTP and AutoSys.
  • Set up and worked on Kerberos authentication principals to establish secure network communication on the cluster, and tested HDFS, Hive, Pig and MapReduce access to the cluster for users.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Used external tables in Impala for data analysis.
  • Generated various kinds of reports using Power BI and Tableau based on Client specification.
  • Used JIRA for bug tracking and for tracking code check-ins and check-outs.
  • Worked with Network, Database, Application, QA and BI teams to ensure data quality and availability.
  • Responsible for generating actionable insights from complex data to drive business results for various application teams and worked in Agile Methodology projects extensively.
  • Hands-on experience with container technologies such as Docker, embedding containers in existing CI/CD pipelines.
  • Worked with SCRUM team in delivering agreed user stories on time for every Sprint.
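
As a hedged illustration of the Kafka-to-HDFS ingestion described above, a minimal Spark Streaming job in Java might look like the sketch below; the broker address, topic, consumer group and output path are placeholders, not details from this engagement.

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class KafkaToHdfs {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("kafka-to-hdfs");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");   // hypothetical broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "weblog-consumers");        // hypothetical consumer group
        kafkaParams.put("auto.offset.reset", "latest");

        // Direct stream from a hypothetical "weblogs" topic
        JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(
                                Collections.singletonList("weblogs"), kafkaParams));

        // Persist each non-empty micro-batch to HDFS, keyed by batch time
        stream.map(ConsumerRecord::value)
              .foreachRDD((rdd, time) -> {
                  if (!rdd.isEmpty()) {
                      rdd.saveAsTextFile("hdfs:///data/weblogs/" + time.milliseconds());
                  }
              });

        jssc.start();
        jssc.awaitTermination();
    }
}
```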

Environment: Hadoop, Spark, Spark Streaming, Spark SQL, AWS EMR, MapR, HDFS, Hive, Pig, Apache Kafka, Sqoop, Java (JDK SE 6, 7), Scala, Azure, Shell scripting, Linux, MySQL, Solr, Jenkins, Eclipse, Oracle, Git, Oozie, Tableau, SOAP, NiFi, Cassandra and Agile.

Confidential, Los Angeles CA

Big data Engineer/Spark Developer

Responsibilities:

  • Performed data ingestion from various APIs holding geospatial location, weather and product-based information about the fields and the products grown in them.
  • Worked on cleaning and processing the data obtained and performing statistical analysis on it to get useful insights.
  • Explored the Spark framework for improving the performance and optimization of the existing algorithms in Hadoop using the Spark Core, Spark SQL and Spark Streaming APIs.
  • Ingested data from relational databases to HDFS on regular basis using Sqoop incremental import.
  • Extracted structured data from multiple relational data sources as DataFrames in Spark SQL (a brief sketch follows this list).
  • Involved in schema extraction from file formats like Avro, Parquet.
  • Transformed the DataFrames as per the requirements of the data science team.
  • Loaded the data into HDFS in Parquet and Avro formats with compression codecs like Snappy and LZO as per the requirement.
  • Worked on teh integration of Kafka service for stream processing.
  • Worked towards creating near real-time data streaming solutions using Spark Streaming and Kafka, persisting the data in Cassandra.
  • Involved in data modeling, ingesting data into Cassandra using CQL, Java APIs and other drivers.
  • Implemented CRUD operations using CQL on top of Cassandra file system.
  • Analyzed the transactional data in HDFS using Hive and optimized query performance by segregating the data using clustering and partitioning.
  • Developed Spark Applications for various business logics using Scala.
  • Created dynamic visualizations and displayed statistics of the data by location on maps.
  • Wrote RESTful APIs in Scala to implement the defined functionality.
  • Collaborated with other teams in the data pipeline to achieve desired goals.
  • Used Amazon DynamoDB to gather and track event-based metrics.
  • Imported data from AWS S3 into Spark RDD, Performed transformations and actions on RDDs.
  • Worked with various AWS Components such as EC2, S3, IAM, VPC, RDS, Route 53, SNS and SQS.
  • Involved in pulling the data from the Amazon S3 data lake and built Hive tables using HiveContext in Spark.
  • Involved in running Hive queries and Spark jobs on data stored in S3.
  • Ran short-term ad hoc queries and jobs on the data stored in S3 using AWS EMR.
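
A minimal sketch, assuming Spark's built-in JDBC source, of the relational extraction and Parquet/Snappy landing steps described above; the JDBC URL, table name, credentials and partition column are hypothetical, not details from this project.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class JdbcToParquet {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("jdbc-to-parquet").getOrCreate();

        // Pull one relational table into a DataFrame; connection details are placeholders.
        Dataset<Row> fieldMetrics = spark.read()
                .format("jdbc")
                .option("url", "jdbc:mysql://db-host:3306/agro")   // hypothetical source DB
                .option("dbtable", "field_metrics")                 // hypothetical table
                .option("user", "etl_user")
                .option("password", System.getenv("DB_PASSWORD"))   // supplied via environment
                .load();

        // Land it in HDFS as Snappy-compressed Parquet for downstream Hive/Spark SQL use.
        fieldMetrics.write()
                .mode(SaveMode.Overwrite)
                .option("compression", "snappy")
                .partitionBy("region")              // hypothetical partition column
                .parquet("hdfs:///data/raw/field_metrics");

        spark.stop();
    }
}
```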

Environment: Hadoop, HDFS, Azure, Hive, Kafka, Sqoop, Shell Scripting, Spark, AWS EMR, Linux (CentOS), AWS S3, Cassandra, Java, Scala, Eclipse, Maven, Agile.

Confidential, Dallas Tx

Hadoop Developer

Responsibilities:

  • Primary responsibilities include building scalable distributed data solutions using Hadoop ecosystem.
  • Experienced in designing and deployment of a Hadoop cluster and different big data analytic tools, including Pig, Hive, Flume, HBase and Sqoop.
  • Imported weblogs and unstructured data using Apache Flume and stored them in a Flume channel.
  • Loaded the CDRs from relational databases using Sqoop and from other sources to the Hadoop cluster using Flume.
  • Developed Pig Latin scripts to extract the data from the web server output files and load it into HDFS.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala, initially done using python (PySpark).
  • Developed Spark jobs using Scala on top of Yarn/MRv2 for interactive and Batch Analysis.
  • Experienced in querying data using SparkSQL on top of Spark engine for faster data sets processing.
  • Worked on implementing Spark Framework, a Java based Web Framework.
  • Worked with Apache Solr to implement indexing, wrote custom Solr query segments to optimize the search, and wrote Java code to format XML documents and upload them to the Solr server for indexing.
  • Experience in creating, dropping and altering tables at run time without blocking updates and queries using HBase and Hive.
  • Experience in working with different join patterns and implemented both Map and Reduce Side Joins.
  • Wrote Flume configuration files for importing streaming log data into HBase with Flume.
  • Imported several transactional logs from web servers with Flume to ingest the data into HDFS.
  • Continuously monitored and managed the Hadoop cluster through the Hortonworks (HDP) distribution.
  • Configured various views in Ambari such as Hive view, Tez view, and Yarn Queue manager
  • Installed and configured Pig and wrote Pig Latin scripts to convert the data from text files to Avro format.
  • Created Partitioned Hive tables and worked on them using HiveQL.
  • Designed and developed Job flows using Oozie, managing and reviewing log files.
  • Wrote and implemented Teradata FastLoad and MultiLoad scripts, DML and DDL.
  • Collected data from distributed sources into Avro models, applied transformations and standardizations, and loaded it into HBase for further processing.
  • Imported log files using Flume into HDFS and loaded them into Hive tables to query the data.
  • Used Oozie Operational Services for batch processing and scheduling workflows dynamically.
  • Worked on Ad hoc queries, Indexing, Replication, Load balancing, Aggregation in MongoDB.
  • Processed the web server logs by developing multi-hop Flume agents using the Avro sink and loaded the data into MongoDB for further analysis; also extracted files from MongoDB through Flume and processed them.
  • Expert knowledge of MongoDB NoSQL data modeling, tuning and disaster recovery backup; used it for distributed storage and processing with CRUD operations.
  • Extracted and restructured the data into MongoDB using the import and export command-line utility tools.
  • Implemented custom serializers and interceptors in Flume to mask confidential data and filter unwanted records from the event payload.
  • Installed and configured Talend ETL on single and multi-server environments.
  • Worked on the continuous integration tool Jenkins and automated JAR file builds at the end of each day.
  • Worked with Tableau and Integrated Hive, Tableau Desktop reports and published to Tableau Server.
  • Developed a data pipeline using Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Developed MapReduce programs in Java for parsing the raw data and populating staging tables (a brief sketch follows this list).
  • Developed Unix shell scripts to load files into HDFS from Linux File System.
  • Collaborated with Database, Network, application and BI teams to ensure data quality and availability.
  • Supported in setting up QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop.
  • Experienced in designing RESTful services using Java-based APIs like Jersey.
  • Followed Agile methodology for the entire project. Experienced in Extreme Programming, Test-Driven Development and Agile Scrum.
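
A brief sketch of the kind of Java MapReduce parsing job described above; the pipe-delimited record layout, field positions and paths are assumptions for illustration, not the actual CDR or log format used on this project.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RawLogParser {

    // Map-only job: keep well-formed pipe-delimited records and re-emit them as tab-delimited staging rows.
    public static class ParseMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] parts = value.toString().split("\\|");
            if (parts.length >= 3) {                        // hypothetical minimal field count
                context.write(new Text(parts[0].trim()),    // record id
                              new Text(parts[1].trim() + "\t" + parts[2].trim()));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "raw-log-parser");
        job.setJarByClass(RawLogParser.class);
        job.setMapperClass(ParseMapper.class);
        job.setNumReduceTasks(0);                           // map-only: parsed rows go straight to staging
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```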

Environment: Hadoop, HDFS, Hive, MapReduce, Hortonworks, AWS EC2, Solr, Tez, MySQL, Oracle, Sqoop, Flume, Spark, SQL, Talend, Python, PySpark, YARN, Pig, Oozie, Linux, Scala, Ab Initio, Tableau, Maven, Jenkins, Java (JDK 1.6), Agile.

Confidential, Sunnyvale, CA

Java Developer

Responsibilities:

  • Understood the requirements and the technical aspects and architecture of the existing system.
  • Helped design application development using the Spring MVC framework, with front-end interactive page design using HTML, JSP, JSTL, CSS, JavaScript, jQuery and AJAX.
  • Utilized various JavaScript and jQuery libraries and AJAX for form validation and other interactive features.
  • Involved in writing SQL queries for fetching data from Oracle database.
  • Developed multi-tiered web - application using J2EE standards.
  • Designed and developed Web Services to store and retrieve user profile information from database.
  • Used Apache Axis to develop web services and SOAP protocol for web services communication.
  • Used Spring DAO concept to interact with Database using JDBC template and Hibernate template.
  • Well experienced in deploying and configuring applications onto application servers like WebLogic, WebSphere and Apache Tomcat.
  • Created RESTful web services interfaces to a Java-based runtime engine and accounts (a brief sketch follows this list).
  • Performed thorough code walkthroughs with team members to check functional coverage and coding standards.
  • Actively involved in writing SQL using SQL query builder.
  • Followed Agile methodology and Scrum to deliver the product with cross-functional skills.
  • Used JUnit to test persistence and service tiers. Involved in unit test case preparation.
  • Hands-on experience in software configuration/change control processes and tools like Subversion (SVN), Git, CVS and ClearCase.
  • Actively used the defect-tracking tool JIRA to create and track defects during the QA phase of the project.
  • Worked closely with team members on and offshore in development when having dependencies.
  • Involved in sprint planning, code reviews and daily standup meetings to discuss the progress of the application.
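
A brief sketch of a Spring MVC RESTful endpoint of the kind described above; UserProfile, UserProfileService and the URL mapping are hypothetical stand-ins for the actual domain classes, which are not shown in this resume.

```java
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/api/profiles")
public class UserProfileController {

    private final UserProfileService profileService;

    public UserProfileController(UserProfileService profileService) {
        this.profileService = profileService;
    }

    @GetMapping("/{id}")
    public UserProfile getProfile(@PathVariable("id") long id) {
        // Delegates to a service/DAO layer (e.g. Spring DAO + Hibernate) to fetch from the database
        return profileService.findById(id);
    }
}

// Minimal stand-in domain types so the sketch is self-contained.
interface UserProfileService {
    UserProfile findById(long id);
}

class UserProfile {
    private final long id;
    private final String displayName;

    UserProfile(long id, String displayName) {
        this.id = id;
        this.displayName = displayName;
    }

    public long getId() { return id; }
    public String getDisplayName() { return displayName; }
}
```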

Environment: Java/J2EE, HTML, Ajax, Servlets, JSP, SQL, JavaScript, CSS, XML, SOAP, Windows, Unix, Tomcat Server, Spring MVC, Hibernate, JDBC, Agile, Git, SVN.
