
Sr. Big Data/Spark Developer Resume


Baltimore, MD

SUMMARY

  • A dynamic professional with over 9 years of diversified experience in Information Technology, with an emphasis on the Big Data/Hadoop ecosystem, SQL/NoSQL databases, and Java/J2EE technologies and tools, using industry-accepted methodologies and procedures.
  • Extensive experience working with various Hadoop distributions, including enterprise versions of Cloudera and Hortonworks, with good knowledge of the MapR distribution and Amazon EMR.
  • In-depth experience using Hadoop ecosystem tools such as HDFS, MapReduce, YARN, Pig, Hive, Sqoop, Spark, Storm, Kafka, Oozie, Elasticsearch, HBase, and ZooKeeper.
  • Experienced in improving the performance and optimization of existing Hadoop algorithms using Apache Spark, including SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Knowledge of installing, configuring, supporting, and managing Hadoop clusters using Apache and Cloudera (CDH3, CDH4) distributions and on Amazon Web Services (AWS).
  • Exposure to data lake implementation using Apache Spark; developed data pipelines and applied business logic using Spark.
  • Experience using different Spark components, such as Spark Streaming, to process both real-time and historical data.
  • Worked with Apache Spark, a fast and general engine for large-scale data processing, integrated with the functional programming language Scala.
  • Used Spark Streaming APIs to perform on-the-fly transformations and actions for building a common learner data model that consumes data from Kafka in near real time and persists it into Cassandra.
  • Developed streaming pipelines using Kafka, MemSQL, and Storm.
  • Experience in capturing data and importing it to HDFS using Flume and Kafka for semi-structured data and Sqoop for existing relational databases.
  • Used Scala and Python to convert Hive/SQL queries into RDD transformations in Apache Spark (a brief sketch of this pattern follows this summary).
  • Worked with RDDs and DataFrames to process data in Spark.
  • Implemented Spark RDD transformations and actions to migrate MapReduce algorithms.
  • Experience in integrating Hive queries into Spark environment using Spark SQL.
  • Expertise in performing real time analytics on big data using HBase and Cassandra.
  • Great familiarity with creating Hive tables, Hive joins, and HiveQL for querying databases, extending to complex Hive UDFs.
  • Handled importing data from RDBMSs into HDFS using Sqoop, and vice versa.
  • Experience in developing data pipelines using Pig, Sqoop, and Flume to extract data from weblogs and store it in HDFS.
  • Created User Defined Functions (UDFs) and User Defined Aggregate Functions (UDAFs) in Pig and Hive.
  • Hands-on experience in tools like Oozie and Airflow to orchestrate jobs.
  • Experience importing and exporting data between HDFS and relational database systems using Sqoop, and using Apache NiFi to copy data from the local file system to HDFS.
  • Hands-on expertise in row-key and schema design with NoSQL databases such as MongoDB 3.0.1, HBase, Cassandra, and DynamoDB (AWS).
  • Experience in performance tuning a Cassandra cluster to optimize it for writes and reads.
  • Accomplished in developing Pig Latin scripts and using Hive Query Language for data analytics.
  • Worked on different compression codecs (LZO, Snappy, GZIP) and file formats (ORC, Avro, TextFile, Parquet).
  • Experience in practical implementation of AWS technologies including IAM, Elastic Compute Cloud (EC2), ElastiCache, Simple Storage Service (S3), CloudFormation, Virtual Private Cloud (VPC), Route 53, Lambda, and EBS.
  • Strong experience working with Elastic MapReduce (EMR) and setting up environments on AWS EC2 instances.
  • Built secured AWS solutions by creating VPCs with public and private subnets.
  • Experience in enterprise search using Solr to implement full-text search with advanced text analysis, faceted search, and filtering, using features such as DisMax, Extended DisMax, and grouping.
  • Experienced in writing ad hoc queries using Cloudera Impala, including Impala analytic functions; good understanding of MPP databases such as HP Vertica.
  • Installed and configured clusters using CDH and HDP on AWS and local resources.
  • Worked on data warehousing and ETL tools like Informatica, Talend, and Pentaho.
  • Analyzed data and developed a set of Power BI dashboards for the Consumer Marketing team, consumed by executives to make business decisions.
  • Expertise working with Java/J2EE, JDBC, ODBC, JSP, Eclipse, JavaBeans, EJB, and Servlets.
  • Developed web page interfaces using JSP, Java Swing, and HTML.
  • Experience working with Spring and Hibernate frameworks for JAVA.
  • Worked on various programming languages using IDEs like Eclipse, NetBeans, and Intellij.
  • Excelled in using version control tools like PVCS, SVN, VSS and GIT.
  • Well versed in Atlassian tools such as Bamboo, Bitbucket, and JIRA, as well as GitHub.
  • Performed web-based UI development using JavaScript, jQuery, jQuery UI, CSS, HTML, HTML5, and XHTML.
  • Development experience in DBMS like Oracle, MS SQL Server, and MYSQL.
  • Developed stored procedures and queries using PL/SQL.
  • Responsible for backup, recovery, and upgrades of all PostgreSQL databases; monitored databases to optimize performance and diagnose issues.
  • Experience with best practices of web services development and integration (both REST and SOAP).
  • Experienced in using build tools such as Ant, Gradle, SBT, and Maven to build and deploy applications to servers.
  • Knowledge of Unified Modeling Language (UML) and expertise in Object-Oriented Analysis and Design (OOAD).
  • Experience in complete Software Development Life Cycle (SDLC) in both Waterfall and Agile methodologies.
  • Knowledge in Creating dashboards and data visualizations using Tableau to provide business insights.
  • Excellent communication, interpersonal, and problem-solving skills; a strong team player with a can-do attitude and the ability to communicate effectively with all levels of the organization, including technical staff, management, and customers.
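The Hive-to-Spark conversion pattern mentioned above (the bullet on converting Hive/SQL queries into RDD transformations) might look roughly like the Scala sketch below. The session setup, the sales.transactions table, and the customer_id/amount columns are hypothetical placeholders rather than details from an actual engagement.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveToSparkSketch {
  def main(args: Array[String]): Unit = {
    // Hive-enabled Spark session; assumes a Hive metastore is configured.
    val spark = SparkSession.builder()
      .appName("hive-to-spark-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // A HiveQL aggregate such as:
    //   SELECT customer_id, SUM(amount) AS total
    //   FROM sales.transactions GROUP BY customer_id
    // expressed as DataFrame transformations:
    val totals = spark.table("sales.transactions") // hypothetical table
      .groupBy("customer_id")
      .agg(sum("amount").alias("total"))

    // The same result as a pair RDD, for RDD-level transformations
    // (column types are assumed to be string and double here).
    val totalsRdd = totals.rdd.map(r => (r.getString(0), r.getDouble(1)))

    // Persist the aggregate back to Hive for downstream querying.
    totals.write.mode("overwrite").saveAsTable("sales.customer_totals")
    spark.stop()
  }
}
```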

TECHNICAL SKILLS

Big Data Technologies: HDFS, YARN, MapReduce, Hive, Pig, Impala, Sqoop, Storm, Flume, Spark, Apache Kafka, Zookeeper, Solr, Ambari, Oozie

NoSQL Databases: HBase, Cassandra, MongoDB, Redshift, Redis

Languages: C, C++, Java, Scala, Python, HTML, SQL, PL/SQL, Pig Latin, HiveQL, Unix, JavaScript, Shell Scripting

Java & J2EE Technologies: Core Java, Servlets, Hibernate, Spring, Struts, JMS, EJB

Application Servers: WebSphere, WebLogic, JBoss, Tomcat

Cloud Computing Tools: Amazon AWS (S3, EMR, EC2, Lambda, VPC, Route 53, CloudWatch), Google Cloud

Databases: Oracle 10g/11g, Microsoft SQL Server, MySQL, DB2

Build Tools: Jenkins, Maven, ANT

Business Intelligence Tools: Tableau, Splunk

Development Tools: Eclipse, IntelliJ, Microsoft SQL Studio, Toad, NetBeans

Development Methodologies: Agile, Waterfall

PROFESSIONAL EXPERIENCE

Sr. Big Data/Spark Developer

Confidential, Baltimore MD

Responsibilities:

  • Involved in analyzing business requirements and prepared detailed specifications that follow project guidelines required for project development.
  • Configured Spark Streaming to receive ongoing data from Kafka and store the DStream data in HDFS.
  • Responsible for fetching real-time data from Kafka and processing it using Spark Streaming with Scala.
  • Involved in loading data from REST endpoints to Kafka producers and transferring the data to Kafka brokers.
  • Developed business logic using the Kafka direct stream in Spark Streaming and implemented business transformations (a brief sketch of this pipeline follows this list).
  • Worked on Building and implementing real-time streaming ETL pipeline using Kafka Streams API.
  • Worked on Confluent Kafka core (APIs and Connect), Kafka Streams, and KSQL.
  • Migrated MapReduce programs into Spark transformations using Scala.
  • Experienced with SparkContext, Spark SQL, and Spark on YARN.
  • Implemented Spark SQL with various file formats such as JSON, Parquet, and ORC.
  • Implemented Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster data processing.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Worked on Spark Streaming APIs to perform transformations and actions, storing and streaming data into HDFS using Scala.
  • Good knowledge of setting batch intervals, slide intervals, and window intervals in Spark Streaming.
  • Implemented data quality checks using Spark Streaming, flagging records as passable or bad.
  • Developed traits, case classes, and related constructs in Scala.
  • Analyzed data and predicted results using machine learning and artificial intelligence in a Hadoop environment.
  • Implemented Amazon EMR for processing big data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) with Amazon Simple Storage Service (S3).
  • Deployed the project on Amazon EMR with S3 connectivity for backup storage; also worked on RESTful web services.
  • Tested performance using Elasticsearch and Kibana with APM.
  • Worked extensively on importing metadata into Hive, migrating existing tables and applications to Hive and the AWS cloud, and making the data available in Athena and Snowflake.
  • Implemented CI/CD, allowing deployment to multiple client Kubernetes/AWS environments.
  • Updated the microservice CI/CD pipeline to include dynamic test execution.
  • Used Bitbucket to check in and check out code changes.
  • Worked on Hive to support web interfacing and stored the data in Hive external tables.
  • Implemented Hive partitioning and bucketing on the collected data in HDFS.
  • Involved in data querying and summarization using Hive; created UDFs, UDAFs, and UDTFs.
  • Performed transformations, cleaning, and filtering on imported data using Hive, MapReduce, and Impala, and loaded the final data into HDFS.
  • Created and managed tables in Hive and Impala using the Hue web interface.
  • Worked extensively on data extraction, transformation, and loading from source to target systems using Informatica and Teradata utilities such as FastExport, FastLoad, MultiLoad, and TPT.
  • Worked with Teradata utilities such as BTEQ, FastLoad, and MultiLoad.
  • Strong knowledge of Power BI: importing, shaping, and transforming data for business intelligence; visualizing data; authoring reports; scheduling automated report refreshes; and creating and sharing dashboards based on reports in Power BI Desktop.
  • Worked on statistical techniques including predictive statistical models, segmentation analysis, customer profiling, survey design and analysis, and data mining.
  • Implemented Sqoop jobs to import/export large data exchanges between RDBMS and Hive platforms.
  • Extensively used ZooKeeper as a backup server and for job scheduling of Spark jobs.
  • Knowledge of the MLlib (Machine Learning Library) framework for auto-suggestions.
  • Worked on Cloudera distribution and deployed on AWS EC2 Instances.
  • Experienced in Migrating the data from CDH to AWS.
  • Experienced in loading real-time data into NoSQL databases such as Cassandra.
  • Experienced in using the DataStax Spark-Cassandra Connector to store data in Cassandra from Spark.
  • Worked in Cassandra by writing scripts and invoking them using cqlsh.
  • Well versed in data manipulation, compactions, and tombstones in Cassandra.
  • Experience in retrieving data from a Cassandra cluster by running queries in CQL (Cassandra Query Language).
  • Worked on connecting the Cassandra database to Amazon EMR for storing data in S3.
  • Well versed in using Elastic Load Balancing with Auto Scaling for EC2 servers.
  • Configured workflows involving Hadoop actions using the Oozie scheduler.
  • Used Oozie workflows and Java schedulers to manage and schedule jobs on a Hadoop cluster.
  • Involved in coding, maintaining, and administering JSP components deployed on Spring Boot and Apache Tomcat application servers.
  • Used Sqoop to import data from Relational Databases like MySQL, Oracle.
  • Continuously monitored and managed the Hadoop Cluster using Cloudera Manager.
  • Used Cloudera Manager to pull metrics on cluster features such as JVM usage and running map and reduce tasks.
  • Involved in importing structured and unstructured data into HDFS.
  • Developed Pig scripts to help perform analytics on JSON and XML data.
  • Experienced with faceted search and full-text search querying using Solr.
  • Maintained the data lake in Hadoop by building data pipelines using Sqoop, Hive, and PySpark.
  • Worked on Python for pattern matching in build logs to format warnings and errors.
  • Developed Python scripts to find vulnerabilities in SQL queries through SQL injection testing.
  • Generated PostgreSQL database reports such as financial data statements and user data.
  • Reverse-engineered and recreated PostgreSQL code to update the date dimension object and replace lost code.
  • Created Tableau visualizations for internal management (the client team) using the Simba Spark SQL connector.
  • Integrated HiveServer2 with Tableau using the Hortonworks Hive ODBC driver, enabling auto-generation of Hive queries for non-technical business users.
  • Used JIRA for tracking purposes.
  • Coordinated with the Scrum team to deliver agreed user stories on time every sprint.
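As a rough illustration of the Kafka-to-Cassandra flow described in this list, the Scala sketch below wires a Kafka direct stream into Spark Streaming and persists records with the DataStax Spark-Cassandra connector. The broker address, topic name, consumer group, keyspace, table, and CSV record layout are all hypothetical placeholders.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import com.datastax.spark.connector._ // adds saveToCassandra to RDDs

object KafkaToCassandraSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("kafka-to-cassandra-sketch")
      .set("spark.cassandra.connection.host", "127.0.0.1") // placeholder host
    val ssc = new StreamingContext(conf, Seconds(10))       // 10-second batch interval

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "localhost:9092",              // placeholder broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "learner-model",               // hypothetical consumer group
      "auto.offset.reset"  -> "latest"
    )

    // Direct stream from a hypothetical "events" topic.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Parse each record (assumed CSV: id,value), apply a simple quality check,
    // and persist to a hypothetical Cassandra table analytics.events_by_id(id, value).
    stream.map(_.value.split(","))
      .filter(_.length == 2)
      .map(fields => (fields(0), fields(1)))
      .foreachRDD(_.saveToCassandra("analytics", "events_by_id", SomeColumns("id", "value")))

    ssc.start()
    ssc.awaitTermination()
  }
}
```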

Environment: Hadoop stack, Spark SQL, KSQL, Spark Streaming, AWS S3, AWS EMR, Google Cloud, GraphX, Scala, Python, PySpark, Kafka, Hive, Pig, Sqoop, Solr, Oozie, Vertica, Impala, CI/CD, Cassandra, Cloudera, Oracle 10g, MySQL, Spring Boot, Linux.

Hadoop Developer

Confidential, Dallas Texas

Responsibilities:

  • Involved in reviewing functional and non-functional requirements (NFRs).
  • Worked on analyzing the Hadoop cluster using big data analytic tools including Pig, Hive, Oozie, ZooKeeper, Sqoop, Spark, Kafka, and Tez with the Hortonworks (HDP) distribution.
  • Contributed to building hands-on tutorials for the community on using Hortonworks Data Platform and Hortonworks DataFlow, covering categories such as Hello World, real-world use cases, and operations.
  • Collected and aggregated large volumes of weblog and unstructured data from sources such as web servers and network devices using Apache Flume, and stored the data in HDFS for analysis.
  • Implemented transformations and data quality checks using Flume Interceptor.
  • Implemented the business logic in Flume Interceptor in Java.
  • Developed Restful APIs using Spring Boot for faster development and created Swagger documents as specs for defining the REST APIs.
  • Involved in configuring Sqoop and Flume to extract/export data from IBM QRadar and MySQL.
  • Responsible for collecting and aggregating copious amounts of data from various sources and ingesting it into the Hadoop file system (HDFS) using Sqoop and Flume; the data was transformed for business use cases using Pig and Hive.
  • Developed and maintained data integration programs in RDBMS and Hadoop environment for data access and analysis.
  • Worked on importing data from various sources, performing transformations using Hive and MapReduce, and loading the data into HDFS.
  • Developed and implemented MapReduce jobs to support distributed processing using Java, Hive, and Apache Pig.
  • Executed Hive queries on Parquet tables registered in the Hive metastore to perform data analysis and meet business requirements.
  • Implemented partitioning and bucketing in Hive.
  • Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs.
  • Created Pig Latin scripts to sort, group, join, and filter the enterprise-wide data.
  • Used Spark SQL to load data into Hive tables and wrote queries to fetch data from these tables (a brief sketch follows this list).
  • Developed Pig scripts and UDF's as per the Business logic.
  • Used Pig to import semi-structured data like Avro files and perform serialization.
  • Performed secondary indexing on tables using Elasticsearch.
  • Evaluated Hortonworks NiFi (HDF 2.0) and recommended a solution to ingest data from multiple sources into HDFS and Hive using NiFi, including importing data from Linux servers with the NiFi tool.
  • Developed multiple MapReduce jobs in Java for data cleaning and processing.
  • Responsible for testing and debugging the MapReduce programs.
  • Experienced in implementing MapReduce programs to handle semi-structured and unstructured data such as JSON, XML, and Avro data files, and sequence files for log files.
  • Developed multi-hop Flume agents using the Avro sink to process web server logs and loaded them into MongoDB for further analysis.
  • Experience working with MongoDB for distributed storage and processing.
  • Responsible for using a Flume sink to drain data from the Flume channel and deposit it in MongoDB.
  • Implemented collections and the aggregation framework in MongoDB.
  • Configured Oozie workflow engine to automate Map/Reduce jobs.
  • Experienced with NiFi, which runs in a cluster and provides real-time control for managing the movement of data between any source and any destination, including mission-critical data flows with rigorous security and compliance requirements.
  • Worked with BI teams to generate reports and design ETL workflows in Tableau.
  • Collaborated with database, network, application, and BI teams to ensure data quality and availability.
  • Hands-on experience using Python scripts for data manipulation.
  • Experienced in using agile approaches including Test-Driven Development, Extreme Programming and Agile Scrum.
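A minimal sketch of the Spark SQL-to-Hive loading referenced above might look like the following Scala snippet; the logs database, web_events table, staging table, and column names are hypothetical, and the dynamic-partition settings shown are one common way to enable partitioned inserts.

```scala
import org.apache.spark.sql.SparkSession

object SparkSqlHiveLoadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-sql-hive-load-sketch")
      .enableHiveSupport() // assumes a Hive metastore is available
      .getOrCreate()

    // Hypothetical partitioned Parquet table in the Hive metastore.
    spark.sql("CREATE DATABASE IF NOT EXISTS logs")
    spark.sql(
      """CREATE TABLE IF NOT EXISTS logs.web_events (
        |  user_id STRING,
        |  url     STRING,
        |  ts      TIMESTAMP)
        |PARTITIONED BY (event_date STRING)
        |STORED AS PARQUET""".stripMargin)

    // Load staged data into partitions by event_date (dynamic partitioning).
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql(
      """INSERT OVERWRITE TABLE logs.web_events PARTITION (event_date)
        |SELECT user_id, url, ts, event_date FROM logs.web_events_staging""".stripMargin)

    // Query the loaded table back through Spark SQL.
    val daily = spark.sql(
      "SELECT event_date, COUNT(*) AS hits FROM logs.web_events GROUP BY event_date")
    daily.show(20, truncate = false)

    spark.stop()
  }
}
```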

Environment: Hortonworks HDP, Hadoop, Spark, Flume, Elasticsearch, AWS, EC2, S3, Pig, Hive, MapReduce, HDFS, NiFi, Python, Java, MongoDB, Spring Boot, ZooKeeper, Avro.

Hadoop Developer

Confidential

Responsibilities:

  • Worked on analyzing the Hadoop cluster using big data analytic tools including Pig, Hive, HBase, and MapReduce.
  • Extracted daily customer transaction data from DB2, exported it to Hive, and set up online analytical processing.
  • Installed and configured Hadoop, MapReduce, and HDFS clusters.
  • Used Flume to collect, aggregate, and store weblog data from disparate sources such as web servers, mobile devices, and network devices, and imported it into HDFS.
  • Exported analyzed data to relational databases using Sqoop for visualization and report generation.
  • Created Hive tables, loaded data, and performed data manipulations using Hive queries in MapReduce execution mode.
  • Developed MapReduce programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for further analysis.
  • Loaded the structured data resulting from MapReduce jobs into Hive tables.
  • Analyzed user request patterns and implemented performance optimizations such as skewed joins and SerDe techniques in HiveQL.
  • Identified issues on behavioral patterns and analyzed the logs using Hive queries.
  • Involved in using HCatalog to access Hive table metadata from MapReduce and Pig code.
  • Developed several REST web services that produce both XML and JSON, leveraged by both web and mobile applications.
  • Implemented business logic by writing UDFs in Java and used various UDFs from other sources.
  • Developed unit test cases for Hadoop MapReduce jobs and driver classes with the MR testing library.
  • Analyzed and transformed stored data by writing MapReduce and Pig jobs based on business requirements.
  • Developed Oozie workflows to automate loading data into HDFS and pre-processing it with Pig scripts.
  • Integrated MapReduce with HBase to import bulk data using MR programs.
  • Used Maven extensively to build JAR files of MapReduce programs and deployed them to the cluster.
  • Worked on developing Pig scripts for change data capture and delta record processing between newly arrived data and existing data in HDFS.
  • Developed data pipelines using Sqoop, Pig and Java MapReduce to ingest behavioral data into HDFS for analysis.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs that run independently based on time and data availability.
  • Worked on a POC of Talend with Hadoop and on improving the performance of Talend jobs.
  • Used Pig as an ETL tool to perform transformations, joins, and pre-aggregations before storing the data in HDFS.
  • Used SQL queries, stored procedures, user-defined functions (UDFs), and database triggers, with tools such as SQL Profiler and Database Tuning Advisor (DTA).
  • Installed a cluster, commissioned and decommissioned data nodes, performed NameNode recovery, and carried out capacity planning and slot configuration adhering to business requirements.

Environment: HDFS, Hortonworks HDP, MapReduce, Pig, Hive, Oozie, Sqoop, Flume, HBase, Talend, HiveQL, Java, Maven, Avro, Eclipse, and Shell Scripting.

Hadoop Developer

Confidential

Responsibilities:

  • Worked on tuning Hive and Pig to improve performance and solved performance issues in Hive and Pig scripts, with an understanding of joins, grouping, and aggregation and how they translate to MapReduce jobs.
  • Exported analyzed data to relational databases using Sqoop for visualization and report generation.
  • Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Wrote MapReduce jobs to discover trends in data usage by users.
  • Established custom MapReduce programs to analyze data and used Pig Latin to clean unwanted data.
  • Written multiple MapReduce programs in Java for Data Analysis.
  • Wrote MapReduce jobs using Pig Latin and the Java API.
  • Extensively worked on analyzing data using HiveQL, Pig Latin, and custom MapReduce programs.
  • Collected the logs from the physical machines and the OpenStack controller and integrated into HDFS using Flume.
  • Involved in automating the FTP process in Talend and FTPing files on UNIX.
  • Migrated ETL jobs to Pig scripts to perform transformations, joins, and some pre-aggregations before storing the data in HDFS.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Pig scripts.
  • Experience with data transformation and analysis tools such as MapReduce, Pig, and Hive to handle files in multiple formats (JSON, text, XML, binary, logs, etc.).

Environment: Hadoop, Map Reduce, Pig, Hive, Flume, Java, HDFS, ETL, JSON, XML.

Java Developer

Confidential

Responsibilities:

  • Developed rules based on different state policies using Spring MVC, iBatis ORM, Spring Web Flow, JSP, JSTL, Oracle, MS SQL, SOA, XML, XSD, JSON, AJAX, and Log4j.
  • Gathered requirements, developed, implemented, tested and deployed enterprise integration patterns (EIP) based applications using Apache Camel, JBoss Fuse.
  • Developed service classes, domain/DAOs, and controllers using JAVA/J2EE technologies.
  • Designed and developed web services using the Apache CXF framework.
  • Worked on the ActiveMQ messaging service for integration.
  • Worked with SQL queries to store and retrieve the data in MS SQL server.
  • Performed unit testing using JUnit.
  • Developed the front end using JSTL, JSP, HTML, and JavaScript.
  • Worked on continuous integration using Jenkins/Hudson.
  • Participated in all phases of development life cycle including analysis, design, development, testing, code reviews and documentations as needed.
  • Used Eclipse as the IDE, Maven for build management, JIRA for issue tracking, Confluence for documentation, Git for version control, ARC (Advanced REST Client) for endpoint testing, Crucible for code review, and SQL Developer as the DB client.

Environment: Spring Framework, Spring MVC, Spring Web Flow, JSP, JSTL, SoapUI, rating engine, IBM Rational Team, Oracle 11g, XML, JSON, AJAX, HTML, CSS, IBM WebSphere Application Server, RAD with sub-eclipse, Jenkins, Maven, SOA, SonarQube, Log4j, Java, JUnit.
