
Senior Hadoop/Spark Developer Resume


Secaucus, NJ

SUMMARY

  • 19+ years of professional IT experience, with over 8 years of Hadoop/Spark experience in ingestion, storage, querying, processing, and analysis of big data.
  • Good experience with the programming languages Scala and Java.
  • Exposure to design and development of database driven systems.
  • Good knowledge of Hadoop architectural components like Hadoop Distributed File System, Name Node, Data Node, Task Tracker, Job Tracker, and MapReduce programming.
  • Experience in developing and deploying applications using Hadoop-based components such as Hadoop MapReduce (MR1), YARN (MR2), HDFS, Hive, Pig, HBase, Flume, Sqoop, Spark (Streaming, Spark SQL), Storm, Kafka, Oozie, ZooKeeper, and Parquet.
  • Good experience with general data analytics on distributed computing clusters such as Hadoop, using Apache Spark, Impala, and Scala.
  • Experience in implementing OLAP multi-dimensional cube functionality using Azure SQL Data Warehouse.
  • Hands on experience in importing and exporting data into HDFS and Hive using Sqoop.
  • Exposure to column-oriented NoSQL databases such as HBase and Cassandra.
  • Extensive experience working with structured, semi-structured, and unstructured data by implementing complex MapReduce programs using design patterns.
  • Familiar with data architecture, including data ingestion pipeline design, Hadoop information architecture, data modeling, data mining, machine learning, and advanced data processing.
  • Hands-on experience with major big data components: Apache Kafka, Apache Spark, ZooKeeper, and Avro.
  • Experienced in extending Hive and Pig core functionality by writing custom UDFs and MapReduce scripts using Java and Python.
  • Owned the design, development, and maintenance of ongoing metrics, reports, analyses, dashboards, etc., using Tableau, to drive key business decisions and communicate key concepts to readers.
  • Worked cross-functionally between 5 different groups to help drive analytical ad hoc reporting, dashboard creation, and forecasting models.
  • Experience in rendering and delivering reports in desired formats using reporting tools such as Tableau.
  • Hands-on experience working with the Amazon Web Services (AWS) cloud and its services, such as EC2, S3, Athena, RDS, VPC, IAM, Elastic Load Balancing, Lambda, Redshift, Auto Scaling, CloudFront, CloudWatch, and other services of the AWS family.
  • Experienced in implementing unified data platforms using Kafka producers/consumers and implementing pre-processing using Storm topologies.
  • Experience using various Hadoop Distributions (Cloudera, Hortonworks, etc.) to fully implement and leverage new Hadoop features.
  • Great team player and quick learner with effective communication, motivation, and organizational skills, combined with attention to detail and business improvements.
  • Experienced in the complete SDLC, including requirements gathering, design, development, testing, and production environments.

TECHNICAL SKILLS

Hadoop/Big Data: HDFS, MapReduce, Spark, YARN, Kafka, Pig, Hive, Sqoop, Storm, Flume, Oozie, Impala, HBase, Hue, ZooKeeper.

Programming Languages: Java, PL/SQL, Pig Latin, Python, R, HiveQL, Scala, SQL

Java/J2EE & Web Technologies: J2EE, EJB, JSF, Servlets, JSP, JSTL, CSS, HTML, XHTML, XML, AngularJS, AJAX, JavaScript, jQuery.

Development Tools: Eclipse, SVN, Git, Ant, Maven, SOAP UI

Databases: Greenplum, Oracle 11g/10g/9i, Teradata, MS SQL

NoSQL Databases: Apache HBase, MongoDB

Frameworks: Struts, Hibernate, and Spring MVC.

Distributed platforms: Hortonworks, Cloudera.

Operating Systems: UNIX, Ubuntu Linux, and Windows 2000/XP/Vista/7/8

PROFESSIONAL EXPERIENCE

Confidential, Secaucus, NJ

Senior Hadoop/Spark Developer

Responsibilities:

  • Created Spark jobs to see trends in data usage by users.
  • Created real-time data streaming solutions using Apache Spark/Spark Streaming and Kafka.
  • Developed Spark scripts by using Scala shell commands as per the requirement.
  • Involved in developing a MapReduce framework that filters bad and unnecessary records.
  • Designed the Column families in Cassandra.
  • Ingested data from RDBMS, performed data transformations, and then exported the transformed data to Cassandra as per the business requirement.
  • Developed Spark code using Scala and Spark SQL for faster processing and testing.
  • Used Spark API over Hadoop YARN as execution engine for data analytics using Hive.
  • Exported the analyzed data to the relational databases using Sqoop to further visualize and generate reports for the BI team.
  • Created various kinds of reports using Power BI and Tableau based on the client's needs.
  • Setting up Kerberos principals and testing HDFS, Hive, Pig, and MapReduce access for the new users.
  • Migrated the computational code in HQL to PySpark.
  • Worked with the Spark ecosystem using Scala and Hive queries on different data formats such as text files and Parquet.
  • Worked on migrating HiveQL into Impala to minimize query response time.
  • Responsible for migrating the code base from the Hortonworks platform to Amazon EMR and evaluated Amazon ecosystem components such as Redshift.
  • Collected log data from web servers and integrated it into HDFS using Flume.
  • Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
  • Developed Python scripts to clean the raw data.
  • Developed Hive scripts in HiveQL to de-normalize and aggregate the data.
  • Implemented MapReduce counters to gather metrics of good records and bad records.
  • Developed custom UDFs in Java to extend Hive and Pig functionality.
  • Extracted data from Teradata into HDFS/databases/dashboards using Spark Streaming.
  • Loaded the Golden collection into Apache Solr using Morphline code for the business team.
  • Worked on different file formats (ORCFILE, Parquet, Avro) and different Compression Codecs (GZIP, SNAPPY, LZO).
  • Created applications using Kafka that monitor consumer lag within Apache Kafka clusters.
  • Involved in testing the APS data loading, data seeding, and data bridging strategy.
  • Used Spark Streaming APIs to perform transformations and actions on the fly for building the common learner data model, which gets data from Kafka in near real time and persists it into Cassandra (see the sketch after this list).
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
  • Maintained the Hadoop cluster on AWS EMR. Used AWS services such as EC2 and S3 for processing and storing small data sets.
  • Designed and documented REST/HTTP and SOAP APIs, including JSON data formats and an API versioning strategy.
  • Used Apache Kafka for collecting, aggregating, and moving large amounts of data from application servers.
  • Used the Hibernate ORM framework with the Spring framework for data persistence and transaction management.
  • Used the MLlib framework in Spark Streaming for auto-suggestions on predictive intelligence and maintenance.
  • Developed Python code to gather data from HBase (Cornerstone) and designed the solution to implement it using PySpark.
  • Performance analysis of Spark streaming and batch jobs by using Spark tuning parameters.
  • Worked along with the Hadoop operations team on Hadoop cluster planning, installation, maintenance, monitoring, and upgrades.
  • Used microservices for data visualization and addressed the functional challenges of planning and implementing solutions.
  • Implemented Nifi flow topologies to perform cleansing operations before moving data into HDFS.
  • Started using Apache NiFi to copy the data from local file system to HDP.
  • Used File System check (FSCK) to check the health of files in HDFS.
  • Used Amazon CloudWatch to monitor and track resources on AWS.
  • Scheduled ETL jobs in AWS Glue, developed using Lambda logic (boto3), to load data from S3 into DynamoDB and Redshift.
  • Designed a data analysis pipeline in Python, using Amazon Web Services such as S3, EC2 and Elastic MapReduce
  • Worked in Agile development environment in sprint cycles of two weeks by dividing and organizing tasks. Participated in daily scrum and other design related meetings.
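The common learner data model bullet above describes a Kafka-to-Spark-Streaming-to-Cassandra flow. Below is a minimal Scala sketch of that kind of job, not the actual production code: the broker address, topic, keyspace, table, and column names are hypothetical, and it assumes the spark-streaming-kafka-0-10 and spark-cassandra-connector packages are on the classpath.

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
import com.datastax.spark.connector._
import com.datastax.spark.connector.streaming._

object LearnerModelStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("learner-model-stream")
      .set("spark.cassandra.connection.host", "cassandra-host") // hypothetical host
    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "kafka-broker:9092", // hypothetical broker
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "learner-model",
      "auto.offset.reset" -> "latest"
    )

    // Consume the (hypothetical) learner-events topic directly from Kafka.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("learner-events"), kafkaParams)
    )

    // Parse simple comma-separated events and persist them to Cassandra in near real time.
    stream.map(_.value.split(","))
      .filter(_.length == 3)
      .map(fields => (fields(0), fields(1), fields(2).toDouble))
      .saveToCassandra("learning", "learner_model", SomeColumns("user_id", "course", "score"))

    ssc.start()
    ssc.awaitTermination()
  }
}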

Environment: Hadoop, Hive, MapReduce, Sqoop, Kafka, Spark, YARN, Pig, PySpark, Cassandra, Oozie, NiFi, Solr, Shell Scripting, HBase, Scala, AWS, Maven, Java, JUnit, Agile methodologies, Hortonworks, SOAP, Python, Teradata, MySQL.

Confidential, Mount Laurel, NJ

Senior Big Data Developer

Responsibilities:

  • Worked extensively on Scala programming for Spark development.
  • Worked extensively on designing and building scalable, flexible data solutions around batch, low-latency, search, and real-time data processing requirements using Spark, Kafka, HBase, Elasticsearch, and the Hadoop ecosystem.
  • Worked extensively with the business on requirement gathering, analysis, and high-level design.
  • Worked on the design and implementation of real time streaming ingestion using Flume, Kafka and Spark Streaming.
  • Worked extensively on enrichment/ETL in real-time streaming jobs using Spark Streaming and Spark SQL, loading the results into HBase.
  • Worked with management teams on log analysis reports and with fellow developers on identifying application issues.
  • Worked extensively on writing Kafka producers to ingest data into Kafka topics using Java 8 (a minimal producer sketch follows this list).
  • Utilized Apache Hadoop by Hortonworks to monitor and manage the Hadoop Cluster.
  • Completed data extraction, aggregation, and analysis in HDFS using PySpark and stored the required data in Hive.
  • Built an ingestion framework that ingests files from SFTP to HDFS using Apache NiFi and ingests financial data into HDFS.
  • Worked with the DBA to design reports for DB replica latency trends, analyzing the transaction logs to find the root cause of issues.
  • Processed transactional logs using Spark, applying various ETL tasks to the log data and saving it in the required formats.
  • Worked on data ingestion into Kafka and processed and stored the data using Spark Streaming.
  • Involved in tuning of Cassandra cluster by changing the parameters of Read operation, Compaction, Memory Cache, Row Cache.
  • Installed and configured Apache Hadoop and Hive/Pig Ecosystems.
  • Created MapReduce Jobs using Hive/Pig Queries.
  • Developed Pig UDFs to pre-process the data for analysis.
  • Worked on the NoSQL database HBase for storing computed results.
  • Worked extensively on the search engines Elasticsearch and Novus (in-house).
  • Worked on workflow scheduling using Oozie.
  • Developed MapReduce/Spark Python modules for machine learning & predictive analytics in Hadoop on AWS. Implemented a Python-based distributed random forest via Python streaming.
  • Worked on Continuous Integration and Automation Testing Job scheduling using Jenkins and TFS.
  • Analyzed audit logs using Splunk, querying and designing views and dashboards in Splunk.
  • Provided production support by reproducing production bugs in lower environments, fixing them, and moving the fixes to production as hotfixes.
  • Designed and created Solr Schemas to create Solr Collections.
  • Laid the guidelines for improving the code quality by implementing TDD and developed integrated test framework using JUnit, Mockito.
  • Installed and Configured Hadoop cluster using AWS for POC purposes.
  • Implemented CI/CD pipeline using Maven & Jenkins.
  • Worked with CMDB teams on deploying builds to various environments.
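The Kafka producer work called out above was done in Java 8; to keep the code samples in one language, here is a minimal sketch of the same kafka-clients producer API written in Scala. The broker address, topic name, key, and payload are hypothetical placeholders, not the actual ingestion code.

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}

object IngestProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-broker:9092") // hypothetical broker
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringSerializer")
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringSerializer")
    props.put(ProducerConfig.ACKS_CONFIG, "all") // wait for the full in-sync replica set

    val producer = new KafkaProducer[String, String](props)
    try {
      // In the real pipeline the payload would come from SFTP files / transaction logs.
      val record = new ProducerRecord[String, String]("transactions", "txn-001", """{"amount": 42.0}""")
      producer.send(record).get() // synchronous send, just for the sake of the example
    } finally {
      producer.close()
    }
  }
}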

Environment: Hadoop (HDFS/Hortonworks), Spark, Spark SQL, Spark Streaming, Scala, Kafka, Java, NiFi, Pig, Hive, Oozie, Storm, HBase, Cloudera, AWS, DataStax Cassandra, Linux, Splunk, Elasticsearch, PySpark, Kibana, TFS, CMDB, Ant, Jenkins.

Confidential, Birmingham, AL

Spark Developer

Responsibilities:

  • Worked on installing Kafka on Virtual Machine and created topics for different users
  • Actively involved in designing Hadoop ecosystem pipeline.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Involved in designing Kafka for multi data center cluster and monitoring it.
  • Responsible for importing real-time data from various sources into Kafka clusters.
  • Worked with Spark techniques such as refreshing tables, handling parallelism, and modifying Spark defaults for performance tuning.
  • Implemented Spark RDD transformations to map business analysis and applied actions on top of the transformations.
  • Involved in migrating MapReduce jobs into Spark jobs and used Spark SQL and the DataFrames API to load structured data into Spark clusters.
  • Involved in using Spark API over Hadoop YARN as execution engine for data analytics using Hive and submitted the data to BI team for generating reports, after the processing and analyzing of data in Spark SQL.
  • Performed SQL joins among Hive tables to get input for the Spark batch process (see the join sketch after this list).
  • Worked with the data science team to build statistical models with Spark MLlib and PySpark.
  • Involved in importing data from various sources into the Cassandra cluster using Sqoop.
  • Worked on creating data models for Cassandra from the existing Oracle data model.
  • Designed column families in Cassandra, ingested data from RDBMS, performed data transformations, and then exported the transformed data to Cassandra as per the business requirement.
  • Used Sqoop import functionality to load historical data present in RDBMS into HDFS.
  • Designed workflows and coordinators in Oozie to automate and parallelize Hive jobs on Apache Hadoop environment by Hortonworks (HDP 2.2)
  • Configured Hive bolts and wrote data to Hive in Hortonworks as part of a POC.
  • Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze the logs produced by the Spark cluster.
  • Developed Python scripts to start and end jobs smoothly for a UC4 workflow.
  • Developed Oozie workflow for scheduling & orchestrating the ETL process.
  • Created Data Pipelines as per the business requirements and scheduled it using Oozie Coordinators.
  • Wrote Python scripts to parse XML documents and load the data in database.
  • Worked extensively on Apache NiFi to build NiFi flows for the existing Oozie jobs to handle incremental loads, full loads, and semi-structured data, to pull data from REST APIs into Hadoop, and to automate all NiFi flows to run incrementally.
  • Created NiFi flows to trigger Spark jobs and used PutEmail processors to get notifications if there are any failures.
  • Developed shell scripts to periodically perform incremental imports of data from third-party APIs to AWS.
  • Worked extensively on importing metadata into Hive using Scala and migrated existing tables and applications to work on Hive and the AWS cloud.
  • Developed the batch scripts to fetch the data from AWS S3 storage and do required transformations in Scala using Spark framework.
  • Used version control tools such as GitHub to share code snippets among the team members.
  • Involved in daily SCRUM meetings to discuss the development/progress and was active in making scrum meetings more productive.
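As a rough illustration of the Hive-table joins feeding the Spark batch process mentioned above, the sketch below joins two Hive tables with Spark SQL and writes the result out for a downstream job. The database, table, column names, and output path are hypothetical, not the project's actual schema.

import org.apache.spark.sql.SparkSession

object HiveJoinBatch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-join-batch")
      .enableHiveSupport() // read tables registered in the Hive metastore directly
      .getOrCreate()

    // Join two Hive tables with Spark SQL and keep only the columns the batch job needs.
    val enriched = spark.sql(
      """
        |SELECT o.order_id, o.amount, c.segment
        |FROM sales.orders o
        |JOIN sales.customers c ON o.customer_id = c.customer_id
        |WHERE o.order_date >= '2018-01-01'
      """.stripMargin)

    // Hand the joined result to the downstream batch process as Parquet.
    enriched.write.mode("overwrite").parquet("/data/batch/enriched_orders")

    spark.stop()
  }
}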

Environment: Hadoop, HDFS, Hive, Python, HBase, NiFi, Spark, MySQL, Oracle 12c, Linux, Hortonworks, Oozie, MapReduce, Sqoop, Shell Scripting, Apache Kafka, Scala, AWS.

Confidential, Dallas, TX

Hadoop Developer

Responsibilities:

  • Analyzing Functional Specifications Based on Project Requirement.
  • Ingested data from various data sources into Hadoop HDFS/Hive tables using Sqoop, Flume, and Kafka.
  • Extended Hive core functionality by writing custom UDFs using Java (a UDF sketch follows this list).
  • Developing Hive Queries for the user requirement.
  • Worked on multiple POCs implementing a data lake for multiple data sources, ranging from TeamCenter, SAP, and Workday to machine logs.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Worked on the MS SQL Server PDW migration for the MSBI warehouse.
  • Planned, scheduled, and implemented Oracle to MS SQL Server migrations for AMAT in-house applications and tools.
  • Worked on the Solr search engine to index incident report data and developed dashboards in the Banana reporting tool.
  • Integrated Tableau with a Hadoop data source to build dashboards providing various insights on the organization's sales.
  • Worked on Spark for building BI reports using Tableau; Tableau was integrated with Spark using Spark SQL.
  • Developed Spark jobs using Scala and Python on top of YARN/MRv2 for interactive and batch analysis.
  • Created multi-node Hadoop and Spark clusters on AWS instances to generate terabytes of data and stored it in HDFS on AWS.
  • Developed workflows in LiveCompare to analyze SAP data and reporting.
  • Worked on Java development and support and tools support for in house applications.
  • Participated in daily scrum meetings and iterative development.
  • Implemented search functionality for searching through millions of files for logistics groups.
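The custom Hive UDFs mentioned above were written in Java; purely as an illustration of the same extension pattern, and to keep the code samples in one language, here is a minimal UDF sketch in Scala (which compiles to the same JVM bytecode Hive loads). The class name and normalization rules are hypothetical.

import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical UDF that normalizes free-text country values to two-letter codes.
class NormalizeCountry extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) return null
    val normalized = input.toString.trim.toUpperCase match {
      case "USA" | "US" | "UNITED STATES" => "US"
      case "UK" | "UNITED KINGDOM" | "GB" => "GB"
      case other => other
    }
    new Text(normalized)
  }
}

Packaged as a jar, a UDF like this would typically be registered from Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being used in HiveQL queries.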

Environment: Hadoop, Hive, Sqoop, Spark, Kafka, Scala, MS SQL Server PDW, TFS, Java.

Confidential

JAVA/J2EE Developer

Responsibilities:

  • Developed JMS API using J2EE package.
  • Made use of JavaScript for client-side validation.
  • Used Struts Framework for implementing the MVC Architecture.
  • Wrote various Struts action classes to implement the business logic.
  • Involved in the design of the project using UML Use Case Diagrams, Sequence Diagrams, Object diagrams, and Class Diagrams.
  • Understood concepts related to, and wrote code for, advanced topics such as Java I/O, serialization, and multithreading.
  • Used display tags in the presentation layer for a better look and feel of the web pages.
  • Developed Packages to validate data from Flat Files and insert into various tables in Oracle Database.
  • Provided UNIX scripting to drive automatic generation of static web pages with dynamic news content.
  • Participated in requirements analysis to figure out various inputs correlated wif their scenarios in Asset Liability Management (ALM).
  • Assisted design and development teams in identifying DB objects and their associated fields in creating forms for ALM modules.
  • Also involved in developing PL/SQL Procedures, Functions, Triggers and Packages to provide backend security and data consistency.
  • Responsible for performing Code Reviewing and Debugging.

Environment: Java, J2EE, UML, Struts, HTML, XML, CSS, Java Script, Oracle 9i, SQL*Plus, PL/SQL, MS Access, UNIX Shell Scripting.

Confidential

Programmer Analyst

Responsibilities:

  • Involved in understanding the functional specifications of the project.
  • Assisted the development team in designing the complete application architecture
  • Involved in developing JSP pages for the web tier and validating the client data using JavaScript.
  • Developed connection components using JDBC.
  • Designed Screens using HTML and images.
  • Cascading Style Sheets (CSS) were used to maintain a uniform look across different pages.
  • Involved in creating Unit Test plans and executing the same.
  • Performed document/code reviews and knowledge transfer for status updates of the ongoing project developments.
  • Deployed web modules in Tomcat web server.

Environment: Java, JSP, J2EE, Servlets, Java Beans, HTML, JavaScript, JDeveloper, Tomcat Webserver, Oracle, JDBC, XML.
