Hadoop Spark Developer Resume

Atlanta, GA

PROFESSIONAL SUMMARY:

  • 8+ years of experience in IT, including analysis, design, coding, testing, and implementation in Java and Big Data technologies, working with Confidential Hadoop ecosystem components
  • Extensive participation in analysis activities, assisting in troubleshooting architectural problems and providing technical solutions to meet business requirements
  • Extensive experience with major Hadoop ecosystem components such as Hadoop MapReduce, HDFS, Hive, Pig, HBase, ZooKeeper, Sqoop, Oozie, Flume, Storm, YARN, Spark, and Scala
  • Experienced in real-time streaming applications using Kafka, Flume, Storm, and Spark Streaming
  • Captured data from existing databases that provide SQL interfaces using Sqoop
  • Expertise in Kerberos Security Implementation
  • Experience in analyzing data using HiveQL, Pig Latin, HBase, and custom MapReduce programs
  • Good knowledge of creating event-processing data pipelines using Flume, Kafka, and Storm
  • Good understanding of and experience with Hadoop distributions such as Cloudera and Hortonworks
  • Used Spark Streaming APIs to perform transformations and actions on the fly for building a common learner data model that receives data in near real time
  • Hands-on working experience in Linux environments with Confidential Tomcat; used UML to design class diagrams for object-oriented analysis and design
  • Expertise in data transformation and analysis using Spark, Pig, and Hive
  • Good understanding and knowledge of NoSQL databases like HBase, Cassandra, and MongoDB
  • Experience with big data ETL and query tools such as Pig Latin and HiveQL
  • Expertise in writing Hadoop jobs for analyzing data using Spark, Hive, Pig, and MapReduce
  • Good understanding of HDFS Designs, Daemons, HDFS High Availability (HA).
  • Hands-on experience with Avro and Parquet file formats, dynamic partitions, and bucketing for best practices and performance improvement
  • Experience in database design using stored procedures, functions, and triggers, with strong experience writing complex queries for DB2 and SQL Server
  • Developed Spark SQL programs for handling different data sets with better performance (see the HiveContext sketch after this list)
  • Good knowledge of creating event-processing pipelines using Spark Streaming
  • Experience in building web services using both SOAP and RESTful services in Java.
  • Experience in configuration, deployment, and management of enterprise applications on application servers like WebSphere and JBoss and web servers like Confidential Tomcat
  • Experience in performing unit testing using JUnit and TestNG
  • Extensive experience in documenting requirements, functional specifications, technical specifications.
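
For context on the Spark SQL work called out above, here is a minimal, hypothetical sketch (not drawn from any actual project) of analytics over a Hive table through Spark 1.x's HiveContext; the table and column names (customer_txns, merchant_code, amount) are illustrative assumptions:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveAnalyticsSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HiveAnalyticsSketch"))
    // HiveContext gives Spark SQL access to the Hive metastore (Spark 1.x API)
    val hc = new HiveContext(sc)

    // Hypothetical table/columns: total transaction amount per merchant
    val totals = hc.sql(
      """SELECT merchant_code, SUM(amount) AS total_amount
        |FROM customer_txns
        |GROUP BY merchant_code""".stripMargin)

    totals.show(20)
    sc.stop()
  }
}
```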

TECHNICAL SKILLS:

  • Hadoop/Big Data: HDFS, MapReduce, Spark Core, Spark Streaming, Spark SQL, Hive, Tez, Pig, Sqoop, Flume, Kafka, Oozie, NiFi, ZooKeeper, Docker
  • AWS Components: EC2, S3, RDS, Redshift, EMR, DynamoDB, Lambda, SNS, SQS
  • NoSQL Databases: HBase, Cassandra, MongoDB
  • Languages: C, C++, Java, Scala, J2EE, Python, PL/SQL, Pig Latin, HiveQL, UNIX shell scripts
  • Java/J2EE Technologies: Applets, Swing, JDBC, JNDI, JSON, JSTL, RMI, JMS, JavaScript, JSP, Servlets, EJB, JSF, jQuery
  • Frameworks: MVC, Struts, Spring, Hibernate
  • Operating Systems: Sun Solaris, HP-UX, Red Hat Linux, Ubuntu Linux, and Windows XP/Vista/7/8
  • Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP
  • Web/Application servers: Confidential Tomcat, WebLogic, JBoss
  • Databases: Oracle 9i/10g/11g, DB2, SQL Server, MySQL, Teradata
  • Tools and IDEs: Eclipse, NetBeans, Toad, Maven, SBT, ANT, Hudson, Sonar, JDeveloper, Assent PMD, DbVisualizer
  • Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP

PROFESSIONAL EXPERIENCE:

Confidential, Atlanta, GA

Hadoop Spark Developer

Responsibilities:

  • Imported data from sources including Kafka, Sqoop, Megatron, Janus, and PETL into the legacy Teradata databases and Hadoop
  • Worked on importing and exporting data into HDFS and Hive using Sqoop, and built analytics on Hive tables using HiveContext
  • Responsible for loading customer data from SAS to MSSQL 2016, performing data massaging, mining, and cleansing, then exporting it to HDFS and Hive using Sqoop
  • Wrote Pig scripts to process credit card and debit card transactions for active customers, joining data from HDFS and Hive via HCatalog for various merchants
  • Wrote Python UDFs, run through Pig streaming, to process regular expressions and return valid merchant codes and names
  • Wrote Java UDFs to convert card names to upper case and normalize dates into a suitable format in Pig and Hive (a Scala sketch of a comparable Hive UDF appears after this list)
  • Responsible for creating a data pipeline using Kafka, Flume, and Spark Streaming over a Twitter source to collect sentiment tweets from Target customers about their reviews
  • Implemented Kerberos security
  • Handled Hive and Spark tuning end to end, including partitioning and bucketing of ORC tables and executor/driver memory sizing (see the ORC tuning sketch after this list)
  • Wrote Hive UDFs to extract data from staging tables
  • Analyzed web log data using HiveQL after ingesting it through Flume
  • Executed queries using Hive and developed Map-Reduce jobs to analyze data.
  • Worked on analyzing the Hadoop cluster and various big data analytic tools, including Pig, the HBase NoSQL database, and Sqoop
  • Experienced in performance tuning of Spark applications: setting the right batch interval, choosing the correct level of parallelism, and tuning memory
  • Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts, and effective and efficient joins and transformations during the ingestion process itself
  • Created Hive tables, loaded data into them using Sqoop, and worked on them using HiveQL
  • Responsible for developing custom UDFs, UDAFs and UDTFs in Pig and Hive.
  • Optimized Hive queries using various file formats such as JSON, Avro, ORC, and Parquet
  • Used Spark Streaming APIs to perform transformations and actions on the fly for building a common word2vec data model, which receives data from Kafka in near real time and persists it into Cassandra (see the streaming sketch after this list)
  • Operated the cluster on AWS using EC2, EMR, S3, and Elasticsearch
  • Configured, deployed, and maintained multi-node Dev and Test Kafka clusters
  • Developed Spark scripts in the Scala shell as per requirements
  • Migrated existing MapReduce programs to Spark using Scala and Python
  • Developed Scala scripts and UDFs using DataFrames in Spark 1.6 for data aggregation and queries, writing data back into the OLTP system through Sqoop (see the DataFrame aggregation sketch after this list)
  • Implemented Spark SQL to connect to Hive, read the data, and distribute processing for high scalability
  • Analyzed tweet JSON data using the Hive SerDe API to deserialize it into a readable format (see the SerDe sketch after this list)
  • Processed application web logs using Flume and loaded them into Hive for analysis
  • Implemented RESTful Web Services to interact with Oracle/Cassandra to store/retrieve the data.
  • Generated detailed design documentation for the source-to-target transformations.
  • Involved in planning process of iterations under the Agile Scrum methodology.
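
As referenced above, a minimal Scala sketch of a Hive UDF comparable to the Java UDFs described; the class name, jar name, and function name are hypothetical:

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical sketch of a Hive UDF that upper-cases card names,
// comparable to the Java UDFs described above. Returns null for
// null input, as Hive expects from a UDF.
class UpperCardName extends UDF {
  def evaluate(name: Text): Text =
    if (name == null) null
    else new Text(name.toString.toUpperCase)
}

// Registered in Hive along these lines (jar and function names hypothetical):
//   ADD JAR card-udfs.jar;
//   CREATE TEMPORARY FUNCTION upper_card AS 'UpperCardName';
//   SELECT upper_card(card_name) FROM transactions;
```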
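A sketch of the kind of partitioned, bucketed ORC layout behind the Hive/Spark tuning bullet; the table, columns, bucket count, and memory figures are illustrative assumptions, not the original values:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object OrcTuningSketch {
  def main(args: Array[String]): Unit = {
    // Executor/driver memory is set at submit time, e.g. (figures hypothetical):
    //   spark-submit --executor-memory 4g --driver-memory 2g --num-executors 10 ...
    val sc = new SparkContext(new SparkConf().setAppName("OrcTuningSketch"))
    val hc = new HiveContext(sc)

    // Partition by transaction date and bucket by customer id, stored as ORC
    hc.sql(
      """CREATE TABLE IF NOT EXISTS txns_orc (
        |  customer_id STRING,
        |  card_name   STRING,
        |  amount      DOUBLE)
        |PARTITIONED BY (txn_date STRING)
        |CLUSTERED BY (customer_id) INTO 32 BUCKETS
        |STORED AS ORC""".stripMargin)

    // Allow dynamic partitioning so inserts route rows to the right partition
    hc.sql("SET hive.exec.dynamic.partition=true")
    hc.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    sc.stop()
  }
}
```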
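A sketch of the Kafka-to-Spark-Streaming-to-Cassandra path, using the Spark 1.6 direct-stream API and the DataStax connector; the topic, keyspace, table, and host names are hypothetical stand-ins:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils
import com.datastax.spark.connector.streaming._

object TweetPipelineSketch {
  // Maps onto a Cassandra table with columns (id, text); schema hypothetical
  case class Tweet(id: String, text: String)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("TweetPipelineSketch")
      .set("spark.cassandra.connection.host", "127.0.0.1")
    val ssc = new StreamingContext(conf, Seconds(10)) // 10s batch interval

    val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("tweets"))

    // A real pipeline would derive the id from the tweet payload; here the
    // Kafka message key is used, falling back to an empty string when null
    stream
      .map { case (key, value) => Tweet(Option(key).getOrElse(""), value) }
      .saveToCassandra("reviews", "sentiments") // persist each micro-batch

    ssc.start()
    ssc.awaitTermination()
  }
}
```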
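A sketch of the Spark 1.6 DataFrame aggregation pattern; the resume's write-back went through Sqoop, while this sketch shows Spark's own JDBC writer as a comparable route, with all table, column, and connection names hypothetical:

```scala
import java.util.Properties
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.functions._

object AggregationSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("AggregationSketch"))
    val hc = new HiveContext(sc)

    // Aggregate a staging table with the DataFrame API
    val daily = hc.table("stg_txns")
      .groupBy(col("merchant_code"), col("txn_date"))
      .agg(sum("amount").as("total_amount"), count("*").as("txn_count"))

    // Write the result to the OLTP database over JDBC
    val props = new Properties()
    props.setProperty("user", "etl_user")         // credentials hypothetical
    props.setProperty("password", "etl_password")
    daily.write.mode("append")
      .jdbc("jdbc:mysql://oltp-host:3306/reports", "daily_merchant_totals", props)

    sc.stop()
  }
}
```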
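A sketch of exposing raw tweet JSON through a Hive SerDe so it can be queried as columns; the SerDe class is the standard hive-hcatalog JsonSerDe, while the columns and HDFS path are hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object TweetSerDeSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("TweetSerDeSketch"))
    val hc = new HiveContext(sc)

    // Declare an external Hive table over raw tweet JSON; the SerDe
    // deserializes each JSON line into readable columns at query time
    hc.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS raw_tweets (
        |  id_str     STRING,
        |  text       STRING,
        |  created_at STRING)
        |ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
        |LOCATION '/data/twitter/raw'""".stripMargin)

    hc.sql("SELECT id_str, text FROM raw_tweets LIMIT 10").show()
    sc.stop()
  }
}
```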

Environment: Hadoop, HDFS, Kerberos, Confidential Sentry, MapReduce, Hive, Pig, HBase, Sqoop, Spark, Oozie, ZooKeeper, AWS, RDBMS/DB, MySQL, CSV, Avro data files
