Hadoop Spark Developer Resume

Atlanta, GA

PROFESSIONAL SUMMARY:

  • 8+ years of experience in IT, including analysis, design, coding, testing, and implementation in Java and Big Data technologies, working with Confidential Hadoop ecosystem components
  • Extensive participation in analysis activities, assisting in troubleshooting architectural problems and providing technical solutions to meet business requirements
  • Extensive experience with major Hadoop ecosystem components such as Hadoop MapReduce, HDFS, Hive, Pig, HBase, ZooKeeper, Sqoop, Oozie, Flume, Storm, YARN, Spark, and Scala
  • Experienced in real-time streaming applications using Kafka, Flume, Storm, and Spark Streaming
  • Captured data from existing databases that provide SQL interfaces using Sqoop
  • Expertise in Kerberos Security Implementation
  • Experience in analyzing data using HiveQL, Pig Latin, HBase, and custom MapReduce programs
  • Good knowledge of creating event-processing data pipelines using Flume, Kafka, and Storm
  • Good understanding of and experience with Hadoop distributions such as Cloudera and Hortonworks
  • Used Spark Streaming APIs to perform transformations and actions on the fly for building a common learner data model that receives data in near real time
  • Hands-on working experience in Linux environments with Confidential Tomcat; used UML to design class diagrams for object-oriented analysis and design
  • Expertise in data transformation and analysis using Spark, Pig, and Hive
  • Good understanding and knowledge of NoSQL databases like HBase, Cassandra, and MongoDB
  • Experience with big data ETL and query tools such as Pig Latin and HiveQL
  • Expertise in writing Hadoop jobs for analyzing data using Spark, Hive, Pig, and MapReduce
  • Good understanding of HDFS Designs, Daemons, HDFS High Availability (HA).
  • Hands-on experience with Avro and Parquet file formats, dynamic partitions, and bucketing for best practices and performance improvement
  • Experience in database design using stored procedures, functions, and triggers, with strong experience writing complex queries for DB2 and SQL Server
  • Developed Spark SQL programs for handling different data sets with better performance (see the HiveContext sketch after this list)
  • Good knowledge of creating event-processing pipelines using Spark Streaming
  • Experience in building web services using both SOAP and RESTful services in Java.
  • Experience in configuration, deployment, and management of enterprise applications on application servers like WebSphere and JBoss and web servers like Confidential Tomcat
  • Experience in performing unit testing using JUnit and TestNG
  • Extensive experience in documenting requirements, functional specifications, technical specifications.
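
For context on the Spark SQL work called out above, here is a minimal, hypothetical sketch (not drawn from any actual project) of analytics over a Hive table through Spark 1.x's HiveContext; the table and column names (customer_txns, merchant_code, amount) are illustrative assumptions:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveAnalyticsSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HiveAnalyticsSketch"))
    // HiveContext gives Spark SQL access to the Hive metastore (Spark 1.x API)
    val hc = new HiveContext(sc)

    // Hypothetical table/columns: total transaction amount per merchant
    val totals = hc.sql(
      """SELECT merchant_code, SUM(amount) AS total_amount
        |FROM customer_txns
        |GROUP BY merchant_code""".stripMargin)

    totals.show(20)
    sc.stop()
  }
}
```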

TECHNICAL SKILLS:

  • Hadoop/Big Data: HDFS, MapReduce, Spark Core, Spark Streaming, Spark SQL, Hive, Tez, Pig, Sqoop, Flume, Kafka, Oozie, NiFi, ZooKeeper, Docker
  • AWS Components: EC2, S3, RDS, Redshift, EMR, DynamoDB, Lambda, SNS, SQS
  • NoSQL Databases: HBase, Cassandra, MongoDB
  • Languages: C, C++, Java, Scala, J2EE, Python, PL/SQL, Pig Latin, HiveQL, UNIX shell scripts
  • Java/J2EE Technologies: Applets, Swing, JDBC, JNDI, JSON, JSTL, RMI, JMS, JavaScript, JSP, Servlets, EJB, JSF, jQuery
  • Frameworks: MVC, Struts, Spring, Hibernate
  • Operating Systems: Sun Solaris, HP-UX, Red Hat Linux, Ubuntu Linux, and Windows XP/Vista/7/8
  • Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP
  • Web/Application servers: Confidential Tomcat, WebLogic, JBoss
  • Databases: Oracle 9i/10g/11g, DB2, SQL Server, MySQL, Teradata
  • Tools and IDEs: Eclipse, NetBeans, Toad, Maven, SBT, ANT, Hudson, Sonar, JDeveloper, Assent PMD, DbVisualizer
  • Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP

PROFESSIONAL EXPERIENCE:

Confidential, Atlanta, GA

Hadoop Spark Developer

Responsibilities:

  • Imported data from sources including Kafka, Sqoop, Megatron, Janus, and PETL into the legacy Teradata databases and Hadoop
  • Worked on importing and exporting data into HDFS and Hive using Sqoop, and built analytics on Hive tables using HiveContext
  • Responsible for loading customer data from SAS to MSSQL 2016, performing data massaging, mining, and cleansing, then exporting it to HDFS and Hive using Sqoop
  • Wrote Pig scripts to process credit card and debit card transactions for active customers, joining data from HDFS and Hive via HCatalog for various merchants
  • Wrote Python UDFs, run through Pig streaming, to process regular expressions and return valid merchant codes and names
  • Wrote Java UDFs to convert card names to upper case and normalize dates into a suitable format in Pig and Hive (a Scala sketch of a comparable Hive UDF appears after this list)
  • Responsible for creating a data pipeline using Kafka, Flume, and Spark Streaming over a Twitter source to collect sentiment tweets from Target customers about their reviews
  • Implemented Kerberos security
  • Handled Hive and Spark tuning end to end, including partitioning and bucketing of ORC tables and executor/driver memory sizing (see the ORC tuning sketch after this list)
  • Wrote Hive UDFs to extract data from staging tables
  • Analyzed web log data using HiveQL after ingesting it through Flume
  • Executed queries using Hive and developed Map-Reduce jobs to analyze data.
  • Worked on analyzing the Hadoop cluster and various big data analytic tools, including Pig, the HBase NoSQL database, and Sqoop
  • Experienced in performance tuning of Spark applications: setting the right batch interval, choosing the correct level of parallelism, and tuning memory
  • Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts, and effective and efficient joins and transformations during the ingestion process itself
  • Created Hive tables, loaded data into them using Sqoop, and worked on them using HiveQL
  • Responsible for developing custom UDFs, UDAFs and UDTFs in Pig and Hive.
  • Optimized Hive queries using various file formats such as JSON, Avro, ORC, and Parquet
  • Used Spark Streaming APIs to perform transformations and actions on the fly for building a common word2vec data model, which receives data from Kafka in near real time and persists it into Cassandra (see the streaming sketch after this list)
  • Operated the cluster on AWS using EC2, EMR, S3, and Elasticsearch
  • Configured, deployed, and maintained multi-node Dev and Test Kafka clusters
  • Developed Spark scripts in the Scala shell as per requirements
  • Migrated existing MapReduce programs to Spark using Scala and Python
  • Developed Scala scripts and UDFs using DataFrames in Spark 1.6 for data aggregation and queries, writing data back into the OLTP system through Sqoop (see the DataFrame aggregation sketch after this list)
  • Implemented Spark SQL to connect to Hive, read the data, and distribute processing for high scalability
  • Analyzed tweet JSON data using the Hive SerDe API to deserialize it into a readable format (see the SerDe sketch after this list)
  • Processed application web logs using Flume and loaded them into Hive for analysis
  • Implemented RESTful Web Services to interact with Oracle/Cassandra to store/retrieve the data.
  • Generated detailed design documentation for the source-to-target transformations.
  • Involved in planning process of iterations under the Agile Scrum methodology.
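
As referenced above, a minimal Scala sketch of a Hive UDF comparable to the Java UDFs described; the class name, jar name, and function name are hypothetical:

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical sketch of a Hive UDF that upper-cases card names,
// comparable to the Java UDFs described above. Returns null for
// null input, as Hive expects from a UDF.
class UpperCardName extends UDF {
  def evaluate(name: Text): Text =
    if (name == null) null
    else new Text(name.toString.toUpperCase)
}

// Registered in Hive along these lines (jar and function names hypothetical):
//   ADD JAR card-udfs.jar;
//   CREATE TEMPORARY FUNCTION upper_card AS 'UpperCardName';
//   SELECT upper_card(card_name) FROM transactions;
```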
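A sketch of the kind of partitioned, bucketed ORC layout behind the Hive/Spark tuning bullet; the table, columns, bucket count, and memory figures are illustrative assumptions, not the original values:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object OrcTuningSketch {
  def main(args: Array[String]): Unit = {
    // Executor/driver memory is set at submit time, e.g. (figures hypothetical):
    //   spark-submit --executor-memory 4g --driver-memory 2g --num-executors 10 ...
    val sc = new SparkContext(new SparkConf().setAppName("OrcTuningSketch"))
    val hc = new HiveContext(sc)

    // Partition by transaction date and bucket by customer id, stored as ORC
    hc.sql(
      """CREATE TABLE IF NOT EXISTS txns_orc (
        |  customer_id STRING,
        |  card_name   STRING,
        |  amount      DOUBLE)
        |PARTITIONED BY (txn_date STRING)
        |CLUSTERED BY (customer_id) INTO 32 BUCKETS
        |STORED AS ORC""".stripMargin)

    // Allow dynamic partitioning so inserts route rows to the right partition
    hc.sql("SET hive.exec.dynamic.partition=true")
    hc.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    sc.stop()
  }
}
```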
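A sketch of the Kafka-to-Spark-Streaming-to-Cassandra path, using the Spark 1.6 direct-stream API and the DataStax connector; the topic, keyspace, table, and host names are hypothetical stand-ins:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils
import com.datastax.spark.connector.streaming._

object TweetPipelineSketch {
  // Maps onto a Cassandra table with columns (id, text); schema hypothetical
  case class Tweet(id: String, text: String)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("TweetPipelineSketch")
      .set("spark.cassandra.connection.host", "127.0.0.1")
    val ssc = new StreamingContext(conf, Seconds(10)) // 10s batch interval

    val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("tweets"))

    // A real pipeline would derive the id from the tweet payload; here the
    // Kafka message key is used, falling back to an empty string when null
    stream
      .map { case (key, value) => Tweet(Option(key).getOrElse(""), value) }
      .saveToCassandra("reviews", "sentiments") // persist each micro-batch

    ssc.start()
    ssc.awaitTermination()
  }
}
```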
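A sketch of the Spark 1.6 DataFrame aggregation pattern; the resume's write-back went through Sqoop, while this sketch shows Spark's own JDBC writer as a comparable route, with all table, column, and connection names hypothetical:

```scala
import java.util.Properties
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.functions._

object AggregationSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("AggregationSketch"))
    val hc = new HiveContext(sc)

    // Aggregate a staging table with the DataFrame API
    val daily = hc.table("stg_txns")
      .groupBy(col("merchant_code"), col("txn_date"))
      .agg(sum("amount").as("total_amount"), count("*").as("txn_count"))

    // Write the result to the OLTP database over JDBC
    val props = new Properties()
    props.setProperty("user", "etl_user")         // credentials hypothetical
    props.setProperty("password", "etl_password")
    daily.write.mode("append")
      .jdbc("jdbc:mysql://oltp-host:3306/reports", "daily_merchant_totals", props)

    sc.stop()
  }
}
```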
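A sketch of exposing raw tweet JSON through a Hive SerDe so it can be queried as columns; the SerDe class is the standard hive-hcatalog JsonSerDe, while the columns and HDFS path are hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object TweetSerDeSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("TweetSerDeSketch"))
    val hc = new HiveContext(sc)

    // Declare an external Hive table over raw tweet JSON; the SerDe
    // deserializes each JSON line into readable columns at query time
    hc.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS raw_tweets (
        |  id_str     STRING,
        |  text       STRING,
        |  created_at STRING)
        |ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
        |LOCATION '/data/twitter/raw'""".stripMargin)

    hc.sql("SELECT id_str, text FROM raw_tweets LIMIT 10").show()
    sc.stop()
  }
}
```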

Environment: Hadoop, HDFS, Kerberos, Confidential Sentry, MapReduce, Hive, Pig, HBase, Sqoop, Spark, Oozie, ZooKeeper, AWS, RDBMS/DB, MySQL, CSV, Avro data files
