BIGDATA AND SPARK DEVELOPER Resume, Dallas, TX

SUMMARY:

7+ years of IT experience in analysis, design, development, implementation, maintenance and support, including developing strategic methods for deploying big data technologies to efficiently meet Big Data processing requirements.
Around 5 years of experience in BIG DATA using the HADOOP framework and related technologies such as HDFS, HBASE, MapReduce, Spark, HIVE, PIG, FLUME, OOZIE, SQOOP, and ZOOKEEPER.
Experience in data analysis using HIVE, PIG LATIN, HBASE and custom Map Reduce programs in Java.
Experience in writing custom UDFs in JAVA and SCALA to extend the functionality of HIVE and PIG.
Experience with Cloudera and Hortonworks distributions.
Over 3 years of experience with SPARK, SCALA, HBASE and KAFKA.
Developed analytical components using KAFKA, SCALA, SPARK, HBASE and SPARK STREAMING.
Experience in working with Flume to load the log data from multiple sources directly into HDFS.
Good knowledge of Hortonworks administration and security components such as Apache Ranger and Knox Gateway, and of High Availability.
Implemented a Hadoop backup strategy covering Hive, HDFS, HBase, Oozie, etc.
Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems (RDBMS) and from RDBMS to HDFS.
Involved in creating an HDINSIGHT cluster in the Confidential AZURE PORTAL; also created EVENTHUBS and AZURE SQL DATABASES.
Worked on a Hadoop cluster on Windows Azure using HDInsight and HORTONWORKS Data Platform for Windows.
Built a real-time pipeline for streaming data using EVENTHUBS/Confidential AZURE Queue and SPARK STREAMING.
Loaded the aggregated data into HBase for reporting purposes.
Read data from HBase into Spark to perform joins across different tables.
Created HBase tables for validation, audit and offset management.
Created logical views instead of tables to enhance the performance of Hive queries.
Involved in developing Hive DDLs to create, alter and drop Hive tables.
Good knowledge of Hive optimization techniques such as vectorization and column-based optimization.
Wrote Oozie workflows to invoke jobs at predefined intervals.
Expert in scheduling Oozie coordinators based on input data events, so that an Oozie workflow starts when its input data becomes available.
Also working on a POC with Kafka and NIFI to pull real-time events into Hadoop.
Explored Spark for improving the performance of, and optimizing, the existing algorithms in Hadoop using Spark Context, SPARK-SQL, DATA FRAMES, PAIR RDDs and YARN.
Experienced in managing Hadoop Cluster using HORTONWORKS AMBARI.

RELEVANT EXPERIENCE:

BIGDATA AND SPARK DEVELOPER

Confidential, Dallas, TX

Responsibilities:

Developed a framework to encrypt sensitive data (SSN, account number, etc.) in all kinds of datasets and to move datasets from one S3 bucket to another.
Processed datasets in formats such as Text, Parquet, Avro, Fixed Width, Zip, Gz, JSON and XML.
Developed a framework to check the data quality of datasets with schemas defined in the cloud.
Worked on Amazon Web Services (AWS) to integrate EMR with Spark 2, S3 storage and Snowflake.
Configured Spark Streaming to receive real-time data from Kafka and store the streamed data in AWS S3 using Scala.
Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts, efficient joins, transformations and other techniques during the ingestion process itself.
Enhanced and optimized product Spark code to aggregate, group and run data mining tasks using the Spark framework.
Used Spark Streaming APIs to perform the required transformations and actions on the learner data model, which receives data from Kafka in near real time.
Migrated MapReduce programs into Spark transformations, and used File Broker to schedule workflows that run Spark jobs to transform data on a persistent schedule.
Experience developing and deploying shell scripts for automation, notification and monitoring.
Extensively used Apache Kafka, Apache Spark, HDFS and Apache Impala to build near-real-time data pipelines that ingest, transform, store and analyze clickstream data to provide a better personalized user experience.
Worked on performance tuning of Spark applications.
Worked with Apache Spark SQL and DataFrame functions to perform data transformations and aggregations on complex semi-structured data.
Hands-on experience in creating RDDs, transformations and actions while implementing Spark applications.

Environment: AWS, SPARK, HIVE, SPARK SQL, KAFKA, EMR, SNOWFLAKE, NEBULA, PYTHON, SCALA, MAVEN, JUPYTER NOTEBOOK, VISUAL STUDIO, UNIX SHELL SCRIPTING.

SPARK DEVELOPER (AWS and KAFKA)

Confidential, Plano, Texas

Responsibilities:

Developed a data pipeline using EVENTHUBS, SPARK, HIVE, PIG and AZURE SQL DATABASE to ingest customer behavioral data and financial histories into an HDINSIGHT cluster for analysis.
Involved in creating an HDINSIGHT cluster in the Confidential AZURE PORTAL; also created EVENTHUBS and AZURE SQL DATABASES.
Worked on a Hadoop cluster on Windows Azure using HDInsight and HORTONWORKS Data Platform for Windows.
Used Spark Streaming to collect this data from EVENTHUBS in near real time, perform the necessary transformations and aggregations on the fly to build the common learner data model, and persist the data in AZURE DATABASE.
Used PIG to do transformations and event joins, filter bot traffic, and perform some pre-aggregations before storing the data in AZURE DATABASE.
Expertise with tools in the Hadoop ecosystem, including PIG, HIVE, HDFS, YARN, OOZIE and ZOOKEEPER, and with Hadoop architecture and its components.
Involved in integrating the Hadoop cluster with the Spark engine to perform BATCH and GRAPHX operations.
Explored SPARK for improving the performance of, and optimizing, the existing algorithms in Hadoop using SPARK CONTEXT, SPARK-SQL, DATA FRAMES, PAIR RDDs and SPARK on YARN.
Experienced in using SPARK STREAMING to ingest data into the SPARK ENGINE.
Imported data from different sources such as EVENTHUBS and COSMOS into SPARK RDDs.
Developed SPARK CODE using SCALA and Spark-SQL/Streaming for faster testing and processing of data.
Involved in converting Hive/SQL queries into SPARK TRANSFORMATIONS using Spark RDDs and SCALA.
Developed multiple POCs using SCALA, deployed them on the YARN cluster, and compared the performance of Spark with that of Hive and SQL/Teradata.
Worked extensively on the SPARK SQL and SPARK STREAMING modules of Spark and used SCALA to write code for all Spark use cases.
Used the DATAFRAME API in Scala to work with distributed collections of data organized into named columns.
Involved in converting JSON data into DATAFRAMES and storing them in Hive tables.
Experienced with AZCOPY, LIVY, WINDOWS POWERSHELL and CURL to submit Spark jobs to the HDINSIGHT CLUSTER.
Analyzed the SQL scripts and designed a solution implemented using SCALA.

Environment: AZURE, SPARK, HIVE, SPARK SQL, KAFKA, HORTONWORKS, JBOSS DROOLS, PIG, OOZIE, HBASE, PYTHON, SCALA, MAVEN, JUPYTER NOTEBOOK, VISUAL STUDIO, UNIX SHELL SCRIPTING.

BIGDATA DEVELOPER

Confidential

Responsibilities:

Imported data into, and exported data from, HDFS and Hive using Sqoop.
Used Bash Shell Scripting, Sqoop, AVRO, Hive, Pig, Java, Map/Reduce daily to develop ETL, batch processing, and data storage functionality.
Used Pig to do data transformations, event joins and some pre-aggregations before storing the data on HDFS.
Used the Hadoop MySQL connector to store MapReduce results in an RDBMS.
Analyzed large data sets to determine the optimal way to aggregate and report on them.
Worked on loading all tables from the source database schema through Sqoop.
Designed, coded and configured server-side J2EE components like JSP, AWS and JAVA.
Collected data from different databases (e.g. Oracle, MySQL) into Hadoop.
Used Oozie and Zookeeper for workflow scheduling and monitoring.
Designed and developed ETL workflows using Java for processing data in HDFS/HBase using Oozie.
Experienced in managing and reviewing Hadoop log files.
Involved in loading and transforming large sets of structured, semi structured and unstructured data from relational databases into HDFS using Sqoop imports.
Extracted files from MySQL through Sqoop, placed them in HDFS and processed them.
Supported MapReduce programs running on the cluster.
Managed cluster coordination services through ZooKeeper.
Involved in loading data from UNIX file system to HDFS.
Created several Hive tables, loaded them with data and wrote Hive queries that run internally as MapReduce jobs.
Developed simple to complex MapReduce jobs using Hive and Pig.

JAVA DEVELOPER

Confidential

Responsibilities:

Involved in the analysis, design, development and testing phases of the Software Development Life Cycle (SDLC).
Designed and developed framework components; involved in designing the MVC pattern using Struts and the Spring framework.
Responsible for developing use case, class and sequence diagrams for the modules using UML and Rational Rose.
Developed the Action Classes and Action Form Classes, created JSPs using Struts tag libraries and configured them in the struts-config.xml and web.xml files.
Involved in deploying and configuring applications in WebLogic Server.
Used SOAP for exchanging XML based messages.
Used Confidential VISIO for developing Use Case Diagrams, Sequence Diagrams and Class Diagrams in the design phase.
Developed Custom Tags to simplify the JSP code. Designed UI screens using JSP and HTML.
Actively involved in designing and implementing Factory method, Singleton, MVC and Data Access Object design patterns.
Used web services to send and receive data between different applications via SOAP messages, then used a DOM XML parser for data retrieval.
Wrote JUNIT test cases for the Controller, Service and DAO layers using MOCKITO and DBUNIT.
Developed unit test cases using a proprietary framework similar to JUNIT.
Used the JUnit framework for unit testing of the application and ANT to build and deploy the application on WebLogic Server.
