
Spark/Hadoop Developer Resume


Austin, TX

SUMMARY

  • Over 6 years of IT experience as a Developer, Designer, and Quality Tester with cross-platform integration experience using the Hadoop ecosystem.
  • In depth experience and good knowledge in using Hadoop ecosystem tools like MapReduce, HDFS, Pig, Hive, Kafka, Yarn, Sqoop, Storm, Spark, Oozie, and Zookeeper.
  • Exceptional understanding and extensive knowledge of Hadoop architecture and various ecosystem components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, and the MapReduce programming paradigm.
  • Good knowledge of data modeling, use case design, and object-oriented concepts.
  • Well versed in installation, configuration, supporting and managing of Big Data and underlying infrastructure of Hadoop Cluster.
  • Extensively worked on Spark Streaming and Apache Kafka to fetch live stream data.
  • Experience in converting Hive/SQL queries into RDD transformations using Apache Spark, Scala, and Python (a sketch of this conversion follows this summary).
  • Implemented dynamic partitions and buckets in Hive for efficient data access.
  • Experience in data processing such as collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.
  • Involved in integrating Hive queries into the Spark environment using Spark SQL.
  • Firsthand experience in performing real-time analytics on big data using HBase and Cassandra.
  • Experience in using Flume to stream data into HDFS.
  • Good working experience using Sqoop to import data into HDFS from RDBMS and vice versa.
  • Good knowledge in developing data pipelines using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
  • Created User Defined Functions (UDFs), User Defined Aggregated Functions (UDAFs) in PIG and Hive.
  • Proficient with cluster management and configuring Cassandra databases.
  • Good working experience on different file formats (PARQUET, TEXTFILE, AVRO, ORC) and different compression codecs (GZIP, SNAPPY, LZO).
  • Built secured AWS solutions by creating VPCs with private and public subnets.
  • Expertise in configuring AWS Relational Database Service (RDS).
  • Worked extensively in configuring Auto Scaling for high availability.
  • Experience in using IDEs like Eclipse, NetBeans, and IntelliJ.
  • Proficient using version control tools like Git.
  • Development experience in DBMSs like Oracle, MS SQL Server, Teradata, and MySQL.
  • Developed stored procedures and queries using PL/SQL.
  • Firsthand experience with best practices of web services development and integration (both REST and SOAP).
  • Experience in working with build tools like Maven, SBT, and Gradle to build and deploy applications to servers.
  • Expertise in the complete Software Development Life Cycle (SDLC) in Waterfall and Agile/Scrum models.
  • Excellent communication, interpersonal, and problem-solving skills; a strong team player with a can-do attitude and the ability to communicate effectively with all levels of the organization, including technical staff, management, and customers.
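
A minimal Scala sketch of the Hive/SQL-to-DataFrame conversion pattern noted above. The table and column names (web_logs, status, bytes) are hypothetical placeholders, and the DataFrame version is shown next to the equivalent SQL for comparison.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    // Hypothetical example: the same aggregation expressed as HiveQL and as DataFrame transformations.
    object HiveToSparkSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("HiveToSparkSketch")
          .enableHiveSupport()   // read existing Hive tables
          .getOrCreate()

        // Original Hive/SQL-style query
        val bySql = spark.sql(
          "SELECT status, COUNT(*) AS hits, SUM(bytes) AS total_bytes FROM web_logs GROUP BY status")

        // Equivalent DataFrame transformations
        val byApi = spark.table("web_logs")
          .groupBy("status")
          .agg(count(lit(1)).as("hits"), sum("bytes").as("total_bytes"))

        byApi.show()
        spark.stop()
      }
    }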

PROFESSIONAL EXPERIENCE

Spark/Hadoop Developer

Confidential, Austin, TX

Responsibilities:

  • Implemented partitioning, dynamic partitions, and buckets in Hive (see the sketch after these responsibilities).
  • Configured Hadoop clusters and coordinated with Big Data Admins for cluster migration.
  • Developed shell and Python scripts to automate jobs.
  • Created Spark Jobs to extract data from Hive tables and process it.
  • Worked on shell and Python scripts to compare flat files line by line and stop processing when the comparison failed.
  • Created tables in Hive for the data to be imported.
  • Created and tested Sqoop jobs on the new cluster.
  • Tested Python scripts, HQL queries, and Sqoop jobs on a new server.
  • Worked extensively on SQL and UNIX shell scripting.
  • Worked with multiple application teams to upgrade from Spark 2.0 to Spark 3.0.
  • Worked with the VCA team to test hundreds of user queries and verify that they run on the new cluster.
  • Migrated scripts, data tables, and related artifacts from the older cluster to the new cluster.
  • Developed scripts using Spark SQL for data aggregation and queries, and verified their performance.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Created partitioned Hive tables, loaded and analyzed data using Hive queries, and implemented partitioning and bucketing in Hive.
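
A minimal sketch of the Hive dynamic-partitioning pattern referenced in these responsibilities, issued through Spark SQL in Scala. The table and column names (sales_raw, sales_by_region, region) are hypothetical assumptions; bucketing is typically added to the DDL with a CLUSTERED BY ... INTO N BUCKETS clause.

    import org.apache.spark.sql.SparkSession

    // Hypothetical example of a dynamic-partition insert into a Hive table via Spark SQL.
    object HiveDynamicPartitionSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("HiveDynamicPartitionSketch")
          .enableHiveSupport()
          .getOrCreate()

        // Allow Hive to derive partition values from the data itself
        spark.sql("SET hive.exec.dynamic.partition = true")
        spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

        // Partitioned target table stored as ORC
        spark.sql(
          """CREATE TABLE IF NOT EXISTS sales_by_region (
            |  order_id  BIGINT,
            |  amount    DOUBLE,
            |  sale_date STRING
            |) PARTITIONED BY (region STRING)
            |STORED AS ORC""".stripMargin)

        // The partition column goes last in the SELECT, so each row lands in its region partition
        spark.sql(
          """INSERT OVERWRITE TABLE sales_by_region PARTITION (region)
            |SELECT order_id, amount, sale_date, region FROM sales_raw""".stripMargin)

        spark.stop()
      }
    }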

Hadoop Developer

Confidential, New York

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
  • Developed Spark jobs and Hive Jobs to summarize and transform data.
  • Expertise in implementing Spark Scala applications using higher-order functions for both batch and interactive analysis requirements.
  • Involved in creating, modifying, and dropping Teradata objects such as tables, views, join indexes, triggers, macros, procedures, and databases.
  • Involved in writing test cases and documentation. Implemented change data capture (CDC) using Informatica PowerExchange to load data from the Clarity DB to the Teradata warehouse.
  • Experienced in developing Spark scripts for data analysis in Python.
  • Built on-premises data pipelines using Kafka and Spark for real-time data analysis.
  • Created reports in Tableau for visualization of the data sets created, and tested native Drill, Impala, and Spark connectors.
  • Analyzed the SQL scripts and designed the solution to implement using Scala.
  • Implemented complex Hive UDFs to execute business logic with Hive queries.
  • Responsible for bulk loading data into HBase using MapReduce by directly creating HFiles and loading them.
  • Evaluated the performance of Spark SQL vs. Impala vs. Drill on offline data as part of a POC.
  • Implemented Spark using Python, utilizing DataFrames and the Spark SQL API for faster data processing.
  • Handled importing data from different data sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and then loaded the data into HDFS.
  • Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
  • Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
  • Responsible for developing the data pipeline by implementing Kafka producers and consumers (a producer sketch follows this section).
  • Performed data analysis with HBase.
  • Exported the analyzed data to Impala to generate reports for the BI team.

Environment: Hadoop YARN, Spark 1.6, Spark Streaming, Spark SQL, Python, Kafka, Hive, Sqoop, Impala, Control-M, Java, AWS S3, Linux
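
The producer side of the Kafka pipeline mentioned above can be sketched as below; the broker address, topic name, and log path (localhost:9092, app-logs, /var/log/app/app.log) are placeholder assumptions rather than project details.

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    // Hypothetical example: publish application log lines to a Kafka topic for downstream Spark consumers.
    object LogProducerSketch {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put("bootstrap.servers", "localhost:9092")
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

        val producer = new KafkaProducer[String, String](props)
        try {
          // Each log line becomes one record on the topic
          scala.io.Source.fromFile("/var/log/app/app.log").getLines().foreach { line =>
            producer.send(new ProducerRecord[String, String]("app-logs", line))
          }
        } finally {
          producer.close()
        }
      }
    }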

Spark/Hadoop Developer

Confidential, New York, NY

Responsibilities:

  • Implemented Spark using Scala, utilizing DataFrames and the Spark SQL API for faster data processing.
  • Converted existing MapReduce jobs into Spark transformations and actions using Spark RDDs, Data frames and Spark SQL APIs.
  • Developed a data pipeline using Kafka, Spark, and Hive to ingest, transform, and analyze customer behavioral data.
  • Worked on Big Data infrastructure for batch processing as well as real-time processing. Responsible for building scalable distributed data solutions using Hadoop.
  • Developed real time data processing applications by using Scala and Python and implemented Apache Spark Streaming from various streaming sources like Kafka.
  • Developed Spark jobs and Hive jobs to summarize and transform data.
  • Expertise in implementing Spark Scala applications using higher-order functions for both batch and interactive analysis requirements.
  • Experienced in developing Spark scripts for data analysis in Scala.
  • Used Spark-Streaming APIs to perform necessary transformations.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark SQL and Scala.
  • Worked with Spark to consume data from Kafka and convert it to a common format using Scala (see the consumer sketch after this section).
  • Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive and Spark.
  • Wrote new Spark jobs in Scala to analyze customer data and sales history.
  • Involved in requirement analysis, design, coding, and implementation phases of the project.
  • Used Spark API over Hadoop YARN to perform analytics on data in Hive.
  • Experienced with both SQLContext and SparkSession.
  • Developed Scala based Spark applications for performing data cleansing, data aggregation, de-normalization and data preparation needed for machine learning and reporting teams to consume.
  • Worked on troubleshooting Spark applications to make them more error tolerant.
  • Involved in HDFS maintenance and loading of structured and unstructured data; imported data from mainframe datasets to HDFS using Sqoop and wrote Spark scripts to process the HDFS data.
  • Extensively worked on the core and Spark SQL modules of Spark.
  • Involved in Spark and Spark Streaming, creating RDDs and applying operations (transformations and actions).
  • Used Impala to read, write and query the data in HDFS.
  • Stored the output files for export on HDFS, where they were later picked up by downstream systems.
  • Loaded the data into Spark RDDs and performed in-memory computation to generate the output response.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.

Environment: Hadoop 2.x, Spark Core, Spark SQL, Spark API, Spark Streaming, Hive, Oozie, Amazon EMR, Tableau, Impala, RDBMS, YARN, JIRA, MapReduce.
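
A minimal sketch of the Kafka-to-common-format consumption referenced above, using the spark-streaming-kafka-0-10 integration in Scala. The broker, topic, group id, and pipe-delimited field layout are assumptions for illustration only.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    // Hypothetical common record format for the behavioral events
    case class Event(userId: String, action: String, ts: Long)

    object KafkaStreamSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("KafkaStreamSketch")
        val ssc = new StreamingContext(conf, Seconds(10))

        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "localhost:9092",
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "behavior-pipeline",
          "auto.offset.reset" -> "latest")

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("customer-events"), kafkaParams))

        // Parse pipe-delimited messages into the common Event format
        val events = stream.map(_.value.split('|'))
          .filter(_.length == 3)
          .map(f => Event(f(0), f(1), f(2).toLong))

        events.foreachRDD(rdd => println(s"Parsed ${rdd.count()} events in this batch"))

        ssc.start()
        ssc.awaitTermination()
      }
    }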

Tableau Analyst

Confidential, New York, NY

Responsibilities:

  • Worked with business partners and server-side BI engineers to rigorously define, implement, and further develop metrics and key performance indicators.
  • Worked extensively in an Agile environment, participating in daily stand-up meetings and Sprint planning at the end of every Sprint.
  • Wrote custom SQL queries to optimize the extracts, and heavily relied on data blending as the end workbooks ultimately required multiple fields from different data sources.
  • Developed dashboards analyzing call handling forecasting, call rate, call waiting time, call forwarding, and average call time using Tableau BI.
  • Generated data extracts in Tableau by connecting to the view using Tableau MySQL connector.
  • Configured data extraction and scheduled refreshes, both incremental and complete depending on the data, to improve the performance of Tableau reports.
  • Built dashboards with floating objects, capitalizing on dashboard action features such as URL actions, images, and web integrations.
  • Structured the dashboards with a consistent layout: visual charts at the top and corresponding crosstab data at the bottom.
  • Customized tooltips and labels into user-understandable formats; numerical measures were changed to their appropriate formats.
  • Involved in major integrations with web applications using the JavaScript API and REST API.
  • Designed functionality to consolidate structures into a single tree-hierarchy filter.
  • Connected to the Tableau PostgreSQL database to create a customized interface for Tableau workbooks, providing native-style navigation.
