
Hadoop Spark Developer Resume


Dallas, Texas

SUMMARY

  • Close to 6 years of professional experience in the IT industry developing, implementing, and configuring Java, J2EE, and Big Data technologies; working knowledge of the Hadoop ecosystem and its stack, including big data analytics, with expertise in application design and development across various domains, with an emphasis on data warehousing tools and industry-accepted methodologies.
  • Experienced Hadoop developer with a strong background in distributed file systems in a big data arena.
  • Understand the complex processing needs of big data and have experience developing code and modules to address those needs.
  • Extensive work experience in the Banking, Finance, Insurance, and Marketing industries.
  • Familiar with data architecture, including data ingestion pipeline design, Hadoop information architecture, data modelling and data mining, machine learning, and advanced data processing.
  • Processed and analyzed log data stored in HBase and imported it into the Hive warehouse, enabling business analysts to write HQL queries (a minimal sketch follows this list).
  • Real-time experience with Hadoop/Big Data technologies for storage, querying, processing, and analysis of data.
  • Excellent knowledge of Hadoop architecture and ecosystem components such as HDFS, Hive, Pig, Sqoop, Job Tracker, Task Tracker, Name Node, and Data Node.
  • Expertise in writing Hadoop jobs for analyzing data using MapReduce, Hive, and Pig.
  • Knowledge in installing, configuring, and using Hadoop ecosystem components like Hadoop MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, Spark, Kafka, Storm, Zookeeper, and Flume.
  • Experience in managing and reviewing Hadoop log files.
  • Experience in analyzing data using HiveQL, Pig Latin, HBase and custom Map Reduce programs in Java.
  • Experience in importing and exporting data using Sqoop from HDFS to RDBMS and vice-versa.
  • Experienced in extending Hive and Pig core functionality by writing custom UDFs using Java.
  • Experience in building and maintaining multiple Hadoop clusters of different sizes and configurations, and setting up the rack topology for large clusters.
  • Experience in NoSQL databases such as HBase and Cassandra.
  • Experienced in job workflow scheduling tool like Oozie and in managing Hadoop cluster using Cloudera Manager Tool.
  • Implemented a secure distributed systems network using algorithmic programming.
  • Experience in performance tuning by identifying the bottlenecks in sources, mappings, targets, and partitioning.
  • Wrote content explaining installation, configuration, and administration of core Hortonworks Data Platform (HDP) Hadoop components (YARN, HDFS) and other Hadoop components.
  • Experience in Object Oriented Analysis, Design and development of software using UML Methodology.
  • Ability to adapt to evolving technology, strong sense of responsibility and accomplishment.
  • Excellent interpersonal and communication skills; creative, research-minded, technically competent, and result-oriented, with problem-solving and leadership skills.
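
A minimal Spark/Scala sketch of the HBase-log-to-Hive flow described above: processed log records are registered in the Hive warehouse so analysts can query them with plain HQL. The paths, table name, and columns (web_logs, status_code) are illustrative assumptions, not the actual project schema.

    import org.apache.spark.sql.SparkSession

    object LogWarehouseSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("log-warehouse-sketch")
          .enableHiveSupport()              // read/write through the Hive metastore
          .getOrCreate()

        // Assume the processed log records already sit in HDFS as Parquet.
        val logs = spark.read.parquet("/data/processed/logs")

        // Persist them as a Hive table that analysts can query with plain HQL.
        logs.write.mode("overwrite").saveAsTable("web_logs")

        // Example HQL an analyst might run against the warehouse table.
        spark.sql(
          """SELECT status_code, COUNT(*) AS hits
            |FROM web_logs
            |GROUP BY status_code
            |ORDER BY hits DESC""".stripMargin).show()

        spark.stop()
      }
    }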

PROFESSIONAL EXPERIENCE

HADOOP SPARK DEVELOPER

Confidential, Dallas, Texas

Responsibilities:

  • Wrote multiple MapReduce programs to perform extraction, transformation, and aggregation of data from more than 20 sources with multiple file formats, including XML, JSON, CSV, and other compressed file formats.
  • Implemented Spark Core in Scala to process data in memory.
  • Performed job functions using Spark APIs in Scala for real-time analysis and fast querying.
  • Involved in creating Spark applications in Scala using cache, map, reduceByKey, and similar functions to process data (see the first sketch after this list).
  • Created Oozie workflows for Hadoop-based jobs including Sqoop, Hive, and Pig.
  • Created Hive external tables, loaded the data into the tables, and queried data using HQL (see the external-table sketch after this list).
  • Performed data validation on the ingested data using MapReduce by building a custom model to filter out invalid records and cleanse the data.
  • Handled the import of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Wrote HiveQL queries, configuring the number of reducers and mappers needed for the output.
  • Transferred data between Pig scripts and Hive using HCatalog, and transferred relational database data using Sqoop.
  • Configured and maintained different topologies in the Storm cluster and deployed them on a regular basis.
  • Responsible for building scalable distributed data solutions using Hadoop. Installed and configured Hive, Pig, Oozie, and Sqoop on the Hadoop cluster.
  • Developed simple to complex MapReduce jobs in Java that were implemented using Hive and Pig.
  • Ran many performance tests using the cassandra-stress tool to measure and improve the read and write performance of the cluster.
  • Configured Kafka, Storm, and Hive to receive and load real-time messages.
  • Supported MapReduce programs running on the cluster; performed cluster monitoring, maintenance, and troubleshooting.
  • Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin).
  • Provided cluster coordination services through ZooKeeper. Installed and configured Hive and wrote Hive UDFs.
  • Worked on the Analytics Infrastructure team to develop a stream-filtering system on top of Apache Kafka and Storm.
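
A minimal Spark Core sketch in Scala illustrating the cache/map/reduceByKey processing referenced above; the input path and record layout (customerId,amount) are assumptions for illustration only.

    import org.apache.spark.{SparkConf, SparkContext}

    object AggregationSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("rdd-aggregation-sketch"))

        // Assume CSV lines of the form: customerId,amount (names are illustrative).
        val lines = sc.textFile("/data/ingest/transactions.csv")

        val totals = lines
          .map(_.split(","))                // parse each record
          .map(f => (f(0), f(1).toDouble))  // (customerId, amount) pairs
          .reduceByKey(_ + _)               // sum amounts per customer
          .cache()                          // keep the result in memory for reuse

        totals.take(10).foreach(println)
        println(s"distinct customers: ${totals.count()}") // second action reuses the cached RDD

        sc.stop()
      }
    }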
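
A hedged sketch of the Hive external-table work, expressed here through Spark SQL with Hive support so it stays in Scala; on the project the DDL and queries may have been run directly in Hive. The table location, columns, and file layout are illustrative assumptions.

    import org.apache.spark.sql.SparkSession

    object ExternalTableSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hive-external-table-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // External table over data already in HDFS; dropping the table later
        // does not delete the underlying files.
        spark.sql(
          """CREATE EXTERNAL TABLE IF NOT EXISTS transactions_ext (
            |  txn_id STRING,
            |  customer_id STRING,
            |  amount DOUBLE,
            |  txn_date STRING
            |)
            |ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
            |LOCATION '/data/ingest/transactions'""".stripMargin)

        // Plain HQL query against the external table.
        spark.sql(
          """SELECT txn_date, SUM(amount) AS daily_total
            |FROM transactions_ext
            |GROUP BY txn_date""".stripMargin).show()

        spark.stop()
      }
    }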

Environment: Hadoop, Java, MapReduce, HDFS, AWS, Amazon S3, Hive, Linux, XML, Eclipse, Cloudera, CDH4/5 Distribution, Spark, Scala, HBase, MongoDB, Python, GitHub, Sqoop, Oozie.

HADOOP DEVELOPER

Confidential, New York

Responsibilities:

  • Performing all phases of software engineering, including requirements analysis, design, code development, and testing.
  • Designing and implementing product features in collaboration with business and IT stakeholders.
  • Working very closely with the Architecture group and driving solutions.
  • Designing and developing innovative solutions to meet the needs of the business, and interacting with business partners and key contacts.
  • Implemented the data management framework for building the Data Lake for Optum.
  • Supported the implementation and drove it to a stable state in production.
  • Provided alternate design solutions along with project estimates.
  • Reviewed code and provided feedback on best practices, performance improvements, etc.
  • Troubleshot production support issues post-deployment and came up with solutions as required.
  • Demonstrated substantial depth of knowledge and experience in specific areas of Big Data development.
  • Implemented Spark using Scala for faster testing and processing of data.
  • Built reusable Hive UDF libraries that enabled business analysts to use these UDFs in Hive queries (a UDF sketch follows this list).
  • Worked on the backend using Scala and Spark to perform several aggregation routines.
  • Worked on Hive-HBase integration by creating Hive external tables with the HBase storage handler (see the storage-handler sketch after this list).
  • Generated Java APIs for retrieval and analysis on NoSQL databases such as HBase.
  • Drove the team and collaborated to meet project timelines.
  • Worked with Big Data technologies (HBase, Hive, MapR, Pig, and Talend).
  • Worked with Hadoop, Cloudera CDH 4.5, HDFS, Pig scripting, Hive, MapReduce, Sqoop, Flume, Oozie, Spark, Autosys, Unix scripting, Tableau, and Talend Big Data ETL.
  • Designed and implemented a Spark test-bench application to evaluate the quality of recommendations made by the engine.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics.
  • Created and implemented a highly scalable and reliable distributed data design using NoSQL HBase.
  • Demonstrated expertise in Java programming frameworks in an Agile/Scrum methodology.
  • Unix and Kafka were used only lightly; the other listed technologies were the ones actually used on the project.
  • Intake happens through Sqoop, and ingestion happens through MapReduce and HBase.
  • Hive registration happens, and queries are exposed for business users and analysts.
  • The cluster is on MapR. All functions and transformations are written in Pig.
  • The complete process is orchestrated by Talend; the individual stages are called from the Talend workflow.
  • Post-enrichment, the final copy is exposed through Spark SQL for end users to query.
  • The business needs data in near real time; CDC was tried previously, and Kafka is now being explored to pull data as frequently as possible.
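
The reusable Hive UDF libraries mentioned above expose small classes with an evaluate method that analysts can call from HQL. A minimal sketch in Scala is shown below; the actual project UDFs may have been written in Java, and the function name and normalization logic are illustrative.

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Illustrative reusable Hive UDF: normalizes free-text values so analysts
    // can call it straight from HQL once the jar is registered.
    class NormalizeText extends UDF {
      def evaluate(input: Text): Text = {
        if (input == null) null
        else new Text(input.toString.trim.toLowerCase.replaceAll("\\s+", " "))
      }
    }

After packaging into a jar, such a UDF is typically registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before it can be used in queries.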
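
The Hive-HBase integration item maps Hive external tables onto existing HBase tables through the HBase storage handler. The sketch below submits the DDL over Hive JDBC from Scala; the HiveServer2 URL, credentials, HBase table name, and column mapping are all assumptions for illustration.

    import java.sql.DriverManager

    object HiveHBaseIntegrationSketch {
      def main(args: Array[String]): Unit = {
        Class.forName("org.apache.hive.jdbc.HiveDriver")
        // Hypothetical HiveServer2 endpoint; credentials depend on cluster security.
        val conn = DriverManager.getConnection("jdbc:hive2://hiveserver:10000/default", "hive", "")
        val stmt = conn.createStatement()

        // External Hive table backed by an existing HBase table via the storage handler.
        stmt.execute(
          """CREATE EXTERNAL TABLE IF NOT EXISTS member_events (
            |  rowkey STRING,
            |  event_type STRING,
            |  event_ts STRING
            |)
            |STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
            |WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,e:type,e:ts')
            |TBLPROPERTIES ('hbase.table.name' = 'member_events')""".stripMargin)

        stmt.close()
        conn.close()
      }
    }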

Environment: Hadoop, MapReduce, Hive, Pig, Sqoop, Python, Spark, Spark Streaming, Spark SQL, AWS EMR, AWS S3, AWS Redshift, Scala, MapR, Java, Oozie, Flume, HBase.
