
Hadoop Developer Resume


Atlanta, GA

SUMMARY

  • Hadoop & Spark developer and analyst with over 6 years of experience as a software developer in the design, development, deployment, and support of large-scale distributed systems.
  • 6+ years of extensive experience as a Hadoop and Spark engineer and Big Data analyst.
  • DataStax Cassandra and IBM Big Data University certified.
  • Implemented various algorithms for analytics using Cassandra with Spark and Scala.
  • Excellent understanding of Hadoop architecture and the underlying framework, including storage management.
  • Experience in installing, configuring, and administering Hadoop clusters for major Hadoop distributions such as CDH4 and CDH5.
  • Expertise in using various Hadoop components such as MapReduce, Pig, Hive, ZooKeeper, HBase, Sqoop, Oozie, Flume, Drill, and Spark for data storage and analysis.
  • Experience in developing custom UDFs for Pig and Hive to incorporate Python/Java methods and functionality into Pig Latin and HQL (HiveQL), and used UDFs from the Piggybank UDF repository (see the sketch after this list).
  • Experienced in running queries using Impala and used BI tools to run ad-hoc queries directly on Hadoop.
  • Good experience with the Oozie framework and automating daily import jobs.
  • Experienced in managing Hadoop clusters and services using Cloudera Manager.
  • Experienced in troubleshooting errors in the HBase shell/API, Pig, Hive, and MapReduce.
  • Highly experienced in importing and exporting data between HDFS and relational database management systems using Sqoop.
  • Experienced in creating Vizboards for data visualization in Platfora for real-time dashboards on Hadoop.
  • Experience in Object-Oriented Analysis and Design (OOAD) and software development using UML methodology; good knowledge of J2EE and Core Java design patterns.
  • Experience in managing Hadoop clusters using the Cloudera Manager tool.
  • Very good experience with the complete project life cycle: design, development, testing, and implementation of client-server and web applications.
  • Experience in administering installation, configuration, troubleshooting, security, backup, performance monitoring, and fine-tuning of Red Hat Linux.
  • Extensive experience working with Oracle, DB2, SQL Server, and MySQL databases.
  • Hands-on experience with VPN, PuTTY, WinSCP, VNC Viewer, etc.
  • Scripting to deploy monitors and checks and to automate critical system administration functions.
  • Hands-on experience in application development using Java, RDBMS, and Linux shell scripting.
  • Experience in Java, JSP, Servlets, EJB, WebLogic, WebSphere, Hibernate, Spring, JBoss, JDBC, RMI, JavaScript, Ajax, jQuery, XML, and HTML.
  • Ability to adapt to evolving technology; strong sense of responsibility and accomplishment.
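
The custom Hive UDF work mentioned above could look roughly like the following minimal sketch in Scala, using the classic org.apache.hadoop.hive.ql.exec.UDF API. The class name, logic, and function name are hypothetical and not taken from any specific project listed here.

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Hypothetical UDF: trims and upper-cases a free-text column before aggregation.
    class NormalizeText extends UDF {
      def evaluate(input: Text): Text = {
        if (input == null) null
        else new Text(input.toString.trim.toUpperCase)
      }
    }

Once packaged into a JAR, a UDF like this would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION normalize_text AS 'NormalizeText' before being called from HiveQL queries.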

TECHNICAL SKILLS

Programming Languages: Hadoop/Big Data stack, C#.NET, VB.NET, JavaScript, jQuery.

Databases: SQL Server 2012/2008/2005, MS Access

IDE: Visual Studio 2010/2008/2005; SharePoint Online, SharePoint 2013/2010, MOSS 2007

DevOps: Azure, GitHub, GitLab, GitBash, Jenkins, SonarQube, Docker, Kubernetes.

3rd Party Toolkits: K2 Black Pearl, Smart Forms, Vignette, ALUI (AquaLogic User Interaction)

Reporting Tools: SQL Server Reporting Services

Hadoop Technologies: Hadoop 2.8.4+, Spark 2.0.0+, MapReduce, HDFS, Kafka 0.11.0.1+, Hive 2.1.0+, HBase 1.2.3+, Cassandra 3.11+, Sqoop 1.99.7+, Pig 0.17, Flume 1.6.0+, Keras 2.2.4

Databases (Big Data stack): MySQL 5.x, SQL Server, Oracle 11g

PROFESSIONAL EXPERIENCE

Confidential

Hadoop Developer

Responsibilities:

  • Worked on analyzing the Hadoop stack and different big data analytic tools, including Pig, Hive, the HBase database, and Sqoop.
  • Wrote multiple MapReduce programs for the extraction, transformation, and aggregation of data from more than 20 sources with multiple file formats, including XML, JSON, CSV, and other compressed file formats.
  • Implemented Spark Core in Scala to process data in memory.
  • Performed job functions using Spark APIs in Scala for real-time analysis and fast querying.
  • Involved in creating Spark applications in Scala using cache, map, reduceByKey, and similar functions to process data (see the sketch after this list).
  • Created Oozie workflows for Hadoop-based jobs, including Sqoop, Hive, and Pig.
  • Created Hive external tables, loaded the data into the tables, and queried the data using HQL.
  • Performed data validation on the ingested data using MapReduce by building a custom model to filter out all the invalid data and cleanse the data.
  • Handled the importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Wrote HiveQL queries, configuring the number of reducers and mappers needed for the output.
  • Transferred data between Pig scripts and Hive using HCatalog; transferred relational database data using Sqoop.
  • Configured and maintained different topologies in the Storm cluster and deployed them on a regular basis.
  • Responsible for building scalable distributed data solutions using Hadoop. Installed and configured Hive, Pig, Oozie, and Sqoop on the Hadoop cluster.
  • Developed simple to complex MapReduce jobs in Java that were implemented using Hive and Pig.
  • Ran many performance tests using the cassandra-stress tool in order to measure and improve the read and write performance of the cluster.
  • Configured Kafka, Storm, and Hive to receive and load real-time messages.
  • Supported MapReduce programs running on the cluster; performed cluster monitoring, maintenance, and troubleshooting.
  • Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin).
  • Provided cluster coordination services through ZooKeeper. Installed and configured Hive and wrote Hive UDFs.
  • Worked on the Analytics Infrastructure team to develop a stream filtering system on top of Apache Kafka and Storm.
  • Worked on a POC on Spark and Scala parallel processing, streaming real-time data using Spark with Kafka.
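
A minimal sketch of the kind of Spark Core job described above, written in Scala against the RDD API. The input path, field layout, and output path are assumptions made for illustration, not details of the actual project.

    import org.apache.spark.{SparkConf, SparkContext}

    object EventCountJob {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("EventCountJob"))

        // Hypothetical CSV input; cache the parsed records because they are reused.
        val events = sc.textFile("hdfs:///data/raw/events/*.csv")
          .map(_.split(","))
          .filter(_.length > 1)
          .cache()

        // Count events per key (assumed to be the first column) with reduceByKey.
        val counts = events
          .map(fields => (fields(0), 1L))
          .reduceByKey(_ + _)

        counts.saveAsTextFile("hdfs:///data/processed/event_counts")
        sc.stop()
      }
    }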

Environment: Hadoop, Spark, HDFS, Hive, Pig, HBase, Big Data, Apache Storm, Oozie, Sqoop, Kafka, Flume, ZooKeeper, MapReduce, Cassandra, Scala, Linux, NoSQL, MySQL Workbench, Java, Eclipse, Oracle 10g, SQL.

Confidential

Hadoop Developer

Responsibilities:

  • Implemented a generic ETL framework with high availability for bringing related data into Hadoop and Cassandra from various sources using Spark.
  • Experienced in using Platfora, a data visualization tool specific to Hadoop, and created various Lenses and Vizboards for real-time visualization from Hive tables.
  • Queried and analyzed data from Cassandra for quick searching, sorting, and grouping through CQL.
  • Implemented various data modeling techniques for Cassandra.
  • Joined various tables in Cassandra using Spark and Scala and ran analytics on top of them (see the sketch after this list).
  • Participated in various upgrade and troubleshooting activities across the enterprise.
  • Knowledge of performance troubleshooting and tuning of Hadoop clusters.
  • Applied advanced Spark procedures like text analytics and processing using in-memory processing.
  • Implemented Apache Drill on Hadoop to join data from SQL and NoSQL databases and store it in Hadoop.
  • Created an architecture stack blueprint for data access with the NoSQL database Cassandra.
  • Brought data from various sources into Hadoop and Cassandra using Kafka.
  • Experienced in using Tidal Enterprise Scheduler and Oozie Operational Services for coordinating the cluster and scheduling workflows.
  • Applied Spark Streaming for real-time data transformation.
  • Created multiple dashboards in Tableau for multiple business needs.
  • Installed and configured Hive, wrote Hive UDFs, and used Piggybank, a repository of UDFs for Pig Latin.
  • Implemented partitioning, dynamic partitions, and buckets in Hive for efficient data access.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team using Tableau.
  • Implemented Composite server for data virtualization needs and created multiple views for restricted data access using a REST API.
  • Devised and led the implementation of a next-generation architecture for more efficient data ingestion and processing.
  • Created and implemented various shell scripts for automating the jobs.
  • Implemented Apache Sentry to restrict access to the Hive tables at the group level.
  • Employed the AVRO format for all data ingestion for faster operation and less space utilization.
  • Experienced in managing and reviewing Hadoop log files.
  • Worked in an Agile environment and used the Rally tool to maintain the user stories and tasks.
  • Worked with enterprise data support teams to install Hadoop updates, patches, and version upgrades as required, and fixed problems that arose after the upgrades.
  • Implemented test scripts to support test-driven development and continuous integration.
  • Used Spark for parallel data processing and better performance.
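
A hedged sketch of how Cassandra tables might be joined with Spark and Scala via the DataStax spark-cassandra-connector, as referenced above. The keyspace, table, and column names, and the connection host, are invented for illustration.

    import org.apache.spark.{SparkConf, SparkContext}
    import com.datastax.spark.connector._

    object CassandraJoinAnalytics {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("CassandraJoinAnalytics")
          .set("spark.cassandra.connection.host", "127.0.0.1") // assumed contact point
        val sc = new SparkContext(conf)

        // Hypothetical keyspace and tables, read as RDDs of CassandraRow.
        val orders    = sc.cassandraTable("shop", "orders")
        val customers = sc.cassandraTable("shop", "customers")

        // Key both tables by customer_id and join them for downstream analytics.
        val joined = orders.map(r => (r.getString("customer_id"), r.getDouble("amount")))
          .join(customers.map(r => (r.getString("customer_id"), r.getString("region"))))

        // Total order amount per region.
        joined.map { case (_, (amount, region)) => (region, amount) }
          .reduceByKey(_ + _)
          .collect()
          .foreach(println)

        sc.stop()
      }
    }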

Environment: MapR 5.0.1, MapReduce, HDFS, Hive, Pig, Impala, Cassandra 5.04, Spark, Scala, Solr, Java, SQL, Tableau, ZooKeeper, Sqoop, Teradata, CentOS, Pentaho.

Confidential, Atlanta, GA

Hadoop Developer

Responsibilities:

  • Handled importing of data from various data sources and performed data transformations using HAWQ and MapReduce.
  • Analyzed the web log data using HiveQL.
  • Developed Hive queries on data logs to perform a trend analysis of user behavior on various online modules.
  • Developed Pig UDFs to pre-process the data for analysis.
  • Involved in the setup and deployment of the Hadoop cluster.
  • Developed MapReduce programs for some refined queries on big data.
  • Involved in loading data from the UNIX file system into HDFS.
  • Loaded data into HDFS and extracted the data from MySQL into HDFS using Sqoop.
  • Exported the analyzed data to the relational databases using Sqoop and generated reports for the BI team.
  • Managed and scheduled jobs on a Hadoop cluster using Oozie.
  • Along with the infrastructure team, involved in the design and development of a Kafka- and Storm-based data pipeline.
  • Used a test-driven approach for developing the application and implemented the unit tests using the Python unittest framework.
  • Developed a Storm monitoring bolt for validating pump tag values against high/low and high-high/low-low values from preloaded metadata.
  • Designed and configured a Kafka cluster to accommodate a heavy throughput of 1 million messages per second. Used the Kafka producer 0.8.3 APIs to produce messages (see the sketch after this list).
  • Installed and configured Talend ETL in single- and multi-server environments.
  • Troubleshooting, debugging, and fixing Talend-specific issues while maintaining the performance of the ETL environment.
  • Developed merge jobs in Python to extract and load data into the MySQL database.
  • Created and modified several UNIX shell scripts according to the changing needs of the project and client requirements. Developed UNIX shell scripts to call Oracle PL/SQL packages and contributed to the standard framework.
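
For reference, a minimal Scala sketch of producing messages with the Kafka producer client (the org.apache.kafka.clients.producer API). The broker address, topic name, and payload format are assumptions for illustration, not the configuration of the cluster described above.

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    object PumpTagProducer {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put("bootstrap.servers", "localhost:9092") // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

        val producer = new KafkaProducer[String, String](props)
        try {
          // Hypothetical pump-tag reading keyed by tag id.
          producer.send(new ProducerRecord[String, String]("pump-tags", "tag-42", "87.5"))
          producer.flush()
        } finally {
          producer.close()
        }
      }
    }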

Environment: Hortonworks Hadoop 2.0, EMP, PySpark, Cloud Infrastructure (Amazon AWS), Java, Python, HBase, Hadoop Ecosystem, Linux, Scala.

Confidential

Linux Engineer

Responsibilities:

  • Managed and administered all UNIX servers, including Linux operating systems, by applying relevant patches and packages at regular maintenance periods using Red Hat Satellite Server, YUM, and RPM tools.
  • Planned and performed upgrades to Linux (RHEL 5.x/6.x, SUSE 10/11, CentOS 5/6) operating systems and hardware maintenance such as changing memory modules and replacing disk drives.
  • Handled NFS, automount, DNS, and LDAP related issues.
  • Monitored CPU, memory, physical disks, hardware and software RAID, multipath, file systems, and network using Nagios 4.0.
  • Performed failover and integrity tests on new servers before rolling them out to production.
  • Planned, scheduled, and implemented OS patches on Linux boxes as part of proactive maintenance.
  • Identified, troubleshot, and resolved problems with OS build failures.
  • Used Chef for managing application servers such as Apache, MySQL, and Tomcat.
  • Installed, configured, and customized services such as Sendmail, Apache, and FTP servers to meet user needs and requirements.
  • Performed kernel and database configuration optimization to limit I/O resource utilization on disks.
