Hadoop Developer Resume
Atlanta, GA
SUMMARY
- Hadoop & Spark Developer and analyst with 6+ years of overall experience as a software developer in the design, development, deployment, and support of large-scale distributed systems.
- 6+ years of extensive experience as a Hadoop and Spark engineer and Big Data analyst.
- DataStax Cassandra and IBM Big Data University certified.
- Implemented various algorithms for analytics using Cassandra with Spark and Scala.
- Excellent understanding of Hadoop architecture and the underlying framework, including storage management.
- Experience in installing, configuring, and administering Hadoop clusters for major Hadoop distributions such as CDH4 and CDH5.
- Expertise in using various Hadoop ecosystem components such as MapReduce, Pig, Hive, ZooKeeper, HBase, Sqoop, Oozie, Flume, Drill, and Spark for data storage and analysis.
- Experience in developing custom UDFs for Pig and Hive to incorporate Python/Java methods and functionality into Pig Latin and HQL (HiveQL), and used UDFs from the Piggybank UDF repository.
- Experienced in running queries using Impala and used BI tools to run ad-hoc queries directly on Hadoop.
- Good experience with the Oozie framework and automating daily import jobs.
- Experienced in managing Hadoop clusters and services using Cloudera Manager.
- Experienced in troubleshooting errors in HBase Shell/API, Pig, Hive, and MapReduce.
- Highly experienced in importing and exporting data between HDFS and relational database management systems using Sqoop.
- Experienced in creating Vizboards for data visualization in Platfora for real-time dashboards on Hadoop.
- Experience in Object-Oriented Analysis and Design (OOAD) and development of software using UML methodology; good knowledge of J2EE and Core Java design patterns.
- Very good experience in the complete project life cycle: design, development, testing, and implementation of client-server and web applications.
- Experience in administering installation, configuration, troubleshooting, security, backup, performance monitoring, and fine-tuning of Red Hat Linux.
- Extensive experience working with Oracle, DB2, SQL Server, and MySQL databases.
- Hands-on experience with VPN, PuTTY, WinSCP, VNC Viewer, etc.
- Scripting to deploy monitors and checks and to automate critical system administration functions.
- Hands-on experience in application development using Java, RDBMS, and Linux shell scripting.
- Experience in Java, JSP, Servlets, EJB, WebLogic, WebSphere, Hibernate, Spring, JBoss, JDBC, RMI, JavaScript, Ajax, jQuery, XML, and HTML.
- Ability to adapt to evolving technology; strong sense of responsibility and accomplishment.
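The custom Python UDF work for Hive mentioned above can be illustrated with a minimal sketch of a Hive TRANSFORM-style streaming script. The column layout (a country code in the second field) and script name are hypothetical examples, not taken from the actual projects:

```python
import sys

def clean_row(line):
    """Normalize one tab-separated record: trim every field and lowercase
    the (hypothetical) country-code column in position 1."""
    fields = [f.strip() for f in line.rstrip("\n").split("\t")]
    if len(fields) > 1:
        fields[1] = fields[1].lower()
    return "\t".join(fields)

if __name__ == "__main__" and not sys.stdin.isatty():
    # Hive streams rows to stdin when the script is invoked via
    # SELECT TRANSFORM (...) USING 'python clean_row.py' ...
    for line in sys.stdin:
        print(clean_row(line))
```

Hive pipes rows to the script as tab-separated text and reads the transformed rows back from stdout, which is why the sketch works purely on stdin/stdout.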
TECHNICAL SKILLS
Programming Languages: Hadoop/Big Data stack, C#.NET, VB.NET, JavaScript, jQuery.
Databases: SQL Server 2012/2008/2005, MS Access
IDE: Visual Studio 2010/2008/2005, SharePoint Online, SharePoint 2013/2010, MOSS 2007
DevOps: Azure, GitHub, GitLab, GitBash, Jenkins, SonarQube, Docker, Kubernetes.
3rd Party Toolkits: K2 Black Pearl, Smart Forms, Vignette, ALUI (Aqua Logic User Interface.)
Reporting Tools: SQL Server Reporting Services
Hadoop Technologies: Hadoop 2.8.4+, Spark 2.0.0+, MapReduce, HDFS, Kafka 0.11.0.1+, Hive 2.1.0+, HBase 1.2.3+, Cassandra 3.11+, Sqoop 1.99.7+, Pig 0.17, Flume 1.6.0+, Keras 2.2.4
Databases: MySQL 5.x, SQL Server, Oracle 11g
PROFESSIONAL EXPERIENCE:
Confidential
Hadoop Developer
Responsibilities:
- Worked on analyzing the Hadoop stack and different big data analytic tools, including Pig, Hive, the HBase database, and Sqoop.
- Wrote multiple MapReduce programs for the extraction, transformation, and aggregation of data from more than 20 sources with multiple file formats, including XML, JSON, CSV, and other compressed file formats.
- Implemented Spark Core in Scala to process data in memory.
- Performed job functions using Spark APIs in Scala for real-time analysis and for fast querying purposes.
- Involved in creating Spark applications in Scala using cache, map, reduceByKey, etc. to process data.
- Created Oozie workflows for Hadoop-based jobs including Sqoop, Hive, and Pig.
- Created Hive external tables, loaded the data into the tables, and queried the data using HQL.
- Performed data validation on the ingested data using MapReduce by building a custom model to filter out all the invalid data and cleanse the data.
- Handled the importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Wrote HiveQL queries, configuring the number of reducers and mappers needed for the output.
- Transferred data between Pig scripts and Hive using HCatalog; transferred relational database data using Sqoop.
- Configured and maintained different topologies in the Storm cluster and deployed them on a regular basis.
- Responsible for building scalable distributed data solutions using Hadoop. Installed and configured Hive, Pig, Oozie, and Sqoop on the Hadoop cluster.
- Developed simple to complex MapReduce jobs in Java that were implemented using Hive and Pig.
- Ran many performance tests using the cassandra-stress tool in order to measure and improve the read and write performance of the cluster.
- Configured Kafka, Storm, and Hive to ingest and load real-time messages.
- Supported MapReduce programs running on the cluster; performed cluster monitoring, maintenance, and troubleshooting.
- Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin).
- Provided cluster coordination services through ZooKeeper. Installed and configured Hive and wrote Hive UDFs.
- Worked on the Analytics Infrastructure team to develop a stream filtering system on top of Apache Kafka and Storm.
- Worked on a POC for Spark and Scala parallel processing; streamed real-time data using Spark with Kafka.
Environment: Hadoop, Spark, HDFS, Hive, Pig, HBase, Big Data, Apache Storm, Oozie, Sqoop, Kafka, Flume, Zookeeper, MapReduce, Cassandra, Scala, Linux, NoSQL, MySQL Workbench, Java, Eclipse, Oracle 10g, SQL.
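The cache/map/reduceByKey work described in this role follows Spark's standard key-value aggregation pattern. A minimal pure-Python sketch of the reduceByKey semantics, illustrative only (the production code was Spark in Scala):

```python
def reduce_by_key(pairs, reduce_fn):
    """Group (key, value) pairs by key and fold the values with reduce_fn,
    mirroring what Spark's rdd.reduceByKey(reduce_fn) computes."""
    acc = {}
    for key, value in pairs:
        acc[key] = reduce_fn(acc[key], value) if key in acc else value
    return acc

def word_count(lines):
    # Map step: each word becomes a (word, 1) pair, as in the classic
    # rdd.flatMap(_.split(" ")).map(w => (w, 1)).reduceByKey(_ + _)
    pairs = [(w, 1) for line in lines for w in line.split()]
    return reduce_by_key(pairs, lambda a, b: a + b)
```

In Spark the same fold runs partition-locally before a shuffle, which is why reduceByKey is preferred over groupByKey for aggregations; the sketch above only captures the per-key semantics, not the distribution.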
Confidential
Hadoop Developer
Responsibilities:
- Implemented a generic ETL framework with high availability for bringing related data into Hadoop and Cassandra from various sources using Spark.
- Experienced in using Platfora, a data visualization tool specific to Hadoop; created various Lenses and Vizboards for real-time visualization from Hive tables.
- Queried and analyzed data from Cassandra for quick searching, sorting, and grouping through CQL.
- Implemented various data modeling techniques for Cassandra.
- Joined various tables in Cassandra using Spark and Scala and ran analytics on top of them.
- Participated in various upgrade and troubleshooting activities across the enterprise.
- Knowledge of performance troubleshooting and tuning of Hadoop clusters.
- Applied advanced Spark procedures like text analytics and processing using in-memory processing.
- Implemented Apache Drill on Hadoop to join data from SQL and NoSQL databases and store it in Hadoop.
- Created an architecture stack blueprint for data access with the NoSQL database Cassandra.
- Brought data from various sources into Hadoop and Cassandra using Kafka.
- Experienced in using Tidal Enterprise Scheduler and Oozie operational services for coordinating the cluster and scheduling workflows.
- Applied Spark Streaming for real-time data transformation.
- Created multiple dashboards in Tableau for multiple business needs.
- Installed and configured Hive, wrote Hive UDFs, and used Piggybank, a repository of UDFs for Pig Latin.
- Implemented partitioning, dynamic partitions, and buckets in Hive for efficient data access.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team using Tableau.
- Implemented Composite server for data virtualization needs and created multiple views for restricted data access using a REST API.
- Devised and led the implementation of a next-generation architecture for more efficient data ingestion and processing.
- Created and implemented various shell scripts for automating jobs.
- Implemented Apache Sentry to restrict access to the Hive tables at a group level.
- Employed the Avro format for all data ingestion for faster operation and less space utilization.
- Experienced in managing and reviewing Hadoop log files.
- Worked in an Agile environment and used the Rally tool to maintain user stories and tasks.
- Worked with enterprise data support teams to install Hadoop updates, patches, and version upgrades as required, and fixed problems that arose after the upgrades.
- Implemented test scripts to support test-driven development and continuous integration.
- Used Spark for parallel data processing and better performance.
Environment: MapR 5.0.1, MapReduce, HDFS, Hive, Pig, Impala, Cassandra 5.04, Spark, Scala, Solr, Java, SQL, Tableau, ZooKeeper, Sqoop, Teradata, CentOS, Pentaho.
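The Hive bucketing used in this role (CLUSTERED BY ... INTO N BUCKETS) assigns each row to one of N bucket files by hashing the bucketing column and taking the result modulo the bucket count. A rough Python illustration of that assignment; CRC32 stands in for Hive's own hash function, so it is not bit-compatible with Hive:

```python
import zlib

def bucket_for(key, num_buckets):
    """Assign a key to a bucket the way Hive's bucketing does conceptually:
    hash the bucketing column value, then take it modulo the bucket count.
    CRC32 is a deterministic stand-in for Hive's hash (illustrative only)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets

def bucketize(rows, key_fn, num_buckets):
    """Split rows into num_buckets lists, like bucketed files on HDFS."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[bucket_for(key_fn(row), num_buckets)].append(row)
    return buckets
```

Because the bucket of a row is a pure function of its key, joins on the bucketing column can match buckets pairwise without a full shuffle, which is what makes bucketed tables efficient for joins and sampling.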
Confidential, Atlanta, GA
Hadoop Developer
Responsibilities:
- Handled importing of data from various data sources and performed data transformations using HAWQ and MapReduce.
- Analyzed the web log data using HiveQL.
- Developed Hive queries on data logs to perform trend analysis of user behavior across various online modules.
- Developed Pig UDFs to pre-process the data for analysis.
- Involved in the setup and deployment of the Hadoop cluster.
- Developed MapReduce programs for some refined queries on big data.
- Involved in loading data from the UNIX file system to HDFS.
- Loaded data into HDFS and extracted data from MySQL into HDFS using Sqoop.
- Exported the analyzed data to relational databases using Sqoop and generated reports for the BI team.
- Managed and scheduled jobs on a Hadoop cluster using Oozie.
- Along with the infrastructure team, involved in the design and development of a Kafka- and Storm-based data pipeline.
- Used a test-driven approach for developing the application and implemented unit tests using the Python unittest framework.
- Developed a Storm monitoring bolt for validating pump tag values against high/low and high-high/low-low values from preloaded metadata.
- Designed and configured a Kafka cluster to accommodate a heavy throughput of 1 million messages per second. Used the Kafka producer 0.8.3 APIs to produce messages.
- Installed and configured Talend ETL on single- and multi-server environments.
- Troubleshot, debugged, and fixed Talend-specific issues while maintaining the performance of the ETL environment.
- Developed merge jobs in Python to extract and load data into a MySQL database.
- Created and modified several UNIX shell scripts according to the changing needs of the project and client requirements. Developed UNIX shell scripts to call Oracle PL/SQL packages and contributed to a standard framework.
Environment: Hortonworks Hadoop 2.0, EMP, PySpark, Cloud Infrastructure (Amazon AWS), Java, Python, HBase, Hadoop Ecosystem, Linux, Scala.
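The Storm monitoring bolt described in this role validates each pump tag reading against four preloaded thresholds. A minimal Python sketch of that validation logic; the threshold key names are illustrative, not taken from the actual metadata schema:

```python
def classify_tag(value, thresholds):
    """Validate a pump tag reading against preloaded threshold metadata,
    as the monitoring bolt described above would. The threshold keys
    ('low_low', 'low', 'high', 'high_high') are hypothetical names.
    Returns one of: 'LOW_LOW', 'LOW', 'OK', 'HIGH', 'HIGH_HIGH'."""
    if value <= thresholds["low_low"]:
        return "LOW_LOW"
    if value <= thresholds["low"]:
        return "LOW"
    if value >= thresholds["high_high"]:
        return "HIGH_HIGH"
    if value >= thresholds["high"]:
        return "HIGH"
    return "OK"
```

In the Storm topology, a bolt would run this check inside execute() for each incoming tuple and emit the classification downstream for alerting.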
Confidential
Linux Engineer
Responsibilities:
- Managed and administered all UNIX servers, including Linux operating systems, applying relevant patches and packages at regular maintenance periods using Red Hat Satellite server, YUM, and RPM tools.
- Planned and performed upgrades to Linux (RHEL 5.x, 6.x, SUSE 10/11, CentOS 5/6) operating systems and hardware maintenance such as changing memory modules and replacing disk drives.
- Handled NFS, automount, DNS, and LDAP related issues.
- Monitored CPU, memory, physical disk, hardware and software RAID, multipath, file systems, and network using Nagios 4.0 monitoring.
- Performed failover and integrity tests on new servers before rolling them out to production.
- Planned, scheduled, and implemented OS patches on Linux boxes as part of proactive maintenance.
- Identified, troubleshot, and resolved problems with OS build failures.
- Used Chef for managing application servers such as Apache, MySQL, and Tomcat.
- Installed, configured, and customized services such as Sendmail, Apache, and FTP servers to meet user needs and requirements.
- Performed kernel and database configuration optimization to limit I/O resource utilization on disks.
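The Nagios monitoring mentioned above is typically built from small check plugins that report status via the standard Nagios exit-code convention (0=OK, 1=WARNING, 2=CRITICAL). A minimal Python sketch of a disk-usage check; the default thresholds are illustrative, not from the actual deployment:

```python
import shutil

# Standard Nagios plugin status codes
OK, WARNING, CRITICAL = 0, 1, 2

def check_disk_usage(percent_used, warn=80.0, crit=90.0):
    """Map a disk-usage percentage to a Nagios plugin status code.
    The warn/crit defaults are hypothetical thresholds."""
    if percent_used >= crit:
        return CRITICAL
    if percent_used >= warn:
        return WARNING
    return OK

def root_fs_percent_used():
    """Current usage of '/' as a percentage, via the standard library."""
    usage = shutil.disk_usage("/")
    return usage.used / usage.total * 100
```

A real plugin would print a one-line status message (e.g. "DISK WARNING - 85.2% used") and exit with the returned code so Nagios can parse and alert on it.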