Hadoop Developer Resume
Atlanta, GA
SUMMARY
- Hadoop & Spark Developer and analyst with 6+ years of overall experience as a software developer in the design, development, deployment, and support of large-scale distributed systems.
- 6+ years of extensive experience as a Hadoop and Spark engineer and Big Data analyst.
- DataStax Cassandra and IBM Big Data University certified.
- Implemented various algorithms for analytics using Cassandra with Spark and Scala.
- Excellent understanding of Hadoop architecture and the underlying framework, including storage management.
- Experience in installing, configuring, and administering Hadoop clusters for major Hadoop distributions such as CDH4 and CDH5.
- Expertise in using various Hadoop ecosystem components such as MapReduce, Pig, Hive, ZooKeeper, HBase, Sqoop, Oozie, Flume, Drill, and Spark for data storage and analysis.
- Experience in developing custom UDFs for Pig and Hive to incorporate Python/Java methods and functionality into Pig Latin and HQL (HiveQL), and used UDFs from the Piggybank UDF repository.
- Experienced in running queries using Impala and used BI tools to run ad-hoc queries directly on Hadoop.
- Good experience with the Oozie framework and automating daily import jobs.
- Experienced in managing Hadoop clusters and services using Cloudera Manager.
- Experienced in troubleshooting errors in HBase Shell/API, Pig, Hive, and MapReduce.
- Highly experienced in importing and exporting data between HDFS and relational database management systems using Sqoop.
- Experienced in creating Vizboards for data visualization in Platfora for real-time dashboards on Hadoop.
- Experience in Object-Oriented Analysis and Design (OOAD) and development of software using UML methodology; good knowledge of J2EE and Core Java design patterns.
- Very good experience in the complete project life cycle: design, development, testing, and implementation of client-server and web applications.
- Experience in administering installation, configuration, troubleshooting, security, backup, performance monitoring, and fine-tuning of Red Hat Linux.
- Extensive experience working with Oracle, DB2, SQL Server, and MySQL databases.
- Hands-on experience with VPN, PuTTY, WinSCP, VNC Viewer, etc.
- Scripting to deploy monitors and checks and to automate critical system administration functions.
- Hands-on experience in application development using Java, RDBMS, and Linux shell scripting.
- Experience in Java, JSP, Servlets, EJB, WebLogic, WebSphere, Hibernate, Spring, JBoss, JDBC, RMI, JavaScript, Ajax, jQuery, XML, and HTML.
- Ability to adapt to evolving technology; strong sense of responsibility and accomplishment.
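The custom Python UDF work for Hive mentioned above can be illustrated with a minimal sketch of a Hive TRANSFORM-style streaming script. The column layout (a country code in the second field) and script name are hypothetical examples, not taken from the actual projects:

```python
import sys

def clean_row(line):
    """Normalize one tab-separated record: trim every field and lowercase
    the (hypothetical) country-code column in position 1."""
    fields = [f.strip() for f in line.rstrip("\n").split("\t")]
    if len(fields) > 1:
        fields[1] = fields[1].lower()
    return "\t".join(fields)

if __name__ == "__main__" and not sys.stdin.isatty():
    # Hive streams rows to stdin when the script is invoked via
    # SELECT TRANSFORM (...) USING 'python clean_row.py' ...
    for line in sys.stdin:
        print(clean_row(line))
```

Hive pipes rows to the script as tab-separated text and reads the transformed rows back from stdout, which is why the sketch works purely on stdin/stdout.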
TECHNICAL SKILLS
Programming Languages: Hadoop/Big Data stack, C#.NET, VB.NET, JavaScript, jQuery.
Databases: SQL Server 2012/2008/2005, MS Access
IDE: Visual Studio 2010/2008/2005, SharePoint Online, SharePoint 2013/2010, MOSS 2007
DevOps: Azure, GitHub, GitLab, GitBash, Jenkins, SonarQube, Docker, Kubernetes.
3rd Party Toolkits: K2 Black Pearl, Smart Forms, Vignette, ALUI (Aqua Logic User Interface.)
Reporting Tools: SQL Server Reporting Services
Hadoop Technologies: Hadoop 2.8.4+, Spark 2.0.0+, MapReduce, HDFS, Kafka 0.11.0.1+, Hive 2.1.0+, HBase 1.2.3+, Cassandra 3.11+, Sqoop 1.99.7+, Pig 0.17, Flume 1.6.0+, Keras 2.2.4
Databases: MySQL 5.x, SQL Server, Oracle 11g
PROFESSIONAL EXPERIENCE:
Confidential
Hadoop Developer
Responsibilities:
- Worked on analyzing the Hadoop stack and different big data analytic tools, including Pig, Hive, the HBase database, and Sqoop.
- Wrote multiple MapReduce programs for the extraction, transformation, and aggregation of data from more than 20 sources with multiple file formats, including XML, JSON, CSV, and other compressed file formats.
- Implemented Spark Core in Scala to process data in memory.
- Performed job functions using Spark APIs in Scala for real-time analysis and for fast querying purposes.
- Involved in creating Spark applications in Scala using cache, map, reduceByKey, etc. to process data.
- Created Oozie workflows for Hadoop-based jobs including Sqoop, Hive, and Pig.
- Created Hive external tables, loaded the data into the tables, and queried the data using HQL.
- Performed data validation on the ingested data using MapReduce by building a custom model to filter out all the invalid data and cleanse the data.
- Handled the importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Wrote HiveQL queries, configuring the number of reducers and mappers needed for the output.
- Transferred data between Pig scripts and Hive using HCatalog; transferred relational database data using Sqoop.
- Configured and maintained different topologies in the Storm cluster and deployed them on a regular basis.
- Responsible for building scalable distributed data solutions using Hadoop. Installed and configured Hive, Pig, Oozie, and Sqoop on the Hadoop cluster.
- Developed simple to complex MapReduce jobs in Java that were implemented using Hive and Pig.
- Ran many performance tests using the cassandra-stress tool in order to measure and improve the read and write performance of the cluster.
- Configured Kafka, Storm, and Hive to ingest and load real-time messages.
- Supported MapReduce programs running on the cluster; performed cluster monitoring, maintenance, and troubleshooting.
- Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin).
- Provided cluster coordination services through ZooKeeper. Installed and configured Hive and wrote Hive UDFs.
- Worked on the Analytics Infrastructure team to develop a stream filtering system on top of Apache Kafka and Storm.
- Worked on a POC for Spark and Scala parallel processing; streamed real-time data using Spark with Kafka.
Environment: Hadoop, Spark, HDFS, Hive, Pig, HBase, Big Data, Apache Storm, Oozie, Sqoop, Kafka, Flume, Zookeeper, MapReduce, Cassandra, Scala, Linux, NoSQL, MySQL Workbench, Java, Eclipse, Oracle 10g, SQL.
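The cache/map/reduceByKey work described in this role follows Spark's standard key-value aggregation pattern. A minimal pure-Python sketch of the reduceByKey semantics, illustrative only (the production code was Spark in Scala):

```python
def reduce_by_key(pairs, reduce_fn):
    """Group (key, value) pairs by key and fold the values with reduce_fn,
    mirroring what Spark's rdd.reduceByKey(reduce_fn) computes."""
    acc = {}
    for key, value in pairs:
        acc[key] = reduce_fn(acc[key], value) if key in acc else value
    return acc

def word_count(lines):
    # Map step: each word becomes a (word, 1) pair, as in the classic
    # rdd.flatMap(_.split(" ")).map(w => (w, 1)).reduceByKey(_ + _)
    pairs = [(w, 1) for line in lines for w in line.split()]
    return reduce_by_key(pairs, lambda a, b: a + b)
```

In Spark the same fold runs partition-locally before a shuffle, which is why reduceByKey is preferred over groupByKey for aggregations; the sketch above only captures the per-key semantics, not the distribution.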
Confidential
Hadoop Developer
Responsibilities:
- Implemented a generic ETL framework with high availability for bringing related data into Hadoop and Cassandra from various sources using Spark.
- Experienced in using Platfora, a data visualization tool specific to Hadoop; created various Lenses and Vizboards for real-time visualization from Hive tables.
- Queried and analyzed data from Cassandra for quick searching, sorting, and grouping through CQL.
- Implemented various data modeling techniques for Cassandra.
- Joined various tables in Cassandra using Spark and Scala and ran analytics on top of them.
- Participated in various upgrade and troubleshooting activities across the enterprise.
- Knowledge of performance troubleshooting and tuning of Hadoop clusters.
- Applied advanced Spark procedures like text analytics and processing using in-memory processing.
- Implemented Apache Drill on Hadoop to join data from SQL and NoSQL databases and store it in Hadoop.
- Created an architecture stack blueprint for data access with the NoSQL database Cassandra.
- Brought data from various sources into Hadoop and Cassandra using Kafka.
- Experienced in using Tidal Enterprise Scheduler and Oozie operational services for coordinating the cluster and scheduling workflows.
- Applied Spark Streaming for real-time data transformation.
- Created multiple dashboards in Tableau for multiple business needs.
- Installed and configured Hive, wrote Hive UDFs, and used Piggybank, a repository of UDFs for Pig Latin.
- Implemented partitioning, dynamic partitions, and buckets in Hive for efficient data access.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team using Tableau.
- Implemented Composite server for data virtualization needs and created multiple views for restricted data access using a REST API.
- Devised and led the implementation of a next-generation architecture for more efficient data ingestion and processing.
- Created and implemented various shell scripts for automating jobs.
- Implemented Apache Sentry to restrict access to the Hive tables at a group level.
- Employed the Avro format for all data ingestion for faster operation and less space utilization.
- Experienced in managing and reviewing Hadoop log files.
- Worked in an Agile environment and used the Rally tool to maintain user stories and tasks.
- Worked with enterprise data support teams to install Hadoop updates, patches, and version upgrades as required, and fixed problems that arose after the upgrades.
- Implemented test scripts to support test-driven development and continuous integration.
- Used Spark for parallel data processing and better performance.
Environment: MapR 5.0.1, MapReduce, HDFS, Hive, Pig, Impala, Cassandra 5.04, Spark, Scala, Solr, Java, SQL, Tableau, ZooKeeper, Sqoop, Teradata, CentOS, Pentaho.
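The Hive bucketing used in this role (CLUSTERED BY ... INTO N BUCKETS) assigns each row to one of N bucket files by hashing the bucketing column and taking the result modulo the bucket count. A rough Python illustration of that assignment; CRC32 stands in for Hive's own hash function, so it is not bit-compatible with Hive:

```python
import zlib

def bucket_for(key, num_buckets):
    """Assign a key to a bucket the way Hive's bucketing does conceptually:
    hash the bucketing column value, then take it modulo the bucket count.
    CRC32 is a deterministic stand-in for Hive's hash (illustrative only)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets

def bucketize(rows, key_fn, num_buckets):
    """Split rows into num_buckets lists, like bucketed files on HDFS."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[bucket_for(key_fn(row), num_buckets)].append(row)
    return buckets
```

Because the bucket of a row is a pure function of its key, joins on the bucketing column can match buckets pairwise without a full shuffle, which is what makes bucketed tables efficient for joins and sampling.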
Confidential, Atlanta, GA
Hadoop Developer
Responsibilities:
- Handled importing of data from various data sources and performed data transformations using HAWQ and MapReduce.
- Analyzed the web log data using HiveQL.
- Developed Hive queries on data logs to perform trend analysis of user behavior across various online modules.
- Developed Pig UDFs to pre-process the data for analysis.
- Involved in the setup and deployment of the Hadoop cluster.
- Developed MapReduce programs for some refined queries on big data.
- Involved in loading data from the UNIX file system to HDFS.
- Loaded data into HDFS and extracted data from MySQL into HDFS using Sqoop.
- Exported the analyzed data to relational databases using Sqoop and generated reports for the BI team.
- Managed and scheduled jobs on a Hadoop cluster using Oozie.
- Along with the infrastructure team, involved in the design and development of a Kafka- and Storm-based data pipeline.
- Used a test-driven approach for developing the application and implemented unit tests using the Python unittest framework.
- Developed a Storm monitoring bolt for validating pump tag values against high/low and high-high/low-low values from preloaded metadata.
- Designed and configured a Kafka cluster to accommodate a heavy throughput of 1 million messages per second. Used the Kafka producer 0.8.3 APIs to produce messages.
- Installed and configured Talend ETL on single- and multi-server environments.
- Troubleshot, debugged, and fixed Talend-specific issues while maintaining the performance of the ETL environment.
- Developed merge jobs in Python to extract and load data into a MySQL database.
- Created and modified several UNIX shell scripts according to the changing needs of the project and client requirements. Developed UNIX shell scripts to call Oracle PL/SQL packages and contributed to a standard framework.
Environment: Hortonworks Hadoop 2.0, EMP, PySpark, Cloud Infrastructure (Amazon AWS), Java, Python, HBase, Hadoop Ecosystem, Linux, Scala.
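The Storm monitoring bolt described in this role validates each pump tag reading against four preloaded thresholds. A minimal Python sketch of that validation logic; the threshold key names are illustrative, not taken from the actual metadata schema:

```python
def classify_tag(value, thresholds):
    """Validate a pump tag reading against preloaded threshold metadata,
    as the monitoring bolt described above would. The threshold keys
    ('low_low', 'low', 'high', 'high_high') are hypothetical names.
    Returns one of: 'LOW_LOW', 'LOW', 'OK', 'HIGH', 'HIGH_HIGH'."""
    if value <= thresholds["low_low"]:
        return "LOW_LOW"
    if value <= thresholds["low"]:
        return "LOW"
    if value >= thresholds["high_high"]:
        return "HIGH_HIGH"
    if value >= thresholds["high"]:
        return "HIGH"
    return "OK"
```

In the Storm topology, a bolt would run this check inside execute() for each incoming tuple and emit the classification downstream for alerting.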
Confidential
Linux Engineer
Responsibilities:
- Managed and administered all UNIX servers, including Linux operating systems, applying relevant patches and packages at regular maintenance periods using Red Hat Satellite server, YUM, and RPM tools.
- Planned and performed upgrades to Linux (RHEL 5.x, 6.x, SUSE 10/11, CentOS 5/6) operating systems and hardware maintenance such as changing memory modules and replacing disk drives.
- Handled NFS, automount, DNS, and LDAP related issues.
- Monitored CPU, memory, physical disk, hardware and software RAID, multipath, file systems, and network using Nagios 4.0 monitoring.
- Performed failover and integrity tests on new servers before rolling them out to production.
- Planned, scheduled, and implemented OS patches on Linux boxes as part of proactive maintenance.
- Identified, troubleshot, and resolved problems with OS build failures.
- Used Chef for managing application servers such as Apache, MySQL, and Tomcat.
- Installed, configured, and customized services such as Sendmail, Apache, and FTP servers to meet user needs and requirements.
- Performed kernel and database configuration optimization to limit I/O resource utilization on disks.
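The Nagios monitoring mentioned above is typically built from small check plugins that report status via the standard Nagios exit-code convention (0=OK, 1=WARNING, 2=CRITICAL). A minimal Python sketch of a disk-usage check; the default thresholds are illustrative, not from the actual deployment:

```python
import shutil

# Standard Nagios plugin status codes
OK, WARNING, CRITICAL = 0, 1, 2

def check_disk_usage(percent_used, warn=80.0, crit=90.0):
    """Map a disk-usage percentage to a Nagios plugin status code.
    The warn/crit defaults are hypothetical thresholds."""
    if percent_used >= crit:
        return CRITICAL
    if percent_used >= warn:
        return WARNING
    return OK

def root_fs_percent_used():
    """Current usage of '/' as a percentage, via the standard library."""
    usage = shutil.disk_usage("/")
    return usage.used / usage.total * 100
```

A real plugin would print a one-line status message (e.g. "DISK WARNING - 85.2% used") and exit with the returned code so Nagios can parse and alert on it.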