
Hadoop Spark Developer Resume


New York, NY

SUMMARY

  • Over 7 years of overall experience in the IT industry and software development, with 4+ years of experience in Hadoop development.
  • Experience in developing MapReduce programs using Apache Hadoop to analyze big data as per requirements.
  • Experienced in major Hadoop ecosystem projects such as Pig, Hive, and HBase, and in monitoring them with Cloudera Manager.
  • Extensive experience in developing Pig Latin Scripts and using Hive Query Language for data analytics.
  • Hands-on experience working with NoSQL databases including HBase and Cassandra and their integration with the Hadoop cluster.
  • Experience in implementing Spark applications in Scala using higher-order functions for both batch and interactive analysis requirements (a Spark/Scala sketch follows this summary).
  • Good working experience using Sqoop to import data into HDFS from RDBMS and vice-versa.
  • Good knowledge in using workflow scheduling and coordination tools like Oozie and ZooKeeper.
  • Experience in Hadoop administration activities such as installation and configuration of clusters using Apache, Cloudera, and AWS.
  • Experienced in designing, building, and deploying a multitude of applications utilizing much of the AWS stack (including EC2, Route 53, S3, RDS, DynamoDB, SQS, IAM, and EMR), focusing on high availability, fault tolerance, and auto-scaling.
  • Experienced in MVC (Model View Controller) architecture and various J2EE design patterns like singleton and factory design patterns.
  • Extensive experience in loading and analyzing large datasets with the Hadoop framework (MapReduce, HDFS, Pig, Hive, Flume, Sqoop, Spark, Impala) and NoSQL databases like MongoDB, HBase, and Cassandra.
  • Solid understanding of Hadoop MRv1 and Hadoop MRv2 (YARN) architectures.
  • Hands-on experience in configuring and administering Hadoop clusters using major Hadoop distributions.
  • Hands-on experience in solving software design issues by applying design patterns including the Singleton, Business Delegate, Controller, MVC, Factory, Abstract Factory, DAO, and Template patterns.
  • Good experience with design, coding, debugging, reporting, and data analysis utilizing Python, and with using Python libraries to speed up development.
  • Good working experience in using different Spring modules like the Spring Core Container Module, Spring Application Context Module, Spring MVC Framework Module, and Spring ORM Module in web applications.
  • Used jQuery to select and manipulate HTML elements and to implement AJAX in web applications, and used available plug-ins to extend jQuery functionality.
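
A minimal Spark/Scala sketch of the kind of higher-order-function-driven batch analysis referred to in this summary. The HDFS path, record layout, and field meanings are hypothetical placeholders, not details from an actual engagement.

    import org.apache.spark.sql.SparkSession

    object BatchAnalysisSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("batch-analysis-sketch")
          .getOrCreate()

        // Hypothetical input: one "customerId,amount" record per line
        val lines = spark.sparkContext.textFile("hdfs:///data/transactions/*.csv")

        // Higher-order functions (map, filter, reduceByKey) drive the batch analysis
        val totalsPerCustomer = lines
          .map(_.split(","))                  // parse each record
          .filter(_.length == 2)              // drop malformed rows
          .map(f => (f(0), f(1).toDouble))    // (customerId, amount)
          .reduceByKey(_ + _)                 // total spend per customer

        // An interactive session would inspect results the same way
        totalsPerCustomer.take(10).foreach(println)
        spark.stop()
      }
    }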

PROFESSIONAL EXPERIENCE

HADOOP SPARK DEVELOPER

Confidential, New York, NY

Responsibilities:

  • Worked directly with the Big Data Architecture Team which created the foundation of this Enterprise Analytics initiative in a Hadoop-based Data Lake.
  • Created multi-node Hadoop and Spark clusters on AWS instances to generate terabytes of data and stored it in HDFS on AWS.
  • Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
  • Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved the data in Parquet format in HDFS (a streaming sketch follows this list).
  • Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Upgraded the Hadoop cluster from CDH4.7 to CDH5.2 and worked on installing cluster, commissioning & decommissioning of Data Nodes, Name Node recovery, capacity planning, and slots configuration.
  • Developed Spark scripts to import large files from Amazon S3 buckets and imported the data from different sources like HDFS/HBase into Spark RDDs.
  • Followed Agile & Scrum principles in developing the project.
  • Developed Spark API to import data into HDFS from DB2 and created Hive tables.
  • Used Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Imported large data sets from DB2 to Hive tables using Sqoop.
  • Used Impala for querying HDFS data to achieve better performance.
  • Implemented Apache Pig scripts to load data from and store data into Hive.
  • Created partitioned and bucketed Hive tables in Parquet file format with Snappy compression and then loaded data into the Parquet Hive tables from Avro Hive tables.
  • Involved in running all the Hive scripts through Hive, Impala, and Hive on Spark, and some through Spark SQL.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
  • Developed Spark scripts by using Scala shell commands as per the requirements.
  • Worked with and learned a great deal from Amazon Web Services (AWS) cloud services like EC2, S3, EBS, RDS, and VPC.
  • Responsible for implementing an ETL process through Kafka-Spark-HBase integration as per the requirements of a customer-facing API.
  • Used Spark SQL to load JSON data, create a SchemaRDD, and load it into Hive tables, and handled structured data using Spark SQL (a Spark SQL sketch follows this list).
  • Worked on batch processing and real-time data processing with Spark Streaming using the Lambda architecture.
  • Developed Spark code in Scala and the Spark SQL environment for faster testing and processing of data, loaded the data into Spark RDDs, and performed in-memory computation to generate the output response with lower memory usage.
  • Supported MapReduce programs and distributed applications running on the Hadoop cluster, and scripted Hadoop package installation and configuration to support fully automated deployments.
  • Migrated an existing on-premises application to AWS, used AWS services like EC2 and S3 for processing and storage of large data sets, worked with Elastic MapReduce, and set up the Hadoop environment on AWS EC2 instances.
  • Worked with the systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
  • Performed maintenance, monitoring, deployments, and upgrades across the infrastructure that supports all our Hadoop clusters.
  • Created Hive external tables, loaded the data into the tables, and queried data using HQL; worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
  • Monitored the Hadoop cluster using tools like Nagios, Ganglia, and Cloudera Manager, and maintained the cluster by adding and removing nodes with the same tools.
  • Worked on Hive for exposing data for further analysis and for transforming files from different analytical formats to text files.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala, initially done using Python (PySpark); a migration sketch follows this list.
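
A minimal sketch of the Kafka and Spark Streaming ingestion described above: records are pulled from a topic, each micro-batch RDD is converted to a DataFrame, and the data is appended to HDFS in Parquet format. The broker address, topic, group id, and paths are hypothetical, and the sketch assumes the spark-streaming-kafka-0-10 integration rather than the project's exact versions.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    object KafkaToParquetSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("kafka-to-parquet-sketch").getOrCreate()
        val ssc = new StreamingContext(spark.sparkContext, Seconds(30))
        import spark.implicits._

        // Hypothetical consumer configuration
        val kafkaParams = Map[String, Object](
          "bootstrap.servers"  -> "broker1:9092",
          "key.deserializer"   -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id"           -> "feed-consumer",
          "auto.offset.reset"  -> "latest"
        )

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("customer-feed"), kafkaParams))

        // Convert each micro-batch RDD to a DataFrame and append it as Parquet in HDFS
        stream.map(_.value()).foreachRDD { rdd =>
          if (!rdd.isEmpty()) {
            rdd.toDF("payload").write.mode("append").parquet("hdfs:///data/feed/parquet")
          }
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }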
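
A sketch of the Spark SQL work described above: reading JSON data, querying it as a structured view, and persisting the result as a Snappy-compressed, partitioned Parquet Hive table. The paths, database/table names, and columns (event_type, event_ts) are placeholders; the actual Avro-to-Parquet loads would have used the project's real schemas.

    import org.apache.spark.sql.SparkSession

    object JsonToHiveSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("json-to-hive-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Hypothetical input path; Spark infers the schema from the JSON documents
        val events = spark.read.json("hdfs:///data/raw/events/*.json")

        // Expose the structured data to Spark SQL and aggregate it
        events.createOrReplaceTempView("events_raw")
        val daily = spark.sql(
          """SELECT event_type, to_date(event_ts) AS event_date, COUNT(*) AS cnt
            |FROM events_raw
            |GROUP BY event_type, to_date(event_ts)""".stripMargin)

        // Persist as a Snappy-compressed Parquet Hive table, partitioned by date
        daily.write
          .mode("overwrite")
          .format("parquet")
          .option("compression", "snappy")
          .partitionBy("event_date")
          .saveAsTable("analytics.daily_event_counts")

        spark.stop()
      }
    }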
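
A generic illustration of the MapReduce-to-Spark migration mentioned above, using the classic word-count shape rather than the project's actual jobs: the mapper and reducer collapse into a short chain of Spark transformations in Scala.

    import org.apache.spark.sql.SparkSession

    object MapReduceToSparkSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("mr-to-spark-sketch").getOrCreate()
        val sc = spark.sparkContext

        val counts = sc.textFile("hdfs:///data/input/*.txt")   // hypothetical input path
          .flatMap(_.split("\\s+"))     // mapper: emit one token per word
          .map(word => (word, 1L))      // mapper: emit (word, 1)
          .reduceByKey(_ + _)           // reducer: sum the counts per word

        counts.saveAsTextFile("hdfs:///data/output/wordcounts")
        spark.stop()
      }
    }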

Environment: Hadoop, Java, MapReduce, HDFS, AWS, Amazon S3, Hive, Linux, XML, Eclipse, Cloudera, CDH4/5 Distribution, Spark, Scala, HBase, MongoDB, Python, GitHub, Sqoop, Oozie.

HADOOP DEVELOPER

Confidential, New York

Responsibilities:

  • Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, Hive, and Sqoop.
  • Involved in migrating ETL processes from Oracle to Hive to test easier data manipulation, and worked on importing and exporting data between Oracle/DB2 and HDFS/Hive using Sqoop.
  • Worked on installing Cloudera Manager and CDH, installed the JCE policy files, created a Kerberos principal for the Cloudera Manager Server, and enabled Kerberos using the wizard.
  • Developed Spark jobs using Scala and Python on top of YARN/MRv2 for interactive and batch analysis (a Spark-on-YARN sketch follows this list).
  • Monitored the cluster for performance, networking, and data integrity issues, and was responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Created 25+ Linux Bash scripts for users, groups, data distribution, capacity planning, and system monitoring.
  • Installed the OS and administered the Hadoop stack with the CDH5 (YARN) Cloudera distribution, including configuration management, monitoring, debugging, and performance tuning.
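
A minimal sketch of a Spark batch-analysis job of the kind run on YARN/MRv2 as noted above; the same DataFrame operations can also be explored interactively from spark-shell. The database, table, and column names are placeholders rather than the project's actual schema.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    // Submitted to the cluster with something like:
    //   spark-submit --master yarn --deploy-mode cluster --class BatchOnYarnSketch app.jar
    object BatchOnYarnSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("batch-on-yarn-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Read an existing Hive table and compute a simple per-day aggregate
        val txns = spark.table("warehouse.transactions")
        val summary = txns
          .groupBy(col("txn_date"))
          .agg(count(lit(1)).as("txn_count"), sum("amount").as("total_amount"))

        // Persist the result back to Hive for downstream reporting
        summary.write.mode("overwrite").saveAsTable("warehouse.daily_txn_summary")
        spark.stop()
      }
    }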

Environment: Hadoop, MapReduce, Hive, Pig, Sqoop, Python, Spark, Spark Streaming, Spark SQL, AWS EMR, AWS S3, AWS Redshift, Scala, Java, Oozie, Flume, HBase, Nagios, Ganglia, Hue.
