
Hadoop Spark Developer Resume


Austin, Texas

SUMMARY

  • Over 5 years of IT industry and software development experience, including 4+ years in Hadoop development.
  • Experience in developing MapReduce programs using Apache Hadoop to analyze big data according to requirements.
  • Experienced with major Hadoop ecosystem projects such as Pig, Hive, and HBase, and in monitoring them with Cloudera Manager.
  • Extensive experience in developing Pig Latin Scripts and using Hive Query Language for data analytics.
  • Hands-on experience working with NoSQL databases including HBase and Cassandra and their integration with the Hadoop cluster.
  • Experience in implementing Spark applications in Scala using higher-order functions for both batch and interactive analysis requirements (a minimal sketch follows this list).
  • Good working experience using Sqoop to import data into HDFS from RDBMS and vice-versa.
  • Good knowledge in using job scheduling and monitoring tools like Oozie and Zookeeper.
  • Experience in Hadoop administration activities such as installation and configuration of clusters using Apache, Cloudera, and AWS.
  • Experienced in designing, building, and deploying a multitude of applications utilizing much of the AWS stack (including EC2, Route 53, S3, RDS, DynamoDB, SQS, IAM, and EMR), focusing on high availability, fault tolerance, and auto-scaling.
  • Experienced in MVC (Model View Controller) architecture and various J2EE design patterns like singleton and factory design patterns.
  • Extensive experience in loading and analyzing large datasets with the Hadoop framework (MapReduce, HDFS, Pig, Hive, Flume, Sqoop, Spark, Impala) and NoSQL databases like MongoDB, HBase, and Cassandra.
  • Solid understanding of Hadoop MRv1 and MRv2 (YARN) architecture.
  • Hands-on experience in configuring and administering the Hadoop Cluster using major Hadoop Distributions
  • Hands-on experience in solving software design issues by applying design patterns including the Singleton Pattern, Business Delegator Pattern, Controller Pattern, MVC Pattern, Factory Pattern, Abstract Factory Pattern, DAO Pattern, and Template Pattern.
  • Good experience with design, coding, debugging, reporting, and data analysis using Python, leveraging Python libraries to speed up development.
  • Good working experience using different Spring modules such as the Spring Core Container, Application Context, MVC Framework, and ORM modules in web applications.
  • Used jQuery to select HTML elements, to manipulate HTML elements and to implement AJAX in Web applications. Used available plug-ins for extension of jQuery functionality.
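
The following is a minimal sketch of the higher-order-function style of Spark batch job referenced above. The input path, output path, and the "userId,bytesUsed" record layout are assumptions for illustration, not details from an actual project.

```scala
// Hedged sketch of a Spark batch job built from higher-order functions
// (map, filter, reduceByKey). Paths and the record layout are assumptions.
import org.apache.spark.sql.SparkSession

object BatchUsageAggregation {
  def main(args: Array[String]): Unit = {
    // Master/deploy settings are supplied by spark-submit on the cluster.
    val spark = SparkSession.builder().appName("batch-usage-aggregation").getOrCreate()

    // Hypothetical input: one "userId,bytesUsed" record per line.
    val totals = spark.sparkContext
      .textFile("hdfs:///data/usage/*.txt")
      .map(_.split(","))
      .filter(_.length == 2)                         // drop malformed records
      .map(fields => (fields(0), fields(1).toLong))  // (userId, bytes)
      .reduceByKey(_ + _)                            // total bytes per user

    totals.saveAsTextFile("hdfs:///data/usage-per-user")
    spark.stop()
  }
}
```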

PROFESSIONAL EXPERIENCE

HADOOP SPARK DEVELOPER

Confidential, Austin, Texas

Responsibilities:

  • Worked directly with the Big Data Architecture Team which created the foundation of this Enterprise Analytics initiative in a Hadoop-based Data Lake.
  • Performed source data transformations using Hive.
  • Supported an infrastructure environment comprising RHEL and Solaris.
  • Involved in developing a MapReduce framework that filters out bad and unnecessary records.
  • Developed Spark scripts using the Scala shell as per requirements.
  • Used Kafka to transfer data from different data systems to HDFS.
  • Created Spark jobs to identify trends in data usage by users.
  • Used the Spark Cassandra Connector to load data to and from Cassandra.
  • Responsible for generating actionable insights from complex data to drive real business results for various application teams.
  • Designed the column families in Cassandra.
  • Ingested data from RDBMS, performed data transformations, and then exported the transformed data to Cassandra as per business requirements.
  • Developed Spark code using Scala and Spark SQL for faster processing and testing.
  • Experience with NoSQL column-oriented databases like Cassandra and their integration with the Hadoop cluster.
  • Collected and aggregated large amounts of log data using Flume on EC2 and staged the data in HDFS for further analysis.
  • Used the Spark API over Hadoop YARN as the execution engine for data analytics with Hive (see the sketch after this list).
  • Exported the analyzed data to the relational databases using Sqoop to further visualize and generate reports for the BI team.
  • Good experience with Talend Open Studio for designing ETL jobs for data processing.
  • Experience in processing large volumes of data and in parallel execution of processes using Talend functionality.
  • Worked on different file formats such as text files and Avro.
  • Experience in installation, configuration, support, and monitoring of Hadoop clusters using Apache, Cloudera Manager, and the AWS service console.
  • Created various kinds of reports using Power BI and Tableau based on the client's needs.
  • Worked extensively on Agile methodology projects.
  • Experience designing and executing time-driven and data-driven Oozie workflows.
  • Set up Kerberos principals and tested HDFS, Hive, Pig, and MapReduce access for new users.
  • Experienced in working with the Spark ecosystem using Scala and Hive queries on different data formats such as text files and Parquet.
  • Used the Log4j framework for logging debug, info, and error data.
  • Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration.
  • Experience in importing data from S3 to Hive using Sqoop and Kafka.
  • Developed Hive scripts in HiveQL to de-normalize and aggregate the data.
  • Implemented MapReduce counters to gather metrics on good and bad records.
  • Work experience with cloud infrastructure such as Amazon Web Services (AWS).
  • Developed customized UDFs in Java to extend Hive and Pig functionality.
  • Worked with the Scrum team in delivering agreed user stories on time for every sprint.
  • Worked on different file formats (ORC, Parquet, Avro) and different compression codecs (GZIP, Snappy, LZO).
  • Created applications using Kafka that monitor consumer lag within Apache Kafka clusters; used in production by multiple companies.
  • Used Spark Streaming APIs to perform transformations and actions on the fly to build the common learner data model, which gets data from Kafka in near real time and persists it into Cassandra (sketched after this list).
  • Experience in using Apache Kafka for collecting, aggregating, and moving large amounts of data from application servers.
  • Used the Hibernate ORM framework with the Spring framework for data persistence and transaction management.
  • Performed performance analysis of Spark streaming and batch jobs using Spark tuning parameters.
  • Worked on creating real-time data streaming solutions using Apache Spark/Spark Streaming and Kafka.
  • Worked along with the Hadoop Operations team in Hadoop cluster planning, installation, maintenance, monitoring and upgrades.
  • Used File System check (FSCK) to check the health of files in HDFS.
  • Worked inAgiledevelopment environment in sprint cycles of two weeks by dividing and organizing tasks. Participated in daily scrum and other design related meetings.
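
The Kafka-to-Cassandra streaming pipeline sketched below is a hedged illustration of the Spark Streaming work described above, using the spark-streaming-kafka-0-10 and spark-cassandra-connector APIs. The broker, topic, keyspace, table, and field names are assumptions rather than the actual learner data model.

```scala
// Hedged sketch: consume records from Kafka with Spark Streaming and persist
// each micro-batch to Cassandra. Broker, topic, keyspace, table, and the
// "learnerId,course,score" record layout are hypothetical.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
import org.apache.kafka.common.serialization.StringDeserializer
import com.datastax.spark.connector._

object LearnerEventStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("learner-event-stream")
      .set("spark.cassandra.connection.host", "cassandra-host") // assumed host
    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "learner-model")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("learner-events"), kafkaParams))

    // Parse the incoming records and write each micro-batch to Cassandra.
    stream.map(_.value.split(","))
      .filter(_.length == 3)
      .map(f => (f(0), f(1), f(2).toDouble))
      .foreachRDD(_.saveToCassandra("analytics", "learner_scores",
        SomeColumns("learner_id", "course", "score")))

    ssc.start()
    ssc.awaitTermination()
  }
}
```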
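
As noted above, Hive queries were run with Spark on YARN as the execution engine; the snippet below is one plausible, simplified shape for such a job. The database, table, and column names are hypothetical.

```scala
// Hedged sketch of running a Hive aggregation through the Spark SQL engine.
import org.apache.spark.sql.SparkSession

object HiveAggregation {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport lets Spark SQL resolve tables in the Hive metastore.
    val spark = SparkSession.builder()
      .appName("hive-aggregation")
      .enableHiveSupport()
      .getOrCreate()

    // De-normalize and aggregate into a reporting table (hypothetical schema).
    spark.sql(
      """
        |INSERT OVERWRITE TABLE reporting.daily_usage
        |SELECT u.user_id, u.region, SUM(e.bytes_used) AS total_bytes
        |FROM raw.events e
        |JOIN raw.users u ON e.user_id = u.user_id
        |GROUP BY u.user_id, u.region
      """.stripMargin)

    spark.stop()
  }
}
```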

Environment: Hadoop, Java, MapReduce, HDFS, AWS, Amazon S3, Hive, Linux, XML, Eclipse, Cloudera, CDH4/5 Distribution, Spark, Scala, HBase, MongoDB, Python, GitHub, Sqoop, Oozie.

HADOOP DEVELOPER

Confidential, New York

Responsibilities:

  • Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, Hive and Sqoop.
  • Created multi-node Hadoop and Spark clusters on AWS instances to generate terabytes of data, and stored it in HDFS on AWS.
  • Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
  • Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved it in Parquet format in HDFS.
  • Developed data pipeline using Flume, Sqoop, Pig and Java map reduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Upgraded the Hadoop cluster from CDH4.7 to CDH5.2 and worked on installing cluster, commissioning & decommissioning of Data Nodes, Name Node recovery, capacity planning, and slots configuration.
  • Developed Spark scripts to import large files from Amazon S3 buckets and to load data from different sources such as HDFS/HBase into Spark RDDs (a sketch follows this list).
  • Involved in migration of ETL processes from Oracle to Hive to test the easy data manipulation and worked on importing and exporting data from Oracle and DB2 into HDFS and HIVE using Sqoop.
  • Worked on installing Cloudera Manager and CDH, installed the JCE policy file, created a Kerberos principal for the Cloudera Manager Server, and enabled Kerberos using the wizard.
  • Developed Spark jobs using Scala and Python on top of Yarn/MRv2 for interactive and Batch Analysis.
  • Monitored the cluster for performance, networking, and data integrity issues, and was responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Created 25+ Linux Bash scripts for users, groups, data distribution, capacity planning, and system monitoring.
  • Installed the OS and administered the Hadoop stack with the CDH5 (with YARN) Cloudera distribution, including configuration management, monitoring, debugging, and performance tuning.
  • Supported MapReduce programs and distributed applications running on the Hadoop cluster, and scripted Hadoop package installation and configuration to support fully automated deployments.
  • Migrated existing on-premises applications to AWS, used AWS services such as EC2 and S3 for processing and storage of large data sets, worked with Elastic MapReduce, and set up Hadoop environments on AWS EC2 instances.
  • Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
  • Performed maintenance, monitoring, deployments, and upgrades across the infrastructure supporting all of our Hadoop clusters.
  • Created Hive external tables, loaded data into them, and queried the data using HQL; worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
  • Worked with and learned a great deal from AWS cloud services such as EC2, S3, EBS, RDS, and VPC.
  • Monitored the Hadoop cluster and maintained it by adding and removing nodes, using tools such as Nagios, Ganglia, and Cloudera Manager.
  • Worked on Hive to expose data for further analysis and to transform files from different analytical formats to text files.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala, initially done using Python (PySpark).
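
A minimal sketch of the kind of S3 import script mentioned above is shown below. The bucket, prefix, output path, and CSV layout are assumptions, and credentials are expected to come from the cluster's Hadoop/AWS configuration rather than the job itself.

```scala
// Hedged sketch: read raw CSV files from an S3 bucket and land them as
// Parquet in HDFS. Bucket name, paths, and schema handling are illustrative.
import org.apache.spark.sql.SparkSession

object S3ToParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("s3-to-parquet").getOrCreate()

    // Hypothetical bucket/prefix; s3a credentials come from core-site.xml / IAM roles.
    val raw = spark.read
      .option("header", "true")
      .csv("s3a://example-bucket/incoming/transactions/*.csv")

    raw.write
      .mode("overwrite")
      .parquet("hdfs:///data/transactions/parquet")

    spark.stop()
  }
}
```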

Environment: Hadoop, MapReduce, Hive, Pig, Sqoop, Python, Spark, Spark Streaming, Spark SQL, AWS EMR, AWS S3, AWS Redshift, Scala, Java, Oozie, Flume, HBase, Nagios, Ganglia, Hue.
