
Hadoop / Spark Developer Resume


Charlotte, NC

SUMMARY

  • 5 years of professional IT experience, including 3+ years of Big Data ecosystem experience in ingestion, querying, processing and analysis of big data.
  • Experience using Hadoop ecosystem components such as MapReduce, HDFS, HBase, ZooKeeper, Hive, Sqoop, Pig, Flume and Spark on the Cloudera distribution.
  • Knowledge and experience in Spark using Python and Scala.
  • Knowledge of the big data database HBase and the NoSQL databases MongoDB and Cassandra.
  • Experience includes Requirements Gathering, Design, Development, Integration, Documentation, Testing and Build.
  • Experience building Spark applications in Scala for easy Hadoop transitions.
  • Extending Hive and Pig core functionality by writing custom UDFs.
  • Solid knowledge of Hadoop architecture and core components such as NameNode, DataNodes, JobTracker, TaskTrackers, Oozie, Scribe, Hue, Flume, Kafka and HBase.
  • Experience in provisioning and managing multi-tenant Hadoop clusters on a public cloud environment - Amazon Web Services (AWS) - and on private cloud infrastructure - the OpenStack cloud platform.
  • Hands-on experience with Amazon Web Services (AWS) cloud services such as EC2, S3, EBS, RDS and VPC.
  • Extensively worked on development and optimization of MapReduce programs, Pig scripts and Hive queries to create structured data for data mining.
  • Worked with both Scala and Java and created frameworks for processing data pipelines through Spark.
  • Implemented batch-processing solutions for large volumes of unstructured data using the Hadoop MapReduce framework.
  • Strong experience with partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance (a short sketch follows this list).
  • Knowledge of ETL methods for data extraction, transformation and loading in corporate-wide ETL solutions and data warehouse tools for data analysis.
  • Experience in database design, data analysis and SQL programming.
  • Experience in extending Hive and Pig core functionality with custom user-defined functions (UDFs).
  • Experience in writing custom classes, functions, procedures, problem management, library controls and reusable components.
  • Working knowledge of Oozie, a workflow scheduler system used to manage Pig, Hive and Sqoop jobs.
  • Experienced in integrating Java-based web applications in a UNIX environment.
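
As a concrete illustration of the Hive table design mentioned above, the following minimal Java sketch creates a partitioned, bucketed external table and a partitioned managed table through the HiveServer2 JDBC driver. The connection URL, table names, columns and HDFS location are illustrative placeholders, not details from a specific project.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HiveTableSetup {
    public static void main(String[] args) throws Exception {
        // Hive JDBC driver must be on the classpath; host/port and database are illustrative
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "hive", "");
             Statement stmt = conn.createStatement()) {

            // External table: dropping it leaves the underlying HDFS data in place
            stmt.execute(
                "CREATE EXTERNAL TABLE IF NOT EXISTS sales_ext ("
              + "  order_id BIGINT, amount DOUBLE) "
              + "PARTITIONED BY (order_date STRING) "
              + "CLUSTERED BY (order_id) INTO 32 BUCKETS "
              + "STORED AS PARQUET "
              + "LOCATION '/data/warehouse/sales_ext'");

            // Managed table: Hive owns both the metadata and the data
            stmt.execute(
                "CREATE TABLE IF NOT EXISTS sales_managed ("
              + "  order_id BIGINT, amount DOUBLE) "
              + "PARTITIONED BY (order_date STRING) "
              + "STORED AS ORC");
        }
    }
}
```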

TECHNICAL SKILLS

Hadoop/Big Data: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Spark (Python and Scala), Kafka

NoSQL Databases: HBase, Cassandra, MongoDB

Languages: C, C++, Java, J2EE, PL/SQL, Pig Latin, HiveQL, Unix shell scripts, R

Java/J2EE Technologies: Applets, Swing, JDBC, JSON, JSTL, JMS, JavaScript, JSP, Servlets, EJB, JSF, jQuery

Frameworks: MVC, Struts, Spring, Hibernate

ETL: IBM WebSphere / Oracle

Operating Systems: Sun Solaris, UNIX, Red Hat Linux, Ubuntu Linux and Windows XP/Vista/7/8

Web Technologies: HTML, DHTML, XML, WSDL, SOAP

Web/Application Servers: Apache Tomcat, WebLogic, JBoss

Databases: Oracle, SQL Server, MySQL

Tools and IDEs: Eclipse, NetBeans, JDeveloper, DB Visualizer

Network Protocols: TCP/IP, UDP, HTTP, DNS

PROFESSIONAL EXPERIENCE

Confidential, Charlotte, NC

Hadoop / Spark Developer

Responsibilities:

  • Worked with Hadoop ecosystem components such as HBase, Sqoop, ZooKeeper, Oozie, Hive and Pig on the Cloudera Hadoop distribution.
  • Developed Pig and Hive UDFs in Java to extend Pig and Hive functionality, and wrote Pig scripts for sorting, joining, filtering and grouping the data.
  • Developed Spark programs for the application to process data faster than standard MapReduce programs.
  • Developed Spark programs using Scala, created Spark SQL queries and built Oozie workflows for Spark jobs.
  • Developed Oozie workflows with Sqoop actions to migrate data from relational databases such as Oracle and Teradata to HDFS.
  • Used Hadoop FS actions to move data from upstream locations to local data locations.
  • Wrote extensive Hive queries to transform the data for use by downstream models.
  • Developed MapReduce programs as part of predictive analytical model development.
  • Developed Hive queries to do analysis of the data and to generate the end reports to be used by business users.
  • Worked on scalable distributed computing systems, software architecture, data structures and algorithms using Hadoop, Apache Spark and Apache Storm, and ingested streaming data into Hadoop using Spark, the Storm framework and Scala.
  • Gained hands-on experience with NoSQL databases such as MongoDB.
  • Extensively used SVN as a code repository and VersionOne to manage the day-to-day agile development process and to keep track of issues and blockers.
  • Wrote Spark Python code for the model integration layer.
  • Implemented Spark using Scala and Java, utilizing DataFrames and the Spark SQL API for faster processing of data (a short sketch follows this list).
  • Developed Spark code and Spark SQL/Streaming for faster testing and processing of data.
  • Used Spark for interactive queries, processing of streaming data and integration with popular NoSQL databases for large volumes of data.
  • Worked on Teradata queries, used Sqoop to move tables from Teradata to Hadoop, and fine-tuned queries to increase performance.
  • Developed a data pipeline using Kafka, HBase, Mesos, Spark and Hive to ingest, transform and analyze customer behavioral data.
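
As referenced in the list above, here is a minimal sketch of the DataFrame / Spark SQL approach, written in Java (the resume lists both Scala and Java for Spark). The table, view and column names are illustrative assumptions, not details from the project.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CustomerBehaviorJob {
    public static void main(String[] args) {
        // Spark session with Hive support so DataFrames can read/write Hive tables
        SparkSession spark = SparkSession.builder()
                .appName("CustomerBehaviorJob")
                .enableHiveSupport()
                .getOrCreate();

        // Load raw events from a Hive table (table and column names are illustrative)
        Dataset<Row> events = spark.table("raw.customer_events");
        events.createOrReplaceTempView("events");

        // Aggregate behavioral data with Spark SQL
        Dataset<Row> daily = spark.sql(
            "SELECT customer_id, to_date(event_ts) AS event_date, COUNT(*) AS event_count "
          + "FROM events GROUP BY customer_id, to_date(event_ts)");

        // Persist the result as a partitioned Hive table for downstream reporting
        daily.write().mode("overwrite")
             .partitionBy("event_date")
             .saveAsTable("analytics.customer_daily_activity");

        spark.stop();
    }
}
```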

Environment: Hadoop, Hive, Impala, Oracle, Spark, Python, Pig, Sqoop, Oozie, Kafka, RabbitMQ, MongoDB, MapReduce, SVN.

Confidential

Hadoop / Spark Developer

Responsibilities:

  • Monitored and managed daily jobs processing around 200k files per day, tracking them through RabbitMQ and an Apache dashboard application.
  • Monitored workload, job performance and capacity planning using InsightIQ storage performance monitoring and storage analytics; experienced in defining job flows.
  • Worked on analyzing the Hadoop cluster using different big data analytic tools including Pig, Hive and MapReduce.
  • Designed and implemented a Cassandra-based database and related web services for storing unstructured data.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables for optimized performance.
  • Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the web log data from servers/sensors.
  • Developed MapReduce programs to cleanse the data in HDFS obtained from heterogeneous data sources and make it suitable for ingestion into the Hive schema for analysis (a short sketch follows this list).
  • Developed Spark programs for the application to process data faster than standard MapReduce programs.
  • Created reports for the BI team using Sqoop to export data into HDFS and Hive.
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Developed Pig Latin scripts to extract the data from the web server output files and load it into HDFS.
  • Worked on MapReduce joins to query multiple semi-structured datasets as per analytic needs.
  • Loaded data from the Unix file system into HDFS in different formats (Avro, Parquet), created indexes and tuned SQL queries in Hive, and handled database connectivity using Sqoop.
  • Worked on setting up High Availability for GPHD 2.2 with ZooKeeper and Quorum Journal Nodes.
  • Automated the extraction of data from warehouses and weblogs by developing workflows and coordinator jobs in Oozie.
  • Worked in an AWS environment for development and deployment of custom Hadoop applications.
  • Gained hands-on experience with Amazon Web Services (AWS) cloud services such as EC2, S3, EBS, RDS and VPC.
  • Worked on provisioning and managing multi-tenant Hadoop clusters on a public cloud environment - Amazon Web Services (AWS) - and on private cloud infrastructure - the OpenStack cloud platform.
  • Strong experience working with Elastic MapReduce (EMR) and setting up environments on Amazon AWS EC2 instances.
  • Scheduled and managed cron jobs and wrote shell scripts to generate alerts.
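
As referenced above, here is a minimal sketch of a map-only MapReduce cleansing job in Java: well-formed records pass through to HDFS and malformed ones are dropped. The pipe-delimited format and expected field count are illustrative assumptions, not details from the project.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CleanseJob {

    // Map-only job: keep well-formed, pipe-delimited records and drop the rest
    public static class CleanseMapper extends Mapper<Object, Text, NullWritable, Text> {
        private static final int EXPECTED_FIELDS = 5; // illustrative record width

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString().trim();
            String[] fields = line.split("\\|");
            if (!line.isEmpty() && fields.length == EXPECTED_FIELDS) {
                context.write(NullWritable.get(), new Text(line));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "cleanse-raw-data");
        job.setJarByClass(CleanseJob.class);
        job.setMapperClass(CleanseMapper.class);
        job.setNumReduceTasks(0); // map-only: cleansed records go straight to HDFS
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```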

Environment: Hadoop, AWS, MapReduce, HDFS, Hive, Pig, Spark, Python, Java 1.6 & 1.7, Linux, Eclipse, Cassandra, ZooKeeper

Confidential 

Java/J2EE Developer

Responsibilities:

  • Worked with Java, J2EE, Struts, web services and Hibernate in a fast-paced development environment.
  • Followed agile methodology, interacted directly with the client on features, implemented optimal solutions and tailored the application to customer needs.
  • Used Apache POI to read Excel files.
  • Developed the user interface using JSP and JavaScript to view all online trading transactions.
  • Designed and developed Data Access Objects (DAO) to access the database (a short sketch follows this list).
  • Coded JavaServer Pages for dynamic front-end content that uses Servlets and EJBs.
  • Coded HTML pages using CSS for static content generation with JavaScript for validations.
  • Used JDBC API to connect to the database and carry out database operations.
  • Used JSP and JSTL Tag Libraries for developing User Interface components.
  • Performed code reviews.
  • Performed unit testing, system testing and integration testing.
  • Involved in building and deployment of application in Linux environment.
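
As referenced above, here is a minimal sketch of a JDBC-backed DAO in Java. The table, column names and query are illustrative assumptions, not details from the project.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Simple DAO that hides the JDBC plumbing behind a data-access method
public class TransactionDao {
    private final String url;
    private final String user;
    private final String password;

    public TransactionDao(String url, String user, String password) {
        this.url = url;
        this.user = user;
        this.password = password;
    }

    // Fetch transaction IDs for one account (table and column names are illustrative)
    public List<String> findTransactionIds(String accountId) throws SQLException {
        String sql = "SELECT txn_id FROM transactions WHERE account_id = ?";
        List<String> ids = new ArrayList<>();
        try (Connection conn = DriverManager.getConnection(url, user, password);
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, accountId);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    ids.add(rs.getString("txn_id"));
                }
            }
        }
        return ids;
    }
}
```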

Environment: Java, J2EE, JDBC, Struts, SQL, Hibernate, Eclipse, Apache POI, CSS.
