
Hadoop/Spark Developer Resume


New York, NY

SUMMARY

  • A dynamic professional with 6+ years of experience in application analysis, design, development, maintenance, and support of web, client-server, and distributed applications, including 4 years of experience with Big Data and Hadoop-related components such as Spark, HDFS, MapReduce, Pig, Hive, Impala, YARN, Sqoop, Flume, Oozie, Scala, and Kafka.
  • Experience in multiple Hadoop distributions like Cloudera and Hortonworks.
  • Extensive experience writing SQL queries using HiveQL to perform analytics on structured data.
  • Working knowledge of current large-scale data processing tools such as Spark, Scala, and Spark SQL.
  • Experience in performing data validation using Hive dynamic partitioning and bucketing.
  • Expertise in Data ingestion using Sqoop, Apache Kafka and Flume.
  • Excellent understanding and experience of NoSQL databases like HBase and Cassandra.
  • Experience working with structured and unstructured data in various file formats such as Avro, XML, JSON, sequence files, ORC, and Parquet.
  • Implemented business logic using Pig scripts and wrote custom Pig UDFs; performed ETL operations in Pig to join, clean, aggregate, and analyze data.
  • Experience with the Oozie Workflow Engine to automate and parallelize Hadoop, MapReduce, and Pig jobs.
  • Experience implementing analytical algorithms in Spark, using Scala and Spark SQL for faster data processing.
  • Experience with Spark Streaming to collect data from Kafka in near real time and perform the necessary processing (see the sketch after this list).
  • Good understanding of cloud configuration in Amazon web services (AWS).
  • Experience in getting data from various sources into HDFS and building reports using Tableau and QlikView.
  • Expertise with Application servers and web servers like Oracle WebLogic, IBM WebSphere and Apache Tomcat.
  • Good knowledge in connecting Hive Tables with Tableau for Business Intelligence Reporting.
  • Expert-level skills in Java, J2EE (Servlets, JSP, JDBC), the Struts framework, Hibernate, Spring, web services, REST, and XML.
  • Experience using Design Patterns (Singleton, Factory, Builder) including MVC architecture.
  • Experience working in environments using Agile (SCRUM) and Waterfall methodologies.
  • Experience in designing applications using UML Diagrams like Class Diagram, Component Diagram, Sequence Diagrams, and Deployment Diagram using MS Visio, Rational Rose.
  • Expertise in database modeling and development using SQL and PL/SQL in Oracle (8i, 9i, and 10g), MySQL, Teradata, DB2, and SQL Server environments.
  • Good experience in performing and supporting Unit testing, System Integration testing (SIT), UAT and production support for issues raised by application users.
  • Quick learner and self-motivated, with the ability to work in challenging and versatile environments and excellent written and verbal communication skills.
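
A minimal sketch of the Kafka-to-Spark-Streaming pattern referenced above, using PySpark's Spark 1.x-era DStream API; the topic name, broker address, and record format are illustrative assumptions, not details from this resume.

    # Near-real-time processing of Kafka events with Spark Streaming.
    # Topic, broker, and message format below are hypothetical.
    import json

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils  # Spark 1.x/2.x DStream API

    sc = SparkContext(appName="KafkaNearRealTime")
    ssc = StreamingContext(sc, batchDuration=10)  # 10-second micro-batches

    # Direct (receiver-less) stream; records arrive as (key, value) pairs.
    stream = KafkaUtils.createDirectStream(
        ssc,
        topics=["sales-events"],                              # hypothetical topic
        kafkaParams={"metadata.broker.list": "broker1:9092"}  # hypothetical broker
    )

    # Parse each JSON message and count events per product in every batch.
    counts = (stream.map(lambda kv: json.loads(kv[1]))
                    .map(lambda event: (event["product_id"], 1))
                    .reduceByKey(lambda a, b: a + b))
    counts.pprint()

    ssc.start()
    ssc.awaitTermination()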

TECHNICAL SKILLS

Languages/Scripting: Java, Python, Pig Latin, Scala, HiveQL, SQL, Linux shell scripts, JavaScript.

Big Data Framework/Stack: Hadoop (HDFS, MapReduce, YARN), Hive, Hue, Impala, Sqoop, Pig, HBase, Spark, Kafka, Flume, Oozie, ZooKeeper, Cassandra, KNIME

Hadoop Distributions: Cloudera CDH 5, Hortonworks HDP 2.x (Ambari)

RDBMS: Oracle, DB2, SQL Server, MySQL

NoSQL Databases: HBase, MongoDB

Software Methodologies: SDLC (Waterfall, Agile/Scrum), JIRA

Operating Systems: Windows NT/XP/7/8, Red Hat, CentOS, macOS

IDEs: NetBeans, Eclipse, PyCharm

File Formats: XML, text, sequence, JSON, ORC, Avro, and Parquet.

PROFESSIONAL EXPERIENCE

Hadoop/Spark Developer

Confidential - New York, NY

Responsibilities:

  • Used the Cloudera distribution of the Hadoop ecosystem.
  • Converted MapReduce jobs into Spark transformations and actions using Spark RDDs in Python (see the sketch after this list).
  • Wrote Spark jobs in Python to analyze customer and sales-history data.
  • Used Kafka to get data from many sources into HDFS.
  • Involved in designing the row key in HBase to store Text and JSON as key values in HBase tables.
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis. Worked with large Distributed systems for data storage.
  • Created Hive external tables to perform ETL on data that is generated on a daily basis.
  • Created HBase tables for random lookups as per requirement of business logic.
  • Performed transformations using spark and loaded data into HBase tables.
  • Performed validation on the ingested data to filter and cleanse it in Hive.
  • Created Sqoop jobs to handle incremental loads from RDBMS into HDFS.
  • Imported data as Parquet files for some use cases using Sqoop, to improve processing speed for later analytics.
  • Collected log data from web servers and pushed to HDFS using Flume.
  • Worked in an Agile Scrum methodology, using JIRA for bug tracking.
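
As referenced in the list above, a minimal sketch of converting a MapReduce-style aggregation into Spark RDD transformations and actions; the HDFS paths and record layout are assumptions for illustration.

    # A classic MapReduce aggregation (total sales per customer) rewritten as
    # Spark RDD transformations and actions. Paths and layout are hypothetical.
    from pyspark import SparkContext

    sc = SparkContext(appName="SalesHistoryRDD")

    lines = sc.textFile("hdfs:///data/sales/history/")  # hypothetical input path

    # The map phase becomes map(); the shuffle/reduce phase becomes reduceByKey().
    totals = (lines.map(lambda line: line.split(","))
                   .filter(lambda cols: len(cols) >= 2)          # drop malformed rows
                   .map(lambda cols: (cols[0], float(cols[1])))  # (customer_id, amount)
                   .reduceByKey(lambda a, b: a + b))             # sum per customer

    # The action below triggers the job, mirroring the original reducer output.
    totals.saveAsTextFile("hdfs:///data/sales/totals_by_customer")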

Environment: Hadoop, Hive, Flume, Red Hat 6.x, shell scripting, Java, Eclipse, HBase, Kafka, Spark (Python), Oozie, ZooKeeper, CDH 5.x, HQL/SQL, Oracle 11g.

Hadoop Developer

Confidential, Rosemont, IL

Responsibilities:

  • Worked on the POC for the Apache Hadoop framework initiative.
  • Installed and configured Hadoop 1.x MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing (a streaming-style sketch follows this list).
  • Imported and exported data between HDFS and Hive using Sqoop.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.
  • Implemented partitioning, dynamic partitions, and buckets in Hive.
  • Responsible for managing data coming from different sources; worked with large distributed systems to store the data.
  • Monitored the running MapReduce programs on the cluster.
  • Responsible for loading data from UNIX file systems into HDFS.
  • Installed and configured Hive and wrote Hive UDFs. Worked with Cassandra for data storage and processing.
  • Implemented workflows using the Apache Oozie framework to automate tasks.
  • Developed scripts and automated end-to-end data management and synchronization between all the clusters.
  • Managed IT and business stakeholders; conducted assessment interviews and solution review sessions.
  • Reviewed developed code and flagged issues with respect to customer data.
  • Used SQL queries and other tools to perform data analysis and profiling.
  • Mentored and trained the engineering team in the use of the Hadoop platform, analytical software, and development technologies, following Agile methodology; JIRA was used for tracking.
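
The cleaning jobs above were written in Java; as a compact illustration of the same cleaning-and-preprocessing idea, here is a hypothetical Hadoop Streaming mapper in Python (the field layout and validity rules are assumptions). It would be wired into a job with the standard hadoop-streaming jar and an -input/-output/-mapper invocation.

    #!/usr/bin/env python
    # Sketch of a data-cleaning mapper run via Hadoop Streaming; the resume's
    # jobs were Java MapReduce, and this record layout is an assumption.
    import sys

    EXPECTED_FIELDS = 5  # hypothetical record width

    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        # Drop malformed or empty records before they reach downstream tables.
        if len(fields) != EXPECTED_FIELDS or not fields[0]:
            continue
        # Normalize whitespace and case as a simple preprocessing step.
        print("\t".join(f.strip().lower() for f in fields))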

Environment: Apache Hadoop, Java (JDK 1.6), DataStax, flat files, Oracle 11g/10g, MySQL, Toad 9.6, Windows NT, CentOS, Sqoop, Hive, Oozie.

Hadoop Developer

Confidential - Oakland, California

Responsibilities:

  • Involved in complete Big Data flow of the application starting from data ingestion from upstream to HDFS, processing the data in HDFS and analyzing the data.
  • Imported and exported data into HDFS using Sqoop and Kafka.
  • Created Hive tables and worked on them using HiveQL.
  • Created partitioned tables in Hive for best performance and faster querying.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Worked on Hive UDFs using data from HDFS.
  • Performed extensive data analysis using Hive.
  • Executed different types of joins on Hive tables.
  • Used Impala for faster querying purposes.
  • Created indexes and tuned the SQL queries in Hive.
  • Involved in scheduling the Oozie workflow engine to run multiple Hive jobs.
  • Developed HiveQL scripts to perform incremental loads (see the sketch after this list).
  • Worked on different Big Data file formats such as text, sequence, Avro, and Parquet, with Snappy compression.
  • Involved in identifying possible ways to improve the efficiency of the system.
  • Involved in generating data cubes for visualization. Worked in an Agile process with Scrum stand-ups.
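
A minimal sketch of the partitioned-table and incremental-load pattern referenced in the list above. Hive itself has no Python API, so the HiveQL is driven here from PySpark's Spark 1.x HiveContext purely for illustration (Spark is not listed for this project); all table, column, and date values are hypothetical.

    # Daily incremental load into a partitioned Hive table using dynamic
    # partitioning. Table/column names and the cutoff date are hypothetical.
    from pyspark import SparkContext
    from pyspark.sql import HiveContext  # Spark 1.x-era Hive entry point

    sc = SparkContext(appName="HiveIncrementalLoad")
    hc = HiveContext(sc)

    # Dynamic partitioning must be enabled before a multi-partition insert.
    hc.sql("SET hive.exec.dynamic.partition = true")
    hc.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    hc.sql("""
        CREATE TABLE IF NOT EXISTS sales_by_day (
            order_id STRING,
            amount   DOUBLE
        )
        PARTITIONED BY (load_date STRING)
        STORED AS PARQUET
    """)

    # Only rows newer than the last processed date are inserted; Hive routes
    # each row to its partition from the trailing load_date column.
    last_load_date = "2016-01-01"  # in practice, read from a control table
    hc.sql("""
        INSERT INTO TABLE sales_by_day PARTITION (load_date)
        SELECT order_id, amount, load_date
        FROM staging_sales
        WHERE load_date > '{0}'
    """.format(last_load_date))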

Environment: Hadoop, Hive, Pig, Sqoop, Kafka, Oozie, Impala, Flume, MySQL, ZooKeeper, HBase, Cloudera Manager, MapReduce.

Java Developer

Confidential

Responsibilities:

  • Involved in designing and developing enhancements per business requirements with respect to front end JSP development using Struts.
  • Implemented the project using JSP- and servlet-based tag libraries.
  • Conducted client-side validations using JavaScript.
  • Coded JDBC calls in the servlets to access the Oracle database tables.
  • Generated SQL scripts to update the parsed messages in the database.
  • Worked on parsing the RSS Feeds (XML) files using SAX parsers.
  • Designed and coded the Java class that handles errors and logs them to a file.
  • Developed graphical user interfaces using Struts, Tiles, and JavaScript. Used JSP, JavaScript, and JDBC to create web servlets.
  • Utilized the mail merge techniques in MS Word for time reduction in sending certificates.
  • Involved in documentation, review, analysis and fixed postproduction issues.
  • Worked on bug fixing and enhancements on change requests.
  • Designed various animations and graphics using Macromedia Flash MX with ActionScript 1.0, PhotoImpact, and GIF Animator.
  • Understood customer requirements, mapped them to functional requirements, and created requirement specifications.
  • Developed web pages to display account transactions and created the application UI using GWT, Java, JSP, CSS, and web standards, improving usability while consistently meeting tight deadlines.
  • Responsible for the configuration of the Struts web application using struts-config.xml and web.xml.
  • Modified Struts configuration files per application requirements and developed web services for non-Java clients to obtain user information pertaining to that account, using JSP, DHTML, Spring Web Flow, and CSS.

Environment: HTML/CSS/JavaScript/JSON, JDK 1.3, J2EE, servlets, JavaBeans, MDB, JDBC, MS SQL Server, JBoss; frameworks and libraries: Struts, Spring MVC, jQuery; MVC concepts, XML, SVN.
