
Big Data Developer Resume


Dearborn, MI

SUMMARY

  • 8+ years of professional IT experience in analysis, design, and development using Hadoop, Java/J2EE, and SQL.
  • 4+ years of experience working with Apache Hadoop components such as HDFS, MapReduce, Hive, HBase, Pig, Scala, Spark, Impala, Oozie, Flume, HCatalog, Kafka, and Sqoop.
  • Experience with the Hadoop 2.0 architecture, YARN (MRv2), and with developing YARN applications on it.
  • Good experience in assessing business rules, collaborating with stakeholders, and performing source-to-target data mapping, design, and review.
  • Worked with HDFS, NameNode, JobTracker, DataNode, TaskTracker, and MapReduce concepts.
  • Experience in writing UNIX shell scripts.
  • Analyzed data using HiveQL, Pig Latin, and MapReduce programs written in Java.
  • Wrote custom UDFs to extend Hive and Pig core functionality (see the UDF sketch after this list).
  • Managed and reviewed Hadoop log files.
  • Used Sqoop to move data from relational databases into Hadoop and used Flume to collect data and populate Hadoop.
  • Used HBase for quick lookups such as updates, inserts, and deletes in Hadoop.
  • Experience with Cloudera, Hortonworks and MapR distributions.
  • Worked in a Cloudera Hadoop and Spark developer environment with on-demand lab work on cloud-hosted virtual machines.
  • Experience with Apache Spark Core, Spark SQL, Spark Streaming, and MLlib components.
  • Work experience with cloud infrastructure like Amazon Web Services (AWS).
  • Experience in data modeling, complex data structures, data processing, data quality, and data life cycle management.
  • Experience in running Map Reduce and Spark jobs over YARN.
  • Hands-on experience in complete project life cycle (design, development, testing and implementation) of Client Server and Web applications.
  • Participated in design reviews, code reviews, unit testing and integration testing.
  • Experience in developing and deploying applications on WebLogic, Apache Tomcat, and JBoss.
  • Extensively worked with Teradata utilities such as BTEQ, FastExport, FastLoad, and MultiLoad to export and load data to/from different source systems, including flat files.
  • Strong experience with SQL, PL/SQL, and database concepts.
  • Experience with NoSQL databases such as HBase and Cassandra.
  • Good understanding of job workflow scheduling and monitoring tools such as Oozie and Control-M.
  • Knowledge of administrative tasks such as installing Hadoop and ecosystem components like Hive and Pig.
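
As a concrete illustration of the custom Hive UDF work referenced above, the snippet below is a minimal sketch of an old-style Hive UDF written in Scala (Hive resolves the evaluate() method by reflection). The class name NormalizeZip and its trimming logic are hypothetical stand-ins, not code from any of the projects listed here.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical example UDF: normalizes a free-form zip code column to its first five characters.
// Hive locates evaluate() by reflection, so no interface method needs to be overridden.
class NormalizeZip extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) null                        // stay null-safe for sparse columns
    else new Text(input.toString.trim.take(5))
  }
}
```

After packaging the class into a JAR, it would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being used in HiveQL.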

TECHNICAL SKILLS

Hadoop/Big Data ecosystems: HDFS, MapReduce, Pig, Hive, Impala, Sqoop, Oozie, Zookeeper, Flume, Spark and Scala

NoSQL Databases: HBase, Cassandra, MongoDB

Tools and IDE: Eclipse, NetBeans, Toad, Maven, DB Visualizer.

Languages: C, C++, Java, J2EE, PL/SQL, MapReduce, Pig Latin, HiveQL, UNIX shell scripting and Scala

Databases/Data Warehousing: Teradata, Oracle, SQL Server, MySQL, DB2, PostgreSQL

ETL tools: DataStage, Teradata

Operating Systems: Windows 95/98/2000/XP/Vista/7, UNIX, Linux

PROFESSIONAL EXPERIENCE

Confidential, Dearborn, MI

Big Data Developer

Responsibilities:

  • Developed Spark core and Spark SQL scripts using Scala for faster data processing.
  • Developed Oozie workflows to automate loading data into HDFS and pre-processing and analyzing it with MapReduce and Hive jobs.
  • Loaded and transformed large sets of structured and semi-structured data.
  • Managed and reviewed Hadoop log files to identify issues when jobs failed.
  • Wrote shell scripts to schedule and automate tasks.
  • Used Hue for UI-based Hive and Impala query execution, Oozie scheduling, and creating tables in Hive.
  • Designed and implemented the processing and enrichment of massive volumes of market information.
  • Ingested data into HDFS using Sqoop and into Hive tables using Perl scripts, pulling from different sources through JDBC connectors and import parameters.
  • Managed and scheduled Oozie workflow jobs to remove duplicate log data files from HDFS.
  • Wrote Scala code for all Spark and Spark SQL use cases.
  • Developed data pipelines using Spark and Hive to ingest, transform, and analyze data (a sketch follows this list).
  • Built Scala prototypes for application requirements, with a focus on functional programming.
  • Explored Spark to improve the performance of and optimize existing algorithms in Hadoop.
  • Experience with CSV file manipulation, especially handling data with non-standard delimiters and escape characters.
  • Experience with sed, awk, cut, curl, iconv, grep, and other command-line text processing utilities.
  • Good understanding of Big Data distributions such as Cloudera with Cloudera Manager.
  • Fixed defects during the QA phase, supported QA testing, and troubleshot defects to identify their root causes.
  • Automated various production processes using UNIX/Linux shell scripting and Perl.
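
A minimal sketch of the Spark/Hive pipeline pattern mentioned in the list above. The table names, columns, and aggregation logic are hypothetical placeholders, and the snippet assumes Spark 2.x with Hive support enabled.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Hypothetical ingest-transform-write pipeline: read a staging Hive table,
// enrich/aggregate it, and publish the result to an analytics Hive table.
object MarketDataPipeline {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("MarketDataPipeline")
      .enableHiveSupport()                              // lets Spark SQL read/write Hive tables
      .getOrCreate()

    val quotes = spark.table("staging.market_quotes")   // hypothetical source table

    val dailySummary = quotes
      .filter(col("price") > 0)                         // drop bad ticks
      .withColumn("trade_date", to_date(col("trade_ts")))
      .groupBy(col("symbol"), col("trade_date"))
      .agg(avg("price").as("avg_price"), count(lit(1)).as("quote_count"))

    // Write the result back to Hive for downstream Hive/Impala queries.
    dailySummary.write.mode("overwrite").saveAsTable("analytics.daily_quote_summary")
    spark.stop()
  }
}
```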

Environment: Hadoop, MapR, Scala, Spark 2.1, Spark SQL, HDFS, Hive, Impala, HBase, IntelliJ IDEA, YARN, Perl, Solr, Oozie, Maven.

Confidential, Basking Ridge, NJ

Spark Developer

Responsibilities:

  • Developed Spark core and Spark SQL scripts using Scala for faster data processing.
  • Imported and exported data from HBase using Spark.
  • Created an XML adapter for HBase using the Spark APIs (HBase integration).
  • Worked with different source data file formats such as XML, JSON, and CSV.
  • Hands-on experience designing and developing ETL data flows using Hadoop and Spark ecosystem components.
  • Designed and implemented workflow jobs using Talend and UNIX/Linux scripting to perform ETL on the Hadoop platform.
  • Converted Hive queries into Spark transformations using Spark RDDs and Scala.
  • Utilized built-in Maven repositories for MapR distributions.
  • Created custom Spark UDFs and UDAFs to implement business logic that varies by requirement.
  • Developed a UNIX shell script that updates the Elasticsearch index and automates the spark-submit command.
  • Created tables in HBase to handle source XML data and analyzed the data with Hive queries through Hive-HBase integration.
  • Loaded JSON data into Elasticsearch using the Spark and Scala APIs (see the sketch after this list).
  • Worked with the Confidential engineering team to plan and deploy new Hadoop environments through Jenkins.
  • Worked with the system analyst and development manager on a day-to-day basis.
  • Worked with the service delivery (QA defects) team on transition and stabilization.
  • Developed a Spark API to import data into HBase from RDBMS using Sqoop.
  • Worked on Spark transformations and actions to convert source XML files to JSON format and load them into Elasticsearch.
  • Filtered data from different data sources using Altova MapForce and provided it as input to the Spark process.
  • Involved in code reviews, code tuning, performance tuning, and unit testing of the Spark application.
  • Created and deleted aliases for the respective indices in Kibana and used Kibana to visualize the indices loaded into Elasticsearch.
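
A minimal sketch of the JSON-to-Elasticsearch load referenced above. It assumes the elasticsearch-spark (elasticsearch-hadoop) connector is on the classpath; the file path, Elasticsearch host, and index name are hypothetical placeholders, not the project's actual values.

```scala
import org.apache.spark.sql.SparkSession
import org.elasticsearch.spark.sql._   // adds saveToEs to DataFrames (elasticsearch-hadoop connector)

object JsonToElasticsearch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("JsonToElasticsearch")
      .config("es.nodes", "es-host:9200")        // hypothetical Elasticsearch endpoint
      .getOrCreate()

    // JSON produced upstream by the XML-to-JSON Spark transformations.
    val events = spark.read.json("/data/feeds/events.json")

    // Index the documents; "market-events/event" is a hypothetical index/type.
    events.saveToEs("market-events/event")

    spark.stop()
  }
}
```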

Environment: Hadoop 2.x, MapR, Scala, Spark, Spark SQL, HDFS, Hive, HBase, Elasticsearch, IntelliJ IDEA, YARN.

Confidential, Thomasville, NC

Hadoop Developer

Responsibilities:

  • Developed transformations using custom MapReduce, Pig and Hive.
  • Developed Pig Latin scripts to extract and filter relevant data from web server output files and load it into HDFS.
  • Created MapReduce jobs using Pig Latin and Hive queries.
  • Worked on a proof of concept (POC) on Cloudera Impala to compare Impala's response time against Hive for large batch processing.
  • Performed map-side joins in both Pig and Hive.
  • Built Spark DataFrames to process large volumes of structured data.
  • Used Sqoop to load data from RDBMS into HDFS.
  • Handled Hive queries using Spark SQL integrated with the Spark environment.
  • Optimized joins in Hive using techniques such as sort-merge join and map-side join (a broadcast-join sketch follows this list).
  • Used JSON to represent complex data structures within a MapReduce job.
  • Used Kafka for log aggregation, collecting physical log files from servers and placing them in HDFS for further processing.
  • Reviewed and managed the Hadoop log files.
  • Loaded log data into HDFS using Flume and focused on creating the MapReduce jobs to power the data for search and aggregation.
  • Involved in developing complex ETL transformation & performance tuning.
  • Used Sqoop in importing the data and metadata from Oracle.
  • Involved in creating Hive tables, loading the data and writing Hive queries.
  • Used Pig Latin to apply transformations on systems of record.
  • Performed POCs on Spark test environment.
  • Developed Pig scripts and UDFs extensively for Value Added Processing (VAPs).
  • Actively involved in the design analysis, coding and strategy development.
  • Developed Sqoop commands to pull data from Teradata and push it to HDFS.
  • Developed Hive scripts for implementing dynamic partitions and buckets for retail history data.
  • Developed Pig scripts to convert the data from Avro to text file format.
  • Designed and developed read lock capability in HDFS.
  • Designed the data warehouse using Hive.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
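
The map-side-join optimization noted above has a direct Spark analogue: broadcasting the small dimension table so the large fact table is joined without a shuffle. The sketch below is illustrative only; the table and column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object BroadcastJoinExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("BroadcastJoinExample")
      .enableHiveSupport()
      .getOrCreate()

    val sales  = spark.table("retail.sales")       // large fact table (hypothetical)
    val stores = spark.table("retail.store_dim")   // small dimension table (hypothetical)

    // broadcast() ships the small table to every executor, mirroring a Hive map-side join.
    val enriched = sales.join(broadcast(stores), Seq("store_id"))

    enriched.write.mode("overwrite").saveAsTable("retail.sales_with_store")
    spark.stop()
  }
}
```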

Environment: Hadoop 2.x, Apache Spark, Spark SQL, Scala, MapReduce, HDFS, Pig, Hive, HBase, Kafka, Java, Oracle 10g, MySQL, Ubuntu, Cloudera Hadoop distribution.

Confidential, Milwaukee, WI

Hadoop Developer

Responsibilities:

  • Developed MapReduce jobs using Pig and Hive for data cleaning and pre-processing.
  • Converted various database objects such as packages, procedures, and functions written in PL/SQL to Scala.
  • Developed Sqoop scripts to load data into HDFS from DB2 and pre-processed it with Pig.
  • Automated the tasks of loading data into HDFS and pre-processing it with Pig by developing Oozie workflows.
  • Worked on streaming to collect data from Flume and performed real-time and batch processing.
  • Read data from Flume and pushed batches of it to HDFS and HBase for real-time processing of the files.
  • Implemented partitioning and bucketing techniques in Hive.
  • Developed Hive scripts implementing dynamic partitions.
  • Loaded data from the UNIX file system into HDFS and wrote Hive user-defined functions.
  • Developed code to pre-process large sets of various file formats such as Text, Avro, Sequence files, XML, JSON, and Parquet (see the sketch after this list).
  • Created multi-stage MapReduce jobs in Java for ad-hoc purposes.
  • Used Sqoop to load data from DB2 into HBase for faster querying and performance optimization.
  • Developed a suite of unit test cases for the Mapper, Reducer, and Driver classes using a testing library.
  • Collected log data from web servers and integrated it into HDFS using Flume.
  • Developed ETL workflows in Python, orchestrated with Oozie, to process the ingested data in HDFS and HBase.
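
A minimal sketch of reading several of the file formats mentioned in the list above. All paths are hypothetical; reading Avro assumes the spark-avro package (built in as the "avro" format from Spark 2.4 onward) and XML assumes the spark-xml package on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object MultiFormatReader {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("MultiFormatReader").getOrCreate()

    val textLines = spark.read.textFile("/data/raw/app.log")            // plain text
    val jsonData  = spark.read.json("/data/raw/events.json")            // JSON
    val parquet   = spark.read.parquet("/data/curated/events.parquet")  // Parquet
    val avroData  = spark.read.format("avro")                           // needs spark-avro
      .load("/data/raw/events.avro")
    val xmlData   = spark.read.format("com.databricks.spark.xml")       // needs spark-xml
      .option("rowTag", "record")
      .load("/data/raw/events.xml")

    // Quick sanity check that each source parsed into rows.
    println(s"json=${jsonData.count()}, parquet=${parquet.count()}, avro=${avroData.count()}")
    spark.stop()
  }
}
```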

Environment: Hadoop, HDFS, Hive, Pig, Flume, HBase, Sqoop, Oozie, Linux, Hortonworks Distribution, Relational Databases, PL/SQL, Impala.

Confidential, Bethpage Long Island, NY

Hadoop Developer

Responsibilities:

  • Developed MapReduce jobs using Pig and Hive for data cleaning and pre-processing.
  • Installed and configured Hive, Pig, Sqoop, Flume, and Oozie on the Hadoop cluster.
  • Developed Sqoop scripts to load data into HDFS from DB2 and pre-processed it with Pig.
  • Automated the tasks of loading data into HDFS and pre-processing it with Pig by developing Oozie workflows.
  • Loaded data from the UNIX file system into HDFS and wrote Hive user-defined functions.
  • Developed code to pre-process large sets of various file formats such as Text, Avro, Sequence files, XML, JSON, and Parquet.
  • Created multi-stage MapReduce jobs in Java for ad-hoc purposes.
  • Used Sqoop to load data from DB2 into HBase for faster querying and performance optimization.
  • Worked on streaming to collect data from Flume and performed real-time and batch processing.
  • Developed Hive scripts implementing dynamic partitions (a sketch of the pattern follows this list).
  • Developed a suite of unit test cases for the Mapper, Reducer, and Driver classes using a testing library.
  • Collected log data from web servers and integrated it into HDFS using Flume.
  • Developed ETL workflows in Python, orchestrated with Oozie, to process the ingested data in HDFS and HBase.
  • Wrote ETL jobs in DataStage to visualize data and generate reports from the MySQL database.
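
A minimal sketch of the dynamic-partition pattern referenced above, expressed through Spark SQL for consistency with the other examples in this resume; the same settings and insert would appear in a plain Hive script. Table, column, and path names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object DynamicPartitionLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DynamicPartitionLoad")
      .enableHiveSupport()
      .getOrCreate()

    // Same switches a Hive script would set before a dynamic-partition insert.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    spark.sql(
      """CREATE TABLE IF NOT EXISTS analytics.page_views (user_id STRING, url STRING)
        |PARTITIONED BY (view_date STRING)
        |STORED AS PARQUET""".stripMargin)

    // Partition values come from the data itself; view_date must be the last column.
    spark.read.json("/data/raw/page_views.json")
      .select("user_id", "url", "view_date")
      .write.mode("append")
      .insertInto("analytics.page_views")

    spark.stop()
  }
}
```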

Environment: Hadoop, HDFS, Hive, Pig, Flume, HBase, Sqoop, Oozie, DataStage, Linux, Hortonworks Distribution, Relational Databases.

Confidential, NY

Java Developer

Responsibilities:

  • Developed new flow services and enhanced and maintained various Java-based services in webMethods.
  • Worked extensively on webMethods JDBC Adapters, MQ Adapters, Flow Services, and notifications.
  • Wholly responsible for new enhancements and the design of the MVC web application.
  • Extensively used web services built with Apache Axis.
  • Implemented various design patterns such as MVC, Factory, DAO, and Façade.
  • Developed and enhanced various Oracle stored procedures.
  • Interacted with project participants from different teams to understand interfaces and impact.
  • Developed JUnit test cases to unit test the services outside the server.
  • Worked with web services, XML/XS, AJAX for Livelink Content Management, WSDL, and SOAP.
  • Supported the production system by responding to and fixing queries and issues raised by customer support.
  • Worked on WebLogic and Axis upgrades.
  • Redesigned the EMCST application with WRIA components on the frontend and RESTful web services implemented in Jersey returning JSON on the backend (a sketch follows this list).
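
A minimal sketch of a JAX-RS (Jersey) resource returning JSON, as mentioned in the last bullet above. It is written in Scala for consistency with the other examples here, although the original work was in Java; the resource path and payload are hypothetical.

```scala
import javax.ws.rs.{GET, Path, Produces}
import javax.ws.rs.core.MediaType

// Hypothetical resource: Jersey scans this class and exposes GET /orders/latest.
@Path("/orders")
class OrderResource {

  @GET
  @Path("/latest")
  @Produces(Array(MediaType.APPLICATION_JSON))
  def latestOrder(): String =
    // Hand-built JSON keeps the sketch free of extra mapping libraries.
    """{"orderId": 1001, "status": "SHIPPED"}"""
}
```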

Environment: Eclipse 3.2, WebLogic 8.1, WebLogic 10.2, Apache Ant, Rational ClearCase, Servlets, MVC, Apache Axis 1.2, Axis 1.4, JMS, MQ Adapters, webMethods 6.5, Anthill, Oracle 10g, Oracle SQL Developer 1.2, Jersey, JSON, Waterfall method.

Confidential

Java Developer

Responsibilities:

  • Designed the application using the J2EE design patterns such as Session Façade, Business Delegate, Service Locator, Value Object and Singleton.
  • Developed the presentation tier with HTML and JSPs using the Struts 1.1 framework; used AJAX for faster page rendering.
  • Developed the middle tier using EJB stateless session beans and Java servlets.
  • Used entity beans to access data from the Oracle 8i database.
  • Worked on Hibernate for data persistence.
  • Prepared high and low level design documents for the business modules for future references and updates.
  • Deployed the application in JBoss Application Server in development and production environment.
  • Used CVS as the version control system.
  • Performed code walkthroughs and prepared test cases and test plans.
  • Used Ant as the build tool and JUnit for writing unit tests (a sketch follows this list).
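
A minimal sketch of the JUnit-style unit testing mentioned above, shown with the JUnit 4 API in Scala for consistency with the other examples; the class under test is a hypothetical stand-in defined inline so the snippet is self-contained.

```scala
import org.junit.Assert.assertEquals
import org.junit.Test

// Hypothetical class under test, defined here so the example is self-contained.
class PriceCalculator(taxRate: Double) {
  def withTax(base: Double): Double = base * (1 + taxRate)
}

class PriceCalculatorTest {
  @Test
  def addsTaxToBaseAmount(): Unit = {
    val calculator = new PriceCalculator(0.08)
    // The delta argument keeps the floating-point comparison tolerant.
    assertEquals(108.0, calculator.withTax(100.0), 0.001)
  }
}
```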

Environment: Eclipse, HTML, JavaScript, Core Java, JUnit, JSP, Servlets, JDBC, Oracle 8i, AJAX, CVS, JBoss Application Server.
