
Big Data Developer Resume

San Bruno, CA


  • 7+ years of working experience and expertise in Analysis, Design, Development, Deployment and Implementation of Web and Agile Enterprise applications and Client/Server architecture.
  • 4+ years of experience implementing the Hadoop ecosystem and developing Big Data applications.
  • Experience with Hadoop ecosystem components and tools like MapReduce, HDFS, Hive, Scala, Sqoop, Pig, Kafka, and NiFi for scalable, distributed, and high-performance computing.
  • Experience developing custom MapReduce programs.
  • Strong experience in data analytics using Hive and Pig, including writing custom UDFs.
  • Expertise in configuring Flume and Kafka to load data from multiple sources directly into HDFS.
  • Hands-on experience with HBase, Cassandra, and Redis data stores, and with developing Storm topologies for real-time computation.
  • In-depth knowledge of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, MRv1, and MRv2 (YARN).
  • Working knowledge of Spark features including Core Spark, Spark SQL, and Spark Streaming.
  • Knowledge of job workflow scheduling and monitoring tools like Oozie, UC4, and ZooKeeper.
  • Extensive knowledge of Oracle PL/SQL, Teradata, and MySQL, and experience in Linux/Unix shell scripting and Python.
  • Created marketing and sales data reports for analytics using Tableau.
  • Worked on Data Warehouse and Business Intelligence projects with teams using Informatica, Talend (ETL), Cognos, and PowerPlay.
  • Experience developing web and client-server applications using Java/J2EE technologies like JSP and Servlets with open-source frameworks like Struts, Spring, and Hibernate.
  • Created and consumed SOAP-based and RESTful web services using WSDL, SOAP, JAX-WS, JAX-RS, SoapUI, and REST clients.
  • Experience with XML technologies like XML, XSD, and XSLT, and with version control using SVN and GitHub.
  • Strong knowledge of J2EE design patterns like MVC, Session Facade, Business Delegate, Front Controller, Service Locator, Data Transfer Object, and Data Access Object.
  • Good working knowledge of build tools like Maven and Ant for project build/test/deployment, Log4j for error logging and debugging, and JUnit for unit and integration testing.
  • Knowledge of the SDLC and methodologies like Agile and Scrum.
  • Developed and deployed applications on UNIX and Windows platforms.
  • Ability to perform at a high level, meet deadlines with quality delivery, and adapt to ever-changing priorities.
  • Highly motivated to learn new skills and technologies; excellent analytical and problem-solving skills; fast learner, resourceful, committed, hard-working, and self-driven.


Big Data Skills: HDFS, MapReduce, Hive, HBase, Spark, Kafka, NiFi, Storm, Redis, Flume.

Programming Languages: Java, Linux shell scripts, Python, Scala

J2EE Technologies: Servlets, JSP, Web-Services

Web Technologies: JSP, HTML, CSS, JavaScript

Reporting tools: Tableau, Cognos

Frameworks: Struts, Spring, Hibernate

Web Services: SOAP, REST

IDE Tools: Eclipse, IBM WebSphere, NetBeans.

Application Servers: IBM WebSphere, WebLogic, Tomcat, JBoss

Databases: Oracle 11g/10g/9i, MySQL, DB2, MS-SQL Server

Testing Tools: JUnit

Operating Systems: All versions of Microsoft Windows, UNIX, and Linux.


Big Data Developer

Confidential, San Bruno, CA

  • Analyzed data with Hive to extract the targeted customers required for specific campaigns, based on transactions, user events such as clicks and opens, and browse information.
  • Developed MapReduce programs in Python and Scala for processing the extracted customer data.
  • Developed MapReduce jobs for bulk insertion of Walmart’s customer and item data from files into HBase and Cassandra.
  • Created shell scripts to automate the daily extraction and loading into Hive tables of the targeted customers to whom various email campaigns are sent.
  • Worked on various email campaigns such as Back-In-Stock, Price-Drop, Post-Browse, Customer Ratings and Reviews, and Shopping Cart Abandon.
  • Built data pipelines.
  • Developed and deployed Hive UDFs written in Java for tasks such as encrypting customer IDs and creating item image URLs.
  • Used the StrongView tool to schedule and monitor batch email campaigns.
  • Extracted StrongView and Kafka logs from servers using Flume, parsed customer open/click information from them, and loaded it into Hive tables. Created reports with counts of email sends, opens, and clicks.
  • Created marketing reports on customer open/click information for various campaigns.
  • Wrote shell scripts to automate Hive query processing.
  • Scheduled MapReduce and Hive workflows using Oozie and cron.
  • Used Apache NiFi to generate graphical representations of data transfer and flow.
  • Developed HTML templates for various trigger campaigns.
  • Used Oracle PL/SQL and Teradata for some data extraction and loading.
  • Analyzed transaction data and extracted category-wise best-selling item information, which the marketing team uses to come up with ideas for new campaigns.
  • Developed complex Hive queries using joins and automated these jobs using shell scripts.
  • Monitored and debugged MapReduce jobs using the JobTracker administration page.
  • Developed Storm topologies for real-time email campaigns, using Kafka as the source of customers’ website activity information and storing the data in a Redis server.
  • Helped migrate existing Hive jobs to Spark SQL.
  • Used the Spark Streaming API to consume data from a Kafka source, processed it with core Spark functions written in Scala, and stored the resulting data in an HBase table that is later used for generating reports.
  • Developed a data pipeline using Flume to ingest data from a Kafka source into an HDFS sink for analysis.
  • Worked on DevOps tasks for migrating data to other databases.
  • Developed REST web services that provide the metadata required for the campaigns.

Environment: MapReduce, Hive, HBase, Python, Java, Storm, Scala, Spark Streaming, Spark SQL, Redis, Oozie, Kafka, Flume, REST web services, Tableau, Teradata.

Hadoop Developer

Confidential, Dallas, TX

  • Installed and configured Hadoop, MapReduce, and HDFS (Hadoop Distributed File System); developed multiple MapReduce jobs in Java.
  • Worked with the infrastructure and admin teams on designing, modeling, sizing, and configuring a 15-node Hadoop cluster.
  • Developed MapReduce programs in Java and Scala for parsing the raw data and populating staging tables.
  • Created Hive queries to compare the raw data with EDW reference tables and perform aggregations.
  • Developed MapReduce programs in Python.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Analyzed data with Hive and Pig.
  • Working knowledge of RESTful APIs such as that of Elasticsearch.
  • Wrote Pig scripts to process the data.
  • Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
  • Integrated bulk data into the Cassandra file system using MapReduce programs.
  • Gained good experience with NoSQL databases.
  • Involved in HBase setup and in storing data in HBase for further analysis.
  • Experienced in managing and reviewing Hadoop log files.
  • Experienced in defining job flows.
  • Installed and configured Hive and wrote Hive UDFs.
  • Created Hive tables, loaded them with data, and wrote Hive queries in HiveQL, which run internally as MapReduce jobs.
  • Extracted data from MySQL and Amazon Redshift into HDFS using Sqoop.
  • Used HiveQL to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Deployed Hadoop clusters in fully distributed and pseudo-distributed modes.
  • Managed and monitored the Hadoop cluster using Cloudera Manager.
  • Supported setting up the QA environment and updating configurations for implementing scripts with Pig, Hive, and Sqoop.
  • Unit tested a sample of raw data, improved performance, and turned the work over to production.

Environment: JDK1.7, Java, Hadoop 2.6.0, MapReduce, HDFS, Hive, Sqoop, HBase, Pig, Oozie, Kerberos, Linux, Shell Scripting, Oracle 11g, PL/SQL, SQL*PLUS, Cognos, Talend, HDInsight.



  • Participated in Design meetings and technical discussions.
  • Developed User Interface in JSP, JavaScript and HTML.
  • Implemented web application with JSF MVC.
  • Implemented web layer with Spring MVC.
  • Created GUIs for applications and applets using Swing components.
  • Developed Java Servlets and Beans for backend processes.
  • Created ETL exception reports and validation reports after data was loaded into the warehouse database.
  • Developed ETL processes using Informatica.
  • Created database tables and the data model with Oracle 10g.
  • Created JUnit test cases to test individual modules.
  • Participated in status meetings to provide task updates.
  • Involved in bug fixing and enhancements of the application.

Environment: Spring, JSF, Oracle, JRE 1.5, Eclipse 3.2, MyEclipse 4.1, JBoss EJB 2.0, Subversion, JSP, HTML, JavaScript, PL/SQL, Informatica, Windows XP.
