Big Data Developer Resume

Arizona

SUMMARY:

  • Big Data developer with 5+ years of experience in the software industry, including experience developing applications in Java.
  • Expertise in installing, configuring and administering clusters of major Hadoop distributions.
  • Hands-on experience in installing, configuring and using Hadoop ecosystem components such as MapReduce, HDFS, HBase, Hive, Sqoop, Pig, ZooKeeper, Spark, Kafka and Storm.
  • Skilled in managing and reviewing Hadoop log files.
  • Familiarity with AWS cloud services (VPC, EC2, S3, RDS, EMR)
  • Expert in writing Java, Scala and Python MapReduce jobs
  • Expert in working with the Hive data warehouse tool - creating tables, distributing data by implementing partitioning and bucketing, and writing and optimizing HiveQL queries (illustrated in the sketch following this summary).
  • Experienced in loading data into Hive partitions and creating buckets in Hive.
  • Experience in using Apache Sqoop to import and export data between HDFS/Hive and relational databases.
  • Good at writing Pig scripts (including Python UDFs), Hive queries and Hive scripts.
  • In-depth understanding of Data Structures and Algorithms.
  • Experience in Hadoop MapReduce programming, Pig Latin, HiveQL and HDFS.
  • Experience with the Oozie workflow engine to run multiple Hive and Pig jobs, scheduled by time and data availability.
  • Experience in writing shell scripts in Bash and Python and automating tasks over SSH.
  • Strong Java/JEE application development background with experience in defining technical and functional specifications
  • Proficient in developing strategies for Extraction, Transformation and Loading (ETL) mechanism and UNIX shell scripting.
  • Experienced with source control repositories such as SVN and GitHub.
  • Experience with PySpark, Python, Scala programming languages
  • Experienced in detailed system design using use case analysis and functional analysis, modeling programs with class, sequence, activity and state diagrams in UML.
  • Worked with data warehouse architecture and designed star schemas, snowflake schemas, fact and dimension tables, and physical and logical data models.
  • Designed mapping documents for Big Data applications.
  • Expertise in successful implementation of projects by following Software Development Life Cycle, including Documentation, Implementation, Unit testing, System testing, build and release.
  • Experience working with Oracle 9i/10g, MySQL and SQL Server databases.
  • Experience using Agile and Extreme Programming methodologies.
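
As a brief illustration of the Hive partitioning and bucketing work noted above, a minimal PySpark sketch (database, table and column names are hypothetical, not taken from any actual project):

    from pyspark.sql import SparkSession

    # Hive-enabled Spark session; assumes a configured Hive metastore
    spark = (SparkSession.builder
             .appName("hive-partitioning-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Partition by load date and bucket by customer_id for partition pruning and faster joins
    spark.sql("""
        CREATE TABLE IF NOT EXISTS sales_db.orders (
            order_id    BIGINT,
            customer_id BIGINT,
            amount      DOUBLE
        )
        PARTITIONED BY (load_date STRING)
        CLUSTERED BY (customer_id) INTO 32 BUCKETS
        STORED AS ORC
    """)

    # Load a single partition from a staging table
    spark.sql("""
        INSERT OVERWRITE TABLE sales_db.orders PARTITION (load_date = '2017-06-01')
        SELECT order_id, customer_id, amount
        FROM sales_db.orders_staging
        WHERE load_date = '2017-06-01'
    """)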

TECHNICAL SKILLS:

Big Data Ecosystem: HDFS, MapReduce, Hive, Pig, HBase, Spark, Scala, Cloudera CDH4/CDH5, Hadoop Streaming, ZooKeeper, Oozie, Sqoop, Flume, Kafka and Storm.

NoSQL: HBase, MongoDB

Languages: Java/J2EE, C, C++, SQL, Shell Scripting

Web Technologies: Servlets, JSP, HTML, CSS, JavaScript, jQuery, SOAP, Amazon AWS

Frameworks: Apache Struts 2.X, Spring, Hibernate, MVC

Applications & Web Servers: Apache Tomcat 5.X/6.X, IBM WebSphere, JBoss

IDEs/Utilities: Eclipse EE, NetBeans, PuTTY, Visual Studio

DBMS/RDBMS: Oracle 11g/10g/9i, SQL Server 2012/2008, MySQL

Operating Systems: Windows, UNIX, Linux, Macintosh

Version Control: SVN, CVS and Rational Clear Case Remote Client V7.0.1, GitHub

Tools: FileZilla, PuTTY, TOAD SQL Client, MySQL Workbench, JUnit, Oracle SQL Developer, WinSCP, Tahiti Viewer, Cygwin

Others: UNIX Shell Scripting

PROFESSIONAL EXPERIENCE:

Confidential, Arizona

Big Data Developer

  • Application developer using Big Data tools: Java/J2EE, MapReduce, Hive, Spark, Spring Boot
  • Experience in Big Data Analytics and development
  • Strong experience on Hadoop distributions like Cloudera
  • Experience in developing Map Reduce jobs with Java API in Hadoop
  • Experience in designing and developing applications in Spark using Python to compare the performance of Spark with Hive and SQL/Oracle, working with NoSQL databases and YARN.
  • Also involved in data migration from gcpapollo to Enterprise Salesforce.
  • Exported data from HDFS environment into RDBMS using Sqoop for report generation and visualization purpose.
  • Experienced with major Hadoop ecosystem projects such as Pig, Hive and HBase, and monitoring them with Cloudera Manager.
  • Designed, developed and deployed an Apache Spark framework used to create test data by statistically modeling and analyzing production data
  • Designed, developed and deployed Java Spring REST APIs to connect to the Spark framework so that test data could be generated on demand
  • Experience in manipulating/analyzing large datasets and finding patterns and insights within structured and unstructured data.
  • Experience in migrating the data using Hive from Gcpapollo to Esodl and vice-versa.
  • Experience in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and SQL.
  • Good understanding of NoSQL databases and hands on work experience in writing applications on NoSQL databases like HBase.
  • Experienced in using agile approaches, including Extreme Programming, Test-Driven Development and Agile Scrum.
  • Worked on data extraction, transformation and aggregation from multiple file formats including XML, JSON, CSV and other compressed file formats.
  • Used Shell Scripting in Linux to configure the Sqoop and Hive tasks required for the data pipeline flow
  • Developed data transformation modules in Python to convert JSON-format files into Spark DataFrames to handle data from legacy ERP systems (a brief sketch follows this section).

Technologies Used: Cloudera Hadoop, Cassandra, flat files, Oracle 11g/10g, MySQL, MemSQL, Sqoop, Python, Java, PySpark, Kafka, Hive, UNIX shell scripts, YARN, ZooKeeper, SQL, MapReduce, Pig, HBase, UNIX.
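
A minimal sketch of the JSON-to-DataFrame transformation described above (paths and field names are hypothetical, not the actual project code):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, to_date

    spark = SparkSession.builder.appName("legacy-erp-json-sketch").getOrCreate()

    # Read JSON exports from the legacy ERP system into a Spark DataFrame
    erp_df = spark.read.json("hdfs:///data/legacy_erp/export/*.json")

    # Light cleanup: drop records without a key and normalize the date column
    clean_df = (erp_df
                .filter(col("order_id").isNotNull())
                .withColumn("order_date", to_date(col("order_date"))))

    # Land the curated data back on HDFS for downstream Hive/Sqoop steps
    clean_df.write.mode("overwrite").parquet("hdfs:///data/curated/erp_orders/")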

Confidential, Irving, Texas

Spark Developer

Responsibilities:

  • Worked on an open-source cluster computing framework based on Apache Spark.
  • Participated in the design and development of large-scale changes to enterprise data warehouses.
  • Partnered with the solution and data architecture teams to create flexible, agile and impactful data solutions.
  • Collected real-time data from IoT devices installed in the trucks through Kafka into HDFS.
  • Partnered with the risk data management team to define business and data requirements.
  • Coordinated with the information management and business intelligence departments.
  • Designed and developed a new module used for predictive analysis and inferring data in a distributed environment, using Java, Hive, Sqoop and Spark.
  • Enhanced existing components to move the data-intensive system from a traditional setup to a highly available, scalable system.
  • Experience in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and Impala.
  • Optimized existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames and pair RDDs.
  • Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcast variables, and effective and efficient joins and transformations during the ingestion process itself (see the sketch following this section).
  • Analyzed SQL scripts and designed solutions to implement them using PySpark.
  • Used Hive effectively for ad hoc queries and instant results.
  • Prepared Low Level Design documents for all development involving minor and major changes.
  • Prepared High Level Design documents to give the overall picture of system integration.
  • Prepared unit test documents for each release, clearly indicating the steps followed while unit testing different scenarios and the results captured.
  • Debugged log files whenever a problem arose in the system and performed root cause analysis.
  • Reviewed code and suggested improvements.

Technologies Used: Scala, Python, Spark, Hive, Sqoop, Oracle, Cloudera, YARN, HDFS, Kafka, Impala, XML, XSL, UML, multi-threading, Servlets, Linux, ZooKeeper
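
A minimal sketch of the broadcast-join and partitioning pattern mentioned above (data sources, join keys and partition counts are illustrative assumptions):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.appName("ingestion-join-sketch").getOrCreate()

    # Large event data and a small reference/dimension table
    events = spark.read.parquet("hdfs:///data/truck_events/")
    devices = spark.read.parquet("hdfs:///data/reference/devices/")

    # Broadcast the small table so the join avoids shuffling the large side,
    # and repartition the large side on the join key before heavy transformations
    enriched = (events.repartition(200, "device_id")
                      .join(broadcast(devices), "device_id", "left"))

    enriched.cache()  # keep the joined data in memory for repeated queries
    enriched.write.mode("overwrite").parquet("hdfs:///data/curated/events_enriched/")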

Confidential, IL

Hadoop/ Spark Developer

Responsibilities:

  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Troubleshot the cluster by reviewing log files.
  • Involved in performance tuning of Spark applications, setting the right batch interval and tuning memory.
  • Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics (see the sketch following this section).
  • Created reports on cluster usage from DataNode, NameNode, ResourceManager and Navigator log data.
  • Imported data from Teradata using Sqoop with the Teradata connector.
  • Used Oozie to orchestrate the workflow.
  • Developed Spark programs in Java and Scala for faster data processing than standard MapReduce programs.
  • Created Hive tables and worked on them for data analysis to meet the business requirements.
  • Designed and implemented a large-scale parallel relation-learning system.
  • Installed and benchmarked Hadoop/HBase clusters for internal use.
  • Wrote HBase client programs in Java and web services.
  • Modeled, serialized and manipulated data in multiple forms (XML).
  • Shared responsibility for administration of Hadoop ecosystem.
  • Developed Splunk dashboards, searches and reporting
  • Experience with data modeling concepts: star schema dimensional modeling and relational (ER) design.

Technologies Used: Hadoop, MapReduce, HDFS, Splunk, Hive, Spark, Java, Scala, Cloudera, HBase, Linux, XML, MySQL Workbench, Java 6, Eclipse, Oracle 10g, PL/SQL, SQL*PLUS
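
A minimal sketch of the kind of Hive query used to compare fresh data against EDW reference tables (schema, table and column names are hypothetical):

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("trend-comparison-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Compare the latest day's sales with the historical daily average per product
    trends = spark.sql("""
        SELECT f.product_id,
               SUM(f.sales_amount)                          AS fresh_sales,
               MAX(h.avg_daily_sales)                       AS historical_avg,
               SUM(f.sales_amount) / MAX(h.avg_daily_sales) AS lift
        FROM   staging.daily_sales       f
        JOIN   edw.product_sales_history h ON f.product_id = h.product_id
        WHERE  f.sale_date = '2016-11-01'
        GROUP BY f.product_id
        ORDER BY lift DESC
        LIMIT 50
    """)
    trends.show(50, truncate=False)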

Confidential, San Francisco, CA

Hadoop Developer

Responsibilities:

  • Worked on the Hortonworks HDP 2.5 distribution.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Involved in importing data from Microsoft SQL Server, MySQL and Teradata into HDFS using Sqoop.
  • Played a key role in dynamic partitioning and bucketing of the data stored in Hive.
  • Wrote HiveQL queries integrating different tables to create views and produce result sets.
  • Collected log data from web servers and integrated it into HDFS using Flume.
  • Experienced in loading and transforming large sets of structured and unstructured data.
  • Used MapReduce programs for data cleaning and transformations and loaded the output into Hive tables in different file formats.
  • Wrote MapReduce programs to handle semi-structured and unstructured data such as JSON, Avro data files and sequence files for log files.
  • Created data pipelines for different events to load the data from DynamoDB into an AWS S3 bucket and then into an HDFS location.
  • Involved in loading data into the HBase NoSQL database.
  • Built, managed and scheduled Oozie workflows for end-to-end job processing.
  • Experienced in extending Hive and Pig core functionality by writing custom UDFs in Java.
  • Analyzed large volumes of structured data using Spark SQL.
  • Wrote shell scripts to execute HiveQL.
  • Used Spark as an ETL tool.
  • Wrote automated shell scripts in Linux/Unix environments using Bash.
  • Migrated HiveQL queries to Spark SQL to improve performance.
  • Extracted real-time feeds using Spark Streaming, converted them to RDDs, processed the data into DataFrames and loaded the data into HBase.
  • Experienced in using the DataStax Spark connector to store data into, and retrieve data from, a Cassandra database.
  • Extracted real-time feeds using Spark Streaming, converted them to RDDs, processed the data into DataFrames and loaded the data into Cassandra (a brief sketch follows this section).

Environment: Hortonworks, Hadoop, HDFS, Pig, Sqoop, Hive, Oozie, ZooKeeper, NoSQL, HBase, Shell Scripting, Scala, Spark, Spark SQL
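
A minimal sketch of the Spark Streaming path described above, from Kafka through RDD/DataFrame conversion into Cassandra via the DataStax connector (broker, topic, keyspace and table names are placeholders, not project values; assumes the spark-streaming-kafka and spark-cassandra-connector packages are on the classpath):

    from pyspark import SparkContext
    from pyspark.sql import SparkSession
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="streaming-cassandra-sketch")
    ssc = StreamingContext(sc, batchDuration=10)
    spark = SparkSession.builder.getOrCreate()

    # Direct Kafka stream of JSON messages
    stream = KafkaUtils.createDirectStream(
        ssc, ["truck_events"], {"metadata.broker.list": "broker1:9092"})

    def save_batch(rdd):
        if rdd.isEmpty():
            return
        # Each Kafka record is a (key, value) pair; parse the JSON values into a DataFrame
        df = spark.read.json(rdd.map(lambda kv: kv[1]))
        # Append the micro-batch to Cassandra through the DataStax Spark connector
        (df.write
           .format("org.apache.spark.sql.cassandra")
           .options(table="truck_events", keyspace="telemetry")
           .mode("append")
           .save())

    stream.foreachRDD(save_batch)
    ssc.start()
    ssc.awaitTermination()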

Confidential

Java Developer

Responsibilities:

  • Worked in an Agile development environment and participated in scrum meetings.
  • Developed web pages using JSF framework establishing communication between various pages in application.
  • Designed and developed JSP pages using Struts framework.
  • Utilized the Tiles framework for page layouts.
  • Involved in writing client-side validations using JavaScript.
  • Used Hibernate framework to persist the employer work hours to the database.
  • Spring framework AOP features were extensively used.
  • Followed Use Case Design Specification and developed Class and Sequence Diagrams using RAD, MS Visio.
  • Used JavaScript and AJAX to make calls to controllers that fetch a file from the server and display it in a popup without losing the attributes of the page.
  • Coded Test Cases and created Mock Objects using JMock and used JUnit to run tests.
  • Configured Hudson and integrated it with CVS to automatically run test cases with every build and generate code coverage report.
  • Configured Data Source on WebLogic Application server for connecting to Oracle, DB2 Databases.
  • Wrote complex SQL statements and used PL/SQL for performing database operations with the help of TOAD.
  • Created User interface for Testing team which helped them efficiently test executables.
  • Mentored co-developers with new technologies. Participated in Code reviews.
  • Worked on a DataStage project which generates automated daily reports after performing various validations.

Environment: UNIX, RAD 6.0, WebLogic, Oracle, Maven, JavaScript, JSF, JSP, Servlets, Log4J, Spring, Pure Query, JMock, JUnit, TOAD, MS Visio, DataStage, CVS, SVN, UML and SOAPUI.

Confidential

Jr. Java Developer

Responsibilities:

  • Involved in Design, Development and Support phases of Software Development Life Cycle (SDLC)
  • Gathered data for requirements and use case development.
  • Reviewed functional, design, source code and test specifications.
  • Involved in developing the complete frontend using JavaScript and CSS.
  • Implemented backend configuration of the DAO and XML generation modules of DIS.
  • Used JDBC for database access, and used Data Transfer Object (DTO) design patterns
  • Performed unit testing and rigorous integration testing of the whole application.
  • Wrote and executed test scripts using JUnit and was actively involved in system testing.
  • Developed XML parsing tool for regression testing
  • Worked on documentation that meets with required compliance standards. Also, monitored end-to-end testing activities.

Technologies Used: Java, JavaScript, HTML, CSS, JDK 1.5.1, JDBC, Oracle 10g, XML, XSL, Solaris and UML.
