Big Data Developer Resume
Arizona
SUMMARY:
- Big Data developer with 5+ years of experience in the software industry, including experience developing applications in Java.
- Expertise in installing, configuring and administering clusters of major Hadoop distributions.
- Hands-on experience in installing, configuring and using Hadoop ecosystem components such as MapReduce, HDFS, HBase, Hive, Sqoop, Pig, ZooKeeper, Spark, Kafka and Storm.
- Skilled in managing and reviewing Hadoop log files.
- Familiarity with AWS cloud services (VPC, EC2, S3, RDS, EMR).
- Expert in writing Java, Scala and Python MapReduce jobs
- Expert in working with the Hive data warehouse: creating tables, distributing data through partitioning and bucketing, loading data into partitions and buckets, and writing and optimizing HiveQL queries (see the sketch at the end of this summary).
- Experience in using Apache Sqoop to import and export data between HDFS, Hive and relational databases.
- Skilled in writing Pig scripts (including Python UDFs), Hive queries and Hive scripts.
- In-depth understanding of Data Structures and Algorithms.
- Experience in Hadoop MapReduce programming, Pig Latin, HiveQL and HDFS.
- Experience with the Oozie workflow engine to run multiple Hive and Pig jobs, triggered by time and data availability.
- Experience in writing shell and automation scripts (Bash, SSH, Python).
- Strong Java/JEE application development background with experience in defining technical and functional specifications
- Proficient in developing strategies for Extraction, Transformation and Loading (ETL) mechanism and UNIX shell scripting.
- Experienced with source control repositories such as SVN and GitHub.
- Experience with PySpark, Python, Scala programming languages
- Experienced in detailed system design using use case analysis and functional analysis, modeling programs with UML class, sequence, activity and state diagrams.
- Worked with data warehouse architecture and design: star schema, snowflake schema, fact and dimension tables, and physical and logical data modeling.
- Designed Mapping documents for Big Data Application.
- Expertise in successful implementation of projects following the Software Development Life Cycle, including documentation, implementation, unit testing, system testing, build and release.
- Experience with databases such as Oracle 9i/10g, MySQL and SQL Server.
- Experience using Agile and Extreme Programming methodologies.
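A minimal sketch of the Hive partitioning and bucketing referenced above, shown here through PySpark's Hive support rather than raw HiveQL DDL. The database, table, column and path names (analytics.sales, customer_id, /data/staging/...) are hypothetical examples, not details of any project listed on this resume.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-partition-bucket-sketch")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("CREATE DATABASE IF NOT EXISTS analytics")

# Staging data is assumed to contain sale_date, customer_id, order_id, amount.
staging = spark.read.parquet("/data/staging/sales/2017-01-01")

# Partition by a low-cardinality column and bucket by a frequent join key.
(staging.write
    .mode("overwrite")
    .partitionBy("sale_date")          # one directory per sale_date value
    .bucketBy(32, "customer_id")       # 32 buckets on the join key
    .sortBy("customer_id")
    .format("orc")
    .saveAsTable("analytics.sales"))

# Queries can then prune partitions and avoid shuffles on the bucketed key.
spark.sql("""
    SELECT customer_id, SUM(amount) AS total
    FROM analytics.sales
    WHERE sale_date = '2017-01-01'
    GROUP BY customer_id
""").show()
```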
TECHNICAL SKILLS:
Big Data Ecosystem: HDFS, MapReduce, Hive, Pig, HBase, Spark, Scala, Cloudera CDH4, CDH5, Hadoop Streaming, ZooKeeper, Oozie, Sqoop, Flume, Kafka and Storm.
NoSQL: HBase, MongoDB
Languages: Java/J2EE, C, C++, SQL, Shell Scripting
Web Technologies: Servlets, JSP, HTML, CSS, JavaScript, jQuery, SOAP, Amazon AWS
Frameworks: Apache Struts 2.X, Spring, Hibernate, MVC
Applications & Web Servers: Apache Tomcat 5.X/6.X, IBM WebSphere, JBoss
IDEs/Utilities: Eclipse EE, NetBeans, PuTTY, Visual Studio
DBMS/RDBMS: Oracle 11g/10g/9i, SQL Server 2012/2008, MySQL
Operating Systems: Windows, UNIX, Linux, Macintosh
Version Control: SVN, CVS and Rational Clear Case Remote Client V7.0.1, GitHub
Tools: FileZilla, PuTTY, TOAD SQL Client, MySQL Workbench, JUnit, Oracle SQL Developer, WinSCP, Tahiti Viewer, Cygwin
Others: UNIX Shell Scripting
PROFESSIONAL EXPERIENCE:
Confidential, Arizona
Big Data Developer
- Application developer for Big Data tools: Java/J2EE, MapReduce, Hive, Spark, Spring Boot
- Experience in Big Data Analytics and development
- Strong experience on Hadoop distributions like Cloudera
- Experience in developing MapReduce jobs with the Java API in Hadoop
- Experience in designing and developing Spark applications in Python to compare the performance of Spark with Hive, SQL/Oracle and NoSQL databases on YARN.
- Also involved in data migration from gcpapollo to Enterprise Salesforce
- Exported data from HDFS environment into RDBMS using Sqoop for report generation and visualization purpose.
- Experienced with major Hadoop ecosystem projects such as Pig, Hive and HBase, and monitoring them with Cloudera Manager.
- Designed, developed and deployed an Apache Spark framework used to create test data by statistically modeling and analyzing production data
- Designed, developed and deployed Java Spring REST APIs to connect to the Spark framework so that test data could be generated on demand
- Experience in manipulating/analyzing large datasets and finding patterns and insights within structured and unstructured data.
- Experience in migrating the data using Hive from Gcpapollo to Esodl and vice-versa.
- Experience in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and SQL.
- Good understanding of NoSQL databases and hands on work experience in writing applications on NoSQL databases like HBase.
- Experienced in using agile approaches, including Extreme Programming, Test-Driven Development and Agile Scrum.
- Worked on data extraction, transformation and aggregation from multiple file formats, including XML, JSON, CSV and other compressed file formats.
- Used Shell Scripting in Linux to configure the Sqoop and Hive tasks required for the data pipeline flow
- Developed data transformation modules in Python to convert JSON files into Spark DataFrames for data from legacy ERP systems (a short PySpark sketch follows this list).
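A minimal sketch of the JSON-to-DataFrame transformation module mentioned in the last bullet. The paths, schema fields and output location are hypothetical placeholders, not the actual legacy ERP feed.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("erp-json-ingest-sketch").getOrCreate()

# Declaring the schema avoids a full scan for inference and catches drift early.
schema = StructType([
    StructField("order_id", StringType()),
    StructField("order_date", StringType()),
    StructField("amount", DoubleType()),
    StructField("currency", StringType()),
])

raw = spark.read.schema(schema).json("/landing/legacy_erp/orders/*.json")

# Light cleanup: normalize dates, drop records missing the business key.
orders = (raw
    .withColumn("order_date", F.to_date("order_date", "yyyy-MM-dd"))
    .filter(F.col("order_id").isNotNull()))

orders.write.mode("append").parquet("/curated/erp/orders")
```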
Technologies Used: Cloudera Hadoop, Cassandra, flat files, Oracle 11g/10g, MySQL, MemSQL, Sqoop, Python, Java, PySpark, Kafka, Hive, UNIX shell scripts, YARN, ZooKeeper, SQL, MapReduce, Pig, HBase, UNIX.
Confidential, Irving, Texas
Spark Developer
Responsibilities:
- Worked on an open-source cluster computing framework based on Apache Spark.
- Participated in the design and development of large-scale changes to enterprise data warehouses.
- Partnered with the solution and data architecture teams to create flexible, agile and impactful data solutions.
- Collected real-time data from IoT devices installed in trucks through Kafka into HDFS.
- Partnered with the risk data management team to define business and data requirements.
- Coordinated with the information management and business intelligence departments.
- Designed and developed a new module which will be used for doing predictive analysis and inferring the data in distributed environment. Used Java, Hive, Sqoop, Spark.
- Enhanced existing components to work on the Data Intensive System from traditional to High availability Scalable system.
- Experience in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and Impala.
- Optimized existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames and pair RDDs.
- Experienced in handling large datasets during ingestion using partitioning, Spark's in-memory capabilities, broadcast variables, and effective and efficient joins and transformations (see the join-tuning sketch after this list).
- Analyzed existing SQL scripts and designed solutions to implement them using PySpark.
- Effective utilization of Hive for ad hoc queries and instant results.
- Prepared low-level design documents for all development, covering minor and major changes.
- Prepared high-level design documents to give an overall picture of system integration.
- Prepared unit test documents for each release, clearly indicating the steps followed and the scenarios covered during unit testing.
- Debugged log files whenever a problem appeared in the system and performed root cause analysis.
- Reviewed code and suggested improvements.
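A minimal sketch of the join and partition tuning described above (broadcast joins, repartitioning, caching). Dataset names, paths and columns are hypothetical examples, not the production pipeline.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("spark-join-tuning-sketch").getOrCreate()

events = spark.read.parquet("/data/events")      # large fact data
devices = spark.read.parquet("/data/devices")    # small lookup table

# Broadcast the small side so the join avoids shuffling the large side.
enriched = events.join(F.broadcast(devices), on="device_id", how="left")

# Repartition on the aggregation key to balance the shuffle, then cache the
# result because it is reused by several downstream reports.
daily = (enriched
    .repartition(200, "device_id")
    .groupBy("device_id", "event_date")
    .agg(F.count("*").alias("events"), F.avg("speed_mph").alias("avg_speed"))
    .cache())

daily.write.mode("overwrite").parquet("/data/reports/daily_device_stats")
```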
Technologies Used: Scala, Python, Spark, Hive, Sqoop, Oracle, Cloudera, YARN, HDFS, Kafka, Impala, XML, XSL, UML, Multi-threading, Servlets, Linux, ZooKeeper
Confidential, IL
Hadoop/ Spark Developer
Responsibilities:
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing (a Hadoop Streaming sketch of this kind of cleaning step follows this list).
- Troubleshooting the cluster by reviewing Log files.
- Involved in performance tuning of spark applications for fixing right batch interval time and memory tuning.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Created reports for cluster usage using DataNode, NameNode, ResourceManager and Navigator log data.
- Imported data using Sqoop from Teradata using Teradata connector.
- Used Oozie to orchestrate the work flow.
- Developed Spark programs in Java and Scala for faster data processing than standard MapReduce programs.
- Created Hive tables and worked on them for data analysis to meet the business requirements.
- Designed and implemented a large-scale parallel relation-learning system.
- Installed and benchmarked Hadoop/HBase clusters for internal use.
- Wrote HBase client programs in Java and web services.
- Modeled, serialized and manipulated data in multiple forms (XML).
- Shared responsibility for administration of Hadoop ecosystem.
- Developed Splunk dashboards, searches and reporting
- Experience with data modeling concepts: star schema dimensional modeling and relational (ER) design.
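The cleaning jobs referenced above were written as Java MapReduce; the sketch below shows the same kind of cleaning step as a Hadoop Streaming mapper in Python (Hadoop Streaming appears in the skills list). The column layout is a made-up example, not the actual source feed.

```python
#!/usr/bin/env python
# clean_mapper.py - map-only data-cleaning pass for Hadoop Streaming.
# Submitted roughly as:
#   hadoop jar /path/to/hadoop-streaming.jar \
#       -D mapreduce.job.reduces=0 \
#       -input /raw/weblogs -output /clean/weblogs \
#       -mapper clean_mapper.py -file clean_mapper.py
import sys

EXPECTED_FIELDS = 5  # e.g. user_id, ts, url, status, bytes (hypothetical)

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    # Drop malformed or truncated records.
    if len(fields) != EXPECTED_FIELDS:
        continue
    user_id, ts, url, status, nbytes = fields
    # Drop records without a usable key; normalize casing and whitespace.
    if not user_id.strip():
        continue
    print("\t".join([user_id.strip(), ts.strip(), url.strip().lower(),
                     status.strip(), nbytes.strip()]))
```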
Technologies Used: Hadoop, MapReduce, HDFS, Splunk, Hive, Spark, Java 6, Scala, Cloudera, HBase, Linux, XML, MySQL Workbench, Eclipse, Oracle 10g, PL/SQL, SQL*Plus
Confidential, San Francisco, CA
Hadoop Developer
Responsibilities:
- Worked on the Hortonworks HDP 2.5 distribution.
- Responsible for building scalable, distributed data solutions using Hadoop.
- Involved in importing data from Microsoft SQL Server, MySQL and Teradata into HDFS using Sqoop.
- Played a key role in dynamic partitioning and bucketing of the data stored in Hive.
- Wrote HiveQL queries integrating different tables and creating views to produce result sets.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Experienced in loading and transforming large sets of structured and unstructured data.
- Used MapReduce programs for data cleaning and transformations and loaded the output into Hive tables in different file formats.
- Wrote MapReduce programs to handle semi-structured and unstructured data such as JSON, Avro data files and sequence files for log files.
- Created data pipelines for different events to load data from DynamoDB into an AWS S3 bucket and then into an HDFS location.
- Involved in loading data into the HBase NoSQL database.
- Built, managed and scheduled Oozie workflows for end-to-end job processing.
- Experienced in extending Hive and Pig core functionality by writing custom UDFs in Java.
- Analyzed large volumes of structured data using Spark SQL.
- Wrote shell scripts to execute HiveQL.
- Used Spark as an ETL tool.
- Wrote automated shell scripts in Linux/UNIX environments using Bash.
- Migrated HiveQL queries to Spark SQL to improve performance.
- Extracted real-time feeds using Spark Streaming, converted them to RDDs, processed the data into DataFrames and loaded it into HBase.
- Experienced in using the DataStax Spark connector to store data into and retrieve data from Cassandra.
- Extracted real-time feeds using Spark Streaming, converted them to RDDs, processed the data into DataFrames and loaded them into Cassandra (see the streaming sketch after this list).
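A minimal sketch of the Kafka-to-Cassandra flow described in the last bullets, written with Structured Streaming and the DataStax Spark Cassandra connector rather than the DStream/RDD path (both packages must be on the Spark classpath). Broker, topic, keyspace and column names are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = (SparkSession.builder
         .appName("kafka-to-cassandra-sketch")
         .config("spark.cassandra.connection.host", "cassandra-host")
         .getOrCreate())

schema = StructType([
    StructField("sensor_id", StringType()),
    StructField("event_time", StringType()),
    StructField("reading", DoubleType()),
])

# Read the raw feed from Kafka and parse the JSON payload into columns.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "sensor-events")
          .load()
          .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

def write_to_cassandra(batch_df, batch_id):
    # Each micro-batch is appended to Cassandra via the DataStax connector.
    (batch_df.write
        .format("org.apache.spark.sql.cassandra")
        .options(keyspace="telemetry", table="sensor_events")
        .mode("append")
        .save())

query = events.writeStream.foreachBatch(write_to_cassandra).start()
query.awaitTermination()
```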
- Environment: Hortonworks, Hadoop, HDFS, Pig, Sqoop, Hive, Oozie, ZooKeeper, NoSQL, HBase, Shell Scripting, Scala, Spark, Spark SQL
Confidential
Java Developer
Responsibilities:
- Worked in an Agile development environment and participated in Scrum meetings.
- Developed web pages using JSF framework establishing communication between various pages in application.
- Designed and developed JSP pages using Struts framework.
- Utilized the Tiles framework for page layouts.
- Involved in writing client-side validations using JavaScript.
- Used Hibernate framework to persist the employer work hours to the database.
- Spring framework AOP features were extensively used.
- Followed Use Case Design Specification and developed Class and Sequence Diagrams using RAD, MS Visio.
- Used JavaScript and AJAX to make calls to controllers that fetch a file from the server and display it in a popup without losing the attributes of the page.
- Coded Test Cases and created Mock Objects using JMock and used JUnit to run tests.
- Configured Hudson and integrated it with CVS to automatically run test cases with every build and generate code coverage report.
- Configured Data Source on WebLogic Application server for connecting to Oracle, DB2 Databases.
- Wrote complex SQL statements and used PL/SQL for performing database operations with the help of TOAD.
- Created User interface for Testing team which helped them efficiently test executables.
- Mentored co-developers with new technologies. Participated in Code reviews.
- Worked on a DataStage project which generates automated daily reports after performing various validations.
Environment: UNIX, RAD 6.0, WebLogic, Oracle, Maven, JavaScript, JSF, JSP, Servlets, Log4J, Spring, Pure Query, JMock, JUnit, TOAD, MS Visio, DataStage, CVS, SVN, UML and SOAPUI.
Confidential
Jr. Java Developer
Responsibilities:
- Involved in Design, Development and Support phases of Software Development Life Cycle (SDLC)
- Gathered data for requirements and use case development.
- Reviewed the functional, design, source code and test specifications
- Involved in complete frontend development using JavaScript and CSS.
- Implemented backend configuration of the DAO and XML generation modules of DIS.
- Used JDBC for database access, and used Data Transfer Object (DTO) design patterns
- Performed unit testing and rigorous integration testing of the whole application.
- Wrote and executed test scripts using JUnit and was actively involved in system testing.
- Developed XML parsing tool for regression testing
- Worked on documentation that meets with required compliance standards. Also, monitored end-to-end testing activities.
Technologies Used: Java, JavaScript, HTML, CSS, JDK 1.5.1, JDBC, Oracle 10g, XML, XSL, Solaris and UML.