Big Data Developer Resume
Arizona
SUMMARY:
- Big Data developer with 5+ years of experience in the software industry, including experience developing applications in Java.
- Expertise in installing, configuring and administering clusters of major Hadoop distributions.
- Hands-on experience in installing, configuring and using Hadoop ecosystem components such as MapReduce, HDFS, HBase, Hive, Sqoop, Pig, ZooKeeper, Spark, Kafka and Storm.
- Skilled in managing and reviewing Hadoop log files.
- Familiarity with AWS cloud services (VPC, EC2, S3, RDS, EMR).
- Expert in writing Java, Scala and Python MapReduce jobs
- Expert in working with the Hive data warehouse: creating tables, distributing data through partitioning and bucketing, loading data into partitions and buckets, and writing and optimizing HiveQL queries (see the sketch at the end of this summary).
- Experience in using Apache Sqoop to import and export data between HDFS, Hive and relational databases.
- Skilled in writing Pig scripts (including Python UDFs), Hive queries and Hive scripts.
- In-depth understanding of Data Structures and Algorithms.
- Experience in Hadoop MapReduce programming, Pig Latin, HiveQL and HDFS.
- Experience with the Oozie workflow engine to run multiple Hive and Pig jobs, triggered by time and data availability.
- Experience in writing shell and automation scripts (Bash, SSH, Python).
- Strong Java/JEE application development background with experience in defining technical and functional specifications
- Proficient in developing strategies for Extraction, Transformation and Loading (ETL) mechanism and UNIX shell scripting.
- Experienced with source control repositories such as SVN and GitHub.
- Experience with PySpark, Python, Scala programming languages
- Experienced in detailed system design using use case analysis and functional analysis, modeling programs with UML class, sequence, activity and state diagrams.
- Worked with data warehouse architecture and design: star schema, snowflake schema, fact and dimension tables, and physical and logical data modeling.
- Designed Mapping documents for Big Data Application.
- Expertise in successful implementation of projects following the Software Development Life Cycle, including documentation, implementation, unit testing, system testing, build and release.
- Experience with databases such as Oracle 9i/10g, MySQL and SQL Server.
- Experience using Agile and Extreme Programming methodologies.
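A minimal sketch of the Hive partitioning and bucketing referenced above, shown here through PySpark's Hive support rather than raw HiveQL DDL. The database, table, column and path names (analytics.sales, customer_id, /data/staging/...) are hypothetical examples, not details of any project listed on this resume.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-partition-bucket-sketch")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("CREATE DATABASE IF NOT EXISTS analytics")

# Staging data is assumed to contain sale_date, customer_id, order_id, amount.
staging = spark.read.parquet("/data/staging/sales/2017-01-01")

# Partition by a low-cardinality column and bucket by a frequent join key.
(staging.write
    .mode("overwrite")
    .partitionBy("sale_date")          # one directory per sale_date value
    .bucketBy(32, "customer_id")       # 32 buckets on the join key
    .sortBy("customer_id")
    .format("orc")
    .saveAsTable("analytics.sales"))

# Queries can then prune partitions and avoid shuffles on the bucketed key.
spark.sql("""
    SELECT customer_id, SUM(amount) AS total
    FROM analytics.sales
    WHERE sale_date = '2017-01-01'
    GROUP BY customer_id
""").show()
```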
TECHNICAL SKILLS:
Big Data Ecosystem: HDFS, MapReduce, Hive, Pig, HBase, Spark, Scala, Cloudera CDH4, CDH5, Hadoop Streaming, ZooKeeper, Oozie, Sqoop, Flume, Kafka and Storm.
NoSQL: HBase, MongoDB
Languages: Java/J2EE, C, C++, SQL, Shell Scripting
Web Technologies: Servlets, JSP, HTML, CSS, JavaScript, jQuery, SOAP, Amazon AWS
Frameworks: Apache Struts 2.X, Spring, Hibernate, MVC
Applications & Web Servers: Apache Tomcat 5.X/6.X, IBM WebSphere, JBoss
IDEs/Utilities: Eclipse EE, NetBeans, PuTTY, Visual Studio
DBMS/RDBMS: Oracle 11g/10g/9i, SQL Server 2012/2008, MySQL
Operating Systems: Windows, UNIX, Linux, Macintosh
Version Control: SVN, CVS and Rational Clear Case Remote Client V7.0.1, GitHub
Tools: FileZilla, PuTTY, TOAD SQL Client, MySQL Workbench, JUnit, Oracle SQL Developer, WinSCP, Tahiti Viewer, Cygwin
Others: UNIX Shell Scripting
PROFESSIONAL EXPERIENCE:
Confidential, Arizona
Big Data Developer
- Application developer for Big Data tools: Java/J2EE, MapReduce, Hive, Spark, Spring Boot
- Experience in Big Data Analytics and development
- Strong experience on Hadoop distributions like Cloudera
- Experience in developing MapReduce jobs with the Java API in Hadoop
- Experience in designing and developing Spark applications in Python to compare the performance of Spark with Hive, SQL/Oracle and NoSQL databases on YARN.
- Also involved in data migration from gcpapollo to Enterprise Salesforce
- Exported data from HDFS environment into RDBMS using Sqoop for report generation and visualization purpose.
- Experienced with major Hadoop ecosystem projects such as Pig, Hive and HBase, and monitoring them with Cloudera Manager.
- Designed, developed and deployed an Apache Spark framework used to create test data by statistically modeling and analyzing production data
- Designed, developed and deployed Java Spring REST APIs to connect to the Spark framework so that test data could be generated on demand
- Experience in manipulating/analyzing large datasets and finding patterns and insights within structured and unstructured data.
- Experience in migrating the data using Hive from Gcpapollo to Esodl and vice-versa.
- Experience in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and SQL.
- Good understanding of NoSQL databases and hands on work experience in writing applications on NoSQL databases like HBase.
- Experienced in using agile approaches, including Extreme Programming, Test-Driven Development and Agile Scrum.
- Worked on data extraction, transformation and aggregation from multiple file formats, including XML, JSON, CSV and other compressed file formats.
- Used Shell Scripting in Linux to configure the Sqoop and Hive tasks required for the data pipeline flow
- Developed data transformation modules in Python to convert JSON files into Spark DataFrames for data from legacy ERP systems (a short PySpark sketch follows this list).
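A minimal sketch of the JSON-to-DataFrame transformation module mentioned in the last bullet. The paths, schema fields and output location are hypothetical placeholders, not the actual legacy ERP feed.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("erp-json-ingest-sketch").getOrCreate()

# Declaring the schema avoids a full scan for inference and catches drift early.
schema = StructType([
    StructField("order_id", StringType()),
    StructField("order_date", StringType()),
    StructField("amount", DoubleType()),
    StructField("currency", StringType()),
])

raw = spark.read.schema(schema).json("/landing/legacy_erp/orders/*.json")

# Light cleanup: normalize dates, drop records missing the business key.
orders = (raw
    .withColumn("order_date", F.to_date("order_date", "yyyy-MM-dd"))
    .filter(F.col("order_id").isNotNull()))

orders.write.mode("append").parquet("/curated/erp/orders")
```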
Technologies Used: Cloudera Hadoop, Cassandra, flat files, Oracle 11g/10g, MySQL, MemSQL, Sqoop, Python, Java, PySpark, Kafka, Hive, UNIX shell scripts, YARN, ZooKeeper, SQL, MapReduce, Pig, HBase, UNIX.
Confidential, Irving, Texas
Spark Developer
Responsibilities:
- Worked on an open-source cluster computing framework based on Apache Spark.
- Participated in the design and development of large-scale changes to enterprise data warehouses.
- Partnered with the solution and data architecture teams to create flexible, agile and impactful data solutions.
- Collected real-time data from IoT devices installed in trucks through Kafka into HDFS.
- Partnered with the risk data management team to define business and data requirements.
- Coordinated with the information management and business intelligence departments.
- Designed and developed a new module which will be used for doing predictive analysis and inferring the data in distributed environment. Used Java, Hive, Sqoop, Spark.
- Enhanced existing components to work on the Data Intensive System from traditional to High availability Scalable system.
- Experience in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and Impala.
- Optimized existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames and pair RDDs.
- Experienced in handling large datasets during ingestion using partitioning, Spark's in-memory capabilities, broadcast variables, and effective and efficient joins and transformations (see the join-tuning sketch after this list).
- Analyzed existing SQL scripts and designed solutions to implement them using PySpark.
- Effective utilization of Hive for ad hoc queries and instant results.
- Prepared low-level design documents for all development, covering minor and major changes.
- Prepared high-level design documents to give an overall picture of system integration.
- Prepared unit test documents for each release, clearly indicating the steps followed and the scenarios covered during unit testing.
- Debugged log files whenever a problem appeared in the system and performed root cause analysis.
- Reviewed code and suggested improvements.
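A minimal sketch of the join and partition tuning described above (broadcast joins, repartitioning, caching). Dataset names, paths and columns are hypothetical examples, not the production pipeline.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("spark-join-tuning-sketch").getOrCreate()

events = spark.read.parquet("/data/events")      # large fact data
devices = spark.read.parquet("/data/devices")    # small lookup table

# Broadcast the small side so the join avoids shuffling the large side.
enriched = events.join(F.broadcast(devices), on="device_id", how="left")

# Repartition on the aggregation key to balance the shuffle, then cache the
# result because it is reused by several downstream reports.
daily = (enriched
    .repartition(200, "device_id")
    .groupBy("device_id", "event_date")
    .agg(F.count("*").alias("events"), F.avg("speed_mph").alias("avg_speed"))
    .cache())

daily.write.mode("overwrite").parquet("/data/reports/daily_device_stats")
```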
Technologies Used: Scala, Python, Spark, Hive, Sqoop, Oracle, Cloudera, YARN, HDFS, Kafka, Impala, XML, XSL, UML, Multi-threading, Servlets, Linux, ZooKeeper
Confidential, IL
Hadoop/ Spark Developer
Responsibilities:
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing (a Hadoop Streaming sketch of this kind of cleaning step follows this list).
- Troubleshooting the cluster by reviewing Log files.
- Involved in performance tuning of spark applications for fixing right batch interval time and memory tuning.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Created reports for cluster usage using DataNode, NameNode, ResourceManager and Navigator log data.
- Imported data using Sqoop from Teradata using Teradata connector.
- Used Oozie to orchestrate the work flow.
- Developed Spark programs in Java and Scala for faster data processing than standard MapReduce programs.
- Created Hive tables and worked on them for data analysis to meet the business requirements.
- Designed and implemented a large-scale parallel relation-learning system.
- Installed and benchmarked Hadoop/HBase clusters for internal use.
- Wrote HBase client programs in Java and web services.
- Modeled, serialized and manipulated data in multiple forms (XML).
- Shared responsibility for administration of Hadoop ecosystem.
- Developed Splunk dashboards, searches and reporting
- Experience with data modeling concepts: star schema dimensional modeling and relational (ER) design.
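The cleaning jobs referenced above were written as Java MapReduce; the sketch below shows the same kind of cleaning step as a Hadoop Streaming mapper in Python (Hadoop Streaming appears in the skills list). The column layout is a made-up example, not the actual source feed.

```python
#!/usr/bin/env python
# clean_mapper.py - map-only data-cleaning pass for Hadoop Streaming.
# Submitted roughly as:
#   hadoop jar /path/to/hadoop-streaming.jar \
#       -D mapreduce.job.reduces=0 \
#       -input /raw/weblogs -output /clean/weblogs \
#       -mapper clean_mapper.py -file clean_mapper.py
import sys

EXPECTED_FIELDS = 5  # e.g. user_id, ts, url, status, bytes (hypothetical)

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    # Drop malformed or truncated records.
    if len(fields) != EXPECTED_FIELDS:
        continue
    user_id, ts, url, status, nbytes = fields
    # Drop records without a usable key; normalize casing and whitespace.
    if not user_id.strip():
        continue
    print("\t".join([user_id.strip(), ts.strip(), url.strip().lower(),
                     status.strip(), nbytes.strip()]))
```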
Technologies Used: Hadoop, MapReduce, HDFS, Splunk, Hive, Spark, Java 6, Scala, Cloudera, HBase, Linux, XML, MySQL Workbench, Eclipse, Oracle 10g, PL/SQL, SQL*Plus
Confidential, San Francisco, CA
Hadoop Developer
Responsibilities:
- Worked on the Hortonworks HDP 2.5 distribution.
- Responsible for building scalable, distributed data solutions using Hadoop.
- Involved in importing data from Microsoft SQL Server, MySQL and Teradata into HDFS using Sqoop.
- Played a key role in dynamic partitioning and bucketing of the data stored in Hive.
- Wrote HiveQL queries integrating different tables and creating views to produce result sets.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Experienced in loading and transforming large sets of structured and unstructured data.
- Used MapReduce programs for data cleaning and transformations and loaded the output into Hive tables in different file formats.
- Wrote MapReduce programs to handle semi-structured and unstructured data such as JSON, Avro data files and sequence files for log files.
- Created data pipelines for different events to load data from DynamoDB into an AWS S3 bucket and then into an HDFS location.
- Involved in loading data into the HBase NoSQL database.
- Built, managed and scheduled Oozie workflows for end-to-end job processing.
- Experienced in extending Hive and Pig core functionality by writing custom UDFs in Java.
- Analyzed large volumes of structured data using Spark SQL.
- Wrote shell scripts to execute HiveQL.
- Used Spark as an ETL tool.
- Wrote automated shell scripts in Linux/UNIX environments using Bash.
- Migrated HiveQL queries to Spark SQL to improve performance.
- Extracted real-time feeds using Spark Streaming, converted them to RDDs, processed the data into DataFrames and loaded it into HBase.
- Experienced in using the DataStax Spark connector to store data into and retrieve data from Cassandra.
- Extracted real-time feeds using Spark Streaming, converted them to RDDs, processed the data into DataFrames and loaded them into Cassandra (see the streaming sketch after this list).
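A minimal sketch of the Kafka-to-Cassandra flow described in the last bullets, written with Structured Streaming and the DataStax Spark Cassandra connector rather than the DStream/RDD path (both packages must be on the Spark classpath). Broker, topic, keyspace and column names are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = (SparkSession.builder
         .appName("kafka-to-cassandra-sketch")
         .config("spark.cassandra.connection.host", "cassandra-host")
         .getOrCreate())

schema = StructType([
    StructField("sensor_id", StringType()),
    StructField("event_time", StringType()),
    StructField("reading", DoubleType()),
])

# Read the raw feed from Kafka and parse the JSON payload into columns.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "sensor-events")
          .load()
          .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

def write_to_cassandra(batch_df, batch_id):
    # Each micro-batch is appended to Cassandra via the DataStax connector.
    (batch_df.write
        .format("org.apache.spark.sql.cassandra")
        .options(keyspace="telemetry", table="sensor_events")
        .mode("append")
        .save())

query = events.writeStream.foreachBatch(write_to_cassandra).start()
query.awaitTermination()
```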
- Environment: Hortonworks, Hadoop, HDFS, Pig, Sqoop, Hive, Oozie, ZooKeeper, NoSQL, HBase, Shell Scripting, Scala, Spark, Spark SQL
Confidential
Java Developer
Responsibilities:
- Worked in an Agile development environment and participated in Scrum meetings.
- Developed web pages using JSF framework establishing communication between various pages in application.
- Designed and developed JSP pages using Struts framework.
- Utilized the Tiles framework for page layouts.
- Involved in writing client-side validations using JavaScript.
- Used Hibernate framework to persist the employer work hours to the database.
- Spring framework AOP features were extensively used.
- Followed Use Case Design Specification and developed Class and Sequence Diagrams using RAD, MS Visio.
- Used JavaScript and AJAX to make calls to controllers that fetch a file from the server and display it in a popup without losing the attributes of the page.
- Coded Test Cases and created Mock Objects using JMock and used JUnit to run tests.
- Configured Hudson and integrated it with CVS to automatically run test cases with every build and generate code coverage report.
- Configured Data Source on WebLogic Application server for connecting to Oracle, DB2 Databases.
- Wrote complex SQL statements and used PL/SQL for performing database operations with the help of TOAD.
- Created User interface for Testing team which helped them efficiently test executables.
- Mentored co-developers with new technologies. Participated in Code reviews.
- Worked on a DataStage project which generates automated daily reports after performing various validations.
Environment: UNIX, RAD 6.0, WebLogic, Oracle, Maven, JavaScript, JSF, JSP, Servlets, Log4J, Spring, Pure Query, JMock, JUnit, TOAD, MS Visio, DataStage, CVS, SVN, UML and SOAPUI.
Confidential
Jr. Java Developer
Responsibilities:
- Involved in Design, Development and Support phases of Software Development Life Cycle (SDLC)
- Gathered data for requirements and use case development.
- Reviewed the functional, design, source code and test specifications
- Involved in complete frontend development using JavaScript and CSS.
- Implemented backend configuration of the DAO and XML generation modules of DIS.
- Used JDBC for database access, and used Data Transfer Object (DTO) design patterns
- Performed unit testing and rigorous integration testing of the whole application.
- Wrote and executed test scripts using JUnit and was actively involved in system testing.
- Developed XML parsing tool for regression testing
- Worked on documentation that meets with required compliance standards. Also, monitored end-to-end testing activities.
Technologies Used: Java, JavaScript, HTML, CSS, JDK 1.5.1, JDBC, Oracle 10g, XML, XSL, Solaris and UML.