Big Data Software Engineer Resume
Pleasanton, CA
SUMMARY
- Around 8 years of IT experience in software development and support, with experience in developing strategic methods for deploying Big Data technologies to efficiently solve Big Data processing requirements
- Good hands-on experience with Hadoop ecosystem components: HDFS, MapReduce, HBase, YARN, Pig, Spark, Sqoop, Spark SQL, Spark Streaming, and Hive
- Experience in installing, maintaining, and configuring Hadoop clusters
- Efficient in processing large sets of structured, semi-structured, and unstructured data and supporting systems application architecture
- Capable of creating and monitoring Hadoop clusters on Amazon EC2, Hortonworks Data Platform 2.1 & 2.2, VMs, and CDH3/CDH4 with Cloudera Manager on Linux (Ubuntu and other distributions)
- Experience working with structured data using Hive, Hive UDFs, join operations, partitions, bucketing, and internal/external tables (a brief Hive UDF sketch follows this summary)
- Hands-on experience in different stages of big data applications, such as data ingestion, data analytics, and data visualization
- Good experience with the Scala programming language and Spark Core
- Hands-on experience importing and exporting data between HDFS and relational database management systems using Sqoop
- Experienced in analyzing data with Hive and performing data analytics using Hive Query Language (HQL)
- Expertise in managing and scheduling batch jobs on a Hadoop cluster using Oozie
- Strong knowledge of NoSQL databases such as HBase, Cassandra, and MongoDB, and their integration with Hadoop clusters
- Experience setting up clusters and services on AWS: EC2, EMR, RDS, S3 buckets, DynamoDB, and Redshift
- Worked on Oozie and Zookeeper for managing Hadoop jobs
- Able to analyze data, interpret results, and convey findings in a concise and professional manner
- Follow a complete end-to-end approach: request analysis, dataset creation/extraction, report creation and implementation, and delivery of the final analysis to the requestor
- Very good understanding of SQL, ETL, and data warehousing technologies
- Strong experience in RDBMS technologies such as MySQL, Oracle, and Teradata
- Strong database skills in IBM DB2 and Oracle; proficient in database development, including constraints, indexes, views, stored procedures, triggers, and cursors
- Experience in developing Web-Services module for integration using SOAP and REST
- Flexible with Unix/Linux and Windows environments, working with operating systems such as CentOS 5/6, Ubuntu 13/14, and Cosmos
- Set up build environments with Maven, writing the Maven build XML, producing builds, and configuring and deploying the application on all servers
- Experience writing build scripts with Maven and working with continuous integration systems such as Jenkins
- Intensive work experience in developing enterprise solutions using Java, J2EE, Servlets, JSP, JDBC, Struts, Spring, Hibernate, JavaBeans, JSF, and MVC
- Proficient knowledge of Java virtual machines (JVM) and multithreaded processing
- Experience working with job schedulers such as Autosys and Maestro
- Strong in databases such as Sybase, DB2, Oracle, and MS SQL, and in working with clickstream data
- Loaded datasets into Hive for ETL operations
- Proficient in using various IDEs like RAD, Eclipse
- Strong understanding of Agile Scrum and Waterfall SDLC methodologies
- Excellent problem-solving, analytical, communication, presentation, and interpersonal skills that help me be a core member of any team
- Strong communication, collaboration, and team-building skills, with proficiency at grasping new technical concepts quickly and utilizing them in a productive manner
- Experienced in providing training to team members per new project requirements
- Experienced in creating Product Documentation & Presentations
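Illustrative sketch of the Hive UDF work referenced above (the package, class, and normalization rule are hypothetical examples, not code from a specific engagement):

package com.example.hive.udf;                       // hypothetical package name

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Simple Hive UDF that trims and upper-cases a string column before it is
// used in joins or partitioned/bucketed tables; registered in Hive with
// ADD JAR and CREATE TEMPORARY FUNCTION.
public final class NormalizeString extends UDF {
    public Text evaluate(final Text input) {
        if (input == null) {
            return null;                            // preserve NULLs so Hive semantics are unchanged
        }
        return new Text(input.toString().trim().toUpperCase());
    }
}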
TECHNICAL SKILLS
Hadoop/Big Data Technologies: HDFS, MapReduce, Sqoop, Flume, Pig, Hive, Oozie, Impala, Spark, ZooKeeper, and Cloudera Manager
NoSQL Databases: HBase, Cassandra
Monitoring and Reporting: Tableau, custom shell scripts
Hadoop Distributions: Hortonworks, Cloudera, MapR
Build Tools: Maven, SQL Developer
Programming & Scripting: Java, C, SQL, Shell Scripting, Python, Scala
Java Technologies: Servlets, JavaBeans, JDBC, Spring, Hibernate, SOAP/REST services
Databases: Oracle, MySQL, MS SQL Server, Teradata
Web Dev. Technologies: HTML, XML, JSON, CSS, jQuery, JavaScript, AngularJS
Version Control: SVN, CVS, Git
Operating Systems: Linux, Unix, Mac OS X, CentOS, Windows 10, Windows 8, Windows 7, Windows Server 2008/2003
PROFESSIONAL EXPERIENCE
Confidential, Pleasanton, CA
Big Data Software Engineer
Responsibilities:
- Installed, configured, and maintained Apache Hadoop clusters for analytics and application development, along with Hadoop tools such as Hive, HSQL, Pig, HBase, OLAP, ZooKeeper, Avro, Parquet, and Sqoop on Arch Linux
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond accordingly to any warning or failure conditions
- Experience working with DevOps
- Installed and configured Hadoop, MapReduce, and HDFS (Hadoop Distributed File System); developed multiple MapReduce jobs in Java for data cleaning
- Experience in applying structured modeling to unstructured data
- Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS
- Worked on Hortonworks Data Platform (HDP)
- Worked with Splunk to analyze and visualize data
- Worked on a Mesos cluster with Marathon
- Used Pig as an ETL tool to perform transformations, event joins, and pre-aggregations before storing the data in HDFS
- Worked with orchestration tools such as Airflow
- Wrote test cases and analyzed and reported test results to product teams
- Good experience with Clojure, Kafka, and Storm
- Worked with AWS Data Pipeline
- Used the AWS CLI to create new instances and manage existing instances
- Created and updated Auto Scaling groups and CloudWatch monitoring via the AWS CLI
- Developed CLI tools in Bash for developers to create application AMIs, run instances of their AMIs, and easily identify and access their AMI instances
- Worked with Elasticsearch, Postgres, and Apache NiFi
- Worked on Hadoop workflow management using Oozie, Azkaban, and Hamake
- Responsible for developing a data pipeline using Azure HDInsight, Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS
- Worked with various AWS EC2 and S3 CLI tools
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs; used Sqoop to import and export data between HDFS and RDBMS for visualization and report generation
- Involved in migrating ETL processes from Oracle to Hive to test ease of data manipulation
- Worked on functional, system, and regression testing activities with Agile methodology
- Worked on a Python plugin for MySQL Workbench to upload CSV files
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting
- Worked with HDFS storage formats such as Avro and ORC
- Worked with Accumulo to modify server-side key-value pairs
- Working experience with Shiny and R
- Working experience with Vertica, QlikSense, QlikView, and SAP BOE
- Worked with NoSQL databases such as HBase, Cassandra, and DynamoDB
- Worked with AWS-based data ingestion and transformations
- Good experience with Python, Pig, Sqoop, Oozie, Hadoop Streaming, Hive, and Phoenix
- Worked on importing and exporting data from Oracle and DB2 into HDFS and Hive using Sqoop
- Responsible for building scalable distributed data solutions using Hadoop, including cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and log files
- Developed several new MapReduce programs to analyze and transform the data to uncover insights into customer usage patterns
- Worked extensively on importing metadata into Hive using Sqoop and migrated existing tables and applications to work on Hive
- Extracted, loaded, and transformed data through Talend
- Explored Spark for improving the performance and optimization of existing algorithms in Hadoop
- Responsible for the design and development of Spark SQL scripts in Scala/Java based on functional specifications
- Involved in the development of Talend jobs and preparation of design documents and technical specification documents
- Developed complex Talend ETL jobs to migrate data from flat files to databases
- Responsible for running Hadoop Streaming jobs to process terabytes of XML data, utilizing cluster coordination services through ZooKeeper
- Extensive experience using message-oriented middleware (MOM) with ActiveMQ, Apache Storm, Apache Spark, Kafka, Maven, and ZooKeeper
- Worked extensively on the core and Spark SQL modules of Spark
- Worked on descriptive statistics using R
- Developed Kafka producers and consumers, HBase clients, Spark, Shark, and Streams applications, and Hadoop MapReduce jobs, along with components on HDFS and Hive (a minimal producer sketch follows this role)
- Strong working experience with Snowflake and clickstream data
- Analyzed the SQL scripts and designed the solution to implement them using PySpark
- Experience using Spark with Neo4j to acquire interrelated graph information about insurers and to query the data from the stored graphs
- Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data concepts
- Responsible for creating Hive external tables, loading data into the tables, and querying the data using HQL
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS
Environment: Hadoop Cluster, HDFS, Hive, Pig, Sqoop, OLAP, data modeling, Linux, Hadoop MapReduce, HBase, Shell Scripting, MongoDB, Cassandra, Apache Spark, Neo4j.
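Illustrative sketch of the Kafka producer work listed in this role (the broker address, topic name, key, and payload are placeholders, not details from the project):

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

// Minimal Kafka producer: publishes one JSON event to a topic that a
// downstream consumer or Spark Streaming job can read.
public class EventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");   // placeholder broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("customer-events", "user-123",
                    "{\"action\":\"login\"}"));      // placeholder topic, key, and event
            producer.flush();                        // block until the record is sent
        }
    }
}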
Confidential, Iowa City, IA
Big Data Software Developer
Responsibilities:
- Performed benchmarking of HDFS and the ResourceManager using TestDFSIO and TeraSort.
- Worked on Sqoop to import data from various relational data sources.
- Worked with Flume to bring clickstream data from front-facing application logs.
- Worked on strategizing Sqoop jobs to parallelize data loads from source systems.
- Participated in providing inputs for the design of the ingestion patterns.
- Participated in strategizing loads without impacting front-facing applications.
- Worked in agile environment using Jira, Git.
- Worked on the design of Hive and ANSI data stores to store data from various data sources.
- Involved in brainstorming sessions for sizing the Hadoop cluster.
- Involved in providing inputs to analyst team for functional testing.
- Worked with source-system load testing teams to perform loads while ingestion jobs were in progress.
- Worked with continuous integration and related tools (i.e., Jenkins, Maven).
- Worked on performing data standardization using Pig scripts.
- Worked with the query engines Tez and Apache Phoenix.
- Worked on installation and configuration of a Hortonworks cluster from the ground up.
- Managed various groups for users wif different queue configurations.
- Worked on building analytical data stores for data science team’s model development.
- Worked on the design and development of Oozie workflows to orchestrate Pig and Hive jobs.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs in Scala.
- Worked on performance tuning of Hive queries with partitioning and bucketing.
- Worked extensively on the core and Spark SQL modules of Spark.
- Developed Kafka producers and consumers, HBase clients, Spark jobs, and Hadoop MapReduce jobs, along with components on HDFS and Hive (a minimal HBase client sketch follows this role).
- Worked with big data tools such as Apache Phoenix, Apache Kylin, AtScale, and Apache Hue.
- Worked with security tools such as Knox, Ranger, and Atlas.
- Worked with BI concepts and tools: Dataguru, Talend.
- Worked with source code management tools: GitHub, ClearCase, SVN, and CVS.
- Working experience with the testing tools JUnit and SoapUI.
- Experienced in analyzing SQL scripts and designing the solution to implement them using PySpark.
- Worked with code quality governance tools (Sonar, PMD, FindBugs, Emma, Cobertura).
Environment: Hadoop, HDFS, MapReduce, Flume, Pig, Sqoop, Hive, Oozie, Ganglia, HBase, Shell Scripting, Apache Spark.
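Illustrative sketch of the HBase client work listed in this role (the table name, column family, row key, and values are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Minimal HBase client: writes one cell and reads it back.
public class HBaseClientSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();        // picks up hbase-site.xml from the classpath
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("user_profiles"))) {

            Put put = new Put(Bytes.toBytes("row-001"));          // placeholder row key
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("status"), Bytes.toBytes("active"));
            table.put(put);

            Result result = table.get(new Get(Bytes.toBytes("row-001")));
            System.out.println(Bytes.toString(
                    result.getValue(Bytes.toBytes("info"), Bytes.toBytes("status"))));
        }
    }
}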
Confidential, Madison, WI
Hadoop Developer
Responsibilities:
- Worked on distributed/cloud computing (MapReduce/Hadoop, Hive, Pig, HBase, Sqoop, Spark, Avro, ZooKeeper, etc.) on the Cloudera distribution of Hadoop (CDH4).
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and processing (a mapper sketch follows this role).
- Involved in installing Hadoop Ecosystem components.
- Imported and exported data into HDFS, Pig, Hive, and HBase using Sqoop.
- Responsible for managing data coming from different sources, ingesting log data with Flume and relational data from relational database management systems using Sqoop.
- Involved in requirements gathering, design, development, and testing.
- Worked on loading and transforming large sets of structured and semi-structured data into the Hadoop system.
- Developed simple and complex MapReduce programs in Java for Data Analysis.
- Loaded data from various data sources into HDFS using Flume.
- Developed Pig UDFs to pre-process the data for analysis.
- Worked on the Hue interface for querying the data.
- Created Hive tables to store the processed results in a tabular format.
- Developed Hive Scripts for implementing dynamic Partitions.
- Developed Pig scripts for data analysis and extended its functionality by developing custom UDF's.
- Extensive knowledge of Pig scripts using bags and tuples.
- Experience in managing and reviewing Hadoop log files.
- Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig.
- Exported analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
Environment: Hadoop (CDH4), UNIX, Eclipse, HDFS, Java, MapReduce, Apache Pig, Hive, HBase, Oozie, Sqoop, and MySQL.
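Illustrative sketch of the Java MapReduce data-cleaning work listed in this role (the delimiter and expected field count are assumptions about an unspecified input format):

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Map-only cleaning step: drops malformed records and trims whitespace
// from each field before the data is loaded into Hive or HBase.
public class CleaningMapper extends Mapper<LongWritable, Text, NullWritable, Text> {

    private static final int EXPECTED_FIELDS = 5;   // assumed record width

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",", -1);
        if (fields.length != EXPECTED_FIELDS) {
            return;                                 // skip malformed rows
        }
        StringBuilder cleaned = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                cleaned.append(',');
            }
            cleaned.append(fields[i].trim());
        }
        context.write(NullWritable.get(), new Text(cleaned.toString()));
    }
}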
Confidential
JAVA Developer/ Data Analyst
Responsibilities:
- Used WebSphere for developing use cases, sequence diagrams, and preliminary class diagrams for the system in UML.
- Interacted with business users to identify information needs and initiate process changes; played a major part in project planning and scoping documents; adopted the Waterfall methodology.
- Performed large-scale data analysis and modeling to identify opportunities for improvement based on impact and feedback.
- Provided professional-quality project artifacts, including business requirement documents (BRDs), requirement plans, models, a traceability matrix, use cases, and issue logs.
- Prepared user ‘as-is’ workflows and ‘to-be’ business processes.
- Worked with the QA testing team, creating test plans and test cases, and actively participated in user training for User Acceptance Testing (UAT) with business users.
- Extensively used WebSphere Studio Application Developer for building, testing, and deploying the Pushpin Tool.
- Real-time experience in database management, business analysis, and the software development life cycle.
- Gained knowledge in analyzing statistical data.
- Created standard operating procedures (SOPs); performed SQL queries on the database using SQL Server.
- Hands-on experience in training and mentoring; used IBM Cognos for reports.
- Used the Spring Framework based on the Model View Controller (MVC) pattern; designed GUI screens using HTML and JSP.
- Developed the presentation layer and GUI framework in HTML and JSP, with client-side validations.
- Involved in Java code that generated XML documents, which were translated via XSLT into HTML for presentation in the GUI.
- Implemented XQuery and XPath for querying and node selection based on the client input XML files to create Java objects.
- Used WebSphere to develop the Entity Beans where transaction persistence was required, and used JDBC to connect to the MySQL database (a minimal JDBC sketch follows this role).
- Developed the user interface using JSP pages and DHTML to design the dynamic HTML pages.
- Developed Session Beans on WebSphere for the transactions in the application.
- Utilized WSAD to create JSPs, Servlets, and EJBs that pulled information from a DB2 database and sent it to a front-end GUI for end users.
- On the database side, responsibilities included creation of tables, triggers, stored procedures, sub-queries, joins, integrity constraints, and views.
- Worked on MQ Series with J2EE technologies (EJB, JavaMail, JMS, etc.) on the WebSphere server.
Environment: Java, EJB, IBM WebSphere Application Server, Spring, JSP, Servlets, JUnit, JDBC, XML, XSLT, CSS, DOM, HTML, MySQL, JavaScript, Oracle, UML, ClearCase, ANT.
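Illustrative sketch of the JDBC connectivity to MySQL noted in this role (the connection URL, credentials, table, and column names are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Minimal JDBC lookup: a parameterized query against a MySQL table,
// illustrative of the JDBC data access described above.
public class PolicyLookup {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:mysql://localhost:3306/appdb";          // placeholder URL
        try (Connection conn = DriverManager.getConnection(url, "appuser", "secret");
             PreparedStatement stmt = conn.prepareStatement(
                     "SELECT id, status FROM policies WHERE holder_id = ?")) {
            stmt.setLong(1, 1001L);                                // placeholder lookup key
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getLong("id") + " -> " + rs.getString("status"));
                }
            }
        }
    }
}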