
Big Data Software Engineer Resume


Pleasanton, CA

SUMMARY

  • Around 8 years of IT experience in software development and support, with experience in developing strategic methods for deploying Big Data technologies to efficiently solve Big Data processing requirements
  • Good hands-on experience with Hadoop ecosystem components: HDFS, MapReduce, HBase, YARN, Pig, Spark, Sqoop, Spark SQL, Spark Streaming, and Hive
  • Experience in installing, maintaining, and configuring Hadoop clusters
  • Efficient in processing large sets of structured, semi-structured, and unstructured data and supporting systems application architecture
  • Capable of creating and monitoring Hadoop clusters on Amazon EC2, Hortonworks Data Platform 2.1 & 2.2, VMs, CDH3, and CDH4 with Cloudera Manager on Linux, Ubuntu OS, etc.
  • Experience in working with structured data using Hive, including Hive UDFs, join operations, partitions, bucketing, and internal/external tables (see the Hive UDF sketch after this summary)
  • Hands-on experience in different stages of big data applications such as data ingestion, data analytics, and data visualization
  • Good experience on Scala Programming language and Spark Core
  • Hands-on experience in importing and exporting data using Sqoop between HDFS and relational database management systems
  • Experience in analyzing data with Hive
  • Well experienced in data analytics using Hive Query Language (HQL)
  • Expertise in managing and scheduling batch jobs on a Hadoop cluster using Oozie
  • Strong knowledge of NoSQL databases like HBase, Cassandra, and MongoDB, and their integration with Hadoop clusters
  • Experience in setting up clusters on Amazon EC2 and working with Amazon EMR, Amazon RDS, S3 buckets, DynamoDB, and Redshift
  • Worked on Oozie and Zookeeper for managing Hadoop jobs
  • Capable of analyzing data, interpreting results, and conveying findings in a concise and professional manner
  • Provide a complete end-to-end approach including request analysis, creating/pulling datasets, report creation and implementation, and delivering final analysis to the requestor
  • Very Good understanding of SQL, ETL and Data Warehousing Technologies
  • Strong experience in RDBMS technologies like MySQL, Oracle and Teradata
  • Strong database skills in IBM DB2 and Oracle; proficient in database development, including constraints, indexes, views, stored procedures, triggers, and cursors
  • Experience in developing Web-Services module for integration using SOAP and REST
  • Flexible with Unix/Linux and Windows environments, working with operating systems like CentOS 5/6, Ubuntu 13/14, and Cosmos
  • Used Maven deployment descriptors; set up the build environment by writing the Maven build XML, taking builds, and configuring and deploying the application on all the servers
  • Experience in writing build scripts using Maven and in continuous integration systems like Jenkins
  • Intensive work experience in developing enterprise solutions using Java, J2EE, Servlets, JSP, JDBC, Struts, Spring, Hibernate, JavaBeans, JSF, and MVC
  • Proficient knowledge on java virtual machines (JVM) and multithreaded processing.
  • Experience in working with job schedulers like Autosys and Maestro
  • Strong in databases like Sybase, DB2, Oracle, MS SQL, Clickstream
  • Loaded datasets into Hive for ETL operations
  • Proficient in using various IDEs like RAD, Eclipse
  • Strong understanding of Agile Scrum and Waterfall SDLC methodologies
  • Excellent problem-solving, analytical, communication, presentation, and interpersonal skills that help me be a core member of any team
  • Strong communication, collaboration, and team-building skills with proficiency at grasping new technical concepts quickly and utilizing them in a productive manner
  • Experienced in providing training to team members as per new project requirements
  • Experienced in creating Product Documentation & Presentations
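
For illustration of the Hive UDF work noted above, here is a minimal sketch of a custom UDF in Java; the class name and the normalization rule are hypothetical examples, not taken from any specific project, and it assumes the classic org.apache.hadoop.hive.ql.exec.UDF API.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical UDF that normalizes a string column (trim + lowercase),
// the kind of helper used alongside partitioned/bucketed Hive tables.
public class NormalizeText extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().trim().toLowerCase());
    }
}
```

Packaged into a JAR, such a function would typically be registered with ADD JAR and CREATE TEMPORARY FUNCTION before being called from HiveQL queries.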

TECHNICAL SKILLS

Hadoop/Big Data Technologies: HDFS, MapReduce, Sqoop, Flume, Pig, Hive, Oozie, Impala, Spark, ZooKeeper and Cloudera Manager.

NoSQL Databases: HBase, Cassandra

Monitoring and Reporting: Tableau, Custom shell scripts

Hadoop Distributions: Hortonworks, Cloudera, MapR

Build Tools: Maven, SQL Developer

Programming & Scripting: JAVA, C, SQL, Shell Scripting, Python, Scala

Java Technologies: Servlets, JavaBeans, JDBC, Spring, Hibernate, SOAP/REST services

Databases: Oracle, MySQL, MS SQL Server, Teradata

Web Dev. Technologies: HTML, XML, JSON, CSS, jQuery, JavaScript, AngularJS

Version Control: SVN, CVS, GIT

Operating Systems: Linux, Unix, Mac OS X, CentOS, Windows 10, Windows 8, Windows 7, Windows Server 2008/2003

PROFESSIONAL EXPERIENCE

Confidential, Pleasanton, CA

Big Data Software Engineer

Responsibilities:

  • Installed, configured, and maintained Apache Hadoop clusters for analytics and application development, along with Hadoop tools like Hive, HQL, Pig, HBase, OLAP, ZooKeeper, Avro, Parquet, and Sqoop on Arch Linux
  • Wrote shell scripts to monitor the health of Hadoop daemon services and respond accordingly to any warning or failure conditions
  • Experience working with DevOps practices
  • Installed and configured Hadoop, MapReduce, and HDFS (Hadoop Distributed File System); developed multiple MapReduce jobs in Java for data cleaning
  • Experience in structured modeling of unstructured data
  • Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis
  • Worked on installing the cluster, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slots configuration
  • Developed Pig Latin scripts to extract the data from web server output files and load it into HDFS
  • Worked on Hortonworks Data Platform (HDP)
  • Worked with Splunk to analyze and visualize data
  • Worked on Mesos cluster and Marathon
  • Used Pig as an ETL tool to do transformations, event joins, and some pre-aggregations before storing the data in HDFS
  • Worked with orchestration tools like Airflow
  • Wrote test cases and analyzed and reported test results to product teams
  • Good experience on Clojure, Kafka and Storm
  • Worked with AWS Data Pipeline
  • Used the AWS CLI to create new instances and manage existing instances
  • Created and updated Auto Scaling and CloudWatch monitoring via the AWS CLI
  • Developed CLI tools in bash for developers to create application AMIs, run instances of their AMIs, and easily identify and access their AMI instances.
  • Worked with Elasticsearch, Postgres, and Apache NiFi
  • Hadoop workflow management using Oozie, Azkaban, and Hamake
  • Responsible for developing a data pipeline using Azure HDInsight, Flume, Sqoop, and Pig to extract the data from weblogs and store it in HDFS
  • Worked with various AWS EC2 and S3 CLI tools
  • Installed Oozie workflow engine to run multiple Hive and Pig Jobs, used Sqoop to import and export data from HDFS to RDBMS and vice-versa for visualization and to generate reports
  • Involved in the migration of ETL processes from Oracle to Hive to test easier data manipulation
  • Worked in functional, system, and regression testing activities with agile methodology
  • Worked on a Python plugin for MySQL Workbench to upload CSV files
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting
  • Worked with HDFS storage formats like Avro and ORC
  • Worked with Accumulo to modify server-side key-value pairs
  • Working experience with Shiny and R
  • Working experience with Vertica, QlikSense, QlikView, and SAP BOE
  • Worked with NoSQL databases like HBase, Cassandra, and DynamoDB
  • Worked with AWS-based data ingestion and transformations
  • Good experience with Python, Pig, Sqoop, Oozie, Hadoop Streaming, Hive, and Phoenix
  • Worked on importing and exporting data from Oracle and DB2 into HDFS and Hive using Sqoop
  • Responsible for building scalable distributed data solutions using Hadoop, including cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and log files
  • Developed several new MapReduce programs to analyze and transform the data to uncover insights into customer usage patterns
  • Worked extensively with importing metadata into Hive using Sqoop and migrated existing tables and applications to work on Hive
  • Extracted, loaded, and transformed data through Talend
  • Explored Spark for improving the performance and optimization of the existing algorithms in Hadoop
  • Responsible for design and development of Spark SQL scripts using Scala/Java based on functional specifications
  • Involved in the development of Talend jobs and preparation of design documents and technical specification documents
  • Developed complex Talend ETL jobs to migrate the data from flat files to databases
  • Responsible for running Hadoop streaming jobs to process terabytes of XML data, utilizing cluster coordination services through ZooKeeper
  • Extensive experience in using message-oriented middleware (MOM) with ActiveMQ, Apache Storm, Apache Spark, Kafka, Maven, and ZooKeeper
  • Worked on the core and Spark SQL modules of Spark extensively
  • Worked on descriptive statistics using R
  • Developed Kafka producers and consumers, HBase clients, Spark, Shark, and Streaming jobs, and Hadoop MapReduce jobs, along with components on HDFS and Hive (see the Kafka producer sketch after this list)
  • Strong working experience with Snowflake and clickstream data
  • Analyzed the SQL scripts and designed the solution to implement using PySpark
  • Experience using Spark with Neo4j to acquire the interrelated graph information of the insurer and to query the data from the stored graphs
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data concepts
  • Responsible for creating Hive external tables, loading the data into the tables, and querying data using HQL
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, and loaded the data into HDFS
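
As a sketch of the Kafka producer work referenced in the list above, the following minimal Java example uses the standard producer API; the broker address, topic name, and record contents are hypothetical placeholders.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical broker endpoint; a real deployment would list the cluster's brokers.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // Send one record to a hypothetical "customer-events" topic.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("customer-events", "user-123",
                    "{\"action\":\"login\"}"));
            producer.flush();
        }
    }
}
```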

Environment: Hadoop Cluster, HDFS, Hive, Pig, Sqoop, OLAP, data modeling, Linux, Hadoop MapReduce, HBase, Shell Scripting, MongoDB, Cassandra, Apache Spark, Neo4j.

Confidential, Iowa City, IA

Big Data Software Developer

Responsibilities:

  • Performed benchmarking of HDFS and the ResourceManager using TestDFSIO and TeraSort.
  • Worked on Sqoop to import data from various relational data sources.
  • Worked with Flume to bring clickstream data from front-facing application logs.
  • Worked on strategizing Sqoop jobs to parallelize data loads from source systems.
  • Participated in providing inputs for the design of the ingestion patterns.
  • Participated in strategizing loads without impacting front-facing applications.
  • Worked in agile environment using Jira, Git.
  • Worked on the design of Hive and ANSI data stores to store the data from various data sources.
  • Involved in brainstorming sessions for sizing the Hadoop cluster.
  • Involved in providing inputs to analyst team for functional testing.
  • Worked with source system load testing teams to perform loads while ingestion jobs were in progress.
  • Worked with Continuous Integration and related tools (i.e., Jenkins, Maven).
  • Worked on performing data standardization using Pig scripts.
  • Worked with query engines Tez and Apache Phoenix.
  • Worked on installation and configuration of a Hortonworks cluster from the ground up.
  • Managed various groups for users wif different queue configurations.
  • Worked on building analytical data stores for data science team’s model development.
  • Worked on design and development of Oozie workflows to perform orchestration of Pig and Hive jobs.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs in Scala.
  • Worked on performance tuning of Hive queries with partitioning and bucketing processes.
  • Worked on the core and Spark SQL modules of Spark extensively.
  • Developed Kafka producers and consumers, HBase clients, Spark jobs, and Hadoop MapReduce jobs along with components on HDFS and Hive (see the HBase client sketch after this list).
  • Worked with big data tools like Apache Phoenix, Apache Kylin, AtScale, and Apache Hue.
  • Worked with security tools like Knox, Ranger, and Atlas.
  • Worked with BI concepts and tools such as Dataguru and Talend.
  • Worked with source code management tools: GitHub, ClearCase, SVN, and CVS.
  • Working experience with testing tools JUnit and SoapUI.
  • Experienced in analyzing SQL scripts and designing the solution to implement using PySpark.
  • Worked with code quality governance tools (Sonar, PMD, FindBugs, Emma, Cobertura).
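
As a sketch of the HBase client work referenced in the list above, the example below uses the standard HBase Java client API; the table, column family, and row key are hypothetical placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseClientSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml from the classpath
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("events"))) {
            // Write one cell into a hypothetical "events" table.
            Put put = new Put(Bytes.toBytes("row-001"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("status"), Bytes.toBytes("processed"));
            table.put(put);

            // Read the same row back.
            Result result = table.get(new Get(Bytes.toBytes("row-001")));
            System.out.println(Bytes.toString(
                    result.getValue(Bytes.toBytes("d"), Bytes.toBytes("status"))));
        }
    }
}
```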

Environment: Hadoop, HDFS, MapReduce, Flume, Pig, Sqoop, Hive, Oozie, Ganglia, HBase, Shell Scripting, Apache Spark.

Confidential, Madison, WI

Hadoop Developer

Responsibilities:

  • Worked on distributed/cloud computing (MapReduce/Hadoop, Hive, Pig, HBase, Sqoop, Spark, Avro, ZooKeeper, etc.) on Cloudera's distribution of Hadoop (CDH4).
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and processing.
  • Involved in installing Hadoop Ecosystem components.
  • Imported and exported data into HDFS, Pig, Hive, and HBase using Sqoop.
  • Responsible for managing data coming from different data sources.
  • Collected data using Flume and ingested data from relational database management systems using Sqoop.
  • Involved in gathering the requirements, designing, development, and testing.
  • Worked on loading and transformation of large sets of structured and semi-structured data into the Hadoop system.
  • Developed simple and complex MapReduce programs in Java for data analysis (see the MapReduce sketch after this list).
  • Load data from various data sources into HDFS using Flume.
  • Developed Pig UDFs to pre-process the data for analysis.
  • Worked on the Hue interface for querying the data.
  • Created Hive tables to store the processed results in a tabular format.
  • Developed Hive scripts for implementing dynamic partitions.
  • Developed Pig scripts for data analysis and extended their functionality by developing custom UDFs.
  • Extensive knowledge of Pig scripts using bags and tuples.
  • Experience in managing and reviewing Hadoop log files.
  • Developed workflows in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Exported analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
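
A minimal sketch of the kind of Java MapReduce data-cleaning job described in the list above; the delimiter, field count, and filtering rule are hypothetical assumptions.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Map-only job that drops malformed CSV records (hypothetical rule: fewer than 5 fields).
public class CleanRecordsJob {

    public static class CleanMapper extends Mapper<Object, Text, Text, NullWritable> {
        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            if (fields.length >= 5 && !fields[0].isEmpty()) {
                context.write(value, NullWritable.get()); // keep only well-formed lines
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "clean-records");
        job.setJarByClass(CleanRecordsJob.class);
        job.setMapperClass(CleanMapper.class);
        job.setNumReduceTasks(0); // map-only cleaning pass
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```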

Environment: Hadoop (CDH4), UNIX, Eclipse, HDFS, Java, MapReduce, Apache Pig, Hive, HBase, Oozie, Sqoop, and MySQL.

Confidential

JAVA Developer/ Data Analyst

Responsibilities:

  • Used WebSphere for developing use cases, sequence diagrams, and preliminary class diagrams for the system in UML.
  • Interacted with business users to identify information needs and initiate process changes; played a major part in project planning and scoping documents; adopted the Waterfall methodology.
  • Performed large-scale data analysis and modeling to identify opportunities for improvement based on the impact and feedback.
  • Provided professional-quality project artifacts, including business requirement documents (BRDs), requirement plans, models, traceability matrices, use cases, and issue logs.
  • Prepared user ‘as-is’ workflows and ‘to-be’ business processes.
  • Worked with the QA testing team, creating test plans and test cases, and actively participated in user training for User Acceptance Testing (UAT) with business users.
  • Extensively used WebSphere Studio Application Developer for building, testing, and deploying the Pushpin tool.
  • Real-time experience in database management, BA, and software development life cycle.
  • Gained knowledge of analyzing statistical data.
  • Created standard operating procedures (SOPs); performed SQL queries on the database using SQL Server.
  • Hands-on experience in training and mentoring; used IBM Cognos for reports.
  • Used the Spring Framework based on the Model-View-Controller (MVC) pattern; designed GUI screens using HTML and JSP.
  • Developed the presentation layer and GUI framework in HTML and JSP, with client-side validations.
  • Involved in Java code that generated XML documents, which in turn used XSLT to translate the content into HTML for presentation in the GUI.
  • Implemented XQuery and XPath for querying and node selection based on the client input XML files to create Java objects.
  • Used WebSphere to develop the entity beans where transaction persistence was required, and JDBC was used to connect to the MySQL database (see the JDBC sketch after this list).
  • Developed the user interface using JSP pages and DHTML to design the dynamic HTML pages.
  • Developed session beans on WebSphere for the transactions in the application.
  • Utilized WSAD to create JSPs, Servlets, and EJBs that pulled information from a DB2 database and sent it to a front-end GUI for end users.
  • On the database end, responsibilities included creation of tables, triggers, stored procedures, sub-queries, joins, integrity constraints, and views.
  • Worked on MQ Series with J2EE technologies (EJB, JavaMail, JMS, etc.) on the WebSphere server.
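
A minimal sketch of the JDBC-to-MySQL access pattern referenced in the list above; the connection URL, credentials, and table are hypothetical placeholders (in the actual application, connections would typically be obtained through a WebSphere-managed datasource rather than DriverManager).

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class CustomerQuery {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details for illustration only.
        String url = "jdbc:mysql://localhost:3306/appdb";
        try (Connection conn = DriverManager.getConnection(url, "appuser", "secret");
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT id, name FROM customers WHERE status = ?")) {
            ps.setString(1, "ACTIVE");
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getLong("id") + " " + rs.getString("name"));
                }
            }
        }
    }
}
```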

Environment: Java, EJB, IBM WebSphere Application Server, Spring, JSP, Servlets, JUnit, JDBC, XML, XSLT, CSS, DOM, HTML, MySQL, JavaScript, Oracle, UML, ClearCase, ANT.
