Big Data Software Engineer Resume
Pleasanton, CA
SUMMARY
- Around 8 years of IT experience in software development and support, with experience in developing strategic methods for deploying Big Data technologies to efficiently solve Big Data processing requirements
- Good hands-on experience with Hadoop ecosystem components: HDFS, MapReduce, HBase, YARN, Pig, Spark, Sqoop, Spark SQL, Spark Streaming, and Hive
- Experience in installing, maintaining, and configuring Hadoop clusters
- Efficient in processing large sets of structured, semi-structured, and unstructured data and supporting systems application architecture
- Capable of creating and monitoring Hadoop clusters on Amazon EC2, Hortonworks Data Platform 2.1 & 2.2, VMs, and CDH3/CDH4 with Cloudera Manager on Linux (Ubuntu and other distributions)
- Experience working with structured data using Hive, Hive UDFs, join operations, partitions, bucketing, and internal/external tables (a brief Hive UDF sketch follows this summary)
- Hands-on experience in different stages of big data applications, such as data ingestion, data analytics, and data visualization
- Good experience with the Scala programming language and Spark Core
- Hands-on experience importing and exporting data between HDFS and relational database management systems using Sqoop
- Experienced in analyzing data with Hive and performing data analytics using Hive Query Language (HQL)
- Expertise in managing and scheduling batch jobs on a Hadoop cluster using Oozie
- Strong knowledge of NoSQL databases such as HBase, Cassandra, and MongoDB, and their integration with Hadoop clusters
- Experience setting up clusters and services on AWS: EC2, EMR, RDS, S3 buckets, DynamoDB, and Redshift
- Worked on Oozie and Zookeeper for managing Hadoop jobs
- Able to analyze data, interpret results, and convey findings in a concise and professional manner
- Follow a complete end-to-end approach: request analysis, dataset creation/extraction, report creation and implementation, and delivery of the final analysis to the requestor
- Very good understanding of SQL, ETL, and data warehousing technologies
- Strong experience in RDBMS technologies such as MySQL, Oracle, and Teradata
- Strong database skills in IBM DB2 and Oracle; proficient in database development, including constraints, indexes, views, stored procedures, triggers, and cursors
- Experience in developing Web-Services module for integration using SOAP and REST
- Flexible with Unix/Linux and Windows environments, working with operating systems such as CentOS 5/6, Ubuntu 13/14, and Cosmos
- Set up build environments with Maven, writing the Maven build XML, producing builds, and configuring and deploying the application on all servers
- Experience writing build scripts with Maven and working with continuous integration systems such as Jenkins
- Intensive work experience in developing enterprise solutions using Java, J2EE, Servlets, JSP, JDBC, Struts, Spring, Hibernate, JavaBeans, JSF, and MVC
- Proficient knowledge of Java virtual machines (JVM) and multithreaded processing
- Experience working with job schedulers such as Autosys and Maestro
- Strong in databases such as Sybase, DB2, Oracle, and MS SQL, and in working with clickstream data
- Loaded datasets into Hive for ETL operations
- Proficient in using various IDEs like RAD, Eclipse
- Strong understanding of Agile Scrum and Waterfall SDLC methodologies
- Excellent problem-solving, analytical, communication, presentation, and interpersonal skills that help me be a core member of any team
- Strong communication, collaboration, and team-building skills, with proficiency at grasping new technical concepts quickly and utilizing them in a productive manner
- Experienced in providing training to team members per new project requirements
- Experienced in creating Product Documentation & Presentations
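Illustrative sketch of the Hive UDF work referenced above (the package, class, and normalization rule are hypothetical examples, not code from a specific engagement):

package com.example.hive.udf;                       // hypothetical package name

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Simple Hive UDF that trims and upper-cases a string column before it is
// used in joins or partitioned/bucketed tables; registered in Hive with
// ADD JAR and CREATE TEMPORARY FUNCTION.
public final class NormalizeString extends UDF {
    public Text evaluate(final Text input) {
        if (input == null) {
            return null;                            // preserve NULLs so Hive semantics are unchanged
        }
        return new Text(input.toString().trim().toUpperCase());
    }
}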
TECHNICAL SKILLS
Hadoop/Big Data Technologies: HDFS, MapReduce, Sqoop, Flume, Pig, Hive, Oozie, Impala, Spark, ZooKeeper, and Cloudera Manager
NoSQL Databases: HBase, Cassandra
Monitoring and Reporting: Tableau, custom shell scripts
Hadoop Distributions: Hortonworks, Cloudera, MapR
Build Tools: Maven, SQL Developer
Programming & Scripting: Java, C, SQL, Shell Scripting, Python, Scala
Java Technologies: Servlets, JavaBeans, JDBC, Spring, Hibernate, SOAP/REST services
Databases: Oracle, MySQL, MS SQL Server, Teradata
Web Dev. Technologies: HTML, XML, JSON, CSS, jQuery, JavaScript, AngularJS
Version Control: SVN, CVS, Git
Operating Systems: Linux, Unix, Mac OS X, CentOS, Windows 10, Windows 8, Windows 7, Windows Server 2008/2003
PROFESSIONAL EXPERIENCE
Confidential, Pleasanton, CA
Big Data Software Engineer
Responsibilities:
- Installed, configured, and maintained Apache Hadoop clusters for analytics and application development, along with Hadoop tools such as Hive, HSQL, Pig, HBase, OLAP, ZooKeeper, Avro, Parquet, and Sqoop on Arch Linux
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond accordingly to any warning or failure conditions
- Experience working with DevOps
- Installed and configured Hadoop, MapReduce, and HDFS (Hadoop Distributed File System); developed multiple MapReduce jobs in Java for data cleaning
- Experience in applying structured modeling to unstructured data
- Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS
- Worked on Hortonworks Data Platform (HDP)
- Worked with Splunk to analyze and visualize data
- Worked on a Mesos cluster with Marathon
- Used Pig as an ETL tool to perform transformations, event joins, and pre-aggregations before storing the data in HDFS
- Worked with orchestration tools such as Airflow
- Wrote test cases and analyzed and reported test results to product teams
- Good experience with Clojure, Kafka, and Storm
- Worked with AWS Data Pipeline
- Used the AWS CLI to create new instances and manage existing instances
- Created and updated Auto Scaling groups and CloudWatch monitoring via the AWS CLI
- Developed CLI tools in Bash for developers to create application AMIs, run instances of their AMIs, and easily identify and access their AMI instances
- Worked with Elasticsearch, Postgres, and Apache NiFi
- Worked on Hadoop workflow management using Oozie, Azkaban, and Hamake
- Responsible for developing a data pipeline using Azure HDInsight, Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS
- Worked with various AWS EC2 and S3 CLI tools
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs; used Sqoop to import and export data between HDFS and RDBMS for visualization and report generation
- Involved in migrating ETL processes from Oracle to Hive to test ease of data manipulation
- Worked on functional, system, and regression testing activities with Agile methodology
- Worked on a Python plugin for MySQL Workbench to upload CSV files
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting
- Worked with HDFS storage formats such as Avro and ORC
- Worked with Accumulo to modify server-side key-value pairs
- Working experience with Shiny and R
- Working experience with Vertica, QlikSense, QlikView, and SAP BOE
- Worked with NoSQL databases such as HBase, Cassandra, and DynamoDB
- Worked with AWS-based data ingestion and transformations
- Good experience with Python, Pig, Sqoop, Oozie, Hadoop Streaming, Hive, and Phoenix
- Worked on importing and exporting data from Oracle and DB2 into HDFS and Hive using Sqoop
- Responsible for building scalable distributed data solutions using Hadoop, including cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and log files
- Developed several new MapReduce programs to analyze and transform the data to uncover insights into customer usage patterns
- Worked extensively on importing metadata into Hive using Sqoop and migrated existing tables and applications to work on Hive
- Extracted, loaded, and transformed data through Talend
- Explored Spark for improving the performance and optimization of existing algorithms in Hadoop
- Responsible for the design and development of Spark SQL scripts in Scala/Java based on functional specifications
- Involved in the development of Talend jobs and preparation of design documents and technical specification documents
- Developed complex Talend ETL jobs to migrate data from flat files to databases
- Responsible for running Hadoop Streaming jobs to process terabytes of XML data, utilizing cluster coordination services through ZooKeeper
- Extensive experience using message-oriented middleware (MOM) with ActiveMQ, Apache Storm, Apache Spark, Kafka, Maven, and ZooKeeper
- Worked extensively on the core and Spark SQL modules of Spark
- Worked on descriptive statistics using R
- Developed Kafka producers and consumers, HBase clients, Spark, Shark, and Streams applications, and Hadoop MapReduce jobs, along with components on HDFS and Hive (a minimal producer sketch follows this role)
- Strong working experience with Snowflake and clickstream data
- Analyzed the SQL scripts and designed the solution to implement them using PySpark
- Experience using Spark with Neo4j to acquire interrelated graph information about insurers and to query the data from the stored graphs
- Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data concepts
- Responsible for creating Hive external tables, loading data into the tables, and querying the data using HQL
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS
Environment: Hadoop Cluster, HDFS, Hive, Pig, Sqoop, OLAP, data modeling, Linux, Hadoop MapReduce, HBase, Shell Scripting, MongoDB, Cassandra, Apache Spark, Neo4j.
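Illustrative sketch of the Kafka producer work listed in this role (the broker address, topic name, key, and payload are placeholders, not details from the project):

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

// Minimal Kafka producer: publishes one JSON event to a topic that a
// downstream consumer or Spark Streaming job can read.
public class EventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");   // placeholder broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("customer-events", "user-123",
                    "{\"action\":\"login\"}"));      // placeholder topic, key, and event
            producer.flush();                        // block until the record is sent
        }
    }
}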
Confidential, Iowa City, IA
Big Data Software Developer
Responsibilities:
- Performed benchmarking of HDFS and the ResourceManager using TestDFSIO and TeraSort.
- Worked on Sqoop to import data from various relational data sources.
- Worked with Flume to bring clickstream data from front-facing application logs.
- Worked on strategizing Sqoop jobs to parallelize data loads from source systems.
- Participated in providing inputs for the design of the ingestion patterns.
- Participated in strategizing loads without impacting front-facing applications.
- Worked in agile environment using Jira, Git.
- Worked on the design of Hive and ANSI data stores to store data from various data sources.
- Involved in brainstorming sessions for sizing the Hadoop cluster.
- Involved in providing inputs to analyst team for functional testing.
- Worked with source-system load testing teams to perform loads while ingestion jobs were in progress.
- Worked with continuous integration and related tools (i.e., Jenkins, Maven).
- Worked on performing data standardization using Pig scripts.
- Worked with the query engines Tez and Apache Phoenix.
- Worked on installation and configuration of a Hortonworks cluster from the ground up.
- Managed various groups for users wif different queue configurations.
- Worked on building analytical data stores for data science team’s model development.
- Worked on the design and development of Oozie workflows to orchestrate Pig and Hive jobs.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs in Scala.
- Worked on performance tuning of Hive queries with partitioning and bucketing.
- Worked extensively on the core and Spark SQL modules of Spark.
- Developed Kafka producers and consumers, HBase clients, Spark jobs, and Hadoop MapReduce jobs, along with components on HDFS and Hive (a minimal HBase client sketch follows this role).
- Worked with big data tools such as Apache Phoenix, Apache Kylin, AtScale, and Apache Hue.
- Worked with security tools such as Knox, Ranger, and Atlas.
- Worked with BI concepts and tools: Dataguru, Talend.
- Worked with source code management tools: GitHub, ClearCase, SVN, and CVS.
- Working experience with the testing tools JUnit and SoapUI.
- Experienced in analyzing SQL scripts and designing the solution to implement them using PySpark.
- Worked with code quality governance tools (Sonar, PMD, FindBugs, Emma, Cobertura).
Environment: Hadoop, HDFS, MapReduce, Flume, Pig, Sqoop, Hive, Oozie, Ganglia, HBase, Shell Scripting, Apache Spark.
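Illustrative sketch of the HBase client work listed in this role (the table name, column family, row key, and values are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Minimal HBase client: writes one cell and reads it back.
public class HBaseClientSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();        // picks up hbase-site.xml from the classpath
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("user_profiles"))) {

            Put put = new Put(Bytes.toBytes("row-001"));          // placeholder row key
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("status"), Bytes.toBytes("active"));
            table.put(put);

            Result result = table.get(new Get(Bytes.toBytes("row-001")));
            System.out.println(Bytes.toString(
                    result.getValue(Bytes.toBytes("info"), Bytes.toBytes("status"))));
        }
    }
}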
Confidential, Madison, WI
Hadoop Developer
Responsibilities:
- Worked on distributed/cloud computing (MapReduce/Hadoop, Hive, Pig, HBase, Sqoop, Spark, Avro, ZooKeeper, etc.) on the Cloudera distribution of Hadoop (CDH4).
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and processing (a mapper sketch follows this role).
- Involved in installing Hadoop Ecosystem components.
- Imported and exported data into HDFS, Pig, Hive, and HBase using Sqoop.
- Responsible for managing data coming from different sources, ingesting log data with Flume and relational data from relational database management systems using Sqoop.
- Involved in requirements gathering, design, development, and testing.
- Worked on loading and transforming large sets of structured and semi-structured data into the Hadoop system.
- Developed simple and complex MapReduce programs in Java for Data Analysis.
- Loaded data from various data sources into HDFS using Flume.
- Developed Pig UDFs to pre-process the data for analysis.
- Worked on the Hue interface for querying the data.
- Created Hive tables to store the processed results in a tabular format.
- Developed Hive Scripts for implementing dynamic Partitions.
- Developed Pig scripts for data analysis and extended its functionality by developing custom UDF's.
- Extensive knowledge of Pig scripts using bags and tuples.
- Experience in managing and reviewing Hadoop log files.
- Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig.
- Exported analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
Environment: Hadoop (CDH4), UNIX, Eclipse, HDFS, Java, MapReduce, Apache Pig, Hive, HBase, Oozie, Sqoop, and MySQL.
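Illustrative sketch of the Java MapReduce data-cleaning work listed in this role (the delimiter and expected field count are assumptions about an unspecified input format):

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Map-only cleaning step: drops malformed records and trims whitespace
// from each field before the data is loaded into Hive or HBase.
public class CleaningMapper extends Mapper<LongWritable, Text, NullWritable, Text> {

    private static final int EXPECTED_FIELDS = 5;   // assumed record width

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",", -1);
        if (fields.length != EXPECTED_FIELDS) {
            return;                                 // skip malformed rows
        }
        StringBuilder cleaned = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                cleaned.append(',');
            }
            cleaned.append(fields[i].trim());
        }
        context.write(NullWritable.get(), new Text(cleaned.toString()));
    }
}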
Confidential
JAVA Developer/ Data Analyst
Responsibilities:
- Used WebSphere for developing use cases, sequence diagrams, and preliminary class diagrams for the system in UML.
- Interacted with business users to identify information needs and initiate process changes; played a major part in project planning and scoping documents; adopted the Waterfall methodology.
- Performed large-scale data analysis and modeling to identify opportunities for improvement based on impact and feedback.
- Provided professional-quality project artifacts, including business requirement documents (BRDs), requirement plans, models, a traceability matrix, use cases, and issue logs.
- Prepared user ‘as-is’ workflows and ‘to-be’ business processes.
- Worked with the QA testing team, creating test plans and test cases, and actively participated in user training for User Acceptance Testing (UAT) with business users.
- Extensively used WebSphere Studio Application Developer for building, testing, and deploying the Pushpin Tool.
- Real-time experience in database management, business analysis, and the software development life cycle.
- Gained knowledge in analyzing statistical data.
- Created standard operating procedures (SOPs); performed SQL queries on the database using SQL Server.
- Hands-on experience in training and mentoring; used IBM Cognos for reports.
- Used the Spring Framework based on the Model View Controller (MVC) pattern; designed GUI screens using HTML and JSP.
- Developed the presentation layer and GUI framework in HTML and JSP, with client-side validations.
- Involved in Java code that generated XML documents, which were translated via XSLT into HTML for presentation in the GUI.
- Implemented XQuery and XPath for querying and node selection based on the client input XML files to create Java objects.
- Used WebSphere to develop the Entity Beans where transaction persistence was required, and used JDBC to connect to the MySQL database (a minimal JDBC sketch follows this role).
- Developed the user interface using JSP pages and DHTML to design the dynamic HTML pages.
- Developed Session Beans on WebSphere for the transactions in the application.
- Utilized WSAD to create JSPs, Servlets, and EJBs that pulled information from a DB2 database and sent it to a front-end GUI for end users.
- On the database side, responsibilities included creation of tables, triggers, stored procedures, sub-queries, joins, integrity constraints, and views.
- Worked on MQ Series with J2EE technologies (EJB, JavaMail, JMS, etc.) on the WebSphere server.
Environment: Java, EJB, IBM WebSphere Application Server, Spring, JSP, Servlets, JUnit, JDBC, XML, XSLT, CSS, DOM, HTML, MySQL, JavaScript, Oracle, UML, ClearCase, ANT.
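Illustrative sketch of the JDBC connectivity to MySQL noted in this role (the connection URL, credentials, table, and column names are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Minimal JDBC lookup: a parameterized query against a MySQL table,
// illustrative of the JDBC data access described above.
public class PolicyLookup {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:mysql://localhost:3306/appdb";          // placeholder URL
        try (Connection conn = DriverManager.getConnection(url, "appuser", "secret");
             PreparedStatement stmt = conn.prepareStatement(
                     "SELECT id, status FROM policies WHERE holder_id = ?")) {
            stmt.setLong(1, 1001L);                                // placeholder lookup key
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getLong("id") + " -> " + rs.getString("status"));
                }
            }
        }
    }
}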