Senior Hadoop Developer Resume
Milwaukee, WI
SUMMARY
- 9+ years of software development experience, including 5 years with Big Data technologies such as Hadoop, Hive, Pig, Sqoop, HBase, Flume, and Spark.
- Expert in working with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries (see the sketch following this summary).
- Involved in collecting, aggregating, and moving data from servers to HDFS using Apache Flume.
- Good understanding of NoSQL Databases.
- Worked on Windows and UNIX/Linux platforms with technologies such as SQL, PL/SQL, XML, HTML, CSS, JavaScript, and Core Java.
- Experience in Hadoop administration activities such as installation and configuration of clusters using Apache and Cloudera distributions.
- Experience in using IDEs like Eclipse and NetBeans.
- Developed UML Diagrams for Object Oriented Design: Use Cases, Sequence Diagrams and Class Diagrams.
- Working knowledge of databases such as Oracle 10g.
- Experience in writing Pig Latin scripts.
- Developed ETL processes to load data from multiple sources into HDFS using Flume and Sqoop, performed structural modifications using MapReduce and Hive, and analyzed data using visualization/reporting tools.
- Hands on experience in configuring and working with Flume to load the data from multiple sources directly into HDFS.
- Worked in an Agile methodology.
- Experience in using Apache Sqoop to import and export data to and from HDFS and Hive.
- Clear knowledge of rack-awareness topology in Hadoop clusters.
- Knowledge of Rackspace.
- Experience using shell scripting to automate tasks.
- Strong written and oral communication skills.
- Proficient in Core Java with a strong understanding and working knowledge of object-oriented concepts such as Collections, Multithreading, Data Structures, Algorithms, Exception Handling, and Polymorphism.
- Basic knowledge of application design using Unified Modeling Language (UML): sequence diagrams, use case diagrams, entity relationship diagrams (ERD), and data flow diagrams (DFD).
- Extensive programming experience in developing web-based applications using Core Java, J2EE, JSP, and JDBC.
- Comprehensive knowledge of Software Development Life Cycle coupled with excellent communication skills.
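To illustrate the Hive partitioning and bucketing work described above, the following is a minimal sketch that creates and queries a partitioned, bucketed table through the HiveServer2 JDBC driver. The connection string, table name, columns, and bucket count are placeholder values, and the hive-jdbc driver is assumed to be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HivePartitionedTableExample {
    public static void main(String[] args) throws Exception {
        // Connect to HiveServer2 over JDBC; host, port, database, and
        // credentials below are placeholders.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "hive", "");
             Statement stmt = conn.createStatement()) {

            // Partition by load date and bucket by customer_id so queries can
            // prune partitions and take advantage of bucketed joins.
            stmt.execute(
                "CREATE TABLE IF NOT EXISTS sales (customer_id BIGINT, amount DOUBLE) "
                + "PARTITIONED BY (load_date STRING) "
                + "CLUSTERED BY (customer_id) INTO 32 BUCKETS "
                + "STORED AS ORC");

            // Filtering on the partition column lets Hive skip all other partitions.
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT customer_id, SUM(amount) FROM sales "
                    + "WHERE load_date = '2016-01-01' GROUP BY customer_id")) {
                while (rs.next()) {
                    System.out.println(rs.getLong(1) + "\t" + rs.getDouble(2));
                }
            }
        }
    }
}
```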
TECHNICAL SKILLS
Languages: Java (JDK 1.6/1.7), J2EE 1.5
Scripting Languages: C, Java, SQL, Pig Latin, Bash, JSON
Big Data Ecosystem: HDFS, HBase, MapReduce, Hive, Pig, Sqoop, Impala (POC), Cassandra, Oozie, Zookeeper, Flume, Spark (POC), Kafka (POC)
Operating Systems: Windows 2008/XP/8.1, UNIX, Linux, CentOS
RDBMS: Oracle 10g, SQL Server 2005/2008, MS-Access, MySQL, NoSQL
Modeling Tools: UML on Rational Rose 4.0.
Web Technologies: HTML, XML, JSP, CSS, Ajax, jQuery
Web/Application Servers: WebLogic, WebSphere, Apache Cassandra, Tomcat
IDEs: Eclipse, NetBeans, WinSCP
Familiar GUIs: MS Office Suite, MS Project
PROFESSIONAL EXPERIENCE
Senior Hadoop Developer
Confidential, Milwaukee, WI
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Experience in installing, configuring, and using Hadoop ecosystem components.
- Experience in importing and exporting data into HDFS and Hive using Sqoop.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Involved in loading data into HBase using the HBase shell, the HBase client API, Pig, and Sqoop (see the client API sketch after this section).
- Worked on different file formats such as SequenceFiles, XML files, and MapFiles using MapReduce programs.
- Responsible for managing data coming from different sources.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Strong expertise in the MapReduce programming model with XML, JSON, and CSV file formats.
- Gained good experience with NoSQL databases.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Responsible for building scalable distributed data solutions using Hadoop.
- Developed custom aggregate functions using Spark SQL and performed interactive querying, at a POC level.
- Responsible for provisioning, maintaining, and improving server infrastructure split between a physical data center and AWS.
- Wrote a Kafka REST API to collect events from the front end.
- Involved in collecting, aggregating, and moving data from servers to HDFS using Apache Flume.
- Experience in managing and reviewing Hadoop log files.
- Involved in loading data from LINUX file system to HDFS.
- Implemented test scripts to support test driven development and continuous integration.
- Created Pig Latin scripts to sort, group, join, and filter the enterprise-wide data.
- Worked on tuning the performance of Pig queries.
- Mentored the analyst and test teams in writing Hive queries.
- Installed Oozie workflow engine to run multiple MapReduce jobs.
- Implemented processing of data from different sources using multiple input formats with GenericWritable and ObjectWritable.
- Provided cluster coordination services through ZooKeeper.
- Extensive working knowledge of partitioned tables, UDFs, performance tuning, compression-related properties, and the Thrift server in Hive.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Worked with the Data Science team to gather requirements for various data mining projects.
Environment: Cloudera CDH 4, HDFS, Hadoop 2.2.0 (YARN), Spark, Flume 1.5.2, Eclipse, AWS, MapReduce, Hive 1.1.0, Pig 0.14.0, Java, SQL, Sqoop 1.4.6, CentOS, Zookeeper 3.5.0, and NoSQL database.
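The bullet on loading data into HBase through the client API refers to code along the lines of the sketch below. The table name, column family, qualifier, and row key are illustrative only, and an hbase-site.xml pointing at the cluster is assumed to be on the classpath; the newer Connection/Table API is shown rather than the older HTable style used on early CDH releases.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseLoadExample {
    public static void main(String[] args) throws Exception {
        // Picks up the ZooKeeper quorum and other settings from hbase-site.xml
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("events"))) {
            // Row key, column family "d", and qualifier "payload" are illustrative
            Put put = new Put(Bytes.toBytes("row-0001"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"),
                          Bytes.toBytes("{\"type\":\"click\"}"));
            table.put(put);
        }
    }
}
```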
Senior Hadoop Developer
Confidential, TX
Responsibilities:
- Involved in defining job flows, managing and reviewing log files.
- Supported MapReduce programs running on the cluster.
- As a Big Data Developer, implemented solutions for ingesting data from various sources and processing data at rest utilizing Big Data technologies such as Hadoop, the MapReduce framework, HBase, Hive, Oozie, Flume, and Sqoop.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Imported bulk data into HBase using MapReduce programs.
- Responsibilities included designing and developing new back-end services, maintaining and expanding our AWS infrastructure, and providing mentorship to others on my team.
- Developed Apache Pig scripts and Hive scripts to process HDFS data.
- Performed analytics on time series data stored in HBase using the HBase API.
- Designed and implemented Incremental Imports into Hive tables.
- Involved in collecting, aggregating, and moving data from servers to HDFS using Apache Flume.
- Wrote Hive jobs to parse logs and structure them in tabular format to facilitate effective querying on the log data.
- Wrote multiple Java programs to pull data from HBase.
- Handled Hive queries using Spark SQL integrated with the Spark environment, at a POC level.
- Designed and built a reporting application that uses Spark SQL to fetch and generate reports on HBase table data, at a POC level.
- Involved in file processing using Pig Latin.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Implemented business logic by writing UDFs in Java and used various UDFs from Piggybank and other sources (see the UDF sketch after this section).
- Worked on debugging and performance tuning of Hive and Pig jobs.
- Used Hive to find correlations between customers' browser logs on different sites and analyzed them to build a risk profile for those sites.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
Environment: Java, Hadoop 2.1.0, MapReduce 2, Pig 0.12.0, Hive 0.13.0, Spark, Linux, Sqoop 1.4.2, Flume 1.3.1, Eclipse, AWS EC2, and Cloudera CDH 4.
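The business-logic UDFs mentioned above follow the standard Hive UDF pattern sketched below. The class name and the normalization it performs are illustrative, not the actual production logic.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Example registration in Hive (jar name and function name are placeholders):
//   ADD JAR normalize-udf.jar;
//   CREATE TEMPORARY FUNCTION normalize_str AS 'NormalizeString';
public class NormalizeString extends UDF {
    // Hive calls evaluate() once per row; a null input stays null
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().trim().toLowerCase());
    }
}
```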
Senior Hadoop Developer
Confidential, Richmond, VA
Responsibilities:
- Performed cluster capacity planning with the operations and management teams; handled cluster maintenance, including the addition and removal of nodes, as well as HDFS support and maintenance.
- Managed and reviewed Hadoop log files; performed file system management and monitoring.
- Involved in cluster upgrades and modified the required jobs.
- Involved in implementing security on the Hadoop cluster with Kerberos, working with the operations team to move from a non-secured cluster to a secured cluster.
- Migrated data from RDBMS to Hadoop using Sqoop for analysis and implemented Oozie jobs for automatic data imports from the source.
- Used the Hive data warehouse tool to analyze the data migrated to HDFS and developed Hive queries.
- Created external tables with proper partitions for efficiency and loaded the structured data produced by MapReduce jobs into HDFS.
- Implemented Hive UDFs for comprehensive data analysis.
- Responsible for troubleshooting MapReduce jobs by reviewing the log files.
- Involved in importing real-time data into Hadoop using Kafka and implemented Oozie jobs for daily imports (see the producer sketch after this section).
- Worked with various onshore and offshore teams to understand the data imported from their sources.
- Developed generic shell scripts to automate Sqoop jobs by passing parameters for data imports.
- Involved in data visualization: provided the files required by the team by analyzing the data in Hive and developed Pig scripts for advanced analytics on the data.
- As part of a POC, used Amazon AWS S3 as the underlying file system for Hadoop and implemented Elastic MapReduce jobs on the data in S3 buckets.
- Participated with the operations team in installing Spark on the secured cluster.
- Provided updates in daily Scrum, planned tasks at the start of each sprint, and tracked them using JIRA; synced with the team to pick up priority tasks.
Environment: Hadoop, HDFS, Pig, Sqoop, Spark, MapReduce, Cloudera, Snappy, Zookeeper, NoSQL, HBase, Shell Scripting, Ubuntu, Linux Red Hat.
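For the real-time Kafka imports noted above, a producer along these lines pushes events onto a topic that the ingestion job consumes. The broker address, topic name, and JSON payload are placeholders.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class RealTimeEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Topic name, key, and JSON payload are illustrative only
            producer.send(new ProducerRecord<>("realtime-events",
                    "event-1", "{\"type\":\"pageview\",\"ts\":1451606400}"));
        }
    }
}
```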
Hadoop Developer
Confidential, Seattle, WA
Responsibilities:
- Developed Big Data Solutions that enabled the business and technology teams to make data-driven decisions on the best ways to acquire customers and provide them business solutions.
- Involved in installing, configuring and managing Hadoop Ecosystem components like Hive, Pig, Sqoop and Flume.
- Migrated the existing data to Hadoop from RDBMS (SQL Server and Oracle) using Sqoop for processing the data.
- Worked on Importing and exporting data from different databases like MySQL, Oracle into HDFS and Hive using Sqoop.
- Worked on Writing Hive queries for data analysis to meet the business requirements.
- Responsible for loading unstructured and semi-structured data coming from different sources into the Hadoop cluster using Flume, and for managing it.
- Developed MapReduce programs to cleanse and parse data in HDFS obtained from various data sources and to perform map-side joins using the distributed cache (see the mapper sketch after this section).
- Used Hive data warehouse tool to analyze the data in HDFS and developed Hive queries.
- Created internal and external tables with properly defined static and dynamic partitions for efficiency.
- Implemented custom Hive UDFs to achieve comprehensive data analysis.
- Used the RegEx, JSON, and Avro SerDes packaged with Hive for serialization and deserialization to parse the contents of streamed log data.
- Developed Pig scripts for advanced analytics on the data for recommendations.
- Experience in writing Pig UDFs and macros.
- Exported the business-required information to RDBMS using Sqoop to make the data available to the BI team for generating reports.
- Developed generic shell scripts to automate Sqoop jobs by passing parameters for data imports.
- Implemented daily workflow for extraction, processing and analysis of data with Oozie.
- Responsible for troubleshooting MapReduce jobs by reviewing the log files.
Environment: Hadoop, MapReduce, Hive, Oozie, Sqoop, Flume, JAVA, LINUX, CentOS
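The data-cleansing MapReduce work described above typically looks like the map-only sketch below, which drops malformed records and trims fields before the output is loaded into Hive. The pipe-delimited, five-column layout is an assumption for illustration; the driver would wire this mapper into a job with zero reducers.

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Map-only cleansing step: skips malformed rows and trims each field.
public class CleanseMapper
        extends Mapper<LongWritable, Text, NullWritable, Text> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split("\\|");
        if (fields.length != 5) {
            return; // drop records that do not match the expected layout
        }
        StringBuilder cleaned = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                cleaned.append('|');
            }
            cleaned.append(fields[i].trim());
        }
        context.write(NullWritable.get(), new Text(cleaned.toString()));
    }
}
```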
Hadoop Developer
Confidential, Boston, MA
Responsibilities:
- Supported MapReduce Programs running on the cluster.
- Delivered a POC of Flume to handle real-time log processing for attribution reports.
- Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
- Exported the result set from Hive to MySQL using Sqoop after processing the data.
- Used Oozie workflow engine to run multiple Hive and Pig jobs.
- Hands-on experience working with SequenceFile, Avro, and HAR file formats and compression.
- Used Hive to partition and bucket data.
- Analyzed the partitioned and bucketed data and computed various metrics for reporting.
- Wrote Pig Scripts to perform ETL procedures on the data in HDFS.
- Worked on Hive to expose data for further analysis and to transform files from different analytical formats into text files.
- Wrote HiveQL to create Hive tables that store the processed results in a tabular format.
- Wrote script files for processing data and loading it into HDFS.
- Developed Pig scripts for advanced analytics on the data for recommendations.
- Developed Pig Latin scripts to extract the data from the web server output files and load it into HDFS.
- Deployed Sqoop server to perform imports from heterogeneous data sources to HDFS.
- Deployed and configured flume agents to stream log events into HDFS for analysis.
- Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
- Worked on improving performance of existing Pig and Hive Queries.
- Extracted data from Twitter using Java and the Twitter API; parsed JSON-formatted Twitter data and loaded it into the database (see the parsing sketch after this section).
- Involved in loading data from RDBMS and web logs into HDFS using Sqoop and Flume.
Environment: Hadoop, CDH4, PIG, HIVE, Sqoop, Flume, SQL, Oozie, MapReduce, Java.
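The Twitter extraction bullet above involves turning raw JSON statuses into delimited records; a minimal Jackson-based sketch is shown below. The selected fields and tab-delimited output format are illustrative choices, not the exact production layout.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class TweetParser {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Pulls a few fields out of a raw JSON tweet and emits a tab-delimited record.
    public static String toTabDelimited(String rawJson) throws Exception {
        JsonNode node = MAPPER.readTree(rawJson);
        String id = node.path("id_str").asText();
        String createdAt = node.path("created_at").asText();
        String text = node.path("text").asText().replace('\t', ' ');
        return id + "\t" + createdAt + "\t" + text;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toTabDelimited(
            "{\"id_str\":\"1\",\"created_at\":\"Mon Jan 01\",\"text\":\"hello\"}"));
    }
}
```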
Java Developer
Confidential, Atlanta, GA
Responsibilities:
- Development, enhancement, and testing of webMethods flow services and Java services. Used web services for interaction between various components and created SOAP envelopes.
- Used Apache CXF, WSDL, and XML for creating SOAP web services. Performed unit testing of web services using SoapUI. Created web service connectors for use within a flow service, while also making the URLs abstract so they could be changed at runtime without redeployment.
- Used Jersey to create RESTful web services to interact with the server (see the resource sketch after this section). Used the GWT UI Builder tool to make the UI more interactive.
- Developed and configured Oracle Database 10g tables, including sequences, functions, procedures, and table constraints. Created standalone Java programs to read data from several XLS files and insert the data into the database as needed by the testing team.
- Used AJAX and JavaScript for validations and for integrating business server-side components on the client side within the browser. Co-developed a dynamic HTML5 application highlighting numerous data visualizations of web metrics.
- Wrote core Java code that runs on all platforms supporting Java without the need for recompilation.
- Used AngularJS to communicate with the server via web services to download content in real time. Constructed and optimized SQL queries in DB2 and tuned the SQL queries using SQL Profiler; involved in tuning the database.
- Used JUnit and Mockito for testing. Configured the Ant build tool to automate build processes for all types of environments - Test, QA, and Production.
- Developed the application using the Eclipse IDE and worked in an Agile environment. Migrated Cloudscape to DB2 for WebSphere Portal.
- Mentored junior members on the Agile (Scrum) process and JUnit testing. Developed and supported many components of this application end to end, i.e., from the front end (view) to webMethods and the database. Used Git as the version control tool for managing module development.
- Prepared code documentation for future reference and upgrades. Conducted code reviews for team members and was involved in testing.
Environment: Java, J2EE, HTML, JS, CSS, AJAX, jQuery, AngularJS 2.0, Servlets, JSP, WebSphere Application Server, Spring DI, AOP, Transaction Management, MVC, RAD, JUnit, Mockito, Oracle Coherence, Node.js, JMS, LDAP, JAX-RS, AngularJS, XML, XSD, XSLT, Unix, PuTTY, FTP, DB2, SQL, PL/SQL, QC, IBM ClearCase, etc.
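The Jersey-based RESTful services mentioned above follow the standard JAX-RS resource pattern sketched below. The path, resource name, and returned payload are placeholders.

```java
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

// Minimal JAX-RS resource of the kind served through Jersey.
@Path("/accounts")
public class AccountResource {

    @GET
    @Path("/{id}")
    @Produces(MediaType.APPLICATION_JSON)
    public String getAccount(@PathParam("id") String id) {
        // In a real service this would delegate to a DAO / backing store
        return "{\"id\":\"" + id + "\",\"status\":\"ACTIVE\"}";
    }
}
```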
Java Developer
Confidential, Eau Claire, WI
Responsibilities:
- Responsible for all stages of design, development, and deployment of applications. Involved in the analysis, specification, design, implementation, and testing phases of the Software Development Life Cycle (SDLC) and used Agile methodology (Scrum) for developing the application.
- Designed dynamic and multi-browser-compatible pages using HTML5, CSS3, JavaScript, jQuery, and AngularJS, and responsive interfaces using Bootstrap for a rich UI experience.
- Developed the application to access JSON from a RESTful web service on the consumer side using JavaScript and AngularJS.
- Implemented data grids/tables using AngularJS and Bootstrap for front-end, client-facing single page applications (SPA).
- Utilized Java 8 features like lambda expressions and the Stream API for bulk data operations on collections, which improved application performance (see the streams sketch after this section).
- Created AWS EC2 instances, installed the required configurations and applications on them, and created S3 buckets for storing object-level data.
- Used the AWS SDK to connect to Amazon S3 buckets, used as the object storage service to store and retrieve the media files related to the application; Amazon CloudWatch is used to monitor the application and to store logging information.
- Deployed Spring Boot based microservices on Amazon EC2 Container Service using the AWS admin console.
- Used Spring Boot for building microservices and developed Spring-based applications radically faster with very little configuration.
- Automated continuous integration and delivery workflows to deploy microservice applications via Docker containers.
- Created Spring Batch jobs for running batch workloads and documented the use of Spring Batch.
- Worked on creating the Docker Containers and Docker consoles for managing the application life cycle.
- Configured Jenkins public DNS by creating an instance in AWS. Developed a deployment management system for Docker containers in AWS Elastic Container Service (ECS).
- Worked on MongoDB database concepts such as locking, transactions, indexes, replication, and schema design.
- Worked with modules like Express using Node.js for data persistence with MongoDB.
- Developed RESTful web services using JAX-RS, with Jersey as the implementation, for fetching data from the Oracle database.
- Secured REST APIs by implementing an OAuth token-based authentication/authorization scheme using Spring Security.
- Utilized Hibernate reverse engineering to create an automated process that generated hundreds of Java POJO entities from the database.
- Extensively used Hibernate mapping, HQL, EHCache, Query, Criteria, transactions, and locking.
- Written and executed several stored procedures, triggers, packages, views and functions using SQL Developer tool.
- Worked on thread handling to maintain continuity of execution and extensively implemented multithreading to handle transaction management with isolation and propagation levels.
- Used SVN as the code repository and version control tool.
- Performed unit testing using the JUnit and Mockito frameworks for all the migrated modules to ensure complete code coverage.
- Prepared code documentation for future reference and upgrades.
- Conducted code review for team members, involved in testing.
Environment: Java, J2EE, HTML5, CSS3, JavaScript, jQuery, Spring, Spring Boot, Spring Batch, Spring MVC, AngularJS, Node.js, EC2, S3, SDK, MySQL, SOA, JDBC, Hystrix Dashboard, AWS, Netflix Ribbon, Hibernate, REST, JSON, RESTful, Eclipse, Maven, JUnit, Jenkins, JBoss, Linux OS, MongoDB, Git, PL/SQL, Docker, Ant, JIRA, Kafka, JMS.
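The Java 8 lambda and Stream API usage noted above amounts to grouping and aggregating collections in bulk, as in the sketch below; the sample data and grouping key are illustrative only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class StreamBulkOps {
    public static void main(String[] args) {
        // Each entry is "region:amount"; the data is a stand-in for much
        // larger collections aggregated in the application.
        List<String> orders = Arrays.asList("US:120", "US:80", "EU:50");

        // Group by region and sum the amounts in a single stream pipeline
        Map<String, Integer> totalByRegion = orders.stream()
                .map(o -> o.split(":"))
                .collect(Collectors.groupingBy(
                        parts -> parts[0],
                        Collectors.summingInt(parts -> Integer.parseInt(parts[1]))));

        System.out.println(totalByRegion); // e.g. {US=200, EU=50}
    }
}
```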
