Hadoop Developer Resume
Duluth, GA
SUMMARY:
- 5+ years of total IT experience, including Java development, web application development, database management and Big Data ecosystem technologies.
- Around 3+ years of experience developing applications with Big Data technologies such as Hadoop, Spark, MapReduce, YARN, Flume, Hive, Pig, Kafka, Storm, Sqoop, HBase, Cassandra, Hortonworks, Cloudera, Mahout, Avro and Scala.
- Worked on migrating MapReduce programs into Spark transformations using PySpark and Scala.
- Experienced in handling large datasets using partitioning, Spark in-memory capabilities, broadcast variables, and effective and efficient joins and transformations during the ingestion process itself.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive (a brief PySpark sketch follows this summary).
- Hands-on experience with Hadoop applications (such as administration, configuration management, monitoring, debugging, and performance tuning).
- Skilled in programming with the MapReduce framework and the Hadoop ecosystem.
- Very good experience in designing and implementing MapReduce jobs to support distributed processing of large data sets on the Hadoop cluster.
- Good experience transferring data from MySQL to Solr using the DataImportHandler in standalone mode.
- Experience setting up Hadoop environments in the cloud (AWS, Oracle) and spinning up new services on existing nodes.
- Good experience ingesting data from MySQL into Solr using Sqoop in the Hadoop environment.
- Worked on a live 60-node Hadoop cluster running Cloudera CDH4.
- Worked with highly unstructured and semi-structured data of 90 TB in size (270 TB with a replication factor of 3).
- Implemented Commissioning and Decommissioning of new nodes to existing cluster.
- Extracted data from relational databases (SQL Server, Oracle, MySQL) into HDFS using Sqoop.
- Created and ran Sqoop jobs with incremental load to populate Hive external tables.
- Extensive experience in writing Pig scripts to transform raw data from several data sources into baseline data.
- Developed Hive scripts for end user/analyst requirements to perform ad hoc analysis.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
- Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping and aggregation, and how they translate to MapReduce jobs.
- Developed UDFs in Java and Python as needed for use in Pig and Hive queries.
- Experience in using SequenceFile, RCFile, Parquet, Avro and HAR file formats.
- Developed Oozie workflows for scheduling and orchestrating the ETL process.
- Implemented authentication using Kerberos and authorization using Apache Sentry.
- Worked with the admin team on designing and upgrading the cluster from CDH 4 to CDH 5.
- Good knowledge of Amazon Web Services components such as EC2, EMR and S3.
- Experience and knowledge of NoSQL databases such as HBase, Cassandra and MongoDB.
- Good Working knowledge of BI Tools like Tableau, Birst.
- Extensive experience with SQL, PL/SQL and database concepts
- Proactive and well organized, with effective time-management and problem-solving skills.
- Experience and knowledge of microservices, Docker containers and Linux commands.
- Experienced in shell scripting to execute commands and notify the group of results via IM.
- Good interpersonal skills and ability to work as part of a team; exceptional ability to learn and master new technologies and deliver outputs on short deadlines.
- Participated in requirement analysis, reviews and working sessions to understand the requirements and system design
- Good experience in designing relational databases and creating ER diagrams.
- Trained in machine learning concepts in Python and NLP.
- Worked on a private machine learning project to improve seat filling.
- Good experience in website scraping and loading data into an RDBMS.
- Advanced knowledge in performance troubleshooting and tuning of MySQL clusters.
- Very good experience in query tuning to decrease latency.
- Strong knowledge of Oracle database administration issues (i.e., schema, user and security management).
- Experienced in bulk insert, bulk copy, dynamic SQL, XML query methods (XML parsing), indexing and optimization, and stored procedures.
- Experienced in DBA tasks such as database resizing, encryption and security.
- Expertise in relational database administration, including configuration, implementation, data modeling, maintenance, redundancy/HA, security, troubleshooting and performance tuning, upgrades, database/data/server migrations, and SQL tuning and troubleshooting of database operations.
- Experienced in capacity planning and disk space management.
- Experience in troubleshooting Linux/Unix operating systems for database performance.
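For illustration, a minimal PySpark sketch of the pattern described above (Spark SQL over Hive on Cloudera YARN, with a broadcast join applied during ingestion); the table and column names are hypothetical placeholders, not from a specific project:

    # Minimal PySpark sketch: Spark SQL over Hive tables with a broadcast join.
    # Table and column names (web_logs, dim_pages, ...) are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("weblog-aggregation")
             .enableHiveSupport()          # read/write Hive tables via the metastore
             .getOrCreate())

    logs = spark.table("web_logs")         # large fact table in Hive
    pages = spark.table("dim_pages")       # small dimension table

    # Broadcast the small dimension to avoid shuffling the large side of the join.
    enriched = logs.join(F.broadcast(pages), on="page_id", how="left")

    daily_hits = (enriched
                  .filter(F.col("status") == 200)
                  .groupBy("event_date", "page_category")
                  .agg(F.count("*").alias("hits")))

    # Write the aggregate back to Hive, partitioned by date for downstream queries.
    (daily_hits.write
     .mode("overwrite")
     .partitionBy("event_date")
     .saveAsTable("analytics.daily_page_hits"))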
TECHNICAL SKILLS:
Operating Systems: Linux (Ubuntu, CentOS), Windows, Mac OS
Hadoop Ecosystem: Hadoop, MapReduce, Solr, YARN, HDFS, HBase, Impala, Hive, Pig, Sqoop, Oozie, Flume, ZooKeeper, Spark, Scala, Zeppelin
NoSQL Databases: HBase, Cassandra, MongoDB, GraphDB
Programming Languages: C, Python, Scala, Core Java, J2EE (Servlets, JSP, JDBC, JavaBeans, EJB), C#, ASP.NET
Frameworks: Spring, Hibernate
Documentation: Atlassian-Confluence & JIRA
Repositories: BitBucket
Web Technologies: HTML, CSS, XML, JavaScript, Maven
Scripting Languages: JavaScript, UNIX shell, Python
Databases: Oracle 11g & 12c, MS Access, MySQL, SQL Server 2000/2005/2008/2012, Teradata
SQL Server Tools: SQL Server Management Studio, Enterprise Manager, Query Analyzer, Profiler, Export & Import (DTS).
IDE: Eclipse, Visual Studio, IDLE, IntelliJ
Web Services: Restful, SOAP
Tools & Methodologies: Bugzilla, QuickTest Pro (QTP) 9.2, Selenium, Quality Center, TestLink, TWS, SPSS, SAS, Documentum, Tableau, Mahout; Agile, UML, Design Patterns
PROFESSIONAL EXPERIENCE:
Confidential, Duluth, GA
Hadoop Developer
Responsibilities:
- Responsible for developing a data pipeline on Amazon AWS to extract data from weblogs and store it in HDFS.
- Worked extensively with Sqoop for importing metadata from Oracle.
- Analyzed the SQL scripts and designed the solution for implementation in PySpark.
- Implemented Spark using PySpark and Spark SQL for faster testing and processing of data.
- Imported data from AWS S3 into Spark DataFrames and performed transformations and actions on the DataFrames (see the ingestion sketch at the end of this section).
- Used the JSON and XML SerDes for serialization and deserialization to load JSON and XML data into Hive tables.
- Used HBase to store audit data for future analysis.
- Used Spark for interactive queries, processing of streaming data, and integration with popular NoSQL databases for large volumes of data.
- Coordinated with business customers to gather business requirements, interacted with technical peers to derive technical requirements, and delivered the BRD and TDD documents.
- Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs.
- Extensively involved in the design phase and delivered design documents.
- Worked on analyzing the Hadoop cluster and different Big Data analytics tools, including Pig, Hive, the HBase database and Sqoop.
- Involved in validating the aggregate tables based on the rollup process documented in the data mapping; developed HiveQL and Spark RDD/SQL and automated the flow using shell scripting.
- Developed MapReduce programs to parse the raw data and store the refined data in tables.
- Designed and modified database tables and used HBase queries to insert and fetch data from tables.
- Involved in moving all log files generated from various sources to HDFS through Flume for further processing.
- Involved in loading and transforming large sets of structured, semi-structured and unstructured data from relational databases into HDFS using Sqoop imports.
- Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Created Hive tables, loaded data, and wrote Hive queries that run internally as MapReduce jobs.
- Used Oozie operational services for batch processing and scheduling workflows dynamically.
- Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
- Developed database triggers and procedures to update the real-time cash balances.
- Worked closely with the testing team to create new test cases and created the use cases for the module before the testing phase.
- Coordinated work with DB team, QA team, Business Analysts and Client Reps to complete the client requirements efficiently.
- Involved in the performance tuning of PL/SQL statements.
- Involved in fetching brand data from social media applications such as Facebook and Twitter.
- Developed and updated social media analytics dashboards on a regular basis.
- Performed data mining investigations to find new insights related to customers.
- Involved in forecasting based on current results and insights derived from data analysis.
- Created a complete processing engine, based on Cloudera's distribution, enhanced for performance.
- Managed and reviewed Hadoop log files.
- Developed and generated insights based on brand conversations, which in turn helped effectively drive brand awareness, engagement and traffic to social media pages.
- Involved in identifying topics and trends and building context around the brand.
- Involved in identifying and analyzing defects, questionable functional errors and inconsistencies in output.
Environment: Hadoop, MapReduce, YARN, Hive, Pig, HBase, Oozie, Sqoop, Flume, Oracle 11g & 12c, Core Java, Cloudera, HDFS, Eclipse.
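An illustrative sketch of the ingestion path referenced above (JSON weblogs read from S3 into Spark DataFrames and stored as a Hive table); the bucket, prefix and field names are hypothetical placeholders:

    # Illustrative S3-to-Hive ingestion: read JSON weblogs into a DataFrame,
    # apply light transformations, and store the result as a partitioned Hive table.
    # Bucket, prefix and field names are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("s3-weblog-ingest")
             .enableHiveSupport()
             .getOrCreate())

    raw = spark.read.json("s3a://example-bucket/weblogs/*.json")

    cleaned = (raw
               .withColumn("event_ts", F.to_timestamp("timestamp"))
               .withColumn("event_date", F.to_date("event_ts"))
               .dropDuplicates(["request_id"]))

    (cleaned.write
     .mode("append")
     .partitionBy("event_date")
     .saveAsTable("staging.weblog_events"))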
Confidential, Mobile, AL
Oracle Developer
Responsibilities:
- Designed and developed Oracle packages, functions and procedures, as well as Postgres functions.
- Participated in the analysis phase, preparing the technical specification, development and testing documents.
- Involved in SQL query tuning and provided tuning recommendations for ERP jobs and time/CPU-consuming queries.
- Created database objects such as tables, views, procedures, packages and cursors using TOAD, PL/SQL Developer, SQL Developer and SQL Navigator.
- Ensured that data warehouse and data mart designs efficiently support BI and end users.
- Collaborated with BI teams to create reporting data structures.
- Built and maintained Oracle PL/SQL packages and stored procedures (using TOAD and PL/SQL Developer), workflows, and SQR reports as requested by the client for processing and analysis.
- Used ref cursors and collections to access complex data by joining many tables (an illustrative sketch follows this section).
- Extensively involved in database, SQL and API tuning and optimization.
- Handled exceptions extensively for ease of debugging and for displaying error messages in the application.
- Created numerous simple-to-complex queries involving self joins, correlated subqueries, functions, cursors and dynamic SQL.
- Developed Oracle partitions and performed performance tuning/query tuning.
- Experience in creating ETL transformations and jobs in Oracle using dynamic packages and procedures.
- Responsible for Internet Native Banner Development and Self-Service Banner Programming.
- SQL and PL/SQL development.
- Responsible for creating new student, faculty and administration APIs to enhance the registration process, academic records, alumni services, the payment process, coursework submission, etc.
Environment: Oracle 10g/11g, Oracle SQL Developer, TOAD, SQL*Loader, FTP, Linux, SQL.
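For illustration, a small Python (cx_Oracle) sketch of consuming a PL/SQL ref cursor of the kind described above; the package/procedure name, bind values and connection details are hypothetical placeholders:

    # Illustrative sketch: call a PL/SQL procedure that returns a ref cursor and
    # iterate over its rows from Python. Names and credentials are hypothetical.
    import cx_Oracle

    connection = cx_Oracle.connect("app_user", "app_password", "dbhost:1521/ORCLPDB1")
    cursor = connection.cursor()

    # The procedure is assumed to open a ref cursor over several joined tables
    # and hand it back through an OUT parameter.
    ref_cursor = connection.cursor()
    cursor.callproc("student_pkg.get_registrations", [2017, ref_cursor])

    for student_id, course_code, enrolled_on in ref_cursor:
        print(student_id, course_code, enrolled_on)

    cursor.close()
    connection.close()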
Confidential
Oracle Developer
Responsibilities:
- Collected business requirements from the team and translated them into technical specs and design docs for development.
- Created Stored Procedures, Triggers, Indexes, User defined Functions, Constraints etc. on various database objects to obtain the required results.
- Created Complex ETL Packages using Oracle to extract data from staging tables to partitioned tables with incremental load.
- Developed, monitored and deployed Oracle packages.
- Responsible for Scheduling Jobs, Alerting and Maintaining Oracle packages.
- Ensured the performance, security and availability of databases.
- Worked on optimizing MySQL server performance, monitoring and query tuning.
- Responsible for creating MySQL user accounts.
- Responsible for capturing long-running queries.
Environment: Oracle 11g, SQL, PL/SQL, TOAD, Oracle Developer, SQL*Loader, SQL*PLUS.
Confidential
Data Analyst
Responsibilities:
- Responsible for loading data from Excel into the database using SQL*Loader.
- Worked on creating views for sales, expenditure and transport data.
- Responsible for creating customized reports in SSRS using SQL Server 2008 R2 with different properties such as chart controls, filters, interactive sorting and SQL parameters.
- Added/changed tables, columns, triggers and T-SQL code; used indexes, views, triggers and stored procedures for updating and cleaning the existing data as well as newly added data.
- Created SSRS Report Model Projects in BI Studio as well as created, modified and managed various report models with multiple model objects, source fields and expressions.
- Interacted with business users for gathering requirements for Project Dashboards.
- Deployed SSRS reports to SharePoint sites for web-based access by multiple end users.
- Designed and developed matrix and tabular reports with cascading parameters, drill-down, drill-through and custom groups using SSRS.
Environment: SQL Server 2008/2005 Enterprise Edition, SQL BI Suite (SSAS, SSIS, SSRS), VB Script, Enterprise manager, XML, MS PowerPoint, OLAP, OLTP, MOSS 2007, MS Project, MS Access 2008 & Windows Server 2008, Oracle.